Compare commits


5407 Commits

Author SHA1 Message Date
0b868b1906 add setup metadata to help PyPI flesh out content on pypi package page 2019-06-21 16:26:40 -04:00
fb6347d937 specify data type in the doc (#19959) (#20013)
Summary:
addresses comments in #19915
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19959

Differential Revision: D15149993

Pulled By: orionr

fbshipit-source-id: 0e438cfa1a311e89d4bed7ae9d7710a9f1b19a78
2019-05-01 12:52:29 -04:00
1bb8cfcc5a [jit] Document new features, fix some rst bugs (#19974)
* Squashed commits

* User Defined Types -> Classes
2019-05-01 09:55:44 -04:00
142c973f41 Fix version handler in 1.1.0 docs. (#19977)
Update the find & replace to be less restrictive. Will port this change
to master to avoid problems in the future.
2019-04-30 19:22:19 -04:00
e39ab6632f Setup the docs build for v1.1.0 -> stable (#19962)
This docs build won't actually be run until we open a new pull request,
after this one has been merged, "merging" v1.1.0 into master.

On that pull request, the docs build gets run.

It is safe to merge this PR in whenever, but the new docs build should
wait until after pytorch/pytorch.github.io#187 gets merged.
2019-04-30 15:56:47 -04:00
20607a99a3 Fixed log_normal and geometric for CPU 2019-04-30 11:54:32 -07:00
f0bc8d1dc5 cleanup 2019-04-30 07:05:45 -07:00
cca6aca5d2 hard-set version in release branch to 1.1.0 2019-04-29 22:12:09 -07:00
5a5ff34ff1 add numpy and future to requirements for binaries 2019-04-29 21:31:59 -07:00
63b2ecd934 Improve torch.utils.tensorboard docs 2019-04-29 16:05:13 -07:00
82f6886f73 Make find_unused_parameters in DDP default to False (#19895)
Summary:
As DDP in previous releases does not support unused params, we turn off `find_unused_parameters` by default to de-risk the new reducer.
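For illustration, a minimal sketch of opting back in (the process-group backend and setup here are assumptions; only the `find_unused_parameters` flag is the point):

```
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(rank, world_size):
    # Assumes MASTER_ADDR/MASTER_PORT are set in the environment.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    model = torch.nn.Linear(10, 10)
    # The default is now False; opt in only if the forward pass may leave
    # some parameters without gradients.
    return DDP(model, find_unused_parameters=True)
```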

CC pietern soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19895

Reviewed By: pietern

Differential Revision: D15118563

Pulled By: mrshenli

fbshipit-source-id: 6215c486e1dae3387b36011d8e64a2721ac85f58
2019-04-29 16:04:11 -07:00
fbe8a37832 Finer grained consistency check in reducer (#19901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19901

The existing code used `expect_autograd_hooks_` as a proxy for the
situation where finalization of the previous iteration is needed. This
is not correct, however, since you may decide to completely ignore the
output of a DDP wrapped module. If this is the case, and no gradients
have been passed to the reducer, it is fine to keep going. This commit
adds a new variable `require_finalize_` that tracks whether the
finalization is really needed.

Reviewed By: mrshenli

Differential Revision: D15118871

fbshipit-source-id: 25938eaf1fe13e2940feae1312892b9d3da8a67d
2019-04-29 15:59:24 -07:00
89748dd0dd Only call into reducer if torch.is_grad_enabled() (#19897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19897

During validation, gradient reduction is not needed, and autograd is
never called. The model output will always be a detached tensor. After
the new reducer was merged, this meant that it would find all model
parameters unused and kick off reduction for them. Combined with #19799,
this hits the case where no parameters are used and the reducer tries to
kick off reduction of zeroed gradients. Test for `torch.is_grad_enabled()`
and `self.training` before calling into the reducer.
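A rough sketch of the described guard (illustrative only; not the real DDP code, and the `reducer` interface shown here is an assumption):

```
import torch

class DDPSketch(torch.nn.Module):
    def __init__(self, module, reducer):
        super(DDPSketch, self).__init__()
        self.module = module
        self.reducer = reducer  # assumed to expose prepare_for_backward()

    def forward(self, *inputs, **kwargs):
        output = self.module(*inputs, **kwargs)
        # In eval / no_grad mode the output is detached and autograd never
        # runs, so the reducer must not be involved at all.
        if torch.is_grad_enabled() and self.training:
            self.reducer.prepare_for_backward([output])
        return output
```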

Reviewed By: mrshenli

Differential Revision: D15118726

fbshipit-source-id: b0208f632a61cbe8110fa626fa427937b7f05924
2019-04-29 15:59:17 -07:00
092bcc9c69 Misc pickler improvements (#19638)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19638 [jit] Serialize attribute module as torch.jit._pickle**

* use `torch.jit._pickle` as the module for globals in the pickle program. Pickle will try to resolve these to the actual functions in `torch.jit._pickle.py` automatically (I believe this can also be overridden to point to whatever functions you want). This means a plain `pickle.load` on `my_model/attributes.pkl` will work instead of having to use a custom `pickle.Unpickler`
* use `REDUCE` opcodes instead of `BUILD` to make use of the last bullet
* use a union in the unpickler to support globals better (+ any future metadata we might need that can't be stored in an `IValue`), this makes some of the code around `IntList`s clearer and lets us get rid of any lookbehind for opcodes
* pickle things as a tuple instead of a list (an immutable result is more semantically correct)
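For example, loading the attributes archive with the stock pickle module (a sketch; the path is the example from the first bullet, and `torch` must be importable so the `torch.jit._pickle` globals resolve):

```
import pickle

# pickle.load takes a file object; the REDUCE opcodes in the archive
# reference functions under torch.jit._pickle, which pickle resolves
# by importing that module.
with open("my_model/attributes.pkl", "rb") as f:
    attributes = pickle.load(f)
print(attributes)
```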
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19638

Pulled By: driazati

Differential Revision: D15111203

fbshipit-source-id: 526c6c2b63a48eb1cba1c658045a7809730070dd
2019-04-29 15:58:36 -07:00
54f9440479 remove scalar to float matching (#19918)
Summary:
Trying to get this in before 1.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19918

Reviewed By: driazati

Differential Revision: D15124430

Pulled By: eellison

fbshipit-source-id: 549cdcbaff91218657e94ce08c0f4e69b576d809
2019-04-29 15:58:27 -07:00
4adc14da61 Fix conda build for Windows (#19824)
Summary:
Let's test it before merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19824

Differential Revision: D15116111

Pulled By: soumith

fbshipit-source-id: 0a73de3f045ee1349061674f5f8e2aaba382493c
2019-04-28 08:42:45 -07:00
ea1d0eeb92 fix conda build folder 2019-04-27 22:17:49 -07:00
c7ad499b33 Allow for iterations where no module parameter is used
It is possible that not a single parameter is used during an
iteration. If this is the case, the prepare_for_backward function
marks all parameters as unused, kicks off reduction of all buckets,
and finalizes the reduction.

This is different from the prior implementation where we assumed that
autograd would produce a gradient for at least a single parameter.
We then used the autograd callback mechanism to queue a finalizer
callback. Now, this finalizer may be executed inline.
2019-04-27 22:02:33 -07:00
16f2b22120 fix env for binary builds 2019-04-27 21:59:23 -07:00
472be69a73 Avoid Output Uninitialized Blobs in Load with load_all=1 (#19133)
Summary:
When output blob names are specified while load_all=1, they are ignored. However, this behavior is not documented. In this diff, we simply disallow users from providing blob names when load_all=1.

See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133

Reviewed By: dzhulgakov

Differential Revision: D14883698

Pulled By: chandlerzuo

fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
2019-04-27 10:45:44 -07:00
268859ce0d Fix CUDA stream syncing bug in allgather and reduce_scatter (#19631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19631
ghimport-source-id: edc47e77d6ef03e966944ff98eefc22f2574eeaa

Reviewed By: mrshenli

Differential Revision: D15110077

Pulled By: mxw

fbshipit-source-id: 27a68308ade5ea511e2ea568a071eedb5d21c1ba
2019-04-27 08:35:56 -07:00
a25b79531c use fully qualified name for ScriptClasses (#19239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19239
ghimport-source-id: 830aad6dc11d2a7247760a9c7c9fc8556f70a706

Differential Revision: D14928293

Reviewed By: eellison

Pulled By: suo

fbshipit-source-id: d2efa5d7f7397526083278d6650b9cee8d967b1a
2019-04-26 19:17:21 -07:00
2ce39de3fc Add elementwise_affine for layer_norm_op (#19713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713

Add elementwise_affine for layer_norm_op

Reviewed By: houseroad

Differential Revision: D15075454

fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
f9786ad351 Add support for LONG_BINGET pickler op (#19815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19815
ghimport-source-id: dd51c13892a8f0d91d726ae8ec65206d5e81f33e

Differential Revision: D15109969

Pulled By: driazati

fbshipit-source-id: da0bb5e30038173e74ca3e0e103dc11ba1638797
2019-04-26 17:13:48 -07:00
5a83a7424d fix optional type unification (#19813)
Summary:
Previously, in type unification, when we encountered an Optional[T] and a None, we would unify them to Optional[Optional[T]]. If you think of Optionals as a union of [T, None], then a union of [Optional[T], None] is just [T, None]. We should never create an Optional of an Optional.

An alternative fix would be to change unify_types directly, but I think this is the more general fix, and it plays more nicely with our optional type refinement, which also assumes we never encounter an Optional[Optional[T]].
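A toy illustration of the intended collapsing behavior (a pure-Python stand-in, not the JIT's actual C++ type machinery):

```
class OptionalType(object):
    """Toy stand-in for the JIT's Optional[T] (illustration only)."""
    def __init__(self, elem):
        self.elem = elem
    def __repr__(self):
        return "Optional[%s]" % self.elem

def unify_with_none(t):
    # Optional is the union [T, None]; unifying [Optional[T], None]
    # collapses back to [T, None], so never wrap an Optional again.
    if isinstance(t, OptionalType):
        return t
    return OptionalType(t)

print(unify_with_none("int"))                # Optional[int]
print(unify_with_none(OptionalType("int")))  # Optional[int], not nested
```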
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19813

Reviewed By: suo

Differential Revision: D15103083

Pulled By: eellison

fbshipit-source-id: db803db10d6934eaa5458e7c1746546b0d0c0a6c
2019-04-26 16:14:51 -07:00
698103cdd6 DataLoader docs update to describe how workers are managed, including Windows. (#18091)
Summary:
It's been hard to understand how workers are launched and what code runs in the worker vs. the main process, especially on Windows, which leads to many of our samples failing. This explains when workers run and how to make code work on Windows as well; see the sketch below.
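The key pattern those docs describe for Windows, where workers are spawned rather than forked and re-import the main module (a minimal sketch):

```
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))
    # num_workers > 0 launches worker processes; on Windows each worker
    # re-imports this module, so top-level code must be import-safe.
    loader = DataLoader(dataset, batch_size=10, num_workers=2)
    for batch, labels in loader:
        pass

if __name__ == '__main__':
    main()
```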
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18091

Differential Revision: D15083766

Pulled By: soumith

fbshipit-source-id: 8a7e60defc8a72ec63874f657d7d5267d951dccf
2019-04-26 16:01:30 -07:00
4e6608e86d Revert D15103223: [pytorch][PR] [CUDA 10] Resolve host_define.h warnings
Differential Revision:
D15103223

Original commit changeset: 5b56c4dd9cc4

fbshipit-source-id: f9a8e5ff0ee54cf5bb588896ab26dd9f0fb9ba45
2019-04-26 16:01:27 -07:00
42fbeef5d7 update F.grid_sample doc for clarity (#19754)
Summary:
https://github.com/pytorch/pytorch/issues/19717
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19754

Differential Revision: D15085449

Pulled By: soumith

fbshipit-source-id: 0dda05bd395d58a496bf397ca7f1c50a239b0ed1
2019-04-26 16:01:24 -07:00
dc67d9f3b9 Cleanup documentation (#19584)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19584 [jit] Cleanup documentation**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/19584

Pulled By: driazati

Differential Revision: D15104801

fbshipit-source-id: 87391fd62ee92b615e680469f8bd9a1ac654be7e
2019-04-26 15:43:07 -07:00
75754beca3 Revert D14577575: [pytorch][PR] Fix lack of state init for adagrad and add share_memory flag
Differential Revision:
D14577575

Original commit changeset: 12440079ac96

fbshipit-source-id: 935106385e608471dc280fc61cfedf19d330812d
2019-04-26 15:43:04 -07:00
11297702b9 Fix the install of TensorBoard for doc generation (#19814)
Summary:
One more fix for https://github.com/pytorch/pytorch/pull/19810

We now know that we are running with python3, so no need to check python version. The quotes were probably causing problems here.

cc ezyang soumith zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19814

Differential Revision: D15106459

Pulled By: orionr

fbshipit-source-id: 0443b9b54d17fead9c8c2c9d8d2f373e1f95a28b
2019-04-26 14:56:04 -07:00
be20d65b70 Follow up to adaptive_max_pool3d() port (#19748)
Summary:
This is a follow up PR for #19547.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19748

Differential Revision: D15103230

Pulled By: ezyang

fbshipit-source-id: e7ce925faeadea502f77ed42d52e247c8c6571d8
2019-04-26 14:34:54 -07:00
cb4d41afcd Follow up to adaptive_max_pool2d() port (#19738)
Summary:
This is a follow up PR for #19409.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19738

Differential Revision: D15103231

Pulled By: ezyang

fbshipit-source-id: 11c9fec641b389906b8accd22504a683331fa6ec
2019-04-26 14:30:09 -07:00
2573e695b0 Resolve host_define.h warnings (#19789)
Summary:
Eigen was updated with the commit needed to get rid of this warning that plagued the CI. This PR bumps third_party/eigen to that commit head.
```
warning: #warning "host_defines.h is an internal header file and must not be used directly.  This file will be removed in a future CUDA release.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19789

Differential Revision: D15103223

Pulled By: ezyang

fbshipit-source-id: 5b56c4dd9cc41ff1794570ba2f6abfbe23f6ab68
2019-04-26 13:52:21 -07:00
c5845c4482 Add support for reduce-scatter in c10d (#18844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18844
ghimport-source-id: c6b2f0032c7c2212be2000a9c1f262f63d878a97

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18844 Add support for reduce-scatter in c10d**
* #18820 Refactor ProcessGroupNCCL collective primitives

Reviewed By: mrshenli

Differential Revision: D14768369

fbshipit-source-id: a9def7a0da6e9cd995e982371cc1e22f3df1a156
2019-04-26 13:46:57 -07:00
c9f380df02 Add aten mkldnn linear operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19210

Reviewed By: dzhulgakov

Differential Revision: D14901641

fbshipit-source-id: 8fa68b9941fd93cea0f313a828cba34c5c81ae11
2019-04-26 13:41:57 -07:00
48b81da4cb Add aten mkldnn view operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19209

Reviewed By: dzhulgakov

Differential Revision: D14894545

fbshipit-source-id: 69455184811de1d1444b5d494e4a9d8c83301431
2019-04-26 13:41:54 -07:00
61d5a8dded Add aten mkldnn add operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19207

Reviewed By: dzhulgakov

Differential Revision: D14889477

fbshipit-source-id: 2c5e5ea5dfc26a9c9a172c5fa2c6d7584b167e16
2019-04-26 13:41:51 -07:00
fb53c189b3 Add aten mkldnn batch_norm operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19206

Reviewed By: dzhulgakov

Differential Revision: D14887205

fbshipit-source-id: ea00c9e3205c449d08ab29535309164f951aab95
2019-04-26 13:41:48 -07:00
4864000e55 Add aten mkldnn ops: relu, max_pool2d and avg_pool2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19205

Reviewed By: dzhulgakov

Differential Revision: D14850598

fbshipit-source-id: 5bbd5909c06df9c980de680ffb81bf772766c0ba
2019-04-26 13:41:44 -07:00
3445020ca3 Add aten mkldnn conv2d operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19204

Reviewed By: dzhulgakov

Differential Revision: D14857513

fbshipit-source-id: 1172c9785e5a17a7d7360474551bdc7a511b3f2f
2019-04-26 13:41:41 -07:00
8f1445c406 Add is_mkldnn to at::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19633

Reviewed By: dzhulgakov

Differential Revision: D15053320

fbshipit-source-id: 12b9f85a025a9e957e1b7b3014ba44ae71bfd7a5
2019-04-26 13:41:38 -07:00
236c2b2387 Let script module buffer attributes also cast device/type (#19700)
Summary:
Tested locally that this fixes #19039; did not add a test since there's no way to create a script module in the C++ world.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19700

Differential Revision: D15094195

Pulled By: wanchaol

fbshipit-source-id: fcc2c1e5efbc160d976ae485ba2457442f62f065
2019-04-26 13:06:52 -07:00
5099db08d4 Ignore nn::Functional submodules in nn::Module serialization (#19740)
Summary:
Currently, the Python API doesn't serialize layers that don't have weights (such as `nn.ReLU` and `nn.MaxPool2d`, e.g. in https://github.com/pytorch/vision/blob/master/torchvision/models/densenet.py#L80-L81). If one saves a model that contains weight-less layers in Python and tries to load it into C++, the C++ module loading code (`torch::load(...)`) throws an error complaining that the expected layers are not found in the serialized file (e.g. https://github.com/pytorch/vision/pull/728#issuecomment-480974175). This PR solves the problem by ignoring layers that are not serializable (which currently only includes `nn::Functional`) in the C++ module serialization code (`torch::save(...)` and `torch::load(...)`); the user is expected to wrap weight-less layers in `nn::Functional` so that they are ignored when serializing / deserializing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19740

Differential Revision: D15100575

Pulled By: yf225

fbshipit-source-id: 956481a2355d1de45341585abedda05e35d2ee8b
2019-04-26 12:47:23 -07:00
61d48aa989 Refactor ProcessGroupNCCL collective primitives (#18820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18820
ghimport-source-id: 220b2a3dd9d4d6d2e557e1802851f082c2dc6452

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18820 Refactor ProcessGroupNCCL collective primitives**

Planning to add reduce-scatter, but no room in my stomach for more
copypasta.

Also rewrote the tensor list validation logic.  The existing validation
was ill-suited for all the cases it was being used for; it took a vector
of input tensors and a vector of output tensors, but only ever received
either two references to the same vector, or a bespoke singleton vector
and a vector of outputs (for which it would ignore all but the first
output).  In the first case, it performed unnecessary checks, and in the
second, it skipped necessary ones.

Reviewed By: mrshenli

Differential Revision: D14762369

fbshipit-source-id: dcf882ce1c5854333a9eb4424bfc18d9f4648ddf
2019-04-26 12:38:48 -07:00
e1ebf330d5 Install TensorBoard for doc generation (#19810)
Summary:
In order to have `torch.utils.tensorboard.SummaryWriter` rendered in the documentation at the bottom of https://pytorch.org/docs/master/tensorboard.html we need to have TensorBoard installed.

This change makes it so our pinned version of `tb-nightly` is used for doc generation same as it is used for running tests at https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L45-L52

Eventually we'll use a pinned version of `pip install tensorboard`, but it's not on the release channel yet.

cc kostmo soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19810

Differential Revision: D15101730

Pulled By: orionr

fbshipit-source-id: c41678c4f9ef3d56a168f2b96a1ab05f351bdc56
2019-04-26 12:06:18 -07:00
bacc8815c7 update Anaconda download link (#19794)
Summary:
`https://www.continuum.io/` is now redirected to `https://www.anaconda.com`, and the old Anaconda download link `https://www.continuum.io/downloads` is dead. This PR updates it to `https://www.anaconda.com/distribution/#download-section`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19794

Differential Revision: D15099538

Pulled By: soumith

fbshipit-source-id: 967dcda34d9d446c0d26c0014f10cc710f69a0c5
2019-04-26 09:45:44 -07:00
dafee117e8 Removing unused arg f from _model_to_graph(). (#19647)
Summary:
The input argument `f` of the `_model_to_graph()` method in `torch/onnx/utils.py` is unused. This PR removes it. If there's a reason to keep it around, please let me know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19647

Reviewed By: dzhulgakov

Differential Revision: D15071720

Pulled By: houseroad

fbshipit-source-id: 59e0dd7a4d5ebd64d0e30f274b3892a4d218c496
2019-04-26 09:40:52 -07:00
0d8a3610c5 Multiple module outputs and multiple calls to backward (#19799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19799

A module that returns multiple outputs, and where the caller may end up
making multiple calls to torch.autograd.backward, did not work with
DistributedDataParallel. It expected the first call to
torch.autograd.backward to provide gradients for ALL parameters that
expect gradients and were used in computing the module output. If you
have outputs with disjoint autograd graphs, it is fine to call
torch.autograd.backward on both and fill in the module's parameter
gradients in separate chunks.

With this change we delay queuing the finalizer callback until we have
marked all buckets as ready, instead of queueing it the first time we
receive an autograd hook. This returns the current implementation to
be functionally equivalent to the DistributedDataParallel
implementation before #18953 was merged.
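A sketch of the now-supported pattern (plain module shown for brevity; imagine it wrapped in DistributedDataParallel):

```
import torch

class TwoHeads(torch.nn.Module):
    def __init__(self):
        super(TwoHeads, self).__init__()
        self.a = torch.nn.Linear(4, 1)
        self.b = torch.nn.Linear(4, 1)

    def forward(self, x):
        # The two outputs share no parameters, so their autograd graphs
        # are disjoint.
        return self.a(x), self.b(x)

model = TwoHeads()
out1, out2 = model(torch.randn(8, 4))
# Two separate backward calls fill in the parameter gradients in separate
# chunks; the reducer now finalizes only once all buckets are ready.
out1.sum().backward()
out2.sum().backward()
```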

Reviewed By: mrshenli

Differential Revision: D15097045

fbshipit-source-id: 2df023319713bc31e29a8b45108c78e6593fccd4
2019-04-26 08:20:10 -07:00
dcfb5620df Allow passing lists as trace inputs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19580

Differential Revision: D15034978

fbshipit-source-id: d3bc32ccae1c12104f2bde43fd4700d220bb3ca9
2019-04-26 02:41:57 -07:00
8f0603b128 C++ changes toward libtorch and libcaffe2 unification (#19554)
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate
  caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554

Differential Revision: D15082727

Pulled By: kostmo

fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
2019-04-26 01:38:10 -07:00
9d180e602f More topi support (#19728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19728

Added `Tanh`, `Transpose` and `Mul` support.

Reviewed By: hlu1

Differential Revision: D15078878

fbshipit-source-id: 0a0df6b0d453bc38987b6d744774c127dd6875fe
2019-04-26 00:53:11 -07:00
c182824f69 Update foxi version (#19793)
Summary:
Update foxi to the latest version for group quantization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19793

Reviewed By: jackm321, houseroad

Differential Revision: D15095982

Pulled By: zrphercule

fbshipit-source-id: 0d1cb403cbda47a4fda9035e1712fced60ced283
2019-04-25 22:39:40 -07:00
20c22bcae4 Automatic update of fbcode/onnx to 22662bfd4dcc6baebf29e3b823a051676f991001 (#19790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19790

Previous import was 27d4b617e7097cda7d0d4c45ff2b09d248f33179

Included changes:
- **[22662bfd](https://github.com/onnx/onnx/commit/22662bfd)**: Bump up version number and update Versioning for 1.5.0 release (#1965) <Raymond Yang>
- **[b1a3a8c8](https://github.com/onnx/onnx/commit/b1a3a8c8)**: fix the ci (#1964) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D15095183

fbshipit-source-id: b69cb62685122b83a1493b2702aa6ec950ee15bf
2019-04-25 22:23:25 -07:00
f0d493d290 Add devtoolset 8 (gcc 8) + glibc 2.26 + centos 7.5 rocm docker image (#19767)
Summary:
xw285cornell

Will add py3.6-devtoolset8-glibc2.26-rocmrpm-centos7.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19767

Differential Revision: D15094446

Pulled By: bddppq

fbshipit-source-id: 01a932d893cf4559f98612888308b3ad6900a038
2019-04-25 22:13:20 -07:00
98e312cf96 TensorBoard support within PyTorch (#16196)
Summary:
This PR adds TensorBoard logging support natively within PyTorch. It is based on the tensorboardX code developed by lanpa and relies on changes inside the tensorflow/tensorboard repo landing at https://github.com/tensorflow/tensorboard/pull/2065.

With these changes users can simply `pip install tensorboard; pip install torch` and then log PyTorch data directly to the TensorBoard protobuf format using

```
import torch
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
s1 = torch.rand(1)
writer.add_scalar('data/scalar1', s1[0], 0)
writer.close()
```

Design:
- `EventFileWriter` and `RecordWriter` from tensorboardX now live in tensorflow/tensorboard
- `SummaryWriter` and PyTorch-specific conversion from tensors, nn modules, etc. now live in pytorch/pytorch. We also support Caffe2 blobs and nets.

Action items:
- [x] `from torch.utils.tensorboard import SummaryWriter`
- [x] rename functions
- [x] unittests
- [x] move actual writing function to tensorflow/tensorboard in https://github.com/tensorflow/tensorboard/pull/2065

Review:
- Please review for PyTorch standard formatting, code usage, etc.
- Please verify unittest usage is correct and executing in CI

Any significant changes made here will likely be synced back to github.com/lanpa/tensorboardX/ in the future.

cc orionr, ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16196

Differential Revision: D15062901

Pulled By: orionr

fbshipit-source-id: 3812eb6aa07a2811979c5c7b70810261f9ea169e
2019-04-25 21:30:23 -07:00
97e80ab6fc Always enable autodiff check (#19787)
Summary:
disable_autodiff_subgraph_inlining should always be on so we can check for AD regressions.
Thanks eellison for spotting the test regression!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19787

Differential Revision: D15093104

Pulled By: ailzhang

fbshipit-source-id: 82a75a7dd7097d5f93a2e4074023da2105341c1b
2019-04-25 21:22:30 -07:00
48d5ab54a8 Automatic update of fbcode/foxi to 8f74bc4df3a4cfc69b1a3eadf62aa29d9961c72d AND update Glow AND update C2 (#19792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19792

This diff also contains the contents of D15092641 and D15090411 so as to not let c2, foxi, and glow get out of sync

Previous import was 81e1683d6348eee4b5ed1145222dc2c41be4269c

Included changes:
- **[8f74bc4](https://github.com/houseroad/foxi/commit/8f74bc4)**: Small fixes (#12) <Jack Montgomery>
- **[72097e4](https://github.com/houseroad/foxi/commit/72097e4)**: Add multiple quantization params per tensor (#11) <Jack Montgomery>
- **[b681fe0](https://github.com/houseroad/foxi/commit/b681fe0)**: Merge pull request #10 from jackm321/add_autoinstrument_graph_prop <Jack Montgomery>
- **[a68d835](https://github.com/houseroad/foxi/commit/a68d835)**: Add ONNXIFI_GRAPH_PROPERTY_AUTO_INSTRUMENT_NODES <Jack Montgomery>

Reviewed By: rdzhabarov, zrphercule

Differential Revision: D15086794

fbshipit-source-id: 8df02c62303b580e16a218d6be7791747e3d7213
2019-04-25 21:03:32 -07:00
7a8bc85f47 Profiler: add Self CPU Time Total, CPU time total and other general improvements (#19378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19378

Function profile events are typically nested. In this diff I add a
parent-child relationship to the intervals. This way we can attribute
self time easily. As a result, a user printing a table from a profiler
trace gets self CPU time.

This diff doesn't try to address CUDA self time as CUDA kernels are
already getting special care in the profiler.

There are also some other minor improvements, like reporting total CPU
time spent, reversed sorting, aggregated data after the table, etc.

There is a new unit test which covers more functionality than the
previous profiler test.

Reviewed By: zheng-xq

Differential Revision: D14988612

fbshipit-source-id: 2ee6f64f0a4d0b659c6b23c0510bf13aa46f07dc
2019-04-25 20:53:55 -07:00
6e06154c13 Quantized SumRelu (#19319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19319

Quantized SUM + ReLU (fused). The implementation is the same as the one in DNNLOWP.

Reviewed By: jianyuh

Differential Revision: D14866442

fbshipit-source-id: c8c737a37e35b6ce3c1c2077c07546aba16e0612
2019-04-25 18:01:21 -07:00
76307667ca Use the QTensor with QReLU (#19312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19312

Replaces the tuple hack with the QTensor. Please note this can be landed ONLY after #18960 (D14810261) is landed.

Reviewed By: raghuramank100

Differential Revision: D14819460

fbshipit-source-id: 75ca649304b1619cb3cfe845962c9f226b8f884a
2019-04-25 18:01:17 -07:00
db9008496e Changing the rounding in the QTensor (#19714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19714

We had the rounding in the quantizer set as `round(x/scale) + zp`. To make it consistent, we are converting it to `round(x/scale + zp)`.
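For an integer zero point the two expressions agree except at rounding ties; with round-half-to-even (Python's `round`, used here purely for illustration — the kernel's actual rounding mode is an assumption) the difference shows up like this:

```
x_over_scale, zp = 0.5, 1
old = round(x_over_scale) + zp  # round(0.5) -> 0, so old == 1
new = round(x_over_scale + zp)  # round(1.5) -> 2, so new == 2
print(old, new)  # 1 2
```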

Reviewed By: raghuramank100

Differential Revision: D15077095

fbshipit-source-id: 5d20a90391fe8c2e11b338c05631fcf7770320c3
2019-04-25 18:01:13 -07:00
e814c11045 Fix env vars needed for devtoolset7 binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19780

Differential Revision: D15091963

Pulled By: pjh5

fbshipit-source-id: 2594395b2313d5c8a37db28965d99b0541a227e3
2019-04-25 17:50:14 -07:00
c5cca65351 Fixing update_s3_htmls for binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19746

Differential Revision: D15091326

Pulled By: pjh5

fbshipit-source-id: ed172c678dd5659fa31d5d9b6ee1bf119ede2889
2019-04-25 17:24:02 -07:00
9ef8eb4cbc Fix case for activations attribute in nn.RNN ONNX export. (#19368)
Summary:
This PR addresses the https://github.com/pytorch/pytorch/issues/19366 issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19368

Reviewed By: zrphercule

Differential Revision: D15043949

Pulled By: houseroad

fbshipit-source-id: 9b90410307d31bc5f2fd14aa0cdd33b22572ed7c
2019-04-25 16:31:25 -07:00
a425e1cbf8 Remove duplicate inlineCallToCode (#19724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19724
ghimport-source-id: a68d28ac9bbe62dd61f03bfd9d57f4ef1d0ce9c9

Reviewed By: jamesr66a

Differential Revision: D15078532

Pulled By: zdevito

fbshipit-source-id: bebd34ff6105f538395260b027dc169448b5bc96
2019-04-25 15:53:10 -07:00
330990d878 Serialize first-class version of functions (#19723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19723
ghimport-source-id: 7f7ec6200c3b42d19046a3e228a3d82212697f14

Reviewed By: jamesr66a

Differential Revision: D15078533

Pulled By: zdevito

fbshipit-source-id: fe421afab9607ee942f6d200f04bb6335fc0aa97
2019-04-25 15:53:07 -07:00
6cb1b994d8 Trace directly into first-class module form. (#19722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19722
ghimport-source-id: b024666feccb324f5ba9aae4a6301723e04d9846

Reviewed By: jamesr66a

Differential Revision: D15078535

Pulled By: zdevito

fbshipit-source-id: b866b31c1864a090c545560cbecee81e34ad2d16
2019-04-25 15:53:03 -07:00
31524bda1f @torch.jit.script(fn) now is a torch.jit.Function (#19721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19721
ghimport-source-id: b4f5024adc845a82dc5197d19aab1496bf85089f

Reviewed By: jamesr66a

Differential Revision: D15078534

Pulled By: zdevito

fbshipit-source-id: 408d3a871302c5ac5d6426dc5de567f2188ebf4c
2019-04-25 15:53:00 -07:00
12f7c2dea3 pybind CompilationUnit and Function directly (#19720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19720
ghimport-source-id: c5829234dbbe8f7fe719ffce3fa92ce5198ffd21

Reviewed By: jamesr66a

Differential Revision: D15078536

Pulled By: zdevito

fbshipit-source-id: e617de31fc907a408fb50e18d9358dfd64de1f9e
2019-04-25 15:52:57 -07:00
bf5a5c2a31 caffe2 | Use _aligned_free in WorkerPool destruction (#19751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19751

This has probably never been tested on Windows, but destruction of WorkersPool crashes because it uses _aligned_malloc to allocate and plain free to deallocate, which is not symmetric. The fix is to use _aligned_free for deallocation.

Reviewed By: hlu1

Differential Revision: D15083472

fbshipit-source-id: 42243fce8f2dfea7554b52e6b289d9fea81d7681
2019-04-25 14:54:50 -07:00
65496e4e67 Bug fix in bound shape inferencer (#19729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19729

Accessing dims() without a boundary check is not good.

Reviewed By: zrphercule

Differential Revision: D15078912

fbshipit-source-id: 3746d0c18261abeec0c4880c30430125928c3309
2019-04-25 14:50:19 -07:00
556c8a300b Fall back to asking nvcc for detecting cuda version if no *cudaart* is found (#19741)
Summary:
This happens on Debian/Ubuntu with distribution-provided cuda repackaging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19741

Differential Revision: D15082550

Pulled By: soumith

fbshipit-source-id: 2ca39c6cdc9305896529b6fd537270116223cd6c
2019-04-25 10:54:20 -07:00
5025d1d5e4 Automatic update of fbcode/onnx to 27d4b617e7097cda7d0d4c45ff2b09d248f33179 (#19718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19718

Previous import was 0e8d2bc5e51455c70ef790b9f65aa632ed9bc8a7

Included changes:
- **[27d4b617](https://github.com/onnx/onnx/commit/27d4b617)**: Adding RoIAlign operator (#1869) <Sam Pepose>
- **[70c9026c](https://github.com/onnx/onnx/commit/70c9026c)**: add ReverseSequence op (#1927) <Guoliang Hua>
- **[ed2db02a](https://github.com/onnx/onnx/commit/ed2db02a)**: README.md: Update badge style for build status (#1942) <Yulong Wang>
- **[e36d3b54](https://github.com/onnx/onnx/commit/e36d3b54)**: Enable python 3.7 in CI for Windows (#1943) <Raymond Yang>

Differential Revision: D15077516

fbshipit-source-id: c8c6935381ff5a96ab9a4ee519685814f4ea6e59
2019-04-25 10:54:15 -07:00
bbedadddce Fix Circle CI for ONNX repo (#19725)
Summary:
The new pip package is more restrictive. We need to add an extra flag to make the installation work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19725

Differential Revision: D15078698

Pulled By: houseroad

fbshipit-source-id: bbd782a0c913b5a1db3e9333de1ca7d88dc312f1
2019-04-25 10:48:41 -07:00
0effe1d4a4 Make interpolate bicubic match opencv result (#19703)
Summary:
Fixes #19650
When driazati started the bicubic implementation we used the TF result as ground truth. It turns out the OpenCV version of bicubic resize is used more commonly.
This PR does two things (a usage sketch follows the list):
- Fixes a bug where we didn't use area mode to compute the source index
- Follows the OpenCV logic to handle computed negative source indices (we used to clamp them at 0)
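For reference, a call that exercises the fixed path (a minimal sketch):

```
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
# After this fix, bicubic upsampling is expected to match OpenCV's
# INTER_CUBIC resize up to floating-point tolerance.
y = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
print(y.shape)  # torch.Size([1, 1, 8, 8])
```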
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19703

Differential Revision: D15078159

Pulled By: ailzhang

fbshipit-source-id: 06a32baf2fbc93b90a156b863b4f9fab326d3242
2019-04-25 10:21:31 -07:00
29d8711ef0 Fix compilation on Windows 10 (CUDA 10.0, Visual Studio 2017) (#19615)
Summary:
I want to use libtorch in a C++/CUDA project but as soon as I include `<torch/torch.h>`, ".cu" files fail to compile:

`torch/csrc/jit/script/tree.h(64): error C3520: 'args': parameter pack must be expanded in this context`

This PR makes it build on my machine (don't know if it breaks anything though).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19615

Differential Revision: D15063712

Pulled By: ezyang

fbshipit-source-id: 7561e705f8f5b42b8e6a23430710b36508fee1ee
2019-04-25 09:37:17 -07:00
af06d6342c Add SGDR(Stochastic Gradient Descent with Warm Restarts) scheduler (#17226)
Summary:
Because of a merge error with master in #15042, opening a new PR for ezyang.
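A usage sketch, assuming the scheduler is exposed as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` (the class name used for SGDR in later releases):

```
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# T_0: epochs until the first restart; T_mult: period multiplier after
# each restart (10, then 20, then 40, ... epochs).
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2)

for epoch in range(30):
    optimizer.step()
    scheduler.step()
```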
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17226

Differential Revision: D14418145

Pulled By: mrshenli

fbshipit-source-id: 099ba225b28e6aba71760b81b2153ad1c40fbaae
2019-04-25 09:26:31 -07:00
465799fab3 Replace cpu_apply with TensorIterator inside of Copy function (#18618)
Summary:
Replace the cpu_apply functions with TensorIterator.
Vectorize the copy and clone functions.
Move big pieces of the code to the cpu kernels folder to be able to use AVX2.
Add a fast path for the copy_ function when tensor types match.

A slowdown is observed on smaller tensors (up to 10%, or about 1us per op), which might be explained by the bigger CPU footprint of TensorIterator compared to the simpler cpu_apply. Conversely, on bigger tensors we see a 2x-3x performance improvement (single threaded; multithreading gives an even bigger boost).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18618

Differential Revision: D14954118

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d9bdf3fd9d5e539a03071cced50d0a47bac1615
2019-04-25 08:09:14 -07:00
6c7135decb fix typo: pytoch -> pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19719

Differential Revision: D15080095

Pulled By: ezyang

fbshipit-source-id: b731a0fde87d25c63c1e3d4b9a9c2244e5ad84af
2019-04-25 06:40:40 -07:00
3875e1ba45 try to make at::cat in mm_tree_reduction operate on contig tensors (#18816)
Summary:
Sometimes at::cat gets transposed inputs and goes down a slow path. Also, make the jit_premul LSTM benchmark add bias to the whole input tensor, to avoid separate reduction kernels in the backward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18816

Differential Revision: D15013576

Pulled By: wanchaol

fbshipit-source-id: bcfa1cf44180b11b05b0f55f034707012f66281a
2019-04-24 23:44:25 -07:00
c571969148 Fix the insert_guard for norm decomposation (#19646)
Summary:
Move the insert_guard all the way up to the beginning of the decomposition. This fixes the case where we lose the insert_point context after decomposeCommonNormalization but still need to modify the graph.

fixes #19502
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19646

Differential Revision: D15058040

Pulled By: wanchaol

fbshipit-source-id: ebdbf8623ebfe4556c461e1b650e94b905791adb
2019-04-24 23:12:37 -07:00
3c81eb3aa7 add max_pool2d to AD, add tests for both autodiff and inference mode
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19661

Differential Revision: D15074431

Pulled By: wanchaol

fbshipit-source-id: b31cf2126c2c5d6a12c2ef5dc67b57677652f1fc
2019-04-24 22:19:51 -07:00
cbd0a2d3c9 Fix the depthwise 3x3x3 fast path criteria for the stride (#19692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19692

Remove the requirement on stride for the optimized depthwise 3x3x3 kernels.

Reviewed By: jspark1105

Differential Revision: D15070214

fbshipit-source-id: 9fe2d8e96930166e4eb0e2dd2288f6a0c4831e0a
2019-04-24 21:35:27 -07:00
614871d948 use relative path to load libthnvrtc (#19690)
Summary:
We had a few hard-to-repro cases where, very occasionally, libthnvrtc failed to be loaded due to what looked like a garbled dladdr return in the `info.dli_fname` field. We could not root-cause why this was happening, but this workaround avoids the problem altogether. $ORIGIN is already added to RPATH as the first search location, so dlopen("libthnvrtc.so") will look for libthnvrtc in the caller's (`libtorch.so.1`) directory, which was the purpose of the previous code that obtained the `libtorch.so.1` directory using dladdr.
```
root@4ec0aab027a0:/opt/conda/lib/python3.6/site-packages/torch/lib# readelf -d ./libtorch.so.1 | grep RPATH
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN:/usr/local/cuda/lib64:/opt/conda/lib]
```
Hopefully the same happens on Mac.
cc zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19690

Differential Revision: D15076990

Pulled By: soumith

fbshipit-source-id: a4d2992ccf26953f1fc73f17c4e752d69c58e2fc
2019-04-24 21:09:31 -07:00
72b8b6c374 Change some comments related to moving copy_ to native (#19618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19618
ghimport-source-id: 6bb9965f2f7b72f602f03e27b664d7d7696edd00

Differential Revision: D15048632

Pulled By: li-roy

fbshipit-source-id: a2707e3086f3a9993780a7f76104c5f00f2a9618
2019-04-24 19:23:06 -07:00
17e4cd0c0a Remove old complex Types (#19616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19616
ghimport-source-id: d8b3e15d84d3e6f810af3cb83d1413c5f048bcdc

Differential Revision: D15047741

Pulled By: li-roy

fbshipit-source-id: 572045f88f410d97f60c56298018bfee6268b375
2019-04-24 19:18:16 -07:00
6fead42eb8 Remove function variant of copy_ (#19622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19622
ghimport-source-id: 41eadd845e2cbd113735b8650dca39354e1c7f1d

Differential Revision: D15049274

Pulled By: li-roy

fbshipit-source-id: 4f7d7cb2c8339e5e7e35f95397fe6a3f4b7c74f3
2019-04-24 17:57:18 -07:00
a6811e17c0 Restore copy_ overload with async arg (#19641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19641
ghimport-source-id: 7099221334505bacdc209cff8bf29e3004c30379

Differential Revision: D15056755

Pulled By: li-roy

fbshipit-source-id: e9063b606e72a70fc1270fbcdcf1c0b23d876dd3
2019-04-24 17:51:50 -07:00
c08f3d06c3 Add some of nn.init to weak script (#19640)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19640 [jit] Add some of nn.init to weak script**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/19640

Pulled By: driazati

Differential Revision: D15065332

fbshipit-source-id: 30df9f02e527cd5e5ebe34b7e003444eae96c66d
2019-04-24 17:00:48 -07:00
9aa0e6078f Support serializing std::vector<torch::Tensor> (#19677)
Summary:
In the distributed training development work, we need to be able to serialize a `std::vector` of `torch::Tensor`s. This PR adds support for serializing `std::vector<torch::Tensor>`.

cc. mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19677

Differential Revision: D15069860

Pulled By: yf225

fbshipit-source-id: 505147e5f5fea78be1bf60fb8418bc187dbc2a98
2019-04-24 16:50:16 -07:00
32174bedb8 Fix fuser tests on sandcastle (#19684)
Summary:
suo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19684

Reviewed By: suo

Differential Revision: D15067942

Pulled By: jamesr66a

fbshipit-source-id: 697a836ea37dab78fffd092194cecd8294ca9907
2019-04-24 16:50:13 -07:00
3d6e956412 Add LONG_BINPUT to unpickler (#19696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19696
ghimport-source-id: 8d711cd3ed2b2810b5b3d765564429882f96d1f1

Differential Revision: D15072658

Pulled By: driazati

fbshipit-source-id: a28a90218874e07cfbed4f8df3d6d23ae5e70933
2019-04-24 16:39:23 -07:00
62447a5aa3 improve err msg (#19645)
Summary:
Print out the tensor value when throwing the "cannot insert tensor with grad" error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19645

Differential Revision: D15057809

Pulled By: eellison

fbshipit-source-id: 3f622ef1322a75c965e780275f1fb447e9acf38d
2019-04-24 16:22:07 -07:00
6ec55c13a9 Enable assignment for QTensor in pytorch frontend (#19676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19676

Make copy work with QTensor, enabling assignment of QTensor in the PyTorch frontend.

Differential Revision: D15064710

fbshipit-source-id: 04f2dc02a825695d41fa1114bfca49e92108fef3
2019-04-24 16:05:34 -07:00
4a65ee95cc Make torch.equal work with boolean CPU tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19604

Differential Revision: D15056022

Pulled By: li-roy

fbshipit-source-id: 1309b107b2d4ee0a490bce1b43c3c175180a1580
2019-04-24 15:51:10 -07:00
d14abe3aff Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688)
Summary:
Porting `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)`
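A sketch using the signature quoted above (`data.bin` is a hypothetical file that must already contain `size` elements of the requested dtype):

```
import torch

# Prepare a raw binary file holding 100 int32 values.
torch.arange(100, dtype=torch.int32).numpy().tofile('data.bin')

# shared=True memory-maps the file; shared=False reads a private copy.
t = torch.from_file('data.bin', shared=False, size=100, dtype=torch.int)
print(t[:5])  # tensor([0, 1, 2, 3, 4], dtype=torch.int32)
```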
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688

Differential Revision: D15012644

Pulled By: VitalyFedyunin

fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd
2019-04-24 15:38:56 -07:00
d247912dbf Add no-gpu build mode for all of PyTorch and Caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19687

Differential Revision: D15023347

fbshipit-source-id: 5bed0d72e8ff337e066c142ca5c8e2c2bae93746
2019-04-24 13:27:59 -07:00
c855e04d5f Caffe2 shouldn't fail if CUDA peer access is already enabled
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19586

Differential Revision: D15061544

Pulled By: dzhulgakov

fbshipit-source-id: 6a5f9f4fe45259d689671f58ad5206cdaf15c5bd
2019-04-24 13:22:27 -07:00
960513006f Support exporting squeeze & unsqueeze with negative dim attribute
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19297

Reviewed By: zrphercule

Differential Revision: D14953525

Pulled By: houseroad

fbshipit-source-id: 8d7eecd2804b8e27d3ee4ad6e763352818d02d0c
2019-04-24 12:45:59 -07:00
b675f07bb6 Remove useless input shape checker in conv (#19608)
Summary:
The input shape checkers in the conv/int8_conv operators aim to avoid the issue that, when running with MKL-DNN Winograd, the weights have to be reordered each time the input shape changes.
However, the checkers result in a big performance regression due to frequent reorders.

Meanwhile, in mkldnn-bridge, this case has already been fixed by correcting the prop_kind.
Therefore, we remove the useless checker to fix the performance regression.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19608

Differential Revision: D15061169

Pulled By: yinghai

fbshipit-source-id: 649a43ae6fce989e84939210f6dffb143ec3d350
2019-04-24 11:39:43 -07:00
87a6974193 Make it possible for self.forward to return a ScriptMethod (#19217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19217
ghimport-source-id: 6fdd7f5ac041dae950b47ca316f30682ede0b083

Reviewed By: suo

Differential Revision: D14922120

Pulled By: zdevito

fbshipit-source-id: 5e82e5d7ee72df6f401146d2519c80ea336ff40e
2019-04-24 11:14:34 -07:00
2f73b3d26e Add if ops support for onnxifi and ssa-rewrite (#19585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19585

Originally we would unroll every If op into many different subnets.
Now we no longer unroll them; instead we add all external inputs of an If op's subnets to the If op itself, and ssa-rewrite all external inputs/outputs. That is enough.

Reviewed By: yinghai

Differential Revision: D15038139

fbshipit-source-id: 8532216d8749068acd5558ad0d8cb1d98463a063
2019-04-24 11:01:13 -07:00
41486306d9 GCC ABI variants for nightly builds (#18888)
Summary:
closes #17492
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18888

Differential Revision: D15065093

Pulled By: pjh5

fbshipit-source-id: 6abeabf68b91106fc8ae9df238f6a40613d40b57
2019-04-24 10:08:56 -07:00
5e62ee2b97 Fix no SIGCHLD checking in DataLoaderIter._shutdown_workers (#19421)
Summary:
Also:

1. Bump the multiprocessing test timeout, following the Python core tests.
2. Fix one type of flakiness in `test_proper_exit`.
3. Add trace reporting via `faulthandler` for when the loader process hangs in `test_proper_exit`.
4. Give `test_proper_exit` another try.

I'll heavily retest this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19421

Differential Revision: D15063728

Pulled By: ezyang

fbshipit-source-id: 4e0d992622e11053c44a9ec237b88b9a28a4472c
2019-04-24 08:06:58 -07:00
c42f3f9055 Revert D15008160: Enable assignment for QTensor in pytorch frontend
Differential Revision:
D15008160

Original commit changeset: 5f1166246d76

fbshipit-source-id: 24c7350431ae6a87199d6e3f7ffbbc8ec7d3c28b
2019-04-24 06:58:13 -07:00
84b275b70f fix rocm test (#19663)
Summary:
For some reason, `exec` in Python fails only on the ROCm build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19663

Differential Revision: D15061382

Pulled By: eellison

fbshipit-source-id: d6e1776e88c22de973796e5080147e6d31aba477
2019-04-24 00:48:27 -07:00
8273b9b3cb Enforce consistent dict iteration order for trace inputs. (#19528)
Summary:
Stack:
* **#19528 [pytorch] Enforce consistent dict iteration order for trace inputs.** (https://our.intern.facebook.com/intern/diff/D15023656/)

Don't iterate down unordered_maps and expect ordering. Should fix test flakiness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19528

Differential Revision: D15023656

Pulled By: efaust

fbshipit-source-id: 91c9a31a8652fcf93ae0e942bea4cec67bb490c9
2019-04-23 23:36:48 -07:00
309c15e2df Enable assignment for QTensor in pytorch frontend (#19530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19530
Make copy work with QTensor, enabling assignment of QTensor in the PyTorch frontend.

Differential Revision: D15008160

fbshipit-source-id: 5f1166246d768b23f009cde1fa03e8952368a332
2019-04-23 21:29:31 -07:00
d902774cad Dont introduce aliasing in CSE or Constant Pooling (#19576)
Summary:
We can't introduce aliasing to a graph output, since it may be mutated afterward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19576

Differential Revision: D15057734

Pulled By: eellison

fbshipit-source-id: 33594c05d985a0c58edebd6252e1ee2c0efb6f0e
2019-04-23 20:39:09 -07:00
ba1cf38718 Remove QTensor alias (#19635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19635

att

Differential Revision: D15053349

fbshipit-source-id: 7cd0e6c9ff567d05b051527410f452b059458af2
2019-04-23 20:34:11 -07:00
8b798f43e3 Commit explicit libtorch_python sources (#19607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19607

Explicit is better than implicit - it's pretty hard to debug where a particular file is if it's not greppable.

As a follow-up step, we should look at whether we can just include build_variables.py in CMake directly to share the setup between the two build systems.

Reviewed By: ezyang

Differential Revision: D15023348

fbshipit-source-id: 600ef2d1871bc28530c6a02681b284f7499904df
2019-04-23 19:49:42 -07:00
5119cc7cdf builtin ivalues sort (#19572)
Summary:
Add sorting to all the lists which we specialize on (Tensor, int, float, bool).

First part of https://github.com/pytorch/pytorch/issues/19372
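A sketch of what this enables in TorchScript (1.1-era type comments; the in-place `sort` on a specialized `List[int]`):

```
import torch
from typing import List

@torch.jit.script
def sort_ints(xs):
    # type: (List[int]) -> List[int]
    xs.sort()
    return xs

print(sort_ints([3, 1, 2]))  # [1, 2, 3]
```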
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19572

Differential Revision: D15052677

Pulled By: eellison

fbshipit-source-id: 301e8e0e3e29e04aca1311410db0a474fd833cff
2019-04-23 16:38:08 -07:00
80020b3d2d Guard {set,rebase}_history on grad_fn check (#19623)
Summary:
We would previously have statements like

```
set_history(flatten_tensor_args( result ), grad_fn);
```

Internally, {set,rebase}_history would check grad_fn and short circuit if it is nullptr. However, this means that we are executing the expression `flatten_tensor_args( result )` and immediately throwing away the results. This was causing unnecessary allocations + overhead.

My JIT overhead benchmark script (with custom benchmark method):

```
import torch, time

@torch.jit.script
def add(x, y):
    return x + y

a = torch.rand([])
b = torch.rand([])

niter = 1000000

with torch.no_grad():
    s = time.time()
    add.__getattr__('forward').benchmark(niter, a, b)
    e = time.time() - s
    print('overhead per call (us)', e / niter * 1e6)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19623

Differential Revision: D15053399

Pulled By: jamesr66a

fbshipit-source-id: 8777e1a2b5c5a5bbd3a035b7247c8154c5fc4aa6
2019-04-23 15:40:11 -07:00
fb9fc42a0c optimize BatchMatmulOp (#18612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18612

optimize BatchMatmulOp

Reviewed By: houseroad

Differential Revision: D14681665

fbshipit-source-id: cf5ea4909ace58fd44fe6fa634531102ac84e851
2019-04-23 15:34:59 -07:00
176bdc0722 fix lint (#19632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19632

at

Differential Revision: D15052952

fbshipit-source-id: 7c38fad99799e5ac914685c36eadf932afe52b74
2019-04-23 15:29:38 -07:00
9b272affde Add base support to torch.logspace, default base=10 (#19542)
Summary:
Add base support for torch.logspace. See #19220 for details.
SsnL can you give feedback? Thanks a lot.
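A quick worked example: the returned points are base ** e for evenly spaced exponents e between start and end:

```
import torch

# Exponents 0, 1, 2 with base 2 -> 2**0, 2**1, 2**2
print(torch.logspace(0, 2, steps=3, base=2))  # tensor([1., 2., 4.])
# The default base remains 10.
print(torch.logspace(0, 2, steps=3))          # tensor([  1.,  10., 100.])
```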
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19542

Differential Revision: D15028484

Pulled By: soumith

fbshipit-source-id: fe5a58a203b279103abbc192c754c25d5031498e
2019-04-23 15:06:34 -07:00
96b966297e disable flake8 E302 (two blank lines) (#19634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19634
ghimport-source-id: 68b11ac3c19daf8df3bbf11e6181e9450899e90a

Differential Revision: D15053466

Pulled By: suo

fbshipit-source-id: 09d7859aa2059fc9eb3b47fa62467537bab40e05
2019-04-23 15:06:31 -07:00
6b8771a7a6 fix nn.Sequential doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19597

Differential Revision: D15042383

Pulled By: soumith

fbshipit-source-id: f912ed2a726a17fcc25795ff66b73ae4caacd247
2019-04-23 14:58:16 -07:00
70b82d28b8 caffe2 | Windows compat fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19531

Reviewed By: hlu1

Differential Revision: D15024541

fbshipit-source-id: cd8249a6d529afb65fa8afd74a05dbfe73eb1fb0
2019-04-23 14:30:19 -07:00
2e048feb9e Remove fixed TODO (#19590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19590

-

Reviewed By: ezyang

Differential Revision: D15039561

fbshipit-source-id: 246cf4fa91a33cb4c96750b534b8c3d0c312f311
2019-04-23 13:50:22 -07:00
55e53d3d7e correct comments in group_norm_op (#19621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19621

The comments for group_norm_op are not accurate (i.e., the math part); this diff fixes them.

Reviewed By: BIT-silence

Differential Revision: D15048695

fbshipit-source-id: 27d41d3ae21054257967815254134849944d56ca
2019-04-23 13:31:15 -07:00
5f82d59c0a Simplify argument test cases (#19593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19593

Removes a lot of duplication

Reviewed By: dzhulgakov

Differential Revision: D15039887

fbshipit-source-id: e90fe024b84220dd337fdd314d8f7e3620baec28
2019-04-23 12:58:35 -07:00
fddd763ec1 Add test cases for optional of list (#19592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19592

This is already supported but wasn't tested yet

Reviewed By: ezyang

Differential Revision: D15039888

fbshipit-source-id: dc8ea724c76dd1719b1d4810a20c8f958e5beecc
2019-04-23 12:58:32 -07:00
fc8834df4b Port adaptive_max_pool3d() to ATen (#19547)
Summary:
This is the second part of #18064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19547

Differential Revision: D15046630

Pulled By: ezyang

fbshipit-source-id: 03f80602b94d47bca66bfd0dcab1b7bb99e5b7f1
2019-04-23 12:51:25 -07:00
0922a64d22 add torch.tensor requires grad (#19445)
Summary:
Add support for setting requires_grad = True on torch.tensor within TorchScript.

Within constant propagation, we can't insert any constants that require grad.

Also added shape analysis and requires_grad analysis to torch.tensor.
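A sketch of the newly supported form inside TorchScript:

```
import torch

@torch.jit.script
def make_leaf():
    # requires_grad=True is now honored in script; constant propagation
    # will not fold this tensor into a constant.
    return torch.tensor([1.0, 2.0], requires_grad=True)

print(make_leaf().requires_grad)  # True
```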
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19445

Differential Revision: D15046211

Pulled By: eellison

fbshipit-source-id: b4ef7a6b4b6b8dc03e1fa49f87dc415874cd1998
2019-04-23 12:27:52 -07:00
4e8cc8ee90 Surface the Glow traces to C2 (#19087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19087

att

Reviewed By: jackm321

Differential Revision: D14863112

fbshipit-source-id: 2680161b9f05391e73bb8dac4fbbeabb87a82c05
2019-04-23 12:27:49 -07:00
444f792fa6 Fix lack of state init for adagrad and add share_memory flag (#17679)
Summary:
The current code initializes the `state` in the `__init__` method, but this initialization is not invoked from `add_param_group`.

I followed the same approach as the other Optimizers to init the `state`.

```python
import torch

emb = torch.nn.Embedding(10,10)
emb2 = torch.nn.Embedding(10,10)

optim = torch.optim.Adagrad(emb.parameters())
print(optim.state[emb.weight])  # already initialized

optim.add_param_group({'params': emb2.parameters()})
print(optim.state[emb2.weight])  # empty dict

loss = emb2.weight.sum() + emb.weight.sum()
loss.backward()
optim.step()  # raised KeyError
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17679

Differential Revision: D14577575

Pulled By: ezyang

fbshipit-source-id: 12440079ac964b9eedad48e393d47f558babe300
2019-04-23 12:22:19 -07:00
0d0acba3bd Allow extracting element-wise loss in softmax (#19579)
Summary:
Oftentimes we want to experiment with the loss per element (image, etc.). This changeset allows getting the per-element loss as well. This output is optional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19579

Reviewed By: jerryzh168

Differential Revision: D15035797

Pulled By: prigoyal

fbshipit-source-id: 562dea514f49c1f2f1cbbc083a1938dc019a75c4
2019-04-23 11:49:49 -07:00
e9c8f372c4 dispatch max_pools with no indices, expose max_pools to torch namespace (#19449)
Summary:
In the functional interfaces we do boolean dispatch, but always to max_pool\*d_with_indices. This changes them to emit the max_pool\*d op instead when possible, since it's not necessary to expose the with_indices ops to different backends (for the JIT).

It also binds max_pool\*d into the torch namespace, matching the behavior of avg_pool\*d.
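A sketch of the resulting dispatch (the equality check is just to show both spellings hit the same indices-free kernel):

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# With return_indices=False this now emits max_pool2d rather than
# max_pool2d_with_indices.
y = F.max_pool2d(x, kernel_size=2)
z = torch.max_pool2d(x, kernel_size=2)  # newly bound in the torch namespace
print(torch.equal(y, z))  # True
```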
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19449

Differential Revision: D15016839

Pulled By: wanchaol

fbshipit-source-id: f77cd5f0bcd6d8534c1296d89b061023a8288a2c
2019-04-23 11:20:05 -07:00
f3be2816ae Adds fakeQuantizePerTensorAffineOp to pytorch (#19387)
Summary:
Adding a fakequant op so that we can use it in PyTorch models; the exact implementation might change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/19387

Differential Revision: D13739657

fbshipit-source-id: d5cb084e843d236bb1da9827ac1ba3900ed99786
2019-04-23 11:12:53 -07:00
1b3967b491 -fno-math-errno -fno-trapping-math (#19552)
Summary:
As suggested in https://github.com/pytorch/pytorch/pull/19152#discussion_r275925767, this may give the compiler more opportunities for auto-vectorization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19552

Differential Revision: D15048358

Pulled By: jamesr66a

fbshipit-source-id: db2c2c515c3e9f7d22305c039ab0c8a867fc43a2
2019-04-23 11:06:49 -07:00
d8729efabe Only require python print on certain namespaces (#19383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19383
ghimport-source-id: b93c7849a52d11ecbf26b614704740d44a2447f9

Differential Revision: D15032727

Pulled By: bwasti

fbshipit-source-id: a19f72abb99e63d87eab13022538f325b2e20526
2019-04-23 10:52:48 -07:00
3cc60e54e3 Use fbgemm for quantize/dequantize ops (#19500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19500

Changes the `quantize_linear` and `dequantize` to `fbgemm`-based implementation.

Reviewed By: jianyuh, jerryzh168

Differential Revision: D15014561

fbshipit-source-id: b651e69d336b5b08b4a75a4a4eddf46c040a4934
2019-04-23 10:30:12 -07:00
714344a976 Specify to use Float16UniformFill if necessary in sparse lookup layer (#18499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18499

If the init op is not fp16 compatible, it should throw.
However, in the special case where the original init op is UniformFill,
we replace it with Float16UniformFill.

Reviewed By: kennyhorror

Differential Revision: D14627209

fbshipit-source-id: eb427772874a732ca8b3a25d06670d119ce8ac14
2019-04-23 10:14:08 -07:00
e3f1504621 Fix the Division by Zero Bug of CosineAnnealingLR (#19180)
Summary:
Added the formula for the corner case. Updated unit tests.

Fixes #17913
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19180

Differential Revision: D14942023

Pulled By: ezyang

fbshipit-source-id: 167c109b97a7830d5b24541dc91e4788d531feec
2019-04-23 09:54:28 -07:00
7a4189696f Fix the documentation for BCEWithLogitsLoss (#17218, #16804) (#19212)
Summary:
I fixed a mistake in the explanation of the `pos_weight` argument in `BCEWithLogitsLoss` and added an example.
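For reference, a small usage sketch of `pos_weight` (shapes and values are illustrative):

```python
import torch
import torch.nn as nn

target = torch.ones(10, 3)         # 10 samples, 3 classes
logits = torch.full((10, 3), 0.5)  # raw, unnormalized scores
# one weight per class; positives in every class are weighted 2x
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.ones(3) * 2)
loss = criterion(logits, target)
```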
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19212

Differential Revision: D14923431

Pulled By: ezyang

fbshipit-source-id: 15696c67d56789102ac72afbe9bdd7b667eae5a0
2019-04-23 09:54:24 -07:00
bb05f70724 fix the docstring of RandomSampler (#19113)
Summary:
Fixes:
- the order of `Arguments` in the `RandomSampler` doc
- the meaningless check of `replacement`'s type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19113

Differential Revision: D15013081

Pulled By: ezyang

fbshipit-source-id: 39e367f42841de6814b1214eb9df7b75f14f747e
2019-04-23 09:54:20 -07:00
83cf9473dc Avoid (future) cusparse name collision (#19591)
Summary:
A future version of cusparse will define "cusparseGetErrorString." This PR simply updates PyTorch's name for this function to "getCusparseErrorString" to avoid the collision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19591

Differential Revision: D15046871

Pulled By: ezyang

fbshipit-source-id: 821304f75fe84c68a26680a93809a18cfdbd540b
2019-04-23 09:40:15 -07:00
f767c9ac76 Add docs and test guaranteeing indices from torch.nonzero ordered C-style (#19539)
Summary:
See #17556.
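A small example of the now-documented guarantee:

```python
import torch

x = torch.tensor([[0, 1], [2, 0]])
idx = torch.nonzero(x)
# indices come back sorted C-style (row-major / lexicographic):
# tensor([[0, 1],
#         [1, 0]])
```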
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19539

Differential Revision: D15030151

Pulled By: ezyang

fbshipit-source-id: d46ee56a66d89b0113f86e3f8693dc1680d0adb9
2019-04-23 09:29:21 -07:00
3b4d4ef503 Remove unnecessary printing from tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19606

Differential Revision: D15046583

Pulled By: ezyang

fbshipit-source-id: ea9bb691d23855e7eddbabe68bf112a726641ba4
2019-04-23 09:24:08 -07:00
36084908e4 Fix lr_scheduler's last_epoch value at the time of initialization (BC BREAKING!) (#7889)
Summary:
Hello everyone :) !!

I've found that lr_scheduler was initialized with last_epoch as -1.
This means that even after the first step (not the one in init, but an explicit step of the scheduler),
the learning rate of the scheduler's optimizer remains at its previous value.
```python
>>> import torch
>>> cc = torch.nn.Conv2d(10,10,3)
>>> myinitial_lr = 0.1
>>> myoptimizer = torch.optim.Adam(cc.parameters(), lr=myinitial_lr)
>>> mylrdecay = 0.5
>>> myscheduler = torch.optim.lr_scheduler.ExponentialLR(myoptimizer,mylrdecay)

>>> myscheduler.get_lr()
[0.2]    # because get_lr computes lr as 0.1 * 0.5^-1
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1    # not consistent with the get_lr value
>>> myscheduler.last_epoch
-1

>>> myscheduler.step()
>>> myscheduler.get_lr()
[0.1]    # this should be the value right after init, not after the first step
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1    # since this is after the first step, it should have decayed to 0.05
>>> myscheduler.last_epoch
0

>>> myscheduler.step()
>>> myscheduler.last_epoch
1
>>> myscheduler.get_lr()
[0.05]
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.05
>>> myscheduler.last_epoch
1
```

The first problem is that, even right after initializing the lr_scheduler, you get inconsistent parameter values.

The second problem is that you are stuck with the same learning rate for the first 2 epochs if the scheduler's step function is not called at the beginning of the epoch loop.
Of course, you can avoid this by calling the lr_scheduler's step at the beginning,
but I don't think that is proper use since, in the case of the optimizer, step is called at the end of the iteration loop.

I've simply avoided all of the above issues by setting last_epoch to 0 after initialization.

This also makes sense when you init with some value of last_epoch other than -1.
For example, if you want to init with last epoch 10,
the lr should not be decayed one step further, which is what happens in the previous code
because last_epoch gets +1 before evaluating
base_lr * self.gamma ** self.last_epoch

Instead, it should be set to the exact value for step 10.

I hope this fix finds its way in with all your help :)
I'm really looking forward to, and excited about, becoming a contributor to pytorch!
Pytorch Rocks!!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7889

Differential Revision: D15012769

Pulled By: ezyang

fbshipit-source-id: 258fc3009ea7b7390a3cf2e8a3682eafb506b08b
2019-04-23 08:54:09 -07:00
f9c4ce781f Removes variable which is assigned but not used (#19194)
Summary:
n was set as self.in_channels, but not used within the scope of the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19194

Differential Revision: D14937764

Pulled By: ezyang

fbshipit-source-id: 55cb599109309503fee897f77d798fd454fcc02d
2019-04-23 08:48:03 -07:00
dce3d74dfb add torch.cuda.synchronize(device=None) (#19573)
Summary:
fixes https://github.com/pytorch/pytorch/issues/19509
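A usage sketch of the new signature (assuming a machine with at least two GPUs):

```python
import torch

torch.cuda.synchronize()                        # current device, as before
torch.cuda.synchronize(torch.device('cuda:1'))  # a specific device, new here
```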
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19573

Differential Revision: D15045730

Pulled By: ezyang

fbshipit-source-id: 732721b4b360fc4348ca7c87d4cd1386e7651bdd
2019-04-23 08:40:38 -07:00
75ce5173a9 Port adaptive_max_pool2d() to ATen (#19409)
Summary:
This is the first part of  #18064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19409

Differential Revision: D15037390

Pulled By: ezyang

fbshipit-source-id: 16a3feed2fd9cc66033696da224a7d5fb7208534
2019-04-23 07:37:25 -07:00
88f78c719a Fix math formatting of PairwiseDistance and CosineSimilarity docs and fix math formatting of CTC loss docs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19534

Differential Revision: D15034011

Pulled By: ezyang

fbshipit-source-id: 60b81c970c919508a57c86fb23edc9f64973117c
2019-04-23 07:24:07 -07:00
5bafb64e67 Revert D15039713: [pytorch][PR] add torch.tensor requires grad
Differential Revision:
D15039713

Original commit changeset: 47f1931b6fc4

fbshipit-source-id: fd91ce8ddd6d2f4e0016054dcdc2541dacc0e191
2019-04-22 23:15:49 -07:00
e7fc7c732c Bugfix for fusion device check (#19594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19594

I missed a callsite

Reviewed By: wanchaol

Differential Revision: D15041457

fbshipit-source-id: eef76ad51bee06a56d31b4ab64f19250fe2ad8f0
2019-04-22 20:55:17 -07:00
d2b03512da add torch.tensor requires grad (#19445)
Summary:
Add support for setting requires_grad = True on torch.tensor within torchscript

Within constant propagation, we can't insert any constants that require grad.

Also added shape analysis and requires-grad analysis for torch.tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19445

Differential Revision: D15039713

Pulled By: eellison

fbshipit-source-id: 47f1931b6fc4a1137c13d80110cc404465bfdf06
2019-04-22 18:02:41 -07:00
8be6d5ffd8 Add onnx support for _unique2 operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19582

Reviewed By: ezyang, jamesr66a

Differential Revision: D15037375

fbshipit-source-id: 6060476925bf02fa07f852054e06d2107f046e38
2019-04-22 17:52:47 -07:00
5a796d15be Automatic update of fbcode/onnx to 0e8d2bc5e51455c70ef790b9f65aa632ed9bc8a7 (#19568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19568

Previous import was 83dd62659fc07d5b7fa93b5d1c1879f93509c7db

Included changes:
- **[0e8d2bc5](https://github.com/onnx/onnx/commit/0e8d2bc5)**: [Minor need to be in 1.5]Fix an issue in NMS test data which introduce wrong shape. (#1953) <Hector Li>
- **[9346dd5d](https://github.com/onnx/onnx/commit/9346dd5d)**: adding modulus operator (#1874) <Jeff Saremi>
- **[414dbc73](https://github.com/onnx/onnx/commit/414dbc73)**: Fix shape inference for slice (#1950) <Hariharan Seshadri>
- **[6fb0775d](https://github.com/onnx/onnx/commit/6fb0775d)**: Fix shape inference for ConstantOfShape op (#1951) <Ashwini Khade>

Reviewed By: bddppq, zrphercule, benoitsteiner

Differential Revision: D15033070

fbshipit-source-id: f7eb90b142cbdc9bf1600cfd33e5a8df709045fb
2019-04-22 17:36:36 -07:00
5be4bee4ff Don't create FusionGroups for known-CPU producer values (#19342)
Summary:
I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even on a CPU-only machine running CPU code, which is confusing. Instead, the decision to fuse now depends on whether the producer Value is a known CPU tensor; if it is, we skip fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342

Differential Revision: D15038351

Pulled By: jamesr66a

fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db
2019-04-22 16:57:18 -07:00
969af4315a Explicitly define supported types (#19516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19516

Explicitly define types that are supported in kernel inputs and outputs.
Also, this allows us to show much nicer error messages if a user writes kernels with wrong argument types.

Reviewed By: ezyang

Differential Revision: D15020306

fbshipit-source-id: 55ebec81e075e874777acd59aa29a5578fc19ef7
2019-04-22 16:31:28 -07:00
8abab61d39 IRParser: optionally create name->value map of the parsed IR. (#19551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19551
ghimport-source-id: e666e3c00786a3b1c747f2dd6e85a48a63bdd69d

Differential Revision: D15028056

Pulled By: ZolotukhinM

fbshipit-source-id: 37e08d6df1d43513748ecfdd8549738eac7ec24e
2019-04-22 16:09:05 -07:00
43d0b78c31 Profiling : Adding Profile Op to provide storage for profiling lambdas
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19365

Differential Revision: D14998968

Pulled By: Krovatkin

fbshipit-source-id: a7f7d1529cbe4e8b30638c6eb8e2ff68f6e114c3
2019-04-22 15:09:30 -07:00
7f053b27bc Step 5: remove _unique_dim in favor of unique_dim (#18654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18654
ghimport-source-id: 63c84cedc3335719fca4a085fa19bdc57d2bc88a

Differential Revision: D15000635

Pulled By: VitalyFedyunin

fbshipit-source-id: 9e8594622a867a79d8e2b6be96579816aa22ae2d
2019-04-22 12:42:51 -07:00
767d184b77 Add back option to not adjust output batch size (#19442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19442

For cases like CV, some ops like transpose and tile mangle the batch size, so we don't know how to adjust the output batch size. In this case, the current solution is to just fix the input batch size statically and not adjust the output batch size.

Reviewed By: zrphercule

Differential Revision: D15007237

fbshipit-source-id: a21b943a52ee5462d9d7804dfae44360f579f8cf
2019-04-22 12:29:24 -07:00
7655b857f7 Add debug logic to c2_ref_test and its helpers (#19359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19359

Even with file IO exception handling, some of the sandcastle c2_ref_tests are still failing the length-check assert, as can be seen here:
https://our.intern.facebook.com/intern/test/844424932589974?ref_report_id=0

This is an attempt to add printing logic to debug what's going on.

Reviewed By: dzhulgakov

Differential Revision: D14966274

fbshipit-source-id: adce6d4780d664c5ef59f9341b6133b0d09324cb
2019-04-22 12:08:55 -07:00
a09240b0a0 fix variable shadowing issus (#19567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19567

fix variable shadowing

Reviewed By: bddppq, wx1988

Differential Revision: D15032114

fbshipit-source-id: 895ea21f22b87db8c7c8684f54fa186d22f24d10
2019-04-22 11:55:30 -07:00
19f73180cf Add manual_seed in script (#19510)
Summary:
Add manual_seed to torch script.
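A minimal sketch of what becomes scriptable (the function name is illustrative):

```python
import torch

@torch.jit.script
def seeded_sample() -> torch.Tensor:
    torch.manual_seed(42)  # now available inside TorchScript
    return torch.rand(2)
```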
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19510

Reviewed By: suo, driazati

Differential Revision: D15018823

Pulled By: eellison

fbshipit-source-id: d7734a8ad05ba254c0d88abf3fb58c4ce6a4e53b
2019-04-22 10:58:15 -07:00
e714429bf4 Automatic update of fbcode/onnx to 83dd62659fc07d5b7fa93b5d1c1879f93509c7db (#19454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19454

Previous import was ad7313470a9119d7e1afda7edf1d654497ee80ab

Included changes:
- **[83dd6265](https://github.com/onnx/onnx/commit/83dd6265)**: Add NonMaxSuppression operator (#1703) <Hector Li>
- **[31ca5d6f](https://github.com/onnx/onnx/commit/31ca5d6f)**: add node tests for quantized ops (#1944) <Ashwini Khade>
- **[e6076c1d](https://github.com/onnx/onnx/commit/e6076c1d)**: Fix test stat coverage script (#1948) <Raymond Yang>
- **[ad036405](https://github.com/onnx/onnx/commit/ad036405)**: Add IsInf to detect infinity values (#1884) <Wei-Sheng Chin>

Reviewed By: benoitsteiner

Differential Revision: D15010015

fbshipit-source-id: 4b29de21de60f8e6a2db75309809a4e619c92532
2019-04-22 10:46:08 -07:00
5d5d67fa3f Get rid of unnecessary matches_jit_signature: True specifications. (#19549)
Summary:
Unstacked version of https://github.com/pytorch/pytorch/pull/19431.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19549

Reviewed By: ezyang

Differential Revision: D15027965

Pulled By: gchanan

fbshipit-source-id: a4456326a999d77d6baeb0edbb1bb5db5208a8f8
2019-04-22 10:26:29 -07:00
c30224ad21 Rename potri to cholesky_inverse (#19498)
Summary:
Changelog:
- Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to discourage its use (see the usage sketch below)
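A usage sketch of the renamed function (the matrix construction here is illustrative):

```python
import torch

a = torch.randn(3, 3)
a = a @ a.t() + 3 * torch.eye(3)   # make a positive-definite matrix
u = torch.cholesky(a)              # lower-triangular factor by default
a_inv = torch.cholesky_inverse(u)  # formerly torch.potri
```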
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498

Differential Revision: D15029901

Pulled By: ezyang

fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a
2019-04-22 08:18:39 -07:00
deadf3ba89 Add assertion to make sure init op is always fp16 compatible in fp16 training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18498

Reviewed By: kennyhorror

Differential Revision: D14626755

fbshipit-source-id: d8a0b3c02920ab3835911a21bf05e8956853fcd7
2019-04-21 23:43:13 -07:00
689dd800ed Generate only one Type class per backend (#19295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19295
ghimport-source-id: 9345110f91f044a449804ddd5116cc9179444a00

Differential Revision: D14948581

Pulled By: li-roy

fbshipit-source-id: a317b03d58d621e8df162918038f7543bfb13ba2
2019-04-21 21:16:14 -07:00
189f30603c Make complex its own backend (#19275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19275
ghimport-source-id: 73fd40b02152aed6f24225a88d7ffde7f700899e

Differential Revision: D14948582

Pulled By: li-roy

fbshipit-source-id: a1be6e57057defc74a007c5351c5edb2b9dcaf30
2019-04-21 21:16:10 -07:00
ab78449e8c Add ScalarType argument to Type::options() (#19270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19270
ghimport-source-id: a5ade6131f3260066c5750ea1fa9ed5c998bb791

Differential Revision: D14938707

Pulled By: li-roy

fbshipit-source-id: 018fb3f01706531a06515d6d861e5683a455a705
2019-04-21 21:16:07 -07:00
a044ba1af5 Generate cases for all ScalarTypes in Type functions that call to TH (#19230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19230
ghimport-source-id: 81f360f2ebd137b8e7d8e885b85246cc219761aa

Differential Revision: D14927991

Pulled By: li-roy

fbshipit-source-id: 1b6a57918ecdc9c87858d3e50578edef0b6e7ad5
2019-04-21 21:16:03 -07:00
868933a467 Fix clang-format. (#19550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19550
ghimport-source-id: 980d96762426d3e97c26839edbaf107a3fc18b2f

Differential Revision: D15028055

Pulled By: ZolotukhinM

fbshipit-source-id: a50a0aaa74d0f1b9249ad79ab80e4b7747c3bffc
2019-04-21 20:31:09 -07:00
1bd5f2c181 Fix some typos in jit README
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19548

Differential Revision: D15028275

Pulled By: mrshenli

fbshipit-source-id: 84ff635be3b4681962451b4c301271683174d7a8
2019-04-21 19:45:05 -07:00
5afc274708 Match JIT signature with triu_indices / tril_indices. (#19484)
Summary:
This just plugs into the existing mechanism to do a direct translation to TensorOptions in the backend, so no codegen changes.

After this lands, all native_functions will match the JIT signature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19484

Differential Revision: D15013051

Pulled By: gchanan

fbshipit-source-id: 6818f868d2f765ca3e56e7e6f75fe4f68492466c
2019-04-21 15:57:52 -07:00
9eb48e1b03 Make one_hot non-differentiable. (#19524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19524
ghimport-source-id: ceda3ad43471242ebbd272a21de11731c7d8bef6

Differential Revision: D15021417

Pulled By: gchanan

fbshipit-source-id: 65d1f17a32f81f47dba5e58e343d0b7b828e1d51
2019-04-21 14:14:37 -07:00
6733037416 Remove 'BoolTensor', 'IndexTensor' from frontend specifications. (#19523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19523
ghimport-source-id: 618a15c2d1d9af9f87b46e32f10ff77111c2e3b7

Differential Revision: D15021420

Pulled By: gchanan

fbshipit-source-id: 048af8da3128de10bdee5827b6fbc169c3ad25a8
2019-04-21 14:14:34 -07:00
3944601588 Have _embedding_bag_dense_backward match JIT signature. (#19522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19522
ghimport-source-id: ad645d87396de645a1aff5fd9d9939cb79cf6558

Differential Revision: D15021419

Pulled By: gchanan

fbshipit-source-id: bd7017edadb4ec9d43cefddf0aee8c52c5cca6a4
2019-04-21 14:14:30 -07:00
e3523979ae Have embedding_dense_backward match JIT signature. (#19521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19521
ghimport-source-id: 817d3defb5f4ee98bae1f0488f99cb0e9a5226a2

Differential Revision: D15021376

Pulled By: gchanan

fbshipit-source-id: 2e29f1d3913f94fab3347dc48676303510d7da46
2019-04-21 14:14:27 -07:00
638ffac359 Update mkldnn-bridge to fix crash issue in DNNLOWP dequantize op (#19159)
Summary:
Remove a useless format checker in the mkldnn-bridge to fix the crash issue in the DNNLOWP dequantize op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19159

Differential Revision: D15027670

Pulled By: yinghai

fbshipit-source-id: ac97d6ff94de013105108b9596b1bd7621c5aa75
2019-04-21 14:05:13 -07:00
83373e7755 Hook up non_differentiability in derivatives.yaml when no autograd function is generated. (#19520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19520
ghimport-source-id: a1272aa0b23692fb189974c4daba7b2e4e0dad50

Differential Revision: D15021380

Pulled By: gchanan

fbshipit-source-id: ec83efd4bb6d17714c060f13a0527a33a10452db
2019-04-21 13:48:55 -07:00
8868a4f20b Move non_differentiable_arg_names from autograd functions to differentiability_info. (#19519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19519
ghimport-source-id: 74e603688b2e4ed33f6c46c7da9d009336140e74

Differential Revision: D15021378

Pulled By: gchanan

fbshipit-source-id: e366a914c67a90ba0552b67d0bf5b347edbaf189
2019-04-21 11:09:39 -07:00
6d307db5b4 Move cuFFT plan cache note outside Best Practices (#19538)
Summary:
I mistakenly put it there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19538

Differential Revision: D15026500

Pulled By: soumith

fbshipit-source-id: 0c13499571fdfd789c3bd1c4b58abd870725d422
2019-04-20 21:39:59 -07:00
26f1c6d4d4 Revert D14689639: [pytorch] Allow passing lists as trace inputs.
Differential Revision:
D14689639

Original commit changeset: 6dcec8a64319

fbshipit-source-id: 03a5e7c80e7f2420e33b056b5844a78d7fd41141
2019-04-20 08:50:47 -07:00
c96c91da22 Improve optimizations for DNNLOWP support on MKL-DNN (#18843)
Summary:
In this PR, the fusion algorithms are improved to support DNNLOWP.
1. Enabled conv fusions for DNNLOWP
2. Fused the order-switch op into the following quantize op
3. Improved conv+sum fusion to parse a larger scope/window
4. Re-organized the fusion code to fix a random crash issue caused by changing the graph
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18843

Differential Revision: D15021030

Pulled By: yinghai

fbshipit-source-id: 88d2199d9fc69f392de9bfbe1f291e0ebf78ab08
2019-04-20 02:12:06 -07:00
fe87327c28 Make Observer class as template Quant class for QuantConfig (#19418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19418

This change makes the Observer class a template which always
takes an observer function as an argument. The second test case becomes redundant, hence removing
it.

Reviewed By: jerryzh168

Differential Revision: D15000594

fbshipit-source-id: 9555fe98a5f2054b8fd01e64e9ac2db72c043bfa
2019-04-19 21:47:54 -07:00
9f4f7e1621 Support compilation on gcc-7.4.0 (#19470)
Summary:
There are two corrections in this pull request.
The first is specific to gcc-7.4.0.
Compiled with -std=c++14, gcc-7.4.0 has __cplusplus = 201402L.
This does not meet the check set in Deprecated.h, which asks for >201402L.
The compiler then falls through to the __GNUC__ check, which passes and sets C10_DEPRECATED_MESSAGE to a value that c++14 does not appear to support or even recognize, leading to a compile-time error.
My recommended solution, which worked for my case, was to change the > into a >=.

The second correction comes in response to this error:
caffe2/operators/crash_op.cc: In member function ‘virtual bool caffe2::CrashOp::RunOnDevice()’:
caffe2/operators/crash_op.cc:14:11: error: ‘SIGABRT’ was not declared in this scope

I am merely committing to the repository the solution suggested here (which worked for me)
https://discuss.pytorch.org/t/building-pytorch-from-source-in-conda-fails-in-pytorch-caffe2-operators-crash-op-cc/42859
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19470

Differential Revision: D15019529

Pulled By: ailzhang

fbshipit-source-id: 9ce9d713c860ee5fd4266e5c2a7f336a97d7a90d
2019-04-19 21:41:36 -07:00
d17c22d024 Improve embedding_bag add kernel (#19329)
Summary:
This was actually getting pretty poor throughput with respect to memory bandwidth. I used this test to measure the memory bandwidth specifically for the AXPY call: https://gist.github.com/jamesr66a/b27ff9ecbe036eed5ec310c0a3cc53c5

And I got ~8 GB/s before this change, but ~14 GB/s after this change.

This seems to speed up the operator overall by around 1.3x (benchmark: https://gist.github.com/jamesr66a/c533817c334d0be432720ef5e54a4166):

== Before ==

time_per_iter 0.0001298875093460083
GB/s 3.082544287868467

== After ==

time_per_iter 0.00010104801654815674
GB/s 3.9623142905451076

The large difference between the local BW increase and the full-op BW increase likely indicates significant time is being spent elsewhere in the op, so I will investigate that.

EDIT: I updated this PR to include a call into caffe2/perfkernels. This is the progression:

before

time_per_iter 8.983819484710693e-05
GB/s 4.456723564864611

After no axpy
time_per_iter 7.19951868057251e-05
GB/s 5.56126065872172

After perfkernels
time_per_iter 5.6699180603027346e-05
GB/s 7.061548257694262

After perfkernels no grad
time_per_iter 4.388842582702637e-05
GB/s 9.122769670026413
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19329

Reviewed By: dzhulgakov

Differential Revision: D14969630

Pulled By: jamesr66a

fbshipit-source-id: 42d1015772c87bedd119e33c0aa2c8105160a738
2019-04-19 19:16:24 -07:00
6325b6e44e Make finding unused model parameters optional (#19515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19515

This is still done by default, but can now be disabled by specifying
`find_unused_parameters=False`. There are use cases where finding
unused parameters results in erroneous behavior, because a subset of
model parameters is used *outside* the `forward` function. One can
argue that doing this is not a good idea, but we should not break
existing use cases without an escape hatch. This configuration
parameter is that escape hatch.
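A usage sketch (the model is a stand-in; process group setup is assumed to have happened already):

```python
import torch
from torch.nn.parallel import DistributedDataParallel

# assumes torch.distributed.init_process_group(...) was called already;
# safe when forward() uses every parameter
model = torch.nn.Linear(4, 4)
ddp_model = DistributedDataParallel(model, find_unused_parameters=False)
```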

Reviewed By: bddppq

Differential Revision: D15016381

fbshipit-source-id: f2f86b60771b3801ab52776e62b5fd6748ddeed0
2019-04-19 17:23:36 -07:00
63e2833ceb Disallow std::vector arguments (#19511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19511

In the c10 operator registration API, disallow std::vector arguments and show a nice error message
pointing users towards using ArrayRef instead.

Reviewed By: ezyang

Differential Revision: D15017423

fbshipit-source-id: 157ecc1298bbc598d2e310a16041edf195aaeff5
2019-04-19 17:06:31 -07:00
1ac14b03b5 Drop instead of pop (#19503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19503

After reading the arguments from the stack, the c10 kernel wrapper accidentally popped them again, causing a vector to be allocated.
Instead, it should just drop them because they have already been read.

Reviewed By: ezyang

Differential Revision: D15016023

fbshipit-source-id: b694a2929f97fa77cebe247ec2e49820a3c818d5
2019-04-19 17:06:28 -07:00
9818c7cb63 Add minimalistic implementation of subgraph matcher. (#19322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19322
ghimport-source-id: 93c713f829d1b2a9aa5d104cb1f30148dd37c967

Differential Revision: D14962182

Pulled By: ZolotukhinM

fbshipit-source-id: 3989fba06502011bed9c24f12648d0baa2a4480c
2019-04-19 16:35:16 -07:00
26f12af537 Fix op benchmarks error in OSS environment (#19518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518

The previous design required running the op benchmarks from the PyTorch root directory, which could lead to a `module not found` error in the OSS environment. This diff fixes that issue by making the benchmarks launch from the `benchmarks` folder.

Reviewed By: ilia-cher

Differential Revision: D15020787

fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b
2019-04-19 16:25:16 -07:00
5da7b74d48 fix AI-PEP path error (#19514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19514

as title

Reviewed By: hl475

Differential Revision: D15018499

fbshipit-source-id: 9ce38e3a577432e0575a6743f5dcd2e907d3ab9d
2019-04-19 16:25:13 -07:00
a421f882dc First step at container aliasing (#18710)
Summary:
First step at allowing container types within alias analysis.

Since the current implementation hides the concept of Wildcards within alias analysis and does not expose it to the memory DAG, we cannot represent whether a container type holds a wildcard. As a result, we only handle TupleConstruct, where we can directly inspect whether any input values are wildcards, and we don't handle nested containers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18710

Differential Revision: D15017068

Pulled By: eellison

fbshipit-source-id: 3ee76a5482cef1cc4a10f034593ca21019161c18
2019-04-19 16:07:11 -07:00
f5fe7aa0b2 Fix relu bug for empty tensor (#19451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19451

Fix relu bug for empty tensor

Reviewed By: xianjiec

Differential Revision: D15009811

fbshipit-source-id: b75e567c3bec08d7d12b950d8f1380c50c138704
2019-04-19 15:21:07 -07:00
2d4875b8ed Allow passing lists as trace inputs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18636

Differential Revision: D14689639

fbshipit-source-id: 6dcec8a64319ae3c4da9a93f574a13ce8ec223a5
2019-04-19 13:37:50 -07:00
9245eaf3f0 Allow for segmented printing in PythonPrint (#19238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19238
ghimport-source-id: 469d33cd187fa68840b201d625800a0f4fead547

Differential Revision: D14928291

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: 257fce3dd1601ba192092d3fc318374e3752907e
2019-04-19 13:02:06 -07:00
73c166a5ed add resolveType to Resolver (#19237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19237
ghimport-source-id: 70777ec37155be37efef1b743d564752e4dff9de

Differential Revision: D14928289

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: 46827da9ace16730669fc654bf781d83172d18b1
2019-04-19 13:02:02 -07:00
1e94a3bc4d Turn resolver into a class (#19236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19236
ghimport-source-id: d36705ea5ecff085d0d84ea57bb96d18d7c260dd

Differential Revision: D14928292

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: cd038100ac423fa1c19d0547b9e5487a633a2258
2019-04-19 13:01:59 -07:00
405c7bcea0 Fix bad annotation in docs (#19501)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19501 [jit] Fix bad annotation in docs**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/19501

Pulled By: driazati

Differential Revision: D15016062

fbshipit-source-id: 3dcd0481eb48b84e98ffe8c5df2cbc9c2abf99f9
2019-04-19 12:42:26 -07:00
b85edac16f Fix out-of-topological-order issue in Nomnigraph (#19458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19458

The algorithm in https://fburl.com/ggh9iyvc fails to really ensure topological ordering of nodes. The fix is ugly but effective. I think we need a real topological sort to fix this issue more nicely. Mikhail Zolotukhin, Bram Wasti.

Differential Revision: D15011893

fbshipit-source-id: 130c3aa442f5d578adfb14fbe5f16aa722434942
2019-04-19 12:19:39 -07:00
53bb739b67 Remove uses of TypeID (#19452)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19452
ghimport-source-id: 816ae7fe1a18d76f064d5796dec44dca6a138a21

Differential Revision: D15009920

Pulled By: li-roy

fbshipit-source-id: 722f05a927528148555561da62839f84dba645c6
2019-04-19 12:07:35 -07:00
3762cf9cc6 Expose QScheme in frontend (#19381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19381

Expose QScheme enum in frontend so that people can use it in
quantization configs in modules.
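A minimal sketch of the exposed members (names as they appear on the torch namespace):

```python
import torch

# the QScheme enum members are now visible from Python
print(torch.per_tensor_affine)
print(torch.per_tensor_symmetric)
```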

Differential Revision: D14922992

fbshipit-source-id: ab07b8a7ec42c1c1f5fe84a4a0c805adbcad408d
2019-04-19 11:57:59 -07:00
1898e9368b Revert D15003385: Have embedding_dense_backward match JIT signature.
Differential Revision:
D15003385

Original commit changeset: 53cbe18aa454

fbshipit-source-id: be904ee2212aa9e402715c436a84d95f6cde326f
2019-04-19 11:27:16 -07:00
e3470ae4bd Revert D15003379: Have _embedding_bag_dense_backward match JIT signature.
Differential Revision:
D15003379

Original commit changeset: f8e82800171f

fbshipit-source-id: 55f83557998d166aeb41d00d7a590acdc76fcf22
2019-04-19 11:27:13 -07:00
79bfc3931a Revert D15003387: Remove 'BoolTensor', 'IndexTensor' from frontend specifications.
Differential Revision:
D15003387

Original commit changeset: e518e8ce3228

fbshipit-source-id: af5b107239446ea8d6f229a427d5b157fcafd224
2019-04-19 11:27:10 -07:00
013926cfcf Revert D15003382: Make one_hot non-differentiable.
Differential Revision:
D15003382

Original commit changeset: e9244c7a5f0a

fbshipit-source-id: 84789cf4c46c77cce655e70c2a8ff425f32f48bd
2019-04-19 11:27:08 -07:00
fc1aadec3b Make empty_affine_quantized private (#19446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19446

change empty_affine_quantized to _empty_affine_quantized

Reviewed By: dzhulgakov

Differential Revision: D15008757

fbshipit-source-id: c7699ac0c208a8f17d88e95193970c75ba7219d3
2019-04-19 11:21:44 -07:00
c3755eeeee Make one_hot non-differentiable. (#19430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19430
ghimport-source-id: 6787473873fdc21400138a4322e17fee8db62607

Differential Revision: D15003382

Pulled By: gchanan

fbshipit-source-id: e9244c7a5f0ad7cd2f79635944a8b37f910231c9
2019-04-19 11:03:14 -07:00
622cf1fec9 Remove 'BoolTensor', 'IndexTensor' from frontend specifications. (#19429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19429
ghimport-source-id: 6116682b84210a34babb8b87a92e7050433e5d59

Differential Revision: D15003387

Pulled By: gchanan

fbshipit-source-id: e518e8ce322810e06175bb4e6672d4ea1eb18efd
2019-04-19 11:03:12 -07:00
b0812d3d4c Have embedding_dense_backward match JIT signature. (#19427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19427
ghimport-source-id: 93438cd495129a1e41118c62e6339909783035fd

Differential Revision: D15003385

Pulled By: gchanan

fbshipit-source-id: 53cbe18aa4541a2501f496abfee526e40093c0ff
2019-04-19 11:03:09 -07:00
a6ab443e32 Have _embedding_bag_dense_backward match JIT signature. (#19428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19428
ghimport-source-id: 037efa3df95efc1fbff631826351d1698a3c49ec

Differential Revision: D15003379

Pulled By: gchanan

fbshipit-source-id: f8e82800171f632e28535e416283d858156068ec
2019-04-19 11:03:06 -07:00
30b2953b8b Stop generating autograd functions for derivatives.yaml entries that only specify output differentiability. (#19424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19424
ghimport-source-id: e9d1b86742607f5cbe39fb278fa7f378739cd6ef

Differential Revision: D15003380

Pulled By: gchanan

fbshipit-source-id: 8efb94fbc0b843863021bf25deab57c492086237
2019-04-19 10:56:20 -07:00
e7b9526dc6 Fix ord() when dealing with utf8 chars (#19423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19423
ghimport-source-id: e7449489fbc86ec1116f94027b3c1561942413ee

Reviewed By: eellison

Differential Revision: D15002847

Pulled By: driazati

fbshipit-source-id: 4560cebcfca695447423d48d65ed364e7dbdbedb
2019-04-19 10:27:04 -07:00
557b1b362f Fix copied optimizer (#19308)
Summary:
Add the defaults field to the copied object.
Prior to this patch, optimizer.__getattr__ excluded the defaults
attribute of the source optimizer object, which is required by some LR schedulers (e.g. CyclicLR with momentum).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19308

Differential Revision: D15012801

Pulled By: soumith

fbshipit-source-id: 95801b269f6f9d78d531d4fed95c973b280cc96f
2019-04-19 10:27:01 -07:00
30292d994f Add an identity module (#19249)
Summary:
This is a simple yet useful addition to the torch.nn modules: an identity module. This is a first draft - please let me know what you think and I will edit my PR.

There is no identity module; nn.Sequential() can be used, but it is argument-sensitive and so can't be used interchangeably with any other module. This adds nn.Identity(...), which can be swapped with any module because it accepts dummy arguments. It's also more understandable than seeing an empty Sequential inside a model.

See discussion on #9160. The current solution is to use nn.Sequential(). However, this won't work in cases like the following:

```python
batch_norm = nn.BatchNorm2d
if dont_use_batch_norm:
    batch_norm = Identity
```

Then in your network, you have:

```python
nn.Sequential(
    ...
    batch_norm(N, momentum=0.05),
    ...
)
```

If you try to simply set `Identity = nn.Sequential`, this will fail since `nn.Sequential` expects modules as arguments. Of course there are many ways to get around this, including:

- Conditionally adding modules to an existing Sequential module
- Not using Sequential but writing the usual `forward` function with an if statement
- ...

**However, I think that an identity module is the most pythonic strategy,** assuming you want to use nn.Sequential.

Using the very simple class (this isn't the same as the one in my commit):

```python
class Identity(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
    def forward(self, x):
        return x
```

we can get around using nn.Sequential, and `batch_norm(N, momentum=0.05)` will work. There are of course other situations where this would be useful.
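For reference, a small usage sketch of the module as landed (the dummy argument values are illustrative and simply ignored):

```python
import torch
import torch.nn as nn

layer = nn.Identity(54, unused_kwarg='ignored')  # dummy arguments are accepted
x = torch.randn(2, 3)
assert layer(x) is x                             # the input passes through untouched
```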

Thank you.
Best,
Miles
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19249

Differential Revision: D15012969

Pulled By: ezyang

fbshipit-source-id: 9f47e252137a1679e306fd4c169dca832eb82c0c
2019-04-19 10:12:18 -07:00
ef499cd567 Remove no-fork workaround for running tests with ROCm (#19436)
Summary:
This should have been fixed in the newest ROCm version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19436

Reviewed By: ezyang

Differential Revision: D15004685

Pulled By: bddppq

fbshipit-source-id: 19fd4cca94c914dc54aabfbb4e62b328aa348a35
2019-04-19 09:51:03 -07:00
f3ef94a806 Delete defunct test/ffi directory. (#19168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19168
ghimport-source-id: 5190a8d00c529735e99e8745c5e7cf1901fdb800

Differential Revision: D14938318

Pulled By: ezyang

fbshipit-source-id: eaeb6814178c434f737b99ae1fce63fd9ecdb432
2019-04-19 08:16:49 -07:00
a97330b7c5 Fix missing doc out= for torch.cumprod (#19340)
Summary:
Fix #19255 by adding the `out=None` argument for `torch.cumprod`, missing [here](https://pytorch.org/docs/master/torch.html#torch.cumprod); also added the missing docstring for `out` in torch.cumsum [here](https://pytorch.org/docs/master/torch.html#torch.cumsum)
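A quick sketch of the argument being documented:

```python
import torch

x = torch.arange(1., 5.)
result = torch.empty(4)
torch.cumprod(x, dim=0, out=result)  # result: tensor([ 1.,  2.,  6., 24.])
```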
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19340

Differential Revision: D14973931

Pulled By: ezyang

fbshipit-source-id: 232f5c9a606b749d67d068afad151539866fedda
2019-04-19 07:59:57 -07:00
0676ba0c5c Mention packed accessors in tensor basics doc (#19464)
Summary:
This is a continuation of efforts into packed accessor awareness.
A very simple example is added, along with the mention that the template can hold more arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19464

Differential Revision: D15012564

Pulled By: soumith

fbshipit-source-id: a19ed536e016fae519b062d847cc58aef01b1b92
2019-04-19 07:20:16 -07:00
ea6c738c8a Rename 'not_differentiable' to 'non_differentiable'. (#19272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19272
ghimport-source-id: 755e91efa68c5a1c4377a6853f21b3eee3f8cab5

Differential Revision: D15003381

Pulled By: gchanan

fbshipit-source-id: 54db27c5c5e65acf65821543db3217de9dd9bdb5
2019-04-19 07:07:55 -07:00
aa50f1e365 Clean the onnx constant fold code a bit (#19398)
Summary:
This is a follow-up PR to https://github.com/pytorch/pytorch/pull/18698, linting the code using clang-format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19398

Differential Revision: D14994517

Pulled By: houseroad

fbshipit-source-id: 2ae9f93e66ce66892a1edc9543ea03932cd82bee
2019-04-18 23:59:26 -07:00
593bb145ce Allow passing dicts as trace inputs. (#18092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18092

Previously, tracing required all inputs to be either tensors
or tuples of tensors. Now, we allow users to pass dicts as well.
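A sketch of the new calling convention (the function and dict keys are illustrative):

```python
import torch

def fn(inputs):
    # a dict of tensors can now appear among the trace inputs
    return inputs['a'] + inputs['b']

example = {'a': torch.randn(3), 'b': torch.randn(3)}
traced = torch.jit.trace(fn, (example,))
```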

Differential Revision: D14491795

fbshipit-source-id: 7a2df218e5d00f898d01fa5b9669f9d674280be3
2019-04-18 23:52:00 -07:00
9034b66f14 skip test_trace_c10_ops if _caffe2 is not built (#19099)
Summary:
fix https://github.com/pytorch/pytorch/issues/18142
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19099

Differential Revision: D15010452

Pulled By: houseroad

fbshipit-source-id: 5bf158d7fce7bfde109d364a3a9c85b83761fffb
2019-04-18 23:40:15 -07:00
d9115b533a remove needless ## in REGISTER_ALLOCATOR definition. (#19261)
Summary:
remove needless ## in REGISTER_ALLOCATOR definition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19261

Differential Revision: D15002025

Pulled By: soumith

fbshipit-source-id: 40614b1d79d1fe05ccf43f0ae5aab950e4c875c2
2019-04-18 22:44:09 -07:00
9983c24cfc Strip doc_string from exported ONNX models (#18882)
Summary:
Strip the doc_string by default from the exported ONNX models (this string has the stack trace and information about the local repos and folders, which can be confidential).

The users can still generate the doc_string by specifying add_doc_string=True in torch.onnx.export().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18882

Differential Revision: D14889684

Pulled By: houseroad

fbshipit-source-id: 26d2c23c8dc3f484544aa854b507ada429adb9b8
2019-04-18 22:30:00 -07:00
f0d98199fb improve dim sort performance (#19379)
Summary:
We are already using custom comparators for sorting (for a good reason), but are still making 2 sorting passes: a global sort, and a stable sort to bring values into their slices. Using a custom comparator to sort within a slice lets us avoid the second sorting pass and brings up to a 50% perf improvement.
t-vi I know you are moving sort to ATen, and changing THC is discouraged, but #18350 seems dormant. I'm fine with #18350 landing first, and then I can put in these changes.
cc umanwizard for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19379

Differential Revision: D15011019

Pulled By: soumith

fbshipit-source-id: 48e5f5aef51789b166bb72c75b393707a9aed57c
2019-04-18 22:25:08 -07:00
941ccd6b35 Fix missing import sys in pin_memory.py (#19419)
Summary:
kostmo pointed this out in #15331. Thanks :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19419

Differential Revision: D15002846

Pulled By: soumith

fbshipit-source-id: c600fab3f7a7a5147994b9363910af4565c7ee65
2019-04-18 22:19:26 -07:00
940caed0d4 update documentation of PairwiseDistance#19241 (#19412)
Summary:
Fix the documentation of PairwiseDistance [#19241](https://github.com/pytorch/pytorch/issues/19241)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19412

Differential Revision: D14998271

Pulled By: soumith

fbshipit-source-id: bcb2aa46d3b3102c4480f2d24072a5e14b049888
2019-04-18 22:13:52 -07:00
8638634a6e fixes link in TripletMarginLoss (#19417)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19245
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19417

Differential Revision: D15001610

Pulled By: soumith

fbshipit-source-id: 1b85ebe196eb5a3af5eb83d914dafa83b9b35b31
2019-04-18 22:13:48 -07:00
08f5c05d60 make separate operators as independent binaries (#19450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450

We want to make each operator benchmark a separate binary. Previously the benchmark collected all operators into a single binary, which is unnecessary when we want to filter for a specific operator. This diff resolves that issue.

Reviewed By: ilia-cher

Differential Revision: D14808159

fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac
2019-04-18 20:00:47 -07:00
1e78252de7 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: a727513842c0a240b377bda4e313fbedbc54c2e8
2019-04-18 18:34:36 -07:00
e1750754c8 Step 4: add support for unique with dim=None (#18651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18651
ghimport-source-id: e11988130a3f9a73529de0b0d08b4ec25fbc639c

Differential Revision: D15000463

Pulled By: VitalyFedyunin

fbshipit-source-id: 9e258e473dea6a3fc2307da2119b887ba3f7934a
2019-04-18 18:28:07 -07:00
1b1d1c9837 allow bools to be used as attributes (#19440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19440
ghimport-source-id: 9c962054d760526bf7da324b114455fcb1038521

Differential Revision: D15005723

Pulled By: suo

fbshipit-source-id: 75fc87ae33894fc34d3b913881defb7e6b8d7af0
2019-04-18 18:13:21 -07:00
a0e09216f0 Fix test build (#19444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19444
ghimport-source-id: c85db00e8037e7f6f0424eb8bd17f957d20b7247

Reviewed By: eellison

Differential Revision: D15008679

Pulled By: driazati

fbshipit-source-id: 0987035116d9d0069794d96395c8ad458ba7c121
2019-04-18 18:05:04 -07:00
b9291f55bb pow scalar exponent / base autodiff, fusion (#19324)
Summary:
Fixes: #19253

Fixing pow(Tensor, float) is straightforward.
The breakage for pow(float, Tensor) is a bit more subtle to trigger, and the fix needs `torch.log` (`math.log` didn't work) from the newly merged #19115 (thanks ngimel for pointing out that this has landed).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19324

Differential Revision: D15003531

Pulled By: ailzhang

fbshipit-source-id: 8b22138fa27a43806b82886fb3a7b557bbb5a865
2019-04-18 17:58:35 -07:00
b4fa979a37 Improve unique CPU performance for returning counts (#19352)
Summary:
Benchmark on a tensor of shape `torch.Size([15320, 2])`. Benchmark code:

```python
print(torch.__version__)
print()
a = tensor.flatten()
print('cpu, sorted=False:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=False)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=False, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=False, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=False, return_inverse=True, return_counts=True)
print()
print('cpu, sorted=True:')
%timeit torch._unique2_temporary_will_remove_soon(a)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True)
print()
```

Before
```
1.1.0a0+36854fe

cpu, sorted=False:
340 µs ± 4.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
724 µs ± 6.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
54.3 ms ± 469 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.6 ms ± 659 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted=True:
341 µs ± 7.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
727 µs ± 7.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
54.7 ms ± 795 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.3 ms ± 647 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

After
```
1.1.0a0+261d9e8

cpu, sorted=False:
350 µs ± 865 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
771 µs ± 598 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 6.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 4.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cpu, sorted=True:
324 µs ± 4.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
705 µs ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 5.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 5.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19352

Differential Revision: D14984717

Pulled By: VitalyFedyunin

fbshipit-source-id: 3c56f85705ab13a92ec7406f4f30be77226a3210
2019-04-18 17:52:59 -07:00
563de88aa5 Revert D14909203: Remove usages of TypeID
Differential Revision:
D14909203

Original commit changeset: d716179c484a

fbshipit-source-id: 992ff1fcd6d35d3f2ae768c7e164b7a0ba871914
2019-04-18 17:47:39 -07:00
ce969c0bc4 Add tests for argument types (#19290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19290

Add test cases for the supported argument types
And TODOs for some unsupported ones that we might want to support.

Reviewed By: dzhulgakov

Differential Revision: D14931920

fbshipit-source-id: c47bbb295a54ac9dc62569bf5c273368c834392c
2019-04-18 17:20:13 -07:00
d9052b2176 Allow optionals arguments from C++ (#19311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19311
ghimport-source-id: 699f62eb2bbad53ff2045fb2e217eb1402f2cdc5

Reviewed By: eellison

Differential Revision: D14983059

Pulled By: driazati

fbshipit-source-id: 442f96d6bd2a8ce67807ccad2594b39aae489ca5
2019-04-18 17:15:05 -07:00
45d5b6be48 Enhance front-end to add op (#19433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19433

For the operator benchmark project we need to cover a lot of operators, so the interface for adding operators needs to be very clean and simple. This diff implements a new interface for adding ops.

Here is the logic for adding a new operator to the benchmark (caffe2_op/pt_op below are placeholders for the real operators):
```
long_config = {}   # configurations exercising large inputs
short_config = {}  # configurations exercising small inputs

# map_func maps a configuration entry to the operator's inputs

add_test(
    [long_config, short_config],
    map_func,
    [caffe2_op],  # the caffe2 operator(s) under test
    [pt_op],      # the PyTorch operator(s) under test
)
```

Reviewed By: zheng-xq

Differential Revision: D14791191

fbshipit-source-id: ac6738507cf1b9d6013dc8e546a2022a9b177f05
2019-04-18 17:07:02 -07:00
edf77fe64a Fix cpp_custom_type_hack variable handling (#19400)
Summary:
My bad - it might be called in variable and non-variable context. So it's better to just inherit variable-ness from the caller.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19400

Reviewed By: ezyang

Differential Revision: D14994781

Pulled By: dzhulgakov

fbshipit-source-id: cb9d055b44a2e1d7bbf2e937d558e6bc75037f5b
2019-04-18 16:44:25 -07:00
4c93be0fa0 fix hub doc formatting issues (#19434)
Summary:
minor fixes for doc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19434

Differential Revision: D15003903

Pulled By: ailzhang

fbshipit-source-id: 400768d9a5ee24f9183faeec9762b688c48c531b
2019-04-18 16:02:19 -07:00
a5c4348d54 Recursively find tensors in DDP module output (#19360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19360

We'll return the output object verbatim since it is a freeform object.
We need to find any tensors in this object, though, because we need to
figure out which parameters were used during this forward pass, to
ensure we short circuit reduction for any unused parameters.

Before this commit only lists were handled and the functionality went
untested. This commit adds support for dicts and recursive structures,
and also adds a test case.

Closes #19354.

Reviewed By: mrshenli

Differential Revision: D14978016

fbshipit-source-id: 4bb6999520871fb6a9e4561608afa64d55f4f3a8
2019-04-18 14:57:09 -07:00
17f05ad5e5 Moving at::Tensor into caffe2::Tensor without bumping refcount (#19388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19388

The old implementation forced a refcount bump when converting at::Tensor to caffe2::Tensor.
Now, it is possible to move it without a refcount bump.

Reviewed By: dzhulgakov

Differential Revision: D14986815

fbshipit-source-id: 92b4b0a6f323ed38376ffad75f960cad250ecd9b
2019-04-18 14:13:26 -07:00
88f70a1670 Fix pickling torch.float32 (#18045)
Summary:
Attempted fix for #14057. This PR fixes the example script in the issue.
The old behavior is a bit confusing here. What happened is that python2 failed to recognize that `torch.float32` lives in module `torch`, so it went looking for `torch.float32` in module `__main__`. Python3 is smart enough to handle it.
According to the doc [here](https://docs.python.org/2/library/pickle.html#object.__reduce__), `__reduce__` should return `float32` instead of the old name `torch.float32`. This way python2 is able to find `float32` in the `torch` module.
> If a string is returned, it names a global variable whose contents are pickled as normal. The string returned by __reduce__() should be the object’s local name relative to its module
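A minimal check of the fixed behavior:

```python
import pickle
import torch

s = pickle.dumps(torch.float32)
assert pickle.loads(s) is torch.float32  # now also works on python2
```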
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18045

Differential Revision: D14990638

Pulled By: ailzhang

fbshipit-source-id: 816b97d63a934a5dda1a910312ad69f120b0b4de
2019-04-18 12:28:10 -07:00
f5435634b4 Respect order of Parameters in rnn.py (#18198)
Summary:
Previously, to get a list of parameters, this code just put them in the reverse order in which they were defined, which is not always right. This PR allows parameter lists to define the order themselves. To do this, parameter lists need to have a corresponding function that provides the names of the parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18198

Differential Revision: D14966270

Pulled By: driazati

fbshipit-source-id: 59331aa59408660069785906304b2088c19534b2
2019-04-18 11:18:20 -07:00
2d0d153288 Refactor EmitLoopCommon to make it more amenable to future extensions (#19341)
Summary:
This PR paves the way for supporting more iterator types in for-in loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19341

Differential Revision: D14992749

Pulled By: Krovatkin

fbshipit-source-id: e2d4c9465c8ec3fc74fbf23006dcb6783d91795f
2019-04-18 09:59:21 -07:00
b7323a94ad Cleanup init_process_group (#19033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19033

torch.distributed.init_process_group() has had many parameters added over time, but the contract isn't clear. Adding documentation, asserts, and explicit args should make the contract clearer to callers and more strictly enforced.
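A sketch of a call spelling out the now-explicit arguments (the values are illustrative):

```python
import torch.distributed as dist

dist.init_process_group(
    backend='gloo',
    init_method='tcp://127.0.0.1:23456',
    rank=0,
    world_size=1,
)
```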

Reviewed By: mrshenli

Differential Revision: D14813070

fbshipit-source-id: 80e4e7123087745bed436eb390887db9d1876042
2019-04-18 09:37:38 -07:00
20a5aa9670 Sync FindCUDA/select_computer_arch.cmake from upstream (#19392)
Summary:
1. Fixes auto detection for Turing cards.
2. Adds Turing Support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19392

Differential Revision: D14996142

Pulled By: soumith

fbshipit-source-id: 3cd45c58212cf3db96e5fa19b07d9f1b59a1666a
2019-04-18 07:03:19 -07:00
9e3bdb3231 Update module.py documentation. (#19347)
Summary:
Added the ">>>" python interpreter sign(three greater than symbols), so that the edited lines will appear as code, not comments/output, in the documentation. Normally, the interpreter would display "..." when expecting a block, but I'm not sure how this would work on the pytorch docs website. It seems that in other code examples the ">>>" sign is used as well, therefore I used with too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19347

Differential Revision: D14986154

Pulled By: soumith

fbshipit-source-id: 8f4d07d71ff7777b46c459837f350eb0a1f17e84
2019-04-18 06:46:24 -07:00
973d51079b Add device-specific cuFFT plan caches (#19300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300

Differential Revision: D14986967

Pulled By: soumith

fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255
2019-04-18 06:39:35 -07:00
b8fb6eae88 Improve bmm() performance on CPU when input tensor is non-contiguous (#19338)
Summary:
This PR aims to improve Transformer performance on CPU; `bmm()` is one of the major bottlenecks now.

The current logic of `bmm()` on CPU only uses MKL batch gemm when the inputs `A` and `B` are contiguous or transposed, so when `A` or `B` is a slice of a larger tensor, it falls back to a slower path.

`A` and `B` are both 3D tensors. MKL is able to handle the batch matrix multiplication whenever `A.stride(1) == 1 || A.stride(2) == 1` and `B.stride(1) == 1 || B.stride(2) == 1`.
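For instance, slices like these satisfy the stride condition (shapes are illustrative):

```python
import torch

big_a = torch.randn(10, 128, 64)
big_b = torch.randn(10, 64, 128)
a = big_a[:, :100, :]  # a slice: not contiguous, but a.stride(2) == 1
b = big_b[:, :, :100]  # a slice: not contiguous, but b.stride(2) == 1
c = torch.bmm(a, b)    # can now take the MKL batch-gemm path
```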

From [fairseq](https://github.com/pytorch/fairseq) implementation of Transformer, multi-head attention has two places to call bmm(), [here](https://github.com/pytorch/fairseq/blob/master/fairseq/modules/multihead_attention.py#L167) and [here](https://github.com/pytorch/fairseq/blob/master/fairseq/modules/multihead_attention.py#L197), `q`, `k`, `v` are all slices from larger tensor. So the `bmm()` falls to slow path at the moment.

Results on a Xeon 6148 (2×20 cores @ 2.5 GHz) indicate this PR improves Transformer training performance by **48%** (seconds per iteration reduced from **5.48** to **3.70**); inference performance should also be boosted.

Before:
```
| epoch 001:   0%| | 27/25337 [02:27<38:31:26,  5.48s/it, loss=16.871, nll_loss=16.862, ppl=119099.70, wps=865, ups=0, wpb=4715.778, bsz=129.481, num_updates=27, lr=4.05e-06, gnorm=9.133,
```
After:
```
| epoch 001:   0%| | 97/25337 [05:58<25:55:49,  3.70s/it, loss=14.736, nll_loss=14.571, ppl=24339.38, wps=1280, ups=0, wpb=4735.299, bsz=131.134, num_updates=97, lr=1.455e-05, gnorm=3.908,
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19338

Differential Revision: D14986346

Pulled By: soumith

fbshipit-source-id: 827106245af908b8a4fda69ed0288d322b028f08
2019-04-18 06:34:17 -07:00
12d6f79ecd Optional inputs and outputs (#19289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19289

Allow optional inputs and outputs in native c10 operators

Reviewed By: dzhulgakov

Differential Revision: D14931927

fbshipit-source-id: 48f8bec009c6374345b34d933f148c08bb4f7118
2019-04-18 02:04:57 -07:00
fa96de2b3f Add some tests (#19288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19288

-

Reviewed By: dzhulgakov

Differential Revision: D14931924

fbshipit-source-id: 6c53b5d1679080939973d33868e58ca4ad70361d
2019-04-18 02:04:53 -07:00
601f36bacc Use string based schema for exposing caffe2 ops (#19287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19287

Since we now have a string-schema-based op registration API, we can also use it when exposing caffe2 operators.

Reviewed By: dzhulgakov

Differential Revision: D14931925

fbshipit-source-id: ec162469d2d94965e8c99d431c801ae7c43849c8
2019-04-18 02:04:50 -07:00
5ca22cce69 Allow registering ops without specifying the full schema (#19286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19286

The operator registration API now allows registering an operator by only giving the operator name and not the full operator schema,
as long as the operator schema can be inferred from the kernel function.

Reviewed By: dzhulgakov

Differential Revision: D14931921

fbshipit-source-id: 3776ce43d4ce67bb5a3ea3d07c37de96eebe08ba
2019-04-18 02:04:46 -07:00
a456e1e196 Add either type (#19285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19285

The either type is a tagged union with two members.
This is going to be used in a diff stacked on top to allow a function to return one of two types.

Also, generally, either<Error, Result> is a great pattern for returning value_or_error from a function without using exceptions and we could use this class for that later.
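The class itself is C++; purely as an illustration of the value-or-error pattern it enables, here is a hypothetical Python analogue:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

L = TypeVar("L")
R = TypeVar("R")

@dataclass
class Left(Generic[L]):   # conventionally the error side
    value: L

@dataclass
class Right(Generic[R]):  # conventionally the result side
    value: R

Either = Union[Left, Right]

def parse_int(s: str) -> Either:
    # Return a value or an error without raising an exception.
    try:
        return Right(int(s))
    except ValueError:
        return Left("not an integer: " + repr(s))
```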

Reviewed By: dzhulgakov

Differential Revision: D14931923

fbshipit-source-id: 7d1dd77b3e5b655f331444394dcdeab24772ab3a
2019-04-18 02:04:43 -07:00
12dcc77bcb Allow ops without tensor args if only fallback kernel exists (#19284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19284

Instantiating a dispatch table previously only worked when the op had a tensor argument we could dispatch on.
However, the legacy API for custom operators didn't have dispatch and also worked for operators without tensor arguments, so we need to continue supporting that.
It probably generally makes sense to support this as long as there's only a fallback kernel and no dispatched kernel registered.
This diff adds that functionality.

Reviewed By: dzhulgakov

Differential Revision: D14931926

fbshipit-source-id: 38fadcba07e5577a7329466313c89842d50424f9
2019-04-18 02:04:40 -07:00
8036af39d2 String-based schemas in op registration API (#19283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19283

Now that the function schema parser is available in ATen/core, we can use it from the operator registration API to register ops based on string schemas.

This does not allow registering operators based on only the name yet - the full schema string needs to be defined.
A diff stacked on top will add name based registration.

Reviewed By: dzhulgakov

Differential Revision: D14931919

fbshipit-source-id: 71e490dc65be67d513adc63170dc3f1ce78396cc
2019-04-18 01:03:40 -07:00
41dc54e291 Move function schema parser to ATen/core build target (#19282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19282

This is largely a hack because we need to use the function schema parser from ATen/core
but aren't clear yet on how the final software architecture should look like.

- Add function schema parser files from jit to ATen/core build target.
- Also move ATen/core build target one directory up to allow this.

We only change the build targets and don't move the files yet because this is likely
not the final build set up and we want to avoid repeated interruptions
for other developers. cc zdevito

Reviewed By: dzhulgakov

Differential Revision: D14931922

fbshipit-source-id: 26462e2e7aec9e0964706138edd3d87a83b964e3
2019-04-18 01:03:37 -07:00
789c438d86 Automatic update of fbcode/onnx to ad7313470a9119d7e1afda7edf1d654497ee80ab (#19339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19339

Previous import was 971311db58f2fa8306d15e1458b5fd47dbc8d11c

Included changes:
- **[ad731347](https://github.com/onnx/onnx/commit/ad731347)**: Fix shape inference for matmul (#1941) <Bowen Bao>
- **[3717dc61](https://github.com/onnx/onnx/commit/3717dc61)**: Shape Inference Tests for QOps (#1929) <Ashwini Khade>
- **[a80c3371](https://github.com/onnx/onnx/commit/a80c3371)**: Prevent unused variables from generating warnings across all platforms.  (#1930) <Pranav Sharma>
- **[be9255c1](https://github.com/onnx/onnx/commit/be9255c1)**: add title (#1919) <Prasanth Pulavarthi>
- **[7a112a6f](https://github.com/onnx/onnx/commit/7a112a6f)**: add quantization ops in onnx (#1908) <Ashwini Khade>
- **[6de42d7d](https://github.com/onnx/onnx/commit/6de42d7d)**: Create working-groups.md (#1916) <Prasanth Pulavarthi>

Reviewed By: yinghai

Differential Revision: D14969962

fbshipit-source-id: 5ec64ef7aee5161666ed0c03e201be0ae20826f9
2019-04-18 00:45:20 -07:00
fbf505cba7 Remove copy and copy_ special case on Type (#18972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18972
ghimport-source-id: b5d3012b00530145fa24ab0cab693a7e80cb5989

Differential Revision: D14816530

Pulled By: li-roy

fbshipit-source-id: 9c7a166abb22d2cd1f81f352e44d9df1541b1774
2019-04-18 00:21:43 -07:00
a64cce326f Add constant folding to ONNX graph during export (Resubmission) (#18698)
Summary:
Rewritten version of https://github.com/pytorch/pytorch/pull/17771 using graph C++ APIs.

This PR adds the ability to do constant folding on ONNX graphs during PT->ONNX export. This is done mainly to optimize the graph and make it leaner. The two attached snapshots show a multiple-node LSTM model before and after constant folding.
A couple of notes:
1. Constant folding is by default turned off for now. The goal is to turn it on by default once we have validated it through all the tests.
2. Support for folding in nested blocks is not in place, but will be added in the future, if needed.

**Original Model:**
![multiple_lstm_original](https://user-images.githubusercontent.com/23646532/53987630-6ac53980-40d6-11e9-9702-1ccfee124a83.JPG)
**Constant-folded model:**
![multiple_lstm_constant_folded](https://user-images.githubusercontent.com/23646532/53987632-6c8efd00-40d6-11e9-81c5-362c16f68861.JPG)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18698

Differential Revision: D14889768

Pulled By: houseroad

fbshipit-source-id: b6616b1011de9668f7c4317c880cb8ad4c7b631a
2019-04-18 00:10:04 -07:00
01d7d3de46 Remove usages of TypeID (#19183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19183
ghimport-source-id: 9af190b072523459fa61e5e79419b88ac8586a4d

Differential Revision: D14909203

Pulled By: li-roy

fbshipit-source-id: d716179c484aebfe3ec30087c5ecd4a11848ffc3
2019-04-17 23:55:47 -07:00
c7b1fdb767 Fixing function schema parser for Android (#19281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19281

String<->Number conversions aren't available in the STL used in our Android environment.
This diff adds workarounds for that so that the function schema parser can be compiled for android

Reviewed By: dzhulgakov

Differential Revision: D14931649

fbshipit-source-id: d5d386f2c474d3742ed89e52dff751513142efad
2019-04-17 23:50:17 -07:00
094678c04b Split function schema parser from operator (#19280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19280

We want to use the function schema parser from ATen/core, but with as little dependencies as possible.
This diff moves the function schema parser into its own file and removes some of its dependencies.

Reviewed By: dzhulgakov

Differential Revision: D14931651

fbshipit-source-id: c2d787202795ff034da8cba255b9f007e69b4aea
2019-04-17 23:50:15 -07:00
a1174dbc50 fix hub doc format
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19396

Differential Revision: D14993859

Pulled By: ailzhang

fbshipit-source-id: bdf94e54ec35477cfc34019752233452d84b6288
2019-04-17 23:43:56 -07:00
b31bab7860 Clang-format torch/csrc/jit/passes/quantization.cpp. (#19385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19385
ghimport-source-id: 67f808db7dcbcb6980eac79a58416697278999b0

Differential Revision: D14991917

Pulled By: ZolotukhinM

fbshipit-source-id: 6c2e57265cc9f0711752582a04d5a070482ed1e6
2019-04-17 22:08:41 -07:00
6732358bf9 Allow DDP to wrap multi-GPU modules (#19271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19271

allow DDP to take multi-gpu models
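A sketch of the newly allowed usage (assumes two GPUs and an already-initialized process group; note that no `device_ids` is passed for a multi-device module):

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class TwoDeviceModel(nn.Module):
    # A model whose pieces live on different GPUs.
    def __init__(self):
        super(TwoDeviceModel, self).__init__()
        self.part1 = nn.Linear(10, 10).to("cuda:0")
        self.part2 = nn.Linear(10, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))

model = DDP(TwoDeviceModel())  # no device_ids for multi-device modules
```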

Reviewed By: pietern

Differential Revision: D14822375

fbshipit-source-id: 1eebfaa33371766d3129f0ac6f63a573332b2f1c
2019-04-17 21:21:54 -07:00
c48e1679f9 Add validator for optimizers when parameters are shared
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18497

Reviewed By: kennyhorror

Differential Revision: D14614738

fbshipit-source-id: beddd8349827dcc8ccae36f21e5d29627056afcd
2019-04-17 21:10:38 -07:00
2787f1d8ed hub minor fixes (#19247)
Summary:
A few improvements while doing bert model
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19247

Differential Revision: D14989345

Pulled By: ailzhang

fbshipit-source-id: f4846813f62b6d497fbe74e8552c9714bd8dc3c7
2019-04-17 21:04:33 -07:00
776fec0f9f fix wrong schema (#19370)
Summary:
Op was improperly schematized previously.  Evidently checkScript does not test if the outputs are the same type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19370

Differential Revision: D14985159

Pulled By: eellison

fbshipit-source-id: feb60552afa2a6956d71f64801f15e5fe19c3a91
2019-04-17 19:55:30 -07:00
bf5f30f39b Fix printing format in examples in jit/README.md. (#19323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19323
ghimport-source-id: 74a01917de70c9d59099cf601b24f3cb484ab7be

Differential Revision: D14990100

Pulled By: ZolotukhinM

fbshipit-source-id: 87ede08c8ca8f3027b03501fbce8598379e8b96c
2019-04-17 18:38:09 -07:00
48859e3ad3 Allow for single-line deletions in clang_tidy.py (#19082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19082

When you have just one line of deletions, just as with additions, there is no count printed.
Without this fix, we ignore all globs with single-line deletions when selecting which lines were changed.
When all the changes in the file were single-line, this meant no line-filtering at all!

Differential Revision: D14860426

fbshipit-source-id: c60e9d84f9520871fc0c08fa8c772c227d06fa27
2019-04-17 17:02:30 -07:00
242743eedb Revert D14901379: [jit] Add options to Operator to enable registration of alias analysis passes
Differential Revision:
D14901379

Original commit changeset: d92a497e280f

fbshipit-source-id: 51d31491ab90907a6c95af5d8a59dff5e5ed36a4
2019-04-17 16:56:14 -07:00
0414f23855 Revert D14901485: [jit] Only require python print on certain namespaces
Differential Revision:
D14901485

Original commit changeset: 4b02a66d325b

fbshipit-source-id: 93348056c00f43c403cbf0d34f8c565680ceda11
2019-04-17 16:56:11 -07:00
5fa1aad670 Remove unused template parameter in OnnxifiOp (#19362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19362

The `float` type is never used in OnnxifiOp...

Reviewed By: bddppq

Differential Revision: D14977970

fbshipit-source-id: 8fee02659dbe408e5a3e0ff95d74c04836c5c281
2019-04-17 16:48:14 -07:00
ad8f34fcca Add empty_quantized (#18960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18960

empty_affine_quantized creates an empty affine quantized Tensor from scratch.
We might need this when we implement quantized operators.

Differential Revision: D14810261

fbshipit-source-id: f07d8bf89822d02a202ee81c78a17aa4b3e571cc
2019-04-17 16:17:40 -07:00
4371cb5e01 Cast not expressions to bool (#19361)
Summary:
As part of implicitly casting condition statements, we should be casting `not` expressions as well.
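A sketch of what this enables in TorchScript:

```python
import torch

@torch.jit.script
def relu_unless_empty(x: torch.Tensor) -> torch.Tensor:
    # `not` over a non-bool now casts its operand to bool, just like a
    # bare `if x.numel():` condition already did.
    if not x.numel():
        return x
    return torch.relu(x)
```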
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19361

Differential Revision: D14984275

Pulled By: eellison

fbshipit-source-id: f8dae64f74777154c25f7a6bcdac03cf44cbb60b
2019-04-17 16:06:48 -07:00
d6b91075dc Eliminate type dispatch from copy_kernel, and use memcpy directly rather than implementing our own copy. (#19198)
Summary:
It turns out that copying bytes is the same no matter what type
they're interpreted as, and memcpy is already vectorized on every
platform of note.  Paring this down to the simplest implementation
saves just over 4KB off libtorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19198

Differential Revision: D14922656

Pulled By: resistor

fbshipit-source-id: bb03899dd8f6b857847b822061e7aeb18c19e7b4
2019-04-17 15:39:13 -07:00
3e0b46b6d1 Only require python print on certain namespaces (#18846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18846
ghimport-source-id: b211e15d24c88fdc32d79222d9fce2fa9c291541

Differential Revision: D14901485

Pulled By: bwasti

fbshipit-source-id: 4b02a66d325ba5391d1f838055aea13b5e4f6485
2019-04-17 14:24:50 -07:00
3a031c414a Add options to Operator to enable registration of alias analysis passes (#18589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18589
ghimport-source-id: dab203f6be13bf41963848f5315235b6bbe45c08

Differential Revision: D14901379

Pulled By: bwasti

fbshipit-source-id: d92a497e280f1b0a63b11a9fd8ae9b48bf52e6bf
2019-04-17 13:14:55 -07:00
eaa14f5f59 Error out on in-place binops on tensors with internal overlap (#19317)
Summary:
This adds checks for `mul_`, `add_`, `sub_`, `div_`, the most common
binops. See #17935 for more details.
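A sketch of the newly rejected pattern: an expanded tensor maps many view elements onto one memory location, so an in-place binop would be ill-defined.

```python
import torch

base = torch.zeros(1, 3)
t = base.expand(4, 3)  # internal overlap: all four rows alias the same memory
try:
    t.add_(1)          # now raises instead of silently writing racy results
except RuntimeError as e:
    print(e)           # complains about elements sharing a memory location
```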
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19317

Differential Revision: D14972399

Pulled By: zou3519

fbshipit-source-id: b9de331dbdb2544ee859ded725a5b5659bfd11d2
2019-04-17 13:02:07 -07:00
ff4a4d6155 Update for #19326
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19367

Differential Revision: D14981835

Pulled By: VitalyFedyunin

fbshipit-source-id: e8a97986d9669ed7f465a7ba771801bdd043b606
2019-04-17 12:56:08 -07:00
aad6f97898 Decorator to make sure we can import core from caffe2 (#19273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19273

Some of the CI jobs fail if protobuf is not installed. Protobuf is imported as part of `caffe2.python.core`, so this adds a skip decorator to avoid running tests that depend on `caffe2.python.core`.
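A hedged sketch of the pattern (the helper name below is hypothetical; the real decorator lives in the diff):

```python
import unittest

try:
    from caffe2.python import core  # noqa: F401  # pulls in protobuf
    HAS_CAFFE2_CORE = True
except ImportError:
    HAS_CAFFE2_CORE = False

def skipIfNoCaffe2Core(test_item):
    # Skip tests that need caffe2.python.core when protobuf is missing.
    return unittest.skipIf(
        not HAS_CAFFE2_CORE, "caffe2.python.core (protobuf) not available"
    )(test_item)
```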

Reviewed By: jianyuh

Differential Revision: D14936387

fbshipit-source-id: e508a1858727bbd52c951d3018e2328e14f126be
2019-04-17 11:22:49 -07:00
f1f31b634d Eliminate AdjustBatch ops (#19083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19083

As we have discussed, there are too many AdjustBatch ops; they incur reallocation overhead and affect performance. We will eliminate these ops by
- inlining the input adjust batch op into Glow
- inlining the output adjust batch op into OnnxifiOp, and doing so only conditionally.

This is the C2 part of the change and requires change from Glow side to work e2e.

Reviewed By: rdzhabarov

Differential Revision: D14860582

fbshipit-source-id: ac2588b894bac25735babb62b1924acc559face6
2019-04-17 10:00:25 -07:00
3fcee4875c Add rst entry for nn.MultiheadAttention (#19346)
Summary:
Fix #19259 by adding the missing `autoclass` entry for `nn.MultiheadAttention` from [here](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/activation.py#L676)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19346

Differential Revision: D14971426

Pulled By: soumith

fbshipit-source-id: ceaaa8ea4618c38fa2bff139e7fa0d6c9ea193ea
2019-04-17 04:40:28 -07:00
db611b7caf Delete C10Tensor (#19328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19328

Plans changed and we don't want this class anymore.

Reviewed By: dzhulgakov

Differential Revision: D14966746

fbshipit-source-id: 09ea4c95b352bc1a250834d32f35a94e401f2347
2019-04-17 00:02:27 -07:00
33443d083e Fix python lint (#19331)
Summary:
VitalyFedyunin jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19331

Differential Revision: D14969435

Pulled By: bddppq

fbshipit-source-id: c1555c52064758ecbe668f92b837f2d7524f6118
2019-04-16 21:47:30 -07:00
58d4414c33 Profiling pipeline part1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18772

Differential Revision: D14952781

Pulled By: Krovatkin

fbshipit-source-id: 1e99fc9053c377291167f0b04b0f0829b452dbc4
2019-04-16 21:21:08 -07:00
93201d0676 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19310

Differential Revision: D14952046

Pulled By: soumith

fbshipit-source-id: 1bbaaad6f932a832ea8e5e804d0d9cd9140a5071
2019-04-16 20:31:15 -07:00
06c28d8a12 Add slicing and int_repr() to QTensor (#19296)
Summary:
Stack:
- **#19296 [pt1][quant] Add slicing and int_repr() to QTensor** (D14756833)
- #18960 [pt1][quant] Add empty_quantized (D14810261)
- #19312 Use the QTensor with QReLU (D14819460)
- #19319 [RFC] Quantized SumRelu (D14866442)

Methods added to pytorch python frontend:
- int_repr() returns a CPUByte Tensor which copies the data of QTensor.
- Added as_strided for QTensorImpl which provides support for slicing a QTensor(see test_torch.py)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19296

Differential Revision: D14756833

Pulled By: jerryzh168

fbshipit-source-id: 6f4c92393330e725c4351d6ff5f5fe9ac7c768bf
2019-04-16 20:17:21 -07:00
33e7977154 move const defs of DeviceType to DeviceType.h (#19185)
Summary:
Stack:
- **#19185 [c10][core][ez] move const defs of DeviceType to DeviceType.h** (D14909415)

att
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19185

Differential Revision: D14909415

Pulled By: jerryzh168

fbshipit-source-id: 876cf999424d8394f5ff20e6750133a4e43466d4
2019-04-16 20:02:21 -07:00
5627940e9c Add a fast path for batch-norm CPU inference. (#19152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19152

Adding a fast path for batch-norm CPU inference when all tensors are contiguous.
* Leverage vectorization through simple loops.
* Fold linear terms before computation.
* For resnext-101, this version gets 18.95 times faster.
* Add a microbenchmark:
  * (buck build mode/opt -c python.package_style=inplace --show-output //caffe2/benchmarks/operator_benchmark:batchnorm_benchmark) && \
    (OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/batchnorm_benchmark#binary.par)
  * batch_norm: data shape: [1, 256, 3136], bandwidth: 22.26 GB/s
  * batch_norm: data shape: [1, 65536, 1], bandwidth: 5.57 GB/s
  * batch_norm: data shape: [128, 2048, 1], bandwidth: 18.21 GB/s

Reviewed By: soumith, BIT-silence

Differential Revision: D14889728

fbshipit-source-id: 20c9e567e38ff7dbb9097873b85160eca2b0a795
2019-04-16 19:27:54 -07:00
ff0a7ae43f Testing for folded conv_bn_relu (#19298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19298

Proper testing for conv_bn_relu folding

Differential Revision: D13998891

fbshipit-source-id: ceb58ccec19885cbbf38964ee0d0db070e098b4a
2019-04-16 19:04:06 -07:00
9f35185b56 Initialize intra-op threads in JIT thread pool (#19058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19058
ghimport-source-id: 53e87df8d93459259854a17d4de3348e463622dc

Differential Revision: D14849624

Pulled By: ilia-cher

fbshipit-source-id: 5043a1d4330e38857c8e04c547526a3ba5b30fa9
2019-04-16 18:27:22 -07:00
5e7bc26f65 Fix ASSERT_ANY_THROW. (#19321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19321
ghimport-source-id: 9efffc36950152105bd0dc13f450161367101410

Differential Revision: D14962184

Pulled By: ZolotukhinM

fbshipit-source-id: 22d602f50eb5e17a3e3f59cc7feb59a8d88df00d
2019-04-16 15:44:53 -07:00
78f589e794 Add len() for strings (#19320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19320
ghimport-source-id: 62131cb24e9bf65f0ef3e60001cb36509a1f4163
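A minimal example of the new builtin in TorchScript:

```python
import torch

@torch.jit.script
def str_len(s: str) -> int:
    return len(s)  # len() now works on strings inside TorchScript

print(str_len("hello"))  # 5
```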

Reviewed By: bethebunny

Differential Revision: D14961078

Pulled By: driazati

fbshipit-source-id: 08b9a4b10e4a47ea09ebf55a4743defa40c74698
2019-04-16 15:11:33 -07:00
df67969e6b Step 3: Add support for return_counts to torch.unique for dim not None (#18650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18650
ghimport-source-id: 75759c95e6c48e27c172b919097dbc40c6bfb5e6
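The resulting API, sketched:

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [1, 2]])
values, counts = torch.unique(x, dim=0, return_counts=True)
print(values)  # tensor([[1, 2], [3, 4]])
print(counts)  # tensor([2, 1])
```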

Differential Revision: D14892319

Pulled By: VitalyFedyunin

fbshipit-source-id: ec5d1b80fc879d273ac5a534434fd648468dda1e
2019-04-16 14:06:45 -07:00
c8897d2263 invoke NN smoketests from a python loop instead of a batch file (#18756)
Summary:
I tried first to convert the `.bat` script to a Bash `.sh` script, but I got this error:
```
[...]/build/win_tmp/ci_scripts/test_python_nn.sh: line 3: fg: no job control
```
Line 3 was where `%TMP_DIR%/ci_scripts/setup_pytorch_env.bat` was invoked.

I found a potential workaround on Stack Overflow of adding the `monitor` (`-m`) flag to the script, but that didn't work either:

```
00:58:00 /bin/bash: cannot set terminal process group (3568): Inappropriate ioctl for device
00:58:00 /bin/bash: no job control in this shell
00:58:00 + %TMP_DIR%/ci_scripts/setup_pytorch_env.bat
00:58:00 /c/Jenkins/workspace/pytorch-builds/pytorch-win-ws2016-cuda9-cudnn7-py3-test1/build/win_tmp/ci_scripts/test_python_nn.sh: line 3: fg: no job control
```

So instead I decided to use Python to replace the `.bat` script.  I believe this is an improvement in that it's both "table-driven" now and cross-platform.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18756

Differential Revision: D14957570

Pulled By: kostmo

fbshipit-source-id: 87794e64b56ffacbde4fd44938045f9f68f7bc2a
2019-04-16 13:11:03 -07:00
1c5073fb4b Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors (#18952)
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.

Now compatible with both torch scripts:
 `  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)`
and
`  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))`

The same is checked for all similar functions (`rand_like`, `empty_like`, and others).

It is a fixed version of #18455
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952

Differential Revision: D14801792

Pulled By: VitalyFedyunin

fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba
2019-04-16 11:06:15 -07:00
31686805f2 Enable unit tests for ROCm 2.3 (#19307)
Summary:
Unit tests that hang on clock64() calls are now fixed.

test_gamma_gpu_sample is now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19307

Differential Revision: D14953420

Pulled By: bddppq

fbshipit-source-id: efe807b54e047578415eb1b1e03f8ad44ea27c13
2019-04-16 10:58:27 -07:00
e1f38a847d Fix type conversion in dequant and add a test (#19226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19226

Type conversion was wrong previously. Thanks zafartahirov for finding it!

Differential Revision: D14926610

fbshipit-source-id: 6824f9813137a3d171694d743fbb437a663b1f88
2019-04-16 10:52:44 -07:00
da4ff17eee math module support (#19115)
Summary:
This PR refers to issue [#19026](https://github.com/pytorch/pytorch/issues/19026).
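A minimal sketch of what this enables in TorchScript:

```python
import math

import torch

@torch.jit.script
def hypot(a: float, b: float) -> float:
    return math.sqrt(a * a + b * b)  # math.* builtins now usable in script

print(hypot(3.0, 4.0))  # 5.0
```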
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19115

Differential Revision: D14936053

Pulled By: driazati

fbshipit-source-id: 68d5f33ced085fcb8c10ff953bc7e99df055eccc
2019-04-16 10:44:07 -07:00
344acaa0ca Revert replicate.py to disallow replicating multi-device modules (#19278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19278

Based on discussion in https://github.com/pytorch/pytorch/pull/19278 and https://github.com/pytorch/pytorch/pull/18687, changes to replicate.py will be reverted to disallow replicating multi-device modules.

Reviewed By: pietern

Differential Revision: D14940018

fbshipit-source-id: 7504c0f4325c2639264c52dcbb499e61c9ad2c26
2019-04-16 10:03:38 -07:00
b9c20d5224 graph_for based on last_optimized_executed_graph (#19142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19142
ghimport-source-id: 822013fb7e93032c74867fc77c6774c680aef6d1

Differential Revision: D14888703

Pulled By: zdevito

fbshipit-source-id: a2ad65a042d08b1adef965c2cceef37bb5d26ba9
2019-04-16 09:17:53 -07:00
3b29cbaf86 Enable half for CUDA dense EmbeddingBag backward. (#19293)
Summary:
I audited the relevant kernel and saw it accumulates a good deal into float
so it should be fine.
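A sketch of the newly enabled path (assumes a CUDA device is available):

```python
import torch

emb = torch.nn.EmbeddingBag(10, 4).cuda().half()
inp = torch.tensor([1, 2, 4, 5], device="cuda")
offsets = torch.tensor([0, 2], device="cuda")
out = emb(inp, offsets)   # half-precision dense EmbeddingBag
out.sum().backward()      # backward now supported for half on CUDA
```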
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19293

Differential Revision: D14942274

Pulled By: zou3519

fbshipit-source-id: 36996ba0fbb29fbfb12b27bfe9c0ad1eb012ba3c
2019-04-16 08:57:20 -07:00
3501576230 calculate execution time based on final iterations (#19299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19299

I saw larger than 5% performance variation with small operators; this diff aims to reduce the variation by avoiding Python overhead. Previously, the benchmark ran the main loop for 100 iterations and then looked at the time. If it was not significant, we doubled the number of iterations, reran, and looked at the result again, continuing until the measurement became significant. We calculated the time as total_time / number_of_iterations, which folds in the Python trigger overhead of every run.

Now, I change the logic to calculate execution time based on the last run instead of all runs; the equation is time_in_last_run / number_of_iterations.
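A sketch of the measurement logic as described (the names are hypothetical):

```python
import time

def measure(op, significance_threshold_s=0.1):
    iters = 100
    while True:
        start = time.time()
        for _ in range(iters):
            op()
        elapsed = time.time() - start
        if elapsed >= significance_threshold_s:
            # Report using only the final run, so the Python trigger
            # overhead of the earlier, shorter runs is excluded.
            return elapsed / iters
        iters *= 2  # not significant yet: double the iterations and rerun
```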

Reviewed By: hl475

Differential Revision: D14925287

fbshipit-source-id: cb646298c08a651e27b99a5547350da367ffff47
2019-04-16 08:57:17 -07:00
646cb6157d Move OMP/MKL thread initialization into ATen/Parallel (#19011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19011
ghimport-source-id: 432e31eccfd0e59fa21a790f861e6b2ff4fdbac6

Differential Revision: D14846034

Pulled By: ilia-cher

fbshipit-source-id: d9d03c761d34bac80e09ce776e41c20fd3b04389
2019-04-16 00:16:32 -07:00
20fc7b6ec7 Avoid undefined symbol error when building AdIndexer LTO (#19009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19009

Move the definition of `MulFunctor<>::Backward()` into a header file.

Reviewed By: BIT-silence

Differential Revision: D14823230

fbshipit-source-id: 1efaec01863fcc02dcbe7e788d376e72f8564501
2019-04-15 23:43:13 -07:00
ada10ad416 Ellipsis in subscript
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17763

Differential Revision: D14893533

Pulled By: Krovatkin

fbshipit-source-id: c46b4e386d3aa30e6dc03e3052d2e5ff097fa74b
2019-04-15 22:10:44 -07:00
f1c8e01524 Add input information in RecordFunction calls (#18717)
Summary:
Add input information into generated RecordFunction calls in
VariableType wrappers, JIT operators and a few more locations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18717

Differential Revision: D14729156

Pulled By: ilia-cher

fbshipit-source-id: 811ac4cbfd85af5c389ef030a7e82ef454afadec
2019-04-15 20:28:08 -07:00
84b264b17d Add NHWC order support in the cost inference function of 3d conv (#19170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19170

As title
The quantized resnext3d model in production got the following failures without the fix:

```
 Caffe2 operator Int8ConvRelu logging error: [enforce fail at conv_pool_op_base.h:463] order == StorageOrder::NCHW. 1 vs 2. Conv3D only supports NCHW on the production quantized model
```

Reviewed By: jspark1105

Differential Revision: D14894276

fbshipit-source-id: ef97772277f322ed45215e382c3b4a3702e47e59
2019-04-15 16:47:22 -07:00
ffc9e29844 unit test with multiple op invocations (#19118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19118

A bug introduced by D14700576 and reported by Yufei (fixed by D14778810 and D14785256) was not detected by our unit tests.
This diff improves the unit tests to catch such errors (with this diff and without D14778810, we can reproduce the bug Yufei reported).
This improvement also revealed a bug that affects accuracy when we pre-pack weight and bias together and the pre-packed weight/bias are used by multiple nets: we were modifying the pre-packed bias in place, even though it was supposed to be constant.

Reviewed By: csummersea

Differential Revision: D14806077

fbshipit-source-id: aa9049c74b6ea98d21fbd097de306447a662a46d
2019-04-15 14:41:28 -07:00
00148825fc Run shellcheck on Jenkins scripts (#18874)
Summary:
closes #18873

Doesn't fail the build on warnings yet.
Also fix most severe shellcheck warnings
Limited to `.jenkins/pytorch/` at this time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18874

Differential Revision: D14936165

Pulled By: kostmo

fbshipit-source-id: 1ee335695e54fe6c387ef0f6606ea7011dad0fd4
2019-04-15 12:48:52 -07:00
a0263ec047 Make DistributedDataParallel use new reducer (#18953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18953

This removes Python side bucketing code from DistributedDataParallel
and replaces it with calls to the new C++ based bucketing and reducing
code. To confirm this is working well, we ran a test with both the
previous implementation and the new implementation, and confirmed they
are numerically equivalent.

Performance is improved by a couple percent or more, including the
single machine multiple GPU runs.

Closes #13273.

Reviewed By: mrshenli

Differential Revision: D14580911

fbshipit-source-id: 44e76f8b0b7e58dd6c91644e3df4660ca2ee4ae2
2019-04-15 12:44:38 -07:00
6ed57e052d Fix the return value of ParseFromString (#19262)
Summary:
Fix the return value of ParseFromString.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19262

Differential Revision: D14937605

Pulled By: ezyang

fbshipit-source-id: 3f441086517186a075efb3d74f09160463b696b3
2019-04-15 12:39:29 -07:00
3403cb857b Modify Cholesky derivative (#19116)
Summary:
The derivative of the Cholesky decomposition was previously a triangular matrix.

Changelog:
- Modify the derivative of Cholesky from a triangular matrix to symmetric matrix
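A quick check of the new behavior:

```python
import torch

a = torch.randn(3, 3, dtype=torch.double)
a = a @ a.t() + 3 * torch.eye(3, dtype=torch.double)  # make it SPD
a.requires_grad_(True)
torch.cholesky(a).sum().backward()
print(torch.allclose(a.grad, a.grad.t()))  # True: the gradient is symmetric
```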
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19116

Differential Revision: D14935470

Pulled By: ezyang

fbshipit-source-id: 1c1c76b478c6b99e4e16624682842cb632e8e8b9
2019-04-15 12:16:55 -07:00
991279dc7d produce diagram for caffe2 build matrix (#18517)
Summary:
This PR splits the configuration tree data from the logic used to construct the tree, for both `pytorch` and `caffe2` build configs.

Caffe2 configs are also now illustrated in a diagram.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18517

Differential Revision: D14936170

Pulled By: kostmo

fbshipit-source-id: 7b40a88512627377c5ea0f24765dabfef76ca279
2019-04-15 11:45:32 -07:00
7caad0ed33 Free all blocks with outstanding events on OOM-retry (#19222)
Summary:
The caching allocator tries to free all blocks on an out-of-memory
error. Previously, it did not free blocks that still had outstanding
stream uses. This change synchronizes on the outstanding events and
frees those blocks.

See #19219
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19222

Differential Revision: D14925071

Pulled By: colesbury

fbshipit-source-id: a2e9fe957ec11b00ea8e6c0468436c519667c558
2019-04-15 11:29:27 -07:00
86619b8ba6 Make sure that any of the future versions can load and execute older models. (#19174)
Summary:
Helps to test #18952
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19174

Differential Revision: D14899474

Pulled By: VitalyFedyunin

fbshipit-source-id: a4854ad44da28bd0f5115ca316e6078cbfe29d0d
2019-04-15 10:49:31 -07:00
68c4ebbeeb Sync fbcode/caffe2 and xplat/caffe2 (1) (#19218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19218

Sync some contents between fbcode/caffe2 and xplat/caffe2 to move closer towards a world where they are identical.

Reviewed By: dzhulgakov

Differential Revision: D14919916

fbshipit-source-id: 29c6b6d89ac556d58ae3cd02619aca88c79591c1
2019-04-13 21:45:52 -07:00
2060e44ec8 upgrade bazel version in CI [xla ci] (#19246)
Summary:
The latest TF requires upgrading bazel version.
This PR should fix xla tests in CI.
[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19246

Differential Revision: D14929533

Pulled By: ailzhang

fbshipit-source-id: f6deb31428ed39f267d96bb9814d06f76641e73b
2019-04-13 20:16:37 -07:00
1c099fd5c9 Update docker images to use ROCm 2.3 (#19231)
Summary:
xw285cornell petrex iotamudelta

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger-test/24676/
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger-test/17679/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger/24652/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger/9943/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19231

Differential Revision: D14928580

Pulled By: bddppq

fbshipit-source-id: 025b0affa6bcda6ee9f823dfc6c2cf8b92e71027
2019-04-13 13:11:26 -07:00
10bc789dff fix flake8 (#19243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19243
ghimport-source-id: ae80aed3a5742df21afb6e55979686220a27cce7

Differential Revision: D14928670

Pulled By: zdevito

fbshipit-source-id: 20ec0d5c8d6f1c515beb55e2e63eddf3b2fc12dd
2019-04-13 10:04:39 -07:00
e958ceb5d7 Remove GraphExecutor's python bindings (#19141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19141
ghimport-source-id: 796a41f5514d29959af052fcf5391a2834850a80

Reviewed By: jamesr66a

Differential Revision: D14888702

Pulled By: zdevito

fbshipit-source-id: c280145f08e7bc210434d1c99396a3257b626cf9
2019-04-13 08:42:24 -07:00
ddda563f22 Cleanup ScriptModule bindings (#19138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19138
ghimport-source-id: 10f810f5e7551c1cb65fc4799744083bd7ffd1ee

Reviewed By: jamesr66a

Differential Revision: D14886945

Pulled By: zdevito

fbshipit-source-id: a5e5bb08694d03166a7516ec038656c2a02e7896
2019-04-13 08:42:21 -07:00
dcb5fd3613 get propagate_shape logic out of module.h (#19137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19137
ghimport-source-id: 2394765f2d401e68ffdfa4c985bfab4cca2517f8

Reviewed By: jamesr66a

Differential Revision: D14885946

Pulled By: zdevito

fbshipit-source-id: daa2894ed9761107e9d273bb172840dc23ace072
2019-04-13 08:42:17 -07:00
1827ca4c35 Make debug subgraph inlining thread local (#19136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19136
ghimport-source-id: 3a24ab36aa753ce5cce7bba3467bdbe88e5c7f60

Reviewed By: jamesr66a

Differential Revision: D14885051

Pulled By: zdevito

fbshipit-source-id: b39c6ceef73ad9caefcbf8f40dd1b9132bba03c2
2019-04-13 08:42:14 -07:00
c38c7b0ec5 Support Kwargs in C++ Function/Method calls (#19086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19086
ghimport-source-id: 7790a5cc6e32f6f72e92add0b9f76dfa49ad9859

Reviewed By: jamesr66a

Differential Revision: D14875729

Pulled By: zdevito

fbshipit-source-id: ad1e4542381d9c33722155459e794f1ba4660dbb
2019-04-13 08:42:11 -07:00
d8669a2c7e Enable working ROCm tests (#19169)
Summary:
Enable multi-GPU tests that work with ROCm 2.2. Have been run three times on CI to ensure stability.

While there, remove skipIfRocm annotations for tests that depend on MAGMA. They still skip but now for the correct reason (no MAGMA) to improve our diagnostics.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19169

Differential Revision: D14924812

Pulled By: bddppq

fbshipit-source-id: 8b88f58bba58a08ddcd439e899a0abc6198fef64
2019-04-12 21:51:10 -07:00
ca02558d40 import warnings in torch.hub & fix master CI travis (#19181)
Summary:
fix missing import in #18758
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19181

Differential Revision: D14908198

Pulled By: ailzhang

fbshipit-source-id: 31e0dc4a27521103a1b93f72511ae1b64a36117f
2019-04-12 21:35:31 -07:00
bba96db2a5 fix lint errors in gen.py (#19221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19221

att

Reviewed By: colesbury

Differential Revision: D14923858

fbshipit-source-id: 4793d7794172d401455c5ce72dfc27dddad515d4
2019-04-12 18:26:38 -07:00
b1539412db Add pass registration mechanism (#18587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18587
ghimport-source-id: 80d753f7046a2a719e0c076684f44fa2059a0921

Differential Revision: D14901227

Pulled By: bwasti

fbshipit-source-id: 56511d0313419b63945a36b80e9ea51abdef2bd4
2019-04-12 15:32:00 -07:00
a3d3008e73 JIT Layernorm fusion (#18266)
Summary:
Partially fuse layer_norm by decomposing layer_norm into the batchnorm kernel that computes the stats, and then fusing the affine operations after the reduce operations, this is similar to the batchnorm fusion that apaszke did, it also only works in inference mode now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266

Differential Revision: D14879877

Pulled By: wanchaol

fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe
2019-04-12 14:38:31 -07:00
0e435afc3c Add more debugging helper to net transformer (#19176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19176

Add some amenities for debugging.

Reviewed By: llyfacebook

Differential Revision: D14901740

fbshipit-source-id: 2c4018fdbf7e3aba2a754b6b4103a72893c229c2
2019-04-12 14:28:37 -07:00
1c836e7bb9 Add Quantized Backend (#18546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18546

We'll expose all combinations of various ways of quantization in the top level dispatch key, that is we have AffineCPUTensor, PerChannelAffineCUDATensor, etc.

QTensor method added:
- is_quantized()
- item()

Differential Revision: D14637671

fbshipit-source-id: 346bc6ef404a570f0efd34e8793056ad3c7855f5
2019-04-12 12:55:49 -07:00
3f7ddd269c Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim (#18649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18649
ghimport-source-id: 3411d240a6af5fe299a889667964730184e30645

Differential Revision: D14888292

Pulled By: VitalyFedyunin

fbshipit-source-id: 80da83c264598f74ab8decb165da4a1ce2b352bb
2019-04-12 12:41:20 -07:00
bd55abb463 Fix onnx ints (#19102)
Summary:
If JIT constant propagation doesn't work, we have to handle the ListConstructor in symbolic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19102

Reviewed By: zrphercule

Differential Revision: D14875588

Pulled By: houseroad

fbshipit-source-id: d25c847d224d2d32db50aae1751100080e115022
2019-04-12 12:01:14 -07:00
c480798a1c use C10_REGISTER for GELU op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19090

Reviewed By: BIT-silence

Differential Revision: D14864737

fbshipit-source-id: 8debd53171f7068726f0ab777a13ca46becbfbdf
2019-04-12 11:41:04 -07:00
79db4e9c10 Fix tabs lint. (#19196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19196
ghimport-source-id: c10b1b19b087d7650e1614f008a9c2db21dfec2f

Differential Revision: D14913428

Pulled By: ezyang

fbshipit-source-id: 815b919d8e4516d0e5d89ebbdc4dff6d1d08da47
2019-04-12 11:22:05 -07:00
65ae897ae8 Pin nvidia-container-runtime version (#19195)
Summary:
This PR is to fix the CI error:
```
nvidia-docker2 : Depends: nvidia-container-runtime (= 2.0.0+docker18.09.4-1) but 2.0.0+docker18.09.5-1 is to be installed
E: Unable to correct problems, you have held broken packages.
Exited with code 100
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19195

Differential Revision: D14913104

Pulled By: yf225

fbshipit-source-id: d151205f5ffe9cac7320ded3c25baa7e051c3623
2019-04-12 10:00:40 -07:00
deda88e0aa One more fix for #18790
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19187

Differential Revision: D14913100

Pulled By: ezyang

fbshipit-source-id: bf147747f933a2c9a35f3ff00bf6b83a4f29286c
2019-04-12 09:29:15 -07:00
7e73783c6f Fix promoteTypes for QInt types (#19182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19182

This is a bug discovered by zafartahirov, right now if one of the tensor is QInt
type we'll return undefined, but actually we want to allow ops that accepts
Tensors of the same QInt type to work.

Reviewed By: zafartahirov

Differential Revision: D14909172

fbshipit-source-id: 492fd6403da8c56e180efe9d632a3b7fc879aecf
2019-04-11 19:43:18 -07:00
422b01e788 Replace more usages of Type with DeprecatedTypeProperties (#19093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19093
ghimport-source-id: a82e3dce912a173b42a6a7e35eb1302d9f334e03

Differential Revision: D14865520

Pulled By: li-roy

fbshipit-source-id: b1a8bf32f87920ce8d82f990d670477bc79d0ca7
2019-04-11 17:02:05 -07:00
fea0a0be53 Support attributes when copying modules (#19040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19040
ghimport-source-id: 37933efd717795751283cae8141e2e2caaae2e95

Reviewed By: eellison

Differential Revision: D14895573

Pulled By: driazati

fbshipit-source-id: bc2723212384ffa673d2a8df2bb57f38c62cc104
2019-04-11 15:38:06 -07:00
4ae59e4744 Move version_counter_ to TensorImpl (#18223)
Summary:
According to https://github.com/pytorch/pytorch/issues/13638#issuecomment-468055428, after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18223

Differential Revision: D14735123

Pulled By: yf225

fbshipit-source-id: 15f690311393ffd5a53522a226da82f5abb6c65b
2019-04-11 15:12:45 -07:00
507fe66bea Enable comp ops for bool tensor (#19109)
Summary:
Enabled comparison ops for bool tensors
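For example:

```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])
print(a == b)  # elementwise equality on bool tensors
print(a > b)   # ordering comparisons too (True > False)
```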
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19109

Differential Revision: D14871187

Pulled By: izdeby

fbshipit-source-id: cf9951847d69124a93e5e21dd0a39c9568b1037d
2019-04-11 14:37:10 -07:00
c7b5a8a876 Change is_variable() to check existence of AutogradMeta, and remove is_variable_ (#19139)
Summary:
Currently, a TensorImpl's `is_variable_` is true if and only if the TensorImpl has AutogradMeta. This PR unifies these two concepts by removing `is_variable_` and change `is_variable()` to check existence of AutogradMeta instead.

Removing `is_variable_` is part of the work in Variable/Tensor merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19139

Differential Revision: D14893339

Pulled By: yf225

fbshipit-source-id: ceb5e22c3c01f79b5d21d5bdbf4a7d1bc397796a
2019-04-11 14:03:33 -07:00
ef406ee925 First class modules in the compiler, round 2 (#19167)
Summary:
This PR propagates where we use first-class modules objects into the compiler. This creates a transitionary state where:

* compiler.cpp creates Graphs where `self` is a Module class and attributes/parameters/buffers/submodules are looked up with `prim::GetAttr`
* GraphExecutor still runs "lowered graphs" where the self object has been removed by a compiler pass `lower_first_class_method`.
* Tracing still creates "lowered graphs", and a pass "lift_lowered_method" creates a first-class method graph for things.

* This PR separates out Method and Function. A script::Function is a pure Graph with no `self` bound.  Similar to Python, a script::Method is just a bound `self` and its underlying `script::Function`.
* This PR also separates CompilationUnit from Module. A CompilationUnit is just a list of named script::Functions.  Class's have a CompilationUnit holding the class methods, and Modules also have a CompilationUnit holding their Methods. This avoids the weird circular case Module --has a-> Class -> has a -> Module ...

Details:
* In this transitionary state, we maintain two copies of a Graph: first-class module and lowered. The first-class one has a self argument that is the module's class type. The lowered one is the lowered graph that uses the initial_ivalues inputs.
* When defining lowered methods using `_defined_lowered` we immediately create the first-class equivalent. The reverse is done lazily, creating lowered_methods on demand from the class.
* The two-way conversions will be deleted in a future PR when the executor itself runs first-class objects. However, this requires more changes to (1) the traces, (2) the python bindings, and (3) the onnx export pass, and would make this PR way too large.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19167

Differential Revision: D14891966

Pulled By: zdevito

fbshipit-source-id: 0b5f03118aa65448a15c7a7818e64089ec93d7ea
2019-04-11 13:55:48 -07:00
b6ee83a5b4 Materialize a non-default device for C2 legacy storage. (#18605)
Summary:
It's not intended that Storages have 'default' CUDA devices, but this is allowable via the Storage::create_legacy codepath.

This also messes with device caching, because the initial cache is obtained from the Storage, which may have a 'default' device.

Instead, we materialize a device by allocating 0 bytes via the allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18605

Differential Revision: D14680620

Pulled By: gchanan

fbshipit-source-id: 6d43383d836e90beaf12bfe37c3f0506843f5432
2019-04-11 13:50:41 -07:00
bbe648dffb Allow empty net type (#19154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19154

I recently saw a weird workflow error due to an empty but set net_type. Maybe we should just fall back to the simple net in this case.

Reviewed By: dzhulgakov

Differential Revision: D14890072

fbshipit-source-id: 4e9edf8232298000713bebb0bfdec61e9c5df17d
2019-04-11 12:43:07 -07:00
33a950924a Skip Slice if it's no op (#19155)
Summary:
If it's an identity op, just skip the slice and return the input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19155

Reviewed By: zrphercule

Differential Revision: D14890238

Pulled By: houseroad

fbshipit-source-id: f87b93df2cca0cb0e8ae2a1d95ba148044eafd4a
2019-04-11 12:26:32 -07:00
feb5d26510 Rename ONNX util test names (#19153)
Summary:
Rename test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19153

Reviewed By: zrphercule

Differential Revision: D14890095

Pulled By: houseroad

fbshipit-source-id: 37a787398c88d9cc92b411c2355b43200cf1c4b0
2019-04-11 11:29:16 -07:00
c1b92f518d Remove ProcessGroup::getGroupRank (#19147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19147

After #14809 was merged there is no longer a need for getGroupRank.
Every ProcessGroup object has its own rank and size fields which are
accurate for the global group as well as subgroups.

Strictly speaking removing a function in a minor version bump is a big
no-no, but I highly doubt this was ever used outside of
`torch.distributed` itself. This will result in a compile error for
folks who have subclassed the ProcessGroup class though.

If this is a concern we can delay merging until a later point in time,
but eventually this will need to be cleaned up.

Differential Revision: D14889736

fbshipit-source-id: 3846fe118b3265b50a10ab8b1c75425dad06932d
2019-04-11 09:17:40 -07:00
c145c34a7b Basic implementation of QRelu in C10 (#19091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19091

Implements a basic quantized ReLU (uint8). This is a temporary solution before using the `QTensor` type instead of the tuple.

Reviewed By: dzhulgakov

Differential Revision: D14565413

fbshipit-source-id: 7d53cf5628cf9ec135603d6a1fb7c79cd9383019
2019-04-11 08:47:56 -07:00
4b20fc826d Import MultiheadAttention to PyTorch (#18334)
Summary:
Import MultiheadAttention into the core pytorch framework.
Users can now import MultiheadAttention directly from torch.nn.
See "Attention Is All You Need" for more details on the MultiheadAttention function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18334

Differential Revision: D14577966

Pulled By: zhangguanheng66

fbshipit-source-id: 756c0deff623f3780651d9f9a70ce84516c806d3
2019-04-11 08:07:30 -07:00
b6f130aa70 try to enable uncertainty for lr loss (#17236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17236

Following the paper at https://papers.nips.cc/paper/7141-what-uncertainties-do-we-need-in-bayesian-deep-learning-for-computer-vision.pdf, approximate the classification case with the regression formulation. For the LRLoss, add a penalty based on the variance and regularization on the variance, with a tunable parameter lambda.

Reviewed By: chocjy

Differential Revision: D14077106

fbshipit-source-id: 4405d8995cebdc7275a0dd07857d32a8915d78ef
2019-04-11 07:35:19 -07:00
160d0776d5 Remove comment (#19148)
Summary:
Remove pointer to nonexistent Note.
It is already removed in "Remove support for CUDNN 6 (#15851)"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19148

Differential Revision: D14891514

Pulled By: soumith

fbshipit-source-id: dd33cfefa3a21e18afae5b3992dea085adaabda8
2019-04-11 07:04:45 -07:00
f5165ade5b Revert D14842057: Compiler uses first-class modules**
Differential Revision:
D14842057

Original commit changeset: ca6e7b5a4380

fbshipit-source-id: e8f1862a59bf20d5f78648b2fdc53a8b3750ead3
2019-04-11 06:17:01 -07:00
5e1f0b2a07 Compiler uses first-class modules** (#19043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19043
ghimport-source-id: 0c9e80d5f35654af6d472abd5643bff3e9eb9ddf

Differential Revision: D14842057

Pulled By: zdevito

fbshipit-source-id: ca6e7b5a43805240f40b84d30e54495061067dc0
2019-04-11 00:00:48 -07:00
fce0c5e17d Require matches_jit_signature within native_functions.yaml (#18956)
Summary:
"""
This will verify that the func syntax follows the JIT signature schema. If you are a developer outside the core team, set this to False first to help us track unification. After your tests pass try setting this to True once and leave it set to True if it doesn't trigger any asserts. This means that your signature happens to be compliant. In general, it serves as a means of tracking an ongoing schema unification with the goal of aligning func syntax with other components of PyTorch in order to reduce overall complexity and assert coverage of all functions by each component.
"""
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18956

Differential Revision: D14807952

Pulled By: cpuhrsch

fbshipit-source-id: 42dac49269fb3cd96dc62e0b10820d0c32c7fb0e
2019-04-10 23:35:28 -07:00
e54cb03a51 add/move a few apis in torch.hub (#18758)
Summary:
* `torch.hub.list('pytorch/vision')` - show all available hub models in `pytorch/vision`
* `torch.hub.show('pytorch/vision', 'resnet18')` - show docstring & example for `resnet18` in `pytorch/vision`
* Moved `torch.utils.model_zoo.load_url` to `torch.hub.load_state_dict_from_url` and deprecate `torch.utils.model_zoo`
* We have too many env variables to control where the cache dir is; it's not really necessary. I actually want to unify `TORCH_HUB_DIR`, `TORCH_HOME` and `TORCH_MODEL_ZOO`, but haven't done it yet. (more suggestions are welcome!)
* Simplify the `pytorch/vision` example in the doc; it was used to show how a hub entrypoint can be written, so it had some confusing unnecessary args.

An example of hub usage is shown below
```

In [1]: import torch

In [2]: torch.hub.list('pytorch/vision', force_reload=True)
Downloading: "https://github.com/pytorch/vision/archive/master.zip" to /private/home/ailzhang/.torch/hub/master.zip
Out[2]: ['resnet18', 'resnet50']

In [3]: torch.hub.show('pytorch/vision', 'resnet18')
Using cache found in /private/home/ailzhang/.torch/hub/vision_master

    Resnet18 model
    pretrained (bool): a recommended kwargs for all entrypoints
    args & kwargs are arguments for the function

In [4]: model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
Using cache found in /private/home/ailzhang/.torch/hub/vision_master
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18758

Differential Revision: D14883651

Pulled By: ailzhang

fbshipit-source-id: 6db6ab708a74121782a9154c44b0e190b23e8309
2019-04-10 23:10:39 -07:00
5164622ba4 Revert D14878128: [jit] Support attributes when copying modules
Differential Revision:
D14878128

Original commit changeset: 7ef5f7b1b16b

fbshipit-source-id: 3818222a897f8c01bc67f550ed0fd3ddecf61015
2019-04-10 22:24:30 -07:00
ce166d949d ProcessGroupMPI exists only if it is valid (#14809)
Summary:
Previously, MPI process groups were created for all processes, even if
they were not part of the created group. Their MPI_Comm member field
would be MPI_COMM_NULL and they would ignore any calls. Their rank and
size were identical to that of the global process group and they had a
special groupRank and groupSize field to capture the _real_ rank.

This also meant asymmetry with other process group types, where creating
a new group would either return the process group OR
GroupMember.NON_GROUP_MEMBER. For the MPI process group, it would always
return a process group and an additional check was needed to verify
whether or not a process was indeed part of a process group or not.

This commit changes this such that every MPI process group is a valid
process group, and by extension that we no longer have to special case
MPI to determine whether or not a process is part of a group. Now, if
the value returned by `new_group` is GroupMember.NON_GROUP_MEMBER, the
process is not a member, otherwise it is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14809

Differential Revision: D14887937

Pulled By: pietern

fbshipit-source-id: c5bf86d3b33e524cc5004ee68e30103178fa491d
2019-04-10 21:36:35 -07:00
6b0ca8eae5 Fix flaky store timeout test (#19114)
Summary:
~Sometimes, `init_process_group()`, `store.get()`, and `destroy_process_group()` can take more than a few seconds. Hence, removing the thread join timeout.~

The error was due to `Address already in use` when starting TPC backend. The solution is to catch the error and report it to the `retry_on_address_already_in_use_error` decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19114

Reviewed By: ezyang

Differential Revision: D14872680

Pulled By: mrshenli

fbshipit-source-id: fc504d02853ca73f76288c0ade564ab20bc01f7e
2019-04-10 20:35:36 -07:00
821b5f138a Optimize SoftmaxOp on CPU (#18635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18635

Optimize SoftmaxOp on CPU

Reviewed By: houseroad

Differential Revision: D14689516

fbshipit-source-id: d2dcee2476d1a3a21f428e99bce9835f1d229d64
2019-04-10 18:52:15 -07:00
1abbee0f8e Allow Tensor lists to show up in symbolic differentiable graphs. (#16784)
Summary:
It is done by flattening all tensor lists that are inputs/outputs to the
graph into the inputs/outputs list in the autograd graph.

This is less desirable than simply allowing IValues to exist in the
inputs/outputs of autograd::Function but it is substantially less
intrusive.

CaptureList describes the variables captured for backward in a single class.
UnpackInstructs describes how the flattened inputs to backwards are re-packed into lists.
ailzhang

This PR is also part 2 of covering maskrcnn & bert AD formulas, following #16689.

Ops added in this PR:
```
cat
index
meshgrid
reshape
split
split_with_sizes
stack
unbind
```
I will also add a few perf numbers here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16784

Differential Revision: D14104063

Pulled By: ailzhang

fbshipit-source-id: 5ceadadfd67ccaac60c5fd6740786c5354e252b9
2019-04-10 18:16:20 -07:00
612998f2ee Support attributes when copying modules (#19040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19040
ghimport-source-id: 37933efd717795751283cae8141e2e2caaae2e95

Differential Revision: D14878128

Pulled By: driazati

fbshipit-source-id: 7ef5f7b1b16b9bf9254e8503564fa3a750d841ab
2019-04-10 16:12:29 -07:00
226a358136 Move ConcatBatchMatMulBatchGatherOp to OSS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19059

Reviewed By: bwasti

Differential Revision: D14849735

fbshipit-source-id: fefd1887d38e51151c07a8b187e9c7c50ef02c6e
2019-04-10 15:29:03 -07:00
70313941b4 Print CuDNN version correctly. (#19110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19110
ghimport-source-id: efbaf9b23cb61e7ea65460684778c6eeb38ae28e

Differential Revision: D14874497

Pulled By: ezyang

fbshipit-source-id: ced03576f7598189dd8cce79b3303a5529551f46
2019-04-10 14:20:22 -07:00
8e8874ae54 Infer device from pointer in from_blob (#19094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19094
ghimport-source-id: 8207cf614ba36333af610309b24fdc13441b2837

Differential Revision: D14865925

Pulled By: li-roy

fbshipit-source-id: 16613801f7fe0e829ccab8af081517ea4257db06
2019-04-10 12:51:05 -07:00
575aebc182 implement operators for DNNLOWP (#18656)
Summary:
Implement operators for DNNLOWP, including int8_conv, int8_FC, int8_pooling, int8_relu, int8_sum, quantize/dequantize, and order_switch operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18656

Differential Revision: D14767092

Pulled By: yinghai

fbshipit-source-id: 1f3e24929a358a42214da333bd304c593ea4468f
2019-04-10 12:04:39 -07:00
d537e12310 Improve mismatched storage error message. (#19068)
Summary:
Previously the error message would look like:
```
Attempted to set the storage of a tensor on device cuda:0 to a storage on different device cuda. This is no longer allowed; the devices must match.
```

Now it looks like:
```
Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cuda". This is no longer allowed; the devices must match.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19068

Reviewed By: dzhulgakov

Differential Revision: D14854257

Pulled By: gchanan

fbshipit-source-id: deb1ef73c2fcbf9338e7d67f2856282db2befac8
2019-04-10 11:51:33 -07:00
86532c921d Refactor pickler (#19035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19035
ghimport-source-id: 553977b9963d4877e5066a61702f887e81706598

Differential Revision: D14839341

Pulled By: driazati

fbshipit-source-id: d6e4f21b2df28e2a0a21b26bf08d9905599119ad
2019-04-10 11:26:07 -07:00
1858773c0c Fixed bool Tensor value change bug (#19096)
Summary:
Fixes #19077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19096

Differential Revision: D14871044

Pulled By: izdeby

fbshipit-source-id: 61b12559c8c5b9613e00ba5933f478321ea80469
2019-04-10 11:09:07 -07:00
92f70bb639 Split python_ir.h in a more sensible way (#19081)
Summary:
Files included in libtorch depend on torch/csrc/utils/object_ptr.h, e.g. ir.cpp: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/ir.h#L10 (including usage in std::vector, which requires a destructor for THPPointer)

However, object_ptr.h depends on the Python stub: https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/object_ptr.h#L3

Whereas object_ptr.cpp depends fully on Python: https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/object_ptr.cpp#L8

`torch/csrc/utils/object_ptr.cpp` is included only in the Python extension target: https://github.com/pytorch/pytorch/blob/master/torch/CMakeLists.txt#L541

The only reason it was working on master is that the compiler was aggressive enough in pruning unused inline functions. With a few changes in flags, it started breaking (as in kostmo's PR).

This PR splits out python-dependent bits more explicitly by forward declaring THPPointer for real.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19081

Reviewed By: ezyang

Differential Revision: D14860091

Pulled By: dzhulgakov

fbshipit-source-id: 4e86cb8e2ac57aedb3cd00c15270d65bb376206c
2019-04-10 10:26:50 -07:00
b461689cfd Clear input/output shape cache for each inference (#19085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19085

This fixes a bug where input_shapes_ and output_shapes_ would grow indefinitely.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D14861695

fbshipit-source-id: d59116f27c3b54f5cc5a33533de4b9222dbb7afc
2019-04-10 10:21:37 -07:00
ea2405c7dc Add torch.unique_consecutive (#19060)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/19045

Please review: VitalyFedyunin ngimel

This is independent of the #18649 series. It will cause merge conflicts in the #18649 series, but please merge this first, and I will resolve the merge conflicts there.

The new feature is exposed in `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, but not in `torch.unique` yet. I will take care of the API after the #18649 series gets merged completely.

Benchmark on a tensor of shape `torch.Size([15320, 2])`:

```python
print(torch.__version__)
print()
a = tensor.sort().values.to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+2addccc

cpu, sorted_input=False:
340 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
717 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
52.3 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
32.8 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
49.9 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
51.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
78 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cuda, sorted_input=False:
213 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
291 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
250 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
321 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
45.6 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
110 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
82 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
143 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

```python
print(torch.__version__)
print()
a1, a2 = tensor.unbind(1)
indices = (a1 * tensor.max() + a2).sort().indices
a = tensor.index_select(0, indices).to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
cpu, sorted_input=False:
55.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.8 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.1 ms ± 725 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
54.7 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.9 ms ± 577 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cuda, sorted_input=False:
171 µs ± 783 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
220 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
203 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
251 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
59.6 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
113 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
93.2 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
147 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
The CPU implementation of `unique_dim` is super slow, see https://github.com/pytorch/pytorch/issues/18987, but this PR will not worry about this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19060

Differential Revision: D14866909

Pulled By: ezyang

fbshipit-source-id: d20012cec68c37b05cf770a6f4d6524f910b950f
2019-04-10 07:36:08 -07:00
23b0908d38 Replace tabs with space (#19100)
Summary:
fix the linter
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19100

Differential Revision: D14869256

Pulled By: houseroad

fbshipit-source-id: 27ca93cd1dce01ac705b9c9ed93ca8eb6c36351c
2019-04-10 00:35:02 -07:00
a9a29dd63f Fixes error when too many parameters are passed to fused cuda kernel (#18063)
Summary:
Bug fix for https://github.com/pytorch/pytorch/issues/15043, where a large fusion in JIT can produce a number of kernel arguments that exceeds the limit allowed by nvrtc on a CUDA device.
  The fix is to check the number of arguments before a CUDA kernel is generated. If the number exceeds the limit, take the runFallBack() path.
  Add a reduced test from the original issue to keep the test time low. The test would fail without this fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18063

Differential Revision: D14691401

Pulled By: soumith

fbshipit-source-id: b98829bc89ed7724e91eda82ae3a5a1151af721a
2019-04-09 22:37:09 -07:00
496b0b03d9 amend D14778810 (#18902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18902

The fix in D14778810 had an issue: when we fall back to acc32 because the outlier density is too high, W_quantized_ has already been modified. In this diff we first just count the number of outliers (without modifying W_quantized_), and only when the density is low enough and no fallback is needed do we modify W_quantized_ and construct the outlier matrix.

Reviewed By: jspark1105

Differential Revision: D14785256

fbshipit-source-id: 03933110a4ca7409686a06b18a9bb921f8657950
2019-04-09 22:08:54 -07:00
82b570528d Move abs, frac, reciprocal, and neg to TensorIterator (#19041)
Summary:
I've been messing around with vectorizing the fusion compiler in JIT, and noticed that these ops were pathologically slow. I moved them to use TensorIterator + Vec256<> and got some speed wins.

Benchmark script:

```
import torch, time

ops = ['abs', 'neg', 'reciprocal', 'frac']

x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')

for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')

```

Before this change (on my mac with a skylake):
```
op      time per iter (ms)      gops/s  GB/s
abs     0.9730974197387695      1.0775652866097343      8.620522292877874
neg     1.0723679780960083      0.9778136063534356      7.822508850827485
reciprocal      1.2610594034194946      0.8315040490215421      6.6520323921723366
frac    1.1681334018707275      0.8976509004200546      7.181207203360437
```

After this change:
```
op      time per iter (ms)      gops/s  GB/s
abs     0.5031076192855835      2.084198210889721       16.673585687117768
neg     0.4433974027633667      2.3648672578256087      18.91893806260487
reciprocal      0.47145988941192624     2.2241043693195985      17.79283495455679
frac    0.5036592721939087      2.0819154096627024      16.65532327730162
```

So, after this change it looks like we are hitting machine peak for bandwidth and are bandwidth bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19041

Differential Revision: D14862037

Pulled By: jamesr66a

fbshipit-source-id: e2032ac0ca962dbf4120bb36812277c260e22912
2019-04-09 21:55:00 -07:00
56b18eadab Fix aten op output assignment (#18581)
Summary:
Fixes the problem in #18391.

The issue is that when we codegen the ATenOp, we always generated a static number of outputs for each operator. E.g., if an operator from an old model only requires two outputs, its createOperator will only allocate two output blobs, while the newer version of the operator (`unique` in this case) requires more output blobs to be allocated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18581

Differential Revision: D14865647

Pulled By: wanchaol

fbshipit-source-id: 85f63fe16d6fe408a09eca84798c7e8cab3070e9
2019-04-09 21:39:12 -07:00
447d74a074 EmbeddingBag w/ differentiable per_sample_weights (#18957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18957
ghimport-source-id: 7396ca08b137ea40f04285764a9d9a6d4f19227e

Reviewed By: cpuhrsch

Differential Revision: D14856526

Pulled By: zou3519

fbshipit-source-id: 949faea219c7c02ad981b1db610a477194d3f5c9
2019-04-09 18:13:06 -07:00
c889ff6cf8 EmbeddingBag w/ per_sample_weights CUDA fwd + bwd (#18800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18800
ghimport-source-id: 17f638dea0e1ac9a86ec06b223c60362ed78449c

Reviewed By: cpuhrsch

Differential Revision: D14851422

Pulled By: zou3519

fbshipit-source-id: 27b114e51e66112e4bc9cfc63d1d1ddfa650d347
2019-04-09 18:13:02 -07:00
0397d7c0c8 EmbeddingBag w/ per_sample_weights CPU backward (#18799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18799
ghimport-source-id: 58a6f629e890449013f24a9b6282664ca2a1e3ba

Reviewed By: cpuhrsch

Differential Revision: D14851417

Pulled By: zou3519

fbshipit-source-id: c36b9d469989354bf6cef1c2c3dc4f13e7cb1a25
2019-04-09 18:12:59 -07:00
2a2007e5ac EmbeddingBag CPU forward with per_sample_weights. (#18735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18735
ghimport-source-id: d81bef54dafd7167d2451250d7be478d3c013920

Reviewed By: cpuhrsch

Differential Revision: D14851415

Pulled By: zou3519

fbshipit-source-id: cea6039e760ad571b90f0a536e420498f34be325
2019-04-09 18:12:55 -07:00
c561ef5406 Refactor CPU embedding_bag implementation (#18734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18734
ghimport-source-id: e0e50d4b47f2fb8c86e464aacb950521d601f8d3

Reviewed By: cpuhrsch

Differential Revision: D14851413

Pulled By: zou3519

fbshipit-source-id: 8ac4e4de590a363e9807dc552fe4ca52b92652ed
2019-04-09 18:12:52 -07:00
0ca8f7a15f Make BlackBoxPredictor handle networks throwing exceptions (#19080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19080

OSS: add a tiny unit test utility function to create tensors given shape and data outside of any workspace. I use it in an internal test.

Reviewed By: dzhulgakov

Differential Revision: D14814194

fbshipit-source-id: 6d53b235d99a97da812215f5c7f11fecad363c8c
2019-04-09 16:42:12 -07:00
168c0797c4 Remind users to set map_location properly when using DDP
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19084

Differential Revision: D14861702

Pulled By: mrshenli

fbshipit-source-id: 10ca4a9b41e707050a6bce228ccca4177c9fa4a6
2019-04-09 16:29:38 -07:00
487388d8ad Rename btrisolve to lu_solve (#18726)
Summary:
Changelog:
- Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to discourage its usage
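A minimal sketch of the renamed call (example mine; it assumes `torch.lu` for the factorization, per the 1.1-era API):
```python
import torch

A = torch.randn(3, 3)
b = torch.randn(3, 1)
LU, pivots = torch.lu(A)           # LU factorization of A
x = torch.lu_solve(b, LU, pivots)  # new name; btrisolve is a deprecated alias
```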
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726

Differential Revision: D14726237

Pulled By: zou3519

fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b
2019-04-09 15:21:24 -07:00
5eb6a2be41 Avoid calling tensor.data.set_() in DDP
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18961

Differential Revision: D14811208

Pulled By: mrshenli

fbshipit-source-id: c1c46dfa13e0a6ec83aefd35696ee31a7ea3d810
2019-04-09 14:18:24 -07:00
1f0ee9d6e6 Reapply "Wrap workaround for cpp custom types a bit prettier and add an example" (#19062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19062

As a temporary demonstration of how to extend this hack further until custom C types are ready.

Reviewed By: ezyang

Differential Revision: D14817809

fbshipit-source-id: 6eaf731e9135313eb858e178abcd9f25380ab8fe
2019-04-09 12:36:32 -07:00
8f9b11cf33 Propagate ProcessGroup timeout to Store (#16571)
Summary:
closes #16520

Hi pietern, I am not sure if this is the expected way to pass timeout to `Store`, could you please help take a look? Thanks!

Questions:
1. How do I write tests for this? I wanted to do something like `test_barrier_timeout_global`, but it seems I need to set the pg's timeout larger than the `Store`'s default timeout (3 min) to see a difference, which is too long for a unit test. And I do not want to change the `Store`'s default timeout either. Any suggestion?
2. Should I also propagate timeout configuration down to `PrefixStore` in `_new_process_group_helper`?
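For reference, a hedged sketch (mine) of where the timeout enters, using the documented `timeout` argument of `init_process_group`:
```python
import datetime
import torch.distributed as dist

# The timeout passed here is now also propagated down to the underlying Store.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:23456",
    rank=0,
    world_size=1,
    timeout=datetime.timedelta(minutes=5),
)
```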
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16571

Differential Revision: D13954527

Pulled By: mrshenli

fbshipit-source-id: 77f2653903f24255207233eb298f7c0321119a87
2019-04-09 12:36:28 -07:00
aa017db59c make test_jit_fuser runnable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19036

Differential Revision: D14839800

Pulled By: wanchaol

fbshipit-source-id: b52c131b58e1b42a8c3da5d1117217c3dc2e5f5b
2019-04-09 12:36:25 -07:00
ad45d09202 Fix documentation for unfold(dimension=..., ...), fixes #18793 (#19020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19020
ghimport-source-id: 8f31e51b79daba11939aa7992450984054713b9c

Differential Revision: D14851890

Pulled By: ezyang

fbshipit-source-id: 8498e86a63633fdfd9ecae9b7f85b773b75fe27a
2019-04-09 11:54:25 -07:00
a3e177083b Debugging: Increase process reporting for apt/dpkg. (#18880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18880
ghimport-source-id: b43a33c12df379ec75c1fd4c713c1fc723a763e1

Differential Revision: D14856296

Pulled By: ezyang

fbshipit-source-id: 30691eb14dddfe998b2605b416aaa1b14d1b6ad5
2019-04-09 11:40:47 -07:00
29ea08616b Add torch.__config__.show(), reporting detailed version of all libraries. (#18579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18579
ghimport-source-id: 65124c95e49423de4ad1008c65e75057fea09b94
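For reference, usage is a one-liner (sketch mine; the exact output depends on the build):
```python
import torch

# Returns a human-readable string with the compiler, CUDA/cuDNN, MKL, etc.
# versions used for this build.
print(torch.__config__.show())
```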

Differential Revision: D14778507

Pulled By: ezyang

fbshipit-source-id: 1e4bb79f4800a116ce8fb7af2fefbd34da8d102c
2019-04-09 11:13:24 -07:00
31ff0ecd2b Fix torch::nn::init::orthogonal_ with CNNs (#18915)
Summary:
Fixes #18518

I changed the C++ API torch::nn::init::orthogonal_ implementation to match the Python implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18915

Differential Revision: D14851833

Pulled By: ezyang

fbshipit-source-id: 45b5e9741582777c203e9ebed564ab3ac1f94baf
2019-04-09 10:39:15 -07:00
25bd28c3a0 move nightlies to 1.1.0xxx
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19069

Differential Revision: D14854600

Pulled By: soumith

fbshipit-source-id: 85c703bddbd47c1b3914d58ab9521ed22ddeb62a
2019-04-09 10:33:29 -07:00
ba77eadbca add a utility function to check whether it's in the middle of onnx export or not
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19050

Reviewed By: yinghai

Differential Revision: D14849878

Pulled By: houseroad

fbshipit-source-id: a0a4a57f5f9f315ba1334edfccc9284a8099d17f
2019-04-09 10:07:08 -07:00
75d6d8833d remove interned_string.h dep (#19061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19061

remove the deps on interned_string.h

Reviewed By: BIT-silence

Differential Revision: D14850078

fbshipit-source-id: 07e6ad72a7de369049ea56f32b72276fb4c59b32
2019-04-09 09:59:15 -07:00
b1bea0b733 add logging to make the saving action visible (#19042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19042

show the model saving step in the log.

Reviewed By: kennyhorror

Differential Revision: D14809385

fbshipit-source-id: c7a1e50ff92bb45b16b1c501d9325b304b07fbd3
2019-04-09 09:35:43 -07:00
89145e602b Namedtuple return for gels, triangular_solve, and test refactor (#17195)
Summary:
Partial fix for: https://github.com/pytorch/pytorch/issues/394
- `gels` and `triangular_solve` now return namedtuples
- refactor the namedtuple API test for better coverage and maintainability
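A minimal sketch of the new return style (example mine; the `solution` field name follows the existing namedtuple-return convention):
```python
import torch

A = torch.triu(torch.randn(3, 3))    # triangular coefficient matrix
b = torch.randn(3, 2)
result = torch.triangular_solve(b, A)
x = result.solution                  # named field instead of result[0]
```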
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17195

Differential Revision: D14851875

Pulled By: ezyang

fbshipit-source-id: 9b2cba95564269d2c3a15324ba48751d68ed623c
2019-04-09 09:13:26 -07:00
48a35135fb Convert all tabs to spaces, add CI. (#18959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156

Differential Revision: D14831246

Pulled By: ezyang

fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0
2019-04-09 08:12:26 -07:00
544783fa1d Fix BN tests for >= 8 GPU test environments (#19049)
Summary:
DDP does not support replicating BN layers within a process. Existing BN tests fail if the test environment has more than 8 GPUs. This is fixed by explicitly setting each process to use a single replica.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19049

Differential Revision: D14845286

Pulled By: mrshenli

fbshipit-source-id: 937dda5081d415ece48b21f2781b6b4e008dd42f
2019-04-09 08:08:05 -07:00
17adce1b69 do not use constexpr with CUDA >= 9.2 compiler on Windows. (#18986)
Summary:
Redefine `AT_CPP14_CONSTEXPR` from `constexpr` to empty on Windows with CUDA >= 9.2, as a workaround.

Discussed in #18425.

When using CUDA 10.1 on Windows, I faced the following errors:
~~~
D:/data/source/pytorch\c10/util/ArrayRef.h(144): error: variable in constexpr function does not have automatic storage duration
          detected during instantiation of "const T &c10::ArrayRef<T>::front() const [with T=at::Tensor]"
D:/data/source/pytorch/aten/src\ATen/DeviceGuard.h(30): here
~~~

According to the documentation of CUDA Toolkit v10.1.105, the compiler supports `constexpr` and the relaxed C++14 requirements, but compilation fails anyway.

I suppose this could be a compiler bug that requires this workaround.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18986

Differential Revision: D14821836

Pulled By: ezyang

fbshipit-source-id: 9800da2fe7291e7c09e8e5e882adebab08d83ae3
2019-04-09 08:03:13 -07:00
5f24f9a29b Add torch/lib/protobuf to gitignore, fixes #18700 (#19019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19019
ghimport-source-id: 84d36f8d27912d1d094d5672154b82187dd88761

Differential Revision: D14846615

Pulled By: ezyang

fbshipit-source-id: e402557ec321c85be3b28c8602b680246c8eecfe
2019-04-09 07:34:37 -07:00
72e171dc52 Automatic update of fbcode/onnx to 971311db58f2fa8306d15e1458b5fd47dbc8d11c (#19046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19046

Previous import was 079c2639f9bb79b1774d1e3bfa05b0c093816ca7

Included changes:
- **[971311db](https://github.com/onnx/onnx/commit/971311db)**: use ONNX_NAMESPACE::to_string instead of std::to_string (#1915) <Lu Fang>
- **[65227446](https://github.com/onnx/onnx/commit/65227446)**: Remove all the experimental ops (#1909) <Lu Fang>
- **[bdb28f29](https://github.com/onnx/onnx/commit/bdb28f29)**: opset converter backward compatibility support for opset versions 9 and 8 (#1847) <Peyman Manikashani>
- **[47692338](https://github.com/onnx/onnx/commit/47692338)**: Create CODEOWNERS for automatic reviewer assignment for PRs (#1910) <Prasanth Pulavarthi>
- **[8121c731](https://github.com/onnx/onnx/commit/8121c731)**: Revert "quantization support in onnx (#1872)" (#1911) <Lu Fang>
- **[4cfa5426](https://github.com/onnx/onnx/commit/4cfa5426)**: quantization support in onnx (#1872) <Ke Zhang>
- **[030bbb80](https://github.com/onnx/onnx/commit/030bbb80)**: Update LICENSE formatting and clarify # of WG chairs (#1907) <Prasanth Pulavarthi>

Reviewed By: yinghai

Differential Revision: D14843284

fbshipit-source-id: 96c1c79abb62beff227a9fc8b2af9382c4673755
2019-04-08 23:20:02 -07:00
3bfdffe487 Fix default CXX for Windows in cpp_extensions.py (#19052)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19017.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19052

Differential Revision: D14846702

Pulled By: soumith

fbshipit-source-id: b0e4dadaa749da0fa2d0405a1a064820d094220a
2019-04-08 23:14:22 -07:00
7db4c8ed76 fix the onnx ci
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19048

Reviewed By: yinghai

Differential Revision: D14844917

Pulled By: houseroad

fbshipit-source-id: 30719e05a443981284dedf34a9e51213271aa934
2019-04-08 23:07:31 -07:00
fd40c0eba0 Add gelu op (#18992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18992

Add gelu op

Reviewed By: houseroad

Differential Revision: D14814811

fbshipit-source-id: 00f126b8b83763c57ebbf28fbd2de5a8fab6d491
2019-04-08 21:58:29 -07:00
3ad710b837 Add MKL-DNN Tensor (#17748)
Summary:
This is a minimalist PR to add MKL-DNN tensor per discussion from Github issue: https://github.com/pytorch/pytorch/issues/16038

Ops with MKL-DNN tensors will be supported in follow-up PRs to speed up the imperative path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17748

Reviewed By: dzhulgakov

Differential Revision: D14614640

Pulled By: bddppq

fbshipit-source-id: c58de98e244b0c63ae11e10d752a8e8ed920c533
2019-04-08 21:41:38 -07:00
e0c593eae7 detect C++ ABI flag for cpp extensions from available runtime information (#18994)
Summary:
Previously, when a user built PyTorch from source but manually set the version string to be binary-formatted, it would have simply (and incorrectly) used CXX11_ABI=0.

We have this information available at runtime with `torch._C._GLIBCXX_USE_CXX11_ABI`, so this PR improves the situation by simply using that information.
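The runtime flag in question is just an attribute lookup (sketch mine):
```python
import torch

# True iff this PyTorch build used the new C++11 ABI; cpp extensions can
# now match it instead of guessing from the version string.
print(torch._C._GLIBCXX_USE_CXX11_ABI)
```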
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18994

Differential Revision: D14839393

Pulled By: soumith

fbshipit-source-id: ca92e0810b29ffe688be82326e02a64a5649a3ad
2019-04-08 17:50:03 -07:00
df05c7fbac Fix momentum setting in BatchNorm forward pass. (#18764)
Summary:
This is a fix for issue https://github.com/pytorch/pytorch/issues/18525. The issue is not limited to ONNX export; it can manifest in other scenarios as well.
An existing test point in test/onnx/test_operators.py has been updated to cover this scenario as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18764

Reviewed By: zrphercule

Differential Revision: D14735166

Pulled By: houseroad

fbshipit-source-id: 5a737c648f64355929ff31eb12bd4869e744768d
2019-04-08 16:30:00 -07:00
cfb6054ada add android build workflow to pytorch CI jobs (#18919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18919
ghimport-source-id: 3f0ce4334c899d262403d88bd8bd7513e99570f0

Reviewed By: kostmo

Differential Revision: D14800728

Pulled By: ljk53

fbshipit-source-id: fec2e34c192181b8fa31c9a30f60c9bf7388f083
2019-04-08 16:25:30 -07:00
443a58e03d Export C10 operator in PyTorch Model (#18210)
Summary:
Almost there, feel free to review.

These c10 operators are exported to the _caffe2 domain.

TODO:

- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210

Reviewed By: zrphercule

Differential Revision: D14600916

Pulled By: houseroad

fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
2019-04-08 16:06:00 -07:00
09c19e1068 Fix interpolate tracing (#19034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19034
ghimport-source-id: 874e0b0a8685184416152a77fc1850d9a06516ae

Differential Revision: D14837282

Pulled By: zdevito

fbshipit-source-id: b0ed82b607c288a54eecec3d6ed62c4626e5a563
2019-04-08 14:59:26 -07:00
930fb2f319 Fix default dtype in shape analysis (#18968)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/18823

Previously we were setting the dtype to Float, whereas in TorchScript the default is Double. When the fix for https://github.com/pytorch/pytorch/issues/17662 lands, we will have to reevaluate (and this test will fail).

We should still be consistent in shape_analysis in the meantime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18968

Differential Revision: D14837939

Pulled By: eellison

fbshipit-source-id: 32383b55c14bdc7753e26dec33c39ab10124c255
2019-04-08 14:50:28 -07:00
a7095b355e Renamed bool tensors into byte tensors (#19021)
Summary:
Renamed bool tensors to byte tensors to represent the correct type in the generated code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19021

Differential Revision: D14835188

Pulled By: izdeby

fbshipit-source-id: 0252d2c69dab35ac2f076cf9a87423463e902c76
2019-04-08 13:53:40 -07:00
026a9c6bf2 Handle None indexing in TorchScript (#18615)
Summary:
t[None], t[None, 1:, None] and friends for unsqueezing

Fixes: #12810
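A minimal sketch of what now compiles (example mine; shapes in the comment assume a 4x3 input):
```python
import torch

@torch.jit.script
def unsqueeze_like(t):
    # None inserts singleton dimensions, like repeated unsqueeze calls
    return t[None, 1:, None]

print(unsqueeze_like(torch.rand(4, 3)).shape)  # torch.Size([1, 3, 1, 3])
```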
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18615

Differential Revision: D14837039

Pulled By: wanchaol

fbshipit-source-id: ab3862c41629f087b0a46b7c59c93dac4018e6fe
2019-04-08 13:44:49 -07:00
239de1623d Turn on mkldnn in most builds except rocm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18965

Differential Revision: D14836931

Pulled By: bddppq

fbshipit-source-id: 463a9bc5043a1f3194158f7bbfae3b71c6cd4b20
2019-04-08 13:19:14 -07:00
9b76f69cd3 Remove dead code in module.cpp (#19022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19022
ghimport-source-id: cdf694c1b426eb9f82d4c148c9f2c2cfc180cedd

Reviewed By: eellison

Differential Revision: D14833409

Pulled By: driazati

fbshipit-source-id: 8914c7227add7f3e07f56b21a513ba7727fb6800
2019-04-08 13:04:04 -07:00
943f712d7a Convert test_recursive_cse to use Filecheck inline annotations. (#19032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19032
ghimport-source-id: 58a146542deb08dd3057d099167ba530a5e51400

Differential Revision: D14836689

Pulled By: ZolotukhinM

fbshipit-source-id: e65ca5f09193eb7c16c204aedd50c474ea31210c
2019-04-08 12:27:20 -07:00
062b1321fe Add a document 'How to Write Tests Using FileCheck' (#19005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19005
ghimport-source-id: f9c3eff54adc8eef3ead2c77be62c44d88d22a00

Differential Revision: D14826845

Pulled By: ZolotukhinM

fbshipit-source-id: 62cc3657ee89acc979403da15e39bd4cd09a866d
2019-04-08 12:12:30 -07:00
e7b2669151 caffe2 - Expose tensor filler util to Python (#18886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18886

Expose tensor filler util to Python and add a unit test (both C++/Python)

Reviewed By: salexspb

Differential Revision: D14784470

fbshipit-source-id: bb8e013d1755c27c166e87d5a8491a97c65d3d8d
2019-04-08 11:54:10 -07:00
66a3277dfa call build_android.sh from pytorch CI build script (#18918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18918
ghimport-source-id: 98c63da263adbbc6ac74a69ac117740c852833cd

Reviewed By: dreiss

Differential Revision: D14800727

Pulled By: ljk53

fbshipit-source-id: 4d06f845bb34bcdb74b0602404f2a0782f8c8783
2019-04-08 11:03:54 -07:00
0565141728 Type annotations for util.data. (#18963)
Summary:
I haven't had a chance to rigorously try these out yet, so don't merge yet.
Closes #18725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18963

Differential Revision: D14832897

Pulled By: ezyang

fbshipit-source-id: 4780e7a34126bc66ddbfd9d808dfc9e0edd77e68
2019-04-08 09:52:53 -07:00
a2ac260524 ifdef guard some explicit pragma unrolls (#19018)
Summary:
The ROCm compiler cannot and will not satisfy them, causing compile-time warnings. The reason is a runtime loop trip count.

Some warnings remain, arising from other parts of the ROCm stack; tickets are filed and they will be resolved within those components.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19018

Differential Revision: D14832859

Pulled By: ezyang

fbshipit-source-id: 0d66e4aebe4e56af14dd5e2967d3c374a82be25c
2019-04-08 09:47:23 -07:00
02968398d5 Fix a dev mode bug in activation distribution observer (#19004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19004

Handle the exceptional case when the data has min 3.40282e+38 and max -3.40282e+38.

Reviewed By: jspark1105

Differential Revision: D14822193

fbshipit-source-id: b9771d1584fdf8317f5b8c7f5806be5d27314386
2019-04-08 09:36:50 -07:00
2addcccbf1 Clean up some sparse code. (#18962)
Summary:
1) sparse_dispatches in native_parse was no longer used; got rid of it.
2) Got rid of the overloaded sizes_ in SparseTensorImpl, which now just uses the base implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18962

Differential Revision: D14811545

Pulled By: gchanan

fbshipit-source-id: 2fa60ef50456b5f605caa63beae1d8d2542fd527
2019-04-08 08:15:42 -07:00
65b9196741 Remove tensorWithAllocator() from Type (#18780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18780
ghimport-source-id: 7d18a11ce87d988bd32f6ebb96acd878ab8d61be

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18780 Remove tensorWithAllocator() from Type**
* #18779 Remove tensorFromBlob() from Type

Differential Revision: D14739336

fbshipit-source-id: 429ab10bb9f6ac9f97b5a11c7a836b6b6336cb2d
2019-04-08 00:00:39 -07:00
5241e6ec5c Fix sparse mm for ROCm (#18985)
Summary:
* Also annotate the two-pass reduction with launch bounds
* ifdef some shortcomings of ROCm w.r.t. short-circuit returns - internal tickets filed
* while there, plug a memory leak by destroying the matrix descriptor after the sparse call (applicable to cuSPARSE)
* while there, fix types for cusparseXcoo2csr as per the cuSPARSE documentation
* enable test_dsmm in test_sparse, which now passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18985

Differential Revision: D14822009

Pulled By: bddppq

fbshipit-source-id: 757267a47a63ee56ef396c33059f7eca099f4833
2019-04-07 18:16:16 -07:00
6c91610f0c Check if profiler is disabled in push/pop event (#18908)
Summary:
Make sure to check whether the profiler is disabled in the push/pop and mark event functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18908

Differential Revision: D14791931

Pulled By: ilia-cher

fbshipit-source-id: e4f5149e69999ee2b9238c21cccad6d27c6a714a
2019-04-07 15:06:04 -07:00
08ee4e5607 Implement Observer pass on simple model and validate stats (#18848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18848

The Observer module is based on the eager-mode compute-qparam implementation. The goal is to validate the QParam results for eager mode and script mode on a simple model.

At this point, observer stats are collected and qparams computed only for activations.

Reviewed By: zafartahirov

Differential Revision: D14720805

fbshipit-source-id: cb2f321b4b9927b37905fdb8eb55c5610d41b351
2019-04-07 09:17:14 -07:00
67fdb4abf7 AVX2 with GCC9 fix. (#18991)
Summary:
Dear All,

The proposed patch fixes the test code snippets used in the CMake infrastructure, and the resulting silent failure to properly set the ```CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS``` flag. Without the fix, libcaffe2.so will have some ```UND``` AVX2-related references, rendering it unusable.

* Using GCC 9, the test code from the CMake build infra always fails:
```
$ gcc  -O2 -g -pipe -Wall -m64 -mtune=generic -fopenmp -DCXX_HAS_AVX_1 -fPIE -o test.o -c test.c -mavx2
test.c: In function ‘main’:
test.c:11:26: error: incompatible type for argument 1 of ‘_mm256_extract_epi64’
   11 |     _mm256_extract_epi64(x, 0); // we rely on this in our AVX2 code
      |                          ^
      |                          |
      |                          __m256 {aka __vector(8) float}
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/immintrin.h:51,
                 from test.c:4:
/usr/lib/gcc/x86_64-redhat-linux/9/include/avxintrin.h:550:31: note: expected ‘__m256i’ {aka ‘__vector(4) long long int’} but argument is of type ‘__m256’ {aka ‘__vector(8) float’}
  550 | _mm256_extract_epi64 (__m256i __X, const int __N)
      |

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 9.0.1 20190328 (Red Hat 9.0.1-0.12) (GCC)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18991

Differential Revision: D14821838

Pulled By: ezyang

fbshipit-source-id: 7eb3a854a1a831f6fda8ed7ad089746230b529d7
2019-04-07 08:27:00 -07:00
f6af76ead7 Remove tensorFromBlob() from Type (#18779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18779
ghimport-source-id: e7453b74fcce0e4f4a9cbce0324992a85272a426

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18780 Remove tensorWithAllocator() from Type
* **#18779 Remove tensorFromBlob() from Type**

Differential Revision: D14739335

fbshipit-source-id: 8a0619a5b412332efa3b2d60c1edebd53d089d50
2019-04-07 01:37:43 -07:00
9b69f21a95 Improve precision of emitted code for prim::Constant (#18817)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/18815 and https://github.com/pytorch/pytorch/pull/18811.

This makes it so that we emit a higher-precision literal for float values in the fusion kernel, as well as assign it to a `double` variable. This prevents us from losing precision for values such as `pi`, but with the previous fixes this will also get downcast to `float` if downstream operations require it. Therefore, we should not lose performance because of implicit promotions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18817

Differential Revision: D14820842

Pulled By: jamesr66a

fbshipit-source-id: 519671c6ca5e7adac746a4c4c72760a6d91e332f
2019-04-07 00:18:24 -07:00
79533ef097 convert_sync_batch_norm to SyncBatchNorm (#18787)
Summary:
Closes #18382

Please let me know if any changes are required.
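A hedged sketch (mine) of the helper this PR adds, per the linked issue; wrapping the result in DistributedDataParallel is omitted:
```python
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
# Recursively replaces the BatchNorm*d layers with SyncBatchNorm.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```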
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18787

Differential Revision: D14821147

Pulled By: soumith

fbshipit-source-id: edd98eab1b3f4151c4ae5148146435ddb2ae678d
2019-04-07 00:13:02 -07:00
907b4c5890 fix bug when falling back to acc32 when weight is prepacked (#18974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18974

When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.

Reviewed By: bddppq

Differential Revision: D14814067

fbshipit-source-id: aec917322de695e283f0aca1e930c5603d196404
2019-04-06 21:53:08 -07:00
dbd9971dd2 move 2ops back to autodiff (#18969)
Summary:
Move these 2 ops back to autodiff to unblock the XLA CI.
I will leave the cleanup of symbolic_variable to my next PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18969

Differential Revision: D14816811

Pulled By: ailzhang

fbshipit-source-id: dd8a7e133dcad29560d3d1d25691883960117299
2019-04-06 21:41:25 -07:00
1ffa358fca Preserve naming for inputs/outputs with observer insertion (#18713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18713

  - The quantizer observer node's output is hooked up to the following node,
which mutates the naming for input/output. This is neither desired nor
required, because the observer op can be a sync node

  - The quantizer is aimed at quantizing tensors, so we should insert the
observer op only for Values of tensor type

Reviewed By: zafartahirov

Differential Revision: D14715916

fbshipit-source-id: feca04c65a43103b46084d3548998498b19ee599
2019-04-06 21:01:21 -07:00
34382e428f Emit math functions specific to output type (#18815)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/18811

This makes it so that we only emit the *f variants of math functions if the output value's type is FloatTensor; otherwise we call the double variants to prevent loss of precision. This fixes more numerical issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18815

Differential Revision: D14816965

Pulled By: jamesr66a

fbshipit-source-id: 464be644168875ede987142281fb2168f4041e81
2019-04-06 17:56:05 -07:00
8961ad8c5b add instructions for NVIDIA Jetson platforms (#18990)
Summary:
Thanks to dusty-nv, we now have Stable and Weekly wheels provided for the NVIDIA Jetson Platform. They require JetPack 4.2.

He's also maintaining source build instructions.

This PR adds links to the binaries and source build instructions to the README.

The links are dynamic, so when new stable / weekly wheels are available, Dustin will update the same URL to point to the new files
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18990

Differential Revision: D14820158

Pulled By: soumith

fbshipit-source-id: 761a56557decb72ad9c1b9f8a2745667f558eec3
2019-04-06 12:42:43 -07:00
bcd527190a Quantizer pass to insert quant-dequant nodes into IR (#18446)
Summary:
- Quantizer pass to mutate the IR by inserting quant-dequant nodes
before and after nodes which support quantized ops. This information
will be used by the JIT compiler to substitute them with quantized ops

- This currently covers a simple model. It will be expanded later
with subgraph pattern matching to cover more complex patterns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18446

Differential Revision: D14592265

Pulled By: nishantpdce

fbshipit-source-id: c9ba6c12aa96cb9c117826e386721eec83a55ea6
2019-04-06 12:39:26 -07:00
7b5b1486c9 add SyncBatchNorm to docs (#18988)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18983
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18988

Differential Revision: D14820042

Pulled By: soumith

fbshipit-source-id: 356169f554a42303b266d700d3379a5288f9671d
2019-04-06 11:43:20 -07:00
d6d0fcc92b Add c10_cuda to libraries in CUDAExtension for Windows (#18982)
Summary:
This change was necessary for me to compile [apex](https://github.com/NVIDIA/apex) on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18982

Differential Revision: D14819818

Pulled By: soumith

fbshipit-source-id: 37ff9b93a72ab2b7c87f23a61e9f776c71c4c1a8
2019-04-06 10:30:51 -07:00
1497d45315 Remove Trainer from README.md (#18980)
Summary:
Trainer was removed a long time ago.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18980

Differential Revision: D14819855

Pulled By: ezyang

fbshipit-source-id: f62020e688ebf6663416aec7435bf1f531607941
2019-04-06 09:12:50 -07:00
13f03a42d2 Create Object that represents a Module (#18469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18469
ghimport-source-id: 73cb8b58f43f10b1dcfca805fd5b25c4fa977632

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18469 Create Object that represents a Module**
* #18468 slots with explicit value/setValue make more sense in future patches
* #18467 Make Object hold its ClassType
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

This changes the underlying storage for script::Module to hold
an ivalue::Object which has slots for all the parameters and attributes.

NamedIValue and Slot are now merged together into one class Slot that stores
the tuple (ivalue::Object, offset) and can be used to read the name, type,
or value of the slot and also to set the value. This cleans up a bunch
of client uses.

This PR does not actually use the module object in any generated code.
A future PR will switch how code is generated to treat modules as
first class.

Differential Revision: D14613508

fbshipit-source-id: d853a7559f58d244de2ef54a781427fcd1060ed0
2019-04-05 18:58:52 -07:00
8c9caf185b Add numpy like repeat as torch.repeat_interleave (#18395)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/14093
cc: SsnL
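A minimal usage sketch (example mine, following the numpy.repeat analogy in the title):
```python
import torch

x = torch.tensor([1, 2, 3])
print(torch.repeat_interleave(x, 2))
# tensor([1, 1, 2, 2, 3, 3])

y = torch.tensor([[1, 2], [3, 4]])
print(torch.repeat_interleave(y, torch.tensor([1, 2]), dim=0))
# rows repeated 1x and 2x: [[1, 2], [3, 4], [3, 4]]
```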
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18395

Differential Revision: D14599509

Pulled By: umanwizard

fbshipit-source-id: 2391a1cc135fe5bab38475f1c8ed87c4a96222f3
2019-04-05 18:16:25 -07:00
e6bbbb017e Fix interpolate trace (#18875)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10654

The issue is that in tracing `.size` returns an int tensor, and when an int tensor is multiplied by a scalar, the int type dominates and the scalar gets cast to 0.
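A sketch (mine) of the failure mode described above; the truncated result reflects the pre-fix promotion rules, not current behavior:
```python
import torch

size = torch.tensor(4)  # what `.size()` yields while tracing
scale_factor = 0.5
out = size * scale_factor
# Pre-fix, the tensor's int dtype dominated, so the scalar was cast to
# int and `out` came out as tensor(0) rather than tensor(2.).
```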
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18875

Differential Revision: D14814441

Pulled By: eellison

fbshipit-source-id: a4e96a2698f2fcbf3ec4b2bb4c43a30250f30ad9
2019-04-05 17:55:23 -07:00
6084908287 Code string API for fuser testing (#18884)
Summary:
This adds a C++ function `debugGetFusedKernelCode` as well as a Python binding `_jit_fuser_get_fused_kernel_code` that will, given a FusionGroup graph and a set of specified inputs, return the compiled kernel source code. We can then check the contents of this source code for verification of the fuser codegen backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18884

Differential Revision: D14795508

Pulled By: jamesr66a

fbshipit-source-id: 8f6e9dd13ebbb517737d893b0b5f5e9aa06af124
2019-04-05 17:13:17 -07:00
ce67775f08 remove unused func (#18712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18712
ghimport-source-id: e435150a501b20695a5276addee93d795e04b532

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18712 [jit][easy] remove unused func**
* #18711 [jit] fix side-effects and aliasing for custom ops

as title

Differential Revision: D14730979

fbshipit-source-id: 381d16ea2a45779bf6d5fc6d90a4f8585461e902
2019-04-05 15:19:28 -07:00
46fe266507 Revert D14778810: [caffe2/int8] fix bug when falling back to acc32 when weight is prepacked
Differential Revision:
D14778810

Original commit changeset: d49a8c4b7c81

fbshipit-source-id: 15568b084848de74437582548bec42aadc74080d
2019-04-05 14:01:33 -07:00
f6f34b3f4c slots with explicit value/setValue make more sense in future patches (#18468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18468
ghimport-source-id: d4b41c521f2269a695e03c8e7d05d5542731ee48

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18469 Create Object that represents a Module
* **#18468 slots with explicit value/setValue make more sense in future patches**
* #18467 Make Object hold its ClassType
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

Reviewed By: suo

Differential Revision: D14613509

fbshipit-source-id: 9f2208d0efd01465c78cebdc3e8365a9e0adf9ff
2019-04-05 13:41:02 -07:00
091acb0978 Make Object hold its ClassType (#18467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18467
ghimport-source-id: d51bdd64d2529d08c634c58df1a0870b54ad49fb

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18469 Create Object that represents a Module
* #18468 slots with explicit value/setValue make more sense in future patches
* **#18467 Make Object hold its ClassType**
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

Currently it holds a symbol whose unqualified name is the name of the
class. This will get confusing when there are multiple possible registries,
and it makes getting the class type from the object difficult.
The pointer to the class is only 4 more bytes so this patch just puts
it in the object.

Reviewed By: suo

Differential Revision: D14613510

fbshipit-source-id: b35175ba4be83d2522deaa6dad5070d6ec691fed
2019-04-05 13:40:59 -07:00
53458c97dd Enforce single parent for script submodules (#18379) (#18860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18860
ghimport-source-id: 96305349bf3db564f43df2263b1e5bddcc9e9dae

Reviewed By: suo

Differential Revision: D14780421

Pulled By: zdevito

fbshipit-source-id: 2bdd89b35866ba035ebea0adab037e441c1006e2
2019-04-05 13:40:56 -07:00
f9a56d4af2 CUDA_NVCC_EXECUTABLE is not needed, as nvcc is in PATH (#18958)
Summary:
As indicated by f0k: https://github.com/pytorch/pytorch/pull/18495#issuecomment-480178763
nvcc via ccache is already first in the PATH in the instructions I provided, so CUDA_NVCC_EXECUTABLE is not needed.

I re-built to confirm that this is so.

Thank you!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18958

Differential Revision: D14810732

Pulled By: ezyang

fbshipit-source-id: 3758ae2253c745c5d7cfccedd49fa00cc4629965
2019-04-05 13:07:05 -07:00
8e1e29124d Fix precision issue with expansion that prefers 'probs' over 'logits' (#18614)
Summary:
I have observed that sometimes both were in `__dict__`, but it chose to copy `probs`, which loses precision relative to `logits`. This is especially important when training (Bayesian) neural networks or doing other types of optimization, since the loss is heavily affected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18614

Differential Revision: D14793486

Pulled By: ezyang

fbshipit-source-id: d4ff5e34fbb4021ea9de9f58af09a7de00d80a63
2019-04-05 13:07:01 -07:00
b90cbb841d Method is supposed to be in-place (#18684)
Summary:
Tracing models that attempt to return this in-place value doesn't turn out well.

I haven't run any tests to confirm the results, to be honest, but regardless of the outcome the operation happens in-place, so it should work as before.

Sample output from traced model attempting to set `max_norm` on `Embedding`:
```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame #40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame #41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame #42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame #43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame #44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame #45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame #46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame #47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame #48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self,
    input_1: Tensor) -> Tensor:
  _0 = torch.norm(self.item_embedding.weight, 2, 1, True)
  _1 = torch.div(self.item_embedding.weight, _0)
  m_weight = torch.t(_1)
  input_2 = torch.contiguous(input_1)
  weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
             ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  x = torch.embedding(weight_1, input_2, -1, False, False)
  input_3 = torch.div(x, torch.norm(x, 2, 2, True))
  max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
  hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
  _2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
  input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
  input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
  tastevec = torch.div(input, torch.norm(input, 2, 2, True))
  outputs = torch.matmul(tastevec, m_weight)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18684

Differential Revision: D14782041

Pulled By: ezyang

fbshipit-source-id: 7b2fc19b7d5b6600263644498bb728319a19f39d
2019-04-05 13:00:29 -07:00
28990f34d9 fix bug when falling back to acc32 when weight is prepacked (#18881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18881

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18878

When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.

TODO: add unit tests with better coverage

Reviewed By: feiyu1990

Differential Revision: D14778810

fbshipit-source-id: d49a8c4b7c815ab29b77feb53ee730ad63780488
2019-04-05 13:00:26 -07:00
c1790fa202 More numerically stable lerp (#18871)
Summary:
The C++ and CUDA implementations of the lerp are not numerically stable. This is discussed on Wikipedia [here](https://en.wikipedia.org/wiki/Linear_interpolation#Programming_language_support). I checked the GPU SASS output and there's no overhead from using the more precise implementation, from Kepler all the way to Turing. I haven't looked at CPU ASM though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18871

Differential Revision: D14793438

Pulled By: ezyang

fbshipit-source-id: 2ddc2e026c5285466cae7d1b4101174253100445
2019-04-05 12:51:20 -07:00
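The instability above comes from the naive form `a + t*(b - a)`, which need not return exactly `b` at `t == 1`. A minimal sketch in float32, using the endpoint-exact form from the linked Wikipedia section; this illustrates the idea only and is not a reproduction of the changed kernels:
```python
import numpy as np

def lerp_naive(a, b, t):
    # rounding in (b - a) means t == 1 need not return exactly b
    return a + t * (b - a)

def lerp_exact(a, b, t):
    # exact at both endpoints: t == 0 gives a, t == 1 gives b
    return (1.0 - t) * a + t * b

a, b = np.float32(1e8), np.float32(1.0)
print(lerp_naive(a, b, np.float32(1.0)))  # 0.0 -- lost b entirely
print(lerp_exact(a, b, np.float32(1.0)))  # 1.0
```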
edc7b4726b Increase default c10d/ProcessGroupGloo test timeout (#18916)
Summary:
See #18659.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18916

Differential Revision: D14808749

Pulled By: pietern

fbshipit-source-id: 9a9c8beddb2dbbb1bf4c5e575743d9e1fa3f07fa
2019-04-05 12:16:30 -07:00
cb3a4a3d28 remove symbolic variable part 1 (#17986)
Summary:
As discussed with gchanan we should deduplicate symbolic_variable and symbolic_script to prepare for the future merge with derivatives.yaml.

This PR moves most easy formulas to symbolic_script.

TODO: run benchmarks to make sure no perf regression

cc: apaszke zdevito wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17986

Differential Revision: D14766412

Pulled By: ailzhang

fbshipit-source-id: d95a3f876e256c0f505779a71587c985571d3b8f
2019-04-05 12:06:47 -07:00
f3dbcfdfb5 Revert D14742020: Wrap workaround for cpp custom types a bit prettier and add an example
Differential Revision:
D14742020

Original commit changeset: 0f2fd83ae56a

fbshipit-source-id: 5640255aef0319b7d8996e07132e87213130d31c
2019-04-05 12:02:12 -07:00
c65eeeb075 Decompose more Windows scripts (#18917)
Summary:
This PR:

* pulls four distinct installation steps out of `build_pytorch.bat` and into their own scripts.
* eliminates the copy step for helper scripts called by `win-build.sh` and `win-test.sh`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18917

Differential Revision: D14807236

Pulled By: kostmo

fbshipit-source-id: 03e91a5834dfd6d68903ad9725eacc099bbf6d53
2019-04-05 11:31:52 -07:00
ef779b3397 Wrap workaround for cpp custom types a bit prettier and add an example (#18791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18791

As a temporary demonstration on how to extend this hack further until custom C types are ready.

Reviewed By: jamesr66a

Differential Revision: D14742020

fbshipit-source-id: 0f2fd83ae56ab2abe16977a1829ed421e6abe74b
2019-04-05 11:20:13 -07:00
c3a559deb7 Remove cuda::compat functions in aten (#18905)
Summary:
Looks like the issue of using `std::` functions is fixed in the new ROCm version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18905

Differential Revision: D14792943

Pulled By: bddppq

fbshipit-source-id: af11acbb85872943f23b6e55415db1f0699e7b8f
2019-04-05 11:15:16 -07:00
fefa6d305e fix side-effects and aliasing for custom ops (#18711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18711
ghimport-source-id: c9caedc0660b2b7ba3730cd0e1a2e0e9c3cf422b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18711 [jit] fix side-effects and aliasing for custom ops**

Previously we didn't track aliasing, mutation, or side effects for
custom ops. This PR adds in guards with the most conservative
assumptions possible: the op will
1) have side effects,
2) write to everything
3) produce a wildcard.

In order to tell whether a given operator is a custom op, this PR introduces
the concept of a "reserved" namespace (basically all our builtin namespaces).
Custom ops live in non-reserved namespaces, so a check on the namespace
is sufficient to tell whether a schema/node is "custom" or not.

This is just to get things correct for now. Follow-ups to this:
- Users should be able to specify aliasing/mutability without having to learn
the whole alias annotation schema.
- Relax assumptions a bit. In particular outputs can only alias input tensors,
they don't have to be wildcards.

Fixes #18490

Differential Revision: D14730978

fbshipit-source-id: 540b47a24ccf24145051609bdcc99c97e46e0fe0
2019-04-05 10:48:14 -07:00
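A minimal Python sketch of the reserved-namespace check described above; the helper and the namespace set are illustrative assumptions, while the real logic lives in the JIT's C++ alias analysis:
```python
# Illustrative subset; the real reserved set covers all builtin namespaces.
RESERVED_NAMESPACES = {"aten", "prim", "onnx"}

def is_custom_op(schema_name: str) -> bool:
    """Ops outside the builtin (reserved) namespaces are treated as custom
    and receive the conservative aliasing/side-effect assumptions above."""
    namespace = schema_name.split("::", 1)[0]
    return namespace not in RESERVED_NAMESPACES

print(is_custom_op("aten::add"))     # False: builtin, precise alias info
print(is_custom_op("my_ops::warp"))  # True: conservative assumptions apply
```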
abc758ed40 Expand the list of ops that mutate an inputs shape (#18812)
Summary:
Expand the list of ops that resize an input in-place to include broadcasting ops and other ops that affect shape. Whoever is reviewing this PR: please look through PyTorch's in-place ops and see if I missed any.

Expanding the PR from: https://github.com/pytorch/pytorch/pull/17518

This is already being tested in test_resize_input_ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18812

Differential Revision: D14793410

Pulled By: eellison

fbshipit-source-id: 125f4f5375ac1036fb96fabc9da2aaccc9adc778
2019-04-05 10:43:34 -07:00
e45e3634d6 add launch bounds, enable more tests (#18909)
Summary:
Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use.

Enable tests that now work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909

Differential Revision: D14801490

Pulled By: ezyang

fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7
2019-04-05 10:17:15 -07:00
1d263ed92a Add backward pass to infer single missing input shape for Concat opportunistically (#18911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18911

Att.

Reviewed By: bddppq

Differential Revision: D14791295

fbshipit-source-id: 4b7a775924f0eadb0cb73aa6c434a6a5be8b92be
2019-04-05 10:11:58 -07:00
0c5d444b28 change to use clang if NDK >= 18 (#18914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18914
ghimport-source-id: 4d9d9322ee5559d96e13533ec37ff3be86a0227c

Reviewed By: ezyang

Differential Revision: D14794162

Pulled By: ljk53

fbshipit-source-id: caac55e12b1e62bf6ebcc6e2062d5ed122ad4e64
2019-04-05 10:02:03 -07:00
5e8a9e8802 Revert D14673459: [pytorch][PR] [jit] Replace Slot on script::Method with NamedIValue
Differential Revision:
D14673459

Original commit changeset: 21200180c47f

fbshipit-source-id: 9c01de4cf5bb7c87ac0c55705b901db990cd917b
2019-04-05 09:57:13 -07:00
8793e8db42 Disable flaky test_proper_exit test. (#18950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18950
ghimport-source-id: 27bd575fd3c73a51ace1360aa020fa63a792a5d2

Differential Revision: D14802009

Pulled By: ezyang

fbshipit-source-id: 051e1d038892c2c6e8337357fa80771b8dc42680
2019-04-05 09:49:54 -07:00
865ed7682d Checkout pytorch_sphinx_theme with https. (#18859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18859
ghimport-source-id: fbbcb8a2dd9c9f0a317de489b6bbb83e9071a7d8

Differential Revision: D14801989

Pulled By: ezyang

fbshipit-source-id: a9bc02e1383adafcac01994e6346b28551d95c71
2019-04-05 09:35:49 -07:00
ce92cf9bd1 Add tests for reducer class (#18845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18845

This adds a few CPU only test cases for the reducer class.

Reviewed By: mrshenli

Differential Revision: D14768432

fbshipit-source-id: c008a52206826304e634a95bc14167ed94c97662
2019-04-05 09:07:29 -07:00
79ac2120ba Fix a few instances of notifying on a CV while holding the lock (#18857)
Summary:
Fix a few instances of notifying on a condition variable while holding the lock, so that the lock is released before notifying. This avoids an extra thread suspension when the notified thread tries to grab the lock.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18857

Differential Revision: D14779132

Pulled By: resistor

fbshipit-source-id: b18a05c4c15be1426ebfdffac1c8f002b771cfd7
2019-04-05 08:41:53 -07:00
0829ef00dd Unify caffe2 and libtorch build scripts on Windows (#18683)
Summary:
`scripts/build_windows.bat` is the original way to build caffe2 on Windows, but since it is merged into libtorch, the build scripts should be unified, because they actually do the same thing except for some different flags.

The follow-up is to add the tests. Looks like the CI job for caffe2 on Windows is defined [here](https://github.com/pytorch/ossci-job-dsl/blob/master/src/jobs/caffe2.groovy#L906). Could we move it to a separate file, just like what we've done in `.jenkins/pytorch/win-build.sh`? There are a bunch of things we can do there, like using ninja and sccache to accelerate the build.

cc orionr yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18683

Differential Revision: D14730188

Pulled By: ezyang

fbshipit-source-id: ea287d7f213d66c49faac307250c31f9abeb0ebe
2019-04-05 07:47:32 -07:00
84068f43f2 Simplify storage wrapping in TH. (#18855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18855
ghimport-source-id: 01faa229fa4db901ab8539d3778b716d909ba4cf

Reviewed By: dzhulgakov

Differential Revision: D14790669

Pulled By: gchanan

fbshipit-source-id: 167b9bc9c9872743fa8f6040a26ddf7ff5789c27
2019-04-05 07:21:42 -07:00
043e363c6c Cache device on TensorImpl; clean up TensorImpl constructors. (#18833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18833
ghimport-source-id: 6f2be25fcc5e6be3ffe20582e604bd2c1fbab66b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.**
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.

1) We cache device on TensorImpl.  This means we can access the device without a virtual function and allows us to more easily extend TensorImpls (because they don't need to figure out how to store the Device for themselves).

2) Clean up TensorImpl APIs.  We had a constructor that took a TensorTypeId and an allocator and would allocate a Storage based on the recognized types of TensorTypeIds.  Instead, we just have two different constructors: one for types with a storage, one without.

Reviewed By: dzhulgakov

Differential Revision: D14766230

fbshipit-source-id: 745b8db84dcd6cb58f1a8675ad3ff8d033bc50df
2019-04-05 07:21:39 -07:00
b7c830b916 Revert "Adding pin_memory kwarg to zeros, ones, empty,... (#18854)
Summary:
This reverts commit c484cf43a02863efd2f4a76aad43246fb0191ab5.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18854

Differential Revision: D14778393

Pulled By: VitalyFedyunin

fbshipit-source-id: 4b5a1f5b1c091bbc4a8e75614734cc011d26b452
2019-04-05 06:25:33 -07:00
ab4133397c Silence compiler warnings (#18912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18912

We intentionally test a deprecated API, no need to show the warnings here.

Reviewed By: dzhulgakov

Differential Revision: D14792617

fbshipit-source-id: 9ea2a4106d566064283726eed2c274b98f49a2e5
2019-04-05 01:52:00 -07:00
c34e5ff952 ScriptModuleOp in caffe2 (#18716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18716

Might be useful as an intermediate stage for some systems that currently use Caffe2 nets as an execution mechanism.

Not sure it's a good idea altogether; please comment.

Limitations:
- only Tensor types as inputs/outputs
- the entire module is serialized as a zip archive inside a proto in the Caffe2 db; it'd be subject to the 4 GB limit and is likely very slow. For small models it'd work, though.
- no autograd, though it can be attached in principle
- no way to retrieve parameters inside the script module from the C2 runtime perspective (though they potentially can be alias-fetched and stored as individual blobs)
- after deserialization, the returned python wrappers don't have the correct type (as we don't do the module_lookup trick)

Build-wise, I had to add a dependency from pybind_state to libtorch.so. I don't think we build the Caffe2 python frontend independently anymore, so it should be fine.

Reviewed By: amirshim, houseroad

Differential Revision: D14339599

fbshipit-source-id: 88a37a8abd1f1c4703e5ef937031f222535d4080
2019-04-05 01:07:43 -07:00
8bdd0c3a85 flake8 fix on extracted python script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18931

Differential Revision: D14796114

Pulled By: kostmo

fbshipit-source-id: 25971be5a36fffc61e29db981af7298a0fe0ed8c
2019-04-05 00:54:23 -07:00
8f5e478aa2 Replace Slot on script::Method with NamedIValue (#18252)
Summary:
This refactor lets us track the types of initial values added onto a `Method`. The main motivation for this is the change in `module.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18252

Differential Revision: D14673459

Pulled By: driazati

fbshipit-source-id: 21200180c47f25bb70898771adfb569856e6c34a
2019-04-04 23:35:56 -07:00
90b8552c98 U/kostmo/windows offload scripts 3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18754

Differential Revision: D14794893

Pulled By: kostmo

fbshipit-source-id: 05187d9b53615ffbcc7253accdc692c4ecaf25d9
2019-04-04 21:08:05 -07:00
4f5e72600e fix lint in optim doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18883

Differential Revision: D14793365

Pulled By: ezyang

fbshipit-source-id: c1b46c98e3319badec3e0e772d0ddea24cbf9c89
2019-04-04 19:08:13 -07:00
8a466d147c Fixed the comment to reference gist example instead of private repo (#18852)
Summary:
Replace link to a file in a private repo with a gist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18852

Reviewed By: ezyang

Differential Revision: D14778481

Pulled By: izdeby

fbshipit-source-id: 8389aa4bf115ddcfd85079cc2c861404efa678e7
2019-04-04 18:26:24 -07:00
b11a8c6aef return missing keys from load_state_dict (#18668)
Summary:
return missing_keys and unexpected_keys from load_state_dict so the user can handle them when strict mode is off; also removed an unused variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18668

Differential Revision: D14782073

Pulled By: ezyang

fbshipit-source-id: ab3b855eb77bb7422594d971988067e86eef20f2
2019-04-04 18:11:56 -07:00
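A minimal sketch of the new return value with strict mode off; the field names follow the summary above:
```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
state = {"weight": torch.zeros(2, 4),  # "bias" is missing...
         "extra": torch.zeros(1)}      # ...and "extra" is unexpected
result = model.load_state_dict(state, strict=False)
print(result.missing_keys)     # ['bias']
print(result.unexpected_keys)  # ['extra']
```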
814c1df29a Fix caffe2 miopen conv transpose gradient op for case of no dX gradient
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18809

Reviewed By: ezyang

Differential Revision: D14759762

Pulled By: bddppq

fbshipit-source-id: ff795b7e58c82f67a1d7284b5ab06b0e0e5fd3ae
2019-04-04 17:29:30 -07:00
d35c39e73b don't attempt to multiply by a sparse matrix (#18737)
Summary:
Tested by running the script in #16562, and there was no error.

Then:
```
>>> print(mat.grad)
tensor([[1., 2., 3.],
        [1., 2., 3.],
        [1., 2., 3.]])
```

which is correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18737

Differential Revision: D14773078

Pulled By: umanwizard

fbshipit-source-id: 8aa36eb6f6aa104263a467d9ac91d61b3bfd05f5
2019-04-04 17:24:53 -07:00
07efee395c add Fast-RNN to AI-PEP
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18885

Reviewed By: hl475

Differential Revision: D14728854

fbshipit-source-id: 7e7a2946929551963f7c938e3d82a260a9efdfbd
2019-04-04 17:04:21 -07:00
7a19d3c9e1 Allow override of backend in dist.new_group() (#18595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18595

There is no need to force the backend to be the same as the global
process group, as long as the backend is "nccl" or "gloo".

Reviewed By: mrshenli

Differential Revision: D14657204

fbshipit-source-id: 868817b9f219e3be8db0761a487f0027ed46663b
2019-04-04 14:23:03 -07:00
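A minimal sketch of the call shape, assuming a single-process "gloo" job purely for illustration; in real use each rank runs this in its own process:
```python
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# The subgroup's backend no longer has to match the global group's;
# either "gloo" or "nccl" can now be requested explicitly.
group = dist.new_group(ranks=[0], backend="gloo")
dist.all_reduce(torch.ones(1), group=group)
```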
1ec1db477d ONNX Export All Cases of Softmax
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18482

Reviewed By: zrphercule

Differential Revision: D14630697

Pulled By: houseroad

fbshipit-source-id: c06f1e3bead10a265c5f4ac3723d49f4caf46801
2019-04-04 13:24:04 -07:00
b4d2df1fee Added bool and half support for resize_as_ and view methods (#18821)
Summary:
Enabled the **resize_as_** and **view** methods for bool and half tensors.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18821

Reviewed By: ezyang

Differential Revision: D14762852

Pulled By: izdeby

fbshipit-source-id: 4312079fb4e893fea6f71ff4f163094b2674f1e8
2019-04-04 13:09:10 -07:00
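A minimal sketch of the newly enabled methods:
```python
import torch

b = torch.tensor([True, False, True, False], dtype=torch.bool)
print(b.view(2, 2))              # view now works on bool tensors

h = torch.zeros(6, dtype=torch.half)
h.resize_as_(torch.empty(2, 3))  # resize_as_ now works on half tensors
print(h.shape)                   # torch.Size([2, 3])
```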
bb16e8dacb Automatic update of fbcode/onnx to 079c2639f9bb79b1774d1e3bfa05b0c093816ca7 (#18841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18841

Previous import was f0d7df2c643c4e37f1fd7735ef02c972c4d19fb5

Included changes:
- **[079c2639](https://github.com/onnx/onnx/commit/079c2639)**: update the squeeze and unsqueeze doc (#1905) <Lu Fang>
- **[a8b45d62](https://github.com/onnx/onnx/commit/a8b45d62)**: fix the ir_version onnx-operators.proto (#1903) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D14767158

fbshipit-source-id: 2d772fece45e25d72bf1d10fad156189397f3f86
2019-04-04 13:01:37 -07:00
33f4751fb8 Actually model scalar type promotion in shape analysis (#18811)
Summary:
This was causing some numerical issues in the fuser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18811

Differential Revision: D14767390

Pulled By: jamesr66a

fbshipit-source-id: f1123d1aab5501abad850d2edc996f8aa8dafe04
2019-04-04 12:56:40 -07:00
d108a1abb7 Add a .ctags.d/ toplevel directory (#18827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18827
ghimport-source-id: 38f857bc29b2c2c6a71069d00c4c69ed0bef1574

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18827 Add a .ctags.d/ toplevel directory**

Exclude build artifacts by default.

Reviewed By: ezyang

Differential Revision: D14765721

fbshipit-source-id: a785dbb2ef1df96af8e23cc65c8db2a6b67b4fce
2019-04-04 12:51:05 -07:00
8ca9ba17da Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18802

Differential Revision: D14781874

Pulled By: ezyang

fbshipit-source-id: 0f94c40bd84c84558ea3329117580f6c749c019f
2019-04-04 12:46:39 -07:00
b145dcca04 Add support for group ConvTranspose (#18794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18794

Add support for group ConvTranspose

Reviewed By: houseroad

Differential Revision: D14741327

fbshipit-source-id: 5d947ca044bf8495dd7f8f56122441ebbcc6c7e4
2019-04-04 11:52:06 -07:00
8732a1b42e Disallow changing the device of a tensor via set_. (#18832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18832
ghimport-source-id: fde4ad90541ba52dfa02bdd83466f17e6541e535

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* **#18832 [STACK] Disallow changing the device of a tensor via set_.**
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.

This is necessary to cache the device on a TensorImpl.

Differential Revision: D14766231

fbshipit-source-id: bba61634b2d6252ac0697b96033c9eea680956e8
2019-04-04 11:15:37 -07:00
15b318de84 U/kostmo/win test offload scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18694

Differential Revision: D14766339

Pulled By: kostmo

fbshipit-source-id: a2300e72129979f866430ca5c09dd7fff6df0a89
2019-04-04 10:42:11 -07:00
f97eb8d9e4 Revert D14603722: Enforce single parent for script submodules
Differential Revision:
D14603722

Original commit changeset: 63ab5d0cccf7

fbshipit-source-id: 2c4174def102eda4589e08c4dbd67ce8af975199
2019-04-04 10:32:36 -07:00
52a3a51490 Fix deviceCount on FakeGuardImpl. (#18745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18745
ghimport-source-id: 3ed111efe83b3061652869e33d9b5910b7daa732

Differential Revision: D14759198

Pulled By: ezyang

fbshipit-source-id: 70a8db767f310fe0e0079c7b0693e9330d7cd472
2019-04-04 09:23:36 -07:00
486fae563d Stop swapping in Storages of the wrong device for Tensors. (#18831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18831
ghimport-source-id: 2741e0d70ebe2c2217572c3af54ddd9d2047e342

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* **#18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.**

This is necessary to support device caching, see https://github.com/pytorch/pytorch/pull/18751 and https://github.com/pytorch/pytorch/pull/18578.

In library code, we potentially swap in Storages with the wrong device when device_guard is False.  This happens as follows with "view-like" operations.
1) We allocate a tensor on the 'wrong' device (because device_guard is false).
2) We swap out the 'wrong' storage with the 'right' storage using e.g. THCTensor_setStorage.

Instead, we can just construct the Tensor with the correct Storage from the beginning.  This is what we do with 'view'.

Note there are two other "view-like" cases where this happens:
1) unfold
2) set_()

Because these aren't performance critical, I just added the device_guard instead of applying the above correction.

For completeness, this also includes a test that all `device_guard: false` functions behave properly under these conditions.

Reviewed By: dzhulgakov

Differential Revision: D14766232

fbshipit-source-id: 0865c3ddae3f415df5da7a9869b1ea9f210e81bc
2019-04-04 06:25:33 -07:00
d70c6f23f4 Pass ScalarType separately from Type in python constructors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17786

Reviewed By: ezyang

Differential Revision: D14379075

fbshipit-source-id: 3abf066563b789a30cafe5b0c868a41326f5b833
2019-04-04 02:24:20 -07:00
f5741eb855 Store ScalarType and Backend instead of Type in TensorIterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17601

Reviewed By: ezyang

Differential Revision: D14274754

fbshipit-source-id: b08880ae586b6ae57d4c0bbeb203796d087926c4
2019-04-04 02:24:16 -07:00
c705d9eb1e Introduce DeprecatedTypeProperties class (#17991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17991

Changes:
- Breaks BC: Tensor::type() now returns DeprecatedTypeProperties& rather than Type&.
- Added DeprecatedTypeProperties; it serves as a temporary replacement for Type as the return value of Tensor::type(). This contributes to making Type just for dispatch purposes so that we can make it dtype agnostic.
- Tensor::dispatch_type() now returns Type&, like Tensor::type() used to do.
- Changed callsites of Tensor::type() appropriately.

Reviewed By: ezyang

Differential Revision: D14443117

fbshipit-source-id: 239ccb7a09626279a71d1a37f8f82e7f57bf7d9e
2019-04-04 02:24:13 -07:00
095f88e093 Fix to handle null strides in DLPack tensor (#18510)
Summary:
DLPack can have non-strided tensors, which are represented by a nullptr in place of dl_tensor.strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18510

Differential Revision: D14647328

Pulled By: bwasti

fbshipit-source-id: 5364282810a5772cfc2319fc8133fe86fdd84dd1
2019-04-04 00:28:13 -07:00
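When dl_tensor.strides is null the tensor is compact and row-major, so the strides follow directly from the shape. A minimal sketch with a hypothetical helper name:
```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a compact tensor of this shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

print(contiguous_strides([2, 3, 4]))  # [12, 4, 1]
```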
e5e2110a8e Add shape inference function for Split (#18838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18838

It turns out that we don't have a shape inference function for the `Split` op at all. This diff adds one.

Reviewed By: bertmaher

Differential Revision: D14766871

fbshipit-source-id: 535cb4f24bdada603c76579e00e7a39aee93e19f
2019-04-04 00:22:22 -07:00
0c237f1383 Fix the duplication problem in _unique_state_dict (#18139)
Summary:
Since parameter.data creates a new torch.Tensor each time, we currently get duplicate tensors when calling _unique_state_dict. Deduplicate before creating the new tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18139

Reviewed By: dzhulgakov

Differential Revision: D14511262

Pulled By: houseroad

fbshipit-source-id: cb69795d0b6509721220650bbb19edeb3459a503
2019-04-03 23:16:44 -07:00
fa0ad057f8 fold col offset into bias; optimize A symmetric quant (#17026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17026

D14013931 was for FC. This diff makes similar optimizations for Conv.
A subtle difference is that in FC, once we fold col_offset into bias during the pre-processing step, we can treat everything as if A_zero_offset == 0 (symmetric quantization of A).
In Conv, we can't do this because padding still needs to use the original A_zero_offset.
From requantization point of view, once col_offset folded into bias, we can treat as if we're doing symmetric A quantization.
But, for steps involving padding like im2col, im2col fused with packing, and direct conv for depth-wise/group convolution we still need to pass the original A_zero_offset.

Reviewed By: jianyuh

Differential Revision: D14020276

fbshipit-source-id: c29caefd1127bbc6aff0e9d535939bb0c1ecb66c
2019-04-03 22:52:54 -07:00
72913a55a8 fix flake8 lint (#18835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18835
ghimport-source-id: 7b1f433ae51232822704d62699233688072cbc23

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18835 fix flake8 lint**
* #18826 [jit] run cpp tests for non-cuda builds in test_jit.py

...again

Reviewed By: ZolotukhinM

Differential Revision: D14766790

fbshipit-source-id: 29361a407589092831dfbc3c5d63d2834934cd02
2019-04-03 22:24:01 -07:00
0a4117a36e run cpp tests for non-cuda builds in test_jit.py (#18826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18826
ghimport-source-id: 7ffa3bc7ef7402a6d6eb6ba5849e197019d77bf8

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18826 [jit] run cpp tests for non-cuda builds in test_jit.py**

We did all the work of nicely separating our cpp tests that don't require
CUDA, but they aren't run from test_jit.py if CUDA is missing.

Reviewed By: ZolotukhinM

Differential Revision: D14766287

fbshipit-source-id: 9326b3a5c90f6c20fc8cfaf1a1885a363b91f30a
2019-04-03 22:23:58 -07:00
100f95a362 Fix the linter (#18842)
Summary:
Remove extra empty line
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18842

Differential Revision: D14767334

Pulled By: houseroad

fbshipit-source-id: 63224bc407949949e1eb5123d3f151e4ac8f6988
2019-04-03 21:37:01 -07:00
7e59c60454 Enforce single parent for script submodules (#18379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18379
ghimport-source-id: 9895ecc1ff7897e98853dc00675341f36726e7c7

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18379 Enforce single parent for script submodules**
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

The assumption that a ScriptModule has a single parent is present in
our serialization format, and likely a few other places. It is not
enforced on creation of script module hierarchies though, meaning that
problems (e.g. replicating a module twice in the output
format) will not be caught until much later in the development cycle.

This patch enforces the property when a submodule is registered.
It also removes NamedModule since it is no longer necessary in this regime.
This will also allow easy discovery of a module's fully-qualified name
without needing to traverse the Module hierarchy.

Differential Revision: D14603722

fbshipit-source-id: 63ab5d0cccf7d66c7833e0adf9023024ca9607cb
2019-04-03 20:26:58 -07:00
b80a4fa201 Allow ints, floats, and tensors in conditional (#18755)
Summary:
Per our offline discussion, allow Tensors, ints, and floats to be cast to bool when used in a conditional

Fix for https://github.com/pytorch/pytorch/issues/18381
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18755

Reviewed By: driazati

Differential Revision: D14752476

Pulled By: eellison

fbshipit-source-id: 149960c92afcf7e4cc4997bccc57f4e911118ff1
2019-04-03 17:12:17 -07:00
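A minimal sketch of the new implicit casts; a one-element Tensor, an int, or a float used as a condition is now converted to bool inside TorchScript:
```python
import torch

@torch.jit.script
def nonzero_flag(x: torch.Tensor, n: int) -> bool:
    if x:   # one-element Tensor implicitly cast to bool
        return True
    if n:   # int implicitly cast to bool
        return True
    return False

print(nonzero_flag(torch.ones(1), 0))   # True
print(nonzero_flag(torch.zeros(1), 0))  # False
```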
843e6234f5 Fix layernorm ad formula on weight and bias (#18233)
Summary:
Fix the layernorm formula when weight and bias are passed in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18233

Differential Revision: D14760375

Pulled By: wanchaol

fbshipit-source-id: d6bd3b137bc04c391aa5c24d021d1f811ba2a877
2019-04-03 16:58:33 -07:00
0512e4e323 Unify namespace of script::Module (#18378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18378
ghimport-source-id: 55c29bb436a2153d29ff2f4488d99d8863c187b1

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18379 Enforce single parent for script submodules
* **#18378 Unify namespace of script::Module**
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

This removes individual OrderedDicts in favor of a single unified
namespace for all things in a script::Module. This removes a whole
class of bugs where both a method and a parameter could get the
same name, for instance.

Since we no longer have to expose OrderedDict::Item objects, a lot of
downstream code can be simplified.

We no longer double-store names (both in the key of the dictionary,
and in the object itself).

Differential Revision: D14603723

fbshipit-source-id: b5f7551b3074679623edd6ea70269830353b4d4c
2019-04-03 16:04:17 -07:00
773ce4fbd0 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance (#18648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18648
ghimport-source-id: 1cf4a8fe91492621e02217f38cae5d7e0699fb05

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18661 Step 7: remove _unique
* #18655 Step 6: Rename _unique2 to unique and add int? dim
* #18654 Step 5: remove _unque_dim in favor of unique_dim
* #18651 Step 4: add support for unique with dim=None
* #18650 Step 3: Add support for return_counts to torch.unique for dim not None
* #18649 Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim
* **#18648 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance**

`unique` is fragile. Previously I tried to change it in #18391 and #17097; both passed OSS tests but were eventually reverted due to internal failures. My previous work on refactoring unique, #18459, is based on #18391, and after #18391 was reverted, I could not work on #18459. To continue working on #18459, #18391, and #17097 without worrying about internal failures, I am suggesting the following steps for the improvement of `unique` and `unique_dim`. soumith Please take this; there is no need to put #18391 back.

The motivation is basically to move forward as much as possible without causing any internal failures. So I will try to divide it into steps and sort them from low probability of internal failure to high probability. (I don't know what the internal failure is, so I have to guess.) Let's merge this PR stack one by one until we encounter an internal failure.

Step 1: Create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, and keep `_unique` and `_unique_dim` unchanged. The backends of these two functions and of `_unique` and `_unique_dim` are all the same; the only difference is that the temporary ones support `return_counts` while `_unique` and `_unique_dim` do not. Step one is mostly #18391 + #18459. The cuda8 errors have been fixed. At this point, there is no user-visible API change, so no docs are updated. `torch.unique` does not support `return_counts` yet, and `return_counts` is tested through the newly added temporary operators. This step just adds two new ATen operators, so there shouldn't be any internal failure.

Step 2: Rename `_unique_dim2_temporary_will_remove_soon` to `unique_dim`. This should cause no internal failure either, because no change to existing operators. The only thing to worry about is to delete `unique_dim` from python side because we don't want users to use it. At this point, C++ users now have `return_counts` support for `unique_dim`.

Step 3: Update the docs of `torch.unique` and use `unique_dim` inside `torch.unique` to support `return_counts`. In the docs, we should say that `torch.unique` with None dim does not support `return_counts` yet. This might cause internal failure.

Step 4: Rename `_unique2_temporary_will_remove_soon` to `_unique2` and use `_unique2` inside `torch.unique` to support `return_counts`. Update the docs saying that `torch.unique` with None dim now supports `return_counts`. This might cause internal failure.

Step 5: Remove `_unique_dim`. This might cause internal failure.

Step 6: Rename `_unique2` to `unique` and add an optional `dim` argument to make it look like the signature of Python's `torch.unique`. Inside `torch.unique`, use `unique` and get rid of `unique_dim`. Unbind `unique_dim` entirely from Python at codegen. This is likely to cause internal failure.

Step 7: Remove `_unique`. This is very likely to cause internal failure.

This PR
======

This PR is for step 1. It creates two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, implements `return_counts` inside them, and refactors for performance improvements.

Please review ngimel VitalyFedyunin. They are mostly copied from #18391 and #18459, so the review should be easy.

Below is a benchmark on a tensor of shape `torch.Size([15320, 2])`:

Before
---------

```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
```

```
1.0.1
192 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
548 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
```

```
1.0.1
226 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
302 µs ± 7.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After
-------

```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+83ab8ac
190 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
237 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
219 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
263 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+83ab8ac
232 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
301 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
264 µs ± 7.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
339 µs ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Differential Revision: D14730905

fbshipit-source-id: 10026b4b98628a8565cc28a13317d29adf1225cc
2019-04-03 15:29:55 -07:00
7ae0263e1b Support replicating multi-GPU modules (#18687)
Summary:
If the input `network` resides on multiple GPUs, `devices` must be a 2D list with `devices[0]` matching `network`'s devices. See  #18591
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18687

Differential Revision: D14706162

Pulled By: mrshenli

fbshipit-source-id: dca630d3308f2dbcf8b75629c452d7a64092ba42
2019-04-03 14:43:07 -07:00
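A minimal sketch of the 2D `devices` argument, assuming a machine with at least four GPUs; `replicate` is the helper that DataParallel builds on:
```python
import torch.nn as nn
from torch.nn.parallel import replicate

# A module that itself spans GPUs 0 and 1.
net = nn.Sequential(nn.Linear(8, 8).cuda(0), nn.Linear(8, 8).cuda(1))

# devices[0] must match net's devices; devices[1] names the target GPUs.
replicas = replicate(net, devices=[[0, 1], [2, 3]])
print(len(replicas))  # one replica per row of devices
```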
eabd9eac2a flake8 fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18810

Differential Revision: D14758293

Pulled By: wanchaol

fbshipit-source-id: 975abe4fc5dc0dc4d43af61ec0f987e2c5670874
2019-04-03 14:14:18 -07:00
862aff641a Remove device_guard: False from native_functions that don't have a … (#18803)
Summary:
Remove `device_guard: False` from native_functions that don't have a tensor.

There's nothing to device_guard on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18803

Reviewed By: ezyang

Differential Revision: D14748091

Pulled By: gchanan

fbshipit-source-id: ed6f16d6f4d3f07b6d5ad9696f71a14333c228b8
2019-04-03 14:00:02 -07:00
cb959aa708 Switch our Linux machine AMI to a newer image. (#18433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18433
ghimport-source-id: 1c92f98b091232c0045a2e1db75d19c1f258ac1f

Differential Revision: D14748827

Pulled By: ezyang

fbshipit-source-id: a459451058cf5560811403bafb96c6ff083d7e3a
2019-04-03 13:50:37 -07:00
dfcd7b0185 QTensor (#18230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230

Implementing minimum qtensor API to unblock other workstreams in quantization

Changes:
- Added Quantizer which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added following user facing APIs
  - quantize_linear(scale, zero_point)
  - dequantize()
  - q_scale()
  - q_zero_point()

Reviewed By: dzhulgakov

Differential Revision: D14524641

fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c
2019-04-03 13:17:11 -07:00
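A minimal sketch of the user-facing API listed above, written with current spellings (the commit's quantize_linear was later renamed quantize_per_tensor), so treat the exact signatures as illustrative:
```python
import torch

x = torch.tensor([0.0, 0.5, 1.0, 1.5])
q = torch.quantize_per_tensor(x, scale=0.5, zero_point=0, dtype=torch.quint8)
print(q.q_scale())       # 0.5
print(q.q_zero_point())  # 0
print(q.dequantize())    # tensor([0.0000, 0.5000, 1.0000, 1.5000])
```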
3af2d6d904 Enforce import order to make protobuf cpp implementation in python work (#18560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18560

We have to import python protobuf here **before** we load the cpp extension.
Otherwise it breaks under certain build conditions if the cpp implementation of
protobuf is used. Presumably there's some registry in the protobuf library, and
the python side has to initialize the dictionary first, before static
initialization in the python extension does so. Otherwise, duplicated protobuf
descriptors will be created, which can lead to obscure errors like

  Parameter to MergeFrom() must be instance of same class: expected caffe2.NetDef got caffe2.NetDef.

I think it also fixes https://github.com/facebookarchive/caffe2/issues/1573

Reviewed By: ezyang, iroot900

Differential Revision: D14622054

fbshipit-source-id: 2499eb88ecdee85ff8d845859048f7ae5da2a480
2019-04-03 13:17:08 -07:00
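A minimal sketch of the import-order rule; the module names illustrate the pattern rather than reproduce the exact change:
```python
# Import the pure-Python protobuf descriptors first...
from caffe2.proto import caffe2_pb2  # noqa: F401
# ...and only then the C++ extension that also touches the protobuf registry.
from caffe2.python import core  # noqa: F401
```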
3b71f2e1f2 Pin onnx ir_version to 4 (#18768)
Summary:
To make test_operators.py more stable. In the future, we will bump this up manually, and I think that's acceptable, since ir_version shouldn't be bumped too often.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18768

Reviewed By: zrphercule

Differential Revision: D14741514

Pulled By: houseroad

fbshipit-source-id: 0369dbc55424e345a113e49fc104a441ea290d58
2019-04-03 13:16:59 -07:00
8711df89cc fix nccl compilation to make sure it compiles for architectures that pytorch compiles for (#18739)
Summary:
resubmit of https://github.com/pytorch/pytorch/pull/18704 with additional fixes

Fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18739

Differential Revision: D14737274

Pulled By: soumith

fbshipit-source-id: cfbbbf68b098594bd045861d1b2c085da693ea51
2019-04-03 12:52:50 -07:00
b5d8844bbe push magma init into lazyInitCUDA (#18527)
Summary:
Tries to fix C++ API's usage of MAGMA-based functions.

Attempts to Fix https://github.com/pytorch/pytorch/issues/18074
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18527

Differential Revision: D14691694

Pulled By: soumith

fbshipit-source-id: dd04e74418e486d73ea4a92193ddf79352ed71ba
2019-04-03 12:47:34 -07:00
ed9724f385 For some files that are touched by the QTensor diff (#18765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18765

att

Reviewed By: ZolotukhinM

Differential Revision: D14733442

fbshipit-source-id: 525002034e6dccc2045da645e1193671fd0474b3
2019-04-03 12:47:31 -07:00
a21e256e8d Fix contiguous AD and Autogradzero inconsistency (#18633)
Summary:
Fixes #17962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18633

Differential Revision: D14700449

Pulled By: wanchaol

fbshipit-source-id: 3d15d67c01b69b28394a0f2f001db90ed9fd31dc
2019-04-03 12:47:28 -07:00
5950c1e8c4 Added indexing for bool tensors and bool Indices (#18583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18583
ghimport-source-id: 2b1941449827f4ab632fa0f5c8cf0791a6be0845

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18583 Added indexing for bool tensors and bool Indices**
* #18505 Added numpy conversion
* #18166 Bool Tensor for CUDA

-----------
This PR enables bool tensor indexing and indexing with bool indices. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
    a) CPU [Done]
    b) CUDA [In review]
3. Tensor Conversions. [In review]
4. Tensor Indexing. [This PR]
5. Tensor Operations.
6. Back compatibility related changes.

TODO:
as a follow up, we should move nonzero method from TH to Aten to make code cleaner.

Change:
```
v = torch.tensor([True, False, True], dtype=torch.bool)
boolIndices = torch.tensor([True, False, False], dtype=torch.bool)
v[boolIndices]
-> tensor([True], dtype=torch.bool)

v = torch.randn(5, 7, 3)
boolIndices = torch.tensor([True, False, True, True, False], dtype=torch.bool)
v[boolIndices]
->
tensor([[[ 0.5885, -0.3322,  0.7388],
         [ 1.1182,  0.7808, -1.1492],
         [-0.7952,  0.5255, -0.0251],
         [ 0.7128,  0.8099,  1.2689],
         [-0.7018, -1.4733, -0.3732],
         [ 0.4503,  0.4986, -1.1605],
         [ 0.3348, -1.3767, -0.2976]],

        [[-2.0303, -0.4720, -0.1448],
         [-0.1914, -0.6821,  2.0061],
         [-1.0420, -0.1872, -0.3438],
         [ 1.7587, -0.4183, -0.7577],
         [ 1.0094, -0.1950, -0.2430],
         [ 0.1174,  0.3308, -0.5700],
         [ 0.1110, -0.2714,  1.3006]],

        [[-0.1946, -1.4747, -0.4650],
         [-1.0567,  1.0110, -0.2809],
         [ 0.3729, -0.5699,  0.0815],
         [-0.7733, -0.8316,  0.1674],
         [ 1.2000, -0.3745, -1.1679],
         [ 1.7105,  0.9851, -0.1907],
         [-1.1077,  0.2086, -0.0548]]])
```

Differential Revision: D14673403

fbshipit-source-id: 2b88ec2c7eb26a4f5ef64f8707fb68068d476fc9
2019-04-03 12:47:26 -07:00
65dfe1203f add an assertion to check the param num (#18145)
Summary:
Introduce this check to see whether it will break any existing workflow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18145

Reviewed By: dzhulgakov

Differential Revision: D14511711

Pulled By: houseroad

fbshipit-source-id: a7bb6ac84c9133fe94d3fe2f1a8566faed14a136
2019-04-03 12:47:23 -07:00
4afc067fed add Android NDK param to CI docker build script (#18782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18782
ghimport-source-id: 6c4bde7dc835b59209c1d5f7b243f00c9fe99de2

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18782 [pytorch] add Android NDK param to CI docker build script**

Inspired by discussion: https://github.com/pytorch/pytorch/pull/16242

Reviewed By: dreiss

Differential Revision: D14739471

fbshipit-source-id: 0a081045186cbf359eb3cdadee722741cd8cd62f
2019-04-03 12:47:20 -07:00
a7b82a44c4 Upgrade mkldnn-bridge for dnnlowp support (#16308)
Summary:
The mkldnn-bridge is upgraded in this PR to support DNNLOWP operators.
Meanwhile, APIs have been updated in caffe2 to use the latest version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16308

Differential Revision: D14697018

Pulled By: yinghai

fbshipit-source-id: ca952589098accb08295fd5aa92924c61e74d69c
2019-04-03 12:47:17 -07:00
46a68c1b37 add 'abs' builtin
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18502

Differential Revision: D14750173

Pulled By: eellison

fbshipit-source-id: 359cf08938ada442ca1a3b3ea14022ce10229499
2019-04-03 12:47:13 -07:00
0916b5419a Fix dense Embedding to work with double backward (#9078)
Summary:
Fixes : #6469

1. `ATen/native/native_functions.yml` had [dispatch](03e7953a98/aten/src/ATen/native/native_functions.yaml (L451-L455)) variants for `embedding_dense_backward`; however, `embedding_backward` explicitly made a [call](03e7953a98/aten/src/ATen/native/Embedding.cpp (L35-L45)) to it, thus leading to an error.

2. In the case of a CUDA tensor, the function used to crash when dereferencing the indices' data [pointer](03e7953a98/aten/src/ATen/native/Embedding.cpp (L93)).

Both have been fixed and verified (on CUDA and CPU):

1.  As mentioned in the issue
```
import torch

class Test(torch.nn.Module):

    def __init__(self):
        super(Test,self).__init__()
        self.embd = torch.nn.Embedding(1000, 100)
        self.dense = torch.nn.Linear(100, 1)

    def forward(self, inp):
        inp = self.embd(inp)
        return self.dense(inp)

test = Test()
inp = torch.tensor([0,1,2,1,1])
out = test(inp)
raw_loss = out.mean(dim=0)

loss_grad = torch.autograd.grad(outputs=raw_loss,
                         inputs=list(test.parameters()),
                         retain_graph=True, create_graph=True, only_inputs=True)
norm = sum([param.norm()**2 for param in loss_grad])
loss = raw_loss + norm

loss.backward(retain_graph=True)

print(test.embd.weight.grad)

```

2. Test Script
```
import torch
import time
start = time.time()
l = [1,1]*100
input = torch.tensor([[1,0],[1,0]],device='cpu')
embedding_matrix = torch.tensor([[1.0,3.0],[2.0,4]],requires_grad=True,device='cpu')

sq = embedding_matrix * embedding_matrix
emb = torch.nn.functional.embedding(input, sq,scale_grad_by_freq=False)

print('Embedding Matrix')
print(embedding_matrix)
print('-----------------')

sum_ = emb.sum()#prod.sum()

loss_grad, = torch.autograd.grad(outputs=sum_,inputs=embedding_matrix,create_graph=True)

print('Gradient')
print(loss_grad)
print('-----------------')

sum2_ = sum_ + loss_grad.sum()
print(sum2_)
sum2_.backward()

print(embedding_matrix.grad)
print(time.time() - start)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9078

Reviewed By: ezyang

Differential Revision: D14691901

Pulled By: soumith

fbshipit-source-id: 78e2612ba39080be564c876311671eb5a0119a0f
2019-04-03 09:50:34 -07:00
c0ad6747a9 Highlight NCCL all_reduce and all_gather requirements (#18741)
Summary:
See #18689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18741

Differential Revision: D14726874

Pulled By: mrshenli

fbshipit-source-id: a92404c653e3c62fc23fa3ccacfb3b2959b2e307
2019-04-03 09:50:29 -07:00
2658190def Updating submodules
Reviewed By: zpao

fbshipit-source-id: ea0b06ce68d3fd6092eaea7c835a8b51c1120ea0
2019-04-03 08:30:25 -07:00
5e33085f27 Make it possible for users for select /Zi or /ZI over /Z7 when using MSVC (#18790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18701.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18790

Differential Revision: D14748195

Pulled By: ezyang

fbshipit-source-id: e50df1b5ca199a88d7b5ea3ea45d25d23cd31a27
2019-04-03 08:24:52 -07:00
06b7fe59f2 use optimization in D14020675 (#16945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16945

As title

Reviewed By: jianyuh

Differential Revision: D14020769

fbshipit-source-id: fc0f05fcc57bfe9b4aa0c5750060d7b2ba57dd7a
2019-04-03 08:05:10 -07:00
2113ea6fbf Add device and dtype to storage. (#18749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18749
ghimport-source-id: 9026a037f5e11cdb9ccd386f4b6b5768b9c3259b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* #18750 Use non-legacy constructors for tensor deserialization.
* **#18749 Add device and dtype to storage.**

The goal here is to fix our serialization, which currently depends on the legacy constructors.  Having dtype and device on Storage allows us to use the non-legacy constructors.

This fits somewhat with our goal of removing Storage, by having Storage act like a Tensor.

Differential Revision: D14729516

fbshipit-source-id: bf4a3e8669ad4859931f4a3fa56df605cbc08dcb
2019-04-03 07:59:02 -07:00
a3da3653eb Use non-legacy constructors for tensor deserialization. (#18750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18750
ghimport-source-id: f1475cfb67841c41d9867d4429ba9125d5c7dd07

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* **#18750 Use non-legacy constructors for tensor deserialization.**
* #18749 Add device and dtype to storage.

Deserialization currently uses legacy constructors.  This is bad because we need to maintain them, but there is a more immediate problem:
1) We are trying to implement device caching on TensorImpl to get rid of a virtual dispatch
2) This doesn't work if one is able to change the device of a Tensor underlying a Variable.
3) Deserialization does 2)

So the plan is to change deserialization, then enforce that we don't change the device out from underneath a Variable.

Differential Revision: D14729513

fbshipit-source-id: 090d6cdb375b94dc1bf4f554b2df243952b8cdc6
2019-04-03 07:54:11 -07:00
48f70ea0a2 Added numpy conversion (#18505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18505
ghimport-source-id: f3c9b9251e5793f9e192f587194ddfebb45facc1

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18505 [WIP]Added numpy conversion**
* #18166 Bool Tensor for CUDA

Differential Revision: D14646403

fbshipit-source-id: 79d39d692c778ce1981c1d35b1c33e3d93111041
2019-04-03 07:28:24 -07:00
7349dbb7ce Remove THTensor_(newUnfold). (#18773)
Summary:
It's not used and unfold's use of `device_guard: False` is scary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18773

Differential Revision: D14736526

Pulled By: gchanan

fbshipit-source-id: 6281a284bee45fa5038783e4c1ed4d1ed7ca81ab
2019-04-03 07:08:28 -07:00
cb66759600 temp fix for flake8 error (#18788)
Summary:
Fix lint error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18788

Reviewed By: houseroad

Differential Revision: D14741840

Pulled By: mingzhe09088

fbshipit-source-id: 1fa630e3c6e606e3d78fe8293e5b0e7ea1b78da3
2019-04-02 22:52:52 -07:00
3079d95b6c Fix flake8 issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18762

Reviewed By: houseroad

Differential Revision: D14734152

Pulled By: ifedan

fbshipit-source-id: 5adf123f88273895ad34ee9041896358d686de08
2019-04-02 21:18:01 -07:00
40a54bf2f1 Change ReinitializeTensor to use C10_LOG_FIRST_N (#18531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531

Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services,
so we would like to change it to C10_LOG_FIRST_N to prevent that.

Reviewed By: dzhulgakov

Differential Revision: D14647704

fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e
2019-04-02 21:03:37 -07:00
80404cb2f5 Add support for getting TensorProto argument (#18364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18364

att

Reviewed By: bddppq

Differential Revision: D14584784

fbshipit-source-id: 03f9207d5cf4f7f4b812428a931edbcdcb21ca8d
2019-04-02 20:58:28 -07:00
31849bc524 make test module hook use save/load (#18284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18284
ghimport-source-id: 5a92c03fda19072ffb6afd40e0f56806716c7be6

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18296 [jit] Add namespacing for ScriptClasses
* **#18284 [jit] make test module hook use save/load**
* #18211 [jit] Turn script_type_parser into a class
* #18148 [jit] python interop for script classes

Instead of python-printing and comparing strings (which does not capture
dependency information, etc.), use save/load on in-memory buffers and
compare the main module contents inside the buffer

Reviewed By: ailzhang

Differential Revision: D14581129

fbshipit-source-id: 52264ae9ce076775ab3fd1a0c32c8d6f6677a903
2019-04-02 18:09:52 -07:00
2d07993bcb Add ability to specialize class types to ArgumentSpec (#18314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18314
ghimport-source-id: 8cecb768d476ab19c9460f39c8f94a764e4cb052

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18314 Add ability to specialize class types to ArgumentSpec**
* #18226 Add Slot type to abstract the raw pointers being used for slots.

Differential Revision: D14574395

fbshipit-source-id: cc3af6e56e9ae52990f4a1ad56ecceaa2d493577
2019-04-02 17:35:57 -07:00
5f5a2aaab9 Operator-level performance microbenchmarks (#18740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740

Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure

* benchmark_core.py : core utilities for running microbenchmark tests
* benchmark_caffe2.py : Caffe2-specific benchmark utilities
* benchmark_pytorch.py: PyTorch-specific benchmark utilities
* benchmark_runner.py : Main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to have this integrate with AI-PEP.

The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends.

Includes two operator microbenchmarks, supporting both Caffe2 and PyTorch:
* MatMul
* Add

Reference: PyTorch benchmarks : https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators MatMul and Add, but eventually we should cover unary operators like in the PyTorch benchmark repo.

Reviewed By: zheng-xq

Differential Revision: D13887111

fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce
2019-04-02 17:06:19 -07:00
b832b99afb Bool Tensor for CUDA (#18166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18166
ghimport-source-id: a8e2ba2d966e49747a55701c4f6863c5e24d6f14

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18166 Bool Tensor for CUDA**
* #18165 Resolved comments from Bool Tensor for CPU PR
------

This PR enables bool tensor creation and some basic operations for the CUDA backend. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
a) CPU [Done]
b) CUDA [This PR]
3. Tensor Conversions.
4. Tensor Indexing.
5. Tensor Operations.
6. Back compatibility related changes.

Change:
Enable bool tensor in CUDA with the following operations:

    torch.zeros
    torch.tensor
    torch.ones
    torch.rand/rand_like/randint/randint_like
    torch.full
    torch.full_like
    torch.empty
    torch.empty_like

Tested via unit tests and local scripts.

Differential Revision: D14605104

fbshipit-source-id: b7d7340a7d70edd03a109222d271e68becba762c
2019-04-02 16:17:05 -07:00
b77e3c2ca1 Add helpful information to the gradient/inplace operation exception (#18523)
Summary:
To debug a `one of the variables needed for gradient computation has been modified by an inplace operation` error, I wanted to know *which* variable has been modified, so I extended the error message with what information is easily available at this point.

Before:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
```

After:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [80, 1]], which is output 0 of UnsqueezeBackward0, is at version 1, not expected version 0. Hint: enable anomaly detection to find the forward pass operation which modified it.
```

The hint to enable anomaly detection is only shown when it is not enabled. It's meant to save people some googling. I'd even go further and reference `torch.autograd.set_detect_anomaly(True)`, but maybe we're not running Python?
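
A minimal sketch that triggers this error, for illustration (shapes and ops are arbitrary):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()    # exp saves its output for use in the backward pass
y.mul_(2)      # in-place op bumps y's version counter
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)   # now names the tensor, the producing op, and the version mismatch
```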

Disclaimer: I haven't looked at other parts of the code to check if using `std::stringstream` is acceptable practice, let me know if it isn't. Similarly, I haven't checked about indentation practices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18523

Differential Revision: D14683249

Pulled By: soumith

fbshipit-source-id: f97a99d4aabea7461df766d66cd72300b48e2350
2019-04-02 15:23:04 -07:00
74d9146559 build_variables.py: turn on link_whole for _C_impl library. (#18763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18763

Without the `link_whole` flag in opt-builds, some of the files are not linked into the `_C_impl` library, which causes some static initializers not to run (namely, registering a customPythonOperation from python_interpreter.cpp). This diff fixes it.

Differential Revision: D14732471

fbshipit-source-id: 57cff6b4b6d479ad7ab7fd29f677746d91d6ff45
2019-04-02 15:17:13 -07:00
84a9694ed0 Fix windows msbuild bug (#18748)
Summary:
Fix the bug introduced by #18681 where an undefined variable was being used to limit max cpu count when building for Windows without Ninja.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18748

Differential Revision: D14733209

Pulled By: soumith

fbshipit-source-id: 52fc0dd4dde99da75a6956b63f02da2e647eed4f
2019-04-02 14:35:40 -07:00
2e97c82470 torch.cross' dim default changed to c10::optional instead of int=-1 (#17582)
Summary:
The argument dim=-1 doesn't work for torch.cross. The signature of torch.cross has been changed to take c10::optional<int64_t> dim instead of int64_t. So, per the documentation, "If dim is not given, it defaults to the first dimension found with the size 3.", and if dim is specified (even negative) the corresponding dimension is used.
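
For illustration, a small sketch of the new behavior (assuming 2-D inputs whose second dimension has size 3):

```python
import torch

a = torch.randn(4, 3)
b = torch.randn(4, 3)
c1 = torch.cross(a, b)          # dim omitted: first dimension of size 3 is used (dim=1 here)
c2 = torch.cross(a, b, dim=-1)  # negative dims now resolve to the corresponding axis
assert torch.equal(c1, c2)
```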

Fixes #17229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17582

Differential Revision: D14483063

Pulled By: ifedan

fbshipit-source-id: f9699093ec401cb185fd33ca4563c8a46cdcd746
2019-04-02 13:27:00 -07:00
3027e783b1 Fix multi-configuration on Windows CMake (CUDA) (#18548)
Summary:
Multiple configurations are the default (e.g. Release;Debug) on Windows, and this check always broke that configuration because CMAKE_BUILD_TYPE was not set. The workaround was to always set CMAKE_BUILD_TYPE to Debug or Release, which was very unfortunate.

The correct method is to use generator expressions that expand depending on the current CONFIG being processed.

Side note: Anywhere else CMAKE_BUILD_TYPE is checked should probably be fixed too.
Note that the CMakeLists.txt forces it into Release mode. However, I came across this error when importing the prebuilt Config into another project, where CMAKE_BUILD_TYPE was not set.

> 3>CMake Error at pre_built/pytorch-1.0.1/share/cmake/Caffe2/public/cuda.cmake:380 (message):
> 3>  Unknown cmake build type:

Proper support for configurations would mean we can build debug and release at the same time and as you can see, it is less CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18548

Differential Revision: D14730790

Pulled By: ezyang

fbshipit-source-id: 70ae16832870d742c577c34a50ec7564c3da0afb
2019-04-02 13:19:07 -07:00
36237c4893 Fix flake8 issues in gradgrad test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18727

Differential Revision: D14724887

Pulled By: ifedan

fbshipit-source-id: 8c1db6460303e746e4aea0142302b8d61277c067
2019-04-02 12:45:18 -07:00
f095c34b73 Register operators by passing arguments to RegisterOperators constructor (#18577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18577

This is also part of the legacy API and we need to support it if we want to replace it.

Reviewed By: dzhulgakov

Differential Revision: D14671432

fbshipit-source-id: 007abf4ab816647a509fc08e35d79b6c1aa55b03
2019-04-02 12:33:33 -07:00
58f5954252 Allow registering an operator schema without a kernel (#18551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18551

This is helpful for defining a set of operators as an interface but not adding concrete kernels just yet.
The registration logic will ensure that any other libraries that add kernels for these schemas exactly match the schema defined here.

Reviewed By: dzhulgakov

Differential Revision: D14660208

fbshipit-source-id: 7adb5a4876cff5a0ad21d92d8c450cb889f00cc3
2019-04-02 12:33:30 -07:00
7a37e066e6 Improve compiler error messages of the op registration API (#18550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18550

When the operator registration API is used wrongly, in most cases we should now get a nice compiler error
instead of weird template error messages.

This is done by making the enable_if conditions more broad so they also match error cases,
but then having static_asserts against these error cases inside the function.
Before this change, since the function didn't match, the error message said something like "no function found to match your call";
now it shows the error message specified in the static_asserts.

Reviewed By: dzhulgakov

Differential Revision: D14659178

fbshipit-source-id: 7ca4fb72d9051eadf0a7e2717b962bf1213a52b2
2019-04-02 12:33:27 -07:00
ae1d13a06f Improve and test error messages for signature mismatches (#18547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18547

- Argument indices in the error messages are 1-indexed not 0-indexed.
- Add test cases that a mismatching signature actually shows the correct error messages

Reviewed By: dzhulgakov

Differential Revision: D14656695

fbshipit-source-id: 55e45634baa3117e18b8687ea6b2a2f83715bdf6
2019-04-02 12:33:24 -07:00
bb8a0d717c Enable gmock and fix system gtest issue (#18706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18706

- Enable gmock
- Fix issue where the gtest source files in third_party would include system gtest headers

Reviewed By: ezyang

Differential Revision: D14715302

fbshipit-source-id: 5335390913e651bda85c69d7ea9b5c1bce58f172
2019-04-02 12:33:22 -07:00
01c03caacc Emergency workaround for apt-get failure. (#18733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18733
ghimport-source-id: b56766fb4b1084d8a7947cf622275d44e325141b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18733 Emergency workaround for apt-get failure.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: dreiss

Differential Revision: D14725779

fbshipit-source-id: 6855347853a3f13461ca267ed563e2db5815166e
2019-04-02 10:49:21 -07:00
0b6ed83f33 Fix clang-tidy errors in torch/csrc/distributed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18709

Differential Revision: D14725936

Pulled By: pietern

fbshipit-source-id: 307bc446d53da5d0e04d730bb51b7fb29212ace3
2019-04-02 10:32:37 -07:00
385a755b68 Undefined behavior with memset of std::string to 0 (#18703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18703

`zeroPtr` sometimes points to a `std::string` tensor, so a `memset` to 0 is undefined behavior.

This might be accidentally safe with `std::string` implementations that use SSO (Small String Optimization), but will crash otherwise.

Reviewed By: zheng-xq

Differential Revision: D14714458

fbshipit-source-id: 012a18464e6514d38ff791509b88ddc3fc55b2b1
2019-04-02 10:10:11 -07:00
a799751e33 Revert D14717015: [pytorch][PR] fix nccl compilation to make sure it compiles for architectures that pytorch compiles for
Differential Revision:
D14717015

Original commit changeset: 4aac036f57e5

fbshipit-source-id: c820b8dfb27564271e6b80e133fe655658a7c25c
2019-04-02 09:39:03 -07:00
1f5a46ab05 Automatic update of fbcode/onnx to f0d7df2c643c4e37f1fd7735ef02c972c4d19fb5 (#18695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18695

Previous import was fb1a80692c1ab0bd27b1072f2e7bffacba336777

Included changes:
- **[f0d7df2c](https://github.com/onnx/onnx/commit/f0d7df2c)**: fix testcase names of maxpool_2d_ceil and averagepool_2d_ceil (#1896) <karljang>

Reviewed By: zrphercule

Differential Revision: D14709993

fbshipit-source-id: 7fe2145a481ea2c1b6d85ba1c85c662200a53241
2019-04-02 09:16:48 -07:00
c484cf43a0 Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors. (#18455)
Summary:
Make it possible to construct a pinned memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.
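
A short usage sketch (the device copy at the end assumes a CUDA device is available):

```python
import torch

t = torch.zeros(3, pin_memory=True)   # pinned at construction, no intermediate copy
print(t.is_pinned())                  # True
# Pinned memory enables asynchronous host-to-device copies:
if torch.cuda.is_available():
    d = t.to('cuda', non_blocking=True)
```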
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455

Reviewed By: ezyang

Differential Revision: D14672084

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
2019-04-02 08:48:19 -07:00
aed7c9bc96 Improve Backend comment. (#18567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18567
ghimport-source-id: 1e50e611a3afcfae86828b7afe06c3fdc6a7bef7

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18567 Improve Backend comment.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: dzhulgakov

Differential Revision: D14666189

fbshipit-source-id: 64a41c4a998b1a59ff780d1ae06fa16e5ef3c7c4
2019-04-02 08:06:48 -07:00
baac5489a8 Expose alias multinomial methods to ATen (#17904)
Summary:
This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods.

cc: neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904

Differential Revision: D14700205

Pulled By: ezyang

fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3
2019-04-02 07:56:41 -07:00
5ade96fc84 Update cpp_extension.py (#18638)
Summary:
Hi. It seems that when building CPP-extensions with CUDA for Windows, the `extra_cuda_cflags` options are not properly forwarded to `nvcc`.

Use of extra CUDA options is necessary to build, for instance, InplaceABN (https://github.com/mapillary/inplace_abn), which requires the `--expt-extended-lambda` option.

This PR adds one line that correctly appends `extra_cuda_cflags`.
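
A hedged sketch of how this is typically used (the source file names here are hypothetical):

```python
from torch.utils.cpp_extension import load

ext = load(
    name='my_ext',
    sources=['my_ext.cpp', 'my_ext_kernels.cu'],
    # the nvcc flag that motivated this fix, e.g. for InplaceABN:
    extra_cuda_cflags=['--expt-extended-lambda'],
)
```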
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18638

Differential Revision: D14704270

Pulled By: ezyang

fbshipit-source-id: e1e330d193d9afd5707a5437a74c0499460d2b90
2019-04-02 07:56:38 -07:00
fba89b2ae1 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18653

Differential Revision: D14713920

Pulled By: ezyang

fbshipit-source-id: 170295a162dd23916c1dcc9330918d33277cc9ed
2019-04-02 07:51:30 -07:00
d5bf6ddc29 Kill LegacyBridge functions that don't do multiple dispatch. (#18696)
Summary:
At some point, we needed these functions to deal with autograd dispatching to the sparse or TH version of a backward. But we rewrote all backwards definitions in terms of native functions, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18696

Differential Revision: D14710834

Pulled By: gchanan

fbshipit-source-id: b22568c58eefc79d672555bd8832398ccd965cb7
2019-04-02 07:34:55 -07:00
af84371ba8 Updating submodules
Reviewed By: zpao

fbshipit-source-id: da3cd711bb81b07c6c284426ffc5e10a969b0d2b
2019-04-02 06:50:53 -07:00
f084c129db add Int8FCRelu (#18673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18673

Add a fused FC + Relu

Reviewed By: csummersea

Differential Revision: D14667055

fbshipit-source-id: d88fefba008fc0ca450291532d2b320694c6b785
2019-04-01 23:50:30 -07:00
8e873ce273 Fix uninitialized value in pickler (#18678)
Summary:
Fixes #18671
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18678

Differential Revision: D14708969

Pulled By: driazati

fbshipit-source-id: d372c6e3a2a3d3fc48d8afc1fa6807f2ce0e5c6e
2019-04-01 17:34:36 -07:00
2e029db2f9 fixes multiprocessing serialization for integer nn.Parameter (#18639)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18639

Differential Revision: D14711565

Pulled By: soumith

fbshipit-source-id: 0063ed138a215b95d6571dcd68b18569714abe19
2019-04-01 17:15:42 -07:00
fc6296d777 fix nccl compilation to make sure it compiles for architectures that pytorch compiles for (#18704)
Summary:
cc: t-vi gchanan zou3519

This fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18704

Differential Revision: D14717015

Pulled By: soumith

fbshipit-source-id: 4aac036f57e564b05d759662e8ad7a80170901c0
2019-04-01 17:10:42 -07:00
1b25fdbcd0 More type stubs (#18511)
Summary:
Added stubs for:

* The `device` module
* The `cuda` module
* Parts of the `optim` module
* Began adding stubs for the `autograd` module. I'll annotate more later but `no_grad` and friends are probably the most used exports from it so it seemed like a good place to start.

This would close #16996, although comments on that issue reference other missing stubs so maybe it's worth keeping open as an umbrella issue.

The big remaining missing package is `nn`.

Also added a `py.typed` file so mypy will pick up on the type stubs. That closes #17639.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18511

Differential Revision: D14715053

Pulled By: ezyang

fbshipit-source-id: 9e4882ac997063650e6ce47604b3eaf1232c61c9
2019-04-01 16:03:58 -07:00
aa23b8c664 NCCL build fix WITH_DISTRIBUTED=1.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18691

Reviewed By: ezyang

Differential Revision: D14706205

Pulled By: gchanan

fbshipit-source-id: 802f19bfd7df3703c0dbce03036e2f2e32eb3efb
2019-04-01 15:58:54 -07:00
16f07d7dac caffe2 - set up correct inheritance structure for remaining operator test classes (#18622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18622

Set up correct inheritance structure for remaining operator test classes

Reviewed By: ezyang

Differential Revision: D14685941

fbshipit-source-id: a6b1b3be325935b7fec7515be13a4994b3016bf0
2019-04-01 15:53:22 -07:00
20b63aa977 Peephole Optimize Shape Ops (#18549)
Summary:
Peephole optimize ops that just require Dimensioned Tensor Type, which is what we specialize graphs on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18549

Differential Revision: D14690827

Pulled By: eellison

fbshipit-source-id: 9d7439eb584f0a5b877f5aa53cf80150f00e7e5f
2019-04-01 15:39:43 -07:00
4a0f842d42 Deprecated lambda based API (#18542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18542

This adds the deprecated API for defining kernels as lambdas. The new API for defining kernels as lambdas was introduced in D14653005.

Reviewed By: dzhulgakov

Differential Revision: D14653551

fbshipit-source-id: 99900f1436716c69e52c83b68333b642ec2c8558
2019-04-01 14:58:35 -07:00
723ce02a55 deprecated function based API (#18444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18444

This adds the deprecated function based API to c10::RegisterOperators().
This is the API currently exposed under jit::RegisterOperators() and we need to support it for backwards compatibility.

Reviewed By: dzhulgakov

Differential Revision: D14514218

fbshipit-source-id: c77676851cfd431d66f18fd8038cf153a3a7d7cc
2019-04-01 14:58:32 -07:00
246f5c412e Revert "Tensor construction codemod(raw_mutable_data) (#16373)" (#18680)
Summary:
This reverts commit d73c830e236f5b980e5c91914b818d150b60278c.

We have observed a significant perf drop when training ResNext101 with multiple AMD GPUs:

Before:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-bench/1636/console
2 GPUs ResNext training got 150\~160 imgs/sec
4 GPUs ResNext training got 270\~280 imgs/sec

After:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-bench/1637/console
Both 2 and 4 GPUs ResNext training drop to 110\~120 imgs/sec

Similar perf drop are seen on ResNet50 training jobs as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18680

Differential Revision: D14702941

Pulled By: bddppq

fbshipit-source-id: 828141805afc23f25c08d4a2eb6d4b99f817c128
2019-04-01 14:39:13 -07:00
bdfdf6c2b9 C++ handler for gradient reduction (#18251)
Summary:
This commit adds the `c10d::Reducer` class that hooks into autograd
and performs gradient bucketing and reduction. These are the core
parts of `nn.parallel.DistributedDataParallel` that up to now were
only usable for CUDA models.

This should enable the following:

* Distributed data parallelism for models defined using the C++ frontend.
* Allow overlap of gradient computation and reduction for non-CUDA models.
* Enable distributed data parallelism for models with some unused parameters.

This does not include any logic for computing bucket assignment, which
can be done separately; either by observing autograd execution order
(this is what Apex does), or by assigning buckets based on some
maximum byte size, or both.

Also see #17757 and #13273.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18251

Reviewed By: mrshenli

Differential Revision: D14571899

Pulled By: pietern

fbshipit-source-id: 20f95eefd288dfe8cfffe0a28ca22fa7c9c3cd4c
2019-04-01 14:30:02 -07:00
a0285dd0f4 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 735fc388bff7066e8f46526266a73bf35e121442
2019-04-01 13:59:58 -07:00
89e9b1cf8e add ConvRelu schema (#18693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18693

As title

Reviewed By: protonu

Differential Revision: D14662880

fbshipit-source-id: 3664faa660a04e1f528a413d2a1700b872c3c684
2019-04-01 13:09:07 -07:00
90a5c56988 offload scripts from win-test.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18601

Differential Revision: D14711856

Pulled By: kostmo

fbshipit-source-id: 75fe620541fe2903f69a53dbd1b6d51a0d718113
2019-04-01 13:04:30 -07:00
929258a680 Some fixes for the build script on Windows (#18681)
Summary:
Fixes https://discuss.pytorch.org/t/pytorch-build-from-source-on-windows/40288/13?u=peterjc123.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18681

Differential Revision: D14711039

Pulled By: soumith

fbshipit-source-id: f7e1a94b163064c055670b2925cd4502e7773599
2019-04-01 12:42:51 -07:00
d6c269c33e Fix for double backwards tests (#18190)
Summary:
If none of the outputs requires grad, we don't actually check gradgrad; instead, we check that their numerical gradients are 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18190

Differential Revision: D14563388

Pulled By: ifedan

fbshipit-source-id: a4eb94c9eb60f14dbe6986cd8cef1fe78a7bc839
2019-04-01 12:33:30 -07:00
be01c90797 Add string index/slice operations (#18247)
Summary:
Adds support for string indexing (`"a"[0]`) and slicing (`"abc"[1:3]`)
to script.
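
A minimal sketch of what now compiles (the function name is illustrative):

```python
import torch

@torch.jit.script
def head_and_mid(s: str) -> str:
    return s[0] + s[1:3]   # string indexing and slicing now work in script

print(head_and_mid("abcd"))  # "abc"
```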
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18247

Differential Revision: D14574486

Pulled By: driazati

fbshipit-source-id: 4b42aa0881e5398ea7f112be46c0335e6e19dced
2019-04-01 12:11:35 -07:00
af9335436d Re-land Parsing file check (#18570)
Summary:
The last time I tried to land it there was a merge race with the docs coverage test lol. Re-landing with the fix.

Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570

Reviewed By: driazati

Differential Revision: D14707285

Pulled By: eellison

fbshipit-source-id: 3a0265928aa8cad78961723d8bf0fbf871fdb71d
2019-04-01 11:56:32 -07:00
3749d65a7e Create Node2Vec ModuleKeeper
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18504

Reviewed By: sunnieshang

Differential Revision: D14632091

fbshipit-source-id: d4544866552dc6bcbc7515be9e88cb11e7622a44
2019-04-01 10:36:23 -07:00
822c8ee143 use acc16 only when n>128 and k>128 in Skylake (#18672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18672

In Skylake, when n < 128 or k < 128, acc16 is slower.

Reviewed By: jianyuh

Differential Revision: D14700576

fbshipit-source-id: 80ca9f1af4626637eed9c5ca49f95ae744811189
2019-04-01 08:52:28 -07:00
4c74cf7489 Move ideep singleton registration to ATen from C2. (#18335)
Summary:
Since we are going to add ideep to ATen, and ATen is always compiled, it makes sense to have the registration in ATen rather than C2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18335

Reviewed By: bddppq

Differential Revision: D14578652

Pulled By: gchanan

fbshipit-source-id: 4d77fcfc21a362b21d5291a127498aa722548873
2019-04-01 08:00:33 -07:00
ddbfdc911d Create torch/lib directory before copying _C.lib on Windows environment. (#18666)
Summary:
`python setup.py develop` fails with the following messages.
~~~
...
-- Building with NumPy bindings
-- Not using cuDNN
-- Not using MIOpen
-- Not using CUDA
-- Using MKLDNN
-- Not using NCCL
-- Building without distributed package

Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch\Lib\site-packages\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd to C:\data\source\pytorch\build\lib.win-amd64-3.7\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd
copying torch\Lib\site-packages\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd -> C:\data\source\pytorch\build\lib.win-amd64-3.7\caffe2\python
building 'torch._C' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\torch
creating build\temp.win-amd64-3.7\Release\torch\csrc
...
creating C:\data\source\pytorch\build\lib.win-amd64-3.7\torch
C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /nodefaultlib:libucrt.lib ucrt.lib /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\data\source\pytorch\torch\lib /LIBPATH:C:\data\dlenv\libs /LIBPATH:C:\data\dlenv\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" shm.lib torch_python.lib /EXPORT:PyInit__C build\temp.win-amd64-3.7\Release\torch/csrc/stub.obj /OUT:build\lib.win-amd64-3.7\torch\_C.cp37-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.lib /NODEFAULTLIB:LIBCMT.LIB
   Creating library build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.lib and object build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.exp
Generating code.
Finished generating code.
copying build\lib.win-amd64-3.7\torch\_C.cp37-win_amd64.pyd -> torch
copying build\lib.win-amd64-3.7\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd -> caffe2\python
copying build/temp.win-amd64-3.7/Release/torch/csrc/_C.cp37-win_amd64.lib -> build/lib.win-amd64-3.7/torch/lib/_C.lib
error: could not create 'build/lib.win-amd64-3.7/torch/lib/_C.lib': No such file or directory
~~~

When `python setup.py install` is executed, `torch/lib` has already been created by a previous step (which copies many files), so this copy succeeds. But in develop mode, that step is not executed and the copy fails.

This patch creates the `torch/lib` directory if it does not exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18666

Differential Revision: D14704269

Pulled By: ezyang

fbshipit-source-id: b2d7c698a906b945bf34bb78f17b91b4fdfd3294
2019-04-01 07:28:08 -07:00
8276d82f78 Move flags that do not work on MSVC (#18686)
Summary:
MSVC errors on these flags as they are not supported
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18686

Differential Revision: D14704254

Pulled By: ezyang

fbshipit-source-id: 936d33ed6b7474d7774a49505cdac50dbe8dd99a
2019-04-01 07:28:05 -07:00
44b5891121 Fix unused lambda capture warnings (#18662)
Summary:
```
aten/src/ATen/native/cpu/DistanceOpsKernel.cpp.DEFAULT.cpp:109:104: warning: lambda capture 'combs' is not used [-Wunused-lambda-capture]
    parallel_for(0, combs, internal::GRAIN_SIZE / (16 * m), [p, self_start, self_end, n, m, res_start, combs](int64_t k, int64_t end) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18662

Differential Revision: D14699379

Pulled By: bddppq

fbshipit-source-id: 5062d4327bb5f7b485c2ffa30c98e10576416f03
2019-03-31 22:35:58 -07:00
505d50ea90 handle a rare case of histogram min is inf/nan (#18239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18239

When min is inf or nan, we get UBSAN errors

Reviewed By: csummersea

Differential Revision: D14537668

fbshipit-source-id: e70ffb5ecd2b10793356070c69fdabf8f25b203e
2019-03-31 21:32:54 -07:00
6841537933 Delete duplicated technical content from contribution_guide.rst (#18628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18628
ghimport-source-id: d94b81a6f303883d97beaae25344fd591e13ce52

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18629 Provide flake8 install instructions.
* **#18628 Delete duplicated technical content from contribution_guide.rst**

There's a useful guide in contributing_guide.rst, but the
technical bits were straight up copy-pasted from CONTRIBUTING.md,
and I don't think it makes sense to break the CONTRIBUTING.md
link.  Instead, I deleted the duplicate bits and added a cross
reference to the rst document.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14701003

fbshipit-source-id: 3bbb102fae225cbda27628a59138bba769bfa288
2019-03-31 19:13:22 -07:00
35bc83524d Provide flake8 install instructions. (#18629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18629
ghimport-source-id: 66a8871c56ffcfa7d4bfdf601e180fae99194e28

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18629 Provide flake8 install instructions.**
* #18628 Delete duplicated technical content from contribution_guide.rst

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14701004

fbshipit-source-id: b64292f0ef01b7894cf6b9ff8d5fd9e921c8d162
2019-03-31 18:59:18 -07:00
19fe2b9db4 Adding quantized tensor shape/type info support for caffe2=>glow in caffe2 side (#18621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18621

This diff added caffe2 support for onnxifi quantization.

Reviewed By: yinghai

Differential Revision: D14648767

fbshipit-source-id: 4ddb492cacbba6142305866e6dbb875880acaea3
2019-03-31 17:42:27 -07:00
3c70326cf4 Fix test on windows (#18667)
Summary:
Breakage in #18188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18667

Differential Revision: D14700133

Pulled By: driazati

fbshipit-source-id: 4cc26bd579fc1b074b3bef6046cc1030facee130
2019-03-31 16:24:21 -07:00
9c87543124 Enforce check ad in test_jit (#18509)
Summary:
If a test triggers autodiff, it must have a `DifferentiableGraph` in its differentiated forward graph, and this subgraph must have either the original aten node, or the corresponding nodes used in AD formula.

Typically a forward differentiable graph looks like this:
```
graph(%i0 : Float(),
      %i1 : Float()):
  %3 : Float() = prim::DifferentiableGraph_0(%i0, %i1)
  return (%3)
with prim::DifferentiableGraph_0 = graph(%0 : Float(),
      %1 : Float()):
  %2 : Float() = aten::max(%0, %1)
  return (%2)
```
which tells us `aten::max(Tensor self, Tensor other) -> Tensor` is symbolically differentiable.

Update: there are a lot of cases (fusions/ConstantChunk/python implementations) that break it, so I decided to make the check optionally take node names if different from the function name.
~~[OLD]Theoretically I could also check if `aten::max` is in the differentiable block or not to be more precise, but there're also cases like `chunk` where in a differentiable block it's replaced with a prim node (ConstantChunk) and we will have to special case them. Any suggestions here (to be more precise or no) is very welcome!~~

We used to have a list of nn tests that should be run against AD; I moved it to a field set when constructing our tests (either torch or nn). I think it's cleaner this way, and it matches the fact that for the same op we support one schema of it but not all; in this way we could just turn on the corresponding test which triggers that supported schema.

cc: apaszke zdevito wanchaol ngimel for a review

[Done] :
- Going through a manual second pass of all tests to check if they should enable AD test or not....
- Add a readme about how to add AD for an op and how to add/enable its test in test_jit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18509

Differential Revision: D14696811

Pulled By: ailzhang

fbshipit-source-id: c5e693277baac585cd3aed5ab2c0e7faa5e6f29f
2019-03-31 08:51:30 -07:00
828a6a3b39 Use proper isnan check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18663

Differential Revision: D14699385

Pulled By: bddppq

fbshipit-source-id: 596ad3371e7704802591e49f7e1c55dc6cd2896f
2019-03-31 02:08:11 -07:00
cb39bd9c2f pad_circular -> _pad_circular (#18608)
Summary:
pad_circular is really private, as circular padding is exposed via `F.pad`
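
For reference, a minimal sketch of the public entry point:

```python
import torch
import torch.nn.functional as F

x = torch.arange(9.).reshape(1, 1, 3, 3)      # (N, C, H, W)
y = F.pad(x, (1, 1, 1, 1), mode='circular')   # circular padding via the public API
print(y.shape)                                 # torch.Size([1, 1, 5, 5])
```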
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18608

Differential Revision: D14691704

Pulled By: soumith

fbshipit-source-id: 8c2f90596feed670976115041efed3ca071e8306
2019-03-30 13:27:04 -07:00
a45b79d23f Fix wrap(at::Scalar) (#18632)
Summary:
Problem:
```cpp
// This function expects a `Variable` as input
inline PyObject* wrap(at::Tensor tensor) {
  return THPVariable_Wrap(Variable(std::move(tensor)));
}

inline PyObject* wrap(at::Scalar scalar) {
  // This function calls `wrap(at::Tensor tensor)` (the function above), but since
  // `scalar_to_tensor(...)` returns a `Tensor` and not a `Variable`, the call to
  // `wrap(at::Tensor tensor)` will fail with "Tensor that was converted to Variable
  // was not actually a Variable", which is not what we want.
  return wrap(scalar_to_tensor(scalar));
}
```

The right fix is to call `make_variable(...)` with the tensor returned from `scalar_to_tensor(scalar)`.

This unblocks https://github.com/pytorch/pytorch/pull/18230 as it is the only patch that hits this code path now. All other native functions that return Scalar (such as `item()` or `_local_scalar_dense()`) either have a custom-defined implementation that doesn't go through this path, or are not exposed to Python at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18632

Differential Revision: D14689293

Pulled By: yf225

fbshipit-source-id: be7ba5d3de83a69533a2997de97ad92989ff78ee
2019-03-30 11:36:11 -07:00
0f6bf09db5 Deprecated type() -> scalar_type()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18642

Differential Revision: D14696848

Pulled By: ezyang

fbshipit-source-id: 43d1f86ecee5f6c6c5b70fd7d0e2063c3fc473ab
2019-03-30 10:55:46 -07:00
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
96456bfa4c Update documentation for CTCLoss (#18415)
Summary:
This is meant to resolve #18249, where I pointed out a few things that could improve the CTCLoss docs.

My main goal was to clarify:
- Target sequences are sequences of class indices, excluding the blank index
- Lengths of `target` and `input` are needed for masking unequal-length sequences, and do not necessarily equal S, which is the length of the longest sequence in the batch (see the sketch below)
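
A minimal sketch of these two points (sizes are illustrative):

```python
import torch
import torch.nn as nn

T, C, N, S = 50, 20, 16, 30    # input length, classes incl. blank, batch, max target length
log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)        # class indices; blank (0) excluded
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, S, (N,), dtype=torch.long)  # per-sample lengths, not all equal to S
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```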

I thought about Thomas's suggestion to link the distill.pub article, but I'm not sure about it. I think that should be up to y'all to decide.

I have no experience with .rst, so it might not render as expected :)

t-vi ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18415

Differential Revision: D14691969

Pulled By: soumith

fbshipit-source-id: 381a2d52307174661c58053ae9dfae6e40cbfd46
2019-03-30 01:26:34 -07:00
2a58fd9844 Fallback kernels (#18443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18443

Allow registering a kernel without a dispatch key. In this case, the kernel becomes a fallback kernel that is called whenever no other kernel matches.
This is also useful for the legacy function based API (since that API doesn't know about dispatch keys) or any other custom ops that don't care about dispatch
and just want one kernel to be called no matter the dispatch key.

Reviewed By: dzhulgakov

Differential Revision: D14603258

fbshipit-source-id: 242dc8871dad2989ca25079854d0cc97429e7199
2019-03-30 00:07:34 -07:00
f4e87e193a Introduce lambda-based kernel API (#18541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18541

Allow registering lambdas as c10 kernels.

Reviewed By: dzhulgakov

Differential Revision: D14653005

fbshipit-source-id: f867cc776b1339e83b7a2e1935f5cf924cfba44a
2019-03-30 00:07:31 -07:00
24752eb7b8 Report better errors when kernel or dispatch key are missing (#18302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18302

These might be use cases we want to support in the future, but they don't work yet.
Let's at least report an error instead of segfaulting or worse.

Reviewed By: dzhulgakov

Differential Revision: D14572346

fbshipit-source-id: 49262ce131493bc887defe2978d8b22f202cd8cc
2019-03-30 00:07:28 -07:00
48e7f98917 Move stuff to cpp files (#18301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18301

Move code out of headers and templates into source files and non-templates.

Reviewed By: dzhulgakov

Differential Revision: D14572347

fbshipit-source-id: 9fd5d62d54000a95e93076cd73f591ba2c5c2653
2019-03-30 00:07:25 -07:00
14c28fabd2 Check kernel against function schema in c10 op registration (#18256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18256

This diff infers the function schema from the kernel function/functor and checks that it matches the specified function schema.

This diff does not allow (yet) to omit specifying the function schema in the registration API. That will come in a future diff.

Reviewed By: dzhulgakov

Differential Revision: D14552738

fbshipit-source-id: 00202b489ede19f26ae686c97416b38c72c11532
2019-03-30 00:07:22 -07:00
c4bb09cc42 Add functor- and function-based kernel registration API (#18162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18162

- Adds the API to register a functor- and function-based kernel.
- Change the experimental c10 ops to use this new API instead of the old one
- Deletes the old APIs in KernelRegistration.h and OpSchemaRegistration.h

Reviewed By: dzhulgakov

Differential Revision: D14514239

fbshipit-source-id: 35b2f6e8f62964e54886450a6a5fac812ed20f26
2019-03-30 00:07:19 -07:00
9abc8a5b47 New operator registration MVP (#18161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18161

This introduces version 0 for the new operator registration.

For now, it only works with kernels that are defined as stack-based functions.
This is actually not the intended public API for defining kernels, but it's the basis which is going to be used to define the public APIs (see diffs on top for them),
and it's also the API used for exposing caffe2 operators.

This diff also switches the mechanism for exposing caffe2 operators to the new mechanism.

Reviewed By: dzhulgakov

Differential Revision: D14514231

fbshipit-source-id: 454ab7b5b46a10203aa27b175400d23f818dd1df
2019-03-30 00:07:16 -07:00
6095814229 Fix trt installation in CI (#18609)
Summary:
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build is failing
```
...
Mar 29 04:44:46 Need to get 174 MB of archives.
Mar 29 04:44:46 After this operation, 576 MB of additional disk space will be used.
Mar 29 04:44:46 Do you want to continue? [Y/n] Abort.
Exited with code 1
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18609

Differential Revision: D14694990

Pulled By: bddppq

fbshipit-source-id: 260446a8650f660a2baf123a3f17efdf0a8d6c64
2019-03-29 22:47:29 -07:00
24db1667da Attribute serialization improvements (#18188)
Summary:
* adds attributes to `ScriptModule.__getattr__` so they can be accessed in Python after re-importing
* full support for all the possible values for an `int64_t`
    * this necessitated a bunch more `pushWhatever` functions, so a templated version was re-introduced to cut down on duplicate code
* tests to validate references / value sharing works
* adds `torch.jit.Unpickler` which people can use to de-serialize the pickle files into Python / have a quick reference on how to do this without PyTorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18188

Differential Revision: D14527490

Pulled By: driazati

fbshipit-source-id: efd15579cc04aa2e28c4b2c9490d82d849dee559
2019-03-29 19:10:12 -07:00
e13101e069 support pre-convert filter format for mkldnn training mode and change 'OptimizeForIdeep' to 'OptimizeForMkldnn' (#15171)
Summary:
For MKL-DNN, the filter data will be reordered to the primitive format, which takes a lot of time.
So this patch provides a method to convert the filter format before training.
Also, "OptimizeForIdeep" is changed to "OptimizeForMkldnn" in this patch.
 This patch depends on https://github.com/pytorch/pytorch/pull/12866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15171

Differential Revision: D14590741

Pulled By: yinghai

fbshipit-source-id: 07971c9977edac3c8eec08ca2c39cda639683492
2019-03-29 19:00:48 -07:00
d73c830e23 Tensor construction codemod(raw_mutable_data) (#16373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16373

motivation: https://github.com/pytorch/pytorch/pull/12407
This is a manual diff.
Most of the fixes should be:

```
auto* Y = Output(0);
Y->Resize(dims);
Y->raw_mutable_data(dtype);
```
-->
```
auto* Y = Output(0, dims, at::dtype(dtype));
```
But there might be other cases.

Reviewed By: dzhulgakov

Differential Revision: D13725460

fbshipit-source-id: 649a4b0e42f62cda1a60171dd9fa3e440dc9dca1
2019-03-29 18:36:46 -07:00
7b0ef31780 Add hash() global (#18258)
Summary:
This adds `hash()`, which supports `int`, `str`, and `float`. It relies on `std::hash`, which is implementation-defined, so the result of `hash()` in TorchScript is not the same as in Python, but it should satisfy the same properties.
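
A short sketch of what this enables in TorchScript (the function and names are illustrative):

```python
import torch

@torch.jit.script
def bucket(key: str, num_buckets: int) -> int:
    # hash() maps to std::hash, so values differ from CPython's hash(),
    # but equal inputs still hash equally
    return hash(key) % num_buckets

print(bucket("foo", 8))
```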
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18258

Differential Revision: D14692317

Pulled By: driazati

fbshipit-source-id: 909df5d024bb3feea157d5a203b7de53c72261c9
2019-03-29 18:29:34 -07:00
a5ddecd00c Move fuser to test_jit_fuser (#18590)
Summary:
Start of breaking up test_jit.py

New files will have the format test_jit_* so they are easily greppable; they remain in the same directory so we don't have to go through multiple sources for imports.

I am adding a test that's expected to fail to be sure it's running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18590

Reviewed By: wanchaol

Differential Revision: D14677094

Pulled By: eellison

fbshipit-source-id: 9782c6aa9525bb6f332fc75cfff004c83a417522
2019-03-29 18:13:26 -07:00
85f36014e2 Experimental logging/counters API (#18235)
Summary:
This defines a generic counters API that users can utilize to provide monitoring functionality in e.g. a production service. We expose both counters for runtime internals as well as a TorchScript API to create user-defined counters. Synopsis of the API:

- `torch/csrc/jit/script/logging.h` specifies the externally-facing API in C++
- `torch/jit/_logging.py` specifies the Python API

We use an interface, `LoggerBase`, to define the interactions between users and a logging backend. Implementing a subclass of `LoggerBase` allows the user to handle these events in a custom way, such as logging into a DB or calling into an infra-specific counters API.

From the frontend perspective, we can create log events in two ways:
1. We provide an `add_stat_value(name, val)` function. This calls into the Logger backend with a key/value pair. For example, we might call `add_stat_value('foo', 1)` to bump an event counter.
2. We provide a `time_point()` function to record a timestamp in nanoseconds. This can be used in conjunction with `add_stat_value` to record runtime wall clock durations.

Examples of frontend usage can be found in `test_jit.py TestLogging`.

We provide a trivial `LockingLogger` implementation as an example and for testing purposes. It is likely not ready for production usage. It demonstrates that a backend implementing the API can do things like specify aggregation types and report these aggregate stats via the `get_counters()` API.
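
A rough usage sketch of the frontend API described above; treat the module path and exact call shapes as illustrative, based only on the summary:

```python
import torch.jit._logging as jit_logging   # Python API location per the summary

start = jit_logging.time_point()                    # timestamp in nanoseconds
# ... run some scripted code ...
elapsed = jit_logging.time_point() - start
jit_logging.add_stat_value('my_op.count', 1)        # bump an event counter
jit_logging.add_stat_value('my_op.duration_ns', elapsed)
```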
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18235

Differential Revision: D14545060

Pulled By: jamesr66a

fbshipit-source-id: 04099543a1898cfdd411511e46e03d5dce9b4881
2019-03-29 17:14:03 -07:00
e2fd1d966f Revert D14668859: [pytorch][PR] Re-land Parsing file check
Differential Revision:
D14668859

Original commit changeset: 3825a35ddc61

fbshipit-source-id: f3343ec6b63fe8f1f04959adfac4331865990047
2019-03-29 17:14:00 -07:00
47e2772320 Update argument names of torch::autograd::FunctionPostHook (#18140)
Summary:
They are called as (outputs, inputs) and were named (inputs, outputs).

A possible follow-up fix is to make the outputs argument an lvalue to allow calling multiple post hooks without ever copying the outputs vector. It looks like the copy is currently forced because the hook takes a const reference as input and returns a value. This would change the prototype of the function, so it needs further discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18140

Differential Revision: D14684498

Pulled By: pietern

fbshipit-source-id: 1bd3ddbdd1ff7fe0a18241de5a9ec745a4e7ef07
2019-03-29 16:30:23 -07:00
81b73951f1 note on updating existing source (#18409)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18388
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18409

Differential Revision: D14597666

Pulled By: soumith

fbshipit-source-id: 156104c0cd19da06f6f96a225228d1e8cf831af1
2019-03-29 16:14:47 -07:00
393731ab24 Re-land Parsing file check (#18570)
Summary:
The last time I tried to land it there was a merge race with the docs coverage test lol. Re-landing with the fix.

Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570

Differential Revision: D14668859

Pulled By: eellison

fbshipit-source-id: 3825a35ddc6179a0d433d70d22b5c1a96c20b21a
2019-03-29 15:46:59 -07:00
1240327c5c Refactoring serialization of ONNX initializers to be name-based (Resubmission) (#17830)
Summary:
houseroad - this is the resubmission of https://github.com/pytorch/pytorch/pull/17420, as suggested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17830

Reviewed By: zrphercule

Differential Revision: D14398714

Pulled By: houseroad

fbshipit-source-id: bda475f1ae8a5273ebdb0f6883fc66036c29d326
2019-03-29 15:23:29 -07:00
fca9d9a100 Initial implementation of InsertObserverNodes pass. (#18152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18152
ghimport-source-id: 1dd5e62c4d93394dcd8d8af2871554575c8d3d1a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18152 Initial implementation of InsertObserverNodes pass.**
* #18151 Add quant-passes stubs.

gh-metadata: pytorch pytorch 18150 gh/zolotukhinm@gmail.com/2/head

Differential Revision: D14584223

fbshipit-source-id: 30896acc1a8901d22c6a167eb87d2fbaafbbeb6f
2019-03-29 15:08:57 -07:00
84f020fe09 Fix bug in tensor feed which caused crash due to wrong tensor type (#18552)
Summary:
In the blob feeder for the ideep device, the wrong device option was given, which led to a crash.
This patch corrects the device option to fix the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18552

Differential Revision: D14679838

Pulled By: yinghai

fbshipit-source-id: bde11e6a6fe44822166881dcb7c9bd0b34b4ecf3
2019-03-29 14:12:36 -07:00
e3b1758f19 Upgrade mkldnn-bridge to revert tensor capacity patch and prepare for DNNLOWP support (#18471)
Summary:
1. Upgrade mkldnn-bridge to revert tensor capacity patch to avoid ASAN issue.
2. Prepare for DNNLOWP support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18471

Differential Revision: D14621569

Pulled By: yinghai

fbshipit-source-id: 9df300b77d0f2acd1a4f63c2925b7a7cab7a474e
2019-03-29 13:54:04 -07:00
f4e35d30ed register BoxWithNMSLimit with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17956

Reviewed By: houseroad

Differential Revision: D14417300

fbshipit-source-id: eb5e2ba84513b3b7bfa509dc442424b13fe9148f
2019-03-29 13:41:40 -07:00
d895d30876 Fix c10d build without nccl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18582

Differential Revision: D14672928

Pulled By: gchanan

fbshipit-source-id: 74e9805cbaf5ebe8e3f579fe08dad72eb410b80a
2019-03-29 13:34:38 -07:00
6ebfbdf4c6 Add named submodule support to nn::Sequential (#17552)
Summary:
Previously, we were not able to assign names to `nn::Sequential`'s submodules. This PR adds this feature to match the Python API. Example use:
```cpp
Sequential sequential(named_submodule({
      {"linear", Linear(10, 3)},
      {"conv2d", Conv2d(1, 2, 3)},
      {"dropout", Dropout(0.5)},
      {"batchnorm", BatchNorm(5)},
      {"embedding", Embedding(4, 10)},
      {"lstm", LSTM(4, 5)}
}));
```

It also enables loading the parameters of a Python `nn.Sequential` module with custom submodule names into the C++ frontend, unblocking https://github.com/pytorch/vision/pull/728#issuecomment-466661344.
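
For comparison, a minimal sketch of the Python counterpart, whose state_dict keys (e.g. "linear.weight") then match the C++ names above when loading:

```python
import torch.nn as nn
from collections import OrderedDict

model = nn.Sequential(OrderedDict([
    ('linear', nn.Linear(10, 3)),
    ('dropout', nn.Dropout(0.5)),
]))
print(list(model.state_dict().keys()))  # ['linear.weight', 'linear.bias']
```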
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17552

Differential Revision: D14246834

Pulled By: yf225

fbshipit-source-id: 3030b5c5d68f6dd5d3e37ac4b4f98dc6d6d9ba72
2019-03-29 13:06:29 -07:00
e73be58ff7 Rename btriunpack to lu_unpack (#18529)
Summary:
Changelog:
- Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface.
- Rename all relevant tests, fix callsites
- Create a tentative alias for `lu_unpack` under the name `btriunpack`, and add a deprecation warning to discourage its use (see the sketch below).
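
A minimal usage sketch of the renamed function:

```python
import torch

A = torch.randn(3, 3)
LU, pivots = torch.lu(A)               # formerly btrifact
P, L, U = torch.lu_unpack(LU, pivots)  # formerly btriunpack
print(torch.allclose(P @ L @ U, A))    # True: reconstructs A
```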
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529

Differential Revision: D14683161

Pulled By: soumith

fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1
2019-03-29 13:01:30 -07:00
be2ac6828c fix lint (#18623)
Summary:
Fix lint
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18623

Differential Revision: D14686265

Pulled By: eellison

fbshipit-source-id: 4bbe0f5bc58f508cbf4bc1baef2029ce1eaa42d8
2019-03-29 11:50:12 -07:00
62d8c8cf0a Manual hipify caffe2/distributed and rocm update (no hcc modules support) (#18088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18088

Manually hipify the distributed folder

Reviewed By: bddppq

Differential Revision: D14482702

fbshipit-source-id: cc0abdf525b423ab1f18db8010d21e27c6668d36
2019-03-29 11:07:32 -07:00
7c438c82eb Change dnnlowp log level from warning to v2 (#18576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18576

As in title

Reviewed By: feiyu1990

Differential Revision: D14670898

fbshipit-source-id: 1983099b2ba57daab393278553f10dcdb1812fdf
2019-03-29 09:29:25 -07:00
c0a2452ffe multiline KeyError msg python bug workaround (#18557)
Summary:
Make the multiline KeyError msg readable by working around a Python bug: https://bugs.python.org/issue2651

discussion: https://github.com/pytorch/pytorch/issues/16647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18557

Differential Revision: D14681086

Pulled By: soumith

fbshipit-source-id: acbd13a823302c854c3d364028ed414fd8ce6bc8
2019-03-29 07:04:20 -07:00
95d3825e48 ReduceLrOnPlateau: best=current -> best=copy(current) (#16364) (#16697)
Summary:
Fixes #16364
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16697

Differential Revision: D14680879

Pulled By: soumith

fbshipit-source-id: c50c22f3eacea4474fb3a04fe85fbf11d5a177c9
2019-03-29 06:56:51 -07:00
cf444f3544 make InstanceNorm1d raise an error if the input is 2D (#11992)
Summary:
Resolves #11991.

Any comment is welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11992

Differential Revision: D14680974

Pulled By: soumith

fbshipit-source-id: 8e287a9c32bf43b35edc9d127f16ed6b72c61d91
2019-03-29 06:50:04 -07:00
c189eba3e1 Fixed torch.arange docs (#18604)
Summary:
Kindly let me know if it's okay and if there are any places where I need to make a fix. Closes #18534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18604

Differential Revision: D14680712

Pulled By: soumith

fbshipit-source-id: 030e4a3d8f7839cbe2b8a3ef386323f0d39eb81a
2019-03-29 06:42:28 -07:00
e22a2b9015 Minor fixes in fastrnns benchmarks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18613

Reviewed By: wanchaol

Differential Revision: D14681838

fbshipit-source-id: 60bd5c9b09398c74335f003cd21ea32dd1c45876
2019-03-29 01:22:28 -07:00
d859031ebf Rename btrifact* to lu (#18435)
Summary:
Changelog:

- Renames `btrifact` and `btrifact_with_info` to `lu` to remain consistent with other factorization methods (`qr` and `svd`).
- Now, we will only have one function and method named `lu`, which performs the `lu` decomposition. This function takes a get_infos kwarg which, when set to True, includes an infos tensor in the returned tuple (see the sketch below).
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
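
A minimal usage sketch of the renamed interface:

```python
import torch

A = torch.randn(3, 3)                       # single square matrix, no manual unsqueeze needed
LU, pivots = A.lu()                         # replaces btrifact
LU2, pivots2, infos = A.lu(get_infos=True)  # replaces btrifact_with_info
print(infos)                                # 0 means the factorization succeeded
```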
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435

Differential Revision: D14680352

Pulled By: soumith

fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
2019-03-29 00:34:30 -07:00
c21e763cd6 Optimize relu op on GPU (#18506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18506

Optimize relu op on GPU

Reviewed By: houseroad

Differential Revision: D14633171

fbshipit-source-id: bd3afa9a0bae1325d32ad4153736a0c7ecb0ec64
2019-03-29 00:23:24 -07:00
a5a1c9a171 Automatic update of fbcode/onnx to fb1a80692c1ab0bd27b1072f2e7bffacba336777 (#18585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18585

Previous import was b29e78a4efb8e5d8995f576bbf19a846807829b6

Included changes:
- **[fb1a8069](https://github.com/onnx/onnx/commit/fb1a8069)**: Fix wrongly handled attribute in MVN and test generating scripts (#1877) <Raymond Yang>
- **[b22041c3](https://github.com/onnx/onnx/commit/b22041c3)**: Add dilation attribute to MaxPool (#1864) <karljang>

Reviewed By: zrphercule, benoitsteiner

Differential Revision: D14668623

fbshipit-source-id: fa7f44b1ecc949d8dd654939d20b1e93db98b1d2
2019-03-28 23:47:10 -07:00
0fd1ee3145 Automatic update of fbcode/foxi to 81e1683d6348eee4b5ed1145222dc2c41be4269c (#18596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18596

Previous import was 2bcc4064c90e87b9638615c733485f07c47b7558

Included changes:
- **[81e1683](https://github.com/houseroad/foxi/commit/81e1683)**: Merge pull request #9 from zrphercule/add_foxi_quantization <Rui Zhu>
- **[580559c](https://github.com/houseroad/foxi/commit/580559c)**: char=>uint8 <zrphercule>
- **[1a572f7](https://github.com/houseroad/foxi/commit/1a572f7)**: add quantization <zrphercule>

Reviewed By: zrphercule

Differential Revision: D14677404

fbshipit-source-id: 09429b3bf0e7783a25b8145020e505761bad887d
2019-03-28 23:24:30 -07:00
ff4b6d1a49 Delete batch tensor (#18575)
Summary:
Deleting batch tensor, since we are no longer maintaining the project and keeping it functional is blocking other improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18575

Differential Revision: D14671126

Pulled By: eellison

fbshipit-source-id: b42d5b699c4d12171ed95e6d3a977532167f0d2c
2019-03-28 23:13:27 -07:00
b1a0233ee4 Update NNPACK to current master (#18580)
Summary:
This fixes builds on x86 (32 bits).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18580

Differential Revision: D14672462

Pulled By: soumith

fbshipit-source-id: 7629b001c2bfa3e5b6ade7f1b03a8280232a4c16
2019-03-28 22:23:08 -07:00
1c3428af31 Enhance build_ios.sh to be consistent with build_android.sh (#18564)
Summary:
1. Enhance build_ios.sh to be consistent with build_android.sh;
2. Add docs for build_ios.sh.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18564

Differential Revision: D14680752

Pulled By: soumith

fbshipit-source-id: 6d2667ed8a3c85a057a522838f5d0461dd4788cf
2019-03-28 21:37:55 -07:00
3752916132 Serialization supports pathlib.Path object for the input argument (#18562)
Summary:
This allows passing a pathlib.Path object to torch.load as the input argument.
Fixes #16607
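
A minimal sketch of the new behavior (the checkpoint file name is illustrative):

```python
from pathlib import Path
import torch

torch.save({"step": 1}, "checkpoint.pt")
state = torch.load(Path("checkpoint.pt"))   # a Path is now accepted, not just a str
```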
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18562

Differential Revision: D14668255

Pulled By: soumith

fbshipit-source-id: 0ae4f7c210918582912f2d1ef2a98f1ab288c540
2019-03-28 21:01:15 -07:00
12abc8a99a Target and input sizes mismatch warning in L1 Loss / L1 Smooth Loss (#18565)
Summary:
Adding the same warning message already present in the mse_loss function to the L1 losses when the input and target sizes differ.
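
A minimal sketch of the case the warning targets (shapes are illustrative): the (5, 1) and (5,) shapes broadcast to (5, 5), which is usually not what was intended.

```python
import torch
import torch.nn.functional as F

inp = torch.randn(5, 1)
target = torch.randn(5)
loss = F.l1_loss(inp, target)   # now emits a UserWarning about mismatched sizes
```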
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18565

Differential Revision: D14671415

Pulled By: soumith

fbshipit-source-id: 01f5e1fb1ea119dbb2aecf1d94d0cb462f284982
2019-03-28 20:49:51 -07:00
1989716ae5 Resubmit PR-18512: Improved onnx export for 3 onnx ops (#18571)
Summary:
Fix ROCm CI failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18571

Differential Revision: D14669323

Pulled By: bddppq

fbshipit-source-id: 022afe5c20e680295c9cfdfe1ec14650305955a8
2019-03-28 18:12:49 -07:00
2f174e9453 in caching allocator, ignore and clear the error if not ready
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18584

Differential Revision: D14675041

Pulled By: bddppq

fbshipit-source-id: c1fab797e0d224e0a481a0395a3f9975c4265ff6
2019-03-28 17:53:30 -07:00
600eeecbf4 Add external callbacks into RecordFunction (#17844)
Summary:
Add a way to insert external callbacks into PT's RecordFunction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17844

Differential Revision: D14399664

Pulled By: ilia-cher

fbshipit-source-id: 76654799811fefd3ffed4abfb46ed95b492cebab
2019-03-28 17:48:45 -07:00
11ac0cf276 Implement rotated generate_proposals_op without opencv dependency (CPU version)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18533

Reviewed By: ezyang

Differential Revision: D14648083

fbshipit-source-id: e53e8f537100862f8015c4efa4efe4d387cef551
2019-03-28 17:02:50 -07:00
1ae2c1950c Use SetOutputTensor instead of copying outputs manually (#17770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17770

As title

Reviewed By: dzhulgakov

Differential Revision: D14370937

fbshipit-source-id: f415490c38556cf03bb13dce3643775331483448
2019-03-28 16:01:33 -07:00
aea8ee1f68 Fix NCCL/Gloo process groups and DDP stream sync bug (#18465)
Summary:
DDP with NCCL backend uses a [worker stream](d3eb941ed9/torch/csrc/distributed/c10d/ddp.cpp (L142)) to flatten grad batch
tensors, and passes the flattened tensor to [another stream](d3eb941ed9/torch/lib/c10d/ProcessGroupNCCL.cpp (L379)) to
conduct ncclAllReduce. The flattened tensor has to record the
ncclAllReduce stream, otherwise multiple streams might access the
same memory space.
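
A minimal sketch of the stream-recording pattern the fix relies on, assuming a CUDA device (the tensor and streams here stand in for the flattened gradient batch and the allreduce stream):

```python
import torch

producer = torch.cuda.current_stream()
flat = torch.randn(1024, device="cuda")    # produced on the producer stream
allreduce_stream = torch.cuda.Stream()
allreduce_stream.wait_stream(producer)     # order the work across streams
with torch.cuda.stream(allreduce_stream):
    flat.record_stream(allreduce_stream)   # the tensor remembers the consumer
                                           # stream so its memory is not reused early
    flat += 1                              # stand-in for the ncclAllReduce work
```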

cc ppwwyyxx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18465

Differential Revision: D14613449

Pulled By: mrshenli

fbshipit-source-id: b62773732552d12cc87b7adeb6897e9e11753ea9
2019-03-28 15:12:40 -07:00
9eb0f435d9 Inference LSTM integration test (#18559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18559

Adding integration test for inference LSTM

Reviewed By: houseroad

Differential Revision: D14656698

fbshipit-source-id: 80fb2a72be30fcb695f4471b72bf9d6e3965bf81
2019-03-28 11:31:06 -07:00
aa20591baa Add Slot type to abstract the raw pointers being used for slots. (#18226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18226
ghimport-source-id: b9ec8651212875b30971cc6859d2ddec6559ae3a

If modules become first-class IValues, then the slots will no longer be raw pointers but (IValue, index) pairs. This commit inserts the Slot abstraction so that this change can be made in later patches.

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18226 Add Slot type to abstract the raw pointers being used for slots.**

Differential Revision: D14542022

fbshipit-source-id: b81d7f4334c983d663e7551bda82df43680d7c5f
2019-03-28 10:35:36 -07:00
77280b11e3 Revert D14635130: Improved onnx export for 3 onnx ops.
Differential Revision:
D14635130

Original commit changeset: d54a2b6e2950

fbshipit-source-id: f624e2befdde245cb88435a95508b2a8e6b12e61
2019-03-28 10:26:34 -07:00
eee760dbd3 Improved onnx export for 3 onnx ops. (#18512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18512

Ceil and Floor have been supported since version 6 of ONNX: export them using the native ONNX ops instead of an ATen op.
Similarly, support for the Where op was added in version 9, so we don't need to wrap that op in an ATen op either.

Reviewed By: houseroad

Differential Revision: D14635130

fbshipit-source-id: d54a2b6e295074a6214b5939b21051a6735c9958
2019-03-28 08:55:21 -07:00
ffc7158bf2 Revert D14652372: [pytorch][PR] Add parsing to file check
Differential Revision:
D14652372

Original commit changeset: 7430b9d1dc2b

fbshipit-source-id: fa3d0f68515fe53447746469844d2db20c1292e0
2019-03-28 00:12:47 -07:00
b0d9712938 C++17.h: forward -> c10::guts::forward (#18492)
Summary:
Use c10::guts::forward instead of forward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18492

Reviewed By: smessmer

Differential Revision: D14625513

Pulled By: ilia-cher

fbshipit-source-id: 8bc4e20f102fe2a107a22f3e172882d60b95ab0e
2019-03-27 21:14:07 -07:00
9696f06bcf Use __ldg for CUDA kernels in fuser (#18540)
Summary:
While benchmarking a kernel with broadcasted inputs, I noticed
that it was much slower than a hand-coded kernel for the same task.

The kernel in question computed a * b + c for a of shape
32 x 32 x 10240 and b and c of shape 1 x 32 x 1.

This patch accelerates said kernel from 450us to 250us on my GTX1080Ti.

I didn't change half because there doesn't seem to be __ldg for
half.

An alternative could be to sprinkle const and restrict.
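
A sketch of a script that produces the benchmarked fused kernel (sizes taken from the description above; assumes a CUDA device):

```python
import torch

@torch.jit.script
def f(a, b, c):
    return a * b + c   # broadcasted multiply-add, a candidate for fusion

a = torch.randn(32, 32, 10240, device="cuda")
b = torch.randn(1, 32, 1, device="cuda")
c = torch.randn(1, 32, 1, device="cuda")
out = f(a, b, c)       # after warm-up runs, executes as a single fused kernel
```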
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18540

Differential Revision: D14657840

Pulled By: soumith

fbshipit-source-id: 408847346ec12d1d1d9b119ac50bbc70f0d9ed33
2019-03-27 20:22:17 -07:00
8635078d9e Adds Cyclical Learning Rate and Momentum (#18001)
Summary:
This implements a cyclical learning rate (CLR) schedule with an optional inverse cyclical momentum. More info about CLR: https://github.com/bckenstler/CLR

This is finishing what #2016 started. Resolves #1909.
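
A minimal usage sketch of the new scheduler (hyperparameters are illustrative):

```python
import torch

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
sched = torch.optim.lr_scheduler.CyclicLR(opt, base_lr=0.001, max_lr=0.1)
for _ in range(100):
    opt.step()
    sched.step()   # lr cycles between base_lr and max_lr; momentum cycles inversely
```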
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18001

Differential Revision: D14451845

Pulled By: sampepose

fbshipit-source-id: 8f682e0c3dee3a73bd2b14cc93fcf5f0e836b8c9
2019-03-27 19:56:04 -07:00
54abfda124 Completely synchronize behavior of Facebook flake8 and public flake8. (#18538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18538
ghimport-source-id: 665b09f158d1c5dd94686d4212792504b55b7f73

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18538 Completely synchronize behavior of Facebook flake8 and public flake8.**

Previously, developers at Facebook had the very funny experience
wherein /usr/local/bin/flake8 behaved differently than a freshly
installed flake8 from pip.  In this commit, I add enough ignores to
.flake8 and install enough plugins to make the Facebook flake8
and public flake8 line up exactly.  This means you don't have
to care which flake8 you use; they all will report accurate information
on your Python files.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14652336

fbshipit-source-id: ba7776eaa139cf2e3df2e65349da6fd7c99acca4
2019-03-27 19:51:21 -07:00
8faf0112f3 add slow tests annotation to some jit tests (#18545)
Summary:
Adds slow test annotation to the following very slow tests -

70.33s     test/test_jit.py::TestScript::test_script_module_script_resnet
32.33s     test/test_jit.py::TestBatched::test_beam_search
17.70s     test/test_jit.py::TestBatched::test_greedy_search
15.58s     test/test_jit.py::TestScript::test_script_module_trace_resnet18

The list of remaining slow tests is below. Let me know if you think any of the others should be added to slow tests as well. Slow tests will only run on master.

15.28s call     test/test_jit.py::TestJit::test_export_batchnorm
12.96s call     test/test_jit.py::TestEndToEndHybridFrontendModels::test_snli
11.65s call     test/test_jit.py::TestEndToEndHybridFrontendModels::test_neural_style
6.38s call     test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_1d
5.96s call     test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_2d_uneven_pad
5.91s call     test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_3d_custom_params
4.76s call     test/test_jit.py::TestJit::test_alexnet
3.82s call     test/test_jit.py::TestScript::test_number_math
3.81s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_no_bias
3.76s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_groups_thnn
3.65s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride_pad1circular
3.49s call     test/test_jit.py::TestBatched::test_lstm
3.33s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_pad2circular
3.19s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv1d_stride1_pad2circular
3.11s call     test/test_jit.py::TestEndToEndHybridFrontendModels::test_dcgan_models
3.11s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride_padding
3.11s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride
3.08s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_no_bias
3.08s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv1d_stride1_pad1circular
3.07s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_groups
3.05s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_dilated
3.05s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_depthwise_with_multiplier
3.04s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_groups
3.03s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_dilated
3.02s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_depthwise_dilated
3.02s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_dilated_strided
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18545

Differential Revision: D14656064

Pulled By: eellison

fbshipit-source-id: d17ee23c3b3679276cee983555d43e83ce099356
2019-03-27 19:27:23 -07:00
0daafe0209 Add parsing to file check (#18304)
Summary:
This allows you to embed checks in IR, making the test more readable.

E.g.
```
graph_str = '''graph(%0 : Double(5, 5)):
          # CHECK: aten::relu
          %1 : Double(5, 5) = aten::relu(%0)
          return (%1)'''
FileCheck().run(graph_str, parseIR(graph_str))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18304

Differential Revision: D14652372

Pulled By: eellison

fbshipit-source-id: 7430b9d1dc2b7584704375aac02d7392ecec76a0
2019-03-27 18:16:05 -07:00
ad1ebf7082 bug fix for node with writers in create autodiff subgraph (#18491)
Summary:
Previously we were moving nodes with writers into differentiable subgraphs, without necessarily preserving whether or not they were written to. This can lead to bugs with CSE, which needs that context.

I'm not completely sure if there's anything else we can do to be more aggressive here - inline these nodes and not run CSE and just run constant pooling, or possibly something else - but I think we should land this correctness condition first and then think further.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18491

Differential Revision: D14648562

Pulled By: eellison

fbshipit-source-id: bc1e444774ccdb708e22f0e06a477a221a231f9e
2019-03-27 16:08:03 -07:00
d74b11ce0e add extra info for the auto gen sum ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17934

Reviewed By: iroot900

Differential Revision: D14418689

fbshipit-source-id: 9e11e461001467f0000ea7c355d5b0f0d738fa85
2019-03-27 14:56:32 -07:00
58f3712ceb Clarify error text of the pin_memory function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18530

Reviewed By: ezyang

Differential Revision: D14647578

Pulled By: VitalyFedyunin

fbshipit-source-id: ddd70240d52d2e9a96e26f5a0dfea8d76fe25078
2019-03-27 14:56:29 -07:00
6684ef3f23 Move fast rnn benchmark to pytorch/pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18369

Differential Revision: D14652039

Pulled By: wanchaol

fbshipit-source-id: 1177b1f60d96672c3e2c9d527b56ee06ca7c0af1
2019-03-27 14:46:09 -07:00
e4f1681c82 Rename isTensor api -> isCompleteTensor (#18437)
Summary:
isTensor has been brought up as misleading a couple of times; rename it to isCompleteTensor for clarity.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18437

Differential Revision: D14605223

Pulled By: eellison

fbshipit-source-id: 189f67f12cbecd76516a04e67d8145c260c79036
2019-03-27 14:46:06 -07:00
1eee2090d4 Const trace error v2 (#18535)
Summary:
Trying to reland https://github.com/pytorch/pytorch/pull/18298
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18535

Differential Revision: D14652391

Pulled By: eellison

fbshipit-source-id: 699e30045dd5f14f0a2b98378272045a292e1e2a
2019-03-27 14:40:56 -07:00
fdedc62c26 enable more unit tests (#18537)
Summary:
Enable unit tests working with ROCm 2.3. In particular, these are unit tests that we previously skipped for double data types, plus some tests for multi-GPU setups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18537

Differential Revision: D14651822

Pulled By: ezyang

fbshipit-source-id: 7dd575504ebe235a91489866c91000e9754b1235
2019-03-27 14:27:23 -07:00
c3e3c5cc39 Skip tests if C2/ONNX models cannot be read (#18494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18494

Today we have some C2 end-to-end tests that require reading model data from external filesystems (for example, Gluster and AWS). This can be a source of flaky tests when the external filesystems are not reachable during the run.

In this diff, we add try/catch logic around the places where we download models and open model files from external systems. If such an attempt fails, we catch the exception and let the unittest skip the current test instead of failing.

I also refactored the code a little by removing some duplicated logic for downloading and building the C2 model data. It had been duplicated in two classes and a few functions...

Reviewed By: yinghai

Differential Revision: D14442241

fbshipit-source-id: da8bf56c8d096efa34ca2070de5cd10a18aad70c
2019-03-27 11:21:44 -07:00
30da6c7d06 Add qtensors in caffe2 protobuf argument (#18486)
Summary:
We are about to merge onnxifi quantization support soon. Before that, I would like to merge this diff separately to make sure it doesn't break anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18486

Reviewed By: bddppq, houseroad

Differential Revision: D14626419

Pulled By: yinghai

fbshipit-source-id: 504c1eae60be1e629203267b59defb8b69d82c0a
2019-03-27 11:16:40 -07:00
defe67caf2 Generate sphinx docs with secure content. (#18508)
Summary:
There are a number of pages in the docs that serve insecure content. AFAICT this is the sole source of that.

I wasn't sure if docs get regenerated for old versions as part of the automation, or if those would need to be manually done.

cf. https://github.com/pytorch/pytorch.github.io/pull/177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18508

Differential Revision: D14645665

Pulled By: zpao

fbshipit-source-id: 003563b06048485d4f539feb1675fc80bab47c1b
2019-03-27 11:01:48 -07:00
8c3285bf11 Fix loss functions doc (#18420)
Summary:
Corrects a docstring display error on the web page caused by my previous PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18420

Differential Revision: D14642467

Pulled By: soumith

fbshipit-source-id: 16fdd3301a4c5bad27fbcd8686f7fbfcc1e908ee
2019-03-27 10:23:24 -07:00
81e030d9a6 Upgrade flake8-bugbear to master, fix the new lints. (#18507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18507
ghimport-source-id: 1c3642befad2da78a7e5f39d6d58732b85c76267

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18507 Upgrade flake8-bugbear to master, fix the new lints.**

It turns out Facebook is internally using the unreleased master
flake8-bugbear, so upgrading it grabs a few more lints that Phabricator
was complaining about but we didn't get in open source.

A few of the getattr sites that I fixed look very suspicious (they're
written as if Python were a lazy language), but I didn't look more
closely into the matter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14633682

fbshipit-source-id: fc3f97c87dca40bbda943a1d1061953490dbacf8
2019-03-27 08:07:41 -07:00
85d78a0532 Add export annotations for functions in c10 (#18464)
Summary:
Fixes #18461.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18464

Differential Revision: D14620963

Pulled By: ezyang

fbshipit-source-id: c11f3967de2ac69c7140767c8fe73a85555e9f40
2019-03-27 07:58:58 -07:00
a3933b87c6 Back out "Revert D14613517: [pytorch][PR] Updating onnxtrt submodule to master branch" (#18514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18514

Original commit changeset: d6267ddfc339

Reviewed By: bddppq

Differential Revision: D14634476

fbshipit-source-id: 2633b0b4c512d71001e5c20cd79c0c0d7856f942
2019-03-26 23:44:33 -07:00
eae7ad4ca8 Automatic update of fbcode/onnx to b29e78a4efb8e5d8995f576bbf19a846807829b6 (#18503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18503

Previous import was c05f2ae412daf8fd64136ca354b97ccf73e0ea6c

Included changes:
- **[b29e78a4](https://github.com/onnx/onnx/commit/b29e78a4)**: update copyright for open governance (#1885) <Prasanth Pulavarthi>
- **[3b0ecd55](https://github.com/onnx/onnx/commit/3b0ecd55)**: open governance (#1881) <Prasanth Pulavarthi>
- **[bbe28349](https://github.com/onnx/onnx/commit/bbe28349)**: Revert "Adding Reverse op (#1804)" (#1882) <Lu Fang>
- **[5be3e223](https://github.com/onnx/onnx/commit/5be3e223)**: Adding Reverse op (#1804) <Peyman Manikashani>

Reviewed By: zrphercule

Differential Revision: D14632717

fbshipit-source-id: 2637a4090e7071a59caff3a910fa4f077906bf3c
2019-03-26 21:58:22 -07:00
f3ddc40ca4 Move weight offload inside backend construction functor (#18385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18385

By moving the weight offload into the backend initialization function, we can instantiate the backend once by creating the OnnxifiOp once, and then clean up the parameter workspace. We need to keep hold of that instantiated net (OnnxifiOp) without cleaning it up. Subsequent ctors of OnnxifiOp for the same model will hit the cached backend and will not attempt weight offloading, which is safe as the weights are already gone.

Reviewed By: ipiszy

Differential Revision: D14590379

fbshipit-source-id: f7f34016e09777ad3df0af487885cd14658e1044
2019-03-26 21:03:17 -07:00
60538c8366 fix #16448 (#18479)
Summary:
Fixes #16448

bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18479

Differential Revision: D14635360

Pulled By: ezyang

fbshipit-source-id: 4010319fbce050dd0bdf4da3cd1171b9737f3c4c
2019-03-26 20:58:25 -07:00
f447b63ed0 Add section about .code to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18493

Differential Revision: D14634677

Pulled By: jamesr66a

fbshipit-source-id: 9ee065f6ce4218f725b93deb4c64b4ef55926145
2019-03-26 20:52:31 -07:00
45ec4920e3 how to use the ccache package on Ubuntu (#18495)
Summary:
Added full instructions for how to use the `ccache` package. Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18495

Differential Revision: D14635351

Pulled By: ezyang

fbshipit-source-id: 158e1052bae580e95f73644252fdbddcc0213128
2019-03-26 20:08:09 -07:00
d5861aa55c Append c10 libs to TorchConfig.cmake (#18418)
Summary:
Fixes #18416.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18418

Differential Revision: D14635322

Pulled By: ezyang

fbshipit-source-id: 81cb658f73583e4cd0358173617f747ebf4f7f8a
2019-03-26 19:53:02 -07:00
2ba41c5550 Add some missing docs for tensor methods and attributes, new unittest to enforce tensors.rst no longer miss anything (#16057)
Summary:
This depends on https://github.com/pytorch/pytorch/pull/16039

This prevents people (reviewers, PR authors) from forgetting to add things to `tensors.rst`.

When something new is added to `_tensor_doc.py` or `tensor.py` but intentionally not in `tensors.rst`, people should manually whitelist it in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16057

Differential Revision: D14619550

Pulled By: ezyang

fbshipit-source-id: e1c6dd6761142e2e48ec499e118df399e3949fcc
2019-03-26 18:05:56 -07:00
66e8c74814 Revert D14613517: [pytorch][PR] Updating onnxtrt submodule to master branch
Differential Revision:
D14613517

Original commit changeset: dd20d718db55

fbshipit-source-id: d6267ddfc339d04f182e2de1750a601c8d6bf8c6
2019-03-26 17:37:55 -07:00
e912632b74 Fix direct comparison of OperatorDef proto structs (#18466)
Summary:
argument order is allowed to differ

ajyu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18466

Differential Revision: D14627258

Pulled By: bddppq

fbshipit-source-id: 430e1fb1bea2c5639a547ae7c1652368788c86b9
2019-03-26 17:25:09 -07:00
66628f78b7 Revert D14605905: [pytorch][PR] Add return_counts to torch.unique
Differential Revision:
D14605905

Original commit changeset: 555f5a12a8e2

fbshipit-source-id: c7874f5987893e956c022180a37763d88bba38db
2019-03-26 17:18:01 -07:00
bdd098c694 Fix typo in Github links in elementwise_ops_schema.cc (#18018)
Summary:
s/elementwise_op_schema.cc/elementwise_ops_schema.cc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18018

Differential Revision: D14612291

Pulled By: soumith

fbshipit-source-id: 09276283b9ff92c039ce530165c62cc8421fb443
2019-03-26 15:37:26 -07:00
5292685d2f Improve numerical precision of (s)logdet (#18449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449

Differential Revision: D14611638

Pulled By: soumith

fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf
2019-03-26 15:32:14 -07:00
436723122e fix arange shape issue inconsistency across cpu and cuda (#18462)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18462

Differential Revision: D14620263

Pulled By: soumith

fbshipit-source-id: 223524cdda2f5d55c2ca8d4cdcf6f7a05a6c15eb
2019-03-26 15:27:24 -07:00
bbe110f4e1 Updating onnxtrt submodule to master branch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18441

Differential Revision: D14613517

Pulled By: bddppq

fbshipit-source-id: dd20d718db55942df9cce7acd1151d6902bc57ff
2019-03-26 14:25:55 -07:00
654e59fcac Minor fix for onnx ConstantOfShape export (#18199)
Summary:
Set value as tensor of 1 element instead of scalar, according to ONNX spec.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18199

Reviewed By: dzhulgakov

Differential Revision: D14542588

Pulled By: houseroad

fbshipit-source-id: 70dc978d870ebe6ef37c519ba4a20061c3f07372
2019-03-26 13:23:16 -07:00
5bff395a82 Namedtuple return for solve, slogdet, sort, topk (#17093)
Summary:
More ops for https://github.com/pytorch/pytorch/issues/394. ~~Also need to rebase after landing #16186, because we need to update the whitelist of the new unit test added in #16186.~~

cc: ezyang
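
A minimal sketch of the namedtuple returns (plain tuple unpacking continues to work):

```python
import torch

x = torch.randn(4, 5)
res = torch.topk(x, k=2, dim=1)
print(res.values.shape, res.indices.shape)          # named fields
values, indices = torch.sort(x, dim=1)              # unpacking still works
sign, logabsdet = torch.slogdet(torch.randn(3, 3))
```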
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17093

Differential Revision: D14620068

Pulled By: ezyang

fbshipit-source-id: deec5ffc9bf7624e0350c85392ee59789bad4237
2019-03-26 12:39:08 -07:00
c6bfcb854b Expose c10 operators to caffe2 by operator name (#18160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18160

When exposing a c10 operator to the caffe2 frontend, don't use the operator schema but use the operator name instead.
This allows us to get rid of the existing mechanism for operator schema registration in a diff stacked on top.

Reviewed By: dzhulgakov

Differential Revision: D14513420

fbshipit-source-id: 6b08a9c6d9497eaf18b62361dd44bc07c7b4b76b
2019-03-26 12:36:11 -07:00
3bbe204f32 Test running a CUDA build on CPU machine. (#18242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18242
ghimport-source-id: b949d312a48226a34f90304162e910acee7c95cd

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18242 Test running a CUDA build on CPU machine.**
* #18362 Add ability to query if built with CUDA and MKL-DNN.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584429

fbshipit-source-id: b54de5b33f0c795a7d9605d30576cdf9b74050fd
2019-03-26 12:31:11 -07:00
0aeaeffb6c Properly use cudaGetLastError return code. (#18485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18485

I don't know (1) how we landed the wrong version of the patch or (2) how
this passed the push-blocking test

Reviewed By: pjh5

Differential Revision: D14621961

fbshipit-source-id: 0a3953d7adcdc79727a61c2acff65f436dcafe55
2019-03-26 12:26:44 -07:00
265fa0ce4d Move math::Axpy function to elementwise lib (#18316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18316

Move math::Axpy function to elementwise lib

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14574697

fbshipit-source-id: 7cfbb2da295c8966c5328bd6b577cce2638eea62
2019-03-26 12:19:19 -07:00
6f3186a578 Upgrade mkldnn to version 0.18.1 (#18463)
Summary:
Upgrade mkldnn to version 0.18.1
Fix the MKLDNN build issue if linking with MKL 2019.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18463

Differential Revision: D14620228

Pulled By: ezyang

fbshipit-source-id: 136074ad0e4631e1dde4ca1b0af4ee6a41e50913
2019-03-26 11:00:25 -07:00
fa0bfa03ed Add Google tag (#17690)
Summary:
This PR adds a Global Site Tag to the site.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17690

Differential Revision: D14620816

Pulled By: zou3519

fbshipit-source-id: c02407881ce08340289123f5508f92381744e8e3
2019-03-26 10:35:24 -07:00
20159c3ffe remove redundant --install_dir parameter in GEN_COMMAND (#18473)
Summary:
remove the redundant --install_dir parameter in GEN_COMMAND, since the "--install_dir" parameter is already contained in ${GEN_COMMAND}.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18473

Differential Revision: D14620193

Pulled By: ezyang

fbshipit-source-id: ee9953b5d055f4b8beb3557f95f6539051b0028a
2019-03-26 10:22:00 -07:00
1a742075ee Resolving comments from Bool Tensor for CPU PR (#18165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18165
ghimport-source-id: 55cb3fb63a25c2faab1725b4ec14c688bf45bd38

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18166 Bool Tensor for CUDA
* **#18165 Resolved comments from Bool Tensor for CPU PR**
---
This is a follow-up PR that resolves some additional feedback on one of the previous Bool Tensor PRs.

gchanan, here is a list of almost all the comments from the original PR with respective fixes and replies:

**[utils/python_scalars.h]** why is this converting from uint8_t and not bool? (comment?)
When I was adding this, I was testing by creating a tensor and then calling its .tolist(). It worked equally well for bool and uint8_t, so I left uint8_t, as I thought it made more sense since we are calling PyBool_FromLong. Changing it to bool.

**[ATen/Dispatch.h]** better name?
Fixed.

**[test/test_torch.py]** what about other factories, such as full? (and more).
There is a test that goes through the factory methods - test_tensor_factories_empty. I added some bool cases above it and added a comment that once CUDA is done, I will unite them and it will iterate not just between CUDA and CPU but also over all types. Adding all bool cases now. Will unite in the CUDA PR.

**[generic/THTensorMath.h]** any changes in this file actually needed?
Bad merge. Fixed.

**[TH/THTensor.h]** this generates code for random, clampedRandom, and cappedRandom -- do we have tests for all of these with bool?
Added

**[c10/core/ScalarType.h]** I'm not very confident about the lack of Bool here -- can you look at the call sites and see what makes sense to do here?
Added bool to the macro and created a similar one without it for a single case, which otherwise fails the build with errors:

_./torch/csrc/jit/symbolic_variable.h:79:20: error: ambiguous overload for ‘operator*’ (operand types are ‘const torch::jit::SymbolicVariable’ and ‘torch::jit::Value*’)
return (*this) * insertConstant(rhs);_

Differential Revision: D14605105

fbshipit-source-id: abf82d50e8f8c50b386545ac068268651b28496d
2019-03-26 09:59:34 -07:00
515238e0a5 Unify cudaGetDeviceCount implementations. (#18445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18445
ghimport-source-id: 30d018737bf6989bc68b7e3676f44e0ca6141fde

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18445 Unify cudaGetDeviceCount implementations.**

I went about doing this by searching for calls to cudaGetDeviceCount,
and then methodically replacing them with references to c10::cuda::device_count()
or at::cuda::device_count().

There is a point to doing this: the various implementations wildly differed
in their handling of what to do when cudaGetDeviceCount returns an error.
The final standardized behavior is that **all errors are swallowed** and
we return device count of zero.  This indirectly fixes running CUDA builds
on CPU, which was broken in #17847.

I added 'noexcept' to the 'deviceCount' virtual method on DeviceGuardImpl.
This is a BC-breaking change for anyone inheriting from DeviceGuardImpl
but all you need to do is put 'noexcept' on your method and it is backwards
compatible with older libtorch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14612189

fbshipit-source-id: 3c8d186e3dd623c0e27625212c7ce30f75d943cb
2019-03-26 09:50:14 -07:00
cf094d4edc Use TensorIterator for unary operations
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18309

Differential Revision: D14591533

Pulled By: cpuhrsch

fbshipit-source-id: a3b0788a481bddf1803c9f2d3289263d7364f8d7
2019-03-26 09:22:52 -07:00
5e462a3ed6 Introduce SobolEngine (#10505)
Summary:
`SobolEngine` is a quasi-random sampler used to sample points evenly in [0, 1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.

Documentation has been added, and tests have been added based on Balandat's references. The implementation is an optimized / tensorized version of Balandat's Cython implementation as provided in #9332.

This closes #9332 .

cc: soumith Balandat
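
A minimal usage sketch:

```python
import torch

engine = torch.quasirandom.SobolEngine(dimension=3)
samples = engine.draw(5)   # 5 quasi-random points in [0, 1)^3, shape (5, 3)
```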
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10505

Reviewed By: zou3519

Differential Revision: D9330179

Pulled By: ezyang

fbshipit-source-id: 01d5588e765b33b06febe99348f14d1e7fe8e55d
2019-03-26 07:53:07 -07:00
9080942afb fix str of autogradzero
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18442

Differential Revision: D14602880

Pulled By: wanchaol

fbshipit-source-id: ebd00f9bb5f1f7e33964c10d8c9f165b7bb4985f
2019-03-25 23:49:27 -07:00
dc6b5b2a52 Optimize boolean expressions & unwraps (#18259)
Summary:
Simplify or eliminate boolean and/or expressions, optimize unwrapping a value that cannot be None, and optimize using `is` with a None and a non-None value.

Since the peephole optimizer now introduces constants, I added another constant propagation pass after running it.

Previously I had a PR that did this and also optimized shape ops - I will add the shape optimizations in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18259

Differential Revision: D14602749

Pulled By: eellison

fbshipit-source-id: 1c3f5a67067d8dfdf55d7b78dcb616472ea8a267
2019-03-25 21:50:57 -07:00
a729630cbf Fix python resolution in caffe2 CI scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18417

Differential Revision: D14612704

Pulled By: bddppq

fbshipit-source-id: 0942048a9c3990afc50ce73c1fa1005c4d4097aa
2019-03-25 20:56:15 -07:00
bf2a30cb22 Support dim=None for argmax and argmin (#18264)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/18263
cc: houseroad
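
A minimal sketch of the two calling conventions:

```python
import torch

x = torch.randn(2, 3)
flat_idx = torch.argmax(x)        # dim=None: index into the flattened tensor
row_idx = torch.argmax(x, dim=1)  # per-row indices, shape (2,)
```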
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18264

Reviewed By: ezyang

Differential Revision: D14559234

Pulled By: houseroad

fbshipit-source-id: c5b8623752d6c6af41c6d715fd9585a65294868d
2019-03-25 20:43:34 -07:00
e2730ddb21 Add return_counts to torch.unique (#18391)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/12598

This PR was originally authored by ptrblck at https://github.com/pytorch/pytorch/pull/15495, but since there was no update for months after the requested changes, I cloned that branch and resolved the code reviews here. Hope everything is good now. In particular, the implementation of count is changed from ptrblck's original algorithm to the one ngimel suggested, i.e. using `unique_by_key` and `adjacent_difference`.

The current implementation of `_unique_dim` is VERY slow at computing the inverse index and counts, see https://github.com/pytorch/pytorch/issues/18405. I will refactor `_unique_dim` in a later PR. For this PR, please allow me to keep the implementation as is.

cc: ptrblck ezyang ngimel colesbury
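
A minimal sketch of the new kwarg (note that this change is reverted further up this log and re-landed later):

```python
import torch

x = torch.tensor([1, 2, 2, 3, 3, 3])
values, counts = torch.unique(x, return_counts=True)
# values -> tensor([1, 2, 3]), counts -> tensor([1, 2, 3])
```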
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18391

Reviewed By: soumith

Differential Revision: D14605905

Pulled By: VitalyFedyunin

fbshipit-source-id: 555f5a12a8e28c38b10dfccf1b6bb16c030bfdce
2019-03-25 20:38:17 -07:00
ba4de667fa change dropout lowering in symbolic_script (#18375)
Summary:
Dropout is now eligible for fusion, and generated fused kernels are just as fast as dropout in ATen. Change its lowering in symbolic script so that it can actually be fused. It is still special-cased for CUDA, because without fusion this lowering is less efficient than the current one (bernoulli_ * input). Testing is covered by the test case that ailzhang added (test_dropout_cuda).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18375

Differential Revision: D14611938

Pulled By: soumith

fbshipit-source-id: 11b18f4784e6c9265e382a8f8deca7add8df3b37
2019-03-25 20:05:11 -07:00
a40e0a7f2d Add torch.version.git_version (#18299)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/18293
cc: colesbury
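
A minimal sketch of the new attribute:

```python
import torch

print(torch.__version__)            # e.g. '1.1.0'
print(torch.version.git_version)    # full git hash the binary was built from
```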
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18299

Differential Revision: D14611972

Pulled By: soumith

fbshipit-source-id: cdb48ef37c8869713a9a43ea0da08e1bed9279a2
2019-03-25 19:59:40 -07:00
674c274d92 Change deprecated IntList to IntArrayRef
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18262

Differential Revision: D14612244

Pulled By: ezyang

fbshipit-source-id: 5d21c7b94d64104fececcb15c6d38d9bd2a1fc70
2019-03-25 19:47:21 -07:00
d1e416ac73 Enable printing to stderr for test_proper_exit for better debugging (#18458)
Summary:
related to https://github.com/pytorch/pytorch/issues/16608
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18458

Differential Revision: D14611718

Pulled By: soumith

fbshipit-source-id: 6dc903ff2d32b9c3b76470869d1f4e9a67f706df
2019-03-25 19:20:21 -07:00
e1c272797b Don't require pygraphviz for regenerate.sh (#17485)
Summary:
closes #17336

Do not overwrite config.yml if script throws an error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17485

Differential Revision: D14604388

Pulled By: kostmo

fbshipit-source-id: 5024545e3a8711abdbc0800911c766929dbca196
2019-03-25 18:04:53 -07:00
13b95eac55 Add quant-passes stubs. (#18151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18151
ghimport-source-id: 7d12462971bdf3e5e26a3f150f1fcad05bba1a15

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18152 Initial implementation of InsertObserverNodes pass.
* **#18151 Add quant-passes stubs.**

gh-metadata: pytorch pytorch 18149 gh/zolotukhinm@gmail.com/1/head

Differential Revision: D14584224

fbshipit-source-id: b3d0b5ff797160d5ad23f91f732e627b0129086c
2019-03-25 17:48:54 -07:00
6a1a019c0a caffe2 - support flaky operator tests for caffe2 build (#18155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18155

- Make a Python decorator caffe2_flaky for caffe2 operator unit tests.
- The environment variable CAFFE2_RUN_FLAKY_TESTS is now used to mark flaky test mode

During a test run,
- If flaky test mode is on, only flaky tests are run
- If flaky test mode is off, only non-flaky tests are run

Mark ctc_beam_search_decoder_op_test as flaky

Reviewed By: ezyang, salexspb

Differential Revision: D14468816

fbshipit-source-id: dceb4a48daeb5437ad9cc714bef3343e9761f3a4
2019-03-25 16:58:34 -07:00
7a90bae416 Remove unused th_scalar_type (#18390)
Summary:
th_scalar_type seems to be unused anywhere, so it can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18390

Reviewed By: ezyang

Differential Revision: D14591374

Pulled By: izdeby

fbshipit-source-id: 2113aa81229cdfdfb8dc5c951ea6dea3725b8582
2019-03-25 15:55:10 -07:00
56c16fe26f Porting CPU UpSample functions to ATen (#18020)
Summary:
This PR resolves partially #10482
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18020

Differential Revision: D14598029

Pulled By: ezyang

fbshipit-source-id: 513e7c6438ab6d5dc3f43241e7cb724744e9a287
2019-03-25 14:39:13 -07:00
ed8c462dc7 Fix caffe2 build with BLAS=OpenBLAS (#18422)
Summary:
g++ complains about failing to find the declarations of the cblas_sscal and cblas_dscal BLAS functions;
let's fix it :)

fedora 29, gcc 8.3.1, openblas 0.3.5
build with cmake -DBLAS=OpenBLAS ..
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18422

Differential Revision: D14598977

Pulled By: soumith

fbshipit-source-id: bde77bfb359d2ff38226401caeed78c114ef7468
2019-03-25 11:59:10 -07:00
6c9b312fd4 Add addcmul, lerp to fuser, enable scalar->float specialization in symbolic script (#18081)
Summary:
This PR did two things:

1. Enable scalar->float specialization in symbolic script, so AD formula that contains scalar in the schema, should write `float` instead.
2. add addcmul, lerp to AD and fuser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18081

Differential Revision: D14490493

Pulled By: wanchaol

fbshipit-source-id: b3b86d960d5f051b30733bc908b19786111cdaa4
2019-03-25 11:05:45 -07:00
50df3e5e2e Add ability to query if built with CUDA and MKL-DNN. (#18362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362
ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18362 Add ability to query if built with CUDA and MKL-DNN.**

Fixes #18108.
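
A sketch of the query, assuming it is exposed through `torch.backends` as in later releases (the exact names here are an assumption, not confirmed by this commit message):

```python
import torch

# assumed surface of the build-time capability queries
print(torch.backends.cuda.is_built())        # compiled with CUDA support?
print(torch.backends.mkldnn.is_available())  # compiled with MKL-DNN support?
```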

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584430

fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473
2019-03-25 10:39:09 -07:00
17fcdfb925 Updating submodules
Reviewed By: yns88

fbshipit-source-id: b2c5eb7dfa9048e399461c00d1103e945a30a5bc
2019-03-25 10:32:26 -07:00
5653a914f7 Implement reference counting for shared IPC CUDA tensors (#16854)
Summary:
This is to fix #16141 and similar issues.

The idea is to track a reference to every shared CUDA Storage and deallocate the memory only after a consumer process deallocates the received Storage.

ezyang Done with cleanup. Same (insignificantly better) performance as in file-per-share solution, but handles millions of shared tensors easily. Note [ ] documentation in progress.
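
A minimal sketch of the sharing pattern this makes safe, assuming a CUDA device (the reference counting means the producer can release its reference while a consumer still holds one):

```python
import torch
import torch.multiprocessing as mp

def consumer(q):
    t = q.get()    # receives a handle to the shared CUDA storage
    t.add_(1)      # memory stays alive until every process drops its reference

if __name__ == "__main__":
    mp.set_start_method("spawn")
    q = mp.Queue()
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    q.put(torch.ones(4, device="cuda"))
    p.join()
```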
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854

Differential Revision: D13994490

Pulled By: VitalyFedyunin

fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
2019-03-25 10:24:38 -07:00
f5ea528687 Don't segfault on trying to get data_ptr of sparse tensor. (#18347)
Summary:
Also asserts in storage_initialized that there is a storage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18347

Differential Revision: D14582028

Pulled By: gchanan

fbshipit-source-id: df3f5d181188f39e361839169fd054539c3b2839
2019-03-25 08:59:53 -07:00
647154f82a Assert tensor isn't sparse in enforce_invariants. (#18338)
Summary:
There's no reason we can't check this, but I'm punting on implementing it for now. It currently segfaults, so this is an improvement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18338

Differential Revision: D14580308

Pulled By: gchanan

fbshipit-source-id: 44d4cafeab12e1beeb3453a2d4068d221c2e9c4f
2019-03-25 08:44:17 -07:00
a4f83fff2b Only look for Caffe2 package when shared (#18421)
Summary:
Previously it would look for the Config even if it was not written.

Fixed #18419
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18421

Differential Revision: D14597139

Pulled By: ezyang

fbshipit-source-id: c212cbf5dc91564c12d9d07e507c8285e11c6bdf
2019-03-25 07:27:24 -07:00
c297f26843 Add more options to the quantization model exporter (#18383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18383

Add command line options for different quantization schemes.

Reviewed By: amylittleyang

Differential Revision: D14476862

fbshipit-source-id: 37fbf5b4c1c550121eae313f5a71d703a0a87f0f
2019-03-25 04:23:17 -07:00
9e176fe5fe Revert "Specialize optional tensor inputs to graphs in the JIT (#18360)" (#18411)
Summary:
This reverts commit 7cc7ed1322405ba3c627b9c5661a330f92c4183d.

I think it's better to sort out the issues raised in #18407 first. I'm sorry for not stopping it earlier.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18411

Differential Revision: D14594937

Pulled By: soumith

fbshipit-source-id: 3c90b7fa7694e2f59e55607acecde4a47af801ea
2019-03-24 21:29:29 -07:00
5acac411e4 Fix deprecated: type() -> scalar_type() (#18406)
Summary:
Sorry for not sending these fixes in a single PR. I found this compiler warning while working on something else, and I just went to GitHub and modified the file directly for convenience...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18406

Differential Revision: D14594180

Pulled By: soumith

fbshipit-source-id: 92f48513bc62fbe2c67c759d68830a973296e43b
2019-03-24 19:46:03 -07:00
6c029c80f7 Fix deprecated: type() -> scalar_type()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18394

Differential Revision: D14593890

Pulled By: soumith

fbshipit-source-id: 92b9a8c22008341c0cc3b7a721bef1973c528daf
2019-03-24 19:27:23 -07:00
8bc5b86709 Added tensor size warning to F.mse_loss() (#18349)
Summary:
To address the issue of broadcasting giving the wrong result in `nn.MSELoss()` as mentioned here https://github.com/pytorch/pytorch/issues/16045 . In particular, the issue often arises when computing the loss between tensors with shapes (n, 1) and (n,)
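
A minimal sketch of the shape mismatch the warning catches (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(4, 1)
target = torch.randn(4)
loss = F.mse_loss(pred, target)   # broadcasts to (4, 4); the new warning flags it
```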
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18349

Differential Revision: D14594176

Pulled By: soumith

fbshipit-source-id: f23ae68a4bf42f3554ad7678a314ba2c7532a6db
2019-03-24 19:22:14 -07:00
ca962f0f95 Fix For Requires Grad Infinite Loop (#18361)
Summary:
Previously, we would continue to run requires grad on a loop body when the outputs and inputs disagreed. This adds a check so that we don't continue running if the results haven't changed since the last run.

Fix for https://github.com/pytorch/pytorch/issues/18320
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18361

Differential Revision: D14584332

Pulled By: eellison

fbshipit-source-id: 696b225f80a2036318540946428b525985a9e735
2019-03-24 14:34:50 -07:00
92c9fef860 update magma instructions (#18410)
Summary:
fixes https://github.com/pytorch/pytorch/issues/18389

cc: stas00
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18410

Differential Revision: D14594198

Pulled By: soumith

fbshipit-source-id: fb46ef77a36c90ad95e47f7066f5d32aa1f1370f
2019-03-24 13:15:11 -07:00
1323c193ed Removed some dead code (#18201)
Summary:
Removed some dead code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18201

Differential Revision: D14555251

Pulled By: izdeby

fbshipit-source-id: f49640133ef4ae1b0306f7cec6655f23869cc6e7
2019-03-24 08:24:03 -07:00
7cc7ed1322 Specialize optional tensor inputs to graphs in the JIT (#18360)
Summary:
This specializes optional tensor inputs to either a DimensionedTensorType or, when None is passed,
UndefinedTensor (aka AutogradZeroTensorType).
This works because we already have different specs and thus separate plans for the two cases.
It enhances the shape analysis - because now unwrapped optional tensors will have a DimensionedTensorType with the appropriate shape, requires_grad, etc.
Also, when combined with "if-pruning" (which I understand #18259 works towards), we actually get much nicer concrete graphs, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18360

Differential Revision: D14590577

Pulled By: soumith

fbshipit-source-id: cac204a506d1d38b15703cbcc67a6b75fd4979f4
2019-03-23 23:00:37 -07:00
32d0e7e339 Move pyobj_ to TensorImpl (#18225)
Summary:
Currently, `THPVariable_Wrap(…)` and `THPVariable_NewWithVar(…)` depend on the existence of `pyobj_` in the autograd metadata of a Variable to convert the Variable to a Python tensor. However, after the Variable/Tensor merge, there will be Variables that don't contain autograd metadata, and to allow the conversion from non-autograd-meta Variable to a Python tensor we need to store the `pyobj_` outside of autograd metadata and in a place where it will always be available.

This PR makes it possible by moving `pyobj_` into TensorImpl, so that `THPVariable_Wrap(…)` and `THPVariable_NewWithVar(…)` can always access a Variable's `pyobj_` and convert the Variable to a Python tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18225

Differential Revision: D14562616

Pulled By: yf225

fbshipit-source-id: 18d4aaace70eee6120abaf9276036d1f8f51b18d
2019-03-23 12:50:38 -07:00
5860fa5dcf Fix deprecated scalar type in ATen/native/Distributions.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18265

Differential Revision: D14577543

Pulled By: ezyang

fbshipit-source-id: 36674530b32366c51835e4073d7ba23d455d2fda
2019-03-23 10:09:26 -07:00
d9960fbdb2 Revert D14446895: [C2] Implement rotated generate_proposals_op without opencv dependency (~2x faster)
Differential Revision:
D14446895

Original commit changeset: 847f2443e645

fbshipit-source-id: fc6ab5ee59e027f125f5ab0f7ee51ad7db37d4a4
2019-03-23 09:38:55 -07:00
d85451c07b Revert D14584266: [pytorch][PR] Better error message for tensor with grad as constant in tracing
Differential Revision:
D14584266

Original commit changeset: 4e7850dadc78

fbshipit-source-id: 3bb3b5006e469edff984c16e0ff8d5dac2862d88
2019-03-23 02:50:54 -07:00
7c2290e7ce Better error when module attr is used (#18164)
Summary:
Adds a suggestion to add to __constants__ when a torch.nn.Module attr is accessed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18164

Differential Revision: D14580060

Pulled By: eellison

fbshipit-source-id: 0c5adc21d7341a5691d4b45930947cb1ba84c8e8
2019-03-22 20:22:27 -07:00
7be05b822c Fix incorrect sparse add behavior when the sparse tensor has non-contiguous values (#18179)
Summary:
Currently, this code gives an incorrect result:
```python
import torch
indices=torch.tensor([[7, 1, 3]])
values=torch.tensor([[1., 1., 1.],
               [1., 1., 1.],
               [1., 1., 1.]])
x = torch.sparse_coo_tensor(indices, values, size=(10, 3))
values=torch.tensor(1.).expand(3, 3)
y = torch.sparse_coo_tensor(indices, values, size=(10, 3))
z = x + y
print(z)  # prints the incorrect result shown below

tensor(indices=tensor([[7, 1, 3]]),
       values=tensor([[2., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.]]),
       size=(10, 3), nnz=3, layout=torch.sparse_coo)
```

This PR fixes the bug by adding special handling for sparse tensors with non-contiguous values in the addition function (specifically, by cat'ing the indices and values together).

This PR closes https://github.com/pytorch/pytorch/issues/17950 and https://github.com/pytorch/pytorch/issues/17919.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18179

Reviewed By: ezyang

Differential Revision: D14569591

Pulled By: yf225

fbshipit-source-id: f5a14c4a31337fc95eab64596212066b4fb18b1a
2019-03-22 19:35:14 -07:00
6052d04100 Implement rotated generate_proposals_op without opencv dependency (1.8x faster) (#18010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18010

[C2] Implement rotated generate_proposals_op without opencv dependency.

Reviewed By: newstzpz

Differential Revision: D14446895

fbshipit-source-id: 847f2443e645f8cae1327dfbaa111c48875ca9be
2019-03-22 18:15:27 -07:00
0d78126a6f Remove empty file (actual file_check.cpp resides in torch/csrc/jit/testing) (#18303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18303
ghimport-source-id: 66f4402075b123e36c6ffdf806b7c93187a1a58a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18307 Convert test_recursive_cse to use Filecheck inline annotations.
* #18306 [Filecheck] Add a feature to parse check annotations from string.
* #18305 Add python bindings for parseIR.
* **#18303 Remove empty file (actual file_check.cpp resides in torch/csrc/jit/testing)**

Differential Revision: D14586003

fbshipit-source-id: a13e57bd4302e4d3f06198068d525de25e2aa8b3
2019-03-22 17:03:25 -07:00
ff3ecfec89 Turn script_type_parser into a class (#18211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18211
ghimport-source-id: 73b81e9ec631937b14db1da10991831788a6894b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18296 [jit] Add namespacing for ScriptClasses
* #18284 [jit] make test module hook use save/load
* **#18211 [jit] Turn script_type_parser into a class**
* #18148 [jit] python interop for script classes

If we are namespacing classes, the type parser will need to carry around
some state about which namespaces to look in. This PR just wraps it in a
class in preparation.

Also, subscriptToType can no longer be static, since parseTypeFromExpr
may give different results depending on the namespaces available, so
it's been made a regular function instead of a static map lookup.

Reviewed By: eellison

Differential Revision: D14581128

fbshipit-source-id: 711315472ccde1920abf9fdb5a871ac27fb86787
2019-03-22 16:30:05 -07:00
10751d5fb4 python interop for script classes (#18148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18148
ghimport-source-id: 40a9d745dc9aeba53d098743323fcbd50ca65137

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18148 py interop**

Support for converting classes across the Python–TorchScript boundary. Like other TorchScript values, ScriptClasses are native Python values when used in Python and IValues when used in TorchScript.

Notably, there is a copy across this boundary, which will be surprising to users who expect standard Python reference semantics. I have some ideas for fixing that, but it's a more involved process.

Reviewed By: jamesr66a

Differential Revision: D14526259

fbshipit-source-id: 5916e3032488a42dc7da756c1826d7c040a21ebd
2019-03-22 16:30:04 -07:00
3badea6eb3 Better error message for tensor with grad as constant in tracing (#18298)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/17583

There's an unrelated issue right now causing a segfault when printing tensors, so that might have to be fixed first for this to land.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18298

Differential Revision: D14584266

Pulled By: eellison

fbshipit-source-id: 4e7850dadc78ef1e98ad40b9d8adc0fef42acf48
2019-03-22 15:29:30 -07:00
2ad2b2c7b1 Support for basic list comprehensions (#17267)
Summary:
Supports the following syntax:
```
        torch.jit.script
        def comp(l):
            # type: (List[float]) -> List[float]

            n = [x * 3 for x in l]
            return n
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17267

Differential Revision: D14581119

Pulled By: Krovatkin

fbshipit-source-id: 6fd091a8a9ab607386ac58fda6ad88bf8aea380e
2019-03-22 15:25:13 -07:00
e20894fce5 Make it possible to trigger XLA/slow tests via commit message. (#18345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18345
ghimport-source-id: 9649d76bb194866859d62e6ba2a3a265c96ebba5

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18345 Make it possible to trigger XLA/slow tests via commit message.**

Four variants are supported: `[xla ci] [ci xla] [xla test] [test xla]`; substitute
xla with slow for slow tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584557

fbshipit-source-id: fcbfdfb28246823135bb3d3910baae073d16e81d
2019-03-22 15:06:40 -07:00
f68faa35c0 Avoid refcount when looking up dispatch key
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18294

Reviewed By: ezyang

Differential Revision: D14512979

fbshipit-source-id: 45e548974f06184c375c2bb8339e3049a4ebd880
2019-03-22 14:09:20 -07:00
e5eb871419 Fix DCHECK to handle dangling else (#18295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18295

Replace "if (false)" with "while (false)" which fixes potential dangling else issue as shown in added test case.

Reviewed By: ezyang

Differential Revision: D14569608

fbshipit-source-id: 407052db9182ce27b7a59841e90fa50d3eca262e
2019-03-22 14:04:29 -07:00
ed47b85d3b Allow fusion of float function arguments (#18087)
Summary:
so that functions like `def fn(x, p: float)` can be fused. Fixes #9940 and #11186. Fuses only float (not integer) arguments, to simplify assembling arguments for the fusion launch.
CPU fusion is disabled in CI and this won't be tested, but I tested it locally.
cc t-vi, apaszke
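
A minimal sketch of a now-fusible function, assuming a CUDA device:

```python
import torch

@torch.jit.script
def fn(x, p: float):      # a float (not int) scalar argument can now be fused
    return x * p + p

out = fn(torch.randn(8, device="cuda"), 0.5)
```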
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18087

Differential Revision: D14581206

Pulled By: wanchaol

fbshipit-source-id: ccb0cf79b1751706f9b2cdf1715115eae5a39fb6
2019-03-22 13:52:33 -07:00
2aac18098d Fix error reporting in NVRTC use of the fuser (#18327)
Summary:
Two functions were not directed at NVRTC.
It's a bit hard to test this, as the fuser usually produces correct code - unless I try to hack on it. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18327

Differential Revision: D14579285

Pulled By: soumith

fbshipit-source-id: 1be7ba461cc473d514ba619507742a47d4d7c97e
2019-03-22 13:41:00 -07:00
fe5d23cf4a Using sqrt for better precision in cosine_similarity (#18250)
Summary:
Addresses a comment in #18168.
Testing in CI...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18250

Differential Revision: D14568601

Pulled By: ailzhang

fbshipit-source-id: 39fbbdb08743b53fa665c7e88e4750cbe0976ec7
2019-03-22 13:33:30 -07:00
18a6781f57 Fix alignment issues for Fake BFP16 fp32 -> bfp16 rounding routines (#18321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18321

As title.

Reviewed By: jspark1105

Differential Revision: D14575512

fbshipit-source-id: 0e33cdab54b1aef8b67f0b4c366692c5dbdf631d
2019-03-22 12:41:58 -07:00
6e0cbc7f31 Untangle internal build python and cpp dependencies
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18326

Reviewed By: ezyang

Differential Revision: D14576294

fbshipit-source-id: 186ce1e3d026d962b7386f861eddf093f583a878
2019-03-22 12:18:03 -07:00
d4c52158c7 Caffe2: crash op (#18207)
Summary:
This is handy when testing various core-dump-related
things. If in the future we want to unit test our gdb debugger
extensions, we can use this op to generate a core dump for us within a
unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18207

Differential Revision: D14482186

Pulled By: salexspb

fbshipit-source-id: 39a9fffbdd4bd083597f544d1c783a82cf023a89
2019-03-22 11:52:01 -07:00
172ec4ace5 caffe2 - Util to cleanup external inputs and outputs from a NetDef (#18194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18194

Add a util method to cleanup external inputs and outputs from a NetDef

The following conditions will be met after the modification
- No duplicate external inputs
- No duplicate external outputs
- Going through list of ops in order, all op inputs must be outputs
from other ops, or registered as external inputs.
- All external outputs must be outputs of some operators.
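
A sketch of these invariants expressed as checks over a NetDef proto (the checker below is hypothetical; the actual util mutates the net rather than asserting):

```python
def check_net_io_invariants(net):
    # No duplicate external inputs or outputs.
    assert len(net.external_input) == len(set(net.external_input))
    assert len(net.external_output) == len(set(net.external_output))
    # Every op input must be produced by an earlier op or be an external input.
    available = set(net.external_input)
    produced = set()
    for op in net.op:
        for name in op.input:
            assert name in available, name
        available.update(op.output)
        produced.update(op.output)
    # Every external output must be the output of some op.
    for name in net.external_output:
        assert name in produced, name
```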

Reviewed By: ZolotukhinM

Differential Revision: D14528589

fbshipit-source-id: c8d82fda1946aa3696abcbec869a4a8bb22f09b6
2019-03-22 11:23:03 -07:00
7397eb7e8e End to end hack to call server side Caffe2 ops (#18267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18267

Motivation: we don't actually want to use it for real under any circumstances. This is an idea to unblock our internal progress and parallelize workstreams. We can easily define schemas for all ops in question and implement forwarding to C2 ops which is NOT going to be performant. Then several things can be happening in parallel:
* move code of ops outside of C2 ops that depend on protobuf into c10
* development of optimization/fusion passes
* building python-level wrappers with clean API
* improving perf

This demonstrates Relu, quant, dequant. It seems to cover all necessary use cases (maybe except weight prepacking). Ideally I'd demonstrate Conv, but will get to it later in a separate PR (contributions welcome)

Reviewed By: ezyang

Differential Revision: D14531232

fbshipit-source-id: 4cd4a71ae0cb373c6c0e81f965c442b82a1b4069
2019-03-22 11:17:45 -07:00
f6df6aed89 Optimize MomentumSGDUpdate maximum block size and make it templated
Summary: Removing the maximum number of blocks limit from the operator and making the nesterov parameter templated to remove branching.

Reviewed By: BIT-silence

Differential Revision: D14567003

fbshipit-source-id: 394c2039ee214adc6ccd2e562e4e9563d307131f
2019-03-22 09:54:25 -07:00
e3da16a99e Add test for #17271 (torch.exp incorrect for 2**31 size tensor) (#18292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18292
ghimport-source-id: a3e96584db0eef7b6202a1211808f9f6e59dd529

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)**
* #18291 Correctly call superclass setUp in TestCase subclasses.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14567642

fbshipit-source-id: c60ee7597a86f5d2c5c0b72cb106f17815950427
2019-03-22 07:50:38 -07:00
2934153f35 Correctly call superclass setUp in TestCase subclasses. (#18291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18291
ghimport-source-id: d6e95e899bd320407967df41435801e54864ba62

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)
* **#18291 Correctly call superclass setUp in TestCase subclasses.**

This makes PYTORCH_TEST_SKIP_FAST work correctly for more
tests, reducing the wasted testing effort on our slow_test job.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14567643

fbshipit-source-id: 40cf1d6556e0dd0a0550ff3d9ffed8b6000f8191
2019-03-22 07:46:44 -07:00
46990c20fa Verify def before infer tensor (#18129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18129

A lot of tensor inference functions assume the operator passes the schema,
so call Verify to make sure this is actually the case.

I created a diff before to add checking in Concat (https://github.com/pytorch/pytorch/pull/17110), but encountered a lot more places where this is assumed (for example ElementwiseOpShapeInference)

Reviewed By: mdschatz

Differential Revision: D14503933

fbshipit-source-id: cf0097b8c3e4beb1cded6b61e092a6adee4b8fcb
2019-03-22 06:36:25 -07:00
77a7285764 add more Python interface functions to make quantization simpler (#18246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18246

Simplifies histogram collection and quantization process.

Histogram collection before this diff was something like this
```
from caffe2.quantization.server import dnnlowp_pybind11
...
dnnlowp_pybind11.ObserveHistogramOfOutput(hist_file)
for ...
   workspace.RunNet(predict_net)
dnnlowp_pybind11.ClearNetObservers()  # This triggers the Stop function in the observer to dump out the histogram file, but it can have the unintended consequence of also clearing all the other useful observers we attached
```

After this diff we can
```
workspace.CreateNet(predict_net)  # Note we need to create net to have a net to attach observer
histogram_observer = dnnlowp_pybind11.AddHistogramObserver(predict_net, hist_file)
for ...
   workspace.RunNet(predict_net)
predict_net.RemoveObserver(histogram_observer)
```

Choosing quantization parameters of weights before this diff was something like this
```
dnnlowp_pybind11.ObserveHistogramOfOutput(weight_hist_file)
workspace.RunNetOnce(init_net)
dnnlowp_pybind11.ClearNetObservers() # Has same issue as the histogram collection example above

dnnlowp_pybind11.RegisterQuantizationParamsWithHistogram(
    weight_hist_file, is_weight=True, qparams_output_file_name=qparams_file
)
workspace.CreateNet(init_net, overwrite=True)
dnnlowp_pybind11.ClearNetObservers()

logger.info("Loading quantization params from {}".format(qparams_file))
blobs_to_qparams = {}
with open(qparams_file) as f:
    lines = f.readlines()
for line in lines:
    op_id, op_type, output_id, tensor_name, mini, maxi, scale, zero_point, precision = (
        line.split()
    )
    op_id = int(op_id)
    output_id = int(output_id)
    op = net.Proto().op[op_id]
    if op_type != op.type or op.output[output_id] != tensor_name:
        print(
            "Corrupt qparams file {} {} {} {} {}".format(
                qparams_file, op_type, op.type, op.output[output_id], tensor_name
            )
        )
    blobs_to_qparams[tensor_name] = QuantizationParam(float(scale), int(zero_point))

```

After this diff this can be simplified to
```
blobs_to_qparams = {}
for op in init_net.Proto().op:
    for output in op.output:
        scale, zero_point = dnnlowp_pybind11.ChooseQuantizationParams(output)
        blobs_to_qparams[output] = QuantizationParam(scale, zero_point)
```

Reviewed By: dskhudia

Differential Revision: D14544694

fbshipit-source-id: 4fd06cd63256201e2e9d15c39f503138d1be53c2
2019-03-22 00:52:24 -07:00
f3cf6ed789 add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta (#18257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18257

Support adding ops in global_init_net, because pred_init_net is per-thread and just doesn't cut it.

Reviewed By: jspark1105

Differential Revision: D14552695

fbshipit-source-id: 53dd44c84ad019019ab9f35fc04d076b7f941ddc
2019-03-22 00:19:59 -07:00
afc7574aed Automatic update of fbcode/onnx to c05f2ae412daf8fd64136ca354b97ccf73e0ea6c (#18285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18285

Previous import was 96c58ceeacf0f2b73d752e413e4fd78787a12da3

Included changes:
- **[c05f2ae4](https://github.com/onnx/onnx/commit/c05f2ae4)**: update both core and ml docs (#1879) <Lu Fang>
- **[f895279b](https://github.com/onnx/onnx/commit/f895279b)**: fix the problems introduced in previous PRs in operator registration (#1878) <Lu Fang>
- **[f6f80657](https://github.com/onnx/onnx/commit/f6f80657)**: Skip the schema check on ops in non-standard domain (#1876) <Lu Fang>
- **[8c8be722](https://github.com/onnx/onnx/commit/8c8be722)**: Introduce Function Body Helper  (#1868) <Sherlock>
- **[b605eafb](https://github.com/onnx/onnx/commit/b605eafb)**: Support down sampling for Upsample with scales < 1. (#1773) <Ke Zhang>
- **[47f7aa71](https://github.com/onnx/onnx/commit/47f7aa71)**: Remove scaledtanh (#1866) <Ashwini Khade>
- **[4dfc56de](https://github.com/onnx/onnx/commit/4dfc56de)**: Add Ceil support for Max and Average Pooling (#1860) <Lara Haidar>
- **[552a8efc](https://github.com/onnx/onnx/commit/552a8efc)**: Add testcase generator for functions (#1862) <Raymond Yang>
- **[fdb978a5](https://github.com/onnx/onnx/commit/fdb978a5)**: Promote Thresholded Relu Op (#1856) <Ashwini Khade>
- **[ce332628](https://github.com/onnx/onnx/commit/ce332628)**: Update Slice with dynamic input & optional input steps (#1836) <Bowen Bao>
- **[3a9a8787](https://github.com/onnx/onnx/commit/3a9a8787)**: Merge function into opschema (#1834) <Raymond Yang>
- **[3dbf8fe9](https://github.com/onnx/onnx/commit/3dbf8fe9)**: Handle string comparision represented as np.objects (#1851) <Dmitri Smirnov>
- **[3b0d3bb2](https://github.com/onnx/onnx/commit/3b0d3bb2)**: remove global variable in header file (#1850) <Lu Fang>
- **[1cca8733](https://github.com/onnx/onnx/commit/1cca8733)**: bump the version for drop out - fix the issue that the version was not bumped when changing its type constraint declaration. (#1848) <Ke Zhang>
- **[1ec81bc6](https://github.com/onnx/onnx/commit/1ec81bc6)**: Change TopK operator to allow dynamic 'k' (#1829) <Hariharan Seshadri>
- **[a89a4a16](https://github.com/onnx/onnx/commit/a89a4a16)**: Remove exp op: Affine, ImageScaler,ParametricSoftplus, Crop. (#1832) <Ke Zhang>

Reviewed By: yinghai

Differential Revision: D14566202

fbshipit-source-id: b1e5912ae6887e2865fc628363071e2b9938dfa4
2019-03-22 00:13:42 -07:00
f79eac2c7a Cleanup TorchScript rst docs (#18234)
Summary:
* Adds more headers for easier scanning
* Adds some line breaks so things are displayed correctly
* Minor copy/spelling stuff
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18234

Reviewed By: ezyang

Differential Revision: D14567737

Pulled By: driazati

fbshipit-source-id: 046d991f7aab8e00e9887edb745968cb79a29441
2019-03-21 20:19:17 -07:00
46439c78d0 Replace the remaining usages of IntList in caffe2 to IntArrayRef
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18282

Differential Revision: D14569269

Pulled By: bddppq

fbshipit-source-id: 5fc33701b83f9efdec4b456d2691764831d10e7f
2019-03-21 16:34:38 -07:00
979db03722 Blacklist certain op types when doing bound shape inference (#18290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18290

Some ops, such as `Tile`, will mess up our tracking of batch size, so for now it makes sense to stop shape inference on these ops so that we don't lower them and their downstream ops without proper batch info.

Reviewed By: zrphercule

Differential Revision: D14463550

fbshipit-source-id: 2792481efa540f2a7dd310e677c213860c3053ca
2019-03-21 15:43:05 -07:00
104773c715 Fix use of c10::guts::apply (#18159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18159

In some instances, the call to forward could clash with std::forward. Fully qualify it to make sure it gets the right one

Reviewed By: ezyang

Differential Revision: D14512189

fbshipit-source-id: 6242607dbe54fcdb93229c1a4aaee8b84a88caa1
2019-03-21 14:57:33 -07:00
8b94de06af Allow using C10_DECLARE_TENSOR_TYPE and C10_DEFINE_TENSOR_TYPE from any namespace (#18158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18158

They didn't work when called from other namespaces before because they didn't fully specify the c10 namespace.

Reviewed By: ezyang

Differential Revision: D14512187

fbshipit-source-id: a496b89a1bbe2b56137cfae03ab94a60f38d7068
2019-03-21 14:57:32 -07:00
daa77c6e26 Move schema inference to c10 (#18090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18090

This schema inference is needed by the c10 operator registration mechanism. Move it to c10.
It is going to be used by diffs stacked on top.

Reviewed By: ezyang

Differential Revision: D14491454

fbshipit-source-id: 0f8ddcdbd91467c8347d315dd443a1ca8b216481
2019-03-21 14:57:30 -07:00
1877087df2 Allow registering same operator schema multiple times (#18038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18038

Now that we have named overloads, we can allow registering the same function schema multiple times and just check it's identical.

This is going to be used in custom op registration since they register the schema every time a kernel is registered.

Reviewed By: dzhulgakov

Differential Revision: D14467494

fbshipit-source-id: 2c26cf72a64b65f120afe05e989302ec42597515
2019-03-21 14:57:28 -07:00
291746f110 Rename trtrs to triangular_solve (#18213)
Summary:
Changelog:
- Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage.
- Move `isnan` to _torch_docs.py
- Remove unnecessary imports
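
A minimal usage sketch after the rename:

```python
import torch

A = torch.randn(3, 3).tril_()   # lower-triangular coefficient matrix
b = torch.randn(3, 2)

x, _ = torch.triangular_solve(b, A, upper=False)   # new name
x_old, _ = torch.trtrs(b, A, upper=False)          # deprecated alias, warns
```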
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213

Differential Revision: D14566902

Pulled By: ezyang

fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f
2019-03-21 14:27:21 -07:00
1c671c56c1 Fix contribution_guide docs (#18237)
Summary:
Fixes Typo and a Link in the `docs/source/community/contribution_guide.rst`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18237

Differential Revision: D14566907

Pulled By: ezyang

fbshipit-source-id: 3a75797ab6b27d28dd5566d9b189d80395024eaf
2019-03-21 13:20:57 -07:00
cf19ad2152 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 80b00c33e6f6c7cfa08f645cd33419f6545f45d2
2019-03-21 13:15:54 -07:00
43a5c636e2 Optimize group_norm_op (#17945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17945

Optimize group_norm_op

Reviewed By: houseroad

Differential Revision: D14419908

fbshipit-source-id: 4024b5c5dbeff97f4f026d61fc44af1f0e98ed68
2019-03-21 13:05:01 -07:00
9214852da2 Enable running of slow tests in CI. (#18236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18236
ghimport-source-id: 2bb80d017c2ea833669a2d55b340a922b2d44685

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18236 Enable running of slow tests in CI.**
* #18231 Add a decorator for marking slow tests.

These tests only run on master, as they are slow.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14563115

fbshipit-source-id: f54ddef4abedc7e872e58657fc9ac537952773d0
2019-03-21 12:44:45 -07:00
a7d886b9a0 Run clang-format on torch/csrc/distributed/c10d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18255

Differential Revision: D14563072

Pulled By: pietern

fbshipit-source-id: bd83f90ae949b14bc95f4009ba12319c9b7936d0
2019-03-21 11:55:11 -07:00
99ddcb2c9f Shut up compiler about unused the_type. (#18278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18278
ghimport-source-id: 3c35f6e7229c3c2b3a27d96370d7c05fad58365e

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18278 Shut up compiler about unused the_type.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14563050

fbshipit-source-id: 4b516f6c9ef3784d1430f793f304066c351b1a93
2019-03-21 11:39:21 -07:00
549c4da917 Add a decorator for marking slow tests. (#18231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18231
ghimport-source-id: 78c230f60c41877fe91b89c8c979b160f36f856b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18231 Add a decorator for marking slow tests.**

The general strategy:
- It's a normal skip decorator, which triggers a skip if
  PYTORCH_TEST_WITH_SLOW is not set.
- It also annotates the method in question that says it's
  slow.  We use this to implement a catch-all skipper in
  setUp that skips all non-slow tests when
  PYTORCH_TEST_SKIP_FAST is set.

I added a little smoketest to test_torch and showed that I get:

```
Ran 432 tests in 0.017s
OK (skipped=431)
```

when running with PYTORCH_TEST_WITH_SLOW=1 and PYTORCH_TEST_SKIP_FAST=1
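
A minimal sketch of such a decorator, assuming the environment variables above (the real helper may differ in details):

```python
import os
import unittest
from functools import wraps

def slowTest(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if not os.environ.get("PYTORCH_TEST_WITH_SLOW"):
            raise unittest.SkipTest("slow test; set PYTORCH_TEST_WITH_SLOW=1 to run")
        return fn(*args, **kwargs)
    # Mark the method so a setUp hook can skip everything *not* marked slow
    # when PYTORCH_TEST_SKIP_FAST is set.
    wrapper.__dict__["slow_test"] = True
    return wrapper
```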

CI integration coming in later patch, as well as nontrivial uses of
this decorator.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14544441

fbshipit-source-id: 54435ce4ec827193e019887178c09ebeae3ae2c9
2019-03-21 11:17:34 -07:00
3eff333bff lint changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18276

Differential Revision: D14563385

Pulled By: ifedan

fbshipit-source-id: 12a51dbdb7b9e96be9fefa21fe298796b1ae6b58
2019-03-21 11:11:35 -07:00
8356ffa922 move median to ATen (#17637)
Summary:
This moves median to ATen.

- median with dimension reduces to kthvalue
- median without dimension (aka medianall) is implemented in parallel to kthvalue because we would not want to reshape (copying for non-contiguous) and then copy again in kthvalue. We can use the helper functions we moved from kthvalue.
- `median_cuda` was accidentally already put into ATen in #17544.
- The quickselect algorithm without indices for CPU in TH is now obsolete and removed.
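
For reference, the user-facing behavior (unchanged by this port):

```python
import torch

x = torch.randn(4, 5)
m = torch.median(x)                        # "medianall": reduces over all elements
values, indices = torch.median(x, dim=1)   # per-dim median, reduces to a kthvalue-style call
```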
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17637

Differential Revision: D14346510

Pulled By: ezyang

fbshipit-source-id: c07ad144efbd6b4194179bb1c02635862521d8cb
2019-03-21 10:02:04 -07:00
d1497debf2 Fix B903 lint: save memory for data classes with slots/namedtuple (#18184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18184
ghimport-source-id: 2ce860b07c58d06dc10cd7e5b97d4ef7c709a50d

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18184 Fix B903 lint: save memory for data classes with slots/namedtuple**
* #18181 Fix B902 lint error: invalid first argument.
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530872

fbshipit-source-id: e26cecab3a8545e7638454c28e654e7b82a3c08a
2019-03-21 09:10:30 -07:00
ba81074c40 Fix B902 lint error: invalid first argument. (#18181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181
ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* **#18181 Fix B902 lint error: invalid first argument.**
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

A variety of sins were committed:
- Some code was dead
- Some code was actually a staticmethod
- Some code just named it the wrong way
- Some code was purposely testing the omitted case

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530876

fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e
2019-03-21 09:10:28 -07:00
0654c7d4a7 Fix B006 lint errors: using mutable structure in default argument. (#18178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18178
ghimport-source-id: 667ee76b418f505fa64b863e52a603c508dcd1bf

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* #18181 Fix B902 lint error: invalid first argument.
* **#18178 Fix B006 lint errors: using mutable structure in default argument.**
* #18177 Fix lstrip bug revealed by B005 lint

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530874

fbshipit-source-id: 38f4456a085bfe55f2a96fff53028ebd0d621604
2019-03-21 09:10:25 -07:00
0122121176 Two amendments for the shape analysis (#18271)
Summary:
Two small refinements to the shape analysis:
- `detach` can set requires grad to false for dimensioned tensors (not sure if I would also need to deal with Complete?).
- add `batch_norm_stats`.

I noticed these while looking at what's going on when trying to code batch norm manually. (Hi wanchaol )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18271

Differential Revision: D14561303

Pulled By: ezyang

fbshipit-source-id: 64a6879392e77403c44f2ed82f84b6397754d0ea
2019-03-21 08:07:51 -07:00
9bc8badbcb Fix lstrip bug revealed by B005 lint (#18177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18177
ghimport-source-id: fbbf915b66762fc88bc5b541464e71ba27500958

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* #18181 Fix B902 lint error: invalid first argument.
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* **#18177 Fix lstrip bug revealed by B005 lint**

lstrip() doesn't strip a prefix; it strips all of the characters
in the passed-in string. The B005 lint revealed this. Replaced with
a substring operation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530873

fbshipit-source-id: 13b3438fcc3cce13b5110730dc3d0b528a52930f
2019-03-21 07:56:24 -07:00
e5cdd94324 Backward function for torch.cdist
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17173

Differential Revision: D14111482

Pulled By: ifedan

fbshipit-source-id: d72cfd53c29d0f8cf5f8ad1148d14f3d5abd938e
2019-03-21 00:39:29 -07:00
2016daaf51 Fix ONNX symbolic for argmin and argmax (#18261)
Summary:
Fix the problem introduced in https://github.com/pytorch/pytorch/pull/17103
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18261

Reviewed By: bddppq

Differential Revision: D14558781

Pulled By: houseroad

fbshipit-source-id: 7bb50072e77d1d7b2a93f4011fa1362f26e9df1c
2019-03-20 22:51:13 -07:00
e04c9195b7 Update math::Transpose to support tensor with size > 2G (#17670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17670

Update math::Transpose to support tensor with size > 2G

i-am-not-moving-c2-to-c10

Differential Revision: D14313624

fbshipit-source-id: 0b4a85b913972e5a8981f0d40d0c539407b98f30
2019-03-20 18:22:21 -07:00
bbbabda4e8 handle dst_bin_width==0 case properly (#18240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18240

For rare cases when dst_bin_width == 0, we should just put all numbers into an arbitrary bin.

Reviewed By: csummersea

Differential Revision: D14544685

fbshipit-source-id: 02d04ff8bd1555d6cf7e7eeb1196a4ab3325a9e5
2019-03-20 17:11:25 -07:00
e12091d0a3 Revert D14114134: [asr] add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta
Differential Revision:
D14114134

Original commit changeset: 112bb2ceb9d3

fbshipit-source-id: 763262c1b78eed88a653caad5adc27d97feb43aa
2019-03-20 16:32:53 -07:00
7e6220393f Cleanup arg{min, max} (#17103)
Summary:
Why do we need this workaround? `PythonArgParser` handles these two cases well.

The discussion started at https://github.com/pytorch/pytorch/pull/6201#issuecomment-378724406. The conclusion at that time by goldsborough was:

> Because we wanted to allow `dim=None` in Python and route to a different function. Essentially the problem was wanting to wrap the C++ function in Python. AFAIK there is no way of translating `dim=None` behavior into C++? So Richard and I came up with this strategy

Maybe at that time `PythonArgParser` was not powerful enough to handle the routing of two function with same name but different C++ signature.

Will keep an eye on the CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17103

Differential Revision: D14523503

Pulled By: VitalyFedyunin

fbshipit-source-id: cae3e2678062da2eccd93b51d4050578c7a9ab80
2019-03-20 16:28:27 -07:00
ebc9f75895 Added the exception of ignore_index (#18117)
Summary:
Fixes #17801 by adding a note about the `ignore_index` exception in the documentation for `torch.nn.CrossEntropyLoss` and `torch.nn.NLLLoss`

If any other files/functions are hit, I'd be glad to incorporate the changes there too! 😊
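
A small usage sketch of `ignore_index` (illustrative only):

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
logits = torch.randn(3, 5)
target = torch.tensor([1, -100, 4])  # the -100 entry contributes nothing to the loss
loss = loss_fn(logits, target)
```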
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18117

Differential Revision: D14542079

Pulled By: ezyang

fbshipit-source-id: 7b918ac61f441dde7d3d6782d080c500cf2097f1
2019-03-20 16:03:34 -07:00
fd35814348 Add .get() for dicts (#18238)
Summary:
Fixes #18232
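
A minimal TorchScript sketch of the new method (assuming standard typing annotations):

```python
from typing import Dict, Optional

import torch

@torch.jit.script
def lookup(d: Dict[str, int], key: str) -> Optional[int]:
    # Returns None when the key is absent instead of raising.
    return d.get(key)
```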
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18238

Differential Revision: D14546689

Pulled By: driazati

fbshipit-source-id: ed021e6f54c891d6c734c8f2345f4e83a3c6c905
2019-03-20 14:57:13 -07:00
d5328a8a30 Update nccl submodule to 2.4.2 (#17883)
Summary:
Didn't test this. Let's see what happens.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17883

Differential Revision: D14547470

Pulled By: pietern

fbshipit-source-id: c35d232f6bcc5a2dce55da636a0acbea5c2725d8
2019-03-20 14:39:52 -07:00
83d84c22e4 Reinstate ncclCommDestroy (#17943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17943

Together with xw285cornell came up with a solution for static destruction
order fiasco that caused the NCCL context to be destroyed **after**
the CUDA context was already destroyed. In this commit we destroy all
cached NCCL contexts as soon as the last NCCL related Caffe2 operator
instance is destructed, thereby avoiding a dependency on static
variable destruction.

Reviewed By: xw285cornell

Differential Revision: D14429724

fbshipit-source-id: fe5ce4b02b1002af8d9f57f6fa089b7a80e316ce
2019-03-20 14:20:45 -07:00
272a48f6fe Enable autograd to recognize the XLA backend as one providing multiple devices (#17847)
Summary:
Enable autograd to recognize the XLA backend as one providing multiple devices, while not being CUDA/HIP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17847

Differential Revision: D14545634

Pulled By: ezyang

fbshipit-source-id: 417181bf2ff4f8978544afe2fb6b042e787854ed
2019-03-20 13:58:36 -07:00
1b71f6d4eb add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta (#17905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17905

Support adding ops in global_init_net, because pred_init_net is per-thread and just doesn't cut it.

Reviewed By: jspark1105

Differential Revision: D14114134

fbshipit-source-id: 112bb2ceb9d3d5e663dd430585567f4eaa2db35f
2019-03-20 13:52:10 -07:00
1442808fcd fixed typo in shape_analysis.cpp (#18227)
Summary:
cc: VitalyFedyunin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18227

Differential Revision: D14541764

Pulled By: VitalyFedyunin

fbshipit-source-id: 9477deb1a99e6581f15a4de4d7631d747f56f3a6
2019-03-20 12:47:32 -07:00
18b31b73fb Retain the parameter names in ONNX exporter (#17551)
Summary:
So we will keep the names of the ONNX initializers the same as the names in the PyTorch state dict.

Later, we will make this the default behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17551

Reviewed By: dzhulgakov

Differential Revision: D14491920

Pulled By: houseroad

fbshipit-source-id: f355c02e1b90d7ebbebf4be7c0fb6ae208ec795f
2019-03-20 12:11:23 -07:00
abc171bd53 Fix typo in docstring
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18216

Differential Revision: D14539824

Pulled By: ezyang

fbshipit-source-id: 490b72951a75f3f8b949a2d692d660a3693ee98a
2019-03-20 11:16:36 -07:00
a519217ee7 Add batched version of trtrs (#18025)
Summary:
- Remove single batch TH/THC implementations
- Remove `_batch_trtrs_lower` from `multivariate_normal`
- Add tests for batched behavior
- Modify trtrs_backward to accommodate for batched case
- Modify docs

In a future PR, this will be renamed to `triangular_solve`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18025

Differential Revision: D14523004

Pulled By: ifedan

fbshipit-source-id: 11c6a967d107f969b60e5a5c73ce6bb8099ebbe1
2019-03-20 11:11:32 -07:00
e312801453 Remove GLOO usage when USE_GLOO is OFF
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18203

Differential Revision: D14540520

Pulled By: soumith

fbshipit-source-id: f1c96cc563ed1e913040e3e16b109d3e3030128c
2019-03-20 09:31:53 -07:00
2a6cbfaccf Enable 32 bit CPU build on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18176

Differential Revision: D14539884

Pulled By: ezyang

fbshipit-source-id: 0e4bd9c1ef1830cd9bcc40df36b87534f61def08
2019-03-20 09:26:50 -07:00
19c13eee39 Correct cmake flags passing (#18217)
Summary:
Fixes #18214.

According to the CMake manual, we should pass the arguments first, and put the directory as the last element. Otherwise, these flags may not be passed correctly.

Reference:
1. https://cmake.org/cmake/help/latest/manual/cmake.1.html#synopsis
2. https://stackoverflow.com/a/27169347
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18217

Differential Revision: D14540588

Pulled By: ezyang

fbshipit-source-id: a027f585dde66c5da7bbbe584fa42c3e56027d59
2019-03-20 09:21:31 -07:00
bd1271338a Add python_variable._is_view for debugging. (#18197)
Summary:
I don't know if we actually want to expose this or not, but it's useful for debugging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18197

Reviewed By: ezyang

Differential Revision: D14530712

Pulled By: gchanan

fbshipit-source-id: 98fdba9cf113738f0db3a198c49365de536b9919
2019-03-20 08:43:02 -07:00
4741d613ee Do not apply these explicit unroll pragmas for ROCm. (#18204)
Summary:
Loop analysis indicates that there is a runtime trip count and hence
unrolling cannot take place.

This will silence compile-time warnings we have been observing with recent ROCm releases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18204

Differential Revision: D14539875

Pulled By: ezyang

fbshipit-source-id: a7ea7f2a95603754296b76a6b62a154f56f4ad4d
2019-03-20 08:06:07 -07:00
8f1db1c6c1 Copy-edit CONTRIBUTING and update. (#18131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18131
ghimport-source-id: 473dae70f6c236d317bec77d894310c0aa0376ec

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18131 Copy-edit CONTRIBUTING and update.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14505049

fbshipit-source-id: 02aeae33c0049889243c56dd0d761487dac2351e
2019-03-20 07:40:59 -07:00
8895bfba6a fix cosine_similarity (#18168)
Summary:
Fixes #18057 according to colesbury's suggestion. Thanks!
cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18168

Differential Revision: D14520953

Pulled By: ailzhang

fbshipit-source-id: 970e6cfb482d857a81721ec1d0ee4a4df84a0450
2019-03-19 20:09:17 -07:00
3baf99bea7 Breakup test misc pt2 (#18191)
Summary:
Further break up test_misc.h. The remaining tests don't directly map to a jit file, so I left them in test_misc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18191

Differential Revision: D14533442

Pulled By: eellison

fbshipit-source-id: 7f538ce0aea208b6b55a4716dfcf039548305041
2019-03-19 19:41:22 -07:00
9d455ac2fe Add serialization docs to jit/README (#17951)
Summary:
Documents the serialization format for `torch.jit.save`. Some of the info is copied from houseroad's internal doc.

[Formatted Markdown](https://github.com/driazati/pytorch/blob/serial_docs/torch/csrc/jit/README.md)

Also refactors the readme to have a heading hierarchy + table of contents
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17951

Differential Revision: D14531644

Pulled By: driazati

fbshipit-source-id: cbcd9462054cc9f8a2f8cea2c98d8aba4e7d227c
2019-03-19 16:47:04 -07:00
08aa973fb8 Turn on Travis builds for ghstack PRs. (#18193)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18193
ghimport-source-id: 540859cf0b238a9832f45b3f4c2351e3343fc1a2

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18193 Turn on Travis builds for ghstack PRs.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14529945

fbshipit-source-id: 4476e996e311a04f2a997ca9b7c4cf2157dd6286
2019-03-19 14:51:07 -07:00
cd6a6c54c6 do not throw when unicode is seen in pull request info (#18195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18195
ghimport-source-id: 05102cb115c6bd6d141f51905e20155bcd79a908

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18195 [build] do not throw when unicode is seen in pull request info**

Differential Revision: D14529707

fbshipit-source-id: 2f6a31b01b3a9b044fd24be466cc5325b70929ad
2019-03-19 14:45:47 -07:00
6758f5587f Delete bugbear from Python 2 lint. (#18192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18192
ghimport-source-id: 9523a09d7ec202ef08cf0ecdf48c42739ea6b0ce

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18192 Delete bugbear from Python 2 lint.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14529240

fbshipit-source-id: 1a433b53dd38d1c455e8c0750d97c594ac51ef09
2019-03-19 14:24:03 -07:00
1bc4eb93c7 Support attributes when emitting function calls (#18156)
Summary:
The type of each `initial_ivalue` is completely known at some point, but that information is discarded by the time a call to it is emitted. This PR is kind of a hack; as a better (longer-term) solution, the method should know about the type of each initial value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18156

Differential Revision: D14525768

Pulled By: driazati

fbshipit-source-id: 52d53e9711a07a4551c988bd95fe997e654aa465
2019-03-19 13:56:40 -07:00
f212fd9fd6 Customized pin_memory for PackedSequence (#18079)
Summary:
fixes https://github.com/pytorch/pytorch/issues/18078
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18079

Reviewed By: ezyang

Differential Revision: D14521192

Pulled By: zou3519

fbshipit-source-id: cec773a3a6f2c405a0d9701e213b7caf81649181
2019-03-19 13:41:30 -07:00
916a670828 Enable flake8-bugbear line length checking. (#18138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18138
ghimport-source-id: be62a71ef98714e6f168a00f84120f612363528e

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18138 Enable flake8-bugbear line length checking.**

This enables flake8-bugbear's line-length checker (B950), which permits violations
of up to 10% but reports the "true" limit when you go over.

I had to ignore a bunch of flake8-bugbear's other checks when I
turned this on.  They're good checks though (they're turned on
in fbcode) and we should fix them eventually.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: salexspb

Differential Revision: D14508678

fbshipit-source-id: 2610ecc0dd43cc0788d77f4d024ebd85b26b8d41
2019-03-19 13:31:04 -07:00
794c631e23 fix bug in alias analysis (#18146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18146
ghimport-source-id: 4b061c27c5c44ef0d06066490ed16cab3d0c7a64

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18146 [jit] fix bug in alias analysis**

We handled hasWriters() incorrectly in the case of wildcards. There's
even a comment describing the correct behavior. Sad!

Much thanks to t-vi for tracking this down and suggesting the fix!

Differential Revision: D14524208

fbshipit-source-id: 8010b54257241bd64013a0d0a8b6e7d22d8c70af
2019-03-19 11:35:28 -07:00
234bb8719a Add backend checks to solve methods (gesv, cholesky_solve) (#18116)
Summary:
Changelog:
- Incorporate a simple backend check in the linearSolveCheckInputs function in LinearAlgebraUtils.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18116

Differential Revision: D14504469

Pulled By: soumith

fbshipit-source-id: 7402b6dbaa8d73048946613b806d54f68bcbd8f4
2019-03-19 10:44:45 -07:00
7bb36ada1f fix -Wsign-compare warnings for some files inside c2 (#18123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18123

The motivation of this fix is to resolve patterns like
`for (auto i = 0; i < N; i++)` where N is bigger than an int32 can hold.

These instances of comparison were found by enabling -Wsign-compare

There are way too many things to fix, so issuing this as a series of fixes

The plan is to fix all these issues and then enable this flag into Caffe2 to catch future instances

Reviewed By: ZolotukhinM

Differential Revision: D14497094

fbshipit-source-id: bca3927a2188bd33a508fa503ba221c220cdaefe
2019-03-19 10:39:20 -07:00
1c76746f61 SGD: remove unneeded multiply-add initialization operations (#18114)
Summary:
The momentum buffer is initialized to the value of
d_p, but the current code takes the long way to do this:
1. Create a buffer of zeros
2. Multiply the buffer by the momentum coefficient
3. Add d_p to the buffer

All of these can be collapsed into a single step:
1. Create a clone of d_p
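
In code, the simplification is roughly (a sketch; variable names are hypothetical):

```python
import torch

momentum = 0.9
d_p = torch.randn(10)  # stand-in for a parameter gradient

# Before: buffer of zeros, multiply, add -- three operations.
buf_old = torch.zeros_like(d_p)
buf_old.mul_(momentum).add_(d_p)   # zeros * momentum + d_p == d_p

# After: a single clone yields the same initial value.
buf_new = d_p.clone()
assert torch.equal(buf_old, buf_new)
```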
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18114

Differential Revision: D14509122

Pulled By: ezyang

fbshipit-source-id: 4a79b896201d5ff20770b7ae790c244ba744edb8
2019-03-19 10:34:17 -07:00
a50ba7e238 specialized CUDA impl for dropout in AD (#17756)
Summary:
In ATen we have a `_fused_dropout` implementation for the CUDA case. As ngimel suggested, discarding it in JIT AD hurts performance.

It doesn't seem ideal to include backend specific implementation in AD, but this is helpful to prevent performance regression atm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17756

Differential Revision: D14368999

Pulled By: ailzhang

fbshipit-source-id: 9a371c5020f630e8f6e496849ec9772b6f196169
2019-03-19 10:34:15 -07:00
9a153412fd Fix underflow issue with dirichlet sample (#17488)
Summary:
Addresses #15738, using fritzo's suggestion. This adds a `torch._sample_dirichlet` method in `Distributions.cpp` and `Distributions.cu`.
 - For CPU, this leads to no perf hit, since all we do is promote the `alpha` to double when getting the gamma samples (the gamma sampler anyway uses `accscalar_t` (double for CPU)) and cast it back to float32 on return.
 - I have added an analogous method for CUDA as well, but the default sampler for CUDA uses scalar_t for efficiency, so I have kept it as that. With this, I do not see the bias towards 1 as reported in #15738 with `float32`, but there is a spurious mode at 0.5, as would be expected. Users would need to explicitly use `float64` for GPU to not see the spurious mode at 0.5. (EDIT: see note below, it appears that the bias issue is still there for certain builds).

Added some tests and checked that there is no perf regression. My experience with C++ is very limited, so apologies in advance if I missed something basic. cc. ailzhang, fritzo, fmassa
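
A small check one can run through the public distributions API (sketch):

```python
import torch

alpha = torch.full((3,), 0.5)            # float32 concentration
d = torch.distributions.Dirichlet(alpha)
samples = d.sample((100000,))
# Rows sum to 1; on CPU the float32 samples should no longer show the
# bias toward 1 reported in #15738.
print(samples.sum(-1).mean().item(), samples.max().item())
```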
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17488

Differential Revision: D14410301

Pulled By: ezyang

fbshipit-source-id: 62b2f694b4642685eab06db96d74ce28e05c3992
2019-03-19 10:34:13 -07:00
84fe20600d Kill Backend constructor of TensorOptions. (#18137)
Summary:
It's wrong and unused.  Use one of the many other constructors instead :).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18137

Differential Revision: D14508364

Pulled By: gchanan

fbshipit-source-id: 19c6ff78ad9d9221d0874425edd02b78627c4ca7
2019-03-19 08:00:21 -07:00
3a85f88efd Remove deviceTypeToBackend, which is underspecified. (#18135)
Summary:
There are multiple backends for a device type, so we just kill this function.
Also, kill an getNonVariableType instance which was also underspecified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18135

Differential Revision: D14507474

Pulled By: gchanan

fbshipit-source-id: fc791a76d4b851b23d09a070725f3838621eb13d
2019-03-19 07:53:28 -07:00
190c36bbc2 Stop generating unimplemented type methods. (#18144)
Summary:
This gets rid of 'aten_sparse' which was used at one time with legacy THS code, but is now only overloaded in native_parse.py.
The way that 'aten_sparse' worked was wonky -- it extended all backends (default [CPU, CUDA]) to include sparse.
But this is totally unnecessary; we already have the backends we need to generate for from type_method_definition_dispatch.

codegen changes: fc37c8e171/diff.txt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18144

Reviewed By: ezyang

Differential Revision: D14511324

Pulled By: gchanan

fbshipit-source-id: 8bb4ac4cf0985f8756790779a22bc229e18e8e7f
2019-03-19 07:42:06 -07:00
8ed2b88bf1 Corrected type of 'swap' in torch.nn.TripletMarginLoss (#18115)
Summary:
Fix #16428 by correcting type of 'swap' from `float` to `bool`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18115

Differential Revision: D14516615

Pulled By: ezyang

fbshipit-source-id: c61a45d533f3a443edf3c31c1ef3d9742bf46d2b
2019-03-19 07:09:15 -07:00
542c273e5b handle scenario when GPU support is not available and p2p_access_pattern is empty (#17974)
Summary:
Observed that when there is no GPU support available, `workspace` sets `GetGpuPeerAccessPattern` to `[]` in
https://github.com/pytorch/pytorch/blob/master/caffe2/python/workspace.py#L79
and this case is not handled in https://github.com/pytorch/pytorch/blob/master/caffe2/python/data_parallel_model.py#L1065.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17974

Differential Revision: D14517066

Pulled By: ezyang

fbshipit-source-id: 186911d95c07e9a55ab82a41d0c7c919e4281bb4
2019-03-18 23:11:54 -07:00
195cba500f Fix Caffe2 operator schemas (#15462) (#13229) (#18109)
Summary:
Maratyszcza harouwu yinghai

This has been broken since #13065: `c_str()` returns a pointer that isn't permanent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18109

Differential Revision: D14516622

Pulled By: ezyang

fbshipit-source-id: 7113d92eac4f61479c4c7b323cf78cc8aa00b17e
2019-03-18 21:00:43 -07:00
afb2f2424a Increase line-width of Declarations.yaml (#18050)
Summary:
There are some line breaks in the schema_string entries of Declarations.yaml.
Is this valid YAML? I am reading the YAML spec, and it seems that the "|" indicator or single/double quotes are required to insert a line break:
https://yaml.org/spec/1.2/spec.html
![image](https://user-images.githubusercontent.com/2469618/54405834-1e53ac80-471b-11e9-9925-be13a109eb46.png)
Could you increase the line width of the YAML output to avoid these line breaks?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18050

Differential Revision: D14516694

Pulled By: ezyang

fbshipit-source-id: 1db9f3bf131b54a783d668de973915892603189e
2019-03-18 20:49:05 -07:00
86f1dd3fb0 Updating submodules
Reviewed By: yns88

fbshipit-source-id: eeeec4229e05916f2c17e525aee5ac4465ef52db
2019-03-18 20:40:35 -07:00
2737d2c7dc delete unnecessary file .gitkeep (#18136)
Summary:
Delete the unnecessary file .gitkeep in torch/csrc/nn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18136

Differential Revision: D14516584

Pulled By: ezyang

fbshipit-source-id: a7555693cb3df1c5e37fcd3ca9bb379a2258f2d1
2019-03-18 20:31:25 -07:00
3d44305e9d Attribute serialization (#17423)
Summary:
Allows serialization/loading of attributes (`IValue`s of any type).
* metadata (attribute name, type) is stored in the `model.json`
* The binary format is a subset of the `pickle` module that supports the operations necessary for `IValue`s
    * Attributes are serialized in the order they are defined on a module to a list in a single `attributes` file, with submodule attributes coming first. This order directly matches the order attributes are listed in `model.json`
    * This can be inspected in Python with `pickle.load()` or with `pickletools` (PyTorch need not be installed for this to work)
        * A class is used to store a tensor's index into the tensor table of the model, so to unpickle the file you have to use a custom Unpickler:
        ```python
        class TensorID(object):
            def __setstate__(self, id):
                self.id = id

        class JitUnpickler(pickle.Unpickler):
            def find_class(self, module, name):
                if module == '__main__' and name == 'TensorID':
                    return TensorID

        JitUnpickler(open("my_model/attributes.pkl", "rb")).load()
        ```
    * pickle format: https://svn.python.org/projects/python/trunk/Lib/pickletools.py
* It currently does not support or guarantee that anything saved out with `pickle` directly (i.e., if you edit `attributes` with `pickle` instead of our tools) will be imported correctly

Also will fix #17683 and fix #16367

Followup Work:
* document format / choice of pickle: #17951
* create an example
* list specializations
* int size specializations, large binputs
* do a first pass over attributes to output only necessary `BINPUT` ops
* attribute reassignment (e.g `self.my_attribute = new_value`)
* `tensor.save("some_checkpoint.pkl")` support with tensors embedded in Pickle file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17423

Differential Revision: D14470965

Pulled By: driazati

fbshipit-source-id: 6a21a9939efdbe59b4bc57fd31d6d630bab5297e
2019-03-18 18:18:22 -07:00
87b6cbb6fd fix bug in pool_dnnlowp_op_avx2.cc (#18141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18141

VLEN should've been 32

Reviewed By: jianyuh

Differential Revision: D14510780

fbshipit-source-id: ddf12746e1c69677a268432432ddb088cc210084
2019-03-18 16:31:42 -07:00
0a8efce51e Updating submodules
Reviewed By: yns88

fbshipit-source-id: ed297c07c681f5f45d3f99edf48680015ca5b138
2019-03-18 16:21:23 -07:00
421b508d55 Rename gesv to solve (#18060)
Summary:
Changelog:

- Renames `gesv` to `solve` to remain consistent with `cholesky_solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `solve` under the name `gesv`, and add a deprecation warning to discourage its usage.
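
Usage after the rename looks like this (sketch):

```python
import torch

A = torch.randn(3, 3)
b = torch.randn(3, 1)

x, lu = torch.solve(b, A)    # new name
x2, lu2 = torch.gesv(b, A)   # deprecated alias, emits a warning
```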
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18060

Differential Revision: D14503117

Pulled By: zou3519

fbshipit-source-id: 99c16d94e5970a19d7584b5915f051c030d49ff5
2019-03-18 16:04:24 -07:00
0eb4f7aa71 Modify BeamSearch to support CharSourceEncoder (#18107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18107

Pull Request resolved: https://github.com/pytorch/translate/pull/396

also:

1. fix issues with OptionalType not having a createWithContainedType (PyTorch diff)
2. Delete tests for ONNX full beam search export (nobody is using it and it just makes things harder. Currently ONNX doesn't support `_unwrap_optional`)

Reviewed By: jmp84

Differential Revision: D14483771

fbshipit-source-id: 0e37ef1cb5a16d03a535eef808b0488b98802128
2019-03-18 14:11:57 -07:00
670f509984 Circular Convolution Function via circular padding (#17240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17240

Added circular padding in addition to zero padding to Conv1D, Conv2D and Conv3D based on the solution suggested in: https://github.com/pytorch/pytorch/issues/3858
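
A usage sketch, assuming the `padding_mode` keyword this change exposes on the Conv modules:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, padding_mode='circular')
x = torch.randn(1, 3, 16, 16)
y = conv(x)  # borders wrap around instead of being zero-padded
```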

Reviewed By: ezyang

Differential Revision: D14126416

fbshipit-source-id: a2f1587503ee0cfff98d5cb0d5b0a600ef8aaeb4
2019-03-18 12:33:20 -07:00
2b7a5d1876 don't include /usr/include when nvcc is in /usr/bin (#18127)
Summary:
...because gcc will have failures with very strange error messages
if you do.

This affects people with Debian/Ubuntu-provided NVCC; the PR should
not change anything for anyone else.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18127

Differential Revision: D14504386

Pulled By: soumith

fbshipit-source-id: 1aea168723cdc71cdcfffb3193ee116108ae755e
2019-03-18 12:18:27 -07:00
ed36fd30c8 fix double free in test_jit (#18121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18121
ghimport-source-id: 70c273bfbcb68f7b25cf87f5614c662960864758

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18121 [jit] fix double free in test_jit**

These definitions used to be in an anonymous namespace, so they weren't exported from the translation unit. #18071 put them in a `test` namespace, so I guess their destructors were somehow getting called twice on exit. Making them static again fixes the problem.

Reviewed By: ezyang

Differential Revision: D14498349

fbshipit-source-id: f969781695dcbebdfcfce667fce5b986222a373e
2019-03-18 09:59:13 -07:00
754bf595ca Replace resize_dim with set_sizes_and_strides in THTensor_(squeeze) (#18059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18059

Replace resize_dim() with set_sizes_and_strides() in `THTensor_(squeeze)` in aten/src/TH/generic/THTensor.cpp and `THCTensor_(squeeze)` in aten/src/THC/generic/THCTensor.cpp

Reviewed By: ezyang

Differential Revision: D14471066

fbshipit-source-id: 1c8c412ff09246c4df6843736e3bf0279bfadea8
2019-03-18 08:52:58 -07:00
2e311d2003 update exp. family doc (#18118)
Summary:
Sphinx doesn't understand the hyphen; it does not merge the two halves together in the HTML output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18118

Differential Revision: D14498012

Pulled By: mrshenli

fbshipit-source-id: d6f4cfddc0a8e3a8f91578da43c26ca9c6fff3ce
2019-03-17 21:39:42 -07:00
fe22871b49 Change one_hot from IndexTensor to Tensor. (#18073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18073
ghimport-source-id: f4dadebafa0423c4c5a0e46c15b38129402d830a

Stack:
* #18072 properly device_guard IndexTensor and BoolTensor.
* **#18073 Change one_hot from IndexTensor to Tensor.**

There is no codegen change.

Reviewed By: ezyang

Differential Revision: D14485248

fbshipit-source-id: ee2ba8e5dcbbbaf0214a026c8e7ed4e6712becb0
2019-03-17 15:40:40 -07:00
3c2fccc1b4 properly device_guard IndexTensor and BoolTensor. (#18072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18072
ghimport-source-id: 9653731602c72f299e095dd50e3afe6bcc8b01d6

Stack:
* **#18072 properly device_guard IndexTensor and BoolTensor.**
* #18073 Change one_hot from IndexTensor to Tensor.

Currently IndexTensor and BoolTensors do not have device_guards applied to them.
This is bad in the case where the only tensor(s) are IndexTensors or BoolTensors, because no device guard is present.

The only case where this currently happens is one_hot, which ends up not mattering because of the way the implementation is written. But I wanted to make sure we are covered here.

Reviewed By: ezyang

Differential Revision: D14485249

fbshipit-source-id: e57b28086fa1ad2fdd248bb1220e8a2e42da03e1
2019-03-17 15:40:39 -07:00
f9ad125e39 fix corner case for optional aliasing (#18093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18093
ghimport-source-id: 021adc52aa7bfe5fff74531c76a8cd28cab30b2a

Stack:
* **#18093 [jit] fix corner case for optional aliasing**

Occasionally the compiler can insert constant Nones to make types line
up. In that case, don't try to make a pointer from the optional type to
None, since we know statically that None won't be mutated or whatever.

Reviewed By: shannonzhu

Differential Revision: D14493004

fbshipit-source-id: 6564065f39d99ee5af664f3a0fe235892973d9be
2019-03-17 14:56:40 -07:00
96fe2b4ecb Typo fix (#18089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18089

Typo fix for the fully connected layer documentation.

Reviewed By: jspark1105

Differential Revision: D14488632

fbshipit-source-id: ca0271ca0250c1d653ed7f250e8588f7b2ce1056
2019-03-16 15:07:01 -07:00
da3cc6e7ee Caffe2 - Add flag to fail if floating point exceptions are detected in operator runs (#18040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18040

Add flag to fail if floating point exceptions are detected in operator runs.

Sample exception

Exception [enforce fail at operator.h:837] !std::fetestexcept(FE_DIVBYZERO). Division by zero floating point exception (FE_DIVBYZERO) reported.
Error from operator:
input: "1" input: "0" output: "out" name: "" type: "Div"

Reviewed By: jspark1105

Differential Revision: D14467731

fbshipit-source-id: fad030b1d619a5a661ff2114edb947e4562cecdd
2019-03-16 12:28:05 -07:00
0fe6e8c870 Remove ComputeLibrary submodule
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18052

Reviewed By: ezyang

Differential Revision: D14477355

fbshipit-source-id: c56b802f6d69701596c327cf9af6782f30e335fa
2019-03-16 09:06:42 -07:00
c7448aa13c remove unused parameters in optimizer tests (#18084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18084

The data_strategy parameter was not used in some of the unit tests for optimizers.

Reviewed By: hyuen

Differential Revision: D14487830

fbshipit-source-id: d757cd06aa2965f4c0570a4a18ba090b98820ef4
2019-03-15 18:06:15 -07:00
be364ac8d7 Specify overload name in function schema (#18037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18037

The FunctionSchema can now store an overload name and the parser knows how to parse it. Specify like this:

    my_func.overload1(arg1: Tensor) -> Tensor
    my_func.overload2(arg1: Tensor, arg2: Tensor) -> Tensor

Reviewed By: zdevito

Differential Revision: D14467497

fbshipit-source-id: 8832b32f07351bb61090357b17b77a6a2fed3650
2019-03-15 16:58:13 -07:00
7a3488e0fc Expose c10 cuda ops to caffe2 (#18036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18036

- Add macros to export c10 cuda operators to caffe2 frontend
- Instead of having a separate caffe2 registry for the c10 operator wrappers, use the existing caffe2 registries

Reviewed By: ezyang

Differential Revision: D14467495

fbshipit-source-id: 7715ed2e38d2bbe16f1446ae82c17193a3fabcb9
2019-03-15 16:58:12 -07:00
cb2ea17707 Automatic update of fbcode/foxi to 2bcc4064c90e87b9638615c733485f07c47b7558 (#18070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18070

Previous import was d1f45b1a2b1585d0e9bc65e15e463db344fc3ff6

Included changes:
- **[2bcc406](https://github.com/houseroad/foxi/commit/2bcc406)**: Merge pull request #7 from jackm321/tracing_fixes <Jack Montgomery>
- **[c39033c](https://github.com/houseroad/foxi/commit/c39033c)**: Fixes for tracing events <Jack Montgomery>
- **[50912cf](https://github.com/houseroad/foxi/commit/50912cf)**: Merge pull request #5 from jackm321/add_trace_events <Jack Montgomery>
- **[ba2fdcb](https://github.com/houseroad/foxi/commit/ba2fdcb)**: Merge pull request #5 from jackm321/add_trace_events <Jack Montgomery>
- **[7d42b12](https://github.com/houseroad/foxi/commit/7d42b12)**: address comments <Jack Montgomery>
- **[dcabd8d](https://github.com/houseroad/foxi/commit/dcabd8d)**: Add trace events interface <Jack Montgomery>

Reviewed By: houseroad

Differential Revision: D14483201

fbshipit-source-id: f51ed869c9a89521079df89903abc0ac0a45ac7b
2019-03-15 16:49:08 -07:00
d1843d4173 Add backwards compatibility and other fixes to Dispatch macros. (#17996)
Summary:
Changes:
1) https://github.com/pytorch/pytorch/pull/17527 changed dispatch macros to be ScalarType based instead of at::Type based.  This broke cpp extensions that relied on dispatch macros.  Since IMO these should be ScalarType based (and some extensions have already updated), we allow either at::Type or at::ScalarType to be passed, but passing at::Type will result in a deprecated warning.

2) Reintroduce macros that were deleted (AT_DISPATCH_ALL_TYPES_AND_HALF, AT_DISPATCH_COMPLEX_TYPES, AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX, AT_DISPATCH_ALL_TYPES_AND_COMPLEX); the AND_HALF ones now give a deprecated warning because there are more extensible macros that were introduced in their place.

3) Makes AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND into a ScalarType based macro (and updates usages).  This was the result of a logical merge conflicts.

4) Adds a new macro, C10_DEPRECATED_MESSAGE for passing a deprecated message to the compiler.  I didn't spend much time seeing if this can be enabled for versions before C++14.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17996

Reviewed By: ezyang

Differential Revision: D14446203

Pulled By: gchanan

fbshipit-source-id: 1da56e2e9c15aa8f913ebbf6bf1110c5b6dc375e
2019-03-15 14:21:46 -07:00
f3806094d5 Breakup Test Misc (batch 1/2) (#18071)
Summary:
Break up test_misc so that the tests for a given file live in test_filename. I think we might want to wait on moving test files into the source directory, since that would involve moving some tests over to the C10 folder, and this goes 99% of the way for test discoverability IMO anyway.

I added a file test_utils for common functions invoked in the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18071

Differential Revision: D14485787

Pulled By: eellison

fbshipit-source-id: dcb20d1978d490999d435ea20c1d0503413a5c80
2019-03-15 13:56:19 -07:00
aafbefa4d6 Remove the identical if branch (#18019)
Summary:
The elif branch and the else branch have the same content.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18019

Differential Revision: D14475107

Pulled By: ezyang

fbshipit-source-id: 5075cc938f57649af7537de1a7c9d76ea976cafc
2019-03-15 13:14:26 -07:00
80a7eac79e Remove Type::elementSizeInBytes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17785

Reviewed By: ezyang

Differential Revision: D14379074

fbshipit-source-id: 60727f187d61eb571b144bd6eed4dd4908da0b51
2019-03-15 12:56:02 -07:00
9a8a268672 add index and count to list (#17446)
Summary:
see https://github.com/pytorch/pytorch/issues/16662
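
A minimal sketch of what this enables in TorchScript (hedged; exact overload behavior may differ):

```python
import torch

@torch.jit.script
def foo():
    a = [1, 2, 2, 3]
    # index() returns the position of the first match; count() the number of matches
    return a.index(2), a.count(2)

print(foo())  # (1, 2)
```
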
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17446

Differential Revision: D14461293

Pulled By: Krovatkin

fbshipit-source-id: 03572467cdf85efc909c1864c0558a93085c8ff3
2019-03-15 12:45:17 -07:00
001cffed9d ONNX Export IsNan op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17698

Reviewed By: zrphercule

Differential Revision: D14470646

Pulled By: houseroad

fbshipit-source-id: d3e6adc83c4f9fa288c5fe0ae4c6af71fdd47905
2019-03-15 12:19:03 -07:00
18f721fb9a support serialization of classes (#17856)
Summary:
Stack:
* **#17856 [jit] support serialization of classes** ([D14402599](https://our.intern.facebook.com/intern/diff/D14402599/))

Add support for saving/loading TorchScript modules that depend on user-defined classes.

We track class dependencies the same way we track tensor constants, then write them
all out such that we can just compile them in order before compiling the module
hierarchy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17856

Reviewed By: shannonzhu

Differential Revision: D14461599

Pulled By: suo

fbshipit-source-id: 7115f87e069fd00dc8381d7de9997864fef7ea9f
2019-03-15 12:06:23 -07:00
cd26200d1b add reverse to list (#17001)
Summary:
Add reverse functionality to list. See https://github.com/pytorch/pytorch/issues/16662

```python
import torch

@torch.jit.script
def foo():
    a = [1, 2, 3, 4]
    a.reverse()

    return a
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17001

Reviewed By: eellison

Differential Revision: D14092019

Pulled By: driazati

fbshipit-source-id: b353c763677c22312b64dde0db268e2988610ba1
2019-03-15 11:53:37 -07:00
b420f8ff70 1/2 Add Tracing support for C2 Ops (#17899)
Summary:
The C10 ops are not registered as custom ops in PyTorch, so we have to add explicit support for them, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17899

Reviewed By: dzhulgakov

Differential Revision: D14436999

Pulled By: houseroad

fbshipit-source-id: a31fdf13a5c84f9b156a7288e0ffa57deb23b83f
2019-03-15 11:48:34 -07:00
3b5ddaf034 Delete dead code in THTensorMoreMath.cpp (#17993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17993
ghimport-source-id: 5427773f6306bdeddffd9a3ae032acc3f253f458

Stack:
* #17926 Implement at::has_internal_overlap helper function
* #17927 Error out on in-place (unary) ops on tensors that have internal overlap
* **#17993 [easy] Delete dead code in THTensorMoreMath.cpp**

We seem to have new implementations already for these in ATen.

Reviewed By: ezyang

Differential Revision: D14457838

fbshipit-source-id: 8481aad74b2127bd28c0f3e09740889fc0488a31
2019-03-15 07:50:20 -07:00
3c977fb7ce Error out on in-place (unary) ops on tensors that have internal overlap (#17927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17927
ghimport-source-id: 626d321e430b6b5c0ea3aa1eb9df8c1e2d058bf8

Stack:
* #17926 Implement at::has_internal_overlap helper function
* **#17927 Error out on in-place (unary) ops on tensors that have internal overlap**

On the way to #17935.

Works for CPU and CUDA on the following ops:
- abs_, acos_, asin_, atan_, ceil_, cos_, erf_, erfc_, exp_, expm1_
- floor_, log_, log10_, log1p_, log2_, round_, rsqrt_,
- sin_, sqrt_, tan_, tanh_, trunc_

This PR adds a check to see if the out/result tensor has internal
overlap. If it does, then we error out because the result **may** be
incorrect.

This is overly conservative; there are some cases where if the result is
the same as the input, the inplace operation is OK (such as floor_,
round_, and trunc_). However, the current code isn't organized in such a
way that this is easy to check, so enabling those will come in the future.
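
For illustration, a hedged example of the new behavior (the exact error message may differ):

```python
import torch

x = torch.randn(1, 3).expand(4, 3)  # rows alias the same memory (stride 0 in dim 0)
x.sin_()  # now raises a RuntimeError about internal overlap instead of
          # silently producing a possibly incorrect result
```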

Reviewed By: ezyang

Differential Revision: D14438871

fbshipit-source-id: 15e12bf1fdb2ab7f74bb806e22bc74840bd6abd1
2019-03-15 07:50:19 -07:00
a4123decf7 Implement at::has_internal_overlap helper function (#17926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17926
ghimport-source-id: 9f7572b5d43e474492363fa17dcb86a6c27ca13c

Stack:
* **#17926 Implement at::has_internal_overlap helper function**
* #17927 Error out on in-place (unary) ops on tensors that have internal overlap

On the way to #17935.

Checks if a tensor's sizes/strides indicate that multiple elements share
the same memory location. This problem is hard in general, so
at::has_internal_overlap implements two heuristics and avoids solving
the general problem:

- if a tensor is contiguous, it cannot have internal overlap
- if a tensor has any zero strides, it does have internal overlap
- otherwise, return MemOverlap::kTooHard to indicate that there might be overlap, but we don't know.
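
A hedged Python sketch of the heuristics (the real helper is the C++ at::has_internal_overlap returning a MemOverlap enum; names here are illustrative):

```python
import torch

def has_internal_overlap(t: torch.Tensor) -> str:
    if t.is_contiguous():
        return "no"        # contiguous tensors cannot overlap
    if any(s == 0 for s in t.stride()):
        return "yes"       # a zero stride means elements alias
    return "too_hard"      # might overlap, but we don't know

print(has_internal_overlap(torch.zeros(3).expand(4, 3)))  # "yes"
```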

Reviewed By: ezyang

Differential Revision: D14438858

fbshipit-source-id: 607ab31771315921ab6165b2a1f072ac3e75925a
2019-03-15 07:50:17 -07:00
ea652973f2 Fix truncation of default float values in JIT signatures. (#18044)
Summary:
In Python 2, float values get truncated.  We are storing default float values as (single-precision) floats (not 100% sure why?), which results in the defaults being truncated in the JIT and not matching the (specified) native function signatures.
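
For illustration, the kind of truncation at play (plain Python, independent of the JIT):

```python
import struct

d = 0.1                                          # a Python double
f = struct.unpack('f', struct.pack('f', d))[0]   # round-trip through 32-bit float
print(d == f)  # False: 0.1 becomes 0.10000000149011612 after truncation
```
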
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18044

Reviewed By: ezyang

Differential Revision: D14469868

Pulled By: gchanan

fbshipit-source-id: a456de599e8dab106966bcac7a6033f02ce3cdd2
2019-03-15 07:43:15 -07:00
40074d647c Allow None for checkpoint (#17969)
Summary:
Currently, we cannot run a checkpointed function with None argument.

```python
out = torch.utils.checkpoint.checkpoint(run_fn, input_var, None)
```

```
  File "/home/tunz/anaconda3/envs/torchdev/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 14, in detach_variable
    x = inp.detach()
AttributeError: 'NoneType' object has no attribute 'detach'
```

This PR makes the checkpoint function safely handle None arguments.
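
A hedged sketch of the fix (not necessarily the exact code in torch/utils/checkpoint.py):

```python
import torch

def detach_variable(inputs):
    # Non-tensor arguments such as None are passed through unchanged
    # instead of having .detach() called on them.
    out = []
    for inp in inputs:
        if not isinstance(inp, torch.Tensor):
            out.append(inp)          # e.g. None survives untouched
            continue
        x = inp.detach()
        x.requires_grad = inp.requires_grad
        out.append(x)
    return tuple(out)
```
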
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17969

Differential Revision: D14475148

Pulled By: ezyang

fbshipit-source-id: 9afe9e9aac511a6df1e1620e9ac341536890d451
2019-03-15 07:38:41 -07:00
54ef852d7f Fix unclosed files in download.py, test_onnxifi.py, test_trt.py (#18017)
Summary:
According to https://docs.python.org/3/tutorial/inputoutput.html, it is good practice to use the "with" keyword when dealing with file objects. If not, you should call f.close() to close the file and immediately free up any system resources used by it.  Thus, I adjusted the file-opening code to use "with open() as f".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18017

Differential Revision: D14475112

Pulled By: ezyang

fbshipit-source-id: d1c0821e39cb8a09f86d6d08b437b4a99746416c
2019-03-15 07:29:46 -07:00
785c76584c Run multi-gpu (single host) resnet50 and resnext101 training in bench (#18043)
Summary:
This is now working in rocm 2.2

cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18043

Differential Revision: D14477493

Pulled By: bddppq

fbshipit-source-id: 4d2dab1d5dbdbd4d6189162c074b19c4e9882c7d
2019-03-15 02:51:54 -07:00
8f07a9da30 Update nonzero onnx export (#18047)
Summary:
The output format of NonZero in ONNX (which follows numpy, https://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html) differs from that in PyTorch:
in ONNX it is `[rank_of_input, num_of_nonzeros]`, whereas in PyTorch it is `[num_of_nonzeros, rank_of_input]`.
To resolve the difference, the exporter adds a Transpose op after the nonzero output.
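
For illustration (a hedged sketch of the layout difference):

```python
import torch

x = torch.tensor([[0, 1, 1],
                  [2, 0, 0]])
pt = torch.nonzero(x)  # PyTorch layout: [num_of_nonzeros, rank_of_input] -> shape [3, 2]
onnx_style = pt.t()    # ONNX layout:    [rank_of_input, num_of_nonzeros] -> shape [2, 3]
```
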
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18047

Differential Revision: D14475081

Pulled By: ezyang

fbshipit-source-id: 7a3e4899f3419766b6145d3e9261e92859e81dc4
2019-03-14 22:19:20 -07:00
e21aa16931 more careful use of auto in sparse operations (#17958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17958

In some places, we need 64-bit for corner cases even though it's going to be rare.
In some places, we were using 64-bit unnecessarily.

Reviewed By: hyuen

Differential Revision: D14435523

fbshipit-source-id: e01ab73029ff780133af7ff4bbbe2e17926ed5a2
2019-03-14 22:10:42 -07:00
30b80de876 Update caffe2 docker images tag to 253 (#18031)
Summary:
To use ROCm 2.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18031

Reviewed By: ezyang

Differential Revision: D14469242

Pulled By: bddppq

fbshipit-source-id: c969bcf95dabe067d7b1a2cf6e07209e11148ec1
2019-03-14 20:53:07 -07:00
8362177bcf Fix typo (#17949)
Summary:
Fix a very common typo in my name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17949

Differential Revision: D14475162

Pulled By: ezyang

fbshipit-source-id: 91c2c364c56ecbbda0bd530e806a821107881480
2019-03-14 20:11:33 -07:00
1ba1ca0acb Update to ROCm2.2 (#18007)
Summary:
ROCm 2.2 was released today, if we respin the CI docker images with the attached, PyTorch/Caffe2 will support ROCm 2.2

Changes necessary:
* for the Ubuntu target, HIP PR 934 needs to be applied to fix the forceinline definition. ROCm 2.3 will contain this.
* two unit tests proved flaky on different platforms; disable them defensively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18007

Differential Revision: D14473903

Pulled By: bddppq

fbshipit-source-id: b1939f11d1c765a3bf71bb244b15f6ceb0e816d3
2019-03-14 18:47:22 -07:00
8b32933ea1 fix clang-tidy (#18030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18030
ghimport-source-id: d68781226eee923c90be862ef54693feef5f1c1a

Stack:
* **#18030 [jit] fix clang-tidy**

fix the following complaint
```
pytorch/torch/csrc/jit/ir.cpp:84:7: error: pass by value and use std::move [modernize-pass-by-value,-warnings-as-errors]
      const std::string& delim = ", ")
      ^~~~~~~~~~~~~~~~~~
      std::string
```

Reviewed By: shannonzhu

Differential Revision: D14466714

fbshipit-source-id: 195cba335ae656db28fc6230b9e56ad208c88c29
2019-03-14 17:31:08 -07:00
e782f200f7 Allow fewer arguments than the max in ArgumentSpec (#17826)
Summary:
Fixes #17558

The flattened tuple `Optional[Tuple[int, int]]` could result in either 1 (`None`) or 2 (`int` and `int`) values, so allow this case in `ArgumentSpec`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17826

Differential Revision: D14415290

Pulled By: driazati

fbshipit-source-id: 971bfa39502cfb8f08a991f16ffed6d138e48dc9
2019-03-14 16:54:44 -07:00
9de4350b77 Automatic update of fbcode/foxi to d1f45b1a2b1585d0e9bc65e15e463db344fc3ff6 (#18028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18028

Previous import was 520e8e135f1ad75959bf9b5bd15c361b8caeb8d6

Included changes:
- **[d1f45b1](https://github.com/houseroad/foxi/commit/d1f45b1)**: update the gitignore (#6) <Lu Fang>
- **[398135c](https://github.com/houseroad/foxi/commit/398135c)**: Remove static variable in header (#3) <Lu Fang>
- **[f817be1](https://github.com/houseroad/foxi/commit/f817be1)**: sync to ONNX cb544d07cc022e3fe83622fda9b2b1fa00b75b89 (#2) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D14464213

fbshipit-source-id: b5d166f05f7fd503dec11d676e219cc6c6a373f9
2019-03-14 15:47:36 -07:00
d3e3b246ea Use std::isnan instead of self-comparison. (#18021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18021
ghimport-source-id: 03423ba47ba5900c2b400c4457b148147ce8b35e

Stack:
* **#18021 Use std::isnan instead of self-comparison.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: soumith

Differential Revision: D14460699

fbshipit-source-id: d8feb7f3f0e93996bd1b4f4aea163548b1d12437
2019-03-14 15:41:36 -07:00
b263a2d8a1 Unroll If ops when doing ONNXIFI transform (#18039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18039

We basically flatten the whole net in order to ease the ONNXIFI transform. An alternative is to ONNXIFI the internal net of the If op, which can be done by adding interfacing inputs/outputs so that the internal then_net or else_net refer to the inputs/outputs of the If op. This is left as a TODO option.

Reviewed By: zrphercule

Differential Revision: D14452132

fbshipit-source-id: 00ad48d40da6fb8eabf9cca36701bcf61cbe4edc
2019-03-14 14:51:24 -07:00
77d6d9e1b8 Minor improvements to ONNXIFI transform (#17964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17964

1. Make the outputs of TensorFill CONSTANT during shape inference
2. Add option to avoid adding BatchAdjust ops
3. Add option to avoid lowering subgraph that's smaller than a limit

Reviewed By: hyuen

Differential Revision: D14360903

fbshipit-source-id: b3c5966b44e7cd0d56428acd6cc97f529b36b171
2019-03-14 14:51:23 -07:00
3057580c89 Run fp16 resnext101 training in bench script (#17963)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17963

Differential Revision: D14453445

Pulled By: bddppq

fbshipit-source-id: 7ca0e0c33ae89d4d4cf6ddba321daf4d6b2d5ed6
2019-03-14 14:41:37 -07:00
Jie
6458a6f0fc Tensor Iterator loop unrolling (#17667)
Summary:
Modified the Tensor Iterator GPU reduction kernel.
Creating multiple accumulators during thread reduction removes the data dependency
between unrolled loops and exposes instruction-level parallelism, which benefits
latency-bound kernels (e.g. Welford, used by `torch.std`).

This approach increases register usage, so we need to tune the unrolling
factors to prevent register spilling.
The current implementation tunes the unrolling factor down to 2 for Welford (a
register-heavy kernel), while keeping it unchanged (4) for the rest of the reduction kernels.
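
A hedged Python sketch of the idea (the actual change is in the CUDA kernel):

```python
def reduce_unrolled(vals, n_acc=2):
    # Several independent accumulators break the loop-carried dependency,
    # exposing instruction-level parallelism in the unrolled loop.
    accs = [0.0] * n_acc
    for i, v in enumerate(vals):
        accs[i % n_acc] += v     # each accumulator depends only on itself
    return sum(accs)             # combine the partials at the end

print(reduce_unrolled([1.0, 2.0, 3.0, 4.0]))  # 10.0
```
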
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17667

Differential Revision: D14368325

Pulled By: umanwizard

fbshipit-source-id: 9d64c0dccabdb1b7c3922a6557224af704a1974e
2019-03-14 14:09:01 -07:00
9506779a73 Temp fix for TileOp backward compatibility (#18026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18026

Temp fix for TileOp backward compatibility

Reviewed By: kittipatv

Differential Revision: D14463672

fbshipit-source-id: 1f3ec550245cb63f1bc4f26196b9334cfe5d0705
2019-03-14 13:54:29 -07:00
e862243abe add a dump method to TreeViews (#17965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17965
ghimport-source-id: 0d3d6340141d8413ce524a8d8ed0d308854ee7ef

Stack:
* (to be filled)

Also added it to the python bindings. Not for any particular reason,
just because otherwise the function gets elided (even in debug mode!)
and thus can't be called from the debugger.

Reviewed By: eellison

Differential Revision: D14442654

fbshipit-source-id: 2868bb32ccb80b04f9483883faa702f63a7948bf
2019-03-14 12:27:32 -07:00
5cbc1981f3 JIT IR - Make valueMapPtr optional in convertNetDefToIR (#17942)
Summary:
Make valueMapPtr optional in convertNetDefToIR, and add tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17942

Differential Revision: D14429687

Pulled By: duc0

fbshipit-source-id: 3a5a72bbb5acc1bfd7144a987688c599016fbf7a
2019-03-14 12:22:49 -07:00
53fb9a462a register RoIAlign with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17889

Reviewed By: smessmer

Differential Revision: D14411630

fbshipit-source-id: c3b7941d725ae2c78e8d79f52a7983db92b75807
2019-03-14 11:55:29 -07:00
10d64a1372 add tanh to AD and fix layernorm schema
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17816

Differential Revision: D14453048

Pulled By: wanchaol

fbshipit-source-id: 45815db964a4d9ee85d8933e335b47f215e3c467
2019-03-14 11:20:40 -07:00
9af6564060 Add magma debug version for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18008

Differential Revision: D14455117

Pulled By: soumith

fbshipit-source-id: 29d9a2e0b36d72bece0bb1870bbdc740c4d1f9d6
2019-03-14 10:15:57 -07:00
bba906c2cb Simplify env creation when running Windows tests (#17916)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/13465.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17916

Differential Revision: D14460589

Pulled By: soumith

fbshipit-source-id: e952d08648b833cfd4a8551355ecd68045fea25c
2019-03-14 10:10:31 -07:00
84c30398c7 Fix lint in test_multiprocessing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18016

Reviewed By: eellison

Differential Revision: D14458177

fbshipit-source-id: f17b3e06223ab399e9ce24be6988effe04dad636
2019-03-14 09:58:13 -07:00
73c5921134 Remove ArgcountSortPlugin, which doesn't seem to be used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17977

Reviewed By: ezyang

Differential Revision: D14438842

Pulled By: gchanan

fbshipit-source-id: 9b1746880fd7e3bd2b76a2559face34940ce7570
2019-03-14 09:30:38 -07:00
3fe7bdb2ff Fix lint in test_nn.py (#18006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18006
ghimport-source-id: e267ece1ac03e0d17e01dddf4a77f52421859435

Stack:
* **#18006 Fix lint in test_nn.py**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: eellison

Differential Revision: D14458108

fbshipit-source-id: 18ee6199447efed55a922cff5b3ad940a21c0536
2019-03-14 08:59:24 -07:00
a41b6d7d1f Simplify macros for exposing c10 ops to c2 (#17781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17781

The wrapper for calling a c10 operator from caffe2 is now based on a runtime FunctionSchema instead of compile time information. This way, it can be created for any c10 operator schema with just one invocation of a simple macro, instead of having to define arguments and other metadata as compile-time structures.

Furthermore, previously, the wrapper assumed there's an argument present for preallocated outputs, but that was only true for caffe2 operators exported to c10. So the wrapper only worked correctly for calling caffe2->c10->caffe2. Now with the new implementation, it works for any c10 operator.

Also, binary size for this should be much smaller.

Reviewed By: ezyang

Differential Revision: D14375054

fbshipit-source-id: bac7ab8e63929e6e2a148eacac41ed092009aa86
2019-03-14 08:54:16 -07:00
25d06eef7b Improve caffe2 operator wrapping (#17743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17743

- caffe2::Operator::SetOutputTensor() can now be used in operators that are called from c10/PyTorch.
- If the operator uses SetOutputTensor() instead of XOutput(), the wrapper doesn't preallocate an empty tensor for the operator anymore. Only outputs accessed in XOutput() will get an output tensor preallocated.
- Remove the copying of the vector with output tensors into a vector with pointer to output tensors.
- Preallocated outputs are now passed in as one TensorList argument on the stack. This TensorList argument has a well-defined name so other wrappers (i.e. the wrapper calling from c2 into c10) can recognize and use it.
- Macros for exporting caffe2 operators to c10 are simplified. Instead of having `c10_op_handle_for_c2_op`, we now pass in the operator handle as a template argument.
- `SetOutputTensor` and `OutputTensorOrUndefined` now work with operators exported to c10

Reviewed By: ezyang

Differential Revision: D14362434

fbshipit-source-id: 44a5e717204f21ea8e9728437429d9b84906f9f5
2019-03-14 08:54:15 -07:00
6def5b69e3 Remove unused KwargsPlugin.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17980

Reviewed By: ezyang

Differential Revision: D14438877

Pulled By: gchanan

fbshipit-source-id: f93764b00999effb5c8f852f8eda3a6da32dc767
2019-03-14 08:03:55 -07:00
40a3e14ade Disable btri tests on Windows if MAGMA is not found (#17989)
Summary:
Fixes #17988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17989

Reviewed By: ezyang

Differential Revision: D14454571

Pulled By: soumith

fbshipit-source-id: fc39a807a597d3574f4ca4e22cea12194e4693c0
2019-03-14 07:22:55 -07:00
16e50c78e7 Report convolution size mismatch (#17436)
Summary:
Report an error when:
1. the kernel size is larger than the input
2. the expected output size would be less than zero

Test case added:
- invalid_conv1d
- Relevant test cases for conv2d and conv3d already exist

Fixes #17247
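
A hedged example of the first case (the exact error message may differ):

```python
import torch

conv = torch.nn.Conv1d(1, 1, kernel_size=5)
x = torch.randn(1, 1, 3)  # input length 3 is smaller than the kernel size 5
conv(x)                   # now raises a RuntimeError reporting the size mismatch
```
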
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17436

Reviewed By: mrshenli

Differential Revision: D14354272

Pulled By: fmassa

fbshipit-source-id: 94b98621aa03b1f60d151ef9399ed3da55d41b42
2019-03-14 06:35:29 -07:00
8bd9465b79 make momentum non negative in adagrad test (#18009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18009

momentum should be initialized with non-negative values

Reviewed By: hyuen

Differential Revision: D14450841

fbshipit-source-id: 5bbbd11645db9e6f2dc42b26a00ff3caf378c59f
2019-03-14 03:15:07 -07:00
f827f1052a Fix the CI
Summary: The CI run on https://github.com/pytorch/pytorch/pull/17995 has verified that this change should fix the CI.

Reviewed By: bddppq

Differential Revision: D14447674

fbshipit-source-id: 50085db9ae7421b5be216ed0a2216234babfdf6c
2019-03-13 17:28:50 -07:00
6df7116273 Fix missing return in HIPStreamMasqueradingAsCUDA::operator<< (#17961)
Summary:
```
In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/BatchLinearAlgebra.hip:3:
In file included from /var/lib/jenkins/workspace/aten/src/ATen/hip/HIPContext.h:5:
/var/lib/jenkins/workspace/aten/src/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:107:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
1 warning generated.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17961

Reviewed By: houseroad

Differential Revision: D14436421

Pulled By: bddppq

fbshipit-source-id: 962665602178699d7c7b55f4ca7ff1eb72ee0349
2019-03-13 16:04:42 -07:00
c5b50a3440 Remove AssertNDim, which doesn't seem to be used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17978

Reviewed By: colesbury

Differential Revision: D14438845

Pulled By: gchanan

fbshipit-source-id: 106650c37fb1885201eaef27cb6d86b49ef27976
2019-03-13 15:10:55 -07:00
42acae5406 Remove unused BoolOption.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17979

Reviewed By: zou3519

Differential Revision: D14438876

Pulled By: gchanan

fbshipit-source-id: a6aeab0261ce6926ed82a81edee4564a8dd341ed
2019-03-13 13:38:19 -07:00
1e42720a77 Fix some typos in distributed.py.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17959

Differential Revision: D14437347

Pulled By: soumith

fbshipit-source-id: 4c33571f56e9da687666516a310f91924cddd4d9
2019-03-13 09:28:03 -07:00
1c3494daf0 Fix Windows test CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17954

Differential Revision: D14437473

Pulled By: soumith

fbshipit-source-id: f0d79ff0c5d735f822be3f42bbca91c1928dacaf
2019-03-13 09:22:46 -07:00
9089182ce4 Fix lint in test_utils.py (#17944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17944
ghimport-source-id: 5b45086428b5a36e737882c78f285141121fd1bc

Stack:
* **#17944 Fix lint in test_utils.py**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14430132

fbshipit-source-id: b00de7b4c685645ad5a4dc8c5fe6ce7e1893a3eb
2019-03-13 09:02:35 -07:00
26a4c2ada6 Speed up gemm by reordering the for loops (#17730)
Summary:
Optimize the order of the "for" loops.

Note: For "transa = true" cases, the order of the "for" loops had already been optimized in the original code. Therefore, no significant improvement is observed in those cases (i.e. "transa && transb" and "transa && !transb").
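
For intuition, a hedged Python sketch of the loop-order idea (the real change is in the C gemm code):

```python
def gemm_ikj(A, B, C, M, N, K):
    # With row-major storage, putting k in the middle loop lets the innermost
    # loop stream contiguously through B[k] and C[i], improving cache locality.
    for i in range(M):
        for k in range(K):
            a = A[i][k]
            for j in range(N):
                C[i][j] += a * B[k][j]
```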

mode/opt (i.e. static library)
//////////////////////////////////////////////////////////////////////////////
transa && transb
after:
loops:  2229     x:     128      y:     128      z:     128      time:  2243ns      =>  acceleration multiplier:  0.90
loops:  124      x:     128      y:     1024     z:     128      time:  40381ns      =>  acceleration multiplier:  0.97
loops:  121      x:     1024     y:     128      z:     128      time:  41651ns      =>  acceleration multiplier:  0.96
loops:  15       x:     1024     y:     1024     z:     128      time:  333771ns       =>  acceleration multiplier:  0.98
loops:  4610     x:     128      y:     128      z:     64       time:  1084ns       =>  acceleration multiplier:  0.95
loops:  252      x:     128      y:     1024     z:     64       time:  19860ns      =>  acceleration multiplier:  0.98
loops:  248      x:     1024     y:     128      z:     64       time:  20232ns      =>  acceleration multiplier:  0.98
loops:  30       x:     1024     y:     1024     z:     64       time:  167338ns      =>  acceleration multiplier:  0.99

before:
loops:  2468     x:     128      y:     128      z:     128      time:  2026ns
loops:  128      x:     128      y:     1024     z:     128      time:  39338ns
loops:  126      x:     1024     y:     128      z:     128      time:  39930ns
loops:  16       x:     1024     y:     1024     z:     128      time:  327549ns
loops:  4840     x:     128      y:     128      z:     64       time:  1033ns
loops:  258      x:     128      y:     1024     z:     64       time:  19441ns
loops:  252      x:     1024     y:     128      z:     64       time:  19854ns
loops:  31       x:     1024     y:     1024     z:     64       time:  166254ns

//////////////////////////////////////////////////////////////////////////////
transa && !transb
after:
loops:  4880     x:     128      y:     128      z:     128      time:  1024ns      =>  acceleration multiplier:  0.98
loops:  638      x:     128      y:     1024     z:     128      time:  7839ns      =>  acceleration multiplier:  1.04
loops:  605      x:     1024     y:     128      z:     128      time:  8276ns      =>  acceleration multiplier:  1.01
loops:  77       x:     1024     y:     1024     z:     128      time:  65713ns      =>  acceleration multiplier:  1.00
loops:  9935     x:     128      y:     128      z:     64       time:  503ns      =>  acceleration multiplier:  1.00
loops:  1252     x:     128      y:     1024     z:     64       time:  3994ns      =>  acceleration multiplier:  1.00
loops:  1183     x:     1024     y:     128      z:     64       time:  4226ns      =>  acceleration multiplier:  0.98
loops:  153      x:     1024     y:     1024     z:     64       time:  32766ns      =>  acceleration multiplier:  0.99

before:
loops:  4985     x:     128      y:     128      z:     128      time:  1003ns
loops:  615      x:     128      y:     1024     z:     128      time:  8140ns
loops:  599      x:     1024     y:     128      z:     128      time:  8357ns
loops:  76       x:     1024     y:     1024     z:     128      time:  65934ns
loops:  9897     x:     128      y:     128      z:     64       time:  505ns
loops:  1248     x:     128      y:     1024     z:     64       time:  4008ns
loops:  1203     x:     1024     y:     128      z:     64       time:  4159ns
loops:  154      x:     1024     y:     1024     z:     64       time:  32499ns

//////////////////////////////////////////////////////////////////////////////
!transa && transb
after:
loops:  3919     x:     128      y:     128      z:     128      time:  1276ns      =>  acceleration multiplier:  2.97
loops:  497      x:     128      y:     1024     z:     128      time:  10069ns      =>  acceleration multiplier:  7.85
loops:  449      x:     1024     y:     128      z:     128      time:  11145ns      =>  acceleration multiplier:  4.77
loops:  57       x:     1024     y:     1024     z:     128      time:  88595ns      =>  acceleration multiplier:  7.12
loops:  7575     x:     128      y:     128      z:     64       time:  660ns      =>  acceleration multiplier:  3.00
loops:  967      x:     128      y:     1024     z:     64       time:  5173ns      =>  acceleration multiplier:  7.66
loops:  877      x:     1024     y:     128      z:     64       time:  5702ns      =>  acceleration multiplier:  4.76
loops:  111      x:     1024     y:     1024     z:     64       time:  45232ns      =>  acceleration multiplier:  7.03

before:
loops:  1320     x:     128      y:     128      z:     128      time:  3789ns
loops:  64       x:     128      y:     1024     z:     128      time:  79061ns
loops:  95       x:     1024     y:     128      z:     128      time:  53107ns
loops:  8        x:     1024     y:     1024     z:     128      time:  631161ns
loops:  2521     x:     128      y:     128      z:     64       time:  1983ns
loops:  127      x:     128      y:     1024     z:     64       time:  39604ns
loops:  185      x:     1024     y:     128      z:     64       time:  27128ns
loops:  16       x:     1024     y:     1024     z:     64       time:  318155ns

//////////////////////////////////////////////////////////////////////////////
!transa && !transb
after:
loops:  3895     x:     128      y:     128      z:     128      time:  1283ns      =>  acceleration multiplier:  1.73
loops:  393      x:     128      y:     1024     z:     128      time:  12746ns      =>  acceleration multiplier:  3.36
loops:  411      x:     1024     y:     128      z:     128      time:  12170ns      =>  acceleration multiplier:  1.93
loops:  46       x:     1024     y:     1024     z:     128      time:  110116ns      =>  acceleration multiplier:  3.17
loops:  7404     x:     128      y:     128      z:     64       time:  675ns      =>  acceleration multiplier:  1.58
loops:  636      x:     128      y:     1024     z:     64       time:  7872ns      =>  acceleration multiplier:  2.70
loops:  724      x:     1024     y:     128      z:     64       time:  6911ns      =>  acceleration multiplier:  1.32
loops:  73       x:     1024     y:     1024     z:     64       time:  68502ns      =>  acceleration multiplier:  2.49

before:
loops:  2253     x:     128      y:     128      z:     128      time:  2219ns
loops:  117      x:     128      y:     1024     z:     128      time:  42788ns
loops:  214      x:     1024     y:     128      z:     128      time:  23465ns
loops:  15       x:     1024     y:     1024     z:     128      time:  349076ns
loops:  4694     x:     128      y:     128      z:     64       time:  1065ns
loops:  236      x:     128      y:     1024     z:     64       time:  21251ns
loops:  549      x:     1024     y:     128      z:     64       time:  9108ns
loops:  30       x:     1024     y:     1024     z:     64       time:  170799ns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17730

Differential Revision: D14325149

Pulled By: zhangguanheng66

fbshipit-source-id: a7a5a83890fdf99fee6eb87a3a5060b7b6bd862f
2019-03-13 08:57:26 -07:00
ecc5e623a2 fix punctuation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17973

Differential Revision: D14438725

Pulled By: zou3519

fbshipit-source-id: 30a5485b508b4ae028057e0b66a8abb2b163d66b
2019-03-13 08:14:30 -07:00
13bc002422 fixes for AVX detection (#17915)
Summary:
Our AVX2 routines use functions such as _mm256_extract_epi64
that do not exist on 32 bit systems even when they have AVX2.
This disables AVX2 when _mm256_extract_epi64 does not exist.

This fixes the "local" part of #17901 (except disabling FBGEMM), but sleef
also needs to be updated and NNPACK fixed; see the bug report for further
discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17915

Differential Revision: D14437338

Pulled By: soumith

fbshipit-source-id: d4ef7e0801b5d1222a855a38ec207dd88b4680da
2019-03-13 03:55:06 -07:00
7e34bd230b Disable FBGEMM when building under x86 32bit (#17922)
Summary:
FBGEMM doesn't work on 32-bit x86, and prior to this patch it would
generate x86_64 objects in a build that is supposed to be 32-bit x86.
FBGEMM relies on registers that are not available on x86_32, so
we disable it.

This takes care of one element of #17901. There are more dependencies,
and a separate PR (#17915) addresses AVX detection for the code in the
main repository.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17922

Differential Revision: D14437340

Pulled By: soumith

fbshipit-source-id: bd9fc98cf607d9b0bc28127fbbc8b04fa10eecbe
2019-03-13 03:46:50 -07:00
f6de833cac Update docs for mark_non_differentiable method (#17891)
Summary:
The current documentation doesn't reflect the real values of tensors during the backward pass.
This issue is mentioned in https://github.com/pytorch/pytorch/issues/12631
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17891

Differential Revision: D14419949

Pulled By: soumith

fbshipit-source-id: 8b495628c3f017bc880f8096682cd176a53974e5
2019-03-13 03:19:59 -07:00
1e7f027f5b Simplify OpKernel (#17925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17925

There's no need for OpKernel to keep the cache creator around if we initialize cache on construction.

This basically means kernel caches are now constructed when the kernel is looked up from the dispatcher, rather than being delayed to the first call.
This makes calling cheaper because the kernel call no longer has to check whether the cache is already initialized.

Also, this improves thread-safety. Now, OpKernel is thread-safe if the kernel is thread-safe.

Reviewed By: ezyang

Differential Revision: D14424907

fbshipit-source-id: a0d09a3a560dfe78aab53d558c9ebb91b57722df
2019-03-13 01:40:10 -07:00
8714b8bb89 Mark DispatchTable move ctor and move assignment operator as deleted (#17948)
Summary:
```
21:39:50 /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/DispatchTable.h:125:3: warning: explicitly defaulted move constructor is implicitly deleted [-Wdefaulted-function-deleted]
21:39:50   DispatchTable(DispatchTable&&) = default;
21:39:50   ^
21:39:50 /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/DispatchTable.h:212:36: note: move constructor of 'DispatchTable' is implicitly deleted because field 'kernels_' has a deleted move constructor
21:39:50   detail::ThreadsafeOperatorTable_ kernels_;
21:39:50                                    ^
21:39:50 /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/DispatchTable.h:105:68: note: copy constructor of 'ThreadsafeOperatorTable_' is implicitly deleted because field 'map_' has a deleted copy constructor
21:39:50    LeftRight<ska::flat_hash_map<TensorTypeId, DispatchTableEntry>> map_;
21:39:50                                                                    ^
21:39:50 /var/lib/jenkins/workspace/c10/util/LeftRight.h:152:16: note: copy constructor of 'LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::DispatchTableEntry, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::DispatchTableEntry> > > >' is implicitly deleted because field '_writeMutex' has a deleted copy constructor
21:39:50     std::mutex _writeMutex;
21:39:50                ^
21:39:50 /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/mutex:129:5: note: 'mutex' has been explicitly marked deleted here
21:39:50     mutex(const mutex&) = delete;
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17948

Reviewed By: ezyang

Differential Revision: D14431344

Pulled By: bddppq

fbshipit-source-id: b1c6593b73cb467a58b09a3470b8899b82564d5e
2019-03-13 01:29:50 -07:00
4dcb4b1601 Add more hint in the JIT tracer (#17957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17957

So developers know what action should be taken when the model contains a nondeterministic node.

Reviewed By: dzhulgakov

Differential Revision: D14435923

fbshipit-source-id: 12d930185852f78c54efc8e90c51aa7c7c7faab5
2019-03-13 00:56:59 -07:00
c8f9072ab6 Fix half-float conversion ops to handle tensors larger than 2B of params (#17952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17952

As desc.

Reviewed By: hyuen

Differential Revision: D14435092

fbshipit-source-id: dc614ba16ad531101d04d01aec8f1fbd534ebec5
2019-03-12 23:03:22 -07:00
8bc3b66be9 Override the resolve_library_path in FBCode (#17497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17497

The following problems have been addressed: 1) import torch.ops correctly, 2) make realpath call optional

Reviewed By: dzhulgakov

Differential Revision: D14094358

fbshipit-source-id: 2f9a6fca656867287a7c82c465a4554384ff7323
2019-03-12 22:09:24 -07:00
b21e9e4dae update ccache guide (#17938)
Summary:
closes #17937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17938

Differential Revision: D14435791

Pulled By: kostmo

fbshipit-source-id: b1d0db8902f78bde51150606e2a67fb9ddfe7812
2019-03-12 21:48:17 -07:00
9a946c4072 unify cpp tests (#17947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17947

Instead of having a gtest and a no-gtest file that you have to remember to register tests in, add a single registration point and use some macro magic to make it work for both gtest and non-gtest builds

Reviewed By: eellison

Differential Revision: D14431302

fbshipit-source-id: e1abac135992577a943eaa7abcc81a6ed31fa6e5
2019-03-12 21:35:40 -07:00
4f939dded1 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 7d454d0f58898741f293b356dfc10d7fc31fd55c
2019-03-12 20:34:05 -07:00
66556f48e3 Remove sinkMaxPool transformation (#17694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17694

Remove sinkMaxPool transformation as it's unused

Differential Revision: D14328624

fbshipit-source-id: bd245403b756157120faa61a0f9253c15120e7a8
2019-03-12 20:10:46 -07:00
f7b70a69e5 Fix Windows build (#17917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17917

D14375995 introduced instantiation of the following templates with `bool` type (more specifically `To` is `int64_t`, `From` is `bool`):
```
template <typename To, typename From>
typename std::enable_if<std::is_integral<From>::value, bool>::type overflows(
    From f) {
  using limit = std::numeric_limits<typename scalar_value_type<To>::type>;
  if (!limit::is_signed && std::numeric_limits<From>::is_signed) {
    // allow for negative numbers to wrap using two's complement arithmetic.
    // For example, with uint8, this allows for `a - b` to be treated as
    // `a + 255 * b`.
    return f > limit::max() ||
        (f < 0 && -static_cast<uint64_t>(f) > limit::max());
  } else {
    return f < limit::lowest() || f > limit::max();
  }
}

template <typename To, typename From>
typename std::enable_if<std::is_floating_point<From>::value, bool>::type
overflows(From f) {
  using limit = std::numeric_limits<typename scalar_value_type<To>::type>;
  if (limit::has_infinity && std::isinf(static_cast<double>(f))) {
    return false;
  }
  if (!limit::has_quiet_NaN && (f != f)) {
    return true;
  }
  return f < limit::lowest() || f > limit::max();
}
```
MSVC gives a C4804 warning, and because "treat warnings as errors" is on, the build fails on Windows. Disable that warning for those two templates.

Reviewed By: mingzhe09088

Differential Revision: D14421157

fbshipit-source-id: e72ba34406628c84da48518b32a46f851819bad1
2019-03-12 19:53:56 -07:00
92e35ac0a7 fix overly restrictive assertion (#17939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17939

Instead of just asserting min <= 0 and max >= 0, we adjust the histogram to include 0 in the range.
We need to include 0 in the range during norm error minimization to correctly represent our quantization method, which includes 0.

Reviewed By: csummersea

Differential Revision: D14428732

fbshipit-source-id: 6669a9d2c7d409ec3b31aee0afe48071986b9b71
2019-03-12 18:18:49 -07:00
e34abe03a8 Enable threadpool threads to greedily acquire new tasks if available. (#17808)
Summary:
This improves locality and affinity by keeping work on the same
threads preferentially to starting work on new ones, and reduces
contention on the threadpool lock more generally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17808

Differential Revision: D14391282

Pulled By: resistor

fbshipit-source-id: 3aec81656a50460a725aa4187c61864295d4f46e
2019-03-12 18:05:55 -07:00
552f903c63 JIT IR - Add option to remove prefix string when converting from JIT IR to NetDef (#17931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17931

When converting from NetDef to IR and back, the prefix string should be removed so the operator types are preserved in caffe2.

Reviewed By: ZolotukhinM

Differential Revision: D14425954

fbshipit-source-id: 2807e7337b0f804f126970768b1250a4a8c5f35c
2019-03-12 17:02:26 -07:00
4ad17c9031 Misleading documentation for module._load_from_state_dict (#17618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17618

Based on the code, we only add keys to `missing_keys` and `unexpected_keys` if `strict` is `True`. The documentation is confusing.

This diff also fixes one FLAKE8 warning.

Reviewed By: ailzhang

Differential Revision: D14280593

fbshipit-source-id: d368f5596bdf74ff62ee4d28d79120f5af91e0a3
2019-03-12 16:57:39 -07:00
6248266d91 Enable detectron on AMD GPU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17862

Differential Revision: D14429234

Pulled By: bddppq

fbshipit-source-id: 5cb8750bd9db0ff8a179977d2bfbb180265cce81
2019-03-12 16:29:42 -07:00
1cfb50334f Removed dead code from THTensorMath.h (#17873)
Summary:
This PR removes dead code from THTensorMath.h.
I found these unused methods while working on a PR where I plan to move the fill and zero methods from TH/THC to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17873

Differential Revision: D14407013

Pulled By: izdeby

fbshipit-source-id: a3551c5d91e7b380931a8b3bd4b3ae972d16911d
2019-03-12 13:53:33 -07:00
6466ddbd86 Fix lint in test_torch.py (#17807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17807

Lint also detected a bug in test_linspace where we weren't
actually testing the CUDA case.

Differential Revision: D14388241

fbshipit-source-id: e219e46400f4952c6b384bca3baa0724ef94acde
2019-03-12 13:48:28 -07:00
40ecdc57ff Updating submodules
Reviewed By: zpao

fbshipit-source-id: 06c0f738c791cccf79025d15f1fc2076bf34fcd1
2019-03-12 13:29:46 -07:00
ee87254720 Eliminate the use of Type. (#17804)
Summary:
Stack:
* **#17804 Eliminate the use of Type.** ([D14382165](https://our.intern.facebook.com/intern/diff/D14382165/))

at::CPU produces a Type object which is then cast into TensorOptions; instead, use TensorOptions directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17804

Differential Revision: D14407851

Pulled By: ezyang

fbshipit-source-id: 6462d698305b7c24382c1bfd440d3227bd28d9e4
2019-03-12 12:54:24 -07:00
0f7e6f293b Make Variable::set_data non-const; cosmetic fixes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17761

Differential Revision: D14406603

Pulled By: ezyang

fbshipit-source-id: bc8bba73352eb4b3e21196b36522e9cec70f6676
2019-03-12 12:41:57 -07:00
3e00f79a1e remove warning for upsample code (#17921)
Summary:
IIRC we decided to remove this warning from the code in #11568. That change got accidentally reverted in #14123.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17921

Differential Revision: D14422811

Pulled By: ailzhang

fbshipit-source-id: 7067264bd1d3e3b7861d29e18ade2969ed705ca1
2019-03-12 12:16:33 -07:00
f229521154 Optimize TileOp (#17290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17290

Optimize TileOp

Reviewed By: wesolwsk

Differential Revision: D14145844

fbshipit-source-id: 1571fa0512218dbc48080592ede4e23903be85dd
2019-03-12 12:16:30 -07:00
54b33503ec Optimize channel_stats_op (#16243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16243

Optimize channel_stats_op and add NHWC impl

Reviewed By: takatosp1

Differential Revision: D13775515

fbshipit-source-id: decb889e646f5316d4afefdf9f9b6bc6343613cd
2019-03-12 12:08:00 -07:00
99f1465c35 enable shape inference for elementwise operators (#17885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17885

enable shape inference for elementwise operators

Reviewed By: yinghai

Differential Revision: D14411014

fbshipit-source-id: b19bcaabb2bba26fb79745ec84af0e4e5ed18ff0
2019-03-12 12:02:24 -07:00
4d2f6f1bbe Remove remaining test jit expects redux (#17924)
Summary:
Trying to reland https://github.com/pytorch/pytorch/pull/17886 since it broke a build and I reverted it
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17924

Differential Revision: D14423842

Pulled By: eellison

fbshipit-source-id: f219e786bd07f7da3b7f9e866981199f5ccf6318
2019-03-12 11:33:34 -07:00
abab9c1d78 Handle Scalars Better (#17875)
Summary:
This PR allows Scalars to be castable with `int()` and `float()`, allows scalars to match with float arguments, and prints out a better error message if `x.item()` is used as an int.

Scalars are a very uncommon case, and I don't think we want to add the maintenance burden of building out op coverage for it. It's more maintainable to better handle converting it to int/float.

Fix https://github.com/pytorch/pytorch/issues/17652

Also note: https://github.com/pytorch/pytorch/issues/16849
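
A hedged sketch of what this enables (assuming the TorchScript Scalar semantics of this era; exact behavior may differ):

```python
import torch

@torch.jit.script
def fn(x):
    s = x.sum().item()  # produces a Scalar inside TorchScript
    return float(s)     # per this PR, a Scalar can be cast with float()/int()
```
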
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17875

Differential Revision: D14411138

Pulled By: eellison

fbshipit-source-id: a4e957cefb0ffd10ddb234d92f6d1558cfce8751
2019-03-12 10:52:26 -07:00
fd04073e61 Fixed a formatting issue in doc comments (#17505)
Summary:
For torch.distributed.broadcast_multigpu, per issue #17243.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17505

Reviewed By: janewangfb

Differential Revision: D14373865

Pulled By: pietern

fbshipit-source-id: 6d7e91a3da50a7c9ba417ad852f7746eb5200043
2019-03-12 09:55:29 -07:00
18949c8e00 Add nbytes, itemsize, element_size to at::Tensor. (#17810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17810

Partially addresses #12728. Also, switch the element_size bindings
to use the new function, rather than the method on Type.

We don't add Python bindings yet, as they need to be special
(they will be properties).

Differential Revision: D14388790

fbshipit-source-id: 294183d0c8a59b0c13f2bf21d6f1cd557333e83b
2019-03-12 09:48:54 -07:00
dc4cbd9565 Fix lint in test_distributions.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17821

Differential Revision: D14392899

fbshipit-source-id: 99f75b1d3a71bde8050caef8df7e5b9ecfe0c755
2019-03-12 09:39:24 -07:00
030fec9703 Fix lint in test_jit.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17823

Differential Revision: D14392996

fbshipit-source-id: b9aa83898768c929e753c0f17bb09a54d724ae4d
2019-03-12 09:20:20 -07:00
4073e3c2f2 Fix lint errors in test_autograd
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17812

Reviewed By: eellison

Differential Revision: D14388897

fbshipit-source-id: 6e2671805dc8d57af68eb0a0cd6ccb24d9db45e2
2019-03-12 08:55:12 -07:00
f3a860ba07 Added a few extra python bindings to help with walking the IR graph from Python (#17822)
Summary:
These changes add the following new Python bindings:

- Values now have a 'type' property that allows getting to the 'type' object
- Blocks now have inputs and outputs, as well as returnNode and paramNode properties
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17822

Differential Revision: D14410123

Pulled By: ezyang

fbshipit-source-id: 64ef79f85a7a43b83e4b127b1d39efcaa64b74dc
2019-03-12 08:55:10 -07:00
aba9051a65 kthvalue consistency with sort in the presence of NaN (#17824)
Summary:
This PR causes kthvalue to be consistent with sort
(i.e. treat NaN as larger than any number), so that
`a.kthvalue(n) == a.sort()[n - 1]`.

One drawback is that median with a NaN argument does not return NaN,
which is a deviation from NumPy.

Thank you, ngimel, for raising this.
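
For illustration (hedged; output formatting may differ):

```python
import torch

a = torch.tensor([1.0, float('nan'), 0.5])
vals, idx = a.sort()    # NaN sorts last: [0.5, 1.0, nan]
kv, ki = a.kthvalue(3)  # kv is now also nan, consistent with sort
```
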
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17824

Differential Revision: D14410092

Pulled By: ezyang

fbshipit-source-id: bdec2d8272dc4c65bcf2f9b8995e237774c44c02
2019-03-12 08:49:19 -07:00
joy
9ecee93a16 Fix minor grammatical mistakes in torch/nn/modules/loss.py (#17892)
Summary:
Fixes some minor grammatical mistakes in the doc of `loss.py`.

I think in the doc:
>  Note that for some losses, there multiple elements per sample.

the "are" is lost between "there" and "multiple".

This mistake takes place in all the descriptions of the parameter `size_average`, and there are 17 of them.
It's minor but improves the doc, I think. 😁
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17892

Differential Revision: D14418177

Pulled By: ezyang

fbshipit-source-id: 412759f2f9b215819463bf8452ab0e0513218cd6
2019-03-12 08:42:50 -07:00
02c48cced9 Remove (almost all) TensorOptions from native_functions.yaml (#17385)
Summary:
Stacked on top of https://github.com/pytorch/pytorch/pull/17386

Brings us to 1014/1106 of writing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17385

Differential Revision: D14248008

Pulled By: cpuhrsch

fbshipit-source-id: 033e00de91e3edf7ae01ca03ebe436c0446b3b5c
2019-03-12 08:00:00 -07:00
12d6725c15 Restore full Windows tests (#17102)
Summary:
closes #17101
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17102

Differential Revision: D14420716

Pulled By: ezyang

fbshipit-source-id: 0134736e2d919afa683afa84cb2140f659042643
2019-03-12 06:34:45 -07:00
525fef708d Prevent VS2017 from emitting ambiguous symbol errors (second time)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17863

Differential Revision: D14404818

Pulled By: soumith

fbshipit-source-id: 9dac6b926e270e2a29ec2e4dba2e93984da0e5f5
2019-03-12 01:56:58 -07:00
af2e347164 Fix windows test hang (#17778)
Summary:
This PR resolves two concurrent issues discovered when running the test in windows. Details about the windows test can be found here: https://github.com/pytorch/pytorch/issues/17609

The change covers two fixes:
1. update running_preloaders_ upfront, before creating the worker thread, to prevent underflow.
2. add a lock when updating stop_ to prevent deadlock on the condition variable cv_write_.

The fix has been tested on both Windows and Linux. With --gtest_repeat=1000, the tests run smoothly without issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17778

Differential Revision: D14404910

Pulled By: soumith

fbshipit-source-id: 2fbb8007e4b0bce4613e9a9fd31b8aace1bbfa8d
2019-03-12 01:50:49 -07:00
f268370b42 torch.btrifact for tensors with greater than 3 dimensions (#14964)
Summary:
Motivation:
- Earlier, `torch.btrifact` could not handle tensors with greater than 3 dimensions. This is because of the check:
>   AT_CHECK(THTensor_(nDimension)(a) == 3, "expected 3D tensor, got size: ", a->sizes());

What is in this PR?:
- Move `btrifact` to ATen
- Remove relation to TH/THC.
- Handle tensors with more than three dimensions
- Tests
- Docs modifications: added a note about the non-pivoting variant.

[blocked due to old magma-cuda binaries]
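
A hedged usage sketch (assuming the torch.btrifact API of this era; it was later replaced by torch.lu):

```python
import torch

A = torch.randn(2, 5, 3, 3)       # more than three dimensions now works
A_LU, pivots = torch.btrifact(A)  # batched LU factorization
print(A_LU.shape, pivots.shape)   # torch.Size([2, 5, 3, 3]) torch.Size([2, 5, 3])
```
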
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14964

Differential Revision: D14405106

Pulled By: soumith

fbshipit-source-id: f051f5d6aaa45f85836a2867176c065733563184
2019-03-12 01:46:07 -07:00
b161ac9634 Small clean up of aten_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17530

Reviewed By: ezyang

Differential Revision: D14237931

fbshipit-source-id: fb73d63d89fab0622097a49be6ed0b75ddb02a7c
2019-03-11 21:04:16 -07:00
496a3339dc add support for parsing class defs to the string frontend (#17628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17628

This is not hooked up anywhere yet, just adding support.
This shares the same restrictions as the python frontend—namely, that the only exprs allowed right now are method defs.

Reviewed By: shannonzhu

Differential Revision: D14291654

fbshipit-source-id: 7798e5ff412a52ef8803c7bae8f439e50968a73a
2019-03-11 19:13:55 -07:00
64bb86d946 add test for out of order methods (#17624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17624

Just to make sure this path works

Reviewed By: shannonzhu

Differential Revision: D14288056

fbshipit-source-id: b719c0e90252b6821b1f9b22d3d98982985a6cb3
2019-03-11 19:13:54 -07:00
f9820e55af initializing class value (#17585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17585

Create a sugared value that represents a class during initialization. This is
so that assignments to attributes correctly define attributes in __init__ but
raise an error elsewhere.

Reviewed By: shannonzhu

Differential Revision: D14263403

fbshipit-source-id: 09b2feeb272302f00a79c2a0302fbdf5483aed6a
2019-03-11 19:13:52 -07:00
2e753fc753 Remove unused parameter in ProcessGroupGloo (#17718)
Summary:
This is not used anywhere and wasn't cleaned up prior to 1.0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17718

Reviewed By: janewangfb

Differential Revision: D14355154

Pulled By: pietern

fbshipit-source-id: f8ff3c8f50cd6365b369a5c5b85d72d8940df048
2019-03-11 18:01:20 -07:00
f540536dfd Revert D14414435: [pytorch][PR] Remove remaining IR Expect files
Differential Revision:
D14414435

Original commit changeset: 0bfd7ce66ac2

fbshipit-source-id: 02de1814f3c4e581d3798059cee752517b176ed9
2019-03-11 17:36:44 -07:00
fd67f6b463 Remove remaining IR Expect files (#17886)
Summary:
Last batch of IR expect files removed. Includes some removal of expect files that are no longer used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17886

Differential Revision: D14414435

Pulled By: eellison

fbshipit-source-id: 0bfd7ce66ac2f72a57f15f45ebd60b95e80b6c16
2019-03-11 17:32:19 -07:00
4aa22833cf Bool tensor creation (cpu) (#17376)
Summary:
This PR enables bool tensor creation and some basic operations for the CPU backend. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
    1. Storage Implementation [Done]
    2. Tensor Creation.
        a) CPU (this PR)
        b) CUDA
    3. Tensor Conversions.
    4. Tensor Indexing.
    5. Tensor Operations.
    6. Back compatibility related changes.

**Change**:
Enable CPU tensors and these operations:
- torch.zeros
- torch.tensor
- torch.ones
- torch.randint
- torch.full
- torch.full_like
- torch.empty
- torch.empty_like

**Tested via**:
1) unit tests

2)
torch.zeros(2,2, dtype=torch.bool)
torch.tensor([True, False], dtype=torch.bool)
torch.tensor([-1, -1.1, 0, 1, 1.1, 2], dtype=torch.bool)
torch.ones([1,2], dtype=torch.bool)
torch.randint(10, (2, 2), dtype=torch.bool)
torch.full((2, 3), True, dtype=torch.bool)
torch.empty(4, dtype=torch.bool)

a = torch.tensor([0,0,1])
b = torch.full_like(a, True)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17376

Reviewed By: ezyang

Differential Revision: D14375995

Pulled By: izdeby

fbshipit-source-id: a65490b5360ee0e6e3accc54ce7e32e49ad2d2a8
2019-03-11 17:03:40 -07:00
b5fa5a5603 Remove device guard from TypeDefault::copy()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17833

Reviewed By: ezyang

Differential Revision: D14400901

Pulled By: li-roy

fbshipit-source-id: ababc95dadfc94a996a80c5332f45f76a300963d
2019-03-11 15:53:41 -07:00
066d15840f re-enable torch.split tests (#17859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17859

this has been fixed due to improvements in shape analysis

Reviewed By: driazati

Differential Revision: D14402781

fbshipit-source-id: 4ef2722ffedd9c8ac1eff55c244b421d7d3715ed
2019-03-11 15:22:55 -07:00
d391137acd Fix lint in test_dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17878

Reviewed By: eellison

Differential Revision: D14409933

fbshipit-source-id: 20ee8953a21e29b4557aff62b5e48dddd630eef6
2019-03-11 14:50:51 -07:00
fa29c179b7 Optimize fused_dropout_kernel launch bounds for AMD hardware
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17870

Differential Revision: D14409990

Pulled By: ezyang

fbshipit-source-id: 0452282f459770823641b2527f47b1186ab14666
2019-03-11 14:45:42 -07:00
3f1d0ee5d5 Deprecate torch.pstrf (#17866)
Summary:
Changelog:
- Add deprecation warning to torch.pstrf
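
A hedged sketch of the migration path (assuming torch.cholesky as the non-pivoted replacement):

```python
import torch

a = torch.randn(3, 3)
a = a @ a.t() + 3 * torch.eye(3)   # a positive-definite matrix
u = torch.cholesky(a, upper=True)  # use this instead of the deprecated torch.pstrf
```
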
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17866

Differential Revision: D14405527

Pulled By: soumith

fbshipit-source-id: 73f3b7d61c60eb57e4bffd08112e552ae3e6dfdc
2019-03-11 12:27:52 -07:00
11c89dde55 Allow structseq to be input of operators where tuple is expected (#17208)
Summary:
Currently the following code gives an error on python 2 because `ret` is a structseq which is not a tuple
```python
ret = a.max(dim=0)
ret1 = torch.max(a, dim=0, out=ret)
```

This PR modifies the tuple check in the Python arg parser to allow a structseq to be the input of operators where a tuple is expected, which makes the above code work.

Depend on: https://github.com/pytorch/pytorch/pull/17136
Partially fixes: https://github.com/pytorch/pytorch/issues/16813
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17208

Differential Revision: D14280198

Pulled By: VitalyFedyunin

fbshipit-source-id: beffebfd3951c4f5c7c8fe99a5847616a89491f3
2019-03-11 11:33:35 -07:00
b9e8f56daa Add PyTorch Governance, Contributor Guide, and List of Persons of Interest
Summary: Adding new documents to the PyTorch website to describe how PyTorch is governed, how to contribute to the project, and lists persons of interest.

Reviewed By: orionr

Differential Revision: D14394573

fbshipit-source-id: ad98b807850c51de0b741e3acbbc3c699e97b27f
2019-03-11 10:36:41 -07:00
abd39d5a88 Fix compilation error (#17860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17860

att

Reviewed By: bddppq

Differential Revision: D14402751

fbshipit-source-id: 2d53b230dfd775372addeab1d3eaf0b9552fae9f
2019-03-11 10:26:42 -07:00
b3c9090736 Revert D14392864: Fix lint in test_dataloader.py
Differential Revision: D14392864

Original commit changeset: 12477b9cfe29

fbshipit-source-id: 1864a80d5cfaceeae55d0145340a578f978ab4a7
2019-03-11 10:19:41 -07:00
817fd9ebf1 Removed dead code from THTensorMath.h (#17769)
Summary:
This PR removes dead code from THTensorMath.h
I found these unused methods while working on a PR where I plan to move the **fill** and **zero** methods from TH/THC to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17769

Differential Revision: D14372732

Pulled By: izdeby

fbshipit-source-id: 94fd3b52c691ebc89d2bdc8905452e7498038bf5
2019-03-11 10:14:44 -07:00
b57fe3cc66 Introducing array-like sequence methods __contains__ (#17733)
Summary:
Adds the array-like sequence method `__contains__` for tensors.
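A minimal sketch of the resulting behavior (the values here are illustrative, not from the PR):
```python
import torch

t = torch.tensor([1, 2, 3])
print(2 in t)   # True: membership is checked element-wise
print(5 in t)   # False
```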

Fixes: #17000
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17733

Differential Revision: D14401952

Pulled By: soumith

fbshipit-source-id: c841b128c5a1fceda1094323ed4ef1d0cf494909
2019-03-11 09:00:16 -07:00
906f9efc57 Revert "Add check for x64 Python before setup (#17707)" (#17864)
Summary:
This reverts commit 08fb9021da32e73bd7dec73104eea6a76dd44439.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17864

Differential Revision: D14404920

Pulled By: soumith

fbshipit-source-id: d41fc06e249f3437d4f80d1d6a5fdbd44c90462b
2019-03-11 08:52:13 -07:00
8045b3eb14 Registering of kl-divergence for independent distribution (#17681)
Summary:
This addresses issue https://github.com/pytorch/pytorch/issues/13545 and implements the proposed fix together with a single test.
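A minimal sketch of what the registration enables, assuming two Independent-wrapped Normal distributions:
```python
import torch
from torch.distributions import Normal, Independent, kl_divergence

p = Independent(Normal(torch.zeros(3), torch.ones(3)), 1)
q = Independent(Normal(torch.ones(3), torch.ones(3)), 1)
# Previously unregistered; now sums the base-distribution KLs over
# the reinterpreted batch dimensions.
print(kl_divergence(p, q))  # tensor(1.5000)
```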
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17681

Differential Revision: D14360161

Pulled By: ezyang

fbshipit-source-id: 427afc88e9054b5b0dc39ebbab1087b990695ea5
2019-03-11 08:10:16 -07:00
c02369151d Fix lint in test_dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17820

Reviewed By: eellison

Differential Revision: D14392864

fbshipit-source-id: 12477b9cfe290428d51cc28e024c8cbe8bb7bf51
2019-03-11 08:01:33 -07:00
1d827b7271 Further improvements of nn.container docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17731

Differential Revision: D14401894

Pulled By: soumith

fbshipit-source-id: cebb25859f78589cc4f4f8afb1e84c97f82b6962
2019-03-10 18:30:39 -07:00
b6313d74e1 fix faq typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17851

Differential Revision: D14401791

Pulled By: soumith

fbshipit-source-id: ed6d64d6f5985e7ce76dca1e9e376782736b90f9
2019-03-10 15:33:52 -07:00
6bcff88d3e Fix log_softmax and softmax if any dimension is 0-d (#17651)
Summary:
- Test added
- test_dim_function_empty: softmax and log_softmax on last dimension

fixes: #17262
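A minimal sketch of the fixed behavior, assuming a tensor with a zero-sized dimension:
```python
import torch
import torch.nn.functional as F

x = torch.empty(0, 3)                 # first dimension has size 0
print(F.log_softmax(x, dim=1).shape)  # torch.Size([0, 3]); no crash
print(F.softmax(x, dim=0).shape)      # torch.Size([0, 3])
```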
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17651

Differential Revision: D14349009

Pulled By: gchanan

fbshipit-source-id: b6f728f5c6be8ae7615749e3f0c201886632923e
2019-03-10 15:25:58 -07:00
75f88d4da6 Correct loss docstrings (#17300)
Summary:
In the loss doc descriptions, replace the deprecated 'reduce' and 'size_average' parameters with the 'reduction' parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17300

Differential Revision: D14195789

Pulled By: soumith

fbshipit-source-id: 625e650ec20f13b2d22153a4a535656cf9c8f0eb
2019-03-10 11:56:41 -07:00
98c54e9fa6 When openblas exists, "OpenBLAS_FOUND" is defined, rather than "OPENBLAS_FOUND". (#17841)
Summary:
See https://github.com/pytorch/pytorch/blob/master/cmake/Modules/FindOpenBLAS.cmake#L36

This typo led to CMake failing to detect OpenBLAS on Ubuntu.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17841

Differential Revision: D14400261

Pulled By: soumith

fbshipit-source-id: 287e019e122230cf6b70ab1ea94e5c514f429c88
2019-03-10 09:34:50 -07:00
a6c4ea66dd Passing indices as a list to Subset instead of Tensor (#17649)
Summary:
Indices in Subset were previously stored as tensors; they are now
passed as a list in random_split to ensure integer indexing.

fixes: #17466
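A minimal sketch of the change's effect (the dataset here is illustrative):
```python
import torch
from torch.utils.data import TensorDataset, random_split

ds = TensorDataset(torch.arange(10))
train, val = random_split(ds, [8, 2])
print(type(train.indices))  # a plain Python list, no longer a Tensor
print(train[0])             # integer indexing into the underlying dataset
```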
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649

Differential Revision: D14400250

Pulled By: soumith

fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
2019-03-10 09:23:53 -07:00
81e025d9ac Clarify JIT docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17846

Differential Revision: D14400363

Pulled By: jamesr66a

fbshipit-source-id: 862316b5fd95526b6edebeca19d2cc522779df11
2019-03-09 23:13:31 -08:00
24e7b824e0 Add metadata for torch jit TracedModules. (#17640)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17311

I've extended our model metadata framework in this diff to support
traced modules as well. Re-used a lot of components from the previous
implementation of ScriptModule metadata.

Tracing is a little different from Scripting since you can't just create a
subclass of TopLevelTraceModule (type returned by torch.jit.trace) and attach
metadata the way we did for ScriptModule. As a result, I've introduced a
separate API torch.fb.jit_trace which returns an instance of
TracedModuleWithMetadata which is a subclass of TopLevelTracedModule. As a
result, we can now attach metadata to this instance.

Reviewed By: dzhulgakov

Differential Revision: D14117966

fbshipit-source-id: 3eee5eef733cb8d6a219c02e2f41d08698eca326
2019-03-09 21:37:15 -08:00
320c6977c2 Fix PySlice_Unpack not available on PyPy 3.6 yet (#17836)
Summary:
This is one of the fixes needed to support compilation on PyPy 3.6, see https://github.com/pytorch/pytorch/issues/17835
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17836

Differential Revision: D14399404

Pulled By: soumith

fbshipit-source-id: ca650a6e2066aed86ddd3314a95d0cb3c515c633
2019-03-09 20:10:16 -08:00
742568e7eb PyPy compatibility: let unmodified slots be inherited in the standard way (#17837)
Summary:
This is needed to fix a segfault on PyPy 3.6, see https://bitbucket.org/pypy/pypy/issues/2968/segfault-calling-cpyext_tp_new_tuple and https://github.com/pytorch/pytorch/issues/17835
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17837

Differential Revision: D14399408

Pulled By: soumith

fbshipit-source-id: 75328a30018313d3223dd3e3eef9240a416c049b
2019-03-09 11:42:16 -08:00
17232fb842 Run fp16 resnet50 training in bench script (#17831)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17831

Differential Revision: D14398532

Pulled By: bddppq

fbshipit-source-id: 37c03cc2eebe3a6083e05631cb6ff03474e4a8a2
2019-03-08 21:53:12 -08:00
c10c73f047 Int8 FC performance debugging (#17700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17700

Add performance debugging utilities in DNNLOWP FC operator and the python script

Reviewed By: amylittleyang

Differential Revision: D14321299

fbshipit-source-id: 50dbd7b352a1da5d2ecb659d8003e71e70750063
2019-03-08 19:03:54 -08:00
0fd1dc45c0 Optimize LayerNormOp (#17604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17604

Optimize LayerNormOp

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14274175

fbshipit-source-id: a7aa263a1b0eb109682d2be99306e7b2cdcc0faf
2019-03-08 17:38:14 -08:00
65b00aa597 Remove some simple use cases of Type::ScalarType()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17529

Reviewed By: ezyang

Differential Revision: D14237932

fbshipit-source-id: be633a1fc19215d53cfe083fdd7196acf2b7dd2f
2019-03-08 16:42:05 -08:00
3aeb78079b Change Dispatch.h to use ScalarType over Type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17527

Reviewed By: zou3519

Differential Revision: D14235395

fbshipit-source-id: 3f53e33f6794f1f14c2edf79014b8ef8397822c5
2019-03-08 16:42:04 -08:00
cc07f968f8 Revert D14361993: [pytorch][PR] [Onnx] - refactoring serialization of ONNX initializers to be name-based
Differential Revision: D14361993

Original commit changeset: da93e945d557

fbshipit-source-id: 15eea001fbcd059ac13903405aeb9ea182c6ee8b
2019-03-08 16:31:14 -08:00
1d26a3ae7e Open registration for c10 thread pool (#17788)
Summary:
1. Move ATen threadpool & open registration mechanism to C10
2. Move the `global_work_queue` to use this open registration mechanism, to allow users to substitute in their own
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17788

Reviewed By: zdevito

Differential Revision: D14379707

Pulled By: jamesr66a

fbshipit-source-id: 949662d0024875abf09907d97db927f160c54d45
2019-03-08 15:38:41 -08:00
0955592243 Cast nn.Upsample.scale_factor to a float (#17732)
Summary:
Fixes #17106
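A minimal sketch of the intended behavior, assuming the module stores the cast value:
```python
import torch.nn as nn

up = nn.Upsample(scale_factor=2)  # an int is accepted...
print(up.scale_factor)            # ...but stored as the float 2.0
```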
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17732

Differential Revision: D14388192

Pulled By: driazati

fbshipit-source-id: d9c9e87a7c6db63c1de3ddebbb8dcf619f0dc34d
2019-03-08 15:29:35 -08:00
4bea15f580 Fix lint in run_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17815

Reviewed By: eellison

Differential Revision: D14390308

fbshipit-source-id: 22efd62a1bbd1fc8155a942d7160d5b7d3158e6b
2019-03-08 14:41:36 -08:00
dbb5d02a45 Fix lint in test/common_utils.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17814

Reviewed By: eellison

Differential Revision: D14390194

fbshipit-source-id: b4b3bbe20a15d0b9ed127b255e01c0d6d0832c1b
2019-03-08 14:22:57 -08:00
7aae51cded Replace tensor.type().scalarType() calls with tensor.scalar_type()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17515

Reviewed By: ezyang

Differential Revision: D14233250

fbshipit-source-id: 6c7af8d2291c0c2b148001b30cf03834f34366c0
2019-03-08 14:08:18 -08:00
efed875b3f Catch exceptions in bound_shape_inference (#17775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17775

Handles the user-provided input shape hint properly.

Reviewed By: zrphercule

Differential Revision: D14368735

fbshipit-source-id: 504cd96589e47aa432617e56362aa6b01a25ba9b
2019-03-08 13:18:28 -08:00
4a7c549e8f refactor caffe2 operator constructors - 11/9 (#17722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17722

clangr codemod

Reviewed By: ezyang

Differential Revision: D14350584

fbshipit-source-id: adef54cedc9409b4fb365f6644e2621a9e47b2ff
2019-03-08 12:38:54 -08:00
55cf9c742a Suppress C408 lint (don't use dict constructor) (#17813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17813

We have a lot of manually written out dict() constructors,
and (1) I don't think use of curly brace syntax is much
of an improvement and (2) it seems like a waste of time to
fix them all.

Reviewed By: eellison

Differential Revision: D14390136

fbshipit-source-id: 6199bef4dea75b6079bcb9d9e8acf20a2e1a86e1
2019-03-08 12:19:17 -08:00
11f50e73e3 Add matches_jit_signature to recent native functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17805

Differential Revision: D14388004

Pulled By: cpuhrsch

fbshipit-source-id: c50580b6fe1e9cfefed91aaa526376325d9f9c0d
2019-03-08 11:42:25 -08:00
fe90ee9dc8 Add /MD to prevent linking errors on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17799

Differential Revision: D14385777

Pulled By: ezyang

fbshipit-source-id: 8c1d9f80c48399087f5fae4474690e6d80d740e6
2019-03-08 10:46:25 -08:00
a60fadfb71 Change message on unknown db type to be friendly (#17795)
Summary:
CreateDB actually returns nullptr when db type is unknown and throws when the file is missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17795

Reviewed By: ezyang

Differential Revision: D14383226

Pulled By: dzhulgakov

fbshipit-source-id: 1dcf75a6b4ba8b64a24d4e5daf02db3189d56b7b
2019-03-08 10:46:24 -08:00
667763a63a Trace rnn max_batch_size (#17727)
Summary:
This causes the tracer to record the select / cast to int operation instead of just an int constant

Fixes #15319 but relies on a fix for #17583 first
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17727

Differential Revision: D14377886

Pulled By: driazati

fbshipit-source-id: 59453def54ba72756303f723993844dbeb5d2f8b
2019-03-08 10:36:36 -08:00
7f7d12854d Remove legacy way of exposing caffe2 operators to PyTorch (#17742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17742

This path isn't used anymore, and is incompatible with the changes stacked on top of this diff.
Removing it.
cc bwasti to check and confirm these can really be deleted

Reviewed By: ezyang

Differential Revision: D14362426

fbshipit-source-id: 32cdc19f28c2a981ae1e204901420998367ee588
2019-03-08 10:22:41 -08:00
b132f0f1e7 Remove 'Tensor' key from ATen codegen. (#17782)
Summary:
We used to have different ATen Tensor types, but we don't anymore.  This was just being maintained by a codegen'ed comment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17782

Reviewed By: ezyang

Differential Revision: D14378004

Pulled By: gchanan

fbshipit-source-id: 1bbf276393a391252d372cc385230c784bd78588
2019-03-08 09:46:38 -08:00
5ffd7dbbb4 Remove ProcessorSpecificPlugin. (#17789)
Summary:
It doesn't seem to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17789

Reviewed By: ezyang

Differential Revision: D14382423

Pulled By: gchanan

fbshipit-source-id: 0ac3236c48979a1b2bcd615e307e55f10fd8eb77
2019-03-08 09:46:37 -08:00
33c83a3f35 Remove THPPlugin. (#17790)
Summary:
It doesn't seem to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17790

Reviewed By: ezyang

Differential Revision: D14380897

Pulled By: gchanan

fbshipit-source-id: 3c3884a08c3b6c1489347d439509b19e079c5861
2019-03-08 09:42:01 -08:00
424a03186a Replace tens with hundreds.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17752

Differential Revision: D14366743

fbshipit-source-id: 39f6ac08180d780866e284024918d9abd197d239
2019-03-08 07:33:41 -08:00
6aacc1b2dd Support fallback for more operators in ideep (#17747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17747

RMACRegions, Normalize and RoIPooling

Reviewed By: dskhudia

Differential Revision: D14365096

fbshipit-source-id: dafcb7077515e03c2880832a442015b70fc7140d
2019-03-08 05:48:22 -08:00
256923523a Cleanup include files in jit/passes/common_subexpression_elimination.h.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17784

Differential Revision: D14381529

Pulled By: ZolotukhinM

fbshipit-source-id: e32e17ee644ef888a6d56a8ee3648e7ac21758bf
2019-03-08 01:11:20 -08:00
b290a16b2d Use return names in JIT operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17638

Differential Revision: D14295606

Pulled By: cpuhrsch

fbshipit-source-id: 62040ac65434411357808735f0fe6cd33cc1c30f
2019-03-07 23:34:42 -08:00
ac87488bd3 Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize (#17764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17764

Original commit changeset: f1923fdca4a1

Reverting the int8 ops fixes the original runtime regression.
We'll ignore the memory regression since it is flaky; see D14228484

Reviewed By: dzhulgakov

Differential Revision: D13885233

fbshipit-source-id: ccbe4b94acb44b7b4cb3ae4d73e3f6091e1e1195
2019-03-07 18:38:53 -08:00
cc7aec12fd Clean up some old ScalarType stuff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17755

Differential Revision: D14377135

Pulled By: li-roy

fbshipit-source-id: 35305760a1621340ba66c61a193ff61cfedfa7e8
2019-03-07 16:21:52 -08:00
549eb9e9bc add reference to flake8-mypy in contributing.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17759

Differential Revision: D14376813

Pulled By: eellison

fbshipit-source-id: cca1128e967ef7368633b94a3fa3c8e76a4a16f4
2019-03-07 15:28:59 -08:00
9d70e199f4 Move lerp to ATen, add functionality for tensor weights (#17348)
Summary:
Changelog:
- Remove TH/THC bindings
- Add tensor weights for `lerp`
- Modify derivatives appropriately
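
A minimal sketch of the new tensor-weight support:
```python
import torch

start = torch.zeros(3)
end = torch.ones(3)
weight = torch.tensor([0.0, 0.5, 1.0])  # per-element weights, newly supported
print(torch.lerp(start, end, weight))   # tensor([0.0000, 0.5000, 1.0000])
```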
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17348

Differential Revision: D14355845

Pulled By: soumith

fbshipit-source-id: eaede4c09ee589d77ba6cf52583510ea8e3a2fcf
2019-03-07 14:04:58 -08:00
6227afb305 Refactor dispatcher (#17753)
Summary:
This is a side PR for a bool tensor feature. The idea of this change came from a feedback received in this [PR](https://github.com/pytorch/pytorch/pull/17376).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17753

Differential Revision: D14367989

Pulled By: izdeby

fbshipit-source-id: 4fa380e56e20f18e480be68920170dbc3a4eb91c
2019-03-07 13:41:54 -08:00
aa57f17808 add layernorm to AD
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17702

Differential Revision: D14368472

Pulled By: wanchaol

fbshipit-source-id: 8db390e39444078258ad1d34ba74d6ddafa5d02b
2019-03-07 13:36:51 -08:00
5bf9e41938 move half<->float conversions to oss operators (#17548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17548

expose half float operators to OSS

common/math/Float16.h is the original implementation;
it is substituted by caffe2/c10/util/Half.h.

From the comments, it seems that both implementations don't handle denormals.

Reviewed By: jspark1105

Differential Revision: D14244200

fbshipit-source-id: f90ba28c5bf6a2b451b429cc4925b8cc376ac651
2019-03-07 13:00:13 -08:00
aa4c4c47fa Fix the update ONNX expect files (#17767)
Summary:
Fix the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17767

Reviewed By: zrphercule

Differential Revision: D14370483

Pulled By: houseroad

fbshipit-source-id: e7b0bbde0797c41f5a010fa206fab80fe2792eb7
2019-03-07 12:54:42 -08:00
7bcc2301ee Cleanup testFusion/testOne: there are unused arguments.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17737

Differential Revision: D14366584

Pulled By: ZolotukhinM

fbshipit-source-id: 3c2dd2aabfecca475909e4eec4a077d900795da9
2019-03-07 11:19:24 -08:00
4480aa31c2 Automatic update of fbcode/onnx to 96c58ceeacf0f2b73d752e413e4fd78787a12da3 (#17676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17676

Previous import was e18bb41d255a23daf368ffd62a2645db55db4c72

Included changes:
- **[96c58ce](https://github.com/onnx/onnx/commit/96c58ce)**: Fix shape inference when auto_pad is notset again (#1830) <Li-Wen Chang>
- **[873ddbb](https://github.com/onnx/onnx/commit/873ddbb)**: More extendable Runner (#1809) <Michał Karzyński>

Reviewed By: zrphercule

Differential Revision: D14321241

fbshipit-source-id: 12de9021afc61f5435f1b719cccf7b0f4ad73a84
2019-03-07 11:10:31 -08:00
1043ff6d68 Set the default ONNX opset to the latest stable opset (i.e., 9) (#17736)
Summary:
1) The changes in the new opset won't affect the internal pipeline.
2) The CI won't be affected by the ONNX changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17736

Reviewed By: zrphercule

Differential Revision: D14358710

Pulled By: houseroad

fbshipit-source-id: 4ef15d2246b50f6875ee215ce37ecf92d555ca6a
2019-03-07 10:56:06 -08:00
a2381fa346 Add module attributes (#17309)
Summary:
Similar to `nn.Parameter`s, this PR lets you store any `IValue` on a module as an attribute on a `ScriptModule` (only from the Python front-end currently). To mark something as an attribute, it should wrapped in `jit.Attribute(value, type)` (ex. `self.table = torch.jit.Attribute(table, Dict[str, torch.Tensor])`)
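
A hedged sketch built around the summary's example; the class and values are illustrative:
```python
import torch
from typing import Dict

class Table(torch.jit.ScriptModule):
    def __init__(self):
        super(Table, self).__init__()
        table = {"w": torch.ones(2)}
        # Mark the dict as a typed attribute on the ScriptModule
        self.table = torch.jit.Attribute(table, Dict[str, torch.Tensor])

m = Table()
print(m.table)  # the stored IValue, accessible like a regular attribute
```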

Followup Work:
* (de)serializing for use in C++
* change `self.training` to be a `bool` attribute instead of a buffer
* mutable attributes
* string frontend support
* documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17309

Differential Revision: D14354316

Pulled By: driazati

fbshipit-source-id: 67e08ab5229366b67fbc837e67b58831a4fb3318
2019-03-07 10:44:10 -08:00
e4c9d75008 - refactoring serialization of ONNX initializers to be name-based (#17420)
Summary:
Currently, serialization of model parameters in ONNX export depends on the order in which they are stored in a container (`list` on Python side and `std::vector` on C++ side). This has worked fine till now, but if we need to do any pass on that graph that mutates the parameter list, then strictly order-based serialization may not work.

This PR is the first in a set to bring in more passes (such as constant folding) related to ONNX export. This PR lays the groundwork by moving the serialization in ONNX export from order-based to name based approach, which is more amenable to some of the passes.

houseroad - As discussed this change uses a map for export, and removes the code from `export.cpp` that relies on the order to compute initializer names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17420

Differential Revision: D14361993

Pulled By: houseroad

fbshipit-source-id: da93e945d55755c126de06641f35df87d1648cc4
2019-03-07 10:25:00 -08:00
3f94fc4862 ONNX Export for Max and Average Pooling in CEIL_MODE
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16769

Differential Revision: D14362175

Pulled By: houseroad

fbshipit-source-id: 65cfb1dfba6a43d39cc85374add368fe8e4e5645
2019-03-07 10:10:21 -08:00
561037aef8 use flake8-mypy (#17721)
Summary:
Use flake8 installed with mypy checks so that our linter matches fbcode. Mypy type errors also provide valuable signal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17721

Differential Revision: D14357778

Pulled By: eellison

fbshipit-source-id: d8c9ea3fe3b5f550c3b70fe259e0eabf95e4c92d
2019-03-07 09:15:54 -08:00
1d522598fb use fp16<->fp32 intrinsic (#17496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17496

As title.

Reviewed By: hyuen

Differential Revision: D14222907

fbshipit-source-id: d5d6c032e725ca8b52aca2be7401ec3c59f6a242
2019-03-07 02:23:07 -08:00
f8778aef78 Implement a Caffe2 standalone LSTM operator (#17726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17725

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461

Implementing a standalone LSTM operator in Caffe2, adopted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest thing in this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different Tensor containers used in the code. Also, there was no way to easily use off-the-shelf C2 operators in my code, so I had to copy some code that does basic matmul, cat, split, transpose and linear as utility functions.

Two things missing:

- Profiling this implementation against the current ONNXified LSTM op
- Make this operator available to use in PyTorch

Reviewed By: dzhulgakov

Differential Revision: D14351575

fbshipit-source-id: 3b99b53212cf593c7a49e45580b5a07b90809e64
2019-03-07 01:08:49 -08:00
7d02a1fbc7 caffe2:libtorch_cuda depends on caffe2:caffe2_gpu (#17729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17729

When doing "import torch" in fbcode, previously the caffe2 cuda kernels weren't loaded because libcaffe2_gpu.so wasn't loaded.
Once you also did "from caffe2.python import workspace", then the cuda kernels were loaded because that triggered a runtime mechanism for loading libcaffe2_gpu.so.

We want the cuda kernels to always be available, so this diff adds a dependency from caffe2:libtorch_cuda to caffe2:caffe2_gpu.

Reviewed By: ezyang

Differential Revision: D14353498

fbshipit-source-id: 76a9fe69f231b308ab40eac393bb216c6fad3658
2019-03-06 23:53:16 -08:00
39423fbdd4 add tensor and cost inference functions (#17684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17684

Adding tensor and cost inference functions to more int8 operators.

Reviewed By: yinghai

Differential Revision: D14174746

fbshipit-source-id: dfad975fa75899565c8fb61f1b7747a9206ebd22
2019-03-06 23:34:02 -08:00
3dba1285ab ONNX Export Narrow op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17550

Differential Revision: D14350401

Pulled By: houseroad

fbshipit-source-id: 4d88079bb7a8bbd270b0272009826eb3b202cc33
2019-03-06 22:37:58 -08:00
3230404645 Keep the dim_type of hinted shape as BATCH if possible (#17734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17734

If the input is not BATCH, we will skip adjusting its batch size during onnxifi transformation. So when we take hints, we take them as CONSTANT but later need to change them to BATCH if possible.

Reviewed By: jackm321

Differential Revision: D14355983

fbshipit-source-id: 63eb54a44afb1565c71486fdd73db07ca0ac4fd4
2019-03-06 19:58:35 -08:00
jwu
8ec7357312 fix different round behavior on CPU and GPU #16498 (#17443)
Summary:
xxtemp, colesbury, bhushan23, zou3519: this converts GPU round behavior to half-to-even, consistent with the torch CPU version and NumPy. Your feedback is welcome.
See #16498
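A minimal sketch of the now-consistent behavior on both devices:
```python
import torch

x = torch.tensor([0.5, 1.5, 2.5, -0.5])
print(torch.round(x))         # tensor([0., 2., 2., -0.]): ties go to even
print(torch.round(x.cuda()))  # same result on GPU after this change
```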
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17443

Differential Revision: D14261786

Pulled By: VitalyFedyunin

fbshipit-source-id: 98156436b545d72769831a89e2775d43ad913ebc
2019-03-06 19:40:10 -08:00
68c5c66800 Warn about memory overlaps on expanded tensors (#17576)
Summary:
Eventually we should remove these when we're certain that all our ops
handle memory overlaps correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17576

Differential Revision: D14349990

Pulled By: zou3519

fbshipit-source-id: c3a09f6113b9b1bf93e7f13c0b426c45b2cdf21f
2019-03-06 17:44:04 -08:00
93768785ec fix exp fam. formula
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17719

Differential Revision: D14349029

Pulled By: soumith

fbshipit-source-id: cf016756a9319436f7379e8377f8bd1e1b672b40
2019-03-06 15:47:13 -08:00
f6fda4409b refactor caffe2 operator constructors - 10/9 (#17659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17659

clangr codemod

Reviewed By: ezyang

Differential Revision: D14304675

fbshipit-source-id: 45fbd84c50651a70ae29bf46df3322715e99d225
2019-03-06 15:11:47 -08:00
4db3f8f806 Improve ONNX symbolic for logsoftmax and softmax (#17672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17672

support dtype in the onnx symbolic

Reviewed By: zrphercule

Differential Revision: D14313987

fbshipit-source-id: e9364621b3f795191d880599711dfbcb220d0e31
2019-03-06 15:02:08 -08:00
c78da0c6ed Enable using CMD when building cpp extensions on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17706

Differential Revision: D14346482

Pulled By: ezyang

fbshipit-source-id: 7c85e51c701f6c0947ad324ef19fafda40ae1cb9
2019-03-06 14:45:31 -08:00
a87d475c2f Do not rename net boundary inputs/outputs during ssaRewrite. (#17545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17545

This diff avoids renaming boundary inputs of net during onnxifi transform.
It also removes adding mappings for the initializer during onnxifi op creation.
Thus it gets rid of the mapped workspace creation during onnxifi op creation.

Reviewed By: zrphercule

Differential Revision: D14243161

fbshipit-source-id: 6eafa920c45f6a6bfacbbb443e8e84cf9778644c
2019-03-06 14:26:58 -08:00
9024faaafe Reapply D14078519 (#17596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17596

This was reverted before; this is the fixed version.

Reviewed By: ezyang

Differential Revision: D14270288

fbshipit-source-id: c72490b5d02cc6098cb60145fa9a842b3c9a24c5
2019-03-06 13:51:00 -08:00
bd7fcced69 Batch of expect file removals (#17581)
Summary:
Another batch of removing expect files.

One note - I removed the Batched expect files without adding equivalent tests since they are already being tested in other ways, and we are no longer actively maintaining that project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17581

Differential Revision: D14343578

Pulled By: eellison

fbshipit-source-id: ce0b1fd2b5b4ec80ad9003bab1b58f41645d3da6
2019-03-06 13:44:26 -08:00
39669316a6 (#14267)
Summary:
- Summary:

Added synchronized batch normalization, which allows synchronization of stats across mini-batches between processes within a process group.
Current implementation uses a mixture of extended ATen native functions (cpp cuda extension) + torch.nn.modules (c10d python API)

- User-facing api:

1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None)

2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, ***process_group=None***)

- supported use case:
DistributedDataParallel with ***single-gpu multi-process***

a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below:

  1. use layers directly:

     torch.nn.SyncBatchNorm(...)

     similar API as with torch.nn.BatchNormXd(...)
     with added argument `process_group` which is used to limit the scope of
     synchronization within each process group. Default value is None, which
     implies synchronization across all GPUs

  2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group)

     recursively converts all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm`,
     preserving the values of parameters/buffers.
     The utility function also allows the user to specify a process_group value for all
     converted layers.

b. user wraps their model with
   `torch.nn.parallel.DistributedDataParallel`; from this point, the user
   should follow the general DDP usage guidelines

- Error checking

For use cases not supported, we error out:

1. Application launched without ddp:
   > import torch
   > sbn = torch.nn.SyncBatchNorm(10).cuda()
   > inp = torch.randn(5, 10, 3, 3).cuda()
   > sbn(inp) --> Error!
   > AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel

2. Application launched using DDP with multi-GPU per-process:
   > ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank)
   > ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
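
A minimal end-to-end sketch of the supported use case, assuming a process group has already been initialized and using the converter name given above:
```python
import torch
import torch.nn as nn

local_rank = 0  # single GPU per process, as required above

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
# Convert every BatchNormXd into a SyncBatchNorm (utility named in the summary)
model = torch.nn.utils.convert_sync_batchnorm(model)
model = nn.parallel.DistributedDataParallel(
    model.cuda(), device_ids=[local_rank], output_device=local_rank)
```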
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267

Differential Revision: D14270035

Pulled By: ezyang

fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d
2019-03-06 13:39:11 -08:00
0ed1b9fb98 Update ModuleDict doc about order
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17717

Differential Revision: D14346557

Pulled By: ezyang

fbshipit-source-id: 2484c7d8105f9aa8bce5567d1fa2d4f587cc9cc2
2019-03-06 13:09:46 -08:00
e2de88dc5a Update CODEOWNERS (#17720)
Summary:
teng-li is passing the baton to mrshenli. Thanks for all your work on distributed teng-li!! 🎉
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17720

Differential Revision: D14350120

Pulled By: pietern

fbshipit-source-id: edfe784520c54630203cc8fbb296455d3dbf341b
2019-03-06 12:33:48 -08:00
073634612f ONNX Export Argmin and Argmax ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17382

Differential Revision: D14338811

Pulled By: houseroad

fbshipit-source-id: be07548d8063d1aa94f1801c18137738365b85fb
2019-03-06 12:11:47 -08:00
97eb139a94 Turn atol to 1e-5 when comparing the end to end results (#17708)
Summary:
results smaller than 1e-5 don't make sense.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17708

Differential Revision: D14348893

Pulled By: houseroad

fbshipit-source-id: 5e07c38e5b58b27b61fae63bfc3c21e2fe5629fe
2019-03-06 12:06:45 -08:00
7fa996f8e2 remove loop expects (#17695)
Summary:
Replace loop unrolling expect files with assertions on the output IR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17695

Differential Revision: D14347105

Pulled By: eellison

fbshipit-source-id: 1703b4ca32bc1c67c01fc4330b0e6eb66feaa103
2019-03-06 11:48:46 -08:00
b87abdfc12 typo fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17653

Differential Revision: D14302003

Pulled By: ezyang

fbshipit-source-id: 8ad90985a392b07127c7e315d4e74ce77962b573
2019-03-06 11:36:44 -08:00
e3516d0a95 omit group conv NHWC test for GPU (#17715)
Summary:
Observed the test `TestGroupConvolution.test_group_convolution` to fail with the following error:

```
Falsifying example: test_group_convolution(self=<caffe2.python.operator_test.group_conv_test.TestGroupConvolution testMethod=test_group_convolution>, stride=3, pad=0, kernel=5, size=8, group=4, input_channels_per_group=7, output_channels_per_group=8, batch_size=2, order='NHWC', engine='', use_bias=False, gc=, dc=[, device_type: 1])

You can reproduce this example by temporarily adding reproduce_failure('3.59.1', b'AAAA') as a decorator on your test case
```
This example generated by hypothesis has `group=2, order='NHWC' and dc=[, device_type: 1])`.
I think this example should be skipped.

I have mimicked the change corresponding to [PR#13554](https://github.com/pytorch/pytorch/pull/13554) to skip this example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17715

Differential Revision: D14346642

Pulled By: ezyang

fbshipit-source-id: b1f1fef09f625fdb43d31c7213854e61a96381ba
2019-03-06 11:32:35 -08:00
10ea02facf fix tuple matching (#17687)
Summary:
Check for tuple matching in isSubvalueOf, since tuples may contain container types that need to be recursed into within isSubvalueOf.

Fix for https://github.com/pytorch/pytorch/issues/17650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17687

Differential Revision: D14324642

Pulled By: eellison

fbshipit-source-id: 7f1e019875286b2640a3b9c003d1635dda8cf543
2019-03-06 11:25:36 -08:00
c658d9b21b Temporarily disable Upsample operator tests in pytorch-onnx tests (#17696)
Summary:
In discussion with houseroad, because Upsample op is being updated in ONNX https://github.com/onnx/onnx/pull/1773 and these tests are blocking it. These tests will be updated once the ONNX PR goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17696

Differential Revision: D14338845

Pulled By: houseroad

fbshipit-source-id: cfaf8cf1ab578ae69dd3bf21b1c0681b572b9b6f
2019-03-06 11:25:34 -08:00
08fb9021da Add check for x64 Python before setup (#17707)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17657.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17707

Differential Revision: D14346705

Pulled By: ezyang

fbshipit-source-id: 5daafacdb99eb9a9c6517263d10f20c79f920d24
2019-03-06 10:48:16 -08:00
1e6acc676f Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623

Despite its generic-sounding name, caffe2::DeviceGuard actually
only worked on CUDA devices.  Rename it to something that more
clearly spells out its applicability.

I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more
in-line with how the Caffe2 codebase is currently written.  More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.

This is a respin of D13156470 (#14284)

Reviewed By: dzhulgakov

Differential Revision: D14285504

fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d
2019-03-06 10:48:15 -08:00
e9eb18a18c Remove nomscheduler (#17693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17693

Remove nomscheduler tool

Reviewed By: yinghai

Differential Revision: D14328168

fbshipit-source-id: 674d0e18596a4dc2bbb6b8d321f4066c4fc454ab
2019-03-06 10:48:13 -08:00
886e482776 index operation support for torch.HalfTensor (#17645)
Summary:
- Test cases added
1. indexing for half tensor
2. setting for half tensor

fixes #17161
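A minimal sketch of the newly working operations:
```python
import torch

t = torch.arange(6, dtype=torch.half)
idx = torch.tensor([0, 3, 5])
print(t[idx])  # advanced indexing on a HalfTensor now works on CPU
t[idx] = 1.0   # so does index assignment
print(t)
```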
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17645

Differential Revision: D14302069

Pulled By: ezyang

fbshipit-source-id: 100f141c07046f200c904e27c5882a9417bccda0
2019-03-06 10:32:35 -08:00
507c93bad2 Revert D14160172: Implement a Caffe2 standalone LSTM operator
Differential Revision: D14160172

Original commit changeset: c33e3f9e8aea

fbshipit-source-id: cffe35d93f0ac75ca93aa98a3b82af3d372f2fc1
2019-03-06 08:44:25 -08:00
39f94619ec fix typo in hub doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17705

Differential Revision: D14338380

Pulled By: ailzhang

fbshipit-source-id: d53eece30bede88a642e718ee6f829ba29c7d1c4
2019-03-05 23:19:30 -08:00
fefaebabba fix dropout AD & rename range to rangelist (#17691)
Summary:
fixes #17669
Addresses apaszke's comments in #17523
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17691

Differential Revision: D14328083

Pulled By: ailzhang

fbshipit-source-id: 9ec4a54f13bfd1aaf4b1821dd00c31793ac07a44
2019-03-05 20:50:10 -08:00
36e0d39f50 enable use of MIOpen for depthwise convolutions (#17685)
Summary:
* added miopen conv mode to be used for setConvDescriptor
* added miopen depthwise convolutions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17685

Differential Revision: D14327811

Pulled By: bddppq

fbshipit-source-id: d5bdc1abafd5f39694fadf3f9275b9d880c5b115
2019-03-05 18:44:14 -08:00
bfe7a58f69 Implement a Caffe2 standalone LSTM operator (#17461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461

Implementing a standalone LSTM operator in Caffe2, adopted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest thing in this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different Tensor containers used in the code. Also, there was no way to easily use off-the-shelf C2 operators in my code, so I had to copy some code that does basic matmul, cat, split, transpose and linear as utility functions.

Two things missing:

- Profiling this implementation against the current ONNXified LSTM op
- Make this operator available to use in PyTorch

Reviewed By: dzhulgakov

Differential Revision: D14160172

fbshipit-source-id: c33e3f9e8aeae578b64d97593cb031a251216029
2019-03-05 17:34:44 -08:00
a478d41620 Fix nll_loss crash on cpu where ignore_index is out of bounds (#17328)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15508
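A minimal sketch of the previously crashing case, where the ignored target lies outside the class range:
```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, -100, 4])  # -100 is out of [0, 5) but ignored
loss = F.nll_loss(log_probs, target, ignore_index=-100)
print(loss)  # computed over the non-ignored rows; no crash
```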
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17328

Differential Revision: D14322629

Pulled By: soumith

fbshipit-source-id: 7d02f372be78794782c18affcfc109ce30b1e91c
2019-03-05 14:35:05 -08:00
288e1fbd18 Add '--hip-clang-launch' to favor <<<>>>-based launch. (#17686)
Summary:
hip-clang uses triple chevron kernel dispatch syntax. Add an option to the hipification script to skip translating triple chevron to hipLaunchKernelGGL.

Once we switch to hip-clang, this option will become the default and subsequently be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17686

Differential Revision: D14327810

Pulled By: bddppq

fbshipit-source-id: 5e1512325077dd3ebb8fb9b5bf35fd1f8d9a4dc3
2019-03-05 12:52:22 -08:00
079093a662 Improve caching allocator for Pascal and newer GPUs. (#17120)
Summary:
```
NVIDIA changed the CUDA allocation behavior on Pascal GPUs. The
page size increased from 1MB to 2MB and allocations larger than 1MB
are now always page-aligned. Previously, allocations larger than 1MB
were aligned to 128KB boundaries.

This interacted poorly with the caching allocator. The remaining
memory in a page could only be filled by small cudaMalloc calls, but
the caching allocator never cudaMalloc's a chunk smaller than 1MB.
This behavior could also cause a large discrepancy between the memory
usage reported by nvidia-smi and the memory usage reported by
PyTorch, because nvidia-smi counts a partially used page as "full",
while PyTorch only counts the actual memory requested.

This PR makes a few changes to the caching allocator to better support
Pascal and Volta GPUs:

 - All cudaMalloc calls are now multiples of 2MB (the page size)
 - Requests between 1-10MB allocate (and split) a 20MB block to
   reduce wasted space due to rounding
 - Small requests are now packed into 2MB blocks (instead of 1MB)

This improves Mask R-CNN memory usage by 10-20% in internal tests on
Volta GPUs. Maxwell performance seems to be largely unchanged, but
it's possible that some use cases suffer slightly.
```
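A toy sketch of the rounding rules described above (illustrative only, not the allocator's actual code):
```python
MB = 1024 * 1024

def alloc_size(request):
    if request <= MB:
        return 2 * MB          # small requests are packed into 2MB blocks
    if request <= 10 * MB:
        return 20 * MB         # 1-10MB requests allocate (and split) 20MB
    return -(-request // (2 * MB)) * 2 * MB  # round up to 2MB pages

print(alloc_size(3 * MB) // MB)   # 20
print(alloc_size(25 * MB) // MB)  # 26
```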
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17120

Differential Revision: D14301536

Pulled By: colesbury

fbshipit-source-id: a8282315ea8f7b8ca149b5066fdeaecd0d404edf
2019-03-05 09:44:27 -08:00
8420a2025b Turn the Half::from_bits into a constexpr function to avoid unresolved symbol errors when building in DEBUG mode (#17661)
Summary:
Turn the Half::from_bits into a constexpr function to avoid unresolved symbol errors when building in DEBUG mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17661

Differential Revision: D14319610

Pulled By: soumith

fbshipit-source-id: 6c508a37155e29260f403d7174f343aa1ff32385
2019-03-05 07:31:38 -08:00
7fc3aa8c49 Remove Expect Files from python / tracing / script interop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17622

Differential Revision: D14308307

Pulled By: eellison

fbshipit-source-id: bda249d38ac2570000a12b0ca328c26233ecefe8
2019-03-04 23:04:54 -08:00
b219882c0b Enable apex on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17675

Differential Revision: D14320473

Pulled By: soumith

fbshipit-source-id: cb696984f5196f9b8b50722b4fe927bb6407c322
2019-03-04 21:53:47 -08:00
f176450d60 bump docker build to upgrade magma to 2.5.0 (#17674)
Summary:
upgrades magma in docker build.

vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17674

Differential Revision: D14320187

Pulled By: soumith

fbshipit-source-id: 7887f65fb703b802fc6231408b55ad9c4039882b
2019-03-04 20:31:16 -08:00
54c4b5a4db refactor caffe2 operator constructors - 1/9 (#17082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17082

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078498

fbshipit-source-id: f7f65d6d81c7942293f53fdaa61f756d8b7360c1
2019-03-04 16:04:01 -08:00
910519e45b Expose cuda kernel for caffe2::GenerateProposals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17066

Reviewed By: ezyang, wat3rBro

Differential Revision: D14071130

fbshipit-source-id: 6fe26503f6069c36ec31d6c09b549b932d5db242
2019-03-04 14:59:08 -08:00
aea8dd8377 print warnings when DNNLOWP_16 or DNNLOWP_ROWWISE_16 engine is used (#17176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17176

As title

Reviewed By: csummersea

Differential Revision: D14111616

fbshipit-source-id: 1282cb2452c4ad385fd2dc6d3f8c19e9fec715ff
2019-03-04 14:28:42 -08:00
8569d9cbea Fix XOutput/XOutputTensor for ivalue based c2 operators (#17599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17599

XOutput/XOutputTensor was broken for ivalue based operators. This diff fixes that.

Reviewed By: ezyang

Differential Revision: D14274003

fbshipit-source-id: b99f020244c66c4e2551dbd32ae0f665cc91b338
2019-03-04 14:20:13 -08:00
c7db0b35d8 Fix InputSize/OutputSize for ivalue based operators (#17579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17579

These methods previously just returned 0 when it was not a legacy operator,
making it impossible to convert some operators.

Reviewed By: dzhulgakov

Differential Revision: D14253094

fbshipit-source-id: 72bfdcf6da291a4ab80d1e0ceb20984b86edc408
2019-03-04 14:20:12 -08:00
173561ff12 Fix clamp fusion on missing limits (#17533)
Summary:
Fixes #17449

Context: before #17186, we didn't fuse `clamp` when the `min`/`max` inputs were missing, because they were `prim::None` nodes. After #17186, None became a `prim::Constant` node, which enables fusion for `clamp`. But codegen.cpp did not handle the case where a `prim::Constant` is not a Double/Int/Bool. This PR makes missing inputs handled correctly, in the following way:

1. emit nothing when you see `type? = prim::Constant()`
2. when emitRHS, do special casing for aten::clamp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17533

Differential Revision: D14238450

Pulled By: wanchaol

fbshipit-source-id: 61a272154754b13e89021bb86002927f02cde19c
2019-03-04 13:18:10 -08:00
Jie
a87eeec9bf int32 indexing for Tensor Iterator Reduction (#17428)
Summary:
1. Enabling int32 indexing for cases where TI cannot accumulate in output due to
incompatible data types (e.g. Welford).
2. Updating Welford kernel to use int32 instead of int64 indexing on GPU.

This change improves performance for torch.var / torch.std

Implementation:
1. Allocated extra buffer to handle accumulation between sub Tensor Iterators.
2. Removed int64 indexing in gpu_reduce_kernel
3. WelfordOps now supports the index type / combination type as a template parameter.
While the GPU uses int32_t and float, the CPU implementation uses int64_t and double.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17428

Differential Revision: D14264608

Pulled By: umanwizard

fbshipit-source-id: 3eb54451de925b469dbc1127e5ea7443c4431036
2019-03-04 13:11:47 -08:00
3257608276 Removed all usages of TH_Index_Base (#17591)
Summary:
TH_Index_Base is hard coded to 0 and can be removed from the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17591

Differential Revision: D14269273

Pulled By: izdeby

fbshipit-source-id: d844e261f4af7297bad8a81e7d6dcf0a391b94e6
2019-03-04 12:51:42 -08:00
dec116e96f PyTorch/Caffe2 tensor interop in Python (#17190)
Summary:
Because of two separate python extensions with different pybind
instances I have to go through void* conversion. Since it's hidden from
user, it's fine.

New APIs added on C2 side:
- workspace.FetchTorch('blob')
- workspace.Workspace.current.blobs['blob'].to_torch()
- workspace.FeedBlob('blob', pytorch_tensor)

Works on CPU and GPU.

The only glitches are with resizing because of variable/tensor split.
But data sharing works properly.
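
A minimal sketch of the new APIs listed above:
```python
import numpy as np
import torch
from caffe2.python import workspace

workspace.FeedBlob("x", np.ones((2, 2), dtype=np.float32))
t = workspace.FetchTorch("x")                # fetch a Caffe2 blob as a torch.Tensor
workspace.FeedBlob("y", torch.zeros(2, 2))   # feed a torch tensor directly
print(t)
```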
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17190

Reviewed By: ezyang

Differential Revision: D14163882

Pulled By: dzhulgakov

fbshipit-source-id: d18e5b8fcae026f393c842a1149e972515732de2
2019-03-04 11:34:01 -08:00
244d330980 Fixed typo in aten/src/ATen/native_parse.py (#17641)
Summary:
Hi, there.
There is a typo in aten/src/ATen/native_parse.py, and I fix it.
`std::aray` -> `std::array`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17641

Differential Revision: D14301981

Pulled By: ezyang

fbshipit-source-id: a37859cdedcbf6c29333b954486dfa086d6c2176
2019-03-04 10:10:52 -08:00
5b835682e3 Remove GPU dependency from ProfileObserver (#17592)
Summary:
Remove GPU dependency and register ProfileObserver.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17592

Reviewed By: ezyang

Differential Revision: D14265801

Pulled By: mdschatz

fbshipit-source-id: f98c0c32653c64a8b087c58ece4f864dfbe1d4b8
2019-03-04 10:00:46 -08:00
6a297b8675 Don't make factory methods create a tensor and then immediately copy it (#17565)
Summary:
Create a `make_variable` override that moves out of a tensor instead of going through `shallow_copy_and_detach`. Call this override from factory methods like `empty` that create a brand new tensor, do nothing with it, and then copy it into a variable.

Will update this with actual numbers, but it seems to get rid of around 20-40% of the overhead of calling `torch.empty(0)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17565

Differential Revision: D14266130

Pulled By: umanwizard

fbshipit-source-id: f57d5f2ca3f80ee8ee96d50f905e852fd10db941
2019-03-03 22:16:21 -08:00
7a51c03a30 Fixed typo in torch/functional.py w/r/t broadcast_tensors (#17642)
Summary:
In reference to #17574
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17642

Differential Revision: D14297177

Pulled By: ezyang

fbshipit-source-id: 968176ea3b46a0153da0fd9e6b40db314d29e51c
2019-03-03 10:08:41 -08:00
01977c0a89 Change fake tqdm constructor to match real tqdm (#17636)
Summary:
Currently, the fake tqdm implementation requires an input (whereas real tqdm does not).

This caused a problem in torchvision (https://github.com/pytorch/vision/pull/770), and seems likely to cause minor irritations elsewhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17636

Differential Revision: D14296530

Pulled By: ezyang

fbshipit-source-id: bc077d898773c93dab34c985a7b30525a43e558a
2019-03-03 01:06:10 -08:00
ef7ddcd29e Mark native_functions as matched if uncaptured by JIT (#17631)
Summary:
Various functions aren't used by the JIT, so they're jit-compliant w.r.t. their schema by default.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17631

Differential Revision: D14295559

Pulled By: cpuhrsch

fbshipit-source-id: a2ecdcb5df47eb67c54ec642d88d42e985515142
2019-03-02 18:20:04 -08:00
80927fc068 Ban std::array from native_functions.yaml
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17629

Differential Revision: D14292941

Pulled By: cpuhrsch

fbshipit-source-id: 3c3eed57a5505a4e1da3aea682092677ab0e73e3
2019-03-01 19:21:49 -08:00
416474a720 Remove more usages of BoolTensor and IndexTensor from native_functions.yaml
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16468

Differential Revision: D14095405

Pulled By: cpuhrsch

fbshipit-source-id: ea4d6bb7a4e81c05fe9861190ddbf52201612bbf
2019-03-01 19:14:41 -08:00
c6715eda06 Implement kthvalue in ATen (#17544)
Summary:
The CPU version is based on the TH version.
The GPU version is based on #8406 by Pararth Shah (thank you).

CPU quickselect based on that in TH's THTensorMoreMath.cpp, but with C++ (quickselectnoindex will be achieved by a different swap)
CPU kthvalue is based on the THTensor function in the same file.
The dim_apply function is a C++ replacement for TH_TENSOR_DIM_APPLYx macros.
The CUDA kernel uses functions adapted from the THCTensorSortK implementation.
In particular radixSelect is from THCTensorTopK.cuh.
The CUDA launcher code replaces a bunch of macros with C++. It will be re-used in one of the following patches.

Plan for further PRs:
- This
- Sort
- TopK + Mode + Median in any order
- Rip out THC stuff.

There may be utility functions / structs in the SortingCommon.cuh that come into
relevance only with sort.
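
The user-facing op is unchanged; a minimal sketch:
```python
import torch

x = torch.tensor([3.0, 1.0, 2.0])
value, index = torch.kthvalue(x, 2)  # k-th smallest element (k=2)
print(value, index)                  # tensor(2.) tensor(2)
```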
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17544

Differential Revision: D14286934

Pulled By: ezyang

fbshipit-source-id: 35dbea050b097e88777ac5fa5c0f499d5e23c738
2019-03-01 19:00:10 -08:00
43f94077d8 Change vml.h to support sizes greater than 2**32 - 1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17280

Differential Revision: D14154997

Pulled By: cpuhrsch

fbshipit-source-id: c19b15d18da59c9ee87e82765d3244d2a4ef6729
2019-03-01 17:22:26 -08:00
2336f0ba06 msvc_fixes (#17201)
Summary:
Fixing MSVC errors

```
  D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(144): error C4002: too many actual paramet
ers for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxp
roj]
  D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(259): error C4002: too many actual paramet
ers for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxp
roj]
  D:/pytorch-scripts/caffe2_builders/v141/pytorch/aten/src/THCUNN/SpatialDilatedMaxPooling.cu(51): error C4002: too man
y actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2
\caffe2_gpu.vcxproj]
```

on variadic C10_LAUNCH_BOUNDS as well as Debug linking issues with at::Half in pool_op_cudnn.cc like this one

```
pool_op_cudnn.obj : error LNK2019: unresolved external symbol "public: bool __cdecl caffe2::MaxPoolFunctor<class caff
e2::CUDAContext>::GlobalPoolingBackward<struct c10::Half,2>(int,int,int,struct c10::Half const *,struct c10::Half const
 ,struct c10::Half const ,struct c10::Half ,class caffe2::CUDAContext )const " (??$GlobalPoolingBackward@UHalf@c10@
@$01@?$MaxPoolFunctor@VCUDAContext@caffe2@@caffe2@QEBA_NHHHPEBUHalf@c10@00PEAU23@PEAVCUDAContext@1@Z) referenced in
 function "public: bool __cdecl caffe2::`anonymous namespace'::CuDNNMaxPoolFunctor::GlobalPoolingBackward<struct c10::H
alf,2>(int,int,int,struct c10::Half const ,struct c10::Half const ,struct c10::Half const ,struct c10::Half ,class
caffe2::CUDAContext *)const " (??$GlobalPoolingBackward@UHalf@c10@@$01@CuDNNMaxPoolFunctor@?A0xb936404a@caffe2@QEBA_NH
HHPEBUHalf@c10@00PEAU34@PEAVCUDAContext@2@Z) [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caff
e2_gpu.vcxproj]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17201

Differential Revision: D14165732

Pulled By: ezyang

fbshipit-source-id: 875fd9a5b2db6f83fc483f6d750d2c011260eb8b
2019-03-01 15:17:41 -08:00
06c8aa7a3b Hipify fixes for Masquerade logic (#17598)
Summary:
ezyang Please review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17598

Differential Revision: D14287724

Pulled By: ezyang

fbshipit-source-id: 46e5083854a827370bb4c81b82e5a4ede511e473
2019-03-01 15:13:19 -08:00
ab95b5c6cc Rename prim::Undefined to prim::AutogradZero (#17611)
Summary:
supersedes #17245
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17611

Differential Revision: D14283581

Pulled By: wanchaol

fbshipit-source-id: 8022d02b8a021ea2fee9a18a2c8920eb123200c5
2019-03-01 15:13:18 -08:00
5b6703629c Add python test for extension backend tensor.device (#17602)
Summary:
Adding a test for #17361
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17602

Differential Revision: D14287373

Pulled By: li-roy

fbshipit-source-id: 544ecf17eb310aed22ba0ea5f86f46b8e3bb69b5
2019-03-01 14:22:49 -08:00
2ed99fee0d Revert D13935403: Call c10 cuda op from test_torch
Differential Revision: D13935403

Original commit changeset: b2915ec8a366

fbshipit-source-id: 0f3409d5c102d719bc1f0483695aee93e7d613c9
2019-03-01 14:18:26 -08:00
c2c32340a4 add command line option to use hive filler; add README (#17619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17619

--filler hive --iter -1 will let the debugger exhaust all batches from a hive partition before exiting.
Add a README that summarizes command line options and usage.

Reviewed By: yinghai

Differential Revision: D14220166

fbshipit-source-id: daa23b7e8a9184481c6d7b67acf1599e5c99d74a
2019-03-01 13:56:15 -08:00
5360984fbd Remove TH(CU)NN Sparse Linear (#17610)
Summary:
Sparse Linear in TH(CU)NN implements sparse linear layers without
using sparse matrices.
It is currently not documented in PyTorch and there is no functional or
module interface. This means it is unused from a PyTorch point of view.

The reason for removing it is twofold:
- The module uses sort, which I would like to move to ATen.
- When we implement a SparseLinear layer, we would want to do it
  using sparse tensors, so it's not all that useful, anyway.

I checked this on Slack with soumith; I hope the above is an accurate
representation. All bad ideas are my own.

This is part of the ongoing work to move
sort/topk/mode/median/kthvalue to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17610

Differential Revision: D14280663

Pulled By: gchanan

fbshipit-source-id: 289231d2c20626855ce2ceecd4f204b460c32378
2019-03-01 12:36:52 -08:00
19a6de328f Correct docstring of vision/init functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17351

Differential Revision: D14276355

Pulled By: soumith

fbshipit-source-id: 9b572b6a04eeb1e44cd93961edac76ed10f7b24e
2019-03-01 11:40:23 -08:00
0a7b2af13b Call c10 cuda op from test_torch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16692

Reviewed By: ezyang

Differential Revision: D13935403

fbshipit-source-id: b2915ec8a3664bb6e918ed357908cc33d8f9449a
2019-03-01 10:59:19 -08:00
698f947463 Revert #17191 and #17215 that no longer apply on Windows (#17567)
Summary:
These were previously merged to resolve #17051. However, since that issue was resolved upstream, and the changes were causing problems like https://github.com/abjer/tsds/issues/8, I think it's time to revert them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17567

Differential Revision: D14265241

Pulled By: kostmo

fbshipit-source-id: 7fa2b7dd4ebc5148681acb439cf82d983898694e
2019-03-01 10:37:27 -08:00
e6a9062335 usertype -> class (#17528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17528

as title. register_prim_ops is messy because someone ruined clang-format, but I figured it's okay to include here since this is such a mechanical change

Reviewed By: driazati

Differential Revision: D14236943

fbshipit-source-id: c2b22845837b7f830015510e48ec2ee5202fa407
2019-03-01 10:08:23 -08:00
830ca665f5 alias analysis refactor take 2 (#17594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17594

The original version of this broke things because a concurrent change raced with it in CI.

Reviewed By: ezyang

Differential Revision: D14266663

fbshipit-source-id: e8ac5dfcb7349b4f2c425d9f0eabbfc964314063
2019-03-01 10:08:22 -08:00
7fddd01c51 Fix the missing Windows CPU job in the build status section (#17608)
Summary:
It would be better to split the CPU job on CI, but unluckily we are out of Windows machines.
cc davidbrownellWork yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17608

Differential Revision: D14281393

Pulled By: soumith

fbshipit-source-id: ae9a6140b7207ce56cfb2da3d812bc3fe060764a
2019-03-01 10:03:25 -08:00
81f2bdf9c2 Update magma to 2.5.0 for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17607

Differential Revision: D14281291

Pulled By: yf225

fbshipit-source-id: 51209c5540932871e45e54ba6d61b3b7d264aa8c
2019-03-01 09:53:56 -08:00
a6170573c8 Adding support for 0-d tensor for transpose (.t()) (#17535)
Summary:
- Test updates
1. test_torch: added 0-d test case and t_() test cases
2. test_jit: updated error message for TestAsync.test_async_script_error

- Updating documentation for torch.t()
Adding information regarding the new support for 0-D and 1-D tensors

Fixes #17520
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17535

Differential Revision: D14269984

Pulled By: gchanan

fbshipit-source-id: 38b723f31484be939261c88edb33575d242eca65
2019-03-01 08:45:01 -08:00
6899e901cc Updating submodules
Reviewed By: yns88

fbshipit-source-id: 05fafcfb34c76f425ac5c8ef24a5f920641c2cf7
2019-03-01 01:37:01 -08:00
212024282b Mark cudaGetLastError return value unused in C10_CUDA_CHECK
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17605

Reviewed By: xw285cornell

Differential Revision: D14277586

Pulled By: bddppq

fbshipit-source-id: 38879208f2ab83cf39d8a8a61b288cd09fcafd9a
2019-03-01 00:05:46 -08:00
d3fcd0d798 add dropout during eval (#17549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17549

Currently Dropout is only enabled during training; this adds the option of applying dropout during eval.

This follows [1]. The functionality will be used for uncertainty estimation in the exploration project.

[1] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a bayesian approximation: Representing model uncertainty in deep learning." international conference on machine learning. 2016.
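
For reference, a minimal PyTorch sketch of the MC-dropout idea from [1] (my illustration; this diff changes the Caffe2 Dropout operator, not this Python API):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(16, 1))
model.eval()
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.train()  # keep dropout stochastic at eval time

x = torch.randn(4, 10)
samples = torch.stack([model(x) for _ in range(20)])
mean, var = samples.mean(0), samples.var(0)  # var is a rough uncertainty estimate
```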

Reviewed By: Wakeupbuddy

Differential Revision: D14216216

fbshipit-source-id: 87c8c9cc522a82df467b685805f0775c86923d8b
2019-02-28 23:21:29 -08:00
3ed44b6714 Adjust launch_bounds annotation for AMD hardware. (#17555)
Summary:
The max pooling backwards kernel is currently annotated with launch bounds (256,8).

Adjust the number of waves to 4 (4 times 64 is 256) for ROCm. This improves training performance for torchvision models by up to 15% (AlexNet) on a gfx906 GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17555

Differential Revision: D14277744

Pulled By: bddppq

fbshipit-source-id: 2a62088f7b8a87d1e350c432bf655288967c7883
2019-02-28 22:59:11 -08:00
6b07612cef Fix verbose compiler warning in flat_hash_map (#17562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17562

fixes https://github.com/pytorch/pytorch/issues/17332

Reviewed By: ezyang

Differential Revision: D14254499

fbshipit-source-id: 9d5d7408c2ce510ac20cd438c6514dc2bbe3a854
2019-02-28 16:38:43 -08:00
35a52aa33f Fix diagnostic pragmas (#17561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17561

The push at the top of the file was missing a corresponding pop

Reviewed By: ezyang

Differential Revision: D14254500

fbshipit-source-id: ff20359b563d6d6dcc68273dc754ab31aa8fad12
2019-02-28 16:38:42 -08:00
2e94054e34 Allow dispatch based on tensor list args (#17522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17522

Dispatch is still based on the first tensor arg, but that first "tensor arg" is now allowed to be a tensor list.
That is, the first argument that is either Tensor or TensorList will be the deciding factor for dispatch.
If it is a TensorList, then that TensorList must not be empty or dispatch will fail.

Reviewed By: ezyang

Differential Revision: D14235840

fbshipit-source-id: 266c18912d56ce77aa84306c5605c4191f3d882b
2019-02-28 16:32:00 -08:00
b004b31d06 Allow exposing caffe2 operators with variable number of input tensors to c10 (#17491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17491

Before, there was no way to expose a caffe2 operator that had a variable number of inputs.
Now, this is allowed by giving the operator one tensor list input.
Note that the tensor list must be the first input, and that any other tensor inputs will be ignored and inaccessible in this case.

Reviewed By: ezyang

Differential Revision: D14220705

fbshipit-source-id: 7f921bfb581caf46b229888c409bbcc40f7dda80
2019-02-28 16:31:59 -08:00
1ccf74ae9d blacklist fft algorithms for strided dgrad (#17016)
Summary:
Applies https://github.com/pytorch/pytorch/pull/16626 from v1.0.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17016

Differential Revision: D14270100

Pulled By: ezyang

fbshipit-source-id: 1137899dd1551d33d16f39e8dde76cad8192af46
2019-02-28 16:28:07 -08:00
0f60283e84 Revert D14078519: [codemod][caffe2] [clangr] refactor caffe2 operator constructors - 5/9
Differential Revision:
D14078519

Original commit changeset: b0ca31a52e4a

fbshipit-source-id: 713ae108d3dd6f33abdbf98a5f213e57e2b64642
2019-02-28 15:09:28 -08:00
b36d9351b1 Add generic list/dict custom op bindings (#17587)
Summary:
Fixes #17017

Sandcastle refuses to land #17037, so trying fresh here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17587

Differential Revision: D14265402

Pulled By: driazati

fbshipit-source-id: b942721aa9360ac6b3862f552ac95529eb0cf52c
2019-02-28 15:00:26 -08:00
7413f0926a refactor caffe2 operator constructors - 8/9 (#17089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17089

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078539

fbshipit-source-id: 9ca196af4af7f26fc82e6cf82b35d478d0597752
2019-02-28 14:45:20 -08:00
28b5df1c8f refactor caffe2 operator constructors - 6/9 (#17087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17087

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078525

fbshipit-source-id: 7cc03b30b0d4eb99818e35406be4119b27bdb1bc
2019-02-28 14:23:57 -08:00
a4ed7126ca refactor caffe2 operator constructors - 2/9 (#17083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17083

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078504

fbshipit-source-id: 34dddb035eee2fca3150e47c57489614b91b6725
2019-02-28 14:23:55 -08:00
8db403b9dc refactor caffe2 operator constructors - 7/9 (#17088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17088

clangr codemod

also manually moved the constructor of a class from the .cpp file to the .h file.

Reviewed By: ezyang

Differential Revision: D14078531

fbshipit-source-id: 2adb4ac0ce523742da6cce3bc3b6c177b816c299
2019-02-28 14:23:53 -08:00
42512242cc refactor caffe2 operator constructors - 4/9 (#17085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17085

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078515

fbshipit-source-id: aaa48ae10892e3f47063f2133e026fea46f3240b
2019-02-28 14:23:52 -08:00
b0d3165cc8 refactor caffe2 operator constructors - 3/9 (#17084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17084

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078507

fbshipit-source-id: ed02d772890b30196302b6830f541f054b7e95c8
2019-02-28 14:13:17 -08:00
c9989dfe37 Make HIPStream also masquerade as CUDA. (#17469)
Summary:
HIPGuard interfaces that interacted with HIPStream were previously
totally busted (because the streams had the wrong device type).
This fixes it, following the same lines as MasqueradingAsCUDA.

Along the way I beefed up the explanatory comment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc jithunnair-amd iotamudelta bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17469

Differential Revision: D14243396

Pulled By: ezyang

fbshipit-source-id: 972455753a62f8584ba9ab194f9c785db7bb9bde
2019-02-28 13:46:11 -08:00
e157a6432f Fix Python device type property for XLA and MSNPU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17361

Differential Revision: D14243546

Pulled By: soumith

fbshipit-source-id: b7498968f72e3d97de5bf6e5b44c5a59b6913acb
2019-02-28 13:36:19 -08:00
c596683309 Rely on numel() == 1 to check if distribution parameters are scalar. (#17503)
Summary:
As discussed in #16952, this PR improves the __repr__ of distributions when the provided parameters are torch.Tensors with only one element.

Currently, __repr__() relies on dim() == 0, leading to the following behaviour:

```
>>> torch.distributions.Normal(torch.tensor([1.0]), torch.tensor([0.1]))
Normal(loc: torch.Size([1]), scale: torch.Size([1]))
```

With this PR, the output looks like the following:
```
>>> torch.distributions.Normal(torch.tensor([1.0]), torch.tensor([0.1]))
Normal(loc: 1.0, scale: 0.10000000149011612)
```
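
The check that changed, sketched (my example):

```python
import torch

t = torch.tensor([1.0])
t.dim()    # 1 -> the old dim() == 0 test treats this as non-scalar
t.numel()  # 1 -> the new numel() == 1 test prints the value itself
```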
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17503

Differential Revision: D14245439

Pulled By: soumith

fbshipit-source-id: a440998905fd60cf2ac9a94f75706021dd9ce5bf
2019-02-28 13:36:17 -08:00
9cbd7a18f5 fix reordering of inlines (#17557)
Summary:
See the comment inside the code. This fixes a bug where sometimes we would try to avoid printing long lines but would inadvertently reorder the expressions, which can change the semantics of the program.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17557

Differential Revision: D14250608

Pulled By: zdevito

fbshipit-source-id: d44996af4e90fe9ab9508d13cd04adbfc7bb5d1c
2019-02-28 13:12:15 -08:00
2e5a8cee82 Customize the printing of namedtuple return (#17136)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17112
```python
print("good", torch.randn(5,5,5).max(1))
print("terrible", torch.randn(5,5,10).max(1))
print("not as good", torch.randn(5,5,500).max(1))
print ("old behaviour = gold standard")
print(tuple(torch.randn(5,5,5).max(1)))
print(tuple(torch.randn(5,5,10).max(1)))
print(tuple(torch.randn(5,5,500).max(1)))
```
now gives
```
>>> import torch
>>> print("good", torch.randn(5,5,5).max(1))
good torch.return_types.max(
values=tensor([[ 1.2821,  1.8063,  1.8075,  1.3082, -0.1267],
        [ 0.3437,  0.7353,  1.2619,  0.7557,  1.6662],
        [ 0.8583,  1.8906,  1.0246,  1.7598,  1.1184],
        [ 1.7821,  0.0230,  0.9452,  1.0318,  1.0823],
        [ 0.4116, -0.0379, -0.1843,  1.4129,  1.8796]]),
indices=tensor([[4, 4, 3, 2, 1],
        [1, 2, 4, 1, 1],
        [2, 4, 0, 2, 1],
        [0, 2, 0, 3, 1],
        [0, 4, 4, 4, 4]]))
>>> print("terrible", torch.randn(5,5,10).max(1))
terrible torch.return_types.max(
values=tensor([[ 2.1272,  1.3664,  2.2067,  1.3974, -0.0883,  1.2505,  1.0074,  1.1217,
          0.3849,  0.6936],
        [ 0.6288, -0.4560,  1.2748,  1.5482,  1.2777,  1.6874,  0.7151,  0.6041,
          1.3572,  1.6232],
        [ 1.6703,  1.0075,  1.6480,  2.2839,  1.3390,  0.4938,  1.6449,  1.7628,
          0.8141,  2.5714],
        [ 0.7079,  1.8677,  3.2478,  1.5591,  2.4870,  0.8635, -0.1450,  1.6923,
          1.4924,  1.6298],
        [ 2.4056,  0.8002,  0.9317,  0.7455,  0.7866,  2.1191,  0.3492,  1.2095,
          1.8637,  1.7470]]),
indices=tensor([[1, 1, 0, 0, 0, 0, 3, 4, 4, 4],
        [4, 2, 2, 1, 2, 2, 3, 1, 1, 3],
        [0, 3, 3, 0, 2, 1, 4, 1, 0, 1],
        [4, 1, 3, 0, 3, 2, 0, 1, 4, 3],
        [1, 0, 3, 2, 1, 0, 0, 1, 0, 1]]))
>>> print("not as good", torch.randn(5,5,500).max(1))
not as good torch.return_types.max(
values=tensor([[ 0.3877,  0.7873,  1.8701,  ...,  0.5971,  1.6103, -0.3435],
        [ 1.1300,  2.2418,  1.4239,  ...,  1.3943,  0.3872,  1.6475],
        [ 2.0656,  1.3136,  0.9896,  ...,  2.3918,  0.8226,  1.0517],
        [ 1.1054,  0.9945,  1.0561,  ...,  2.1039,  1.1524,  3.0304],
        [ 1.5041,  2.2809,  1.0883,  ...,  0.8504,  2.4774,  1.1041]]),
indices=tensor([[4, 3, 1,  ..., 1, 4, 0],
        [4, 4, 4,  ..., 3, 0, 3],
        [3, 0, 1,  ..., 2, 2, 4],
        [0, 1, 1,  ..., 4, 2, 2],
        [1, 0, 4,  ..., 2, 0, 2]]))
>>> print ("old behaviour = gold standard")
old behaviour = gold standard
>>> print(tuple(torch.randn(5,5,5).max(1)))
(tensor([[ 1.1908,  1.1807,  1.3151,  1.7184,  0.3556],
        [ 0.3798,  0.9213,  0.3001,  1.3087,  2.2419],
        [ 1.4233,  1.4814,  1.9900,  1.7744,  1.3059],
        [ 1.0026, -0.0330,  1.3061,  1.8730,  2.0685],
        [ 1.3041,  1.6458,  1.3449,  1.8948,  3.6206]]), tensor([[0, 4, 3, 4, 0],
        [1, 1, 4, 0, 4],
        [4, 1, 0, 3, 3],
        [1, 2, 1, 4, 0],
        [3, 3, 0, 3, 3]]))
>>> print(tuple(torch.randn(5,5,10).max(1)))
(tensor([[-0.1232,  0.8275,  0.6732,  1.1223,  0.8247,  1.2851,  1.6009,  1.9979,
          1.9109,  0.7313],
        [ 0.2260,  0.5922,  1.6928,  0.6024,  2.1158,  3.0619,  0.5653,  0.7426,
          0.8316,  0.6346],
        [ 0.4319,  0.2231,  0.5255,  1.7620,  1.1657,  0.8875,  0.5782,  0.6506,
          0.5032,  1.7097],
        [ 0.4137,  1.7265,  1.4260,  2.0301,  1.2244,  0.7128,  2.6345,  0.7230,
          1.3553,  1.6508],
        [ 1.0684,  1.7195,  1.4068,  0.7076, -0.0242,  0.8474,  0.8754,  1.7108,
          0.2188,  1.1584]]), tensor([[0, 1, 3, 4, 2, 3, 4, 2, 1, 0],
        [1, 4, 0, 0, 3, 2, 0, 0, 3, 3],
        [2, 3, 1, 1, 4, 0, 1, 4, 4, 4],
        [0, 4, 1, 3, 2, 0, 2, 0, 3, 1],
        [1, 0, 0, 0, 0, 3, 3, 3, 2, 0]]))
>>> print(tuple(torch.randn(5,5,500).max(1)))
(tensor([[0.9395, 1.5572, 1.8797,  ..., 2.0494, 0.8202, 0.9623],
        [1.7937, 0.7225, 1.8836,  ..., 0.7927, 1.4976, 1.1813],
        [0.8558, 1.6943, 1.4192,  ..., 0.8327, 1.9661, 0.4197],
        [1.2993, 1.4995, 0.9357,  ..., 0.7810, 1.3030, 2.6216],
        [1.4206, 1.8315, 1.0338,  ..., 1.4312, 1.3198, 1.5233]]), tensor([[0, 4, 3,  ..., 3, 0, 2],
        [0, 1, 0,  ..., 0, 4, 3],
        [3, 4, 3,  ..., 3, 0, 0],
        [3, 2, 3,  ..., 1, 2, 1],
        [1, 2, 4,  ..., 3, 1, 3]]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17136

Differential Revision: D14250021

Pulled By: VitalyFedyunin

fbshipit-source-id: aae72f03b35980063b1ac1f07b8353eddb0c8b93
2019-02-28 13:07:26 -08:00
1046593509 Revert D14231251: [jit] alias_analysis refactor
Differential Revision:
D14231251

Original commit changeset: 6cd98ae6fced

fbshipit-source-id: 96189f47daf7cc4cf4ef5cd343022d56a2296b39
2019-02-28 12:56:17 -08:00
44fb22f9fe refactor caffe2 operator constructors - 5/9 (#17086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17086

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078519

fbshipit-source-id: b0ca31a52e4ab97b145a1490461d59f8fa93874a
2019-02-28 12:00:39 -08:00
54c5b10934 alias_analysis refactor (#17511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17511

AliasTracker was doing bookkeeping for three concepts: the points-to graph,
writes, and wildcards.

This PR makes AliasTracker's job clearer: it keeps track of the points-to
graph. Thus it has been renamed MemoryDAG. Write and wildcard information were
pulled back into AliasDb as part of this—I may decide to pull them into their
own little modules since I don't want the alias analysis stuff to get too
bloated.

This refactor is necessary because we want to start tracking information for
aliasing elements that _aren't_ first-class IR Values (e.g. the "stuff" inside
a list). So MemoryDAG can't know too much about Values.

Reviewed By: houseroad

Differential Revision: D14231251

fbshipit-source-id: 6cd98ae6fced8d6c1522c2454da77c3c1b2b0504
2019-02-28 12:00:36 -08:00
f9d3f1dca5 allow "before" and "after" alias annotations (#17480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17480

This was always part of our "spec" but not implemented

Reviewed By: houseroad

Differential Revision: D14214301

fbshipit-source-id: 118db320b43ec099dc3e730c67d39487474c23ea
2019-02-28 12:00:34 -08:00
e1df99295f ONNXIFI extension & e2e tests. (#17478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17478

Enable onnxifi_ext in glow and build an e2e test in caffe2.

Reviewed By: yinghai

Differential Revision: D14190136

fbshipit-source-id: 26245278b487b551623109b14432f675279b17b5
2019-02-28 11:54:55 -08:00
7fbee1f79e update slack invite instructions
Summary: update slack invite instructions

Reviewed By: pjh5

Differential Revision: D14255348

fbshipit-source-id: 564fed0d44a6a68f80d1894fed40c3ddb360aa52
2019-02-28 11:29:05 -08:00
456d3e5f56 Fix errors in the description for installation on Windows (#17475)
Summary:
+ All quotes for ENV VARS are erroneous;
+ The toolset hasn't been specified;
+ Provide paths for all 3 Visual Studio 2017 products: Community/Professional/Enterprise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17475

Differential Revision: D14262968

Pulled By: soumith

fbshipit-source-id: c0504e0a6be9c697ead83b06b0c5cf569b5c8625
2019-02-28 10:41:16 -08:00
a9395ce259 refactor caffe2 operator constructors - 9/9 (#17090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17090

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078550

fbshipit-source-id: 68e6de4298e55ce83039b7806c1a275c4d6593c8
2019-02-28 09:53:18 -08:00
9bcceb75b5 Fix the false generated_comment (#17563)
Summary:
The generated_comment is wrong in the generated files below:
```bash
./torch/csrc/autograd/generated/VariableType_0.cpp:3:// generated from tools/autograd/templates/VariableType_0.cpp
./torch/csrc/autograd/generated/VariableType_1.cpp:3:// generated from tools/autograd/templates/VariableType_1.cpp
./torch/csrc/autograd/generated/VariableType_2.cpp:3:// generated from tools/autograd/templates/VariableType_2.cpp
./torch/csrc/autograd/generated/VariableType_3.cpp:3:// generated from tools/autograd/templates/VariableType_3.cpp
./torch/csrc/autograd/generated/VariableType_4.cpp:3:// generated from tools/autograd/templates/VariableType_4.cpp
./torch/csrc/autograd/generated/VariableTypeEverything.cpp:3:// generated from tools/autograd/templates/VariableTypeEverything.cpp

./torch/csrc/jit/generated/register_aten_ops_0.cpp:23:// generated from tools/autograd/templates/register_aten_ops_0.cpp
./torch/csrc/jit/generated/register_aten_ops_1.cpp:23:// generated from tools/autograd/templates/register_aten_ops_1.cpp
./torch/csrc/jit/generated/register_aten_ops_2.cpp:23:// generated from tools/autograd/templates/register_aten_ops_2.cpp
```

These generated files were split to speed up compilation; the template files, however, were not.
After this fix, the comments will look like this:
```bash
./torch/csrc/autograd/generated/VariableType_0.cpp:3:// generated from tools/autograd/templates/VariableType.cpp
./torch/csrc/autograd/generated/VariableType_1.cpp:3:// generated from tools/autograd/templates/VariableType.cpp
......
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17563

Differential Revision: D14260992

Pulled By: soumith

fbshipit-source-id: 038181367fa43bee87837e4170704ddff7f4d6f2
2019-02-28 09:44:08 -08:00
13e6326c07 Remove useless OpenCV reference
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17564

Differential Revision: D14255542

Pulled By: dzhulgakov

fbshipit-source-id: c129f3751ae82deedd258ee16586552b77baaca6
2019-02-27 23:31:10 -08:00
03132c1f56 convolution/matmul/dropout (#17523)
Summary:
* Add AD formula for _convolution & matmul & dropout
* add prim::range, fixes #17483
Example:
```
dim = 3
x = range(dim)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17523

Differential Revision: D14254002

Pulled By: ailzhang

fbshipit-source-id: ba60d77b047db347929b72beca2623fb26aec957
2019-02-27 21:41:59 -08:00
221edddd18 disallow shape analysis with resize ops (#17518)
Summary:
resize_ and resize_as resize the input tensor. Because our shape analysis
is flow-invariant, we don't do shape analysis on any op that relies on a Tensor that can alias a resized Tensor.

E.g., in the following graph, by the time x += 10 runs, x may have been resized.
```
@torch.jit.script
def test(x, y):
    for i in range(10):
        x += 10
        x.resize_as_(torch.rand(i + 1))  # resized to a loop-dependent shape
    return x

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17518

Differential Revision: D14249835

Pulled By: eellison

fbshipit-source-id: f281b468ccb8c29eeb0f68ca5458cc7246a166d9
2019-02-27 19:02:09 -08:00
6706e9af19 Make C10_MOBILE consistent with how feature macros are usually used (#17481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17481

Usually, feature macros are either defined or undefined and checked accordingly.
C10_MOBILE was a weird special case that was always defined but either defined to 1 or to 0.

This caused a lot of confusion for me when trying to disable something in the mobile build: it also got disabled
in the server build (because I was using #ifdef). Also, I found a place in the existing code base that made
that wrong assumption and used the macro incorrectly; see https://fburl.com/y4icohts

Reviewed By: dzhulgakov

Differential Revision: D14214825

fbshipit-source-id: f3a155b6d43d334e8839e2b2e3c40ed2c773eab6
2019-02-27 17:57:51 -08:00
7c5ffc4120 Disable c10 dispatcher on mobile (#17078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17078

This prevents caffe2 operators from being exposed to c10 on mobile,
which in turn causes the whole c10 dispatcher to be stripped away
and saves binary size.

We probably want to re-enable the c10 dispatcher for mobile,
but for now this is ok.

Reviewed By: ezyang

Differential Revision: D14077972

fbshipit-source-id: e4dd3e3b60cdfbde91fe0d24102c1d9708d3e5c4
2019-02-27 17:57:50 -08:00
1154506533 Always synchronize src and dst streams when copying tensors (#16966)
Summary:
fixes #15568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16966

Differential Revision: D14213144

Pulled By: mrshenli

fbshipit-source-id: 2fcf5e07895fde80b4aee72e2736b0def876d21f
2019-02-27 14:57:56 -08:00
5f06dcc4d7 ONNX Export Adaptive Pooling
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17412

Differential Revision: D14247923

Pulled By: houseroad

fbshipit-source-id: 5530cea8f80da7368bff1e29cf89c45ad53accee
2019-02-27 14:57:54 -08:00
e47aeede32 Use name for output variables instead of out in JIT (#17386)
Summary:
This adds 88 matches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17386

Differential Revision: D14179139

Pulled By: cpuhrsch

fbshipit-source-id: 2c3263b8e4d084db84791e53290e8c8b1b7aecd5
2019-02-27 14:03:33 -08:00
1971c0528d Forcing UTC on Mac circleci jobs (#17516)
Summary:
And adding timestamps to linux build jobs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17516

Differential Revision: D14244533

Pulled By: pjh5

fbshipit-source-id: 26c38f59e0284c99f987d69ce6a2c2af9116c3c2
2019-02-27 13:22:06 -08:00
9709d5e787 Fix math::Set for large tensor (#17539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17539

Fix math::Set for large tensor

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov, houseroad

Differential Revision: D14240756

fbshipit-source-id: 0ade26790be41fb26d2cc193bfa3082c7bd4e69d
2019-02-27 12:34:58 -08:00
b4572668b4 Add sparse gradient option to gather operation (#17182)
Summary:
This PR allows `gather` to optionally return sparse gradients, as requested in #16329. It also allows the autograd engine to accumulate sparse gradients in place when it is safe to do so.
I've commented out the size.size() check in `SparseTensor.cpp` that also caused #17152; it does not seem to me that the check serves a useful purpose, but please correct me if I'm wrong and a better fix is required.
Motivating example:
For this commonly used label smoothing loss function
```
def label_smoothing_opt(x, target):
    padding_idx = 0
    smoothing = 0.1
    logprobs = torch.nn.functional.log_softmax(x, dim=-1, dtype=torch.float32)
    pad_mask = (target == padding_idx)
    ll_loss = logprobs.gather(dim=-1, index=target.unsqueeze(1), sparse = True).squeeze(1)
    smooth_loss = logprobs.mean(dim=-1)
    loss =  (smoothing - 1.0) * ll_loss - smoothing * smooth_loss
    loss.masked_fill_(pad_mask, 0)
    return loss.sum()
```
The backward pass goes from 12.6 ms with dense gather gradients to 7.3 ms with sparse gradients, for 9K tokens x 30K vocab, which is a single-digit-percent end-to-end improvement, and also an improvement in the peak memory required.
Shout-out to core devs: adding python-exposed functions with keyword arguments through native_functions.yaml is very easy now!

cc gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17182

Differential Revision: D14158431

Pulled By: gchanan

fbshipit-source-id: c8b654611534198025daaf7a634482b3151fbade
2019-02-27 11:42:48 -08:00
a2b9f7f484 add elastic zeus handler (#16746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16746

As titled. We use a special URL scheme, elasticzeus, for elastic Zeus so that we don't need to change the public interface of init_process_group.

Reviewed By: aazzolini, soumith

Differential Revision: D13948151

fbshipit-source-id: 88939dcfa0ad93467dabedad6905ec32e6ec60e6
2019-02-27 11:29:59 -08:00
222a07863f optimize elementwise sum (#17456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17456

Using an instruction sequence similar to the function in fbgemm/src/QuantUtilAvx2.cc.
Added elementwise_sum_benchmark.

Reviewed By: protonu

Differential Revision: D14205695

fbshipit-source-id: 84939c9d3551f123deec3baf7086c8d31fbc873e
2019-02-27 10:12:41 -08:00
8c72217817 Enable boolean_mask, adadelta, adagrad fp16 on ROCm (#17235)
Summary:
- Fix bugs and indentation for adadelta and adagrad tests to enable fp16
- Enable boolean_mask fp16 on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17235

Differential Revision: D14240828

Pulled By: bddppq

fbshipit-source-id: ab6e8f38aa7afb83b4b879f2f4cf2277c643198f
2019-02-27 10:07:36 -08:00
e0b44cac1f Enabled HALF for fill() and zero() methods. Moved them into THTensorFill (#17536)
Summary:
For some additional context on this change, please see this [PR](https://github.com/pytorch/pytorch/pull/17376).

As part of the work on Bool Tensor, we will need to add bool support to the _fill() and _zero() methods that are currently located in THTensorMath. As we don't need anything else and those methods are not really math-related, we are moving them out into a separate THTensorFill for simplicity.

Change:
- moved _fill() and _zero() from THTensorMath.h to THTensorFill
- enabled _fill() and _zero() for the HALF type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17536

Differential Revision: D14242130

Pulled By: izdeby

fbshipit-source-id: 1d8bd806f0f5510723b9299d360b70cc4ab96afb
2019-02-27 09:21:54 -08:00
44a607b90c Fix autograd with buffers requiring grad in DataParallel (#13352)
Summary:
This was causing a problem with spectral norm, although SN won't use that anymore after #13350.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13352

Differential Revision: D14209562

Pulled By: ezyang

fbshipit-source-id: f5e3183e1e7050ac5a66d203de6f8cf56e775134
2019-02-26 20:53:19 -08:00
74098eadb0 enable assymetric dilations and stride for miopen conv (#17472)
Summary:
As of MIOpen 1.7.1, as shipped in ROCm 2.1, this works correctly, so we can use MIOpen and do not need to fall back.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17472

Differential Revision: D14210323

Pulled By: ezyang

fbshipit-source-id: 4c08d0d4623e732eda304fe04cb722c835ec70e4
2019-02-26 20:45:35 -08:00
76828647c1 Enable tests working on ROCm 2.1 dual gfx906
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17473

Reviewed By: bddppq

Differential Revision: D14210243

Pulled By: ezyang

fbshipit-source-id: 519032a1e73c13ecb260ea93102dc8efb645e070
2019-02-26 20:41:16 -08:00
96faaa9d50 Fix linking errors when building dataloader test binaries on Windows (#17494)
Summary:
Fixes #17489.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17494

Differential Revision: D14226525

Pulled By: ezyang

fbshipit-source-id: 3dfef9bc6f443d647e9f05a54bc17c5717033723
2019-02-26 20:36:45 -08:00
cbefd0323b Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17521

Differential Revision: D14237482

Pulled By: soumith

fbshipit-source-id: 636e0fbe2c667d15fcb649136a65ae64937fa0cb
2019-02-26 20:23:34 -08:00
eff672ef06 Remove Bool/IndexTensor from schema for native functions with derivatives (#17193)
Summary:
This only deals with four functions, but is an important first step towards removing BoolTensor and IndexTensor entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17193

Differential Revision: D14157829

Pulled By: cpuhrsch

fbshipit-source-id: a36f16d1d88171036c44cc7de60ac9dfed9d14f2
2019-02-26 17:54:33 -08:00
348d1889ff Fix operator initialization order (#15445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15445

Initilize task graph after operators (task graph uses ops)

Reviewed By: yinghai

Differential Revision: D13530864

fbshipit-source-id: fdc91e9158c1b50fcc96fd1983fd000fdf20c7da
2019-02-26 15:41:16 -08:00
4ca1a54526 Make transpose consistent with numpy's behavior (#17462)
Summary:
PyTorch's tensor.t() is now equivalent to NumPy's ndarray.T for 1-D tensors,
i.e. tensor.t() == tensor

Test case added:
- test_t

fixes #9687
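
A minimal sketch of the resulting behavior (my example):

```python
import torch

v = torch.arange(4.)
assert torch.equal(v.t(), v)  # 1-D: t() is a no-op, matching NumPy's ndarray.T
m = torch.randn(2, 3)
assert m.t().shape == (3, 2)  # 2-D behavior is unchanged
```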
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17462

Differential Revision: D14214838

Pulled By: soumith

fbshipit-source-id: c5df1ecc8837be22478e3a82ce4854ccabb35765
2019-02-26 14:23:19 -08:00
63519df07a Bump up the ONNX default opset version to 10 (#17419)
Summary:
Align with ONNX master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17419

Reviewed By: zrphercule

Differential Revision: D14197985

Pulled By: houseroad

fbshipit-source-id: 13fc1f7786aadbbf5fe83bddf488fee3dedf58ce
2019-02-26 12:37:27 -08:00
72eb70c272 ' ' ==> ' ' (#17498)
Summary:
Fix formatting error for cpp code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17498

Reviewed By: zou3519

Differential Revision: D14224549

Pulled By: fmassa

fbshipit-source-id: f1721c4a75908ded759aea8c561f2e1d66859eec
2019-02-26 12:31:50 -08:00
1607bb322d Support all ROCm supported uarchs simultaneously: gfx803, gfx900, gfx906 (#17367)
Summary:
Correct misspelled flag.

Remove dependency on debug flag (HCC_AMDGPU_TARGET)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17367

Differential Revision: D14227334

Pulled By: bddppq

fbshipit-source-id: d838f219a9a1854330b0bc851c40dfbba77a32ef
2019-02-26 11:54:07 -08:00
5903522ad6 refactor: a bit intricate so I refactor it (#16995)
Summary:
This code is a bit intricate, so I refactored it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16995

Differential Revision: D14050667

Pulled By: ifedan

fbshipit-source-id: 55452339c6518166f3d4bc9898b1fe2f28601dc4
2019-02-26 10:21:22 -08:00
e5b4baab40 new batch of expect file removals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17486

Differential Revision: D14218963

Pulled By: eellison

fbshipit-source-id: dadc8bb71e756f47cdb04525d47f66c13ed56d16
2019-02-26 08:20:43 -08:00
2cdbb140e6 user defined types (#17314)
Summary:
First pass at user defined types. The following is contained in this PR:
- `UserType` type, which contains a reference to a module with all methods for the type, and a separate namespace for data attributes (map of name -> TypePtr).
- `UserTypeRegistry`, similar to the operator registry
- `UserObject` which is the runtime representation of the user type (just a map of names -> IValues)
- `UserTypeValue` SugaredValue, to manage getattr and setattr while generating IR, plus compiler.cpp changes to make that work.
- Frontend changes to get `torch.jit.script` to work as a class decorator
- `ClassDef` node in our AST.
- primitive ops for object creation, setattr, and getattr, plus alias analysis changes to make mutation safe.

Things that definitely need to get done:
- Import/export, python_print support
- String frontend doesn't understand class definitions yet
- Python interop (using a user-defined type outside TorchScript) is completely broken
- Static methods (without `self`) don't work

Things that are nice but not essential:
- Method definition order shouldn't matter (right now you can only reference a method that has already been defined)
- Class definitions can only contain defs, no other expressions are supported.

Things I definitely won't do initially:
- Polymorphism/inheritance
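
A minimal sketch of the decorator frontend described above (my example; the API was still in flux at this point, so the annotation style shown is the eventual one, not necessarily this commit's):

```python
import torch

@torch.jit.script
class Pair(object):
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def total(self) -> int:
        return self.x + self.y

@torch.jit.script
def use_pair() -> int:
    p = Pair(1, 2)  # object creation, setattr/getattr are primitive ops
    return p.total()
```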
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17314

Differential Revision: D14194065

Pulled By: suo

fbshipit-source-id: c5434afdb9b39f84b7c85a9fdc2891f8250b5025
2019-02-26 01:34:07 -08:00
3ceeaae5e6 add mutability to docs (#17454)
Summary:
Not sure of the best way to integrate this… I wrote something that focuses on mutability "vertically" through the stack. Should I split it up and distribute it into the various sections, or keep it all together?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17454

Differential Revision: D14222883

Pulled By: suo

fbshipit-source-id: 3c83f6d53bba9186c32ee443aa9c32901a0951c0
2019-02-26 00:36:19 -08:00
5d77f4f0d5 Remove usages of int64_t from native_functions.yaml
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17387

Differential Revision: D14185458

Pulled By: cpuhrsch

fbshipit-source-id: 5c8b358d36b77b60c3226afcd3443c2b1727cbc2
2019-02-25 17:52:26 -08:00
2f840ba6d6 upload alias tracker graph for docs (#17476)
Summary:
as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17476

Differential Revision: D14218312

Pulled By: suo

fbshipit-source-id: 64df096a3431a6f25cd2373f0959d415591fed15
2019-02-25 16:58:43 -08:00
68e90a398e Temporarily disable select/topk/kthvalue AD (#17470)
Summary:
Temporarily disable them for perf consideration. Will figure out a way to do `torch.zeros(sizes, grad.options())` in torchscript before enabling these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17470

Differential Revision: D14210313

Pulled By: ailzhang

fbshipit-source-id: efaf44df1192ae42f4fe75998ff0073234bb4204
2019-02-25 16:29:11 -08:00
411cf434af Batch of expect file removals Remove dce expect files (#17471)
Summary:
Batch of expect test file removals
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17471

Differential Revision: D14217265

Pulled By: eellison

fbshipit-source-id: 425da022115b7e83aca86ef61d4d41fd046d439e
2019-02-25 16:15:19 -08:00
df0d4e6c7a Back out part of "Fix NERPredictor for zero initialization"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17482

Reviewed By: david-y-lam

Differential Revision: D14216135

fbshipit-source-id: 2ef4cb5dea74fc5c68e9b8cb43fcb180f219cb32
2019-02-25 16:03:45 -08:00
e4e9b738d3 Followup to #17049: change more instances of RuntimeError to IndexError
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17114

Differential Revision: D14150890

Pulled By: gchanan

fbshipit-source-id: 579ca71665166c6a904b894598a0b334f0d8acc7
2019-02-25 15:34:22 -08:00
59ece70201 Missing argument description (value) in scatter_ function documentation (#17467)
Summary: Update the docs to include the value parameter that was missing in the `scatter_` function.
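
A short illustration of the `value` overload being documented (my example):

```python
import torch

x = torch.zeros(3, 5)
index = torch.tensor([[0, 1, 2, 0, 0]])
x.scatter_(0, index, 1.23)  # a scalar value in place of a src tensor
```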

Differential Revision: D14209225

Pulled By: soumith

fbshipit-source-id: 5c65e4d8fbd93fcd11a0a47605bce6d57570f248
2019-02-25 14:39:26 -08:00
9e08c998db Throw exception when foxi is not checked out (#17477)
Summary:
Add a check and provide useful warning/error information to the user if foxi is not checked out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17477

Reviewed By: zrphercule

Differential Revision: D14212896

Pulled By: houseroad

fbshipit-source-id: 557247d5d8fdc016b1c24c2a21503e59f874ad09
2019-02-25 14:39:24 -08:00
6f53c51a01 Updating submodules
Reviewed By: yns88

fbshipit-source-id: ae3e05c2ee3af5df171556698ff1469780d739d1
2019-02-25 13:50:20 -08:00
79a5a73a1e simplify aliasdb interface (#17453)
Summary:
Stack:
**#17453 [jit] simplify aliasdb interface** (D14205209)

The previous "getWrites" API relies on the user to do alias checking, which is confusing and inconsistent with the rest of the interface. So replace it with a higher-level call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17453

Differential Revision: D14209942

Pulled By: suo

fbshipit-source-id: d4aff2af6062ab8465ee006fc6dc603296bcb7ab
2019-02-25 13:34:51 -08:00
b0b7541ca4 fix list type unification (#17424)
Summary:
Previously we were unifying the types of lists across if block outputs. This now fails with Optional subtyping because two types which can be unified have different runtime representations.
```

@torch.jit.script
def list_optional_fails(x):
    # type: (bool) -> Optional[int]
    if x:
        y = [1]
    else:
        y = [None]
    return y[0]

```
the indexing op will expect y to be a generic list, but it will find an intlist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17424

Differential Revision: D14210903

Pulled By: eellison

fbshipit-source-id: 4b8b26ba2e7e5bebf617e40316475f91e9109cc2
2019-02-25 13:34:50 -08:00
b527055fcf Restore current streams on dst device after switching streams (#17439)
Summary:
When switching back to `d0` from a stream on a different device `d1`, we need to restore the current streams on both `d0` and `d1`. The current implementation only does that for `d0`.
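
A rough illustration of the scenario (my sketch; requires at least two GPUs):

```python
import torch

d0, d1 = torch.device('cuda:0'), torch.device('cuda:1')
s1 = torch.cuda.Stream(device=d1)
with torch.cuda.device(d0):
    with torch.cuda.stream(s1):  # also switches the current device to d1
        pass
    # With the fix, exiting restores the current stream on d1 as well,
    # not just the device and stream state of d0:
    assert torch.cuda.current_stream(d1) == torch.cuda.default_stream(d1)
```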
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17439

Differential Revision: D14208919

Pulled By: mrshenli

fbshipit-source-id: 89f2565b9977206256efbec42adbd789329ccad8
2019-02-25 12:06:41 -08:00
29c27d7b99 Automatic update of fbcode/onnx to e18bb41d255a23daf368ffd62a2645db55db4c72 (#17460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17460

Previous import was 4c091e048ca42682d63ccd3c1811560bc12b732d

Included changes:
- **[e18bb41](https://github.com/onnx/onnx/commit/e18bb41)**: Infer shape of the second output of Dropout op (#1822) <Shinichiro Hamaji>
- **[cb544d0](https://github.com/onnx/onnx/commit/cb544d0)**: Clarify dtype of Dropout's mask output (#1826) <Shinichiro Hamaji>
- **[b60f693](https://github.com/onnx/onnx/commit/b60f693)**: Fix shape inference when auto_pad  is notset (#1824) <Li-Wen Chang>
- **[80346bd](https://github.com/onnx/onnx/commit/80346bd)**: update test datat (#1825) <Rui Zhu>
- **[b37fc6d](https://github.com/onnx/onnx/commit/b37fc6d)**: Add stringnormalizer operator to ONNX (#1745) <Dmitri Smirnov>

Reviewed By: zrphercule

Differential Revision: D14206264

fbshipit-source-id: 0575fa3374ff2b93b2ecee9989cfa4793c599117
2019-02-25 11:09:08 -08:00
393c97fda7 Fix variable checking in THCPModule_setRNGState (#17474)
Summary:
See https://github.com/pytorch/pytorch/pull/16325/files#r259576901
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17474

Differential Revision: D14209549

Pulled By: yf225

fbshipit-source-id: 2ae091955ae17f5d1540f7d465739c4809c327f8
2019-02-25 11:05:51 -08:00
724c7e76c6 Fix reduction='none' in poisson_nll_loss (#17358)
Summary:
Changelog:
- Modify `if` to `elif` in reduction mode comparison
- Add error checking for reduction mode
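
A short sketch of the now-working path (my example):

```python
import torch
import torch.nn.functional as F

inp = torch.rand(4) + 0.1            # positive rates, since log_input=False
target = torch.tensor([1., 0., 2., 3.])
loss = F.poisson_nll_loss(inp, target, log_input=False, reduction='none')
assert loss.shape == target.shape    # per-element losses, no reduction applied
```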
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17358

Differential Revision: D14190523

Pulled By: zou3519

fbshipit-source-id: 2b734d284dc4c40679923606a1aa148e6a0abeb8
2019-02-25 10:35:33 -08:00
f9ba3831ef Apply modernize-use-override (4)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.
bypass-lint
drop-conflicts

Reviewed By: ezyang

Differential Revision: D14191981

fbshipit-source-id: 1f3421335241cbbc0cc763b8c1e85393ef2fdb33
2019-02-25 08:31:27 -08:00
15a55b86ed Fix nonzero for scalars on cuda, to_sparse for scalars on cpu/cuda. (#17406)
Summary:
I originally set out to fix to_sparse for scalars, which had some overly restrictive checking (sparse_dim > 0, which is impossible for a scalar).

This fix uncovered an issue with nonzero: it didn't properly return a size (z, 0) tensor for an input scalar, where z is the number of nonzero elements (i.e. 0 or 1).
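
Concretely, the fixed behavior for 0-d inputs (my example):

```python
import torch

torch.tensor(5).nonzero().shape  # torch.Size([1, 0]): one nonzero element
torch.tensor(0).nonzero().shape  # torch.Size([0, 0]): no nonzero elements
```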
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17406

Differential Revision: D14185393

Pulled By: gchanan

fbshipit-source-id: f37a6e1e3773fd9cbf69eeca7fdebb3caa192a19
2019-02-25 08:23:40 -08:00
65ecef1509 Export ElementwiseLinear to ONNX (Mul + Add). (#17411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17411

Reshape-based approach to support dynamic shapes.
The first Reshape flattens the inner dimensions and the second one recovers the actual shape.
No Shape/Reshape will be generated unless necessary.

![image](https://user-images.githubusercontent.com/5203025/52215001-114ace80-28ce-11e9-815f-28ad190d3189.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16716

Reviewed By: zrphercule

Differential Revision: D14094532

Pulled By: houseroad

fbshipit-source-id: bad6a1fbf5963ef3dd034ef4bf440f5a5d6980bc
2019-02-25 08:11:13 -08:00
3d68a2d6de Add foxi submodule (ONNXIFI facebook extension)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17178

Reviewed By: yinghai

Differential Revision: D14197987

Pulled By: houseroad

fbshipit-source-id: c21d7235e40c2ca4925a10c467c2b4da2f1024ad
2019-02-25 08:00:03 -08:00
3de67cd63d Fix remaining -Wreturn-std-move violations in fbcode (#17308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17308

In some cases there is still no RVO/NRVO and std::move is still needed. Latest
Clang gained -Wreturn-std-move warning to detect cases like this (see
https://reviews.llvm.org/D43322).

Reviewed By: igorsugak

Differential Revision: D14150915

fbshipit-source-id: 0df158f0b2874f1e16f45ba9cf91c56e9cb25066
2019-02-25 07:29:16 -08:00
4ac91b2d64 add debug/release tip to cpp docs (#17452)
Summary:
as title. These were already added to the tutorials, but I didn't add them to the cpp docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17452

Differential Revision: D14206501

Pulled By: suo

fbshipit-source-id: 89b5c8aaac22d05381bc4a7ab60d0bb35e43f6f5
2019-02-24 23:08:15 -08:00
15840e30dc add pointer to windows FAQ in contributing.md (#17450)
Summary:
" ProTip! Great commit summaries contain fewer than 50 characters. Place extra information in the extended description."
lol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17450

Differential Revision: D14206500

Pulled By: suo

fbshipit-source-id: af7ffe299f8c8f04fa8e720847a1f6d576ebafc1
2019-02-24 23:03:00 -08:00
d76b9395a0 Remove ROIPooling (#17434)
Summary:
Fixes: #17399

It's undocumented, unused and, according to the issue, not actually working.

Differential Revision: D14200088

Pulled By: soumith

fbshipit-source-id: a81f0d0f5516faea2bd6aef5667b92c7dd012dbd
2019-02-23 21:00:10 -08:00
d80f0a1f3a Add example to WeightedRandomSampler doc string (#17432)
Summary: Examples for the weighted random sampler are missing [here](https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler).
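
Roughly the kind of example being added (my sketch, not the exact docstring text):

```python
from torch.utils.data import WeightedRandomSampler

# Indices are drawn in proportion to their weights, with replacement.
weights = [0.1, 0.9, 0.4, 0.7, 3.0, 0.6]
sampler = WeightedRandomSampler(weights, num_samples=5, replacement=True)
print(list(sampler))  # e.g. [4, 4, 1, 4, 3]
```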

Differential Revision: D14198642

Pulled By: soumith

fbshipit-source-id: af6d8445d31304011002dd4308faaf40b0c1b609
2019-02-23 20:29:06 -08:00
96b765dcf6 Revert D14095703: [pytorch][PR] [jit] Add generic list/dict custom op bindings
Differential Revision:
D14095703

Original commit changeset: 2b5ae20d42ad

fbshipit-source-id: 85b23fe4ce0090922da953403c95691bf3e28710
2019-02-23 15:55:08 -08:00
a1ca908ac2 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 8fa0be05e7410a863febb98b18be55ab723a41db
2019-02-23 12:45:50 -08:00
bb3a2d99ac Jaliyae/chunk buffer fix (#17409)
Summary:
The chunk buffer could hang when no data was read and the buffer size was lower than the chunk size. We detected this while running with a larger dataset, hence the fix. I added a test to mimic the situation and validated that the fix works. Thank you Xueyun for finding this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17409

Differential Revision: D14198546

Pulled By: soumith

fbshipit-source-id: b8ca43b0400deaae2ebb6601fdc65b47f32b0554
2019-02-23 08:48:53 -08:00
5ea6344c54 Skip test_event_handle_multi_gpu() on a single GPU system (#17402)
Summary:
This fixes the following test failure:

```
======================================================================
ERROR: test_event_handle_multi_gpu (__main__.TestMultiprocessing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_multiprocessing.py", line 445, in test_event_handle_multi_gpu
    with torch.cuda.device(d1):
  File "/home/stefan/rel/lib/python3.7/site-packages/torch/cuda/__init__.py", line 229, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /home/stefan/pytorch/torch/csrc/cuda/Module.cpp:33
```
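
The usual guard pattern for such a test (a sketch; the body is elided):

```python
import unittest
import torch

class TestMultiprocessing(unittest.TestCase):
    @unittest.skipIf(torch.cuda.device_count() < 2, "needs at least 2 GPUs")
    def test_event_handle_multi_gpu(self):
        ...  # body elided
```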
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17402

Differential Revision: D14195190

Pulled By: soumith

fbshipit-source-id: e911f3782875856de3cfbbd770b6d0411d750279
2019-02-23 08:29:36 -08:00
be4ad3fe30 fix(typo): Change 'integeral' to 'integer'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17396

Differential Revision: D14195023

Pulled By: soumith

fbshipit-source-id: 300ab68c24bfbf10768fefac44fad64784463c8f
2019-02-23 08:22:01 -08:00
6a99f86429 Fix the ONNX expect file (#17430)
Summary:
The CI is broken right now; this diff should fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17430

Differential Revision: D14198045

Pulled By: houseroad

fbshipit-source-id: a1c8cb5ccff66f32488702bf72997f634360eb5b
2019-02-23 00:02:02 -08:00
674e11ccde order caffe2 ubuntu configs contiguously (#17427)
Summary:
This involves another purely cosmetic (ordering) change to the `config.yml` to facilitate simpler logic.

Other changes:
* add some review feedback as comments
* exit with nonzero status on config.yml mismatch
* produce a diagram for pytorch builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17427

Differential Revision: D14197618

Pulled By: kostmo

fbshipit-source-id: 267439d3aa4c0a80801adcde2fa714268865900e
2019-02-22 20:18:29 -08:00
c10662962c remove redundant inference functions for FC (#17407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17407

As title says

Reviewed By: csummersea

Differential Revision: D14177921

fbshipit-source-id: e48e1086d37de2c290922d1f498e2d2dad49708a
2019-02-22 20:13:20 -08:00
08fed51926 optimize max pool 2d (#17418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17418

Retry of D14181620 this time with CMakeLists.txt changes

Reviewed By: jianyuh

Differential Revision: D14190538

fbshipit-source-id: c59b1bd474edf6376f4c2767a797b041a2ddf742
2019-02-22 19:43:57 -08:00
340cf2a2dd Generate derived extension backend Type classes for each scalar type (#17278)
Summary:
Previously we only generated one class for each extension backend. This caused issues with scalarType() calls and mapping from variable Types to non-variable types. With this change we generate one Type for each scalar type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17278

Reviewed By: ezyang

Differential Revision: D14161489

Pulled By: li-roy

fbshipit-source-id: 91e6a8f73d19a45946c43153ea1d7bc9d8fb2409
2019-02-22 18:38:33 -08:00
47263e48f4 Better handling of net errors in prof_dag counters (#17384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17384

Better handling of possible net run errors in prof_dag counters.

Reviewed By: yinghai

Differential Revision: D14177619

fbshipit-source-id: 51bc952c684c53136ce97e22281b1af5706f871e
2019-02-22 18:38:31 -08:00
d8d8371bd3 Batch of Expect Files removal (#17414)
Summary:
Batch of expect file removals, plus removal of some tests that no longer tested anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17414

Differential Revision: D14196342

Pulled By: eellison

fbshipit-source-id: 75c45649d1dd1ce39958fb02f5b7a2622c1d1d01
2019-02-22 18:11:51 -08:00
c65b0cbe3d Fix target name.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17365

Differential Revision: D14195831

Pulled By: soumith

fbshipit-source-id: fdf03f086f650148c34f4c548c66ef1eee698f05
2019-02-22 17:27:16 -08:00
4491577fb5 jit technical docs - parts 1, 2, and most of 3 (#16887)
Summary:
This will evolve into complete technical docs for the JIT. Posting what I have so far so people can start reading it and offering suggestions. Go to Files Changed and click 'View File' to see the markdown rendered.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16887

Differential Revision: D14191219

Pulled By: zdevito

fbshipit-source-id: 071a0e7db05e4f2eb657fbb99bcd903e4f46d84a
2019-02-22 17:27:14 -08:00
9e69703dac USE_ --> BUILD_ for CAFFE2_OPS and TEST (#17390)
Differential Revision: D14195572

Pulled By: soumith

fbshipit-source-id: 28e4ff3fe03a151cd4ed014c64253389cb85de3e
2019-02-22 17:19:44 -08:00
3ab2080047 Fix install libcaffe2_protos.a issue mentioned in #14317 (#17393)
Summary:
Fix install libcaffe2_protos.a issue mentioned in #14317.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17393

Differential Revision: D14195359

Pulled By: soumith

fbshipit-source-id: ed4da594905d708d03fcd719dc50aec6811d5d3f
2019-02-22 17:05:48 -08:00
1d05d0d848 Improve onnxifi backend init time (#17375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17375

Previously we created the onnxGraph first and took it to the onnx manager for registration. That doesn't work well in practice. This diff takes a "bring your own constructor" approach to reduce the resources spent doing backend compilation.

Reviewed By: kimishpatel, rdzhabarov

Differential Revision: D14173793

fbshipit-source-id: cbc4fe99fc522f017466b2fce88ffc67ae6757cf
2019-02-22 16:58:30 -08:00
e984244828 fix code block typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17421

Differential Revision: D14194877

Pulled By: soumith

fbshipit-source-id: 6173835d833ce9e9c02ac7bd507cd424a20f2738
2019-02-22 16:34:12 -08:00
807632d402 Double resnet50 batch size in benchmark script (#17416)
Summary:
The benchmarks are now running on GPU cards with more memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17416

Differential Revision: D14190493

Pulled By: bddppq

fbshipit-source-id: 66db1ca1fa693d24c24b9bc0185a6dd8a3337103
2019-02-22 15:30:35 -08:00
6d744f8fbf Preserve names when converting to/from NetDef.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17378

Differential Revision: D14176515

Pulled By: ZolotukhinM

fbshipit-source-id: da9ea28310250ab3ca3a99cdc210fd8d1fbbc82b
2019-02-22 15:25:52 -08:00
dbd66c17bc Add generic list/dict custom op bindings (#17037)
Summary:
Fixes #17017
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17037

Differential Revision: D14095703

Pulled By: driazati

fbshipit-source-id: 2b5ae20d42ad21c98c86a8f1cd7f1de175510507
2019-02-22 14:49:43 -08:00
93e8b938ff fix test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17304

Differential Revision: D14151545

Pulled By: eellison

fbshipit-source-id: d85535b709c58e2630b505ba57e9823d5a59c1d5
2019-02-22 14:43:23 -08:00
9aae82bc2c Improvements for current AD (#17187)
Summary:
This PR removes a few sizes of `self` that were passed from the forward pass to the backward pass when `self` is already required in the backward pass. This could be the reason for the potential slowdown in #16689. I will attach a few perf numbers (still a bit volatile across runs, though) in a comment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17187

Differential Revision: D14179512

Pulled By: ailzhang

fbshipit-source-id: 5f3b1f6f26a3fef6dec15623b940380cc13656fa
2019-02-22 14:34:14 -08:00
e422b27f17 Bump up the producer version in ONNX exporter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17410

Reviewed By: zrphercule

Differential Revision: D14187821

Pulled By: houseroad

fbshipit-source-id: a8c1d2f7b6ef63e7e92cba638e90922ef98b8702
2019-02-22 14:28:59 -08:00
b18c60839d list add insert and remove (#17200)
Summary:
See https://github.com/pytorch/pytorch/issues/16662
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17200

Differential Revision: D14144020

Pulled By: driazati

fbshipit-source-id: c9a52954fd5f4fb70e3a0dc02d2768e0de237142
2019-02-22 14:12:56 -08:00
9977f43d19 Pin nightly builds to last commit before 5am UTC (#17381)
Summary:
This fell through the cracks in the migration from pytorch/builder to CircleCI. It's technically still racy, but much less likely now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17381

Differential Revision: D14190137

Pulled By: pjh5

fbshipit-source-id: 2d4cd04ee874cacce47d1d50b87a054b0503bb82
2019-02-22 14:08:06 -08:00
356a94b64e Lazily load libcuda libnvrtc from c++ (#17317)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16860
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17317

Differential Revision: D14157877

Pulled By: zdevito

fbshipit-source-id: c37aec2d77c2e637d4fc6ceffe2bd32901c70317
2019-02-22 13:51:45 -08:00
81b43202ae Refactor Type Parser b/w Schemas & IRParser into a type common parser (#17383)
Summary:
Creates a new type parser to be shared between the IR parser and the schema parser.

Also adds parsing of CompleteTensorType and DimensionedTensorType, and feature-gates that for the IRParser.

Renames the existing type parser for Python annotations to python_type_parser, and names the new one jit_type_parser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17383

Differential Revision: D14186438

Pulled By: eellison

fbshipit-source-id: bbd5e337917d8862c7c6fa0a0006efa101c76afe
2019-02-22 13:43:55 -08:00
b0c18570ca add the support for stable ONNX opsets in exporter (#16068)
Summary:
Still WIP; needs more tests and correct handling for opset 8 in the symbolics.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16068

Reviewed By: zrphercule

Differential Revision: D14185855

Pulled By: houseroad

fbshipit-source-id: 55200be810c88317c6e80a46bdbeb22e0b6e5f9e
2019-02-22 12:05:17 -08:00
dd3acbc6d5 add readme and notice at the top of config.yml (#17323)
Summary:
reorder some envars for consistency

add readme and notice at the top of config.yml

generate more yaml from Python

closes #17322
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17323

Differential Revision: D14186734

Pulled By: kostmo

fbshipit-source-id: 23b2b2c1960df6f387f1730c8df1ec24a30433fd
2019-02-22 11:30:49 -08:00
0c24f3754b Revert D14181620: [caffe2/int8] optimize max pool 2d
Differential Revision:
D14181620

Original commit changeset: ffc6c4412bd1

fbshipit-source-id: 4391703164a672c9a8daecb24a46578765df67c6
2019-02-22 11:23:59 -08:00
60de0b885f fallback operators to CPU for onnx support (#15270)
Summary:
fallback operators to CPU for onnx support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15270

Differential Revision: D14099496

Pulled By: yinghai

fbshipit-source-id: 52b744aa5917700a802bdf19f7007cdcaa6e640a
2019-02-22 10:47:53 -08:00
4778a4089e optimize max pool 2d (#17391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17391

Optimize 2D max pool using AVX2 intrinsics.

Reviewed By: jianyuh

Differential Revision: D14181620

fbshipit-source-id: ffc6c4412bd1c1d7839fe06226921df40d9cab83
2019-02-22 10:36:19 -08:00
8a21c6a5ee Fixed the script for the THC generated files (#17370)
Summary:
As of right now, the script will produce a new generated file that is inconsistent with the rest.

Test Result:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17370

Differential Revision: D14184943

Pulled By: izdeby

fbshipit-source-id: 5d3b956867bee661256cb4f38f086f33974a1c8b
2019-02-22 09:42:43 -08:00
2b86cc442c Fix coalesce, clone, to_dense for sparse scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17379

Differential Revision: D14183641

Pulled By: gchanan

fbshipit-source-id: dbd071b648695d51502ed34ab204a1aee7e6259b
2019-02-22 09:02:37 -08:00
3d5968d366 Fix DataParallel(cpu_m).cuda() not working by checking at forward (#17363)
Summary:
Fixes #17362
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17363

Differential Revision: D14175151

Pulled By: soumith

fbshipit-source-id: 7b7e2335d553ed2133287deeaca3f6b6254aea4a
2019-02-22 08:31:36 -08:00
be6ad7ddde Rename BatchNorm running_variance to running_var (#17371)
Summary:
Currently there is a mismatch in naming between the Python BatchNorm `running_var` and the C++ BatchNorm `running_variance`, which causes JIT model parameter loading to fail (https://github.com/pytorch/vision/pull/728#issuecomment-466067138):
```
terminate called after throwing an instance of 'c10::Error'
  what():  No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame #6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame #11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```
Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem.

This is a BC-breaking change, but it should be easy for end user to rename `running_variance` to `running_var` in their call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17371

Reviewed By: goldsborough

Differential Revision: D14172775

Pulled By: yf225

fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
2019-02-22 08:00:25 -08:00
562fa55f3d Updating submodules
Reviewed By: zpao

fbshipit-source-id: ac16087a2b27b028d8e9def81369008c4723d70f
2019-02-21 22:52:41 -08:00
260f66c316 Fix concat dimension check bug (#17343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17343

See [post](https://fb.workplace.com/groups/1405155842844877/permalink/2630764056950710/)

Reviewed By: dzhulgakov

Differential Revision: D14163001

fbshipit-source-id: 038f15d6a58b3bc31910e7bfa47c335e25739f12
2019-02-21 19:34:30 -08:00
1fea60be25 Add dict to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16640

Differential Revision: D14178270

Pulled By: driazati

fbshipit-source-id: 581040abd0b7f8636c53fd97c7365df99a2446cf
2019-02-21 17:45:24 -08:00
2370c989d8 Add LSTM to standard library (#15744)
Summary:
**WIP**

Attempt 2 at #14831

This adds `nn.LSTM` to the jit standard library. Necessary changes to the module itself are detailed in comments. The main limitation is the lack of a true `PackedSequence`, instead this PR uses an ordinary `tuple` to stand in for `PackedSequence`.

Most of the new code in `rnn.py` is copied to `nn.LSTM` from `nn.RNNBase` to specialize it for LSTM since `hx` is a `Tuple[Tensor, Tensor]` (rather than just a `Tensor` as in the other RNN modules) for LSTM.

As a hack it adds an internal annotation `@_parameter_list` to mark that a function returns all the parameters of a module. The weights for `RNN` modules are passed to the corresponding op as a `List[Tensor]`. In Python this has to be gathered dynamically since Parameters could be moved from CPU to GPU or be deleted and replaced (i.e. if someone calls `weight_norm` on their module, #15766), but in the JIT parameter lists are immutable, hence a builtin to handle this differently in Python/JIT.
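
A hypothetical sketch of the kind of module this makes scriptable, using the ScriptModule API of this era (the module name, sizes, and structure are assumptions, not taken from the PR):

```python
import torch
import torch.nn as nn

class Encoder(torch.jit.ScriptModule):
    def __init__(self):
        super(Encoder, self).__init__()
        self.lstm = nn.LSTM(8, 16)  # input_size=8, hidden_size=16

    @torch.jit.script_method
    def forward(self, x):
        # for LSTM the hidden state is a Tuple[Tensor, Tensor],
        # unlike the single Tensor used by the other RNN modules
        out, state = self.lstm(x)
        h, c = state
        return out

enc = Encoder()
print(enc(torch.randn(5, 3, 8)).shape)  # torch.Size([5, 3, 16])
```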
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15744

Differential Revision: D14173198

Pulled By: driazati

fbshipit-source-id: 4ee8113159b3a8f29a9f56fe661cfbb6b30dffcd
2019-02-21 16:24:19 -08:00
ac00a0cd47 Dict mutability (#16884)
Summary:
Adds `aten::_set_item` for `dict[key]` calls
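
A minimal sketch of what this enables in script (the function name and values are illustrative):

```python
import torch
from typing import Dict

@torch.jit.script
def set_item(d, v):
    # type: (Dict[str, int], int) -> Dict[str, int]
    d["answer"] = v  # dict[key] assignment now lowers to aten::_set_item
    return d

print(set_item({"answer": 0}, 42))  # {'answer': 42}
```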
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16884

Differential Revision: D14000488

Pulled By: driazati

fbshipit-source-id: ea1b46e0a736d095053effb4bc52753f696617b2
2019-02-21 16:24:17 -08:00
3a47d56946 Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA (#16705) (#17337)
Summary:
Attempt #2 (attempt 1 is https://github.com/pytorch/pytorch/pull/16705 and got reverted because of CI failures)

Fixes https://github.com/pytorch/pytorch/issues/14805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17337

Differential Revision: D14175626

Pulled By: soumith

fbshipit-source-id: 66f2e10e219a1bf88ed342ec5c89da6f2994d8eb
2019-02-21 16:12:02 -08:00
290b2a1d9d Fix Insert Constant Lint Fail (#17316)
Summary:
The test I added was failing lint because a constant was being created that wasn't being destroyed.

It was being inserted into all_nodes and then failing the check
`      AT_ASSERT(std::includes(ALL_OF(sum_set), ALL_OF(all_nodes_set)));`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17316

Differential Revision: D14172548

Pulled By: eellison

fbshipit-source-id: 0922db21b7660e0c568c0811ebf09b22081991a4
2019-02-21 15:54:44 -08:00
4c6da649e5 Partial support for kwarg_only arguments in script (#17339)
Summary:
This provides the minimum necessary to allow derivative formulas for things that have a kwarg-only specifier in their schema. Support for non-parser frontend default arguments for kwargs is not yet complete.
Fixes #16921
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17339

Differential Revision: D14160923

Pulled By: zdevito

fbshipit-source-id: 822e964c5a3fe2806509cf24d9f51c6dc01711c3
2019-02-21 15:27:06 -08:00
5fa78303ed fix double backward for half softmax/logsoftmax (#17330)
Summary:
Fix for #17261. SsnL, do you have tests for it in your other PR? If not, I'll add them to this one. The example from #17261 now does not error out (and the same holds for log_softmax).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17330

Differential Revision: D14171529

Pulled By: soumith

fbshipit-source-id: ee925233feb1b44ef9f1d757db59ca3601aadef2
2019-02-21 14:58:45 -08:00
9101dfc57c Revisit some native functions to increase number of jit matches (#17340)
Summary:
Adds about 30 matches due to new functions / misuse of double.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17340

Differential Revision: D14161109

Pulled By: cpuhrsch

fbshipit-source-id: bb3333446b32551f7469206509b480db290f28ee
2019-02-21 14:41:06 -08:00
46f15b74b7 Add Value::isValidName method. (#17372)
Summary:
The method will be used in IRParser and in NetDef converter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17372

Differential Revision: D14172494

Pulled By: ZolotukhinM

fbshipit-source-id: 96cae8422bc73c3c2eb27524f44ec1ee8cae92f3
2019-02-21 14:34:17 -08:00
6feded880e Fix #17218 by updating documentation (#17258)
Summary:
Fix Issue #17218 by updating the corresponding documentation in [BCEWithLogitsLoss](https://pytorch.org/docs/stable/nn.html#torch.nn.BCEWithLogitsLoss)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17258

Differential Revision: D14157336

Pulled By: ezyang

fbshipit-source-id: fb474d866464faeaae560ab58214cccaa8630f08
2019-02-21 14:17:35 -08:00
45251fb52e fix lint (#17366)
Summary:
fix lint
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17366

Differential Revision: D14171702

Pulled By: soumith

fbshipit-source-id: 5d8ecfac442e93b11bf4095f9977fd3302d033eb
2019-02-21 13:39:53 -08:00
3145f46a22 switch to Operation in register_prim_ops.cpp (#17183)
Summary:
This PR switches from `OperationCreator` to `Operation` to simplify the logic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17183

Differential Revision: D14169829

Pulled By: Krovatkin

fbshipit-source-id: 27f40a30c92e29651cea23f08b5b1f13d7eced8c
2019-02-21 12:45:25 -08:00
b312e9de6a Use standard docker image for XLA build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17287

Differential Revision: D14169689

Pulled By: kostmo

fbshipit-source-id: 24e255be23936542093008ed51d2c061b2924993
2019-02-21 11:56:23 -08:00
25730f15bb Modernize test_sparse. (#17324)
Summary:
Our sparse tests still almost exclusively use legacy constructors.  This means you can't, for example, easily test scalars (because the legacy constructors don't allow them), and not surprisingly, many operations are broken with sparse scalars.

Note: this doesn't address the SparseTensor constructor itself, because there is a separate incompatibility there that I will address in a follow-on commit, namely, that torch.sparse.FloatTensor() is supported, but torch.sparse_coo_tensor() is not (because the size is ambiguous).

The follow-on PR will explicitly set the size for sparse tensor constructors and add a test for the legacy behavior, so we don't lose it.

Included in this PR are changes to the constituent sparse tensor pieces (indices, values):
1) IndexTensor becomes index_tensor
2) ValueTensor becomes value_tensor if it is a data-based construction, else value_empty.
3) Small changes around using the legacy tensor type directly, e.g. torch.FloatTensor.dtype exists, but torch.tensor isn't a type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17324

Differential Revision: D14159270

Pulled By: gchanan

fbshipit-source-id: 71ee63e1ea6a4bc98f50be41d138c9c72f5ca651
2019-02-21 11:40:43 -08:00
c63af8837d remove nn.Upsample deprecation warnings from tests (#17352)
Differential Revision: D14168481

Pulled By: soumith

fbshipit-source-id: 63c37c5f04d2529abd4f42558a3d5e81993eecec
2019-02-21 11:27:24 -08:00
3069c45069 upgrade documentation in setup.py to NO_ -> USE_ (#17333)
Summary:
fixes https://github.com/pytorch/pytorch/issues/17265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17333

Differential Revision: D14168483

Pulled By: soumith

fbshipit-source-id: a79f4f9d9e18cb64e2f56f777caa69ae92d2fa4b
2019-02-21 10:25:43 -08:00
5744d5213d Enforce non-negativity of tensor construction (#17077)
Summary:
Apparently, before this change the only place we enforced it was the size >= 0 check in alloc_cpu. So empty((5,-5)) would fail but empty((-5,-5)) would hang :)

Please suggest a better place to enforce it, if any.
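
A quick illustration of the enforced behavior (a sketch assuming both shapes now raise a RuntimeError up front):

```python
import torch

for shape in [(5, -5), (-5, -5)]:
    try:
        torch.empty(shape)
    except RuntimeError:
        print("rejected", shape)  # both shapes are now rejected, neither hangs
```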
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17077

Differential Revision: D14077930

Pulled By: dzhulgakov

fbshipit-source-id: 1120513300fd5448e06fa15c2d72f9b0ee5734e4
2019-02-21 09:28:53 -08:00
94a95a0c7f Fixing docstring in CTCLoss (#17307)
Summary:
The argument `zero_infinity` is in the wrong place! :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17307

Differential Revision: D14154850

Pulled By: ezyang

fbshipit-source-id: 7a9fe537483b23041f21ba1b80375b7f44265538
2019-02-21 08:13:28 -08:00
de81a2741f Fix the slowness of mvn's log_prob (#17294)
Summary:
This PR addresses the slowness of MVN's log_prob as reported in #17206.

t-vi I find it complicated to handle permutation dimensions if we squeeze singleton dimensions of bL, so I leave it as-is and keep the old approach. What do you think?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17294

Differential Revision: D14157292

Pulled By: ezyang

fbshipit-source-id: f32590b89bf18c9c99b39501dbee0eeb61e130d0
2019-02-21 08:04:33 -08:00
722cbe3064 Move argsort to C++
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17099

Differential Revision: D14165671

Pulled By: ezyang

fbshipit-source-id: 3871de6874fe09871ebd9b8943c13c9af325bf33
2019-02-21 07:59:27 -08:00
37890610b0 Include vec256 headers in setup.py (#17220)
Summary:
Fix #16650.

Headers such as `ATen/cpu/vml.h` contain `#include <ATen/cpu/vec256/vec256.h>`
for example, but these vec256 headers aren't included, due to commit e4c0bb1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17220

Differential Revision: D14165695

Pulled By: ezyang

fbshipit-source-id: 27b2aa2a734b3719ca4af0565f79623b64b2620f
2019-02-21 07:37:01 -08:00
5106918656 Enable MAX_JOBS for using Ninja on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17341

Differential Revision: D14164740

Pulled By: soumith

fbshipit-source-id: 7a1c3db0a7c590f72a777fcd32e1c740bb0c6257
2019-02-21 04:40:17 -08:00
29f4f8f048 Avoid unnecessary CPU-to-GPU copy of torch.load with CUDA (#17297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17297

When `torch.load` needs to load a tensor, no matter which device it will end up being loaded on, it first creates a CPU storage for it of the necessary size. This storage is allocated but it's not "set" yet, hence no data is written to it: it exists in the kernel's memory map, but it's not resident and doesn't take up physical pages. Then, this storage is passed to the `map_location` function (if the parameter is a string, a device or a map, PyTorch builds that function automatically). The default map for CUDA is effectively `lambda storage, _: storage.cuda()` (I omitted the code needed to pick the correct device). This creates a GPU storage and copies over the data of the CPU storage. *This step is unnecessary as we're copying uninitialized memory*. (Surprisingly enough, though, it appears the kernel is smart enough that reading from the unpaged CPU memory doesn't cause it to become paged.) Once `map_location` returns a storage residing on the correct target device, `torch.load` resumes reading the file and copying the tensor's content over into the storage. This will overwrite the content that had previously been written to it, which confirms that the above copy was pointless.

A way to avoid this useless copy is to just create and return a new empty storage on the target GPU, instead of "transforming" the original one.

This does indeed increase the performance:
```
In [5]: torch.save(torch.rand(100, 100, 100), "/tmp/tensor")

In [6]: %timeit torch.load("/tmp/tensor", map_location="cuda")
1.55 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: %timeit torch.load("/tmp/tensor", map_location=lambda storage, _: torch.cuda.FloatStorage(storage.size()))
1.03 ms ± 44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Credit for this diff is shared with adamlerer and fmassa.

Differential Revision: D14147673

fbshipit-source-id: a58d4bc0d894ca03a008499334fc2cdd4cc91e9f
2019-02-21 01:32:19 -08:00
2c302b6ea6 allow lists to contain any tensor type (#17321)
Summary:
If something is a TensorList, it should be a list of `TensorType`, not a list of some specialized type.
Fixes #17140, #15642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17321

Differential Revision: D14158192

Pulled By: suo

fbshipit-source-id: ba8fe6ae8d618c73b23cd00cbcb3111c390c5514
2019-02-21 00:18:50 -08:00
d92ddcf7ca Skip convnets benchmark in rocm CI (#17331)
Summary:
random coredump
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17331

Differential Revision: D14162018

Pulled By: bddppq

fbshipit-source-id: 3ed15a79b7bca2498c50f6af80cbd6be7229dea8
2019-02-20 21:12:24 -08:00
b3b692a80a Don't have malloc-free pairs that cross DLL boundaries. (#17302)
Summary:
See https://blogs.msdn.microsoft.com/oldnewthing/20060915-04/?p=29723
for more background on this requirement on Windows.

Fixes #17239.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc xkszltl peterjc123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17302

Differential Revision: D14150067

Pulled By: ezyang

fbshipit-source-id: 9dc16ca781ff17515b8df1bb55492477e7843d4c
2019-02-20 20:31:41 -08:00
c063a33ef3 Add support to build for multiple amd gpu targets (#17329)
Summary:
iotamudelta petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17329

Differential Revision: D14161277

Pulled By: bddppq

fbshipit-source-id: f3eb9f52e96a8fcd779c57df0f8c9a2c54754e35
2019-02-20 18:45:24 -08:00
501d346da8 batched cleanups (#17288)
Summary:
Bunch of random stuff I came across while doing UDT stuff. Putting in a separate PR to avoid noise
- fix up the alias analysis list ops to include fork/wait
- improve dump() for aliasDb to print writes
- Move BuiltinFunction::call() to sugaredvalue with the rest of the methods
- formatting and includes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17288

Differential Revision: D14147105

Pulled By: suo

fbshipit-source-id: 62e2a922a1726b684347365dc42c72188f154e9c
2019-02-20 18:31:53 -08:00
cb3d3d1115 (Permanently) fix CI breakage due to new docker version. (#17338)
Summary:
Pull request resolved: https://github.com/pytorch/pytorch/pull/17338

See comment in config.yml for details.

build-break

Reviewed By: orionr

Differential Revision: D14160934

fbshipit-source-id: a91160ab15dd6c174a7d946a78a7d2d50ae0a011
2019-02-20 18:00:11 -08:00
376bb40379 Implementation convolutionTranspose operator for mkl-dnn (#12866)
Summary:
The speed-up of a single operation is up to 2-3X on BDW.
This PR depends on https://github.com/pytorch/pytorch/pull/14308
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12866

Differential Revision: D13936110

Pulled By: ezyang

fbshipit-source-id: 34e3c2ca982a41e8bf556e2aa0477c999fc939d3
2019-02-20 17:26:10 -08:00
c02e2ff0b0 Support multi-device configuration for MKL-DNN (#12856)
Summary:
MKL-DNN supports multi-node mode but not multi-device mode; this commit adds multi-device support for MKL-DNN. This commit depends on https://github.com/pytorch/pytorch/pull/11330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12856

Differential Revision: D13735075

Pulled By: ezyang

fbshipit-source-id: b63f92b7c792051f5cb22e3dda948013676e109b
2019-02-20 16:57:43 -08:00
90950f79c7 fix missing std (#17263)
Summary:
Add a `std` qualifier that went missing in #16689. Investigating why this wasn't caught in CI (or in my local dev environment).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17263

Reviewed By: ezyang

Differential Revision: D14134556

Pulled By: ailzhang

fbshipit-source-id: 6f0753fa858d3997e654924779646228d6d49838
2019-02-20 16:47:35 -08:00
0edc81136c Rethrow exceptions from RunAsync (#15034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15034

Rethrow exceptions that happen during RunAsync, and ensure that pending tasks
are not executed after the run is marked as finished

Reviewed By: andrewwdye

Differential Revision: D13409649

fbshipit-source-id: 3fd12b3dcf32af4752f8b6e55eb7a92812a5c057
2019-02-20 16:32:24 -08:00
0337494c6a Reinforce scheduling invariants (#17132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17132

The schedule() function is not supposed to throw exceptions and is supposed
to succeed in scheduling the full graph of tasks; potential errors (e.g. errors
from the underlying thread pool, out-of-memory exceptions, etc.) are considered
unrecoverable.
The invariant: the graph of tasks is either not executed at all or
executed in full before the call to finishRun()

Reviewed By: andrewwdye

Differential Revision: D14092457

fbshipit-source-id: a3e5d65dfee5ff5e5e71ec72bb9e576180019698
2019-02-20 16:32:23 -08:00
3e44880d4d Modify TileOp GPU implementation to expose more concurrency and better utilize GPU memory bandwidth (#17275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17275

The previous implementation used a memcpy inside the kernel. It is more efficient to reduce the data fetched per thread to a single word from memory; this exposes more concurrency and takes advantage of GPU memory coalescing support.

Reviewed By: takatosp1

Differential Revision: D14120147

fbshipit-source-id: c4734003d4342e55147c5b858f232a006af60b68
2019-02-20 16:02:14 -08:00
9e4a993878 Support str for native_functions.yaml schema
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17276

Differential Revision: D14154222

Pulled By: cpuhrsch

fbshipit-source-id: 411181da5399608c1d1f3218f8f570bb106c88ec
2019-02-20 15:47:06 -08:00
2e67b34ea7 Separate gpu reduce functions (#17146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17146

Separate gpu reduce functions

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14097564

fbshipit-source-id: a27de340997111a794b1d083c1673d4263afb9fb
2019-02-20 14:49:01 -08:00
474adf5458 Minor doc updates in c10/core/Allocator.h (#17164)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17164

Differential Revision: D14154393

Pulled By: ezyang

fbshipit-source-id: 59d8276d4bb4e7cadb4382769b75e5348ed388de
2019-02-20 14:36:15 -08:00
b2dde4386a Namedtuple return for symeig, eig, pstrf, qr, geqrf (#16950)
Summary: More ops for https://github.com/pytorch/pytorch/issues/394

Differential Revision: D14118645

Pulled By: ezyang

fbshipit-source-id: a98646c3ddcbe4e34452aa044951286dcf9df778
2019-02-20 14:01:19 -08:00
36ddad3bfe Allow PyTorch to be built without NCCL (#17295)
Summary:
With this patch you can use USE_DISTRIBUTED=OFF (possibly in combination with USE_NCCL=OFF).

This matters partly because NCCL doesn't build with CUDA 8.
This is written under the assumption that NCCL is required for distributed; if not, the USE_DISTRIBUTED check in nccl.py should be replaced by a check for the USE_NCCL environment variable.

Fixes: #17274
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17295

Differential Revision: D14155080

Pulled By: ezyang

fbshipit-source-id: 0d133f7c5b4d118849f041bd4d4cbbd7ffc3c7b4
2019-02-20 13:35:16 -08:00
e6cf3c886d add foxi submodule (#17184) 2019-02-20 16:25:05 -05:00
54e4c4d7de Removed obsolete argument correct_transform_coords in bbox_transform op. (#16723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16723

Removed obsolete argument correct_transform_coords in bbox_transform op.
* It was only for backward compatibility. We should not have models using it now.

Differential Revision: D13937430

fbshipit-source-id: 504bb066137ce408c12dc9dcc2e0a513bad9b7ee
2019-02-20 13:22:33 -08:00
075c7b1fef make the threshold for acurracy more precise (#17194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17194

We found that there is a per-row absolute error due to int8 quantization,
and a table-wide relative error when fp16 is used.

Reviewed By: csummersea

Differential Revision: D14113353

fbshipit-source-id: c7065aa9d15c453c2e5609f421ad0155145af889
2019-02-20 13:14:11 -08:00
db1d61a5c3 Add rule based filtering for ONNXIFI transformation (#17198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17198

We have reached the point where we need to apply some rules that bind certain ops together, to avoid un-inferrable intermediate shapes: we either lower them to the backend together or lower neither. This diff adds a pass that lets us add rules like this. The first rule binds `Gather` with `SparseLengthsWeighted*`.

Reviewed By: ipiszy

Differential Revision: D14118326

fbshipit-source-id: 14bc62e1feddae02a3dd8eae93b8f553d52ac951
2019-02-20 12:47:24 -08:00
63214b572b Updating submodules
Reviewed By: zpao

fbshipit-source-id: 4ee15707bcf8c23c2d7feb6987acecef4131d467
2019-02-20 09:38:12 -08:00
260facfdea caffe2 | added missing operator source file (#17272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17272

After the Windows-specific fixes were applied, a new file was left out of CMakeLists.

Reviewed By: orionr

Differential Revision: D14140419

fbshipit-source-id: 6a6c652048ed196ec20241bc2a1d08cbe2a4e155
2019-02-20 09:28:29 -08:00
a91e056f2a add list methods: copy,extend (#17092)
Summary:
This PR adds the following methods to Python's lists:

* copy
* extend

and tests
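
A minimal sketch of the new methods in script (the function name and data are illustrative):

```python
import torch
from typing import List

@torch.jit.script
def duplicate(xs):
    # type: (List[int]) -> List[int]
    ys = xs.copy()  # copy the list
    ys.extend(xs)   # append all elements of xs
    return ys

print(duplicate([1, 2]))  # [1, 2, 1, 2]
```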
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17092

Differential Revision: D14141817

Pulled By: Krovatkin

fbshipit-source-id: c89207f0f25f3d1d4ad903ee634745615d61d576
2019-02-20 09:24:25 -08:00
79f898263b Improve error message w/ size inference on empty tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17255

Differential Revision: D14143094

Pulled By: soumith

fbshipit-source-id: f96fa7f8eb6eaac72887d3e837546cbfa505f101
2019-02-20 09:12:26 -08:00
c3a23379ea add install step and docs for Android build (#17298)
Summary:
This commit makes the following enhancements:
1. Adds docs for build_android.sh.
2. Adds an install step to build_android.sh, so the headers and libraries can be conveniently collected together for further use.
3. Changes the default INSTALL_PREFIX from $PYTORCH_ROOT/install to $PYTORCH_ROOT/build_android/install, to keep the project directory clean.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17298

Differential Revision: D14149709

Pulled By: soumith

fbshipit-source-id: a3a38cb41f26377e21aa89e49e57e8f21c9c1a39
2019-02-20 07:05:24 -08:00
1b3315ec17 improve libtorch install docs with GPU note (#17299)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15702
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17299

Differential Revision: D14149712

Pulled By: soumith

fbshipit-source-id: 5b83110bb00e4d4dad04c1f293c2b52e41711f11
2019-02-20 06:30:08 -08:00
237e5438f5 Add launch bounds for TopK kernel, be more conservative in sorting (#17296)
Summary:
The particular use case reported is Jetson TX2 and maskrcnn.

Fixes #17144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17296

Differential Revision: D14147886

Pulled By: soumith

fbshipit-source-id: 44d5a89aaeb4cc07d1b53dd90121013be93c419c
2019-02-20 03:10:46 -08:00
b8d1f4a423 ONNX Export Maxpool Indices
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16455

Differential Revision: D14140375

Pulled By: houseroad

fbshipit-source-id: 12d02c447e7fe0fae49969d1daf40a87660ed416
2019-02-19 21:10:14 -08:00
4f45bc73f7 Revert D14144264: [pytorch][PR] [jit] clean up print from test
Differential Revision:
D14144264

Original commit changeset: eec837d29c46

fbshipit-source-id: ad91cb1d047fd34967385b661a6757111f92026e
2019-02-19 18:56:23 -08:00
f8c9ec5e44 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 68a648b2136823994f02fa5b567a2656494f6dd3
2019-02-19 17:50:40 -08:00
5be3ffbde2 clean up print from test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17279

Differential Revision: D14144264

Pulled By: suo

fbshipit-source-id: eec837d29c46e96be37c54192a841046b486cb8b
2019-02-19 17:46:41 -08:00
428b666814 Fix dll loading process in newer Python on Windows (#17191)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17051.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17191

Differential Revision: D14138427

Pulled By: kostmo

fbshipit-source-id: 9f207105161ad0312eb09fd86072afd5f22de785
2019-02-19 17:16:41 -08:00
972fc5f191 Fix dll loading issue for Caffe2 and Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17215

Reviewed By: orionr

Differential Revision: D14138445

Pulled By: kostmo

fbshipit-source-id: 0bb4f2f1ed5bda7416ba7e4c6b0618414b328934
2019-02-19 17:04:06 -08:00
2b57bdb7ab Fix cuda softmax backward with empty input (#17259)
Summary:
Fixes #17256
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17259

Differential Revision: D14142196

Pulled By: soumith

fbshipit-source-id: 1f2dc202951b59b43da27684f9f924314bcd3040
2019-02-19 16:41:52 -08:00
Jie
594a4d7b55 at::native batch norm kernel launch config update (#17047)
Summary:
Limit the block dimension to avoid a launch-configuration error in the batch norm kernel.

This should resolve #16998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17047

Differential Revision: D14142132

Pulled By: soumith

fbshipit-source-id: 9c8c52dcd1d108cda1f65f5227e625b8fe6e12a0
2019-02-19 16:41:51 -08:00
6455d91e4d False alarm about leak in TestNN.test_variable_sequence_cuda (#17242)
Summary:
`TestNN.test_variable_sequence_cuda` sometimes breaks due to a CUDA leak.
The cause appears to be too small a tolerance breaking the float16 sub-test of the test above.
When it breaks, it calls abort, disrupting the correct teardown of the test
and raising a false alarm about the leak.

~~Also, removed annoying **Upsample** module warning.
IMHO this warning is wrong because the module **Upsample** is not deprecated. Seems like it's been mixed
with `nn.functional.upsample` function which is indeed deprecated in favor of `nn.functional.interpolate`, see `torch/nn/functional.py:2387` for details (this replacement is also performed in `test_nn.py`).~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17242

Differential Revision: D14141686

Pulled By: soumith

fbshipit-source-id: faa8f87440d94bdc6ab0ff00be6dad82353115c4
2019-02-19 15:59:30 -08:00
09c9af9451 U/kostmo/gen circle conf (#17189)
Summary:
Diagram preview:
![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/53040977-a0f88d00-3437-11e9-9190-796cc243e0f9.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17189

Differential Revision: D14141362

Pulled By: kostmo

fbshipit-source-id: 0625a1234d0307c6be79f17e756ddb1cc445b374
2019-02-19 15:37:09 -08:00
f827f9f77a update doc for multinomial (#17269)
Summary:
Update documentation to raise awareness of the fix in #12490. Thanks matteorr for pointing this out!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17269

Reviewed By: ezyang

Differential Revision: D14138421

Pulled By: ailzhang

fbshipit-source-id: 6433f9807a6ba1d871eba8e9d37aa6b78fa1e1fd
2019-02-19 15:30:52 -08:00
d73e6cb59d Automatic update of fbcode/onnx to 4c091e048ca42682d63ccd3c1811560bc12b732d (#17264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17264

Previous import was 822d8df0a2a32233c6022f50a158817a0f19bdc7

Included changes:
- **[4c091e0](https://github.com/onnx/onnx/commit/4c091e0)**: Support defined ONNX_ML in parent cmake files (#1821) <Lu Fang>
- **[57372f3](https://github.com/onnx/onnx/commit/57372f3)**: Delete OpsetVersionConverter.md which is a duplicate of VersionConverter.md (#1818) <Prasanth Pulavarthi>
- **[ab1c57e](https://github.com/onnx/onnx/commit/ab1c57e)**: [ONNXIFI]Add extension to be implementable (#1796) <Rui Zhu>
- **[b92eee8](https://github.com/onnx/onnx/commit/b92eee8)**: Revert "Implement Op Annotation's for ONNX (#1648)" (#1812) <Ke Zhang>
- **[61f1e9e](https://github.com/onnx/onnx/commit/61f1e9e)**: Enable ONNX_ML by default (#1810) <Shinichiro Hamaji>
- **[4f064a1](https://github.com/onnx/onnx/commit/4f064a1)**: fix Greater and Less doc (#1811) <Guoliang Hua>
- **[0628582](https://github.com/onnx/onnx/commit/0628582)**: Implement Op Annotation's for ONNX (#1648) <Armen>
- **[ad9d2f7](https://github.com/onnx/onnx/commit/ad9d2f7)**: Versioning doc update for Opset 9 (#1805) <Vinitra Swamy>
- **[e71e3be](https://github.com/onnx/onnx/commit/e71e3be)**: add dilation case for ConvTranspose op (#1797) <Randy>

Reviewed By: yinghai

Differential Revision: D14135024

fbshipit-source-id: 1e4f9dda89abf48994798d080dd5d58207a6e4b6
2019-02-19 14:54:34 -08:00
c88798dbc1 Make tril_ and triu_ actually in-place (#17031)
Summary:
Currently, when the input tensor `self` is not contiguous, `tril_` and `triu_` call `self = self.contiguous()`, which allocates a new contiguous tensor and assigns it to `self`. This effectively changes the input tensor `self`'s pointer and will break downstream code after the Variable/Tensor merge.

This PR fixes it so that `tril_` and `triu_` always update the input tensor in-place and preserve the input tensor's TensorImpl.
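
A minimal sketch of the invariant this restores (assuming a transposed, hence non-contiguous, input):

```python
import torch

x = torch.randn(4, 4).t()   # .t() makes the input non-contiguous
ptr = x.data_ptr()
x.tril_()                   # updates x in place even when non-contiguous
assert x.data_ptr() == ptr  # the input tensor's storage pointer is preserved
```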
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17031

Differential Revision: D14069592

Pulled By: yf225

fbshipit-source-id: d188218f426446a44ccc1d33fc28ac3f828c6a05
2019-02-19 14:47:17 -08:00
0fc03d155a Fix remaining -Wreturn-std-move violations in fbcode
Summary:
Some values are copied when they could have been moved.
Detected by the compiler flag -Wreturn-std-move.

Reviewed By: igorsugak

Differential Revision: D14134303

fbshipit-source-id: 8fc3bb2017108b3d65097cb8447e33f5b6c743b4
2019-02-19 12:41:27 -08:00
89df22e57b Lightweight String check Utility (#16858)
Summary:
A lightweight implementation of the LLVM FileCheck utility. It currently only handles string matching; regexes and saving a regex match to a variable name can be added as needed.

The current intended usage is through the FileCheckBuilder python handle, and is shown in the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16858

Differential Revision: D14096244

Pulled By: eellison

fbshipit-source-id: c7c8d1457691c105e6ccbb3c1a378d96baac2569
2019-02-19 12:31:57 -08:00
82aa511146 move prim::None to prim::Constant (again) (#17186)
Summary:
Trying to land this again: make prim::None into a case of prim::Constant. The previous landing was reverted because it broke an important ONNX export test.

https://github.com/pytorch/pytorch/pull/16160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17186

Differential Revision: D14115304

Pulled By: eellison

fbshipit-source-id: 161435fc30460b4e116cdd62c7b2e5b94581dcb7
2019-02-19 11:45:50 -08:00
9ebc433bda Clarification of Lerp operation on tensors (#17253)
Summary: Clarified in the docs that the given `tensor` is used as `end`.
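
A quick illustration of the clarified semantics, where the passed tensor plays the role of `end` (values are illustrative):

```python
import torch

start = torch.zeros(3)
end = torch.tensor([1.0, 2.0, 3.0])
# out = start + weight * (end - start)
print(start.lerp(end, 0.5))  # tensor([0.5000, 1.0000, 1.5000])
```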

Differential Revision: D14132212

Pulled By: ezyang

fbshipit-source-id: e9bca14d5079e5f7adfc18afcb1eec832ef86e9e
2019-02-19 11:27:02 -08:00
19117f6a0a reenable rand_like fusion when there is no broadcast (#16087)
Summary:
Reenables rand_like fusion if no tensor is broadcast in the fusion group. This is a sufficient but not necessary condition for fused rand_like to produce correct results, and it has the unpleasant side effect of falling back to the non-fused path if rand_like was optimistically included in the fusion group but there is a broadcast in the group not necessarily related to rand_like. E.g. before this PR, if the network had (biasAdd -> relu -> dropout), the fuser could fuse biasAdd and relu; now it will try fusing the whole thing (if dropout is expressed via rand_like) and fall back every time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16087

Differential Revision: D13720232

Pulled By: zou3519

fbshipit-source-id: 1e19203bec4a59257bfc7078b054a19f00fab4ad
2019-02-19 11:12:25 -08:00
43d5cd4d34 discrepancy in smoke_macos_libtorch_2.7_cpu job spec (#17224)
Summary:
closes #17223
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17224

Reviewed By: pjh5

Differential Revision: D14121612

Pulled By: kostmo

fbshipit-source-id: bfd5a392de5e614031389725535756d7fa7db784
2019-02-19 10:14:21 -08:00
444039c47b Bool tensor. Part 0: Boolean storage implementation (#16810)
Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:

0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.

This feature was requested by the community:
https://github.com/pytorch/pytorch/issues/4764
https://github.com/pytorch/pytorch/issues/4219
https://github.com/pytorch/pytorch/issues/4288

**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.

**Tested via**:
1. unit tests
2. running this:
>>> import torch
>>> torch.BoolStorage
<class 'torch.BoolStorage'>
>>> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810

Reviewed By: gchanan

Differential Revision: D14087246

Pulled By: izdeby

fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
2019-02-19 08:22:13 -08:00
e81878e0a9 Correct padding and activations docstrings in nn module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17197

Differential Revision: D14131284

Pulled By: soumith

fbshipit-source-id: 6edd225b47b1dde81b5ad0a23c588c6621987a69
2019-02-19 08:16:52 -08:00
f2f4030294 Use move to avoid copying (#17188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17188

Using the flag "-Wreturn-std-move", the compiler can identify cases where a copy
operation is performed when a move operation would have been available. Wrapped the
return statements with std::move to fix this.

For some reason, these files are not automatically modded. With D14115372
we should be able to turn on the compile flag.

Reviewed By: soumith

Differential Revision: D14115786

fbshipit-source-id: e763b92eecbe4468027fc141d029618d1e9f280b
2019-02-19 07:14:27 -08:00
57617ee429 Replace resize_dim() with set_sizes_and_strides() (#17127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17127

Replace resize_dim() with set_sizes_and_strides() in   `THTensor_(squeeze1d) in aten/src/TH/generic/THTensor.cpp`

Reviewed By: ezyang

Differential Revision: D14088697

fbshipit-source-id: 518b72f7c0c4fbedf11a29a6ceb9fee8eefd9273
2019-02-19 07:04:17 -08:00
9477c143c6 C++ Frontend: adding two distributed samples (Random and Sequential) (#16910)
Summary:
Adding two distributed samplers, Random and Sequential, to the mix. Similar to its Python counterpart, DistributedSampler introduces a new method `set_epoch(size_t epoch)` which can be used to shuffle data deterministically between distributed processes.
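
For reference, a minimal sketch of how the Python counterpart uses `set_epoch` (the dataset and replica counts are illustrative):

```python
import torch
from torch.utils.data.distributed import DistributedSampler

data = torch.arange(8)
sampler = DistributedSampler(data, num_replicas=2, rank=0)
for epoch in range(2):
    sampler.set_epoch(epoch)    # seeds the shuffle deterministically per epoch
    print(list(iter(sampler)))  # this replica's share of the shuffled indices
```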
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16910

Differential Revision: D14130980

Pulled By: soumith

fbshipit-source-id: ec08b7130c01e2fc6dc3693f7ac622a0a6d60f10
2019-02-19 05:40:37 -08:00
8852e21245 Correct recurrent/linear/dropout/sparse layers docstrings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17238

Differential Revision: D14130811

Pulled By: soumith

fbshipit-source-id: d3998ca7da46aec5a59220c6af489f71f3d60735
2019-02-19 05:23:04 -08:00
fad9eda7fb Optional arg fixes (#17222)
Summary:
fixes #17210.
cc : ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17222

Differential Revision: D14130833

Pulled By: soumith

fbshipit-source-id: 19ff6020c47208e3436ae28cd16110a0f435b25e
2019-02-19 04:39:18 -08:00
7e5442f900 Reset grad attribute when called using del (#16525)
Summary:
`del Tensor.grad` sets the PyObject to nullptr,
while `Tensor.grad = None` sets the PyObject to Py_None.
Both cases are now handled.
Fixes #16471
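
A minimal sketch of the two spellings this makes consistent:

```python
import torch

x = torch.randn(3, requires_grad=True)
x.sum().backward()
del x.grad             # previously left the PyObject as nullptr
assert x.grad is None
x.sum().backward()
x.grad = None          # sets the PyObject to Py_None
assert x.grad is None  # both forms now behave the same
```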
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16525

Differential Revision: D14130800

Pulled By: soumith

fbshipit-source-id: ed85c38305bba94d5047311cb58e4e4cedd09832
2019-02-19 04:33:57 -08:00
9a7bcacc27 Logging stuffs (#17177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17177

Add more logging and flag.

Reviewed By: yinghai

Differential Revision: D14111643

fbshipit-source-id: 4b1c005faf41c21f59100bc401120c6970a24c42
2019-02-17 13:41:50 -08:00
3a01a45f06 Implement IRParser. (#16987)
Summary:
It might need some cleaning up and might be missing some features, but it should already work for most cases.

This PR is based on top of PR16986 (so please review only the last commit here).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16987

Differential Revision: D14074577

Pulled By: ZolotukhinM

fbshipit-source-id: 712b598f423265655f574bb9903e2066628eaad3
2019-02-16 20:23:50 -08:00
bf16a6bc3c Skip onnx logsoftmax tests in rocm (#17170)
Summary:
Similar to softmax, there are issues with randomly getting NaN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17170

Differential Revision: D14110515

Pulled By: bddppq

fbshipit-source-id: 5c97661184d45a02122fd69d35a839fdf4520c8c
2019-02-16 18:06:04 -08:00
b6b99fd7d3 Add namedtuple return for min, median, mode, kthvalue, add test for namedtuple return API (#16186)
Summary:
This partially fixes https://github.com/pytorch/pytorch/issues/394 and depend on https://github.com/pytorch/pytorch/pull/15429. I suggest to review this only after https://github.com/pytorch/pytorch/pull/15429 get landed, otherwise the diff might be large to review.

The test only allows explicitly whitelisted operators to have named return.

Differential Revision: D14070735

Pulled By: ezyang

fbshipit-source-id: ace2a672998b4e4a8094f52cbda5aa1cea6e3b42
2019-02-16 00:01:33 -08:00
b3d8c569d3 Remove templates for GenericDict
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17175

Differential Revision: D14113022

Pulled By: driazati

fbshipit-source-id: 5183e131cc8ccb58525875f76fa03133570a59ea
2019-02-15 21:35:19 -08:00
20fd6dca77 fix missing constant in adaptive_avg_pool2d AD (#17180)
Summary:
Thanks ngimel for pointing this out!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17180

Differential Revision: D14113001

Pulled By: ailzhang

fbshipit-source-id: 78e7d7f2cda3889138e2bf26a54980c2cc665882
2019-02-15 21:14:34 -08:00
6c06b32558 Implement NetDef <--> JIT IR converters. Try 2. (#17123)
Summary:
Currently the converters are very straightforward, i.e. there is no code that tries to
preserve semantics; we purely perform a conversion from one format to another.

Two things that we might want to add/change:

1. Add semantic conversion as well (but probably it would be a good idea to keep
   it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
   uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17123

Differential Revision: D14090244

Pulled By: ZolotukhinM

fbshipit-source-id: 07175fa9235582e1d1da5f10a42a5c1280b1b394
2019-02-15 20:39:30 -08:00
cde7204636 change the epsilon for fp32/fp16 to uint8 to be the same (#17062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17062

From jiyan's training jobs, it seems we found a quantization bug:

fp32 is fine
fp32 -> rowwise int8 is fine
fp16 is fine
fp16 -> rowwise int8 is not fine

We are preconverting everything to fp32 and using the existing code, so there is no need to change the epsilon in the fp16 case, since at the time of conversion everything is a float.

Reviewed By: jspark1105

Differential Revision: D14063271

fbshipit-source-id: 747297d64ed8c6fdf4be5bb10ac584e1d21a85e6
2019-02-15 18:33:37 -08:00
91c1d728ac Revert D14109636: [pytorch][PR] move prim::None to a case in prim::Constant
Differential Revision:
D14109636

Original commit changeset: d26fd3839761

fbshipit-source-id: c8c8113e2bff49ea93235732603e6ebc89356533
2019-02-15 16:38:12 -08:00
7caa21f5ca move prim::None to a case in prim::Constant (#16160)
Summary:
This change simplifies analysis done on constants since prim::None does not need to be handled separately now.  To check if a constant node is None, use node->isNone().

Next step will be to remove prim::Undefined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16160

Differential Revision: D14109636

Pulled By: eellison

fbshipit-source-id: d26fd383976163a2ddd4c24984bd672a541cc876
2019-02-15 16:27:57 -08:00
4fcab92d6c Move outplace ops to ATen (#16788)
Summary:
Based on https://github.com/pytorch/pytorch/pull/12413, with the following additional changes:

-  Inside `native_functions.yml` move those outplace operators right next to everyone's corresponding inplace operators for convenience of checking if they match when reviewing
- `matches_jit_signature: True` for them
- Add missing `scatter` with Scalar source
- Add missing `masked_fill` and `index_fill` with Tensor source.
- Add missing test for `scatter` with Scalar source
- Add missing test for `masked_fill` and `index_fill` with Tensor source by checking the gradient w.r.t source
- Add missing docs to `tensor.rst`

Differential Revision: D14069925

Pulled By: ezyang

fbshipit-source-id: bb3f0cb51cf6b756788dc4955667fead6e8796e5
2019-02-15 15:58:10 -08:00
5737c5259c Fix for 16939:multinomial performance regressed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17121

Differential Revision: D14088558

Pulled By: ifedan

fbshipit-source-id: e03583135f1e797fe1d8081ec5e9e6b63d4015c1
2019-02-15 15:44:41 -08:00
7157be8622 Add special ops for BatchNorm symbolic differentiation (#15403)
Summary:
The main problem with differentiating batch norm statically
is that we make a lot of complex run-time decisions about the backend
we choose. Then, the autograd derivatives are implemented for every
backend separately, which makes sense, because they might be saving
buffers containing different values. To resolve the issue, the forward
op returns an index of the chosen backend, and the backward function
takes it as an argument, such that it knows how to interpret the buffers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15403

Differential Revision: D14098815

Pulled By: ailzhang

fbshipit-source-id: 7fcd3e6e0566433e81fe8286fb441c1ecaf198ad
2019-02-15 15:40:28 -08:00
21696502ff improve error msg when module list isn't added to __constants__ (#17167)
Summary:
Adds a suggestion to add the attribute to __constants__ when a ModuleList or Sequential module is used as a tuple.

Addresses https://github.com/pytorch/pytorch/issues/13899
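
A hypothetical sketch of the pattern the improved message points users toward (the module name and sizes are assumptions, using the ScriptModule API of this era):

```python
import torch
import torch.nn as nn

class Stack(torch.jit.ScriptModule):
    __constants__ = ['layers']  # required so the loop below can be unrolled

    def __init__(self):
        super(Stack, self).__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])

    @torch.jit.script_method
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```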
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17167

Differential Revision: D14107688

Pulled By: eellison

fbshipit-source-id: 8c07d1f3e25a9c6bdcfd96dbf6b72c2130838278
2019-02-15 15:03:50 -08:00
1cdcdd78af Kaiming Initialization (#14718)
Summary:
/cc goldsborough

Working on #14582

The corresponding python implementations are at: [pytorch/torch/nn/init.py](6302e4001a/torch/nn/init.py (L261-L327))

Here is my initial implementation of Kaiming Initialization. I have not been able to figure out how to successfully run tests locally so I haven't added any yet.

A couple questions:
- Are the enums defined in the right place? I copied their names from Python, but do you prefer different naming conventions for C++?
- To run tests locally do I use `python setup.py test`? Can I run just a subset of the tests somehow?
- Should I add my tests at [test/cpp/api/misc.cpp](https://github.com/pytorch/pytorch/blob/master/test/cpp/api/misc.cpp#L47-L54)?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14718

Differential Revision: D14049159

Pulled By: goldsborough

fbshipit-source-id: 966ac5126875936e69b185b5041f16476ed4cf70
2019-02-15 14:58:22 -08:00
5eee0670ab Pass torch.distributed launch process local rank as environment variable instead of argument (#16360)
Summary:
In `torch.distributed.launch.py`, the local rank is passed as an argument (`local_rank`), and the user's program is required to parse it. However, it would be more flexible for users, and consistent with other variables such as `RANK`, `MASTER_PORT`, and `WORLD_SIZE`, to pass it through an environment variable.

265ed8ff45/torch/distributed/launch.py (L200-L212)
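
A hypothetical sketch of the user-side change, assuming the launcher exports a LOCAL_RANK environment variable instead of passing --local_rank:

```python
import os

# instead of parsing --local_rank with argparse:
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
```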
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16360

Differential Revision: D14070372

Pulled By: ezyang

fbshipit-source-id: c3f6a8e55ab513918cad09d1326eccdedb4d98c9
2019-02-15 14:52:55 -08:00
ea405f8d01 Assert cases exist for unschematized ops in alias analysis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16334

Differential Revision: D13901238

Pulled By: driazati

fbshipit-source-id: be99f89e7dc6a299b770ea92e217932a5271027d
2019-02-15 14:27:26 -08:00
8d33eb450e Fix avg pool2d api (#17166)
Summary:
Fix xla breakage (partially).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17166

Differential Revision: D14106954

Pulled By: ailzhang

fbshipit-source-id: 35ae6713272d0517b66da2ee9209f49015492b89
2019-02-15 13:58:30 -08:00
eba1b23ddd Fix syntax error in set instantiation (#17174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17174

Use curly braces syntax to avoid Lint complaint

Reviewed By: yf225

Differential Revision: D14111368

fbshipit-source-id: 44aa21deb9feededb94f23d92262a4164fe0cc1c
2019-02-15 13:58:29 -08:00
6454e3262d Make getting the dtype of a tensor work for backend extensions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17131

Differential Revision: D14093163

Pulled By: gchanan

fbshipit-source-id: 06638706e26505e3c741b7ae290000ca258599db
2019-02-15 13:47:37 -08:00
9b5d3f6f5e Stop reassigning (output) reference arguments in BinaryOps. (#17059)
Summary:
The binary ops that are using TensorIterator do a trick in order to only write the code once for out and non-out variants:

1) Have the non-out variant call the out variant with an undefined tensor.
2) the out variant then reassigns the result tensor to the output of the TensorIterator; this is a no-op in the case where a valid tensor was passed and it correctly propagates the result back to the non-out variant, which is legal because it's just reassigning an undefined tensor.

I believe other solutions to this problem would require an unnecessary reference bump, e.g. defining another out variant that returns a Tensor rather than a reference.

Unfortunately, this doesn't work with const-references, which we want to move our output arguments to be (because const doesn't actually provide const correctness here, and writers mistakenly reassign the parameter in the case it isn't an out variant).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17059

Differential Revision: D14068402

Pulled By: gchanan

fbshipit-source-id: 89fef177a1e174dbe2858e2eae0f6d85460b07d1
2019-02-15 13:41:04 -08:00
70ee257ad4 Fix batch insert (#17158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17158

Because of the Reshape op, the batch size can change. This diff addresses the first-order issue arising from the multiple-batch-size system: we need to export a different real_batch_size for each max_batch_size input and attach it to the right output.

It also fixes a false exception.

Reviewed By: ipiszy

Differential Revision: D14099541

fbshipit-source-id: 0fa9e86826f417a11d2b5dd2ee60dff64a7ce8c4
2019-02-15 12:28:23 -08:00
01686db21b Generate CircleCI config.yml from a script (#17039)
Summary:
This initial PR splits the `.circleci/config.yml` file into several smaller files that are stitched verbatim back into the original.  A proof of concept of dynamically generating yaml for the job configuration list is also introduced.

Since the `config.yml` file must exist in the repo in its final form, there must exist a manual update and check-in step to regenerate `config.yml` from its constituent parts.
Consistency between the checked-in `config.yml` file and the authoritative source data is enforced at build time through TravisCI.

closes #17038
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17039

Reviewed By: yf225

Differential Revision: D14109059

Pulled By: kostmo

fbshipit-source-id: bc04a73145290358854f5a5e552a45e559118fc3
2019-02-15 12:21:25 -08:00
82b269060c Add support for simpler for-in-list + tests (#16726)
Summary:
This PR adds support for simpler for-in-list loops, such as the example below:

```python
@torch.jit.script
def sum_list(a):
    # type: (List[int]) -> int
    sum = 0
    for i in a:
        sum += i

    return sum
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16726

Differential Revision: D14070007

Pulled By: ezyang

fbshipit-source-id: b4d971ee647729a6caa3099ceac34ec5c4f143de
2019-02-15 11:41:20 -08:00
326c891d32 Update pybind11 (#17143)
Summary:
Fixes #17130
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17143

Differential Revision: D14107386

Pulled By: zdevito

fbshipit-source-id: 1834d14bcdcad6857c199bf4fb8f67298394bbf3
2019-02-15 11:24:25 -08:00
472cfc0f2c Enforce module device at DataParallel construction time (#17129)
Summary:
closes #17065

CC douwekiela
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17129

Differential Revision: D14093353

Pulled By: mrshenli

fbshipit-source-id: 9a5a10f16e392337a7f7073223541cf69b402f82
2019-02-15 11:14:46 -08:00
b892f69440 one_hot docs missing (#17142)
Summary:
one_hot docs are missing [here](https://pytorch.org/docs/master/nn.html#one-hot).

I dug around and could not find a way to get this working properly.

Differential Revision: D14104414

Pulled By: zou3519

fbshipit-source-id: 3f45c8a0878409d218da167f13b253772f5cc963
2019-02-15 10:48:18 -08:00
38139bc356 add pop support to list (#17015)
Summary:
[WIP] add "pop" to list, see https://github.com/pytorch/pytorch/issues/16662
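
A minimal sketch of the new method in script (the function name and data are illustrative):

```python
import torch
from typing import List

@torch.jit.script
def pop_last(xs):
    # type: (List[int]) -> int
    return xs.pop()  # removes and returns the last element

print(pop_last([1, 2, 3]))  # 3
```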
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17015

Differential Revision: D14071680

Pulled By: eellison

fbshipit-source-id: b49a318059c1cc131acda50713132e11b562568f
2019-02-15 10:48:17 -08:00
7481cc9d7c Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: bbfb709d8681da60ccc9f3bafc6c296c32fcf835
2019-02-15 10:42:23 -08:00
dad0dbd3b9 merge fully_connected_rowwise_dnnlowp_op into fully_connected_dnnlowp_op (#17105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17105

To make FC with rowwise quantization faster, reduce code duplication, and make code consistent with Convolution

Reviewed By: csummersea

Differential Revision: D14080461

fbshipit-source-id: 2b0e67b86e7e3029c90751a8824bf80ae1223680
2019-02-15 09:50:11 -08:00
90fc6133b2 bug fix when we prepack weight and bias together (#17145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17145

Prepacked weight contains both weight and bias, so the bias should be obtained from input index 1, not from 2

Reviewed By: jianyuh

Differential Revision: D14097281

fbshipit-source-id: b8b836b85a7b240e2fd1734377c46d9bf2ce3390
2019-02-15 09:21:20 -08:00
fbd690c1fe caffe2: fix PinnedCPUAllocator cudaHostRegister() leak (#16340)
Summary:
In the NUMA case, PinnedCPUAllocator's allocate() would return a
DataPtr constructed by DefaultCPUAllocator, which would reference
DefaultCPUAllocator's Delete() rather than PinnedCPUAllocator's Delete(). That
meant the pinned Delete() would never run, so cudaHostUnregister()
would never be called when regions were freed.

See: https://github.com/pytorch/pytorch/issues/16280

This change adds a 'naked_allocate()' method to the Default allocator
that just returns a pointer to the allocated memory rather than
wrapping it in a DataPtr. Pinned allocator uses that then constructs
a DataPtr with reference to its own Delete().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16340

Reviewed By: dzhulgakov

Differential Revision: D13843206

Pulled By: ezyang

fbshipit-source-id: 9efb572e5a01b49ef2a4aceeccc13cd0b1066528
2019-02-15 07:02:33 -08:00
07b5782ff7 Add some missing docs to torch.rst, new unittest to enforce torch.rst no longer miss anything (#16039)
Summary:
This prevents people (reviewers, PR authors) from forgetting to add things to `torch.rst`.

When something new is added to `_torch_docs.py` or `functional.py` but intentionally not in `torch.rst`, people should manually whitelist it in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16039

Differential Revision: D14070903

Pulled By: ezyang

fbshipit-source-id: 60f2a42eb5efe81be073ed64e54525d143eb643e
2019-02-15 07:02:31 -08:00
a771a6ba67 Set the correct math type for cuDNN RNN (#16825)
Summary:
setting the correct math type for cudnn rnn, which is enforced starting from cudnn 7.5+

1. Updating persistent rnn check with input data type instead of rnn math type;
2. Updating rnn type promotion to set correct math type for accumulation;
3. Replace datatype check for filter descriptor from rnn.datatype to input.datatype;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16825

Differential Revision: D14071190

Pulled By: ezyang

fbshipit-source-id: 1c9a1531ccf510cb0619e830be444c20c5e72f3f
2019-02-15 07:02:30 -08:00
acf5ec07af Correct conv and pooling docstrings in nn module (#17052)
Summary:
This PR fixes the conv and pooling docstrings in the nn module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17052

Differential Revision: D14068566

Pulled By: ezyang

fbshipit-source-id: 3ec1de232ff6334b6a544dadefbb0ee6193d443a
2019-02-15 06:58:02 -08:00
6c67dcfb05 Fix AdaptiveLogSoftmaxWithLoss's constructor (#16694)
Summary:
t-ken1 and I are members of the same team.
I have added test code for the pull request https://github.com/pytorch/pytorch/pull/16656.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16694

Differential Revision: D14070106

Pulled By: ezyang

fbshipit-source-id: ff784dbf45e96a6bcf9a4b5cb9544a661a8acad2
2019-02-15 06:58:00 -08:00
48943c3b7a Update Upsample docs to match nn.interpolate
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17134

Reviewed By: ezyang

Differential Revision: D14095694

Pulled By: driazati

fbshipit-source-id: 79afec9ddd50b3b8ce39acf98c2543cf1a3d1127
2019-02-15 06:38:41 -08:00
f84165d20d Remove static_cast insertion/kernel argument extraction. (#17055)
Summary:
Now that the antistatic feature is part of the released ROCm 2.1, remove the
pyHIPIFY feature that extracts kernel arguments and inserts static_casts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17055

Differential Revision: D14068478

Pulled By: bddppq

fbshipit-source-id: 6895f490c78247a129aa18c520ff8d4d1a3d3642
2019-02-15 01:54:31 -08:00
b65c22c01a Upgrade mkl-dnn to v0.17.3 to fix core dump issue (#17107)
Summary:
Upgrade mkl-dnn to 0.17.3 to fix core dump issue in #16183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17107

Differential Revision: D14097600

Pulled By: yinghai

fbshipit-source-id: 2baa44e211ce37fbdf01585344c98745f5ba008c
2019-02-15 01:23:07 -08:00
056dd5b6de Updated bbox_transform and nms unit test for caffe2 ops. (#16722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16722

Updated bbox_transform and nms unit test for caffe2 ops.

Differential Revision: D13937416

fbshipit-source-id: 034743d29671c6e73d323a935e2d734ecc071bff
2019-02-15 00:21:55 -08:00
2634e306e4 Extend support for exporting reshape to onnx. (#16971)
Summary:
Resolve issue with reshape_as test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16971

Differential Revision: D14098871

Pulled By: houseroad

fbshipit-source-id: ed6b966821462d374313256abbbe27f96ce11b2c
2019-02-15 00:17:05 -08:00
f062f5fd4a add std to autodiff, and mean/var/std to operator set (#17137)
Summary:
supersedes #16684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17137

Differential Revision: D14096724

Pulled By: wanchaol

fbshipit-source-id: d801d70029a6a1f5851400ff4094c0299c102b2b
2019-02-14 23:18:53 -08:00
678a472ee5 Script module data parallel (#16891)
Summary:
support data parallel for ScriptModule.

See the unit tests for the testing done for this PR. I also tried a traced version of resnet18 from torchvision.

I have yet to try complete end-to-end data parallel training; that will be the next step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16891

Differential Revision: D14002222

Pulled By: gqchen

fbshipit-source-id: fce3598169113215599815c6978e66d3c3a8c282
2019-02-14 22:52:19 -08:00
0a975d333f add pre-packing operation in README.md (#17151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17151

As title

Reviewed By: jianyuh

Differential Revision: D14084272

fbshipit-source-id: e58c041e0374f6e82b337e5b6325ef06981ad8b4
2019-02-14 22:46:47 -08:00
a1f2ed008f Minor fix of the histogram observer in FBL eval flows (#17118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17118

Fix the bug in quantization eval workflow; Add mul_nets option in histogram observer pybind

Reviewed By: yinghai

Differential Revision: D14085321

fbshipit-source-id: 08e3153148522ebc9512a57144d9a8ad154bb6f8
2019-02-14 22:02:04 -08:00
5f6ecd14c4 more test coverage on emitIf none dispatch (#16794)
Summary:
Follow-up to #14533: add more test coverage for emitIf metaprogramming conditions. Also delete some unwrap-optional usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16794

Differential Revision: D14096868

Pulled By: wanchaol

fbshipit-source-id: ee1cec609c58d0dd65211249a90207be06649e71
2019-02-14 21:39:55 -08:00
91c50aeec6 Speed-up adaptive average pooling for the common case of size=1 output (#17011)
Summary:
When adaptive pooling has to produce a single-pixel feature map, it is faster to do so by calling .mean(). Backward otherwise calls a pretty inefficient CUDA kernel with atomics, which becomes ridiculously slow for half precision. For half, this PR provides an approx. 30x speed-up for adaptive average pooling, which results in a 30% end-to-end speed-up on SENet. Improvements are smaller for float, but still significant (approx. 5x).
Also this PR unifies handling of 3d (no batch dimension) and 4d tensors, using negative dimension indices.
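
A minimal sketch of the equivalence this exploits (illustrative; not the benchmark behind the numbers above):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16, 32, 32)
pooled = F.adaptive_avg_pool2d(x, 1)           # single-pixel output: (8, 16, 1, 1)
via_mean = x.mean(dim=(-2, -1), keepdim=True)  # the faster .mean() path
assert torch.allclose(pooled, via_mean)
```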
cc ezyang for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17011

Reviewed By: ailzhang

Differential Revision: D14078747

Pulled By: soumith

fbshipit-source-id: 0eb9255da2351190a6bcaf68c30e2ae2402a2dd9
2019-02-14 21:15:16 -08:00
7cff803d0a Improve example for torch.mode (#17069)
Summary:
This updates the example for `torch.mode` to show a case where there is a mode.
Also add a bit of a description to the explanation as well as being a bit more precise about "a" mode rather than "the" mode.
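
For reference, a small example of the kind the updated docs show (values here are illustrative, not the exact docs text):

```python
import torch

x = torch.tensor([1, 2, 2, 2, 3])
values, indices = torch.mode(x)
# values == tensor(2): 2 is *a* mode (a most frequent value) of x,
# and indices points at one of its occurrences
```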
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17069

Differential Revision: D14078722

Pulled By: soumith

fbshipit-source-id: 837a238d53a9b8e868511acbdc258633975bea48
2019-02-14 18:52:53 -08:00
58648a19df Create BackendTransformerBase to host common functions used for backend lowering (#17074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17074

There are some common functionalities in backend lowering. This diff creates a base class which hosts this common functionality.

Reviewed By: ipiszy

Differential Revision: D14073192

fbshipit-source-id: 9617603d0e73db6f7fcc5572756b9dbab506dae5
2019-02-14 17:57:03 -08:00
6a46738986 Fix android crash when model detects nothing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17119

Reviewed By: sf-wind

Differential Revision: D14087835

Pulled By: ZhizhenQin

fbshipit-source-id: 32e61d46679bae645fd0bbec724513cfa5c553ab
2019-02-14 17:29:14 -08:00
d61455cf40 Fix some documentation links in torch.tensor (#17109)
Summary:
Currently it's broken https://pytorch.org/docs/stable/tensors.html#torch.Tensor.norm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17109

Differential Revision: D14093567

Pulled By: ezyang

fbshipit-source-id: b167cde2150ee97ccf5689fcf50ff8157acfce10
2019-02-14 17:13:50 -08:00
5f866d0ea2 Apply modernize-use-override (2nd iteration)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14086124

fbshipit-source-id: 2005227d095d776ca3b4309a57f54e25782b9b58
2019-02-14 16:52:57 -08:00
f1da9892e9 Generalize catArray for contiguous inputs and dim != 0 (#17032)
Summary:
I noticed that we were sinking a lot of time into `cat` operations in machine translation on CPU, and drilled down to us doing the cat element-by-element, even though all the inputs were contiguous. The reason was we were doing the cat along a dimension that was not 0, and that caused us to not use the fast `memcpy` branch. This PR generalizes that branch.

Quick benchmark script:
```
import torch, time

tensors = [torch.rand(6, 2, 1024) for i in range(5)]

NITER = 1000
s = time.time()
for i in range(NITER):
    torch.cat(tensors, dim=1)
print('time per iter ', (time.time() - s) / NITER)
```

Before:
```
time per iter  8.089399337768554e-05
```

After:
```
time per iter  2.183413505554199e-05
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17032

Differential Revision: D14090038

Pulled By: jamesr66a

fbshipit-source-id: 2c733a84915896008ac95f2233f44894bd2573de
2019-02-14 16:33:23 -08:00
f3dd5563e4 fix test_jit canonicalize_tensor_iterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17104

Differential Revision: D14089928

Pulled By: wanchaol

fbshipit-source-id: 8b288514ab9ee8d24a11d39b75eef95783f28f20
2019-02-14 16:03:34 -08:00
65e06df24a Use new constructor in USE_SIMPLE_CTOR_DTOR (#17080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17080

This changes all operators using this macro to the new format

Reviewed By: dzhulgakov

Differential Revision: D14078628

fbshipit-source-id: 67048e485e326765fd49567cc008633d3d500d5c
2019-02-14 15:54:16 -08:00
6f2bcc9b4f Caffe2 TARGETS for HIP (#17076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17076

OSS: slightly change tools/amd_build/build_amd.py to add the output_directory for internal use. Also modify the renaming convention in the hipify script to reflect the updated rules.

Reviewed By: bddppq

Differential Revision: D13767218

fbshipit-source-id: cbcadc51daab42197d545f204840dcc18176bb3d
2019-02-14 15:45:21 -08:00
b0545aa85f maskrcnn & bert AD coverage part 1 (#16689)
Summary:
- Moved a few functions from the `autograd` namespace to the `aten` namespace to be visible from the JIT nativeResolver.
- Added a hack to look up keyword-only arguments. Will add proper support for kw-only arguments later.
- Simulate function overload in aten using `_<number>` as function name suffix.
- Even when `forward` returns multiple outputs, as in `kthvalue`, we currently support at most one of them requiring grad.
- Removed the `TensorList`-related ops here since partial `TensorList` support is prone to bugs. Our symbolic diff for `cat` was never tested with autodiff, and it seems broken. Need to find another proper way to support these ops (either by properly supporting `TensorList` or something like `prim::ConstantChunk`) and leave them for the next PR.

Ops supported in this PR:
```
erf
expand_as
index
kthvalue
mean
permute
pow
rsub
select
sqrt
squeeze
t
to
topk
transpose
view
var
embedding
logsumexp
// grad is None
_dim_arange
contiguous
nonzero
ones_like
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16689

Differential Revision: D14020806

Pulled By: ailzhang

fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5
2019-02-14 15:36:39 -08:00
b5193b6a81 Second PR to restore reverted commit (#16224) (#17040)
Summary:
update:
  1. global_reduce now checks should_block_y_reduce first.
     This avoids enabling global_reduce without block_y_reduce, which led to
     accessing shared memory during global reduce without allocation.
  2. updating block_y_reduce heuristics. Improves perf on tiny tensors
  3. adding test case covering old cases where illegal memory access might occur

  TensorIterator cuda launch configs update (#16224)
    Update launch configs for TensorIterator gpu_reduce_kernel. Enable flexible
    block dimension to improve efficiency for reduction cases with small fast
    dimension.

    Previously, TensorIterator launched blocks with a fixed 32x16 thread configuration.
    For cases like:

      import torch
      torch.randn(2**20, 4, device='cuda').sum(0)

    The fixed launch config does not handle coalesced memory access efficiently.

    The updated launch config enables flexible block dimensions. Combined with the
    improved reduction scheme (using flexible vertical / horizontal reduction
    instead of limited warp / block reduction in the old code), it ensures optimal
    memory access pattern even with reduction on dimension with small stride.

    Possible future improvements:
    1. Precise dynamic shared memory allocation.
    2. Using warp shuffle for vertical (block_y) reduction.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/16224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17040

Differential Revision: D14078295

Pulled By: umanwizard

fbshipit-source-id: ecc55054a5a4035e731f0196d633412225c3b06c
2019-02-14 15:23:01 -08:00
b515ebc6f1 Remove fake inference for shape info in ONNXIFI transform (#17046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17046

As we are moving to use bound shape inference, we can remove the awkward fake inference run path and make the code cleaner.

Reviewed By: ipiszy

Differential Revision: D14061501

fbshipit-source-id: b3ace98b3dabef3c3359086a0bb1410518cefa26
2019-02-14 15:12:20 -08:00
0a5de6e972 Update alexnet expect.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17122

Reviewed By: colesbury

Differential Revision: D14090209

Pulled By: gchanan

fbshipit-source-id: 78c5961dd7d752b237782b6ed90c376bbd6d3145
2019-02-14 14:45:02 -08:00
ff2053dfa1 add clear functionality to list (#17050)
Summary:
Add clear functionality to list. See #16662

```python
import torch

@torch.jit.script
def foo():
    a = [1, 2, 3, 4]
    a.clear()

    return a
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17050

Differential Revision: D14071799

Pulled By: driazati

fbshipit-source-id: 305551c16f7db127c43de0ad5885d9f10678e101
2019-02-14 14:38:11 -08:00
d52862ca81 Moderate the dim type after LengthsRangeFill (#17096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17096

LengthsRangeFill takes a batch-sized lengths input and expands it into a sequence. Later ops should follow this type until they hit another batch-type-moderating op, e.g. SparseLengthsSum.

Reviewed By: ipiszy

Differential Revision: D14079422

fbshipit-source-id: 1a26925d502c32875ea95c160268bf6a256cc955
2019-02-14 14:28:27 -08:00
016f212357 fix behavior of ConcatDataset w/ negative indices (#15756)
Summary:
Currently, when you pass a negative index to a `Dataset` created with `ConcatDataset`, it simply passes that index to the first dataset in the list. So if, for example, we took `concatenated_dataset[-1]`, this will give us the last entry of the *first* dataset, rather than the last entry of the *last* dataset, as we would expect.

This is a simple fix to support the expected behavior for negative indices.
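
A minimal sketch of the fixed behavior:

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

first = TensorDataset(torch.arange(3))      # entries 0, 1, 2
last = TensorDataset(torch.arange(10, 13))  # entries 10, 11, 12
combined = ConcatDataset([first, last])

# With this fix, -1 indexes the last entry of the *last* dataset
assert combined[-1][0].item() == 12
```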
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15756

Reviewed By: ezyang

Differential Revision: D14081811

Pulled By: fmassa

fbshipit-source-id: a7783fd3fd9e1a8c00fd076c4978ca39ad5a8a2a
2019-02-14 13:02:54 -08:00
65d6f1014a Add support of count_include_pad and end-to-end test for AveragePool (#17034)
Summary:
Add support for count_include_pad and an end-to-end test for AveragePool.

We can export AveragePool from PyTorch with the count_include_pad attribute. However, we don't directly support it in Caffe2's ONNX backend.
We also want to check whether we can pass the end-to-end test for the average pool operator with the count_include_pad attribute (PyTorch => ONNX => Caffe2).
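
A minimal sketch of the export path being tested (file name is arbitrary):

```python
import torch
import torch.nn as nn

pool = nn.AvgPool2d(kernel_size=3, stride=2, padding=1, count_include_pad=True)
x = torch.randn(1, 3, 8, 8)
torch.onnx.export(pool, x, "avg_pool.onnx")  # count_include_pad is carried through to ONNX
```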
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17034

Reviewed By: houseroad

Differential Revision: D14060186

Pulled By: dwarakrajagopal

fbshipit-source-id: 10dae532611c71f8c8cfc3fa701cc7c1c1c02695
2019-02-14 11:48:42 -08:00
19addc7eb0 Support nonzero onnx export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17036

Differential Revision: D14079676

Pulled By: houseroad

fbshipit-source-id: 562b538dd9ab330c26f15fdb34c98dc7a23571a1
2019-02-13 23:52:42 -08:00
5a26579e27 Add more headers to setup.py to make pytorch/benchmark work (#16890)
Summary:
Since we don't do tmp_install any more it's better to include all necessary headers.

cc kostmo for better suggestions of how to list all headers here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16890

Differential Revision: D14079848

Pulled By: dzhulgakov

fbshipit-source-id: 4522c80d05e5d91f99f6700cde46cac559330d28
2019-02-13 23:14:36 -08:00
3408d9de20 Clean up Storage/StorageImpl constructors (#16948)
Summary:
Small cleanup while doing https://github.com/pytorch/pytorch/pull/16857:

- rename C2 constructors as create_legacy
- remove duplicated constructors
- make resizable flag non-default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16948

Differential Revision: D14062755

Pulled By: dzhulgakov

fbshipit-source-id: 3b7b4ec9cdf67d2628cccc001156e040006b673e
2019-02-13 22:58:32 -08:00
11816affab Safety check for negative alloc_cpu() attempt (#17071)
Summary:
Some legacy TH code was relying on alloc to throw when called with a negative number!!! E.g. `torch.linspace(0, 1, -1)`. And it breaks the ASAN build. I still believe alloc should receive a size_t, but I added a safety enforce inside.
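
A minimal sketch of the failing call (the exact error text is version-dependent):

```python
import torch

# With the enforce in place, a negative size raises a clear error
# instead of relying on the allocator to blow up
try:
    torch.linspace(0, 1, -1)
except RuntimeError as e:
    print(e)
```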

It should fix ASAN. I'll follow up with a proper fix for empty_cpu (which is probably the right place to do it) separately
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17071

Differential Revision: D14074157

Pulled By: dzhulgakov

fbshipit-source-id: 3ed3bdb873e446edecb558e1df491310fd7179e3
2019-02-13 22:41:13 -08:00
f0fed41ea2 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: b4e7a3850b01bbec56faa3eb0feb3bc6197c0393
2019-02-13 22:07:16 -08:00
92a516b9ff Apply modernize-use-override - 2/2
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14054721

fbshipit-source-id: 15d266fa1779b1e3ea6270f00841d7fb1e4d44ee
2019-02-13 21:01:28 -08:00
84bdf86034 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 5d9763a6f26ba53c6402b978004aaa7508f4e354
2019-02-13 21:01:27 -08:00
8abfd28f58 #16627 convert weights using torch.as_tensor to avoid warning (#17067)
Summary:
Minor change which fixes #16627
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17067

Differential Revision: D14078726

Pulled By: soumith

fbshipit-source-id: c04a5f1eff44e4a4b04b981f0ae8de6ff018515b
2019-02-13 20:54:29 -08:00
b33f4cff6b Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: e074a865b859fd72b34b012505dfbd3a27a0cc41
2019-02-13 20:54:27 -08:00
dae356df1f Revert D14062537: [pytorch][PR] Implement NetDef <--> JIT IR converters.
Differential Revision:
D14062537

Original commit changeset: 88b184ee7276

fbshipit-source-id: 01971bbe20daade40cc2cbf85fc08edb380b445c
2019-02-13 20:29:17 -08:00
c3f5ba9460 PyTorch model metadata. (#16275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16275

Adding a generic string `metadata` field as part of the model to capture additional metadata with the model.

Reviewed By: dzhulgakov

Differential Revision: D13579029

fbshipit-source-id: 7456ef2edbe73bb70bbb31889cecd94e0db329a2
2019-02-13 19:48:11 -08:00
46503a7ac0 Trim libshm deps, move tempfile.h to c10 (#17019)
Summary:
libshm_manager doesn't need to depend on all of libtorch. It only uses the tiny tempfile.h, which can be moved to c10. I could just duplicate the file too, but it's not worth it since c10 is small enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17019

Differential Revision: D14052688

Pulled By: dzhulgakov

fbshipit-source-id: 8797d15f8c7c49c49d40b7ab2f43aa3bf6becb0c
2019-02-13 19:38:35 -08:00
d25fee31fc Implement NetDef <--> JIT IR converters. (#16967)
Summary:
Currently the converters are very straightforward, i.e. there is no code for trying to
preserve semantics; we purely perform conversion from one format to another.

Two things that we might want to add/change:
1. Add semantic conversion as well (but probably it would be a good idea to keep
it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16967

Differential Revision: D14062537

Pulled By: ZolotukhinM

fbshipit-source-id: 88b184ee7276779e5e9152b149d69857515ad98a
2019-02-13 18:39:39 -08:00
decc0893f2 Remove IgnoredPythonOp sugared value
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17042

Differential Revision: D14072497

Pulled By: driazati

fbshipit-source-id: 68fe3fa89c22e60142d758c8cbe0e6e258e7d5c2
2019-02-13 17:59:56 -08:00
3a34f443c5 Separate reduce functions from math (#16929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929

Separate CPU reduce functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13999469

fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1
2019-02-13 17:50:47 -08:00
9b7f3da74b Skip test_cudnn_multiple_threads_same_device on ROCm (flaky) (#17061)
Summary:
cc iotamudelta
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10722//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10710//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10753//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-devtoolset7-rocmrpm-centos7.5-test/1756//console
```
19:07:18 ======================================================================
19:07:18 FAIL: test_cudnn_multiple_threads_same_device (test_nn.TestNN)
19:07:18 ----------------------------------------------------------------------
19:07:18 Traceback (most recent call last):
19:07:18   File "/var/lib/jenkins/workspace/test/test_nn.py", line 3905, in test_cudnn_multiple_threads_same_device
19:07:18     (2048 - test_iters) * (2048 - test_iters))
19:07:18   File "/var/lib/jenkins/workspace/test/common_utils.py", line 453, in assertEqual
19:07:18     super(TestCase, self).assertLessEqual(abs(x - y), prec, message)
19:07:18 AssertionError: 3794704.0 not less than or equal to 1e-05 :
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17061

Differential Revision: D14069324

Pulled By: bddppq

fbshipit-source-id: e33b09abca217a62a8b577f9c332ea22985ef4ff
2019-02-13 17:18:47 -08:00
a670824fee Support FC (Caffe2) -> Gemm (ONNX) with variable input shape. (#16184)
Summary:
For >2D input, the code previously used the static shape captured during tracing and reshaped before/after `Gemm`.
Now we add `-1` to the first `Reshape`, and use `Shape(X) => Slice(outer) => Concat(with -1 for inner) => Reshape` for the second.
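
A numpy sketch of the shape handling (illustrative only; the real change emits ONNX ops):

```python
import numpy as np

x = np.random.randn(5, 7, 16)      # >2D input with arbitrary outer dims
w = np.random.randn(16, 8)

flat = x.reshape(-1, x.shape[-1])  # first Reshape: (-1, inner) works for any batch
y = flat @ w                       # Gemm on the flattened input
outer = x.shape[:-1]               # Shape(X) => Slice(outer)
y = y.reshape(*outer, -1)          # Concat(outer, -1) => second Reshape
assert y.shape == (5, 7, 8)
```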
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16184

Differential Revision: D14070754

Pulled By: ezyang

fbshipit-source-id: 86c69e9b254945b3406c07e122e57a00dfeba3df
2019-02-13 17:12:34 -08:00
2ad5dcbbe4 Make timeout in resnet50_trainer configurable (#17058)
Summary:
xw285cornell petrex dagamayank
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17058

Differential Revision: D14068458

Pulled By: bddppq

fbshipit-source-id: 15df4007859067a22df4c6c407df4121e19aaf97
2019-02-13 17:03:48 -08:00
41dddfd55f Make mkldnn Stream object thread_local and enable mkldnn thread-safe (#17022)
Summary:
This PR fixes following issue: https://github.com/pytorch/pytorch/issues/16828

It is a combination of two things:
1) MKLDNN streams are not thread-safe but are currently shared between different threads. This change makes them thread_local
2) By default MKLDNN primitives can share global memory and can't be invoked from multiple threads. This PR enables the MKLDNN_ENABLE_CONCURRENT_EXEC cmake configuration option that makes them thread-safe.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17022

Differential Revision: D14069052

Pulled By: ezyang

fbshipit-source-id: f8f7fcb86c40f5d751fb35dfccc2f802b6e137c6
2019-02-13 16:04:53 -08:00
491f2d4cb8 Support conversion from Caffe2 MergeDim to ONNX Reshape + Squeeze. (#16189)
Summary:
`MergeDim` can be done by `Reshape([1, -1, 0, 0, ...]) + Squeeze`.
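
A numpy sketch of the equivalence (in ONNX Reshape a 0 means "keep this dimension", which numpy lacks, so the kept dims are written out explicitly):

```python
import numpy as np

x = np.random.randn(2, 3, 4, 5)
# Reshape([1, -1, 0, 0]) + Squeeze(0) merges the first two dims, like Caffe2's MergeDim
merged = x.reshape(1, -1, 4, 5).squeeze(0)
assert merged.shape == (6, 4, 5)
```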
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16189

Differential Revision: D14070676

Pulled By: ezyang

fbshipit-source-id: 28d7e9b35cc2c1dcbd4afb3fbdf7383e219b1777
2019-02-13 15:53:38 -08:00
86594e63eb Fix mvlgamma doc (#17045)
Summary:
Changelog:
- Fix the constant in the docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17045

Differential Revision: D14068698

Pulled By: ezyang

fbshipit-source-id: af040b9a9badea213785f5bf3b6daf4d90050eb2
2019-02-13 15:24:44 -08:00
f79563a665 Change IR graph print format to make it look more pythonic (#16986)
Summary:
This removes curly braces from the outputs (we have indentation to indicate scopes), also adds ':' after graph and blocks declaration and removes ';' from the return line. ".expect" tests are updated to keep up with it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16986

Differential Revision: D14062540

Pulled By: ZolotukhinM

fbshipit-source-id: 7f8e2d11619152a21ef7f1f7f8579c49392c3eca
2019-02-13 12:37:24 -08:00
18b8572505 Turn off the ability for Declarations.cwrap entries to be methods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17053

Differential Revision: D14065887

Pulled By: gchanan

fbshipit-source-id: 5d06ac66d27d28d48c2aff2b0d911f34ea0cd6fd
2019-02-13 12:25:05 -08:00
bc39cf4d5e Remove chunk count check on the ChunkBuffer (#16868)
Summary:
Previously, the ChunkBuffer depended on the remaining chunk count to signal the end of data loading. This does not work with distributed samplers, where each sampler only loads a subset of chunks. This refactor removes the ChunkBuffer's dependency on the remaining chunk count.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16868

Differential Revision: D14066517

Pulled By: goldsborough

fbshipit-source-id: 293dfe282ceff326dff0876c2f75c2ee4f4463e2
2019-02-13 11:09:42 -08:00
a5e7b1d032 Use IndexError instead of RuntimeError in ATen CPU kernels
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17049

Reviewed By: ezyang

Differential Revision: D14064700

Pulled By: fmassa

fbshipit-source-id: 3575db103bba5a7d82f574cbb082beca419151ec
2019-02-13 10:19:28 -08:00
1fc05bd285 Mark IntList as deprecated; add C10_DEPRECATED_USING (#16824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16824

There was a big wooly yak getting the deprecated macros to work.
Gory details are in Deprecated.h

Reviewed By: smessmer

Differential Revision: D13978429

fbshipit-source-id: f148e5935ac36eacc481789d22c7a9443164fe95
2019-02-13 08:51:20 -08:00
db82fc7ca6 Add more debugging facilities to ONNXIFI transform (#17043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17043

Add more debugging facilities for the ONNXIFI transform.

Reviewed By: ipiszy

Differential Revision: D14019492

fbshipit-source-id: 8c258ccba2f8ce77db096031fc8a61e15bd8af93
2019-02-13 00:05:41 -08:00
26018d027a Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 399afdc341075c383227d0d410a30eeb6c1d3b08
2019-02-13 00:05:40 -08:00
2b5bef22b7 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: edb216d2eca7120d0f7729b2e4640096a0341154
2019-02-12 21:26:24 -08:00
51dd2000cd unify c2 and TH allocator (#16892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16892

Replaces https://github.com/pytorch/pytorch/pull/14517

Merged caffe2 and TH CPU Allocators. Mostly using the code from caffe2 allocators.
`memset` of caffe2 allocator is gone now. These two allocators should be almost the same.

Baseline:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                       148 ns        148 ns    4676594
BM_StorageImplCtor                        54 ns         54 ns   12957810
BM_MallocStorageImpl                      62 ns         62 ns   11254745
BM_TensorImplCtor                         22 ns         22 ns   31939472
BM_MallocTensorImpl                      105 ns        105 ns    6505661
BM_Malloc_1                               43 ns         43 ns   16464905
BM_MakeTensorFromStorage                 126 ns        126 ns    5586116
BM_MakeVariableFromTensor                236 ns        236 ns    2995528
BM_ATenCPUTensorAllocationSmall1         319 ns        319 ns    2268884
BM_ATenCPUTensorAllocationSmall2         318 ns        318 ns    2163332
BM_ATenCPUTensorAllocationMedium1        403 ns        403 ns    1663228
BM_ATenCPUTensorAllocationMedium2        448 ns        448 ns    1595004
BM_ATenCPUTensorAllocationBig1           532 ns        532 ns    1352634
BM_ATenCPUTensorAllocationBig2          4486 ns       4486 ns     160978
```
Changed:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                       141 ns        141 ns    4803576
BM_StorageImplCtor                        55 ns         55 ns   13129391
BM_MallocStorageImpl                      64 ns         64 ns   11088143
BM_TensorImplCtor                         23 ns         23 ns   31616273
BM_MallocTensorImpl                      101 ns        101 ns    7017585
BM_Malloc_1                               39 ns         39 ns   18523954
BM_MakeTensorFromStorage                 118 ns        118 ns    5877919
BM_MakeVariableFromTensor                452 ns        452 ns    1565722
BM_ATenCPUTensorAllocationSmall1         384 ns        384 ns    1819763
BM_ATenCPUTensorAllocationSmall2         389 ns        389 ns    1857483
BM_ATenCPUTensorAllocationMedium1        425 ns        425 ns    1646284
BM_ATenCPUTensorAllocationMedium2        430 ns        430 ns    1561319
BM_ATenCPUTensorAllocationBig1           508 ns        508 ns    1309969
BM_ATenCPUTensorAllocationBig2          3799 ns       3799 ns     173674
```

lstm benchmark:
Before:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```

After:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```

Reviewed By: ezyang

Differential Revision: D13202632

fbshipit-source-id: db6d2ec756ed15b0732b15396c82ad42302bb79d
2019-02-12 21:16:34 -08:00
f87022bf2f Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 7d730945dbdd7bb7d10192061229ee6e759a1a7f
2019-02-12 20:45:38 -08:00
fb5790ce94 Remove second output of Reshape during ONNXIFI transform (#17027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17027

Glow doesn't support the second output of Reshape right now, and it's useless here anyway. For correctness, we do make sure that the second output of Reshape is of Constant type during bound shape inference.

Reviewed By: ipiszy

Differential Revision: D14056555

fbshipit-source-id: f39cca7ba941bf5a5cc3adc96e2b1f943cc0be93
2019-02-12 18:31:53 -08:00
9d01be1a5a enable more unit tests in test_nn (#16994)
Summary:
These tests work with ROCm 2.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16994

Differential Revision: D14059802

Pulled By: bddppq

fbshipit-source-id: 8e2cbb13196c2e0283d3e02b7f761374bc580751
2019-02-12 17:58:44 -08:00
02b838e065 fix bicubic upsampling and enable tests (#17020)
Summary:
Fix macro name in ifdef guard, enable upsampling tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17020

Differential Revision: D14059780

Pulled By: bddppq

fbshipit-source-id: 82c57d17d5bccdccb548c65d2b7a1ff8ab05af30
2019-02-12 17:33:08 -08:00
92221ad840 Fold col offsets into bias; optimize A symmetric quant (#16942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16942

We can fold col offsets into the bias if the zero point of the activation is constant.
fbgemm still needs to provide an option to pass col offsets in case the activation zero point keeps changing (e.g., dynamic quantization).
A trick to optimize the static-quantization case is setting the A zero point to 0 after folding it into the bias.
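
A numpy sketch of the folding identity (symbols here are illustrative):

```python
import numpy as np

A = np.random.randint(0, 255, (2, 4)).astype(np.int32)     # quantized activation
B = np.random.randint(-128, 127, (4, 3)).astype(np.int32)  # quantized weight
bias = np.random.randint(-10, 10, 3).astype(np.int32)
a_zp = 3                                                   # constant activation zero point

col_offsets = B.sum(axis=0)
folded_bias = bias - a_zp * col_offsets
# (A - a_zp) @ B + bias  ==  A @ B + folded_bias, so the a_zp term is free at runtime
assert ((A - a_zp) @ B + bias == A @ B + folded_bias).all()
```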

This diff also optimizes the case when weights use symmetric quantization. When the B zero point is 0, we use PackAMatrix instead of PackAWithRowOffset.

TODO:
Ideally, PackAWithRowOffset should perform as fast as PackAMatrix when B_zero_point is 0 to make client code simpler
Same in PackAWithIm2Col and depth-wise convolution (group convolution is already doing this)

Reviewed By: csummersea

Differential Revision: D14013931

fbshipit-source-id: e4d313343e2a16a451eb910beed30e35de02a40c
2019-02-12 17:33:06 -08:00
3e1e5d5a8b enable unit tests in test_cuda that now pass with ROCm 2.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17012

Differential Revision: D14059761

Pulled By: bddppq

fbshipit-source-id: 8309c3ffe1efed42b5db69fdec26427413c3f224
2019-02-12 17:28:46 -08:00
9696fee635 Register CUDA kernels for caffe2 operators (#16691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691

Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This now also registers the CUDA kernels with it.

Reviewed By: bwasti

Differential Revision: D13901619

fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6
2019-02-12 17:24:01 -08:00
059c55f8cc Enable test_jit tests that work on ROCm 2.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17010

Differential Revision: D14059748

Pulled By: bddppq

fbshipit-source-id: 7a1f7eee4f818dba91e741437415370973e4d429
2019-02-12 17:18:44 -08:00
7c24de8d04 Extract ShapeInfo and some util functions into a separate file. (#17025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17025

Extract ShapeInfo and some util functions into a separate file.

Reviewed By: yinghai

Differential Revision: D14017432

fbshipit-source-id: 201db46bce6d52d9355a1a86925aa6206d0336bf
2019-02-12 17:06:29 -08:00
f435fb8290 Allow customization of blob node in net_drawer (#16915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16915

TSIA

Reviewed By: ipiszy

Differential Revision: D14018010

fbshipit-source-id: df5ccc06fa37f08e7a02a8acc466c4ad47afe04e
2019-02-12 15:02:50 -08:00
65b49b4696 Ignore unknown_shaped tensor in bound shape inference (#16916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16916

Two fixes for maximum effort bound shape inference
1. Ignore failed and unknown shape
2. Add specialization for `SparseLengthsWeightedSumFused8BitRowwise`.

Reviewed By: ipiszy

Differential Revision: D14017810

fbshipit-source-id: 25cd68d35aa20b9ed077bdb562eb7f9deff0ab96
2019-02-12 15:02:49 -08:00
7c1e4258a9 Workarounds to the lack of nvidia-smi and ldconfig programs in macosx (was PR 16968) (#16999)
Summary:
Fix issue #12174 for Mac OSX.

PS: This is a duplicate of PR #16968 that got messed up. Sorry for the confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16999

Differential Revision: D14050669

Pulled By: zou3519

fbshipit-source-id: a4594c03ae8e0ca91a4836408b6c588720162c9f
2019-02-12 14:39:28 -08:00
0d95028bee Dispatch the correct legacy function for geqrf_out and ormqr_out (#16964)
Summary:
This fixes the segfault.

Changelog:
- Modify the function calls in LegacyDefinitions for `geqrf_out` and `ormqr_out`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16964

Differential Revision: D14025985

Pulled By: gchanan

fbshipit-source-id: aa50e2c1694cbf3642273ee14b09ba12625c7d33
2019-02-12 13:48:51 -08:00
68c3b959de Register layout for XLA backend.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16946

Differential Revision: D14054716

Pulled By: gchanan

fbshipit-source-id: 063495b99b9f7d29ca3ad2020a6bc90d36ba0d7d
2019-02-12 13:44:07 -08:00
0eee56fff7 Export ReduceMean/ReduceFrontMean/ReduceBackMean (Caffe2) to ReduceMean (ONNX). (#16727)
Summary:
The second input (`lengths`) is not supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16727

Differential Revision: D14054105

Pulled By: houseroad

fbshipit-source-id: 36b8d00460f9623696439e1bd2a6bc60b7bb263c
2019-02-12 13:35:32 -08:00
b0d57aa7b1 Clean up allocations in FBGEMM linear (#16985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16985

These statements were causing some redundant allocations + copying, so I cleaned
them up

Reviewed By: zdevito, wanchaol

Differential Revision: D14031067

fbshipit-source-id: f760fb29a2561894d52a2663f557b3e9ab1653de
2019-02-12 13:02:21 -08:00
34e4bd3ec5 Properly dispatch s_copy__cpu.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16974

Differential Revision: D14030516

Pulled By: gchanan

fbshipit-source-id: ba4cde5ebf2898d207efbc9117c1f1d6ccae861b
2019-02-12 12:53:36 -08:00
2b1e2b6b53 Get rid of unused THPStorage defines related to accreal.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16973

Differential Revision: D14029538

Pulled By: gchanan

fbshipit-source-id: b51f203ccff97695bf228772bb13e3e6b9bb6d1a
2019-02-12 12:48:48 -08:00
f2e6a3f230 Fix AddAdjustBatchOp (#16997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16997

1. Don't create multiple AdjustBatch ops for the same input name. We create it once and hook input to abc_post_adjust_batch.

2. Dangling tensor. The problem behind this error is still with AttachAdjustBatchOp. Consider a net such as
```
op {
  type : "Relu"
  input: "X"
  output: "Y"
}
op {
  type : "Relu"
  input: "Y"
  output: "Y2"
}
external_output: "Y"
external_output: "Y2"
```
Here the output of the first Relu is used as an internal node as well as an external output. We cannot simply rename Y to Y_pre_batch_adjust. Basically, we need another pass to check all the inputs of the ops in the net and rename Y to Y_pre_batch_adjust.

Reviewed By: bertmaher

Differential Revision: D14041446

fbshipit-source-id: f6553e287a8dfb14e4044cc20afaf3f290e5151b
2019-02-12 11:45:43 -08:00
21ce1da5e9 Roll back PyTorch DockerVersion to 282
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17013

Differential Revision: D14052415

Pulled By: yf225

fbshipit-source-id: df663fb46ee825174fe06b8d395979b3d4e84766
2019-02-12 11:39:15 -08:00
a2c322e735 fix silent failure on Windows builds (#16984)
Summary:
Closes #16983

Remove backticks that are being interpreted by the shell. Add -e option to bash script to avoid future such failures
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16984

Reviewed By: yf225

Differential Revision: D14039128

Pulled By: kostmo

fbshipit-source-id: c31a1895377ca86c1b59e79351843cc8c4fd7de3
2019-02-12 11:06:27 -08:00
3618b52c74 Add module and name to func created with _jit_internal.boolean_dispatch (#16922)
Summary:
The use case motivating this PR is the following bug:
(with F = torch.nn.functional)
`F.max_pool2d.__module__` is `torch._jit_internal`
`F.max_pool2d.__name__` is `fn`

With this PR you get:
`F.max_pool2d.__module__` is `torch.nn.functional`
`F.max_pool2d.__name__` is `max_pool2d`
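
A quick check of the fixed attributes:

```python
import torch.nn.functional as F

# With this PR, the dispatched wrapper reports its real identity
assert F.max_pool2d.__module__ == 'torch.nn.functional'
assert F.max_pool2d.__name__ == 'max_pool2d'
```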
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16922

Differential Revision: D14020053

Pulled By: driazati

fbshipit-source-id: c109c1f04640f3b2b69bc4790b16fef7714025dd
2019-02-12 09:38:48 -08:00
40528efeac More docs for methods in operator.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16826

Reviewed By: izdeby

Differential Revision: D13979891

fbshipit-source-id: df8391ffaff0d44845057bb839f05aea6fc5712c
2019-02-12 08:19:38 -08:00
e5742494f6 Minor typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16980

Differential Revision: D14033686

Pulled By: gchanan

fbshipit-source-id: 9f7967defc6795640e14157d0b701b185061741f
2019-02-12 08:02:04 -08:00
f61f9e1757 Fix allow_inf in assertEqual (#16959)
Summary:
gchanan pointed out in https://github.com/pytorch/pytorch/pull/16389 that `allow_inf` is treating `-inf` and `inf` as equal. This fixes it.

Also fixing #16448, since the code is nearby and 2.1 has been released.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16959

Differential Revision: D14025297

Pulled By: gchanan

fbshipit-source-id: 95348309492e7ab65aa4d7aabb5a1800de66c5d6
2019-02-12 07:56:34 -08:00
ae1fc584ea Refine return type Stream to HIPStream in HIPStreamGuardMasqueradingAsCUDA (#16978)
Summary:
Previously, we used the templated class directly to provide
implementations.  However, there is a subtle difference
between this, and CUDAStreamGuard: CUDAStreamGuard has refined types
for the Streams it returns.  This lead to a compilation failure
of HIPified ddp.cpp.  This commit lines them up more closely,
at the cost of copy-paste.

A possible alternate strategy would have been to extend the
InlineDeviceGuard templates to optionally accept refinements
for Stream.  I leave this for future work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16978

Differential Revision: D14045346

Pulled By: ezyang

fbshipit-source-id: 2b101606e62e4db588027c57902ea739a2119410
2019-02-12 07:36:25 -08:00
b7b245845a Revert D14030665: [pytorch][PR] [HOTFIX] Pin docker-ce version to the one expected by nvidia-docker2
Differential Revision:
D14030665

Original commit changeset: dece6a5aa4d1

fbshipit-source-id: 885a464ec3d1c23d4e07630fa3b67e69a3eab1b8
2019-02-12 07:04:57 -08:00
bad4442a7c Parse the command line and check the arguments before build_deps() (#16914)
Summary:
This is needed to check for wrong arguments or --help options
before `build_deps()` is executed. Otherwise command line arguments
are not parsed and checked until `setup()` is run.

Fixes: #16707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16914

Differential Revision: D14041236

Pulled By: soumith

fbshipit-source-id: 41f635772ccf47f05114775d5a19ae04c495ab3b
2019-02-12 00:15:42 -08:00
4d4c5273de Fix and add testing for nullptr allocator in c2->pt conversion (#16857)
Summary:
Fixes the bug where a tensor is created on the Caffe2 side, then passed to PyTorch and resized. Now we just initialize the allocator correctly.

Note that the code in raw_mutable_data() is still necessary because of non-resizable tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16857

Reviewed By: houseroad

Differential Revision: D14019469

Pulled By: dzhulgakov

fbshipit-source-id: 14d3a3b946d718bbab747ea376903646b885706a
2019-02-11 23:21:02 -08:00
aa626840af Fix NERPredictor for zero initialization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16931

Reviewed By: dragonxlwang

Differential Revision: D14016749

fbshipit-source-id: b5512c52cef77651bdba1e31f588ea649daacdd9
2019-02-11 23:12:26 -08:00
d266453541 Allow calling a Python function with a dict
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16989

Differential Revision: D14037896

Pulled By: driazati

fbshipit-source-id: 5f26d2d8fabf0f267909a3383f19d984645f94d0
2019-02-11 21:52:44 -08:00
4292d13240 Keep weights name unchanged during SsaRewrite (#16932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16932

During the onnxifi transformation, the net's SSA is rewritten. At the last step the weight
names are changed back to what they were before. This diff keeps the weight
names unchanged throughout the process.

Reviewed By: yinghai

Differential Revision: D13972597

fbshipit-source-id: 7c29857f788a674edf625c073b345f2b44267b33
2019-02-11 14:55:31 -08:00
917eac91f4 Pin docker-ce version to the one expected by nvidia-docker2 (#16976)
Summary:
Fix errors such as https://circleci.com/gh/pytorch/pytorch/760715.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16976

Differential Revision: D14030665

Pulled By: yf225

fbshipit-source-id: dece6a5aa4d13ff771c18b4ce02a0b9f9572a379
2019-02-11 14:21:20 -08:00
920c684367 Expose GenerateProposals to PyTorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16880

Reviewed By: bwasti

Differential Revision: D13998092

fbshipit-source-id: 23ab886ba137377312557fa718f262f4c8149cc7
2019-02-11 14:15:47 -08:00
0c02d317ea Expose BBoxTransform to pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16879

Reviewed By: bwasti

Differential Revision: D13998093

fbshipit-source-id: ddfe4bff83e9a1a4cedf1e520e6d2977b21cb3af
2019-02-11 14:15:45 -08:00
64aa769ef9 Minimize templated code in caffe2 operator wrapper (#16965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16965

Instead of having one large templated function to wrap the caffe2 op, minimize the amount of templated code.
Non-templated code can be reused between different operators and decreases binary size.

Reviewed By: orionr

Differential Revision: D14018806

fbshipit-source-id: bedd4152eec21dd8c5778446963826316d210543
2019-02-11 14:15:43 -08:00
7743ed8502 Don't keep unnecessary saved_inputs alive (#16583)
Summary:
Fixes #16577.

This greatly improves the memory efficiency of certain ops like Dropout2d. Previously, they were implemented as `input * mask` where mask never requires_grad, but we didn't use that knowledge in forward, and (in the case of an in-place dropout) kept input.clone() for the backward, where it would simply get ignored.

This patch tries to address this situation by emitting some guards for stores like this, but only if they are as simple as checking whether a single value requires_grad.
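
A minimal sketch of the Dropout2d-style case the guard targets:

```python
import torch

x = torch.randn(4, 4, requires_grad=True)
mask = (torch.rand(4, 4) > 0.5).float()  # mask never requires grad

y = x * mask
y.sum().backward()
# d(x * mask)/dx is just mask, so backward never actually needs x;
# the emitted guard lets autograd skip saving x (or its clone) here
```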

Interestingly, the same optimizations apply to methods like bmm, baddbmm, etc., but _not to mm nor addmm_, because of how their derivatives are defined. Apparently they unnecessarily use `mat1` to compute the derivative of `mat1` just to improve the error message in case `mat1` was sparse. I'd like to apply this optimization to that case, but I don't want to lose the nicer error message, so if anyone has any ideas for solutions, please let me know...

Full list of operators affected by this patch:
* _nnpack_spatial_convolution
* addbmm
* addcdiv
* addcmul
* addmv
* addr
* baddbmm
* bmm
* cross
* div
* dot
* fmod
* ger
* index_add_
* mul
* mv
* scatter_add_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16583

Differential Revision: D13900881

Pulled By: gchanan

fbshipit-source-id: dd0aeb2ab58c4b6aa95b37b46d3255b3e014291c
2019-02-11 13:42:09 -08:00
e2a5b203fc Enforce same input tensor storage in VariableType functions (#16305)
Summary:
In VariableType.cpp, when a function modifies its input tensors, it should only change the input tensors' storage data in-place, and should never change the input tensors' storage pointers. This PR adds checks for this, and also fixes functions that fail this test.

This is part of the Variable/Tensor merge work (https://github.com/pytorch/pytorch/issues/13638).
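
A minimal sketch of the invariant being checked:

```python
import torch

x = torch.randn(3)
ptr = x.data_ptr()
x.add_(1)                   # in-place ops may change storage *data*...
assert x.data_ptr() == ptr  # ...but must never swap the storage pointer
```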
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16305

Differential Revision: D13897855

Pulled By: yf225

fbshipit-source-id: 0c4fc7eb530d30db88037b1f0981f6f8454d3b79
2019-02-11 13:33:12 -08:00
4b454c3bdd Revert unneeded fixes in flat_hash_map (#16907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16907

The begin()/end() fix actually doesn't make sense, see my comment on https://github.com/skarupke/flat_hash_map/pull/8
This diff removes it.

Reviewed By: ezyang

Differential Revision: D13985779

fbshipit-source-id: f08b02c941069e2a4e728e02a19b65dc72f96b41
2019-02-11 13:28:25 -08:00
9521612bb7 Fix constexpr in KernelRegistrationBuilder (#16906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16906

In C++11, constexpr implies const, so these methods actually wouldn't be rvalue overloads as intended but const rvalue overloads.
Let's only apply the constexpr flag in C++14 to be safe.

Reviewed By: bddppq

Differential Revision: D13998486

fbshipit-source-id: a04d17ef0cc8f45e3d0a1ca9843d194f4f0f6f7f
2019-02-11 13:28:23 -08:00
af0c79eed4 Catch cudaError_t return val (nodiscard in rocm) (#16399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16399

Catching cudaError_t return values in a few places, because it's nodiscard in rocm. Unless we add -Wno-unused-result, it'll end up with a compilation error.

Also in c10/cuda/test, check whether a host has GPU or not. We were silently throwing out the error before (so not really testing the cuda api).

Reviewed By: bddppq

Differential Revision: D13828281

fbshipit-source-id: 587d1cc31c20b836ce9594e3c18f067d322b2934
2019-02-11 13:18:36 -08:00
29f096cc70 optionally zero infinite losses in CTCLoss (#16199)
Summary:
Here is a stab at implementing an option to zero out infinite losses (and NaN gradients).
It might be nicer to move the zeroing to the respective kernels.
The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and Native anyways and the NaN gradients aren't terribly useful.

This topic seems to come up regularly, e.g. in  #14335
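
A minimal sketch of the option (using the `zero_infinity` name as it landed):

```python
import torch
import torch.nn as nn

# targets longer than inputs make the loss infinite; zero_infinity zeroes it out
ctc = nn.CTCLoss(zero_infinity=True)
log_probs = torch.randn(50, 4, 20).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, 20, (4, 60), dtype=torch.long)
input_lengths = torch.full((4,), 50, dtype=torch.long)
target_lengths = torch.full((4,), 60, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
assert loss.item() == 0.0
```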
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16199

Differential Revision: D14020462

Pulled By: ezyang

fbshipit-source-id: 5ba8936c66ec6e61530aaf01175dc49f389ae428
2019-02-11 13:12:55 -08:00
632df48207 Merge binaries "convert_image_to_tensor" and "caffe2_benchmark" (#16875)
Summary:
Merge the binaries "convert_image_to_tensor" and "caffe2_benchmark" to remove the overhead of writing to/reading from the Tensor file.

*TODO next: TensorProtos is another overhead. No need for de-serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16875

Reviewed By: sf-wind

Differential Revision: D13997726

Pulled By: ZhizhenQin

fbshipit-source-id: 4dec17f0ebb59cf1438b9aba5421db2b41c47a9f
2019-02-11 13:07:26 -08:00
b4f1a871e8 Fix missing CircleCI GPG key (#16961)
Summary:
I'm seeing a bunch of apt gpg key errors on CI with the following message:
```
An error occurred during the signature verification. The repository is not
updated and the previous index files will be used. GPG error:
https://packagecloud.io trusty InRelease: The following signatures couldn't
be verified because the public key is not available:
NO_PUBKEY 4E6910DFCB68C9CD
```

Most of the times apt will reuse the old cached version, but sometimes this results in a build failure: https://circleci.com/gh/pytorch/pytorch/758366?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link.

This should hopefully fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16961

Differential Revision: D14028151

Pulled By: ezyang

fbshipit-source-id: 7648a0a58ece38d8d04916937a9fa17f34f8833e
2019-02-11 12:31:38 -08:00
c90a33b627 Disable binary_linux_conda_3.6_cu90_build on PRs. (#16958)
Summary:
Issue tracked at https://github.com/pytorch/pytorch/issues/16710

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16958

Differential Revision: D14028078

Pulled By: ezyang

fbshipit-source-id: 6c68f79775a156ef4a55ac450a5a0ecacc0e6af5
2019-02-11 11:53:49 -08:00
48c5d0ae8c Install Thrust package and stop patching (#16911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16911

I think the Thrust package has what we want for /opt/rocm/include/thrust. We can probably stop patching it now.

Reviewed By: bddppq

Differential Revision: D14015177

fbshipit-source-id: 8d9128783a790c39083a1b8b4771c2c18bd67d46
2019-02-11 09:47:39 -08:00
8042edcdb1 Make pin_memory and default_collate preserve namedtuples (#16440)
Summary:
Open issue: https://github.com/pytorch/pytorch/issues/3281
Corresponding PR (conflict): https://github.com/pytorch/pytorch/pull/4577

Another open issue: https://github.com/pytorch/pytorch/issues/14613
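
A minimal sketch of the preserved-namedtuple behavior (import location as of this era):

```python
import collections
import torch
from torch.utils.data.dataloader import default_collate

Point = collections.namedtuple('Point', ['x', 'y'])
batch = [Point(torch.tensor(1.0), torch.tensor(2.0)),
         Point(torch.tensor(3.0), torch.tensor(4.0))]

out = default_collate(batch)
assert isinstance(out, Point)  # still a Point, not a plain list of tensors
```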
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16440

Differential Revision: D14020901

Pulled By: ezyang

fbshipit-source-id: 4abe817fc43c281a510715d311bad544511995d3
2019-02-11 08:47:33 -08:00
d7e6f9b5a7 Revert D14020906: [pytorch][PR] Extend support for exporting reshape to onnx.
Differential Revision:
D14020906

Original commit changeset: 168616873044

fbshipit-source-id: 2730bb6990d41f3a9cef6625ea919c219733433d
2019-02-11 06:08:55 -08:00
8b4dea3f56 Added scientific notation on set_printoptions (#16876)
Summary:
This PR fixes #15683
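
A short sketch of the new behavior (assuming the option is exposed as a `sci_mode` argument to `torch.set_printoptions`; the summary doesn't name it):

```python
import torch

x = torch.tensor([0.00002, 123456.0])
torch.set_printoptions(sci_mode=True)   # force scientific notation
print(x)  # tensor([2.0000e-05, 1.2346e+05])
torch.set_printoptions(sci_mode=False)  # force fixed-point notation
print(x)
```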
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16876

Differential Revision: D14021703

Pulled By: soumith

fbshipit-source-id: 1f603a7d24e331831d8d389f4a704c6a5b070b0c
2019-02-11 04:55:12 -08:00
4335aac6e6 Extend support for exporting reshape to onnx.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16632

Differential Revision: D14020906

Pulled By: ezyang

fbshipit-source-id: 168616873044b980145a3554dab942bdec19efb2
2019-02-10 20:19:35 -08:00
e661dc27ff Int8GivenTensorFill Operator Schema fix typo (#16204)
Summary:
Hi,
caffe2/operators/quantized/int8_given_tensor_fill_op.cc expects the value array to be named "values", but the operator schema describes "value" (no s). I guess it is a little typo, but it made me lose a bit of time before I understood why I was getting this error when passing "value" instead of "values":
```
[F int8_given_tensor_fill_op.h:95] Check failed: output->t.numel() == values_.numel() output size: 3 given size: 0
Aborted (core dumped)
```

Thanks,
Eyyüb Sari
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16204

Differential Revision: D14020476

Pulled By: ezyang

fbshipit-source-id: a8a46bfc44ec125e7925ce4b7c79fdf99c890a50
2019-02-10 20:08:45 -08:00
8f6ee88a1d Add support for fusion of half batch norm with float stats (#16735)
Summary:
Fixes #16642.

cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16735

Differential Revision: D14020310

Pulled By: ezyang

fbshipit-source-id: ac78726f471d16d188eb998354d52bc79fe2c282
2019-02-10 19:37:57 -08:00
c282afffa7 Improve the Sparse matrix multiplication computational speed #16187 (#16905)
Summary:
Instead of converting the sparse matrix from COO to CSR format as in the original implementation, my revision uses the COO format directly for sparse-dense matrix multiplication.
On my Linux machine it is 5 times faster than the original code:

```
(original code)
SIZE: 15000 DENSITY: 0.01 DEVICE: cpu
torch: 0.39403 seconds
np:    0.00496674 seconds
torch/np: 79.3338

----------------------------------------

(my update)
SIZE: 15000 DENSITY: 0.01 DEVICE: cpu
torch: 0.0812583 seconds
np:    0.00501871 seconds
torch/np: 16.1911

```

Further code feedback and running-time tests are highly welcome. I will keep revising my code as needed.
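
For reference, a rough sketch of how such a timing can be reproduced (my own approximation; the exact benchmark script isn't included in this summary):

```python
import time
import torch

size, density = 15000, 0.01
nnz = int(size * size * density)
indices = torch.randint(0, size, (2, nnz))
values = torch.randn(nnz)
sparse = torch.sparse_coo_tensor(indices, values, (size, size)).coalesce()
dense = torch.randn(size, 64)

start = time.time()
result = torch.sparse.mm(sparse, dense)  # COO sparse x dense on CPU
print('torch: {:.5f} seconds'.format(time.time() - start))
```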
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16905

Differential Revision: D14020095

Pulled By: ezyang

fbshipit-source-id: 4ab94075344a55b375f22421e97a690e682baed5
2019-02-10 19:37:54 -08:00
0742874643 Allow dataloader to accept a custom memory pinning function (#16743)
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171

From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
>This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.

The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader.  slayton58 suggested a cleaner approach:  allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback.  I've updated the test and docstrings accordingly.

The old PR was merged but then reverted due to weird CUDA OOM errors on Windows that may or may not have been related. I have no idea why my changes would cause such errors (then or now), but it's something to keep an eye out for.

fmassa and yf225 who were my POCs on the old PR.
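
A minimal sketch of the resulting pattern (names are illustrative, adapted from my reading of the change):

```python
import torch

class CustomBatch:
    def __init__(self, samples):
        # samples is a list of (input, target) tensor pairs
        self.inp = torch.stack([s[0] for s in samples])
        self.tgt = torch.stack([s[1] for s in samples])

    # pin_memory_batch now detects this method on unrecognized batch
    # types and calls it as a fallback
    def pin_memory(self):
        self.inp = self.inp.pin_memory()
        self.tgt = self.tgt.pin_memory()
        return self

def collate_wrapper(batch):
    return CustomBatch(batch)  # pass as collate_fn to the DataLoader
```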
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743

Differential Revision: D13991745

Pulled By: ezyang

fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
2019-02-10 19:37:53 -08:00
73d7ecd183 Add abs for ByteTensor and CharTensor. (#16893)
Summary:
Fixes #15089
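
For example (my own sketch):

```python
import torch

x = torch.tensor([-3, 2, -1], dtype=torch.int8)  # a CharTensor
print(x.abs())  # tensor([3, 2, 1], dtype=torch.int8)
```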
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16893

Differential Revision: D14020115

Pulled By: ezyang

fbshipit-source-id: 6f3be6ed28d2d37667159be45959d400bc473451
2019-02-10 19:31:57 -08:00
eae139e18f Support named tuple return from operators on JIT (#16253)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/16233

The following changes are made:
- Modify `TupleType` to store optional field names
- Modify schema matching to fill in those field names when creating `TupleType` as the return type.
- Modify codegen of JIT to copy field names to schema string
- Modify `SchemaParser` to set field names of returned schema.
- Modify `SimpleValue::attr` to emit tuple indexing for named tuple.
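
A brief sketch of what this enables in TorchScript (my own example, assuming an op such as `torch.max` whose schema declares named return fields):

```python
import torch

@torch.jit.script
def row_max(x):
    # The (values, indices) return of torch.max can now be indexed
    # by field name inside TorchScript
    return torch.max(x, dim=1).values
```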
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16253

Reviewed By: ezyang

Differential Revision: D13954298

Pulled By: zdevito

fbshipit-source-id: 247d483d78a0c9c12d1ba36e1f1ec6c3f1a3007b
2019-02-10 18:15:56 -08:00
9cb41e5386 Enhance the documentation for torch.nn.DataParallel (#15993)
Summary:
I found a few sentences in DataParallel docstring confusing, so I suggest this enhancement.

- Arbitrary arguments are allowed to be passed .... *INCLUDING* tensors (Not *EXCLUDING*)
- The original author said that "other types" are shallow-copied, but I think only some builtin types are (effectively) shallow-copied, and "other types" are actually shared. Here is an example.

```python
import torch
from torch.nn import Module, DataParallel
from collections import deque

class MyModel(Module):
    def forward(self, x):
        x.append(None)  # mutates the argument in place

model = MyModel()
model.cuda()
model = DataParallel(model)

d = deque()
model(d)
print(d)  # deque([None, ...]) -- the deque was shared with the replicas, not copied
```

This is a side note.

As far as I know, copying objects is not an especially frequent operation in Python, unlike in some other languages. Notably, no copying is involved in assignment or function parameter passing; they are only name bindings, which is the whole point of Python's "everything is an object" philosophy. Keeping this in mind may help when dealing with things like multithreading.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15993

Differential Revision: D14020404

Pulled By: ezyang

fbshipit-source-id: a38689c94d0b8f77be70447f34962d3a7cd25e2e
2019-02-10 15:55:31 -08:00
aae6b53c5b DOC: correct docstring for torch and torch.Tensor package (#16842)
Summary:
This PR is a simple fix for the mistake in the "tensor" and "torch.Tensor" docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16842

Differential Revision: D14020300

Pulled By: ezyang

fbshipit-source-id: 3ab04f1223d6e60f8da578d04d759e385d23acbb
2019-02-10 14:37:29 -08:00
6a528007a6 find libnvToolsExt instead of using only hardcoded path (#16714)
Summary:
This changes the libnvToolsExt dependency to go through CMake find_library.

I have a machine where the CUDA libs, and libnvToolsExt in particular, are in the "usual library locations". It would be neat if we could find libnvToolsExt there and use the currently hardcoded path only as the default.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16714

Differential Revision: D14020315

Pulled By: ezyang

fbshipit-source-id: 00be27be10b1863ca92fd585f273d50bded850f8
2019-02-10 14:01:00 -08:00
8c9df48fd4 Clean up autograd method tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16790

Differential Revision: D14020305

Pulled By: ezyang

fbshipit-source-id: 3aa3362830cde35967a3895837a25b3cf3287569
2019-02-10 13:49:12 -08:00
a1a330bd6e fixed LogSigmoid math string that wasn't rendering in documentation (#16900)
Summary:
The documentation for LogSigmoid says:

> Applies the element-wise function:
> \<blank\>

Now the documentation properly displays the math string.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16900

Differential Revision: D14020097

Pulled By: ezyang

fbshipit-source-id: 41e229d0fcc6b9bb53367be548bf85286dc13546
2019-02-10 11:47:56 -08:00
e0323a6aea ctc_loss error message bug fix. (#16917)
Summary:
The CTCLoss argument error message is wrong.
Please fix this. (Sorry if I made some mistakes.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16917

Differential Revision: D14019983

Pulled By: ezyang

fbshipit-source-id: 3337a2e86da6f3f7594c73fddb73340494a19ce2
2019-02-10 10:49:29 -08:00
202eaa4ef4 Use non-Variable type for callsites that check type equality (#16325)
Summary:
When Variable and Tensor are merged, the dynamic type of the tensors passed to certain functions will become variables, and expecting `type()` on those variables to still return non-Variable types will cause type mismatch error.

One way to fix this problem is to use the thread-local guard `at::AutoNonVariableTypeMode` to force `type()` to return non-Variable type, but ideally we want to limit the use of `at::AutoNonVariableTypeMode` to be only in VariableType.cpp. Another way to fix the problem is to use `at::globalContext().getNonVariableType()` instead to get the non-Variable type of the tensor, which is what this PR is trying to achieve.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16325

Differential Revision: D14012022

Pulled By: yf225

fbshipit-source-id: 77ef1d2a02f78bff0063bdd72596e34046f1e00d
2019-02-10 09:47:50 -08:00
a9f1d2e371 Fix the error in the note about torch.device documentation. (#16839)
Summary:
This PR is a simple fix for the mistake in the first note for `torch.device` in the "tensor attributes" doc.
![image](https://user-images.githubusercontent.com/8536399/52399611-1becaa00-2b00-11e9-85bf-cac04b29842d.png)

```
>>> # You can substitute the torch.device with a string
>>> torch.randn((2,3), 'cuda:1')
```
Above code will cause error like below:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-abdfafb67ab1> in <module>()
----> 1 torch.randn((2,3), 'cuda:1')

TypeError: randn() received an invalid combination of arguments - got (tuple, str), but expected one of:
 * (tuple of ints size, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (tuple of ints size, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
```

Simply adding the argument name `device` solves the problem: `torch.randn((2,3), device='cuda:1')`.

However, another concern is that this note seems redundant as **there is already another note covering this usage**:
![image](https://user-images.githubusercontent.com/8536399/52399583-0ecfbb00-2b00-11e9-914f-e95da4edecd1.png)

So maybe it's better to just remove this note?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16839

Reviewed By: ezyang

Differential Revision: D13989209

Pulled By: gchanan

fbshipit-source-id: ac255d52528da053ebfed18125ee6b857865ccaf
2019-02-09 20:18:58 -08:00
19790b218f Register coalescer bug was fixed in ROCm 2.1 (#16923)
Summary:
Remove specialization/workaround for ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16923

Differential Revision: D14018521

Pulled By: bddppq

fbshipit-source-id: d88162740bca6dc8ad37397dfbf8c84408074a00
2019-02-09 11:27:50 -08:00
d72c5d5a49 Alignas is now correctly handled on ROCm (#16920)
Summary:
Post 2.1 release, packing is fixed and alignas works as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16920

Differential Revision: D14018539

Pulled By: bddppq

fbshipit-source-id: 0ed4d9e9f36afb9b970812c3870082fd7f905455
2019-02-09 11:27:48 -08:00
5089ee9677 Enable built-in bitonic sort (#16919)
Summary:
It now works post ROCm 2.1 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16919

Differential Revision: D14018538

Pulled By: bddppq

fbshipit-source-id: c4e1bafb53204a6d718b2d5054647d5715f23243
2019-02-09 11:27:47 -08:00
f169f398d0 Change the default image size from 227 to 224 in resnet50 trainer (#16924)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16924

Differential Revision: D14018509

Pulled By: bddppq

fbshipit-source-id: fdbc9e94816ce6e4b1ca6f7261007bda7b80e1e5
2019-02-09 11:18:58 -08:00
23e1c55cc0 enable unit tests working on ROCm 2.1 (#16871)
Summary:
This is the first round of enabling unit tests that work on ROCm 2.1, based on my testing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16871

Differential Revision: D13997662

Pulled By: bddppq

fbshipit-source-id: d909a3f7dd5fc8f85f126bf0613751c8e4ef949f
2019-02-09 00:30:50 -08:00
fc4f33b08f Suggest adding to __constants__ in the error message on save failure
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16850

Differential Revision: D14014735

Pulled By: eellison

fbshipit-source-id: 7b6d5d5b64b9b107743cea1548cb4ee1b653977e
2019-02-08 19:40:11 -08:00
6737190b5c Make the exception raised from "numpy.dtype(numpy.void, (INT,))" less cryptic (#16809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16809

https://fb.facebook.com/groups/582508038765902/permalink/736710343345670/?comment_id=824042307945806&reply_comment_id=824318864584817

numpy.dtype(numpy.void, (<INT>, )) raises a cryptic message "invalid itemsize in generic type tuple" that is hard to debug.

This diff adds a message asking the user to investigate the blob causing the error.

Reviewed By: kennyhorror

Differential Revision: D13973359

fbshipit-source-id: 43a0c492ffafbabdfd7f7541c08a258e5ac0280f
2019-02-08 16:46:50 -08:00
12bace141b Revert D13970381: [caffe2] Add visibility to registry class to fix ubsan error
Differential Revision:
D13970381

Original commit changeset: 763db24b8a98

fbshipit-source-id: dda8672ed0bc6fecc4dde5ce73feb99e15205978
2019-02-08 16:21:10 -08:00
0799a81cb7 Extend Net.RunAllOnGPU() to support RecurrentNetwork op (#15713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15713

[caffe2] Extend Net.RunAllOnGPU() to support RecurrentNetwork op

Reviewed By: dzhulgakov

Differential Revision: D13576507

fbshipit-source-id: f517127492c9d516ece663d42fef84338c70344e
2019-02-08 15:48:42 -08:00
48fe839d56 delete critical section in TH*Tensor_addmm (#16889)
Summary:
This was serializing all calls to `addmm` (and any op that used it, in my case `bmm`) in the entire process, and led to downright atrocious performance in the TorchScript threaded runtime. Removing this gives a 2x throughput boost for high-load machine translation inference.

The original justification for this is dubious: there are other `gemm` callsites in the codebase that are not protected by critical sections. And in caffe2 land we never had any issues with non-reentrant BLAS libraries.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16889

Differential Revision: D14008928

Pulled By: jamesr66a

fbshipit-source-id: 498e2133bd6564dba539a2d9751f4e61afbce608
2019-02-08 14:49:01 -08:00
f83556bb7b Revert D13806753: [pytorch][PR] TensorIterator cuda launch configs update
Differential Revision:
D13806753

Original commit changeset: 37e45c7767b5

fbshipit-source-id: 74ac9f54f86853287b372ccf21fb37ed0e04a5d3
2019-02-08 12:44:42 -08:00
cd2dca3caf Allow sequential modules in module list (#16882)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/16845
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16882

Differential Revision: D14007746

Pulled By: eellison

fbshipit-source-id: d7918275cc1de6a67320619c3203463f66783343
2019-02-08 12:32:11 -08:00
5ada54e0bc Impl ExpandDims op and fallback to CPU if needed (#15264)
Summary:
Impl ExpandDims op and fallback to CPU if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15264

Differential Revision: D13808797

Pulled By: yinghai

fbshipit-source-id: 7795ec303a46e85f84e5490273db0ec76e8b9374
2019-02-08 12:04:53 -08:00
54c981d9a9 Add visibility to registry class to fix ubsan error (#16792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16792

fix

Reviewed By: ezyang

Differential Revision: D13970381

fbshipit-source-id: 763db24b8a98a2757a63b77c70c8c68ba47f31e6
2019-02-08 10:17:47 -08:00
b9b0be7af2 Remove Legacy entry point. (#16721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16721

The key line is that we have to set the stream to the default
stream before calling the allocator. This is very interesting:
it shouldn't be necessary, but seemingly is!

Reviewed By: dzhulgakov

Differential Revision: D13943193

fbshipit-source-id: c21014917d9fe504fab0ad8abbc025787f559287
2019-02-08 09:33:58 -08:00
b3fbd3eebf Deduplicate caching allocator instances, so that we only have one. (#16720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16720

I'm taking the deduplication slowly because there is something here
that is causing problems, and I want to figure out what it is.

Reviewed By: dzhulgakov

Differential Revision: D13943194

fbshipit-source-id: cbc08fee5862fdcb393b9dd5b1d2ac7250f77c4b
2019-02-08 09:33:56 -08:00
5c982622b0 Delete duplicate copy of THCCachingAllocator (round two). (#16615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16615

This is another go at landing https://github.com/pytorch/pytorch/pull/16226
Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.

The difference between this and the previous PR is that this
version faithfully maintains the binding code; in particular,
we end up with a SECOND copy of the caching allocator in
this patch.  I verified that this code does NOT cause a crash
in the workflow we canaried last time.

In further diffs, I plan to eliminate the second copy, and then
adjust the binding code.

Reviewed By: dzhulgakov

Differential Revision: D13901067

fbshipit-source-id: 66331fd4eadffd0a5defb3cea532d5cd07287872
2019-02-08 09:33:55 -08:00
f03296299b Bump caffe2 docker images to 248 (#16863)
Summary:
Jenkins jobs update will be separate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16863

Differential Revision: D13994672

Pulled By: bddppq

fbshipit-source-id: 5b27879dc6ac11a42016fe7835e9124345005ebb
2019-02-08 00:40:04 -08:00
6ce147c021 Also register op schema when no kernels are registered
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16878

Reviewed By: bwasti

Differential Revision: D13997959

fbshipit-source-id: 7527a560b03f672f76e95d4f22ae28ce24698cc1
2019-02-07 20:53:21 -08:00
2c713032a1 Don't automatically handle context parameter (#16867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16867

Some caffe2 operators (for example, BBoxTransform) do not have just one template parameter (the context); they may have multiple template parameters.
Because of this, we can't handle the context parameter inside the macro.

Reviewed By: bwasti

Differential Revision: D13995696

fbshipit-source-id: f55c3be913c8b125445a8d486846fc2fab587a63
2019-02-07 20:53:17 -08:00
fe5989d466 Support onnxifi with partially shaped inferred net (#16877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16877

That's it.

Reviewed By: ipiszy

Differential Revision: D13997771

fbshipit-source-id: f512c7f30b4a4747aca335a0769712c2a2cc2206
2019-02-07 20:44:39 -08:00
7ce33c586d Robust determination of cudnn library and relevant conda packages. (#16859)
Summary:
This PR implements:
1. a fix to issue #12174 - determine the location of cudnn library using `ldconfig`
2. a fix to determine the installed conda packages (in recent versions of conda, the `conda` command is a Bash function that cannot be called from within a Python script, so we use the CONDA_EXE environment variable instead)
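
A rough sketch of the two strategies (illustrative only, not the exact code from the PR):

```python
import os
import subprocess

def find_cudnn_via_ldconfig():
    # Strategy 1: ask the dynamic linker cache where libcudnn lives
    out = subprocess.check_output(['ldconfig', '-p']).decode()
    return [line.split()[-1] for line in out.splitlines() if 'libcudnn' in line]

def list_conda_packages():
    # Strategy 2: `conda` may be a Bash function, so invoke the binary
    # pointed to by CONDA_EXE instead
    conda_exe = os.environ.get('CONDA_EXE')
    if conda_exe is None:
        return None
    return subprocess.check_output([conda_exe, 'list']).decode()
```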
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16859

Differential Revision: D14000399

Pulled By: soumith

fbshipit-source-id: 905658ecacb0ca0587a162fade436de9582d32ab
2019-02-07 20:34:46 -08:00
930ed00b33 Specialize LengthsRangeFill and SparseLengthsWeightedSum in bound shape inference (#16869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16869

TSIA.

Reviewed By: ipiszy, rdzhabarov

Differential Revision: D13994946

fbshipit-source-id: 7e507abc5a3c2834c92910e521387085c56e8b2e
2019-02-07 20:18:15 -08:00
b5111918cd Activation histogram net observer with multiple histogram files as output (#16855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16855

Save the histogram of each net to a separate file

Reviewed By: jspark1105

Differential Revision: D13991610

fbshipit-source-id: a5be4e37a5e63567dcd7fdf99f451ee31bb350a5
2019-02-07 19:51:30 -08:00
ee0e71bee7 Allow dicts in C++ frontend (#16846)
Summary:
Fixes #16856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16846

Differential Revision: D13991103

Pulled By: driazati

fbshipit-source-id: 4830dd6f707fa90429b5d3070eeda0bee53d2f2b
2019-02-07 18:44:49 -08:00
2db847b3a7 Separate elementwise level2 math functions (#16753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16753

Separate elementwise level2 math functions

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13954928

fbshipit-source-id: 1ca7a5d3da96e32510f502e5e4e79168854bee67
2019-02-07 18:38:26 -08:00
22477c6a7f Fix (#2) ppc64le build break on git status --porcelain check (#16852)
Summary:
Add test/.hypothesis/ to .gitignore to pass git status --porcelain check in CI build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16852

Differential Revision: D14000206

Pulled By: soumith

fbshipit-source-id: 5da99a4bb242c12aa35776f7254f6399a7fa6d8c
2019-02-07 18:29:37 -08:00
96369506c4 doc updates for TorchScript (#16866)
Summary:
Some batched updates:
1. bool is a type now
2. Early returns are allowed now (see the sketch after this list)
3. The beginning of an FAQ section with some guidance on the best way to do GPU training + CPU inference
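
For item 2, a minimal sketch of my own (not from the PR) of an early return in a scripted function:

```python
import torch

@torch.jit.script
def maybe_relu(x, apply_relu: bool):
    if apply_relu:
        return torch.relu(x)  # early return, now supported in TorchScript
    return x
```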
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16866

Differential Revision: D13996729

Pulled By: suo

fbshipit-source-id: 3b884fd3a4c9632c9697d8f1a5a0e768fc918916
2019-02-07 18:03:57 -08:00
67bb7b2931 Fix autodiff of nll_loss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16851

Differential Revision: D13995046

Pulled By: wanchaol

fbshipit-source-id: 557c99f1d1825fa9b6031dd9fa8ba9b54205e8c4
2019-02-07 17:42:01 -08:00
c35f3ae89f aten::_convolution now participates in shape analysis (#16837)
Summary:
During tracing, we record `aten::_convolution` rather than `aten::convolution`. The schema for the former was not present in the shape analysis pass, and resulted in some missing shape information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16837

Differential Revision: D13993831

Pulled By: jamesr66a

fbshipit-source-id: ebb63bf628d81613258caf773a3af5930303ce5a
2019-02-07 17:26:11 -08:00
c65b03b9f8 Enable arg_ops_test/unique_ops_test on AMD/rocm (#16853)
Summary:
Verified both tests are passing in a ROCm 2.1 environment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16853

Differential Revision: D13996279

Pulled By: bddppq

fbshipit-source-id: c0df610d7d9ca8d80ed2d1339cdadef59105a71c
2019-02-07 16:51:15 -08:00
bca358ad02 Update CI to recently released ROCm 2.1 release (#16808)
Summary:
* we do not need EAP packages any longer as the antistatic feature is now in the release
* consistently install the rccl package
* Skip one unit test that has regressed with 2.1
* Follow-up PRs will use 2.1 features once deployed on CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16808

Differential Revision: D13992645

Pulled By: bddppq

fbshipit-source-id: 37ca9a1f104bb140bd2b56d403e32f04c4fbf4f0
2019-02-07 15:12:18 -08:00
0f42a1ed29 Use bound shape inference in SparseNN tests (#16834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16834

Inserting AdjustBatch ops will possibly change the names of the input/output, so we need to create a mapping and use the renamed names for external_inputs/outputs and input_shape_info for the onnxifi_net.

Reviewed By: ipiszy

Differential Revision: D13982731

fbshipit-source-id: c18b8a03d01490162929b2ca30c182d166001626
2019-02-07 14:51:32 -08:00
66084c0bc9 Add recognition for XLA device types.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16844

Differential Revision: D13988805

Pulled By: gchanan

fbshipit-source-id: 4e89d6d2cde8bdac41739efa65cc91569a360953
2019-02-07 14:51:28 -08:00
64339dbd51 Fix and re-enable test case (#16643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16643

The test was disabled in D13908117 because it conflicted with another diff that was about to land.
Now fixed the merge conflict and re-landing it.

Reviewed By: ezyang

Differential Revision: D13911775

fbshipit-source-id: b790f1c3a3f207916eea41ac93bc104d011f629b
2019-02-07 13:58:16 -08:00
6750e1e3e9 C10_REGISTER_CAFFE2_OPERATOR: Macro for registering c2 kernels (#16548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16548

With this macro, a caffe2 operator can now directly be registered with c10.
No need to write custom wrapper kernels anymore.

Differential Revision: D13877076

fbshipit-source-id: e56846238c5bb4b1989b79855fd44d5ecf089c9c
2019-02-07 13:58:14 -08:00
ac4f66c9c3 Fix Anaconda logins on binary builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16848

Differential Revision: D13993614

Pulled By: pjh5

fbshipit-source-id: 16854b06d01460b78d9dbe7bd0341b7332984795
2019-02-07 13:44:52 -08:00
4193f7a106 new embedding label type in image input op (#16835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16835

We were using the label type `multi_label_dense` to denote both 1) a dense representation of integer labels and 2) embedding labels of floating-point type.

This causes some issues, as the two cases have different assumptions: for integer labels we check whether the label value is in [0, number_class - 1], but such a check should be skipped for embedding labels.

Reviewed By: BIT-silence

Differential Revision: D13985048

fbshipit-source-id: 1202cdfeea806eb47647e3f4a1ed9c104f72ad2c
2019-02-07 13:35:59 -08:00
b6648c1bbc Update ATen internals to use int64_t for dimension indexing (#16739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16739

Some ATen code locations seemed to use int, etc. incorrectly where either
int64_t or size_t was required. Update them to use int64_t for dimension indexing where necessary.

Reviewed By: ezyang

Differential Revision: D13950124

fbshipit-source-id: aaf1cef783bf3c657aa03490f2616c35c816679f
2019-02-07 13:15:42 -08:00
1aa90192ea Make JIT attributes t_ and ts_ store Variable instead of Tensor (#16596)
Summary:
Discussed with zdevito and we want to use Variable (with `set_requires_grad(false)`) instead of Tensor in all parts of JIT, to eliminate the distinction and the conceptual overhead when trying to figure out which one to use.

This also helps with the Variable/Tensor merge work tracked at https://github.com/pytorch/pytorch/issues/13638, which will make common functions (such as `numel()` / `sizes()` / `dim()`) on Variable much faster when finished.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16596

Differential Revision: D13979971

Pulled By: yf225

fbshipit-source-id: c69119deec5bce0c22809081115f1012fdbb7d5a
2019-02-07 12:34:00 -08:00
44d98c30a3 Better error when using a constant tensor (#16724)
Summary:
Fixes #16284
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16724

Differential Revision: D13990531

Pulled By: driazati

fbshipit-source-id: adbf47a07eddb3813fbe1322944abfe5fcff89fa
2019-02-07 12:28:28 -08:00
72f070a124 Backport the stable doc build on v1.0.1 to master (#16503)
Summary:
List of changes:
- Always push the final state of the doc build docker for debugging purposes.
- Adds code for the stable doc build. This code is never actually run on master, only the v1.0.1 branch. There is a big note for this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16503

Differential Revision: D13972469

Pulled By: zou3519

fbshipit-source-id: 68f459650ef0de200a34edd43fc1372143923972
2019-02-07 11:41:07 -08:00
ac00e85e36 Remove undefined tensor in jit script (#16379)
Summary:
This PR is a follow-up to #15460; it does the following things:

* remove the undefined tensor semantic in jit script/tracing mode
* change the ATen/JIT schema for at::index and other index-related ops to `Tensor?[]`, to align with what at::index is really doing and to adopt `optional[tensor]` in JIT
* change python_print to correctly print the exported script
* register both TensorList and ListOfOptionalTensor in JIT ATen ops to support both
* Backward compatibility for `torch.jit.annotate(Tensor, None)`

List of follow ups:

* remove the undefined tensor semantic in jit autograd, autodiff and grad_of
* remove prim::Undefined fully

For easy reviews, please turn on `hide white space changes` in diff settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16379

Differential Revision: D13855677

Pulled By: wanchaol

fbshipit-source-id: 0e21c14d7de250c62731227c81bfbfb7b7da20ab
2019-02-07 11:02:14 -08:00
0d366e1bde Support multiple inheritance in torch.distributions (#16772)
Summary:
This adds calls to `super().__init__()` in three classes in torch.distributions.

This is needed when `Distribution` and `Transform` objects are used with multiple inheritance, as e.g. combined with `torch.nn.Module`s. For example
```py
class MyModule(torch.distributions.Transform, torch.nn.Module):
    ...
```
cc  martinjankowiak esling who have wanted to use this pattern, e.g. in #16756
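
For reference, a minimal (non-PyTorch) sketch of why the cooperative `super().__init__()` calls matter under multiple inheritance:

```python
class Base:
    def __init__(self):
        super().__init__()  # forwards along the MRO instead of stopping here
        self.base_ready = True

class Mixin:
    def __init__(self):
        super().__init__()
        self.mixin_ready = True

class Combined(Base, Mixin):
    pass

c = Combined()
print(c.base_ready, c.mixin_ready)  # True True -- Mixin.__init__ ran too
```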
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16772

Differential Revision: D13978633

Pulled By: soumith

fbshipit-source-id: 8bc6cca1747cd74d32135ee2fe588bba2ea796f1
2019-02-07 01:37:57 -08:00
2681af1c8a Remove redundant wrappers in torch.distributions (#16807)
Summary:
Changelog:
- Remove torch.distributions.multivariate_normal._batch_diag : same functionality is provided by torch.diagonal
- Remove torch.distributions.lowrank_multivariate_normal._batch_vector_diag : same functionality is provided by torch.diag_embed
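
A quick sketch of the built-in replacements (my own illustration):

```python
import torch

mats = torch.randn(5, 3, 3)                     # a batch of square matrices
diags = torch.diagonal(mats, dim1=-2, dim2=-1)  # batched diagonals, shape (5, 3)

vecs = torch.randn(5, 3)                        # a batch of vectors
embedded = torch.diag_embed(vecs)               # batched diagonal matrices, shape (5, 3, 3)
```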
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16807

Differential Revision: D13985550

Pulled By: soumith

fbshipit-source-id: 25c7d00c52ff7f85e431134e9ce0d5dda453667b
2019-02-07 01:13:55 -08:00
511f6fc2d5 Insert AdjustBatchSizeOp into the predict_net. (#16811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16811

As the title. The AdjustBatch ops will be inserted before and after the Onnxifi op to:
1) adjust batch/seq sizes to the ideal batch/seq size before these tensors are processed by the Onnxifi op;
2) adjust batch size to the original batch size for batches generated by the Onnxifi op.

Reviewed By: yinghai

Differential Revision: D13967711

fbshipit-source-id: 471b25ae6a60bf5b7ebee1de6449e0389b6cafff
2019-02-07 00:40:11 -08:00
aa88c2c0b6 Unify gpu_support variable in python tests (#16748)
Summary:
Assign `has_gpu_support = has_cuda_support or has_hip_support` and make the corresponding changes in the Python tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16748

Differential Revision: D13983132

Pulled By: bddppq

fbshipit-source-id: ca496fd8c6ae3549b736bebd3ace7fa20a6dad7f
2019-02-07 00:29:51 -08:00
85ac272670 Update Docker file section in README.md (#16812)
Summary:
Emphasize that the docker build should be triggered from the pytorch repo directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16812

Differential Revision: D13985531

Pulled By: soumith

fbshipit-source-id: c6511d1e81476eb795b37fb0ad23e8951dbca617
2019-02-06 23:53:50 -08:00
49443d49fb TensorIterator cuda launch configs update (#16224)
Summary:
Update launch configs for TensorIterator gpu_reduce_kernel. Enable flexible
block dimension to improve efficiency for reduction cases with small fast
dimension.

Previously TensorIterator launches blocks with fixed 32x16 threads.
For cases like:

  import torch
  torch.randn(2**20, 4, device='cuda').sum(0)

The fixed launch config does not handle coalesced memory access efficiently.

The updated launch config enables flexible block dimensions. Combined with an
improved reduction scheme (using flexible vertical / horizontal reduction
instead of the limited warp / block reduction in the old code), it ensures an
optimal memory access pattern even when reducing over a dimension with small stride.

Possible future improvements:
1. Precise dynamic shared memory allocation.
2. Using warp shuffle for vertical (block_y) reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16224

Differential Revision: D13806753

Pulled By: soumith

fbshipit-source-id: 37e45c7767b5748cf9ecf894fad306e040e2f79f
2019-02-06 23:10:41 -08:00
b2135b2b72 Define layer_norm schema in caffe2 (#16535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16535

There is now no need anymore to define the layer norm schema in a central location.
It can just be defined in caffe2 next to the kernel implementation.

Reviewed By: ezyang

Differential Revision: D13869503

fbshipit-source-id: c478153f8fd712ff6d507c794500286eb3583149
2019-02-06 21:21:34 -08:00
16468a9f45 Automatically register c10 ops with JIT (#16534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16534

All c10 ops from the c10 dispatcher are now automatically registered with JIT

Reviewed By: dzhulgakov

Differential Revision: D13869275

fbshipit-source-id: 5ab5dec5b983fe661f977f9d29d8036768cdcab6
2019-02-06 21:21:33 -08:00
e5e0bf4152 Add AdjustBatch Op (#16676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16676

This op is used for changing batch size (first dimension) of the tensor.

Reviewed By: bertmaher, ipiszy

Differential Revision: D13929200

fbshipit-source-id: 4f2c3faec072d468be8301bf00c80d33adb3b5b3
2019-02-06 19:15:41 -08:00
100aa0798e Bring back running pytorch tests in rocm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16829

Differential Revision: D13982323

Pulled By: bddppq

fbshipit-source-id: 6ffadb96b9e2ebd64a29e38674a51401dfb211db
2019-02-06 17:58:48 -08:00
f34192db0f Rename DynamicType -> TensorType (#16787)
Summary:
```
import json
from subprocess import check_call
from pprint import pprint
renames = {
    'c10::TensorType': 'DimentionedTensorType',
    'c10::DynamicType': 'TensorType',
    'c10::TensorTypePtr': 'DimentionedTensorTypePtr',
    'c10::DynamicTypePtr': 'TensorTypePtr',
    'c10::TypeKind::DynamicType': 'TensorType',
    'c10::TypeKind::TensorType': 'DimentionedTensorType',
}

entries = json.loads(open('compile_commands.json', 'r').read())

build = None
sources = []

for e in entries:
    name = e['file']
    if not ('jit' in name or 'ATen/core' in name):
        continue
    build = e['directory']
    sources.append(name)

args = ['clang-rename', '-i', '-force', '-pl']
for name in sorted(renames.keys()):
    args += ['-qualified-name={}'.format(name), '-new-name={}'.format(renames[name])]

for source in sources:
    cmd = args + [source]
    pprint(args)
    check_call(cmd, cwd=build)
    check_call(['git', 'stash', 'push', '-m', 'rename'])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16787

Differential Revision: D13974132

Pulled By: zdevito

fbshipit-source-id: 8368fd53e17cff83707bbe77f2d7aad74f8ce60e
2019-02-06 17:31:07 -08:00
1b919ca93e Use bound shape inference in onnxifi transform (#16598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16598

ATT.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D13893698

fbshipit-source-id: 8d2ad9814fe76924a46b450eb7ebd3601fbdbbc7
2019-02-06 16:34:37 -08:00
717ae09184 improve error message (#16719)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16712
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16719

Differential Revision: D13978688

Pulled By: ezyang

fbshipit-source-id: 61f8fa4c54c6969a58550e32e18be2eb9254ced7
2019-02-06 15:51:58 -08:00
8105aaca86 int8 SpatialBN (#16796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16796

SpatialBN int8 version

Reviewed By: dskhudia

Differential Revision: D13971224

fbshipit-source-id: e55fd608c161069daaa4e62c618bc14b01f32cb7
2019-02-06 15:32:01 -08:00
30ab1773f9 call istringstream clear after str (#16820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16820

Sometimes histogram parsing was not working correctly due to changes in D13633256.
We need to call istringstream's clear() after str().

Reviewed By: csummersea

Differential Revision: D13977509

fbshipit-source-id: ce3e8cb390641d8f0b5c9a7d6d6daadffeddbe11
2019-02-06 15:23:08 -08:00
ea35d8e40a Replace resize_dim() with set_sizes_and_strides() (#16732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16732

Use set_sizes_and_strides instead of resize_dim.

Reviewed By: ezyang

Differential Revision: D13947867

fbshipit-source-id: 067b096b1fde14b039690992a5fe3ace386b2789
2019-02-06 14:50:52 -08:00
929cd23da1 no EIGEN engine for DeformConv (#16785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16785

There's no EIGEN engine implemented for DeformConv, but the unit test was checking it.

Reviewed By: BIT-silence

Differential Revision: D13967306

fbshipit-source-id: e29c19f59f5700fc0501c59f45d60443b87ffedc
2019-02-06 11:59:31 -08:00
8d4b2db529 format deform_conv_test.py (#16786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16786

Format to prepare D13967306

Reviewed By: BIT-silence

Differential Revision: D13967317

fbshipit-source-id: 2de895f8474b04c55ba067fbf788c553dc010c60
2019-02-06 11:59:29 -08:00
1a13dedf98 Fix/Improve bound shape inference with real net tests (#16597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16597

This diff fixes some bugs in shape inference for `SparseLengthsSumFused8BitRowwise`, and adds input shape inference for `Concat` when `add_axis=1`.

Reviewed By: bertmaher

Differential Revision: D13892452

fbshipit-source-id: 6cd95697a6fabe6d78a5ce3cb749a3a1e51c68e7
2019-02-06 10:41:07 -08:00
34cfbb0040 Typofix (#16800)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16800

Differential Revision: D13972592

Pulled By: ezyang

fbshipit-source-id: 45c352ac6090c8060bf75f44dec7205556986d88
2019-02-06 10:34:04 -08:00
30a6feda84 caffe2 | MSVS compatibility fixes (#16765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16765

Code changes required to build caffe2 for windows with toolchain used by FB.

Reviewed By: orionr

Differential Revision: D13953258

fbshipit-source-id: 651823ec9d81ac70e32d4cce5bc2472434104733
2019-02-06 09:47:01 -08:00
887080e92a Fallback sum/add to CPU if needed (#15267)
Summary:
Fallback sum/add to CPU if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15267

Differential Revision: D13935064

Pulled By: yinghai

fbshipit-source-id: eb228683d00a0462a1970f849d35365bc98340d6
2019-02-06 09:35:14 -08:00
39eab01b61 Automatic update of fbcode/onnx to 822d8df0a2a32233c6022f50a158817a0f19bdc7 (#16791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16791

Previous import was bfa8b335ab6d1ed7b688dc2ec96421a3fe9e644c

Included changes:
- **[822d8df](https://github.com/onnx/onnx/commit/822d8df)**: allow removed experimental ops in the checker for now (#1792) <Lu Fang>

Reviewed By: MisterTea

Differential Revision: D13970103

fbshipit-source-id: 5feaaa6c65ba10901eeba0b63724d7e451e9b642
2019-02-06 09:21:41 -08:00
f2e0d64775 Adding torch/lib64 in .gitignore for ppc64le CI build to pass (#16782)
Summary:
Adding torch/lib64 to .gitignore so that the git status --porcelain
check during CI build and test passes for ppc64le. During the build,
torch/lib64 is created and contains third-party libraries; it
should be ignored by the porcelain check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16782

Differential Revision: D13972794

Pulled By: ezyang

fbshipit-source-id: 5459c524eca42d396ac46e756a327980b4b1fa53
2019-02-06 09:05:49 -08:00
a3f600e394 Revert D13854304: [redo][c10] LayerNorm Registration Example
Differential Revision:
D13854304

Original commit changeset: ec463ce22721

fbshipit-source-id: 4262b9a2ef486e1c7c0283ea021331ac97cc5f56
2019-02-06 08:26:23 -08:00
fc0e88dd77 Revert D13855525: [c10] Expose RoIAlign to torch
Differential Revision:
D13855525

Original commit changeset: cfee7bb1544d

fbshipit-source-id: 0b4124b78c4082b52e592a1275069c879a9aed39
2019-02-06 08:26:22 -08:00
33a6a7a3ea Revert D13856086: [c10] Expose GenerateProposals to torch
Differential Revision:
D13856086

Original commit changeset: a4873646a71a

fbshipit-source-id: 79b634426404236ddbc407d3796a350ad3dae5ca
2019-02-06 08:26:20 -08:00
018485130f Revert D13864292: [c10] Expose BBoxTransform to pytorch
Differential Revision:
D13864292

Original commit changeset: 1f57664e7834

fbshipit-source-id: 37663b7e8213185ecaa5c219076fc7de64704549
2019-02-06 08:26:18 -08:00
c0a7bf94ed Revert D13865221: [c10] Expose BoxWithNMSLimit
Differential Revision:
D13865221

Original commit changeset: 8a3f1d420183

fbshipit-source-id: 0057be9619b660dcad8c01bae67b54400127577e
2019-02-06 08:26:17 -08:00
cda43336d4 Revert D13866214: [c10] Expose HeatmapMaxKeypoints to torch
Differential Revision:
D13866214

Original commit changeset: 2ca79037fc07

fbshipit-source-id: d2c653f4f32cf0ea76875888f3523c0dc7db9960
2019-02-06 08:26:16 -08:00
d327965dac Fix pip list format in collect_env (#16798)
Summary:
Since pip 18.0 (2018-07-22), `legacy` is no longer a valid choice for `pip list --format` as can be seen in the [Release Notes](https://pip.pypa.io/en/stable/news/#id62). Therefore, the options now are: `columns`, `freeze` and `json`. With `legacy`, this is how it looked like:

```
[...]
Versions of relevant libraries:
[pip3] numpy (1.16.1)
[pip3] torch (1.0.1)
[pip3] torchvision (0.2.1)
[...]
```

Changing to `freeze`, this is how it looks like:

```
[...]
Versions of relevant libraries:
[pip3] numpy==1.16.1
[pip3] torch==1.0.1
[pip3] torchvision==0.2.1
[...]
```

Currently, this is what happens:

```
[...]
Versions of relevant libraries:
[pip] Could not collect
[...]
```
The `freeze` option is also available in old pip, so this change is backwards compatible. Also, if we would like to keep the old style, which I don't think is necessary, I could easily change that.

 ---

In case anyone wants to know how `columns` looks like (I prefer `freeze`):

```
[...]
Versions of relevant libraries:
[pip3] numpy               1.16.1
[pip3] torch               1.0.1
[pip3] torchvision         0.2.1
[...]
```
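
A small sketch of the kind of query collect_env can now make (illustrative; not the actual collect_env code):

```python
import subprocess

out = subprocess.check_output(['pip3', 'list', '--format=freeze']).decode()
relevant = [line for line in out.splitlines()
            if line.split('==')[0] in ('numpy', 'torch', 'torchvision')]
print('\n'.join(relevant))
```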
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16798

Differential Revision: D13971793

Pulled By: soumith

fbshipit-source-id: 3721d9079a2afa245e1185f725598901185ea4cd
2019-02-06 07:48:08 -08:00
d1b2ab83fc disable default system-wide detection of gflags, glog, opencv, lmdb, leveldb (#16789)
Summary:
They can instead be enabled by the USE_* env flags (as always).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16789

Differential Revision: D13971789

Pulled By: soumith

fbshipit-source-id: d5eac9be677114be3fb15b43080faa0efdfff8ee
2019-02-06 05:13:47 -08:00
255136fc1d fix BUILD_CAFFE2_OPS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16783

Differential Revision: D13965061

Pulled By: zdevito

fbshipit-source-id: 6fe710ca51e2f338873b56f23256668ca3fe2032
2019-02-05 22:39:51 -08:00
ab035d01e3 Remove unnecessary typing import. (#16777)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16777

Differential Revision: D13969679

Pulled By: ezyang

fbshipit-source-id: d4728797a5927ae32628621c654eadb93c0e7682
2019-02-05 21:12:35 -08:00
43f4c86238 Fix alias analysis for fork/wait (#16671)
Summary:
(review top commit only).

As expected, fork/wait introduces some corner cases into the alias analysis. The comments inline should describe the changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16671

Differential Revision: D13963219

Pulled By: suo

fbshipit-source-id: 2bec6fc03a4989cf309fbb9473f3f2ffe2c31431
2019-02-05 20:43:30 -08:00
c1dff549da changes to apply xla patch (#16781)
Summary:
This PR will let xla tests passes after https://github.com/pytorch/xla/pull/183 is in.

Will add back the branch filters once it's ready.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16781

Differential Revision: D13968976

Pulled By: ailzhang

fbshipit-source-id: df3b173336b3247aa56ef723569a1f68cdfa56e0
2019-02-05 19:03:05 -08:00
db5a3c274d Tensor construction codemod (#16568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16568

In caffe2/caffe2/operators and caffe2/caffe2/fb/operators
(Resize + mutable_data) and (ResizeLike + mutable_data)
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13863416

fbshipit-source-id: 90ad3971850b89bf4b2ac81e9fa59d3bc43dc1c9
2019-02-05 18:51:02 -08:00
18edd3ab08 Warn when tracing legacy constructors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16770

Differential Revision: D13963581

Pulled By: driazati

fbshipit-source-id: 8f8cdfc455ba65be370fd952fc5e5c233525d002
2019-02-05 18:32:59 -08:00
7bf7a4162d Use torch.zeros for nn.LSTM
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16779

Differential Revision: D13963577

Pulled By: driazati

fbshipit-source-id: dc9edc3d2096760737ecbe4b3dd441ed2d53f4ad
2019-02-05 17:57:51 -08:00
c5c831953b Set SCCACHE_IDLE_TIMEOUT=1200 (#16728)
Summary:
Doubling the sccache timeout from the default of 600.

The ASAN build of #16645 will fail without this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16728

Differential Revision: D13963727

Pulled By: li-roy

fbshipit-source-id: 3614d75c1b46d663fa05b84f99d8a099283a8e64
2019-02-05 15:28:35 -08:00
448e0d78e9 Document hip-clang and its __HIP__ macro (#16771)
Summary:
In #16085, we introduced initial hip-clang bring-up code. Document the use of the __HIP__ macro now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16771

Differential Revision: D13961538

Pulled By: ezyang

fbshipit-source-id: 67f6226abcbe62e2f4efc291c84652199c464ca6
2019-02-05 15:13:52 -08:00
4404762d7d Rename IntList to IntArrayRef. (#16751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751

This was made more complicated by the fact that ivalue::IntList
is a thing, so I had to fix all of the sites where we were referring
to IValue post facto.

The following codemods were run, in this order:

```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```

Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752

Reviewed By: dzhulgakov

Differential Revision: D13954363

fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
2019-02-05 14:54:34 -08:00
e2d3a3fd6a dict values(), keys(), and len() (#16629)
Summary:
Adds some dict operations to match Python, along with tests.
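
A small sketch of my own showing the newly supported operations inside a scripted function:

```python
import torch
from typing import Dict

@torch.jit.script
def describe(d: Dict[str, int]) -> int:
    total = 0
    for v in d.values():          # values() now supported
        total += v
    return total + len(d.keys())  # keys() and len() now supported
```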
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16629

Differential Revision: D13961144

Pulled By: driazati

fbshipit-source-id: b31f27a4320ff62cd118b508fb0a13056535dc7c
2019-02-05 13:55:25 -08:00
0ceef3c9f6 Automatic update of fbcode/onnx to bfa8b335ab6d1ed7b688dc2ec96421a3fe9e644c (#16767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16767

Previous import was 875f7bbe537b9d6931d065977c192eaaf61e1179

Included changes:
- **[bfa8b33](https://github.com/onnx/onnx/commit/bfa8b33)**: [ONNXIFI]Add extension of onnxSetIOAndRunGraph (#1781) <Rui Zhu>

Reviewed By: zrphercule

Differential Revision: D13959349

fbshipit-source-id: 4876d00a3f7033cf9d89554f8b4789acd6881f72
2019-02-05 13:17:35 -08:00
0f7a0f8c83 Fix commit races on binary CI on master PR-merges (#16773)
Summary:
There is no way to test this until it is merged.

On master jobs that run after a PR is merged, there is no CIRCLE_PR_NUMBER, so the binary builds clone pytorch/pytorch master, which races.

Based off of https://circleci.com/docs/2.0/env-vars/ and the circleci checkout code
```
git config --global url."ssh://git@github.com".insteadOf "https://github.com" || true
git config --global gc.auto 0 || true

if [ -e /home/circleci/project/.git ]
then
  cd /home/circleci/project
  git remote set-url origin "$CIRCLE_REPOSITORY_URL" || true
else
  mkdir -p /home/circleci/project
  cd /home/circleci/project
  git clone "$CIRCLE_REPOSITORY_URL" .
fi

if [ -n "$CIRCLE_TAG" ]
then
  git fetch --force origin "refs/tags/${CIRCLE_TAG}"
else
  git fetch --force origin "master:remotes/origin/master"
fi

if [ -n "$CIRCLE_TAG" ]
then
  git reset --hard "$CIRCLE_SHA1"
  git checkout -q "$CIRCLE_TAG"
elif [ -n "$CIRCLE_BRANCH" ]
then
  git reset --hard "$CIRCLE_SHA1"
  git checkout -q -B "$CIRCLE_BRANCH"
fi

git reset --hard "$CIRCLE_SHA1"
```
I believe we do not use git tags.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16773

Differential Revision: D13962132

Pulled By: pjh5

fbshipit-source-id: c62d2139f38ff39ecda1509b0bcd8bd102828e40
2019-02-05 13:17:33 -08:00
a9713d07b0 Expose HeatmapMaxKeypoints to torch (#16528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16528

..

Reviewed By: smessmer

Differential Revision: D13866214

fbshipit-source-id: 2ca79037fc070bade5542345af5ce09f88beda44
2019-02-05 12:56:58 -08:00
3df7b321cc Expose BoxWithNMSLimit (#16529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16529

..

Reviewed By: smessmer

Differential Revision: D13865221

fbshipit-source-id: 8a3f1d420183ed5ae51b3c9e4eb6e033078c7ae4
2019-02-05 12:56:56 -08:00
add39b85cc Expose BBoxTransform to pytorch (#16530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16530

..

Reviewed By: smessmer

Differential Revision: D13864292

fbshipit-source-id: 1f57664e78347e72c0087aa3d825a6a9517c1945
2019-02-05 12:56:54 -08:00
f33a2b960e Expose GenerateProposals to torch (#16477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16477

expose generateproposals to torch

Reviewed By: smessmer

Differential Revision: D13856086

fbshipit-source-id: a4873646a71a6b6c01740d21729e827f4b36588f
2019-02-05 12:56:52 -08:00
f5d4636021 Expose RoIAlign to torch (#16476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16476

enable calling roialign (caffe2) from torch frontend

Reviewed By: smessmer

Differential Revision: D13855525

fbshipit-source-id: cfee7bb1544dc58df4231604ba01d61ca905ae3f
2019-02-05 12:56:50 -08:00
240240bb10 LayerNorm Registration Example (#16478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16478

This diff includes an example registration of a caffe2 op in torch.  A previous attempt ran into a static initialization order bug.

Reviewed By: smessmer

Differential Revision: D13854304

fbshipit-source-id: ec463ce2272126d08a5163d1599361ee5b718bbc
2019-02-05 12:56:48 -08:00
af4d2b889c Enable undefined at::Tensor to be passed as Output (#16730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16730

With Jerry's new updates, Tensor must be defined -- as a result, I've needed to update the shim for caffe2 ops used in PyTorch.

Reviewed By: smessmer

Differential Revision: D13946950

fbshipit-source-id: 6f77877c61a743f82bdfc2ad04d6ab583000cc18
2019-02-05 12:56:46 -08:00
9811a4220d Add XLA / TPU device type, backend type and type id (#16763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16763

Replicate the easy bits in https://github.com/pytorch/pytorch/pull/15153 with TPU / XLA instead of MSNPU. Also don't initialize the storage for XLA tensors for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16585

Reviewed By: ezyang

Differential Revision: D13912118

Pulled By: gchanan

fbshipit-source-id: 4889177e2478768fb281ed075b71146d1d850bd9
2019-02-05 12:56:44 -08:00
6efa40e07b Preserve method parameter names (#16750)
Summary:
Fixes #16591

This uses uniqueBaseName so that parameters do not end up with suffixes. It changes next_id to be per-base-name rather than global to fix jittering issues when re-importing a re-numbered graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16750

Differential Revision: D13960282

Pulled By: zdevito

fbshipit-source-id: 2156f581d9b95d77bf1f1252074e800b19116555
2019-02-05 12:51:24 -08:00
f8d4a14f6d add xla tests to enabled-configs (#16761)
Summary:
This should enable XLA tests and thus let the master XLA tests pass.
As usual, I will add the branch filters back before landing.
Thanks, ezyang!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16761

Differential Revision: D13959746

Pulled By: ailzhang

fbshipit-source-id: 7384da281d093d16edccb4283c74e47ac659eeff
2019-02-05 12:25:45 -08:00
e30d33483b Fix logging top commit of pytorch + builder in binaries for long summaries (#16766)
Summary:
I'll test with this really long summary.

(Several paragraphs of Lorem ipsum filler followed here, included only to exercise the long-summary code path.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16766

Differential Revision: D13959962

Pulled By: pjh5

fbshipit-source-id: 9b71bdf981d4fda9d8951e2d183db81f349b7f81
2019-02-05 11:30:53 -08:00
2a85d98745 Fix typo in unsupported data type error message (#16537)
Summary:
In the case where an operator does not support a given data type, an error message is emitted to alert the user; this message was incorrectly structured. This commit adds to and rearranges the error message to make it a little clearer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16537

Differential Revision: D13958859

Pulled By: zou3519

fbshipit-source-id: 935fc3adcef2f969042b1db902c9ec004488ea9c
2019-02-05 10:46:46 -08:00
963e410b57 Make tuple checks faster (#16657)
Summary:
As the comment indicates, the issue is only present in some versions of Python 2, so we should be able to use the heavily optimized PyTuple_Check in most cases, skipping allocation of the strings and unnecessary lookups on the object's type.

cc ezyang zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16657

Differential Revision: D13957854

Pulled By: ezyang

fbshipit-source-id: be32eb473ad77a0805e8247d8d583d673d4bdf25
2019-02-05 09:35:37 -08:00
85ad011843 Fixes selection of cuDNN algorithm (#15881)
Summary:
This PR updates the logic for using cudnnGet* and cudnnFind*. The current version of cudnn find and get (v7) returns a pair of the best algorithm and the convDesc mathType. While we were using the returned algorithm, we didn't update the mathType. As a result, we ended up with a slow choice of both algorithm and math type. Without this patch, we are seeing a 10x regression in group convolutions.

Changelist:
- Changed the template arguments to be `perf_t` instead of `algo_t` to unify cudnnFind and cudnnGet. Both cudnnFind and cudnnGet have the same purpose, and hence it made sense to unify them and get rid of `getAlgorithm`.
- Used cudnnGet*_v7 everywhere cudnnGet* was being used.
- Removed all cudnn6 paths (This PR depends on https://github.com/pytorch/pytorch/pull/15851)

Differential Revision: D13957944

Pulled By: ezyang

fbshipit-source-id: a88c39d80ae37f2d686665622302b62b50fab404
2019-02-05 09:30:00 -08:00
c751cf8b36 Don't throw in operator== for TypeMeta and ScalarType (#16736)
Differential Revision: D13957847

Pulled By: ezyang

fbshipit-source-id: 3cc01538aab1bbb396c29ce61e0e95118f8d011f
2019-02-05 08:56:22 -08:00
1ce188c510 logsumexp for multiple dimensions (#16475)
Summary:
Move `logsumexp` and `max_values` to `TensorIterator` and use it to make `logsumexp` work for multiple dimensions.

Timings on a tensor of shape `(10,1000000,10)`, for each combination of (cpu, single-threaded cpu, gpu) and dimension:

**before**
208 ms ± 2.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
279 ms ± 5.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
199 ms ± 2.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.11 s ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.25 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.11 s ± 6.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 ms ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
132 ms ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

**after**
199 ms ± 8.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
307 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
207 ms ± 7.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.16 s ± 8.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.26 s ± 47.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.13 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 ms ± 868 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
132 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 21.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
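
As a usage sketch of the new multi-dimensional reduction (illustrative, not taken from the PR):
```python
import torch

# Sketch of the new capability: logsumexp reducing over
# several dimensions at once.
x = torch.randn(10, 100, 10)
y = torch.logsumexp(x, dim=(0, 2))  # reduce dims 0 and 2 together
print(y.shape)  # torch.Size([100])
```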
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16475

Differential Revision: D13855746

Pulled By: umanwizard

fbshipit-source-id: aaacc0b967c3f89073487e1952ae6f76b7bd7ad3
2019-02-05 08:32:11 -08:00
4047c97266 Revert D13952085: [pytorch][PR] Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA
Differential Revision:
D13952085

Original commit changeset: 410c4e117a44

fbshipit-source-id: fca59c37e71f8e61ae52867d5401b28fbacefe5a
2019-02-05 07:42:59 -08:00
29827e1971 Integrate PyTorch quantization APIs into ensemble export modules (#309)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/309

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16481

This gives us a boolean flag `quantize` on the `BeamSearch` module that allows us to apply FBGEMM quantization to a pretrained PyTorch model and export it to the PyTorch native runtime.

Reviewed By: jmp84

Differential Revision: D13514776

fbshipit-source-id: 3f7cbff0782aae54c9623ad1ea7e66d7f49e2b32
2019-02-05 01:55:11 -08:00
0cd918f4d3 Fork/join parallelism for ensemble export modules (#310)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/310

This adds fork/join parallelism to the EncoderEnsemble and DecoderBatchedStepEnsemble models. Note that when run in Python, these calls are no-ops, and similarly we remove these calls before exporting to ONNX. But when we run in the PyTorch native runtime, we will now have the opportunity to run these sections in parallel.

Benchmark validation is pending me slogging through FBLearner Flow issues, as usual

Reviewed By: jmp84

Differential Revision: D13827861

fbshipit-source-id: 0cb9df6e10c0ba64a6b81fa374e077bce90f1d5b
2019-02-05 01:55:09 -08:00
ce15ae8f23 Add an API to set the number of threads in C10 thread pool (#16669)
Summary:
Tested locally on machine translation service
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16669

Differential Revision: D13927858

Pulled By: jamesr66a

fbshipit-source-id: efcb8c21e0c2f76ac37967e6f52967da515595c3
2019-02-05 00:15:56 -08:00
3796cbaf7a Try to turn off zero-out of tensors fully
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16601

Reviewed By: ezyang

Differential Revision: D13893776

fbshipit-source-id: 3190258f2591540dc54ad8504ac6ded998bef384
2019-02-04 23:59:11 -08:00
ae5fd10b02 Tensor method rename size()->numel() - 2/3 (#16745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16745

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: dzhulgakov

Differential Revision: D13944353

fbshipit-source-id: 25c2ca22204706544ee67e59c663bf495f2b4f6b
2019-02-04 23:59:10 -08:00
3df91ceb5e Tensor method rename size()->numel() - 3/3 (#16747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16747

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: dzhulgakov

Differential Revision: D13944380

fbshipit-source-id: 2167e2092ab27d31a4d5ef6cfa4b65d192f597a8
2019-02-04 23:54:33 -08:00
cb9740a608 Tensor method rename size()->numel() - 1/3
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: dzhulgakov

Differential Revision: D13944296

fbshipit-source-id: 67e97c2cf45889d25f2cb3e2203cecba03c8a3aa
2019-02-04 23:33:17 -08:00
a7a2618d51 Bug fix in l2 quantization (#16749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16749

Use global quantization options in l2 quantization

Reviewed By: jspark1105

Differential Revision: D13951378

fbshipit-source-id: d4e356149587e5d2d09a6937c7fa1aa131957fd6
2019-02-04 22:31:38 -08:00
b1822966ee points-to graph simplification (#16605)
Summary:
This PR reworks the mutability API to be simpler (updates passes to use "mayAlias" calls) and improves the caching logic.

The difference is that we now directly express the idea of a "memory location." Leaves in the alias tracker's points-to graph are considered unique memory locations, and mayAlias questions boil down to whether two values share a leaf.

To speed up queries, some basic path compression has been added.
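
A toy Python sketch of the shared-leaf idea (purely illustrative; the real implementation is C++):
```python
# Values point to other values, leaves stand for unique memory
# locations, and two values may alias iff their leaf sets intersect.
def leaves(graph, v, seen=None):
    seen = set() if seen is None else seen
    if v in seen:
        return set()
    seen.add(v)
    succs = graph.get(v, [])
    if not succs:                # a leaf: a unique memory location
        return {v}
    result = set()
    for s in succs:
        result |= leaves(graph, s, seen)
    return result

def may_alias(graph, a, b):
    return bool(leaves(graph, a) & leaves(graph, b))

# "a" and "b" both point to "c", so they may alias.
g = {"a": ["c"], "b": ["c"], "c": []}
assert may_alias(g, "a", "b")
```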
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16605

Differential Revision: D13952738

Pulled By: suo

fbshipit-source-id: cfc7fb2b23369f1dc425d1d8ca2c753c193d95dd
2019-02-04 22:04:25 -08:00
6c04224cd8 Revert "Move outplace ops to ATen (#12413)" (#16731)
Summary:
This reverts commit f660d3ae19decc64390e894fbaf8de80d87585e0.

cc zasdfgbnm

Reasoning at https://github.com/pytorch/pytorch/pull/12413#issuecomment-460424129
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16731

Differential Revision: D13948022

Pulled By: ezyang

fbshipit-source-id: b10669cf03679e306850314b7b5b08bed0839e19
2019-02-04 19:30:04 -08:00
1409a2afc8 Automatic update of fbcode/onnx to 875f7bbe537b9d6931d065977c192eaaf61e1179 (#16734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16734

Previous import was 15c33c945851907411619f599900c3852108e7e3

Included changes:
- **[875f7bb](https://github.com/onnx/onnx/commit/875f7bb)**: Bump docker image version from 230 to 238 (#1786) <bddppq>
- **[f94e430](https://github.com/onnx/onnx/commit/f94e430)**: Fix: setup.py is using wrong cmake build type (#1784) <Changming Sun>
- **[2896c77](https://github.com/onnx/onnx/commit/2896c77)**: Fix Cast testcase data (#1776) <Raymond Yang>

Reviewed By: bddppq

Differential Revision: D13948288

fbshipit-source-id: 5f733005d4bf483d58b630d511cadb0fa4ac7910
2019-02-04 17:37:40 -08:00
3f570b5eea Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA (#16705)
Differential Revision: D13952085

Pulled By: soumith

fbshipit-source-id: 410c4e117a44c08eadc6f3ded91fafc320a7c696
2019-02-04 16:51:12 -08:00
846a64e805 Tensor method rename ndim()->dim() - 1/3 (#16678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16678

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: houseroad

Differential Revision: D13929413

fbshipit-source-id: 677ce760bdbf9f5560630fdc40dd60af227fb696
2019-02-04 15:49:16 -08:00
9e31d6dbf1 Merge job-spec env variables of Pytorch/Caffe2 CI jobs (#16649)
Summary:
The idea is to unify the environment variables `JOB_BASE_NAME` and `BUILD_ENVIRONMENT`, which controlled the Pytorch and Caffe2 jobs respectively. In this commit, we have converted all the `JOB_BASE_NAME` references in _.jenkins/pytorch/*_ files to `BUILD_ENVIRONMENT`. Then, we did the same thing in _.circleci/config.yml_. One thing we needed to be careful about was when both `BUILD_ENVIRONMENT` and `JOB_BASE_NAME` were present under the same declaration in the _config.yml_ file (e.g., for "caffe2-" stuff). To ensure that all "==" checks work as expected, we also had to add "*" in some if conditions in the _.jenkins/caffe2/build.sh_ file. Finally, we removed the "-build", "-test", etc. suffixes from the `COMPACT_JOB_NAME` variable assignment in the bash script files in the _.jenkins/pytorch_ folder, e.g., modifying `COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"` to `COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16649

Differential Revision: D13946392

Pulled By: mmh683

fbshipit-source-id: 790de6abf96de184758e395c9098a50998e05bc5
2019-02-04 15:37:44 -08:00
b250385811 Log top commit of pytorch + builder in binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16729

Differential Revision: D13947737

Pulled By: pjh5

fbshipit-source-id: 9ba8ea56baff7147f73458ab26d0553fff31a46f
2019-02-04 14:30:44 -08:00
c15ed3a2f2 Run resnext101 training in rocm benchmark (#16017)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16017

Differential Revision: D13946680

Pulled By: bddppq

fbshipit-source-id: ea125b0389188a59db3d537671a3214a557aecdb
2019-02-04 14:16:25 -08:00
6d407baedf Replace resize_dim() with set_sizes_and_strides() in THTensor_(unsqueeze1d) in aten/src/TH/generic/THTensor.cpp (#16673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16673

Replace resize_dim() with set_sizes_and_strides() in THTensor_(unsqueeze1d) in aten/src/TH/generic/THTensor.cpp, as described in T38058642.

Reviewed By: ezyang

Differential Revision: D13928879

fbshipit-source-id: d593cebcc82589cd362ac78884d4e367d0da0ce6
2019-02-04 12:32:14 -08:00
db4235f31d Tensor method rename ndim()->dim() - 2/3 (#16679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16679

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: houseroad

Differential Revision: D13929450

fbshipit-source-id: fcc222744c28b41f2cedffc0c2ef5d04aceaa5af
2019-02-04 11:12:57 -08:00
73db487a8e Update the cmake build configuration for AppleClang compiler (#15820)
Summary:
This pr try to merge the https://github.com/pytorch/pytorch/pull/11563 again and fix the linking error in https://github.com/pytorch/pytorch/pull/14837.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15820

Differential Revision: D13942024

Pulled By: ezyang

fbshipit-source-id: dc6d1e9c4b0f177914f3745665244272a03ce33c
2019-02-04 08:53:47 -08:00
dc528fd734 Fix build with cuda but no cudnn in caffe2 (#16701)
Summary:
Just noticed while building on a machine without cudnn present - it built, but the runtime failed since some methods weren't bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16701

Differential Revision: D13937247

Pulled By: dzhulgakov

fbshipit-source-id: c81f05be7a9e64a1a8591036dcf8692c0ed4064e
2019-02-03 22:14:51 -08:00
da24749e8d Fix ReservoirSampling zero-initialization reliance (#16702)
Summary:
The op was implicitly relying on pos_to_output being zero-initialized after extending. We're removing this functionality from the allocator, so we fix it here. For some reason it wasn't spotted by junk-initialization, but it was reliably reproducible with standard malloc() if both junk_fill and zero_fill flags are turned off.

cc kittipatv jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16702

Reviewed By: kittipatv

Differential Revision: D13937257

Pulled By: dzhulgakov

fbshipit-source-id: 3ee520b05467108e6c3e64eb3e6c60589bdf3d87
2019-02-03 21:31:56 -08:00
6cb593b88c Remove --without-parallel (#16704)
Summary:
See homebrew/homebrew-core@60c72ba9 and homebrew/homebrew-core#31510.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16704

Differential Revision: D13938093

Pulled By: pietern

fbshipit-source-id: 8a70d462180257f96202a0373a86a273b524045c
2019-02-03 13:39:26 -08:00
a53d28dd87 Bump gloo (#16638)
Summary:
This bump includes:
* Memory leak fix where the Gloo transport would hold on to auxiliary
  structures for send/recv pairs after they finished.
* Fix write-after-free from Gloo thread during stack unwinding on error.
* Removal of the PATENTS file.

Fixes #16144.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16638

Differential Revision: D13937950

Pulled By: pietern

fbshipit-source-id: 3cfecaf13ee0f214c06681386557a4b1c3e1d6b9
2019-02-03 11:52:31 -08:00
6d86bc7c3f Fix issue with scalars and __rpow__ (#16687)
Summary:
Changelog:

- Modify the `__rpow__` function in tensor.py to handle scalar bases
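
A small sketch of the behavior this enables (scalar base, tensor exponent):
```python
import torch

# A Python scalar raised to a tensor exponent dispatches through
# Tensor.__rpow__, which this change makes work for scalars.
exponents = torch.tensor([1.0, 2.0, 3.0])
print(2 ** exponents)  # tensor([2., 4., 8.])
```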
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16687

Differential Revision: D13936720

Pulled By: soumith

fbshipit-source-id: b0c8727968b04efbc6e7461807c812d962f03370
2019-02-02 18:55:51 -08:00
4c16ea93d1 Improve LeftRight (#16524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16524

- Make it exception safe. When an exception happens during write, the old state is recovered.
- Use RAII instead of try/catch to increment counters in readers. This is more readable, and it also makes it work with reader closures that return void, which previously didn't work because the reader return value was stored on the stack.
- Assert there's no reads or writes happening when it's destructed to avoid destruction race conditions
- Explain the algorithm in detail in comments
- Add test cases

Reviewed By: ezyang

Differential Revision: D13866609

fbshipit-source-id: 01306a282a3f555569caa13d8041486f960d00e2
2019-02-02 16:33:27 -08:00
39a55f4ea6 Updating submodules
Reviewed By: zpao

fbshipit-source-id: e66e01e164d1784740fcb8bebc4817d2a8cd7903
2019-02-02 16:33:25 -08:00
988647b21c Updating submodules
Reviewed By: zpao

fbshipit-source-id: 31a8d843ffba2d7405b4742ea553937a00dff216
2019-02-02 05:05:07 -08:00
44809fda84 fix conditional in mean workaround (#16686)
Summary:
When trying to get a test to pass, I was missing an exclamation mark. Instead, I now just use a different function in the conditional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16686

Differential Revision: D13935182

Pulled By: jamesr66a

fbshipit-source-id: 7525a1a829276641dbafe06734f03f6202df6b22
2019-02-02 00:55:58 -08:00
7d4a81cbb2 Use macro for reduce on 2d blocks (#16344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16344

Use macro for reduce on 2d blocks

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13808988

fbshipit-source-id: b68c0fb6079c1b6e203a072083aba7a95c202bc2
2019-02-01 23:49:07 -08:00
f36f3cce9a Simplify layer_norm_op_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16570

Reviewed By: ezyang

Differential Revision: D13883913

fbshipit-source-id: 7437d3cbc00c0de92bb01562c620cb658aa9f0d3
2019-02-01 21:34:18 -08:00
13db5dbb81 Make predictor base class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16541

Reviewed By: ajtulloch

Differential Revision: D13858261

fbshipit-source-id: acbfdbea59bd20ab1cc7956ee0d8856d6faa8361
2019-02-01 20:59:19 -08:00
98b333d810 Tag model_id and onnxifi index in OnnxifiOp (#16648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16648

We added onnxGraph sharing keyed on model id and net seq number, but we forgot to supply this info to Onnxifi. Therefore, we would only ever create ONE onnxGraph... This diff adds the necessary info to the OnnxifiOp to prevent this from happening.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D13912356

fbshipit-source-id: fe8982327287a35f32fe3b125d94b617d18c0ab5
2019-02-01 18:51:51 -08:00
a4ac3cbb2f Updating submodules
Reviewed By: zpao

fbshipit-source-id: ed389204bc423d2d5f7a36e2d61c0f55fe0522e1
2019-02-01 18:46:03 -08:00
c865d46736 Add @ignore annotation (#16055)
Summary:
Adds a decorator `torch.jit.ignore` for Python functions that tells the compiler to skip over these Python values, putting a `prim::Error` in their place which always throws an exception when run.

This lets you have Python-only code in your model in an explicit way, which is useful for debugging, and still be able to save/load the model.
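
A rough usage sketch, assuming the decorator form described above (the helper function is hypothetical):
```python
import torch

# Per the summary above, the ignored function stays Python-only; in a
# scripted caller it is compiled as prim::Error and raises if executed.
@torch.jit.ignore
def debug_hook(x):
    print("python-only debugging", x)

@torch.jit.script
def f(x):
    if bool(x.sum() < 0):
        debug_hook(x)  # raises only if this branch is ever taken
    return x + 1
```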

Fixes #15815
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16055

Differential Revision: D13797286

Pulled By: driazati

fbshipit-source-id: 29d36776608ec101649a702952fc6ff3c27655b1
2019-02-01 16:46:12 -08:00
31ab03e34f Add Winograd Conv method for CPU (#15196)
Summary:
Add winograd conv method. Users can select the direct conv or the winograd conv in the model file.
We closed the original PR https://github.com/pytorch/pytorch/pull/12154 and created this new one for better rebasing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15196

Differential Revision: D13463721

Pulled By: yinghai

fbshipit-source-id: c5cd5c8aa7622ae7e52aeabd3dbb8ffb99b9b4ee
2019-02-01 16:41:30 -08:00
69a816c060 Increase timeout on anaconda logins
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16682

Differential Revision: D13931438

Pulled By: pjh5

fbshipit-source-id: 9961e91a80d8c59ab6347e830b1da38533524dd2
2019-02-01 16:33:51 -08:00
dc64d95f3a Tensor method rename ndim()->dim() - 3/3 (#16680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16680

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: houseroad

Differential Revision: D13929471

fbshipit-source-id: b284ead11031f96fd8b6d96d2f29ffeb14207faa
2019-02-01 16:28:28 -08:00
9594d9bcfd fix the ONNX ci (#16674)
Summary:
~~Let's see whether this triggers and fixes the problem~~

remove the expect files from test_verify
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16674

Reviewed By: zrphercule

Differential Revision: D13930668

Pulled By: houseroad

fbshipit-source-id: 092157af07f475cf3809c95a4fe586e050c53b7e
2019-02-01 15:58:39 -08:00
7139410b72 Allow USE_NINJA to be toggled by an env variable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16665

Differential Revision: D13930021

Pulled By: pjh5

fbshipit-source-id: 4b490f952a56e8561329ab8898be2bf779b46b9d
2019-02-01 15:33:06 -08:00
bd75fba4e8 fix tracing using a dictionary as input (#16616)
Summary:
Previously this would fail with the error message:
```
ValueError: Auto nesting doesn't know how to process an input object of type dict. Accepted types: Tensors, or lists/tuples of them
```
Turns out we're not using the line that causes this error (or a side effect of that line), so removing it fixes the issue. Also cleaned up some related dead code (cc apaszke to make sure the code isn't useful in some way)
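
A minimal sketch of the previously failing pattern (module and key names are illustrative assumptions):
```python
import torch

# Tracing a module whose forward takes a dict input, the case this fixes.
class M(torch.nn.Module):
    def forward(self, d):
        return d["x"] + d["y"]

inputs = {"x": torch.ones(2), "y": torch.zeros(2)}
traced = torch.jit.trace(M(), (inputs,))
print(traced(inputs))
```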
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16616

Differential Revision: D13908352

Pulled By: suo

fbshipit-source-id: 27094f1f4ea0af215b901f7ed3520e94fbc587b3
2019-02-01 14:44:56 -08:00
aaa8ace486 Implement new c10 dispatcher (#16625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16625

This is a squash of multiple PRs that refactored the old c10 dispatcher into a new one that follows the c10 dispatcher design doc.
It is now unboxed and follows the Stack semantics from JIT. It also uses the runtime JIT schema instead of its own compile time schema definitions.

Reviewed By: ezyang

Differential Revision: D13907069

fbshipit-source-id: edcc4806ccd21474fdfb5a98516219b1956db13d
2019-02-01 13:52:01 -08:00
a40e8ce7c5 Add train() / eval() / is_training() to C++ ScriptModule API (#16044)
Summary:
This PR aims to fix https://discuss.pytorch.org/t/how-to-change-a-loaded-model-to-evaluation-mode-in-c/32330, by adding `train()` / `eval()` / `is_training()` to C++ ScriptModule API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16044

Differential Revision: D13857724

Pulled By: yf225

fbshipit-source-id: 16d3969fb5840ff7e66c7f72e800e6c75db8d2ff
2019-02-01 13:07:38 -08:00
6d373c02ef Revert "Fixes selection of cuDNN algorithm (#15881)" (#16484)
Summary:
There is a regression in cudnnGet*_v7 that causes a slowdown in resnet50 training. I am opening a bug with the cuDNN team about this. This reverts commit 38374468832e307ca741901870914857a836dd5d.

ezyang 😿
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16484

Differential Revision: D13924755

Pulled By: soumith

fbshipit-source-id: 8c719345fc443f1289539bfae630eea9224ba4a5
2019-02-01 13:07:36 -08:00
638dbe4b46 Revert "Upgrade mkl-dnn to v0.17.3 to fix core dump issue (github#161… (#16660)
Summary:
…83) (#16653)"

This reverts commit 87ae1558a6c8c7c0693bfa995458d16239c484d7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16660

Differential Revision: D13924272

Pulled By: soumith

fbshipit-source-id: 79747d728adff1a9c32d8529846f0305052e57e8
2019-02-01 11:12:16 -08:00
4c803f4ebd Expose backend extensions to python
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16582

Reviewed By: gchanan

Differential Revision: D13887539

fbshipit-source-id: 8755babf2e3e849af974655f2f3a91740efe977e
2019-02-01 11:00:18 -08:00
7e642dfff3 Introduce backend extensions (overriding operators on custom backends)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15153

Reviewed By: gchanan

Differential Revision: D13445571

fbshipit-source-id: 62e2ebe0a6e81c4983b47cddb57ee5eb78e96708
2019-02-01 11:00:16 -08:00
64186e06ec Dispatch factory functions on Type (#15093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15093

Needed for backend extensions.

Reviewed By: ezyang

Differential Revision: D13427897

fbshipit-source-id: d0b34b0072e597ae599bd3bc25356831d7a18d6a
2019-02-01 11:00:15 -08:00
d29912f59e Only run Travis on master branch, not on export-DXXXXX branches. (#16628)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16628

Differential Revision: D13922097

Pulled By: ezyang

fbshipit-source-id: eb16d90cc61167af5edc0c4e361d7a807a3099e5
2019-02-01 09:31:46 -08:00
3672f1536e Ignore assert_git_not_dirty for xla tests (#16611)
Summary:
Testing, will restore the branch filter before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16611

Differential Revision: D13902234

Pulled By: ailzhang

fbshipit-source-id: 7fa4048b891645f5253c48b905fb9630e3079524
2019-02-01 08:56:44 -08:00
7078b2baf5 Better bounds checks in ctcloss (#16269)
Summary:
Adds better bounds checks for target lengths in CTC loss, checks for integral types for target and prediction lengths, and adds tests for each, according to #15946
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16269

Differential Revision: D13847567

Pulled By: ezyang

fbshipit-source-id: 5d7a975565e02baf78fe388813a1d1ef56dfb212
2019-02-01 08:02:54 -08:00
87ae1558a6 Upgrade mkl-dnn to v0.17.3 to fix core dump issue (github#16183) (#16653)
Summary:
Upgrade mkl-dnn to 0.17.3 to fix core dump issue in #16183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16653

Differential Revision: D13918278

Pulled By: soumith

fbshipit-source-id: b9c09c50ef188b4099966216e155c9f3f2542276
2019-02-01 07:16:46 -08:00
10cd9d5a03 Skip dag_net_forking test on Rocm (#16639)
Summary:
- Skip the test due to flaky behavior on AMD/ROCm.
- The fix is expected in ROCm 2.2 (HSA runtime).
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16639

Differential Revision: D13915231

Pulled By: bddppq

fbshipit-source-id: 66e1d275836337170b15ceb9d60cfdd3242d4df8
2019-02-01 00:53:54 -08:00
b67b29b667 add SingleLoadedNetSupplier (#16620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16620

LogfiledbNetLoader loads all external input blobs into a workspace instance; we pack a shared pointer to this loaded workspace into the SingleLoadedNetSupplier.
SingleLoadedNetSupplier will pass this workspace to BlackBoxPredictor to be executed. (D13891759 is a WIP of how it all comes together)

Reviewed By: pjh5

Differential Revision: D13901467

fbshipit-source-id: 20589f898922f5f1aec50be131dad17a8c38e9b2
2019-01-31 23:51:59 -08:00
4ae9ab24b6 Update conv_base to support empty batch (#16603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16603

Update conv_base to support empty batch

Reviewed By: houseroad

Differential Revision: D13894111

fbshipit-source-id: fc4370ff16ba6046f374e77bd845d28e6af05ea3
2019-01-31 23:46:18 -08:00
b0e692c8a6 Improving docs for MultiLabelSoftMarginLoss (#16644)
Summary:
Resolves #15863

Changed the documentation for MultiLabelSoftMarginLoss and MultiLabelMarginLoss to be more explicit about the `target` format.

More than happy to change the messaging based on discussion.
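
For concreteness, a small sketch of the multi-hot `target` format being documented:
```python
import torch
import torch.nn as nn

# (N, C) multi-hot target: one row per sample, one column per class,
# entries in {0, 1}.
loss_fn = nn.MultiLabelSoftMarginLoss()
logits = torch.randn(2, 4)
target = torch.tensor([[1., 0., 1., 0.],
                       [0., 1., 0., 0.]])
print(loss_fn(logits, target))
```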
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16644

Differential Revision: D13912395

Pulled By: soumith

fbshipit-source-id: 24a3c214c5f6f9d043e25b13ac758c1c1211b641
2019-01-31 22:07:14 -08:00
536f647bae respect MAX_JOBS (#16641)
Summary:
We inadvertently switched the OSX build over to ninja on CI. It then fails to respect MAX_JOBS and hits the same sccache deadlock bug; this change makes the ninja build respect MAX_JOBS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16641

Differential Revision: D13910751

Pulled By: zdevito

fbshipit-source-id: 61bec500539519b019b74421a13cd87fc1d86090
2019-01-31 20:55:37 -08:00
6ba4ca8780 Workaround unvectorized mean implementation (#16618)
Summary:
Workaround for https://github.com/pytorch/pytorch/issues/16617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16618

Differential Revision: D13904276

Pulled By: jamesr66a

fbshipit-source-id: f8b5ea4c5f12dbc405123c9080c55b342c95bcd1
2019-01-31 20:50:53 -08:00
2486facc34 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 4d94eb18d4da58541a96c9f9c2ecc9746f779933
2019-01-31 19:37:24 -08:00
e48ffa84d8 Add compare_exchange_deleter to DataPtr/UniqueVoidPtr (#16513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16513

compare_exchange_deleter makes it easier to replace a
deleter on a DataPtr with a new one, without requiring
the allocation of another closure to hold the old deleter.
See the comment for details.

This diff was originally landed as part of D13762540
(#16226) but we are reverting that diff D13863610 (#16510)

Reviewed By: smessmer

Differential Revision: D13864245

fbshipit-source-id: 56eda4748238dd3a5130ba6434fda463fe7c690e
2019-01-31 17:40:04 -08:00
e4c1b51d82 Shim caffe2 GetRepeatedArgument helper for use with Ivalue (#16519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16519

GetRepeatedArguments is needed for some ops

Reviewed By: dzhulgakov

Differential Revision: D13864293

fbshipit-source-id: a39255cd391c28acd75a6f0e81d558542417e032
2019-01-31 17:33:57 -08:00
13422fca32 Add torch.backends.openmp.is_available(); fix some cmake messages (#16425)
Summary:
1. add `torch.backends.openmp.is_available()` (see the sketch after this list)
2. Improve various `cmake` outputs
3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets
4. Fix `MKL` warning message, and QUIET flag.
5. Fix various typos
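
A quick sketch of the query added in item 1 (assuming a standard build):
```python
import torch

# Report whether this build of PyTorch was compiled with OpenMP support.
print(torch.backends.openmp.is_available())
```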
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425

Differential Revision: D13903395

Pulled By: soumith

fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d
2019-01-31 16:15:46 -08:00
f660d3ae19 Move outplace ops to ATen (#12413)
Summary:
So that things like below can be JITable, and available in C++ API:

```python
import torch

@torch.jit.script
def f(x, y, z):
    x.index_add(0, y, z)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12413

Differential Revision: D13899948

Pulled By: suo

fbshipit-source-id: b0006b4bee2d1085c813733e1037e2dcde4ce626
2019-01-31 16:09:45 -08:00
6e17f4a126 Grant credentials to s3 html update job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16631

Differential Revision: D13908331

Pulled By: pjh5

fbshipit-source-id: 846a4f933d947f7217b856bd79ff85b7f97288a8
2019-01-31 15:59:31 -08:00
d5d7718770 fix scope related naming issue in build_quant_conv_bn_relu, and also format function signature
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14885

Reviewed By: harouwu

Differential Revision: D13374077

fbshipit-source-id: 5082c4ea0d2fdc197243b022b9b489f38b04c8e9
2019-01-31 15:53:27 -08:00
51752e09c6 Disable layernorm_c10 test for now (#16630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16630

Two PRs landed concurrently - enforcing tensor constraints and refactoring c10. Since it's not prod code, we disable the test and let Sebastian fix it properly.

Reviewed By: ezyang

Differential Revision: D13908117

fbshipit-source-id: 381c5626078b794afa1fc7a95cb1ea529650424c
2019-01-31 15:47:13 -08:00
a386c28fcd Remove constant propagation expect files (#16348)
Summary:
Remove constant prop expect files, and express graph conditions via python bindings.

First diff in a larger effort to remove expect files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16348

Differential Revision: D13906929

Pulled By: eellison

fbshipit-source-id: 7963caa3ccbc7bfc0006a160c952aa173d1ce633
2019-01-31 15:41:22 -08:00
dfb081a7e4 Fix a lot of C++ build warnings (#16411)
Summary:
I went through my build log and did what I thought were reasonable fixes to all the C++ compilation warnings that came up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16411

Differential Revision: D13901006

Pulled By: jamesr66a

fbshipit-source-id: 02df4e3e5a5c8dd9e69ac9f065cd3f2a80645033
2019-01-31 14:35:56 -08:00
3f8fd19a86 Add immutable dict support (#16208)
Summary:
This PR adds basic support (creation and indexing) for immutable dictionaries in Script. This includes Python/string frontend support and a `IValue::GenericDict` type backed by a `std::unordered_map`. Only `str`, `int`, and `float` are supported as keys, any type can be a value. Structure is pretty similar to list.
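
A minimal sketch of dict creation and indexing in Script under the constraints above (assuming Python dict-literal syntax in the frontend):
```python
import torch

# Create and index an immutable dict inside TorchScript.
# Keys must be str, int, or float; values can be any type.
@torch.jit.script
def pick(x):
    d = {"lo": x - 1, "hi": x + 1}  # str keys, Tensor values
    return d["hi"]

print(pick(torch.zeros(2)))
```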
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16208

Differential Revision: D13881686

Pulled By: driazati

fbshipit-source-id: 29ce9835b953c3456f57bcc2bbdf7fe0cbf941c0
2019-01-31 14:29:23 -08:00
4bdf51cbd6 Make the miopen handle part of ConvolutionParams (#16613)
Summary:
so that it's included in the hashed key that decides whether to call Find or not. This is required to ensure that Find is run for all devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16613

Differential Revision: D13901769

Pulled By: bddppq

fbshipit-source-id: 7d29ea9e40231cd4eef80847afa1307efeb0945c
2019-01-31 14:09:04 -08:00
a061e3fd77 Back out "Revert D13596031: Improve c2-aten tensor interop and add proper testing" (#16514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16514

Original commit changeset: dc371697f14b
Relanding https://github.com/pytorch/pytorch/pull/15860 - the problem was that layer_norm was using at::empty which is not yet on mobile

Reviewed By: ezyang

Differential Revision: D13861480

fbshipit-source-id: e2116da32bc117175c96b9151b1beba9b31eff36
2019-01-31 13:38:55 -08:00
0b29bd82f6 use distutils to discover msvc compiler paths (#16540)
Summary:
This simplifies the process for building on Windows, since users no longer have to find and run the vcvarsall.bat file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16540

Differential Revision: D13893596

Pulled By: zdevito

fbshipit-source-id: 79b7ad55c3251b3f573fd8464931138f8a52dd1d
2019-01-31 13:25:33 -08:00
1ff46f03ed Fix SIOF in torch using caffe2 registry (#16473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16473

This resolves the issues associated with caffe2 initialization (specifically the REGISTER_FUNCTION_SCHEMA_OPERATOR calls) being run after Torch's static op registration calls.

The fix employs a meyer's singleton wrapped by the constructor of a type.  Everything is placed inside a macro to make it easier for users to use.

Reviewed By: smessmer

Differential Revision: D13854306

fbshipit-source-id: ecf60861f229532826fae254974e9af4389055df
2019-01-31 13:04:11 -08:00
1efad7f6be Swap Caffe2 operator constructor to pass arguments by value (#16576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16576

Allows instantiation of an operator with arguments passed by move rather than by explicit copies, per Sebastian's suggestion.

Reviewed By: smessmer

Differential Revision: D13882416

fbshipit-source-id: bc8d50e73f5a1ae87155b0cf96799b8573a7a8fa
2019-01-31 13:04:09 -08:00
26565046ac Allow ScriptModule(optimize=False) when jit disabled (#16297)
Summary:
Fixes #16285
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16297

Differential Revision: D13797276

Pulled By: driazati

fbshipit-source-id: 3a93500d4233cfbb8f5af7feba43f6ff4c3d22c7
2019-01-31 12:29:15 -08:00
20d45c43d7 Get more fusion after autodiff uses SumToSize (#14957)
Summary:
Here is a fresh attempt at getting some fusion back in autodiff-generated graphs in the presence of SumToSize.

- The sum to size operator is now  `aten::_grad_sum_to_size` to allow symbolic script differentiation (and that in turn would need to use this in place of sum_to_size to signal that it strictly operates on gradients). This is also used in the autodiff code, replacing `prim::SumToSize`.
- `_grad_sum_to_size` is now fusable, `cat`s - which are fused afterwards thanks to Adam's simplification of the code - are only fused if there is no `_grad_sum_to_size` in the fusion group.
- I push the `_grad_sum_to_size` out of the fusion group when compiling and record the desired summations in the KernelSpec. The reasoning is the following:
  - As the autodiff is a repeated application of the chain rule, we always have the pattern `grad_in = mm(A, grad_out)`, with A often diagonal for cases interesting to the fuser, whence it is `grad_in = a * grad_out` (a pointwise multiplication); see the sketch below. We know that only `grad_out` may have AutodiffGradSumToSize applied, so we can commute AutodiffGradSumToSize with the `mul` (and `div` and `neg` are of similar origin).
  - For `type_as` the gradient might be giving the type, so just skip SumToSize,
  - `add` (which was inserted as `prim::AutogradAdd`) adding gradients when the forward used the same value in several places. This is non-broadcasting, so we know that the two arguments would have the same sizes as inputs - which is good so we don't have to do bookkeeping of the two parts.

Details:
- During fusion, the Tensor arguments are always kept as the first parameters of the fusion group to accommodate indexing assumptions in the fuser.
- The rewriting of the fusion group to record the necessary output transformation and eliminate `_grad_sum_to_size` from the fusion group is now in the fuser compile step.
- In the execution step, the arguments are split into Tensor / Non-Tensor and the non-tensor args are mostly forgotten about except for doing `sum_to_size` at the end. This would want to be improved if/when we fuse nonconstant scalar arguments.
- In a number of places in the fuser, the non-Tensor arguments to the fusion group needed to be ignored.

Thank you, apaszke for the insightful discussion. All bad ideas and errors are my own.
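
For intuition, a small sketch of why gradients need sum-to-size at all (plain autograd, outside the fuser):
```python
import torch

# When the forward broadcasts, the backward must reduce the gradient
# back to each input's shape -- the job _grad_sum_to_size performs
# inside autodiff graphs.
a = torch.randn(3, 1, requires_grad=True)
b = torch.randn(3, 4, requires_grad=True)
(a + b).sum().backward()
print(a.grad.shape)  # torch.Size([3, 1]): summed back to a's size
```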
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14957

Differential Revision: D13888173

Pulled By: zou3519

fbshipit-source-id: 071992c876e8b845f2b3e6329ae03a835d39a0ea
2019-01-31 12:24:38 -08:00
4b7e70067c Enable USE_NINJA in build_pytorch_libs.py if it is in PATH (#16545)
Summary:
It is required to fix the nightly conda builds.
cc zdevito ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16545

Differential Revision: D13900610

Pulled By: soumith

fbshipit-source-id: 676f903a082f6f083e07245a1df38175bb82b2f7
2019-01-31 11:57:11 -08:00
b109549bf3 Replaced "from_numpy" with "as_tensor" in docs. (#16587)
Summary:
In the warning box on https://pytorch.org/docs/stable/tensors.html#torch.Tensor.new_tensor it says:

> new_tensor() always copies data. [...] If you have a numpy array and want to avoid a copy, use **torch.from_numpy()**.

But then further up the page we have another warning box with the message:

> torch.tensor() always copies data. [...] If you have a numpy array and want to avoid a copy, use **torch.as_tensor()**.

Now I believe this is just a small oversight, since from_numpy is to be deprecated in favour of as_tensor. See for example https://github.com/pytorch/pytorch/issues/6885 and https://github.com/pytorch/pytorch/issues/8611. I suggest just using **torch.as_tensor()** in both of the warning boxes.
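
For reference, a small sketch of the no-copy behavior both warning boxes point to:
```python
import numpy as np
import torch

# torch.as_tensor shares memory with the numpy array when the dtype
# and device allow it, so mutations are visible on both sides.
arr = np.array([1.0, 2.0, 3.0])
t = torch.as_tensor(arr)
arr[0] = 42.0
print(t[0])  # tensor(42., dtype=torch.float64)
```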

cc gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16587

Differential Revision: D13897038

Pulled By: gchanan

fbshipit-source-id: 2eb3cd47d2c0b5bf4350f980de3be9fe59b4a846
2019-01-31 11:51:32 -08:00
482d3a3bf3 printing correct dimension while indexing (#16495)
Summary:
applySelect modifies the tensor and removes the topmost dimension, which makes it complicated to track the dimension using dim alone; we need another parameter, real_dim, to signify the original dimension.
fixes #16192
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16495

Differential Revision: D13897182

Pulled By: gchanan

fbshipit-source-id: 105581dbbff6b431cc8e2539a07e0058161e53a1
2019-01-31 11:45:56 -08:00
32daa90fbd remove unused capture (#16526)
Summary:
We don't use this in the lambda body anymore. Remove it to fix a warning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16526

Differential Revision: D13867043

Pulled By: umanwizard

fbshipit-source-id: 4c9a9d194fdfcb63fde16823517d2c6c8e2ae93d
2019-01-31 11:11:55 -08:00
72a431edce split up AliasTracker into a separate file (#16588)
Summary:
This just moves thing around to make AliasTracker independently testable and keep things a little more separate. Follow-on PRs will change the interfaces of AliasDb and AliasTracker to be more clearly distinct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16588

Differential Revision: D13891894

Pulled By: suo

fbshipit-source-id: c5b590b5fdd462afefe743e499034068bf35784a
2019-01-31 10:53:53 -08:00
e7e3838f3b Access profiler from cpp (#16580)
Summary:
jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16580

Differential Revision: D13891299

Pulled By: zdevito

fbshipit-source-id: 83b335bf3231a9ab30e9318f2bce6d741ba5ffae
2019-01-31 10:37:47 -08:00
d2861230f3 Fix cuFFT plan cache size on CUDA 10 cannot be set to > 4096 (#16384)
Summary:
Doc doesn't need to be changed. Also clarifies two inaccurate comments.
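
A sketch of the setting this fix repairs (assumes a CUDA build; the value is illustrative):
```python
import torch

# On CUDA 10, this assignment previously could not exceed 4096.
torch.backends.cuda.cufft_plan_cache.max_size = 8192
print(torch.backends.cuda.cufft_plan_cache.max_size)
```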
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16384

Differential Revision: D13886637

Pulled By: soumith

fbshipit-source-id: 227385008211a6f3ad9135c54fd2d3754cc9daaf
2019-01-31 06:56:39 -08:00
d47108add0 Clean up binary jobs in CircleCI (#16511)
Summary:
- Add libtorch upload jobs
- Unify checkout and env code for binary jobs (san binary test jobs)
- Compress variables passed into binary jobs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16511

Differential Revision: D13893714

Pulled By: pjh5

fbshipit-source-id: b8bd72e1397dec569a8ec3e859e319178c7c6f8b
2019-01-30 23:46:58 -08:00
8b053fccc7 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 36c332beab1aaccb281d5ee07952d399056b7f8c
2019-01-30 23:37:23 -08:00
db121375e7 more careful use of inline/template function in perfkernels (#15388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15388

This is another pass to make the perfkernels code safer from illegal instruction errors.
Removed the dependency on c10/util/Logging.h.
We err on the safer side at the expense of some verbosity.

Reviewed By: dskhudia

Differential Revision: D13502902

fbshipit-source-id: 4f833115df885c5b4f8c1ca83b9badea1553f944
2019-01-30 22:49:37 -08:00
26200ebf56 Updating submodules
Reviewed By: zpao

fbshipit-source-id: a0a2a635f86ef3bebfb4ca1a36f7ec9c2b09d7bb
2019-01-30 21:12:02 -08:00
d3742603cb DeviceScope support for CUDA and testing (#15357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15357

Supporting device option in FQ bn folding for ITER related ops

Reviewed By: wat3rBro

Differential Revision: D13370259

fbshipit-source-id: 4324c2716dfa69ddedc661ae2b1ad34c2f6fc4b6
2019-01-30 18:42:12 -08:00
a44826e659 Fix: avoid race condition on model zoo directory creation (#16578)
Summary:
The current implementation of the `torch.utils.model_zoo.load_url`
function is prone to a race condition when creating the directory in
which it saves the loaded models, since it checks whether the
directory exists and then creates it in two separate steps. The
directory can be created after the check was made but before we
attempt to create the directory, resulting in an unhandled exception.

Instead, try to create the directory directly, and do nothing if it
already exists.

Note: for Python versions ≥ 3.2, we could simply use the
`exist_ok=True` flag on `os.makedirs`, but this is unavailable in
Python 2.7.
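
A minimal sketch of the race-free pattern described above (Python 2.7 compatible):
```python
import errno
import os

# Try to create the directory directly; ignore the error only if it
# already exists. On Python >= 3.2, os.makedirs(d, exist_ok=True) suffices.
def ensure_dir(d):
    try:
        os.makedirs(d)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
```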

Signed-off-by: Antoine Busque <antoine.busque@elementai.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16578

Differential Revision: D13886470

Pulled By: soumith

fbshipit-source-id: 88815c8a65eec96caea32d6e9a7f83802502fdb9
2019-01-30 18:35:45 -08:00
bc53805f2e Remove redundant declarations (#16463)
Summary:
As there are no checks that all the functions are actually being used, we can end up with stale entries. This diff removes unused entries from Declarations.cwrap.

Testing:
Successful build via "python setup.py develop"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16463

Differential Revision: D13885815

Pulled By: izdeby

fbshipit-source-id: 4e35c2ac9196167af74dff3d4f971210721285f8
2019-01-30 18:29:00 -08:00
3ba6f55ae3 begin splitting up cpp tests (#16536)
Summary:
Start splitting up these tests so we don't have a massive test file. Doesn't change how you run them, since `gtest.cpp` and `no-gtest.cpp` will still collect everything.

Renamed `tests.h` to `test_misc.h` to vaguely discourage people from adding yet more stuff to it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16536

Reviewed By: zdevito, eellison

Differential Revision: D13882215

Pulled By: suo

fbshipit-source-id: 61cf97f3c2c50703dcf6a3a34da01415ecb7e7d6
2019-01-30 17:58:54 -08:00
0ef9569841 Use dispatch tensor for device_guard instead of first Tensor argument
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16579

Differential Revision: D13886593

Pulled By: cpuhrsch

fbshipit-source-id: 0722ec61da13c2541f7de51bf5c1ecfb9a12fad2
2019-01-30 17:30:24 -08:00
fc2d8c6889 Eliminate PYCMD in favor of PYTHON_EXECUTABLE in CMake.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16522

Differential Revision: D13867376

Pulled By: resistor

fbshipit-source-id: 6bce68facea83c5161a31fcdfafe08827999eb2b
2019-01-30 17:13:43 -08:00
16e2e4f29f added example to clear ambiguity in torch.Tensor.view (#16563)
Summary:
Added an example to the documentation of [torch.Tensor.view](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view) to avoid the misunderstanding referenced in issue [#16560](https://github.com/pytorch/pytorch/issues/16560).
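
For context, a short sketch of the distinction such an example clarifies: `view` reinterprets storage order and never reorders elements.
```python
import torch

a = torch.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]
print(a.view(3, 2))                   # [[0, 1], [2, 3], [4, 5]] -- same order
print(a.t().contiguous().view(3, 2))  # [[0, 3], [1, 4], [2, 5]] -- transposed
```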
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16563

Differential Revision: D13885008

Pulled By: soumith

fbshipit-source-id: b7e7fbea1f16124bc4e679ae9c50ab619e1f043d
2019-01-30 16:53:31 -08:00
851437dd4b Fix uninitialized data and broken broadcasting with sparse.mm and spa… (#16572)
Summary:
…rse.addmm.

Fixes https://github.com/pytorch/pytorch/issues/16543.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16572

Differential Revision: D13884235

Pulled By: gchanan

fbshipit-source-id: 308916051364d72f72ec56f0495c6c7c09845131
2019-01-30 16:08:50 -08:00
33f2ab1fdb add new build files to gitignore; test that build does not leave git repo checkout dirty (#16565)
Summary:
These appear when I run
```
MACOSX_DEPLOYMENT_TARGET=10.13 CC=clang CXX=clang++ NO_CUDA=1 NO_DISTRIBUTED=1 BUILD_CAFFE2_OPS=0 DEBUG=1 python3 setup.py develop --cmake
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16565

Differential Revision: D13885790

Pulled By: ezyang

fbshipit-source-id: af0e028d7fa7832a945aaee4e241ceb5418f4ec8
2019-01-30 15:19:11 -08:00
c653fa2b00 Move Deprecated.h to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16504

Reviewed By: smessmer

Differential Revision: D13860570

fbshipit-source-id: 4742dc30c78d49b0f655b4e9536f51b36a595636
2019-01-30 14:26:37 -08:00
18659e1336 Allow generic containers as module inputs (#16482)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16326

Previously we didn't handle module inputs which included Generic Lists. When checking whether a generic list is a subvalue of the input arg type, I currently recurse on every element of the list. This shouldn't be too slow since the innermost list will be specialized and we won't have to check its elements.

E.g. Tensor[][] -> GenericList [TensorList ].

The error message could be improved, but extracting the complete type of nested lists would have to deal with unifying types across lists / empty lists & typevars, so I'm going to save that for a follow-up PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16482

Differential Revision: D13882582

Pulled By: eellison

fbshipit-source-id: 3609bc572f0ee9ebf20a77ea5ebc8fa3b165e24b
2019-01-30 14:20:56 -08:00
22e9c3055a Explicit pdist captures (#16286)
Summary:
Per discussion with cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16286

Differential Revision: D13883001

Pulled By: erikbrinkman

fbshipit-source-id: 86f35d35fde5db67e3fbb09abc418da0116c9aac
2019-01-30 14:02:36 -08:00
1905bbb01d Include ATen/core/functional.h directly instead of torch/csrc/utils/functional.h. (#16377)
Summary:
One more shim removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16377

Differential Revision: D13821816

Pulled By: ZolotukhinM

fbshipit-source-id: 007f014d404de51841437db7eef28367a2f6e46b
2019-01-30 14:02:34 -08:00
b28f0f9d37 Remove --no-update-dependencies (#16575)
Summary:
Absolutely no idea why this is needed. This should be a valid argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16575

Differential Revision: D13884796

Pulled By: pjh5

fbshipit-source-id: 6011e721e2870499f6b5e627d5ad00ece08b530b
2019-01-30 13:53:51 -08:00
3ab736b774 Update PyTorch DockerVersion to 285. (#16507)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16507

Differential Revision: D13884588

Pulled By: ezyang

fbshipit-source-id: b7e22daa15874f9a226195d4749b4f9f827d7c1e
2019-01-30 13:29:25 -08:00
2ed5569bd6 Support fallback for more operators (#16566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16566

This is a follow-up to https://github.com/pytorch/pytorch/pull/16456.

Reviewed By: yinghai

Differential Revision: D13881462

fbshipit-source-id: eff063580ac8f622477417ed4b25320299451811
2019-01-30 13:21:20 -08:00
307c83b5eb fix the linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16567

Differential Revision: D13882166

Pulled By: houseroad

fbshipit-source-id: daf760f51e4fce376ca09421900405970d00c4d2
2019-01-30 13:16:49 -08:00
c43917b0a3 Add a test case calling caffe2 layer_norm from caffe2 executor but through the c10 dispatcher
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16283

Reviewed By: ezyang

Differential Revision: D13792591

fbshipit-source-id: 9c190649e38e8706549102b2e136ceaf508eb37f
2019-01-30 13:16:47 -08:00
2af95d8e3e Back out "[pt1][tensor] Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize" (#16516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16516

Original commit changeset: 64abce3dbaed

Reviewed By: dzhulgakov

Differential Revision: D13863715

fbshipit-source-id: f1923fdca4a1a82768d9c280a8493ff15a7eb2ba
2019-01-30 12:50:38 -08:00
cdbd388206 Remove the debugging info of pytorch=>onnx coverage script (#16538)
Summary:
Remove the debug info.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16538

Reviewed By: houseroad

Differential Revision: D13872068

Pulled By: zrphercule

fbshipit-source-id: 7572668d0048c37f6b6029a48e5ae4b8b21823f7
2019-01-30 11:40:28 -08:00
a7796bc24d CUDA histogram implementation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15842

Reviewed By: zou3519

Differential Revision: D13868982

Pulled By: jaciefan

fbshipit-source-id: bce81dc121c4538d204047506f8f14d0b4d8f905
2019-01-30 11:36:20 -08:00
dc84ff1e5a Use a points-to graph for alias analysis (#16386)
Summary:
This PR changes the way we store aliasing information from a "set" approach to a "points-to" analysis. Set-based approaches lose information in ways that make it difficult to do "live" updates to the alias DB as one is mutating the graph.

The tradeoff is that simple queries get more expensive, since they require traversing the points-to graph to answer most questions. In practice, this is unlikely to be that costly since we don't have massive aliasing chains, but we could create an approximation/caching layer if this becomes a problem.

My rough plan is:
1. This PR, switching to a points-to graph
2. Make it "live": analyzing a node should record all the edges the node added, so that we can rollback when the node is destroyed.
3. Reduce wildcard scope: we can make the wildcard a special vertex that points to anything that we're not "sure" about; namely, things that have been put inside lists, or graph inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16386

Differential Revision: D13855117

Pulled By: suo

fbshipit-source-id: f009f58143173c275501624eb105d07ab60fe5e1
2019-01-30 11:28:03 -08:00
dff8165d04 ONNX Export Flatten operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16240

Reviewed By: bddppq

Differential Revision: D13800025

Pulled By: houseroad

fbshipit-source-id: ae4c5e42026477b28ffd44eda2438d93936ea510
2019-01-30 11:05:00 -08:00
68620cdcb5 Revert D13880053: [pytorch][PR] add new build files to gitignore; test that build doesn't leave repo dirty
Differential Revision:
D13880053

Original commit changeset: 0171f42438ef

fbshipit-source-id: a734f8704c1cbe16434c672289c505b19b2b490a
2019-01-30 11:04:58 -08:00
34b43baeec Allow list and tuples to be passed as output_size to max_unpool1d (#16489)
Summary:
Changelog:
- Modify concatenation of [1] to a tuple by using cases for list and non-list types.
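
A small usage sketch of the accepted forms (the list form is the case this fixes; shapes are illustrative):
```python
import torch
import torch.nn.functional as F

# output_size passed as a list, which this change accepts.
x = torch.randn(1, 1, 4)
out, idx = F.max_pool1d(x, kernel_size=2, return_indices=True)
rec = F.max_unpool1d(out, idx, kernel_size=2, output_size=list(x.size()))
print(rec.shape)  # torch.Size([1, 1, 4])
```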
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16489

Differential Revision: D13875838

Pulled By: soumith

fbshipit-source-id: fade65cc47385986b773b9bde9b4601ab93fe1cf
2019-01-30 11:00:34 -08:00
b1b00f329e Fix the flake8 linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549

Reviewed By: bddppq

Differential Revision: D13877435

Pulled By: houseroad

fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540
2019-01-30 09:36:00 -08:00
3b91df3744 add example multiprocess code (#16345)
Summary: fixes #16141

Differential Revision: D13868539

Pulled By: ailzhang

fbshipit-source-id: 03e858d0aff7804c5e9e03a8666f42fd12836ef2
2019-01-30 09:35:58 -08:00
fa717cba63 Support int64_t shape data for ideep reshape op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16533

Reviewed By: jerryzh168

Differential Revision: D13867402

fbshipit-source-id: ff53a851f142ef915ad69da3868bb3aab4d48987
2019-01-30 09:00:09 -08:00
2d2eb7145a add new build files to gitignore; test that build doesn't leave repo dirty
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16441

Differential Revision: D13880053

Pulled By: ezyang

fbshipit-source-id: 0171f42438efdd651b6af22e521b80e85b12681c
2019-01-30 08:41:59 -08:00
7d7855ea31 Fallback support for more operators (#16456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16456

Adding fallbacks for more operators and fixing ifndef for expand_op.h

Reviewed By: yinghai

Differential Revision: D13845382

fbshipit-source-id: b7c5b7f7f176707b9ddffade139562a8085967ed
2019-01-30 03:54:11 -08:00
21907b6ba2 Fix the dropout onnx symbolic, and ensure all exported models in test_operators.py are eval mode (#16547)
Summary:
In eval mode, skip dropout operator in onnx exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16547

Reviewed By: houseroad

Differential Revision: D13877136

Pulled By: dzhulgakov

fbshipit-source-id: c366da156f83677bcf4989b79166aae5b6c36125
2019-01-30 01:16:21 -08:00
598b713660 Separate level1 elementwise functions from math (#16397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16397

Separate level1 elementwise functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13830626

fbshipit-source-id: e6e672647076dab8b3b24be181f580a1486250c9
2019-01-30 00:04:12 -08:00
ed4776820a Fix includes for ATen/core/stack.h (#16462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16462

This file was moved, now we change the includes to the new location and remove the proxy header.

Reviewed By: ezyang

Differential Revision: D13847279

fbshipit-source-id: 4617d52fdcfe785cb7b2154460a6686c437abd8b
2019-01-29 23:33:13 -08:00
7c66ad7455 Add test case for calling c10 ops from pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16062

Reviewed By: ezyang

Differential Revision: D13628955

fbshipit-source-id: f6ed3f07db2675bd9ae9251da990ca7b8c963717
2019-01-29 18:22:52 -08:00
12f92f453b Kernel gets Stack* instead of ArrayRef<IValue> (#16282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16282

This changes the core kernel abstraction to be a function taking a stack, popping its arguments from the stack and pushing results to the stack,
instead of getting arguments as ArrayRef<IValue> and returning an output IValue.

Caffe2 operators need to have a way to pass in preallocated output tensors.
The convention for them is to get all inputs *and* outputs on the stack and also return all of them, i.e. a caffe2 op will always have inputs == outputs.
This will probably change in later diffs towards making the outputs in-arguments optional in the JIT schema.
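A conceptual Python stand-in for the Stack*-based convention (the real abstraction is C++; names here are illustrative): a kernel pops its arguments off the stack and pushes its results back.

```python
# Stack calling convention, sketched: arguments in, results out.
def add_kernel(stack):
    b = stack.pop()          # arguments were pushed in order, so pop in reverse
    a = stack.pop()
    stack.append(a + b)      # push the single result

stack = [2, 3]               # caller pushes the arguments
add_kernel(stack)
assert stack == [5]          # result replaces the arguments
```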

Reviewed By: ezyang

Differential Revision: D13792335

fbshipit-source-id: e9cc2b5e438cc4653e1f701633a154b92b604932
2019-01-29 18:22:51 -08:00
6249442e90 Chunk dataset implementation (#15932)
Summary:
This PR contains the implementation of chunk dataset, with the API proposed in PR https://github.com/pytorch/pytorch/pull/15562

A chunk dataset is derived from StatefulDataset. It utilizes worker threads to prefetch chunk data, split it into batches, and cache them in a queue. When get_batch is called from the dataloader, batch data is retrieved from the queue, and data from new chunks is pushed to serve later batches.

Chunk dataset uses two samplers (chunk_sampler and example_sampler) to perform sampling. The chunk_sampler decides which chunk to load, and example_sampler shuffles the examples inside a specific chunk. More detail of this sampling approach can be found here: http://martin.zinkevich.org/publications/nips2010.pdf
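A rough Python sketch of the two-level sampling scheme (names are illustrative, not the C++ API): the chunk sampler picks which chunk to load, and the example sampler shuffles examples within the loaded chunk.

```python
import random

def iter_batches(chunks, batch_size, seed=0):
    rng = random.Random(seed)
    chunk_order = list(range(len(chunks)))  # role of chunk_sampler
    rng.shuffle(chunk_order)
    for ci in chunk_order:
        examples = list(chunks[ci])
        rng.shuffle(examples)               # role of example_sampler
        for i in range(0, len(examples), batch_size):
            yield examples[i:i + batch_size]

chunks = [range(0, 10), range(10, 20)]
for batch in iter_batches(chunks, batch_size=4):
    print(batch)
```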
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15932

Differential Revision: D13868688

Pulled By: soumith

fbshipit-source-id: a43000c478ca2a3c64cc84b3626d6b8b1ad9a07e
2019-01-29 18:06:01 -08:00
21193bf123 try to get rid of tmp_install (#16414)
Summary:
Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/ include/ and lib/ alone), and then try to adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414

Differential Revision: D13863635

Pulled By: zdevito

fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
2019-01-29 17:29:40 -08:00
ffed8bff6a Fix torch.sparse.sum parsing of dim. (#16517)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16501.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16517

Differential Revision: D13865322

Pulled By: gchanan

fbshipit-source-id: fa0ac37a9e7b8f19a5bdf75e5771128e48c41612
2019-01-29 16:19:22 -08:00
f924fc6eb1 Make Store::setTimeout take milliseconds (#16278)
Summary:
This is particularly useful when using a c10d::Store from tests.

cc jgehring
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16278

Reviewed By: janewangfb

Differential Revision: D13866271

Pulled By: pietern

fbshipit-source-id: c8670b5f4ebd5cd009f2cabbe46cc17a9237d775
2019-01-29 16:15:25 -08:00
279238f0b8 Back out "Delete duplicate copy of THCCachingAllocator." (#16510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16510

This diff was supposed to be memory usage neutral, but based on
some internal flows involving cuDNN, it was not. Reverting pending
further investigation.

Original commit changeset: 03f1ebf7f11c

Reviewed By: xw285cornell

Differential Revision: D13863610

fbshipit-source-id: 15517e255fd6b0c064b65fb99f0ef19742236cfd
2019-01-29 15:44:19 -08:00
4f809397fd Fix compare_exchange_weak usage in weak_intrusive_ptr (#16302)
Summary:
In the case of spurious failure, refcount is not incremented -- which leads to underflow once all references are released.

This was discovered when exercising multiprocessing on ppc64le.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16302

Differential Revision: D13845435

Pulled By: ezyang

fbshipit-source-id: 8e264fff9dca8152cb12617e3216d5e48acd9557
2019-01-29 14:10:04 -08:00
719134f3c3 Automatic update of fbcode/onnx to 15c33c945851907411619f599900c3852108e7e3 (#16493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16493

Previous import was dc75285d4a1cff9618400164dfdb26c5a1bab70a

Included changes:
- **[15c33c9](https://github.com/onnx/onnx/commit/15c33c9)**: Add ppc64le build (#1768) <Chin Huang>
- **[198f840](https://github.com/onnx/onnx/commit/198f840)**: Update Broadcasting.md (#1769) <Verma-Rajat>
- **[60ac95f](https://github.com/onnx/onnx/commit/60ac95f)**: Merge back from release 1.4.1 (#1767) <Raymond Yang>
- **[a683372](https://github.com/onnx/onnx/commit/a683372)**: Bump up version number for v1.4.0 (#1761) (#1763) <Raymond Yang>
- **[dbf3581](https://github.com/onnx/onnx/commit/dbf3581)**: Add TfIdfVectorizer operator to ONNX (#1721) <Dmitri Smirnov>

Reviewed By: zrphercule

Differential Revision: D13858840

fbshipit-source-id: 1d00f63f265cc6deed965b92ed00c44f547ff03e
2019-01-29 13:48:49 -08:00
541ce96564 Make the pytorch's cmake minimum required version equal to caffe2's. (#16506)
Summary:
Stack:
**#16506 Make the pytorch's cmake minimum required version equal to caffe2's.** [💛](https://our.intern.facebook.com/intern/diff/D13861564/)

Originally authored by JerryShih <bignose1007@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16506

Differential Revision: D13863979

Pulled By: ezyang

fbshipit-source-id: 9275739a820ae03ec6eaa41959f9340c9bba8de3
2019-01-29 13:39:32 -08:00
3ab620926f More windows fixes towards the code refactor (#16451)
Summary:
Fixes #16446.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16451

Differential Revision: D13864388

Pulled By: soumith

fbshipit-source-id: 6cb173eafbd3da33c479c56c85aff75e8be4bf35
2019-01-29 13:15:36 -08:00
ded6fb0293 Add stack & cat support for CPU Half (#16389)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/6968

Needed for #14705
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16389

Differential Revision: D13861446

Pulled By: gchanan

fbshipit-source-id: 7b8700b95aaf252d9669693dbddccb2302e58409
2019-01-29 13:06:29 -08:00
d79e45bbba Add some smoke tests for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16496

Differential Revision: D13863489

Pulled By: soumith

fbshipit-source-id: 518003c27a6b788b5a78b58cdb8698f0bb6ce4d8
2019-01-29 12:54:39 -08:00
6a6983ed7f create type hint stub files for module torch (#12500)
Summary:
We have:

- This is an initial stab at creating a type stub `torch/__init__.pyi` .
- This is only tested on Python 3, since that's the only Python version mypy
  works on.
- So far, we only aim at doing this for torch functions and torch.Tensor.
- Quite a few methods and functions have to be typed manually. These are
  done in `torch/__init__.pyi.in`

For me, PyCharm (the free edition) didn't seem to flag errors in the .pyi file when opening it, and it was able to provide type hints for the few functions I tried. I don't use PyCharm for my usual PyTorch activities, though, so I didn't try this out extensively.

An example of a generated PYI is at [this gist](https://gist.github.com/ezyang/bf9b6a5fa8827c52152858169bcb61b1).
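For a sense of the shape of such a stub, here is a small hypothetical excerpt (the real generated file is far larger; see the gist linked above):

```python
# Hypothetical excerpt of a torch/__init__.pyi-style stub (illustrative only).
from typing import Tuple

class Tensor:
    def size(self) -> Tuple[int, ...]: ...
    def view(self, *shape: int) -> 'Tensor': ...

def zeros(*size: int) -> Tensor: ...
```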
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12500

Differential Revision: D13695553

Pulled By: ezyang

fbshipit-source-id: 4566c71913ede4e4c23ebc4a72c17151f94e8e21
2019-01-29 12:14:17 -08:00
3b337e7892 Revert D13596031: Improve c2-aten tensor interop and add proper testing
Differential Revision:
D13596031

Original commit changeset: d20b601e06ba

fbshipit-source-id: dc371697f14b3893a9164380a39e7a49d8d68ecf
2019-01-29 07:14:57 -08:00
bd19dd4b90 url download bugfix for URLs served without Content-Length header (#16153)
Summary:
Some HTTP servers don't return a Content-Length header; account for that.
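A minimal sketch of the idea (not the exact code in the download helper): only derive a total size for progress reporting when the server actually sends the header.

```python
from urllib.request import urlopen

def download(url, dst):
    response = urlopen(url)
    length = response.headers.get('Content-Length')
    total = int(length) if length is not None else None  # header may be absent
    with open(dst, 'wb') as f:
        while True:
            chunk = response.read(8192)
            if not chunk:
                break
            f.write(chunk)
            # if total is None, skip percentage-based progress reporting
```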

Fixes: https://github.com/pytorch/pytorch/issues/16152

Differential Revision: D13858882

Pulled By: soumith

fbshipit-source-id: e4293e9368ed4c87548d22adec1ce0c25ea4bd8f
2019-01-29 01:28:47 -08:00
dbebb5322c Properly escape string literals when dumping JIT IR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16056

Differential Revision: D13719444

Pulled By: ZolotukhinM

fbshipit-source-id: 7113ee9328eff6263513476cdf9254a2e1116f4c
2019-01-29 00:26:37 -08:00
0e6123fb8a Remove dependency on ResourceGuard from IR.h. (#16351)
Summary:
It looks like `WithInsertionPoint` and `WithCurrentScope` can be easily implemented without
`ResourceGuard` - that helps readability and removes one more dependency. Is there anything I'm missing?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16351

Differential Revision: D13821826

Pulled By: ZolotukhinM

fbshipit-source-id: b203200b345fb5508a97dc8656e6f51cde4cc21f
2019-01-29 00:21:32 -08:00
862d466bef Remove redundant includes from scope.h and attributes.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16472

Differential Revision: D13852553

Pulled By: ZolotukhinM

fbshipit-source-id: d5634982c2c42e704d9902774a77660e05fd71eb
2019-01-28 23:47:15 -08:00
5e21e0fe75 Improve c2-aten tensor interop and add proper testing (#15860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15860

Few changes (which are harder to split in separate diffs, so together):
- make conversion explicit (as they can throw to avoid surprises)
- fix tensor legacy dispatch not initialized when tensor is created on C2 side
- add a bunch of invariants to enforce

Reviewed By: ezyang

Differential Revision: D13596031

fbshipit-source-id: d20b601e06ba47aeff2f6e8e15769840e2d46108
2019-01-28 23:41:50 -08:00
9d6be6ac09 Remove redundant "build" setup.py command from onnx scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16487

Differential Revision: D13858628

Pulled By: bddppq

fbshipit-source-id: e1ff3fc5f9be5d3dbbf96ee73c3a8c901b440b82
2019-01-28 22:59:33 -08:00
7f552041ff Fix identifier shadowing in tracer (#16480)
Summary:
This was causing build failures under `-Werror` targets under optimized build modes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16480

Differential Revision: D13857621

Pulled By: jamesr66a

fbshipit-source-id: 2990b987dbca943298ad478c9ee2792236f5fa5b
2019-01-28 21:47:39 -08:00
f204e3e624 Pass WERROR to CMake as an explicit parameter rather than an env var.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16465

Differential Revision: D13853949

Pulled By: resistor

fbshipit-source-id: 71ccf90a2824ad21c9f26dd753b186f30435d82a
2019-01-28 20:57:18 -08:00
99fab45733 Remove redundant build from build develop instructions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16467

Differential Revision: D13849661

Pulled By: ezyang

fbshipit-source-id: d3d58bd31ac65ad9cbf0057b9a4c499c0f59d95a
2019-01-28 20:47:11 -08:00
52ca4b86db Change SetOutputSize in ConvTransposeUnpoolBaseOp (#16179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16179

to avoid passing a partially initialized Tensor around.

Reviewed By: ezyang

Differential Revision: D13744009

fbshipit-source-id: 4c545765e1cd164b3e87ce08ec4c1cb1e37e2b8f
2019-01-28 18:28:11 -08:00
5ebf4cd4e3 Move stack.h to ATen/core (#16247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16247

Stack is going to be used by the c10 dispatcher.
This just moves the file; changing the namespace turned out to be more complicated than I thought, so I'll leave the namespace alone for now.

Reviewed By: ezyang

Differential Revision: D13774189

fbshipit-source-id: 66aeee36425e0ea2b3a4f8159604f38572306d57
2019-01-28 17:46:10 -08:00
504bcb276c Remove state from schema and calling API (#16180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16180

Only the kernel knows about its state, the caller doesn't see it anymore.

Reviewed By: ezyang

Differential Revision: D13744071

fbshipit-source-id: cb00ff1a881508c1b36ac4123bee1f68ca02ca9c
2019-01-28 17:46:08 -08:00
cc2d49deb7 Remove generic_if.h. (#16354)
Summary:
The current uses of `IR_IF` are mostly trivial, so there is not much value in having special macros for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16354

Differential Revision: D13821823

Pulled By: ZolotukhinM

fbshipit-source-id: 1ca73111f5b4868fa38a1f29c9230540773e5de6
2019-01-28 17:02:23 -08:00
ed50bccb35 Remove CUDA_VERSION to flag and remove JOB_BASE_NAME from binary jobs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16470

Differential Revision: D13853387

Pulled By: pjh5

fbshipit-source-id: a2baccde65ab82b69380ee57b16e43cc80ed3e04
2019-01-28 16:52:11 -08:00
4eceb7a055 Fix cmake byte version issue in build_pytorch_libs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16457

Differential Revision: D13846408

Pulled By: gchanan

fbshipit-source-id: 26962bc12d7d9fdad71f9dd7526f6d32e6008295
2019-01-28 16:00:28 -08:00
ff963d4b9f Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize (#16273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16273

Previously we had SetOutputSize, which accepts a partially initialized output Tensor and sets it to the correct size; this diff changes it to GetOutputSize, which returns the correct size instead.
e.g.
```
auto* Y = Output(0);
ConvPoolOp<Context>::SetOutputSize(X, Y, channels);
...
Y->mutable_data<T>...
```
-->
```
auto sizes = ConvPoolOp<Context>::GetOutputSize(X, channels);
auto* Y = Output(0, sizes, at::dtype<T>());
```

Reviewed By: dzhulgakov

Differential Revision: D13736281

fbshipit-source-id: 64abce3dbaed0b375098463333dfd0ea5a3b1945
2019-01-28 15:56:34 -08:00
b076227b21 Move tracer impls into cpp file (#16410)
Summary:
Working on the tracer was really annoying because a lot of the implementations were in `tracer.h` and editing that file caused us to rebuild almost the whole world. So this moves all the implementations into tracer.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16410

Differential Revision: D13847776

Pulled By: jamesr66a

fbshipit-source-id: ec8500da32b2d4cd990f293a0a96101d3e82f158
2019-01-28 15:34:02 -08:00
1a77918955 fix alias annotations on to, cpu, cuda (#16460)
Summary:
Fix alias annotations for ops that may return a fresh tensor. The previous version was overly conservative.

Currently there is no actual behavior change in the alias analysis, but we may use the information in the future.
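Why these ops need the less conservative annotations, sketched below (this relies on standard PyTorch `.to` semantics): the result may be the input itself or a fresh tensor.

```python
import torch

x = torch.randn(3)                 # float32 by default
same = x.to(torch.float32)         # no conversion needed: may return x itself
copy = x.to(torch.float64)         # real conversion: returns a fresh tensor
print(same.data_ptr() == x.data_ptr())  # True
print(copy.data_ptr() == x.data_ptr())  # False
```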
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16460

Differential Revision: D13849086

Pulled By: suo

fbshipit-source-id: cd23b314a800e5e077d866e74456d37a321439d5
2019-01-28 15:20:23 -08:00
e3c0926c44 Remove usage of deprecated "min_satisfying_examples" hypothesis setting (#16401)
Summary:
This setting was deprecated in [hypothesis 3.56.0](d1b0df5b91/hypothesis-python/docs/changes.rst (3560---2018-04-17)) and has recently been removed in [hypothesis 4.x](d1b0df5b91/hypothesis-python/docs/changes.rst (400---2019-01-14)).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16401

Reviewed By: ezyang

Differential Revision: D13832528

Pulled By: bddppq

fbshipit-source-id: 04b9f1dfdf2dcfe0ef121dd02f7fbfdf6bf4aead
2019-01-28 14:17:10 -08:00
0ae45e30bc Support Tensor alias annotations for native_functions.yaml (#16239)
Summary:
Adds Tensor alias annotations.

This isn't a full implementation of alias annotations, but that isn't required to increase compliance with the JIT signature schema. There are some sanity checks within native_parse.py for their usage, which can also help overall correctness. Otherwise, this exists solely for further alignment between the JIT signature schema and the native_functions.yaml func schema.

This gets us to ~85% matches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16239

Differential Revision: D13804133

Pulled By: cpuhrsch

fbshipit-source-id: aa5750f2c7e0f08b8c35d6d8f38cb148e9629855
2019-01-28 13:57:25 -08:00
120c54743e Annotate the bicubic interpolation kernels (#16449)
Summary:
with the correct `__launch_bounds__` for ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16449

Differential Revision: D13844111

Pulled By: bddppq

fbshipit-source-id: 07ed8552a630f3a6426d9e5648c415f066991e3d
2019-01-28 13:47:28 -08:00
fb17be1368 Clear cmake cache when --cmake (#16426)
Summary:
Also, sometimes we have a `CMakeCache.txt` even though cmake errored out, so I'm adding the existence of `build.ninja` as another criterion for rerunning cmake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16426

Differential Revision: D13843801

Pulled By: ezyang

fbshipit-source-id: ea1efb201062f23b7608f8d061997d8a8e293445
2019-01-28 13:43:17 -08:00
e866bc7c88 Remove dims() in caffe2::Tensor (#16356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16356

att

Reviewed By: dzhulgakov

Differential Revision: D13813197

fbshipit-source-id: 68c0fb43404536f622422c51949c819d8a037aa5
2019-01-28 12:42:42 -08:00
05678d0bfa Op-calling API can handle state (#16177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16177

Change the API for calling operators so that it can store state in an OpKernel object.
This diff doesn't store the state there yet, that comes in a follow up diff.

Reviewed By: ezyang

Differential Revision: D13742889

fbshipit-source-id: 20511a9a1b9f850074e50634d4b4acf87f8c6ecd
2019-01-28 11:46:05 -08:00
80f4374dde Handle stack correctly (#16246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16246

The op schema says it returns multiple values, so let's actually return multiple values instead of one tuple.
For some reason, this did work when called from python (probably some auto-unpacking),
but once called from JIT, it segfaulted. This diff fixes that.

Reviewed By: dzhulgakov

Differential Revision: D13780147

fbshipit-source-id: fe94f82f4c53b7454f77c4484fca4ac9dc444475
2019-01-28 11:46:03 -08:00
c7547dbd5e Fix compiler error in swapBytes64 for rare architectures (#16418)
Summary:
swapBytes64 used to use SwapByteOrder_32 and value, both of which don't exist. This commit rewrites that part from scratch.
This happened in debug builds with the Microsoft compiler. For that case, " && !defined(_DEBUG)" is also removed, because _byteswap_uint64 works fine in debug mode (if it is necessary, it should be commented why).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16418

Differential Revision: D13843306

Pulled By: ezyang

fbshipit-source-id: dde1c7baeccec3aaa750d4b7200b3f4ccb4a00cb
2019-01-28 11:38:07 -08:00
17d7818578 Fix lint errors introduced in pytorch/pytorch@ceece5d (#16454)
Summary:
ifedan

```
./test/common_utils.py:748:1: E302 expected 2 blank lines, found 1
./test/test_torch.py:1235:5: E303 too many blank lines (2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16454

Differential Revision: D13844905

Pulled By: bddppq

fbshipit-source-id: 3dc7c740d86310a8efc9864d7c7798fda8257a21
2019-01-28 11:29:11 -08:00
17e3ab957a Report the slowest 10 tests when using pytest (#16423)
Summary:
This flag is useful for identifying tests that take way too long when running the test suite with pytest, like the ones in the following snippet. 9757ad35b0/test/common_utils.py (L814-L835)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16423

Differential Revision: D13843507

Pulled By: ezyang

fbshipit-source-id: 643e1766a85905b3b112ea5ca562135a17896a72
2019-01-28 10:33:05 -08:00
0a2d14dd7c Optimize SpatialBNOp on GPU (#16395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16395

Optimize SpatialBNOp on GPU

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13829833

fbshipit-source-id: 04d2a63e8e9830c4c39a91cf87fcd7aa765dc55f
2019-01-28 09:36:45 -08:00
ceece5dd0f CPU implementation of torch.cdist (#16168)
Summary:
cdist is used for calculating distances between collections of observations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16168

Differential Revision: D13739147

Pulled By: ifedan

fbshipit-source-id: 9419c2c166891ac7db40672c72f17848f0b446f9
2019-01-28 09:16:32 -08:00
14138f4605 Don't initialize a new std::vector in a loop. (#15850)
Summary:
Before this diff, we executed `std::vector<optional<acc_t>> buffer((unsigned)max_threads, optional<acc_t> {});` in every iteration of `foreach_reduced_elt`. Change the code to only execute that line if we need it, i.e., when we are actually about to parallelize.

This overhead is quite significant when we are doing a lot of small reductions in single-threaded code.

```
import torch

x = torch.randn((1024, 10, 1024), dtype=torch.float64)
torch.set_num_threads(1)
%timeit x.std(1)  # IPython timing magic
```

Before (with #15845 applied): 708.25 ms
After: 508 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15850

Differential Revision: D13612960

Pulled By: umanwizard

fbshipit-source-id: f5e61abfe0027775c97ed81ac09c997fbee741df
2019-01-28 08:50:27 -08:00
d2cdffaf37 More documentation on caffe2::Operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16371

Reviewed By: dzhulgakov

Differential Revision: D13820472

fbshipit-source-id: efccea0e92c86d30ec2bdda50eb9aab8a3a1504d
2019-01-28 07:41:14 -08:00
fdaa77ae8b Better error message when creating a module instance in jit.script (#16416)
Summary:
Made the change requested in #15555

PR was failing build due to a time out error while getting packages using pip.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16416

Differential Revision: D13833873

Pulled By: soumith

fbshipit-source-id: e2200e9e8015558fcd359dfa3d025b25802d62b5
2019-01-27 16:29:46 -08:00
952a03ccea Fix issues on Windows brought by #16289 (#16412)
Summary:
This one needs to be merged ASAP because the CUDA build for Windows is skipped at this time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16412

Differential Revision: D13833889

Pulled By: soumith

fbshipit-source-id: 95a401a01fb0f9c1045df0bfd72d8206b8a6f3fd
2019-01-27 15:02:31 -08:00
2b6607065b Fix a typo in Parallel.h (#16419)
Summary:
Fix a typo in Parallel.h.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16419

Differential Revision: D13833705

Pulled By: soumith

fbshipit-source-id: 824ebe753e028fc8e2b5d7a51fdba98a365fd29a
2019-01-27 14:16:47 -08:00
ee18448138 Don't install PDB for Windows static build of caffe2_observers (#16420)
Summary:
Fixes #16292.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16420

Differential Revision: D13833704

Pulled By: soumith

fbshipit-source-id: 482ad6ce103bed7206e924e8c82454fbb1bfac42
2019-01-27 12:29:49 -08:00
c863a759a0 Fix slogdet sign requiring grad when input requires grad (#16337)
Summary:
The real fix for https://github.com/pytorch/pytorch/issues/15605.

This is sort of BC breaking because now
```py
In [1]: import torch

In [2]: a = torch.randn(3, 3, requires_grad=True)

In [3]: a.slogdet()
Out[3]: (tensor(1.), tensor(0.1356, grad_fn=<SlogdetBackward>))

In [4]: a.slogdet()[0].requires_grad
Out[4]: False
```
while before this patch `a.slogdet()[0]` required grad with `grad_fn=<SlogdetBackward>`. But any attempt to backpropagate through this value would hit the error in #15605, so I don't think this is a problem.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16337

Differential Revision: D13832644

Pulled By: soumith

fbshipit-source-id: f96c477e99edcbdbd966888e5c5ea7fd058429a8
2019-01-27 12:11:14 -08:00
6944461a76 CI Fix: restore MAX_JOBS variable (#16415)
Summary:
Restores a CI workaround (https://github.com/pytorch/pytorch/pull/7361) that got dropped with build_pytorch_libs.sh.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16415

Differential Revision: D13833092

Pulled By: zdevito

fbshipit-source-id: f78b60cafd8da945790dba28de373b8faf46e9f5
2019-01-27 01:27:50 -08:00
3c30cf3237 Update einsum documentation. (#16323)
Summary:
The documentation stated that operands to einsum should be a list of Tensors, not individual arguments. The function, however, now accepts individual arguments for each Tensor operand *and* a single argument consisting of a list of Tensors. The documentation was updated to reflect this change.
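Both calling conventions described above, sketched:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)
out1 = torch.einsum('ij,jk->ik', a, b)    # individual tensor arguments
out2 = torch.einsum('ij,jk->ik', [a, b])  # single list-of-tensors argument
assert torch.allclose(out1, out2)
```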
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16323

Differential Revision: D13832647

Pulled By: soumith

fbshipit-source-id: c01c2b350f47674d3170337f493b0ee2ea381b3f
2019-01-26 18:00:57 -08:00
de6bb3f3e3 Fix flake8 warnings/errors in test_jit.py (#16409)
Summary:
These were really annoying to see in the phabricator UI when trying to land PRs that touched test_jit.py, so this fixes them.

One remaining item is the T484 error. Locally, flake8 still chokes on that line even though I put the noqa comment there (and tried varying the whitespace around it, etc.). Not sure why it still persists...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16409

Differential Revision: D13832658

Pulled By: jamesr66a

fbshipit-source-id: 46356ba6444ae5ee1a141c28489bdcc7c99e39c0
2019-01-26 17:42:08 -08:00
d1ed0176df Trace fork and join calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16232

Differential Revision: D13772974

Pulled By: jamesr66a

fbshipit-source-id: b2db370271809e26d3301f8cc98eec567db5e62b
2019-01-26 14:42:45 -08:00
8c81a72e87 Switch to CUDA implementation if batch size >= 65536 for affine_grid (#16403)
Summary:
Changelog:

- Append a condition that switches to the native CUDA implementation for affine_grid

Fixes #16365

Differential Revision: D13832192

Pulled By: soumith

fbshipit-source-id: 3f484e6673d71e3ba7627b170cb8f1611e12b9b2
2019-01-26 11:18:57 -08:00
f6e6b0fd33 gitignore gdb history
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16404

Differential Revision: D13832191

Pulled By: soumith

fbshipit-source-id: ab23d1ad72c041ec2d9616c273bbf399e0feb10d
2019-01-26 09:46:01 -08:00
41e9b092a9 Revert D13821061: [redo][c10] layernorm example
Differential Revision:
D13821061

Original commit changeset: 82f0dade0145

fbshipit-source-id: e5b0b1bab0c9e731ae04add35e9a6c91656dd178
2019-01-25 22:52:04 -08:00
f4e54fd659 trying to fix testX (#16370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16370

passed locally, but it seems testX has some problem

Reviewed By: ezyang

Differential Revision: D13820250

fbshipit-source-id: e4ad9d1ec99508867d4ead46753a7fb7019c50bd
2019-01-25 17:02:21 -08:00
27a1ba3ef2 layernorm example (#16374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16374

this fixes the original attempt in OSS (adds to CMake and python build files)

Reviewed By: smessmer

Differential Revision: D13821061

fbshipit-source-id: 82f0dade0145fd04bdf8e3cb3954b5790e918162
2019-01-25 16:52:33 -08:00
13fde345fb plug caffe2 into jit (#16388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16388

the previous diff broke master -- this refactors custom_operator.cpp out into a separate header + cpp pair (caffe2_operator.{h,cpp})

Reviewed By: smessmer

Differential Revision: D13823550

fbshipit-source-id: 00e005e650336132d05aef97c1f0e5242ccad5ba
2019-01-25 16:52:32 -08:00
41acbb3b6b Enable centos pytorch rocm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14879

Differential Revision: D13821534

Pulled By: bddppq

fbshipit-source-id: 45151b880992f1efa83e29c4985a723374575506
2019-01-25 16:27:55 -08:00
9477a5d9c8 Remove bash from build (#16289)
Summary:
This commit removes the dependency on `build_pytorch_libs.sh` by moving the remaining functionality that is not expressible in cmake into python. Removing the indirection through bash also removes over 300 lines of environment munging code that is incredibly hard to understand because it passes a lot of secret parameters through `os.env`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16289

Reviewed By: ezyang

Differential Revision: D13821662

Pulled By: zdevito

fbshipit-source-id: d658d26925e3b1169ac1e3d44a159cf8a1f0d9b1
2019-01-25 16:03:53 -08:00
539894d70a Remove caffe2::ShareData (#16139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16139

Original commit changeset: 4b15a4c62995

Reviewed By: dzhulgakov

Differential Revision: D13677464

fbshipit-source-id: 1a644a88fac02b44feebac48ccc01bc72cc47edb
2019-01-25 15:39:11 -08:00
ca86d1f01d Trying a fix to anaconda logins on nightlies
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16387

Differential Revision: D13826227

Pulled By: pjh5

fbshipit-source-id: 769a53e40a4912879faf9716a80c0e0c86acdbf8
2019-01-25 15:19:29 -08:00
956cabd887 Update Documentation for Optionals (#16380)
Summary:
Now that https://github.com/pytorch/pytorch/pull/15587 has landed, updating docs.

Will close https://github.com/pytorch/pytorch/issues/15278
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16380

Differential Revision: D13825221

Pulled By: eellison

fbshipit-source-id: c5a7a7fbb40ba7be46a80760862468f2c9967169
2019-01-25 15:14:16 -08:00
c42431bd7a Revert D13740752: [c10] plug caffe2 into jit
Differential Revision:
D13740752

Original commit changeset: 2d9383574d42

fbshipit-source-id: e9ff217a438720423340a10af7fa263b33f2ae24
2019-01-25 12:29:19 -08:00
0e6791b275 Impl Shape op for mkldnn (#15266)
Summary:
Impl Shape op for mkldnn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15266

Differential Revision: D13804558

Pulled By: yinghai

fbshipit-source-id: 8a35f608c23973d7a15c3d645aee4059eb55f245
2019-01-25 11:04:57 -08:00
958f846fb3 Back out "[c10] layernorm example"
Summary: Original commit changeset: 87240ca7f48d

Reviewed By: bddppq

Differential Revision: D13816657

fbshipit-source-id: bafcf0779d811c7e4a134cfb323a89352fa8c180
2019-01-25 10:22:30 -08:00
f087c65a56 Add xla test in CI (#15978)
Summary:
Adding xla CPU tests in our CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15978

Differential Revision: D13816344

Pulled By: ailzhang

fbshipit-source-id: f74c52e846976ea4ac439313847908a0e99d05eb
2019-01-25 09:24:45 -08:00
45602ce9a2 Delete Tensor::swap(), replace with pointer swap (#12730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12730

i-am-not-moving-c2-to-c10

Reviewed By: smessmer

Differential Revision: D10415430

fbshipit-source-id: 8a2ce8611c5fa77bbbd73fb6788c1baa3b370f07
2019-01-25 08:25:07 -08:00
4aae89fa7b Make test_proper_exit more robust (#16249)
Summary:
1. Improve error message for better debugging info
2. Increase timeout
3. Also apply the windows worker failure detection mechanism on non-Windows platforms, for better robustness

Attempt to fix #14501

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249

Differential Revision: D13784702

Pulled By: ezyang

fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f
2019-01-25 08:25:05 -08:00
ec2a7fa4d4 fix contbuild (#16362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16362

https://our.intern.facebook.com/intern/testinfra/diagnostics/281475065177800.844424930381786.1548397180/

Reviewed By: ezyang

Differential Revision: D13816639

fbshipit-source-id: 024117233f6d3bc6244013ca2ee1aea065560212
2019-01-25 08:25:04 -08:00
8683b75df6 Minor change of group_norm_gradient on GPU (#16307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16307

Minor change of group_norm_gradient on GPU

Reviewed By: houseroad

Differential Revision: D13800613

fbshipit-source-id: 9e55f93b1e322efe3fc2d684b9c47c3dbb7a0f48
2019-01-25 01:25:29 -08:00
52135e9b12 Revert D13551909: [fbcode] logdevice for generic feature type
Differential Revision:
D13551909

Original commit changeset: 807830c50bee

fbshipit-source-id: 48cacf4ec1765253a9be9d78f4b28cc48330be59
2019-01-25 00:33:06 -08:00
11a2b3799b logdevice for generic feature type (#16191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16191

logdevice related modifications for generic feature type

We directly convert the generic feature structures to JSON strings, which correspond to the column input in offline and dper.

Reviewed By: itomatik

Differential Revision: D13551909

fbshipit-source-id: 807830c50bee569de202530bc3700374757793a2
2019-01-24 23:33:19 -08:00
265ed8ff45 layernorm example (#16350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16350

Example usage of the new caffe2 integration

Reviewed By: smessmer

Differential Revision: D13408546

fbshipit-source-id: 87240ca7f48d653a70241d243aa0eb25efa67611
2019-01-24 22:28:22 -08:00
6d2aee4a9b plug caffe2 into jit (#16331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16331

Temporary measure to enable caffe2 ops in pytorch

Reviewed By: smessmer

Differential Revision: D13740752

fbshipit-source-id: 2d9383574d42ce84ee471aba32eeb4f5a0cc7a4c
2019-01-24 22:28:21 -08:00
d4b60f4014 Add RunOperator for using FunctionSchema registered ops easily in caffe2 (#16173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16173

Helper to make it easy to run ops in caffe2

Reviewed By: smessmer

Differential Revision: D13468240

fbshipit-source-id: 2276c7870af6dcdf829957f005fd16ac1ef319b5
2019-01-24 22:28:19 -08:00
3b6b777a11 Add correct Input() shim to caffe2 operator impl (#16048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16048

This enables full shimming of the operator (previously it was only
Output() shimmed).

Reviewed By: smessmer

Differential Revision: D13468241

fbshipit-source-id: c853b775ab5cdcd968f4a6cc4766e91c3c6b1c45
2019-01-24 22:28:18 -08:00
7ce634ebc2 Relax lower bound for nogil timing test to avoid false alarm (#16259)
Summary:
fixes #16250, #16271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16259

Differential Revision: D13784505

Pulled By: mrshenli

fbshipit-source-id: 0b7ad98cd3c018b9907d70158de3abc3c4cb57ef
2019-01-24 17:16:02 -08:00
c787de6284 Code-style fixes. (#16342)
Summary:
Some cleanups in ir.{h,cpp}. I plan to continue cleaning it up, so this is a first step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16342

Differential Revision: D13808897

Pulled By: ZolotukhinM

fbshipit-source-id: 2dedb414576c3efbf8e36434145d7f14a66b1ee7
2019-01-24 16:44:25 -08:00
6700eff03e disable testing group conv with EIGEN engine (#16335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16335

group conv is not implemented with EIGEN engine so this diff disables related tests

Reviewed By: jamesr66a

Differential Revision: D13807204

fbshipit-source-id: 41f6de43da40882f57e64474520e185733caefb7
2019-01-24 16:39:20 -08:00
c2be9f1487 Remove unneeded manual unwrap optionals (#16245)
Summary:
Remove calls to torch.jit._unwrap_optional that are no longer needed.

The remaining instances would require control flow logic for exceptions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16245

Differential Revision: D13804292

Pulled By: eellison

fbshipit-source-id: 08c5cbe4b956519be2333de5cf4e202488aff626
2019-01-24 15:48:01 -08:00
f769cf999d fix buildindexop (#16341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16341

as in the title

Reviewed By: intermilan

Differential Revision: D13808679

fbshipit-source-id: 0d12d3253f380bec66bc9be899be565861b8163a
2019-01-24 15:31:54 -08:00
69b5ae4c54 Revert D13747581: Optimize SpatialBN on GPU
Differential Revision:
D13747581

Original commit changeset: 48a885a240ef

fbshipit-source-id: 58cec6023843d7459865eb80c9db8dac463cb96c
2019-01-24 15:26:37 -08:00
0b470d0a3b Add Test for ReinitializeTensor (#16338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16338

att

Reviewed By: ezyang

Differential Revision: D13806760

fbshipit-source-id: 322b9b7d314aeb0194f52b803ca35c0cb8efcdec
2019-01-24 15:05:21 -08:00
2a70f24cce Add thread-local guard: at::AutoNonVariableTypeMode (#15939)
Summary:
This PR adds thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return non-Variable type for a variable.

This is part of the VariableImpl/TensorImpl merge work: https://github.com/pytorch/pytorch/issues/13638.
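A conceptual Python analogue of such a thread-local guard (illustrative only; the real guard is the C++ RAII class `at::AutoNonVariableTypeMode`):

```python
import threading

_tls = threading.local()

class AutoNonVariableTypeMode:
    """While active on the current thread, type lookups pretend the
    tensor is not a Variable."""
    def __enter__(self):
        self._prev = getattr(_tls, 'non_variable', False)
        _tls.non_variable = True
        return self

    def __exit__(self, *exc):
        _tls.non_variable = self._prev

def dispatch_to_variable_type():
    return not getattr(_tls, 'non_variable', False)

assert dispatch_to_variable_type()
with AutoNonVariableTypeMode():
    assert not dispatch_to_variable_type()  # guard redirects dispatch
assert dispatch_to_variable_type()          # restored on exit
```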
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15939

Reviewed By: ezyang

Differential Revision: D13640980

Pulled By: yf225

fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f
2019-01-24 14:33:03 -08:00
f0dd85d141 reduce parameter space of test_1x1_conv to avoid timeout (#16223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16223

As title says

Reviewed By: jamesr66a

Differential Revision: D13758202

fbshipit-source-id: 3cdffb80a5dad53b29e65e8eb0ae128edba70dbb
2019-01-24 14:17:11 -08:00
fdda533eb1 Update docs to include variable annotation example (#16324)
Summary:
Relates to this issue https://github.com/pytorch/pytorch/issues/16288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16324

Reviewed By: ezyang

Differential Revision: D13805412

Pulled By: suo

fbshipit-source-id: 8b80f988262da2c717452a71142327bbc23d1b8f
2019-01-24 13:10:56 -08:00
792cb774f1 Delete duplicate copy of THCCachingAllocator. (#16226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16226

Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.

Reviewed By: dzhulgakov, smessmer

Differential Revision: D13762540

fbshipit-source-id: 03f1ebf7f11c68c19aa0d66110156fe228da6138
2019-01-24 12:06:57 -08:00
e936a69085 Move THCCachingAllocator to c10_cuda. (#16119)
Summary:
Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119

Reviewed By: smessmer

Differential Revision: D13718768

fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656
2019-01-24 12:06:56 -08:00
24b50f1411 Remove unnecessary includes and headers from THCCachingAllocator, move to at::cuda:: namespace (#16117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16117

This means I can move it to c10_cuda with minimal fuss.

Reviewed By: smessmer

Differential Revision: D13717836

fbshipit-source-id: a94c7dc649af64542480fc1c226b289588886c00
2019-01-24 12:06:54 -08:00
47bf30661f Directly include headers from ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287

Differential Revision: D13792949

Pulled By: ZolotukhinM

fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3
2019-01-24 11:22:27 -08:00
af513cd433 Refactor the docs build workflow (#16265)
Summary:
In preparation for setting up a doc build job for stable docs, I wanted
to refactor the workflow so that future changes will be easier.

This PR the following changes:
- Refactor the doc push script into a reusable command
- Add command line options for the doc push script.
  These don't matter too much for now but will be useful
  for setting up future jobs for building different versions of the
  docs.
- Instead of checking out pytorch/pytorch:master, we re-use the pytorch
  installation inside the docker image.
- Change the sed in the script to a perl command. sed is annoyingly
  different across platforms; the perl command is more stable
- Run the script in dry run mode (without pushing the doc build)
  whenever a PR is opened. This lets us test changes to the doc build workflow.

Test Plan
- I tested the doc build script locally with my own credentials and it
  worked fine.
- Wait for the pytorch_doc_push CI.
- After merging this PR, keep an eye on the pytorch_doc_push CI status.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16265

Differential Revision: D13803511

Pulled By: zou3519

fbshipit-source-id: 4564bca3e74d490f89a1d1da9fb8b98eb44bdbb1
2019-01-24 11:18:57 -08:00
a4be15377f Save a little bit of work in constant pooling by not moving nodes that will get deleted.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16161

Differential Revision: D13791247

Pulled By: resistor

fbshipit-source-id: 2a5a4f98309509b4ba875373ee57e6f63c75a4fd
2019-01-24 10:59:57 -08:00
0cb24098c7 Handle non-contiguous inputs with mkldnn convolution. (#16300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16018.

Backwards appears to be fine because the derivative is written in terms of mkldnn_convolution itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16300

Differential Revision: D13797776

Pulled By: gchanan

fbshipit-source-id: 68a990b8a3c186412a99d176931314806c9ed7bf
2019-01-24 07:39:31 -08:00
45c3cc9174 Optimize SpatialBN on GPU (#16202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16202

Optimize SpatialBN on GPU

Reviewed By: houseroad

Differential Revision: D13747581

fbshipit-source-id: 48a885a240ef2a325235e8f89ebbe50e7c780c84
2019-01-24 02:55:10 -08:00
60241e94b3 optimize group_norm (#16216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16216

Optimize GroupNormOp

Reviewed By: houseroad

Differential Revision: D13754145

fbshipit-source-id: 650f64c81486c6c9d276f2e3325392d5838751ba
2019-01-23 23:57:45 -08:00
8ab4d348f4 Fix the tensor deserialization problem of jit script module on CUDA (#16279)
Summary:
Now we create a temporary tensor for the whole record.

Fix https://github.com/pytorch/pytorch/issues/15271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16279

Reviewed By: BIT-silence

Differential Revision: D13791442

Pulled By: houseroad

fbshipit-source-id: 6f52ca09627fb684f74121357cc42e4adadec36a
2019-01-23 21:35:35 -08:00
3cba115abb Small fixes for pdist (#16210)
Summary:
pdist was recently patched to remove buggy batch support and fix issues with large tensors. That fix missed a few spots and didn't handle a few recommendations, which this commit addresses.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16210

Differential Revision: D13791914

Pulled By: gchanan

fbshipit-source-id: 0595841be1b298f7268fd4c02a6628acfec918f2
2019-01-23 19:40:16 -08:00
0a3932acb2 Fix comparison in ReinitializeTensor (#16294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16294

In `ReinitializeTensor`, we compare `tensor->GetDevice()` and `options.device()`. At the call site, however, we only provide an option with a `device_type`, so the `device_id` of `options` is always the default (-1). The tensor, on the other hand, is initially passed a `device` with the default `device_id`, but once we allocate its data, the tensor's `device` becomes the `device` of its `Storage`, i.e. the `device` of the underlying `DataPtr`, which matches the `device` of the operator's `Context` and has a non-default `device_id`.

Therefore every time we call `ReinitializeTensor`, we find that the `device` does not match, and after the call it still does not match. As a result we allocate a new Tensor every time, causing perf regressions for ops that use `ReinitializeTensor` on multiple GPUs.

Reviewed By: BIT-silence

Differential Revision: D13795635

fbshipit-source-id: 24d6afa1a0196a32eb0134ee08b4280244cdb0c3
2019-01-23 19:29:29 -08:00
f25322fb97 Fix issues under caffe round 1
Summary: Some automation to fix uninitialized members for caffe2 code. Ran canary to make sure I don't have any regression in prod, but not sure how to test comprehensively for caffe2

Reviewed By: ezyang

Differential Revision: D13776185

fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
2019-01-23 19:04:59 -08:00
31de19f210 Add support for overloaded functions (#15556)
Summary:
This PR adds support for overloaded functions as a step toward adding rnn modules to the JIT standard library.

Possible overloads must be manually specified, and when resolving the overload it chooses the first one that passes the schema matching logic. The structure is very similar to boolean dispatch in #14425. The overload will only work on weak modules.

In order to avoid supporting overloaded methods in Python to match the JIT execution, the current setup offloads that work to the user. In the test added in `test_jit.py`, two methods are used to overload the `forward` method. In order to call `forward` outside the JIT, a Python-only `forward` that does the right argument type switching must also be provided.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15556

Differential Revision: D13576348

Pulled By: driazati

fbshipit-source-id: 7d3bdd4ee5a6088cc20c92f26a696d1ee5b9204b
2019-01-23 18:16:01 -08:00
8710184eea Constant propagation changes (#16244)
Summary:
- remove loop node that is guaranteed not to execute
- remove extra loop outputs that are no longer needed

- if we are inlining an if node, only run constant propagation on the block that will execute

- remove the recurse argument since we only expose the Graph Constant Propagation and it's not used

This also includes a few extra hooks in python_ir that I think make it a little bit easier to test graph conditions from Python.
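An illustrative example of the first bullet (assuming standard TorchScript behavior of this era): a loop whose trip count is the constant 0 is guaranteed not to execute and can be removed.

```python
import torch

@torch.jit.script
def f(x):
    for _ in range(0):  # guaranteed not to execute
        x = x + 1
    return x

print(f.graph)  # after constant propagation, the loop node can be dropped
```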
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16244

Differential Revision: D13791635

Pulled By: eellison

fbshipit-source-id: d16351fffcfc8013b02015db200f8fde002e0577
2019-01-23 17:50:33 -08:00
4b06c063a5 raise exception if try jit.load non-existent file (#16270)
Summary:
addresses https://github.com/pytorch/pytorch/issues/16267
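In other words (a hedged usage sketch):

```python
import torch

try:
    torch.jit.load('does_not_exist.pt')  # missing file
except Exception as e:
    print(type(e).__name__, e)  # now raises instead of failing obscurely
```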
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16270

Differential Revision: D13791773

Pulled By: suo

fbshipit-source-id: 256304a02dbf724a7c0baade48c94b3ee77f53cf
2019-01-23 16:16:18 -08:00
80bd28bcb2 Fixing upload of nightly binaries and clean MacOS output (#16016)
Summary:
- Fix environment variable used to guard binary uploads
- Move common MacOS brew setup-code into a common function to decrease code duplication and also to move that noisy console output into its own CircleCI step
- Split Mac builds into separate build-test and upload jobs. Add one of these jobs to PR runs; add upload jobs to nightly binarybuilds workflow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16016

Differential Revision: D13791084

Pulled By: pjh5

fbshipit-source-id: 8eeb8e1963d46eab84f0f6dad9f0265163d5bf73
2019-01-23 15:38:04 -08:00
fc5b79cd1c CUDA event should only be recorded after NCCL group (#8219)
Summary:
Otherwise, it won't work if we sync on this event.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8219

Reviewed By: pietern

Differential Revision: D13788657

Pulled By: teng-li

fbshipit-source-id: 8c96e9691ed2441d7a685fb7ae8fece906f58daf
2019-01-23 14:18:26 -08:00
07a090247a Change data() accessor in Caffe2 to return non-const pointer. (#16176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16176

This makes PyTorch and Caffe2's data() method line up.
Historically, PyTorch made no distinction between tensors
with const or non-const data, and thus provided a
non-const pointer with data() member.  Changing the API to
return a const-pointer would break all mutable code, whereas
changing the Caffe2 API to change a pointer doesn't break
any code, *except* for code which required an exact match
on const-ness (e.g., in template arguments).  Since the latter
is less disruptive, we've opted for it here.

The few places downstream that broke due to this are fixed
in this patch.

Reviewed By: smessmer

Differential Revision: D13742916

fbshipit-source-id: baa4b4544cfdf7c1f369f4d69a1e0d5953c1bd99
2019-01-23 13:55:24 -08:00
dba4d37ac2 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 99d58034f9369846f8c82a5ea11c71e202e52a4e
2019-01-23 13:08:36 -08:00
f2b1842344 Align native_functions.yaml func schema more with JIT signature schema (#16111)
Summary:
This PR applies a few minor modifications leading to 100s of additional matches

Modifications to native_functions.yaml
1) double to float
2) int64_t to int
3) IntList[\d*] to int[\d*]
4) {} to []
5) Tensor? x=[] to Tensor? x=None
6) TensorList to Tensor[]
7) 1e-x to 1e-0x
8) Generator* x = nullptr to Generator? x = None
9) `{.*}` to `[.*]`

Overall this adds about 300 new matches and brings us to about 1/2 compliance of native_functions func with their JIT signature equivalent

While this is still a draft "tools/jit/gen_jit_dispatch.py" contains code to aid in finding close signatures
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16111

Reviewed By: ezyang

Differential Revision: D13738123

Pulled By: cpuhrsch

fbshipit-source-id: d1ec1e089bdb26ec155f6f31ccf768270acb76c7
2019-01-23 12:52:31 -08:00
3837446883 Fixes selection of cuDNN algorithm (#15881)
Summary:
This PR updates the logic for using cudnnGet* and cudnnFind*. Current version of cudnn find and get (v7) returns a pair of best algorithm and the convDesc mathType. While we were using the returned algorithm, we didn't update the mathType. As a result, we ended up with a slow choice of algorithm and math type. Without this patch, we are seeing a 10x regression in group convolutions.

Changelist:
- Changed the template arguments to be `perf_t` instead of `algo_t` to unify cudnnFind and cudnnGet. Both cudnnFind and cudnnGet have the same purpose and hence, it made sense to unify them and get rid of `getAlgorithm`.
- Used cudnnGet*_v7 everywhere cudnnGet* was being used.
- Removed all cudnn6 paths (This PR depends on https://github.com/pytorch/pytorch/pull/15851)

Differential Revision: D13787601

Pulled By: ezyang

fbshipit-source-id: 81fe86727673d021306fe1c99c3e528b7c9ad17f
2019-01-23 12:47:15 -08:00
879bf65811 Disable flaky test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16274

Reviewed By: pietern

Differential Revision: D13788036

fbshipit-source-id: a9b7353fb0655908e6d47387cc77af33e9471aed
2019-01-23 11:57:44 -08:00
9310eb1fd0 Update third_party protobuf to v3.6.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16251

Reviewed By: ezyang

Differential Revision: D13781444

Pulled By: bddppq

fbshipit-source-id: b713a021033d214f30a49ee02b95edf8633bcc50
2019-01-23 09:34:53 -08:00
e669f72466 fix sigma in the middle of when word (#16227)
Summary:
There is a stray sigma in the word "when" on:
https://pytorch.org/cppdocs/contributing.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16227

Differential Revision: D13762753

Pulled By: goldsborough

fbshipit-source-id: 3d4bf4be859a3069402fe8c3fbc8ebee4f25cc5a
2019-01-23 08:35:32 -08:00
36e27aa092 Typos and broken RSTs fixed in torch.distribution (#16136)
Summary:
- probabilty -> probability
- make long lines break
- Add LogitRelaxedBernoulli in distribution.rst
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16136

Differential Revision: D13780406

Pulled By: soumith

fbshipit-source-id: 54beb975eb18c7d67779a9631dacf7d1461a6b32
2019-01-23 03:03:10 -08:00
8b49efe86a tune elementwise for AMD uarch (#16217)
Summary:
Tune elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression.

No functional/performance change for CUDA - just shifting numbers into constexpr.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217

Differential Revision: D13776684

Pulled By: bddppq

fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9
2019-01-22 18:23:51 -08:00
ddeaa541aa fix typo in resnet50_trainer.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16219

Differential Revision: D13776742

Pulled By: bddppq

fbshipit-source-id: 10a6ab4c58159b3f619b739074f773662722c1d9
2019-01-22 17:28:04 -08:00
e7a77ac3b0 Automatic update of fbcode/onnx to dc75285d4a1cff9618400164dfdb26c5a1bab70a
Summary:
Previous import was c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7

Included changes:
- **[dc75285](https://github.com/onnx/onnx/commit/dc75285)**: Relax constraint that the initializers must be a subset of graph inputs (#1718) <G. Ramalingam>
- **[985c8cd](https://github.com/onnx/onnx/commit/985c8cd)**: Fix typo in scan shape inferencing (#1753) <Scott McKay>
- **[ab52a5d](https://github.com/onnx/onnx/commit/ab52a5d)**: remove stale test cases <Lu Fang>
- **[56434bb](https://github.com/onnx/onnx/commit/56434bb)**: Removing experimental ConstantFill op. <Spandan Tiwari>
- **[881c63c](https://github.com/onnx/onnx/commit/881c63c)**: Show string names of data types instead of int IDs (#1749) <Shinichiro Hamaji>
- **[0a12fe4](https://github.com/onnx/onnx/commit/0a12fe4)**: Update ConstantOfShape op. (#1744) <Bowen Bao>
- **[ef028e5](https://github.com/onnx/onnx/commit/ef028e5)**: Update definition of Cast Op to support casting to/from string (#1704) <Raymond Yang>

Reviewed By: BIT-silence

Differential Revision: D13773962

fbshipit-source-id: b98079277994a699d4807210ba1d9c27f4672090
2019-01-22 15:01:29 -08:00
2235fb256e Add default_stream() and enhance current_stream() (#16200)
Summary:
Closes #16156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16200

Differential Revision: D13747455

Pulled By: mrshenli

fbshipit-source-id: 00c0d5f341c3ac7a757bdb4631a17e11fbc6d3ec
2019-01-22 14:35:19 -08:00
3e790f6ee8 complex_registration_extension.cpp includes to angled brackets
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16122

Reviewed By: smessmer

Differential Revision: D13717900

fbshipit-source-id: 8401f39d993482d3e08d2d79bc1841deafee2a5b
2019-01-22 14:22:38 -08:00
0f45e6dbdc Remove ATen/Allocator.h forwarding header.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16121

Reviewed By: smessmer

Differential Revision: D13717899

fbshipit-source-id: 83488f2aa801ca75059949ec85171ec03e64c4ff
2019-01-22 14:22:36 -08:00
77de69867a Remove dead curVal store.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16116

Reviewed By: smessmer

Differential Revision: D13717719

fbshipit-source-id: 2ecee3f08f64e64ec5ac3c92fb326bc3df37e40e
2019-01-22 13:38:36 -08:00
325df4ccfb Make kernel registration constexpr again (#16166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16166

Since we now don't use std::function anymore, we can make kernel registration constexpr again.

Reviewed By: ezyang

Differential Revision: D13738630

fbshipit-source-id: 918fa3a3c8c6f0ddbd0f08b3b143cdf066265387
2019-01-22 13:29:13 -08:00
cd8f4154f4 Avoid closure around kernel (#16165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16165

Store kernels as direct function pointers instead of std::function.
Using direct function pointers avoids a performance risk std::function would introduce.

Reviewed By: ezyang

Differential Revision: D13738627

fbshipit-source-id: a348906c8a201436699681980a82ca95065a06a0
2019-01-22 13:29:11 -08:00
6192831b76 Pass IValues from JIT to c10 dispatcher (#16066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16066

Don't unwrap and re-wrap but directly pass through the IValues

Reviewed By: ezyang

Differential Revision: D13689037

fbshipit-source-id: 99b8155e640eb61a3c0597bf0f2b9c338712b45e
2019-01-22 13:29:09 -08:00
1c058de9ac Release GIL when synchronize or wait (#16182)
Summary:
address the second future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16182

Differential Revision: D13744972

Pulled By: mrshenli

fbshipit-source-id: e9812e3fd4a5623e99b639d9f334bfc2d1827d92
2019-01-22 13:29:07 -08:00
c6503a4205 Revert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable
Differential Revision:
D13540278

Original commit changeset: 3768c76a90b0

fbshipit-source-id: 7a31c239f9dca6ff467344d99820095addcae9d7
2019-01-22 12:22:40 -08:00
c5e1b469be Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429)
Summary:
Partially fixes: https://github.com/pytorch/pytorch/issues/394

Implementation detail:

Codegen is modified to generate code that looks like the following:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.svd_out", nullptr,
    fields0, 3
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc1 = {
    "torch.return_types.svd", nullptr,
    fields1, 3
  };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
Types are defined as static members of `THPVariable_${op_name}` functions, and initialized the first time the function is called.

When parsing function prototypes in `native_functions.yaml`, the parser will set the specified name as `field_name` when it sees things like `-> (Tensor t1, ...)`. These field names will be the field names of the namedtuple. The class of the namedtuple will be named `torch.return_types.${op_name}`.

In some Python 2 builds, `PyStructSequence` is not a subtype of tuple, so we have to create some functions to check whether an object is a tuple or a namedtuple for compatibility.

Operators in `native_functions.yaml` are changed such that only `max` and `svd` are generated as namedtuples. Tests are added for these two operators to check that the return value works as expected. Docs for these two ops are also updated to explicitly mention that the return value is a namedtuple. More ops will be added in later PRs.

There is an issue with the Windows build where the linker is unable to resolve `PyStructSequence_UnnamedField`; a workaround is added to deal with this case.
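
As a usage sketch (based on the description above; `torch.svd` is one of the two converted ops), the returned namedtuple can be accessed by field name or by position:

```python
import torch

x = torch.randn(4, 4)
result = torch.svd(x)                  # torch.return_types.svd
print(result.U.shape, result.S.shape, result.V.shape)
u, s, v = result                       # still unpacks like a plain tuple
assert torch.equal(result[0], result.U)
```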
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429

Differential Revision: D13709678

Pulled By: ezyang

fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf
2019-01-22 11:12:18 -08:00
1e19fd941f Fix formating in caffe2/quantization/server/README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14237

Reviewed By: dskhudia

Differential Revision: D13751791

Pulled By: jspark1105

fbshipit-source-id: 54f73d5134e596817802c66d43098d18458c2799
2019-01-22 10:15:37 -08:00
9521a15c88 hip-clang enablement (#16085)
Summary:
Initial enabling of the upcoming hip-clang compiler for the PyTorch source base.

Changes:
* update the Eigen submodule to a version including our upstreamed hip-clang enabling there
* modify a few ifdef guards with the `__HIP__` macro used by hip-clang
* use `__lane_id` instead of `hc::__lane_id`
* add Debug flags for ROCm to the cmake infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16085

Differential Revision: D13709459

Pulled By: ezyang

fbshipit-source-id: 1b7b33fe810a0434766180580d4443ea177eb7c7
2019-01-22 09:09:48 -08:00
4cf76574b9 Raise CalledProcessError when torch.distributed launch process not return 0 (#16069)
Summary:
`torch.distributed.launch.py` does not raise an error when `subprocess.Popen` does not return 0.
For better debugging, it should always raise an error if a launched process exhibits unusual behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16069

Differential Revision: D13709467

Pulled By: ezyang

fbshipit-source-id: 31d32a5ec8fed7bccd62d845bfba0e670ed3fe20
2019-01-22 08:50:47 -08:00
53ae8bc64d Reserve vectors that we know the size in advance for. (#16201)
Summary:
Save reallocation costs by reserving vectors according to how many elements we expect to put in them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16201

Differential Revision: D13762594

Pulled By: ezyang

fbshipit-source-id: 7e3bfe421489dde48a2ddb0920dd155f69baecc0
2019-01-22 08:02:40 -08:00
dfcafb1f71 cpp doc fix (#16221)
Summary:
Fixed a few C++ API callsites to work with v1.0.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16221

Differential Revision: D13759207

Pulled By: yf225

fbshipit-source-id: bd92c2b95a0c6ff3ba5d73cb249d0bc88cfdc340
2019-01-21 21:56:22 -08:00
addebf110f Move away from ConstantFill (#16214)
Summary:
Prerequisite of https://github.com/onnx/onnx/pull/1434
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16214

Reviewed By: BIT-silence

Differential Revision: D13755116

Pulled By: houseroad

fbshipit-source-id: a46be8d7df959b5ede93e1f9c911a9a9326e6879
2019-01-21 20:15:38 -08:00
9757ad35b0 ban conv_double_backward from sandcastle, it takes too long
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16220

Differential Revision: D13755108

Pulled By: zdevito

fbshipit-source-id: 46b1b128b155964c25249add0c84680491845e9b
2019-01-21 20:00:29 -08:00
0cd1ab82b0 Remove dead code from setup.py, remove need for build target. (#16162)
Summary:
Now it is only necessary to use 'develop' or 'install' to build. Incremental cmake is on by default. `develop --cmake` forces it to rerun.

The NinjaBuilder stuff is dead. It was used to make building _C.so
faster but now _C.so is just an empty stub file.

Removed a bunch of custom build commands from setup.py that are
no longer meaningful now that cmake handles most of the build.

Removed unused targets in build_pytorch_lib.sh/bat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16162

Differential Revision: D13744155

Pulled By: zdevito

fbshipit-source-id: d836484782c65b7f8e8c7a82620886f7a7777892
2019-01-21 17:27:56 -08:00
bed7db7772 Unhide unique from C++, make unique partially scriptable (#15256)
Summary:
This PR does three things:

~~Allow `int64_t?` in function schema,  which provide an elegant way of implementing null-able int arguments, as discussed in https://github.com/pytorch/pytorch/pull/15208#pullrequestreview-185230081~~

~~Originally implemented in https://github.com/pytorch/pytorch/pull/15235~~

~~Example:~~

```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
  variants: function
```

~~cc: zou3519~~

Edit: implemented in https://github.com/pytorch/pytorch/pull/15234

Previously tried in https://github.com/pytorch/pytorch/pull/12064. There was a problem that C++ does not have kwarg support, which makes it confusing to know whether `unique(t, 1)` actually means `unique(t, dim=1)` or `unique(t, sorted=1)`.

Now I think I have a better idea of how to implement this: there are two ATen operators, `unique` and `unique_dim`. `unique` has the same signature as in Python, and is exported to both Python and C++. `unique_dim` has signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)`, and is only exported to C++, where it can be used more naturally by a C++ user.
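
For the Python side (whose signature this PR keeps), a minimal usage sketch of `unique`:

```python
import torch

t = torch.tensor([1, 3, 2, 3])
values, inverse = torch.unique(t, sorted=True, return_inverse=True)
# values: tensor([1, 2, 3]); inverse maps each input element to its index in values
assert torch.equal(values[inverse], t)
```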

Differential Revision: D13540278

Pulled By: wanchaol

fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd
2019-01-21 12:31:37 -08:00
c33512bdfc Automatic update of fbcode/onnx to c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7 (#16190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16190

Previous import was fd60104394fa353e1762f44ecad1b2166e33deef

Included changes:
- **[c553fb3](https://github.com/onnx/onnx/commit/c553fb3)**: Handle negative axis in scan shape inference (#1748) <G. Ramalingam>
- **[51b6ecc](https://github.com/onnx/onnx/commit/51b6ecc)**: external_data: Store large tensor values in separate files (#678) <Michał Karzyński>
- **[ba05f26](https://github.com/onnx/onnx/commit/ba05f26)**: Scan output axes (#1737) <G. Ramalingam>
- **[90920c0](https://github.com/onnx/onnx/commit/90920c0)**: Add NonZero op. (#1714) <Sergii Dymchenko>
- **[c4cf112](https://github.com/onnx/onnx/commit/c4cf112)**: fix the test cases for constantofshape (#1746) <Lu Fang>
- **[d902349](https://github.com/onnx/onnx/commit/d902349)**: Add sample implementation support (#1712) <Lu Fang>

Differential Revision: D13745693

fbshipit-source-id: 05e2cce9ae1dfa2865db83840df64673d55cea57
2019-01-21 09:46:29 -08:00
866c4e3467 Separate Moments from math and optimize it (#16175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175

Separate Moments from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13742472

fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
2019-01-20 08:53:25 -08:00
898329c3f9 Unify device() return type in Stream, Event, and Tensor (#16150)
Summary:
Addresses one future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16150

Differential Revision: D13732299

Pulled By: mrshenli

fbshipit-source-id: 4d0b35df573a3bf92dea6e2e7eb42fe8bac77b18
2019-01-19 23:01:31 -08:00
1fb6b431a3 Replace use of ConstantLike with with ConstantOfShape (#16095)
Summary:
Submitting this PR as an update to the existing PR (https://github.com/pytorch/pytorch/pull/15938) on houseroad's request.

This PR replaces the use of the ONNX op `ConstantLike` with `ConstantOfShape` in the ONNX exporter. In addition to removing the call sites in `symbolic.py`, it also replaces the call site in `peephole.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16095

Differential Revision: D13745723

Pulled By: houseroad

fbshipit-source-id: e2a5f534f01adf199df9e27544f7afcfa540e1f0
2019-01-19 19:52:54 -08:00
33d1ec396b Fix LBFGS issue (#16167)
Summary:
Resolves #15923 where LBFGS threw "Error: a leaf Variable that requires grad has been used in an in-place operation."
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16167

Differential Revision: D13745822

Pulled By: soumith

fbshipit-source-id: 7d1d0511d06838c0c6f4c8a6b53cf15193283059
2019-01-19 15:01:06 -08:00
a28c0ff7b8 Allow for concurrent quantization in FullyConnectedDNNLowPOp (#16174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16174

Our service creates a new caffe2 workspace for the same underlying network on multiple threads concurrently at service startup time (later these workspaces are being reused for sequential requests), resulting in concurrent quantization via FullyConnectedDNNLowPOp calling GetOrCreateFbgemmPackBMatrix(). The lazily performed quantizations during the first inference in each workspace are all funnelled through GetOrCreateFbgemmPackBMatrix()'s cache_mutex, which means quantization is serialized, so at service startup time only a single CPU core is being used for around a minute until the serial quantization is done.
A better solution would be to avoid quantizing the same weight matrix for the operator copies in different net copies to begin with, but this is the simpler solution for our current problem.
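
A minimal Python sketch of the general pattern this implies (per-key locks so distinct weight matrices can be quantized concurrently); all names are hypothetical, not the actual Caffe2 implementation:

```python
import threading

_cache = {}
_map_lock = threading.Lock()   # guards the maps themselves (held only briefly)
_key_locks = {}                # one lock per weight matrix

def get_or_create(key, create_fn):
    with _map_lock:
        key_lock = _key_locks.setdefault(key, threading.Lock())
    # The expensive creation runs under only this key's lock, so distinct
    # keys no longer serialize behind a single cache mutex.
    with key_lock:
        if key not in _cache:
            _cache[key] = create_fn()
        return _cache[key]
```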

Reviewed By: jspark1105

Differential Revision: D13708785

fbshipit-source-id: 537519896b3b939c552d67f400bafc8a69ce11eb
2019-01-19 06:00:22 -08:00
daedec2350 Support ConstantOfShape in Caffe2 ONNX Backend (#16108)
Summary:
This PR is the prerequisite to land https://github.com/pytorch/pytorch/pull/16095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16108

Reviewed By: BIT-silence

Differential Revision: D13725722

Pulled By: houseroad

fbshipit-source-id: 28c0fb72f075cd04f9db44dfab0163844c20c620
2019-01-18 22:58:23 -08:00
b436f94b53 Separate affine_channel from math and optimize it (#16135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135

Separate affine_channel from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13727606

fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
2019-01-18 22:40:16 -08:00
e8b872abe2 Pass IValue from c10 dispatcher to caffe2 operator (#16065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16065

Before, we registered the caffe2 kernel with the c10 dispatcher using plain C types.
Now, we pass in IValues, which avoids the unwrapping inbetween.

Reviewed By: ezyang

Differential Revision: D13689036

fbshipit-source-id: b976a2c46a5a541f6a926b3df255e8a535e32420
2019-01-18 16:02:18 -08:00
c9044166a5 Make c10 dispatcher use boxed kernel function pointers (#16051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16051
This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*.

Note that KernelFunction is currently taking an `ArrayRef<IValue>` as arguments. A later diff will change that to it taking a `Stack*`.

Reviewed By: ezyang

Differential Revision: D13684518

fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
2019-01-18 16:02:15 -08:00
b662a9b66a add back NNPACK in PyTorch (#15924)
Summary:
This tests the waters for adding back NNPACK in PyTorch; it's a lot better than the fallback THNN versions.

In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.)

The only functional changes are to use NNPack more aggressively on mobile and a .contiguous() to match NNPack's assumption (I stumbled over that while using NNPack for style transfer.)
The CMake changes try to use the NNPack we already have in git.

In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile, and the native THNN implementations are prohibitively expensive in terms of both CPU and memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924

Differential Revision: D13709576

Pulled By: ezyang

fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c
2019-01-18 15:34:35 -08:00
ed57425b0a improve performance of unique with inverse indices (#16145)
Summary:
Partial fix for #15804, only w/o dim.
For jcjohnson's benchmarking script I'm getting the following results on a V100:
Before:
```
Running with N = 10000, M = 10000
cuda (no inverse): 0.98 ms
cpu (no inverse): 0.96 ms
cuda (with inverse): 1.07 ms
cpu (with inverse): 1.76 ms

Running with N = 10000, M = 100000
cuda (no inverse): 0.76 ms
cpu (no inverse): 1.53 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 3.02 ms

Running with N = 100000, M = 100000
cuda (no inverse): 1.28 ms
cpu (no inverse): 11.22 ms
cuda (with inverse): 69.76 ms
cpu (with inverse): 20.28 ms

Running with N = 100000, M = 1000000
cuda (no inverse): 0.78 ms
cpu (no inverse): 18.78 ms
cuda (with inverse): 133.45 ms
cpu (with inverse): 34.09 ms

Running with N = 500000, M = 500000
cuda (no inverse): 1.43 ms
cpu (no inverse): 61.13 ms
cuda (with inverse): 3315.18 ms
cpu (with inverse): 104.57 ms

Running with N = 500000, M = 5000000
cuda (no inverse): 0.86 ms
cpu (no inverse): 96.44 ms
cuda (with inverse): 5209.93 ms
cpu (with inverse): 176.10 ms
```
After
```
Running with N = 10000, M = 10000
cuda (no inverse): 1.04 ms
cpu (no inverse): 0.94 ms
cuda (with inverse): 0.64 ms
cpu (with inverse): 1.76 ms

Running with N = 10000, M = 100000
cuda (no inverse): 0.77 ms
cpu (no inverse): 1.55 ms
cuda (with inverse): 0.58 ms
cpu (with inverse): 2.79 ms

Running with N = 100000, M = 100000
cuda (no inverse): 1.30 ms
cpu (no inverse): 14.15 ms
cuda (with inverse): 1.63 ms
cpu (with inverse): 20.90 ms

Running with N = 100000, M = 1000000
cuda (no inverse): 0.82 ms
cpu (no inverse): 18.63 ms
cuda (with inverse): 0.61 ms
cpu (with inverse): 33.52 ms

Running with N = 500000, M = 500000
cuda (no inverse): 1.51 ms
cpu (no inverse): 59.81 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 110.69 ms

Running with N = 500000, M = 5000000
cuda (no inverse): 0.92 ms
cpu (no inverse): 104.26 ms
cuda (with inverse): 0.84 ms
cpu (with inverse): 187.12 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16145

Differential Revision: D13738821

Pulled By: soumith

fbshipit-source-id: 0811fb4ade47e3b466cebbc124e3f3333a986749
2019-01-18 14:56:39 -08:00
c6d9c51c7e fix for clang-tidy (#16164)
Summary:
It turns out that clang-tidy is bundled with travis's standard trusty distribution, so no need to install it manually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16164

Differential Revision: D13738986

Pulled By: suo

fbshipit-source-id: d0cd76c615625b2ed7f18951289412989f15849d
2019-01-18 14:04:26 -08:00
292edfb087 Change current device in stream context manager if necessary (#16128)
Summary:
Fixes #16019
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16128

Differential Revision: D13721850

Pulled By: mrshenli

fbshipit-source-id: 422c6c0b97c1cd46e127e265b532cb8c74a3aac5
2019-01-18 12:39:51 -08:00
eea50e91fa Fix SoftmaxOps (#16049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16049

We might see the pattern
```
if (scale_.numel() != N) {
   scale_->Resize(N);
   // set initial value for scale_
}

// In class:
Tensor scale_{CPU};
```
before in the code, where `scale_` is a member variable of type `caffe2::Tensor`.
This pattern actually serves two purposes: if `scale_` is partially initialized with a device type but not a size, this call will
initialize the Tensor with the correct size; if `scale_` is already initialized with a size, it will check whether the size
matches the runtime value `N` and, if not, Resize it. To rewrite this we'll do the following:
```
if (!scale_.defined() || scale_.numel() != N) {
  ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
  // set initial value for scale_
}

```
There are some variants: if `scale_` is resized to a constant size, we can call `ReinitializeTensor` instead
```
if (scale_.numel() != 1) {
  scale_->Resize(1);
}
```
-->
```
ReinitializeTensor(&scale_, {1}, at::dtype<float>().device(CPU));
```

Normal Resize will be refactored directly into ReinitializeTensor:
```
scale_->Resize(N);
```
-->
```
ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
```

Reviewed By: dzhulgakov

Differential Revision: D13667883

fbshipit-source-id: 2c7cb61544b72765b594011b99150eb5a1b50836
2019-01-18 12:30:59 -08:00
3f4bb3d493 rest of uses for deprecation of dims() in Tensor (#16118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16118

att

Differential Revision: D13697211

fbshipit-source-id: 12bf6edd1794240ac748cc1b8fecb0c1e8eb9112
2019-01-18 11:52:12 -08:00
b69c05dbd6 RNN operators should inherit step_net device_options (#16086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16086

[caffe2] RNN operators should inherit step_net device_options
According to the NetDef documentation, if a network has a specific device option it applies to all network operators that do not explicitly specify one.
But this does not seem to be the case for RecurrentNetwork operators.

Reviewed By: orionr

Differential Revision: D13699552

fbshipit-source-id: 14529bc9504e3b02f763e3c2429be21e46f82b68
2019-01-18 11:36:38 -08:00
d4f6befc93 Add implicit optional unwrapping (#15587)
Summary:
Add support for type inference for optional type refinement.

If a conditional is of the form "x is None" or "x is not None", or is a boolean expression containing multiple none checks, the proper type refinements are inserted in each branch.

For example:

    if optional_tensor is not None and len(optional_tensor) < 2:
        # optional_tensor is refined to Tensor here

    if optional_tensor1 is not None and optional_tensor2 is not None:
        # both optional_tensor1 and optional_tensor2 are refined to Tensor
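
A hedged TorchScript sketch of the refinement (Python 3 annotations assumed):

```python
import torch
from typing import Optional

@torch.jit.script
def fn(x: Optional[torch.Tensor]) -> int:
    # `x is not None` refines Optional[Tensor] to Tensor inside the branch
    if x is not None and x.numel() < 2:
        return x.numel()
    return 0

print(fn(torch.zeros(1)), fn(None))
```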

TODO:

- not run an op for unchecked unwrap optional in the interpreter

- potentially refine types to prim::None (omitted for now to simplify things & because it's not an actual use case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15587

Differential Revision: D13733810

Pulled By: eellison

fbshipit-source-id: 57c32be9f5a09ab5542ba0144a6059b96de23d7a
2019-01-18 11:25:01 -08:00
da578b7dcf Add defined() to caffe2::Tensor (#16125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16125

Add defined() method to check whether the Tensor is defined.

Reviewed By: ezyang

Differential Revision: D13719222

fbshipit-source-id: ff8efef2159ed1026bd16acaea40c768a1e20a47
2019-01-18 11:03:36 -08:00
b9b160d86f Remove ATen/Half.h and ATen/core/Half.h forwarding headers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16115

Reviewed By: bddppq

Differential Revision: D13717049

fbshipit-source-id: fb1d690183a932a1fa1a2d235f3219520f51620a
2019-01-18 10:55:21 -08:00
1ff864712b Port legacy any(*) to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15547

Differential Revision: D13549495

Pulled By: mrshenli

fbshipit-source-id: 09a065a8ffa7d73f409759b779c7314cc87f4853
2019-01-18 10:32:19 -08:00
ed0a761c82 Improve pack_sequence and pack_padded_sequence error message (#16084)
Summary:
Mention that if enforce_sorted=True, the user can set
enforce_sorted=False. This is a new flag that is probably hard to
discover unless one thoroughly reads the docs.

Fixes #15567
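
A minimal sketch of the flag being pointed to (per the message above):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

seqs = torch.randn(3, 5, 8)           # (batch, max_len, features)
lengths = torch.tensor([2, 5, 3])     # not sorted by length
packed = pack_padded_sequence(seqs, lengths, batch_first=True,
                              enforce_sorted=False)
```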
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16084

Differential Revision: D13701118

Pulled By: zou3519

fbshipit-source-id: c9aeb47ae9769d28b0051bcedb8f2f51a5a5c260
2019-01-18 07:58:54 -08:00
b4bc55beef TCP init method race condition fix (#15684)
Summary:
This PR fixes a race condition for the TCP init method, where the master rank can exit earlier than the slave ranks, and thus the TCP daemon thread gets shut down before the other slaves are able to access it.

This lets every rank (process) write a special key to the store to mark that it has completed (and is thus about to exit). The master rank (which is the server) will always wait until all ranks have completed before completing itself.
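
A hedged sketch of the completion barrier this describes; `store` is assumed to expose c10d's key-value `set()`/`wait()` API, and the key names are made up:

```python
def mark_done_and_wait(store, rank, world_size):
    store.set("finished_rank_{}".format(rank), "1")   # every rank marks completion
    if rank == 0:                                     # the master waits for all ranks
        store.wait(["finished_rank_{}".format(r) for r in range(world_size)])
```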

This should fix: https://github.com/pytorch/pytorch/issues/15638

Tested using the repro of https://github.com/pytorch/pytorch/issues/15638 and it works fine. Also, test_distributed and test_c10d should already have this coverage.

I had to set the rendezvous test in c10d to a world size of 1, since it is single-process code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15684

Differential Revision: D13570904

Pulled By: teng-li

fbshipit-source-id: 34f3bc471204bbd29320df359347ad5561c6b589
2019-01-18 02:29:38 -08:00
aaff2fecda Remove caffe2::Tensor copy constructor (#15416)
Summary:
Based on offline discussion it should be less surprising to the users of existing code. Thus caffe2::Tensor is now a move-only class (as it used to be); explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr behavior.

This change also identified a few places that misused the copy constructor - those are fixed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416

Reviewed By: Yangqing

Differential Revision: D13524598

fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7
2019-01-18 00:31:56 -08:00
b5c733324c Fix RERUN_CMAKE
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16132

Differential Revision: D13726816

Pulled By: zdevito

fbshipit-source-id: 26ad70651b0138642ad5240670f5c452018c13a2
2019-01-18 00:04:31 -08:00
cb2961f63c Cleanup includes in python_print.cpp.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16129

Differential Revision: D13724297

Pulled By: ZolotukhinM

fbshipit-source-id: 24e140bc052c85ef40b928eb84f463d341346a51
2019-01-17 18:13:17 -08:00
27674dc7c6 Refactor attributes.h (#16098)
Summary:
This PR inlines `Attributes` into `Node`. It helps to clean up the code a little, as everything is in one place (some of the cleanups are included in the PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16098

Differential Revision: D13717637

Pulled By: ZolotukhinM

fbshipit-source-id: c54ae65178a95a01354688921a9ccb1ca699f8eb
2019-01-17 17:39:58 -08:00
40b3e4907c Fix export macros (#15856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15856

They seem to be wrong.
cc zdevito to take a look but I think this is now more correct.

It's weird this didn't cause linker errors. Probably, this functionality isn't used across library boundaries yet.

Reviewed By: dzhulgakov

Differential Revision: D13605257

fbshipit-source-id: 7077ca9027c3ac79a4847ec15ead7ddb28696445
2019-01-17 16:04:01 -08:00
0ab8de3125 Remove some dependencies from ivalue.h to ATen (#15855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15855

This is preparation work for moving IValue to c10.

Reviewed By: ezyang

Differential Revision: D13605259

fbshipit-source-id: cc545f582ab8607bb02aaf71273cb2710200b295
2019-01-17 16:03:58 -08:00
68164c1c3e Code style cleanup (#15854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15854

- Remove unnecessary `inline` keyword
- Add a TODO stating the intention for Blob::ShareExternal()

Reviewed By: dzhulgakov

Differential Revision: D13605258

fbshipit-source-id: c0bc85c74c4ca4b3811d42ac7f866182e159d840
2019-01-17 16:03:56 -08:00
637b35b372 Use intrusive_ptr for Blob in IValue (#16052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16052

We need IValue to take/return Blob as an intrusive_ptr because we want to pass it around and Blob has disabled copying.
This is needed in a diff on top.

Reviewed By: ezyang

Differential Revision: D13684761

fbshipit-source-id: 7cb3d7e9fec39a2bc9f063d4d30404e6d7016eb2
2019-01-17 15:56:54 -08:00
3e85a2bcbf Move c10 dispatcher back to ATen/core (#16050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16050

The c10 dispatcher will (soon) depend on IValue and IValue can't be moved to c10 yet because it depends on at::Tensor, which depends on legacy Type dispatch and we don't want the legacy dispatch in c10.

So instead, we move the c10 dispatcher back to ATen/core until we can actually move at::Tensor to c10.

Reviewed By: ezyang

Differential Revision: D13684517

fbshipit-source-id: 1125f4254223907c52f96ff73034f6d4ae9fd0a7
2019-01-17 15:56:52 -08:00
a9438ba62f Moving cuda-convnet2 to the internal fb dir to satisfy internal dependencies. (#16104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16104

PyTorch PR 15784 removed cuda-convnet from the contrib directory. This broke
some internal-only fb dependencies. Moving this to the internal area.

Reviewed By: ezyang

Differential Revision: D13709112

fbshipit-source-id: 2d7811545da67489869b59c350a29817eff693cf
2019-01-17 15:11:20 -08:00
431a34f3ff further wildcard cleanups (#16041)
Summary:
Some cleanup to wildcard handling, including one bugfix: previously, we were not considering writes to the wildcard set as part of the potential write set for nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16041

Differential Revision: D13705738

Pulled By: suo

fbshipit-source-id: acb8ccbaa70fe47445577ddf24a69f84630de411
2019-01-17 14:54:34 -08:00
962f3f4864 Refactor _jit_internal (#16058)
Summary:
Use qualified names in `jit/__init__.py` to avoid polluting that namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16058

Differential Revision: D13718745

Pulled By: driazati

fbshipit-source-id: 19d150569c8374541250a961f24f70c3f523de03
2019-01-17 13:56:50 -08:00
99b029aca3 Include all Caffe2 headers in Python installations (#16124)
Summary:
Confirmed on a local run that all the additional headers are present. This shouldn't be caught in any existing tests though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16124

Differential Revision: D13720773

Pulled By: pjh5

fbshipit-source-id: 22a42639f5649cac555ecc5a8b6760a8cbfcf01f
2019-01-17 13:51:51 -08:00
0282318bea Add comment to explain rnn bias vectors (#15843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15843

RNN/LSTMs only need one bias vector, but our implementation uses two to be compatible with CuDNN. This diff adds a comment to explain this.

Reviewed By: ezyang

Differential Revision: D13602365

fbshipit-source-id: eef5bd9383d9f241dc0ef0472f753b4a44cc19b5
2019-01-17 13:32:42 -08:00
7e9e1c7a9f Add @yf225 to cpp codeowner
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16120

Differential Revision: D13718880

Pulled By: yf225

fbshipit-source-id: 1c0a41ffba71855a3ad88b8d263ba2bd5076351d
2019-01-17 13:03:48 -08:00
9aac5c7e85 Update FP16 to latest master (#14498)
Summary:
TSIA - fp16 cmake had a bug that is fixed in https://github.com/Maratyszcza/FP16/pull/9 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14498

Differential Revision: D13240829

Pulled By: Yangqing

fbshipit-source-id: 724745750efe4f1b49d29ee07380c36997579915
2019-01-17 13:00:28 -08:00
d6a8dd9538 Cleanup gumbel_softmax (#13339)
Summary:
Fixes #12643, amends to #3341.

- Allow multidimensional input ~~(but apply softmax over `dim=-1`)~~ with `dim` argument
- Cleaner: fewer lines of code
- Faster (1.32x speedup vs original, 2x speedup vs using `torch.Distributions`)
- Small fixes in docstring
- Remove some references in docstring. Was the linked (excellent) ipynb the first to do the straight-through trick? Instead, I propose changing the reference to the two papers best known for it.
- Add deprecationwarning for `eps`. It's not needed anymore.
- Initial commit keeps some code alternatives commented to exploit CI

- As of discussion when `gumbel_softmax` was added (#3341), this was merged into `torch.nn.functional` before all the work with `Distributions` and `Pyro`, and there will probably be multiple other best practices for this in the future.
I've tested building using the `Distributions`-api, but it was too slow, see below.

I therefore propose not using `Distributions` to keep it fast and simple, but adding a comment in docstring that `gumbel_softmax` may be deprecated in the future.

```
dist = torch.distributions.RelaxedOneHotCategorical(temperature=tau, logits=logits, validate_args=False)
y_soft = dist.rsample()
```

Pros:
* Built using tricks like `logsumexp` etc
* Explicitly uses `torch.distributions.utils._finfo` to avoid overflow (old implementation had an `eps` flag)
* Maintained for this exact purpose.

Cons:
* Very slow. Construction of distribution adds overhead see timings below. May be solved in future with speedups of `TransformedDistribution` and `Distribution`.
* Assumes which `dim` to apply softmax over.

```
    y_soft = logits.new(logits.shape)
    y_soft = (logits - y_soft.exponential_().log()) / tau  # Gumbel noise
    y_soft = y_soft.softmax(dim)  # Gumbel softmax noise
```
Pros:
* Faster

```
    import time
    import torch
    start = time.time()
    num_draws = 1000000
    logits = torch.randn(1, 3)
    counts = torch.zeros_like(logits)  # accumulate one-hot draws

    for draw in range(num_draws):
        y_draw = gumbel_softmax(logits, hard=True)  # gumbel_softmax assumed in scope
        counts = counts + y_draw
    end = time.time()
    print(end - start)

>> 12.995795965194702

>> 7.658372640609741

>> 20.3382670879364
````

Decide on which path to choose. I'll commit changes to the unit tests in a while to show that it passes both the old and new tests. I'll also remove the commented code about `RelaxedOneHotCategorical`.
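
For reference, a minimal usage sketch of the cleaned-up API as proposed (the `dim` argument per the bullet list above):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 5, 10)
y = F.gumbel_softmax(logits, tau=0.5, hard=True, dim=-1)  # one-hot along dim
assert y.shape == logits.shape
```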
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13339

Differential Revision: D13092434

Pulled By: ezyang

fbshipit-source-id: 4c21788df336f4e9c2ac289022e395b261227b4b
2019-01-17 12:56:35 -08:00
a667767220 Add matches_jit_signature attribute to native_functions.yaml (#16040)
Summary:
If "matches_jit_signature" is set to True for a particular function, we will assume that the func syntax follows the JIT signature syntax. This is a temporary attribute and doesn't need to be set by developers outside the core team. It serves as a means of tracking an ongoing schema unification with the goal of aligning func syntax with other components of PyTorch in order to reduce overall complexity and match coverage of different function descriptions.

Followup PRs might be about removing _out from native_functions.yaml and using Tensor annotations instead, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16040

Reviewed By: ezyang

Differential Revision: D13703176

Pulled By: cpuhrsch

fbshipit-source-id: ce248e1823a6f18efa95502f9f3eebf023b4a46c
2019-01-17 12:39:08 -08:00
fe4ae9dfe4 add if in register_buffer like register_parameters (#16110)
Summary:
without this "if", code below will throw error " Linear' object has no attribute '_buffers' "
And with this if, error would be "cannot assign buffer before Module.\_\_init\_\_() call", which I think it's more accurate, just like register_parameter.
```
import math
import torch
from torch.nn.parameter import Parameter
from torch.nn import functional as F
from torch.nn import Module
class Linear(Module):
    def __init__(self, in_features, out_features, bias=True):

        self.in_features = in_features
        self.out_features = out_features
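        # NOTE: register_buffer is called before Module.__init__() below,
        # which is exactly the failure this PR gives a clearer error for.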
        self.register_buffer('test', torch.Tensor(out_features, in_features))
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

        super(Linear, self).__init__()

        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )

linear = Linear(3,4)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16110

Differential Revision: D13715839

Pulled By: soumith

fbshipit-source-id: c300eff0a8655aade448354cf489a592f7db722a
2019-01-17 11:50:12 -08:00
f1ad5e08c7 Revert D13709409: [pytorch][PR] Exclude pyi from flake8 checks.
Differential Revision:
D13709409

Original commit changeset: ec4a959e146f

fbshipit-source-id: feabed5719a0bfdfe7979074b7e1ba9756c4ba25
2019-01-17 11:28:56 -08:00
6641b09fac respect grad guard for torch.jit._fork and torch.jit._wait (#16101)
Summary:
respect grad guard for torch.jit._fork and torch.jit._wait.

Verified that the test failed without the fix, and passes with the fix.

Ideally I would like to enable and disable grad inside the forked function.
That doesn't seem to be supported at the moment. This code handles that case
as well.
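
A hedged sketch of the behavior being tested (`torch.jit._fork`/`_wait` are internal APIs; eager execution of the forked function is assumed):

```python
import torch

def worker(x):
    return x * 2  # should run under the caller's grad mode after this fix

x = torch.ones(2, requires_grad=True)
with torch.no_grad():
    y = torch.jit._wait(torch.jit._fork(worker, x))
assert not y.requires_grad  # the no_grad guard is respected across fork/wait
```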
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16101

Differential Revision: D13708374

Pulled By: gqchen

fbshipit-source-id: 0533f080c4d0253fb4c61d2a0d3cc22de5721a09
2019-01-17 11:12:57 -08:00
595f767880 Revert batched pdist, improve existing kernel, add test (#15901)
Summary:
1) Reverts https://github.com/pytorch/pytorch/pull/12302 which added support for batched pdist. Except I kept the (non-batched) test improvements that came with that PR, because they are nice to have.  Motivation: https://github.com/pytorch/pytorch/issues/15511
2) For the non-batched pdist, improved the existing kernel by forcing fp64 math and properly checking cuda launch errors
3) Added a 'large tensor' test that at least on my machine, fails on the batch pdist implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15901

Reviewed By: ezyang

Differential Revision: D13616730

Pulled By: gchanan

fbshipit-source-id: 620d3f9b9acd492dc131bad9d2ff618d69fc2954
2019-01-17 10:44:43 -08:00
fbdafb006e Fix trivial typos in torch.cuda._utils (#16026)
Summary:
Trivial typo fixes.

Maybe the indefinite article "an" is needed before each "specified index", but I'm not perfectly sure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16026

Differential Revision: D13709499

Pulled By: ezyang

fbshipit-source-id: 698b000bb8aa063afd81db6e67046456a439b2ce
2019-01-17 10:40:43 -08:00
dbe6a7a9ff Unify the shape notation for all of the pytorch modules (#15741)
Summary:
PR to update the shape notation for all of the torch.nn modules to take a unified form. The goal is to make these definitions machine-readable and checkable by unifying the style across all of the different modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15741

Differential Revision: D13709601

Pulled By: ezyang

fbshipit-source-id: fb89a03903fdf0cd0dcf76f3e469b8582b2f3634
2019-01-17 10:32:14 -08:00
67f2039f4c Fix numerical stability in binomial.log_prob (#15962)
Summary:
This issue was discovered by fehiepsi in https://github.com/uber/pyro/issues/1706 with the `log_prob` computation for Binomial, ~and can be seen with `torch.float32` when we have a combination of low probability value and high `total_count` - a test is added to capture this (since scipy only uses float64, the comparison is done using relative tolerance).~

The problem is in the code that tries to pull out the minimum values amongst the logits (written by me earlier, presumably to avoid numerical instability issues), but it is not needed.

EDIT: After a few attempts, I have been unable to reliably show that the change is more numerically stable, and have removed my previous test which fails on linux. The reason is that the issue manifests itself when `total_count` is high and `probs` is very low. However, the precision of `lgamma` when `total_count` is high is bad enough to wash away any benefits. The justification for this still stands though - (a) simplifies code (removes the unnecessary bit), (b) is no worse than the previous implementation, (c) has better continuity behavior as observed by fehiepsi in the issue above.
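
For context, a minimal sketch of the regime in question (high `total_count`, very low `probs`):

```python
import torch
from torch.distributions import Binomial

d = Binomial(total_count=torch.tensor(8000.), probs=torch.tensor(1e-4))
print(d.log_prob(torch.tensor(0.)))   # should stay finite and continuous as probs -> 0
```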

cc. fehiepsi, alicanb, fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15962

Differential Revision: D13709541

Pulled By: ezyang

fbshipit-source-id: 596c6853b6e4d5fba42336afa168a665ab6fbde2
2019-01-17 10:18:37 -08:00
c51cf09a4b Automatic update of fbcode/onnx to fd60104394fa353e1762f44ecad1b2166e33deef (#16094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16094

Previous import was 84a0441ae28795a928005863dc142bee81827566

Included changes:
- **[fd60104](https://github.com/onnx/onnx/commit/fd60104)**: deprecate no-spatial mode of BN (#1637) <liqunfu>

Reviewed By: BIT-silence

Differential Revision: D13705357

fbshipit-source-id: 44dbc8bf15fced6d50048b04c2882e38f75c0e34
2019-01-17 10:14:15 -08:00
f09003d95d A trivial typo fixed in onnx.verify.verify (#15871)
Summary:
A trivial typo fixing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15871

Differential Revision: D13709588

Pulled By: ezyang

fbshipit-source-id: 84460e53e30470bef72bc836c08fd149b4d725cf
2019-01-17 09:57:33 -08:00
0e80df515d Remove support for CUDNN 6 (#15851)
Summary: This PR aims to remove support for cuDNN 6.

Differential Revision: D13709595

Pulled By: ezyang

fbshipit-source-id: 853624db1cf66b0534d7028654c38c2806fb4107
2019-01-17 09:57:26 -08:00
1a5c5fe7c9 Exclude pyi from flake8 checks. (#16105)
Summary:
Idiomatic pyi files will fail with Python 2 flake8 even
though they would work with mypy.  This is because pyi
files generally use Python 3 only syntax.  No point
in linting them.

There are currently no pyi files checked in, this is purely
a prophylactic measure.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16105

Reviewed By: zou3519

Differential Revision: D13709409

Pulled By: ezyang

fbshipit-source-id: ec4a959e146f81ccb9533b04348be8dd78808421
2019-01-17 09:51:45 -08:00
76782cfc21 Update cpuinfo to avoid reporting error when sysfs is not accessible (#16107)
Summary:
On some cloud-based x86 systems /sys/ is not mounted.
cpuinfo has a work-around for these systems, but it reports an error if sysfs files fail to read, and this error was confusing to some users (e.g. pytorch/cpuinfo#20). This update downgrades the error to a warning, so it is not reported with default configuration options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16107

Differential Revision: D13715243

Pulled By: soumith

fbshipit-source-id: f5c4c86422343ca449487f0185f3a8865ccf3b9d
2019-01-17 09:26:49 -08:00
1a09a2a27f Export PyTorch erf to ONNX Erf and add Caffe2 Erf operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16106

Differential Revision: D13709490

Pulled By: bddppq

fbshipit-source-id: 1b5b32261f06543371f7bd7ac9b11957a5eb4ad0
2019-01-17 09:18:08 -08:00
334258e39e Potential fix for model inference crash on Win10 (#15919) (#16092)
Summary:
Please refer to issue #15919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16092

Differential Revision: D13712897

Pulled By: soumith

fbshipit-source-id: edcd1ed3504f1fa1af841a1757616382c745958f
2019-01-17 08:44:02 -08:00
24f4d3987e Move all Stream and Event Python implementation to C++ (#15937)
Summary:
1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind the Python Event class to the C++ implementation.
2. Moved all CUDA runtime invocations from `torch/cuda/streams.py` to C++.
3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937

Differential Revision: D13649001

Pulled By: mrshenli

fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240
2019-01-17 07:29:22 -08:00
1e425d1a47 A trivial typo fix in caffe2.python (#15907)
Summary:
blobl -> globl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15907

Differential Revision: D13709586

Pulled By: ezyang

fbshipit-source-id: 9d3ad76b7fea76c7934407d3c164417b4157e234
2019-01-17 04:57:34 -08:00
7536887cb7 Add count_include_pad for avg_pool on CuDNN (#16100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16100

Add count_include_pad for avg_pool on CuDNN

Reviewed By: houseroad

Differential Revision: D13707959

fbshipit-source-id: 261f5d116066fef75cf9a5787dfbc5d12b5b9f9b
2019-01-17 02:10:12 -08:00
4171ef3728 Enhance the documentation for DistributedDataParallel from torch.nn.parallel.distributed (#16010)
Summary:
- a typo fixed
- made the docs consistent with #5108

And maybe one more change is needed. According to the current docs
> The batch size should be larger than the number of GPUs used **locally**.

But shouldn't the batch size be larger than the number of GPUs used **either locally or remotely**? Sadly, I couldn't experiment this with my single GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16010

Differential Revision: D13709516

Pulled By: ezyang

fbshipit-source-id: e44459a602a8a834fd365fe46e4063e9e045d5ce
2019-01-17 01:02:44 -08:00
ded4ff87af fix a little error in comments (#15922)
Summary:
There is a small error in the comment: with "A->B", task B must start after task A finishes, not after "B".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15922

Differential Revision: D13709579

Pulled By: ezyang

fbshipit-source-id: 735afe83f4532b7c7456da3e96209b3e07071f37
2019-01-17 00:25:23 -08:00
c7a48da493 Corresponding data type for BYTE (#15627)
Summary:
TensorProto.DataType in caffe2/proto/caffe2.proto has BYTE = 3 defined, while there is no corresponding TypeMeta defined in caffe2/core/types.cc: DataTypeToTypeMeta. This issue caused the C++ MNIST + LMDB tutorial to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15627

Differential Revision: D13709602

Pulled By: ezyang

fbshipit-source-id: d4826d0f9b3975e6a8478d4bad1abbbedcaea197
2019-01-17 00:17:56 -08:00
ec8b1c94a9 Fix possible importing errors in build_libtorch.py (#15471)
Summary:
1. I fixed the importing process, which had some problems
    -  **I think `setup_helpers` should not be imported as the top level module. It can lead to many future errors. For example, what if `setup_helpers` imports another module from the upper level?** So we need to change it.
    - The code is not consistent with other modules in the `tools` package. For example, other
    modules in the package import `from tools.setuptools...`, not `from setuptools...`.
    - **It should be able to run with `python -m tools.build_libtorch` command**  because this module is a part of the tools package. Currently, you cannot do that and I think it's simply wrong.

~~2. I Added platform specific warning messages.
    - I constantly forgot that I needed to define some environment variables in advance specific to my platform to build libtorch, especially when I'm working at a non pytorch root directory. So I thought adding warnings for common options would be helpful .~~

~~3. Made the build output path configurable. And a few other changes.~~

orionr  ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15471

Differential Revision: D13709607

Pulled By: ezyang

fbshipit-source-id: 950d5727aa09f857d973538c50b1ab169d88da38
2019-01-16 23:55:57 -08:00
fcb4b4f002 Remove redundant includes from ir.{h,cpp}.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16080

Differential Revision: D13701796

Pulled By: ZolotukhinM

fbshipit-source-id: 7efae3a0fd969376e4b438a8d8fb96adb33dc55c
2019-01-16 23:43:45 -08:00
f7733526aa Generate PDB files for better debugging on Windows (#16008)
Summary:
1. Unify `build_pytorch_libs.bat`, `setup.py` and `torch/CMakeLists.txt` on the debugging flags with the `CMAKE_BUILD_TYPE` being `Debug`, `Release` and `RelWithDebInfo`.
2. Install PDBs through CMake if they are generated.

Reference:
1. CMake PDB install: https://gitlab.kitware.com/cmake/cmake/issues/18393#note_459199
2. About debugging flags https://stackoverflow.com/a/4662345
3. MSDN page about /DEBUG flag: https://docs.microsoft.com/en-us/cpp/build/reference/debug-generate-debug-info?view=vs-2017
4. MSDN page about /Z{i/I/7}: https://docs.microsoft.com/en-us/cpp/build/reference/z7-zi-zi-debug-information-format?view=vs-2017

Work to do:
- [x] Test the changes work in Release config through this PR
- [ ] <del> Test debug build through https://github.com/pytorch/pytorch/pull/16009 </del>
- [x] Test release build with debugging symbols through #16013

Difficulties:
- [x] Replace /Zi flags with /Z7 (which will be added if DEBUG or RelWithDebInfo is used), as it is not supported by sccache
- [x] Resolve `LINK : fatal error LNK1210: exceeded internal ILK size limit; link with /INCREMENTAL:NO` in the debug build
- [ ] DEBUG build blocked by a MSVC bug. In order to resolve it, we'll need to update the MSVC in CI: https://developercommunity.visualstudio.com/content/problem/225957/fatal-error-lnk1318-unexpected-pdb-error-ok-0.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16008

Differential Revision: D13709527

Pulled By: ezyang

fbshipit-source-id: e8365bc75d9ec64099093f7001f83d99a06b196b
2019-01-16 23:34:32 -08:00
0dfdc2cbdb Update int8_simd.h (#13859)
Summary:
If we use clang with sse4 support, we will get a function redefinition
error between [1] and [2]. This patch adds some checks to fix this
problem.

I just turned on USE_NATIVE_ARCH with clang, and then I hit the redefinition error.

[1]
caffe2/operators/quantized/int8_simd.h
[2]
third_party/gemmlowp/gemmlowp/fixedpoint/fixedpoint_sse.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13859

Differential Revision: D13095694

Pulled By: ezyang

fbshipit-source-id: c65166e4d5a04bb54e2b82c52740af00116ccb0d
2019-01-16 23:19:46 -08:00
ffd613800f Add IS_PYTORCH_CI flag for testing (#16006)
Summary:
Use case:
Some data loader tests rely on `psutil` (a third party lib). So they are guarded by `skipIf`. But we want to always test them on CI envs. With `IS_PYTORCH_CI`, we can raise if `psutil` is not found.
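
A hedged sketch of the intended guard (the flag name is from this PR; the env-var plumbing and helper are assumptions):

```python
import os
import unittest

IS_PYTORCH_CI = os.getenv("IS_PYTORCH_CI") == "1"  # assumed plumbing

try:
    import psutil  # noqa: F401
    HAS_PSUTIL = True
except ImportError:
    HAS_PSUTIL = False
    if IS_PYTORCH_CI:
        raise RuntimeError("psutil must be installed on PyTorch CI")

@unittest.skipIf(not HAS_PSUTIL, "psutil not installed")
class TestDataLoader(unittest.TestCase):
    def test_something(self):
        pass
```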
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16006

Reviewed By: ezyang

Differential Revision: D13673957

Pulled By: yf225

fbshipit-source-id: c63a7138093f45333c0b371fed0bcc88b67f2a22
2019-01-16 23:07:38 -08:00
7c56db73d5 Moving torch.norm to ATen using TensorIterator (#15414)
Summary:
Adding support for torch.norm:
i. multiple dimensions for dim
ii. a dtype that specifies the math/output tensor type
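
A minimal sketch of the two additions (per the list above):

```python
import torch

x = torch.randn(3, 4, 5)
# dim may now be a tuple of dimensions, and dtype picks the math/output type
n = torch.norm(x, p=2, dim=(1, 2), dtype=torch.float64)
print(n.shape, n.dtype)   # torch.Size([3]) torch.float64
```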
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15414

Differential Revision: D13702022

Pulled By: ezyang

fbshipit-source-id: da2676f2b6aff988889b1539d0de8ecd4946823a
2019-01-16 22:15:25 -08:00
55511004d1 Resolve errors in perfkernel for Windows (#16031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16031

1. MSVC only has _mm_prefetch(const char*, int). Fixed in both python codegen and C++ files.
2. uint32_t in "cvtsh_ss_bugfix.h" requires "#include <cstdint>".
3. Some files use gflags headers. Add dependency via c10.
4. Isolate arch flags with interface library and private compile options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15753

Reviewed By: dskhudia

Differential Revision: D13636233

Pulled By: jspark1105

fbshipit-source-id: cdcbd4240e07b749554a2a5676c11af88f23c31d
2019-01-16 21:51:00 -08:00
aa6b0f50ad add a constexpr in c10::Half (#16091)
Summary:
The debug build generates references which are not resolved otherwise, as recognized by dlibenzi.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16091

Differential Revision: D13703584

Pulled By: soumith

fbshipit-source-id: 6ac5666d2c6b1520e083f6eac9c535a1609d9c6b
2019-01-16 21:13:21 -08:00
d277f77da2 Tensor reinitialization codemod - 3/5 (#15912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15912

Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13586734

fbshipit-source-id: 8485d2c51225343961351c7a2e8f95055534f9a9
2019-01-16 19:49:01 -08:00
57d29ffa9c Bound shape inference for c2 (#16081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16081

A simple version of bound shape inference, conditioned on batch size. In addition to doing normal shape inference, it will change the batch size (the 1st dim of the shape) of the inputs, as well as of batch-size-modulating ops such as `SparseLengthsSum`. Support for more ops, such as `SparseToDense`, is probably needed. We can build on this.

Reviewed By: jackm321, rdzhabarov

Differential Revision: D13661968

fbshipit-source-id: 6a724a647e109757c26e3e26e15a49725ecc75cc
2019-01-16 19:02:56 -08:00
7a5f782c2e Fix max_pool_grad test (#16088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16088

Fix max_pool_grad test

Reviewed By: houseroad

Differential Revision: D13700917

fbshipit-source-id: f4f942ee920bcd943c38a8f8a6aafd1d13c4515f
2019-01-16 15:32:27 -08:00
5b2d30ec85 Revert D12812029: [pt1][tensor] Remove deprecated caffe2::Tensor APIs
Differential Revision:
D12812029

Original commit changeset: ea0c3dd882be

fbshipit-source-id: d5bb4cbb1d7c9be08789599a7db0fb3313f3dbc4
2019-01-16 14:53:20 -08:00
237c0c3c7a Port the backend of FractionalMaxPool3d from TH to ATen (#15575)
Summary:
1. Port the FractionalMaxPool3d implementation from THNN/THCUNN to ATen.
2. Expose this function to Python module nn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15575

Differential Revision: D13612848

Pulled By: chandlerzuo

fbshipit-source-id: 5f474b39005efa7788e984e8a805456dcdc43f6c
2019-01-16 14:16:30 -08:00
aff0964ee7 update pytorch docker to cuda 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16082

Differential Revision: D13699081

Pulled By: soumith

fbshipit-source-id: 86942e2c5595931384cf87dd1ef75936a4d74a57
2019-01-16 13:37:37 -08:00
d33e7d1236 multinomial: fix detection of zero probability (#16075)
Summary:
The cumsum over the probabilities may fail to be monotonically
non-decreasing, so it is hard to detect zero-probability
classes using just the cumsum.
This changes the binary search postprocessing to use the
(non-cumulated) distribution instead.
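
A minimal check of the intended behavior:

```python
import torch

# A zero-probability class must never be sampled, even when the cumsum
# around it is numerically flat.
probs = torch.tensor([0.0, 1.0, 0.0])
samples = torch.multinomial(probs, 100, replacement=True)
assert (samples == 1).all()
```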

Thank you, jcjohnson, for the bug report with
reproducing case.

Fixes: #13867
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16075

Differential Revision: D13695565

Pulled By: soumith

fbshipit-source-id: 02c4d6f868f0050c1ae7d333f4317c5610e49cd9
2019-01-16 12:50:49 -08:00
e58cc6ab28 Enable single graph sharing between multiple threads for onnxifiop (#16047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16047

Implements a single thread-safe map enabling sharing of the generated graph between
different ops.
Added model_id to every onnxified op to help create a unique id in the map.
Some formatting fixes.

Reviewed By: yinghai

Differential Revision: D13663927

fbshipit-source-id: 27417e8fe752fdd48abb6a87966cd76d592e1206
2019-01-16 12:19:16 -08:00
503f412f79 Fix error message formatting in AT_CHECK/AT_ERROR (#16067)
Summary:
Changelog:

- Fix formatting for error messages in prelu, EmbeddingBag, RNN

Fixes https://github.com/pytorch/pytorch/issues/16043
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16067

Differential Revision: D13693286

Pulled By: soumith

fbshipit-source-id: b0760d13c9a45e82dababfc44dabe648e5345ca3
2019-01-16 11:34:13 -08:00
71b24127d2 Correct sphinx-note in symeig (wrong indentation)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16073

Differential Revision: D13692874

Pulled By: soumith

fbshipit-source-id: ea2a98e88679d382f9a2edab199e9ba7c8ce2213
2019-01-16 10:47:48 -08:00
3cf76e78bd Fix the caffe2_gpu linkage with torch on Windows (#16071)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15992.
Inspired by https://docs.microsoft.com/en-us/cpp/build/reference/optimization-best-practices?view=vs-2017. But this PR needs to be tested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16071

Differential Revision: D13693006

Pulled By: soumith

fbshipit-source-id: e83e9ae2591fa4da01d2b1b593558dba3bdc3cf7
2019-01-16 09:10:39 -08:00
a2af554e6f Port legacy all(*) to ATen (#15540)
Summary:
Questions:

1. ~This PR disables `common_dtype` computation [in `TensorIterator.cpp`](https://github.com/mrshenli/pytorch/blob/all/aten/src/ATen/native/TensorIterator.cpp#L489-L491) for `all*` operators. The reason is that, [this code](https://github.com/mrshenli/pytorch/blob/all/aten/src/ATen/native/TensorIterator.cpp#L120) otherwise complains type mismatch, where the `op.tensor` is `type Variable[CPUByteType]` while the `op` is `CPUByteType`. I am not sure if this is the right solution for this problem.~

2. Should I clean up all occurrences of `_th_all` and `_th_all_out` (and `logicalAnd`, `logicalAndAll`)?

3. Do I need to implement derivatives for `all`?

gchanan

Benchmark:

<img width="590" alt="screen shot 2018-12-26 at 3 24 31 pm" src="https://user-images.githubusercontent.com/16999635/50456505-e9596a00-0922-11e9-844e-00c4b4aad7ca.png">

<img width="587" alt="screen shot 2018-12-26 at 3 26 10 pm" src="https://user-images.githubusercontent.com/16999635/50456509-ef4f4b00-0922-11e9-96bf-0a30c8574fe7.png">

<img width="590" alt="screen shot 2018-12-26 at 3 26 54 pm" src="https://user-images.githubusercontent.com/16999635/50456510-ef4f4b00-0922-11e9-8a63-e47988843cc8.png">

<img width="589" alt="screen shot 2018-12-26 at 3 27 16 pm" src="https://user-images.githubusercontent.com/16999635/50456511-ef4f4b00-0922-11e9-9004-2518aebcdc6e.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15540

Differential Revision: D13548938

Pulled By: mrshenli

fbshipit-source-id: 5a2e5eef1047decb4c79906cb9f3332034908c9c
2019-01-16 09:06:26 -08:00
411173757e Rename away uses of THAllocator and THCDeviceAllocator (#16061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16061

I discovered I needed to delete these names in preparation of moving
THCCachingAllocator to c10_cuda; might as well also fix all the other
sites too.

Reviewed By: dzhulgakov

Differential Revision: D13686869

fbshipit-source-id: e8cc55d39ac4bfd3e3a22c761f89a7a111ce5f5e
2019-01-16 05:36:47 -08:00
4d07951a54 Stop pretending that TH headers are both C++ and C compatible. (#16059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16059

Just deleted all __cplusplus ifdef guards; we only ever use
these headers in C++ contexts.

Reviewed By: dzhulgakov

Differential Revision: D13686580

fbshipit-source-id: ce28c4a32f3596bfb17aeeb34904a02899991453
2019-01-16 05:36:45 -08:00
fb68d813be Fix logic errors when accumulating reductions in output (CUDA) (#16023)
Summary:
The correct logic is as follows:

* If there is an earlier split, we need to combine with its result
* If there is *not* a later split, we need to project before saving into the output.

This should partially fix #15837. For example:
```
In [7]: a=torch.ones([1838860800], dtype=torch.float, device="cuda:1")

In [8]: a.mean()
Out[8]: tensor(1., device='cuda:1')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16023

Differential Revision: D13678449

Pulled By: umanwizard

fbshipit-source-id: ab5078484c88e96bb30121b5cf24a0e8b0a8c2f8
2019-01-15 19:57:57 -08:00
5353847b19 Remove deprecated caffe2::Tensor APIs (#15814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15814

The plan is to remove the APIs we want to deprecate one by one and make sure everything still builds in sandcastle and ossci

Reviewed By: ezyang

Differential Revision: D12812029

fbshipit-source-id: ea0c3dd882bec95fcd4507160ebc61f598b6d040
2019-01-15 18:42:04 -08:00
5e72e99c86 Remaining Tensor API fixes - dims() -> sizes() (#15743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15743

Remaining fixes so that D12812029 will compile

Reviewed By: dzhulgakov

Differential Revision: D13535559

fbshipit-source-id: 2c8b3403570c8c35ac8efe2d827233abc0e6e0d1
2019-01-15 18:42:02 -08:00
8b5894491c Comment about CuDNNWrapper (#15496)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15496

Differential Revision: D13544130

Pulled By: ezyang

fbshipit-source-id: 51bdd8312b482925b30a478774cdfa629c57ee4e
2019-01-15 18:01:12 -08:00
ad39cbde59 Port FractionalMaxPool2d from TH to ATen (#15531)
Summary:
Tested:

pytest test/test_nn.py -k Fractional
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15531

Differential Revision: D13612833

Pulled By: chandlerzuo

fbshipit-source-id: b919d698d068b97ba7a4f8021367e7f6c8aae39c
2019-01-15 17:57:12 -08:00
dc4977ddf0 Support tracing GenericList (#15969)
Summary:
Treat GenericList similarly to tuples and TensorList: recursively unpack them and call assignValueTrace accordingly. Also add interpreter support for ListUnpack on GenericList.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15969

Differential Revision: D13665139

Pulled By: jamesr66a

fbshipit-source-id: cd8cb3dd7475f424e48a69d217f2eac529df9f6a
2019-01-15 17:32:48 -08:00
b792bfec0e s/fwdproxy.any/fwdproxy/g in fbsource (#16024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16024

codemod with 'Yes to all': s/fwdproxy.any/fwdproxy/g in fbsource

Reviewed By: maxgeorg

Differential Revision: D13666336

fbshipit-source-id: a5a694d66efec5304a1c8c231d638441f88efe1d
2019-01-15 17:26:31 -08:00
8f11df3cb7 Automatic update of fbcode/onnx to 84a0441ae28795a928005863dc142bee81827566 (#16046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16046

Previous import was 7abd834091f1024c11749dcfd25126802db9fdd5

Included changes:
- **[84a0441](https://github.com/onnx/onnx/commit/84a0441)**: Clarify namescopes in the presence of nested subgraphs (#1665) <G. Ramalingam>
- **[118fec5](https://github.com/onnx/onnx/commit/118fec5)**: Add Where op. (#1569) <Sergii Dymchenko>
- **[beefa15](https://github.com/onnx/onnx/commit/beefa15)**: Use strings directly for casing as np.object w/o redundant StringHolder. (#1736) <Dmitri Smirnov>
- **[4023bae](https://github.com/onnx/onnx/commit/4023bae)**: Add a capability to input/output unicode strings (#1734) <Dmitri Smirnov>
- **[1a8a7fc](https://github.com/onnx/onnx/commit/1a8a7fc)**: typos fixed: iutput -> input (#1726) <Beomsoo Kim>
- **[0128478](https://github.com/onnx/onnx/commit/0128478)**: Scan test update (#1732) <G. Ramalingam>
- **[c6a24fd](https://github.com/onnx/onnx/commit/c6a24fd)**: turn rtol to 0.002 on densenet121, since AMD and Nvidia GPU's precion difference (#1733) <Lu Fang>
- **[5b7ac72](https://github.com/onnx/onnx/commit/5b7ac72)**: Add Shrink operator (#1622) <Rui Zhu>

Reviewed By: yinghai

Differential Revision: D13676711

fbshipit-source-id: 513cc137223469b47af48919432aaecf58006012
2019-01-15 17:17:31 -08:00
13f38ab79d Add count_include_pad to average_pool_gradient_op (#15997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15997

Add count_include_pad to average_pool_gradient_op

Reviewed By: houseroad

Differential Revision: D13648339

fbshipit-source-id: 205cb2acb32dc24a85256b628298b1a11f0ffa2c
2019-01-15 16:56:40 -08:00
b2eb98f6c3 Remove cuda from autograd profiler (#15898)
Summary:
This puts stubs in the autograd profiler for its use of CUDA APIs, allowing the CUDA parts of libtorch to be linked separately from the CPU parts.

This also edits the buck build.

Previous:

For GPU builds:
_C -> csrc -> caffe2
For CPU builds:
_C -> csrc-cpu -> caffe2

Now:
GPU:
_C -> libtorch_cuda -> (libtorch -> caffe2, for CPU)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15898

Reviewed By: ailzhang

Differential Revision: D13617991

Pulled By: zdevito

fbshipit-source-id: 6d84a50bb356a54b4217f93219902755601b00e1
2019-01-15 16:43:11 -08:00
2824cb6e9c Fix namespace typo. (#16021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16021

Adds nom:: so that TRIVIAL_CONVERTER works more generally.

Reviewed By: janewangfb

Differential Revision: D13664748

fbshipit-source-id: 100f47a8326e41bd0ac2ae281669f5a0363fe060
2019-01-15 16:43:08 -08:00
c448f85e1f Fixing missing cpp tests for Caffe2 setup.py builds (#16037)
Summary:
These were broken (always skipped in setup.py builds) by https://github.com/pytorch/pytorch/pull/15917
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16037

Differential Revision: D13675549

Pulled By: pjh5

fbshipit-source-id: fed50855dd0b5d0c80fface3d8b2156f18aae4e7
2019-01-15 13:09:12 -08:00
57b5e7572b Test cases for calling caffe2 LayerNorm from PyTorch and JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15895

Reviewed By: dzhulgakov

Differential Revision: D13615336

fbshipit-source-id: de28fef8ce025d6d37a4c80c029ec97b7195cfd9
2019-01-15 12:03:57 -08:00
620ff25bdb Enhance cpu support on gloo based multi-nodes mode. (#11330)
Summary:
1. Add some gloo communication operators to the related fallback list;
2. Work around compile errors when using a fallback operator whose CPU operator inherits directly from 'OperatorBase', like PrefetchOperator;
3. Add new CPU context support for some Python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330

Reviewed By: yinghai

Differential Revision: D13624519

Pulled By: wesolwsk

fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f
2019-01-15 11:47:10 -08:00
7d601715e5 Constant prop prim::None (#15979)
Summary:
Previously we were only constant propping prim::Constants, but we should be constant propping prim::None as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15979

Differential Revision: D13664692

Pulled By: eellison

fbshipit-source-id: 01839403576c21fc030c427e49275b8e1210fa8f
2019-01-15 11:34:51 -08:00
9a6fe4feec Add a note about THNN height/width/etc argument reordering. (#15819)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15819

Differential Revision: D13665297

Pulled By: ezyang

fbshipit-source-id: 4570275bc9e65269788f836f2447d09474cefeff
2019-01-15 10:52:39 -08:00
406b9c49bd Fix Python path finding for benchmark tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16022

Differential Revision: D13673792

Pulled By: pjh5

fbshipit-source-id: 177a823ef343b7f60e26ad9ef51415332045438d
2019-01-15 10:48:40 -08:00
7f1397acef Quantized RNNCell modules (#15469)
Summary:
Similarly to https://github.com/pytorch/pytorch/pull/13777, we apply post-processing quantization to RNN cell modules (`RNNCell`, `LSTMCell`, and `GRUCell`).

A further follow-up PR will involve quantizing the full `RNN`, `GRU`, and `LSTM` modules. This depends on those modules being scriptable as part of the standard library scripting effort, though. Note that infrastructure in this pr such as `gather_quantized_params` is currently unused but should be used in the future when we can port over the full RNN modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15469

Differential Revision: D13545802

Pulled By: jamesr66a

fbshipit-source-id: ad3b694517842893ea619438e9f5e88fd7b96510
2019-01-15 10:40:51 -08:00
19717224c5 Miscellaneous broken RSTs fixed (#16033)
Summary:
https://pytorch.org/docs/master/tensors.html#torch.Tensor.bernoulli_
https://pytorch.org/docs/master/torch.html#torch.addmm
https://pytorch.org/docs/master/distributed_deprecated.html#torch.distributed.deprecated.reduce_multigpu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16033

Differential Revision: D13671202

Pulled By: soumith

fbshipit-source-id: 276e10e610affe205376573e7f0f9894695d218d
2019-01-15 09:50:12 -08:00
b329e03684 Add PyTorchPredictorContainer (#15899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15899

Add PyTorchPredictorContainer to support multiple jit script modules

Reviewed By: pritamdamania87

Differential Revision: D13596139

fbshipit-source-id: 3ce0bdf2f4dbba7aa1d20e824d03e5ac98f5d887
2019-01-15 09:18:18 -08:00
1065e7cd24 Add itertools.{prod, combinations, combinations_with_replacement} like op to pytorch (#9393)
Summary:
closes https://github.com/pytorch/pytorch/issues/7580
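
A quick usage sketch, assuming the operator names introduced by this PR (`torch.cartesian_prod`, `torch.combinations`):
```
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5])
torch.cartesian_prod(a, b)                          # like itertools.product
torch.combinations(a, r=2)                          # like itertools.combinations
torch.combinations(a, r=2, with_replacement=True)   # ...with_replacement
```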
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9393

Differential Revision: D13659628

Pulled By: zou3519

fbshipit-source-id: 3a233befa785709395a793ba8833413be394a6fd
2019-01-15 08:31:22 -08:00
964732fa8d use fbgemm gconv in dnnlowp (#16020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16020

Needs to go over more iterations. For conv, I think we need a high-level interface that abstracts out the low-level details of which code path will be taken (acc16, outlier-aware, depth-wise, group conv, ...); otherwise the client code will be complex, as can be seen from the DNNLOWP Conv ops. This will also help us make the interface more stable.

Reviewed By: dskhudia, jianyuh

Differential Revision: D13588996

fbshipit-source-id: 9afce9e441bcaf20437fcc2874fb9d4165a46bcb
2019-01-15 00:02:31 -08:00
bc233fe405 var for multiple dimensions (#15892)
Summary:
Timings are the same as for `std`.
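
A minimal usage sketch of the multi-dimensional reduction:
```
import torch

x = torch.randn(4, 5, 6)
x.var(dim=(0, 2))                 # reduce over several dimensions at once
x.var(dim=(0, 2), keepdim=True)   # keepdim works as with single-dim var
```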
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15892

Differential Revision: D13651173

Pulled By: umanwizard

fbshipit-source-id: a26bf1021dd972aa9e3e60fb901cd4983bfa190f
2019-01-14 20:17:42 -08:00
02b9f686a7 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 19841cff4a7fd69318d7828db75c16cd75757edd
2019-01-14 20:17:41 -08:00
5d527b9cc2 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 68b7c41366618ffd636c2b9c45c7ffbbcbc44f85
2019-01-14 18:43:27 -08:00
10b16953d1 nomnigraph - easy - use new test utils in converter_nomnigraph_test (#15751)
Summary:
Use new test utils in converter_nomnigraph_test , and add utils to set device option name, external inputs, outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15751

Differential Revision: D13586228

Pulled By: duc0

fbshipit-source-id: ff809dd7bf9f30641ce2a6fef7e2810f005521c2
2019-01-14 18:38:38 -08:00
4ed9de8680 Remove code duplication (#15880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15880

The layer_norm reference was implemented twice. Removing one of them.

Reviewed By: dzhulgakov

Differential Revision: D13611232

fbshipit-source-id: cee96c78d3255c3a4e34300693bf9260cf096615
2019-01-14 17:59:37 -08:00
ddece5a793 Fix ormqr docs, fixes #15565 (#15694)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc meganset
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15694

Differential Revision: D13573064

Pulled By: zou3519

fbshipit-source-id: 1d0b693d7c26db91826b81e6c98b45a69b5e9bc4
2019-01-14 17:08:18 -08:00
774705ba05 Fix c10d checking errno unconditionally (#15986)
Summary:
In #15964, I learned that `errno` is only meaningful if the function call fails. E.g., on some macOS systems, a successful `fork()` sets `errno` to `EINVAL` in the child process. This commit changes the `SYSCALL` macro so error checking is only done when an error happens. This means checking whether `rv == -1` for most calls, but checking `rv == nullptr` for `inet_ntop`.

Now `SYSCALL` accepts a second argument `success_cond`, which should be an expression returning whether the call succeeded. `SYSCHECK_ERR_RETURN_NEG1` is the shorthand for checking if rv is `-1`.

Any suggestions for better macro names are welcome.
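
The rule the macro now encodes, illustrated with a Python analogue on POSIX (a sketch, not the c10d code itself):
```
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

rv = libc.close(-1)            # deliberately invalid fd, so the call fails
if rv == -1:                   # consult errno only after the call reports failure
    print(ctypes.get_errno())  # EBADF here; meaningless after a success
```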
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15986

Reviewed By: janewangfb

Differential Revision: D13661790

Pulled By: pietern

fbshipit-source-id: 9551b14b9f88805454a7bfb8e4d39e0f3aed8131
2019-01-14 16:02:05 -08:00
4fb3931896 add tensor.to to script (#15976)
Summary:
Previously it only worked with keyword arguments. Now it is fully compatible.

Fix for: https://github.com/pytorch/pytorch/issues/15478
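
A small sketch of the now-supported positional form:
```
import torch

@torch.jit.script
def to_double(x):
    # positional arguments to Tensor.to now compile in TorchScript
    return x.to(torch.float64)

print(to_double(torch.randn(2)).dtype)  # torch.float64
```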
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15976

Differential Revision: D13643979

Pulled By: eellison

fbshipit-source-id: 6a47bce7db362da80452adffebd2732f8e62a240
2019-01-14 15:49:31 -08:00
8964a2e6e6 Split Caffe2 CI into cmake-only and python builds (#15917)
Summary:
bypass-lint

- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917

Reviewed By: orionr

Differential Revision: D13637583

Pulled By: pjh5

fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153
2019-01-14 15:20:44 -08:00
4bdaca827c Make call operator on module holder call forward (#15831)
Summary:
In Python, you can use the call operator to invoke the `forward()` method of a module. In C++ this was previously not possible, because I couldn't figure out how to deduce the return type of a module's `forward()` method under the constraint that `forward()` may not exist at all (since the base module class in C++ does not mandate a `forward()` method). I have now figured it out, so the call operator can be used.
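
For reference, the Python behavior being mirrored (a sketch):
```
import torch
import torch.nn as nn

m = nn.Linear(4, 2)
x = torch.randn(3, 4)
# module(x) dispatches to forward(); the C++ module holders now do the same
assert torch.equal(m(x), m.forward(x))
```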

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15831

Differential Revision: D13652676

Pulled By: goldsborough

fbshipit-source-id: ccab45a15215dda56460e560f0038781b539135f
2019-01-14 14:40:33 -08:00
c620725177 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 0e31357e8a34614226e8948ae76d67e0786a9196
2019-01-14 12:46:24 -08:00
8c55e56c37 Fix broken rst of torch.nn.utils.spectral_norm and others (#15995)
Summary:
- Currently, the [rst](https://pytorch.org/docs/stable/nn.html#torch.nn.utils.spectral_norm) looks broken, at least in my browser. So I fixed it.
- I thought a subscript may be needed to the left W in the definition.
- A few typos fixed.

crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15995

Differential Revision: D13649888

Pulled By: soumith

fbshipit-source-id: 00a2c3b043c7c8ebdd9fc2bf77ba27ae695fee3f
2019-01-14 07:35:36 -08:00
300dcc3b96 Add cuda.reset_max_memory_* (#15985)
Summary:
Addresses #15968
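
A usage sketch of the new counters:
```
import torch

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device='cuda')
    del x
    torch.cuda.reset_max_memory_allocated()  # reset the peak-allocated counter
    torch.cuda.reset_max_memory_cached()     # reset the peak-cached counter
    print(torch.cuda.max_memory_allocated())
```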
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15985

Differential Revision: D13649916

Pulled By: soumith

fbshipit-source-id: a207aea5709a79dba7a6fc541d0a70103f49efff
2019-01-14 07:31:51 -08:00
7c08f1083e libshm retry on EINTR (#15964)
Summary:
fixes https://github.com/pytorch/pytorch/issues/14314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15964

Differential Revision: D13639034

Pulled By: soumith

fbshipit-source-id: 44592762aa46982e5d3616d55b5666a2c2ce9105
2019-01-14 04:30:40 -08:00
abdaa477e5 Improved the documentation for torch.nn.functional.pad (#15984)
Summary:
- Fixed a few typos and grammar errors.
- Changed the sentences a bit.
- Changed the format of the tuples to be consistent with padding notations in the other places. For example, `ReflectionPad2d`'s dostring contains :math:`H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}`.

I also made sure that the generated html doesn't break.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15984

Differential Revision: D13649939

Pulled By: soumith

fbshipit-source-id: 0abfa22a7bf1cbc6546ac4859652ce8741d41232
2019-01-14 04:12:45 -08:00
6ec753f2f9 Improve the docstring of nn.random.fork_rng (#15960)
Summary:
Improved the docstring of nn.random.fork_rng
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15960

Differential Revision: D13649929

Pulled By: soumith

fbshipit-source-id: d3843179a2f1f838792c2f07f34deda2c06af56e
2019-01-14 02:41:18 -08:00
492b7d410b doc fixes (#15990)
Summary: fixes  #15597 ,  #15283 and #10258

Differential Revision: D13649905

Pulled By: soumith

fbshipit-source-id: 753f46c2c96c61fba460019d9ed3e0d047d42ee7
2019-01-13 23:38:39 -08:00
ca18fb8567 simplify lambda function use in conv dnnlowp ops to fix #15911 (#15996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15996

As reported in issue #15911, gcc 4.9 was getting internal compiler error due to a complex use of lambda function in conv_dnnlowp_op.cc and conv_acc16_op.cc . This diff simplifies them.

Reviewed By: viswanathgs

Differential Revision: D13648264

fbshipit-source-id: 1551ae8a0a7653749185dca51ccceb2471b96b82
2019-01-13 23:32:48 -08:00
a7415787ac fix RandomSampler length (#15991)
Summary:
Hi!

This PR addresses #15537  issue.
Please review.

Thanks!

Differential Revision: D13649890

Pulled By: soumith

fbshipit-source-id: 166212ae383331345423236dfc4fa2ea907d265d
2019-01-13 23:09:51 -08:00
e18d6cd455 Fix static build on Windows (#15989)
Summary:
Tested locally. It can now be enabled by running `set EXTRA_CAFFE2_CMAKE_FLAGS= -DTORCH_STATIC=1` before the build. If we want to make sure it keeps working, then maybe we should add it to CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15989

Differential Revision: D13649935

Pulled By: soumith

fbshipit-source-id: 956945ed572819d8cf0bc9bd48df3ea9bc6f4a8a
2019-01-13 22:53:30 -08:00
a282378baf Caffe 2: Reshape Op upgrade (#15380)
Summary:
This is a follow-up to #13945, where we had to turn off some TRT tests because some ops were not ready to accept ONNX opset 9+ models. This PR fixes Reshape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15380

Differential Revision: D13649825

Pulled By: houseroad

fbshipit-source-id: b72e62803de5b63cc001c3fe4b3bf64dfa996e94
2019-01-13 22:49:40 -08:00
04b8a2f1ba fix compile error reported in issue #15911 (#15953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15953

Fix issue reported in https://github.com/pytorch/pytorch/issues/15911

Reviewed By: csummersea

Differential Revision: D13633256

fbshipit-source-id: 3808f100ff7dedfe5e20708e72e6081ff07eb32c
2019-01-12 21:03:12 -08:00
6371bc76a9 Back out "[pt1][tensor] Remove caffe2::ShareData" (#15983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15983

Original commit changeset: 6e4275d02f4c

Reviewed By: supertopher, Yangqing

Differential Revision: D13644123

fbshipit-source-id: 4b15a4c62995c0e68aad58465600409e302e6504
2019-01-12 07:07:22 -08:00
35480a7c44 Remove StopGradient op when it is inplace in inference (#12152)
Summary:
For inference, if the StopGradient op is in-place, we just remove it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12152

Differential Revision: D13633946

Pulled By: yinghai

fbshipit-source-id: 57762bcc37b38a1d39cb4af316ca50bfe961b105
2019-01-11 23:55:01 -08:00
586d030311 Add global pooling specialization and also update MaxPooling on GPU (#15824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15824

Add global pooling specialization and also update MaxPooling on GPU

Reviewed By: houseroad

Differential Revision: D13596340

fbshipit-source-id: c8a42aa69ee92c383c9f19d3ed57b77cb3e5bd28
2019-01-11 22:37:48 -08:00
83c054de48 AliasDB interface cleanup (#15656)
Summary:
This is the first of several PRs to simplify AliasDb usage.
- Hide the concept of wildcards from users. They are too hard to think about and too easy to forget about.
- Start moving "mutability-safe" graph mutation methods into AliasDb (right now, the various methods that deal with topological move).

Eventually I want to create a "mutability-aware" handle to the graph. If you only use that handle to transform the graph, you can be sure that all transformations are safe with respect to mutability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15656

Differential Revision: D13615492

Pulled By: suo

fbshipit-source-id: 5c39a157b4ea76f1f976315d06a314a89cc4f22f
2019-01-11 20:06:53 -08:00
00b2dff6b6 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 2671ea6bb594280a9d3352fbfa3628f28c6847aa
2019-01-11 19:57:11 -08:00
a4c1aa4bc5 Add the normalize transform to the core library (#15891)
Summary:
Adds the `Normalize` transform to the core C++ frontend library.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15891

Differential Revision: D13642167

Pulled By: goldsborough

fbshipit-source-id: 573428e626d6106cf2aadf3dc2e2aecb9a85efc3
2019-01-11 19:50:18 -08:00
e5266b4ba6 3x3x3 depthwise convolution with per channel quantization (#15775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15775

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/55

fbgemm didn't have per-channel quantization for 3x3x3 depth-wise convolution

Reviewed By: jianyuh

Differential Revision: D13587438

fbshipit-source-id: 91c36fae7a0e8386e3bc49808e18918b01681dd1
2019-01-11 19:42:29 -08:00
264e16bffd Make it consistent for OperatorBase usage (#15908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15908

"OperatorBase::" is changed to "this->template ".

For example,

  // This no longer works:
  OperatorBase::GetSingleArgument<>()
  // Should change to:
  this->template GetSingleArgument<>()

https://fb.workplace.com/groups/101100140348621/permalink/576804082778222/

Follow up of D13574832.

Sample Diff:
D9319742, D10045844.

Reviewed By: jspark1105

Differential Revision: D13613574

fbshipit-source-id: 2cb4094557b4af78d41e289816cad3e1194fb82c
2019-01-11 19:32:58 -08:00
55b0e2a1eb rocm build (#15981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15981

caffe2/operators/unique_ops.cu translated to caffe2/operators/hip/unique_ops.hip breaks rocm build

Reviewed By: BIT-silence

Differential Revision: D13646129

fbshipit-source-id: 900a14e14216686ec4560b30df2eabbd7ec2ff91
2019-01-11 18:39:52 -08:00
6f08e2a588 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 3bbf550cb0bfe71c05b73b8bc4ce97285b50608b
2019-01-11 18:00:01 -08:00
bff0f88cc8 Tensor construction codemod(ResizeLike) - 2/3 (#15940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15940

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13629047

fbshipit-source-id: 5f0641a9aaab9045fa63c32c6a07a4cab3340cc3
2019-01-11 17:41:48 -08:00
162ad94590 Fixed typo in batchnorm docstrings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15975

Differential Revision: D13642271

Pulled By: soumith

fbshipit-source-id: 60ffa392bf1f916f2b93c943bb44a642a9815c42
2019-01-11 17:28:37 -08:00
fd0ed2e298 Tensor reinitialization codemod - 4/5 (#15967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15967

Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13586735

fbshipit-source-id: eae2d79e1107a2e813ce3809e690af4706aaa9ca
2019-01-11 16:41:19 -08:00
94acddb57f Fix the lint (#15973)
Summary:
Fix the lint error introduced in https://github.com/pytorch/pytorch/pull/15965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15973

Differential Revision: D13640856

Pulled By: houseroad

fbshipit-source-id: 3f14d9898dcfb0fc469468f63fa1461c88b66b2e
2019-01-11 15:59:59 -08:00
eb15587c99 Tensor reinitialization codemod - 2/5 (#15947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15947

Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13586732

fbshipit-source-id: 5295ab27ca0155f96a4fccf9c0ba8a609101ba24
2019-01-11 15:05:01 -08:00
1235aa4fca Expose dim() on type and use it in ONNX symbolics (#15933)
Summary:
While integrating fork/join into production translation, we found that trying to export `transpose()` where the input is of `TensorType` (rather than `CompleteTensorType`) failed. This is not ideal, since `TensorType` still contains the number of dimensions of the tensor, and that's all the `transpose` symbolic needs.

This PR introduces a pybind binding for `dim()` on `TensorType` (and `CompleteTensorType` by inheritance). We now use this in places where it logically makes sense in the symbolics: those symbolics which only require knowledge of the number of dimensions rather than concrete sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15933

Differential Revision: D13639657

Pulled By: jamesr66a

fbshipit-source-id: 6e50e407e93060085fd00a686a928764d0ec888d
2019-01-11 14:54:19 -08:00
253b680928 Tensor construction codemod(ResizeLike) - 3/3 (#15943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15943

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13629082

fbshipit-source-id: d3863615fd612f73bb73ac67159fd0f0d237fe5c
2019-01-11 14:34:31 -08:00
d042914221 FC shape inference should use int64_t (#15961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15961

as title

Reviewed By: yinghai

Differential Revision: D13634427

fbshipit-source-id: ec7d168b6272f0dac8a693401cfd0bea368f929a
2019-01-11 14:28:39 -08:00
d33159a426 Undo norm optimizations and add more documentation for parallel.h (#15885)
Summary:
See https://github.com/pytorch/pytorch/issues/15602
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15885

Differential Revision: D13614841

Pulled By: cpuhrsch

fbshipit-source-id: 5d3e45f499d36ac287dbbc2e45798aa51eb5bfdf
2019-01-11 13:32:35 -08:00
926e718d5f Add/fallback some operators for mkl-dnn (#11696)
Summary:
Implement the LeakyRelu operator for mkl-dnn; the speed-up of a single operation is up to 10X on BDW.
Implement the reshape operator for mkl-dnn; it resolves an occasional crash when using the fallback reshape operator.
Implement the CreateBlobQueue and SafeEnqueueBlobs operators; this resolves a crash when using the fallback operators.
Fall back for the CreateBlobsQueueDBOp, TensorProtosDBInput, and CloseBlobsQueue operators.
Implement the adam operator for mkl-dnn; the speed-up of a single operator is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11696

Reviewed By: yinghai

Differential Revision: D10100438

Pulled By: wesolwsk

fbshipit-source-id: 0b6e06897cc11e0a8e349d80a870b1e72e47f10d
2019-01-11 12:53:06 -08:00
96ea2594d8 Don't call cudaStreamDestroy at destruction time (#15692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15692

It was leading to occasional crashes with dynamically linked CUDA because the runtime was already destroyed.

Also, unique_ptr<T[]> is more suitable than deque<T> for the purpose.

Reviewed By: Yangqing

Differential Revision: D13571988

fbshipit-source-id: 37eb26dfbe361c49160367b53f87bd037c6c0e46
2019-01-11 12:36:41 -08:00
726341fea7 Tensor construction codemod(ResizeLike) - 1/3 (#15944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15944

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13628999

fbshipit-source-id: e17c44cec6746674dfd5c2a89c28c4ac0a3da450
2019-01-11 12:28:12 -08:00
bcc88dfb4e Move nightly binary builds to 05:05 UTC (#15966)
Summary:
This corresponds to 00:05 EST
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15966

Differential Revision: D13639027

Pulled By: pjh5

fbshipit-source-id: 6685a7af74329b2730e519afd10e350ef2258f32
2019-01-11 11:46:21 -08:00
e07cca1312 Add backend checks for batch norm (#15955)
Summary:
Fixes #15826

Changelog:
- Add backend checks in `batch_norm_cpu` and `batch_norm_cuda`
- Modify check in `checkBackend` to pass on undefined tensors.

Differential Revision: D13636410

Pulled By: soumith

fbshipit-source-id: 3b1cfe5ca8b7c0346569077163503065e75c2659
2019-01-11 11:28:45 -08:00
c9d7ead0c4 Add scalar_type_to_pytorch_type dict in ONNX symbolic
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15965

Differential Revision: D13637521

Pulled By: zrphercule

fbshipit-source-id: 922cadc56f6380f67c14444cff4aa354a87150af
2019-01-11 10:55:43 -08:00
3f6b212e80 Register CPU/CUDA fuser dynamically (#15887)
Summary:
This avoids a bunch of conditional compilation logic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15887

Reviewed By: eellison

Differential Revision: D13613239

Pulled By: zdevito

fbshipit-source-id: a18fc69676b3ef19b4469ab58d8714d1f6efccbb
2019-01-11 10:50:35 -08:00
d580d3583b Simplify cat fusion (#15633)
Summary:
This makes the definition of a "fusable node" much simpler,
as we don't need to keep considering whether something has to be an
"exit node" at every step. The fuser now tries to maximize the
pointwise fusions first, and proceeds to prepending chunks and appending
concats only once a fixed point is reached.

This patch not only makes the fuser much simpler to reason about, it also
makes it significantly easier to implement features like SumToSize fusion,
which improves the performance of derivative graphs.

cc zou3519 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15633

Differential Revision: D13575306

Pulled By: zou3519

fbshipit-source-id: 0c55ea61d65d1f1ed3d75a8e1e83bc85a83f3aff
2019-01-11 10:33:42 -08:00
3d0d16d31c Add bindings for .cpu() & .cuda() to script (#15904)
Summary:
Adding bindings for .cpu() and .cuda() to script.

It's worth noting that if the device remains unchanged, then the returned tensor aliases the input, but if it does change, then they do not alias each other.
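
A small sketch of the new bindings:
```
import torch

@torch.jit.script
def ensure_cpu(x):
    return x.cpu()  # aliases x if it is already on the CPU

ensure_cpu(torch.randn(2))
```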
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15904

Differential Revision: D13632879

Pulled By: eellison

fbshipit-source-id: 024a04f267909674aa1e510562efd9cb081f407c
2019-01-11 10:04:08 -08:00
03a570cad9 comment out large test cases for tril(u)_indices (#15959)
Summary:
4GB is still too large and leads to CUDA OOM failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15959

Differential Revision: D13635146

Pulled By: mrshenli

fbshipit-source-id: 3dc34a03d6ed65c458839d8fa37cd05bf3bc8106
2019-01-11 09:25:03 -08:00
7841fe4f27 Automatic update of fbcode/onnx to 7abd834091f1024c11749dcfd25126802db9fdd5 (#15942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15942

Previous import was 8384c788939bc65463f9754b6a7a00b212b18ba1

Included changes:
- **[7abd834](https://github.com/onnx/onnx/commit/7abd834)**: Clarify some aspects of the Loop spec. (#1587) <Scott McKay>
- **[5a5b15f](https://github.com/onnx/onnx/commit/5a5b15f)**: Support rtol and atol at the model granularity (#1723) <Lu Fang>
- **[ba76e45](https://github.com/onnx/onnx/commit/ba76e45)**: print some information (#1724) <Lu Fang>
- **[797390d](https://github.com/onnx/onnx/commit/797390d)**: Update README.md (#1722) <Prasanth Pulavarthi>
- **[40cdb5f](https://github.com/onnx/onnx/commit/40cdb5f)**: repaire convtranspose shape inference (#1660) <peter yang>
- **[68fdb3f](https://github.com/onnx/onnx/commit/68fdb3f)**: [Minor] Fix Windows line ending in test coverage generating script (#1717) <Raymond Yang>
- **[00101bf](https://github.com/onnx/onnx/commit/00101bf)**: Remove ConstantLike op. Updates to ConstantOfShape op. (#1716) <Spandan Tiwari>
- **[c59e90a](https://github.com/onnx/onnx/commit/c59e90a)**: add a shape inference test for group conv (#1719) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D13629499

fbshipit-source-id: 4b3e4cb29bdb84c3777a8fb26263548efb20f317
2019-01-11 08:28:01 -08:00
70dd44f6a8 Match NumPy by considering NaNs to be larger than any number when sorting (#15886)
Summary:
Fixes #15764
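
A sketch of the resulting ordering:
```
import torch

x = torch.tensor([3.0, float('nan'), 1.0])
values, _ = x.sort()
# NaN now compares as larger than any number: [1., 3., nan]
assert torch.isnan(values[-1])
```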
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15886

Differential Revision: D13612971

Pulled By: umanwizard

fbshipit-source-id: 91f552a25d1fd108f2f0b10e09a0ce0364f8c21e
2019-01-11 08:14:11 -08:00
b7cdeb3fc3 Port empty_strided to ATen. (#15948)
Summary:
Turns out this has basically been implemented already in Resize.h / Resize.cuh.
Also added some testing, basically just to check that empty_strided behaves equivalently to as_strided.
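
A quick sketch of the equivalence the new tests check:
```
import torch

t = torch.empty_strided((2, 3), (1, 2))
s = torch.empty((2, 3)).as_strided((2, 3), (1, 2))
assert t.size() == s.size() and t.stride() == s.stride()
```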
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15948

Differential Revision: D13631098

Pulled By: gchanan

fbshipit-source-id: eb0e04eead45e4cff393ebde340f9d265779e185
2019-01-11 07:58:05 -08:00
14dcdc4c35 Move cudaDeviceProp to ATen (#14834)
Summary:
This PR moves `deviceProperties` from `THCState` struct to `CUDAContext` in ATen and hence, takes one more step towards removing `THCState`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14834

Differential Revision: D13633956

Pulled By: soumith

fbshipit-source-id: 51820ac224fc566f17aa92570fd378cff4248596
2019-01-11 07:09:32 -08:00
da753b7ccf Trivial typo fixings in nn.functional dropout* docstrings (#15951)
Summary:
Defualt -> Default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15951

Differential Revision: D13633875

Pulled By: soumith

fbshipit-source-id: 0da823ef235418396e9322089f6610b592e6990f
2019-01-10 22:42:52 -08:00
86af14b0c7 Resolves ptxas warnings when compiling for CUDA_ARCH 750 and a memoryType deprecation warning (#15461)
Summary:
When compiling for `TORCH_CUDA_ARCH_LIST=7.5` we were getting ptxas warnings (https://github.com/pytorch/pytorch/issues/14310). This was because we had some hardcoded values when using launch_bounds in kernels. The maximum number of threads per multiprocessor is 1024 for the Turing architecture (7.5) but 2048 for previous architectures. The hardcoded launch_bounds in the kernel were requesting 2048 threads when compiling for Turing and hence were generating the warning.

This PR adds a macro that checks the bounds on the launch_bounds value supplied. The maximum number of threads per block across all architectures is 1024. If a user supplies more than 1024, I just clamp it down to 512. Depending on this value, I set the minimum number of blocks per SM. This PR should resolve https://github.com/pytorch/pytorch/issues/14310. The incorrect gradient computation reported in that issue is probably due to the faulty card.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15461

Differential Revision: D13633952

Pulled By: soumith

fbshipit-source-id: 795aa151109f343ab5433bf3cb070cb6ec896fff
2019-01-10 21:44:39 -08:00
07ea3e035e Fix fallback issues to handle inplace case (#15726)
Summary:
Fix fallback issues to handle inplace case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15726

Differential Revision: D13591243

Pulled By: yinghai

fbshipit-source-id: 6897f1daacb36beabcdfc22c39242bbdfdd0e534
2019-01-10 19:47:09 -08:00
0934e8de58 Optimize CPU version performance of the nonzero function. (#15925)
Summary:
Same as #15190 but compatible with MSVS compiler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15925

Differential Revision: D13623473

Pulled By: VitalyFedyunin

fbshipit-source-id: d0db9dbc1a0d8fc9bda08348cb1d3763ae9f8679
2019-01-10 17:50:38 -08:00
890568a018 Tensor reinitialization codemod - 5/5 (#15884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15884

Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: hyuen

Differential Revision: D13586737

fbshipit-source-id: dc8e49e9f29505b8898bb19f84c1a983f2d811ab
2019-01-10 16:32:26 -08:00
e46e572b30 Add backward pass notes for eig() and symeig()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15929

Differential Revision: D13626158

Pulled By: soumith

fbshipit-source-id: ab869560926036053c39d20b217ccef8767e7d3f
2019-01-10 16:27:48 -08:00
da7468853a caffe2::Tensor::is_same() (#15407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15407

Don't ask the tensor for its intrusive pointer if we just want to check if two tensors are the same.
This mirrors ATen APIs.

Reviewed By: dzhulgakov

Differential Revision: D13520389

fbshipit-source-id: 681317f36f480ab60e532bb08a073f98f39770fd
2019-01-10 16:22:25 -08:00
b9e1028cff Clean up Half (#15317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15317

- Merge bitcasts.h and Half.h
- Remove 'static' keyword

Reviewed By: dzhulgakov

Differential Revision: D13498492

fbshipit-source-id: 46d47143e7d3a9d3f4aa7d92379dbba015c97435
2019-01-10 16:22:23 -08:00
d408324350 Move files to/from c10/core and c10/util (#15316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316

This starts cleaning up the files in c10 according to the module structure we decided on.

Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h

Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov

Differential Revision: D13498493

fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63
2019-01-10 16:22:22 -08:00
6b64052e20 Remove Context from c10 operator schemas (#15312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15312

Context will soon be entirely obsolete. Remove it from the operator schema interface.

Reviewed By: dzhulgakov

Differential Revision: D13495323

fbshipit-source-id: caa0f8f092cd6284e510c3e1e3374fe2f8338364
2019-01-10 16:22:20 -08:00
8136c39b5e Enable calling caffe2 LayerNorm from PyTorch and JIT (#15243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15243

Register it as a custom JIT op.

Reviewed By: dzhulgakov

Differential Revision: D13473791

fbshipit-source-id: 0f7e72e3efc85a75060a7597fadaf0a8bd289651
2019-01-10 16:22:18 -08:00
913785445e fix rocm build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15945

Differential Revision: D13630505

Pulled By: zdevito

fbshipit-source-id: a4d2ae1370ab475fc1711027c0c9d2a9192be195
2019-01-10 16:16:15 -08:00
27f6a29fd0 Remove USE_CUDA and USE_ROCM in engine.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15893

Differential Revision: D13627319

Pulled By: zdevito

fbshipit-source-id: 7c72c1c6cc242143fb66383423c668c9b9810884
2019-01-10 14:45:11 -08:00
c5012d8641 Extend note about contributing to the C++ frontend (#15902)
Summary:
soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15902

Differential Revision: D13628525

Pulled By: goldsborough

fbshipit-source-id: 70cf36d1bacd9d689d4fa4f2290886fd3765e89b
2019-01-10 14:22:00 -08:00
3ec3351306 Fix different env variables in schedules runs pt 2 (#15934)
Summary:
Unfortunately I do not know how to test this without merging it first
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15934

Reviewed By: orionr

Differential Revision: D13627472

Pulled By: pjh5

fbshipit-source-id: 35eced1483bbf3c0c3f6f62fb7bbbf2f200e50e6
2019-01-10 14:09:12 -08:00
4b427780aa Change PoolOp Functors design to support CuDNN CUDA fallback (#15903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15903

Change PoolOp Functors design to support CuDNN CUDA fallback

Reviewed By: houseroad

Differential Revision: D13617085

fbshipit-source-id: 8a539d77f35bc47afe5dc8e32aaad52e45cb691c
2019-01-10 14:00:22 -08:00
b1fa19961e Fix bug in torch::load and unpack torch::optim::detail namespace (#15926)
Summary:
Optimizer buffers weren't being cleared before new entries were added during deserialization, so successive calls to `torch::load` with the same optimizer would just append to the buffer container. Also moved the `serialize()` function from `torch::optim::detail` into `torch::optim` so users can use it for custom optimizers.

Fixes #15792

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15926

Differential Revision: D13623615

Pulled By: goldsborough

fbshipit-source-id: e193091f25f56a95f2a9648af312cb7caa45f300
2019-01-10 13:55:50 -08:00
9173cd5a4d fix aliasing on unwrap optional (#15748)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/15604
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15748

Differential Revision: D13583632

Pulled By: eellison

fbshipit-source-id: 9655ee010494179e17e34f3047363477dad15fb1
2019-01-10 12:52:53 -08:00
d35295c603 JIT Batch Norm fusion (#15897)
Summary:
Resubmit of #15146, which has been accidentally reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15897

Differential Revision: D13616093

Pulled By: zou3519

fbshipit-source-id: 0c3a3bec8f9fed57274da9f6c7cf40cbc05cf91a
2019-01-10 12:38:47 -08:00
7f268c6262 Fix different env variables in schedules runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15927

Reviewed By: orionr

Differential Revision: D13624127

Pulled By: pjh5

fbshipit-source-id: e8b14f0401b0c278a5d17af6d7979800917e3ae6
2019-01-10 12:33:24 -08:00
4edc8273eb Allow for registration after GlobalInit (#15876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15876

Build changes made it so some .so libraries are now registered after GlobalInit is called. Although this shouldn't be common, it also shouldn't be explicitly excluded. These changes allow for late Caffe2 registration, but also warn in that case.

Reviewed By: kuttas

Differential Revision: D13608186

fbshipit-source-id: 0ca7bcd32516d374077db0c2548cf8c28ccdd5f6
2019-01-10 09:35:33 -08:00
9b5ec2a076 Fix TestDataLoader.test_proper_exit (#15665)
Summary:
Currently, in `test_proper_exit`,
1. we do not kill the correct input `pid` in the `kill_pid` function
fe15d6a2c2/test/test_dataloader.py (L325-L329)
2. the Windows command that detects process status doesn't actually work
fe15d6a2c2/test/test_dataloader.py (L641-L646)
3. `worker_error` and `worker_kill` cases (sometimes?) are not tested because the workers may exit naturally due to the pre-fetching mechanism and too small a `dataset size / batch size` ratio.

In this PR, I, in separate commits:
1. Install `psutil` (a python package specifically built for process monitoring) on some CI builds. (Linux builds installation are done in https://github.com/pietern/pytorch-dockerfiles/pull/29 https://github.com/pietern/pytorch-dockerfiles/pull/30  https://github.com/pytorch/ossci-job-dsl/pull/36 and https://github.com/pytorch/pytorch/pull/15795).
2. Rewrite `test_proper_exit` with `psutil` so we

    1. do not rely on the hacky `is_process_alive` fe15d6a2c2/test/test_dataloader.py (L640-L653) (a minimal sketch of the replacement follows below)
   2. increase the #tasks per worker so `worker_error` and `worker_kill` properly trigger
   3. test error message content to ensure that the loader exits with correct message corresponding to each exiting scenario.

3. Fix Windows data loader not having any mechanism to detect worker failures.
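
A minimal sketch of the psutil-based liveness check referenced in item 2.1 (the helper name here is illustrative, not the exact test code):
```
import psutil

def is_alive(pid):
    try:
        return psutil.Process(pid).status() != psutil.STATUS_ZOMBIE
    except psutil.NoSuchProcess:
        return False
```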
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15665

Differential Revision: D13615527

Pulled By: soumith

fbshipit-source-id: cfb2f67837d2d87928a53f00b4d20f09754b7949
2019-01-10 08:47:27 -08:00
0ed3f766e9 Unify flags and environmental variable when building LibTorch/PyTorch (#15868)
Summary:
Fixes #15858.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15868

Differential Revision: D13622354

Pulled By: soumith

fbshipit-source-id: bb8c49520ebf926c6194d42db75accba867018c7
2019-01-10 06:47:14 -08:00
3d68f35639 Adding binary builds to circleci
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15577

Reviewed By: orionr

Differential Revision: D13617359

Pulled By: pjh5

fbshipit-source-id: 2b2a1b8735f2af6973a2352bee78912794402ae1
2019-01-10 00:06:09 -08:00
2fa9264ba1 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15910

Differential Revision: D13620684

Pulled By: houseroad

fbshipit-source-id: af3b1e2fed55ecd3417f66e549fa921bf4fd758e
2019-01-09 23:20:32 -08:00
cdaeb0db54 Make SGD match python (#15840)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15530
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15840

Differential Revision: D13608503

Pulled By: goldsborough

fbshipit-source-id: aad17c110d64cbe2c126bccd36d228e4108ffa9a
2019-01-09 22:21:14 -08:00
628bf5e3c9 test_jit.py: Speedup EndToEnd tests by reducing workload size. (#15906)
Summary:
Currently these tests take most of the time in a test_jit.py run; with the
proposed changes the testing time is reduced by ~75%:

```
TestEndToEndHybridFrontendModels.test_neural_style: 203.360s -> 10.650s
TestEndToEndHybridFrontendModels.test_snli: 422.315s -> 9.152s
TestEndToEndHybridFrontendModels.test_super_resolution: 73.362s -> 19.185s

time python test/test_jit.py (real): 13m50.828s -> 3m11.768s
time python test/test_jit.py (user): 85m59.745s -> 13m18.135s
time python test/test_jit.py (sys): 144m9.028s -> 25m58.019s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15906

Differential Revision: D13619659

Pulled By: ZolotukhinM

fbshipit-source-id: 6c22d8740f8ddb865c3a0667af32653723383816
2019-01-09 21:14:35 -08:00
23e28efed4 Porting legacy reflection_pad2d to ATen
Summary:
Other changes:
1. Avoided using `THCDeviceTensor` by re-calculating the mapping from CUDA (blockIdx, threadIdx) to the input/output tensor index.
2. Changed CamelCase naming to underscore naming.

Differential Revision: D13546803

fbshipit-source-id: 1df54f13e64934da3d803d9b6586bd5208d42d6d
2019-01-09 20:55:27 -08:00
5f1dd9e743 Fix log_prob for Gumbel distribution (#15878)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15681

Changelog:
- Add hard-coded implementation of log_prob
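
For reference, the closed form being hard-coded is the standard Gumbel log-density (a sketch):
```
import torch
from torch.distributions import Gumbel

d = Gumbel(loc=torch.tensor(0.0), scale=torch.tensor(1.0))
x = torch.tensor(0.5)
z = (x - d.loc) / d.scale
# log p(x) = -(z + exp(-z)) - log(scale)
expected = -(z + torch.exp(-z)) - torch.log(d.scale)
assert torch.allclose(d.log_prob(x), expected)
```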
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15878

Differential Revision: D13613716

Pulled By: soumith

fbshipit-source-id: 2ba74e52748b6213098b167940dcc068f0c056f4
2019-01-09 20:09:34 -08:00
4caca2f062 Tensor method rename sizes().size() -> dim()
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: smessmer

Differential Revision: D13568637

fbshipit-source-id: 4e1b6658355d4073097eb666ba73596e0261bef1
2019-01-09 19:53:56 -08:00
b4c3268b23 Batched upper triangular, lower triangular (#15257)
Summary:
Changelog:

- Implements `triu` and `tril` for batches of 2D tensors (see the sketch below).
- Remove TH/THC binding for `tril`
- Fix CUDA implementation
- Update docstrings for tril and triu.
- Remove mask-based `triu` and `tril` in cholesky forward and backward.
- Remove batched tril in torch.distributions.utils
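
A quick sketch of the batched behavior:
```
import torch

x = torch.randn(4, 3, 3)
u = x.triu()                    # applied independently to each 3x3 matrix
assert torch.equal(u[0], x[0].triu())
x.tril(diagonal=-1)             # diagonal offsets work batched as well
```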
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15257

Differential Revision: D13613888

Pulled By: mrshenli

fbshipit-source-id: 0949a05b9b8e974c1acfaf02a6284848ec5cc1c4
2019-01-09 19:46:39 -08:00
5af9aaa5bb Minor bug fix in dnnlowp (#15841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15841

Fix the bugs in dnnlowp to support int8/int16 quantization for sparsenn.

Reviewed By: jspark1105

Differential Revision: D13600878

fbshipit-source-id: 27f06d7c54a663208320c8f211714220a9b49540
2019-01-09 17:18:30 -08:00
159c2f3918 test_jit.py: Replace direct exec invocation with a wrapper. (#15882)
Summary:
Python 2 doesn't allow invoking `exec` from a nested function:

  File "test/test_jit.py", line 4653
     exec(code, globals(), scope)
  SyntaxError: unqualified exec is not allowed in function 'test' it is a nested function

This patch wraps exec with a separate function, making it work for both
Python 2 and Python 3.
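
The wrapper is essentially a one-liner defined at module scope (a sketch; the actual helper in test_jit.py may differ in name):
```
def exec_wrapper(code, glob, loc):
    # exec inside a plain top-level function is legal in both Python 2 and 3;
    # the restriction only applies to nested functions, which can now
    # delegate to this wrapper instead of calling exec directly.
    exec(code, glob, loc)
```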
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15882

Differential Revision: D13614235

Pulled By: ZolotukhinM

fbshipit-source-id: 9a074308c2379f089402e0bf5a996cc649d6dbca
2019-01-09 17:01:20 -08:00
b28738ccb5 Revert D13468570: [pytorch][PR] Optimize CPU version performance of the nonzero function.
Differential Revision:
D13468570

Original commit changeset: e55ce54d6062

fbshipit-source-id: 4c043564b0a69b5af11559e5dc94790e7064841f
2019-01-09 15:41:36 -08:00
71c6e24373 Fix several ResourceWarning: unclosed file (#15746)
Summary:
Hello,

This is a patch to fix `ResourceWarning: unclosed file`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15746

Differential Revision: D13587286

Pulled By: soumith

fbshipit-source-id: 08ac34c5b51d9334867f65a2927bff11511553f3
2019-01-09 15:36:53 -08:00
a1180d8e86 Fix BuildIndexOp (#15580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15580

adding the UNDEFINED datatype.

Reviewed By: itomatik

Differential Revision: D13556099

fbshipit-source-id: b730f7fca8faefb8a013c265296eee26bcedaff0
2019-01-09 15:12:50 -08:00
7b9f794580 Wrap C10 CUDAStream instead of cudaStream_t in THCPStream
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15833

Differential Revision: D13608337

Pulled By: mrshenli

fbshipit-source-id: 4c66ef89fad0dc14a11ddb69da92907797cd2828
2019-01-09 15:12:48 -08:00
0c32e1b43e use C10_MOBILE/ANDROID/IOS (#15363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15363

Didn't define C10_MOBILE in the numa file move diff: D13380559
move CAFFE2_MOBILE/ANDROID/IOS to c10

```
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_MOBILE" "C10_MOBILE"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_ANDROID" "C10_ANDROID"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_IOS" "C10_IOS"

```

i-am-not-moving-c2-to-c10

Reviewed By: marcinkwiatkowski

Differential Revision: D13490020

fbshipit-source-id: c4f01cacbefc0f16d5de94155c26c92fd5d780e4
2019-01-09 15:08:20 -08:00
5838b59c5d Optimize CPU version performance of the nonzero function. (#15190)
Summary:
Optimized the CPU version of nonzero. Now 2x faster on average than numpy.

Can be further optimized for 1D tensors and boolean tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15190

Differential Revision: D13468570

Pulled By: VitalyFedyunin

fbshipit-source-id: e55ce54d60626a42d9a10a02e407856458b8055e
2019-01-09 13:37:38 -08:00
0571eaebab Remove TH binding of newWithStorage as it is not used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15838

Differential Revision: D13601517

Pulled By: gchanan

fbshipit-source-id: 71ec107de2c880e7e0fd2ad6b4ea3d112dbb9d86
2019-01-09 13:10:33 -08:00
692caa7211 Revert D13598894: [pytorch][PR] [Caffe2] [ROCm] Use correct workspace alloc call in MIOpen conv operator
Differential Revision:
D13598894

Original commit changeset: 44886161abdf

fbshipit-source-id: 6c6057136f1ea741fcd1734695356709aeb4bf12
2019-01-09 10:03:50 -08:00
14b40c0633 Revert D13548303: [pytorch][PR] Add support for batch_norm fusion to the JIT
Differential Revision:
D13548303

Original commit changeset: a2e2e5abc383

fbshipit-source-id: 5b70cdbcbd1cac06eeefb2a939773358c061183c
2019-01-09 08:53:57 -08:00
fe15d6a2c2 Fix macos build (#15873)
Summary:
macOS builds are currently broken with the following error:
```
/usr/local/Homebrew/Library/Homebrew/config.rb:39:in `initialize': no implicit conversion of nil into String (TypeError)
	from /usr/local/Homebrew/Library/Homebrew/config.rb:39:in `new'
	from /usr/local/Homebrew/Library/Homebrew/config.rb:39:in `<top (required)>'
	from /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.7/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
	from /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.7/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
	from /usr/local/Homebrew/Library/Homebrew/global.rb:25:in `<top (required)>'
	from /usr/local/Homebrew/Library/Homebrew/brew.rb:13:in `require_relative'
	from /usr/local/Homebrew/Library/Homebrew/brew.rb:13:in `<main>'
Exited with code 1
```

No recent commits look suspicious, and I can even reproduce locally on my macbook, so it might be related to some new `brew` updates. Empirically, calling `brew update` first seems to fix this.

Example error build: https://circleci.com/gh/pytorch/pytorch/534392?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15873

Differential Revision: D13608019

Pulled By: soumith

fbshipit-source-id: 1499cb5246929e275a11ca6fccef6ef32918e45e
2019-01-09 07:50:36 -08:00
f0c2a9a7b6 Add torch.bincount() test case on sliced tensor (#15835)
Summary:
This was causing a problem in #15735 but appears to have been fixed.
Adding this test to prevent regressions.
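
The regression case, roughly (a sketch):
```
import torch

x = torch.randint(0, 5, (10,))
sliced = x[::2]  # non-contiguous view
assert torch.equal(torch.bincount(sliced),
                   torch.bincount(sliced.contiguous()))
```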
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15835

Differential Revision: D13600282

Pulled By: zou3519

fbshipit-source-id: d9939e74d372be71c50122a5f6a615fbd7fa4df6
2019-01-09 07:31:19 -08:00
961f829067 deduplicated code in elementwise_op_broadcast_test.py (#15865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15865

Factored out code used in the tests for the Add, Mul, and Sub operators
into two new methods: the first generates the test vectors, the second
runs the actual tests given a Caffe2 and a Python operator.

Reviewed By: houseroad

Differential Revision: D13526955

fbshipit-source-id: 8970ba5a1305ca19a54a14b51816d4a19f19d678
2019-01-09 03:07:22 -08:00
c7ec7cdd46 Fixed syntax error in doctest (#15646)
Summary:
I removed a very small extraneous parenthesis in a doctest.

I'm also going to use this issue as a place to propose the eventual inclusion of xdoctest (a pip-installable library I wrote) in pytorch's test suite. I think there are a lot of problems with Python's built-in doctest module, and I've built xdoctest to fix them. I would love for my project to get some exposure, and its addition to PyTorch may benefit both projects. Please see the readme for more details on what xdoctest brings to the table over the built-in doctest module: https://github.com/Erotemic/xdoctest

I came across this small syntax error when working on ensuring xdoctest was compatible with pytorch. It isn't 100% there yet, but I'm working on it. My goal is to ensure that xdoctest is 100% compatible with all of torch's doctests out of the box before writing up the PR. I'm also airing the idea out loud before I commit too much time into this (or get my hopes up), so I'm attaching this little blurb to a no-brainer-merge PR to (1) demonstrate a little bit of value (because xdoctest flagged this syntax error) and (2) see how it's received.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15646

Differential Revision: D13606111

Pulled By: soumith

fbshipit-source-id: d4492801a38ee0ae64ea0326a83239cee4d811a4
2019-01-09 01:29:11 -08:00
ac206a95f5 crelu mentioned (#15825)
Summary:
Mentioning crelu near relu in the docs.
Fixes #15730.
cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15825

Differential Revision: D13605782

Pulled By: soumith

fbshipit-source-id: d34932cf82e5407c48548dbdfc1c61b596669a0b
2019-01-08 22:55:49 -08:00
5fe2697655 Initialize tensor with fp32 in Caffe2Backend.prepare() (#15832)
Summary:
Fix https://github.com/pytorch/pytorch/issues/14104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15832

Reviewed By: bddppq

Differential Revision: D13598332

Pulled By: yinghai

fbshipit-source-id: 3302ac47928974f49353c5da8af440e5c1716c22
2019-01-08 22:33:52 -08:00
c93cf89de2 Fix cuda native loss_ctc for varying input length (#15798)
Summary:
Thank you, freesouls, for the reproducing example!

This strictly fixes the bug in gradients for varying-length inputs discussed in the middle-to-bottom of the bug report. I'll submit a separate feature patch regarding inf losses -> NaN grads.

Fixes: #14401
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15798

Differential Revision: D13605739

Pulled By: soumith

fbshipit-source-id: 167ff42399c7e4cdfbd88d59bac5d25b57c0363f
2019-01-08 22:28:39 -08:00
cb32418669 Add element-wise multiplication in formulas (#15834)
Summary:
The absence of element-wise multiplication can confuse some beginners
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15834

Differential Revision: D13603369

Pulled By: soumith

fbshipit-source-id: 1d5c17c57778ddbb4b201122d826d1d6437204d1
2019-01-08 21:17:25 -08:00
3f6e58b43b Typos fixed in CWrapPlugin.get_type_check (#15859)
Summary:
Typos fixed in CWrapPlugin.get_type_check
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15859

Differential Revision: D13605908

Pulled By: soumith

fbshipit-source-id: a8c970f0ac6d54dfd69b9775fc1a2b4f198b4ed6
2019-01-08 20:55:35 -08:00
1d6e818f2c Move LayerNorm op schema to c10 (#15199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15199

In order to call it from PyTorch, this op schema can't live in caffe2 but must be included from PyTorch.
Moving it to c10. This is not where it should be in the end (that's why there is a large TODO here),
but an intermediate hack to enable this use case and proof-of-concept.

Reviewed By: ezyang

Differential Revision: D13462124

fbshipit-source-id: 1e187b9def8ef049c91e6de947ea4a85758d711b
2019-01-08 20:31:48 -08:00
11708cbd7b Update flat_hash_map (#15367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15367

This updates flat_hash_map and fixes an issue with singletons across library boundaries
(see the PRs linked at the top of the file)

Reviewed By: ezyang

Differential Revision: D13510912

fbshipit-source-id: e90a297a7a2d69ae3fe48e4fcd8a44ad4b81292a
2019-01-08 20:31:46 -08:00
905df3943a Fix C10_API/C10_EXPORT for op schema registration (#15324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15324

This was missing but needs to be here, otherwise we can't register schemas without linker errors.

Reviewed By: ezyang

Differential Revision: D13500679

fbshipit-source-id: ba06351cb8ae09ec456cb93e527d388ace578fbb
2019-01-08 20:31:45 -08:00
d562840910 Use C10Tensor in the dispatcher (#15195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15195

This removes the use of caffe2::Tensor or at::Tensor in the c10 dispatcher and only uses C10::Tensor.
It also changes output tensors to be passed as `const Tensor&` instead of `Tensor*` because we otherwise can't forward them in operator_c10wrapper.h.

Reviewed By: ezyang

Differential Revision: D13461640

fbshipit-source-id: 7f79925a7d60f01660a24bbfda47391af0c70ed3
2019-01-08 20:31:43 -08:00
8ac55a6812 Convert caffe2/aten Tensors to/from c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14820

Reviewed By: dzhulgakov

Differential Revision: D13348044

fbshipit-source-id: 95008e6ead3cfc478696b1c203769241d4cf6ca8
2019-01-08 20:31:42 -08:00
31d7c933af Implement c10::Tensor (#14819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14819

This is a minimal wrapper for a c10::TensorImpl,
maybe destined for greatness later when we move caffe2::Tensor or at::Tensor into c10.

Reviewed By: dzhulgakov

Differential Revision: D13348039

fbshipit-source-id: 874f515358e94f35dc7a4c3e55b35fde59c51ff1
2019-01-08 20:31:40 -08:00
828cb18fa3 Allow ReadyQueue to handle empty tasks (#15791)
Summary:
Allow the comparison function used in ReadyQueue to handle the empty FunctionTasks created by the reentrant autograd.
Fix #11732
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15791

Differential Revision: D13598006

Pulled By: soumith

fbshipit-source-id: 0bfdf28a735fbfe44f0fdbaf8b74a6198e6a1984
2019-01-08 20:06:04 -08:00
8a07cbe5e1 In loop_wrapper, do not copy the passed-in functor (capture it by reference instead). (#15845)
Summary:
The overhead of the copy actually makes an appreciable difference when doing a lot of small reductions (i.e., when the reduced dimension is significantly smaller than the non-reduced dimensions).

```
x=torch.randn((1024,10,1024),dtype=torch.float64)
torch.set_num_threads(1)
%timeit x.std(1)
```

Before: 813.0 ms

After: 708.25 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15845

Differential Revision: D13603246

Pulled By: umanwizard

fbshipit-source-id: 020d224d76fcb8a0b55b75b0f2937e9508891beb
2019-01-08 19:59:39 -08:00
2b22612289 Add NHWC support to Resize Operator (#15553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15553

Add unit test and implementation of NHWC layout for Resize operator.

Also, add a pragma parallel loop to the old NCHW layout.

Reviewed By: jspark1105

Differential Revision: D13540762

fbshipit-source-id: eebf252bf0d1efdff180a171d804181045f100a5
2019-01-08 16:44:17 -08:00
8a5ba577c1 Revert "remove use of tmp_install" (#15847)
Summary:
This reverts commit 04bf5285896e52ac118d2f9e9b7f582f695f13e2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15847

Differential Revision: D13603174

Pulled By: anderspapitto

fbshipit-source-id: ae321434d3345ad94fad67bf71fd027cddeb4588
2019-01-08 16:30:19 -08:00
4f51ca490e Correcting source pybind11 library to install into Python
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15836

Reviewed By: anderspapitto

Differential Revision: D13601331

Pulled By: pjh5

fbshipit-source-id: 36785c501774c01f47acb49cdac265b2c95a5040
2019-01-08 15:06:55 -08:00
acc83ad54e implement floordiv with correct integer and division by 0 semantics (#15813)
Summary:
fixes #15768
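
For reference, the Python semantics being matched (a plain illustration, not the JIT implementation itself):

```python
# Floor division rounds toward negative infinity, unlike C-style truncation.
print(7 // 2)    # 3
print(-7 // 2)   # -4 (not -3)
print(7 // -2)   # -4
try:
    1 // 0
except ZeroDivisionError as e:
    print(e)     # integer division or modulo by zero
```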
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15813

Differential Revision: D13594872

Pulled By: zdevito

fbshipit-source-id: c6c78c9e17fb16ec2bdc42402d203592cf35b7db
2019-01-08 13:44:18 -08:00
92a2bfe52d A trivial error message updates on at::Tensor _convolution (#15830)
Summary:
I fixed a grammatical error in this function previously, but I realized that its content was also wrong. A weight tensor of a convolutional layer should be at least 3-dimensional, not 2-dimensional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15830

Differential Revision: D13597968

Pulled By: soumith

fbshipit-source-id: 72a75106e88945c68d6462828b149441cfb5acde
2019-01-08 13:20:00 -08:00
24314e9ceb Enable torch static build on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15769

Reviewed By: yf225, pjh5

Differential Revision: D13597845

Pulled By: orionr

fbshipit-source-id: 99640e22974990ae570a4795ce07274c4447cb01
2019-01-08 13:19:57 -08:00
196eee6ccd Fix sum_to behavior with zero dimensions (#15796)
Summary:
Fixes #15223.

This fixes an autograd bug where backprop either fails or produces
gradients of incorrect sizes when tensors with zero-sized dimensions are
involved.

Previously, we were reducing along dimensions that had size greater than 1
when summing to a size in autograd. This is incorrect because we should also reduce
along dimensions with size 0 to produce a tensor of size 1 in that
dimension that then gets viewed to the correct shape.
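
A minimal illustration of the rule described above, with illustrative shapes:

```python
import torch

grad = torch.zeros(0, 3)            # a tensor with a zero-sized dimension
summed = grad.sum(0, keepdim=True)  # reduce the size-0 dim as well
print(summed.shape)                 # torch.Size([1, 3]) -- viewable to the target shape
```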
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15796

Differential Revision: D13593199

Pulled By: zou3519

fbshipit-source-id: 2e2acac34943a9b7fabadc10c9efd4f66db298fd
2019-01-08 13:19:54 -08:00
734eb31035 Cache workspace size in the BenchmarkCache. (#15742)
Summary:
Cache the workspace size information for MIOpen for a given configuration, as opposed to inquiring it every time. This reduces overhead significantly: inquiring the workspace size forces a full read of the MIOpen performance database, which has grown significantly in recent releases. This caching gets us back to ideal performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15742

Differential Revision: D13598932

Pulled By: bddppq

fbshipit-source-id: 4e65d247b71dec828293cf0562aac3fbd4fad83a
2019-01-08 13:10:15 -08:00
1bc47c0d86 Refactors shape logic out of code generation, fixes possible segfault (#15750)
Summary:
This PR:

- Removes shape logic from the code generator, which was previously relied on to return chunk and concat information
- Copies the logic to detect if a kernel has a rand_like node to the executor, making its pass independent of the code generator
- Fixes a possible segfault where references to a vector still being modified were relied upon

The actual shape logic is unchanged.

The possible segfault is in the handling of the former "flat_inputs" in codegen.cpp. This vector holds pairs, and the second element of these pairs is a reference. In some cases these would be references to items in the vector chunk_desc, which could be added to later, possibly invalidating any references to items in it. I hit a similar segfault in testing when naively making parallel code for "flat_outputs."

I'm submitting this small PR because it's separable, self-contained, has a fix, and I am trying to actively get away from large PRs to encourage more stability and incremental change in the fuser.

ngimel zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15750

Differential Revision: D13597451

Pulled By: zou3519

fbshipit-source-id: 0d48b365779b42849b044ba0286258aacc7b0332
2019-01-08 12:36:59 -08:00
c42def29c8 Use parallel thrust execution policy on ROCm (#15481)
Summary:
The Thrust shipped with ROCm is recent enough to support this API. Minimize divergence between CUDA/ROCm by changing ifdef guards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15481

Differential Revision: D13598739

Pulled By: bddppq

fbshipit-source-id: 20d0a7e3887a4050eea65033161561af47411de1
2019-01-08 12:20:26 -08:00
cc402d8fa1 Use correct workspace alloc call in MIOpen conv operator (#15712)
Summary:
This PR contains changes for:
1. Using memory alloc from HIPContext while allocating workspace for MIOpen conv and transpose_conv operators rather than direct HIP mem alloc
2. Minor cleanup and removing an unnecessary sync call from MIOpen conv op

Differential Revision: D13598894

Pulled By: bddppq

fbshipit-source-id: 44886161abdf91cd29c7c93b3e23620e1b09c7c9
2019-01-08 11:38:45 -08:00
532a709771 Tensor method rename dims()->sizes() - 2/2
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: smessmer

Differential Revision: D13581787

fbshipit-source-id: b04c6aa87fea3a10b522a71fccc1fcfb76a2c212
2019-01-08 11:34:36 -08:00
ede1f4ad05 Remove caffe2::ShareData (#15418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15418

Previously we were using Resize + ShareData.
Instead, we'll create a function on Tensor that clones itself with the same storage.

Suppose we want `t` to `ShareData` with `t0`. Previously:
```
Tensor t(dims, CPU);
t.Resize(t0.sizes());
t.ShareData(t0);
```
Now:
```
Tensor t = t0.Alias();
```

Reviewed By: dzhulgakov

Differential Revision: D13507609

fbshipit-source-id: 6e4275d02f4c3356cbce91127f1b01111dc86b9f
2019-01-08 11:01:56 -08:00
8232bd526f Move isnan to C++ (#15722)
Summary:
Wanted to use `Tensor.isnan` in C++, figured it'd be nice to have, so I made it into a tiny native function.
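
Usage on the Python side is unchanged; a trivial sketch:

```python
import torch

t = torch.tensor([1.0, float('nan'), 2.0])
print(torch.isnan(t))  # elementwise mask that is nonzero exactly where t is NaN
```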

gchanan ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15722

Differential Revision: D13591315

Pulled By: goldsborough

fbshipit-source-id: a78bd22101fde87a0257f759b9bfcf3b4208f5fa
2019-01-08 10:42:33 -08:00
461dc9a28b use all_weights instead of _parameters in _flat_weights in rnn (#15766)
Summary:
Fixes #15749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15766

Differential Revision: D13592320

Pulled By: soumith

fbshipit-source-id: 6c3805f576c3df5a2da8bef1e4305eda379718df
2019-01-08 09:48:36 -08:00
8f11147d43 Use CUDAGuard when serializing CUDA Tensors (#15807)
Summary:
Fixes #15308. Before this change, `torch.save` and `torch.load` would
initialize the CUDA context on GPU 0 if it hadn't been initialized
already, even if the serialized tensors are only on GPU 1.

This PR fixes that bug by using CUDAGuard in the storage serialization
path.
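
A minimal sketch of the scenario (assumes a machine with at least two GPUs; the file name is illustrative):

```python
import torch

# All tensors live on GPU 1; saving them should not initialize a CUDA
# context on GPU 0 as a side effect.
t = torch.randn(4, device='cuda:1')
torch.save(t, 'checkpoint.pt')
loaded = torch.load('checkpoint.pt')  # restored on cuda:1
```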
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15807

Differential Revision: D13593201

Pulled By: zou3519

fbshipit-source-id: 4addc91ea5a5278d56a03f3d422577ee39e99897
2019-01-08 07:31:50 -08:00
29a9d6af45 Stop leaving garbage files after running test_jit.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15404

Differential Revision: D13548316

Pulled By: zou3519

fbshipit-source-id: fe8731d8add59777781d34d9c3f3314f11467b23
2019-01-08 07:22:55 -08:00
5e1b35bf28 Add support for batch_norm fusion to the JIT (#15146)
Summary:
We don't support reductions yet, but simply decomposing batch_norm
into a kernel that computes the stats, and then fusing everything else
with ReLU and the following pointwise ops, provides nice speedups.
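
A rough sketch of the inference-mode decomposition being described, assuming NCHW input and affine parameters (illustrative code, not the fuser's actual kernels):

```python
import torch

def bn_relu_inference(x, weight, bias, running_mean, running_var, eps=1e-5):
    # The "stats" part: in inference mode the statistics are the running
    # estimates, so this collapses to per-channel scale/shift factors.
    inv_std = (running_var + eps).rsqrt()
    scale = weight * inv_std
    shift = bias - running_mean * scale
    # The part the fuser can merge with ReLU and following pointwise ops:
    # one elementwise multiply-add plus the activation.
    return torch.relu(x * scale.view(1, -1, 1, 1) + shift.view(1, -1, 1, 1))
```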

Note that this is only limited to inference mode for now, because we
don't support convolutions and batch norm in AD, so the fuser isn't
applied to those parts.

This commit gives us a 7% end-to-end speedup for ResNet50 with batch size 32. Note that this only applies to inference mode at the moment due to lack of AD support for CNN operations (I'll be adding that soon), and not to the standard `torchvision` models, because they use in-place ops which aren't supported by the fuser (we need a way of proving that de-inplacing them is safe).

cc zou3519 zdevito mruberry ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15146

Differential Revision: D13548303

Pulled By: zou3519

fbshipit-source-id: a2e2e5abc383f637fae19bd1b423f20c2cbc056a
2019-01-08 07:00:19 -08:00
c3a0000864 Support communicating with C2 protobuf in Onnxifi flow (#15472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15472

Create a path to pass serialized C2 protobuf instead of ONNX during ONNXIFI flow

Reviewed By: houseroad

Differential Revision: D13536603

fbshipit-source-id: 7d016474f4beedbda480ed2e2c0004af7868aafe
2019-01-07 22:12:29 -08:00
4650d70e93 Add count_include_pad arg for AveragePoolOp on GPU (#15787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15787

Add count_include_pad arg for AveragePoolOp on GPU
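
This commit concerns the Caffe2 operator, but the flag has the same meaning as in PyTorch's `AvgPool2d`, which makes for an easy illustration:

```python
import torch

x = torch.ones(1, 1, 2, 2)
incl = torch.nn.AvgPool2d(2, stride=1, padding=1, count_include_pad=True)
excl = torch.nn.AvgPool2d(2, stride=1, padding=1, count_include_pad=False)
# At a corner, the 2x2 window covers one real element and three pad zeros:
print(incl(x)[0, 0, 0, 0].item())  # 0.25 -- pad zeros count in the denominator
print(excl(x)[0, 0, 0, 0].item())  # 1.0  -- only real elements are averaged
```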

Reviewed By: houseroad

Differential Revision: D13589185

fbshipit-source-id: 235a84cfcd2033ee796c13e338fc3d03e832b5b1
2019-01-07 21:36:26 -08:00
99d2743863 Move Stream.query() implementation down to C++ (#15737)
Summary:
See #15682

Pushing up this small PR to check if I am doing the right thing. If correct, more will follow for other Stream APIs. Questions will be added inline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15737

Differential Revision: D13581400

Pulled By: mrshenli

fbshipit-source-id: 24afed7847b89b62f0692c79a101ec7ff9d9ee4d
2019-01-07 20:58:07 -08:00
55baca57d2 A trivial error in the error message of at::Tensor _convolution fixed (#15772)
Summary:
A trivial grammatical error fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15772

Differential Revision: D13592279

Pulled By: zou3519

fbshipit-source-id: 14f60c61747a3893cd0e4c860f7b4c4c4ba28c28
2019-01-07 20:01:43 -08:00
770b5ac42b clean up D13579188 (#15759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15759

Some flags have names that are too long, plus a few other minor cleanups.

Reviewed By: jianyuh

Differential Revision: D13587353

fbshipit-source-id: f8aee7f167505644f5d8f80fe2eed70201ef1e54
2019-01-07 18:48:25 -08:00
24867a58aa Add support for exporting onnx split (#15092)
Summary:
* With the update of split's output to a dynamic list, the export to ONNX broke.
 The split IR now becomes two ops: 1. Dynamic[] <= Split(), and 2. out1, out2, out3
 <= prim::ListUnpack. In this fix, these two consecutive ops get fused when being
 exported to ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15092

Reviewed By: dzhulgakov

Differential Revision: D13583832

Pulled By: houseroad

fbshipit-source-id: 3eb18c871e750921ad6d5cc179254bee9bcf4c99
2019-01-07 16:09:24 -08:00
bc328d01e5 simplify conv dnnlowp ops by not allowing fp32 in/out (#15758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15758

DNNLOWP Conv operators became very complex due to many options. This diff simplifies them by not allowing fp32 in/out. This is OK for Conv operators because Conv operators are usually used in deep networks where quantizing and dequantizing using separate operators is not much overhead.

Reviewed By: csummersea

Differential Revision: D13587341

fbshipit-source-id: e88c919dae79d1c5b7d787ea539edf5bcb064afc
2019-01-07 15:14:59 -08:00
49ba2cb796 Enable conv+add fusion, same as conv+sum (#15268)
Summary:
Enable conv+add fusion, same as conv+sum

Caution: only element-wise add is supported on IDEEP without scalar
broadcast. Otherwise, the fusion is illegal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15268

Differential Revision: D13577375

Pulled By: yinghai

fbshipit-source-id: 92c9c4b667c5ca5f7a262a5bffaa8aa68eeff3bd
2019-01-07 14:42:45 -08:00
76feb8c40f Allow List arguments to Python Ops (#15721)
Summary:
Adds `List` to the eval environment for type lines and allows `List` to be used on PythonOps (following the same style as the `Tuple` code). Fixes #15661
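
A sketch of the kind of annotation this enables (hypothetical function; TorchScript used mypy-style type comments at the time):

```python
import torch

@torch.jit.script
def first_plus_last(xs):
    # type: (List[int]) -> int
    return xs[0] + xs[-1]

print(first_plus_last([1, 2, 3]))  # 4
```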
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15721

Differential Revision: D13578540

Pulled By: driazati

fbshipit-source-id: fce54dc3c0931d8b017b2e3483f0ac53826dda94
2019-01-07 13:51:53 -08:00
668678e753 Bump CircleCI docker version to 278 (#15795)
Summary:
Just changing the version number doesn't seem to work. I needed to also fix the macOS brew parallel conflict.

should this merge together with https://github.com/pytorch/ossci-job-dsl/pull/36 ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15795

Differential Revision: D13591839

Pulled By: yf225

fbshipit-source-id: 6b2a90943e63c8dcc4b6d9159eb54f1b5974c9ac
2019-01-07 12:32:33 -08:00
382807302c Fix C++ Frontend example in frontend.html (#15717)
Summary:
The small end-to-end example in https://pytorch.org/cppdocs/frontend.html is a little outdated and needs fixes.

ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15717

Differential Revision: D13591306

Pulled By: goldsborough

fbshipit-source-id: 3334d68c7f77cf094b66ec2b2f396c4c65bb0d72
2019-01-07 11:39:47 -08:00
321a559359 Fix restructured text issue in tensor_basics.rst (#15701)
Summary:
Fix submitted by huntzhan in https://github.com/pytorch/cppdocs/pull/4. The source is in this repo so the patch has to be applied here.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15701

Differential Revision: D13591302

Pulled By: goldsborough

fbshipit-source-id: 796957696fd560a9c5fb42265d7b2d018abaebe3
2019-01-07 11:35:19 -08:00
2ebeb33697 Fallback to CPU concat op to handle TensorCPU inputs (#15263)
Summary:
Fallback to CPU concat op to handle TensorCPU inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15263

Differential Revision: D13587030

Pulled By: yinghai

fbshipit-source-id: 010a8579d61c3beb8556eb92493a552b2ab0030c
2019-01-07 11:13:23 -08:00
c68eb5ec44 fix conv unit test for groupwise quantization and pre-packing (#15761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15761

As title says.

Reviewed By: csummersea

Differential Revision: D13587727

fbshipit-source-id: f0631b8cbb89d65a1d952bc25b463de23de93bec
2019-01-07 11:08:32 -08:00
95febdfacc Add is_floating_point to docs (#15704)
Summary:
Fixes #15700 .

Changelog:

- Expose torch.*.is_floating_point to docs
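
For reference, the behavior of the newly documented method:

```python
import torch

print(torch.tensor([1.0]).is_floating_point())  # True
print(torch.tensor([1]).is_floating_point())    # False
```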

Differential Revision: D13580734

Pulled By: zou3519

fbshipit-source-id: 76edb4af666c08237091a2cebf53d9ba5e6c8909
2019-01-07 10:43:22 -08:00
2ff0e3b196 Pool prim::None nodes (#15745)
Summary:
Make the constant pooling pass pool prim::None nodes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15745

Differential Revision: D13583518

Pulled By: eellison

fbshipit-source-id: 7f8aa70522515805ab0991c6db3d96b5a96cdede
2019-01-07 10:00:51 -08:00
3277723173 Replace some malloc+memset pairs with calloc.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15765

Differential Revision: D13588723

Pulled By: resistor

fbshipit-source-id: 47d35dc608847a5b173cfcf2aaa2a77359e56722
2019-01-06 18:57:17 -08:00
b6a8c45f57 Removes print statements from test_torch.py (#15747)
Summary:
These print statements do not affect the test, and tests (generally) shouldn't print.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15747

Differential Revision: D13587289

Pulled By: soumith

fbshipit-source-id: c758793c9e35faf02bacba6c7c6d072f7c40453f
2019-01-05 09:07:27 -08:00
04f5605ba1 Fix several DeprecationWarning: invalid escape sequence (#15733)
Summary:
Hello,

This is a little patch to fix `DeprecationWarning: invalid escape sequence`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15733

Differential Revision: D13587291

Pulled By: soumith

fbshipit-source-id: ce68db2de92ca7eaa42f78ca5ae6fbc1d4d90e05
2019-01-05 08:53:35 -08:00
2fb2d080d3 caffe2_benchmark msvc build fix (#15619)
Summary:
Fixing error in caffe2_benchmark binary

```
2018-12-29T14:09:59.7867995Z   d:\a\1\s\caffe2_builders\v141\pytorch\binaries\benchmark_helper.h(90): error C2678: binary '|=': no operator found which takes a left-hand operand of type 'std::_Iosb<int>::_Openmode' (or there is no acceptable conversion) (compiling source file D:\a\1\s\caffe2_builders\v141\pytorch\binaries\benchmark_helper.cc) [D:\a\1\s\caffe2_builders\v141\pytorch\build\Release\binaries\caffe2_benchmark.vcxproj]
2018-12-29T14:09:59.7868252Z   d:\a\1\s\caffe2_builders\v141\pytorch\binaries\benchmark_helper.h(92): error C2678: binary '|=': no operator found which takes a left-hand operand of type 'std::_Iosb<int>::_Openmode' (or there is no acceptable conversion) (compiling source file D:\a\1\s\caffe2_builders\v141\pytorch\binaries\benchmark_helper.cc) [D:\a\1\s\caffe2_builders\v141\pytorch\build\Release\binaries\caffe2_benchmark.vcxproj]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15619

Differential Revision: D13580195

Pulled By: soumith

fbshipit-source-id: b0a4479cd5f7555801b1977aeee96b6433293da7
2019-01-05 08:29:31 -08:00
a918f1d9af Adding a hook (wrapper) for non-std stream reader in PyTorchStreamReader (#15551)
Summary:
Implementing a stream is very annoying, since it is closely tied to the underlying storage stream buffer.

So in this PR, we add ReadAdapterInterface, which PyTorchStreamReader will use. We implement IStreamAdapter as a wrapper of std::istream and keep the user interface unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15551

Reviewed By: zrphercule

Differential Revision: D13568907

Pulled By: houseroad

fbshipit-source-id: 93708cb801248a6c101f35cb14d1631029365c3c
2019-01-04 22:50:07 -08:00
1488c5dd03 support 0 size in any of the tensor dimensions in mkldnn (#15295)
Summary:
support 0 size in any of the tensor dimensions in mkldnn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15295

Differential Revision: D13573747

Pulled By: yinghai

fbshipit-source-id: 5bf7a0b9e2567e80f44981a7823be5407fc94e53
2019-01-04 22:33:18 -08:00
2d8b332262 Port replication_pad2d and replication_pad3d to ATen (#15538)
Summary:
Port 2D and 3D replication padding from the legacy TH API implementation
to the ATen implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15538

Differential Revision: D13547567

Pulled By: lhuang04

fbshipit-source-id: decfe100d9edfdcfb62f39ee23f37b6cae0d461f
2019-01-04 17:08:14 -08:00
3d44eeec0a Fix different types in rsub caused bug (#15707)
Summary:
Before this PR, rsub did not convert its two operands to the same dtype; therefore "1 - x" could be exported to an ONNX model in which the two operands of rsub have different dtypes.
Adding this symbolic patch should fix the bug.
Related test cases are also created.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15707

Differential Revision: D13583042

Pulled By: zrphercule

fbshipit-source-id: 3a2de47a1a8d1ded1a0adfb911adbe6ac729cdef
2019-01-04 16:14:13 -08:00
ae91156e5d Tensor method rename dims()->sizes() - 1/2
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: BIT-silence

Differential Revision: D13581782

fbshipit-source-id: b16b4198e100617769d84aa599bf141117cfbe5b
2019-01-04 16:02:22 -08:00
12e6c1ceeb Automatic update of fbcode/onnx to 8384c788939bc65463f9754b6a7a00b212b18ba1 (#15739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15739

Previous import was 765f5ee823a67a866f4bd28a9860e81f3c811ce8

Included changes:
- **[8384c78](https://github.com/onnx/onnx/commit/8384c78)**: add constantofshape (#1582) <Rui Zhu>
- **[9afc06c](https://github.com/onnx/onnx/commit/9afc06c)**: Set symbol visibility to hidden for non-Windows (#1707) <Paul Jesse Hellemn>
- **[6f8a9f0](https://github.com/onnx/onnx/commit/6f8a9f0)**: Revert "Add NonMaxSupression operator (#1695)" (#1702) <Lu Fang>
- **[8b89544](https://github.com/onnx/onnx/commit/8b89544)**: Add NonMaxSupression operator (#1695) <Hector Li>
- **[0a7cc48](https://github.com/onnx/onnx/commit/0a7cc48)**: Add bfloat16 support. (#1699) <Dmitri Smirnov>
- **[da7c50c](https://github.com/onnx/onnx/commit/da7c50c)**: ONNX does not maintain versions for experimental ops (#1696) <Ke Zhang>
- **[0c8d857](https://github.com/onnx/onnx/commit/0c8d857)**: Correct type of value_info in Graph (#1694) <Maik Riechert>
- **[f612532](https://github.com/onnx/onnx/commit/f612532)**: Fix typos (#1686) <Eundoo Song>

Reviewed By: zrphercule

Differential Revision: D13581674

fbshipit-source-id: 8f8ee86a05a86fe99bf94509148c559ea3df1464
2019-01-04 15:56:55 -08:00
04bf528589 remove use of tmp_install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14553

Differential Revision: D13583335

Pulled By: anderspapitto

fbshipit-source-id: 8711fead9eda877c1037a0bc59f91a3d2e01f3e0
2019-01-04 13:48:12 -08:00
6adbe12c74 Update CI credentials
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15736

Differential Revision: D13583174

Pulled By: yf225

fbshipit-source-id: 742470db10ef9df8f95e27626453b68ca90723e8
2019-01-04 13:36:10 -08:00
43761e01f5 Temporarily disable all XXXlike operator tests in pytorch-onnx test (#15740)
Summary:
We are going to have some breaking changes to ConstantLike and related operators in ONNX, so it is better to disable all related tests for these operators for now.
These operators are not currently supported by caffe2 and are not included in our most recently released ONNX, so we do not need to worry about breaking internal/external production.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15740

Differential Revision: D13582528

Pulled By: zrphercule

fbshipit-source-id: 92a890c1dc2a833969af69edfea85331bb4d562f
2019-01-04 13:36:09 -08:00
07c4991622 Tensor construction codemod - 2/2 (#15600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15600

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13542455

fbshipit-source-id: 8a3b15b0a1f81565f34e309114e1c3e1f7f65a3c
2019-01-04 13:31:53 -08:00
b1529eeadb Print out operator suggestions for unknown builtin op (#15183)
Summary:
This improves the error message for "unknown builtin op" to suggest similarly named ops.

Currently it prints out all operators with a name within two edits.

Related issue: https://github.com/pytorch/pytorch/issues/13409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15183

Differential Revision: D13578509

Pulled By: eellison

fbshipit-source-id: 5c73408eda1f7aa456f5bd28790c34df0c76aeca
2019-01-04 13:04:44 -08:00
fad8480146 Updating submodules
Reviewed By: yns88

fbshipit-source-id: b8be56b57d109dfef5980ea7255e2ab021da099e
2019-01-04 12:28:13 -08:00
9e88547d72 Tensor construction codemod - 1/2 (#15598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15598

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13542429

fbshipit-source-id: db1059c78e85724d9b4fdab70466cf329db68359
2019-01-04 11:53:36 -08:00
ad0ef7ae48 remove dependency to fp32 batch permutation op (#15723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15723

As title says.

Reviewed By: jianyuh

Differential Revision: D13578604

fbshipit-source-id: 0da0ac31ae83c1e0daa9077e878feb4deffed6a3
2019-01-04 07:56:05 -08:00
e313f1a7bf Cudnn Handle Pool 3: At Wit's End (#15668)
Summary:
ezyang Here's a freshly rebased version of https://github.com/pytorch/pytorch/pull/15080 with the if statement that relieved the hangs that occasionally, nondeterministically, occurred on cudnnCreate on a particular windows build ([example w/debug statements](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/19238/console))  in https://github.com/pytorch/pytorch/pull/15280.

I'd like to run the CI over this several times before it's considered mergeable.  Sometimes the windows hang doesn't manifest for 2 or 3 consecutive trials.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15668

Differential Revision: D13579291

Pulled By: soumith

fbshipit-source-id: 3972eb98bad6ece933ca5e67a10fc4bc2ed06068
2019-01-04 06:28:21 -08:00
e798a09f6d Remove TH/THC link for cholesky_solve (#15691)
Summary:
Changelog:
- Remove TH/THC binding
- Port single matrix case to ATen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15691

Differential Revision: D13579317

Pulled By: soumith

fbshipit-source-id: 63a55606c656396e777e8e6828acd2ef88ed1543
2019-01-04 06:24:17 -08:00
b740b92f36 Modify torch.gesv error message (#15654)
Summary:
The [doc](https://pytorch.org/docs/stable/torch.html#torch.gesv) uses uppercase `B`, so the error message should follow suit to avoid confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15654

Differential Revision: D13571297

Pulled By: soumith

fbshipit-source-id: 0b4e7797eceff92618f808bbfa65d13c1dcc2da0
2019-01-03 21:46:02 -08:00
069d894145 make conv_depthwise_dnnlowp_op_test faster (#15725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15725

As title says.

Reviewed By: jianyuh

Differential Revision: D13579188

fbshipit-source-id: 382072c95929ccf9e189e2338e35b046c4a0650f
2019-01-03 21:46:00 -08:00
2d8f14cd12 clarified language of doc for torch.mul (#15664)
Summary:
see issue #15636

Please note: I built the documents, but the HTML is not updated with the edited content.
I also did not build the fork.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15664

Differential Revision: D13571310

Pulled By: soumith

fbshipit-source-id: d43be0f61705693d778cc12c13e86d6b06130ac7
2019-01-03 21:39:35 -08:00
a923ea7cf0 disallow nbits_in_non_outlier == 0 in acc16 conv; option to fallback to acc32 (#15708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15708

nbits_in_non_outlier == 0 doesn't make sense because it means everything is an outlier and we can just use 32-bit accumulation.
Depending on the architecture, the break-even point between acc16 and acc32 can differ, so this adds thresholds for falling back to acc32.

Reviewed By: jianyuh

Differential Revision: D13574832

fbshipit-source-id: b7a37aacbfdc7867e31838dafcdd5f7c2ac282af
2019-01-03 20:31:33 -08:00
bebf1f7463 Torch tensor (#15224)
Summary:
Support torch.tensor in script. It has already been accepted; trying to reland.
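
A trivial usage sketch of what this enables:

```python
import torch

@torch.jit.script
def make_tensor():
    # torch.tensor can now be called inside TorchScript.
    return torch.tensor([1.0, 2.0, 3.0])

print(make_tensor())
```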
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15224

Differential Revision: D13466616

Pulled By: eellison

fbshipit-source-id: f7850da07b0eb11af98f255fc15bd3cf861f2a40
2019-01-03 17:35:17 -08:00
1e9a6d7192 A quick fix for Stream operation errors on non-current device (#15689)
Summary:
see #15682

This is a quick fix implementing the simpler solution suggested by colesbury. As the benchmark results show, it slows down `Stream.query()` by ~20%. I would be happy to further pursue a more complex solution by implementing this in C++/ATen, but I would still vote to merge this quick fix first, just to get rid of the bug sooner.

~Test TBA~ Added

FYI jeffreyksmithjr

now

```python
In [1]: def f():
   ...:     d0 = torch.device('cuda:0')
   ...:     d1 = torch.device('cuda:1')
   ...:     with torch.cuda.device(d0):
   ...:         s0 = torch.cuda.current_stream()
   ...:     with torch.cuda.device(d1):
   ...:         s1 = torch.cuda.current_stream()
   ...:     s0.query()
   ...:     s1.query()

In [4]: %timeit f()
38.1 µs ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit f()
37.6 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

before

```python
In [4]: %timeit f()
28.5 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit f()
35.3 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15689

Differential Revision: D13571697

Pulled By: mrshenli

fbshipit-source-id: 4fe697f91248c6419136d37bb5b7147e612e2f4c
2019-01-03 15:14:58 -08:00
3270e4d4a5 Break up generated tests (#13992)
Summary:
This PR breaks up `TestJitGenerated` into 3 classes. This makes for
easier testing of specific groups (e.g. run all generated functional
tests without having to wait for the autograd tests)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13992

Differential Revision: D13076371

Pulled By: driazati

fbshipit-source-id: 1267af59be7d69feb690f5805fcd43fea58a7159
2019-01-03 14:34:13 -08:00
dcbc4f32db flake8 hook fix (#15693)
Summary:
This PR bypasses checking the user's configuration entirely and always uses strict mode, since the CI considers it a hard failure if you can't pass flake8.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15693

Differential Revision: D13574889

Pulled By: suo

fbshipit-source-id: f5e1c5731cc49b6223b415317033c275bc7d4fec
2019-01-03 13:55:20 -08:00
2403135257 Prevent VS2017 from emitting ambiguous symbol errors (#15697)
Summary:
These `std::forward` calls cause VS2017 to emit:

    error C2872: 'std': ambiguous symbol

This fix prevents the ambiguity by specifying that `::std` is intended.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15697

Differential Revision: D13573483

Pulled By: goldsborough

fbshipit-source-id: 0439de3523a37a18df7af0cff4a1284a53833ddd
2019-01-03 13:45:35 -08:00
d42e90991b trace s_copy_ (#15690)
Summary:
s_copy_ was previously special-cased for out-of-place tracing.
This adds support for in-place tracing, which fixes tracing of
inception_v3.

Fixes #15216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15690

Differential Revision: D13572011

Pulled By: zdevito

fbshipit-source-id: 1d565dec039a4b8c59179254285e61d2517ef9a9
2019-01-03 12:28:14 -08:00
78442f04fc Add mkldnn conv double backward (#15686)
Summary:
Fixes #15353 .

Like the cudnn conv implementation, mkldnn also falls back to the default `_convolution_double_backward` for double backward.

This bug wasn't caught by CI before because mkldnn is only used when input scalar type is float, but our tests are all using double as default.

Adding a test for float inputs, but mkldnn seems to have imprecision issues similar to the cudnn implementation, so here I only check that double backward exists instead of calling `gradgradcheck`. Please correct me if the precision should actually be checked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15686

Differential Revision: D13571682

Pulled By: ailzhang

fbshipit-source-id: f1762439762370f276cfd59e8b8b8a4dee960a4b
2019-01-03 10:50:00 -08:00
947229ebd7 Fix ONNX export of logical ops, including torch.ne, to have correct output datatype (#15677)
Summary:
This is the an updated version of the earlier PR https://github.com/pytorch/pytorch/pull/15185, since that one was closed.

Currently PyTorch ONNX exporter exports the logical ops (lt, gt, le, ge, eq, ne) with output type in corresponding ONNX ops as type tensor(uint8). But ONNX spec allows for only tensor(bool), which is why models that have these ops fail to load properly.

This issue is captured in #11339. Part of this issue, relating to the allowed input types, has been fixed in ONNX spec by houseroad. This PR fixes the other part pertaining to output type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15677

Reviewed By: dzhulgakov

Differential Revision: D13568450

Pulled By: houseroad

fbshipit-source-id: a6afbea1afdb4edad8f8b1bc492f50b14e5f2fce
2019-01-03 10:35:25 -08:00
279ca4acd2 Port legacy reflection_pad1d to ATen (#15480)
Summary:
1. Avoided using `THCDeviceTensor` by re-calculating the mapping from cuda (blockIdx, threadIdx) to input/output tensor index.
2. Changed CamelCase naming to underscore naming.

Profiling:

Legacy:

```bash
$py.test test/test_nn.py -k ReflectionPad1d -v -s
....
=========== 2 passed, 1258 deselected, 800 warnings in 4.35 seconds ============
```

Now:

```bash
$py.test test/test_nn.py -k ReflectionPad1d -v -s
...
=========== 2 passed, 1258 deselected, 800 warnings in 4.03 seconds ============
```

I have two questions about the code. Any insights are appreciated. gchanan zou3519

1. I can verify that [this magic](https://github.com/pytorch/pytorch/blob/master/aten/src/THCUNN/TemporalReflectionPadding.cu#L32-L36) correctly maps output index to input index in different cases. But I have no idea how you came up with this algorithm that merges the three categories (in left padding, in original input, in right padding) into a single statement. (One way to do it is sketched after this list.)

2. Why do we need [get contiguous](https://github.com/pytorch/pytorch/blob/master/aten/src/THNN/generic/TemporalReflectionPadding.c#L80) tensors when calculating forward and backward propagation?
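
For intuition, here is one way the three cases can be collapsed into a single expression with nested absolute values (a sketch of the idea, not the actual kernel code):

```python
def reflect_index(o, n, p):
    # Maps output index o (0 <= o < n + 2*p) of a 1-D reflection pad with
    # pad p (0 <= p <= n - 1) to its source index in the length-n input.
    # The inner abs folds the left pad onto the input; the outer reflection
    # around n - 1 folds the right pad back in.
    return (n - 1) - abs((n - 1) - abs(o - p))

# Input [a, b, c, d] (n=4) padded with p=2 gives [c, b, a, b, c, d, c, b]:
print([reflect_index(o, 4, 2) for o in range(8)])  # [2, 1, 0, 1, 2, 3, 2, 1]
```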

Reflection_pad2d porting will come in the next PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15480

Differential Revision: D13544924

Pulled By: mrshenli

fbshipit-source-id: 182045434f210032a82cab721a190da0cd781fbf
2019-01-03 10:30:37 -08:00
1159302ab1 bug fix in 3d group conv (#15625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15625

3D group conv (both NCHW and NHWC layout) was not correct.
Added group=2 in test_1d_convolution and test_3d_convolution in conv_test

Reviewed By: protonu

Differential Revision: D13562099

fbshipit-source-id: 586e8a7574a2764f2a3b559db6c2415b3ab90453
2019-01-03 09:46:49 -08:00
6103a04cff Port torch.arange to aten and parallelize on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15667

Differential Revision: D13566631

Pulled By: gchanan

fbshipit-source-id: e3243a4e81ecb58373681df8bf6a00428352fb14
2019-01-03 09:20:41 -08:00
10c10b0990 Ignore flake8 warning about whitespace before ':' (#15663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15663

Ignore sometimes incorrect flake8 warning about whitespace before ':'

See https://github.com/ambv/black/issues/315

Reviewed By: soumith

Differential Revision: D13565818

fbshipit-source-id: 9d5ec2335899527ee71f4b505c00865a354e3bf0
2019-01-03 05:02:10 -08:00
f53010370b Add count_include_pad arg for PoolOpGradient on CPU and fix ARM performance issue. (#15651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15651

Add count_include_pad arg for PoolOpGradient on CPU and fix ARM performance issue.

Reviewed By: houseroad

Differential Revision: D13564257

fbshipit-source-id: 3a143f1122bc507ccb7827e9b46908d5c7203735
2019-01-03 00:18:47 -08:00
3b5a940355 Unify the usage of Dequantize (#15685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15685

The declaration of "Dequantize" is in "fbsource/fbcode/deeplearning/fbgemm2/QuantUtils.h", so it requires the "namespace fbgemm".

<T> is actually optional, since the type can be deduced from the first argument.

In some places we have "Dequantize<T>(...)", while in other places we have "Dequantize(...)". We'd better unify them. As a reference, all occurrences of "Quantize" are using "fbgemm::Quantize<T>(...)".

Reviewed By: jspark1105

Differential Revision: D13570847

fbshipit-source-id: 7fca9f7f9e4e0d9e5eb27ac44b8707adc3c80717
2019-01-02 21:32:46 -08:00
efc3d6b65d Fix vec256 inversion (#15659)
Summary:
soumith zou3519

I was browsing the code, and think `vec256_int.h` might need a minor revision, but not 100% sure.

1. It currently inverts the result by `XOR` with 0. Should it `XOR` with 1 instead?
~2. AVX2 logical operations would set all bits in a byte/word/... to `1` if the condition holds. So functions, such as `_mm256_cmpeq_epi64 ` would return `0/-1` instead of `0/1`. Should it be masked with `1` to make sure it returns 0/1?~

~Would I be correct if I assume that the code revised below is not yet activated, but will be after we port legacy code to ATen?~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15659

Differential Revision: D13565929

Pulled By: mrshenli

fbshipit-source-id: 8ae3daf256c3d915dd855a2215c95275e899ea8c
2019-01-02 21:32:44 -08:00
b0cf780ecc Add min/max on numbers to JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15680

Differential Revision: D13568806

Pulled By: zdevito

fbshipit-source-id: ef0f33cc12a057184293bc31d28cc7b24f73eb94
2019-01-02 20:10:38 -08:00
e2549cbc01 initialize with ident value in global reduction (#15653)
Summary:
Fixes #15647. cc colesbury.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15653

Differential Revision: D13571132

Pulled By: soumith

fbshipit-source-id: 8f25943c974b3b931f4528e0e0a370bc095dab51
2019-01-02 19:52:57 -08:00
0b0553f92d Updating submodules
Reviewed By: yns88

fbshipit-source-id: f7b540159cf1fe72825d09d55d56117d14ff90eb
2019-01-02 19:00:02 -08:00
879bccb1af Support for Jetson Xavier (#15660)
Summary:
The requested changes are to support building PyTorch 1.0 on the Jetson Xavier with OpenBLAS. The Jetson Xavier with JetPack 3.3 has a generic LAPACK installed. To pick up CUDA-accelerated BLAS/LAPACK, I had to build OpenBLAS and build/link PyTorch from source; otherwise, I got a runtime error indicating the LAPACK routines were not CUDA-enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15660

Differential Revision: D13571324

Pulled By: soumith

fbshipit-source-id: 9b148d081d6e7fa7e1824dfdd93283c67f69e683
2019-01-02 18:51:42 -08:00
62883a911c Fixing cuda100 smoke tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15673

Reviewed By: yf225

Differential Revision: D13568746

Pulled By: pjh5

fbshipit-source-id: e636de417d61b48074399da75bfb2576c9f62743
2019-01-02 17:13:16 -08:00
3ea5a9a66d Remove PythonOp non-CPU path and PytorchOp (#15417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15417

Right now the way we test whether a Blob contains a CPU tensor in `PythonOpBase` is broken, which means the non-CPU path might never be taken.
Searching through the codebase, the non-GPU path is used in PythonDLPack, and it is used in PytorchOp, which is unused. So we'll remove the non-GPU path in this diff.

Reviewed By: dzhulgakov

Differential Revision: D13495011

fbshipit-source-id: 9fe9537f05026d2a2cf7051efa81d184de722710
2019-01-02 16:36:37 -08:00
7857909158 Updating submodules
Reviewed By: yns88

fbshipit-source-id: bb142e8f91046cc2b7ea32dac46ec0753b4bc218
2019-01-02 14:58:48 -08:00
d86cc3e7de fix select after chunk op (#15672)
Summary:
Fixes #15669.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15672

Differential Revision: D13567274

Pulled By: suo

fbshipit-source-id: a63e6cfc9dacedd4cb99dc51eee452038418001e
2019-01-02 14:35:23 -08:00
bb3c3f516b make flake8 failure blocking (#15675)
Summary:
Right now it just prints whatever flake8 errors there are and moves forward with the commit. This is too easy to miss.

It should block the commit so that the user can fix the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15675

Differential Revision: D13567821

Pulled By: suo

fbshipit-source-id: 5f0de40ddd771bad8d6848417408cffbceb03183
2019-01-02 12:52:59 -08:00
c5554856c9 redo sleef build fix (#15549)
Summary:
This was accidentally reverted by #14866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15549

Differential Revision: D13549674

Pulled By: zdevito

fbshipit-source-id: e209aac53dccb082b91cfa2d292310eabeb459e3
2019-01-02 12:48:25 -08:00
bee6c6761e format conv_test.py to prepare D13562099 (#15632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15632

Just formatting and a few lints.

Reviewed By: yinghai

Differential Revision: D13562403

fbshipit-source-id: c56f8ee61f68cdaccc0828a764ff729454f68259
2019-01-02 11:34:30 -08:00
eeb14675f1 Fix torch.gesv args in doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15649

Differential Revision: D13564312

Pulled By: soumith

fbshipit-source-id: b3bba2ece600880077eb09b092ce17e331995bd6
2019-01-02 00:20:22 -08:00
b52420742d clamp fixes (#15479)
Summary: Fix for #15338.

Differential Revision: D13564343

Pulled By: soumith

fbshipit-source-id: be64b572945533e10ae6f627d335b47f093720a3
2019-01-01 23:12:17 -08:00
8278a8b16f Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: acb68439e62ea270af22364183a6ecba883fab66
2019-01-01 23:12:16 -08:00
2398b607ec Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 5c5ad6a5cc9220ee1dd9565d64c7459f866ff74d
2019-01-01 17:23:01 -08:00
a0d22b6965 Fix typo in documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15628

Differential Revision: D13562685

Pulled By: soumith

fbshipit-source-id: 1621fcff465b029142313f717035e935e9159513
2018-12-30 18:07:57 -08:00
7bb41e3953 Make btriunpack work for high dimensional batches and faster than before (#15286)
Summary:
Changelog:
- Optimize btriunpack by using `torch.where` instead of indexing, in-place operations instead of out-of-place operations, and avoiding costly permutations by computing the final permutation over a list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15286

Differential Revision: D13562038

Pulled By: soumith

fbshipit-source-id: e2c94cfab5322bf1d24bf56d7b056619f553acc6
2018-12-30 12:42:07 -08:00
56d945a1ca Add count_include_pad arg for average_pool_op on CPU (#15593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15593

Add count_include_pad arg for average_pool_op on CPU

Reviewed By: houseroad

Differential Revision: D13558123

fbshipit-source-id: 188879ec3af313105ff66ac0b5a81ea44fca2855
2018-12-30 04:16:47 -08:00
ef487d4f1d Remove TH/THC link for cholesky (#15595)
Summary:
Changelog:
- Remove TH/THC binding
- Port single matrix case to ATen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15595

Differential Revision: D13561657

Pulled By: soumith

fbshipit-source-id: 65f8c4b455cf19a0c7b6aeac2e3b985c7a7208f8
2018-12-29 17:54:50 -08:00
2a45050fdc Concatenate directly into shared memory when constructing batches for numpy (#14534)
Summary:
Since #1323, tensors are shared via shared memory, but this feature is not active for NumPy.
This PR fixes this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14534

Differential Revision: D13561649

Pulled By: soumith

fbshipit-source-id: b6bc9e99fb91e8b675c2ef131fba9fa11c1647c0
2018-12-29 17:51:02 -08:00
4047cdc690 Add a patch for OSX with SDK<10.12 (#15615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15614

Build passing on SDK 10.9
https://dev.azure.com/ramonaoptics/feedstock-builds/_build/results?buildId=13
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15615

Differential Revision: D13561737

Pulled By: soumith

fbshipit-source-id: 2ab0f78338d4949fa3f2735915fd96dce4bcd621
2018-12-29 16:11:58 -08:00
d3e5540276 Fix typo: szie -> size
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15466

Differential Revision: D13536343

Pulled By: soumith

fbshipit-source-id: cb3df30bf346ef6bc0bc1b6430107b3e0e086f8d
2018-12-28 22:40:52 -08:00
119efd5266 Make the warning suppression safer (#15560)
Summary:
Address the problem introduced in https://github.com/pytorch/pytorch/pull/15499#issuecomment-450038494.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15560

Differential Revision: D13561346

Pulled By: soumith

fbshipit-source-id: 6abf622672bdcb77ae1a7188e8a3817fa97aecbc
2018-12-28 22:12:36 -08:00
d53012b4fe add NCHW2NHWC and NHWC2NCHW in utils.py (#15588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15588

Use the NHWC2NCHW or NCHW2NHWC functions, which are easier to understand than code using transpose and generalize to non-2D convolutions.
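
A rough Python analogue of the idea (the helper names mirror the ones added to utils.py, but the bodies here are illustrative and generalize beyond 4-D):

```python
import numpy as np

def NCHW2NHWC(x):
    # Named transform: the intent is readable, unlike a bare transpose call.
    return x.transpose((0,) + tuple(range(2, x.ndim)) + (1,))

def NHWC2NCHW(x):
    return x.transpose((0, x.ndim - 1) + tuple(range(1, x.ndim - 1)))

x = np.zeros((8, 3, 32, 32))          # N, C, H, W
print(NCHW2NHWC(x).shape)             # (8, 32, 32, 3)
print(NHWC2NCHW(NCHW2NHWC(x)).shape)  # (8, 3, 32, 32)
```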

Reviewed By: csummersea

Differential Revision: D13557674

fbshipit-source-id: c4fdb8850503ea58f6b17b188513ae2b29691ec0
2018-12-28 17:34:50 -08:00
9c8d8eab9d Remove TH/THC link for gesv (#15510)
Summary:
This PR removes the TH/THC binding for gesv.

Changelog:
- Remove TH/THC binding
- Port single matrix case to ATen
- Enable test_gesv for CUDA as well
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15510

Differential Revision: D13559990

Pulled By: soumith

fbshipit-source-id: 9da2825e94d3103627e719709e6b1f8b521a07fb
2018-12-28 16:54:27 -08:00
cd3c4a2f1c keep extra_info of each op in ProfDagStats (#15244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15244

This diff keeps track of the extra_info attached to each operator. When getPerOpStas() is called, it attaches the extra_info to the resulting ProfDagStats protobuf.

Facebook
Net transform attaches a global_op_id, defined as a tuple of (orig_net_name, original_op_index), to each operator.
The global_op_id is encoded as extra_info in each operator.

Reviewed By: aazzolini

Differential Revision: D13016289

fbshipit-source-id: 3e2719ec7ed0ebe47740b77581c565ff7e79b102
2018-12-28 15:03:23 -08:00
692898fe37 Error when torch.load-ing a JIT model (#15578)
Summary:
Throw a warning when calling `torch.load` on a zip file

Fixes #15570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15578

Differential Revision: D13555954

Pulled By: driazati

fbshipit-source-id: a37ecdb3dd0c23eff809f86e2f8b74cd48ff7277
2018-12-28 13:54:32 -08:00
fb22f76eb6 default_collate should collate bool list to byte tensors (#14669)
Summary:
Based on #15331. Review only the last commit.

Fixes https://github.com/pytorch/pytorch/issues/14507.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14669

Reviewed By: ezyang

Differential Revision: D13528725

Pulled By: soumith

fbshipit-source-id: f12f1ac1c4ff2a3ddd6877c0c096a5da3a1ffa3c
2018-12-28 12:26:46 -08:00
6a3e54eda9 append caffe2 prefix to dnnlowp cmd line options (#15582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15582

Following the convention of having a caffe2_ prefix on command-line options.

Reviewed By: viswanathgs

Differential Revision: D13252055

fbshipit-source-id: 142a6395b832f211f34d0a87ec2d62c1e5fcdc69
2018-12-28 11:51:59 -08:00
2c4c8784d2 adding nightly build smoke tests to circleci
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15441

Reviewed By: yf225

Differential Revision: D13552399

Pulled By: pjh5

fbshipit-source-id: 4a52ee2d08324b9ab6b8c266ad6a1cd3bdad1c71
2018-12-28 10:47:38 -08:00
c1643ec551 add the int support (#15581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15581

as title

Reviewed By: protonu

Differential Revision: D13556274

fbshipit-source-id: ba21f0970257d526e2fe7574eea4f89465b9c618
2018-12-27 17:16:32 -08:00
9bf7eb914d Move VariableImpl functions to AutogradMeta and Variable (#15487)
Summary:
In this PR, we are moving all functions away from `Variable::Impl`, in order to get rid of `Variable::Impl` (and the `data_` Tensor in it) in the next PR. Some of the functions (such as `set_requires_grad` / `requires_grad` / `grad`) will be living in `AutogradMeta` class, while others (such as `backward()` / `rebase_history()` / `grad_accumulator()` / `grad_fn()`) will be living in `Variable` class.

This is the 2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15487

Differential Revision: D13553173

Pulled By: yf225

fbshipit-source-id: 691f9432d0cd0640af380c757f3e3a2f64f8851c
2018-12-27 17:16:31 -08:00
50fbf79451 test basic tensor interop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12249

Differential Revision: D13469356

Pulled By: li-roy

fbshipit-source-id: b49748462aa44ac34b8ce79783f2c895a537a232
2018-12-27 17:04:00 -08:00
70f0c4745b Allow int/float cast to bool (#13391)
Summary:
This PR adds explicit `bool()` casts to match Python semantics

`bool(1) = True`
`bool(0) = False`
`bool(0.0) = False`
`bool(0.1) = True`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13391

Differential Revision: D12871213

Pulled By: driazati

fbshipit-source-id: 773a48b2647973138efe854abe725d647f1d727d
2018-12-27 16:01:08 -08:00
0fff5b3612 remove print ops before exporting onnx graph (#15550)
Summary:
Removing print ops before exporting the ONNX graph; fixes https://github.com/pytorch/pytorch/issues/15505
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15550

Differential Revision: D13551195

Pulled By: eellison

fbshipit-source-id: 1ea1e34cb5b8433eacc2b86fb10b241198af96be
2018-12-27 15:46:05 -08:00
62151aa259 Added deviceCount() virtual method to DeviceGuardImplInterface (#15574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15574

Added a deviceCount() virtual method to DeviceGuardImplInterface, and added corresponding implementations for CPUGuardImpl, CUDAGuardImpl, FakeGuardImpl, VirtualGuardImpl, and HIPGuardImplMasqueradingAsCUDA

Reviewed By: soumith

Differential Revision: D13554609

fbshipit-source-id: 913bf2aad44a0a356efe54505ee4abaf6c4622db
2018-12-27 15:36:32 -08:00
02a249ed92 Port torch.range to aten and parallelize on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15484

Differential Revision: D13538955

Pulled By: gchanan

fbshipit-source-id: ee3889ad116988d963e603621310b3bbdce0aec9
2018-12-27 15:25:57 -08:00
d63740bc3f Export group norm as ATen and add test (#15569)
Summary:
Short-term solution: export group norm as an ATen op to unblock users.
Long term, we will add GroupNorm to ONNX.

Adds an end-to-end test for this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15569

Differential Revision: D13554293

Pulled By: houseroad

fbshipit-source-id: b4974c9ea2a1b81338ca1e5c6747efe2715d7932
2018-12-27 14:44:29 -08:00
e4477feb15 Update cuda.get/set_rng_state doc (#14324)
Summary:
Now that `cuda.get/set_rng_state` accept `device` objects, the default value should be a device object, and the docs should mention so.
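
A minimal, hedged sketch of the documented usage (guarded so it only runs when CUDA is available):

```python
import torch

if torch.cuda.is_available():
    dev = torch.device('cuda:0')                  # device objects are accepted
    state = torch.cuda.get_rng_state(device=dev)
    torch.cuda.set_rng_state(state, device=dev)
```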
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14324

Reviewed By: ezyang

Differential Revision: D13528707

Pulled By: soumith

fbshipit-source-id: 32fdac467dfea6d5b96b7e2a42dc8cfd42ba11ee
2018-12-27 14:09:25 -08:00
9ad6ada9de Update QNNPACK (#15561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15561

- Update QNNPACK submodule to master (API-incompatible)
- Do matching changes in Caffe2 Int8 operators

Reviewed By: dreiss

Differential Revision: D13551322

fbshipit-source-id: 066f9087061167f7d7cfbc1c8f8628dfa93d056e
2018-12-27 11:59:54 -08:00
ed949e20cb Revert D13552080: [pytorch][PR] add clang-format check to CI
Differential Revision:
D13552080

Original commit changeset: 462a73894c16

fbshipit-source-id: ebfc5aa3343cebabbc24ff39e4e9841a372443e2
2018-12-27 10:56:52 -08:00
c86cd9e530 Fix wrong class name in jit _make_fail (#15559)
Summary:
It should be ScriptModule rather than TracedModule :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15559

Differential Revision: D13552058

Pulled By: soumith

fbshipit-source-id: 0aa17639c225818b00d59daec4bc2336f039f658
2018-12-27 02:02:33 -08:00
80cc280c68 add clang-format check to CI (#15543)
Summary:
Simple check that runs against your PR's changes and complains if running clang-format would have created a change. Does nothing when run against master, so it's "safe" to accept changes that fail this check and it won't break the build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15543

Reviewed By: soumith

Differential Revision: D13552080

Pulled By: suo

fbshipit-source-id: 462a73894c16e7108806af7fa88440c377d4d0d2
2018-12-26 22:20:32 -08:00
4d029bba7f Fix github branch prefix v (#15552)
Summary:
Fixes #15519.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15552

Differential Revision: D13550780

Pulled By: ailzhang

fbshipit-source-id: b117e5ced42de207b91045bffcee8907dd73201e
2018-12-26 19:48:47 -08:00
eeaf1b64cb Rotated boxes support for GPU GenerateProposals op (#15470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15470

On top of D13509114 and D13017791. Pretty straightforward.

Reviewed By: newstzpz

Differential Revision: D13536671

fbshipit-source-id: ff65981b70c63773ccc9aef3ff28e3c9508f6716
2018-12-26 18:03:56 -08:00
e25702ac2b CUDA kernel for rotated NMS support, over 200x speedup than CPU (#15365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15365

On top of D13017791, adding rotated NMS support with the same kernel building
blocks. Results in a 218x speedup on average.

Reviewed By: SuperIRabbit

Differential Revision: D13509114

fbshipit-source-id: c1d33c8dc4bc50b5906b4f01bb0caf1115e2a357
2018-12-26 18:03:55 -08:00
7b87ecae37 Move autograd metadata from VariableImpl to TensorImpl (#13827)
Summary:
Changes originally in this PR:
1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct
2. Change Variable::Impl functions to use data members in `AutogradMeta` struct
3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl
4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()`

Changes moved from https://github.com/pytorch/pytorch/pull/13645:
1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python.
2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated.

This is the 1st+2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13827

Differential Revision: D13507173

Pulled By: yf225

fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a
2018-12-26 16:34:24 -08:00
4c5b1cc026 version bump to 1.1 (#15554)
Summary:
version bump to 1.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15554

Differential Revision: D13550818

Pulled By: soumith

fbshipit-source-id: 8a28582c98b42c081e103581551a01fd96c9f42d
2018-12-26 15:44:25 -08:00
8c6ff91d57 In README.md CMAKE_PREFIX_PATH should be CONDA_PREFIX when using an conda virtual environment (#15548)
Summary:
In the current README.md, `CMAKE_PREFIX_PATH` is set to the conda root even when you have activated a virtual environment. When a conda virtualenv is activated, packages are installed in `CONDA_PREFIX`, not the conda root, so I think `CMAKE_PREFIX_PATH` should also be set to `CONDA_PREFIX` in this case. I think some build issues can be solved with the new instruction. Maybe something like #14954.

soumith,
When I made PR #15335 I was confused and made a wrong point. I think this PR could be the real solution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15548

Differential Revision: D13549681

Pulled By: soumith

fbshipit-source-id: 42d855b6e49ee58d735d2f4715d3e5752a748693
2018-12-26 12:57:07 -08:00
cdb8edce75 add from_pretrained method to EmbeddingBag (#15273)
Summary:
The `EmbeddingBag` module does not include a `from_pretrained` method like the `Embedding` module.  I added it for consistency between the two modules.
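
A minimal sketch of the added method, mirroring `Embedding.from_pretrained` (tensor values are illustrative):

```python
import torch
import torch.nn as nn

weight = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
bag = nn.EmbeddingBag.from_pretrained(weight)  # frozen by default, mode='mean'
inputs = torch.tensor([0, 2])                  # indices into the weight rows
offsets = torch.tensor([0])                    # a single bag
print(bag(inputs, offsets))                    # tensor([[3., 4.]])
```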
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15273

Differential Revision: D13547842

Pulled By: soumith

fbshipit-source-id: 8ffde51ff0c1e8fc8310263b6f375da88089ff7d
2018-12-26 08:35:39 -08:00
5ac95758e2 Make argument size checking consistent across CPU and CUDA for torch.gesv (#15430)
Summary:
There is an inconsistency in the size of arguments for gesv, which is fixed in this PR.

Changelog:
- Replicate check in CPU as done for CUDA
- Fix argument ordering (minor) in CUDA checking

Fixes #15328

Differential Revision: D13531167

Pulled By: soumith

fbshipit-source-id: c4b4e4fc12880208d08e88d1e47e730ac98c2ad3
2018-12-26 08:32:28 -08:00
f636dc9276 clang format world (#15524)
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.

Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524

Differential Revision: D13547989

Pulled By: suo

fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
2018-12-26 06:55:01 -08:00
d4712ee218 Added correct isinf handling for Integral tensors (#15489)
Summary:
Currently, torch.isinf on an integral tensor raises `RuntimeError: value cannot be converted to type int16_t without overflow: inf`.
This PR suppresses the error and returns false (0) for all integral tensors. The behavior is also consistent with np.isinf.
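
A short sketch of the fixed behavior (note: boolean results come back as `uint8` tensors in this era):

```python
import torch

t = torch.tensor([1, 2, 3], dtype=torch.int16)
print(torch.isinf(t))                              # all False: ints are never inf
print(torch.isinf(torch.tensor([float('inf')])))   # True for actual infinities
```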
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15489

Reviewed By: zou3519

Differential Revision: D13540786

Pulled By: flashhack

fbshipit-source-id: e730dea849da6a59f3752d347bcfbadfd12c6483
2018-12-26 06:36:09 -08:00
d602ddcda3 Trivial comment update in autograd/function.h (#15529)
Summary:
I removed the explanation of the `num_inputs` parameter. This parameter was removed in #8168

colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15529

Differential Revision: D13547854

Pulled By: soumith

fbshipit-source-id: 8a9ac58f2c93a2533b82ec63089477166ed0bcb9
2018-12-26 02:25:54 -08:00
6e4be0af2e Fix failed type cast in Windows Debug Build (#15333)
Summary:
Fixes #15330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15333

Differential Revision: D13531317

Pulled By: soumith

fbshipit-source-id: b956f27bd7fa33cbdf405338fcbcbc7df2fd629f
2018-12-26 00:48:58 -08:00
12e0ed55b4 Upgrade MKL-DNN to version 0.17 and static build MKL-DNN (#15504)
Summary:
Upgrade MKL-DNN to 0.17 and statically build MKL-DNN to fix the potential build error due to an old mkldnn version on the host system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15504

Differential Revision: D13547885

Pulled By: soumith

fbshipit-source-id: 46f790a3d9289c1e153e51c62be17c5206ea8f9a
2018-12-25 22:56:51 -08:00
2fe5c29d81 remove legacy from docs (#15112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15112

Differential Revision: D13547845

Pulled By: soumith

fbshipit-source-id: 61e3e6c6b0f6b6b3d571bee02db2938ea9698c99
2018-12-25 21:57:54 -08:00
60b13d1f71 Use at::zeros instead of torch::zeros in non-differentiable example (#15527)
Summary:
There was a typo in the C++ docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15527

Differential Revision: D13547858

Pulled By: soumith

fbshipit-source-id: 1f5250206ca6e13b1b1443869b1e1c837a756cb5
2018-12-25 21:50:17 -08:00
2ed95c5871 Fix the compare logic in function overflows for MSVC (#15499)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15497.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15499

Differential Revision: D13547835

Pulled By: soumith

fbshipit-source-id: a674da93bf905a0b81f0cc60449ccb97c2746926
2018-12-25 21:50:15 -08:00
521894c490 Allow converting char tensor to numpy; add [fi]info.min (#15046)
Summary:
https://github.com/pytorch/pytorch/pull/14710 with test fixed.

Also added `finfo.min` and `iinfo.min` to get castable tensors.
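
A minimal sketch of the new limit accessors (values are the standard two's-complement/IEEE limits):

```python
import torch

print(torch.iinfo(torch.int8).min)     # -128
print(torch.finfo(torch.float32).min)  # most negative finite float32, ~ -3.4e38
```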

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15046

Reviewed By: soumith

Differential Revision: D13429388

Pulled By: SsnL

fbshipit-source-id: 9a08004419c83bc5ef51d03b6df3961a9f5dbf47
2018-12-24 09:11:24 -08:00
b7bc49ad70 Port replication_pad1d to ATen (#15507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15485

port replication_pad1d

Reviewed By: ezyang

Differential Revision: D13531920

fbshipit-source-id: dcd64ebd2c24b7431996231b8d5addfb600b1072
2018-12-24 06:34:02 -08:00
ad6799537e Support stateful dataset (#15096)
Summary:
This currently re-implements the dataloader for stateful datasets. Outstanding work:
- Refactor DataLoader and DataLoader2 to have common base classes and only differ in specific pieces of logic,
- Figure out how to not duplicate the `MapDataset` logic for stateful vs. non-stateful
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15096

Differential Revision: D13522043

Pulled By: goldsborough

fbshipit-source-id: 08e461ca51783047f11facc4d27dfa2e4f1e4c2a
2018-12-24 06:26:40 -08:00
8cd917812b put interactive prompt in bash (#15521)
Summary:
This makes compatibility with different versions of Python a little bit simpler, and fixes a problem where stdin wasn't being read from the terminal properly in the prompt.

zdevito This should fix your EOF exception.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15521

Differential Revision: D13546358

Pulled By: suo

fbshipit-source-id: fb7551a86c888196831c046d9d9848e7ff05b925
2018-12-24 05:37:46 -08:00
f8a56bf476 Fix the iterator category for torch::data::Iterator (#15500)
Summary:
Try to fix https://github.com/pytorch/pytorch/issues/14410.
Additional info: per this [page](https://stackoverflow.com/questions/14062297/canonical-way-to-define-forward-output-iterator), changing the category to `input_iterator_tag` doesn't mean the `output_iterator_tag` capability is lost.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15500

Differential Revision: D13545773

Pulled By: soumith

fbshipit-source-id: 327bfb7be83d53e42925e0e391b2a4277e3a1b36
2018-12-23 19:49:44 -08:00
c07647814b Precommit hook: just warn if no clang-tidy (#15514)
Summary:
The precommit hook shouldn't hard fail if there's no `clang-tidy`, just warn and omit the check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15514

Differential Revision: D13545776

Pulled By: suo

fbshipit-source-id: 9bf3f8ee18703c6d1a39eb7776092fb5e120d2a1
2018-12-23 14:38:13 -08:00
4a716250cc Add torch.rot90 to torch.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15512

Differential Revision: D13545775

Pulled By: soumith

fbshipit-source-id: 2a8896571745630cff4aaf3d5469ef646bdcddb4
2018-12-23 14:31:11 -08:00
51f1c4fea5 fix parallelization detection for CPU foreach_reduced_elt (#15483)
Summary:
This does two things:

(1): revert #15114 , which is incorrect and actually just completely disables parallelization in this function (because `at::get_num_threads` returns `-1` unless it has been set explicitly)

(2): Fix our (FB-internal) failing tests that #15114 was intended to fix, by still working correctly in a setup where `#ifdef _OPENMP` is set and `omp_get_max_threads() > 1` , but `#pragma omp parallel` only launches one thread. I believe such an unusual situation only exists in certain unit tests within FB infra but we still need it to work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15483

Differential Revision: D13538940

Pulled By: umanwizard

fbshipit-source-id: a3362c7ac7327ced350d127bb426f82c59e42732
2018-12-23 12:51:40 -08:00
4e4ef0cffb add rowwise adagrad lp test (#15082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15082

We didn't have unit test for low-precision rowwise adagrad

Reviewed By: chocjy

Differential Revision: D13300732

fbshipit-source-id: 46e7bdfc82c5a6855eeb6f653c0a96b0b3a20546
2018-12-22 10:25:39 -08:00
e012b183dd handle empty inputs to SparseLengthsMean correctly (#15389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15389

SparseLengthsMean was generating uninitialized data for empty inputs (lengths == 0). We should return zeros.
The unit tests were also not covering this special case, which this diff fixes.

Reviewed By: salexspb

Differential Revision: D13515970

fbshipit-source-id: 3c35265638f64f13f0262cee930c94f8628005da
2018-12-21 22:20:14 -08:00
58a7f2aed1 Add pthreadpool_create and pthreadpool_destroy (#15492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15492

Add pthreadpool_create and pthreadpool_destroy, which are used by NNPACK tests.

Reviewed By: Maratyszcza

Differential Revision: D13540997

fbshipit-source-id: 628c599df87b552ca1a3703854ec170243f04d2e
2018-12-21 20:28:18 -08:00
90aa21e795 Metadata for input/output formats in model file proto. (#15252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15252

We would like to extend the model file format to include strongly typed, semantic information
about the model inputs and outputs.

The goal is for a user to be able to consider a model file like a function with
a well defined API describing what the inputs and outputs would be.

Reviewed By: dzhulgakov

Differential Revision: D13009915

fbshipit-source-id: 5df124a876ad03c05fbdaacae0eab659637734c1
2018-12-21 17:42:38 -08:00
f3a588fede add len to nativeResolver (#15488)
Summary:
(otherwise len is not resolvable using torch::jit::compile)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15488

Differential Revision: D13539991

Pulled By: zdevito

fbshipit-source-id: 3ba85fa7b1adb163f9229c568f7997d22321903d
2018-12-21 16:47:15 -08:00
934fc28656 Remove NoneGenerator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15335

Differential Revision: D13540357

Pulled By: driazati

fbshipit-source-id: a289e5944b65872103f68faac74e18f10e7c6fff
2018-12-21 16:33:37 -08:00
1dcf2ea096 Add self to Python printer reserved words (#15318)
Summary:
This adds `self` to the list of reserved words and also sorts the lines and prevents the tracer from naming values 'self' (which happens in torch/tensor.py)

Fixes #15240
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15318

Differential Revision: D13540192

Pulled By: driazati

fbshipit-source-id: 46ae02e51b1b31d5c62110fa83ba258ea6bada27
2018-12-21 16:02:07 -08:00
70aafad08a AD support for adaptive_avg_pool2d (#15459)
Summary:
This adds AD support for adaptive_avg_pool2d, which is necessary for resnet50 in pytorch/vision:master. cc: soumith asuhan dlibenzi

apaszke, I saw the autodiff bug you fixed in #15403; as it doesn't prevent this PR from passing, I'll leave it for your PR to fix. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15459

Differential Revision: D13534732

Pulled By: ailzhang

fbshipit-source-id: 4e48b93e35d5ecfe7bd64b6a132a55b07843f206
2018-12-21 15:38:24 -08:00
01be9b7292 Handling nullptr case
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15467

Reviewed By: Maratyszcza

Differential Revision: D13536504

fbshipit-source-id: ab46ff6bb4b6ce881c3e29d7e6a095ea62289db4
2018-12-21 15:08:00 -08:00
235d47760b Relax check on outputs (#15458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15458

Many nets in the wild seem to have outputs that are never produced by the net.

Reviewed By: ZolotukhinM

Differential Revision: D13534185

fbshipit-source-id: 2b23b39c28404c53f68868f3bf6df53c5fea9eab
2018-12-21 14:19:37 -08:00
6bf05bfde6 allow non-final returns (#15463)
Summary:
This PR allows a subclass of programs whose return statements are not final in the graph.

`final_returns.h` contains a comment describing how this is accomplished.
To minimize complexity in `compiler.cpp`, this pass is done as an AST-to-AST rewrite before the compiler runs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15463

Differential Revision: D13538962

Pulled By: zdevito

fbshipit-source-id: 67105ca873351825b4a364092ab1873779f3e462
2018-12-21 14:01:33 -08:00
3da4a04733 Fixed trivial typos in Dropout2D and Dropout3D classes (#15200)
Summary:
Fixed trivial typos in Dropout2D and Dropout3D classes

weiyangfb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15200

Differential Revision: D13537888

Pulled By: ezyang

fbshipit-source-id: 8fb06027ca663a2e4bfa016af400698ae3c88ad1
2018-12-21 11:58:10 -08:00
ff8fbc4f23 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 59d7a5b82fb78bc2d2285d0896e35c262512ffb9
2018-12-21 11:47:05 -08:00
7e2ec24886 eq_fixes (#15475)
Summary:
fixes #15464 .
cc : ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15475

Differential Revision: D13537812

Pulled By: ezyang

fbshipit-source-id: 127adf612ac8b3d3a64baa3d12a53daba7d3e4b8
2018-12-21 11:43:06 -08:00
d9cad71b36 Enable running collect_env.py without building PyTorch (#15468)
Summary: Closes #15346

Differential Revision: D13537873

Pulled By: ezyang

fbshipit-source-id: 7765ce4108dae9479d8900c0815cc2f174596a83
2018-12-21 11:37:43 -08:00
ac506f5820 Back out "[nomnigraph][executor] computeChains with nomnigraph" (#15451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15451

Original commit changeset: ccd050bfead6

Reviewed By: ilia-cher

Differential Revision: D13533161

fbshipit-source-id: 1d0dcd54c2e3875aab015f3e996693e67a449b87
2018-12-21 11:09:27 -08:00
acbd9c49b0 Direct FBGEMM integraton into ATen (#13777)
Summary:
This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation:

1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel.
2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value.
3) Biases are unquantized
4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops *must* be run with FBGEMM

The API can be seen in the added test case. Highlights are:
1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above.
2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things.
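
A minimal, hedged sketch of the post-processing flow described above; it assumes an FBGEMM-capable machine, since the resulting ops fail loudly otherwise:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
# Walk the module hierarchy and swap every nn.Linear for a QuantizedLinear
# (int8 packed weights, float inputs/outputs), as described above.
qmodel = torch.jit.quantized.quantize_linear_modules(model)
```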
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777

Differential Revision: D13383276

Pulled By: jamesr66a

fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344
2018-12-21 10:35:51 -08:00
614121c1ef Replace getargspec with getfullargspec (#15396)
Summary:
Replace `getargspec` with `getfullargspec` to resolve test warnings. Fixes #15344 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15396

Differential Revision: D13529548

Pulled By: zou3519

fbshipit-source-id: 50d3be92423a9ce89bc4895b67569663e1abbaa6
2018-12-21 09:40:33 -08:00
2b23ba8ef0 The benchmark binary support multiple batches in one run (#15443)
Summary:
It is sometimes beneficial to run multiple batches in one benchmark and check the aggregated results.

This PR enables this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15443

Reviewed By: llyfacebook

Differential Revision: D13531129

Pulled By: sf-wind

fbshipit-source-id: 553a762a5cbadf5a3d9fd6af767ae34899bc1aa2
2018-12-21 08:45:41 -08:00
433db13b48 Move torch.logspace to ATen and parallelize on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15438

Reviewed By: ezyang

Differential Revision: D13529626

Pulled By: gchanan

fbshipit-source-id: 896e8afee3d6b5a706c4f5815b91ba6bd8af6672
2018-12-21 08:24:33 -08:00
61cc701dd7 Fix cudnn dropout (#15473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15473

Revert accidental changes introduced in D13335176

IntList is a range, and copying it just copies pointers. The pointers would then point either to deallocated memory or to the same memory, causing the equality check to always pass.

Reviewed By: ezyang

Differential Revision: D13537131

fbshipit-source-id: c97b3533be689bb4cdadd9e612f1284ac50e4bda
2018-12-21 08:15:44 -08:00
f52f68bcf9 format specialized_segment_ops_test.py to prepare D13515970 (#15408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15408

Applied formatting to specialized_segment_ops_test.py to prepare D13515970

Reviewed By: salexspb

Differential Revision: D13520300

fbshipit-source-id: c3250b6abe8087c607f65ae60d1da61bd46c342b
2018-12-20 23:44:47 -08:00
cb79e1b3a5 Clean up onnxifi transformation code (#15453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15453

Just move things around to facilitate further development. No logic change.

Reviewed By: rdzhabarov

Differential Revision: D13533959

fbshipit-source-id: eebab1306939e802aacffb24a711d372fd67916c
2018-12-20 22:06:47 -08:00
26b04523b1 Record Caffe2's current stream ID in c10_cuda. (#15174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15174

Previously, Caffe2 maintained a separate per-thread per-device
current logical CUDA stream ID.  In this PR, we switch Caffe2 over
to using c10::Stream to manage the current stream, and also
manage the allocation of cudaStream_t objects.

This results in a slight behavior change: previously, Caffe2
would have been willing to allocate an arbitrary number of
CUDA streams, depending on how high the logical stream IDs
went.  The c10::Stream pool has a fixed number of streams, once
you exceed it, it wraps around.

Reviewed By: dzhulgakov

Differential Revision: D13451550

fbshipit-source-id: da6cf33ee026932a2d873835f6e090f7b8a7d8dc
2018-12-20 21:54:05 -08:00
3353064060 Add option to automatically handle unsorted variable-length sequences in RNNs (#15225)
Summary:
Fixes #3584.

Motivation: manually sorting sequences, packing them, and then unsorting them
is something a lot of users have complained about doing, especially when we can
offer library support for them.

Overview: we internally sort sequences before packing them and store a list of
`unsorted_indices` that represent how to unsort the sequences inside
PackedSequence. The packing helper functions return PackedSequence with the
`permutation` field and the unpacking helper functions use it to unsort.

To implement this, the following changes were made:
- PackedSequence now keeps `sorted_indices` and `unsorted_indices`.
  These two can be thought of as permutations and are inverses of each other.
  `sorted_indices` is how the sequences were sorted; `unsorted_indices` is how
  to unsort the sequences.
- Added an `enforce_sorted` argument to pack_sequence and pack_padded_sequence
  that maintains the legacy behavior of error-ing out on unsorted-sequences.
  When `enforce_sorted=True`, these functions maintain their ONNX exportability.
- pack_sequence(sequences, enforce_sorted) takes in unsorted sequences.
- pack_padded_sequence can take in a padded tensor that represents padded,
  unsorted sequences.
- pad_packed_sequence unsorts the PackedSequence such that it is still the
  inverse operation of packed_padded_sequence.
- RNNs apply `sort_indices` to their input hidden state and apply
  `unsort_indices` to their output hidden state. This is to ensure that the
  hidden state batches correspond to the user's ordering of input sequences.

NOT BC-Breaking
- The default for pack_sequence and pack_padded_sequence is
  `enforce_sorted=True` to avoid breaking ONNX export. To use the new
  functionality, pass in `enforce_sorted=False`
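
A minimal sketch of the new opt-in behavior (shapes illustrative):

```python
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

seqs = [torch.randn(2, 4), torch.randn(5, 4), torch.randn(3, 4)]  # unsorted lengths
packed = pack_sequence(seqs, enforce_sorted=False)  # sorts internally
padded, lengths = pad_packed_sequence(packed, batch_first=True)
print(lengths)  # tensor([2, 5, 3]) -- original ordering restored
```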

Testing Plan
- Modified TestNN.test_pack_sequence, TestNN.test_packed_padded_sequence,
  and TestNN.test_variable_sequence (RNN test) to check the behavior
  of unsorted sequences, sorted sequences, and sorted sequences with
  enforce_sorted=True
- test/test_jit.py has a test to see if RNNs are exportable with
  enforce_sorted=True

cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15225

Reviewed By: soumith

Differential Revision: D13507138

Pulled By: zou3519

fbshipit-source-id: b871dccd6abefffca81bc4e3efef1873faa242ef
2018-12-20 17:37:18 -08:00
52699f0754 Change default value of unique to 'sorted=True'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15379

Differential Revision: D13531287

Pulled By: ezyang

fbshipit-source-id: 1512da7d660dc413688d99264e6434897c3ac78c
2018-12-20 17:09:08 -08:00
4ee1c2c632 add denormal options (ftz and daz)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15423

Reviewed By: yinghai

Differential Revision: D13526340

fbshipit-source-id: de2ecc717b4f778f33a8bf940ed144dbb230c7a8
2018-12-20 17:04:39 -08:00
3a6d473b49 collect_env fix (#15447)
Summary:
fixes #15214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15447

Differential Revision: D13531523

Pulled By: ezyang

fbshipit-source-id: 8f24f5ae9f3e78f6c5c9ee702ba14faca7aa297a
2018-12-20 16:56:34 -08:00
a178f0a316 Remove unused field in jit script module deserializer (#15439)
Summary:
A little bit clean up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15439

Reviewed By: zrphercule

Differential Revision: D13532015

Pulled By: houseroad

fbshipit-source-id: 2fb1e01fc28549c7e78af6c65ee68339950bc7da
2018-12-20 16:18:40 -08:00
8883ac4b58 Revert D13494873: [pytorch][PR] Fixing ONNX export of logical ops to have correct output datatype
Differential Revision:
D13494873

Original commit changeset: 069d2f956a5a

fbshipit-source-id: 80ef10b2eb623a63da51dc2e4874f2ee446f426d
2018-12-20 15:56:31 -08:00
95a0e2c421 Fix ASAN div by zero error in rotated GenerateProposals op (#15415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15415

Was introduced in D13429770

Reviewed By: SuperIRabbit

Differential Revision: D13524114

fbshipit-source-id: a890eb3b97c24952c361155d1432a801499f4ddd
2018-12-20 15:44:15 -08:00
ed5b584f65 Tensor construction codemod(ResizeLike) - 7/7 (#15087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15087

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419765

fbshipit-source-id: 34d695309a66723281429610a12544598c507d74
2018-12-20 15:33:07 -08:00
d6cbcb43c5 allow numpy-like boolean-list indexing in pytorch (#14932)
Summary:
Suggested fix to issue #6773; the fix allows numpy-like boolean-list indexing in PyTorch.
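
A minimal sketch of the indexing this enables:

```python
import torch

t = torch.tensor([10, 20, 30])
print(t[[True, False, True]])  # tensor([10, 30]), as in numpy
```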
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14932

Differential Revision: D13398795

Pulled By: ezyang

fbshipit-source-id: 67f8daf9829db2550ff76d2bde673be6dd2708cd
2018-12-20 15:33:06 -08:00
f56217af3b Doc improvement on DDP (#15440)
Summary:
I noticed that some users don't even know we have this support. Adding it to the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15440

Differential Revision: D13531045

Pulled By: teng-li

fbshipit-source-id: 9757c400c0010608758c754df04e603b36035a10
2018-12-20 14:51:57 -08:00
cde26c659e Fix type annotation error. (#15448)
Summary:
According to mypy, the trailing -> None is mandatory.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15448

Differential Revision: D13532179

Pulled By: ezyang

fbshipit-source-id: e8972f8c9ada4657c518cd7bcd46e489ab8ddf5f
2018-12-20 14:47:57 -08:00
c24a124fa0 Add launch bounds needed for ROCm 2.0 (#15400)
Summary:
ROCm 2.0's compiler requires launch_bounds annotations if flat work group sizes are larger than the default of 256.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15400

Differential Revision: D13531239

Pulled By: ezyang

fbshipit-source-id: c0b40600a8c332823da6c7113c644d8dba424a9c
2018-12-20 14:39:13 -08:00
1a2ec10bd4 Support enough of closures to write autograd functions (#15411)
Summary:
This PR adds enough of the infra for supporting closures (inner script functions) to allow us to express symbolic gradients using them. We do not actually ever run graphs that contain these closures. The symbolic_script infrastructure just extracts them out of the original forward graph and turns them into discrete forward/backward pairs. This cuts down on the type annotations necessary to write forward/backward pairs and aligns closely with the "differentiator" function approach to expressing reverse-mode AD.

Example:

This code:
```
import torch

r = torch.jit.CompilationUnit(
'''
def mul_forward(self, other):
    def backward(grad_output):
        grad_self = (grad_output * other).sum_to_size(self.size())
        grad_other = (grad_output * self).sum_to_size(other.size())
        return grad_self, grad_other
    return self * other, backward
''')

print(r.module.code)
```

Will produce this graph (pretty printed for clarity):

```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  backward = (self.__lambda, (other, self))
  return (torch.mul(self, other), backward)

def __lambda(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```

symbolic_script will then do some modifications to remove the unsupported prim::Function node, yielding:

```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  return (torch.mul(self, other), (other, self))

def backward(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15411

Differential Revision: D13523340

Pulled By: zdevito

fbshipit-source-id: 4d4a269460e595b16802c00ec55ae00e3e682d49
2018-12-20 14:39:11 -08:00
3fdf567752 Adding CUDA version for C2 operators generate proposals and nms (#13694)
Summary:
Related to issue #13684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13694

Reviewed By: wat3rBro

Differential Revision: D13017791

Pulled By: newstzpz

fbshipit-source-id: 4bdc58e474d8e1f6cd73a02bf51f91542a2b9d0b
2018-12-20 14:39:09 -08:00
a47749cb28 Add at::one_hot (#15208)
Summary: Closes: https://github.com/pytorch/pytorch/issues/15060
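
A hedged sketch, assuming the ATen op is surfaced in Python as `torch.nn.functional.one_hot` (as in later releases):

```python
import torch
import torch.nn.functional as F

idx = torch.tensor([0, 2, 1])
print(F.one_hot(idx, num_classes=3))
# tensor([[1, 0, 0],
#         [0, 0, 1],
#         [0, 1, 0]])
```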

Differential Revision: D13528014

Pulled By: ezyang

fbshipit-source-id: 5a18689a4c5638d92f9390c91517f741e5396293
2018-12-20 14:24:58 -08:00
2a64a78e7b Extract arguments to its own file and pass arguments to ios apps (#15413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15413

In order to pass arguments to the iOS app, the arguments need to be extracted
into their own file. Also, in the iOS app, do not use benchmark.json, which
parses the arguments.

This is an incompatible change; a hotfix needs to be added to the tests.

Reviewed By: llyfacebook

Differential Revision: D13523240

fbshipit-source-id: b559cc7f52d8f50ee206a7ff8d7b59292d855197
2018-12-20 13:31:48 -08:00
f0f9277c3c Fixing ONNX export of logical ops to have correct output datatype (#15185)
Summary:
Currently the PyTorch ONNX exporter exports the logical ops (`lt`, `gt`, `le`, `ge`, `eq`) with the output type of the corresponding ONNX ops set to `tensor(uint8)`. But the ONNX spec allows only `tensor(bool)`, which is why models that have these ops fail to load properly.

This issue is captured in https://github.com/pytorch/pytorch/issues/11339. Part of this issue, relating to the allowed input types, has been fixed in ONNX spec by houseroad. This PR fixes the other part pertaining to output type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15185

Differential Revision: D13494873

Pulled By: houseroad

fbshipit-source-id: 069d2f956a5ae9bf0ac2540a32594a31b01adef8
2018-12-20 12:37:27 -08:00
cb0b096f2b Miscellaneous small doc fixes (#15373)
Summary:
This PR makes some small changes for better consistency in our README and
CONTRIBUTING docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15373

Differential Revision: D13512753

Pulled By: driazati

fbshipit-source-id: 44398ad1894eef521d5f5acb1d06acaad67728cf
2018-12-20 12:33:40 -08:00
cac02034f6 Extend README for ATen/native/cpu (#15437)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15437

Differential Revision: D13529436

Pulled By: ezyang

fbshipit-source-id: 2e2193d54ea7f7626fe7392e4d0c130c2f87a76f
2018-12-20 11:17:00 -08:00
06a7cb5901 Implementing cuda kernel for tril_indices and triu_indices (#15203)
Summary:
Followup PR of #14904, and the stretch goal of #12653.

Directly calculate coordinates in the original tensor using column index in the result tensor. Every GPU thread takes care of a column (two numbers) in the output tensor.

The implementation detects and handles precision loss while calculating the square root of an `int64_t` variable, and supports tensors with up to `row * column = 2 ^ 59` numbers.

Algorithm details are described in [comments of TensorFactories.cu](23ddb6f58a/aten/src/ATen/native/cuda/TensorFactories.cu (L109-L255)).
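
A minimal sketch of the op itself (this PR accelerates its CUDA path):

```python
import torch

print(torch.tril_indices(3, 3))
# tensor([[0, 1, 1, 2, 2, 2],
#         [0, 0, 1, 0, 1, 2]])
```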

zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15203

Reviewed By: zou3519

Differential Revision: D13517695

Pulled By: mrshenli

fbshipit-source-id: 86b305d22cac08c8962a3b0cf8e9e620b7ec33ea
2018-12-20 10:23:38 -08:00
5c66662e58 Revert D13498974: [pytorch][PR] [jit] Add self to Python printer reserved words
Differential Revision:
D13498974

Original commit changeset: 488efb661476

fbshipit-source-id: 3b991bccf4cf2ffdafe70f145aff0ae2837e31f8
2018-12-20 10:02:37 -08:00
8db44eda01 Add support for batched pdist (#12302)
Summary:
This updates pdist to work for batched inputs, and updates the
documentation to reflect issues raised.

closes #9406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12302

Reviewed By: ezyang

Differential Revision: D13528485

Pulled By: erikbrinkman

fbshipit-source-id: 63d93a6e1cc95b483fb58e9ff021758b341cd4de
2018-12-20 09:41:08 -08:00
7a764fe270 multi-dim standard deviation for CUDA. (#14990)
Summary:
This is the CUDA version of #14535.
It refactors Reduce.cuh to allow more general classes of reductions to be performed -- we no longer assume that the temporary data returned during reduction is just one scalar, and instead allow an arbitrary accumulate type.
We also allow 64-bit indexing when necessary, since in general we will no longer be able to accumulate directly in the output. (In the cases when we can, we continue to split the tensors until they can be addressed with 32-bits, as before).
As an initial use-case, we implement `std` in multiple dimensions.
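
A minimal sketch of the multi-dimensional reduction (falls back to CPU when no GPU is present):

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(4, 5, 6, device=device)
print(x.std(dim=(0, 2)).shape)  # torch.Size([5]): dims 0 and 2 reduced away
```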
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14990

Differential Revision: D13405097

Pulled By: umanwizard

fbshipit-source-id: a56c24dc2fd5326d417632089bd3f5c4f9f0d2cb
2018-12-20 08:56:32 -08:00
5e624948b6 Add self to Python printer reserved words (#15318)
Summary:
This adds `self` to the list of reserved words and also sorts the lines and prevents the tracer from naming values 'self' (which happens in torch/tensor.py)

Fixes #15240
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15318

Differential Revision: D13498974

Pulled By: driazati

fbshipit-source-id: 488efb661476cdcdb8ecb9cb48942f02e3c1e611
2018-12-20 02:29:09 -08:00
eb5d28ecef Pretty printing of C++ modules (#15326)
Summary:
A long outstanding nicety: pretty printing of C++ modules. E.g.
```
  Sequential sequential(
      Linear(10, 3),
      Conv2d(1, 2, 3),
      Dropout(0.5),
      BatchNorm(5),
      Embedding(4, 10),
      LSTM(4, 5));
std::cout << sequential;
```
prints
```
torch::nn::Sequential(
  (0): torch::nn::Linear(in=10, out=3, with_bias=true)
  (1): torch::nn::Conv2d(input_channels=1, output_channels=2, kernel_size=[3, 3], stride=[1, 1])
  (2): torch::nn::Dropout(rate=0.5)
  (3): torch::nn::BatchNorm(features=5, eps=1e-05, momentum=0.1, affine=true, stateful=true)
  (4): torch::nn::Embedding(count=4, dimension=10)
  (5): torch::nn::LSTM(input_size=4, hidden_size=5, layers=1, dropout=0)
)
```

apaszke ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15326

Differential Revision: D13518986

Pulled By: goldsborough

fbshipit-source-id: 63bf753672f0e348951de3645208f263581de5fb
2018-12-19 21:55:49 -08:00
2ef0f1222a Restructuring prof dag counters (#13321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13321

This diff simply refactors the `ProfDAGCounters` into two:
* `ProfDAGCounters` that gathers stats at runtime.
* `ProfDAGReport` which holds the report from the gathered stats once stats collection is done.

This refactoring allow us to implement `+=` for `ProfDAGReport`, which can be used for aggregating same-net reports on each host.

Reviewed By: donglimm

Differential Revision: D12837988

fbshipit-source-id: 0470c5fd6437f12711cab25a15a12965d79b2a91
2018-12-19 21:48:30 -08:00
b89b46abfb Remove python_default_init from ATen and use Optional (#15234)
Summary:
Optional cleanup. This PR removes python_default_init from the YAML files and the codegen, and utilizes the optional type to do the work.

This also fixes the bug in #13149 to correctly adopt the as_strided backward.

Fixes #9941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15234

Differential Revision: D13502044

Pulled By: wanchaol

fbshipit-source-id: 774b61fc4414482cf11d56e22bd0275aefb352a4
2018-12-19 21:38:50 -08:00
3fc889e976 Tensor construction codemod(ResizeLike) - 1/7 (#15073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15073

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13419563

fbshipit-source-id: 8c284405fa3a867303216df876ee6b20d8a46551
2018-12-19 21:38:48 -08:00
2db742fc95 Do not use fork to invoke test scripts in pytorch rocm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14600

Differential Revision: D13523937

Pulled By: bddppq

fbshipit-source-id: 1493fdd051283650081d7944bb2bd7f0c4c44990
2018-12-19 21:35:16 -08:00
1071e92335 Replace Vec256<T>::size with constexpr method (#15406)
Summary:
Stack:
**#15406 Replace Vec256<T>::size with constexpr method** [💛](https://our.intern.facebook.com/intern/diff/D13519902/)

See Note [constexpr static function to avoid odr-usage compiler bug]
for detailed justification.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15406

Differential Revision: D13523774

Pulled By: ezyang

fbshipit-source-id: c0ab44298bb2ef3d68a66d026fc6bc156a909a6b
2018-12-19 20:33:45 -08:00
9abd755a76 Make cpuinfo logging less verbose (#15405)
Summary:
Log only errors in cpuinfo.

Fix to #15401 and #15398
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15405

Differential Revision: D13526251

Pulled By: Maratyszcza

fbshipit-source-id: 4d9eba0912f7b45093bed2e343cd77a151ffa8c4
2018-12-19 20:23:36 -08:00
88bf683cbc Support error handling in forked threads (#14523)
Summary:
Save error info in the future for parent thread to pick up. Throw the error
when the thread is the root thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14523

Differential Revision: D13251756

Pulled By: highker

fbshipit-source-id: b40f9a45665e1a934743f131ec5e8bad5622ce67
2018-12-19 18:54:46 -08:00
5dd5ef3214 default options for OutputTensorCopyFrom (#15248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15248

OutputTensorCopyFrom takes four arguments: an index, a source Tensor, TensorOptions, and whether we want to perform an async call.
We want to provide defaults for TensorOptions: (1) default the device to context_.device(); (2) default the dtype to input.dtype(). Users can also explicitly provide these options to override the default values.

The next diff will change the order of the TensorOptions parameter so that users don't need to write down tensor options unless they want to override them.

Reviewed By: dzhulgakov

Differential Revision: D13453824

fbshipit-source-id: 87401f81c7c3f9fd3d8936c710e6c2e04a59b689
2018-12-19 18:14:47 -08:00
a00cfd1e9b Fix Module::copy_into
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15393

Differential Revision: D13519477

Pulled By: highker

fbshipit-source-id: d62928597ec0700b550e7cf481c8febae57b200d
2018-12-19 17:09:59 -08:00
0b219538cf add unpack_outputs to inlineCallTo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15382

Differential Revision: D13518844

Pulled By: zdevito

fbshipit-source-id: 981936988080af80629b70bf5f6dfa52ceb09c2f
2018-12-19 15:11:59 -08:00
07d20b1e7c Fix documentation (#15372)
Summary:
Current documentation example doesn't compile. This fixes the doc so the example works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15372

Differential Revision: D13522167

Pulled By: goldsborough

fbshipit-source-id: 5171a5f8e165eafabd9d1a28d23020bf2655f38b
2018-12-19 15:04:24 -08:00
055de167d5 computeChains with nomnigraph (#15366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15366

swap the old implementation with a slightly easier one to understand

I ran the tests and compared the number of chains against the old algorithm. This one outperforms it on every test, but we have yet to see whether that impacts performance at all.

old chain 34 nomnigraph chain 25
old chain 46 nomnigraph chain 34
old chain 228 nomnigraph chain 188
old chain 397 nomnigraph chain 338

Reviewed By: ilia-cher

Differential Revision: D13057451

fbshipit-source-id: ccd050bfead6eb94ab9c7b0a70b09a22c2b9e499
2018-12-19 15:04:23 -08:00
9217bde807 Refactor dataloader.py (#15331)
Summary:
Same as #14668, and was approved there.

ailzhang , please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!

Below is the original description at #14668:

While working on the tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must live at the top module level. Adding more functionality to `dataloader.py` will only make things worse.

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.

No functionality is changed, except that  I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331

Reviewed By: yf225

Differential Revision: D13503120

Pulled By: ailzhang

fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
2018-12-19 12:36:03 -08:00
41e7e1bc40 Rename potrs to cholesky_solve (#15334)
Summary:
Changelog:
- Renames `potrs` to `cholesky_solve` to remain consistent with TensorFlow and SciPy (not really, they call their function chol_solve)
- The default argument for upper in cholesky_solve is False. This allows a seamless interface between `cholesky` and `cholesky_solve`, since the `upper` argument in both functions means the same thing.
- Rename all tests
- Create a tentative alias for `cholesky_solve` under the name `potrs`, and add a deprecation warning to discourage its use.
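
A minimal sketch of the renamed API (using this era's `torch.cholesky`, whose lower-triangular default matches `upper=False` here):

```python
import torch

a = torch.randn(3, 3)
a = a @ a.t() + 3 * torch.eye(3)  # make symmetric positive definite
u = torch.cholesky(a)             # lower-triangular factor by default
b = torch.randn(3, 2)
x = torch.cholesky_solve(b, u)    # solves a @ x = b; upper=False by default
```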
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15334

Differential Revision: D13507724

Pulled By: soumith

fbshipit-source-id: b826996541e49d2e2bcd061b72a38c39450c76d0
2018-12-19 12:31:24 -08:00
33018e4e09 centralize side effects ops as node method (#15188)
Summary:
A number of different passes rely on whether a node has side effects. This centralizes the list of side effectful ops in one place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15188

Differential Revision: D13508438

Pulled By: eellison

fbshipit-source-id: 2143e782b787731ce007b6dcd50cbde30e1b8dd0
2018-12-19 10:52:54 -08:00
560530aeec Optional ScalarType support for native functions & JIT (#15154)
Summary:
For #6593 and #9515

This completes the support for optional<ScalarType> in native, JIT and autograd.

Note: Mostly following the existing implementation for optional<Scalar> that was added in https://github.com/pytorch/pytorch/pull/12582.

This PR introduces a way to make functions accept an optional dtype and it will unblock #9515 by allowing the `dtype` param for type promotion interface:
```
func: name(inputs, *, ScalarType? dtype=None, Casting casting=same_kind)
```

An alternative approach could have been using `ScalarType::Undefined` for the same purpose but without optional, though it would have been a bit hacky.
```
func: name(inputs, *, ScalarType dtype=Undefined, Casting casting=same_kind)
```

Here's an example use of this in action: 971f69eac6

There are already a bunch of native functions that were getting optional `dtype` through function overloading. https://github.com/pytorch/pytorch/pull/15133 is the attempt to migrate all of those. I will send those changes separately after this since some functions (e.g. sum) need quite a bit of change in the codebase. See the commits over there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15154

Differential Revision: D13457760

Pulled By: tugrulates

fbshipit-source-id: 706134f0bd578683edd416b96329b49a1ba8ab48
2018-12-19 10:45:35 -08:00
54d4fe3f49 Implement 'to' on ScriptModules (#15340)
Summary:
Following #6008
Fixes "Implement 'to' on ScriptModules #7354"

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15340

Differential Revision: D13506646

Pulled By: zdevito

fbshipit-source-id: 318fea2e8e51a37ce9844efa4c8db67d45a66317
2018-12-19 10:41:23 -08:00
1d94a2bee3 Update cpuinfo submodule (#15385)
Summary:
Pull cpuinfo changes that should make it work on AWS Lambda servers (which don't have `/sys/devices/system/cpu/{possible,present}` files, and probably don't mount sysfs at all).

I'm not 100% sure it will fix the issue, but getting this update in would make it easier for users to test using a nightly build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15385

Reviewed By: soumith

Differential Revision: D13517467

Pulled By: Maratyszcza

fbshipit-source-id: e8e544cd1f9dad304172ebb7b6ba7a8ad7d34e66
2018-12-19 07:31:45 -08:00
cbde820bc3 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: dfbdae40e505c46cd64751c6ec107c84f9434131
2018-12-18 23:37:34 -08:00
cd8dd49fba race condition fix of using mutable_data inside OPENMP region for batched matmul (#15371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15371

Similar to D13387692:

Never call mutable_data from an OpenMP region!!!

Reviewed By: jspark1105

Differential Revision: D13511259

fbshipit-source-id: 100812d2a547c0a1d5018749d5fdc88162375673
2018-12-18 23:22:56 -08:00
6ca1d93473 add whitelisted clang-format checks (#15254)
Summary:
This PR adds clang-format automation:
- It only checks whitelisted files, so we can enable it incrementally without noise
- There is a pre-commit hook provided that will do the same check, plus prompt users to apply the clang-format changes (no change is made without the user agreeing).

My plan is to migrate over whole files at a time, clang-formatting them and then adding them to the whitelist. Doing it this way should avoid too many merge pains (the most you'll have to do is run clang-format on the affected file before rebasing).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15254

Differential Revision: D13515888

Pulled By: suo

fbshipit-source-id: d098eabcc97aa228c4dfce8fc096c3b5a45b591f
2018-12-18 22:34:20 -08:00
122b4ef41d build fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15384

Differential Revision: D13515708

Pulled By: zdevito

fbshipit-source-id: ea077cfec30edf41b85dc83c0a969d1146434145
2018-12-18 22:11:44 -08:00
0368054a6d Split up compiler.cpp (#15355)
Summary:
This separates the different parts of compiler.cpp to make their relationship more clear. In particular it adds:

* sugared_value.{h,cpp} - all the public SugaredValues that the compiler defines and a few that were inside compiler.cpp
* type_parser.{h, cpp} - Turns TreeRef's defining types into TypePtr
* schema_matching.{h, cpp} - infrastructure for matching arguments against overloaded schema and emitting builtin operators with a particular schema.
Retains:
* compiler.{h, cpp} - now responsible simply for the `defineMethodsInModule` infra structure.

Some utility functions like inlineCallTo have moved to ir.h.

The only thing that is not a move is some changes in module.h/cpp that remove multiple returns from `Method::emit_call_to`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15355

Reviewed By: suo, wanchaol

Differential Revision: D13507524

Pulled By: zdevito

fbshipit-source-id: 69ec936a9ff1a383c12a883616346b219c72e393
2018-12-18 19:43:35 -08:00
6ab2e7442d Autograd using torchscript (#14604)
Summary:
This PR enables autodiff to use the forward/backward graph compiled from Python code, instead of using symbolic gradients (modifying the original graph directly).

We put the map in a separate .h file for now to wait for the native_functions.yaml and derivatives.yaml merge. This should ideally go into native_functions.yaml eventually.

This PR should be enough to unblock us for now, we can start writing gradients for aten functions in python.

Differential Revision: D13494635

Pulled By: ailzhang

fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf
2018-12-18 19:10:57 -08:00
4928c76415 Minor clean up for test_jit (#15368)
Summary:
* remove None args in functional tests
* remove some expect files that are not necessary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15368

Differential Revision: D13512349

Pulled By: wanchaol

fbshipit-source-id: 304cffff966487d15c373057ae8ad114ef8aa7f9
2018-12-18 18:26:37 -08:00
f3bff2d500 Add RNNCell modules to Script standard library (#14695)
Summary:
Adds RNNCell modules to script standard lib

cc apaszke for argument_spec changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14695

Differential Revision: D13467680

Pulled By: driazati

fbshipit-source-id: 13a14da87714325cc4c3d49e5fde8a850d5d757b
2018-12-18 17:28:28 -08:00
f3cc9b2218 Remove fully qualified weak script names (#15364)
Summary:
Cleanup to make references to `weak_script` consistent across codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15364

Differential Revision: D13509676

Pulled By: driazati

fbshipit-source-id: 93dbbbe57e9b9b6587895f3cc6fac678babd21de
2018-12-18 16:48:52 -08:00
096ee8467c Redefine scheduler to set learning rate using recursive formula (#14010)
Summary:
Modified step_lr for StepLR, MultiStepLR, ExponentialLR and CosineAnnealingLR. In this way, multiple schedulers can be used simultaneously to modify the learning rates.

Related issue: https://github.com/pytorch/pytorch/issues/13022

Added unit tests combining multiple schedulers.
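
A hedged sketch of combining schedulers (modern calling convention; hyperparameters illustrative):

```python
import torch
from torch.optim.lr_scheduler import StepLR, ExponentialLR

opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
s1 = StepLR(opt, step_size=2, gamma=0.1)
s2 = ExponentialLR(opt, gamma=0.9)
for _ in range(4):
    opt.step()
    s1.step()  # each scheduler applies its factor to the current lr,
    s2.step()  # which the recursive formulation makes composable
```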
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14010

Reviewed By: ezyang

Differential Revision: D13494941

Pulled By: chandlerzuo

fbshipit-source-id: 7561270245639ba1f2c00748f8e4a5f7dec7160c
2018-12-18 16:44:31 -08:00
5e97720100 Replace resize_dim() with set_sizes_and_strides() in (#15348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15348

We have a function resize_dim() on TensorImpl in c10/core/TensorImpl.h which lets you change the dimensionality of a tensor, resizing both sizes and strides. Unfortunately, this API is fairly easy to misuse, because it fills in the new entries with garbage when you size it larger. We want to refactor the call sites to use set_sizes_and_strides() instead, so that there is never an intermediate tensor state where the sizes/strides don't make sense. In this diff, resize_dim() is
replaced with set_sizes_and_strides() in aten/src/TH/THTensor.hpp.

Reviewed By: ezyang

Differential Revision: D13505512

fbshipit-source-id: 193bab89f0018c13ca07488be336d8e967746b76
2018-12-18 16:38:36 -08:00
5667af3880 Minor cleanup for TestFuser tests (#15134)
Summary:
Changelog:
- change some expect tests that didn't have to be expect tests,
  instead use self.assertAllFused
- Some of the fuser tests weren't using self.assertAllFused.
- Minor test renames

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15134

Differential Revision: D13507481

Pulled By: zou3519

fbshipit-source-id: dd0788530a60bb5ed2f42b961fae3db2b4404b64
2018-12-18 16:33:59 -08:00
3681bf7cff add dense vector to id_list operator (#15090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15090

as title
step 2 of the linked task

Reviewed By: ellie-wen

Differential Revision: D13425977

fbshipit-source-id: f3538ed68f42470ba39c5b779af764d4a5591a9d
2018-12-18 16:27:38 -08:00
f5da198236 fix clang-tidy script for python 3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15360

Differential Revision: D13509668

Pulled By: suo

fbshipit-source-id: a3448a115eaac8dd4c3f179901a23bdbc5098408
2018-12-18 15:06:14 -08:00
2469f7e02e Port torch.linspace to ATen and parallelize it on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15320

Reviewed By: ezyang

Differential Revision: D13498995

Pulled By: gchanan

fbshipit-source-id: fba655d51d978fffaa53a5e4cae4a99ebfb0eddc
2018-12-18 15:01:49 -08:00
3118124cd6 Add (Un)Fold modules to standard library (#14759)
Summary:
Depends on #14597 for the corresponding aten ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14759

Differential Revision: D13325356

Pulled By: driazati

fbshipit-source-id: 99e39449c1ccfa293de05672c31a11e580bdd11f
2018-12-18 12:03:08 -08:00
f4c504593c Fix the (reduce)min and (reduce)max ONNX exporting (#15241)
Summary:
max and reducemax are smashed together; we need to support the one-input case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15241

Reviewed By: yinghai

Differential Revision: D13473312

Pulled By: houseroad

fbshipit-source-id: 9b8c847286a2631b006ca900271bc0d26574101a
2018-12-18 11:48:06 -08:00
056cfaf3ff Method returns a single argument (#15289)
Summary:
This PR changes Method (just Method not all graphs) to always have a single
return argument.

This is part 1 in a set of changes that will enable us to have better handling of early return statements.
The simplification that this change provides greatly reduces the work for the next step.

This change makes it so that Method and Python handle multiple returns in the same way:
* 0 - None
* 1 - <single value>
* many - Tuple[...]

The result is that a lot of special-case handling in compiler.cpp and its
bindings can be removed. It also fixes several bugs in return handling,
including one where return values were not always checked against their
attributed values.

Notes:
* inferTypeFrom is renamed to be more accurate and discourage use.
* This has uncovered some bugs in other components, which are noted in
  the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15289

Differential Revision: D13481649

Pulled By: zdevito

fbshipit-source-id: 0e2242a40bb28cca2d0e8be48bede96195e4858c
2018-12-18 10:44:09 -08:00
12cf5178aa caffe2 mobile opengl (#15322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15322

caffe2 mobile opengl code is not used, deleting it to reduce complications when we perform other changes

Reviewed By: Maratyszcza

Differential Revision: D13499943

fbshipit-source-id: 6479f6b9f50f08b5ae28f8f0bc4a1c4fc3f3c3c2
2018-12-18 08:20:52 -08:00
54d8ce94ee Revert D13383102: [pytorch][PR] Upgrade MKL-DNN to version 0.17
Differential Revision:
D13383102

Original commit changeset: c434f0e0ddff

fbshipit-source-id: 690f46ca0710954fa591a5ea77535e9759db4de5
2018-12-18 07:39:20 -08:00
bb9b7de831 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 4bf66581d07d839f459869bc9c6428011063cc5b
2018-12-17 21:25:36 -08:00
3a98462f2c improve script/no script save error (#15321)
Summary:
Improves the error message for #15116
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15321

Differential Revision: D13499379

Pulled By: zdevito

fbshipit-source-id: b8dc0a83efabff74199f4aab2ee98aa41c42608b
2018-12-17 21:13:58 -08:00
e37a22128e Allow tracing with fork/wait (#15184)
Summary:
There is still a limitation on this: if a script module is somewhere
in the trace, the inputs/outputs can only be tensors or tuples of
tensors.

resolves #15052
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15184

Differential Revision: D13457691

Pulled By: highker

fbshipit-source-id: 8fe46afc41357a0eb8eadd83f687b31d074deb0e
2018-12-17 20:34:26 -08:00
Jie
bd958cde68 [TensorIterator fixing mean to output correct result for half precision](#12115) (#14878)
Summary:

mean is calculated in two steps: sum()/numel(). For half precision, data gets
cast back to half after sum().
We fused the division into the reduction kernel by adding pre_op/post_op.

This allows us to do torch.ones(65536).cuda().half().mean() to return correct
result.
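
For context, a minimal repro sketch (requires a CUDA device): summing 65536 fp16 ones exceeds half's largest finite value (~65504) before the division, which is why the division must be fused into the reduction.
```python
import torch

x = torch.ones(65536).cuda().half()
print(x.mean())  # 1.0 with the fused sum/numel; the unfused path overflowed to inf
```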
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14878

Differential Revision: D13491159

Pulled By: soumith

fbshipit-source-id: e83802e1628b6d2615c45e18d7acf991d143a09e
2018-12-17 20:13:30 -08:00
71ee882157 Reenable OpenMP by reverting the following two commits. (#15315)
Summary:
Revert "Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)"

This reverts commit a84e873bb156080ea76ab182171b1f3b4d5395f6.

Revert "Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)"

This reverts commit 8901935ad42fe9bf093d1106ea43606008a4024d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15315

Differential Revision: D13495852

Pulled By: ezyang

fbshipit-source-id: bcd3f60088b14831c53d3c171f10cd1ab6b35dee
2018-12-17 19:54:41 -08:00
aec9fdf0a4 Fix _apply in nn.Module (#15305)
Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.shared_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solves the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305

Differential Revision: D13493937

Pulled By: goldsborough

fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
2018-12-17 16:22:21 -08:00
2f38ffbcb3 Add a correctness check for C++ types to custom operators (#15247)
Summary:
The JIT uses `int64_t` for its integer type and `double` for its floating point type, but users quite often want to write `int` or `float` and that currently fails in not-so-nice ways for custom ops. This PR adds a simple `static_assert` to catch these common failure cases.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15247

Differential Revision: D13493941

Pulled By: goldsborough

fbshipit-source-id: c1cd0d10ab5838c75f167c0bdb57e45a0bc1344e
2018-12-17 16:17:27 -08:00
e650a84872 caffe2/python/task: added __repr__ methods to all task definitions (#15250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15250

This adds `__repr__` methods to all of the classes under task.py. This makes the objects much easier to interact with when using them in an interactive manner, such as in a Jupyter notebook.

The default `__repr__` method just returns the object ID which is very unhelpful.

Reviewed By: hanli0612

Differential Revision: D13475758

fbshipit-source-id: 6e1b166ec35163b9776c797b6a2e0d002560cd29
2018-12-17 16:02:16 -08:00
e0b261a35b Port nn fold and unfold to c++
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14597

Reviewed By: ezyang

Differential Revision: D13272227

fbshipit-source-id: 6eccab5ff5830a977398a96393b778095120edc6
2018-12-17 15:46:37 -08:00
c66adfc16b Allow future type parsing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14887

Differential Revision: D13490984

Pulled By: highker

fbshipit-source-id: 165fe995867be273793f983154aa6cbce13e4396
2018-12-17 15:39:52 -08:00
efb37e86eb Removing BUILD_C10_EXPERIMENTAL_OPS option and unglobbing experimental/c10d ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15064

Reviewed By: orionr

Differential Revision: D13474801

Pulled By: pjh5

fbshipit-source-id: 9d3664c3a3a1b6c2d9f083f8476fe3b037296b98
2018-12-17 15:35:41 -08:00
59d71b9664 Bicubic interpolation for nn.functional.interpolate (#9849)
Summary:
Addresses #918; interpolation results should be similar to TF.

* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`

The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
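
Minimal usage sketch of the new mode once this lands:
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
y = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
print(y.shape)  # torch.Size([1, 3, 16, 16])
```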
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849

Differential Revision: D9007525

Pulled By: driazati

fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
2018-12-17 15:31:48 -08:00
c5dd91c4ae add isinstance static type checking for jit (#15076)
Summary:
This PR add isinstance to do static type checking in JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15076

Differential Revision: D13471067

Pulled By: wanchaol

fbshipit-source-id: d39b7ed5db9fcca4b503659d02cf7795950ea8ea
2018-12-17 15:21:49 -08:00
216ab259fb Fix the missing caffe2 proto files for Windows (#15157)
Summary:
Fixes #15156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15157

Differential Revision: D13490420

Pulled By: orionr

fbshipit-source-id: 4387d707f634a5975238af915b1befb2277f8ec7
2018-12-17 15:21:47 -08:00
f4c59c5fdf Replace SwitchToDevice(0) with SwitchToDevice() (#15126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15126

I want to make people stop manufacturing StreamId from thin air,
and a first step is to make people use the default stream.

Reviewed By: dzhulgakov

Differential Revision: D13432922

fbshipit-source-id: 9f0d8d70646c50d979bde5ba3c3addeebac48a3d
2018-12-17 15:15:00 -08:00
df4c9471ec Don't enforce docstrings on bool dispatch (#15306)
Summary:
Allows two boolean-dispatched functions to both have no docstrings (the only case that will now fail is if both functions have docstrings)

Fixes #15281
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15306

Differential Revision: D13494884

Pulled By: driazati

fbshipit-source-id: 65fec39ae03a7d6a68ad617c9b270faeb1617930
2018-12-17 14:41:05 -08:00
95d3fed68f Fix for issue 14829 (#14908)
Summary:
* Modify the testcase as outlined in the issue
   * Issue url: https://github.com/pytorch/pytorch/issues/14829
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14908

Differential Revision: D13490360

Pulled By: ezyang

fbshipit-source-id: ff11a72e19b49223652182e82c2b4e65fe444ca7
2018-12-17 14:28:50 -08:00
e07fc114a0 Minor fixes in .jenkins/caffe2/bench.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15304

Differential Revision: D13493876

Pulled By: bddppq

fbshipit-source-id: 7146eb2587e526af65b4b0290c25bd55653a3088
2018-12-17 13:53:55 -08:00
700271d0e9 Adding ONNX export for torch.expand and torch.ne (#15050)
Summary:
`torch.expand` and `torch.ne` are used often in models and this PR adds ONNX export support for them. ArmenAg has created issue https://github.com/pytorch/pytorch/issues/10882 for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15050

Differential Revision: D13453036

Pulled By: houseroad

fbshipit-source-id: 4724b4ffcebda6cd6b2acac51d6733cb27318daf
2018-12-17 13:48:14 -08:00
3df79f403e Tighten up invariants regarding StreamId. (#15125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15125

I realized that it is really bad juju if you fake a StreamId
out of thin air, because in general this isn't going to work.
So, make the constructor a lot scarier.

Most "faking StreamId out of thin air" happens because someone
just wants to put something on the default stream.

Reviewed By: dzhulgakov

Differential Revision: D13432800

fbshipit-source-id: a86991d6fc1d8aa4e54e8175e5f06f90856238e6
2018-12-17 13:30:54 -08:00
1dbc7cff3e Fix tensor printing bug in Python 2 (#12732)
Summary:
`rsplit` doesn't have kwargs in Python 2 so this line raises an error

Fixes #15135
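
For reference, the incompatibility: Python 2's `str.rsplit` accepts no keyword arguments, so the portable form passes maxsplit positionally.
```python
'1.11.0'.rsplit('.', maxsplit=1)  # TypeError on Python 2: rsplit takes no kwargs
'1.11.0'.rsplit('.', 1)           # portable across Python 2 and 3
```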
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12732

Differential Revision: D10458630

Pulled By: driazati

fbshipit-source-id: a63e42fbc0e39e4291480775b516c98122ec05a1
2018-12-17 13:17:51 -08:00
d71fac20eb Refactor hotpatch_vars and apply it to libtorch (#14976)
Summary:
Fixes #14801.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14976

Differential Revision: D13485381

Pulled By: soumith

fbshipit-source-id: 0af3c2e1b90988d56f6f85632328d1e4b788ffd2
2018-12-16 21:53:31 -08:00
656b565a0f Trivial comment correction in dataloader (#15276)
Summary:
Trivial comment correction in dataloader
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15276

Differential Revision: D13477324

Pulled By: soumith

fbshipit-source-id: 2a74a014999655d129311d611f2a09411339cb13
2018-12-15 10:59:00 -08:00
c51c825efe Delete ffi documentation (#15220)
Summary: Deleting FFI documentation since it's deprecated.

Differential Revision: D13477329

Pulled By: soumith

fbshipit-source-id: 0b3d485eb7cef1f05b6b397dff50f21a49d6409e
2018-12-15 09:49:02 -08:00
60badccd10 Fix a typo in the assert
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15265

Reviewed By: llyfacebook

Differential Revision: D13477029

Pulled By: sf-wind

fbshipit-source-id: 9c5571a583c01f9701625541ebec0c836cb923f2
2018-12-15 09:09:09 -08:00
4bcb425490 fix cholesky call in potrs example (#15215)
Summary:
Cholesky by default returns the lower triangular matrix, see [docs](https://pytorch.org/docs/stable/torch.html#torch.cholesky).

However `torch.potrs` by default requires the upper triangular matrix. The naming of the variable `u` suggests that the example expects the upper to be returned, so I've added the flag to make that happen in the example.
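
The corrected pattern, as a self-contained sketch:
```python
import torch

a = torch.randn(3, 3)
a = a @ a.t() + 3 * torch.eye(3)   # make a positive definite
b = torch.randn(3, 2)

u = torch.cholesky(a, upper=True)  # explicitly request the upper factor
x = torch.potrs(b, u)              # potrs assumes upper by default
```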
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15215

Differential Revision: D13476468

Pulled By: soumith

fbshipit-source-id: 7b68035f435a2b1be4d363b3f63e407394af949d
2018-12-15 04:43:34 -08:00
2b57bd4107 value-based mark and sweep DCE (#14910)
Summary:
This makes DCE more granular by tracking live values/aliases through the graph (rather than just nodes). So we can be more aggressive in DCE around control flow blocks. For example, in:
```
%a0 = aten::foo()
%b = aten::foo()
%a2, %b2 = prim::If(%cond) {
  block0() {
    %a1 = aten::foo(%a0)
    %b1 = aten::foo(%b)
  } -> (%a1, %b1)
}
return (%a2)
```
we will now dce all the `%b` stuff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14910

Differential Revision: D13476445

Pulled By: suo

fbshipit-source-id: 2bf5db19711c07dde946697a4f4b270bd8baf791
2018-12-15 01:16:44 -08:00
df614371c7 Mention Jacobian-vector product in the doc of torch.autograd (#15197)
Summary:
A friend of me is learning deep learning and pytorch, and he is confused by the following piece of code from the tutorial https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#gradients :

```python
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)
```

He doesn't know where the following line comes from:
```python
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
```

What are we computing? Why don't we compute "the gradient of `y` w.r.t `x`"?

In the tutorial, it only says
> You can do many crazy things with autograd!

Which does not explain anything. It seems to be hard for some beginners of deep learning to understand why do we ever do backwards with external gradient fed in and what is the meaning of doing so. So I modified the tutorial in https://github.com/pytorch/tutorials/pull/385
and the docstring correspondingly in this PR, explaining the Jacobian vector product. Please review this PR and https://github.com/pytorch/tutorials/pull/385 together.
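
The point, in brief: `y.backward(v)` computes the vector-Jacobian product v^T J rather than the full Jacobian. A minimal check:
```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                              # J = dy/dx = 2 * I
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)                          # accumulates v^T J into x.grad
print(torch.allclose(x.grad, 2 * v))   # True
```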
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15197

Differential Revision: D13476513

Pulled By: soumith

fbshipit-source-id: bee62282e9ab72403247384e4063bcdf59d40c3c
2018-12-15 00:10:30 -08:00
5b542a755f Tensor method rename dims()->sizes() (#15246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15246

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: igorsugak

Differential Revision: D13470369

fbshipit-source-id: ce995beab7c64bebe8b234fb5e6d015940ec2952
2018-12-14 21:11:02 -08:00
f118568662 Create parser.cpp (#15238)
Summary:
Moves implementation into .cpp file. Parser was getting included in several compilation units.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15238

Differential Revision: D13474635

Pulled By: zdevito

fbshipit-source-id: 7dc824eea8f506d6c8ae1aa67aeec0c34d5285fc
2018-12-14 19:31:36 -08:00
e1808be37d Add several features to converting images to blobs (#15204)
Summary:
Several enhancements are implemented:

* Resize the images to be within a boundary between min-size and max-size (applying to either height or width). It tries to resize the smaller dimension to match min-size while keeping the aspect ratio. However, if that would make the larger dimension exceed max-size, it instead resizes the larger dimension to equal max-size (and the smaller dimension ends up below min-size). The min/max sizes are specified in the scale argument, in comma-separated form. If one of the sizes is -1, that size is not a restriction.

* Change the OpenCV resize function arguments from using cv::Size() to the x, y scale. Theoretically they should be the same. But in reality, the two ways of specifying them may result in different resized outputs.

* Once the image is read in, change the data to floats. That means, after resize and other preprocessing steps, the float values are preserved (not truncated to int).

* It is possible to convert data in text format to the blob format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15204

Reviewed By: llyfacebook

Differential Revision: D13467225

Pulled By: sf-wind

fbshipit-source-id: 7da34a72d43a9603cd7ab953f5821c1222d0178f
2018-12-14 17:37:21 -08:00
717496e6c1 Supply static shape info to Reshape when doing onnxGetCompatibility (#15242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15242

Newer version ONNX Reshape gets shape info from a tensor. Hence for static backend, we need to provide this info to it when doing `onnxGetCompatibility` too.

Reviewed By: jackm321

Differential Revision: D13471959

fbshipit-source-id: 8a58e28edd900b6ad54a1dbd63ff2579fbe0e820
2018-12-14 16:37:39 -08:00
763b9954f3 FP16MomentumSGDUpdate Op fix and enable for ROCm (#15150)
Summary:
1. Fix a bug in FP16MomentumSGDUpdate operator
2. Enable operator for ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15150

Differential Revision: D13473145

Pulled By: bddppq

fbshipit-source-id: 4c5c5f30cb9bba658e3639dbe193fa08a304d306
2018-12-14 16:33:45 -08:00
e596d23137 Start unittesting our main observer (#15191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15191

OSS:

Just splitting out basic flags from a unit test, so I can extend them in another test where I need to add additional flags.

Reviewed By: yinghai

Differential Revision: D13159184

fbshipit-source-id: 9823e792cf0ed8d0379235c44564862b7d784845
2018-12-14 16:24:38 -08:00
34f1f2208b Build c10 HIP test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15233

Reviewed By: ezyang

Differential Revision: D13471002

Pulled By: bddppq

fbshipit-source-id: b42c3bc2b9db672ce50a52eb700cc6ed13d3535f
2018-12-14 15:36:38 -08:00
5e09c7bc80 record unit time in torch.cuda.event (#15221)
Summary: Record unit of time for torch.cuda.Event's elapsed_time

Differential Revision: D13467646

Pulled By: zou3519

fbshipit-source-id: 4f1f4ef5fa4bc5a1b4775dfcec6ab155e5bf8d6e
2018-12-14 15:29:06 -08:00
054456eb93 Preserve module hierarchy on traced modules (#15101)
Summary:
We need this, for example, to properly call `_unpack` when we have a traced module in the hierarchy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15101

Differential Revision: D13468467

Pulled By: jamesr66a

fbshipit-source-id: c2b6740b12cde6e23395d12e42d4fc2c4c7ca3f2
2018-12-14 15:07:51 -08:00
60f02b87be fix an issue where two rules build the same .py files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15230

Differential Revision: D13471625

Pulled By: zdevito

fbshipit-source-id: a982413a308c7a9bb5b6a82fe96fd3de44f555aa
2018-12-14 14:52:52 -08:00
bd368b867d Do not ifdef __launch_bounds__ out for ROCm. (#15228)
Summary:
The compiler understands it and profits from knowing it by not using too
many VGPRs, as it otherwise assumes the default workgroup size of 256.

Fixes a problem in bringup of ROCm 2.0 on gfx906.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15228

Differential Revision: D13470950

Pulled By: bddppq

fbshipit-source-id: f9aa44c7c95299a099c0ea9317b9044cc056acc5
2018-12-14 14:47:32 -08:00
dcd1685282 Revert D13440858: [pytorch][PR] Use a pool of per-thread cudnn handles for each device, updated
Differential Revision:
D13440858

Original commit changeset: 1c6af5c53538

fbshipit-source-id: fda42ea75000d4a4e9c4a8eeaaa5518f7ad9c298
2018-12-14 14:35:01 -08:00
9f1d8f2eeb enabled tests in test_nn, test_cuda and test_sparse (#15232)
Summary:
tests work on ROCm 1.9.2 as present on CI (fp16 bringup, hipMemset and sparse improvements)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15232

Differential Revision: D13470991

Pulled By: bddppq

fbshipit-source-id: 45acc4f9ea5baaaf7672b86eb022948055779925
2018-12-14 14:27:57 -08:00
e9fb4d1f11 Fix jit doc codeblocks and tables (#15227)
Summary:
Some of the codeblocks were showing up as normal text and the "unsupported modules" table was formatted incorrectly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15227

Differential Revision: D13468847

Pulled By: driazati

fbshipit-source-id: eb7375710d4f6eca1d0f44dfc43c7c506300cb1e
2018-12-14 14:27:56 -08:00
b316e44a46 Remove __forceinline__ hipification step. (#15229)
Summary:
The HIP definition now correctly contains the inline attribute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15229

Differential Revision: D13470962

Pulled By: bddppq

fbshipit-source-id: 34f8361bda5f3dce20a2eeb530c3a25d1b1bdd06
2018-12-14 14:24:05 -08:00
7a61306031 Enable all clang-tidy performance checks (#15198)
Summary:
This PR adds the final set of clang-tidy checks we should add for our codebase: a last set of performance-related checks. Most fixes here are around changing `auto` to `const auto&` in a few places where unnecessary copies were made, and adding `reserve()` calls before loops doing repeated `push_back()`. Also a few cases of calling `std::string::find` with a single-character string literal instead of a single char, which uses a less efficient string search algorithm meant for searching larger substrings.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15198

Differential Revision: D13468797

Pulled By: goldsborough

fbshipit-source-id: 2bed1ea1c7c162b7f3e0e1026f17125e88c4d5b2
2018-12-14 13:32:47 -08:00
fc2856e9aa Refactor caffe2 CI scripts and add benchmark scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14575

Differential Revision: D13468049

Pulled By: bddppq

fbshipit-source-id: e73bc8742c8a03f498816eee8a72b06a3e19fe48
2018-12-14 13:19:33 -08:00
4327a2d70a Better tests/support for Python/C++ inter-op (#15193)
Summary:
Methods like `module.named_modules()` return a container of `shared_ptr<nn::Module>`. Currently the `nn::Module` base class does not have Python bindings. This PR fixes this, and adds more unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15193

Differential Revision: D13458713

Pulled By: goldsborough

fbshipit-source-id: 4091fe1b96a1be8db14c6a4307fbacc2b41ff6fe
2018-12-14 08:42:10 -08:00
fb8487d708 Tensor construction codemod(ResizeLike) - 3/7 (#15122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15122

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13419643

fbshipit-source-id: 65b5a037b94d458b944d51f790ba2829db1fb530
2018-12-14 02:08:37 -08:00
78bf1a9065 Revert D13407930: [pytorch][PR] Support torch.tensor in script
Differential Revision:
D13407930

Original commit changeset: d17f1195a221

fbshipit-source-id: f4458872c48ec4a2c9983b21ed90bcdc0ae665b7
2018-12-13 22:13:07 -08:00
331c4b5b4d caffe2 - make DataRandomFiller usable in unit tests (#15027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15027

- Make DataRandomFiller able to accept input_dims and input_types for only non-intermediate inputs. Add a helper to fill inputs directly into a workspace

Reviewed By: highker

Differential Revision: D13408345

fbshipit-source-id: 5fc54d33da12e3f0a200e79380d4c695b0339b17
2018-12-13 20:45:52 -08:00
66b26806fc caffe2 - easy - utils to set argument of operator (#15022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15022

Add setArgument testing utils to make it easy to set argument for an operator

Reviewed By: yinghai

Differential Revision: D13405225

fbshipit-source-id: b5c1859c6819d53c1a44718e2868e3137067df36
2018-12-13 20:45:50 -08:00
9726651d1e caffe2 - easy - test utils for tensor assertion (#15020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15020

Add test utils for assertion of a tensor (sizes and values)

Reviewed By: salexspb

Differential Revision: D13401146

fbshipit-source-id: bc385df074043e03ea884940b5631b96de4a607e
2018-12-13 20:45:48 -08:00
d0b4ae835d caffe2 - easy - test utils to compare tensors in two workspaces (#15181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15181

Add test utils to compare tensors in two workspaces

Reviewed By: ZolotukhinM

Differential Revision: D13387212

fbshipit-source-id: e19d932a1ecc696bd0a08ea14d9a7485cce67bb2
2018-12-13 20:45:46 -08:00
a0f68646ac caffe2 - easy - test utils to fill tensors (#15019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15019

Put some utilities for filling tensors into test_utils

Reviewed By: salexspb

Differential Revision: D13386691

fbshipit-source-id: 51d891aad1ca12dc5133c0352df65b8db4f96edb
2018-12-13 20:45:44 -08:00
8fedde5530 caffe2 - easy - test utils to create operator (#15180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15180

Test utils to create an operator

On top of D13370461

Reviewed By: ZolotukhinM

Differential Revision: D13382773

fbshipit-source-id: a88040ed5a60f31d3e73f1f958219cd7338dc52e
2018-12-13 20:45:42 -08:00
eb6fec3652 caffe2 - easy - Create test_util to make it easier to write C++ unit tests (#15014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15014

Currently many of the simple operations such as comparing tensors, creating tensors, fetching tensors... are too verbose and take effort to write correctly in unit tests.
Easy-to-use utilities are important for productivity when writing unit tests. While caffe2 Python unit tests are relatively easy to write at the moment, the C++ side is lacking.
In this change I create a test_util, starting with assertsTensorEquals, getTensor, and createTensor, and we can start putting more easy-to-use utilities there.

Reviewed By: salexspb

Differential Revision: D13370461

fbshipit-source-id: bee467a127e1d032ef19482f98aa5c776cf508c0
2018-12-13 20:45:41 -08:00
81644ed9ab Fix derivative for mvlgamma (#15049)
Summary:
Fixes #15015.

Added tests to validate derivative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15049

Reviewed By: soumith

Differential Revision: D13434117

Pulled By: zou3519

fbshipit-source-id: 4a292600af9eb08b67c0f8b5482e9512aac95e72
2018-12-13 20:32:57 -08:00
0b9b965c1a Fix numpy conversion for int8 tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15194

Differential Revision: D13459270

Pulled By: li-roy

fbshipit-source-id: 605534add263860a3ad9a7fa70888301ee0bf8e4
2018-12-13 19:38:09 -08:00
fb140c7828 add erf and erfc to fuser/autodiff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15139

Differential Revision: D13455690

Pulled By: soumith

fbshipit-source-id: b06e5f5d362869c2e5fa11a52f9450d77c30d4cb
2018-12-13 19:17:40 -08:00
bb8ee2de0f Move TensorImpl::CopyFrom to caffe2::Tensor (2/2) (#14858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14858

This diff doesn't change logic but just takes the existing code and moves it to caffe2::Tensor

Reviewed By: ezyang

Differential Revision: D13365817

fbshipit-source-id: bc73b27a793602cb14200dcdf357aa63233da43c
2018-12-13 18:41:24 -08:00
070f33f154 Move TensorImpl::CopyFrom to caffe2::Tensor (1/2) (#14656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14656

This diff doesn't move it yet, but prepares it to be moved, i.e. removes all access to class internals.

dzhulgakov: Please comment on if you think it still makes sense to land this even though it's not blocking anymore since we're going to move at::CopyBytes anyhow.

ezyang: There's some changes in the implementation, especially handling undefined dest tensors. Please review carefully.

Reviewed By: ezyang

Differential Revision: D13287688

fbshipit-source-id: 17800ca8a79ab1633f23be58d96f99a160d8ed24
2018-12-13 18:41:23 -08:00
dc72a5e02c For rotated proposals, replace cv::rotatedRectangleIntersection with a correct version that doesn't have underflow problem (#15113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15113

cv::rotatedRectangleIntersection has a known float underflow bug that would cause failure in ```CV_Assert(intersection.size() <= 8)```

For rotated proposals, replace cv::rotatedRectangleIntersection with a correct version that doesn't have underflow problem.

Otherwise, when ```USE_CPP_GENERATE_PROPOSALS = true```, the training would fail.

Reviewed By: viswanathgs

Differential Revision: D13429770

fbshipit-source-id: 5e95d059f3c668f14059a0a83e8e53d8554cdb99
2018-12-13 18:13:46 -08:00
aecab53778 Support torch.tensor in script (#14913)
Summary:
Adding support for torch.tensor in script.

The input list is typed as t[], because it can be arbitrarily nested. I added a compile-time check that the inner type of the list is a bool, float, or int.

Also adds a specialization for boolean lists, which already existed at the IValue level but had not been added to the compiler yet.
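
A sketch of the intended usage (note the revert entry earlier in this log):
```python
import torch

@torch.jit.script
def make():
    a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # arbitrarily nested float list
    b = torch.tensor([True, False])             # uses the bool-list specialization
    return a, b
```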
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14913

Differential Revision: D13407930

Pulled By: eellison

fbshipit-source-id: d17f1195a22149d5b0d08d76c89a7fab8444f7c5
2018-12-13 17:38:38 -08:00
bbbfda72a0 Remove TensorImpl -> Type dependency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15086

Reviewed By: dzhulgakov

Differential Revision: D13425628

fbshipit-source-id: 08a8a774d17b071367454e027012a02f96d177d4
2018-12-13 17:10:59 -08:00
1e9c384afb Enable performance-unnecessary-value-param in .clang-tidy (#15026)
Summary:
This PR fixes around 250 places in the codebase where we were making unnecessary copies of objects (some large, some small).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15026

Differential Revision: D13458784

Pulled By: goldsborough

fbshipit-source-id: be5148b2ce09493588d70952e6f6d6ff5ec5199b
2018-12-13 16:15:35 -08:00
bdfff2f8c2 Add missing caffe2_hip extension in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15189

Reviewed By: orionr

Differential Revision: D13457644

Pulled By: bddppq

fbshipit-source-id: c2363e9b8fd21709b62777e5b2199f01ec1c65f8
2018-12-13 15:59:51 -08:00
de0784510d Remove disabled_features in hipify
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15098

Reviewed By: ezyang

Differential Revision: D13453762

Pulled By: bddppq

fbshipit-source-id: e177042c78f5bf393163d660c25b80285353853d
2018-12-13 15:43:57 -08:00
855d9e1f19 Run ONNX cuda backend test cases via ROCm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15069

Differential Revision: D13427757

Pulled By: bddppq

fbshipit-source-id: ba0273d75986cd5b146f7041a83c63ddf9c6c0cf
2018-12-13 15:10:00 -08:00
6911ce19d7 Remove _finfo; replace _finfo usage with torch.finfo (#15165)
Summary:
This PR removes the usage of _finfo defined in torch.distributions.utils and changes the call sites
to use torch.finfo instead
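
For reference, the public replacement:
```python
import torch

info = torch.finfo(torch.float32)
print(info.eps, info.tiny, info.max)
```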

Differential Revision: D13451936

Pulled By: soumith

fbshipit-source-id: 6dbda3a6179d9407bc3396bf1a2baf3e85bc4cf2
2018-12-13 14:30:27 -08:00
f1f7c16c90 Tensor construction codemod(ResizeLike) - 4/7 (#15088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15088

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419682

fbshipit-source-id: 3e59403bc1c0e71e5cb66df932ed0c6a0a72e643
2018-12-13 13:39:56 -08:00
cbd1c519c4 Replace non-printable-ascii characters in ProtoDebugString (#14918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14918

When ProtoBuf-Lite is in use, ProtoDebugString just calls SerializeAsString.
This produces binary output, which is not a very suitable "debug" string.
Specifically, we've observed it causing problems when calling code tries to
add the debug string to a Java exception message (which requires valid UTF-8).
Now, we replace all non-ASCII bytes with "?".

This is not a very fast implementation, but generating debug strings shouldn't
be a performance-sensitive operation in any application.
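
A Python sketch of the described sanitization (the actual implementation is C++ inside the protobuf helpers):
```python
def sanitize_debug_string(raw: bytes) -> str:
    # Replace every non-ASCII byte with '?' so the result is always valid UTF-8.
    return ''.join(chr(b) if b < 0x80 else '?' for b in raw)
```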

Reviewed By: dzhulgakov

Differential Revision: D13385540

fbshipit-source-id: 8868172baf20efaf53fecf7d666a6980f59b64f5
2018-12-13 13:16:24 -08:00
994f72ee3e Tensor construction codemod(ResizeLike) - 6/7 (#15137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15137

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419736

fbshipit-source-id: f4ad7b9582c2f809258169b7fef9adbca7063d99
2018-12-13 12:47:33 -08:00
43c0b50c2e Tensor construction codemod(ResizeLike) - 5/7 (#15084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15084

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419711

fbshipit-source-id: dd2b740c3f13d8087085bafc5571aaf908d1af42
2018-12-13 12:42:52 -08:00
86fbf17ba6 Use std::vector instead of alloca to work around hcc crash
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15175

Differential Revision: D13453708

Pulled By: bddppq

fbshipit-source-id: f8c147ae9f679e395fee9d4c73ebcca052c9a752
2018-12-13 12:34:36 -08:00
f61612206c Fix old tensor OutputTensorCopyFrom usage in ImageInput operator (#15094)
Summary:
cc jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15094

Differential Revision: D13451898

Pulled By: bddppq

fbshipit-source-id: 27906be62fb88aaa13c257441a2e35a285b445ee
2018-12-13 11:48:19 -08:00
e5bd6fe86d Kill non-forward, non-backward functions generated from nn.yaml (#15127)
Summary:
Updating binding to legacy functions.
Remove unused declarations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15127

Differential Revision: D13433405

Pulled By: VitalyFedyunin

fbshipit-source-id: 58544d38affd20818742338c9eb789d9d14ccbaa
2018-12-13 11:34:50 -08:00
bc80deea1b Delete defunct USE_SIMPLE_BASE_CTOR_DTOR (#15144)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15144

Differential Revision: D13440872

Pulled By: ezyang

fbshipit-source-id: 2b1d73fac0c63729ba01d8f129642334ae9d9cf3
2018-12-13 11:20:37 -08:00
e51092a2b8 Fix typo (#15045)
Summary:
Simple typo fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15045

Reviewed By: dzhulgakov

Differential Revision: D13413509

Pulled By: houseroad

fbshipit-source-id: be66700c30d038368b1433232a4e3fd9299c83d6
2018-12-13 11:13:19 -08:00
ca4358c8f5 Use a pool of per-thread cudnn handles for each device, updated (#15080)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/14861, hopefully addressing ezyang's comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15080

Differential Revision: D13440858

Pulled By: ezyang

fbshipit-source-id: 1c6af5c53538b81c6b92cf1dda231ed333f28035
2018-12-13 10:24:06 -08:00
214f46faf5 Fix bincount for non-contiguous inputs on CPU (#15109)
Summary:
Fixes #15058.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15109

Differential Revision: D13447448

Pulled By: soumith

fbshipit-source-id: 56e8d42934538fb00465105a2c5ccfeb7c18a651
2018-12-13 09:44:20 -08:00
bf7a2b9125 Unify SparseTensorImpl::size_ and TensorImpl::sizes_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15130

Differential Revision: D13434981

Pulled By: VitalyFedyunin

fbshipit-source-id: 98bd4d66834a3c3d2ea577adb0c8413852da095d
2018-12-13 08:55:35 -08:00
0bf1383f0a Python <-> C++ Frontend inter-op (#13481)
Summary:
This PR enables C++ frontend modules to be bound into Python and added as submodules of Python modules. For this, I added lots of pybind11 bindings for the `torch::nn::Module` class, and modified the `torch.nn.Module` class in Python to have a new Metaclass that makes `isinstance(m, torch.nn.Module)` return true when `m` is a C++ frontend module. The methods and fields of C++ modules are bound in such a way that they work seamlessly as submodules of Python modules for most operations (one exception I know of: calling `.to()` ends up calling `.apply()` on each submodule with a Python lambda, which cannot be used in C++ -- this may require small changes on Python side).

I've added quite a bunch of tests to verify the bindings and equality with Python. I think I should also try out adding a C++ module as part of some large PyTorch module, like a WLM or something, and see if everything works smoothly.

The next step for inter-op across our system is ScriptModule <-> C++ Frontend Module inter-op. I think this will then also allow using C++ frontend modules from TorchScript.

apaszke zdevito

CC dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13481

Differential Revision: D12981996

Pulled By: goldsborough

fbshipit-source-id: 147370d3596ebb0e94c82cec92993a148fee50a7
2018-12-13 08:04:02 -08:00
b14d6d730a Reuse KernelSpec for FusionGroups with equivalent graphs (#14541)
Summary:
Before this PR, loop unrolling + the graph fuser was creating multiple
FusionGroups with the same bodies (with different variable names) for
JIT LSTMs. Each FusionGroup got registered to a separate fusion key;
each key resulted in a different compilation for the same
specializations.

This PR makes it so that when registering FusionGroups with the fusion
compiler, the compiler first checks the KernelSpec cache to see if the
FusionGroup's graph exists already. If it does, then return the
corresponding KernelSpec's key to share compiled kernels.

In addition, graphs in the KernelSpec cache are canonicalized before
being cached. I added a flag to the canonicalize pass to remove unique
names of values.

This shortens the compile time for a JIT LSTM (seq_len of 100, loop
unroll factor of 8) from 5.3s to 2.3s. Most of this compile time is
running the graph fuser and/or fusion compiler; while this PR
makes it so that there is only one unique kernel in the forward pass,
there are a lot of different kernels (6) in the backward pass
(after loop unrolling) that should be investigated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14541

Differential Revision: D13324487

Pulled By: zou3519

fbshipit-source-id: b841d82ed35a959b5cfc72db033bf5a7b42cc4fb
2018-12-13 07:54:35 -08:00
aa022313cb Removes THCNumerics usages in RNN.cu (#15085)
Summary:
We don't need THCNumerics here since at::Half can be implicitly converted to float, and the CUDA math dispatches are handled by `/usr/local/cuda/include/crt/math_functions.hpp` and `cmath`. ATen should be free of THCNumerics after this, and when porting kernels from THC one should not use THCNumerics.

Should close: https://github.com/pytorch/pytorch/issues/11878
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15085

Differential Revision: D13447558

Pulled By: soumith

fbshipit-source-id: 4ff5cbf838edcd01e2d1397e4d7f4f920e9e9fc3
2018-12-13 00:24:17 -08:00
1e0eab5df8 minimize header file includes from _avx2.cc (#14950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14950

Minimize the number of headers included from _avx2.cc files to avoid accidental compilation of functions defined in header files that are reused by other translation units, which can lead to illegal-instruction errors.

Reviewed By: dskhudia

Differential Revision: D13394483

fbshipit-source-id: 67149a6fb51f7f047e745bfe395cb6dd4ae7c1ae
2018-12-13 00:18:11 -08:00
4b97a46421 Disable strict-overflow flag to avoid compilation error (#14977)
Summary:
Disable strict-overflow flag to avoid compilation error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14977

Differential Revision: D13447577

Pulled By: soumith

fbshipit-source-id: 1957bd5aa3c7b79219da3dd53560464977c89526
2018-12-12 22:41:33 -08:00
1e93317b99 Remove "early-release beta" disclaimer from README (#15136)
Summary:
Now that PyTorch 1.0 is out, this should be updated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15136

Differential Revision: D13447377

Pulled By: soumith

fbshipit-source-id: bd4e662c53d0699f25d4d90c1b4c1e182b4427c2
2018-12-12 22:14:14 -08:00
fabd23cb2d support casting to string (#15110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15110

support casting to string on CPU

Reviewed By: intermilan

Differential Revision: D13429381

fbshipit-source-id: b737a1ba1237b10f692d5c42b42a544b94ba9fd1
2018-12-12 21:33:58 -08:00
1717ea1da0 Implementation of ChannelShuffle Op for MKLDNN (#15106)
Summary:
The speed-up of a single operation is up to 3X.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15106

Differential Revision: D13429596

Pulled By: bddppq

fbshipit-source-id: f8d987cafeac9bef9c3daf7e43ede8c6a4ee2ce5
2018-12-12 20:25:12 -08:00
895cb8fcea Fix resize for edge case tensors (#14874)
Summary:
Certain tensor shapes failed when being resized. This pull request addresses the bug found in #13404.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14874

Differential Revision: D13429788

Pulled By: soumith

fbshipit-source-id: 8aa6451dbadce46d6d1c47a01cb26e6559bcfc8c
2018-12-12 19:56:23 -08:00
78a77667dd Autoformat build_variables.py (#15152)
Summary:
autoformat `tools/build_variables.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15152

Differential Revision: D13445343

Pulled By: goldsborough

fbshipit-source-id: fd63588de114cb92deda03fa1a0b36f5f9082b2f
2018-12-12 19:30:17 -08:00
fab78827d6 don't compile dnnlowp.cc in avx2 option (#15147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15147

Forgot to take out dnnlowp.cc from avx2 list in a previous diff.

Reviewed By: dskhudia

Differential Revision: D13440686

fbshipit-source-id: 9ada98b6e885c7d5f22c91a735ff60304480b4cb
2018-12-12 18:57:09 -08:00
d8260239a0 docs: minor spelling tweaks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15148

Differential Revision: D13443708

Pulled By: suo

fbshipit-source-id: 5e3ec0afd3416ab8ce207f2d04105c49e1c04611
2018-12-12 18:17:14 -08:00
2211a283d2 Export defs.bzl to open source for pytorch (#15132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15132

Pull Request resolved: https://github.com/facebook/fbshipit/pull/64

Reviewed By: dzhulgakov

Differential Revision: D13424093

fbshipit-source-id: bbebef964b9f3aef8f59cd394eca068680c36b5a
2018-12-12 17:40:29 -08:00
107c9ef518 Add back c2 string_utils include header to benchmark_helper
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15143

Differential Revision: D13439694

fbshipit-source-id: 78698b66d52a0178118cbf3e79a7a5ad1763d47b
2018-12-12 16:38:00 -08:00
6610ace28b use ROCm 1.9.2 fp16 capabilities in rocBLAS and MIOpen interfaces (#14994)
Summary:
* relax MIOpen if statement to allow fp16/fp32 mixed precision training now supported by ROCm 1.9.2
* use gemm_ex API of rocBLAS in ROCm 1.9.2 instead of the previous hgemm API
* with this: enable all but one half test in test_nn

While there, fix also:
* a group convolution issue with MIOpen, pertaining to properly initializing MIOpen on multi-GPU systems, that we detected while working on this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14994

Differential Revision: D13439869

Pulled By: bddppq

fbshipit-source-id: 75e4eb51a59488882e64b5eabdc30555b25be25e
2018-12-12 16:16:47 -08:00
f34d827007 Optimize CPU GenerateProposals op by lazily generating anchors (3-5x faster) (#15103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15103

There are two main optimizations in this diff:
1. Previously, we generated all anchors for every single spatial grid position first, and then applied NMS to pick 2000 anchors according to RPN_PRE_NMS_TOP_N. First sorting the scores, picking the 2000 top ones, and then lazily generating only the corresponding anchors is much faster.
2. Transposing bbox_deltas from (num_anchors * 4, H, W) to (H, W, num_anchors * 4) was also quite slow, taking about 20ms in the RRPN case when there are lots of anchors, while it's negligible for the RPN case (about 0.1 ms). Instead of transposing, performing all operations in the (num_anchors, H, W) format speeds things up.

For regular RPN scenario, this gives 5x speedup from 5.84ms to 1.18ms a case
with 35 anchors over a 600x600 image.

For rotated boxes with 245 anchors, the runtime down from 80ms to 27ms per
iter.
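
A schematic NumPy sketch of optimization 1 (the function name, shapes, and stride handling are hypothetical simplifications):
```python
import numpy as np

def top_k_anchors(scores, base_anchors, H, W, k, stride=16):
    """scores: (A, H, W) objectness; base_anchors: (A, 4) boxes at the origin."""
    A = base_anchors.shape[0]
    top = np.argsort(-scores.ravel())[:k]       # sort scores first, keep top-k
    a, y, x = np.unravel_index(top, (A, H, W))
    shifts = np.stack([x, y, x, y], axis=1) * stride
    return base_anchors[a] + shifts             # materialize only k anchors
```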

Reviewed By: newstzpz

Differential Revision: D13428688

fbshipit-source-id: 6006b332925e01a7c9433ded2ff5dc9e6d96f7d3
2018-12-12 15:53:52 -08:00
90f9e8103c Implement torch.tril_indices and torch.triu_indices (#12653) (#14904)
Summary:
This is an optimized implementation that does the following:

1. created an empty Tensor of correct size.
2. fill the Tensor with correct values.

The following three designs to fill in the Tensor result in roughly the same performance. Hence, the 2nd option is taken for simpler code, and to return contiguous tensors.

1. Sequential: fill row coordinates first, then columns. This results in two for-loop and more arithmetic operations.
2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration.
3. Transpose: create a n X 2 Tensor, fill the Tensor sequentially, and then transpose it.

NOTE:

This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following:

```python
x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)
x[i]  # need to first convert the 2D tensor into a tuple of two 1D tensors.
```
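
The workaround implied by the note is to index with the two coordinate rows separately, e.g.:
```python
import torch

x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)  # shape (2, n): row coordinates, then column coordinates
x[i[0], i[1]] = 0             # advanced indexing with two 1-D tensors
```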
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904

Reviewed By: zou3519

Differential Revision: D13433027

Pulled By: mrshenli

fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a
2018-12-12 15:40:14 -08:00
342e62f1e3 Minor documentation mistake (#15068)
Summary:
keepdim is an optional parameter for torch.max()
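
For example:
```python
import torch

x = torch.randn(2, 3)
values, indices = torch.max(x, dim=1)  # keepdim defaults to False
kept, _ = torch.max(x, dim=1, keepdim=True)
print(values.shape, kept.shape)        # torch.Size([2]) torch.Size([2, 1])
```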
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15068

Differential Revision: D13437745

Pulled By: zou3519

fbshipit-source-id: b5198c7d4ae17758cd136f6e5aecc6cb5838f174
2018-12-12 15:24:26 -08:00
5837320b70 Add script standard library documentation + cleanup (#14912)
Summary:
Documents what is supported in the script standard library.

* Adds `my_script_module._get_method('forward').schema()` method to get function schema from a `ScriptModule` (see the sketch after this list)
* Removes `torch.nn.functional` from the list of builtins. The only functions not supported are `nn.functional.fold` and `nn.functional.unfold`, but those currently just dispatch to their corresponding aten ops, so from a user's perspective it looks like they work.
* Allow printing of `IValue::Device` by getting its string representation
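
The new accessor in use, as a sketch:
```python
import torch

class M(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        return x + 1

m = M()
print(m._get_method('forward').schema())
```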
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14912

Differential Revision: D13385928

Pulled By: driazati

fbshipit-source-id: e391691b2f87dba6e13be05d4aa3ed2f004e31da
2018-12-12 12:30:13 -08:00
64b3364209 Move adaptive avg pooling 2d to ATen native (#14714)
Summary:
adaptive_avg_pool1d, adaptive_avg_pool2d, and adaptive_avg_pool3d are neural network functions that are currently implemented in our legacy THNN (CPU) / THCUNN (CUDA) libraries. It is generally better if these live in our new library ATen, since it is more feature complete and reduces cognitive overhead.

This change moves adaptive_avg_pool1d and adaptive_avg_pool2d to ATen.

timed relevant cpu tests with this change:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.s.s.s.s.s.s.s...
----------------------------------------------------------------------
Ran 17 tests in 6.273s

OK (skipped=7)

real	0m7.164s
user	3m1.289s
sys	0m0.905s
```

compared to master:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.s.s.s.s.s.s.s...
----------------------------------------------------------------------
Ran 17 tests in 7.232s

OK (skipped=7)

real	0m8.065s
user	3m34.714s
sys	0m2.440s
```

also timed relevant cuda tests with this change:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.................
----------------------------------------------------------------------
Ran 17 tests in 21.049s

OK

real	0m24.106s
user	0m20.890s
sys	0m4.026s
```

compared to master
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.................
----------------------------------------------------------------------
Ran 17 tests in 23.021s

OK

real	0m27.095s
user	0m20.121s
sys	0m3.668s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14714

Differential Revision: D13384084

Pulled By: xnder

fbshipit-source-id: 344442103ccbbda72d3c010d2feea00e9985d226
2018-12-12 12:25:22 -08:00
63e77ab6c4 Move numa.{h, cc} to c10/util (#15024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15024

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14393

att

Reviewed By: dzhulgakov

Differential Revision: D13380559

fbshipit-source-id: abc3fc7321cf37323f756dfd614c7b41978734e4
2018-12-12 12:21:10 -08:00
b34ab435ef Stop erroneously running aten::warn (#15124)
Summary:
Fixes #15119. Before this PR, we were propagating constants through
aten::warn AND running it as a part of shape analysis.
This caused aten::warn to be run regardless of whether it is
supposed to be run dynamically. This PR adds an exclusion for aten::warn
in constant propagation and shape analysis, similar to that of prim::RaiseException.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15124

Differential Revision: D13432815

Pulled By: zou3519

fbshipit-source-id: 15ab533ce2accb2da3fd4e569070c7979ce61708
2018-12-12 11:35:23 -08:00
2d485ffb17 Move CUDAGuard, CUDAStream and CUDAGuardImpl to c10/cuda (#14248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14248

This diff also introduces a horrifying hack to override CUDA's DeviceGuardImpl
with a HIPGuardImplMasqueradingAsCUDA, to accommodate PyTorch's current
behavior of pretending CUDA is HIP when you build with ROCm enabled.

Reviewed By: bddppq

Differential Revision: D13145293

fbshipit-source-id: ee0e207b6fd132f0d435512957424a002d588f02
2018-12-12 11:24:26 -08:00
9943cf2378 Kill Type.storage. (#15075)
Summary:
It's not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15075

Reviewed By: ezyang

Differential Revision: D13422487

Pulled By: gchanan

fbshipit-source-id: 272aa0a10e96f3ffb97d571490b517f972b9dcf7
2018-12-12 10:57:54 -08:00
9d2955c39c fix infinite loop when get_max_threads is nonzero but num_threads is 1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15114

Differential Revision: D13431891

Pulled By: umanwizard

fbshipit-source-id: f968b8e50cf776c346d4a28d72b12e7856c95839
2018-12-12 10:04:18 -08:00
68ad9ae5be Ensure there aren't variables in checked_tensor_unwrap, checked_tenso… (#15105)
Summary:
…r_list_unwrap.

These functions use unsafeGetTensorImpl(), which doesn't work with Variables (in a silent way that may blow up later).
So let's do early checking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15105

Reviewed By: ezyang

Differential Revision: D13429149

Pulled By: gchanan

fbshipit-source-id: b85f6f5b7cdb9a6dd0c40205b924c840a3920ba0
2018-12-12 09:58:03 -08:00
0ad39ec5c1 Add better support for bools in the graph fuser (#15057)
Summary:
Fixes #15038.

aten::_cast_Float(tensor, non_blocking) support was added in #14336.
Its second argument is a bool, but because we don't support generating values
of type bool in the fuser codegen, the codegen errored out.

aten::_cast_Float in the fuser never actually uses its non_blocking
argument, so another way to fix this would be to have a special op for a
fused cast; but I thought that we might have fusible ops that take bool
arguments in the future, so this support would be good to have.
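
For illustration, a hedged sketch of the pattern involved, assuming `Tensor.float()` lowers to `aten::_cast_Float` inside a fusion group as described above:

```
import torch

@torch.jit.script
def fused(x, y):
    # the cast carries a constant bool (non_blocking=False); generating
    # that bool value in fuser codegen is what this PR fixes
    return (x.float() * y + y).relu()
```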
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15057

Differential Revision: D13432091

Pulled By: zou3519

fbshipit-source-id: 455fe574f5f080aca9a112e346b841a2534a8dc3
2018-12-12 09:39:44 -08:00
f36a84b71b fix some tests that I accidentally disabled (#15077)
Summary:
While moving these scenarios into `_test_dim_ops` I accidentally left an empty loop in the actual tests, causing them to do nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15077

Differential Revision: D13428759

Pulled By: umanwizard

fbshipit-source-id: 08f53068981d9192c1408878b168e9053f4dc92e
2018-12-12 09:25:34 -08:00
3ae684266a Don't setup x86_64-linux-gnu-gcc as an sccache wrapper. (#15078)
Summary:
When I do this setup in a local Docker development environment,
I get the following error:

    x86_64-linux-gnu-gcc: error trying to exec 'cc1plus': execvp: No such file or directory

Somehow, gcc seems to get confused when it gets run from the wrong
directory.  Best not to do it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15078

Differential Revision: D13432143

Pulled By: ezyang

fbshipit-source-id: b18e15f493503a4c8205c85f92a214e49762a7bc
2018-12-12 08:01:03 -08:00
00a4c8d41c Use c10::to_string that works cross platform (#15117)
Summary:
Fix master breakage introduced in #15108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15117

Differential Revision: D13430568

Pulled By: bddppq

fbshipit-source-id: ce10bc552f085d1bf0afbc13119991bee014ac95
2018-12-12 02:58:49 -08:00
1423c0d9f1 Add EmptyNameScope to allow you jump out from current scope. (#14631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14631

Adding an empty name scope to allow people to jump out of the current name scope.

This could be useful when you want to access a blob from a parent or sibling scope.

 Facebook:

e.g.: we encountered a potential use case in D13124249 (it's a large diff; please search for EmptyNameScope in that diff), where we need to access a blob declared in the root name scope from a device name scope (device name scopes are used by the parallel_GPU API). `EmptyNameScope` can help us do that with ease.

I referenced `EmptyDeviceScope` (D6103412) while implementing this one.
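
A hedged usage sketch, assuming the new context manager is exposed as `core.EmptyNameScope` alongside `core.NameScope` (the blob names are illustrative):

```
from caffe2.python import core

net = core.Net("example")
with core.NameScope("gpu_0"):
    net.ConstantFill([], ["w"], shape=[1], value=1.0)   # -> "gpu_0/w"
    with core.EmptyNameScope():
        # back in the root scope: names created or referenced here are
        # not prefixed with "gpu_0/"
        net.ConstantFill([], ["w_root"], shape=[1], value=1.0)  # -> "w_root"
```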

Reviewed By: yinghai

Differential Revision: D13272240

fbshipit-source-id: d4cde5abcc2336e456b6c6ef086266ef94d86da8
2018-12-12 01:39:50 -08:00
479481b6cb Remove linker and dlopen flags that allowed undefined symbols in rocm build (#15091)
Summary:
Previously the undefined symbols were caused by disabled_modules in tools/amd_build/disabled_features.json (now it's cleared).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15091

Differential Revision: D13429595

Pulled By: bddppq

fbshipit-source-id: b341e83f9e5a8d16440a364e837b045a8a4fd6e1
2018-12-11 23:23:47 -08:00
0dade9862c Fix serialization (#15033)
Summary:
Fixes a bug where (de)serializing a hierarchy of submodules in which one submodule has no parameters of its own, but its submodules do, fails to load properly. This had to do with the fact that the old protobuf format couldn't store empty parameters.

Fixes https://github.com/pytorch/pytorch/issues/14891

soumith ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15033

Differential Revision: D13411322

Pulled By: goldsborough

fbshipit-source-id: 2ef73b2aa93fa9e46b1cbe1fd47d9f134d6016d5
2018-12-11 22:43:36 -08:00
e20f9bbead Update the output format for benchmark_helper. It outputs the dimensi… (#15108)
Summary:
…on first and all the values in the next line. This way, it can output an arbitrary blob.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15108

Reviewed By: llyfacebook

Differential Revision: D13429346

Pulled By: sf-wind

fbshipit-source-id: 5e0bba2a46fbe8d997dfc3d55a698484552e3af8
2018-12-11 22:24:56 -08:00
b07ee44f40 Pre-commit flake8/clang-tidy (#15102)
Summary:
Provide a pre-commit hook that does flake8 and clang tidy checks. Enables the clang-tidy script to run in parallel to make it fast enough to be used in a pre-commit hook.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15102

Reviewed By: soumith

Differential Revision: D13429629

Pulled By: zdevito

fbshipit-source-id: bd52fe5652f29b033de8d9926d78350b2da4c2fc
2018-12-11 22:18:18 -08:00
f8455ed754 add gloo support for gather on GPU (#14916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14916

as titled

Reviewed By: pietern

Differential Revision: D13267832

fbshipit-source-id: 3b89d08af93f74941f17ff892c33fc2a4a023c19
2018-12-11 21:21:10 -08:00
3fa53da61a Fix include paths for UndefinedTensorImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14818

Reviewed By: ezyang

Differential Revision: D13348042

fbshipit-source-id: 11bdfc755767ce9d0a6fa95b2cf49d50adde8d60
2018-12-11 21:01:45 -08:00
63db95dd11 Move UndefinedTensorImpl to c10 (meh) (#14817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14817

unfortunately, we still need this.

Reviewed By: ezyang

Differential Revision: D13348041

fbshipit-source-id: e8dcc89f5c71bd1ea2c9813990dac6e58e63b1fd
2018-12-11 21:01:42 -08:00
2dfdbef91d Fix include paths for TensorImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14816

Reviewed By: ezyang

Differential Revision: D13348040

fbshipit-source-id: a7204d89c2dd277d13093b0ed862f40b53dee82f
2018-12-11 21:01:40 -08:00
9e9e87c19e Move TensorImpl to c10 (yay!)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14795

Reviewed By: ezyang

Differential Revision: D13336856

fbshipit-source-id: 5375d0e42312ff7564f4df06210a5e49542d59e3
2018-12-11 21:01:38 -08:00
bff6d42cef Add at::scalar_tensor factory function, use it instead of Type.scalar… (#15074)
Summary:
…_tensor.

This is part of a long series of paring down the Type interface.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15074

Differential Revision: D13421482

Pulled By: gchanan

fbshipit-source-id: 84010ee71fef2cb74d32d5de7858d8ed9f36b885
2018-12-11 20:37:41 -08:00
b710642969 Make ATen HIPify out-of-place, but still reuse CUDA names. (#14866)
Summary:
```
    This diff changes the HIPification of ATen to be out-of-place.
    We now have the following mappings:

    - ATen/cuda => ATen/hip
    - ATen/native/cuda => ATen/native/hip
    - ATen/native/sparse/cuda => ATen/native/sparse/hip
    - THC => THH
    - THCUNN => THHUNN

    The build system is adjusted to know about these new build paths,
    and HIPify is taught how to adjust include paths and
    THC_GENERIC_FILE appropriately.  ATen_hip is now built as
    the ATen_hip library, rather than reusing ATen_cuda.

    However, despite these new filepaths, none of the identifiers in ATen
    have actually changed.  So, e.g., THHGeneral.h still defines functions
    named THC_blahblah, and HIP still shows up as CUDA in PyTorch itself.
    We'll tackle this in a subsequent PR; this diff is just to get the files
    out-of-place.

    Minor extra improvements:

    - Don't edit tmp_install when hipifying
    - HIP no longer builds native_cudnn_cpp; it was unnecessary
    - Caffe2_HIP_INCLUDES is now Caffe2_HIP_INCLUDE, for consistency
      with all the other variables.
    - HIP build now properly respects ATEN_CUDA_FILES_GEN_LIB (it
      did not previously.)
    - You can now override file extension matching in pyHIPIFY
      by explicitly specifying its full name in the matching list.
      This is used so we can HIPify CMakeLists.txt in some situations.

    A little bit of string and ceiling wax:

    - gen.py grows a --rocm flag so that it knows to generate CUDA
      files which actually refer to the HIP headers (e.g., THH.h)
      We'll get rid of this eventually and generate real HIP files,
      but not for this PR.
    - Management of HIP dependencies is now completely deleted
      from the ATen CMakeLists.txt.  The old code was dead (because
      it was shoveled in ATen_CUDA_DEPENDENCY_LIBS and promptly
      ignored by the Caffe2 build system) and didn't actually work.
```

Stacked on https://github.com/pytorch/pytorch/pull/14849 review last commit only
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14866

Differential Revision: D13419475

Pulled By: ezyang

fbshipit-source-id: cb4c843df69a1d8369314c9fab1b7719520fa3db
2018-12-11 19:15:27 -08:00
5c2c40ad87 Add error type to raise statement
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15039

Differential Revision: D13419566

Pulled By: zou3519

fbshipit-source-id: f67a3aebce937e3e640e91e81eb3e184cfdf269c
2018-12-11 17:41:44 -08:00
73ee7fda4c Remove deprecated variable_tensor_functions (#15003)
Summary:
Removing the deprecated functions in `torch/csrc/variable_tensor_functions.h` (like `torch::CPU`) and corresponding implementations from `torch/csrc/torch.cpp` from master after the release.

ezyang gchanan soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15003

Differential Revision: D13418086

Pulled By: goldsborough

fbshipit-source-id: a0accdf6f7b0efa1ec07ac7b74b86ff2da37543f
2018-12-11 17:16:11 -08:00
0552326846 add gloo scatter support on GPU (#14917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14917

as titled

Reviewed By: pietern

Differential Revision: D13271560

fbshipit-source-id: 0187a3390f8ebd72a2c074e7a651432159d427c0
2018-12-11 17:11:13 -08:00
92314c83fa re-enable copy of python files, but be careful that the copy is only … (#14982)
Summary:
…done once

This allows the no-op build to work correctly even when BUILD_CAFFE2_OPS is on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14982

Differential Revision: D13413960

Pulled By: zdevito

fbshipit-source-id: 6e5412a8c375af8a47c76f548cdd31cff15f3853
2018-12-11 16:54:08 -08:00
71e0cb505c Split off fuser tests in test_jit.py to their own test case (#15072)
Summary:
This PR creates TestFuser inside test_jit.py to be a home for graph fuser
specific tests.

This was a useful exercise because now that all the fuser tests are in
one place, I can spot redundant and bitrotting tests for cleanup in a
future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15072

Differential Revision: D13421458

Pulled By: zou3519

fbshipit-source-id: 80b1a7712feff75a0c186d1664601c4edbbca694
2018-12-11 14:55:06 -08:00
7408ce2f80 Supress warnings on generated tests
Summary: Removes all warnings spew for the TestJitGenerated tests

Differential Revision: D13420919

fbshipit-source-id: f251c12f923088ccc5daa2984c15003a67cbd1c1
2018-12-11 14:00:41 -08:00
04b65dfd1f Issue 14984: Remove divide by zero error in index_put_ (#14986)
Summary:
No check for a zero-length index tensor was done in the accumulate=True (serial) case in the new TensorIterator code since https://github.com/pytorch/pytorch/pull/13420.

https://github.com/pytorch/pytorch/issues/14984
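
A minimal sketch of the failing edge case (assuming the standard `index_put_` signature), which becomes a no-op after this fix:

```
import torch

t = torch.zeros(3)
idx = torch.empty(0, dtype=torch.long)   # zero-length index tensor
vals = torch.empty(0)
# with accumulate=True this used to hit the divide-by-zero in the
# serial TensorIterator path; now it simply does nothing
t.index_put_((idx,), vals, accumulate=True)
```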
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14986

Differential Revision: D13417861

Pulled By: colesbury

fbshipit-source-id: e6ed1af8f708b53a35803fc157ed1f043169ec89
2018-12-11 13:38:12 -08:00
109c8d22dc Update onnx coverage script for more accurate result (#15029)
Summary:
The coverage of scalar-input test cases was not accurate. This patch fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15029

Differential Revision: D13419764

Pulled By: zrphercule

fbshipit-source-id: a14a5cbef432bea8c9126156f5deb1125e1aeb47
2018-12-11 13:14:35 -08:00
f2f47de5ad tox.ini -> .flake8 (#15065)
Summary:
We were only using this file to configure flake8, and fbcode linters do not recognize tox.ini, which causes spurious linter warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15065

Differential Revision: D13420774

Pulled By: suo

fbshipit-source-id: e43a46befa36862c8b3c0a90074aec6a66531492
2018-12-11 13:14:34 -08:00
ca7f8fed60 silence unreachable code warnings (#15036)
Summary:
Stack:
**#15036 silence unreachable code warnings** ([D13411100](https://our.intern.facebook.com/intern/diff/D13411100/))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15036

Differential Revision: D13414712

Pulled By: li-roy

fbshipit-source-id: d4aa84571fa94c66f3c5bfa9575a10c6ee398f9e
2018-12-11 13:09:04 -08:00
d825b39061 improve deep equality check in alias annotation test (#15031)
Summary:
Previously we were returning true if either IValue wasn't a tensor, which…is bad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15031

Differential Revision: D13409759

Pulled By: suo

fbshipit-source-id: f8bdcd05d334c1276ce46f55812065d358c1ff5d
2018-12-11 12:14:00 -08:00
02d149b767 Fix race condition in ThreadPool::workOnTasksUntilCompleted (#14833)
Summary:
Resolves #14704
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14833

Differential Revision: D13405211

Pulled By: highker

fbshipit-source-id: 8552d51eeb5d3af0ed66c461e5ddfeb9ae2926bd
2018-12-11 11:46:58 -08:00
c2a754c58b Fix CMakeLists.txt for Int8 python bindings (#15047)
Summary:
Currently in caffe2, one cannot properly fetch the content of Int8 blobs.

Upon digging into the source, it turns out that the relevant source file is not being compiled. Adding it to CMakeLists.txt fixes this issue.

First time ever doing a pull request. Please let me know if there's any rule I should follow. Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15047

Differential Revision: D13417583

Pulled By: bddppq

fbshipit-source-id: dd39575971a3012635edbf97a045d80e4b62a8eb
2018-12-11 10:48:47 -08:00
687834dcb4 Install cpp tests when built (#15000)
Summary:
This is broken out of https://github.com/pytorch/pytorch/pull/13733/

We want to install cpp tests so they can ultimately be runnable from that location for Caffe2 tests run from PyTorch builds.

cc pjh5 yf225 anderspapitto
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15000

Reviewed By: pjh5

Differential Revision: D13416253

Pulled By: orionr

fbshipit-source-id: 51280be0a22557a742f90c9f303c58c35cbd4a38
2018-12-11 10:07:48 -08:00
5d3a347685 Stashing checkpointing RNG states based on devices of arg tensors (#14518)
Summary:
This PR intends to address apaszke's concerns in https://github.com/pytorch/pytorch/pull/14253#issuecomment-441740016.  Preserving the rng state is now controlled by a kwarg rather than a global state, hopefully in a python 2.7-compatible way.

Additionally, the checkpointing function stashes and restores the RNG states of
1. devices associated with all input tensor args to run_fn as well as
2. the current device.

I could easily change this to only save and restore the RNG states associated with 1. alone.  This would simplify the logic to create a [deduplicated, ordered](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R37) list of devices considered active.

I'm wondering if the [get_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R32) and [set_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) functions are general enough to reside elsewhere (presumably torch/random.py).  I'm also wondering if the check on [torch.cuda._initialized](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) would be better placed within `get_device_states`.
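
A hedged sketch of the resulting API (assuming a CUDA device is available); `preserve_rng_state` is the kwarg this PR introduces:

```
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # dropout makes the recomputed forward sensitive to RNG state
    return torch.nn.functional.dropout(x, p=0.5, training=True)

x = torch.randn(8, 16, device="cuda", requires_grad=True)
# stashes the CPU RNG state plus the CUDA RNG states of the devices of
# the tensor args and the current device, then restores them before
# the forward pass is recomputed during backward
y = checkpoint(block, x, preserve_rng_state=True)
y.sum().backward()
```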
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14518

Differential Revision: D13356210

Pulled By: ezyang

fbshipit-source-id: afa4cc21ce7862142d5cb1dec3750018df222039
2018-12-11 09:48:45 -08:00
25ddd659c9 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: d39b31f12ab2ab570548f3e8a65949332a64a0ff
2018-12-11 07:40:37 -08:00
bf1d411dbf Switch Int8Softmax, Int8Relu, and Int8LeakyRelu to QNNPACK (#14933)
Summary:
Int8Softmax: 4x-5x speedup compared to previous implementation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14933

Differential Revision: D13406820

Pulled By: Maratyszcza

fbshipit-source-id: ea8cbe1b861ddb7ff1b851d06d52c6fd6d04ed01
2018-12-11 00:49:06 -08:00
a1ea7dbe40 Adjust the API call to deserialize the tensorproto (#14132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14132

as title

Reviewed By: jerryzh168

Differential Revision: D13110697

fbshipit-source-id: 822c9079de11951f90aec3d26f0e4108847e7dac
2018-12-10 22:54:42 -08:00
27d5ae7afb use datatype dependent tolerance in data parallel tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14856

Differential Revision: D13413560

Pulled By: soumith

fbshipit-source-id: b3a0cfe93477ed332e6eaa2e39ef5f4cc8b36481
2018-12-10 22:50:27 -08:00
81dc78d871 Update pooling.py (#14998)
Summary:
Strange line in the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14998

Differential Revision: D13413235

Pulled By: soumith

fbshipit-source-id: 80d05ec1185719b785f0aac914bc2369c1174f2f
2018-12-10 22:36:20 -08:00
48a361cc62 Clean up casting ops (#14947)
Summary:
This removes FloatToInt-style names, replacing them with just the destination
name (e.g. FloatToInt -> Float). This makes it more consistent with the
syntax and makes it easier to add type conversions (just add a new
prim::Int op, for instance).

None of these ops get serialized, so this should not affect loading of
old models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14947

Differential Revision: D13408409

Pulled By: zdevito

fbshipit-source-id: d773fe863f14d9de893f686832769f8cc8903a8e
2018-12-10 22:15:08 -08:00
cff509e2b1 share code between adagrad and rowwise adagrad tests (#14692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14692

Remove some code duplication

Reviewed By: chocjy

Differential Revision: D13296731

fbshipit-source-id: 5924e037ca64fc4b89234be922bc5ca47fb8bd32
2018-12-10 22:10:39 -08:00
c48b15e41a TBB task graph (#15041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15041

Adding an alternative implementation of a task graph based on TBB

Reviewed By: dmudiger

Differential Revision: D13412517

fbshipit-source-id: f5efedd680bbe0072bf38d504e5682ab51dd630f
2018-12-10 21:35:04 -08:00
45dfc6764e Enable more caffe2 fp16 rocm tests (#15040)
Summary:
cc rohithkrn petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15040

Reviewed By: houseroad

Differential Revision: D13413068

Pulled By: bddppq

fbshipit-source-id: b2967f16f8da0b9e80083138fb8632c14e9e9b63
2018-12-10 21:30:21 -08:00
5022f9d6ef Enable the build of tests in ATen/core (#15032)
Summary:
Otherwise they won't build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15032

Reviewed By: yinghai

Differential Revision: D13409801

Pulled By: houseroad

fbshipit-source-id: 95464aa8f3604835997ba1bb7f3c3e51485d1686
2018-12-10 21:24:54 -08:00
962b82dd81 More scaffolding for LegacyTHDispatch. (#14852)
Summary:
1) at::functions are now also exposed in the at::legacy::th namespace and we move relevant calls over to use them (to avoid merge conflicts)
2) LegacyTHDispatch now handles device-type initialization
3) We generate derived LegacyTHDispatchers, e.g. THLegacyCPULongDispatcher, although they are currently empty.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14852

Reviewed By: ezyang

Differential Revision: D13360852

Pulled By: gchanan

fbshipit-source-id: af6705aeba3593ea5dba9bfc62890e5257bc81f8
2018-12-10 19:57:01 -08:00
e9cd781681 Back out "Revert D13043261: [caffe2] Task graph and task future abstractions in executor"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15030

Reviewed By: bddppq

Differential Revision: D13408998

fbshipit-source-id: 9eb675e09fbc4829eab34df7aa660a0590816feb
2018-12-10 19:30:58 -08:00
83f32eebd9 Tensor construction codemod - 2/3 (#14836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14836

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335176

fbshipit-source-id: 8d89510670e2cf70559d2f75e68f7181feb0b6d9
2018-12-10 19:30:56 -08:00
5222a1b190 Fixing reading of FBGEMM from env variables
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15023

Reviewed By: orionr

Differential Revision: D13406778

Pulled By: pjh5

fbshipit-source-id: 2265f01170fb7969cbdf4e44ca6ef183f5d8017d
2018-12-10 18:18:38 -08:00
a97cf568a4 Alignas Array struct (#14920)
Summary:
This PR aligns the Array struct such that cuda vector performance improvements can be utilized.

I tested this by using it on our Philox header. Note how the vector store instruction gets used for CUDA vector types and for Array with alignas, but not for Array without alignas.

With cuda vector type (uint4, uint2, float4): https://godbolt.org/z/UaWOmR
With alignas: https://godbolt.org/z/Eeh0t5
Without alignas: https://godbolt.org/z/QT63gq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14920

Differential Revision: D13406751

Pulled By: soumith

fbshipit-source-id: 685b1010ef1f576dde30c278b1e9b642f87c843d
2018-12-10 17:58:03 -08:00
7e2b074219 Integrate rocBLAS fp16 api into Caffe2 (#14882)
Summary:
This PR integrates rocBLAS half and mixed precision APIs in to Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14882

Differential Revision: D13407840

Pulled By: bddppq

fbshipit-source-id: 75cb0d74da066776fa66575f1d255e879d36121e
2018-12-10 17:54:06 -08:00
92f3616f36 Fix old tensor CopyFrom usage in boolean mask operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15025

Differential Revision: D13407323

Pulled By: bddppq

fbshipit-source-id: 1bc1d28ad0c6c71d25d788549be18917e393ee50
2018-12-10 17:23:45 -08:00
4fcc2fffc3 unit test with multiple omp threads (#14958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14958

Test with multiple threads

Reviewed By: jianyuh

Differential Revision: D13394791

fbshipit-source-id: 931a6c3bda15ebc816807e537dd0841c383e7a6f
2018-12-10 17:23:44 -08:00
9b272c08cf Remove partially initialized Tensor in Deserialization (#14197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13642

Previously we passed a partially initialized Tensor into Deserialize, and it filled
it with the result of deserializing a tensor proto. Now we want it to return
a Tensor directly, since a Tensor is just a shared pointer to TensorImpl.

Reviewed By: dzhulgakov

Differential Revision: D12874357

fbshipit-source-id: 12b80a763375da23cfa64a74d6bc186d8d03b94f
2018-12-10 17:17:29 -08:00
4a145cd95c Revert D13043261: [caffe2] Task graph and task future abstractions in executor
Differential Revision:
D13043261

Original commit changeset: d89424354aea

fbshipit-source-id: b307e3281c4d83b60ba2bfadcbcf69afb7a41412
2018-12-10 16:03:59 -08:00
0a36fe565d apply() for ScriptModules (#14655)
Summary:
This can be used to initialize state that is implementation-specific or not eligible for serialization. Concretely, I'm going to use this to pack the weight matrices for quantized Linear modules according to the FBGEMM APIs.
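
A hedged sketch of the intended usage; `prepack` is a hypothetical callback, not part of the PR:

```
import torch

class M(torch.jit.ScriptModule):
    def __init__(self):
        super(M, self).__init__()
        self.fc = torch.nn.Linear(4, 4)

    @torch.jit.script_method
    def forward(self, x):
        return self.fc(x)

def prepack(module):
    # initialize implementation-specific state here (e.g. pack weight
    # matrices for a quantized backend); nothing gets serialized
    pass

m = M()
m.apply(prepack)   # visits m and all of its submodules
```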
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14655

Differential Revision: D13404438

Pulled By: jamesr66a

fbshipit-source-id: 2d327cef5520fdd716b5b1b29effd60a049e8a4a
2018-12-10 15:40:31 -08:00
9bbb3efe2f Simplify THPPointer implementation for Storage. (#14897)
Summary:
We've virtualized the destructor for storage, so we
no longer have to forward to a particular backend.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14897

Differential Revision: D13399216

Pulled By: ezyang

fbshipit-source-id: 531d29c3f278477cfa8759f30ab4f304d695b659
2018-12-10 15:18:49 -08:00
23cc3daabd Disable getNumGPUs rewrite (#14993)
Summary:
cc iotamudelta

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14993

Differential Revision: D13405804

Pulled By: ezyang

fbshipit-source-id: c4aa9ed29ee2a4f3abf76c1e0fa8babfd738db35
2018-12-10 15:13:55 -08:00
6ad9f7b798 Fix include path for WrapDimMinimal.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14794

Reviewed By: dzhulgakov

Differential Revision: D13336842

fbshipit-source-id: ca49a9fd1d409d8a75e43eeb9b9b02c305ebb79a
2018-12-10 15:10:03 -08:00
279ec9ef7a Move WrapDimMinimal to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14793

Reviewed By: ezyang

Differential Revision: D13336841

fbshipit-source-id: 4365a799e1856cc68dd94a273e97663fee5f51db
2018-12-10 15:10:01 -08:00
66315ab323 Stop disabling maybeOverlappingIndices (#14999)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14999

Differential Revision: D13405754

Pulled By: ezyang

fbshipit-source-id: 98459496494390ad1115b4f1f6738d53c14f0745
2018-12-10 15:02:08 -08:00
483ba553bd add gloo allgather support on GPU (#14576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14576

as titled

Reviewed By: pietern

Differential Revision: D13266063

fbshipit-source-id: e262f77d63724a7504a7112907bbfba49612fe75
2018-12-10 14:32:54 -08:00
029600813e Task graph and task future abstractions in executor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14116

Reviewed By: dmudiger

Differential Revision: D13043261

fbshipit-source-id: d89424354aea14d1d14eb8320fb3aa34908a4e81
2018-12-10 14:28:56 -08:00
a51fe386c8 caffe2/caffe2/contrib/script (#15007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14979

att

Reviewed By: dzhulgakov

Differential Revision: D13286191

fbshipit-source-id: b8a6bc7aea44487aea4dcf7f44c858fd30c6293c
2018-12-10 14:23:31 -08:00
25144c8a09 s/Torch Script/TorchScript/g (#15011)
Summary:
pls
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15011

Differential Revision: D13404158

Pulled By: suo

fbshipit-source-id: e906281463d65c86e4e9073eb0c0a26f4f29e307
2018-12-10 13:48:24 -08:00
110ccbb689 Improve the docs of interpolate(align_corners=) (#14806)
Summary:
ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14806

Reviewed By: ailzhang

Differential Revision: D13366332

Pulled By: ppwwyyxx

fbshipit-source-id: 08fcea95d5c86b11cdfe464fdd9daa50050871f1
2018-12-10 12:50:38 -08:00
e77de07448 Improve build time of register_symbols.cpp without compiler hacks (#14911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14911

In optimized modes the compiler tries to inline all the
`unordered_map::operator[]` calls, creating a massive amount of code
which takes several minutes to optimize. Instead, create a table of
PODs and populate the maps using a simple loop.

Reviewed By: soumith, luciang

Differential Revision: D13382948

fbshipit-source-id: b6752921e0f7213595d26b39e4397f6a3897960b
2018-12-10 11:57:11 -08:00
18c93b87c2 Delete defunct THP_API.h header. (#14899)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14899

Differential Revision: D13383687

Pulled By: ezyang

fbshipit-source-id: f2a08a769cc3775ba55f9c58d622a83df622d816
2018-12-10 10:47:24 -08:00
1989157eb6 Disable test_leaf_variable_sharing on ASAN runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15001

Reviewed By: orionr

Differential Revision: D13399119

fbshipit-source-id: 6b1d098e55a67b1f5bc6d08a8ee3c1be8234a654
2018-12-10 10:43:05 -08:00
d30b6bf3b6 Revert D13306052: [pytorch][PR] Allow converting CharTensor to np arrays
Differential Revision:
D13306052

Original commit changeset: 202d038f139c

fbshipit-source-id: 11f6bdd687f8ea5ce2e5f28f48d19449a5c403eb
2018-12-10 10:36:17 -08:00
dc1e6d0b98 Non-INTERFACE AT_LINK_STYLE is dead code (#14822)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14822

Differential Revision: D13355574

Pulled By: ezyang

fbshipit-source-id: a7173084f8735424619b2e393df2715a05918b44
2018-12-10 09:42:53 -08:00
54d5c53826 Support torch.load with encoding (#14743)
Summary:
Addresses a common compatibility issue when loading Py2 checkpoints in Py3, regarding bytes.

E.g.,
[1] https://github.com/pytorch/pytorch/issues/5994,
[2] https://github.com/CSAILVision/places365/issues/25,
[3] https://discuss.pytorch.org/t/how-to-load-a-saved-model-trained-on-pytorch-0-3-1-python-2-7-on-pyorch-1-0-python-3-7/31212
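
A short example of the new keyword (the path is a placeholder):

```
import torch

# A checkpoint pickled under Python 2 stores str objects as bytes;
# 'latin1' round-trips all byte values, so it is the usual choice
state = torch.load("py2_checkpoint.pt", encoding="latin1")
```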
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14743

Reviewed By: weiyangfb

Differential Revision: D13350888

Pulled By: soumith

fbshipit-source-id: 2df4e828a8b70509118a355307ca3ebe51e108f6
2018-12-10 08:07:36 -08:00
9b2bd284b3 Convert int8 numpy array to CharTensor (#14700)
Summary:
When rewriting `default_collate`, I noticed that `from_numpy`, `as_tensor`, and `tensor` all fail on `np.int8` arrays.
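
A quick sketch of what now works:

```
import numpy as np
import torch

a = np.array([-1, 0, 1], dtype=np.int8)
t = torch.from_numpy(a)       # now yields a CharTensor (torch.int8)
assert t.dtype == torch.int8
torch.as_tensor(a)            # shares memory, same dtype
torch.tensor(a)               # copies, same dtype
```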
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14700

Reviewed By: weiyangfb

Differential Revision: D13305297

Pulled By: soumith

fbshipit-source-id: 2937110f65ed714ee830d50098db292238e9b2a9
2018-12-10 07:39:06 -08:00
e1b5dbf699 Allow converting CharTensor to np arrays (#14710)
Summary:
The other direction of #14700

cc soumith
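
A sketch of the reverse direction (note the revert of D13306052 that appears earlier in this log, since newer commits are listed first):

```
import torch

t = torch.tensor([-1, 0, 1], dtype=torch.int8)
a = t.numpy()                 # now yields an np.int8 array
```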
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14710

Reviewed By: weiyangfb

Differential Revision: D13306052

Pulled By: soumith

fbshipit-source-id: 202d038f139cf05e01069ff8d05268c66354c983
2018-12-10 07:35:28 -08:00
b039a715ce pre-pack operation of dnnlowp conv with 16-bit accumulation (#14881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14881

This diff allows us to pre-quantize and pre-pack the weight matrix used in DNNLOWP_ACC16.
The intended use pattern is to run Int8ConvPackWeight in init_net to generate a packed weight, which Int8Conv with the DNNLOWP_ACC16 engine then uses.
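
A heavily hedged sketch of that pattern; the operator names come from this diff, but the argument lists here are hypothetical:

```
from caffe2.python import core

init_net = core.Net("init")
# pre-quantize and pre-pack the weight once, at init time
init_net.Int8ConvPackWeight(["W"], ["W_packed"], engine="DNNLOWP_ACC16")

pred_net = core.Net("pred")
# the conv then consumes the packed weight on every run
pred_net.Int8Conv(["X", "W_packed", "b"], ["Y"],
                  kernel=3, engine="DNNLOWP_ACC16")
```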

Reviewed By: csummersea

Differential Revision: D13374662

fbshipit-source-id: dd02b9a4eb7af1fe208aa857fcd0b445e6e395af
2018-12-10 01:08:21 -08:00
e747acbebb Respect -q of setup.py (#14972)
Summary:
1. Changes the prints along the 'rebuild' pathway to respect the '-q' flag of setup.py
A clean rebuild now only prints:

    [zdevito@devgpu172.prn2 /data/users/zdevito/pytorch] python setup.py -q rebuild develop
    [0/1] Install the project...
    -- Install configuration: "RelWithDebInfo"
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.

2. Deletes apparently dead calls to `generate_code`. Now that CMake builds these files,
it appears that it is getting called twice and the second version is never used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14972

Reviewed By: soumith

Differential Revision: D13396330

Pulled By: zdevito

fbshipit-source-id: 83c45143bbc6a6d2c1cfee929291ec059f2b5dc3
2018-12-09 22:47:49 -08:00
fab8085111 _get_device_index supports parsing device strings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14929

Reviewed By: weiyangfb

Differential Revision: D13394498

Pulled By: soumith

fbshipit-source-id: 948c6118abdf6c1e1a8a17709333954cafb2345e
2018-12-09 21:12:46 -08:00
5fd69e7551 remove mingfeima mkldnn reference from README, as no longer necessary (#14975)
Summary: we now get mkldnn automatically from third_party/ideep

Differential Revision: D13396480

Pulled By: soumith

fbshipit-source-id: 20f819ba4b78cbe9c7d0baeab1c575669cbf6c20
2018-12-09 20:44:10 -08:00
aefc83f46d fixing some rebuild issues (#14969)
Summary:
This fixes rebuild issues with the ninja part of the build. With this patch all ninja files will now report `nothing to do` if nothing has changed assuming `BUILD_CAFFE2_OPS=0`.

1. This only does the Python file processing for caffe2 when BUILD_CAFFE2_OPS=1. That part of the build is written in such a way that it always reruns and can take substantial time moving files around in a no-op build. In the future it should be rewritten to use a faster method of copying the files, or should treat copying the files as part of the build rules and only run when the files are out of date.

2. This points `sleef` to a patched version that fixes a dead build output that is causing everything to relink all the time. See https://github.com/shibatch/sleef/pull/231#partial-pull-merging for the upstream change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14969

Reviewed By: soumith

Differential Revision: D13395998

Pulled By: zdevito

fbshipit-source-id: ca85b7be9e99c5c578103c144ef0f2c3b927e724
2018-12-09 16:32:19 -08:00
fc30e2782c Remove deprecated info argument in btrifact (#14935)
Summary:
As specified in title.
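
For reference, a short sketch of the call sites after this change:

```
import torch

A = torch.randn(2, 3, 3)
LU, pivots = torch.btrifact(A)                   # the `info=` argument is gone
LU, pivots, info = torch.btrifact_with_info(A)   # use this if you need info
```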
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14935

Differential Revision: D13394449

Pulled By: soumith

fbshipit-source-id: 569d59414f3a1a43ea641bded4b5433eb53e3490
2018-12-09 15:59:30 -08:00
86e03b8a30 add fix for CUDA 10 (#14971)
Summary:
Linux binaries-only fix for CUDA10
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14971

Differential Revision: D13395932

Pulled By: soumith

fbshipit-source-id: a72d6ab6b98c6c936e6391d55d2e4e45b9f1e6dd
2018-12-09 15:54:27 -08:00
5f2736b84a Fix mismatched test_{full,ones,zeros}_like onnx expect files (#14956)
Summary:
Fixes master breakage from #14903.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14956

Differential Revision: D13395363

Pulled By: bddppq

fbshipit-source-id: 31f0913843292e557807fd5a976f8907fa6cae4b
2018-12-09 08:57:14 -08:00
a1494efdfa fix auto grad summing for IfOp where intermediate output needs renaming (#14772)
Summary:
Fix auto grad summing for IfOp where an intermediate output needs renaming.

Bug before this diff:
- we only renamed the output of IfOp without changing the subnet ops' outputs
- this resulted in a blob-not-found error

The unit test provides an example; this diff fixes that for IfOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14772

Differential Revision: D13327090

Pulled By: harouwu

fbshipit-source-id: ec40ee88526ace3619c54551e223dd71158a02f8
2018-12-09 08:26:46 -08:00
fa12e1e4d4 Export ones_like, zeros_like and full_like using ONNX ConstantLike op. (#14903)
Summary:
This PR does the following:
1) Updates the ONNX export for the `torch.zeros_like` and `torch.full_like` ops to use the ONNX op `ConstantLike`. This reduces the export of the experimental op `ConstantFill`, which may be removed in the future (see https://github.com/onnx/onnx/pull/1434).
2) It also adds export support for `torch.ones_like`.
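
A minimal export sketch of the ops this touches:

```
import io
import torch

class OnesLike(torch.nn.Module):
    def forward(self, x):
        return torch.ones_like(x)

buf = io.BytesIO()
# ones_like / zeros_like / full_like now map to the ONNX ConstantLike
# op instead of the experimental ConstantFill
torch.onnx.export(OnesLike(), torch.randn(2, 3), buf)
```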
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14903

Differential Revision: D13383700

Pulled By: houseroad

fbshipit-source-id: 566d00a943e9497172fcd5a034b638a650ab13a2
2018-12-08 22:49:02 -08:00
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
a7b3197b2d race condition fix of calling mutable_data inside a openmp region (#14921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14921

Fix race condition introduced in D13188595 .
Let's remind ourselves: never call mutable_data from an OpenMP region!!!

Reviewed By: jianyuh

Differential Revision: D13387692

fbshipit-source-id: 6a3aeedeeda55a9ede660de8f1f44d4eee76ae2b
2018-12-08 18:17:20 -08:00
e9db9595d2 Add crop argument, can crop rec as well, first resize and then crop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14894

Reviewed By: llyfacebook

Differential Revision: D13377604

Pulled By: sf-wind

fbshipit-source-id: 333d0d864e6c2dc85f405baa25ed58029d62750f
2018-12-08 11:14:56 -08:00
b0909ea6a0 Switch Int8Sigmoid to QNNPACK (#14883)
Summary:
50x-100x speedup compared to the current version.
Also fixes a bug in the current version when the batch size exceeds 1 (it processes only the first image in that case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14883

Differential Revision: D13390655

Pulled By: Maratyszcza

fbshipit-source-id: 1b33a97bf2d0866d38faa2b42e64fd2859017898
2018-12-08 02:47:29 -08:00
5e06fa0baf ONNX changes to use int32_t (instead of enum) to store data type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14926

Reviewed By: houseroad

Differential Revision: D13390642

Pulled By: bddppq

fbshipit-source-id: c2314b24d9384f188fda2b9a5cc16465ad39581e
2018-12-08 01:06:08 -08:00
c8a5ec14dd Remove at references from c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14432

Reviewed By: dzhulgakov

Differential Revision: D13223904

fbshipit-source-id: 43b06e33e088e7789ccea6d92267936fe30d8571
2018-12-08 00:28:35 -08:00
25110d61fb Implement std for multiple dimensions on CPU devices. (#14535)
Summary:
Tested on a tensor with 1 billion elements and 3 dimensions on a powerful, highly
multi-core Linux machine.

parallelized: All operations (e.g., `t.std(1)`) that could be done in the old code are now several times faster. All
new operations (e.g., `t.std((0, 2))`) are significantly faster than the NumPy equivalents.
`t.std((0, 1, 2))`, a new operation, is logically equivalent to the
old `t.std()`, but faster.

serial: The above comment about old operations now being faster still
holds, but `t.std((d1, ..., dn))` is now a few
times slower than `t.std()`. If this turns out to be important, we can
special-case that to use the old algorithm.

The approach is to create a new method, `TensorIterator::foreach_reduced_elt`,
valid for `TensorIterator`s that represent a dimension reduction. This
method calls a supplied function for each element in the output,
supplying it with the input elements that correspond to that output.

Given that primitive, we can implement reductions like the following pseudocode:

If there is more than one output element:
```
PARALLEL FOR EACH element IN output:
    accumulator = identity
    SERIAL FOR EACH data_point IN element.corresponding_input:
        accumulator.update(data_point)
    element = accumulator.to_output()
```

If there is only one output element, we still want to parallelize, so we
do so along the *input* instead:

```
accumulators[n_threads]
PARALLEL FOR EACH input_chunk IN input.chunks():
    accumulators[thread_num()] = identity
    SERIAL FOR EACH data_point IN input_chunk:
        accumulators[thread_num()].update_with_data(data_point)
accumulator = identity
SERIAL FOR EACH acc in accumulators:
    accumulator.update_with_other_accumulator(acc)
output_element = accumulator.to_output()
```

Note that accumulators and data points do not have to be the same type
in general, since it might be necessary to track arbitrary amounts of
data at intermediate stages.

For example, for `std`, we use a parallel version of Welford's
algorithm, which requires us to track the mean, second moment, and number
of elements, so the accumulator type for `std` contains three pieces of
data.
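
A Python sketch of the accumulator-merge step (the `update_with_other_accumulator` in the pseudocode above), using the standard parallel Welford combination; the names are illustrative:

```
# each accumulator tracks (mean, m2, n); two of them combine like so
def merge(a, b):
    (mean_a, m2_a, n_a), (mean_b, m2_b, n_b) = a, b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return mean, m2, n    # std = sqrt(m2 / (n - 1))
```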
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14535

Differential Revision: D13283887

Pulled By: umanwizard

fbshipit-source-id: 8586b7bf00bf9f663c55d6f8323301e257f5ec3f
2018-12-07 20:16:04 -08:00
c2a75926ca Add CAFFE2_API to video processing functions (#14900)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/13733

Some tests were failing because these methods didn't have an export.

cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14900

Reviewed By: pjh5

Differential Revision: D13381130

Pulled By: orionr

fbshipit-source-id: 030536f8fb09765c09a7b0bd45400161053f2e18
2018-12-07 19:55:21 -08:00
52942e1f09 Enable unit tests known to work on ROCm (#14011)
Summary:
* Enable unit tests known to work on ROCm.
* Disable a few that are known to be flaky for the time being.
* Use std::abs for Half
* No more special casing for ROCm in TensorMathReduce
* Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce

ezyang bddppq for awareness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011

Differential Revision: D13387679

Pulled By: bddppq

fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71
2018-12-07 18:57:32 -08:00
5be28ade66 Automatic update of fbcode/onnx to aca8473a40cf43f01958c81b648efcee7f3a755a (#14865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14865

Previous import was 42804705bdbf179d1a98394008417e1392013547

Included changes:
- **[aca8473](https://github.com/onnx/onnx/commit/aca8473)**: Add Erf operator for computing error function (#1675) <bddppq>
- **[3fc82ca](https://github.com/onnx/onnx/commit/3fc82ca)**: Add IsNaN operator. (#1656) <Pranav Sharma>
- **[0685f01](https://github.com/onnx/onnx/commit/0685f01)**: Add Sign Op (#1658) <Rui Zhu>
- **[2a8fae8](https://github.com/onnx/onnx/commit/2a8fae8)**: Fix unused var warning (#1669) <Yinghai Lu>
- **[e212833](https://github.com/onnx/onnx/commit/e212833)**: Update scan (#1653) <G. Ramalingam>

Reviewed By: zrphercule

Differential Revision: D13370727

fbshipit-source-id: 13a93d5acc8d4758f682278ea162ec9124ced22d
2018-12-07 17:37:42 -08:00
11a9248d01 Enable fp16 for MIOPEN operators in Caffe2 (#14905)
Summary:
This PR enables fp16 MIOPEN operators in Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14905

Differential Revision: D13383439

Pulled By: bddppq

fbshipit-source-id: 840afa8d08bef2952ca0039dee2423f1542bb330
2018-12-07 17:26:44 -08:00
70598740ec Upgrade MKL-DNN to version 0.17 (#14308)
Summary:
Upgrade MKL-DNN to version 0.17 and update the mkldnn bridge to the latest version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14308

Differential Revision: D13383102

Pulled By: yinghai

fbshipit-source-id: c434f0e0ddff2ee2c86db2d6c44a37298fd005a3
2018-12-07 16:44:50 -08:00
478eb70c07 Fix build with OpenCV 4.0 (#14356)
Summary:
Fixes #14355
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14356

Differential Revision: D13356237

Pulled By: bddppq

fbshipit-source-id: 2bf6ee21995c2c7b617c4e78ea7341f975f1b937
2018-12-07 16:40:31 -08:00
4453a1ff88 Remove unused TensorImpl dependencies
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14792

Reviewed By: ezyang

Differential Revision: D13336843

fbshipit-source-id: 12f84799a70c2e90a8b934dd8dc031c09a6782f0
2018-12-07 16:23:48 -08:00
65aa11a876 Remove TensorImpl -> context_base dependency (#14658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14658

Remove this dependency by moving at::CopyBytes to c10.
The implementations for at::CopyBytes will have to live in aten/caffe2 for now because they're not unified for CUDA yet.
They'll be moved into c10/backend/xxx later.

Reviewed By: dzhulgakov

Differential Revision: D13288655

fbshipit-source-id: 1c92379345308b3cd39a402779d7b7999613fc0d
2018-12-07 16:23:46 -08:00
086a37876b Fix include paths for TensorOptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14747

Reviewed By: ezyang

Differential Revision: D13318645

fbshipit-source-id: f5ba77a93f6019fbf5faffb47a2837c95fad474d
2018-12-07 16:23:44 -08:00
459aac4f24 Update graph printouts in JIT docs (#14914)
Summary:
Tracing records variable names and we have new types and stuff in the IR, so this updates the graph printouts in the docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14914

Differential Revision: D13385101

Pulled By: jamesr66a

fbshipit-source-id: 6477e4861f1ac916329853763c83ea157be77f23
2018-12-07 15:08:53 -08:00
5734e96775 Improve hub documentation (#14862)
Summary:
Added a few examples and explanations of how to publish/load models.
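
For instance, loading a published model looks like this (repo and entrypoint names assumed from torchvision's hubconf):

```
import torch

# downloads pytorch/vision's hubconf.py and calls its resnet18 entrypoint
model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
```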
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14862

Differential Revision: D13384790

Pulled By: ailzhang

fbshipit-source-id: 008166e84e59dcb62c0be38a87982579524fb20e
2018-12-07 14:59:01 -08:00
65da7ddad6 USE_FBGEMM=True by default
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14868

Differential Revision: D13383390

Pulled By: jamesr66a

fbshipit-source-id: 1880c07dfd239e19153bd4fde2ab2c8d0604f956
2018-12-07 14:22:55 -08:00
a0ee3a279c USE_TENSORRT support and TensorRT 5 compatibility
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13945

Differential Revision: D13317525

Pulled By: yinghai

fbshipit-source-id: 8630dfec1bbc5aac19539e344e7c38a7fd8b051d
2018-12-07 14:01:11 -08:00
febc7ff99f Add __init__.py so files get picked up on install (#14898)
Summary:
This will let us install tests and other Caffe2 python code as a part of running Caffe2 tests in PyTorch.

Broken out of https://github.com/pytorch/pytorch/pull/13733/

cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14898

Reviewed By: pjh5

Differential Revision: D13381123

Pulled By: orionr

fbshipit-source-id: 0ec96629b0570f6cc2abb1d1d6fce084e7464dbe
2018-12-07 13:40:23 -08:00
efc5e9f71a Replace calls of Type::_th_tensor. (#14877)
Summary:
_th_tensor is moving off Type, so these calls need to be replaced.

Unfortunately, replacing these with a full-fledged solution [e.g. from_storage(..., TensorOptions)] is a bit complicated because the storage itself fully defines the Type (modulo variable).  It's simpler to just wait for the Variable/Tensor merge rather than to solve this now, so instead I changed the call sites to: at::empty({0}, type.options()).set_(storage...).

This isn't great because we are also trying to get rid of Type::options, but this seems to be the lesser of two evils.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14877

Differential Revision: D13374310

Pulled By: gchanan

fbshipit-source-id: eb953ed041507e6190d6f32e383912e5a08311cd
2018-12-07 13:04:48 -08:00
d6c53328f9 Large scale fix of python-related files in torch/csrc/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14515

Differential Revision: D13247966

Pulled By: goldsborough

fbshipit-source-id: 7a127c508fc576a7a92626dd6b729f660162d628
2018-12-07 13:04:46 -08:00
939877bf4b Implementation of WeightedSum op for mkl-dnn and fix FC op output shape issue.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14407

Reviewed By: yinghai

Differential Revision: D13364364

Pulled By: wesolwsk

fbshipit-source-id: e69bcd1bc52e35b2f0e45e5dc40184f1bd66605d
2018-12-07 12:35:19 -08:00
265b55d028 Revert D13205604: Move numa.{h, cc} to c10/util
Differential Revision:
D13205604

Original commit changeset: 54166492d318

fbshipit-source-id: 89b6833518c0b554668c88ae38d97fbc47e2de17
2018-12-07 10:01:25 -08:00
1c9df7facf Expose torch.roll function and method (#14880)
Summary: Fixes #14859.
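
A quick usage sketch of the newly exposed function and method:

```
import torch

x = torch.arange(6).view(2, 3)
torch.roll(x, shifts=1, dims=1)   # roll each row right by one
x.roll(1)                          # no dims: flatten, roll, restore shape
```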

Differential Revision: D13376915

Pulled By: zou3519

fbshipit-source-id: f1fc0e8492a159431a3fc0a19a41aa10429ecc80
2018-12-07 07:42:47 -08:00
6651fae827 Make autograd engine compatible with hip
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14873

Differential Revision: D13375053

Pulled By: bddppq

fbshipit-source-id: f3051640386667bbf0566856ed433eb83276c39e
2018-12-07 00:12:06 -08:00
6e453e56f9 Fixed ConvT docstring (#14876)
Summary:
Fixes #14099

I attempted to be as consistent as possible with the formatting, which is why my equation reads d*(k - 1) instead of (k - 1)*d.

Also there is an unused variable on line 46: `n = self.in_channels`. I could fix that here too if that's not too out of scope.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14876

Differential Revision: D13374317

Pulled By: soumith

fbshipit-source-id: a9f110acafa58cdb4206956dbe3ab4738d48292d
2018-12-06 23:57:30 -08:00
51d26e76f7 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 7da015701f18f8a0b5a8092aae02a42ede7bfd44
2018-12-06 22:52:22 -08:00
4655b7bc4b Remove weak module test expect files (#14871)
Summary:
This PR removes some expect files that aren't really testing anything
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14871

Differential Revision: D13373762

Pulled By: driazati

fbshipit-source-id: e3537ee83df23b3b3b854f9b1253fd0cc8e9dd33
2018-12-06 21:55:12 -08:00
1a247f872f gradcheck (#14596)
Summary:
- allow gradcheck to take a sparse tensor as input
- sparse output is not allowed yet in gradcheck
- add backward for `to_dense()` to get around the sparse-output restriction
- call gradcheck in test_sparse, so that we can use `_gen_sparse()` and easily cover coalesced / uncoalesced test cases (see the sketch below)
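
A minimal sketch of the pattern this enables (exact gradcheck flags may differ across versions):

```
import torch
from torch.autograd import gradcheck

i = torch.tensor([[0, 2]])
v = torch.tensor([2.0, 3.0], dtype=torch.double)
x = torch.sparse_coo_tensor(i, v, (4,), requires_grad=True)
# sparse input is now accepted; sparse output is not, so densify via
# to_dense(), which gains a backward in this PR
gradcheck(lambda t: t.to_dense(), (x,))
```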
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14596

Differential Revision: D13271904

Pulled By: weiyangfb

fbshipit-source-id: 5317484104404fd38058884c86e987546011dd86
2018-12-06 18:03:38 -08:00
bfa666eb0d Skipping two c10d tests only if there are multi-GPUs (#14860)
Summary:
Otherwise, these tests will fail, even though they are never meant to run on single-GPU machines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14860

Differential Revision: D13369060

Pulled By: teng-li

fbshipit-source-id: 8a637a6d57335491ba8602cd09927700b2bbf8a0
2018-12-06 17:28:07 -08:00
ada8f828f9 Move TensorOptions, DefaultTensorOptions to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14746

Reviewed By: ezyang

Differential Revision: D13318644

fbshipit-source-id: b703d7dc67e75d9e9571c80d62a100c5fc4e84df
2018-12-06 15:59:04 -08:00
bd3eb87258 Switch Int8MaxPool operator to QNNPACK (#14832)
Summary:
1.6-2.4X speedup on ARM when compiled with gcc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14832

Differential Revision: D13358160

Pulled By: Maratyszcza

fbshipit-source-id: 39e9791886fac62650bb53a9df341889f0bb5d49
2018-12-06 15:14:28 -08:00
e6a420114f collect_env.py: get conda magma and mkl information (#14854)
Summary:
Fixes #12371
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14854

Differential Revision: D13363635

Pulled By: zou3519

fbshipit-source-id: f8b5d05038bf5ce451399dfeed558ae298178128
2018-12-06 14:58:14 -08:00
ddca0442b6 Add LogSigmoid support in ONNX symbolic (#14830)
Summary:
Add LogSigmoid:

torch.LogSigmoid(x) = onnx.Log(onnx.Sigmoid(x))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14830

Differential Revision: D13353891

Pulled By: zrphercule

fbshipit-source-id: bf456170b9e6c4edad07b3333cd5797f8e0fa97f
2018-12-06 14:17:33 -08:00
5f0bff9639 Kill GPU memory logs in normal runs (#14838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14838

The GPU memory tracking logs are incredibly annoying and merely serve
to pollute output. I `VLOG(1)`ed them. Hopefully, this is non-controversial.

Reviewed By: kuttas

Differential Revision: D13343290

fbshipit-source-id: b3cae99346c97b66e97ea660061e15dc5c99b9fc
2018-12-06 13:51:14 -08:00
f82f4de229 Stop inserting static casts in Hipify (#14853)
Summary:
The latest hcc can now properly cast to the correct type internally, so there is no need to insert static_cast in the hipify scripts anymore.
However, the hcc included in the latest ROCm release (1.9.2) doesn't have this fix, so we are leaving a flag to keep doing static_cast for those using the official ROCm releases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14853

Differential Revision: D13363171

Pulled By: bddppq

fbshipit-source-id: a36476a8511222ff3c933d31788e8a0ffb04f5ca
2018-12-06 13:19:33 -08:00
b5db6ac9f1 Tensor construction codemod - 3/3 (#14835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14835

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335184

fbshipit-source-id: 26d8247e16b30bdff045530034af9b72c76d066f
2018-12-06 11:50:59 -08:00
20d1bff292 Tensor construction codemod - 1/3 (#14828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14828

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335160

fbshipit-source-id: a3ae4c5a86bfbdaf2d5aa14e0eef57255e829fd4
2018-12-06 11:47:32 -08:00
1d111853ae Move numa.{h, cc} to c10/util (#14393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14393

att

Reviewed By: ezyang

Differential Revision: D13205604

fbshipit-source-id: 54166492d31827b0343ed070cc36a825dd86e2ed
2018-12-06 11:30:13 -08:00
75a2d8e2de Upgrade CI to ROCm 1.9.2 (#14216)
Summary:
Drop the custom hcc/hip, as the 1.9.2 release should contain the relevant patches.

The most notable feature in 1.9.2 is mixed-precision support in rocBLAS and MIOpen. These features will be enabled by subsequent PRs.

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14216

Differential Revision: D13354294

Pulled By: bddppq

fbshipit-source-id: 2541d4a196af21c9432c1aff7f6e65b572628028
2018-12-06 10:13:39 -08:00
1c8d41a08d Allow linspace and logspace with steps=1 and start != end like numpy (#14748)
Summary:
`torch.linspace(0, 1, 1)` fails with `RuntimeError: invalid argument 3: invalid number of points at ../aten/src/TH/generic/THTensorMoreMath.cpp:2119`, while `np.linspace(0, 1, 1)` works fine.
Looking at the code, there is even a comment by gchanan asking: "NumPy allows you to pass different points even if n <= 1 -- should we?"
I would say "yes". Currently, I would need to handle the case of `steps == 1` or `steps == 0` separately, making sure to change the `end` when calling `torch.linspace`. This is impractical. If we support `steps == 1` with `start != end`, there are two possibilities for the result: either we ensure the first value in the resulting sequence always equals `start`, or we ensure the last value in the resulting sequence always equals `end`. Numpy chose the former, which also allows it to support a boolean `endpoint` flag. I'd say we should follow numpy.

This PR adapts `linspace` and `logspace` to mimic the behavior of numpy, adapts the tests accordingly, and extends the docstrings to make clear what happens when passing `steps=1`.

If you decide against this PR, the error message should become explicit about what I did wrong, and the documentation should be extended to mention this restriction.
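For illustration, the numpy-aligned behavior this PR implements (a minimal sketch; output formatting is approximate):

```python
import torch

# With steps=1 the result contains just `start`, mirroring np.linspace(0, 1, 1).
torch.linspace(0, 1, steps=1)   # tensor([0.])
torch.logspace(0, 1, steps=1)   # tensor([1.]), i.e. 10 ** 0
```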
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14748

Differential Revision: D13356136

Pulled By: ezyang

fbshipit-source-id: db85b8f0a98a5e24b3acd766132ab71c91794a82
2018-12-06 09:30:55 -08:00
Jie
d2fdc33411 (#14580)
Summary:
Removes the cast of half to float in torch.sum with a float16 input tensor and
a float32 output tensor; instead, we cast the data when loading the input in the kernel.

This supposedly saves a kernel launch as well as a full global memory load
of the promoted data type (float).
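A sketch of the user-visible pattern this optimizes (assuming a CUDA device is available):

```python
import torch

x = torch.randn(1024, device='cuda', dtype=torch.float16)
# Accumulation and output are float32; the half input is now cast on load
# inside the reduction kernel instead of in a separate cast pass.
s = x.sum(dtype=torch.float32)
```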
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14580

Differential Revision: D13356203

Pulled By: ezyang

fbshipit-source-id: 85e91225b880a65fe3ceb493371b9b36407fdf48
2018-12-06 09:03:46 -08:00
eb3cabffd6 Consistent formatting in losses' docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14739

Differential Revision: D13356143

Pulled By: ezyang

fbshipit-source-id: 9ae8316dd8ba6e910247b64cec22db63df10e11c
2018-12-06 09:01:24 -08:00
2e7cc86a62 Add (partial) autodiff support for nll_loss (#14305)
Summary:
Not ready yet, need some comments / help with this. It's good enough for https://github.com/pytorch/xla immediate goals (forward + backward trace fusion), but there are at least two issues with it:

1. If we don't allow it, `test/test_jit.py` fails to cover the change.
2. If we allow the weight to be set, running `test/test_jit.py TestJitGenerated.test_nn_nll_loss` fails with:

```
======================================================================
ERROR: test_nn_nll_loss (__main__.TestJitGenerated)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_jit.py", line 10001, in do_test
    fn, f_args_variable, kwargs_variable, no_grad=no_grad)
  File "test/test_jit.py", line 9360, in check_against_reference
    outputs_test = self.runAndSaveRNG(func, recording_inputs, kwargs)
  File "test/test_jit.py", line 425, in runAndSaveRNG
    results = func(*inputs, **kwargs)
  File "test/test_jit.py", line 9298, in script_fn
    self.assertExportImport(CU.the_method.graph, tensors)
  File "test/test_jit.py", line 415, in assertExportImport
    self.assertExportImportModule(m, inputs)
  File "test/test_jit.py", line 419, in assertExportImportModule
    self.assertEqual(self.runAndSaveRNG(m.forward, inputs),
  File "test/test_jit.py", line 425, in runAndSaveRNG
    results = func(*inputs, **kwargs)
RuntimeError:
arguments for call are not valid:

  for operator aten::nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight, *, Tensor out) -> Tensor:
  expected a value of type Tensor for argument 'total_weight' but found bool
  <internally-created-node>
  ~ <--- HERE

  for operator aten::nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight) -> Tensor:
  expected a value of type Tensor for argument 'total_weight' but found bool
  <internally-created-node>
  ~ <--- HERE
for call at:
<internally-created-node>
~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14305

Differential Revision: D13356265

Pulled By: ezyang

fbshipit-source-id: 504d783b2d87f923e698a6a4efc0fd9935a94a41
2018-12-06 08:58:54 -08:00
e7bd8457a6 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 2adbb6f97d4b8f067a2538fec855063510b0ca3f
2018-12-06 08:58:53 -08:00
6039c7611f Updating submodules
Reviewed By: yns88

fbshipit-source-id: e0509413215f3b7578b825c52365fec4da625bd5
2018-12-06 02:55:47 -08:00
12addc64a6 Fixed MIOpen RNN Segfault issue and enabled RNN test (#14810)
Summary:
This pull request contains changes for:
1. Adding the MIOpen RNN APIs miopenGetRNNLayerBiasSize and miopenGetRNNLayerParamSize.
2. Fixing the usage of the API miopenGetRNNLayerParam.
3. Modifying the RNN test to run using the MIOpen engine.

Differential Revision: D13355699

Pulled By: bddppq

fbshipit-source-id: 6f750657f8049c5446eca893880b397804120b69
2018-12-05 23:54:31 -08:00
39d50ef4f6 Export complete subgraph io info when calling onnxGetBackendCompatibility (#14827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14827

We need to send complete IO info when doing `onnxGetBackendCompatibility` to a backend like Glow. Previously we were missing some info because sometimes we generate more than one node from one C2 op. This fixes the issue.

Reviewed By: jackm321

Differential Revision: D13352049

fbshipit-source-id: 8d8ac70656a0ac42f3a0ccecad61456a4f3b2435
2018-12-05 23:52:06 -08:00
ba287eebca Fix clip gradient with empty input (#14709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14709

As titled

Reviewed By: Wakeupbuddy

Differential Revision: D13305554

fbshipit-source-id: 380062d4b0e4f9dc0207a27766cac7b8d05384d5
2018-12-05 22:53:25 -08:00
997df9a6ec Remove protobuf dependency in pytorch cmake file. (#14182)
Summary:
Currently, pytorch doesn't depend on protobuf, so we don't need to include the protobuf dir in the pytorch cmake file.
And if we build caffe2 without custom protobuf[1], we will hit the protobuf mismatch problem.

[1]
92dbd0219f/CMakeLists.txt (L65)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14182

Differential Revision: D13356273

Pulled By: ezyang

fbshipit-source-id: 8120c3452d158dc51d70156433d7b9076c6aed47
2018-12-05 22:49:50 -08:00
3799d32b7b Optimize images (#14084)
Summary:
This is a PR that [ImgBot](https://imgbot.net/) opened on my fork https://github.com/zasdfgbnm/pytorch/pull/1, and I am forwarding it here. ImgBot does lossless compression on images to reduce file size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14084

Differential Revision: D13356293

Pulled By: ezyang

fbshipit-source-id: 731236d95ad870db8ccb99b03ed306704365242c
2018-12-05 22:46:32 -08:00
e27d77815d Prevent profile_observer_test from being run by CPU test (#14168)
Summary:
Fix CMakeLists.txt, so the test for CPU won't run profile_observer_test.cc, as currently it only supports GPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14168

Differential Revision: D13356274

Pulled By: ezyang

fbshipit-source-id: 7d105f2e18675e5fab129864958148b0f18d582c
2018-12-05 22:34:29 -08:00
14fb651b5f CAFFE2_INCLUDE_DIRS points to invalid path (#14306)
Summary:
I know that including CAFFE2_INCLUDE_DIRS in include headers is not necessary for newer cmakes. But I had this in one of my old projects and **cmake gave me an error that "/usr/lib/include" is an invalid path**.

It seems like "${_INSTALL_PREFIX}/lib/include" should be changed to "${_INSTALL_PREFIX}/include", as all caffe2 headers are in /include rather than /lib/include/.

Please correct me if I am wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14306

Differential Revision: D13356246

Pulled By: ezyang

fbshipit-source-id: e2d5d3c42352e59b245714ad90fd7a9ef48170d7
2018-12-05 22:32:04 -08:00
5e307bd1be use "Extension" instead of the unimported "setuptools.Extension" (#14475)
Summary:
use "Extension" instead of the unimported "setuptools.Extension"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14475

Differential Revision: D13356219

Pulled By: ezyang

fbshipit-source-id: 5a3e7eb73a32d6bf09676efd9eddded5586435cd
2018-12-05 22:18:47 -08:00
d393dd0744 generate ATen core files with LF. (#14667)
Summary:
On Windows, some ATen core files (Type.h, Tensor.h, TensorMethods.h) are generated with CRLF line endings (this may be environment-dependent).
The file comparison in generate_outputs() therefore fails and compilation stops.
This patch forces these files to be generated with LF line endings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14667

Differential Revision: D13356170

Pulled By: ezyang

fbshipit-source-id: ef8cc3a6cc8bf3c45b78e9eb3df98cf47c0d33bb
2018-12-05 22:14:29 -08:00
2d60afbc90 Remove outdated css file and refs in cpp conf.py (#14779)
Summary:
pytorch_theme.css is no longer necessary for the cpp or html docs site build. The new theme styles are located at https://github.com/pytorch/pytorch_sphinx_theme. The Lato font is also no longer used in the new theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14779

Differential Revision: D13356125

Pulled By: ezyang

fbshipit-source-id: c7635eb7512c7dcaddb9cad596ab3dbc96480144
2018-12-05 21:55:45 -08:00
82903dda9b Fixes for some Windows compiler warnings (#14490)
Summary:
Implement some simple fixes to clean up the Windows build by fixing compiler warnings. Three main types of warnings were fixed:

1. GCC-specific pragmas were changed to not be used on Windows.
2. cmake flags that don't exist on Windows were removed from the Windows build.
3. A macro that was defined multiple times on Windows was fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14490

Differential Revision: D13241988

Pulled By: ezyang

fbshipit-source-id: 38da8354f0e3a3b9c97e33309cdda9fd23c08247
2018-12-05 21:27:07 -08:00
a6399121da Shut up "address will always evaluate to 'true'" warnings (#14774)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14774

Differential Revision: D13327969

Pulled By: ezyang

fbshipit-source-id: 43380c89eedaaa89467952401b8fd3f5a9ad754a
2018-12-05 21:18:31 -08:00
f9446e0c94 HIPify less files in PyTorch (#14804)
Summary:
Stacked on #14803
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14804

Differential Revision: D13347986

Pulled By: ezyang

fbshipit-source-id: c93177b4ad51855660d0de36d042bfc542bd4be0
2018-12-05 20:52:38 -08:00
ba0ebe33c1 Unify device argument parsing between torch and c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14786

Differential Revision: D13334501

Pulled By: bddppq

fbshipit-source-id: ae3536be1fe0dcd6a1552ec93629ecc9554c0d7c
2018-12-05 18:37:32 -08:00
252e9058d4 Improve assertion failure message (#14813)
Summary:
See #14554.

I can't figure out how the reported issue can happen. The next best
thing is to have more information when it happens again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14813

Differential Revision: D13351908

Pulled By: pietern

fbshipit-source-id: 61b30fcae2e34da54329d0893ca4921b6ad60f0d
2018-12-05 17:20:25 -08:00
83ad52634a Add FunctionSchema based Operator Registry (#13789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13789

This enables creation of operators with FunctionSchema and IValue

Reviewed By: smessmer

Differential Revision: D13008791

fbshipit-source-id: 151efc88ac315f4a0ab0171a99774caaf767ef1e
2018-12-05 17:20:24 -08:00
67dcf10631 Increase test timeout (#14814)
Summary:
It is possible that some sort of contention causes process scheduling
delays which in turn cause the timeout to *not* be hit.

Increased sleep here will decrease the probability of this happening.

Fixes #14555.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14814

Differential Revision: D13351924

Pulled By: pietern

fbshipit-source-id: 1222cf0855408dfcb79f30f94694c790ee998cf9
2018-12-05 17:18:11 -08:00
c02b3e7cea Retry test on address already in use error (#14815)
Summary:
Thanks nairbv for the suggestion.

Also see #14589.

Fixes #14703.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14815

Differential Revision: D13351913

Pulled By: pietern

fbshipit-source-id: d11a4152505d0ce15592b13e417bb80551476a61
2018-12-05 17:09:46 -08:00
6fccca4278 improve ONNX tests on torch.Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14821

Reviewed By: zrphercule

Differential Revision: D13348773

Pulled By: houseroad

fbshipit-source-id: 611ca6e28f715e5518649c8c16f702ac3433308c
2018-12-05 17:07:10 -08:00
524574ab73 Define THPStorage struct only once (rather than N times) (#14802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14802

The definition of THPStorage does not depend on any Real; its macro
definition is unnecessary. Refactor the code so that THPStorage is not
macro-defined.

Reviewed By: ezyang

Differential Revision: D13340445

fbshipit-source-id: 343393d0a36c868b9a06eea2ad9b80f5e395e947
2018-12-05 13:19:29 -08:00
ca6311d909 File name change for FbgemmI8Depthwise.h and FbgemmI8Depthwise.cc (#14725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14725

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/33

Renaming FbgemmI8Depthwise.h to FbgemmI8DepthwiseAvx2.h and FbgemmI8Depthwise.cc to FbgemmI8DepthwiseAvx2.cc since FbgemmI8DepthwiseAvx2.cc will be compiled with avx2 flags

Reviewed By: jianyuh

Differential Revision: D13313898

fbshipit-source-id: a8111eacf3d79a466ce0565bfe5f2f0b200a5c33
2018-12-05 13:14:48 -08:00
e114527d19 Add torch.nn.RReLU support in symbolic (#14781)
Summary:
Now we support exporting torch.nn.RReLU in onnx.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14781

Reviewed By: houseroad

Differential Revision: D13343872

Pulled By: zrphercule

fbshipit-source-id: 1e96b957de4fc2f5ba3959d42329807975419ae3
2018-12-05 13:10:07 -08:00
50936cb06e Move avx2 specific code in different source files (#28)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/28

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14516

This is the first diff in a series of diffs that will separate out avx2-specific code into separate files. The goal is to compile as little code as possible with avx2 and avx512 compiler flags.

Reviewed By: jianyuh

Differential Revision: D13248376

fbshipit-source-id: 401c2e9d3cd96c420fd08c3efa011febce96ffbb
2018-12-05 12:19:35 -08:00
55092b1cc6 Validate matching input shapes in Int8Add operator (#14520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14520

Default engine doesn't support broadcast semantics in Int8Add operator. This patch adds a check that shapes are equivalent.

Reviewed By: bertmaher

Differential Revision: D13250922

fbshipit-source-id: 8526d07723bd9a34d54dee04d121c57f8b33c481
2018-12-05 12:00:23 -08:00
1c2273c8e9 fix stft arg types
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14800

Reviewed By: zou3519

Differential Revision: D13340574

Pulled By: SsnL

fbshipit-source-id: 8b0dbbe299d1a362da0ecc0b1c0dadb2543ded5d
2018-12-05 11:45:37 -08:00
999690ff3d Improve HIPify performance (#14803)
Summary:
```
    Improve performance of pyHIPIFY

    Changes:
    - Pre-compile regexes, don't use regexes when it's not necessary
      (this saves us ~15%)
    - Compile all substitutions for mappings into a single, non-backtracking
      regex using a Trie.  This gives big savings.

    Before, running pyHIPIFY on all files took 15.8s.  Now it takes 3.9s.
```

Stacked on #14769
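For intuition, a toy sketch of the trie-to-regex idea (illustrative only; this is not the actual pyHIPIFY code, and the names are made up):

```python
import re

def trie_regex(words):
    # Build a character trie; '$' marks end-of-word.
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node['$'] = {}

    # Emit one alternation that shares common prefixes, so the resulting
    # pattern never backtracks over them.
    def emit(node):
        alts = ['' if k == '$' else re.escape(k) + emit(node[k]) for k in sorted(node)]
        if len(alts) == 1:
            return alts[0]
        return '(?:%s)%s' % ('|'.join(a for a in alts if a), '?' if '' in alts else '')

    return re.compile(emit(trie))

pattern = trie_regex(['cudaMalloc', 'cudaFree', 'cudaMemcpy'])
src = 'cudaMalloc(&p, n); cudaFree(p);'
print(pattern.sub(lambda m: 'hip' + m.group(0)[len('cuda'):], src))
# -> hipMalloc(&p, n); hipFree(p);
```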
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14803

Differential Revision: D13342620

Pulled By: ezyang

fbshipit-source-id: 1cfa36b3236bbe24d07080a31cc788a52d740f40
2018-12-05 11:00:03 -08:00
be47470c91 Fix cuda multiprocessing cached memory (#14736)
Summary:
This PR fixes #11422

In the old world of CUDA IPC, when we want to share a tensor T from A to B, we have to share the whole CUDA memory allocation that T's storage sits in, and we cast it to the same storage type as T's.

This causes a problem when two different types of storage get allocated in the same CUDA memory block: when we try to reconstruct the second tensor, it complains about a wrong storage type.

In this PR we reconstruct the storage only (not the entire memory block). However, CUDA only allows a memory handle to be opened once per process, so we save the device pointer in a global cache so that we can reconstruct tensors as they come.

Thanks a ton to ezyang who helped design the solution and debugged the issue!
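A minimal sketch of the sharing pattern this fixes (assuming a CUDA machine; the two tensors may be carved out of the same cached allocation):

```python
import torch
import torch.multiprocessing as mp

def consumer(q):
    a = q.get()  # float storage
    b = q.get()  # long storage, possibly from the same CUDA memory block
    print(a.dtype, b.dtype)  # used to fail with a wrong-storage-type error

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=consumer, args=(q,))
    p.start()
    q.put(torch.randn(8, device='cuda'))
    q.put(torch.zeros(8, dtype=torch.long, device='cuda'))
    p.join()
```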
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736

Differential Revision: D13335899

Pulled By: ailzhang

fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371
2018-12-05 10:55:43 -08:00
3ae721d350 Set and get default dtype (#13748)
Summary:
Replaces the `DefaultTensorOptions` with just a global default dtype that you can set and get like in Python.

Also, calls `set_default_dtype` in the implementation of `torch.set_default_dtype`. Right now these two default values are separate but will always be the same. Should we just bind `set_default_dtype`  into Python? I think that might be good to do in a separate PR though.
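For reference, the Python-side behavior this backs:

```python
import torch

torch.set_default_dtype(torch.float64)
print(torch.empty(3).dtype)       # torch.float64
print(torch.get_default_dtype())  # torch.float64
torch.set_default_dtype(torch.float32)
```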

ezyang gchanan

Also CC colesbury who wanted to do this for ATen for a while? What do you think about it?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13748

Differential Revision: D13340207

Pulled By: goldsborough

fbshipit-source-id: 2689b09eb137fabb3a92d1ad1635782bee9398e8
2018-12-05 10:28:41 -08:00
90b1196ac4 Switch Int8AveragePool operator to QNNPACK (#14783)
Summary:
2.2-2.9X better performance on ARM when compiled with gcc (same bad perf when compiled with Clang)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14783

Differential Revision: D13332680

Pulled By: Maratyszcza

fbshipit-source-id: 4c1138500c6b3026335e9bfe5f6be43b1ae2cefb
2018-12-05 10:18:42 -08:00
e1eb32d9f1 Update magma to 2.4.0 for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14738

Differential Revision: D13341611

Pulled By: soumith

fbshipit-source-id: 39a49fc60e710cc32a463858c9cee57c182330e2
2018-12-05 09:53:39 -08:00
62f4db6d8a Unify build_caffe2_amd.py and build_pytorch_amd.py (#14769)
Summary:
I need to preserve the ability to HIPify out-of-place files
only, so build_amd.py grows a --out-of-place-only flag.

Stacked on #14757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14769

Differential Revision: D13340154

Pulled By: ezyang

fbshipit-source-id: 1b855bc79e824ea94517a893236fd2c8ba4cb79d
2018-12-05 09:26:12 -08:00
dbf6d12776 Default pool() option (#14636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14636

Add a default CPU option for the pool()

Reviewed By: andrewwdye

Differential Revision: D13281367

fbshipit-source-id: 92dbfce89c900a41731b6d1ff62bb97886c40f77
2018-12-05 08:44:19 -08:00
2d958b7f77 Storage.clone maintains original device (#14751)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14673

As pointed out by vishwakftw, the root cause of the `deepcopy` issue was that `storage.clone()` would create a new storage on the default device.
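A minimal repro sketch of the fixed behavior (assuming a CUDA device is available):

```python
import copy
import torch

t = torch.randn(3, device='cuda:0')
c = copy.deepcopy(t)
assert c.device == t.device  # clone() now stays on the source storage's device
```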
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14751

Reviewed By: soumith

Differential Revision: D13323061

Pulled By: fmassa

fbshipit-source-id: bfe46ebd78f0b6cd9518c11d09de7849282ed2a2
2018-12-05 08:33:56 -08:00
a80a46a6d0 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 080e0034bd6353420383ac7b476af5a35eaba7c3
2018-12-05 08:33:55 -08:00
0b1b72e975 Updating submodules
Reviewed By: yns88

fbshipit-source-id: e397238c7c477c4268e2dc89e530776fc89f18f8
2018-12-05 02:55:46 -08:00
0573ef664e include avx512vl to avx512 code path (#14733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14733

We often also want to use the AVX512VL instruction set.
We already included AVX512F and AVX512DQ.
Skylake also has AVX512BW and AVX512CD, which we may want to include later.

Reviewed By: duc0

Differential Revision: D13317282

fbshipit-source-id: 82c8e401d82d5c3a5452fb4ccb6e5cb88d242bda
2018-12-05 00:50:51 -08:00
f89de64796 Use AT_WARN for warnings in the JIT (#14770)
Summary:
Previously their implementation dispatched to prim::Print, which kept
printing the warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14770

Differential Revision: D13327629

Pulled By: suo

fbshipit-source-id: b9913f533d4530eb7c29146c39981ba7f72b6b68
2018-12-05 00:16:09 -08:00
ecc17fe3dd Add output info when doing onnxGetBackendCompatibility (#14784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14784

TSIA. To give more complete info to `onnxGetBackendCompatibility`.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D13331989

fbshipit-source-id: 1064b93f7f474788f736e6f0c893dae915c6fb99
2018-12-04 21:53:32 -08:00
c79e305add Don't DCE PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14773

Reviewed By: eellison

Differential Revision: D13327673

Pulled By: suo

fbshipit-source-id: 236db3407c7eacac470530836e3d4d0dc323110c
2018-12-04 21:37:36 -08:00
8dfebc16cc Improvements for symbolic AD (#14758)
Summary:
**Review only the last commit.**

This commit adds a few optimizations to AD, that let us dramatically
reduce the number of sizes we capture from forward.

We now:
- collapse chains of SumToSize
- avoid capturing sizes of tensors that are captured anyway
- more aggressively DCE the reverse code
- run CSE on the primal code to deduplicate `aten::size` calls

cc zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14758

Differential Revision: D13324440

Pulled By: zou3519

fbshipit-source-id: 45ccbc13605adcef2b461840c6089d3200000c72
2018-12-04 20:38:21 -08:00
38eb1beff5 Revert D13289919: [pytorch][PR] [DataLoader] Refactor dataloader.py
Differential Revision:
D13289919

Original commit changeset: d701bc7bb48f

fbshipit-source-id: c350c491fefa98a0a7c0cf22cb832e78aeb15c3d
2018-12-04 20:25:16 -08:00
78a9e7d83f Delete defunct files from torch/csrc/distributed (#14785)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14785

Differential Revision: D13333066

Pulled By: ezyang

fbshipit-source-id: e7937b4e8e12409b0fa964c34f995f7861ca95ff
2018-12-04 20:13:20 -08:00
d76e411d8c support conv transpose in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14775

Differential Revision: D13330491

Pulled By: eellison

fbshipit-source-id: 432b327d6a33517ff53ea33c9f64700e81432332
2018-12-04 19:54:09 -08:00
2d3cf98b49 Making dist.get_default_group private for PT1 release (#14767)
Summary:
When I wrote the frontend API, it was designed around not letting users use the default_group directly in any functions. It should really be private.

All collectives are supposed to use either group.WORLD or anything that comes out of new_group. That was the initial design.

We need to make a TODO on removing group.WORLD one day. It exists for backward compatibility reasons and adds lots of complexity.
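A sketch of the intended public surface (after `init_process_group` has been called):

```python
import torch
import torch.distributed as dist

t = torch.ones(1)
dist.all_reduce(t, group=dist.group.WORLD)  # the blessed default group
sub = dist.new_group(ranks=[0, 1])          # or any group from new_group()
dist.all_reduce(t, group=sub)
```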
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14767

Reviewed By: pietern

Differential Revision: D13330655

Pulled By: teng-li

fbshipit-source-id: ace107e1c3a9b3910a300b22815a9e8096fafb1c
2018-12-04 19:22:24 -08:00
33ea7eafef Make checkpoint_sequential work with multiple arguments (#14278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14278

In this commit, we make checkpoint_sequential work for models with multiple tensor inputs. Previously, it only processed the first tensor and ignored the rest.

We introduce a new test in test/test_utils.py that replicates the issue referenced in this [GitHub issue](https://github.com/pytorch/pytorch/issues/11093), and we make sure that the test passes by changing the behavior of checkpoint_sequential to process all input tensors.

Reviewed By: ezyang

Differential Revision: D13144672

fbshipit-source-id: 24f58233a65a0f5b80b89c8d8cbced6f814004f7
2018-12-04 18:47:43 -08:00
3237103624 Automatic update of fbcode/onnx to 42804705bdbf179d1a98394008417e1392013547 (#14777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14777

Previous import was 6b34743d2e361bbc0acb29dd73536478cb92562e

Included changes:
- **[4280470](https://github.com/onnx/onnx/commit/4280470)**: Changes done internally at Facebook (#1668) <Lu Fang>
- **[f85221f](https://github.com/onnx/onnx/commit/f85221f)**: Fuse MatMul and Add into Gemm (#1542) <vloncar>
- **[022230e](https://github.com/onnx/onnx/commit/022230e)**: Replace np.long by np.int64 (#1664) <G. Ramalingam>
- **[0ab3c95](https://github.com/onnx/onnx/commit/0ab3c95)**: Infer shape from data in Constant nodes (#1667) <Shinichiro Hamaji>

Reviewed By: bddppq

Differential Revision: D13330082

fbshipit-source-id: 13cf328626cf872d0983bbd2154d95c45da70f1c
2018-12-04 18:37:48 -08:00
a66669a110 Enable testing on Loss modules (#14778)
Summary:
This PR adds `None` buffers as parameters (similarly to #14715). It also cleans up a bunch of the `test_jit.py` tests that should be covered by `common_nn.py` and brings in `criterion_tests` to test loss functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14778

Differential Revision: D13330849

Pulled By: driazati

fbshipit-source-id: 924cc4cf94e0dcd11e811a55222fd2ebc42a9e76
2018-12-04 18:35:10 -08:00
d872af9282 Add tests for dropout/batchnorm train/eval, remove training constants (#14780)
Summary:
This PR:

1. add tests for batchnorm/dropout train/eval parameter mutation
2. remove training constants from all of our standard library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14780

Differential Revision: D13331578

Pulled By: wanchaol

fbshipit-source-id: d92ca3ce38cc2888688d50fe015e3e22539a20a5
2018-12-04 18:17:43 -08:00
86b4dd8bb2 Split LegacyDeviceTypeInit from LegacyTypeDispatch. (#14723)
Summary:
The goal here is to have LegacyTHDispatch call into this as well, so LegacyTypeDispatch and LegacyTHDispatch don't have cross dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14723

Reviewed By: ezyang

Differential Revision: D13314017

Pulled By: gchanan

fbshipit-source-id: 8761cb4af2b2269d2e755203e073bfdba535b8c0
2018-12-04 17:51:37 -08:00
f6f24cf0f4 don't allow cse to clean up nondeterministic nodes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14776

Differential Revision: D13330229

Pulled By: suo

fbshipit-source-id: 6bc88811e1889949f0f079cffccd8cd4270584cc
2018-12-04 15:45:37 -08:00
d76fd43294 Reenable all forward-pass fusions that worked before the AD fix (#14558)
Summary:
Dealing with so many `aten::size` calls (in particular calls on elements computed inside fusion groups) requires us to do some extra graph processing in the fuser (to compute the sizes by explicit broadcasts, instead of writing the intermediate tensors only to check their size). This restores the forward expects of LSTM and MiLSTM to a single big kernel. Unfortunately the backward is much harder, because as long as we can't prove that the reductions are unnecessary (or if we can't distribute them over the op), we will not be able to fuse them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14558

Differential Revision: D13321748

Pulled By: zou3519

fbshipit-source-id: c04fc2f70d106d2bfb56206b5aec517a93b79d1f
2018-12-04 15:43:37 -08:00
c3bfa0e52b BatchNorm support not tracking stats
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14764

Differential Revision: D13325800

Pulled By: driazati

fbshipit-source-id: a3e4773dc31b83565e7a4de33614d6efd4a12de9
2018-12-04 15:11:53 -08:00
c21f090ab4 Minor doc change in c10/Device.h (#14762)
Summary:
Make sure it's a valid regex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14762

Reviewed By: zrphercule

Differential Revision: D13326108

Pulled By: houseroad

fbshipit-source-id: fdcae2d5d42774c4071651b7477f08047d385dfa
2018-12-04 14:52:22 -08:00
9e1f4ba124 Introduce LegacyTHDispatcher for dispatching to TH functions. (#14754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14754

This isn't hooked up to anything yet, this is just putting the skeleton in place.
The idea here is that the functions generated via Declarations.cwrap and nn.yaml are not actually operators, they are implementation details of operators, and thus don't need to participate in VariableType, JIT dispatch generation.

So, we will split these functions out from the usual Type/operator hierarchy; for now the dispatch will be done by a Type-like class called LegacyTHDispatcher.  Once this is done this probably means we can collapse Type to be backend-specific, not Type/ScalarType specific, because all the ScalarType specific code will live in the LegacyTHDispatcher.

Reviewed By: ezyang

Differential Revision: D13321605

fbshipit-source-id: 25d1bbc9827a42d6ab5d69aabbad3eac72bf364c
2018-12-04 14:44:06 -08:00
53a9d4f312 disable batch mm if we have mutable ops (#14771)
Summary:
Just to be safe, disable batch mm for mutable ops. We don't lose much by doing this, and we can go back at a calmer time to re-enable it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14771

Reviewed By: eellison

Differential Revision: D13327641

Pulled By: suo

fbshipit-source-id: 96611e21ed3cb8492a2cd040f7d33fb58c52bd5e
2018-12-04 14:34:57 -08:00
5ed9dfad98 Replace at::Half non-vectorized conversions with implementations from FP16 (#14411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14579

Folded the fp16 codes into c10.

Reviewed By: ezyang

Differential Revision: D13206450

fbshipit-source-id: 472208dd230dc49d33935622ff3286b17eeb0894
2018-12-04 14:32:33 -08:00
2d56df7892 Use .to to convert new tensors in new_tensor (#14097)
Summary:
This would solve the tracing problems of #13969.
Fixes: #14732

I would appreciate it if this got good scrutiny before being applied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14097

Differential Revision: D13323181

Pulled By: ezyang

fbshipit-source-id: dcd104b497c0bfddb751923c6166a3824b7a3702
2018-12-04 14:03:56 -08:00
c7c5eed686 Export generator constructor (#14041)
Summary:
Missed a spot :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14041

Reviewed By: ezyang

Differential Revision: D13283803

Pulled By: ebetica

fbshipit-source-id: 482e245f57b0cea6ca3886355ea3ae487d024d4b
2018-12-04 13:50:06 -08:00
374b797569 c10d doesn't work with torch namespace (#14042)
Summary:
If both `Utils.hpp` and the `torch` namespace are included in the same file, the compiler won't know which fmap to use. I believe this is because of ADL. This change fixes that issue for me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14042

Reviewed By: ezyang

Differential Revision: D13283810

Pulled By: ebetica

fbshipit-source-id: b68233336518230ba730e83ddac1226a66896533
2018-12-04 13:47:20 -08:00
3aba2d99e1 Add resnet test, convert more modules (#14437)
Summary:
This PR add resnet to test_jit and convert more nn modules, stacked on #14533 and #14715
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14437

Differential Revision: D13325871

Pulled By: wanchaol

fbshipit-source-id: 6c94a988b36794a373af6541c0c262a07291f7b1
2018-12-04 13:42:41 -08:00
25c9a8b1fc Add missing test skip
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14763

Differential Revision: D13325350

Pulled By: driazati

fbshipit-source-id: 4d64a7616b227983c2fc2748c5fbecd1bcbff832
2018-12-04 13:38:53 -08:00
875be849e9 Rename _local_scalar to item() (#13676)
Summary:
Make `at::_local_scalar` more "official" by renaming it to `item()`.
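For context, this mirrors the Python-side method:

```python
import torch

v = torch.tensor(3.5).item()   # extract the value as a plain Python number
assert isinstance(v, float) and v == 3.5
```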

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13676

Differential Revision: D13003020

Pulled By: goldsborough

fbshipit-source-id: 0ac25f5237fb81a1576304a0a02f840ff44168a4
2018-12-04 13:19:26 -08:00
e829a52977 Remove use of hipify_caffe2, in favor of file path test. (#14757)
Summary:
This is towards unifying build_pytorch_amd.py and build_caffe2_amd.py
scripts.  There is only one use of hipify_caffe2 left, which is just
to control which files actually get HIPified.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14757

Differential Revision: D13323486

Pulled By: ezyang

fbshipit-source-id: 958cd91be32dfc3c0a9ba9eda507adb5937aebcd
2018-12-04 12:48:49 -08:00
a597c0ca05 Add inplace FeedTensor for python frontend (#14512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14512

att

Reviewed By: dzhulgakov

Differential Revision: D13243278

fbshipit-source-id: 78af417d0fcd9b9791ee839d62095903e49205cb
2018-12-04 12:45:11 -08:00
ba70cf22fa Loss (#14720)
Summary:
Adding Loss modules to script. Some of the modules have an optional tensor parameter. I will wait until wanchao's diff supporting optional tensors lands before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14720

Differential Revision: D13317990

Pulled By: eellison

fbshipit-source-id: 535925bdf126d28d9e7d64077b83ebd836a5beba
2018-12-04 12:30:05 -08:00
ef91cfd68b Add new reduction mode in kl_div (#14457)
Summary:
Fixes #6622.
We used to average over all elements for KL divergence, which is not aligned with its mathematical definition.
This PR corrects the default reduction behavior of KL divergence so that it now averages over the batch dimension.

- In KL, the default behavior `reduction=mean` averages over the batch dimension, while for most other loss functions `reduction=mean` averages over all elements.
- We used to support scalar tensors as well. For BC purposes we still support them; no reduction is performed on a scalar tensor.
- Added a new reduction mode called `batchmean` which has the correct behavior for KL (see the sketch below). A warning is added to make `batchmean` the default for KL instead of `mean` in the next major release.
- [deprecated] I chose not to add a new reduction option, since "mean over batch dimension" is kind of special and only makes sense in a few cases like KL. We don't want to have to explain why there's an option "batchmean" that isn't applicable to all other functions. I'm open to discussion on this one, as I cannot think of a perfect solution.
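A short sketch of the new mode:

```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(4, 10), dim=1)
targets = F.softmax(torch.randn(4, 10), dim=1)

# 'batchmean' matches the mathematical definition of KL divergence:
# sum over all elements divided by the batch size (4 here).
loss = F.kl_div(log_probs, targets, reduction='batchmean')
```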
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14457

Differential Revision: D13236016

Pulled By: ailzhang

fbshipit-source-id: 905cc7b3bfc35a11d7cf098b1ebc382170a087a7
2018-12-04 12:24:28 -08:00
773f4d8081 Implements Gather operator for arbitrary axis, sharing the code with BatchGather. (#13756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13756

This implements general Gather operator for arbitrary axis, sharing the code with BatchGather.
 - CPU gather & batch gather logic is now shared through caffe2::gather_helper, for any axis.
 - Shared CUDA kernel moved to gather_op.cuh, for any axis.
 - Gradients of axis > 0 delegate to BatchGatherGradientOp which now has axis argument.
 - BatchGatherOp doc strings updated to have correct rank (q + (r -1)) and output.
 - Added tests for axis == 2.

GatherOp supports index wrapping for axis == 0 by default, which was earlier done for ONNX.
This diff also extends it to work in the CUDA kernel. Added a "wrap_indices" argument which specifies
whether this wrapping should be done; set it to true if you'd like wrapping for any axis.
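For reference, the axis semantics match numpy's `take` (a sketch; index wrapping corresponds to `mode='wrap'`):

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)  # rank r = 3
idx = np.array([0, 2])                 # rank q = 1
out = np.take(data, idx, axis=2)
print(out.shape)                       # (2, 3, 2): rank q + (r - 1)
```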

TBD: Update gradients to support negative indices (separate diff).
TBD: Once we have operator versioning, we'd like to update GatherOp to NOT support axis 0 wrapping
by default, but rather do it only if wrap_indices is set.

Reviewed By: dzhulgakov

Differential Revision: D12983815

fbshipit-source-id: 8add9d67b47fe8c5ba7a335f581ca0530b205cd7
2018-12-04 11:54:28 -08:00
16558a1e9d Refactor dataloader.py (#14668)
Summary:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must be at the top module level. Adding more functionalities to `dataloader.py` will only make things worse.

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.

No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668

Reviewed By: soumith

Differential Revision: D13289919

Pulled By: ailzhang

fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
2018-12-04 09:53:41 -08:00
7e4a5b89fe Back out "Move TensorOptions, DefaultTensorOptions and OptionsGuard to c10" (#14745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14745

Original commit changeset: c62e7f9b0255

Reviewed By: suo

Differential Revision: D13318594

fbshipit-source-id: 4d7dc35ca01b627accc3ee512bfcd6f2e805a533
2018-12-04 08:59:10 -08:00
ff7deb95d7 Back out "Fix include paths for TensorOptions, DefaultTensorOptions, OptionsGuard" (#14744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14744

Original commit changeset: d236d5351ecf

Reviewed By: suo

Differential Revision: D13318596

fbshipit-source-id: 55f1e9472d05fb5a9c47dc82c32e9a66b5e4308c
2018-12-04 08:59:07 -08:00
7bc489c827 Disable randn_like fusion in the JIT (#14752)
Summary:
Fixes #14674. We won't have time for a proper fix before the release, so at least disable fusion of nodes that trigger incorrect behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14752

Differential Revision: D13320407

Pulled By: zou3519

fbshipit-source-id: 2400f7c2cd332b957c248e755fdb0dadee68da5d
2018-12-04 08:55:47 -08:00
86ffc2a5f1 fix import failure in hub test (#14742)
Summary:
Fix #14610

I can repro the test failure following the steps provided, and this fixes the issue for me. It seems the insertion has to happen after the download completes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14742

Differential Revision: D13318533

Pulled By: ailzhang

fbshipit-source-id: b9207b4572d5a9443e516d9a84632e3d7b68e477
2018-12-04 08:37:05 -08:00
9e58c4ef91 Revert D13304654: [pytorch][PR] Introduce LegacyTHDispatcher for dispatching to TH functions.
Differential Revision:
D13304654

Original commit changeset: cfe3e1a28adc

fbshipit-source-id: 06669d3c88f83e1d959e2c266fd608316539d42a
2018-12-04 07:58:34 -08:00
264111bfc1 Introduce LegacyTHDispatcher for dispatching to TH functions. (#14708)
Summary:
This isn't hooked up to anything yet, this is just putting the skeleton in place.
The idea here is that the functions generated via Declarations.cwrap and nn.yaml are not actually operators, they are implementation details of operators, and thus don't need to participate in VariableType, JIT dispatch generation.

So, we will split these functions out from the usual Type/operator hierarchy; for now the dispatch will be done by a Type-like class called LegacyTHDispatcher.  Once this is done this probably means we can collapse Type to be backend-specific, not Type/ScalarType specific, because all the ScalarType specific code will live in the LegacyTHDispatcher.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14708

Reviewed By: ezyang

Differential Revision: D13304654

Pulled By: gchanan

fbshipit-source-id: cfe3e1a28adcc355f67fe143495ee7e5c5118606
2018-12-04 07:41:04 -08:00
33b1f9f71a add .code property to ScriptModule (#14735)
Summary:
simple change to allow `print(foo.code)` to give a pretty-printed description of all the methods on a module.
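A minimal sketch of the usage:

```python
import torch

class MyModule(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        return x + 1

print(MyModule().code)  # pretty-printed source for all compiled methods
```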
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14735

Differential Revision: D13317619

Pulled By: zdevito

fbshipit-source-id: dc7f7ba12ba070f2dfccf362995c2a9e0e573cb7
2018-12-04 07:32:18 -08:00
1921816f85 Fix clamp when min/max are both None (#14716)
Summary:
Before this PR, tensor.clamp() would return an empty tensor if min and
max were not specified. This is a regression from 0.4.1, which would
throw an error. This PR restores that error message.

Fixes #14470
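A quick sketch of the restored behavior:

```python
import torch

x = torch.randn(3)
x.clamp(min=0.0)  # fine
x.clamp()         # raises RuntimeError again instead of returning an empty tensor
```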
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14716

Differential Revision: D13311031

Pulled By: zou3519

fbshipit-source-id: 87894db582d5749eaccfc22ba06aac4e10983880
2018-12-04 07:07:09 -08:00
6e0c5a8a4e Restore device in cpp API (#14711)
Summary:
This is a stacked PR based on https://github.com/pytorch/pytorch/pull/14454.

It enables restoring the storage to the appropriate device.

~~[TODO]: add/modify appropriate tests~~ Done
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14711

Reviewed By: dzhulgakov

Differential Revision: D13315746

Pulled By: houseroad

fbshipit-source-id: fe6f24a45c35e88fd1a2eebc09950d4430fac185
2018-12-04 00:46:41 -08:00
cbd805169f move structs to header file (#14728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14728

Move IndexBlob and Index to a header file so they can be reused.

Differential Revision: D13315898

fbshipit-source-id: 34432c9b8fa08af3d3387f32a940d35b02a59760
2018-12-04 00:42:41 -08:00
c7f93668dc improve the restore device test, and relax the assertion (#14734)
Summary:
Only compare the device index if the device has one.

Test the tensor restore with some computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14734

Reviewed By: dzhulgakov

Differential Revision: D13317949

Pulled By: houseroad

fbshipit-source-id: 26b2f2912a9bbc3b660a62283fb403ddab437e49
2018-12-04 00:33:09 -08:00
8812a5d42e Reduce broadcasted inputs in derivative code (#14485)
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).

Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`

During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)

This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.
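The invariant being restored, as a sketch:

```python
import torch

x = torch.randn(3, 1, requires_grad=True)
y = torch.randn(1, 4, requires_grad=True)
(x + y).sum().backward()
# Gradients of broadcasted inputs must be summed back to the input shapes.
assert x.grad.shape == x.shape and y.grad.shape == y.shape
```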

cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485

Reviewed By: eellison

Differential Revision: D13312888

Pulled By: suo

fbshipit-source-id: ad46bfb4d0a306ad9451002f8270f7a790f72d58
2018-12-04 00:16:21 -08:00
862b8cae51 interpolate (#14123)
Summary:
Add support for interpolate and upsampling in weak_script mode.

Because the function parameters are overloaded, I had to add it as a builtin op. For interpolate:
size can be ?int | int[]?, and scale_factor can be ?float | float[]?. Every combination of the two parameters needs to be supported.

The same logic applies for upsample_nearest, upsample_bilinear, and upsample.

There are a few fixes that I made along the way.
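A sketch of the overload combinations this covers:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
F.interpolate(x, size=(16, 16), mode='nearest')  # size as int[]
F.interpolate(x, size=16)                        # size as int
F.interpolate(x, scale_factor=2.0)               # scale_factor as float
```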
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14123

Differential Revision: D13278923

Pulled By: eellison

fbshipit-source-id: e59729034369be4ce4b747291a3d1c74e135b869
2018-12-04 00:01:43 -08:00
a23863fd6f Add Pooling modules to Script (#14527)
Summary:
Depends on #14584
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14527

Differential Revision: D13270773

Pulled By: driazati

fbshipit-source-id: e4acd43ccbce0f4b62d41c30ce8d5c721171e19a
2018-12-03 23:55:04 -08:00
d429e78a9a Add fractional_max_pool2d to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14591

Differential Revision: D13270755

Pulled By: driazati

fbshipit-source-id: 138a60256795f5ef8d236c75be2cfd929059b98f
2018-12-03 23:49:38 -08:00
e8e494caf8 Add GroupNorm to standard library (#14722)
Summary:
Depends on #14715 for the excluded tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14722

Differential Revision: D13317714

Pulled By: driazati

fbshipit-source-id: bf1cdbc0a3803f82befed41925e91ab60e20ec82
2018-12-03 23:46:19 -08:00
95e5a5ae0c basic testing of builtin alias annotations (#14588)
Summary:
Check whether the codegen'd alias annotations actually track alias creation and writes correctly. This could be made more exhaustive, but it's good enough for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14588

Differential Revision: D13312653

Pulled By: suo

fbshipit-source-id: 98de1610ea86deada71957c75c222fff331a0888
2018-12-03 22:31:02 -08:00
9fbc2d3153 Remove TensorImpl -> LegacyTypeDispatch dependency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14651

Reviewed By: ezyang

Differential Revision: D13285370

fbshipit-source-id: cc93c3ca95e7260762c1cabca17b8973d52c4e22
2018-12-03 21:53:28 -08:00
d063c9c330 Fix include paths for TensorOptions, DefaultTensorOptions, OptionsGuard
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14647

Reviewed By: ezyang

Differential Revision: D13283497

fbshipit-source-id: d236d5351ecf7ab9712a55e9ef12d8bba48eb53f
2018-12-03 21:53:26 -08:00
46772dba0c Move TensorOptions, DefaultTensorOptions and OptionsGuard to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14646

Reviewed By: ezyang

Differential Revision: D13283494

fbshipit-source-id: c62e7f9b02551926bf8f1e3ddf6ede4ec925d28d
2018-12-03 21:53:24 -08:00
1098500e9b Fix include paths for Layout.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14645

Reviewed By: ezyang

Differential Revision: D13283496

fbshipit-source-id: d70881e957c886a6c2befe3ef1d2c5a3fac18e7f
2018-12-03 21:53:22 -08:00
771eebad7b Move Layout to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14644

Reviewed By: ezyang

Differential Revision: D13283493

fbshipit-source-id: bb02f156d6a5b5129db5743c756acc84c38eca83
2018-12-03 21:53:20 -08:00
5a4082612f Fix include paths for Backend.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14643

Reviewed By: ezyang

Differential Revision: D13283492

fbshipit-source-id: 9919af9707d094118efc963543320e01b07d7bc5
2018-12-03 21:53:19 -08:00
c303fcb9cb Moved Backend to c10 (#14642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14642

Unfortunately, TensorOptions depends on this, so we need it in c10.

Reviewed By: ezyang

Differential Revision: D13283495

fbshipit-source-id: 433cd47eb18aac1131be9c5cd650efc583870a20
2018-12-03 21:53:17 -08:00
119f9ec291 enable NoneValue parameter assignment for WeakScriptModule (#14715)
Summary:
This PR:

1. Handle None value attr in the WeakScriptModuleProxy
2. add back module tests that now passing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14715

Differential Revision: D13313573

Pulled By: wanchaol

fbshipit-source-id: a6b7892707350290a6d69b6f6270ad089bfc954b
2018-12-03 20:40:55 -08:00
bb546b2e5b WAR for self.training (#14719)
Summary:
To enable self.training in script modules, this PR automatically adds a buffer called 'training' if a script method requests self.training. Assignment to self.training is overloaded to assign both to the boolean property and the tensor value.
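A sketch of what this enables (the 'training' buffer is added automatically):

```python
import torch

class M(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        if self.training:  # backed by the auto-added buffer
            return x * 2
        return x

m = M()
m.eval()  # assignment to self.training updates both the flag and the buffer
print(m(torch.randn(2)))
```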
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14719

Differential Revision: D13310569

Pulled By: zdevito

fbshipit-source-id: 406387bb602f8ce5794eeff37642863c75928be5
2018-12-03 20:32:16 -08:00
9a932b8b90 fix expect
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14730

Differential Revision: D13316463

Pulled By: zdevito

fbshipit-source-id: 8b11bdb22d354c17bf2de4bded352bb6eb086ec7
2018-12-03 20:15:27 -08:00
44894915d6 Automatic update of fbcode/onnx to 6b34743d2e361bbc0acb29dd73536478cb92562e (#14637)
Summary:
Previous import was f461f7aad9987635b4aff108620ed7918f002d19

Included changes:
- **[6b34743](https://github.com/onnx/onnx/commit/6b34743)**: fix the const map initializatoin (#1662) <Lu Fang>
- **[ae80999](https://github.com/onnx/onnx/commit/ae80999)**: Fuse Pad into Conv optimizer (#1580) <vloncar>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14637

Differential Revision: D13281338

Pulled By: houseroad

fbshipit-source-id: c31429914bf5954fdc85e0c02168836ef47d635c
2018-12-03 20:11:17 -08:00
7b6c6f76f7 Skip CUDA tests when built with CUDA but no GPUs available; rename cuda tests so they're obvious.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14706

Reviewed By: soumith

Differential Revision: D13304398

fbshipit-source-id: d5e2cda965ce8bc1721489b282336ea3ca7f0471
2018-12-03 18:49:59 -08:00
22ab6183c5 Move manual_seed into ATen/Context.h; delete reimplementation in test_seed.h (#14625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14625

I want to reorg the test files, but I am too lazy to make the include
paths for test_seed.h work out.  So just delete it.

Reviewed By: gchanan

Differential Revision: D13277567

fbshipit-source-id: a3e8e46e4816b6fc0fe926b20779839f9e0a1a06
2018-12-03 18:49:58 -08:00
78d594f46c Implement Device as a type in the script (#14666)
Summary:
[ note:  stacked on expect files changes, will unstack once they land ]
This adds DeviceObjType (cannot use DeviceType it is already an enum)
to the type hierarchy and an isDevice/toDevice pair to IValue.
Previous hacks which used an int[] to represent Device are removed
and at::Device is used instead.

Note: the behavior or .to is only a subset of python, we need to
fix the aten op so that it accepts Option[Device] and Optional[ScalarType].
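A rough sketch of the kind of script code this enables (the exact surface supported by this PR may differ):

```python
import torch

@torch.jit.script
def to_cpu(x):
    return x.to(torch.device('cpu'))

print(to_cpu(torch.randn(2)).device)  # cpu
```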
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14666

Reviewed By: suo

Differential Revision: D13290405

Pulled By: zdevito

fbshipit-source-id: 68b4381b292f5418a6a46aaa077f1c902750b134
2018-12-03 16:54:40 -08:00
4b31572375 Meta programming on If Stmt cond to enable conditional emit blocks (#14533)
Summary:
This PR is part of the task to unblock standard library export. Basically, we want to enable the ability to meta-program the If statement to dynamically emit different branches based on `cond`. This is primarily used to disable compilation of certain branches of an If, like the one below:

```python
import torch

class Test(torch.jit.ScriptModule):
  def __init__(self, b=None):
    super(Test, self).__init__()
    self.b = b

  @torch.jit.script_method
  def forward(self, input):
    x = input
    if self.b is not None:
      x = self.b(input)
    return x

Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between the None simple value and other sugared values in the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533

Differential Revision: D13310526

Pulled By: wanchaol

fbshipit-source-id: 78d1a8127acda5e44d2a8a88f7627c43d29ff244
2018-12-03 15:47:15 -08:00
298b775577 Delete temporary ATenCoreTest. (#14622)
Summary:
It was previously used to sure that ATen/core was working;
but now we have plenty of headers and C++ files in ATen/core
so this is no longer necessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14622

Differential Revision: D13276899

Pulled By: ezyang

fbshipit-source-id: 9bef7eb1882ccdfa3ee7681a3d5b048ea94b59d3
2018-12-03 15:07:40 -08:00
9ac845f734 Revert D13280899: [pytorch][PR] Reduce broadcasted inputs in derivative code
Differential Revision:
D13280899

Original commit changeset: 80cc5ec9331b

fbshipit-source-id: 2335093cca8fd7db95470fd83b9299adfa17aa8e
2018-12-03 14:55:02 -08:00
e0f68671bd Restore device when import jit script module (#14454)
Summary:
We align the restore logic with `torch.load`: we try to restore to the right device, and if the device is not available, an exception is raised. We allow the user to remap the device through a `map_location` parameter, which can be 1) a string like `'cuda:0'` or `'cpu'`, 2) a device, e.g. `torch.device('cpu')`, 3) a dict, e.g. `{'cuda:1': 'cuda:0'}`, or 4) a function with a signature like `string map_location(tensor, saved_device_string)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14454

Reviewed By: zrphercule

Differential Revision: D13271956

Pulled By: houseroad

fbshipit-source-id: dfd6b6049b0dc07549ddeddf2dea03ac53ba6d49
2018-12-03 14:10:30 -08:00
b8da44dc13 Add linear + pixelshuffle modules to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14654

Differential Revision: D13300968

Pulled By: driazati

fbshipit-source-id: 2c36aab91ea99681687f8da6d318981fee49785b
2018-12-03 14:01:16 -08:00
68ffe46991 Reduce broadcasted inputs in derivative code (#14485)
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).

Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`

During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)

This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.

cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485

Differential Revision: D13280899

Pulled By: soumith

fbshipit-source-id: 80cc5ec9331be80e1bb9ddfe85b81c2b997e0b0c
2018-12-03 13:44:18 -08:00
b768db0810 Allow DCE to clean up some mutable ops (#14601)
Summary:
This PR makes DCE a little smarter in the presence of mutable ops. Previously mutable ops could never be cleaned up; now they can be cleaned up if we can prove there are no live uses of any alias sets that the op writes to.

This behavior is optional; if you pass DCE a block instead of a graph, it will do the same thing as before. Also changed `InlineAutographSubgraph` to use the common subgraph utils.

Tested on traced ResNet, and it gets rid of the dead code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14601

Differential Revision: D13309118

Pulled By: suo

fbshipit-source-id: dac2791e7d2ecf219ae717a2759b83c1e927f254
2018-12-03 13:31:08 -08:00
9783ce3825 Revert D13272203: [pytorch][PR] [jit] Meta programming on If Stmt cond to enable conditional emit blocks
Differential Revision:
D13272203

Original commit changeset: 44a545abb766

fbshipit-source-id: 8861eb4810a6c9ea4aba8427b3a07d2fa0d69a15
2018-12-03 13:28:52 -08:00
6385d00185 Move global-constructor to lazily initialized (mobile restriction) (#14650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14650

this fixes the build for mobile

Reviewed By: dzhulgakov

Differential Revision: D13267458

fbshipit-source-id: 83e7e76e3c875134395b6c43ea791c5b56871642
2018-12-03 13:24:56 -08:00
5a2f5a216f Make convertable to list also accepts optional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14598

Differential Revision: D13308254

Pulled By: wanchaol

fbshipit-source-id: bd0b6f9f20294d3d589cf68732dbd8c57b67e0e9
2018-12-03 13:09:11 -08:00
b5181ba1df add avx512 option (but no avx512 kernel yet) (#14664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14664

This diff just adds a framework for adding avx512 kernels.
Please be really careful about using avx512 kernels unless you're convinced they will bring good enough *overall* speedups: avx512 can backfire because it forces the CPU frequency down.

Reviewed By: duc0

Differential Revision: D13281944

fbshipit-source-id: 04fce8619c63f814944b727a99fbd7d35538eac6
2018-12-03 12:18:19 -08:00
4b90702037 Meta programming on If Stmt cond to enable conditional emit blocks (#14533)
Summary:
This PR is part of the task to unblock standard library export. Basically, we want to enable metaprogramming of the If stmt's `cond` to dynamically emit different branches. This is primarily used to skip compiling certain branches of an If, like the example below:

```python
import torch

class Test(torch.jit.ScriptModule):
    def __init__(self, b=None):
        super(Test, self).__init__()
        self.b = b

    @torch.jit.script_method
    def forward(self, input):
        x = input
        # when self.b is None, this branch is statically dropped at compile time
        if self.b is not None:
            x = self.b(input)
        return x

Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between the None simple value and arbitrary sugared values in the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533

Differential Revision: D13272203

Pulled By: wanchaol

fbshipit-source-id: 44a545abb766bbd39b762a6e19f9ebaa295e324b
2018-12-03 12:14:52 -08:00
cac03280f9 Fixed DistributedDataParallel state pickling for multi-gpus (#14690)
Summary:
Fixed: https://github.com/pytorch/pytorch/issues/14678

This PR fixes DDP not working after save() and load() with multiple GPUs, because all of the replication logic and bucketing lives in the constructor.

So I refactored some of that constructor logic into a helper function, which is now also used for load().

Added a test too. Tested on an 8-GPU machine.

```
tengli@learnfair062:~/pytorch/test$ python run_test.py -i distributed --verbose
Test executor: ['/private/home/tengli/miniconda3/bin/python']
Selected tests: distributed
Running test_distributed ... [2018-12-02 18:33:55.833580]
/public/apps/openmpi/2.1.1/gcc.5.4.0/bin/mpiexec
Running distributed tests for the mpi backend
test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
Running distributed tests for the mpi backend with file init_method
test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_broadcast (__main__.TestMPI) ... ok
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
Running distributed tests for the nccl backend
test_Backend_enum_class (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... skipped 'nccl does not support DistributedDataParallelCPU'
test_all_gather (__main__.TestDistBackend) ... skipped 'Only MPI supports CPU all gather'
test_all_gather_cuda (__main__.TestDistBackend) ... skipped 'CUDA all gather skipped for NCCL'
test_all_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_multigpu (__main__.TestDistBackend) ... ok
test_all_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_multigpu (__main__.TestDistBackend) ... skipped 'CUDA all_reduce multigpu skipped for NCCL'
test_all_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum_cuda (__main__.TestDistBackend) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_cuda (__main__.TestDistBackend) ... ok
test_barrier_full_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_full_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_timeout_full_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_cuda (__main__.TestDistBackend) ... ok
test_broadcast_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_multigpu (__main__.TestDistBackend) ... skipped 'NCCL broadcast multigpu skipped'
test_destroy_full_group (__main__.TestDistBackend) ... ok
test_destroy_group (__main__.TestDistBackend) ... ok
test_gather (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_get_backend (__main__.TestDistBackend) ... ok
test_get_default_group (__main__.TestDistBackend) ... ok
test_get_rank (__main__.TestDistBackend) ... ok
test_get_rank_size_full_group (__main__.TestDistBackend) ... ok
test_get_rank_size_group (__main__.TestDistBackend) ... ok
test_irecv (__main__.TestDistBackend) ... skipped 'Nccl does not support irecv'
test_isend (__main__.TestDistBackend) ... skipped 'Nccl does not support isend'
test_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_multigpu (__main__.TestDistBackend) ... ok
test_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum_cuda (__main__.TestDistBackend) ... ok
test_scatter (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_send_recv (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'
test_send_recv_any_source (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv from any source'
test_send_recv_with_tag (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'

----------------------------------------------------------------------
Ran 68 tests in 69.549s

OK (skipped=52)
Running distributed tests for the nccl backend with file init_method
test_Backend_enum_class (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... skipped 'nccl does not support DistributedDataParallelCPU'
test_all_gather (__main__.TestDistBackend) ... skipped 'Only MPI supports CPU all gather'
test_all_gather_cuda (__main__.TestDistBackend) ... skipped 'CUDA all gather skipped for NCCL'
test_all_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_multigpu (__main__.TestDistBackend) ... ok
test_all_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_multigpu (__main__.TestDistBackend) ... skipped 'CUDA all_reduce multigpu skipped for NCCL'
test_all_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum_cuda (__main__.TestDistBackend) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_cuda (__main__.TestDistBackend) ... ok
test_barrier_full_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_full_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_timeout_full_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_cuda (__main__.TestDistBackend) ... ok
test_broadcast_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_multigpu (__main__.TestDistBackend) ... skipped 'NCCL broadcast multigpu skipped'
test_destroy_full_group (__main__.TestDistBackend) ... ok
test_destroy_group (__main__.TestDistBackend) ... ok
test_gather (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_get_backend (__main__.TestDistBackend) ... ok
test_get_default_group (__main__.TestDistBackend) ... ok
test_get_rank (__main__.TestDistBackend) ... ok
test_get_rank_size_full_group (__main__.TestDistBackend) ... ok
test_get_rank_size_group (__main__.TestDistBackend) ... ok
test_irecv (__main__.TestDistBackend) ... skipped 'Nccl does not support irecv'
test_isend (__main__.TestDistBackend) ... skipped 'Nccl does not support isend'
test_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_multigpu (__main__.TestDistBackend) ... ok
test_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum_cuda (__main__.TestDistBackend) ... ok
test_scatter (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_send_recv (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'
test_send_recv_any_source (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv from any source'
test_send_recv_with_tag (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'

----------------------------------------------------------------------
Ran 68 tests in 70.381s

OK (skipped=52)
```
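As a minimal sketch of the round trip this fixes (assuming an initialized process group; `net` and `inputs` are hypothetical):

```python
import torch
from torch.nn.parallel import DistributedDataParallel

ddp = DistributedDataParallel(net, device_ids=[0, 1])
torch.save(ddp, 'ddp.pt')    # pickles the DDP wrapper itself
ddp2 = torch.load('ddp.pt')  # replication and bucketing are now redone on load
out = ddp2(inputs)           # works again across multiple GPUs
```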
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14690

Differential Revision: D13294169

Pulled By: teng-li

fbshipit-source-id: 69ccac34c6c016899bfe8fbc50b48d4bfd1d3876
2018-12-03 12:04:26 -08:00
18eaec7121 Add (unused) HIP API to the Context object. (#14623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14623

This is the last piece we need before we can start doing out-of-place
HIPify on ATen. These APIs are not actually used at the moment, as we
still do in-place HIPify, which uses CUDA.

Reviewed By: gchanan

Differential Revision: D13277246

fbshipit-source-id: 771efa81c2d2022e29350f25a5b4bb8f49ac6df0
2018-12-03 10:54:57 -08:00
b1faab3d8f Replace THCState_getCurrentStream with direct at::cuda::getCurrentCUDAStream()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14500

Reviewed By: gchanan

Differential Revision: D13241401

fbshipit-source-id: d78cf8ddce96876bedc1d14507b0646bcfd41aed
2018-12-03 10:54:55 -08:00
a49bf21d50 Delete hasCuDNN from Context. (#14499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14499

It still needs to stay in hooks, since it's part of the public
C++ API, but I want library code to try to arrange for CuDNN checks
to occur inside CUDA code, where it's statically obvious whether
CuDNN is available (and you don't need to dynamically dispatch).

Reviewed By: gchanan

Differential Revision: D13241355

fbshipit-source-id: 4e668a5914ab890463a12d9e528ba4ecbb7dd7c2
2018-12-03 10:54:54 -08:00
eb71df3e63 Delete at::current_device(), Context::current_device() and Context::getNumGPUs() (#14414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14414

The previous functions were CUDA-centric, and led to lots of places
where we improperly assumed that CUDA is the only game in town (it's not).
Best to delete them.

What are your alternatives?  This diff fixes some use sites, which may give
you some ideas.  In particular, the "given a device type, give me the
current device for that device type" might be a good function to enshrine
for real.

Reviewed By: gchanan

Differential Revision: D13218540

fbshipit-source-id: 2f42cd6b9bdab4930d25166b8041c9466a1c6e0a
2018-12-03 10:54:52 -08:00
5ee8312b63 sparse.mm(), reland #14526 (#14661)
Summary:
- reland reverted PR #14526 with doc fixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14661

Differential Revision: D13289047

Pulled By: weiyangfb

fbshipit-source-id: 5b843a11a58b56aeada3af2680a27cf89ecef4d8
2018-12-03 10:39:27 -08:00
7da2448d62 Fix multi-argument allreduce in ProcessGroupGloo (#14688)
Summary:
If multiple arguments are specified to c10d allreduce, they are
interpreted as if they are expanding the ranks in the process group.
Therefore, not only is every argument to allreduce an input that must
be considered, it is also an output. The problem that this commit
fixes is that they were not correctly considered as outputs.

The upstream problem is tracked in facebookincubator/gloo#152. Once
this is fixed there we can remove the copies that this commit adds.

This fixes #14676.
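As a hedged sketch of the multi-argument case (assuming an initialized gloo process group and two local GPUs), using the multi-tensor entry point in torch.distributed:

```python
import torch
import torch.distributed as dist

# one tensor per local GPU; each is both an input and an output of the allreduce
tensors = [torch.ones(4, device=f'cuda:{i}') for i in range(2)]
dist.all_reduce_multigpu(tensors)  # after the fix, the sum is written back to every tensor
```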
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14688

Differential Revision: D13294405

Pulled By: pietern

fbshipit-source-id: 078a2a0a0ff12d051392461438f1496201ec3cb9
2018-12-03 09:41:17 -08:00
b15242f70c Assert all legacy operators are 'extended_method', remove codegen for… (#14649)
Summary:
… other paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14649

Differential Revision: D13285183

Pulled By: gchanan

fbshipit-source-id: 91a58a22cba7e00eb0931bc277b0cb9d6f05cfdc
2018-12-03 07:41:50 -08:00
737efa78ba Remove 'type_method_inline_definitions' which isn't used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14648

Differential Revision: D13284176

Pulled By: gchanan

fbshipit-source-id: e6b8f9410fab57164259f97de2fd46f6bdf88d5a
2018-12-03 07:38:21 -08:00
b96e6ee98d Delete defunct DynamicCUDAInterface (#14621)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14621

Differential Revision: D13276723

Pulled By: ezyang

fbshipit-source-id: b666b2cdf4c45ccec7c802e268878eb2f3e028aa
2018-12-03 07:33:05 -08:00
af95f712b0 Get rid of deprecated_factory_method in codegen, which is no longer u… (#14641)
Summary:
…sed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14641

Differential Revision: D13283449

Pulled By: gchanan

fbshipit-source-id: 35cedc48940fa6144b4eab6402d9e1dc74a67b65
2018-12-03 07:28:42 -08:00
5c89190340 inline adagrad functions (#14194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14194

Inline some of perfkernels/adagrad.h functions for better performance

Reviewed By: hyuen

Differential Revision: D13096351

fbshipit-source-id: b4da8053278d585eabc5389b8a8dcae0f253b413
2018-12-02 20:23:02 -08:00
74c3cbc013 Increase test barrier timeout for barrier test (#14689)
Summary:
The CUDA initialization for the participating processes can
take long enough for the barrier timeout to trigger on the
process that doesn't participate in the group.

See #14676.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14689

Reviewed By: teng-li

Differential Revision: D13293695

Pulled By: pietern

fbshipit-source-id: 6268dc9acfdb22f70c027e5e4be082f7127c0db4
2018-12-02 17:46:17 -08:00
5268dd468c Fixed DistributedDataParallel cannot kick off all-reduce in a corner case (#14675)
Summary:
Ok, this corner case comes up for the translation teams, and it only happens when all of the following hold:

(1) when the module is registered a parameter that does not requires grad

and

(2) this registered parameter has a unique type (say, double, or half) and it's the only unique type such that itself alone will be put into a separate bucket.

and

(3) it is the last parameter that got registered in the module, such that its bucket reduction is the first to be kicked off.

Once this corner case happens, the backward hook for that parameter is never fired, since it does not require grad. All other buckets then wait for its bucket to be kicked off, so no bucket is ever reduced: everything is blocked by the first bucket (the unique-type parameter).

This PR fixes three things:
(1) Make sure that we only bucket parameters that require grad.
(2) Check all-reductions in the next iteration: as long as we detect that the previous iteration's all-reduction has not been fully kicked off, we issue an error in the next iteration.
(3) Also removed some unused variables.

With this bug fixed, the only case where this error can happen is when the user changes parameters after wrapping the module with DDP, like the case in:
https://github.com/pytorch/pytorch/issues/12603

Test covered as well
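A minimal sketch of a module hitting conditions (1)-(3) above (shapes hypothetical): the last-registered parameter is double precision and does not require grad, so before this fix its bucket would never be kicked off:

```python
import torch
import torch.nn as nn

class Corner(nn.Module):
    def __init__(self):
        super(Corner, self).__init__()
        self.fc = nn.Linear(4, 4)
        # registered last, unique dtype, and requires_grad=False
        self.scale = nn.Parameter(torch.ones(1, dtype=torch.double),
                                  requires_grad=False)

    def forward(self, x):
        return self.fc(x) * self.scale.float()
```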

Without the first fix, I verified that the repro in fbcode hits this error message:

```
result = self.forward(*input, **kwargs)
  File "/data/users/tengli/fbsource/fbcode/buck-out/dev/gen/language_technology/neural_mt/os/pytorch_translate/train#link-tree/torch/nn/parallel/distributed.py", line 312, in forward
    raise RuntimeError("Not all gradients are all-reduced from "
RuntimeError: Not all gradients are all-reduced from the backward of the previous iteration. This is unexpected and fatal error. Please check and ensure that the model's parameters are not changed after you wrap up the model with DistributedDataParallel.

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14675

Differential Revision: D13291083

Pulled By: teng-li

fbshipit-source-id: 2539b699fae843f104b4b8d22721ae82502ba684
2018-12-02 17:13:07 -08:00
35c8f93fd2 Fix CUDA 8 build on Windows (#14665)
Summary:
Fixes #14663.
Test for CUDA 8 is running here: https://dev.azure.com/pytorch/PyTorch/_build/results?buildId=54
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14665

Differential Revision: D13290392

Pulled By: soumith

fbshipit-source-id: 57f0d5b704e5d1fcb4927cbc007327b4ed74f443
2018-12-01 16:50:38 -08:00
da2c3afa47 Fixed typo in README.md (#14346)
Summary:
Fixed a typo in the Docker image section of the README.md file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14346

Differential Revision: D13290403

Pulled By: soumith

fbshipit-source-id: 1d848027a773f0cfc875c33d69a66e96abc7ac8b
2018-12-01 16:39:33 -08:00
4c11dee0e8 Use Type::str() in Type::operator<< (#14657)
Summary:
Stacked on the zip commit because it also changes expect files; read only the last commit.

This reduces the number of ways we can print a Type from 3 (python_str, str, operator<<) to 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14657

Differential Revision: D13288912

Pulled By: zdevito

fbshipit-source-id: f8dd610cea798c511c1d4327395bba54b1aa1697
2018-12-01 00:53:27 -08:00
143e171cb9 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 6b3905b999b1211196c9138d7236700a1b308491
2018-11-30 19:47:44 -08:00
170ff7764f Use a zip archive as our container format (#14521)
Summary:
After consulting with Owen, who pointed out the existence of the miniz library, I decided to take one last shot at using zip as our container format.
miniz makes this surprisingly feasible and I think the benefits of using zip are large enough that we should do it.

This replaces our custom container format with a zip archive, preserving all of the
desirable features of our custom format, such as append-oriented writing, and
mmap'able tensor data while adding a bunch of debugging advantages:

1. You can unzip and explore the container to debug what is going on with a model.
2. You can edit the model using a text editor (e.g. change the definition of a method,
   or editing the json-serialized meta-data), re-zip the file use OSX's native 'Compress'
   option, and re-load the result into pytorch. Note: this enables you to, e.g., print-debug
   serialized models.
3. We can easily enable features like compression in the future.
4. Stock Python, without pytorch installed, and other programming languages
   can reasonably consume this format using the json and zipfile packages, which enables
   people to build tools like visualizers without those visualizers depending on pytorch.
   This will be especially useful if you want to, for instance, write a visualizer in JavaScript.

Notes:

*  This add miniz (https://github.com/richgel999/miniz) as a dependency. miniz is a self-contained
   library for reading/writing zipfiles that unlike other zip libraries also includes libz
   compatible compress/decompress support. It is a single header and a single C file without
   any other dependencies. Note that the instructions for miniz explicitly state:

   > Please use the files from the releases page in your projects. Do not use the git checkout directly!

   So we have checked in the 'release' source. Miniz supports zip64, and its API is amenable
   to doing zip-align style things to align data.

*  Removes 'size' from RecordRef. This allows you to edit files in the zip archive without
   editing the meta-data file. Very important if you want to print-debug serialized models.

*  PyTorchStreamReader/PyTorchStreamWriter keep mostly the same API (though keys become strings)
   However, their implementation is completely swapped out to use miniz.

*  Code exists to check for the old magic number to give a decent warning to our preview users
   after we change the format.

*  Container version information is now put in a stand-alone 'version' file in the archive
   and serves a similar purpose to the other container version info.

*  All files in the zip archive start at 64-byte boundaries, using an approach similar to
   zip-align. Tests check that this property remains true. While the writer does this,
   the reader doesn't depend on it, allowing user-created archives that can use compression,
   and do not have to align data.

*  Added test to check for > 4GB files and archives. Disabled by default because it takes
   almost 2 minutes to run.

*  torchscript files are now optional: if a submodule does not have methods, it will
   not be written.
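For instance, a minimal sketch of point 4 above, using only stock Python (the file name is hypothetical):

```python
import zipfile

with zipfile.ZipFile('model.pt') as zf:
    print(zf.namelist())                # json metadata, tensor data, torchscript files
    print(zf.read('version').decode())  # the stand-alone container version file
```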
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14521

Reviewed By: jamesr66a

Differential Revision: D13252945

Pulled By: zdevito

fbshipit-source-id: 01209294c0f6543d0fd716f85a38532249c52f8c
2018-11-30 19:19:29 -08:00
1c21dc6e16 Revert D13252990: [pytorch][PR] [sparse] sparse.mm(S, D)
Differential Revision:
D13252990

Original commit changeset: 8fdb14144405

fbshipit-source-id: 49b8b0759a6e647854689962ffa72a205b4a2088
2018-11-30 18:53:47 -08:00
c71edcc747 Tensor construction codemod - caffe2/caffe2/fb/operators - 2/3
Summary:
Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13229251

fbshipit-source-id: 88b3984ea8ca82b9489c0ee9a338fd3f41dee615
2018-11-30 18:38:17 -08:00
fd17fd4aa9 Fix 'unknown type name 'optional'' (#14383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14383

D11669870 seems to have missed a spot that wasn't triggered before the stacked code above

Reviewed By: smessmer

Differential Revision: D13198269

fbshipit-source-id: 74592bedae0721acee744e31ca95253ea6efdedb
2018-11-30 17:29:50 -08:00
7f42d1c98a fix double precision cast from pybind (#14417)
Summary:
The JIT world only has double, not float, so in insertConstant we need to cast the Python `float_` to double instead of float. This fixes the incorrect values of `math.pi` and other high-precision constants.
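A hedged sketch of the effect, using tracing to push a Python float through insertConstant:

```python
import math
import torch

def f(x):
    return x * math.pi  # math.pi crosses pybind and is baked in as a graph constant

traced = torch.jit.trace(f, torch.ones(1, dtype=torch.float64))
print(traced(torch.ones(1, dtype=torch.float64)))  # keeps full double precision
```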
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14417

Differential Revision: D13282975

Pulled By: wanchaol

fbshipit-source-id: 26a4c89ffc044d28598af673aebfec95153a869e
2018-11-30 17:25:32 -08:00
404ad939e5 Revert existing no_grad_embedding_renorm_ from aten (#14639)
Summary:
Remove no_grad_embedding_renorm_ from aten. Setting the derivatives of the inputs to false has different semantics from calling with no_grad(), because it will not error if an input is modified and then has its grad accessed.

Instead, make a custom op, and use NoGradGuard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14639

Differential Revision: D13285604

Pulled By: eellison

fbshipit-source-id: c7d343fe8f22e369669e92799f167674f124ffe7
2018-11-30 16:57:51 -08:00
aeb38cfcea cuda implementation for PackSegment to support presence mask (#14635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14635

as title

Reviewed By: enosair

Differential Revision: D13254097

fbshipit-source-id: b9f40109e2889907c925f9a4df9da14f67f45f38
2018-11-30 16:54:10 -08:00
1d464d7f3e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 17487c327cbe48969dff397656fe90efcf23b699
2018-11-30 16:23:00 -08:00
26f3fb34a1 Build distributed libs in build_libtorch.py (#14037)
Summary:
This patch detects and builds c10d and gloo for the C++ API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14037

Reviewed By: ezyang

Differential Revision: D13283801

Pulled By: ebetica

fbshipit-source-id: 006dbb691344819833da6b4b844c1f0572942135
2018-11-30 14:46:36 -08:00
36c5f40ec0 Remove methods from _th_triu_ and _th_addcmul_. (#14624)
Summary:
These somehow slipped through when we moved all of Declarations.cwrap to functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14624

Reviewed By: ezyang

Differential Revision: D13277434

Pulled By: gchanan

fbshipit-source-id: e83451e2d0fdafb55635d4b757688a501454bf8c
2018-11-30 14:19:29 -08:00
c3a2b1e155 sparse.mm(S, D) (#14526)
Summary:
- add `sparse.mm(S, D)` with backward
- for `sparse.addmm()`, relax the input constraint so that the sparse matrix input doesn't have to be coalesced
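A minimal sketch of the new API (shapes arbitrary):

```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.tensor([3., 4., 5.])
S = torch.sparse_coo_tensor(i, v, (2, 3))  # sparse input; need not be coalesced
D = torch.randn(3, 4, requires_grad=True)

out = torch.sparse.mm(S, D)  # sparse @ dense -> dense, with backward support
out.sum().backward()
print(D.grad.shape)  # torch.Size([3, 4])
```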
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14526

Reviewed By: ezyang

Differential Revision: D13252990

Pulled By: weiyangfb

fbshipit-source-id: 8fdb14144405a2122d4b8447ad4055cd0330e6e8
2018-11-30 14:15:34 -08:00
a84e873bb1 Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)
Summary:
See #14539
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14569

Differential Revision: D13282161

Pulled By: ezyang

fbshipit-source-id: 13a1131b26fa300b037f66d1919b97d14033f9e5
2018-11-30 14:13:04 -08:00
5c1692840e Remove OptionsGuard from ATen (#14524)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/13738
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14524

Differential Revision: D13268031

Pulled By: goldsborough

fbshipit-source-id: fb306464b673c05ebd26d0f44d688ccd92d1d8c5
2018-11-30 13:30:35 -08:00
4b915260c7 Explicitly ban uninitialized tensors when invoking Predictor classes (#14377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14377

att

Reviewed By: dzhulgakov

Differential Revision: D13197348

fbshipit-source-id: 85a451bde3a57a8acdd3af548606c05e223896a6
2018-11-30 13:26:00 -08:00
738fc7054b Report timer in benchmarking when requested
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14570

Reviewed By: llyfacebook

Differential Revision: D13264904

Pulled By: sf-wind

fbshipit-source-id: fd05bc32202b7734dc911e3c792357ddf9ecedee
2018-11-30 13:17:29 -08:00
f45405bf5b Fix inheritance for SharedDataset (#14629)
Summary:
ezyang ebetica

CC jaliyae
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14629

Differential Revision: D13278988

Pulled By: goldsborough

fbshipit-source-id: 53afbcd1f3fc5cb23046ff92c4345cd90abd4584
2018-11-30 12:29:45 -08:00
814b5715ba Move module tests to common_nn (#14578)
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578

Differential Revision: D13268286

Pulled By: driazati

fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
2018-11-30 12:14:59 -08:00
c042f69dbb Updating submodules
Reviewed By: yns88

fbshipit-source-id: 863e9e2a1f0810f96494cabae1724622b9eb91ff
2018-11-30 11:47:16 -08:00
5ae0ed8552 Remove default constructor lines that do nothing, and fix warnings with clang trunk (#14300)
Summary:
The lines removed in this diff were no-ops, but confusing: the default constructors in `store_handler.h` are implicitly deleted, since `std::runtime_error` has no default constructor.

Clang added a warning for this behavior [in September 2018](https://reviews.llvm.org/rL343285) (note that the warning is not just for cxx2a, despite the slightly confusing commit message), so building pytorch with a recent build of clang trunk causes a spew of this warning, which the present PR fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14300

Differential Revision: D13260039

Pulled By: umanwizard

fbshipit-source-id: 92788dbd6794253e788ef26bde250a66d8fb917e
2018-11-30 11:16:35 -08:00
c03851e93a remove copy_wrapper (#13937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13937

We can now replace s_copy_ with our new _copy_ function. Experimented with moving s_copy_ out of VariableManualType.cpp, but it seemed like there was enough special casing to warrant it staying.

Reviewed By: ezyang

Differential Revision: D13053648

fbshipit-source-id: e9e04d460baf4ee49b500212cf91b95221acd769
2018-11-30 11:12:59 -08:00
5c65a7812e Move non_blocking copies to aten (#13866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13866

just a straightforward port

Reviewed By: ezyang

Differential Revision: D13011878

fbshipit-source-id: f288efebf78fa634abfb681b938b44277064d5b6
2018-11-30 11:12:57 -08:00
e3840419ec Move cuda copy to aten (#13348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13348

Move cross-device, CPU-to-device, and device-to-CPU copies to aten. Most of it is a direct port; the main difference is that we dispatch from a single _copy_ function for copies.

Reviewed By: ezyang

Differential Revision: D12850690

fbshipit-source-id: c2e3f336796b4ae38be6027d2ec131a274a6aa8c
2018-11-30 11:12:55 -08:00
0786dfee7c Move THTensor_(copy) to aten (#13603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13603
Moved vectorized CPU copy to aten. Notable changes mainly in _copy_same_type_.

Reviewed By: ezyang

Differential Revision: D12936031

fbshipit-source-id: 00d28813e3160595e73d104f76685e13154971c1
2018-11-30 11:12:54 -08:00
c1c841a4e7 Changes based on @gchanan's review of #13420 (#14441)
Summary:
The most significant change is that this fixes the error message when
indexing an empty tensor with an out-of-bounds index. For example:

```
x = torch.ones(10, 0)
x[:, [3, 4]]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14441

Differential Revision: D13226737

Pulled By: colesbury

fbshipit-source-id: d1c4a35a30e3217e3d1727d13f6b354a4a3b2a24
2018-11-30 11:03:20 -08:00
edb3ddf1a5 Accumulate grad fix (#14587)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/13337.

I don't think the lint errors in the original PR had to do with files I touched, so hopefully the rebase fixes them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14587

Differential Revision: D13277428

Pulled By: soumith

fbshipit-source-id: f04c186b1dd4889b4250597eef87f9e9bf7b2426
2018-11-30 10:49:15 -08:00
67308a9323 Fix expanded mvn and lowrankmvn (#14557)
Summary:
This PR fixes a slowness issue with expanded MVNs.

A notebook showing the problem is [here](https://gist.github.com/fehiepsi/b15ac2978f1045d6d96b1d35b640d742). Basically, an MVN's sample and log_prob involve expensive computations based on `cholesky` and `trtrs`. We can save a lot of computation by caching the unbroadcasted version of `scale_tril` (or `cov_diag`, `cov_factor` in lowrank MVN).
When expanding, this cached tensor should not be expanded together with the other arguments.
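
For context, a small sketch of the expand path this PR speeds up (standard torch.distributions API):

```python
import torch
from torch.distributions import MultivariateNormal

base = MultivariateNormal(torch.zeros(3), scale_tril=torch.eye(3))
# expand() broadcasts the batch shape; with the caching fix, the
# unbroadcasted scale_tril is reused rather than re-expanded.
mvn = base.expand(torch.Size([1000]))
x = mvn.sample()        # shape: (1000, 3)
logp = mvn.log_prob(x)  # shape: (1000,)
```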

Ref: https://github.com/uber/pyro/issues/1586

cc neerajprad fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14557

Differential Revision: D13277408

Pulled By: soumith

fbshipit-source-id: a6b16f999b008d5da148ccf519b7f32d9c6a5351
2018-11-30 10:49:13 -08:00
2e0f3b038c Tensor construction: combine Resize+mutable_data - 2/4 (#14205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14205

Original commit changeset: 8f9fb55842ae

Reviewed By: dzhulgakov

Differential Revision: D13126263

fbshipit-source-id: 12ba89e31b7738a81ec5c660ea7b79e8576c35dc
2018-11-30 10:46:58 -08:00
f6354d903a Unit tests need better compilation flow (#14547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14547

Unit tests used in dnnlowp need a better compilation flow as some of them need avx. Disabling for now so that pytorch builds with fbgemm.

Reviewed By: jianyuh

Differential Revision: D13240933

fbshipit-source-id: e2e187b758c5d89e524470cd261ce35493f427a2
2018-11-30 09:40:29 -08:00
aa842fe101 clean up linkage options (#14609)
Summary: minor code cleanup

Differential Revision: D13277803

Pulled By: soumith

fbshipit-source-id: 5ef925fe95037cab540b329054d7070c1ea7031e
2018-11-30 09:36:59 -08:00
ad1b874a36 set mkl_set_dynamic to false (#13868)
Differential Revision: D13277331

Pulled By: soumith

fbshipit-source-id: 692bb7d5157235e00dea4776d1991bb07e16ff85
2018-11-30 09:29:43 -08:00
37627a182b fix USE_SYSTEM_NCCL build (#14606)
Summary:
fixes https://github.com/pytorch/pytorch/issues/14537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14606

Differential Revision: D13274156

Pulled By: soumith

fbshipit-source-id: f834715e8e17dacf60be459b0efffba1d4df40ae
2018-11-29 23:36:17 -08:00
ff91de43de Set output of aten::mm to have the same output type as the original node after op canonicalization. (#14602)
Summary:
In CanonalizeOp, addmm is separated into mm and add, but the output dimension and type were not preserved for the aten::mm node. Fix this so that the dumped graph after this pass contains accurate information.
Sample output, before:
```
%6 : Dynamic = aten::mm(%input.2, %5), scope: LinearModel/Sequential[model]/Linear[full0]
```
after:
```
%6 : Float(32, 200) = aten::mm(%input.2, %5), scope: LinearModel/Sequential[model]/Linear[full0]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14602

Differential Revision: D13273754

Pulled By: soumith

fbshipit-source-id: 82e22b5f30e9eb6ba9249c5a2216955421f39cc7
2018-11-29 23:24:27 -08:00
89c3dbcad8 Add binary cross entropy to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14583

Differential Revision: D13269423

Pulled By: driazati

fbshipit-source-id: 7cc1594d8189c3e8f2d4ce0462fdc0a03683006e
2018-11-29 22:23:13 -08:00
1f6d9f44fc Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13272741

Pulled By: driazati

fbshipit-source-id: 3e4fe870d0e268903757f3ae8a56100606906bce
2018-11-29 22:18:55 -08:00
3648c269e9 Misc distributed documentation updates (#14605)
Summary:
* s/environmental/environment/g
* Casing (CUDA, InfiniBand, Ethernet)
* Don't embed torch.multiprocessing.spawn but link to it (not part of the package)
* spawn _function_ instead of _utility_ (it's mentioned after the launch utility which is a proper utility)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14605

Differential Revision: D13273480

Pulled By: pietern

fbshipit-source-id: da6b4b788134645f2dcfdd666d1bbfc9aabd97b1
2018-11-29 21:51:43 -08:00
11ef5191ff Enable tests for CPU tensors in test_distributed.py (#14572)
Summary:
These were not enabled after adding support in the Gloo backend. The
argument checks in ProcessGroupGloo raised an error in two cases:

* If the input tensor list to scatter was ``[None]`` on processes other
  than the source process.
* If the output tensor list to gather was ``[None]`` on processes other
  than the destination process.

This commit prepares these arguments explicitly instead of boxing them
at the process group call site.

This fixes #14536.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14572

Differential Revision: D13272812

Pulled By: pietern

fbshipit-source-id: 12cb0d85ec92f175365cbada585260f89330aad8
2018-11-29 21:39:02 -08:00
1975917d0e fix copy_ (#14593)
Summary:
Closes https://github.com/pytorch/pytorch/issues/14590
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14593

Differential Revision: D13272510

Pulled By: jamesr66a

fbshipit-source-id: b6921a98460c371d435277c416dad0b5ab0fec8c
2018-11-29 20:31:53 -08:00
220ce8046e Binding for prctl(PR_SET_PDEATHSIG) (#14491)
Summary:
If torch.multiprocessing.spawn is used to launch non-daemonic
processes (the default since #14391), the spawned children won't be
automatically terminated when the parent terminates.

On Linux, we can address this by setting PR_SET_PDEATHSIG, which
delivers a configurable signal to child processes when their parent
terminates.
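
For reference, the mechanism can be sketched from Python via ctypes (illustrative only; this PR adds a native binding):

```python
import ctypes
import signal

PR_SET_PDEATHSIG = 1  # constant from <sys/prctl.h>

def set_pdeathsig(sig=signal.SIGINT):
    # Ask the kernel to deliver `sig` to this process when its parent dies.
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    if libc.prctl(PR_SET_PDEATHSIG, int(sig)) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")
```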

Fixes #14394.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14491

Differential Revision: D13270374

Pulled By: pietern

fbshipit-source-id: 092c9d3c3cea2622c3766b467957bc27a1bd500c
2018-11-29 20:09:19 -08:00
9127ab3866 Fixed new_group won't work for two or more different rank groups (#14529)
Summary:
This fixes two things:

(1) The NCCL backend didn't support two or more groups. This is because we need a group name in the ProcessGroupNCCL class to keep track of the process group ID within that group name, as well as the NCCL unique ID within that group name and process group ID; otherwise, different processes will create different NCCL process groups in different orders and can clash on these names. This fixes the NCCL problem.

(2) When using new_group, each rank should enter this function and update its global group name counter to ensure that every rank always operates on the same group name.

With both fixes, the repro code in https://github.com/pytorch/pytorch/issues/14528 works with both the NCCL and Gloo backends (see the sketch after the output below):

```
tengli@learnfair096:~$ python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=30000 ~/github_issues/nccl_group.py
rank: 0 - val: 6.0
rank: 2 - val: 6.0
rank: 3 - val: 6.0
rank: 1 - val: 6.0
rank: 4 - val: 22.0
rank: 6 - val: 22.0
rank: 5 - val: 22.0
rank: 7 - val: 22.0
```
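
A minimal sketch of the now-working pattern (assuming a job that was initialized with 8 ranks):

```python
import torch.distributed as dist

# Every rank must call new_group for every group, even for groups it
# does not belong to, so the group name counters stay in sync.
group_a = dist.new_group(ranks=[0, 1, 2, 3])
group_b = dist.new_group(ranks=[4, 5, 6, 7])
```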
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14529

Differential Revision: D13253434

Pulled By: teng-li

fbshipit-source-id: 8eb45882b996b06d951fc9a306d5de86a42e8b84
2018-11-29 19:57:47 -08:00
e227aa9e2e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 44cd40cc9bc25629ec9547327a515bac22e5c905
2018-11-29 19:46:35 -08:00
67e3905bc6 Revert D13268293: [pytorch][PR] [jit] Add InstanceNorm, Distance modules to Script
Differential Revision:
D13268293

Original commit changeset: cb33c6dcdadd

fbshipit-source-id: 214a29b74c85b7b25df0eb48e3fdb81539049130
2018-11-29 19:19:35 -08:00
0d3cb91d8c Make env init_method support both env and args for rank and size (#14494)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/14446

This was supported behavior in the old torch.distributed, and we want to keep supporting it in the new release.

The test covers every combination where rank and world size are supplied via environment variables, via arguments, or both (see the sketch below).
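
A sketch of the two call patterns that both work now (use one or the other in a given process):

```python
import torch.distributed as dist

# (a) rank and world size read from the RANK/WORLD_SIZE env variables:
dist.init_process_group(backend="gloo", init_method="env://")

# (b) env:// rendezvous, with rank and world size passed as arguments:
dist.init_process_group(backend="gloo", init_method="env://",
                        rank=0, world_size=2)
```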
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14494

Differential Revision: D13253433

Pulled By: teng-li

fbshipit-source-id: c05974d84f1bdf969f74ec45763e11a841fe4848
2018-11-29 18:48:20 -08:00
1a9602d5db Delete caffe2_cuda_full_device_control (#14283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14283

According to Yangqing, this code was only used by us to do some end-to-end
performance experiments on the impact of cudaSetDevice and cudaGetDevice.  Now
that the frameworks are merged, there are a lot of bare calls to those functions
which are not covered by this flag.  It doesn't seem like a priority to restore
this functionality, so I am going to delete it for now.  If you want to bring
it back, you'll have to make all get/set calls go through this particular
interface.

Reviewed By: dzhulgakov

Differential Revision: D13156472

fbshipit-source-id: 4c6d2cc89ab5ae13f7c816f43729b577e1bd985c
2018-11-29 18:33:22 -08:00
8617b780cf Replace use of 'int' with more descriptive 'DeviceIndex' or 'StreamId'. (#14282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14282

This also is a substantive change, as 'DeviceIndex' and 'StreamId' are
narrower types than 'int'.

Reviewed By: Yangqing, smessmer

Differential Revision: D13156471

fbshipit-source-id: 08aa0f70c4142415b6bd4d17c57da0641c1d0e9a
2018-11-29 18:33:21 -08:00
fd31eae9ad Switch import/export to python printing (#14400)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/14378, only look at the last commit.

This changes the way methods are defined in TorchScript archives to use
PythonPrint rather than ONNX protobufs.

It also updates torch.proto to directly document the tensor data
structure actually being serialized.

Notes:
* because PythonPrint prints all the methods at once per module, this
  removes MethodDef in favor of a single torchscript_area and a separate
  caffe2_graphs entry. Note that NetDefs already have method names,
  so there is no need for a separate method name entry.
* This switches cpp/pickle area to RecordRef (references to a file in
  the container format) since it is possible the data in these arenas
  may be large and not suited to JSON output.
* Removes 'annotations' -- annotations should be re-added on the first
  commit that actually has a practical use for them. In the current state
  it is unlikely they are representing the right information.
* Some expect files have changed because PythonPrint is preserving more
  debug name information for parameter names.
* MethodEncoder (the ONNX output format) has been deleted. There is still
  some cleanup possible combining EncoderBase and GraphEncode now that there
  is only a single pathway using EncoderBase.
* This incorporates the changes from #14397
  to define TensorDef
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14400

Reviewed By: suo

Differential Revision: D13231800

Pulled By: zdevito

fbshipit-source-id: af5c1152d0bd6bca8b06c4703f59b161bb19f571
2018-11-29 17:53:49 -08:00
2b7345bcd5 PT1 distributed doc update (#14530)
Summary:
Removed an incorrect section. We don't support this. I wrote this from my memory :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14530

Differential Revision: D13253471

Pulled By: teng-li

fbshipit-source-id: c3f1ffc6c98ef8789157e885776e0b775ec47b15
2018-11-29 17:50:47 -08:00
75eccffdfe Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13268293

Pulled By: driazati

fbshipit-source-id: cb33c6dcdaddf8c7a49b3535894d77bf5d771ddd
2018-11-29 17:26:29 -08:00
15e8bb379e Add List to annotations (#14482)
Summary:
This PR adds a polyfill for `typing.List` for Python versions that don't
support `typing` as a builtin. It also moves the type definitions from
`annotations.py` so that they can be used in `torch.nn`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14482

Differential Revision: D13237570

Pulled By: driazati

fbshipit-source-id: 6575b7025c2d98198aee3b170f9c4323ad5314bd
2018-11-29 17:23:29 -08:00
2752ad8045 Automatic update of fbcode/onnx to f461f7aad9987635b4aff108620ed7918f002d19 (#14568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14568

Previous import was 882c5283c54345d131e8fe5c859e4844dcf7ca8e

Included changes:
- **[f461f7a](https://github.com/onnx/onnx/commit/f461f7a)**: Show the op's type and name when the shape inference is failed. (#1623) <Jerry>
- **[ab8aaf9](https://github.com/onnx/onnx/commit/ab8aaf9)**: Add scan test case (#1586) <G. Ramalingam>
- **[c95357e](https://github.com/onnx/onnx/commit/c95357e)**: link the tutorial (#1650) <Lu Fang>
- **[d7e2420](https://github.com/onnx/onnx/commit/d7e2420)**: Upgrade label encoder to support more input types (#1596) <Wei-Sheng Chin>
- **[6425108](https://github.com/onnx/onnx/commit/6425108)**: Add Doc about Adding New Operator into ONNX (#1647) <Lu Fang>
- **[295889c](https://github.com/onnx/onnx/commit/295889c)**: use an empty initializer to create map (#1643) <Lu Fang>
- **[e38f3ec](https://github.com/onnx/onnx/commit/e38f3ec)**: Remove redundant const (#1639) <daquexian>
- **[ea694bf](https://github.com/onnx/onnx/commit/ea694bf)**: implement fuse reduce->unsqueeze + fix assumption in nop_dropout pass (#1565) <Armen>
- **[6db386e](https://github.com/onnx/onnx/commit/6db386e)**: make output shape clear enough for Softmax family (#1634) <Lu Fang>
- **[2b67c6e](https://github.com/onnx/onnx/commit/2b67c6e)**: fix batchnorm doc (#1633) <Lu Fang>
- **[c901784](https://github.com/onnx/onnx/commit/c901784)**: remove inappropriate consts (#1632) <Lu Fang>
- **[de82119](https://github.com/onnx/onnx/commit/de82119)**: Shape inference fix for broadcast, concat and scan (#1594) <KeDengMS>
- **[d7ffe3b](https://github.com/onnx/onnx/commit/d7ffe3b)**: Update Optimizer Docs (#1607) <Armen>
- **[d09d139](https://github.com/onnx/onnx/commit/d09d139)**: mark PROTOBUF_INCLUDE_DIRS as BUILD_INTERFACE (#1466) <Yuta Okamoto>
- **[eb4b7c2](https://github.com/onnx/onnx/commit/eb4b7c2)**: allow variadic parameters of different types (#1615) <G. Ramalingam>
- **[4166246](https://github.com/onnx/onnx/commit/4166246)**: Fix onnxifi test (#1617) <Yinghai Lu>
- **[6706a4d](https://github.com/onnx/onnx/commit/6706a4d)**: Fix a bug in vector address access (#1598) <Raymond Yang>
- **[ae39866](https://github.com/onnx/onnx/commit/ae39866)**: Separate types of inputs 1 and 2 in OneHot op. (#1610) <Spandan Tiwari>
- **[45ba661](https://github.com/onnx/onnx/commit/45ba661)**: Handle new types in the switch. (#1608) <Dmitri Smirnov>
- **[14853b6](https://github.com/onnx/onnx/commit/14853b6)**: Bump docker image version to 230 used in CircleCI (#1606) <bddppq>
- **[e0993b8](https://github.com/onnx/onnx/commit/e0993b8)**: [onnxifi] Make sure that backend handles run async. (#1599) <Roman Dzhabarov>
- **[e6965cc](https://github.com/onnx/onnx/commit/e6965cc)**: Introduce SparseTensor ML proto (#1554) <Dmitri Smirnov>
- **[75b782f](https://github.com/onnx/onnx/commit/75b782f)**: In driver test check the return status of onnxGetBackendIDs (#1597) <bddppq>
- **[c05b364](https://github.com/onnx/onnx/commit/c05b364)**: Make CI log less verbose (#1595) <bddppq>
- **[fa568e4](https://github.com/onnx/onnx/commit/fa568e4)**: Loop type shape inferencing (#1591) <Scott McKay>
- **[937e64c](https://github.com/onnx/onnx/commit/937e64c)**: add uint8 (#1590) <Lu Fang>
- **[f86e951](https://github.com/onnx/onnx/commit/f86e951)**: Add domain as an optional parameter for make_node function (#1588) <Young Kim>
- **[ff45588](https://github.com/onnx/onnx/commit/ff45588)**: Remove unreachable code in shape_inference.h (#1585) <Changming Sun>
- **[f7dcad0](https://github.com/onnx/onnx/commit/f7dcad0)**: Add several hyperbolic function ops. (#1499) <Sergii Dymchenko>
- **[a60ac7d](https://github.com/onnx/onnx/commit/a60ac7d)**: Add OneHot op to ONNX. (#1567) <Spandan Tiwari>
- **[f6c3a7e](https://github.com/onnx/onnx/commit/f6c3a7e)**: [compiler flag] Issue a warning if class has virtual method but missing virtual dtor. (#1583) <Roman Dzhabarov>
- **[88d1784](https://github.com/onnx/onnx/commit/88d1784)**: Fix MaxUnpool shape inference when output_shape is provided as input (#1578) <Spandan Tiwari>
- **[20041b7](https://github.com/onnx/onnx/commit/20041b7)**: Add type shape inferencing for the If operator (#1571) <Scott McKay>
- **[d6c4c75](https://github.com/onnx/onnx/commit/d6c4c75)**: Add a virtual destructor to GraphInferencer (#1574) <Changming Sun>
- **[a339598](https://github.com/onnx/onnx/commit/a339598)**: fix ConvTranspose spec (#1566) <Wenhao Hu>

Reviewed By: zrphercule

Differential Revision: D13263831

fbshipit-source-id: a2ff22c6454e2430429e5a7d18d21661a7ffb0cb
2018-11-29 16:31:56 -08:00
dc7498c84d add gloo support for reduce on GPU (#14443)
Summary:
as titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14443

Reviewed By: pietern

Differential Revision: D13222907

Pulled By: janewangfb

fbshipit-source-id: f418c5d84880196f97089114d02957cf739243f8
2018-11-29 16:19:39 -08:00
69d3c00ae1 Expunge use of type() from SparseTensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14546

Reviewed By: gchanan

Differential Revision: D13258512

fbshipit-source-id: b2d562b6c5228288f60f02beab3c44c50163248f
2018-11-29 16:04:18 -08:00
c7f828809b Expunge occurrences of type() from scalar_test (#14545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14545

Self explanatory

Reviewed By: gchanan

Differential Revision: D13258513

fbshipit-source-id: abce357de57b95cde58b3894c251da519ede6b53
2018-11-29 16:04:16 -08:00
9aea856115 Expunge use of type() in Distributions.cpp (#14544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14544

Modern usage is options().  This doesn't have a functional
difference, because all call sites were CPU only (where
getting the device index right doesn't matter).

Reviewed By: gchanan

Differential Revision: D13258252

fbshipit-source-id: c70f8d618ee9caf37ff2469cceaa439348b6114c
2018-11-29 16:04:14 -08:00
7879c979b5 Expunge uses of type() from EmbeddingBag. (#14543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14543

The modern way to do this is to use options().  It doesn't
make a functional difference here because everything is CPU
(so loss of device information is not a big deal), but
it's definitely safer this way.

Reviewed By: gchanan

Differential Revision: D13257847

fbshipit-source-id: afbc9f7f8d4ca5a8b1cf198997c307e27a2c3333
2018-11-29 16:04:12 -08:00
6fe1867c23 Expunge direct device index handling from tensor_conversion_dispatch (#14421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14421

Last time I looked at this, I bailed because it seemed like there were
a lot of sites to fix.  Well, I need this to work properly for out-of-place
HIPify, so I took another whack at it.  Changes should be pretty self-explanatory.

Reviewed By: gchanan

Differential Revision: D13221302

fbshipit-source-id: ed21e2668a1a629898a47358baf368fe680263a0
2018-11-29 16:04:10 -08:00
5805ef5a83 call raw_mutable_data when data type didn't match in BlobGetMutableTensor (#14513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14513

att

Reviewed By: dzhulgakov

Differential Revision: D13245875

fbshipit-source-id: 3398a1f41a6195e120ed574dee887070e86dfe1f
2018-11-29 15:18:58 -08:00
666d383a00 Add broadcast list default arg support (#14361)
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361

Differential Revision: D13192231

Pulled By: driazati

fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
2018-11-29 15:15:47 -08:00
a2d8e84594 Added launch bounds in VolumetricConvolution.cu (#14564)
Summary:
A few months ago we were seeing test failures on certain architectures due to invalid launch configurations of the kernels in aten/src/THCUNN/VolumetricConvolution.cu.

This PR ensures that those kernels are always compiled such that at least one block can be resident on an SM, and such errors will not be encountered at runtime on any architecture after compiling for that architecture.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14564

Differential Revision: D13266136

Pulled By: soumith

fbshipit-source-id: 35464b20848bb0a1168e8f3b233172331c50b35b
2018-11-29 14:49:29 -08:00
0d663cec30 Unify cuda and hip device types in Caffe2 python front end (#14221)
Summary:
The goal of this PR is to unify the CUDA and HIP device types in the Caffe2 Python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221

Differential Revision: D13148564

Pulled By: bddppq

fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
2018-11-29 14:00:16 -08:00
bdaa0e38b8 Fix tautological-compare in aten/src/ATen/native/cuda/SummaryOps.cu (#14540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14540

refactor the HANDLE_SWITCH_CASE to avoid tautological-compare in macro

Reviewed By: ezyang

Differential Revision: D13255725

fbshipit-source-id: cfa64bb7bc53d19c93a693015202f207567690b4
2018-11-29 13:57:27 -08:00
eeb0d67b92 Update to export in onnx_aten_fallback option
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14492

Differential Revision: D13265701

Pulled By: zrphercule

fbshipit-source-id: b339c92078f73d152a14db7d5d2b3f5edda9dda6
2018-11-29 13:49:50 -08:00
2901777a0e Add back the MAX_JOBS=4 restriction to make rocm CI more stable (#14566)
Summary:
As a workaround until hcc fixes its high memory usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14566

Differential Revision: D13263555

Pulled By: bddppq

fbshipit-source-id: 479c7a76aff3919f028e03ef345795537480f0fa
2018-11-29 13:24:56 -08:00
1b0b2e69f8 assorted alias analysis fixes (#14556)
Summary:
- Correctly report whether nodes write to an alias set.
- Fix loop convergence.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14556

Differential Revision: D13261376

Pulled By: suo

fbshipit-source-id: 8123c0fb1f8f137a15bd82719be2d99e502bccc2
2018-11-29 13:09:26 -08:00
31b3d81714 Broadcast prim::FusedConcat inputs independently when checking kernels (#14503)
Summary:
Fixes #14483.

cc zou3519 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14503

Differential Revision: D13256343

Pulled By: zou3519

fbshipit-source-id: 1c68a23f425be067a742bada7ee8cdfab7fc3fa2
2018-11-29 13:05:00 -08:00
cf059028f0 Do not load ROCm cmake files if USE_ROCM is off (#14261)
Summary:
Previously it unconditionally tried to load the ROCm cmake files, so there was no way to disable the ROCm build. After this change, USE_ROCM=0 disables the ROCm build.
Should fix #14025

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14261

Differential Revision: D13242090

Pulled By: bddppq

fbshipit-source-id: 652ec7d49dce9b357778bfa53a8e04b7079787ab
2018-11-29 11:17:19 -08:00
fb6806f6e9 Remove at references in c10 Allocator.h (#14434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14434

The referenced classes live now in c10, so we don't need to specify their namespace.

Reviewed By: ezyang

Differential Revision: D13224015

fbshipit-source-id: 6d154b8e3f9a1e38ff0407dbb1151f5c1d5df260
2018-11-29 11:07:22 -08:00
4ec6bd7356 Add sourceRank() to ProcessGroup::Work (#14453)
Summary:
This function is only implemented for the subclasses where it makes
sense. If it's not overridden it will throw an error. Having this
function removes the need for a pointer passing hack to pass the
source rank of a recv operation back to the caller. Instead, the
caller can now call `source_rank` on the work object and achieve
the same result.

Closes #11804.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14453

Differential Revision: D13230898

Pulled By: pietern

fbshipit-source-id: ef38f48bfaca8ef9a364e5be122951bafc9f8e49
2018-11-29 09:16:53 -08:00
7c24a16f82 Fixed typo for BCEWithLogitLoss doc comments (#14532)
Summary:
The math symbol was missing a prefix `:`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14532

Differential Revision: D13256077

Pulled By: soumith

fbshipit-source-id: 2359819d8aa664f915be1c436cbb0c0756504028
2018-11-29 08:22:19 -08:00
29d697aec4 typo in Module docstring
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14511

Differential Revision: D13246061

Pulled By: soumith

fbshipit-source-id: 6c13a2957c4c4324ab5d839d634689c61e25b0fe
2018-11-29 07:17:29 -08:00
44cb43bcc1 Jaliyae/samplers (#13870)
Summary:
Make samplers optionally accept a new size in their reset() method. This lets a dataloader or dataset reset the sampler for an epoch or a chunk of data with a different size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13870

Differential Revision: D13240120

Pulled By: soumith

fbshipit-source-id: 19c53f8be13c0fdcf504f0637b0d3e6009a8e599
2018-11-29 07:07:19 -08:00
9e93a02624 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13252887

Pulled By: driazati

fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
2018-11-28 23:31:25 -08:00
ba25b37e9b Updating submodules
Reviewed By: yns88

fbshipit-source-id: f957056bb48c583738c5defaf3d1f01cd7df3915
2018-11-28 23:31:23 -08:00
70e3736e20 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 9800251baaa09d9f7988eff340ef36e0ab11f579
2018-11-28 21:09:08 -08:00
db15f2e13f Fix version.groups() (#14505)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14502

fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14505

Differential Revision: D13242386

Pulled By: goldsborough

fbshipit-source-id: faebae8795e1efd9c0ebc2294fe9648193d16624
2018-11-28 20:27:33 -08:00
6d63e9dbff Support Embedding + EmbeddingBag in Script + (Ignore flakey test) (#14509)
Summary:
Resubmitting PR #14415

The tests added for Embedding + EmbeddingBag had random numbers as input, which affected the random number generator and caused the flaky test to break.

Everything but the last two commits has already been accepted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14509

Differential Revision: D13247917

Pulled By: eellison

fbshipit-source-id: ea6963c47f666c07687787e2fa82020cddc6aa15
2018-11-28 19:16:38 -08:00
105fa58748 pointwise_loss (#14134)
Summary:
Adding pointwise loss ops to weak_script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14134

Differential Revision: D13209455

Pulled By: eellison

fbshipit-source-id: 87fc0222121f34a2f4edb24c2da2a11124b097d8
2018-11-28 18:14:38 -08:00
186341c5dc Merge Caffe2 and PyTorch thread pool definitions (#14114)
Summary:
(1) Move Caffe2 thread pool to aten
(2) Use the same thread pool definition for PyTorch interpreter
(3) Make ivalue::Future thread-safe
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14114

Reviewed By: ilia-cher

Differential Revision: D13110451

Pulled By: highker

fbshipit-source-id: a83acb6a4bafb7f674e3fe3d58f7a74c68064fac
2018-11-28 18:10:20 -08:00
533668d7e4 Ensure that indices are on the same device as self
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14504

Reviewed By: wat3rBro

Differential Revision: D13242200

Pulled By: colesbury

fbshipit-source-id: 82731cee808681ec612d406342070640eb26e519
2018-11-28 17:54:32 -08:00
da9e49e586 Remove Context dependency from Tensor class (#14269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14269

Removes reference to Context proper and instead adds a bool argument for async copy (the same as `copy_`)

For CopyFrom - I haven't tweaked all callsites yet. Instead I rely on a terrible hack: a pointer to Context is implicitly converted to bool when passed, haha :) It's not good code, and I propose to fix it in a follow-up diff (maybe using clangr tooling).

Reviewed By: ezyang

Differential Revision: D13117981

fbshipit-source-id: 7cb1dc2ba6a4c50ac26614f45ab8318ea96e3138
2018-11-28 15:45:38 -08:00
0cfbbceac3 Change Tensor::CopyFrom to a simple double dispatch (#14268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14268

Removes the need for Context in Tensor by doing simple dispatch for CopyBytes. It'd eventually be subsumed by Roy Li's changes for a proper copy_ op, but before that is done, let's get a clear logic of how copies are implemented and clean up some cruft in the CopyFrom implementation.

Note, that with these changes, one can probably can get rid of Context::CopyFromCPU/CopyToCPU, but it's a matter for follow up diffs.

This diff doesn't change the API of Tensor yet, but relies on the fact that passing `Context` to CopyFrom makes the copy async if the device is CUDA and has no effect otherwise (that's how the Context methods are implemented).

This doesn't change the semantics of the async copy implementation - as before, it blindly calls cudaMemcpyAsync, which probably means it can be misused if invoked outside an operator body. I'll leave that for the follow-up copy_ unification.

For Extend() we always do an async copy - that makes sense, as it's an in-place device-to-device operation whose effect is only observable by a subsequent op.

Note: there are now three ways of invoking a copy in C2 code - the templated CopyBytes, the virtual CopyFromCPU/etc., and the double-dispatch free method here. Hopefully we can get rid of the second one.

Also, please advise whether it's c10-worthy :)

Reviewed By: ezyang

Differential Revision: D13117987

fbshipit-source-id: a6772d6dcf3effaf06717da3a656fc9873b310b5
2018-11-28 15:45:37 -08:00
f80d34a1c8 Update Tensor doc (#14339)
Summary:
Add to the Tensor doc info about `.device`, `.is_cuda`, `.requires_grad`, `.is_leaf` and `.grad`.
Update the `register_backward_hook` doc with a warning stating that it does not work in all cases.
Add support in the `_add_docstr` function for adding docstrings to attributes.

There is an explicit cast here, but I am not sure how to handle it properly. The doc field of getsetdescr is documented as `const char *` (like all other doc fields in descriptor objects) in the CPython online documentation, but in the code it is the only one that is not const.
I assumed this is a bug in the code, since it follows neither the documentation nor the convention of the other descriptors, so I cast away the const.
EDIT: the online doc I was looking at is for 3.7, where both the code and the doc are const; for older versions, both are non-const.
Please let me know if this should not be done, or, if it should be, whether there is a cleaner way to do it!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14339

Differential Revision: D13243266

Pulled By: ezyang

fbshipit-source-id: 75b7838f7cd6c8dc72b0c61950e7a971baefaeeb
2018-11-28 15:28:17 -08:00
fb7e40b7eb nccl fixes (#14195)
Summary:
This has 4 changes

1) propagate USE_SYSTEM_NCCL. Previously it was ignored and cmake always did a FindPackage
2) respect SCCACHE_DISABLE in our caffe2 sccache wrapper for circleci
3) use SCCACHE_DISABLE when building nccl, because it triggers the same bug as when using CCACHE (already tracked in https://github.com/pytorch/pytorch/issues/13362). This was hidden because we weren't respecting USE_SYSTEM_NCCL, and were never building nccl ourselves in CI
4) In one particular CI configuration (caffe2, cuda 8, cudnn 7), force USE_SYSTEM_NCCL=1. Building the bundled nccl triggers a bug in nvlink. I've done some investigation, but this looks like a tricky, preexisting bug, so rather than hold up this diff I'm tracking it separately in https://github.com/pytorch/pytorch/issues/14486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14195

Differential Revision: D13237502

Pulled By: anderspapitto

fbshipit-source-id: 1100ac1269c7cd39e2e0b3ba12a56a3ce8977c55
2018-11-28 14:43:06 -08:00
ca55c5411f Clean up house on CUDAStream (#14247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14247

Just a bunch of clean up to get the code in a good state before we
enshrine it in c10.

Billing of changes:
- Inline all "pointer" API functions into their real implementations,
  so we don't have a bunch of dead pointer functions hanging around.
- Replace all occurrences of int64_t with DeviceIndex, as appropriate
- Rename device field to device_index
- Add documentation for everything in CUDAStream.h
- Bring CUDAStream to API parity with Stream (e.g., support equality)
- Delete uncheckedSetCurrentCUDAStream, it didn't work anyway because
  StreamId to internal pointer conversion has a bunch of ways it can
  fail.  Just hope for the best!

Reviewed By: dzhulgakov

Differential Revision: D13141949

fbshipit-source-id: a02f34921e3d8294bd77c262bd05da07d1740a71
2018-11-28 14:01:59 -08:00
3aeb288e40 Make clang-tidy shut up about Python C API macros.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14480

Reviewed By: goldsborough

Differential Revision: D13235001

fbshipit-source-id: cd7f00b12ed3d9ef0fb0d7bd6c428e21561ec1b6
2018-11-28 13:54:42 -08:00
e3711aa93f Make TensorImpl/StorageImpl safer (#14429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14429

- forbid copying
- make final what ought to be

Reviewed By: dzhulgakov

Differential Revision: D13223125

fbshipit-source-id: e6176cc916d4cd8370c835f243ca90d5c3124c4a
2018-11-28 13:41:49 -08:00
f6dfd9d545 Handle copying intrusive_ptr_target correctly (#14428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14428

See in-code comment

Reviewed By: ezyang

Differential Revision: D13223126

fbshipit-source-id: 1e87e6112bbcca6377ca04ef2ba25ef937931061
2018-11-28 13:41:48 -08:00
5f07b33857 Revert D13219647: [pytorch][PR] Support Embedding + EmbeddingBag in Script
Differential Revision:
D13219647

Original commit changeset: c90706aa6fbd

fbshipit-source-id: d189e717ba0773de43d633876bc3a688830a9303
2018-11-28 13:38:58 -08:00
aec4c19460 Remove StorageImpl::type() (#14139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14139

This seems to be neither used nor implemented. Also, it is a c10->aten dependency, which we don't want.

Reviewed By: ezyang

Differential Revision: D13112298

fbshipit-source-id: 0407c4c3ac9b02bbd6fca478336cb6a6ae334930
2018-11-28 13:32:38 -08:00
bcd7b03c2a Add XBlobGetMutableTensor that returns Tensor (#14424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14424

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14136

Since Tensor is now a shared_ptr, it doesn't make sense to have Tensor* around anymore,
so we want to change Tensor* to Tensor in the interface.
We added functions that work with `Tensor` instead of `Tensor*` in this diff.

To remove Tensor*, we'll do following
```
auto* Y = Output(0);
Y->mutable_data...
```
-->
```
auto Y = Output(0);
Y.mutable_data...
```

But to run clangr codemod, we'll keep both APIs in different names, e.g. `Output` and `XOutput`, and do the refactor and then delete the old method and rename the new method into the old one.
For example for `Output`, we'll first codemod the callsites from `Output` to `XOutput`, then delete the old `Output` and rename `XOutput` to `Output` in the end.

Reviewed By: smessmer

Differential Revision: D12934074

fbshipit-source-id: d0e85f6ef8d13ed4e7a7505faa5db292a507d54c
2018-11-28 13:29:48 -08:00
0f62af4ab1 Add timeout kwarg to init_process_group (#14435)
Summary:
This applies to the gloo backend only. Timeout support for the NCCL and
MPI backends is tracked in issues #14371 and #14372 respectively.

When creating a new process group (either the global one or any subgroup
created through `new_group`) you can specify a timeout keyword
argument (of type datetime.timedelta). This timeout applies to all
collective operations executed against that process group, such that any
operation taking longer than the timeout will throw a runtime error.
Using a different, better catchable error type is tracked in #14433.
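
A minimal sketch of the new keyword argument:

```python
from datetime import timedelta

import torch.distributed as dist

# Any collective on this (gloo) group that blocks for longer than the
# timeout throws a runtime error.
dist.init_process_group(backend="gloo", init_method="env://",
                        timeout=timedelta(seconds=60))
subgroup = dist.new_group(ranks=[0, 1], timeout=timedelta(seconds=30))
```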

This fixes #14376.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14435

Differential Revision: D13234317

Pulled By: pietern

fbshipit-source-id: 973993b67994dc64861c0977cbb6f051ec9d87f6
2018-11-28 11:35:01 -08:00
7c4aef9dfc Add support for HIP to DispatchStub. (#14413)
Summary:
I feel a bit bad writing this patch, because there isn't really
any reason not to use the normal dispatch mechanism for CUDA
and HIP here (so we have *yet another dispatcher*), but I don't
really want to sign up to rewrite DispatchStub to deduplicate the
dispatcher right now.

Need to natively add support for HIP here, as I don't want to
have to HIPify files which are not in a CUDA directory.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14413

Differential Revision: D13220358

Pulled By: ezyang

fbshipit-source-id: cc61218322589a1dc2ab8eb9d5ddd3c616f6b712
2018-11-28 11:07:45 -08:00
7749804099 Support Embedding + EmbeddingBag in Script (#14415)
Summary:
Add support for Embedding and EmbeddingBag in script. Both functions require with torch.no_grad(), which we don't have any plans to support in the near future. To work around this, I added a embedding_renorm function without derivatives.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14415

Reviewed By: wanchaol

Differential Revision: D13219647

Pulled By: eellison

fbshipit-source-id: c90706aa6fbd48686eb10f3efdb65844be7b8717
2018-11-28 10:52:30 -08:00
c32debb916 fix build error from D13188595 (#14481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14481

Fix build error in mode/opt

Reviewed By: dskhudia

Differential Revision: D13234688

fbshipit-source-id: 6c8515c45f75e7b88713a303f22990ad85d68beb
2018-11-28 10:46:33 -08:00
a02b3374d4 Revert D13144472: [fix] condition blob in while_op test changes data type
Differential Revision:
D13144472

Original commit changeset: af4d920a3148

fbshipit-source-id: 74d9f69fc66964b5e68b4b2cd2fd2be1f63e9d69
2018-11-28 10:43:22 -08:00
6039e25e8d Fix the build issue in setup.py due to cmake version type x.x.x.x vio… (#14331)
Summary:
See https://github.com/pytorch/pytorch/issues/13226
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14331

Differential Revision: D13234639

Pulled By: orionr

fbshipit-source-id: 87880057e84242e4af5ad6bf87e08831aa2c5459
2018-11-28 10:38:27 -08:00
8901935ad4 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/11563
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14473

Differential Revision: D13234208

Pulled By: ezyang

fbshipit-source-id: 7d874c63659e93728af239ecdfb85547613e52ad
2018-11-28 09:28:26 -08:00
302caef154 Revert D13166626: [pytorch][PR] ignore generated caffe2 docs and virtualenvs
Differential Revision:
D13166626

Original commit changeset: 4f11228d8b5d

fbshipit-source-id: ff301f1791ca8a390767ae43cde8637dcd044d0c
2018-11-28 07:40:04 -08:00
c638f379b3 Make mean function work across multiple dimensions. (#14252)
Summary:
Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it.

Bonus: Fix incomplete language in the `torch.sum` documentation which doesn't take into account multiple dimensions when describing `unsqueeze` (at the same time as introducing similar language in `torch.mean`).
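
A quick sketch of the new behavior:

```python
import torch

x = torch.randn(4, 5, 6)
print(x.sum(dim=(0, 2)).shape)                 # torch.Size([5]) (existing)
print(x.mean(dim=(0, 2)).shape)                # torch.Size([5]) (new)
print(x.mean(dim=(0, 2), keepdim=True).shape)  # torch.Size([1, 5, 1])
```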
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252

Differential Revision: D13161157

Pulled By: umanwizard

fbshipit-source-id: c45da692ba83c0ec80815200c5543302128da75c
2018-11-28 06:53:09 -08:00
68251fb931 Fix half tensor printing plus speedup large tensor printing (#14418)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14344 and https://github.com/pytorch/pytorch/issues/6863

The slowdown was due to the fact that we were only summarizing the tensor (for computing the number of digits to print) if its first dimension was larger than the threshold. It now goes over all the dimensions.

Some quick runtime analysis:

Before this PR:
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
13.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After this PR

```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
2.08 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: b = a.cuda()

In [4]: %timeit str(b)
8.39 ms ± 45.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14418

Reviewed By: weiyangfb

Differential Revision: D13226950

Pulled By: soumith

fbshipit-source-id: 19eb4b855db4c8f891d0925a9c56ae8a2824bb23
2018-11-28 06:13:06 -08:00
be7c618fd7 torch.sparse.sum() (#12430)
Summary:
- to fix #12241
- add `_sparse_sum()` to ATen, exposed as `torch.sparse.sum()`; `SparseTensor.sum()` is not supported currently
- this PR depends on #11253, and will need to be updated once it lands
- [x] implement forward
- [x] implement backward
- performance [benchmark script](https://gist.github.com/weiyangfb/f4c55c88b6092ef8f7e348f6b9ad8946#file-sparse_sum_benchmark-py):
  - summing all dims is fastest for sparse tensors
  - when the input is sparse enough (nnz = 0.1%), the sparse sum is faster than dense on CPU, but not necessarily on CUDA
  - CUDA backward is comparable (<2x) between `sum several dims` and `sum all dims` in sparse
  - CPU backward uses binary search and is still slow in sparse; `sum [0, 2, 3] dims` takes `5x` the time of `sum all dims`
    - optimize CUDA backward for now
      - using thrust for sort and binary search, but runtime not improved
  - both CPU and CUDA forward are slow in sparse (`sum several dims` vs `sum all dims`), at most `20x` slower on CPU and `10x` on CUDA
    - improve CPU and CUDA forward kernels

(nnz, sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 8.77 µs vs 72.9 µs | 42.5 µs vs 108 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 112 µs vs 4.47 ms | 484 µs vs 407 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 141 µs vs 148 µs | 647 µs vs 231 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 235 µs vs 1.23 ms | 781 µs vs 213 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 48.5 µs vs 360 µs | 160 µs vs 2.03 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 258 µs vs 1.22 ms | 798 µs vs 224 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 204 µs vs 882 µs | 443 µs vs 133 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 709 µs vs 1.15 ms | 893 µs vs 202 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 39.8 µs vs 81 µs | 42.4 µs vs 113 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 747 µs vs 4.7 ms | 2.4 ms vs 414 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 1.04 ms vs 126 µs | 5.03 ms vs 231 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.12 ms vs 1.24 ms | 5.99 ms vs 213 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 133 µs vs 366 µs | 463 µs vs 2.03 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.56 ms vs 1.22 ms | 6.11 ms vs 229 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.53 ms vs 799 µs | 824 µs vs 134 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 5.15 ms vs 1.09 ms | 7.02 ms vs 205 µs

- after improving CPU and CUDA forward kernels
  - in the `(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD)` forward, CPU takes ~~`171 µs`~~, of which `130 µs` is spent in `coalesce()`; for CUDA, the total time is ~~`331 µs`~~, of which `141 µs` is spent in `coalesce()`. We need to reduce the time spent outside `coalesce()`.
  - after a few simple tweaks, the forward is now at most `10x` slower on CPU and `7x` on CUDA. The time taken by `sum dense dims only [2, 3]` is `~2x` that of `sum all dims`. The speed of `sum all sparse dims [0, 1]` is on par with `sum all dims`

(nnz,   sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 7 µs vs 69.5 µs | 31.5 µs vs 61.6 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 11.3 µs vs 4.72 ms | 35.2 µs vs 285 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 197 µs vs 124 µs | 857 µs vs 134 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 124 µs vs 833 µs | 796 µs vs 106 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 20.5 µs vs 213 µs | 39.4 µs vs 1.24 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 131 µs vs 830 µs | 881 µs vs 132 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 95.8 µs vs 409 µs | 246 µs vs 87.2 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 624 µs vs 820 µs | 953 µs vs 124 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 45.3 µs vs 72.9 µs | 33.9 µs vs 57.2 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 81.4 µs vs 4.49 ms | 39.7 µs vs 280 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 984 µs vs 111 µs | 6.41 ms vs 121 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.45 ms vs 828 µs | 6.77 ms vs 113 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 74.9 µs vs 209 µs | 37.7 µs vs 1.23 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.48 ms vs 845 µs | 6.96 ms vs 132 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.14 ms vs 411 µs | 252 µs vs 87.8 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 4.53 ms vs 851 µs | 7.12 ms vs 128 µs

- the time taken by the CUDA backward of sparse is very long, with large variance (with nnz=10000 it normally takes 6-7 ms). To improve the backward of sparse ops, we will need to debug in places other than the CUDA kernels. Here is a benchmark of `torch.copy_()`:
```
>>> d = [1000, 1000, 2, 2]
>>> nnz = 10000
>>> I = torch.cat([torch.randint(0, d[0], size=(nnz,)),
               torch.randint(0, d[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, d[2], d[3])
>>> size = torch.Size(d)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce().cuda()
>>> S2 = torch.sparse_coo_tensor(I, V, size).coalesce().cuda().requires_grad_()
>>> data = S2.clone()
>>> S.copy_(S2)
>>> y = S * 2
>>> torch.cuda.synchronize()
>>> %timeit y.backward(data, retain_graph=True); torch.cuda.synchronize()
7.07 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12430

Differential Revision: D12878313

Pulled By: weiyangfb

fbshipit-source-id: e16dc7681ba41fdabf4838cf05e491ca9108c6fe
2018-11-28 02:19:12 -08:00
a2fcd4dee5 Ensure FP16 rowwise Adagrad can be run
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12317

Reviewed By: hyuen

Differential Revision: D10190778

fbshipit-source-id: 720a9aaa4e6b1736023d8c6326a613e4ea592b31
2018-11-28 02:15:36 -08:00
e8754ee017 use fbgemm's im2col fusion and thread partitioning (#14350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14350

acc32 for now. Will have a separate diff for acc16, but that will need another out processing that does sparse convolution without im2col.

Reviewed By: dskhudia

Differential Revision: D13188595

fbshipit-source-id: e8faee46c7ea43e4a600aecb8b8e93e6c860a8c8
2018-11-28 01:13:11 -08:00
a38ed0268e PT1 Stable Release Distributed Documentation (#14444)
Summary:
The doc covers pretty much everything we have on distributed for the PT1 stable release, tracked in https://github.com/pytorch/pytorch/issues/14080

Tested by previewing the sphinx generated webpages. All look good.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14444

Differential Revision: D13227675

Pulled By: teng-li

fbshipit-source-id: 752f00df096af38dd36e4a337ea2120ffea79f86
2018-11-28 00:34:11 -08:00
3d98810fbd Revert D13192230: [pytorch][PR] [jit] Use nn module tests in test_jit
Differential Revision:
D13192230

Original commit changeset: 36488960b6c9

fbshipit-source-id: 63b68bd909b9ef0548f52c986c84f549aecb8909
2018-11-28 00:23:09 -08:00
7d07fcd215 Fixed SyncParam/QueueReduction/SyncReduction test for 2+ GPUs (#14452)
Summary:
Fixed: https://github.com/pytorch/pytorch/issues/14445

Also bumped up timeout to 30 seconds, since on 8-GPU machines, DDP test will take more than 15 seconds sometimes.

Tested on 8 GPU machines:
```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py --verbose
test_dist_broadcast_coalesced_gloo (__main__.DistributedDataParallelTest) ... ok
test_dist_broadcast_coalesced_nccl (__main__.DistributedDataParallelTest) ... skipped 'Test skipped due to known issues'
test_fp16 (__main__.DistributedDataParallelTest) ... ok
test_gloo_backend (__main__.DistributedDataParallelTest) ... ok
test_nccl_backend (__main__.DistributedDataParallelTest) ... ok
test_queue_reduction (__main__.DistributedDataParallelTest) ... ok
test_sync_params_no_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_params_with_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_reduction (__main__.DistributedDataParallelTest) ... ok
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allgather_basics (__main__.ProcessGroupGlooTest) ... ok
test_allgather_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_checks (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_gather_basics (__main__.ProcessGroupGlooTest) ... ok
test_gather_checks (__main__.ProcessGroupGlooTest) ... ok
test_reduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_reduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_scatter_basics (__main__.ProcessGroupGlooTest) ... ok
test_scatter_checks (__main__.ProcessGroupGlooTest) ... ok
test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... ok
test_timeout_kwarg (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_barrier (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousEnvTest) ... ok
test_nominal (__main__.RendezvousEnvTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_address_already_in_use (__main__.TCPStoreTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok

----------------------------------------------------------------------
Ran 46 tests in 162.980s

OK (skipped=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14452

Differential Revision: D13230652

Pulled By: teng-li

fbshipit-source-id: 88580fe55b3a4fbc7a499ca3b591958f11623bf8
2018-11-27 21:58:34 -08:00
4cdcbbf410 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13192230

Pulled By: driazati

fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
2018-11-27 21:19:51 -08:00
a0def0b57e check for invalid ranges in torch.arange
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13915

Differential Revision: D13222110

Pulled By: nairbv

fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
2018-11-27 20:38:56 -08:00
b08a186153 roll along multiple dimensions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13874

Differential Revision: D13223669

Pulled By: nairbv

fbshipit-source-id: 1678d52529c326fa4a0614d0994b1820ad12bc04
2018-11-27 20:32:30 -08:00
662f66ebb9 Add poisson_nll_loss to script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14420

Differential Revision: D13220726

Pulled By: driazati

fbshipit-source-id: 6c08a0050075beafcc8ba413c9603b273870c70c
2018-11-27 19:39:16 -08:00
d75f751bec Add boolean dispatch for function overloading (#14425)
Summary:
This PR makes it possible to overload functions based on the value of a parameter (so long as it is a constant). See max_pool1d for an example usage.

This is the first step in enabling the use of max_pool functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
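
For intuition, here is a minimal plain-Python sketch of the dispatch idea (illustrative only: the real mechanism lives in the JIT and requires the flag to be a compile-time constant; `boolean_dispatch` and the helpers below are hypothetical stand-ins):

```python
def boolean_dispatch(arg_name, if_true, if_false):
    # Forward to one of two implementations based on a boolean keyword arg.
    def dispatched(*args, **kwargs):
        flag = kwargs.pop(arg_name, False)
        return (if_true if flag else if_false)(*args, **kwargs)
    return dispatched

def _max_with_indices(xs):
    m = max(xs)
    return m, xs.index(m)   # stands in for the Tuple[Tensor, Tensor] overload

def _max_only(xs):
    return max(xs)          # stands in for the Tensor overload

find_max = boolean_dispatch("return_indices",
                            if_true=_max_with_indices,
                            if_false=_max_only)

print(find_max([3, 1, 2]))                        # 3
print(find_max([3, 1, 2], return_indices=True))   # (3, 0)
```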

Fixes #14081
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14425

Differential Revision: D13222104

Pulled By: driazati

fbshipit-source-id: 8cb676b8b13ebcec3262234698edf4a7d7dcbbe1
2018-11-27 19:36:47 -08:00
23f901a737 fix enable_cpu_fuser
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14440

Differential Revision: D13226354

Pulled By: zdevito

fbshipit-source-id: e4ed023eece8b5b670a4a27d24a8688907b36b90
2018-11-27 19:14:10 -08:00
82175f31b4 Move Affine grid to C++ (#14392)
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392

Differential Revision: D13219698

Pulled By: eellison

fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
2018-11-27 18:38:11 -08:00
6f2307ba6a Allow building libraries with setuptools that dont have abi suffix (#14130)
Summary:
When using `setuptools` to build a Python extension, setuptools will automatically add an ABI suffix like `cpython-37m-x86_64-linux-gnu` to the shared library name when using Python 3. This is required for extensions meant to be imported as Python modules. When we use setuptools to build shared libraries not meant as Python modules, for example libraries that define and register TorchScript custom ops, having your library called `my_ops.cpython-37m-x86_64-linux-gnu.so` is a bit annoying compared to just `my_ops.so`, especially since you have to reference the library name when loading it with `torch.ops.load_library` in Python.

This PR fixes this by adding a `with_options` class method to the `torch.utils.cpp_extension.BuildExtension` which allows configuring the `BuildExtension`. In this case, the first option we add is `no_python_abi_suffix`, which we then use in `get_ext_filename` (override from `setuptools.build_ext`) to throw away the ABI suffix.
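
A sketch of what a `setup.py` using the new option might look like (`my_ops.cpp` is a placeholder source file):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ops",
    ext_modules=[CppExtension("my_ops", ["my_ops.cpp"])],
    # Throw away the cpython-37m-x86_64-linux-gnu style suffix so the
    # output is a plain my_ops.so.
    cmdclass={"build_ext": BuildExtension.with_options(no_python_abi_suffix=True)},
)
```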

I've added a test `setup.py` in a `no_python_abi_suffix_test` folder.

Fixes https://github.com/pytorch/pytorch/issues/14188

t-vi fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14130

Differential Revision: D13216575

Pulled By: goldsborough

fbshipit-source-id: 67dc345c1278a1a4ee4ca907d848bc1fb4956cfa
2018-11-27 17:35:53 -08:00
23d111c87f Fix clang tidy errors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14427

Differential Revision: D13222381

Pulled By: wanchaol

fbshipit-source-id: d90d210a810e95bf0eb404f9c1c304f4e6a3f61e
2018-11-27 17:30:50 -08:00
226a01e5a1 Handling of pretty-printing methods (#14378)
Summary:
Stacked on #14176, review only the last commit.
* Print parameters to methods as self.weight rather than as extra inputs.
* Print entire set of methods out as a single string
* Update test code to test the module-at-a-time export/import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14378

Differential Revision: D13198463

Pulled By: zdevito

fbshipit-source-id: 3fab02e8239cfd6f40d6ab6399047bd02cf0a8c8
2018-11-27 17:10:23 -08:00
75bac5ab32 Eliminate necessity of HIPify on AccumulateType.h (#14412)
Summary:
I'd like to NOT HIPify files that are not in a cuda/
directory, so hand-HIPify AccumulateType.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14412

Differential Revision: D13221801

Pulled By: ezyang

fbshipit-source-id: d1927cfc956e50a6a5e67168ac0e1ce56ecd1e0b
2018-11-27 16:39:55 -08:00
1620161d6b when BUILD_CAFFE2_OPS is OFF, torch-python needs a direct dep on nccl (#14430)
Summary:
https://github.com/pytorch/pytorch/issues/14431 tracks supporting this with CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14430

Differential Revision: D13224079

Pulled By: anderspapitto

fbshipit-source-id: 47d7900d25910ed61585b93f9003acd1b2630a9f
2018-11-27 15:53:31 -08:00
006505bb8f Speed-up "advanced" indexing operations (#13420)
Summary:
This speeds up "advanced" indexing (indexing a tensor by a tensor)
on CPU and GPU. There's still a bunch of work to do, including
speeding up indexing by a byte (boolean) mask and speeding up the derivative
calculation for advanced indexing.

Here's some speed comparisons to indexing on master using a little [benchmark script](https://gist.github.com/colesbury/c369db72aad594e5e032c8fda557d909) with 16 OpenMP threads and on a P100. The test cases are listed as (input shape -> output shape).

| Test case             | CPU (old vs. new)   | CUDA (old vs. new)     |
|-----------------------|---------------------|------------------------|
| 1024x1024 -> 512x1024 | 225 us vs. **57 us**  | 297 us vs. **47 us** |
| 1024x1024 -> 1024x512 | 208 us vs. **153 us** | 335 us vs. **54 us** |
| 50x50 -> 20000x50     | 617 us vs. **77 us**  | 239 us vs. **54 us** |
| 50x50 -> 50x20000     | 575 us vs. **236 us** | 262 us vs. **58 us** |
| 2x5x10 -> 10          | 65 us  vs. **18 us**  | 612 us vs. **93 us** |

See #11647
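
For concreteness, the first test case in the table corresponds to a pattern like this (illustrative sketch):

```python
import torch

src = torch.randn(1024, 1024)
rows = torch.randint(0, 1024, (512,))
out = src[rows]        # the 1024x1024 -> 512x1024 case
print(out.shape)       # torch.Size([512, 1024])
```
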
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13420

Reviewed By: soumith

Differential Revision: D13088936

Pulled By: colesbury

fbshipit-source-id: 0a5c2ee9aa54e15f96d06692d1694c3b24b924e2
2018-11-27 15:23:59 -08:00
0199d59d3a Resubmit: Set the correct engine name for position weighted pooling when fp16 is used for training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13768

Reviewed By: xianjiec

Differential Revision: D12996103

fbshipit-source-id: 5ca4cda4210f68ece2b5d6eced8cf52ee91fb36f
2018-11-27 14:51:56 -08:00
ae1b37650c Windows local build: restore original working dir after activating VC environment (#14416)
Summary:
`call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x64` seems to change the working dir to `C:\Users\Administrator\source`, and we need to cd back to the PyTorch directory before running `git submodule update --init --recursive`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14416

Differential Revision: D13222269

Pulled By: yf225

fbshipit-source-id: a0eb3311fb11713b1bb8f52cd13e2c21d5ca9c7b
2018-11-27 14:18:45 -08:00
5c84145354 condition blob in while_op test changes data type (#14279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14279

att

Reviewed By: smessmer

Differential Revision: D13144472

fbshipit-source-id: af4d920a3148c648d1a428a5bcd56da19ea8c38c
2018-11-27 14:16:39 -08:00
ba6c49cb9c Add test of ONNX_ATEN (#14259)
Summary:
In #14239 we fixed ONNX_ATEN.
In order to make sure it remains correct in the future, we should add a related test case.
We use torch.fmod() to test ONNX_ATEN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14259

Differential Revision: D13204610

Pulled By: zrphercule

fbshipit-source-id: e4660c346e5edd201f1458b7d74d7dfac49b94c7
2018-11-27 13:51:51 -08:00
e392d428b1 Allowing TaskGroups to carry remote nets (#14342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14342

Sometimes, when we are creating a TaskGroup, we are in fact creating a TaskGroup for a distributed job. In some cases, we may want to register a few nets as "remote" to a TaskGroup. The remote nets should carry sufficient attributes describing where they should be executed later on.

This diff adds the remote net attribute to the TaskGroup class. It exposes two minimal functionalities: adding a remote net, and getting all remote nets added to a TaskGroup.

Reviewed By: d4l3k

Differential Revision: D13188320

fbshipit-source-id: efe947aec30817e9512a5e18be985713b9356bdc
2018-11-27 13:34:11 -08:00
b7856a32f6 Add scaffolding for HIP backend in ATen/core. (#14285)
Summary:
This code doesn't actually do anything, but it will be the
groundwork necessary to change PyTorch's HIPIFY pass from reusing
CUDA identifiers directly, to actually switching to using HIP
identifiers (moving us closer to a world where we can compile
both HIP and CUDA PyTorch side-by-side.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14285

Differential Revision: D13158851

Pulled By: ezyang

fbshipit-source-id: df2462daa5d0d4112455b67bd3067d60ba55cda5
2018-11-27 13:21:42 -08:00
1b93cb7631 Document device_guard in native_functions.yaml (#14235)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14235

Differential Revision: D13145780

Pulled By: ezyang

fbshipit-source-id: 0e93bf009ad492551bcdcada0357f2fef529e67d
2018-11-27 13:17:23 -08:00
1b80644b4d Revert D13192228: [pytorch][PR] [jit] Add boolean dispatch for function overloading
Differential Revision:
D13192228

Original commit changeset: fce33c400c1f

fbshipit-source-id: 75c9991dc7097f9513c6c89d16eff2de6e287c3b
2018-11-27 13:14:42 -08:00
f9c27d60c3 Remove fake dependencies from TensorImpl to caffe2 (#14141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14141

These includes weren't actually used, let's remove them.

Reviewed By: ezyang

Differential Revision: D13113129

fbshipit-source-id: 816995e280b81bf99002772ea8aea458bdfcd2c7
2018-11-27 12:59:56 -08:00
3257ac1ff3 Fix include paths for TensorTypeId.h and TensorTypeIdRegistration.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14070

Reviewed By: ezyang

Differential Revision: D13081610

fbshipit-source-id: 685994a15a2cd15e9e5447cf77671343de5dd278
2018-11-27 12:59:54 -08:00
ed10ef97da Move TensorTypeId to c10/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14327

Reviewed By: ezyang

Differential Revision: D13131338

fbshipit-source-id: c4682cb6ed6fe4cd1636e09d918eef6e90c836f1
2018-11-27 12:59:52 -08:00
6c2e816268 Fix include paths for Storage.h and StorageImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14062

Reviewed By: ezyang

Differential Revision: D13081603

fbshipit-source-id: c272b715ef2f513d21d1c3f34fbf79eec6946441
2018-11-27 12:59:50 -08:00
3d4d09fe06 Move Storage and StorageImpl to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14061

Reviewed By: ezyang

Differential Revision: D13081608

fbshipit-source-id: 1ea2d32e9ec9293b6ffa4b9e76c674cca55d5a1c
2018-11-27 12:59:48 -08:00
507ed9032e Fix include paths for Allocator.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14060

Reviewed By: ezyang

Differential Revision: D13081605

fbshipit-source-id: 02f23af174c0f0c38fb0163c2dfef3873ff5635d
2018-11-27 12:59:46 -08:00
3a71d5ee49 Move Allocator.h to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14059

Reviewed By: ezyang

Differential Revision: D13081606

fbshipit-source-id: d6ad59ad4e3d363268cd4307b6c999a168681246
2018-11-27 12:59:44 -08:00
0b10f147b6 Move UniqueVoidPtr to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14058

Reviewed By: dzhulgakov

Differential Revision: D13081602

fbshipit-source-id: e91ccf9fba9a7a02f99ed90b7a3a0fe7afd56832
2018-11-27 12:59:42 -08:00
8b1ca2810b Move ScalarTypeUtils.h to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14024

Reviewed By: ezyang

Differential Revision: D13081604

fbshipit-source-id: d7a09610f64eb2e9dd831bbb3c85f20691251594
2018-11-27 12:59:40 -08:00
44e21cf5bb Fix include paths for Scalar.h and ScalarType.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14023

Reviewed By: ezyang

Differential Revision: D13081609

fbshipit-source-id: c27eeafa381b39e043f0261ea7f6f634ee8bc238
2018-11-27 12:59:38 -08:00
50e9c56830 Move Scalar and ScalarType to c10/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14022

Reviewed By: ezyang

Differential Revision: D13015236

fbshipit-source-id: 92aac4e342d85f75a31837b2943fa5b80f0c35c9
2018-11-27 12:59:36 -08:00
3fca4bde50 Trace in-place ops (#14254)
Summary:
This PR adds a `try_outplace` option to the tracer. When `try_outplace` is true, the tracer will attempt to emit out-of-place versions of ops (similar to how things are done today). When it's false, the correct in-place op is emitted.

I made `try_outplace` false by default, but flipped it to true for ONNX export utils. zdevito jamesr66a, anywhere else I should preserve the existing behavior?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14254

Reviewed By: eellison

Differential Revision: D13166691

Pulled By: suo

fbshipit-source-id: ce39fdf73ac39811c55100e567466d53108e856b
2018-11-27 12:40:56 -08:00
ffbc3905a1 Fixed torch.multiprocessing.spawn for not being able to spawn like dataloader workers (#14391)
Summary:
Should fix: https://github.com/pytorch/pytorch/issues/14390

Now the ImageNet example works fine with multiprocessing and more than one dataloader worker
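
A minimal sketch of the now-working pattern (toy dataset, illustrative only):

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

def worker(rank):
    ds = TensorDataset(torch.arange(100, dtype=torch.float32).unsqueeze(1))
    # More than one dataloader worker inside a spawned process now works.
    loader = DataLoader(ds, batch_size=10, num_workers=2)
    for (batch,) in loader:
        pass
    print("rank", rank, "done")

if __name__ == "__main__":
    mp.spawn(worker, nprocs=2)
```
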
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14391

Reviewed By: calebho

Differential Revision: D13209800

Pulled By: teng-li

fbshipit-source-id: e8abc0fb38d4436cf3474dcbba0e28f4290e4d29
2018-11-27 12:37:41 -08:00
5fefb29a53 Tensor construction: combine Resize+mutable_data - 4/4 (#13856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13856

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007310

fbshipit-source-id: 941f064ef8934bb17fbfb706e6ed3db173b5d268
2018-11-27 12:34:25 -08:00
e22cc7c072 Print default values and introduce ir view classes (#14176)
Summary:
[Stacked commit, only review the last commit]

This PR adds support for printing default values in python printing as well as the logic
for parsing default values back in using the parser. For simplicity, this PR simply
creates a subgraph of the constant expressions and then runs that graph to generate the defaults.
A more lightweight approach should be possible later, but would require more machinery.

To make reading code in the printer easier, this also add ir_views.h.
Similar to tree_views.h these classes can provide views of some commonly used IR nodes
that have complicated structure and common operations on that structure.

Currently it has only read-only views for prim::If and prim::Loop,
but we should eventually add helpers to manipulate If/Loop nodes as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14176

Differential Revision: D13198455

Pulled By: zdevito

fbshipit-source-id: dc99ab9692804ccaedb60a55040c0b89ac7a6a6d
2018-11-27 11:48:27 -08:00
8408dff55a Add Type support to the fuser, fuse more (#14336)
Summary:
This adds scalar type support to the fuser, both internally (instead of auto / assuming float) and for the inputs/outputs.
We can now fuse things with input / output of arbitrary scalar type; in particular, comparisons and `where` work well. This fixes #13384 by returning a tensor of the right type (and adds a test where byte and double tensors are returned).
The type inference is done by re-calling PropagateTensorShapeOnNode during compilation; I would venture that it isn't prohibitively expensive compared to the actual compilation. (Propagation was fixed for `where` to return the second argument's type and amended to handle FusedConcat.)
I'm not sure how to add a check for the code generated by the fuser, but I am not sure we absolutely need to (we'd see if it is invalid / produces wrong results).

Thanks in particular to apaszke, fmassa, mruberry for advice and encouragement! All the errors are my own.

I have discussed order of PRs briefly with mruberry, if this goes in before he submits the PR, he graciously agreed to rebasing his, but I'd happily rebase, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14336

Differential Revision: D13202620

Pulled By: soumith

fbshipit-source-id: 855159e261fa15f21aca3053bfc05fb3f720a8ef
2018-11-27 11:33:11 -08:00
bd629481fb Updating submodules
Reviewed By: yns88

fbshipit-source-id: e63160e97550942931bacaa860d91d591d2e1712
2018-11-27 11:23:32 -08:00
66c8bbf021 Add boolean dispatch for function overloading (#14081)
Summary:
This PR makes it possible to overload functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.

This is the first step in enabling the use of `max_pool` functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.

Depends on #14232 for `Optional[BroadcastingList[T]]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14081

Differential Revision: D13192228

Pulled By: driazati

fbshipit-source-id: fce33c400c1fd06e59747d98507c5fdcd8d4c113
2018-11-27 10:51:32 -08:00
2cc35c161a Barrier synchronizes with prior work before completing (#14386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14386

See #13573, #14142, and #14271 for discussion.

This change updates ProcessGroupGloo to ensure that all prior
operations have completed before executing the barrier.

Reviewed By: manojkris

Differential Revision: D13205022

fbshipit-source-id: 673e7e6ca357dc843874d6dd8da590832e1de7fa
2018-11-27 10:46:42 -08:00
9598d380b0 Make ProcessGroup::Work::wait() throw (#14298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14298

This is a breaking API change for users of the C++ c10d API. The work
object defined wait() to return a boolean. If the work completed
successfully it would return true, if it didn't it would return false.
It was then up to the user to call the exception() function to figure
out what went wrong. This has proven suboptimal as it allows users to
forget about failure handling, and errors may be ignored.

The work class is semantically very similar to std::future, where a
call to get() may throw if the underlying std::promise has set an
exception. This commit changes the semantic of the work class to be
similar to this and turns wait() into a void function that throws if
the work completes with an exception.

The exception() function can still be used to retrieve the exception
if isSuccess() returns false, but now returns an std::exception_ptr
instead of a reference to a std::exception.
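
From the Python side the new semantics look roughly like this (a hedged sketch: the single-process gloo group and the file-based init method are placeholders for illustration):

```python
import torch
import torch.distributed as dist

dist.init_process_group("gloo", init_method="file:///tmp/c10d_rendezvous",
                        rank=0, world_size=1)

tensor = torch.ones(4)
work = dist.all_reduce(tensor, async_op=True)
try:
    work.wait()   # returns nothing on success; raises if the op failed
except RuntimeError as exc:
    print("collective failed:", exc)
```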

Reviewed By: manojkris

Differential Revision: D13158475

fbshipit-source-id: 9cd8569b9e7cbddc867a5f34c6fd0b7be85581b8
2018-11-27 10:46:40 -08:00
03864b7b11 Add option structs and timeout field (#14297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14297

Adds option structs for allgather and barrier such that we have one
for every collective, and adds a timeout member field to every one of
these so that we can support per-operation timeouts.

Use default constructed options struct for every collective process
group function exposed to Python.

Reviewed By: manojkris

Differential Revision: D13158474

fbshipit-source-id: 3d28977de2f2bd6fc2f42ba3108b63a429338906
2018-11-27 10:46:38 -08:00
52f50220d9 Refer to all work with ProcessGroup prefix (#14296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14296

There was mixed usage of "ProcessGroup::Work" and just "Work".
Adding prefix for readability/consistency.

Reviewed By: manojkris

Differential Revision: D13128977

fbshipit-source-id: a54a8784fa91cd6023c723cb83e9f626fb896a30
2018-11-27 10:46:36 -08:00
5865561a9a Remove algorithm caching in ProcessGroupGloo (#14295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14295

This is no longer used after moving to Gloo new style algorithms.

Closes #11912.

Reviewed By: manojkris

Differential Revision: D13111781

fbshipit-source-id: 53e347080e29d847cd9da36f2d93af047930690c
2018-11-27 10:46:34 -08:00
936c2bba23 Use new style barrier support in c10d/gloo (#14294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14294

This is the final collective to be ported to the new style where there
is no longer a need to keep a cached algorithm instance around. There
is a follow up change incoming to remove the algorithm caching
functionality in ProcessGroupGloo.

Reviewed By: manojkris

Differential Revision: D13111509

fbshipit-source-id: f3ea0d955a62029fc4e7cfc09055e4957e0943ac
2018-11-27 10:46:32 -08:00
50bc9dc9c3 fix doc for sparse.addmm (#14403)
Summary:
- fixing the doc issue in sparse.addmm

================ before change ==================
![image](https://user-images.githubusercontent.com/38509346/49063994-2f10fe80-f1ce-11e8-9ccc-54241bc45f0b.png)
![image](https://user-images.githubusercontent.com/38509346/49064064-641d5100-f1ce-11e8-865a-7227be7156ef.png)

================ post change ==================
![image](https://user-images.githubusercontent.com/38509346/49064078-76978a80-f1ce-11e8-8f38-f1f8ac9ce63b.png)
![image](https://user-images.githubusercontent.com/38509346/49064085-7bf4d500-f1ce-11e8-8a0d-bf9e5460d21f.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14403

Differential Revision: D13216582

Pulled By: weiyangfb

fbshipit-source-id: 52e0a20c6b341c37cfb31f281be3afe2a52ca532
2018-11-27 10:24:18 -08:00
a3cfab2d63 per-group and per-channel quantization (#14340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14340

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/25

Per-group and per-channel quantization in fbgemm
This diff also cleans up explicit template instantiation using macro expansion
This diff also changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating-point vectors.

Using this in DNNLOWP operators will be done in a separate diff.

Reviewed By: dskhudia

Differential Revision: D13176386

fbshipit-source-id: e46c53e31e21520bded71b8ed86e8b19e010e2dd
2018-11-27 10:17:34 -08:00
49fe678fec Add variable_factories.h to cppdocs (#14381)
Summary:
This will document `torch::from_blob` and such.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14381

Differential Revision: D13216560

Pulled By: goldsborough

fbshipit-source-id: 112f60e45e4d38a8a9983fa71e9cc56bc1a73465
2018-11-27 10:13:23 -08:00
c19af59a6e Use integer math to compute output size of pooling operations (#14405)
Summary:
As reported in #13386, the pooling operations can return wrong results for large inputs. The root of the problem is that while the output shape is initially being computed with integer operations, it is converted to float32 for division by the stride and applying either a `ceil` or a `floor` depending on the `ceil_mode`. Since even moderately large integers (the smallest being 16,777,217) cannot be expressed exactly in float32, this leads to wrong result shapes.
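
A small Python sketch of the idea (simplified: it uses the standard pooling shape formula and omits the additional ceil_mode clamp that keeps the last window inside the input):

```python
import struct

def pooled_size_int(input_size, kernel, stride, pad, ceil_mode):
    # Pure integer arithmetic; ceil division is (a + b - 1) // b for a >= 0.
    numer = input_size + 2 * pad - kernel
    if ceil_mode:
        return (numer + stride - 1) // stride + 1
    return numer // stride + 1

# 16,777,217 is the smallest positive integer float32 cannot represent:
n = 16_777_217
as_f32 = struct.unpack("f", struct.pack("f", float(n)))[0]
print(as_f32)                                                  # 16777216.0
print(pooled_size_int(n, kernel=3, stride=2, pad=1, ceil_mode=False))  # 8388609
```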

This PR relies purely on integer operations to perform the shape computation, including the ceil/floor distinction. Since I could not stand all that duplicated code, I pulled it out into a `pooling_shape.h` header, similar to the existing `linear_upsampling.h` header. I hope this is acceptable, let me know if you'd like to see it solved differently. I've also added tests to `test_nn.py` that fail without my changes and pass with my changes. They cover `{max,avg}_pool{1,2,3}d()` for CPU and GPU.

Fixes #13386.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14405

Differential Revision: D13215260

Pulled By: soumith

fbshipit-source-id: 802588ce6cba8db6c346448c3b3c0dac14d12b2d
2018-11-27 09:38:06 -08:00
c5cc1e3ab2 Delete legacy THCStream (long live THCStream). (#14246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14246

This commit systematically eliminates THCStream entirely from THC, replacing it
with at::cuda::CUDAStream.  In places where the previous pointer type showed up
in a public API signature, those functions are now only available to C++
clients.  (It would not be too difficult to make a C-compatible version of
CUDAStream, as it's really just a simple struct, but we leave this for
future work.)

All functions in THC that referred to THCStream were expunged in favor of their
modern counterparts.

One annoyance was that I didn't feel like redoing how the torch.cuda.Stream
binding code worked, but I really wanted to get rid of the stored THCStream*
pointer.  So I repurposed the bit-packing code I implemented for Stream hashing,
and used that to (reversibly) store streams in a uint64_t cdata field.  A perhaps
more future-proof solution would be to get rid of cdata entirely, and store the
device and stream ID directly.

Billing of changes:
- All CUDAStream_ pointer API functions are now hidden and anonymously
  namespaced (instead of being in the impl namespace).  All use sites
  rewritten to use the modern C++ API.  Since CUDAStreamInternals is no
  longer part of the public API, the CUDAStreamInternals constructor and
  internals() method have been removed, and replaced with anonymous
  functions in the C++ file.
- device_index() returns DeviceIndex rather than int64_t now
- Stream and CUDAStream now have pack/unpack methods.  (CUDAStream checks
  that the unpacked bit-pattern is for a CUDA device.)
- THCStream.h header is removed entirely
- Most THCStream handling functions in THC API are removed

Reviewed By: gchanan

Differential Revision: D13121531

fbshipit-source-id: 48873262cc0a37c3eec75a7ba1c93c800da40222
2018-11-27 08:32:09 -08:00
388258fb5e Add hash functions for Stream, CUDAStream; fix Device hash function (#14191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14191

Previously, Device's hash function only worked for CPU and CUDA.  Now
it works for everything.

Implementing the bit concatenation was a bit tricky, and I got it wrong the
first time. See Note [Hazard when concatenating signed integers]

Reviewed By: smessmer

Differential Revision: D13119624

fbshipit-source-id: 36bfa139cfc739bb0624f52aaf466438c2428207
2018-11-27 08:32:08 -08:00
3ff70712c2 Implement NaN-propagating max/min on Vec256.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13399

Differential Revision: D13199957

Pulled By: resistor

fbshipit-source-id: 1565e079b13c5d4f42f2033830a7c997b7d824bc
2018-11-26 22:46:20 -08:00
a0ef8afd7e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 210f7eec65bea5e31817fb56dec27b0ab8af797a
2018-11-26 19:38:00 -08:00
f019a2d9b3 Remove unused executors, part 3 (#14199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14199

Remove legacy code for dag, async_dag

Reviewed By: salexspb

Differential Revision: D13019102

fbshipit-source-id: ff07e45304d9af4be0375215f4b642c4b0edb12d
2018-11-26 19:10:43 -08:00
7953b32dc4 Remove unused executors, part 2 (#14115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14115

Remove legacy implementation of prof_dag

Reviewed By: salexspb

Differential Revision: D13019096

fbshipit-source-id: 4f2bf676444d84eaa2cc1effcc3ebdc764e0a016
2018-11-26 19:10:42 -08:00
34239006b0 Remove unused executors, part 1 (#14117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14117

Removing unused legacy executors (htrace)

Reviewed By: salexspb

Differential Revision: D13019078

fbshipit-source-id: 19d0ed1b47a22cc17c27fdd15d748ced54806132
2018-11-26 19:10:40 -08:00
507cb16583 Delete OPENMP_STUB translation. (#14286)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14286

Differential Revision: D13205356

Pulled By: ezyang

fbshipit-source-id: 08e9821e4b32f8d7f3c41906e481f280ee6cf2e3
2018-11-26 19:08:07 -08:00
12558019a8 backward for sparse.addmm(D, S, D, alpha, beta) -> D (#13345)
Summary:
- introduce `sparse.addmm()` with backward for sparse matrix input for https://github.com/pytorch/pytorch/issues/12308
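
A quick usage sketch of the new function (random data for illustration):

```python
import torch

D1 = torch.randn(3, 4)                      # dense
S = torch.randn(3, 3).to_sparse()           # sparse matrix input
D2 = torch.randn(3, 4, requires_grad=True)  # dense, wants gradients
out = torch.sparse.addmm(D1, S, D2)         # beta * D1 + alpha * (S @ D2)
out.sum().backward()
print(D2.grad.shape)                        # torch.Size([3, 4])
```
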
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13345

Differential Revision: D13094070

Pulled By: weiyangfb

fbshipit-source-id: 136c08c3ca9bafb20577b60dd43d31c3e5cd5461
2018-11-26 17:47:48 -08:00
9e1805d38e Switch Int8ChannelShuffle operator to QNNPACK (#14362)
Summary:
1.8-2.2X better performance on ARM devices
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14362

Reviewed By: jerryzh168

Differential Revision: D13192312

Pulled By: Maratyszcza

fbshipit-source-id: 0d3dff067e300c7d741c42615b61246cbf09a829
2018-11-26 17:43:32 -08:00
2d6f039766 Fixed file init_method write/read race (#14388)
Summary:
This should fix the race among multiple processes: https://github.com/pytorch/pytorch/issues/13750

Essentially, the reader tries to open the file and will error out if it doesn't exist. We factor in the timeout option of FileStore to apply a timeout both for creating the file (it should always be created unless something is wrong) and, more importantly, for waiting for the file to be created.

Tested on both NFS and local drive, the race disappears when 8 concurrent processes do distributed training.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14388

Differential Revision: D13207178

Pulled By: teng-li

fbshipit-source-id: d3d5d62c4c8f01c0522bf1653c8986155c54ff80
2018-11-26 17:09:35 -08:00
f639249d51 Fix dataloader iterator test (#14045)
Summary:
I noticed the test `DataLoaderTest.CanDereferenceIteratorMultipleTimes` doesn't test proper progression of the iterator. I also added a test for using `std::copy`.

Fixes https://github.com/pytorch/pytorch/issues/14276

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14045

Differential Revision: D13092187

Pulled By: goldsborough

fbshipit-source-id: 57698ec00fa7b914b159677a4ab38b6b25c2860b
2018-11-26 17:06:41 -08:00
6f3002a50e Fixed c10d test (#14389)
Summary:
Most likely a typo.

Tested on 8-GPU machine

```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py ProcessGroupNCCLTest.test_barrier
.
----------------------------------------------------------------------
Ran 1 test in 29.341s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14389

Differential Revision: D13207207

Pulled By: teng-li

fbshipit-source-id: aaffe14237076fe19d94e2fa4d9c093397f07bb9
2018-11-26 16:46:33 -08:00
1ca0ec7299 fix typo in torch.sum documentation (#14250)
Summary:
Notice that an extra colon was added to `:attr:`, so in https://pytorch.org/docs/stable/torch.html#torch.sum , `dim` shows up as ":attr::_dim_". This patch fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14250

Reviewed By: soumith

Differential Revision: D13146363

Pulled By: umanwizard

fbshipit-source-id: f7d03dcb0973aae248b56ab407ba8489f2b1fe36
2018-11-26 16:36:52 -08:00
cef23a4b1d More JIT type hierarchy refinement (#14127)
Summary:
JIT type system hierarchy refinement and refactors:

1. Make NumberType the base type of IntType and FloatType
2. Make single-type containers like OptionalType and FutureType share a SingleElementType base type
3. Some refactors to make it more robust, e.g. adding python_str() for some types so that we have a proper python_print serialization format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14127

Differential Revision: D13112657

Pulled By: wanchaol

fbshipit-source-id: 335c5b25977be2e0a462c7e4a6649c1b653ccb4f
2018-11-26 16:25:40 -08:00
afb2c0ce86 changing some rpath stuff (#14304)
Summary:
See if anything breaks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14304

Differential Revision: D13201418

Pulled By: pjh5

fbshipit-source-id: ac2101b61a23bda37329d4d923c3d9d120e718bf
2018-11-26 15:57:47 -08:00
b18063b39a Fix caffe2 => onnx exporter for ConvTranspose (#14143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14143

ConvTranspose has a per-operator attribute rename, which meant that the
global attribute rename for kernels => kernel_shape was not applied.
Changing the behavior so that the global renames always apply, but per-op
renames can override those for specific attributes.

Note: The python frontend path isn't actually used for ConvTranspose, but I
thought it would be good to make it consistent.

Reviewed By: yinghai

Differential Revision: D13113395

fbshipit-source-id: cd3f124b4b5c753a506d297138b7d002b51bfb38
2018-11-26 15:51:42 -08:00
5918de8e84 Revert D13166669: [pytorch][PR] Allow dataloader to accept a custom memory pinning function
Differential Revision:
D13166669

Original commit changeset: ca965f9841d4

fbshipit-source-id: 0836b4f50f73ba01c97491a719660f02e36f20ad
2018-11-26 14:55:04 -08:00
bb7fb7e45f remove CAFFE2_API from IdWrapper (#14044)
Summary:
it doesn't really make sense on a template class. Also it breaks if
you try to build in debug on Windows, so this will save someone some
frustration in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14044

Differential Revision: D13202960

Pulled By: anderspapitto

fbshipit-source-id: 617d78366993d5ecc2ba1f23bb90010f10df41f3
2018-11-26 14:08:56 -08:00
735cd06536 FeedTensor returns a Tensor (#14196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641

FeedTensor function used to take a pointer to Tensor and feed the content using Resize
and mutable_data, but since Tensor is a pointer now, we can just return a Tensor instead.

Reviewed By: dzhulgakov

Differential Revision: D13091163

fbshipit-source-id: 9abf2fd320baca76e050530c500dd29f8e2d0211
2018-11-26 13:05:44 -08:00
b13f91dbd9 Allow graph fuser to move chunks past multiple nodes. (#14055)
Summary:
Fixes #12290. Also speeds up JIT LSTM forward pass from 8.8ms to 7.8ms; previously, each JIT lstm cell used 2 fused kernels. Now, it only uses one fused kernel (which is how many kernels cudnn uses).

Explanation:

Let f, g, h be fusible ops.
```
x = f(v, w)
z = g(x, y)
a, b = chunk(z)
c = h(a, b)
```
becomes (before this PR):
```
x = f(v, w)
x', y' = broadcast_tensors([x, y])
ax, bx = chunk(x')
ay, by = chunk(y')
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
The graph fuser then puts g, g, and h into one FusionGroup and is unable
to move `x = f(v, w)` into the FusionGroup.

This PR lets the graph fuser move `x = f(v, w)` into the FusionGroup.
It does this by abstracting the broadcast_tensors + multiple chunk nodes
into one intermediate `prim::BroadcastingChunk[chunks, dim]` node.

A `BroadcastingChunk[chunks, dim](*inputs)` node is equivalent to:
- broadcasting all of *inputs
- chunk-ing each broadcasted input into `chunks` chunks along dim `dim`.

Abstracting the broadcasting chunk behavior away, it is now a lot easier
for the graph fuser to move (broadcast + chunk) past an operation. After
this PR, the above graph becomes:
```
x = f(v, w)
ax, bx, ay, by = BroadcastingChunk(x, y)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
Now, to move `x = f(v, w)` after the BroadcastingChunk, one just needs
to add f's operands to the BroadcastingChunk:
```
ay, by, av, bv, aw, bw = BroadcastingChunk(y, v, w)
ax = f(av, aw)
by = f(bv, bw)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```

cc apaszke mruberry zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14055

Differential Revision: D13159259

Pulled By: zou3519

fbshipit-source-id: 134e9e645c950384d9be6a06a883a10e17a73d7d
2018-11-26 12:31:49 -08:00
8cc5d54b66 Updating submodules
Reviewed By: yns88

fbshipit-source-id: b4d74bf58b5536a0de654dfe73d41b5e1126eec6
2018-11-26 12:21:09 -08:00
0d1f382e39 Removing Caffe2-specific conda infra
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11961

Differential Revision: D10045909

Pulled By: pjh5

fbshipit-source-id: e9c12124897ee586aeb8b6654b31e4b81687199a
2018-11-26 12:18:17 -08:00
2fa3c8327c fix tensor advanced indexing with assignment (#14311)
Summary:
Fix a mishandling of `foo[a] = b` when `a` was a tensor. We were assigning to a copy of `foo`, not a view of it.
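
The eager-mode semantics that scripted code must now match, as a tiny sketch:

```python
import torch

foo = torch.zeros(4)
idx = torch.tensor([0, 2])
foo[idx] = torch.tensor([1.0, 2.0])
print(foo)   # tensor([1., 0., 2., 0.]) -- foo itself is mutated, not a copy
```
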
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14311

Differential Revision: D13196109

Pulled By: suo

fbshipit-source-id: c929401fda7c4a27622d3fe2b11278b08a7f17f1
2018-11-26 12:10:48 -08:00
80ba65e2f5 remove unnecessary zero_point argument from constructors (#14323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14323

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/24

As title says.

Reviewed By: dskhudia

Differential Revision: D13167073

fbshipit-source-id: 6d6c526fd6e29a14e97f71a0881f28ada8703107
2018-11-26 11:48:17 -08:00
0651b594d8 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 06e234f1a0217a268712832f21cb06b7109538a6
2018-11-26 11:27:01 -08:00
a10a993872 Fix -Wreturn-std-move (#14113)
Summary:
On clang-7 (internal) a warning, `-Wreturn-std-move`, is being emitted and raised to an error via `-Werror` for the code this PR fixes. The reason is that `autograd::make_variable` returns an `autograd::Variable`, so returning it from a function that returns `at::Tensor` disallows the compiler from eliding the return value (RVO). So let's explicitly convert the `autograd::Variable` to an `at::Tensor` before returning it.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14113

Differential Revision: D13105638

Pulled By: goldsborough

fbshipit-source-id: 6e1dc31c6512e105ab2a389d18807422ee29283c
2018-11-26 11:15:59 -08:00
90ed2f5aca minimize code compiled with avx2 and header includes from them (#14313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14313

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/22

This diff is an attempt to minimize code compiled with avx2.

Reviewed By: dskhudia

Differential Revision: D13166591

fbshipit-source-id: 2be241141f6d7478b86a422953791e237ff10268
2018-11-26 11:09:21 -08:00
fa73037233 Add proper from_blob overloads (#13982)
Summary:
There was an overload for `torch::from_blob` missing that allowed passing strides.

ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13982

Differential Revision: D13108089

Pulled By: goldsborough

fbshipit-source-id: b87594ec0bf55b35d106b4438bc18b2ce9fc8f71
2018-11-26 10:14:51 -08:00
b30c803662 allow concatenating "hybrid" (sparse/dense) tensors along their dense dimensions (#13761)
Summary:
Follow-up to #13577

The idea is to take each values tensor, concatenate it with zeros before and after itself (along the dimension corresponding to the one we're catting the tensors along), to get a tensor corresponding to the values for that tensor in the result. Then we concatenate all of those together to get the final values tensor. (Hopefully, this will be more clear from the example in the comments).

The indices are more straightforward: since we aren't concatenating along a sparse dimension, they don't change at all, so all we need to do is concatenate the indices from the different tensors together.
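
A small usage sketch of the new capability, with a "hybrid" COO tensor that has one sparse and one dense dimension:

```python
import torch

i = torch.tensor([[0, 2]])                  # 1 sparse dim, nnz = 2
v = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 1 dense dim of size 2
a = torch.sparse_coo_tensor(i, v, (4, 2))
b = torch.sparse_coo_tensor(i, 10 * v, (4, 2))
c = torch.cat([a, b], dim=1)                # cat along the dense dimension
print(c.shape)                              # torch.Size([4, 4])
print(c.to_dense())
```
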
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13761

Differential Revision: D13160343

Pulled By: umanwizard

fbshipit-source-id: 13d7adecd369e0eebdf5bce3d90a51029b66bd1d
2018-11-26 10:06:49 -08:00
a13fd7ec28 Allow torch.utils.cpp_extension.load to load shared libraries that aren't Python modules (#13941)
Summary:
For custom TorchScript operators, `torch.ops.load_library` must be used and passed the path to the shared library containing the custom ops. Our C++ extensions stuff generally is meant to build a Python module and import it. This PR changes `torch.utils.cpp_extension.load` to have an option to just return the shared library path instead of importing it as a Python module, so you can then pass it to `torch.ops.load_library`. This means folks can re-use `torch.utils.cpp_extension.load` and `torch.utils.cpp_extension.load_inline` to even write their custom ops inline. I think t-vi  and fmassa will appreciate this.
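
A hedged sketch of the intended workflow (the `is_python_module=False` flag name and the path return value are my reading of the description above; `my_ops.cpp` is a placeholder):

```python
import torch
from torch.utils import cpp_extension

# Build the extension but do not import it as a Python module; instead get
# back the path to the built shared library (per the description above).
lib_path = cpp_extension.load(
    name="my_custom_ops",
    sources=["my_ops.cpp"],   # placeholder: defines TorchScript custom ops
    is_python_module=False,
)
torch.ops.load_library(lib_path)
# torch.ops.<namespace>.<op> is now callable.
```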

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13941

Differential Revision: D13110592

Pulled By: goldsborough

fbshipit-source-id: 37756307dbf80a81d2ed550e67c8743dca01dc20
2018-11-26 09:39:21 -08:00
a60368982b Batch more matrix multiplies (#13456)
Summary:
This handles the input pre-multiplication in RNNs, yielding pretty significant speedups in backward times. This pass depends on loop unrolling, so we'll batch only as many elements as the unrolling factor allows.

cc mruberry ngimel zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13456

Differential Revision: D12920339

Pulled By: zou3519

fbshipit-source-id: 5bcd6d259c054a6dea02ae09a9fdf9f030856443
2018-11-26 09:20:35 -08:00
1ef949036c Enable native wrappers for the remainder of nn functions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14290

Differential Revision: D13162562

Pulled By: gchanan

fbshipit-source-id: 615e1727988bfeeade48f9b38162333a2e298f7b
2018-11-26 07:58:59 -08:00
60e7d04961 Add Recency Weighted into SparseLookup (#14291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14291

Add RecencyWeighted into SparseLookup.

Reviewed By: Wakeupbuddy

Differential Revision: D13147738

fbshipit-source-id: de5dc3aaee8ce7d41c6d30d2ff47e9786a7fa4da
2018-11-24 02:43:31 -08:00
6e1e2032d3 quote NUMPY_INCLUDE_DIR (#14341)
Summary:
When NUMPY_INCLUDE_DIR contains a space character (e.g. "C:\Program Files (x86)\Microsoft Visual Studio\..."), CMake cannot receive the correct path name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14341

Differential Revision: D13188408

Pulled By: soumith

fbshipit-source-id: b62127d90e53da94fe6af5d3bdd2ea4fd6546210
2018-11-23 21:34:01 -08:00
33d091f432 shape analysis fix (#14325)
Summary:
This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325

Differential Revision: D13183296

Pulled By: suo

fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298
2018-11-23 11:24:24 -08:00
8e3240d022 Some minor fixes for Windows build script (#14218)
Summary:
1. Fix execution failure when some of the paths are not defined
2. Users can now optionally override install dir by setting `CMAKE_INSTALL_PREFIX`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14218

Differential Revision: D13180350

Pulled By: soumith

fbshipit-source-id: 8c9680d1285dbf08b49380af1ebfa43ede99babc
2018-11-23 08:17:16 -08:00
7557a993ab Allow dataloader to accept a custom memory pinning function (#14171)
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch composed of any unrecognized type without pinning the data, because it doesn't know how.

This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.

The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171

Differential Revision: D13166669

Pulled By: soumith

fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
2018-11-23 08:12:43 -08:00
c36156eded Option to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout (#14253)
Summary:
This issue was noticed, and fix proposed, by raulpuric.

Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward.  This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes.

The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.**  The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`.
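
A minimal usage sketch, assuming the kwarg lands as proposed (toy segment, illustrative only):

```python
import torch
from torch.utils.checkpoint import checkpoint

dropout = torch.nn.Dropout(p=0.5)
x = torch.randn(8, 16, requires_grad=True)

def segment(inp):
    return dropout(inp).sum()

# With preserve_rng_state=True, the rerun forward during backward sees the
# same RNG state, so the dropout mask (and thus the gradients) match the
# non-checkpointed run bitwise.
out = checkpoint(segment, x, preserve_rng_state=True)
out.backward()
```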

Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive.  However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](https://github.com/pytorch/pytorch/pull/13070#discussion_r235179882), once merged, will make this option more or less free.

I'm a little wary of the [def checkpoint(function, *args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list).  Python 3 seems happy with it.
Edit:  It appears Python 2.7 is NOT happy with a [kwarg after *args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification).  `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage.  I'm open to suggestions (a global flag perhaps)?

**Batchnorm may still be an issue, but that's a battle for another day.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14253

Differential Revision: D13166665

Pulled By: soumith

fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7
2018-11-23 08:09:43 -08:00
1e05f4be73 Updating submodules
Reviewed By: yns88

fbshipit-source-id: e92b0c24a56b588dcf30542692cb4bdc2d474825
2018-11-22 22:04:37 -08:00
d55b25a633 Remove individual "using c10::xxx" statements (#13168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13168

We now have a "using namespace c10" in the at and caffe2 namespaces, we don't need the individual ones anymore

Reviewed By: ezyang

Differential Revision: D11669870

fbshipit-source-id: fc2bb1008e533906914188da4b6eb30e7db6acc1
2018-11-22 11:57:10 -08:00
f79fb58744 Make sure we bind input/output of Onnxifi op positionally (#14214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14214

This is to pick up the residual task of T36325466 to make sure that input/output binding of c2 Onnxifi op is positional.

Reviewed By: dzhulgakov

Differential Revision: D13134470

fbshipit-source-id: d1b916dade65c79133b86507cd54ea5166fa6810
2018-11-22 00:31:01 -08:00
7fc34a4122 Convert gumbel_softmax, lp pooling weak functions and modules (#14232)
Summary:
1. Support `Optional[BroadcastingList1[int]]` like type annotation to accept a int or a list[int]
2. Convert gumbel_softmax, lp pooling weak functions and modules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232

Differential Revision: D13164506

Pulled By: wanchaol

fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
2018-11-21 23:44:24 -08:00
08b77d3844 Use ADL to find toString (#14021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14021

I'm planning to move at::Scalar to c10, and there's a at::toString(Scalar) defined.
Unfortunately, we call it by specifying at::toString() instead of relying on ADL.
This diff changes that to prepare the actual move.

Reviewed By: ezyang

Differential Revision: D13015239

fbshipit-source-id: f2a09f43a96bc5ef20ec2c4c88f7790fd5a04870
2018-11-21 23:08:52 -08:00
0e93a03a3a Fix include paths for intrusive_ptr (#13692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13692

This now lives in c10/util, not ATen/core anymore.

Reviewed By: ezyang

Differential Revision: D12937091

fbshipit-source-id: ea2d420a15e7941a38d0b4c75e20ca18437c73f8
2018-11-21 23:08:50 -08:00
4160c13cd2 Move intrusive_ptr to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13691

Reviewed By: ezyang

Differential Revision: D12937090

fbshipit-source-id: fe9d21d5f7ea4e78e7e38ac60db13814a9971ed9
2018-11-21 23:08:49 -08:00
e91c8e2f2d ignore generated caffe2 docs and virtualenvs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14309

Reviewed By: soumith

Differential Revision: D13166626

Pulled By: JoelMarcey

fbshipit-source-id: 4f11228d8b5da85cec222bf11282722a7319581b
2018-11-21 22:30:34 -08:00
3918e226fd Updating submodules
Reviewed By: yns88

fbshipit-source-id: 20976d595e68a08d746d8806fd0205d810656366
2018-11-21 22:02:07 -08:00
fb8c3d62fe removing quantization utility functions moved to fbgemm (#14301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14301

This diff removes quantization utility functions copied to fbgemm

Reviewed By: Maratyszcza

Differential Revision: D13159299

fbshipit-source-id: a7f3cd2af0aa241a8578d532a70a157da70d9289
2018-11-21 21:38:23 -08:00
8c4910b095 Cuda version comparison with CUDA_VERSION_STRING (#14302)
Summary:
CUDA headers include the CUDA version in major.minor form, but when we do find_package(CUDA), the CUDA_VERSION variable includes the patch number as well, which fails the following condition.

```
if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})
```

**For example:**
I have CUDA 10.0 installed. My nvcc output looks like this:
`Cuda compilation tools, release 10.0, V10.0.130`

If I compile my application with Caffe2, it gives me the following error:

```
CMake Error at /usr/share/cmake/Caffe2/public/cuda.cmake:59 (message):
  FindCUDA says CUDA version is (usually determined by nvcc), but the CUDA
  headers say the version is 10.0.  This often occurs when you set both
  CUDA_HOME and CUDA_NVCC_EXECUTABLE to non-standard locations, without also
  setting PATH to point to the correct nvcc.  Perhaps, try re-running this
  command again with PATH=/usr/local/cuda/bin:$PATH.  See above log messages
  for more diagnostics, and see
  https://github.com/pytorch/pytorch/issues/8092 for more details.
```

**In this case, it failed because**
cuda_version_from_header = 10.0
CUDA_VERSION = 10.0.130 (Came from NVCC)

```
if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})
```

**Fix:**
We should compare the header version using the **major.minor format**, which is given by CUDA_VERSION_STRING.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14302

Differential Revision: D13166485

Pulled By: soumith

fbshipit-source-id: 1b74e756a76c4cc5aa09978f5850f763ed5469b6
2018-11-21 21:02:28 -08:00
992e2750fd Updating submodules
Reviewed By: yns88

fbshipit-source-id: ee60b4dddf688608ef80043b1dc336d120a045d0
2018-11-21 21:02:26 -08:00
341b48529e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 366c29d09bec53459e2a4890c7fe8d10f45ff5c3
2018-11-21 20:31:53 -08:00
b26f82b0ec Robust NCCL barrier improvement to cover all devices combinations (#14271)
Summary:
This covers the rare edge case where we run the same NCCL process group with multiple GPU combinations rather than only the last one. We now keep track of which GPUs have been used previously in the NCCL process group, and barrier() itself will synchronize on each GPU's NCCL stream.

Test covered as well. Tested on 8-GPU machine
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14271

Differential Revision: D13164993

Pulled By: teng-li

fbshipit-source-id: 81e04352740ea50b5e943369e74cfcba40bb61c1
2018-11-21 18:23:55 -08:00
b149456645 alias analysis (#14018)
Summary:
First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review:
1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info of the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules.

2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node".

3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it.

- Testing; still figuring out the best way to do this.
- Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations.
- Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14018

Differential Revision: D13144906

Pulled By: suo

fbshipit-source-id: 1bc964f9121a504c237cef6dfeea6b233694de6a
2018-11-21 17:48:46 -08:00
d55ba77a5d Remove extra include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14206

Reviewed By: dzhulgakov

Differential Revision: D13131318

fbshipit-source-id: 559b55b8d98cdf6b7d1d3e31237c5473edc5e462
2018-11-21 17:21:44 -08:00
85d3fccee7 Removed redundant allreduce options in DDP (#14208)
Summary:
This somehow is not cleaned up after the C++ migration. Unused and can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14208

Differential Revision: D13132492

Pulled By: teng-li

fbshipit-source-id: 0f05b6368174664ebb2560c037347c8eb45f7c38
2018-11-21 16:56:46 -08:00
d9cdcc9a3b Add list inequality operator (#14129)
Summary:
This PR adds `aten::neq` for list inequality comparisons and converts
`nll_loss` to weak script
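
A tiny TorchScript sketch of what the new operator enables (modern annotation style; illustrative):

```python
from typing import List

import torch

@torch.jit.script
def lists_differ(a: List[int], b: List[int]) -> bool:
    return a != b   # list inequality, lowered to the new aten::neq
```
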
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14129

Differential Revision: D13123894

Pulled By: driazati

fbshipit-source-id: 8c1edf7c163217ec00eb653f95d196db3998613f
2018-11-21 16:32:58 -08:00
34db39d87a Add onnxifi support to SparseLengthsWeightedSum (#14210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14210

We left out `SparseLengthsWeightedSum`, as the benchmark was not testing it due to an fp16 filler issue. It was flushed out by unit tests, hence we add the support here.

Reviewed By: bddppq

Differential Revision: D13132320

fbshipit-source-id: b21c30c185c9e1fbf3980641bc3cdc39e85af2e1
2018-11-21 15:47:24 -08:00
60963c2ecb Add "axis" and "axis_w" arguments in FC to support customized axix to reduce dim. (#12971)
Summary:
Add "axis" and "axis_w" arguments in FC to support customized axix to reduce dim.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12971

Reviewed By: bddppq

Differential Revision: D12850675

Pulled By: yinghai

fbshipit-source-id: f1cde163201bd7add53b8475329db1f038a73019
2018-11-21 15:44:50 -08:00
accbcca338 IDEEP fallback for ResizeNearest op (#14212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14212

TSIA

Reviewed By: yinghai

Differential Revision: D13134134

fbshipit-source-id: e3c5c9c8756d6e25b213f8dde9d809a44373d7a3
2018-11-21 13:44:07 -08:00
2cacb39a21 Fix ONNX_ATEN mode (#14239)
Summary:
Fix ONNX_ATEN mode by adding it to the validateBlock method.
Before this PR, validateBlock would throw an exception when using this mode.

I will add related test cases for ONNX_ATEN mode in a different PR once this is merged, since we don't have any currently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14239

Differential Revision: D13145443

Pulled By: zrphercule

fbshipit-source-id: 60e7942aa126acfe67bdb428ef231ac3066234b1
2018-11-21 13:15:23 -08:00
fe068d9032 Bump gloo (#14281)
Summary:
Includes more robust error handling and timeout support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14281

Differential Revision: D13158232

Pulled By: pietern

fbshipit-source-id: e80432799a020576d5abdcd9a21d66b629479caf
2018-11-21 11:27:42 -08:00
31ba34b73c fix comment on dnnlowp op arguments (#14265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14265

Fix comment

Reviewed By: hx89

Differential Revision: D13152106

fbshipit-source-id: fbe98906963cbd5cb20a583a737a792fbc38292e
2018-11-21 09:39:57 -08:00
6ce9907d51 native NN wrappers, including with buffers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14256

Differential Revision: D13148783

Pulled By: gchanan

fbshipit-source-id: 4b6179033cf1df26061b6731eaaa4e008692e592
2018-11-21 09:08:00 -08:00
91c0b7159a Remove header generated at configuration time (#14244)
Summary:
The build was picking up the empty stub header instead of the generated
one. Because of the large number of include paths we end up passing to
the compiler, it is brittle to have both an empty stub file and a
generated file and expect the compiler to pick up the right one.

With the recent change to compile everything from a single CMake run, we
can now use native CMake facilities to propagate macros that indicate
backend support. The target_compile_definitions stanzas with the
INTERFACE flag ensure that these macros are set only for downstream
consumers of the c10d target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14244

Reviewed By: teng-li

Differential Revision: D13144293

Pulled By: pietern

fbshipit-source-id: f49324220db689c68c126b159f4f00a8b9bc1252
2018-11-21 08:45:08 -08:00
788d2e87bd Address jittering issues in python_print (#14064)
Summary:
export - print a method with python_print
import - import a method with import_method

We want to ensure:

    export(g) == export(import(export(g)))

That is, after exporting/importing once, the graph will stay exactly
the same. This is less strict than g == import(export(g)), which would
require us to maintain a lot more information about the structure of the
IR and about the names of debug symbols. (A conceptual sketch of this
check appears after the lists below.)

This PR addresses this with the following fixes:
* print out double-precision numbers with high enough precision such
  that they always parse in the same way
* when creating loop-carried dependencies, sort them
  by variable name, ensuring a consistent order
* parse nan correctly
* DCE: remove unused outputs of if statements, and loop-carried dependencies
  in loops that are dead both after the loop and inside the body of the
  loop.
* Do not set uniqueName for variables whose names are _[0-9]+, these
  are probably rare in user code, and we need a way to communicate
  that we do not care about a variable name when re-parsing the graph.
  Otherwise temporary variable names will jitter around.
* Expand the definition of a constant in the printing code to None
  and family.
* Allow re-treeing to work as long as the only thing in its way is a
  constant node. These do not have side effects but are sometimes
  inserted in a different order when tracing compared to how we print them.
* Print all constant nodes out first, in the order in which they are used
  (or, if they are inlined, ensure they get assigned their CONSTANT.cX number
  in a consistent order). Clean up tuples (this is done in the compiler,
  but not in the tracer, leading to some tuple indexing jitter if not
  done).
* use strtod_l, not std::stod which can throw exceptions

Other:
* Add REL_WITH_DEB_INFO to setup.py. It already existed for the
  cmake files. Threading it into setup.py allows us to turn on
  debug symbols with optimization everywhere.
* enable round trip testing for all generated graphs. This only adds
  ~6 seconds to total build time but tests printing for every graph.
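A conceptual sketch of the round-trip invariant; `export_graph` and `import_graph` stand in for the internal python_print/import_method machinery and are assumptions, not public APIs:

```python
def check_round_trip(graph, export_graph, import_graph):
    # export once, re-import, export again: the two printed sources
    # must match exactly, i.e. export(g) == export(import(export(g)))
    once = export_graph(graph)
    twice = export_graph(import_graph(once))
    assert once == twice, "printed source jittered across a round trip"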
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14064

Differential Revision: D13094637

Pulled By: zdevito

fbshipit-source-id: 0a1c6912194d965f15d6b0c6cf838ccc551f161d
2018-11-21 06:38:29 -08:00
af82396f7f Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 27838fb2dad82c78906faf3cc2d124557c30e88f
2018-11-21 06:38:28 -08:00
166ee86b46 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 3c17e12a579245a84e9a56b1d8a1641232150675
2018-11-21 00:27:50 -08:00
7a654617eb Add tensor table in ModelDef and use it for jit script serialization and deserialization (#13861)
Summary:
As we discussed, the tensors in the torch script will be associated with the tensor data in the serialized file. So let's add a table of tensors (actually a repeated TensorProto field) to the ModelDef. TensorProto.name will be the id.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13861

Reviewed By: dzhulgakov

Differential Revision: D13036940

Pulled By: zrphercule

fbshipit-source-id: ecb91b062ac4bc26af2a8d6d12c91d5614efd559
2018-11-20 23:37:50 -08:00
17432a1051 c10d Automatically retry on EINTR (#14180)
Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/14170

Actually, I probably shouldn't retry all `SYSCHECK` calls. I'll leave it to the reviewers to decide.
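For illustration, the retry-on-EINTR pattern in question, sketched in Python; the real change wraps the C++ `SYSCHECK` macro, so this is just the shape of the fix:

```python
import errno

def retry_on_eintr(syscall, *args):
    # keep retrying a system call that was interrupted by a signal;
    # any other error is propagated to the caller
    while True:
        try:
            return syscall(*args)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
```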
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14180

Reviewed By: pietern

Differential Revision: D13144741

Pulled By: SsnL

fbshipit-source-id: d73288f76b18cae14b1b43dad4e5e8d010a96d95
2018-11-20 23:31:26 -08:00
bb301a431d Make NCCL backend support barrier op (#14142)
Summary:
This is a feature request from: https://github.com/pytorch/pytorch/issues/13573

As the title says, this PR makes the NCCL backend support the barrier op.

There are a couple of scenarios that need to be addressed:
(1) When a NCCL op has already happened, we need to record which GPU device(s) the previous op ran on and queue the allreduce-based barrier op on the same device(s).
(2) When no NCCL op has happened yet, we make a best effort to assign each process its own single GPU.

As for the async work: during wait, we not only wait for the NCCL kernel to complete, but also block the thread until both the current stream and the NCCL stream return. (See the usage sketch below.)

`test_distributed` should cover this. I also manually tested both scenarios.
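Usage sketch; rank/world-size wiring via `env://` is assumed to come from the launcher:

```python
import torch.distributed as dist

dist.init_process_group(backend="nccl", init_method="env://")
# works whether or not NCCL collectives have run yet; both
# scenarios above are handled internally
dist.barrier()  # returns only once every rank has reached this point
```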
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14142

Differential Revision: D13113391

Pulled By: teng-li

fbshipit-source-id: 96c33d4d129e2977e6892d85d0fc449424c35499
2018-11-20 21:12:22 -08:00
1acaafbe70 Fix memory leakage in onnxifi transformer (#14245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14245

tsia

Reviewed By: bddppq, rdzhabarov

Differential Revision: D13144783

fbshipit-source-id: 5e07bb7ab883ba1af68547a26272cd320967b9e3
2018-11-20 18:03:05 -08:00
8f20d40bb7 Allow undefined tensors as constants (#14120)
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes in the standard library if an `Optional[Tensor]` is statically determined to be `None`:

```python
@torch.jit.script
def fn(x=None):
    # type: (Optional[Tensor]) -> Tensor
    return torch.jit._unwrap_optional(x)

@torch.jit.script
def fn2():
    # type: () -> Tensor
    return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120

Differential Revision: D13124625

Pulled By: driazati

fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569
2018-11-20 16:54:27 -08:00
d6bfc53b9e Export BatchNorm functional and module, add necessary JIT support (#14016)
Summary:
This PR did three things:

1. It exports the BatchNorm functional and module, and rewrites some of the components to align with the currently supported JIT features.
2. In the process of exporting, it adds the necessary compiler support for in-place augmented-assignment ops.
3. It changes the test_jit behavior in add_module_test to use a single RNG state during module initialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016

Differential Revision: D13112064

Pulled By: wanchaol

fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
2018-11-20 14:15:06 -08:00
1f871f126f Have PYTORCH_FUSION_DEBUG print C kernel source (#14213)
Summary:
- Move handling of the environment variable up from the CPU fuser to all fusers
- Introduce two levels, enabled with PYTORCH_FUSION_DEBUG=n:
  1: print C source
  2: print CPU assembly, too (the previous effect of PYTORCH_FUSION_DEBUG)
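A quick illustration of toggling the new levels, setting the variable before any fused kernel is compiled:

```python
import os

os.environ["PYTORCH_FUSION_DEBUG"] = "2"  # 1: C source; 2: also CPU assembly

import torch  # fused kernels compiled from here on will dump their source
```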

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14213

Differential Revision: D13135393

Pulled By: soumith

fbshipit-source-id: befa4ebea3b3c97e471393a9f6402b93a6b24031
2018-11-20 12:45:07 -08:00
1224ef9ea1 Delete backwards compatibility StorageImpl.h and TensorImpl.h (#14230)
Summary:
Since they directly include the real ones in core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14230

Differential Revision: D13140323

Pulled By: tugrulates

fbshipit-source-id: d7e3b94e891b2d7fa273d01c0b7edfebdbd7e368
2018-11-20 12:29:24 -08:00
9a281451ed remove unused parameters from caffe2_dnnlowp_utils.cc (#14164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14164

See title

Reviewed By: csummersea

Differential Revision: D13115470

fbshipit-source-id: d754f558cd06e5f4c1cd00315e912cdb7b50731a
2018-11-20 00:56:06 -08:00
3c2462cf24 use pragma once (#14163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14163

Some of the names we were using to guard the header files were too short (e.g. DYNAMIC_HISTOGRAM_H).

Reviewed By: csummersea

Differential Revision: D13115451

fbshipit-source-id: cef8c84c62922616ceea17effff7bdf8d67302a2
2018-11-20 00:56:04 -08:00
4224ce10a8 format python files (#14161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14161

Formatting using Nuclide

Reviewed By: hx89

Differential Revision: D13115348

fbshipit-source-id: 7432ce6072a1822d7287b4ebcfcb6309282e15ac
2018-11-20 00:56:02 -08:00
3c0ce51484 clang-format (#14160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14160

clang-format of C++ files

Reviewed By: hx89

Differential Revision: D13115201

fbshipit-source-id: d2ad65f66209e00578ef90f87f41272de2d24aa9
2018-11-20 00:56:00 -08:00
acd7811e33 Add sigmoid op based on MKL-DNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13097

Differential Revision: D13105366

Pulled By: yinghai

fbshipit-source-id: d156e8fd519baeecf61c25dcd8fa2c2fa7351ef4
2018-11-19 22:56:35 -08:00
c96b72d61f OSS build fix (#14192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14192

We can only use C10_* in OSS. The build is only broken if built with USE_FBGEMM=ON

Reviewed By: jianyuh

Differential Revision: D13121781

fbshipit-source-id: f0ee9a75997766e63e1da8a53de7ddb98296a171
2018-11-19 22:47:17 -08:00
6dacc20073 Make EncodeMethod in jit script serialization return a string (#14167)
Summary:
Nit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14167

Reviewed By: ezyang

Differential Revision: D13116584

Pulled By: dzhulgakov

fbshipit-source-id: c0e7e71a81004031564bd2fc59f393041e1283d5
2018-11-19 22:15:19 -08:00
a036f9a65f Create README.md of caffe2/quantization/server
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14217

Reviewed By: csummersea

Differential Revision: D13135086

Pulled By: jspark1105

fbshipit-source-id: bddf4f1c2dc5ec8ea6ebe9e265956f367e082d52
2018-11-19 21:59:34 -08:00
6dc28e666c CircleCI: fix NCCL install (#14172)
Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not in `build.sh`; this PR fixes the issue.

This replaces https://github.com/pytorch/pytorch/pull/14124.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14172

Differential Revision: D13135087

Pulled By: yf225

fbshipit-source-id: 42fff3926734778713d483d74ba0a89e5502dd9e
2018-11-19 21:30:32 -08:00
03a02b6fd5 Fix a bug in test case of onnx::If
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14209

Differential Revision: D13132607

Pulled By: zrphercule

fbshipit-source-id: b7f7ccc6a6cbdeb57a7f88a1971d15dd81e6fc81
2018-11-19 18:46:21 -08:00
b807970aea Tensor type checking and informative error messages for torch.distributed (#14204)
Summary:
This will address https://github.com/pytorch/pytorch/issues/13574

This error message should be more informative to the user for all the non-multi-GPU ops, since we always python-bind to the multi-GPU ops.

test_distributed should cover everything. Both RuntimeErrors were also tested manually:

```
>>> a = torch.ByteTensor([])
>>> b = [a, a]
>>> dist.all_reduce(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 809, in all_reduce
    _check_single_tensor(tensor, "tensor")
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 207, in _check_single_tensor
    "to be a torch.Tensor type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor to be a torch.Tensor type

>>> b = ["b"]
>>> dist.all_gather(b, a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 1006, in all_gather
    _check_tensor_list(tensor_list, "tensor_list")
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 225, in _check_tensor_list
    "to be a List[torch.Tensor] type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor_list to be a List[torch.Tensor] type
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14204

Differential Revision: D13131526

Pulled By: teng-li

fbshipit-source-id: bca3d881e41044a013a6b90fa187e722b9dd45f2
2018-11-19 18:30:54 -08:00
7d1db89ef9 Move stream functions from CUDAContext to CUDAStream (#14110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14110

I'm planning to move CUDAStream to c10/cuda, without also moving
CUDAContext, and so it's most convenient if these definitions
are in the actual header file in question.

Reviewed By: smessmer

Differential Revision: D13104693

fbshipit-source-id: 23ce492003091adadaa5ca6a17124213005046c2
2018-11-19 17:05:48 -08:00
50b914aeeb Move CUDAStreamInternals inside detail namespace. (#14109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14109

Previously it was at the top level, because the author was under
the impression that you could only refer to top-level C++ names
from C, but this is not true; you just need to make a stub struct
conditioned on __cplusplus.

Reviewed By: smessmer

Differential Revision: D13104694

fbshipit-source-id: ecb7ae6dcfa4ab4e062aad7a886937dca15fd1b2
2018-11-19 17:05:46 -08:00
e58bbbac18 Delete dependencies from CUDAStream; remove synchronize_with (#13920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13920

I want to move CUDAStream and CUDAGuard to c10_cuda without also
bringing along CUDAContext or CUDAEvent for the ride (at least for
now).  To do this, I need to eliminate those dependencies.

There's a few functions in CUDAContext.h which don't really need
THCState, so they're separated out and put in general
purpose c10/cuda/CUDAFunctions.h

Reviewed By: smessmer

Differential Revision: D13047468

fbshipit-source-id: 7ed9d5e660f95805ab39d7af25892327edae050e
2018-11-19 17:05:41 -08:00
a20c7ce848 Fix race in AtomicFetchAdd. (#13479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13479

Increases the lock scope to above Output() calls.

These calls potentially allocate the underlying blob/tensor
objects, and multiple invocations race each other over the
same output blobs/tensors.

Reviewed By: bwasti

Differential Revision: D12891629

fbshipit-source-id: a6015cfdb08e352521a1f062eb9d94a971cfbdb0
2018-11-19 16:11:58 -08:00
1a29950478 Remove API macros from intrusive_ptr (#14137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14137

This is a templated header-only class and shouldn't need export/import macros.

Reviewed By: ezyang

Differential Revision: D13111712

fbshipit-source-id: c8c958e75b090d011d25156af22f37f9ca605196
2018-11-19 15:39:20 -08:00
1c2ed4eb23 Tensor construction: combine Resize+mutable_data - 1/4 (#13942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13942

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13054770

fbshipit-source-id: a9e86e5dfcb4f7cebf5243e1d359fad064561bed
2018-11-19 15:33:50 -08:00
8aa5174106 Tensor construction: combine Resize+mutable_data - 3/4 (#13944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13854

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13054836

fbshipit-source-id: 5de07a156687f1ee607d0450410881d9176a87a7
2018-11-19 15:28:13 -08:00
f34c848f52 Store the optimize flag in module (#14166)
Summary:
When saving/loading a script module, we store the optimize flag on the module instead of encoding it in each method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14166

Reviewed By: ezyang

Differential Revision: D13117577

Pulled By: dzhulgakov

fbshipit-source-id: dc322948bda0ac5809d8ef9a345497ebb8f33a61
2018-11-19 14:34:05 -08:00
7fd1ea6ab7 Cleanup caffe2 hipify exclude patterns (#14198)
Summary:
depthwise_3x3_conv_op.cu does not exist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14198

Differential Revision: D13127479

Pulled By: bddppq

fbshipit-source-id: ec6bd434055a49ea405c4b399bde8c074114f955
2018-11-19 14:27:56 -08:00
b6edd7bbb4 Support 'python_module' of 'nn' in native functions. (#14126)
Summary:
Also move mse_loss, binary_cross_entropy, l1_loss to use this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14126

Reviewed By: ezyang

Differential Revision: D13109975

Pulled By: gchanan

fbshipit-source-id: 0b29dc8cf222d25db14da7532d8dc096a988a0ec
2018-11-19 14:13:25 -08:00
1e73ab25f5 Use onnx proto_utils to support using protobuf-lite
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14150

Differential Revision: D13115586

Pulled By: bddppq

fbshipit-source-id: d6b6935a8deac60f6f58d62a71f6840182a72a51
2018-11-19 13:32:46 -08:00
6b4852213d Use fbgemm revision file added by shipit (#14105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14105

Pull Request resolved: https://github.com/facebook/fbshipit/pull/62

Use the fbgemm revision file created by ShipIt to update the fbgemm revision for pytorch. We no longer have to manually update the submodule.

Reviewed By: yns88

Differential Revision: D13072074

fbshipit-source-id: bef9eabad50f7140179c370a60bd9ca73067b9b5
2018-11-19 12:12:21 -08:00
b6290531aa Setup sccache for PyTorch ROCm CI (#14153)
Summary:
Discovered a huge build-time difference between the caffe2 ROCm build and the pytorch ROCm build (6 min vs. 30 min); it turns out the sccache setup needed for the caffe2 docker images is not in the pytorch build script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14153

Differential Revision: D13115097

Pulled By: bddppq

fbshipit-source-id: 88414f164b980f0e667c8e138479b4a75ab7692e
2018-11-19 11:31:55 -08:00
e387d945c2 allow empty index for scatter_* methods (#14077)
Summary:
Fixes #2027
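A small sketch of what now works; the shapes here are assumptions for illustration:

```python
import torch

x = torch.zeros(3, 4)
index = torch.empty(0, 4, dtype=torch.long)  # empty along the scatter dim
src = torch.empty(0, 4)
x.scatter_(0, index, src)  # now a no-op instead of erroring out
```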
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14077

Differential Revision: D13095788

Pulled By: ailzhang

fbshipit-source-id: ad2c8bbf83d36e07940782b9206fbdcde8905fd3
2018-11-19 09:50:21 -08:00
751b5ea941 use at::Device throughout JIT (#14181)
Summary:
zdevito soumith

Sorry about the previous PR, had some git issues. This is the same exact code as the previous PR but updated w.r.t pytorch/master.

fixes #13254
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14181

Differential Revision: D13117688

Pulled By: soumith

fbshipit-source-id: 044840b2c7a0101ef43dd16655fd9a0f9981f53f
2018-11-19 09:21:57 -08:00
fc61f1a1d1 Support named return arguments in native_functions. (#14100)
Summary:
Note there was a hacky way of doing this before by specifying "return:" lists manually; this makes the
return names part of the function declaration itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14100

Differential Revision: D13101810

Pulled By: gchanan

fbshipit-source-id: 1c80574cd4e8263764fc65126427b122fe36df35
2018-11-19 08:27:20 -08:00
ce85150cb4 Split out CUDAMultiStreamGuard from CUDAGuard (#13912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13912

The implementation and API of CUDAMultiStreamGuard is less mature,
and it cannot be implemented generically (yet) in c10_cuda.  This
might be a reasonable thing to do eventually, but not for now.

Reviewed By: smessmer

Differential Revision: D13046500

fbshipit-source-id: 4ea39ca1344f1ad5ae7c82c98617aa348c327848
2018-11-19 08:20:11 -08:00
48099c23b4 Move AT_CUDA_CHECK to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13910

Reviewed By: smessmer

Differential Revision: D13046201

fbshipit-source-id: 8d360a0e4d6c2edf070d130e600c6b04f0ee0058
2018-11-19 08:20:10 -08:00
928687bb24 Add c10 cuda library. (#13900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13900

Add c10 cuda library.

Right now, this is not used by anything, and only tests if the CUDA
headers are available (and not, e.g., that linking works.)

Extra changes:
- cmake/public/cuda.cmake now is correctly include guarded, so you
  can include it multiple times without trouble.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: smessmer

Differential Revision: D13025313

fbshipit-source-id: fda85b4c35783ffb48ddd6bbb98dbd9154119d86
2018-11-19 08:20:07 -08:00
2681852438 Switch Int8Add operator to QNNPACK (#14089)
Summary:
- Improved single-threaded performance due to optimized low-level micro-kernels
- Improved parallelization (previously was parallelized across images in a batch and pixels only, now within channels as well)
- Slightly different results due to a different implementation of fixed-point arithmetic (no accuracy loss expected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14089

Differential Revision: D13110135

Pulled By: Maratyszcza

fbshipit-source-id: 1f149394af5c16940f79a3fd36e183bba1be2497
2018-11-18 23:57:57 -08:00
92dbd0219f No more -werror for c10d (#14155)
Summary:
As the title says
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14155

Differential Revision: D13115769

Pulled By: teng-li

fbshipit-source-id: 278deba090364544d92fa603621604ce37fa974e
2018-11-18 13:53:41 -08:00
55b25365e9 Add ultra low precision options (#14133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14133

Experiment with ultra low precisions on the Resnext-101 URU trunk model

Reviewed By: jspark1105

Differential Revision: D10108518

fbshipit-source-id: f04d74fbe1c9e75efafcd9845719bdb2efbbfe9c
2018-11-18 12:51:34 -08:00
ef3d7963d8 Adds symbolic diff for THNN Conv2d and aten native BatchNorm (#13888)
Summary:
Adds symbolic diff and tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13888

Differential Revision: D13115548

Pulled By: soumith

fbshipit-source-id: ba75b01a95a5715a7761724dda018168b6188917
2018-11-18 09:22:31 -08:00
07a8a730af Print warning when ROCm memory leaking is detected in pytorch tests (#14151)
Summary:
We keep seeing random failures in CI because of ROCm memory leaks, e.g.:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3102//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3080//console

To make the CI more stable, turn it into a warning instead of a failure.

iotamudelta, please help investigate the memory leak
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14151

Differential Revision: D13115096

Pulled By: bddppq

fbshipit-source-id: a13b68274ecba363d9d8436aa6a62ac40a77d78c
2018-11-18 00:11:44 -08:00
a5891e6124 Remove debugging code in test_cholesky_batched (#14156)
Summary:
They didn't turn up in my tests because I use pytest, which doesn't
print debug output when the tests pass

Differential Revision: D13115227

Pulled By: soumith

fbshipit-source-id: 46a7d47da7412d6b071158a23ab21e7fb0c6e11b
2018-11-17 22:28:21 -08:00
1bafa6236f Back out "[reland][codemod][caffe2] Tensor construction: combine Resize+mutable_data - 2/4" (#14154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14154

Original commit changeset: e89c2e692178

Reviewed By: amateurcoffee

Differential Revision: D13115023

fbshipit-source-id: 8f9fb55842ae6c8139d5cd88ec6d0abb0c5cc5e7
2018-11-17 19:51:03 -08:00
12bb4742ad CostInference for 1D conv (#14009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14009

As title

Reviewed By: yinghai

Differential Revision: D13078718

fbshipit-source-id: 081e7b13ad6741c635ef413915b555f10f93bd33
2018-11-17 17:28:52 -08:00
a30ade1139 Batched cholesky decomposition (#14017)
Summary:
Implements batching for the Cholesky decomposition.

Performance could be improved with dedicated batched `tril` and `triu` ops, whose absence also impedes autograd operations. (A short usage sketch follows the change list below.)

Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
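A quick usage sketch; I use the batched `torch.cholesky` spelling here, which is an assumption about the public name rather than something this PR pins down:

```python
import torch

A = torch.randn(4, 3, 3)
spd = A @ A.transpose(-1, -2) + 1e-3 * torch.eye(3)  # batch of SPD matrices
L = torch.cholesky(spd)           # lower-triangular factors, shape (4, 3, 3)
err = (L @ L.transpose(-1, -2) - spd).abs().max()    # ~0 up to float error
```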
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017

Differential Revision: D13087945

Pulled By: ezyang

fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e
2018-11-17 10:49:15 -08:00
390bf1e779 remove unnecessary file from avx2 list (#14012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14012

conv_dnnlowp_op.cc doesn't need avx2 anymore.

Reviewed By: dskhudia

Differential Revision: D13079665

fbshipit-source-id: dbfe8d2213de4969b6334d54de81d51149268cbd
2018-11-17 10:29:25 -08:00
505dedf6ad Change from using enum to int to store data_type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14140

Differential Revision: D13112937

Pulled By: bddppq

fbshipit-source-id: 124d9546bfbd1f9c207a21e40eb3646f7739bd58
2018-11-17 09:24:03 -08:00
4f0434d5ab Revert "CircleCI: fix NCCL install (#14124)" (#14146)
Summary:
This reverts commit a1fa9d8cf9b2b0e7373ec420c2487d4dfd0e587c.

[pytorch_linux_trusty_py2_7_9_build](https://circleci.com/gh/pytorch/pytorch/270206?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console):
```
Nov 17 07:37:27 + sudo apt-get -qq update
Nov 17 07:37:30 W: Ignoring Provides line with DepCompareOp for package gdb-minimal
Nov 17 07:37:30 W: You may want to run apt-get update to correct these problems
Nov 17 07:37:30 + sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
Nov 17 07:37:30 E: Command line option --allow-downgrades is not understood
Nov 17 07:37:30 + cleanup
Nov 17 07:37:30 + retcode=100
Nov 17 07:37:30 + set +x
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14146

Differential Revision: D13113912

Pulled By: bddppq

fbshipit-source-id: cd9d371cf72159f03d12a8b56ed5bd2060ebbe59
2018-11-17 00:35:31 -08:00
fade36668a Revert D10428917: [Caffe2] Add cost into profile observer
Differential Revision:
D10428917

Original commit changeset: 7c100e551bdd

fbshipit-source-id: 5164d9ba61cc103eccfdeb91a5cc140cea31a819
2018-11-16 23:30:07 -08:00
a43037fa11 Revert D10439558: Add cost for non-linear ops
Differential Revision:
D10439558

Original commit changeset: 9aeb05bac8b5

fbshipit-source-id: f00977b4f95bdd500d254eb44fb5b0c816506ee4
2018-11-16 23:30:05 -08:00
afc91e4900 Update FXdiv submodule (#14128)
Summary:
Use the most recent version that disables inline assembly.
I suspect inline assembly causes miscompilation on some versions of gcc7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14128

Reviewed By: bddppq

Differential Revision: D13112370

Pulled By: Maratyszcza

fbshipit-source-id: 36cc95dc51390a293b72c18ae982c3a515a11981
2018-11-16 22:45:26 -08:00
6d9a7d0e60 Rename neon2sse.h to NEON_2_SSE.h to match upstream repo
Summary:
- NEON2SSE is a header that implements NEON intrinsics on top of SSE intrinsics
- The upstream repo provides the NEON_2_SSE.h header, but internally it was imported as neon2sse.h
- This patch fixes incompatibilities between the internal and upstream versions

Reviewed By: hlu1

Differential Revision: D13096755

fbshipit-source-id: 65e1df9a2a5e74bd52c9aee9be27469ba938cd8c
2018-11-16 21:41:53 -08:00
351478439f Disable QNNPACK for multi-architecture iOS builds (#14125)
Summary:
QNNPACK contains assembly files, and CMake tries to build them for the wrong architectures in multi-arch builds. This patch has two effects:
- Disables QNNPACK in multi-arch iOS builds
- Specifies a single `IOS_ARCH=arm64` by default (covers most iPhones/iPads on the market)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14125

Differential Revision: D13112366

Pulled By: Maratyszcza

fbshipit-source-id: b369083045b440e41d506667a92e41139c11a971
2018-11-16 21:18:01 -08:00
d56b2258f4 Register caffe2 layer norm with c10 dispatcher (#13693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13693

We can't directly call the caffe2::Operator class from c10 yet because that class isn't deprotobuffed yet.
Instead, we factor out the kernel into a reusable static method and call it from the caffe2::Operator and
also register it with c10.

Reviewed By: ezyang

Differential Revision: D12912242

fbshipit-source-id: c57502f14cea7a8be281f9787b175bb6e402d00c
2018-11-16 20:17:47 -08:00
c905a81c92 Add c10/core/ to cmake build (#14111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14111

It was already in TARGETs, but we forgot it in cmake.

Reviewed By: ezyang

Differential Revision: D13105166

fbshipit-source-id: f09549e98ebca751339b5ada1150e00cc4cd9540
2018-11-16 20:17:45 -08:00
bb404e7a32 Update atol scale in dnnlowp test (#14135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14135

Update the atol scale of the dnnlowp test. I can't reproduce the flaky test error from the task locally, even after setting the same seed value, but according to the comments in check_quantized_results_close(), atol_scale should be 1/1.9 = 0.526315789473684, which is larger than the current value of 0.51. So increase atol_scale to 0.53.

Reviewed By: jspark1105

Differential Revision: D13108415

fbshipit-source-id: 1e8840659fdf0092f51b439cf499858795f9706a
2018-11-16 19:18:55 -08:00
c784f847de fix sparse_adagrad param_size overflow error (#14049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14049

param_size should be passed as int64_t

Reviewed By: hyuen

Differential Revision: D13090511

fbshipit-source-id: 7892d315d7c82c7d7ca103fb36d30cdf1fe24785
2018-11-16 18:53:32 -08:00
cbc94894fb Add cost for non-linear ops (#13327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13327

Add a cost inference function to non-linear ops. Since the actual flops of a non-linear operator depend on the implementation, we use the number of non-linear operations as a proxy for the analytical flops of non-linear operators.

Reviewed By: jspark1105

Differential Revision: D10439558

fbshipit-source-id: 9aeb05bac8b5c7ae5d351ebf365e0a81cf4fc227
2018-11-16 18:53:30 -08:00
86dc3ab252 Add cost into profile observer (#12793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12793

Add analytical cost to the profile observer. It includes op-level cost information for each op run and net-level aggregated cost information for each op type.

It outputs the following information:
1. analytical flops
2. analytical bytes_read
3. analytical bytes_written

Example output at op level:
```I1017 14:58:14.245978 3686541 profile_observer_gpu.cc:26] --------- Starting operator FC op#24 ---------
I1017 14:58:14.246049 3686541 profile_observer_gpu.cc:33] Input 0: Tensor model1/embedded_encoder_inputs of type float. Dims: (17,1,256,):
I1017 14:58:14.246109 3686541 profile_observer_gpu.cc:33] Input 1: Tensor model1/encoder/layer0/fw/milstm/i2h_w of type float. Dims: (2048,256,):
I1017 14:58:14.246176 3686541 profile_observer_gpu.cc:33] Input 2: Tensor model1/encoder/layer0/fw/milstm/i2h_b of type float. Dims: (2048,):
I1017 14:58:14.246217 3686541 profile_observer_gpu.cc:44] Argument 0: name: "use_cudnn" i: 1
I1017 14:58:14.246271 3686541 profile_observer_gpu.cc:44] Argument 1: name: "cudnn_exhaustive_search" i: 0
I1017 14:58:14.246338 3686541 profile_observer_gpu.cc:44] Argument 2: name: "order" s: "NHWC"
I1017 14:58:14.246372 3686541 profile_observer_gpu.cc:44] Argument 3: name: "axis" i: 2
I1017 14:58:14.246418 3686541 profile_observer_gpu.cc:44] Argument 4: name: "quantization_scheme" i: 1
I1017 14:58:14.246470 3686541 profile_observer_gpu.cc:53] Output 0: Tensor model1/encoder/layer0/fw/milstm/i2h of type float. Dims: (17,1,2048,):
I1017 14:58:14.246596 3686541 profile_observer_gpu.cc:61] Cost (flops, bytes_read, bytes_written):
I1017 14:58:14.246649 3686541 profile_observer_gpu.cc:62]        17860608 2122752 139264
I1017 14:58:14.246677 3686541 profile_observer_gpu.cc:64] --------- Finished operator FC in 0.764221 ms ---------
```
Example output at net level:
```
I1017 11:13:44.675585 3146691 profile_observer_gpu.cc:165] ================ Detailed stats for net model0/encoder/layer0/bw/milstm ================
I1017 11:13:44.675662 3146691 profile_observer_gpu.cc:167] Cost (flops, bytes_read, bytes_written) per operator type:
I1017 11:13:44.675706 3146691 profile_observer_gpu.cc:169]        20992000 42045440 81920 FC
I1017 11:13:44.675745 3146691 profile_observer_gpu.cc:169]           20480 163840 81920 Mul
I1017 11:13:44.675824 3146691 profile_observer_gpu.cc:169]           20480 163840 81920 Sum
I1017 11:13:44.675878 3146691 profile_observer_gpu.cc:169]               0 0 0 ElementwiseLinear
I1017 11:13:44.675909 3146691 profile_observer_gpu.cc:169]               0 0 0 LSTMUnit
I1017 11:13:44.675958 3146691 profile_observer_gpu.cc:169]               0 0 0 rnn_internal_apply_link
```

Reviewed By: mdschatz

Differential Revision: D10428917

fbshipit-source-id: 7c100e551bdd3ac8d7c09be12c72d70a2d67cae1
2018-11-16 18:53:28 -08:00
a1fa9d8cf9 CircleCI: fix NCCL install (#14124)
Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not in `build.sh`; this PR is trying to figure out why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14124

Reviewed By: teng-li

Differential Revision: D13112483

Pulled By: yf225

fbshipit-source-id: 5f65997586648805cf52217a261389625b5535e1
2018-11-16 18:53:26 -08:00
eeb3e67eeb Fixed MPI build with higher version of GCC (#14122)
Summary:
This appeared once I enabled -Werror in the c10d build. Good to catch this and fix it.

Should fix https://github.com/pytorch/pytorch/issues/14078 and https://github.com/pytorch/pytorch/issues/13962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14122

Differential Revision: D13110678

Pulled By: teng-li

fbshipit-source-id: f4c19e16976d65debbd33ed59e17ddbaa19f765a
2018-11-16 18:53:24 -08:00
778e23606b multiprocessing.spawn python version check (#14039)
Summary:
This will be super helpful to the user
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14039

Differential Revision: D13089200

Pulled By: teng-li

fbshipit-source-id: 29e7507bd8fe5a0c58a85c52f976bfca282b4c1b
2018-11-16 18:53:23 -08:00
ce6192a21f Don't python bind _thnn_ functions. (#14101)
Summary:
This is needed for moving nn functions to native functions, but since some functions are already named
this way, I'm going to stop binding pre-emptively so we can check if there are any current dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14101

Differential Revision: D13102219

Pulled By: gchanan

fbshipit-source-id: 6bbcca33a03ab1bf648f1b73cadfe84339fa3050
2018-11-16 17:18:08 -08:00
55e1b1ec3e Fix docs/cpp/requirements.txt (#14121)
Summary:
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14121

Differential Revision: D13108063

Pulled By: goldsborough

fbshipit-source-id: 35cf65ba776e8826c5cab7ae6d3a2d446f87e7cc
2018-11-16 14:56:30 -08:00
8610ff1072 Allow cooperative structured objects to be passed modules in tracing (#13961)
Summary:
Before this patch, the JIT did not allow a Module's forward to take
structured objects.
This patch allows cooperative objects to do so.
Cooperative means:
- It has a method self._jit_unwrap() that returns (a list/tuple of)
  tensors. These are then used in _iter_tensors.
- It has a method self._jit_wrap(flattened_input) that takes
  (a list/tuple of?) the flattened_input (potentially more than it needs)
  and returns itself (updated) and the unconsumed flattened_inputs.
  This is then used in the _unflatten mechanism.

This is all it takes to permit maskrcnn-benchmark to use
its structured BoxList/ImageList types and trace it without calling
the .forward directly.
I'll push a model working with this patch in
https://github.com/facebookresearch/maskrcnn-benchmark/pull/138

I must admit I haven't fully checked whether ONNX needs changes before it, too, can profit, but I am hopeful that anything currently usable remains so.

fmassa zdevito

So the main downside that I'm aware of is that people will later want to use more elaborate mechanisms, but I think this could be done by just amending what wrap/unwrap are returning / consuming.
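A sketch of a "cooperative" container following the protocol above; the class and field names are illustrative, loosely modeled on maskrcnn-benchmark's BoxList:

```python
import torch

class BoxListLike(object):
    def __init__(self, boxes):
        self.boxes = boxes  # a Tensor of box coordinates

    def _jit_unwrap(self):
        # expose the underlying tensors so the tracer can flatten us
        return [self.boxes]

    def _jit_wrap(self, flattened_input):
        # consume what we need and hand back the unconsumed remainder
        self.boxes = flattened_input[0]
        return self, flattened_input[1:]
```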
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13961

Differential Revision: D13103927

Pulled By: soumith

fbshipit-source-id: 2cbc724cc4b53197388b662f75d9e601a495c087
2018-11-16 14:02:13 -08:00
fb6535ec70 Add SharedDataset (#13800)
Summary:
This PR adds a `SharedDataset` to the C++ frontend data API, which allows wrapping a shared_ptr to a dataset into a class that conforms to the `Dataset` interface (with `get_batch`). This enables use cases where a custom dataset is (1) thread-safe and (2) expensive to copy. All workers will reference a single instance of this dataset. No additional copies are incurred.

jaliyae apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13800

Differential Revision: D13075610

Pulled By: goldsborough

fbshipit-source-id: 4ffdfd7959d49b042c0e254110085f62a0bfeb6c
2018-11-16 13:07:10 -08:00
96e5d23bad remove dynamic initialization warning (#13913) (#13967)
Summary:
Removed assignment in the default constructor.
Removed static shared memory and used dynamic shared memory instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13967

Differential Revision: D13089996

Pulled By: soumith

fbshipit-source-id: 2a218b909c849bed39636b45a02d10ebc279a0b0
2018-11-16 13:04:22 -08:00
5b1b8682a3 Missing .decode() after check_output in cpp_extensions (#13935)
Summary:
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13935

Differential Revision: D13090852

Pulled By: goldsborough

fbshipit-source-id: 47da269d074fd1e7220e90580692d6ee489ec78b
2018-11-16 12:16:29 -08:00
8e91da4cb3 Windows shared build (#13550)
Summary:
Hi guys,

I'd like to build Caffe2 with more supported options on Windows with Microsoft Visual Studio.
This is the first pull request.
Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cudnn is 7.0.5; glog, gflags and lmdb are supported on my system.
Python is 3.5, and Detectron works from the Python interface as well.
It was even possible to debug Detectron code and step into caffe2_gpu.dll with PDBs built.

What is disappointing is that the c10/experimental ops don't build with this Visual Studio generator, so I added a special option, INCLUDE_EXPERIMENTAL_C10_OPS (default ON), to deal with it in build_windows_shared.bat.

After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550

Reviewed By: ezyang

Differential Revision: D13042597

Pulled By: orionr

fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
2018-11-16 12:16:28 -08:00
2c21de2007 Make JOIN_TIMEOUT longer for ppc64le (#14107)
Summary:
This should resolve the issue on ppc64le where test_proper_exit (__main__.TestDataLoader) FAILs. It only happens when the CI build machine is very busy and the test times out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14107

Differential Revision: D13103859

Pulled By: soumith

fbshipit-source-id: 268be80b59840853c5025f3211af272f68608fe5
2018-11-16 12:12:58 -08:00
c192788188 Log error from the net's run (#14035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14035

Log the error message in case the net's run fails

Reviewed By: andrewwdye

Differential Revision: D13085431

fbshipit-source-id: d79f76782410cd3a5bd2d8d7f5fb1e535d821051
2018-11-16 12:06:50 -08:00
0d7a986da1 Change hip filename extension to .hip (#14036)
Summary:
xw285cornell

- To give hip files a unique filename extension, we change them from _hip.cc to .hip (the only blessed option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Change to using the host compiler to compile .cc|.cpp files. Previously we used hcc to compile them, which was unnecessary.
- Change the hipify script to not replace "gpu" with "hip" in the filenames of the generated hipified files. Previously we did this because hcc has a bug when linking files that have the same filename. We have now changed to using the host linker, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036

Reviewed By: xw285cornell

Differential Revision: D13091813

Pulled By: bddppq

fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
2018-11-16 11:55:59 -08:00
30018fcd0b Enable Caffe2 ROCm test on centos (#14090)
Summary:
xw285cornell petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14090

Differential Revision: D13096874

Pulled By: bddppq

fbshipit-source-id: b471c6e4db95cd51567745a2f758d58bba7eafad
2018-11-16 11:51:58 -08:00
5a53861d3a Enable Caffe2 test on centos (#14091)
Summary:
Turns out we don't have any centos test CI job
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14091

Differential Revision: D13104722

Pulled By: bddppq

fbshipit-source-id: 22fe92ad4b7f2c391eea16b8b95658fa1ee605e2
2018-11-16 11:51:56 -08:00
1256cbaa69 Relax limits for gradients in test_jit's checkGraph (#14094)
Summary:
- This should help TestJit.test_lstm_fusion_concat_cuda
  to be less flaky. (Checked on manual_seed 0..99)
  Fixes: #14026
- Revert the renaming of test_fused_abs that was introduced
  to game the order of tests to avoid the flakiness above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14094

Differential Revision: D13100174

Pulled By: soumith

fbshipit-source-id: 91bb63b07a960a81dddfc0bf25c67696c0f6c46d
2018-11-16 11:43:52 -08:00
2983998bb3 add torch-python target (#12742)
Summary:
This is the next minimal step towards moving _C into cmake. For now,
leave _C in setup.py, but reduce it to an empty stub file. All of its
sources are now part of the new torch-python cmake target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12742

Reviewed By: soumith

Differential Revision: D13089691

Pulled By: anderspapitto

fbshipit-source-id: 1c746fda33cfebb26e02a7f0781fefa8b0d86385
2018-11-16 11:43:48 -08:00
cb86ae304e alias annotation parsing #2 (#14053)
Summary:
hopefully this one doesn't break master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14053

Differential Revision: D13093406

Pulled By: suo

fbshipit-source-id: 8fed44f1a3d463748726cb14acac2ea53dedf29b
2018-11-16 11:39:25 -08:00
77c2f4d0d7 Make THPDtype_New error instead of truncate (#14103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14103

Addressing T34828781, we change THPDtype_New so that it throws a RuntimeError if the length of the name is greater than the buffer size (DTYPE_NAME_LEN), instead of truncating the string to fit the buffer.

Reviewed By: ezyang

Differential Revision: D13094600

fbshipit-source-id: d0dbf8fdfa342630c31f4d8ca7230d5f24a1254a
2018-11-16 11:35:18 -08:00
7c053b7e64 Add filler for SparseLengthsWeightedSum (#13949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13949

This diff adds filler support for the `SparseLengthsWeight*` ops. It does 3 things:
1. Adds the fillers for the `SparseLengthsWeight*` ops.
2. Adds filling heuristics that consider the path `LengthsRangeFill` -> `Gather` -> `SparseLengthsWeightedSum`, where the length input is shared by `LengthsRangeFill` and `SparseLengthsWeightedSum`. We therefore need to carefully bound the value of that length input so that `Gather` does not index out of bounds into its weight input.
3. Fixes and simplifies the logic of `math::RandFixedSum`, where we just keep rejecting a generated value if it violates the invariants.

Reviewed By: highker

Differential Revision: D13048216

fbshipit-source-id: bfe402e07e6421b28548047d18b298c148e0ec87
2018-11-16 11:31:05 -08:00
3c7b575a14 Update ATen doc with optional syntax (#14086)
Summary:
Update the readme to reflect the recent optional syntax change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14086

Differential Revision: D13096114

Pulled By: wanchaol

fbshipit-source-id: 713834d4d92021e1c7a31f3a56a00fb7da58c348
2018-11-16 10:03:24 -08:00
562f61a662 Add missing space in stft doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14092

Reviewed By: soumith

Differential Revision: D13100177

Pulled By: SsnL

fbshipit-source-id: 4eeaa3d0c04212516941d8d5a266aafb53bd9672
2018-11-16 09:57:06 -08:00
e4bb56570c Preemptively test for out-of-order length. (#13933)
Summary:
torch.nn.utils.rnn.pack_padded_sequence segment fault if not in
decreasing order #13324

We were seeing this segfault when throwing; pre-emptively checking
avoids it:

*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
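For reference, the contract being enforced: lengths must be in decreasing order. Out-of-order lengths now raise a clean error instead of segfaulting; the tensor shapes below are assumptions for illustration:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

seqs = torch.randn(3, 5, 8)  # (batch, max_len, features)
packed = pack_padded_sequence(seqs, [5, 4, 3], batch_first=True)  # OK
# pack_padded_sequence(seqs, [5, 3, 4], batch_first=True)  # now raises
```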
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13933

Differential Revision: D13090389

Pulled By: nairbv

fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
2018-11-16 08:39:05 -08:00
c7a247facf nomnigraph - support subgraph visualization (#13795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13795

Add the ability to generate a dot string for a single subgraph, plus python bindings (which is pretty useful for model exploration in Python).
Restructure the DotGenerator class a bit to make this feature easy to implement.

Reviewed By: bwasti

Differential Revision: D13010512

fbshipit-source-id: 825665438394b7e6968ab6da167b477af82a7b62
2018-11-16 08:19:20 -08:00
d7b95dda51 nomnigraph - easy - expose hasProduce(NodeRef) to python (#14075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14075

Expose hasProduce(NodeRef) to python

Reviewed By: bwasti

Differential Revision: D13092930

fbshipit-source-id: f1ec06e73e0f5f6a16ad0cbb7d2e3e499a861d8e
2018-11-16 08:19:18 -08:00
e7f5fceb99 nomnigraph - easy - expose inducesEdges and addNode to python's NNSubgraph (#14074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14074

Expose inducesEdges and addNode to python's NNSubgraph. This makes it easy to manually construct an NNSubgraph in python.

Reviewed By: bwasti

Differential Revision: D13092885

fbshipit-source-id: a94ed0b318162e27e3a4b5a4954eb6d169da7405
2018-11-16 08:19:16 -08:00
7b0f674367 Two small improvements to TorchConfig.cmake (#13849)
Summary:
- Fix the test for TORCH_INSTALL_PREFIX in the environment.
  The previous version didn't actually work.
- Add a guess path to find_package for Caffe2. I'd suspect that
  it's close to the Torch one.

I noticed these while compiling PyTorch custom ops, in particular for the C++ side when you don't want to go through Python.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13849

Differential Revision: D13090186

Pulled By: ezyang

fbshipit-source-id: cfe98900ab8695f008506a8d0b072cfd9c673f8f
2018-11-16 07:41:57 -08:00
1b1cdd944c Keep ModuleList consistent with python list in __setitem__ function. (#13102)
Summary:
The `ModuleList` method `__setitem__` has an implicit risk:
```
In [26]: mlist = nn.ModuleList([nn.ReLU(), nn.Conv2d(10, 10, 3, 1)])

In [27]: mlist
Out[27]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
)

In [28]: mlist[-1] = nn.ReLU()

In [29]: mlist
Out[29]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
  (-1): ReLU()
)

In [30]: mlist[-1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-229d1b6823a0> in <module>()
----> 1 mlist[-1]

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py in __getitem__(self, idx)
    134             return ModuleList(list(self._modules.values())[idx])
    135         else:
--> 136             return self._modules[self._get_abs_string_index(idx)]
    137
    138     def __setitem__(self, idx, module):

KeyError: '2'

```

The method can be modified as follows to fix this:
```
    def __setitem__(self, idx, module):
        idx = self._get_abs_string_index(idx)
        return setattr(self, str(idx), module)
```

```
In [31]: class NewModuleList(nn.ModuleList):
    ...:     def __setitem__(self, idx, module):
    ...:         idx = self._get_abs_string_index(idx)
    ...:         return setattr(self, str(idx), module)
    ...:

In [32]: mlist = NewModuleList([nn.ReLU(), nn.Conv2d(10, 10, 2, 1)])

In [33]: mlist[-1] = nn.ReLU()

In [34]: mlist
Out[34]:
NewModuleList(
  (0): ReLU()
  (1): ReLU()
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13102

Differential Revision: D13092480

Pulled By: ezyang

fbshipit-source-id: 7ff7688f66e44bbd263a10d2d09db7bb0df4b749
2018-11-16 07:39:26 -08:00
a3f39f1ebb Fix randint docs (#14083)
Summary: Closes #14079

Differential Revision: D13095904

Pulled By: soumith

fbshipit-source-id: e39319c5326bfdf6f401eaddebe94474349901c3
2018-11-16 03:04:02 -08:00
2fe4711eb4 Revert "Remove OptionsGuard from ATen (#13738)" (#14082)
Summary:
This reverts commit 37cb357d8da3427900b8f72f6de7e77b77dcdbae.

Try to see if it unbreaks master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14082

Differential Revision: D13095888

Pulled By: bddppq

fbshipit-source-id: c728f80f233b4d9daaf65f43202d8104651029a9
2018-11-15 23:47:36 -08:00
45fd77d3b7 Adding GLOO_SOCKET_IFNAME env to allow user set gloo device (#14065)
Summary:
Address https://github.com/pytorch/pytorch/issues/14063

This is a lot easier to use, and follows the NCCL convention, since NCCL provides the similar NCCL_SOCKET_IFNAME.

We can later document this better.

Tested on my two hosts; it works out of the box.
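A minimal sketch of picking the interface from Python before process-group init; setting the variable in the shell before launch works equally well, and the interface name here is an assumption:

```python
import os

os.environ["GLOO_SOCKET_IFNAME"] = "eth0"  # NIC to bind the Gloo device to

import torch.distributed as dist
dist.init_process_group(backend="gloo", init_method="env://")
```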
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14065

Differential Revision: D13095522

Pulled By: teng-li

fbshipit-source-id: 131dff212626f1aab7e752427f1b684845b909dc
2018-11-15 22:33:56 -08:00
3808e9fad3 Caffe2: Fix for creating entries of external_input in predic_net (#12979)
Summary:
Currently, after performing an export, the predict_net proto ends up with two
  entries of external_input for the input data. This happens because
  external_input is extended twice: once separately using the input blob, and
  once by extending it with all entries of external_input from the proto, in which the input blob is already included.

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12979

Differential Revision: D12916349

Pulled By: soumith

fbshipit-source-id: 4d4a1c68c0936f8de3f4e380aea1393fe193cd2d
2018-11-15 22:33:50 -08:00
1e8aeb0bee fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14076

Differential Revision: D13095528

Pulled By: suo

fbshipit-source-id: 78d08719ad5579dc0d6bb9563972df393e4286fe
2018-11-15 22:10:06 -08:00
3a15de9e44 Fix CUDA_tensor_apply1 base case (#14056)
Summary:
I got some build errors when modifying the `bernoulli_tensor_cuda_kernel` in my Generator refactor https://github.com/pytorch/pytorch/pull/13070. It turns out the function signature for `CUDA_tensor_apply1` was a little wrong. This PR fixes it. Below are the code and the error I was getting before this patch:

Code:
```
template<typename scalar_t, typename prob_t>
void bernoulli_tensor_cuda_kernel(
    at::Tensor& ret, const at::Tensor& p,
    std::pair<uint64_t, uint64_t> seeds) {
  // The template argument `4` below indicates that we want to operate on four
  // element at each time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(scalar_t& v1, const prob_t& p1) {
      at::cuda::Philox4_32_10 engine(
                                seeds.first,
                                blockIdx.x * blockDim.x + threadIdx.x,
                                seeds.second);
      auto x = at::cuda::standard_uniform_distribution(engine);
      assert(0 <= p1 && p1 <= 1);
      v1 = static_cast<scalar_t>(x <= p1);
    }
  );
}
```

Error:
```
ov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDAApplyUtils.cuh(236): error: no suitable conversion function from "const lambda [](uint8_t &)->void" to "int" exists
Nov 15 23:43:03           detected during:
Nov 15 23:43:03             instantiation of "void at::cuda::<unnamed>::ApplyOp1<Op, scalar, IndexType, ADims, remaining_steps, Offsets...>::apply(at::cuda::detail::TensorInfo<scalar, IndexType> &, const Op &, int, IndexType, Offsets...) [with Op=lambda [](uint8_t &)->void, scalar=uint8_t, IndexType=unsigned int, ADims=1, remaining_steps=1, Offsets=<>]"
Nov 15 23:43:03 (282): here
Nov 15 23:43:03             instantiation of "void at::cuda::<unnamed>::kernelPointwiseApply1<Op,scalar,IndexType,ADims,step>(at::cuda::detail::TensorInfo<scalar, IndexType>, IndexType, Op) [with Op=lambda [](uint8_t &)->void, scalar=uint8_t, IndexType=unsigned int, ADims=1, step=1]"
Nov 15 23:43:03 (735): here
Nov 15 23:43:03             instantiation of "__nv_bool at::cuda::CUDA_tensor_apply1<scalar,step,Op>(at::Tensor, Op, at::cuda::TensorArgType) [with scalar=uint8_t, step=1, Op=lambda [](uint8_t &)->void]"
Nov 15 23:43:03 (774): here
Nov 15 23:43:03             instantiation of "__nv_bool at::cuda::CUDA_tensor_apply1<scalar,Op>(at::Tensor, Op, at::cuda::TensorArgType) [with scalar=uint8_t, Op=lambda [](uint8_t &)->void]"
Nov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Distributions.cu(118): here
Nov 15 23:43:03             instantiation of "void <unnamed>::bernoulli_scalar_cuda_kernel<scalar_t>(at::Tensor &, double, std::pair<uint64_t, uint64_t>) [with scalar_t=uint8_t]"
Nov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Distributions.cu(227): here
Nov 15 23:43:03
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14056

Differential Revision: D13095362

Pulled By: soumith

fbshipit-source-id: 6416bc91616ec76036479062a66517557a14d1b9
2018-11-15 21:33:07 -08:00
037d6b697b Add ResizeNearest DNNLOWP op (#13940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13940

As in title

Reviewed By: jspark1105

Differential Revision: D13054325

fbshipit-source-id: 81af5f095a1aca92d4b5e1fe0e71ae2f21b43922
2018-11-15 21:03:01 -08:00
f66cb02016 Turn fbgemm off by default for pytorch (#14048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14048

Setting USE_FBGEMM to OFF by default until we figure out how to properly separate the avx2 code. See [this issue](https://github.com/pytorch/pytorch/issues/13993). Pytorch can still be compiled with fbgemm by using USE_FBGEMM=ON.

Reviewed By: jspark1105

Differential Revision: D13090454

fbshipit-source-id: 6e0e92612e4362a306e376df3dc33e8edeb066e9
2018-11-15 18:42:16 -08:00
f17b2fdf1b Fixed THD DistributedDataParallel not picklable (#14051)
Summary:
This fixed https://github.com/pytorch/pytorch/issues/12261
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14051

Differential Revision: D13091703

Pulled By: teng-li

fbshipit-source-id: 16eb85a259c981f3cacd2fbaecc0edbae292e358
2018-11-15 18:10:47 -08:00
37cb357d8d Remove OptionsGuard from ATen (#13738)
Summary:
Deletes the `OptionsGuard` from ATen. This works towards the goal of reworking `DefaultTensorOptions`. `OptionsGuard` is troublesome because it relies on mutating thread local state. This PR fixes those code locations and then deletes the `OptionsGuard`.

ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13738

Differential Revision: D13000962

Pulled By: goldsborough

fbshipit-source-id: c8143ee75070c2280f5fd1d9af86f8ce14279b72
2018-11-15 17:37:27 -08:00
8f4dc192b6 Fix DataLoaderTest.EnforcesOrderingAmongThreadsWhenConfigured (#14038)
Summary:
I think this will be it. For one, the previous test was broken because it was returning the thread id instead of the sample index (which is the thing whose ordering is enforced); just turning the number of threads up from 4 to 10 made this very obvious. I also think there was a race condition, which may or may not have surfaced, in that nothing stopped one worker from getting multiple batches, which would break the whole ordering logic. I've added a barrier struct so that workers wait for all workers to be in the `get_batch` function before actually doing something (see the sketch below).
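For illustration, a minimal single-use barrier of the kind described (the names and structure here are hypothetical, not the actual test helper):
```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Each worker calls wait() on entering get_batch(); nobody proceeds until
// all workers have arrived, so no worker can grab a second batch first.
struct Barrier {
  explicit Barrier(std::size_t count) : remaining_(count) {}

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (--remaining_ == 0) {
      cv_.notify_all();                      // last arrival releases everyone
    } else {
      cv_.wait(lock, [this] { return remaining_ == 0; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t remaining_;
};
```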

Fixes https://github.com/pytorch/pytorch/issues/14002

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14038

Differential Revision: D13088132

Pulled By: goldsborough

fbshipit-source-id: 4bded63756c6a49502ee07ef8709a03073e7e05f
2018-11-15 17:30:41 -08:00
f930c4307c Clean up executor's execution flags (#13869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13869

Remove unused flags and consolidate them into one struct

Reviewed By: yinghai

Differential Revision: D13032207

fbshipit-source-id: 2cef093589036238732099e3851a97e739b5fd55
2018-11-15 17:11:51 -08:00
874a8a321b Fix out of order member fields initializations (#14015)
Summary:
xw285cornell

Unfortunately it's not easy to add -Werror=reorder flag since there are out of order initializations in thrust headers as well, and the rocm cmake macro hip_include_directories doesn't offer a way to include headers as external headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14015

Reviewed By: soumith

Differential Revision: D13081104

Pulled By: bddppq

fbshipit-source-id: 2540421cb29cf556c79f2d86c460bde6ea5a182e
2018-11-15 17:11:50 -08:00
31d41a983a Revert D13088038: [pytorch][PR] [jit] extend alias annotations
Differential Revision:
D13088038

Original commit changeset: 49dc5d0e9cd4

fbshipit-source-id: b77e4607f3cbd9c202c522a436f90e9a98acd4b4
2018-11-15 16:55:11 -08:00
6d378d3740 Updating C++ documentation to PyTorch theme. (#13791)
Summary:
Updates C++ documentation to the PyTorch Sphinx theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13791

Reviewed By: soumith

Differential Revision: D13013908

Pulled By: brianjo

fbshipit-source-id: 253a91c6784ad72aa1c37426cd4a945061a60fec
2018-11-15 16:45:52 -08:00
0d29846d5e Convert more weak functions (#14003)
Summary:
Same deal as #13707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14003

Differential Revision: D13076403

Pulled By: driazati

fbshipit-source-id: eb3cb3b2c31caf1de591b613bdc4c9a6ed4e1767
2018-11-15 16:45:50 -08:00
c5afad5579 Fix skip logic in caffe_translator_test.py (#13627)
Summary:
Avoid false failure by checking for the presence of the test data in setup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13627

Differential Revision: D13090324

Pulled By: ezyang

fbshipit-source-id: e85571943d168c0007212d7b1a5b99ffa0c39235
2018-11-15 16:45:49 -08:00
0e93500841 Remove async_polling (#13825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13825

async_polling was an intermediate step towards async_scheduling and is not used

Reviewed By: yinghai

Differential Revision: D13019059

fbshipit-source-id: eee6ba53e7f476ddb481afba3bf1768303864d32
2018-11-15 16:23:15 -08:00
0573169e23 Import a method from an python_print string (#13959)
Summary:
* Add hooks to get a callback whenever a valid graph is produced in the compiler or through tracing. These hooks can be used to pretty_print and then reparse every graph our tests produce to check that the serialization function works correctly. Currently this is guarded by an environment variable since there are a few remaining failures.
* Fix printing bugs: True and False rather than 1 and 0, print 0. for floating point zero
* Change behavior of NoneType. It is now no longer a subtype of Optional but instead implicitly converts to it, returning a prim::Node with an Optional[T] type for some specific T. This allows functions like `_unwrap_optional` to correctly match against a None while still deriving the right type.
* Fix a bug where empty blocks did not correctly emit "pass" in printer.
* Fix a bug where prim::Undefine sometimes cannot be printed as None because it is being used in a schema-less op. This should be fixable once Optional[T] always uses the same None object.
* Other minor printing bugs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13959

Reviewed By: jamesr66a

Differential Revision: D13073519

Pulled By: zdevito

fbshipit-source-id: 4167a6b614f2e87b4d21823275a26be5ba4fc3dd
2018-11-15 16:11:37 -08:00
84d464f8f9 Revert "Upgrade mkldnn bridge to reduce overhead of bridge itself (#1… (#14040)
Summary:
…2164)"

This reverts commit 4b7c6150d848d134d1fe850e777dc68321d35465.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14040

Differential Revision: D13089531

Pulled By: yinghai

fbshipit-source-id: 2114b36111dab6f179c02921bbc9bd382ef461bf
2018-11-15 15:34:15 -08:00
90b0c4f43d Tensor construction: combine Resize+mutable_data - 2/4 (#13943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13943

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13852

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13054815

fbshipit-source-id: e89c2e69217880980187f2befb844c277e51c1e0
2018-11-15 15:34:14 -08:00
136f5c9fe1 Replaced using declaration with explicit constructors 3/3 (#13875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13875

This replaces a using declaration with an explicit constructor

Reviewed By: mnovakovic

Differential Revision: D13033260

fbshipit-source-id: ce4cc5667ee66abdeebd1e49466c3cf3a65ffb96
2018-11-15 14:52:47 -08:00
3fbb753512 Revert D12873145: [pt1][tensor][refactor] FeedTensor returns a Tensor
Differential Revision:
D12873145

Original commit changeset: 653735c20d61

fbshipit-source-id: aa6e40a6a24c6f90acbe87b32b3be0020e2584f8
2018-11-15 14:52:46 -08:00
d91c686c33 extend alias annotations (#13632)
Summary:
Grab bag of additions to alias annotations that were useful when writing the alias analysis pass. Not very organized since these were mostly split off from that PR.
- Switch alias sets to actual sets, since we will want to union them.
- Correctly parse alias set unions `a|b`, and correctly parse wildcards
- Move writes into `AliasInfo`, which cleans up some code that was passing a `writes` vector everywhere and simplifies tracking aliased writes during analysis.
- Change Tensor list extraction ops to return wildcard tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13632

Differential Revision: D13088038

Pulled By: suo

fbshipit-source-id: 49dc5d0e9cd4895427fea3a87b0ec325bd5fe437
2018-11-15 14:23:40 -08:00
c7e0db140e use fabs instead of absf in fuser code for aten::abs (#13985)
Summary:
absf didn't work for CUDA

Fixes: #13971
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13985

Differential Revision: D13084601

Pulled By: soumith

fbshipit-source-id: 0027ee719ae2b6a2bfce9c26f21db9c5e6159686
2018-11-15 13:23:59 -08:00
c3578b561c Skip all builtin functions when importing names from _C._VariableFunctions to torch (#13884)
Summary:
We don't want builtin functions of `_C._VariableFunctions` to replace those of `torch`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13884

Reviewed By: ezyang

Differential Revision: D13044686

Pulled By: yf225

fbshipit-source-id: 23657d47a4e2fd8ee41103cd6a13c639ce107f67
2018-11-15 13:23:57 -08:00
4b7c6150d8 Upgrade mkldnn bridge to reduce overhead of bridge itself (#12164)
Summary:
Upgrade mkldnn bridge to reduce overhead of bridge itself
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12164

Reviewed By: yinghai

Differential Revision: D10159149

Pulled By: wesolwsk

fbshipit-source-id: 5ede1130c00a2cd3afe301dcb94bcb89e01bc5a2
2018-11-15 12:54:06 -08:00
3de0fd846f Fix converter to accept const NetDef&
Summary: convertToNNModule didn't accept `const NetDef&`; fixed this.

Reviewed By: duc0

Differential Revision: D13057450

fbshipit-source-id: dc6fa2c86077a56b955f15c369b941a2d32de911
2018-11-15 12:18:11 -08:00
5639332a28 fix the deeptext issue (#14005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14005

the partial initialization of tensor is no longer supported, we need to fix multiple places

Reviewed By: hl475

Differential Revision: D13078206

fbshipit-source-id: a1be2bd2a9f573db54e1366a0d7a17cc2e0db0c9
2018-11-15 12:13:45 -08:00
b8de8f6261 Refactor tensor construction in onnxifi_op
Summary: att

Reviewed By: ezyang

Differential Revision: D13028624

fbshipit-source-id: efd8dee5d59f26830a15bb17211eee373f6c8dee
2018-11-15 11:23:21 -08:00
464c0c2204 Use realpath for loaded libraries (#13936)
Summary:
I noticed `CDLL` needs an absolute path (when calling `torch.ops.load_library`)
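A minimal sketch of the resolution step (the helper name is hypothetical; assumes a POSIX `realpath`):
```cpp
#include <climits>
#include <cstdlib>
#include <string>

// Resolve a possibly-relative library path to an absolute one before
// handing it to the loader (Python's CDLL has the same requirement).
std::string to_realpath(const std::string& path) {
  char buf[PATH_MAX];
  if (::realpath(path.c_str(), buf) == nullptr) {
    return path;  // fall back to the original path on failure
  }
  return std::string(buf);
}
```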

zdevito soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13936

Differential Revision: D13075605

Pulled By: goldsborough

fbshipit-source-id: 297c490cfa3bfaf540b95a9c2644d9153abe4c32
2018-11-15 11:23:20 -08:00
17b2d2d373 fix TensorPrinter when tensor have 0 size. (#13986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13986

If total_count == 0, it crashes on:

  values_stream << tensor_data[total_count - 1];

Reviewed By: jerryzh168

Differential Revision: D13066438

fbshipit-source-id: b7a2d681ca0cf5b68d78872c94fac6de9c5de2dc
2018-11-15 07:51:13 -08:00
4574ea3bec Make RNN operator handle exceptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13997

Reviewed By: dzhulgakov, bddppq

Differential Revision: D13072518

Pulled By: ilia-cher

fbshipit-source-id: c4fd897038b6dca41db652b9e063fc12d98f6d07
2018-11-15 00:48:22 -08:00
6d094224b9 Fix optional import/export, export multi-margin-loss (#13877)
Summary:
This PR does two things:

1. It fixes the optional import/export to include any type, including tensor types (previously only base types were supported); this is essential to unblock optional tensor type annotations in our test logic.
2. It exports the multi_margin_loss functional to serve as an example of the optional undefined tensor use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13877

Differential Revision: D13076090

Pulled By: wanchaol

fbshipit-source-id: c9597295efc8cf4b6462f99a93709aae8dcc0df8
2018-11-15 00:45:22 -08:00
ddbd87e310 Build with -Werror (#13998)
Summary:
Also fixed a warning

As a thought while trying to solve #12854
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13998

Reviewed By: pietern

Differential Revision: D13078615

Pulled By: teng-li

fbshipit-source-id: eb25c429d7dd28b42e4e95740a690d5794a0c716
2018-11-14 22:45:30 -08:00
5390ab1d52 Dont crash on 1d convolution (#13999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13999

Temporary mitigation for SEV3 https://our.intern.facebook.com/intern/sevmanager/view/s/168910/

Reviewed By: yinghai

Differential Revision: D13075307

fbshipit-source-id: 4df2bcc37b91900653443f7766d5bb080ca3f5a9
2018-11-14 22:38:00 -08:00
eb024cd1d0 don't throw in matchTypeVariables (#13989)
Summary:
Avoid throwing on match errors. In general, it's not good to throw when failure is expected.

But the real reason I'm doing this is that throwing makes it annoying to set a breakpoint on exceptions in my debugger 😛
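A sketch of the no-throw pattern (the struct and field names here are illustrative, not the actual JIT signature):
```cpp
#include <string>

// Expected match failures are reported through the return value rather
// than thrown, so a debugger breakpoint on exceptions stays quiet.
struct MatchResult {
  bool success;
  std::string reason;  // populated only on failure
};

MatchResult matchTypeVariables(bool types_compatible /* stand-in input */) {
  if (!types_compatible) {
    return {false, "could not match type variables"};  // no throw
  }
  return {true, ""};
}
```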
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13989

Differential Revision: D13069980

Pulled By: suo

fbshipit-source-id: 636d4371f8a5be45c935198b73cdea06275b1e9e
2018-11-14 21:45:19 -08:00
20e395a130 Fixed uninitialized warning (#14001)
Summary:
Fixing: https://github.com/pytorch/pytorch/issues/12014
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14001

Differential Revision: D13078583

Pulled By: teng-li

fbshipit-source-id: 6c8d663da81bc3e564f0643926d67260df828dd8
2018-11-14 21:37:11 -08:00
e3bb6ff334 Move c10 dispatcher prototype to c10/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13690

Reviewed By: dzhulgakov

Differential Revision: D12912235

fbshipit-source-id: 974b85790c23335be8130a50aa4692e3ddcd2bf9
2018-11-14 18:04:36 -08:00
4b0fc5200b Fix include paths for typeid.h (#13689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13689

Now that typeid.h lives in c10/util, the include paths should reflect that.

Reviewed By: ezyang

Differential Revision: D12912237

fbshipit-source-id: e54225f049f690de77cb6d5f417994b211a6e1fb
2018-11-14 18:04:09 -08:00
72da09bb4d Canonicalize THD includes with .. in them
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13980

Reviewed By: jerryzh168

Differential Revision: D13062706

fbshipit-source-id: 100e10d1bae7efc3e13f029708c2c1dd053ce074
2018-11-14 17:43:56 -08:00
7ea9c674bc migrate subgraph slicing to use moveBefore/moveAfter (#13862)
Summary:
Migrate the `CreateAutodiffSubgraphs` pass to use topologically-safe moves instead of DynamicDAG. This is to unify the interface that we use for determining safe node moves to prepare for mutability.

The pass looks a lot like GraphFuser now, and there's a lot of code duplication. I plan to pull common stuff out into a "subgraph manipulation utils" thing, but didn't want to clutter this PR.

Future steps:
- Get rid of code duplication (see above)
- Use DynamicDAG to back the `moveBefore/After` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13862

Differential Revision: D13072871

Pulled By: suo

fbshipit-source-id: 92e7880ef444e0aefd51df60964bba7feaf42ae0
2018-11-14 17:33:36 -08:00
2356c8d542 device inference for Adam (#13990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13990

To make sure the ITER blob lives on CPU.

Reviewed By: xianjiec

Differential Revision: D13056070

fbshipit-source-id: 148edbf745e50e886da3eb99d4e485d11c1924e2
2018-11-14 17:21:08 -08:00
fed8d8975a Various improvements to hipify_python.py (#13973)
Summary:
- Speed up hipify_python.py by blacklisting useless (and quite large)
  directory trees that it would otherwise recurse into

- Pass around relative paths instead of absolute paths.  This makes it
  easier to do filename matches based on the root of the tree.

- Redo the streaming output to contain more useful information

- Make it handle c10/cuda correctly, rewrite c10::cuda to
  c10::hip, and the header name from CUDAMathCompat.h to
  CUDAHIPCompat.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13973

Differential Revision: D13062374

Pulled By: ezyang

fbshipit-source-id: f0858dd18c94d449ff5dbadc22534c695dc0f8fb
2018-11-14 17:11:24 -08:00
02152c515e Ensure nn Losses check scalar vs non-scalar values.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13860

Reviewed By: ezyang

Differential Revision: D13029364

Pulled By: gchanan

fbshipit-source-id: 20f1330fa181e52aea1f879dc655a9a6f62b5f53
2018-11-14 16:46:27 -08:00
6811e32f03 Support exporting Gather and BatchGather to ONNX (#13987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13987

Gather and BatchGather are also used in sparse networks.

Reviewed By: bddppq, houseroad

Differential Revision: D13067290

fbshipit-source-id: e09572a5c4544768f9e1af48166f7c8d78127e63
2018-11-14 15:40:17 -08:00
7daa829bce Implement unsqueeze for sparse vectors (this also makes stack work out of the box)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13760

Differential Revision: D13065342

Pulled By: umanwizard

fbshipit-source-id: a5e2e80f87ffbbfdf8759b1b593ef34d290ae907
2018-11-14 15:23:05 -08:00
ff4f4a0a35 Retry test on "Address already in use" error (#13911)
Summary:
This fixes #13907.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13911

Differential Revision: D13046256

Pulled By: pietern

fbshipit-source-id: bab70cd73ef868e23d4857b06e72830ad29ddb4f
2018-11-14 15:23:03 -08:00
61a0df5af0 Canonicalize THC/THCTensorMasked.cuh include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13977

Reviewed By: jerryzh168

Differential Revision: D13062564

fbshipit-source-id: 77d42585198cd75bc8a2625787604552e5369787
2018-11-14 14:56:30 -08:00
01d606e048 Canonicalize TH/THRandom.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13975

Reviewed By: jerryzh168

Differential Revision: D13062526

fbshipit-source-id: 510e0ff5ce68c20c2f46bae71efa8e4355c6ce05
2018-11-14 14:56:27 -08:00
9e1655bb22 Canonicalize THCUNN/linear_upsampling.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13979

Reviewed By: jerryzh168

Differential Revision: D13062649

fbshipit-source-id: 28b2cbe97613b485ab11bf35be60ca6ee668bbef
2018-11-14 13:50:30 -08:00
af6d1ec52c Canonicalize THCUNN/common.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13978

Reviewed By: jerryzh168

Differential Revision: D13062631

fbshipit-source-id: 2b1b13c28ee8be603b0cdca46c7ac7f86317c39f
2018-11-14 13:30:27 -08:00
a7d43702d4 Canonicalize THCGenerate*.h includes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13976

Reviewed By: jerryzh168

Differential Revision: D13062604

fbshipit-source-id: 48b7e2a2bdf97c55820036db9a4ff18a1f4dbce2
2018-11-14 13:30:25 -08:00
f446c67e2f submodule update to fix compilation warnings (#13925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13925

Fixing compilation warnings; Already fixed in fbgemm repo so just updating submodule

Reviewed By: jianyuh

Differential Revision: D13048100

fbshipit-source-id: 568f0f90a5499b6f2cab525b2379299d1565bbae
2018-11-14 13:27:32 -08:00
587f769a99 Fix missing symbol linker error when using libtorch generated on windows : (#13672)
Summary:
Libtorch is missing some symbols when generated on windows, causing linker errors when using it.

It seems like there were some issues in the past with enabling   CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS to export all symbols during the build.
(See the link below :
    - Enabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS :  https://github.com/pytorch/pytorch/pull/3617?fbclid=IwAR084kOPgLUvYjpJMvGG_Q22IPcvmzlywamytdhxd5U3hELkESO6yM8BGfo
    - Disabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS :  https://github.com/pytorch/pytorch/issues/9092?fbclid=IwAR0QSeEcXNh8A1zrgCQvsEq-0S0GJvHBywhZ6kDvoHe6TeRUsTNRzzgXea0 and https://github.com/pytorch/pytorch/pull/9693?fbclid=IwAR2cSya4fbeHvF-BYkXk2NesXjQ3ZWg9vHJ3ivrT9GDJYqHSpg518KAMzW8 )

So enabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS is not an option, but some symbols are still missing for Libtorch to work.
We added some functions to TORCH_API in this PR, but we might be missing some.
(We also tried adding the whole structure Method  (struct TORCH_API Method { ... }) instead of adding the functions separately, but the build fails with a "one or more multiply defined symbols found" error)

Do you have any recommendations on how to detect functions that should/shouldn't be in TORCH_API, so the build is successful and the generated Libtorch has all the required exported symbols?

I also attached torch_exports_missing.txt, which contains the symbols that are exported with the CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS flag enabled but not in the current Libtorch version (obtained by running "dumpbin /EXPORTS torch.dll" on both torch.dll libraries and diffing the outputs).
Any symbol that could be missing from Libtorch should be in this list, but the list has more than 8000 symbols, and I am not sure which ones need to be exported and added to TORCH_API.

This PR currently exports the missing symbols for torch::jit::script::Method that appear in the attached list (with the exception of defaultSchemaFor and emit_call_to, which cause a "multiply defined symbols" error).

[torch_exports_missing.txt](https://github.com/pytorch/pytorch/files/2558466/torch_exports_missing.txt)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13672

Differential Revision: D12959348

Pulled By: soumith

fbshipit-source-id: ef7e85b047b3937dc6aa01ba67e4e01f8eae4eca
2018-11-14 12:00:36 -08:00
0478d32cb8 Move AlignOf, SmallVector and ArrayRef to c10.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13916

Reviewed By: smessmer

Differential Revision: D13046722

fbshipit-source-id: 1583d3170d60e22f0a535cd1fd56bdf928186f5d
2018-11-14 11:13:16 -08:00
4983397c02 Better documentation and warning (#13946)
Summary:
This is to address https://github.com/pytorch/pytorch/issues/12603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13946

Differential Revision: D13055254

Pulled By: teng-li

fbshipit-source-id: 20a206ebd3456eac9dc50584664c4bca3ee955d1
2018-11-14 10:41:46 -08:00
143ba72264 Move cosine_similarity to ATen (#12199)
Summary:
I'm now traveling and don't have access to a good computer to compile test by myself. Will see the outcome of CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12199

Differential Revision: D13062326

Pulled By: nairbv

fbshipit-source-id: 85873525caa94906ccaf2c739eb4cd55a72a4ffd
2018-11-14 10:41:44 -08:00
53c3a92a50 consistent rounding (#9)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13960

The vectorized code was rounding to even in halfway cases with _mm256_round_ps + (_MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC) (see more details in https://software.intel.com/en-us/node/523819), but we were still using std::round in a couple of places which does rounding away from zero in halfway cases.
With this diff, we use std::nearbyint in all scalar code (except a few cases where we don't care exact rounding mode and uses rint which is the fastest in general) to be more consistent. nearbyint is the same as what the vectorized code does only when the current rounding mode is FE_TONEAREST but in practice this is OK because we almost always use the default rounding mode FE_TONEAREST.
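A small standalone example of the halfway-case difference (standard C++ behavior, not PyTorch code):
```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::fesetround(FE_TONEAREST);  // the default rounding mode
  // std::round rounds halfway cases away from zero; std::nearbyint follows
  // the current mode (round-to-even here), matching the vectorized
  // _MM_FROUND_TO_NEAREST_INT behavior.
  std::printf("round(2.5)      = %.1f\n", std::round(2.5));       // 3.0
  std::printf("nearbyint(2.5)  = %.1f\n", std::nearbyint(2.5));   // 2.0
  std::printf("round(-0.5)     = %.1f\n", std::round(-0.5));      // -1.0
  std::printf("nearbyint(-0.5) = %.1f\n", std::nearbyint(-0.5));  // -0.0
  return 0;
}
```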

This is inspired by Marat's diff for mobile quantization.

Reviewed By: dskhudia

Differential Revision: D13017719

fbshipit-source-id: 6b8f99db7ea2e233aa2e3bd2adf622e03ed6258e
2018-11-14 10:21:42 -08:00
96663edca6 Remove the hip ignore; it conflicts with real in-tree HIP development. (#13972)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13972

Differential Revision: D13062253

Pulled By: ezyang

fbshipit-source-id: 4442b194bb08e4f718dff844743d23fd3a6dc8e9
2018-11-14 10:03:19 -08:00
35a24a9a94 Example with edge case 0 for torch.sign (#13771)
Summary:
The behavior of the edge case 0 is not self-evident for the `torch.sign` function (I personally expected a result of 1):
```python
>>> a = torch.tensor([0.7, -1.2, 0., 2.3])
>>> a
tensor([ 0.7000, -1.2000,  0.0000,  2.3000])
>>> torch.sign(a)
tensor([ 1., -1.,  0.,  1.])
```
This is not currently documented, I think it is worth it to give a simple example showing this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13771

Differential Revision: D13044520

Pulled By: ailzhang

fbshipit-source-id: c3011ccbdf1c13348f6c7242b06a9aa52ebc9204
2018-11-14 09:16:09 -08:00
dead6632b3 bug fix for 1D conv in NHWC layout (#13813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13813

Title says it all.

Reviewed By: hx89

Differential Revision: D13017652

fbshipit-source-id: e3cea6c7dee2878119d154bb9f3efbc329d7c0d5
2018-11-14 09:16:07 -08:00
4341dd2753 Move most sccalar checks from nn.yaml into THNN/THCUNN code. (#13906)
Summary:
This includes everything in nn.yaml except for convolutions, multi_margin_loss, multi_label_margin_loss, nll_loss, and nll_loss2d.

Note that scalar_check False just means we don't do any extra scalar checks (we could elide this from the generated code, which I may do in a later commit).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13906

Reviewed By: ezyang

Differential Revision: D13044507

Pulled By: gchanan

fbshipit-source-id: ebd3bdca2bcf512ca44de1ce3be81946f6c0828e
2018-11-14 07:58:35 -08:00
46c0e2c268 Clean up caffe2/tools/build_pytorch_libs.{sh,bat} (#13954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13954

- Remove extra include directories from BASIC_C_FLAGS.  We suspect that
  in some rare cases on Windows, this can cause us to get confused about
  which header to include.  Make this agree with build_pytorch_libs.sh
  Ditto with BASIC_CUDA_FLAGS
- Delete CWRAP_FILES from both places; it's unused in sh, and it's
  dead in CMAKE
- Delete NO_NNPACK in Windows, replace with USE_NNPACK (I'm not sure
  if this actually does anything on Windows)
- Delete a bunch of defunct cmake arguments from the build (NOT
  build_caffe2) target.

Reviewed By: soumith

Differential Revision: D13056152

fbshipit-source-id: efcc06c65a9f3606666196f3fe5db268844d44d9
2018-11-14 07:42:11 -08:00
a440629f14 Remove defunct build.sh/THConfig.cmake (#13953)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13953

Differential Revision: D13056128

Pulled By: ezyang

fbshipit-source-id: 9fd17f4fe000ac06144b04be996ef6849de2bafa
2018-11-14 07:42:09 -08:00
fbabe5bf62 Rename c10::detail to c10::impl (#13838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13838

According to Sebastian, the detail convention is specifically for header-private
functionality.  That's not what c10/detail is; it's general, library private headers
which may be used in multiple places within PyTorch.  Rename it to impl to avoid
the confusion in nomenclature.

Reviewed By: smessmer

Differential Revision: D13024368

fbshipit-source-id: 050f2632d83a69e3ae53ded88e8f938c5d61f0ef
2018-11-14 07:39:37 -08:00
db5aeafa60 Avoid grabbing DeviceGuard in at::empty when possible (#13785)
Summary:
Changed at::empty to allocate the correct amount of memory instead of
"allocate 0 memory and then resize it to the necessary size".

This leads to a 300 ns speedup for at::empty for a cuda tensor of size (64, 2048).
(1790ns -> 1460ns for at::empty).

Also does the following:
Removes DeviceGuards for:
- empty_* functions that end up calling functions that already have a
  DeviceGuard
- t(), which gets called a lot in LSTMs,
- Remove one of the two DeviceGuard that at::empty(...) uses. It only
  needs one for correctness, the other comes from the resize_
  implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13785

Reviewed By: ezyang

Differential Revision: D13004938

Pulled By: zou3519

fbshipit-source-id: f45b7e6abe06c05d1f81cc53e190c7bab6d1c116
2018-11-14 07:39:35 -08:00
1e45e7a404 Speed up fusion compiler tensor allocation (#13914)
Summary:
Previously the fusion compiler would allocate an empty tensor and then
resize it to the correct size. This PR changes the fusion compiler to
allocate a tensor of the correct size the first time around. The
difference between these approaches for a single tensor is around 400ns;
for something like LSTMCell's FusionGroup that emits 8 outputs this is
theoretically a 3us win.
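A sketch of the before/after allocation pattern (illustrative helpers; assumes the ATen `at::empty`/`resize_` API):
```cpp
#include <ATen/ATen.h>

// Old pattern: allocate an empty tensor, then resize, paying for the
// bookkeeping twice.
at::Tensor alloc_via_resize(at::IntList sizes, const at::TensorOptions& options) {
  auto t = at::empty({0}, options);
  t.resize_(sizes);
  return t;
}

// New pattern: one allocation of the correct size up front.
at::Tensor alloc_direct(at::IntList sizes, const at::TensorOptions& options) {
  return at::empty(sizes, options);
}
```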
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13914

Differential Revision: D13046728

Pulled By: zou3519

fbshipit-source-id: e2f28c0dc2ee5bcfee0efe10610039694691415c
2018-11-14 07:26:27 -08:00
109dd5b412 Move typeid to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13688

Reviewed By: ezyang

Differential Revision: D12912240

fbshipit-source-id: 1632172003682f62cea9b8c52596c3c0d8504b23
2018-11-14 02:58:04 -08:00
97036d3c30 FileStore auto deletes file and FileStore::add bug fix (#13708)
Summary:
This addressed: https://github.com/pytorch/pytorch/issues/11874

and we will have the identical file init_method behavior as the previous THD file init.

Also the FileStore::add bug is pretty annoying.

Two bugs:
(1) Add doesn't append to the end of the file.
(2) Cache doesn't get updated.

Both are fixed and tests are covered.

I examined /tmp to ensure that all temp files are auto-deleted after test_c10d.py.
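A minimal sketch of the append fix for bug (1) (hypothetical helper; the real FileStore writes framed key/value records):
```cpp
#include <fstream>
#include <string>

// Open in append mode so add() writes at the end of the file instead of
// clobbering entries written by other processes.
void file_store_add(const std::string& path, const std::string& entry) {
  std::ofstream out(path, std::ios::binary | std::ios::app);
  out << entry;
}
```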
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13708

Reviewed By: pietern

Differential Revision: D12972810

Pulled By: teng-li

fbshipit-source-id: 917255390aa52845f6b0ad0f283875a7a704da48
2018-11-14 01:34:22 -08:00
e2a7d43dfd Use the torch.proto to store script module (#13736)
Summary:
Directly operate protobuf in the serializer/deserializer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13736

Reviewed By: dzhulgakov

Differential Revision: D13028487

Pulled By: houseroad

fbshipit-source-id: e578474008874f00f2a22f0a2ffd85f52643881a
2018-11-14 00:22:09 -08:00
2871d3951f More robust ->match behavior (#13952)
Summary:
Allow schema matching against string literals to work even with
white space and other minor differences.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13952

Differential Revision: D13056043

Pulled By: zdevito

fbshipit-source-id: 0b502ce8311587308370285f7062914fce34faf0
2018-11-13 23:40:42 -08:00
346c418fc9 Add caffe2 clang7 build CI job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13928

Differential Revision: D13053770

Pulled By: bddppq

fbshipit-source-id: 8a015d4d8c3fb6a98b86ce7d7d96c13fc4f0d3f5
2018-11-13 23:12:23 -08:00
5151d33287 Unflake the ordering enforcement test (#13919)
Summary:
Attempts to unflake the dataloader ordering enforcement test. I think the issue was that the `thread_counter` variable was not atomic. I've made it atomic, and also global just to make it a bit clearer.
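A minimal sketch of the atomic, global counter (names are illustrative):
```cpp
#include <atomic>
#include <cstddef>

// A shared counter incremented from several worker threads must be atomic;
// a plain size_t gives racy, occasionally-wrong counts and a flaky test.
std::atomic<std::size_t> thread_counter{0};

void on_batch_fetched() {
  thread_counter.fetch_add(1, std::memory_order_relaxed);
}
```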

Fixes https://github.com/pytorch/pytorch/issues/13634

colesbury SsnL ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13919

Differential Revision: D13051718

Pulled By: goldsborough

fbshipit-source-id: b9f7f6317701a8b861a1d5c6a9b2b17b44782561
2018-11-13 21:05:02 -08:00
f4e502a8c5 Added MIOpen conv transpose op (#13938)
Summary:
This pull request contains changes for:
1. Removing ConvTranspose related changes from caffe2/operators/hip/conv_op_miopen.cc
2. Adding the file caffe2/operators/hip/conv_transpose_op_miopen.cc
3. Modifying the tests to run convTranspose op using MIOpen engine

Differential Revision: D13055099

Pulled By: bddppq

fbshipit-source-id: ca284f8f9a073005b22013c375cc958257815865
2018-11-13 21:01:52 -08:00
5059beb644 Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail (#13902)
Summary:
Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail()

Otherwise crash trace:

```
caffe2/caffe2/operators/hip/top_k_radix_selection_hip.cuh:409:7: error:  '__assert_fail':  no overloaded function has restriction specifiers that are compatible with the ambient context 'gatherTopK'
      assert(writeIndex < outputSliceSize);
      ^
glibc/include/assert.h:88:6: note: expanded from macro 'assert'
   : __assert_fail (#expr, __FILE__, __LINE__, __ASSERT_FUNCTION))
     ^
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13902

Reviewed By: bddppq

Differential Revision: D13042820

Pulled By: xw285cornell

fbshipit-source-id: 5117f6946db8109ae35e644e7423c8456e65e61f
2018-11-13 20:55:50 -08:00
0bedaf9cf6 Update setup.py to support Nvidia TX2 (#13939)
Summary:
add platform.machine() == 'aarch64' for supporting Nvidia TX2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13939

Differential Revision: D13055834

Pulled By: soumith

fbshipit-source-id: 0fadc87adf9e6b796978ce743e824eb98b006856
2018-11-13 20:10:35 -08:00
79ec5de3fc Add some more files to gitignore. (#13924)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13924

Differential Revision: D13047983

Pulled By: ezyang

fbshipit-source-id: bb2a8aa747d0c8195084c650006518df2a00daab
2018-11-13 19:02:57 -08:00
c3680e2b19 Fix sum() on fp16 (#13926)
Summary:
The size of the shared and global memory buffers were incorrect for float16.
They were sized based on float16 elements, but the buffers store intermediate
float32 values.
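A sketch of the sizing rule (illustrative; `half_storage` stands in for a 16-bit float's storage type):
```cpp
#include <cstddef>
#include <cstdint>

using half_storage = std::uint16_t;  // stand-in for a float16 element

// Reduction buffers hold float32 intermediates even for float16 inputs,
// so they must be sized by the accumulator type, not the element type.
std::size_t reduce_buffer_bytes(std::size_t n_partials) {
  // return n_partials * sizeof(half_storage);  // the bug: half-sized buffer
  return n_partials * sizeof(float);            // correct: float32 accumulators
}
```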

Fixes #13909
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13926

Differential Revision: D13048334

Pulled By: colesbury

fbshipit-source-id: 5a07df53f1152d5920258e91ed3f1e1de89b29e1
2018-11-13 16:50:36 -08:00
3002cb2ad0 Revert D13007266: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 2/4
Differential Revision:
D13007266

Original commit changeset: a9f0427a11db

fbshipit-source-id: c23bb511bb26108405b7e8622377fc18573d4311
2018-11-13 16:44:33 -08:00
76d8979afe Revert D13007287: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 3/4
Differential Revision:
D13007287

Original commit changeset: c89a24458e04

fbshipit-source-id: 74d3fe310f1f551e2f52c6e3d9a744a47767b4b1
2018-11-13 16:41:53 -08:00
fbd50bbfb9 Revert D13007246: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 1/4
Differential Revision:
D13007246

Original commit changeset: 230de42a3843

fbshipit-source-id: 40ce266826f00d320f7215169188ef4ead232660
2018-11-13 16:41:52 -08:00
30676bdcd3 Finish up TODOs in python printer (#13879)
Summary:
* Correctly adds annotate when needed for lists
* Parser/Emitter handles octal escapes so we do not fail for some strings.
* more complete keyword list in pretty printer
* floating point numbers are always printed with a decimal to ensure
  we never mistake them in parsing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13879

Differential Revision: D13037860

Pulled By: zdevito

fbshipit-source-id: f09ab174fc33402a429b21a5bfaf72e15c802cad
2018-11-13 16:39:46 -08:00
8311bbee7f Fix Windows build and test in CI (#11716)
Summary:
This PR adds Windows support for the C++ frontend. A lot of declarations were missing `TORCH_API` macros, and lots of code just did not compile on MSVC.

ebetica ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11716

Reviewed By: orionr

Differential Revision: D13038253

Pulled By: goldsborough

fbshipit-source-id: c8e5a45efd26117aeb99e768b56fcd5a89fcb9f8
2018-11-13 16:35:54 -08:00
f649d8b3a9 add floordiv and bitwise ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13873

Reviewed By: driazati, wanchaol

Differential Revision: D13033709

Pulled By: eellison

fbshipit-source-id: df7edee0f790038fb2a806d20640ad25c70b50eb
2018-11-13 16:32:22 -08:00
7c1fe17288 fix UnpackSegments cuda op (#13917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13917

There is a bug in the UnpackSegments CUDA op when setting "max_length":

  "buck test mode/opt //caffe2/caffe2/python/operator_test:pack_ops_test -- test_pack_with_max_length_ops"

fails on trunk.

This diff fixed this bug.

Reviewed By: xianjiec

Differential Revision: D13045106

fbshipit-source-id: 4d640d61405bb86326dc33c81145824060cf987e
2018-11-13 15:38:58 -08:00
cd49afce64 Allow attaching additional net info when supplying the benchmark net (#13820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13820

We would like to provide an option to show additional info of the net to be benchmarked.

Reviewed By: highker, rdzhabarov

Differential Revision: D13018219

fbshipit-source-id: d3ec69901bdae58117a482ddd2c327b0f8cf7cb6
2018-11-13 15:08:25 -08:00
23e19ebfa7 add non-exponential emphasis loss to LambdaRank
Summary: Currently LambdaRank applies exponential emphasis on relevance, i.e., g = 2^rel when calculating DCG; this diff adds an option that supports g = rel in the loss function.
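A minimal sketch of the two gain options when forming a DCG term (hypothetical helper):
```cpp
#include <cmath>

// Exponential emphasis g = 2^rel versus the new linear option g = rel.
double dcg_gain(double rel, bool exponential_emphasis) {
  return exponential_emphasis ? std::pow(2.0, rel) : rel;
}
```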

Reviewed By: itomatik

Differential Revision: D9891514

fbshipit-source-id: 64730d467a665670edd37e6dc1c077987991d1a8
2018-11-13 14:54:04 -08:00
dfa4767754 Update nccl submodule to latest (#13921)
Summary:
This should include fix to the issue: https://github.com/NVIDIA/nccl/issues/153
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13921

Differential Revision: D13048999

Pulled By: teng-li

fbshipit-source-id: a83f3bbb004f4a4137d187a010c7ec6b48f27eeb
2018-11-13 14:22:39 -08:00
c46dd5163f Temporarily disable part of test_spectral_norm (#13908)
Summary:
See #13818 for suggestions about a long-term fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13908

Differential Revision: D13047262

Pulled By: colesbury

fbshipit-source-id: 0f29bd5b659bb97826381abbc305fb8a25b131ed
2018-11-13 14:19:16 -08:00
5163a28917 Convert more weak functions (#13707)
Summary:
Convert some more functions to match up with features added. Some
conversions were unsuccessful but the type line was left in for later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13707

Differential Revision: D13030210

Pulled By: driazati

fbshipit-source-id: 02d5712779b83b7f18d0d55539e336321335e0cc
2018-11-13 13:50:57 -08:00
53bc5fb043 Support nn.Sequential in script (#13889)
Summary:
This PR makes weak modules in `nn.Sequential` get properly compiled
when used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13889

Differential Revision: D13039559

Pulled By: driazati

fbshipit-source-id: d3266305f0e206b2a19b63230ac2ab8f02faa603
2018-11-13 13:48:58 -08:00
5cfccd76e6 Jit load error msg (#13894)
Summary:
When loading a non-existent / non-openable file, the current error message is
```
Expected to read 8 bytes but got %llu bytes0
```

This
- fixes two ASSERTM formatting calls (including the above),
- throws a more specific error message if the ifstream constructor sets `.fail`.

Here is someone apparently confused by the current message: https://github.com/facebookresearch/maskrcnn-benchmark/pull/138#issuecomment-437848307
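A sketch of the open-time check described above (hypothetical helper; uses a plain exception rather than the ASSERTM macros):
```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Detect a failed open before any reads happen, so the user gets a clear
// message instead of a bogus byte-count error.
std::ifstream open_checked(const std::string& filename) {
  std::ifstream in(filename, std::ios::binary);
  if (in.fail()) {
    throw std::runtime_error("open file failed: " + filename);
  }
  return in;
}
```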
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13894

Differential Revision: D13043228

Pulled By: soumith

fbshipit-source-id: b348b482c66d5e420874ae6e101b834106b89e82
2018-11-13 12:33:31 -08:00
283062f574 Tensor construction: combine Resize+mutable_data - 2/4 (#13852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13852

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007266

fbshipit-source-id: a9f0427a11dbe084a30837aa32da67c9302cbc6c
2018-11-13 12:28:35 -08:00
e030ee8197 Tensor construction: combine Resize+mutable_data - 3/4 (#13854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13854

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007287

fbshipit-source-id: c89a24458e0428485402b3eb23519a92804d768e
2018-11-13 12:28:33 -08:00
9d36c37bdb Tensor construction: combine Resize+mutable_data - 1/4 (#13853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13853

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007246

fbshipit-source-id: 230de42a3843d71599e812d5511f52f3af47f59b
2018-11-13 12:26:02 -08:00
96a01f82d1 Remove unnecessary include (#13878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13878

This removes a dependency to another header to simplify moving this header to c10.
Also fix some include paths to prepare that move

Reviewed By: ezyang

Differential Revision: D13036478

fbshipit-source-id: cbddb5281498256fddcbebce61aa606c51b7b8d7
2018-11-13 12:18:28 -08:00
60a85857dd s/CAFFE_ENFORCE_WITH_CALLER/AT_ASSERTM/ (#13829)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC sinkingsugar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13829

Differential Revision: D13019452

Pulled By: ezyang

fbshipit-source-id: cf8b58b25a484720d9a612df6dd591c91af6f45a
2018-11-13 11:24:51 -08:00
561bc09026 Remove CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode for accuracy (#13844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13844

In S163230, we've found that the CuDNN 7 upgrade causes an accuracy drop when training convolutional networks such as ResNeXt-101 (~0% accuracy) and video R(2+1)D (65 --> 63%).

We've fixed this in Caffe2 D9601217, and we should do the same to ATen as well.

Reviewed By: ezyang

Differential Revision: D13025486

fbshipit-source-id: 04f4f0d9af6287b0400ca1842fb2cdac1f8cdb70
2018-11-13 11:17:16 -08:00
0d2762e876 Minor fix to reenable nvtx sequence numbers for the forward methods of custom (Python) autograd functions (#13876)
Summary:
Some of our arch people (mkolod, Aditya Agrawal, kevinstephano) notified me that the sequence number annotations weren't showing up for forward methods of custom autograd functions, which was breaking their nvprof dump parsing.  Two one-line fixes in the appropriate code paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13876

Differential Revision: D13042381

Pulled By: ezyang

fbshipit-source-id: a114118f5c07ad4ba482e7a4892d08805b23c65b
2018-11-13 11:10:32 -08:00
266bb8bf30 FeedTensor returns a Tensor (#13641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641

The FeedTensor function used to take a pointer to a Tensor and feed the content using Resize
and mutable_data; since Tensor is now a pointer type, we can just return a Tensor instead.

Reviewed By: ezyang

Differential Revision: D12873145

fbshipit-source-id: 653735c20d611ff6ac9e380d8b3c721cb396a28f
2018-11-13 10:50:32 -08:00
98b450deb9 Clean optional undefined tensor syntax in ATen yaml files and codegen (#13871)
Summary:
Previously, multiple undefined-tensor syntaxes existed in the ATen definition files; this PR makes them all follow the same "?" syntax.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13871

Differential Revision: D13033486

Pulled By: wanchaol

fbshipit-source-id: 7673bc22d08cd6975503deb51fba47ada6bc5156
2018-11-13 10:37:42 -08:00
Jie
bbc7412615 (#13765)
Summary:
Fix CUDA native batch norm for small feature planes:
  1. fixed a divergent call of WARP_SHFL_XOR in the warp reduction, which caused a hang with CUDA_ARCH > 7.0
  2. split Normalization.cu into two files for code reuse, in preparation for sync BN
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13765

Differential Revision: D13043331

Pulled By: soumith

fbshipit-source-id: bf8565bff6ba782475ad0e4be37ea53c8052eadf
2018-11-13 10:14:37 -08:00
8559fcf791 Unpin Sphinx. (#13831)
Summary:
Sphinx 1.8.2 is released, per https://github.com/sphinx-doc/sphinx/issues/5419

Fixes #11618

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13831

Differential Revision: D13020339

Pulled By: ezyang

fbshipit-source-id: 4c7f3aff172efd3aca54ef48ac9052989cce5e4c
2018-11-13 09:45:12 -08:00
f6e4fc071a Fix a bug that causes nvcc to emit an unknown option error (#13904)
Summary:
Using `"-Xcompiler -fPIC"` causes nvcc to emit the following:

    nvcc fatal   : Unknown option 'Xcompiler -fPIC'

As per fixes lower down in the file (see also issue #7126 on GitHub),
the fix is to replace it with `"-Xcompiler" "-fPIC"`. This one was
apparently missed when the original fix was applied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13904

Differential Revision: D13043189

Pulled By: soumith

fbshipit-source-id: 6dc6d325671e4d08cd8e6242ffc93b3bd1f65351
2018-11-13 09:41:44 -08:00
f112aa746a Fix document about torch.get_default_dtype() (#13890)
Summary:
Minor fix.
```
torch.get_default_dtype() → :class:`torch.dtype`
```
→
```
torch.get_default_dtype() → torch.dtype
```
:class: is not rendered in https://pytorch.org/docs/stable/torch.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13890

Differential Revision: D13040704

Pulled By: colesbury

fbshipit-source-id: 5fadb01ad365042d5df2bac058f4ae89b281d3b7
2018-11-13 09:25:32 -08:00
a83a1544b1 Move device_guard from _th_ functions to the wrapper. (#13842)
Summary:
This is what we would want to check in anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13842

Differential Revision: D13025463

Pulled By: gchanan

fbshipit-source-id: d1ff9b10f4adc811bbd3db15b440ed00c16c82d1
2018-11-13 08:03:36 -08:00
e43fb1d26d Fix cuda out of memory test (#13864)
Summary:
torch.randn(big_number_here, dtype=torch.int8) is wrong because randn
isn't implemented for torch.int8. I've changed it to use torch.empty
instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13864

Differential Revision: D13032130

Pulled By: zou3519

fbshipit-source-id: d157b651b47b8bd736f3895cc242f07de4c1ea12
2018-11-13 07:30:30 -08:00
7f002008f1 remove ShouldFp32FallbackToNCHW (#13814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13814

D10333829 implemented 3D conv in NHWC in fp32 ops so int8 ops don't need special handling anymore.

Reviewed By: hx89

Differential Revision: D13017666

fbshipit-source-id: 41df449f5e21c4c7134cc5c480e559f8c247069b
2018-11-13 00:52:41 -08:00
a7eee0a1e9 Add Reshape if there is add_axis when exporting C2 concat (#13798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13798

The semantics of C2 and ONNX Concat are a bit different: C2 Concat accepts an "add_axis" arg and will raise the dim if it is set, which is equivalent to attaching a Reshape after a plain Concat in ONNX.

Reviewed By: rdzhabarov

Differential Revision: D13012867

fbshipit-source-id: da23e555bae709fd2a373b04dcb9db4e984ae315
2018-11-12 22:27:49 -08:00
a17c0118a5 fix stability in bce with pos_weight formula (#13863)
Summary:
Fixes #13773
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13863

Differential Revision: D13031803

Pulled By: ailzhang

fbshipit-source-id: 6c9e044f0450eebf4555bbc02c125713d9378e2f
2018-11-12 22:04:24 -08:00
0bfbdcac89 fix bug in D13017777
Summary:
Mistakenly created an infinite recursive call.

(Note: this ignores all push blocking failures!)

Reviewed By: jianyuh

Differential Revision: D13038053

fbshipit-source-id: 8b760cb73b5369647d8ef651b8c196ac3f7af04d
2018-11-12 21:57:31 -08:00
ce48958606 enable more unit tests (#13166)
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit.

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166

Differential Revision: D12814759

Pulled By: bddppq

fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
2018-11-12 18:49:52 -08:00
cec3455a8b Add gitignore item for YCM config
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13805

Reviewed By: yinghai

Differential Revision: D13031332

Pulled By: bddppq

fbshipit-source-id: 279b7bb8879e49eef8abed51dc30b4b7ea0a2fa9
2018-11-12 16:58:56 -08:00
1600649792 Fix for nightly builds (#13779)
Summary:
Being tested on nightlies manually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13779

Reviewed By: yinghai

Differential Revision: D13001930

Pulled By: pjh5

fbshipit-source-id: 954eaabe052914b7b23c74e922666bf9dbfb630a
2018-11-12 16:38:14 -08:00
b052fe6c2f Upgrade DLPack
Summary: Needed to use TVM

Reviewed By: ajtulloch

Differential Revision: D12994038

fbshipit-source-id: f0b6c48a43a87fac37fcef73b78026d8384cd022
2018-11-12 15:59:46 -08:00
8480fe0105 Fix up creation of unique data nodes
Summary:
There was a bug in the uniqueness check that only made the first run
unique

Reviewed By: duc0

Differential Revision: D13013504

fbshipit-source-id: ecf7526d0fafd7968f1301734123f93968efef46
2018-11-12 15:37:08 -08:00
03c0f4fbe7 Use RNG mutex for randperm on CPU (#13832)
Summary:
When we added `randperm_cpu` and `THTensor_(randperm)`, we forgot to lock the `THGenerator` mutex before calling `THRandom_random`, which causes the segfault mentioned in https://github.com/facebookresearch/maskrcnn-benchmark/pull/93#issuecomment-435479043. This PR fixes the bug.
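A minimal sketch of the locking fix (the struct here is a stand-in for `THGenerator`; the permutation body is elided):
```cpp
#include <mutex>

struct GeneratorLike {        // stand-in for THGenerator
  std::mutex mutex;
  // ... RNG state ...
};

void randperm_fill(GeneratorLike* generator, long* data, long n) {
  // Hold the generator's mutex for the duration of the random draws so
  // concurrent callers cannot corrupt the RNG state.
  std::lock_guard<std::mutex> lock(generator->mutex);
  for (long i = 0; i < n; ++i) {
    data[i] = i;  // ... followed by Fisher-Yates swaps using the locked RNG ...
  }
}
```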

Closes https://github.com/pytorch/pytorch/issues/1868.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13832

Differential Revision: D13025453

Pulled By: yf225

fbshipit-source-id: 6e363a35c72b4862412eaea6516a154126634c9d
2018-11-12 15:27:41 -08:00
fc79f70f9a CircleCI: Add Linux CUDA 10 build (#13858)
Summary:
Moving CUDA 10 build to CircleCI so that we have one less job running on Jenkins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13858

Differential Revision: D13031916

Pulled By: yf225

fbshipit-source-id: 57aa54941d7f529e7094c8d037b836ec2fb6191c
2018-11-12 15:07:34 -08:00
8de9564c12 Fix gcc-7 build in caffe2/caffe2/quantization/server/activation_distribution_observer.cc (#13799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13799

Fix broken operator=

Reviewed By: jspark1105

Differential Revision: D13014333

fbshipit-source-id: 6075906ecf0735bd9a74d57108036a33e1575df8
2018-11-12 14:52:51 -08:00
f1a2bc4eae Corrected python lib path on windows to be consistent with Linux (#13848)
Summary:
The python lib path on Windows was set to an incorrect path. This fixes it to be consistent with Linux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13848

Differential Revision: D13030945

Pulled By: soumith

fbshipit-source-id: 7fb9013ffe66cff98018aea25fdb5cda03cbceb1
2018-11-12 14:39:55 -08:00
53a3c46950 Switch to packaged Thrust on Ubuntu, enable CentOS 7.5 as a CI target (#12899)
Summary:
1) Use the hip-thrust version of Thrust as opposed to the GH master. (ROCm 267)

2) CentOS 7.5 docker (ROCm 279)

* Always install the libraries at docker creation for ubuntu.
* Add Dockerfile for CentOS ROCm
* Enable the centos build
* Source devtoolset in bashrc
* Set locales correctly depending on whether we are on Ubuntu or CentOS
* Install a newer cmake for CentOS
* Checkout thrust as there is no package for CentOS yet.

PyTorch/Caffe2 on ROCm passed tests: https://github.com/ROCmSoftwarePlatform/pytorch/pull/280

For attention: bddppq ezyang

A Docker rebuild for Ubuntu is not urgent (getting rid of the Thrust checkout and package install is mainly cosmetic). If a Docker image for CentOS 7.5 is wanted, a rebuild is necessary. I tested the PyTorch build in the CentOS docker. The PyTorch unit tests mostly work; however, a test in test_jit causes a Python recursion error that seems to be due to the python2 on CentOS (we have never seen this on Ubuntu), hence please do not enable unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12899

Differential Revision: D13029424

Pulled By: bddppq

fbshipit-source-id: 1ca8f4337ec6a603f2742fc81046d5b8f8717c76
2018-11-12 14:39:54 -08:00
1caa341c68 Add torch.multiprocessing.spawn docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13846

Differential Revision: D13029595

Pulled By: pietern

fbshipit-source-id: b733b00f7070c18535c31801f20e6e717eec7748
2018-11-12 14:39:52 -08:00
1a0cb08918 allow Node::isAfter to work across blocks (#13855)
Summary:
Extend `isAfter` to work for nodes in different blocks. This is useful if we want to ask a question like "are any of the uses of value `v` after this node", since uses may be inside inner blocks.
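An illustrative sketch of the cross-block comparison (not the actual JIT data structures; assumes both nodes belong to the same graph):
```cpp
#include <cstddef>

struct Node;
struct Block {
  Node* owning_node;  // node whose body this block is; nullptr at top level
};
struct Node {
  Block* parent;      // block that contains this node
  std::size_t position;  // topological index within parent
};

static std::size_t depth(const Node* n) {
  std::size_t d = 0;
  for (Block* b = n->parent; b->owning_node != nullptr;
       b = b->owning_node->parent) {
    ++d;
  }
  return d;
}

// Lift the deeper node to its enclosing block's owning node until both
// nodes share a block, then compare positions within that block.
bool isAfter(const Node* a, const Node* b) {
  while (a->parent != b->parent) {
    if (depth(a) >= depth(b)) {
      a = a->parent->owning_node;
    } else {
      b = b->parent->owning_node;
    }
  }
  return a->position > b->position;
}
```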
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13855

Differential Revision: D13030528

Pulled By: suo

fbshipit-source-id: f681405396f3ec68eec1a2cb92e40873921a4b78
2018-11-12 14:39:50 -08:00
75bf877534 Preventing error where ninja build files are overwritten when invokin… (#13698)
Summary:
…g clean and build together
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13698

Differential Revision: D13030905

Pulled By: soumith

fbshipit-source-id: 234576ac92e0aa8c2d2409958d3cf85eb29ed1f3
2018-11-12 14:39:48 -08:00
686e83223f add ops between float & int, and change list equality output to be a boolean
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13793

Reviewed By: wanchaol

Differential Revision: D13010872

Pulled By: eellison

fbshipit-source-id: 2c8248f30b51eab1a87290711f99b7ceb6df2009
2018-11-12 14:39:47 -08:00
e3839dfc35 Add matplotlib to docs/requirements.txt (#13828)
Summary:
Used in docs/source/scripts/build_activation_images.py.

Don't know if we need a specific version. I installed the latest version (3.0.2) and that works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13828

Differential Revision: D13030294

Pulled By: pietern

fbshipit-source-id: b4e7b381182036645924453a1e2abb719090bbc4
2018-11-12 13:43:07 -08:00
5bf14c23b7 Bump Caffe2 docker images to version 230
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13857

Differential Revision: D13029637

Pulled By: bddppq

fbshipit-source-id: 73c4a0f3d39257a2312b36c9dd55dc001067d9c4
2018-11-12 13:26:23 -08:00
309cc76469 BaseType:: -> this-> (#13817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13817

gcc 7 doesn't like `BaseType::func<...>()`; `this->func<...>()` should be used instead.
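A minimal standalone illustration of the lookup rule involved, and of the pitfall this created (see the "fix bug in D13017777" entry above; all names here are hypothetical):
```cpp
#include <iostream>

template <typename T>
struct Base {
  template <typename U>
  void f() { std::cout << "Base::f\n"; }
};

template <typename T>
struct Derived : Base<T> {
  void g() {
    // Members of a dependent base are not found by unqualified lookup, so
    // the call must be qualified; this-> resolves to Base<T>::f at
    // instantiation. Pitfall: if Derived declared its own f, this->f<...>()
    // would call it instead, which is how an infinite recursion can appear.
    this->template f<int>();
  }
};

int main() {
  Derived<float>{}.g();
  return 0;
}
```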

Reviewed By: hx89

Differential Revision: D13017777

fbshipit-source-id: 0cf68d459b44379b1c103cf74382857db9a91bef
2018-11-12 12:51:12 -08:00
6093f29409 Update coverage info (#13788)
Summary:
Right now we don't have coverage info on how many PyTorch operators can be exported to ONNX. This PR adds torch.nn operators to it; functional modules will be added later as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13788

Differential Revision: D13010448

Pulled By: zrphercule

fbshipit-source-id: 19349cabaeff42fda3620bb494f7ec4360d96b76
2018-11-12 12:39:12 -08:00
d8f35c42be nomnigraph - easy - support blob renaming (#13845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13845

Support renaming a blob in nomnigraph

Reviewed By: itomatik

Differential Revision: D13026762

fbshipit-source-id: fc8cecb4562a6c618ce5c8e2ff79a2a282a8ff09
2018-11-12 12:32:10 -08:00
0c375571f5 Support OptionalType export and type match (#13647)
Summary:
* Adds `OptionalType` support for import/export
    * Optionals get exported along with their contained type, i.e. 'Optional[int]'
* Allows concrete types and `None` to be passed to an op that takes an optional
* Converts `softmax`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13647

Differential Revision: D12954672

Pulled By: driazati

fbshipit-source-id: 159e9bfb7f3e398bec3912d414c393098cc7455a
2018-11-12 12:15:25 -08:00
bf00008aa1 Use SmallVector for TensorImpl sizes and strides. (#13649)
Summary:
This removes dynamic allocations for sizes/strides for tensors with <= 5
dims. This should cover the most common tensor use cases; we use a lot
of 4D tensors in images (N, C, H, W) and LSTMs use tensors with 3 or fewer dims.
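A minimal small-buffer sketch of the idea (not the actual `c10::SmallVector`, and valid only for trivially copyable element types such as the int64_t sizes/strides):
```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Up to N elements live inline in the object itself; the heap is touched
// only when the vector grows past N.
template <typename T, std::size_t N>
class TinySmallVector {
 public:
  void push_back(const T& value) {
    if (size_ == capacity_) grow();
    data_[size_++] = value;
  }
  std::size_t size() const { return size_; }
  T& operator[](std::size_t i) { return data_[i]; }
  ~TinySmallVector() {
    if (data_ != inline_buf_) std::free(data_);
  }

 private:
  void grow() {
    capacity_ *= 2;
    T* heap = static_cast<T*>(std::malloc(capacity_ * sizeof(T)));
    std::memcpy(heap, data_, size_ * sizeof(T));
    if (data_ != inline_buf_) std::free(data_);
    data_ = heap;
  }
  T inline_buf_[N];
  T* data_ = inline_buf_;
  std::size_t size_ = 0;
  std::size_t capacity_ = N;
};
```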

Benchmarking results can be found here:
https://gist.github.com/zou3519/ce4182722ae7e2a228bc8b57ae60b0e9
The quick summary is that this PR:
- makes aten LSTM's forward pass ~1ms faster and improves JIT lstm perf
  as well
- Tensor as_strided is now 200ns faster for dimensions <= 5
- at::empty performance is 200ns slower for dimensions > 5. For dims <= 5,
  there is no noticeable perf change.
- Variable ops are 200-500ns faster because Variables never used their
  sizes/strides fields in the first place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13649

Differential Revision: D12950409

Pulled By: zou3519

fbshipit-source-id: 0bd87ec9f712ddc0d533a347d781e3a91a954b90
2018-11-12 10:40:32 -08:00
aef9e76283 Get pretty printer ready for use as a serialization format (#13616)
Summary:
Get pretty printer ready for use as a serialization format

This PR adds a bunch of functionality to the pretty printer (now called python_printer to reflect
the fact that it will be used to output valid python source). The idea is to get the printer
ready for use as serialization format.  This PR does not have tests beyond what the pretty
printer already had. PRs stacked on this one will do round-trip export/import to test this functionality more robustly.

Notes:
* PythonPrinter is an evolution of the original pretty printer. However, much of it has changed so it is best just to
  read it as a new implementation. Trying to correlate it to the original implementation is probably not much help.
* The printer tries to get reasonably close to how the original function was likely written, such as
  writing expressions rather than making intermediates when possible. We may decide to turn this off
  for the actual serialization, but it is useful for pretty printing.
* tensor field access was changed so that prim::device and family have schema
* fixed a bug in the compiler where setUniqueName gets called even when a value already has one.
  this sometimes assigned really poor names to graph inputs
* Graph::insert gains an optional range argument to make range-preserving inserts easier.
* prim:: ops that can have schema now have schema. This is because when we parse them back in,
  we will need the schema to correctly set their output types.
* there is code in the python printer to complain if you try to add a prim op and do not update the printer.
* BuiltinModule is generalized to take an operator namespace and a version number for work in future commits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13616

Reviewed By: goldsborough

Differential Revision: D13008252

Pulled By: zdevito

fbshipit-source-id: 32b33bc6410d6ca1c6f02bd6e050f8d5eea32083
2018-11-12 10:21:30 -08:00
b7a7ab364b Improve mm / addmm error message with sparse tensors (#13796)
Summary:
and write derivatives in terms of native functions.

This is the same as https://github.com/pytorch/pytorch/pull/13648 but has a fix for the canonicalize op jit pass to propagate shape information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13796

Reviewed By: ezyang

Differential Revision: D13012281

Pulled By: gchanan

fbshipit-source-id: 88d0d91e72b5967c51ff865350fcbdd7ffed92ef
2018-11-12 07:16:47 -08:00
8752214fb7 Apply weight-decay before momentum in the SGD optimizer. (#13801)
Summary:
While trying to understand why two implementations of the same model, one in Python, one using the C++ API (via some [ocaml wrappers](https://github.com/LaurentMazare/ocaml-torch)), did not perform equally well, I noticed that the Python and C++ implementations of SGD differ slightly on weight decay.

- In the [Python version](https://github.com/pytorch/pytorch/blob/master/torch/optim/sgd.py#L91-L93) weight decay is applied *before* momentum (and so momentum applies to the weight decay).
- In the C++ implementation the weight decay is applied *after* momentum.

In the couple of computer-vision models I have looked at, the Python version performs a little better, so this PR tweaks the C++ implementation to apply weight decay *before* momentum. This is possibly caused by having more regularization; maybe increasing the weight decay while keeping the current code would yield the same improvements. However, a nice advantage of this change is that it puts the C++ and Python versions in line. After this change my Python and C++/OCaml models performed similarly when using the same weight-decay parameter.

Maybe there was some real reason to have weight decay after momentum in the C++ version, but I haven't found any.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13801

Differential Revision: D13020709

Pulled By: soumith

fbshipit-source-id: 7c2ac245577dd04bc3728aec4af0477120a60f13
2018-11-11 23:54:50 -08:00
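The two update orders under discussion, sketched in plain Python (simplified scalar SGD steps; `lr`, `wd`, `mu` stand for learning rate, weight decay, and momentum):

```python
def step_decay_before_momentum(p, grad, buf, lr, wd, mu):
    # Python-style order: weight decay is folded into the gradient first,
    # so the momentum buffer accumulates the decay term too.
    d_p = grad + wd * p
    buf = mu * buf + d_p
    return p - lr * buf, buf

def step_decay_after_momentum(p, grad, buf, lr, wd, mu):
    # Old C++-style order: momentum sees only the raw gradient and the
    # decay is applied outside the buffer.
    buf = mu * buf + grad
    return p - lr * (buf + wd * p), buf

print(step_decay_before_momentum(1.0, 0.1, 0.0, 0.01, 1e-4, 0.9))
```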
7e8572be2d Change method-only _th_ prefix Declarations to functions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13754

Reviewed By: ezyang

Differential Revision: D12988489

Pulled By: gchanan

fbshipit-source-id: b62bb9288f67d72320925c36283f6ce6cbf95d20
2018-11-11 15:47:06 -08:00
003f97cefa fc layer accepts axis argument (#13822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13822

as title

Reviewed By: xianjiec

Differential Revision: D12996338

fbshipit-source-id: 1aa61e71e2d79535325ea7034c82e1cb6bf3a9f6
2018-11-11 13:44:57 -08:00
e35418b3be New implementations of DeviceGuard, StreamGuard and MultiStreamGuard (with CUDA specializations) (#13342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342

This PR introduces a few new concepts:

- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
  provide a generic interface for interfacing with device and stream state,
  without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
  and dynamically dispatched device guard implementations.  Dynamic
  dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
  from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
  but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
  devices.
- Optional variants of all the aforementioned guards, which are a no-op if
  no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
  a stream on every device.

There are some subtle semantic changes, which have been thoroughly documented
in the class definition.

BC-breaking changes:

- Move constructor/assignment have been removed from all device guard
  implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
  'reset_device', because if you switch devices/device types, the stream/device on the
  previous device is unset.  This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams.  Use CUDAStreamGuard
  or CUDAMultiStreamGuard as appropriate for your use case.

Reviewed By: dzhulgakov

Differential Revision: D12849620

fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
2018-11-11 12:11:10 -08:00
4b86a215ca moving simd adagrad code to perfkernels (#13549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13549

caffe2/perfkernels has a nice framework for switching between implementations optimized for different instruction sets at runtime.
This is good preparation for implementing AVX-512 Adagrad kernels.

Reviewed By: hyuen

Differential Revision: D12882872

fbshipit-source-id: a8f0419f6a9fd4e9b864c454dad0a80db267190c
2018-11-11 00:20:39 -08:00
d97ac82bf5 Back out "Revert D12967258: Support more data types in ONNXIFI transform" (#13812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13812

Original commit changeset: 2cf95bdc5ed8

Looks like in iOS, `uint64_t` is not the same as `size_t`. :( Fixed it here.

Reviewed By: houseroad

Differential Revision: D13017390

fbshipit-source-id: d33854ce341225aba372fb945c3704edc14f9411
2018-11-10 20:00:34 -08:00
786f9ba6ea Remove potential infinite loop from test_c10d.py (#13816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13816

If common.find_free_port() returns the same port over and over again,
and the TCPStore fails to bind to it over and over again, this
function has the potential to loop forever. If we can't find a free
port after 10 tries, we are safe to assume something is wrong...

Differential Revision: D13017700

fbshipit-source-id: 2139a0ea0f30ce08b5571f80ae0551f1fa7ba4a2
2018-11-10 17:58:13 -08:00
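The bounded-retry pattern the fix introduces, as a small Python sketch (`make_server` is a hypothetical callable standing in for the TCPStore constructor):

```python
import socket

def find_free_port():
    # Ask the OS for an ephemeral port; it may already be taken again by
    # the time the caller binds, hence the bounded retry loop below.
    with socket.socket() as s:
        s.bind(("localhost", 0))
        return s.getsockname()[1]

def bind_with_retries(make_server, max_tries=10):
    last_err = None
    for _ in range(max_tries):
        try:
            return make_server(find_free_port())
        except OSError as e:
            last_err = e
    # After max_tries failures we assume something else is wrong.
    raise RuntimeError("no free port after %d tries" % max_tries) from last_err
```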
c3603301d7 Fix race condition in TCPStoreDaemon initialization (#13815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13815

If the TCPStoreDaemon was constructed and destructed shortly after, it
was possible for the controlPipeFd_ to get initialized by the
background thread after the stop() function was already called. Then,
the destructor hangs on waiting for the thread to terminate, when the
termination signal (closing the write side of the control pipe) will
never happen.

Differential Revision: D13017697

fbshipit-source-id: 9528286fbfc773237990f1a666605d27bac2c0e5
2018-11-10 17:54:21 -08:00
4c3b76c402 Add std::string to the getTypePtr for JIT inference of custom op types (#13683)
Summary:
This allows custom ops to take string parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13683

Differential Revision: D13017010

Pulled By: soumith

fbshipit-source-id: 7c40aca7f57ba3f8812d34bc55828ff362c69bd2
2018-11-10 12:58:53 -08:00
7c02f285dc Revert D12967258: Support more data types in ONNXIFI transform
Differential Revision:
D12967258

Original commit changeset: 688076e6f504

fbshipit-source-id: 2cf95bdc5ed8f1e13646bc5cf8139bdc516861d7
2018-11-10 12:34:31 -08:00
5923d76f96 Support more data types in ONNXIFI transform (#13745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13745

We need to support types besides `int64` and `float`.

Reviewed By: bddppq, rdzhabarov

Differential Revision: D12967258

fbshipit-source-id: 688076e6f504b2bf24bba89714df87a678c5638a
2018-11-10 10:41:01 -08:00
c85463fc74 Allow Gather to handle empty data (#13781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13781

allow Gather Op to handle empty data.

Reviewed By: intermilan

Differential Revision: D13001267

fbshipit-source-id: 633c8471b637c56be8f6574f9bf9430785073977
2018-11-10 10:00:47 -08:00
4f622c26b9 fix ffs intrinsic for long long (ROCm 290) (#13804)
Summary:
* Switch to __ffsll in Embedding which is the correct intrinsic here.
* Fix WARP_BALLOT and ffsll in LookupTable as well.

Fix comes from iotamudelta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13804

Differential Revision: D13016184

Pulled By: bddppq

fbshipit-source-id: 2287a78ee9e592630336a073ad1e55a90e1f946d
2018-11-10 02:02:43 -08:00
d02781a2ef Make InterpreterStateImpl an intrusive_ptr_target (#13784)
Summary:
InterpreterStateImpl can continue its lifecycle by incrementing the ref
count itself. This patch also removes the InterpreterState::clone()
interface, which conflicts with intrusive_ptr_target's prohibition of copying.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13784

Differential Revision: D13015451

Pulled By: highker

fbshipit-source-id: a05f1ea6549d52ec693ccffefaa4d520b2474b8c
2018-11-09 23:39:18 -08:00
079e86a915 schematize some prim ops (#13790)
Summary:
We're relying on the default function schema (which contains no argument information) in places where we don't need to. This is bad because alias analysis will be very conservative when it doesn't have schema information present.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13790

Differential Revision: D13009185

Pulled By: suo

fbshipit-source-id: 023516937bd3dcae8a969185a89c55f38d691ba5
2018-11-09 15:50:29 -08:00
e552c04d53 Add proper comment for dispatch_to (#13783)
Summary:
Add proper comment to the fix in https://github.com/pytorch/pytorch/pull/13700
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13783

Differential Revision: D13009956

Pulled By: wanchaol

fbshipit-source-id: 34f5259204dab12f4159ab191e7b08e2f5226292
2018-11-09 15:48:15 -08:00
7b2fb012a8 Make potrs batched (#13453)
Summary:
- This is a straightforward PR, building on the batch inverse PR, except for one change:
  - The GENERATE_LINALG_HELPER_n_ARGS macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty.

Billing of changes:
- Add batching for `potrs`
- Add relevant tests
- Modify doc string

Minor changes:
- Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`.
- Add test for CUDA `potrs` (2D Tensor op)
- Move the batched shape checking to `LinearAlgebraUtils.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453

Reviewed By: soumith

Differential Revision: D12942039

Pulled By: zou3519

fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35
2018-11-09 15:16:26 -08:00
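A usage sketch of the batched solve (the op was named `potrs` when this landed; current releases expose it as `torch.cholesky_solve`, which this snippet uses):

```python
import torch

A = torch.randn(4, 3, 3)
A = A @ A.transpose(-2, -1) + 3 * torch.eye(3)  # batch of SPD matrices
b = torch.randn(4, 3, 2)

u = torch.linalg.cholesky(A)      # lower-triangular Cholesky factors
x = torch.cholesky_solve(b, u)    # batched solve, enabled by this PR
print(torch.allclose(A @ x, b, atol=1e-4))  # True
```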
e3e6ca1102 operator serialized test coverage summary document (#13703)
Summary:
Add a markdown document summarizing the coverage of serialized operator tests. This currently only takes into account what has been covered by the tests with respect to the entire registry of c2 operators.

Next, we will break down the coverage by which operators have unit tests associated with them, which have hypothesis tests, and which have tests more specifically calling assertReferenceChecks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13703

Reviewed By: dzhulgakov

Differential Revision: D12970810

Pulled By: ajyu

fbshipit-source-id: 4f0cd057b1cf734371333e24d26cbab630a170e1
2018-11-09 15:04:08 -08:00
014ea1e1f8 Improve CUDA out-of-memory error message (#13751)
Summary:
```
The new error message now looks like (from Python):

  RuntimeError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 11.93 GiB total capacity; 4.00 GiB already allocated; 7.33 GiB free; 179.00 KiB cached)

Summary of terms:

  "total capacity": total global memory on GPU
  "already allocated": memory allocated by the program using the
                       caching allocator
  "free": free memory as reported by the CUDA API
  "cached": memory held by the allocator but not used by the program

  The "allocated" amount  does not include memory allocated outside
  of the caching allocator, such as memory allocated by other programs
  or memory held by the driver.

  The sum of "allocated" + "free" + "cached" may be less than the
  total capacity due to memory held by the driver and usage by other
  programs.

  Note that at this point cuda_malloc_retry has already returned all
  possible "cached" memory to the driver. The only remaining "cached"
  memory is split from a larger block that is partially in-use.
```

This also fixes an issue where an out-of-memory error could cause an unrelated subsequent CUDA kernel launch to fail because `cudaGetLastError()` was not cleared.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13751

Differential Revision: D13007177

Pulled By: colesbury

fbshipit-source-id: ea7121461b3f2a34646102959b45bde19f2fabab
2018-11-09 14:33:28 -08:00
ae7c6bcfcf Make c10 buildable by itself. (#13742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13742

Along the way, I switch us to globbing directories by hand,
so we don't actually pick up generated cpp files in c10/build
(if you're doing the normal idiom for a CMake build).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: dzhulgakov

Differential Revision: D12988039

fbshipit-source-id: 08b7ec50cfef82b767b4ca9972e5ba65bc45bcbb
2018-11-09 13:40:39 -08:00
09369fa9d7 Fix clang_tidy.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13776

Differential Revision: D13002845

Pulled By: goldsborough

fbshipit-source-id: 7b019a032680796cbb04f733b31749ef7c6abe54
2018-11-09 11:46:50 -08:00
79ceecec8e Optional undefined tensor support (#13650)
Summary:
This PR is part of a task to unblock standard library export.
* We treat None differently from Tensor and other types: when passing None as a Tensor, it becomes an undefined tensor rather than the None IValue.
* Refine the type system so that we have a correct tensor type hierarchy (Dynamic/Tensor/CompleteTensor); Dynamic should be at the top of the inheritance hierarchy.
* It also tries to export bilinear as an example of undefined tensor (None) input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13650

Differential Revision: D12967026

Pulled By: wanchaol

fbshipit-source-id: 6aedccc7ce2a12fadd13d9e620c03e1260103a5a
2018-11-09 11:29:57 -08:00
607094c4bf fix null-pointer-use in reshape_op.h
Summary:
```
UndefinedBehaviorSanitizer: null-pointer-use ../fbcode/third-party-buck/gcc-5-glibc-2.23/build/libgcc/include/c++/5.5.0/bits/stl_vector.h:794:16
```
Here we take the address of the first element in the empty vector. Fix the error by guarding against an empty source.

Reviewed By: pixelb

Differential Revision: D12989957

fbshipit-source-id: ac5ec366385df835b546bd1756e30cd762f13a7a
2018-11-09 10:07:04 -08:00
107e067654 Move IdWrapper to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13687

Reviewed By: ezyang

Differential Revision: D12912238

fbshipit-source-id: f7a37de52cd3b3c45b3b0e9eeb29dff624fa0258
2018-11-09 10:02:45 -08:00
332a7db35e Use MNIST dataset in C++ integration test (#13737)
Summary:
We have an MNIST reader in the C++ data API, so we can get rid of the custom one currently implemented in the integration tests.

ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13737

Differential Revision: D12990936

Pulled By: goldsborough

fbshipit-source-id: 125a1910ec91d53dbf121570fc9eec6ccfba0477
2018-11-09 09:55:02 -08:00
a63ef1d605 Suggest git submodule update --init --recursive (#13769)
Summary:
We now have submodules that have submodules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13769

Reviewed By: soumith

Differential Revision: D13000203

Pulled By: SsnL

fbshipit-source-id: 63c0c19c6c9d25ae3bf255a2421a82ca68278866
2018-11-09 08:41:44 -08:00
a1b2f1710d Remove _th_is_contiguous, make is_set_to a function, not a method.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13725

Differential Revision: D12980246

Pulled By: gchanan

fbshipit-source-id: e5c5742a67e5a25062df736e28b44c133a635ca8
2018-11-09 07:02:38 -08:00
10a1534c43 Remove _th methods that also have a function. (#13721)
Summary:
There's no reason we need these as the native function wrapper calls into the function anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13721

Differential Revision: D12977449

Pulled By: gchanan

fbshipit-source-id: 54701ebe2f0bb2b55484cb437501c626e6471347
2018-11-09 06:57:20 -08:00
9ffabcfcaa Use nested variant of getValueTrace to allow more flexible tracing script modules (#13597)
Summary:
When tracing scripted functions, we used to only allow Tensor arguments.
This enables tracing script modules with List[Tensor] or Tuple[Tensor, Tensor] arguments (passing
tuples).

Fixes: #13566
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13597

Differential Revision: D12990464

Pulled By: soumith

fbshipit-source-id: fdce3afcb1e09f3c26d6ce834c01bf18d261f47c
2018-11-09 06:24:02 -08:00
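A sketch of the newly allowed pattern: tracing through a call to a scripted function that takes a tuple of tensors (written against a current API; the names are illustrative):

```python
import torch
from typing import Tuple

@torch.jit.script
def add_pair(pair: Tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor:
    a, b = pair
    return a + b

def fn(x, y):
    # The tracer now handles the tuple argument to the scripted callee.
    return add_pair((x, y))

traced = torch.jit.trace(fn, (torch.randn(2), torch.randn(2)))
print(traced(torch.ones(2), torch.ones(2)))  # tensor([2., 2.])
```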
dca3c2c60f Save and execute futures in a task queue (#13212)
Summary:
Upon calling wait(), save the forked thread and the current thread to a
task queue. An idling thread (of which there is currently only one) should
pick a ready task and run until there is nothing left in the task queue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13212

Differential Revision: D12884522

Pulled By: highker

fbshipit-source-id: b3942a0ee63c148e05f5f41bdc73007fa3c3368e
2018-11-09 01:46:35 -08:00
4484f67b47 Revert D10203439: [pytorch][PR] Fix batch norm multiplier init
Differential Revision:
D10203439

Original commit changeset: 999cc134a45e

fbshipit-source-id: 7871e384063db2f3788169338e9c965d5f8ac351
2018-11-09 00:37:05 -08:00
26751ce300 Fix the improper use of windows-native slashes (#13220)
Summary:
Trying to fix #12510.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13220

Differential Revision: D12994483

Pulled By: soumith

fbshipit-source-id: adbaf7e7a0a7cd1fc3ec947ddb209b55a9cda2a6
2018-11-08 21:09:44 -08:00
44fb23a2f5 Add ability to annotate jit types inside function (#13752)
Summary:
This adds torch.jit.annotate for annotating the type of an intermediate.
This is Py2/3 compatible, e.g.:

```
from torch.jit import annotate
from typing import List

@torch.jit.script
def foo():
  a = annotate(List[int], [])
```

This is needed to output valid python programs from our IR. It removes
the need for the empty list constructors.

A future patch can add support to the C++ parser and Python 3,
via desugaring:

```
a : int = b
a = annotate(int, b)
```

But this functionality is not required for serialization so is not added in this patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13752

Differential Revision: D12989885

Pulled By: zdevito

fbshipit-source-id: 161573a7352094543dc0d33a892f2a3b9103d847
2018-11-08 20:25:00 -08:00
5ae3b44255 Added HIP top_k operator (#13747)
Summary:
This PR contains changes for:
1. Adding HIP top_k operator in Caffe2
2. Added HIP equivalent definitions of GPUDefs and GPUScanUtils
3. Removing the top_k operator test from ROCm test ignore list
4. Bug fixes in related code in THC/THCAsmUtils.cuh

Differential Revision: D12986451

Pulled By: bddppq

fbshipit-source-id: 6d5241fb674eaeb7cde42166426ac88043b83504
2018-11-08 20:14:53 -08:00
32b3fe8ce6 CircleCI: enable OSX jobs again (#13731)
Summary:
CircleCI now offers 60x OSX concurrency, which is 2x what we currently have in Jenkins. This should help alleviate the OSX CI wait time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13731

Differential Revision: D12993737

Pulled By: yf225

fbshipit-source-id: f475ad9a1d031eda95b7cacdaf52f31fbb2f4f93
2018-11-08 20:09:05 -08:00
2ee4ef5290 Change all namespace fbgemm2 in the new fbgemm2 to namespace fbgemm (#13740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13740

We would like to rename the old fbgemm to “fbgemm0”, and the new fbgemm2 to “fbgemm”:

This DIFF changes all namespace fbgemm2 to namespace fbgemm.

The purpose is to avoid the confusion of "fbgemm2" when we release our FBGEMM open source.

Reviewed By: jspark1105

Differential Revision: D12850449

fbshipit-source-id: 08cc47864b157e36fbceddb7a10bf26218c67bd8
2018-11-08 19:59:12 -08:00
55964abb11 Change all namespace fbgemm in the old fbgemm to namespace fbgemm0 (#13701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13701

We would like to rename the old fbgemm to “fbgemm0”, and the new fbgemm2 to “fbgemm”:

This DIFF changes all namespace fbgemm to namespace fbgemm0.

Reviewed By: jspark1105

Differential Revision: D12848727

fbshipit-source-id: 47935e9e2c4714a7ce1bfc3f7e4d6a334130132e
2018-11-08 19:59:10 -08:00
a8e303dc46 change USE_MKLDNN default from ON (from #13303) to OFF for ppc64le (#13759)
Summary:
MKLDNN is not supported on ppc64le; change USE_MKLDNN to OFF for ppc64le.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13759

Differential Revision: D12993121

Pulled By: soumith

fbshipit-source-id: 539d5cfcff2c03b59fa71e10b52fac333a64c381
2018-11-08 19:33:39 -08:00
dd3f52fbe6 Remove _th_ndimension, which doesn't actually do anything. (#13723)
Summary:
Tensor.ndimension is hardcoded.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13723

Reviewed By: ezyang

Differential Revision: D12979461

Pulled By: gchanan

fbshipit-source-id: b95251b74a7b96ebcce2331f847873216968124d
2018-11-08 19:29:59 -08:00
c9be135bb9 Fix batch norm multiplier init (#12325)
Summary:
Fixes #12259
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12325

Differential Revision: D10203439

Pulled By: SsnL

fbshipit-source-id: 999cc134a45e2554313adb7eb93ee98e1f84335f
2018-11-08 19:00:00 -08:00
42001e7c17 Fix clang-tidy for Python2 (#13735)
Summary:
`clang_tidy.py` doesn't run with Python2 right now. Needs a minor fix

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13735

Differential Revision: D12990613

Pulled By: goldsborough

fbshipit-source-id: ad19b229a14188fd048dde198a7f4c3483aeff95
2018-11-08 17:57:08 -08:00
89b54229b1 Make _th_unfold and _th_view into functions, from methods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13724

Reviewed By: ezyang

Differential Revision: D12979865

Pulled By: gchanan

fbshipit-source-id: 92462198f3c51664f7973c142956774d88d831ca
2018-11-08 16:36:55 -08:00
00e752a46e Move cpu copy to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13347

Reviewed By: ezyang

Differential Revision: D12850691

fbshipit-source-id: d72577efb0ccb6df69e33f0c0a94c9f71937ccf8
2018-11-08 15:56:41 -08:00
51f58f0990 Fix typo in CTC loss doc comments. (#13727)
Summary:
`target_lenghts` -> `target_lengths`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13727

Differential Revision: D12981582

Pulled By: zou3519

fbshipit-source-id: e5e02b26cf3030a91494655ff863273333cc4133
2018-11-08 14:50:48 -08:00
bff931a10d implement concatenation of sparse tensors (#13577)
Summary:
With this change applied, `torch.cat` works for sparse tensors.

The algorithm is just to concatenate the values and give the new values the proper indices: the same as their old indices in every dimension except the catted dimension, where each index is its old index plus the sum of the sizes of every previous tensor along that dimension.

This is my first time contributing to PyTorch so please feel free to tell me if this approach seems totally wrong.

Coming next: `torch.stack` for sparse tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13577

Differential Revision: D12980948

Pulled By: umanwizard

fbshipit-source-id: 51ebdafee7fcd56d9762dcae9ebe5b4ab8e1dd6b
2018-11-08 14:15:30 -08:00
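A small demonstration of the algorithm described above: values are concatenated and indices along the catted dimension are offset by the sizes of the preceding tensors.

```python
import torch

s1 = torch.sparse_coo_tensor([[0, 1], [0, 1]], [1.0, 2.0], (2, 2))
s2 = torch.sparse_coo_tensor([[0], [1]], [3.0], (2, 2))

out = torch.cat([s1, s2], dim=0).coalesce()  # a (4, 2) sparse tensor
print(out.indices())  # s2's row index 0 becomes 2 (offset by s1.size(0))
print(out.values())   # tensor([1., 2., 3.])
```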
65ff84b49e Catch error by reference in module.cpp (#13743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13743

"catch by reference, throw by value"

Catching the polymorphic type std::bad_weak_ptr by value was an error earlier.

Reviewed By: goldsborough

Differential Revision: D12982626

fbshipit-source-id: 0ff22c0352acc7a94078ce6d5b2a4e56fee75be5
2018-11-08 13:49:21 -08:00
8a5869a3f7 Move function_schema to aten/core (#13729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13729

final move to expose function_schema to caffe2

Differential Revision: D12981563

fbshipit-source-id: e4f7fa611a2498a96c27dfa8bfd18e10ad781c10
2018-11-08 13:28:37 -08:00
85bde3801b Tracer now records Python variable names (#13441)
Summary:
This is probably slow, but it should make the traces more understandable and make debugging easier. Any suggestions for how to make it faster (i.e. make it so we don't have to traverse all of locals() and globals()) would be appreciated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13441

Differential Revision: D12879763

Pulled By: jamesr66a

fbshipit-source-id: b84133dc2ef9ca6cfbfaf2e3f9106784cc42951e
2018-11-08 13:08:42 -08:00
64a910bac7 Remove unnecessary tools/ qualification. (#13706)
Summary:
H/t kalisp for pointing it out

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13706

Differential Revision: D12983983

Pulled By: ezyang

fbshipit-source-id: 6a43cdde142fe64550121b16716f206e7c4d68d6
2018-11-08 12:55:19 -08:00
4fadf571fd handle flat rolling (no dim specified) T36264909 (#13588)
Summary:
Update roll to behave like numpy.roll when the dimension to roll is not specified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13588

Differential Revision: D12964295

Pulled By: nairbv

fbshipit-source-id: de9cdea1a937773033f081f8c1505a40e4e08bc1
2018-11-08 12:39:35 -08:00
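The numpy-style flat-roll semantics this adds, sketched:

```python
import torch

t = torch.arange(6).reshape(2, 3)
# With no dim given, the tensor is rolled as if flattened and then
# restored to its original shape, matching numpy.roll.
print(torch.roll(t, 1))
# tensor([[5, 0, 1],
#         [2, 3, 4]])
print(torch.roll(t, 1, dims=1))  # rolling an explicit dimension instead
```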
59d021b63a Fix nn threshold test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13734

Differential Revision: D12983358

Pulled By: driazati

fbshipit-source-id: 6db30b8bbc8e34c6e01f678724dfca9555a86177
2018-11-08 12:31:39 -08:00
0a090fe60a Fix torch.dist for infinity, zero and minus infinity norms (#13713)
Summary: Fixes #13559

Differential Revision: D12981556

Pulled By: zou3519

fbshipit-source-id: 99e86abab3ca045257374a9212ca24e7ca59fe9d
2018-11-08 12:03:07 -08:00
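A quick check of the repaired norms (a hedged sketch; per the fix above, `p` may be `inf`, `-inf`, or `0`):

```python
import torch

a = torch.tensor([1.0, -4.0, 2.0])
b = torch.tensor([1.0, 1.0, 1.0])
# a - b == tensor([0., -5., 1.])

print(torch.dist(a, b, float('inf')))   # 5.0  (max |a_i - b_i|)
print(torch.dist(a, b, float('-inf')))  # 0.0  (min |a_i - b_i|)
print(torch.dist(a, b, 0))              # 2.0  (count of nonzero diffs)
```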
a92ff57a4d update range doc (#13730)
Summary:
Update range documentation to show that we don't support start or increment parameters
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13730

Differential Revision: D12982016

Pulled By: eellison

fbshipit-source-id: cc1462fc1af547ae80c6d3b87999b7528bade8af
2018-11-08 11:40:52 -08:00
869ef71343 AsyncNet: option for time based tracing and trace path (#13440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13440

Time based tracing is easier to look at when multiple nets are running asynchronously.
This diff also adds an option to change the path to dump trace files.

Reviewed By: aazzolini, ilia-cher

Differential Revision: D12479259

fbshipit-source-id: 94d379634ba7b90c111c92b1136ffa4226b8bb8c
2018-11-08 11:34:34 -08:00
556ff8e7b7 Add builtins for size() and list with defaults (#13639)
Summary:
* `aten::size()` to match `torch.Tensor.size`
* `aten::list_with_default` for semantics of `torch.nn.modules.utils.list_with_default`
* converts `adaptive_avg_pool2d` and `adaptive_avg_pool3d`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13639

Differential Revision: D12954670

Pulled By: driazati

fbshipit-source-id: 68c30af0efc02c60af5fb8c9715b2435cc01a0d9
2018-11-08 11:26:35 -08:00
d01cb70497 build with mkl-dnn by default (#13303)
Summary:
build with mkl-dnn by default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13303

Reviewed By: yinghai

Differential Revision: D12979633

Pulled By: orionr

fbshipit-source-id: 00d23fa27c0d13e82f7e5acb3ebd00ed7ba1d5dc
2018-11-08 11:18:27 -08:00
8581d3ec67 Allow blacklist ops in onnxifi transform
Differential Revision: D12945523

fbshipit-source-id: cf5055652591bd1dd8d4be92b7fd6a40a0764536
2018-11-08 09:59:03 -08:00
fd9aaa6b79 Fix linking errors on Windows (#13100)
Summary:
1. Removes the flag "/FORCE:UNRESOLVED" that shouldn't be used.
2. Fix the code logic for ONNX_BUILD_MAIN_LIBS on Windows
3. Add a patch for protobuf using CMake
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13100

Differential Revision: D12978950

Pulled By: orionr

fbshipit-source-id: db9eb8136acf5712cfb5a24ed228b7934d873331
2018-11-08 09:54:09 -08:00
3e877a70e3 Enable unused-private-field warning (#13450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13450

Pull Request resolved: https://github.com/facebook/react-native/pull/22065

This diff enables -Wunused-private-field clang warning for Android builds and fixes all broken targets.

Reviewed By: gkmhub

Differential Revision: D12881793

fbshipit-source-id: 515555661e137be9e7b20eac9b5bdcb549d6a094
2018-11-08 09:23:11 -08:00
df022f8078 Disable CopyFrom src with uninitialized storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12692

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10392295

fbshipit-source-id: 3a37173b03e76862ec421e0b6d0b0e322b2749b5
2018-11-08 07:45:42 -08:00
4472ad3b2f Move functional _Reduction to its own module (#13401)
Summary:
To support `_Reduction` in the JIT, this PR moves it out to a new file so that it goes through the paths for Python modules in the script compiler, and converts `F.ctc_loss` to weak script.

Depends on #13484 for saving rng state
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13401

Differential Revision: D12868501

Pulled By: driazati

fbshipit-source-id: 23cec0fb135744578c73e31ac825e238db495d27
2018-11-08 01:04:10 -08:00
de41d1ae0b Enable junk fill for the default CPU allocator (#13377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13377

* Enable junk fill for the default CPU allocator. The first diff only enables this for the tests. A second diff will change the default of zero-fill to false.
* Fix tests to use the 64-bit counters that IterOp and LearningRateOp demand.
* Fix kernels that use uninitialized memory.

Reviewed By: salexspb

Differential Revision: D10866512

fbshipit-source-id: 17860e77e63a203edf46d0da0335608f77884821
2018-11-08 00:02:37 -08:00
21991c05a9 Support assignment to subscripted lhs expr (#13486)
Summary:
Support things like `foo[0] = bar` in script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13486

Differential Revision: D12964550

Pulled By: suo

fbshipit-source-id: 3dda8ffd683d1b045787c65bfa0c7d43b0455658
2018-11-07 23:07:57 -08:00
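The now-supported pattern, as a minimal script function (annotations written against a current torch.jit API):

```python
import torch
from typing import List

@torch.jit.script
def set_first(xs: List[int], value: int) -> List[int]:
    xs[0] = value  # assignment to a subscripted lhs now compiles
    return xs

print(set_first([1, 2, 3], 9))  # [9, 2, 3]
```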
411d89ca64 Fix the bug in dispatch_to when calling cpu() (#13700)
Summary:
When we added `to` in #13146, we did not emit the cast correctly in one of the dispatch overloads; as a result, when we call .cpu(), the dtype would always be the default float type, which is wrong.

CC jamesr66a eellison
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13700

Differential Revision: D12968699

Pulled By: wanchaol

fbshipit-source-id: c1aaf2bf6a163643ce5360797da61c68271d8bf8
2018-11-07 22:57:35 -08:00
90ea61800f operators/quantized/server -> quantization/server (#13660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13660

Any change in a server-side quantized operator was triggering ios-sanity-check with more than 5 hours of testing time. I suspect this was because the operator code was synced with the xplat directory. This diff moves server-side quantized operators to caffe2/caffe2/quantization/server to avoid this issue.

Reviewed By: hx89

Differential Revision: D12955420

fbshipit-source-id: b6c824b9de5e2a696f8c748e1b2c77d81d46746b
2018-11-07 22:54:13 -08:00
2448a83d30 Give broadcast_coalesced tensors different version counters (#13594)
Summary:
In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, this base flat tensor is not exposed and they don't share any memory locations, so this is not necessary. Furthermore, it can cause problems, e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` but the other is needed in backward, autograd engine will complain.

Fixing the bug discovered at https://github.com/pytorch/pytorch/pull/13350#issuecomment-436011370

edit: This is a very real problem. E.g., consider using Spectral Norm + Batch Norm together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13594

Differential Revision: D12967311

Pulled By: SsnL

fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5
2018-11-07 21:49:35 -08:00
5dd153b1c2 speed up torch.sparse_mask() cpu kernel (#13290)
Summary:
- `sparse_mask(D, S)` is useful to implement backward for `sparse_addmm()`
- previous `sparse_mask(D, S)` cpu kernel is not parallelized
- this PR speed up the cpu kernel for two separated cases:
  - `D.dim == S.sparse_dim`: simply parallelize the kernel
  - `D.dim > S.sparse_dim`: simply use CUDA kernel implementation
- performance:

`D.dim == S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
               torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz)
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)

>>> %timeit D.sparse_mask(S)

======= before change =======
6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

======= after change =======
333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

`D.dim > S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000, 2, 2]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
               torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
%timeit D.sparse_mask(S)

======= before change =======
495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

======= after change =======
594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13290

Differential Revision: D12878336

Pulled By: weiyangfb

fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37
2018-11-07 20:02:17 -08:00
6bfce16873 fix flip() shape bug in CPU (#13344)
Summary:
- a workaround for #13292; a complete fix requires investigating the root cause when using advanced indexing
- this PR brings the `flip()` CUDA implementation to the CPU kernel
- with this change:
```
>>> t = torch.randn(1, 3, 4, 5)
>> t.flip(1, 3).shape
torch.Size([1, 3, 4, 5])
```
- performance:
```
====== with this PR ======
>>> a = torch.randn(1000, 1000)
>>> %timeit -r 100 a.flip(0, 1)
1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)

====== Perf at previous PR #7873 ======
100 loops, best of 3: 11 ms per loop
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13344

Differential Revision: D12968003

Pulled By: weiyangfb

fbshipit-source-id: 66f434049d143a0575a35b5c983b3e0577a1a28d
2018-11-07 19:53:49 -08:00
1616587540 Redo jit/type and utils/functional to ATen/core (#13455)
Summary:
This is a redo of the previous move which broke OS X and Windows tests -- RTTI seemed to be broken
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13455

Differential Revision: D12883775

Pulled By: bwasti

fbshipit-source-id: 2b6c65e8150e6f89624c6ee99c389335c6fb4bb8
2018-11-07 18:11:29 -08:00
87b47ff850 Remove .data() use in C++ frontend (#13675)
Summary:
Removes the last uses of `.data()` in implementation code of the C++ frontend.

CC yf225

ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13675

Differential Revision: D12966061

Pulled By: goldsborough

fbshipit-source-id: fbc0c83c3ba56598ff853bc7b1ddf9005fdd9c41
2018-11-07 17:30:29 -08:00
eb88098e11 Kill c10d/private/CUDAUtils.hpp (#13681)
Summary:
Use AT_CUDA_CHECK instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13681

Differential Revision: D12966607

Pulled By: teng-li

fbshipit-source-id: da0431f588969791a19519368edb909b9c3dc5ab
2018-11-07 17:09:08 -08:00
c8bb665b5d Fix a bug in tuple assignment (#13656)
Summary:
Previously, we did not distinguish between `a = b` (simple assignment),
and `a, = b` (tuple destructuring of a singleton tuple).

The second case would fail in the string frontend, and would not unpack
in the python frontend. This patch fixes both issues and also cleans up
the error reporting for unexpected expressions on the LHS.

Will likely conflict with #13486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13656

Differential Revision: D12964566

Pulled By: zdevito

fbshipit-source-id: 992b19e5068aef59a78cd23cb0e59a9eeb7755d1
2018-11-07 16:44:22 -08:00
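The two forms the patch distinguishes, sketched as script functions:

```python
import torch
from typing import Tuple

@torch.jit.script
def simple_assign(b: int) -> int:
    a = b   # plain assignment
    return a

@torch.jit.script
def unpack_singleton(b: Tuple[int]) -> int:
    a, = b  # destructuring of a one-element tuple
    return a

print(simple_assign(3), unpack_singleton((4,)))  # 3 4
```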
9900a8dd89 Remove outdated css and font files in html docs (#13699)
Summary:
The stylesheet at docs/source/_static/css/pytorch_theme.css is no longer necessary for the html docs build. The new html docs theme styles are located at https://github.com/pytorch/pytorch_sphinx_theme.

The Lato font is also no longer used in the new theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13699

Differential Revision: D12967448

Pulled By: soumith

fbshipit-source-id: 7de205162a61e3acacfd8b499660d328ff3812ec
2018-11-07 16:31:28 -08:00
7978ba45ba Update path in CI script to access ninja (#13646)
Summary:
We weren't running C++ extensions tests in CI.
Also, let's error hard when `ninja` is not available instead of skipping C++ extensions tests.

Fixes https://github.com/pytorch/pytorch/issues/13622

ezyang soumith yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13646

Differential Revision: D12961468

Pulled By: goldsborough

fbshipit-source-id: 917c8a14063dc40e6ab79a0f7d345ae2d3566ba4
2018-11-07 14:31:29 -08:00
bf9b5dffbf ensure flake8 ignores non-conforming python files generated by build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13680

Differential Revision: D12964332

Pulled By: nairbv

fbshipit-source-id: a28358c265fd305f5f8cf893d25d34d6b5929210
2018-11-07 14:27:41 -08:00
d4f9dbfa66 Remove catch check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13677

Differential Revision: D12961992

Pulled By: goldsborough

fbshipit-source-id: 1f0207704d05ac67ed1ec1502bec617c845d9f79
2018-11-07 12:27:15 -08:00
dceec1de30 Distributed Data Parallel documentation for PT1 release (#13657)
Summary:
This should fix https://github.com/pytorch/pytorch/issues/12604

Run `make html` and look through the HTML pages to make sure that everything looks good.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13657

Reviewed By: calebho

Differential Revision: D12954250

Pulled By: teng-li

fbshipit-source-id: 40e1925ec0cdce5e6a1d8ba29537937da8ef9194
2018-11-07 12:11:57 -08:00
216c5d0bdc caching packed matrix (#13626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13626

Reuse pack matrix of weights.

Reviewed By: dskhudia

Differential Revision: D12916630

fbshipit-source-id: f0ec5734f5506134a79d9c0601146488e15c3afe
2018-11-07 12:03:39 -08:00
94fe8faa00 new QNNPACK dwconv support and tests (#13652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13652

new dwconv 3x3 5x5 tests provided

Reviewed By: Maratyszcza

Differential Revision: D12951866

fbshipit-source-id: f853bb7412a724de594ed36c6b2b69ec268d6464
2018-11-07 12:03:35 -08:00
1413dd4bfc Added the finer bucketing option for DDP (#13607)
Summary:
We only need this for backward; for the forward cast, the non-fine-grained bucketing should be better since it's sequential anyway.

Testing is fully covered by the c10d tests; the bucket size was reduced to make bucketing actually happen in the c10d test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13607

Differential Revision: D12944515

Pulled By: teng-li

fbshipit-source-id: d982e8dca2874c91d39b30b73a85bfbeb768c508
2018-11-07 12:00:55 -08:00
044d00516c Rename DistBackend -> Backend (#11830)
Summary:
Also add docs for get_backend, Backend, and reduce_op

fixes #11803

cc pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11830

Differential Revision: D9927991

Pulled By: SsnL

fbshipit-source-id: a2ffb70826241ba84264f36f2cb173e00b19af48
2018-11-07 11:58:12 -08:00
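Usage after the rename, sketched (a single-process group just to show the API; `Backend` replaces the old `DistBackend` constants):

```python
import torch.distributed as dist

dist.init_process_group(backend=dist.Backend.GLOO,
                        init_method="tcp://127.0.0.1:23456",
                        rank=0, world_size=1)
print(dist.get_backend())  # 'gloo'
dist.destroy_process_group()
```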
afc7dbd586 Hipify caffe2/utils/math_gpu.cu (#13521)
Summary:
This PR adds caffe2/utils/math_gpu.cu to pyHipify

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13521

Differential Revision: D12954843

Pulled By: bddppq

fbshipit-source-id: a2bf367da07e49cb7807ba6876b42d0733fc8205
2018-11-07 11:34:15 -08:00
0f59dcb317 Remove partially initialized Tensor + CopyFrom (#13629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13629

Previously we had a Tensor with an initialized storage (and therefore a known device_type), and
then we would call CopyFrom on it to initialize the sizes and data.

We want to eliminate partially initialized Tensors by replacing the pattern of calling CopyFrom on a partially initialized Tensor with either an undefined Tensor plus an initialization API (1)(3), or by combining all the initialization into the same step (2).

1. member variable initialization + CopyFrom
Previously we had a tensor initialized with a device_type and then used CopyFrom to populate the content; now we remove the partial initialization by making the original member variable an undefined Tensor and using ReinitializeFrom to copy from another Tensor.

2. Output + CopyFrom
Previously, we first got a tensor with a device_type and then called CopyFrom from another Tensor.
We changed this by combining the two operations into OperatorBase::OutputTensor.

3. Output + custom functions
An example can be found in the TransformGPU function.
In this case we move the part that initializes the tensor outside of the function, and do that explicitly outside so that we could reuse the Output functions to make a fully initialized Tensor.

Note that to keep the original semantics, both of the APIs have a caching effect based on device_type, which means we only create a Tensor object when the device_type does not match or the Tensor is undefined; otherwise, we reuse the original Tensor object.

Reviewed By: dzhulgakov

Differential Revision: D12848855

fbshipit-source-id: 37bb4ddc1698ebea533b73006eeb1218faa8ddf8
2018-11-07 11:31:03 -08:00
6c8ac50753 Fix exception catching to catch c10::Error properly (#13665)
Summary:
In particular, this was breaking the logic that lets the cuDNN algorithm selection fall back to a less memory-hungry algorithm if the selected one OOMs when creating the workspace.
c10::Error is a subclass of `std::exception`, not of `std::runtime_error`.

I removed `runtime_error` in all catch sites in our code and replaced it with `const std::exception`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13665

Differential Revision: D12958396

Pulled By: soumith

fbshipit-source-id: af557efd9887b013140113d3067de157ffcf8465
2018-11-07 11:22:48 -08:00
674e23bbab Fixed a small error in docstrings for ConvTranspose3d (#13668)
Summary:
In the example for ConvTranspose3d, the docstring had "Conv3d" instead of "ConvTranspose3d" in one instance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13668

Differential Revision: D12958372

Pulled By: soumith

fbshipit-source-id: 5ec901e20b90f4eed2bf04c5b417183ec2096447
2018-11-07 11:22:46 -08:00
2fe9e3a207 Remove catch from caffe2/.gitmodules
Summary: Step 3 to remove catch submodule from PyTorch

Reviewed By: ezyang

Differential Revision: D12959020

fbshipit-source-id: 49347de8b027433d422b653dd854ad76349d0e25
2018-11-07 11:10:09 -08:00
e7652cfb40 Remove caffe2/submodules/catch-rev.txt
Summary: Step 1 to remove catch submodule from PyTorch

Reviewed By: ezyang

Differential Revision: D12958997

fbshipit-source-id: ab4b9e103ac83ad490375440722f95247eb1ac7f
2018-11-07 11:10:07 -08:00
ab0c72ab6f Replace cursors with OrderedDict (#13427)
Summary:
This is a pre-cursor diff to Python <-> C++ frontend integration -- I have a follow-up PR coming for that. This PR changes the C++ frontend module interface to replace the custom "cursor"s I introduced some time ago with `OrderedDict`. I introduced cursors at the time as a convenient way of applying functions and query operations to a module's parameters, buffers and submodules, allowing things like `module.parameters().map(my_func)`. However, I noticed that (1) this functionality is easily implementable on top of a regular data structure and (2) more importantly, using OrderedDicts is much, much easier for Python integration. This is especially true given that ScriptModule today also uses OrderedDict. Since C++ frontend modules and ScriptModules will soon share as many implementation details as possible, it is overall the best move to ditch the custom cursor data structure and pervasively use OrderedDict everywhere.

For this I did:

1. Changed the C++ frontend module interface to more closely match the Python one by providing `parameters()`, `named_parameters()` and other methods Python provides. This is very important for the following diff which binds these into Python for inter-op with Python modules.
2. In lieu of the `Cursor::apply()` method I added `nn::Module::apply`. This again is one more unifying step between Python and C++, since Python modules have an apply function too.
3. Deleted all uses of Cursor.
4. Tidied and beefed up the `OrderedDict` class. In particular, I made `OrderedDict::Item` store an `std::pair` under the hood, because that is trivial to bind into Python and saved me a lot of headaches. `key` and `value` become methods instead of fields, which they should have been from the very start anyway, because it allows exactly these kinds of changes, as per the usual good software engineering principle of encapsulation.
5. Added many tests for the OrderedDict use in `nn::Module`.

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13427

Differential Revision: D12894092

Pulled By: goldsborough

fbshipit-source-id: 715770c95a9643753a1db26d7f9da9a78619a15d
2018-11-07 11:10:05 -08:00
b652c2de50 Rename dim(i) -> size(i)
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim(i)->size(i)): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935287

fbshipit-source-id: 700050640c756d7064c8db4fd50fe6a1421a61ef
2018-11-07 11:07:26 -08:00
4326873330 Skip std and var tests in pytorch rocm CI (#13662)
Summary:
https://github.com/pytorch/pytorch/pull/13435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13662

Reviewed By: soumith

Differential Revision: D12958408

Pulled By: bddppq

fbshipit-source-id: 170b59769fbed149c9246b6549c62160e27d2404
2018-11-07 10:10:25 -08:00
9403eddce4 Fix tracing bug for custom ops (#13654)
Summary:
Due to a logic bug, tracing is broken for custom ops. Unfortunately, there also weren't any tests for tracing custom ops.

The fix is a single line change of moving `pop(stack, std::get<Is>(arguments)...);` before `node = getTracedNode<Is...>(schema, arguments);`. Other changes are added tests and improved commenting/formatting.

Fixes https://github.com/pytorch/pytorch/issues/13564

CC fmassa

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13654

Differential Revision: D12952887

Pulled By: goldsborough

fbshipit-source-id: 87d256576f787c58e8d8f5c13a0fecd0ec62a602
2018-11-07 09:22:44 -08:00
edd2e38023 Clean up a couple of items in the C2 test scaffolding (WIP) (#7847)
Summary:
- Py3 compatibility
- utility functions refactoring
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7847

Reviewed By: pietern

Differential Revision: D9355096

Pulled By: huitseeker

fbshipit-source-id: 8e78faa937488c5299714f78075d7cadb1b2490c
2018-11-07 09:16:13 -08:00
10fdcf748a swap with empty vector to force deallocation (#13625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13625

v.clear() doesn't guarantee deallocation and it was causing memory capacity issues

Reviewed By: jianyuh

Differential Revision: D12941938

fbshipit-source-id: b9c80828b122a44e883b32f43b5d8dfb36065773
2018-11-07 08:33:34 -08:00
398d310bac changes for cumsum/cumprod backward not depending on TH. (#13570)
Summary:
This is a subset of https://github.com/pytorch/pytorch/pull/13467 which is failing with ASAN errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13570

Differential Revision: D12922619

Pulled By: gchanan

fbshipit-source-id: 007470243d8aee719ab9441abf29f06b4c84d59f
2018-11-07 07:45:33 -08:00
a228a95b94 Rename ndim() -> dim() - 1/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935693

fbshipit-source-id: f24f1c10cd5bbb9e63cda0a0da989e6e3766380a
2018-11-07 07:30:11 -08:00
4794da03f8 Rename ndim() -> dim() - 4/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935774

fbshipit-source-id: 2a7cb7da534da73b61f01eb0ff124abf193309ee
2018-11-07 07:30:09 -08:00
57ec8f111f Rename ndim() -> dim() - 6/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935827

fbshipit-source-id: 80ecb034c243dbfd267b9f131cee9d7afd5ef063
2018-11-07 07:27:45 -08:00
e60a7c2c88 codemod tensor.type().is_cuda(), tensor.type().is_sparse() (#13590)
Summary:
Followup to #12841

Changed these to not require type dispatch:
tensor.type().is_cuda() -> tensor.is_cuda()
tensor.type().is_sparse() -> tensor.is_sparse()
isVariable(tensor.type()) -> tensor.is_variable()

This probably does not affect performance
very much in most cases but it is nice to have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13590

Reviewed By: ezyang

Differential Revision: D12929301

Pulled By: zou3519

fbshipit-source-id: 8ac5c6200c579dd7a44fb4ee58fc9bb170feb1d7
2018-11-07 07:27:42 -08:00
e70321ed9e Remove unnecessary type dispatches from Variable::Impl ctor (#13630)
Summary:
This should improve the performance of wrapping a tensor in a Variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13630

Reviewed By: ezyang

Differential Revision: D12944960

Pulled By: zou3519

fbshipit-source-id: 89fa78a563e46a747d851a90ffd1b5cf3cd2d0d7
2018-11-07 07:27:40 -08:00
2ae8e46105 Rename ndim() -> dim() - 2/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935727

fbshipit-source-id: a0c306c8f451a671b80db54fef5aa091ed58bfe5
2018-11-07 07:25:20 -08:00
7341ab0a33 Fix range of target examples and JIT test case for CTC loss.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13644

Differential Revision: D12949733

Pulled By: gchanan

fbshipit-source-id: 1c4cacbb6a50d5002165bdd0a7881883db5c8249
2018-11-07 07:04:31 -08:00
a132a7d9ce Add autodiff support for a few additional operators (#13288)
Summary:
Added aten::{avg_pool2d, log_softmax, max_pool2d_with_indices, threshold},
enabled aten::{expand, view}.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13288

Differential Revision: D12954929

Pulled By: soumith

fbshipit-source-id: 6fba58af82cafbc7446705d8c8145cdeaf4954ca
2018-11-06 23:24:12 -08:00
a1ba29a2c0 Change to use json format to store disabled_features in hipify (#13595)
Summary:
Since json is a builtin module in Python (>= 2.6), this allows pyhipify
to be invoked without installing any extra dependencies.

petrex iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13595

Differential Revision: D12931045

Pulled By: bddppq

fbshipit-source-id: 31d68fb6e730fd9d11593550ca531423cb0596e9
2018-11-06 22:06:10 -08:00
7d64c9df39 Remove C2GEMMContext (#13443)
Summary:
C2GEMMContext is a remnant of the days when Int8 ops used gemmlowp.
It is no longer needed: the formerly gemmlowp-based ops use QNNPACK through the pthreadpool interface, and other ops (Int8Add, Int8ChannelShuffle) use the Caffe2 thread pool interface directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13443

Differential Revision: D12887773

Pulled By: Maratyszcza

fbshipit-source-id: bd2732e2c187b399c8a82efebdd244457720256b
2018-11-06 21:50:53 -08:00
dbc467545f Update weak script modules to match fns (#13631)
Summary:
Add weak modules for those that use weak script functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13631

Differential Revision: D12945328

Pulled By: driazati

fbshipit-source-id: 6cb235763bf5ab35c7b32e0f734f08d22418594f
2018-11-06 21:22:52 -08:00
14004cbef6 Native batch norm (#13263)
Summary:
- Move batch norm from TH(CU)NN to native
- Speedups in many cases (e.g. #12006) for CUDA due to new block/grid layout and Welford-type mean/variance calculations (the latter for training mode)
- It splits the forward kernel into two pieces and reuses the evaluation kernel for the transformation.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to maintain reasonable precision.

Compared to the ill-fated #12368
- I changed the CPU kernel to not call `.sum()` from within parallel for. This seemed to have caused the breakage (NaN-results) in TestModels.test_dcgan_netG (thank you houseroad for the repro, errors in assessment of the fix are my own)
- I updated the Half->Float upcasting in tensors to go through `t.type().scalarType()` instead of `t.dtype()`.
- I have merged master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13263

Differential Revision: D12946254

Pulled By: SsnL

fbshipit-source-id: 3bb717ee250fbccaf10afe73722996aa4713d10d
2018-11-06 20:05:54 -08:00
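The Welford-style single-pass accumulation mentioned above, sketched in plain Python (an illustration of the algorithm, not the actual CUDA kernel):

```python
def welford(xs):
    # Numerically stable one-pass mean/variance, the style of accumulation
    # the new training-mode CUDA kernel uses.
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return mean, m2 / n  # biased variance, as batch norm uses

print(welford([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)
```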
392ca1e59f Remove compileFunction (#13640)
Summary:
This finishes a TODO to get torch.jit.script to go through the same
pathway as methods, removing the need for forward_schema and
for compileFunction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13640

Differential Revision: D12949713

Pulled By: zdevito

fbshipit-source-id: 3d1a5f14910d97a68670a3fd416bdbfe457f621d
2018-11-06 19:37:06 -08:00
ce6edbfbd9 Fixed NCCL backend not being built (#13653)
Summary:
A regression caused by the earlier NCCL build refactoring.

CC fmassa

Fixing: https://github.com/facebookresearch/maskrcnn-benchmark/issues/122
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13653

Differential Revision: D12952555

Pulled By: teng-li

fbshipit-source-id: b42e2a88fff83c9ddd58eeb33e933f1f59f51c52
2018-11-06 19:33:49 -08:00
2cd912bcc2 Fix more spectral norm bugs (#13350)
Summary:
Problems with SN and DP after #12671 :
1. In eval mode, `weight_orig` is not getting the correct gradient (#12737).

    Fix: keep `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)` even in eval.

2. In training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them stop sharing storage. So in `eval` the weight used is wrong.

    Fix: Make `weight` not a buffer anymore and always calculate it as above.

3. #12671 changed SN to update `u` in-place to make DP work correctly, but then it breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the 1st forward are changed in the 2nd forward.

    Fix: This PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly and we should really have a better interface for spectral_norm. But for the purpose of fixing this issue, I am making this patch. Even if we get a better interface, a BC mechanism for loading legacy state_dicts will still be needed.

cc crcrpar
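Below is a minimal sketch (not the library code; the names and the 2D reshape are assumptions for illustration) of the recomputation these fixes converge on:

```python
import torch

def spectral_norm_weight(weight_orig, u, v):
    # Clone u and v so that a second forward pass does not clobber the
    # vectors needed to backprop the first one (fix 3).
    u, v = u.clone(), v.clone()
    w_mat = weight_orig.view(weight_orig.size(0), -1)
    # sigma = u^T W v; always recompute W = W_orig / sigma,
    # even in eval mode (fixes 1 and 2).
    sigma = torch.dot(u, torch.mv(w_mat, v))
    return weight_orig / sigma
```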
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350

Differential Revision: D12931044

Pulled By: SsnL

fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
2018-11-06 19:16:13 -08:00
eb29485ed8 Support customized timeout when fetching blob from KVStore (#13582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13582

Worker nodes sometimes hit timeout failures when getting the session_id blob from Zeus, due to delays in the master node setting the blob.
This diff adds the flexibility to specify a longer timeout for getting blobs from Zeus.

Reviewed By: pietern

Differential Revision: D12926156

fbshipit-source-id: b1a4d1d9cf7de084785bfa4a8a0cd3cfd095ba5c
2018-11-06 18:54:56 -08:00
bc1de6ae7d CircleCI: disable output buffering to better locate test timeout (#13516)
Summary:
An ASAN test timeout such as https://circleci.com/gh/pytorch/pytorch/165649?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link doesn't actually show where the timeout happened, because of bash output buffering. This PR turns off the buffering to better surface the error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13516

Differential Revision: D12952513

Pulled By: yf225

fbshipit-source-id: 48058c021470e5aa7a2246e1fcd974cfabf5df54
2018-11-06 18:14:26 -08:00
619c2f8b44 small fixes regarding docu of torch tensors (#13635)
Summary:
Removed duplicate doc args block.
Made statements involving 'each element' more precise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13635

Differential Revision: D12946987

Pulled By: soumith

fbshipit-source-id: a17da92f69086b530ff769cf4662ae29843fd188
2018-11-06 17:24:42 -08:00
508f676c50 Rename ndim() -> dim() - 5/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: salexspb

Differential Revision: D12935787

fbshipit-source-id: 303d71d3eb050789af2ab9575e5dcc48f6037086
2018-11-06 16:38:35 -08:00
6cf450744f propagate python op error msg (#13624)
Summary:
Correctly propagate the error msg from a python op to the JIT interpreter. In the interpreter we wrap the exception and re-throw it as a Runtime Exception. Potentially in a future diff we can throw the same type of python exception as was originally thrown.

Fix for https://github.com/pytorch/pytorch/issues/13560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13624

Differential Revision: D12948756

Pulled By: eellison

fbshipit-source-id: 94cdf4c376143c5e40dcb9716aefb3c1e2d957db
2018-11-06 16:28:39 -08:00
feff7be294 Remove RTTI from jit/type.h (#13591)
Summary:
RTTI can't be used on Android, so this is needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13591

Differential Revision: D12914402

Pulled By: bwasti

fbshipit-source-id: be8c8c679bb20c7faaa7e62cd92854cedc19cb3a
2018-11-06 16:19:52 -08:00
18de330e86 CMake integration for int8 server operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13558

Reviewed By: Maratyszcza

Differential Revision: D12945460

Pulled By: dskhudia

fbshipit-source-id: 1a91027b305fd6af77eebd9a4fad092a12f54712
2018-11-06 15:45:15 -08:00
76c1b5cd79 Fix overflow error in stats_put_ops
Summary:
I was hitting this error:

caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'

So, the assignment from int64_t to float loses some precision and because of that we overflow.

Reproduced this issue with this diff D12945013

Reviewed By: mlappelbaum, jdshi-fb

Differential Revision: D12927086

fbshipit-source-id: 7eae7fe25ab49d5ac15279335bd5b1fa89d6e683
2018-11-06 15:41:51 -08:00
e73943e488 Remove partially initialized Tensor + ShareData (#13522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13522

Currently Tensor is a shared pointer to the underlying implementation rather than a value; copying
the pointer shares the underlying TensorImpl, so ShareData probably doesn't make sense anymore.

Reviewed By: dzhulgakov

Differential Revision: D12871708

fbshipit-source-id: d3773c66b7ed0bf1c37e886f69f59aec158b216b
2018-11-06 15:23:41 -08:00
Jie
fbe3c3f57f (#13435)
Summary:
Moved torch.var and torch.std to use the THC reduction kernel; this greatly improves performance when computing variance over non-contiguous dimensions.

Resolving #13192
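A hedged illustration of the kind of call that benefits (requires a CUDA device):

```python
import torch

x = torch.randn(1024, 512, device='cuda').t()  # transpose -> non-contiguous
v = x.var(dim=1)  # reduction over a non-contiguous dimension
s = x.std(dim=1)
```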
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13435

Differential Revision: D12947137

Pulled By: soumith

fbshipit-source-id: c0a22cb799fa57e8fbed81c7dcb880666f461883
2018-11-06 14:42:26 -08:00
393ad6582d Use torch:: instead of at:: in all C++ APIs (#13523)
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoid bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.

Note that since we're just talking about typedefs, this change does not break any existing code.

Once this lands I will update stuff in `pytorch/tutorials` too.

zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523

Differential Revision: D12942787

Pulled By: goldsborough

fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
2018-11-06 14:32:25 -08:00
be424de869 Add torch.multiprocessing.spawn helper (#13518)
Summary:
This helper addresses a common pattern where one spawns N processes to
work on some common task (e.g. parallel preprocessing or multiple
training loops).

A straightforward approach is to use the multiprocessing API directly
and then consecutively call join on the resulting processes.

This pattern breaks down in the face of errors. If one of the
processes terminates with an exception or via some signal, and it is
not the first process that was launched, the join call on the first
process won't be affected. This helper seeks to solve this by waiting
on termination from any of the spawned processes. When any process
terminates with a non-zero exit status, it terminates the remaining
processes, and raises an exception in the parent process. If the
process terminated with an exception, it is propagated to the parent.
If the process terminated via a signal (e.g. SIGINT, SIGSEGV), this is
mentioned in the exception as well.

Requires Python >= 3.4.
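A minimal usage sketch (the worker body and process count are invented for illustration):

```python
import torch.multiprocessing as mp

def worker(rank, total):
    # Each spawned process receives its index as the first argument.
    print('process %d of %d' % (rank, total))

if __name__ == '__main__':
    nprocs = 4
    # spawn() waits on all processes; if any of them fails, the rest
    # are terminated and the error is raised in the parent process.
    mp.spawn(worker, args=(nprocs,), nprocs=nprocs)
```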
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13518

Reviewed By: orionr

Differential Revision: D12929045

Pulled By: pietern

fbshipit-source-id: 00df19fa16a568d1e22f37a2ba65677ab0cce3fd
2018-11-06 14:08:37 -08:00
056f2cd238 ATen/test/basic.cpp: Catch2Gtest (#12142)
Summary:
In #11846, we migrated all catch tests in Aten/test/ to use gtest, except for basic.cpp due to a GPU bug (valgrind related).
In this PR, we find out what the bug is and migrate the last piece of the aten catch tests to gtest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12142

Differential Revision: D12946980

Pulled By: zrphercule

fbshipit-source-id: cf3b21f23ddec3e363ac8ec4bdeb4bc4fe35f83b
2018-11-06 14:00:18 -08:00
06bfabf1f5 add tests to no-gtest
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13637

Differential Revision: D12946644

Pulled By: suo

fbshipit-source-id: 161ddab275d5315fc053030d0f4956a4529602b1
2018-11-06 13:46:07 -08:00
137150be88 add unwrap optional operator (#13599)
Summary:
Add a builtin to refine the type of Optional[T] -> T. This is a short-term solution to unblock porting of the standard library.
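A hedged sketch of the refinement, assuming the builtin is surfaced as `torch.jit._unwrap_optional` (the exact spelling may differ):

```python
import torch
from typing import Optional

@torch.jit.script
def add_one(x):
    # type: (Optional[int]) -> int
    # Refines Optional[int] -> int; fails at runtime if x is None.
    return torch.jit._unwrap_optional(x) + 1
```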
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13599

Reviewed By: driazati, wanchaol

Differential Revision: D12943193

Pulled By: eellison

fbshipit-source-id: 31c893a78d813313bbbc1d8212b5c04e403cfb4d
2018-11-06 11:54:56 -08:00
1906305c07 Consolidate argument checkers (#13623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13623

Moves the bulk of shared argument checkers in the gloo backend to Utils.hpp.

Reviewed By: teng-li

Differential Revision: D12934598

fbshipit-source-id: 7b80e67ccc3425f21498c30fbe7837af314f96f2
2018-11-06 11:52:38 -08:00
7ffa864953 Speed up tensor.options() by avoiding type dispatch (#13330)
Summary:
Also speeds up tensor.is_variable(), tensor.layout(), and
tensor.device(). This PR speeds up tensor.options() from 54ns to 17ns,
resulting in a comparable speedup in torch.as_strided performance:
https://gist.github.com/zou3519/7645262a4f89e237405857925bb872c3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13330

Differential Revision: D12847695

Pulled By: zou3519

fbshipit-source-id: 60b303671b0cce7b6140068c7f90c31d512643be
2018-11-06 11:39:28 -08:00
464dc31532 Add README to tools, delete defunct scripts. (#13621)
Summary:
Some extra documentation for other bits too.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13621

Differential Revision: D12943416

Pulled By: ezyang

fbshipit-source-id: c922995e420d38c2698ce59c5bf4ffa9eb68da83
2018-11-06 11:20:53 -08:00
6aee5488b5 correct omp dependency for mkl-dnn (#13449)
Summary:
The motivation of this PR is to force mkldnn to use the same omp version as the caffe2 framework,
while not changing other assumptions within mkldnn.

Previously, MKL_cmake_included was set in caffe2 in order to disable omp seeking in mkldnn.
But with that change, mkldnn had no chance to adapt to the mkl found by caffe2.
As a result, some mkl build flags were not set in mkldnn,
for example USE_MKL, USE_CBLAS, etc.

In this PR, we set MKLIOMP5LIB for mkldnn according to caffe2 and pass the mkl root path to mkldnn via MKLROOT. Then mkldnn is built as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13449

Differential Revision: D12899504

Pulled By: yinghai

fbshipit-source-id: 22a196bd00b4ef0a11d350a32c049304613edf52
2018-11-06 10:48:09 -08:00
a7ee632dff Various Test and build fixes (#13556)
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed

Differential Revision: D12918456

Pulled By: soumith

fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
2018-11-06 07:13:47 -08:00
9ca9469de6 mm backwards to not depend on TH. (#13575)
Summary:
This is a subset of https://github.com/pytorch/pytorch/pull/13476.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13575

Differential Revision: D12923473

Pulled By: gchanan

fbshipit-source-id: 8787808d2ab377cc535f69c3c63dcd671c72b7db
2018-11-06 06:47:44 -08:00
3c1d593a27 cumsum/cumprod derivatives not depending on TH. (#13579)
Summary:
This is identical to https://github.com/pytorch/pytorch/pull/13467 but doesn't include the tests in common_invocations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13579

Differential Revision: D12925404

Pulled By: gchanan

fbshipit-source-id: 0a52fd26b15c7e0bbdfec03948f3e6c849e65091
2018-11-06 06:42:01 -08:00
95ca66763d Add math functions overloaded over different numeric types for cuda and hip (#13602)
Summary:
petrex ashishfarmer rohithkrn iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13602

Reviewed By: dzhulgakov

Differential Revision: D12935797

Pulled By: bddppq

fbshipit-source-id: a49ec66fb60bfd947c63dd2133d431884df62235
2018-11-06 01:40:31 -08:00
d03c6ba50d Adding Fetching Real number representation
Summary: Adding Fetching Real number representation for int8 tensor in workpace.py

Reviewed By: harouwu

Differential Revision: D12936556

fbshipit-source-id: f8756a37bce21c93d44d52faf5da9c9bd6473f4a
2018-11-05 23:35:24 -08:00
3c32f897ca Rename ndim() -> dim() - 3/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: dzhulgakov

Differential Revision: D12935748

fbshipit-source-id: fccec04e28ec049789f772e70d691382cb8927e0
2018-11-05 23:21:40 -08:00
Jie
bbacd859ab Updating heuristics for cudnn persistent RNN (#13612)
Summary:
Modifying RNN heuristics to exclude GPUs with sm == 7.5 from using persistent RNN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13612

Differential Revision: D12937455

Pulled By: soumith

fbshipit-source-id: 5cdaea083d55383b85dbe6e5443f1b36e578e4f5
2018-11-05 21:35:44 -08:00
fc6a9a19ea Add torch._C._nn built-in, more weak fns (#13322)
Summary:
This PR adds functions defined in `torch._C._nn` as builtin functions (including inplace variants). This allows for the conversion of more functions to weak script

NB: many `torch.nn.functional` functions will have to be slightly rewritten to avoid early returns (as with `threshold` in this PR); see the sketch after the list below.

Converts these functions to weak script:
* `threshold`
* `relu`
* `hardtanh`
* `relu6`
* `elu`
* `selu`
* `celu`
* `leaky_relu`
* `rrelu`
* `tanh`
* `sigmoid`
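A simplified sketch of the early-return rewrite mentioned in the NB above (not the exact library code; the single-exit shape is the point):

```python
import torch

def threshold(input, threshold, value, inplace=False):
    # One exit point instead of early returns, so the weak-script
    # compiler can handle the function.
    if inplace:
        result = torch._C._nn.threshold_(input, threshold, value)
    else:
        result = torch._C._nn.threshold(input, threshold, value)
    return result
```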
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13322

Differential Revision: D12852203

Pulled By: driazati

fbshipit-source-id: 220670df32cb1ff39d120bdc04aa1bd41209c809
2018-11-05 21:02:18 -08:00
10d67716db bump docker image to 262 (#13581)
Summary:
We updated valgrind version in our recent docker image.
https://github.com/pietern/pytorch-dockerfiles/pull/23
https://github.com/pytorch/ossci-job-dsl/pull/31
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13581

Reviewed By: goldsborough

Differential Revision: D12936485

Pulled By: zrphercule

fbshipit-source-id: 981532394b23e8d8ecfd6b2458ddf03710d5ac67
2018-11-05 20:43:39 -08:00
bad8235a3a Disabling NCCL coalesced bcast test since it hangs in CI (#13606)
Summary:
Functionality test shouldn't be affected since we have both backends testing for the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13606

Differential Revision: D12937185

Pulled By: teng-li

fbshipit-source-id: 03d897b6690f7932654fdb7d11a07016dfffa751
2018-11-05 20:34:15 -08:00
9ef98624b3 Don't allocate empty Storage/StorageImpl for Variable. (#13580)
Summary:
Variable owns a Tensor which already has a Storage/StorageImpl
if necessary. The Variable ctor was unnecessarily allocating *another*
Storage/StorageImpl, which costs around 200ns.

This PR gets rid of that behavior and cuts the `as_variable` time from
670ns to 475ns, reducing Variable overhead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13580

Differential Revision: D12925495

Pulled By: zou3519

fbshipit-source-id: 4f5ec33776baa848d1c318abcf40b57125b3bed7
2018-11-05 19:24:14 -08:00
02d3787a19 Support new upsample in symbolic, caffe2 backend & caffe2 frontend (#13272)
Summary:
We updated the description of upsample_op in onnx: https://github.com/onnx/onnx/pull/1467
Therefore, we need to support the new upsample_op in caffe2-onnx backend as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13272

Reviewed By: houseroad

Differential Revision: D12833656

Pulled By: zrphercule

fbshipit-source-id: 21af5282abaae12d2d044e4018a2b152aff79917
2018-11-05 19:13:57 -08:00
ebaabfbbd5 ReinitializeTensor function for refactoring Tensor as member variable (#13147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13147

We want to refactor
```
class A {

void func() {
  x_.Resize(dims);
  auto* data = x_.mutable_data<T>();
}

Tensor x_{CPU};
};
```

to
```
class A {
void func() {
  ReinitializeTensor(&x_, dims, at::dtype<T>().device(CPU));
  auto* data = x_.mutable_data<T>();
}

Tensor x_; // Undefined Tensor
};
```

This diff adds the ReinitializeTensor function.

Reviewed By: dzhulgakov

Differential Revision: D10861298

fbshipit-source-id: 9f432297d07a4890e29bb68436364e0b2e2545e7
2018-11-05 19:13:55 -08:00
a340dce133 Replaces c10d's CUDAEvent with ATen's (#13464)
Summary:
This PR:

- Replaces c10d's CUDAEvent with ATen's, removing the two associated c10d files
- Updates c10d's usage of CUDAEvent to reflect the ATen API
- Updates c10d's usage of streams to reflect the ATen API
- Removes use of historic THCState in the touched c10d files
- (EDIT) Fixes a bug in CUDAEvent.h where events could be recorded on the wrong device. Now adds a device guard for this case.

pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13464

Reviewed By: teng-li

Differential Revision: D12924291

Pulled By: pietern

fbshipit-source-id: b8ebe3e01e53d74e527ad199cca3aa11915c1fc0
2018-11-05 19:13:52 -08:00
e2272dd312 Remove ATen/README.md in favor of cppdocs/notes/tensor_basics.rst (#13601)
Summary:
Removes aten/README.md (and some other files dating from when aten was its own repo), and moves the not outdated documentation into a note called "Tensor Basics". I updated the text lightly but did not overhaul the content.

CC zdevito

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13601

Differential Revision: D12934480

Pulled By: goldsborough

fbshipit-source-id: 012a4267b4d6f27e4d5d55d6fc66363ddca10b41
2018-11-05 19:13:50 -08:00
af4a228426 Fix erase_number_type pass, negative indices in c2 and some onnx symbolics (#12888)
Summary:
The PR did two things:

1. fix the bug in erase_number_type on node inputs
2. handle negative indices for dim-reduce in caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12888

Reviewed By: houseroad

Differential Revision: D12833486

Pulled By: wanchaol

fbshipit-source-id: c3ceb400d91f0173b73ad95e392b010c3c14db7d
2018-11-05 19:13:49 -08:00
2398a3255e fbgemm submodule update (#13592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13592

submodule update for fbgemm

Reviewed By: jspark1105

Differential Revision: D12929740

fbshipit-source-id: 546e4d7042696ffc5b0ee7cabd236ec944d218e7
2018-11-05 17:39:20 -08:00
b1c57caaf9 Move flat_hash_map to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13527

Reviewed By: ezyang

Differential Revision: D12912239

fbshipit-source-id: bb44d3ff87c4ca94943ec2667acf1e7ce2b3c914
2018-11-05 17:39:18 -08:00
b7c9575c93 Move LeftRight to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13526

Reviewed By: ezyang

Differential Revision: D12912241

fbshipit-source-id: 70525a9b20daa8aae623d0cb4002acecc34b1932
2018-11-05 17:39:16 -08:00
8fafa7b6ac Remove size() from BatchDataset and templatize IndexType (#12960)
Summary:
This PR brings to changes to the recently landed C++ Frontend dataloader:

1. Removes the `size()` method from `BatchDataset`. This makes it cleaner to implement unsized ("infinite stream") datasets. The method was not used much beyond initial configuration.
2. Makes the index type of a dataset a template parameter of `BatchDataset` and `Sampler`. This essentially allows custom index types instead of only `vector<size_t>`. This greatly improves flexibility.

See the `InfiniteStreamDataset` and `TestIndex` datasets in the tests for what this enables.

Some additional minor updates and code movements too.

apaszke SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12960

Differential Revision: D12893342

Pulled By: goldsborough

fbshipit-source-id: ef03ea0f11a93319e81fba7d52a0ef1a125d3108
2018-11-05 17:13:09 -08:00
1969898647 Convert functional dropouts to weak script (#13484)
Summary:
To convert `nn.functional.dropout`
* `_VF` had to be exposed as a Python module, so this PR adds a module class that forwards to `torch._C._VariableFunctions` (see the sketch after this list)
* rng state between calls in the tests needed to be made consistent
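A minimal sketch of the forwarding-module trick from the first bullet (module name and placement are assumptions for illustration):

```python
import sys
import types
import torch

class VFModule(types.ModuleType):
    # Forwards attribute lookups to torch._C._VariableFunctions.
    def __init__(self, name):
        super(VFModule, self).__init__(name)
        self.vf = torch._C._VariableFunctions

    def __getattr__(self, attr):
        return getattr(self.vf, attr)

sys.modules['torch._VF'] = VFModule('torch._VF')
```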
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13484

Differential Revision: D12929622

Pulled By: driazati

fbshipit-source-id: 78b455db9c8856b94d2dda573fb7dc74d5784f56
2018-11-05 17:13:07 -08:00
23e3a12d5e Add pass support to script (#13535)
Summary:
This PR adds basic support for `pass` statements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13535

Differential Revision: D12929529

Pulled By: driazati

fbshipit-source-id: 70c7c52630d46e76366c4caa875d6c5419a1e03f
2018-11-05 17:13:06 -08:00
df67d4180a Validate schema with no returns (#13525)
Summary:
If there is no return type then the returns of the schema are not
checked against the returns in the graph, so this PR adds an error if
that case is detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13525

Differential Revision: D12929524

Pulled By: driazati

fbshipit-source-id: da562e979482393098830bbded26729a2499152a
2018-11-05 16:51:55 -08:00
7b9d755d88 Restructure torch/torch.h and extension.h (#13482)
Summary:
This PR restructures the public-facing C++ headers in a backwards compatible way. The problem right now is that the C++ extension header `torch/extension.h` does not include the C++ frontend headers from `torch/torch.h`. However, those C++ frontend headers can be convenient. Further, including the C++ frontend main header `torch/torch.h` in a C++ extension currently raises a warning because we want to move people away from exclusively including `torch/torch.h` in extensions (which was the correct thing 6 months ago), since that *used* to be the main C++ extension header but is now the main C++ frontend header. In short: it should be possible to include the C++ frontend functionality from `torch/torch.h`, but without including that header directly because it's deprecated for extensions.

For clarification: why is `torch/torch.h` deprecated for extensions? Because for extensions we need to include Python stuff, but for the C++ frontend we don't want this Python stuff. For now the python stuff is included in `torch/torch.h` whenever the header is used from a C++ extension (enabled by a macro passed by `cpp_extensions.py`) to not break existing users, but this should change in the future.

The overall fix is simple:

1. C++ frontend sub-headers move from `torch/torch.h` into `torch/all.h`.
2. `torch/all.h` is included in:
    1. `torch/torch.h`, as is.
    2. `torch/extension.h`, to now also give C++ extension users this functionality.

With the next release we can then:
1. Remove the Python includes from `torch/torch.h`
2. Move C++-only sub-headers from `all.h` back into `torch.h`
3. Make `extension.h` include `torch.h` and `Python.h`

This will then break old C++ extensions that include `torch/torch.h`, since the correct header for C++ extensions is `torch/extension.h`.

I've also gone ahead and deprecated `torch::CPU` et al., since those are long overdue to die.

ezyang soumith apaszke fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13482

Differential Revision: D12924999

Pulled By: goldsborough

fbshipit-source-id: 5bb7bdc005fcb7b525195b769065176514efad8a
2018-11-05 16:46:52 -08:00
1b64c0f8fe Error msg on TCP backend (#13596)
Summary:
Clean it up from my queue:

https://github.com/pytorch/pytorch/issues/12721

```
>>> torch.distributed.init_process_group(backend="tcp")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 275, in init_process_group
    backend = DistBackend(backend)
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 55, in __new__
    raise ValueError("TCP backend has been deprecated. Please use "
ValueError: TCP backend has been deprecated. Please use Gloo or MPI backends for collective operations on CPU tensors.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13596

Differential Revision: D12931196

Pulled By: teng-li

fbshipit-source-id: bb739b107ad7454e2e0a17430087161fedd4c392
2018-11-05 16:40:02 -08:00
74819087de Mixed precision DDP hang fix and fine-grained option for DDP perf (#13496)
Summary:
When going to mixed precision fp16 training, DDP randomly hangs. Initially, I thought this smelled like a similar NCCL bug I filed a while ago. It turns out it's not. Again, I am seeing that different rank processes have different sizes. How could this even happen?

It turns out that take_tensors will generate the list of bucketed tensors in a non-deterministic order, because the key to the map is a pointer. An interesting bug to dig into and fix.

Fp16 DDP training should be fully working now.

Also added another fine-grained take_tensors helper that aims to improve the performance of DDP, with a TODO to replace the DDP take_tensors with it.

Fixed: https://github.com/pytorch/pytorch/issues/12150
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13496

Differential Revision: D12920985

Pulled By: teng-li

fbshipit-source-id: 26f3edae7be45a80fa7b2410a2e5a1baab212d9c
2018-11-05 16:22:15 -08:00
84cfc28f23 Note on Tensor Creation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13517

Differential Revision: D12914271

Pulled By: goldsborough

fbshipit-source-id: df64fca6652525bc814f6fd3e486c87bff29b5b5
2018-11-05 16:10:58 -08:00
f6ff5d8934 Append parameters when checking graphs for TorchScript Methods (#13553)
Summary:
Also, add an assertion in the GraphExecutor to make sure we don't
access memory out of bounds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13553

Differential Revision: D12924796

Pulled By: soumith

fbshipit-source-id: ea2a134084538484178b8ebad33d6716a8e1d633
2018-11-05 16:07:36 -08:00
f3c197d6fa Add explicit c10:: namespace to converter (#13593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13593

Should fix up master

Reviewed By: orionr

Differential Revision: D12929779

fbshipit-source-id: 23119f5bf1d9f1e37e8ed01bfa2cc40647725390
2018-11-05 14:52:16 -08:00
7faca2a217 Add new style broadcast support in c10d/gloo (#13497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13497

This replaces the existing broadcast implementation with the new style collective call in the gloo backend. The CUDA path copies CUDA tensors to CPU tensors and then runs the CPU broadcast implementation.

Reviewed By: teng-li

Differential Revision: D12890013

fbshipit-source-id: 43f346fb2814f421bedc7babf89169703a46bb9c
2018-11-05 13:52:07 -08:00
d2f26a450e Add new style allreduce support in c10d/gloo (#13426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13426

This replaces the existing allreduce implementation with the new style collective call in the gloo backend. This is the first one to include both a CPU and a CUDA path. The CUDA path copies CUDA tensors to CPU tensors and then runs the CPU allreduce implementation. This is not much different from the current situation in the case where there is a single input tensor per call (which is the case when called from DistributedDataParallel).

Reviewed By: teng-li

Differential Revision: D12855689

fbshipit-source-id: 574281d762dd29149fa7f634fb71f8f6a9787598
2018-11-05 13:52:05 -08:00
d50dd47ccd Add reduce support in c10d/gloo (#13425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13425

This adds support for the new style reduce collective call in the gloo backend.

Reviewed By: teng-li

Differential Revision: D12869404

fbshipit-source-id: 93c641e6aba3b03c796bda80737547c565cfa571
2018-11-05 13:52:02 -08:00
8f0f97749c Add allgather support in c10d/gloo (#13424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13424

This adds support for the allgather collective call in the gloo backend. The gloo implementation does not support multiple inputs per rank (nor one or more outputs per rank), so we use a temporary flattened buffer and unflatten once the collective finishes.

Reviewed By: teng-li

Differential Revision: D12832009

fbshipit-source-id: 2f5c1934a338589cef1d3192bd92ada135fecd7a
2018-11-05 13:52:01 -08:00
75c2b34c86 Add gather support in c10d/gloo (#13423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13423

This adds support for the gather collective call in the gloo backend. The gloo implementation does not yet support the mode where the root has multiple output tensors (one per rank), so we use a temporary flattened buffer and unflatten on the root once the collective finishes.

Reviewed By: teng-li

Differential Revision: D12811647

fbshipit-source-id: 90fe8af8c390090b7d4ef43aa74f4e3e67ab9d0b
2018-11-05 13:51:59 -08:00
9cfe9418e6 Add scatter support in c10d/gloo (#13422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13422

This adds support for the scatter collective call in the gloo backend. This is the first of the new style collectives that do not expect to be created once and used many times. This commit contains some shortcuts to make this new style work side by side with the existing implementations (such as the std::tuple with nullptr's). These shortcuts are temporary until we have moved over all collectives to this new style.

Reviewed By: teng-li

Differential Revision: D12310219

fbshipit-source-id: 32e68717f819d5980f0e469d297204948351cefc
2018-11-05 13:51:57 -08:00
98f5c005da Speed up CPU threshold and relu implementation (#13182)
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 µs vs. 6.9 µs per loop
102400: 141 µs vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 µs vs. 6.7 µs per loop
102400: 141 µs vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182

Reviewed By: soumith

Differential Revision: D12825105

Pulled By: colesbury

fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
2018-11-05 12:51:29 -08:00
b2127cfa9a Make the inception onnx test more stable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13563

Differential Revision: D12924968

Pulled By: houseroad

fbshipit-source-id: ba43c88aabee749cb1e1307a412eacda4b8870b0
2018-11-05 12:39:00 -08:00
5f514a483c Move Half.{h, cpp} and Half-inl.h to c10 (#13361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13361

att

Reviewed By: Yangqing

Differential Revision: D12853472

fbshipit-source-id: ad3b96cbc6904435553a6c9e58aa158ec77a2961
2018-11-05 12:32:12 -08:00
e06f92785c Move ATen/core/Macros.h to c10/macros/Macros.h
Summary:
EXT=h,cc,cpp,hpp,cxx,cu,cuh
d=caffe2/aten/
codemod -m -d $d --extensions $EXT 'AT_HOST_DEVICE' 'C10_HOST_DEVICE'
codemod -m -d $d --extensions $EXT 'AT_DEVICE' 'C10_DEVICE'
codemod -m -d $d --extensions $EXT 'AT_HOST' 'C10_HOST'
codemod -m -d $d --extensions $EXT 'AT_ANDROID' 'C10_ANDROID'
codemod -m -d $d --extensions $EXT 'AT_IOS' 'C10_IOS'
codemod -m -d $d --extensions $EXT 'AT_MOBILE' 'C10_MOBILE'
codemod -m -d $d --extensions $EXT 'ATen/core/Macros.h' 'c10/macros/Macros.h'
codemod -m -d $d --extensions $EXT 'HIP_HOST_DEVICE' 'C10_HIP_HOST_DEVICE'

Reviewed By: dzhulgakov

Differential Revision: D12851341

fbshipit-source-id: 7d540530ef779e16ddf2b4cdda9dcc85a61410c3
2018-11-05 12:32:11 -08:00
8c182cd89e Add overload of ProcessGroup.allreduce with list of tensors (#13576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13576

TSIA

Reviewed By: SsnL

Differential Revision: D12923457

fbshipit-source-id: 7824490548edbacac3cda81c7500bd1f851c6093
2018-11-05 11:56:49 -08:00
482b1366e6 Remove half_support.* (#13534)
Summary:
These two files are unused. I think at the time I moved the code into an inline extension (https://github.com/pytorch/pytorch/blob/master/test/test_cpp_extensions.py#L288) and forgot to delete the files.

soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13534

Differential Revision: D12924365

Pulled By: goldsborough

fbshipit-source-id: 050dd7da267008ea58a5dcc8febee7d7e443bc3d
2018-11-05 10:04:21 -08:00
f0ed927b62 Add diag_embed to ATen and torch (#12447)
Summary:
Fixes: #12160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12447

Differential Revision: D12916234

Pulled By: SsnL

fbshipit-source-id: 512a04efb0c2e0a54295b857a61be66c3aae13da
2018-11-05 08:55:28 -08:00
07f8b61cc6 Roll operator t32802531 (#13261)
Summary:
Adding a roll operator
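A brief usage sketch, assuming the operator lands as `torch.roll(input, shifts, dims)`:

```python
import torch

x = torch.arange(6).view(2, 3)
y = torch.roll(x, shifts=1, dims=1)  # rotate each row right by one column
# y == tensor([[2, 0, 1],
#              [5, 3, 4]])
```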
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13261

Differential Revision: D12922575

Pulled By: nairbv

fbshipit-source-id: ff05c075d9c484a615011192b023debf47da4017
2018-11-05 08:33:36 -08:00
e7242cbaf2 Rename dim(i) -> size(i) - 1/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12896712

fbshipit-source-id: 909731691fab7799efbcfc3b5dcc9e531831c2d4
2018-11-05 07:27:04 -08:00
3ea64bd80b fbgemm submodule update (#13562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13562

Submodule update for fbgemm. This version of fbgemm has the same cmake minimum required version as pytorch. Without this, the OSS build fails.

Reviewed By: jianyuh

Differential Revision: D12920951

fbshipit-source-id: 9ef532e715e3f7612fecc8430736633cf6b17f34
2018-11-05 07:22:34 -08:00
e988dc621b Stop depending on static analysis of tensor types in graph fuser (#13387)
Summary:
Built on top of #13108, so please review only the last commit.

This makes the graph fuser ignore input types (device/scalar type) when considering graphs for fusion, making it much more robust to shape-prop failures. Those properties are now checked at run time, as part of the kernel validation. This should enable graph fusions in `jit_premul` and `jit_multilayer` timelines in our benchmarks.

One regression is that I've disabled fusion of comparison ops (and `type_as`). That's because there's really no good way to ensure that those are valid, and they have been a source of bugs (I filed #13384).

cc ngimel mruberry zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13387

Differential Revision: D12888104

Pulled By: zou3519

fbshipit-source-id: c233ea599679c34ac70fb4d8b8497c60aad9e480
2018-11-05 06:32:08 -08:00
505f9b4d63 Add Int8BatchPermutation op in DNNLOWP (#13539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13539

This is used by OCR's FPN model to detect small/dense text. It is just a simple
permutation along the batch dim based on the input indices, which lets us avoid
unnecessary quantize/dequantize ops.

Reviewed By: csummersea

Differential Revision: D12894055

fbshipit-source-id: d25639a5ffc2c490a0ee7ef307302eb2953c307e
2018-11-05 01:57:50 -08:00
54e8623d26 3D Conv in NHWC layout (#12733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733

Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution, because we need NHWC layout for best performance (note that NHWC layout in general gives better performance on CPU, not just for quantized operators). For example, our quantized ops have functionality to measure quantization error operator by operator, but this needs to run a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we're doing layout conversion on the fly for the shadow fp32 operator, which is error prone). Some Caffe2 frameworks like brew generate an error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench, because aibench uses brew.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D10333829

fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
2018-11-04 21:50:09 -08:00
274f3c0951 add explicit fpga context (#13318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13318

Add a context to describe FPGA.

This removes the need for having OpenCL with an FPGA engine.

The next step is to change the OpenCL implementation to explicitly use the FPGA context.

Reviewed By: soumith

Differential Revision: D12828795

fbshipit-source-id: 0700a83672d117d7aa3d941cd39c2ae627cb6e5f
2018-11-04 21:47:45 -08:00
246d5282b3 fix handling of single input in gradcheck (#13543)
Summary:
Now gradcheck properly accepts a single Tensor as input. It was almost supported already, but not completely.
Should fix the confusion from #13540
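A short example of the now-supported call (illustrative):

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
# A bare Tensor is now accepted for `inputs`; no tuple wrapping needed.
assert gradcheck(torch.sin, x)
```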
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13543

Differential Revision: D12918526

Pulled By: soumith

fbshipit-source-id: a5bad69af0aea48c146f58df2482cabf91e24a01
2018-11-04 20:28:34 -08:00
fdf34c8da8 Kill more weird constructors on Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13433

Reviewed By: jerryzh168

Differential Revision: D12874599

fbshipit-source-id: 0c262fda72cbc4f3ea80df790cc8e95140bdc7e0
2018-11-04 16:54:49 -08:00
f000101b81 add a few comments on layout after im2col (#12429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12429

Comments to clarify layout after NHWC im2col for group convolution.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D10233284

fbshipit-source-id: 996a69f2f932e02c978abaade7571b00741b6ae8
2018-11-04 11:02:58 -08:00
6b578cd388 update fbgemm submodule (#13547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13547

update fbgemm submodule

Reviewed By: jspark1105, jianyuh

Differential Revision: D12917297

fbshipit-source-id: ad9b2c7f119ca159af3826266b59ec26fc54911c
2018-11-04 09:15:17 -08:00
c1ed1b4779 Duplicate bias blobs shared by different conv ops to handle scale correctly (#13538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13538

In architectures such as FPN (https://arxiv.org/abs/1612.03144), a few Conv
ops share the same weight and bias and are run at different scales of
the input. Since 'bias_scale = input_scale * weight_scale', sharing
the same bias blob among multiple Conv ops means that we need a
different bias scale for each of the ops. To achieve this, we just
duplicate those bias blobs that are used by multiple Conv ops before
performing the int8 rewrite.

Reviewed By: csummersea

Differential Revision: D12854062

fbshipit-source-id: 42a2951877819339b117f13f01816291a4fa6596
2018-11-04 04:15:28 -08:00
2a6850bf73 remove unnecessary files (#13537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13537

plot_hist.py and dnnlowp_fc_perf_comparison.py were not supposed to be in operators/quantized/server

Reviewed By: hx89

Differential Revision: D12916259

fbshipit-source-id: f5bc0c01a4924cad6f82eff624ba5f79becbea33
2018-11-04 01:01:28 -07:00
8be0efaa8c omit group conv NHWC test for HIP (#13554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13554

D10233252 broke the ROCm test.
We don't have group conv in NHWC for HIP yet, so this diff omits the related tests.

Reviewed By: hyuen

Differential Revision: D12917880

fbshipit-source-id: 9baf36a8cb061ee8cf393b2c438a2d1460ce5cd8
2018-11-03 21:18:23 -07:00
9e432b593d Include caffe2 proto headers in pytorch package data (#13217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13217

Caffe2 proto headers are not included in pytorch package data (https://github.com/pytorch/pytorch/blob/master/setup.py#L1180). However, they are required for building custom Caffe2 ops living outside PyTorch/Caffe2 repo (e.g. custom Detectron ops).

Reviewed By: pjh5

Differential Revision: D12815881

fbshipit-source-id: 4d1aaa6a69a2193247586e85e4244fbbdb3e8192
2018-11-03 16:19:39 -07:00
149afef5c4 Include lib subdir in caffe2 include dirs path (#13216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13216

Caffe2 headers are placed under `lib/include` in pytorch package data (https://github.com/pytorch/pytorch/blob/master/setup.py#L1201). However, `CAFFE2_INCLUDE_DIRS` path is set to `"${_INSTALL_PREFIX}/include"` which does not exist in package data. This results in issues when trying to build custom Caffe2 ops living outside Caffe2/PyTorch repo (e.g. custom Detectron ops).

Reviewed By: pjh5

Differential Revision: D12815878

fbshipit-source-id: 7cb1b4a729f8242b7437e3f30dace3b9cf044144
2018-11-03 16:19:38 -07:00
d40b23e750 remove unused use_scratch argument from batch_matmul (#11745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11745

use_scratch was introduced in D5834868, but D8944686 refactored GemmStridedBatched; use_scratch is not used anywhere and is not documented, as far as I can tell.

Reviewed By: BIT-silence

Differential Revision: D9846488

fbshipit-source-id: 915d92aa57bc211888dfb09ad657f7c2b4f4b71c
2018-11-03 15:31:24 -07:00
2bc6a7a260 enable group conv test in NHWC layout in CPU (#12428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12428

Group conv in NHWC layout was enabled on CPU after D7547497.
In D7547497, the unit test for group conv in NHWC layout on CPU was enabled in group_conv_test.py but not in conv_test.py. This diff also enables it in conv_test.py.

Reviewed By: BIT-silence

Differential Revision: D10233252

fbshipit-source-id: aeeaf3eedc60e1cf6321b5a1dbe6a561e3aacbde
2018-11-03 11:58:51 -07:00
2b280c6b74 minor build fixes for incremental builds (#13293)
Summary:
Work around a cmake-ninja bug that doesn't track the dependency between xxx-generated-xxx.cu and updating the timestamp of build.ninja (the consequence being that cmake is rerun on the next rebuild).
This surfaced after analyzing the output of `ninja -d explain install`

Now, compared to https://github.com/pytorch/pytorch/pull/11487#issue-214450604 we're seeing:

```
python setup.py rebuild develop    # first time - ~1m 42s
python setup.py rebuild develop    # second time - ~12 s
```

This gets even faster if we replace the default linker with multithreaded linkers like `lld` or `gold`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13293

Differential Revision: D12916346

Pulled By: soumith

fbshipit-source-id: 3817c09a9a687fa2273f90444e5071ce1bb47260
2018-11-03 09:53:04 -07:00
0479517325 Add modernize-* checks to clang-tidy (#13196)
Summary:
Enables almost all `modernize-*` checks in clang-tidy. This warns against things such as:

- Use of `const std::string&` instead of new-style `std::string` + move,
- Using old-style loops instead of range-for loops,
- Use of raw `new`
- Use of `push_back` instead of `emplace_back`
- Use of `virtual` together with `override` (`override` is sufficient)

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13196

Differential Revision: D12891837

Pulled By: goldsborough

fbshipit-source-id: 4d0f782a09eb391ee718d3d66f74c095ee121c09
2018-11-02 20:30:40 -07:00
4bca51e3e7 unify BLAS check between Caffe2 and ATen (#13514)
Summary:
This PR unifies the BLAS check between Caffe2 and ATen. It skips the redundant BLAS check for ATen under certain conditions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13514

Reviewed By: orionr

Differential Revision: D12905272

Pulled By: mingzhe09088

fbshipit-source-id: 05163704f363c97a762ff034f88a67bd32ac01d0
2018-11-02 18:40:10 -07:00
8fc63e523e Resolve lint and infer warnings (#13520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13520

Resolve the lint and infer warnings shown in the dnnlowp migration diff.

Reviewed By: dskhudia

Differential Revision: D12905972

fbshipit-source-id: b07400e25b80ea656795b005b91ac1438abe2695
2018-11-02 17:43:49 -07:00
f74fa91b8e Fix EraseListConstruct pass during ONNX export (#13195)
Summary:
There should really be a single place to erase or specially handle prim::ListConstruct during ONNX export; this makes it consistent across different calls. E.g., it now gives a correct output graph in the following case:
```python
class Test(torch.nn.Module):
    def forward(self, input):
        return torch.cat([input, torch.zeros(input.size(0), 1).type_as(input)], dim=1)
```
Before this PR, we have the onnx graph as:

```
graph(%0 : Byte(2, 3)) {
  %1 : Long() = onnx::Constant[value={0}](), scope: Test
  %2 : Dynamic = onnx::Shape(%0), scope: Test
  %3 : Long() = onnx::Gather[axis=0](%2, %1), scope: Test
  %4 : Long() = onnx::Constant[value={1}](), scope: Test
  %5 : Dynamic = onnx::Unsqueeze[axes=[0]](%3)
  %6 : Dynamic = onnx::Unsqueeze[axes=[0]](%4)
  %7 : int[] = onnx::Concat[axis=0](%5, %6)
  %8 : Float(2, 1) = onnx::ConstantFill[dtype=1, input_as_shape=1, value=0](%7), scope: Test
  %9 : Byte(2, 1) = onnx::Cast[to=2](%8), scope: Test
  %10 : Byte(2, 4) = onnx::Concat[axis=1](%0, %9), scope: Test
  return (%10);
}

```
Which is wrong since onnx does not have a concept of `int[]`, here is the onnx graph after this PR:
```
graph(%0 : Byte(2, 3)) {
  %1 : Long() = onnx::Constant[value={0}](), scope: Test
  %2 : Dynamic = onnx::Shape(%0), scope: Test
  %3 : Long() = onnx::Gather[axis=0](%2, %1), scope: Test
  %4 : Long() = onnx::Constant[value={1}](), scope: Test
  %5 : Dynamic = onnx::Unsqueeze[axes=[0]](%3)
  %6 : Dynamic = onnx::Unsqueeze[axes=[0]](%4)
  %7 : Dynamic = onnx::Concat[axis=0](%5, %6)
  %8 : Float(2, 1) = onnx::ConstantFill[dtype=1, input_as_shape=1, value=0](%7), scope: Test
  %9 : Byte(2, 1) = onnx::Cast[to=2](%8), scope: Test
  %10 : Byte(2, 4) = onnx::Concat[axis=1](%0, %9), scope: Test
  return (%10);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13195

Differential Revision: D12812541

Pulled By: wanchaol

fbshipit-source-id: db6be8bf0cdc85c426d5cbe09a28c5e5d860eb3e
2018-11-02 15:09:06 -07:00
519570def8 Rename dim(i) -> size(i) - 2/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: salexspb

Differential Revision: D12896721

fbshipit-source-id: deb0290354a1ffd69d080f0f126479844bf04e3c
2018-11-02 14:29:06 -07:00
7b48a7c3f6 Bump gloo (#13513)
Summary:
Included math.h changes needed in #13422 and later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13513

Differential Revision: D12906653

Pulled By: pietern

fbshipit-source-id: 4d4ec7566bf07925b4ce86eb0c63d784cb6b9992
2018-11-02 12:16:17 -07:00
da029ca042 Skip Conv1D tests for MIOPEN (#13512)
Summary:
MIOpen currently only supports 2d.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13512

Differential Revision: D12903307

Pulled By: bddppq

fbshipit-source-id: a8b0f0580a1859f1e0c1518907406abf013c4c8c
2018-11-02 11:38:26 -07:00
34dd831dc2 Revert MKL rowwise moments (#13480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13480

Revert D12845220 since the MKL functions use multiple threads, while their single-thread run is slower than the eigen version.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D12891751

fbshipit-source-id: 2a61727b269a304daeee2af6ff7fee7820cb5344
2018-11-02 11:31:43 -07:00
cc3cecdba0 Fix the bug when compile using nvcc compiler. (#13509)
Summary:
I found a bug when compiling the CUDA file while installing the maskrcnn-benchmark lib.

`python setup.py build develop` will throw the error:
```
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/cpp_extension.py", line 214, in unix_wrap_compile
    original_compile(obj, src, ext, cc_args, cflags, pp_opts)
  File "/usr/lib/python2.7/distutils/unixccompiler.py", line 125, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] +
TypeError: coercing to Unicode: need string or buffer, list found
```

For more information, please see [issue](https://github.com/facebookresearch/maskrcnn-benchmark/issues/99).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13509

Differential Revision: D12902675

Pulled By: soumith

fbshipit-source-id: b9149f5de21ae29f94670cb2bbc93fa368f4e0f7
2018-11-02 11:09:43 -07:00
2827fc7681 Add native wrappers for inplace bitwise operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13490

Differential Revision: D12894826

Pulled By: gchanan

fbshipit-source-id: bd7a0a50e824d92f8ad39e159c1c10318741191d
2018-11-02 11:03:24 -07:00
9f2b2cac37 Fix handling all empty bags in CUDA embedding bag (#13483)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11847
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13483

Differential Revision: D12902914

Pulled By: SsnL

fbshipit-source-id: 577a53e815231e988da716b1ee5667e1f36408ca
2018-11-02 10:21:14 -07:00
3d392cc5ec Migrate dnnlowp code to open source directory (#13500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13500

This diff migrate dnnlowp related files and operators from deeplearning/quantization/caffe2 and deeplearning/quantization/dnnlowp to the open source directory.

Reviewed By: jspark1105

Differential Revision: D10842192

fbshipit-source-id: 53d0666d0ae47a01db9c48114345d746b0a4f11f
2018-11-02 09:36:59 -07:00
bcb851a3d6 Write gesv derivatives in terms of native function.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13469

Reviewed By: ezyang

Differential Revision: D12889116

Pulled By: gchanan

fbshipit-source-id: 1a25dd6ec3fda5897c5cabbb9a62423b50bfda36
2018-11-02 08:30:24 -07:00
1e1dd88c4a Add Linux ppc64le CPU/GPU CI build status
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13507

Differential Revision: D12902281

Pulled By: soumith

fbshipit-source-id: d2c89dcf08dcbe1e451ae52e85256f658155a0e1
2018-11-02 07:51:40 -07:00
2f82a06826 Fix half_tensor.bernoulli_(double) (#13474)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12431
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13474

Differential Revision: D12897834

Pulled By: SsnL

fbshipit-source-id: 598250fd7b9f1d2509ec0e5012724d7895a62daf
2018-11-02 07:46:46 -07:00
61a2d47ec6 Special handling for 1D covolutional kernels in cuDNN flavor of conv_op. (#12902)
Summary:
Essentially makes cuDNN treat those kernels as Nx1 ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12902

Reviewed By: BIT-silence

Differential Revision: D10852862

Pulled By: soumith

fbshipit-source-id: 7416cf6d131177340d21cbf1d42c1daa6c7cad8c
2018-11-02 07:08:23 -07:00
86192301b3 Fix a few bugs in format and vararg handling (#13492)
Summary:
There are a couple of subtle bugs in the way varargs are implemented:

1. it fails if you pass 0 arguments, because it doesn't handle the case when there are 0 varargs, and because Operator::matches was not updated.
2. it breaks all the name-based lookups on nodes. For instance node->get<int>(attr::value)
   will return a single entry of the varargs if you look it up by name.

Furthermore it complicates some assumptions about the positional arguments (e.g. they used to be
1-to-1 with node inputs but with varargs they are not).

Because varargs are only being used for format, this diff instead
just allows format to take any value as input, regardless of type. It just provides a way to set is_vararg
from the schema but does not restrict the type of the vararg arguments. This is in line with
the pre-existing behavior for is_vararg, so it doesn't require Operator::matches changes.

This also keeps format in line with how print works, and is closer to the python implementation of format. Note that the implementation
of format already worked with arbitrary IValues, so restricting to strings was just making it more conservative than needed.

This also fixes the implementation of format so that it works when there are 0 arguments, or when there is text before and after a format placeholder, cases where it previously would not print things.
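A hedged sketch of the now-working cases (surrounding text and arbitrary argument types):

```python
import torch

@torch.jit.script
def describe(x):
    # type: (float) -> str
    # Text before and after the placeholder is now printed correctly.
    return "value is {} (formatted in script)".format(x)
```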
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13492

Differential Revision: D12896989

Pulled By: zdevito

fbshipit-source-id: 21425bac8edc81709030a7408180494edea0a54b
2018-11-02 00:07:00 -07:00
5fbaf0eaf8 add augmented assignment ops (#13364)
Summary:
This PR changes the compiler to correctly emit in-place operators for augmented assignments (`+=` and friends).
- To better match the Python AST structure, add an `AugAssign` tree view and make `Assign` apply only to `=` assignments.
- Emit those `AugAssign` exprs in the compiler, dispatching to in-place aten ops for tensors and lowering to simple assignments for scalar types.
- In order to preserve (suspect) ONNX export semantics, add a pass to lower the in-place operators to out-of-place operators.
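A hedged example of the new tensor behavior:

```python
import torch

@torch.jit.script
def accumulate(x, y):
    # `x += y` on tensors now emits the in-place op aten::add_
    # instead of being rejected by the compiler.
    x += y
    return x
```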
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13364

Differential Revision: D12899734

Pulled By: suo

fbshipit-source-id: bec83be0062cb0235eb129aed78d6110a9e2c146
2018-11-02 00:01:07 -07:00
a0e783768f Do not fill in new data in every iteration if the input data only has one entry (#13495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13495

If the user has only one data file to put in, the data is filled in on every iteration,
which actually flushes the caches. The latency we retrieve is then larger than the latency
when the caches are warm. Instead of doing that, we should rely only on the wipe_cache
variable to wipe the caches.

The change is to skip filling in the data if the input only has one entry and it is
not the first iteration.

Reviewed By: hl475

Differential Revision: D12897946

fbshipit-source-id: ee54ed09b8ec85fcefe930858420b90d494ad972
2018-11-01 22:06:09 -07:00
57e162da56 Switch mutable lists to new mutable schema (#13406)
Summary:
Goodbye, World! This PR removes the world tokens and associated pass and switches lists over to the new mutability/aliasing annotations.

Should resolve #12780 since we are disabling optimization pending alias analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13406

Differential Revision: D12886463

Pulled By: suo

fbshipit-source-id: e64e55905aebdcad273b39862df3209f823f5408
2018-11-01 19:41:04 -07:00
6d2b3cc869 Fix pytest, make it work with run_test.py (#13416)
Summary:
Fixes #13326

Also now you can use `run_test.py` with `pytest`. E.g.,
```
python run_test.py -vci distributed -pt
```

Yes it works with `distributed` and `cpp_extension`.

cc zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13416

Differential Revision: D12895622

Pulled By: SsnL

fbshipit-source-id: 2d18106f3a118d642a666bfb1318f41c859c3df7
2018-11-01 19:08:06 -07:00
0fd176fea4 Add operator is, not, is not to script (#13336)
Summary:
As titled, this PR is part of the tasks to unblock exporting the standard library.
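A hedged example of the new operators in script:

```python
import torch
from typing import Optional

@torch.jit.script
def has_value(x):
    # type: (Optional[int]) -> bool
    return x is not None
```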
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13336

Differential Revision: D12888912

Pulled By: wanchaol

fbshipit-source-id: 6213a17a75a593ae45999994fd9562f29b7d42df
2018-11-01 16:55:28 -07:00
24839aac59 Link libgloo.a after libc10d.a to resolve remaining symbols (#13462)
Summary:
libcaffe2.so depends on libgloo.a for the ops in caffe2/contrib/gloo.
Symbols in libgloo.a that are not used are ignored and don't end up in
libcaffe2.so. libc10d.a depends on the caffe2 target, which in turn
depends on the gloo target, and it expects all libgloo.a symbols to be
part of libcaffe2.so. Symbols from libgloo.a that are not used in
libcaffe2.so remain undefined in libc10d.a.

To fix this, we link to libgloo.a when linking _C.so, such that any
gloo symbols in libc10d.a are resolved when linking _C.so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13462

Differential Revision: D12892830

Pulled By: pietern

fbshipit-source-id: 7560b3899b62f76081b394498480e513a84cefab
2018-11-01 16:03:33 -07:00
e6b6cc06ee caffe2/core hipify (#13457)
Summary:
Small edits to caffe2/core hipify to make it compile in fbcode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13457

Reviewed By: bddppq

Differential Revision: D12883472

Pulled By: xw285cornell

fbshipit-source-id: 1da231d721311d105892db13ed726240398ba49e
2018-11-01 15:49:56 -07:00
421f3f3e52 add npair builtins (#13473)
Summary:
Add npair builtins to unblock the standard library. As with broadcasting lists, the only occurrences are with ints/floats.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13473

Differential Revision: D12890844

Pulled By: eellison

fbshipit-source-id: c360bb581d0f967cb51b858b6f964c300992d62a
2018-11-01 15:42:52 -07:00
27002e3fd5 Enable a few hicpp (#13189)
Summary:
Enabling three checks from ["High Integrity C++"](https://www.perforce.com/blog/qac/high-integrity-cpp-hicpp)

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13189

Differential Revision: D12859779

Pulled By: goldsborough

fbshipit-source-id: 8ec22370dcf88618dae749a8dae0e82678e68b0e
2018-11-01 15:19:17 -07:00
d843f63f2a optimization on cpu conv3d (#11884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11884

In cpu mode, the current convNd uses Im2ColNdNCHWImpl, a generic implementation that handles convolutional layers for an arbitrary number of dimensions. In video modeling, we use convNd with filter dimension=3.

The problem with the current convNd is that Im2ColNdNCHWImpl is much slower than the Im2Col used by conv2d for filters with the same FLOPs. For example, a (1, 7, 7) 3d filter takes 5 times longer than a (7, 7) 2d filter at inference time.

This diff extends Im2Col to the 3d case (Im2Col3dNCHWImpl), and this optimization for 3d convolution gives 4~5 times faster inference time on cpu for various video models:

{F128300920}
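For context, a rough timing sketch via the PyTorch frontend comparing the two filter shapes mentioned above (names and shapes are illustrative, not from the diff):
```python
import time
import torch

conv3d = torch.nn.Conv3d(3, 16, kernel_size=(1, 7, 7))
conv2d = torch.nn.Conv2d(3, 16, kernel_size=7)
x3d = torch.randn(1, 3, 8, 56, 56)   # N, C, T, H, W
x2d = torch.randn(8, 3, 56, 56)      # the same spatial work, folded into N

with torch.no_grad():
    for name, conv, x in [("conv3d", conv3d, x3d), ("conv2d", conv2d, x2d)]:
        start = time.time()
        for _ in range(10):
            conv(x)
        print(name, time.time() - start)
```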

i-am-not-moving-c2-to-c10

Reviewed By: BIT-silence

Differential Revision: D8245940

fbshipit-source-id: 75231d65c9dd56059dfe31701e26021fd1ff2a85
2018-11-01 15:13:26 -07:00
d714ecf879 Rename potrf to cholesky (#12699)
Summary:
This PR renames the function `potrf`, which computes the Cholesky
decomposition of positive definite matrices, to `cholesky`, matching NumPy and TF.

Billing of changes
- make potrf cname for cholesky in Declarations.cwrap
- modify the function names in ATen/core
- modify the function names in Python frontend
- issue warnings when potrf is called to notify users of the change
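A small usage sketch of the rename as of this change (shapes illustrative):
```python
import torch

a = torch.randn(3, 3)
spd = a @ a.t() + 3 * torch.eye(3)   # symmetric positive definite
l = torch.cholesky(spd)              # new name; lower triangular by default
assert torch.allclose(l @ l.t(), spd)
# torch.potrf(spd) keeps working for now, but warns about the rename
```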

Reviewed By: soumith

Differential Revision: D10528361

Pulled By: zou3519

fbshipit-source-id: 19d9bcf8ffb38def698ae5acf30743884dda0d88
2018-11-01 15:10:55 -07:00
26a8bb62ee Re-enabled mm+add tree batching in the JIT (#13228)
Summary:
I've had to generously increase the range of the CreateADSubgraphs pass, because even though it collapses the RNN loop to a single differentiable subgraph and a few other nodes, the range uses the distances in the original graph...

cc zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13228

Differential Revision: D12871316

Pulled By: zou3519

fbshipit-source-id: 32da6f30f7821e4339034f1a4dec41ed0849abfb
2018-11-01 14:50:17 -07:00
81438f1220 Add transpose network pass (#13437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13437

revert
transform the NCHW Convolution operators to NHWC and the tensors around these operators

Reviewed By: bwasti

Differential Revision: D12871789

fbshipit-source-id: 6509a29fa1654424d22904df0d3e60f8cd9c0ec7
2018-11-01 14:27:07 -07:00
a1728602da Convert Arguments to dictionary (#13436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13436

revert
Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.

Reviewed By: bwasti

Differential Revision: D12871811

fbshipit-source-id: 486ad09f3f37723c92a946c486ce3e24a649b4e6
2018-11-01 14:27:05 -07:00
469c6b0539 Replace tmpnam usage (#13289)
Summary:
Fix
```
/torch_shm_manager#compile-manager.cpp.oc089dac2,gcc-5-glibc-2.23-clang/manager.cpp.o:manager.cpp:function main:
warning: the use of `tmpnam' is dangerous, better use `mkstemp`
```

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13289

Differential Revision: D12873282

Pulled By: goldsborough

fbshipit-source-id: fc64b59403d52eb271744378ef4ee8338c79312c
2018-11-01 13:50:43 -07:00
edc6d721e0 fix flake (#13463)
Summary:
fix flake on test/test_jit.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13463

Differential Revision: D12886532

Pulled By: eellison

fbshipit-source-id: 1cd2a736663d5037bb4bdcd1d8ca1f201cf6a1cf
2018-11-01 13:39:39 -07:00
99ce499bfe Revert D12852205: [pytorch][PR] [jit] Add str() builtin
Differential Revision:
D12852205

Original commit changeset: 3e0e9218afdf

fbshipit-source-id: 114b4873504109394fe9d489200d39764ecc638e
2018-11-01 12:48:48 -07:00
e2e560d9c8 Improved the caffe2 to ONNX export (#13429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13429

Made the SSA transformation idempotent. This ensures that if a caffe2 graph is already in SSA form, the names of the ONNX model's inputs/outputs match those of the caffe2 graph.
Avoid evaluating the model by running it if the shapes of all the blobs are present in the value_info map. This speeds up the conversion and decreases its memory usage in the case of medium to large nets.

Reviewed By: abadams

Differential Revision: D12873354

fbshipit-source-id: d695b28e610562afa9a41c2d4da05be212ccb488
2018-11-01 12:40:24 -07:00
54d63c5752 added fbgemm as submodule (#13354) 2018-11-01 15:35:02 -04:00
c2dd0b9fad Put torch/csrc/jit/fuser/config.h in gitignore
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13461

Differential Revision: D12886222

Pulled By: goldsborough

fbshipit-source-id: f7cfb65f671129f46b5eafd75a6b00fa996371ac
2018-11-01 12:27:57 -07:00
de0d85ba98 Remove getTHCudaHostAllocator in favor of getPinnedMemoryAllocator (#13451)
Summary:
```
Both allocate "pinned" memory on the host (CPU). The allocator returned
by at::cuda::getPinnedMemoryAllocator caches allocations, while
getTHCudaHostAllocator would synchronize on frees.
```

This is super minor, but I want to avoid people grabbing getTHCudaHostAllocator by accident. (It's not currently used anywhere).

We still need a better API for allocating pinned memory from both C++ and Python. (See https://github.com/pytorch/pytorch/issues/2206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13451

Differential Revision: D12883037

Pulled By: colesbury

fbshipit-source-id: 5d327e715acc1ded9b19660f84ecd23c8334d1c1
2018-11-01 12:18:29 -07:00
8f2bc1bc56 Add str() builtin (#13278)
Summary:
Allow casting to string from any IValue type
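A hedged example of the new builtin in script (not from the diff):
```python
import torch

@torch.jit.script
def describe(x: torch.Tensor) -> str:
    # str() now accepts any TorchScript value
    return "sum is " + str(x.sum())
```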
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13278

Differential Revision: D12852205

Pulled By: driazati

fbshipit-source-id: 3e0e9218afdf27569da3ebf155f25e77e9f12984
2018-11-01 12:01:50 -07:00
70db53661b expose fixed length list argument (#13142)
Summary:
Arguments have an optional fixed length list field which allows either a list or a single element that will be broadcast to a fixed length.

This PR exposes that as a denotable argument, mostly to cover the many instances in which this is used in the standard library. It appears in the standard library with ints & floats. Since this is not really a pattern we want to promote moving forward, I did not expose this for booleans or tensors.

We could consider making the optional static length part of the list type, instead of the argument, which would make some of this code much nicer.
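As an illustration, the familiar call pattern this covers (a sketch, not from the diff):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# kernel_size is a fixed-length-2 list argument: a bare int is
# broadcast to [2, 2], and an explicit list is accepted as well
a = F.max_pool2d(x, kernel_size=2)
b = F.max_pool2d(x, kernel_size=[2, 2])
assert torch.equal(a, b)
```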
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13142

Differential Revision: D12876047

Pulled By: eellison

fbshipit-source-id: e7359d2a878b4627fc2b9ebc090f9849ee524693
2018-11-01 10:34:52 -07:00
99a5d19591 Rename elementwise_mean to mean (#13419)
Summary:
Closes #12459
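A hedged sketch of the renamed reduction value (assuming the loss-function `reduction` kwarg this touches):
```python
import torch
import torch.nn.functional as F

x, t = torch.randn(4), torch.randn(4)
loss = F.mse_loss(x, t, reduction='mean')  # formerly 'elementwise_mean'
```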
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13419

Differential Revision: D12883299

Pulled By: SsnL

fbshipit-source-id: 8b4512ff73b66fdc674412904dbb3bf497ba70a7
2018-11-01 10:31:26 -07:00
a5b627a0bf add assert statements (#13408)
Summary:
Adding assert statements to unblock standard library.

The same limitations that apply to the existing implementation of Exceptions apply to this as well
(no control-flow logic, and we ignore the specific Exception thrown).
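A minimal hedged sketch of an assert in script (present-day syntax):
```python
import torch

@torch.jit.script
def positive_mean(x: torch.Tensor) -> torch.Tensor:
    assert bool((x > 0).all()), "expected all-positive input"
    return x.mean()
```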
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13408

Reviewed By: driazati

Differential Revision: D12876451

Pulled By: eellison

fbshipit-source-id: 767ba5a50ba7c5dd6a857ed4845ac076a81cf305
2018-11-01 10:01:07 -07:00
004fc2f430 Stop unnecessarily setting storage in as_strided. (#13411)
Summary:
As per ezyang's suggestion

Previously, tensor.as_strided would:
- allocate a tensor `result` and a storage
- throw away that storage in favor of the input tensor's storage.

This PR makes tensor.as_strided not allocate a storage just to throw it
away. This speeds up as_strided from 770ns to 344ns.
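For context, a small sketch of why the throwaway allocation was pure waste (the view always aliases the input's storage):
```python
import torch

base = torch.arange(6.)
view = base.as_strided(size=(2, 2), stride=(2, 1))
# the view shares base's storage; no fresh storage needs to be allocated
assert view.data_ptr() == base.data_ptr()
```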
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13411

Reviewed By: ezyang

Differential Revision: D12870309

Pulled By: zou3519

fbshipit-source-id: 1415e656f4d1931585c9a6006dcd4670123352d0
2018-11-01 08:32:53 -07:00
c0e24443f7 Revert D10459665: [c10] Redo jit/type and utils/functional to ATen/core
Differential Revision:
D10459665

Original commit changeset: 563dec9987aa

fbshipit-source-id: bea1dac93ebe73c9e09753d641f04f722d80aef7
2018-11-01 07:26:54 -07:00
8444ed951d add sleep time between runs (#12347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12347

add sleep time between net and operator runs, and between each iteration.

Reviewed By: sf-wind

Differential Revision: D10209308

fbshipit-source-id: 9a42b47e1fdc14b42dba6bb3ff048fe8e2934615
2018-11-01 00:25:22 -07:00
86e1009497 Make ATen core HIP compatible (#13343)
Summary:
So caffe2 can include aten core files without hipifying aten

cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13343

Reviewed By: xw285cornell

Differential Revision: D12853162

Pulled By: bddppq

fbshipit-source-id: f9402691292180dde110a58ea3b1cedc62aab0ba
2018-10-31 21:08:54 -07:00
10a6a3e404 Redo jit/type and utils/functional to ATen/core (#12862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12862

This is a redo of the previous move in a way that doesn't migrate the namespace -- also will check for the windows cudnn build failure

Reviewed By: Yangqing

Differential Revision: D10459665

fbshipit-source-id: 563dec9987aa979702e6d71072ee2f4b2d969d69
2018-10-31 19:57:43 -07:00
c76fc75292 Implementation copy operator for mkl-dnn (#12820)
Summary:
It is an operator to copy a blob from one ideep device to another.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12820

Reviewed By: ezyang

Differential Revision: D10850956

Pulled By: yinghai

fbshipit-source-id: f25bff6238cefe847eb98277979fa59139bff843
2018-10-31 19:35:53 -07:00
96ab7cbe5c Make gels error message nicer (#13421)
Summary:
cc vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13421

Differential Revision: D12875237

Pulled By: SsnL

fbshipit-source-id: 889a9820be77bb8055d41e395d7bf55d092b35d7
2018-10-31 19:25:57 -07:00
6fe089c6ea Hierarchical device independent -> device specific architecture (#13108)
Summary:
This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller.

The fusion flow is now:

- Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which includes the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference.
- Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic.
- Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic.
- If the fusion could not be run because fusion on the requested device is disabled or shape inference fails a fallback is invoked.

This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better.

This PR does not only move code around. It also fixes a couple of bugs and makes some logical/code changes.

Bug fixes:
- thread-safety is improved with caches preventing concurrent access
- the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported
- an issue with DeviceGuard not setting the device properly and failing silently is worked-around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue)

Code/Logical changes:
- "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing)
- The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere)
- Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions
- Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy)
- There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code

apaszke who I promised naming rights to
zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing
ngimel who contributed to the design of this architecture
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108

Reviewed By: gchanan, fmassa

Differential Revision: D12850608

Pulled By: soumith

fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3
2018-10-31 18:13:00 -07:00
2df6d3e3c7 Fix allocator handling in raw_mutable_data (#13349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13349

When we get a Tensor that was created in ATen, it will have an allocator set.
Such tensors, before, crashed when you called raw_mutable_data on them.
This diff fixes that.

Reviewed By: ezyang, teng-li

Differential Revision: D12850833

fbshipit-source-id: 51a5f7030afc4854b439cb3698d0ccd8dd101e2c
2018-10-31 18:04:41 -07:00
a682ce9144 Add back HIP support to async net (#13400)
Summary:
We lost HIP support in the last refactoring, 620ece2668
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13400

Differential Revision: D12868211

Pulled By: bddppq

fbshipit-source-id: 72dbfda105b826bee28ddf480e88fca7d63f93d8
2018-10-31 17:52:36 -07:00
eaf141dd64 Enable opencv and lmdb in ROCM CI (#13430)
Summary:
They are needed to run resnet50_trainer when using datasets from https://download.caffe2.ai/databases/resnet_trainer.zip

cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13430

Differential Revision: D12876593

Pulled By: bddppq

fbshipit-source-id: 912943d1d84d165ad396c8a99d2b948d933e12f2
2018-10-31 17:50:33 -07:00
2e1b7a6f4f Renaming dim() to size() - 1/3 (#13434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13434

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12867223

fbshipit-source-id: 3e05be1a370ebd1a273bd4c70499d019fd056ac4
2018-10-31 17:43:52 -07:00
edd902594a Renaming meta() to dtype() - 1/2 (#13333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13333

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(meta->dtype): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12845168

fbshipit-source-id: 492091963d2211ea80215200e981965767566135
2018-10-31 17:14:08 -07:00
470bfaa586 int8 sigmoid op (#13298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13298

Int8 sigmoid ops, test provided. Only supports first axis now

Reviewed By: newstzpz

Differential Revision: D12837824

fbshipit-source-id: 2a9f1739813fe7b48f841ae15e0206768e57cd3e
2018-10-31 16:22:45 -07:00
48db74ea03 net_simple_refcount type to help experimentation with dynamic allocation. (#13370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13370

This diff adds a new net type (simple_refcount) that does one thing: for all
intermediate results produced by a net, it keeps a refcount of internal
usage, and when the last internal consumer finishes, the net deletes the blob
content to mimic the case of dynamic allocation. In fact, this would also be
the behavior when we go functional: anything that is not explicitly marked as
input or output will be up to the executor for lifetime management.

See the comments in net_simple_refcount.cc for details.
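A hedged usage sketch, assuming the standard caffe2 Python frontend (only the net type name comes from this diff):
```python
from caffe2.python import core, workspace

net = core.Net("refcount_demo")
net.Proto().type = "simple_refcount"   # opt in to the refcounting executor
x = net.ConstantFill([], "x", shape=[4], value=1.0)
y = net.Relu(x, "y")   # once "x" has no remaining internal consumers,
net.Relu(y, "z")       # the net frees its blob content
workspace.RunNetOnce(net)
```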

Reviewed By: dzhulgakov

Differential Revision: D12855489

fbshipit-source-id: 594a47a786305d595fd505b6700864dd1d9c72aa
2018-10-31 15:59:16 -07:00
479b8266bf Back out "[pytorch][PR] Support upsample" (#13413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13413

Original commit changeset: d5db200365f1

Reviewed By: houseroad

Differential Revision: D12870356

fbshipit-source-id: be115d2370636786901c822895664ccace2a9bc2
2018-10-31 15:51:41 -07:00
a4778862c7 Docs/cpp misc features and fixes (#12914)
Differential Revision: D10502199

Pulled By: ezyang

fbshipit-source-id: ec7523caf37d2c92a0e7a2981e1badf51b93dd05
2018-10-31 15:22:45 -07:00
7b47262936 Use names instead of indices in format (#13266)
Summary:
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13266

Differential Revision: D12841054

Pulled By: goldsborough

fbshipit-source-id: 7ce9f942367f82484cdae6ece419ed5c0dc1de2c
2018-10-31 15:17:47 -07:00
a376f3a53f Revert "Revert D12858091: [pytorch][PR] restore USE_C10D_NCCL" (#13407)
Summary:
This reverts commit b1fe541de35381e3a31a9e71db2be4b3af59dbcc.

some CI confusion made it look like this diff needed to be reverted; however the actual issue was elsewhere
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13407

Differential Revision: D12869650

Pulled By: anderspapitto

fbshipit-source-id: 3a436d41fc8434f9aa79b145f20904c99093eef4
2018-10-31 14:02:25 -07:00
f9c0a08eed Fix len() for tensors (#13398)
Summary:
Fixes #13376, `len(tensor)` was converting tensor to a 1 element list and returning 1 every time.
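A hedged regression-style sketch of the fixed behavior:
```python
import torch

@torch.jit.script
def first_dim(x: torch.Tensor) -> int:
    # len(tensor) now returns the size of the first dimension
    return len(x)

assert first_dim(torch.zeros(5, 3)) == 5   # previously this returned 1
```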
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13398

Differential Revision: D12867630

Pulled By: driazati

fbshipit-source-id: 28f3580a072d763df0980b3149c49d1894842ec9
2018-10-31 13:13:21 -07:00
9577811908 Using pip --user in test.sh script breaks ppc64le builds (#13388)
Summary:
Recent PR #13366, which added --user to pip install, breaks ppc64le testing when using test.sh. This fix avoids --user for ppc64le builds/tests, as both ninja and hypothesis are already in the ppc64le docker images.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13388

Differential Revision: D12870164

Pulled By: soumith

fbshipit-source-id: b66bafc06ad2c5116bb5ef5e4681cf9c776084aa
2018-10-31 13:09:26 -07:00
08b7c791ff Windows CI hotfix: Pin Python version to 3.6.7 (#13410)
Summary:
The newest version of `mkl` in conda only supports Python 3.6.7, and installing it as dependency will automatically downgrade Python from 3.7 to 3.6.7, which creates environment divergence between Windows CI build and test jobs. This PR pins Python version to 3.6.7, so that Windows CI build and test jobs have the same conda environment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13410

Differential Revision: D12870201

Pulled By: yf225

fbshipit-source-id: 2c5a41ad4bcc72e02d12ea6529550d5e1cdd45ef
2018-10-31 13:02:18 -07:00
404f8660e7 Add string.format() (#13157)
Summary:
This PR adds `aten::format` as a builtin op for strings with the basic formatting semantics of Python.

It also adds varargs to the schema parser (with the limitation that the varargs item is the last argument, i.e. `(*args, **kwargs)` is not supported) and to the compiler
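A hedged sketch of the builtin in use (a sketch, not from the diff):
```python
import torch

@torch.jit.script
def report(x: torch.Tensor) -> str:
    # lowered to the new aten::format builtin; varargs after the format string
    return "min {} max {}".format(x.min(), x.max())
```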
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13157

Differential Revision: D12832537

Pulled By: driazati

fbshipit-source-id: 17c1a5615bb286c648fc9e38f2ebe501b064c732
2018-10-31 12:50:56 -07:00
b3ef98450b Use non-th versions of some functions when defining backwards. (#13394)
Summary:
In these cases, the native function doesn't do anything different besides checking so there is no semantic change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13394

Differential Revision: D12861272

Pulled By: gchanan

fbshipit-source-id: ef7403ef3ce0326ccb12178434ce0cf14b28426e
2018-10-31 12:42:03 -07:00
f30c74558c Revert D10861211: Convert Arguments to dictionary
Differential Revision:
D10861211

Original commit changeset: da2fcc3e3b4d

fbshipit-source-id: 7243cb340920cf0acb57420bb5de908acd02a064
2018-10-31 12:38:43 -07:00
93b16b6422 Revert D10519758: [nomnigraph] Add transpose network pass
Differential Revision:
D10519758

Original commit changeset: a268374fb0b1

fbshipit-source-id: 4de4c99a185c4083665226af94312b38dd0f6820
2018-10-31 12:34:14 -07:00
b1fe541de3 Revert D12858091: [pytorch][PR] restore USE_C10D_NCCL
Differential Revision:
D12858091

Original commit changeset: 1cc91bb3b82e

fbshipit-source-id: a9b55ea8c138f939af71caefdfe7d4bccf0cd331
2018-10-31 11:32:46 -07:00
a43c6385f1 When looking for pybind11, do not attempt to get properties from pybind11:pybind11. (#12188)
Summary:
There is no property named "INTERFACE_INCLUDE_DIRECTORIES" on pybind11::pybind11. This causes a cmake error if there is a system installation of pybind11. In addition, pybind11_INCLUDE_DIRS is already set once "find_package(pybind11 CONFIG)" finds pybind11.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12188

Differential Revision: D10362655

Pulled By: soumith

fbshipit-source-id: 9c5d13295c4a2cf9aacd03e195994287d06ed15c
2018-10-31 11:23:01 -07:00
f5b34e3446 Handle exceptions in at::parallel_for() (#13393)
Summary:
Currently, exceptions thrown in at::parallel_for() will cause a hard crash
if the code is executed by a background thread. This catches the exception
and re-throws it in the main thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13393

Differential Revision: D12861142

Pulled By: colesbury

fbshipit-source-id: d53f5ff830ef8c11f90477eb63e5016f7ef1a698
2018-10-31 11:22:59 -07:00
a4f00c3d1e Fix error message in tensorlist()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13392

Differential Revision: D12860921

Pulled By: colesbury

fbshipit-source-id: 86da3ef15d70b0343dc922a3842449001c1afffa
2018-10-31 11:19:56 -07:00
cda44ffa81 Add transpose network pass (#13396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13396

stub for bootcamp task

Reviewed By: bwasti

Differential Revision: D10519758

fbshipit-source-id: a268374fb0b119c5d1960a4382e51c5e1ca240ba
2018-10-31 11:16:41 -07:00
04e8a6d9ef Convert Arguments to dictionary (#13332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13332

Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.

Reviewed By: bwasti

Differential Revision: D10861211

fbshipit-source-id: da2fcc3e3b4dbf8decbe14a8e2d5621b3fcc377f
2018-10-31 11:16:39 -07:00
2cebcbae8c createUniqueDataNode
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13395

Reviewed By: bwasti

Differential Revision: D12831584

fbshipit-source-id: a349dfe7a1da0d90e62b47e1b917f358275007be
2018-10-31 11:16:38 -07:00
a25d3b4d8c Use byte tensor for mnist labels. (#13363)
Summary:
The C++ mnist example https://github.com/goldsborough/examples/blob/cpp/cpp/mnist/mnist.cpp
 does not work because the labels are not correctly loaded; currently it (misleadingly) achieves 100% accuracy. Specifying the byte dtype fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13363

Differential Revision: D12860258

Pulled By: goldsborough

fbshipit-source-id: ad7b9256e4fc627240e25c79de9d47b31da18d38
2018-10-31 11:05:40 -07:00
488d393ea6 Fix pointwise loss broadcast (#12996)
Summary: Fixes #12129 , #12327

Differential Revision: D10513781

Pulled By: ailzhang

fbshipit-source-id: a210008a39ff6c3f056c9fbe3f0576cfcce638ec
2018-10-31 10:17:25 -07:00
27ccc8787f Implement data_ptr as a native function.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13367

Reviewed By: ezyang

Differential Revision: D12855339

Pulled By: gchanan

fbshipit-source-id: da5d75ab38e01365717eed9a676dcbb22ac89fe7
2018-10-31 09:51:04 -07:00
cb87319eb0 restore USE_C10D_NCCL
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13371

Differential Revision: D12858091

Pulled By: anderspapitto

fbshipit-source-id: 1cc91bb3b82ec075481353e6f58dfe4e802fee5d
2018-10-31 09:46:45 -07:00
4c06f1f2bb CircleCI: enable all flaky tests (#13356)
Summary:
A few Caffe2 tests are currently disabled in the `py2-gcc4.8-ubuntu14.04` test job because they are known to be flaky. https://github.com/pytorch/pytorch/pull/13055 likely fixed the flakiness, and this PR tests that.

Fixes https://github.com/pytorch/pytorch/issues/12395.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13356

Differential Revision: D12858206

Pulled By: yf225

fbshipit-source-id: 491c9c4a5c48ac1b791fdc9d78acf66091e80457
2018-10-31 09:34:49 -07:00
bc74ec80d0 Add support for torch.backends.cudnn.enabled (#13057)
Summary:
This is used commonly in `nn` functions. This PR adds it as a weak
module (and also alters the conversion of weak modules to strong modules
to accept ordinary `object`s)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057

Differential Revision: D10846618

Pulled By: driazati

fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b
2018-10-31 09:31:09 -07:00
b200b51602 Give _dirichlet_grad a native wrapper.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13368

Reviewed By: ezyang

Differential Revision: D12855461

Pulled By: gchanan

fbshipit-source-id: a220ff464ef09e4efcd9da296fa8b6839b94c337
2018-10-31 07:57:32 -07:00
0aaff5eaf9 Replace CUDA-specific set_index(_from) method from DeviceGuard with set_device. (#13275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13275

This resulted in a bunch of knock-on changes, which I will now
describe:

- s/original_index/original_device/
- s/last_index/last_device/
- A bunch of places that used set_index, now use CUDAGuard (which does have
  set_index) because they were CUDA-specific code.

Major caveat: DeviceGuard doesn't *actually* work for non-CUDA/CPU devices. To make
that happen, I plan on totally replacing the implementation of DeviceGuard; what
I mostly care about here is wrangling the API into an acceptable state.

Reviewed By: gchanan

Differential Revision: D12832080

fbshipit-source-id: 7de068c7cec35663dc8a533026a626331336e61d
2018-10-31 07:55:13 -07:00
e5d56659ec Delete DeviceGuard(int64_t) constructor. (#13232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13232

DeviceGuard should be device agnostic, which means that it shouldn't
assume that int64_t means select the CUDA device.

Reviewed By: gchanan

Differential Revision: D10858024

fbshipit-source-id: b40e8337e4046906fd8f83a95e6206367fb29dbe
2018-10-31 07:55:11 -07:00
e93c721da1 Add c10::Stream, make at::cuda::CUDAStream use it. (#13133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13133

c10::Stream is a device agnostic object which represents a stream
on some device (defined as c10::Device).  The primary benefit of
introducing this object is that we can easily refer to it from code
in the non-CUDA library (since it doesn't actually refer to any
CUDA specific bits.)

Streams are identified by an ID into an appropriate pool.  There's
some work to translate to and from pointers to the pool; see inline
comments.

Reviewed By: gchanan

Differential Revision: D10855883

fbshipit-source-id: cc447f11a528432e41c2edc789f40e7a6f17bdd3
2018-10-31 07:55:10 -07:00
a3410f7994 Give addbmm a native wrapper.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13316

Reviewed By: ezyang

Differential Revision: D12840406

Pulled By: gchanan

fbshipit-source-id: ebcc495f2437da71778001971c32ad6074cf98b7
2018-10-31 07:28:46 -07:00
e6ace54840 Move underscore prefixed th functions _th prefix.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13308

Differential Revision: D12839464

Pulled By: gchanan

fbshipit-source-id: ceb5913cd154de301d0d476d70b3a4fc62eb319c
2018-10-31 07:03:34 -07:00
e475d3ede3 DDP multi-GPU segfault fix (#13291)
Summary:
Fix https://github.com/pytorch/pytorch/issues/13200

Tested on 8-GPU machines, since CI doesn't have that many GPUs and the multi-GPU test won't be triggered there

```
tengli@learnfair096:~/pytorch/test$ python run_test.py -i distributed --verbose
Selected tests: distributed
Running test_distributed ... [2018-10-29 20:32:46.355858]
/public/apps/openmpi/2.1.1/gcc.5.4.0/bin/mpiexec
Running distributed tests for the gloo backend
test_DistBackend (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... ok
```

Also, I would like to bump up the broadcast bucket size for performance reasons
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13291

Differential Revision: D12842840

Pulled By: teng-li

fbshipit-source-id: e8c50f15ebf2ab3e2cd1b51d365e41a6106b98fe
2018-10-31 00:43:42 -07:00
dc854c0ee6 Add --user to pip install in pytorch test scripts (#13366)
Summary:
The caffe2 docker images use the native system python, which requires sudo to pip install.
In the PyTorch ROCm CI we use the caffe2 docker images.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13366

Differential Revision: D12855748

Pulled By: bddppq

fbshipit-source-id: 3e53fa203fa6bb3c43d4065c38c2b61e47f45f1e
2018-10-30 23:09:00 -07:00
44d2ca660a Disable CCACHE while building NCCL (#13340)
Summary:
I don't have a full analysis, but ccache appears to fail often while building
NCCL. To work around this, run the NCCL build with CCACHE_DISABLE.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13340

Differential Revision: D12855467

Pulled By: anderspapitto

fbshipit-source-id: 63eb12183ab9d03dd22090f084688ae6390fe8bd
2018-10-30 22:19:21 -07:00
bfe7df2211 Optimize rowwise_moments by MKL (#13329)
Summary:
i-am-not-moving-c2-to-c10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13329

Optimize rowwise_moments by MKL

Reviewed By: houseroad

Differential Revision: D12845220

fbshipit-source-id: b047e52ba82ed184bd322680fbf96306dfbb9867
2018-10-30 21:43:36 -07:00
865a10feba Update NCCL to 2.3.7-1 (#13353)
Summary:
Including some hang fixes. Tested locally and distributed works fine
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13353

Differential Revision: D12853714

Pulled By: teng-li

fbshipit-source-id: be72b9ffb48cffdb590e5452b0a4ec597f052685
2018-10-30 21:34:59 -07:00
265c97decf nomnigraph - More operator definitions (#13358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13358

More operator definitions changes

Reviewed By: itomatik

Differential Revision: D12852403

fbshipit-source-id: 0a69d9c6b55ab48344521ab9dba1de003dfc0714
2018-10-30 20:59:42 -07:00
59f8e8ada7 First step at adding exceptions (#12789)
Summary:
This is a first step towards adding exceptions. We need minimal support in order to begin converting the torch library to weak script mode (which is the main goal here).

Some limitations (that are documented in the tests & compiler):
1. Cannot assign exceptions to variables
2. Any name after raise is treated as a valid Exception
3. No control-flow analysis yet. Below, `a` will be undefined:

```
if True:
     a = 1
else:
     raise Exception("Hi")
return a
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12789

Differential Revision: D12848936

Pulled By: eellison

fbshipit-source-id: 1f60ceef2381040486123ec797e97d65b074862d
2018-10-30 20:25:50 -07:00
c7027a511f In pytorch CI install ninja via pip instead of building it from source
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13042

Differential Revision: D12854708

Pulled By: bddppq

fbshipit-source-id: 2693d8c9818782cb9f0c958dee8f77a1c131e32d
2018-10-30 20:05:40 -07:00
3c66520dd8 Remove aten/src/ATen/CUDAStream.cpp from hipify script (#13357)
Summary:
Deleted in https://github.com/pytorch/pytorch/pull/13251
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13357

Differential Revision: D12852983

Pulled By: bddppq

fbshipit-source-id: 0816a14188590e1971fabefcd575489c7339e122
2018-10-30 19:48:07 -07:00
13b9fd3e05 Renaming meta() to dtype() - 2/2 (#13334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13334

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(meta->dtype): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

i-am-not-moving-c2-to-c10

Reviewed By: ezyang

Differential Revision: D12845197

fbshipit-source-id: f87eb575d3c31593ca76b70780cc4fca888e706b
2018-10-30 18:24:30 -07:00
cb5f374f6c More functions moved to native, use _th_ prefix more consistently.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13262

Reviewed By: ezyang

Differential Revision: D12827704

Pulled By: gchanan

fbshipit-source-id: c910c069200c0766dd6d5f998d341124d560e80d
2018-10-30 17:41:55 -07:00
7d9ab140bf Fix aten::to symbolic + add expand_as (#13325)
Summary:
https://github.com/pytorch/pytorch/pull/13146 broke some cases of ONNX export, this fixes them
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13325

Differential Revision: D12844294

Pulled By: jamesr66a

fbshipit-source-id: f98dd0685820b2a1e5fcd49733cfa5c19c48a4e7
2018-10-30 17:28:15 -07:00
4d141bee98 Skip test_sum_noncontig in ROCm (#13341)
Summary:
Since it fails due to insufficient precision of DoubleTensor.sum() on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13341

Differential Revision: D12851335

Pulled By: bddppq

fbshipit-source-id: e211c3868b685aa705160ce98a2a18a915ad493f
2018-10-30 16:54:44 -07:00
f1d02f6d1c Move underscore prefixed linear algebra TH functions to _th prefix.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13309

Reviewed By: ezyang

Differential Revision: D12839533

Pulled By: gchanan

fbshipit-source-id: 27bdc5254d2529269b705c2c057826a44297a34b
2018-10-30 16:31:53 -07:00
11a16961a5 Fix "CUDA Tensor __rsub__ breaks when device is not 0" (#12956)
Summary:
Currently, `a = 1 - torch.tensor([1]).to('cuda:1')` puts `a` on `cuda:1` but reports `a.device` as `cuda:0`, which is incorrect and causes an illegal memory access error when trying to access `a`'s memory (e.g. when printing). This PR fixes the error.

Fixes https://github.com/pytorch/pytorch/issues/10850.
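A hedged repro sketch (needs at least two GPUs):
```python
import torch

a = 1 - torch.tensor([1]).to('cuda:1')
# before the fix, a's memory lived on cuda:1 while a.device claimed cuda:0,
# so touching it (e.g. print(a)) hit an illegal memory access
assert a.device == torch.device('cuda:1')
print(a)
```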
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12956

Differential Revision: D12835992

Pulled By: yf225

fbshipit-source-id: 5737703d2012b14fd00a71dafeedebd8230a0b04
2018-10-30 16:29:19 -07:00
d2659f6689 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13346

Differential Revision: D12850686

Pulled By: michaelsuo

fbshipit-source-id: b7474d0a3f3347034592bef45125610c040cff6a
2018-10-30 16:22:58 -07:00
f58e4fbc45 Remove redundant array-gen loop in gather_ops_test.py (#13338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13338

Remove unnecessary [r for r in []] statements.

Reviewed By: ezyang

Differential Revision: D12848907

fbshipit-source-id: 256551b286ac6801585acf9bb0b2644ef0b7ed58
2018-10-30 16:20:22 -07:00
77b8aade58 Revert D12809293: Kill more weird constructors on Tensor
Differential Revision:
D12809293

Original commit changeset: 5eb663fe8182

fbshipit-source-id: 709a5378fdbbb3fcfaacef8fc48b6530afbbc28f
2018-10-30 16:01:51 -07:00
ed60f94dba hipify caffe2 script in fbcode (#13265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13265

Make changes so that the hipify_python script works with fbcode.

1. Add TARGETS file
2. Make hipify_python a module as well as a standalone script.

Reviewed By: bddppq

Differential Revision: D10851216

fbshipit-source-id: cacd04df6fe2084832256d1916d62dccea86baa9
2018-10-30 15:51:28 -07:00
9ca8a76645 Rename Type.tensor to Type._th_tensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13313

Reviewed By: ezyang

Differential Revision: D12840136

Pulled By: gchanan

fbshipit-source-id: 896d705eb5091f7677d6d91dbd50629343dfa24d
2018-10-30 15:34:06 -07:00
c68b82ebc8 don't expand cmake variable in IF
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13331

Differential Revision: D12849306

Pulled By: anderspapitto

fbshipit-source-id: 2f1f72a44ed3a176be8c7490652e49771c3fadbf
2018-10-30 15:20:43 -07:00
cc3618ce36 Move _cumsum and _cumprod to _th_ prefixes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13311

Reviewed By: ezyang

Differential Revision: D12839706

Pulled By: gchanan

fbshipit-source-id: 79e20b31c6ca2f22229ad3903aacf70dc674c25c
2018-10-30 15:01:14 -07:00
ce469e6c71 dims() to sizes() remaining part
Summary: Made the clangr rule more robust and it discovered more callsites.

Reviewed By: smessmer

Differential Revision: D12825017

fbshipit-source-id: 3be1eeb7ea697b36ef89e78ba64c0ee1259439c4
2018-10-30 14:56:21 -07:00
9af18d847a Fix accesses to uninitialized memory when running sum() within an OMP… (#13274)
Summary:
```
… parallel region.

The two_pass_reduction code allocates a buffer of size at::max_threads().
When called within a parallel region, at::parallel_for only uses 1 thread
so some of this buffer is not written.

This makes two changes:

1) two_pass_reduction is not called when already in a parallel region
2) two_pass_reduction fills unwritten buffer elements with the identity
   (the value in dst)
```

cc SsnL: I think this should fix the NaNs in BatchNorm when calling sum() within a parallel region.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13274

Differential Revision: D12840034

Pulled By: colesbury

fbshipit-source-id: d32e80909a98a0f1bb1c80689fe5089b7019ef59
2018-10-30 14:17:35 -07:00
f04a705cb2 Remove assertions in conv modules (#13283)
Summary:
These assertions aren't necessary because these conditions are checked inside the ATen ops, and right now they're not very user-friendly because they don't have an error message or reference the dimension of the tensor being checked. Let's just remove them (the error then comes from ATen with a friendlier message).

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13283

Differential Revision: D12840730

Pulled By: goldsborough

fbshipit-source-id: 1902056c7d673f819c85f9164558e8d01507401c
2018-10-30 13:51:12 -07:00
c0411719fc Rename th_addmm to _th_addbmm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13317

Reviewed By: ezyang

Differential Revision: D12840603

Pulled By: gchanan

fbshipit-source-id: 10ead96cd181535cbd4dfe84be813375024dbd2c
2018-10-30 13:48:49 -07:00
3a81984bde Make Stat put ops accept empty tensors safely (#13178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13178

Add default value option to stats put ops

Reviewed By: mlappelbaum

Differential Revision: D10858564

fbshipit-source-id: cc9b3e621abf3fc21821b73f354bebdcd35e477e
2018-10-30 13:28:58 -07:00
ce51e3fe55 Move the Test conversion script to main repo (#13287)
Summary:
Better to keep it in the main repo, so we will have the correct dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13287

Reviewed By: zrphercule

Differential Revision: D12834665

Pulled By: houseroad

fbshipit-source-id: 3a0afaa705a9b8f4168fcd482123bcabcf083579
2018-10-30 13:25:22 -07:00
3cb2470bb3 add __deepcopy__ back to Parameter (#12886)
Summary:
- fix https://github.com/pytorch/pytorch/issues/315
- add `__deepcopy__` back to Parameter class
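A minimal sketch of the restored behavior:
```python
import copy
import torch

p = torch.nn.Parameter(torch.randn(2))
q = copy.deepcopy(p)
# deepcopy preserves the Parameter subclass (and requires_grad)
assert isinstance(q, torch.nn.Parameter) and q.requires_grad
```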
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12886

Differential Revision: D12838771

Pulled By: weiyangfb

fbshipit-source-id: b2ce12244e36f981d89f6c7cdead63237dd820ea
2018-10-30 12:56:26 -07:00
a35162f1bc Remove net_simple_async (#13320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13320

simple_async has been deprecated via the network override rule for a while,
and we should be able to safely remove it.

This also clears up 2 tech debts:
(1) in rnn executor, rely on the executor override to get the right net.
(2) clearly mark checkExecutorOverride as a potential change to net_type by making it c++ style guide compliant.

Reviewed By: dzhulgakov

Differential Revision: D12840709

fbshipit-source-id: 667702045fa024f5bdc87a9c28ea1786c78432b3
2018-10-30 12:36:38 -07:00
0db505bf27 Made docstrings for Embedding more accurate. (#13310)
Summary:
Made the previous description for max_norm more precise, avoiding 'this' and describing what actually happens in the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13310

Differential Revision: D12840813

Pulled By: SsnL

fbshipit-source-id: 98090c884267a62ce93cd85da84252d46926dfa5
2018-10-30 12:25:38 -07:00
264deae5da Improve visual representation of NQL subgraphs (#13143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13143

For primary review (will likely delete to keep commit chain cleaner)

Retain operator input order in DOT string conversion on NQL tool.

Assumptions
* No API to discern input graph node type
* Graph is bipartite
* No generative operators; i.e. operator w/o input but creates output
* Not supporting subgraph

Mocks (from input P60154484)
Old: https://pxl.cl/j4mV (DOT string P60154515)
New: https://pxl.cl/j0wd (DOT string P60154461)

Reviewed By: bwasti

Differential Revision: D10224942

fbshipit-source-id: 8b0ce2f1f9248dfaa89aa01a3fd77e327de16ea4
2018-10-30 12:22:37 -07:00
017b91f861 Optimize channel_shuffle_op on GPU (#13066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13066

Optimize channel_shuffle_op on GPU

Reviewed By: houseroad

Differential Revision: D10639281

fbshipit-source-id: 394b937403e5d4e9df93548bbf87285bffaa55a9
2018-10-30 12:18:27 -07:00
518b0d0600 Fix add out=None to digamma docstring (Fixes #13225) (#13307)
Summary:
Fixes #13225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13307

Differential Revision: D12840231

Pulled By: SsnL

fbshipit-source-id: 2732a2466ac1d2f3fdabfd1eaccddec96e89ba1b
2018-10-30 11:52:35 -07:00
5ba952afcc use topological move in graph fuser (#13271)
Summary:
Turns out that getting rid of the multiple passes in fusion is a little more involved, so leaving it off for another day.

The expect-test changes are just things moving around in the new order, but I would appreciate it if someone glanced at them for anything crazy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13271

Differential Revision: D12832752

Pulled By: michaelsuo

fbshipit-source-id: 55f16c80a97601744a06df2ead45cef7b3a19c08
2018-10-30 11:10:28 -07:00
5b15a501da Refactor & unit test feed predictor
Summary:
1. Refactor DDPG predictor.  Merge the critic predictor with ParametricDQNPredictor since they are the same
2. Fix bug where loss was multiplied by the batch size
3. Create DDPGFeedPredictor which uses the feed predictor output format
4. Add support for gridworld simulation memoization to DDPG.  Also memoize normalization tables.

Reviewed By: kittipatv

Differential Revision: D10161240

fbshipit-source-id: 2813890043de1241c1fb9b9c2b6a897403f9fc12
2018-10-30 10:27:47 -07:00
ec754adb14 Kill more weird constructors on Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13190

Reviewed By: ezyang

Differential Revision: D12809293

fbshipit-source-id: 5eb663fe818276d97cf31d1ed1e7f025d2b69851
2018-10-30 10:25:40 -07:00
10de2c1187 CircleCI: fix test timeout by running CPU build and test on different machines (#13284)
Summary:
It seems that we can fix the test timeout issue by running the CPU build and test on different machines (I manually ran this patch through the CI 50 times to confirm this). The actual reason for the timeout is still unknown, but I suspect it has to do with memory / disk space.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13284

Differential Revision: D12840371

Pulled By: yf225

fbshipit-source-id: af326f0358355602ee458696c3ffb325922e5289
2018-10-30 10:22:57 -07:00
ac64724ed9 Add support for tuple constants (#13086)
Summary:
Depends on #13072

Adds support for tuples as variables instead of just as literals. Before, tuples would give the error `python value of type 'tuple' cannot be used as a value`. This PR adds a flag on `SugaredValue` to determine whether a value is a tuple or not.
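A hedged sketch of tuples as first-class variables in script (present-day syntax):
```python
import torch

@torch.jit.script
def swap(x: torch.Tensor, y: torch.Tensor):
    pair = (x, y)   # previously: "python value of type 'tuple' cannot be used as a value"
    a, b = pair
    return b, a
```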
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13086

Differential Revision: D10846632

Pulled By: driazati

fbshipit-source-id: 7b5d6ae9426ca3dd476fee3f929357d7b180faa7
2018-10-30 09:01:17 -07:00
f06b70a6e9 Fix memory leak during packing in tuples (#13305)
Summary:
Verified on python 3.6 that it fixes #13243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13305

Differential Revision: D12838764

Pulled By: soumith

fbshipit-source-id: 206a8b22d1d05e5f156f1db1baaa82358f3eaa83
2018-10-30 08:32:26 -07:00
8a888c48da Reimplement as_strided in ATen. (#13185)
Summary:
This moves away from using tensor.set_(...) for as_strided, which went
through TH and was weirdly slow/complicated. The new as_strided has a
new invariant that it will never resize the storage to a larger size
(the previous as_strided allowed that behavior but it seemed weird and
none of our code relied on it.)

This offers a small speedup on as_strided: it went from 1300ns to
1100ns although the benchmarks get a little noisy here.

Also on the changelog is a quick fix to resize_ code to avoid unsigned
underflow. I'll rewrite the resize_ zero dim logic in a future diff, it
doesn't make sense the way it is written right now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13185

Reviewed By: ezyang

Differential Revision: D12809160

Pulled By: zou3519

fbshipit-source-id: 3885df9d863baab2b2f8d8e2f8e2bfe660a49d85
2018-10-30 07:52:50 -07:00
8c2d0c831f Speed up tensor.storage_offset (#13267)
Summary:
This PR special cases tensor.storage_offset to avoid dispatches in the
common case. tensor.storage_offset is important for torch.as_strided
performance, because as_strided(sizes, strides) shares an implementation
with as_strided(sizes, strides, storage_offset) and it might not be the
best if there were two separate implementations (including backward
implementations).

This PR reduces times on a tensor.storage_offset
microbenchmark from 22ns to 2ns (these numbers are pretty stable). For
a torch.as_strided benchmark, this PR reduces numbers from 1042 to
928ns, a 100ns improvement, but this number is noisy and goes up and
down.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13267

Reviewed By: ezyang

Differential Revision: D12829828

Pulled By: zou3519

fbshipit-source-id: df907731e2398ce2baf1c8b1860a561ccc456f78
2018-10-30 07:36:21 -07:00
ee010a2bee Operators that never (re)allocate memory do not need DeviceGuard (#13269)
Summary:
This PR removes DeviceGuard for the following native function tensor reshaping operations:
- broadcast_tensors
- chunk
- expand
- expand_as
- narrow
- reshape
- reshape_as
- select
- slice
- split
- split_with_sizes
- squeeze
- squeeze_
- transpose
- transpose_
- unsqueeze
- unsqueeze_

There are probably more but I'm putting this out for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13269

Reviewed By: ezyang

Differential Revision: D12830317

Pulled By: zou3519

fbshipit-source-id: 466a1bbd835aa708fe72c3c620e07fed3f85661f
2018-10-30 07:13:15 -07:00
47c0d88739 Bring back warning for dtype uninitialized in serialization (#13239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13239

Previous diff missed the if (dtype_initialized) check, duh.

Also, to guard against log spamming, use LOG_EVERY_MS when it's available

Reviewed By: kennyhorror

Differential Revision: D12818938

fbshipit-source-id: 76590bd1b28010fb13f5d33423c8eac1395e9f76
2018-10-29 22:09:54 -07:00
bb703b1ff5 Remove defunct ATen/CUDAStream.h,cpp (#13251)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13251

Differential Revision: D12823807

Pulled By: ezyang

fbshipit-source-id: 7fa1ecc8058f3b0dacf5d3a4054f10422832599d
2018-10-29 21:08:10 -07:00
91e87c0395 Renaming size() to numel() - 2/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(size->numel): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

i-am-not-moving-c2-to-c10

Reviewed By: ezyang

Differential Revision: D12833748

fbshipit-source-id: 98dc2d3abc23c177c2c9e457b81499952d4b690c
2018-10-29 18:59:29 -07:00
c82e8bf988 bump gloo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13286

Differential Revision: D12835150

Pulled By: anderspapitto

fbshipit-source-id: 4e3bbca077447ef0c007568a359f2260229c2a51
2018-10-29 18:56:21 -07:00
4a3baec961 Hub Implementation (#12228)
Summary:
[Edit: after applying colesbury's suggestions]
* Hub module enable users to share code + pretrained weights through github repos.
Example usage:
```
hub_model = hub.load(
     'ailzhang/vision:hub', # repo_owner/repo_name:branch
     'wrapper1', # entrypoint
      1234, # args for callable [not applicable to resnet18]
      pretrained=True) # kwargs for callable
```
* Protocol on repo owner side: example https://github.com/ailzhang/vision/tree/hub
     * The "published" models should be at least in a branch/tag. It can't be a random commit.
     * Repo owner should have the following field defined in `hubconf.py`
        * function/entrypoint with function signature `def wrapper1(pretrained=False, *args, **kwargs):`
        * `pretrained` allows users to load pretrained weights from repo owner.
        * `args` and `kwargs` are passed to the callable `resnet18`, repo owner should clearly specify their help message in the docstring

```
def wrapper1(pretrained=False, *args, **kwargs):
    """
    pretrained (bool): a recommended kwargs for all entrypoints
    args & kwargs are arguments for the function
    """
    from torchvision.models.resnet import resnet18
    model = resnet18(*args, **kwargs)
    checkpoint = 'https://download.pytorch.org/models/resnet18-5c106cde.pth'
    if pretrained:
        model.load_state_dict(model_zoo.load_url(checkpoint, progress=False))
    return model
```
* Hub_dir
    * `hub_dir` specifies where the intermediate files/folders will be saved. By default this is `~/.torch/hub`.
    * Users can change it by either setting the environment variable `TORCH_HUB_DIR` or calling `hub.set_dir(PATH_TO_HUB_DIR)`.
    * By default, we don't cleanup files after loading so that users can use cache next time.

* Cache logic :
    * We used the cache by default if it exists in `hub_dir`.
    * Users can force a fresh reload by calling `hub.load(..., force_reload=True)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12228

Differential Revision: D10511470

Pulled By: ailzhang

fbshipit-source-id: 12ac27f01d33653f06b2483655546492f82cce38
2018-10-29 18:43:14 -07:00
955a01562d Removes debug spew in test_jit.py (#13280)
Summary:
Looks like a print() snuck in by accident with a recent PR and it's printing a lot of spew when the tests are run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13280

Differential Revision: D12833449

Pulled By: michaelsuo

fbshipit-source-id: 5b50fd4b03bb73e5ca44cabdc99609c10017ff55
2018-10-29 18:25:30 -07:00
6071389a90 Enable cppcoreguidelines checks in clang-tidy (#12959)
Summary:
Enables most of `cppcoreguidelines-*` checks for clang-tidy. Major fixes included:

- Uninitialized members,
- Use of `const_cast`,
- Use of raw `new`

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12959

Differential Revision: D11349285

Pulled By: goldsborough

fbshipit-source-id: 9e24d643787dfe7ede69f96223c8c0179bd1b2d6
2018-10-29 18:23:35 -07:00
8260441b45 Renaming size() to numel() - 1/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(size->numel): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12833710

fbshipit-source-id: aef469b7b6d7715dada593f0f55e5813fbd963ac
2018-10-29 18:01:01 -07:00
fbd497f169 Fix initialization order in MIOpen file (#13264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13264

Simply change the initialization order to make hcc happy. Otherwise will have to add -Wno-error=reorder.

Reviewed By: bddppq

Differential Revision: D12827635

fbshipit-source-id: 6f4cd67209f2aa8ae85cfbdc53df0efb3b3cc473
2018-10-29 16:48:54 -07:00
d8dab6ffa8 Add tensor.to(options) (#13146)
Summary:
ezyang on the template hack
smessmer on SFINAE of the `TensorOptions(Device)`
goldsborough on the C++ API test changes
zdevito on the `jit` codegen changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13146

Reviewed By: ezyang

Differential Revision: D12823809

Pulled By: SsnL

fbshipit-source-id: 98d65c401c98fda1c6fa358e4538f86c6495abdc
2018-10-29 16:26:06 -07:00
3365d74df9 Fix refcounting in anomaly metadata
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13249

Differential Revision: D12823875

Pulled By: soumith

fbshipit-source-id: a0857a7cc8a4888aff99991fbae6bdd7a49d1ac4
2018-10-29 15:55:08 -07:00
50a8f8531b Updated for for arbitrary command line arg ordering
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13253

Differential Revision: D12829884

Pulled By: soumith

fbshipit-source-id: 9d8abcdf635e2daffce80ddf1e0e418a1e4c337d
2018-10-29 15:52:03 -07:00
9d9e5f8d1e Solve bug of DistributedDataParallel (#13248)
Summary:
Fixed bug [https://github.com/facebookresearch/maskrcnn-benchmark/issues/52](https://github.com/facebookresearch/maskrcnn-benchmark/issues/52)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13248

Reviewed By: pietern

Differential Revision: D12830451

Pulled By: teng-li

fbshipit-source-id: ab33faf3f6f4545f8fe07da7ecbeb2f0a2ea23f0
2018-10-29 15:19:55 -07:00
33b00bdbb8 cwd arg in shell function of run_test set to optional (#13247)
Summary:
Tiny fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13247

Differential Revision: D12830311

Pulled By: soumith

fbshipit-source-id: 405620e3a1de5bfc7e039f9aaf2f7cb7a3bca1b1
2018-10-29 15:17:00 -07:00
7956e9718b Add name for required optimizer parameter. (#13202)
Summary:
Small change -- the benefit is that the docs will show
``<required parameter>`` instead of ``<object object>``
for these required parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13202

Reviewed By: SsnL

Differential Revision: D12826252

Pulled By: jma127

fbshipit-source-id: 5f2c8495e5c56920377e4e012b8711e8f2a6e30e
2018-10-29 15:02:21 -07:00
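For illustration, a minimal sketch of the sentinel pattern this change polishes; the class name `_RequiredParameter` and the helper below are assumptions based on the commit summary, not verbatim torch/optim code.

```python
class _RequiredParameter:
    """Singleton whose repr shows up readably in generated docs."""
    def __repr__(self):
        return "<required parameter>"

required = _RequiredParameter()

def make_sgd(params, lr=required):
    # A named sentinel instead of a bare object() means the rendered
    # signature reads "lr=<required parameter>", not "lr=<object object>".
    if lr is required:
        raise ValueError("lr is a required argument")
    return {"params": params, "lr": lr}
```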
2e19529bd1 Add HasDeviceOption [nomnigraph] (#13206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13206

Add has device option for checking if a node has a device option set

Reviewed By: bwasti

Differential Revision: D12815365

fbshipit-source-id: 58477df93777f470cfb30cd75f02a659a7017b7c
2018-10-29 14:25:40 -07:00
2cfe439cc7 Turn off tests for Travis-derived Python jobs. (#13252)
Summary:
They appear to time out 30% of the time when run on CircleCI.

The long-term plan is to switch to binaries that are not
provided by Travis.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13252

Differential Revision: D12828812

Pulled By: ezyang

fbshipit-source-id: 7189e2a3200ae08c4ece16a27357ff0fd06f3adb
2018-10-29 14:04:57 -07:00
3c78cc6c2b Remove Tensor(const Tensor&, BaseContext*, type)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13204

Reviewed By: ezyang

Differential Revision: D11915764

fbshipit-source-id: baf883b3095bc9d5adf0b942eb874eaa7c1f45e5
2018-10-29 13:57:43 -07:00
5a2b2aa6af Remove calls to CopyFrom that can be sync (#13205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13205

CopyFrom without a context argument does a sync copy on the current gpu - exactly what most call sites need.

This diff kills about 60% of CopyFrom usages. The most common pattern is a gpu->cpu copy followed by FinishDeviceComputation - the latter can just be killed.

Reviewed By: Yangqing

Differential Revision: D11236076

fbshipit-source-id: eb790ca494dfc5d5e3a7d850b45d6f73221bb204
2018-10-29 13:57:42 -07:00
8ad69a80e3 Test scripts only run cases defined in the running script (#13250)
Summary:
1. Refactors `TestTorch` into `TestTorchMixin` (subclass of `object`) and `TestTorch` (subclass of `TestCase`, MRO `(TestCase, TestTorchMixin)`, only defined if `__name__ == '__main__'`). So other scripts won't accidentally run it.
2. Adds an assertion in `load_tests` that each script only runs cases defined in itself.

cc yf225 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13250

Differential Revision: D12823734

Pulled By: SsnL

fbshipit-source-id: 7a169f35fe0794ce76e310d8a137d9a3265c012b
2018-10-29 13:57:40 -07:00
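A minimal sketch of the mixin arrangement described above, with illustrative names; the real test files are considerably larger.

```python
import unittest

class TestTorchMixin(object):
    # Cases live on a plain-object mixin, so importing this module from
    # another test script does not register them with unittest.
    def test_add(self):
        self.assertEqual(1 + 1, 2)

if __name__ == '__main__':
    # Only the entry-point script materializes a runnable TestCase.
    class TestTorch(unittest.TestCase, TestTorchMixin):
        pass
    unittest.main()
```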
db0b5c7ab7 ArgumentStash for int64_t arguments (#12939)
Summary:
Closes https://github.com/pytorch/pytorch/issues/12906. https://github.com/pytorch/pytorch/issues/12580 is still open because the schema is marked as `traceable=false` in the arg parser constructor, I think.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12939

Differential Revision: D10492031

Pulled By: jamesr66a

fbshipit-source-id: ca5376de3997b5fb62b493e2e6a9bb0d6c3b9687
2018-10-29 13:55:24 -07:00
aabdcaa8fa No tmp install (#13215)
Summary:
This is a small patch on top of https://github.com/pytorch/pytorch/pull/13150 - please review only the top commit here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13215

Differential Revision: D12827675

Pulled By: anderspapitto

fbshipit-source-id: adb01d72a827b6dbffc25f7f99fdc3129906b1ca
2018-10-29 12:59:44 -07:00
a69af69ffc remove vestigial logic related to onnxbot tracking PRs (#13260)
Summary:
onnx always has a million branches so this is noisy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13260

Differential Revision: D12827640

Pulled By: anderspapitto

fbshipit-source-id: 55eced08970cc0a888bd8f7bc8670eea48deb288
2018-10-29 12:49:11 -07:00
380d2dfb27 absorb nccl (#13150)
Summary:
always build nccl from within the main cmake build, rather than via a separate invocation in build_pytorch_libs.sh. Use the existing caffe2 codepaths
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13150

Differential Revision: D12815674

Pulled By: anderspapitto

fbshipit-source-id: a710b6f242d159b9816911a25ee2c4b8c3f855aa
2018-10-29 12:04:32 -07:00
1c8a823b3b More robust ABI compatibility check for C++ extensions (#13092)
Summary:
This PR makes the ABI compatibility check for C++ extensions more robust by resolving the real path of the compiler binary, such that e.g. `"c++"` is resolved to the path of g++. This is more robust than assuming that `c++ --version` will contain the word "gcc".

CC jcjohnson

Closes #10114

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13092

Differential Revision: D12810448

Pulled By: goldsborough

fbshipit-source-id: 6ac460e24496c0d8933b410401702363870b7568
2018-10-29 11:56:02 -07:00
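A rough sketch of the idea, using only the standard library; the actual check in torch.utils.cpp_extension differs in detail.

```python
import os
import shutil
import subprocess

def looks_like_gcc(compiler="c++"):
    # Resolve symlinks (e.g. /usr/bin/c++ -> .../g++-8) instead of only
    # trusting "c++ --version" to contain the word "gcc".
    path = shutil.which(compiler)
    if path is None:
        return False
    real = os.path.basename(os.path.realpath(path))
    if "g++" in real or "gcc" in real:
        return True
    # Fall back to the version string as a secondary signal.
    out = subprocess.check_output([path, "--version"]).decode()
    return "gcc" in out.lower()
```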
48b98d2f7f Expose nn:: namespace to python (#13132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13132

Expose more of the C++ API to python

Reviewed By: duc0

Differential Revision: D10855086

fbshipit-source-id: 98cc89bc72ef91ed1c59c1a19688e047765cf90b
2018-10-29 11:36:51 -07:00
62b27d27b7 Re-enable experimental ops build (#12821)
Summary:
The experimental ops for the c10 dispatcher were accidentally disabled in the OSS build when the directory changed from `c10` to `experimental/c10`. This PR re-enables them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12821

Differential Revision: D10446779

Pulled By: smessmer

fbshipit-source-id: ac58cd1ba1281370e62169ec26052d0962225375
2018-10-29 11:28:54 -07:00
b818d31a3e use TypeMeta instead of ScalarType in TensorOptions (#13172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13172

reland D10419671

Reviewed By: ezyang

Differential Revision: D12143282

fbshipit-source-id: 43504d06a901af30130ebe97fb0b33def45cdc9a
2018-10-29 11:15:37 -07:00
dcbca53e58 Renaming size() to numel() - 1/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866373

fbshipit-source-id: 589194164d4fea93b74d83fa7fc4c59558c41f4a
2018-10-29 11:11:19 -07:00
b1cf3ad1c2 More Declarations.cwrap functions moved to native, mainly LAPACK, sim… (#13194)
Summary:
…ple math.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13194

Reviewed By: ezyang

Differential Revision: D12811972

Pulled By: gchanan

fbshipit-source-id: 461beb5efa2b6aba0808d2419eb7eb3153d18d15
2018-10-29 11:03:04 -07:00
dbab9b73b6 Separate mkl, mklml, and mkldnn (#12170)
Summary:
1. Remove avx2 support in mkldnn
2. Separate mkl, mklml, and mkldnn
3. Fix convfusion test case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170

Reviewed By: yinghai

Differential Revision: D10207126

Pulled By: orionr

fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51
2018-10-29 10:52:55 -07:00
bb96b6635c Support upsample (#13152)
Summary:
This will enable the updated attribute and input format of operator upsample.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13152

Reviewed By: houseroad

Differential Revision: D12812491

Pulled By: zrphercule

fbshipit-source-id: d5db200365f1ab2bd1f052667795841d7ee6beb3
2018-10-29 10:40:35 -07:00
5be20f92ca Towards a quieter CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13210

Differential Revision: D12824924

Pulled By: anderspapitto

fbshipit-source-id: 76dc9d43a1b5c57eca1051ce6c92200b5fbda7ae
2018-10-29 10:35:40 -07:00
1032cf9fe4 Support for zero-length sequences in RNN executor (#13244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13244

Adding support for zero-length sequences into RNN executor

Reviewed By: dzhulgakov

Differential Revision: D10848803

fbshipit-source-id: f2994ee28c09fb30146243bb300ae7205024dd17
2018-10-29 10:32:42 -07:00
52b6460d3a Fix bug in some reductions that use global memory (#13211)
Summary:
Reductions that used global memory but didn't reduce
across threads in a warp did not have enough global memory
allocated for their intermediate results. These were
reductions that were non-contiguous in their reduced
dimension and large enough to benefit from reducing across
blocks in a grid.

Fixes #13209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13211

Differential Revision: D12815772

Pulled By: colesbury

fbshipit-source-id: f78be2cb302e7567a76097ca3ba1e7b801c0cdad
2018-10-29 10:23:30 -07:00
9e6a695116 Add string equality test, string concat (#12992)
Summary:
Adding string equality comparison and concat. Both are used in the standard library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12992

Differential Revision: D10513681

Pulled By: eellison

fbshipit-source-id: 1f845ef50be7850fdd3366951b20dc2a805c21fd
2018-10-29 10:13:21 -07:00
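A small TorchScript sketch of what the new operators allow (written with modern annotation syntax; the 2018 frontend differed in detail):

```python
import torch

@torch.jit.script
def greet(name: str) -> str:
    # Both string equality and "+" concatenation compile in TorchScript.
    if name == "world":
        return "hello, " + name
    return "hi, " + name

print(greet("world"))  # hello, world
```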
74ac86d2fe Show demangled names on nvtx ranges (#13154)
Summary:
As we discussed, this changes the backward pass profiler annotations such that 1. they're demangled and 2. if they came from a custom Python-side autograd function, they show a unique name based on the name of that Python-side function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13154

Differential Revision: D12808952

Pulled By: colesbury

fbshipit-source-id: 4119dbaed7714b87c440a81d3a1835c5b24c7e68
2018-10-29 08:45:54 -07:00
277b637811 Delete default constructor from CUDAStream. (#13021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13021

Let's make nullptr CUDAStream an illegal state.

Reviewed By: gchanan

Differential Revision: D10520421

fbshipit-source-id: 723c1f5130b2c92ec97411a958707fac4a90173f
2018-10-29 08:27:24 -07:00
1a4473bbd7 Rewrite THPUtils_PySequence_to_CUDAStreamList to return vector<optional<CUDAStream>> (#13125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13125

Previously, it returned a vector of THCStream*, which we eventually turned
into CUDAStream.  No need to spatter the conversion code everywhere: just
do it correctly to begin with.  An important side effect of doing it this
way is that we no longer pass nullptr to CUDAStream; instead, we create
the default stream.  I will rely on this in a later patch.

Reviewed By: gchanan

Differential Revision: D10853224

fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a
2018-10-29 08:27:23 -07:00
175f248310 Reduce sizes in TestUncoalescedSparse.test_to_sparse (#13236)
Summary:
The old test took 2min to run.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

See #13233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13236

Differential Revision: D12823474

Pulled By: ezyang

fbshipit-source-id: c800492a96e41a4cd18d41901f411d9d4e978613
2018-10-29 08:01:58 -07:00
71113c6b9e Respect kwarg-only of native functions moved from Declarations.cwrap.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13237

Reviewed By: ezyang

Differential Revision: D12818917

Pulled By: gchanan

fbshipit-source-id: 0ff55ccac3459edd3b28068a0378e9dae085eda0
2018-10-29 07:48:48 -07:00
4276fe7867 Support for saving exceptions in async CPU ops (#12904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12904

Enabling support for saving exceptions in async parts of CPU ops via
event().SaveException(). The error contract for CPU ops becomes:
 - return false in sync part -> net->Run() returns false
 - throw in sync part -> net->Run() rethrows the same exception
 - SetFinished("error msg") in async part -> net->Run() returns false
 - event().SetFinishedWithException() in async part -> net->Run() rethrows the same
   exception

Reviewed By: andrewwdye

Differential Revision: D10479130

fbshipit-source-id: 850ee9cbf83b04dd24b25eba359439b0cf7853c0
2018-10-29 04:57:40 -07:00
4fe8ca74af Test if GCC 7 fixes timeout problem. (#13230)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13230

Differential Revision: D12818863

Pulled By: ezyang

fbshipit-source-id: 371337ca4b9d8f8e71eb78d6a53085e1c3619631
2018-10-28 20:53:07 -07:00
34799faccd Fix move constructor on c10d::CUDAEvent (#13183)
Summary:
Previously, the move constructor performed a swap
between the item being moved in, and the uninitialized
garbage from the object itself.

I didn't bother adding a test because I shortly intend
to kill this class entirely.  But the fix is so easy that
I wanted to put it in in case I don't get around to doing
this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13183

Reviewed By: pietern

Differential Revision: D12809062

Pulled By: ezyang

fbshipit-source-id: 0d94bb9796fb7d30621256bfb401a4f89ba8ddc8
2018-10-28 17:47:12 -07:00
1fe8278559 Batched Inverse (#9949)
Summary:
Complete billing of changes:

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9949

Differential Revision: D10559089

Pulled By: zou3519

fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
2018-10-27 23:42:46 -07:00
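A quick sanity check of the batched behavior (random normal matrices are almost surely invertible):

```python
import torch

a = torch.randn(4, 3, 3)          # a batch of four 3x3 matrices
a_inv = torch.inverse(a)          # one inverse per batch element
eye = torch.eye(3).expand(4, 3, 3)
print(torch.allclose(torch.matmul(a, a_inv), eye, atol=1e-4))  # True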
4d62eef505 Add Future to IValue (#12976)
Summary:
Future is now an IValue. prim::Wait is now replaced by aten::wait.

This PR is built on top of #12925
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12976

Differential Revision: D10861483

Pulled By: highker

fbshipit-source-id: 9e17926a625bc502fb12335ef9ce819f25776be7
2018-10-27 10:00:35 -07:00
0f261ee359 Fix performance regression introduced in D10524381 (#13199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13199

D10524381 removed the inclusion of int8_simd.h in Caffe2 Int8 operators, and although the resulting code still compiles and works, it is up to 50% end-to-end slower (no SIMD!) on some models.

Reviewed By: bertmaher

Differential Revision: D12813095

fbshipit-source-id: 03a713a4c070c0ad1e79e71e91d09eaddc0751eb
2018-10-27 08:16:49 -07:00
df8c5a3572 Refactoring MIOpen activation ops (#13187)
Summary:
This pull request contains changes for:
1. Adding a generalized MIOpen activation class to be used by activation operators
2. Refactoring MIOpen ReLU op to use the new class
3. Adding ELU, Tanh and Sigmoid MIOpen ops

Differential Revision: D12810112

Pulled By: bddppq

fbshipit-source-id: 9519b3a0cd733b906bcba5d8948be089029c43ac
2018-10-27 00:22:54 -07:00
f8864f0505 Revert "Move batch_norm to ATen/native, speed up (#12368)" (#13191)
Summary:
Revert #12368 since it's causing ONNX-related test cases to fail.
https://github.com/pytorch/pytorch/pull/12368

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13191

Reviewed By: BIT-silence

Differential Revision: D12810778

Pulled By: houseroad

fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
2018-10-26 23:05:50 -07:00
bc352ace7c dense.to_sparse() re: #8853 (#12171)
Summary:
Here is my stab at ```dense.to_sparse```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12171

Differential Revision: D10859078

Pulled By: weiyangfb

fbshipit-source-id: 5df72f72ba4f8f10e283402ff7731fd535682664
2018-10-26 21:48:52 -07:00
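A minimal round-trip example of the new method:

```python
import torch

d = torch.tensor([[0., 1.],
                  [2., 0.]])
s = d.to_sparse().coalesce()   # COO tensor holding only the two non-zeros
print(s.indices())             # tensor([[0, 1], [1, 0]])
print(s.values())              # tensor([1., 2.])
print(torch.equal(s.to_dense(), d))  # True
```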
5182fdad0b Compute the offset to make sure the order in InlineContainer test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13198

Reviewed By: bddppq

Differential Revision: D12812909

Pulled By: houseroad

fbshipit-source-id: f448e0d7957c316099a6b565d129eabb7ef81e59
2018-10-26 21:32:25 -07:00
7a6e0bd77e Skip ROCm tests that fail as per #12824 (#13181)
Summary:
For attention: bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13181

Differential Revision: D12811207

Pulled By: bddppq

fbshipit-source-id: de1c92e5a8cf4fc634c4644376d07374441c24e3
2018-10-26 21:06:20 -07:00
723f40d94e video model test workflow on CPU (#13203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13203

Minor changes in the test workflow to run the model on CPUs

Reviewed By: stephenyan1231

Differential Revision: D9925797

fbshipit-source-id: b7b1fb2658ab68b1ffc2b1f7b314958ea4732b32
2018-10-26 20:48:18 -07:00
dae7616078 Shard all of tests based on how many tests exist. (#13160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13160

Reduces pytorch_core build from 2 hours to 30 minutes

Reviewed By: soumith, dzhulgakov

Differential Revision: D10524261

fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
2018-10-26 18:20:34 -07:00
7637b7c966 Optimize LayerNormOp (#13173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13173

Optimize LayerNormOp

Reviewed By: houseroad

Differential Revision: D12398163

fbshipit-source-id: 6b76bc4bd9f34e623f8e385dd07d4ce99490badf
2018-10-26 17:00:18 -07:00
537d671829 Renaming size() to numel() - 4/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866391

fbshipit-source-id: 3badc4e86edaac376918fca8d09dbfa396ac3a2c
2018-10-26 16:47:36 -07:00
3ca272cf5a Topologically-safe node moves (#13026)
Summary:
Add new methods to move a node before/after another node while preserving data dependencies.

Any suggestions for a pithier name for the methods would be appreciated 😃
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13026

Differential Revision: D10854574

Pulled By: QueryConnectionException

fbshipit-source-id: b42751cac18d1e23940e35903c8e6a54a395292e
2018-10-26 16:29:03 -07:00
620ece2668 Simplify thread pool creation logic (#13114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13114

Using one thread pool creator for all device types

Reviewed By: manojkris, wesolwsk

Differential Revision: D10851533

fbshipit-source-id: 32ca51d7932ba7faa8137df26315f52ecb4c6157
2018-10-26 16:02:08 -07:00
63ce3fbde8 Created a transformer to convert caffe2 NetDefs into ONNX models.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13167

Reviewed By: abadams

Differential Revision: D11296189

fbshipit-source-id: 7e49c7a78d26f4af39d50b40f70372272debb34a
2018-10-26 15:57:53 -07:00
9e6bb605f6 Native wrappers for many Declarations.cwrap entries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13003

Differential Revision: D10515654

Pulled By: gchanan

fbshipit-source-id: c3f2809fdb7daeea2209ef1bcdea60266dc4854d
2018-10-26 15:55:15 -07:00
80f766e5cd Create FAQ (#13129)
Summary:
Creates a FAQ. https://github.com/pytorch/tutorials/pull/345 now just links to this page.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13129

Differential Revision: D10854264

Pulled By: goldsborough

fbshipit-source-id: 6e57574ffa61409d4d9d1750aa618893b897ad41
2018-10-26 15:44:51 -07:00
eea2ee6d29 Renaming size() to numel() - 1/17
Summary: Codemod generated with clangr shard mode, 25 files per diff

Reviewed By: li-roy

Differential Revision: D10866237

fbshipit-source-id: 020fcfdf52083430c5b674eda8e07ad3adfcc838
2018-10-26 15:36:59 -07:00
06392bd6a3 Renaming size() to numel() - 3/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866389

fbshipit-source-id: 65489f7b3439ff9a62a5a09b77112f0f4931c609
2018-10-26 15:30:11 -07:00
883da952be Hipify caffe2/core (#13148)
Summary:
petrex ashishfarmer iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13148

Reviewed By: xw285cornell

Differential Revision: D10862276

Pulled By: bddppq

fbshipit-source-id: 1754834ec50f7dd2f752780e20b2a9cf19d03fc4
2018-10-26 15:27:32 -07:00
1bec8f773b Move ConstantPadNd into ATen (#10885)
Summary:
Addresses #9499. Completed work on the forward function, tests should be passing for that. Working on backward function now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10885

Differential Revision: D9643786

Pulled By: SsnL

fbshipit-source-id: 2930d6f3d2975c45b2ba7042c55773cbdc8fa3ac
2018-10-26 15:25:27 -07:00
e13e86724e Renaming size() to numel() - 2/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866381

fbshipit-source-id: 2fabf78dfea262e0c789cf24cd3ca6191852983b
2018-10-26 15:21:50 -07:00
b090a54a38 Enable MKLDNN in PyTorch in fbcode (#13165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13165

Also mark conflicting functions `static` to avoid duplicate symbol errors

Reviewed By: orionr

Differential Revision: D10998641

fbshipit-source-id: b93aab99b91daa1e082cc778abb28bf9d33c21d5
2018-10-26 14:52:19 -07:00
e6ce9f303f Check that QNNPACK directory exists in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13174

Differential Revision: D12808599

Pulled By: colesbury

fbshipit-source-id: 2548a024043f32ee570378dfead8880b00608478
2018-10-26 14:37:11 -07:00
f282fa1afe Comment out LOG(ERROR) for legacy no-dtype serialization behavior
Reviewed By: wylqc

Differential Revision: D12569279

fbshipit-source-id: 46def8ca163bcf9070a1179166fd8970e07ee229
2018-10-26 13:18:27 -07:00
0687f58441 Fix broken master (#13171)
Summary:
Fixes colliding changes in #12766 and #12368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13171

Differential Revision: D12109430

Pulled By: li-roy

fbshipit-source-id: f068c7df227d920aa3840762e892ce6e9c109237
2018-10-26 12:30:55 -07:00
c21471c77f Sampler serialization and deserialization (#12999)
Summary:
Implements serialization and deserialization for samplers in the C++ frontend dataloader.

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12999

Differential Revision: D10859676

Pulled By: goldsborough

fbshipit-source-id: cd132100fd35323e5a3df33e314511750806f48d
2018-10-26 12:20:51 -07:00
9f9f06c937 Improve inline container and add some test (#12993)
Summary:
Added getNextRecord/hasNextRecord methods. Even though the model data is stored at the end, we can still read the file from the beginning.

Added gtest to cover reader and writer's code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12993

Reviewed By: yinghai

Differential Revision: D10860086

Pulled By: houseroad

fbshipit-source-id: 01b1380f8f50f5e853fe48a8136e3176eb3b0c29
2018-10-26 12:06:47 -07:00
7ca995c815 Add optional default type annotation to support JIT None default value (#13161)
Summary:
As titled, this PR is part of the tasks to unblock exporting the standard library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13161

Differential Revision: D10866927

Pulled By: wanchaol

fbshipit-source-id: 50038dbe6840b097b98cbed9d46a189a64e82302
2018-10-26 11:38:50 -07:00
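A small sketch of what this enables (modern annotation syntax shown):

```python
import torch
from typing import Optional

@torch.jit.script
def scale(x: torch.Tensor, factor: Optional[float] = None) -> torch.Tensor:
    # A None default on an Optional argument now compiles in TorchScript.
    if factor is None:
        return x
    return x * factor

print(scale(torch.ones(2)))        # tensor([1., 1.])
print(scale(torch.ones(2), 3.0))   # tensor([3., 3.])
```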
8797bb1d30 Revert D10419671: use TypeMeta instead of ScalarType in TensorOptions
Differential Revision:
D10419671

Original commit changeset: 9cc8c5982fde

fbshipit-source-id: c870ecdd3730cf695007ebb110d362996da05e5d
2018-10-26 11:09:58 -07:00
ce0d3e9b35 Bind inplace and _out variants into JIT (#13093)
Summary:
This commit is a minimal initial pass at adding inplace and _out variants to the JIT.
It changes gen_jit_dispatch.py to add bindings for these operators, and it also
supplements the FunctionSchema with alias information for these operators and for
viewing operators.

Tests are very minimal and will need to be improved in future commits.

Notes:

* Custom operator tests needed to be changed since _out variants add overloads, which
  the custom operator pipeline does not handle when called from python. This commit
  registers special test ops in the _test namespace for this purpose.
* Extends the schema parser to parse alias annotations more robustly.
* Extends FunctionSchema with `writes()` a set of alias set names that the op will write to,
  and `annotatedType()` which will return AnnotatedType objects which contain the alias_set
  information that was parsed from the schema.
* Disables all optimizations in graph executor when a mutable operator is found. This
  is something that will be improved in the future but is necessary for correctness now.
* Adds annotate_ops to gen_jit_dispatch which adds aliasing information to all of the
  aten ops.
* Adds AnnotatedType to the type hierarchy which is used to mark List and Tensor types
  with their alias_set. These types only appear in schema when you call annotatedType
  and are erased from types in normal use.
* Extends jit::Type with .containedTypes() and .withContained(new_types). The first returns all types contained
  within the type (e.g. T for T[], or {T,L} for a tuple (T, L)). The second constructs a new
  version of the same type, replacing the contained types with new_types. This simplifies
  a lot of logic for recursively cleaning up types.
* Refactor List[T] into a common part that is shared with Annotated[T] and can be shared
  with Optional[T] and Future[T] when they are merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13093

Differential Revision: D10848176

Pulled By: zdevito

fbshipit-source-id: d057f23eeb99cde8881129b42d3f151ed5e7655d
2018-10-26 10:37:20 -07:00
a70573b589 use TypeMeta instead of ScalarType in TensorOptions (#12768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12768

Note: DefaultTensorOptions no longer fits in 64-bits.

I kept functions that take ScalarType as input to minimize changes for now.

Reviewed By: ezyang

Differential Revision: D10419671

fbshipit-source-id: 9cc8c5982fde9ff243e03d55c0c52c2aa2c7efd8
2018-10-26 09:27:12 -07:00
2f1542839f reduce Device to 32bits (#12767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12767

In preparation for using TypeMeta in TensorOptions. We need TensorOptions to fit in 128 bits, which isn't possible if both TypeMeta and Device are 64-bit.

Reviewed By: ezyang

Differential Revision: D10416051

fbshipit-source-id: 23c75db14650f7f3045b1298977f61a0690a8534
2018-10-26 09:27:11 -07:00
a7ba4cb383 Change return type of Tensor::dtype() from ScalarType to TypeMeta (#12766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12766

In preparation for using TypeMeta in TensorOptions.

Reviewed By: ezyang

Differential Revision: D10232118

fbshipit-source-id: 5c69a524fa38e50aa555fb9feb87540bc3575a63
2018-10-26 09:27:09 -07:00
46ef2b2898 Ignore flake8 warnings in test_c10d.py (#13159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13159

These lint violations are intentional.

Reviewed By: ezyang

Differential Revision: D10862131

fbshipit-source-id: 70ad4b0a360cb12d050805fd7b1080dfe4566e86
2018-10-26 09:17:57 -07:00
435228508e Remove test_distributed_trap.py (#13151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13151

No longer needed.

Reviewed By: ezyang

Differential Revision: D10862319

fbshipit-source-id: 01405d7cf2553f59ff7d3dce33755a5fdd8a8f05
2018-10-26 09:15:27 -07:00
929bffe020 Turn some th_ prefixes into _th_ prefixes for conformity. (#13128)
Summary:
This is the same as https://github.com/pytorch/pytorch/pull/12889 with the addmm changes stripped out, since that appears to cause onnx broadcasting issues I don't understand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13128

Reviewed By: ezyang

Differential Revision: D10853911

Pulled By: gchanan

fbshipit-source-id: 08ec8629331972f0c332ccd036980fd9c87562b0
2018-10-26 08:08:09 -07:00
c95fa4b904 fix dtype uninitialized tensor serialization
Summary:
See D10380678 for the discussion.

Caffe2 serialization code was able to handle dtype-uninitialized tensors as long as their numel was 0 O_O.

For safety, to unblock the push, I'm preserving this behavior with a critical. Once we fix all occurrences of the old API, we can delete this test.

Reviewed By: kennyhorror

Differential Revision: D10866562

fbshipit-source-id: e172bd045fdfca660ff05b426e001f5f2f03f408
2018-10-26 01:30:47 -07:00
8e1e3ba7b8 Hide c10::optional and nullopt in torch namespace (#12927)
Summary:
Does

```cpp
namespace torch {
using c10::optional;
using c10::nullopt;
}
```

So that users can be oblivious of our changes with ATen/c10 happening in the background, and also don't have to deal with multiple namespaces (which is very confusing).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12927

Differential Revision: D10510630

Pulled By: goldsborough

fbshipit-source-id: e456264f2fbca3eda277712de11cdd8acc77fbd4
2018-10-26 00:08:04 -07:00
f72f91610f Move stream to thread local (#13080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13080

This is the first step to untangle this logic:
- moves stream id to thread local mechanically
- relies on the fact that the value of thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called - we should move to proper RAII in the following diffs

Follow up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify the CopyFrom.

The only expected change in behavior is that before, CopyFrom would copy on logical stream id 0 if the context was created on the fly, and now it does so on the current stream. Since it blocks explicitly, I don't think it matters much.

Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse.

Reviewed By: ezyang

Differential Revision: D10525134

fbshipit-source-id: 5d495a21490bebe060a76389f1b47bdf12cbc59e
2018-10-26 00:04:32 -07:00
dc211c7de4 Move batch_norm to ATen/native, speed up (#12368)
Summary:
- Speed up the case of #12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the #12006 case).
- More extensive benchmarking shows not so great performance compared
  to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to
  maintain reasonable precision.

Needless to say that I would happily separate the TensorAccessor fixes in a separate PR, as they're fixes and unrelated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12368

Differential Revision: D10559696

Pulled By: SsnL

fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
2018-10-25 23:41:10 -07:00
5e73b828bd CMake integration for Int8 ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13145

Differential Revision: D10860849

Pulled By: Maratyszcza

fbshipit-source-id: fdbcc23ff9beaeaedfd561176df6cfe87685c1f5
2018-10-25 22:25:10 -07:00
4870b1b68f Speed up tensor.resize_(sizes) when tensor has correct size (#12824)
Summary:
While using gbenchmark, I found `tensor.resize_({0})` would take 300ns
if the tensor already has the correct size. This is important for
`at::empty({0})` perf because `at::empty` always calls `resize_`, which
in turn is important for JIT perf: the fusion compiler creates empty
tensors and then `resize_`s them to computed sizes. Most of the 300ns is
due to DeviceGuard (200ns).

Summary of findings:
- `at::empty({0}, cuda)`: 851ns
- `empty_tensor.resize({0})`: 308ns
- `DeviceGuard(tensor)`: ctor + dtor: 200ns (Going to look into this
  next because it impacts `resize_` perf).
- virtual dispatch overhead (`tensor.resize_()` vs
  `at::native::resize__cuda(tensor)`): ~10ns

This PR rips out the TH `resize_` implementation and adds it to ATen
with the following modifications:
- DeviceGuard used only after the same-size check.
- Same-size check rewritten for simplicity. The new check doesn't
affect perf.
- empty_cpu / empty_cuda avoid the dispatch overhead to
tensor.resize_.

Timing with this PR:
- `at::empty({0}, cuda)`: 363ns
- `empty_tensor.resize_({0})`: 17ns

Future:
- Investigate `resize_(sizes)` slowness when `tensor.sizes() != sizes`
- Should tell resize_as_ to use the new resize_ implementation...
(because resize_as_ is in TH, it is calling the old TH resize_)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12824

Differential Revision: D10449209

Pulled By: zou3519

fbshipit-source-id: cecae5e6caf390017c07cd44a8eaf2fa6e3fdeb6
2018-10-25 21:09:41 -07:00
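A quick way to observe the fast path from Python (timings are machine-dependent; this is a sanity probe, not the gbenchmark setup above):

```python
import timeit
import torch

t = torch.empty(0)
# resize_ to the size the tensor already has should return early,
# before any allocator or DeviceGuard work.
n = 100000
elapsed = timeit.timeit(lambda: t.resize_(0), number=n)
print("%.0f ns per resize_" % (elapsed / n * 1e9))
```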
60c0508d96 Use CAFFE_ENFORCE instead of CHECK in caffe2 rnn executor (#13144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13144

The intention of this diff is to prevent the predictor service from crashing with the "Check failed: timestep >= 0 && timestep < _T" error, as a bandage, before D10848803 can be landed (assuming D10848803 replaces the CHECKs with CAFFE_ENFORCEs, too).

Reviewed By: ilia-cher

Differential Revision: D10857963

fbshipit-source-id: bb56ad83aa867a2d25953aa7ffd84b078f8bf84a
2018-10-25 20:58:13 -07:00
5cbb33f939 Disable upsample optest (#13135)
Summary:
Temporarily disable upsample tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13135

Reviewed By: bddppq

Differential Revision: D10859926

Pulled By: houseroad

fbshipit-source-id: 9eb068198d43ba0939d81a9e41eb6f24ff19cb6d
2018-10-25 20:37:09 -07:00
efab8e8fdf Speed up tensor.get_device(), is_cuda(), is_sparse() by avoiding dispatches (#12841)
Summary:
`tensor.get_device()` went through two dispatches: once to the native function `get_device()`, and another when `get_device` calls `_th_get_device()`. This PR avoids the dispatch by directly implementing the `get_device` function as a method on Tensor.

Future Work:
- Investigate caching Device on TensorImpl. This will probably bring the
  tensor.get_device down to 2ns, but I'm not sure it's worth it.

before:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                             8 ns          8 ns   89407911
BM_TensorIsCuda                          24 ns         24 ns   29313017
BM_TensorIsSparse                        27 ns         27 ns   26083160
BM_TensorTypeIsCuda                      11 ns         11 ns   65128120
BM_TensorNumel                           11 ns         11 ns   68314492
BM_TensorGetDevice                       71 ns         71 ns    9633125
BM_DeviceGuardCtor                      173 ns        173 ns    4067173
BM_DeviceGuard                          232 ns        232 ns    3009690
```

after:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                            10 ns         10 ns   69803872
BM_TensorIsCuda                           2 ns          2 ns  321626683
BM_TensorIsSparse                         6 ns          6 ns  177045382
BM_TensorNumel                           12 ns         12 ns   58770533
BM_TensorGetDevice                        4 ns          4 ns  128113396
BM_DeviceGuardCtor                       52 ns         52 ns   14997278
BM_DeviceGuard                          158 ns        158 ns    5767248

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12841

Differential Revision: D10489353

Pulled By: zou3519

fbshipit-source-id: a596bc77352f21d5d35433c6de02c2f65aab5f9e
2018-10-25 19:57:52 -07:00
b827a40880 Implement bucket-based attention pooling for IdScoreList features (#13004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13004

Implement BucketWeighted model layer, which learns a weight for each possible score in an IdScoreList. Here, we assume that the scores in the IdScoreList have already been converted into the appropriate 'buckets'. If this is not done, then essentially each score represents its own bucket.

We assume that the scores/buckets are integers, and if max_score is not set, we assume that the maximum cardinality of the score is less than or equal to the cardinality of the ids.

Reviewed By: chonglinsun

Differential Revision: D10413186

fbshipit-source-id: 743e643a1b36adf124502a8b6b29976158cdb130
2018-10-25 18:04:08 -07:00
3ac9a9577c Remove optional from caffe2 utils (#12965)
Summary:
Now that we have everything from c10::optional, we can delete this and keep a single version in c10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12965

Differential Revision: D10504042

Pulled By: wanchaol

fbshipit-source-id: c0ec3892e92968cca264ae8924c19111674631ba
2018-10-25 17:29:04 -07:00
99d24aefc3 Move a number of ATen checks out of Dependencies.cmake (#12990)
Summary:
cc Yangqing mingzhe09088 anderspapitto mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12990

Differential Revision: D10862301

Pulled By: orionr

fbshipit-source-id: 62ba09cf0725f29692fac71bc30173469283390b
2018-10-25 17:26:25 -07:00
852d6e8b65 Fix python2 and python 3 compatibility found by lint. (#13140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13140

This is an example of the benefit of a proper Facebook linter. The old code
was not Python 2.x (actually, pre-Python 3.3) compatible. Note that FileExistsError
was added in Python 3.3:

https://stackoverflow.com/questions/20790580/python-specifically-handle-file-exists-exception

Reviewed By: mingzhe09088

Differential Revision: D10858804

fbshipit-source-id: a4c995aef9f720cb8b0ce463f0a51db667fc42f2
2018-10-25 17:20:11 -07:00
defe96eb6c add topology index check in Graph::lint() (#13037)
Summary:
just a sanity check to make sure everything is in order
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13037

Differential Revision: D10854563

Pulled By: michaelsuo

fbshipit-source-id: 409303c4cbf058b75e24bf2213b49e9d79cb862e
2018-10-25 17:02:38 -07:00
526460fc8b Use default timeout of 30 minutes for gloo backend (#13056)
Summary:
The existing default timeout was set at 10 seconds, which is too low
for asynchronous tasks that depend on a barrier to resynchronize.
Having a single timeout for all operations is not ideal and this will
be addressed in future commits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13056

Reviewed By: teng-li

Differential Revision: D10558746

Pulled By: pietern

fbshipit-source-id: d857ea55b1776fc7d0baf2efd77951b5d98beabb
2018-10-25 16:35:53 -07:00
4e1c64caee Add c10::optional to type syntax (#12582)
Summary:
This PR adds optional type to ATen native, autograd, the JIT schema, and the Python arg parser, and closes #9513. It allows us to use optional default values (including None) in function signatures and implementations like clamp, etc., and also lets us remove the python_default_init hack.

Follow up:

remove python_default_init completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582

Differential Revision: D10417423

Pulled By: wanchaol

fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
2018-10-25 16:08:29 -07:00
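An example of the user-visible effect (either bound of clamp may now be omitted or passed as None):

```python
import torch

x = torch.tensor([-2.0, 0.5, 3.0])
print(torch.clamp(x, min=0.0))            # clamp below only
print(torch.clamp(x, max=1.0))            # clamp above only
print(torch.clamp(x, min=None, max=1.0))  # explicit None is also accepted
```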
569a29b81a Make chunk size configurable in SaveOp (#12949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12949

Currently the default chunk size in the save operation is 1 MB and there is no way to configure it at runtime. Add a parameter to configure the chunk size in SaveOp.

Reviewed By: mraway, xsh6528

Differential Revision: D10454037

fbshipit-source-id: a5cd8f9846aea4b1e3612a3fcfa431b68bda8104
2018-10-25 15:47:34 -07:00
f6ccb6a0f9 bring caffe2::Tensor API closer to aten/pytorch (#13134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13134

For tensor, we plan to do the following renaming:
```
* t.ndim() → t.dim()
* t.size() → t.numel()
* dims() → t.sizes()
* t.meta() → t.dtype()
* t.dim(d) → t.size(d)
```
This diff adds new APIs in caffe2::Tensor so we can start codemod,
we'll remove old API after the codemod

Reviewed By: ezyang

Differential Revision: D10856028

fbshipit-source-id: 1638997e234d7b3113ef8be65a16246f902273c7
2018-10-25 15:45:09 -07:00
49046239f2 Change explicit usages of at::optional to c10::optional (#13082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13082

Follow-up of D10511254. For these cases we can move to the preferred un-namespaced `optional` right away.

Reviewed By: ezyang, Yangqing

Differential Revision: D10844117

fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
2018-10-25 15:17:53 -07:00
be99eff75a Back out "Revert D10494123: [c10] Remove at::Optional" (#12991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12991

Remove the file proxying. Before we can land `using namespace c10` everywhere, we keep the one-off namespace proxy. The follow-up diff is going to replace explicit at::optional but keep just `optional` usage.

Reviewed By: ezyang, Yangqing

Differential Revision: D10511254

fbshipit-source-id: 8297c61d7e9810ae215a18869a6ec9b63f55d202
2018-10-25 15:17:51 -07:00
c47f680086 arc lint torch/utils (#13141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141

This is an example diff to show what lint rules are being applied.

Reviewed By: mingzhe09088

Differential Revision: D10858478

fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
2018-10-25 14:59:03 -07:00
4f94d82c7f clang-format on c10d and THD (#13138)
Summary:
clang-format-6 run on all cpp,cc,c,cu,cxx,hpp,hxx,h files under /c10d and /thd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13138

Differential Revision: D10857742

Pulled By: teng-li

fbshipit-source-id: f99bc62f56019c05acdfa8e8c4f0db34d23b4c52
2018-10-25 14:16:47 -07:00
c6defa0847 Add randn in onnx symbolic (#12880)
Summary:
In this pr we added operator randn in onnx symbolic. Also, related tests are added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12880

Reviewed By: houseroad

Differential Revision: D10501788

Pulled By: zrphercule

fbshipit-source-id: ba8bb00ca848c4b95decabf638a1bc13fe11d03e
2018-10-25 14:11:23 -07:00
979560c9fc Include c10 namespace into caffe2 and at namespaces. (#12950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12950

For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten.
When we move classes from at/caffe2 to c10, this
 1. allows keeping backwards compatibility with third party code we can't control
 2. allows splitting diffs that move such classes into two diffs, where one only fixes the includes and the second one fixes the namespaces.

Reviewed By: ezyang

Differential Revision: D10496244

fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
2018-10-25 14:08:47 -07:00
d6fe812187 Fix TensorList ambiguity (#13024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13024

There's a TensorList type in ivalue.h and one in ScalarType.h, and they are different.
This diff moves IValue types into an ivalue namespace so we can merge the namespaces without conflicts.

Reviewed By: ezyang

Differential Revision: D10518929

fbshipit-source-id: cb760b6804a399880d2bff3acf9a3422d99fc0b8
2018-10-25 14:08:45 -07:00
14ea4bf0d1 Make 7 nn modules into weak modules (#12966)
Summary:
Depends on #12682 ([stacked diff](https://github.com/driazati/pytorch/compare/weak_mod...driazati:mod_conv1))

* Adds tests for weak module conversion that creates a `ScriptModule` that uses the weak module and checks its graph
* Adds `torch._jit_internal.weak_module` tags to modules that already work
  * `Sigmoid`
  * `Tanh`
  * `Hardshrink`
  * `PReLU`
  * `Softsign`
  * `Tanhshrink`
  * `PairwiseDistance`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12966

Differential Revision: D10559557

Pulled By: driazati

fbshipit-source-id: dc4bea3aa744b3c44d4fa7dceefd97e951f824d0
2018-10-25 13:59:34 -07:00
e07e63f0b3 Absorb shm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13088

Differential Revision: D10856067

Pulled By: anderspapitto

fbshipit-source-id: cfbf0f6cad3953e1ee1c55482c00a3db9f140594
2018-10-25 13:55:23 -07:00
175e553974 Do a better job of checking registered names (#13016)
Summary:
We currently don't check names in `register_module` and `register_parameter` as thoroughly as we do in Python. This PR fixes this.

Python checks are e.g. in https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L108

ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13016

Differential Revision: D10853800

Pulled By: goldsborough

fbshipit-source-id: 765357875e90a5046e72351a7a47a86511633ab6
2018-10-25 13:52:08 -07:00
c91d982691 Improve expand error message by including complete sizes rather than … (#13124)
Summary:
…size at dimension.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13124

Reviewed By: ezyang

Differential Revision: D10853167

Pulled By: gchanan

fbshipit-source-id: 76eeb922304bf19243d9bc52da87f2be8d1700ae
2018-10-25 13:37:25 -07:00
9cb4bce847 Open-source Caffe2 Int8 ops (#13065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13065

- Open-source Caffe2 Int8 (quantized) operators

Reviewed By: Yangqing

Differential Revision: D10524381

fbshipit-source-id: 6daa153dc247572900c91e37262d033c368b382d
2018-10-25 12:43:00 -07:00
faa354e102 Commentary about size constraints on TensorImpl. (#13126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13126

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D10454455

Pulled By: ezyang

fbshipit-source-id: 7018a41b94e316305751f2f8ad2c2d049799f5d4
2018-10-25 12:24:49 -07:00
cb15c7615a Documentation on TensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12713

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10404407

fbshipit-source-id: cbc6be2172af068c3fc96e1f6da0b04b6f29ad4b
2018-10-25 12:24:48 -07:00
ae44627661 Rm test_jit.cpp (#12988)
Summary:
Removes test_jit.cpp, which was supposed to have been deleted in https://github.com/pytorch/pytorch/pull/12030

I had to move zou3519's dynamic DAG tests into `test/cpp/jit/tests.h` too. No other changes to `test_jit.cpp` seem to have happened in the meantime.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12988

Differential Revision: D10854320

Pulled By: goldsborough

fbshipit-source-id: 7ab533e6e494e34a16ce39bbe62b1150e48fcb58
2018-10-25 12:18:15 -07:00
314d95a5f2 Renaming dims() to sizes() (caffe2/caffe2) - 3/4 (#13096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13096

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842875

fbshipit-source-id: 1784859735ed4d1bd5ccd7ca56e289498374a68f
2018-10-25 12:14:21 -07:00
557db18c85 Enable MIOpen properly (#13048)
Summary:
* Disable MIOpen convolution on double tensors
* MIOpen: set group count in convolution descriptor
* MIOpen: Honor Max Dim (ROCm 222)
* MIOpen: Batchnorm - Allow half/half and half/float, disallow double
* Limit MIOpen batchnorm to same-precision
* Fix maxdim check. (ROCm 246)
* Fix reversed logic in DISABLE_MIOPEN (ROCm 253)
* Export LANG/LC_ALL also for the test step.
* Make tensors contiguous before calling MIOpen batch norm
* Actually pass dilation to MIOpen.
* Do not use MIOpen if there is dilation and the group size is > 1. - This is officially not supported currently.
* Fixes for miopenforward bias call
* Modified init conv descriptor param values and used same value for dilation
* MIOpen: disable transposed convolutions

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13048

Differential Revision: D10785250

Pulled By: bddppq

fbshipit-source-id: f9d9797de644652280d59308e5ea5cc07d177fd4
2018-10-25 11:32:49 -07:00
ab40eff5dd caffe2: UpsampleBilinear CUDA implementation (#12843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12843

This adds a cuda implementation for the UpsampleBilinearOp and UpsampleBilinearGradientOp.

The CUDA code is based off of the corresponding ResizeNearest operators but with bilinear interpolation logic taken from the CPU implementation.

Reviewed By: houseroad

Differential Revision: D10453776

fbshipit-source-id: b29ac330b72465974ddb27c0587bca590773fdec
2018-10-25 11:10:04 -07:00
796181d762 Fix UB in CPU_tensor_apply (#13121)
Summary:
std::memcpy has UB when either src or dest is NULL, even if the length
is 0. This can and does happen when the input tensors are scalar tensors.

This triggered UBSAN on #12824 but it is strange that it has not
been triggered before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13121

Differential Revision: D10853113

Pulled By: zou3519

fbshipit-source-id: c4b4ad5e41de6f73dc755e0c25bc9947576a742d
2018-10-25 10:58:06 -07:00
eac3e7ab7c improve constants error message (#13072)
Summary:
Adds the attribute name to the error message and fixes the corresponding
test to actually run
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13072

Differential Revision: D10846622

Pulled By: driazati

fbshipit-source-id: a7eee6320c28140c4937ede3d4e4685cfce08d84
2018-10-25 10:45:42 -07:00
9fefab5ac6 Add support for reductions to TensorIterator (#11908)
Summary:
This adds support for reductions like sum() and mul() to TensorIterator.
Performance is similar to existing optimized code for CPU, and generally
better than existing code for CUDA kernels.

The templatized CUDA kernel requires fewer instantiations than the
existing THCReduce/THCReduceAll code. For example, sum() previously
generated 43 CUDA kernels, while it now requires only one (larger)
CUDA kernel. I suspect this should reduce code-size and
compilation time, but I haven't measured it.

Below are timings for sum() on [CPU](https://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz) (12 threads and 1 thread) and CUDA with various tensor sizes.

CPU

| Reduction (dim)      | Master  | PR      | Master (1 thread) | PR (1 thread) |
|----------------------|---------|---------|-------------------|---------------|
| 1024x1024 (all)      | 22 us   | 34 us   | 136 us            | 147 us        |
| 1024x1024 (0)        | 30 us   | 28 us   | 160 us            | 160 us        |
| 1024x1024 (1)        | 25 us   | 25 us   | 171 us            | 146 us        |
| 1024x10x1024 (all)   | 542 us  | 550 us  | 4.14 ms           | 3.11 ms       |
| 1024x10x1024 (0)     | 658 us  | 690 us  | 6.80 ms           | 5.93 ms       |
| 1024x10x1024 (1)     | 761 us  | 757 us  | 3.34 ms           | 3.52 ms       |
| 1024x10x1024 (2)     | 538 us  | 545 us  | 3.73 ms           | 3.04 ms       |
| 1024x1024x1024 (all) | 72 ms   | 71 ms   | 364 ms            | 357 ms        |
| 1024x1024x1024 (0)   | 94 ms   | 90 ms   | 935 ms            | 927 ms        |
| 1024x1024x1024 (1)   | 80 ms   | 86 ms   | 881 ms            | 688 ms        |
| 1024x1024x1024 (2)   | 71 ms   | 71 ms   | 456 ms            | 354 ms        |

CUDA

| Reduction (dim)      | M40 base | M40 PR  | P100 base | P100 PR   |
|----------------------|----------|---------|-----------|-----------|
| 1024x10x1024 (all)   | 238 us   | 182 us  | 136 us    | 97 us     |
| 1024x10x1024 (0)     | 166 us   | 179 us  | 105 us    | 84 us     |
| 1024x10x1024 (1)     | 181 us   | 182 us  | 89 us     | 91 us     |
| 1024x10x1024 (2)     | 180 us   | 168 us  | 88 us     | 79 us     |
| 1024x1024x1024 (all) | 17.5 ms  | 16.4 ms | 8.23 ms   | 7.48 ms   |
| 1024x1024x1024 (0)   | 27.2 ms  | 28.6 ms | 7.63 ms   | 7.38 ms   |
| 1024x1024x1024 (1)   | 16.5 ms  | 16.3 ms | 7.66 ms   | 7.40 ms   |
| 1024x1024x1024 (2)   | 17.8 ms  | 16.4 ms | 8.37 ms   | 7.31 ms   |

Timings were generated with this script:
https://gist.github.com/colesbury/d3238b266d8a9872fe6f68f77619b379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11908

Differential Revision: D10071760

Pulled By: colesbury

fbshipit-source-id: 40e37a0e6803f1628b94cc5a52a10dfbb601f3d6
2018-10-25 09:42:55 -07:00
e5752f2cb4 Renaming dims() to sizes() (fbcode)
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10848643

fbshipit-source-id: ac75833be8be9162e35b00dcd352f616bc7bbafe
2018-10-25 09:32:18 -07:00
1720757220 added submodules for int8 ops (#13106) 2018-10-25 09:11:11 -07:00
2a6431ba2d Use fixed MASTER_PORT in test_distributed (#13109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13109

The "right" strategy of creating a socket, binding to an undefined port, closing the socket, and reusing the port it was bound to, was subject to a race condition. Another process could bind to that same port sooner than the tests would, causing an "Address already in use" failure when rank 0 would try and bind to that same port. The THD tests have been using a fixed port since forever. Time will tell if this fixes #12876.

Differential Revision: D10850614

fbshipit-source-id: c19f12bb4916141187ee8ddb52880f5f418310dc
2018-10-25 08:51:34 -07:00
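A sketch of the racy pattern the tests moved away from (names are illustrative):

```python
import socket

def find_free_port():
    # The port is free when the probe socket closes, but another process
    # can grab it before rank 0 binds again - hence the flakiness.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port

MASTER_PORT = 29500  # a fixed, agreed-upon port sidesteps the race
print(find_free_port(), MASTER_PORT)
```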
956e620c64 Eliminate numel == -1 state, delete Storage-only constructor (#12656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12656

I originally wanted to do this in two steps, but deleting the Storage-only
constructor also changes the default numel state (which breaks tests),
so it was easiest to do it all in one go.

- I still need a way to compute the correct TensorTypeId for all of the
  Caffe2 constructors; rather than hard-code it, I wrote a function
  in at::detail::computeTensorTypeId() to do this calculation.  Maybe
  this function could be used more widely, but for now, it's used
  by Caffe2 only.
- Added a pile more TensorTypeId for all of Caffe2's supported DeviceTypes
- Because I still can't put arbitrary TypeMeta in TensorOptions, the
  TensorTypeId() calculation doesn't respect dtype.  For now, this is
  not a problem, but this might block work to split non-POD dtypes
  into their own TensorTypeId.

Reviewed By: li-roy

Differential Revision: D10380678

fbshipit-source-id: 10c5d12020596fc9f27d5579adffad00513af363
2018-10-25 08:44:05 -07:00
c368f26f88 Disable CircleCI merging to master. (#13074)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13074

Differential Revision: D10852728

Pulled By: ezyang

fbshipit-source-id: 6b96c941f4655ba240adaa0678844efa2af81d06
2018-10-25 08:07:45 -07:00
e8613d99b5 Delete ATen/CUDAGuard.h (#13078)
Summary:
It's empty.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13078

Differential Revision: D10843892

Pulled By: ezyang

fbshipit-source-id: 39e6f73b3a8be3e7573c1af727b65da246d4515b
2018-10-25 07:52:38 -07:00
6995b84d45 Make SparseToDense handle empty outputs properly. (#13043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13043

memset on nullptr is undefined behavior, and as a result filament_test is failing in the dev build. This diff makes the operator handle empty output properly, so we can bring that test back.

I'm not sure whether it is even valid to call this op with an input that would require an empty memset (empty batch?). Will leave this to ninghz and sunnieshang to decide.

Reviewed By: xianjiec

Differential Revision: D10525605

fbshipit-source-id: a911cdbd62fc3d948328981fd01cd205ec2ad99f
2018-10-25 00:27:52 -07:00
f1e4304d19 Add operator_def property to annotation (#13094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13094

Expose operator_def property

Reviewed By: duc0

Differential Revision: D10847125

fbshipit-source-id: 67a066555b690715e1f5f04125fd446ab197f45a
2018-10-24 23:42:35 -07:00
b883afc928 Absorb c10d into the main cmake build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12953

Differential Revision: D10850274

Pulled By: anderspapitto

fbshipit-source-id: 42296e6e49ad8c1845040e031eab95ddbaf58ae4
2018-10-24 22:34:00 -07:00
c250f6f3d5 DDP perf improvement: move sync_reduction to C++, dedicated CUDA streams for memcpy (#12954)
Summary:
- Moved sync_reduction to C++
- Use a dedicated CUDA stream for memcpy
- Also use a dedicated CUDA stream for memcpy in queue_reduction

Added a test as well.

CI should cover both DDP and the unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12954

Differential Revision: D10520069

Pulled By: teng-li

fbshipit-source-id: 64348e4e43c15f9695a4c28b036c232587ecfb65
2018-10-24 21:37:13 -07:00
69906afaee absorb THD into main cmake build (#12775)
Summary:
We want to move _C into the same cmake invocation that builds
libcaffe2 and libtorch. However, _C depends on THD and c10d, which in
turn depend on libcaffe2. That means we can't move _C into that
cmake file unless THD and c10d are absorbed first. This change does so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12775

Differential Revision: D10457374

Pulled By: anderspapitto

fbshipit-source-id: 2c1aa3b8a418a73d2112e93c7da53a2e70cf7bba
2018-10-24 21:28:37 -07:00
2d9b1fcd09 Make c10d support MPICH and further (#13083)
Summary:
Fixed issue:
https://github.com/pytorch/pytorch/issues/12921

Builds and works with MPICH; all tests passed.

We should add MPICH to CI at some point later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13083

Reviewed By: soumith

Differential Revision: D10844833

Pulled By: teng-li

fbshipit-source-id: e8cdc866ee1ee7a33e469017ea562a08da119d53
2018-10-24 20:11:56 -07:00
b4d0dc77be Eliminate CUDAStream nullptr in NCCL (#13089)
Summary:
As the title says, we should always use the current stream on the device in NCCL.

This can unblock ezyang on his further work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13089

Reviewed By: ezyang

Differential Revision: D10847172

Pulled By: teng-li

fbshipit-source-id: 7fc7c4248b5efa1971d2af4d43f62d3379debfe4
2018-10-24 20:04:41 -07:00
fc1c8f8b5b Enable test_nn embedding tests and use correct warp size in Embedding.cu (#13046)
Summary:
* Enable test_nn embedding tests and use correct warp size in Embedding.cu
* Fix embedding_backward_feature_kernel kernel for HIP

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13046

Differential Revision: D10560721

Pulled By: bddppq

fbshipit-source-id: e6c3cbeb980a34ff52a92dba8bde745a2e03f2fd
2018-10-24 19:43:37 -07:00
444cc0ee0a Back out "[pytorch][PR] added gemmlowp module" (#13090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13090

Original commit changeset: 7f8a649c739c

Reviewed By: Maratyszcza

Differential Revision: D10846367

fbshipit-source-id: a5a5aad29b51287dc1cb80c707eb5a0008ec78f5
2018-10-24 19:41:15 -07:00
478886be30 Fix print precision and match numpy behavior (#12746)
Summary:
Fixes #12578 #9395.

* Fix and simplify print logic

* Follow numpy print rule eb2bd11870/numpy/core/arrayprint.py (L859)
> scientific notation is used when absolute value of the smallest number is < 1e-4 or maximum > 1e8 or the ratio of the maximum absolute value to the minimum is > 1e3

I hope I didn't break anything, since there seem to be a lot of edge cases here... Here are some easy sanity checks.
```
In [5]: torch.tensor(1)
Out[5]: tensor(1)
Out[2]: array(1) # numpy

In [6]: torch.tensor(10)
Out[6]: tensor(10)
Out[3]: array(10) # numpy

In [8]: torch.tensor(99000000)
Out[8]: tensor(99000000)
Out[5]: array(99000000) # numpy

In [9]: torch.tensor(100000000)
Out[9]: tensor(100000000)
Out[6]: array(100000000) # numpy

In [10]: torch.tensor(100000001)
Out[10]: tensor(100000001)
Out[7]: array(100000001) # numpy

In [11]: torch.tensor(1000000000)
Out[11]: tensor(1000000000)
Out[8]: array(1000000000) # numpy

In [12]: torch.tensor([1, 1000])
Out[12]: tensor([   1, 1000])
Out[9]: array([   1, 1000]) # numpy

In [13]: torch.tensor([1, 1010])
Out[13]: tensor([   1, 1010])
Out[10]: array([   1, 1010]) # numpy
```
For floating points, we use scientific when `max/min > 1000 || max > 1e8 || min < 1e-4`
Lines with "old" are old behaviors that either has precision issue, or not aligned with numpy
```
In [14]: torch.tensor(0.01)
Out[14]: tensor(0.0100)
Out[11]: array(0.01) # numpy

In [15]: torch.tensor(0.1)
Out[15]: tensor(0.1000)
Out[12]: array(0.1) # numpy

In [16]: torch.tensor(0.0001)
Out[16]: tensor(0.0001)
Out[14]: array(0.0001) # numpy

In [17]: torch.tensor(0.00002)
Out[17]: tensor(2.0000e-05)
Out[15]: array(2e-05) # numpy
Out[5]: tensor(0.0000) # old

In [18]: torch.tensor(1e8)
Out[18]: tensor(100000000.)
Out[16]: array(100000000.0) # numpy

In [19]: torch.tensor(1.1e8)
Out[19]: tensor(1.1000e+08)
Out[17]: array(1.1e8) # numpy 1.14.5, In <= 1.13 this was not using scientific print
Out[10]: tensor(110000000.) # old

In [20]: torch.tensor([0.01, 10.])
Out[20]: tensor([ 0.0100, 10.0000])
Out[18]: array([  0.01,  10.  ]) # numpy

In [21]: torch.tensor([0.01, 11.])
Out[21]: tensor([1.0000e-02, 1.1000e+01])
Out[19]: array([  1.00000000e-02,   1.10000000e+01]) # numpy
Out[7]: tensor([ 0.0100, 11.0000]) # old
```
When printing floating-point numbers in integer mode, we still need to respect the rules above and use scientific mode first
```
In [22]: torch.tensor([1., 1000.])
Out[22]: tensor([   1., 1000.])
Out[20]: array([    1.,  1000.]) # numpy

In [23]: torch.tensor([1., 1010.])
Out[23]: tensor([1.0000e+00, 1.0100e+03])
Out[21]: array([  1.00000000e+00,   1.01000000e+03]) # numpy
Out[9]: tensor([   1., 1010.]) # old
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12746

Differential Revision: D10443800

Pulled By: ailzhang

fbshipit-source-id: f5e4e3fe9bf0b44af2c64c93a9ed42b73fa613f5
2018-10-24 18:12:51 -07:00
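A standalone re-statement of the scientific-notation rule adopted above (a sketch, not the actual implementation inside the tensor printer):
```python
def use_scientific(values):
    # Scientific mode when the smallest absolute nonzero value is < 1e-4,
    # the largest is > 1e8, or their ratio exceeds 1000.
    nonzero = [abs(v) for v in values if v != 0]
    if not nonzero:
        return False
    lo, hi = min(nonzero), max(nonzero)
    return lo < 1e-4 or hi > 1e8 or hi / lo > 1000

assert not use_scientific([0.01, 10.0])  # ratio is 1000 -> fixed point
assert use_scientific([0.01, 11.0])      # ratio is 1100 -> scientific
```
The two asserts reproduce the `[0.01, 10.]` vs. `[0.01, 11.]` pair from the sanity checks above.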
3761adc889 C++ API Cleanup Extension (#13087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13087

API changes that simplify subgraph replacement drastically

Reviewed By: duc0

Differential Revision: D10444011

fbshipit-source-id: 22c699bb5bc0f21538c70fe9401899d4f7e1b055
2018-10-24 18:06:50 -07:00
3fa9ccf1ba Add new NeuralNetOps for fusion (#13068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13068

Basic ops.def update and converter.cc updates
This is the standard way to ingest networks into nomnigraph

redo of D10412639

Reviewed By: ZolotukhinM

Differential Revision: D10560324

fbshipit-source-id: c8ccb0aabde6ee8f823657ee5cd3ed9ed6c45549
2018-10-24 18:06:49 -07:00
e0a8665d03 Converter fix to allow unimplemented convertToOperatorDef (#13069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13069

simply a new fallback

Reviewed By: ZolotukhinM

Differential Revision: D10591414

fbshipit-source-id: 1ad8f16135a6c68b2df889101f06b736a3e4f7da
2018-10-24 18:06:48 -07:00
ef019a2d18 Improve the C++ API (#13067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13067

Cleaning up the interface for nomnigraph in C++ world

redo of D10438090

Reviewed By: ZolotukhinM

Differential Revision: D10560323

fbshipit-source-id: e4e084284615e813836a7d031b5a71e8d80b0e62
2018-10-24 18:06:46 -07:00
3b919a6f82 Renaming dims() to sizes() (caffe2/caffe2) - 1/4
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842786

fbshipit-source-id: 551421a2cb4d2f2fc7f43775d4554643de0f0694
2018-10-24 17:36:08 -07:00
9573ecefe3 Back out "[pytorch][PR] Add sse2neon tp" (#13091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13091

Original commit changeset: 8b4f9f361cc1

Reviewed By: Maratyszcza

Differential Revision: D10846301

fbshipit-source-id: 2798f1fca5c1a2362979977ef5eb724dd37c4e6d
2018-10-24 17:17:34 -07:00
e290a9d2fd Back out "Migrate DeviceOption.numa_node_id to DeviceOption.device_id"
Summary: Original commit changeset: 82583d0ad4b8

Reviewed By: enosair, ilia-cher

Differential Revision: D10560741

fbshipit-source-id: e289a37d441bd2243b369810abf451292891d9ee
2018-10-24 17:11:25 -07:00
ccfaf46431 Make CUDNN an alias of MIOPEN for HIP ops (#12278)
Summary:
This is mostly for reusing all the cudnn test cases in our python operator_tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12278

Differential Revision: D10842592

Pulled By: bddppq

fbshipit-source-id: 4b3ed91fca64ff02060837b3270393bc2f9a9898
2018-10-24 17:07:31 -07:00
e1243cef88 fixed docs for Student-T distribution (#13044)
Summary:
added loc and scale args.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13044

Differential Revision: D10560762

Pulled By: ezyang

fbshipit-source-id: 6c98ecc04975df8993364b06c480d015a25e2061
2018-10-24 16:59:23 -07:00
86881cdb39 MNIST images should have an extra dim (#13060)
Summary:
Our convolution ops and such expect three-dimensional images, but the images in the MNIST dataset of the C++ frontend currently only have two dimensions.

apaszke ebetica soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13060

Differential Revision: D10560754

Pulled By: goldsborough

fbshipit-source-id: a2cc877b4f43434482bec902c941fafb7a157d5d
2018-10-24 16:53:37 -07:00
6727133f3d Support warnings.warn (#12964)
Summary:
`warnings.warn` is used commonly throughout `nn.functional`, so this adds
support for it by forwarding its arguments to `print`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12964

Differential Revision: D10559427

Pulled By: driazati

fbshipit-source-id: 5b591f6f446c906418f9fc7730c17e301f263d9b
2018-10-24 16:48:02 -07:00
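A minimal sketch of what this enables (the function and message are illustrative; per the commit, the warning's arguments get forwarded to `print` in script mode):
```python
import warnings
import torch

@torch.jit.script
def add_one(x):
    # Previously this line failed to compile with "unknown builtin op";
    # now the message is forwarded to print when the function runs.
    warnings.warn("add_one is deprecated")
    return x + 1

print(add_one(torch.ones(2)))
```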
b790fcaf39 Renaming dims() to sizes() (caffe2/caffe2) - 4/4
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842900

fbshipit-source-id: 8d58ed4d403fb0308a8fa286659f8e830b040bec
2018-10-24 16:32:51 -07:00
a4475d529d Use GetFetchStackTrace for the AT_* error macros too. (#13007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13007

There's no reason not to use the hook if it's set; this helps fbcode traces.

This slightly pessimizes the stack trace for ATen functions,
because we are no longer skipping all of the frames we should.
This is probably OK.

Reviewed By: Yangqing

Differential Revision: D10518499

fbshipit-source-id: be54e490df3c3fde7ff894b5b1473442ffc7ded3
2018-10-24 16:18:25 -07:00
917b203b01 Assert spawned processes terminating in distributed tests (#13071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13071

In the case where a process got stuck and timed out on joining, we would see a None != 1 assertion error in the code path where the exit statuses are compared. This implies that the first process exited with exit code 1 and another one didn't exit at all. With this commit the error message is more descriptive.

Differential Revision: D10785266

fbshipit-source-id: c8cc02d07ea4fdc6f5374afd9a0aac72218fe61d
2018-10-24 16:03:36 -07:00
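A hypothetical sketch of the kind of descriptive check this commit adds (names are illustrative, not the actual test code):
```python
def assert_processes_exited(processes):
    # Instead of a bare `None != 1` from comparing raw exit codes,
    # report which process hung and which exit code each produced.
    for i, p in enumerate(processes):
        assert p.exitcode is not None, "process %d timed out on join" % i
    first = processes[0].exitcode
    for i, p in enumerate(processes):
        assert p.exitcode == first, (
            "process %d exited with code %d, expected %d"
            % (i, p.exitcode, first))
```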
2ac7b6b683 Tensor dims() -> sizes() (caffe2/operators) - 5/5 (#13032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13032

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476235

fbshipit-source-id: 263ad75689d864b414dae63cb9a30cb3285dae31
2018-10-24 15:07:43 -07:00
cccd457a1e Tensor dims() -> sizes() (caffe2/operators) - 4/5 (#13031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13031

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476232

fbshipit-source-id: cb4ad76be068065eb2c5e7d87f33d04423cf93c4
2018-10-24 15:07:42 -07:00
ab253c2bf1 Tensor dims() -> sizes() (caffe2/operators) - 3/5 (#13030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13030

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476226

fbshipit-source-id: 757583e3bde8d5246565433883bd328ab34f3e09
2018-10-24 15:02:40 -07:00
b55dc8d971 Add sse2neon tp (#12948)
Summary:
Adding sse2neon in thrid-party as dependencies
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12948

Differential Revision: D10801574

Pulled By: harouwu

fbshipit-source-id: 8b4f9f361cc1722f631830f7675b9d209a9f22ef
2018-10-24 14:56:24 -07:00
be43a0faa9 Tensor dims() -> sizes() (caffe2/operators) - 2/5 (#13029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13029

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476225

fbshipit-source-id: 5e63ca80b3843967ea1661ada447bbc18661378d
2018-10-24 14:34:45 -07:00
07c0f4a097 Tensor dims() -> sizes() (caffe2/operators) - 1/5 (#13028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13028

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476220

fbshipit-source-id: 3c3b3d5e2082cd6a1f0ff4a3c8641b30e6f16896
2018-10-24 14:18:18 -07:00
4b5d13abab Use cmake3 if it exists and cmake isn't sufficient (#12972)
Summary:
A tweak to https://github.com/pytorch/pytorch/pull/12916 that only uses cmake3 when cmake isn't good enough. Hopefully fixes the issue zdevito saw.

cc zdevito SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12972

Differential Revision: D10560674

Pulled By: orionr

fbshipit-source-id: 90c71929630bb8167a3ee2cc6f306eefe5b85445
2018-10-24 14:14:39 -07:00
10046c2b2b nomnigraph - (easy) Expose operators (#13063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13063

Expose the following operators
GatherRanges
Slice
MergeIdLists

Reviewed By: itomatik

Differential Revision: D10560138

fbshipit-source-id: 90f74d7d4c2bfca40788a5fcec4c73d71b156d3b
2018-10-24 14:09:27 -07:00
c64a65c977 added gemmlowp module (#12947)
Summary:
Adding gemmlowp dependency in thrid-party folder
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12947

Differential Revision: D10794559

Pulled By: harouwu

fbshipit-source-id: 7f8a649c739ccb6c307327080711379b1db8c3e0
2018-10-24 13:53:58 -07:00
0f5cee2f6b Convert some docstrings from char* to char[] (#13062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13062

Gold (the linker) isn't able to gc unreferenced string constants, but
converting these to arrays puts them in their own data sections and reduces
(Android) binary size as a result.

I'm told even in server builds, this reduces binary size by a few dozen bytes
and speeds up startup by a few hundred ns. :-P

Reviewed By: Yangqing

Differential Revision: D10510808

fbshipit-source-id: 247ba9574e7a9b6a8204d33052994b08c401c197
2018-10-24 13:48:18 -07:00
97b6a25329 Use REGISTER_CPU_GRADIENT_OPERATOR for many operators (#12616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12616

Focusing on operators in common use on mobile.

Also use GRADIENT_OPERATOR_SCHEMA.

Reviewed By: Yangqing

Differential Revision: D10245216

fbshipit-source-id: 5cc023da170149b637fe3c729d3756af948aa265
2018-10-24 13:48:17 -07:00
df47bbe9c1 Fix test_glu_old HealthCheck with smarter generation strategy. (#12975)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12975

Differential Revision: D10513493

Pulled By: ezyang

fbshipit-source-id: ac183aeb4ae7f0a5f91f1a369b595ae92c3e844d
2018-10-24 13:45:19 -07:00
2dacf28b66 link libgloo_cuda.a explictly from setup.py (#12951)
Summary:
rather than pass a list through a text file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12951

Differential Revision: D10528309

Pulled By: anderspapitto

fbshipit-source-id: d94befcd61b6304815859694b623046f256462df
2018-10-24 13:19:46 -07:00
dd7c2d4284 Change the function signature for caffe2::empty (#13015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13015

att

Reviewed By: ezyang

Differential Revision: D10469310

fbshipit-source-id: f4621fe5d17bb4663192860f81effe6bdfe21bea
2018-10-24 13:14:24 -07:00
1bea5fc3ad Fix UpsampleNearest op CPU impl batch handling (#13002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13002

Batch dim wasn't handled in the CPU impl (will fail for inputs with N > 1).
Fixing that here.

Differential Revision: D10515159

fbshipit-source-id: ee7e4f489d2d4de793f550b31db7c0e2ba3651e8
2018-10-24 13:10:53 -07:00
353fdefdd6 dims() -> sizes() (caffe2/core) (#13014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13014

Tensor method renaming using clangr

Reviewed By: ezyang

Differential Revision: D10467556

fbshipit-source-id: 7d7eaf5fc59bbb493c057d5b8bfdda03b140c97e
2018-10-24 12:49:28 -07:00
0a190c8869 Move the location of annotation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12969

Differential Revision: D10560824

Pulled By: ezyang

fbshipit-source-id: 86c21149682db5ebfd9610df9e9845688a3db3b0
2018-10-24 12:35:08 -07:00
fcf801f061 Support building binary on windows machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13059

Reviewed By: llyfacebook

Differential Revision: D10560147

Pulled By: sf-wind

fbshipit-source-id: c8f38b30c9acdf6ae494e56a5876fd4493696e5d
2018-10-24 12:24:42 -07:00
8355219e68 CircleCI: turn off OSX jobs temporarily
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13064

Differential Revision: D10561008

Pulled By: yf225

fbshipit-source-id: c48364662efa82865a1bc1a7e2db3a9fb8af10d5
2018-10-24 12:22:05 -07:00
85273acca8 fix pinning of hypothesis (#13055)
Summary:
tested manually that this works

fixes https://github.com/pytorch/pytorch/issues/12395
obviates https://github.com/pytorch/pytorch/pull/12774
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13055

Differential Revision: D10559788

Pulled By: anderspapitto

fbshipit-source-id: 5cd8bac6eff548280c8742f36a5e7f2748a24623
2018-10-24 11:46:28 -07:00
448a32e0ee Adding timestamps to the beginning of every test file in run_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12994

Reviewed By: anderspapitto

Differential Revision: D10515291

Pulled By: pjh5

fbshipit-source-id: 191054cdacff308b63e9063d22d62314398e4f88
2018-10-24 11:42:31 -07:00
6c8d47f2af Add methods to FunctionSchema (#12967)
Summary:
We are beginning to use this class in a wider reaching set of use-cases. This PR refactors it so that we always access schema properties through methods. This will make adding extra information like alias information easier (i.e. we can a version of `type()` that returns the type with alias information and another version that returns a type without that information).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12967

Differential Revision: D10502674

Pulled By: zdevito

fbshipit-source-id: a88783ed8f20ab3be6460c12da95f9f940891c44
2018-10-24 10:32:27 -07:00
52beb338ab Add Modules_CUDA_Fix folder to installed folder (#13013)
Summary:
This is used to patch our cmake cuda scripts - should be in the installation script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13013

Reviewed By: ir413

Differential Revision: D10519104

Pulled By: Yangqing

fbshipit-source-id: 542049224ea41068f32d4c0f6399c7e8b684f764
2018-10-24 10:16:18 -07:00
46162ccdb9 Autograd indices/values and sparse_coo ctor (#13001)
Summary:
Reopen of #11253 after fixing a bug in index_select
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13001

Differential Revision: D10514987

Pulled By: SsnL

fbshipit-source-id: 399a83a1d3246877a3523baf99aaf1ce8066f33f
2018-10-24 10:00:22 -07:00
e0f21a4977 restore caffe2 strides (#12883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12883

Attempting to do this again. The last try broke OSS CI: D10421896

Reallocation of strides_ when there's no change in dim seems to have caused the error that broke the internal flow last time. This fixes that. We found a potential race condition in caffe2 counter ops that might be the cause; we will investigate that.

Reviewed By: ezyang

Differential Revision: D10469960

fbshipit-source-id: 478186ff0d2f3dba1fbff6231db715322418d79c
2018-10-24 09:45:46 -07:00
88f70fcef9 remove progress from git operations in CI builds (#13017)
Summary:
these are pretty spammy - unless we have a reason to keep them, let's not
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13017

Differential Revision: D10528295

Pulled By: anderspapitto

fbshipit-source-id: 5514371a6e61e13ec070cc5517488523d42f2935
2018-10-24 09:26:05 -07:00
7863c17b26 Fix convtranspose3d output_size calculation (#12952)
Summary:
Closes #2119.

There was a small bug where the output_size got sliced with `[-2:]`
where we really meant to slice it as `[2:]` (to remove the batch and
channel dimensions).

Added a new test for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12952

Differential Revision: D10510678

Pulled By: zou3519

fbshipit-source-id: 4c04a5007fc6d002e1806d6fe981b43d33d6a4f2
2018-10-24 09:23:05 -07:00
046672eed5 Set proper scope on nodes added by JIT (#12400)
Summary:
In order to support tensorboardX and other visualization tools, we need to make sure a non-empty scope is set on all nodes added by the JIT. This attempts to do that, but it is still a WIP.

This is a new version of https://github.com/pytorch/pytorch/pull/10749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12400

Reviewed By: ezyang

Differential Revision: D10224380

Pulled By: orionr

fbshipit-source-id: d1bccd0eee9ef7c4354112c6a39a5987bfac2994
2018-10-24 09:05:46 -07:00
cf235e0894 fix lint after new flake8 release added new style constraints (#13047)
Summary:
fix lint after new flake8 release added new style constraints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047

Differential Revision: D10527804

Pulled By: soumith

fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8
2018-10-24 09:03:38 -07:00
d72de9fb1e Replace direct use of int32_t with an alias DeviceIndex (#13019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13019

It just makes the semantic meaning of the int32_t a little
bit clearer.

Reviewed By: zou3519

Differential Revision: D10520295

fbshipit-source-id: 45b0bd1b6afddee17072b628d8e9b87d7c86e501
2018-10-24 08:27:45 -07:00
34cca9f05b Move Device and DeviceType to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12995

Reviewed By: Yangqing

Differential Revision: D10513246

fbshipit-source-id: 0c6d52e09166d7e8a786c1a0e21685ec9c35b12a
2018-10-24 08:27:44 -07:00
ca03c10cef Rename createCUDAStream() to getStreamFromPool() (#12940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12940

Dmytro was reading this code and requested that we rename the interface
to something that made it more obvious that pooling was going on.
Seems reasonable to me! Final name is a suggestion from Pieter.

Reviewed By: dzhulgakov

Differential Revision: D10492071

fbshipit-source-id: b1c2cac760f666968d58166be649dabfe1127c5e
2018-10-24 07:23:31 -07:00
924326e171 Revert D10438090: [nomnigraph] Improve the C++ API
Differential Revision:
D10438090

Original commit changeset: 6b4309b8a4b3

fbshipit-source-id: 5f6a28cf032e0be2544f0b33508148f4f49e10c5
2018-10-24 07:04:33 -07:00
97d4c05566 Revert D10412639: [nomnigraph] Add new NeuralNetOps for fusion
Differential Revision:
D10412639

Original commit changeset: a4c523fda96b

fbshipit-source-id: 973b6dd30b63b9a08069275278b0780b65067635
2018-10-24 07:04:31 -07:00
17c6d168de Attach Shape node if Concat node has 2 outputs (#13006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13006

In Caffe2, Concat can have 2 outputs. The second being the output shape of the 1st output. In ONNX, Concat only has 1 output. So when we do the exporting, we need to add a `Shape` to the first output and generate the second output from it.

Differential Revision: D10517698

fbshipit-source-id: 38e974423e2506b16d37b49d51c27ad87b73e63a
2018-10-23 22:56:48 -07:00
53ac4de79d Expose basic transformation API to Python (#13033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13033

Basic graph manipulation exposed to python

Reviewed By: ZolotukhinM

Differential Revision: D10519720

fbshipit-source-id: 0f9a494d122289a3a9e23d4cff99ac0a21382ec6
2018-10-23 20:54:54 -07:00
4e0b6c8500 Speed up resolution callback creation (#12859)
Summary:
`inspect.stack()` calls are slow since they access a bunch of extra info about the frame. This PR instead uses `inspect.currentframe()` and goes up the stack until it reaches the correct frame. [Context](stackoverflow.com/questions/17407119/python-inspect-stack-is-slow)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12859

Differential Revision: D10509912

Pulled By: driazati

fbshipit-source-id: b85325adf1b3c85a1a3a82e96e567b8be498531b
2018-10-23 20:40:04 -07:00
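A minimal sketch of the technique (not the actual helper in the JIT): `inspect.currentframe()` plus `f_back` avoids the per-frame source-context work that `inspect.stack()` does eagerly.
```python
import inspect

def frame_at_depth(depth):
    # Walk up the call stack one frame at a time; each step is a cheap
    # attribute access, unlike inspect.stack(), which eagerly gathers
    # source context for every frame.
    frame = inspect.currentframe().f_back  # our caller
    for _ in range(depth):
        frame = frame.f_back
    return frame
```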
08d99c4486 Add new NeuralNetOps for fusion
Summary:
Basic ops.def update and converter.cc updates

This is the standard way to ingest networks into nomnigraph

Reviewed By: duc0

Differential Revision: D10412639

fbshipit-source-id: a4c523fda96bbe0e31de0d9fcf795ae9c7377c90
2018-10-23 19:27:10 -07:00
9c1195fe61 Improve the C++ API
Summary: Cleaning up the interface for nomnigraph in C++ world

Reviewed By: duc0

Differential Revision: D10438090

fbshipit-source-id: 6b4309b8a4b3730f3309edf0047d4006a001895b
2018-10-23 19:27:09 -07:00
f9b7ce9c99 Add tuple indexing support for constant integers (#11492)
Summary:
Add support indexing tuples with constant integers by creating a new prim::TupleIndex operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11492

Differential Revision: D9811996

Pulled By: eellison

fbshipit-source-id: a458c2522b3c81476252d920e27a8d6c7b9a036b
2018-10-23 17:52:03 -07:00
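A minimal sketch of the new capability (assuming a build with this change):
```python
import torch

@torch.jit.script
def swap_first_two(x):
    pair = (x + 1, x + 2)
    # Indexing with a constant integer compiles to prim::TupleIndex.
    return pair[1], pair[0]
```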
ff508c91a1 Remove numba dependency
Summary:
TSIA - we want to deprecate numba in fbcode when moving to new compiler tiers.

Converted the old test to a non-numba regular python op test.

Reviewed By: xw285cornell

Differential Revision: D10519910

fbshipit-source-id: 0e9188a6d0fc159100f0db704b106fbfde3c5833
2018-10-23 17:03:47 -07:00
a6949abb15 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (fixed reverted bug) (#12848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12848

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Original commit changeset: c0760e73ecc7

Reviewed By: dzhulgakov

Differential Revision: D10453456

fbshipit-source-id: d2f2b7b4578e721924354149f08f627c7e3bf070
2018-10-23 16:21:26 -07:00
dd00c2997f fix expect tests (#13005)
Summary:
the topological index shuffled arguments around, updating expect files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13005

Differential Revision: D10517246

Pulled By: michaelsuo

fbshipit-source-id: 8f95e4e4ca8ff51da0507f9b0eb838c23ddaa821
2018-10-23 15:53:16 -07:00
821b04e819 Nomnigraph: Remove Copy constructor and copy assign operator from BasicBlock, add move constructor.
Summary:
We cannot use copying, as it loses recorded callbacks, and thus after copying
tracked values are no longer tracked.

Reviewed By: bwasti, duc0

Differential Revision: D10510057

fbshipit-source-id: b64fdef3fb28fc26fe55eba41f4b5007ba6894de
2018-10-23 15:41:48 -07:00
83f788d088 Fix MSVC build for Python 3.6 (#12878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12878

Python 3.6 headers define their own ssize_t, which clashes with our definition.
Luckily, they also define a `HAVE_SSIZE_T` macro we can use to check for this case.

Reviewed By: ezyang

Differential Revision: D10467239

fbshipit-source-id: 661675ad1e30a6ca26d6790eaa75657ef6bf37c2
2018-10-23 15:30:01 -07:00
b8a11cffdb Minor improvements cherry-pick (#12973)
Summary:
* Enable disabled functions for ROCm (ROCm 252)
* fixes for topk fp16 (ROCm 270)
* HIP needs the kernel invocation to be explicitly templated to be able to take a non-const arg as a const kernel arg (ROCm 281)

For attention: bddppq ezyang

Full set of PyTorch/Caffe2 tests on ROCm here: https://github.com/ROCmSoftwarePlatform/pytorch/pull/283
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12973

Differential Revision: D10516072

Pulled By: bddppq

fbshipit-source-id: 833b3de1544dfa4886a34e2b5ea53d77b6f0ba9e
2018-10-23 15:03:47 -07:00
223a96a9a0 Add missing NCHW2NHWC symbols for HIP (#13000)
Summary:
petrex ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13000

Differential Revision: D10516020

Pulled By: bddppq

fbshipit-source-id: 017bd393da3d97fbae3f0227ad01977c5c0744c6
2018-10-23 14:20:33 -07:00
470e766062 Fix illegal code in rocblas_handle rocblas_handle() that causes failure w/ gcc as base compiler (#12957)
Summary:
The legal function cublasHandle_t cublas_handle() was hipified to the
clearly illegal rocblas_handle rocblas_handle(). It should not work, and it
correctly fails with gcc as the host compiler, as it induces an
ambiguity.

The function now hipifies to rocblas_handle rocblashandle().

Fixes a long-standing issue we've observed in PyTorch when the base compiler is gcc.

For attention: bddppq ezyang

Tests on ROCm PyTorch/Caffe2: https://github.com/ROCmSoftwarePlatform/pytorch/pull/284
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12957

Differential Revision: D10501227

Pulled By: bddppq

fbshipit-source-id: 568cb80801c0d14c9b1b61e3a7db387a5c21acf4
2018-10-23 13:46:15 -07:00
21285e73da Add Google pixel code
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12998

Differential Revision: D10515096

Pulled By: JoelMarcey

fbshipit-source-id: 7f97014451448a70ea7f91d7d8bd96fbf6e83f7f
2018-10-23 13:26:37 -07:00
8e4bea107a Fix clang-tidy 404 in Travis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12963

Differential Revision: D10510026

Pulled By: goldsborough

fbshipit-source-id: b6b9634a7a2575ff4e2983321d2e4e5829626347
2018-10-23 09:34:43 -07:00
9ea19cb079 Windows CI integration for custom ops (#12928)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/11527

ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12928

Differential Revision: D10501342

Pulled By: goldsborough

fbshipit-source-id: 7ce74795aab2f13efeb38f56ce82f53055f5eade
2018-10-23 09:18:09 -07:00
af78d4cd49 Add weak script modules (#12682)
Summary:
Adds support for weak script modules that get compiled to `ScriptModule`s once added as submodules of a `ScriptModule`:

```python
@weak_module
class Test(torch.nn.Module):
    ...
    @weak_script_method
    def forward(self, x):
        ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12682

Differential Revision: D10458626

Pulled By: driazati

fbshipit-source-id: 10ae23cb83cdafc4646cee58f399e14b2e60acd4
2018-10-23 09:06:02 -07:00
3fb3a07f54 Added a default constructor for torch.finfo.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12847

Differential Revision: D10457487

Pulled By: benoitsteiner

fbshipit-source-id: 7d164a71ba52631e5906098f643eecb0630879d1
2018-10-23 09:03:24 -07:00
1b07eb7148 torch.utils.cpp_extension.verify_ninja_availability() does not return True as documented
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12922

Differential Revision: D10502167

Pulled By: ezyang

fbshipit-source-id: 2e32be22a310e6e014eba0985e93282ef5764605
2018-10-23 07:38:08 -07:00
428300d318 Revert D10494123: [c10] Remove at::Optional
Differential Revision:
D10494123

Original commit changeset: 761bdf7359d6

fbshipit-source-id: 552fb4ab0dc253b95ce87ec6a1c65aba4b07e84a
2018-10-23 07:18:54 -07:00
d401dc4374 Remove at::Optional (#12958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12958

TSIA - this is an ongoing diff to fully move to c10 namespace.

Reviewed By: dzhulgakov

Differential Revision: D10494123

fbshipit-source-id: 761bdf7359d62ef4503ecb1b8d0ae1c0762e073c
2018-10-23 00:03:20 -07:00
27af265a5e Index to track topological order within a block (#12748)
Summary:
Simple index to track topological order. Replaced `topological_index` in the graph fuser with this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12748

Differential Revision: D10502983

Pulled By: michaelsuo

fbshipit-source-id: 5855e5add3c9742fe07e86d854260baa34beab3b
2018-10-22 23:55:20 -07:00
dd823ccd28 small improvements to torch.nn.normalization docs (#12936)
Summary:
Based on a [discussion at the forums](https://discuss.pytorch.org/t/question-about-functional-normalize-and-torch-norm/27755), it might be worthwhile to clarify the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12936

Differential Revision: D10502139

Pulled By: ezyang

fbshipit-source-id: 480c3c367f8c685dcde107b3018cb4129032322d
2018-10-22 23:14:47 -07:00
8d7607e346 Add attribute exhaustive_search in _blacklist_caffe2_args (#12805)
Summary:
- The exhaustive_search attribute will be blacklisted, so it will be
     discarded from the converted onnx model. At present
     it throws an error while verifying the onnx model

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12805

Differential Revision: D10502374

Pulled By: ezyang

fbshipit-source-id: 0926dfa3237a8a431184e7f7250146e5b0cbfb85
2018-10-22 22:48:31 -07:00
bc1d96ca98 Add support for inline expect tests. (#12825)
Summary:
expecttest and test_expecttest are the implementation and tests
for this functionality.  I wired it up to the --accept flag,
but there's also a new environment variable EXPECTTEST_ACCEPT
which may be more convenient to trigger.  Haven't tested if this
works in fbcode.

There may be a few expect tests which will benefit from inline
treatment, but I just did one to show it works.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12825

Reviewed By: teng-li

Differential Revision: D10448630

Pulled By: ezyang

fbshipit-source-id: 3d339f82e2d00891309620a60e13039fa1ed8b46
2018-10-22 19:29:04 -07:00
952df2ba8f Install torchvision before all tests, tickles #7851 (#8311)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8311

Differential Revision: D10239923

Pulled By: ezyang

fbshipit-source-id: 3f8cdc6229bfbe701c7583cede65435aa952ed85
2018-10-22 18:16:47 -07:00
3894ed22a8 Remove nullopt from native_parse.py (#12961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12961

According to zdevito - this is not used at all, so we are removing it for safety.

It is also possible that this native_parser.py will completely go away in the
near future.

Reviewed By: zdevito

Differential Revision: D10501616

fbshipit-source-id: 3218708e6150d3c94d730fbd25ae1f7abb5718b5
2018-10-22 18:13:37 -07:00
da2da55170 Make sure to update success_ at the end of the run (#12806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12806

Make sure to update success_ status at the end of the run when going through
task statuses

Reviewed By: aazzolini

Differential Revision: D10443704

fbshipit-source-id: 79f8f7fe1eccb78f6e2859f3b1e66dc44347bcc8
2018-10-22 16:58:20 -07:00
8c514627a4 Add C10_LIKELY/C10_UNLIKELY macros (#12932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12932

I was looking at some assembly for some code I was working on,
and felt a desire to have likely()/unlikely() macros.  I checked
if we already had them, and we didn't.  This commit adds them,
and fixes up all known use sites to make use of them.

Reviewed By: Maratyszcza

Differential Revision: D10488399

fbshipit-source-id: 7476da208907480d49f02b37c7345c17d85c3db7
2018-10-22 16:26:19 -07:00
8d3e7e2fcb Move DDP queue_reduction to C++ (#12852)
Summary:
Fully working version, continuing on goldsborough's initial version.

Waiting on the stream guard to be merged before adding more stream perf logic into the C++ version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12852

Differential Revision: D10468696

Pulled By: teng-li

fbshipit-source-id: 8e46d408796973817abfd9dbd6566e0ca5b7a13f
2018-10-22 16:07:46 -07:00
8682999767 Remove trailing whitespace from files in aten/ (#12942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12942

I hate trailing whitespace.

Reviewed By: Yangqing

Differential Revision: D10492507

fbshipit-source-id: 94ed80988670361e9e7e508c3b07c5e5c6e500e7
2018-10-22 16:04:21 -07:00
f575e138d8 Credits to Exhale in cppdocs (#12926)
Summary:
Some creds to svenevs

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12926

Differential Revision: D10498288

Pulled By: goldsborough

fbshipit-source-id: 878d23ebf260dac17871677635a3283eb3a8a423
2018-10-22 15:39:36 -07:00
e64f75a1d8 fix ZeroDivisionError in utils.bottleneck (#11987)
Summary:
A **ZeroDivisionError** occurs when `cuda_prof_exec_time` is small enough. This situation is normal for a project that has little CUDA work.

It also occurs when someone fails to move their work to CUDA successfully and then profiles the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11987

Differential Revision: D10488568

Pulled By: soumith

fbshipit-source-id: db8c1e9e88a00943c100958ebef41a1cb56e7e65
2018-10-22 14:00:15 -07:00
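A hypothetical guard illustrating the shape of the fix (names are illustrative; the actual patch lives in `torch/utils/bottleneck`):
```python
def cuda_speedup(cpu_time, cuda_prof_exec_time):
    # When little or no work actually ran on CUDA, cuda_prof_exec_time can
    # be (near) zero; dividing by it raised ZeroDivisionError.
    if cuda_prof_exec_time == 0:
        return float("inf")
    return cpu_time / cuda_prof_exec_time
```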
95caa37565 Remove CAFFE2_USE_MINIMAL_GOOGLE_GLOG (#12938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12938

We will be using C10_USE_MINIMAL_GLOG. Also, this will be in exported flags,
so dependent libraries won't need to define it.

Reviewed By: smessmer, BIT-silence

Differential Revision: D10468993

fbshipit-source-id: 04ae3ae17122d46b1b512d4202ab014365b87f4a
2018-10-22 13:37:38 -07:00
283d41885d Accept external input hint when doing ONNXIFI transform (#12900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12900

The workspace will sometimes be populated with input tensors for shape inference, but net.external_input() is not a reliable way to tell weights from inputs in the workspace. We saw some use cases where net.external_input() is empty. In these cases, we need to give the user an option to provide an input hint.

Reviewed By: bddppq

Differential Revision: D10476822

fbshipit-source-id: 1a3fa2df69b959d5b952a7824eba9e6c713f4f07
2018-10-22 13:32:33 -07:00
5f37c0afda Fix doxygen check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12920

Differential Revision: D10494081

Pulled By: goldsborough

fbshipit-source-id: c96b9b61cbae39006b48b23b901248e762cbd232
2018-10-22 12:28:17 -07:00
56bf4850cb Clean up of the multithreaded benchmark (#12905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12905

This diff does some clean up of the multithread benchmark code:
1. Split implementation to `.cc` file to separate implementation and improve build
2. Make `MutatingNetSupplier` more generic by providing the mutating function as an argument instead of a virtual method.
3. Fix AI benchmark by sticking to the original option names

Reviewed By: highker

Differential Revision: D10479238

fbshipit-source-id: afa201fc287e3fdbb232db24513ecf8024501f66
2018-10-22 12:09:16 -07:00
1b530fdae0 remove the find-package codepath for gloo in caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12893

Differential Revision: D10493310

Pulled By: anderspapitto

fbshipit-source-id: ba5bd375c118b0f0ab7fb7b9fda010fe17a6ac8d
2018-10-22 11:54:53 -07:00
6cc15c1a22 Simplify typeid SFINAE (#12706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12706

If both branches are valid C++ code independent of the type passed in, then we can just use if/else inside a constexpr function
to decide between the cases. Only if one branch would be invalid code (say, because type T doesn't have a default constructor) would we
need "constexpr if" or SFINAE.

Reviewed By: ezyang

Differential Revision: D10400927

fbshipit-source-id: 16d9855913af960b68ee406388d6b9021bfeb34a
2018-10-22 11:27:10 -07:00
3092a69546 Optimize NCHW2NHWC on GPU (#12910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12910

Optimize NCHW2NHWC on GPU

Reviewed By: houseroad

Differential Revision: D10481163

fbshipit-source-id: 6ddbd0ec9c96965b96aa1b8a006232d6f2b94249
2018-10-22 11:24:29 -07:00
cfb7f0a8f2 remove onnx CODEOWNERS entries (#12941)
Summary:
we don't need these anymore; let's reduce notification spam
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12941

Reviewed By: bddppq

Differential Revision: D10492266

Pulled By: anderspapitto

fbshipit-source-id: 3251b6d0160f773d17b64afc504216323d61276a
2018-10-22 11:09:08 -07:00
8f51c513a6 gloo: build once, share between pytorch/caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12885

Differential Revision: D10492244

Pulled By: anderspapitto

fbshipit-source-id: 79af1ceb9bb0dab4585a728e64554ff4f38d6c32
2018-10-22 11:06:14 -07:00
df06fba1f1 Use the newer one of cmake and cmake3. (#12916)
Summary:
On my devgpu, `cmake` is newer than `cmake3`. Using `cmake3` causes compilation to fail. Instead of blindly using `cmake3`, we pick the newer of the two.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12916

Differential Revision: D10481922

Pulled By: SsnL

fbshipit-source-id: 8340136c459e25da9f5fc4f420c7e67cadc28aff
2018-10-22 10:29:55 -07:00
5e8e199f8d Add note on traced module train/eval behavior
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12903

Differential Revision: D10489090

Pulled By: SsnL

fbshipit-source-id: 13ff5587f53706b360dd0905d0ae97fb16ae2bf0
2018-10-22 10:26:15 -07:00
a022fd2d6b Implement DataLoader (#11918)
Summary:
This PR implements a DataLoader API for the C++ frontend.

The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;

Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.

Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors

The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.

There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.

I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.

There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.

apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918

Reviewed By: ezyang

Differential Revision: D9998881

Pulled By: goldsborough

fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea
2018-10-22 10:22:41 -07:00
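A conceptual Python sketch of the `Map<Dataset, Transform>` idea from the description above (the real API is C++ and heavily templated; all names here are illustrative):
```python
class MapDataset:
    """Applies a transform to another dataset's batches; itself a dataset,
    so maps compose, and collation is just another transform."""

    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def get_batch(self, indices):
        return self.transform(self.dataset.get_batch(indices))

# e.g. stacking examples into one batch is expressible as a transform:
# stacked = MapDataset(base, lambda examples: torch.stack(examples))
```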
96d826f635 Define REGISTER_CPU_GRADIENT_OPERATOR (#12588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12588

By default, this is an alias for REGISTER_CPU_OPERATOR.  If gradients are not
required (e.g., on mobile) it can be converted to a no-op by defining
CAFFE2_NO_GRADIENT_OPS, resulting in a smaller build.

GRADIENT_OPERATOR_SCHEMA works similarly.

CAFFE2_NO_GRADIENT_OPS also converts REGISTER_GRADIENT to a no-op.

Use these macros in fully_connected_op.cc as an example.
Follow-up diffs will convert more operators.

I had to introduce MACRO_EXPAND to handle the way Visual Studio expands
`__VA_ARGS__`.

Reviewed By: Yangqing

Differential Revision: D10209468

fbshipit-source-id: 4116d9098b97646bb30a00f2a7d46aa5d7ebcae0
2018-10-22 10:01:02 -07:00
da73d709a8 Remove unsafecoalesce op (#12897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12897

The UnsafeCoalesce op was used back in the memonger days, when we tried to coalesce operators
into more efficient computation kernels. It creates a somewhat unsafe
underlying memory storage pattern.

With the new tensor unification I am not sure whether it is still safe for us to do
so, so I propose we delete it for the sake of safety.

Reviewed By: bddppq, ilia-cher

Differential Revision: D10475980

fbshipit-source-id: b1a838c9f47d681c309ee8e2f961b432236e157e
2018-10-22 09:42:26 -07:00
c774cb8913 Rephrase unclear error message for shape mismatch (#12870)
Summary:
I spent a couple of minutes trying to understand which shape corresponds to the checkpoint and which one to the model.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12870

Differential Revision: D10466600

Pulled By: SsnL

fbshipit-source-id: 3b68530b1b756462a2acd59e3a033ff633567a6b
2018-10-22 08:57:16 -07:00
25f4b3efe3 Add simple scripts for checking if generated code changed. (#12835)
Summary:
This is designed to make it easier to see how your codegen changes affected actual generated code.

Limitations:
A) This is NOT robust; if new directories are added that include generated files, they need to be added to tools/generated_dirs.txt.  Note that subdirectories of the list are not included.

B) This is particular to my workflow which I don't claim is generally applicable.  Ideally we would have a script that pumped out a diff that could be attached to PRs.

C) Only works on OSS and definitely won't work on windows.

How to use:
1) python setup.py ...
2) tools/git_add_generated_dirs
3) Edit codegen
4) python setup.py ...
5) git diff to see changes
6) If satisfied: tools/git_reset_generated_dirs, commit, etc.
   If not satisfied: Go to 3)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12835

Reviewed By: ezyang

Differential Revision: D10452255

Pulled By: gchanan

fbshipit-source-id: 294fc74d41d1b840c7a26d20e05efd0aff154635
2018-10-22 07:33:32 -07:00
01227f3ba7 Env variable to not check compiler abi (#12708)
Summary:
For https://github.com/pytorch/pytorch/issues/10114

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12708

Differential Revision: D10444102

Pulled By: goldsborough

fbshipit-source-id: 529e737e795bd8801beab2247be3dad296af5a3e
2018-10-21 20:07:50 -07:00
1e8064dec0 Convert 2 nn.functional functions to weak script (#12723)
Summary:
* Moves the `weak_script` annotation to the `torch/_jit_internal.py` module to resolve a dependency issue between `torch.jit` and `torch.nn`
* Add `torch._jit.weak_script` to `tanhshrink` and `softsign`, their tests now pass instead of giving an `unknown builtin op` error
* Blacklist converted `torch.nn.functional` functions from appearing in the builtin op list if they don't actually have corresponding `aten` ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12723

Differential Revision: D10452986

Pulled By: driazati

fbshipit-source-id: c7842bc2d3ba0aaf7ca6e1e228523dbed3d63c36
2018-10-21 14:09:55 -07:00
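A sketch of the annotation pattern after the move (assuming the `torch._jit_internal` import path named in this commit and a contemporaneous torch build):
```python
import torch
from torch._jit_internal import weak_script

@weak_script
def tanhshrink(input):
    # Compiled lazily: only when first called from a script function.
    return input - input.tanh()
```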
b357470421 Add DistributedDataParallelCPU to doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12864

Differential Revision: D10481669

Pulled By: SsnL

fbshipit-source-id: 20831af41aaba75546e6ed6a99f011f0447b1acf
2018-10-21 11:20:11 -07:00
ed02619ba0 Add topological sort to nomnigraph (#12790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12790

Add DFS based topological sort to nomnigraph.

Reviewed By: duc0

Differential Revision: D10434645

fbshipit-source-id: aaf106b0cc37806b8ae61f065c1592a29993eb40
2018-10-20 01:07:30 -07:00
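The algorithm in a few lines of Python (a generic sketch, not the nomnigraph C++ implementation):
```python
def topological_sort(nodes, successors):
    # DFS reverse postorder yields a topological order for a DAG.
    # `successors` maps a node to the nodes it must precede.
    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for nxt in successors[node]:
            visit(nxt)
        order.append(node)

    for node in nodes:
        visit(node)
    return order[::-1]

assert topological_sort("abc", {"a": "b", "b": "c", "c": ""}) == ["a", "b", "c"]
```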
a839a67aad Add IDEEP unit test with zero-dim tensors (#8459)
Summary:
This test flushes out the issue that IDEEP cannot handle a tensor with dims like (0, 2), which is a valid tensor shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8459

Differential Revision: D10419328

Pulled By: yinghai

fbshipit-source-id: c5efcd152364a544180a8305c47a2a2d126ab070
2018-10-19 23:57:33 -07:00
7dbb38e856 Moving logging from caffe2 to c10. (#12881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12881

TSIA. This should not change any functionality.

Remaining work:
- change the build script to deprecate use of CAFFE2_USE_MINIMAL_GOOGLE_GLOG and use a C10 macro instead.
- Unify the exception name (EnforceNotMet -> Error)
- Unify the logging and warning APIs (like AT_WARNING)

Reviewed By: dzhulgakov

Differential Revision: D10441597

fbshipit-source-id: 4784dc0cd5af83dacb10c4952a2d1d7236b3f14d
2018-10-19 20:22:08 -07:00
d120b9af5a Make c10d pickling/unpickling work (#12694)
Summary:
This fixes the issue for https://github.com/pytorch/pytorch/issues/12168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12694

Differential Revision: D10468717

Pulled By: teng-li

fbshipit-source-id: 3df31d75eea19d6085af665f5350d3cb667a5048
2018-10-19 16:42:36 -07:00
8cb0848bdc expose delete_node (#12840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12840

Add binding for delete_node

Reviewed By: duc0

Differential Revision: D10453555

fbshipit-source-id: cdcaca8420a9a0c61479961d907ef6bb5478a41d
2018-10-19 13:30:50 -07:00
202893fe1a Migrate DeviceOption.numa_node_id to DeviceOption.device_id
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12717

Reviewed By: ilia-cher

Differential Revision: D10408325

fbshipit-source-id: 82583d0ad4b8db094ee4c5c607b52500826328f7
2018-10-19 12:45:48 -07:00
7921e16ca2 Revert D10421896: restore caffe2 strides
Differential Revision:
D10421896

Original commit changeset: b961ea0bca79

fbshipit-source-id: 9d9d2ed0c2cb23a3fdf6bbfc9509539aeeb7e382
2018-10-19 12:15:44 -07:00
bf99ffc4d2 Remove OMP_NUM_THREADS and MKL_NUM_THREADS settings from docker images (#12836)
Summary:
`OMP_NUM_THREADS` and `MKL_NUM_THREADS` are set to 4 by default in the docker images, which causes `nproc` to only show 4 cores in the docker containers by default, and building PyTorch is slow in this default case. We likely don't need these two flags to be set, and this PR tests that hypothesis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12836

Differential Revision: D10468218

Pulled By: yf225

fbshipit-source-id: 7a57962c962e162a8d97f730626825aa1e371c7f
2018-10-19 11:44:22 -07:00
14ff866505 Optimize GroupNormOp (#12844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12844

Optimize GroupNormOp

Reviewed By: houseroad

Differential Revision: D10455567

fbshipit-source-id: aee211badd1e0c8ea6196843e3e77f7c612a74d5
2018-10-19 11:40:12 -07:00
f3e1fe5ca5 add string as supported input / output of script functions (#12731)
Summary:
Add strings to our set of built-in types for annotations. This is used in the functional library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12731

Differential Revision: D10453153

Pulled By: eellison

fbshipit-source-id: f54177c0c529f2e09f7ff380ddb476c3545ba5b0
2018-10-19 11:17:19 -07:00
186219a643 restore caffe2 strides (#12845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12845

Attempting to do this again.

Reallocation of strides_ when there's no change in dim seems to have caused the error that broke the internal flow last time. This fixes that. We found a potential race condition in caffe2 counter ops that might be the cause; we will investigate that.

Reviewed By: ezyang

Differential Revision: D10421896

fbshipit-source-id: b961ea0bca79757991013a2d60cfe51565689ee9
2018-10-19 10:00:16 -07:00
68f4a4b3ba Delete THCStreamGuard in favor of CUDAGuard, also c10d code cleanup (#12849)
Summary:
I got annoyed at waiting for OSS to tell me my c10d builds were busted, so
I also added support for building the test scripts in fbcode and fixed the
warnings this uncovered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12849

Reviewed By: pietern

Differential Revision: D10457671

fbshipit-source-id: 5b0e36c606e397323f313f09dfce64d2df88faed
2018-10-19 09:48:41 -07:00
6ec2f09188 CircleCI: enable OSX jobs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12667

Differential Revision: D10466661

Pulled By: yf225

fbshipit-source-id: a1a150d3b384eb88ba4c7e6d57e59d8ed834e53c
2018-10-19 09:42:06 -07:00
7837ec553c CircleCI: Add doc-push job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12833

Differential Revision: D10464815

Pulled By: yf225

fbshipit-source-id: 06a6a673b6bb32f7c252a217f9ce59db35c75e9c
2018-10-19 08:58:04 -07:00
6190408e24 caffe2: UpsampleBilinear support for scales (#12736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12736

This updates UpsampleBilinearOp and UpsampleBilinearGradientOp to support scales, bringing them in line with ResizeNearestOp https://github.com/pytorch/pytorch/pull/12720.

Reviewed By: houseroad

Differential Revision: D10416228

fbshipit-source-id: f339b7e06979c9c566afb4cee64a2d939b352957
2018-10-19 08:55:55 -07:00
d736f4f0a7 Kill 'python_name' in Declarations.cwrap. (#12832)
Summary:
I'm trying to do some transformations on Declarations.cwrap, and 'python_name' makes things overly difficult without doing anything useful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12832

Reviewed By: ezyang

Differential Revision: D10450771

Pulled By: gchanan

fbshipit-source-id: 1abb1bce27b323dd3e93b52240e7627cd8e56566
2018-10-19 08:47:27 -07:00
31232061aa Use C local in lexer (2) (#12838)
Summary:
trying again without xlocale.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12838

Differential Revision: D10453078

Pulled By: zdevito

fbshipit-source-id: 760852c82e16acee7d1abb8a918822bf5ff59bca
2018-10-19 00:25:35 -07:00
373b5080da Warn that tensor.resize_() resets strides (#12816)
Summary:
As discussed in #1570, this adds a warning to the docstring of `tensor.resize_()` to prevent people from naively using it as an in-place view or reshape.

For your convenience, the updated docstring renders as follows:
![torch_resize_docstring](https://user-images.githubusercontent.com/629706/47148782-f1b57900-d2d1-11e8-9749-e9c7387113ed.png)

Fixes #1570.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12816

Differential Revision: D10457755

Pulled By: ezyang

fbshipit-source-id: dd4b3a821e8c76dc534d81c53084abdb336e690a
2018-10-18 22:47:30 -07:00
d783249674 Revert D10457796: [pytorch][PR] fix typo
Differential Revision:
D10457796

Original commit changeset: 9d1582c11c2e

fbshipit-source-id: 9be38e999a2783dae4a387821806e6850b6a3671
2018-10-18 21:48:14 -07:00
ca5dc9f13a Add py2 compatibility for builtins import (#12784)
Summary:
Testing if this is a solution for the issue reported at https://github.com/pytorch/pytorch/pull/12504#issuecomment-430758448
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12784

Differential Revision: D10454398

Pulled By: jamesr66a

fbshipit-source-id: a0304acde5df438c08cceb2d5280933de24664c4
2018-10-18 20:54:23 -07:00
aa6f47e229 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12814

Differential Revision: D10457796

Pulled By: ezyang

fbshipit-source-id: 9d1582c11c2e6dec5ff1c87525fac127a7e77273
2018-10-18 20:42:08 -07:00
f47d12b0ef shape_as_tensor should return a CPU tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12846

Differential Revision: D10456885

Pulled By: jamesr66a

fbshipit-source-id: fa66d0736cfb0ed09e566ae7c2eaeac37f8bb0e4
2018-10-18 20:20:00 -07:00
40ff69b796 Add attribute exhaustive_search in caffe2 blacklist args (#12815)
Summary:
Currently, while converting from caffe2 to onnx, it doesn't
    blacklist the exhaustive_search attribute in support_onnx_export,
    so conversion fails when the onnx model is verified using C.check_model.

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12815

Differential Revision: D10457777

Pulled By: ezyang

fbshipit-source-id: dc2183d8abef8cd753b348f2eaa62c952a058920
2018-10-18 19:53:40 -07:00
8a35aafca6 Try to fix randomness.rst formatting again
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12853

Differential Revision: D10458439

Pulled By: SsnL

fbshipit-source-id: ebd259e598327b0c5d63de6b7c182781fe361fbd
2018-10-18 19:18:49 -07:00
0fa69c0276 Remove the protobuf library in pytorch linking list. (#12451)
Summary:
There will be a link error when the caffe2 doesn't use its protobuf under third_party. The pytorch will always link that protobuf. The pytorch doesn't use the protobuf directly. We could remove it from
the list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12451

Differential Revision: D10262676

Pulled By: ezyang

fbshipit-source-id: c2ff3fdf757fc21ed689e7f663c082064b1a0bca
2018-10-18 18:31:51 -07:00
a85174b46a Fix randomness.rst formatting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12850

Differential Revision: D10457694

Pulled By: SsnL

fbshipit-source-id: fa64964ff6d41625d9383ca96393017230e4ee0f
2018-10-18 18:26:26 -07:00
87d3d209a6 Enable JIT tests in fbcode (#12777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12777

Enables JIT tests in FBCode. Changes pybind11 code to avoid mixing py::args with positionally matched arguments, because old versions of pybind11 leak memory in this case.

Reviewed By: jamesr66a

Differential Revision: D10419708

fbshipit-source-id: 74bc466001b5d363132d1af32e96841b38601827
2018-10-18 18:18:37 -07:00
99bc541b5b size_from_dim(0) is like numel() but worse. Don't do it. (#12729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12729

This may have a dependency on D10380678 if size_from_dim(0)
was required because numel() used to return -1 in some cases.
This is no longer true.

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10415069

fbshipit-source-id: 39f46f56249ecaf3533f62a0205b3a45d519d789
2018-10-18 18:06:37 -07:00
89bf98ac4c Update '__all__' in '__init__.py' (#12762)
Summary:
It is best coding practice to always include dynamically declared module-level methods in the "__all__" field. Otherwise, IDEs (such as PyCharm) with module reference inspectors will complain "Cannot find reference ...".

This PR adds 'rand' and 'randn' to '__init__.py'.
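
A hypothetical sketch of the pattern (all names illustrative, not the actual torch `__init__.py`):
```python
# a hypothetical __init__.py with dynamically declared functions
def _make_dynamic_functions():
    # stand-in for names injected at import time (e.g. from a C extension)
    return {'rand': lambda *size: None, 'randn': lambda *size: None}

globals().update(_make_dynamic_functions())

__all__ = ['rand', 'randn']  # listed so IDE inspectors can resolve the names
```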
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12762

Differential Revision: D10427541

Pulled By: ezyang

fbshipit-source-id: ec0704dfd91e78d7ad098b42cfd4bd1ad0e119df
2018-10-18 17:52:10 -07:00
a223c5ed2c Extend ONNX while op by x2, rather than x1.02
Summary:
I think the original author wrote 2.0f in an attempt to double the size, but this argument takes a percentage increase, not a factor increase.

Created from Diffusion's 'Open in Editor' feature.

Reviewed By: jamesr66a

Differential Revision: D10412946

fbshipit-source-id: 95eb3d284255f232b7782bb1d2c9c2ef8aa6f8a7
2018-10-18 17:49:51 -07:00
f9d1b63d18 Automatic update of fbcode/onnx to f8828e532da4795e8ea15f5850a37c5179917b9b (#12823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12823

Previous import was 1cbe2743cda739ff752d6ce79553b0ef8ad49783

Included changes:
- **[f8828e5](https://github.com/onnx/onnx/commit/f8828e5)**: Use vector instead of set to keep the order of the opt passes (#1524) <Lu Fang>
- **[b5a37c4](https://github.com/onnx/onnx/commit/b5a37c4)**: Pin awscli to last known good version (#1518) <bddppq>
- **[3e219f6](https://github.com/onnx/onnx/commit/3e219f6)**: ONNX Optimization Rewrite (#1452) <Armen>
- **[96758c9](https://github.com/onnx/onnx/commit/96758c9)**: Add MaxUnpool op to ONNX. (#1494) <Spandan Tiwari>
- **[c4f7043](https://github.com/onnx/onnx/commit/c4f7043)**: Update docker image version used in CircleCI (#1511) <bddppq>

Differential Revision: D10447573

fbshipit-source-id: 8748ba6e3be322a26a9a360ff7f2babd54fd581f
2018-10-18 16:17:25 -07:00
f380f0ba27 Move torch.onnx.operators functions into ATen (#12803)
Summary:
These were indiscriminately dumping `onnx::` instructions into traces, making it impossible to run the traces in the JIT interpreter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12803

Differential Revision: D10443526

Pulled By: jamesr66a

fbshipit-source-id: 07172004bf31be9f61e498b5772759fe9262e9b3
2018-10-18 16:04:34 -07:00
79709f02e9 fix overwriting of CMAKE_EXE_LINKER_FLAGS (#12834)
Summary:
bug lurking since 2016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12834

Reviewed By: bddppq

Differential Revision: D10452484

Pulled By: anderspapitto

fbshipit-source-id: 352584af06e2fb35338fb66b3d8eb1050b716349
2018-10-18 15:34:28 -07:00
92890d4314 Delete ExtendTensor operator
Summary: Added 2 years ago in D3665603, never used, kill it.

Reviewed By: ezyang

Differential Revision: D10421336

fbshipit-source-id: 1b027a9ef2b71d0dd2c572cd4338bc8e046320d8
2018-10-18 15:18:40 -07:00
324a510f9c JIT Cleanups (#12804)
Summary:
1. Change scope ownership model so they can be shared across Graphs.
   Now scopes own their parent and are intrusive pointers. Graphs
   no longer require a scope_root and cloning a node automatically
   clones its scope. This causes some changes in expect files for
   trace+script things. As far as I can tell these are not bugs but
   a different way of interpreting how scopes should propagate.
   Big traces like that of alexnet keep their scopes unchanged.
2. Remove VariableType.cpp dependency on a symbol being in the pre-
   declared symbol list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12804

Differential Revision: D10447922

Pulled By: zdevito

fbshipit-source-id: dcfcaf514bbe5687047df0f79c2be536ea539281
2018-10-18 14:41:55 -07:00
6058886b03 Speedup pnorm (#12811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12811

The L1 version of this operator was extremely slow and was timing out one of our
unit tests. This diff addresses the TODO and makes it fast.

Reviewed By: chocjy

Differential Revision: D10444267

fbshipit-source-id: 550b701b6a5cb3f2540997fd7d8b920400b983a6
2018-10-18 14:22:55 -07:00
68843c683d Open source multithreaded predictor bench utils (#11135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11135

This diff does not have any logic changes; it simply moves files/functions/classes around.
It open-sources (almost all of) the necessary dependencies for the multithreaded predictor benchmark.
The benchmark itself can be open sourced once the predictor is open sourced.

Reviewed By: salexspb

Differential Revision: D9602006

fbshipit-source-id: 386c9483e2c64c8b7d36e4600189c4e0b7e159ff
2018-10-18 14:16:36 -07:00
ee563c5899 Add license reference to README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12822

Differential Revision: D10451895

Pulled By: JoelMarcey

fbshipit-source-id: dee4cafd3120571e52cf242bb0674c7aa7dab217
2018-10-18 14:10:24 -07:00
9473e57eca Revert D10444104: [pytorch][PR] Windows CI integration for custom ops
Differential Revision:
D10444104

Original commit changeset: 4c447beeb967

fbshipit-source-id: ead52444aefa27692e3f36dadad986e2313261bd
2018-10-18 14:08:18 -07:00
ed317b6203 Remove useless MKL target (#12783)
Summary:
Context: https://github.com/pytorch/pytorch/pull/12625#issuecomment-430560919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12783

Differential Revision: D10451726

Pulled By: yinghai

fbshipit-source-id: 3cd1e61209628d7c52b440e5b232ae95dd09885e
2018-10-18 14:03:34 -07:00
805f4d5cb8 Revert D10416438: Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE
Differential Revision:
D10416438

Original commit changeset: cb842e3e26b0

fbshipit-source-id: c0760e73ecc76ca9b1b74f6844e243c2df5260a2
2018-10-18 13:46:33 -07:00
57ddc08a57 Enable multiple external output (#12778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12778

att

Differential Revision: D10248027

fbshipit-source-id: fc3d17314e8c2d9704b8bfcc50ace176ec2c85d7
2018-10-18 13:36:23 -07:00
dec9bc5f0b Expose device_option directly
Summary: as title states

Reviewed By: duc0

Differential Revision: D10442424

fbshipit-source-id: bba2dd600e1979ff018ac0e403463f992a94a6e5
2018-10-18 13:22:17 -07:00
63cd051867 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (#12799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12799

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Reviewed By: ezyang

Differential Revision: D10416438

fbshipit-source-id: cb842e3e26b0918829d71267a375d4dd40600d58
2018-10-18 12:49:01 -07:00
2c566a17c7 nomnigraph - simplify subgraph matching APIs (#12681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12681

- Get rid of NodeMatchCriteria as a template parameter, which was too generic. So MatchNode<NodeMatchCriteria> becomes MatchNode<GraphType>, and MatchStore stores the predicate on GraphType::NodeRef.

- Similarly, get rid of NNNodeMatchCriteria

Now one can just pass a function pointer NodeRef -> bool to the NNMatchNode constructor directly, like this:

mg.createNode(is<Relu>)

- Merge static utilities in SubgraphMatcher class into MatchGraph class

- Rename MatchNode to MatchPredicate

Change use cases and tests to make it work

Reviewed By: ZolotukhinM

Differential Revision: D10386907

fbshipit-source-id: 43874bd154e3d7c29ce07b4b74eca8a7a9f3078a
2018-10-18 12:32:40 -07:00
9c617140f7 Try to reduce c10d test flakiness (#12782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12782

We have seen the "Address already in use" error popup a few times when instantiating the TCPStore. The port that it uses is dynamically generated through common.find_free_port(), which binds a new socket to a random port, closes the socket, and returns the port that the OS had assigned. If some other process grabs that port in the time between closing the socket and the TCPStore binding to it, the bind error shows up. This commit changes most tests to use the FileStore instead and includes a retry when testing the TCPStore.

Differential Revision: D10433401

fbshipit-source-id: 8dd575ac91a3cddd1cc41ddb0ff4311ddc58c813
2018-10-18 12:12:33 -07:00
3fe35300ed Revert D10417038: [pytorch][PR] Use C locale in lexer
Differential Revision:
D10417038

Original commit changeset: 1d5f2f9a24ec

fbshipit-source-id: 5780fed8e29551ec5b0a56ad6966a560c02bc171
2018-10-18 11:45:18 -07:00
545f22c070 Link libshm against c10 (#12802)
Summary:
Fixes this build failure I got: https://gist.github.com/jamesr66a/1e0025d8d6d30b090f0e247457063093
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12802

Differential Revision: D10447916

Pulled By: jamesr66a

fbshipit-source-id: ab2cddff95429881db992c04e80453a46eb81f79
2018-10-18 11:38:42 -07:00
5b971445a6 Typo fix (#12826)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12826

Differential Revision: D10449047

Pulled By: ezyang

fbshipit-source-id: eb10aa5886339b43bb8c239dd8742e458f3d024d
2018-10-18 11:36:00 -07:00
2b63b7a0a5 Support GPU version of Spatial Batch Norm (#11711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11711

Added GPU support for spatial batch normalization. This works by reducing values from the GPUs onto a CPU and broadcasting the results back to each GPU. We have run several experiments and found these results to be better than those without spatial BN: https://fb.quip.com/fr7HAeDliPB8

Reviewed By: enosair

Differential Revision: D9547420

fbshipit-source-id: ccbd2937efd6cfd61182fff2f098fb7c5ae8aeb1
2018-10-18 11:22:13 -07:00
e240e89984 move the torch/csrc/jit/serialization.h to caffe2 source folder and rename to inline_container.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12781

Reviewed By: dzhulgakov

Differential Revision: D10436151

Pulled By: houseroad

fbshipit-source-id: 7f59eec21df5acbab0ea693e1a1cd4fa152f05e5
2018-10-18 09:47:19 -07:00
963b012bd8 nomnigraph - HEFT scheduler (#12788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12788

Static task scheduling algorithm

- Input/Output for static scheduler
- HEFT static scheduling algorithm
- Theoretical critical path analyzer

Reviewed By: bwasti

Differential Revision: D10436418

fbshipit-source-id: 074bc587b9a2c7cb2d9e64291981ff1c160f02b2
2018-10-18 08:40:46 -07:00
12be60cc04 Windows CI integration for custom ops (#11527)
Summary:
This is likely currently broken due to symbol visibility issues, but we will investigate it using this PR.

CC orionr yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11527

Differential Revision: D10444104

Pulled By: goldsborough

fbshipit-source-id: 4c447beeb9671598ecfc846cb5c507ef143459fe
2018-10-18 07:55:05 -07:00
eb6a1245a2 Fix torch::jit::load docs (#12709)
Summary:
`torch::jit::load` is currently incorrectly documented/rendered

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12709

Differential Revision: D10422064

Pulled By: goldsborough

fbshipit-source-id: 4b195a84847d731ae3fe2d40868ebe858d510a2e
2018-10-18 07:52:13 -07:00
b1a6fa90e1 Add script::Module::to (#12710)
Summary:
There is currently no obvious way for users to move their `script::Module` to GPU memory. This PR implements the `to()` functions that C++ frontend modules have.

zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12710

Differential Revision: D10444103

Pulled By: goldsborough

fbshipit-source-id: daa0ec7e7416c683397ee392c6e78b48273f72c7
2018-10-18 07:48:51 -07:00
710191e292 fix error message of large kernel size in conv2D (#12791)
Summary:
- fix #12565
- test plan:
with this fix, we have:
```
>>> m = nn.Conv2d(in_channels=3, out_channels=33, kernel_size=10, stride=1, bias=True)
>>> input = torch.randn(1, 3, 1, 1)
>>> output = m(input)
```
RuntimeError: Calculated padded input size per channel: (1 x 1). Kernel size: (10 x 10). Kernel size can't be greater than actual input size at ~/pytorch/aten/src/THNN/generic/SpatialConvolutionMM.c:50

not sure why these are `int` instead of `int64_t`:
5ccdd7a626/aten/src/THNN/generic/SpatialConvolutionMM.c (L10)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12791

Differential Revision: D10443045

Pulled By: weiyangfb

fbshipit-source-id: 2620acb40bdd49d29cec06337f6dfb4653d1987c
2018-10-18 00:51:16 -07:00
f1e7d384b6 Support scales as inputs in ResizeNearest (#12720)
Summary:
To address https://github.com/onnx/onnx/pull/1467
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12720

Reviewed By: BIT-silence

Differential Revision: D10414813

Pulled By: houseroad

fbshipit-source-id: 8831381b0115c363065c8d23bd1a95b4d641b857
2018-10-17 23:08:53 -07:00
f4944f0f8a Rename test/common.py to test/common_utils.py (#12794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794

common.py is used as a base module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflicts.

Reviewed By: orionr

Differential Revision: D10438204

fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
2018-10-17 23:04:29 -07:00
cffeb03a2d fix forward and backward for norm with negative infinity norm (#12722)
Summary:
I found a bug in norm() and fixed it (and added tests to make sure it's fixed).
Here is how to reproduce it:
```python
import torch
x = torch.FloatTensor([[10, 12, 13], [4, 0, 12]])
print(torch.norm(x, -40, dim=0, keepdim=True)) #output is tensor([[ 4.0000,  0.0000, 11.9853]])
print(torch.norm(x, float('-inf'), dim=0, keepdim=True)) #output is tensor([[1., 1., 1.]]) which is wrong!
from numpy.linalg import norm as np_norm
x = x.numpy()
print(np_norm(x, ord=-40, axis=0)) #output is array([[4., 0., 11.985261]])
print(np_norm(x, ord=float('-inf'), axis=0)) #output is array([[4., 0., 12.0]])
```
it's related to [#6817](https://github.com/pytorch/pytorch/issues/6817) and [#6969](https://github.com/pytorch/pytorch/pull/6969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12722

Differential Revision: D10427687

Pulled By: soumith

fbshipit-source-id: 936a7491d1e2625410513ee9c39f8c910e8e6803
2018-10-17 21:07:43 -07:00
ed5eb7196b Add quantized GroupNormOp (#11852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11852

Add quantized GroupNormOp

Reviewed By: houseroad

Differential Revision: D9931468

fbshipit-source-id: 02af82d98356a49736e44162042783c9e36a81b5
2018-10-17 18:32:44 -07:00
08aab4dfdd remove ATen/Error.h and ATen/core/Error.h (#12792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12792

This is a follow up diff after D10238910.

Only non-codemod change is the removal of ATen/Error.h and ATen/core/Error.h. Other files are basically changing the inclusion path + clang format for inclusion order.

Reviewed By: bddppq

Differential Revision: D10437824

fbshipit-source-id: 7f885f80ab5827468d1351cfb2765d0e3f555a69
2018-10-17 17:25:42 -07:00
cd88c5ccf4 CircleCI hot fix: pin awscli to 1.16.35 (#12787)
Summary:
awscli==1.16.36 is broken: https://circleci.com/gh/pytorch/pytorch/77338?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12787

Differential Revision: D10437424

Pulled By: yf225

fbshipit-source-id: c15bed7aa83ddca92ff32e2aaa69fbe97ac6ab1c
2018-10-17 15:57:52 -07:00
84ce3ab47e Add MAE and L2 loss to docs (#12754)
Summary:
Fixes #12751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12754

Differential Revision: D10427661

Pulled By: ezyang

fbshipit-source-id: 75bbef85976e253ab5a7140fc57f7a0ad34d96f5
2018-10-17 15:40:20 -07:00
5ccdd7a626 Support cmake3 for 14.04 and CentOS (#12771)
Summary:
Fix https://github.com/caffe2/caffe2.github.io/issues/24

cc pjh5 anderspapitto soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12771

Reviewed By: anderspapitto

Differential Revision: D10430865

Pulled By: orionr

fbshipit-source-id: 10c03cd25ab9faad49d53d0f18dd9566bfd28ae2
2018-10-17 15:02:19 -07:00
21ff6de4b3 Add missing HANDLE_TH_ERRORS (#12770)
Summary:
THPSize_pynew is called from the Python C API and may throw exceptions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12770

Differential Revision: D10431180

Pulled By: colesbury

fbshipit-source-id: 93dd1b604ac6bc05d4eb02b97e3f79a73aec73c5
2018-10-17 13:52:02 -07:00
ab1a25aa9b caffe2::empty for Resize+mutable_data refactor (#12407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12407

We want to use a tensor factory to refactor Caffe2's old way of initializing a Tensor via Resize and mutable_data,
in order to eliminate uninitialized Tensors.

Previously when we want to create a Tensor in caffe2, we'll do the following
```
Tensor x(CPU); // device type provided
x.Resize({1, 2, 3}); // size provided
x.mutable_data<float>(); // data type provided and memory allocated
```
This leaves the Tensor in a not-fully-initialized state during the process. To eliminate this, we
want to provide all the needed information at the beginning. ATen already has its TensorFactories: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorFactories.cpp, and there is TensorOptions; we want to adopt the same interface to ease future refactoring.

At the call site, we used to have `Output(i)`, which returns a `Blob` containing an uninitialized `Tensor`; we would then call Resize and mutable_data afterwards to provide the dimensions and data type:
```
// uninitialized tensor
auto* Y = Output(0);
// set dimensions
Y->Resize({1, 2, 3});
// actually allocate the data
auto* data = Y->mutable_data<float>();
// After this step, Tensor is fully initialized.
```
We want to change it to the following:
```
// provide dimensions and TensorOptions which include device type and data type.
// This will set all the information of Tensor properly and also allocate memory.
auto* Y = Output(0, {1, 2, 3}, at::device({context_.device_type()}).template dtype<T>());
// Tensor is fully initialized after this step

// following `mutable_data` call won't allocate memory.
auto* data = Y->mutable_data<float>();
```

microbenchmarks
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
OperatorNewOutputTensorAPI                                   3.27us  306.05K
OperatorOldOutputTensorAPI                                   3.55us  281.54K
============================================================================
```

Reviewed By: ezyang

Differential Revision: D10207890

fbshipit-source-id: f54ddacaa057b7c6bc7d5a8290171f35e9e40e29
2018-10-17 13:03:06 -07:00
7d5f7ed270 Using c10 namespace across caffe2. (#12714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714

This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global variable confusion, but that should have been
mostly cleaned up by now. Right now, the plan on record is that namespace caffe2 and
namespace aten will fully be supersets of namespace c10.

Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where

```
using namespace c10;
```

is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (with the same behavior if gflags is not built in).

Reviewed By: dzhulgakov

Differential Revision: D10390486

fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
2018-10-17 12:57:19 -07:00
348867c10b Remove cereal submodule (#12666)
Summary:
Cereal is dead!

soumith orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12666

Reviewed By: soumith

Differential Revision: D10422061

Pulled By: goldsborough

fbshipit-source-id: ca1ac66d05e699df9de00fc340a399571b7ecb9f
2018-10-17 11:52:47 -07:00
dd7501e3a8 Remove Blob::ShareExternal from serialization (#11926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11926

With the preparation work in the diffs stacked below, we're now able to remove this call to Blob::ShareExternal(),
preparing for the removal of that function from Blob.

Reviewed By: dzhulgakov

Differential Revision: D9884563

fbshipit-source-id: 7dd5c5fe02be0df7a44be45587c1dd7c474126ef
2018-10-17 11:50:35 -07:00
6cbf1992bd Serialization takes pointers instead of Blob (#11925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11925

This is step 1 in the refactoring to remove Blob::ShareExternal(), i.e. Blob would then always own its contents.

ShareExternal() is, for example, used to pass non-owning blobs to serialization. This diff prepares for removing that.

Reviewed By: ezyang

Differential Revision: D9884177

fbshipit-source-id: d01df9a613a4fc62e5679fe45bfc47e2c899b818
2018-10-17 11:50:34 -07:00
25db86cca5 Fix isfinite for int input (#12750)
Summary:
`torch.isfinite()` used to crash on int inputs.
```
>>> import torch
>>> a = torch.tensor([1, 2])
>>> torch.isfinite(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/pytorch/torch/functional.py", line 262, in isfinite
    return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: value cannot be converted to type int64_t without overflow: inf
```
But this is an easy special case, and numpy also supports it.
```
>>> import numpy as np
>>> a = np.array([1, 2])
>>> a.dtype
dtype('int64')
>>> np.isfinite(a)
array([ True,  True], dtype=bool)
```
So this adds a hacky line to handle non-floating-point input. Since PyTorch raises an exception on overflow, we can safely assume all valid int tensors contain only finite numbers.
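
A minimal sketch of the special-cased check (not the exact patch):
```python
import torch

def isfinite(tensor):
    if not tensor.is_floating_point():
        # Integer tensors cannot hold inf/nan, so everything is finite.
        return torch.ones_like(tensor, dtype=torch.uint8)
    return (tensor == tensor) & (tensor.abs() != float('inf'))
```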
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12750

Differential Revision: D10428204

Pulled By: ailzhang

fbshipit-source-id: f39b2d0975762c91cdea23c766ff1e21d85d57a5
2018-10-17 11:48:25 -07:00
9a76e84a08 Use C locale in lexer (#12739)
Summary:
Possible fix for #11326. Testing in CI for windows code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12739

Differential Revision: D10417038

Pulled By: zdevito

fbshipit-source-id: 1d5f2f9a24eceef7047dc218669faca8a187c65c
2018-10-17 10:42:38 -07:00
459cff93fe fix math formula for conv1d and conv2d (#12740)
Summary:
- fix math formula
- test plan: build html and view on a browser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12740

Differential Revision: D10419430

Pulled By: weiyangfb

fbshipit-source-id: b8eee9e75c3ce6e37535e3de597431ef5030e9ac
2018-10-17 10:24:11 -07:00
e027f7a913 Fix character with wrong encodding in documentation (#12761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12761

‚ is not really `,` and thus it can make some of the Python 2 imports fail.

Reviewed By: weiyangfb

Differential Revision: D10423231

fbshipit-source-id: 3738c0b9d2f52aa47eef06250f84c5933a38783f
2018-10-17 10:20:45 -07:00
9d79030d38 Fixup THPUtils_unpackIndex (#12738)
Summary:
See https://github.com/pytorch/pytorch/issues/12735
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12738

Differential Revision: D10416682

Pulled By: jamesr66a

fbshipit-source-id: 69f3452750dffda3cfed50463d9241fd7b52528b
2018-10-17 10:16:54 -07:00
409ee5bcd9 Remove redundant semicolon
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12753

Differential Revision: D10427674

Pulled By: ezyang

fbshipit-source-id: f790dbbafc6b1965c4e1368f311076ea045555de
2018-10-17 09:52:48 -07:00
1a6071d436 fixing seq to tensors in documentation (#12741)
Summary:
Fixes #12251

In the docs, the actual keyword argument is supposed to be `tensors`, but it is instead given as `seq` for the `torch.cat` operation.
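
A quick check of the corrected keyword (assuming a recent PyTorch):
```python
import torch

x = torch.randn(2, 3)
torch.cat(tensors=(x, x), dim=0)  # the keyword is `tensors`, not `seq`
```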

zou3519 can you review this code? I don't have access to request for code reviews.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12741

Differential Revision: D10419682

Pulled By: ezyang

fbshipit-source-id: a0ec9c3f4aeba23ac3a99e2ae89bd07d2b9ddb58
2018-10-17 09:16:04 -07:00
7edfe11ba4 Use TypeMeta::dtor() instead of Blob::DestroyCall (#11500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11500

Since TypeMeta already stores a destructor, and we removed the ability from Blob to store a custom destructor in a diff stacked below this, there is now no reason for Blob to store it again.

Reviewed By: ezyang

Differential Revision: D9763423

fbshipit-source-id: d37a792ffd6928ed1906f5ba88bd4f1d1e2b3781
2018-10-17 06:21:46 -07:00
7b7bf09e3c Add TypeMeta::New/Delete (#12307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12307

This adds non-placement variants of New/Delete to TypeMeta.
In a future diff, this is going to be used from Blob to destruct its contents.

Reviewed By: dzhulgakov

Differential Revision: D10184116

fbshipit-source-id: 7dc5592dbb9d7c4857c0ec7b8570329b33ce5017
2018-10-17 06:21:45 -07:00
90737f7f5d Fix missing final activation in NLLLoss second example (#12703)
Summary:
Fixed the second example in NLLLoss.
The LogSoftmax activation was missing after the convolution layer. Without this activation, the second example loss was sometimes negative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12703

Differential Revision: D10419694

Pulled By: ezyang

fbshipit-source-id: 98bfefd1050290dd5b29d3ce18fe075103db4674
2018-10-17 02:57:39 -07:00
0521c47c91 Amend nondeterminism notes (#12217)
Summary:
Include atomicAdd commentary, as this is less well known.

There is some discussion in #12207

Unfortunately, I cannot seem to get the `.. include::` directive working in `_tensor_docs.py` and `_torch_docs.py`. I could use a hint for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12217

Differential Revision: D10419739

Pulled By: SsnL

fbshipit-source-id: eecd04fb7486bd9c6ee64cd34859d61a0a97ec4e
2018-10-16 23:59:26 -07:00
8c873def88 Revert D10220313: restore caffe2 strides
Differential Revision:
D10220313

Original commit changeset: aaf9edebf4ff

fbshipit-source-id: 46c4d23d89d47be26c3f4967476271d8c2f95f11
2018-10-16 23:57:20 -07:00
70c527dacd Re-disable softmax ops tests in ROCM (#12749)
Summary:
They are flaky in master.

ashishfarmer petrex

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12749

Differential Revision: D10420265

Pulled By: bddppq

fbshipit-source-id: cac58efb711941786b10b07ada58e0d59ab1db1d
2018-10-16 22:54:50 -07:00
034c969f3c Simply exit DataLoader when Python is dying (#12700)
Summary:
I struggled with yet another DataLoader hang for the entire evening. After numerous experiments, I realized that it is unsafe to do anything when Python is shutting down. We also unfortunately implement our DataLoader cleanup logic in `__del__`, a function that may or may not be called during shutdown, and if called, may or may not be called before core library resources are freed.

Fortunately, we are already setting all our workers and pin_memory_thread as daemonic. So in case of Python shutting down, we can just do a no-op in `__del__` and rely on the automatic termination of daemonic children.

An `atexit` hook is used to detect Python exit.
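
A hypothetical sketch of the mechanism (the class name and `_shutdown_workers` are illustrative, not the actual implementation):
```python
import atexit

_python_exiting = False

def _set_python_exiting():
    global _python_exiting
    _python_exiting = True

atexit.register(_set_python_exiting)

class _DataLoaderIter:
    def __del__(self):
        if _python_exiting:
            return  # no-op: daemonic workers die with the interpreter
        self._shutdown_workers()  # normal cleanup path
```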
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12700

Differential Revision: D10419027

Pulled By: SsnL

fbshipit-source-id: 5753e70d03e69eb1c9ec4ae2154252d51e2f79b0
2018-10-16 22:05:33 -07:00
d34578026c Various example code fixes (#12707)
Summary:
- Fix broken sparse_coo_examples, update output
- Tensor(...) to tensor(...)
- Fix arguments to math.log to be floats

While the last might be debatable, mypy currently complains when passing an int to math.log. As it is not essential for our examples, let's be clean w.r.t. other people's expectations.

These popped up while checking examples in the context of #12500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12707

Differential Revision: D10415256

Pulled By: SsnL

fbshipit-source-id: c907b576b02cb0f89d8f261173dbf4b3175b4b8d
2018-10-16 21:59:40 -07:00
c8ac878b98 Fix bug in script for where (#12385)
Summary:
Where is declared as:

```
where(Tensor condition, Tensor self, Tensor other)
```

Previously the compiler assumed that self must be the first argument.
But this is not true in practice for `where` and for a few other exceptions.

This changes the compiler to take an explicit self argument which gets matched
to the `self` that appears in the schema.

Note that this requires renaming a variant of pow, which referred to
an exponent Tensor as `self` because otherwise that would cause `t^3`
to match against `t` being the exponent.
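
The effect is visible from Python, where the method form binds `self` to the second schema argument (a small illustration, assuming a recent PyTorch):
```python
import torch

cond = torch.tensor([True, False])
a = torch.tensor([1.0, 2.0])
b = torch.tensor([10.0, 20.0])
torch.where(cond, a, b)  # function form: condition comes first
a.where(cond, b)         # method form: `self` is not the first schema arg
```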
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12385

Differential Revision: D10364658

Pulled By: zdevito

fbshipit-source-id: 39e030c6912dd19b4b0b9e35fcbabc167b4cc255
2018-10-16 21:05:14 -07:00
84edd4a48b Enable mapping from operatordef to converted node for debugging
Summary: Add a mapping for conversion -- this will help with debugging and is directly used by the TUI stacked on top of this.

Reviewed By: duc0

Differential Revision: D10396130

fbshipit-source-id: cdd39278f0ed563bb828b1aebbbd228f486d89c8
2018-10-16 21:03:28 -07:00
1bf642800d Remove duplicate descriptors (#8321)
Summary:
This PR removes some duplication in `recurrent_op_cudnn.cc`. Instead of four copies of the exact same descriptor, it should work fine with just one. I don't see any other code that relies on those being four separate locations, but if that is what you need, you can always allocate additional descriptors as necessary.

Have not fully tested this out; it is just something I noticed when reading through the descriptor code.

Cheers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8321

Differential Revision: D10363744

Pulled By: ezyang

fbshipit-source-id: 733c8242fb86866f1d64cfd79c54ee7bedb03b84
2018-10-16 20:59:00 -07:00
e497aa1e35 Optimize UpsampleNearest Op (#12151)
Summary:
Optimize the UpsampleNearest Op.
1. Add OMP.
2. Revise the translated_idx method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12151

Differential Revision: D10362856

Pulled By: ezyang

fbshipit-source-id: 535a4b87c7423942217f2d79bedc463a0617c67a
2018-10-16 20:34:20 -07:00
ba25e13782 Forbid Module.to with copy argument. (#12617)
Summary:
Module.to uses the Tensor.to parsing facility.
It should not, however, accept "copy" as a keyword/fourth positional
argument.

See #12571 for discussion.

Thank you SsnL for noticing.
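
A small illustration of the asymmetry this enforces (assuming a recent PyTorch):
```python
import torch

t = torch.zeros(2)
t.to(torch.device('cpu'), torch.float64, copy=True)  # Tensor.to accepts copy

m = torch.nn.Linear(2, 2)
m.to(torch.device('cpu'), torch.float64)  # fine for modules
# m.to(torch.device('cpu'), torch.float64, copy=True) is the rejected case
```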
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617

Differential Revision: D10392053

Pulled By: ezyang

fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c
2018-10-16 20:31:44 -07:00
5416260b1e Add the OpenMP optimization for BatchPermutation. (#12153)
Summary:
This is a Caffe2 optimization.
With this optimization, the following ops get a significant boost (tested with Mask R-CNN, on one SKX-8180 socket):
BatchPermutation op: reduced from 8.296387 ms to 1.4501984 ms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12153

Differential Revision: D10362823

Pulled By: ezyang

fbshipit-source-id: 04d1486f6c7db49270992cd8cde41092154e62ee
2018-10-16 20:23:09 -07:00
3709734b1c Improve reporting on pytest. (#12610)
Summary:
Before/after comparisons coming after I run the tests on CI.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12610

Differential Revision: D10419483

Pulled By: ezyang

fbshipit-source-id: 5543e971f8362e4cea64f332ba44a26c2145caea
2018-10-16 20:15:01 -07:00
3bfa7258b3 Don't serialize hooks (#11705)
Summary:
Fixes #11683.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11705

Differential Revision: D9833057

Pulled By: ezyang

fbshipit-source-id: 18af9bcd77b088326738d567100fbe4a4c869dd6
2018-10-16 20:11:03 -07:00
b1892226aa A quick rundown of codebase structure. (#12693)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12693

Differential Revision: D10419424

Pulled By: ezyang

fbshipit-source-id: dc3999253f19b5615849619bd3e4a77ab3ca984e
2018-10-16 20:02:27 -07:00
0054df19b1 Simplify InheritOnnxSchema registration (#12696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12696

In the majority of cases, we use `InheritOnnxSchema(type_)`. This diff makes declaring such cases easier.

Reviewed By: bddppq

Differential Revision: D10395109

fbshipit-source-id: 914c1041387d5be386048d923eb832244fc506c3
2018-10-16 19:59:49 -07:00
81975a497f update docs for sparse tensor (#12221)
Summary:
- Update the sparse tensor docs examples after the print format changed.
- Update the example for creating an empty sparse tensor:
```
>>> torch.sparse_coo_tensor(torch.LongTensor(size=[1,0]), [], torch.Size([1]))
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0,)),
       size=(1,), nnz=0, layout=torch.sparse_coo)
```

zou3519 SsnL yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12221

Differential Revision: D10412447

Pulled By: weiyangfb

fbshipit-source-id: 155b8cb0965f060e978f12239abdc1b3b41f6ab0
2018-10-16 19:56:51 -07:00
dc07102b17 Check dim size preventively when doing shape inference for BatchMatMul (#12691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12691

We check input(0) but not input(1) in BatchMatMul. This may result in a protobuf exception which won't be caught upstream, causing termination of the program. Checking it with `CAFFE_ENFORCE` will be caught by the upstream inference function. Plus, it prints a clean stack trace showing where things went wrong.

Reviewed By: bddppq, houseroad, BIT-silence

Differential Revision: D10391130

fbshipit-source-id: daf8dcd8fcf9629a0626edad660dff54dd9aeae3
2018-10-16 17:27:44 -07:00
50c0aedbec Don't segfault on Tensor.__delitem__ (#12726)
Summary:
The mapping protocol stipulates that when `__delitem__` is called, this is passed to `__setitem__` [(well, the same function in the C extension interface)](https://docs.python.org/3/c-api/typeobj.html#c.PyMappingMethods.mp_ass_subscript) with NULL data.

PyTorch master crashes in this situation; with this patch, it does not anymore.

Test code (careful, segfaults your interpreter):
```python
import torch
a = torch.randn(5)
del a[2]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12726

Differential Revision: D10414244

Pulled By: colesbury

fbshipit-source-id: c49716e1a0a3d9a117ce88fc394858f1df36ed79
2018-10-16 17:24:18 -07:00
6476e4598c Rename TypeMeta function pointers (#12306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12306

In a future diff, I'm going to introduce a non-placement constructor and destructor to TypeMeta.
To make it less ambiguous, this diff first renames the existing ones to PlacementXXX.

Reviewed By: dzhulgakov

Differential Revision: D10184117

fbshipit-source-id: 119120ebc718048bdc1d66e0cc4d6a7840e666a4
2018-10-16 16:45:47 -07:00
d0df1e8ec9 Remove MIOpen Softmax operator (#12727)
Summary:
This PR contains changes for:
1. Removing the MIOpen softmax operator. It will be added later with the required functionality.
2. Enabling softmax_ops_test on the ROCm target.

Differential Revision: D10416079

Pulled By: bddppq

fbshipit-source-id: 288099903aa9e0c3378e068fffe6e7d6a9a84841
2018-10-16 16:45:46 -07:00
30aaa07594 New serialization format (#12384)
Summary:
Addressed Dima's feedback.

The proposal is here: https://fb.quip.com/TbQmAuqIznCf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12384

Reviewed By: dzhulgakov

Differential Revision: D10246743

Pulled By: houseroad

fbshipit-source-id: c80db0c35d60ca32965275da705f2b1dfb2a7265
2018-10-16 16:36:58 -07:00
ac994f2c78 Fix SpectralNorm with DataParallel (#12671)
Summary:
There were two problems with SN + DP:

1. In SN, the updated _u vector is saved back to the module via `setattr`. However, in DP, everything runs on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

The fixes are:
1. Update the _u vector in place so that, via the storage shared between the first replica and the parallelized module, the update is retained (a sketch follows this list).
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.
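
A minimal sketch of fix (1), with made-up shapes rather than the actual spectral_norm code:
```python
import torch
import torch.nn.functional as F

weight_mat = torch.randn(4, 3)
u, v = torch.randn(4), torch.randn(3)
with torch.no_grad():
    # one power-iteration step
    v = F.normalize(torch.mv(weight_mat.t(), u), dim=0)
    # write into the existing buffer instead of rebinding the attribute,
    # so the first replica's shared storage sees the update
    u.copy_(F.normalize(torch.mv(weight_mat, v), dim=0))
```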

cc crcrpar taesung89 yaoshengfu

Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671

Differential Revision: D10410232

Pulled By: SsnL

fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
2018-10-16 16:02:17 -07:00
c414eb2618 fix improper calling of ShareExternalPointer from RNN op (#12593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12593

size() returns numel_, but what we really want is nbytes(), which is the capacity.

Reviewed By: salexspb

Differential Revision: D10354488

fbshipit-source-id: f7b37ad79ae78290ce96f37c65caa37d91686f95
2018-10-16 15:58:14 -07:00
4d698cae2e Enhance shape inference in ONNXIFI transformer (#12685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12685

In this diff, we push the fake run of the net into the ONNXIFI transformer, because
1. We cannot do shape inference for every op
2. Since the net has been SSA-rewritten, we cannot use shape info from the outer workspace directly.

In addition, this diff adds input shape info when querying the `onnxBackendCompatibility` function.

Reviewed By: bddppq

Differential Revision: D10390164

fbshipit-source-id: 80475444da2170c814678ed0ed3298e28a1fba92
2018-10-16 14:15:46 -07:00
f53d5e0a75 Automatic update of fbcode/onnx to 1cbe2743cda739ff752d6ce79553b0ef8ad49783 (#12676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12676

Previous import was 06f6d63d5529e3a94533c9f34c402be1793420b1

Included changes:
- **[1cbe274](https://github.com/onnx/onnx/commit/1cbe274)**: fix the optimizer (#1510) <Lu Fang>
- **[481ad99](https://github.com/onnx/onnx/commit/481ad99)**: Fix TensorProto int32_data comment (#1509) <Lutz Roeder>
- **[f04fbe0](https://github.com/onnx/onnx/commit/f04fbe0)**: fix ninja external (#1507) <Rui Zhu>

Reviewed By: jamesr66a, wanchaol

Differential Revision: D10388438

fbshipit-source-id: 298100589ce226c63d4e58edf185c9227fd52c85
2018-10-16 10:24:15 -07:00
e15501fb68 fix bce_with_logits with legacy reduce (#12689)
Summary:
Fix #12624, an internal use case of legacy `reduce`.
Adds a test in test_nn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12689

Reviewed By: ezyang

Differential Revision: D10391195

Pulled By: ailzhang

fbshipit-source-id: 1af2b258c4abb2b6527eaaeac63e8bf1762c66a1
2018-10-16 09:46:58 -07:00
00f0dca4b5 restore caffe2 strides (#12381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12381

The workflow passes after D10150834, so we can restore strides.

Reviewed By: ezyang

Differential Revision: D10220313

fbshipit-source-id: aaf9edebf4ff739cbe45b2d32e77918fce47ba34
2018-10-16 09:19:42 -07:00
7035975508 fix double free exposed by latest llvm (#12697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12697

The latest LLVM started reporting a double free related to this code. The stack trace: P60181558
Fix it by using the leaky Meyers' singleton.

Reviewed By: meyering

Differential Revision: D10352976

fbshipit-source-id: 11afc2999235831da10c73609d1153d04742ba18
2018-10-16 07:32:08 -07:00
a9981c8477 Remove Type.tensor, Type.native_tensor. (#12687)
Summary:
They aren't needed anymore now that at::empty can handle all backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12687

Differential Revision: D10390740

Pulled By: gchanan

fbshipit-source-id: 521d6f92448798aa368186685662451e191c0b05
2018-10-16 07:12:16 -07:00
7d24985852 Kill is_type_dispatched. (#12684)
Summary:
All factory functions are now implemented in terms of TensorOptions, which is passed through Type, if necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12684

Differential Revision: D10390224

Pulled By: gchanan

fbshipit-source-id: fb536271735e6e0e542f021e407529998b0482eb
2018-10-16 07:05:49 -07:00
5b8a640d0b Update fft docs for new cache size (#12665)
Summary:
Follow up of #12553
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12665

Differential Revision: D10385615

Pulled By: SsnL

fbshipit-source-id: 44fe9ec75cb735de37c56270f160a16a1d2bfb64
2018-10-16 01:47:36 -07:00
0916f4a337 Remove caffe2/submodules/cereal-rev.txt
Summary: Zero-th step in removing the cereal submodule.

Reviewed By: yns88

Differential Revision: D10385343

fbshipit-source-id: cc93c22b2cafa73f929f2f7659a6f6e66458aa7e
2018-10-16 01:42:20 -07:00
04d4ec285c Cleanup namespace that were moved to ATen accidentally (#12680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12680

torch::jit shouldn't live in aten

Reviewed By: ezyang

Differential Revision: D10389502

fbshipit-source-id: f38582e61a275edccf22845c7d709a201f6a0be1
2018-10-16 01:25:08 -07:00
eb02a1d8a7 Fix clang tidy master comparison (#12674)
Summary:
This PR makes the clang-tidy CI get its diff by comparing the current commit against the base branch that the PR is targeting.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12674

Differential Revision: D10397692

Pulled By: goldsborough

fbshipit-source-id: 7fd9e22c92dd885112cd5c003c732d1c12667157
2018-10-16 01:17:18 -07:00
31d8e5e71a Improve Python API with the addition of pythonic setters/getters
Summary:
Simple additions that make it vastly easier to use nomnigraph in
python

Reviewed By: duc0

Differential Revision: D10383027

fbshipit-source-id: 441a883b84d4c53cca4f9c6fcc70e58692b8f782
2018-10-16 00:57:54 -07:00
f2b62e113c Clean up IR.h (#12551)
Summary:
Move a lot of methods that don't have an obvious reason for being inline out-of-line. This cleans up the header and should help reduce the problem of touching IR.h and having to rebuild the world.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12551

Differential Revision: D10384808

Pulled By: resistor

fbshipit-source-id: 314af89e3282f35fdc94fa3fd3000e3040c8cb6b
2018-10-15 21:21:39 -07:00
058c1284be Fix the symbolic for pixel shuffle (#12192)
Summary:
Using Transpose + Reshape, not DepthToSpace, since DepthToSpace is not available in C2 yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12192

Reviewed By: BIT-silence

Differential Revision: D10129913

Pulled By: houseroad

fbshipit-source-id: b60ee6d53b8ee95fd22f12e628709b951a83fab6
2018-10-15 19:53:35 -07:00
a1dd608260 Reduce MAX_JOBS for pytorch rocm build to make CI more stable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12662

Differential Revision: D10393109

Pulled By: bddppq

fbshipit-source-id: e14f72ebc877b5c0f75fe5d195c8b4dbb9b111db
2018-10-15 18:12:46 -07:00
d80a3eb549 Set philox seed and offset on cuda manual_seed (#12677)
Summary:
Fixes: #12669

Thank you Changmao Cheng for reporting this on the forum with a small example!
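
Roughly the behavior the fix restores (a sketch; requires a CUDA device):
```python
import torch

torch.cuda.manual_seed(0)
a = torch.randn(3, device='cuda')
torch.cuda.manual_seed(0)  # now also resets the philox seed/offset
b = torch.randn(3, device='cuda')
assert torch.equal(a, b)
```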
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12677

Differential Revision: D10391989

Pulled By: ezyang

fbshipit-source-id: 5aa7a705bdb8ce6511a8eb1b3a207f22741046bf
2018-10-15 17:45:59 -07:00
01a333fd7f OpenCV 4.0 Compatibility fix (#9966)
Summary:
Caffe2 compiles with the latest OpenCV 4.0 after the committed changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9966

Differential Revision: D10369130

Pulled By: ezyang

fbshipit-source-id: 9a104803edca5a22e27e140a794e4b8c878ca416
2018-10-15 17:42:04 -07:00
083e037dea minor fix (#12688)
Summary:
This seems to be a typo that never got caught - no actual functionality changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12688

Differential Revision: D10391704

Pulled By: Yangqing

fbshipit-source-id: ce633776957628c4881956c5423bfab78294d512
2018-10-15 17:25:49 -07:00
23c4dbd6d7 Fix ONNX upsample mode (#12648)
Summary:
Fixes #12647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12648

Differential Revision: D10389124

Pulled By: houseroad

fbshipit-source-id: 53bc17b592d0d7f1884b555f3a12a33dbf18b4a0
2018-10-15 17:14:44 -07:00
7a52117792 Add AdaptiveAvgPool2d and AdaptiveMaxPool2d to ONNX.symbolic (#9711)
Summary:
Add AdaptiveAvgPool2d and AdaptiveMaxPool2d to ONNX.symbolic
Due to limitations in ONNX, only output_size=1 is supported.
AdaptiveAvgPool2d -> GlobalAveragePool
AdaptiveMaxPool2d -> GlobalMaxPool
Fixes #5310
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9711

Differential Revision: D10363462

Pulled By: ezyang

fbshipit-source-id: ccc9f8ef036e1e54579753e50813b09a6f1890da
2018-10-15 17:02:20 -07:00
52cbf4b774 Update eigen submodule to fix CUDA arch>=5.3 build issue. (#12191)
Summary:
Discussed in #11379, #12545. Eigen submodule needs to be updated to f59336cee3 to support building with CUDA arch >= 5.3.

It seems there was a similar fix checked in from #6746, but later the Eigen submodule was switched to the current mirror in #7793, at a point where the fix was not included.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12191

Differential Revision: D10362557

Pulled By: ezyang

fbshipit-source-id: 548541e2c93f412bf6680ee80b8da572846f80d2
2018-10-15 17:02:19 -07:00
e22a776890 Fix for some tests (#12575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12575

Just my guess as to why those tests are failing. Waiting on sandcastle to see if the tests resolve themselves.

Reviewed By: mlappelbaum, wesolwsk

Differential Revision: D10305051

fbshipit-source-id: 455597b12bbe27dd6c16f7d0274f2c939949d878
2018-10-15 16:53:18 -07:00
0b96e5d792 Move some files to c10/util (#12245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12245

Move these files to c10/util:
- C++17.h
- Metaprogramming.h
- TypeList.h
- TypeTraits.h
- Array.h

(including .cpp files and test cases)

Reviewed By: ezyang

Differential Revision: D10139933

fbshipit-source-id: ce7ce89392bf1a6be070ffdfc0407a8a2ce4ba6e
2018-10-15 16:25:12 -07:00
ade97afc74 Re-enable IDEEP graph rewrite test (#12661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12661

It was disabled since workspace.has_mkldnn is now set to false.

Reviewed By: yinghai

Differential Revision: D10383913

fbshipit-source-id: ad6dc705f0606b3711e8b450dc384ad3ebb87686
2018-10-15 15:50:28 -07:00
ab7520eb50 Revamp and document serialization, support streams (#12421)
Summary:
This PR does three things:

1. Add support for serializing to `ostream` and deserializing from `istream`s in addition to files. This is after https://github.com/pytorch/pytorch/pull/11932 added support for streams in `torch::jit::ExportModule` and `torch::jit::load`.
2. Update the internal interface for how things get serialized into archives (e.g. use the more idiomatic `operator<<` instead of a `save` method). *The external interface does not change*.
3. Add documentation.

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12421

Reviewed By: ezyang

Differential Revision: D10248529

Pulled By: goldsborough

fbshipit-source-id: 6cde6abd0174e3fbf3579c05376a32db0b53755f
2018-10-15 15:47:59 -07:00
03429e4eaf Update Gloo submodule to resolve __CUDA_DEPRECATED warning (#12574)
Summary:
Gloo was updated with `type` usage for cudaPointerAttributes which resolves the `__CUDA_DEPRECATED` warnings in our CUDA 10 CI. This PR brings in that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12574

Differential Revision: D10342450

Pulled By: ezyang

fbshipit-source-id: d50564bfcd8623a20b82b0052fba441c8358c17b
2018-10-15 15:45:13 -07:00
ef18f74e20 Simplify typeid macros (#12654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12654

The previous diff managed to get the macros working, but they've been quite unmaintainable.
This diff improves the situation a bit.

- Before, there were three global variables for each registered type: type id, type name and a global type meta instance. Now, it's only type id and type meta, type name is gone. I also wanted to get rid of type id, but that doesn't work due to issues with static initialization ordering (type ids for types are requested during static initialization time, meh)
- Instead of repeating the whole CAFFE_KNOWN_TYPE macro for GCC and non-GCC because they need different export flags, define it only once and use an EXPORT_IF_NOT_GCC macro.
- The CAFFE_KNOWN_TYPE macro has to delegate to a _CAFFE_KNOWN_TYPE_DEFINE_TYPEMETADATA_INSTANCE macro, because of the counter. The pattern was copied for the macros for preallocated types. However, there we don't use a counter but use the preallocated id, so there's no need to delegate to a separate macro.

Reviewed By: ezyang

Differential Revision: D10379903

fbshipit-source-id: 50a32a5cb55ab85db49618a5f1ee4e8b06e0dfb2
2018-10-15 15:42:10 -07:00
bb35d085ef Dispatch backend-specific TensorOptions-based 'factory' functions via… (#12071)
Summary:
… Type.

This allows one to write a cpu/cuda split 'factory' function that uses TensorOptions.
Also move all remaining native_functions with either function or method variants that use Type to use TensorOptions.
Thus, there are no more Types in the public function / method API.

I believe there is a _lot_ of opportunity for cleanup here, as the old tensor, th_tensor, native_tensor and sparse variants can probably be removed, but let's do that in a follow-on patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12071

Reviewed By: ezyang

Differential Revision: D10041600

Pulled By: gchanan

fbshipit-source-id: 30ebc17146d344bc3e32ccec7b98b391aac5470b
2018-10-15 15:21:11 -07:00
86aa6a61e0 Dedup MethodValue and FunctionValue (#12589)
Summary:
... they are basically the same class and I didn't see it in the initial PR. I also got resolvers back onto std::functions by keeping the function_table logic local to defineMethodInModules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12589

Differential Revision: D10383103

Pulled By: zdevito

fbshipit-source-id: 1b0a85eb4f112bc28256cac44446d671d803d3a2
2018-10-15 15:00:54 -07:00
71d142604f Add upcoming features to schema parser (#12585)
Summary:
This commit adds the hooks in schema parser for futures, options,
mutable alias sets, marking writes, and named output arguments that
need to exist for other upcoming work.

This also fixes the problem where you could not declare Lists of Lists.

Implementation of most of these features is left NYI. This commit should
avoid merge conflicts for these individual features on the schema parser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12585

Differential Revision: D10382229

Pulled By: zdevito

fbshipit-source-id: 41d794e58ca462cf3a389861c533c68944dc560b
2018-10-15 14:51:42 -07:00
4c21b2f2d3 split register_aten_ops.cpp into shards (#12615)
Summary:
After an analogous breakup of VariableType.cpp, the generated
register_aten_ops.cpp is now the slowest-to-compile file in a typical
incremental rebuild by a wide margin. Therefore, give it the same
treatment - the generated code is split across several files to allow
parallel compilation.

Note that the existing code takes some care to arrange that overloads
of the same op name are given in a particular order. This diff
preserves that behavior, by treating all overloads of the same name as
a single indivisible unit, and sharding based on these groups rather
than on individual constructors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12615

Reviewed By: ezyang

Differential Revision: D10367363

Pulled By: anderspapitto

fbshipit-source-id: 07db5f9cb79748040909716349626412a13bc86e
2018-10-15 14:12:27 -07:00
c6f0fe5f26 CircleCI: Remove --depth from git fetch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12657

Differential Revision: D10386020

Pulled By: yf225

fbshipit-source-id: 08d1c57159b323c19d5fc94180972d0c70d6aec1
2018-10-15 13:55:27 -07:00
6f339cac6b Windows local dev: install conda in user-specific directory to avoid conflict (#12663)
Summary:
Currently when developing on the shared Windows debug machine, it's very easy to accidentally wipe out someone else's working binary because the conda environment is shared. This PR fixes that by always installing conda in the user's directory instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12663

Differential Revision: D10386130

Pulled By: yf225

fbshipit-source-id: 1242ef8b2b4239c4a96459a59eb0255b44ed9628
2018-10-15 13:46:12 -07:00
bbe6ef3864 torch.finfo and torch.iinfo to mimic the numpy equivalent (#12472)
Summary:
This pull request intends to provide the functionality requested in https://github.com/pytorch/pytorch/issues/10742 by adding a new torch.finfo and torch.iinfo API.
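
A few example queries, mirroring the numpy equivalents (assuming a recent PyTorch):
```python
import torch

torch.finfo(torch.float32).eps  # machine epsilon for float32
torch.finfo(torch.float16).max  # largest representable float16 value
torch.iinfo(torch.int32).max    # 2147483647
```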
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12472

Differential Revision: D10250829

Pulled By: benoitsteiner

fbshipit-source-id: eb22ca55d5b0064bef381fa7f1eb75989977df30
2018-10-15 13:43:52 -07:00
e8d8ccb34a Emphasize that the /path/to/libtorch must be absolute (#12660)
Summary:
ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12660

Differential Revision: D10386952

Pulled By: goldsborough

fbshipit-source-id: efd82f2aa3a349e9acd29303984b8fd7c3208c3f
2018-10-15 13:41:18 -07:00
a74cc03aa7 Use branch of exhale that fixes overloads (#12668)
Summary:
Docs for [`torch::jit::load`](https://pytorch.org/cppdocs/api/function_namespacetorch_1_1jit_1ace2c44fb8af5905ae17834e81086b8a3.html#exhale-function-namespacetorch-1-1jit-1ace2c44fb8af5905ae17834e81086b8a3) are currently broken. svenevs has a fix on this branch, and we need to update to it.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12668

Differential Revision: D10386949

Pulled By: goldsborough

fbshipit-source-id: 1887ba53989e5a77b178f8b2782a7b3ae52b7405
2018-10-15 13:39:01 -07:00
713e706618 Move exception to C10 (#12354)
Summary:
There is still some work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through and need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- We need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and does not cause functional changes. If you find your job failing and trace it back to this diff, it can usually be fixed by one of the following approaches:

(1) Add //caffe2/c10:c10 to your dependencies (or transitive dependencies).
(2) Change objects such as at::Error and at::Optional to the c10 namespace.
(3) Change functions to the c10 namespace. In particular, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
aef8cadb9a mark Storage functions as const (#12623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12623

Mark Storage functions as const so that they can be exposed outside of TensorImpl when calling storage()

Based on this discussion https://github.com/zdevito/ATen/issues/27#issuecomment-330717839

Also potentially useful in the effort to remove ShareExternalPointer

Reviewed By: ezyang

Differential Revision: D10370201

fbshipit-source-id: 43cf3803a4aa7b94fdf0c3a604d7db769ca0bdd5
2018-10-15 13:03:28 -07:00
189c1e1afb Rewrite http://pytorch.org -> https://pytorch.org throughout project (#12636)
Summary:
The pytorch.org site redirects all of the http:// requests to the https:// site anyway, so the comments and error messages might as well refer directly to the https:// site. The GitHub project description should also be updated to point to https://pytorch.org
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12636

Differential Revision: D10377099

Pulled By: soumith

fbshipit-source-id: f47eaba1dd3eecc5dbe62afaf7022573dc3fd039
2018-10-15 13:03:27 -07:00
a6c7cf8741 python bindings: enable generic nn operator handling
Summary: hotfix to unblock Dong Shi

Reviewed By: duc0

Differential Revision: D10385763

fbshipit-source-id: 80badd31c1039a245f32940c719e867a86ec7e47
2018-10-15 12:55:42 -07:00
0740a5d521 compute_uv for SVD (#12517)
Summary:
Adds a `compute_uv` argument that defaults to `True` for optionally computing the singular vectors during SVD.

Closes https://github.com/pytorch/pytorch/issues/12420.
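A usage sketch (with `compute_uv=False`, the returned u and v come back as zero-filled tensors):
```python
import torch

a = torch.randn(5, 3)
u, s, v = torch.svd(a)                         # full decomposition
_, s_only, _ = torch.svd(a, compute_uv=False)  # singular values only
print(torch.allclose(s, s_only))               # True
```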
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12517

Differential Revision: D10384554

Pulled By: SsnL

fbshipit-source-id: 704998a257afa815eda901b8ae830e8a661695be
2018-10-15 12:35:56 -07:00
d5eae90537 update onnx tests (#12619)
Summary:
Fixes #12586
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12619

Reviewed By: ezyang

Differential Revision: D10377548

Pulled By: houseroad

fbshipit-source-id: 1166e40aa8b98f1fe015fb1bdb2e90acfad3c356
2018-10-15 11:59:19 -07:00
d17b0bc679 Allow running root tasks inline (#12289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12289

When we have an all-sync net, the chaining algorithm will generate one single group, and we want to just run that in the serving thread instead of scheduling it onto the worker queue. This closely mimics the behavior of simple_net and gives us the expected performance.

Reviewed By: ilia-cher

Differential Revision: D10174323

fbshipit-source-id: 1dae11a478936634f8ef1e4aa43d7884d6362e52
2018-10-15 11:14:12 -07:00
a1bbe80e21 Remove NervanaGPU operators from Caffe2 (#12564)
Summary:
Fix #12540
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12564

Reviewed By: orionr

Differential Revision: D10379775

Pulled By: soumith

fbshipit-source-id: a925b116f2687e56bf54465fc02ca2eb1e7c8eb0
2018-10-15 11:04:46 -07:00
151b28521a Fix Windows test script on local dev machine (#12073)
Summary:
We should not clean up Miniconda environment when the user is running `win-test.sh` locally.

This would help reproduce #11527 locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12073

Differential Revision: D10053497

Pulled By: yf225

fbshipit-source-id: 11027500e7917a7cb79270c811379e11dbbb6476
2018-10-15 09:36:50 -07:00
7326739188 Remove out-of-date TODO.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12638

Differential Revision: D10376584

Pulled By: gchanan

fbshipit-source-id: 47fb0333cd9e41a66c2e215f91e129fe19dc9225
2018-10-15 08:45:59 -07:00
07d67aa17a Make TensorOptions immutable. (#12630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12630

Instead of providing mutable accessors, our "mutators" now
return new copies of TensorOptions.  Since TensorOptions is
simply two 64-bit integers, this is not a big efficiency
problem.

There may be some sites that assumed that TensorOptions was
mutable.  They need to be fixed.

Reviewed By: SsnL

Differential Revision: D10249293

fbshipit-source-id: b3d17acc37e78c0b90ea2c29515de5dd01209bd3
2018-10-15 08:30:16 -07:00
1014c8a7db 'Re-sync with internal repository' (#12652) 2018-10-15 10:57:10 -04:00
6dd71947ea remove unused Iterable, also avoid Python 3.7 deprecation warning
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12639

Differential Revision: D10377094

Pulled By: soumith

fbshipit-source-id: d904c4c1bbac900e44ea0b3b5635697159aec717
2018-10-15 02:30:22 -07:00
eaf33f22c8 Revert D10123465: Set the correct engine name for position weighted pooling when fp16 is used for training
Differential Revision:
D10123465

Original commit changeset: e8d929d4153d

fbshipit-source-id: 36269e49ac79955fe695ac1a53a3c386aa2f5bec
2018-10-15 01:53:48 -07:00
02695c11db fix masked_fill_ bug on non-contiguous tensor (#12594)
Summary:
Bug fix for #12230; the following script passes after the fix:
```python
x = torch.randn(2, 2, 2)
x = x.permute((2, 0, 1))
y = x.clone()
y.masked_fill_(y > 0, 1)
x.masked_fill_(x > 0, 1)
print((x == y).all())
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12594

Differential Revision: D10377088

Pulled By: soumith

fbshipit-source-id: 88feabe1459d325bfdf9a860412ddbd28686a28b
2018-10-14 23:12:27 -07:00
0c6ab0e8f4 Delete caffe2/mkl, and references. (#12625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12625

It's obsoleted by ideep

Reviewed By: Yangqing

Differential Revision: D10372230

fbshipit-source-id: 2d6475ae72389dd654ba0bcbb57766530eb4ac1a
2018-10-13 22:02:32 -07:00
a98958d3bd dtype option for softmax (#11719)
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed precision training, but converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, memory- and perf-wise; this PR allows one to avoid that.
For most input data/dtype combinations, input data is converted to dtype and then softmax is computed. If input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
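A usage sketch of the new argument (assuming a CUDA device, since half-precision inputs are the motivating case):
```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 10, dtype=torch.float16, device='cuda')
# softmax reads the fp16 input but computes and returns in fp32,
# without first materializing an fp32 copy of x
y = F.softmax(x, dim=-1, dtype=torch.float32)
print(y.dtype)  # torch.float32
```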
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
2018-10-13 17:57:10 -07:00
e986f307c3 Fix math formatting of PairwiseDistance docs (#12628)
Summary:
`:math:` was being displayed in the docs for https://pytorch.org/docs/stable/nn.html#torch.nn.PairwiseDistance.

I haven't tested this locally, but I assume it works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12628

Differential Revision: D10373778

Pulled By: SsnL

fbshipit-source-id: 6eb918c521e73c17f6662d83f69e0e4b14dec860
2018-10-13 16:39:15 -07:00
a91f3338a0 Some documentation fixes (#12521)
Summary:
ezyang soumith

Partly addresses https://github.com/pytorch/cppdocs/issues/2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12521

Differential Revision: D10374244

Pulled By: goldsborough

fbshipit-source-id: 8e9fe688cbaa2d2b0b96f721e5477ee8845b8f20
2018-10-13 14:20:42 -07:00
1f94ce1f97 Fix aten::to export in ONNX
Summary: D10356994 broke ONNX export for casting, this fixes it

Reviewed By: wanchaol

Differential Revision: D10366103

Pulled By: jamesr66a

fbshipit-source-id: 039454cce571a1186265708e7ddcb946814cc8b0
2018-10-12 21:20:01 -07:00
635cbff300 Set the correct engine name for position weighted pooling when fp16 is used for training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12225

Reviewed By: hyuen, xianjiec

Differential Revision: D10123465

fbshipit-source-id: e8d929d4153d1ee987ae3d1c37892525d7574d16
2018-10-12 20:15:13 -07:00
6bc8d303eb Update onnx to onnx/onnx@06f6d63 (#12621)
Summary:
06f6d63d55
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12621

Differential Revision: D10368472

Pulled By: bddppq

fbshipit-source-id: b62fbbc0ad5bc41c5e7221ba889b1061087c3214
2018-10-12 17:25:20 -07:00
63a220f54d Deprecate prof_dag (#11956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11956

Deprecate prof_dag and redirect it to the unified executor

Reviewed By: aazzolini

Differential Revision: D9983992

fbshipit-source-id: 16821628a99a5683dc39cbb345ddab56e9d8721c
2018-10-12 16:37:57 -07:00
53f4dbc9ac test_proper_exit: avoid truncation of info message (#12612)
Summary:
test_proper_exit in the dataloader test bucket includes
(as its docstring) a reassuring message about complaints that
may appear during the test. The message is displayed
when the tests are run in verbose mode.

But the docstring includes a line break, and the unittest
framework only prints the first line of the docstring (see
shortDescription()). As a result, the 2nd (more reassuring)
half of the message is not displayed.

Concatenate the docstring onto a single line so all of it is visible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12612

Differential Revision: D10368786

Pulled By: ezyang

fbshipit-source-id: 14b259a6d6a3491d4290148eae56e6ab06f2a9b6
2018-10-12 16:32:28 -07:00
17ab3bd502 implement rowwise quantization for fp16 (#12382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382

Implement fp16 -> (uint8 + scale and bias in fp32).

This is similar to fp32 rowwise quantization.

We could have stored the scale and bias in fp16, but there is little motivation to: we would not save much, and those values have to be converted to fp32 for processing anyway, since x86 doesn't support half-float arithmetic.
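For intuition, a minimal numpy sketch of the uint8 + per-row fp32 scale/bias layout (illustrative only; the actual Caffe2 operator may differ in details such as rounding):
```python
import numpy as np

def rowwise_quantize(x_fp16):
    x = x_fp16.astype(np.float32)            # process in fp32 on x86
    bias = x.min(axis=1, keepdims=True)
    scale = (x.max(axis=1, keepdims=True) - bias) / 255.0
    scale[scale == 0] = 1.0                  # guard rows with a single value
    q = np.round((x - bias) / scale).astype(np.uint8)
    return q, scale.astype(np.float32), bias.astype(np.float32)

def rowwise_dequantize(q, scale, bias):
    return q.astype(np.float32) * scale + bias

x = np.random.randn(4, 16).astype(np.float16)
q, scale, bias = rowwise_quantize(x)
print(np.abs(rowwise_dequantize(q, scale, bias) - x.astype(np.float32)).max())
```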

Reviewed By: csummersea

Differential Revision: D10220463

fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
2018-10-12 13:57:55 -07:00
7a1b668283 Implement Tensor.__cuda_array_interface__. (#11984)
Summary:
_Implements pytorch/pytorch#11914, cc: ezyang_

Implements `__cuda_array_interface__` for non-sparse cuda tensors,
providing compatibility with numba (and other cuda projects...).

Adds `numba` installation to the `xenial-cuda9` jenkins test environments via direct installation in `.jenkins/pytorch/test.sh` and numba-oriented test suite in `test/test_numba_integration.py`.

See interface reference at:
https://numba.pydata.org/numba-doc/latest/cuda/cuda_array_interface.html
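A small sketch of what the interface exposes (assuming a CUDA build):
```python
import torch

t = torch.arange(6, dtype=torch.float32, device='cuda').view(2, 3)
iface = t.__cuda_array_interface__
# a dict following numba's CUDA array interface spec, with keys such as
# 'shape', 'typestr', and 'data' (device pointer, read-only flag)
print(iface['shape'], iface['typestr'])  # (2, 3) '<f4'
```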
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11984

Differential Revision: D10361430

Pulled By: ezyang

fbshipit-source-id: 6e7742a7ae4e8d5f534afd794ab6f54f67808b63
2018-10-12 13:41:05 -07:00
134b5d62e8 don't copy weight gradients in rnn (#12600)
Summary:
This PR gets rid of an unnecessary copy of weight gradients in the cuDNN RNN. It also removes an unnecessary check on input size when deciding whether to use persistent RNN, and adds a docstring explaining when persistent RNN can be used. cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12600

Differential Revision: D10359981

Pulled By: soumith

fbshipit-source-id: 0fce11b527d543fabf21e6e9213fb2879853d7fb
2018-10-12 13:34:10 -07:00
49256ddb4a split generated VariableType.cpp (#12493)
Summary:
On my devgpu, this brings the time taken for `touch torch/csrc/jit/type.h && time python setup.py rebuild develop` (debug mode, multicore build) down from 75 seconds to 62 seconds. For the `ninja install` of libtorch portion, which this affects, the reduction is from 52 seconds to 35.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12493

Reviewed By: zdevito

Differential Revision: D10315988

Pulled By: anderspapitto

fbshipit-source-id: 316dc4ab81134aaa17a568cfc07408b7ced08c2e
2018-10-12 13:14:44 -07:00
3f52a0aad7 Fix the linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12613

Differential Revision: D10364963

Pulled By: houseroad

fbshipit-source-id: f9e2a76c1ab021cce4f45f5b4e74ddcc9618c138
2018-10-12 13:12:08 -07:00
239b2ac718 make the variable declaration closer to usage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9262

Differential Revision: D10363576

Pulled By: ezyang

fbshipit-source-id: 05c8eb12f3b389caf562cca9e338cc91b0e9acc1
2018-10-12 12:07:08 -07:00
15bdb9fe61 remove duplicate BUILD_TEST flag in libtorch cmake file (#12583)
Summary:
There is already a BUILD_TEST flag in the root-level cmake file. Removing this duplicate makes sure it doesn't interfere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12583

Differential Revision: D10348620

Pulled By: anderspapitto

fbshipit-source-id: 3957783b947183e76a4479a740508c0dc1c56930
2018-10-12 11:53:07 -07:00
7da4643232 Caffe2: fix error C2398 and syntax error with Visual Studio 2015 (#10089)
Summary:
Similar fix to [pull #7024](https://github.com/pytorch/pytorch/pull/7024).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10089

Differential Revision: D10363341

Pulled By: ezyang

fbshipit-source-id: bc9160e2ea75fc77acf3afe9a4e20f327469592e
2018-10-12 11:47:34 -07:00
c1d0784dcb enable onnx integration tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12592

Reviewed By: BIT-silence, zrphercule

Differential Revision: D10363056

Pulled By: houseroad

fbshipit-source-id: 4d1dc0302a8cbe3d6ff1594f0d038330ba4efc81
2018-10-12 11:34:16 -07:00
97eec33f80 Allow tensor.device, tensor.dtype, and tensor.shape in JIT (#12363)
Summary:
Closes https://github.com/pytorch/pytorch/issues/12364
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12363

Differential Revision: D10362491

Pulled By: ezyang

fbshipit-source-id: f2716e656977370c5ec51cb15f62b6376798e617
2018-10-12 11:29:04 -07:00
5317429e82 move bceWithLogits from python to Aten (#11054)
Summary:
Fixes #10648 .
Perf comparison:
```
import torch
import torch.nn as nn
import time

def bm(testsize, repeat=100, cuda=False):
    total_time = 0.0
    pos_weight= torch.ones(testsize[1], device='cuda' if cuda else 'cpu') / testsize[1]
    # loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    loss = nn.BCEWithLogitsLoss()
    input = torch.randn(testsize, device='cuda' if cuda else 'cpu').clamp_(2.8e-2, 1 - 2.8e-2)
    target = torch.randn(testsize, device='cuda' if cuda else 'cpu').gt(0).float()
    input.requires_grad = True
    target.requires_grad = True
    for _ in range(repeat):
        start = time.time()
        l = loss(input, target)
        l.backward()
        # print(target.grad)
        end = time.time()
        total_time += end - start
    return total_time

for cuda in [False, True]:
    for testsize in [(100, 100), (1000, 1000), (2000, 2000)]:
        # print(testsize, cuda)
        print('{:.5f}'.format(bm(testsize, cuda=cuda)))
```
|    | Python CPU | Aten CPU | Python GPU | Aten GPU
| ------------- | ------------- | ------------- | ------------- | ------------- |
| (100, 100)  | 0.15813s | 0.10890s | 0.14601s | 0.07070s |
| (1000, 1000)  | 1.74051s | 0.95038s | 0.15158s | 0.10153s |
| (2000, 2000) | 5.36515s | 2.46996s | 0.31322s | 0.200941s |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11054

Differential Revision: D9728289

Pulled By: ailzhang

fbshipit-source-id: b7c5bc50635f8cc63c317caa4321e32f7df860f8
2018-10-12 11:13:33 -07:00
6069f6f454 Try to prevent occasional timeout in test_proper_exit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12587

Differential Revision: D10361411

Pulled By: SsnL

fbshipit-source-id: 97d0ff9d40918b7729c21f4de6d8cabeb65c728a
2018-10-12 10:53:01 -07:00
12686ec656 fix _AllReduce not applying the DeviceScope guard to model.Copy operations. (#12342)
Summary:
This resolves an issue where the `model.Copy` operation would
copy to the wrong GPU, such that the below `net.Sum` operation
would use an input argument for which p2p access was not enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12342

Differential Revision: D10343181

Pulled By: ezyang

fbshipit-source-id: fd2d6d0ec6c09cda2db0a9a4f8086b3560e5a3ec
2018-10-12 10:47:58 -07:00
dfad8b60ba Remove duplicate codes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12526

Differential Revision: D10342611

Pulled By: ezyang

fbshipit-source-id: 470b4a181fd9091c3fd33d3d43a2cf6d44594202
2018-10-12 09:58:44 -07:00
038d5ca943 Remove incompatibility MSVC, Cuda and Debug (#12572)
Summary:
Experimentally this works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12572

Differential Revision: D10342468

Pulled By: ezyang

fbshipit-source-id: dc36587c32ab0910aa14b7351ca12532acd41c7d
2018-10-12 09:52:13 -07:00
63e09707a2 Use SFINAE instead of macros for 'long' hack (#12605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12605

Some compilers define 'long' as a separate type from 'int32_t' or 'int64_t', some don't.
Before, we had a cmake check setting a macro and depending on the macro, we either defined a separate type id for long or didn't.
Then, we removed the cmake check and used compiler detection macros directly. This is, however, error prone.

This new approach uses SFINAE to register a type id for 'long' only if it's a separate type.

Reviewed By: Prowindy

Differential Revision: D10359443

fbshipit-source-id: aa371cbb43658c8cd3664ba3d9b0dedbaa225c1d
2018-10-12 09:46:07 -07:00
b57fdf1db5 Properly set cmake python library and include_dirs (#12569)
Summary:
Properly set the cmake python_library and include_dirs hints, so that systems with multiple versions of Python can still find the correct libraries and header files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12569

Differential Revision: D10359910

Pulled By: soumith

fbshipit-source-id: 2238dcbed7aac8a818c9435e6bba46cda5f81cad
2018-10-12 08:11:21 -07:00
48bc57fa8d Introduce chain_matmul (#12380)
Summary:
- This was one of the few functions left out of the list of functions in
  NumPy's `linalg` module
- `multi_mm` is particularly useful for DL research, for quick analysis of
  deep linear networks (see the usage sketch below)
- Added tests and doc string
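A usage sketch of the new function (exposed as `torch.chain_matmul`):
```python
import torch

a, b, c = torch.randn(3, 4), torch.randn(4, 5), torch.randn(5, 2)
# chooses the association order that minimizes total multiply cost
out = torch.chain_matmul(a, b, c)
print(torch.allclose(out, a @ b @ c, atol=1e-5))  # True, up to reassociation error
```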
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12380

Differential Revision: D10357136

Pulled By: SsnL

fbshipit-source-id: 52b44fa18d6409bdeb76cbbb164fe4e88224458e
2018-10-12 03:58:12 -07:00
0cf3c1ce66 Add copy= keyword to Tensor.to (#12571)
Summary:
Fixes: #12454
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12571

Differential Revision: D10356994

Pulled By: SsnL

fbshipit-source-id: d87416078a5a8e5ffa690cd73c09fa6b4e16aa25
2018-10-12 02:10:44 -07:00
2279299c6c Implement aten::contiguous (#12541)
Summary:
Implement contiguous as `aten::contiguous` so it can be recorded during tracing. The old behavior was causing issues with the trace checker, as well as when a `contiguous()`-ed tensor was used downstream in a view that expected certain strides.
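A sketch of the observable effect (assuming the `torch.jit.trace(fn, example_inputs)` API of the time):
```python
import torch

def f(x):
    # t() yields a non-contiguous view; contiguous() now shows up in the
    # trace as an aten::contiguous node instead of being silently elided
    return x.t().contiguous().view(-1)

traced = torch.jit.trace(f, (torch.randn(3, 4),))
print(traced.graph)  # expect an aten::contiguous node in the graph
```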
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12541

Differential Revision: D10304028

Pulled By: jamesr66a

fbshipit-source-id: dc4c878771d052f5a0e9674f610fdec3c6782c41
2018-10-11 23:39:39 -07:00
1be8b7cc56 Delete "default" codeowners from root directories. (#12584)
Summary:
We will still have an informal notion of codeowner, but it
is not necessary to get a review from these people in particular
for these directories.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12584

Differential Revision: D10348999

Pulled By: ezyang

fbshipit-source-id: 97331ec4bab9f1aa02af82b71ad525a44ad1e7fe
2018-10-11 23:18:04 -07:00
0df4d66210 Update caffe2 docker images version in circleci (#12596)
Summary:
72b6d26950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12596

Differential Revision: D10355881

Pulled By: bddppq

fbshipit-source-id: 33c15819ec51315defc23a7fbc23caa2ddd65e75
2018-10-11 21:54:33 -07:00
fa99ed9b30 Emit warning about optimization passes only once
Reviewed By: ajtulloch

Differential Revision: D9584925

fbshipit-source-id: 191035eaefe3ab3980e46598f2ebf34b2b704a9b
2018-10-11 21:41:17 -07:00
01cb90adf1 fix the ONNX test_operator test (#12591)
Summary:
update the expect file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12591

Differential Revision: D10355620

Pulled By: houseroad

fbshipit-source-id: 5acdbf2406d322378025631808108a2d795be916
2018-10-11 21:41:15 -07:00
eb5fdc5fb5 Add default values in script (#12345)
Summary:
Add support for default values on script functions and Modules

Followup to #11962
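A minimal sketch of what this enables (hypothetical function; the types of lo and hi are inferred from their float defaults):
```python
import torch

@torch.jit.script
def clamp_sum(x, lo=0.0, hi=1.0):
    return x.clamp(lo, hi).sum()

print(clamp_sum(torch.randn(4)))             # uses the defaults
print(clamp_sum(torch.randn(4), -2.0, 2.0))  # overrides them
```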
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12345

Reviewed By: michaelsuo

Differential Revision: D10263613

Pulled By: driazati

fbshipit-source-id: 9b380d8c3f8c4abb2d24c33b23c00ec5896ca372
2018-10-11 20:49:23 -07:00
97bee5cd80 Adds max plan number for CUDA 10 cufft plan cache array (#12553)
Summary:
SsnL As per your review in https://github.com/pytorch/pytorch/pull/12017/, I added a max plan number for the CUDA 10 path. Our internal cuFFT team couldn't suggest a number since the limit depends on host/device memory. That is, a plan allocates some buffers on the device and also creates objects for the plans on the host side. I raised this number to 4x arbitrarily, per your suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12553

Differential Revision: D10320832

Pulled By: SsnL

fbshipit-source-id: 3148d45cd280dffb2039756e2f6a74fbc7aa086d
2018-10-11 19:36:25 -07:00
957142a4fe switch ROCm CI targets to white rabbit release (#12577)
Summary:
* switches docker files over to white rabbit release - removed custom package installs
* skips five tests that regressed in that release
* fixes some case-sensitivity issues in ROCm supplied cmake files by sed'ing them in the docker
* includes first changes to the infrastructure to support upcoming hip-clang compiler
* prints ROCm library versions as part of the build (as discussed w/ ezyang )
* explicitly searches for miopengemm
* installs the new hip-thrust package to be able to remove the explicit Thrust checkout in a future revision
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12577

Differential Revision: D10350165

Pulled By: bddppq

fbshipit-source-id: 60f9c9caf04a48cfa90f4c37e242d944a175ab31
2018-10-11 18:03:11 -07:00
93a4b76114 Enable alternative LayerNorm impl in FisherGan (#12178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12178

Fisher GAN calls processor_util.add_mlp, which injects the layer norm through the
normalizer. We allow using an alternative impl for LayerNorm in the normalizer.

Reviewed By: Wakeupbuddy

Differential Revision: D9235528

fbshipit-source-id: 88c126c658102926613242ef84a481f6de1676ed
2018-10-11 17:36:11 -07:00
8ac8b823c2 Allow use substitute ops for LayerNorm (#12177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12177

as titled

Reviewed By: Wakeupbuddy

Differential Revision: D9218047

fbshipit-source-id: 8d68861472c99d587e678c3d76ac43abc9c8fe6d
2018-10-11 17:36:10 -07:00
d9eff40546 Revert D10209620: Use SFINAE instead of macros for 'long' hack
Differential Revision:
D10209620

Original commit changeset: 68f09339e279

fbshipit-source-id: e33927e92e34efc40917d97cd8ba80996a875dff
2018-10-11 16:50:09 -07:00
5973312abc Add clang 6 docker images
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12581

Differential Revision: D10349785

Pulled By: bddppq

fbshipit-source-id: 638641d369be0898dd6232737ebaa9d9a8c2e557
2018-10-11 16:48:13 -07:00
a1487bf874 Smarter differentiable subgraph slicing (#12175)
Summary:
If any inputs require_grad then the graph executor does differential subgraph slicing. The existing algorithm combines adjacent differentiable Node*.

There are two major motivations. The first is improving fusion opportunities: the graph fusion pass runs after differential subgraph slicing. This means that only nodes that are a part of the same differential subgraph may be considered for fusion. If something like the following happens,
```
y = f(x)
k = not_differentiable_op(m)
z = g(y)
```
and f and g are both fusible and differentiable operations, then they will be inserted into different differential subgraphs and not fused together.

The second is to enable JIT optimizations on backward passes for things like an (automatically) unrolled LSTM. Right now, in an unrolled LSTM, we see something like the following:
```
lstm_cell()
non_differentiable_list_op()
lstm_cell()
non_differentiable_list_op()
lstm_cell()
non_differentiable_list_op()
```
Each lstm_cell itself is differentiable and gets put into a separate differential subgraph. During the backwards pass, each prim::DifferentiableSubgraph has its own graph executor: these graph executors cannot talk to each other. It is better if we combined all of the lstm_cells (where applicable) into one differential subgraph so their backward passes are combined into one graph executor that can perform better optimizations than several separate graph executors.

Think about the computation graph as a DAG where edges are data dependencies and vertices are operations (the nodes). Each vertex is either black or red; a vertex is colored black if it is differentiable and red otherwise. The goal is to contract edges (merge nodes) to have the fewest black vertices remaining such that the graph is still a DAG.

The algorithm is the following:
- Take the Graph& and create a shadow "DynamicDAG" object to wrap Node* and edges. Each Vertex holds multiple Node* (but starts out holding one Node*) and each edge is a data dependency.
- Greedily contract vertices in the DynamicDAG if they are "differentiable". This operation is unrelated to the Graph&.
  - A Vertex is "differentiable" if all the nodes it holds are differentiable.
  - When contracting vertices, combine their Node* contents.
  - The DynamicDAG keeps its vertices in topological order and complains if the contraction is invalid so everything is good.
- Take the DynamicDAG: reorder the nodes in the Graph& to match the topological order in the DynamicDAG.
- Finally, go through each Vertex in the DynamicDAG: if it contains multiple Node* then merge all of them into a prim::DifferentiableGraph.

The DynamicDAG is based off of the dynamic top sort algorithm in [this paper](https://www.doc.ic.ac.uk/~phjk/Publications/DynamicTopoSortAlg-JEA-07.pdf) by Pearce and Kelly.

Each contractEdge(producer, consumer) call is `O(|AR| log |AR| * min(|out_edges(producer)|, |in_edges(consumer)|)` where `AR` is the "affected region" (defined as the set of nodes that, in topological order, are between producer and consumer). By only considering contractions such that `|ord(producer) - ord(consumer)| < threshold1` and `|out_edges(producer)| < threshold2` we can make each contractEdge(producer, consumer) call take constant time. The resulting algorithm is linear in the number of nodes.
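A very rough Python sketch of the greedy contraction loop, with `can_merge` standing in for the DynamicDAG's acyclicity/threshold checks (hypothetical names; the real implementation operates on C++ Vertex/Node* objects):
```python
def slice_differentiable_subgraphs(nodes, is_differentiable, can_merge):
    """nodes are assumed to be in topological order; every returned group
    holding two or more nodes would become one prim::DifferentiableGraph."""
    groups = []
    for node in nodes:
        last = groups[-1] if groups else None
        if (last is not None
                and is_differentiable(node)
                and all(is_differentiable(n) for n in last)
                and can_merge(last, node)):  # keeps the shadow DAG acyclic
            last.append(node)                # contract into one vertex
        else:
            groups.append([node])
    return groups
```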

Added a lot of small test cases.

Looking for suggestions on the following:
- what big computation graphs should I run this on to test how fast or slow it is?
- what things other than correctness should I be thinking about when I test this?

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12175

Differential Revision: D10302564

Pulled By: zou3519

fbshipit-source-id: 8a94d130d82f8a1713cc28483afef9a72d83d61a
2018-10-11 16:20:53 -07:00
0ee2e7c398 Relax the locking of running_mutex_ in async_scheduling net (#12544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12544

`running_mutex_` inside the async_scheduling net is used to guard access to the `running_` variable, so we don't need to acquire that lock while we are actually running the net. This will help us prevent a potential double-locking situation when we decide to run the root nodes inline.

Reviewed By: ilia-cher

Differential Revision: D10304745

fbshipit-source-id: 5f701b2c22b06ff5bee7f2c37ac634326748f579
2018-10-11 16:00:54 -07:00
0f9807ee61 Enable addmm fusion for ONNX export only (#12538)
Summary:
There are some action-at-a-distance issues, and not having this disables quantization in C2 for prod use cases.

ref T34831022
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12538

Differential Revision: D10302931

Pulled By: jamesr66a

fbshipit-source-id: 700dc8c5c4297e942171992266ffb67b815be754
2018-10-11 13:57:50 -07:00
7b0f5d6631 Support USE_CUDNN for Windows (#12518)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12495

cc peterjc123 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12518

Reviewed By: mingzhe09088

Differential Revision: D10338792

Pulled By: orionr

fbshipit-source-id: b465c42ea6d5fe9dbc2a4e1f973d952365d0af07
2018-10-11 13:53:27 -07:00
033e00cd3f Fix bug in caffe_translator tool (#10056)
Summary:
1. Fix the BN translator:
    IntelCaffe and NVCaffe fuse BN+Scale, so the "BatchNorm" op contains 5 params, including scale and bias.

2. Fix the Scale translator:
   The translated outputs of Scale have the same names as those of Conv;
   all of their names are output + '_w' and output + '_b'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10056

Differential Revision: D10099205

Pulled By: yinghai

fbshipit-source-id: 73a73868e3e16c495e8b233fdb1d373d556a9537
2018-10-11 13:13:12 -07:00
666bebc7d2 adapting caffe2 operator docs generator to pytorch url
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10801

Differential Revision: D9472991

Pulled By: ezyang

fbshipit-source-id: 1b8ba77b8255b7e900b6528bd93b3b870f9ba0d4
2018-10-11 12:55:06 -07:00
eef083e477 CircleCI: add timestamp to build log, clean up unused jobs, print docker image name
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12556

Differential Revision: D10343032

Pulled By: yf225

fbshipit-source-id: fd2dcba18a5cb037fdc448dba64bf9d747dc3761
2018-10-11 12:23:42 -07:00
a4120fa132 Get rid of emitApplyIdent (#12504)
Summary:
And reroute builtin/CompilationUnit function resolution through one resolution pathway
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12504

Differential Revision: D10319920

Pulled By: jamesr66a

fbshipit-source-id: 3ab9877664dd32b97136a7625d0688e1adc0c022
2018-10-11 10:53:53 -07:00
8482ea8774 Update develop install command in onnx scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12561

Differential Revision: D10340194

Pulled By: bddppq

fbshipit-source-id: 10fb7261028d56f73111e2ca39d4eb2ab930812a
2018-10-11 10:38:52 -07:00
cee19eb31c Back out "[c10][NFCI] Move jit/type, function_schema, and utils/functional to ATen/core" (#12568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12568

Second attempt at D10324615

Original commit changeset: b71eeec98dfe
Original commit changeset #2: 1af6400ae0c1

Reviewed By: bwasti

Differential Revision: D10338168

fbshipit-source-id: 04cb443a89a9cd1a174df6d5ac1a86c3d423d56b
2018-10-11 09:53:40 -07:00
7acb145893 Fixed print issue for TensorTypeId (#12402)
Summary:
Fixed a printing issue for TensorTypeId. It used to print a hex escape of the ID, e.g.
   \x1
Now it prints the ID as a string, e.g.
  1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12402

Reviewed By: ezyang

Differential Revision: D10224026

Pulled By: lauragustafson

fbshipit-source-id: a9ca841d08c546fccbb948a17f06a29fea66f3fb
2018-10-11 08:23:32 -07:00
229397b439 Revert D10324615: [pytorch][PR] Revert #12466 and #12467 to fix JIT test error on Windows CI
Differential Revision:
D10324615

Original commit changeset: 12e5fc73da42

fbshipit-source-id: 710c5f3b7a4fe56799ae31a86359b2085b7e741d
2018-10-11 03:39:14 -07:00
1c7832c854 CUDA 10 warnings fixed (#12442)
Summary:
Fixes a deprecation warning against `cudaPointerAttributes`, whose `memoryType` field has been deprecated in favor of `type`; see https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#contents-end
for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12442

Differential Revision: D10251239

Pulled By: zou3519

fbshipit-source-id: 500f1e02aa8e11c510475953ef5244d5fb13bf9e
2018-10-11 00:25:22 -07:00
234e6b3797 Bugfix in onnx exporter (#10607)
Summary:
Incorrect processing for int and float arguments. Possibly a typo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10607

Differential Revision: D9376040

Pulled By: bddppq

fbshipit-source-id: e3665e7bbb26842d1d7eed50442993cfdbf55a80
2018-10-11 00:25:20 -07:00
1f7cbea984 Revert #12466 and #12467 to fix JIT test error on Windows CI (#12557)
Summary:
Sample error log: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/11766/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12557

Differential Revision: D10324615

Pulled By: yf225

fbshipit-source-id: 12e5fc73da42ffa22e39250aee9ea072fd2e33de
2018-10-10 23:56:56 -07:00
170d84228e Delete redundant statement of col2im (#12514)
Summary:
Hi, I found that there were two declarations of `col2im` in `im2col.h`, and I think the former
one may be redundant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12514

Differential Revision: D10328721

Pulled By: ezyang

fbshipit-source-id: d225547848803511c7cc58bd9df1cc6832a537fb
2018-10-10 23:56:54 -07:00
2b033332c8 Allow linking to backwards-compatible cuDNN at runtime (#12239)
Summary:
Fixes #12193
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12239

Differential Revision: D10321744

Pulled By: soumith

fbshipit-source-id: bf437f7f9b6231158a1585d2dabae8d937396478
2018-10-10 23:56:51 -07:00
8734b174ca Multinomial raise error (#12490)
Summary:
Fixes #12260 #2896

```
torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
```
The old behavior was to return `0` after we ran out of positive categories. Now we raise an error, based on discussion in the issue thread.

- Added test cases for the CPU & CUDA paths; in the CUDA case `n_samples=1` is a simple special case, so we test against `n_samples=2` instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12490

Differential Revision: D10278794

Pulled By: ailzhang

fbshipit-source-id: d04de7a60f60d0c0d648b975db3f3961fcf42db1
2018-10-10 20:39:04 -07:00
b89a3b50fb Remove StaticContext (#12547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12547

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12305

Remove StaticContext from context_base.h

Reviewed By: dzhulgakov

Differential Revision: D10073519

fbshipit-source-id: 350beec3c54365edef338318ce58229ccb825a98
2018-10-10 19:41:03 -07:00
c32839fc90 CircleCI: better credentials visibility (#12552)
Summary:
We will rotate the credentials if the new setting works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12552

Differential Revision: D10322121

Pulled By: yf225

fbshipit-source-id: 158f2f89b83a751566a912869a4400d5be6e5765
2018-10-10 18:25:09 -07:00
89010d60f9 Migrate HIP to use DeviceOption.device_id and delete DeviceOption.hip_gpu_id
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12546

Reviewed By: hyuen, xw285cornell

Differential Revision: D10305222

fbshipit-source-id: 955e1d2878508a25fe4e9980ae66f8f54aaf7db9
2018-10-10 18:25:06 -07:00
25bd7fe488 Add USE_FFMPEG flag for setup.py and R2Plus1D (#12543)
Summary:
Needed for https://github.com/facebookresearch/R2Plus1D/pull/46
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12543

Differential Revision: D10320147

Pulled By: orionr

fbshipit-source-id: a7dcbf7c0d4b405b9e89b28ef75a0ed1cf2a3e6a
2018-10-10 18:09:48 -07:00
da3dd9af12 No Op Optimizer (#12390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12390

Introduce a no-op optimizer for when we don't want updates to happen but don't want to affect downstream processes.

Reviewed By: mlappelbaum

Differential Revision: D10209812

fbshipit-source-id: 2af4ebc0fb42e78ea851c3a9f4860f3d224037b6
2018-10-10 18:09:46 -07:00
8399778049 Update FP16 submodule (#12554)
Summary:
Pull in a patch that fixes the remaining incompatibility with the Microsoft compiler on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12554

Differential Revision: D10319736

Pulled By: Maratyszcza

fbshipit-source-id: bcd88581df48f2678ef81e095f947391104f24d5
2018-10-10 17:25:17 -07:00
543048d275 Adds launch bounds for CTC loss kernel (#12379)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12324
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12379

Differential Revision: D10318361

Pulled By: ezyang

fbshipit-source-id: aec4ae8205e780b18560d639543ed9d0ef0527ce
2018-10-10 17:09:38 -07:00
7724807551 Remove ExtractDeviceOption from StaticContext (#12304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12304

- Make ExtractDeviceOption a free function.
- Add a Storage(at::Device) constructor in order to preserve the device_id.

Reviewed By: dzhulgakov

Differential Revision: D10069839

fbshipit-source-id: a5f3994a39bdf1b7503b39bb42c228e438b52bfa
2018-10-10 14:12:16 -07:00
0d50c117db Introduce BUILD_ATEN_ONLY cmake option (#12443)
Summary:
Following up the #11488 conversation with orionr,
and our brief conversation at PTDC about ATen with soumith and apaszke.

This PR enables a very slim build focused on ATen, in particular without caffe2 and protobuf, among other dependencies.
With this PR the NimTorch tests pass fully, including AD, convolutions, wasm, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12443

Reviewed By: mingzhe09088

Differential Revision: D10249313

Pulled By: orionr

fbshipit-source-id: 4f50503f08b79f59e7717fca2b4a1f420d908707
2018-10-10 12:54:19 -07:00
a442853f4f CircleCI: try to fix submodule not found error (#12542)
Summary:
Try to fix the "submodule not found" infra error (https://circleci.com/gh/pytorch/pytorch/48431) by switching to the official git client (instead of CircleCI's default git client).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12542

Differential Revision: D10305027

Pulled By: yf225

fbshipit-source-id: 42db0694efb468d9460ef51d7b4b2bd90d78ff24
2018-10-10 12:54:17 -07:00
b51901f7d3 Update FP16 submodule (#12539)
Summary:
Pull in a patch that makes FP16 compatible with the Microsoft compiler on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12539

Reviewed By: hyuen

Differential Revision: D10303487

Pulled By: Maratyszcza

fbshipit-source-id: 4e20ece6338e4d0663cd3591914ce333f0972693
2018-10-10 11:54:06 -07:00
45db8274de CircleCI: Add credentials for pushing to perf test S3 bucket (#12523)
Summary:
This will fix the perf test baseline update in master builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12523

Reviewed By: bddppq

Differential Revision: D10289415

Pulled By: yf225

fbshipit-source-id: 408893ab2b0f93c7cffb9f8fbf74453155b850c4
2018-10-10 11:54:04 -07:00
c2a57d082d Fix windows build (#12534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12534

att

Reviewed By: orionr

Differential Revision: D10300123

fbshipit-source-id: 3079864b6979779af4a524a54b28f9b2baed8ba4
2018-10-10 09:39:06 -07:00
033e95765c Diff against master and enable bugprone-* checks (#12378)
Summary:
This PR:

1. Makes clang-tidy diff against `master` instead of `HEAD~1` in CI, which makes much more sense
2. Enables all checks in the `bugprone-*` category (see https://clang.llvm.org/extra/clang-tidy/checks/list.html) except one about parentheses in macros, because it doesn't always apply too well for us.

Fixed some nice code smells.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12378

Differential Revision: D10247972

Pulled By: goldsborough

fbshipit-source-id: 97dc9e262effa6874d2854584bf41a86684eb8bd
2018-10-10 07:23:57 -07:00
727609f435 Use SFINAE instead of macros for 'long' hack (#12424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12424

Some compilers define 'long' as a separate type from 'int32_t' or 'int64_t', some don't.
Before, we had a cmake check setting a macro and depending on the macro, we either defined a separate type id for long or didn't.
Then, we removed the cmake check and used compiler detection macros directly. This is, however, error prone.

This new approach uses SFINAE to register a type id for 'long' only if it's a separate type.

Reviewed By: Yangqing, dzhulgakov

Differential Revision: D10209620

fbshipit-source-id: 68f09339e279a9a56b95caeef582c557371b518d
2018-10-10 01:11:06 -07:00
e25b8869f7 typo: Aten.h -> ATen.h in cppdocs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12519

Differential Revision: D10287901

Pulled By: goldsborough

fbshipit-source-id: 56e0c1851aade84e4154777776d14e087645a762
2018-10-09 23:40:14 -07:00
3829f86c7a Update NNPACK-related submodules (#12505)
Summary:
Update submodules below:
- NNPACK
- FP16
- pthreadpool
- cpuinfo
- psimd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12505

Reviewed By: hyuen

Differential Revision: D10286690

Pulled By: Maratyszcza

fbshipit-source-id: 279214b47c82e9e2582693191cc218173c00ea69
2018-10-09 21:54:07 -07:00
283f21d518 Caffe 2 adoption (#12116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12116

Adapt Caffe 2 to platform007 (gcc 8):
* gcc 8 + nvcc template symbol lookup (D9319742):
context_.template CopySameDevice<T> ==> this->context_.template CopySameDevice<T>
* New gcc 8 warning (error):
  * -Werror=sizeof-pointer-div
  * Unnecessary parenthesis

Reviewed By: bddppq

Differential Revision: D10045844

fbshipit-source-id: 95f509fefc9593cbb82b1687793fef8930260d2f
2018-10-09 19:29:23 -07:00
16b8075acd finishRun fix (#10970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10970

Fixing a possible case where the next iteration of a net may be started prematurely.
We have to ensure that resetting the running_ flag is done *after* finalizeEvents
(i.e. after waiting for the rest of the net's events to finish).

Reviewed By: heslami

Differential Revision: D9545442

fbshipit-source-id: bc324a180b1e93054b051981817be7985f52b4cb
2018-10-09 16:09:46 -07:00
f54ab540af Rename cuda_gpu_id to device_id in DeviceOption (#12456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456

codemod with 'Yes to all'
codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format

Reviewed By: Yangqing

Differential Revision: D10240535

fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25
2018-10-09 15:54:04 -07:00
caf8b0777a Move function_schema to ATen/core (#12467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12467

Final move of files to enable nomnigraph-wrapped PyTorch IR.

Reviewed By: ezyang

Differential Revision: D10242930

fbshipit-source-id: 1af6400ae0c1f1e7c3be262fbca58010eb2bfa86
2018-10-09 15:38:27 -07:00
f989d4b18e Move jit/type and utils/functional to ATen/core (#12466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12466

Moves type.{h,cpp} and functional.h to ATen/core

The move is necessary for IR merging; slimmed down from this diff: D9819906.

Reviewed By: ezyang

Differential Revision: D10242680

fbshipit-source-id: b71eeec98dfe9496e751a91838d538970ff05b25
2018-10-09 15:38:24 -07:00
58b247fc42 Update onnx to onnx/onnx@008e381 (#12492)
Summary:
008e381855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12492

Differential Revision: D10268646

Pulled By: bddppq

fbshipit-source-id: 39d2eae66abee898a30b71c23e54f5c51d3f9ac8
2018-10-09 15:38:22 -07:00
64f707cd26 Enable more unit tests (ROCm 255) (#12486)
Summary:
* Enabled more tests that relied on CPU LAPACK at compile time.
* Enabled min/max tests in test_cuda (ROCm 236).

bddppq ezyang

Tests ran as part of the ROCm CI here: https://github.com/ROCmSoftwarePlatform/pytorch/pull/255
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12486

Differential Revision: D10262534

Pulled By: ezyang

fbshipit-source-id: 167a06fc8232af006f4b33dcc625815fd4b06d6b
2018-10-09 15:38:19 -07:00
dcd9d73d47 Expunge torch.utils.trainer.* (#12487)
Differential Revision: D10273602

Pulled By: SsnL

fbshipit-source-id: 630c1f8ee0e366f7092d4f93dbe1efa96fc860e0
2018-10-09 14:56:00 -07:00
8468b7d3f0 Fix tensor doc (#12469)
Summary:
The C++ docs for `at::Tensor` are currently broken because we moved where `Tensor.h` gets generated without updating our docs. I use `GEN_TO_SOURCE=1` when generating ATen files, so the `Tensor.h` file should end up in `aten/src/ATen/core/Tensor.h`, if I understand correctly.

dzhulgakov ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12469

Differential Revision: D10248521

Pulled By: goldsborough

fbshipit-source-id: 8d8a11f0f6e2703b8d767dbc523fc34a4374f345
2018-10-09 14:09:22 -07:00
2b22c60980 Fix GPU perf tests on CircleCI (#12491)
Summary:
`COMMIT_SOURCE`, which is used in perf tests to decide whether to store the new numbers as baseline, is missing from the current CircleCI config.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12491

Differential Revision: D10274426

Pulled By: yf225

fbshipit-source-id: 047ef6cc61a12738062f9940d1bfd4c3bf152909
2018-10-09 13:53:45 -07:00
b572e27502 Fix types and warp sizes for ROCm (ROCm 256) (#12485)
Summary:
* Correct the warp size for current AMD GPUs
* Fix copy paste error in configure
* Correct the wrong typing explicitly

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12485

Differential Revision: D10262490

Pulled By: ezyang

fbshipit-source-id: 93467944247ed764d9ac9f7bb212a94fc250608e
2018-10-09 12:53:48 -07:00
c96afa3322 topk and sort fixes (#12337)
Summary:
* Topk part 1: fix intrinsincs for 64 wave front (#224)
64 in a wave front - intrinsics change.
* Disable in-place sorting on ROCm. (#237)
It is known to hang - use the Thrust fallback
Skip one test - fails with the fallback.
* Topk fixes (#239)
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 (bfe) and 9.7.1.20 (bfi) requires pos and len to be limited to 0...255
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 requires extracted bits to be in LSBs
* Correct logic for getLaneMaskLe. Previous logic would return 0x0 instead of 0xffffffffffffffff for lane 63
* Round up blockDim.x to prevent negative index for smem

bddppq ezyang

Note the one additional skipped test resulting from using the thrust sort fallback for all sizes. We are working on getting bitonic to work properly (and always). Until then, this needs to be skipped on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12337

Differential Revision: D10259481

Pulled By: ezyang

fbshipit-source-id: 5c8dc6596d7a3103ba7b4b550cba895f38c8148e
2018-10-09 12:08:48 -07:00
ea79f7c032 Add derivative to pow with scalar base (#12450)
Summary:
Fixes: #12426

Thank you, DriesSmit, for the report!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12450

Differential Revision: D10238556

Pulled By: soumith

fbshipit-source-id: 8bf71467c6734ecc5ff30f15500304d731f7e155
2018-10-09 11:38:48 -07:00
Jie
a3fb004b18 (#12474)
Summary:
Modifies the DistributedSampler logic. Now each process samples elements at
a fixed interval, instead of taking a consecutive section.

  This eliminates the possibility of the DataLoader using padded data while
dropping real data. It happens when:
  1. DistributedSampler padded the data; and
  2. DataLoader drop_last is effectively true, and drops fewer samples than
were padded.
  From the example below, we see that data points (10, 11, 12) are padded by
duplicating data samples (1, 2, 3).
  The old sampler drops legit original data (3, 6, 9) and introduces duplicates
(10, 11) into the training set, while the new sampler logic samples the correct
data points from the data set.
  This example has been added to the dataloader unit test.

example:
```
  data after shuffle: 1, 2, 3, 4, 5, 6, 7, 8, 9
  padded data : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

  old sampler:       ->  DataLoader with (batch_size=2 and drop_last=True)
   p 1: 1, 2, 3          1, 2
   p 2: 4, 5, 6          4, 5
   p 3: 7, 8, 9          7, 8
   p 4:10,11,12         10,11

  new sampler:       ->
   p 1: 1, 5, 9          1, 5
   p 2: 2, 6,10          2, 6
   p 3: 3, 7,11          3, 7
   p 4: 4, 8,12          4, 8
```
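The new partitioning corresponds to strided slicing; a minimal sketch reproducing the table above:
```python
def strided_shards(indices, num_replicas):
    # replica r takes every num_replicas-th element starting at offset r,
    # so tail padding is spread across replicas instead of concentrated
    return [indices[r::num_replicas] for r in range(num_replicas)]

padded = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]  # 10-12 are padding
print(strided_shards(padded, 4))
# [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
```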
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12474

Differential Revision: D10260410

Pulled By: SsnL

fbshipit-source-id: 710856571260f42ce25955b81a5b8008e04938cf
2018-10-09 11:23:50 -07:00
1c69d368e1 Remove New with Allocator Registry (#12111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12111

Setup allocator registry keyed by at::DeviceType, and remove New from StaticContext.

Reviewed By: ezyang

Differential Revision: D10022853

fbshipit-source-id: 3e88a181fe5df24f33f49b88be1f75284a185588
2018-10-09 10:53:52 -07:00
f564163951 Remove SSE-only code and convolve5x5 (#12109)
Summary:
Performance-oriented code will use AVX/AVX2, so we don't need SSE-specific code anymore. This will also reduce the probability of running into an error on legacy CPUs.

On top of this, convolve is covered by modern libraries such as MKLDNN, which are much more performant and which we now build against by default (even for builds from source).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12109

Differential Revision: D10055134

Pulled By: colesbury

fbshipit-source-id: 789b8a34d5936d9c144bcde410c30f7eb1c776fa
2018-10-09 10:53:50 -07:00
11c31aef04 Prevent hanging in data loader altogether
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11985

Differential Revision: D10202374

Pulled By: SsnL

fbshipit-source-id: 1ab1a07185f78a104f9b05930a87ef5a32f431e4
2018-10-09 09:54:19 -07:00
1a0d82e4f4 fix import for script module with control flow blocks (#12351)
Summary:
The value_info proto field was being processed in BuildGraph, but control flow blocks used buildBlocks instead. This PR moves that step to BuildBlock.

I removed DecoderBase because it was making the code confusing and we never needed it in the first place.

closes #12319
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12351

Differential Revision: D10212411

Pulled By: li-roy

fbshipit-source-id: 47f289a462a1ab7391ff57368185401673980233
2018-10-08 22:25:14 -07:00
c959be9d1d Create named functions construct (#12237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12237

This diff creates named functions and cleans up a lot of the basic block usage throughout the code

Reviewed By: duc0

Differential Revision: D10134363

fbshipit-source-id: d0c4ae0bbb726236a15251dbfd529d4fddcd9e9f
2018-10-08 22:12:18 -07:00
8414094562 cleanup controlflow (#12235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12235

SSA is actually implicitly maintained, so not only was this function not implemented, it should never be implemented.

Reviewed By: duc0

Differential Revision: D10133928

fbshipit-source-id: e8e5e2386f8b57812b0be2c380af85ed07cd3152
2018-10-08 22:12:13 -07:00
d400502b1d Fix a bunch of warnings in TestNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12453

Differential Revision: D10244130

Pulled By: SsnL

fbshipit-source-id: e425c76bfb721fe118a32ddd1fa6eca3a3cd86f0
2018-10-08 17:38:23 -07:00
cdead5ace1 Enable CircleCI for Linux jobs (#12389)
Summary:
Changes in this PR:
1. The intermediate Docker image is shared from the build stage to the test stage through ECR, in order to fix the flaky Caffe2 CUDA tests.
2. There are ~7 Caffe2 operator tests that are flaky only in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. We disable those tests on that config only, which is okay to do because we are still running them in other test jobs.

After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389

Differential Revision: D10224267

Pulled By: yf225

fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
2018-10-08 17:09:37 -07:00
5a0d2c7138 Add clamping functionality to stats_put_ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12391

Reviewed By: mlappelbaum

Differential Revision: D10220000

fbshipit-source-id: 10fdbc8ebab931a5be31df964b5de5728048205d
2018-10-08 16:53:26 -07:00
1ee6fc4002 Delete noexcept on the move constructor of OrderedDict (#12369)
Summary:
Previously we tested if default-construction was noexcept, which
doesn't really mean that the move constructor is noexcept too.

Shuts up clang-tidy.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12369

Differential Revision: D10217348

Pulled By: ezyang

fbshipit-source-id: b46437d8ac7a8d756cf03ed0c6bf4400db7ecde7
2018-10-08 16:38:27 -07:00
dd4b9b06a4 Back out "Back out "[caffe2] Use custom CPU thread pool in async_scheduling"" (#12418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12418

Original commit changeset: 32921600925b

Reviewed By: yinghai

Differential Revision: D10231119

fbshipit-source-id: 7d09ea8de82ff2d911d9ded88d87af4226464d1b
2018-10-08 16:24:07 -07:00
c5d7494ca1 Use open-source NCCL2 in PyTorch (#12359)
Summary:
- Removed the old nccl file
- Made open-source NCCL a submodule
- Made CMake build NCCL itself

NCCL2 is now in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12359

Reviewed By: orionr, yns88

Differential Revision: D10219665

Pulled By: teng-li

fbshipit-source-id: 134ff47057512ba617b48bf390c1c816fff3f881
2018-10-08 15:39:07 -07:00
c3987a0fc3 Fix issues with ATenOp handling methods where self is not the first arg (#12353)
Summary:
ATenOp was handling `torch.where` incorrectly. Whereas the `torch.where` overload (and `aten::` function) had arguments in the order `Tensor condition, Tensor self, Tensor other`, ATenOp was emitting code that assumed that `self` was the 0th argument, and thus was trying to interpret the wrong value as the condition.
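For reference, a sketch of the correct argument order (masks were uint8 tensors at the time):
```python
import torch

cond = torch.tensor([1, 0, 1], dtype=torch.uint8)
x = torch.tensor([1.0, 2.0, 3.0])     # plays the role of `self`
y = torch.tensor([10.0, 20.0, 30.0])

# condition is the 0th argument; the bug was treating argument 0 as self
print(torch.where(cond, x, y))        # tensor([ 1., 20.,  3.])
```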
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12353

Differential Revision: D10218435

Pulled By: jamesr66a

fbshipit-source-id: afe31c5d4f941e5fa500e6b0ef941346659c8d95
2018-10-08 15:25:39 -07:00
d0e1dca0f5 fix expect file (#12465)
Summary:
Fix expect file that got out of sync
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12465

Differential Revision: D10244646

Pulled By: eellison

fbshipit-source-id: 66d101d4c6c0a235ce9fa47dc3cce027624c86bc
2018-10-08 13:54:24 -07:00
5bac46508a Fix TestJit.test_alexnet expect file (#12458)
Summary:
This test only runs when you have torchvision installed, which is not the case on CI builds. When I run test_jit on my local machine, this fails, so fixing up the expect file here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12458

Differential Revision: D10244344

Pulled By: jamesr66a

fbshipit-source-id: 728c5d9e6c37f807a0780066f20f6c31de84d544
2018-10-08 13:54:22 -07:00
d4b4c1fbec Add missing url links to README.md file. (#12440)
Summary:
Signed-off-by: Marcela Morales Quispe <marcela.morales.quispe@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12440

Differential Revision: D10242642

Pulled By: SsnL

fbshipit-source-id: f47d7579cf3df097c476a97b58149ca4b1eb17ab
2018-10-08 13:54:21 -07:00
a55b9f77a0 Implement 3D and 4D parallelization in Caffe2 thread pool (#12455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12455

- Mirror changes in pthreadpool

Reviewed By: harouwu

Differential Revision: D10240470

fbshipit-source-id: c1af769b5894f7865736fdaf4e0e5bf17c524614
2018-10-08 13:12:57 -07:00
d181e0f1fc Add move{Node,Edge,Subgraph} for Graph move-like semantics (#12303)
Summary:
Adding back import{Node,Edge} as move{Node,Edge} and adding a new
function moveSubgraph. The previous diff broke OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12303

Differential Revision: D10182522

Pulled By: bwasti

fbshipit-source-id: 9619431d6d1a44f128613a4f6d8b7f31232ccf28
2018-10-08 12:53:25 -07:00
cf2b88fa30 Induce edges on subgraphs (#12255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12255

Simple algorithm to connect a subgraph

Reviewed By: ZolotukhinM

Differential Revision: D10141701

fbshipit-source-id: c79c5bc2be89100db602d0a5ff3d17e3dc332d8c
2018-10-08 12:24:55 -07:00
7103d0d938 Add python bindings (#12253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12253

Adding python bindings to unblock DAI development

Reviewed By: duc0

Differential Revision: D10141621

fbshipit-source-id: efac7fb8a0cc787e1c4cc94515e673812529a997
2018-10-08 12:24:53 -07:00
e7653c7561 New chaining/partitioning algorithm for async_scheduling for inference (#11957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11957

For distributed inference, we want to use the async_scheduling net to run the net, since we need its async part. However, according to profiling, async_net has a big overhead from dispatching tasks onto worker threads. This diff improves the issue by generating a smaller number of chains/tasks, grouping the sync ops that can be run in one shot. Note that it also schedules each individual async op as its own chain, because unlike gpu ops, rpc ops are not guaranteed to be linearized at the remote site. For example, if you have two rpc ops `op1->op2`, op2 won't implicitly block until op1 finishes. Therefore we need to put each async op in its own chain, as the async_scheduling net will only sync on the tail of a chain.

For all-sync-op nets, this change leaves us `1.5X` slower than simple_net, whereas without the change it is `7X` slower.

Next step is to work on the executor to make task scheduling faster, and to add a fallback path that runs ops inline if it's an all-sync net.

Reviewed By: ilia-cher

Differential Revision: D9874140

fbshipit-source-id: fcd45328698c29211f2c06ee3287194acda12227
2018-10-08 12:24:52 -07:00
f1f521f71b make bench_gen.py work for 3d conv (#12433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12433

To test 3d conv, we need to pass lists in the spec argument. We also don't want to set use_cudnn=True, which is the default in brew.

Reviewed By: llyfacebook, csummersea

Differential Revision: D10234315

fbshipit-source-id: 96a39992a97e020d6e9dac103e6d64df0cc1020b
2018-10-08 12:24:43 -07:00
00aedfc0e2 constant pooling pass (#12222)
Summary:
Add a pass to move all constants to the beginning of the graph, and deduplicate.

This extends https://github.com/pytorch/pytorch/pull/10231 to also handle constants introduced in inlining, constant propagation, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12222

Reviewed By: driazati

Differential Revision: D10201616

Pulled By: eellison

fbshipit-source-id: bc9c5be26868c8b5414257a0d4462de025aeb9bd
2018-10-08 11:55:02 -07:00
83b4dc6822 Remove Type.tensor(). (#12360)
Summary:
Use at::empty instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12360

Reviewed By: ezyang

Differential Revision: D10215119

Pulled By: gchanan

fbshipit-source-id: f9bb257dff1b1bf1ecd3a6e358c4791d81b5bd31
2018-10-08 11:39:05 -07:00
28e1571843 Add the x64 msvc toolchain into PATH (#12446)
Summary:
A possible fix for the problem stated in #12410.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12446

Differential Revision: D10238572

Pulled By: soumith

fbshipit-source-id: 17ade148c4036d2481b878e5cd7d9d67c1e3626e
2018-10-08 07:54:20 -07:00
def655ec27 fix critical section of atomic add op
Summary: When testing D10220313, I ran into this bug.

Reviewed By: aazzolini

Differential Revision: D10224295

fbshipit-source-id: f46d7333612bce437c1ae6c0b0b579fc2a639665
2018-10-08 02:20:23 -07:00
8689d8af36 Format inline code block. (#12441)
Summary:
Signed-off-by: Marcela Morales Quispe <marcela.morales.quispe@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12441

Differential Revision: D10236743

Pulled By: SsnL

fbshipit-source-id: c0e446a81a388cf6a558bf7ab8ba0e59703dc169
2018-10-08 00:51:07 -07:00
0e44db8b0d Add check for backend of arguments to bmm cpu (#12434)
Summary:
Fixes: #12406
Thank you, jcjohnson, for reporting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12434

Differential Revision: D10235799

Pulled By: soumith

fbshipit-source-id: 44ee35010bac3791901f604095f5b4bc66b0e7f8
2018-10-07 18:55:42 -07:00
db8d01b248 Move JIT tests to gtest (#12030)
Summary:
In our #better-engineering quest of removing all uses of catch in favor of gtest, this PR ports JIT tests to gtest. After #11846 lands, we will be able to delete catch.

I don't claim to use/write these tests much (though I wrote the custom operator tests) so please do scrutinize whether you will want to write tests in the way I propose. Basically:

1. One function declaration per "test case" in test/cpp/jit/test.h
2. One definition in test/cpp/jit/test.cpp
3. If you want to be able to run it in Python, add it to `runJitTests()` which is called from Python tests
4. If you want to be able to run it in C++, add a `JIT_TEST` line in test/cpp/jit/gtest.cpp

Notice also I was able to share support code between C++ frontend and JIT tests, which is healthy.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12030

Differential Revision: D10207745

Pulled By: goldsborough

fbshipit-source-id: d4bae087e4d03818b72b8853cd5802d79a4cf32e
2018-10-06 23:09:44 -07:00
6f664d3917 Improve TypeMeta (#11502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11502

TypeMeta now is only a pointer to a TypeMetaData structure, of which there is exactly one global instance per type.
This reduces the size of everything storing a TypeMeta (Tensor, Blob, ...) and potentially improves performance.

Also, this diff gets rid of the type name registry in favor of static strings.

Experiments (summary: 1-3% perf gain)
- Service Lab: https://our.intern.facebook.com/intern/servicelab/30712497/
 -> No significant results found.
- Mobile Lab c10bench.json: https://our.intern.facebook.com/intern/fblearner/details/75984908/
 -> 1-3% perf gain
- Mobile Lab c10bench default: https://our.intern.facebook.com/intern/fblearner/details/75984999/
 -> 2-3% perf gain
- adindexer canary: https://our.intern.facebook.com/intern/ads/canary/413002142824203076
 -> no significant changes (benchmark too noisy)
- adfinder canary: https://our.intern.facebook.com/intern/ads/canary/413002166737860362
 -> no significant changes (benchmark too noisy)

Reviewed By: dzhulgakov

Differential Revision: D9763422

fbshipit-source-id: fc08937f114af5ff9f3ddbe7c7e396942868cdf5
2018-10-06 14:09:28 -07:00
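
A sketch of the flyweight idea described above, using hypothetical names (this is not the actual caffe2 source): one static, immutable record per type, with TypeMeta reduced to a pointer.

```
#include <cstddef>
#include <typeinfo>

// One immutable TypeMetaData record exists per type; typeid(T).name()
// stands in for the static strings that replace the name registry.
struct TypeMetaData {
  size_t itemsize;
  const char* name;
};

template <typename T>
const TypeMetaData* typeMetaDataFor() {
  static const TypeMetaData instance{sizeof(T), typeid(T).name()};
  return &instance;  // exactly one global instance per T
}

// TypeMeta is pointer-sized: cheap to copy and to store in every Tensor/Blob.
class TypeMeta {
 public:
  template <typename T>
  static TypeMeta Make() { return TypeMeta(typeMetaDataFor<T>()); }
  size_t itemsize() const { return data_->itemsize; }
  const char* name() const { return data_->name; }
 private:
  explicit TypeMeta(const TypeMetaData* data) : data_(data) {}
  const TypeMetaData* data_;
};

static_assert(sizeof(TypeMeta) == sizeof(void*), "pointer-sized by design");
```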
ac9bb8ecef Make dynamic_cast_if_rtti safer (#12408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12408

Using static_cast is better than reinterpret_cast because it will cause a compile time error in the following cases, while reinterpret_cast would run into undefined behavior and likely segfault:
- Src and Dst are not related through inheritance (say converting int* to double*)
- Src and Dst are related through virtual inheritance

This `dynamic_cast_if_rtti` is still unsafe because `dynamic_cast` and `static_cast` behave differently if the runtime type is not what you expected (i.e. dynamic_cast returns nullptr or throws whereas static_cast has undefined behavior), but it's much safer than doing reinterpret_cast.

Reviewed By: Yangqing

Differential Revision: D10227820

fbshipit-source-id: 530bebe9fe1ff88646f435096d7314b65622f31a
2018-10-06 12:56:27 -07:00
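
A sketch of the pattern (not the exact caffe2 helper): with RTTI available we get a checked dynamic_cast; without it, static_cast at least rejects casts between unrelated types at compile time, where reinterpret_cast would compile and misbehave at run time.

```
template <typename Dst, typename Src>
inline Dst dynamic_cast_if_rtti(Src* src) {
#ifdef __GXX_RTTI
  return dynamic_cast<Dst>(src);  // checked: nullptr (or throw) on mismatch
#else
  return static_cast<Dst>(src);   // unchecked, but compile-time sane
#endif
}

struct Base { virtual ~Base() = default; };
struct Derived : Base { int payload = 42; };

// Derived* d = dynamic_cast_if_rtti<Derived*>(base_ptr);  // fine
// double*  x = dynamic_cast_if_rtti<double*>(base_ptr);   // compile error
// (reinterpret_cast would have accepted the second cast and misbehaved later)
```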
0e966fc9f9 Back out "[caffe2] Use custom CPU thread pool in async_scheduling" (#12415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12415

Original commit changeset: 95da8c938b8e

Reviewed By: ilia-cher

Differential Revision: D10229804

fbshipit-source-id: 32921600925b65edb5bb201c9afba0d03ed49426
2018-10-06 00:42:06 -07:00
695465915a Remove some Type.tensor usages and remove native_tensor without size. (#12403)
Summary:
Same as before, but with "initialTensorOptions()" instead of "TensorOptions(false)".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12403

Differential Revision: D10225427

Pulled By: gchanan

fbshipit-source-id: 60bd025a5cc15bdbbab6eafc91ea55f5f2c3117e
2018-10-05 20:55:14 -07:00
14b48a2404 Use custom CPU thread pool in async_scheduling (#12295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12295

Add ability to use custom implementations of thread pool instead of TaskThreadPool

Reviewed By: yinghai

Differential Revision: D10046685

fbshipit-source-id: 95da8c938b8e60b728484c520319b09b0c87ff11
2018-10-05 19:56:04 -07:00
92b0e7026e Add weak script mode for script functions (#11963)
Summary:
This PR is the start of weak script mode for functions

Weak scripts allow you to compile a graph from Python code at runtime by annotating with `torch.jit.weak_script` for use in the JIT without affecting eager execution. Scripts are compiled lazily on the first call in a graph to avoid long Python startup times.

apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11963

Differential Revision: D10183451

Pulled By: driazati

fbshipit-source-id: 128750994d5eb148a984f8aba4113525c3e248c8
2018-10-05 18:55:49 -07:00
058a31839d Warn about local_rank not being globally unique. (#12370)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC deepakn94
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12370

Differential Revision: D10220135

Pulled By: ezyang

fbshipit-source-id: 6d1a8a383951ae52753e4f75a14b8080bf02b815
2018-10-05 17:38:41 -07:00
3f04ca9a91 Remove duplicate math transpilation function (ROCm 233) (#12387)
Summary:
* Remove duplicate math transpilation function
* Modify regex to expand matches to more __device__ functions
* Try a different tack. Apply math transpilations only to .cu and .cuh files
* Undo change that's not required anymore since we're not using regex to detect device functions

This should address "overtranspilation" as observed in another PR.

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12387

Differential Revision: D10226798

Pulled By: bddppq

fbshipit-source-id: fa4aac8cd38d8f7ef641fad5129ed4714c0fada5
2018-10-05 17:16:35 -07:00
e1fe617600 Fix flipped pad buffer constructor arguments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12361

Differential Revision: D10218404

Pulled By: jamesr66a

fbshipit-source-id: f02137f97cd138155ba8181df3ab65f41d5abab7
2018-10-05 17:16:32 -07:00
99de4565dd Split reduction_front_backops.[cc|cu] into smaller units to allow build of smaller size (#12315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12315

Allows inclusion of needed reduce_front_back_* ops only

Differential Revision: D10188611

fbshipit-source-id: e17fd955ac5aa163a039872b6a435942b1e1e164
2018-10-05 16:50:21 -07:00
b937cbb776 Fix a bug that would resize tensor storage on export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12377

Differential Revision: D10219213

Pulled By: zdevito

fbshipit-source-id: 85cfa4467c672ff5a718e58cfae7e8c8b1cfc532
2018-10-05 16:24:54 -07:00
57fcc57f31 set CMAKE_INSTALL_MESSAGE to NEVER (#12392)
Summary:
this removes a bunch of spam output from the build. This is

(1) cleaner
(2) a couple seconds faster in some cases, e.g. my slow-rendering emacs-based shell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12392

Differential Revision: D10225340

Pulled By: anderspapitto

fbshipit-source-id: 477ee76d24f8db50084b1e261db8c22733de923b
2018-10-05 15:57:44 -07:00
54d9823d00 Make caffe2::Tensor::dims() return an IntList instead of a const vector& (#12180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12180

I had to fix a lot of call sites, because a lot of places assume that
you can actually get a const vector&, and if the internal representation
of sizes in a tensor is NOT a vector, it's not possible to fulfill
this API contract.

Framework changes:
- I deleted TensorImpl::dims(); caffe2::Tensor::dims() just forwards to
  sizes() now.
- De-templatized SetDims; now it is an explicit list of ArrayRef and
  variadic overloads.  This makes implicit conversions work again,
  so I don't need to explicitly list the std::vector cases too.
  - As a knock-on effect, this causes Reset() to accept at::IntList as well as
    const std::vector<int64_t>&
- Edited variadic overloads of SetDims to all forward to the underlying
  arbitrary-dim implementation, reducing code duplication. (It's probably
  marginally less efficient in the new world.)
- Replace Tensor constructor accepting const std::vector<int64_t>& with at::IntList
- Make MKLTensor accept ArrayRef along with vector in constructor and
  Reset (unfortunately, no implicit conversions here, since it's templated on
  index type.)
- There are a few other places, like cudnn, where I changed functions
  that previously took const std::vector<int64_t>& to take at::IntList
  instead.

Classification of call site changes:
- 'const std::vector<int64_t>& x_dims = x.dims()' ==>
  'at::IntList x_dims = x.dims()'
- 'std::vector<int64_t> x_dims = x.dims()' ==>
  'std::vector<int64_t> x_dims = x.dims().vec()' (we need a copy!)
  Usually this is because we're about to mutably modify the vector
  to compute some new dimension.  However, it also very commonly occurs in the
  form: 'x_dims_ = x.dims()' because we frequently cache sizes in operators.
- Instead of constructing std::vector<int64_t>{blah, blah}, construct an
  at::IntList directly

ArrayRef changes:
- cbegin()/cend() iterators; they behave the same as begin()/end() because
  everything on ArrayRef is const.
- Moved operator<< into ArrayRef.h, so that it's always available when
  working with ArrayRef.  I also templated it, so it now works on an
  ArrayRef of any type.
- Add operator== overload for ArrayRef, and also add variants to permit
  comparison of ArrayRef with std::vector, a very common operation.
  (The non-templated version of operator== can get these automatically
  via implicit conversion, but with templates C++ refuses to do
  any explicit conversions.)

I'm planning to audit all dims() call sites to make sure they don't
expect 'auto x = t.dims()' to give you an x whose lifetime can validly
outlive the tensor.

I opted not to do a dims() to sizes() rename, because dims() also matches
the protobufs accessor.  Bad news!

Reviewed By: jerryzh168

Differential Revision: D10111759

fbshipit-source-id: a2a81dc4b92c22ad4b3b8ef4077a7e97b6479452
2018-10-05 15:57:41 -07:00
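
A small usage sketch of the new call-site conventions (assuming ATen headers as of this change; illustrative only):

```
#include <ATen/ATen.h>
#include <vector>

void example(const at::Tensor& t) {
  // dims()/sizes() now hand back a non-owning (pointer, length) view.
  at::IntList view = t.sizes();

  // Need to mutate, or to outlive the tensor? Take an explicit copy.
  std::vector<int64_t> dims = view.vec();
  dims.push_back(1);

  // Comparison against std::vector works via the new operator== overloads.
  bool same = (view == t.sizes().vec());
  (void)same;
}
```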
f9fb37ca79 Guard Denormals-Are-Zero with runtime CPU check (#12386)
Summary:
Previously, we were only enabling Flush-To-Zero (FTZ) and
Denormals-Are-Zero (DAZ) when compiling with SSE3 enabled. After,
Christian's patch (https://github.com/pytorch/pytorch/pull/12109) we
won't be compiling core files with SSE3 or SSE4 enabled, to better
support older AMD processors.

This moves the FTZ and DAZ code behind a runtime CPU check in
preparation for that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12386

Differential Revision: D10222237

Pulled By: colesbury

fbshipit-source-id: 7ffe32561ab965e1e5f9eb6e679602bbf4775538
2018-10-05 14:54:54 -07:00
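
A sketch of the runtime gate, assuming GCC/Clang on x86 (the actual PyTorch code differs in detail):

```
#include <cpuid.h>      // __get_cpuid, bit_SSE3 (GCC/Clang, x86 only)
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE (SSE3)
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE

// Probe CPUID at run time instead of relying on the compiler having been
// invoked with -msse3; older CPUs simply skip the MXCSR writes.
static bool cpu_supports_sse3() {
  unsigned eax, ebx, ecx, edx;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
  return (ecx & bit_SSE3) != 0;
}

void enable_ftz_daz() {
  if (cpu_supports_sse3()) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  }
}
```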
bd09ab6687 Remove stages from IR, they are not longer used
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12352

Differential Revision: D10219743

Pulled By: zdevito

fbshipit-source-id: 4d9441dc3748616f9b1f0734c65ec1a7abb0d663
2018-10-05 13:58:15 -07:00
c7e8044fc8 Support additional device types (#12293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12293

Adding support for additional device types besides cuda and cpu.

Reviewed By: ezyang

Differential Revision: D10175683

fbshipit-source-id: 7a8a35c3f1b13a3b6ed84dd2d835f3902a418a6c
2018-10-05 13:15:05 -07:00
f8086845aa Fix bug in grad.py when conv bias != None (#12281)
Summary:
Obviously, the grads of the conv weight and conv input do not depend on the bias, but the original `convXd_input` and `convXd_weight` methods receive a `bias` parameter. Moreover, while the doc says `bias` should have the shape `(out_channels,)`, one gets a `RuntimeError` when bias != None and in_channels != out_channels, because the weight of a transposed conv has the shape `(in_channels, out_channels, kH, kW)` while the weight of a vanilla conv has the shape `(out_channels, in_channels, kH, kW)`:
```
RuntimeError: Given transposed=1, weight of size [channel1, channel2, kH, kW], expected bias to be 1-dimensional with channel2 elements, but got bias of size [channel1] instead
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12281

Differential Revision: D10217370

Pulled By: ezyang

fbshipit-source-id: bc00b439e5ae539276a5e678bdb92af700197bb2
2018-10-05 12:55:14 -07:00
e2d2b270db Revert D10212616: [pytorch][PR] Remove some Type.tensor usages and remove native_tensor without size.
Differential Revision:
D10212616

Original commit changeset: c9cd128d1111

fbshipit-source-id: 923781ba9cd6e60e7c92789832e5601a1fd848b5
2018-10-05 11:55:45 -07:00
705d80b51e Remove some Type.tensor usages and remove native_tensor without size. (#12355)
Summary:
This is to move us along the path to removing Type from the public API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12355

Reviewed By: ezyang

Differential Revision: D10212616

Pulled By: gchanan

fbshipit-source-id: c9cd128d1111ab219cb0b2f3bf5b632502ab97c0
2018-10-05 11:12:07 -07:00
9ebac3d7fe Improve type kind error message (#12344)
Summary:
Address #12326
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12344

Differential Revision: D10210681

Pulled By: driazati

fbshipit-source-id: fcc2e26b79dd2d7d5f9e7ef930e2bf434f2a7e08
2018-10-05 10:57:16 -07:00
0ebbfc25f3 Add utility function make_tensor (#12288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12288

The current implementation of Tensor takes an intrusive_ptr as an argument for storing data. Instead of requiring users to explicitly construct an intrusive_ptr, we want them to pass the constructor args directly; these are forwarded internally through a new helper function called make_tensor.

Reviewed By: ezyang

Differential Revision: D10152661

fbshipit-source-id: bfa72de161ace3fd1c4573427abcd1bfbd12e29e
2018-10-05 10:40:28 -07:00
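
A minimal sketch of the forwarding helper, with std::shared_ptr standing in for intrusive_ptr (names and signatures are illustrative):

```
#include <cstdint>
#include <memory>
#include <utility>

struct TensorImpl {
  explicit TensorImpl(int64_t numel) : numel_(numel) {}
  int64_t numel_;
};

struct Tensor {
  explicit Tensor(std::shared_ptr<TensorImpl> impl) : impl_(std::move(impl)) {}
  std::shared_ptr<TensorImpl> impl_;
};

// Callers pass the TensorImpl constructor args directly and never spell
// out the smart pointer themselves.
template <typename... Args>
Tensor make_tensor(Args&&... args) {
  return Tensor(std::make_shared<TensorImpl>(std::forward<Args>(args)...));
}

// Usage: Tensor t = make_tensor(16);
```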
dd2c487ab0 Enforce invariant that storage_ is always non-null (#12328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12328

- Delete reset() from Storage, as it makes it easy to accidentally
  create a null storage.
- Immediately reject a storage if it is null when passed in

Reviewed By: dzhulgakov

Differential Revision: D10200448

fbshipit-source-id: 14bfa45f8f59859cc350bd9e20e3ef8692e3991d
2018-10-05 09:43:34 -07:00
7788ec9dd1 Remove dangling cmake check for long typemeta (#12356)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12356

Differential Revision: D10212726

Pulled By: Yangqing

fbshipit-source-id: b9c2c778fb496278477ef323ecfefd5d19d1af3c
2018-10-05 09:43:32 -07:00
1e7050072b Make TensorOptions contain optional fields, optimize struct size (#12103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12103

This defers lookup of defaults to the site where we read
out of TensorOptions. THIS IS A BC-BREAKING BEHAVIOR CHANGE,
but we expect the bulk of uses of OptionsGuard don't allocate TensorOptions
inside the OptionsGuard region, and then use it outside of the region
(the situation where behavior could change.)

I also optimize the size of TensorOptions by rearranging fields, so that we
always fit in two 64-bit words.

Reviewed By: goldsborough

Differential Revision: D10052523

fbshipit-source-id: f454a15b4dbf8cd17bc902ab7d2016f2f689ed13
2018-10-05 09:24:53 -07:00
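
A sketch of the two ideas, with made-up field names (std::optional standing in for the optional machinery): defaults are applied at the read site, and narrow grouped fields keep the struct within two 64-bit words.

```
#include <cstdint>
#include <optional>

struct Options {
  std::optional<int16_t> device_index;
  std::optional<int8_t>  dtype_id;
  std::optional<bool>    requires_grad;

  bool requires_grad_or_default() const {
    // The default is consulted here, when reading, not at construction,
    // which is exactly the behavior change described above.
    return requires_grad.value_or(false);
  }
};

static_assert(sizeof(Options) <= 16, "fits in two 64-bit words");
```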
b3cdaee6db Update README.md of ATen Documentation (#12367)
Summary:
The changes clarify how the parsing between the yaml files and the THNN/THCUNN header files works. As issue #12320 shows, it is not easy to understand the existing code without a pointer to the important files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12367

Differential Revision: D10217459

Pulled By: ezyang

fbshipit-source-id: 9b3e64dea4f156843814840e736dc3230332060c
2018-10-05 08:39:55 -07:00
5cb2b2358c Move interned_strings and get build working (#12039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12039

Refactoring out this diff D9819906

Reviewed By: ezyang

Differential Revision: D10024844

fbshipit-source-id: 75b6c93526dc1490299f8b5e564e029146338178
2018-10-05 00:41:18 -07:00
f494f004b7 Fix unintended casting to long (and fix Half overloads)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12357

Reviewed By: Yangqing

Differential Revision: D10213037

Pulled By: li-roy

fbshipit-source-id: 98f7f5ee2b51a3fab378faf65482919caf008957
2018-10-05 00:28:00 -07:00
d4c58216d7 Stop warnings on AT_DECLARE_TENSOR_TYPE(.); (#12348)
Summary:
e.g.,
```
│../aten/src/ATen/core/TensorTypeIdRegistration.h:101:43: warning: extra ‘;’ [-Wpedantic]
│ AT_DECLARE_TENSOR_TYPE(SparseCUDATensorId);
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12348

Differential Revision: D10210072

Pulled By: SsnL

fbshipit-source-id: 90eacc97ef490148c0ac1357cf28f1326a791dfa
2018-10-04 23:16:47 -07:00
d9ba2b6894 Add Pytorch domain specifc ONNX schema for SparseNN ops (#12338)
Summary:
As the title says.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12338

Differential Revision: D10204691

Pulled By: yinghai

fbshipit-source-id: fe6bb8c715a54372508672fc0651841bbc4b8656
2018-10-04 23:16:45 -07:00
bd8980e8c0 Enable CUDA 10 in CI. (#12343)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12343

Differential Revision: D10215274

Pulled By: ezyang

fbshipit-source-id: ab14e0cadd4100d7cfc3c7e924dd92742da3c29e
2018-10-04 23:16:42 -07:00
6544cd4590 Revert D10205876: Fix unintended casting to long
Differential Revision:
D10205876

Original commit changeset: b0678b019b19

fbshipit-source-id: ebd3acc017fd10cf293e1de281ea294da86747be
2018-10-04 21:10:52 -07:00
8e5ac43b4e Fix unintended casting to long
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12341

Reviewed By: ezyang

Differential Revision: D10205876

fbshipit-source-id: b0678b019b196ac9ee52969f80819ee9ee442bf2
2018-10-04 17:41:40 -07:00
16e21e14e3 Fix Caffe2 build on 64-bit Android (#12340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12340

`long` and `int64_t` are the same type on 64-bit Android.

Reviewed By: Yangqing

Differential Revision: D10204892

fbshipit-source-id: 2d5bf707bf87b99fc597c9292b59f032e9004620
2018-10-04 15:14:53 -07:00
f0b73ff790 Pretty printer improvements (#12179)
Summary:
* Replaces `prim::PythonOp` with the name of the function being called
* Delays printing values used in `prim::Return` nodes until the return
node itself if that is the only place the value is used to remove some
useless assigns

zdevito apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12179

Differential Revision: D10132661

Pulled By: driazati

fbshipit-source-id: cbc4ac34137ed5872049082e25d19eb1ebc71208
2018-10-04 15:14:51 -07:00
895994a7c3 Back out "[pytorch][PR] [Build] Use open-source NCCL2 in PyTorch"

fbshipit-source-id: a13075339d3a7b970e81be0b1a32a7c4c3a6c68d
2018-10-04 14:12:04 -07:00
a98489747d Enable sparse functionality and tests (#12323)
Summary:
* Enable sparse functions for ROCm

* Reenable test_sparse unit tests that are now passing in ROCm

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12323

Differential Revision: D10203540

Pulled By: bddppq

fbshipit-source-id: 33ffcfbda32875676c27b33ad1e7cd96fbadc790
2018-10-04 13:43:12 -07:00
39bd73ae51 Guard NumPy usage using USE_NUMPY (#11798)
Summary:
All usages of the `ndarray` construct have now been guarded with `USE_NUMPY`. This eliminates the requirement of NumPy while building PyTorch from source.

Fixes #11757

Reviewed By: Yangqing

Differential Revision: D10031862

Pulled By: SsnL

fbshipit-source-id: 32d84fd770a7714d544e2ca1895a3d7c75b3d712
2018-10-04 12:11:02 -07:00
c064f8a89d Fix build error mkldnn due to corruptted CMAKE_REQUIRED_LIBRARIES (#12195)
Summary:
This fixes a cmake-time compilation error.

When we changed the script to build Caffe2 with mkldnn, some cmake-time compilation support checks (like the one in libsleef) failed due to an incorrect setting of CMAKE_REQUIRED_LIBRARIES. It is a global setting that can interfere with cmake compilation checks if it is not cleaned up properly. FindBLAS.cmake and FindLAPACK.cmake didn't clean up this flag, which caused libsleef.so to be built incorrectly.

yinghai gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12195

Differential Revision: D10159314

Pulled By: yinghai

fbshipit-source-id: 04908738f7d005579605b9c2a58d54f035d3baf4
2018-10-04 11:56:06 -07:00
ae7a7fb398 Use open-source NCCL2 in PyTorch (#12312)
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself

NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12312

Differential Revision: D10190845

Pulled By: teng-li

fbshipit-source-id: 08d42253b774149a66919d194f88b34628c39bae
2018-10-04 11:42:17 -07:00
6b79e16d6d revert test/expect files (#12332)
Summary:
Linter added newline to the expect files in #12144 . This reverts it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12332

Reviewed By: SsnL

Differential Revision: D10201790

Pulled By: Yangqing

fbshipit-source-id: 29f87c013c3522675a765a81a92520fbaea10057
2018-10-04 11:12:57 -07:00
83de6f0dac hip minor fix for c10 (#12329)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12329

Differential Revision: D10201437

Pulled By: Yangqing

fbshipit-source-id: 4e62f5870ad269d7a4f936393d2b3e646d0a6b2c
2018-10-04 11:12:54 -07:00
bcb62cb525 Lazily create tensors in optim_baseline (#12301)
Summary:
Tensors cannot be created globally because of static initialization order issues. So tensors for the optim_baseline test must be created lazily instead. This is fine because these functions will only be called once (in the respective test).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12301

Differential Revision: D10201008

Pulled By: goldsborough

fbshipit-source-id: 59a041f437354e7c6600e5655b3e2d0647dbde9e
2018-10-04 10:55:53 -07:00
1962646d0f Remove CAFFE2_UNIQUE_LONG_TYPEMETA (#12311)
Summary:
CAFFE2_UNIQUE_LONG_TYPEMETA has been a tricky variable defined only from cmake. This is an experiment to remove it and see exactly which compilers need it set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12311

Reviewed By: dzhulgakov

Differential Revision: D10187777

Pulled By: Yangqing

fbshipit-source-id: 03e4ede4eafc291e947e0449382bc557cb624b34
2018-10-04 10:12:13 -07:00
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
c9f7d7b506 mark unit tests as working, skip failing unit test (#12313)
Summary:
* Enabled fp16 tests for test_torch

* Enabled fp16 tests for test_nn

* Enabled multilabelmargin loss for fp16

* Removed skip for test_pdist_empty_col

* Enabled test_nn tests that pass with compiler fixes etc.

* Enabled test_legacy_nn tests that pass with compiler fixes etc.

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12313

Differential Revision: D10189922

Pulled By: bddppq

fbshipit-source-id: a5592817c04b14e355cb062d42ebea406f0c92b6
2018-10-03 23:56:26 -07:00
8c64655460 Open source distributed code (#12254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12254

Move distributed_* code to oss folders

This unblocks adding python bindings

Reviewed By: duc0

Differential Revision: D10141400

fbshipit-source-id: 04d6654b73b6757c4dc4a1ddd9dfa2ce23c8c91d
2018-10-03 21:41:14 -07:00
15367ba9bc Deserialize offset of TreeCursor only when it is not empty (#11465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11465

In one of my test workflow runs, deserialization of dataset_cursor failed. It fails because the offset vector is serialized only when it's non-empty, while deserialization always processes offset_blob whenever it is called. Though I'm still investigating why the offset of dataset_cursor is empty, I think it's good to remove this discrepancy.

Reviewed By: aazzolini, Tianshu-Bao

Differential Revision: D9737636

fbshipit-source-id: bb111933f534b092f29469680ff29e59617655f0
2018-10-03 20:38:59 -07:00
07bb79bd8b Use caffe2::int8::Int8TensorCPU when input type is uint8_t (#12274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12274

We use caffe2::int8::Int8TensorCPU for quantized tensor with uint8_t element type.

Reviewed By: llyfacebook

Differential Revision: D10156452

fbshipit-source-id: 52cf2bedc9dbb433cd5d03f0b76723f7df6a7361
2018-10-03 19:26:16 -07:00
faab6ea922 Split Allocator (#12105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12105

Split CUDA/OpenCL/xxx Allocator from xxxStaticContext::New and rewrite it under at::Allocator interface.

Reviewed By: dzhulgakov

Differential Revision: D10001033

fbshipit-source-id: e1ffbc04c18d1dcb1f8d4ef2cbbb321967de5ccc
2018-10-03 19:10:10 -07:00
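
A sketch of what plugging into the at::Allocator interface looks like (written against the interface of that era; a real device allocator would call the device runtime instead of ::operator new):

```
#include <ATen/ATen.h>

struct MyCPUAllocator final : at::Allocator {
  at::DataPtr allocate(size_t nbytes) const override {
    void* data = ::operator new(nbytes);
    // DataPtr bundles the data pointer, a context, a deleter, and a device.
    return {data, data, &deleter, at::Device(at::DeviceType::CPU)};
  }
  static void deleter(void* ptr) { ::operator delete(ptr); }
};
```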
74dc4460eb New in StaticContext returns at::DataPtr (#12029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12029

In order to remove New() function in StaticContext(to remove StaticContext) and converge to the Allocator design, we'll first change the return type of New to at::DataPtr.

Reviewed By: ezyang

Differential Revision: D9889990

fbshipit-source-id: 3257c763530b987025f428741bdd2e089d11bad4
2018-10-03 19:10:07 -07:00
bcc2a0599b Enable clang-tidy in CI (#12213)
Summary:
At long last, we will have clang-tidy enabled in CI. For a while I thought I could clean up the project enough to enable clang-tidy with all checks enabled, but I figure it's smarter to set up the minimal checks and at least have those in CI. We can fix more going forward.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12213

Differential Revision: D10183069

Pulled By: goldsborough

fbshipit-source-id: 7ecd2d368258f46efe23a2449c0a206d10f3a769
2018-10-03 17:25:06 -07:00
c9f9df002d Properly catch errors in PythonOps (#12243)
Summary:
If a PythonOp throws an error, it raises an exception to the interpreter and also releases the GIL, which causes [pybind to segfault](https://github.com/potassco/clingo/issues/42)

This fix catches pybind errors while the GIL is still held and throws a `python_error` to re-capture the GIL

Fixes #12118

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12243

Differential Revision: D10182787

Pulled By: driazati

fbshipit-source-id: 719d4a7c3294af201e061cf7141bec3ca0fb1f04
2018-10-03 17:25:03 -07:00
557015fd93 wipe cache with writes (#12279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12279

For some reason, if we don't write to the wipe buffer, it doesn't really wipe everything out of the caches on x86.
We also need to wipe out the cache after initializing input blobs.

Reviewed By: Maratyszcza

Differential Revision: D10161211

fbshipit-source-id: c34414dd8b83947805010d7d57e4134d56de1430
2018-10-03 17:12:23 -07:00
6b9afc894b pyHipify Fixes (#12292)
Summary:
This PR makes the following changes:
* Store cuda_to_hip mappings in python OrderedDicts
* Replace cudaError with cudaError_t and remove the cudaError mapping

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12292

Differential Revision: D10184399

Pulled By: bddppq

fbshipit-source-id: b20a4661ba534e4fb12aa738e1ed74dba84f30fc
2018-10-03 17:12:17 -07:00
fe10f3d0c6 Fix up onnxwhile op (#12124)
Summary:
Fix the ONNXWhile op to support nested loops and correctly track loop-carried dependencies. Nested loops should be fully supported together with https://github.com/onnx/onnx/pull/1453
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12124

Differential Revision: D10108817

Pulled By: wanchaol

fbshipit-source-id: 51b948024da857c9962833213ee792f47f054e48
2018-10-03 15:55:58 -07:00
8aa23907e8 Make if block also take control_inputs, preserve SSA (#12224)
Summary:
The If block was missing control inputs during Caffe2 net execution; this PR adds them back and removes the un-SSA semantics.

jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12224

Differential Revision: D10135408

Pulled By: wanchaol

fbshipit-source-id: 746c870bde54ed4ca627167361db1b3f36cd235c
2018-10-03 14:29:01 -07:00
b548f8320d Reduce size of TensorImpl from 160 bytes to 128 bytes (#12266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12266

- Put all byte-size fields together (booleans and TensorTypeId),
  so they can be coalesced into a single word.
- Replace std::vector<int64_t> strides with
  std::unique_ptr<int64_t[]>, saving two words.

Reviewed By: dzhulgakov

Differential Revision: D10150834

fbshipit-source-id: f54f38eec34732f3ff7e52e00b1371d7b5b210eb
2018-10-03 14:28:59 -07:00
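
A sketch of the two savings (illustrative; field names are made up, and the vector/unique_ptr sizes hold on typical implementations such as libstdc++ and libc++):

```
#include <cstdint>
#include <memory>
#include <vector>

struct PackedImpl {
  std::unique_ptr<int64_t[]> strides;  // 1 word instead of vector's 3
  int64_t ndim;                        // the length lives here instead
  // Byte-sized fields placed together so they coalesce into one word:
  uint8_t type_id;
  bool is_contiguous;
  bool is_wrapped_number;
};

static_assert(sizeof(std::unique_ptr<int64_t[]>) == sizeof(void*), "");
static_assert(sizeof(std::vector<int64_t>) == 3 * sizeof(void*), "");
```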
2217c0b408 create the onnx_root in local, and link it
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12294

Reviewed By: BIT-silence

Differential Revision: D10178208

Pulled By: houseroad

fbshipit-source-id: 6105b88ea5f3ce9164961cf13b356d85178c374d
2018-10-03 13:55:56 -07:00
3db9738b30 add torch factory methods (zeros/ones) to onnx symbolic
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11477

Differential Revision: D9761637

Pulled By: wanchaol

fbshipit-source-id: 401f8d43a831685a444e88509bace94ce5b94e52
2018-10-03 13:55:54 -07:00
01d835c9b2 Revert D10128131: [nomnigraph] Add move{Node,Edge,Subgraph} for Graph move-like semantics
Differential Revision:
D10128131

Original commit changeset: b0e17ec2802c

fbshipit-source-id: c4a922c10ce8eddc965447b3cc4b6b01dd26dabb
2018-10-03 13:11:23 -07:00
d1ac1eba3b Add bool type to IR (#11834)
Summary:
This PR adds a bool type to `IValue` and puts it into place.

* changes conds for `prim::If` and `prim::Loop` to use `bool` type
* changes operators that take `bool`s to match their native ops
* fixes ambiguous `aten` ops `aten::std` and `aten::var`
	* fixes tests in `test_jit.py TestJitGenerated`
		```
		'test_std_dim',
		'test_std_dim_1d',
		'test_std_dim_1d_neg0',
		'test_std_dim_neg0',
		'test_var_dim',
		'test_var_dim_1d',
		'test_var_dim_1d_neg0',
		'test_var_dim_neg0'
		```
* adds `prim::BoolToTensor` and `prim::TensorToBool`

apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11834

Differential Revision: D9928570

Pulled By: driazati

fbshipit-source-id: 373c53df2f1a8ffa9e33d9a517002fbeef25f3eb
2018-10-03 12:40:03 -07:00
c029c839a1 MIOpen 1.5 group conv API integration (#12273)
Summary:
This PR contains changes for:
1. Group convolutions introduced in MIOpen 1.5
2. Checks to initialize MIOpen conv operator descriptors only when needed (inputs or weights changed)

Differential Revision: D10174611

Pulled By: bddppq

fbshipit-source-id: cd3d61fae350c4a5e540ce1a6e08012e0e2689fe
2018-10-03 12:26:58 -07:00
a839ec805a Add move{Node,Edge,Subgraph} for Graph move-like semantics
Summary: Adding back import{Node,Edge} as move{Node,Edge} and adding a new function moveSubgraph

Reviewed By: duc0, yyetim

Differential Revision: D10128131

fbshipit-source-id: b0e17ec2802cb211b6455578fdb17dab2a7a425b
2018-10-03 12:26:55 -07:00
b911ca9b0d docs: change links to https (#12258)
Summary:
Hi, I think it might be better to use https instead of http in the README.md.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12258

Differential Revision: D10162279

Pulled By: soumith

fbshipit-source-id: 4658aa75175909b4fea6972b437765d8b49c749f
2018-10-03 06:33:09 -07:00
080266e79c Document CUDAHOSTCXX environment variable (#12265)
Summary:
This variable is already being used so this just serves to document that. I think it's an important variable, too, so it should definitely be documented there somewhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12265

Differential Revision: D10162261

Pulled By: soumith

fbshipit-source-id: e0d01e012c2fedea63372de9967a8eaa3745fe94
2018-10-03 06:33:06 -07:00
1fb8925efe Fix typo LMBD->LMDB in docs of setup.py (#12282)
Summary:
`setup.py` reads `USE_LMDB` rather than `USE_LMBD`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12282

Differential Revision: D10162025

Pulled By: soumith

fbshipit-source-id: 6295a777be10509ca49516ad7c10061d26b6f9c9
2018-10-03 06:14:19 -07:00
c0ed48a57e Add support to the accuracy metric (#12211)
Summary:
The code that reads a blob from input files is broken; this fixes it. Also adds a binary that converts input files to blobs that can be used by Caffe2 directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12211

Reviewed By: llyfacebook

Differential Revision: D10121845

Pulled By: sf-wind

fbshipit-source-id: 6e48bb594680bcb3186d8d43276b602041c30d3e
2018-10-03 02:10:51 -07:00
06360c3050 Back out "Deduplicate canonical_axis_index_ with maybe_wrap_dim"
Summary: Original commit changeset: 13c98fff0880

Reviewed By: ezyang

Differential Revision: D10153342

fbshipit-source-id: c74c56e61662e9c747206e812b1da22170cbf742
2018-10-02 16:40:21 -07:00
a76216b8ed Back out "[aibench] Use caffe2::int8::Int8TensorCPU when input type is uint8_t"
Summary: Original commit changeset: b63cd3a75f87

Reviewed By: bddppq

Differential Revision: D10154512

fbshipit-source-id: 039dfd295c5d1de799993a20e708915be65e9d76
2018-10-02 16:25:11 -07:00
035d04299c Update onnx to onnx/onnx@ddf8eb6 (#12267)
Summary:
ddf8eb6aa0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12267

Reviewed By: yinghai

Differential Revision: D10151536

Pulled By: bddppq

fbshipit-source-id: 4cb04fcc0377c6c39fb318c5fc7043e67c400866
2018-10-02 15:57:43 -07:00
04b0774964 Use caffe2::int8::Int8TensorCPU when input type is uint8_t (#12250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12250

We use caffe2::int8::Int8TensorCPU for quantized tensor with uint8_t element type.

Reviewed By: llyfacebook

Differential Revision: D10121216

fbshipit-source-id: b63cd3a75f87e043cc3c83de4f3520b6ffbf1d07
2018-10-02 14:57:28 -07:00
7c678746ef update the script to match the current build process
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12262

Reviewed By: BIT-silence

Differential Revision: D10148658

Pulled By: houseroad

fbshipit-source-id: c083346cc40154f7baea1be713cac799cf076cbf
2018-10-02 14:01:37 -07:00
29e5ba8a7b Fix for LibTorch download link (#12263)
Summary:
We now have a proper download link for libtorch.

ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12263

Differential Revision: D10149216

Pulled By: goldsborough

fbshipit-source-id: e9caefed1c7f8e25d7623d72c8548bfdb6114329
2018-10-02 12:25:25 -07:00
1d3f650ce4 Revert D10098106: [pytorch][PR] [WIP] New version of PT1 model format
Differential Revision:
D10098106

Original commit changeset: 94ec7fc57c84

fbshipit-source-id: 38f729b0970618f38359797b806cbbcd865f4715
2018-10-02 00:43:40 -07:00
ff608a9ff3 Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232

Original commit changeset: fca91fea58b7

This adds proper modifications to the DeviceType <->DeviceOption conversion code added in D10033396

Reviewed By: jerryzh168

Differential Revision: D10132473

fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b
2018-10-01 21:54:52 -07:00
696498d9e4 Delete stride updating logic from Caffe2, and make PyTorch error in this case. (#12236)
Summary:
Strides appear to cause a huge memory regression in some of our internal
training workflows. This diff stems the bleeding, while we figure out exactly
what happened.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12236

Reviewed By: dzhulgakov

Differential Revision: D10134319

fbshipit-source-id: 1547c89a65c05473c409c0977c19c99dcaefb89c
2018-10-01 21:25:04 -07:00
2cbcaf4544 Skip failing tests in test_sparse (#12229)
Summary:
Skip the recently introduced tests that fail on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12229

Differential Revision: D10138146

Pulled By: bddppq

fbshipit-source-id: a0f1ff97fabb71f635a468e8030dbe32d388de49
2018-10-01 18:31:45 -07:00
8af06d8114 Use DFS scheduling only within single device (#11848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11848

Avoid crossing the boundary between devices when using DFS scheduling

Reviewed By: romain-intel

Differential Revision: D9931091

fbshipit-source-id: 1f3cf52127830048ed1db50b01677b66eeed8b32
2018-10-01 18:31:43 -07:00
ecace9eb21 Move crf in caffe2 from fb to oss (#12200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12200

moved crf_viterbi_op, copied crf_predict and crf_viterbi_test to oss

Reviewed By: Yangqing

Differential Revision: D10118341

fbshipit-source-id: 51e30e57d280d6ca75fc0b488f743794f23b589f
2018-10-01 18:31:41 -07:00
26df16eb21 Clear previous device option when keep_device is set in load op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12240

Reviewed By: jerryzh168

Differential Revision: D10133933

fbshipit-source-id: 05935bd527177f936c1d08626888d43dedbf5ce4
2018-10-01 17:20:26 -07:00
23f86ad57f Back out "[caffe2][mpscnn] Enable multiple external output"
Summary: Original commit changeset: 0cea9469cea0

Differential Revision: D10135814

fbshipit-source-id: 9563361cc00f4ce5dc2e903c0fcb10643ee9af26
2018-10-01 16:55:32 -07:00
35becd1879 New version of PT1 model format (#12149)
Summary:
Considered four different existing formats: 1) static graph, 2) torch script, 3) pickle files, 4) PyTorch C++ serialize APIs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12149

Reviewed By: BIT-silence

Differential Revision: D10098106

Pulled By: houseroad

fbshipit-source-id: 94ec7fc57c842e50fae5286ddeda657a4967a07a
2018-10-01 15:57:02 -07:00
8fa7de35f2 Enable ROCM clang-7 build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12223

Differential Revision: D10133697

Pulled By: bddppq

fbshipit-source-id: c1de99afccdad415ac1beb85d3b8ab44f9b58738
2018-10-01 15:11:40 -07:00
15d28e400f remove support for c extensions (#12122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12122

We are deprecating support for c extensions. Please use cpp extension in the future.

Reviewed By: Yangqing

Differential Revision: D10060541

fbshipit-source-id: 4f7149e06a254bd7af463fd7aa9740f65369963a
2018-10-01 13:55:28 -07:00
1b59cf8b51 Add support to use llvm 7 in CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12182

Differential Revision: D10129630

Pulled By: bddppq

fbshipit-source-id: f0217336474b807f03f84a4b8052ce92a6e3564b
2018-10-01 13:39:50 -07:00
06f535d8a0 More debug info in plan executor (#12183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12183

Adding more debug info printed from plan executor

Reviewed By: manojkris

Differential Revision: D10113104

fbshipit-source-id: dddc9aec8012c8575ab305033388412fdaaac537
2018-10-01 12:56:32 -07:00
eba1cf2145 Unify style (#11949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11949

Unify naming style

Reviewed By: yinghai

Differential Revision: D9931227

fbshipit-source-id: b6956bd98ed8625623e4747d616989f9f3a2ed46
2018-10-01 12:56:29 -07:00
3010dc4208 Revert D10123245: Back out "codemod cuda_gpu_id to device_id"
Differential Revision:
D10123245

Original commit changeset: d83da8e00a12

fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b
2018-10-01 12:22:36 -07:00
ecb3835387 change \gamma to \Gamma (#12214)
Summary:
- revert `\gamma` changes at landed PR: https://github.com/pytorch/pytorch/pull/12126
- minor fix for docs of `torch.norm()`

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12214

Differential Revision: D10127337

Pulled By: weiyangfb

fbshipit-source-id: 15eb8abda39ec9e8b2e815e2a22096cae786995a
2018-10-01 11:31:18 -07:00
7d7d336c45 Back out "codemod cuda_gpu_id to device_id"
Summary:
Original commit changeset: f5614a5d2607

D9986213 is causing Multifeed Aggregator a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and has been blocking the aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock the aggregator push.

Reviewed By: orionr

Differential Revision: D10123245

fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
2018-10-01 11:31:14 -07:00
e43ffb0148 nomnigraph - easy - some code cleanup for transformations_test (#12101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12101

clean up some duplicate test code

Reviewed By: ZolotukhinM

Differential Revision: D10051914

fbshipit-source-id: 698ff144a85e8c70572116c5ddb415cd2396b4e3
2018-10-01 11:31:08 -07:00
006171fffc Back out "[pytorch][PR] Revert "Move CreateContext to global registry (#11688)"" (#12121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12055

Original commit changeset: 6ca9de65b707

Reviewed By: ezyang

Differential Revision: D10033396

fbshipit-source-id: ca9f4b2f7ef0561f619b833415d394a8b9972bf4
2018-10-01 11:10:46 -07:00
fed91f873f (Very small) allow trailing commas in assign or tuples (#11723)
Summary:
Allow trailing commas in assign statements or tuples, which also allows single-element tuples.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11723

Differential Revision: D10052162

Pulled By: eellison

fbshipit-source-id: 344d908a3ad942a23ebd9f341794bc9734226aa8
2018-10-01 10:10:13 -07:00
f3c32a4b54 dnnlowp_16 -> dnnlowp_acc16 (#12205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12205

We're more interested in testing the performance of DNNLOWP_ACC16 engine.

Reviewed By: llyfacebook

Differential Revision: D10121080

fbshipit-source-id: 7def38be838feb7636f7dd0c8ed352c2df398ec1
2018-10-01 09:40:13 -07:00
9768b4d4ff support half float for SparseLengthsIndicesInGradientWeightedSumWithMainInputGradient (#12186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12186

Specialized implementation: preconvert embeddings to float and do everything in fp32.

Reviewed By: jspark1105

Differential Revision: D10100603

fbshipit-source-id: 3255b4addb6fda24722bd519163099f5d354d084
2018-09-30 23:56:14 -07:00
c3817e85fa Temporary fix for LibTorch download link (#12212)
Summary:
We're waiting for the libtorch links to show up on the website. I had a fake link in the docs so far which is misleading. This PR changes it to a temporary markdown file until the web people fix the site tomorrow.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12212

Differential Revision: D10121872

Pulled By: goldsborough

fbshipit-source-id: f1bd1315f7333b9168e99983f3f6b679c9b0c52a
2018-09-30 15:39:51 -07:00
572132fb17 copy_(Sparse, Sparse) for sparse tensor (#9005)
Summary:
- fix #8330
- add `torch.copy_(Sparse, Sparse)` with autograd support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9005

Differential Revision: D8987885

Pulled By: weiyangfb

fbshipit-source-id: b317a41da22ee1eae2835622a0ed28a6771a3a06
2018-09-30 11:55:09 -07:00
93ecf4d72a Remove raise_from (#12185)
Summary:
soumith

CC alsrgv

Fixes #11995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12185

Differential Revision: D10120103

Pulled By: goldsborough

fbshipit-source-id: ef7807ad83f9efc05d169675b7ec72986a5d17c3
2018-09-29 22:41:55 -07:00
5ffc915f26 fix docs (#12126)
Summary:
- fix https://github.com/pytorch/pytorch/issues/12120
- add `torch.argsort`, `torch.pdist`, `broadcast_tensors` to *.rst files
- add parameter dim to `torch.unique` doc
- fix table and args for `torch.norm`
- test plan: make html and check docs in browser

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12126

Differential Revision: D10087006

Pulled By: weiyangfb

fbshipit-source-id: 25f65c43d14e02140d0da988d8742c7ade3d8cc9
2018-09-29 22:26:45 -07:00
40aa212cd6 Support fp16 mkl engine in training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12080

Reviewed By: hyuen

Differential Revision: D10037719

fbshipit-source-id: 618ce894eccc4c87a038dc3ab836684f16843cde
2018-09-29 21:55:11 -07:00
a2ebbccc9f fix unit tests on CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12187

Differential Revision: D10118483

Pulled By: bddppq

fbshipit-source-id: 986c8fb48d61e00103c713548a50e74489a0e442
2018-09-28 23:11:55 -07:00
878e7740fd Turns optimizations off when checking trace (#12172)
Summary:
Currently, when tracing, optimizations are performed twice. This means that optimizing passes, like the fusion pass, are also called twice. This is unnecessary, and this PR turns off optimizations when checking the trace (since the trace is independent of optimizations). This should improve performance and debugging.

apaszke, who proposed this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12172

Reviewed By: ezyang

Differential Revision: D10109250

Pulled By: apaszke

fbshipit-source-id: 8b3385eae143446820f1b61ca7576d7c07f9b248
2018-09-28 19:40:10 -07:00
22ce6060ec Add caffe2_api to exported functions (#12184)
Summary:
Broke the build, sorry.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12184

Differential Revision: D10114818

Pulled By: bwasti

fbshipit-source-id: 49844183a48d9383c5055a9ce06fe61fbf353050
2018-09-28 18:12:00 -07:00
ebc2643498 Enable multiple external output (#10957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10957

att

Differential Revision: D9525097

fbshipit-source-id: 0cea9469cea06cbfd3828549b168483413788269
2018-09-28 18:11:58 -07:00
0a5dfa5a52 Add support for device annotations on blobs
Summary: device annotations on blobs with Declare and Export trick

Reviewed By: yyetim

Differential Revision: D9999916

fbshipit-source-id: 0bd4d15e7beed2788f47255d52ea296f8f674295
2018-09-28 14:11:54 -07:00
08e5ca1262 Add filter<T>(NNModule) and explicit Declare/Export classes (#11955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11955

Adding a `filter<T>(NNModule)` function to easily get inputs/outputs of a DAI-style NNModule.

Reviewed By: duc0

Differential Revision: D9997696

fbshipit-source-id: 818c4f2e3093e0d02b35e6632b426e8d3189c21e
2018-09-28 14:11:53 -07:00
60061a20d9 Adding Declare and Export operators (#11954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11954

Adding an alternative to external_input and external_output for use in some distributed settings

Reviewed By: aazzolini

Differential Revision: D9997121

fbshipit-source-id: 1b5cc03fd3051368a3edc69e7bc472386f5746b5
2018-09-28 14:11:51 -07:00
7b2c0a09e4 Adds support for NaN, +inf, -inf float scalars to CPU and CUDA fusers (#12070)
Summary:
In current upstream float scalars are always written into kernels with:

`out << std::scientific << v << "f";`

When the floats are special values like NaN, +inf, or -inf this produces nonsense that causes compilation to fail. This fix updates the conversion of float scalars to device-specific special values. The appropriate macros are added to the CPU and CUDA resource strings. Note that a NAN macro was not necessary on the CPU since math.h defines NAN.

To verify this fix I updated the test_clamp_fusion test in test_jit.py. I wanted to test -inf, too, but -inf is not currently accepted by the interpreter.

Edit:

Forgot to mention, this partially addresses issue #12067.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12070

Reviewed By: ezyang

Differential Revision: D10044704

Pulled By: soumith

fbshipit-source-id: 8f4a930862d66a7d37d985e3f6a6fb724579e74c
2018-09-28 14:11:49 -07:00
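
A sketch of the emission fix (the POS_INFINITY/NEG_INFINITY names stand for macros assumed to be defined in the kernel preamble; NAN comes from math.h on the CPU):

```
#include <cmath>
#include <sstream>
#include <string>

std::string emitFloat(float v) {
  // std::scientific prints "nan"/"inf" for special values, which is not
  // valid C/CUDA source; map them to preamble macros instead.
  if (std::isnan(v)) return "NAN";
  if (std::isinf(v)) return v > 0 ? "POS_INFINITY" : "NEG_INFINITY";
  std::ostringstream out;
  out << std::scientific << v << "f";  // the original path, finite values only
  return out.str();
}
```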
0e779c27e1 Deduplicate canonical_axis_index_ with maybe_wrap_dim (#11891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11891

maybe_wrap_dim is a slightly more general function, which is able
to, under some circumstances, treat 0 as a "valid" dimension even
with a tensor is scalar.  canonical_axis_index_ never accepts
this behavior, so it always passes false.

Reviewed By: jerryzh168

Differential Revision: D9968320

fbshipit-source-id: 13c98fff0880d7bfcd00911a76c8aa10d37bd183
2018-09-28 14:11:48 -07:00
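
A sketch of maybe_wrap_dim's semantics (not the exact ATen source):

```
#include <cstdint>
#include <stdexcept>

// With wrap_scalar == true, a 0-dim tensor is treated as 1-dim so that
// dim 0 and dim -1 stay valid; canonical_axis_index_ always passes false.
inline int64_t maybe_wrap_dim(int64_t dim, int64_t ndim, bool wrap_scalar) {
  if (ndim <= 0) {
    if (!wrap_scalar) throw std::out_of_range("dimension out of range for scalar");
    ndim = 1;
  }
  if (dim < -ndim || dim >= ndim) throw std::out_of_range("dimension out of range");
  return dim < 0 ? dim + ndim : dim;
}
```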
ab9a5976a0 Disable inlinining of EnforceFailMessage (#12078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12078

The constructor is inlined multiple times

Reviewed By: salexspb

Differential Revision: D9358084

fbshipit-source-id: c8d4177a3fcccac574ee4f63336a6fa8bfb07d11
2018-09-28 11:24:35 -07:00
8009b6cdb5 Kill self_ty in TYPE_DERIVED_DEFINITION_NATIVE (#11903)
Summary:
This allows us to call the type argument with name other than `self_ty`. ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11903

Differential Revision: D10105029

Pulled By: SsnL

fbshipit-source-id: 0fbdc728123ebc1154d080628cb41a085ba3e6d7
2018-09-28 11:09:50 -07:00
e7e10e60e0 Introduce builtin script functions (#12141)
Summary:
This functionality replaces the Scalar-Tensor builtin operators
with builtin functions.

Builtin functions are used in place of operators where one operator
can be defined as a composition of others. This simplifies later
optimization passes by allowing us to have fewer operators.

In the future, builtin functions can be used for other purposes.
For example, we can define derivative functions as code rather than
building graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12141

Reviewed By: ezyang

Differential Revision: D10088065

Pulled By: zdevito

fbshipit-source-id: a2acb06346e649c4c8a2fe423b420871161c21cf
2018-09-28 10:55:08 -07:00
65bf181ddf Add "ai.onnx.pytorch" onnx domain (#12157)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12157

Differential Revision: D10100799

Pulled By: bddppq

fbshipit-source-id: 76fdd126e0b52c54276752b3b0174735355a7d2f
2018-09-28 09:57:06 -07:00
0aff3cc559 Fix broadcasting bug in StudentT (#12148)
Summary:
This fixes a broadcasting error with the `StudentT` distribution

- [x] added a regression test
- [x] strengthened parameter broadcasting tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12148

Differential Revision: D10099226

Pulled By: soumith

fbshipit-source-id: 0c5eb14180d158f8fff28ceb9e7cd3471c2bb803
2018-09-28 09:57:02 -07:00
b0248df72a Docs: Change cuda(async) —> cuda(non_blocking) (#12158)
Summary:
goldsborough Modify the docs to match the changes made in #4999
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12158

Differential Revision: D10103964

Pulled By: SsnL

fbshipit-source-id: 1b8692da86aca1a52e8d2e6cea76a5ad1f71e058
2018-09-28 08:39:27 -07:00
5be0baefa2 Use streams in JIT serialization, allow JIT serialization to/from buffer (#11932)
Summary:
This PR replaces the use of `std::FILE` with `istream`/`ostream` for JIT serialization.
It uses this mechanism to add the possibility to serialize to/from binary buffers, in addition to files, both in `libtorch` and from Python.

`getExportImportCopy` in `test_jit.py` has been updated so that both file and buffer codepaths are exercised during tests.
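
For illustration, roughly what the Python-side buffer path enables (using today's torch.jit.save/torch.jit.load names, which may postdate this PR):

```
import io
import torch

@torch.jit.script
def add_one(x):
    return x + 1

buf = io.BytesIO()
torch.jit.save(add_one, buf)   # serialize into an in-memory buffer
buf.seek(0)
f = torch.jit.load(buf)        # load back without touching the filesystem
print(f(torch.zeros(2)))
```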
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11932

Differential Revision: D10084303

Pulled By: apaszke

fbshipit-source-id: b850801b3932922fa1dbac6fdaed5063d58bc20d
2018-09-28 07:54:27 -07:00
d291cf7de6 Ensuring positive definite matrix before constructing (#12102)
Summary:
Ensuring positive definite matrix in Multivariate Normal Distribution
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12102

Reviewed By: ezyang, Balandat

Differential Revision: D10052091

Pulled By: jeffreyksmithjr

fbshipit-source-id: 276cfc6995f6a217a5ad9eac299445ff1b67a65f
2018-09-28 07:27:20 -07:00
04c0971679 Special case BatchGather and BatchGatherGradient for block_size=1. (#11349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349

Special case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X for this case.
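
For intuition, the semantics being specialized, sketched in Python (layout details here are assumptions):

```
import torch

# BatchGather picks entries of `data` along axis 1 for every batch row;
# block_size is the element count of the trailing dimensions. With
# block_size == 1 each gathered "block" is a single scalar, which the
# specialized kernel turns into a tight copy loop.
data = torch.arange(12.0).reshape(2, 6)   # no trailing dims -> block_size == 1
indices = torch.tensor([0, 2, 5])
print(data[:, indices])                   # shape (2, 3)
```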

Reviewed By: jspark1105, ilia-cher

Differential Revision: D7218043

fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
2018-09-27 21:11:38 -07:00
f5a0c337ba Move TensorImpl IsType, meta, dim32, dim, ExtractDeviceOption to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12100

Reviewed By: jerryzh168

Differential Revision: D10051424

fbshipit-source-id: 5986e92ea54e60ec6bfe992015a05e09288c948c
2018-09-27 20:40:03 -07:00
bbae57d06e Move TensorImpl size_from_dim, size_to_dim, size_between_dim, canonical_axis_index to caffe2::Tensor (#12099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12099

- Generalize the free functions to accept IntList, not just std::vector<int64_t>

Reviewed By: jerryzh168

Differential Revision: D10051365

fbshipit-source-id: e3d571bf8fead22f6f25c3ca46f0c38c2bb065d2
2018-09-27 20:40:00 -07:00
3eb5940cf5 codemod cuda_gpu_id to device_id (#12022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022

codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

codemod with 'Yes to all'

Reviewed By: orionr

Differential Revision: D9986213

fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1
2018-09-27 20:24:53 -07:00
149403f849 Move TensorImpl ndim, size, itemsize and nbytes to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12098

Reviewed By: jerryzh168

Differential Revision: D10051298

fbshipit-source-id: a833fad74bbda38c019ec2cb97d4bb6804e09963
2018-09-27 19:56:00 -07:00
7f35e92af2 mutable lists (#10700)
Summary:
This PR implements the design that we discussed. Changes:
- Added a World token IValue and type. The IValue is basically a dummy struct for now, in the future we may extend it (say, add thread-local state).
- Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema.
- Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences
- AnnotateEffects pass which wires up all the world tokens together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700

Reviewed By: eellison

Differential Revision: D9547881

Pulled By: michaelsuo

fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88
2018-09-27 19:25:13 -07:00
a5818047c4 Rewrite serialization to correctly handle partial reads/writes in all cases (#12143)
Summary:
Previously, doRead/doWrite were functions that could return partial reads/writes,
and we checked for this case inconsistently in the call sites of serialization.cpp.
Now, these functions do NOT return the amount of bytes read/written, and instead
handle the necessary checking loop themselves.
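
A minimal Python sketch of the centralized checking loop (the real code is C++ in serialization.cpp):

```
import os

def read_exactly(fd, nbytes):
    # A single os.read, like a single doRead, may legally return fewer
    # bytes than requested; loop until the full request is satisfied.
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = os.read(fd, nbytes - len(buf))
        if not chunk:
            raise EOFError("unexpected end of file during deserialization")
        buf.extend(chunk)
    return bytes(buf)
```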

Fixes #12042. Maybe.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12143

Differential Revision: D10097027

Pulled By: ezyang

fbshipit-source-id: fd222ab8a825bed352153648ad396acfe124a3e1
2018-09-27 19:09:53 -07:00
a86a61b004 Implement caffe2::Tensor::raw_data() in terms of data()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12097

Reviewed By: jerryzh168

Differential Revision: D10051202

fbshipit-source-id: b4b61869363a606ab465d1500558226efae30d06
2018-09-27 18:40:37 -07:00
2021b26bcb Move TensorImpl::ShareExternalPointer helper overloads to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12096

Reviewed By: jerryzh168

Differential Revision: D10051126

fbshipit-source-id: a9b95d00512a0b4e6339d4f3f0bb180dd0c79247
2018-09-27 18:40:35 -07:00
976a9e0454 Move TensorImpl::DebugString() to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12095

Reviewed By: jerryzh168

Differential Revision: D10051078

fbshipit-source-id: f56b6fc5d1cb8ae4b636e88efe607fe65cc1d7a0
2018-09-27 18:40:33 -07:00
b0e48aa197 Move TensorImpl::Reshape(vector<int>) to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12094

Reviewed By: jerryzh168

Differential Revision: D10051079

fbshipit-source-id: 87fb91f31c33ce9b64c4654e79e0131ae391cd78
2018-09-27 18:40:30 -07:00
8c533c2c90 Fix bug where Reshape() trashes strides.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12092

Reviewed By: jerryzh168

Differential Revision: D10051005

fbshipit-source-id: c36d1c8d12fb41baf8d1a1a9f38776deeff242de
2018-09-27 18:40:28 -07:00
d02478e607 Move TensorImpl::ResizeLike to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12091

Reviewed By: jerryzh168

Differential Revision: D10051012

fbshipit-source-id: 772ecd2e377f7d4e1ae510c1f647f6c8b71e5a57
2018-09-27 18:40:25 -07:00
dd73d57643 Move TensorImpl::ShrinkTo to caffe2::Tensor (#12090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12090

This is a slight pessimization because we need to do a
full recompute of is_contiguous(), even though a modification
of dim-0 is guaranteed to preserve contiguity.

Reviewed By: jerryzh168

Differential Revision: D10050905

fbshipit-source-id: b99233e21c9f4275b0db6e76740462e5430ce152
2018-09-27 18:40:23 -07:00
00c6fb16e7 Move ExtendTo to caffe2::Tensor from TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12089

Reviewed By: jerryzh168

Differential Revision: D10050859

fbshipit-source-id: 843067aacfa2a519657220bc39a0f499582a48a4
2018-09-27 18:40:21 -07:00
6a2dbc9808 Rename TensorImpl::GetDeviceType to device_type, and properly test if is_variable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12087

Reviewed By: jerryzh168

Differential Revision: D10050781

fbshipit-source-id: 0b6c9d7caf3b1000691f86fcc7f2ef203936a29f
2018-09-27 18:40:19 -07:00
c5fc2f1105 Merge UndefinedTensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11972

Reviewed By: gchanan, Yangqing, jerryzh168

Differential Revision: D9995633

fbshipit-source-id: 6b4645c9d4bb0bc4301cd4bcfa76cf85331b8379
2018-09-27 18:40:16 -07:00
e8cb6cb9d2 Fix some symbolics for ReduceSum, GE, LE (#12123)
Summary:
ReduceSum negative indices are converted to positive ones, since Caffe2 does not support them. The GE/LE symbolic operand order was wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12123

Reviewed By: houseroad

Differential Revision: D10095467

Pulled By: wanchaol

fbshipit-source-id: eb20248de5531c25040ee68b89bd18743498138d
2018-09-27 17:40:46 -07:00
f6abd16a9d Merge TensorImpl. (#11971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11971

- Switched TensorImpl::data<T>() to use Storage::unsafe_data<T>() to work
  around an outstanding bug in the Storage::data<T>() implementation
  where it only works on Ts which are valid ScalarType
- Qualify a bunch of identifiers which still live in caffe2:: namespace
- strides returns an IntList now
- s/update_strides/update_to_contiguous_strides/
- Correctly compute type_id_ for the Storage only constructor from Caffe2.
  This is special cased to only work for CPU and CUDA dense tensors.
- Fix some signed-unsigned comparisons in Caffe2 code (OSS build for
  ATen/core has more restrictive warning tests.)

Reviewed By: jerryzh168

Differential Revision: D9995559

fbshipit-source-id: 9c74032e011189e1c7e9a98d20f2bd1e25ad2e5c
2018-09-27 17:40:44 -07:00
1619264ca5 Make ATen-core and caffe2 mutually recursive / merge template data<T>() (#11970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11970

Adds an ATen-core-headers target, which caffe2_cpu_internal depends
on, and makes ATen-core depend on caffe2_headers.  If you link against
ATen-core, you must ALSO link against caffe2_cpu_internal; if you
link against caffe2_cpu_internal, you must ALSO link against ATen-core,
otherwise you'll have undefined symbols.

Then, we merge template data<T>() method with Caffe2 implementation,
demonstrating that includes to Caffe2 (core) from ATen/core are working

Reviewed By: jerryzh168

Differential Revision: D9967509

fbshipit-source-id: 3d220c38b2c3c646f8ff2884fdcc889fa9276c7a
2018-09-27 17:40:42 -07:00
c35f85a6d4 Export symbols for pybind and other libs after caffe2 rebase (#11975)
Summary:
Export symbols for pybind and other libs after caffe2 rebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11975

Differential Revision: D10042615

Pulled By: yinghai

fbshipit-source-id: 6de562d99403099113093716834abc51bf726e94
2018-09-27 14:40:27 -07:00
80e3081c28 Add observers for mkldnn fallback operators (#9093)
Summary:
Add observers for ideep operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9093

Reviewed By: salexspb

Differential Revision: D9952949

Pulled By: yinghai

fbshipit-source-id: 1678d1a738f8781dc75eb3cb9dfb309f7b7934fb
2018-09-27 14:11:19 -07:00
6e7e63fda3 Implementation MomentumSGD/MomentumSGDUpdate operators for mkl-dnn (#11686)
Summary:
The speed-up of a single operation is up to 6X on BDW (Broadwell).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11686

Reviewed By: yinghai

Differential Revision: D9828129

Pulled By: wesolwsk

fbshipit-source-id: 7dbacea90609e18438f6fe1229c641937d0696c8
2018-09-27 13:39:59 -07:00
13cf39294d Remove ATen/Error.h and use ATen/core/Error.h instead. (#12132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12132

TSIA. No code change involved.

Reviewed By: bwasti

Differential Revision: D10083237

fbshipit-source-id: bdab029015b9d0f1fa1f866c68aa5945cc68db9d
2018-09-27 10:11:17 -07:00
a72603f8f8 Fix for ppc64le jit graph difference in sigmoid backward, see #10726 (#11579)
Summary:
As reported in Issue #10726, the jit compiler, when running on ppc64le, may produce an isomorphic output but fail a diff test against the expected output file. The expected output file is created from a test that was run on x86_64. This change ensures that if the ppc64le test output is different, it is instead compared to an expected output file created when the test is run on a ppc64le system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11579

Differential Revision: D10080890

Pulled By: soumith

fbshipit-source-id: 7249bf6b5dfa7c853368a3688a982bc9ed642bc9
2018-09-27 07:09:31 -07:00
9c49bb9ddf Move registry fully to c10 (#12077)
Summary:
This does 6 things:

- add c10/util/Registry.h as the unified registry util
  - cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10

Also, an important note that we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077

Reviewed By: ezyang

Differential Revision: D10050771

Pulled By: Yangqing

fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
2018-09-27 03:09:54 -07:00
383d340e88 Small optimization for adam (#12107)
Summary:
Apply weight decay for Adam in-place instead of via copy.

Synced offline with soumith, who mentioned that it should be OK. This is also consistent with other optimizers, e.g. eee01731a5/torch/optim/sgd.py (L93)
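
Roughly, the change amounts to the following (a sketch, not the actual diff; the modern add_ keyword spelling is shown, while the 2018 code used grad.add_(weight_decay, p.data)):

```
import torch

def apply_weight_decay(p, weight_decay):
    # Fold weight decay into the existing grad buffer in place (add_)
    # instead of allocating a copy via grad.add(...).
    if weight_decay != 0:
        p.grad.data.add_(p.data, alpha=weight_decay)

p = torch.nn.Parameter(torch.randn(3))
p.grad = torch.zeros(3)
apply_weight_decay(p, 1e-2)
```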
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12107

Reviewed By: soumith

Differential Revision: D10071787

Pulled By: jma127

fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673
2018-09-26 21:43:46 -07:00
5da8a8c785 Handle undefined tensor in blob correctly. (#12125)
Summary:
You can't GetDeviceType an undefined tensor, so test for this case
first.  This allows you to safely move tensors out of blobs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12125

Reviewed By: smessmer

Differential Revision: D10080075

Pulled By: ezyang

fbshipit-source-id: bb99b089b6daa9d4db99015208f939d7ce4d4a79
2018-09-26 21:43:41 -07:00
325101263a Aten: catch2gtest (#11846)
Summary:
Migrate all tests in ATen to use gtest, with the exception of basic.cpp.
Since gtest's features differ from Catch's, some of the tests have been rewritten with equivalent meaning.

basic.cpp has a version conflict with valgrind according to CI, therefore this test case still uses Catch.
This will be resolved in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11846

Differential Revision: D10080860

Pulled By: zrphercule

fbshipit-source-id: 439d4cf33fb6ccbe79b797860342853c63e59081
2018-09-26 20:57:45 -07:00
0f81039eaf Better high level C++ documentation (#12079)
Summary:
I wrote some high level docs for the larger PyTorch C++ universe and the C++ frontend specifically. Happy for reviews, but let's please also land this ASAP so I can point users at something that looks more ready baked than the C++ docs landing page (https://pytorch.org/cppdocs) does right now.

ezyang soumith

CC ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12079

Differential Revision: D10080785

Pulled By: goldsborough

fbshipit-source-id: 3028de41373f307468eb1e3802aa27871c93b2e3
2018-09-26 20:57:43 -07:00
db5f8d42bb Remove TIndex typedef from core/common.h (#12032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12032

See title

Reviewed By: dinhviethoa

Differential Revision: D10023757

fbshipit-source-id: dbf0a043b2afab767f052bd4c5e8de13e0f57dcc
2018-09-26 17:02:54 -07:00
478803a75f Introduce type variables to implement generic list operators (#12040)
Summary:
We generate specialized list operations for int, float, and Tensor lists so that small lists of integers like the arguments to conv do not involve tons of boxing code.

This PR adds a fallback GenericList for List types that contain any other type. It does so by adding type variables to `jit::Type`, and machinery for matching/replacing the type variables during `tryMatchSchema` and operator lookup.

It also modifies the builtin list ops to include a fallback that works on a GenericList object that simply holds IValues. This is distinguished from IValue's tuple type so that conversion to/from Python still happens losslessly.
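
For example, a scripted function over a list type with no specialized implementation now falls back to GenericList (modern annotation style shown; a sketch, not the PR's test):

```
import torch
from typing import List

@torch.jit.script
def first(words: List[str]) -> str:
    # str is neither int, float, nor Tensor, so this list is handled by
    # the GenericList fallback rather than a specialized list op.
    return words[0]

print(first(["a", "b"]))
```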
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12040

Differential Revision: D10037098

Pulled By: zdevito

fbshipit-source-id: 0c5f2864d12e7d33554bf34cc29e5fb700dde150
2018-09-26 17:02:51 -07:00
75b1ae1acd Update issue templates
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12114

Reviewed By: soumith

Differential Revision: D10060349

Pulled By: JoelMarcey

fbshipit-source-id: ed88bf95f78742b089adb043e88613a5db006a10
2018-09-26 16:26:00 -07:00
1b45f68397 Use atomicAdd from cuda_fp16 header when building with CUDA 10 (#12108)
Summary:
An efficient atomicAdd for half-precision values has been added in `cuda_fp16.h` in CUDA 10:
```__CUDA_FP16_DECL__ __half atomicAdd(__half *address, __half val);```

Through this change, PyTorch will be able to utilize efficient atomicAdd when building with CUDA 10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12108

Differential Revision: D10053385

Pulled By: soumith

fbshipit-source-id: 946c90691a8f6bdcf6d6e367a507ac3c9970b750
2018-09-26 15:28:17 -07:00
6ff568df4d Add full namespace resolution in CAFFE_DURATION (#12065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12065

Had compilation issues using CAFFE_DURATION in some contexts, specifically due to namespace resolution. Since this is a macro, it should fully qualify.

Reviewed By: heslami

Differential Revision: D10036132

fbshipit-source-id: b8d55dfe5e991ca702ce5b7483f0ffc699882c85
2018-09-26 13:29:18 -07:00
d9c27f4d8d T33898723: Simple put operators for caffe2 stats (#12057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12057

Add simple put operators for various types of stats

Reviewed By: mlappelbaum

Differential Revision: D9925268

fbshipit-source-id: cec02b0027d2d0ef3d35741be4b02c429d492810
2018-09-26 12:39:37 -07:00
c2f8f5076c add narrow() support for sparse tensors re: #8853 (#11342)
Summary:
Couple questions:

1) I used the log1p implementation in #8969 as a guide, especially for testing.  I'm not sure what the `skipIfROCM` annotation is for, so I'm unsure if I need it for my test.

2) I implemented the branching logic in the narrow function itself; is this the right place to do so?  I noticed that there are a number of places where sparse-specific logic is handled with just an if statement in this file.  Or should I implement a separate dispatch in native_functions.yaml as in the log1p?

And of course, I'm happy to make any other updates/changes that I may have missed as well.  This is my first PR to the project.
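
Hypothetical usage of the feature (the exact entry point, narrow vs. narrow_copy for sparse layouts, may differ from this sketch):

```
import torch

i = torch.tensor([[0, 1, 2], [2, 0, 1]])
v = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(i, v, (3, 3))
print(s.narrow_copy(0, 0, 2).to_dense())  # first two rows, still sparse
```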
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11342

Differential Revision: D9978430

Pulled By: weiyangfb

fbshipit-source-id: e73dc20302ab58925afb19e609e31f4a38c634ad
2018-09-26 12:24:54 -07:00
78fe149ab9 Fix ONNX bug, add symbolic for full
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12052

Differential Revision: D10044910

Pulled By: apaszke

fbshipit-source-id: 015ef372966d7594e1b450e348d457429f6ef20d
2018-09-26 11:45:25 -07:00
18f9c07b18 Enable tracing of tensor factories with an out argument
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12051

Differential Revision: D10044890

Pulled By: apaszke

fbshipit-source-id: 2d794bf408875600bc71f354f0b4961d6b715094
2018-09-26 09:40:34 -07:00
b535aecd7c Fix warnings emitted when testing distributions (#12038)
Summary:
The earlier tests had around 80 warnings, and now there are 6 warnings; these are due to the JIT.

The changes remove the wrapping of a Tensor by a Tensor constructor, which emits warnings due to the changes in https://github.com/pytorch/pytorch/pull/11061 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12038

Differential Revision: D10033392

Pulled By: apaszke

fbshipit-source-id: b1faf368e650d062d7983f9932511bee4702a893
2018-09-26 09:24:54 -07:00
02d7c88fa4 Unify versions across setup.py, libtorch, and libcaffe2 (#12053)
Summary:
This unifies our versions across setup.py, libtorch, and libcaffe2. CMake has a default version (bumped to 1.0.0) that can be overridden by setup.py. The versions are also printed as a part of cmake/Summary.cmake to make sure they are correct.

cc Yangqing ezyang soumith goldsborough pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12053

Differential Revision: D10041878

Pulled By: orionr

fbshipit-source-id: a98a01771f6c008d1016ab63ab785c3a88c3ddb0
2018-09-26 08:55:06 -07:00
c8a0b11b7f add autodiff expressions for common operations (#11832)
Summary:
This PR does a few things:

Previously test_jit.py only tested autograd on backward graphs.
This is because we borrow from test_autograd and construct graphs with a small
number of nodes. Because the number of nodes is small (typically 1-2), those graph
do not end up containing autodiff subgraphs, so autodiff never gets tested.

This PR enables autodiff testing by doing the following:
- added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable
  autodiff subgraph inlining.
- (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold.
  These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining()
  sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs.
- The relevant backward jit tests disable autodiff subgraph inlining so they
  will test the autodiff versions of the operators instead of autograd whenever
  an autodiff variant exists.
- We don't run the tests that do inline autodiff subgraphs anymore.
  This has no impact on testing correctness because the assumption is
  that autograd functions are correct and are tested in test_autograd.py

This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. On a more concrete example, lstm backward contains a lot of tensor-scalar operations; these autodiff formulas help its double backward pass.

Included:
- arithmetic overloads
- abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2 reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt

TestJitGenerated tests autodiff for all of the added operations.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832

Differential Revision: D10031256

Pulled By: zou3519

fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93
2018-09-26 08:10:04 -07:00
21ed7e51b6 Blob doesn't allow access to destroyCall anymore (#11548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11548

This removes getting/setting the DestroyCall of a Blob,
paving the way to removing DestroyCall from Blob entirely and using the destructor stored in TypeMeta instead.

Use sites have been fixed in diffs stacked below this.

Reviewed By: dzhulgakov

Differential Revision: D9775191

fbshipit-source-id: 97d72d0c62843849057f295c27f391e63c99c521
2018-09-26 01:45:28 -07:00
65cbb8226b IValue can store Blob (#11414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11414

caffe2::Blob can be stored in an IValue. This is a precondition for caffe2 to switch from Blob to IValue.

Reviewed By: ezyang

Differential Revision: D9731326

fbshipit-source-id: 462a39d2d9ab6f85b99b1670848c6976a3de417c
2018-09-26 01:12:31 -07:00
b7ebc00979 Move Blob to ATen/core (#11924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11924

Previous diffs removed Blob -> caffe2 dependencies, now we can move it to ATen/core.
This is pre-work for allowing storing Blob in IValue.

Reviewed By: ezyang

Differential Revision: D9980641

fbshipit-source-id: 32082a673ec94c42c20b2298adced8bb7ca94d07
2018-09-25 23:27:52 -07:00
8ff435c8f6 Use tempfile during serialized test comparison (#12021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12021

TestPilot runs stress tests in parallel. These fail for serialized tests because extraction (and subsequent deletion) of binary data during the process isn't thread-safe. Extract zips into a tempfile to avoid this problem.

Also remove some accidentally checked in zips of a test that we didn't end up including for now.

Reviewed By: houseroad

Differential Revision: D10013682

fbshipit-source-id: 6e13b850b38dee4106d3c10a9372747d17b67c5a
2018-09-25 20:55:45 -07:00
807de9a1e3 fix segfault when grad to a hook fn is None (#12028)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/11751 by checking if a grad is a Python None object before getting cdata from it
- behaviors:

pre-fix
```
>>> a = torch.randn(5, requires_grad=True)
>>> a_list = a.unbind()

>>> a0 = a_list[0]
>>> a0.register_hook
...:    def hook(grad):
...:        print(grad)

>>> a_list[0].backward()
tensor(1.)

>>> print('a_list[0]', a_list[0].grad, a.grad)
('a_list[0]', None, tensor([1., 0., 0., 0., 0.]))

>>> a_list[1].backward() # segfault
```

post-fix
```
>>> a = torch.randn(5, requires_grad=True)
>>> a_list = a.unbind()

>>> a0 = a_list[0]
>>> a0.register_hook
... :   def hook(grad):
... :       print(grad)

>>> a_list[0].backward()
tensor(1.)

>>> print(a_list[0].grad, a.grad)
(None, tensor([1., 0., 0., 0., 0.]))

>>> a_list[1].backward()
None

>>> print(a_list[1].grad, a.grad)
(None, tensor([1., 1., 0., 0., 0.]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12028

Differential Revision: D10034094

Pulled By: weiyangfb

fbshipit-source-id: 3f2135325fa7d338b920f57752057e4f6a6c0b1d
2018-09-25 19:10:25 -07:00
db2f7de5c3 Fallback CreateMutex/AtomicIter operators for mkl-dnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11685

Reviewed By: pjh5

Differential Revision: D9928058

Pulled By: wesolwsk

fbshipit-source-id: 734e19c35a684481d9a4d4f0c596e4dceae51ad4
2018-09-25 17:41:08 -07:00
28dba2f928 Unify all *_EXPORT and *_IMPORT macros across c++ backend (#12019)
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.

This is a codemod by mechanically doing the following change:

CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019

Reviewed By: ezyang, teng-li

Differential Revision: D10016276

Pulled By: Yangqing

fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
2018-09-25 17:41:05 -07:00
90bcf41291 Add safety asserts for methods on TensorImpl which don't work on Variable. (#12058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12058

Methods on TensorImpl have to be written very carefully, because
when you have a VariableImpl subclass of TensorImpl, usually the
local fields on the TensorImpl are not valid; instead, you have to
forward to the "wrapped" tensor.  Functions which are virtualized
are probably handled correctly by Variable, but functions which
are NOT cannot be handled correctly and shouldn't be called if you
have a Variable.  This diff adds checks to determine if this is
the case or not.

Reviewed By: jerryzh168

Differential Revision: D10034589

fbshipit-source-id: 650b2036ca9a044c0ab4abdf6f825521a64e1fc2
2018-09-25 17:25:47 -07:00
658386a63f Make USE_IDEEP work again (#12026)
Summary:
This PR establishes a baseline so that we can build IDEEP ops in the new workflow. From this baseline, we need to
- Merge the CMakefile of MKLDNN from caffe2 and Pytorch
- Get rid of `USE_MKL=ON`.

Build command from now on:
```
EXTRA_CAFFE2_CMAKE_FLAGS="-DUSE_MKL=ON -DINTEL_COMPILER_DIR=/opt/IntelComposerXE/2017.0.098"  python setup.py build_deps
```

gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12026

Differential Revision: D10041199

Pulled By: yinghai

fbshipit-source-id: b7310bd84a494ac899d8e25da368b63feed4eeaf
2018-09-25 16:56:29 -07:00
b7b9e3c7e8 Fix "identifier following the 'template' keyword does not refer to a template" (#12037)
Summary:
LLVM trunk emits an error diagnostic when attempting to compile caffe2. The
identifiers following the `template` keywords are not templates, so the use of
the keyword does not make sense in this context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12037

Reviewed By: ezyang

Differential Revision: D10024531

Pulled By: modocache

fbshipit-source-id: da4b9ba405d9f7fd633ab8c1a61c77da9c1a1f89
2018-09-25 16:40:42 -07:00
1e28294487 Delete some unused variables. (#12059)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12059

Differential Revision: D10034632

Pulled By: ezyang

fbshipit-source-id: ff33da0d93734856b8e8bcfe744cefe127fffb91
2018-09-25 14:25:21 -07:00
e53e8df20b Support TypeIdentifier::name() (#12036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12036

Sometimes you have a TypeIdentifier, and no way to get to
the TypeMeta.  Still nice to be able to read out the name.

This should be obsoleted by smessmer's patches.

Reviewed By: gchanan, mingzhe09088

Differential Revision: D10024554

fbshipit-source-id: 42cdceefd5c59be0441254665f66f5edc829f422
2018-09-25 14:25:19 -07:00
aa1adde80b Refactor fastGet/fastSet for clarity, removing a null pointer check. (#11902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11902

Previously, they were going through THTensor_getStoragePtr which
incurred a null pointer check on storage.  Now they use unsafe_data
method which doesn't do this check.

I don't know if this actually make things go faster, but I get
an added bonus of reducing code duplication, so we should take
this change anyway :)

Reviewed By: SsnL

Differential Revision: D9977654

fbshipit-source-id: f45c74828213a0439480755ad0b2d7f8858cb327
2018-09-25 13:55:53 -07:00
ceadde2a7f Add some more locations to search for nccl. (#12063)
Summary:
Users generally expect ./configure to find libraries
installed in /usr/local and /usr, so search for nccl
there too.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12063

Differential Revision: D10036248

Pulled By: ezyang

fbshipit-source-id: d331ddd2ccc8ac9846fb54222db284b1ec371659
2018-09-25 13:27:54 -07:00
b263078bc3 Fix CUDA division by a scalar on large arrays. (#12023)
Summary:
The gpu_unary_kernel function was not handling arrays that
cannot use 32-bit indexing. This function was only called directly
by CUDA division by a scalar. Other arithmetic operations go through
gpu_binary_kernel, which already properly handled large arrays.

This bug sometimes manifested as a crash and sometimes as an incorrect
answer.
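
A rough repro sketch (an assumption about the trigger, per the issue: any tensor with more than 2**31 elements cannot use 32-bit indexing; needs a GPU with over 4 GiB free):

```
import torch

if torch.cuda.is_available():
    x = torch.ones(2 ** 31 + 1, dtype=torch.half, device="cuda")
    x.div_(2)  # scalar division; previously crashed or gave wrong results
    print(x[0].item(), x[-1].item())  # both should be 0.5
```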

Fixes #11788
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12023

Differential Revision: D10034017

Pulled By: colesbury

fbshipit-source-id: b17300f327de54035746bf02f576766007c9b144
2018-09-25 13:10:25 -07:00
a106388187 Free MAGMA queues after use (#11882)
Summary:
This PR is a minor change, just adds a simple `magma_queue_destroy` function to the implementation of `Gesv`.

Also, I have replaced calls for obtaining handles with those already written in ATen.
```
THCState_getCurrentSparseHandle(at::globalContext().getTHCState()) --> getCurrentCUDASparseHandle()
THCState_getCurrentBlasHandle(at::globalContext().getTHCState()) --> getCurrentCUDABlasHandle()
```

Differential Revision: D10032204

Pulled By: soumith

fbshipit-source-id: ccd11989ecdc357313f0b661a2468f75d3aecb0e
2018-09-25 12:56:57 -07:00
8f0db9bbbb Removing some dependency edges from Blob to other caffe2 (#12043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043

Re-trying D9979976, this time with all call sites fixed.

D9979976 got reverted because there was a call site that wasn't covered by sandcastle it seems.
I fixed it and used 'grep' to ensure there aren't any more call sites in fbsource.

Reviewed By: ezyang

Differential Revision: D10026392

fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
2018-09-25 11:40:24 -07:00
94c513cc7f Improve pybind11 message (#11640)
Summary:
Improving the message based on https://github.com/pytorch/pytorch/issues/11570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11640

Differential Revision: D10033383

Pulled By: orionr

fbshipit-source-id: 0cdcdbe0582d896283a12970aebe771efa390dd2
2018-09-25 11:26:05 -07:00
364ae10bb8 nomnigraph - easy - add some python test helper methods (#12020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12020

- make it less verbose to create random blobs in python unit tests by adding some test helper methods
- move str_compare test helper method to test_util.py

Reviewed By: ZolotukhinM

Differential Revision: D10003637

fbshipit-source-id: cb79d2ad508341f750a1bb8f564e87d055c65652
2018-09-25 10:55:19 -07:00
7122f8b3bb Disable more flaky tests on CircleCI (#11399)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11362.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11399

Differential Revision: D9736673

Pulled By: yf225

fbshipit-source-id: cad8c0e86a70a01b047e648975ca5b9926e4acb3
2018-09-25 10:25:30 -07:00
d7e11e3aae Revert "Move CreateContext to global registry (#11688)" (#12049)
Summary:
This reverts commit 3ae6ee4ebded136da30aa53fd3873d84acfbc9f0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12049

Differential Revision: D10030954

Pulled By: ezyang

fbshipit-source-id: 6ca9de65b707c5b4c68280fc6f1b8e5ad7251efc
2018-09-25 10:13:43 -07:00
3deb4791c3 Replace 'struct Tensor' with 'class Tensor'. (#12034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12034

We need ATen and Caffe2 to line up, and the rule is
that if you have any private/protected members, you
should declare it as a class.  Class we go.

(There are some other obvious candidates for this treatment,
but I've kept this patch just to Tensor)

Reviewed By: gchanan, mingzhe09088

Differential Revision: D10024467

fbshipit-source-id: 17cfe2741ba9c3f56cb87d6f5d1afd3c61a8e4fe
2018-09-25 09:54:35 -07:00
fcb3ccf23f Don't record Git version automatically via cmake (#12046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12046

This /sounds/ like a good idea in theory, but a feature
like this must be implemented very carefully, because if
you just plop the Git version in a header (that is included
by every file in your project, as macros.h is), then every
time you do a 'git pull', you will do a FULL rebuild, because
macros.h is going to regenerate to a new version and of course
you have to rebuild a source file if a header file changes.

I don't have time to implement it correctly, so I'm axing
the feature instead. If you want git versions in, e.g.,
nightly builds, please explicitly specify that when you feed
in the version.

Reviewed By: pjh5

Differential Revision: D10030556

fbshipit-source-id: 499d001c7b8ccd4ef15ce10dd6591c300c7df27d
2018-09-25 09:40:19 -07:00
0947712e5d Move Factory functions from Type to TypeExtendedInterface. (#12025)
Summary:
This makes a few changes wrt Type, with the ultimate goal of removing Type from the public Methods/Functions.  In particular:
1) Removes factory functions from Type, into TypeExtendedInterface.
2) sparse_coo_tensor is now a first class at:: namespace function, with TensorOptions overloads.
3) We move from Type-based sparse_coo_tensor dispatch to function-based.

Note we still require a number of changes to get rid of Type in the public interface; in particular, TensorOptions needs to support CUDA vs non-CUDA dispatch.  That is coming in a future patch.
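
The Python-visible counterpart of point 2, for orientation (illustrative only):

```
import torch

# sparse_coo_tensor as a plain factory function; the PR exposes the same
# shape in C++ as at::sparse_coo_tensor with TensorOptions overloads.
i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([1.0, 2.0])
s = torch.sparse_coo_tensor(i, v, (2, 2))
print(s.to_dense())
```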
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12025

Reviewed By: ezyang

Differential Revision: D10017205

Pulled By: gchanan

fbshipit-source-id: 00807a37b09ed33f0656aaa165bb925abb026320
2018-09-25 09:40:17 -07:00
d4ce41c4de Rename tensor_impl_ to impl_ in Tensor (#12035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12035

This brings it in line with Caffe2's naming

Reviewed By: mingzhe09088

Differential Revision: D10024485

fbshipit-source-id: a6feef82a56b5eb3043b0821ea802ba746e542a0
2018-09-25 09:11:39 -07:00
71b99f28be Give default values to members of TensorImpl. (#12033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12033

These are reasonably sensible default values.  One key
pick is -1 for numel: this is because in Caffe2, a tensor
may be in an "un-allocated" state with no storage; this is
historically represented in Caffe2 with numel_ == -1

Reviewed By: mingzhe09088

Differential Revision: D10024439

fbshipit-source-id: a167d727a7665daac7e7a1e98c0c89d8f1da6fa6
2018-09-25 09:11:37 -07:00
2cdf98a74d Back out "Removing some dependency edges from Blob to other caffe2"
Summary: Original commit changeset: 2ea17724e223

Differential Revision: D10026321
Ninja: stable broken

fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6
2018-09-25 01:11:14 -07:00
3417a1e7e4 Prepend a "const" to a for loop in printPyObject. (#11857)
Summary:
As pytuple should be a constant type (since obj is constant), potential errors would occur without
this const qualifier, e.g., when compiling against PyPy. Although PyPy is not supported yet, it
would still be useful to remove this compilation issue (one of very few compilation
issues) to allow hackers to play with them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11857

Differential Revision: D10024149

Pulled By: soumith

fbshipit-source-id: aa7e08e58f6369233a11477113351dccd3854ba8
2018-09-24 23:12:57 -07:00
17a65bf9b6 Removing some dependency edges from Blob to other caffe2 (#11923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11923

This is pre-work to allow moving Blob to ATen/core, which cannot depend on caffe2 anymore.
(1) Removing the Blob -> Tensor dependency allows us to move Blob to ATen/core and use it inside IValue without having to wait for the Tensor merge to be complete.
(2) In the final Blob design, we want it to be a very small class that doesn't have any special treatment for Tensor (or to be more correct, doesn't allow storing Tensor anymore), so this is anyhow the direction we want to go.

This changes call sites that will have to be moved to IValue later, but they cannot be moved to IValue directly, because for that, IValue first needs to be able to store Blob, which in turn first needs this diff and some other changes coming up in future diffs.

Codemods:
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.IsTensorType\\(" "BlobIsTensorType(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->IsTensorType\\(" "BlobIsTensorType(*\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.GetMutableTensor\\(" "BlobGetMutableTensor(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->GetMutableTensor\\(" "BlobGetMutableTensor(*\\1, "

It is, however, not only these codemods, because regex-based refactoring was only able to match a small number of the call sites. To catch more, I would've needed an AST-aware tool like clangr, which I didn't figure out how to use.

Reviewed By: ezyang

Differential Revision: D9979976

fbshipit-source-id: 2ea17724e223b5b73b44f99362727759ca689e61
2018-09-24 22:57:05 -07:00
dfa03e94eb Fix mispelling of AVAILABLE. (#12016)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12016

Reviewed By: pietern

Differential Revision: D10010808

Pulled By: ezyang

fbshipit-source-id: ff6394ae9a53f7fdad2cadb4e019e09ac63bba96
2018-09-24 20:46:41 -07:00
86e025fca2 magma-cuda should reference updated versions (#12000)
Summary:
Source build doc section **LAPACK GPU**  only lists magma-cuda80

The magma-cuda version should reflect the installed version of cuda.

- Verified on ubuntu with magma-cuda92 with build and test
- Verified 91 is available
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12000

Differential Revision: D10024158

Pulled By: soumith

fbshipit-source-id: a34c85a5e87b52657f1e6f7b21d235306ab7b2aa
2018-09-24 20:26:26 -07:00
5d4624a1d9 Fix return temporary as reference in MPI backend (#11947)
Summary:
The MPI async work class returned a temporary as a reference, which is
invalid (hat tip to colesbury for noticing it). This change fixes that and
uses a std::exception_ptr to hold on to the exception if applicable, and
then returns the reference by throwing it and returning it, like the
existing code path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11947

Differential Revision: D10019928

Pulled By: pietern

fbshipit-source-id: 5a8ed0e894615a09224ca5e48c8b3104275a3019
2018-09-24 20:17:38 -07:00
9068a46dba Fix deprecated function warning in ONNX model test. (#11827)
Summary:
When running /test/onnx/test_models.py, we see deprecation warnings in the test points for `super_resolution` and `squeezenet` models. This change updates those models to use the recommended methods, instead of the deprecated ones.
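
For example (an assumption about which deprecated call the tests hit, not taken from the PR):

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# F.upsample warns as deprecated; F.interpolate is the recommended method.
y = F.interpolate(x, scale_factor=2, mode="nearest")
print(y.shape)  # torch.Size([1, 3, 16, 16])
```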
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11827

Reviewed By: houseroad

Differential Revision: D10023998

Pulled By: ezyang

fbshipit-source-id: ee4e14304678c532ebd574e7bd143e3b311995ab
2018-09-24 19:59:02 -07:00
a830964007 Eliminate no-op adds and muls in peephole pass (#11801)
Summary:
Because we emit a lot of them in our symbolic AD. This brings down the backward time of an LSTM I'm testing from 14.2ms to 12.5ms (a ~12% improvement).
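
A toy illustration of the pattern being eliminated (not from the PR):

```
import torch

@torch.jit.script
def f(x):
    # x + 0 and x * 1 are exactly the no-op adds/muls that symbolic AD
    # tends to emit; the peephole pass can now erase them.
    return (x + 0) * 1

print(f.graph_for(torch.randn(3)))  # optimized graph, no-ops peepholed away
```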
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11801

Differential Revision: D9916815

Pulled By: apaszke

fbshipit-source-id: 2d9cb886c424ccd43b9f996aad89950d3bddf494
2018-09-24 17:48:48 -07:00
3ae6ee4ebd Move CreateContext to global registry (#11688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11688

As a first step to remove static context(merge with allocator), we'll create a
global registries for context constructors, and remove CreateContext function from tensor.

Reviewed By: ezyang, dzhulgakov

Differential Revision: D9779821

fbshipit-source-id: 8b239ea50af7a0556fde2382f58f79194f0e3dc1
2018-09-24 17:07:50 -07:00
b7c302da1a Make gen_jit_dispatch runnable (#12018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12018

Tried to use the file and ran into a small bug, this fixes it

Differential Revision: D10013231

fbshipit-source-id: 4cf8c29cf9e2cedd7a28fa0cc0196e5144a54bf2
2018-09-24 16:09:48 -07:00
70e4b3ef59 Revert D10006069: Remove TIndex typedef from core/common.h
Differential Revision: D10006069

Original commit changeset: 5e2aac993968

fbshipit-source-id: fbd8d3860635211e641ca14eaff7a64882e0d6bd
2018-09-24 15:30:25 -07:00
e05d689c49 Unify C++ API with C++ extensions (#11510)
Summary:
Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), *which is only passed when building a C++ extension*.

Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498

Why is this useful?

1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed.
2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API.
3. The C++ extension API simply becomes richer by gaining access to the C++ API headers.

soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510

Reviewed By: ezyang

Differential Revision: D9998835

Pulled By: goldsborough

fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535
2018-09-24 14:44:21 -07:00
1c09bfde1b Make promoteType(half, integer) -> half (#11941)
Summary:
Changes the result type of half type and any integer type to return half
type (instead of float or double).

This is based on top of #11808. The first new commit is "Make promoteType(half, integer) -> half". I'll rebase on top of master once that PR lands.
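
Observable from Python, a quick illustration of the new promotion rule:

```
import torch

h = torch.ones(4, dtype=torch.half)
i = torch.ones(4, dtype=torch.int32)
print((h + i).dtype)  # torch.float16 after this change
print((h + 1).dtype)  # torch.float16: integers no longer widen half
```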
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11941

Differential Revision: D10014122

Pulled By: colesbury

fbshipit-source-id: 16a5eb3406a5712069201d872d8736d0599e9411
2018-09-24 13:55:42 -07:00
51414822f5 Stop moving constants into DifferentiableSubgraphs (#11809)
Summary:
Or even taking them as inputs. This prevents optimizations from happening
either inside the differentiable subgraphs or in the surrounding graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11809

Differential Revision: D10009680

Pulled By: apaszke

fbshipit-source-id: face638566228e470a6deec48dc2aa3a1cce26d4
2018-09-24 13:24:53 -07:00
ffbac7d0bb Miscellaneous updates for CUDA 10 (#12017)
Summary:
This PR has some updates related to CUDA 10.

- c2195e9864 ensures that the repo successfully builts on CUDA 10. Addresses https://github.com/pytorch/pytorch/issues/11888
- 423d8d3524 follows up on the cufft max plan number bug: https://github.com/pytorch/pytorch/issues/11089, which has been fixed in CUDA 10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12017

Differential Revision: D10013405

Pulled By: soumith

fbshipit-source-id: 5bc6d7f71d5133f7821b407b1ac6c51bef0f6fa8
2018-09-24 11:58:32 -07:00
a6f1ae7f20 set up c10 scaffolding. Move macros proper first.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939

Reviewed By: orionr, dzhulgakov

Differential Revision: D10004629

Pulled By: Yangqing

fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c
2018-09-24 11:09:59 -07:00
1a1d79e761 Remove TIndex typedef from core/common.h (#11993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11993

See title

Reviewed By: ezyang

Differential Revision: D10006069

fbshipit-source-id: 5e2aac993968307c850e431c00052cb1a339ced2
2018-09-24 10:55:55 -07:00
a9e6a673ae Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11876

Modern C++ api instead of macros, item() is aligned with Python frontend. caffe2::Tensor::capacity_nbytes is effecitvely unused and confusing w.r.t. caffe2::Tensor::nbytes().

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCComplexDouble "item<std::complex<double>>"

codemod -d tc           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

Reviewed By: ezyang

Differential Revision: D9948572

fbshipit-source-id: 70c9f5390d92b82c85fdd5f8a5aebca338ab413c
2018-09-24 10:40:10 -07:00
1178851280 Get rid of most usages of Type.tensor. (#12002)
Summary:
1) Most usages are replaced by at::empty.
2) native_tensor has its namespace function removed
3) Type.tensor(sizes, strides) becomes at::empty_strided(sizes, strides).
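
For orientation, rough Python analogs of these replacements (illustrative, not from the PR):

```
import torch

a = torch.empty(2, 3)                    # analog of at::empty(sizes)
b = torch.empty_strided((2, 3), (3, 1))  # analog of at::empty_strided(sizes, strides)
print(a.shape, b.stride())
```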
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12002

Differential Revision: D10007201

Pulled By: gchanan

fbshipit-source-id: 5e5647c050ed2ecb87a33e0b5ce4928fa3186c34
2018-09-24 10:16:18 -07:00
76ab26cc3e Remove unused THNN functions due to removal of torch/legacy (#11946)
Summary:
See title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11946

Differential Revision: D9994625

Pulled By: cpuhrsch

fbshipit-source-id: fca3d48ecbdab06ce53249db2402fc4613da4d21
2018-09-22 21:54:55 -07:00
a6630e25af Remove many caffe2::TIndex and replace them with int64_t (#11943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11943

See title

Reviewed By: ezyang

Differential Revision: D9992645

fbshipit-source-id: e8f80d6ea762971513e5e8072975ceea53e1f11a
2018-09-22 18:11:04 -07:00
5d0f1c3c8f Add #include to satisfy Android NDK unified headers
Summary:
Old per-API+arch headers reside in
  /opt/android_ndk/r*/platforms/android-*/arch-*/usr/include/
New Unified headers reside in
  /opt/android_ndk/r*/sysroot/usr/include/

Unified headers are not exactly drop-in replacements for the old ones. Old headers had some nested includes that are absent in the unified versions, so we need to explicitly include them.

Reviewed By: mzlee

Differential Revision: D9952200

fbshipit-source-id: 6515e1d1ab576069db499c3fb23a69d507279c8c
2018-09-22 15:39:56 -07:00
7517e53468 Update onnx submodule to onnx/onnx@c4734c6 (#11958)
Summary:
c4734c6200
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11958

Differential Revision: D10002779

Pulled By: bddppq

fbshipit-source-id: 8bd7dfc8fdaf0b699a61f5b228f7102a16b92258
2018-09-22 01:40:31 -07:00
f15474ade8 Export caffe2::Caffe2Annotation symbols (#11965)
Summary:
Some of these symbols are used by device_test.cc .

d0db23e95a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11965

Reviewed By: bwasti

Differential Revision: D10002439

Pulled By: bddppq

fbshipit-source-id: 4ae95b9c888b3c7685d0ffdbcbfa3441bcf90091
2018-09-21 22:43:48 -07:00
1c282ab99a Move GetExceptionString to Error.h (#11501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11501

This doesn't really belong to TypeMeta, moving it to the error handling header

Reviewed By: ezyang

Differential Revision: D9763424

fbshipit-source-id: 127a8246171ab3a4475f2767d2dc1cc13c486a2e
2018-09-21 21:54:33 -07:00
825181ea9d Rewrite C++ API tests in gtest (#11953)
Summary:
This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch.

You can largely trust me to have correctly code-modded the tests, so it's not required to review every one of the 2000+ changed lines. However, additional things I changed were:

1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and calling `add_subdirectory` from `torch/CMakeLists.txt`
2. Fixing DataParallel tests which weren't being compiled because `USE_CUDA` wasn't correctly being set at all.
3. Updated README

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953

Differential Revision: D9998883

Pulled By: goldsborough

fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80
2018-09-21 21:28:16 -07:00
d0db23e95a Add distributed annotations
Summary: Annotations for DAI

Reviewed By: duc0

Differential Revision: D9805867

fbshipit-source-id: 9ce2d9f3984817510ec8362a281f39878aad55e7
2018-09-21 19:09:59 -07:00
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression on CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
89d56ae435 Move function deletion from the stack to the heap. (#11611)
Summary:
This eliminates the need for any heuristics regarding stack size limits.

This is a re-do #11534 with a fix to properly handle cases where
multiple edges exist between a pair of functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11611

Differential Revision: D9991198

Pulled By: resistor

fbshipit-source-id: fecd2c5cac7e78f82a0f20cf33268bb1617bb4a0
2018-09-21 16:11:03 -07:00
b5f60af94c Shape prop view/reshape/as_strided through prim::ListConstructs (#11877)
Summary:
Previously, aten::view returned a Dynamic type when attr::size is a prim::ListConstruct.
See [this for a repro](https://gist.github.com/zou3519/cbd610472ba3369f556fa612a7d93b28).
This prevented a pre-multiplied LSTM input graph from being fusible (aten::view is necessary
to do premultiplication).

If aten::view is passed an output of a prim::ListConstruct node, then shape prop should
be able to figure out its TensorType because we statically know the number of inputs to
prim::ListConstruct. This PR implements that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11877

Differential Revision: D9972356

Pulled By: zou3519

fbshipit-source-id: cb87786f6e7f222d4b8f07d8f2a9de34859cb6a5
2018-09-21 14:20:01 -07:00
7efbf3a827 Specialize ArgumentSpecs on tuple elements too (#11863)
Summary:
This is pretty important because the common situation of passing LSTM hidden states as a tuple completely trashes the performance of a network.

Cleans up all our propagation/undef specialization passes, at the cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuples or not).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863

Differential Revision: D9992814

Pulled By: apaszke

fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc
2018-09-21 14:19:58 -07:00
1cf5b0c7c1 Fix casting logic for 0d CPU tensors in CUDA ops (#11808)
Summary:
Previously, we didn't cast any 0-dim tensors used in CUDA operations. We
can only avoid the casts for 0-dim CPU tensors used in CUDA operations.

Fixes #11795
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11808

Differential Revision: D9922406

Pulled By: colesbury

fbshipit-source-id: 940b8a8534770aa5cd70d5d09b96be0f0f8146ff
2018-09-21 14:19:56 -07:00
1ad7e0c5ec Minor JIT improvements (#11654)
Summary:
- Disable addmm fusion. The reason for this is explained in the comment.
- Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly).
- Fixed a correctness issue in requires grad propagation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11654

Reviewed By: colesbury

Differential Revision: D9813739

Pulled By: apaszke

fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3
2018-09-21 14:19:54 -07:00
4e65fbfee5 Remove tests from EXCLUDE_SCRIPT that pass (#11916)
Summary:
Spuriously added in #11261

I had a PR to catch these automatically (#11279), but it had issues passing
on some CI environments but not others (e.g. for
`test_nn_group_norm`); any ideas?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11916

Differential Revision: D9992065

Pulled By: driazati

fbshipit-source-id: 05cfa8ed9af939e8ffd5827847ee7bfe0be799b2
2018-09-21 14:19:50 -07:00
00fe2c5606 Use -O1 for sleef build in Debug mode (#11942)
Summary:
`-O0` is problematic for compiling sleef kernels since they consist of a bunch of vector intrinsics. At `-O0`, the compiler spills *every* intermediate value to the stack. In one example (TestEndToEndHybridFrontendModels.test_snli in test_jit.py), the function `Sleef_tanhf8_u10avx2` would spill 30kB of AVX registers onto the stack and run two orders of magnitude slower than in opt mode, causing the test to take minutes rather than seconds. I've verified that this behavior is not present with `-O1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11942

Differential Revision: D9994658

Pulled By: jamesr66a

fbshipit-source-id: cdd9474c6ae3aa9898d5715ac19a900f5f90468a
2018-09-21 13:24:59 -07:00
775358e4c2 Add non-legacy test of bilinear (#11935)
Summary:
Fixes: #11905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11935

Differential Revision: D9991120

Pulled By: soumith

fbshipit-source-id: b00ad4f405440664ae5228b229a2ba0a5d3d92f6
2018-09-21 12:43:35 -07:00
23f5b2abbe Fixes an error with canonical url. (#11938)
Summary:
Deleted this section by mistake in last PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11938

Reviewed By: SsnL

Differential Revision: D9993258

Pulled By: brianjo

fbshipit-source-id: 2552178cebd005a1105a22930c4d128c67247378
2018-09-21 12:21:42 -07:00
c2a2110d71 Stop tracing _out overloads (#11910)
Summary:
They aren't recognized anywhere in the JIT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11910

Differential Revision: D9979968

Pulled By: apaszke

fbshipit-source-id: bb2505a14e3b1e54d5c243f99c80a4f4d918b204
2018-09-21 11:44:10 -07:00
c6a14b1edd Revert D9985212: [pytorch][PR] [minor] remove a remaining todo line deletion in THD cmake
Differential Revision:
D9985212

Original commit changeset: 5f8e7ac94101

fbshipit-source-id: 1783cbfc91008ab3db36bad7c1bf51e16da7fb2d
2018-09-21 11:25:53 -07:00
817e83fc01 fix PR #11061 (#11815)
Summary:
- fix PR https://github.com/pytorch/pytorch/pull/11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also remove warnings and `args_requires_grad` from `internal_new_from_data`
- with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` will be detached from the source tensor, with requires_grad set based on the input args
- `torch.as_tensor` retains its behavior as documented

gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11815

Differential Revision: D9932713

Pulled By: weiyangfb

fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259
2018-09-21 11:04:19 -07:00
6834dcab1c Align cuda multinomial without replacement to CPU behaviour (#11933)
Summary:
We do this by being more NaN tolerant.

Fixes: #9062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11933

Differential Revision: D9991129

Pulled By: soumith

fbshipit-source-id: c99b04462c1bee90d00eeabb0c111de12f855f4d
2018-09-21 11:04:17 -07:00
784d345828 Fix docstring of torch.jit.createResolutionCallback (#11921)
Summary:
The sample code in the docstring of `torch.jit.createResolutionCallback` does not work:

`createResolutionCallback()` gets the frame of `bar`. In order to get the frame of `baz`, one needs to use `createResolutionCallback(1)`.
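A minimal sketch of the frame-depth behavior described above; the import path follows the commit title, and the function names mirror the docstring's example (both are assumptions for illustration):

```python
from torch.jit import createResolutionCallback  # path as given in the title

def baz():
    captured = 2                              # lives in baz's frame
    def bar():
        cb = createResolutionCallback()       # resolves names in bar's frame
        cb_up = createResolutionCallback(1)   # one frame up: baz's frame
        print(cb_up("captured"))              # prints 2
    bar()

baz()
```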
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11921

Differential Revision: D9989123

Pulled By: soumith

fbshipit-source-id: a7166defdccbbf6979f7df4c871298e6b9a2b415
2018-09-21 09:41:57 -07:00
e655f16c35 Pop stashed IntList in resize_, warn about its usage when tracing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11909

Differential Revision: D9979595

fbshipit-source-id: 07b1027bd6bd1605a31afd4f57bcd58e307fa41e
2018-09-21 08:40:20 -07:00
4fb7e72fe5 Fix _thnn_fused_lstm_cell backward (#11872)
Summary:
There are two parts:
- Optional tensors cannot be dispatch tensors because dispatch
  tensors cannot be optional.
- While the kernel dealt with undefined grad_outs, the logistics
  around it did not fully accommodate grad_hy being undefined.

Fixes: #11800

Thank you, mttk for the reproduction!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11872

Differential Revision: D9978527

Pulled By: apaszke

fbshipit-source-id: e622c288d2eac93bd8388e141fb773f2588e2b8f
2018-09-21 08:25:00 -07:00
48c8adfe1b Turn storage on UndefinedTensorImpl into nullptr. (#11738)
Summary:
I also fix a bug that crept in while we had the incorrect semantics under which UndefinedTensorImpl was a CPU tensor, so some moves that shouldn't have been legal didn't crash: moving out the Tensor* also moved out the Tensor* in the blob, and storing an undefined tensor in a blob is not supported.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11738

Reviewed By: gchanan

Differential Revision: D9847859

fbshipit-source-id: db6be0f76a8e6526a89fd0e87b6a23b9cc820c8d
2018-09-21 08:24:57 -07:00
11bd2f2509 Retainable is no more (#11900)
Summary:
Stack:
    **#11900 Retainable is no more** [💛](https://our.intern.facebook.com/intern/diff/D9977505/)
    #11902 Refactor fastGet/fastSet for clarity, removing a null pointer check. [💛](https://our.intern.facebook.com/intern/diff/D9977654/)

Kill it with fire
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11900

Differential Revision: D9979779

Pulled By: ezyang

fbshipit-source-id: 0a437e7a0baadb6440e7dc39a01b4a406171faa7
2018-09-21 06:58:18 -07:00
a7afd133f5 Sync FindCUDA.cmake with upstream cmake repo (#11880)
Summary:
Upstream PR: https://gitlab.kitware.com/cmake/cmake/merge_requests/2391/diffs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11880

Differential Revision: D9989119

Pulled By: soumith

fbshipit-source-id: 66e87367127975a5f1619fe447f74e76f101b503
2018-09-21 06:58:17 -07:00
58d28a5f12 Fix saving loaded module (#11915)
Summary:
This PR fixes #11913.

In order to test for this, the model is serialized twice in `getExportImportCopy`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11915

Differential Revision: D9984697

Pulled By: soumith

fbshipit-source-id: ae0250c179000c03db1522b99410f6ecb9681297
2018-09-21 06:58:16 -07:00
0d9be2135f remove a remaining todo line deletion in THD cmake (#11920)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11920

Differential Revision: D9985212

Pulled By: Yangqing

fbshipit-source-id: 5f8e7ac94101177740e791f44eaa8c8ec55a908c
2018-09-21 00:40:20 -07:00
b2b05b7c20 Move blob serialization to free functions (#11817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817

Blob::Serialize() and Blob::Deserialize() are now the free functions SerializeBlob() and DeserializeBlob().
This takes away access to Blob internals from them and makes future refactorings easier.

Reviewed By: ezyang

Differential Revision: D9882726

fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
2018-09-20 23:27:34 -07:00
17cd426c72 Updated docs styles (#11835)
Summary:
Updated requirements.txt and conf.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11835

Reviewed By: SsnL

Differential Revision: D9941160

Pulled By: brianjo

fbshipit-source-id: fbac91214558e6d17beff74261d990c7dc762038
2018-09-20 21:11:12 -07:00
d712a71741 Protobuf serialization (#11619)
Summary:
This PR serves two purposes:

1. Design an abstraction over a serialization scheme for C++ modules, optimizers and tensors in general,
2. Add serialization to the ONNX/PyTorch proto format.

This is currently a rough prototype I coded up today, to get quick feedback.

For this I propose the following serialization interface within the C++ API:

```cpp
namespace torch { namespace serialize {
class Reader {
 public:
  virtual ~Reader() = default;
  virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};

class Writer {
 public:
  virtual ~Writer() = default;
  virtual void write(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};
}} // namespace torch::serialize
```

There are then subclasses of these two for (1) Cereal and (2) Protobuf (called the "DefaultWriter" and "DefaultReader" to hide the implementation details). See `torch/serialize/cereal.h` and `torch/serialize/default.h`. This abstraction and subclassing for these two allows us to:

1. Provide a cereal-less serialization path that we can ship and iterate on going forward,
2. Provide no-friction backwards compatibility with existing C++ API uses, mainly StarCraft.

The user-facing API is (conceptually):

```cpp
void torch::save(const Module& module, Writer& writer);
void torch::save(const Optimizer& optimizer, Writer& writer);
void torch::read(Module& module, Reader& reader);
void torch::read(Optimizer& optimizer, Reader& reader);
```

with implementations for both optimizers and modules that write into the `Writer` and read from the `Reader`

ebetica ezyang zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11619

Differential Revision: D9984664

Pulled By: goldsborough

fbshipit-source-id: e03afaa646221546e7f93bb8dfe3558e384a5847
2018-09-20 20:39:34 -07:00
30521a37ad codemod: caffe::float16 -> at::Half (#11785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11785

Replace each instance of float16 with Half.

Reviewed By: Yangqing

Differential Revision: D9892158

fbshipit-source-id: b9225ca7bd5c84fd1c04a9d24b026c8b6cbff120
2018-09-20 18:55:19 -07:00
a9459bf7b5 Replace float16 with at::Half in caffe2 (#11676)
Summary:
- Finishes unifying Half type in pytorch and caffe2
- As a side effect, aten_op works for fp16 now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11676

Reviewed By: weiyangfb

Differential Revision: D9829019

Pulled By: li-roy

fbshipit-source-id: b8c9663873c10fe64c90ef180dc81af2e866674e
2018-09-20 18:55:17 -07:00
9c44c60794 Bump up the frontend version (#11873)
Summary:
To update the onnx model zoo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11873

Reviewed By: BIT-silence

Differential Revision: D9953369

Pulled By: houseroad

fbshipit-source-id: 5e96a982b8029dceeb08e3bea4094bae053e1865
2018-09-20 16:20:48 -07:00
9f0d9db6e4 Improve GRU/LSTM documentation for multiple layers (#11896)
Summary:
Prompted by Alex Falcon's input on the forums. Thank you!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11896

Differential Revision: D9976831

Pulled By: SsnL

fbshipit-source-id: 460af51049c289ed4ce529b7b6ae6314e2bdaae4
2018-09-20 15:42:48 -07:00
c7751f4df0 MIOpen bug fixes and performance enhancements (#11766)
Summary:
This PR contains changes for:
1. Performance enhancements for group conv using MIOpen
2. Performance enhancements by removing unnecessary computations while running pooling through MIOpen
3. Added check for bwdData computation while running MIOpen convGradient operator
4. Fix in MIOpen poolingGradient operator to compute window size for global pooling case
5. Minor code cleanup in MIOpen spatial batch norm operator

Differential Revision: D9979050

Pulled By: bddppq

fbshipit-source-id: fabc7a44a2f9ca0307d99564d1ce8fe1de9a6fbb
2018-09-20 15:31:46 -07:00
b91b15d86e Implementing Matrix Norm for torch.norm (#11261)
Summary:
Currently, the norm function only supports vector norms. This PR extends it to matrix norms.
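As a usage illustration, a minimal sketch (assuming the `'fro'`/`'nuc'` spellings for the matrix norms this PR introduces):

```python
>>> import torch
>>> A = torch.randn(3, 4)
>>> torch.norm(A)          # Frobenius norm, the default for a matrix
>>> torch.norm(A, 'fro')   # explicit Frobenius matrix norm
>>> torch.norm(A, 'nuc')   # nuclear norm (sum of singular values)
```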
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11261

Reviewed By: li-roy

Differential Revision: D9652379

Pulled By: yya007

fbshipit-source-id: 519b3fb80b563c17c56a24675c7b0e46bf5a3a1c
2018-09-20 14:43:13 -07:00
6100c0ea14 Introduce ExtensionVersioner for C++ extensions (#11725)
Summary:
Python never closes shared libraries it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.

I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as the build flags, and if this hash changed, bumps an internal version stored for each module name. A bump in the version will result in the ninja file being edited, and a new shared library (effectively a new C++ extension) being compiled. For this, the version is appended as `_v<version>` to the extension name for all versions greater than zero.

One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.
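A rough Python sketch of the hashing-and-bumping scheme described above (names and details are illustrative, not the actual `torch.utils.cpp_extension` internals):

```python
import hashlib

class ExtensionVersioner:
    """Maps an extension name to a (source hash, version counter) pair."""

    def __init__(self):
        self.entries = {}

    def bump_version_if_changed(self, name, source_files, build_flags):
        hasher = hashlib.sha1()
        for path in source_files:
            with open(path, 'rb') as f:
                hasher.update(f.read())
        hasher.update(' '.join(build_flags).encode())
        digest = hasher.hexdigest()

        entry = self.entries.get(name)
        if entry is None:
            self.entries[name] = (digest, 0)              # first load
        elif entry[0] != digest:
            self.entries[name] = (digest, entry[1] + 1)   # sources/flags changed
        return self.entries[name][1]

# Versions greater than zero build under a new name, e.g. "{}_v{}".format(name, version).
```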

Fixes https://github.com/pytorch/pytorch/issues/11398

ezyang gchanan soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725

Differential Revision: D9948244

Pulled By: goldsborough

fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383
2018-09-20 14:43:12 -07:00
068eac255b Jit fuse clamp (#11574)
Summary:
This patch adds fused forward and backward for clamp to the jit.
This is one item of #11118. If it's OK, I'd be happy to also add some more of #11118.

The patch depends on #11150, which I merged into master as a base. I'll rebase it when that or #10981 is merged.

This is my first serious JIT patch; thank you, ngimel and the others, for your guidance. All errors are my own.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11574

Differential Revision: D9943090

Pulled By: apaszke

fbshipit-source-id: c40954b8c28c374baab8d3bd89acc9250580dc67
2018-09-20 14:43:10 -07:00
d8f6be686d Remove torch/legacy (#11823)
Summary:
Largely unused and hinders current development
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11823

Differential Revision: D9925094

Pulled By: cpuhrsch

fbshipit-source-id: c797f62180e2128f9a567b0c57c8347957470ea5
2018-09-20 14:00:54 -07:00
24ec813967 Defer lazyInitCUDA() until needed (#11893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11893

This is needed to run binaries compiled with CUDA support on CPU-only machines.

Reviewed By: teng-li

Differential Revision: D9972872

fbshipit-source-id: 7e4107925b3cd4d2fcf84ae532e800ab65f4b563
2018-09-20 12:12:42 -07:00
9cd0ae5e2d Remove deprecated factory functions from Type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11583

Reviewed By: SsnL

Differential Revision: D9792800

fbshipit-source-id: 9af46d577911ff38647790169df66aa5d0379dd9
2018-09-20 11:39:48 -07:00
87701289a3 fix link to previous versions (#11894)
Summary:
https://github.com/pytorch/pytorch.github.io/issues/68#issuecomment-423073108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11894

Differential Revision: D9973695

Pulled By: soumith

fbshipit-source-id: 1f74b12487ec39f4e88b527dcdfca0742e689c15
2018-09-20 11:10:37 -07:00
0927386890 Workaround CUDA logging on some embedded platforms (#11851)
Summary:
Fixes #11518
Upstream PR submitted at https://gitlab.kitware.com/cmake/cmake/merge_requests/2400

On some embedded platforms, the NVIDIA driver verbosely logs unexpected output to stdout.
One example is Drive PX2, where we see something like this whenever a CUDA program is run:

```
nvrm_gpu: Bug 200215060 workaround enabled.
```

This patch does a regex on the output of the architecture detection program to only capture architecture patterns.
It's more robust than before, but not fool-proof.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11851

Differential Revision: D9968362

Pulled By: soumith

fbshipit-source-id: b7952a87132ab05c724b287b76de263f1f671a0e
2018-09-20 09:26:00 -07:00
1c77f9e543 Support torch.distributed.barrier in gloo backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11844

Reviewed By: colesbury, SsnL

Differential Revision: D9929055

Pulled By: pietern

fbshipit-source-id: 3a34a179cb80f495f18aa926c0f9513924737d8e
2018-09-20 09:25:59 -07:00
8f4601fbac renable test_scalar_fusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11378

Differential Revision: D9943578

Pulled By: zou3519

fbshipit-source-id: fb9e4303e844d5e2515acce7869bcbe11526ab56
2018-09-20 07:56:25 -07:00
23dd5b4a53 Back out "Open-source ThreadSafeActivationCleaningPredictor"
Summary:
Original commit changeset: bfe253ae5fc8

Apparently the Ads push process detected a regression that normal
canaries don't show.
https://fb.facebook.com/groups/1274424122598505/permalink/2597819483592289/

Reviewed By: highker, Prowindy

Differential Revision: D9952807

fbshipit-source-id: 1a3ea249c3b1e2618220c61f3d51468824b6ef10
2018-09-19 21:26:51 -07:00
83740eae4a Avoid using PyThreadState.frame as it is not a public member. (#11855)
Summary:
The doc of PyThreadState [1] emphasizes that interp is its only public member. Use PyEval_GetFrame() instead.

[1] https://docs.python.org/3/c-api/init.html#c.PyThreadState
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11855

Differential Revision: D9954430

Pulled By: ezyang

fbshipit-source-id: 92da6781e45e2bcb5e3a37b162fa40e49d823215
2018-09-19 20:58:37 -07:00
c64331f48f Add test for verifying combine_spatial_bn values in DPM (#11710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11710

Added a test to check that output and gradient values are correctly
calculated when combine_spatial_bn is true on a data parallel model

Reviewed By: enosair

Differential Revision: D9833660

fbshipit-source-id: 14d29fbebefa9dc303ffae06f9899ea4bde23025
2018-09-19 20:17:51 -07:00
aa8cd7319a Enable build_test on windows (#11802)
Summary:
This PR enables BUILD_TEST for Caffe2 on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11802

Reviewed By: orionr

Differential Revision: D9951223

Pulled By: mingzhe09088

fbshipit-source-id: 7cdc1626b999daadeae482bd569eebdbd53eb6d4
2018-09-19 20:17:49 -07:00
c22dcc266f Show build output in verbose mode of C++ extensions (#11724)
Summary:
Two improvements to C++ extensions:

1. In verbose mode, show the ninja build output (the exact compile commands, very useful)
2. When raising an error, don't show the `CalledProcessError` that shows ninja failing, only show the `RuntimeError` with the captured stdout
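A minimal usage sketch of the verbose flag; the extension name and source file here are hypothetical:

```python
from torch.utils.cpp_extension import load

# With verbose=True, the ninja build output (the exact compile commands) is shown.
module = load(name="my_ext", sources=["my_ext.cpp"], verbose=True)
```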

soumith fmassa ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11724

Differential Revision: D9922459

Pulled By: goldsborough

fbshipit-source-id: 5b319bf24348eabfe5f4c55d6d8e799b9abe523a
2018-09-19 20:17:43 -07:00
1091c5e59f Throw error on indexing a 0 dim tensor (#11679)
Summary:
Following through on the warning that indexing a 0-dim tensor would become an
error in PyTorch 0.5, and that `item()` should be used instead
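A small sketch of the behavior change (the error text is paraphrased):

```python
>>> import torch
>>> x = torch.tensor(5)   # 0-dim tensor
>>> x[0]                  # previously a deprecation warning, now an error
IndexError: invalid index of a 0-dim tensor. Use tensor.item() ...
>>> x.item()              # the supported way to extract the value
5
```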
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11679

Reviewed By: soumith

Differential Revision: D9833570

Pulled By: driazati

fbshipit-source-id: ac19f811fa7320d30b7f60cf66b596d6de684d86
2018-09-19 18:10:03 -07:00
6831d64591 Fix the symbolic for embedding_bag in ONNX_ATEN_FALLBACK (#11840)
Summary:
The ATen interface was changed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11840

Reviewed By: BIT-silence

Differential Revision: D9932452

Pulled By: houseroad

fbshipit-source-id: dd2040fcaa0f6052e5856ee19823cf3064124585
2018-09-19 17:40:39 -07:00
ae1a972d78 Fix #11752: correct numerical issue with log_softmax (#11866)
Summary:
This fixes the numerical problem in the log_softmax CPU code when inputs are big but their differences are small.
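A small sanity check of the failure mode being fixed: with large inputs whose differences are tiny, both entries should come out near -ln(2) ≈ -0.6931 rather than degenerate values (the output shown is the mathematically expected one):

```python
>>> import torch
>>> import torch.nn.functional as F
>>> x = torch.tensor([100000.0, 100000.0])
>>> F.log_softmax(x, dim=0)   # both entries should be ≈ -ln(2)
tensor([-0.6931, -0.6931])
```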
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11866

Differential Revision: D9946799

Pulled By: soumith

fbshipit-source-id: 11fe8d92b91ef6b7a66f33fbce37ec2f0f0929be
2018-09-19 17:09:45 -07:00
6302e4001a Delete unnecessary include from allocator.cc/event_cpu.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11862

Reviewed By: Yangqing

Differential Revision: D9942428

fbshipit-source-id: dea03f5ba0e621a047aa50bc4aa97acc834d2a39
2018-09-19 16:45:54 -07:00
f4d25039cb Fix Array.h when compiled with C++17 (#11816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11816

The code isn't in the std:: namespace, so is_same
must be qualified.

Reviewed By: smessmer

Differential Revision: D9923774

fbshipit-source-id: 126532e27f08b5616ca46be1293d5d837920f588
2018-09-19 16:45:53 -07:00
b06e35b568 Back out "Revert D9924348: Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h"
Summary: Original commit changeset: 0d1792804d73

Reviewed By: Yangqing

Differential Revision: D9940725

fbshipit-source-id: 540a8ac7afcfe56a6b63abc6ed297c9434320998
2018-09-19 16:45:51 -07:00
cedd12d86a Explicitly qualify references to CPU. (#11819)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11819

Differential Revision: D9928730

Pulled By: ezyang

fbshipit-source-id: 3140b6ef168586558f04fa8ee90f6f2169605d7d
2018-09-19 16:45:49 -07:00
24e958a0a7 Move bernoulli into ATen (#10273)
Summary:
+ https://github.com/pytorch/pytorch/issues/10236 : torch.bernoulli's out kwarg is broken
  fixed by moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917 : BUG torch.bernoulli(p.expand(shape)) is broken
  fixed by moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357 : torch.bernoulli inconsistent gpu/cpu results
  fixed by adding CUDA asserts

In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at a time for each of the `N` tensors.

The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take in `int n` as its first argument, representing the number of valid values, because there may not be full `step` values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call looks like:
```cpp

  // The template argument `4` below indicates that we want to operate on four
  // element at each time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(
          int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
          const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
        curandStatePhilox4_32_10_t state;
        curand_init(
            seeds.first,
            blockIdx.x * blockDim.x + threadIdx.x,
            seeds.second,
            &state);
        float4 rand = curand_uniform4(&state);
        switch (n) {
          case 4: {
            assert(0 <= p4 && p4 <= 1);
            v4 = static_cast<scalar_t>(rand.w <= p4);
          }
          case 3: {
            assert(0 <= p3 && p3 <= 1);
            v3 = static_cast<scalar_t>(rand.z <= p3);
          }
          case 2: {
            assert(0 <= p2 && p2 <= 1);
            v2 = static_cast<scalar_t>(rand.y <= p2);
          }
          case 1: {
            assert(0 <= p1 && p1 <= 1);
            v1 = static_cast<scalar_t>(rand.x <= p1);
          }
        }
      }
    );
```

Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:

post-patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc)
0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_()
0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_()
0.02167394384741783 +- 2.3818030967959203e-05

```

pre-patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc)
0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_()
1.654480218887329 +- 0.02364428900182247
xc.bernoulli_()
0.058352887630462646 +- 0.003094920190051198

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273

Differential Revision: D9831294

Pulled By: SsnL

fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088
2018-09-19 16:45:47 -07:00
cf5a21e4a1 Add back proto opt disable feature that was lost during refactor (#11875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11875

Seems like the refactor to predictor_config dropped some functionality that is now blocking other teams

rFBS2b30208263c14ce7039f27c618a3b232bf11ee33 is the change that was missed

hoping to land this quickly :)

Reviewed By: jonmorton

Differential Revision: D9948324

fbshipit-source-id: 1628f7c51c06319fa7ca5dc9d59799135bb82c5f
2018-09-19 15:33:26 -07:00
c30790797f Minor data loader doc improvements
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11821

Differential Revision: D9948292

Pulled By: SsnL

fbshipit-source-id: 01c21c129423c0f7844b403e665a8fe021a9c820
2018-09-19 15:33:25 -07:00
ce55767091 Add the missing header (#11864)
Summary:
Otherwise, some macro doesn't have the definition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11864

Reviewed By: BIT-silence

Differential Revision: D9943327

Pulled By: houseroad

fbshipit-source-id: 53e1bfc7a6b832f249f169b75a8fc15cdab63bf4
2018-09-19 14:40:19 -07:00
3b1a5a1b8a Refactor tests part 2 (#11811)
Summary:
Follow-up to the [first refactor](https://github.com/pytorch/pytorch/pull/11350). Increases test coverage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11811

Reviewed By: houseroad

Differential Revision: D9923074

Pulled By: ajyu

fbshipit-source-id: 0f899bb9e9a75bf7ed939e06cc9b028daa7f6bd9
2018-09-19 10:09:28 -07:00
52472508e9 Add env:// rendezvous test (#11782)
Summary:
A missing environment variable raised a missing key error. Now it
raises a more descriptive error of the actual problem, for example:

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11782

Differential Revision: D9888962

Pulled By: pietern

fbshipit-source-id: 5947e7a7bf7aa45f13bbd7b5e997529f26cc92d6
2018-09-19 09:56:06 -07:00
fa32317780 Add empty tensor tests to test_sparse (#11228)
Summary:
This PR adds empty sparse tensor tests to `test_sparse.py`, and also fixes various places in internal code to make the tests pass.

**[NOTE] API CHANGE:**
- `coalesce` on sparse tensor will always be performed out-of-place now (meaning the original tensor will never be affected)
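A short sketch of the new out-of-place contract (a minimal example using the standard sparse constructor):

```python
>>> import torch
>>> i = torch.tensor([[0, 0, 1]])
>>> v = torch.tensor([1.0, 2.0, 3.0])
>>> s = torch.sparse_coo_tensor(i, v, (2,))
>>> c = s.coalesce()     # returns a new, coalesced tensor
>>> s.is_coalesced()     # the original tensor is never affected
False
>>> c.is_coalesced()
True
```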
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11228

Differential Revision: D9930449

Pulled By: yf225

fbshipit-source-id: 7c62439b216a6badf7938a10741c358ff18a556d
2018-09-19 09:40:26 -07:00
8c3a94eaf2 Improve autograd profiler performance (#11773)
Summary:
To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine.

| Run                                          | Time                    |
|----------------------------------------------|-------------------------|
| No profiler                                  | 45ms                    |
| With profiler                                | 56ms                    |
| Use `clock_gettime` instead of `std::chrono` | 48ms                    |
| Touch all pages on block allocation          | 48ms (less jitter)      |
| Use `const char*` instead of `std::string`   | 47ms (even less jitter) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773

Differential Revision: D9886858

Pulled By: apaszke

fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0
2018-09-19 09:25:43 -07:00
b3a2665e0f Code-reorg to have TORCH_ARG in its own header (#11787)
Summary:
I noticed I was including `torch/nn/pimpl.h` in the optimizer library just to access `TORCH_ARG`, even though that file includes a lot of irrelevant code. Let's save some re-compilation time by refactoring this macro into a separate logical file. #small-wins

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11787

Differential Revision: D9924447

Pulled By: goldsborough

fbshipit-source-id: 5acd4ba559ffb2a3e97277e74bb731d7b1074dcf
2018-09-19 09:25:41 -07:00
32494c226e OperatorDef <==> NodeProto Conversion (#11621)
Summary:
Operator level proto conversion between (new) torch proto and (old) caffe2 proto.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11621

Reviewed By: BIT-silence

Differential Revision: D9892422

Pulled By: houseroad

fbshipit-source-id: 01a55ec0a09479876a27082d90fc970723f4d431
2018-09-19 08:41:33 -07:00
8601b33c07 fix half grad assignment (#11781)
Summary:
Currently, grad assignment for the half type fails with a misleading RuntimeError:
```
RuntimeError: torch.cuda.sparse.HalfTensor is not enabled.
```
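A hypothetical repro of the assignment that used to trip this (a sketch; requires a CUDA device):

```python
>>> import torch
>>> w = torch.randn(4, device='cuda', dtype=torch.half, requires_grad=True)
>>> w.grad = torch.zeros(4, device='cuda', dtype=torch.half)  # used to raise the error above
```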
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11781

Differential Revision: D9931884

Pulled By: soumith

fbshipit-source-id: 03e946c3833d1339a99585c9aa2dbb670f8bf459
2018-09-18 23:00:49 -07:00
b46f1b8ca7 Open-source ThreadSafeActivationCleaningPredictor (#11779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11731

This Predictor provides a thread-safe interface and also
cleans up activations after each run, so in a multi-model setup
the activation space doesn't explode

Reviewed By: highker

Differential Revision: D9842374

fbshipit-source-id: bfe253ae5fc813e73a347c5147ff6b58d50781ea
2018-09-18 21:56:58 -07:00
77af40c025 prioritize Accelerate over OpenBLAS (#11812)
Summary:
might fix some binary build issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11812

Reviewed By: ezyang

Differential Revision: D9927309

Pulled By: soumith

fbshipit-source-id: 9ed6c2c6fedc2a1cffbf52bc0a795135d4239800
2018-09-18 21:56:57 -07:00
53b5f14f59 Remove inclusion of caffe2 pb (#11820)
Summary:
Probably not needed, but fwiw.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11820

Reviewed By: orionr

Differential Revision: D9924953

Pulled By: Yangqing

fbshipit-source-id: 4d340e3d4f4dadc50fb68bed9572b8e1e54b5f6d
2018-09-18 21:16:19 -07:00
a26ad5a332 Remove unnecessary check on device option pointer (#11845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11845

The device pointer will be used by cudaPointerGetAttributes, which handles nullptr already. So this check is not necessary.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#group__CUDART__UNIFIED_1gd89830e17d399c064a2f3c3fa8bb4390

Reviewed By: salexspb

Differential Revision: D9929828

fbshipit-source-id: d862f7e5590998ffafe9bfc7754b0f83d2ae4af4
2018-09-18 21:16:18 -07:00
8aedc27a63 checking device types of input and weights at RNN (#10185)
Summary:
- fixes #9534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10185

Differential Revision: D9141222

Pulled By: weiyangfb

fbshipit-source-id: bb652e42cc15917019df080d6bce2926b18f3476
2018-09-18 20:26:02 -07:00
e80d1d2876 Revert D9924348: Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h
Differential Revision:
D9924348

Original commit changeset: 8d92b9e8b424

fbshipit-source-id: 0d1792804d7387023af3a9c29477f1da6f40044a
2018-09-18 18:27:00 -07:00
2c358eaf51 Caffe2: add plan name to logging (#11704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11704

Add plan name to the logging in RunPlan

Reviewed By: Tianshu-Bao

Differential Revision: D9802416

fbshipit-source-id: 45c359dba0a5d992e303b3cdcf34624881a631d8
2018-09-18 18:10:13 -07:00
1f34be47d9 Raise error when perf test result is NaN (#11588)
Summary:
Currently one of our GPU perf tests `test_gpu_speed_mnist` reports NaN after this commit (https://github.com/pytorch/pytorch/pull/8018), and we didn't have the logic in place to raise an error when this happens. This PR fixes the problem and will also update the baseline properly even if its previous value is NaN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11588

Differential Revision: D9831798

Pulled By: yf225

fbshipit-source-id: b95eee38d69b3b8273f48b8ac7b7e0e79cf756ed
2018-09-18 18:10:12 -07:00
a79f5d77ad Add pretty printer for JIT IR (#10319)
Summary:
Adds some pretty-printing capability to the IR graph to make debugging easier and more human-readable; see `torch/csrc/jit/test_jit.cpp:925` and onwards for example outputs. Results aren't perfect yet, but it's a start.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10319

Reviewed By: zdevito

Differential Revision: D9558402

Pulled By: driazati

fbshipit-source-id: 1d61c02818daa4c9bdca36d1477d1734cfc7d043
2018-09-18 17:39:44 -07:00
1c8686001f Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h (#11818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11818

To do this, I have to move the static context registry into ATen/core.
I take the opportunity to convert it into an unordered_map.

Reviewed By: Yangqing

Differential Revision: D9924348

fbshipit-source-id: 8d92b9e8b4246ce608eba24ecef7ad5f8b9b6582
2018-09-18 17:25:46 -07:00
3da8d71d7d remove protobuf inclusion in core/logging.h (#11814)
Summary:
This should not be there since logging does not depend on protobuf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11814

Reviewed By: ezyang

Differential Revision: D9923819

Pulled By: Yangqing

fbshipit-source-id: 4d4edaea1a2e317f5db6e92c35d58c85dd35c5fb
2018-09-18 17:10:02 -07:00
53cf628503 Simplify Blob move constructor/assignment (#11402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11402

- Simplify move constructor/assignment
- Make more things noexcept

Reviewed By: ezyang

Differential Revision: D9728631

fbshipit-source-id: 92562e30ea1e4d05ca857665a02b0ca66b0739e3
2018-09-18 15:09:40 -07:00
e585f2fb48 Polish CPP docs, Minor Python Docs Fixes (#11722)
Differential Revision: D9919120

Pulled By: goldsborough

fbshipit-source-id: bf14cbe4ab79524495957cb749828046af864aab
2018-09-18 14:55:57 -07:00
8ad846fda5 Don't build Detectron ops with NO_CAFFE2_OPS=1 (#11799)
Summary:
cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11799

Differential Revision: D9922745

Pulled By: orionr

fbshipit-source-id: b88724b7c2919aabc00d98658e8e563233e01c85
2018-09-18 14:09:33 -07:00
d4e1fa45d0 allow no-alpha add/sub in onnx symbolic (#10972)
Summary:
The PR fixes #10873

The context is aten::add and aten::sub ST overloads don't have alpha, so onnx symbolic does not match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10972

Reviewed By: jamesr66a

Differential Revision: D9724224

Pulled By: wanchaol

fbshipit-source-id: eb5d1b09fa8f1604b288f4a62b8d1f0bc66611af
2018-09-18 13:55:39 -07:00
7d25fa3c72 Emit Undefined type for value when it is Dynamic type (#11810)
Summary:
For example, outputs of control blocks often have Dynamic type, and when we try to export them to ONNX we get an invalid proto, since `elem_type` is not populated on the TypeInfoProto. This change at least lets us get past the checker, since having a dynamically typed output from a control block should still be semantically valid.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11810

Differential Revision: D9922754

Pulled By: jamesr66a

fbshipit-source-id: 5c66113cc302a9d9b8b9f5a8605473d3c6ad5af1
2018-09-18 13:55:36 -07:00
1d399a80a0 Handle pollution of MAX, MIN and CHECK macros. (#11805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11805

Some of our headers in Caffe2 pollute the macro namespace with things like MAX,
MIN, CHECK, so I renamed these in places where this is a problem.

This patch courtesy of gchanan, extracted out of #11721

Reviewed By: Yangqing

Differential Revision: D9917757

fbshipit-source-id: 17fc692ca04b208dcb8ae00731ed60e393284f7c
2018-09-18 13:18:31 -07:00
9eb72889b4 Add successor/predecessor functions
Summary: More functionality to prep nomnigraph for scheduler implementations

Reviewed By: duc0

Differential Revision: D9794686

fbshipit-source-id: b460859d8ff965d0049b2a696bd8d2f5c97f3f86
2018-09-18 12:27:06 -07:00
47956ddf7e Revert D9755189: [pytorch][PR] [API CHANGE] Add empty tensor tests to test_sparse
Differential Revision:
D9755189

Original commit changeset: e9d36f437db1

fbshipit-source-id: 8b99edf626418a953a8bd786847a6e0174a3a14d
2018-09-18 11:26:10 -07:00
540ef9b1fc Add distributed get_backend (#11715)
Summary:
I have no idea how to run distributed tests locally so I'll let CI do this. Hopefully everything still works with `IntEnum`.
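A minimal usage sketch (the single-process init arguments are assumptions for illustration only):

```python
>>> import torch.distributed as dist
>>> dist.init_process_group('gloo', init_method='tcp://127.0.0.1:23456',
...                         world_size=1, rank=0)
>>> dist.get_backend()   # backend of the default group
```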

cc mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11715

Reviewed By: pietern

Differential Revision: D9889646

Pulled By: SsnL

fbshipit-source-id: 1e2a487cb6fe0bd4cc67501c9d72a295c35693e2
2018-09-18 10:56:24 -07:00
2732c8bae1 improve aten/convolution error message (#11768)
Summary:
fixes https://github.com/pytorch/pytorch/issues/11762
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11768

Differential Revision: D9884185

Pulled By: soumith

fbshipit-source-id: 2a0c3e1f5a4fb4833ae6e9fc791abcf45f7fbea2
2018-09-18 10:56:22 -07:00
98aebed88e Refactor tests part 1 (#11350)
Summary:
Followup to [the serialized test framework](https://github.com/pytorch/pytorch/pull/10594)

Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.

I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).

1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply explicitly do the conversion to dtype=object.
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.

I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` not compatible with adding more hypothesis decorators on top. For example, there are tests that do
```
@settings(...)
@given(...)
def test_my_stuff(...)
```
But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor this to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now; I hope that's alright.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11350

Reviewed By: houseroad

Differential Revision: D9693857

Pulled By: ajyu

fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
2018-09-18 10:42:10 -07:00
6073f3073e Document torch::nn::init (#11778)
Summary:
Doc fixes and documentation for `torch::nn::init`.

ebetica soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11778

Differential Revision: D9886648

Pulled By: goldsborough

fbshipit-source-id: 22eb78add1dc32b92cc32253683ab3d746505a64
2018-09-18 10:26:21 -07:00
c8fbeb3aa2 Add empty tensor tests to test_sparse (#11228)
Summary:
This PR adds empty sparse tensor tests to `test_sparse.py`, and also fixes various places in internal code to make the tests pass.

**[NOTE] API CHANGE:**
- `coalesce` on sparse tensor will always be performed out-of-place now (meaning the original tensor will never be affected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11228

Differential Revision: D9755189

Pulled By: yf225

fbshipit-source-id: e9d36f437db1a132c423d3a282ff405a084ae7cc
2018-09-18 10:26:18 -07:00
e00fb69b25 Use CATCH prefix to avoid name conflicts with Caffe2.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11780

Differential Revision: D9889925

Pulled By: gchanan

fbshipit-source-id: 5eca849c36ced00b8ae7482b7945b445a3e1687e
2018-09-18 08:12:45 -07:00
4ee0a78ee6 varargs for meshgrid (#11600)
Summary:
Adds vararg support for meshgrid and adds checks that all the tensor arguments have the same dtype and device.

Fixes: [#10823](https://github.com/pytorch/pytorch/issues/10823), #11446

The earlier pull request was closed without any changes because I had some rebasing issues, so I made another pull request to close out #10823. Sorry for the inconvenience.
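A short sketch of the vararg form:

```python
>>> import torch
>>> x = torch.tensor([1, 2, 3])
>>> y = torch.tensor([4, 5])
>>> grid_x, grid_y = torch.meshgrid(x, y)   # tensors passed directly, no list needed
>>> grid_x.shape, grid_y.shape
(torch.Size([3, 2]), torch.Size([3, 2]))
```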

Differential Revision: D9892876

Pulled By: ezyang

fbshipit-source-id: 93d96cafc876102ccbad3ca2cc3d81cb4c9bf556
2018-09-18 07:41:31 -07:00
e2bc95e1bd add ModuleList.insert (#11664)
Summary:
fixes #11652
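A minimal sketch of the new method, which mirrors `list.insert` semantics:

```python
>>> import torch.nn as nn
>>> layers = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])
>>> layers.insert(1, nn.Dropout(0.5))   # insert before index 1
>>> [type(m).__name__ for m in layers]
['Linear', 'Dropout', 'ReLU']
```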
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11664

Differential Revision: D9892845

Pulled By: ezyang

fbshipit-source-id: 2c910d6bc0b28a999e25beca6e398fd0f35535c5
2018-09-18 07:41:28 -07:00
91b6458e2d Container __getitem__ slicing for subclasses (#11694)
Summary:
Simple change to allow a ModuleList subclass's `__getitem__(slice)` to return the subclass type rather than ModuleList
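A sketch of the change (the subclass name is hypothetical):

```python
>>> import torch.nn as nn
>>> class MyList(nn.ModuleList):
...     pass
>>> ml = MyList([nn.Linear(2, 2) for _ in range(4)])
>>> type(ml[1:3]).__name__   # now 'MyList'; previously 'ModuleList'
'MyList'
```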
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11694

Differential Revision: D9892824

Pulled By: ezyang

fbshipit-source-id: b75e9c196487f55cb93f0dab6c20d850e8e759ff
2018-09-18 01:26:18 -07:00
e734c94fa2 Quick update to embedding_bag doc (#11784)
Summary:
Related to #11624 adding maxes to the function def of embedding_bag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11784

Differential Revision: D9892598

Pulled By: ezyang

fbshipit-source-id: e6372ccf631826ddf1e1885b2f8f75f354a36c0b
2018-09-17 23:56:05 -07:00
407a9fee0c make copy constructed tensor a leaf variable when using torch.tensor(sourceTensor) (#11061)
Summary:
- fix https://github.com/pytorch/pytorch/issues/10876
- the cause of the bug is that the copy constructor cannot distinguish between the default value of requires_grad and requires_grad=False; thus it makes a copy from the source tensor along with its grad_fn if requires_grad=True at the source
- with this fix, the behavior becomes
```
>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=True)
>>> print(copy)
tensor([[-1.2001,  1.9869],
        [-1.0134,  1.3096]], grad_fn=<CopyBackwards>)

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=False)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11061

Differential Revision: D9569714

Pulled By: weiyangfb

fbshipit-source-id: ea368688bdc0f1ce5997870e164e42835b64b4a1
2018-09-17 23:29:09 -07:00
63c811b3a6 Include some JIT things in C++ docs (#11712)
Summary:
Since we're making parts of the JIT public as part of loading script modules, they should be on the cppdocs website.

Orthogonal: We decided not to export things like `IValue` into the `torch` namespace, so `RegisterOperators` shouldn't be there either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11712

Differential Revision: D9837578

Pulled By: goldsborough

fbshipit-source-id: 4c06d2fa9dd4b4216951f27424c2ce795febab9c
2018-09-17 23:29:04 -07:00
bd43d64dd5 Add strides to Tensor (#11763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11763

baseline-std vector
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.74us  148.26K
TensorShareData                                              5.89us  169.78K
TensorShareExternalPointer                                   1.01us  994.35K
TensorReallocation                                           2.46us  405.78K
============================================================================
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                7.50us  133.27K
TensorShareData                                              7.07us  141.38K
TensorShareExternalPointer                                   1.05us  955.19K
TensorReallocation                                           2.55us  391.62K
============================================================================

```

baseline-smallvector
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.56us  152.34K
TensorShareData                                              5.84us  171.32K
TensorShareExternalPointer                                 962.49ns    1.04M
TensorReallocation                                           2.32us  431.73K
============================================================================
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.29us  159.04K
TensorShareData                                              5.73us  174.39K
TensorShareExternalPointer                                 914.90ns    1.09M
TensorReallocation                                           2.29us  435.80K
============================================================================
```

Reviewed By: ezyang

Differential Revision: D9694097

fbshipit-source-id: c462e770a4b40e640d8c9d38e0ae7036a4e6e84a
2018-09-17 22:09:40 -07:00
a02685e109 Fix test_torch's test_potri (#11770)
Summary:
tset_potri -> test_potri, even though it has been like this for a long time

More a curiosity than grave functionality...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11770

Reviewed By: ezyang

Differential Revision: D9884767

Pulled By: soumith

fbshipit-source-id: 9bedde2e94ade281ab1ecc2293ca3cb1a0107387
2018-09-17 21:58:18 -07:00
3cbec5453b Reorder statements for readability (#11764)
Summary:
I read this a couple of times before figuring out it's also the entry point for MPI_COMM_WORLD.

Reordered statements and added a comment to clarify.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11764

Differential Revision: D9882834

Pulled By: pietern

fbshipit-source-id: a9282d55368815925fd695a2541354e5aec599da
2018-09-17 21:58:15 -07:00
a7cbcb1bb9 Enable build_python on windows (#11385)
Summary:
The PR aims to resolve issues related to BUILD_PYTHON and BUILD_TEST after FULL_CAFFE2 is removed on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11385

Reviewed By: orionr

Differential Revision: D9884906

Pulled By: mingzhe09088

fbshipit-source-id: fc114c0cbff6223f1ec261161e4caecc1fef5dd6
2018-09-17 21:40:03 -07:00
63e384a381 SNNTest with Data Preproc Service (#11707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11707

Trigger SNN offline training test with data preproc service.

Reviewed By: xsh6528

Differential Revision: D9826978

fbshipit-source-id: f98405ca1e61a7662bf0d9313aaba42436025a83
2018-09-17 21:25:49 -07:00
7f0dd2487d Move AT_HOST_DEVICE macro to Macros.h (#10945)
Summary:
I'm using AT_HOST_DEVICE outside of Half.h in an upcoming PR. Since this
changes code without making any semantic changes, I wanted to make this
change in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10945

Differential Revision: D9539821

Pulled By: colesbury

fbshipit-source-id: 0daae40ea78b077a543f7bfeec06b225634540de
2018-09-17 18:25:51 -07:00
e8ecbcdf01 Move IValue to ATen/core (#11610)
Summary:
unblocks D9202320
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11610

Differential Revision: D9774853

Pulled By: bwasti

fbshipit-source-id: 4798223f6de680a7152283e8cad8814da7f90209
2018-09-17 18:25:50 -07:00
d4dde0bcaf Detect number of amd gpus in ROCM CI (#11771)
Summary:
We now have CI machines with different numbers of AMD GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11771

Differential Revision: D9889837

Pulled By: bddppq

fbshipit-source-id: dacf728a282f209e3f2419da186e59528a08ca6a
2018-09-17 18:11:09 -07:00
24a8c13f36 Add barrier to fix distributed test flakiness (#11775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11775

This should fix #11582.

Reviewed By: ezyang

Differential Revision: D9885546

fbshipit-source-id: 3544f42ebe8b595cdf6941859c67484d3ea9b3f8
2018-09-17 17:31:45 -07:00
7d0657f13c Migrate test in cpp/api/ to use gtest (#11556)
Summary:
The second part of T32009899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11556

Differential Revision: D9888224

Pulled By: zrphercule

fbshipit-source-id: cb0d0ba5d9c7ad601ee3bce0d932ce9cbbc40908
2018-09-17 17:31:43 -07:00
3819d25418 Clean up converter and accept less-valid networks
Summary: Cleaning up converter.cc and allowing networks that have "pass through" inputs (that are also outputs but aren't actually consumed by the network)

Reviewed By: duc0

Differential Revision: D9759435

fbshipit-source-id: 1ddfcc60a1b865a06682e4022230dfecc4b89ec3
2018-09-17 17:31:41 -07:00
ca5def1b8f Expose annotations (#11649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11649

Putting annotations in python interface

Reviewed By: duc0

Differential Revision: D9784750

fbshipit-source-id: d877c886ac52559ca3f009a1fd848dd1779b7d04
2018-09-17 16:39:37 -07:00
3ce17bf8f6 Generate ATen/core to source if env GEN_TO_SOURCE is set. (#11759)
Summary:
It is currently tedious to change code generation because it takes two steps: change the code gen, then watch gen.py fail because of the file mismatch. This just adds an environment option to generate directly to the source tree.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11759

Differential Revision: D9867259

Pulled By: gchanan

fbshipit-source-id: 3cf8024d9e302f382cf8b8a44cb843fb086f8597
2018-09-17 15:25:33 -07:00
7df6650e9c Fix empty embedding bag on cuda (#11740)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11739
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11740

Differential Revision: D9881392

Pulled By: SsnL

fbshipit-source-id: 2964d314f199dd9b4bb69e36592b67efdf5e0760
2018-09-17 14:40:03 -07:00
7671f4ab1c Add math to scope when using inf in tests (#11302)
Summary:
This fixes #8515, which was mostly issues in the tests themselves. As long
as `math` is imported in the scope in which the script runs, `inf` resolves
to a `prim::Constant` correctly. This PR adds this to
the `test_jit.py` tests involving `inf` and adds a test to demonstrate
`inf` in a non-generated test.
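A small sketch of the pattern the tests rely on (assuming `math` is importable in the scope where the script function is compiled):

```python
import math
import torch

@torch.jit.script
def add_inf(x):
    # `math.inf` resolves to a prim::Constant because `math` is in scope here
    return x + math.inf
```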
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11302

Differential Revision: D9684336

Pulled By: driazati

fbshipit-source-id: 73df2848dfdb45ab50690a7c88df8fda269a64eb
2018-09-17 14:08:32 -07:00
29610621ec 64B align for avx512 (#11748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748

For avx512, we need to align at a multiple of 64B, not 32B.
Regardless of avx512, it's in general a good idea to be cache-line aligned.

Reviewed By: ilia-cher

Differential Revision: D9845056

fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
2018-09-17 14:08:31 -07:00
336323f53c return aten::gt to the list of fusable operations, add expected graphs (#11150)
Summary:
Fixes one of the #11118 issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11150

Differential Revision: D9861372

Pulled By: apaszke

fbshipit-source-id: 98b196b89e991d3936360b30568360367fd32e8b
2018-09-17 13:40:41 -07:00
73738ec570 bump version to 1.0 (#11717)
Summary:
I'm just doing the honors and bumping the version to 1.0.0.

1.0 preview and RC releases will have the 1.0.0.dev{date} tag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11717

Reviewed By: SsnL

Differential Revision: D9840857

Pulled By: soumith

fbshipit-source-id: 4c9c2e01dccb3c521dab26c49e1569d970a87ace
2018-09-17 12:13:48 -07:00
47d65ed34f Fix issue 10492 (#11634)
Summary:
- pass infos vector by reference
- checkErrors takes infos vector by reference
- modified gesv tests to not cause infs or nans sporadically
- also clean up error messages

Reviewed By: ezyang

Differential Revision: D9818550

Pulled By: soumith

fbshipit-source-id: 00215205ff88767d6a5e921322394c5fd915d6d8
2018-09-17 12:13:45 -07:00
39520ffec1 remove Type/Tensor/TensorMethods include order dependencies. (#11720)
Summary:
Previously, TensorMethods.h had to be included after Tensor.h in order to get the tensor method definitions.
We abstracted this away from users by making sure ATen.h did this correctly; but we don't have any equivalent for ATen/core.

In order to solve this dependency issue, we now forward declare Tensor in the Type declaration, which breaks the dependency cycle.
Type.h now includes Tensor.h (for backwards compatibility) and Tensor.h now includes TensorMethods.h, so there is no longer include dependency restrictions.

We could get rid of TensorMethods.h completely now, but that would involve coordinating a code generation change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11720

Reviewed By: ezyang

Differential Revision: D9841488

Pulled By: gchanan

fbshipit-source-id: 1668199095e096c1790e646b5dc9f61ec1b33c0a
2018-09-17 11:10:32 -07:00
e125e61824 Fix flake8
Summary: Fix flake8

Reviewed By: ezyang

Differential Revision: D9873872

fbshipit-source-id: 26e81238f22caaeccd2c8b4f39cedb6cfb5520dd
2018-09-17 11:10:29 -07:00
cdefc27795 Support lr adaption for SparseAdam and RowWiseSparseAdam (#11162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11162

As titled; fixes the PR test failure.

Reviewed By: chocjy

Differential Revision: D9619308

fbshipit-source-id: 0a2228841ed8fadb15f07e94d3575aa701b10146
2018-09-17 10:29:03 -07:00
7949250295 Fixes for Torch Script C++ API (#11682)
Summary:
A couple fixes I deem necessary to the TorchScript C++ API after writing the tutorial:

1. When I was creating the custom op API, I created `torch/op.h` as the one-stop header for creating custom ops. I now notice that there is no good header for the TorchScript C++ story altogether, i.e. when you just want to load a script module in C++ without any custom ops necessarily. The `torch/op.h` header suits that purpose just as well of course, but I think we should rename it to `torch/script.h`, which seems like a great name for this feature.

2. The current API for the CMake we provided was that we defined a bunch of variables like `TORCH_LIBRARY_DIRS` and `TORCH_INCLUDES` and then expected users to add those variables to their targets. We also had a CMake function that did that for you automatically. I now realized a much smarter way of doing this is to create an `IMPORTED` target for the libtorch library in CMake, and then add all this stuff to the link interface of that target. Then all downstream users have to do is `target_link_libraries(my_target torch)` and they get all the proper includes, libraries and compiler flags added to their target. This means we can get rid of the CMake function and all that stuff. orionr  AFAIK this is a much, much better way of doing all of this, no?

3. Since we distribute libtorch with `-D_GLIBCXX_USE_CXX11_ABI=0`, dependent libraries must set this flag too. I now add this to the interface compile options of this imported target.

4. Fixes to JIT docs.

These could likely be 4 different PRs but given the release I wouldn't mind landing them all asap.

zdevito dzhulgakov soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11682

Differential Revision: D9839431

Pulled By: goldsborough

fbshipit-source-id: fdc47b95f83f22d53e1995aa683e09613b4bfe65
2018-09-17 09:54:50 -07:00
a7e3cd09e0 Fix ctc gradient handling (#11753)
Summary:
Fixes: #11750

Also fixes CUDA CTC with double precision to enable gradient checking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11753

Differential Revision: D9861318

Pulled By: ezyang

fbshipit-source-id: 2e7afea2b60dbbd891bb5d0bda61ee75fe01d933
2018-09-17 09:54:47 -07:00
07fd4450ab Revert D9831398: [pytorch][PR] Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0)
Differential Revision:
D9831398

Original commit changeset: db119d3f9c26

fbshipit-source-id: 4f183c9c178c159473bdaaa6299d4d5eb8afe549
2018-09-17 09:39:23 -07:00
f6a6d7fae1 Switch at::TensorImpl to store TypeMeta rather than ScalarType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11702

Reviewed By: cpuhrsch

Differential Revision: D9831384

fbshipit-source-id: 1b1233a70ed70b47a3dab4a5797b6cfcb7a2c265
2018-09-17 09:09:35 -07:00
6660a128a5 Cache and use TypeMeta in TensorImpl (#11706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11706

This is necessary to handle use cases where Storage is not set (because the
tensor in question doesn't have a notion of storage).

Reviewed By: orionr

Differential Revision: D9833361

fbshipit-source-id: e90a384019f44f57682b687d129b54e85b6fabb9
2018-09-17 08:58:13 -07:00
2baba7f835 Add storage_offset to Caffe2 (#11701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11701

There's one extra multiply from TypeMeta::itemsize() which needs
to be characterized.  For all existing Caffe2 uses, storage_offset
is zero.

Reviewed By: li-roy

Differential Revision: D9831230

fbshipit-source-id: 353678edf76d2ccc297a73475a34f6ab2a20d1e1
2018-09-17 08:58:11 -07:00
35518b3dc7 Back out "Back out "Refactor Tensor/TensorImpl constructors."" E2: Confirm problem with old patch (#11744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11744

Original commit changeset: 093e4c47d557

Restores D9813742

Reviewed By: dzhulgakov

Differential Revision: D9847835

fbshipit-source-id: f3f467891e01c923dd9d3352d892cf59e10402f1
2018-09-17 08:58:09 -07:00
0d345cfa18 Remove Type method defaults in ATen. (#11675)
Summary:
This will allow us to break the dependency cycle between Tensor and Type, because currently Type has defaulted Tensor (reference) arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11675

Reviewed By: ezyang

Differential Revision: D9819720

Pulled By: gchanan

fbshipit-source-id: a9577ac34a358120075129ab0654e7862d1dace6
2018-09-17 08:58:07 -07:00
5bfd8f583c Moving copy of Caffe2 protos back to build_pytorch_libs.sh (#11726)
Summary:
This way it shows up in all current and future setup.py commands, as otherwise we'd have to override every one to have them all call copy_protos. This is needed because the nightly packages still do not include caffe2_pb2, since setup.py bdist does not go through setup.py install or setup.py develop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11726

Reviewed By: orionr

Differential Revision: D9844075

Pulled By: pjh5

fbshipit-source-id: 57b469e48010aacd0c08c214ba8a7e5d757feefa
2018-09-17 08:58:05 -07:00
a8b1755de6 Check device argument makes sense for legacy tensor constructors. (#11669)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/11427.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11669

Differential Revision: D9817881

Pulled By: gchanan

fbshipit-source-id: 77dc5b0e6bc9884d2616210b96c07e4734058bb6
2018-09-17 08:24:25 -07:00
d63bb72d89 Remove symbol export annotations in THC/generic/*.cu (#11367)
Summary:
We use these annotations during function declarations, not definitions. See the description of compiler error [C2491](https://msdn.microsoft.com/en-us/library/62688esh.aspx) for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11367

Reviewed By: ezyang

Differential Revision: D9697923

Pulled By: orionr

fbshipit-source-id: 1e539c02957851386f887e6d0510ce83117a1695
2018-09-17 08:24:23 -07:00
f5bc2aef07 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#11563)
Summary:
Fix the OpenMP link error for the AppleClang 9.0 compiler.

Built with the following command:
python setup.py build develop

The error message:

```
Undefined symbols for architecture x86_64:
  "___kmpc_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_end_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_for_static_fini", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_4", referenced from:
      _.omp_outlined. in init.cpp.o
      _.omp_outlined..35 in init.cpp.o
      _.omp_outlined..36 in init.cpp.o
      _.omp_outlined..37 in init.cpp.o
      _.omp_outlined..49 in init.cpp.o
      _.omp_outlined..52 in init.cpp.o
      _.omp_outlined..220 in init.cpp.o
      ...
  "___kmpc_for_static_init_8", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_8u", referenced from:
      _.omp_outlined..203 in init.cpp.o
      _.omp_outlined..207 in init.cpp.o
      _.omp_outlined..209 in init.cpp.o
      _.omp_outlined..210 in init.cpp.o
  "___kmpc_fork_call", referenced from:
      at::native::embedding_dense_backward_cpu(at::Tensor const&, at::Tensor const&, long long, long long, bool) in Embedding.cpp.o
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::grid_sampler_2d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_2d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      ...
  "___kmpc_global_thread_num", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_push_num_threads", referenced from:
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "___kmpc_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "_omp_get_max_threads", referenced from:
      _THGetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "_omp_get_num_procs", referenced from:
      _THGetNumCores in THGeneral.cpp.o
  "_omp_get_num_threads", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_get_thread_num", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_in_parallel", referenced from:
      _THFloatTensor_copy in THTensorCopy.cpp.o
      _THDoubleTensor_copy in THTensorCopy.cpp.o
      _THByteTensor_copy in THTensorCopy.cpp.o
      _THCharTensor_copy in THTensorCopy.cpp.o
      _THShortTensor_copy in THTensorCopy.cpp.o
      _THIntTensor_copy in THTensorCopy.cpp.o
      _THLongTensor_copy in THTensorCopy.cpp.o
      ...
  "_omp_set_num_threads", referenced from:
      _THSetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
ld: symbol(s) not found for architecture x86_64
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11563

Differential Revision: D9831398

Pulled By: ezyang

fbshipit-source-id: db119d3f9c26a71180335ad955f2f62c5369f9ed
2018-09-17 08:24:20 -07:00
6f6b03566b Vectorize grid sample 2d CPU kernels (#10980)
Summary:
This PR vectorizes the CPU grid sample 2d forward and backward kernels. Specifically,

 1. add `.data()` in `TensorAccessor`
 2. support non-void return value for declaring CPU kernel stub
 3. add `bool at::geometry_is_contiguous(IntList sizes, IntList strides)`
 4. The following vectorized CPU primitives are added:

    + `gather<scale>(baseaddr, vindex)`: `result[i] = baseaddr[vindex[i] * scale]`
    + `mask_gather<scale>(src, baseaddr, vindex, mask)`: `result[i] = mask[i] ? baseaddr[vindex[i] * scale] : src[i]`.
    + comparison ops
    + binary logical ops
    + `min(a, b)`
    + `cast<dst_t, src_t>(src_vec)`: changing dtype but keeping the bit representation
    + `blendv(a, b, mask)`: `result[i] = mask[i] ? b[i] : a[i]`.
    + ctor with multiple values (i.e., `setr`)
    + `arange(start = 0, step = 1)`: constructs a vector with values specified by the arange parameters
    + `convert_to_int_of_same_size(vec)`: convert floating point vector to corresponding integral type of same size
    + `interleave2(a, b)` & `deinterleave2(x, y)`: interleave or deinterleaves two vectors. E.g., for `interleave`:
        ```
        inputs:
          {a0, a1, a2, a3, a4, a5, a6, a7}
          {b0, b1, b2, b3, b4, b5, b6, b7}
        outputs:
          {a0, b0, a1, b1, a2, b2, a3, b3}
          {a4, b4, a5, b5, a6, b6, a7, b7}
        ```

 5. Grid sample CPU kernel implementations are described in the following note (also in `GridSampleKernel.cpp`):

  ```
   NOTE [ Grid Sample CPU Kernels ]

   Implementation of vectorized grid sample CPU kernels is divided into three
   parts:

   1. `ComputeLocation` struct
      Transforms grid values into interpolation locations of the input tensor
      for a particular spatial dimension, based on the size of that dimension
      in the input tensor and the padding mode.
```
```cpp
      template<typename scalar_t, GridSamplerPadding padding>
      struct ComputeLocation {
        using Vec = Vec256<scalar_t>;

        // ctor
        ComputeLocation(int64_t size);

        // Given grid values `in`, return the interpolation locations after
        // un-normalization and padding mechanism (elementwise).
        Vec apply(const Vec &in) const;

        // Similar to `apply`, but also returns `d apply(in) / d in`
        // (elementwise).
        // this is often used in gradient computation.
        std::pair<Vec, Vec> apply_get_grad(const Vec &in) const;
      };
```
```
   2. `ApplyGridSample` struct
      Owns N `ComputeLocation` structs, where N is the number of spatial
      dimensions. Given N input grid vectors (one for each spatial dimension)
      and spatial offset, it gets the interpolation locations from
      `ComputeLocation`s, applies the interpolation procedure, and then writes to
      the output (or grad_input & grad_grid in backward).
```
```cpp
      template<typename scalar_t, int spatial_dim,
               GridSamplerInterpolation interp,
               GridSamplerPadding padding>
      struct ApplyGridSample {

        // ctor
        ApplyGridSample(const TensorAccessor<scalar_t, 4>& input);

        // Applies grid sampling (forward) procedure:
        //   1. computes interpolation locations from grid values `grid_x` and
        //      `grid_y`,
        //   2. interpolates output values using the locations and input data
        //      in `inp_slice`, and
        //   3. writes the first `len` values in the interpolated vector to
        //      `out_slice` with spatial offset being `offset`.
        //
        // This assumes that `grid_x` and `grid_y` all contain valid grid
        // values \in [-1, 1], even at indices greater than `len`.
        //
        // The `*_slice` argument names mean samples within a batch (i.e.,
        // with the batch dimension sliced out).
        void forward(TensorAccessor<scalar_t, 3>& out_slice,
                     const TensorAccessor<scalar_t, 3>& inp_slice,
                     int64_t offset, const Vec& grid_x, const Vec& grid_y,
                     int64_t len) const;

        // Applies grid sampling (backward) procedure. Arguments semantics
        // and strategy are similar to those of `forward`.
        void backward(TensorAccessor<scalar_t, 3>& gInp_slice,
                      TensorAccessor<scalar_t, 3>& gGrid_slice,
                      const TensorAccessor<scalar_t, 3>& gOut_slice,
                      const TensorAccessor<scalar_t, 3>& inp_slice,
                      int64_t offset, const Vec& grid_x, const Vec& grid_y,
                      int64_t len) const;
      }
```
```
   3. `grid_sample_2d_grid_slice_iterator` function
      Among the tensors we work with, we know that the output tensors are
      contiguous (i.e., `output` in forward, and `grad_input` & `grad_grid` in
      backward), we need to randomly read `input` anyways, and `grad_output`
      usually comes from autograd and is often contiguous. So we base our
      iterating strategy on the geometry of grid.
      `grid_sample_2d_grid_slice_iterator` function provides an abstraction to
      efficiently iterate through a `grid` slice (without the batch dimension).
      See the comments on that function for the specific cases and strategies used.
```
```cpp
      template<typename scalar_t, typename ApplyFn>
      void grid_sample_2d_grid_slice_iterator(
        const TensorAccessor<scalar_t, 3>& grid_slice,
        const ApplyFn &apply_fn);

      // `apply_fn` is a function/lambda that can be called as if it has
      // declaration:
      //   void apply_fn(const Vec256<scalar_t>& grid_x,
      //                 const Vec256<scalar_t>& grid_y,
      //                 int64_t spatial_offset, int64_t len);
```
```
      `apply_fn` will be called multiple times, and together the calls cover the
      entire output spatial space. Therefore, e.g., to implement forward 2d grid
      sample, we can do
```
```cpp
      ApplyGridSample<scalar_t, 2, interp, padding> grid_sample(input_accessor);

      for (int n = 0; n < input_accessor.size(0); n++) {
        grid_sample_2d_grid_slice_iterator(
          grid_accessor[n],
          [&](const Vec256<scalar_t>& grid_x, const Vec256<scalar_t>& grid_y,
              int64_t spatial_offset, int64_t len) {
            grid_sample.forward(out_accessor[n], input_accessor[n],
                                spatial_offset, grid_x, grid_y, len);
          });
      }
   ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10980

Differential Revision: D9564867

Pulled By: SsnL

fbshipit-source-id: 5b7c3c7ea63af00eec230ae9ee1c3e6c6c9679b4
2018-09-16 20:41:10 -07:00
10c29c8970 Fix CUDA 8 build on Windows (#11729)
Summary:
Tested via https://github.com/pytorch/pytorch/pull/11374.
Upstream PR: https://gitlab.kitware.com/cmake/cmake/merge_requests/2391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11729

Differential Revision: D9847807

Pulled By: orionr

fbshipit-source-id: 69af3e6c5bba0abcbc8830495e867a0b1b399c22
2018-09-16 08:09:24 -07:00
ca6f08f359 Set correct dtype for fp16 op inference function (#11693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11693

As described in the title.

Reviewed By: hyuen

Differential Revision: D9829061

fbshipit-source-id: 0f4c8a9d2b95d4cf5fa20a2aefd5671f273a8e76
2018-09-15 23:40:41 -07:00
b3e726042c Do not use FixedDivisor in ROCM order switch op (#11697)
Summary:
Fix the recent order_switch_test failure in ROCM CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11697

Reviewed By: BIT-silence

Differential Revision: D9831039

Pulled By: bddppq

fbshipit-source-id: 2368fd1ac7b1bab335ff3377071246cfd3392f3f
2018-09-15 18:24:51 -07:00
eb3c47bdd5 max -> fmaxf in cross_entropy kernel (#11733)
Summary:
Change `max` to `fmaxf` in the `LabelCrossEntropy` kernel so it works correctly under HIP.

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11733

Differential Revision: D9846783

Pulled By: bddppq

fbshipit-source-id: c1b394d2ba7ee0e819f7bf3b36b53d1962de5522
2018-09-15 18:13:42 -07:00
f09054f8d0 Remove deprecate warning for Upsampling (#11568)
Summary:
Fixes #11452.

Based on the discussion with SsnL and soumith, we want to bring back Upsample as a module instead of introducing a new nn.interpolate module for now. Anyone who wants to downsample should use `nn.functional.interpolate` instead.
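
A rough usage sketch of the intended split (shapes and mode chosen for illustration):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# nn.Upsample remains the module form for upsampling...
up = nn.Upsample(scale_factor=2, mode='nearest')
print(up(x).shape)                                  # torch.Size([1, 3, 16, 16])

# ...while downsampling goes through the functional interface:
down = F.interpolate(x, scale_factor=0.5, mode='nearest')
print(down.shape)                                   # torch.Size([1, 3, 4, 4])
```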
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11568

Differential Revision: D9804359

Pulled By: ailzhang

fbshipit-source-id: 2b232d55fc83c2b581bf336f1ee8d1cf1c1159ca
2018-09-14 17:54:48 -07:00
bb6f18c44f Simplify IValue::toTensor() (#11355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11355

There is no reason to implement refcounting manually in this case.
Given the correct NullType, toIntrusivePtr() and moveToIntrusivePtr() will do the right thing.

Reviewed By: ezyang

Differential Revision: D9694918

fbshipit-source-id: 8aae4d66aec32ca5f85c438d66339bd80b72b656
2018-09-14 16:57:15 -07:00
690c999bba Simplify union payload copying (#11353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11353

Before, there was one extra member in the union that had to be at least as large as the largest other member, because it was used for copying.

Now, this isn't needed anymore and we copy the union directly.

Reviewed By: ezyang

Differential Revision: D9694326

fbshipit-source-id: 42b2f7d51ac5d4ea5ebafea3a598b018e10fed68
2018-09-14 16:57:14 -07:00
270fb22bd8 Remove intrusive_ptr::reclaim() in Storage (2/2) (#11547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11547

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9778042

fbshipit-source-id: c9572edc440c5ce5ea1b2355b5c54f87078ea28e
2018-09-14 16:57:12 -07:00
f4d9fe395d Remove intrusive_ptr::reclaim() in Storage (#11352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11352

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9694327

fbshipit-source-id: befdbcac199225383a93520472ee7c6511a0e9cd
2018-09-14 16:57:10 -07:00
2c8a1b957e Back out "Refactor Tensor/TensorImpl constructors."
Summary: Original commit changeset: 7501b54fe5f3

Reviewed By: gchanan

Differential Revision: D9838097

fbshipit-source-id: 093e4c47d5574ce99f706b0683ef369a89b62b38
2018-09-14 16:39:31 -07:00
8e76dcf173 Prevent raising KeyboardInterrupt in worker (#11718)
Summary:
Current behavior is that each process (main and workers) will print a traceback from `KeyboardInterrupt`, and the main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCHLD handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718

Differential Revision: D9840844

Pulled By: SsnL

fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
2018-09-14 16:09:35 -07:00
d24bcfd930 Suppress hiprand "duplicate-decl-specifier" warning (#11698)
Summary:
Otherwise each build produces 65MB of warnings log, which makes the CI hard to debug.

iotamudelta Jorghi12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11698

Differential Revision: D9840356

Pulled By: bddppq

fbshipit-source-id: b69bf6a5c38a97b188221f9c084c608ffc9b37c8
2018-09-14 15:51:43 -07:00
8e3f8c52e8 Document the Sequential module (#11648)
Summary:
1. Document the Sequential module in the C++ API at both a high ("why does this exist") and a low ("how to use") level
2. Change the Sequential tests to be in a style that makes them easier to convert to gtest. No code changes.

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11648

Differential Revision: D9834526

Pulled By: goldsborough

fbshipit-source-id: 39f2f5c6cbbf8ed5a1b69986978c8ef127036de1
2018-09-14 15:51:41 -07:00
96d3f968eb Splits CPU and CUDA fusion compilers (#10981)
Summary:
This PR splits the CPU and CUDA fusion compilers, putting them into a new jit/fusers/ directory with jit/fusers/common for common components. In particular:

- A fusion interface is created that allows "fusion handles" to be requested
- The CPU and CUDA fusers implement this interface, with dispatch determined by device
- The fusion compilers, fusion function specializations and resource strings are split
- CPU-specific classes like TempFile and DynamicLibrary are in the CPU fuser
- Common classes like TensorDesc and the base fusion function class are in jit/fusers/common
- There is still some specialization in jit/fusers/common, but these specializations are small(-ish)
- Updates the build system to remove the dummy interface on Windows and minimize the use of macros

This structure should allow in-flight PRs to easily rebase while providing a clear interface to the fusers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10981

Reviewed By: soumith

Differential Revision: D9701999

Pulled By: apaszke

fbshipit-source-id: 3b6bec7b97e0444b2a93caa38d9b897f2e68c1b3
2018-09-14 14:05:34 -07:00
70e68e755a Casting for binary ops (#11708)
Summary:
Fixes #11663

`TensorIterator` was replacing the op tensors with type-casted tensors,
which ended up producing side effects in binary ops like `a.float() * b`
where `a` and `b` are `LongTensor`s.
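
A minimal sketch of the invariant the fix restores (modern dtype names used for brevity):
```python
import torch

a = torch.tensor([1, 2, 3])   # int64, i.e. a "LongTensor"
b = torch.tensor([4, 5, 6])   # int64

c = a.float() * b             # mixed-dtype op exercises the casting path

# the internal type-casted copies must not leak back into the operands
assert a.dtype == torch.int64 and b.dtype == torch.int64
assert c.dtype == torch.float32
```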

colesbury ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11708

Differential Revision: D9834016

Pulled By: driazati

fbshipit-source-id: 4082eb9710b31dfc741161a0fbdb9a8eba8fe39d
2018-09-14 13:40:21 -07:00
224e62bbec respect USE_CUDA_STATIC_LINK in build_libtorch.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11713

Differential Revision: D9835972

Pulled By: anderspapitto

fbshipit-source-id: 046363b132e5487c05ef7e6e6d88b508196386a1
2018-09-14 12:25:08 -07:00
0c2648830f Augment emit_nvtx to help connect backward-pass Function apply calls with their corresponding forward pass ops (#10881)
Summary:
Often, we find ourselves looking at some long-running kernel or emit_nvtx range on an nvvp profile and trying to connect it to the offending line in a training script.  If the op is in the forward pass that's easy:  ops are enqueued explicitly from the Python side, so tracking it down with manual nvtx ranges supplemented by the built-in emit_nvtx ranges is straightforward.  If the op is in the backward pass, it's much more difficult.  From the Python side, all you can do is wrap loss.backward() in an nvtx range, and if you also use emit_nvtx, the automatic ranges provide only local information.  Right now, the only consistent way to connect backward-pass kernels to their associated forward-pass lines of Python is to understand your script line by line, and know exactly where in the backward pass you are.

This PR augments the existing nvtx machinery to bridge the gap between forward and backward, allowing connection of backward-pass Function apply calls to the forward-pass operations that required/created those Functions.

The method is simple and surgical.  During the forward pass, when running with emit_nvtx, the nvtx range for each function in VariableType is tagged with the current sequence number.  During the backward pass, the nvtx range associated with each Function's operator() is tagged with that Function's stashed sequence number, which can be compared to "current sequence numbers" from the forward pass to locate the associated op.

Double-backward is not a problem.  If a backward pass with create_graph = True is underway, the relationship between backward and double-backward is conceptually the same as the relationship between forward and backward:  The functions in VariableType still spit out current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and in the eventual double-backward execution, their operator() ranges are still tagged with the stashed numbers, which can be compared to "current sequence numbers" from the backward pass.

Minor caveats:

- The sequence number is thread-local, and many VariableType functions (specifically, those without a derivative explicitly defined in derivatives.yaml) don't create an associated function object (instead delegating that to sub-functions further down the call chain, perhaps called from within at::native functions that route back through VariableType by calling at::function_name).  So the correspondence of stashed sequence numbers in Function operator() ranges with numbers in forward-pass ranges is not guaranteed to be 1 to 1.  However, it's still a vast improvement over the current situation, and I don't think this issue should be a blocker.
- Feel free to litigate my use of stringstream in profiler.cpp.  I did it because it was easy and clean.  If that's too big a hammer, let's figure out something more lightweight.
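
As a usage sketch (assumes a CUDA build and an active nvprof/Nsight profiling session):
```python
import torch
from torch.autograd import profiler

model = torch.nn.Linear(64, 64).cuda()
x = torch.randn(32, 64, device='cuda')

with torch.cuda.profiler.profile():
    with profiler.emit_nvtx():
        loss = model(x).sum()   # forward ranges are tagged with current sequence numbers
        loss.backward()         # backward ranges are tagged with the stashed sequence numbers
```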
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10881

Differential Revision: D9833371

Pulled By: apaszke

fbshipit-source-id: 1844f2e697117880ef5e31394e36e801d1de6088
2018-09-14 11:56:55 -07:00
b90872c00e Get rid of default arguments for TH/THC factory functions. (#11673)
Summary:
This is causing codegen problems in caffe2, when we try to remove the circular Tensor/Type declarations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11673

Differential Revision: D9819341

Pulled By: gchanan

fbshipit-source-id: f2c2cd96e8a16f6de6aa4889e71b8a78e12e9256
2018-09-14 10:55:38 -07:00
7535d98ec4 Add message tag parameter to send/recv
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11490

Reviewed By: teng-li

Differential Revision: D9828116

Pulled By: pietern

fbshipit-source-id: 98be1ae84b6763ffb329e63c030c5e3ec0e748b7
2018-09-14 10:55:37 -07:00
3258fc11a7 Delete torch/csrc/api/README.md (#11703)
Summary:
We'll have separate docs for the C++ frontend; right now this file is just misleading.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11703

Differential Revision: D9832847

Pulled By: goldsborough

fbshipit-source-id: 2e8b30ccf6b5cba9d0526e6261160f7c6211a35c
2018-09-14 10:55:35 -07:00
278e304c18 Implement elif in string frontend (#11667)
Summary:
Closes #11625
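
A minimal sketch of the now-accepted syntax via the string frontend (function body is mine):
```python
import torch

cu = torch.jit.CompilationUnit('''
def sign(x: float) -> float:
    if x > 0.0:
        return 1.0
    elif x < 0.0:
        return -1.0
    else:
        return 0.0
''')

print(cu.sign(-3.0))   # -1.0
```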
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11667

Differential Revision: D9828145

Pulled By: jamesr66a

fbshipit-source-id: c72dc41cb310a4211b4e4c6b33f7e2c1fb3581a0
2018-09-14 10:09:46 -07:00
115b13ffab clean up some old Half stuff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11687

Differential Revision: D9829027

Pulled By: li-roy

fbshipit-source-id: f35dcdf93ea57ba4fa775e36e9d6378bed46a710
2018-09-14 09:54:45 -07:00
eb039dc92c Add CHECKs into GetTensorInfo and ExtractDeviceOption (#11597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11597

We should always CHECK pointers which we plan to dereference
if they are inputs to the function. Nobody knows how the function will
be called in the future.

Reviewed By: yinghai

Differential Revision: D9800002

fbshipit-source-id: 7fd05f4717f2256d1b09a9e75475b12de6685b03
2018-09-14 09:40:27 -07:00
0d9b9100f9 Fix gesv and gels docs (#11699)
Summary: Closes #9935 and closes #5431.

Differential Revision: D9830448

Pulled By: soumith

fbshipit-source-id: 4e5320a1d0c1d4c8253a5b26f4842cea76530514
2018-09-14 09:24:45 -07:00
72822ee6b2 Fix #11430 (CPU only builds raise opaque error message when calling .… (#11533)
Summary:
…cuda())

While I was at it, I audited all other ways I know how we might get a CUDA
type from PyTorch and fixed more constructors which don't work.
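
A hedged sketch of the intended behavior on a CPU-only build (the exact message may differ):
```python
import torch

try:
    torch.randn(2).cuda()
except (AssertionError, RuntimeError) as e:
    print(e)   # e.g. "Torch not compiled with CUDA enabled"
```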

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11533

Differential Revision: D9775786

Pulled By: ezyang

fbshipit-source-id: cd07cdd375fdf74945539ec475a48bf08cbc0c17
2018-09-14 09:10:08 -07:00
2631da0822 Move some Tensor method definitions from Type.h to TensorMethods.h. (#11650)
Summary:
There's no reason they need to be in Type.h and this moves us along the path of not having circular dependencies (so we can get rid of TensorMethods.h).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11650

Reviewed By: ezyang

Differential Revision: D9812271

Pulled By: gchanan

fbshipit-source-id: 8b70db9a5eb0a332398ab2e8998eeaf7d2eea6d7
2018-09-14 08:56:02 -07:00
6c3792b9ec Implement UndefinedType::typeMeta.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11666

Differential Revision: D9816212

Pulled By: gchanan

fbshipit-source-id: 079899590150009bc2e2a3bbdc78a98de9380e37
2018-09-14 08:40:26 -07:00
cda71e2600 Disallow scalar parameters in Dirichlet and Categorical (#11589)
Summary:
This adds a small check in `Dirichlet` and `Categorical` `__init__` methods to ensure that scalar parameters are not admissible.

**Motivation**
Currently, `Dirichlet` throws no error when provided with a scalar parameter, but if we `expand` a scalar instance, it inherits the empty event shape from the original instance and gives unexpected results.

The alternative to this check is to promote `event_shape` to be `torch.Size((1,))` if the original instance was a scalar, but that seems to add a bit more complexity (and changes the behavior of `expand` in that it would affect the `event_shape` as well as the `batch_shape` now). Does this seem reasonable? cc. alicanb, fritzo.

```python
In [4]: d = dist.Dirichlet(torch.tensor(1.))

In [5]: d.sample()
Out[5]: tensor(1.0000)

In [6]: d.log_prob(d.sample())
Out[6]: tensor(0.)

In [7]: e = d.expand([3])

In [8]: e.sample()
Out[8]: tensor([0.3953, 0.1797, 0.4250])  # interpreted as events

In [9]: e.log_prob(e.sample())
Out[9]: tensor(0.6931)  # wrongly summed out

In [10]: e.batch_shape
Out[10]: torch.Size([3])

In [11]: e.event_shape
Out[11]: torch.Size([])  # cannot be empty
```

Additionally, based on review comments, this removes the `real_vector` constraint. It was only being used in `MultivariateNormal`, but I am happy to revert this if we want to keep it around for backwards compatibility.
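
A short sketch of the behavior after this check (assuming the check raises `ValueError`):
```python
import torch
import torch.distributions as dist

try:
    dist.Dirichlet(torch.tensor(1.0))   # scalar parameter is rejected up front
except ValueError as e:
    print(e)

d = dist.Dirichlet(torch.ones(3))       # vector parameter works as before
print(d.event_shape)                    # torch.Size([3])
```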
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11589

Differential Revision: D9818271

Pulled By: soumith

fbshipit-source-id: f9bbba90ed6f04e0b5bdfa169e70ca20b280fc74
2018-09-14 07:55:35 -07:00
c391c20063 Adding .expand method for TransformedDistribution (#11607)
Summary:
This PR:
 - adds a `.expand` method for `TransformedDistribution` along the lines of #11341.
 - uses this method to simplify `.expand` in distribution classes that subclass off of `TransformedDistribution`.
 - restores testing of `TransformedDistribution` fixtures.
 - fixes some bugs wherein we were not setting certain attributes in the expanded instances, and adds tests for `.mean` and `.variance` which use these attributes.

There are many cases where users directly use `TransformedDistribution` rather than subclassing off it. In such cases, it seems rather inconvenient to have to write a separate class just to define a `.expand` method. The default implementation should suffice in these cases.
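
For illustration, a minimal sketch of using the default `.expand` on a directly constructed `TransformedDistribution` (a log-normal built by hand):
```python
import torch
import torch.distributions as dist
from torch.distributions.transforms import ExpTransform

base = dist.Normal(torch.tensor(0.0), torch.tensor(1.0))
log_normal = dist.TransformedDistribution(base, [ExpTransform()])

expanded = log_normal.expand(torch.Size([3]))   # no subclass or custom .expand needed
print(expanded.batch_shape)                      # torch.Size([3])
print(expanded.sample().shape)                   # torch.Size([3])
```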

cc. fritzo, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11607

Differential Revision: D9818225

Pulled By: soumith

fbshipit-source-id: 2c4b3812b9a03e6985278cfce0f9a127ce536f23
2018-09-14 07:55:33 -07:00
74197c7115 Restore support for dim=None on WeightNorm. (#11661)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11661

Reviewed By: veenix

Differential Revision: D9826799

Pulled By: ezyang

fbshipit-source-id: 9eec57bb27a365406669e412f6eb88741b22ed3d
2018-09-14 07:39:43 -07:00
19065f91fc Centralize TypeExtendedInterface casts. (#11576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11576

Previously, they were spattered throughout the codebase.
We now follow this convention:

- LegacyTypeDispatch gives you Type
- Context gives you TypeExtendedInterface
- Tensor::type() gives you Type
- at::getType() gives you TypeExtendedInterface

I change some sites to use getType() over type().

Reviewed By: SsnL

Differential Revision: D9790187

fbshipit-source-id: 5e2577cb590a5bbf5df530f3763d3b3c0b4625ca
2018-09-14 07:39:41 -07:00
c5f7da3f4a Support FP16 sparse lookup (#11674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11658

Reviewed By: hyuen

Differential Revision: D9676950

fbshipit-source-id: 89a115b9664b84e4e4436b7da033e5a428c2246d
2018-09-14 02:40:08 -07:00
1637729620 Fix ci by skipping some tests (#11668)
Summary:
scalar_tensor_test skipped
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11668

Differential Revision: D9825819

Pulled By: zrphercule

fbshipit-source-id: 6e62a001bcde49be8f7af1501b303bd93d09d005
2018-09-13 20:25:14 -07:00
e6fe8d9cf5 Try to delete codeowners for ATen/core (#10693)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10693

Reviewed By: soumith

Differential Revision: D9772210

Pulled By: ezyang

fbshipit-source-id: 14560eaf77441980e9784536acd0ffe20b15c5b8
2018-09-13 20:25:11 -07:00
2431eac7c0 Ensure most Distribution methods are jittable (#11560)
Summary:
This adds tests in tests/test_distributions.py to ensure that all methods of `Distribution` objects are jittable.

I've replaced a few samplers with jittable versions:
- `.uniform_()` -> `torch.rand()`
- `.exponential_()` -> `-(-torch.rand()).log1p()`
- `.normal_()` -> `torch.normal(torch.zeros(...), torch.ones(...), ...)`
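
For illustration, the `.exponential_()` rewrite is just the inverse-CDF sampler, sketched below (function name is mine):
```python
from typing import List

import torch

@torch.jit.script
def exponential_like(shape: List[int]) -> torch.Tensor:
    # if U ~ Uniform(0, 1), then -log(1 - U) ~ Exponential(1),
    # and -(-u).log1p() computes -log(1 - u) stably
    u = torch.rand(shape)
    return -(-u).log1p()

print(exponential_like([4]))
```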

Some jit failures remain, and are marked in test_distributions.py
- `Cauchy` and `HalfCauchy` do not support sampling due to missing `.cauchy_()`
- `Binomial` does not support `.enumerate_support()` due to `arange` ignoring its first arg.
- `MultivariateNormal`, `LowRankMultivariateNormal` do not support `.mean`, `.entropy`

- [x] Currently some tests fail (I've skipped those) due to unavailability of `aten::uniform` and `aten::cauchy` in the jit. Can someone suggest how to add these? I tried to add declarations to `torch/csrc/ir.cpp` and `torch/csrc/passes/shape_analysis.cpp`, but that resulted in "Couldn't find operator" errors.
- [x] There are still lots of `TracerWarning`s that something doesn't match something. I'm not sure whether these are real.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11560

Differential Revision: D9816327

Pulled By: apaszke

fbshipit-source-id: 72ec998ea13fc4c76d1ed003d9502e0fbaf728b8
2018-09-13 19:55:01 -07:00
99c0b96f68 optimize norm on ATen CPU backend (#11565)
Summary:
Currently torch.norm() runs sequentially on CPU. This PR parallelizes and vectorizes torch.norm() on the ATen CPU path, providing roughly a two-order-of-magnitude performance boost.

Performance was benchmarked on a Xeon Skylake 8180 (2×28 cores, 2.5 GHz) using the following script:
```python
import torch
from time import time

count = 1000
size = 1000*1000

def test_norm(p=2):
    a = torch.randn(size)
    tstart = time()
    for i in range(count):
        torch.norm(a, p)
    tend = time()
    print("norm on size %d tensor p = %d: %f s" % (size, p, (tend-tstart)))

for p in range(4):
    test_norm(p)
```

without this optimization,
```
(intel-pytorch) [mingfeim@mlt-skx065 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 1.071235 s
norm on size 1000000 tensor p = 1: 1.069149 s
norm on size 1000000 tensor p = 2: 1.068212 s
norm on size 1000000 tensor p = 3: 69.735312 s
```

and with this optimization,
```
(pytorch-tf) [mingfeim@mlt-skx053 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 0.127507 s
norm on size 1000000 tensor p = 1: 0.011867 s
norm on size 1000000 tensor p = 2: 0.011907 s
norm on size 1000000 tensor p = 3: 0.014470 s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11565

Differential Revision: D9804484

Pulled By: ezyang

fbshipit-source-id: 52899f30ac26139d00684d07edfb47cb9b25d871
2018-09-13 19:40:43 -07:00
98e04db955 Implement requires_grad propagation in the JIT (#11586)
Summary:
Previously, we would pretty much assume that all floating point tensors do require grad, which might result in some unnecessary compute.

I don't really like the fact that `TensorType` uses `tensor.is_variable() && tensor.requires_grad()` to infer the value of `requires_grad`, but changing constants to keep variables turns out to be pretty hard. I got halfway there, but it would still need some more work.
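
For background, a sketch of the propagation rule itself (eager-mode autograd behavior, not this PR's JIT code): `requires_grad` flows from inputs to outputs, so an output needs backward bookkeeping only if some input requires grad.
```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.randn(3)            # requires_grad=False by default
print((x * y).requires_grad)  # True: depends on x, which requires grad
print((y * 2).requires_grad)  # False: no grad-requiring inputs, no graph built
```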
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11586

Reviewed By: ezyang

Differential Revision: D9813648

Pulled By: apaszke

fbshipit-source-id: 77f77756d18ff7632fca3aa68ce855e1d7f3bdb8
2018-09-13 19:25:26 -07:00
513fd3dd36 Improve doc of torch.nn.functional.pad (#11623)
Summary:
I'm reading the doc of `torch.nn.functional.pad` and it looks a bit confusing to me. Hopefully this PR makes it clearer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11623

Differential Revision: D9818255

Pulled By: soumith

fbshipit-source-id: 4f6b17b0211c6927007f44bfdf42df5f84d47536
2018-09-13 19:25:24 -07:00
760679352e Move Pixel Shuffle to ATen (#9721)
Summary:
<del>#9692 </del>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9721

Differential Revision: D8955829

Pulled By: SsnL

fbshipit-source-id: 4f4d1c7720b6f757fbef9a10f70209ae76f61399
2018-09-13 18:25:48 -07:00
e1cd220b90 Reimplement swap() using default move constructor. (#11659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11659

This is less error-prone and less code.

Reviewed By: smessmer

Differential Revision: D9814536

fbshipit-source-id: 028510e31e2fa7a9fa11c1398b0743c5cd085dd5
2018-09-13 16:32:55 -07:00
02980d7f8c Refactor Tensor/TensorImpl constructors. (#11657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11657

Previously, we had a constructor in TensorImpl for every constructor in Tensor.
This was unnecessary and wordy: Tensor is the user-visible class, so it deserves
the constructors, but TensorImpl is internal and doesn't need them.  So
I replaced TensorImpl's constructors with a single Storage-accepting constructor, and then
rewrote Tensor to use that constructor.

Reviewed By: jerryzh168

Differential Revision: D9813742

fbshipit-source-id: 7501b54fe5f39180f1bc07573fd7c1640b0f4e89
2018-09-13 16:32:53 -07:00
7607b49538 s/GetDevicetype/device_type/ (#11656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11656

The mis-capitalization really sticks in my craw.  I know why (we
already have a static function named GetDeviceType), but let's
name it differently.

```
codemod -d . --extensions cc,cpp,cu,cuh,h,py,hpp,TARGETS GetDevicetype device_type
```

Reviewed By: jerryzh168

Differential Revision: D9813544

fbshipit-source-id: fe462f4bc40b03e74921f8cf5ebd9cfc52e7e636
2018-09-13 16:32:51 -07:00
c18510463b Reduce includes in tensor_impl.h (#11643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11643

- Reduce the tensor_impl.h includes to the bare
  minimum necessary
- Explicitly namespace std::

Reviewed By: jerryzh168

Differential Revision: D9811028

fbshipit-source-id: 44e32720962b35c12a7b2c93605721b9f6c5b254
2018-09-13 16:32:49 -07:00
8402fde279 Revert D9778043: Pass Storage by value
Differential Revision:
D9778043

Original commit changeset: b1381cd60a82

fbshipit-source-id: 40f1de67e939cb41605978d632105a48a91e7629
2018-09-13 16:32:48 -07:00
85ff72348d Only involve tensor device in CUDA -> CPU copy, not current device. (#11592)
Summary:
This also unifies the device usage between the async and sync case.

Fixes https://github.com/pytorch/pytorch/issues/10832.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11592

Differential Revision: D9797355

Pulled By: gchanan

fbshipit-source-id: e496cd371111cfaf9a6c664167967b395e3d72e9
2018-09-13 16:32:46 -07:00
4672280b55 Pass Storage by value (#11546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11546

-

Reviewed By: ezyang

Differential Revision: D9778043

fbshipit-source-id: b1381cd60a826055ce8771d6c67eac4cc375b3b4
2018-09-13 15:26:05 -07:00
05e06f7de2 migrating deprecated calls without abc module for containers (#11515)
Summary:
Implementing #10540.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11515

Reviewed By: apaszke

Differential Revision: D9771045

Pulled By: jeffreyksmithjr

fbshipit-source-id: 85ea39abaa9b465805a969f122b626b11fc85ef6
2018-09-13 15:09:22 -07:00
29e29ca6ee Use MPI_Isend/MPI_Irecv to back send/recv (#11630)
Summary:
The isCompleted function is changed to be non-const to accommodate
setting some internal status on the work object in the case of
completion. Previously, it only checked a member field, but for the
MPI backend it calls MPI_Test to poll for completion of an asynchronous
request.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11630

Reviewed By: SsnL

Differential Revision: D9808008

Pulled By: pietern

fbshipit-source-id: 18b70825b1fb4d561a552fa75e9475a522852cd4
2018-09-13 15:01:24 -07:00
f129da1a47 Add max to the ValueError for EmbeddingBag mode check (#11655)
Summary:
Related to #11624
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11655

Differential Revision: D9815454

Pulled By: SsnL

fbshipit-source-id: 8dd82e0c0aa68362e12b301e095a85af7d7fd71a
2018-09-13 14:39:40 -07:00
90537289a0 Constexpr std::move / std::forward for C++11 (#11396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11396

std::move and std::forward in C++11 aren't constexpr (they are in C++14).
This caused a build issue orionr was working on.
It should be fixed by this diff

Reviewed By: orionr

Differential Revision: D9724805

fbshipit-source-id: 0d9047dce611385d659cc71a6c04cc7a6a40a5ae
2018-09-13 12:56:17 -07:00
0f1ca569ce End-to-end dynamic slicing with ONNX DynamicSlice experimental operator (#11255)
Summary:
Requires https://github.com/onnx/onnx/pull/1377

This PR makes it so that slices with dynamic boundary values can be exported from pytorch and run in caffe2 via ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11255

Differential Revision: D9790216

Pulled By: jamesr66a

fbshipit-source-id: 6adfcddc5788df4d34d7ca98341077140402a3e2
2018-09-13 12:39:52 -07:00
acb6f18bab fix generate_code.py caching (#11644)
Summary:
Because of some setup.py logic, `ninja` caching of the `generate_code.py` build step was broken. This resulted in `generate_code.py` running on every single build, regardless of whether its inputs had changed.

This updated logic fixes the input caching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11644

Reviewed By: orionr

Differential Revision: D9814348

Pulled By: soumith

fbshipit-source-id: 2012960908d0f600488d410094095cfd72adc34f
2018-09-13 12:39:48 -07:00
75f49befeb move instance_norm to aten (#10792)
Summary:
This also removes the usage of torch.onnx.symbolic_override in instance_norm. Fixes #8439.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10792

Differential Revision: D9800643

Pulled By: li-roy

fbshipit-source-id: fa13a57de5a31fbfa2d4d02639d214c867b9e1f1
2018-09-13 12:26:22 -07:00
912d3626c8 Split tensor.h into tensor_impl.h and tensor.h (#11642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11642

This is just a preparatory change to help with future
refactoring:

- I want to reduce the number of includes that tensor_impl.h
  depends on, but
- I need to keep tensor.h providing all Caffe2 headers, because
  users may be relying on tensor.h transitively providing those
  headers.

Introducing a level of indirection lets me do both at the same time.

Reviewed By: jerryzh168

Differential Revision: D9810823

fbshipit-source-id: 8dfaac4b8768051a22898be8fcaf787ecc57eb13
2018-09-13 12:26:20 -07:00
45e9ee096e Fix test_mnist_training_leaks_no_memory_cuda warning (#11639)
Summary:
Before this PR it would warn that "dropout is non deterministic and can
cause problems when checking trace", so I disabled the trace checking.

cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11639

Differential Revision: D9812493

Pulled By: zou3519

fbshipit-source-id: fab86928a5fba8b218b47543533aaf7c82a10b4a
2018-09-13 12:09:20 -07:00
9abc666745 stop allowing extra positional args in arg parser (#10499)
Summary:
Arg parser allowed additional positional args to be parsed into keyword-only params.

Fixes a couple cases:
- The positional argument happens to be of the right type, and it just works silently. Now, we fail as expected.
- The positional argument fails later down the line. Now, we fail at the appropriate time and get a better error message.

Pre-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
tensor([6, 0], device='cuda:1')
```
Post-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new() received an invalid combination of arguments - got (tuple, int, int, int), but expected one of:
 * (torch.device device)
 * (torch.Storage storage)
 * (Tensor other)
 * (tuple of ints size, torch.device device)
 * (object data, torch.device device)
```

Pre-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros(): argument 'dtype' (position 2) must be torch.dtype, not int
```

Post-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros() takes 1 positional argument but 2 were given
```

fixes #8351
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10499

Differential Revision: D9811093

Pulled By: li-roy

fbshipit-source-id: ce946270fd11b264ff1b09765db3300879491f76
2018-09-13 11:56:12 -07:00
6f53b4efea Remove implicit bool casts (#11503)
Summary:
In order to comply with Python's rules on implicit casting of
non-booleans to booleans, this PR removes implicit casting in favor of
explicit casts via `bool()`
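
A minimal sketch of the resulting script style (hypothetical function name; assumes the `bool()` cast this PR favors):
```python
import torch

@torch.jit.script
def flip_if_negative_sum(x):
    # A tensor condition is no longer implicitly truthy in script;
    # it must be cast explicitly with bool().
    if bool(x.sum() < 0):
        return -x
    return x

print(flip_if_negative_sum(torch.ones(3)))  # tensor([1., 1., 1.])
```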

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11503

Differential Revision: D9780869

Pulled By: driazati

fbshipit-source-id: c753acaca27f4e79dddf424c6b04674f44a6aad9
2018-09-13 11:26:45 -07:00
ab3a2d25fb Improve error messages when trying to use nested lists.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11606

Differential Revision: D9806949

Pulled By: zdevito

fbshipit-source-id: c38abc4ce745a63d26a64f6aa1b41350e4b1acd5
2018-09-13 11:10:38 -07:00
5bc90b8554 support conversion and dispatch of complex numbers (#11603)
Summary:
- Just a simple fix to support `fill_`
- And a fix for indexing in `pytorch-complex`

Differential Revision: D9804061

Pulled By: ezyang

fbshipit-source-id: 631129b3fa220a9670770b3766f14a8e03633bdf
2018-09-13 11:10:37 -07:00
a861573e36 fix tensor export bug in IR export (#11613)
Differential Revision: D9811094

Pulled By: li-roy

fbshipit-source-id: 012792dbedc70bd3fa242fdf2e39da0b21ce158d
2018-09-13 11:10:35 -07:00
d278344e36 Automatic update of fbcode/onnx to 39dd0d4fec5913aa517b71bcfcbf638a427894eb (#11622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11622

Previous import was bff0b8835870c7df7762ef43498d000d2d8ffb52

Included changes:
- **[39dd0d4](https://github.com/onnx/onnx/commit/39dd0d4)**: [build] Add ONNX_API for protos in all cases (#1407) <Orion Reblitz-Richardson>
- **[944db4f](https://github.com/onnx/onnx/commit/944db4f)**: cmake (#1401) <zrphercule>
- **[8ccc8dd](https://github.com/onnx/onnx/commit/8ccc8dd)**: Remove ONNXIFI_CHECK_RESULT from onnxRelease* functions (#1397) <Marat Dukhan>
- **[df14e74](https://github.com/onnx/onnx/commit/df14e74)**: Change onnxifi test driver classname (#1396) <zrphercule>
- **[0c885cc](https://github.com/onnx/onnx/commit/0c885cc)**: ONNXIFI cpp test driver (#1290) <zrphercule>
- **[a557848](https://github.com/onnx/onnx/commit/a557848)**: Coverage Report Tools for Backend Scoreboard (#1301) <Akshay Chalana>
- **[31fd87f](https://github.com/onnx/onnx/commit/31fd87f)**: fix AvgPool doc. add default value for count_include_pad (#1391) <Wenhao Hu>
- **[8ff08c2](https://github.com/onnx/onnx/commit/8ff08c2)**: Do not export onnx symbols in the python extension (#1388) <bddppq>

Reviewed By: orionr

Differential Revision: D9806635

fbshipit-source-id: f61c052b6bd14e0c80ace19c1a5f0ba659030c6f
2018-09-13 10:40:48 -07:00
1f49b879d1 Add missing include for __half (#11638)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11638

Differential Revision: D9811063

Pulled By: ezyang

fbshipit-source-id: dd103bb152485bcdbb0108b4d3de2443c30d5572
2018-09-13 10:33:09 -07:00
d4d72b87e3 Sphinx is case sensitive
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11646

Differential Revision: D9811355

Pulled By: SsnL

fbshipit-source-id: d484561baa2ac5b3113870b4ee06fa3560b686e4
2018-09-13 10:33:06 -07:00
57f149a861 Only join pin_memory_thread after it started (#11599)
Summary:
Same reason as in #11432 .

Example error:
```
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa06963cf28>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 405, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 401, in _shutdown_workers
    self.pin_memory_thread.join()
AttributeError: '_DataLoaderIter' object has no attribute 'pin_memory_thread'
```
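
A standalone sketch of the guard (hypothetical class name; the real fix lives in `torch/utils/data/dataloader.py`):
```python
import threading

class IterSketch:
    def start_pin_memory(self):
        self.pin_memory_thread = threading.Thread(target=lambda: None)
        self.pin_memory_thread.start()

    def shutdown(self):
        # The attribute only exists once the thread has started, so join it
        # conditionally; teardown may run before the thread was ever created,
        # which is what produced the AttributeError above.
        if hasattr(self, 'pin_memory_thread'):
            self.pin_memory_thread.join()

it = IterSketch()
it.shutdown()  # safe even though the thread never started
```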
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11599

Differential Revision: D9801143

Pulled By: SsnL

fbshipit-source-id: 520590a21f56fa381fcac621457a7544d3fba47e
2018-09-13 09:40:49 -07:00
36fc1a0a58 Merge caffe2::/at::Storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11637

Reviewed By: gchanan

Differential Revision: D9806425

Pulled By: ezyang

fbshipit-source-id: e20ec93bff6dc7fb22ca9b7e7348d060b3876b67
2018-09-13 09:40:48 -07:00
77f6998e54 Guard against inputting or returning sparse tensors (#11550)
Summary:
Add guards against using sparse tensors by checking the conversions from IValue -> PyObject & PyObject -> IValue.

This diff also changes the behavior in constant propagation to not run python ops even if all of their inputs are constant, because of possible mutation of global state. This came up in trying to run get_sparse(), and I'm including it here to make it easier to land.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11550

Differential Revision: D9804712

Pulled By: eellison

fbshipit-source-id: 9fe7daf721c6d6e48df4925c0f9c775873bcdc77
2018-09-13 08:58:29 -07:00
cac11a4ac3 Merge caffe2::/at::StorageImpl (#11543)
Summary:
Merges caffe2::StorageImpl methods with at::StorageImpl methods and defines caffe2::StorageImpl as at::StorageImpl.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11543

Differential Revision: D9795228

Pulled By: cpuhrsch

fbshipit-source-id: fbd6fa3cbf6c9099a4803337286c30e00652f95c
2018-09-13 01:25:50 -07:00
44b2b6b150 clean up jit generated tests (#11403)
Summary:
Clean up some generated tests now that we have nice new features like var args.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11403

Differential Revision: D9800545

Pulled By: wanchaol

fbshipit-source-id: e9973b113f78dc38cf99a81b6ede3fa3485f1cfa
2018-09-12 22:55:03 -07:00
e998038bc0 Use TypeMeta instead of TypeIdentifier within at::StorageImpl (#11236)
Summary:
Further aligns at::StorageImpl with caffe2::StorageImpl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11236

Differential Revision: D9776286

Pulled By: cpuhrsch

fbshipit-source-id: f2c53995fcece013b77b3a1f709ab0f9df8ab23e
2018-09-12 22:26:00 -07:00
6f05b5ee54 Pin Sphinx to 1.7.9 (#11620)
Summary:
Sphinx 1.8.0 breaks us.  Upgrading is tracked in #11618.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11620

Differential Revision: D9806440

Pulled By: ezyang

fbshipit-source-id: 7a8d849c78e697a8775d00cd3a463a7bdbcddabe
2018-09-12 21:55:21 -07:00
17637f2b03 enable_mkl support for resnet18+lstm model
Summary:
* Many ops in the lstm part of the model don't have implementations in ideep/mkl, and it doesn't make sense to copy back and forth for the few available ops because the majority of the RNN will be on CPU
* Thus the strategy is to enable mkl only for the resnet18 part of the model, then switch to the default cpu engine for the lstm part

* The net may contain some external_inputs falsely added during ONNX->Caffe2 conversion. A canary in the service shows their existence could lead to service crashes (presumably because these blobs somehow get shared between threads). They're now manually removed, which seems to be enough to avoid the crash.

Reviewed By: viswanathgs

Differential Revision: D8888763

fbshipit-source-id: da7761bcb7d876ff7bbb6640ae4b24712c0b1de6
2018-09-12 18:56:46 -07:00
0a6931cfee Only reference ONNX through onnx_pb.h (#11609)
Summary:
I think this is needed to land https://github.com/onnx/onnx/pull/1407 without CI errors.

cc mingzhe09088 houseroad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11609

Reviewed By: houseroad

Differential Revision: D9803490

Pulled By: orionr

fbshipit-source-id: 26193f38ab0a2eef9ad7d0da9a0310dc40ef0f2d
2018-09-12 18:25:58 -07:00
5da0b31bee More native docs on TensorOptions. (#11558)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11558

Differential Revision: D9783655

Pulled By: ezyang

fbshipit-source-id: 17c749c9ef99fd9dfd0ff365ebfe22102fb891d7
2018-09-12 17:39:39 -07:00
f00f99ebcc use at::Half in THC (#11322)
Summary:
- use Half instead of half in THC
- clean up TH_float2half, TH_half2float, etc. conversions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11322

Differential Revision: D9799553

Pulled By: li-roy

fbshipit-source-id: 9aa3e003bff73d9df6224a393f3ec0624b1f44ed
2018-09-12 17:39:37 -07:00
daa379ffd7 Disable flaky test ObserverTest.TestMultipleNetBase (#11596)
Summary:
Tracked in https://github.com/pytorch/pytorch/issues/9137

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11596

Differential Revision: D9803256

Pulled By: ezyang

fbshipit-source-id: 973393203ed8343a3a0feef36d34e561d9f653c4
2018-09-12 17:39:36 -07:00
e2cd627cce Temporarily disable docs build. (#11608)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11608

Differential Revision: D9803369

Pulled By: ezyang

fbshipit-source-id: a206d6137e8e729f702189c926ec898444d1dc53
2018-09-12 17:39:34 -07:00
7f7cda99cd Optimize order_switch_ops on GPU (#11404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11404

Optimize order_switch_ops on GPU

Reviewed By: houseroad

Differential Revision: D9728642

fbshipit-source-id: 74ff62268856fb1613fa61eb214bed6ec6716632
2018-09-12 16:56:15 -07:00
776a9992e1 topk test fix, hgemm integration (#11593)
Summary:
After discussions in #11584, this is a new PR for just the test skip and hgemm integration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11593

Differential Revision: D9798527

Pulled By: ezyang

fbshipit-source-id: e2ef5609676571caef2f8e6844909fe3a11d8b3e
2018-09-12 16:56:13 -07:00
def44c96fd Revert D9779866: [pytorch][PR] Move function deletion from the stack to the heap.
Differential Revision:
D9779866

Original commit changeset: 96753eead790

fbshipit-source-id: 959deeb63318d48f4c563e10e70ef6ec7fabd3b4
2018-09-12 16:56:11 -07:00
5b2efcf425 Document the Conv module (#11566)
Summary:
Document the C++ API conv module. No code changes.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11566

Differential Revision: D9793665

Pulled By: goldsborough

fbshipit-source-id: 5f7f0605f952fadc62ffbcb8eca4183d4142c451
2018-09-12 16:56:09 -07:00
130d55a5f4 Allow building the C++ API without cereal (#11498)
Summary:
I am working on unifying the C++ extensions and C++ API, and one constraint for this is that we will want to be able to build the C++ API without cereal, since we won't want to ship it with the Python `torch` package.

For this I introduce a `TORCH_WITH_CEREAL` option to CMake. If on, the C++ API will be built with cereal and thus serialization support. If off, serialization functions will throw exceptions, but the library will otherwise still compile the same. __This option is on by default, so for regular C++ API users nothing will change__. However, from C++ extensions, we'll be able to turn it off. This effectively means we won't be searching for any cereal headers from C++ API headers, which wouldn't be installed in the Python package.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11498

Differential Revision: D9784803

Pulled By: goldsborough

fbshipit-source-id: 5d0a1f2501993012d28cf3d730f45932b483abc4
2018-09-12 16:56:07 -07:00
12efef166a Split out copy_op from utility_ops (#11470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11470

In order to reduce build sizes, we are identifying files that can be split up into smaller units, allowing us to only include the ops we need.

Reviewed By: orionr, ajtulloch

Differential Revision: D9725819

fbshipit-source-id: def1074a33dffe99bd6a7e6e48aa9e5be3d04a6a
2018-09-12 16:25:48 -07:00
316c167940 Add checking of nullptrs in GetTensorInfo (#11587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11587

To help debug the issue in T33295362, we add some checks in the function.

Possible crashing site in `GetTensorInfo`
1. tc is nullptr, which is checked.
2. tc->capacity_nbytes() hits a nullptr; this is unlikely because storage is not a pointer and the computation of capacity_nbytes doesn't involve pointers. It's numel * itemsize().
3. tc->ExtractDeviceOption hits a nullptr. One possibility is that raw_data() is nullptr, because tc->ExtractDeviceOption uses it. This is checked.
4. The Tensor itself, which is not a reference. This is also checked.

Reviewed By: salexspb

Differential Revision: D9793484

fbshipit-source-id: 3fc72746fc310a23ae45553bbe0d269a4b9edb72
2018-09-12 16:25:46 -07:00
eb7a298489 Add resnext model to OSS (#11468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11468

Add the resnext model to the OSS Caffe2 repo.

Reviewed By: orionr, kuttas

Differential Revision: D9506000

fbshipit-source-id: 236005d5d7dbeb8c2864014b1eea03810618d8e8
2018-09-12 15:59:20 -07:00
c81406c514 Document Any (#11580)
Summary:
Documents the `AnyModule` class in the C++ API.

Also changed the API to be friendlier by default. Calling `AnyModule::forward` used to return an `AnyModule::Value` which you had to call `.get<T>()` on to cast to a concrete type. I changed the name of that `forward` method to `any_forward` and instead made `forward` templated on a `ReturnType` template parameter which you can supply to do the `.get<T>` cast for you automatically. I default this parameter to `torch::Tensor` so that it can often be omitted. So where you used to have to write

```cpp
any_module.forward(...).get<int>();
any_module.forward(...).get<torch::Tensor>();
```

you now write

```cpp
any_module.forward<int>(...);
any_module.forward(...);
```

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11580

Differential Revision: D9798626

Pulled By: goldsborough

fbshipit-source-id: 060b4ea28facaffc417f53b80b846a9dff9acb73
2018-09-12 15:59:19 -07:00
ac94889939 Add jit doc entry to sidebar (#11598)
Summary:
cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11598

Differential Revision: D9801230

Pulled By: SsnL

fbshipit-source-id: f0c8d2468b64a50c3c437667d462722dcd2682d1
2018-09-12 15:29:23 -07:00
b663b7ce7e Update ROCm Docker image with latest AMD debians (#11507)
Summary:
Building at https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/194/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11507

Differential Revision: D9772474

Pulled By: ezyang

fbshipit-source-id: ab00f05744547dc7ec9f97511e2c8495ac282fac
2018-09-12 15:29:21 -07:00
02c4cd3c8a Skip flaky distributed tests (#11594)
Summary:
context: https://github.com/pytorch/pytorch/issues/11582

cc pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11594

Differential Revision: D9798871

Pulled By: SsnL

fbshipit-source-id: 9f9e1871c7fd9505ca898865eb8068fab4d3416d
2018-09-12 14:57:57 -07:00
d4e05f4e1e Move function deletion from the stack to the heap. (#11534)
Summary:
This eliminates the need for any heuristics regarding stack size limits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11534

Differential Revision: D9779866

Pulled By: resistor

fbshipit-source-id: 96753eead7904bbdc2869fb01f7bd42141032347
2018-09-12 14:39:59 -07:00
958ba4e913 Aibench for asr decoder
Summary: as title

Reviewed By: sf-wind

Differential Revision: D9738021

fbshipit-source-id: 98f570484bca6486ad99207732efd534ec7e3251
2018-09-12 14:25:19 -07:00
f0a440007e Explicitly set locale on docs build. (#11595)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11595

Differential Revision: D9798567

Pulled By: ezyang

fbshipit-source-id: ac05458347e181960a07cacae1dfc68d2837451f
2018-09-12 14:11:24 -07:00
504126e705 Documentation for debugging JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11540

Differential Revision: D9798647

Pulled By: jamesr66a

fbshipit-source-id: 968a4af22c735a848fa27cbadaed9b7023ba8276
2018-09-12 14:11:22 -07:00
a3036b3bb3 Fused weightnorm for ATen (#10842)
Summary:
This PR contains a C++ implementation of weight norm.  The user-side exposure of weight norm through torch.nn.utils.weight_norm is unchanged.

If running on the GPU, and the norm is requested over the first or last dimension of the weight tensor, the forward pass is carried out using the fused kernels I wrote for our Fairseq GTC hero run, which offer superior performance to primitive ops and superior numerical stability when running in FP16.  In the common case that the backward pass is not itself constructing a graph (ie not attempting to set up double backward) the backward pass will be carried out using another fused kernel.  If the backward pass is constructing a graph, an alternate code path is taken, which does the math using differentiable primitive ops. In this way, the implementation allows double backward, even if the fused kernel was used in forward (although in this case, you don't benefit from the performance and stability of the fused backward kernel).

If running on the CPU, or if norming over an interior dim, the forward pass is carried out using double-differentiable primitive ops.

Figuring out how to generate all the right plumbing for this was tricky, but it was a fun experience learning how the autogenerator works and how the graph is constructed.  Thanks to colesbury for useful guidance on this front.
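
For reference, the unchanged user-side API (a usage sketch; the fused kernels are internal):
```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# weight_norm reparametrizes weight as g * v / ||v||, norming over `dim`.
m = weight_norm(nn.Linear(20, 40), name='weight', dim=0)
x = torch.randn(8, 20)
y = m(x)        # per this PR, GPU + dim 0/-1 hits the fused forward kernel
print(y.shape)  # torch.Size([8, 40])
```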

I do have a few lingering questions:

- Should I unify my return statements (ie by default-constructing Tensors outside if blocks and using operator= within)?
- What is the significance of `non_blocking` when calling e.g. `auto norms = saved_norms.to(saved_g.type().scalarType(), non_blocking=True/False);`?  I am currently omitting `non_blocking`, so it defaults to False, but I didn't see any associated synchronizes on the timeline, so I'm wondering what it means.
- Is there an "official" mapping from at::ScalarTypes to corresponding accumulate types, as there are for the PODs + Half in [AccumulateType.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h)?  I looked for an equivalent mapping for ScalarTypes, didn't find one, and ended up rigging it myself (`  at::ScalarType AccType = g.type().scalarType() == at::ScalarType::Half ? at::ScalarType::Float : g.type().scalarType();`).
- Are sparse tensors a concern?  Should I include another check for sparse tensors in the `_weight_norm` entry point, and send those along the fallback CPU path as well?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10842

Differential Revision: D9735531

Pulled By: ezyang

fbshipit-source-id: 24431d46532cf5503876b3bd450d5ca775b3eaee
2018-09-12 13:55:27 -07:00
9a7c196040 Move Type, Tensor, TensorMethods to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11519

Reviewed By: yf225

Differential Revision: D9771684

Pulled By: gchanan

fbshipit-source-id: a57ee2072af99ce856f895c688b09d750a8606e0
2018-09-12 13:10:54 -07:00
739e6af869 Add remainder % to the jit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11557

Reviewed By: apaszke

Differential Revision: D9784642

Pulled By: wanchaol

fbshipit-source-id: b7c60c3e9534555c9d7db83769965b3f2f277cdf
2018-09-12 12:40:38 -07:00
ad7936e108 Fix reloading modules back into python (#11552)
Summary:
This changes the way module import works so that when a module
is reloaded in python it becomes a ScriptModule and not a _C.ScriptModule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11552

Differential Revision: D9782751

Pulled By: zdevito

fbshipit-source-id: 9576850b75494b228ce3def94c0d371a4a44b11d
2018-09-12 12:25:15 -07:00
17e76e26c8 Add trigonometry functions to docs/source/onnx.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11581

Differential Revision: D9794449

Pulled By: soumith

fbshipit-source-id: 1218fcf8969a10ffbfefd3ced7fee9fe7df296f1
2018-09-12 12:10:01 -07:00
13b05c8c78 Add EndToEndHybridModel CUDA tests (#11544)
Summary:
Also adds two additional tests that check for memory leaks while the relevant graph executors are alive:
- (minimal test): Create a ScriptModule, keep it alive, and test that it does not leak memory while it is alive
- (large test) Do MNIST training with a traced MNIST module and test that no memory is leaked while the traced module (with graph executor) is alive

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11544

Reviewed By: apaszke

Differential Revision: D9778479

Pulled By: zou3519

fbshipit-source-id: 2d6cdea81dd1264f2c0396b662f70fdafecb3647
2018-09-12 11:25:18 -07:00
23d55883c0 minor formatting error log (#11528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11528

as title

Reviewed By: chocjy

Differential Revision: D9773214

fbshipit-source-id: b7dd4c19ab83a18f344de8e71ce5b3bf74d1af72
2018-09-12 11:25:17 -07:00
6398d626f4 Warn that export+import module always load onto the CPU (#11485)
Summary:
Test Plan
`cd docs && make html`
![image](https://user-images.githubusercontent.com/5652049/45325074-ed04e480-b51d-11e8-9d2d-685dbe8a08e9.png)

cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11485

Differential Revision: D9772119

Pulled By: zou3519

fbshipit-source-id: 3dcb16c9edc2e8deebef17accf91a1c7d4dc9063
2018-09-12 10:55:39 -07:00
12f4c46eea caffe2::StorageImpl use at::DataPtr (#11282)
Summary:
See title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11282

Reviewed By: ezyang

Differential Revision: D9658503

Pulled By: cpuhrsch

fbshipit-source-id: 42fa73c979692cb1069c0345744a85d12150745c
2018-09-12 09:39:23 -07:00
e5dd77c7ad Sync all libnccl soversions, not just libnccl.so.1 (#11575)
Summary:
Fixes:

```
/bin/ld: warning: libnccl.so.1, needed by /data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so, not found (try using -rp
ath or -rpath-link)
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllReduce'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclBcast'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclCommInitAll'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclGetErrorString'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduceScatter'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllGather'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduce'
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11575

Differential Revision: D9789956

Pulled By: ezyang

fbshipit-source-id: 63e48763cc233be9d137cec721b239159b511a24
2018-09-12 09:24:51 -07:00
f0a284502a Document BatchNorm and update default behavior (#11484)
Summary:
This PR:

1. Documents `BatchNorm`,
2. Makes a number of API changes after reconsidering some quirks:
    1. The default value for the `stateful` parameter used to be `false`, but the most common usage of `BatchNorm` in the wild is certainly stateful, and the default in Python is also stateful (see the quick check below). So we change the default to stateful.
    2. The `pure_forward` function used to use the internal running mean and variance variables instead of the ones supplied to that function call when `stateful` was true, which certainly seems odd. When you call `pure_forward` you would certainly expect the values you pass explicitly to be used. This is now fixed.
3. Adds tests for `BatchNorm`, finally.
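
For comparison, the Python default mentioned in 2.i is indeed stateful (a quick check, not code from this PR):
```python
import torch.nn as nn

bn = nn.BatchNorm2d(16)
print(bn.track_running_stats)  # True: Python BatchNorm tracks running stats by default
```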

ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11484

Reviewed By: pjh5

Differential Revision: D9779618

Pulled By: goldsborough

fbshipit-source-id: 59ba760e085c01454b75644b24b22317b688e459
2018-09-12 09:09:53 -07:00
6fc18a7541 Typo fix in randomness.rst (#11571)
Summary:
"need to be" -> "need not be"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11571

Differential Revision: D9786001

Pulled By: soumith

fbshipit-source-id: 7cc408f5c8bfcc56d4b5c153646f30e1cec37539
2018-09-12 08:25:46 -07:00
efc0f6784a Move some bmm/baddbmm to ATen (#11292)
Summary:
- Incorporates the MKL addition by mingfeima. Thank you! (but all errors are my own)
- Native CPU implementation: defer to matrix multiplication for
  small batches and parallelize over batch dimension for large
  batches.
- Add bmm test for CUDA just to be sure.

This is a partial fix for #10661, getting down to a factor of ~5.
Considerable overhead is incurred by the setup in einsum. It might
be more efficient to eventually define optimized contraction
functions for arbitrary and several dimensions.
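
For reference, the batched-matmul shape contract at issue (a usage sketch):
```python
import torch

a = torch.randn(64, 3, 4)  # batch of 64 (3x4) matrices
b = torch.randn(64, 4, 5)  # batch of 64 (4x5) matrices
c = torch.bmm(a, b)        # batched matmul over the leading dim
print(c.shape)             # torch.Size([64, 3, 5])
```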
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11292

Differential Revision: D9784941

Pulled By: ezyang

fbshipit-source-id: f6dded2c6f5e8f0461fb38f31f9a824992a58358
2018-09-12 07:09:55 -07:00
76070fe73c Make c10d test work on CPU only build (#11567)
Summary:
Make the test work with a CPU-only build; this also fixes tests that had been failing for a long time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11567

Differential Revision: D9785740

Pulled By: teng-li

fbshipit-source-id: 61c43b758c1ee53117e30de8074583e6faea863a
2018-09-12 01:39:44 -07:00
6597779847 Clean up some C++ cruftiness in the script lexer.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11408

Differential Revision: D9772843

Pulled By: resistor

fbshipit-source-id: 07f16bf7eaf4f1d8700e46e91a485de4b2d9ed83
2018-09-11 23:55:31 -07:00
3e3d8caecd Allow setting deletion constant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11529

Differential Revision: D9775398

Pulled By: goldsborough

fbshipit-source-id: 8593d1afcf8be3150dcc4a58433f53307e3ae665
2018-09-11 23:11:46 -07:00
6dcdbd3a1d Make C10d support CPU only build (#11513)
Summary:
This makes torch.distributed work for CPU-only builds.

Also added one more CI test case to cover the MPI CPU build.
All CI tests should cover this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11513

Differential Revision: D9784546

Pulled By: teng-li

fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7
2018-09-11 22:10:34 -07:00
90e31f4896 Improve tracer warnings (#11545)
Summary:
Also, fix a performance bug in `ensureUnique`. Previously it formatted the warning string even though we weren't tracing, so all that work would *always* happen in the hot path and be for nothing.

A sample of what the new warnings look like:
```
tmp.py:4: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Pytho
n values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(x)
tmp.py:5: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this fun
ction to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might caus
e the trace to be incorrect.
  torch.tensor([1.])
tmp.py:6: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator add_. This might cause t
he trace to be incorrect, because all other views that also reference this data will not not reflect this change in the trace! On the other ha
nd, if all other views use the same memory, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  torch.split(y, 2, dim=1)[0].add_(2)

```
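
A minimal way to trigger the first warning above (a sketch, assuming the `torch.jit.trace(fn, inputs)` API):
```python
import torch

def f(x):
    # int() pulls a Python value out of the tensor, which the tracer
    # records as a constant and warns about.
    return x + int(x.sum())

traced = torch.jit.trace(f, torch.ones(3))  # emits a TracerWarning
```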
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11545

Differential Revision: D9782975

Pulled By: apaszke

fbshipit-source-id: 5b3abd31366e59c69e0b7ff278042b5563deb5a9
2018-09-11 22:10:32 -07:00
62c9d4ac96 Make .to() methods native functions (to fix JIT tracing)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11491

Differential Revision: D9771121

Pulled By: apaszke

fbshipit-source-id: 08d11101fb12093f8cf913b06359adddf3af9da7
2018-09-11 21:55:42 -07:00
a00fa2c614 Release GIL when calling into JIT interpreter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11541

Differential Revision: D9777909

Pulled By: apaszke

fbshipit-source-id: d0217e203721262f3f131b54ea78f898df0b54ec
2018-09-11 21:55:40 -07:00
1a246c9c7e guard spurious cudnn.h include (#11562)
Summary:
This fixes the build when CuDNN was not found on the system.

From the `git blame`, it looks like the bug has been around for 2 years :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11562

Differential Revision: D9784589

Pulled By: soumith

fbshipit-source-id: b33153436dced0a503c9833cdf52f7093f3394b4
2018-09-11 21:09:54 -07:00
a11ebfa195 Add explicit "this->" for nvcc. (#11196)
Summary:
Fix #11195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11196

Differential Revision: D9737625

Pulled By: ezyang

fbshipit-source-id: fb62076f005bd619eba53c0ed3f07683633f6d91
2018-09-11 21:09:52 -07:00
8aa8ad8b01 WIP: Reproducibility note (#11329)
Summary:
This adds a Note on making experiments reproducible.

It also adds instructions for building the documentation to `README.md`. Please ping if I missed any requirements.

I'm not sure what to do about the submodule changes. Please advise.
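
The usual seeding recipe such a note covers (a sketch, not the note's exact contents):
```python
import random
import numpy as np
import torch

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)                       # seed the torch RNGs
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN algorithms
torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
```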
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11329

Differential Revision: D9784939

Pulled By: ezyang

fbshipit-source-id: 5c5acbe343d1fffb15bdcb84c6d8d925c2ffcc5e
2018-09-11 21:09:51 -07:00
b75c32ded9 link against TORCH_CUDA_LIBRARIES
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11475

Differential Revision: D9784616

Pulled By: anderspapitto

fbshipit-source-id: bb8b443bcb308bbbe9707d265f21e5d00d717d65
2018-09-11 20:39:53 -07:00
f4d9f39a94 Test libtorch on cuda
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11554

Differential Revision: D9784772

Pulled By: goldsborough

fbshipit-source-id: c3e071695f56c1f427984f427b1f7722722947d3
2018-09-11 20:39:51 -07:00
35348dab10 WIP: Include note on cudnn determinism in each function backed by cudnn (#11434)
Summary:
Ping ezyang
This addresses your comment in #114. Strangely, when running the doc build (`make html`), none of my changes actually show up; could you point out what I'm doing wrong?

Once #11329 is merged it might make sense to link to the reproducibility note everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11434

Differential Revision: D9751208

Pulled By: ezyang

fbshipit-source-id: cc672472449564ff099323c39603e8ff2b2d35c9
2018-09-11 20:27:09 -07:00
54107ae8cf convert output_device at data_parallel from torch.device to index (#10189)
Summary:
- fixes #9984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10189

Differential Revision: D9545390

Pulled By: weiyangfb

fbshipit-source-id: 3a6a705437553ba319e9fd4b7f676ff73857a27e
2018-09-11 20:27:07 -07:00
045f862574 Use torch::nn::init::xavier_normal_
Summary: The PyTorch C++ API has `torch.nn.init` equivalents that the RNNG can use to initialize the state of its StackRNNs. This gets rid of the `fanInOut_` methods on `Parser` and tidies up `xavierInitialState` a little.
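
For reference, the Python counterpart of the initializer being adopted (the RNNG change itself is C++):
```python
import torch

w = torch.empty(128, 64)
torch.nn.init.xavier_normal_(w)  # in-place Xavier/Glorot normal initialization
print(w.std())                   # roughly sqrt(2 / (fan_in + fan_out))
```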

Reviewed By: wowitsmrinal

Differential Revision: D9472595

fbshipit-source-id: c202116f32383d3b4bba064c2c0d2656311e1170
2018-09-11 20:27:06 -07:00
d95fedb436 Use ATen dropout implementation in Dropout module and add FeatureDropout (#11458)
Summary:
This PR does two things:
1. Replaces the implementation of the `Dropout` module with a call to the ATen function,
2. Replaces `Dropout2d` with a new `FeatureDropout` module that shall take the place of `Dropout2d` and `Dropout3d`. I contemplated calling it `Dropout2d` and making `Dropout3d` an alias for it, but similar to our decision for `BatchNorm{1,2,3}d` (c.f. https://github.com/pytorch/pytorch/pull/9188), we can deviate from Python PyTorch in favor of the ideal-world solution, which is to have a single module, since both actually just call `feature_dropout`.

I also replaced the implementation of `dropout3d`  with a call to `dropout2d` in Python. The code is the same and it's easier for developers to parse than having to manually match the tokens to make sure it's really 100% the same code (which it is, if I matched the tokens correctly).

ebetica ezyang SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11458

Differential Revision: D9756603

Pulled By: goldsborough

fbshipit-source-id: fe847cd2cda2b6da8b06779255d76e32a974807c
2018-09-11 20:16:12 -07:00
3121c8f526 Update gtest and remove the macro guide on gtest from #11321 (#11417)
Summary:
Last PR seems to have test failures, re-issuing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11417

Reviewed By: orionr

Differential Revision: D9784706

Pulled By: Yangqing

fbshipit-source-id: 9e5f347e19fa2700ff69d2cd69ea7a9e01a91609
2018-09-11 20:16:08 -07:00
92fd69f256 Split Type into TypeExtendedInterface and Type (#11520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11520

Previously, we had Type which was a catch all interface for all
functions and methods we could possibly want to do dynamic dispatch
on. However, we want to check in a non-autogenerated Tensor class
to ATen/core, and to do this, we must also check in a non-autogenerated
Type class which we can do dispatch on. In principle, we could
put the full Type interface in ATen/core, but this would be
a bad developer experience, since any time you add a new free
function, you'd have to regenerate the checked in Type header.

For a better dev experience, we split Type into a two parts,
Type, which will be checked in (though not in this diff), and
TypeExtendedInterface, which will NOT be checked in. Type contains
just enough methods to let Tensor be defined, and leaves the
rest to TypeExtendedInterface.

Some complications:

- We (very unfortunately) have overloaded virtual methods. Because
of C++'s rules, we cannot move one overload without doing some
extra work to make sure that overload in a superclass and an
overload in a subclass resolve together. I've chosen to resolve
this problem simply by moving ALL overloads of a method which
occurs in Tensor to Type.

- There are some places where we take a type() object and call
a method on it, which is not a Tensor base method. I've eliminated
some where possible, but in other cases calling the method on type
is the ONLY way to invoke it; in that case, I've just inserted
a cast. Further refactoring is necessary.

Reviewed By: gchanan

Differential Revision: D9771708

fbshipit-source-id: c59d39fe919cd6f42be6dca699d474346ea3c614
2018-09-11 20:16:04 -07:00
35d52dbb0e re-enable USE_MPI (#11416)
Summary:
The previous error was caused by mpi_test not depending on MPI_CXX_LIBRARIES. This might solve the problem.

Not tested locally - waiting for CI test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11416

Reviewed By: mingzhe09088

Differential Revision: D9771694

Pulled By: Yangqing

fbshipit-source-id: 53e7b4f64eadc88313bc4dd9b8e3f7931cda6e91
2018-09-11 18:26:12 -07:00
bbf54ea37c Ensure .enumerate_support() methods are jittable (#11542)
Summary:
This works around #11535 by avoiding `arange(n, out=x)` and `eye(n, out=x)` in `torch.distributions`. I've confirmed that the `.enumerate_support()` methods are now jittable.
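
For context, what `.enumerate_support()` produces (a usage sketch):
```python
import torch
from torch.distributions import Bernoulli

d = Bernoulli(probs=torch.tensor([0.3, 0.7]))
print(d.enumerate_support())
# tensor([[0., 0.],
#         [1., 1.]]) -- each support value, expanded over the batch shape
```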
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11542

Differential Revision: D9777805

Pulled By: apaszke

fbshipit-source-id: fa38f2f1acfc0a289f725fd8c92478573cfdbefb
2018-09-11 18:26:09 -07:00
cda74ac476 fix nested no_grad decorator and with-statement (#11479)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10858
- allow `no_grad` decorator to apply `with torch.no_grad()` in the correct context
- current behavior:
```
import torch

@torch.no_grad()
def nothing(x):
    return x

testin = torch.Tensor([0])
with torch.no_grad():
    print(torch.is_grad_enabled()) # False
    testout = nothing(testin)
    print(torch.is_grad_enabled()) # False
```
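
For contrast, a sketch of the intended post-fix behavior:
```python
import torch

@torch.no_grad()
def nothing(x):
    return x * 2

x = torch.ones(1, requires_grad=True)
out = nothing(x)
print(out.requires_grad)        # False: grad was disabled inside the call
print(torch.is_grad_enabled())  # True: the global grad state is restored
```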
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11479

Differential Revision: D9758691

Pulled By: weiyangfb

fbshipit-source-id: 87de2219c6c45f65a2c0406ae152c3ad760be8f2
2018-09-11 17:56:40 -07:00
8b196d671b Allow tracing random functions (only when using default generators) (#11539)
Summary:
Fixes #11504.

zdevito, neerajprad, fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11539

Differential Revision: D9777897

Pulled By: apaszke

fbshipit-source-id: 56983260f5b93da7d5540a6242769ea7bd50eb06
2018-09-11 17:56:39 -07:00
b6b0b5222d fix missing libnccl.so.1 error (#11553)
Summary:
what it says on the tin.

I broke the build in https://github.com/pytorch/pytorch/pull/11487 but contbuild didn't end up catching it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11553

Differential Revision: D9781557

Pulled By: soumith

fbshipit-source-id: 2a1fa314af4b85b5491d74110bfee3d80599aa95
2018-09-11 17:25:58 -07:00
3a39006d38 Fix some more doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11531

Differential Revision: D9776541

Pulled By: SsnL

fbshipit-source-id: 8725485639ea6e9479b6ea95a49f5b75a9457db7
2018-09-11 16:26:55 -07:00
3a8e39b215 Support load and store between Py_complex and std::complex (#11493)
Summary: Printing for complex numbers requires loading and storing between `Py_complex` and `std::complex`. This patch aims to support this for the plugin.

Differential Revision: D9771808

Pulled By: ezyang

fbshipit-source-id: 024865f1945d63ddb5efc775a35438c8ea06408e
2018-09-11 15:55:11 -07:00
289a8c9b7d Allow train/eval, and non-Tensor arguments to python functions (#11505)
Summary:
This whitelists train/eval functions in script modules, and tests that nested nn.Modules still work.

This also changes the code for calling python functions from script to allow non-tensor inputs/outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11505

Differential Revision: D9765466

Pulled By: zdevito

fbshipit-source-id: 1177bff931324422b69e18fa0bbaa82e3c98ec69
2018-09-11 15:05:09 -07:00
17776db2ee Add gtest dependency on aten tests. (#11429)
Summary:
ezyang delivering my promise to you :)

Basically, now aten tests can use gtest as part of our test harness unification effort. I also converted one test (atest.cpp) to show how one can do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11429

Reviewed By: ezyang

Differential Revision: D9762934

Pulled By: Yangqing

fbshipit-source-id: 68ec3a748403c6bd88399b1e756200985a4e07e3
2018-09-11 13:39:51 -07:00
4db21a1d8e Optimize LengthsTileOp on GPU to run a kernel instead of a sequence of memcopies (#11413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413

LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU. This was very slow. I changed it to use a kernel. The TUM benchmark improved from 13k to 20k QPS as a result.

Reviewed By: manojkris, xianjiec

Differential Revision: D9724988

fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
2018-09-11 13:25:35 -07:00
c1dce21fd5 Cuda TensorAccessor (#11373)
Summary:
Provide a TensorAccessor-Like interface for CUDA as discussed in #8366.

Compared to TensorAccessor:
- the CUDATensorAccessor copies the sizes and strides while on the host (I didn't implement a host indexing function, though) to enable transfer to the device; on the device, `[]` works like it does for TensorAccessors,
- instantiation is from TensorAccessors in order to allow using `.accessor<..>`. The drawback is that you cannot use `auto` for the variable declaration, but the alternative would be a cuda-specific `.accessor`-like function,
- there is a PtrTraits argument to enable `__restrict__`.

Example for the intended use:
```
...
template <typename scalar_t>
__global__ void
apply_homography_2d_kernel(cuda::CUDATensorAccessor<scalar_t, 4> dest_a,
			   cuda::CUDATensorAccessor<scalar_t, 4> src_a,
			   cuda::CUDATensorAccessor<float, 2> transform) {
...
}

template <typename scalar_t>
Tensor apply_homography_2d_template(Tensor& res, const Tensor& image, const Tensor& transform) {
  ...
  cuda::CUDATensorAccessor<scalar_t, 4> image_a(image.accessor<scalar_t, 4>());
  cuda::CUDATensorAccessor<scalar_t, 4> res_a(res.accessor<scalar_t, 4>());
  cuda::CUDATensorAccessor<float, 2> transform_a(transform.accessor<float, 2>());
  auto stream = at::cuda::getCurrentCUDAStream();

  apply_homography_2d_kernel<scalar_t>
    <<<grid, block, 0, stream>>>(res_a, image_a, transform_a);
  return res;
}

...
```

I could use a hint on where to put a test for this (e.g. doing a plain vanilla matrix multiplication with a custom kernel and comparing with the ATen mm).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11373

Differential Revision: D9735573

Pulled By: ezyang

fbshipit-source-id: 482b218a0d514e19a8b692bbc77c0e37082cfded
2018-09-11 13:09:33 -07:00
c56a7cfc37 More use of AT_CHECK and AT_ERROR (#11457)
Summary: Considering these increase the size of the message stack, I didn't touch the code outside `ATen/native`

Differential Revision: D9754283

Pulled By: soumith

fbshipit-source-id: 04198ec4fd0c4abae09eeba92c493a783408537a
2018-09-11 12:55:09 -07:00
5952acc041 Add "merge to master" step before build in CircleCI (#11443)
Summary:
This PR adds the "merge to master" step before the build step in CircleCI, so that all PR commits are built against master instead of against the PR's branch. Note that all PRs still need to rebase to master to pick up this new config, so it won't apply to old PR branches retroactively.

To check in CI: make sure it's performing the git merge to master appropriately in "Merge Onto Master" step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11443

Differential Revision: D9775628

Pulled By: yf225

fbshipit-source-id: 8083db6b098d234a44ae4481f40a486e9906f6f8
2018-09-11 12:39:37 -07:00
fbc17321fd Update pybind11 to fix Python 3.7 support for script (#11473)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11419

In particular pulling in https://github.com/pybind/pybind11/pull/1454
as well as pending bugfix in https://github.com/pybind/pybind11/pull/1517 (documenting in comment)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11473

Differential Revision: D9776003

Pulled By: jamesr66a

fbshipit-source-id: a225dcfb66c06bcae98fd2508d9e690c24be551a
2018-09-11 12:39:36 -07:00
781737f84c Remove time prefix from rsync (#11525)
Summary:
This fails with zsh saying "time: command not found".

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11525

Differential Revision: D9772522

Pulled By: apaszke

fbshipit-source-id: b80d108fa6b174d68ada08a9fdbf7260ee37e08f
2018-09-11 12:10:24 -07:00
a566bc2f11 Disable all CircleCI jobs (#11523)
Summary:
Disable all CircleCI jobs until we are ready to move forward with them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11523

Differential Revision: D9774462

Pulled By: yf225

fbshipit-source-id: c5724e71eb68bac4df958b4f7bcc380050668b3c
2018-09-11 11:25:17 -07:00
d09041bd81 Add an option to statically link cuda (#10596)
Summary:
Need to link CUDA statically for benchmarking purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10596

Reviewed By: llyfacebook

Differential Revision: D9370738

Pulled By: sf-wind

fbshipit-source-id: 4464d62473e95fe8db65b0bd3b301f262bf269bf
2018-09-11 11:09:29 -07:00
727a4453aa New Serialization Proto
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11166

Reviewed By: mingzhe09088

Differential Revision: D9623522

Pulled By: houseroad

fbshipit-source-id: f21153034a398de7959404321d8534234cd58a40
2018-09-11 10:55:43 -07:00
f80f15866b Get rid of manual dispatch on Type. (#11486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11486

I discovered these by narrowing the interface on Type, and then
fixing call sites outside of core plumbing code which depended
on these methods being provided.

Reviewed By: cpuhrsch

Differential Revision: D9757935

fbshipit-source-id: 3abda0c98919a448a326a757671d438964f6909f
2018-09-11 10:40:22 -07:00
01c7542f43 Use -isystem for system includes in C++ extensions (#11459)
Summary:
I noticed warnings from within pybind11 being shown when building C++ extensions. This can be avoided by including non-user-supplied headers with `-isystem` instead of `-I`.

I hope this works on Windows.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11459

Differential Revision: D9764444

Pulled By: goldsborough

fbshipit-source-id: b288572106078f347f0342f158f9e2b63a58c235
2018-09-11 10:40:20 -07:00
d32b41003a Copy protos on install same as develop (#11517)
Summary:
This is a potential fix for https://github.com/pytorch/pytorch/issues/11453 and https://github.com/pytorch/pytorch/issues/11074, worked through with pjh5. Turns out we had some protos copy code in the .sh file that was removed. Better to have it in setup.py, though, same as for develop.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11517

Differential Revision: D9771911

Pulled By: orionr

fbshipit-source-id: 76975d8f71f38d951eaaed0b50dd3ec36dd177a9
2018-09-11 10:09:56 -07:00
deac304b6b Bugfix for basic slicing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11428

Differential Revision: D9753999

Pulled By: jamesr66a

fbshipit-source-id: cfc4163a5a06b41beb808a4e24650d71f5d91f4f
2018-09-11 09:39:29 -07:00
4e8d9a4a58 Introducing python setup.py rebuild develop (#11487)
Summary:
This speeds up incremental builds by doing the following changes:

- Uses `rsync` instead of `cp` (when `rsync` is found) which is a bit smarter in doing "maybe copy"
- Introduces a `rebuild` mode which does not rerun `cmake` in `build_pytorch_libs.sh`.
   *Note: `rebuild` should only be used if you don't add/remove files in the build, as `cmake` is not rerun*

Current no-op rebuild speedup:
- 1m 15s -> 20s

There are some lingering bugs: the first two no-op rebuilds still rerun `cmake` (the cmake logic likely depends on the install folder, which kicks off a rebuild).

So what you see is:

```
python setup.py rebuild develop    # first time - ~5 mins
python setup.py rebuild develop    # second time - ~3 mins
python setup.py rebuild develop    # third time - ~2 mins
python setup.py rebuild develop    # fourth time - ~20 seconds
python setup.py rebuild develop    # fifth time - ~20 seconds
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11487

Differential Revision: D9769087

Pulled By: soumith

fbshipit-source-id: 20fbecde33af6426149c13767e8734fb3be783c5
2018-09-11 08:56:25 -07:00
31850163ac Remove separate ATen build target (#11488)
Summary:
ATen has had a separate build target in the past, but with our move to a root-level CMakeLists.txt file this makes less sense and is harder to maintain. Also, as we blend code between Caffe2 and ATen this will become even less maintainable.

Talked to ezyang about this, but also cc zdevito, Yangqing, and soumith. If this is too difficult, I will revert, but want to see if we can simplify for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11488

Differential Revision: D9770266

Pulled By: orionr

fbshipit-source-id: c7ba52a1676d84e2d052dad4c042b666f49451cd
2018-09-11 08:56:23 -07:00
de460c7ad3 Improvements on conv/pool/fold/stft/ParamDict docs (#11106)
Summary:
Also fixes some incorrect formula rendering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11106

Differential Revision: D9752433

Pulled By: SsnL

fbshipit-source-id: 535fc8498638e8b645757fc7535d8771992b7d21
2018-09-11 08:56:21 -07:00
86ab92b0a9 Move TensorImpl / UndefinedTensor(Impl) to core (#11441)
Summary:
Moves TensorImpl to core.
Renames UndefinedTensor to UndefinedTensorImpl and moves to core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11441

Differential Revision: D9736620

Pulled By: gchanan

fbshipit-source-id: 0322ae3b903e338de253b35a0d74a9d3e219204b
2018-09-11 07:45:56 -07:00
80fa8e1007 Add .expand() method to distribution classes (#11341)
Summary:
This adds a `.expand` method for distributions that is akin to the `torch.Tensor.expand` method for tensors. It returns a new distribution instance with batch dimensions expanded to the desired `batch_shape`. Since this calls `torch.Tensor.expand` on the distribution's parameters, it does not allocate new memory for the expanded distribution instance's parameters.

e.g.
```python
>>> d = dist.Normal(torch.zeros(100, 1), torch.ones(100, 1))
>>> d.sample().shape
  torch.Size([100, 1])
>>> d.expand([100, 10]).sample().shape
  torch.Size([100, 10])
```

We have already been using the `.expand` method in Pyro in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py#L10) of `torch.distributions`. We use this in our models to enable dynamic broadcasting. This has also been requested by a few users on the distributions slack, and we believe will be useful to the larger community.

Note that currently, there is no convenient and efficient way to expand distribution instances:
 - Many distributions use `TransformedDistribution` (or wrap another distribution instance; e.g. `OneHotCategorical` uses a `Categorical` instance) under the hood, or have lazy parameters. This makes it difficult to collect all the relevant parameters, broadcast them, and construct new instances.
 - In the few cases where this is even possible, the resulting implementation would be inefficient since we will go through a lot of broadcasting and args validation logic in `__init__.py` that can be avoided.

The `.expand` method allows for a safe and efficient way to expand distribution instances. Additionally, this bypasses `__init__.py` (using `__new__` and populating relevant attributes) since we do not need to do any broadcasting or args validation (which was already done when the instance was first created). This can result in significant savings as compared to constructing new instances via `__init__` (that said, the `sample` and `log_prob` methods will probably be the rate determining steps in many applications).

e.g.
```python
>>> a = dist.Bernoulli(torch.ones([10000, 1]), validate_args=True)

>>> %timeit a.expand([10000, 100])
15.2 µs ± 224 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit dist.Bernoulli(torch.ones([10000, 100]), validate_args=True)
11.8 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

cc. fritzo, apaszke, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11341

Differential Revision: D9728485

Pulled By: soumith

fbshipit-source-id: 3b94c23bc6a43ee704389e6287aa83d1e278d52f
2018-09-11 06:56:18 -07:00
120d769432 Add support for tracing strings (#11506)
Summary:
This enables `torch.einsum` both in tracing and in script mode. It's used all over Pyro at the moment, and is needed for any use of the JIT there.

Fixes #11157.

zdevito fritzo neerajprad
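
A quick sketch of what this enables (not from the PR; uses the two-argument `torch.jit.trace` form):

```python
import torch

def outer(x, y):
    # einsum is now traced as a real op instead of an opaque constant
    return torch.einsum('bi,bj->bij', [x, y])

traced = torch.jit.trace(outer, (torch.randn(2, 3), torch.randn(2, 4)))
print(traced.graph)  # the graph now contains the einsum call
```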
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11506

Differential Revision: D9764787

Pulled By: apaszke

fbshipit-source-id: 9b5251b9e7c5897034602bd07ff67b425d33326c
2018-09-11 06:02:41 -07:00
0ddbe668cd Improve shape analysis to cover all most commonly used ops (#11358)
Summary:
[Here's a list](https://gist.github.com/apaszke/f0821840bdcc67a977832dc58acc1b85) of ops that are in `register_aten_ops.cpp`, but aren't supported in shape prop. Everything else should work now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11358

Differential Revision: D9753693

Pulled By: apaszke

fbshipit-source-id: efeae0126ce16cb56b8797fc5246405588bcae3c
2018-09-11 06:02:39 -07:00
f84693efa9 nomnigraph - Improvements to subgraph matching APIs (#11418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11418

Several improvements that aim to make the APIs more straightforward to use

- Get rid of the helper methods subgraph and nonTerminal. Users should now create an NNMatchGraph directly via the graph's createNode and createEdge APIs

- Get rid of the operatorSubgraph helper method

- The invertGraphTraversal flag applies to both the match graph and the scanned graph. This allows users to create the match graph in the same direction as the scanned graph, thus reducing confusion.

- The additional parameters of matchNode (count, includeInSubgraph, nonTerminal) are removed from the constructors and moved into setter methods. (We no longer enforce that MatchNode is immutable, but this helps improve code clarity.)

- Tests are updated to reflect the changes

Follow up changes:
- Possibly clean up the tests further. This change aims to minimally modify the unit tests.
- Add a validity check that enforces the current limitation of the match graph (single source node) and throws if the match graph does not satisfy the criteria.
- Have the single source node be detected automatically, so callers just need to pass in the matchGraph instead of a source node reference.

Differential Revision: D9732565

fbshipit-source-id: ae8320e2bc89b867f6bb4b1c1aad635f4b219fa1
2018-09-11 04:39:27 -07:00
3d5fd12488 Documentation for c10d: torch.distributed and deprecate the old distributed doc (#11450)
Summary:
This is the new documentation for c10d release, and it also deprecates the old torch.distributed document.

This PR depends on https://github.com/pytorch/pytorch/pull/11405

and should only be landed after https://github.com/pytorch/pytorch/pull/11405 is landed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11450

Differential Revision: D9765504

Pulled By: teng-li

fbshipit-source-id: 48f38b27b8c270baf389f8e478ea226b9ecc63db
2018-09-11 02:10:28 -07:00
0988bbad2d C10d release to torch.distributed for PT1 (#11405)
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`
The old DDP will go to `torch.nn.parallel.deprecated`

Now `torch.nn.parallel.DDP` will use c10d DDP
Now `torch.distributed` will use C10d frontend API
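
A sketch of the user-facing pattern after this change (the init arguments below are placeholders; any supported rendezvous works):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# placeholder single-process rendezvous, just for illustration
dist.init_process_group(backend='gloo',
                        init_method='tcp://127.0.0.1:23456',
                        rank=0, world_size=1)

model = torch.nn.Linear(10, 10)
ddp_model = DistributedDataParallel(model)  # now backed by the c10d reducer
```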
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405

Reviewed By: pietern

Differential Revision: D9733733

Pulled By: teng-li

fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08
2018-09-10 23:27:22 -07:00
b14a80553d Ignore functional doc error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11508

Differential Revision: D9764380

Pulled By: goldsborough

fbshipit-source-id: 3abb9c04f46137be833ea26d67734741e14f8010
2018-09-10 20:55:48 -07:00
f9d12eeb27 Give copy an optional device argument.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11497

Differential Revision: D9762014

Pulled By: gchanan

fbshipit-source-id: 996419cc5e86d000af953d030ff361adafb921ad
2018-09-10 20:40:03 -07:00
dd8defeb3f Document the Functional module (#11460)
Summary:
Document the `Functional` module in the C++  API.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11460

Differential Revision: D9757555

Pulled By: goldsborough

fbshipit-source-id: 15f8bf6d60bd26f3f4e69fb8e414e186e3c220ee
2018-09-10 19:58:38 -07:00
9cfdf0d677 Document the Embedding module (#11469)
Summary:
ebetica soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11469

Differential Revision: D9757547

Pulled By: goldsborough

fbshipit-source-id: a95673abe949bb81d716dbc03c5c3e2a11cc15d3
2018-09-10 18:25:08 -07:00
a175282776 Flags for LMDB, LevelDB, and Caffe2 ops (#11462)
Summary:
Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with

```
USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps
```

Also add a flag to build Caffe2 ops, which is default `ON`. Disable with

```
NO_CAFFE2_OPS=1 python setup.py build_deps
```

cc Yangqing soumith pjh5 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462

Reviewed By: soumith

Differential Revision: D9758156

Pulled By: orionr

fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63
2018-09-10 17:27:50 -07:00
e1e69446f6 Lockdown NO_TEST=1 for tests even more (#11415)
Summary:
Skip torch tests as well when NO_TEST=1 environment variable is set. Also remove the separate ATen code path for not being built with Caffe2, since it will always be built with Caffe2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11415

Reviewed By: soumith

Differential Revision: D9758179

Pulled By: orionr

fbshipit-source-id: e3e3327364fccdc57a703aeaad8c4f30452973fb
2018-09-10 17:27:48 -07:00
3e49a69466 Resolve ambiguity when including both caffe2 and aten registries (#11411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11411

Simple fix

Reviewed By: goldsborough

Differential Revision: D9730371

fbshipit-source-id: f841327c01faa13cfb6b7fc6e279b8fc50fad1db
2018-09-10 17:27:46 -07:00
3ad67c60f0 Traceable explicit Variable instantiation (#11463)
Summary:
There's a bunch of legacy code where people are explicitly instantiating Variable, and these call-sites have thus far been untraceable (appearing as prim::Constant nodes with the tensor value at the time of tracing). This makes it so that the new variable inherits the traced Value* from the tensor it's being constructed from.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11463

Differential Revision: D9756529

Pulled By: jamesr66a

fbshipit-source-id: da99c6a7621957a305f2699ec9cb9def69b1b2d7
2018-09-10 17:03:24 -07:00
f2f43ad2da Add new LengthsSplit operator (#10974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291

This new operator will do the following:

Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where (see the sketch after this list):

1. Each length in the input vector is split into n_splits values (thus the output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in the output should be evenly split, and if the length is not divisible by n_splits, the new values are ordered in descending order. (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s. (e.g. n_splits = 3, length = 2 -> 1 1 0)
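
A minimal Python sketch of the splitting rule, assuming the three behaviors listed above (the actual operator is implemented as a C++ Caffe2 op):

```python
def lengths_split(lengths, n_splits):
    # split each length into n_splits values, larger values first
    out = []
    for length in lengths:
        base, rem = divmod(length, n_splits)
        out.extend([base + 1] * rem + [base] * (n_splits - rem))
    return out

print(lengths_split([5], 3))  # [2, 2, 1]
print(lengths_split([2], 3))  # [1, 1, 0]
```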

Reviewed By: bddppq, chocjy

Differential Revision: D9013119

fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
2018-09-10 15:40:28 -07:00
0b78ae86c5 Cleanup byte swapping utilities to generate optimal code on the platforms we care about. (#11394)
Summary:
While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:
>         ldrb    w8, [x0, #3]
>         ldrb    w9, [x0, #2]
>         bfi     w8, w9, #8, #8
>         ldrb    w9, [x0, #1]
>         bfi     w8, w9, #16, #8
>         ldrb            w9, [x0]
>         bfi     w8, w9, #24, #8
>         mov      x0, x8
>         ret

To this:
>         ldr             w8, [x0]
>         rev     w0, w8
>         ret
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11394

Reviewed By: SsnL

Differential Revision: D9728659

Pulled By: resistor

fbshipit-source-id: 9afbd4adfad1d1fb7b01f1179e6707ee21fa726f
2018-09-10 15:40:24 -07:00
a0d4106c07 Integrate custom op tests with CI (#10611)
Summary:
This PR is stacked on https://github.com/pytorch/pytorch/pull/10610, and only adds changes in one file `.jenkins/pytorch/test.sh`, where we now build the custom op tests and run them.

I'd also like to take this PR to discuss whether the [`TorchConfig.cmake`](https://github.com/pytorch/pytorch/blob/master/cmake/TorchConfig.cmake.in) I made is robust enough (we will also see in the CI) orionr Yangqing dzhulgakov what do you think?

Also ezyang for CI changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10611

Differential Revision: D9597627

Pulled By: goldsborough

fbshipit-source-id: f5af8164c076894f448cef7e5b356a6b3159f8b3
2018-09-10 15:40:21 -07:00
3e665cc29b Improve support for tracing sizes, add more tracer warnings (#11288)
Summary:
Many constructors like `torch.zeros` or `torch.randn` didn't support
size tracing correctly, which is fixed by this pass. The same issue has been
fixed in the legacy tensor constructors.

Additionally, new tensor constructors, which do not participate in
tracing (most notably `torch.tensor`, `torch.as_tensor` and
`torch.from_numpy`) raise a warning when they are used.

Finally, entering a traceable operation disables the tracing in its body.
This is needed because

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11288

Reviewed By: ezyang

Differential Revision: D9751183

Pulled By: apaszke

fbshipit-source-id: 51444a39d76a3e164adc396c432fd5ee3c8d5f7f
2018-09-10 15:22:48 -07:00
70d93f4777 Check for maximum numel in NCCL broadcasting (#11466)
Summary:
NCCL1 uses `int` as its numerical type for fields like `count`, which makes broadcasting tensors with more than `2^31 - 1` elements impossible, and raises the opaque error `invalid arguments`. NCCL2 greatly increases the limit on many platforms by using `size_t`. This patch statically detects this type, and raises properly if the broadcast tensor exceeds the limit.

No test because I don't think our test suite should broadcast big tensors.
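
A rough Python-level sketch of the guard (the real check is in C++ against NCCL's detected count type; the names here are illustrative):

```python
INT_MAX = 2**31 - 1  # NCCL1 stores the element count in a plain `int`

def check_nccl_broadcast_numel(tensor):
    # fail loudly instead of letting NCCL raise an opaque 'invalid arguments'
    if tensor.numel() > INT_MAX:
        raise RuntimeError(
            'broadcast tensor has {} elements, exceeding the NCCL count '
            'limit of {}'.format(tensor.numel(), INT_MAX))
```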
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11466

Differential Revision: D9754753

Pulled By: SsnL

fbshipit-source-id: 73506450cae047e06b5b225b39efdb42d5d26685
2018-09-10 14:39:15 -07:00
35008e0a1a Add flags to fix half comparison and test (#11395)
Summary:
It was found that there are some issues when using comparison operators for half types when certain THC headers are included. I was able to reproduce this and added a test. I also fix the issue by adding the proper definitions to avoid it.

Reported in https://github.com/pytorch/pytorch/pull/10301#issuecomment-416773333
Related: https://github.com/pytorch/tutorials/pull/292

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11395

Differential Revision: D9725102

Pulled By: goldsborough

fbshipit-source-id: 630425829046bbebea3409bb792a9d62c91f41ad
2018-09-10 14:10:21 -07:00
18e5fd36c2 Normalize gradients before reduction in DistributedDataParallelC10d (#11109)
Summary:
Normalizing by the world size before the reduction is less likely to cause overflow in FP16 training.
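
In other words, each worker pre-scales its local gradients before the summing all-reduce; a hedged sketch of the idea (not the actual reducer code):

```python
import torch
import torch.distributed as dist

def reduce_gradients(model, world_size):
    for param in model.parameters():
        if param.grad is not None:
            # divide first so the FP16 sum across workers stays in range
            param.grad.div_(world_size)
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
```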
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11109

Differential Revision: D9594708

Pulled By: myleott

fbshipit-source-id: 93ab53cb782ee1cbe1264e529b333490a0940338
2018-09-10 13:55:09 -07:00
ea0ee77c61 Fix katex math rendering (#11472)
Summary:
I'm 80% sure that this fixes the math bug. But I can't repro locally so I don't know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11472

Differential Revision: D9755328

Pulled By: SsnL

fbshipit-source-id: 130be664d3c6ceee3c0c166c1a86fc9ec3b79d74
2018-09-10 12:40:23 -07:00
198ade74f9 Remove manual refcounting from Tensor class (#11294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11294

The Tensor(ptr, retain) constructor is error prone and circumvents the intrusive_ptr safety.

This diff removes that and pushes the responsibility to callers.
Step by step, manual refcounting can be pushed back and possibly eliminated in the end.

Reviewed By: ezyang

Differential Revision: D9663476

fbshipit-source-id: 7f010e5e47b137a9575960201c5bf5d552c5c2f5
2018-09-10 12:40:21 -07:00
b0c1397271 Fix intrusive_ptr move/copy for different NullType's (#11260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11260

This is needed to make something like this work:

    intrusive_ptr<TensorImpl, UndefinedTensorImpl> a = make_intrusive<SparseTensorImpl>(...);

Reviewed By: ezyang

Differential Revision: D9652089

fbshipit-source-id: 19c65e98460ccb27bc69e36d7e558cb9d6e67615
2018-09-10 12:40:20 -07:00
252f93df09 Improve Tensor() constructor (#11258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11258

The two intrusive_ptr constructors in Tensor can be combined into one implementation that does both, moving and copying.

Reviewed By: ezyang

Differential Revision: D9652088

fbshipit-source-id: 5efca02654ba305c99c20bbeb83551469d17a51d
2018-09-10 12:40:19 -07:00
09292f2c03 Some improvements to IValue (#11238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11238

- when moving an IValue, free the old value instead of keeping it allocated
- making classes final
- moving std::string
- making ConstantList const

Reviewed By: ezyang

Differential Revision: D9644700

fbshipit-source-id: ab7228368e4f00f664ba54e1242b0307d91c5e7e
2018-09-10 12:40:17 -07:00
ce6906b051 Narrowing Blob (#11167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11167

Narrow the Blob API as preparation for merging Blob/IValue

- get rid of templated IsType and Operator::InputIsType / OutputIsType
- Use 'using' instead of 'typedef' for DestroyCall (just for readability)

Reviewed By: ezyang

Differential Revision: D9623916

fbshipit-source-id: 952f0b0cf5a525094b02e8d2798dd57a56a9e1d8
2018-09-10 12:40:16 -07:00
040d75d455 Add option to use CUDA memory leak testing as a context manager (#11380)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11380

Reviewed By: ezyang

Differential Revision: D9705877

Pulled By: zou3519

fbshipit-source-id: 02470c25236f57fa02f4ac9d7ed63d38a6355db2
2018-09-10 12:40:15 -07:00
2158f4a9c8 add export import test to TestJitGenerated (#10982)
Summary:
Checking assertExportImport for all of the generated test jit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10982

Differential Revision: D9636935

Pulled By: eellison

fbshipit-source-id: f3f1ce77d454848098f2ac7e0fa18bf8564890be
2018-09-10 11:37:05 -07:00
cee743f639 Move backward/set_data to Type-based dispatch.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11440

Differential Revision: D9736565

Pulled By: gchanan

fbshipit-source-id: 1e66f54f1c87084f37c0b014030f0d6d2f8dfaee
2018-09-10 08:40:29 -07:00
87a9a8f80a Use AT_CHECK and AT_ERROR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11444

Differential Revision: D9736992

Pulled By: SsnL

fbshipit-source-id: bf5320e878c6ef71468f3e2aa12ce304b92d45ca
2018-09-09 21:26:12 -07:00
560d6efd3a Only join started dataloader workers (#11432)
Summary:
`Process.start()` actually takes some time, as it needs to start a
process and pass the arguments over via a pipe. Therefore, we
only add a worker to the self.workers list after it has started, so
that we do not call `.join()` on it if the program dies before it starts,
in which case `__del__` tries to join it and gets:
    AssertionError: can only join a started process.

Example trace when such error happens:
```py
[unrelated]
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
    return _DataLoaderIter(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
    w.start()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
    w.join()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
    assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```

No test because hard to reliably trigger.
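
A standalone sketch of the fixed pattern (`_worker_loop` here is a stand-in for the real worker body):

```python
import multiprocessing

def _worker_loop():
    pass  # stand-in for the real dataloader worker body

workers = []
for _ in range(4):
    w = multiprocessing.Process(target=_worker_loop)
    w.daemon = True
    w.start()          # may raise, e.g. on KeyboardInterrupt during fork
    workers.append(w)  # appended only once start() has succeeded

# shutdown: every process in the list is guaranteed to have started,
# so join() can no longer hit 'can only join a started process'
for w in workers:
    w.join()
```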
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432

Reviewed By: ezyang

Differential Revision: D9735430

Pulled By: SsnL

fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
2018-09-09 12:55:51 -07:00
87b2f05a9c Also set stdin to subprocess pipe in FindCUDNN windows popen call (#11435)
Summary:
Same issue as https://github.com/pytorch/pytorch/pull/10379, just in a different place (adding this resolves it)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11435

Differential Revision: D9736396

Pulled By: soumith

fbshipit-source-id: 220a52b8009fc2bee9313c5a091443c68f85f62f
2018-09-09 11:40:25 -07:00
581099a7b2 pybind conversion for IntList (#11425)
Summary:
As discussed with ezyang and slayton58, this might be a nice convenience: being able to use code in extensions just as in ATen.

Also split off `tracing_state.h` from `torch/jit/tracer.h` (fixes #11204) to be able to use the utility functions.

pytorchbot  it's not a jit patch per se.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11425

Differential Revision: D9735556

Pulled By: ezyang

fbshipit-source-id: 466c92bbdb1d7d7a970eba1c26b7583fe9756139
2018-09-09 10:39:40 -07:00
ee4309a9ac override BUILD_TEST when building gloo (#11431)
Summary:
A recent build regression is that we need a system GoogleTest for builds to pass.

This was because, when building with Gloo, gloo tries to build its own tests, which look for a system gtest [here](https://github.com/facebookincubator/gloo/blob/master/cmake/Dependencies.cmake#L72-L80) (because we're not using the full cmake build and making it aware of third_party/GoogleTest; instead, we build it in isolation using tools/build_pytorch_libs.sh).

Traditionally, we didn't ask Gloo to build its tests, but because we added `-DBUILD_TEST=1` by default to all builds (in refactoring variable names), we accidentally started asking Gloo to build its tests.

This PR overrides the Gloo flags and asks it to not build tests (like it used to).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11431

Differential Revision: D9736387

Pulled By: soumith

fbshipit-source-id: 59e84edae780123b793bdaea5fd9ac46156cd0af
2018-09-09 10:11:56 -07:00
1b94f5c6e6 optimize masked_fill on CPU (#11359)
Summary:
This PR parallelizes `masked_fill` on CPU; currently it runs sequentially.

The following script is used to benchmark and verify this PR. On a Xeon Skylake 8180 (2 sockets * 28 cores), it runs in `4.20` sec without the PR and `0.11` sec with the PR.

```python
import torch
import random
from time import time

size = 10 * 1000 * 1000
count = 100

def test_masked_fill():
    dst = torch.randn(size)
    dst_ = dst.clone()
    mask = torch.rand(size).mul(2).floor().byte()
    val = random.random()

    tstart = time()
    for i in range(count):
        dst.masked_fill_(mask, val)
    tend = time()
    print("masked_fill_: %f" % (tend-tstart))

    for i in range(size):
        if mask[i]:
            if dst[i] != val:
                print("fail")
        else:
            if dst[i] != dst_[i]:
                print("fail1")
    print("test_masked_fill: PASS")

test_masked_fill()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11359

Differential Revision: D9735578

Pulled By: ezyang

fbshipit-source-id: d437ad7c6dace1910d0c18d6d9ede80efb44fae4
2018-09-09 00:25:26 -07:00
b7ecf035dc Updates FindCUDA.cmake to 3.12.2 upstream version (#11406)
Summary:
This PR is just a copy-paste of the upstream FindCUDA.cmake. Since cublas_device is deprecated in CUDA >= 9.2, this change is necessary for the build.

Related: https://gitlab.kitware.com/cmake/cmake/merge_requests/2298
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11406

Differential Revision: D9735563

Pulled By: ezyang

fbshipit-source-id: c74d86ced7cc485cb2233f9066ce23e921832c30
2018-09-08 23:10:32 -07:00
6683fb56ca Add AVX optimizations for pdist (#11230)
Summary:
Added AVX optimizations for pdist using Vec256. This brings single threaded performance up to speed with scipy, but the current implementation greatly hurts performance without AVX enabled. Is there a way to special case out AVX on dispatch and call the non Vec256 code? Or is the way I used Vec256 completely wrong?

Single threaded comparison to scipy
============================

This is the time to compute the pdist of a 2048 x 2048 float matrix with only one thread for various values of p between torch and scipy. p = 3 is the code path for arbitrary p, and so is much slower than the other values.

p | torch | scipy
-----|-----------|------
0 | 6.27 s ± 393 ms | 7.23 s ± 498 ms
1 | 5.49 s ± 201 ms | 43.4 s ± 1.09 s
2 | 5.74 s ± 474 ms | 53.8 s ± 3.52 s
∞ | 5.59 s ± 292 ms | 47.4 s ± 2.03 s
3 | really slow | gave up

Result by AVX support
================

This is the time to compute the distance and gradient of a 2048 x 2048 float matrix with all threads by AVX support. `before` is the old code, `default` is no AVX support, etc. Interestingly the AVX optimizations provided a great benefit over the old unoptimized code, but drastically hurt performance when compiled without AVX optimizations. p = 3 is the code path for arbitrary p, and so is much slower than the other values.

Results for p = 0
----------------

avx | dist | grad
----|------|-----
before | 514 ms ± 87.5 ms | 191 µs ± 35 µs
default | 3.47 s ± 183 ms | 201 µs ± 24.6 µs
avx | 123 ms ± 18.2 ms | 281 µs ± 130 µs
avx2 | 103 ms ± 11.4 ms | 216 µs ± 74.4 µs

Results for p = 1
----------------

avx | dist | grad
----|------|-----
before | 426 ms ± 35 ms | 6.21 s ± 187 ms
default | 2.6 s ± 123 ms | 5.62 s ± 273 ms
avx | 104 ms ± 6.37 ms | 833 ms ± 44.3 ms
avx2 | 106 ms ± 3.59 ms | 924 ms ± 86.2 ms

Results for p = 2
-----------------

avx | dist | grad
----|------|-----
before | 425 ms ± 45.4 ms | 6.31 s ± 125 ms
default | 3.04 s ± 187 ms | 3.55 s ± 242 ms
avx | 110 ms ± 3.66 ms | 896 ms ± 21.8 ms
avx2 | 113 ms ± 4.68 ms | 934 ms ± 25.2 ms

Results for p = ∞
------------------

avx | dist | grad
----|------|-----
before | 501 ms ± 39.5 ms | 6.64 s ± 321 ms
default | 2.15 s ± 92.9 ms | 8.43 s ± 355 ms
avx | 104 ms ± 5.52 ms | 835 ms ± 36.7 ms
avx2 | 100 ms ± 3.41 ms | 864 ms ± 67 ms

Results for p = 3
-----------------

avx | dist | grad
----|------|-----
before | 22.6 s ± 413 ms | 11.1 s ± 242 ms
default | 24.9 s ± 1 s | 11.2 s ± 293 ms
avx | 2.69 s ± 148 ms | 5.63 s ± 88.4 ms
avx2 | 2.48 s ± 31.8 ms | 5.61 s ± 114 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11230

Differential Revision: D9735503

Pulled By: erikbrinkman

fbshipit-source-id: a9da619249e4ca2625b39ca1ca7f5543c3086bfb
2018-09-08 22:55:02 -07:00
538ea67437 Search for CMake config files for pybind11. (#11423)
Summary:
If pybind is build with cmake and installed, we should use config file instead of the Findpybind11 shipped with caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11423

Differential Revision: D9735557

Pulled By: ezyang

fbshipit-source-id: 28a39e579fa045060aa1a716e5fd7dbcf7b89569
2018-09-08 22:44:03 -07:00
02114e877f fix #10838 incorrect bidirectional output format (#11368)
Summary:
Fixes the issue discussed in #10838. `hidden_size` should be the last dimension regardless if we're in ONNX or PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11368

Differential Revision: D9734814

Pulled By: soumith

fbshipit-source-id: 7f69947a029964e092c7b88d1d79b188a417bf5f
2018-09-08 17:09:57 -07:00
ac9268f25d Conversions to and from complex numbers. (#11420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11420

Surprisingly tricky!  Here are the major pieces:

- We grow a even yet more ludicrous macro
  AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF
  which does what it says on the tin.  This is because I was
  too lazy to figure out how to define the necessary conversions
  in and out of ComplexHalf without triggering ambiguity problems.
  It doesn't seem to be as simple as just Half.  Leave it for
  when someone actually wants this.

- Scalar now can hold std::complex<double>.  Internally, it is
  stored as double[2] because nvcc chokes on a non-POD type
  inside a union.

- overflow() checking is generalized to work with complex.
  When converting *to* std::complex<T>, all we need to do is check
  for overflow against T.  When converting *from* complex, we
  must check (1) if To is not complex, that imag() == 0
  and (2) for overflow componentwise (a sketch follows this list).

- convert() is generalized to work with complex<->real conversions.
  Complex to real drops the imaginary component; we rely on
  overflow checking to tell if this actually loses fidelity. To get
  the specializations and overloads to work out, we introduce
  a new Converter class that actually is specializable.

- Complex scalars convert into Python complex numbers

- This probably fixes complex tensor printing, but there is no way
  to test this right now.
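
As a hedged illustration of the complex-to-real rule sketched in the list above (the real implementation is the C++ Converter template; the function and bounds below are made up for the example):

```python
def complex_to_real(value, lo, hi):
    # (1) a nonzero imaginary component cannot be represented
    if value.imag != 0:
        raise OverflowError('nonzero imaginary part')
    # (2) the remaining real component is overflow-checked componentwise
    if not (lo <= value.real <= hi):
        raise OverflowError('real component out of range')
    return value.real

print(complex_to_real(3 + 0j, -2**31, 2**31 - 1))  # 3.0
# complex_to_real(3 + 1j, ...) would raise OverflowError
```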

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: cpuhrsch

Differential Revision: D9697878

Pulled By: ezyang

fbshipit-source-id: 181519e56bbab67ed1e5b49c691b873e124d7946
2018-09-08 16:39:43 -07:00
d3f98b5ffc Add matrix power (#11421)
Summary:
vishwakftw Your patch needed some updates because the default native function dispatches changed from `[function, method]` to `[function]`. The CI was run before that change happened so it still shows green, but the internal test caught it.

I did some changes when rebasing and updating so I didn't just force push to your branch. Let's see if this passes CI and internal test. If it does, let me know if you want me to force push to your branch or use this PR instead.

Note to reviewers: patch was already approved at #10068 .

cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11421

Differential Revision: D9733407

Pulled By: SsnL

fbshipit-source-id: cf2ed293bb9942dcc5158934ff4def2f63252599
2018-09-08 15:25:56 -07:00
802380ac93 Improve LegacyTypeDispatch to handle initialization correctly. (#11331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11331

In the previous commit, we added a bare-bones LegacyTypeDispatch in ATen/core.
This is not sufficient for the use cases we need: we not only need to be able to
get a Type, but we also need to be able to *initialize* the Types if it's the first time
we have retrieved a CPU/CUDA/Complex type. I hemmed and hawed about how
to do this; the strategy this PR takes is to introduce a new "hooks" interface
specifically for initializing CPU/CUDA/Complex (which still lives in Context). We then
move all "user-friendly" functions to LegacyTypeDispatch.

Here were some other options which I considered, but don't work:
- Assume that Type is already initialized, because we only intend to call Type
  from Tensor methods, where we already have a Tensor. This does not work
  because Caffe2 created tensors will not have gone through the standard
  Type codepath, and will have skipped initialization.
- Move CUDAHooks and ComplexHooks to ATen/core. Besides being sucky,
  this isn't even a complete fix, because I still need to initialize CPU hooks
  (so you *still* need another hooks interface).

Reviewed By: cpuhrsch

Differential Revision: D9666612

fbshipit-source-id: ac7004b230044b67d13caa81fdfaf3c6ab915e3f
2018-09-08 10:10:17 -07:00
9687a72794 Move the type registry out of Context, into LegacyTypeDispatch. (#11274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11274

We don't want to put all of Context into ATen/core, but one
particular part cannot be avoided: the type registry, because
implementations of TensorMethods will need to get a Type,
and then do a virtual call on it.

I needed to do a little bit of (temporary) footwork to get this
in without also moving Type, because unique_ptr<Type> expects
to be able to see the destructor of Type (but it's forward declared
right now).  So instead I put the destructor as an explicit functor.  We
can get rid of this once Type actually moves in ATen/core

Reviewed By: cpuhrsch

Differential Revision: D9657449

fbshipit-source-id: 940931493bf4f1f6a8dad03f34633cacdd63dd0b
2018-09-08 10:10:11 -07:00
b9b9ae935b Make torch.randint have default dtype int64 (#11040)
Summary:
cc gchanan apaszke
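
The observable effect, as a quick example (assuming a build with this change):

```python
import torch

t = torch.randint(0, 10, (3,))
print(t.dtype)  # torch.int64 after this change; previously the default
                # (floating-point) dtype was used
```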
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11040

Differential Revision: D9565728

Pulled By: SsnL

fbshipit-source-id: eb5be9609f30c88f52746fa7e13ad71e2856648e
2018-09-08 07:55:06 -07:00
505ecab88d bumping up the default store timeout (#11409)
Summary:
to 300 seconds to be safe. There used to be no timeout in THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11409

Differential Revision: D9731709

Pulled By: teng-li

fbshipit-source-id: 0ce011dcca507cbf063176ad4995405c77dd0cdd
2018-09-07 23:55:23 -07:00
3d2862526b Support send/recv for the gloo process group (#11387)
Summary:
This change removes the skips for the existing send/recv tests in the backwards compatibility layer.
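
For reference, the point-to-point pattern these tests exercise (a usage sketch, assuming an already-initialized two-process gloo group):

```python
import torch
import torch.distributed as dist

# assumes e.g. dist.init_process_group('gloo', init_method=...,
#                                      rank=rank, world_size=2) ran already
t = torch.zeros(4)
if dist.get_rank() == 0:
    dist.send(t + 1, dst=1)
else:
    dist.recv(t, src=0)  # t now holds ones
```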
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11387

Reviewed By: teng-li

Differential Revision: D9729330

Pulled By: pietern

fbshipit-source-id: f8899219a94d806386d03e9ef53bff622d8658a3
2018-09-07 20:25:18 -07:00
47c1de25e8 Test exporting batch norm, dropout, RNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11126

Differential Revision: D9727689

Pulled By: jamesr66a

fbshipit-source-id: f142257a2fba27d86844bf33084174f1f68a8ca5
2018-09-07 19:41:39 -07:00
b7a2c91eed remove unnecessary clone() when .grad is None (#11165)
Summary:
Currently the gradient is copied into .grad if .grad is None. This PR aims to remove the copy when it is not absolutely needed.

It is generally an improvement in speed and memory usage, and here is a case where it may help a lot:
Normally, people do optimizer.zero_grad() every minibatch before backward. This translates into a memset, and later a point-wise add.
When there is some large weight in the network, one optimization people can always make is to set parameter.grad to None instead of calling zero_grad. This removes the memset and changes the point-wise add to a memcpy.
Here is result running following script on V100 GPU. It is 100 iterations of forward/backward/zero_grad on single 1-billion word benchmark size embedding.
`Zero grad: 2.123847723007202`
`None grad: 1.3342866897583008`

With the backend change of this PR, the unnecessary memcpy is removed, thus further speed up is achieved.
`Zero grad: 2.124978542327881`
`None grad: 0.4396955966949463`

[benchmark.txt](https://github.com/pytorch/pytorch/files/2341800/benchmark.txt)

Some details on the code change:
.detach() is used because we need to get rid of new_grad being a view without copying data. This should be safe in first-order-only mode.
The data needs to be contiguous, otherwise `grad_variable.data() += new_grad.data();` below will fail.
Only the last variable that holds a reference to the temp gradient will grab its buffer.

ngimel, mcarilli, and mruberry helped finalize this PR.
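
A minimal sketch of the usage pattern the numbers above rely on (illustrative sizes):

```python
import torch

model = torch.nn.Linear(1000, 1000)
inputs = torch.randn(16, 1000)

for param in model.parameters():
    param.grad = None    # instead of optimizer.zero_grad()

loss = model(inputs).sum()
loss.backward()          # with this change, the first gradient buffer is
                         # handed over directly rather than cloned
```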
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11165

Differential Revision: D9728874

Pulled By: soumith

fbshipit-source-id: b8fb822a2dff6e812bbddd215d8e384534b2fd78
2018-09-07 19:41:37 -07:00
c49b01a8a0 Change default variants to 'function'. (#11247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11247

Previously, the default for a declaration in native_functions.yaml
was ['function', 'method'], i.e., generate both a method and
function for every binding.  We now believe this is inappropriate:
the majority of new kernels added to PyTorch should live as
free functions, NOT methods.  Thus, we change the default accordingly.

I also took the opportunity to de-method some "internal" functions
that had a leading underscore.  While, strictly speaking, this is a
BC breaking change, I believe it is highly unlikely anyone was using
these directly.

Reviewed By: yf225

Differential Revision: D9648570

fbshipit-source-id: 8b94647b824e0899d6d18aa5585aaedc9d9957d2
2018-09-07 17:56:08 -07:00
fa522d1aed Revert D9720931: [pytorch][PR] [third-party] Update googletest to release-1.8.1
Differential Revision:
D9720931

Original commit changeset: 18a60d0409e7

fbshipit-source-id: a05dcba71277eb4f8ac38886f307d6cf6e6955a9
2018-09-07 17:42:03 -07:00
c9843bd86b Update googletest to release-1.8.1 (#11388)
Summary:
This is mainly to pick up the change 20074be19a to avoid polluting the CMAKE_DEBUG_POSTFIX variable. cc orionr .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11388

Reviewed By: orionr

Differential Revision: D9720931

Pulled By: Yangqing

fbshipit-source-id: 18a60d0409e74316f74d364f4fe16bf0d0198413
2018-09-07 16:56:16 -07:00
31d36b1d31 move complex registration test out-of-line (#11397)
Summary:
Moves the code for the complex registration code into an out-of-line C++ extension to de-noise the test_cpp_extensions.py file. Let's keep it nice and tidy so we can point our users at it for usage examples.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11397

Differential Revision: D9725335

Pulled By: goldsborough

fbshipit-source-id: 290618f2ee711b1895cdb8f05276034dfe315c6d
2018-09-07 16:56:14 -07:00
4ae16c9ad9 Recursive descent for validation + convert expands in ATen fallback (#11356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11356

Differential Revision: D9721002

Pulled By: jamesr66a

fbshipit-source-id: eeb50b56f8a72e929860c5e459a5ab50ac624814
2018-09-07 16:39:36 -07:00
4c8cc36e34 Fix igios build (#11392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11392

Fix igios build

Reviewed By: houseroad

Differential Revision: D9720833

fbshipit-source-id: 33acc3c658c22addd4bad142433824076233e901
2018-09-07 15:55:23 -07:00
4bf5fc44c8 Fix split_size test failures (#11051)
Summary:
~~This PR fixes #8525 by renaming `split_with_sizes` to `split` so that 2 `aten::split` ops are
generated (previously `aten::split(self, int, int)` and `aten::split_with_sizes(self, int[], int)` were generated)~~

~~`split_with_sizes` was made in PR #5443, but I don't see a reason for it to have
a different name than `split` rather than just overload `split`.~~

This PR fixes #8525 by adding `register_special_ops.cpp` to mirror Python dispatching from `split` to `split` and `split_with_sizes` in [tensor.py](https://github.com/pytorch/pytorch/blob/master/torch/tensor.py#L279).

It also fixes #8520 by adding an `int[]` wherever it sees `torch.Size`

In a follow up PR this could also be used to fix some of the other `unknown builtin op` test errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11051

Differential Revision: D9582443

Pulled By: driazati

fbshipit-source-id: d27201f85937d72e45e851eaa1460dd3dd1b61a9
2018-09-07 15:39:24 -07:00
9886ebeb24 Remove hardcoded system path from CMAKE_MODULE_PATH (#11386)
Summary:
This seems to be causing different versions of OpenMPI to be picked up
by different parts of the build. It's not good practice to include absolute
paths anyway, so let's try removing it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11386

Reviewed By: teng-li

Differential Revision: D9724349

Pulled By: pietern

fbshipit-source-id: 3dfef91c81f2e97e5125284aff9e7e98f8761917
2018-09-07 15:25:38 -07:00
802d21c8f4 Remove FULL_CAFFE2 flag (#11321)
Summary:
Continuing pjh5's work to remove FULL_CAFFE2 flag completely.

With these changes you'll be able to also do something like

```
NO_TEST=1 python setup.py build_deps
```
and this will skip building tests in caffe2, aten, and c10d. By default the tests are built.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321

Reviewed By: mingzhe09088

Differential Revision: D9694950

Pulled By: orionr

fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8
2018-09-07 15:09:44 -07:00
93da5a21c9 Update variable view note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11393

Differential Revision: D9725444

Pulled By: SsnL

fbshipit-source-id: b1607d986ab93e64b0b0ff9e8f10d9e3f6e2160e
2018-09-07 15:09:43 -07:00
77b6d7d255 Doc improvements (#11347)
Summary:
1. Remove cudnn* symbols from C++ docs
2. Fix code examples for `nn::Module` and `jit::compile`
3. Document Dropout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11347

Differential Revision: D9716751

Pulled By: goldsborough

fbshipit-source-id: e0566cec35848335cac3eb9196cb244bb0c8fa45
2018-09-07 14:39:36 -07:00
7de0332e10 Add initial documentation for JIT (#11357)
Summary:
In addition to documentation, this cleans up a few error message formats.
It also adds infra to automatically find which operators are supported by the JIT, which is then used in the generation of the docs.

The wording and formatting of the docs is not yet polished, but having this will allow our document writers to make faster progress.

Followup PRs will polish the docs and fix formatting issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11357

Differential Revision: D9721277

Pulled By: zdevito

fbshipit-source-id: 153a0d5be1efb314511bcfc0cec48643d78ea48b
2018-09-07 14:27:47 -07:00
69b4b45f91 enable missing nn tests with single grad check, minor refactor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11366

Differential Revision: D9723305

Pulled By: wanchaol

fbshipit-source-id: 9e7e2e7e68cb4919610bccfbf76fa33b647f6eb7
2018-09-07 14:27:46 -07:00
576807ce1a flaky test fix trial (#11391)
Summary:
Add a barrier() to wait for all process groups to be created before destroying them
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11391

Differential Revision: D9727383

Pulled By: teng-li

fbshipit-source-id: 689d62c978e642b68f4949dcf29982e34869ada4
2018-09-07 14:10:06 -07:00
e9da2dd3cc Do not use PERSISTENT cudnn mode for spatialBN (#11382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11382

We found a cudnn bug in S163230 that causes accuracy loss. We fixed this in D9601217, but due to the reimplementation of spatialBN the fix was overwritten. Let's land this fix again.

Reviewed By: kuttas

Differential Revision: D9702347

fbshipit-source-id: 11547e9edaf7b2ba7f4aa7263ffb4f0281bbf078
2018-09-07 13:41:18 -07:00
01930a3145 Move sync_params to C++ (#9805)
Summary:
The next function I'm moving to C++ is `sync_params`. It is stacked on top of https://github.com/pytorch/pytorch/pull/9729, so some changes will go away when it lands and I rebase.

I also split code into a `.h` and `.cpp` file for better code organization.

pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9805

Differential Revision: D9688604

Pulled By: goldsborough

fbshipit-source-id: 4467104d3f9e2354425503b9e4edbd59603e20a8
2018-09-07 12:56:40 -07:00
ba6f10343b update CUDAExtension doc (#11370)
Summary:
fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11370

Differential Revision: D9701777

Pulled By: soumith

fbshipit-source-id: 9f3986cf30ae0491e79ca4933c675a99d6078982
2018-09-07 12:56:38 -07:00
733402bef4 Fix issues with certain heterogeneous types in lists during tensor creation (#11377)
Summary:
Closes #9963
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11377

Differential Revision: D9701824

Pulled By: soumith

fbshipit-source-id: 89c5448fd90ece1b365dc42f775b6b0c73ce790c
2018-09-07 12:56:35 -07:00
5e400e9cae move context_base.h to ATen/core (#11336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11336

Move `context_base.h` header to `ATen/core` and the implementations are in `caffe2/core/context_base.cc`

Reviewed By: ezyang

Differential Revision: D9670493

fbshipit-source-id: ce5bf2b3b4c80e9b62819f4332ce68af82720055
2018-09-07 12:20:25 -07:00
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
e80f7e1f64 Fix more warnings (#11320)
Summary:
also a missing space in fft error message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11320

Differential Revision: D9676012

Pulled By: SsnL

fbshipit-source-id: a636e5fce042198510c8e456fa51fde714da8348
2018-09-07 11:26:58 -07:00
91089a7e17 Add GPU implementation of pdist (#11102)
Summary:
Add the gpu kernel version.

The parallelism I went with performs poorly when there is a large number of vectors that are all short, as I don't allocate the thread pool to wrap in that case.

Test Plan
---------
```
python -m unittest test_torch.TestTorch.test_pdist_{empty,scipy} test_nn.TestNN.test_pdist{,_zeros,_empty_row,_empty_col,_cpu_gradgrad_unimplemented,_cuda_gradgrad_unimplemented} test_jit.TestJitGenerated.test_nn_pdist
```

Current performance specs are a little underwhelming; I'm in the process of debugging.

size | torch | torch cuda | scipy
-----|-------|------------|------
16 x 16 | 9.13 µs ± 3.55 µs | 9.86 µs ± 81.5 ns | 15.8 µs ± 1.2 µs
16 x 1024 | 15 µs ± 224 ns | 9.48 µs ± 88.7 ns | 88.7 µs ± 8.83 µs
1024 x 16 | 852 µs ± 6.03 µs | 7.84 ms ± 6.22 µs | 4.7 ms ± 166 µs
1024 x 1024 | 34.1 ms ± 803 µs | 11.5 ms ± 6.24 µs | 273 ms ± 6.7 ms
2048 x 2048 | 261 ms ± 3.5 ms | 77.5 ms ± 41.5 µs | 2.5 s ± 97.6 ms
4096 x 4096 | 2.37 s ± 154 ms | 636 ms ± 2.97 µs | 25.9 s ± 394 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11102

Differential Revision: D9697305

Pulled By: erikbrinkman

fbshipit-source-id: 2b4f4b816c02b3715a85d8db3f4e77479d19bb99
2018-09-07 09:09:46 -07:00
110191e5c7 Remove detach from TensorImpl, handle via Type. (#11337)
Summary:
This is so that TensorImpl does not have to depend on Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11337

Differential Revision: D9684421

Pulled By: gchanan

fbshipit-source-id: d2af93420ca6d493429c251cfe5a34e9289c4484
2018-09-07 08:55:59 -07:00
52b37d8b66 Move VariableHooksInterface to ATen/core (#11273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11273

This one might strike you as a bit surprising, but it's necessary
to expose this interface in ATen/core, because we need to be
able to get a true Variable type from Variable tensors, and
to do that we need to go through the hooks interface.

Reviewed By: gchanan

Differential Revision: D9656548

fbshipit-source-id: 28bb5aee6ac304e8cd5fa1e4c65452c336647161
2018-09-07 08:11:53 -07:00
396e64fff7 Move ATen/Registry.h to ATen/core/Registry.h (#11270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11270

Still need to deduplicate this with caffe2/core/registry.h,
but this will be a bit tricky because the current formulation
of the macro is namespace sensitive (i.e., the macro for classes
defined in at:: namespace won't work if you call from caffe2::
namespace).

Reviewed By: gchanan

Differential Revision: D9654871

fbshipit-source-id: 2207d1f2cc6d50bd41bf64ce0eb0b8523b05d9d9
2018-09-07 08:11:52 -07:00
b02b125d16 Rename getMaybeVariableType back to getType. (#11250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11250

```
codemod -d . --extensions cc,cpp,cu,cuh,h getMaybeVariableType getType
```

Reviewed By: gchanan

Differential Revision: D9648830

fbshipit-source-id: 6b2ac2b1c265ae47722390e6e7f106653077d851
2018-09-07 08:11:50 -07:00
68371b6d2e fast code path when partition=1 which makes LengthsPartition a simple copy (#11351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11351

When partitions == 1 (InputSize() == OutputSize()), LengthsPartition becomes just a copy.

Reviewed By: aazzolini

Differential Revision: D9693409

fbshipit-source-id: a9ea034d227af357b661477ab779a71600f58f58
2018-09-07 08:11:49 -07:00
da4ebc2971 Switch SVD on CPU from gesvd to gesdd (#11194)
Summary:
- Added a note to the doc string for `svd`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11194

Differential Revision: D9683250

Pulled By: soumith

fbshipit-source-id: 2d2c120be346122afa333629c0516a5c9dbb406f
2018-09-07 07:39:57 -07:00
f9595e756e typo/grammar fixes (#11344)
Summary:
Fixes some minor grammar issues in the code base.

PS: I was actually looking for the following one but couldn't find it via grepping in this repo:

![screen shot 2018-09-06 at 3 27 39 pm](https://user-images.githubusercontent.com/5618407/45184280-1e16a980-b1ec-11e8-9cb1-87a96738bdd1.png)

Any idea in which file this issue is raised?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11344

Differential Revision: D9696454

Pulled By: soumith

fbshipit-source-id: 8ffe494b1bf1efb0e35563381d9da2e1e8032a3c
2018-09-06 21:57:14 -07:00
a2afad2b69 Improves ATen CUDAEvent (#11293)
Summary:
After submitting PR #9726, PR #10581 created a different CUDAEvent class. The CUDAEvent proposed in #9726 was similar to the c10d::CUDAEvent class with additional testing and functionality. In particular, it was movable but not copyable. The CUDAEvent created by #10581 is refcounted and copyable. This PR retains the refcounting of the latter PR while fixing several bugs, adding tests, and extending the functionality to support testing and usage like in PR #8354. In particular, this PR:

- Adds set_device() to CUDAContext
- Adds three CUDAEvent tests to stream_test.cpp
- Fixes three bugs:
- Refcounting was broken. Destroying any of the RAIIs holding a particular CUDAEvent would destroy the event UNLESS it was the last RAII (the check was backwards).
- Moving an event would cause a segfault.
- Events were not destroyed on the device they were created on. See PR #9415 (pietern)
- Adds the happened() and recordOnce() functions
- Changes the record() functions to not be const
- Adds additional assertions to verify correctness

This PR does not:

- Make c10d use the ATen CUDAEvent (this is appropriate for a separate PR)

Whether events should be refcounted is an interesting question. It adds some atomic operations and makes event creation eager. Making events movable but not copyable (like the c10d events) avoids these costs and allows events to be lazily constructed. Lazy construction is preferable when working with containers (like std::array or std::vector) and because the event's device can be set automatically to the first stream it's recorded on. With eager construction the user is required to understand that events have a device and acquire the device of the stream the event will be recorded on upfront. This can be seen here:

542aadd9a7/aten/src/ATen/native/cudnn/RNN.cpp (L1130-L1132)

and that file is the only one which currently uses the ATen CUDAEvent.

Refcounting does allow single writer multi-reader scenarios, although these scenarios can be also be supported by providing indirect access to the underlying CUDAEvent. I believe all current and planned usage scenarios do not require refcounting, and if desired I can update this PR to remove refcounting and make the ATen event movable but not copyable like the c10d event. I think not refcounting is preferable because it can improve performance, ease usability, and simplify the code (as seen with two of the above bugs).

I have decided to separate this from PR #8354 since while it's required for PR #8354 the changes are, clearly, of independent interest. PR #8354 has a new dependency on this one, however. I am closing PR #9726 in favor of this PR.

apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11293

Differential Revision: D9665836

Pulled By: soumith

fbshipit-source-id: a1513fa4f9761e2f304d126e402f6b6950e1c1d2
2018-09-06 21:39:44 -07:00
b3b1e7624d Optional expand=True kwarg in distribution.enumerate_support (#11231)
Summary:
This adds an optional `expand=True` kwarg to the `distribution.enumerate_support()` method, to get a distribution's support without expanding the values over the distribution's `batch_shape`.
 - The default `expand=True` preserves the current behavior, whereas `expand=False` collapses the batch dimensions.

e.g.
```python
In [47]: d = dist.OneHotCategorical(torch.ones(3, 5) * 0.5)

In [48]: d.batch_shape
Out[48]: torch.Size([3])

In [49]: d.enumerate_support()
Out[49]:
tensor([[[1., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0.]],

        [[0., 1., 0., 0., 0.],
         [0., 1., 0., 0., 0.],
         [0., 1., 0., 0., 0.]],

        [[0., 0., 1., 0., 0.],
         [0., 0., 1., 0., 0.],
         [0., 0., 1., 0., 0.]],

        [[0., 0., 0., 1., 0.],
         [0., 0., 0., 1., 0.],
         [0., 0., 0., 1., 0.]],

        [[0., 0., 0., 0., 1.],
         [0., 0., 0., 0., 1.],
         [0., 0., 0., 0., 1.]]])

In [50]: d.enumerate_support().shape
Out[50]: torch.Size([5, 3, 5])

In [51]: d.enumerate_support(expand=False)
Out[51]:
tensor([[[1., 0., 0., 0., 0.]],

        [[0., 1., 0., 0., 0.]],

        [[0., 0., 1., 0., 0.]],

        [[0., 0., 0., 1., 0.]],

        [[0., 0., 0., 0., 1.]]])

In [52]: d.enumerate_support(expand=False).shape
Out[52]: torch.Size([5, 1, 5])
```

**Motivation:**
 - Currently `enumerate_support` builds up tensors of size `support + batch_shape + event_shape`, but the values are *repeated* over the `batch_shape` (adding little in the way of information). This can lead to expensive matrix operations over large tensors when `batch_shape` is large (see, example above), often leading to OOM issues. We use `expand=False` in Pyro for message passing inference. e.g. when enumerating over the state space in a Hidden Markov Model. This creates sparse tensors that capture the markov dependence, and allows for the possibility of using optimized matrix operations over these sparse tensors. `expand=True`, on the other hand, will create tensors that scale exponentially in size with the length of the Markov chain.
 - We have been using this in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py) of `torch.distributions` in Pyro. The interface has been stable, and it is already being used in a few Pyro algorithms. We think that this is more broadly applicable and will be of interest to the larger distributions community.

cc. apaszke, fritzo, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11231

Differential Revision: D9696290

Pulled By: soumith

fbshipit-source-id: c556f8ff374092e8366897ebe3f3b349538d9318
2018-09-06 21:39:42 -07:00
c59c1a25b2 diagnose option: get_entry to print a whole row (#11308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11299

Reviewed By: xianjiec

Differential Revision: D9652844

fbshipit-source-id: 650d550317bfbed0c1f25ae7d74286cfc7c3ac70
2018-09-06 21:26:30 -07:00
2946b021e3 Disable flaky test, see #11360 (#11361)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11361

Reviewed By: yf225

Differential Revision: D9696524

Pulled By: ezyang

fbshipit-source-id: f6801d6f4f34090d467b16810db9cf576d5d519b
2018-09-06 20:40:00 -07:00
3149a72c63 Move TensorOptions.cpp to the correct place in ATen/core (#11244)
Summary:
This actually ended up being a lot more involved than I thought. The basic
problem is that in some of our build environments, thread local state is not
supported. The correct way to test if this is the case is using the
(undocumented) CAFFE2_FB_LIMITED_MOBILE_CAPABILITY macro.

On mobile, OptionGuard is not available, and you have to do everything
by hand. There's a static_assert to check if you accidentally use
OptionGuard in this case and to give you a better error message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11244

Reviewed By: gchanan

Differential Revision: D9646190

fbshipit-source-id: cf4016f79b47705a96ee9b6142eb34c95abb2bd4
2018-09-06 20:11:39 -07:00
c45607f77f Static assert GetMutable is not passed with Tensor argument (#11323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11323

If you do pass it this, you'll get a pointer to
UndefinedTensor; probably not what you want!

Reviewed By: Yangqing

Differential Revision: D9676205

fbshipit-source-id: 0bd3c22c2c40ac2958f95fc7a73b908af291cf22
2018-09-06 20:11:37 -07:00
0f419abf40 Roll nomnigraph build into caffe2 (#11303)
Summary:
We need to remove nomnigraph from the list of public libraries in order to support libtorch extensions. Easiest way to do this is to include it into the Caffe2 source like all other caffe2/core/ code.

However, because the headers are in a different place, we need to include them for linked libraries (pybind, tests, etc).

On an upside, this means that nomnigraph is now default hidden visibility too.

FYI peterjc123 xkszltl goldsborough bwasti Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11303

Reviewed By: pjh5

Differential Revision: D9694932

Pulled By: orionr

fbshipit-source-id: 5db3eb20bc5ddc873ce9151236b74663fbb33ed8
2018-09-06 19:38:09 -07:00
9de2085806 Use custom hcc/HIP, purge hcSPARSE (#11198)
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests skipping that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198

Differential Revision: D9652340

Pulled By: ezyang

fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
2018-09-06 19:38:07 -07:00
ec5404a449 Add cuda version of SpatialBNOp also optimize SpatialBN on CPU (#10888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888

Add cuda version of SpatialBNOp also optimize SpatialBN on CPU

Reviewed By: houseroad

Differential Revision: D9512435

fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
2018-09-06 18:26:13 -07:00
7726b36489 Full-fledged group testings and fixes for c10d frontend APIs (#11318)
Summary:
Fixed a few bugs that were not tested in the c10d frontend APIs, including
get_rank, get_world_size, and destroy_process_group of a given group.

These APIs are added to the CI tests.

Also added all the group related tests, including full-group, and partial groups (existing ones), since both will hit different code paths.

Also removed the experimental c10d APIs initially used in DDP, since we no longer use them anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11318

Reviewed By: pietern

Differential Revision: D9675896

Pulled By: teng-li

fbshipit-source-id: a2eac2c57933effa2d139855f786e64919a95bfc
2018-09-06 18:26:11 -07:00
1a01c75dde support gradClipping per blob in mtml (#10776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10776

as title

Reviewed By: chocjy

Differential Revision: D9458099

fbshipit-source-id: f840d4f1542e8180f41cc0732c8468fa43805ab8
2018-09-06 18:10:52 -07:00
c39216f8c4 Automatic update of fbcode/onnx to bff0b8835870c7df7762ef43498d000d2d8ffb52 (#11346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11346

Previous import was 1b09eb14c2c781fae078fa6b1c0390ba6fc0898c

Included changes:
- **[bff0b88](https://github.com/onnx/onnx/commit/bff0b88)**: Add DynamicSlice experimental op (#1377) <James Reed>
- **[91a7b8e](https://github.com/onnx/onnx/commit/91a7b8e)**: statCoverage(model) (#1246) <Akshay Chalana>
- **[36643c6](https://github.com/onnx/onnx/commit/36643c6)**: fix the doc for softmax (#1374) <Lu Fang>
- **[8c64acd](https://github.com/onnx/onnx/commit/8c64acd)**: Silence usused result warning in ONNXIFI wrapper cleanup. Fix #1344 (#1371) <Marat Dukhan>
- **[53b20f6](https://github.com/onnx/onnx/commit/53b20f6)**: Add the ability to deprecate an OpSchema (#1317) <Ryan Hill>
- **[8aec4e2](https://github.com/onnx/onnx/commit/8aec4e2)**: [Anderspapitto patch] fix the shape inference for broadcasting (#1368) <Lu Fang>

Reviewed By: jamesr66a

Differential Revision: D9691533

fbshipit-source-id: 6aff6ce04ade37182e2ffe9bc83eb86846bc722d
2018-09-06 17:39:57 -07:00
4d678790c5 enable advanced indexing with tensors (#10862)
Summary:
On the way to #10774

This PR adds advanced indexing with tensors.
The approach is to desugar advanced indexing into an at::index op.
This is exactly how normal pytorch does it.
[(I used this code as reference)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp)
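A minimal eager-mode sketch of the indexing forms this desugaring targets:

```python
import torch

x = torch.arange(12).reshape(3, 4)
idx = torch.tensor([0, 2])

# Each of these advanced-indexing expressions lowers to a single index call.
print(x[idx])       # rows 0 and 2 -> shape (2, 4)
print(x[idx, 1])    # elements (0, 1) and (2, 1) -> shape (2,)
```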

Supporting sequences is a little tricky because JIT script doesn't have
an easy way to turn arbitrary n-dimensional python lists into a tensor
(it would be easy if we supported `torch.tensor`), so that'll come
in a future PR.

cc jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10862

Differential Revision: D9659449

Pulled By: zou3519

fbshipit-source-id: 56d293720d44c0fd27909e18327ab3985ddfced6
2018-09-06 16:41:45 -07:00
148f7cc47a nomnigraph - nit - fix generated code to be consistent with style (#11343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11343

make the generated classes (OpClasses.h...) consistent with fb cpp code style

Reviewed By: yinghai

Differential Revision: D9689487

fbshipit-source-id: 450e742d2462115d1bf41b9ea88d20df0a842b2b
2018-09-06 16:27:17 -07:00
49231ab0a8 Reimplement storage slicing. (#11314)
Summary:
In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT.  In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format.  In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently.  So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices.  So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes #10120.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11314

Reviewed By: ailzhang

Differential Revision: D9671966

Pulled By: ezyang

fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
2018-09-06 16:11:59 -07:00
1d406c04ae fix comment on Cost params_bytes (#11190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11190

As discussed with Alexander Sidorov, params_bytes refers to the number of bytes we read for parameters, not the size of the parameters. The two only differ for sparse operators.

Reviewed By: mdschatz

Differential Revision: D9628635

fbshipit-source-id: 9e2aed0cf59388928dc69b8534cf254f0347c9c8
2018-09-06 15:12:22 -07:00
68613cf5a2 Windows DLL build with Caffe2 code (#11266)
Summary:
This is an experimental build on top of what orionr and mingzhe09088 built.

Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266

Reviewed By: orionr

Differential Revision: D9682942

Pulled By: Yangqing

fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3
2018-09-06 15:12:20 -07:00
34c0043aae Force third_party Eigen from setup.py (#11334)
Summary:
We shouldn't use system Eigen in any cases when building with setup.py. If people want to use system Eigen (not from third_party) they can build with CMake for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11334

Reviewed By: pjh5

Differential Revision: D9689450

Pulled By: orionr

fbshipit-source-id: baf616b9f195692942151ad201611dcfe7d927ba
2018-09-06 14:56:53 -07:00
03ca7358af Add unit test for Parallel Spatial Batch Normalization (#11098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11098

Added a test for testing CPU version across multiple devices.

Reviewed By: enosair, BIT-silence

Differential Revision: D9584520

fbshipit-source-id: 0d8c85e6d402bc7b34d5f8f16ef655ff9b61b49e
2018-09-06 14:26:56 -07:00
5712fe3297 Fix out-of-boundary conversion issue (#11338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11338

The `min_` and `max_` values of the filler are in `double` format, but when we are filling a tensor of a specific type their values can exceed the type's limits, resulting in a crash. This diff checks the type limits first and clips `min_`/`max_` if they fall outside them.

Reviewed By: highker

Differential Revision: D9684455

fbshipit-source-id: 6da98a03c57f3296abaddc7c5cfc1c836c611eb0
2018-09-06 13:39:52 -07:00
ec195129ec Adding setTimeout option in Store (#11265)
Summary:
This will allow users to set a customized timeout for the store.

Tested with my own debug print to make sure that C++ actually used the timeout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11265

Differential Revision: D9666164

Pulled By: teng-li

fbshipit-source-id: 4eb6441783da106a3fd59b95457e503e83e4640f
2018-09-06 12:55:50 -07:00
fef52cc1f8 Add resolver for 'torch' module (#10847)
Summary:
This lets you compile builtin functions from C++ without having a dependence on Python

```cpp
auto module = torch::jit::compile(R"JIT(
def my_script_method(x, y):
    return torch.relu(x) + y
)JIT");
IValue result = module->run_method("my_script_method", 1, 2);
```

goldsborough zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10847

Differential Revision: D9543461

Pulled By: driazati

fbshipit-source-id: 6160dae094030ca144a0df93cb9f26aa78c8cf27
2018-09-06 12:42:21 -07:00
0f1ec07c57 nomnigraph - nit - rename unit test files (#11315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11315

Rename unit tests file to make it consistent with fb cpp style guideline "The unittest for MyFoo.cpp should be named MyFooTest.cpp."

Reviewed By: yinghai

Differential Revision: D9671519

fbshipit-source-id: 44ed6794f6e479d190916db8064eee692e3ad876
2018-09-06 12:28:18 -07:00
ed8849b640 Add include path to Doxygen preprocessing and add some documentation (#11313)
Summary:
1. Add documentation to Linear and improve documentation for RNNs
2. Fix preprocessing in C++ docs by adding correct include path
3. Make myself and ebetica codeowner of docs/cpp to improve development speed

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11313

Differential Revision: D9683615

Pulled By: goldsborough

fbshipit-source-id: 84ea32f9ea6b4060744aabbf5db368776a30f0b5
2018-09-06 12:28:17 -07:00
f98bd53b01 Small fix to the UniformIntFill tensor shape and type inference.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11028

Reviewed By: salexspb

Differential Revision: D7715107

Pulled By: costin-eseanu

fbshipit-source-id: a4f73d53c0192b9826451b4bba4ab0992abbb1a2
2018-09-06 12:11:32 -07:00
1ad61a18b2 Rename cuda tests to have 'cuda' in their names (#11332)
Summary:
Not a lot changed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11332

Differential Revision: D9683680

Pulled By: zou3519

fbshipit-source-id: 95f444e54049dd268fc10effe425ef2df79c6467
2018-09-06 11:57:52 -07:00
0ef2b318a2 fix empty net type (#11286)
Summary:
Turns out that a net.type of '' is not acceptable to CreateNet, but leaving net.type unset is acceptable.

Fix that in this diff. Also this is related to T33613083
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11286

Reviewed By: Maratyszcza, wat3rBro

Differential Revision: D9659920

Pulled By: harouwu

fbshipit-source-id: d68f24b754e18e1121f029656d885c48ab101946
2018-09-06 11:10:01 -07:00
936bba77d1 cudnn 7 upgrade with spatialBN fix (#11291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11291

In S163230, we've found that the CuDNN 7 upgrade causes an accuracy drop when training convolutional networks such as ResNeXt-101 (~0% accuracy) and video R(2+1)D (65 --> 63%).

Our current theory for this accuracy loss is that it is caused by the new "CUDNN_BATCHNORM_SPATIAL_PERSISTENT" mode in the spatialBN operator. In Caffe2, we've made this mode the default. According to the CuDNN manual (https://fburl.com/z996mr13), this mode may introduce some limitations on the input data range and cause overflow (which outputs NaN). NaN is probably not the case, because we're seeing a few percent of accuracy drop but not gradient explosion or failure. However, this "performance-optimized" code path may introduce accuracy loss (which is not caught by our unit test case because its input data range is [-0.5, 0.5]).

Reviewed By: kuttas, stephenyan1231

Differential Revision: D9601217

fbshipit-source-id: 73c2690c19cb1f02ea4e5e2200f50128df4f377b
2018-09-06 10:11:59 -07:00
4ae95738b2 Ignore FuseGraph Call on Windows (#11015)
Summary:
Fusion is not yet implemented on Windows, so ignore the FuseGraph call instead of failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11015

Differential Revision: D9619121

Pulled By: eellison

fbshipit-source-id: ad09aeaa41b7fdeb9ca7bf5e1c166923ca405b15
2018-09-06 09:54:51 -07:00
a853a74217 defer resolution of mkl to a cmake wrapper library (#11298)
Summary:
this is a fix that's needed for building extensions with a
pre-packaged pytorch. Consider the scenario where

(1) pytorch is compiled and packaged on machine A
(2) the package is downloaded and installed on machine B
(3) an extension is compiled on machine B, using the downloaded package

Before this patch, stage (1) would embed absolute paths to the system
installation of mkl into the generated Caffe2Config.cmake, leading to
failures in stage (3) if mkl was not at the same location on B as on
A. After this patch, only a reference to the wrapper library is
embedded, which is re-resolved on machine B.

We are already using a similar approach for cuda.

Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298

Differential Revision: D9683150

Pulled By: anderspapitto

fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4
2018-09-06 09:10:39 -07:00
dda8402447 Cleanup dependency of distributed flags (#11221)
Summary:
Now that we're building everything together, making all distributed flags conditional on USE_DISTRIBUTED being set.

cc pietern cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221

Reviewed By: Yangqing

Differential Revision: D9664267

Pulled By: orionr

fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73
2018-09-06 08:56:00 -07:00
68930c48cf Move minimal wrapdim functionality to core, remove THTensor include i… (#11283)
Summary:
…n TensorImpl.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11283

Reviewed By: ezyang

Differential Revision: D9660015

Pulled By: gchanan

fbshipit-source-id: 263cba226d9ee981d55281c94e6fda5842a46b02
2018-09-06 08:10:33 -07:00
f6568b00f5 Change includes from ATen/Storage.h to ATen/core/Storage.h (#11217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11217

```
codemod -d . --extensions cc,cpp,cu,cuh,h 'ATen/Storage.h' 'ATen/core/Storage.h'
```

Reviewed By: gchanan

Differential Revision: D9634904

fbshipit-source-id: 35a177733f3816e32d8748513c9caa4cf13a6896
2018-09-06 08:10:30 -07:00
656e81db93 Fix scalar tensor assert in fusion compiler (#10952)
Summary:
Fixes #8560.
Unblocks #10715.

The assert (nDim <= uncompressedDims) was being triggered for a scalar
tensor because we compute nDim to be 1 for a scalar tensor but
uncompressedDim = 0.

This PR changes it so that we compute nDim to be 0 for a scalar tensor. This
works because indexing in a kernel depends on nDim. If nDim = 0, then
offset is always 0, which is what we want.

Some other (small) changes were necessary to make this work:
- One cannot define a 0-length array `IndexType arr[0]` so the code
  guards against that
- Needed to change some of the maxTensorInfoSize logic to handle the
  case when uncompressedDim == 0.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10952

Differential Revision: D9544607

Pulled By: zou3519

fbshipit-source-id: 2b873f47e2377125e1f94eb1b310a95cda51476c
2018-09-06 07:54:57 -07:00
bb7d1837bc Add dead code elimination pass (#10101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10101

Simple DCE enabled by knowledge of the actual outputs (stacked beneath this diff)

Reviewed By: yinghai

Differential Revision: D9107853

fbshipit-source-id: 0c38fe5fe408be2b7fc9e1fe6a5b7160c06ce79b
2018-09-05 23:55:17 -07:00
220c9e52b9 Distributed Data Parallel CPU module for C10D (#11168)
Summary:
Distributed Data Parallel CPU module for c10d. This is basically the same code as Distributed Data Parallel CPU module for THD, since c10d now has the exact same front-end interface as torch.distributed.

We will keep both in the first release and remove the THD one once c10d is stable enough.

Tests fully cover it, just as for THD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11168

Differential Revision: D9674963

Pulled By: teng-li

fbshipit-source-id: ecf52a7189374ca7930c2be305218167fdd822a7
2018-09-05 21:59:31 -07:00
126ac4b71f Back out "[pt1][tensor] Add strides to caffe2::Tensor"
Summary: Original commit changeset: 3643871b70f1

Differential Revision: D9665958

fbshipit-source-id: 46e22adbf39af92fb23abb66212991bd53a86317
2018-09-05 20:39:07 -07:00
fb836db4b2 Fix conv gradient conversion (#11312)
Summary:
Fix Windows build failure after https://github.com/pytorch/pytorch/pull/10744 landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11312

Reviewed By: mingzhe09088

Differential Revision: D9669907

Pulled By: orionr

fbshipit-source-id: d717ec4f8fdf17acf334528d7838b88c5c50e9c3
2018-09-05 20:09:31 -07:00
dccd0f2de6 Bag of clang tidy fixes for torch/csrc/ and torch/csrc/autograd (#11050)
Summary:
Linting `torch/csrc/` (non-recursive) and `torch/csrc/autograd` (non-recursive).

Fixed things like:
- `typedef` vs `using`
- Use `.empty()` instead of comparing with empty string/using `.size() == 0`
- Use range for loops instead of old style loops (`modernize-`)
- Remove some `virtual` + `override`
- Replace `stdint.h` with `cstdint`
- Replace `return Type(x, y)` with `return {x, y}`
- Use boolean values (`true`/`false`)  instead of numbers (1/0)
- More ...

ezyang apaszke cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11050

Differential Revision: D9597505

Pulled By: goldsborough

fbshipit-source-id: cb0fb4793ade885a8dbf4b10484487b84c64c7f2
2018-09-05 19:55:50 -07:00
83a1ab2136 Sparse tensor printing; add NotImplemented autograd fn (#10181)
Summary:
Commits:

1. Add autograd function `NotImplemented` (subclass of `Error`) so python `grad_fn` prints nicer. Since `Error` is used in `DelayedError` to implement `oncedifferentiable`, I can't just change its name. cc colesbury

2. Add printing for sparse tensors. Fixes https://github.com/pytorch/pytorch/issues/9412. cc weiyangfb.

3. Add tests for sparse printing

Examples:
```diff
  In [2]: x = torch.sparse.FloatTensor(torch.arange(4).view(2,2), torch.randn(2, 2), [10, 10, 2])

  In [3]: x
  Out[3]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]])
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]])
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo)

  In [4]: x.requires_grad_()
  Out[4]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]], grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo, requires_grad=True)

  In [5]: x + x
  Out[5]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-2.3664, -1.1855],
-         [ 0.1662,  0.5021]], grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 3.0162,  0.6902],
+                       [-0.0785,  0.9553]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo, grad_fn=<AddBackward0>)

  In [6]: x.double()
  Out[6]:
- torch.sparse.DoubleTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]], dtype=torch.float64, grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, dtype=torch.float64, layout=torch.sparse_coo,
+        grad_fn=<NotImplemented>)

  In [7]: x = torch.sparse.FloatTensor(torch.ones(0, 2, dtype=torch.long), torch.randn(2, 0), [0])

  In [8]: x
  Out[8]:
- torch.sparse.FloatTensor of size (0,) with indices:
- tensor([], size=(0, 2), dtype=torch.int64)
- and values:
- tensor([], size=(2, 0))
+ tensor(indices=tensor([], size=(0, 2)),
+        values=tensor([], size=(2, 0)),
+        size=(0,), nnz=2, layout=torch.sparse_coo)

  In [9]: x = torch.sparse.FloatTensor(torch.ones(0, 2, dtype=torch.long), torch.randn(2), [])

  In [10]: x
  Out[10]:
- torch.sparse.FloatTensor of size () with indices:
- tensor([], size=(0, 2), dtype=torch.int64)
- and values:
- tensor([-0.0064,  0.8518])
+ tensor(indices=tensor([], size=(0, 2)),
+        values=tensor([ 0.9800, -0.5978]),
+        size=(), nnz=2, layout=torch.sparse_coo)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10181

Differential Revision: D9139845

Pulled By: SsnL

fbshipit-source-id: 353eebd55fac4049ed9bf85f8b0ee2c1418a744e
2018-09-05 19:41:22 -07:00
fa147abda4 Add convertToCaffe2Proto to python API
Summary: Closing the gap a bit on API, allowing users to go NetDef -> nomnigraph -> NetDef in python now

Reviewed By: duc0

Differential Revision: D9670495

fbshipit-source-id: 6497518ffc05a186deb0d657e06317980d39ddd5
2018-09-05 18:40:48 -07:00
425ea6b31e fix doc for functional.dropout* (#10417)
Summary:
- fixes #4177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10417

Differential Revision: D9542876

Pulled By: weiyangfb

fbshipit-source-id: 480ed973d1fe0364f4acb5cd596c2031895b82df
2018-09-05 17:26:00 -07:00
ad116210e5 typo fix Tranpose2D -> Transpose2D (#11281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11281

A simple typo fix

Reviewed By: BIT-silence

Differential Revision: D9658324

fbshipit-source-id: b6513c8d12d8fe75a9b18df1b443e9e66e692744
2018-09-05 17:25:58 -07:00
a9d8b021e9 Remove THFinalizer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11287

Reviewed By: ezyang

Differential Revision: D9662341

Pulled By: cpuhrsch

fbshipit-source-id: 306bea00694db1ae207167ee4bf10de01426911c
2018-09-05 16:56:27 -07:00
c0efe6f027 Forward declarations of needed curand functions (#10911)
Summary:
Needed for FULL_CAFFE2=1 with statically linked CUDA libraries. Waiting on advice from Nvidia
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10911

Reviewed By: pjh5

Differential Revision: D9636256

Pulled By: orionr

fbshipit-source-id: fcad7945910b6c8fb5f52e81cc87dad5fcfb3c65
2018-09-05 16:56:26 -07:00
57728f71e7 nomnigraph - simplify core graph API and test (#11256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11256

- in deleteNode method, remove optional deleteEdge flag as it's not used
- in deleteEdge method, remove optional removeRef flag as it's not used
- in replaceNode method, remove optional newHead_ parameter as it's not used - also simplifying the implementation by just calling replaceInEdges and replaceOutEdges
- remove importNode & importEdge as they're not in use
- add getEdgeIfExists that is like getEdge() but returns nullptr instead of throwing when the edge does not exist
- reduce verbosity in the basic graph unit test and add more test cases for ReplaceEdges

Differential Revision: D9650913

fbshipit-source-id: 6c18b37bef0d2abe1b57fb4fc47bfdbcee387694
2018-09-05 16:40:49 -07:00
c43187291c Small fixes to cppdocs for sync script (#11300)
Summary:
I'm setting up an automatic sync job for cppdocs and need two fixes to the cpp docs config:

1. Right now the cppdocs use the `torch` package to figure out the version. For C++ docs all I really need from the built package are the generated Tensor.h and Functions.h files. I can actually generate those directly via `aten/src/ATen/gen.py`, so I can skip building PyTorch altogether and save 10 minutes in the sync job! For this I need to avoid using the torch package in the docs.
2. Internal proxy issues prevent using the git link for sphinx_rtd_theme. We can just use the pip package for the cppdocs (not for the normal PyTorch docs)

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11300

Differential Revision: D9667193

Pulled By: goldsborough

fbshipit-source-id: 5567e0b3d3bdce03f5856babdb4ff76bcee91846
2018-09-05 16:40:47 -07:00
c9e66351a7 Port all PyTorch and Caffe2 jobs to CircleCI (#11264)
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.

Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect

Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264

Differential Revision: D9656793

Pulled By: yf225

fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
2018-09-05 16:28:11 -07:00
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it's an `enum` with an implicit conversion to int, which offers no type safety; e.g. we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`; it has no implicit conversion to int and provides better type-safety guarantees. In this diff we have done the following refactor (taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
ac9f0a6884 refactor preproc, support dense in TumHistory layer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11131

Reviewed By: xianjiec

Differential Revision: D9358415

fbshipit-source-id: 38bf0e597e22d540d9e985ac8da730f80971d745
2018-09-05 16:10:13 -07:00
3e85685f8f add persistent rnns with conservative criteria (#11248)
Summary:
Persistent RNNs provide much better performance on V100 with half-precision input data for a variety of cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11248

Differential Revision: D9665687

Pulled By: ezyang

fbshipit-source-id: 2bd09a7eb1f5190aadb580977b0ba956e21a7dd5
2018-09-05 16:10:11 -07:00
68c2e014cb Handling for py2/py3 division differences (#11016)
Summary:
- In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if
  `from __future__ import division` is not imported in the file.
- The / operator is universally set to do "true" division for integers
- Added a `prim::FloorDiv` operator because it is used in loop unrolling.

The error if users use '/' in python 2 without importing from __future__
occurs when building the JIT AST.
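A small plain-Python sketch of the rule described above:

```python
from __future__ import division  # required in Python 2; a no-op in Python 3

def ratio(x, y):
    return x / y     # "true" division: integer operands yield a float

print(ratio(7, 2))   # 3.5 under both Python 2 and Python 3
print(7 // 2)        # 3, explicit floor division (cf. prim::FloorDiv)
```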

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016

Differential Revision: D9613527

Pulled By: zou3519

fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6
2018-09-05 14:57:38 -07:00
9a0effb92c Update send/recv tests to reflect intended use (#11275)
Summary:
The existing tests had every rank run send to every other rank and only
then switch to recv mode. This only works if the send operations are
non-blocking and the passed tensors are immediately copied to some kind
of send buffer. Instead, every send must be matched with a recv on the
other side, because from the API perspective they may block.

E.g. imagine a 1GB tensor being sent to every other rank. It can only go
through if there is a recv on the other side, or it will deadlock.

This change reflects this in the send/recv unit tests.
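A minimal sketch of the matched pattern, assuming an already-initialized two-rank process group (e.g. via `dist.init_process_group`):

```python
import torch
import torch.distributed as dist

# Each send is paired with a recv on the peer, so neither side can
# deadlock even if send blocks until the message is consumed.
t = torch.ones(4)
if dist.get_rank() == 0:
    dist.send(t, dst=1)
    dist.recv(t, src=1)
else:
    dist.recv(t, src=0)
    dist.send(t, dst=0)
```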
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11275

Differential Revision: D9658197

Pulled By: pietern

fbshipit-source-id: fb6a3fc03b42343a9dfeed0def30d94914e76974
2018-09-05 14:40:04 -07:00
8da081f7a5 Add cost inference to ConvGradient and WeightedSum operators (#10744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10744

As title

Reviewed By: jspark1105

Differential Revision: D9436387

fbshipit-source-id: 578b7a6d98843d57e3f8f4c564727e9cadbedd78
2018-09-05 13:56:05 -07:00
4fe3356ee0 Move collapse dims into a single place (#11272)
Summary:
Deduplicates implementations and reduces sources of failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11272

Differential Revision: D9659167

Pulled By: cpuhrsch

fbshipit-source-id: 759bfba4fd90795038afe684d9829f5f41f98109
2018-09-05 12:57:00 -07:00
5e2067ce30 Fix some more warnings (#11257)
Summary:
Found these when compiling the new master with gcc 7.3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11257

Differential Revision: D9656612

Pulled By: SsnL

fbshipit-source-id: 7acb19e13204c010238dab7bc6973cc97b96f9a4
2018-09-05 11:10:27 -07:00
f866574afc Fix the batchnorm onnx exporting when affine=False
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11249

Reviewed By: Ac2zoom

Differential Revision: D9652526

Pulled By: houseroad

fbshipit-source-id: 12a9038beddd227a2f9e2178edf4e8d623488c3e
2018-09-05 11:10:25 -07:00
55212507a2 Improve error message to include return types too (#11245)
Summary:
Fixes #11057.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11245

Differential Revision: D9652698

Pulled By: apaszke

fbshipit-source-id: 4c5006e32e599c35367aa5acfae45de3ab8ac176
2018-09-05 10:56:51 -07:00
e6d6aed12e Check doxygen output in travis (#11124)
Summary:
This PR adds a .travis.yml check for our C++ documentation. The goal is to avoid any documentation/comments in our C++ code that would break the doxygen output and possibly ruin the C++ documentation site (currently https://pytorch.org/cppdocs).

For this, we:
1. Run doxygen and record any warnings,
2. Filter out some known bogus warnings,
3. Count the remaining warnings,
4. Fail the check if (3) is non-zero.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11124

Differential Revision: D9651011

Pulled By: goldsborough

fbshipit-source-id: 30f776d23bb6d6c482c54db32828b4b99547e87b
2018-09-05 10:25:56 -07:00
267e1ec112 Accept more numpy scalars as doubles (#9659)
Summary:
Allows multiplication of e.g. numpy.float32 with tensors.
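A quick sketch of what this enables:

```python
import numpy as np
import torch

t = torch.ones(3)
print(np.float32(2.0) * t)   # tensor([2., 2., 2.]); the numpy scalar is
                             # accepted where a Python number is expected
```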

This came up with #9468

If you want this and after the other patch is done, I'll add tests (but that would be conflicting, so I prefer to wait).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9659

Differential Revision: D8948078

Pulled By: weiyangfb

fbshipit-source-id: c7dcc57b63e2f100df837f70e1299395692f1a1b
2018-09-05 10:25:55 -07:00
8bd80a6b74 Fixed log message (#10874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10874

Fixes the log message "WARNING:data_workers:Warning, data loading lagging behind: name=0", where the size of a queue was reported instead of the source name.

Reviewed By: panshen1, Novitial

Differential Revision: D9506606

fbshipit-source-id: 03717cfa9b991afb335ef877378afa3b52fd8f22
2018-09-05 09:55:52 -07:00
434e943b08 Fix to distribution.__repr__ with lazy attributes (#11263)
Summary:
`__repr__` currently fails for distributions with lazy attributes in PyTorch master, throwing a `KeyError`. This fixes the issue.
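A minimal repro sketch; constructing through `logits` leaves `probs` as a lazy attribute:

```python
import torch
import torch.distributions as dist

d = dist.Bernoulli(logits=torch.tensor([-1.0, 2.0]))
print(d)   # previously raised KeyError; now shows the logits parametrization
```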

**Additionally:**
 - Added `logits` to `arg_constraints` for distributions that accept either `probs` or `logits`. This is both to have `__repr__` display the `logits` param when available, and to be able to do validation checks (e.g. NaN checks) when the logit parametrization is used. fritzo, alicanb - I think there were reasons why we had not done so in the first place, but I am unable to recall now. It passes all the tests, but let me know if there is something that I am missing at the moment.
 - There are certain distributions, e.g. `OneHotCategorical` which won't show any parameters because it uses a `categorical` instance under the hood and neither `logits` / `probs` in `arg_constraints` are present in the instance's `__dict__`. This isn't addressed in this PR.

cc. vishwakftw, fritzo, nadavbh12, apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11263

Differential Revision: D9654959

Pulled By: apaszke

fbshipit-source-id: 16f5b20243fe8e2c13e9c528050d4df0b8ea6e45
2018-09-05 09:55:51 -07:00
9fc22cb772 Add import export step to end to end tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10717

Differential Revision: D9562888

Pulled By: li-roy

fbshipit-source-id: 8f5d62fd0a44aca0a41dc10438e7bb91cc2a972a
2018-09-05 09:39:47 -07:00
1808e368e4 Add complex hooks for out of tree complex implementation. (#11216)
Summary:
This PR adds a hooks interface for registering types for complex
scalar types, and a sample implementation of the hook in
test_cpp_extensions.

The hook registration is patterned off of the existing CUDA hooks.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11216

Differential Revision: D9654840

Pulled By: ezyang

fbshipit-source-id: 7b97646280d584f8ed6e14ee10a4abcd04cf2987
2018-09-05 09:25:50 -07:00
aeb6094538 Unify opt flag for cmake codegen (#11227)
Summary:
Also enables debug for non-MSVC for kernel codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11227

Differential Revision: D9656506

Pulled By: cpuhrsch

fbshipit-source-id: 667195cb55de1a1a9042b6b1c4436e9c6c743333
2018-09-05 08:55:49 -07:00
d612855b91 nomnigraph - fix memory error in NN subgraph matchOp (#11127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11127

it's invalid to capture `predicate` by reference as it's a local variable. capture it by value instead.

Differential Revision: D9600115

fbshipit-source-id: 92e0130d0a74908380b75ade5c3492df49e25941
2018-09-05 07:57:40 -07:00
6d6655e6be Port PackedSequences functions to C++ (#11224)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11224

Differential Revision: D9652703

Pulled By: apaszke

fbshipit-source-id: 558e39457e590cad07516e5bb2ecb12789564950
2018-09-05 06:35:15 -07:00
b7038f7c37 Treat numerical differences as warnings instead of errors when tracing (#11246)
Summary:
Also, make `torch.isclose` work with integral tensors and refactor `_check_trace` a bit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11246

Differential Revision: D9652701

Pulled By: apaszke

fbshipit-source-id: fb0bdbfd1952e45e153541e4d471b423a5659f25
2018-09-05 06:35:13 -07:00
b7cd4b692c add a Float16UniformFill (#11123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11123

This adds an operator that fills a tensor with uniform(min, max) samples. The implementation uses the fp32 generator and converts to fp16.

If performance becomes an issue we could resort to intrinsics.

Reviewed By: jspark1105, chocjy

Differential Revision: D9598142

fbshipit-source-id: 5aeab99acf7c3596fa6c33611d9d2c484f7c1145
2018-09-04 23:28:22 -07:00
d4060d2d0e Implement torch.tensordot (#10025)
Summary:
Fixes: #8988
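A quick usage sketch (shapes chosen for illustration; the semantics follow `numpy.tensordot`):

```python
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(4, 5, 6)

# Contract dims (1, 2) of `a` against dims (0, 1) of `b`.
c = torch.tensordot(a, b, dims=([1, 2], [0, 1]))
print(c.shape)   # torch.Size([3, 6])
```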
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10025

Reviewed By: ezyang

Differential Revision: D9540967

Pulled By: yf225

fbshipit-source-id: 6ba2a7777162983977db884b693e6f4543b31aeb
2018-09-04 21:10:07 -07:00
d1b920b44f keep net type info when generating model complete net (#11032)
Summary:
Keep net type info when generating the complete model net. This preserves the performance optimization option.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11032

Reviewed By: wat3rBro

Differential Revision: D9564125

Pulled By: harouwu

fbshipit-source-id: c6546af9b1d4ff5eddf6124e24a5da1b8baf47df
2018-09-04 21:10:06 -07:00
56bdd87b40 Get rid of some uses of type() (#11215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11215

I found these by deleting the implicit conversion of Type to
TensorOptions and then fixing sites.  This isn't a complete
refactor, because I ran out of steam after fixing this many
and decided to keep the implicit conversion.  Still, why
waste a perfectly good refactor?

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9634750

fbshipit-source-id: 4d8fb778e13e6e24b888b1314a02709b2cb00b62
2018-09-04 20:26:22 -07:00
9ca63c5e63 Reorganize methods in Type, add CPUTypeDefault/CUDATypeDefault (#11205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11205

Our short term plan for supporting out of tree complex development requires an
external library to add a custom subclass of Type without access to the
code generation facilities in ATen.  This commit reorganizes Type so
as to minimize the amount of boilerplate you have to write when making
a subclass of Type.

In particular, it:
- Creates a new CPUTypeDefault/CUDATypeDefault class, which you are
  intended to inherit from, which provides default implementations
  of CPU/CUDA that is layout/dtype agnostic.
- Adds new getCPUAllocator() and getCUDAAllocator() functions, as
  a more public API to get your hands on Allocator
- Adds allocator() and getDeviceFromPtr(), abstracting the device
  specific parts of storage() methods; these methods are now
  implemented in base TypeDefault.
- Delete the static typeString() method, which is now dead.
- Move is_cuda/is_sparse/is_distributed to TypeDefault.

Reviewed By: SsnL

Differential Revision: D9631619

fbshipit-source-id: 40b600d99691230e36e03eb56434c351cbc2aa3a
2018-09-04 20:26:20 -07:00
f0d3fda064 Improve docs for torch::nn::Module (#11115)
Summary:
Added some documentation. Will rebuild docs to make sure it looks good. Can already accept approvals.

ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11115

Differential Revision: D9597880

Pulled By: goldsborough

fbshipit-source-id: 56b701da631702ba56e281a0de0f7ebe490f5c5a
2018-09-04 18:10:38 -07:00
7f74875304 Pull Context out of TensorMethods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11241

Reviewed By: ezyang

Differential Revision: D9645514

Pulled By: gchanan

fbshipit-source-id: 43e65d1d2fa3183264ed7e4752c1512df5f69175
2018-09-04 18:10:37 -07:00
05cb40dc00 Move some includes from Tensor/Type to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11234

Reviewed By: ezyang

Differential Revision: D9642669

Pulled By: gchanan

fbshipit-source-id: 2c131bb46b54a0803c37b444ad48d861080056f1
2018-09-04 18:10:34 -07:00
c8672f0b42 Support environments with no libprotobuf (#11161)
Summary:
Just pulling this out of https://github.com/pytorch/pytorch/pull/10611

Make sure we can support environments where libprotobuf is not installed when we link protobuf locally.

cc goldsborough Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11161

Differential Revision: D9650282

Pulled By: orionr

fbshipit-source-id: 447b5e54cd2639973b4b10f58590d1c693a988d4
2018-09-04 17:27:54 -07:00
020501b7b0 Getting rid of USE_C10D for build (#11237)
Summary:
Will use USE_DISTRIBUTED for both c10d and THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237

Differential Revision: D9647825

Pulled By: teng-li

fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969
2018-09-04 17:27:53 -07:00
313e89d8db Fix dimension collapsing (#11226)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11226

Differential Revision: D9646638

Pulled By: cpuhrsch

fbshipit-source-id: 104f367f75a4478bb7580324ea3661de71b2c8b0
2018-09-04 17:27:52 -07:00
6219c4a28f Make Scalar::toTensor a free function, move Scalar to ATen/core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11125

Reviewed By: ezyang

Differential Revision: D9599798

Pulled By: gchanan

fbshipit-source-id: 2fec682c109013a82788dfba13f4d30b2945d3f4
2018-09-04 16:25:57 -07:00
033499cf56 Remove mention of USE_DISTRIBUTED_MW (#11240)
Summary:
This was lingering after #10731.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11240

Differential Revision: D9645437

Pulled By: pietern

fbshipit-source-id: d02c33354b094be3bb0872cf54a45721e20c4e7d
2018-09-04 16:10:20 -07:00
3f30c296d3 Export CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_* (#11233)
Summary:
This PR resolved the following compilation errors on devgpu:
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_Tan()'
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_MaxPool3D()'
....

The same error had been happening with the caffe2 debug-mode build before build_caffe2 was removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11233

Reviewed By: orionr

Differential Revision: D9645527

Pulled By: mingzhe09088

fbshipit-source-id: 68a45aa7fd815cac41b7fd64cfd9838b3226345a
2018-09-04 14:56:43 -07:00
7e0a052a5d Adding synthetic data generation to the filler.h file (#11060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11060

Adding synthetic data generation to the filler.h file (the exact distribution to be replaced later on).

Reviewed By: highker

Differential Revision: D9417594

fbshipit-source-id: 5d66dfbcb254a5961c36b7d3a081332c7372dac7
2018-09-04 13:40:53 -07:00
1eed7d5f0b Report an error when trying to record a mutable operator when (#11129)
Summary:
there are multiple views of the tensor live.

Also adds recording for copy_ because this is the critical in place
op where these views will cause LHS indexing to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11129

Differential Revision: D9600195

Pulled By: zdevito

fbshipit-source-id: bfd8f5befa47377e36d704dbdb11023c608fe9a3
2018-09-04 13:40:51 -07:00
0e8088d6f6 Fix typo in data_parallel_model
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11086

Differential Revision: D9581297

fbshipit-source-id: b164177bdbb309f56ff3231c1ffc0973f6c5299b
2018-09-04 13:15:31 -07:00
ec6f0ed560 Additional Python Bindings
Summary:
Major change:
- Addition of pattern matching bindings

Minor change:
- OperatorDef instantiation
- Generic Graph API

Reviewed By: duc0

Differential Revision: D9546205

fbshipit-source-id: ab5274014be23a3e9e3fcf18ae1815c4f387b83c
2018-09-04 12:10:10 -07:00
750cd48980 update expect file for short circuiting (#11229)
Summary:
Fix failing test by updating expect file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11229

Differential Revision: D9638587

Pulled By: eellison

fbshipit-source-id: e870ef3a4fbc7e07f299cc9413703d9f77e89895
2018-09-04 11:56:09 -07:00
684b55d762 In default, use third party eigen. Added new flag USE_SYSTEM_EIGEN_INSTALL to control. (#11020)
Summary:
TSIA. apaszke pointed out that it might be better to use third party folder in default, since system Eigen may often be out of date and does not have the version we need to compile successfully.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11020

Differential Revision: D9562548

Pulled By: Yangqing

fbshipit-source-id: d8ab8a6ebe1f3d9eec638ef726cf5dc4dcf777b5
2018-09-04 10:56:22 -07:00
539579aa9a Logical short circuit (#11116)
Summary:
Adding short-circuit evaluation to AND and OR. The second expression of an AND or OR gets lifted into an if branch, which is conditionally evaluated.

BatchOps was using the expression `dims = dims1 or dims2`, where `dims1` is often an empty tensor. This now throws an error, because dims1 gets cast to a boolean, and you can't convert an empty tensor to a scalar. It now matches the behavior of PyTorch in Python.
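A small eager-mode sketch of the failure described above; the script compiler now reproduces the same error:

```python
import torch

dims1 = torch.tensor([])      # empty tensor
dims2 = torch.tensor([0, 1])

# `or` must first coerce dims1 to bool, which fails for an empty tensor;
# script now raises the same error instead of silently evaluating it.
try:
    dims = dims1 or dims2
except RuntimeError as err:
    print(err)
```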

One thing that came up: in Python, if the second expression of an and/or gets returned, it is not coerced to a boolean.

`tensor == (False or tensor)`
`tensor == (True and tensor)`

We do not currently support this.

edit: wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11116

Differential Revision: D9618168

Pulled By: eellison

fbshipit-source-id: 93b202be2f222d41f85d38d9c95f04d1749e8343
2018-09-04 09:25:13 -07:00
b2217109ec Move TensorOptions to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11147

Reviewed By: gchanan

Differential Revision: D9614321

fbshipit-source-id: 618cb342eb7c52181425f6bb9c17b9ecdb87a394
2018-09-04 08:55:54 -07:00
0ff1bb0d8a Remove Type constructor from TensorOptions, add Type::options (#11189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11189

Replaces it with an operator TensorOptions() method on
Type, reestablishing the implicit conversion.  I originally
wanted to get rid of the implicit conversion entirely, but
there were a *lot* of use-sites, so I added it back to avoid
a huge codemod.  In this patch, I only had to fix sites that
used the optional device_index API.

Reviewed By: cpuhrsch

Differential Revision: D9628281

fbshipit-source-id: 5fe2a68eefb77a3c9bb446f03a94ad723ef90210
2018-09-04 08:10:04 -07:00
0d5e4a2c66 Allow passing through arguments to unittest (#11209)
Summary:
Example:
```sh
python run_test.py -i sparse -- TestSparse.test_factory_size_check -f
```

With this, the `--verbose` option is redundant (one can call `python run_test.py -- -v` instead of `python run_test.py -v`). But since this is (probably) a frequently used flag, I didn't remove the existing easier-to-use option.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11209

Differential Revision: D9632215

Pulled By: SsnL

fbshipit-source-id: ff522802da11ef0a0714578be46e4a44f6343d44
2018-09-03 20:09:08 -07:00
050aa42e09 Fix some more compile warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11208

Differential Revision: D9632216

Pulled By: SsnL

fbshipit-source-id: b181f3ce114474e171146cd2ac5de150b0e23f75
2018-09-03 19:39:33 -07:00
cd4c32691d Add complex32, complex64 and complex128 dtypes (#11173)
Summary:
We don't generate a corresponding Type implementations for them,
so this doesn't do anything at the moment.

We don't plan on supporting complex32 in the near future, but
it is added to reserve the name and number in case we do at
some point in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11173

Reviewed By: SsnL

Differential Revision: D9627477

Pulled By: ezyang

fbshipit-source-id: f49a44ab1c92d8a33130c249ac7b234f210a65e6
2018-09-03 19:19:36 -07:00
c5b021cc88 State dict loading arguments were in the wrong order (#11200)
Summary:
In the state dict loading code, the error message referring to the shape of the loaded parameters and of the parameters in the initialised model had its format arguments in the wrong order. Swapped them round to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11200

Differential Revision: D9631160

Pulled By: SsnL

fbshipit-source-id: 03d9446303bd417fef67027b10d7a27de06486be
2018-09-03 15:42:30 -07:00
7e2136c2b5 remove allclose from test_doc skipped list
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11187

Differential Revision: D9628349

Pulled By: SsnL

fbshipit-source-id: 0ff94666542ca049a6d82091bd9fc79ec1699ac6
2018-09-03 09:39:56 -07:00
24eb5ad0c5 Fix unit tests on CI (#11191)
Summary:
Disables two of the  unit tests in test_cuda that got introduced after test_cuda was enabled that fail on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11191

Differential Revision: D9628702

Pulled By: ezyang

fbshipit-source-id: 4c298c728f42bb43d39b57967aa3e44385980265
2018-09-02 21:54:47 -07:00
0a8c8c1dbe Rename real to scalar_t. (#11163)
Summary:
This is necessary to allow us to use the complex header
which defines real (and is very sad if real is macro'ed).

We should also fix accreal, ureal, Real and REAL, but
only 'real' is the real blocker.

```
codemod -d aten/src/TH --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THC --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THCUNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11163

Reviewed By: SsnL

Differential Revision: D9619906

Pulled By: ezyang

fbshipit-source-id: 922cb3a763c0bffecbd81200c1cefc6b8ea70942
2018-09-02 15:26:01 -07:00
43fd6b234d Make Type a (mostly) pure virtual class; TypeDefault for impls (#11013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11013

Previously, the parent class Type also contained a large number
of implementations, for things like broadcasting and native
functions that didn't need dispatch.  We'd like to be able
to reference this interface from Tensor even when we don't
have any of these implementations are available.

To do this, we convert Type into a truly pure virtual interface,
and move all of the implementations to TypeDefault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11181

Differential Revision: D9561478

Pulled By: ezyang

fbshipit-source-id: 13c49d80bc547551adf524b1cf1d691bfe311133
2018-09-02 15:25:59 -07:00
e1a17d5a42 Should not use CAFFE2_API when definition is already in header. (#11114)
Summary:
Remove or use CAFFE2_EXPORT.
Fix #11108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11114

Differential Revision: D9628293

Pulled By: ezyang

fbshipit-source-id: dc3bb7dc5bc299e3b6cfd1cdd640f618c206fb5a
2018-09-02 14:39:38 -07:00
cf10efb8d4 Fixes unclear exception message for F.conv2d (#11053)
Summary:
Fixes #11033
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11053

Differential Revision: D9573606

Pulled By: soumith

fbshipit-source-id: 9729cbd6c8afcef0fd487bdd425b0d1f55189009
2018-09-02 13:39:34 -07:00
593d74061f Document torch.allclose (#11185)
Summary:
- Modify torch.autograd.gradcheck to use torch.allclose instead
- Expose doc strings

Closes #10355
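For reference, a small sketch of the tolerance semantics `torch.allclose` provides (values chosen for illustration):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = a + 1e-6

print(torch.allclose(a, b))                      # True: within default rtol/atol
print(torch.allclose(a, b, rtol=0, atol=1e-8))   # False: tolerances tightened
```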
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11185

Differential Revision: D9628016

Pulled By: soumith

fbshipit-source-id: 22a30622b9fe52e41b5b3540406137b59d8c5a75
2018-09-02 09:26:07 -07:00
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
abe8b3391d LowRankMultivariateNormal cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11179

Differential Revision: D9627502

Pulled By: soumith

fbshipit-source-id: c7a4aa8be24bd8c688a7c655ff25ca901ed19704
2018-09-02 07:54:56 -07:00
4d28b65fb8 fix serialization of nn.Parameter with dill (#10296)
Summary:
Should resolve #9981.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10296

Differential Revision: D9196353

Pulled By: soumith

fbshipit-source-id: 109b6da42b7240cdbc7a0586745c735bce5e1279
2018-09-01 23:55:40 -07:00
1350f76b62 Fix max and min with inf on CUDA (#11091)
Summary:
Fixes #10237 #11084

cc vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11091

Differential Revision: D9582859

Pulled By: SsnL

fbshipit-source-id: 3991c0a2af65ba82fa815b82f9e6b2107912fd10
2018-09-01 23:09:23 -07:00
7eba9849c1 Pool constants during script compilation. (#10231)
Summary:
This places all constants in the entry block of the graph, and de-duplicates them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10231

Differential Revision: D9601501

Pulled By: resistor

fbshipit-source-id: daa10ed8c99e9894830d6f3e5d65c8d3ab5ea899
2018-09-01 22:40:50 -07:00
7af6f9515f Move TensorAccessor to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11014

Reviewed By: cpuhrsch

Differential Revision: D9561802

fbshipit-source-id: d3dbe6d7e76e2419ead81fb448711f101daee19f
2018-09-01 21:41:26 -07:00
011f615945 Fix compile warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11177

Reviewed By: soumith

Differential Revision: D9626443

Pulled By: SsnL

fbshipit-source-id: e75d893e1e91e49d3e7b021892434489d8df7987
2018-09-01 21:41:25 -07:00
1506547771 Disable -Werror on macOS test build (#11090)
Summary:
cc goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11090

Reviewed By: soumith

Differential Revision: D9582525

Pulled By: apaszke

fbshipit-source-id: 5d2c6e930e7b09f0ed5a35fbf4fe36b8845a2580
2018-09-01 21:09:49 -07:00
f60a2b682e allow spaces in filename for jit-compiled cpp_extensions (#11146)
Summary:
Now, folders with spaces in their names will no longer error out for `torch.utils.cpp_extension.load(name="xxx", sources=["xxx.cpp"], verbose=True)` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11146

Differential Revision: D9618838

Pulled By: soumith

fbshipit-source-id: 63fb49bfddc0998dccd8a33a6935543b1a6c2def
2018-09-01 20:39:51 -07:00
43e73f85ad Dont optimize slicing dispatch when we are tracing (#11156)
Summary:
Previously when we had a slicing expression like `x[0:5, 0]`, where the sliced tensor was of size `5` in dimension 0, we would skip dispatching the actual slice call as an optimization.

This caused incorrect behavior under tracing, as we would not record the slice op and thus if we encountered an input with a different shape while running the trace, we would get incorrect results.
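
A hedged sketch of the failure mode (using the non-decorator trace call introduced elsewhere in this log):

```
import torch

def f(x):
    return x[0:5, 0]  # the 0:5 slice exactly covers dimension 0 of the example input

traced = torch.jit.trace(f, torch.rand(5, 3))
# Before this fix the slice was treated as a no-op and never recorded, so
# replaying the trace on a differently sized input silently gave wrong results.
out = traced(torch.rand(7, 3))
```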
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11156

Differential Revision: D9622252

Pulled By: jamesr66a

fbshipit-source-id: 822f2e8f01504e131f53bd9ef51c171c7913a7cc
2018-09-01 17:13:03 -07:00
b3d559cdd1 Optimize WeightedSumOp for two inputs (#11049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11049

Optimize WeightedSumOp for two inputs

Reviewed By: houseroad

Differential Revision: D9566692

fbshipit-source-id: 9aab1f02251d386b6f7d0699ae11eeb2ea2b5b4f
2018-09-01 11:54:55 -07:00
b834d9107e Revert D9566744: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() (#11164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11164

Revert D9566744

Reviewed By: enosair

Differential Revision: D9620272

fbshipit-source-id: 6a78c46929f66bd11969840cb6b107f734be0c02
2018-08-31 22:25:57 -07:00
1b7172a2b9 fix the slice onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11117

Reviewed By: MisterTea

Differential Revision: D9597870

Pulled By: houseroad

fbshipit-source-id: 3a2a307ee327397939bedb9150f780682e18a89a
2018-08-31 17:40:03 -07:00
03c06ec93d Traceable detach (#11038)
Summary:
This makes it so `detach` and `detach_` are traceable and also adds a pass to erase them before ONNX export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11038

Differential Revision: D9588038

Pulled By: jamesr66a

fbshipit-source-id: 263dd3147e24fcb0c716743f37fdb9f84c0015e7
2018-08-31 16:40:42 -07:00
861e1c430c Move StorageImpl and Storage to core (#11154)
Summary:
Will need to be accessible by caffe2

This also removes a bunch of unnecessary includes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11154

Reviewed By: ezyang

Differential Revision: D9618681

Pulled By: cpuhrsch

fbshipit-source-id: 838a87b75d9c3959e145fd5fca13b63bc5de7bd3
2018-08-31 15:55:26 -07:00
4abddad1a0 use py::str to remove deprecation warnings (#11107)
Summary:
```
In file included from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/cast.h:13:0,
                 from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/attr.h:13,
                 from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pybind11.h:43,
                 from caffe2/torch/csrc/utils/pybind.h:6,
                 from caffe2/torch/csrc/jit/pybind.h:5,
                 from caffe2/torch/csrc/jit/script/init.h:3,
                 from caffe2/torch/csrc/jit/script/init.cpp:1:
third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pytypes.h:118:19: note: declared here
In file included from caffe2/torch/csrc/jit/pybind.h:12:0,
                 from caffe2/torch/csrc/jit/python_ir.cpp:4:
caffe2/torch/csrc/jit/pybind_utils.h: In function 'torch::jit::IValue torch::jit::argumentToIValue(const torch::jit::FunctionSchema&, size_t, pybind11::handle)':
caffe2/torch/csrc/jit/pybind_utils.h:138:226: warning: 'pybind11::str pybind11::detail::object_api<Derived>::str() const [with Derived = pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr>]' is deprecated: Use py::str(obj) instead [-Wdeprecated-declarations]
```

apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11107

Differential Revision: D9598040

Pulled By: goldsborough

fbshipit-source-id: 4a055353ac08d54a2bbca49573ff099310de3666
2018-08-31 15:25:04 -07:00
c48bf3a77e Automatic update of fbcode/onnx to 1b09eb14c2c781fae078fa6b1c0390ba6fc0898c (#11153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11153

Previous import was bae6333e149a59a3faa9c4d9c44974373dcf5256

Included changes:
- **[1b09eb1](https://github.com/onnx/onnx/commit/1b09eb1)**: Fix the shape inference for concat (#1361) <Lu Fang>
- **[7b9b3ee](https://github.com/onnx/onnx/commit/7b9b3ee)**: ONNX v1.3.0 release (#1359) <bddppq>

Reviewed By: Ac2zoom

Differential Revision: D9615844

fbshipit-source-id: f1d4e2d6ef72a269d6ab3c1c347b272b5bdc4f2a
2018-08-31 14:55:15 -07:00
5987b44dda Remove aten doc/ folder (#11158)
Summary:
ATen's doc/ folder is manually maintained and can thus cause confusion with the generated documentation. We now have proper online documentation for ATen, which is superior to ATen doc/. Let's delete ATen/doc.

ezyang apaszke soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11158

Differential Revision: D9618782

Pulled By: goldsborough

fbshipit-source-id: 0ef14f84947601a0589aa4a41e5c8619783426fe
2018-08-31 14:55:13 -07:00
3081c8ea1d Lower trivial differentiable subgraphs (#11110)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11110

Differential Revision: D9616408

Pulled By: apaszke

fbshipit-source-id: f1ae77d698bf0ada32f2c1c3f587e46a4f57a867
2018-08-31 14:55:10 -07:00
c87d082d26 Use ->data<real>() instead of THTensor_(data) and c10::raw::intrusive_ptr::decref instead of _free (#11039)
Summary:
Codemod used for this

```
grep -rnw "THTensor_(free)" aten | grep -v Binary | cut -f 1 -d ":" | xargs -I {} sed -i "s/THTensor_(free)(\([^)]*\))/c10::raw::intrusive_ptr::decref(\1)/g" {}
```

```
grep -rnw "THTensor_(data)" aten | grep -v Binary | cut -f 1 -d ":" | xargs -I {} sed -i "s/THTensor_(data)(\([^)]*\))/\1->data<real>()/g" {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11039

Reviewed By: ezyang

Differential Revision: D9617265

Pulled By: cpuhrsch

fbshipit-source-id: d9e7581867a335703f82f4556cead2b32b97bd83
2018-08-31 14:27:09 -07:00
adeebed549 Delete TensorImpl::toString() (#11035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11035

Instead, inline its definition into Tensor.  We need
to do this so we can avoid needing to call getType() from
TensorImpl.

Reviewed By: cpuhrsch

Differential Revision: D9564516

fbshipit-source-id: 19fdaa2b93419e21572b9916714aee4165cb3390
2018-08-31 14:27:08 -07:00
5286925d4a Add getMaybeVariableType(const TensorImpl*) (#11031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11031

The eventual plan is to get rid of TensorImpl::type()
entirely; but first we need a function to call.

Reviewed By: cpuhrsch

Differential Revision: D9564206

fbshipit-source-id: b59a9ccfaed44199f185eff392835cec89ccda8e
2018-08-31 14:27:06 -07:00
2c5ae8c4bf Get rid of type() method on TensorOptions; use at::getType instead (#11023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11023

I'd like TensorOptions to not know anything about Context, so I can
move it to ATen/core without pulling in Context.  To do this, the
type() method has to go, since it consults the context to get a Type.

Reviewed By: cpuhrsch

Differential Revision: D9562467

fbshipit-source-id: 61a18a76eb042a5e70b64b963501e9d68c25d4f0
2018-08-31 14:27:05 -07:00
fd110411b7 Don't convert TensorOptions to type before printing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11145

Reviewed By: cpuhrsch

Differential Revision: D9613897

fbshipit-source-id: eaa28b24992e8202cecb5ab97fa541fcf49a205f
2018-08-31 14:27:03 -07:00
48c2f3cf0f Move TensorOptions Tensor methods to TensorMethods.h (#11144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11144

We can move them now that TensorMethods no longer references Tensor.

Reviewed By: cpuhrsch

Differential Revision: D9613800

fbshipit-source-id: 99ad1dd7d77eb319000769230b7016294cf1980f
2018-08-31 14:27:02 -07:00
780d2792c5 Warn about non-traceable behavior when tracing (#11088)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11088

Differential Revision: D9585527

Pulled By: apaszke

fbshipit-source-id: 29a03cb152d83b626f748fff4501ac9e139994c2
2018-08-31 14:27:00 -07:00
c31ebccd01 Clean up TupleType and SchemaParser (#11007)
Summary:
Some fixes to address your comments zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11007

Differential Revision: D9597750

Pulled By: goldsborough

fbshipit-source-id: f35f4801707dff2367e9dfc7d4e968357bc2b832
2018-08-31 14:26:59 -07:00
f4b2961af9 Simplify assignment operators (#11027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11027

Using swap() as a primitive, copy and move assignment become much easier.

Reviewed By: ezyang

Differential Revision: D9563753

fbshipit-source-id: e74faf39b596f097de758bfe038639565807040a
2018-08-31 13:43:41 -07:00
6508db7421 Remove BUILD_CAFFE2 and build everything (#8338)
Summary:
This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification.

cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338

Reviewed By: mingzhe09088

Differential Revision: D9600513

Pulled By: orionr

fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d
2018-08-31 13:10:24 -07:00
a2a584f347 Proper recompilation tracking for more files in tools/autograd (#11143)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11143

Differential Revision: D9613758

Pulled By: ezyang

fbshipit-source-id: 08ed143739438435e0e8219dff3a738ab424c3e1
2018-08-31 13:10:21 -07:00
3791bd12c8 PT1 Release Milestone No.2 MPI Group Support with all tests passed (#11128)
Summary:
Added MPI group support.
This makes all previous MPI group test cases pass.

Also, relaxes the required MPI thread-level support by serializing different PGs' MPI ops. This is required.

The build is fixed too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11128

Differential Revision: D9602188

Pulled By: teng-li

fbshipit-source-id: 1d618925ae5fb7b47259b23051cc181535aa7497
2018-08-31 12:39:56 -07:00
d95e68c8cc Delete Tensor constructor from TensorOptions. (#11101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11101

I'd like to invert the dependency between Tensor and TensorOptions
(such that Tensor includes TensorOptions); to do this, I'd prefer
there to not be a Tensor constructor.  Eventually, all references
of Tensor will disappear from TensorOptions.h

Reviewed By: cpuhrsch

Differential Revision: D9585627

fbshipit-source-id: dd4a28b2c06b1e55f629762915f03c2b6c34d840
2018-08-31 09:55:01 -07:00
a585158c9e Some usage examples for TensorOptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11081

Reviewed By: goldsborough

Differential Revision: D9579371

fbshipit-source-id: 329a07fc2e58f57384c8a840bcdebc2c6d4f7bb1
2018-08-31 09:40:30 -07:00
e2bdd35cf0 fixes to device.cc (#11122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11122

these changes add fixes to device.cc that are needed to create intra-device copies for OpenCL

Reviewed By: bwasti

Differential Revision: D9553292

fbshipit-source-id: e59f17916b5df30a504adee0718f9cecfe28f35a
2018-08-31 09:25:26 -07:00
f30fd7fb5c Get rid of the runtime type in TensorOptions (#11021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11021

We can now store a boolean saying if we want a Variable or not,
and context can use VariableHooks to get a VariableType if we
request one.

Reviewed By: cpuhrsch

Differential Revision: D9562312

fbshipit-source-id: 84653cd789622764132252406a5ea1a83eee3360
2018-08-31 09:10:52 -07:00
1db5a7d8f0 Move variable getType lookup support to Context
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11017

Reviewed By: cpuhrsch

Differential Revision: D9562197

fbshipit-source-id: dd00c79592d6c59f2e21c9d62fea3a2c093b609b
2018-08-31 09:10:51 -07:00
9fac0a5093 Rename at::getType to at::getNonVariableType (#11096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11096

To discourage willy-nilly use, and make it clearer that it
is not a Variable

Reviewed By: cpuhrsch

Differential Revision: D9583699

fbshipit-source-id: 4fbde0c01ae3deb2c7ef8c125a9028f089b203ae
2018-08-31 09:10:49 -07:00
0961c923c0 Unbreak the build
Summary: The controller you requested could not be found.

fbshipit-source-id: 861021dbe88f84d1a8bd80e04dd684527384629f
2018-08-31 08:13:12 -07:00
3073051a18 Revert D9554375: Support lr adaption for SparseAdam and RowWiseSparseAdam
Differential Revision:
D9554375

Original commit changeset: b88768f470ef

fbshipit-source-id: 2c103c616c8680684892c7d9085fd7bb8289d2f1
2018-08-31 07:54:31 -07:00
82aeebb3d9 Fix a bug in addmm fusion in the JIT (#11100)
Summary:
Fixes #10839.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11100

Differential Revision: D9585533

Pulled By: apaszke

fbshipit-source-id: 19e2710c8fc113f577faf14c080d8c89afbe23c4
2018-08-31 07:24:34 -07:00
0555768e0f Support lr adaption for SparseAdam and RowWiseSparseAdam (#10993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10993

as title

Reviewed By: chocjy

Differential Revision: D9554375

fbshipit-source-id: b88768f470ef7d023dd481c6a97b91594892f422
2018-08-31 00:55:39 -07:00
f1bfe6750f Back out "[caffe2] Update blackbox predictor with new constructor" (#11105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11105

Reverts: D9516972

See this discussion for context: https://fburl.com/w45hb1oc

Reviewed By: highker

Differential Revision: D9587931

fbshipit-source-id: 715247929d819dfa88e1d051021e51c5bf0c4835
2018-08-31 00:55:36 -07:00
9fae8fcdff framework for committed serialized tests (#10594)
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random plus a single fixed-seed hypothesis test.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?)

Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594

Reviewed By: ezyang

Differential Revision: D9370359

Pulled By: ajyu

fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
2018-08-30 22:41:46 -07:00
00df09b65d Change specialization rules in GraphExecutors (#10977)
Summary:
**Review last commit only.** Stacked on top of #10949.

This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977

Differential Revision: D9600626

Pulled By: apaszke

fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
2018-08-30 22:11:01 -07:00
a320e5cbd3 Move static_context outside of class (#11097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11097

att

Reviewed By: ezyang

Differential Revision: D9549702

fbshipit-source-id: 058b942311b00be20a0b557ba97eb3451ea55e33
2018-08-30 22:10:58 -07:00
750ede7215 Rename getType to getVariableTypeFromBaseType / getVariableType (#11095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11095

We used getType to mean a lot of things.

- getVariableTypeFromBaseType: given a base Type (non-Variable type)
  compute the Variable Type which corresponds to it.

- getVariableType: like at::getType, but return the Variable type
  rather than the plain type.

This rename makes it clearer at the use-site what things are what,
and will make a subsequent rename of at::getType easier.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9583630

fbshipit-source-id: 2667ec98e7607bc466920c7415a8c651fd56dfca
2018-08-30 20:11:25 -07:00
c836a04dc8 Delete a bunch of uses of getType in favor of TensorOptions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11087

Reviewed By: cpuhrsch

Differential Revision: D9581560

fbshipit-source-id: ebe3c4c0956da8a7215ada287bf6526dbcb2b07d
2018-08-30 20:11:24 -07:00
34a0604d51 Eliminate use of getType from DLConvertor (#11080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11080

- Add a new TensorOptions(Device, ScalarType) constructor,
  which serves roughly the same role as getType used to.
  We shouldn't get too wild with these constructors, but
  since this particular one was widely used by getType,
  it seems worth adding.
- Change DLPack DeviceType conversion to at::DeviceType,
  rather than at::Backend.  While I'm at it, add a few more
  conversions that at::DeviceType understands.
- Add a new overload of from_blob which understands strides.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9578734

fbshipit-source-id: 28288ec053aae8765e23925ab91023398d632d6b
2018-08-30 20:11:23 -07:00
c283acce72 Rename getTypeRaw to getNonVariableTypeRaw (#11078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11078

```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeRaw getNonVariableTypeRaw
```

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9578399

fbshipit-source-id: 00a86ae8fb00d14116762ce39d15858da9a1671e
2018-08-30 20:11:21 -07:00
66c4d7e060 Rename getTypeOpt to getNonVariableTypeOpt (#11077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11077

getType now supports retrieving variable types, so make it clearer
when a getType function does NOT give you a variable type.

```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeOpt getNonVariableTypeOpt
```

Reviewed By: gchanan

Differential Revision: D9578398

fbshipit-source-id: 3ee502ac5c714849917f11ddc71de8eacfdaa9d3
2018-08-30 20:11:20 -07:00
f3c3127c67 Don't flatten output lists in the JIT IR (#10949)
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).

The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered non-differentiable by the graph executor. I verified that they are still optimized correctly, and my next patch (which changes how specialization/differentiation works) will restore those.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949

Reviewed By: zdevito

Differential Revision: D9556823

Pulled By: apaszke

fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
2018-08-30 19:54:39 -07:00
c8c21fa2b4 Allow same flags when glog is used or not (#11034)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11034

Reviewed By: mingzhe09088

Differential Revision: D9582801

Pulled By: orionr

fbshipit-source-id: b41ca1bebf6cf62fff2a2b8caf4c94af3e43db00
2018-08-30 19:24:51 -07:00
26409a4300 Caffe2 flags needs to be used after the GlobalInit function is called
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11120

Reviewed By: llyfacebook

Differential Revision: D9598430

Pulled By: sf-wind

fbshipit-source-id: 468f0ed7880339c9c4467d1cef29f5bc9fc80a2a
2018-08-30 19:10:39 -07:00
a6cb41486d update documentation for observers
Summary:
update to the latest observer usage syntax
add an example of HistogramObservers

Reviewed By: jspark1105

Differential Revision: D6878439

fbshipit-source-id: c9521f2daecfc7f0c17de6a944dce58e568e3dbe
2018-08-30 18:11:48 -07:00
15314c7b8e GCC-7 doesn't like the original syntax. (#10665)
Summary:
Replace with "this->template f<T>()".

Fix #7881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10665

Differential Revision: D9597187

Pulled By: ezyang

fbshipit-source-id: 8af4e7efd98edadabb97e2523a58bd21bc116d1a
2018-08-30 16:41:16 -07:00
684bd1b7bd size_ -> numel_ (#11112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11112

att

Reviewed By: ezyang

Differential Revision: D9474018

fbshipit-source-id: d9267e52e2d50dac7524a456a44f2e28b6c0b693
2018-08-30 16:41:13 -07:00
7ddc6f84c4 NULL -> nullptr (#11047)
Summary:
How did we get so many uses of `NULL` again?

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047

Differential Revision: D9566799

Pulled By: goldsborough

fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
2018-08-30 16:25:42 -07:00
302e9cb815 Update onnx submodule to onnx/onnx@bae6333 (#10961)
Summary:
ONNX v1.3.0 release

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10961

Reviewed By: houseroad

Differential Revision: D9543998

Pulled By: bddppq

fbshipit-source-id: b7f0a0553d832d609d3b7613a608f7bf4a2582ef
2018-08-30 15:25:57 -07:00
56c737a9b7 Inject GetEmptyStringAlreadyInited once for static proto (#11045)
Summary:
I've been seeing a lot of warnings about multiple declarations of this. Hopefully this fixes it.

cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11045

Reviewed By: mingzhe09088

Differential Revision: D9582756

Pulled By: orionr

fbshipit-source-id: 6171485609a2f2f357d6e1c44e26b4ecfcdb4ce6
2018-08-30 14:59:54 -07:00
a136d29fd1 Use intrusive_ptr in Storage (#10907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10907

replace shared_ptr with intrusive_ptr in Storage

Reviewed By: ezyang

Differential Revision: D9414388

fbshipit-source-id: d413549ffde24959166d2dff2042b99f0c5018af
2018-08-30 14:59:52 -07:00
f0142faab0 Expose arbitrary cpp autograd functions to Python (#11082)
Summary:
This is needed because the JIT declares some custom autograd functions.

colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11082

Differential Revision: D9580456

Pulled By: apaszke

fbshipit-source-id: 6bf00c1188a20b2ee6ecf60e5a0099f8263ad55a
2018-08-30 14:25:59 -07:00
93bd291e55 Change torch.jit.trace to no longer be a decorator (#11069)
Summary:
This was done because it is surprising for a decorator to run a function
rather than wrap it, and the decorator form did not simplify the syntax for tracing modules.
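
A hedged before/after sketch, assuming the post-change signature takes the function first:

```
import torch

def double(x):
    return x * 2

# Old decorator style traced (i.e. ran) the function at decoration time:
#   @torch.jit.trace(torch.rand(3))
#   def double(x): ...
# New style: trace is an ordinary call that returns the traced function.
traced = torch.jit.trace(double, torch.rand(3))
print(traced(torch.rand(3)))
```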
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11069

Reviewed By: jamesr66a

Differential Revision: D9583192

Pulled By: zdevito

fbshipit-source-id: b914b7ab4c73c255086465a6576eef3a22de1e13
2018-08-30 13:56:05 -07:00
ebe9d204fa Add test cases to intrusive_ptr (#11026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11026

ezyang fixed a bug with moving or copying an intrusive_ptr into itself.
This diff adds test cases for it.

Reviewed By: ezyang

Differential Revision: D9563464

fbshipit-source-id: 3a3b3f681124730d2500b276c0135c3bba7875ae
2018-08-30 13:25:33 -07:00
e85f3fccb3 Fix relying on UB in test_data_parallel_nested_output (#11092)
Summary:
We shouldn't rely on plain `dict` ordering. Example failure: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-xenial-cuda8-cudnn6-py3-test1/8417/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11092

Reviewed By: ezyang

Differential Revision: D9583274

Pulled By: SsnL

fbshipit-source-id: ba80b96648c98c24c2ec5fa6fd9aa566c095cce7
2018-08-30 13:10:25 -07:00
9d4360c060 Creates stream pool (#9938)
Summary:
This PR creates a stream pool per issue #9646. When a new stream is requested, the device it's requested on lazily creates two pools, one low priority and one high priority, of 32 streams each. Streams are returned from these pools round-robin: stream 0 is returned, then stream 1... then stream 31, then stream 0 again (a toy sketch follows the change notes). This PR also takes the opportunity to clean up the stream API, reducing its complexity and verbosity.

Change notes:

- There are now 3 sets of streams per device, the default stream, the low priority streams, and the high priority streams. These streams live in lazily initialized pools and are destroyed on shutdown.
- All stream refcounting has been removed (the pools pattern replaces it).
- Setting a stream now sets it on its device. Streams are associated with a device and the previous
requirement to specify that device was unnecessary.
- There is no exposure for setting the flags on a stream. This may also seem like a regression but the flag was always set to cudaStreamNonBlocking.
- Streams are now low or high priority whereas previously the priority could be set with an integer. In practice, however, the range for priorities is -1 to 0 on the latest hardware. -1 is high priority, 0 is low priority (aka default priority). Low vs. high actually clarifies this behavior for people who were attempting finer-grained separations. (E.g., if someone tried streams with priorities 0, 1, and 2, they would actually all have priority 0, historically, and the intended behavior would not be respected.)
- Unused THCStream and THCState stream-related functions were removed.
- A new test of pooling behavior was added in stream_test.
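
A toy Python sketch of the lazily initialized round-robin behavior described above (illustrative only; the real implementation lives in the C++ CUDA stream code):

```
class StreamPool:
    def __init__(self, size=32):
        self.size = size
        self.streams = None  # pools are created lazily, on first request
        self.next = 0

    def get(self):
        if self.streams is None:
            self.streams = [object() for _ in range(self.size)]  # stand-ins for CUDA streams
        s = self.streams[self.next]
        self.next = (self.next + 1) % self.size  # stream 0, 1, ..., 31, then 0 again
        return s
```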

fyi: colesbury, apaszke, goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9938

Reviewed By: SsnL

Differential Revision: D9569036

Pulled By: ezyang

fbshipit-source-id: 12ed673fe373170d0cf4d65cb570de016c53ee7d
2018-08-30 12:40:23 -07:00
23b0c90e71 caffe2: fix gcc8 warnings
Summary:
The warnings are erroneous as far as I can see,
so tweak things to avoid them. The (unsigned int) cast
avoids passing -1 to a size_t type.  This was triggered
only in gcc8's LTO build, giving:

  caffe2/aten/src/TH/generic/THTensor.cpp: In function ‘THFloatTensor_squeeze1d’:
  lto1: error: ‘__builtin_memset’ specified size 18446744073709551608
  exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
  In function ‘newImpl’,
    inlined from ‘operator new’ at common/memory/OperatorOverride.cpp:86:23,
    inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/ext/new_allocator.h:111:0,
    inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/alloc_traits.h:436:0,
    inlined from ‘_M_allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:172:0,
    inlined from ‘_M_default_append’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/vector.tcc:571:0,
    inlined from ‘resize’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:671:0,
    inlined from ‘THTensor_resizeDim’ at caffe2/aten/src/TH/THTensor.hpp:123:0,
    inlined from ‘THFloatTensor_squeeze1d.part.198’ at caffe2/aten/src/TH/generic/THTensor.cpp:429:0,
    inlined from ‘THFloatTensor_squeeze1d’:
  common/memory/OperatorOverride.cpp:86:23: error:
  argument 1 value ‘18446744073709551608’ exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
   void* ptr = malloc(size);

Reviewed By: soumith

Differential Revision: D9568621

fbshipit-source-id: 4569a4be897d669caa3f283f4b84ec829e8d77ad
2018-08-30 11:55:29 -07:00
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also adds a single-grad whitelist to the JIT test
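
A minimal usage sketch, assuming this PR exposes the kernel as `torch.pdist` with the usual condensed-distance convention:

```
import torch

x = torch.randn(5, 3)
d = torch.pdist(x)  # pairwise p=2 distances between rows, condensed form
print(d.shape)      # torch.Size([10]) == 5 * 4 / 2 pairs
```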
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
029082e87c Add entry for torch/lib/pythonX.Y in .gitignore (#11083)
Summary:
I've had `torch/lib/python3.6` show up as part of the build for some time now. It's not ignored which means I need to be extra careful about checking in files, or I end up with a thousand of them in my index.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11083

Differential Revision: D9580453

Pulled By: apaszke

fbshipit-source-id: 369e4fe87962696532d111b24f2a4a99b9572bf2
2018-08-30 11:40:25 -07:00
40227671e9 Add strides to caffe2::Tensor (#10826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10826

Add strides, and make sure the strides are consistent with sizes, and is_contiguous, for all the Caffe2 functions.

is_contiguous means strides_[dim-1] = 1 and strides_[i] = strides_[i+1] * max(size_[i+1], 1);

Reviewed By: ezyang

Differential Revision: D9354480

fbshipit-source-id: 3643871b70f1111b7ffdd9fdd9fe9bec82635963
2018-08-30 11:25:58 -07:00
535633bddc Export MPI functions (#11037)
Summary:
Potential fix for https://github.com/caffe2/caffe2/issues/2551#issuecomment-417124872

cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11037

Reviewed By: mingzhe09088

Differential Revision: D9580937

Pulled By: orionr

fbshipit-source-id: 5e1fbf718728271a5b5af526d8e67cc5b48f0575
2018-08-30 10:42:02 -07:00
e7195431e0 Add benchmarking functionality to the benchmark app (#10976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10976

The app can run in Xcode with the benchmark metrics collected.
It can also run when built with buck.

Reviewed By: llyfacebook

Differential Revision: D9546755

fbshipit-source-id: 60ad0112946f8cf57138417f6838a58ed6d2c90f
2018-08-30 09:54:55 -07:00
a8af7fe46a Support import of nn.RNNCellBase in __all__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10992

Differential Revision: D9572005

Pulled By: soumith

fbshipit-source-id: 26b546830b6a25a4f7ba6f825cd888d678233a97
2018-08-30 08:25:21 -07:00
dbc0004f99 Remove use_count() == 1 in Tensor::Extend (#11046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11046

As suggested by jerryzh168, the temporary fix for a new constraint that was added in D9350686 is to remove this assert. Long term, jerryzh168 is going to work out a better way of handling this.

Reviewed By: jerryzh168

Differential Revision: D9566323

fbshipit-source-id: e4630c7cbe0cc68a084974ea7048654811fae01f
2018-08-29 23:55:28 -07:00
23af7deea7 Add has_lapack flag (#11024)
Summary:
Currently our `skipIfLapack` uses a try-catch block and regex-matches the error message, which is highly unreliable. This PR adds `hasLAPACK` and `hasMAGMA` on the ATen context, and exposes the flags to Python.

Also fixes a refcounting bug with `PyModule_AddObject`. The method steals a reference, but in some places we didn't `Py_INCREF` before calling it with `Py_True` or `Py_False`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11024

Differential Revision: D9564898

Pulled By: SsnL

fbshipit-source-id: f46862ec3558d7e0058ef48991cd9c720cb317e2
2018-08-29 22:41:16 -07:00
ad1670cf54 Kill the dummy TaskOutput when task.get_step() (#11048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11048

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739

I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.

But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. Also a dummy net `task:output` was also added along with it. See https://fburl.com/937lf2yk

This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".

Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.

TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.

Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent, with no side effects for users.

Reviewed By: mraway

Differential Revision: D9566744

fbshipit-source-id: 18292dd64a6d48192c34034200a7c9811d2172af
2018-08-29 20:11:29 -07:00
16b8e0a787 at::StorageImpl: Rename size_ to numel_ and elementSize() to itemsize()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11011

Reviewed By: ezyang

Differential Revision: D9561898

Pulled By: cpuhrsch

fbshipit-source-id: 0cf5cdc3e7acd397f7e2d66097856aaad0581147
2018-08-29 20:11:27 -07:00
394bdcd49a Fix the build of aten tests when FULL_CAFFE2=1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11019

Reviewed By: orionr

Differential Revision: D9562691

Pulled By: houseroad

fbshipit-source-id: 95a8dee580e5f4dc9af3a2e1f68ec6c62a0e4e04
2018-08-29 18:09:54 -07:00
e550eab3e2 Remove MetaNetDef test case in Predictor (#11052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11052

Delete the test case that constructs a Predictor from a MetaNetDef, since that constructor
has actually been deprecated. The broken PR is for constructing a Predictor from a DB instance.

Reviewed By: highker

Differential Revision: D9566935

fbshipit-source-id: 5511883953a2d3f6eb0a4f1c5518a1bc4b3ffbdc
2018-08-29 17:55:21 -07:00
91ecbf8b1d Remove TensorBase (#11036)
Summary:
Not subclassed except by Tensor. Also required to align further with
caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11036

Reviewed By: ezyang

Differential Revision: D9565640

Pulled By: cpuhrsch

fbshipit-source-id: ff7203a2c95d3f3956282b4f2d8dda6c2b93f4a6
2018-08-29 17:27:19 -07:00
ae635b16f7 Record tensor factory functions in trace (#10935)
Summary:
Things like torch.zeros now appear in traces rather than constants.

To continue to support our current level of ONNX export, we run
constant prop to turn these back into constants where possible before
export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10935

Differential Revision: D9527427

Pulled By: zdevito

fbshipit-source-id: 552a8bcc01b911251dab7d7026faafdd7a3c758a
2018-08-29 17:10:24 -07:00
c4e1adf29d Remove THHalf type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11010

Reviewed By: ezyang

Differential Revision: D9561325

Pulled By: li-roy

fbshipit-source-id: 053cf2925ec1fc458db31e92bd31ffd23389f3e8
2018-08-29 16:44:45 -07:00
2cc98d8df7 Adds dim argument to torch.unique (#10423)
Summary:
Initial version of `unique` supporting a `dim` argument.

As discussed in [this issue](https://github.com/pytorch/pytorch/issues/9997) I added the `dim` argument to `torch.unique` with the same behavior like [numpy](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.unique.html).

Since the implementation is based on `std/thrust::unique`, the `tensor` always needs to be sorted, so the `sorted` argument in `torch.unique` has no effect, just as in the CUDA version of the plain `torch.unique`.
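
A minimal usage sketch of the new argument (the sorted output order follows from the implementation note above):

```
import torch

x = torch.tensor([[1, 2], [1, 2], [3, 4]])
unique_rows, inverse = torch.unique(x, dim=0, return_inverse=True)
print(unique_rows)  # tensor([[1, 2], [3, 4]])
print(inverse)      # tensor([0, 0, 1]): maps each input row to its unique row
```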

To check the performance and equal behavior between `torch.unique` and `np.unique`, I've used [this gist](https://gist.github.com/ptrblck/ac0dc862f4e1766f0e1036c252cdb105).

Currently we achieve the following timings for an input of `x = torch.randint(2, (1000, 1000))`:
(The values are calculated by taking the average of the times for both dimension)

| Device | PyTorch (return_inverse=False) | Numpy (return_inverse=False) | PyTorch (return_inverse=True) | Numpy (return_inverse=True) |
| --- | --- | --- | --- | --- |
| CPU | ~0.007331s | ~0.022452s | ~0.011139s | ~0.044800s |
| GPU | ~0.006154s | - | ~0.105373s | - |

Many thanks to colesbury for the awesome mentoring and the valuable advices on the general implementation and performance issues!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10423

Differential Revision: D9517289

Pulled By: soumith

fbshipit-source-id: a4754f805223589c2847c98b8e4e39d8c3ddb7b5
2018-08-29 16:26:09 -07:00
98d85b1790 Debugging help + test
Summary: When conversion fails, dump more information to help fix up the netdef

Reviewed By: hyuen, yinghai

Differential Revision: D9558667

fbshipit-source-id: 8917cc61c9be6285697e4f8395a9dbc7135f618e
2018-08-29 16:26:07 -07:00
ef7fc2a3e1 Remove at::StorageImpl::finalizer_ (#11022)
Summary:
Unused member variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11022

Reviewed By: ezyang

Differential Revision: D9562520

Pulled By: cpuhrsch

fbshipit-source-id: af190b3ba06d33d65fa0fabffb34a0df769f38d0
2018-08-29 16:09:47 -07:00
6b87198245 Devirtualize StorageImpl deconstructor (#11018)
Summary:
Further align at::StorageImpl with caffe2::StorageImpl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11018

Reviewed By: ezyang

Differential Revision: D9562256

Pulled By: cpuhrsch

fbshipit-source-id: d929317f6226a1e2550b78034b723afbae343aaa
2018-08-29 15:39:54 -07:00
d9b74f6540 Make it possible to disable JIT using env variables (#10867)
Summary:
zdevito
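
The message doesn't name the variable; to the best of my knowledge the switch this enables is `PYTORCH_JIT=0`, read when torch is imported, so a hedged sketch is:

```
import os
os.environ["PYTORCH_JIT"] = "0"  # must be set before torch is imported

import torch

@torch.jit.script  # with PYTORCH_JIT=0 this should fall back to plain Python
def foo(x):
    return x + 1

print(foo(torch.ones(2)))
```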
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10867

Differential Revision: D9556882

Pulled By: apaszke

fbshipit-source-id: 04c0ca875d15d37dd9ac05ac7b515cd899ddb7e4
2018-08-29 15:11:05 -07:00
c755616e00 Enable Detectron model inference for CPU and MKL-DNN paths (#10157)
Summary:
1. Support ops needed for inference of Faster-RCNN/Mask-RCNN needed in Detectron, mostly direct fallbacks.
2. Use CPU device to hold 0-dim tensors and integer tensors in both fallback op and blob feeder, needed by Detectron models.
3. Ignore 0-dim tensor in MKL-DNN concat operator.
4. Generate dynamic library of Detectron module for CPU device.

This PR obsoletes #9164.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10157

Differential Revision: D9276837

Pulled By: yinghai

fbshipit-source-id: dc364932ae4a2e7fcefdee70b5fce3c0cee91b6f
2018-08-29 15:11:01 -07:00
89834dfe64 Add GPU version of HardSigmoid Op to Caffe2 (#10955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955

Add GPU version of HardSigmoid Op to Caffe2. Updated test file to
include GPU tests.

Reviewed By: enosair

Differential Revision: D9499353

fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
2018-08-29 14:55:29 -07:00
22e3b2c9c3 Revert D9413150: [New Checkpoint] Kill the dummy TaskOutput when task.get_step()
Differential Revision:
D9413150

Original commit changeset: 51aaf3201e26

fbshipit-source-id: ac7c4c0960db03f344fe3eb2ad7f0e034db2371a
2018-08-29 14:39:49 -07:00
6a8bc3804a Add flush to logging messages higher than INFO. (#10983)
Summary:
This probably fixes the logging test error that orionr is encountering - haven't tested locally but wanted to send out a PR to kick off CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10983

Reviewed By: ezyang

Differential Revision: D9552607

Pulled By: Yangqing

fbshipit-source-id: 9ac019031ffd9c03972144df04a836e5dcdafe02
2018-08-29 14:39:48 -07:00
0b1de74732 Documentation improvement in caffe2/core/tensor.h (#11006)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11006

Reviewed By: smessmer

Differential Revision: D9558383

Pulled By: ezyang

fbshipit-source-id: 7d36fb69a6e8a7d064da2c8796dc263a9fd4e094
2018-08-29 14:25:38 -07:00
e9eed8edb4 Add doc for Tensor.digamma_? (#11008)
Summary:
follow up for #10967

zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11008

Differential Revision: D9559889

Pulled By: SsnL

fbshipit-source-id: a05d8fbad92a54bcdb93de6e62a7f94180da1d99
2018-08-29 14:11:16 -07:00
f687ff5a59 Delete unnecessary includes from TensorImpl.h (#11005)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11005

Reviewed By: smessmer

Differential Revision: D9558300

Pulled By: ezyang

fbshipit-source-id: ebebb3c6d3a1a2f7cc3da9fe9d3c56310ead46e1
2018-08-29 14:11:14 -07:00
b644d5e74a Delete context and get_context from Type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11001

Reviewed By: cpuhrsch

Differential Revision: D9557315

fbshipit-source-id: b9862b8dda49194298bb1a4fbc214d466f3c8350
2018-08-29 13:55:45 -07:00
cd9416317d Minor copy-edit on setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10933

Reviewed By: cpuhrsch

Differential Revision: D9526650

fbshipit-source-id: 8ad1c989bee7009b3f95a2641189f55cf6c1979f
2018-08-29 13:41:04 -07:00
c99a143eea Update blackbox predictor with new constructor (#10920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10920

Update the black box predictor and the related code to use the
constructor with PredictorConfig.

Reviewed By: highker

Differential Revision: D9516972

fbshipit-source-id: fbd7ece934d527e17dc6bcc740b4e67e778afa1d
2018-08-29 13:31:45 -07:00
56539f5fe1 PT1 Distributed Release MileStone No.1 - Completed Distributed Package and CI tests (#10871)
Summary:
The PR includes:
(1) torch.distributed.c10d, which now includes the complete backward compatible frontend API for `torch.distributed`
(2) `env://` init method functionality (see the sketch at the end of this message)
(3) Minor change to `test_distributed.py`, which is now a test for `torch.distributed.c10d`.
(4) The old `test_distributed.py` is now moved to `test_distributed_thd`
(5) Miscellaneous bug fixes.
(6) DDP CPU test is removed since c10d doesn't have this support yet, but this is a very easy test after moving DDP CPU's dependency to torch.distributed.c10d.
(7) CI config to test MPI, NCCL, and Gloo backend of c10d

**Now all the distributed test including c10d DDP can pass with the c10d frontend API**

TODO: (in a separate PR)
MPI subgroup support; once this is added, the CI group test will be enabled.
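
A minimal sketch of the `env://` init method from (2), shown with the usual `torch.distributed` entry point (at this commit the frontend lived under `torch.distributed.c10d`); the launcher is assumed to set the standard environment variables:

```
import torch.distributed as dist

# Expects MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE in the environment.
dist.init_process_group(backend="gloo", init_method="env://")
print(dist.get_rank(), dist.get_world_size())
```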
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10871

Differential Revision: D9554514

Pulled By: teng-li

fbshipit-source-id: fb686ad42258526c8b4372148e82969fac4f42dd
2018-08-29 12:55:57 -07:00
fa7c81c640 nomnigraph - nit - code style update (#10987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10987

some code style update to make it consistent with fb cpp style

Reviewed By: yinghai

Differential Revision: D9550130

fbshipit-source-id: 6aef9878676c08e7d384383c95e7ba8c5c9a1bce
2018-08-29 12:55:55 -07:00
ec519e8a4a Reduce number of elements within test_abs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10997

Differential Revision: D9556861

Pulled By: cpuhrsch

fbshipit-source-id: 986ef275e94fcffcc04a5c1103b8b7bfb4ae3ba5
2018-08-29 12:55:54 -07:00
dbce1c840f exposing net_transformer_fun before add grad (#11003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11003

Need an interface to rewrite the graph after the net is built and after gradient ops are added.

Reviewed By: aazzolini, harouwu

Differential Revision: D9557827

fbshipit-source-id: 2e082f0321c0776e488a29e18047d950948e7c37
2018-08-29 12:55:52 -07:00
bed9d41abd Generate Type::registerCPU as we do register_cuda_types. (#10947)
Summary:
The goal here is to separate out the base Type into core; as it was done previously, we needed all derived Types to be defined when we compiled the base Type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10947

Reviewed By: gchanan

Differential Revision: D9540025

Pulled By: ezyang

fbshipit-source-id: 49f0b5acb3c378348ef3a55780abb73e4ae27edd
2018-08-29 12:39:47 -07:00
4e446b85fb Make profiler.build_table() O(n) rather than O(n^2) (#10969)
Summary:
Fixes #10851

Speeds up profiling results dramatically.

For the following script:
```
import torch
import time

ITER = 2000

x = torch.randn(1, 1, requires_grad=True)

with torch.autograd.profiler.profile() as prof:
    y = x
    for i in range(ITER):
        y = 3 * y - 2 * y
    y.backward()

start = time.time()
print("Done running. Preparing prof")
x = str(prof)
print("Done preparing prof results")
end = time.time()
print("Elapsed: {}".format(end - start))
```

I get 7s before / 0.13s after these changes.

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10969

Differential Revision: D9556129

Pulled By: zou3519

fbshipit-source-id: 26b421686f8a42cdaace6382567d403e6385dc12
2018-08-29 12:25:51 -07:00
396dec0e37 s/spaerse/sparse (#10968)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10968

Differential Revision: D9546746

Pulled By: zou3519

fbshipit-source-id: a6a4bb8bb04eccf89c3d90a90259070beb484500
2018-08-29 12:13:04 -07:00
525548fb64 Move SparseTensorRef to core, change some includes to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10964

Differential Revision: D9545021

Pulled By: gchanan

fbshipit-source-id: 8ba7e5e3a7bdf24e5aeb4bbc91957c1a6f14d7f0
2018-08-29 11:55:29 -07:00
e0dbb91060 Windows raw string fix (#10998)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

mingzhe09088's fix of the docstrings for Windows builds. Unfortunately some versions of Windows seem to try to parse the `#` inside the string as a preprocessor directive. We might need to change this to something else later, but we want to get this landed first.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10998

Reviewed By: mingzhe09088

Differential Revision: D9557480

Pulled By: orionr

fbshipit-source-id: c6a6237c27b7cf35c81133fd9faefead675a9f59
2018-08-29 11:40:08 -07:00
206d52d0e3 Disable smart_tensor_printer_test without glog (#10999)
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338

This test fails once we start building with `-DUSE_GLOG=OFF` since the non-glog logging case doesn't support flushing or streaming to the right location. For now, we just disable this test in that case.

cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10999

Reviewed By: mingzhe09088

Differential Revision: D9557488

Pulled By: orionr

fbshipit-source-id: 8b306f210411dfc8ccc404bdccf77ddcd36a4830
2018-08-29 11:10:23 -07:00
562fc7631f Add test cases for ONNX unsqueeze (#10924)
Summary:
PyTorch exporting tests and end-to-end cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10924

Reviewed By: Ac2zoom

Differential Revision: D9548210

Pulled By: houseroad

fbshipit-source-id: 2381d1ad92a4e07f97060eb65c9fd09f60ad3de6
2018-08-29 11:10:21 -07:00
1b0d5e60ab Get rid of some unnecessary includes of Context. (#10951)
Summary:
This is part of splitting Context from what needs to go in ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10951

Differential Revision: D9540369

Pulled By: gchanan

fbshipit-source-id: 73b0e8c4493785fbab368a989f46137c51f6ea0b
2018-08-29 11:10:20 -07:00
a9469c9c8a Fill eigenvector with zeros if not required (#10645)
Summary:
Fix #10345, which only happens in the CUDA case.

* Instead of returning some random buffer, we fill it with zeros (see the sketch below).

* update torch.symeig doc.
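
A hedged illustration of the change, using `torch.symeig` as it existed at the time (requires a CUDA device, per the bug report):

```
import torch

x = torch.randn(4, 4, device='cuda')
x = x + x.t()  # make the matrix symmetric
e, v = torch.symeig(x, eigenvectors=False)
print(v)  # now all zeros; previously an uninitialized buffer on CUDA
```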
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10645

Reviewed By: soumith

Differential Revision: D9395762

Pulled By: ailzhang

fbshipit-source-id: 0f3ed9bb6a919a9c1a4b8eb45188f65a68bfa9ba
2018-08-29 10:55:22 -07:00
b41988c71e Cleanup BUILD_DOCS cmake section (#11000)
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11000

Differential Revision: D9557474

Pulled By: orionr

fbshipit-source-id: 7d84914b67ff37bdb7738f9b7846dfeb5b975c00
2018-08-29 10:09:52 -07:00
7169906249 torch.digamma (#10967)
Summary:
Fixes #10307

cc SsnL
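
A minimal usage sketch of the new function (the printed values are the standard digamma results):

```
import torch

x = torch.tensor([1.0, 0.5])
print(torch.digamma(x))  # tensor([-0.5772, -1.9635]); digamma(1) = -Euler gamma
print(x.digamma())       # method form, documented in the follow-up #11008
```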
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10967

Differential Revision: D9546748

Pulled By: zou3519

fbshipit-source-id: 764e27b1cc8dd487270b3ffa653b806c86f717dd
2018-08-29 09:43:19 -07:00
a5d7abedae Enable fusing aten::expand on GT, LT, EQ (#10845)
Summary:
GT, LT, EQ all support numpy broadcasting; just enable the fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10845

Reviewed By: bddppq

Differential Revision: D9494089

Pulled By: houseroad

fbshipit-source-id: 7c65ca06c54dbd476ac7d07b47a413faaed3dd5e
2018-08-28 23:56:50 -07:00
db0abe1890 Fix bugs in handling of negative slice + gather indices (#10973)
Summary:
This fixes multiple bugs in the handling of negative indices in both slicing and gather operations. These were uncovered by Elias Ellison's diff D9493614, which made it so that we actually emit negative indices when we see them in PyTorch code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10973

Reviewed By: jhcross

Differential Revision: D9546183

Pulled By: jamesr66a

fbshipit-source-id: 6cb0e84e8ad399e47e24a96c44025f644c17b375
2018-08-28 23:40:40 -07:00
6ca28984c7 Kill the dummy TaskOutput when task.get_step() (#10739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739

I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.

But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. Also a dummy net `task:output` was also added along with it. See https://fburl.com/937lf2yk

This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".

Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.

TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.

Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent, with no side effects for users.

Reviewed By: mraway

Differential Revision: D9413150

fbshipit-source-id: 51aaf3201e26570b4fcf5738e9b9aa17c58777ac
2018-08-28 20:41:46 -07:00
beeec47041 Sanity checks for tracing (#10841)
Summary:
TODO: integrate into torch.onnx.export -- separate PR

*Problem:* We have a facility to trace PyTorch operations on Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:

* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs

*Objective*: In an ideal world, this whole process would be automated and the user could trust that the system will magically capture the intended semantics of the program. Realistically speaking, we will likely have to settle for a human-in-the-loop error reporting system, allowing the user to identify problems and modify the source code to allow for tracing.

*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.

Examples:

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                              ^
		+   %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                                +++              ^
		    %2 : int = prim::Constant[value=0]()
		    %3 : Dynamic = aten::_cast_Float(%1, %2)
		    %4 : int = prim::Constant[value=1]()
		    %5 : Dynamic = aten::unsqueeze(%3, %4)
		    %6 : int = prim::Constant[value=1]()
		    %7 : Dynamic = aten::add(%0, %5, %6)
		    return (%7);
		  }
	Node diff:
		- %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                            ^
		+ %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                              +++              ^
	Trace source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Check source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
	Source Location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(shapes (3,), (4,) mismatch)
		 x: array([0, 1, 2])
		 y: array([0, 1, 2, 3])

```
==

```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value=<Tensor>]()
	Source Location:
		dank.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(mismatch 100.0%)
		 x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
		       0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
		      dtype=float32)
		 y: array([0.243178, 0.315964, 0.972041, 0.0215  , 0.927751, 0.457512,
		       0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
		      dtype=float32)
```

==

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		    %1 : Dynamic = aten::neg(%0)
		    %2 : Dynamic = aten::neg(%1)
		    %3 : Dynamic = aten::neg(%2)
		+   %4 : Dynamic = aten::neg(%3)
		-   return (%3);
		?            ^
		+   return (%4);
		?            ^
		  }
```

==

```
import torch

def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = aten::neg(%0)
		+   %1 : Dynamic = prim::Constant[value=<Tensor>]()
		    %2 : int = prim::Constant[value=1]()
		    %3 : Dynamic = aten::add(%0, %1, %2)
		    return (%3);
		  }
	Node diff:
		- %1 : Dynamic = aten::neg(%0)
		+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
	Trace source location:
		test.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		test.py(8): <module>
	Check source location:
		test.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		test.py(8): <module>
```

The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design is underway for catching these scenarios.

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
 y: array([0., 0., 0., 0.], dtype=float32)
```

==

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
      dtype=float32)
 y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841

Differential Revision: D9499945

Pulled By: jamesr66a

fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
2018-08-28 20:25:26 -07:00
fe15aedacc Store schema in serialized modules and check arguments in function call (#10872)
Summary:
This PR adds argument checking for script method invocation from C++. For this I had to:
1. The schema of a method is currently not serialized in script modules, so we now store the function schema in the `doc_string` field of the ONNX proto. Upon loading a serialized script module, we parse the schema into its structured C++ form and assign it to the loaded method.
2. Inside `Method::operator()`, we now verify the number and types of arguments.

CC zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10872

Differential Revision: D9521219

Pulled By: goldsborough

fbshipit-source-id: 5cb3d710af6f500e7579dad176652c9b11a0487d
2018-08-28 20:11:39 -07:00
ba71547e93 Add clip op to IR
Summary: self-explanatory

Reviewed By: highker

Differential Revision: D9551065

fbshipit-source-id: 14b3807af5337654c360a23816cffd7dd346bad5
2018-08-28 19:25:02 -07:00
90eb0b6031 Cleanup accidental logging
Summary: cleanup

Reviewed By: duc0

Differential Revision: D9549449

fbshipit-source-id: 9154b36a39936566fc2711a6e7bd33049681d1c8
2018-08-28 18:55:29 -07:00
72a84127b1 Add Workspace methods ws.feed_blob(name, arr) ws.remove_blob(name) (#10929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10929

Some Workspace class methods were missing on the Python side.

Exposing them enables writing the new checkpoint framework with more control over the workspace and a cleaner implementation. A short usage sketch follows the method list below.

Added

- ws.feed_blob(name, arr)

- ws.remove_blob(name)
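
A minimal usage sketch of the two new methods (the exact import path of the pybind `Workspace` class is an assumption of this sketch):

```
import numpy as np
from caffe2.python import workspace

# workspace.C.Workspace is assumed to be the pybind class this PR extends
ws = workspace.C.Workspace()
ws.feed_blob("x", np.ones((2, 3), dtype=np.float32))  # new method: write a blob
# ... checkpointing logic can now read/write blobs on `ws` directly ...
ws.remove_blob("x")                                   # new method: clean removal
```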

Reviewed By: mraway

Differential Revision: D9486867

fbshipit-source-id: ea02d2e3a39d716a5a3da0482f57d4ac4c893763
2018-08-28 17:54:34 -07:00
8e5b8490bf Add relevant code for adding caffe2 pybind extensions registry to rocm (#10975)
Summary:
cfa5dbadfc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10975

Differential Revision: D9546838

Pulled By: bddppq

fbshipit-source-id: 3bd6dc0a4eee582bb92fc33ed27fc40eb3ab1200
2018-08-28 15:40:37 -07:00
4cb968fb77 Default hidden visibility (#10752)
Summary:
Flipping to hidden visibility one more time. Let's see what fails.

cc mingzhe09088 pjh5 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10752

Reviewed By: ezyang

Differential Revision: D9526343

Pulled By: orionr

fbshipit-source-id: c0e9c29270e95e1b2e21c598095f720c199e1e52
2018-08-28 15:25:43 -07:00
92ff070b83 Add CPU version of hard sigmoid operator to caffe2 (#10837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10837

Add CPU version of hard sigmoid operator to caffe2. The definition of
this operator can be found here:
https://github.com/onnx/onnx/blob/master/docs/Operators.md#HardSigmoid.
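
For reference, HardSigmoid computes y = max(0, min(1, alpha * x + beta)); a minimal numpy sketch with the ONNX default parameters:

```
import numpy as np

def hard_sigmoid(x, alpha=0.2, beta=0.5):
    # ONNX HardSigmoid: y = max(0, min(1, alpha * x + beta))
    return np.clip(alpha * x + beta, 0.0, 1.0)

print(hard_sigmoid(np.array([-5.0, 0.0, 5.0])))  # [0.  0.5 1. ]
```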

Reviewed By: BIT-silence

Differential Revision: D9489536

fbshipit-source-id: 67b3171ed96d5ebcc8d500d93e7827a4a9705a81
2018-08-28 14:55:49 -07:00
efd2aeac9e Set -Wno-stringop-overflow only with GCC >=7 (#10954)
Summary:
The `stringop-overflow` warning was added in GCC 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10954

Differential Revision: D9546084

Pulled By: SsnL

fbshipit-source-id: e6e68f993f1dbaa879ca66dc43bbcff9c49890ff
2018-08-28 14:25:29 -07:00
b3601a0425 nomnigraph - add documentation for new ReplaceSubgraph api to README.md (#10802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10802

add documentation for new ReplaceSubgraph api to README.md

Reviewed By: yinghai

Differential Revision: D9473282

fbshipit-source-id: 144c895564af83cc8727a0370e894c2f0b7eadf5
2018-08-28 12:55:25 -07:00
cfa5dbadfc Add nomnigraph bindings
Summary: Adds basic nomnigraph python bindings for quickly playing with the graphs.

Reviewed By: duc0

Differential Revision: D9441936

fbshipit-source-id: fd70f8ea279b28c766e40f124008800acd94bddd
2018-08-28 12:40:16 -07:00
a88463cd9a Working async version of AllGather, test fix and compiler warnings, and CI (#10932)
Summary:
The previous NCCL allgather didn't work as expected. This is a fully working async version, tested on both the C++ and Python frontends.

Multi-node:
```
tengli@learnfair042:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=0 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 0
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful

tengli@learnfair117:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=1 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 1
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```

CI test:
```
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allreduce_ops (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_ops (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10932

Differential Revision: D9542067

Pulled By: teng-li

fbshipit-source-id: 25513eddcc3119fd736875d69dfb631b10f4ac86
2018-08-28 12:40:14 -07:00
579bc43a14 Future-proofing embedding.py against heuristic changes (#10959)
Summary:
- rebase of https://github.com/pytorch/pytorch/pull/9851
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10959

Differential Revision: D9542292

Pulled By: weiyangfb

fbshipit-source-id: ce51864d203c8ed89da3817f1da020a0ee932960
2018-08-28 12:40:12 -07:00
3b891d9d49 Support direct access of nn.RNNCellBase
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10944

Differential Revision: D9541085

Pulled By: soumith

fbshipit-source-id: 59077f3b226d04c68a93cd6864894e8f6c594aba
2018-08-28 12:25:12 -07:00
5c58cda8ca Add subname to console output for assertExpected (#10559)
Summary:
Running `--accept` on a test doesn't tell you explicitly which sub-test is being updated; this PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10559

Differential Revision: D9353977

Pulled By: driazati

fbshipit-source-id: a9d4014386ff0fe388a092f3dcf50f157e460f04
2018-08-28 12:13:03 -07:00
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
5ed62ea6fa Add Upsample example for torch onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10550

Reviewed By: orionr

Differential Revision: D9541932

Pulled By: houseroad

fbshipit-source-id: 4d179d189c176482ae919e5cc74607b9d315ed26
2018-08-28 11:39:55 -07:00
22c9bc3117 Resolve builtins using a dict rather than by name (#10927)
Summary:
Changes the approach for resolving builtin ops so that the following works

```
add = torch.add
@torch.jit.script
def foo(x):
  return add(x, x)
```

This handles cases when people alias torch and torch.nn.functional to
shorter names.

This works by building a table of id -> builtin name for the known builtin
ops in torch and torch.nn.functional, and for any user-defined
op created by accessing torch.ops.foo.bar

This allows us to clean up many SugaredValue types in the compiler.

Notes:
* We now consider any attributes on Python modules to be constants
(e.g. math.pi and torch.double).
* Fixes a bug where we incorrectly allowed attribute lookup on arbitrary
Python objects. It is now restricted to modules only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10927

Differential Revision: D9527522

Pulled By: zdevito

fbshipit-source-id: 0280422af08b4b0f48f302766d5a9c0deee47660
2018-08-28 11:25:11 -07:00
c9d337f436 Split IsEmptyOp (#10918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10918

As titled.

Differential Revision: D9515040

fbshipit-source-id: 53c05c160ba5dda92104aadc2e40801519a2cd28
2018-08-28 10:52:28 -07:00
7de830b879 proper sharing in ShareExternalPointer (#10804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10804

Make ShareData and ShareExternalPointer create new storage when the old one is used by multiple tensors.
When we need to modify a field of the storage, we'll create a new storage instead.

Reviewed By: ezyang

Differential Revision: D9350686

fbshipit-source-id: 68d2b6b886b0367b0fc4fabfd55b9a480e7388ca
2018-08-28 10:52:26 -07:00
7f9fd1cc26 allow RandomSampler to sample with replacement (#9911)
Summary:
fixes #7908
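
A short sketch of the new option (a hedged illustration, not taken from this PR's tests):

```
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler

dataset = TensorDataset(torch.arange(10).float())
# replacement=True lets indices repeat; num_samples may exceed len(dataset)
sampler = RandomSampler(dataset, replacement=True, num_samples=32)
loader = DataLoader(dataset, sampler=sampler, batch_size=8)
for (batch,) in loader:
    print(batch)
```
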
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9911

Reviewed By: yf225

Differential Revision: D9023223

Pulled By: weiyangfb

fbshipit-source-id: 68b199bef3940b7205d0fdad75e7c46e6fe65ba7
2018-08-28 10:52:25 -07:00
504d705d0f Support for CUDNN_HOME/CUDNN_PATH in C++ extensions (#10922)
Summary:
Currently we assume the cuDNN includes and libraries can be found under the `CUDA_HOME` root, but this is not always true. So we now support a `CUDNN_HOME`/`CUDNN_PATH` environment variable that can point to an install with its own `/include` and `/lib64` folders.

This means cudnn extensions now also get support on the FAIR cluster.
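
A hedged sketch of a build that picks this up (the cuDNN path and extension names are hypothetical):

```
import os
# Point the extension build at a cuDNN install that lives outside CUDA_HOME
os.environ["CUDNN_HOME"] = "/opt/cudnn-7.1"  # hypothetical path

from torch.utils.cpp_extension import load
# JIT-compile an extension whose sources use cuDNN; file names are hypothetical
ext = load(name="my_cudnn_ext",
           sources=["my_cudnn_ext.cpp", "my_cudnn_kernels.cu"])
```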

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10922

Differential Revision: D9526856

Pulled By: goldsborough

fbshipit-source-id: 5c64a5ff7cd428eb736381c24736006b21f8b6db
2018-08-28 09:40:29 -07:00
1421a9d704 added num_directions explanation to docstrings (#10786)
Summary:
Resolving [https://github.com/pytorch/pytorch/issues/10741](https://github.com/pytorch/pytorch/issues/10741). The current docs use `num_directions` quite a bit without any explanation of it. `num_directions` is 2 if the RNN is bidirectional and 1 otherwise. This change simply adds that to the docs.
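
A small sketch illustrating where `num_directions` shows up in the shapes:

```
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
out, (h, c) = rnn(torch.randn(5, 3, 10))
# num_directions = 2 because bidirectional=True
print(out.shape)  # (seq_len, batch, num_directions * hidden_size) = (5, 3, 40)
print(h.shape)    # (num_layers * num_directions, batch, hidden_size) = (4, 3, 20)
```
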
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10786

Differential Revision: D9480235

Pulled By: zou3519

fbshipit-source-id: f61d1b0d2b943f84d5b7ff83df6fe0965a508a5e
2018-08-28 09:26:06 -07:00
bee779bc83 StorageImpl scalar_type_ to data_type_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10915

Reviewed By: ezyang

Differential Revision: D9526416

Pulled By: cpuhrsch

fbshipit-source-id: 68e43121d72b1b951c73df5bf7b598854fb0e291
2018-08-28 09:26:04 -07:00
82bb9fbedd Remove Scalar.local(). (#10917)
Summary:
It's a no-op now that Scalars don't store tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10917

Differential Revision: D9520267

Pulled By: gchanan

fbshipit-source-id: 5388ff9a4fbb8fc9b9e1ce92208246bf6f08eb92
2018-08-28 07:41:36 -07:00
7c7a2ccb58 Update onnx.rst for v0.4 (#10810)
Summary:
Since we don't need `torch.autograd.Variable` anymore, I removed it from `onnx.rst`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10810

Differential Revision: D9500960

Pulled By: zou3519

fbshipit-source-id: 1bc820734c96a8c7cb5d804e6d51a95018db8e7f
2018-08-28 07:26:01 -07:00
de099564e3 Minor copy-edit on README
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10931

Reviewed By: cpuhrsch

Differential Revision: D9526248

fbshipit-source-id: 2401a0c1cd8c5e680c6d2b885298fa067d08f2c3
2018-08-27 21:09:36 -07:00
de9cc98e66 Stop copying tensor memory when importing IR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10487

Differential Revision: D9370084

Pulled By: li-roy

fbshipit-source-id: ecff1d5d7d006fd60e4f6238ee86c56ad168bfc8
2018-08-27 19:25:42 -07:00
2c342e50e1 Fix a bug in constant prop (#10923)
Summary:
More support for tuples has uncovered a bug in constant prop, where
it assumed it could create constant nodes for tuples, even though we
cannot easily create a single prim::Constant to represent a tuple.
The fix checks whether an IValue can be represented as a prim::Constant
and stops propagating the node when it cannot.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10923

Reviewed By: orionr

Differential Revision: D9523417

Pulled By: zdevito

fbshipit-source-id: 745058c4388d9a5e0fc1553eaa2731e31bc03205
2018-08-27 18:10:17 -07:00
157fb46ffc Add -rdynamic only to linker flags to avoid compiler warnings (#10789)
Summary:
`clang: warning: argument unused during compilation: '-rdynamic'`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10789

Reviewed By: houseroad

Differential Revision: D9467385

Pulled By: bddppq

fbshipit-source-id: 610550a8f34cfa66b9dfa183752eb129dae21eaa
2018-08-27 17:56:21 -07:00
f7b02b3a68 Change Tensor/TensorImpl to use c10::intrusive_ptr (#10824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824

API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}

API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
    - Most methods accessed this way are now accessible on Tensor
      maybe_zero_dim() and set_wrapped_number() being prominent exceptions
      (they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
  TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed, instead
  we go through getIntrusivePtr().  I'm not sure about
  this change; I happened to have accidentally removed the
  TensorBase constructor and decided to fix call sites,
  but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
  (The reason for this change is that it is invalid to cast an intrusive_ptr_target*
  to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
  no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
  it is primarily an API consideration. We can be more standards compliant by
  keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
  pointers of the NullType. (This counts as a bug fix, because the documentation
  specified that pointers produced by release() are valid to reclaim(), and
  a release() on a null intrusive_ptr produces the NullType::singleton())

Bug fixes:
- Dispatch code for mutable references incorrectly returned
  a reference to a value argument (which would immediately
  go out of scope).  They now correctly return a tensor by
  value.
- intrusive_ptr copy/move assignment did not work correctly when
  an object was assigned to itself. We now check for this case and
  no-op if so. (This bug manifested itself as a Tensor mysteriously
  becoming an UndefinedTensor after lines of code like
  'x = x.mul_(y)')

Other changes:
- The checked cast functions in Utils.h have now been
  renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
  have been expunged. This is NO LONGER a valid way to create a new
  tensor, and if you do this, upon your first incref, you will catch an ASSERT
  failure saying that only tensors created by intrusive_ptr::release() are valid
  to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
  other sub-classes of Retainable were modified to use intrusive_ptr.
  When doing this, I had to make the constructors of sub-classes like
  ConstantList public, so that c10::make_intrusive could invoke them.  Fortunately,
  if you incorrectly stack allocate a ConstantList, and then try to get an
  intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
  considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
  which is not always true. This was always the case, but there's now a comment
  explaining what's going on.

Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.

Reviewed By: gchanan

Differential Revision: D9481140

fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
2018-08-27 16:11:01 -07:00
f2bb9f0bb5 speed up kl div loss (#10336)
Summary:
Moved KL div loss to ATen.

Benchmarks for 5000 iterations on input size (1000, 100):

New
```
cuda:
forward [0.9736350309103727, 0.9922929517924786, 0.9694818360731006]
input requires_grad=True:
backward [0.5595634011551738, 0.558339926879853, 0.5546616851352155]
double backward [1.2445648494176567, 1.2245905152522027, 1.2349751549772918]
target requires_grad=True:
backward (new C++) [0.9489959231577814, 0.9553070571273565, 0.9556351029314101]
double backward (new C++) [1.8184774098917842, 1.8164670099504292, 1.845708406995982]

cpu:
forward (new C++) [7.892430987209082, 8.3068826389499, 7.985283812973648]
input requires_grad=True:
backward (new C++) [4.328460982069373, 4.45323242014274, 4.27946363389492]
double backward (new C++) [5.153504415880889, 4.629372010007501, 4.712803596165031]
target requires_grad=True:
backward (new C++) [3.4181493939831853, 3.3771288259886205, 3.7086612950079143]
double backward (new C++) [0.21922698011621833, 0.1858532396145165, 0.19477044604718685]
```

Old
```
cuda:
forward [3.101281268056482, 3.068499860819429, 3.0527669726870954]
input requires_grad=True:
backward [0.5650290949270129, 0.5730433077551425, 0.5588279226794839]
double backward [1.1287697306834161, 1.13834543293342, 1.1298578432761133]
target requires_grad=True:
backward [0.9470391101203859, 0.9560198178514838, 0.9750375030562282]
double backward [1.85760727385059, 1.7989214668050408, 1.788982989732176]

cpu:
forward (new C++) [12.474591840058565, 12.511441555805504, 12.666544185951352]
input requires_grad=True:
backward (new C++) [7.660991386976093, 7.449987292289734, 7.513917901087552]
double backward (new C++) [4.073225498665124, 4.264980792999268, 4.429787891916931]
target requires_grad=True:
backward (new C++) [3.448499082121998, 3.9072313378565013, 3.2433970272541046]
double backward (new C++) [2.126378359273076, 1.9045450473204255, 1.7932004742324352]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10336

Differential Revision: D9213636

Pulled By: li-roy

fbshipit-source-id: 27cc530f6276f58d35dc7a1d56dfc758a0fc4a7b
2018-08-27 16:10:59 -07:00
f5910c8a36 Add MIOPEN recurrent operator (#10840)
Summary:
The goal of this PR is to enable the MIOpen engine (for HIP devices) for the recurrent operator, and also to enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840

Differential Revision: D9518980

Pulled By: bddppq

fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
2018-08-27 15:39:56 -07:00
8e33451e2e Make torch.cuda.* take device objects; Update distributed docs (#10833)
Summary:
Commits:

1. Make `torch.cuda.*` take device objects
2. Update `torch.distributed` docs to emphasize calling `torch.cuda.set_device` before `init_process_group` (a short sketch follows below)
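
A minimal sketch of the recommended pattern (the rank and init_method values are hypothetical):

```
import torch
import torch.distributed as dist

rank = 0  # hypothetical rank handed out by the launcher
torch.cuda.set_device(torch.device('cuda', rank))  # device objects now accepted
dist.init_process_group(backend='nccl', init_method='tcp://127.0.0.1:23456',
                        world_size=1, rank=rank)
```
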
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10833

Differential Revision: D9514241

Pulled By: SsnL

fbshipit-source-id: 2497464305fb1e63d6c495291a5744aaa7e2696e
2018-08-27 15:24:42 -07:00
58b145f515 Fix negative indices in tracer (#10560)
Summary:
Previously, when tracing slicing and select, negative indices would get normalized, fixing the index to the size of the traced tensor. This change makes the behavior the same as script, so aten::select with negative indices is emitted.
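
A small sketch of the behavior this fixes, using the trace call style from the examples earlier in this log:

```
import torch

def pick_last(x):
    return x[:, -1]  # select with a negative index

traced = torch.jit.trace(torch.rand(3, 4))(pick_last)
# The graph now emits aten::select with index=-1 rather than the
# normalized index 3, so the trace stays valid for other widths.
print(traced(torch.rand(3, 7)))
```
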
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10560

Differential Revision: D9493614

Pulled By: eellison

fbshipit-source-id: ce7a8bae59863723247208d86b9f2948051ccc6c
2018-08-27 15:19:41 -07:00
9aa92bc261 Change the default value of DeviceOption.numa_node_id from -1 to 0 (#10877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10877

change default value of DeviceOption.numa_node_id to 0 and use has_numa_node_id() to check existence

Reviewed By: ilia-cher

Differential Revision: D9473891

fbshipit-source-id: 91ac6a152f445644691023110c93d20a3ce80d43
2018-08-27 14:55:46 -07:00
7842b6d0f7 Fix at::optional compile problems on Windows CUDA.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10909

Differential Revision: D9516837

Pulled By: gchanan

fbshipit-source-id: fad7e3284e74c599b873ebaae2dcdf5013505855
2018-08-27 14:40:41 -07:00
6ce799edd6 Tuples/Lists can now be inputs/outputs to script and other simple fixes. (#10812)
Summary:
* Fix the necessary pathways so that tuples and lists can be inputs to the script.

* Prevent linear algebra functions from being run in shape prop, because
they frequently error out on nonsense data.

* Favor schema-driven Python input conversion where possible. The
remaining cases where we directly create Stacks without a schema are
only for debugging.

* Make the error messages when calling script/trace functions more Pythonic.

* Simplify FlattenTuples -- now that tuples are supported we can choose to only flatten tuples when needed. This may have to be revisited pending ONNX test results, but is necessary for making tuple I/O work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10812

Differential Revision: D9477982

Pulled By: zdevito

fbshipit-source-id: ed06fc426e6ef6deb404602a26c435a7fc40ea0c
2018-08-27 14:40:40 -07:00
f64f6eed3a move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10859

Reviewed By: newstzpz

Differential Revision: D9498312

fbshipit-source-id: 08b8a596f774c9102286019f286ca0b74d1f5304
2018-08-27 12:56:46 -07:00
35beecfe17 fix xfails involving literals (#10905)
Summary:
I missed these in #10900

cc apaszke jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10905

Differential Revision: D9516748

Pulled By: zou3519

fbshipit-source-id: a5c3e3b65a33c339d5c4e9fc160462c3d35705f3
2018-08-27 12:41:06 -07:00
f940af6293 Bag of Distributions doc fixes (#10894)
Summary:
- Added `__repr__` for Constraints and Transforms.
- Arguments passed to the constructor are now rendered with :attr:

Closes https://github.com/pytorch/pytorch/issues/10884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10894

Differential Revision: D9514161

Pulled By: apaszke

fbshipit-source-id: 4abf60335d876449f2b6477eb9655afed9d5b80b
2018-08-27 09:55:27 -07:00
67f6f930a8 Remove FIXME_zerol() from test_jit.py (#10900)
Summary:
The scalar situation has gotten a lot better and now we can
remove all instances of FIXME_zerol().

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10900

Differential Revision: D9514206

Pulled By: zou3519

fbshipit-source-id: e4e522f324126c5454cd6de14b832d2d1f6cb0ce
2018-08-27 08:55:08 -07:00
841d779598 Increase BC for PackedSequence ctor (#9864)
Summary:
PackedSequence is never supposed to be created by users, but unfortunately some community repos are already doing this (e.g., [here](7c191048ce/torchmoji/model_def.py (L218-L229))). A change we made broke the calling pattern `PackedSequence(data=x, batch_sizes=y)`. This patch adds back support for that.
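
A sketch of the keyword calling pattern this keeps working:

```
import torch
from torch.nn.utils.rnn import PackedSequence

data = torch.randn(6, 5)                # sum of batch_sizes x feature dim
batch_sizes = torch.tensor([3, 2, 1])   # must be non-increasing
ps = PackedSequence(data=data, batch_sizes=batch_sizes)
print(ps.data.shape, ps.batch_sizes)
```
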
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9864

Differential Revision: D9011739

Pulled By: SsnL

fbshipit-source-id: 0e2012655d7f4863ec54803550df30874ec35d75
2018-08-27 08:25:23 -07:00
c3271b53e4 Remove ability of Scalars to hold Tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10889

Differential Revision: D9512589

Pulled By: gchanan

fbshipit-source-id: 8b2b26c9f3a4da31a46f684793ab237e9ef9a323
2018-08-27 07:26:14 -07:00
3aaad3ecb1 Begin a bestiary of MSVC/NVCC bugs. (#10883)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10883

Differential Revision: D9513997

Pulled By: ezyang

fbshipit-source-id: 37db956e57d86471323d284869bb844f5a4753ac
2018-08-27 07:09:47 -07:00
c8b246abf3 Prevent JIT from overspecializing to every single size configuration (#10844)
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.

Summary of changes:

- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The argument behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make the transition easier, `complete_type->cast<TensorType>()` works, and makes our passes work with both kinds of specialization if they don't need the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches argument only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- Fuser was a part that heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and have to do run-time checks to make sure they match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implemented using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape checking algorithm is included in a comment in `graph_fuser.cpp`.

zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844

Differential Revision: D9498705

Pulled By: apaszke

fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
2018-08-26 09:54:48 -07:00
9679fc5fcd Handling failing test on ROCm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10854

Reviewed By: ezyang

Differential Revision: D9498721

Pulled By: Jorghi12

fbshipit-source-id: 4018383fea5a2a6baff7183b0c0197a4b7a09f20
2018-08-26 07:55:33 -07:00
ddc37d7487 Update mobile predictor caller's interface
Summary: Update all the callers for the new interface

Reviewed By: highker

Differential Revision: D9323167

fbshipit-source-id: a39335ceb402db0719f5f2314085ba9a81380308
2018-08-24 23:40:05 -07:00
d632ccd2c1 Cache isContiguous and numel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10696

Differential Revision: D9437963

Pulled By: cpuhrsch

fbshipit-source-id: 7217682f5e4b69c73d943411d738e4892bb465f5
2018-08-24 22:40:39 -07:00
17dac3e17f Create class constant for string literal 'blob_names'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10827

Reviewed By: boryiingsu

Differential Revision: D9484567

fbshipit-source-id: 275eddc9406b5f427d72c0ab9b0da481b5e59ece
2018-08-24 22:11:43 -07:00
8253cfaa72 Conv BN fusion for 3D conv (#10239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10239

Make Conv + BN fusion also work for 3D convolutions

Reviewed By: duc0

Differential Revision: D9176314

fbshipit-source-id: 6604aa569c5c3afdb4480a5810890bc617e449c4
2018-08-24 21:24:36 -07:00
542aadd9a7 Stop using symbolic override for tracing RNNs (#10638)
Summary:
This disables the symbolic override hacks and makes tracing emit the recently added ATen ops for RNNs (`aten::lstm`, `aten::gru`, ...). I managed to reuse pretty much all of the translation code for their symbolics.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10638

Differential Revision: D9385830

Pulled By: apaszke

fbshipit-source-id: ff06ef7b1ae7c3b7774825e0991bc3887e1ff59b
2018-08-24 20:25:58 -07:00
f2f6e6c0e8 Add registry to pybind_state (#10759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10759

Adding a basic registry pattern to pybind_state so that we can have separate .cc files register module updates. This is substantially cleaner than using multiple pybind modules (which have been known to cause bugs).

Reviewed By: bddppq

Differential Revision: D9441878

fbshipit-source-id: af9e9e98385e92b58ca50e935678328c62684d8e
2018-08-24 17:25:02 -07:00
c172ffb632 Remove the nanopb submodule
Summary:
After making changes internally, really remove the nanopb submodule.

Finalizes https://github.com/pytorch/pytorch/pull/10772

Reviewed By: yns88

Differential Revision: D9504582

fbshipit-source-id: 4517607e5c8054a255c3984b8265f48fede2935b
2018-08-24 16:24:57 -07:00
148ea2a653 Create at::linear (#10799)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/10755 with fix for ONNX

ezyang jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10799

Differential Revision: D9482168

Pulled By: goldsborough

fbshipit-source-id: 85d4bdfcf0d451f2e7a1c83c5f5415cdd6caacdc
2018-08-24 16:02:08 -07:00
1fbabff76a Refactor THCNumerics and add common math functions for at::Half (#10301)
Summary:
**Summary**: This PR is a follow-up to mruberry's https://github.com/pytorch/pytorch/pull/9318/. It tries to achieve the following:
- Specialize std common math functions for the `at::Half` type.
- Create `CUDANumerics.cuh` to contain necessary parts from `THCNumerics.cuh`.
- Update `THCNumerics.cuh` with new usage and comments to demonstrate the best practice for developers, hence making way for its deprecation.
- Remove legacy/redundant code path.
- Remove unused CUDA HALF macros (see separate PR https://github.com/pytorch/pytorch/pull/10147)

**Comments**: `CUDANumerics.cuh` contains mathematical functions that are either not in the std namespace or are specialized for compilation with CUDA NVCC or CUDA NVRTC. This header is derived from the legacy `THCNumerics.cuh`. Here is some of the rationale for why some functions were kept while others were removed:
- All arithmetic can now be done in ATen using a binary CUDA kernel or CUDA tensor pointwise apply (check https://github.com/pytorch/pytorch/pull/8919 and `CUDAApplyUtils`). `at::Half` comparisons rely on implicit conversion to float.
- Functions that are C/C++ standard compliant have been specialized for user-defined types; for instance, the std namespace has been opened up for `at::Half` with math function definitions for `at::Half`. Check `Half-inl.h`.
- Some standard-compliant functions are specialized here for performance reasons. For instance, `powi` is used for `pow` calculation on integral types. Moreover, `abs`, `isinf`, `isnan` are specialized to save one API call versus going through std, although this is subject to change depending on whether we really care about saving one API call.
- Numeric limits such as `max`/`min` are removed since they call standard defines. Moreover, numeric limits for `at::Half` are present in `Half-inl.h`. I understood that HIP has some issues with `std::numeric_limits`; this is the related GitHub issue I found: https://github.com/ROCm-Developer-Tools/HIP/issues/374. AlexVlx mentions that the issue can be avoided by using `std::numeric_limits` in a `__device__` context. Since we are launching lambdas with device contexts, I don't see why `std::numeric_limits` wouldn't compile on HIP if used with a device context within a kernel, unless I am unaware of the real reason why max/min was in THCNumerics in the first place. (I haven't tried a build with HIP.)

Here are some reference PRs that was handy in refactoring TH into ATen:
- https://github.com/pytorch/pytorch/pull/6786
- https://github.com/pytorch/pytorch/pull/5475
- https://github.com/pytorch/pytorch/pull/9401
- https://github.com/pytorch/pytorch/pull/8689
- https://github.com/pytorch/pytorch/pull/8919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10301

Differential Revision: D9204758

Pulled By: soumith

fbshipit-source-id: 09f489c1656458c02367b6cd31c3eeeca5acdc8a
2018-08-24 16:02:06 -07:00
87a7840fa6 Remove Tensor constructor of Scalar. (#10852)
Summary:
This is a step along the way of removing Tensor as a member of the tagged union in Scalar. This simplifies ordering dependencies, because currently Scalar and Tensor both depend on each other (so we introduce a TensorBase). Also, this API isn't particularly useful publicly: we can't autograd through Scalars, so you still need a Tensor overload basically everywhere anyway.

I'm undecided what the final API should be here.  We could keep a Tensor constructor on Scalar, but have it generate a local scalar; this is convenient but given this API used to be non-synchronizing, it may not be the best.

For now, I'm just using _local_scalar, which is clear, although we should get rid of the prefix _ if that's the API we intend to promote.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10852

Reviewed By: ezyang

Differential Revision: D9496766

Pulled By: gchanan

fbshipit-source-id: 16f39b57536b9707132a5a4d915650c381bb57db
2018-08-24 16:02:05 -07:00
0d5584d8d7 Revert D9492561: [pytorch][PR] Moving the operator argument to the front for kernelPointwiseApply.
Differential Revision:
D9492561

Original commit changeset: d0f0e2ab7180

fbshipit-source-id: fc822e63b11866195ff7883f360338a41e25d9e2
2018-08-24 16:02:04 -07:00
0ef5cfd28c fix ivalue printing for lists (#10777)
Summary:
Fixing the printing of IValue lists, which didn't work previously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10777

Differential Revision: D9474264

Pulled By: eellison

fbshipit-source-id: 0c7d6e7ecaa3f7908b131ac9f1036f19ac4f8b4f
2018-08-24 16:02:03 -07:00
983e0f2413 Remove Node::invalidateSchema (#10822)
Summary:
The schema_ field is a private and internal cache for nodes, and no
methods meant to manipulate it should be publicly visible. This call
wasn't even necessary at its call site, since removeInput will reset the
schema by itself.

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10822

Reviewed By: zdevito

Differential Revision: D9498683

Pulled By: apaszke

fbshipit-source-id: 42e1743e3737cb7d81f88e556204487d328c0e47
2018-08-24 16:02:01 -07:00
74e6a666b3 If none of the schema match, add ImplicitTensorToNum conversions where needed. (#10180)
Summary:
When matching schema, first try to match without adding TensorToNum conversions. Then make another pass where TensorToNum conversions are allowed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10180

Differential Revision: D9438153

Pulled By: eellison

fbshipit-source-id: 80541b5abd06e9d4187e89dda751f44dab6f58c5
2018-08-24 16:02:00 -07:00
474684cf03 Re-sync with internal repository (#10868) 2018-08-24 15:48:03 -07:00
8044dc4eb8 Support new Reshape semantics (#10848)
Summary:
Since ONNX opset version >5, Reshape changed semantics to take a shape tensor as input instead of relying on the `shape` attribute to decide what shape to reshape to. The ONNXIFI op has been postponing this change as some of the backends, such as TensorRT, were not ready. Now that the backends have adopted these semantics, we can remove the legacy mode and output opset version 7 ONNX models.

This change also flushes out some bugs and new requirements:
- Convert shape info into an int64 tensor
- Fix a bug where we output the shape tensor into the mapped workspace instead of the original workspace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10848

Reviewed By: houseroad

Differential Revision: D9495121

Pulled By: yinghai

fbshipit-source-id: a6f44a89274c35b33fae9a429813ebf21d9a3d1a
2018-08-24 11:46:41 -07:00
8130b1a950 Ignore stack frames coming from python3 object file (#10627)
Summary:
goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10627

Reviewed By: ezyang

Differential Revision: D9384411

Pulled By: apaszke

fbshipit-source-id: ce4f6edb9ffbd0c7e320b9347da10399de472150
2018-08-24 11:26:21 -07:00
6e2f6dc6e6 Move Allocator and Device to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10798

Reviewed By: ezyang

Differential Revision: D9466602

fbshipit-source-id: f5bda17045076d8c81be9fa5a0749c97bf274b5f
2018-08-24 11:26:19 -07:00
f1df85d799 bug-fix in normal_( ) (#10846)
Summary:
- fixes #10642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10846

Differential Revision: D9495014

Pulled By: weiyangfb

fbshipit-source-id: 35a9fc349f9f0c21a24141f29c62853ab6a68dae
2018-08-24 11:26:18 -07:00
313139d14e Moving the operator argument to the front for kernelPointwiseApply. (#10829)
Summary:
Currently on PyTorch AMD, memory accesses on the TensorInfo struct contained in the Operators passed into the kernelPointwiseApply kernel lead to hangs in the HCC runtime. Permuting the argument order such that the operator comes first alleviates this issue and the kernel hangs disappear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10829

Reviewed By: ezyang

Differential Revision: D9492561

Pulled By: Jorghi12

fbshipit-source-id: d0f0e2ab7180e55846db909f2744b8c8b110205e
2018-08-24 11:10:43 -07:00
e3d12d7afb Automatic update of fbcode/onnx to 6146a85d371481222c10ede4430ad5476e60de87 (#10831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10831

Previous import was 7848f1e0414ba3b2e263609d93d46fd60790b2e9

Included changes:
- **[6146a85](https://github.com/onnx/onnx/commit/6146a85)**: Check pybind version (#1315) <Changming Sun>
- **[2cbf740](https://github.com/onnx/onnx/commit/2cbf740)**: Domain exists in GraphProto but not in Node (#1310) <Ryan Hill>
- **[9b874e9](https://github.com/onnx/onnx/commit/9b874e9)**: [Title] Add optimization pass eliminating nop Pad (#1307) <Tingfan Wu>

Reviewed By: yinghai

Differential Revision: D9485475

fbshipit-source-id: 3adb4e6e182278fd2abe5068a9d4569763e0ff0c
2018-08-24 10:54:40 -07:00
3c9775fff8 Remove nanopb since we've switched to protobuf (#10772)
Summary:
We no longer use nanopb in PyTorch (or Caffe2), so we are removing it. All protobuf manipulation should go through standard protobuf, which is statically linked inside libcaffe2.so by default.

cc zdevito pjh5 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10772

Reviewed By: pjh5

Differential Revision: D9465894

Pulled By: orionr

fbshipit-source-id: 8cdf9f1d3953b7a48478d381814d7107df447201
2018-08-24 10:54:38 -07:00
8c13971f57 Remove protobuf require and use requirements.txt (#10771)
Summary:
In preparation for making FULL_CAFFE2 the default, users shouldn't be required to have protobuf installed.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10771

Reviewed By: pjh5

Differential Revision: D9474458

Pulled By: orionr

fbshipit-source-id: 3e28f5ce64d125a0a0418ce083f9ec73aec62492
2018-08-24 10:39:40 -07:00
474bd60bad Provide a tensor overload to mul_out_sparse_scalar. (#10828)
Summary:
This is a small part of the effort to remove Tensor as a tagged member in Scalar because it is inconsistent with how we normally do overloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10828

Differential Revision: D9485049

Pulled By: gchanan

fbshipit-source-id: 103f5cc03bb7775cd2d3a0a5c0c5924838055f03
2018-08-24 09:39:26 -07:00
e146518e46 Fix AT_CUDA_CHECK and AT_CUDNN_CHECK macros (#10834)
Summary:
Previously, the macros evaluated the expression multiple times on error.

For example:

```
AT_CUDA_CHECK(cudaStreamWaitEvent(ptr->stream, event, 0));
```

would previously expand to

```
if (cudaStreamWaitEvent(ptr->stream, event, 0) != cudaSuccess) {
    AT_ERROR("CUDA error: ", cudaGetErrorString(cudaStreamWaitEvent(ptr->stream, event, 0)));
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10834

Differential Revision: D9493257

Pulled By: colesbury

fbshipit-source-id: d2473020fd83a25aa421171d19c8dfe559155a9b
2018-08-24 09:09:18 -07:00
ca567862b2 Support multidimensional indexing (#10787)
Summary:
Part of #10774.

This PR does the following:
- Support ast.ExtSlice in the frontend. This is done by returning a
  list of ast.Index and ast.Slice.
- Support multidimensional indexing with ints and slices

The general approach is to desugar multidimensional indexing into
at::slice and at::select operations. This is exactly how normal PyTorch
does indexing (by desugaring it into at::slice, at::select, and other ops).
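
A minimal sketch of the newly supported indexing in script:

```
import torch

@torch.jit.script
def crop(x):
    # mixes an int and a slice; desugared to aten::select / aten::slice
    return x[0, 1:3]

print(crop(torch.arange(12).view(3, 4)))  # tensor([1, 2])
```
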

I used [this code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp) as reference.
We should be able to copy the rest of this to implement the missing
indexing features in script (indexing with ellipses, tensors, sequences, etc).

After I'm done implementing the missing indexing features in future PRs, I can try to
templatize python_variable_indexing.cpp so that it can work with both JIT
script and normal PyTorch indexing, but right now I'm not sure if that's
a good idea or not.

cc zdevito jamesr66a apaszke wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10787

Differential Revision: D9481402

Pulled By: zou3519

fbshipit-source-id: 78c9fa42771a037d157879e23e20b87401cf1837
2018-08-24 08:10:32 -07:00
6993e4a9f7 Caffe2 Functional enforcing inplace output (#10797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10797

A few operators enforce in-place output (e.g., running mean/var for SpatialBN). Functional right now doesn't follow the inplace_enforced_ rules in OpSchema, and therefore RunNetOnce() will fail on OpSchema->Verify(). Edit the output_names in Functional following the rules to pass the check.

Reviewed By: jerryzh168

Differential Revision: D9470582

fbshipit-source-id: 168efeccecc32184bd1d02f3fefe8e61faa4e0f4
2018-08-23 22:42:47 -07:00
8da4167129 Fix performance regression (#10835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10835

The last diff, on the constructor, caused a performance regression in cold runs.
This one tries to fix it.

Reviewed By: highker

Differential Revision: D9489617

fbshipit-source-id: a77c2e2c903a73e2ad9806b4f9c209cdb751442f
2018-08-23 19:55:23 -07:00
df2d48b42c Added PrefixStore, pybind, test for group backward compatibility (#10762)
Summary:
Added PrefixStore support.

This makes groups backward compatible.

Tests are covered too.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./FileStoreTest
Using temporary file: /tmp/testoglRl4
Using temporary file: /tmp/testepZIpB
Test succeeded
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./TCPStoreTest
Test succeeded
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10762

Differential Revision: D9484032

Pulled By: teng-li

fbshipit-source-id: 85754af91fe3f5605087c4a2f79ae930a9fd1387
2018-08-23 18:10:37 -07:00
61b34d42e7 nomnigraph - isSubgraphMatch returns the matched Subgraph & map from MatchNodes to graph nodes (#10605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10605

Make isSubgraphMatch return the matched subgraph and a map from MatchNodes to graph nodes in the result, which makes it easier to write graph fusion logic. Also include some more helper methods for the NN subgraph matcher.

Reviewed By: bwasti

Differential Revision: D9374931

fbshipit-source-id: 3a273295eec81a43027ec3a9e835d27f00853df9
2018-08-23 16:40:19 -07:00
ee022a476a Added this-consts to all methods on SymbolicVariable (#10805)
Summary:
Self-explanatory. See https://github.com/pytorch/pytorch/issues/9109 or T32954812 for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10805

Reviewed By: ezyang

Differential Revision: D9477686

Pulled By: hakobyant

fbshipit-source-id: 73dd84e5295e4c749bd6416ce2f6eb7590f05cbc
2018-08-23 16:25:27 -07:00
9403e0cac0 Use ATen implementation of RNNs (#10761)
Summary:
apaszke recently ported RNNs from Python into ATen, which means we can replace our implementation in the C++ API (written by ebetica) with the ATen implementation, which cleans up a lot of code (+99, -323). Thanks apaszke!

I also added the `bidirectional` and `batch_first` options to the C++ API RNN options, just because why not.

apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10761

Differential Revision: D9443885

Pulled By: goldsborough

fbshipit-source-id: b6ef7566b9ced2b2f0b2e1f46c295b6f250c65a8
2018-08-23 16:12:14 -07:00
a4c59a9dab MIOpen integration, more tests enabled, bug fixes (#10612)
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build

With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612

Reviewed By: bddppq

Differential Revision: D9423872

Pulled By: ezyang

fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
2018-08-23 15:24:47 -07:00
3d43a82440 Add support for vararg style functions. (#10250)
Summary:
Things like `zeros(1,2,3, dtype=torch.int)` are now supported in the script by altering tryMatchSchema to auto-construct the list `[1,2,3]` when it sees inlined members of the list as the last positional arguments.
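
A minimal script sketch of the newly supported call style:

```
import torch

@torch.jit.script
def make_grid():
    # the inline sizes 1, 2, 3 are auto-collected into the list argument [1, 2, 3]
    return torch.zeros(1, 2, 3, dtype=torch.int)

print(make_grid().shape)  # torch.Size([1, 2, 3])
```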

I suggest reading the commits individually, since the first two incrementally change how we do tryMatchSchema to get it ready for adding vararg list conversion, while the third actually does the modification.

closes #10632
closes #8516
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10250

Differential Revision: D9478235

Pulled By: zdevito

fbshipit-source-id: 0c48caf7a6184e463d9293d97015e9884758ef9c
2018-08-23 15:10:36 -07:00
9dbcc9cebd Move _raw_* intrusive pointer manipulations to raw_intrusive_ptr_target (#10779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10779

The idea is to let classes opt in to providing these methods
by default.

Reviewed By: jerryzh168

Differential Revision: D9466076

fbshipit-source-id: b6beee084cc71d53ce446cdc171d798eeb48dc12
2018-08-23 14:32:24 -07:00
dec3ed7b49 Increase the limit for Proto size (#10745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10745

ParseProtoFromLargeString hits the limit when using recurring v2. To unblock the warmup project, we can increase the limit temporarily. More details in this post -- https://fb.facebook.com/groups/264913123977784/permalink/463566404112454/

Differential Revision: D9436368

fbshipit-source-id: 54488f27ef941cab679843cb0c502095dd056c1b
2018-08-23 13:55:50 -07:00
432b3adffc Print blob sizes on fatal signal (#10766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10766

Added a `Workspace::ForEach(...)` API for accessing the global set of
existing Workspace instances. This is used in the signal handler to print blob
info on the thread receiving a fatal signal.

Reviewed By: mraway

Differential Revision: D9147768

fbshipit-source-id: a94d0b5e6c88390a969ef259ecb8790173af01a4
2018-08-23 13:39:55 -07:00
82ddeb7f2b Using shared implementation in Tensor (#10619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10619
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9047

Reviewed By: jerryzh168

Differential Revision: D8417101

fbshipit-source-id: 98e0a3275864283c2f06d28f4c9b859b5827ed4d
2018-08-23 13:39:53 -07:00
23a366be33 Use ATen native functions for THCTensor_cadd/cmul/cdiv/csub (#10707)
Summary:
This seems to save a few percent in binary size in libcaffe2_gpu.so, but
the effect may not be real. In fact, deleting some functions can cause
the binary size to increase (perhaps due to alignment issues).

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10707

Differential Revision: D9409009

Pulled By: colesbury

fbshipit-source-id: 282931e562e84e316a33ac6da4788c04c2984f08
2018-08-23 13:31:03 -07:00
0f5c8edfd3 Removes unused THCState code paths (#9735)
Summary:
To prepare THCState for refactoring into ATen, this PR removes unused THCState code paths. In particular, it:

- Removes the UVA Allocator
- Removes the THDefaultDeviceAllocator
- Respects the 1 BLAS and 1 sparse handle per device reality
- Removes kernel p2p access
- Removes setting p2p access
- Removes the GCHandler code path
- Removes many unused THCState_... functions
- Removes THCThreadLocal.h/.cpp

It does not change the preexisting external behavior of any remaining function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9735

Differential Revision: D9438558

Pulled By: SsnL

fbshipit-source-id: dde9acbec237a18bb6b75683e0526f7ff1c9a6ea
2018-08-23 13:10:05 -07:00
ab9e7ae23e Add CUDA implementation of LARS --caffe2 (#10509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10509

This diff enables CUDA implementation of LARS operator in caffe2.

Reviewed By: enosair

Differential Revision: D9318356

fbshipit-source-id: 365b9f01e3afd4d9d3ba49155e72e728119f40c5
2018-08-23 12:55:57 -07:00
b14f2e899c Preserve sparse tensor shape and dim invariants, and add scalar tensor support (#9279)
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:

```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2,  shape: (_sparseDims, nnz)
_values.shape:  dimensionality: 1 + _denseDims.  shape: (nnz, shape[_sparseDims:])
```
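
A concrete instance of these invariants, as a sketch:

```
import torch

# shape (2, 3, 4) with _sparseDims = 2, _denseDims = 1, nnz = 2
i = torch.tensor([[0, 1],
                  [2, 0]])             # (_sparseDims, nnz) = (2, 2)
v = torch.randn(2, 4)                  # (nnz, *shape[_sparseDims:]) = (2, 4)
t = torch.sparse_coo_tensor(i, v, (2, 3, 4))
print(t._indices().shape, t._values().shape)  # (2, 2) and (2, 4)
```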

This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.

Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279

Differential Revision: D8936683

Pulled By: yf225

fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
2018-08-23 10:10:24 -07:00
0eb2c83006 Fix link in THNN/README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10821

Differential Revision: D9481118

Pulled By: soumith

fbshipit-source-id: 0a416202eb4db025ec7d395e70344cbbf626fec0
2018-08-23 09:25:16 -07:00
fcfb1c1979 Make more distributions jittable
Summary:
This uses zou3519's new `torch.broadcast_tensors()` #10075 to make `Categorical.log_prob()` and the `*Normal.__init__()` methods jittable. Previously `.log_prob()` was failing due to calls to `torch._C.infer_size()` with errors like
```
    def log_prob(self, value):
        if self._validate_args:
            self._validate_sample(value)
>       value_shape = torch._C._infer_size(value.size(), self.batch_shape) if self.batch_shape else value.size()
E       RuntimeError: expected int at position 0, but got: Tensor
```
After this change I'm able to jit many more of Pyro's tests.

Reviewed By: ezyang

Differential Revision: D9477487

Pulled By: apaszke

fbshipit-source-id: 5f39b29c6b8fa606ad30b02fefe2dfb618e883d6
2018-08-23 08:09:49 -07:00
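For context, `torch.broadcast_tensors()` materializes the common broadcast shape without a `torch._C._infer_size()` call, which is what makes these methods traceable. A minimal sketch (editor's illustration):

```python
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
x, y = torch.broadcast_tensors(a, b)  # both expanded to the common shape
assert x.shape == y.shape == (3, 4)
```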
529fc68df2 Update docs with clean (#10819)
Summary:
Add tip about cleaning if installing ninja after a build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10819

Reviewed By: soumith

Differential Revision: D9480095

Pulled By: erikbrinkman

fbshipit-source-id: 96ae1387038afe6964a1bd1e2186468f6a5ea12f
2018-08-23 07:25:19 -07:00
deda05e59f Revert D9395814: move HeatmapMaxKeypointOp unittest to oss
Differential Revision:
D9395814

Original commit changeset: 25073eb6b143

fbshipit-source-id: 56f2b7b57e3c6361e2d78e5ba7850ea3b89e98fb
2018-08-23 06:54:29 -07:00
b885dea300 parallize the dense part in event models
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10768

Reviewed By: Wakeupbuddy

Differential Revision: D9445750

fbshipit-source-id: b8c2ddfe3ccb9278506de15a5e43bada016408f7
2018-08-22 22:40:07 -07:00
5c0eece2fd Force types on values returned from if blocks to be equivalent (#10281)
Summary:
When emitting if branches, check that the types of each returned value are equivalent. As with reassignment of values, tensors are not forced to be the same shape or subtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10281

Differential Revision: D9466566

Pulled By: eellison

fbshipit-source-id: 746abdeb34a0f68806b8e73726ad5003b536911c
2018-08-22 19:55:38 -07:00
9a43fc5eaa move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10674

Reviewed By: newstzpz

Differential Revision: D9395814

fbshipit-source-id: 25073eb6b143fc1e7cbf5f887545d2b7df15c9a9
2018-08-22 19:11:10 -07:00
4aa5075cae update the constructor to accept the PredictorConfig only to set up the predictor (#9483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9483

The interface is updated to accept the config to construct the predictor.

Reviewed By: highker

Differential Revision: D8872999

fbshipit-source-id: 3ca54d644970823fc33c0ade9a005e12f52e2b24
2018-08-22 19:11:09 -07:00
f0ec3bfa56 Changes for Python3 compatibility (#10524)
Summary:
Review by tomdz volkhin anshulverma
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10524

Reviewed By: ezyang

Differential Revision: D9328001

Pulled By: huitseeker

fbshipit-source-id: 144721c4fd9a1ea6cf6673793416f20cb448aa93
2018-08-22 18:55:01 -07:00
44b47fd7f3 Working pybind version of MPI process group and abort() pybind (#10606)
Summary:
This will make the pybind version of the MPI process group work. The issue is that the scope of the tensor list won't be available to the MPI worker thread, so we pass the vector by value instead.

Also added a recv_anysource pybind to make it work. The front-end API will wrap one level up with an int for this function, so taking a tensor should be the easiest way for now.

Also added abort pybind and fixed the flaky test.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ mpirun -np 8 ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10606

Differential Revision: D9474393

Pulled By: teng-li

fbshipit-source-id: cca236c333656431e87d0d3573eeae9232c598b0
2018-08-22 18:26:04 -07:00
6c75fc0aa3 Integrating stochastic quantization to easgd to reduce communication + supporting quantization on both sides (split from D8849770) (#10644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10644

Depends on D8493264

Reviewed By: chocjy, boryiingsu

Differential Revision: D9347706

fbshipit-source-id: 6fdcc5b61098bf47ec9391b1f009b0e6a0615842
2018-08-22 17:10:03 -07:00
f72e813c2f Allow tracing functions that take tuples of tensors as inputs (#10637)
Summary:
And return tuples.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10637

Reviewed By: eellison

Differential Revision: D9385892

Pulled By: apaszke

fbshipit-source-id: 542f4444d909fb246d7f1d88d6fb98345de2d431
2018-08-22 15:37:10 -07:00
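What the commit above enables, as a minimal sketch (editor's illustration): tracing a function whose single argument is a tuple of tensors and whose output is a tuple.

```python
import torch

def f(pair):
    a, b = pair
    return a + b, a * b  # tuple of tensors in, tuple of tensors out

example = (torch.randn(2), torch.randn(2))
traced = torch.jit.trace(f, (example,))  # the one example argument is itself a tuple
s, p = traced((torch.ones(2), torch.ones(2)))
```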
043a2e36e5 Removing setup_caffe2.py (#10734)
Summary:
FULL_CAFFE2=1 python setup.py (install | build_deps develop) should be all anyone needs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10734

Reviewed By: orionr

Differential Revision: D9439354

Pulled By: pjh5

fbshipit-source-id: 0169afcda4f8f38c57498ba2151f7654ecce6070
2018-08-22 15:37:07 -07:00
6c84f7fea0 Relax RHS type assert for augassign (#10730)
Summary:
Augassign (i.e., `x += 1`) gets desugared to an assignment of a binop (`x = x + 1`).
Right now we assert that the RHS of the binop is a tensor,
but it really doesn't have to be because we support scalar/scalar ops and also
list-list ops (i.e., `[1, 2] + [2, 3]`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10730

Differential Revision: D9465110

Pulled By: zou3519

fbshipit-source-id: 7b118622701f09ce356aca81b8db743d9611097b
2018-08-22 15:10:33 -07:00
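A small script-mode sketch of the relaxed case (editor's illustration, written in current annotation syntax): scalar/scalar augassign now compiles because the RHS of the desugared binop no longer has to be a tensor.

```python
import torch

@torch.jit.script
def bump(x: torch.Tensor) -> torch.Tensor:
    n = 1
    n += 1  # desugars to n = n + 1; int + int, no tensor on the RHS
    return x + n
```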
d40a598777 Back out "[pytorch][PR] Create at::linear" (#10785)
Summary:
Multiple failing external and internal CI signals were ignored when this commit
was landed. goldsborough please fix the test failures and resubmit this change as a
new PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10785

Reviewed By: ezyang

Differential Revision: D9466791

Pulled By: jamesr66a

fbshipit-source-id: b260e93bac95d05fd627c64e620b6aefb5045949
2018-08-22 14:39:59 -07:00
6fcac354c5 Erase ListConstruct nodes for ONNX export (#10713)
Summary:
ONNX doesn't support this. Instead, flatten the inputs to the ListConstruct op and inline it into the subsequent usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10713

Differential Revision: D9458508

Pulled By: jamesr66a

fbshipit-source-id: 0b41e69320e694bb2f304c6221864a39121e4694
2018-08-22 14:39:58 -07:00
de11a5fb28 Resubmit #8322 with scipy version check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10775

Differential Revision: D9458207

Pulled By: SsnL

fbshipit-source-id: f2b0dbf2d236134afded9b15d8bf55ff98f50e7b
2018-08-22 13:39:49 -07:00
ee3e48d34b Move Backend, Layout, ATenGeneral, Deprecated, Generator to ATen/core. (#10740)
Summary:
I included "legacy" includes in the old spots for Backend, Generator, Layout; it seemed unlikely that the other ones had direct user includes.

This is another step on the path to move Type/Tensor to ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10740

Reviewed By: ezyang

Differential Revision: D9435888

Pulled By: gchanan

fbshipit-source-id: 89f4f0f445d4498a059d3a79069ba641b22bbcac
2018-08-22 13:39:46 -07:00
5ca2713a8b Fix performance of WeightedRandomSampler (#10636)
Summary:
Since https://github.com/pytorch/pytorch/pull/8958 was merged, BatchSampler samples 0-d tensors from WeightedRandomSampler instead of integers, which significantly reduces performance. This PR fixes it the same way https://github.com/pytorch/pytorch/pull/10361 fixed DistributedSampler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10636

Differential Revision: D9423869

Pulled By: zou3519

fbshipit-source-id: f94da2d4cccf70e63beea6cfc3d1230b5610ae44
2018-08-22 13:15:48 -07:00
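A minimal sketch of the affected path (editor's illustration, assuming the fix mirrors the DistributedSampler change and yields plain ints again):

```python
import torch
from torch.utils.data import BatchSampler, WeightedRandomSampler

sampler = WeightedRandomSampler(weights=[0.1, 0.9, 0.4, 0.7],
                                num_samples=8, replacement=True)
batches = list(BatchSampler(sampler, batch_size=4, drop_last=False))
# After the fix the sampler yields Python ints, not 0-d tensors.
assert all(isinstance(i, int) for batch in batches for i in batch)
```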
0e30fa6f3c Faster random number generation in fused_rowwise_random_quantization_ops (#10634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10634

```
Trying example: test_speed_of_rand_quantization(self=<caffe2.caffe2.python.operator_test.rand_quantization_op_speed_test.TestSpeedFloatToFusedRandRowwiseQuantized testMethod=test_speed_of_rand_quantization>, bitwidth_=2, random_=True, data_shape_=array([1024, 1224]), gc=, dc=[, device_type: 1])
Sub+Scale+Sum time: 1.9944190979003908 ms
Quantizing time: 2.080512046813965 ms (1.0431669296609765X)
De-quantizing time: 0.7375001907348633 ms (0.36978195380863577X)
```

```
Trying example: test_speed_of_rand_quantization(self=<caffe2.caffe2.python.operator_test.rand_quantization_op_speed_test.TestSpeedFloatToFusedRandRowwiseQuantized testMethod=test_speed_of_rand_quantization>, bitwidth_=1, random_=True, data_shape_=array([1024, 1224]), gc=device_type: 1, dc=[, device_type: 1])
Sub+Scale+Sum time: 1.6691923141479492 ms
Quantizing time: 7.500243186950684 ms (4.493336761366071X)
De-quantizing time: 1.1209726333618164 ms (0.6715658967876477X)
```

Reviewed By: jspark1105

Differential Revision: D8849770

fbshipit-source-id: 2bb2bac7e633f647f38e419ce980b8958f3bcae2
2018-08-22 13:15:46 -07:00
754ec9e386 Reduce rocm link time with ThinLTO
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10758

Differential Revision: D9467554

Pulled By: bddppq

fbshipit-source-id: 6853ccd96ac3209e062c110913ea37d6840c8134
2018-08-22 13:15:45 -07:00
9767951ca8 Remove regex matching from undefined_tensor_test, fixes #10013 (#10702)
Summary:
Don't regex against strings that may have come from the backtrace.
Better to just not regex at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10702

Reviewed By: ezyang

Differential Revision: D9406154

Pulled By: jsrmath

fbshipit-source-id: 9b17abee2a6e737a32c05f1e3963aef4b6638a47
2018-08-22 12:39:57 -07:00
b0ad8105d2 Split storage from tensor (#10053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053

Tensor in Pytorch 1.0 will have
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
We'll have Tensor -> Storage -> StorageImpl after this diff

Reviewed By: ezyang

Differential Revision: D9384781

fbshipit-source-id: 40ded2437715a3a2cc888ef28cbca9a25b1d5350
2018-08-22 11:55:02 -07:00
5fb9b31ed5 Add matrix_rank (#10338)
Summary:
- Similar functionality as NumPy
- Added doc string
- Added tests

Differential Revision: D9240850

Pulled By: SsnL

fbshipit-source-id: 1d04cfadb076e99e03bdf699bc41b8fac06831bf
2018-08-22 09:58:38 -07:00
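A quick usage sketch (editor's illustration; later releases moved this functionality to `torch.linalg.matrix_rank`):

```python
import torch

a = torch.eye(4)
print(torch.matrix_rank(a))  # tensor(4)
a[-1, -1] = 0                # make the matrix rank-deficient
print(torch.matrix_rank(a))  # tensor(3)
```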
fbd7189949 add explicit flag to build static libtorch (#10754)
Summary:
I've tested locally that this works to build static and non-static binaries with and without CUDA.

In terms of ongoing testing, I am working on incorporating this into the release package generation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10754

Differential Revision: D9457423

Pulled By: anderspapitto

fbshipit-source-id: aa1dcb17c67c0f0c493a9cf93aca4a6e06b21666
2018-08-22 09:26:07 -07:00
227635142f Delete THD master_worker (#10731)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731

Differential Revision: D9423675

Pulled By: ezyang

fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc
2018-08-22 08:54:36 -07:00
2fe5fa78fa Use FinishDeviceComputation instead of adding events in Operator::SyncDevice
Summary: The code in Operator::SyncDevice had some duplicate logic and using FinishDeviceComputation sufficed in this case.

Reviewed By: yinghai

Differential Revision: D9348288

fbshipit-source-id: d8d874bab491e6d448fcd5fa561a8b99d502753b
2018-08-22 01:09:53 -07:00
22446a3619 Productionize CRF layer in PyText (#10362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10362

This diff implements a manual export from PyText's CRF module to the caffe2 CRF layer.
Note that most of the changes in caffe2/python/crf.py are just formatting changes, the only relevant change is the new class CRFUtils.

Reviewed By: hikushalhere

Differential Revision: D9234126

fbshipit-source-id: 1a67d709034660e8b3d5ac840560b56de63e3f69
2018-08-22 00:25:26 -07:00
19031c68dc Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488)
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage

This patch does two major changes:

- It replaces the use of Retainable in Storage with a new implementation
  based on intrusive_ptr.  This will be necessary because Caffe2 will
  be using this class to implement intrusive_ptrs, and we need to
  line these up for the merge.  One good thing about the new implementation is
  that the default copy/move constructors/assignment operators and destructor
  work automatically, instead of needing to be hardcoded into Storage/Tensor.

- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing an unnecessary double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.

I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and this making
the API change was the most straightforward way to do this.

HOW TO FIX YOUR CODE IN THE NEW API

- You no longer need to dereference the result of tensor.storage() to pass
  it to set.  So, instead of:

      x.set_(*y.storage());

  just write:

      x.set_(y.storage());

- If you were accessing methods on StorageImpl via the pImpl() method, you
  must use the dot operator to run pImpl().  Even better; just drop pImpl,
  we now have method forwarding.  So, instead of:

      storage->pImpl()->data();

  just do:

      storage->data();
      // storage.pImpl()->data() works too but is not as recommended

- storage->getDevice() is no more; instead use storage->device().index()

MISC CODE UPDATES

- retain, release, weak_retain, weak_release and weak_lock are now
  reimplemented using the "blessed API", and renamed to make it
  clearer that their use is discouraged.

- nvcc OS X and general OS X portability improvements to intrusive_ptr

- A new comment in intrusive_ptr describing how stack allocated
  intrusive_ptr_targets work differently than heap allocated ones
  from c10::make_intrusive

CAVEAT EMPTOR

- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
  works with intrusive_ptr.  You must reclaim the strong pointer into a
  real strong pointer, construct a weak pointer from it, and then release
  the strong and weak pointers.  See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488

Reviewed By: gchanan

Differential Revision: D9306134

Pulled By: ezyang

fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
2018-08-21 21:39:55 -07:00
abb209ef25 Fixes *fft docs (#10760)
Summary:
cc cranmer

fixes #10751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10760

Differential Revision: D9444473

Pulled By: SsnL

fbshipit-source-id: a4036773a93981801c1283d69f86e30cb0fe3d6d
2018-08-21 21:09:04 -07:00
e5e2514f4e fix debug_info arg in createOperator and improve reroute_tensor (#10736)
Summary:
-Fixed C2 core.CreateOperator debug info assignment
-Improving core.Net.reroute_tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10736

Differential Revision: D9426659

Pulled By: harouwu

fbshipit-source-id: 90caf848c88854e17e568d5f6910dc6c81fd000a
2018-08-21 19:40:16 -07:00
1068ba667c Create at::linear (#10755)
Summary:
The optimized code for `linear()`, which uses `addmm` when a bias is given, was duplicated three times across ATen and the C++ API. Let's just have `at::linear` and use that everywhere.

apaszke ezyang (who mentioned this in #10481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10755

Differential Revision: D9443881

Pulled By: goldsborough

fbshipit-source-id: a64862d1649b5961043d58401625ec267d97d9f3
2018-08-21 19:40:15 -07:00
a2ca634e04 Add enforce back to converter.cc
Summary: hotfix for B*8

Differential Revision: D9444060

fbshipit-source-id: 368f8463e684c39ec0ac18bcb11a7b6132d9f874
2018-08-21 19:09:22 -07:00
ddf187c198 Dont assume serialized integral types were widened to int32 in raw_data (#10718)
Summary:
zdevito et al came to the conclusion that the ONNX spec does not mandate the widening conversion of integral types when serializing tensor data into raw_data, as opposed to serializing the data into int32_data. PyTorch recently made this change in the export code, which caused import in caffe2 to break because it did not match semantics. This fixes that
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10718

Differential Revision: D9423712

Pulled By: jamesr66a

fbshipit-source-id: 479fbae67b028bf4f9c1ca1812c2c7b0c6cccd12
2018-08-21 18:41:31 -07:00
6325e5aa48 fix typo in error message (#9827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9827

changed unitilized to uninitialized

Reviewed By: jerryzh168

Differential Revision: D8995509

fbshipit-source-id: 94518d5542a7bff49fcb9a4505c0c7a959746f78
2018-08-21 18:41:29 -07:00
44f996f82c Py3 fixes for layer_model_helper.py (#10525)
Summary:
Fixes `__getattr__` to adhere to its Python API contract, and wraps a `range()` call in `list()` since `range()` no longer returns a list in Python 3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10525

Reviewed By: ezyang

Differential Revision: D9441360

Pulled By: tomdz

fbshipit-source-id: d489c0e7cefecc4699ca866fd55ddbfa629688d4
2018-08-21 18:41:28 -07:00
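A hedged sketch of the two Python 3 pitfalls being fixed (editor's illustration; `Helper` is a hypothetical stand-in for the layer model helper):

```python
class Helper(object):
    def __getattr__(self, name):
        # The contract: raise AttributeError for unknown attributes so that
        # hasattr() and getattr(obj, name, default) keep working correctly.
        raise AttributeError(name)

indices = list(range(5))  # range() is lazy in Python 3; wrap it when a real list is needed
```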
71ddd837d7 Support custom ops in ScriptModule and tidy up test files (#10610)
Summary:
This PR adds support for using custom ops in ScriptModules, the last step for our custom op strategy. You can now write

```
import torch

torch.ops.load_library('libcustom_ops.so')

class Model(torch.jit.ScriptModule):
    def __init__(self):
        super(Model, self).__init__()

    @torch.jit.script_method
    def forward(self, input):
        return torch.ops.custom.op(input) + 1

model = Model()
model.forward(torch.ones(5)) # Works
model.save("model.pt") # Works
model = torch.jit.load("model.pt") # Works
```

You can then load the `model.pt` in C++ and execute its `forward` method!

Missing for this was the fact that the script compiler didn't know to convert `ops.custom.op` into a `BuiltinFunction` which then emits a function call. For this I came up with the following strategy inside `torch/csrc/jit/script/init.cpp`:

1. When we access `torch.ops`, we return a `CustomOpValue` (subclass of `PythonValue`), whose purpose is only to return a `CustomOpNamespaceValue` (subclass of `PythonValue`) whenever something under it is accessed.
2. `CustomOpNamespaceValue` will then for each field accessed on it return a `BuiltinFunction`.

This doesn't reduce performance for any calls that are not to `torch.ops` (as opposed to inspecting every function call's name at the call site, for example).

I also had to fix `BuiltinFunction` to not assume the namespace is always `aten::`.

A lot of other changes are just tidying up the Python and C++ test harness before I integrate it in CI.

zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10610

Differential Revision: D9387832

Pulled By: goldsborough

fbshipit-source-id: c00f431db56c7502a66fe1f813fe78067f428ecb
2018-08-21 18:41:27 -07:00
e94ae99d24 Delete copy constructor/assignment of class Observable explicitly. (#10593)
Summary:
This should resolves "error C2280: 'std::unique_ptr<caffe2::ObserverBase<caffe2::OperatorBase>,std::default_delete<_Ty>> &std::unique_ptr<_Ty,std::default_delete<_Ty>>::operator =(const std::unique_ptr<_Ty,std::default_delete<_Ty>> &)': attempting to reference a deleted function" from Visual Studio.
It should also make error message more human-readable in case if something really messed up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10593

Reviewed By: orionr

Differential Revision: D9436397

Pulled By: mingzhe09088

fbshipit-source-id: 31711667297b4160196134a34365da734db1c61d
2018-08-21 16:56:04 -07:00
04b773ab87 Support Loading to GPU (#10710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10710

Can't resume from checkpoint for workflows that use GPU.

The problem is just that we didn't leverage the already-provided GPU deserialization in Caffe2.

`keep_device` arg of LoadOp. See https://fburl.com/y27ltaxw

How is a serialized BlobProto (containing a TensorProto) loaded into GPU memory?
- Load BlobProto from DB. https://fburl.com/pe1qaeyf
- Deserialize the BlobProto into a Blob instance. https://fburl.com/5dirjuuh and https://fburl.com/stoho0x1
- Call Blob->Deserialized. https://fburl.com/bnureu32
- Deserializer Registration. https://fburl.com/wbu95ry7 https://fburl.com/ycetud8u
- Create TensorCUDA Deserializer. https://fburl.com/2lirfuqj
- Create Tensor on GPU and get TensorProto of BlobProto. https://fburl.com/7dre82zg
- Copy TensorProto in CPU to Tensor on GPU. https://fburl.com/fr0qk2oe

Cloned the GPU workflows for testing in D9125520.

Reviewed By: mraway

Differential Revision: D9372950

fbshipit-source-id: 2bf70747bd71e8da16239197f7d2761d63f09ff8
2018-08-21 13:57:36 -07:00
edb34434ab More changes for hidden visibility (#10692)
Summary:
Let's run CI tests to see what fails given the changes that just landed in https://github.com/pytorch/pytorch/pull/10624

cc mingzhe09088 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10692

Reviewed By: mingzhe09088

Differential Revision: D9423617

Pulled By: orionr

fbshipit-source-id: 3bda1f118d13f8dd8e823727c93167cae747d8cf
2018-08-21 13:39:57 -07:00
8a1739b05d Add arguments __repr__ in Distribution base class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10373

Differential Revision: D9240316

Pulled By: ezyang

fbshipit-source-id: f35c500f61f86e6be405e8bd4040db5146224984
2018-08-21 12:10:23 -07:00
9c321a8779 Add util function from core type to dtype (#10716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10716

title

Reviewed By: idning

Differential Revision: D9417357

fbshipit-source-id: 0f71805b1d64a46791d6ee4d8620763f878ffdb6
2018-08-21 10:55:19 -07:00
b23d59ce1a Make ONNX_ATEN_FALLBACK as internal default option
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10629

Reviewed By: bddppq

Differential Revision: D9381106

fbshipit-source-id: 03d42c95d17a70a68fe0f38dad68f1793996dfce
2018-08-21 10:10:50 -07:00
b0b5139149 Set the BUILD_ENVIRONMENT variable before installing sccache. (#10640)
Summary:
Set the build environment before installing sccache in order to make sure the docker images have the links set up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10640

Reviewed By: yf225

Differential Revision: D9399593

Pulled By: Jorghi12

fbshipit-source-id: a062fed8b7e83460fe9d50a7a27c0f20bcd766c4
2018-08-21 09:40:41 -07:00
30ad13faca Avoid shadowing i, j vars in GeneralProposals test (#10721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10721

- Fix compilation warning "declaration of 'i' shadows a previous local [-Werror=shadow-compatible-local]"

Reviewed By: newstzpz

Differential Revision: D9419688

fbshipit-source-id: 76efc3688782ce4ead3c89e7069211736febfac2
2018-08-21 09:11:38 -07:00
f9d1b001e1 Move THNN Reduction to ATen/core. (#10703)
Summary:
This is part of moving the (base) Type to ATen/core; Some Type methods have default argument of type THNN Reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10703

Differential Revision: D9406060

Pulled By: gchanan

fbshipit-source-id: 789bb3387c58bd083cd526a602649105274e1ef6
2018-08-21 08:54:35 -07:00
f0d8a36e70 Completely remove build_aten and use_aten (#10469)
Summary:
Breaking out of #8338 to completely remove build_aten and use_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10469

Reviewed By: orionr

Differential Revision: D9413639

Pulled By: mingzhe09088

fbshipit-source-id: b7203aa4f5f2bb95c504c8dc187a3167f2570183
2018-08-20 20:26:42 -07:00
9e75ec11fb Make empty list literals construct empty Tensor[] (#10705)
Summary:
This will make the common case more natural (no need to do `_construct_empty_tensor_list()`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10705

Differential Revision: D9411622

Pulled By: michaelsuo

fbshipit-source-id: 2d91fbc5787426748d6e1c8e7bbeee737544dc96
2018-08-20 18:28:28 -07:00
5c0d9a2493 Soumith's last few patches to v0.4.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10646

Reviewed By: ml7

Differential Revision: D9400556

Pulled By: pjh5

fbshipit-source-id: 1c9d54d5306f93d103fa1b172fa189fb68e32490
2018-08-20 18:28:27 -07:00
e449a27646 Fix issues link in Caffe2 readme (#10711)
Summary:
Change to pytorch issues link

orionr pjh5 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10711

Reviewed By: orionr

Differential Revision: D9412870

Pulled By: duc0

fbshipit-source-id: 341e8504ade8eba614cead832e5b5fdca4b1c270
2018-08-20 16:55:11 -07:00
826550a32e Update the onnx Gemm op to FC/FCTransposed logic in caffe2 onnx backend (#10108)
Summary:
The broadcast is used by default when the opset version is greater than 6.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10108

Reviewed By: bddppq

Differential Revision: D9176934

Pulled By: houseroad

fbshipit-source-id: b737bd87b0ddc241c657d35856d1273c9950eeba
2018-08-20 16:09:22 -07:00
15d7f49205 Adding ATEN_NO_TEST option to root level cmake for propogation to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10708

Reviewed By: ml7

Differential Revision: D9410916

Pulled By: pjh5

fbshipit-source-id: b216a9ff7be23ff8754f2fe0b8197b5d006aa08d
2018-08-20 15:40:27 -07:00
585e6b581f Allow method-style casts on tensors (#10641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/10631
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10641

Differential Revision: D9407598

Pulled By: jamesr66a

fbshipit-source-id: a0331f4e9e55d92718cde7a1112fe8c705206b1f
2018-08-20 14:10:21 -07:00
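A minimal script-mode sketch of what now compiles (editor's illustration, in current annotation syntax):

```python
import torch

@torch.jit.script
def as_int(x: torch.Tensor) -> torch.Tensor:
    return x.int()  # method-style cast, previously rejected by the script compiler

as_int(torch.randn(3))
```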
39a3dcc999 Fix #10698 build failure (#10704)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10704

Differential Revision: D9406072

Pulled By: ezyang

fbshipit-source-id: 0d472ef84cddc3bf7600b06d04e5e02e94d59fa3
2018-08-20 14:10:19 -07:00
b4684db698 Add support for Log()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10694

Reviewed By: houseroad

Differential Revision: D9405612

Pulled By: MisterTea

fbshipit-source-id: 6d83d3c2db933a3822076c7faf578ac0e92e60c6
2018-08-20 13:25:21 -07:00
7832e9d564 Add a bisect percentile operator (#10563)
Summary:
Add a bisect percentile operator with lower and upper bounds for interpolation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10563

Reviewed By: chocjy

Differential Revision: D7802182

Pulled By: olittle

fbshipit-source-id: 89ebfa8b3463adc2c89235fa3dfffa187a9d5417
2018-08-20 13:14:05 -07:00
3d0757430b Fix EnsureCPUOutputOp (#10651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10651

EnsureCPUOutputOp will copy the input from another Context to CPU, but currently there is no guarantee that the Copy will be executed.

Differential Revision: D9390046

fbshipit-source-id: af3ff19cf46560264cb77d2ab8821f0cc5be74f6
2018-08-20 12:12:48 -07:00
2e563c417c Nomnigraph - rename some APIs that invole Subtree to Subgraph (#10551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10551

Renaming from "subtree" -> "subgraph" to improve the clarity of the subgraph matcher APIs, since the matcher now supports DAGs.

This is pure renaming, no functionalities change.

Reviewed By: bwasti

Differential Revision: D9348311

fbshipit-source-id: 4b9267845950f3029dfe385ce3257d3abb8bdad4
2018-08-20 10:55:21 -07:00
aa9f328fa3 Nomnigraph - DAG matching (#10549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10549

Support DAG matching in nomnigraph. This is done by maintaining a map from nodes in the MatchGraph to nodes in the input graph, and additionally enforcing that the same node in the MatchGraph must match the same node in the input graph (with the exception of multiplicity, i.e., when count != 1 on the MatchGraph node).

In a follow up diff, I'll rename the API that refers to subtree as subgraph to improve clarity.

Reviewed By: bwasti

Differential Revision: D9347322

fbshipit-source-id: 171491b98c76852240a253279c2654e96dd12632
2018-08-20 10:55:19 -07:00
0cce4620fe Fix backend/device-type comparison with MKLDNN.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10689

Differential Revision: D9400450

Pulled By: gchanan

fbshipit-source-id: f75b042b886d5d525edb2c423173a9646c613a1b
2018-08-20 10:41:08 -07:00
db7b7f1359 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10686

Differential Revision: D9399874

Pulled By: SsnL

fbshipit-source-id: 28130992d2416721552f72cfa835ff0358caeefa
2018-08-20 10:40:55 -07:00
d4832f1e7b More fixes for hidden visibility (#10624)
Summary:
Some more `ATEN_API` additions for hidden visibility.

Running CI tests to see what fails to link.

cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10624

Reviewed By: mingzhe09088

Differential Revision: D9392728

Pulled By: orionr

fbshipit-source-id: e0f0861496b12c9a4e40c10b6e0c9e0df18e8726
2018-08-20 10:11:59 -07:00
9ad9191323 Fix cuDNN dropout state cache (#10662)
Summary:
Minor fix for the cuDNN cache. Previously we would skip event reinitialization when an RNN function was called on GPU 0 and then on GPU 1 in eval mode; that caused us to skip re-initializing the event and produced an incorrect resource handle error when trying to record it.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10662

Reviewed By: soumith

Differential Revision: D9393629

Pulled By: apaszke

fbshipit-source-id: e64c1c1d2860e80f5a7ba727d0b01aeb5f762d90
2018-08-20 05:09:41 -07:00
c37fac4d50 Fixing stop condition on composite reader (#9888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9888

Limiter cannot be shared or copied; just pass it to the first reader.

Reviewed By: xianjiec

Differential Revision: D9008871

fbshipit-source-id: e20cd785b26b1844e156efc3833ca77cfc3ffe82
2018-08-20 03:02:20 -07:00
83066e9b30 Add trigonometry functions for ONNX export (#7540)
Summary:
Trigonometry functions are newly added to ONNX in a recent PR https://github.com/onnx/onnx/pull/869

This PR makes pytorch support exporting graphs with trigonometry functions.

This PR might need to wait until it is ready to change
```python
_onnx_opset_version = 6
```
to
```python
_onnx_opset_version = 7
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7540

Differential Revision: D9395041

Pulled By: bddppq

fbshipit-source-id: bdf3e9d212b911c8c4eacf5a0753bb092e4748d2
2018-08-19 23:01:28 -07:00
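A minimal export sketch (editor's illustration, assuming the exporter targets an opset that includes the trigonometric ops):

```python
import torch

class Trig(torch.nn.Module):
    def forward(self, x):
        return torch.sin(x) + torch.cos(x)

# The exported graph now contains Sin/Cos nodes instead of failing.
torch.onnx.export(Trig(), torch.randn(2, 3), "trig.onnx")
```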
3f603eeee8 some improvements on distributed docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10666

Differential Revision: D9395242

Pulled By: SsnL

fbshipit-source-id: 952326b9c5a1a974a1c33a0e12738e1e21ad9956
2018-08-19 17:40:28 -07:00
108b657159 Import DistributedSampler in utils/data/__init__ (#10671)
Summary:
There is no reason that user should do an extra import to use DistributedSampler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10671

Differential Revision: D9395189

Pulled By: SsnL

fbshipit-source-id: 8f41d93813c8fb52fe012f76980c6a261a8db9b2
2018-08-19 16:55:13 -07:00
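The change in a nutshell (editor's illustration):

```python
# Before: the sampler lived only in the submodule.
from torch.utils.data.distributed import DistributedSampler
# After this commit it is also re-exported from the package root:
from torch.utils.data import DistributedSampler
```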
6bdbad93b9 Refactor Device to not depend on Backend. (#10478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478

- Removed Backend constructor from Device, and fixed all
  use-sites to use DeviceType::CPU instead of kCPU, or
  use a new function backendToDeviceType to perform
  the conversion.
- New method device_type() on Type; it gives you the
  underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
  by introducing a new special type which is implicitly
  convertible to both DeviceType and Backend.  As long as
  you don't define a function that's overloaded on both
  DeviceType and Backend (but not on BackendOrDeviceType),
  the implicit conversions will ensure that uses
  of at::Device(at::kCPU) keep working. We fixed use-sites in
  the library, but did NOT fix sites in the test code, so that
  we can exercise this BC code.

Reviewed By: Yangqing

Differential Revision: D9301861

fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
2018-08-18 17:39:14 -07:00
f1420adfe3 Move at::chunk into the graph fuser (#10178)
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026

This is done through the following:

1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes if an input (of the original graph) will be chunked.
3) When launching a kernel, use `std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunk directly takes in an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.

- Expect test and correctness test to see if a single chunk is fused
  by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
  middle, end) and tensors (contiguous, non-contiguous, edge case
  (splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
  correctness test.

cc zdevito apaszke

LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.

After changes:
```
thnn    cudnn   jit
8.8468  6.5797  9.3470
```

Before changes:
```
thnn    cudnn   jit
9.9221  6.6539  11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178

Differential Revision: D9382661

Pulled By: zou3519

fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
2018-08-18 16:10:11 -07:00
d87b4e941b fix python interpreter can not be found without PYTHON_EXECUTABLE (#10659)
Summary:
Take 2 of #10543.
The problem was that between commit and merge one more entry point, `tools/build_libtorch.py`, was added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10659

Differential Revision: D9393540

Pulled By: soumith

fbshipit-source-id: 8ebfed600fc735fd1cb0489b161ec80e3db062e0
2018-08-18 15:40:08 -07:00
152762a567 Fix warnings diagnosed in recent clang (#10647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10647

Fix "missing std::move from the return value" warning diagnosed by recent clang compiler.

Reviewed By: soumith, DavidCallahan

Differential Revision: D9384692

fbshipit-source-id: 8ad951e47d605e6f98a9650f2dec2909ad0f3eb8
2018-08-17 21:32:58 -07:00
e29b5a1ea8 graph fuser inserts explicit expands where necessary (#10325)
Summary:
Fixes #10096

If the only thing preventing a simple mappable operator from being fused
into a fusion group is that its Tensor inputs are not of the same shape as the
output, then the graph fuser inserts explicit expand nodes for those
inputs.
This helps the graph fuser not miss out on any fusion opportunities
involving simple mappable operations that have Tensor inputs. This PR
doesn't do anything for the scalar case; that can be addressed later.

Test Plan
- Simple expect test case
- Added expect tests for a raw LSTMCell. The expands help speed up the
  forwards pass by allowing more operations to be fused into the LSTMCell's single
  FusionGroup.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10325

Differential Revision: D9379308

Pulled By: zou3519

fbshipit-source-id: 86d2202eb97e9bb16e511667b7fe177aeaf88245
2018-08-17 16:03:46 -07:00
7c55d11ba5 Make sure we don't relocate the weight name buffer (#10630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10630

`onnxTensorDescriptorV1.name` points to the string buffer. We use a vector of strings to serve as the storage. This means we cannot reallocate the vector, because that may invalidate the `onnxTensorDescriptorV1.name` pointers. The solution is to reserve a large enough vector so that it won't reallocate.

Reviewed By: bddppq, houseroad

Differential Revision: D9381838

fbshipit-source-id: f49c5719aafcc0829c79f95a2a39a175bcad7bfe
2018-08-17 16:03:31 -07:00
65b9308128 Basic infrastructure for C++ documentation (#10569)
Summary:
Adds the folder structure, Doxyfile, sphinx setup and Makefile to build C++ docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10569

Differential Revision: D9386744

Pulled By: goldsborough

fbshipit-source-id: 0a7c581dcf0a5f7b01ba19d317b493cf95935134
2018-08-17 15:39:50 -07:00
b62b378022 Adding torch support for CMAKE_ARGS env
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10635

Reviewed By: ml7

Differential Revision: D9383845

Pulled By: pjh5

fbshipit-source-id: fb21bda12e88053eec738974e6e419388c5038d9
2018-08-17 14:54:43 -07:00
c5c1c051ca Fix dropout fused kernel applied in eval mode (#10621)
Summary:
fixes https://github.com/pytorch/pytorch/issues/10584

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10621

Differential Revision: D9379397

Pulled By: SsnL

fbshipit-source-id: 5ff2939ba794af082ce597ef289a09ee757636dc
2018-08-17 14:54:42 -07:00
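The behavior being restored, as a minimal check (editor's illustration):

```python
import torch

m = torch.nn.Dropout(p=0.5)
x = torch.randn(4, 4)

m.eval()
assert torch.equal(m(x), x)  # eval-mode dropout must be the identity

m.train()
y = m(x)  # in train mode, survivors are rescaled by 1/(1-p), so y differs from x
```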
86c9856d9c Fuse tensor-scalar ops when scalar is constant (#10511)
Summary:
This is on the way to resolving #9940.

Fixes #10501

This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.

The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10511

Differential Revision: D9378896

Pulled By: zou3519

fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f
2018-08-17 14:10:23 -07:00
f3ac619764 Add fusion support for batchnorm and convolution without bias
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10595

Reviewed By: bwasti

Differential Revision: D9110099

fbshipit-source-id: e1ed66c7d82b2f9987b7eb9c7f98877a6dbeb902
2018-08-17 12:11:44 -07:00
d35f365ad5 Remove all cuDNN specific inputs to RNN functions (#10581)
Summary:
This is still not the final PR, but it removes all blockers for actually using the RNN functions directly in the JIT. Next patch should be final, and will actually remove the symbolic_override code, and change it to proper symbolics for those ATen functions. Turns out the symbolic code can be also cleaned up a bit, and I'll do that too.

zdevito ezyang
colesbury (for minor DispatchStub.h) changes

There was no way to handle those in the JIT for now, and they turned
out to be completely unnecessary. It should make the Python and C++
module code much simpler too, since all the logic is now centralized
in the native functions.

The downside is that RNN modules no longer own their dropout buffers,
which are shared per-device instead (with appropriate locking and
synchronization). This might appear as a perf regression at first, but
in reality it's highly unlikely that anyone will want to run cuDNN RNNs
on the same GPU in parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10581

Reviewed By: colesbury

Differential Revision: D9365541

Pulled By: apaszke

fbshipit-source-id: 3ef8677ee5481bae60c74a9117a2508665b476b5
2018-08-17 11:09:51 -07:00
52058204d6 Add nn functional tests in JIT (#10409)
Summary:
The PR is the first step to integrate the torch.nn library with the JIT. It adds tests for the nn functional interfaces in trace/script mode, and tries to find the differences between torch.nn.functional ops and the ATen ops, to see the work needed to support a full set of nn functionals in script mode.

Some statistics in summary:

- 84 useful functions in torch.nn.functional in total (the number does not include helper funcs and deprecated funcs in torch.nn.functional).

- 7 functions/ops do not support higher-order gradients, so they are simply excluded from the whole test.

- 36 functions differ from the ATen ops for various reasons. Among those 36 functions, a bunch (roughly 10-15) differ only in naming or in simple transformations using other ops inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10409

Differential Revision: D9350694

Pulled By: wanchaol

fbshipit-source-id: 8fce6f30d8d25ace5a544a57b219fe61f5a092f8
2018-08-17 11:09:49 -07:00
b4e72ea811 Revert D9377394: [pytorch][PR] [Caffe2] Add AT_CORE_EXPORT and AT_CORE_IMPORT.
Differential Revision:
D9377394

Original commit changeset: 993062a461ff

fbshipit-source-id: af8ab92e9b88466602508981d9b3ea24ce393dfc
2018-08-17 10:39:27 -07:00
bd9ab650ae fix compile error in math_hip.cc from new Im2Col/Col2Im interface (#10623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10623

Fix compile error in https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-build/10280//console

Reviewed By: ezyang

Differential Revision: D9379451

fbshipit-source-id: 67cc3964981edba1915b93c49643caa300d63c16
2018-08-17 10:24:25 -07:00
ff440b61f6 Revert D9378844: [pytorch][PR] fix python interpreter can not be found
Differential Revision:
D9378844

Original commit changeset: 022e20aab7e2

fbshipit-source-id: 962280707e84edff2a4f59b1ce2f4211a579a055
2018-08-17 10:09:27 -07:00
e190505e84 Adding support for inlining if branches (#10084)
Summary:
Inlining if branches which have constant inputs.  If an if node gets inlined, the set of mutated variables returned by its ancestors may have changed. In the following example the block should
return a mutated set of (a) and not (a, b).

```
if cond:
    if True:
        a = a - 1
    else:
        b = b - 1
```
To calculate this we recursively update the mutated variables in if branches from the leaf nodes up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10084

Reviewed By: michaelsuo

Differential Revision: D9340429

Pulled By: eellison

fbshipit-source-id: b0dd638a5cace9fdec3130460428fca655ce4b98
2018-08-17 09:48:47 -07:00
31c7a32d1c Include aten_op by default in caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10603

Reviewed By: ahhegazy, dzhulgakov

Differential Revision: D9364309

fbshipit-source-id: e72d9f2b1e99cb0fb2186c737fcd925b14d42754
2018-08-17 08:39:46 -07:00
03982fb8d3 Fix subgraph cutting wrt recent external_input change in nomnigraph (#10598)
Summary:
https://github.com/pytorch/pytorch/pull/10100 recently brought external input/output into nomnigraph. This PR makes the following adjustments:
0. Relax some of the conditions on external input.
1. Update NNModule inputs/outputs when pruning the input/output.
2. Avoid copying external input/output, as nomnigraph already takes care of it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10598

Reviewed By: bwasti

Differential Revision: D9371730

Pulled By: yinghai

fbshipit-source-id: 9273be5041dc4cc8585587f47cb6721e518a06a8
2018-08-17 08:25:49 -07:00
ff3a481aee fix python interpreter can not be found (#10543)
Summary:
Custom python installation, which have no aliases to `python` or `python3` can't be found by cmake `findPythonInterp` without extra cmake argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10543

Differential Revision: D9378844

Pulled By: ezyang

fbshipit-source-id: 022e20aab7e27a5a56b8eb91b6026151116193c7
2018-08-17 08:25:48 -07:00
51222500e2 Add AT_CORE_EXPORT and AT_CORE_IMPORT. (#10602)
Summary:
Fix "error LNK2019: unresolved external symbol" from "CAFFE_KNOWN_TYPE" in tests where we should use dllexport instead of AT_CORE_API(=dllimport).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10602

Differential Revision: D9377394

Pulled By: Yangqing

fbshipit-source-id: 993062a461ffce393f2321c5391db5afb9b4e7ba
2018-08-17 02:09:38 -07:00
cc53807be5 group conv with NHWC layout (#10585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10585

group conv with NHWC layout

Reviewed By: BIT-silence

Differential Revision: D7547497

fbshipit-source-id: da0ec5a4512c15a0a0d7b79e6ce00c1f8f77f661
2018-08-17 00:39:23 -07:00
0aefb9f26c Update onnx to onnx/onnx@7848f1e (#10613)
Summary:
https://github.com/onnx/onnx/commit/7848f1e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10613

Reviewed By: houseroad

Differential Revision: D9376224

Pulled By: bddppq

fbshipit-source-id: ce8a53255ba24f0f8f989570e8b015837f8442fb
2018-08-16 23:39:37 -07:00
6667d55e73 Disallow input filler for GatherRangesOp (#10592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10592

Filter out GatherRanges ops

Reviewed By: highker

Differential Revision: D9365220

fbshipit-source-id: e21ab00dc9e553c9aaf172e1241206e0c0a7a23d
2018-08-16 21:39:09 -07:00
3578909671 Remove unused code base for distributed training (#10282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10282

This diff removes the unused/deprecated features from the code base.

Reviewed By: manojkris

Differential Revision: D9169859

fbshipit-source-id: d6447b7916a7c687b44b20da868112e6720ba245
2018-08-16 20:10:17 -07:00
f1d40ef280 build_pytorch_libs.sh: use MAX_JOBS rather than NUM_JOBS (#10600)
Summary:
MAX_JOBS is set by our jenkins setup
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10600

Differential Revision: D9375317

Pulled By: anderspapitto

fbshipit-source-id: 25416d5ee12372f7610baa78cb7b423806b26aa2
2018-08-16 20:10:15 -07:00
c101a57a74 Build mechanism for custom operators (#10226)
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:

1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a ` torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
    1. It defines an op in `op.{h,cpp}`
    2. Registers it with the JIT using `RegisterOperators`
    3. Builds it into a shared library via a `CMakeLists.txt`
    4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey!

The pure C++ and the Python builds are separate and not coupled in any way.

zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226

Differential Revision: D9296839

Pulled By: goldsborough

fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
2018-08-16 18:56:17 -07:00
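A hedged sketch of step 4, the Python binding through the existing C++ extension machinery (editor's illustration; `op.cpp` refers to the file described above):

```python
# setup.py
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="custom_op",
    ext_modules=[CppExtension("custom_op", ["op.cpp"])],
    cmdclass={"build_ext": BuildExtension},
)
```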
67c6d93634 Tune minimal work size (#10599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10599

Not spawning threads with spin-lock synchronization is bad because they will switch to `condvar` wait, which increases wake-up latency next time they are needed.

Reviewed By: ajtulloch

Differential Revision: D9366664

fbshipit-source-id: 3b9e4a502aeefaf0ddc4795303a855d98980b02e
2018-08-16 17:39:57 -07:00
afd7477eaa Add `buffers(), named_buffers()` methods. (#10554)
Summary:
This commit adds the ``buffers()`` and ``named_buffers()`` methods as
analogues of ``parameters()`` and ``named_parameters()``.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554

Reviewed By: SsnL

Differential Revision: D9367762

Pulled By: jma127

fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
2018-08-16 16:26:48 -07:00
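Usage mirrors the existing parameter accessors (editor's illustration):

```python
import torch

bn = torch.nn.BatchNorm1d(8)
print([n for n, _ in bn.named_parameters()])  # ['weight', 'bias']
print([n for n, _ in bn.named_buffers()])     # includes 'running_mean' and 'running_var'
```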
342517e6e7 Back out "Add aten_op to caffe2 onnx (python) backend" (#10589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10589

Original commit changeset: 2cc6fedbaf08

Reviewed By: houseroad

Differential Revision: D9365208

fbshipit-source-id: 3871d8e70f0d8e48c8af9593c78587d16c45afc2
2018-08-16 15:15:27 -07:00
488ea824ed Additional changes to make GPU builds work (#10507)
Summary:
A continuation of https://github.com/pytorch/pytorch/pull/10504 for GPU, torch, etc. builds.

I was testing with

```
FULL_CAFFE2=1 python setup.py build_deps | tee ~/log.txt
cat ~/log.txt | egrep 'undefined refer' | sort | less
```

I'll rebase on master when Yangqing's changes in 10504 land, but putting up for some testing.

cc mingzhe09088 anderspapitto ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10507

Reviewed By: Yangqing

Differential Revision: D9359606

Pulled By: orionr

fbshipit-source-id: c2a3683b3ea5839689f5d2661da0bc9055a54cd2
2018-08-16 13:25:27 -07:00
ef15bb8787 remove implicit conversion from gpu to cpu (#10553)
Summary:
Resubmit #10416 with fixed tests. This removes the implicit conversion from GPU to CPU when calling numpy, so the behavior matches other functions.

It requires users to move the tensor back to the CPU with cpu() before calling numpy functions on it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10553

Differential Revision: D9350212

Pulled By: ailzhang

fbshipit-source-id: 9317d8fea925d4b20ae3150e2c1b39ba5c9c9d0a
2018-08-16 12:10:39 -07:00
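The new behavior in brief (editor's illustration; requires a CUDA device):

```python
import torch

t = torch.randn(3, device="cuda")
# t.numpy()          # now raises instead of silently copying to host
a = t.cpu().numpy()  # the device transfer must be explicit
```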
d6f3c88418 Revert D9076734: Split storage from tensor
Differential Revision:
D9076734

Original commit changeset: ea9e1094ecf8

fbshipit-source-id: 3fa9b65b7265fce6207d9e1d9ef4707dbb29704b
2018-08-16 11:25:32 -07:00
40a070422d Adding new allreduce bcube routines to ops supported by gloo (#10494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10494

Adding the AllredubeBcube routines as they are now available in gloo.

Reviewed By: wesolwsk

Differential Revision: D8269473

fbshipit-source-id: 6a3a32291bbf1fbb328b3ced0f2a753dc5caf4e5
2018-08-16 10:56:26 -07:00
4be4b4c8b5 Remove weight from input of onnxifi backend op (#10575)
Summary:
The ONNXIFI backend will absorb the constant weight in Conv, so we should not add it as an input. This is just a test artifacts. Note that Onnxifi transformer will do the right thing when cutting the graph to absorb the weights.

rdzhabarov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10575

Reviewed By: houseroad

Differential Revision: D9357339

Pulled By: yinghai

fbshipit-source-id: a613fa3acafa687295312f5211f8e9d7f77b39cd
2018-08-16 10:56:25 -07:00
319fefe9e6 Support benchmark on windows machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10564

Reviewed By: llyfacebook

Differential Revision: D9356389

Pulled By: sf-wind

fbshipit-source-id: f6c58e68d3eaf3a39c9f89b8f04e6039c75b4cd9
2018-08-16 10:56:23 -07:00
00f2731112 Merge THTensor into TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10479

Differential Revision: D9315800

Pulled By: gchanan

fbshipit-source-id: b13ef0de3342600b02b54e0700eb02021a9d1a9e
2018-08-16 08:10:06 -07:00
130881f0e3 Delete build_caffe2.sh, replace with build_libtorch.py (#10508)
Summary:
delete build_caffe2.sh, replace with build_libtorch.py as suggested by peter (and copy-pasted from his draft PR).  This ensures that all consumers of the torch CMake file go through as unified a path as possible.

In order to change the surrounding infrastructure as little as possible, I made some tweaks to enable build_pytorch_libs.sh to generate the test binaries relative to the current directory, rather than hardcoding to pytorch/build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10508

Differential Revision: D9354398

Pulled By: anderspapitto

fbshipit-source-id: 05b03df087935f88fca7ccefc676af477ad2d1e9
2018-08-16 08:10:04 -07:00
c6facc2aaa Add conversions between DataType and ScalarType.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10472

Reviewed By: gchanan

Differential Revision: D9298048

fbshipit-source-id: c58efa582eab64c58d0771d90d90862911c168d1
2018-08-16 07:55:31 -07:00
fdd2b9baee Add DataType alias
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10547

Reviewed By: soumith

Differential Revision: D9346040

fbshipit-source-id: 1069a44182ccff68b1694086c8b709ba2046b22b
2018-08-16 07:55:29 -07:00
8fdba4ec35 Move all operator<< overloads out of the global namespace. (#10546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10546

Have you ever written an operator<< overload in the caffe2 namespace
in a core Caffe2 header, and then been stunned when some completely
unrelated code started breaking?  This diff fixes this problem!

The problem looks like this:
1. You're building against a really old version of glog (think 0.3.2,
   or something like that)
2. This version of glog defines operator<< overloads for std containers
   in the global namespace
3. You add a new overload in your current namespace (e.g., caffe2).
   Congratulations: this overload is *preferentially* chosen over
   the global namespace one for all calls to << in that namespace.
   And since it doesn't actually have std::vector overloads, unrelated
   Caffe2 code breaks.

Newer versions of glog have a fix for this: they have the line:

  namespace std { using ::operator<<; }

in their header.  So let's help old versions of glog out and do this ourselves.

In our new world order, operator<< overloads defined in the global namespace
won't work (unless they're for std containers, which work because of ADL).
So this diff also moves all those overloads to the correct namespace.

Reviewed By: dzhulgakov

Differential Revision: D9344540

fbshipit-source-id: 6246ed50b86312668ebbd7b039fcd1233a3609cf
2018-08-16 07:55:27 -07:00
238b4b9236 Resolve error C2370 "redefinition; different storage class" by adding dllimport. (#10571)
Summary:
For #10568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10571

Differential Revision: D9357987

Pulled By: Yangqing

fbshipit-source-id: 6726f0a1d31a225375a0ddc0e05284f3eb89dda8
2018-08-16 00:39:33 -07:00
84427d26db Add aten_op to caffe2 onnx (python) backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10579

Reviewed By: houseroad

Differential Revision: D9357837

fbshipit-source-id: 2cc6fedbaf088df7e11b52a91dfe3b8f0d7fd599
2018-08-16 00:39:30 -07:00
76da0b34c2 Remove an unused variable found by linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10578

Differential Revision: D9357880

Pulled By: bddppq

fbshipit-source-id: 6b56c2dbd02258124b5a4656cdf44d14a59e1b71
2018-08-16 00:25:44 -07:00
7487ee55f1 Resolving error C2487 "member of dll interface class may not be declared with dll interface" by removing nested CAFFE2_API. (#10572)
Summary:
For #10570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10572

Differential Revision: D9357984

Pulled By: Yangqing

fbshipit-source-id: a8f74e384eb3219fb6ac71ada4a45e6bce9199eb
2018-08-16 00:25:41 -07:00
abf85bf0ef Perform CSE across block boundaries. (#10105)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10105

Differential Revision: D9186678

Pulled By: resistor

fbshipit-source-id: 87b63d4fc0c7d394edb4777acdefa8f022a8bf8d
2018-08-16 00:25:36 -07:00
2e0dd86903 Make torch::Tensor -> at::Tensor (#10516)
Summary:
This PR removes the `using Tensor = autograd::Variable;` alias from `torch/tensor.h`, which means `torch::Tensor` is now `at::Tensor`. This PR fixes up some last uses of `.data()` and tidies up the resulting code. For example, I was able to remove `TensorListView` such that code like

```
auto loss = torch::stack(torch::TensorListView(policy_loss)).sum() +
    torch::stack(torch::TensorListView(value_loss)).sum();
```

is now

```
auto loss = torch::stack(policy_loss).sum() + torch::stack(value_loss).sum();
```

CC jgehring

ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10516

Differential Revision: D9324691

Pulled By: goldsborough

fbshipit-source-id: a7c1cb779c9c829f89cea55f07ac539b00c78449
2018-08-15 21:25:12 -07:00
8013dac43d Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Reviewed By: Yangqing

Differential Revision: D9348485

Pulled By: soumith

fbshipit-source-id: e13afadf8dbea20ee6ee595383c522dcbaf8796a
2018-08-15 20:55:59 -07:00
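The fixed edge case (editor's illustration):

```python
import torch

empty = torch.tensor([], dtype=torch.long)
print(torch.bincount(empty))               # tensor([], dtype=torch.int64)
print(torch.bincount(empty, minlength=3))  # tensor([0, 0, 0])
```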
05dcf00644 fixed c10d test (#10557)
Summary:
fixed NCCL test, which is not run in CI. We should enable it soon.
```
~/new_pytorch/pytorch/test$ python test_c10d.py
...............
----------------------------------------------------------------------
Ran 15 tests in 13.099s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10557

Reviewed By: ailzhang

Differential Revision: D9353286

Pulled By: teng-li

fbshipit-source-id: 5a722975beaa601203f51c723522cc881f2d2090
2018-08-15 17:22:38 -07:00
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all APIs for the CPU frontend. Checked with cmake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and the resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
87cac4c2f1 Update Im2Col related to make preparation for group conv in NHWC order. (#10439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439

Update Im2Col related to make preparation for group conv in NHWC order.

Reviewed By: houseroad

Differential Revision: D9285344

fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
2018-08-15 17:10:24 -07:00
579962f2a8 reroute tensor feature in core.Net and generate one net feature in model_helper (#10528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10528

Adding two features to core and model_helper:

- reroute_tensor, which supports op insertion at the net level
- model_helper complete net and cut net, used for full-graph analysis

Differential Revision: D9330345

fbshipit-source-id: 56341d3f500e72069ee306e20266c8590ae7985a
2018-08-15 16:40:15 -07:00
523bdc8ec1 Split storage from tensor (#10053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053

Tensor in PyTorch 1.0 will have
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
We'll have Tensor -> Storage -> StorageImpl after this diff

Reviewed By: dzhulgakov

Differential Revision: D9076734

fbshipit-source-id: ea9e1094ecf8c6eaeaa642413c56c6a95fb3d14e
2018-08-15 16:40:14 -07:00
03e9ea5ef0 Fix leaking of Storages (not StorageImpls) (#10552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10552

Fix leaking of Storages (not StorageImpls)

Reviewed By: li-roy

Differential Revision: D9349824

fbshipit-source-id: 31f14951020a63189bebda25a3bf8bf195cd227f
2018-08-15 16:10:00 -07:00
4c49da34a9 Add new MKLDNN fallback operators (#10526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10526

Resubmitting these changes. Previously they caused issues with multifeed, which I fixed with D9280622.

Reviewed By: yinghai

Differential Revision: D9327323

fbshipit-source-id: ec69428039b45c6221a5403b8fe9a83637857f04
2018-08-15 15:55:22 -07:00
a129f9ad3b Revert D9332335: [pytorch][PR] Implements volumetric (5d) affine grid generation.
Differential Revision:
D9332335

Original commit changeset: 1b3a91d078ef

fbshipit-source-id: 3dcce680257a6da121f5d67918ed4236e0c5bfec
2018-08-15 15:25:11 -07:00
151e7de893 varargs for einsum (#10067)
Summary:
Implemented via a wrapper, thank you Richard for the suggestion!

Fixes: #9929
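
A minimal sketch of the calling convention this wrapper enables (shapes here are illustrative):

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)

# The older list-style call:
old_style = torch.einsum("ij,jk->ik", [a, b])
# With the varargs wrapper, operands can be passed directly:
new_style = torch.einsum("ij,jk->ik", a, b)

assert torch.allclose(old_style, new_style)
```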
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10067

Differential Revision: D9083388

Pulled By: soumith

fbshipit-source-id: 9ab21cd35278b01962e11d3e70781829bf4a36da
2018-08-15 15:13:25 -07:00
fb45ec5ac3 Don't set DEBUG=1 in ASAN build (#9902)
Summary:
This should make ASAN tests run faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9902

Differential Revision: D9032986

Pulled By: yf225

fbshipit-source-id: 3d2edec2d7ce78bc995d25865aa82ba6d3f971d0
2018-08-15 14:39:57 -07:00
26c764a1db Update FP16 submodule. Close #10523 (#10548)
Summary:
Pull a fix in FP16 for compilation bug when using Intel Compiler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10548

Differential Revision: D9349469

Pulled By: Maratyszcza

fbshipit-source-id: 43e6dc5c3c18319d31eca23426770c73795feec5
2018-08-15 14:26:56 -07:00
021b4888db Remove setup_requires and tests_require from setup.py for FULL_CAFFE2 (#10530)
Summary:
In my environment, it looks like setup.py hangs when running

```
FULL_CAFFE2=1 python setup.py build_deps
```

Removing this fixes things, but we might also want to look at `tests_require`, which came over from `setup_caffe2.py`.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10530

Differential Revision: D9349597

Pulled By: orionr

fbshipit-source-id: 589145eca507dfaf16386884ee2fbe60299660b4
2018-08-15 14:26:53 -07:00
c5b1aa93ee Export uint8 tensors as byte string in mobile_exporter and add GivenTensorByteStringToUInt8FillOp (#10385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10385

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10316

Because Protobuf encodes uint8_t tensors using a less space-efficient varint uint32_t encoding, we are adding a new operator that reads back a byte string into a uint8_t tensor.

Reviewed By: harouwu

Differential Revision: D9004839

fbshipit-source-id: dfd27085c813fdeff13fee15eef4a2e7fef72845
2018-08-15 14:26:50 -07:00
6f14202acd Revert D9276252: [pytorch][PR] remove implicit conversion to cpu
Differential Revision:
D9276252

Original commit changeset: ea7d9d4f9390

fbshipit-source-id: 5977bf90d4c84b47e15bc8266cc3ce5602c4e05f
2018-08-15 13:55:18 -07:00
5adcac3dce Cuda half macros cleanup (#10147)
Summary:
This PR removes a couple of macros throughout TH* as part of the refactoring effort for ATen. Removing these macros should avoid confusion among developers who are trying to move things from TH* to ATen. This PR is part of the THCNumerics deprecation that I have been working on, following up on mruberry's https://github.com/pytorch/pytorch/pull/9318. I am separating these two commits to see whether removal of these macros upsets the PyTorch public CI, as well as internal builds.

- Commit 1248de7baf removes the code paths guarded by `CUDA_HALF_INSTRUCTIONS` macro. Since the macro was removed in commit 2f186df52d, `ifdef CUDA_HALF_INSTRUCTIONS` would return false and hence the code path that is kept after this change is for the false case of `ifdef CUDA_HALF_INSTRUCTIONS`

- Commit 520c99b057 removes the code paths guarded by `CUDA_HALF_TENSOR` macro. Since Pytorch now provides support for only CUDA 8.0 and above, `CUDA_HALF_TENSOR` is always true since CUDA 8.0 satisfies `CUDA_HAS_FP16` and hence, the code path that is kept after this change is for the true case of `ifdef CUDA_HALF_TENSOR`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10147

Differential Revision: D9345940

Pulled By: soumith

fbshipit-source-id: c9392261dd432d304f1cdaf961760cbd164a59d0
2018-08-15 13:25:42 -07:00
86363e1d8e Move RNN implementations to C++ (#10481)
Summary:
This is the first of two changes that are supposed to improve how we handle RNNs in the JIT. They still get traced as `PythonOp`s, but now it will be much easier to actually expose them to the JIT as e.g. `aten::lstm`, and ignore the Python interpreter entirely. This needs some symbolic adjustments that will be part of a second PR.

Even when we fix symbolics, there will still be a bit of a problem with statefulness of the cuDNN API (we need a mutable cache for the dropout state, but our IR has no way of representing that).

zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10481

Reviewed By: ezyang

Differential Revision: D9341113

Pulled By: apaszke

fbshipit-source-id: 0ae30ead72a1b12044b7c12369d11e5ca8ec30b5
2018-08-15 13:25:41 -07:00
484395edfb Fix corner case with torch.multinomial (#9960)
Summary:
In the shortcut for n_sample=1, when category 0 has 0 weight,
we should not map the (uniform) sample 0 to category 0.
The conversion uniform->multinomial was apparently written to work on
a (0,1] range (like curand uses), but PyTorch uses a [0,1) range.

Fixes: #4858. Thank you, Roy Fejgin for reporting.
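
A minimal check of the fixed corner case (a sketch, assuming current torch semantics):

```python
import torch

# A zero-weight category must never be drawn, even when it is
# category 0 and n_sample=1 takes the shortcut path this commit fixes.
weights = torch.tensor([0.0, 1.0])
for _ in range(500):
    assert torch.multinomial(weights, 1).item() == 1
```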
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9960

Reviewed By: soumith

Differential Revision: D9341793

Pulled By: ailzhang

fbshipit-source-id: 6b1a96419a7bc58cc594f761f34c6408ff6354cf
2018-08-15 13:25:39 -07:00
fb09292020 Increase tolerance in ConvBN test
Summary: reduce flakiness of test

Reviewed By: Maratyszcza

Differential Revision: D9344877

fbshipit-source-id: 24d5e1b873f94d816c980f3b7db93248cf10aca5
2018-08-15 13:14:35 -07:00
254dedf604 Propagate NaN through threshold (#10277)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10238
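
An illustrative check of the fixed behavior against the current torch.nn.functional API:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([float('nan'), -1.0, 2.0])
y = F.threshold(x, 0.0, 0.0)

assert torch.isnan(y[0])            # NaN propagates instead of being clamped
assert y[1] == 0.0 and y[2] == 2.0  # ordinary values behave as before
```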
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10277

Reviewed By: SsnL

Differential Revision: D9199825

Pulled By: soumith

fbshipit-source-id: 8ee7f9a72d9546d429f311c3f6028461d3c93fe2
2018-08-15 12:59:31 -07:00
0bbcc7b534 Don't assume curl version in Windows build script (#10476)
Summary:
Since we can't specify a version number to `choco install curl`, we should not assume that `7.57.0` is the curl version in the Windows AMI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10476

Differential Revision: D9303129

Pulled By: yf225

fbshipit-source-id: 198544be68330860fbcf93c99bc995f4e280bda7
2018-08-15 12:59:23 -07:00
85408e744f Move filler interface to operator schema (#10522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10522

Move filler interface to operator schema to avoid extra code for
caffe2 mobile.

Reviewed By: dzhulgakov

Differential Revision: D9312940

fbshipit-source-id: 77fb2406f0c6b171a1912a207e05e36da50c6966
2018-08-15 12:40:18 -07:00
9646d68962 support broadcasting in _kl_categorical_categorical (#10533)
Summary:
Support broadcasting in _kl_categorical_categorical

this makes it possible to do:
```
import torch.distributions as dist
import torch
p_dist = dist.Categorical(torch.ones(1,10))
q_dist = dist.Categorical(torch.ones(100,10))
dist.kl_divergence(p_dist, q_dist)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10533

Differential Revision: D9341252

Pulled By: soumith

fbshipit-source-id: 34575b30160b43b6c9e4c3070dd7ef07c00ff5d7
2018-08-15 12:40:17 -07:00
05a260da43 Bump gloo to latest master (#10545)
Summary:
Needed by the Gloo development team. Verifying nothing breaks in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10545

Reviewed By: Maratyszcza

Differential Revision: D9344413

Pulled By: orionr

fbshipit-source-id: 207edb71170870bacec47a635a12d7f55b6c1275
2018-08-15 12:25:44 -07:00
5d27d68779 remove implicit conversion to cpu (#10416)
Summary:
Fixes #9934
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10416

Differential Revision: D9276252

Pulled By: ailzhang

fbshipit-source-id: ea7d9d4f9390edefcd0865a98498f6c4307c291d
2018-08-15 12:25:42 -07:00
9cffe783f1 relax tolerance for two torch.half (float16) tests (#10519)
Summary:
Two tests in the 'nn' test bucket may fail when the torch.half
(float16) data type is used. The assertions used in the tests
intend to allow slight floating point imprecision in the results,
but the tolerances used for the comparisons are too strict for
the half type.

Relax the tolerances so that slight float16 imprecision won't
cause test failures.

The affected tests are:

- test_variable_sequence_cuda
- test_Conv2d_groups_nobias

For more information, see issue:

https://github.com/pytorch/pytorch/issues/7420
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10519

Differential Revision: D9343751

Pulled By: soumith

fbshipit-source-id: 90aedf48f6e22dd4fed9c7bde7cd7c7b6885845a
2018-08-15 12:11:20 -07:00
d93e8ab343 Nomnigraph - Refactor SubtreeMatchCriteria to become a Graph of MatchNode (#10512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10512

SubtreeMatchCriteria now becomes a graph of MatchNode

MatchNode consists of NodeMatchCriteria, nonTerminal and count. This is a cleaner internal representation of the data structure and will bring us much closer to DAG matching.

Note that I still keep the debugString method because convertToDotGraph doesn't currently work with Subgraph.

Reviewed By: bwasti

Differential Revision: D9321695

fbshipit-source-id: 58a76f007a9a95d18cf807d419c2b595e9bc847f
2018-08-15 12:11:18 -07:00
f59bcea2c3 parallel max and min for ATen on CPU (#10343)
Summary:
Optimize max and min reductions for the ATen CPU path; the current code path from the TH module runs sequentially on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10343

Differential Revision: D9330799

Pulled By: ezyang

fbshipit-source-id: 5b8271e0ca3e3e73f88a9075aa541c8756001b7c
2018-08-15 11:41:01 -07:00
44b029f5b8 move matrix formation for dot products to precompute/request-only (#10531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10531

Fixed a naming issue in pairwise_similarity.

Reviewed By: huayuli00

Differential Revision: D9331716

fbshipit-source-id: d7de36f20504c08b1c7871ccdffa343221a3da0c
2018-08-15 11:02:10 -07:00
f5a4dd89b5 Implements volumetric (5d) affine grid generation. (#8322)
Summary:
I've implemented affine grid generation for volumetric (5d) inputs. The implementation is based on the spatial implementation, extended by one dimension. I have a few questions about my implementation vs. the existing one that I will add inline.

I have some extensive test cases for the forward pass here: https://gist.github.com/elistevens/6e3bfb20d8d0652b83bd16b3e911285b However, they use `pytest.fixture` extensively, so I'm not sure of the best way to incorporate them into the PyTorch test suite. Suggestions? I have not tested backwards at all.

Diff probably best viewed with whitespace changes ignored.

Thanks for considering!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8322

Differential Revision: D9332335

Pulled By: SsnL

fbshipit-source-id: 1b3a91d078ef41a6d0a800514e49298fd817e4df
2018-08-15 11:02:08 -07:00
d8ff7ad6f8 generalize order switch ops for 1-3d (#10395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395

Order switch ops (NCHW2NHWC and NHWC2NCHW) only supported 2D images.
This diff generalizes them to 1D and 3D, and also adds a unit test we didn't have.

Reviewed By: protonu

Differential Revision: D9261177

fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
2018-08-15 10:09:31 -07:00
0f05f5fb07 ATen layer norm symbolic (#10513)
Summary:
We can't rely on the ATen fallback pathway here because we need to parse out the constant attributes explicitly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10513

Reviewed By: dzhulgakov

Differential Revision: D9322133

Pulled By: jamesr66a

fbshipit-source-id: 52af947e6c44532ef220cb4b94838ca838b5df06
2018-08-15 08:28:52 -07:00
ce8e8feceb Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified. (#10390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390

Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified.
* The original code first finds the threshold score at the 'detections_per_im' position, and filters out boxes scoring lower than the threshold.
* In cases where multiple boxes have the same score as the threshold, the op will return more boxes than 'detections_per_im'.

Reviewed By: wat3rBro

Differential Revision: D9252726

fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
2018-08-14 23:54:23 -07:00
e41528a5cc Also set stdin to subprocess pipe in FindCUDA windows popen call (#10379)
Summary:
Background: we run pytorch in embedded C++ pipelines, running in C++ GUIs in https://github.com/Kitware/VIAME and without this addition, the call was failing with the below error, but only on certain windows platforms/configurations:

OSError: [WinError6] The handle is invalid
At:
C:\Program Files\VIAME\Python36\site-packages\torch\cuda\__init__.py(162): _lazy_init
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): <lambda>
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(182): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(176): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): cuda
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\arrows\pytorch\pytorch_resnet_f_extractor.py(74): __init__
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\processes\resnet_descriptors.py(132): _configure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10379

Differential Revision: D9330772

Pulled By: ezyang

fbshipit-source-id: 657ae7590879004558158d3c4abef2ec11d9ed57
2018-08-14 23:10:20 -07:00
f1631c3106 Modify build.sh and test.sh scripts for ppc64le jenkins build and test (#10257)
Summary:
Initial jenkins builds / test scripts for ppc64le.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10257

Differential Revision: D9331278

Pulled By: ezyang

fbshipit-source-id: 6d9a4f300a0233faf3051f8151beb31786dcd838
2018-08-14 21:54:44 -07:00
19ad55cc02 set coalesced=false at sparse transpose() and removed transpose invariants (#10496)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/6219
- removed invariants at https://github.com/pytorch/pytorch/pull/4707
- assume a sparse tensor with coalesced=true when:
1. its elements are unique and
2. the indices are in sorted order
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10496

Differential Revision: D9311214

Pulled By: weiyangfb

fbshipit-source-id: 167fa5a8e9e5f9c800db02f728a1194029f7e4f3
2018-08-14 21:25:37 -07:00
964e30de1d Workaround for Cuda9.2 and GCC7 compilation errors (#10510)
Summary:
Breaking out of #8338

This PR is a workaround for a bug with CUDA9.2 + GCC7.

Here is the error this PR fixed:
.../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace*)’:
.../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’
   BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace* ws)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510

Reviewed By: orionr

Differential Revision: D9319742

Pulled By: mingzhe09088

fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81
2018-08-14 20:54:52 -07:00
b6cc65afea Send, Recv, RecvAnysource, Barrier Op for MPI PG and Python Bindings (#10227)
Summary:
Based on: https://github.com/pytorch/pytorch/pull/10199
Added:
(1) send, recv, recvanysource, and barrier for MPI process group.
(2) python binding
(3) testing

Please review: 2e64f5d675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10227

Reviewed By: ailzhang

Differential Revision: D9327138

Pulled By: teng-li

fbshipit-source-id: 80496714550a3ca498eb474465ddbd1b8d657d49
2018-08-14 20:10:11 -07:00
26e40fa665 Tensor.accessor now fails on rvalue reference (#10518)
Summary:
Previously, it was easy to do `x[0].accessor<float, 2>()`. However, x[0] is a temporary, so the accessor would point to invalid strides/sizes and probably segfault. With this change, such unsafe code is a compile error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10518

Reviewed By: goldsborough

Differential Revision: D9329288

Pulled By: ebetica

fbshipit-source-id: d08763bee9a19a898b9d1ea5ba648f27baa1992f
2018-08-14 19:41:31 -07:00
17ecc06b65 static casting TIndex (#10514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10514

Fix the bug which breaks the Windows build in fused_rowwise_random_quantization_ops.h.

Reviewed By: ezyang, jspark1105

Differential Revision: D9322291

fbshipit-source-id: a6a27e87423b6caa973414ffd7ccb12076f2e1e4
2018-08-14 18:42:44 -07:00
60aa416a6d Re-purpose setup_caffe2.py for faster caffe2 build iterations (#10520)
Summary:
setup.py is the official install script; setup_caffe2.py is not used any more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10520

Reviewed By: yinghai

Differential Revision: D9325548

Pulled By: bddppq

fbshipit-source-id: 3dda87f3dff061b574fd1d5c91859044f065ee33
2018-08-14 18:13:19 -07:00
32bb4040dd Unified type annotation parsing for script frontends (#10279)
Summary:
After this, all combinations of {String frontend, Python AST Frontend}{Python 3-style type annotations, MyPy-style type comments}{Script method, Script function} should properly accept type annotations.

Possible TODOs:
- Clean up the functions marked HACK
- Clean up the Subscript tree-view to better match the Python AST versions
- Can we use this for Python functions? That's the only place annotations.get_signature() is still needed
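
A sketch of the two annotation styles both frontends now accept, written against the modern torch.jit.script API:

```python
import torch
from torch import Tensor
from typing import List

@torch.jit.script
def py3_style(x: Tensor, xs: List[Tensor]) -> Tensor:
    # Python 3-style annotations
    for y in xs:
        x = x + y
    return x

@torch.jit.script
def mypy_style(x, n):
    # type: (Tensor, int) -> Tensor
    # MyPy-style type comment
    return x * n
```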
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10279

Differential Revision: D9319726

Pulled By: jamesr66a

fbshipit-source-id: b13f7d4f066b0283d4fc1421a1abb9305c3b28fa
2018-08-14 18:13:15 -07:00
b69b1c477b Adding python binding for MPI process group (#10199)
Summary:
Based on https://github.com/pytorch/pytorch/pull/10159

Please review ProcessGroupMPI.cpp/hpp and init.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10199

Reviewed By: yf225

Differential Revision: D9324027

Pulled By: teng-li

fbshipit-source-id: 2dd524bee0c7ca8f9594ec3b4f3ebbbb608df337
2018-08-14 15:56:33 -07:00
39bfc2d0d4 Nomnigraph - add diagnostic ability for Subgraph matching API (#10267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10267

isSubtreeMatch now returns a SubtreeMatchResult which contains a match flag and a debugMessage string that contains the reason why a subtree is not matched (if requested).

Reviewed By: bwasti

Differential Revision: D9182429

fbshipit-source-id: 530591fad592d02fb4c31fc398960a14ec90c86a
2018-08-14 15:56:31 -07:00
3c39e857ca Python binding for reduce,allgather,scatter,gather ops and python tests (#10159)
Summary:
Provided Python bindings for these four ops. Also provided an NCCL binding test.

Based on https://github.com/pytorch/pytorch/pull/10058

Please only review init.cpp and the test file.
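
A hedged usage sketch of the newly bound collectives, written against the modern torch.distributed API rather than the c10d module as it existed at the time; a single-rank gloo group is used only to show the call shapes:

```python
import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo",
                        init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

t = torch.ones(4)
dist.reduce(t, dst=0)          # reduce onto rank 0
gathered = [torch.zeros(4)]
dist.all_gather(gathered, t)   # every rank receives every tensor

dist.destroy_process_group()
```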
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10159

Reviewed By: yf225

Differential Revision: D9323192

Pulled By: teng-li

fbshipit-source-id: b03822009d3a785ec36fecce2fc3071d23f9994e
2018-08-14 14:24:57 -07:00
16ecd6f99c Fix Debug Build On Windows (#10359)
Summary:
Compile files in torch/csrc with the /MDd runtime library option for debug builds on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10359

Differential Revision: D9316946

Pulled By: SsnL

fbshipit-source-id: c84bfad81d61cd49f39b7bce7177edd2b1e8bd69
2018-08-14 13:24:14 -07:00
3f3a30f79c Added Reduce,AllGather,Gather,Scatter Ops for NCCL and MPI process groups (#10058)
Summary:
Added
- Reduce (both NCCL and MPI)
- AllGather (both NCCL and MPI)
- Gather (MPI)
- Scatter (MPI)

for c10d process groups. This basically finalizes all supported ops for C10d to match THD.

All ops are tested as well.

```
mpirun -np 8 ./ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```

```
./ProcessGroupNCCLTest
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10058

Reviewed By: yf225

Differential Revision: D9316312

Pulled By: teng-li

fbshipit-source-id: 6a6253268d34332327406b1f87335d1402f7133f
2018-08-14 13:10:21 -07:00
13814d6744 Remove use of data() in optimizers (#10490)
Summary:
After talking to users of the C++ API we found that having the tensor type be `autograd::Variable` causes more complications than having it be `at::Tensor`. It used to be a problem because `at::Tensor` didn't have the "autograd API" of variable (e.g. `detach()` or `grad()` methods), but those methods are now on `at::Tensor`. As such, we want to make a last big breaking change to have the tensor type be `at::Tensor`, while factory methods like `torch::ones` will return `Variable`s disguised as `at::Tensor`. This will make many things easier, like calling functions in ATen that take vectors of tensors.

This PR makes a small step in this direction by updating the optimizer classes to not use `.data()` on `Variable` to access the underlying `at::Tensor`. Using `.data()` is effectively a hack to work around our modification rules for tensors that require grad. The proper way of doing things is to use `with torch.no_grad` or equivalently `NoGradGuard` in C++ to guard in-place operations.

The next step can then simply redefine `torch::Tensor` to be `at::Tensor`. This transition should be smooth, since all methods available on `Variable` are at this point available on `at::Tensor`.

For this PR I:

1. Modified the implementations of optimizers to not use `.data()`. This means the implementations are now different from PyTorch, which still uses the legacy method of using `.data`.
2. To properly verify (1), I added more fine-grained test cases to our optimizer tests, e.g. `SGD` with and without `weight_decay`, then with `nesterov` etc. Generally more tests = more happy!
3. Minor cleanup of the optimizer codebase
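
A minimal sketch of the no_grad update pattern described above (a hand-rolled SGD step for illustration, not the actual optimizer code):

```python
import torch

w = torch.randn(3, requires_grad=True)
loss = (w * w).sum()
loss.backward()

lr = 0.1
with torch.no_grad():
    w -= lr * w.grad  # in-place update; no autograd history recorded
w.grad.zero_()
```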

ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10490

Differential Revision: D9318229

Pulled By: goldsborough

fbshipit-source-id: fb386700f37840542bc5d323f308ea88fe5ea5c5
2018-08-14 13:10:19 -07:00
bdb11e716a Split the dependence of ONNX from test_operators.py (#10151)
Summary:
Now, when running `python test/onnx/test_operators.py --no-onnx`, we don't introduce any ONNX Python dependency. (No onnx/protobuf Python packages need to be installed.)

The major changes:
- output pbtxt from C++ exporter directly, so the floating format may be slightly different. (This should be fine, since it's just to guard ONNX exporting.)
- ONNX python packages are only imported if we run the ONNX related checks. Those checks are disabled when using `--no-onnx` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10151

Reviewed By: jamesr66a

Differential Revision: D9130706

Pulled By: houseroad

fbshipit-source-id: ea28cf5db8399929179698ee535137f209e9ce6f
2018-08-14 12:54:44 -07:00
eea8ab1861 Move common code to RNNCellBase. (#10399)
Summary:
There are three classes `RNNCell`, `LSTMCell`, and `GRUCell` inheriting from `RNNCellBase`, all defining the identical initialization function `reset_parameters`. Let's move it to the common base.
Another option is to have different initialization for RNN, LSTM, and GRU. Maybe those weights whose output is processed with a sigmoid (i.e. gain=1) should be initialized differently from those going to a tanh (gain=5/3)?
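
A minimal sketch of the hoisted initializer (illustrative class names, not the actual torch.nn source):

```python
import math
import torch
import torch.nn as nn

class CellBase(nn.Module):
    """Shared init: uniform in [-1/sqrt(h), 1/sqrt(h)]."""
    def __init__(self, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size

    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.hidden_size)
        for weight in self.parameters():
            nn.init.uniform_(weight, -stdv, stdv)

class MyGRUCell(CellBase):  # inherits reset_parameters
    def __init__(self, input_size, hidden_size):
        super().__init__(hidden_size)
        self.weight_ih = nn.Parameter(torch.empty(3 * hidden_size, input_size))
        self.reset_parameters()
```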
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10399

Differential Revision: D9316978

Pulled By: SsnL

fbshipit-source-id: a2d9408f0b5c971a3e6c3d42e4673725cf03ecc1
2018-08-14 12:39:59 -07:00
bd497809e2 CAFFE_ENFORCE -> CAFFE_ENFORCE_EQ for error with more information (#10244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10244

Use CAFFE_ENFORCE_EQ(x, y) instead of CAFFE_ENFORCE(x == y) in conv_op_impl.h for error messages with more information.

Reviewed By: viswanathgs

Differential Revision: D9177091

fbshipit-source-id: cf8d10afec1ce6793d3ae0b62f05648722a4130b
2018-08-14 12:24:44 -07:00
2400512a08 Remove unnecessary include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10486

Reviewed By: ml7

Differential Revision: D9305283

fbshipit-source-id: 0d1316f9a72670ddbe8d95ead93603d00ad0f63b
2018-08-14 12:10:04 -07:00
d1442b36f3 add a rebuild_libtorch command for speedier iteration. (#10036)
Summary:
It just calls into `ninja install`. For iterative work on
libtorch.so/_C.so,
`python setup.py rebuild_libtorch develop` should provide quick iteration
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10036

Differential Revision: D9317869

Pulled By: anderspapitto

fbshipit-source-id: 45ea45a1b445821add2fb9d823a724fc319ebdd2
2018-08-14 12:10:02 -07:00
520f4f6cb9 Added some unit test for box_with_nms_limit_op. (#10389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389

Added some unit test for box_with_nms_limit_op.

Reviewed By: wat3rBro

Differential Revision: D9237860

fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
2018-08-14 11:55:03 -07:00
d043f83019 Add tests for Tensor.* nn.* F.* docs (#10311)
Summary:
Tests only for existence for now. I had to skip a lot of them, so there is a FIXME in the test.

Also, I'm not testing torch.* because of a namespace issue.
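
A tiny sketch in the spirit of the test, flagging public Tensor attributes that lack a docstring (the real test keeps FIXME skips for known gaps):

```python
import torch

missing = [
    name for name in dir(torch.Tensor)
    if not name.startswith('_')
    and getattr(torch.Tensor, name).__doc__ is None
]
print(sorted(missing))  # ideally empty
```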
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311

Differential Revision: D9196341

Pulled By: SsnL

fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
2018-08-14 11:39:46 -07:00
b4462511fd Add LSTMCell backward pass expect tests (#10506)
Summary:
- Exposed get_debug_graph for ScriptModule (gets the debug graph for its
  forward Method)
- Added forward/backward expect tests for lstm and milstm cells. These
  are intended to prevent regressions

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10506

Differential Revision: D9316590

Pulled By: zou3519

fbshipit-source-id: 3c2510d8363e9733ccbc5c7cc015cd1d028efecf
2018-08-14 11:39:44 -07:00
e5811becdd Add tags for onnx tensor descriptors (#10502)
Summary:
We missed two places where tags need to be added when we create tensor descriptors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10502

Reviewed By: Maratyszcza

Differential Revision: D9312075

Pulled By: yinghai

fbshipit-source-id: 329e83ec5470b0a778d2eda525dd6f2143facbdf
2018-08-14 11:25:52 -07:00
9497383706 Fix some warnings (#10297)
Summary:
Fixing some compiler warnings while looking at symbol visibility.

cc smessmer ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10297

Reviewed By: soumith

Differential Revision: D9195336

Pulled By: orionr

fbshipit-source-id: 04cbfd3549984caec7bdd1a5b39a6d25e80348e9
2018-08-14 10:40:08 -07:00
61bedc96f0 Schema-based creation of graph nodes (#10198)
Summary:
This commit adds the ability to insert a node with inputs, using the schema to check that the inputs are valid types, fill in any default values, and perform standard implicit conversions. Since it is schema-based, it will discover and use the right overload.
Constructors to `NamedValue` enable it to be constructed using `IValue` constants so it is possible to use constant values in the input list as well:

```
g.insert(aten::add, {v, 3});
```

Keyword arguments are also supported:

```
g.insert(aten::add, {v}, {{"other", t}, {"scalar", 1}});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10198

Differential Revision: D9307252

Pulled By: zdevito

fbshipit-source-id: 644620aa85047d1eae1288383a619d50fec44d9b
2018-08-14 10:25:38 -07:00
3a40baa15c fix a grammatical error: accelerate compute (#10204)
Summary:
"accelerate compute"
a verb shouldn't go with another verb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10204

Differential Revision: D9316699

Pulled By: fmassa

fbshipit-source-id: f1126c594905c3236ffd6b7e57a92552d3d4c1f1
2018-08-14 10:11:15 -07:00
ef44faece2 check attribute existence in torch.legay.nn.SpatialFullConvolution in method type (#8740)
Summary:
This is related to #5255.
When adding CUDA support for the model, this error occurs:
```
AttributeError: 'SpatialFullConvolution' object has no attribute 'finput'
```
Here is my short test code:
https://gist.github.com/kaleaht/26518c3deea5d1d3dda722fbf1f3ecdc

I converted torch7's model also from here.
https://github.com/art-programmer/FloorplanTransformation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8740

Differential Revision: D8872735

Pulled By: SsnL

fbshipit-source-id: 8d97f8b59cdf4049e87be14b78c4608fd973d149
2018-08-14 10:11:13 -07:00
329d901a91 Fold AffineChannel to Conv, the same way as BN (for Detectron models) (#10293)
Summary:
AffineChannel is being used by public Detectron models, e.g. Mask-RCNN and Faster-RCNN. This PR folds this op into convolution the same way as BN to speed up inference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10293

Differential Revision: D9276789

Pulled By: yinghai

fbshipit-source-id: fbf6dd2c1be05f5713f760752e7245b1320a122b
2018-08-13 22:43:37 -07:00
c618df154e Add intrinsic support for external_input/output to nomnigraph (#10100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10100

nomnigraph has until this point tried to ignore external input and output, as they aren't very well defined (does order matter?). But for DCE and some of Keren's work they are becoming necessary. I went ahead and added this to the core nomnigraph converter.

Reviewed By: yinghai

Differential Revision: D9105487

fbshipit-source-id: a2e10e3cc84515611d6ab7d4bc54cf99b77729c0
2018-08-13 21:39:17 -07:00
7d16e87f14 Fix byte ordering issue in from_numpy (#9508)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3671 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9508

Differential Revision: D9307186

Pulled By: soumith

fbshipit-source-id: 39dcaa6fd2d330d7085802acd6f63c19270164fa
2018-08-13 21:39:16 -07:00
facb293aad Fix FindMKL.cmake for Windows (#10453)
Summary:
Targets the issue discussed at https://github.com/pytorch/pytorch/pull/7399#issuecomment-400788971.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10453

Differential Revision: D9311591

Pulled By: soumith

fbshipit-source-id: ac0712e10bdac4ea3f76d6fbad2178ec958b3a31
2018-08-13 21:09:27 -07:00
fed05cf4cf Fix prim::FusedConcat bug (#10466)
Summary:
Fixes #10456

The graph fuser was fusing groups with prim::FusedConcat (the producer) together with other ops (the consumer) if the consumer was fusible. For example,

```
import torch
@torch.jit.script
def fn(x, y, z):
    x1 = x + y
    y1 = x - y
    w = torch.cat([x1, y1])
    return w + z

x = torch.randn(2, 2, dtype=torch.float, device='cpu')
y = torch.randn(2, 2, dtype=torch.float, device='cpu')
z = torch.randn(4, 2, dtype=torch.float, device='cpu')
fn(x, y, z)
fn.graph_for(x, y, z)
```
produced the following graph:
```
graph(%x : Float(2, 2)
      %y : Float(2, 2)
      %z : Float(4, 2)) {
  %3 : int = prim::Constant[value=1]()
  %y1 : Float(2, 2) = aten::sub(%x, %y, %3)
  %8 : int = prim::Constant[value=0]()
  %14 : Float(4, 2) = prim::FusionGroup_0[device=-1](%z, %y1, %x, %y)
  return (%14);
}
with prim::FusionGroup_0 = graph(%1 : Float(4, 2)
      %5 : Float(2, 2)
      %7 : Float(2, 2)
      %8 : Float(2, 2)) {
  %11 : int = prim::Constant[value=1]()
  %9 : int = prim::Constant[value=1]()
  %x1 : Float(2, 2) = aten::add(%7, %8, %9)
  %w : Float(4, 2) = prim::FusedConcat[dim=0](%x1, %5)
  %2 : int = prim::Constant[value=1]()
  %3 : Float(4, 2) = aten::add(%w, %1, %2)
  return (%3);
}
```

this is a problem because it violates two invariants:
1) all inputs to the FusionGroup must have the same size
2) prim::FusedConcat's output must not be used inside the FusionGroup

This PR fixes this problem by checking if the output to a FusionGroup came from a prim::FusedConcat node when deciding whether to fuse the consumer and producer.
If the producer is a value that came from a prim::FusedConcat node in a FusionGroup, then consumer & producer do not get fused.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10466

Differential Revision: D9296686

Pulled By: zou3519

fbshipit-source-id: ed826fa9c436b42c04ca7d4d790cece804c162bd
2018-08-13 21:09:25 -07:00
099a545376 Hipify Caffe2 binaries (#10468)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10468

Reviewed By: yinghai

Differential Revision: D9301178

Pulled By: bddppq

fbshipit-source-id: 5da88aa4d79a5142f8e744cdcd8ae85951bc387c
2018-08-13 20:56:28 -07:00
9a9224e5c1 Remove "locally" from CONTRIBUTING.md (#10495)
Summary:
A bootcamper was confused by the word "locally" and thought it meant on his MacBook as opposed to his FB dev machine. Besides the confusion in the FB context, the word "locally" isn't really necessary at all.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10495

Reviewed By: soumith

Differential Revision: D9311480

Pulled By: goldsborough

fbshipit-source-id: 2779c7c60f903a1822a50d140ed32a346feec39e
2018-08-13 20:56:26 -07:00
f6eb966fd2 Fix TanhGradientOperator linker errors (#10426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10426

We were seeing linker errors for TanhGradientOperator in multifeed. Since we only use the float specialization, we might as well define it that way.

Reviewed By: yinghai

Differential Revision: D9280622

fbshipit-source-id: d2ffb698c73a84bb062de5e1f3bda741330e4228
2018-08-13 17:57:10 -07:00
ffb59e5f20 adding stochastic quantization caffe2 operators (encoder and decoder in CPU are implemented. GPU mode is pending)
Summary:
This operator implements b-bit (b = 1/2/4/8) stochastic quantization of a floating-point
matrix in a row-wise fashion. 8/b floating-point values are packed into a byte
and returned in a uint8 tensor. PR: https://github.com/pytorch/pytorch/pull/8629

Reviewed By: harouwu

Differential Revision: D8493264

fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
2018-08-13 16:39:23 -07:00
c6fc3ab557 fixes printing non-contiguous tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10405

Differential Revision: D9302794

Pulled By: soumith

fbshipit-source-id: e4a7db8d33400a5a050d05fd1679de8bc3cbcf30
2018-08-13 16:26:20 -07:00
216961b7bf Remove is_zero_dim_ bool in THTensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10415

Reviewed By: ezyang

Differential Revision: D9274954

Pulled By: gchanan

fbshipit-source-id: 353a52d91556d5b81c3510eb2bf399d102c9a0a4
2018-08-13 12:39:06 -07:00
f59cce95b4 Some symbol annotation fixes for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10369

Differential Revision: D9300187

Pulled By: ezyang

fbshipit-source-id: bf29966ad6aa221332b7232a965fb85e652f866d
2018-08-13 12:26:00 -07:00
382ff03222 Add missing #pragma once
Reviewed By: ml7

Differential Revision: D9299779

fbshipit-source-id: b5b5a1b9ead1b275d3ae54ecfad99617d2869094
2018-08-13 11:39:45 -07:00
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
cd81217f8e A single print statement in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10473

Reviewed By: ml7

Differential Revision: D9299196

Pulled By: pjh5

fbshipit-source-id: f9aa84c2859df12f9da9ac5205e1918c253e19fb
2018-08-13 11:39:42 -07:00
0b63d12db6 Don't call into Python during Storage destruction. (#10407)
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shutdown before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.

This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
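
A Python sketch of the collection policy described above; it mirrors the idea (sweep expired weak references only once the cache outgrows a limit kept at 2x the live count), not the exact implementation:

```python
import weakref

class SharedCache(dict):
    def __init__(self):
        super().__init__()
        self.limit = 128  # initial sweep threshold

    def __setitem__(self, key, obj):
        super().__setitem__(key, weakref.ref(obj))
        if len(self) > self.limit:
            self.free_dead_references()

    def free_dead_references(self):
        live = 0
        for key, ref in list(self.items()):
            if ref() is None:
                del self[key]   # expired entry: drop it
            else:
                live += 1
        # bound dead-entry memory to ~2x the live count
        self.limit = max(128, live * 2)
```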
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407

Differential Revision: D9272400

Pulled By: colesbury

fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
2018-08-13 11:20:07 -07:00
64235d5c01 Rewrite TensorImpl to use TensorTypeId. (#10278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10278

Translation to Backend happens immediately before we go into the
Type universe; otherwise we use TensorTypeId.

I allocated TensorTypeId corresponding exactly to existing ATen
Backend.  Only CPUTensorId and CUDATensorId are relevant in the
Caffe2 universe.

Reviewed By: gchanan

Differential Revision: D9184060

fbshipit-source-id: 9d3989c26f70b90f1bbf98b2a96c57e2b0a46597
2018-08-13 11:20:04 -07:00
145eb330ad Back out "Back out "Move typeid.h to move to ATen/core"" (#10465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10465

Original commit changeset: 7050fe845e65

Reviewed By: jerryzh168

Differential Revision: D9296375

fbshipit-source-id: cb8161440ba809dcec5027858a29cd026d537fc3
2018-08-13 11:20:01 -07:00
b8530dc1f0 A few additions (#9837)
Summary:
This PR provides 4 fixes / features:

1. torch::nn::Cloneable inherits virtually from torch::nn::Module. We want to pass around a module with new functions, and the best way to do this is to do a diamond inheritance pattern, i.e.

```c++
struct MySuperModuleImpl : virtual public torch::nn::Module {
  virtual void myFunction() = 0;
};

struct MySuperModule : public torch::nn::Cloneable<MySuperModule>, public MySuperModuleImpl {};

struct MyModule : public MySuperModule {
  void myFunction() override;
};
```

This way, we can simply pass MySuperModuleImpl around instead of torch::nn::Module.

2. Optimizer options are public now, since there's no way to decay the LR or modify it during training otherwise
3. Serialization functions create autograd history and call copy_! Bad!
4. Optimizers did not create buffers after add_parameters was called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9837

Reviewed By: goldsborough

Differential Revision: D9199746

Pulled By: ebetica

fbshipit-source-id: 76d6b22e589a42637b7cc0b5bcd3c6b6662fb299
2018-08-13 10:24:58 -07:00
0a39a9cfbc Add db directory for hipifying (#10428)
Summary:
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10428

Differential Revision: D9297115

Pulled By: bddppq

fbshipit-source-id: d7134ff24102f03f762e6a7b4340055546c9ecfd
2018-08-13 10:24:56 -07:00
56267cc97b gflags improvement to allow CAFFE2_EXPORTS (#10444)
Summary:
Explanation copied from code:

// Motivation about the gflags wrapper:
// (1) We would need to make sure that the gflags version and the non-gflags
// version of Caffe2 are going to expose the same flags abstraction. One should
// explicitly use caffe2::FLAGS_flag_name to access the flags.
// (2) For flag names, it is recommended to start with caffe2_ to distinguish it
// from regular gflags flags. For example, do
//    CAFFE2_DEFINE_BOOL(caffe2_my_flag, true, "An example");
// to allow one to use caffe2::FLAGS_caffe2_my_flag.
// (3) Gflags has a design issue that does not properly expose the global flags,
// if one builds the library with -fvisibility=hidden. The current gflags (as of
// Aug 2018) only deals with the Windows case using dllexport, and not the Linux
// counterparts. As a result, we will explicitly use CAFFE2_EXPORT to export the
// flags defined in Caffe2. This is done via a global reference, so the flag
// itself is not duplicated - under the hood it is the same global gflags flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10444

Differential Revision: D9296726

Pulled By: Yangqing

fbshipit-source-id: a867d67260255cc46bf0a928122ff71a575d3966
2018-08-13 09:54:48 -07:00
64a6f17177 Fix ATen/core header installation. (#10463)
Summary:
Fixes #10353 and fixes #10397.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10463

Differential Revision: D9296491

Pulled By: ezyang

fbshipit-source-id: f825c2a21a113e44a6f5c1c5ec17814d9deac366
2018-08-13 09:25:49 -07:00
fa5d95a00c Bump onnx to onnx/onnx@0d250de (#10452)
Summary:
0d250dea76
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10452

Reviewed By: houseroad

Differential Revision: D9288037

Pulled By: bddppq

fbshipit-source-id: 206be3ee2b8ebca26f3d8af0597078363ed6d168
2018-08-13 00:09:15 -07:00
3cbe8f0c3e Detect system RocksDB installation with CMake config files. (#7315)
Summary:
On Windows, the FindRocksDB script doesn't detect a RocksDB installation built by CMake.
And it doesn't include/link the RocksDB dependencies either, like:
  * `Snappy`
  * `Shlwapi.lib`
  * `Rpcrt4.lib`

This PR tries to detect in config mode first before using the private find module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7315

Differential Revision: D9287587

Pulled By: Yangqing

fbshipit-source-id: 314a36a14bfe04aa45013349c5537163fb4c5c00
2018-08-12 18:24:10 -07:00
82d11b847e Use CUDA_LINK_LIBRARIES_KEYWORD instead of hacking. (#10437)
Summary:
There's no need to hack.
Using `CUDA_LINK_LIBRARIES_KEYWORD` is the normal way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10437

Differential Revision: D9287579

Pulled By: Yangqing

fbshipit-source-id: d3d575ea8c3235576ba971e4b7493ddb435f92f3
2018-08-12 18:09:20 -07:00
508de8109f Added missing "AT_" prefix to macro. (#10436)
Summary:
For issue #10435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10436

Differential Revision: D9287578

Pulled By: Yangqing

fbshipit-source-id: b07de3a2d7fa6f980a189b5e8f7ce05dfa1bef50
2018-08-12 18:09:19 -07:00
1756daaa75 Use FULL_CAFFE2 to build caffe2 and python in one shot (#10427)
Summary:
Building Caffe2 and PyTorch separately will end up with duplicated symbols, as they now share some basic libs. This is especially bad for the registry. This PR fixes our CI and builds them in one shot with shared symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10427

Reviewed By: bddppq

Differential Revision: D9282372

Pulled By: yinghai

fbshipit-source-id: 0514931ea88277029a68fa5368ff4336472f132e
2018-08-12 15:39:12 -07:00
51f154e072 Fix Python lint errors. (#10441)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10441

Reviewed By: Yangqing

Differential Revision: D9285502

Pulled By: ezyang

fbshipit-source-id: 12c94b28bee9cade930c8f260577e81ea1915269
2018-08-11 21:08:50 -07:00
cd53b78bd0 Remove caffe namespace GetEmptyStringAlreadyInited (#10438)
Summary:
A follow-up cleanup of #10380.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10438

Differential Revision: D9285692

Pulled By: Yangqing

fbshipit-source-id: c73defbef00d3b563240d0b69d85bd0a6e3eb504
2018-08-11 17:39:58 -07:00
ab6afc2b23 Optimize max_pooling for inference for MKL-DNN/IDEEP device (#10156)
Summary:
Optimize the max_pooling operation for the inference path by passing the "inference" flag to the underlying MKL-DNN, saving the computation and storage of max indices, which are only needed for training. To keep the API compatible, training mode is still the default, and inference mode is set in the optimizeForIdeep path.
Tests show the speed-up of a single max_pooling operation is up to 7X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10156

Differential Revision: D9276755

Pulled By: yinghai

fbshipit-source-id: ad533d53aabb8ccb3b592da984d6269d9b794a8a
2018-08-10 23:14:05 -07:00
d3ccc836de Fix warning in Nomnigraph (#10425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10425

`const size_t` as return value doesn't make sense.

Reviewed By: duc0

Differential Revision: D9281442

fbshipit-source-id: c3d9c94f5dbe516476f0c74f63c35e60893c8140
2018-08-10 22:40:26 -07:00
1dbdc5a93d Back out "Move typeid.h to move to ATen/core"
Summary: Original commit changeset: 21f2c89e58ca

Reviewed By: yinghai

Differential Revision: D9282171

fbshipit-source-id: 7050fe845e6524b965bdd45794a6fa1665b83e34
2018-08-10 21:39:25 -07:00
31646edfff Increase GLOO rendezvous timeout
Summary: Increase GLOO rendezvous timeout

Reviewed By: teng-li

Differential Revision: D9273544

fbshipit-source-id: 5c22c1d18df3032f019ff12e2a720aea7c390f15
2018-08-10 18:40:18 -07:00
767687835e Replace sudo with --user in CI caffe2 install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10328

Reviewed By: pjh5

Differential Revision: D9275809

Pulled By: ezyang

fbshipit-source-id: c22cb1570c67199b74b2188ad83b1e4828e11911
2018-08-10 15:11:43 -07:00
adbcb3c1dc Move dropout and alpha dropout to ATen (#10384)
Summary:
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10384

Reviewed By: ezyang

Differential Revision: D9272583

Pulled By: apaszke

fbshipit-source-id: ed5d37b28ce9ff25800bbaa0daf066cfbf1f9921
2018-08-10 14:55:28 -07:00
5b0be9de59 Remove TH compatibility calls for strides. (#10414)
Summary:
This should just work now that sizes/strides are unified between TH and ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10414

Differential Revision: D9274681

Pulled By: gchanan

fbshipit-source-id: 69eb766f4e3a5b6c57b15837cffdef513b6d7817
2018-08-10 13:54:58 -07:00
674f7a9778 Correctly share CUDA Parameters. (#10220)
Summary:
```
    Correctly share CUDA Parameters, requires_grad and hooks.

    Previously, the following was true:

    - If you put a Parameter for a CUDA tensor
      in multiprocessing queue (or otherwise tried to transfer it),
      this failed, saying that we cannot pickle CUDA storage.
      This is issue #9996.

    - If you put a leaf Tensor that requires_grad=True through the
      multiprocessing queue, it would come out the other end as
      requires_grad=False (It should have come out the other end
      as requires_grad=True).  Similarly, backwards hooks were
      lost.

    - If you put a non-leaf Tensor that requires_grad=True through
      the multiprocessing queue, it would come out the other end
      as requires_grad=False.

    The root cause for the first issue was that implementation of
    reductions for Parameter used the superclass implementation
    (tensor) in __reduce_ex__, but this always picks up the
    non-ForkingPickler reduction, which doesn't work with CUDA tensors.
    So, we registered a new ForkingPickler specifically for Parameter,
    and adjusted the code to correctly rewrap a Tensor in a Parameter
    if it was originally a parameter.

    While working on this, we realized that requires_grad and backwards
    hooks would not be preserved in the ForkingPickler reduction
    implementation.  We fixed the reducer to save these parameters.
    However, Adam Paszke pointed out that we shouldn't allow sending
    requires_grad=True, non-leaf Tensors over a multiprocessing
    queue, since we don't actually support autograd over process
    boundary.  We now throw an error in this case; this may cause
    previously working code to fail, but this is easy enough to fix;
    just detach() the tensor before sending it.  The error message says
    so.

    Fixes #9996.
```
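
An illustrative sketch of the resulting contract (assuming current torch.multiprocessing semantics):

```python
import torch
import torch.multiprocessing as mp

if __name__ == "__main__":
    q = mp.Queue()

    leaf = torch.ones(2, requires_grad=True)
    q.put(leaf)
    assert q.get().requires_grad       # preserved by the fix

    non_leaf = leaf * 2
    q.put(non_leaf.detach())           # sending it undetached now errors
    _ = q.get()
```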
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220

Differential Revision: D9160746

Pulled By: ezyang

fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
2018-08-10 13:54:56 -07:00
0b8a0125ab Fixes torch.log after torch.expand giving incorrect results (#10269)
Summary:
fixes #10241
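
An illustrative check of the fix (a sketch, assuming current torch semantics):

```python
import torch

# Pointwise ops on an expanded (stride-0) view must match the
# materialized tensor.
x = torch.rand(3, 1)
e = x.expand(3, 4)
assert torch.equal(e.log(), e.contiguous().log())
```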
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10269

Differential Revision: D9272472

Pulled By: cpuhrsch

fbshipit-source-id: cd1afbb4386a0d0956ee21b24f0d529755b986ca
2018-08-10 13:39:38 -07:00
6a55238a3f Grid sampler: nearest interpolation & reflection padding (#10051)
Summary:
closes #9702 .

cc jph00

Commit structure:

1. Change the index calculation logic. I will explain using 1-D for simplicity.

	Previously we have (in pseudo code):

	```
	// 1. get the float locations from grid
	scalar_t x = from_grid()

	// 2. find the integral surrounding indices
	int x_left = floor(x)
	int x_right = x_left + 1

	// 3. calculate the linear interpolate weights
	scalar_t w_left = x_right - x
	scalar_t w_right = x - x_left

	// 4. manipulate the integral surrounding indices if needed
	// (e.g., clip for border padding_mode)
	x_left = manipulate(x_left, padding_mode)
	x_right = manipulate(x_right, padding_mode)

	// 5. interpolate
	output_val = interpolate(w_left, w_right, x_left, x_right)
	```

	This is actually incorrect (and also unintuitive) because it calculates the
	weights before manipulating out-of-boundary indices. Fortunately, this
	isn't manifested in both of the current supported modes, `'zeros'` and
	`'border'` padding:

	+ `'zeros'`: doesn't clip
	+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
	  clipped to the same value, so weights don't matter

	But this is a problem with reflection padding, since after each time we reflect,
	the values of `w_left` and `w_right` should be swapped.

	So in this commit I change the algorithm to (numbers corresponding to the
        ordering in the above pseudo-code)

	```
	1. get float location
	4. clip the float location
	2. find the integral surrounding indices
	3. calculate the linear interpolate weights
	```

	In the backward, because of this change, I need to add new variables to track
	`d manipulate_output / d manipulate_input`, which is basically a multiplier
	on the gradient calculated for `grid`. From benchmarking this addition doesn't
	cause obvious slow downs.

2. Implement reflection padding. The indices will keep being reflected until
	they become within boundary.

	Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
	backward. E.g.,
	```cpp
	// clip_coordinates_set_grad works similarly to clip_coordinates except that
	// it also returns the `d output / d input` via pointer argument `grad_in`.
	// This is useful in the backward pass of grid_sampler.
	scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
	```
	For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
	If `in` is reflected **odd** times in `'reflection'` mode, `grad_in`
	is set to `-1`.

3. Implement nearest interpolation.

4. Add test cases

5. Add better input checking
  Discussed with goldsborough for moving `operator<<` of `at::Device`,
  `at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
  `AT_CHECK` can't find them.)

6. Support empty tensors. cc gchanan

    + Make empty tensors not acceptable by cudnn.
    + Add `AT_ASSERT(kernel block size  > 0)` if using `GET_BLOCKS`
   + Cache `numel` in `TensorGeometry`
      I was going to use `numel` to test if cudnn descriptor should accept a
      tensor, but it isn't used eventually. I can revert this if needed.

7. Add more test cases, including on input checking and empty tensors

8. Remove an obsolete comment

9. Update docs. Manually tested by generating docs.
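
To round out the summary, a usage sketch of the two new options against the current torch.nn.functional API (`align_corners` is a later addition, passed here only for a warning-free run on recent versions):

```python
import torch
import torch.nn.functional as F

inp = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)
# A zoomed-out affine transform, so some sample locations fall
# out of bounds and exercise the padding mode.
theta = torch.tensor([[[1.5, 0.0, 0.0],
                       [0.0, 1.5, 0.0]]])
grid = F.affine_grid(theta, [1, 1, 4, 4], align_corners=False)
out = F.grid_sample(inp, grid, mode='nearest',
                    padding_mode='reflection', align_corners=False)
print(out.shape)  # torch.Size([1, 1, 4, 4])
```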
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051

Differential Revision: D9123950

Pulled By: SsnL

fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
2018-08-10 12:43:27 -07:00
def3715e82 Minor changes for nicer pip packages (#9544)
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544

Reviewed By: orionr

Differential Revision: D9267111

Pulled By: pjh5

fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
2018-08-10 12:09:46 -07:00
40109b16d0 Remove caffe1 specific proto (#10380)
Summary:
This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation.

Note that caffe_translator still works properly; the only difference is that users now need to install c1 separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380

Differential Revision: D9267981

Pulled By: Yangqing

fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390
2018-08-10 11:10:26 -07:00
018790cd4b thread BUILD_SHARED_LIBS through build_pytorch_libs.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10272

Differential Revision: D9239337

Pulled By: anderspapitto

fbshipit-source-id: 187b3acb7e85635d9b45a3dd82c98d86a2b51e70
2018-08-10 10:39:31 -07:00
9b8a036873 Fix basic.cpp, which compared equality between a size [1] tensor with… (#10404)
Summary:
… a size [] tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10404

Differential Revision: D9268467

Pulled By: gchanan

fbshipit-source-id: 92bb387358f4030519c6883c12ea69312185446e
2018-08-10 10:39:29 -07:00
e524a8994b Make lengths_host_.CopyFrom synced in LengthsCosineCoherenceOp and LengthsTileOp (#10360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10360

It seems `lengths_host_.CopyFrom(lengthsInput, &context_);` is asynchronous w.r.t. the host while `lengths_host_.CopyFrom(lengthsInput);` is synchronous.

However, according to jerryzh168,  `lengths_host_.CopyFrom(lengths, &context_); context_.FinishDeviceComputation();` is the safest way to guarantee synchronization.

Reviewed By: jerryzh168

Differential Revision: D9197923

fbshipit-source-id: 827eb63d9d15c1274851e8301a793aed39d4fa6b
2018-08-10 10:39:28 -07:00
be5fb8f6fd Move fused RNN kernels into ATen (#10305)
Summary:
As in the title. I also did a small refactor that let us lose almost 400 LOC. This is a first step in moving the RNN code to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10305

Reviewed By: ezyang

Differential Revision: D9196227

Pulled By: apaszke

fbshipit-source-id: 54da905519aade29baa63ab1774a3ee1db5663ba
2018-08-10 09:12:05 -07:00
e221791afc Fix typo.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10387

Differential Revision: D9255840

Pulled By: gchanan

fbshipit-source-id: 97b52d4e349c1e2d1970abde7dc6b25e7cf668a0
2018-08-10 08:55:30 -07:00
1e3e26e3e8 Use nDimensionLegacyNoScalars in THTensorDimApply. (#10388)
Summary:
This issue was exposed in https://github.com/pytorch/pytorch/pull/10383.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10388

Differential Revision: D9255836

Pulled By: gchanan

fbshipit-source-id: 88c5a6415c27d56ff54d00a8957fdc1617cfbde7
2018-08-10 08:55:28 -07:00
3667d029b4 Move typeid.h to move to ATen/core (#10163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10163

- Remove dependency on caffe2/core/common.h for ATen/core/typeid.h
  Unfortunately, Windows seems to rely on typeid.h including this
  header, so it is still included from the forwarding header
  caffe2/core/typeid.h
- Deduplicate Demangle/DemangleType with their ATen equivalents

Reviewed By: smessmer

Differential Revision: D9132432

fbshipit-source-id: 21f2c89e58ca1e795f1b2caa316361b729a5231b
2018-08-10 08:45:44 -07:00
e9ad74357e Use serialization container in ir import export (#10394)
Summary:
Copy of #10191 because these changes didn't land with the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10394

Differential Revision: D9260816

Pulled By: li-roy

fbshipit-source-id: 7dc16919cfab6221fda1d44e98c5b900cfb40558
2018-08-10 00:09:30 -07:00
0950d7a98d support list slicing (#10318)
Summary:
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10318

Differential Revision: D9254351

Pulled By: michaelsuo

fbshipit-source-id: be891a584dc295b5e353f7f5257d64a356fb9586
2018-08-09 17:25:13 -07:00
b1e3239ec8 Fix some backwards definitions wrt keepdim. (#10382)
Summary:
Before we had 0-dim tensors in TH, we were flexible in what we accepted with respect to the difference between size [] and size [1] tensors in backwards functions, because they were identical in TH. So, we had backwards definitions that were technically incorrect but happened to work. This often masks shape issues, adds greatly to code complexity, and thus IMO isn't worth keeping.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10382

Differential Revision: D9244618

Pulled By: gchanan

fbshipit-source-id: 2c29c53a8ffe8710843451202cad6b4323af10e8
2018-08-09 15:11:55 -07:00
209af45614 Back out "[pytorch][PR] Fix bincount for empty input"
Summary: Original commit changeset: 6c4c66c23679

Reviewed By: SsnL

Differential Revision: D9253403

fbshipit-source-id: bf5ee669ed095c06ff58a2871f7350e879261076
2018-08-09 14:25:33 -07:00
18d2fcde7a Fix performance of DistributedSampler per #8958
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10361

Differential Revision: D9240798

Pulled By: ezyang

fbshipit-source-id: dc4cfe79612f711bbcff34a147877df6a5f7b89f
2018-08-09 12:54:37 -07:00
64a60030a6 Don't copy on clamp, clamp_out (#10352)
Summary:
This makes clamp and relu faster (fixes #10276).

The extra copying was introduced when clamp moved to ATen and
the _th_clamp_ wrapper was used to forward to TH/THC,
we remove that and add _th_clamp(_out) instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10352

Reviewed By: ezyang

Differential Revision: D9233590

Pulled By: SsnL

fbshipit-source-id: 4f86a045498e5e577fb22656c71f171add7ed0ac
2018-08-09 12:40:47 -07:00
b43beec070 Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Differential Revision: D8966879

Pulled By: soumith

fbshipit-source-id: 9f08a9d5d5d037db16319141d7a227a5efa23869
2018-08-09 12:40:45 -07:00
cc5b47ff47 Fix the logic for PATH guess on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10372

Differential Revision: D9240207

Pulled By: soumith

fbshipit-source-id: 0933f6fde19536c7da7d45044efbdcfe8ea40e1f
2018-08-09 12:40:44 -07:00
3fa1c1022a Avoid std::thread ctor "cannot resolve" error (#10381)
Summary:
If an `at::test` function is added, gcc can't figure out the `std::thread(test, -1)` resolution.

It is not a problem for current code. I bumped into this when playing with native functions. But I think it is good to just prevent it from happening in the future by removing `using namespace at;`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10381

Differential Revision: D9241614

Pulled By: SsnL

fbshipit-source-id: 972ac3cecff3a50602b3fba463ae1ebd3f53d036
2018-08-09 11:55:40 -07:00
99b10adc01 Fix compile flags for MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10368

Differential Revision: D9240791

Pulled By: ezyang

fbshipit-source-id: 536b093b5c800cc1cf02cbbde9ae341e25d083d1
2018-08-09 09:39:58 -07:00
7d53c876dc Move maybeZeroDim to TH, change condition so it doesn't turn off scal… (#10333)
Summary:
…ars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10333

Differential Revision: D9206091

Pulled By: gchanan

fbshipit-source-id: 492c50189edc2056aa2acce98d49234d2a54ce39
2018-08-09 09:28:57 -07:00
e967fa9757 Fix THTensor_nElement for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10332

Differential Revision: D9206039

Pulled By: gchanan

fbshipit-source-id: 0bc7c15050a6a602f621d3e9ecc3a6ea35481a6a
2018-08-09 09:28:55 -07:00
52d85bedb7 Deal with undefined tensors in unbind backward (#9995)
Summary:
When only part of the outputs of unbind are used in a backward,
the gradients for the others are undefined. This sets those
to zero in to_tensor_list.

Fixes: #9977
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9995

Differential Revision: D9239610

Pulled By: soumith

fbshipit-source-id: eb8d1b3f2b4e615449f9d856e10b946910df9147
2018-08-09 08:54:28 -07:00
b70b7066f7 Keep kEps in one place to make sure they are consistent (#10334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10334

Keep kEps in one place to make sure they are consistent

Reviewed By: xianjiec

Differential Revision: D9202280

fbshipit-source-id: 35d173ce1d1a361b5b8cdbf1eac423e906e7c801
2018-08-09 08:27:42 -07:00
04f381650e Resubmit: Fix dataloader hang when it is not completely iterated (#10366)
Summary:
https://github.com/pytorch/pytorch/pull/9655
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10366

Differential Revision: D9237393

Pulled By: SsnL

fbshipit-source-id: fabfad7f371ba33300098f6b885c0e3f26c3e14a
2018-08-09 00:10:24 -07:00
037d8d1bab Order Loss functions alphabetically in nn.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10365

Differential Revision: D9237287

Pulled By: SsnL

fbshipit-source-id: 28e9de76b9cfd8f63c8df561ff1531ea8d0803ea
2018-08-08 22:39:55 -07:00
9dfc4edc68 Update NNPACK and cpuinfo submodules (#8564)
Summary:
Bring in extra optimizations in Winograd-based convolution on NEON
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8564

Reviewed By: hlu1

Differential Revision: D9088140

Pulled By: Maratyszcza

fbshipit-source-id: 2089191416db98bdad8f0e4848b1435fcf74a88b
2018-08-08 22:39:52 -07:00
6e49f933ad Check that result is on CPU for CPU unary ops kernels (#10358)
Summary:
Fixes: #10270
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10358

Differential Revision: D9233066

Pulled By: soumith

fbshipit-source-id: 39b7524fe55ddb899fb27e2c0ef504ce54dbad35
2018-08-08 21:11:53 -07:00
783f2c60b2 nomnigraph - Enhancements to subgraph matching APIs (#10218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10218

SubtreeMatchCriteria now supports:
- nonTerminal flag: if set, we only match the root of the subtree and do not care about the children. Example use case: matching an "input" node without caring how the input is produced.
Additional tests for this new logic are added to subgraph_matcher_test.cc.

Subgraph matching APIs for NNGraph is also added.

(Further enhancements to make the SubgraphMatching API construct a Subgraph object and provide more diagnostic information will come later.)

Reviewed By: bwasti

Differential Revision: D9156092

fbshipit-source-id: 3f28ac15d9edd474b3e0cd51fd7e6f973299d061
2018-08-08 14:56:23 -07:00
69760e2840 update torch.eig() doc (#10315)
Summary:
This fixes #9383

Update the torch.eig() doc; the complex part is written based on https://scc.ustc.edu.cn/zlsc/sugon/intel/mkl/mkl_manual/GUID-16EB5901-5644-4DA6-A332-A052309010C4.htm
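
A small example of the representation the updated doc describes (torch.eig was the current API at the time; it has since been deprecated in favor of torch.linalg.eig, and the exact row order of the conjugate pair depends on LAPACK):

```python
import torch

m = torch.tensor([[0., -1.],
                  [1.,  0.]])   # 90-degree rotation; eigenvalues are +/- i
vals, _ = torch.eig(m)
# vals has shape (n, 2): column 0 holds real parts, column 1 imaginary parts,
# so the conjugate pair shows up as rows like [0., 1.] and [0., -1.]
```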
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10315

Reviewed By: yf225

Differential Revision: D9200723

Pulled By: ailzhang

fbshipit-source-id: d2e186fd24defbc4fdea6c2cf3dc4f7e05e1d170
2018-08-08 06:43:41 -07:00
0d03219a42 Remove hack as integrated builds use FULL_CAFFE2 now (#10320)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10320

Reviewed By: jerryzh168

Differential Revision: D9198902

Pulled By: ezyang

fbshipit-source-id: 8af28d607735e5f4450c40127c1f8c262ea602ce
2018-08-07 21:40:07 -07:00
7d6d7bef6a Enable docker image build for PyTorch using specific python version (#10317)
Summary:
The current Dockerfile builds pytorch using the default python within miniconda, which happens to be Python 3.6.

This patch allows users to specify which python should be installed in the default miniconda environment used by the pytorch dockerfile. I have tested the build for python 2.7, 3.5, 3.6 and 3.7. Python 2.7 additionally required the typing and cython packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10317

Differential Revision: D9204401

Pulled By: ezyang

fbshipit-source-id: 11355cab3bf448bbe8369a2ed1de0d409c9a2d6e
2018-08-07 16:13:33 -07:00
66b3bae47c Add sizesLegacyNoScalars/stridesLegacyNoScalars analog of sizeLegacyN… (#10323)
Summary:
…oScalars,strideLegacyNoScalars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10323

Differential Revision: D9200567

Pulled By: gchanan

fbshipit-source-id: 5580d6f92eef0acb04132f1978436cc31cdf563a
2018-08-07 15:41:28 -07:00
b7bc327180 Remove new_Tensor and generated components
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10194

Differential Revision: D9160559

Pulled By: cpuhrsch

fbshipit-source-id: 133185b3d4258c154dc43f7572dbef6bfa6786f3
2018-08-07 15:09:38 -07:00
5390476297 Add tracing to custom op and simplify tracer overall (#10212)
Summary:
This PR adds tracing infrastructure for custom operators. It also simplifies the tracer overall, and changes the codegen to do more metaprogramming there instead of via C++ (which was necessary for the custom op tracing).

To give an example of the tracer/metaprogramming change, what used to look like this in `VariableType.cpp`:

```
jit::tracer::PreTraceInfo trace_info;
  if (jit::tracer::isTracing()) {
    trace_info = jit::tracer::preRecordTrace(jit::aten::index_select, "self", self, "dim", dim, "index", index);
  }
```

is now simply the inlined version of `preRecordTrace`, minus C++ metaprogramming:

```
torch::jit::Node* node = nullptr;
  if (jit::tracer::isTracing()) {
    auto& graph = jit::tracer::getTracingState()->graph;
    node = graph->create(jit::aten::index_select_out, /*outputs=*/0);
    jit::tracer::recordSourceLocation(node);
    jit::tracer::addInputs(node, "result", result);
    jit::tracer::addInputs(node, "self", self);
    jit::tracer::addInputs(node, "dim", dim);
    jit::tracer::addInputs(node, "index", index);
    graph->appendNode(node);
  }
```

zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10212

Differential Revision: D9199615

Pulled By: goldsborough

fbshipit-source-id: cd4b603c1dc01340ead407228e109c99bdba2cfc
2018-08-07 13:54:15 -07:00
5bb21493fd add fused dropout kernels (#9666)
Summary:
While waiting for dropout to be fully ported to ATen, here's a performance fix for the most common dropout case. Dropout is still a Python function; I just added an efficient path to it. I could not make the inplace variant work, because the generator always emits `return self` for inplace functions and I need to return both the output tensor and the mask, so inplace goes through the existing path. Even with the non-inplace version, since the mask is now a ByteTensor, memory use is only a little larger than for inplace dropout, due to the savings on the mask.
Once dropout is moved to ATen, these kernels can still be used for an efficient implementation.
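
A rough Python reference for the semantics described above (not the CUDA kernel itself; the function name is mine):

```python
import torch

def dropout_with_mask(x, p=0.5):
    # Reference semantics: sample a byte mask once, scale kept elements by
    # 1 / (1 - p), and return both output and mask so backward can reuse it.
    assert 0.0 <= p < 1.0
    mask = (torch.rand_like(x) > p).to(torch.uint8)
    out = x * mask.to(x.dtype) / (1.0 - p)
    return out, mask
```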
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9666

Reviewed By: SsnL

Differential Revision: D8948077

Pulled By: ezyang

fbshipit-source-id: 52990ef769471d957e464af635e5f9b4e519567a
2018-08-07 13:34:53 -07:00
74979495f0 Optional input lengths in CTC op (#10228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10228

Sometimes, for all items in the minibatch in test mode, the input length will be
equal to the max time steps. Making the lengths input optional avoids having to pass in an external tensor.

Differential Revision: D9174378

fbshipit-source-id: 22f7d5c311c855d9c3ac59f2a5e773279bd69974
2018-08-07 13:34:51 -07:00
9b1a65bec3 Extends type and shape tracing with device (#9796)
Summary:
This PR extends the existing type and shape metadata tracing and verification done in autograd with device information. This expansion of tracing is required for #8354, is likely useful in other scenarios, and is a healthy sanity check, just like type and shape tracing.

The precise changes are:

- TypeAndShape -> InputMetadata, now includes device()
- Creating InputMetadata is simplified to just require a tensor, and callers were updated to use this simpler invocation wherever possible
- The gradient accumulator of a variable is now reset when set_data() is called if either the type or device changes, and this reset now locks to avoid contention with acquiring the gradient accumulator
- Mismatched devices during backward() will throw a runtime error, just like mismatched type and shape
- (Bonus!) Two uninitialized pointers in THCReduce are now initialized (to nullptr) to prevent build warnings

fyi colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9796

Reviewed By: goldsborough

Differential Revision: D9119325

Pulled By: ezyang

fbshipit-source-id: 76d1861b8d4f74db0575ff1f3bd965e18f9463de
2018-08-07 12:25:17 -07:00
2993c42ee4 Squash some 'invalid escape sequence' warnings. (#10310)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10310

Differential Revision: D9196254

Pulled By: ezyang

fbshipit-source-id: 63bb8e52ac6970fe8e11a2d3c491ab58250dc467
2018-08-07 12:25:15 -07:00
db7a2b1f0d fix doc for as_tensor (#10309)
Summary:
- fixes #9914
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10309

Differential Revision: D9196427

Pulled By: weiyangfb

fbshipit-source-id: c9a01e42c2e9dbfe2bd94ad14651d9f578751de2
2018-08-07 11:24:45 -07:00
dcaafdd04b fix doc of sparse_coo_tensor (#10308)
Summary:
- fixes #9998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10308

Differential Revision: D9196423

Pulled By: weiyangfb

fbshipit-source-id: 23b4ed96e354ac9aa7c268aad105818a2c6d3bd8
2018-08-07 11:24:44 -07:00
20a549b101 Start using a newer version of rocRand that's PyTorch compatible.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10280

Differential Revision: D9196349

Pulled By: Jorghi12

fbshipit-source-id: 4147f2e6e3fdd641b026f3761d684437591405be
2018-08-07 11:09:59 -07:00
fe68879832 Fix dir(torch) for python 3.7 (#10271)
Summary:
fixes #10160.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10271

Differential Revision: D9188031

Pulled By: li-roy

fbshipit-source-id: a3620553a8ba2b7391acdf78dbe58afcdb6c5f7f
2018-08-07 09:57:51 -07:00
ad76fc8807 s/DISABLE_COPY_AND_ASSIGN/AT_DISABLE_COPY_AND_ASSIGN/ (#10275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10275

Remove forwarding declaration in caffe2/core/common.h

```
codemod -d caffe2 --extensions cc,cpp,cu,cuh,h \\bDISABLE_COPY_AND_ASSIGN AT_DISABLE_COPY_AND_ASSIGN
```

Reviewed By: mingzhe09088

Differential Revision: D9184809

fbshipit-source-id: 958cf5162b0d92b83ea9c2597abb77320ca57ce8
2018-08-07 08:54:26 -07:00
66f7b8abbe Better macro name hygiene prefixing. (#10274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10274

Good C++ libraries don't take up un-namespaced identifiers
like DISABLE_COPY_AND_ASSIGN.  Re-prefix this.

Follow up fix: codemod Caffe2 to use the new macro, delete
the forwarding definition

Reviewed By: mingzhe09088

Differential Revision: D9181939

fbshipit-source-id: 857d099de1c2c0c4d0c1768c1ab772d59e28977c
2018-08-07 08:54:24 -07:00
18e298305e Increase TCP listen queue size from 64 to 1024 (#10268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10268

Running torch.distributed.init_process_group fails with more than ~64 processes, with various errors like connection refused or connection reset by peer. After some digging, it looks like the root cause is that all workers have to connect to master via TCP (both in Zeus init and in DataChannelTCP - look for `connect()`), and the listening socket only has a backlog of 64.

I increased the backlog to 1024, which seems like enough for reasonable purposes (the hard limit is 65535 in /proc/sys/net/core/somaxconn). There's probably a more correct way to do this that involves retrying when a connection is refused.
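
For context, the backlog is just the second argument to listen(); a minimal sketch of the master-side socket setup (the port number here is hypothetical):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 29500))
s.listen(1024)  # with a backlog of 64, >64 near-simultaneous connects can be refused
```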

Reviewed By: soumith

Differential Revision: D9182216

fbshipit-source-id: 2f71c4995841db26c670cec344f1e3c7a80a7936
2018-08-07 08:26:06 -07:00
1a797ec810 Revert "clean up the build a bit. We no longer need the separate buil… (#10285)
Summary:
…d_libtorch entrypoint (#9836)"

This reverts commit 62e23a1ee47eb66056e6695cefef4e42599f8bd0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10285

Differential Revision: D9193107

Pulled By: ezyang

fbshipit-source-id: de96dce12fdf74410413ae18feee5caf0bed0025
2018-08-07 07:40:20 -07:00
b6402648f4 fix off-by-one bug in open-ended slicing (#10286)
Summary:
Previously, `tensor[i:]` was transformed to `tensor[i:-1]`. This incorrectly left off the last element. Noticed this when implementing slicing for list types.
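
A quick illustration of the off-by-one (plain PyTorch indexing shows the intended behavior):

```python
import torch

t = torch.arange(5)   # tensor([0, 1, 2, 3, 4])
print(t[2:])          # tensor([2, 3, 4])  -- what tensor[i:] should mean
print(t[2:-1])        # tensor([2, 3])     -- the buggy rewrite drops the last element
```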
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10286

Differential Revision: D9193292

Pulled By: michaelsuo

fbshipit-source-id: df372b815f9a3b8029830dd9e8769f9985a890e7
2018-08-07 00:39:42 -07:00
5a7c710548 Support some basic list operations (#10225)
Summary:
Support a few basic operators:
- eq
- add
- len
- select (indexing)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10225

Differential Revision: D9172338

Pulled By: michaelsuo

fbshipit-source-id: 6e75ec1453b9589b0fb4698598ecdba5a5fccff9
2018-08-07 00:39:40 -07:00
1bae6e24c9 Change empty list literal compiler error to match actual builtin name (#10265)
Summary:
I changed the name of this builtin to match Python's native style, but forgot to change the compiler error to match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10265

Differential Revision: D9192963

Pulled By: michaelsuo

fbshipit-source-id: 225ca4cd50fbbe3b31c369deeb3123a84342aab1
2018-08-07 00:39:39 -07:00
fa9ea5bde9 Move CoreAPI.h to Macros.h, to give it a more accurate name. (#10264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10264

Since we now have DISABLE_COPY_AND_ASSIGN macro in the file,
CoreAPI is no longer an accurate name.

Reviewed By: dzhulgakov

Differential Revision: D9181687

fbshipit-source-id: a9cc5556be9c43e6aaa22671f755010707caef67
2018-08-06 22:27:44 -07:00
da44cf6101 Move TensorTypeId, TensorTypeIdRegistration and flat_hash_map to ATen/core (#10263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10263

Auxiliary changes that were needed:
- Add DISABLE_COPY_AND_ASSIGN to CoreAPI.h (maybe we should rename this file
  now)

Reviewed By: dzhulgakov

Differential Revision: D9181321

fbshipit-source-id: 975687068285b5a94a57934817c960aeea2bbafa
2018-08-06 22:27:40 -07:00
f1cf3105de Revert D9169049: [pytorch][PR] Add new mkldnn fallback operators
Differential Revision:
D9169049

Original commit changeset: 3bc30250d734

fbshipit-source-id: 65a91594bda699ff9535b27dccd0d1e5d1a8036a
2018-08-06 20:39:30 -07:00
f47bec821e Add new mkldnn fallback operators (#10162)
Summary:
Add new ideep fallback operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10162

Reviewed By: yinghai

Differential Revision: D9169049

Pulled By: wesolwsk

fbshipit-source-id: 3bc30250d7340fea2c442f36d16b85241ceee6e7
2018-08-06 16:56:00 -07:00
25b2e88750 Stop propagating std flags to downstream gcc/nvcc (#10098)
Summary:
When we directly use -std=c++11, it propagates to the downstream applications.

Problems:
1. Gcc flags propagating to nvcc.
2. nvcc flags propagating to nvcc (which throws an error like "redeclaration of std flag").

This PR will fix these propagation issues!

Similar problem:
https://github.com/FloopCZ/tensorflow_cc/pull/92
https://github.com/CGAL/cgal/issues/2775

Requires: CMake 3.12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10098

Differential Revision: D9187110

Pulled By: ezyang

fbshipit-source-id: 0e00e6aa3119c77a5b3ea56992ef3bbfecd71d80
2018-08-06 15:30:27 -07:00
8b08eca203 Move ScalarType to ATen/core, splitting out Backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10262

Reviewed By: dzhulgakov

Differential Revision: D9157408

fbshipit-source-id: 11631a35dfc6cb1f73f61ea08d3115f8ef4cb034
2018-08-06 15:30:25 -07:00
a38b572de3 enable unit tests and other changes (#10266)
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the hipify-python script by introducing kernel scope to some transpilations, and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests as broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failure of the elementwise kernel by removing a non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266

Differential Revision: D9184178

Pulled By: ezyang

fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
2018-08-06 14:54:01 -07:00
e0d43572c1 Cleaner semantics for Reserve (#10261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10261

1. Reserve
Currently, Reserve allocates new memory while also preserving the old data in the tensor,
and Resize relies on this behavior at some call sites, e.g. https://github.com/pytorch/pytorch/blob/master/caffe2/operators/reservoir_sampling.cc#L103, where we should be using Extend.
We want to bring the semantics of Reserve more in line with std::vector, i.e. we want it to be
an optimization about memory allocation and remove the semantics of preserving the data. We'll remove the guarantee that data will be preserved after Reserve, and Extend will be the only API that preserves old data when we do in-place extension of memory. This also helps with the later refactoring to split Storage from Tensor.
Also, we'll only pass in the outer dimension to Reserve which means the later dimensions should be set before we call Reserve.
2. Extend/Shrink
Previously, Extend actually meant ExtendBy and Shrink meant ShrinkTo. I would like to add an ExtendTo for convenience, and change Shrink to ShrinkTo.
The old Extend function is still there; although it actually means ExtendBy, I think it still makes sense to keep it.
3. Usage Patterns

The expected usage pattern right now is:
```
t->Resize({0, 32, 32, 32});
t->template mutable_data<T>(); // set meta_
t->Reserve(100);
auto* t_data = t->template mutable_data<T>();
// feed data to tensor using t_data
for (int i = 0; i < 100; ++i) {
  t->Extend(1, 50, &context_);
  // you can continue to use t_data if you have reserved enough space
  // otherwise, you should call t->template mutable_data<T> again to
  // get the new data pointer since Extend will allocate new memory even
  // though the original data is preserved.
}
```

Reviewed By: ezyang

Differential Revision: D9128147

fbshipit-source-id: e765f6566d73deafe2abeef0b2cc0ebcbfebd096
2018-08-06 14:40:16 -07:00
a13a53c151 Optimize group_norm on cpu (#10246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10246

Optimize group_norm on cpu

Reviewed By: houseroad

Differential Revision: D9177878

fbshipit-source-id: 41f7aadc6336317c338c75daccef6cb98e9de9de
2018-08-06 14:26:09 -07:00
0c848f4179 Python integration for custom operators (#10149)
Summary:
Adds the Python path to custom operators, including dynamically loading operations into Python.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10149

Reviewed By: ezyang

Differential Revision: D9158380

Pulled By: goldsborough

fbshipit-source-id: 3edffa639e8d2959e9e80d1bd4f20ab4a1b3ca02
2018-08-06 13:54:48 -07:00
62e23a1ee4 clean up the build a bit. We no longer need the separate build_libtorch entrypoint (#9836)
Summary:
the new entrypoint is `./tools/build_pytorch_libs.sh caffe2`

this will also speed up CI builds a bit, since we will no longer be compiling all of libtorch twice
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9836

Differential Revision: D9182634

Pulled By: anderspapitto

fbshipit-source-id: 0b9a20ab04f5df2d5c4e7777e4dc468ab25b9ce2
2018-08-06 13:41:51 -07:00
d1a0c2eaf8 Add back THTensor_nDimension. (#10259)
Summary:
Turns out some people are using this via the C-API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10259

Differential Revision: D9180135

Pulled By: gchanan

fbshipit-source-id: 68f59beabf7f8093e67581d7e7ebfe8dff9e6b69
2018-08-06 11:09:41 -07:00
6ac35b35d1 Stop using THLongStorage for sizes/strides, remove THLongStorageView.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10219

Reviewed By: cpuhrsch

Differential Revision: D9159550

Pulled By: gchanan

fbshipit-source-id: 745a6d335613688ed41b32369ee4938907ce8cbb
2018-08-06 09:25:32 -07:00
835a5d4f49 Add cost inference of fwd sparse operators and sparse adagrad (#9314)
Summary:
We should also add cost inference for sparse operators in the backward pass later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9314

Reviewed By: orionr

Differential Revision: D8789240

Pulled By: jspark1105

fbshipit-source-id: 68c2170f294fe13bcc409276f599b5fa8a98bcd3
2018-08-06 08:39:16 -07:00
506142ac8a Add warning for building PyTorch using Python 2.7 on Windows (#10247)
Summary:
Fixes #9232.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10247

Differential Revision: D9178257

Pulled By: SsnL

fbshipit-source-id: cc553335a5a918b6d77fe1064460cb66114859ca
2018-08-05 21:24:02 -07:00
267c397c5b Add the ocr_det model for benchmarking (#10245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10245

as title

Reviewed By: sf-wind

Differential Revision: D9176654

fbshipit-source-id: 3339d2aa6a0ceb0e751745c06dcfd025ccbf5449
2018-08-05 16:45:35 -07:00
7f2e43a084 Add the ocr_rec model json (#10240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10240

as title

Reviewed By: sf-wind

Differential Revision: D9176522

fbshipit-source-id: 5b92c0b4ed24f96fe7b1321a3ab5ad26dcd3318d
2018-08-05 16:45:23 -07:00
df23bdc82d add BEGIN NOT-CLEAN-FILES marker to .gitignore. (#10233)
Summary:
Visual Studio Code and Visual Studio store their configurations in `FOLDER/.vscode` and `FOLDER/.vs`.
But "setup.py clean" deletes these folders because they are listed in the `.gitignore` file.

To prevent this, add a "BEGIN NOT-CLEAN-FILES" marker to the `.gitignore` file; "setup.py clean" now ignores lines after this marker.

Discussed in #10206
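
A hedged sketch of how the clean step can honor the marker (the parsing details are my guess, not the exact setup.py code):

```python
def gitignore_clean_entries(path=".gitignore"):
    # Collect patterns eligible for deletion, stopping at the marker so that
    # IDE folders listed after it survive "setup.py clean".
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if "BEGIN NOT-CLEAN-FILES" in line:
                break
            if line and not line.startswith("#"):
                entries.append(line)
    return entries
```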
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10233

Differential Revision: D9175515

Pulled By: ezyang

fbshipit-source-id: 24074a7e6e505a3d51382dc5ade5c65c97deda37
2018-08-05 15:55:44 -07:00
f57e4ce1d5 Update broadcast with alpha to reduce num of launching kernels. (#10235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10235

Update broadcast with alpha to reduce num of launching kernels.

Reviewed By: houseroad

Differential Revision: D9175824

fbshipit-source-id: 7a463833350a2c84dcfb82f73cf40da403dd59a0
2018-08-04 19:54:20 -07:00
ab293924bb support generic feature in DPER2 (#10197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10197

Support generic feature in DPER2

For now, since we only have one generic type (type 1), we directly add the parsed feature record to the embedding feature.

For new feature types with specific structure, corresponding code changes are expected.

Reviewed By: itomatik

Differential Revision: D8788177

fbshipit-source-id: 9aaa6f35ece382acb4072ec5e57061bb0727f184
2018-08-04 15:25:13 -07:00
57d2d4bcff Optimize reduce ops for 2d and 3d (#9992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9992

Optimize reduce ops for 2d and 3d

Reviewed By: houseroad

Differential Revision: D9042505

fbshipit-source-id: 62af2125aa6439106293e59bdf6a2b920792fd2d
2018-08-04 13:53:58 -07:00
29406a2c4c Fix shared_ptr refcycle in graph executor (#10222)
Summary:
Fixes #10032

When capturing an output, GraphExecutorAutogradFunction creates
SavedVariable with is_output=False and owns it:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/graph_executor.cpp#L87

Constructing SavedVariable with is_output=False makes it own a copy of
the shared_ptr<GraphExecutorAutogradFunction>, which causes a reference
cycle:
6456b944fd/torch/csrc/autograd/saved_variable.cpp (L27)

The solution in this PR is to construct the SavedVariable with
is_output=True if the captured value is an output.

Test Plan

Turn on cuda memory checking for JitTestCase. If the test's name
includes "cuda" or "gpu" in it, the cuda memory checking test happens.

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10222

Reviewed By: ezyang

Differential Revision: D9162995

Pulled By: zou3519

fbshipit-source-id: aeace85a09160c7a7e79cf35f6ac61eac87cbf66
2018-08-04 11:39:10 -07:00
2141cb7d53 Update OnnxifiOp to reflect onnx/onnx#1256
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10230

Reviewed By: yinghai

Differential Revision: D9174527

Pulled By: Maratyszcza

fbshipit-source-id: 753493e67446b528d65b146e89ea9f874b469ead
2018-08-04 08:09:19 -07:00
5df8547ff9 Fix ONNX LogSoftmax export. (#9576)
Summary:
This fixes an issue with incorrect `axis=-1` in the exported ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9576

Reviewed By: yinghai

Differential Revision: D9125463

Pulled By: houseroad

fbshipit-source-id: 6f4cb1067d1aa6bb0a9f56690fc21816c98eebfa
2018-08-03 22:09:42 -07:00
36939417b2 Introduce at::DeviceType, which subsumes at::Device::Type and (partially) caffe2::DeviceType (#10175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10175

Previously, we had at::Device::Type and caffe2::DeviceType (from protobuf),
intended to help us distinguish between CPU, CUDA, etc. devices.

This replaces at::Device::Type entirely with at::DeviceType, which in turn
is a direct, 'enum class' version of the protobuf generated caffe2::DeviceType
'enum'.  We can't eliminate the 'enum' because this would a pretty drastic
API change (enum is interconvertible with integers, enum class is not) but
we can make the two line up exactly and share code for, e.g., printing.

Reviewed By: Yangqing

Differential Revision: D9137156

fbshipit-source-id: 566385cd6efb1ed722b25e6f7849a910b50342ab
2018-08-03 19:25:06 -07:00
98d60ad43d Replace caffe2::EnforceNotMet with at::Error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10184

Reviewed By: dzhulgakov

Differential Revision: D9140095

fbshipit-source-id: 3beead825609cec5054347e59903b0b78ef150f8
2018-08-03 19:25:05 -07:00
e2976ea519 Make at::Error look more like caffe2::EnforceNotMet (#10183)
Summary:
- New concept of a message stack; you can add messages
  using AppendMessage
- New concept of a caller; it's just a way to pass along
  some arbitrary extra information in the exception

Coming soon is changing Caffe2 to use at::Error instead of
EnforceNotMet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10183

Differential Revision: D9139996

Pulled By: ezyang

fbshipit-source-id: 6979c289ec59bc3566a23d6619bafba2c1920de9
2018-08-03 19:25:03 -07:00
c7c6e93312 Use target_compile_definitions for AT_CORE_STATIC_WINDOWS (#10213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10213

nvcc only respects definitions, not options.

Reviewed By: dzhulgakov

Differential Revision: D9154388

fbshipit-source-id: 04c4809154df1c61108b65f1115fccdeb336952e
2018-08-03 19:25:02 -07:00
02a64b183c Move ATenGeneral back out of core. (#10224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10224

It doesn't work with Caffe2; use AT_CORE_API from ATen/core/CoreAPI.h
instead.

Reviewed By: smessmer

Differential Revision: D9162467

fbshipit-source-id: 3c7d83c1ccb722ebac469296bdd7c3982ff461e5
2018-08-03 19:25:01 -07:00
41dce17e22 Delete TensorImpl::type_, replace with backend_/scalar_type_/is_variable_ (#10210)
Summary:
The basic game plan is to stop accessing the type_ field directly,
and instead using the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I fixed this by
  reordering the static initializers. Fixes #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10210

Differential Revision: D9150697

Pulled By: ezyang

fbshipit-source-id: 89e2006c88688bcfab0dcee82dc369127c198c35
2018-08-03 18:25:19 -07:00
149d4f776b use logsigmoid at multilabel_soft_margin_loss, and change output from shape=(N, C)to (N,) (#9965)
Summary:
- fixes #9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return (N) instead of (N, C) to match the same behavior as MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')

loss.sum() == loss_sum  # True
loss.mean() == loss_mean  # True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965

Differential Revision: D9038402

Pulled By: weiyangfb

fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
2018-08-03 17:54:19 -07:00
7bc87172ea Kill Tensor::shares_data (#10217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10217

It's only used in debug printing and is not that reliable anyway. If we want to implement it later, we should do it with proper accounting for shared storages.

Reviewed By: jerryzh168

Differential Revision: D9155685

fbshipit-source-id: 48320d41a0c4155645f3ba622ef88730a4567895
2018-08-03 17:40:39 -07:00
3b3aff2ed6 IsType<TensorCPU> -> IsType<Tensor>(CPU) (#10135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10135

att

Reviewed By: yinghai

Differential Revision: D9121892

fbshipit-source-id: 4a4a3bfc450896b619bf92c92ef218aaaefc3081
2018-08-03 17:24:59 -07:00
4aa7469d1f Implement c10 ops needed for benchmark (#9360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9360

This implements a first set of c10 operators, namely the ones needed for the multithread predictor benchmark.

All implementations are CPU-only and experimental. They're not meant to be used in production.

They can be used, however, to test calling simple c10 MLPs from Caffe2 or PyTorch when working on these integration paths.

Reviewed By: dzhulgakov

Differential Revision: D8811698

fbshipit-source-id: 826789c38b2bfdb125a5c0d03c5aebf627785482
2018-08-03 16:09:27 -07:00
08e7af20d3 Implement calling of c10 ops from c2 (#9369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9369

This adds the capability for caffe2 to call c10 operators and adds a dummy c10 sigmoid op as a proof of concept.

I used this test script to make sure it works:

    from caffe2.python import workspace, model_helper
    import numpy as np

    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    m = model_helper.ModelHelper(name="my net")
    sigmoid1 = m.net.C10Sigmoid_DontUseThisOpYet("data1", "sigmoid1")
    sigmoid2 = m.net.Sigmoid("data1", "sigmoid2")

    workspace.RunNetOnce(m.param_init_net)
    workspace.CreateNet(m.net)
    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    workspace.RunNet(m.name, 1)

    print(workspace.FetchBlob("data1"))
    print(workspace.FetchBlob("sigmoid1"))
    print(workspace.FetchBlob("sigmoid2"))

(and check that both sigmoid outputs are the same)

Reviewed By: ezyang

Differential Revision: D8814669

fbshipit-source-id: eeb0e7a854727f1617a3c592a662a7e5ae226f40
2018-08-03 16:09:23 -07:00
c5abe8844a Add IDEEP fallbacks for Resnet50 training ops (#8541)
Summary:
1. Add fallback gradient ops
2. In fallback ops, set the output Tensor as a CPUTensor instead of an IDEEPTensor if ndim = 0, because IDEEPTensor doesn't support 0-dim tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8541

Reviewed By: yinghai

Differential Revision: D9115233

Pulled By: wesolwsk

fbshipit-source-id: 163e6a76f02bd781c95d1060ccbacf2cab90055e
2018-08-03 15:54:17 -07:00
4680ab4d44 Generalize intrusive_ptr comment (#10216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10216

-

Reviewed By: ezyang

Differential Revision: D9155601

fbshipit-source-id: 154de2e6ad747134413a3ab3ae0b7507b8284d49
2018-08-03 14:25:28 -07:00
97cbcb7d67 Allow releasing/retaining weak_intrusive_ptr (#10214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10214

Seems we're passing weak pointers over C API boundaries. Need this API there too.

Reviewed By: ezyang

Differential Revision: D9154505

fbshipit-source-id: c9889689b87dad5d918f93ba231e01704b8d2479
2018-08-03 14:25:24 -07:00
6456b944fd ctc_loss odds and ends (#10112)
Summary:
- Add convenience wrapper to pass tensors as input_lengths, target_lengths
- Fix documentation example
- Check BLANK >= 0

Thank you, Simon and Soumith for the suggestions!
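
An example of the first bullet, passing the lengths as tensors (shapes chosen purely for illustration):

```python
import torch
import torch.nn.functional as F

T, N, C, S = 50, 16, 20, 25
log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)            # blank = 0
input_lengths = torch.full((N,), T, dtype=torch.long)              # tensors, not tuples
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
```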
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10112

Differential Revision: D9130737

Pulled By: SsnL

fbshipit-source-id: f9a0022a969788bda3db9f360e2564b519ebf2e6
2018-08-03 13:25:18 -07:00
65d32b1705 Remove unused substitutions (#10187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10187

These substitutions don't actually occur in the target file. Remove them.

Reviewed By: ezyang

Differential Revision: D9141567

fbshipit-source-id: fcfddee0b4d31e21763b39d852577d2dbb9ce843
2018-08-03 12:25:59 -07:00
f51f15bb27 Update include paths for ATen/core (#10130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10130

Update some include paths to make them internally consistent

Reviewed By: ezyang

Differential Revision: D9119906

fbshipit-source-id: b44e5cab8e8e795ee18afe9ffc6caf1f2b413467
2018-08-03 11:57:02 -07:00
f77b62c3e1 Add documentation for margin arg in Caffe2 MarginRankingCriterionOp (#10186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10186

The MarginRankingCriterionOp margin argument was undocumented.

Reviewed By: jerryzh168

Differential Revision: D9141228

fbshipit-source-id: 724d45dc8e555fbe9d3e8afc7b6bf8ed17bbbdb1
2018-08-03 11:45:51 -07:00
cb0e72e00d Add registerOperator overloads that infer the schema (#10048)
Summary:
This PR adds a way to infer the JIT/script schema of a function from its signature, and then create an operator from the schema and implementation. The implementation function is wrapped into another function, which pops values from the stack into an argument tuple, then invokes the function and pushes the return value back onto the stack, sometimes unpacking the return value if it is a tuple.

Currently the method is called `createOperator`. We may want to think of a nicer way of registering ops in tandem with `RegisterOperators`. It might be very cumbersome to add a template constructor to `Operator`, so maybe we can come up with a chaining method on `RegisterOperators` like `RegisterOperators(schema, func).op(schema, func).op(schema, func)` -- it has to work at startup time (for a static variable) though. We can solve this in another PR.

zdevito apaszke smessmer dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10048

Differential Revision: D9125975

Pulled By: goldsborough

fbshipit-source-id: de9e59888757573284a43787ae5d94384bfe8f9a
2018-08-03 11:45:49 -07:00
7a377b9a53 Add torch.argsort mirroring similar functionality in numpy. (#9600)
Summary:
Per issue #9542
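
Usage mirrors numpy.argsort:

```python
import torch

x = torch.tensor([3., 1., 2.])
idx = torch.argsort(x)   # indices that would sort x, like numpy.argsort
print(idx)               # tensor([1, 2, 0])
print(x[idx])            # tensor([1., 2., 3.])
```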
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9600

Differential Revision: D8952338

Pulled By: resistor

fbshipit-source-id: c3f69d62858ad9458ec5ae563e3ff24b1c9283a7
2018-08-03 11:45:47 -07:00
c91af1202a Make release_resources non-const (#10192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10192

- release_resources() method must be non-const because it modifies the object
- for intrusive_ptr<const MyClass>, this needs to be const_cast :(

Reviewed By: ezyang

Differential Revision: D9143808

fbshipit-source-id: 9203ff7a7ff3bec165931279371c6e75d4f0ca8c
2018-08-03 11:24:45 -07:00
39476d79a2 Allow releasing/reclaiming intrusive_ptr (#10133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10133

This is useful for C APIs where we want to give owning pointers to/from other languages.

Reviewed By: ezyang

Differential Revision: D9121493

fbshipit-source-id: f903f5830f587b2ba69c0636ddcf1a066bbac2e0
2018-08-03 11:24:43 -07:00
5753746d29 Enable static initializer order ASAN. (#10211)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10211

Differential Revision: D9150687

Pulled By: ezyang

fbshipit-source-id: 4cd458d19a34788c8897905a87d1b52229f67f90
2018-08-03 11:24:42 -07:00
4a6fbf03c6 Make StorageImpl member variables largely private and use getters and setters
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10074

Differential Revision: D9086887

Pulled By: cpuhrsch

fbshipit-source-id: d2dd0d6a1b71d0f864aefb64cd1daefd11dcfb91
2018-08-03 11:10:02 -07:00
50cf326158 Allow type cast between int and float in Script (#10168)
Summary:
The PR allows int→float and float→int casts. Currently we only allow `tensor→int` and `tensor→float` casts.
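
A small example of the newly allowed casts (written with present-day TorchScript syntax):

```python
import torch

@torch.jit.script
def scale_and_floor(n: int) -> int:
    x = float(n) * 1.5   # int -> float cast
    return int(x)        # float -> int cast (truncates toward zero)
```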
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10168

Differential Revision: D9141163

Pulled By: wanchaol

fbshipit-source-id: 5e5591a98b4985a675641dfc9a385b2a0bf8e208
2018-08-03 10:56:05 -07:00
5d3782b655 Fix IDEEP Copys (#10104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10104

.

Reviewed By: yinghai

Differential Revision: D9109638

fbshipit-source-id: 319cc5711132314dfba0f09ac403522f21ad532b
2018-08-03 10:31:32 -07:00
656bb320b7 EnforceFinite test (#10143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10143

att

Reviewed By: xianjiec

Differential Revision: D9122444

fbshipit-source-id: 010abcc1eb64f084c00890e8de5f5d422b4b8d02
2018-08-03 10:31:29 -07:00
13de6e8dfa Make list literals construct ListType (#10193)
Summary:
Previously, `foo = [bar, baz]` would construct a TupleType of fixed arity. This would cause code like:
```
foo = [2]
if True:
    foo = [2, 2]
```
to fail to compile, since `(int)` is not the same as `(int, int)`.

This PR changes things so that list literals construct ListTypes, which can be resized.

Potentially breaking changes introduced:
- Empty list literals are now disallowed; `_constructEmptyFooList()` builtins are required to replace them.
- Iterable variable unpacking where the rhs is a list is now disallowed. (Tuples still work)
- Lists must have a single type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10193

Differential Revision: D9147166

Pulled By: michaelsuo

fbshipit-source-id: bbd1b97b0b6b7cb0e6f9d6aefa1ee9c731e63039
2018-08-03 00:55:23 -07:00
ab0ac6391b fix padding doc not rendered correctly (#10196)
Summary:
somehow sphinx doesn't like the previous wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10196

Differential Revision: D9146817

Pulled By: SsnL

fbshipit-source-id: 2140859bc363af556a021658def946d7afbdb245
2018-08-02 23:26:45 -07:00
4778afb8bb In Expand support using -1 to indicate preserving original size (#10174)
Summary:
zrphercule

https://pytorch.org/docs/stable/tensors.html#torch.Tensor.expand
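
For example:

```python
import torch

t = torch.randn(3, 1)
print(t.expand(-1, 4).shape)   # torch.Size([3, 4]); -1 keeps dim 0 at its original size
```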
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10174

Differential Revision: D9136467

Pulled By: bddppq

fbshipit-source-id: 825c489899097acda8d43706964d78a104cdf583
2018-08-02 22:09:47 -07:00
dd527db711 Skip TestConvolution.test_convolution_sync on ROCM which caused random segfaults (#10179)
Summary:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/4701/console

petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10179

Differential Revision: D9139657

Pulled By: bddppq

fbshipit-source-id: 9b1bb2ad185ed16fff696ce026a5ee5fcf9cbaee
2018-08-02 21:09:27 -07:00
1f78e06f63 Add g.insertConstant and clean up dead attributes code (#10177)
Summary:
* Changes `insertConstant(g, val)` to `g.insertConstant(val)`.
* Moves SourceRange to its own file to enable it.
* Cleans up dead attribute code in schema matching and graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10177

Differential Revision: D9137789

Pulled By: zdevito

fbshipit-source-id: 8a73cfb01a576f02e7e4dce019be9c0a0002989d
2018-08-02 20:45:31 -07:00
798b530361 weak_intrusive_ptr (#10038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10038

Add weak_ptr ability to intrusive_ptr.

Reviewed By: ezyang

Differential Revision: D9039980

fbshipit-source-id: dd504d6e0d7acf5914cd45845355e28f9df201fb
2018-08-02 17:25:14 -07:00
2bd709a7c8 intrusive_ptr (#9897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9897

Add an IntrusivePtr class to do intrusive refcounting with a shared_ptr-like interface.

Reviewed By: ezyang

Differential Revision: D9018619

fbshipit-source-id: 5de8706aab8eea2e30bead0f59bd6a7ca4d20011
2018-08-02 17:25:12 -07:00
0e9c6898cb Export modules in ir with google protobuf
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9746

Differential Revision: D9110006

Pulled By: li-roy

fbshipit-source-id: 8b9744c042f822fdfe959a7a7fef3d0baff4f639
2018-08-02 15:54:51 -07:00
e2ecf3914a Change default CUDA block size from 512 to 128 (#10090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10090

Decreasing the block size improves GPU utilization for use cases with small input sizes (e.g. 10000 elements: 512-thread blocks yield only ceil(10000/512) = 20 blocks, while 128-thread blocks yield 79, keeping more SMs busy)

Reviewed By: pjh5

Differential Revision: D9093573

fbshipit-source-id: c8f995b773a00b1bea3a3809c0f6557133efd9dd
2018-08-02 15:40:13 -07:00
7dc870bd7b Delete invalid 'template' keyword (#10173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10173

With D9024330, the `Extend` function is no longer a template, which makes
the `template` keyword here invalid. For some reason the current version of LLVM
doesn't catch this, but the latest one does.

Reviewed By: jerryzh168

Differential Revision: D9133462

fbshipit-source-id: 54ac9aad01f81b9b4e7b6e2864b8961478d2d860
2018-08-02 14:50:11 -07:00
dad6e8bb6c Remove capture specifiers in register_aten_ops when they're not needed. (#9669)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9669

Differential Revision: D8952335

Pulled By: resistor

fbshipit-source-id: 8fbbec7a7f55fbeeda3509cb3d339e1db90a53e6
2018-08-02 13:40:31 -07:00
94c67f1454 Replace storageimpl type with scalar_type and backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10097

Differential Revision: D9124287

Pulled By: cpuhrsch

fbshipit-source-id: c976abeeaaa085b972812c1a3270eb6aef0c0dca
2018-08-02 13:31:30 -07:00
538b15d13c Use PYTORCH_PYTHON to call generate_code.py (#10171)
Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/8373#issuecomment-409994847
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10171

Differential Revision: D9135607

Pulled By: SsnL

fbshipit-source-id: 72f535875658c857621e41fd25c2174052714557
2018-08-02 12:54:14 -07:00
9e85a7a9de Back out "[pytorch][PR] [TENSOR MERGE] Delete type_ field from TensorImpl, replaced with backend_/scalar_typ…" (#10169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10169

Original commit changeset: 2b4d867abfdc

Reviewed By: pjh5, SsnL

Differential Revision: D9135216

fbshipit-source-id: d5c9f12c3a0f75df224c781e1cd1e323cdfbb0d5
2018-08-02 12:39:01 -07:00
7be071a829 Update onnx to onnx/onnx@2a3a226 (#10167)
Summary:
2a3a226a96
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10167

Reviewed By: houseroad

Differential Revision: D9134738

Pulled By: bddppq

fbshipit-source-id: 9d3fd3c04a584d5626146f174ac78cabfa0e5934
2018-08-02 12:25:19 -07:00
6e85112f12 Adding katex rendering of equations, and required edits to equations. (#8848)
Summary:
This fixes issue #8529.

- Adds Katex extension to conf.py and requirements.txt
- Fixes syntax differences in docs
- Should allow documentation pages to render faster
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8848

Reviewed By: soumith

Differential Revision: D8677702

Pulled By: goodlux

fbshipit-source-id: c4a832c5879e0eebcb14763b35a41663331ba23f
2018-08-02 12:25:17 -07:00
ee98533746 Fix compiler warnings on ignored const qualifiers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10142

Reviewed By: yinghai

Differential Revision: D9125502

Pulled By: bddppq

fbshipit-source-id: 8043b2a05507a4707220fa820ab6cc486760a93e
2018-08-02 12:10:37 -07:00
5765549155 codemod -d caffe2 --extensions cc,h CaffeTypeId TypeIdentifier (#10166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10166

TypeIdentifier is still easy to codemod away from

Reviewed By: smessmer

Differential Revision: D9132840

fbshipit-source-id: bc83a8b17b2e7c19c9d2c9cfe5c7ce6ec1d8cec5
2018-08-02 11:54:30 -07:00
4a2f3cc45f Improve lars operator by applying clipping (#9905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905

This diff improves lars operator in Caffe2 by applying clipping to the computed learning rate

Reviewed By: pjh5

Differential Revision: D9020606

fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
2018-08-02 11:54:28 -07:00
a243e517fa Guard sizes/strides in TH/THC for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10145

Differential Revision: D9125791

Pulled By: gchanan

fbshipit-source-id: d0b8c88c49d7af85971a4531a63fd85a97bfbec7
2018-08-02 11:24:36 -07:00
170d29769b Strings lexing, parsing, implementation in print (#9324)
Summary:
This PR adds strings to the AST and implements them for print statements. Strings are lifted as attributes onto the print node. They must be arguments to print itself, not arguments to an object that is passed to print. If they are encountered elsewhere, an NYI exception will be thrown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324

Reviewed By: jramseyer

Differential Revision: D8807128

Pulled By: eellison

fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078
2018-08-02 11:09:03 -07:00
230ca98d4b Remove THTensor_isSize. (#10146)
Summary:
This is part of the process of removing THLongStorage to represent sizes/strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10146

Differential Revision: D9126611

Pulled By: gchanan

fbshipit-source-id: b0d995a4c51dfd54bf76dcfee9a69f37f9d01652
2018-08-02 10:39:43 -07:00
9c818bfbc7 Refactor PythonValue types + use tryMatchSchema for PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10132

Differential Revision: D9121327

Pulled By: jamesr66a

fbshipit-source-id: 6d8bcf6b0dca54106cf9ed740bcff857062a03da
2018-08-02 10:26:58 -07:00
cfa05706ef ROCm contributions week 29 (#9653)
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the build to avoid out of memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653

Differential Revision: D9117791

Pulled By: ezyang

fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
2018-08-02 09:09:00 -07:00
70d47f92db Add support for rand_like op in fusion compiler (#9795)
Summary:
Enabled support for generating random numbers in fusion compiler. Currently a philox RNG implemented by Tensorflow is used, as the NVRTC couldn't resolve the curand.h header correctly. The two implementation should have exact same behavior according to our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9795

Differential Revision: D8999029

Pulled By: SsnL

fbshipit-source-id: f0d2616a699a942e2f370bdb02ac77b9c463d7b8
2018-08-02 08:55:25 -07:00
4a5cd4f6ab nomnigraph - new utility for graph transformation (#10081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10081

Add a new utility that makes it easier to write graph transformations. Callers now only need to take care of the actual transformation logic. The subgraph matching is simplified because callers only need to specify a simple construct for the subtree matching criteria.

The utlity is SubgraphMatcher::replaceSubtree

Some notes:
- replaceSubtree takes a subtree matching criteria, and a lambda that takes a subtree root. It does't not handle any transformations itself. Callers should be responsible for the transformation part, including deleting all nodes in the matched subtree(s). We could enhance this to also handle the deletion part if it turns out to be useful.
- Only sub tree matching is supported for now but we can add general DAG sub-graph support later if needed.

Reviewed By: bwasti

Differential Revision: D9073297

fbshipit-source-id: 465a0ad11caafde01196fbb2eda2d4d8e550c3b6
2018-08-01 23:09:41 -07:00
acbc2744d8 fix bug in 3d group convolution (#9860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9860

For 3D group convolution, in the case of cuDNN 7 and NCHWD order, the filter dim is (M, C/group_, k_h, k_w, k_d).

According to the cuDNN documentation (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#grouped-convolutions), the existing implementation is incorrect, and will crash 3D video model training with group convolution.

In the implementation, `filter.dims(1)` is already `C/group_`, so we don't need to divide it by `group_` again.
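As a hedged illustration of the shape convention the fix relies on, here is the analogous check in PyTorch's Python API (not Caffe2 code; sizes are arbitrary):

```python
import torch

# in_channels=6 with groups=2 gives weight.shape[1] == 6 // 2 == 3: the channel
# dimension is already divided by `groups`, so dividing again would be wrong.
conv = torch.nn.Conv3d(in_channels=6, out_channels=4, kernel_size=3, groups=2)
print(conv.weight.shape)  # torch.Size([4, 3, 3, 3, 3])
```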

Reviewed By: BIT-silence

Differential Revision: D9008807

fbshipit-source-id: 2f0d6eb47f4e16d7417a7e3baeba709e3254154f
2018-08-01 22:55:38 -07:00
57061d600a Auto-batching IR transformation for control flow (#9392)
Summary:
Implement IR transformation for control flow

- `prim::Constant`: clone to new graph directly
- `prim::NumToTensor`: create a `BatchTensor` from output tensor with `batch_size = 1`
- `prim::TensorToNum`: clone to new graph
- `prim::ListConstruct`: clone to new graph
- `prim::If`: execute both `if_block` and `else_block` and combine results from them using `cond`
- `prim::Loop`:
  - for loop
  - while loop: change while `cond` to `cond_any`, use `cond` to update outputs

test case: hand-written LSTM, greedy search, beam search
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9392

Differential Revision: D8822369

Pulled By: ChunliF

fbshipit-source-id: 8f03c95757d32e8c4580eeab3974fd1bc429a1e5
2018-08-01 22:24:35 -07:00
8a25acbba5 Use angle brackets instead of quotes for includes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10153

Reviewed By: smessmer

Differential Revision: D9123768

fbshipit-source-id: 0970552ba4d5772fb3cef2db3af3181d98f85140
2018-08-01 22:02:51 -07:00
5699250acc Move IdWrapper to ATen/core (#10152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10152

- Moved from namespace c10::guts to at
- I fixed the use sites, since there were only three of them
- Macro renamed from C10_ to AT_

Reviewed By: smessmer

Differential Revision: D9123652

fbshipit-source-id: bef3c0ace046ebadb82ad00ab73371f026749085
2018-08-01 22:02:50 -07:00
8cc7d33656 Renumber typeid.h so that the number lines up with ScalarType (#10139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10139

We want CaffeTypeId to be interconvertible with at::ScalarType, and
this means we should have the numbers line up exactly.  Fortunately
this is not too hard to do.

Reviewed By: smessmer

Differential Revision: D9123058

fbshipit-source-id: 7e9bd59ca25a552afe9d2d0a16cedc4f6311f911
2018-08-01 22:02:46 -07:00
6b338c8026 Implement torch.broadcast_tensors (#10075)
Summary:
This exposes expand_outplace to python. Fixes #8076. Fixes #10041.

I didn't name it torch.broadcast because numpy.broadcast does something
slightly different (it returns an object with the correct shape
information).
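A minimal usage sketch (values arbitrary):

```python
import torch

a = torch.ones(3, 1)
b = torch.zeros(1, 4)
# both outputs are expanded to the common broadcast shape
x, y = torch.broadcast_tensors(a, b)
print(x.shape, y.shape)  # torch.Size([3, 4]) torch.Size([3, 4])
```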
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10075

Differential Revision: D9125816

Pulled By: zou3519

fbshipit-source-id: ebe17c8bb54a73ec84b8f76ce14aff3e9c56f4d1
2018-08-01 19:18:34 -07:00
191482fa39 Distinguish TupleLiteral from ListLiteral (#10128)
Summary:
Previously, the parser was emitting list literals for tuples, but the IR was representing list literals internally with TupleTypes.

For implementing most list operations, I think it will be helpful to distinguish between lists (dynamic size, homogeneous types) and tuples (fixed arity, heterogeneous types)

This diff modifies the parser logic to emit tuple literals. This frees us to represent lists as ListType in the IR, while still properly mapping tuple literals to TupleTypes.

A following diff will actually switch over list literals to emit ListTypes.
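A small sketch of the distinction as it surfaces in script code (written against the modern TorchScript syntax, not this PR's tests):

```python
import torch

@torch.jit.script
def fn():
    xs = [1, 2, 3]         # list literal: dynamic size, homogeneous -> List[int]
    pair = (len(xs), 2.5)  # tuple literal: fixed arity, heterogeneous -> Tuple[int, float]
    return pair
```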
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10128

Differential Revision: D9121305

Pulled By: michaelsuo

fbshipit-source-id: e0cad07ae8bac680f7f8113d10e5129d5a1a511d
2018-08-01 19:18:31 -07:00
a44d9d6eb4 Fix tensor check logic in logging (#10138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10138

Note that `TensorCPU` and `TensorGPU` are both refined to be `Tensor` now. Basically they are the same thing. So a check like `blob.IsType<TensorCPU>()` is no longer safe, as `TensorGPU` can pass the check too.

We need to systematically weed out such usage in our codebase... jerryzh

Reviewed By: houseroad

Differential Revision: D9115273

fbshipit-source-id: 13b293c73691002eac34e095cdcd96c27183e875
2018-08-01 18:09:19 -07:00
24bb8cecbe Move ATen/Half to ATen/core, and apply lint (#10137)
Summary:
This rewrites checked_convert to use stringstreams, eliminating the use of to_string, which is not available in Android's libstdc++.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10137

Reviewed By: smessmer

Differential Revision: D9122340

fbshipit-source-id: b7c1bff70e36217305f2b3333c51543ef8ff3d9c
2018-08-01 17:54:58 -07:00
806854a3c5 Pin AMD gpu id in Caffe2 CI (#10144)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10144

Differential Revision: D9125707

Pulled By: bddppq

fbshipit-source-id: 8ef8f3da6ceb1855f28fc24be621b9b4854ff7f9
2018-08-01 17:39:21 -07:00
59c355c870 Move halfbits2float and float2halfbits conversions to ATen. (#10134)
Summary:
This will be needed soon because I want to move Half.h into
ATen/core, and then I cannot have a TH dependency.

I also took the liberty of making the code more strict-aliasing
safe (this is not actually useful, since we will never build Torch
with strict aliasing) by replacing pointer casts between
float and unsigned with a memcpy instead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10134

Differential Revision: D9121920

Pulled By: ezyang

fbshipit-source-id: 3b1f86a7c5880e8ac1a589a51f0635bb72e1fd40
2018-08-01 17:09:12 -07:00
4ed5b9267c #8518 Support for empty tuples (#10027)
Summary:
Fixing #8518

Sorry for the pile of commits; I forgot to rebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10027

Reviewed By: ezyang

Differential Revision: D9070028

Pulled By: jramseyer

fbshipit-source-id: 49729c9755ab8a586711e9f6d6a574f3035a7e75
2018-08-01 16:10:00 -07:00
1f6888b70a Allow mobile exporter to export string arrays (#10017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10017

Allow mobile exporter to export string arrays

Reviewed By: pjh5

Differential Revision: D9061213

fbshipit-source-id: b6c5257eb2f0f964dba255b97dc5d32af8ce15a7
2018-08-01 16:09:58 -07:00
1d427fd6f6 Delete type_ field from TensorImpl, replaced with backend_/scalar_type_/is_variable_ (#9787)
Summary:

The basic game plan is to stop accessing the type_ field directly,
and instead using the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath to inline.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I didn't fix
  it in this PR; it's tracked in #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9787

Reviewed By: cpuhrsch

Differential Revision: D8980971

Pulled By: ezyang

fbshipit-source-id: 2b4d867abfdc3999a836a220c638c109053145a8
2018-08-01 15:34:56 -07:00
edb90387b2 Lint ArrayRef.h (#10129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10129

-

Reviewed By: ezyang

Differential Revision: D9119933

fbshipit-source-id: dd13c6d2a0ab72d943acff5cb02b3278ca8c7ba6
2018-08-01 15:34:54 -07:00
080ae5ea1f Remove implicit ArrayRef -> vector conversion (#9740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9740

- Remove implicit ArrayRef -> vector conversion
- Fix 4 call sites that accidentally did an implicit expensive vector conversion but wouldn't have needed to
- Remove explicit vector conversion from 4 call sites that also didn't need to do that

Reviewed By: ezyang

Differential Revision: D8961693

fbshipit-source-id: 980da9f988083c0072497f9dbcbbf6f516fa311c
2018-08-01 15:34:52 -07:00
e2846c365a Improve ArrayRef (#9610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9610

Mostly making some stuff in ArrayRef constexpr to give it better perf.

Reviewed By: ezyang

Differential Revision: D8926785

fbshipit-source-id: af6d4b05fbc69d20855a80f3edc2b501577a742b
2018-08-01 15:34:50 -07:00
ad6d62250a Add torch.compiled_with_cxx11_abi(). (#10071)
Summary:
It returns whether PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1.

Fixes #8385
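Usage is a one-liner:

```python
import torch

# True if the binary was built with -D_GLIBCXX_USE_CXX11_ABI=1
print(torch.compiled_with_cxx11_abi())
```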
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10071

Differential Revision: D9088946

Pulled By: zou3519

fbshipit-source-id: b00fd92ee340ef34f60bdd6027ceaf46dd7442c0
2018-08-01 15:34:48 -07:00
1b1c47dfe5 Update onnx to onnx/onnx@32ac71b (#10126)
Summary:
32ac71b1b9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10126

Reviewed By: houseroad

Differential Revision: D9120544

Pulled By: bddppq

fbshipit-source-id: 4fbe1f16e3b712c092f2f188324173ba1ecc1062
2018-08-01 14:28:54 -07:00
fb24c52dc3 Prepare TH for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10123

Differential Revision: D9121068

Pulled By: gchanan

fbshipit-source-id: 1cdc6e4b327cf158729cbb4026315be63b159f9d
2018-08-01 14:28:53 -07:00
2d56b5cf8b Prepare THC for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10072

Differential Revision: D9082421

Pulled By: gchanan

fbshipit-source-id: d4327b07aaef85cc2521393008154ebceae8cbfd
2018-08-01 14:28:51 -07:00
59af5b928a Move UniqueVoidPtr to ATen/core and apply lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10131

Reviewed By: smessmer

Differential Revision: D9121096

fbshipit-source-id: a6861429f06302e3e279ff669961bba34a9fb7a1
2018-08-01 13:25:23 -07:00
2d6738e89e Fix lint in ATen/core (but not ArrayRef)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10124

Reviewed By: smessmer

Differential Revision: D9119768

fbshipit-source-id: c0a56d27401b730956945146d4f48d4d5a9b77a6
2018-08-01 13:25:19 -07:00
f908b2b919 Use google protobuf in pytorch onnx import/export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/8469

Reviewed By: houseroad

Differential Revision: D9102041

Pulled By: li-roy

fbshipit-source-id: 805c473745d181b71c7deebf0b9afd0f0849ba4f
2018-08-01 12:54:41 -07:00
5a44be50ab Minor nit in comment in CMakeLists.txt
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10125

Reviewed By: smessmer

Differential Revision: D9119766

fbshipit-source-id: 290b804bc552b1c3f68e5129ff60ef7f34307714
2018-08-01 12:39:38 -07:00
e8f27311aa fix a couple problems with libtorch cmake file (#10091)
Summary:
in particular, make not building tests actually work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10091

Differential Revision: D9121366

Pulled By: anderspapitto

fbshipit-source-id: d7d38cf759aa46bff90d3b4f695c20f29039ae75
2018-08-01 11:39:33 -07:00
f126687fbc Add a dump() method to IR Node's. (#10106)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10106

Differential Revision: D9119891

Pulled By: resistor

fbshipit-source-id: 5f41d8890007c639f8f0cdc92d11b128433ad6b8
2018-08-01 11:09:53 -07:00
4070005081 Move C++17.h to ATen/core (#10107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10107

This header is needed for ATen/core stuff

This diff also fixes an issue in C++17.h when run in C++17 enabled compilers.

Reviewed By: ezyang

Differential Revision: D9095209

fbshipit-source-id: d45947956019a7095875f48746b88c414e8865bc
2018-08-01 09:54:59 -07:00
87d57dc5f5 Simplified Operator (#10080)
Summary:
zdevito explained that the attributed versions of `Operator`s are no longer necessary. This PR does two things:

1. Removes all code associated with attributed operators,
2. Adds a second kind of state to `Operator` where it is constructed with an `Operation` directly instead of an `OperationCreator`. This will be useful to test custom operators which don't require a node (you can just retrieve it directly).

Now rebased on top of https://github.com/pytorch/pytorch/pull/9801

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10080

Differential Revision: D9113668

Pulled By: goldsborough

fbshipit-source-id: 1276a191c7cf89da1c38488769f2105ce2664750
2018-08-01 09:41:08 -07:00
f1964c43fd Update eigen submodule to fix BUILD_ATEN issue (#10095)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338

Updating Eigen submodule to fix an issue we saw with BUILD_ATEN and BUILD_CAFFE2 removal.

cc mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10095

Reviewed By: mingzhe09088

Differential Revision: D9109877

Pulled By: orionr

fbshipit-source-id: 90e36c298d8a22398558d70dc5f68a95a7687d6b
2018-08-01 09:41:06 -07:00
a2a7b0c01a Initial documentation for building libtorch (#10087)
Summary:
It's not a particularly pretty process right now, but it may as well
be documented.  I'm not aware of an ideal location for this, so I'm
just dropping it in the docs/ folder for now, as recommended by
soumith.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10087

Differential Revision: D9119681

Pulled By: anderspapitto

fbshipit-source-id: cd4afb642f3778c888d66a501bc697d0b0c88388
2018-08-01 09:41:02 -07:00
ee964c51f4 NegativeBinomial distribution (#9345)
Summary:
- [x] implement distribution
- [x] add tests
- [x] docs

cc ingmarschuster
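A minimal usage sketch (parameter values arbitrary):

```python
import torch
from torch.distributions import NegativeBinomial

dist = NegativeBinomial(total_count=10, probs=torch.tensor(0.25))
samples = dist.sample((5,))        # non-negative integer-valued counts
print(samples, dist.log_prob(samples))
```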
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9345

Differential Revision: D8807023

Pulled By: ezyang

fbshipit-source-id: 7bf7f352dd455e0909c58dd94e1bdebba0e8b5c8
2018-08-01 08:39:25 -07:00
2f848ec8ec Use new PyTorch API to make code simpler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9968

Differential Revision: D9088316

Pulled By: li-roy

fbshipit-source-id: 2658fe0c1734d8b064cbad24d8f0d6c341400b4e
2018-08-01 08:39:23 -07:00
fa6b28bf40 Move ArrayRef, Backtrace, Error, SmallVector, optional to ATen/core; add CoreAPI (#10092)
Summary:
This also makes Backtrace more portable, by disabling its functionality for
mobile builds as well.

It also handles Caffe2 static Windows builds by introducing a new variable,
AT_CORE_STATIC_WINDOWS, which must be set if you're building
ATen on Windows as part of a static library.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10092

Reviewed By: gchanan, smessmer

Differential Revision: D9094393

Pulled By: ezyang

fbshipit-source-id: 93281f9302bd378605a26589ae308faf1dac7df4
2018-08-01 08:39:22 -07:00
b503109f20 Guard sizes/strides in THCUNN for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10083

Differential Revision: D9093572

Pulled By: gchanan

fbshipit-source-id: a5c27571ec06f8ed30e6b3b492c743444b58d9fe
2018-08-01 08:10:33 -07:00
43b151224e Move grid sampler to ATen (#9961)
Summary:
Spatial version benchmark

|                           | CPUFloat THNN | CPUFloat ATen | CPUDouble THNN | CPUDouble ATen | CUDAHalf THNN | CUDAHalf ATen | CUDAFloat THNN | CUDAFloat ATen | CUDADouble THNN | CUDADouble ATen |
|---------------------------|---------------|---------------|----------------|----------------|---------------|---------------|----------------|----------------|-----------------|-----------------|
| [1024x1x28x28] zero pad   | 2.19281888s   | 0.21280479s   | 2.52922535s    | 0.23944831s    | 0.17494774s   | 0.06242800s   | 0.31270599s    | 0.03706479s    | 0.40542483s     | 0.07391024s     |
| [1024x1x28x28] border pad | 3.04329610s   | 0.24705672s   | 2.29205394s    | 0.22336411s    | 0.17980361s   | 0.06212497s   | 0.31415701s    | 0.03847790s    | 0.43020391s     | 0.07540464s     |
| [32x3x244x244] zero pad   | 18.29301333s  | 2.18566656s   | 19.01662397s   | 3.51552224s    | 1.72487235s   | 0.28933954s   | 2.02466702s    | 0.18178749s    | 2.63671613s     | 0.41391206s     |
| [32x3x244x244] border pad | 18.72205329s  | 2.02600884s   | 20.13017297s   | 3.25979590s    | 1.96455693s   | 0.33070564s   | 2.18666625s    | 0.19546938s    | 2.91268897s     | 0.38465047s     |

For #9702

basics:
+ grid tensors have dimensions `[N, H, W, 2]` (or `[N, D, H, W, 3]` for 3d).
+ input/output tensors have dimensions `[N, C, H, W]` (or `[N, C, D, H ,W]` for 3d)
+ grid sampler maps `input([N, C, inp_H, inp_W]), grid([N, H, W, 2])` to `output([N, C, H, W])` (3d case is similar).

variable naming:
+ `tensor_sH` means the stride of `tensor` at the dimension of `H`.
+ `tensor_ptr_NCH` is a data pointer that always points to the beginning of the `tensor[n][c][h]` slice in the loop.
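A shape-level sketch of these conventions through the Python-facing API for the spatial (2d) case (values arbitrary):

```python
import torch
import torch.nn.functional as F

input = torch.randn(2, 3, 8, 8)        # [N, C, inp_H, inp_W]
grid = torch.rand(2, 5, 5, 2) * 2 - 1  # [N, H, W, 2], coordinates in [-1, 1]
out = F.grid_sample(input, grid)
print(out.shape)                       # torch.Size([2, 3, 5, 5]) == [N, C, H, W]
```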
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9961

Differential Revision: D9057175

Pulled By: SsnL

fbshipit-source-id: 9ed8f1dc376ed10229f047fdcf3c90dbd250bee6
2018-08-01 07:54:46 -07:00
6fc75eadf0 Add CELU activation to pytorch (#8551)
Summary:
Also fuse input scale multiplication into ELU

Paper:
https://arxiv.org/pdf/1704.07483.pdf
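A quick numerical check of the definition (a sketch, not the PR's test code):

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)
alpha = 0.5
# CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
manual = torch.clamp(x, min=0) + torch.clamp(alpha * (torch.exp(x / alpha) - 1), max=0)
print(torch.allclose(F.celu(x, alpha=alpha), manual))  # True
```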
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8551

Differential Revision: D9088477

Pulled By: SsnL

fbshipit-source-id: 877771bee251b27154058f2b67d747c9812c696b
2018-08-01 07:54:44 -07:00
6f6a1f2d63 fix test_load_error_msg failure (Network is unreachable) (#10021)
Summary:
- fixes a test_load_error_msg failure (Network is unreachable)
- removed use of urlopen in test_load_error_msg

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10021

Differential Revision: D9068108

Pulled By: weiyangfb

fbshipit-source-id: a9484d4a913508d54731b6a1eef3cddff66604f2
2018-08-01 00:24:01 -07:00
5bd43a7af8 Refactor Seq2SeqModelCaffe2EnsembleDecoder (#10035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10035

This is an initial diff which refactors some of the components in the Seq2SeqModelCaffe2EnsembleDecoder class.

Reviewed By: jmp84

Differential Revision: D9026372

fbshipit-source-id: 449635208f24494209ae2fb78a19fca872970ea8
2018-07-31 23:09:09 -07:00
3d247041e4 Force sync device when ops are sampled for observation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10054

Reviewed By: xw285cornell

Differential Revision: D9071097

fbshipit-source-id: 44357cdf79148e81db86c5350122a1a320a923fb
2018-07-31 21:09:00 -07:00
ec807f2a91 Bail out if netdef has disable_nomnigraph argument
Summary: allow models to override nomnigraph opts

Reviewed By: ajtulloch

Differential Revision: D9035729

fbshipit-source-id: 2b30208263c14ce7039f27c618a3b232bf11ee33
2018-07-31 20:54:46 -07:00
fcd567ed15 Enable Optimization on mobile by default
Summary: Re-enable opt by default

Reviewed By: Maratyszcza

Differential Revision: D8525434

fbshipit-source-id: a61253907251a44cfc59e0b50fb1906c5eb20558
2018-07-31 20:54:44 -07:00
7d2bda7588 Move DDP broadcast coalesced to C++ (#9729)
Summary:
This PR depends on the tests added in #9670. It moves the first, tiny function from the c10d DDP to C++: `dist_broadcast_coalesced`. Let me know if `torch/csrc/distributed/c10d/ddp.h` will be a good place to put these rewritten functions.

pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9729

Differential Revision: D8985308

Pulled By: goldsborough

fbshipit-source-id: dc459fe9040273714044152063585e746974752f
2018-07-31 19:54:21 -07:00
294c065384 Changed serialization mechanism of LambdaLR scheduler (#9927)
Summary:
I opened an issue explaining some of my frustrations with the current state of schedulers.
While most points that I raised in [that issue](https://github.com/pytorch/pytorch/issues/8741#issuecomment-404449697) need to be discussed more thoroughly before being implemented, there are some that are not so difficult to fix.

This PR changes the way the LambdaLR scheduler gets serialized:
> The lr_lambda functions are only saved if they are callable objects (which can be stateful).
> There is no point in saving functions/lambdas as you need their definition before unpickling and they are stateless.

This has the big advantage that the scheduler is serializable, even if you use lambda functions or locally defined functions (aka a function in a function).

Does this functionality need any unit tests?
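A sketch of the pattern this enables (the `WarmupLambda` class is a made-up example):

```python
import torch

class WarmupLambda:
    """A callable object: unlike a bare lambda, its state (__dict__) can be serialized."""
    def __init__(self, warmup_steps):
        self.warmup_steps = warmup_steps
    def __call__(self, epoch):
        return min(1.0, float(epoch + 1) / self.warmup_steps)

opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=WarmupLambda(5))
state = sched.state_dict()  # saves the callable object's state; plain lambdas are skipped
```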
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9927

Differential Revision: D9055505

Pulled By: soumith

fbshipit-source-id: 6c1cec588beedd098ec7d2bce6a9add27f29e48f
2018-07-31 19:39:06 -07:00
aae37324cc fixed a newly introduced regression in softmax (#10066)
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0.  The behavior of softmin(x) should match softmax(-x); however, it is instead implemented (in v0.4.1) as -softmax(x).  These are not the same.  The fix is trivial because the bug is due to operator precedence.

This is a major regression that broke my training.  I'm not sure how a unit test did not catch this.

```
import torch
import torch.nn.functional as F

x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmin is implemented incorrectly
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

In 0.4.0 this produces the correct values
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.0278,  0.0755,  0.3385,  0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066

Differential Revision: D9106995

Pulled By: soumith

fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
2018-07-31 19:28:30 -07:00
f2412fbafc Allow multiple ops.def and clean up code gen in general
Summary:
This is a cleanup and refactoring.
In its original form (changeset 6fdf915c057a) this diff caused a 5% regression
on ads CPU.  The root cause was an omission of link_whole = True, causing
symbols to be stripped in mode/opt and forcing the converter to fall back,
causing patterns to be unmatched in the graph transform logic.  This version of
the diff tests for link_whole by including a C++ test of the transform

Reviewed By: yinghai

Differential Revision: D9040511

fbshipit-source-id: 3e19b89989aa68b021762d12af2d0b4111280b22
2018-07-31 19:28:28 -07:00
799c947cf3 add .gitattributes for EOL conversion. (#9813)
Summary:
`.bat` files' EOL is LF, so the build fails on some Windows machines.
To fix this, add `.gitattributes` and set batch files' EOL to CRLF, roughly as sketched below.
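The relevant rule is roughly the following (the exact pattern in the PR may differ):

```
# force CRLF line endings for Windows batch scripts on checkout
*.bat text eol=crlf
```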

Discussion is in #9677.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9813

Differential Revision: D9026486

Pulled By: soumith

fbshipit-source-id: 341eaa677c35f8476a7eda1bac9827385072eb29
2018-07-31 18:38:43 -07:00
9c0f65fc87 Remove While op stuff (#10102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10102

these codepaths are unused, deleting them

Reviewed By: yinghai

Differential Revision: D9109764

fbshipit-source-id: 8ace42a399806632bfbcada96b383268f0a8ae89
2018-07-31 17:56:25 -07:00
c54d71ba60 Upgrade old transform passes to newer APIs (#10046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10046

stampable

Reviewed By: duc0

Differential Revision: D9075830

fbshipit-source-id: dc65be1d39625ef24ad319b5ce0263ecfe7a10c9
2018-07-31 17:39:35 -07:00
ceb0f14176 Fix SpatialBN Fusion (#10044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10044

The test was subtly broken! This transform wasn't writing to the correct blob and the test did not catch that because it was looking at the old version.

thanks kerenzhou for catching this

Reviewed By: Jokeren

Differential Revision: D9075520

fbshipit-source-id: c31ff0afcd78dd2dc7ffc240e2e89eeda87f1fb4
2018-07-31 17:39:34 -07:00
bf744bea94 Parse and register schema declarations lazily (#9801)
Summary:
This should prevent slow startup times, and will not report as many
errors during static initialization time, which are hard to debug.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9801

Reviewed By: goldsborough

Differential Revision: D8986603

Pulled By: zdevito

fbshipit-source-id: 440d43ab5e8cffe0b15118cb5fda36391ed06dbc
2018-07-31 17:24:24 -07:00
34c7c56c73 Re-enable empty n-dimensional empty tensor and fix parallel CPU on empty tensors (#10077)
Summary:
This is a combination of https://github.com/pytorch/pytorch/pull/9947 (this was reverted) and https://github.com/pytorch/pytorch/pull/10076.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10077

Differential Revision: D9087491

Pulled By: gchanan

fbshipit-source-id: 9fe9905628000f2ff3e47df32533cd7d1f25a354
2018-07-31 16:43:45 -07:00
ba5d33bede Re-Enable ATen in C2 in integration builds to test ONNX ATen conversions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10060

Differential Revision: D9081387

Pulled By: bddppq

fbshipit-source-id: 13cbff63df5241e013d4ebacfcd6da082e7196f6
2018-07-31 15:27:05 -07:00
e04f8bbfa6 Add virtual dtor for ideep context (#10059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10059

Without virtual dtor, it could induce incorrect sized deallocation, messing up the memory. And unfortunately, sized deallocation cannot be detected by ASAN, yet.

Reviewed By: jerryzh168

Differential Revision: D9080526

fbshipit-source-id: c136cf653134e75b074326be2bc03627da42446f
2018-07-31 15:27:02 -07:00
d2178562a4 Remove some unnecessary includes. (#10085)
Summary:
The affected files are all files that are planned to be moved
to ATen/core; the includes are for headers which are NOT slated
for movement.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10085

Differential Revision: D9093746

Pulled By: ezyang

fbshipit-source-id: 2beeffdae26d03d631d2d51b40bf6303759a2f50
2018-07-31 15:13:37 -07:00
1f13453b4d Slightly relax the constraints on argument and return types to script functions (#9969)
Summary:
This lays out initial support for taking and returning a richer set
of types than only tensors. Floats and ints are already valid, lists are
straightforward to add, tuples need some discussion.
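A sketch of what this permits, written with the modern annotation syntax (not the PR's own test):

```python
import torch

@torch.jit.script
def scale_and_shift(x, scale: float, shift: int):
    # float and int arguments are accepted alongside tensors
    return x * scale + shift

print(scale_and_shift(torch.ones(2), 2.0, 3))  # tensor([5., 5.])
```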

Based on top of #9948. Review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9969

Reviewed By: zdevito

Differential Revision: D9076973

Pulled By: apaszke

fbshipit-source-id: 5a1fe912ea6b79ab2bfd0dcce265eb05855b5ff0
2018-07-31 14:25:29 -07:00
58fd6e1dd6 Also add ATen/core tests to oss CI (#10029)
Summary:
-
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10029

Reviewed By: ezyang

Differential Revision: D9070030

Pulled By: smessmer

fbshipit-source-id: b5ae79a383dc14e7d79e6a82c5d70e951c9f5168
2018-07-31 13:54:39 -07:00
ee17ed672b Add missing dependencies (#10086)
Summary:
Fix the master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10086

Differential Revision: D9093741

Pulled By: houseroad

fbshipit-source-id: 65e42994ae7d8e0b449d10a8116a7609434aad04
2018-07-31 13:54:38 -07:00
2422801625 fix _pointwise_loss for target gradients (#10018)
Summary:
_pointwise_loss has some Python special-casing; we converted reduction to ATen enums too early.

fixes #10009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018

Differential Revision: D9075489

Pulled By: li-roy

fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
2018-07-31 13:39:58 -07:00
56d1a82b31 Add shape inference when converting from onnx to caffe2 (#10037)
Summary:
Otherwise, some RNN case conversion may fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10037

Reviewed By: orionr

Differential Revision: D9072298

Pulled By: houseroad

fbshipit-source-id: 080f589eba8618719453feb15a7a494fe5380dd0
2018-07-31 12:42:02 -07:00
371a786b18 Errors out when Openmpi < 2.x.x with distributed. (#10015)
Summary:
This PR fixes #9418.
OpenMPI 1.10 segfaults in MPI_Bcast with CUDA buffers, and it is a retired OpenMPI version.
I've tested on 2.1.1 and 3.0.0 and they work well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10015

Reviewed By: soumith

Differential Revision: D9088103

Pulled By: ailzhang

fbshipit-source-id: fc0a45e5cd016093ef0dbb9f371cbf67170d7045
2018-07-31 12:24:40 -07:00
1ae520c704 Add AT_CHECK for null storage. (#9823)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9823

Differential Revision: D9029433

Pulled By: ezyang

fbshipit-source-id: 6101556305593c66f618b20d8c2a084ae2558ea8
2018-07-31 12:09:25 -07:00
685224aa14 Add CTC loss (#9628)
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm, with the modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.

This could eventually fix #3420

I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during testing. Also, I want to add some more code comments.

I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.

Thank you for looking!

In terms of performance it looks like it is superficially comparable to WarpCTC (but I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log space, but also the gathering step is much, much faster (I avoided trying tricky things; they seem to contribute to WarpCTC's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:

```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```

Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread which I might change, and there are a few other things where one could look for better implementations.

Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.

My performance measuring testing script:
```
import timeit
import sys
import torch
num_labels = 10
target_length  = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16

torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()

def time_cuda_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(groupt, *args):
    torch.cuda.synchronize()
    culo, cugra= torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()

    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then test the against implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628

Differential Revision: D8952453

Pulled By: ezyang

fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
2018-07-31 11:09:48 -07:00
430e44480f Delete some obsolete steps in the ROCm build. (#10005)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10005

Differential Revision: D9066107

Pulled By: ezyang

fbshipit-source-id: 346f654214cff1c956a4022173347d95657ee9d4
2018-07-31 11:09:46 -07:00
f779202711 Correctly set CAFFE2_DISABLE_NUMA when USE_NUMA=OFF in cmake (#10061)
Summary:
Previously, https://github.com/pytorch/pytorch/blob/master/caffe2/core/numa.cc still got compiled even when USE_NUMA=OFF
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10061

Reviewed By: houseroad

Differential Revision: D9081385

Pulled By: bddppq

fbshipit-source-id: ad28b647e0033727839770b1da0fba341b1b7787
2018-07-31 11:01:51 -07:00
cba03e2ebe Handle dynamic repeats in onnx symbolic (#10052)
Summary:
ONNX Tile can take `repeats` as a dynamic input
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10052

Differential Revision: D9076841

Pulled By: bddppq

fbshipit-source-id: ddd692c5f5846c8fdba019baa9fad83ef9638da4
2018-07-31 10:39:50 -07:00
0c11101eca Prepare THNN/THCUNN for first class scalars. (#10023)
Summary:
I previously did some transformations, e.g. _nDimension,_dim -> nDimensionLegacyAll, nDimension -> nDimensionLegacyNoScalars.
But this didn't touch dim(), which needs to be updated to support scalars.  Instead of doing an (ugly) move, I audited the call sites and updated the cases that could be size 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10023

Differential Revision: D9068996

Pulled By: gchanan

fbshipit-source-id: c63820767dd1496e908a5a96c34968482193f2c5
2018-07-31 10:39:48 -07:00
c2d9d2888b Fix typo in tensors.rst (#10073)
Summary:
An tensor -> A tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10073

Differential Revision: D9087421

Pulled By: soumith

fbshipit-source-id: 6713f5a5e11fb11dff0ab5d2d6274f7837c6625f
2018-07-31 10:13:40 -07:00
68cbe37c6a fix the reference link path
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9240

Reviewed By: SsnL

Differential Revision: D8764196

Pulled By: ezyang

fbshipit-source-id: 3efc70714406d801ed74f52313beca61129593c7
2018-07-31 09:09:46 -07:00
5e5c15dd42 Add (constant size) TensorLists to JIT, use them in cat and stack nodes (#9948)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9948

Reviewed By: ezyang

Differential Revision: D9033666

Pulled By: apaszke

fbshipit-source-id: 02d75e391ed6dee62500842df50f0b6ee5e38846
2018-07-31 07:39:52 -07:00
6fb9acfc16 Revert empty n-dim and ATen in C2 integration builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10064

Differential Revision: D9082082

Pulled By: gchanan

fbshipit-source-id: ae49470f5b4c89b13beb55fd825de1ba05b6a4fa
2018-07-31 07:25:56 -07:00
78b806c861 Fix the onnx symbolic for upsample (#10001)
Summary:
We missed the upsample symbolic when bumping up the opset to 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10001

Reviewed By: bddppq

Differential Revision: D9067212

Pulled By: houseroad

fbshipit-source-id: 3e285d2800a32cb04fa82f8e7f261bdd010a8883
2018-07-30 21:39:48 -07:00
37a226de63 When BUILD_ATEN=OFF, use ATen/core directly (#10019)
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019

Reviewed By: smessmer

Differential Revision: D9067262

Pulled By: ezyang

fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
2018-07-30 21:09:55 -07:00
aa36a5d01c Add typing into caffe2 requirements.txt for USE_ATEN (#10047)
Summary:
I was dumb lol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10047

Differential Revision: D9076023

Pulled By: bddppq

fbshipit-source-id: 10587875d04ac2aed2e015846fc73ce9e4717a4f
2018-07-30 20:09:21 -07:00
51539fa383 Add pyyaml into caffe2 requirements.txt for USE_ATEN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10039

Reviewed By: houseroad

Differential Revision: D9074261

Pulled By: bddppq

fbshipit-source-id: 26df516633d5a4ec539a03a62cf9e7839e1e1964
2018-07-30 18:11:25 -07:00
8f0a229078 Fix HPTT path for 0-sized inputs.
Reviewed By: Maratyszcza

Differential Revision: D9068091

fbshipit-source-id: 4aeac45f9732a86979a08488637bf0ba6cc79b34
2018-07-30 17:54:57 -07:00
788b2e996d nomnigraph - minor cleanup of Graph.h (#9890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9890

Minor cleanups for Graph.h to make it more consistent with our style guide

Also fix opt/device.cc and binary_match_test.cc to not access subgraph.nodes_ which is now private

Reviewed By: bwasti

Differential Revision: D9017108

fbshipit-source-id: 9f5cba4a2cd2a452a955005f4704f6c120bbc1d5
2018-07-30 16:24:03 -07:00
e0a0234018 Remove C++14 feature (#10022)
Summary:
Which test should I look at, bddppq?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10022

Reviewed By: bddppq

Differential Revision: D9068732

Pulled By: yinghai

fbshipit-source-id: 241ef72c7fac0ed0b8c58ecdffbb5e24eb956217
2018-07-30 16:24:02 -07:00
3e3f40aeeb Update onnx to latest master (#10024)
Summary:
df01dbc005
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10024

Reviewed By: houseroad

Differential Revision: D9069464

Pulled By: bddppq

fbshipit-source-id: 751328352cd495e27b6bd533f4632d3d6d06c4a6
2018-07-30 15:54:34 -07:00
e57cb4a1b2 Add a Constant Propagation Pass to the JIT (#8808)
Summary:
Adding a constant propagation pass to the JIT. I have added examples to the expect files.

There are a couple of special cases which have not been implemented here. IF nodes with constant conditions can be inlined with the correct block. WHILE nodes can be removed if the condition is false.  I have added a test for each case in test_jit.py file as expected failures.

To be consistent with DCE, python ops & CPP ops are treated as not having side-effects.
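A small sketch of the effect, invoking the pass directly through its internal Python binding (a sketch, not the PR's expect-file tests):

```python
import torch

@torch.jit.script
def fn(x):
    y = 2 + 3   # computable from constants alone
    return x + y

torch._C._jit_pass_constant_propagation(fn.graph)
print(fn.graph)  # the 2 + 3 add is folded into a single prim::Constant
```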
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8808

Reviewed By: wanchaol

Differential Revision: D8906770

Pulled By: eellison

fbshipit-source-id: 10ad796d89f80b843566c9ddad6a0abd1f3dc74c
2018-07-30 15:54:31 -07:00
db96a0951f Add SIMD version to GFTRL optimizer (#9698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9698

Add SIMD version to GFTRL optimizer

Differential Revision: D8949723

fbshipit-source-id: 835ce2ce49630ae43fc6bac63c545c14b25f5a26
2018-07-30 15:27:24 -07:00
9987282134 Use Retainable as base class for StorageImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9956

Reviewed By: gchanan

Differential Revision: D9066103

Pulled By: cpuhrsch

fbshipit-source-id: 1a5a2ace306308707add3d0e0c1fc861f5c79705
2018-07-30 15:08:56 -07:00
7214754663 Check and return when numel() == 0 in Loops.cuh.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10031

Reviewed By: colesbury

Differential Revision: D9070346

Pulled By: gchanan

fbshipit-source-id: d6ad4e6ca43d334f5be42fea35915270dd8f405e
2018-07-30 15:01:28 -07:00
57750bd638 Enable ATen in C2 in integration builds to test ONNX ATen conversions (#10014)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10014

Reviewed By: houseroad

Differential Revision: D9061842

Pulled By: bddppq

fbshipit-source-id: 1e1c2aeae62dd2cc5c6a8d5e1d395ea5cf882734
2018-07-30 15:01:13 -07:00
6c7fb1582f Introduce __array_priority__ on torch.Tensor (#9651)
Summary:
This causes numpy to yield to the torch functions,
e.g. instead of numpy array/scalar __mul__ converting the tensor to
an array, it will now arrange for the Tensor __rmul__ to be called.

Fixes case 2 of #9468
It also makes cases 3 and 4 equivalent but does not fix them.
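A minimal sketch of case 2 after the change (values arbitrary):

```python
import numpy as np
import torch

t = torch.ones(3)
s = np.float64(2.0)
# With __array_priority__ set, numpy defers and Tensor.__rmul__ runs,
# so the result stays a torch.Tensor instead of becoming an ndarray.
out = s * t
print(type(out), out)  # <class 'torch.Tensor'> tensor([2., 2., 2.])
```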
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9651

Differential Revision: D8948079

Pulled By: ezyang

fbshipit-source-id: bd42c04e96783da0bd340f37f4ac3559e9bbf8db
2018-07-30 14:39:43 -07:00
ea3c36b822 NumPy Scalar to PyTorch Scalar (#9225)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4985 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9225

Differential Revision: D8769317

Pulled By: ezyang

fbshipit-source-id: eeaeaf0749c9dc9e372634da68b4bd23e6e3ad28
2018-07-30 14:39:40 -07:00
c9eab34e63 Fix Caffe2 with ATen conda build failure (#10020)
Summary:
Extracted from 627624627e and in support of https://github.com/pytorch/pytorch/pull/10019

cc pjh5 mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10020

Reviewed By: pjh5

Differential Revision: D9068124

Pulled By: orionr

fbshipit-source-id: 4dd4910136a312b6517c65ce8802837108475f89
2018-07-30 14:10:02 -07:00
04939a4745 Match parameter names and = default (#9737)
Summary:
More clang tidy cleanups in `torch/csrc`. This time:

1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.

Also updated my script a little bit.

apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737

Differential Revision: D9069069

Pulled By: goldsborough

fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
2018-07-30 14:10:00 -07:00
40a8239984 Fix a bug in argument spec (#9958)
Summary:
Non-tensor types did not set the running total_dims count, causing corrupted data.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9958

Reviewed By: jamesr66a

Differential Revision: D9065621

Pulled By: zdevito

fbshipit-source-id: 0ac1fcdf6da076a9c9ebd5d70ce9126e3f8e722e
2018-07-30 13:08:59 -07:00
faa96c1c47 Deal with spaces in einsum equation string (#9994)
Summary:
Fixes #9930
Thank you, vadimkantorov, for the report.
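A quick sketch of an equation string with spaces that now parses (values arbitrary):

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)
out = torch.einsum('ij, jk -> ik', a, b)  # spaces around ',' and '->' are ignored
print(torch.allclose(out, a @ b))  # True
```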
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9994

Differential Revision: D9042876

Pulled By: ezyang

fbshipit-source-id: 3bbd1aaaf1b432be40a7652b6a746d80934a216b
2018-07-30 12:57:56 -07:00
ce5f0d40b6 Enable n-dimensional empty tensors. (#9947)
Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
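A minimal sketch of what is now representable:

```python
import torch

x = torch.randn(2, 0, 3)   # an n-dimensional tensor with zero elements
print(x.shape, x.numel())  # torch.Size([2, 0, 3]) 0
print((x * 2).shape)       # element-wise ops propagate the empty dims
```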
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947

Reviewed By: ezyang

Differential Revision: D9032778

Pulled By: gchanan

fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
2018-07-30 12:33:17 -07:00
73a60efccc Fix Caffe2CTScan error (#9962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9962

att

Reviewed By: hlu1

Differential Revision: D9036869

fbshipit-source-id: 3155af00c62d489f998cbfba07121c4fd20e1c6f
2018-07-30 12:33:15 -07:00
b4f8c60931 Don't use the XML reporter for Catch2. (#10012)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10012

Differential Revision: D9057766

Pulled By: ezyang

fbshipit-source-id: 12148a8cf3061423c61b3e7b36864dfcdb1138a1
2018-07-30 11:25:09 -07:00
9a9a7325c6 Remove the generation of storage files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9954

Reviewed By: gchanan

Differential Revision: D9035947

Pulled By: cpuhrsch

fbshipit-source-id: 9b56c7a68e3f562ea11b9265a5fa234838f2b4e0
2018-07-30 09:53:57 -07:00
432ca747b0 Don't seed GPUs if there are none available. (#9931)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9931

Differential Revision: D9051375

Pulled By: ezyang

fbshipit-source-id: 1721f6217e07f80adc107d95e897cd7dd488659a
2018-07-30 08:23:53 -07:00
3609977d7f Update onnx to onnx/onnx@c761845 (#9964)
Summary:
c761845c7f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9964

Reviewed By: houseroad

Differential Revision: D9038133

Pulled By: bddppq

fbshipit-source-id: 6ce740944e636175d2de4602edb92cc4d7e8e5ac
2018-07-29 23:10:12 -07:00
5ff1551eb9 ATen's emscripten support (#9803)
Summary:
Not sure if anybody is interested, but I managed to infer a `GRU` fine in `wasm` using ATen compiled with emscripten. It was quite trivial to fix the configuration.
It also passes most of the tests, especially all scalar tensor tests.

The command line to configure was, but could be simplified:
```
emconfigure cmake -DAT_LINK_STYLE=STATIC -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DCMAKE_C_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_CXX_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_INSTALL_PREFIX=/home/sugar/aten-wasm ../
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9803

Differential Revision: D9004610

Pulled By: ezyang

fbshipit-source-id: db26c59f27162ed80f6aee2973c4cb9252d3d1e4
2018-07-29 20:39:00 -07:00
3d6015db0e Add essential PATH for the Windows PyTorch loading process (#9920)
Summary:
Fixes #9818.
It seems the original Python doesn't add `[PYTHONPATH]\Library\bin` to `PATH`. We try to add it before the DLL loading process, roughly as sketched below.
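A rough sketch of the idea (an assumption for illustration, not the exact shipped code):

```python
import os
import sys

# prepend the Anaconda-style DLL directory before loading torch's C extensions
dll_dir = os.path.join(sys.exec_prefix, 'Library', 'bin')
if sys.platform == 'win32' and os.path.isdir(dll_dir):
    os.environ['PATH'] = dll_dir + os.pathsep + os.environ.get('PATH', '')
```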
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9920

Differential Revision: D9040825

Pulled By: soumith

fbshipit-source-id: c07fff71b2aea254a396042ab677696f6829aac7
2018-07-29 08:23:59 -07:00
56974a06b5 Revert D8909766: [caffe2] Simplify order switch operators
Differential Revision:
D8909766

Original commit changeset: 17a302d5bf4a

fbshipit-source-id: 56c75a8ce27873ed1d5f194b9d6bf0049d8f21ba
2018-07-28 18:40:13 -07:00
eee01731a5 Adds the default value for the amsgrad arg to the Adam docstring (#9971)
Summary:
Minor addition to the docstring of `torch.optim.Adam`, adding the default argument description for the `amsgrad` argument for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9971

Differential Revision: D9040820

Pulled By: soumith

fbshipit-source-id: 168744a6bb0d1422331beffd7e694b9d6f61900c
2018-07-28 09:23:45 -07:00
b99492a507 Fix BlobStatRegistry HIP BlobStatGetter registration issue (#9973)
Summary:
This was introduced in #9826, following the corresponding CUDA file context_gpu.cu. Tests passed in that PR, at which point master was 94439d7df. However, during the long landing process a new master commit (aebf3b4) came in that removed the `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` in the context_hip.cc file, which then broke the HIP BlobStatGetter. We did NOT run tests again during the merge, so when #9826 later landed on master the ROCm tests started breaking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9973

Differential Revision: D9040671

Pulled By: bddppq

fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
2018-07-28 02:23:40 -07:00
46d8002800 Fix bug that always uses the same blob when repeating poolings
Reviewed By: houseroad

Differential Revision: D9027902

fbshipit-source-id: 957702ad9736812ec5aa32066d286c2c3adffc49
2018-07-28 00:09:16 -07:00
47c1badf90 Fix the clamp special case and gradient problem on None, add None to JIT (#9596)
Summary:
Supersedes #8925

This PR fixes #8502: it fixes the gradients problem for clamp when passing None to the function, and adds support for NoneLiteral and NoneType in script to enable the clamp tests. Now we can have corner cases like:

```python
@torch.jit.script
def func():
    x = torch.randn(3, 3, requires_grad=True)
    y = torch.clamp(x, None, 0) # max = 0
    y = torch.clamp(x, min=None, max=0)
```

In both the JIT and ATen, we use Scalar(NAN) as a sentinel value when the None type is passed to clamp; this is the current way we support the None type in the JIT and solve the gradient problem when the user explicitly passes None into clamp.

On the JIT side, we create a tensor(NAN) and an undefined tensor if we encounter None when matching the function schema; later, in the interpreter, it is translated to Scalar(NAN) if needed.

Ideally we wouldn't need clamp_min and clamp_max in ATen native/autograd and could support only clamp after this change, but since a bunch of other operators (e.g. Activation.cpp, Loss.cpp) use clamp_min in several places, we will still have the functions available; all Python invocations will only call clamp instead of clamp_min/max (which calls the underlying th_max/th_min).

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9596

Reviewed By: zdevito

Differential Revision: D8940839

Pulled By: wanchaol

fbshipit-source-id: c543a867b82e0ab8c99384773b173fdde2605d28
2018-07-27 22:54:33 -07:00
851c18dd20 PyTorch File Format API (#9900)
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/9794 that contains only the serialization library and exposes a cleaner API. This should later be incorporated into the module export code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9900

Reviewed By: zdevito

Differential Revision: D9021057

Pulled By: jamesr66a

fbshipit-source-id: 01af74a7fdd1b90b2f5484644c3121d8ba9eb3b3
2018-07-27 22:24:57 -07:00
d913db70f2 Handle the "spatial" attribute in onnx BatchNormalization op (#9492)
Summary:
If we have this "spatial" attribute and its value equals 1, we can just remove this attribute and convert this op to Caffe2 SpatialBN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9492

Differential Revision: D8988165

Pulled By: houseroad

fbshipit-source-id: a9218dc9cd5fab43deb371f290f81285f5283231
2018-07-27 22:09:15 -07:00
bcba5a50d1 Fix EnforceFiniteOp
Summary: att

Reviewed By: kennyhorror

Differential Revision: D9040248

fbshipit-source-id: 0da0f3b1ce51375731098cc86c92f35953be0861
2018-07-27 22:01:23 -07:00
ab4e209007 Back out "[caffe2][nomnigraph] Allow multiple ops.def and clean up code gen in general"
Summary: Original commit changeset: 6fdf915c057a

Reviewed By: yinghai

Differential Revision: D9040008

fbshipit-source-id: 33fd5d4ddc0ec8cae56cf86f6d63b6f666e51a3e
2018-07-27 20:09:14 -07:00
607688e928 Adding reciprocal operator and a test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9908

Differential Revision: D9035809

Pulled By: virtan

fbshipit-source-id: bce1db46fd55faeeab18a3b266d25c8beeb08df7
2018-07-27 18:24:43 -07:00
ee827f6ba3 Fix a testcase in logsoftmax onnx export (#9660)
Summary:
We only support special case. The original dim is not supported by ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9660

Reviewed By: bddppq

Differential Revision: D8965507

Pulled By: houseroad

fbshipit-source-id: 021dffdf0489c2d3a50bfd1e0c4cfd00d4a3d776
2018-07-27 17:54:32 -07:00
12a1af3731 Adding conv tests with explicit algo definition
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9798

Differential Revision: D9034663

Pulled By: virtan

fbshipit-source-id: d722f25f1dd00231ccc3ad5960bbbef63af02c2d
2018-07-27 17:39:17 -07:00
9eeb4e17af Split gather op for easier smaller code size (#9916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9916

att

Differential Revision: D8961085

fbshipit-source-id: 39a9838647dc97611e77beb0607c4655de727ada
2018-07-27 17:15:33 -07:00
c3fe071483 Update hip files (#9826)
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826

Differential Revision: D9032840

Pulled By: bddppq

fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
2018-07-27 16:54:39 -07:00
a532c1a48c Fix default argument value for CTCGreedyDecoder op (#9747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747

Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change to initialize in all cases.

Reviewed By: houseroad

Differential Revision: D8963635

fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
2018-07-27 16:33:07 -07:00
eb9bb1f09a Travis CI: Run flake on Python 2.7 and 3.7 (#9953)
Summary:
Flake8 will produce different results on Python 2 and 3.  Python 3.7 has `async` as a reserved word https://github.com/pytorch/pytorch/pull/4999.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9953

Differential Revision: D9035415

Pulled By: soumith

fbshipit-source-id: 8a46e028a2e20a7e3f6d90137020268d65a7cc64
2018-07-27 14:43:26 -07:00
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors, compared to CPUApplyUtils which operated on individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
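
For illustration, a minimal sketch (a hypothetical helper, not PyTorch's internal code) of the broadcast-gradient reduction the commit above describes: when a gradient arrives with the broadcast output's shape, it is summed back down to the input's shape under standard broadcasting rules.

```python
import torch

def reduce_grad_to_shape(grad, shape):
    # Sum a gradient shaped like the broadcast output back down to `shape`,
    # assuming standard broadcasting (leading dims added, size-1 dims expanded).
    while grad.dim() > len(shape):
        grad = grad.sum(dim=0)  # drop broadcast leading dimensions
    for i, s in enumerate(shape):
        if s == 1 and grad.size(i) != 1:
            grad = grad.sum(dim=i, keepdim=True)  # collapse expanded size-1 dims
    return grad

# e.g. a (4, 3) gradient reduced to an input of shape (3,)
assert reduce_grad_to_shape(torch.ones(4, 3), (3,)).shape == (3,)
```
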
e3c4057b6c Eliminate an extra lookup in the hashtable during CSE. (#9668)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9668

Differential Revision: D8955185

Pulled By: resistor

fbshipit-source-id: f3f929efc11be63850bd863679cc7b297c98d679
2018-07-27 14:43:22 -07:00
ef9801f32c Merge THStorage into at::Storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9772

Reviewed By: ezyang

Differential Revision: D9019375

Pulled By: cpuhrsch

fbshipit-source-id: d5185e29747929d648e4260db4967452cd40f563
2018-07-27 13:53:55 -07:00
6ed41adb04 Use round-to-negative division when computing output sizes for convolutions involving striding and dilation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9640

Differential Revision: D8948081

Pulled By: resistor

fbshipit-source-id: 06f2e3ad1bdb448be6f36577cb9bd27c884df595
2018-07-27 13:22:54 -07:00
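
For context, the standard output-size formula for strided, dilated convolutions (my restatement, not quoted from the diff) is out = floor((in + 2*pad - dilation*(kernel - 1) - 1) / stride) + 1. Rounding toward negative infinity rather than toward zero only matters once the numerator can go negative, e.g. for empty or very small inputs:

```python
def conv_out_size(n, kernel, stride=1, pad=0, dilation=1):
    # Python's // already rounds toward negative infinity, unlike C++
    # integer division, which truncates toward zero.
    return (n + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1

print(conv_out_size(7, kernel=3, stride=2))         # 3
print(conv_out_size(0, kernel=3, stride=2, pad=1))  # floor(-1/2) + 1 = 0, not 1
```
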
8c0355c90d convert lambd directly to scalar_t at hardshrink (#9919)
Summary:
- convert lambd directly to scalar_t instead of creating a tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9919

Differential Revision: D9026708

Pulled By: weiyangfb

fbshipit-source-id: d20ab06ecc12aa972ee9d1323ee2f84abf8d5ffd
2018-07-27 13:22:52 -07:00
ce0b895a0c Fix UBSAN error in ONNX peephole pass, make it more robust.
Summary: Minor fix for a bug introduced by D9004285

Reviewed By: anderspapitto

Differential Revision: D9028762

fbshipit-source-id: 9b9c5eef30e61d7ae19784e0418fa29bad2b5564
2018-07-27 12:38:56 -07:00
c77e4bc4d5 export tensor(ArrayRef, options) on Windows (#9904)
Summary:
I hope this helps with the Windows build failure in #9628 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9904

Differential Revision: D9026715

Pulled By: soumith

fbshipit-source-id: bb97d41d060823f5a37bfc9a1659815b8b9f4eab
2018-07-27 12:14:52 -07:00
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Before this change, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it's of the correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
94439d7df4 Suppress the vptr warning in ubsan (#9909)
Summary:
Unblock https://github.com/pytorch/pytorch/pull/8469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9909

Differential Revision: D9023650

Pulled By: houseroad

fbshipit-source-id: 7682a9cd7905e98c802b820ad59745672b32970d
2018-07-27 10:28:07 -07:00
c0bacc6284 Guard test_lapack_empty with has_magma. (#9936)
Summary:
CUDA lapack functions generally don't work unless has_magma is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9936

Differential Revision: D9028579

Pulled By: gchanan

fbshipit-source-id: 9b77e3b05253fd49bcabf604d0924ffa0e116055
2018-07-27 10:09:00 -07:00
bf32ea8094 Fix dimension check in 1D instance norm, allowing 2D tensors alongside 3D. (#9924)
Summary:
Fixes #9776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9924

Differential Revision: D9028328

Pulled By: soumith

fbshipit-source-id: d5f22abb2be83b34aee95ebe144c97519a6854f8
2018-07-27 09:24:07 -07:00
d3ba9a173e Handle case where THC btrifact doesn't zero info. (#9907)
Summary:
This was showing up in the n-dimensional empty tests as flaky because it's reading uninitialized cuda memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9907

Differential Revision: D9021413

Pulled By: gchanan

fbshipit-source-id: 31542b7597919df9afd6e528bb108a4a3e8eaf60
2018-07-27 09:11:44 -07:00
1af1b0c2a5 Remove THTensor::_dim, temporarily remove THTensor_nDimension. (#9895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9895

The primary goal here was to remove THTensor::_dim, which isn't part of the API moving forward.
Instead, we provide 3 options for getting the dimensionality (this is temporary although non-trivial to remove!):
```
nDimension                 corresponds to the "true" ATen dimension. TODO: implement.
nDimensionLegacyNoScalars  corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors.
nDimensionLegacyAll        corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors
                           and tensors with a dimension of size zero are collapsed to 0-dimensional tensors.
```
So in this patch, nDimension -> nDimensionLegacyNoScalars, and _dim/_nDimension go to nDimensionLegacyAll.
These are just codemods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9835

Reviewed By: ezyang

Differential Revision: D8999338

Pulled By: gchanan

fbshipit-source-id: a4d676ac728f6f36ca09604a41e888d545ae9311
2018-07-27 08:56:38 -07:00
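
A rough Python model (for illustration only; the real functions are C-level THTensor accessors) of the three dimension counts described in the commit above:

```python
def dims(shape):
    nd = len(shape)                               # nDimension: the "true" ATen dim
    legacy_no_scalars = max(nd, 1)                # scalars viewed as 1-dimensional
    legacy_all = 0 if 0 in shape else max(nd, 1)  # size-0 dims collapse to 0-dim
    return nd, legacy_no_scalars, legacy_all

print(dims(()))      # scalar:           (0, 1, 1)
print(dims((0,)))    # size-[0] tensor:  (1, 1, 0)
print(dims((2, 3)))  # regular tensor:   (2, 2, 2)
```
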
bc66d98248 Fix narrow on empty tensors after negative size support.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9838

Differential Revision: D9002345

Pulled By: gchanan

fbshipit-source-id: 13f4bacff94d9d0ea31a3b73a75b9b3e774eabf5
2018-07-27 07:55:20 -07:00
7b375ed362 fix ParameterDict doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9918

Differential Revision: D9026402

Pulled By: soumith

fbshipit-source-id: d0459dcda631e8921ab39725b9045e03960da5c9
2018-07-27 01:10:50 -07:00
a709f23225 revise a little spell mistake in tensor.py (#9868)
Summary:
Hello! I just found a small spelling mistake while reading this source code. Just PR'd it, thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9868

Reviewed By: gchanan, ezyang

Differential Revision: D9016030

Pulled By: soumith

fbshipit-source-id: fc3877177be080adbdbda99a169e401691292ebb
2018-07-27 00:55:03 -07:00
4a192bcc3d Rename onnx integration tests file to avoid confusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9913

Differential Revision: D9026787

Pulled By: bddppq

fbshipit-source-id: a3e7e79973abc4f5fe163f3e86b24382a1efd082
2018-07-26 23:40:41 -07:00
8cb1eef7b9 Unify IR operator representation (stop using attributes in the JIT) (#9807)
Summary:
Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..."

I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.

Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable
- `CreateAutodiffSubgraphs` needs a rewrite
  - It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it *could* infer something).
  - It sometimes creates constant-only (or constants + one node) graphs, which is useless
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover type of constants. This code should be removed once we'll gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of basing on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807

Reviewed By: ezyang

Differential Revision: D9004285

Pulled By: apaszke

fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
2018-07-26 22:11:50 -07:00
2c1d9e09b8 Support UINT8 for additional data in ImageInputOp (#9901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9901

Added support for the UINT8 datatype for additional data (prefetching and
output) in ImageInputOp

Reviewed By: ashwinb

Differential Revision: D9018964

fbshipit-source-id: f938a8a072c15c0ee521b2f16788c024b08cd37f
2018-07-26 22:11:46 -07:00
aa671ddefa Support production models with predictor benchmark (#9855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9855

Support production models with predictor benchmark
Two new flags are added:
`--update_prod`: pull production data (netdef, input types, input dims) from Hive and store locally
`--use_prod`: run benchmark with local production data with the same workload as in production.
By default, 300 models will be loaded.

production vs benchmark
avg net run time:
(collected by prod: https://fburl.com/scuba/6lb91zfx and bench: https://fburl.com/ngjj1dc8)
**prod: `408us` vs bench: `543us`**
(With prod data distribution, this should be even closer)

framework overhead (as of 2018-07-22):
prod:
```
9.111%    BlackBoxPredictor::Run
4.602%    SimpleNet::Run
2.377%    Operator::Run
1.786%    BlackBoxPredictor::AllocateMemory
1.372%    Observable::StartAllObservers
1.358%    Observable::StartObserver
1.206%    Blob::GetMutable
```

bench:
```
8.577%    BlackBoxPredictor::operator()
3.276%    SimpleNet::Run
1.954%    Operator::Run
1.697%    BlackBoxPredictor::AllocateMemory
1.477%    Tensor::ShareData
1.230%    Blob::GetMutable
1.034%    Observable::StartObserver
```

Reviewed By: yinghai

Differential Revision: D8942996

fbshipit-source-id: 27355d7bb5a9fd8d0a40195261d13a97fa24ce17
2018-07-26 21:39:29 -07:00
eb33887816 Addressed issue identified by static code analysis: potential buffer overrun (#9889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9889

Differential Revision: D9026278

Pulled By: soumith

fbshipit-source-id: ee2ee255f34731ddc581261984c3caf56faa0e12
2018-07-26 21:09:51 -07:00
e41eb43327 Remove deprecated masked_copy (#9819)
Summary:
No tests are affected by this removal.

Closes https://github.com/pytorch/pytorch/issues/1885 and closes #9817

While I was at it, I also fixed #9876 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9819

Differential Revision: D9018126

Pulled By: SsnL

fbshipit-source-id: a9142bf4e2403bef05779a097f61fa8b7db04b71
2018-07-26 20:55:18 -07:00
a841006353 Simplify some code by directly constructing unordered_set from nodes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9675

Differential Revision: D8952196

Pulled By: resistor

fbshipit-source-id: 5ef2308fed9f702021f650cf2d241a83d880d359
2018-07-26 19:54:38 -07:00
dfa0af093d Move predictor into caffe2/caffe2/predictor (#9548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9548

Pull Request resolved: https://github.com/pytorch/translate/pull/157

One part of refactor predictor. Move all the files into predictor dir.

Reviewed By: highker

Differential Revision: D8845276

fbshipit-source-id: 1e917464b0c8a042f025128a082c784eaa3b7013
2018-07-26 19:03:40 -07:00
c045e969b6 Use qualified name at::Half in Dispatch.h (#9848)
Summary:
This makes AT_DISPATCH_ALL_TYPES_AND_HALF valid outside of the at
namespace.

See https://github.com/pytorch/extension-cpp/issues/15
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9848

Differential Revision: D9006921

Pulled By: colesbury

fbshipit-source-id: a6e4f097a9d6fb85c921e1c9b9ea25d0f2db06dc
2018-07-26 19:03:24 -07:00
e7ab093d93 Simplify order switch operators (#9581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581

Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.

Reviewed By: viswanathgs

Differential Revision: D8909766

fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
2018-07-26 18:24:29 -07:00
b7b61a8eb4 Change expect, cast on Type to return shared pointers, make isSubtypeOf accept TypePtr (#9786)
Summary:
Follow up task of #9584.

Commit 1:

- change expect/cast to return shared pointers instead of raw pointers
- isSubtypeOf accepts a TypePtr instead. Use `x->isSubtypeOf(NumberType::get())` rather than `x->isSubtypeOf(*NumberType::get())`

Commit 2:

- to address enable_shared_from_this pitfalls, we make the constructor private and expose a factory method, so users can only create instances through it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9786

Reviewed By: zdevito

Differential Revision: D8980441

Pulled By: wanchaol

fbshipit-source-id: e5c923fc57a701014310e77cf29985b43bb25364
2018-07-26 18:09:45 -07:00
9df9c46992 fix loading 1dim tensor from 0.3.* to 0dim tensor (#9781)
Summary:
This PR fixes #9743 .

Adds backward-compatibility support for loading a checkpoint from 0.3.* with 1-dim tensors; they are now 0-dim tensors in 0.4+.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9781

Differential Revision: D8988196

Pulled By: ailzhang

fbshipit-source-id: a7a1bc771d597394208430575d5a4d23b9653fef
2018-07-26 17:09:41 -07:00
d65c667f28 Avoid divide-by-zero when hamming_window window length is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9896

Reviewed By: ezyang

Differential Revision: D9018572

Pulled By: gchanan

fbshipit-source-id: fa314687973124165bffb3084932d8ab6d872a93
2018-07-26 15:56:44 -07:00
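
For context (my restatement of the standard definition, not taken from the diff), the Hamming window is
$$w[n] = \alpha - \beta \cos\left(\frac{2\pi n}{N-1}\right), \qquad n = 0, \dots, N-1,$$
so the denominator degenerates for $N \le 1$: the empty ($N = 0$) and single-point ($N = 1$) windows have to be returned directly instead of evaluated through the formula.
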
d1260d26fe Sleep before run (#9891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9891

Add an argument to benchmark binary to specify the seconds to sleep before the run and after the warmup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9880

Reviewed By: llyfacebook

Differential Revision: D9014254

Pulled By: sf-wind

fbshipit-source-id: d5566186c8ed768f1e170e9266c5f2d6077391e0
2018-07-26 14:39:17 -07:00
18a6541b82 Create IDEEP fallback operators for ctc decoder ops (#9847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9847

CTCBeamSearchDecoder and CTCGreedyDecoder do not currently support IDEEP
execution. Add fallback operators to allow IDEEP execution of models that use
these operators.

Reviewed By: yinghai

Differential Revision: D9006234

fbshipit-source-id: fc539ba67b07d1f960d28564d8adde0be8690649
2018-07-26 14:09:11 -07:00
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
456f41301c Disable unique ops test on rocm (#9892)
Summary:
Somehow we have Unique operator tests in two places: test_unqiue_ops.py and hypothesis_test.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9892

Reviewed By: houseroad

Differential Revision: D9017631

Pulled By: bddppq

fbshipit-source-id: 1f9e40e4953afca26141ef4581202b9b9fce0ae9
2018-07-26 13:10:23 -07:00
1dc708493e Add html-stable target to docs Makefile (#9884)
Summary:
This makes it easier to build docs for the release. All of the unstable
warnings are removed in `make html-stable`.

cc soumith SsnL

Sample build:
![image](https://user-images.githubusercontent.com/5652049/43277115-05e2f720-90d5-11e8-9977-b0b4a6ee4b8e.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9884

Reviewed By: SsnL

Differential Revision: D9016001

Pulled By: zou3519

fbshipit-source-id: 5cf2dfbf886de993242db28cdac5d0c5fadbdc4d
2018-07-26 12:09:06 -07:00
0c84a5c27e Pass shape infos to ONNX -> Caffe2 C++ conversion backend (#9870)
Summary:
Also lets the Gemm conversion inspect the input `C` to try converting to FC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9870

Reviewed By: houseroad

Differential Revision: D9013198

Pulled By: bddppq

fbshipit-source-id: b4c509cfccca238262e1c406b004e66cef256321
2018-07-26 12:00:32 -07:00
e39c8043dc Make GraphExecutors work on Stacks instead of variable_tensor_lists (#9763)
Summary:
This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763

Reviewed By: zou3519

Differential Revision: D8978457

Pulled By: apaszke

fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
2018-07-26 12:00:27 -07:00
6f10944f88 Re-enable rocm tests that have been fixed in rocm 1.8.2 (#9862)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9862

Differential Revision: D9012520

Pulled By: bddppq

fbshipit-source-id: cdcc184e23befa8dbd1bc44d59bd25766aac33d0
2018-07-26 10:54:57 -07:00
716f7d657d Remove Broadcast.py. (#9843)
Summary:
I don't think this file is used anywhere, I guess we'll find out!

(Weirdly this failed lint on one of my PRs even though it shouldn't).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9843

Differential Revision: D9003949

Pulled By: gchanan

fbshipit-source-id: 26d580d1e7cdd30e82e5f4176244e51fd7cd616d
2018-07-26 10:44:24 -07:00
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Before this change, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it's of the correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
2c7e7e37a6 Corrected doc in class RNNCell (#9866)
Summary:
fixes #9642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9866

Differential Revision: D9012131

Pulled By: weiyangfb

fbshipit-source-id: d2849b1a50234dbdb335dffab4835c9de85183c3
2018-07-26 09:27:05 -07:00
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
e70fc145a9 MIOpen fixes for Caffe2 (#9842)
Summary:
The PR contains:
- Fixes for running the MIOpen conv operator in a multi-worker scenario, along with a performance fix
- A typo fix in the MIOpen pool op and some extra checks for the MIOpen spatial BN op

bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9842

Differential Revision: D9012512

Pulled By: bddppq

fbshipit-source-id: 270e1323c20fbfbc4b725f9a4ff34cd073ddaaa8
2018-07-26 02:42:26 -07:00
3be8e4db51 Do not run ONNX integration tests in parallel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9861

Differential Revision: D9011458

Pulled By: bddppq

fbshipit-source-id: 7ab1b1763d56f1290ade7a99682ad461c97f807b
2018-07-25 21:54:29 -07:00
997f46d1e1 Disable "filter too much" health check for fc operator tests (#9865)
Summary:
This health check makes the CI flaky.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9865

Differential Revision: D9011882

Pulled By: bddppq

fbshipit-source-id: 5124ab97d258eed7585734d64fb01e5df98abd0d
2018-07-25 21:41:14 -07:00
ba062e7da9 Update OnnxifiOp according to onnx/onnx#1224
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9844

Reviewed By: yinghai

Differential Revision: D9004222

Pulled By: bddppq

fbshipit-source-id: 1bdcefc0dfbd5e3422217b5254b2462e5a568d2a
2018-07-25 19:29:38 -07:00
5e4de0821a Set ROCm MAX_JOBS=4 (#9856)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9856

Differential Revision: D9009100

Pulled By: ezyang

fbshipit-source-id: 28f34128fcb7c3d6a115884bf28dc2a6bde5aed6
2018-07-25 19:09:41 -07:00
6cd0174ff5 Reimplement localScalar as a native function. (#9762)
Summary:
I split it into two parts, _local_scalar and _local_scalar_dense (unchecked)
so I could reuse the sparse logic in both paths.

_local_scalar became a method on Tensor to work around a circular
include problem.

This is resurrected copy of #9652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9762

Differential Revision: D8972348

Pulled By: ezyang

fbshipit-source-id: 2232dbfc8e1286b8a4a1c67d285c13a7771aad4c
2018-07-25 19:09:39 -07:00
ad47228020 Test pinning Hypothesis 3.59.0 (#9830)
Summary:
We think this will band-aid some of the new Caffe2 test failures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9830

Differential Revision: D9008052

Pulled By: ezyang

fbshipit-source-id: 84f1c0faea429d758d760965d6cbfe9e4c72eb19
2018-07-25 18:11:10 -07:00
b84b78a69d Fix the ROCM build, and enable sccache for it
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9841

Differential Revision: D9008030

Pulled By: ezyang

fbshipit-source-id: 51cac3c75fc52658b22a10a6bf8a479bcf803fb2
2018-07-25 17:55:47 -07:00
0b16b03b98 Plumb type annotations through script compilation (new) (#9547)
Summary:
Supersedes https://github.com/pytorch/pytorch/pull/9405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9547

Reviewed By: zdevito

Differential Revision: D8900327

Pulled By: jamesr66a

fbshipit-source-id: a00a94615af4fbaec98ee3ede0cb54bcfd9108dd
2018-07-25 17:10:14 -07:00
445c17d492 Update CopyMatrix in math (#9792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9792

Update CopyMatrix in math

Reviewed By: houseroad

Differential Revision: D8982421

fbshipit-source-id: da2056306cde3300124b21eba7a6c2d113111002
2018-07-25 16:10:52 -07:00
74ac5265d1 nomnigraph - make use of nodeIterator (#9831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9831

Follow up to D8980903 - replace dataIterator with nodeIterator where the data isn't used.

Reviewed By: pjh5

Differential Revision: D8998351

fbshipit-source-id: c333847ecd8b6d8075352322845839b94a63aecc
2018-07-25 15:40:44 -07:00
302adb7cc8 added torch.rot90() to ATen (#8628)
Summary:
1. fixes #6271
2. implemented torch.rot90() following [numpy.rot90()](6a58e25703/numpy/lib/function_base.py (L54-L138))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8628

Reviewed By: ezyang

Differential Revision: D8987860

Pulled By: weiyangfb

fbshipit-source-id: 8dac3b2a1f6d3288672977aba8b547706ce97fe9
2018-07-25 15:11:44 -07:00
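
A quick usage sketch, assuming the signature mirrors numpy.rot90 (rotate k times by 90 degrees in the plane given by dims):

```python
import torch

x = torch.arange(4).reshape(2, 2)     # [[0, 1],
                                      #  [2, 3]]
y = torch.rot90(x, k=1, dims=(0, 1))  # counter-clockwise rotation
print(y)                              # [[1, 3],
                                      #  [0, 2]]
```
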
2f5c0c30cd Make logsumexp work with empty tensors again. (#9825)
Summary:
https://github.com/pytorch/pytorch/pull/9755 broke this, but it was only tested if size zero dims were turned on (it can still happen even if that isn't turned on, because we support size [0] tensors).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9825

Differential Revision: D8997303

Pulled By: gchanan

fbshipit-source-id: 911dce112f73fad0f3980a7f4f9423df0f2d923d
2018-07-25 13:41:24 -07:00
4b0098f3ae Add --allow-change-held-packages to make nccl2 install in docker work (#9828)
Summary:
This was used to build Caffe2 Docker version 170.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9828

Differential Revision: D8997808

Pulled By: ezyang

fbshipit-source-id: f48938b2b71bc86578c9d9b46c281ed05478724e
2018-07-25 11:56:40 -07:00
279b836675 Add some user-friendly checks in pack padded symbolic to ensure things are the right type (#9731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9731

Reviewed By: soumith

Differential Revision: D8958693

Pulled By: jamesr66a

fbshipit-source-id: 7db1f86a85188fd2c84d0edaaaac6a096d64ba52
2018-07-25 11:25:42 -07:00
be163f50a3 Avoid divide-by-zero when bartlett_window size is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9788

Differential Revision: D8980951

Pulled By: gchanan

fbshipit-source-id: 429b341ac687afe4f1429bb141ef070bf315519c
2018-07-25 10:40:39 -07:00
56fbfee872 Remove ifdef __cplusplus from THTensor.h, have cpp self-contained in THTensor.hpp (#9775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9775

Differential Revision: D8977140

Pulled By: gchanan

fbshipit-source-id: d6d2461f7cb0511ee1def52ac1032a86349a7105
2018-07-25 10:25:17 -07:00
a7f183f971 Revert "Fix dataloader hang when it is not completely iterated (#9655)" (#9804)
Summary:
This reverts commit 9ee513365121cd387e11987c66db6599ac53ded7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9804

Reviewed By: ezyang

Differential Revision: D8987780

Pulled By: SsnL

fbshipit-source-id: 75ad70b0b8d672d0b35235fa248b187be64b68e5
2018-07-25 10:10:30 -07:00
c14e17eced Co-distillation with different archs and/or feature sets (#9793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9793

Enable co-distillation with different archs

Reviewed By: pjh5

Differential Revision: D8888479

fbshipit-source-id: eac14d3d9bb6d8e7362bc91e8200bab237d86754
2018-07-25 10:10:27 -07:00
ea67a2bd11 Allows negative index to tensor.narrow (Fixes: #9546)
Summary:
Fixes #9546
Test cases added

Reviewed By: ezyang

Differential Revision: D8974842

Pulled By: zou3519

fbshipit-source-id: a7707406c2a21e8e14f9c2a8ad4d64c8b08156df
2018-07-25 09:25:45 -07:00
0853d13f86 Move scalar boolean to THTensor, rename scalar in this context to zero dim (#9783)
Summary:

Manifest:
1) The scalar boolean is now in THTensor, although it isn't hooked up at the TH level yet.
2) setScalar is gone, everything now goes through the maybeScalar equivalent (which is renamed)
3) all "scalars" in this context now refer to "zero_dim" in order to differentiate this concept from the "Scalar" class.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9783

Differential Revision: D8978911

Pulled By: gchanan

fbshipit-source-id: f09254be4bebad0e4c510fefe4158b4f7e92efe1
2018-07-25 09:25:41 -07:00
8825e323b5 nomnigraph - Add way to check if a NodeRef is in a graph, and make a graph node iterator (#9790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9790

- Add way to check if a NodeRef is in a graph
- Make a nodeIterator (similar to dataIterator) but only iterate through nodes.

Reviewed By: bwasti

Differential Revision: D8980903

fbshipit-source-id: b20504a46715858752e25242303125a15a709b88
2018-07-25 09:02:13 -07:00
42a4747389 Temporarily need this to prevent sccache from breaking. (#9810)
Summary:
Temporarily need this to prevent sccache from breaking when I move sccache install to the DockerFile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9810

Differential Revision: D8991684

Pulled By: Jorghi12

fbshipit-source-id: 14cd0278f53a72372f9bbe27b228980f8d3c1d4a
2018-07-25 09:01:58 -07:00
a74a3fdeb6 typo fix, tutorials url with http protocol is not valid (#9812)
Summary:
The tutorials url with http is not valid, replacing it with https.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9812

Differential Revision: D8991344

Pulled By: ezyang

fbshipit-source-id: c12faa57905b50eadc320f9938c39c4139bd093b
2018-07-25 07:54:26 -07:00
3ef521e98a Implement backward for torch.symeig (#8586)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/6890. (backward pass for non-symmetric eigen-decomposition is not implemented in other packages, e.g. autograd, mxnet, tensorflow, presumably because the eigenvalues can be imaginary for the general case, and AFAIK we cannot support complex numbers).

This patch adds a backward function for the symmetric eigen-decomposition function `torch.symeig`. The formula used is taken from [here](http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf). Unit tests are added to verify correctness.

There is still one outstanding issue, which is how to handle the case where the `symeig` is called with `eigenvectors=False`. In this case, the eigenvectors are returned as a zero tensor, but the backward computation for the eigenvalues depends on the eigenvectors. There was a previous attempt to implement this in https://github.com/pytorch/pytorch/pull/2026, where apaszke mentioned that the `eigenvectors` argument should be overridden so that they are saved for the backwards pass. The forward code is autogenerated, though, and it isn't clear to me how that would be done. I'd appreciate any guidance. For now, there is a unit test that will fail until that issue is resolved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8586

Reviewed By: ezyang

Differential Revision: D8872760

Pulled By: SsnL

fbshipit-source-id: 76614495d0f9c118fec163a428f32e5480b4d115
2018-07-25 07:16:10 -07:00
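
The standard result (as in the note linked above; restated here from memory, so treat it as a sketch): for a symmetric $A = U \Lambda U^T$ with distinct eigenvalues,
$$\bar{A} = U\left(\operatorname{diag}(\bar{\lambda}) + F \circ (U^T \bar{U})\right)U^T, \qquad F_{ij} = \begin{cases} \dfrac{1}{\lambda_j - \lambda_i} & i \ne j \\ 0 & i = j, \end{cases}$$
where $\circ$ is the elementwise product. Note that even the $\operatorname{diag}(\bar{\lambda})$ term needs $U$, which is exactly the `eigenvectors=False` problem described above.
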
0262fd0f91 Delete Tensor::typeString() (#9764)
Summary:
The primary use-site of typeString was checked_cast_tensor.
I did a little more than I needed in this patch, to set
the stage for actually deleting the tensor type.

Specifically, I modified checked_cast_tensor to explicitly
take Backend and ScalarType, the idea being that once we
remove the tensor subclasses, we will delete the T template
parameter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9764

Differential Revision: D8969196

Pulled By: ezyang

fbshipit-source-id: 9de92b974b2c28f12ddad13429917515810f24c6
2018-07-24 22:26:15 -07:00
723a600ebd Update for new incremental build instructions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9773

Differential Revision: D8988285

Pulled By: ezyang

fbshipit-source-id: c2c3b7cefb54e4e18602b180281f22939293a383
2018-07-24 22:26:13 -07:00
bca10ad706 Implementation of Weibull distribution (#9454)
Summary:
This implements the two-parameter Weibull distribution, with scale $\lambda$ and shape $k$ parameters as described on [Wikipedia](https://en.wikipedia.org/wiki/Weibull_distribution).

**Details**
- We implement as a transformed exponential distribution, as described [here](https://en.wikipedia.org/wiki/Weibull_distribution#Related_distributions).
- The `weibull_min` variance function in scipy does not yet support a vector of distributions, so our unit test uses a scalar distribution instead of a vector.

Example of the bug:

```
>>> sp.stats.expon(np.array([0.5, 1, 2])).var() # fine
array([1., 1., 1.])
>>> sp.stats.weibull_min(c=np.array([0.5, 1, 2])).var() # buggy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 490, in var
    return self.dist.var(*self.args, **self.kwds)
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1242, in var
    res = self.stats(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1038, in stats
    if np.isinf(mu):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9454

Differential Revision: D8863574

Pulled By: SsnL

fbshipit-source-id: 1ad3e175b469eee2b6af98e7b379ea170d3d9787
2018-07-24 20:40:15 -07:00
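
A minimal sketch of the transformed-exponential construction described above (the transform classes named here are my assumption about the composition, not necessarily the merged code): if $E \sim \mathrm{Exponential}(1)$, then $\lambda\, E^{1/k} \sim \mathrm{Weibull}(\lambda, k)$.

```python
import torch
from torch.distributions import Exponential, TransformedDistribution
from torch.distributions.transforms import AffineTransform, PowerTransform

def weibull(scale, concentration):
    # Weibull(scale, concentration) as a transformed Exponential(1):
    # X = scale * E ** (1 / concentration)
    base = Exponential(torch.ones_like(scale))
    transforms = [PowerTransform(exponent=1.0 / concentration),
                  AffineTransform(loc=0.0, scale=scale)]
    return TransformedDistribution(base, transforms)

d = weibull(torch.tensor(2.0), torch.tensor(1.5))
print(d.sample((3,)))
```
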
4b61760738 Add Adadelta optimizer to caffe2 (#9088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9088

Closes https://github.com/pytorch/pytorch/pull/9088

- Added CPU/GPU implementations of Adadelta and SparseAdadelta.
- Added corresponding Python unittests

Reviewed By: BIT-silence

Differential Revision: D8712169

fbshipit-source-id: 544e99e13b230a919672a7341b3715d64597c0be
2018-07-24 20:09:21 -07:00
620952117e remove unnecessary -Wno= flags
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9608

Differential Revision: D8946664

Pulled By: anderspapitto

fbshipit-source-id: b05f10af58da25b2a2588f7153f393bb3637f29a
2018-07-24 18:40:42 -07:00
9cf76cfb4c Changing conda build script to use current python version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9780

Reviewed By: ml7

Differential Revision: D8983501

Pulled By: pjh5

fbshipit-source-id: 79208796247433cbe271a2d06f66254587d96f80
2018-07-24 18:40:40 -07:00
f62bc01dfe Remove TORCH_ASSERT (#9575)
Summary:
I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debugability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575

Differential Revision: D8924566

Pulled By: goldsborough

fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441
2018-07-24 18:10:06 -07:00
d2610fb379 Constexpr Type Ids -> 6.5% caffe2 perf improvement (#9603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9603

Using constexpr for some heavily queried type ids gives us a 6.5% perf improvement for caffe2.

Benchmark results: P59829647

Also ran ad canaries (but they don't show a significant difference):
- adfinder:
  - https://our.intern.facebook.com/intern/ads/canary/411346509423301481
  - https://our.intern.facebook.com/intern/ads/canary/411346563021753557
- adindexer:
  - https://our.intern.facebook.com/intern/ads/canary/411346517006038367
  - https://our.intern.facebook.com/intern/ads/canary/411346571387258927
- multifeed_predictor:
  - https://our.intern.facebook.com/intern/ads/canary/411346526631282941
  - https://our.intern.facebook.com/intern/ads/canary/411346583141009531

Reviewed By: dzhulgakov

Differential Revision: D8841577

fbshipit-source-id: 1a0ce7f2bee1ae54b723caefe5bc7f85a20935b4
2018-07-24 17:24:55 -07:00
6c6a353a66 Fix speedbenchmark bug (#9770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9770

Add zero ops to operators that do not have a valid schema

Reviewed By: hlu1

Differential Revision: D8957472

fbshipit-source-id: d8d0a351183e88ace2e050a87c1e1c363af67e33
2018-07-24 17:10:37 -07:00
d7d673b68d Update onnx to latest master (#9782)
Summary:
Updates the onnx submodule to 52d40befa7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9782

Reviewed By: yinghai, houseroad

Differential Revision: D8978668

Pulled By: bddppq

fbshipit-source-id: 238f76a36784c12cc5655a2ee059f7e0169c0bb6
2018-07-24 14:42:01 -07:00
e5fe66d7ea Add support for specifying device_option in Functional (#9619)
Summary:
e.g.
```
Functional.Add(x, y, device_option=DeviceOption(HIP, 0))

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9619

Differential Revision: D8966599

Pulled By: bddppq

fbshipit-source-id: 22235e42f19278e79802642798bf0ee70a1202f6
2018-07-24 14:41:59 -07:00
37fc58f1d3 Use torch::empty before random_ on seed gen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9769

Reviewed By: goldsborough

Differential Revision: D8977636

Pulled By: SsnL

fbshipit-source-id: c2437d5ef53dc74e1b17eb16e728e1d67ae314c7
2018-07-24 14:41:58 -07:00
f393df774b Test case for c10d DDP (#9670)
Summary:
Before I can rewrite portions of the c10d DDP in C++ I need proper tests in place to make sure I am not breaking anything as I port code. There were no tests for the c10d DDP in place so I wrote some.

I refactored the c10d tests to derive some tests cases from a general `MultiGPUTestCase` and followed lots of patterns from `test_distributed.py` w.r.t. how tests are skipped (such that the main process doesn't initialize CUDA, which I found is a super important detail!!!).

I am largely unfamiliar with this code so feel free to scrutinize. The DDP test code itself is also largely taken from `test_distributed.py` but more inlined which I find easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9670

Differential Revision: D8977724

Pulled By: goldsborough

fbshipit-source-id: 186eab38a72384d7992a2ec5c89f304ad42d5944
2018-07-24 14:10:24 -07:00
e26d584445 Remove isScalar() from TensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9765

Differential Revision: D8969474

Pulled By: gchanan

fbshipit-source-id: 42002b129488179affc919dba877de5a4e8f9fb5
2018-07-24 12:55:06 -07:00
7050d83dd7 Make logsumexp_out inplace (#9755)
Summary:
Fixes: #9754

Maybe this could also make its way into 0.4.1; it is a severe debugging headache if you hit this...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9755

Reviewed By: ezyang

Differential Revision: D8967178

Pulled By: zou3519

fbshipit-source-id: 151ed24e3a15a0c67014e411ac808fb893929a42
2018-07-24 12:40:48 -07:00
360c1bbd5b Add multivariate log-gamma (mvlgamma) (#9451)
Summary:
1. Add tests in test_cuda, test_torch
2. Add doc strings

Closes https://github.com/pytorch/pytorch/issues/9378 .

Differential Revision: D8859746

Pulled By: ezyang

fbshipit-source-id: 939c309d90940a7aa08f53004c9e7b3b1c9cf54e
2018-07-24 12:10:10 -07:00
6885b3fd62 Delete dead IsVariable enum. (#9768)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9768

Differential Revision: D8975802

Pulled By: ezyang

fbshipit-source-id: f85844872a1eb13e782aba0c168a3a1c1ac0313d
2018-07-24 11:58:11 -07:00
f9a99d5504 Specify default initialization schemes for modules in docs (#9038)
Summary: This closes #6906 .

Reviewed By: ezyang

Differential Revision: D8698632

Pulled By: weiyangfb

fbshipit-source-id: 259c1dbdc264a8e9f83e196fa72d135babd97d48
2018-07-24 11:58:08 -07:00
2b134c72e6 Add interface to provide blob types to shape&type inference (#9643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643

Current map interface assumes float data type, which is not always correct.

Reviewed By: kennyhorror

Differential Revision: D8455784

fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
2018-07-24 11:58:05 -07:00
7af5883860 Enable python tests on ROCm (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
6ab5e697b9 Small fixups for enabling zero size dims. (#9724)
Summary:
1) Properly test cpu for alpha/beta addmm cases.
2) Unsqueeze on empty no longer throws an exception.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9724

Reviewed By: ezyang

Differential Revision: D8958513

Pulled By: gchanan

fbshipit-source-id: 6ce2ec4a47201f9b225b8c52354144ace43e9e09
2018-07-24 11:11:39 -07:00
675d80841a Small fixups for n-dimensional empty tensors in CUDA non-reduction dim ops (#9722)
Summary:

Continuation of https://github.com/pytorch/pytorch/pull/9658.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9722

Differential Revision: D8956321

Pulled By: gchanan

fbshipit-source-id: 116fcaa1be5b1373f03217911556a28125cc860d
2018-07-24 11:11:37 -07:00
f6496229a5 Fixes xcode 10 beta 4 compile error (#9748)
Summary:
When building iOS apps with a caffe2 dependency, we were seeing the `caffe2/caffe2/mobile/contrib/ios/mpscnn/mpscnn.mm:33:17: error: method 'copyWithZone:' in protocol 'NSCopying' not implemented [-Werror,-Wprotocol]`. This fixes it by implementing a shallow copy with that method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9748

Reviewed By: jerryzh168

Differential Revision: D8954332

Pulled By: williamtwilson

fbshipit-source-id: 0cd44408257c0bd3f4ffb80312ea9d13d13e5ff3
2018-07-24 11:11:35 -07:00
1283834600 Devirtualize TensorImpl::toString (#9758)
Summary:
This can hardly be called an improvement (we now print
CPUFloatType instead of CPUFloatTensor) but it was the
simplest way I could think of devirtualizing this function in
the short term.  Probably need some sort of native function
that gives string information about a tensor.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Approved in #9710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9758

Differential Revision: D8966935

Pulled By: ezyang

fbshipit-source-id: a4641affe0a6153f90cdd9f4f2a1100e46d1a2db
2018-07-24 11:11:33 -07:00
679d397f28 Fix scalar_tensor_test for squeeze/unsqueeze with zero sized dimensions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9766

Differential Revision: D8971173

Pulled By: gchanan

fbshipit-source-id: 50bf7778eee7c60f51e1660ad834e161fa40f563
2018-07-24 10:42:39 -07:00
a7afba7308 Remove duplicated functions (#9601)
Summary:
found by linter, duplication was likely introduced in previous code sync
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9601

Differential Revision: D8922379

Pulled By: bddppq

fbshipit-source-id: 1f61bd7f539d823e62920615674a532ec0149623
2018-07-24 10:23:46 -07:00
adda789770 Skip maxpool_with_indices onnx tests (#9751)
Summary:
Not in the same format; skip for the moment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9751

Reviewed By: yinghai

Differential Revision: D8965636

Pulled By: houseroad

fbshipit-source-id: 81d39c2f5625c14c0e1ee11408b5f7267b53798f
2018-07-24 10:23:43 -07:00
ba634c11df Move strides to base class. (#9749)
Summary:
Approved in #9644
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9749

Differential Revision: D8965336

Pulled By: ezyang

fbshipit-source-id: d1b0763e592f298395621cfd684715dc0a550cd6
2018-07-23 22:27:48 -07:00
9bf72b2087 Add missing windows exports
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9738

Reviewed By: apaszke

Differential Revision: D8961728

Pulled By: zdevito

fbshipit-source-id: aacba8c03d0d8dfe1e87585d1c2b26703d2ed103
2018-07-23 19:55:19 -07:00
5df3eae89e Add 1x1 specialization for conv with NCHW order (#9671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9671

Add 1x1 specialization for conv with NCHW order

Reviewed By: houseroad

Differential Revision: D8944686

fbshipit-source-id: 94bf44f69498b1934b7dfff4c0e989342c7bb61c
2018-07-23 18:54:58 -07:00
a387331e54 Re-enable test_segfault after recent dataloder changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9700

Differential Revision: D8953615

Pulled By: SsnL

fbshipit-source-id: c6aa3c07dd2857dd54889d47e537a6b1e9198c60
2018-07-23 18:38:42 -07:00
099b5ba9d1 Tensor merge PRs from July 20 (#9713)
Summary:
Constituent PRs:

- [x] #9553 Remove unnecessary functions from StorageDerived.h (by cpuhrsch, reviewed by ezyang)
- [x] #9588 Use THTensor/Storage for THVoidTensor/Storage (by cpuhrsch , reviewed by gchanan)
- [x] #9627 Delete context from tensor (by ezyang, reviewed by gchanan)
- [x] #9641 Tensor reorganization (by ezyang, reviewed by gchanan )
- [x] #9647 Remove dim_ from THTensor (by cpuhrsch, reviewed by ezyang)
- [x] #9650 Remove context (by cpuhrsch, reviewed by gchanan and ezyang)
- [x] #9715 Fix Windows build in tensor merge PR (by ezyang, reviewed by gchanan and SsnL)

Upcoming PRs which didn't make this cut:

- [x] #9644 Stride move to TensorImpl, and nits (by ezyang, reviewed by gchanan)
- [ ] #9652 Native localScalar  (by ezyang, **UNREVIEWED AND FAILING TESTS**)
- [x] #9710 Devirtualize TensorImpl::toString (by ezyang, reviewed by gchanan)
- [ ] #9654 Use int64_t instead of ptrdiff_t for size / Rename flag to resizable_  (by cpuhrsch, **CHANGES REQUESTED AND FAILING TESTS**)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9713

Reviewed By: gchanan

Differential Revision: D8960882

Pulled By: ezyang

fbshipit-source-id: 99747b2c5462c7ff6809b67aacb4197626408204
2018-07-23 18:00:41 -07:00
e3fb9088d5 Allow multiple ops.def and clean up code gen in general
Summary: Basic cleanup, refactoring out some ops to closed source fb

Reviewed By: yinghai

Differential Revision: D8720722

fbshipit-source-id: 6fdf915c057a5749656d9f34a57fc142de6b076b
2018-07-23 15:44:04 -07:00
5849354aa1 Add operator<< overloads for TensorOptions (#9606)
Summary:
Added `operator<<` overloads for `at::TensorOptions` on request of ebetica

Example output:

```
TensorOptions(dtype=Double, device=cpu, layout=Strided, requires_grad=false)
```

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9606

Differential Revision: D8925191

Pulled By: goldsborough

fbshipit-source-id: 0503bc2851268276e9561d918290bc723e437c9c
2018-07-23 15:11:33 -07:00
d05a8145c5 Change behavior of clone to clone to a device (#9609)
Summary:
ebetica made me aware that `nn::Module::clone()` always clones to the current device (usually CPU) instead of preserving the device of each parameter. This PR changes the signature of `clone` from

`shared_ptr<Module> clone()`

to

`shared_ptr<Module> clone(optional<Device> device = nullopt)`

with semantics of:

1. If a `device` is given, all parameters/buffers are moved to that device,
2. If no `device` is supplied (default), parameters/buffers retain their device.

ezyang apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9609

Differential Revision: D8957367

Pulled By: goldsborough

fbshipit-source-id: 0d409ae645ed2b8d97d6fc060240de2f3d4bc6c8
2018-07-23 14:55:25 -07:00
31ba2f15e1 Rename embedding variable to weight (#9720)
Summary:
I renamed the variable in the `Embedding` module from `weight` to `table` a few months ago, because it seemed like a more meaningful name. Turns out it's not such a good idea because it deviates from PyTorch, which unnecessarily breaks C++->Python translated code.

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9720

Differential Revision: D8955647

Pulled By: goldsborough

fbshipit-source-id: 77228b07d2b733866e8cdecaa6d0686eef4cc3ea
2018-07-23 14:55:24 -07:00
431415adc4 quick patch for PackPadded removal to propagate the correct size. (#9657)
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9657

Differential Revision: D8940824

Pulled By: anderspapitto

fbshipit-source-id: ea827a24c85447fe4ae470336a746329598eee84
2018-07-23 14:25:39 -07:00
a949245a86 Switch interpreter to use IValue's primitive int/floats (#9718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9718

This patch switches the interpreter to use IValue's primitive numbers rather than tensors for computing on integers and floats. In addition to preparing the interpreter for first-class support of other types, this cleans up the handling of primitive numbers, making it possible to just use the normal operator overloading dispatch to find the right implementation for numbers. As a result of this change, a lot of other functionality needed to be updated since it was the first time we use non-tensors in a lot of places in the code base.

Notes:
* Fixes code_template.py so that multi-line strings are indented correctly when used on a standalone line
* Cast operators (`int(x)`) are now functional. Some tests have additional conversions to integers because
we no longer allow implicit tensor -> integer conversions following the same convention as in python
* prim::ListConstruct/createList has been added to the interpreter for creating lists and this has
replaced aten::stack for integer lists
* gen_jit_dispatch.py has been refactored so that non-tensor types use operators on IValues to extract
the primitives
* IValue gains a .to<T> method that is the equivalent of tensor_as but for IValue instead of at::Tensor
* `constant_as<T>` is switched over to using IValues's `.to<T>` method, to make conversion from constant->IValue->C++ type
more consistent. This functionality combined with `toIValue(Value*)` replaces the `tensor_as` and `as_tensor` family of functions.
* conditional expressions (if, loop) and operators related to them are now computed on integers rather than tensors
* IValue gains constructors for constructing from at::Scalar and converting to it. However, IValue itself will always store
the scalars as a double or int64.
* To align with python 3 syntax, TK_INT, TK_FLOAT, and TK_BOOL have been removed from the parser, and int/float/bool are just treated as special identifiers in the compiler,
along with print. These are represented as special sugared values with a `call` method implemented. For int/float/bool this implements casting behavior.
* Dropped shared_from_this from Type/Module. They were not needed, and they made debugging harder because they internally throw/catch exceptions.
* Shape propagation has been updated to support running nodes that include floating point primitive types, this required some refactoring of internal functions.
* TensorToNum and NumToTensor have actual implementations as operators now
* register_prim_ops now contains implementations of math operators for float/int primitive types, and for mixed (prim <+> tensor) versions. This removes the need for special handling in compiler.cpp
* Primitive math is now entirely handled by letting the compiler choose the right overloads. This removes tons of special casing in the compiler.
* incorporates eellison's change to allow casting from return values. Due to the addition of primitive support, the code needed slight modifications, so I just pre-merged it here.
* stack.h gains generic vararg versions of push/pop that know how to convert to/from C++ types:

```
at::Tensor a;
at::Scalar b;
pop(stack, a, b);
at::Tensor c = a + b;
push(stack, c);
```
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9584

Reviewed By: apaszke

Differential Revision: D8910546

Pulled By: zdevito

fbshipit-source-id: 0f3e60d4d22217f196a8f606549430e43b7e7e30
2018-07-23 14:11:11 -07:00
a9742e1a27 Add fallback to TensorCPU if there are unsupported types for IDEEP Tensor (#9667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9667

MKL-DNN doesn't support 64-bit integers (cfee61bf81/include/mkldnn_types.h (L62-L75)). So force-converting from `TensorCPU<long>` to an `s32` ideep tensor will cause memory issues. This diff gives an alternative solution, where we just fall back to TensorCPU. The reasoning is that since MKL-DNN doesn't support 64-bit integer tensors, downstream ops have to be in CPUContext, so there is no reason to force-convert to an ideep tensor and back.

Reviewed By: pjh5

Differential Revision: D8943544

fbshipit-source-id: f514903cda27e34b8887271c9df56c8220895116
2018-07-23 13:54:57 -07:00
ee2cc68259 Add ctc_beam_search_decoder op for caffe2 (#9622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9622

Implement a ctc_beam_search_decoder operator based on ctc_greedy_decoder.

Differential Revision: D8903100

fbshipit-source-id: 38973632cb437e5cfcb9ed3a48ed6b901c10efa3
2018-07-23 13:40:24 -07:00
aa8a9fa5fc Extend DispatchStub to support CUDA dispatch (#9664)
Summary:
This is a modification of the strategy from https://github.com/pytorch/pytorch/pull/8919 and https://github.com/pytorch/pytorch/pull/9579.

```
Previously, the CPU architecture-specific kernels self-registered with
the DispatchStub. When linking as part of a static library, this requires
the flag --whole-archive to be passed to the linker to ensure that the
object files for the kernels are included. Caffe2 and TensorFlow use that
strategy.

We ran into some issues with --whole-archive blowing up the binary size
of some downstream projects in Facebook. This PR avoids --whole-archive
for CPU kernels. The downside is that the generic code needs to be aware
of whether kernels are compiled with AVX and with AVX2 (via
HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION).

The CUDA kernels still self-register with DispatchStub because the CPU
library is not aware of whether the CUDA library will be available at
runtime.

There are a few major changes to DispatchStub

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (Previous ATEN_DISABLE_AVX/AVX2)

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
```
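
A sketch of exercising the override described above (the value `"default"` for the generic kernel is an assumption):

```py
import os

# Set before importing torch so CPU capability detection sees it
os.environ["ATEN_CPU_CAPABILITY"] = "default"

import torch
print(torch.randn(4).add(1))  # dispatches to the generic (non-AVX) CPU kernel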
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9664

Differential Revision: D8943350

Pulled By: colesbury

fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
2018-07-23 13:40:23 -07:00
3e9e3ef383 Improving diagnose RF NE with Cali (#9550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9550

as titled

Differential Revision: D8899226

fbshipit-source-id: 3c7cf026e8cbc0e95770e5a35b213a97bebba385
2018-07-23 13:40:21 -07:00
88d6b6e6cd Fix D8722560 (#9717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9717

D8722560 was landed with some build errors; unfortunately, the c10 code isn't part of contbuild yet.
Fixing them.

Differential Revision: D8954141

fbshipit-source-id: 2a082fb8041626e45ccd609f37a8ef807f6dad8a
2018-07-23 12:55:20 -07:00
5094684238 Create torch::from_blob for variables (#9605)
Summary:
Need an overload of `at::from_blob` for Variables.

ezyang colesbury ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9605

Differential Revision: D8926226

Pulled By: goldsborough

fbshipit-source-id: e377c0d019d4377f3fc124614c7dcc562aa69990
2018-07-23 12:40:12 -07:00
14d4bdb406 Reformat output data format to make it more general for other binaries (#9555)
Summary:
This is to simplify the data format during benchmarking. After this change, we can use the same benchmarking harness data conversion method to parse data from multiple binaries.

This change should be coordinated with the PR: https://github.com/facebook/FAI-PEP/pull/63
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9555

Reviewed By: pjh5

Differential Revision: D8903024

Pulled By: sf-wind

fbshipit-source-id: 61cabcff99f0873729142ec6cb6dc230c685d13a
2018-07-23 11:11:26 -07:00
029cf1d78a Improve error messages of wrong dimensions (#9694)
Summary:
Updated the error message terms _matrices_ and _vectors_ to _2D tensors_ and _1D tensors_ respectively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9694

Differential Revision: D8949589

Pulled By: ezyang

fbshipit-source-id: 2cdcd72e0e9a4459f3691c133bb16ef218b5cf3f
2018-07-23 10:10:55 -07:00
9525925119 Low rank multivariate normal (#8635)
Summary:
This pull request implements a low-rank multivariate normal distribution where the covariance matrix has the form `W @ W.T + D`. Here D is a diagonal matrix and W has shape n x m with m << n. It uses the "matrix determinant lemma" and the "Woodbury matrix identity" to save computational cost.

Along the way, I also revised the MultivariateNormal distribution a bit. Here are the other changes:
+ `torch.trtrs` works with CUDA tensors, so I use it instead of `torch.inverse`.
+ Use `torch.matmul` instead of `torch.bmm` in `_batch_mv`. The former is faster and simpler.
+ Use `torch.diagonal` for `_batch_diag`
+ Reimplement `_batch_mahalanobis` based on `_batch_trtrs_lower`.
+ Use trtrs to compute term2 of KL.
+ `variance` relies on `scale_tril` instead of `covariance_matrix`
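
A minimal usage sketch, assuming the class lands as `torch.distributions.LowRankMultivariateNormal(loc, cov_factor, cov_diag)`:

```py
import torch
from torch.distributions import LowRankMultivariateNormal

n, m = 5, 2
loc = torch.zeros(n)
W = torch.randn(n, m)   # low-rank factor
d = torch.ones(n)       # diagonal, so covariance = W @ W.T + diag(d)

dist = LowRankMultivariateNormal(loc, cov_factor=W, cov_diag=d)
x = dist.sample()
logp = dist.log_prob(x)  # uses the Woodbury/determinant-lemma identities internally
```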

TODO:
- [x] Resolve the fail at `_gradcheck_log_prob`
- [x] Add test for KL

cc fritzo stepelu apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8635

Differential Revision: D8951893

Pulled By: ezyang

fbshipit-source-id: 488ee3db6071150c33a1fb6624f3cfd9b52760c3
2018-07-23 10:10:53 -07:00
9d6521c3a0 Support n-dimensional empty tensors in CUDA non-reduction dimension functions. (#9658)
Summary:
This also unifies the error checking between scatter/scatterAdd on CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9658

Differential Revision: D8941527

Pulled By: gchanan

fbshipit-source-id: 750bbac568f607985088211887c4167b67be11ea
2018-07-23 08:40:12 -07:00
53083b8353 Remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS and fix CUDA 8 build on Windows (#9491) (#9491)
Summary:
Fixes #9092.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9491
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9693

Differential Revision: D8946850

Pulled By: ezyang

fbshipit-source-id: bd816f459ab70f6b4a0983305a1ce341bb633707
2018-07-23 06:40:39 -07:00
9ee5133651 Fix dataloader hang when it is not completely iterated (#9655)
Summary:
second trial of https://github.com/pytorch/pytorch/pull/7140

cc csarofeen Let's see if this works. It passes everything locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9655

Differential Revision: D8940177

Pulled By: SsnL

fbshipit-source-id: 8d6340fc9f7355c71e1e26b262da166402faa158
2018-07-22 20:38:27 -07:00
1afdc57ed8 Hide all other fields in THTensor (#9683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9683

This pops off `refcount_`, `storage_`, `storage_offset_`; there are now no more direct accesses to these fields and we can make them private (with appropriate friending).

Stacked on #9561
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9591

Reviewed By: SsnL

Differential Revision: D8922246

Pulled By: ezyang

fbshipit-source-id: dfae023d790e29ce652e2eab9a1628bbe97b318d
2018-07-22 09:09:34 -07:00
f3d72b2101 Modify barrier net to allow better control over its initialization and execution in DPM (#9665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9665

In data_parallel_model, we isolate the synchronizing barrier init net into its own net, separate from param_init_net, so that we can have finer-grained control over the barrier net.

Reviewed By: andrewwdye

Differential Revision: D8375389

fbshipit-source-id: ce0c8c1c8e4bd82b7078a1b07abaced3f149d578
2018-07-22 00:23:47 -07:00
769cb5a640 Add new ways of matching nodes with schemas in the JIT (#9567)
Summary:
**REVIEW LAST COMMIT ONLY**

As discussed in yesterday's meeting, nodes can now be matched to particular overloads using the `matches(...)` function:
```cpp
n->matches("aten::type_as(Tensor self, Tensor other) -> Tensor")
```

This also changes the shape prop and peephole passes to use those functions for matching. This fixes a few bugs, makes them much more robust, and prepares us for removal of attributes.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9567

Reviewed By: zdevito

Differential Revision: D8938482

Pulled By: apaszke

fbshipit-source-id: eb2382eeeae99692aada2d78d5d0c87c8ef1545e
2018-07-21 21:39:07 -07:00
a01d6f01b5 Update channel_shuffle_op and transpose 2d to speed up ShuffleNet (#9525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9525

Update channel_shuffle_op and transpose 2d to speed up ShuffleNet

Reviewed By: houseroad

Differential Revision: D8889361

fbshipit-source-id: 60196e819b6842becc53b4859b62d4419a0e2c6e
2018-07-21 12:54:33 -07:00
3bb8c5eab1 Allow MKLDNN on macOS, and any other OS where CMake is able to detect it.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9638

Reviewed By: soumith

Differential Revision: D8946130

Pulled By: resistor

fbshipit-source-id: 87bd9cb12608467b05bd4998fdb00bfdbd038ca2
2018-07-20 22:27:02 -07:00
b5c8d59451 Add a CUDAContext header include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9662

Differential Revision: D8945581

Pulled By: ezyang

fbshipit-source-id: 2fe0adc96456788579f7d6f1c4513fe45360c030
2018-07-20 20:39:09 -07:00
23ed26a0c3 Guard include of cuda-only header comm.h (#9656)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9656

Reviewed By: colesbury

Differential Revision: D8941361

Pulled By: ezyang

fbshipit-source-id: c18cb0e606ae0608e5892040192b8792ae542b74
2018-07-20 19:46:36 -07:00
5e84403d5f Fix for half conversion for ROCm 1.8.2 (#9663)
Summary:
This PR contains the change for explicit conversion between ushort and __half required for ROCm 1.8.2 support
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9663

Differential Revision: D8943937

Pulled By: bddppq

fbshipit-source-id: 16102f9dbc68ed4ece2e8fc244825c3992c24901
2018-07-20 17:11:30 -07:00
3efdece9da Support n-dimensional empty tensors in take/put.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9635

Differential Revision: D8935119

Pulled By: gchanan

fbshipit-source-id: 5035583e7322b1a1720d961945dd0eefb4cb28ef
2018-07-20 15:40:49 -07:00
45e5c17ecf ONNXIFI transform (#9569)
Summary:
Cut-off runnable subgraph and off-load to ONNXIFI backend
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9569

Reviewed By: Maratyszcza

Differential Revision: D8930408

Pulled By: yinghai

fbshipit-source-id: 2b494f7f8dc10c00e58cf0fed5c4a9434be6155b
2018-07-20 15:09:59 -07:00
01581037dc Add workspace.RunPlanInBackground (#9637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9637

Adding a method to run a plan in the background. The intended use is to run BlueWhale's data reading & preprocessing net in the background while the GPU is training.

Reviewed By: MisterTea

Differential Revision: D8906439

fbshipit-source-id: b1c73ca7327e2d87a8f873924e05ab3d161a3f1e
2018-07-20 14:56:12 -07:00
1003ccfa15 Creates CUDAContext (#9435)
Summary:
ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9435

Reviewed By: soumith

Differential Revision: D8917236

Pulled By: ezyang

fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751
2018-07-20 12:56:15 -07:00
8a0fe0a588 set_input_record() should always add external input (#9636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9636

Make sure that the blobs are registered to the net

Reviewed By: pjh5

Differential Revision: D8924883

fbshipit-source-id: f09422a2d4d5ba8bf6cfbfd00172097b5ab1fcd6
2018-07-20 11:55:37 -07:00
bae156a481 Support (some) CUDA Lapack on n-dimensional empty tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9631

Reviewed By: ezyang

Differential Revision: D8933202

Pulled By: gchanan

fbshipit-source-id: 1ade4ca439bf26aa921df1da83a827d860f8f48f
2018-07-20 11:40:25 -07:00
d3688861ec Fixed a missing '=' in LPPoolNd repr function (#9629)
Summary:
In the repr function of the LPPoolNd(...) class, there was a missing '=' (`kernel_size{kernel_size}`).

Link to line in the code: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/pooling.py#L694

Original:

       return 'norm_type={norm_type}, kernel_size{kernel_size}, stride={stride}, ' \
              'ceil_mode={ceil_mode}'.format(**self.__dict__)

Fixed:

       return 'norm_type={norm_type}, kernel_size={kernel_size}, stride={stride}, ' \
              'ceil_mode={ceil_mode}'.format(**self.__dict__)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9629

Differential Revision: D8932913

Pulled By: soumith

fbshipit-source-id: 9030dff6b14659b5c7b6992d87ef53ec8891f674
2018-07-20 11:24:42 -07:00
a3a6ab60cd Fix the error in UnpackSegmentsOp when calculating the gradient with "max_length" argument (#9598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598

The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.

Reviewed By: jerryzh168

Differential Revision: D8919799

fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
2018-07-20 11:09:34 -07:00
1d4d9fc7da Prepare to stop using attributes in the JIT (#9505)
Summary:
This PR adds machinery to cache the schema in an IR node, and allows lookups of (possibly) constant inputs by their names (instead of position). The new methods are:

- `at::optional<T> get<T>(Symbol name)` - if the argument called name is a constant, then casts it to type `T` and returns it. If it's not constant returns `nullopt`. Raises an error if there's no argument with that name.
- `at::optional<IValue> get(Symbol name)` - like above, but packs the result in an IValue
- `Value* getValue(Symbol name)` - retrieves a `Value*` for an argument (no need to know its position).

All above functions currently inspect the attributes as well, but that's only so that I could start using them in other places in the JIT without disrupting our current functionality. I wanted this diff to be a preparation that doesn't change the semantics too much, and so both the tracer and script create nodes with attributes. The next PR will put that to a stop, and hopefully the changes we need to make to other components will be simpler thanks to what I did here.

One more thing I'd like to do before actually stopping creating the non-attributed nodes is to have a convenient way of creating a schema programmatically, matching nodes against it, and creating them without having to pack inputs into flat argument lists (which is quite error prone).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9505

Reviewed By: ezyang

Differential Revision: D8915496

Pulled By: apaszke

fbshipit-source-id: 39d14fc9a9d73d8494f128367bf70357dbba83f5
2018-07-20 10:56:00 -07:00
b9e89cf9fd Revert "Extend DispatchStub to support CUDA dispatch (#9579)" (#9614)
Summary:
This reverts commit bcf0bf42a1727c8ee788f733c28579d0e36a387c.

The commit was causing issues for some internal FB projects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9614

Reviewed By: Yangqing

Differential Revision: D8929552

Pulled By: colesbury

fbshipit-source-id: ae9026ad8762a4c5de401273694b4c878fc241a6
2018-07-20 10:25:11 -07:00
bbb30ad4ab Use THTensor/Storage for THVoidTensor/Storage (#9588)
Summary:
Change akin to change for THVoidStorage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9588

Reviewed By: gchanan

Differential Revision: D8915559

Pulled By: cpuhrsch

fbshipit-source-id: 6cc69df0e29942c62750f990903dfd8e4d344581
2018-07-20 09:54:44 -07:00
f84fdc7866 Remove unnecessary functions from StorageDerived.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9553

Reviewed By: ezyang

Differential Revision: D8915526

Pulled By: cpuhrsch

fbshipit-source-id: 32013d3aa58a1a68637f99ee619d06e27fadaad6
2018-07-20 09:41:36 -07:00
7b9d8916e5 Fix integral type dispatch error message (#9625)
Summary:
This fix will prevent errors like (found in `bincount`)
```
RuntimeError: %s not implemented for '%s'bincounttorch.FloatTensor
```
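
A minimal sketch of the call pattern behind that message (the float input below is illustrative):

```py
import torch

torch.bincount(torch.tensor([1, 2, 2]))    # ok: integral dtype
torch.bincount(torch.tensor([0.5, 1.5]))   # raises: integral dtype required
```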
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9625

Differential Revision: D8932945

Pulled By: soumith

fbshipit-source-id: 794e3b58d662779402ab318e274661826a5db8b2
2018-07-20 09:24:27 -07:00
2a0018f2a8 Add scatter_add_ doc (#9630)
Summary:
fixes #4176 cc vishwakftw

I didn't do `:math:` and `\neg` because I am using double ticks so they render more similarly with `:attr:`.
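
For context, a small self-contained sketch of the operation being documented:

```py
import torch

x = torch.zeros(3, 5)
index = torch.tensor([[0, 1, 2, 0, 0]])
src = torch.ones(1, 5)
# Along dim 0: x[index[i][j]][j] += src[i][j]
x.scatter_add_(0, index, src)
```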
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9630

Differential Revision: D8933022

Pulled By: SsnL

fbshipit-source-id: 31d8551f415b624c2ff66b25d886f20789846508
2018-07-20 08:41:05 -07:00
bfe2aa093e docs fixes (#9607)
Summary:
fixes #9589 #9507 #9502 #9390
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9607

Reviewed By: ezyang, soumith

Differential Revision: D8923575

Pulled By: SsnL

fbshipit-source-id: cb61d990333b700d813ce781040c3d0325999b8c
2018-07-20 07:55:25 -07:00
4028ff6c3a Revert "quick patch for PackPadded removal to propagate the correct size. (#9593)" (#9613)
Summary:

This reverts commit 85b28163584380bf4953f2ac2fa21df9715f12d5.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9613

Reviewed By: bddppq

Differential Revision: D8929322

Pulled By: anderspapitto

fbshipit-source-id: 3ae4d320e5407acc1fb63a26b7d1f2ff4059eba9
2018-07-20 00:39:29 -07:00
aa7af94656 Make JIT tracing a thread-local property (#9414)
Summary:
As in the title. Lets us simplify a lot of code.

Depends on #9363, so please review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414

Reviewed By: zdevito

Differential Revision: D8836496

Pulled By: apaszke

fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
2018-07-19 19:09:39 -07:00
5651b27458 Add CAFFE_STATIC_EVENT to Stats (#9501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9501

Added a new stat value to log static states like CPU and memory usage.

Reviewed By: pjh5

Differential Revision: D8872254

fbshipit-source-id: 469e94cab99029a3da55f8986dddeadac076e2a8
2018-07-19 16:25:59 -07:00
b770156a7a Functional DataParallel (#9234)
Summary:
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend.

For this, I had to:
1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s,
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++.

`replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice).

`parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).

Added lots of tests for these things.
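
For orientation, the existing Python-frontend functional form being ported here, as a usage sketch (needs at least two visible GPUs to actually run):

```py
import torch
import torch.nn as nn
from torch.nn.parallel import data_parallel

module = nn.Linear(10, 5).cuda()
inputs = torch.randn(8, 10).cuda()
# replicate module -> scatter inputs -> parallel_apply -> gather outputs
out = data_parallel(module, inputs, device_ids=[0, 1])
```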

apaszke ezyang ebetica colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9234

Differential Revision: D8865182

Pulled By: goldsborough

fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
2018-07-19 16:12:04 -07:00
7e78e80d94 Make error message for empty module friendlier (#9565)
Summary:
In our pimpl system, default constructing a module holder default constructs the contained module. This means `Linear linear;` is ill-formed, since `Linear` doesn't have a default constructor. Instead we require `Linear linear = nullptr;` to get the empty state of the `Linear`. This PR makes the error message for the ill-formed case nicer.

I had to change the forwarding constructors of most of our modules for this, but that's a minor adjustment.

E.g.

```
Linear linear;

In file included from /home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:5:0,
                 from /home/psag/pytorch/pytorch/test/cpp/api/module.cpp:3:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h: In instantiation of ‘torch::nn::ModuleHolder<Contained>::ModuleHolder() [with Contained = torch::nn::LinearImpl]’:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/dropout.h:45:1:   required from here
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h:46:5: error: static assertion failed: You are trying to default construct a module which has no default constructor. Use = nullptr to give it the empty state (like an empt
y std::shared_ptr).
     static_assert(
```

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9565

Differential Revision: D8903666

Pulled By: goldsborough

fbshipit-source-id: 5e6b788921a27a44359db89afdc2b057facc5cec
2018-07-19 15:56:54 -07:00
bcf0bf42a1 Extend DispatchStub to support CUDA dispatch (#9579)
Summary:
This is a few files taken from https://github.com/pytorch/pytorch/pull/8919. They're unchanged from the latest versions of that PR.

```
This is part of https://github.com/pytorch/pytorch/pull/8919. It's
separated to make it easier to merge the PR in pieces.

There are a few major changes to DispatchStub

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (Previous ATEN_DISABLE_AVX/AVX2)

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9579

Differential Revision: D8909000

Pulled By: colesbury

fbshipit-source-id: fdeb606270b06acdab3c01dba97ec9d81584ecc0
2018-07-19 14:25:40 -07:00
a08119afc2 Eliminate direct access to size/strides of THTensor; replace them with std::vector (#9561)
Summary:
* THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>`
* Anywhere a "public" API function made use of a int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet.
* There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides)
* Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides
* Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides)

Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go.

Note for gchanan: review from commit "ci" and after
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561

Reviewed By: cpuhrsch

Differential Revision: D8901926

Pulled By: ezyang

fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510
2018-07-19 14:10:06 -07:00
f521823b7b Do not always set broadcast argument when exporting new onnx add and sub to caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9597

Reviewed By: colesbury

Differential Revision: D8920575

Pulled By: bddppq

fbshipit-source-id: 97423e1bf6a20559d466d2ac56c9e74e10bfc129
2018-07-19 14:10:05 -07:00
6557856671 Fix l2 normalization when handling zero vector (#9594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594

When the input vector is a zero vector, the previous GPU code gives NaN in the backward pass. We fix this.
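
A sketch of the guarded formulation in Python (the epsilon value is an arbitrary choice here):

```py
import torch

def l2_normalize(x, eps=1e-12):
    # Clamping the norm keeps zero vectors at zero instead of producing NaN grads
    return x / x.norm(p=2, dim=-1, keepdim=True).clamp(min=eps)
```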

Reviewed By: pjh5

Differential Revision: D8849732

fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
2018-07-19 14:10:03 -07:00
85b2816358 quick patch for PackPadded removal to propagate the correct size. (#9593)
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9593

Differential Revision: D8919125

Pulled By: anderspapitto

fbshipit-source-id: a88ca979c3b9d439863e223717d3697180c26121
2018-07-19 14:10:02 -07:00
f33cd36c9b Use int64_t for im2col and col2im (#9590)
Summary:
Fixes #9404
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9590

Differential Revision: D8916020

Pulled By: SsnL

fbshipit-source-id: ac6758326bbb09b48642b149f4eb8f466ef7044e
2018-07-19 11:29:24 -07:00
f180373d68 Support n-dimensional empty tensors in CUDA BLAS and fix a btrifact bug. (#9573)
Summary:
This is mainly straightforward, with two exceptions:
1) cublasSgemv and cublasDgemv appear to have a bug where (x,0).mv(0) does not handle beta, whereas cublasSgemm and cublasDgemm do for the case where (x,0).mm(0,y).  This is handled by manually calling zero / mul (see the sketch after this list).

2) I fixed a bug in btrifact that was broken even when dealing with non-empty tensors.  Basically, if out.stride(0) was 1, because the underlying BLAS call expects column-major matrices, to get a column-major tensor, out.transpose_(0, 1) would be called.  But this is just wrong, as if the batch dimension (0) doesn't match the size of the columns (1), you don't even have a tensor of the correct shape.
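
Regarding (1), the semantics these empty reductions should produce, as a sketch:

```py
import torch

a = torch.randn(4, 0)
v = torch.randn(0)
print(a.mv(v))                  # zeros(4): an empty reduction contributes nothing
print(a.mm(torch.randn(0, 3)))  # a 4x3 tensor of zeros
```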
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9573

Reviewed By: ezyang

Differential Revision: D8906144

Pulled By: gchanan

fbshipit-source-id: de44d239a58afdd74d874db02f2022850dea9a56
2018-07-19 09:50:27 -07:00
aee9e90abd Fix TestAutograd.test_as_strided (#9538)
Summary:
0. Fixes #9479
1. rewrites `as_strided` as a native function. This is fine because `set_` does the scalar check.
2. allow using `self` in `python_default_init`. Previously, `python_variable_methods.cpp` had `self` as an input `PyObject *` and used `self_` as the unpacked tensor, but `python_torch_functions.cpp` just used `self` as the unpacked tensor, making it impossible to use `self` in `python_default_init`.
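
For reference, a tiny sketch of the user-facing `as_strided` call this touches (values arbitrary):

```py
import torch

base = torch.arange(9.)
view = base.as_strided((2, 2), (3, 1))
# view is tensor([[0., 1.],
#                 [3., 4.]]) -- element [i][j] reads base[3*i + j]
```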
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9538

Differential Revision: D8894556

Pulled By: SsnL

fbshipit-source-id: ca7877b488e12557b7fb94e781346dcb55d3b299
2018-07-19 09:11:13 -07:00
e0446fcfa9 Pass dtype to tensor constructor in test_neg (#9558)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/9554.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9558

Differential Revision: D8901085

Pulled By: yf225

fbshipit-source-id: 0edb176fcb18e0c0bcfc6f209343b9097767c9b8
2018-07-19 08:54:39 -07:00
54db14e390 HIP Operators Generator--> HipOpG (#9322)
Summary:
The goal of this PR is to add infrastructure to convert (hipify) CUDA ops into [HIP](https://github.com/ROCm-Developer-Tools/HIP) ops at **compile** time.

Note that HIP ops, which are portable C++ code, can run on both AMD and NVIDIA platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9322

Differential Revision: D8884707

Pulled By: bddppq

fbshipit-source-id: dabc6319546002c308c10528238e6684f7aef0f8
2018-07-19 00:26:06 -07:00
45f0d05202 Adapt OnnxifiOp to removed suffix handling in ONNXIFI loader (#9571)
Summary:
Adapt to changes in onnx/onnx#1203
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9571

Reviewed By: yinghai

Differential Revision: D8907892

Pulled By: bddppq

fbshipit-source-id: 9f88471639dbe9050194e84340f335bece834d5d
2018-07-18 19:26:23 -07:00
604f7e98c3 Expose CAFFE2_USE_OPENCV preprocessor flag (#9509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9509

generate_proposals_op_util_nms.h conditionally requires OpenCV in some cases,
and earlier this was checking just CV_MAJOR_VERSION macro, but that is
undefined unless opencv.hpp is included. Adding `-DCAFFE2_USE_OPENCV` to
TARGETS when opencv is included in external_deps to check for this correctly.
Thanks jinghuang for flagging this issue!

Differential Revision: D8880401

fbshipit-source-id: 65abbcf4ffe3feffc0ee2560882cb8eb0b7476f9
2018-07-18 18:56:49 -07:00
b3e141e84c Add predictor config into Predictor (#9434)
Summary:
This is the first step of refactoring the Predictor. In this diff the config struct
is introduced and the internal data structure of Predictor has been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9434

Differential Revision: D8843262

Pulled By: fishbone

fbshipit-source-id: 23f5e4751614e3fedc9a04060d69331bfdecf864
2018-07-18 16:39:56 -07:00
04b33b7231 Add byte_weight_dequant_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9541

Reviewed By: hlu1

Differential Revision: D8882964

fbshipit-source-id: 06d2e0d227ea6a4a8dc5ef1ea9dd1d449c149b47
2018-07-18 16:27:21 -07:00
c1ee8835b6 Constructors and member functions for THStorage (#9357)
Summary:
Added on top of ezyang's https://github.com/pytorch/pytorch/pull/9278
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9357

Reviewed By: ezyang

Differential Revision: D8863934

Pulled By: cpuhrsch

fbshipit-source-id: a45c955c0b1e9e0866749b3a7e8a36de931bdff1
2018-07-18 15:56:26 -07:00
4c615b1796 Introduce libtorch to setup.py build (#8792)
Summary:
Prior to this diff, there were two ways of compiling the bulk of the torch codebase, with no interaction between them - you had to pick one or the other.

1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so

2) with cpp_build. This method
- used CMake
- did not support Windows or ROCM
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so

This diff combines the two.

1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so

2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so

In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792

Reviewed By: ezyang

Differential Revision: D8764181

Pulled By: anderspapitto

fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
2018-07-18 14:59:33 -07:00
3b886500a0 Add CUDAGuard to ATen (#9277)
Summary:
THCStream was recently moved to ATen by mruberry: https://github.com/pytorch/pytorch/pull/8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.

I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.

colesbury apaszke ezyang

Fixes https://github.com/pytorch/pytorch/issues/7800
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9277

Differential Revision: D8865183

Pulled By: goldsborough

fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7
2018-07-18 14:40:31 -07:00
8769fec03f Move clamp into ATen (#9506)
Summary:
Glue component of https://github.com/pytorch/pytorch/pull/9319

Important to unblock wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9506

Reviewed By: wanchaol

Differential Revision: D8879437

Pulled By: cpuhrsch

fbshipit-source-id: 16ea8a93f3f5df2695180b3a30a583834b7004f1
2018-07-18 13:40:11 -07:00
c506ff97c8 Disable py2-clang3.8-rocmnightly-ubuntu16.04-test in disabled-configs.txt setting (#9543)
Summary:

In the ROCm branches we will experiment with turning this on.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9543

Differential Revision: D8897990

Pulled By: ezyang

fbshipit-source-id: ae9d25d1b79ee421d49436593edf8c7e49b3a4e5
2018-07-18 12:58:56 -07:00
ca3b36aa6a Add implementation for batch_moments_op (#9510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9510

Add implementation for batch_moments_op

Reviewed By: houseroad

Differential Revision: D8587654

fbshipit-source-id: d20f52cc8e900716c1057e68c147258dfda5245b
2018-07-18 11:59:54 -07:00
8c741b7c4f Add transformation from caffe2::resizeop to onnx::upsample
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9511

Reviewed By: hlu1

Differential Revision: D8876692

fbshipit-source-id: 9ba346e225cfbc686d370134fe41a28333b933cc
2018-07-18 11:59:52 -07:00
b6b6e1b39f Fix core.Plan.create_from_proto (#9438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9438

The current implementation of create_from_proto doesn't work as expected: it
duplicates networks and execution steps by copying the original PlanDef first and
then adding each step one by one.

Reviewed By: pjh5

Differential Revision: D8850316

fbshipit-source-id: 9b02836d6e6ee1c91cfdd3b4c4804f14137dc22b
2018-07-18 10:55:55 -07:00
27455e9c78 Use _six for inf and nan (#9500)
Summary:
Things like `float('inf')` are actually quite expensive.
```py
In [1]: import math

In [2]: %timeit -n 200 math.inf
49.3 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)

In [3]: %timeit -n 200 float('inf')
194 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)
```
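
The corresponding import pattern, assuming `torch._six` of this era exposes the cached constants:

```py
from torch._six import inf  # a cached module attribute, cheaper than float('inf')

def clamp01(x, hi=inf):
    return max(0.0, min(x, hi))
```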
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9500

Reviewed By: soumith

Differential Revision: D8876229

Pulled By: SsnL

fbshipit-source-id: 78602b76bb53d5588910b58270930c0bd413d2d7
2018-07-18 10:40:29 -07:00
5453 changed files with 427067 additions and 287089 deletions

.circleci/.gitignore (new file)

@@ -0,0 +1,2 @@
*.svg
*.png

.circleci/README.md (new file)

@@ -0,0 +1,38 @@
CircleCI configuration generator
================================

One may no longer make changes to the `.circleci/config.yml` file directly.
Instead, one must edit these Python scripts or files in the `verbatim-sources/` directory.

Usage
----------

1. Make changes to these scripts.
2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`.

You'll see a build failure on TravisCI if the scripts don't agree with the checked-in version.

Motivation
----------

These scripts establish a single, authoritative source of documentation for the CircleCI configuration matrix.
The documentation, in the form of diagrams, is automatically generated and cannot drift out of sync with the YAML content.

Furthermore, consistency is enforced within the YAML config itself, by using a single source of data to generate
multiple parts of the file.

* Facilitates one-off culling/enabling of CI configs for testing PRs on special targets

Also see https://github.com/pytorch/pytorch/issues/17038

Future direction
----------------

### Declaring sparse config subsets

See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):

In contrast with a full recursive tree traversal of configuration dimensions,

> in the future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that test as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.


@@ -0,0 +1,153 @@
#!/usr/bin/env python3

"""
This module models the tree of configuration variants
for "smoketest" builds.

Each subclass of ConfigNode represents a layer of the configuration hierarchy.
These tree nodes encapsulate the logic for whether a branch of the hierarchy
should be "pruned".

In addition to generating config.yml content, the tree is also traversed
to produce a visualization of config dimensions.
"""

from collections import OrderedDict

from cimodel.lib.conf_tree import ConfigNode
import cimodel.data.dimensions as dimensions


LINKING_DIMENSIONS = [
    "shared",
    "static",
]

DEPS_INCLUSION_DIMENSIONS = [
    "with-deps",
    "without-deps",
]


def get_processor_arch_name(cuda_version):
    return "cpu" if not cuda_version else "cu" + cuda_version


LINUX_PACKAGE_VARIANTS = OrderedDict(
    manywheel=[
        "2.7m",
        "2.7mu",
        "3.5m",
        "3.6m",
        "3.7m",
    ],
    conda=dimensions.STANDARD_PYTHON_VERSIONS,
    libtorch=[
        "2.7m",
    ],
)

CONFIG_TREE_DATA = OrderedDict(
    linux=(dimensions.CUDA_VERSIONS, LINUX_PACKAGE_VARIANTS),
    macos=([None], OrderedDict(
        wheel=dimensions.STANDARD_PYTHON_VERSIONS,
        conda=dimensions.STANDARD_PYTHON_VERSIONS,
        libtorch=[
            "2.7",
        ],
    )),
)

DEVTOOLSET_VERSIONS = [3, 7]


class TopLevelNode(ConfigNode):
    def __init__(self, node_name, config_tree_data, smoke):
        super(TopLevelNode, self).__init__(None, node_name)
        self.config_tree_data = config_tree_data
        self.props["smoke"] = smoke

    def get_children(self):
        return [OSConfigNode(self, x, c, p) for (x, (c, p)) in self.config_tree_data.items()]


class OSConfigNode(ConfigNode):
    def __init__(self, parent, os_name, cuda_versions, py_tree):
        super(OSConfigNode, self).__init__(parent, os_name)
        self.py_tree = py_tree
        self.props["os_name"] = os_name
        self.props["cuda_versions"] = cuda_versions

    def get_children(self):
        return [PackageFormatConfigNode(self, k, v) for k, v in self.py_tree.items()]


class PackageFormatConfigNode(ConfigNode):
    def __init__(self, parent, package_format, python_versions):
        super(PackageFormatConfigNode, self).__init__(parent, package_format)
        self.props["python_versions"] = python_versions
        self.props["package_format"] = package_format

    def get_children(self):
        if self.find_prop("os_name") == "linux" and self.find_prop("package_format") != "conda":
            return [LinuxGccConfigNode(self, v) for v in DEVTOOLSET_VERSIONS]
        else:
            return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]


class LinuxGccConfigNode(ConfigNode):
    def __init__(self, parent, devtoolset_version):
        super(LinuxGccConfigNode, self).__init__(parent, "DEVTOOLSET=" + str(devtoolset_version))
        self.props["devtoolset_version"] = devtoolset_version

    def get_children(self):
        return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]


class ArchConfigNode(ConfigNode):
    def __init__(self, parent, cu):
        super(ArchConfigNode, self).__init__(parent, get_processor_arch_name(cu))
        self.props["cu"] = cu

    def get_children(self):
        return [PyVersionConfigNode(self, v) for v in self.find_prop("python_versions")]


class PyVersionConfigNode(ConfigNode):
    def __init__(self, parent, pyver):
        super(PyVersionConfigNode, self).__init__(parent, pyver)
        self.props["pyver"] = pyver

    def get_children(self):
        smoke = self.find_prop("smoke")
        package_format = self.find_prop("package_format")
        os_name = self.find_prop("os_name")

        has_libtorch_variants = smoke and package_format == "libtorch" and os_name == "linux"
        linking_variants = LINKING_DIMENSIONS if has_libtorch_variants else []

        return [LinkingVariantConfigNode(self, v) for v in linking_variants]


class LinkingVariantConfigNode(ConfigNode):
    def __init__(self, parent, linking_variant):
        super(LinkingVariantConfigNode, self).__init__(parent, linking_variant)

    def get_children(self):
        return [DependencyInclusionConfigNode(self, v) for v in DEPS_INCLUSION_DIMENSIONS]


class DependencyInclusionConfigNode(ConfigNode):
    def __init__(self, parent, deps_variant):
        super(DependencyInclusionConfigNode, self).__init__(parent, deps_variant)
        self.props["libtorch_variant"] = "-".join([self.parent.get_label(), self.get_label()])
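
A quick sanity check of the `get_processor_arch_name` helper from this file (the one-line definition is repeated so the snippet runs standalone):

```py
def get_processor_arch_name(cuda_version):
    return "cpu" if not cuda_version else "cu" + cuda_version

assert get_processor_arch_name(None) == "cpu"    # CPU-only build
assert get_processor_arch_name("100") == "cu100" # CUDA 10.0 build
```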


@@ -0,0 +1,213 @@
#!/usr/bin/env python3

from collections import OrderedDict

import cimodel.data.binary_build_data as binary_build_data
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization


class Conf(object):
    def __init__(self, os, cuda_version, pydistro, parms, smoke, libtorch_variant, devtoolset_version):
        self.os = os
        self.cuda_version = cuda_version
        self.pydistro = pydistro
        self.parms = parms
        self.smoke = smoke
        self.libtorch_variant = libtorch_variant
        self.devtoolset_version = devtoolset_version

    def gen_build_env_parms(self):
        elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.cuda_version)]
        if self.devtoolset_version is not None:
            elems.append("devtoolset" + str(self.devtoolset_version))
        return elems

    def gen_docker_image(self):
        docker_word_substitution = {
            "manywheel": "manylinux",
            "libtorch": "manylinux",
        }

        docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)

        alt_docker_suffix = self.cuda_version or "80"
        docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
        return miniutils.quote("soumith/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)

    def get_name_prefix(self):
        return "smoke" if self.smoke else "binary"

    def gen_build_name(self, build_or_test):
        parts = [self.get_name_prefix(), self.os] + self.gen_build_env_parms()

        if self.smoke:
            if self.libtorch_variant:
                parts.append(self.libtorch_variant)
        else:
            parts.append(build_or_test)

        return "_".join(parts)

    def gen_yaml_tree(self, build_or_test):
        env_tuples = [("BUILD_ENVIRONMENT", miniutils.quote(" ".join(self.gen_build_env_parms())))]

        if self.libtorch_variant:
            env_tuples.append(("LIBTORCH_VARIANT", miniutils.quote(self.libtorch_variant)))

        os_name = miniutils.override(self.os, {"macos": "mac"})
        d = {"<<": "*" + "_".join([self.get_name_prefix(), os_name, build_or_test])}

        if build_or_test == "test":
            if not (self.smoke and self.os == "macos"):
                env_tuples.append(("DOCKER_IMAGE", self.gen_docker_image()))

            if self.cuda_version:
                env_tuples.append(("USE_CUDA_DOCKER_RUNTIME", miniutils.quote("1")))
        else:
            if self.os == "linux" and build_or_test != "upload":
                d["docker"] = [{"image": self.gen_docker_image()}]

        d["environment"] = OrderedDict(env_tuples)

        if build_or_test == "test":
            if self.cuda_version:
                d["resource_class"] = "gpu.medium"

        return d


def get_root(smoke, name):
    return binary_build_data.TopLevelNode(
        name,
        binary_build_data.CONFIG_TREE_DATA,
        smoke,
    )


def gen_build_env_list(smoke):
    root = get_root(smoke, "N/A")
    config_list = conf_tree.dfs(root)

    newlist = []
    for c in config_list:
        conf = Conf(
            c.find_prop("os_name"),
            c.find_prop("cu"),
            c.find_prop("package_format"),
            [c.find_prop("pyver")],
            c.find_prop("smoke"),
            c.find_prop("libtorch_variant"),
            c.find_prop("devtoolset_version"),
        )
        newlist.append(conf)

    return newlist


def predicate_exclude_nonlinux_and_libtorch(config):
    return config.os == "linux" and (config.smoke or config.pydistro != "libtorch")


def add_build_entries(jobs_dict, phase, smoke, filter_predicate=lambda x: True):
    configs = gen_build_env_list(smoke)
    for conf_options in filter(filter_predicate, configs):
        jobs_dict[conf_options.gen_build_name(phase)] = conf_options.gen_yaml_tree(phase)


def add_binary_build_specs(jobs_dict):
    add_build_entries(jobs_dict, "build", False)


def add_binary_build_tests(jobs_dict):
    add_build_entries(jobs_dict, "test", False, predicate_exclude_nonlinux_and_libtorch)


def add_binary_build_uploads(jobs_dict):
    add_build_entries(jobs_dict, "upload", False)


def add_smoke_test_specs(jobs_dict):
    add_build_entries(jobs_dict, "test", True)


def get_nightly_tests():
    configs = gen_build_env_list(False)
    filtered_configs = filter(predicate_exclude_nonlinux_and_libtorch, configs)

    mylist = []
    for conf_options in filtered_configs:
        d = {conf_options.gen_build_name("test"): {"requires": [conf_options.gen_build_name("build")]}}
        mylist.append(d)

    return mylist


def get_nightly_uploads():
    configs = gen_build_env_list(False)

    def gen_config(conf, phase_dependency):
        return {
            conf.gen_build_name("upload"): OrderedDict([
                ("context", "org-member"),
                ("requires", [conf.gen_build_name(phase_dependency)]),
            ]),
        }

    mylist = []
    for conf in configs:
        phase_dependency = "test" if predicate_exclude_nonlinux_and_libtorch(conf) else "build"
        mylist.append(gen_config(conf, phase_dependency))

    return mylist


def gen_schedule_tree(cron_timing):
    return [{
        "schedule": {
            "cron": miniutils.quote(cron_timing),
            "filters": {
                "branches": {
                    "only": ["master"],
                },
            },
        },
    }]


def add_jobs_and_render(jobs_dict, toplevel_key, smoke, cron_schedule):
    jobs_list = []

    configs = gen_build_env_list(smoke)
    for build_config in configs:
        build_name = build_config.gen_build_name("build")
        jobs_list.append(build_name)

    jobs_dict[toplevel_key] = OrderedDict(
        triggers=gen_schedule_tree(cron_schedule),
        jobs=jobs_list,
    )

    graph = visualization.generate_graph(get_root(smoke, toplevel_key))
    graph.draw(toplevel_key + "-config-dimensions.png", prog="twopi")


def add_binary_build_jobs(jobs_dict):
    add_jobs_and_render(jobs_dict, "binarybuilds", False, "5 5 * * *")


def add_binary_smoke_test_jobs(jobs_dict):
    add_jobs_and_render(jobs_dict, "binarysmoketests", True, "15 16 * * *")
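
As a sketch, the trigger structure `gen_schedule_tree` above produces (assuming `miniutils.quote` wraps its argument in double quotes):

```py
# gen_schedule_tree("5 5 * * *") evaluates to:
expected = [{
    "schedule": {
        "cron": '"5 5 * * *"',
        "filters": {"branches": {"only": ["master"]}},
    },
}]
```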


@@ -0,0 +1,111 @@
#!/usr/bin/env python3

from cimodel.lib.conf_tree import ConfigNode, X
from cimodel.lib.conf_tree import Ver
import cimodel.data.dimensions as dimensions


CONFIG_TREE_DATA = [
    (Ver("ubuntu", "14.04"), [
        (Ver("gcc", "4.8"), [X("py2")]),
        (Ver("gcc", "4.9"), [X("py2")]),
    ]),
    (Ver("ubuntu", "16.04"), [
        (Ver("cuda", "8.0"), [X("py2")]),
        (Ver("cuda", "9.0"), [
            # TODO make explicit that this is a "secret TensorRT build"
            # (see https://github.com/pytorch/pytorch/pull/17323#discussion_r259446749)
            X("py2"),
            X("cmake"),
        ]),
        (Ver("cuda", "9.1"), [X("py2")]),
        (Ver("mkl"), [X("py2")]),
        (Ver("gcc", "5"), [X("onnx_py2")]),
        (Ver("clang", "3.8"), [X("py2")]),
        (Ver("clang", "3.9"), [X("py2")]),
        (Ver("clang", "7"), [X("py2")]),
        (Ver("android"), [X("py2")]),
    ]),
    (Ver("centos", "7"), [
        (Ver("cuda", "9.0"), [X("py2")]),
    ]),
    (Ver("macos", "10.13"), [
        # TODO ios and system aren't related. system qualifies where the python comes
        # from (use the system python instead of homebrew or anaconda)
        (Ver("ios"), [X("py2")]),
        (Ver("system"), [X("py2")]),
    ]),
]


class TreeConfigNode(ConfigNode):
    def __init__(self, parent, node_name, subtree):
        super(TreeConfigNode, self).__init__(parent, self.modify_label(node_name))
        self.subtree = subtree
        self.init2(node_name)

    # noinspection PyMethodMayBeStatic
    def modify_label(self, label):
        return str(label)

    def init2(self, node_name):
        pass

    def get_children(self):
        return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]

    def is_build_only(self):
        return str(self.find_prop("compiler_version")) in [
            "gcc4.9",
            "clang3.8",
            "clang3.9",
            "clang7",
            "android",
        ] or self.find_prop("distro_version").name == "macos"


class TopLevelNode(TreeConfigNode):
    def __init__(self, node_name, subtree):
        super(TopLevelNode, self).__init__(None, node_name, subtree)

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
        return DistroConfigNode


class DistroConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["distro_version"] = node_name

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
        return CompilerConfigNode


class CompilerConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["compiler_version"] = node_name

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
        return LanguageConfigNode


class LanguageConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["language_version"] = node_name
        self.props["build_only"] = self.is_build_only()

    def get_children(self):
        children = []
        for phase in dimensions.PHASES:
            if phase == "build" or not self.props["build_only"]:
                children.append(PhaseConfigNode(self, phase, []))
        return children


class PhaseConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["phase_name"] = node_name

@@ -0,0 +1,166 @@
#!/usr/bin/env python3

from collections import OrderedDict

import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
from cimodel.data.caffe2_build_data import CONFIG_TREE_DATA, TopLevelNode


DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"

DOCKER_IMAGE_VERSION = 266


class Conf(object):
    def __init__(self, language, distro, compiler, phase, build_only):
        self.language = language
        self.distro = distro
        self.compiler = compiler
        self.phase = phase
        self.build_only = build_only

    # TODO: Eventually we can probably just remove the cudnn7 everywhere.
    def get_cudnn_insertion(self):
        omit = self.language == "onnx_py2" \
            or self.compiler.name in ["android", "mkl", "clang"] \
            or str(self.distro) in ["ubuntu14.04", "macos10.13"]

        return [] if omit else ["cudnn7"]

    def get_build_name_root_parts(self):
        return [
            "caffe2",
            self.language,
        ] + self.get_build_name_middle_parts()

    def get_build_name_middle_parts(self):
        return [str(self.compiler)] + self.get_cudnn_insertion() + [str(self.distro)]

    def construct_phase_name(self, phase):
        root_parts = self.get_build_name_root_parts()
        return "_".join(root_parts + [phase]).replace(".", "_")

    def get_name(self):
        return self.construct_phase_name(self.phase)

    def get_platform(self):
        platform = self.distro.name
        if self.distro.name != "macos":
            platform = "linux"
        return platform

    def gen_docker_image(self):
        lang_substitutions = {
            "onnx_py2": "py2",
            "cmake": "py2",
        }

        lang = miniutils.override(self.language, lang_substitutions)
        parts = [lang] + self.get_build_name_middle_parts()
        return miniutils.quote(DOCKER_IMAGE_PATH_BASE + "-".join(parts) + ":" + str(DOCKER_IMAGE_VERSION))

    def gen_yaml_tree(self):
        tuples = []

        lang_substitutions = {
            "onnx_py2": "onnx-py2",
        }

        lang = miniutils.override(self.language, lang_substitutions)

        parts = [
            "caffe2",
            lang,
        ] + self.get_build_name_middle_parts() + [self.phase]

        build_env = "-".join(parts)
        if not self.distro.name == "macos":
            build_env = miniutils.quote(build_env)
        tuples.append(("BUILD_ENVIRONMENT", build_env))

        if self.compiler.name == "ios":
            tuples.append(("BUILD_IOS", miniutils.quote("1")))

        if self.phase == "test":
            # TODO cuda should not be considered a compiler
            if self.compiler.name == "cuda":
                tuples.append(("USE_CUDA_DOCKER_RUNTIME", miniutils.quote("1")))

        if self.distro.name == "macos":
            tuples.append(("PYTHON_VERSION", miniutils.quote("2")))
        else:
            tuples.append(("DOCKER_IMAGE", self.gen_docker_image()))

        if self.build_only:
            tuples.append(("BUILD_ONLY", miniutils.quote("1")))

        d = OrderedDict({"environment": OrderedDict(tuples)})

        if self.phase == "test":
            resource_class = "large" if self.compiler.name != "cuda" else "gpu.medium"
            d["resource_class"] = resource_class

        d["<<"] = "*" + "_".join(["caffe2", self.get_platform(), self.phase, "defaults"])

        return d


def get_root():
    return TopLevelNode("Caffe2 Builds", CONFIG_TREE_DATA)


def instantiate_configs():
    config_list = []

    root = get_root()
    found_configs = conf_tree.dfs(root)
    for fc in found_configs:
        c = Conf(
            fc.find_prop("language_version"),
            fc.find_prop("distro_version"),
            fc.find_prop("compiler_version"),
            fc.find_prop("phase_name"),
            fc.find_prop("build_only"),
        )
        config_list.append(c)

    return config_list


def add_caffe2_builds(jobs_dict):
    configs = instantiate_configs()
    for conf_options in configs:
        jobs_dict[conf_options.get_name()] = conf_options.gen_yaml_tree()

    graph = visualization.generate_graph(get_root())
    graph.draw("caffe2-config-dimensions.png", prog="twopi")


def get_caffe2_workflows():
    configs = instantiate_configs()

    # TODO Why don't we build this config?
    # See https://github.com/pytorch/pytorch/pull/17323#discussion_r259450540
    filtered_configs = filter(lambda x: not (str(x.distro) == "ubuntu14.04" and str(x.compiler) == "gcc4.9"), configs)

    x = []
    for conf_options in filtered_configs:
        item = conf_options.get_name()
        if conf_options.phase == "test":
            item = {conf_options.get_name(): {"requires": [conf_options.construct_phase_name("build")]}}
        x.append(item)

    return x


@@ -0,0 +1,18 @@
#!/usr/bin/env python3

PHASES = ["build", "test"]

CUDA_VERSIONS = [
    None,  # cpu build
    "80",
    "90",
    "100",
]

STANDARD_PYTHON_VERSIONS = [
    "2.7",
    "3.5",
    "3.6",
    "3.7",
]


@@ -0,0 +1,147 @@
#!/usr/bin/env python3

from cimodel.lib.conf_tree import ConfigNode, X


CONFIG_TREE_DATA = [
    ("trusty", [
        (None, [
            X("2.7.9"),
            X("2.7"),
            X("3.5"),
            X("nightly"),
        ]),
        ("gcc", [
            ("4.8", [X("3.6")]),
            ("5.4", [("3.6", [X(False), X(True)])]),
            ("7", [X("3.6")]),
        ]),
    ]),
    ("xenial", [
        ("clang", [
            ("5", [X("3.6")]),
        ]),
        ("cuda", [
            ("8", [X("3.6")]),
            ("9", [
                # Note there are magic strings here
                # https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L21
                # and
                # https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L143
                # and
                # https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L153
                # (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453144)
                X("2.7"),
                X("3.6"),
            ]),
            ("9.2", [X("3.6")]),
            ("10", [X("3.6")]),
        ]),
        ("android", [
            ("r19c", [X("3.6")]),
        ]),
    ]),
]


def get_major_pyver(dotted_version):
    parts = dotted_version.split(".")
    return "py" + parts[0]


class TreeConfigNode(ConfigNode):
    def __init__(self, parent, node_name, subtree):
        super(TreeConfigNode, self).__init__(parent, self.modify_label(node_name))
        self.subtree = subtree
        self.init2(node_name)

    def modify_label(self, label):
        return label

    def init2(self, node_name):
        pass

    def get_children(self):
        return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]


class TopLevelNode(TreeConfigNode):
    def __init__(self, node_name, subtree):
        super(TopLevelNode, self).__init__(None, node_name, subtree)

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
        return DistroConfigNode


class DistroConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["distro_name"] = node_name

    def child_constructor(self):
        distro = self.find_prop("distro_name")
        next_nodes = {
            "trusty": TrustyCompilerConfigNode,
            "xenial": XenialCompilerConfigNode,
        }
        return next_nodes[distro]


class TrustyCompilerConfigNode(TreeConfigNode):
    def modify_label(self, label):
        return label or "<unspecified>"

    def init2(self, node_name):
        self.props["compiler_name"] = node_name

    def child_constructor(self):
        return TrustyCompilerVersionConfigNode if self.props["compiler_name"] else PyVerConfigNode


class TrustyCompilerVersionConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["compiler_version"] = node_name

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
        return PyVerConfigNode


class PyVerConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["pyver"] = node_name
        self.props["abbreviated_pyver"] = get_major_pyver(node_name)

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
        return XlaConfigNode


class XlaConfigNode(TreeConfigNode):
    def modify_label(self, label):
        return "XLA=" + str(label)

    def init2(self, node_name):
        self.props["is_xla"] = node_name


class XenialCompilerConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["compiler_name"] = node_name

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
        return XenialCompilerVersionConfigNode


class XenialCompilerVersionConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["compiler_version"] = node_name

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
        return PyVerConfigNode


@@ -0,0 +1,299 @@
#!/usr/bin/env python3

from collections import OrderedDict

from cimodel.data.pytorch_build_data import TopLevelNode, CONFIG_TREE_DATA
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization


DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/"

DOCKER_IMAGE_VERSION = 300


class Conf(object):
    def __init__(self,
                 distro,
                 parms,
                 pyver=None,
                 cuda_version=None,
                 is_xla=False,
                 restrict_phases=None,
                 gpu_resource=None,
                 dependent_tests=None,
                 parent_build=None):

        self.distro = distro
        self.pyver = pyver
        self.parms = parms
        self.cuda_version = cuda_version
        # TODO expand this to cover all the USE_* that we want to test for
        #  tensorrt, leveldb, lmdb, redis, opencv, mkldnn, ideep, etc.
        # (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453608)
        self.is_xla = is_xla
        self.restrict_phases = restrict_phases
        self.gpu_resource = gpu_resource
        self.dependent_tests = dependent_tests or []
        self.parent_build = parent_build

    # TODO: Eliminate the special casing for docker paths
    # In the short term, we *will* need to support special casing as docker images are merged for caffe2 and pytorch
    def get_parms(self, for_docker):
        leading = ["pytorch"]
        if self.is_xla and not for_docker:
            leading.append("xla")

        cuda_parms = []
        if self.cuda_version:
            cuda_parms.extend(["cuda" + self.cuda_version, "cudnn7"])
        return leading + ["linux", self.distro] + cuda_parms + self.parms

    def gen_docker_image_path(self):
        parms_source = self.parent_build or self
        base_build_env_name = "-".join(parms_source.get_parms(True))
        return miniutils.quote(DOCKER_IMAGE_PATH_BASE + base_build_env_name + ":" + str(DOCKER_IMAGE_VERSION))

    def get_build_job_name_pieces(self, build_or_test):
        return self.get_parms(False) + [build_or_test]

    def gen_build_name(self, build_or_test):
        return ("_".join(map(str, self.get_build_job_name_pieces(build_or_test)))).replace(".", "_").replace("-", "_")

    def get_dependents(self):
        return self.dependent_tests or []

    def gen_yaml_tree(self, build_or_test):
        build_job_name_pieces = self.get_build_job_name_pieces(build_or_test)
        build_env_name = "-".join(map(str, build_job_name_pieces))

        env_dict = OrderedDict([
            ("BUILD_ENVIRONMENT", build_env_name),
            ("DOCKER_IMAGE", self.gen_docker_image_path()),
        ])

        if self.pyver:
            env_dict["PYTHON_VERSION"] = miniutils.quote(self.pyver)

        if build_or_test == "test" and self.gpu_resource:
            env_dict["USE_CUDA_DOCKER_RUNTIME"] = miniutils.quote("1")

        d = {
            "environment": env_dict,
            "<<": "*" + "_".join(["pytorch", "linux", build_or_test, "defaults"]),
        }

        if build_or_test == "test":
            resource_class = "large"
            if self.gpu_resource:
                resource_class = "gpu." + self.gpu_resource

                if self.gpu_resource == "large":
                    env_dict["MULTI_GPU"] = miniutils.quote("1")

            d["resource_class"] = resource_class

        return d

    def gen_workflow_yaml_item(self, phase):
        if phase == "test":
            val = OrderedDict()

            # TODO When merging the caffe2 and pytorch jobs, it might be convenient for a while to make a
            #  caffe2 test job dependent on a pytorch build job. This way we could quickly dedup the repeated
            #  build of pytorch in the caffe2 build job, and just run the caffe2 tests off of a completed
            #  pytorch build job (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259452641)
            dependency_build = self.parent_build or self
            val["requires"] = [dependency_build.gen_build_name("build")]

            return {self.gen_build_name(phase): val}
        else:
            return self.gen_build_name(phase)


# TODO This is a hack to special case some configs just for the workflow list
class HiddenConf(object):
    def __init__(self, name, parent_build=None):
        self.name = name
        self.parent_build = parent_build

    def gen_workflow_yaml_item(self, phase):
        return {self.gen_build_name(phase): {"requires": [self.parent_build.gen_build_name("build")]}}

    def gen_build_name(self, _):
        return self.name


# TODO Convert these to graph nodes
def gen_dependent_configs(xenial_parent_config):
    extra_parms = [
        (["multigpu"], "large"),
        (["NO_AVX2"], "medium"),
        (["NO_AVX", "NO_AVX2"], "medium"),
        (["slow"], "medium"),
        (["nogpu"], None),
    ]

    configs = []
    for parms, gpu in extra_parms:
        c = Conf(
            xenial_parent_config.distro,
            ["py3"] + parms,
            pyver="3.6",
            cuda_version=xenial_parent_config.cuda_version,
            restrict_phases=["test"],
            gpu_resource=gpu,
            parent_build=xenial_parent_config,
        )
        configs.append(c)

    for x in ["pytorch_short_perf_test_gpu", "pytorch_doc_push"]:
        configs.append(HiddenConf(x, parent_build=xenial_parent_config))

    return configs


def get_root():
    return TopLevelNode("PyTorch Builds", CONFIG_TREE_DATA)


def gen_tree():
    root = get_root()
    configs_list = conf_tree.dfs(root)
    return configs_list


def instantiate_configs():
    config_list = []

    root = get_root()
    found_configs = conf_tree.dfs(root)

    restrict_phases = None
    for fc in found_configs:
        distro_name = fc.find_prop("distro_name")

        python_version = None
        if distro_name == "xenial":
            python_version = fc.find_prop("pyver")
            parms_list = [fc.find_prop("abbreviated_pyver")]
        else:
            parms_list = ["py" + fc.find_prop("pyver")]
compiler_name = fc.find_prop("compiler_name")
cuda_version = None
if compiler_name == "cuda":
cuda_version = fc.find_prop("compiler_version")
elif compiler_name == "android":
android_ndk_version = fc.find_prop("compiler_version")
# TODO: do we need clang to compile host binaries like protoc?
parms_list.append("clang5")
parms_list.append("android-ndk-" + android_ndk_version)
restrict_phases = ["build"]
elif compiler_name:
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
parms_list.append(gcc_version)
if compiler_name == "clang":
parms_list.append("asan")
if cuda_version in ["9.2", "10"]:
# TODO The gcc version is orthogonal to CUDA version?
parms_list.append("gcc7")
is_xla = fc.find_prop("is_xla") or False
gpu_resource = None
if cuda_version and cuda_version != "10":
gpu_resource = "medium"
c = Conf(
distro_name,
parms_list,
python_version,
cuda_version,
is_xla,
restrict_phases,
gpu_resource,
)
if cuda_version == "8":
c.dependent_tests = gen_dependent_configs(c)
config_list.append(c)
return config_list
def add_build_env_defs(jobs_dict):
mydict = OrderedDict()
config_list = instantiate_configs()
for c in config_list:
phases = c.restrict_phases or dimensions.PHASES
for phase in phases:
# TODO why does this not have a test?
if phase == "test" and c.cuda_version == "10":
continue
d = c.gen_yaml_tree(phase)
mydict[c.gen_build_name(phase)] = d
if phase == "test":
for x in filter(lambda x: type(x) is not HiddenConf, c.get_dependents()):
d = x.gen_yaml_tree(phase)
mydict[x.gen_build_name(phase)] = d
# this is the circleci api version and probably never changes
jobs_dict["version"] = 2
jobs_dict["jobs"] = mydict
graph = visualization.generate_graph(get_root())
graph.draw("pytorch-config-dimensions.png", prog="twopi")
def get_workflow_list():
config_list = instantiate_configs()
x = []
for conf_options in config_list:
phases = conf_options.restrict_phases or dimensions.PHASES
for phase in phases:
# TODO why does this not have a test?
if phase == "test" and conf_options.cuda_version == "10":
continue
x.append(conf_options.gen_workflow_yaml_item(phase))
# TODO convert to recursion
for conf in conf_options.get_dependents():
x.append(conf.gen_workflow_yaml_item("test"))
return x
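A quick illustration of the naming scheme above (a sketch with hypothetical values, assuming the Conf class as defined):

c = Conf("xenial", ["py3.6", "gcc5"], pyver="3.6", cuda_version="9.0",
         gpu_resource="medium")
print(c.get_parms(for_docker=False))
# -> ['pytorch', 'linux', 'xenial', 'cuda9.0', 'cudnn7', 'py3.6', 'gcc5']
print(c.gen_build_name("build"))
# -> pytorch_linux_xenial_cuda9_0_cudnn7_py3_6_gcc5_build  ('.' and '-' become '_')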


@@ -0,0 +1,101 @@
#!/usr/bin/env python3
def X(val):
"""
Compact way to write a leaf node
"""
return val, []
class Ver(object):
"""
Represents a product with a version number
"""
def __init__(self, name, version=""):
self.name = name
self.version = version
def __str__(self):
return self.name + self.version
class ConfigNode(object):
def __init__(self, parent, node_name):
self.parent = parent
self.node_name = node_name
self.props = {}
def get_label(self):
return self.node_name
# noinspection PyMethodMayBeStatic
def get_children(self):
return []
def get_parents(self):
return (self.parent.get_parents() + [self.parent.get_label()]) if self.parent else []
def get_depth(self):
return len(self.get_parents())
def get_node_key(self):
return "%".join(self.get_parents() + [self.get_label()])
def find_prop(self, propname, searched=None):
"""
Checks if its own dictionary has
the property, otherwise asks parent node.
"""
if searched is None:
searched = []
searched.append(self.node_name)
if propname in self.props:
return self.props[propname]
elif self.parent:
return self.parent.find_prop(propname, searched)
else:
# raise Exception('Property "%s" does not exist anywhere in the tree! Searched: %s' % (propname, searched))
return None
def dfs_recurse(
node,
leaf_callback=lambda x: None,
discovery_callback=lambda x, y, z: None,
child_callback=lambda x, y: None,
sibling_index=0,
sibling_count=1):
discovery_callback(node, sibling_index, sibling_count)
node_children = node.get_children()
if node_children:
for i, child in enumerate(node_children):
child_callback(node, child)
dfs_recurse(
child,
leaf_callback,
discovery_callback,
child_callback,
i,
len(node_children),
)
else:
leaf_callback(node)
def dfs(toplevel_config_node):
config_list = []
def leaf_callback(node):
config_list.append(node)
dfs_recurse(toplevel_config_node, leaf_callback)
return config_list
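A minimal sketch of how find_prop and dfs cooperate; DistroNode and CompilerNode are hypothetical subclasses used only for illustration:

class DistroNode(ConfigNode):
    def __init__(self):
        super(DistroNode, self).__init__(None, "xenial")
        self.props["distro_name"] = "xenial"

    def get_children(self):
        return [CompilerNode(self, name) for name in ["gcc5", "clang7"]]

class CompilerNode(ConfigNode):
    def __init__(self, parent, name):
        super(CompilerNode, self).__init__(parent, name)

for leaf in dfs(DistroNode()):
    # find_prop falls back to the parent when the key is absent locally
    print(leaf.get_node_key(), leaf.find_prop("distro_name"))
# -> xenial%gcc5 xenial
# -> xenial%clang7 xenial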


@@ -0,0 +1,13 @@
#!/usr/bin/env python3
def quote(s):
return sandwich('"', s)
def sandwich(bread, jam):
return bread + jam + bread
def override(word, substitutions):
return substitutions.get(word, word)
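For reference, what these helpers produce on hypothetical inputs:

quote("3.6")                       # -> '"3.6"'
sandwich("**", "bold")             # -> '**bold**'
override("py2", {"py2": "py2.7"})  # -> 'py2.7'
override("py3", {"py2": "py2.7"})  # -> 'py3' (no matching substitution)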


@@ -0,0 +1,64 @@
#!/usr/bin/env python3
from collections import OrderedDict
LIST_MARKER = "- "
INDENTATION_WIDTH = 2
def is_dict(data):
return type(data) is dict or type(data) is OrderedDict
def is_collection(data):
return is_dict(data) or type(data) is list
# TODO can eventually drop this custom sorting
def sortkey(x):
k = x[0]
return (
k == "<<",
k != "environment",
k,
)
def render(fh, data, depth, is_list_member=False):
"""
PyYaml does not allow precise control over the quoting
behavior, especially for merge references.
Therefore, we use this custom YAML renderer.
"""
indentation = " " * INDENTATION_WIDTH * depth
if is_dict(data):
tuples = list(data.items())
if type(data) is not OrderedDict:
tuples.sort(key=sortkey)
for i, (k, v) in enumerate(tuples):
# If this dict is itself a list member, the first key gets prefixed with a list marker
list_marker_prefix = LIST_MARKER if is_list_member and not i else ""
trailing_whitespace = "\n" if is_collection(v) else " "
fh.write(indentation + list_marker_prefix + k + ":" + trailing_whitespace)
render(fh, v, depth + 1 + int(is_list_member))
# TODO Could eventually drop this cosmetic convention
if depth == 2:
fh.write("\n")
elif type(data) is list:
for v in data:
render(fh, v, depth, True)
else:
list_member_prefix = indentation + LIST_MARKER if is_list_member else ""
fh.write(list_member_prefix + str(data) + "\n")
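A small sketch of the renderer on a nested OrderedDict, assuming the module's imports; sys.stdout stands in for the config.yml file handle:

import sys

data = OrderedDict([
    ("environment", OrderedDict([("BUILD_ENVIRONMENT", "pytorch-linux-xenial-py3")])),
    ("resource_class", "large"),
])
render(sys.stdout, data, depth=1)
# Output (2-space indent per level; depth-2 entries get a trailing blank line):
#   environment:
#     BUILD_ENVIRONMENT: pytorch-linux-xenial-py3
#
#   resource_class: large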


@@ -0,0 +1,86 @@
#!/usr/bin/env python3
"""
This module encapsulates dependencies on pygraphviz
"""
import colorsys
import cimodel.lib.conf_tree as conf_tree
def rgb2hex(rgb_tuple):
def to_hex(f):
return "%02x" % int(f * 255)
return "#" + "".join(map(to_hex, list(rgb_tuple)))
def handle_missing_graphviz(f):
"""
If the user has not installed pygraphviz, this causes
calls to the draw() method of the returned object to do nothing.
"""
try:
import pygraphviz # noqa: F401
return f
except ModuleNotFoundError:
class FakeGraph:
def draw(self, *args, **kwargs):
pass
return lambda _: FakeGraph()
@handle_missing_graphviz
def generate_graph(toplevel_config_node):
"""
Traverses the graph once first just to find the max depth
"""
config_list = conf_tree.dfs(toplevel_config_node)
max_depth = 0
for config in config_list:
max_depth = max(max_depth, config.get_depth())
# color the nodes using the max depth
from pygraphviz import AGraph
dot = AGraph()
def node_discovery_callback(node, sibling_index, sibling_count):
depth = node.get_depth()
sat_min, sat_max = 0.1, 0.6
sat_range = sat_max - sat_min
saturation_fraction = sibling_index / float(sibling_count - 1) if sibling_count > 1 else 1
saturation = sat_min + sat_range * saturation_fraction
# TODO Use a hash of the node label to determine the color
hue = depth / float(max_depth + 1)
rgb_tuple = colorsys.hsv_to_rgb(hue, saturation, 1)
this_node_key = node.get_node_key()
dot.add_node(
this_node_key,
label=node.get_label(),
style="filled",
# fillcolor=hex_color + ":orange",
fillcolor=rgb2hex(rgb_tuple),
penwidth=3,
color=rgb2hex(colorsys.hsv_to_rgb(hue, saturation, 0.9))
)
def child_callback(node, child):
this_node_key = node.get_node_key()
child_node_key = child.get_node_key()
dot.add_edge((this_node_key, child_node_key))
conf_tree.dfs_recurse(toplevel_config_node, lambda x: None, node_discovery_callback, child_callback)
return dot
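The decorator above lets callers skip any guard of their own; a sketch of the calling pattern (the output filename here is hypothetical):

from cimodel.data.pytorch_build_definitions import get_root

graph = generate_graph(get_root())
# With pygraphviz installed this writes a real image; without it,
# generate_graph returned a FakeGraph and draw() is a silent no-op.
graph.draw("config-tree.png", prog="twopi")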

File diff suppressed because it is too large.

.circleci/ensure-consistency.py (new executable file, 39 lines)

@@ -0,0 +1,39 @@
#!/usr/bin/env python3
import os
import subprocess
import sys
import tempfile
import generate_config_yml
CHECKED_IN_FILE = "config.yml"
REGENERATION_SCRIPT = "regenerate.sh"
PARENT_DIR = os.path.basename(os.path.dirname(os.path.abspath(__file__)))
README_PATH = os.path.join(PARENT_DIR, "README.md")
ERROR_MESSAGE_TEMPLATE = """
The checked-in CircleCI "%s" file does not match what was generated by the scripts.
Please re-run the "%s" script in the "%s" directory and commit the result. See "%s" for more information.
"""
def check_consistency():
_, temp_filename = tempfile.mkstemp("-generated-config.yml")
with open(temp_filename, "w") as fh:
generate_config_yml.stitch_sources(fh)
try:
subprocess.check_call(["cmp", temp_filename, CHECKED_IN_FILE])
except subprocess.CalledProcessError:
sys.exit(ERROR_MESSAGE_TEMPLATE % (CHECKED_IN_FILE, REGENERATION_SCRIPT, PARENT_DIR, README_PATH))
finally:
os.remove(temp_filename)
if __name__ == "__main__":
check_consistency()

.circleci/generate_config_yml.py (new executable file, 126 lines)

@@ -0,0 +1,126 @@
#!/usr/bin/env python3
"""
This script is the source of truth for config.yml.
Please see README.md in this directory for details.
"""
import os
import sys
import shutil
from collections import namedtuple, OrderedDict
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.caffe2_build_definitions as caffe2_build_definitions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
class File(object):
"""
Verbatim copy the contents of a file into config.yml
"""
def __init__(self, filename):
self.filename = filename
def write(self, output_filehandle):
with open(os.path.join("verbatim-sources", self.filename)) as fh:
shutil.copyfileobj(fh, output_filehandle)
class FunctionGen(namedtuple('FunctionGen', 'function depth')):
__slots__ = ()
class Treegen(FunctionGen):
"""
Insert the content of a YAML tree into config.yml
"""
def write(self, output_filehandle):
build_dict = OrderedDict()
self.function(build_dict)
miniyaml.render(output_filehandle, build_dict, self.depth)
class Listgen(FunctionGen):
"""
Insert the content of a YAML list into config.yml
"""
def write(self, output_filehandle):
miniyaml.render(output_filehandle, self.function(), self.depth)
def horizontal_rule():
return "".join("#" * 78)
class Header(object):
def __init__(self, title, summary=None):
self.title = title
self.summary_lines = summary or []
def write(self, output_filehandle):
text_lines = [self.title] + self.summary_lines
comment_lines = ["# " + x for x in text_lines]
lines = miniutils.sandwich([horizontal_rule()], comment_lines)
for line in filter(None, lines):
output_filehandle.write(line + "\n")
# Order of this list matters to the generated config.yml.
YAML_SOURCES = [
File("header-section.yml"),
File("linux-build-defaults.yml"),
File("macos-build-defaults.yml"),
File("nightly-binary-build-defaults.yml"),
File("linux-binary-build-defaults.yml"),
File("macos-binary-build-defaults.yml"),
File("nightly-build-smoke-tests-defaults.yml"),
Header("Job specifications job specs"),
Treegen(pytorch_build_definitions.add_build_env_defs, 0),
File("job-specs-custom.yml"),
Treegen(caffe2_build_definitions.add_caffe2_builds, 1),
File("binary_update_htmls.yml"),
Header("Binary build specs individual job specifications"),
Treegen(binary_build_definitions.add_binary_build_specs, 1),
Header(
"Binary build tests", [
"These are the smoke tests run right after the build, before the upload.",
"If these fail, the upload doesn't happen."
]
),
Treegen(binary_build_definitions.add_binary_build_tests, 1),
File("binary-build-tests.yml"),
Header("Binary build uploads"),
Treegen(binary_build_definitions.add_binary_build_uploads, 1),
Header("Smoke test specs individual job specifications"),
Treegen(binary_build_definitions.add_smoke_test_specs, 1),
File("workflows.yml"),
Listgen(pytorch_build_definitions.get_workflow_list, 3),
File("workflows-pytorch-macos-builds.yml"),
Listgen(caffe2_build_definitions.get_caffe2_workflows, 3),
File("workflows-binary-builds-smoke-subset.yml"),
Header("Daily smoke test trigger"),
Treegen(binary_build_definitions.add_binary_smoke_test_jobs, 1),
Header("Daily binary build trigger"),
Treegen(binary_build_definitions.add_binary_build_jobs, 1),
Header("Nightly tests"),
Listgen(binary_build_definitions.get_nightly_tests, 3),
File("workflows-nightly-uploads-header.yml"),
Listgen(binary_build_definitions.get_nightly_uploads, 3),
File("workflows-s3-html.yml"),
]
def stitch_sources(output_filehandle):
for f in YAML_SOURCES:
f.write(output_filehandle)
if __name__ == "__main__":
stitch_sources(sys.stdout)
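The stitching contract is simply write(filehandle), so verbatim files and generated YAML compose uniformly; a sketch (extra-jobs.yml and make_jobs are hypothetical):

def make_jobs(build_dict):
    build_dict["my_job"] = OrderedDict([("machine", True)])

SOURCES = [
    File("extra-jobs.yml"),  # copied verbatim from verbatim-sources/
    Treegen(make_jobs, 0),   # filled into an OrderedDict, rendered via miniyaml
]
with open("config.yml", "w") as fh:
    for source in SOURCES:
        source.write(fh)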

.circleci/regenerate.sh (new executable file, 8 lines)

@@ -0,0 +1,8 @@
#!/bin/bash -xe
# Allows this script to be invoked from any directory:
cd "$(dirname "$0")"
NEW_FILE=$(mktemp)
./generate_config_yml.py > "$NEW_FILE"
cp "$NEW_FILE" config.yml


@@ -0,0 +1,50 @@
#!/bin/bash
set -ex
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
# It is very important that this stays in sync with binary_populate_env.sh
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
# Clone the PyTorch branch
git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"
pushd "$PYTORCH_ROOT"
if [[ -n "$CIRCLE_PR_NUMBER" ]]; then
# "smoke" binary build on PRs
git fetch --force origin "pull/${CIRCLE_PR_NUMBER}/head:remotes/origin/pull/${CIRCLE_PR_NUMBER}"
git reset --hard "$CIRCLE_SHA1"
git checkout -q -B "$CIRCLE_BRANCH"
git reset --hard "$CIRCLE_SHA1"
elif [[ -n "$CIRCLE_SHA1" ]]; then
# "smoke" binary build on master on PR merges
git reset --hard "$CIRCLE_SHA1"
git checkout -q -B master
else
# nightly binary builds. These run at 05:05 UTC every day.
last_commit="$(git rev-list --before "$(date -u +%Y-%m-%d) 05:00" --max-count 1 HEAD)"
git checkout "$last_commit"
fi
git submodule update --init --recursive --quiet
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder master repo
git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
git fetch origin
git reset origin/master --hard
echo "Using builder from "
git --no-pager log --max-count 1
popd


@@ -0,0 +1,28 @@
#!/bin/bash
set -ex
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
source "/Users/distiller/project/env"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
source "/home/circleci/project/env"
else
# docker executor (binary builds)
source "/env"
fi
conda_sh="$workdir/install_miniconda.sh"
if [[ "$(uname)" == Darwin ]]; then
retry curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
else
retry curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
fi
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"
rm -f "$conda_sh"
export PATH="$MINICONDA_ROOT/bin:$PATH"
source "$MINICONDA_ROOT/bin/activate"
# We can't actually add miniconda to the PATH in the envfile, because that
# breaks 'unbuffer' in Mac jobs. This is probably because conda comes with
# a tclsh, which then gets inserted before the tclsh needed in /usr/bin


@@ -0,0 +1,30 @@
#!/bin/bash
echo "RUNNING ON $(uname -a) WITH $(nproc) CPUS AND $(free -m)"
set -ex
source /env
# Defaults here so they can be changed in one place
export MAX_JOBS=12
# Parse the parameters
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
build_script='conda/build_pytorch.sh'
elif [[ "$DESIRED_CUDA" == cpu ]]; then
build_script='manywheel/build_cpu.sh'
else
build_script='manywheel/build.sh'
fi
# We want to call unbuffer, which calls tclsh which finds the expect
# package. The expect was installed by yum into /usr/bin so we want to
# find /usr/bin/tclsh, but this is shadowed by /opt/conda/bin/tclsh in
# the conda docker images.
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
mkdir /just_tclsh_bin
ln -s /usr/bin/tclsh /just_tclsh_bin/tclsh
export PATH=/just_tclsh_bin:$PATH
fi
# Build the package
SKIP_ALL_TESTS=1 unbuffer "/builder/$build_script" | ts


@@ -0,0 +1,35 @@
#!/bin/bash
source /home/circleci/project/env
cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
# Set up Python
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda create -qyn testenv python="$DESIRED_PYTHON"
source activate testenv >/dev/null
elif [[ "$DESIRED_PYTHON" == 2.7mu ]]; then
export PATH="/opt/python/cp27-cp27mu/bin:\$PATH"
else
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
export PATH="/opt/python/cp\$python_nodot-cp\${python_nodot}m/bin:\$PATH"
fi
# Install the package
# These network calls should not have 'retry's because they are installing
# locally
pkg="/final_pkgs/\$(ls /final_pkgs)"
if [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "\$pkg" --offline
else
pip install "\$pkg"
fi
# Test the package
pushd /pytorch
/builder/run_tests.sh "$PACKAGE_TYPE" "$DESIRED_PYTHON" "$DESIRED_CUDA"
# =================== The above code will be executed inside Docker container ===================
EOL
echo "Prepared script to run in next step"
cat /home/circleci/project/ci_test_script.sh


@@ -0,0 +1,38 @@
#!/bin/bash
# Do NOT set -e
source /home/circleci/project/env
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
cat >/home/circleci/project/login_to_anaconda.sh <<EOL
set +x
echo "Trying to login to Anaconda"
yes | anaconda login \
--username "$PYTORCH_BINARY_SOUMITH_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_SOUMITH_CONDA_PASSWORD"
set -x
EOL
chmod +x /home/circleci/project/login_to_anaconda.sh
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -e ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -ex
export PATH="$MINICONDA_ROOT/bin:$PATH"
# Upload the package to the final location
pushd /home/circleci/project/final_pkgs
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry timeout 30 /home/circleci/project/login_to_anaconda.sh
anaconda upload "$(ls)" -u pytorch-testing --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
else
retry pip install -q awscli
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
fi


@@ -0,0 +1,24 @@
#!/bin/bash
set -ex
source "/Users/distiller/project/env"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
# For some reason `unbuffer` breaks if we change the PATH here, so we
# write a script with the PATH change in it and unbuffer the whole
# thing
build_script="$workdir/build_script.sh"
touch "$build_script"
chmod +x "$build_script"
# Build
cat >"$build_script" <<EOL
export PATH="$workdir/miniconda/bin:$PATH"
if [[ "$PACKAGE_TYPE" == conda ]]; then
"$workdir/builder/conda/build_pytorch.sh"
else
export TORCH_PACKAGE_NAME="$(echo $TORCH_PACKAGE_NAME | tr '-' '_')"
"$workdir/builder/wheel/build_wheel.sh"
fi
EOL
unbuffer "$build_script" | ts


@@ -0,0 +1,30 @@
#!/bin/bash
set -ex
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
pkg="$workdir/final_pkgs/$(ls $workdir/final_pkgs)"
# Don't test libtorch
# TODO we should test libtorch
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
exit 0
fi
# Create a new test env
# TODO cut all this out into a separate test job and have an entirely different
# miniconda
source deactivate || true
conda create -qyn test python="$DESIRED_PYTHON"
source activate test >/dev/null
# Install the package
if [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "$pkg" --offline
else
pip install "$pkg" --no-index --no-dependencies -v
fi
# Test
pushd "$workdir/pytorch"
$workdir/builder/run_tests.sh "$PACKAGE_TYPE" "$DESIRED_PYTHON" "$DESIRED_CUDA"


@@ -0,0 +1,38 @@
#!/bin/bash
# Do NOT set -e
export AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
cat >/Users/distiller/project/login_to_anaconda.sh <<EOL
set +x
echo "Trying to login to Anaconda"
yes | anaconda login \
--username "$PYTORCH_BINARY_SOUMITH_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_SOUMITH_CONDA_PASSWORD"
set -x
EOL
chmod +x /Users/distiller/project/login_to_anaconda.sh
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -e ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -ex
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
pushd "$workdir/final_pkgs"
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry /Users/distiller/project/login_to_anaconda.sh
retry anaconda upload "$(ls)" -u pytorch-testing --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
else
retry pip install -q awscli
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
fi


@@ -0,0 +1,102 @@
#!/bin/bash
set -ex
export TZ=UTC
# We need to write an envfile to persist these variables to following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
envfile="$workdir/env"
touch "$envfile"
chmod +x "$envfile"
# Parse the BUILD_ENVIRONMENT to package type, python, and cuda
configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
export DESIRED_DEVTOOLSET="${configs[3]}"
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
export BUILD_PYTHONLESS=1
fi
# Pick docker image
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="soumith/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="soumith/manylinux-cuda80"
else
export DOCKER_IMAGE="soumith/manylinux-cuda${DESIRED_CUDA:2}"
fi
# Upload to parallel folder for gcc abis
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
export PIP_UPLOAD_FOLDER='devtoolset7/'
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
echo "We don't handle conda builds with gcc ABI of 1, since we don't"
echo "want to add a new package name to the conda builds"
exit 1
fi
else
export PIP_UPLOAD_FOLDER=''
fi
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
export PYTORCH_BUILD_VERSION="1.1.0"
export PYTORCH_BUILD_NUMBER=1
cat >>"$envfile" <<EOL
# =================== The following code will be executed inside Docker container ===================
export TZ=UTC
echo "Running on $(uname -a) at $(date)"
export PACKAGE_TYPE="$PACKAGE_TYPE"
export DESIRED_PYTHON="$DESIRED_PYTHON"
export DESIRED_CUDA="$DESIRED_CUDA"
export LIBTORCH_VARIANT="$LIBTORCH_VARIANT"
export BUILD_PYTHONLESS="$BUILD_PYTHONLESS"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.1.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
export TORCH_PACKAGE_NAME='torch'
export TORCH_CONDA_BUILD_FOLDER='pytorch-1.1.0'
export NO_FBGEMM=1
export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
export DOCKER_IMAGE="$DOCKER_IMAGE"
export workdir="$workdir"
export MAC_PACKAGE_WORK_DIR="$workdir"
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
export MINICONDA_ROOT="$workdir/miniconda"
export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"
export CIRCLE_TAG="$CIRCLE_TAG"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="$CIRCLE_PR_NUMBER"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
# =================== The above code will be executed inside Docker container ===================
EOL
echo 'retry () {' >> "$envfile"
echo ' $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)' >> "$envfile"
echo '}' >> "$envfile"
echo 'export -f retry' >> "$envfile"
cat "$envfile"


@@ -0,0 +1,46 @@
#!/bin/bash
# This section is used in the binary_test and smoke_test jobs. It expects
# 'binary_populate_env' to have populated /home/circleci/project/env and it
# expects another section to populate /home/circleci/project/ci_test_script.sh
# with the code to run in the docker
# Expect all needed environment variables to be written to this file
source /home/circleci/project/env
echo "Running the following code in Docker"
cat /home/circleci/project/ci_test_script.sh
set -ex
# Expect actual code to be written to this file
chmod +x /home/circleci/project/ci_test_script.sh
# Run the docker
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d "${DOCKER_IMAGE}")
else
export id=$(docker run -t -d "${DOCKER_IMAGE}")
fi
# Copy the envfile and script with all the code to run into the docker.
docker cp /home/circleci/project/. "$id:/circleci_stuff"
# Copy built packages into the docker to test. This should only exist on the
# binary test jobs. The package should've been created from a binary build job,
# which persisted the package to a CircleCI workspace, which this job then
# copies into a GPU enabled docker for testing
if [[ -d "/home/circleci/project/final_pkgs" ]]; then
docker cp /home/circleci/project/final_pkgs "$id:/final_pkgs"
fi
# Copy the needed repos into the docker. These do not exist in the smoke test
# jobs, since the smoke test jobs do not need the PyTorch source code.
if [[ -d "$PYTORCH_ROOT" ]]; then
docker cp "$PYTORCH_ROOT" "$id:/pytorch"
fi
if [[ -d "$BUILDER_ROOT" ]]; then
docker cp "$BUILDER ROOT" "$id:/builder"
fi
# Execute the test script that was populated by an earlier section
export COMMAND='((echo "source /circleci_stuff/env && /circleci_stuff/ci_test_script.sh") | docker exec -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts


@@ -0,0 +1,25 @@
# There is currently no testing for libtorch TODO
# binary_linux_libtorch_2.7m_cpu_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cpu"
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_2.7m_cu80_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cu80"
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_2.7m_cu90_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cu90"
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_2.7m_cu100_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cu100"
# resource_class: gpu.medium
# <<: *binary_linux_test


@@ -0,0 +1,47 @@
# update_s3_htmls job
update_s3_htmls: &update_s3_htmls
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- run:
<<: *binary_checkout
# N.B. we do not run binary_populate_env. The only variable we need is
# PIP_UPLOAD_FOLDER (which is 'nightly/' for the nightlies and '' for
# releases, and sometimes other things for special cases). Instead we
# expect PIP_UPLOAD_FOLDER to be passed directly in the env. This is
# because, unlike all the other binary jobs, these jobs only get run once,
# in a separate workflow. They are not a step in other binary jobs like
# build, test, upload.
#
# You could attach this to every job, or include it in the upload step if
# you wanted. You would need to add binary_populate_env in this case to
# make sure it has the same upload folder as the job it's attached to. This
# function is idempotent, so it won't hurt anything; it's just a little
# unnecessary.
- run:
name: Update s3 htmls
no_output_timeout: "1h"
command: |
echo "declare -x \"AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}\"" >> /home/circleci/project/env
echo "declare -x \"AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}\"" >> /home/circleci/project/env
source /home/circleci/project/env
set -ex
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry pip install awscli==1.6
"/home/circleci/project/builder/cron/update_s3_htmls.sh"
# Update s3 htmls for the nightlies
update_s3_htmls_for_nightlies:
environment:
PIP_UPLOAD_FOLDER: "nightly/"
<<: *update_s3_htmls
# Update s3 htmls for the nightlies for devtoolset7
update_s3_htmls_for_nightlies_devtoolset7:
environment:
PIP_UPLOAD_FOLDER: "nightly/devtoolset7/"
<<: *update_s3_htmls


@@ -0,0 +1,288 @@
# WARNING: DO NOT EDIT THIS FILE DIRECTLY!!!
# See the README.md in this directory.
# IMPORTANT: To update Docker image version, please first update
# https://github.com/pytorch/ossci-job-dsl/blob/master/src/main/groovy/ossci/pytorch/DockerVersion.groovy and
# https://github.com/pytorch/ossci-job-dsl/blob/master/src/main/groovy/ossci/caffe2/DockerVersion.groovy,
# and then update DOCKER_IMAGE_VERSION at the top of the following files:
# * cimodel/data/pytorch_build_definitions.py
# * cimodel/data/caffe2_build_definitions.py
docker_config_defaults: &docker_config_defaults
user: jenkins
aws_auth:
# This IAM user only allows read-write access to ECR
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V3}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V3}
# This system setup script is meant to run before the CI-related scripts, e.g.,
# installing Git client, checking out code, setting up CI env, and
# building/testing.
setup_linux_system_environment: &setup_linux_system_environment
name: Set Up System Environment
no_output_timeout: "1h"
command: |
set -ex
# Set up CircleCI GPG keys for apt, if needed
curl -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add -
# Stop background apt updates. Hypothetically, the kill should not
# be necessary, because stop is supposed to send a kill signal to
# the process, but we've added it for good luck. Also
# hypothetically, it's supposed to be unnecessary to wait for
# the process to block. We also have that line for good luck.
# If you like, try deleting them and seeing if it works.
sudo systemctl stop apt-daily.service || true
sudo systemctl kill --kill-who=all apt-daily.service || true
sudo systemctl stop unattended-upgrades.service || true
sudo systemctl kill --kill-who=all unattended-upgrades.service || true
# wait until `apt-get update` has been killed
while systemctl is-active --quiet apt-daily.service
do
sleep 1;
done
while systemctl is-active --quiet unattended-upgrades.service
do
sleep 1;
done
# See if we actually were successful
systemctl list-units --all | cat
sudo apt-get purge -y unattended-upgrades
cat /etc/apt/sources.list
ps auxfww | grep [a]pt
ps auxfww | grep dpkg
install_doc_push_script: &install_doc_push_script
name: Install the doc push script
no_output_timeout: "2m"
command: |
cat >/home/circleci/project/doc_push_script.sh <<EOL
# =================== The following code **should** be executed inside Docker container ===================
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "doc_push_script.sh: Invoked with \$*"
git clone https://yf225:${GITHUB_PYTORCHBOT_TOKEN}@github.com/pytorch/pytorch.github.io -b site
pushd pytorch.github.io
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="\$1"
if [ -z "\$install_path" ]; then
echo "error: doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="\$2"
if [ -z "\$version" ]; then
echo "error: doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "\$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "\$3" != "" ]; then
dry_run=true
fi
echo "install_path: \$install_path version: \$version dry_run: \$dry_run"
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
rm -rf pytorch || true
# Install TensorBoard in python 3 so torch.utils.tensorboard classes render
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# Get all the documentation sources, put them in one place
pushd "\$pt_checkout"
git clone https://github.com/pytorch/vision
pushd vision
conda install -q pillow
time python setup.py install
popd
pushd docs
rm -rf source/torchvision
cp -a ../vision/docs/source source/torchvision
# Build the docs
pip -q install -r requirements.txt || true
if [ "\$is_master_doc" = true ]; then
make html
else
make html-stable
fi
# Move them into the docs repo
popd
popd
git rm -rf "\$install_path" || true
mv "\$pt_checkout/docs/build/html" "\$install_path"
# Add the version handler by search and replace.
# XXX: Consider moving this to the docs Makefile or site build
if [ "\$is_master_doc" = true ]; then
find "\$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
else
find "\$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.\S+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\$version \&#x25BC</a>@g"
fi
git add "\$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "auto-generating sphinx docs" || true
git status
if [ "\$dry_run" = false ]; then
echo "Pushing to pytorch.github.io:site"
git push origin site
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/doc_push_script.sh
# `setup_ci_environment` has to be run **after** the ``checkout`` step because
# it writes into the checkout directory and otherwise CircleCI will complain
# that
# Directory (/home/circleci/project) you are trying to checkout to is not empty and not git repository
setup_ci_environment: &setup_ci_environment
name: Set Up CI Environment After Checkout
no_output_timeout: "1h"
command: |
set -ex
# Check if we should actually run
echo "BUILD_ENVIRONMENT: ${BUILD_ENVIRONMENT}"
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST}"
if [[ "${BUILD_ENVIRONMENT}" == *-slow-* ]]; then
if ! [ -z "${CIRCLE_PULL_REQUEST}" ]; then
# It's a PR; test for [slow ci] tag on the TOPMOST commit
if !(git log --format='%B' -n 1 HEAD | grep -q -e '\[slow ci\]' -e '\[ci slow\]' -e '\[test slow\]' -e '\[slow test\]'); then
circleci step halt
exit
fi
fi
fi
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
if ! [ -z "${CIRCLE_PULL_REQUEST}" ]; then
# It's a PR; test for [xla ci] tag on the TOPMOST commit
if !(git log --format='%B' -n 1 HEAD | grep -q -e '\[xla ci\]' -e '\[ci xla\]' -e '\[test xla\]' -e '\[xla test\]'); then
# NB: This doesn't halt everything, just this job. So
# the rest of the workflow will keep going and you need
# to make sure you halt there too. Blegh.
circleci step halt
exit
fi
fi
fi
# Set up NVIDIA docker repo
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get -y update
sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic docker-ce
# WARNING: Docker version is hardcoded here; you must update the
# version number below for docker-ce and nvidia-docker2 to get newer
# versions of Docker. We hardcode these numbers because we kept
# getting broken CI when Docker would update their docker version,
# and nvidia-docker2 would be out of date for a day until they
# released a newer version of their package.
#
# How to figure out what the correct versions of these packages are?
# My preferred method is to start a Docker instance of the correct
# Ubuntu version (e.g., docker run -it ubuntu:16.04) and then ask
# apt what the packages you need are. Note that the CircleCI image
# comes with Docker.
sudo apt-get -y install \
linux-headers-$(uname -r) \
linux-image-generic \
moreutils \
docker-ce=5:18.09.4~3-0~ubuntu-xenial \
nvidia-container-runtime=2.0.0+docker18.09.4-1 \
nvidia-docker2=2.0.3+docker18.09.4-1 \
expect-dev
sudo pkill -SIGHUP dockerd
sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-410.104.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
fi
if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x COMMIT_SOURCE=${CIRCLE_BRANCH}" >> /home/circleci/project/env
echo "declare -x PYTHON_VERSION=${PYTHON_VERSION}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
echo "declare -x TORCH_CUDA_ARCH_LIST=5.2" >> /home/circleci/project/env
fi
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
echo "declare -x MAX_JOBS=${MAX_JOBS}" >> /home/circleci/project/env
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# This IAM user allows write access to S3 bucket for sccache & bazels3cache
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V1}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V1}" >> /home/circleci/project/env
else
# This IAM user allows write access to S3 bucket for sccache
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V3}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V3}" >> /home/circleci/project/env
fi
fi
# This IAM user only allows read-write access to ECR
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V3}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V3}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
macos_brew_update: &macos_brew_update
name: Brew update and install moreutils, expect and libomp
no_output_timeout: "1h"
command: |
set -ex
pwd
ls -lah
# moreutils installs a `parallel` executable by default, which conflicts
# with the executable from the GNU `parallel`, so we must unlink GNU
# `parallel` first, and relink it afterwards
brew update
brew unlink parallel
brew install moreutils
brew link parallel --overwrite
brew install expect
brew install libomp


@@ -0,0 +1,193 @@
pytorch_short_perf_test_gpu:
environment:
BUILD_ENVIRONMENT: pytorch-short-perf-test-gpu
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn7-py3:300"
PYTHON_VERSION: "3.6"
USE_CUDA_DOCKER_RUNTIME: "1"
resource_class: gpu.medium
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Perf Test
no_output_timeout: "1h"
command: |
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
docker cp $id:/var/lib/jenkins/workspace/env /home/circleci/project/env
# This IAM user allows write access to S3 bucket for perf test numbers
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_PERF_TEST_S3_BUCKET_V3}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_PERF_TEST_S3_BUCKET_V3}" >> /home/circleci/project/env
docker cp /home/circleci/project/env $id:/var/lib/jenkins/workspace/env
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/short-perf-test-gpu.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
pytorch_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn7-py3:300"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *install_doc_push_script
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
docker cp /home/circleci/project/doc_push_script.sh $id:/var/lib/jenkins/workspace/doc_push_script.sh
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/master master") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open for merging v1.1.0 -> master for this job.
# XXX: The following code is only run on the v1.1.0 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.1.0" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/stable 1.1.0") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./doc_push_script.sh docs/master master dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
pytorch_macos_10_13_py3_build:
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V3}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V3}
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
mkdir -p /Users/distiller/pytorch-ci-env/workspace
# copy with -a to preserve relative structure (e.g., symlinks), and be recursive
cp -a /Users/distiller/project/. /Users/distiller/pytorch-ci-env/workspace
- persist_to_workspace:
root: /Users/distiller/pytorch-ci-env
paths:
- "*"
pytorch_macos_10_13_py3_test:
macos:
xcode: "9.0"
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /Users/distiller/pytorch-ci-env
sudo chmod -R 777 /Users/distiller/pytorch-ci-env
- attach_workspace:
at: /Users/distiller/pytorch-ci-env
- run:
<<: *macos_brew_update
- run:
name: Test
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# copy with -a to preserve relative structure (e.g., symlinks), and be recursive
cp -a /Users/distiller/pytorch-ci-env/workspace/. /Users/distiller/project
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# Install CUDA 9.2
sudo rm -rf ~/cuda_9.2.64_mac_installer.app || true
curl https://s3.amazonaws.com/ossci-macos/cuda_9.2.64_mac_installer.zip -o ~/cuda_9.2.64_mac_installer.zip
unzip ~/cuda_9.2.64_mac_installer.zip -d ~/
sudo ~/cuda_9.2.64_mac_installer.app/Contents/MacOS/CUDAMacOSXInstaller --accept-eula --no-window
sudo cp /usr/local/cuda/lib/libcuda.dylib /Developer/NVIDIA/CUDA-9.2/lib/libcuda.dylib
sudo rm -rf /usr/local/cuda || true
# Install cuDNN 7.1 for CUDA 9.2
curl https://s3.amazonaws.com/ossci-macos/cudnn-9.2-osx-x64-v7.1.tgz -o ~/cudnn-9.2-osx-x64-v7.1.tgz
rm -rf ~/cudnn-9.2-osx-x64-v7.1 && mkdir ~/cudnn-9.2-osx-x64-v7.1
tar -xzvf ~/cudnn-9.2-osx-x64-v7.1.tgz -C ~/cudnn-9.2-osx-x64-v7.1
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/include/
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/lib/libcudnn* /Developer/NVIDIA/CUDA-9.2/lib/
sudo chmod a+r /Developer/NVIDIA/CUDA-9.2/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/lib/libcudnn*
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V3}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V3}
git submodule sync && git submodule update -q --init
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts


@@ -0,0 +1,94 @@
# binary linux build defaults
##############################################################################
binary_linux_build: &binary_linux_build
resource_class: 2xlarge+
steps:
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Install unbuffer and ts
command: |
set -ex
source /env
retry yum -q -y install epel-release
retry yum -q -y install expect moreutils
- run:
name: Upgrade gcc version (based on env var)
command: |
set -ex
source /env
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
source "/builder/upgrade_gcc_abi.sh"
# Env variables are not persisted into the next step
echo "export PATH=$PATH" > /env
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" > /env
else
echo "Not upgrading gcc version"
fi
- run:
name: Build
no_output_timeout: "1h"
command: |
source "/pytorch/.circleci/scripts/binary_linux_build.sh"
- persist_to_workspace:
root: /
paths: final_pkgs
# This should really just be another step of the binary_linux_build job above.
# This isn't possible right now b/c the build job uses the docker executor
# (otherwise they'd be really really slow) but this one uses the machine
# executor (b/c we have to run the docker with --runtime=nvidia and we can't do
# that on the docker executor)
binary_linux_test: &binary_linux_test
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- attach_workspace:
at: /home/circleci/project
# This checkout is only needed to access
# .circleci/scripts/binary_linux_test.sh, which can't be inlined because it
# blows up the yaml size.
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Prepare test code
no_output_timeout: "1h"
command: |
source "/home/circleci/project/pytorch/.circleci/scripts/binary_linux_test.sh"
- run:
<<: *binary_run_in_docker
binary_linux_upload: &binary_linux_upload
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- attach_workspace:
at: /home/circleci/project
# This checkout is only needed to access
# .circleci/scripts/binary_linux_upload.sh, which can't be inlined because it
# blows up the yaml size.
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *binary_install_miniconda
- run:
name: Upload
no_output_timeout: "1h"
command: |
source "/home/circleci/project/pytorch/.circleci/scripts/binary_linux_upload.sh"


@@ -0,0 +1,190 @@
##############################################################################
# Linux build defaults
##############################################################################
pytorch_linux_build_defaults: &pytorch_linux_build_defaults
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- checkout
- run:
<<: *setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
git submodule sync && git submodule update -q --init
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
docker push ${COMMIT_DOCKER_IMAGE}
fi
pytorch_linux_test_defaults: &pytorch_linux_test_defaults
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Test
no_output_timeout: "90m"
command: |
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
if [ -n "${MULTI_GPU}" ]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
caffe2_linux_build_defaults: &caffe2_linux_build_defaults
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- checkout
- run:
<<: *setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
cat >/home/circleci/project/ci_build_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
sudo chown -R jenkins:jenkins '/opt/conda'
fi
# Build
./.jenkins/caffe2/build.sh
# Show sccache stats if it is running
if pgrep sccache > /dev/null; then
sccache --show-stats
fi
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_build_script.sh
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
docker push ${COMMIT_DOCKER_IMAGE}
fi
caffe2_linux_test_defaults: &caffe2_linux_test_defaults
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Test
no_output_timeout: "1h"
command: |
set -e
# TODO: merge this into Caffe2 test.sh
cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# libdc1394 (dependency of OpenCV) expects /dev/raw1394 to exist...
sudo ln /dev/null /dev/raw1394
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
fi
# Upgrade SSL module to avoid old SSL warnings
pip -q install --user --upgrade pyOpenSSL ndg-httpsclient pyasn1
pip -q install --user -b /tmp/pip_install_onnx "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
# Build
./.jenkins/caffe2/test.sh
# Remove benign core dumps.
# These are tests for signal handling (including SIGABRT).
rm -f ./crash/core.fatal_signal_as.*
rm -f ./crash/core.logging_test.*
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_test_script.sh
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_test_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts


@@ -0,0 +1,61 @@
##############################################################################
# Macos binary build defaults
# The root of everything is /Users/distiller/pytorch-ci-env/workspace
##############################################################################
binary_mac_build: &binary_mac_build
macos:
xcode: "9.0"
steps:
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *macos_brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -ex
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- run:
name: Test
no_output_timeout: "1h"
command: |
set -ex
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
binary_mac_upload: &binary_mac_upload
macos:
xcode: "9.0"
steps:
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *macos_brew_update
- run:
<<: *binary_install_miniconda
- attach_workspace:
at: /Users/distiller/project
- run:
name: Upload
no_output_timeout: "10m"
command: |
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
cat "$script"
source "$script"


@@ -0,0 +1,83 @@
##############################################################################
# Macos build defaults
##############################################################################
caffe2_macos_build_defaults: &caffe2_macos_build_defaults
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
brew install cmake
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
# Use Homebrew Python if configured to do so
if [ "${PYTHON_INSTALLATION}" == "homebrew" ]; then
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
fi
pip -q install numpy
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl -o ${TMPDIR}/conda.sh https://repo.continuum.io/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
chmod +x ${TMPDIR}/conda.sh
/bin/bash ${TMPDIR}/conda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/conda.sh
export PATH="${TMPDIR}/anaconda/bin:${PATH}"
source ${TMPDIR}/anaconda/bin/activate
fi
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V3}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V3}
export SCCACHE_BIN=${PWD}/sccache_bin
mkdir -p ${SCCACHE_BIN}
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${SCCACHE_BIN}/clang++"
chmod a+x "${SCCACHE_BIN}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${SCCACHE_BIN}/clang"
chmod a+x "${SCCACHE_BIN}/clang"
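# Note: the wrappers above forward arguments with an unquoted $* (the
# Linux build scripts' wrappers use "$@" instead), so arguments containing
# spaces would be re-split; that is usually harmless for compiler flags.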
export PATH="${SCCACHE_BIN}:$PATH"
fi
# Build
if [ "${BUILD_IOS:-0}" -eq 1 ]; then
unbuffer scripts/build_ios.sh 2>&1 | ts
elif [ -n "${CAFFE2_USE_ANACONDA}" ]; then
# All conda build logic should be in scripts/build_anaconda.sh
unbuffer scripts/build_anaconda.sh 2>&1 | ts
else
unbuffer scripts/build_local.sh 2>&1 | ts
fi
# Show sccache stats if it is running
if which sccache > /dev/null; then
sccache --show-stats
fi


@@ -0,0 +1,95 @@
##############################################################################
# Binary build (nightly builds) defaults
# The binary builds use the docker executor b/c at time of writing the machine
# executor is limited to only two cores and is painfully slow (4.5+ hours per
# GPU build). But the docker executor cannot be run with --runtime=nvidia, and
# so the binary test/upload jobs must run on a machine executor. The package
# built in the build job is persisted to the workspace, which the test jobs
# expect. The test jobs just run a few quick smoke tests (very similar to the
# second-round-user-facing smoke tests above) and then upload the binaries to
# their final locations. The upload part requires credentials that should only
# be available to org-members.
#
# binary_checkout MUST be run before other commands here. This is because the
# other commands are written in .circleci/scripts/*.sh , so the pytorch source
# code must be downloaded on the machine before they can be run. We cannot
# inline all the code into this file, since that would cause the yaml size to
# explode past 4 MB (all the code in the command section is just copy-pasted to
# everywhere in the .circleci/config.yml file where it appears).
##############################################################################
# Checks out the Pytorch and Builder repos (always both of them), and places
# them in the right place depending on what executor we're running on. We curl
# our .sh file from the interweb to avoid yaml size bloat. Note that many jobs
# do not need both the pytorch and builder repos, so this is a little wasteful
# (smoke tests and upload jobs do not need the pytorch repo).
binary_checkout: &binary_checkout
name: Checkout
command: |
if [[ -n "$CIRCLE_SHA1" ]]; then
# we are on a PR or on a master-merge, but not on a timed build
git_url="https://raw.githubusercontent.com/pytorch/pytorch/$CIRCLE_SHA1"
else
# scheduled nightly binary builds. These run at 05:05 UTC every day.
last_commit="$(git rev-list --before "$(date -u +%Y-%m-%d) 05:00" --max-count 1 HEAD)"
git_url="https://raw.githubusercontent.com/pytorch/pytorch/$last_commit"
fi
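# With the fixed 05:00 UTC cutoff, all of a day's nightly jobs resolve the
# same commit, so the scripts fetched below match the snapshot being built.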
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
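# retry runs its arguments up to five times, sleeping 1/2/4/8 seconds
# between attempts (simple exponential backoff), and fails only if all
# five attempts fail.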
retry curl -s "$git_url/.circleci/scripts/binary_checkout.sh" -o "binary_checkout.sh"
cat "binary_checkout.sh"
source "binary_checkout.sh"
# Parses circleci arguments in a consistent way, essentially routing to the
# correct pythonXgccXcudaXos build we want
binary_populate_env: &binary_populate_env
name: Set up env
command: |
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
script="$workdir/pytorch/.circleci/scripts/binary_populate_env.sh"
cat "$script"
source "$script"
binary_install_miniconda: &binary_install_miniconda
name: Install miniconda
no_output_timeout: "1h"
command: |
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
script="$workdir/pytorch/.circleci/scripts/binary_install_miniconda.sh"
cat "$script"
source "$script"
# This section is used in the binary_test and smoke_test jobs. It expects
# 'binary_populate_env' to have populated /home/circleci/project/env and it
# expects another section to populate /home/circleci/project/ci_test_script.sh
# with the code to run in the docker
binary_run_in_docker: &binary_run_in_docker
name: Run in docker
command: |
# This step only runs on circleci linux machine executors that themselves
# need to start docker images
script="/home/circleci/project/pytorch/.circleci/scripts/binary_install_miniconda.sh"
cat "$script"
source "$script"


@@ -0,0 +1,49 @@
# Nightly build smoke tests defaults
# These are the second-round smoke tests. These make sure that the binaries are
# correct from a user perspective, testing that they exist in the cloud and
# are runnable. Note that the pytorch repo is never cloned into these jobs.
##############################################################################
smoke_linux_test: &smoke_linux_test
machine:
image: ubuntu-1604:201903-01
steps:
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -ex
cat >/home/circleci/project/ci_test_script.sh <<EOL
# The following code will be executed inside Docker container
set -ex
git clone https://github.com/pytorch/builder.git /builder
/builder/smoke_test.sh
# The above code will be executed inside Docker container
EOL
- run:
<<: *binary_run_in_docker
smoke_mac_test: &smoke_mac_test
macos:
xcode: "9.0"
steps:
- run:
<<: *binary_populate_env
- run:
<<: *macos_brew_update
- run:
name: Build
no_output_timeout: "1h"
command: |
set -ex
source "/Users/distiller/project/env"
git clone https://github.com/pytorch/builder.git
unbuffer ./builder/smoke_test.sh | ts


@@ -0,0 +1,4 @@
##############################################################################
# Daily binary build trigger
##############################################################################


@@ -0,0 +1,26 @@
# Binary builds (subset, to smoke test that they'll work)
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_build
- binary_linux_manywheel_3.7m_cu100_devtoolset3_build
- binary_linux_conda_2.7_cpu_build
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_build
- binary_linux_libtorch_2.7m_cu80_devtoolset3_build
- binary_macos_wheel_3.6_cpu_build
- binary_macos_conda_2.7_cpu_build
- binary_macos_libtorch_2.7_cpu_build
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_test:
requires:
- binary_linux_manywheel_2.7mu_cpu_devtoolset3_build
- binary_linux_manywheel_3.7m_cu100_devtoolset3_test:
requires:
- binary_linux_manywheel_3.7m_cu100_devtoolset3_build
- binary_linux_conda_2.7_cpu_test:
requires:
- binary_linux_conda_2.7_cpu_build
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_test:
# requires:
# - binary_linux_conda_3.6_cu90_build


@@ -0,0 +1,14 @@
#- binary_linux_libtorch_2.7m_cpu_test:
# requires:
# - binary_linux_libtorch_2.7m_cpu_build
#- binary_linux_libtorch_2.7m_cu80_test:
# requires:
# - binary_linux_libtorch_2.7m_cu80_build
#- binary_linux_libtorch_2.7m_cu90_test:
# requires:
# - binary_linux_libtorch_2.7m_cu90_build
#- binary_linux_libtorch_2.7m_cu100_test:
# requires:
# - binary_linux_libtorch_2.7m_cu100_build
# Nightly uploads


@@ -0,0 +1,8 @@
# Pytorch MacOS builds
- pytorch_macos_10_13_py3_build
- pytorch_macos_10_13_py3_test:
requires:
- pytorch_macos_10_13_py3_build
- pytorch_macos_10_13_cuda9_2_cudnn7_py3_build


@@ -0,0 +1,15 @@
# Scheduled to run 4 hours after the binary jobs start
update_s3_htmls:
triggers:
- schedule:
cron: "0 9 * * *"
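# 09:00 UTC daily, i.e. four hours after the 05:05 UTC nightly binary builds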
filters:
branches:
only:
- master
jobs:
- update_s3_htmls_for_nightlies:
context: org-member
- update_s3_htmls_for_nightlies_devtoolset7:
context: org-member


@@ -0,0 +1,12 @@
##############################################################################
##############################################################################
# Workflows
##############################################################################
##############################################################################
# PR jobs / PR builds
workflows:
version: 2
build:
jobs:


@@ -1,43 +1,34 @@
---
# NOTE: there must be no spaces before the '-', so put the comma first.
# NOTE there must be no spaces before the '-', so put the comma first.
Checks: '
*
,modernize-*
,-cert-err58-cpp
,-cert-err60-cpp
,-clang-diagnostic-*
-*
,bugprone-*
,-bugprone-forward-declaration-namespace
,-bugprone-macro-parentheses
,cppcoreguidelines-*
,-cppcoreguidelines-interfaces-global-init
,-cppcoreguidelines-owning-memory
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-type-union-access
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,-fuchsia-*
,-google-build-using-namespace
,-google-explicit-constructor
,-google-readability-braces-around-statements
,-google-readability-namespace-comments
,-google-readability-todo
,-google-runtime-references
,-google-runtime-references
,-hicpp-braces-around-statements
,-hicpp-explicit-conversions
,-hicpp-no-array-decay
,-hicpp-special-member-functions
,-hicpp-vararg
,-llvm-header-guard
,-llvm-namespace-comment
,-misc-unused-parameters
,-modernize-make-unique
,hicpp-exception-baseclass
,hicpp-avoid-goto
,modernize-*
,-modernize-return-braced-init-list
,-modernize-use-auto
,-modernize-use-default-member-init
,-performance-unnecessary-value-param
,-readability-braces-around-statements
,-readability-else-after-return
,-readability-named-parameter
,clang-analyzer-*
,-modernize-use-using
,performance-*
,-performance-noexcept-move-constructor
'
WarningsAsErrors: ''
HeaderFilterRegex: 'torch/csrc/'
WarningsAsErrors: '*'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
CheckOptions:
...

.ctags.d/pytorch.ctags Normal file

@@ -0,0 +1,2 @@
--exclude=build/*
--exclude=include/*

.flake8 Normal file

@@ -0,0 +1,12 @@
[flake8]
select = B,C,E,F,P,T4,W,B9
max-line-length = 120
# C408 ignored because we like the dict keyword argument syntax
# E501 is not flexible enough, we're using B950 instead
ignore =
E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,
# these ignores are from flake8-bugbear; please fix!
B007,B008,
# these ignores are from flake8-comprehensions; please fix!
C400,C401,C402,C403,C404,C405,C407,C411,
exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,tools/amd_build/pyHIPIFY,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi

.gitattributes vendored Normal file

@@ -0,0 +1 @@
*.bat text eol=crlf

.github/ISSUE_TEMPLATE/bug-report.md vendored Normal file

@@ -0,0 +1,49 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve PyTorch
---
## 🐛 Bug
<!-- A clear and concise description of what the bug is. -->
## To Reproduce
Steps to reproduce the behavior:
1.
1.
1.
<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
## Expected behavior
<!-- A clear and concise description of what you expected to happen. -->
## Environment
Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).
You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
## Additional context
<!-- Add any other context about the problem here. -->


@@ -0,0 +1,9 @@
---
name: "\U0001F4DA Documentation"
about: Report an issue related to https://pytorch.org/docs
---
## 📚 Documentation
<!-- A clear and concise description of what content in https://pytorch.org/docs is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new -->


@@ -0,0 +1,24 @@
---
name: "\U0001F680Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---
## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
## Pitch
<!-- A clear and concise description of what you want to happen. -->
## Alternatives
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
## Additional context
<!-- Add any other context or screenshots about the feature request here. -->


@@ -0,0 +1,13 @@
---
name: "❓Questions/Help/Support"
about: Do you need support? We have resources.
---
## ❓ Questions and Help
### Please note that this issue tracker is not a help form and this issue will be closed.
We have a set of [listed resources available on the website](https://pytorch.org/resources). Our primary means of support is our discussion forum:
- [Discussion Forum](https://discuss.pytorch.org/)

.gitignore vendored

@@ -22,24 +22,33 @@
aten/build/
aten/src/ATen/Config.h
aten/src/ATen/cuda/CUDAConfig.h
build/
caffe2/cpp_test/
dist/
docs/src/**/*
docs/cpp/build
docs/cpp/source/api
test/.coverage
test/.hypothesis/
test/cpp/api/mnist
test/custom_operator/model.pt
test/data/gpu_tensors.pt
test/data/legacy_modules.t7
test/data/legacy_serialized.pt
test/data/linear.pt
dropout_model.pt
test/generated_type_hints_smoketest.py
test/htmlcov
test/cpp_extensions/install/
third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/
torch/__init__.pyi
torch/csrc/autograd/generated/*
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
torch/csrc/generic/TensorMethods.cpp
torch/csrc/jit/generated/*
torch/csrc/jit/fuser/config.h
torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THCUNN.cwrap
torch/csrc/nn/THNN_generic.cpp
@@ -47,20 +56,35 @@ torch/csrc/nn/THNN_generic.cwrap
torch/csrc/nn/THNN_generic.h
torch/csrc/nn/THNN.cpp
torch/csrc/nn/THNN.cwrap
torch/bin/
torch/cmake/
torch/lib/*.a*
torch/lib/*.dll*
torch/lib/*.exe*
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/*.lib
torch/lib/*.so*
torch/lib/protobuf*.pc
torch/lib/build
torch/lib/caffe2/
torch/lib/cmake
torch/lib/include
torch/lib/pkgconfig
torch/lib/protoc
torch/lib/protobuf/
torch/lib/tmp_install
torch/lib/torch_shm_manager
torch/lib/site-packages/
torch/lib/python*
torch/lib64
torch/include/
torch/share/
torch/test/
torch/version.py
# Root level file used in CI to specify certain env configs.
# E.g., see .circleci/config.yaml
env
# IPython notebook checkpoints
.ipynb_checkpoints
@@ -140,13 +164,12 @@ docs/source/scripts/activation_images/
# PyCharm files
.idea
# Visual Studio Code files
.vscode
.vs
# OSX dir files
.DS_Store
# GDB history
.gdb_history
## Caffe2
# build, distribute, and bins (+ python proto bindings)
@@ -181,7 +204,6 @@ docs/dev
*.sst
*.ldb
LOCK
LOG*
CURRENT
MANIFEST-*
@@ -194,3 +216,26 @@ caffe2.egg-info
# Atom/Watchman required file
.watchmanconfig
# Files generated by CLion
cmake-build-debug
# Files generated by ctags
CTAGS
tags
TAGS
# BEGIN NOT-CLEAN-FILES (setup.py handles this marker. Do not change.)
#
# Below files are not deleted by "setup.py clean".
# Visual Studio Code files
.vscode
.vs
# YouCompleteMe config file
.ycm_extra_conf.py
# Files generated when a patch is rejected
*.orig
*.rej

.gitmodules vendored

@@ -1,9 +1,3 @@
[submodule "third_party/catch"]
path = third_party/catch
url = https://github.com/catchorg/Catch2.git
[submodule "third_party/nanopb"]
path = third_party/nanopb
url = https://github.com/nanopb/nanopb.git
[submodule "third_party/pybind11"]
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
@@ -16,9 +10,6 @@
[submodule "third_party/googletest"]
path = third_party/googletest
url = https://github.com/google/googletest.git
[submodule "third_party/nervanagpu"]
path = third_party/nervanagpu
url = https://github.com/NervanaSystems/nervanagpu.git
[submodule "third_party/benchmark"]
path = third_party/benchmark
url = https://github.com/google/benchmark.git
@@ -61,21 +52,33 @@
[submodule "third_party/python-six"]
path = third_party/python-six
url = https://github.com/benjaminp/six.git
[submodule "third_party/ComputeLibrary"]
path = third_party/ComputeLibrary
url = https://github.com/ARM-software/ComputeLibrary.git
[submodule "third_party/onnx"]
path = third_party/onnx
url = https://github.com/onnx/onnx.git
[submodule "third_party/cereal"]
path = third_party/cereal
url = https://github.com/USCiLab/cereal
[submodule "third_party/onnx-tensorrt"]
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
[submodule "third_party/sleef"]
path = third_party/sleef
url = https://github.com/shibatch/sleef
url = https://github.com/zdevito/sleef
[submodule "third_party/ideep"]
path = third_party/ideep
url = https://github.com/intel/ideep
[submodule "third_party/nccl/nccl"]
path = third_party/nccl/nccl
url = https://github.com/NVIDIA/nccl
[submodule "third_party/gemmlowp/gemmlowp"]
path = third_party/gemmlowp/gemmlowp
url = https://github.com/google/gemmlowp.git
[submodule "third_party/QNNPACK"]
path = third_party/QNNPACK
url = https://github.com/pytorch/QNNPACK
[submodule "third_party/neon2sse"]
path = third_party/neon2sse
url = https://github.com/intel/ARM_NEON_2_x86_SSE.git
[submodule "third_party/fbgemm"]
path = third_party/fbgemm
url = https://github.com/pytorch/fbgemm
[submodule "third_party/foxi"]
path = third_party/foxi
url = https://github.com/houseroad/foxi.git

.jenkins/caffe2/bench.sh Executable file

@@ -0,0 +1,46 @@
#!/bin/bash
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Anywhere except $ROOT_DIR should work. This is so the python import doesn't
# get confused by any 'caffe2' directory in cwd
cd "$INSTALL_PREFIX"
if [[ $BUILD_ENVIRONMENT == *-cuda* ]]; then
num_gpus=$(nvidia-smi -L | wc -l)
elif [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
num_gpus=$(rocminfo | grep 'Device Type.*GPU' | wc -l)
else
num_gpus=0
fi
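# (nvidia-smi -L prints one line per visible GPU, and rocminfo prints one
# "Device Type ... GPU" line per GPU agent, so `wc -l` yields the GPU count)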
caffe2_pypath="$(cd /usr && $PYTHON -c 'import os; import caffe2; print(os.path.dirname(os.path.realpath(caffe2.__file__)))')"
# Resnet50
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --use_cpu
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 2
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --train_data null --batch_size 512 --epoch_size 51200 --num_epochs 2 --num_gpus 4
fi
# ResNext
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/resnet50_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4
fi


@@ -2,46 +2,72 @@
set -ex
# The INSTALL_PREFIX here must match up with test.sh
INSTALL_PREFIX="/usr/local/caffe2"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
CMAKE_ARGS=()
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# TODO: Migrate all centos jobs to use proper devtoolset
if [[ "$BUILD_ENVIRONMENT" == *py2-cuda9.0-cudnn7-centos7* ]]; then
# There is a bug in the pango package on CentOS 7 that causes undefined
# symbols, upgrading glib2 to >=2.56.1 solves the issue. See
# https://bugs.centos.org/view.php?id=15495
sudo yum install -y -q glib2-2.56.1
fi
# Setup SCCACHE
###############################################################################
# Setup sccache if SCCACHE_BUCKET is set
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
# CMAKE_ARGS are only passed to 'cmake' and the -Dfoo=bar does not work with
# setup.py, so we build a list of foo=bars and then either convert it to
# -Dfoo=bars or export them before running setup.py
build_args=()
build_to_cmake () {
cmake_args=()
for build_arg in $*; do
cmake_args+=("-D$build_arg")
done
echo ${cmake_args[@]}
}
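# Example with hypothetical values: `build_to_cmake BLAS=MKL USE_CUDA=ON`
# echoes "-DBLAS=MKL -DUSE_CUDA=ON" for the cmake path; for setup.py builds
# the same foo=bar entries are instead exported further down.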
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which nvcc) \"\$@\""
) > "./sccache/nvcc"
chmod +x "./sccache/nvcc"
SCCACHE="$(which sccache)"
if [ "$(which gcc)" != "/root/sccache/gcc" ]; then
# Setup SCCACHE
###############################################################################
# Setup sccache if SCCACHE_BUCKET is set
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Setup wrapper scripts
wrapped="cc c++ gcc g++ x86_64-linux-gnu-gcc"
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
wrapped="$wrapped nvcc"
fi
for compiler in $wrapped; do
(
echo "#!/bin/sh"
# TODO: if/when sccache gains native support for an
# SCCACHE_DISABLE flag analogous to ccache's CCACHE_DISABLE,
# this can be removed. Alternatively, this can be removed when
# https://github.com/pytorch/pytorch/issues/13362 is fixed.
#
# NOTE: carefully quoted - we want `which compiler` to be
# resolved as we execute the script, but SCCACHE_DISABLE and
# $@ to be evaluated when we execute the script
echo 'test $SCCACHE_DISABLE && exec '"$(which $compiler)"' "$@"'
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
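# For illustration, assuming gcc and sccache both resolve under /usr/bin,
# the generated ./sccache/gcc wrapper would read:
#   #!/bin/sh
#   test $SCCACHE_DISABLE && exec /usr/bin/gcc "$@"
#   exec /usr/bin/sccache /usr/bin/gcc "$@"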
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
# Setup ccache if configured to use it (and not sccache)
@@ -59,6 +85,15 @@ if [ -z "${SCCACHE}" ] && which ccache > /dev/null; then
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
# sccache will fail for CUDA builds if all cores are used for compiling
if [ -z "$MAX_JOBS" ]; then
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]] && [ -n "${SCCACHE}" ]; then
MAX_JOBS=`expr $(nproc) - 1`
else
MAX_JOBS=$(nproc)
fi
fi
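# e.g. on an 8-core executor (illustrative), this yields MAX_JOBS=7 for
# sccache+CUDA builds and MAX_JOBS=8 otherwise, unless MAX_JOBS is preset.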
report_compile_cache_stats() {
if [[ -n "${SCCACHE}" ]]; then
"$SCCACHE" --show-stats
@@ -67,73 +102,61 @@ report_compile_cache_stats() {
fi
}
###############################################################################
# Explicitly set Python executable.
###############################################################################
# On Ubuntu 16.04 the default Python is still 2.7.
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
CMAKE_ARGS+=("-DPYTHON_EXECUTABLE=${PYTHON}")
fi
###############################################################################
# Use special scripts for Android, conda, and setup builds
# Use special scripts for Android and setup builds
###############################################################################
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
CMAKE_ARGS+=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DBUILD_TEST=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" ${CMAKE_ARGS[*]} "$@"
exit 0
elif [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
"${ROOT_DIR}/scripts/build_anaconda.sh" --skip-tests --install-locally "$@"
report_compile_cache_stats
# This build will be tested against onnx tests, which needs onnx installed.
# At this point the visible protobuf installation will be in conda, since one
# of Caffe2's dependencies uses conda, so the correct protobuf include
# headers are those in conda as well
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PROTOBUF_INCDIR=/opt/conda/include pip install -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
exit 0
elif [[ $BUILD_ENVIRONMENT == *setup* ]]; then
rm -rf $INSTALL_PREFIX && mkdir $INSTALL_PREFIX
PYTHONPATH=$INSTALL_PREFIX $PYTHON setup_caffe2.py develop --install-dir $INSTALL_PREFIX
build_args+=("BUILD_BINARY=ON")
build_args+=("BUILD_TEST=ON")
build_args+=("USE_OBSERVERS=ON")
build_args+=("USE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" $(build_to_cmake ${build_args[@]}) "$@"
exit 0
fi
###############################################################################
# Set cmake args
# Set parameters
###############################################################################
CMAKE_ARGS+=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DBUILD_TEST=ON")
CMAKE_ARGS+=("-DINSTALL_TEST=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
CMAKE_ARGS+=("-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}")
if [[ $BUILD_ENVIRONMENT == *-aten-* ]]; then
if [[ CMAKE_ARGS != *USE_ATEN* ]] && [[ CMAKE_ARGS != *BUILD_ATEN* ]]; then
CMAKE_ARGS+=("-DBUILD_ATEN=ON")
fi
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
build_args+=("BUILD_PYTHON=OFF")
else
build_args+=("BUILD_PYTHON=ON")
build_args+=("PYTHON_EXECUTABLE=${PYTHON}")
fi
if [[ $BUILD_ENVIRONMENT == *mkl* ]]; then
CMAKE_ARGS+=("-DBLAS=MKL")
build_args+=("BLAS=MKL")
build_args+=("USE_MKLDNN=ON")
fi
build_args+=("BUILD_BINARY=ON")
build_args+=("BUILD_TEST=ON")
build_args+=("INSTALL_TEST=ON")
build_args+=("USE_ZSTD=ON")
if [[ $BUILD_ENVIRONMENT == *py2-cuda9.0-cudnn7-ubuntu16.04* ]]; then
# removing http:// duplicate in favor of nvidia-ml.list
# which is https:// version of the same repo
sudo rm -f /etc/apt/sources.list.d/nvidia-machine-learning.list
curl -o ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
sudo dpkg -i ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
sudo apt-key add /var/nvinfer-runtime-trt-repo-5.0.2-ga-cuda9.0/7fa2af80.pub
sudo apt-get -qq update
sudo apt-get install -y --no-install-recommends libnvinfer5=5.0.2-1+cuda9.0 libnvinfer-dev=5.0.2-1+cuda9.0
rm ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
build_args+=("USE_TENSORRT=ON")
fi
if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
CMAKE_ARGS+=("-DUSE_CUDA=ON")
CMAKE_ARGS+=("-DCUDA_ARCH_NAME=Maxwell")
CMAKE_ARGS+=("-DUSE_NNPACK=OFF")
build_args+=("USE_CUDA=ON")
build_args+=("USE_NNPACK=OFF")
# Target only our CI GPU machine's CUDA arch to speed up the build
build_args+=("TORCH_CUDA_ARCH_LIST=Maxwell")
# Explicitly set path to NVCC such that the symlink to ccache or sccache is used
CMAKE_ARGS+=("-DCUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/nvcc")
build_args+=("CUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/nvcc")
# Ensure FindCUDA.cmake can infer the right path to the CUDA toolkit.
# Setting PATH to resolve to the right nvcc alone isn't enough.
@@ -144,81 +167,103 @@ if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# TODO: This is patching the official FindHIP to properly handle
# cmake generator expressions. A PR is open in the upstream repo here:
# https://github.com/ROCm-Developer-Tools/HIP/pull/516
# remove this hack once it's merged.
if [[ -f /opt/rocm/hip/cmake/FindHIP.cmake ]]; then
sudo sed -i 's/\ -I${dir}/\ $<$<BOOL:${dir}>:-I${dir}>/' /opt/rocm/hip/cmake/FindHIP.cmake
fi
build_args+=("USE_ROCM=ON")
# This is needed to enable ImageInput operator in resnet50_trainer
build_args+=("USE_OPENCV=ON")
# This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
build_args+=("USE_LMDB=ON")
# When hcc runs out of memory, it silently exits without stopping
# the build process, leaving undefined symbols in the shared lib
# which will cause undefined symbol errors when later running
# tests. Setting MAX_JOBS to smaller number to make CI less flaky.
export MAX_JOBS=4
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
export HCC_AMDGPU_TARGET=gfx900
########## HIPIFY Caffe2 operators
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_amd.py"
fi
# building bundled nccl in this config triggers a bug in nvlink. For
# more, see https://github.com/pytorch/pytorch/issues/14486
if [[ "${BUILD_ENVIRONMENT}" == *-cuda8*-cudnn7* ]]; then
build_args+=("USE_SYSTEM_NCCL=ON")
fi
# Try to include Redis support for Linux builds
if [ "$(uname)" == "Linux" ]; then
CMAKE_ARGS+=("-DUSE_REDIS=ON")
fi
# Currently, on Jenkins macOS, we use a custom protobuf. The macOS
# contbuild at the moment has minimal dependencies - it doesn't use glog
# or gflags either.
if [ "$(uname)" == "Darwin" ]; then
CMAKE_ARGS+=("-DBUILD_CUSTOM_PROTOBUF=ON")
build_args+=("USE_REDIS=ON")
fi
# Use a specialized onnx namespace in CI to catch hardcoded onnx namespace
CMAKE_ARGS+=("-DONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI")
if [[ -n "$INTEGRATED" ]]; then
# TODO: This is a temporary hack to work around the issue that both
# caffe2 and pytorch have libcaffe2.so and crossfire at runtime.
CMAKE_ARGS+=("-DBUILD_SHARED_LIBS=OFF")
CMAKE_ARGS+=("-DBUILD_CUSTOM_PROTOBUF=OFF")
CMAKE_ARGS+=("-DCAFFE2_LINK_LOCAL_PROTOBUF=OFF")
fi
# We test the presence of cmake3 (for platforms like Centos and Ubuntu 14.04)
# and use that if so.
if [[ -x "$(command -v cmake3)" ]]; then
CMAKE_BINARY=cmake3
else
CMAKE_BINARY=cmake
fi
# sccache will fail for CUDA builds if all cores are used for compiling
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]] && [ -n "${SCCACHE}" ]; then
MAX_JOBS=`expr $(nproc) - 1`
else
MAX_JOBS=$(nproc)
fi
build_args+=("ONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI")
###############################################################################
# Configure and make
###############################################################################
# Run cmake from ./build_caffe2 directory so it doesn't conflict with
# standard PyTorch build directory. Eventually these won't need to
# be separate.
rm -rf build_caffe2
mkdir build_caffe2
cd ./build_caffe2
# Configure
${CMAKE_BINARY} "${ROOT_DIR}" ${CMAKE_ARGS[*]} "$@"
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
# cmake-only non-setup.py build, to test cpp only bits. This installs into
# /usr/local/caffe2 and installs no Python tests
build_args+=("CMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}")
# Run cmake from ./build_caffe2 directory so it doesn't conflict with
# standard PyTorch build directory. Eventually these won't need to
# be separate.
rm -rf build_caffe2
mkdir build_caffe2
cd ./build_caffe2
# We test the presence of cmake3 (for platforms like Centos and Ubuntu 14.04)
# and use that if so.
if [[ -x "$(command -v cmake3)" ]]; then
CMAKE_BINARY=cmake3
else
CMAKE_BINARY=cmake
fi
# Configure
${CMAKE_BINARY} "${ROOT_DIR}" $(build_to_cmake ${build_args[@]}) "$@"
# Build
if [ "$(uname)" == "Linux" ]; then
make "-j${MAX_JOBS}" install
else
echo "Don't know how to build on $(uname)"
exit 1
fi
# This is to save test binaries for testing
mv "$INSTALL_PREFIX/test/" "$INSTALL_PREFIX/cpp_test/"
ls -lah $INSTALL_PREFIX
# Build
if [ "$(uname)" == "Linux" ]; then
make "-j${MAX_JOBS}" install
else
echo "Don't know how to build on $(uname)"
exit 1
# Python build. Uses setup.py to install into site-packages
build_args+=("USE_LEVELDB=ON")
build_args+=("USE_LMDB=ON")
build_args+=("USE_OPENCV=ON")
build_args+=("BUILD_TEST=ON")
# These flags preserve the flags that were used before this refactor (blame
# me)
build_args+=("USE_GLOG=ON")
build_args+=("USE_GFLAGS=ON")
build_args+=("USE_FBGEMM=OFF")
build_args+=("USE_MKLDNN=OFF")
build_args+=("USE_DISTRIBUTED=ON")
for build_arg in "${build_args[@]}"; do
export $build_arg
done
# sccache will be stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
$PYTHON setup.py install --user
report_compile_cache_stats
fi
report_compile_cache_stats
###############################################################################
# Install ONNX
###############################################################################
@@ -227,47 +272,3 @@ report_compile_cache_stats
pip install --user -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
if [[ -n "$INTEGRATED" ]]; then
# sccache will be stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
pip install --user -v -b /tmp/pip_install_torch "file://${ROOT_DIR}#egg=torch"
fi
report_compile_cache_stats
# Symlink the caffe2 base python path into the system python path,
# so that we can import caffe2 without having to change $PYTHONPATH.
# Run in a subshell to contain environment set by /etc/os-release.
#
# This is only done when running on Jenkins! We don't want to pollute
# the user environment with Python symlinks and ld.so.conf.d hacks.
#
if [ -n "${JENKINS_URL}" ]; then
(
source /etc/os-release
function python_version() {
"$PYTHON" -c 'import sys; print("python%d.%d" % sys.version_info[0:2])'
}
# Debian/Ubuntu
if [[ "$ID_LIKE" == *debian* ]]; then
python_path="/usr/local/lib/$(python_version)/dist-packages"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# RHEL/CentOS
if [[ "$ID_LIKE" == *rhel* ]]; then
python_path="/usr/lib64/$(python_version)/site-packages/"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# /etc/ld.so.conf.d is used on both Debian and RHEL
echo "${INSTALL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/caffe2.conf
sudo ldconfig
)
fi

.jenkins/caffe2/common.sh Normal file

@@ -0,0 +1,22 @@
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR="$ROOT_DIR/caffe2_tests"
gtest_reports_dir="${TEST_DIR}/cpp"
pytest_reports_dir="${TEST_DIR}/python"
# Figure out which Python to use
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
fi
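# e.g. a hypothetical BUILD_ENVIRONMENT=caffe2-py3.6-cuda9.0-ubuntu16.04
# matches "py3.6" above, so PYTHON becomes $(which python3.6); without a
# pyX.Y token, the plain "python" found first on PATH is used.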
# /usr/local/caffe2 is where the cpp bits are installed to in cmake-only
# builds. In +python builds the cpp tests are copied to /usr/local/caffe2 so
# that the test code in .jenkins/test.sh is the same
INSTALL_PREFIX="/usr/local/caffe2"
mkdir -p "$gtest_reports_dir" || true
mkdir -p "$pytest_reports_dir" || true
mkdir -p "$INSTALL_PREFIX" || true


@@ -1,31 +1,6 @@
#!/bin/bash
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR=$ROOT_DIR/caffe2_tests
# Figure out which Python to use
PYTHON="python"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON="python${BASH_REMATCH[1]}"
fi
# The prefix must mirror the setting from build.sh
INSTALL_PREFIX="/usr/local/caffe2"
# Anaconda builds have a special install prefix and python
if [[ "$BUILD_ENVIRONMENT" == conda* ]]; then
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PYTHON="/opt/conda/bin/python"
INSTALL_PREFIX="/opt/conda/"
fi
# Add the site-packages in the caffe2 install prefix to the PYTHONPATH
SITE_DIR=$($PYTHON -c "from distutils import sysconfig; print(sysconfig.get_python_lib(prefix=''))")
INSTALL_SITE_DIR="${INSTALL_PREFIX}/${SITE_DIR}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Skip tests in environments where they are not built/applicable
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
@@ -33,83 +8,119 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
exit 0
fi
# Set PYTHONPATH and LD_LIBRARY_PATH so that python can find the installed
# Caffe2. This shouldn't be done on Anaconda, as Anaconda should handle this.
if [[ "$BUILD_ENVIRONMENT" != conda* ]]; then
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
# Find where cpp tests and Caffe2 itself are installed
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
# For cmake only build we install everything into /usr/local
cpp_test_dir="$INSTALL_PREFIX/cpp_test"
ld_library_path="$INSTALL_PREFIX/lib"
else
# For Python builds we install into python
# cd to /usr first so the python import doesn't get confused by any 'caffe2'
# directory in cwd
python_installation="$(dirname $(dirname $(cd /usr && $PYTHON -c 'import os; import caffe2; print(os.path.realpath(caffe2.__file__))')))"
caffe2_pypath="$python_installation/caffe2"
cpp_test_dir="$python_installation/torch/test"
ld_library_path="$python_installation/torch/lib"
fi
cd "$ROOT_DIR"
if [ -d $TEST_DIR ]; then
echo "Directory $TEST_DIR already exists; please remove it..."
exit 1
fi
mkdir -p $TEST_DIR/{cpp,python}
cd ${INSTALL_PREFIX}
# C++ tests
################################################################################
# C++ tests #
################################################################################
echo "Running C++ tests.."
gtest_reports_dir="${TEST_DIR}/cpp"
junit_reports_dir="${TEST_DIR}/junit_reports"
mkdir -p "$gtest_reports_dir" "$junit_reports_dir"
for test in $(find "${INSTALL_PREFIX}/test" -executable -type f); do
for test in $(find "$cpp_test_dir" -executable -type f); do
case "$test" in
# skip tests we know are hanging or bad
*/mkl_utils_test|*/aten/integer_divider_test)
continue
;;
*/aten/*)
# ATen uses test framework Catch2
"$test" -r=xml -o "${junit_reports_dir}/$(basename $test).xml"
*/scalar_tensor_test|*/basic|*/native_test)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
LD_LIBRARY_PATH="$ld_library_path" "$test"
fi
;;
*)
"$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
# Currently, we use a mixture of gtest (caffe2) and Catch2 (ATen). While
# planning to migrate to gtest as the common PyTorch c++ test suite, we
# currently do NOT use the xml test reporter, because Catch doesn't
# support multiple reporters
# c.f. https://github.com/catchorg/Catch2/blob/master/docs/release-notes.md#223
# which means that enabling XML output means you lose useful stdout
# output for Jenkins. It's more important to have useful console
# output than it is to have XML output for Jenkins.
# Note: in the future, if we want to use xml test reporter once we switch
# to all gtest, one can simply do:
LD_LIBRARY_PATH="$ld_library_path" \
"$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
;;
esac
done
# Get the relative path to where the caffe2 python module was installed
CAFFE2_PYPATH="$INSTALL_SITE_DIR/caffe2"
################################################################################
# Python tests #
################################################################################
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
exit 0
fi
if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
# Hotfix, use hypothesis 3.44.6 on Ubuntu 14.04
# See comments on
# https://github.com/HypothesisWorks/hypothesis-python/commit/eadd62e467d6cee6216e71b391951ec25b4f5830
sudo pip -q uninstall -y hypothesis
# "pip install hypothesis==3.44.6" from official server is unreliable on
# CircleCI, so we host a copy on S3 instead
sudo pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
sudo pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
sudo pip -q install hypothesis==3.44.6 -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
else
pip install --user --no-cache-dir hypothesis==3.59.0
fi
# Collect additional tests to run (outside caffe2/python)
EXTRA_TESTS=()
# CUDA builds always include NCCL support
if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]]; then
EXTRA_TESTS+=("$CAFFE2_PYPATH/contrib/nccl")
EXTRA_TESTS+=("$caffe2_pypath/contrib/nccl")
fi
conda_ignore_test=()
if [[ $BUILD_ENVIRONMENT == conda* ]]; then
# These tests both assume Caffe2 was built with leveldb, which is not the case
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/dataio_test.py")
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/checkpoint_test.py")
rocm_ignore_test=()
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# Currently these tests are failing on the ROCm platform
# for unknown reasons; need to debug
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/piecewise_linear_transform_test.py")
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/softmax_ops_test.py")
# On ROCm, RCCL (distributed) development isn't complete.
# https://github.com/ROCmSoftwarePlatform/rccl
rocm_ignore_test+=("--ignore $caffe2_pypath/python/data_parallel_model_test.py")
fi
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
echo "Running Python tests.."
pip install --user pytest-sugar
"$PYTHON" \
-m pytest \
-x \
-v \
--disable-warnings \
--junit-xml="$pytest_reports_dir/result.xml" \
--ignore "$caffe2_pypath/python/test/executor_test.py" \
--ignore "$caffe2_pypath/python/operator_test/matmul_op_test.py" \
--ignore "$caffe2_pypath/python/operator_test/pack_ops_test.py" \
--ignore "$caffe2_pypath/python/mkl/mkl_sbn_speed_test.py" \
${rocm_ignore_test[@]} \
"$caffe2_pypath/python" \
"${EXTRA_TESTS[@]}"
# TODO: re-enable this for rocm CI jobs once we have more rocm workers
if [[ $BUILD_ENVIRONMENT != *rocm* ]]; then
# Python tests
echo "Running Python tests.."
"$PYTHON" \
-m pytest \
-x \
-v \
--junit-xml="$TEST_DIR/python/result.xml" \
--ignore "$CAFFE2_PYPATH/python/test/executor_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/matmul_op_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/pack_ops_test.py" \
--ignore "$CAFFE2_PYPATH/python/mkl/mkl_sbn_speed_test.py" \
${conda_ignore_test[@]} \
"$CAFFE2_PYPATH/python" \
"${EXTRA_TESTS[@]}"
fi
if [[ -n "$INTEGRATED" ]]; then
pip install --user pytest-xdist torchvision
"$ROOT_DIR/scripts/onnx/test.sh" -p
#####################
# torchvision tests #
#####################
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
pip install --user torchvision
"$ROOT_DIR/scripts/onnx/test.sh"
fi


@@ -4,7 +4,9 @@
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Clang version:"
@@ -14,8 +16,24 @@ clang --version
# symbolize=1: Gives us much better errors when things go wrong
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
# FIXME: Remove the hardcoded "-pthread" option.
# With an ASAN build, the cmake CMAKE_HAVE_LIBC_CREATE[1] thread check will
# succeed because "pthread_create" is in libasan.so. However, libasan doesn't
# have the full pthread implementation; other advanced pthread functions don't
# exist in libasan.so[2]. If we need some advanced pthread functions, we still
# need to link the pthread library.
# This issue is already fixed in cmake 3.13[3]. If we use the newer cmake, we
# could remove this hardcoded option.
#
# [1] https://github.com/Kitware/CMake/blob/8cabaaf054a16ea9c8332ce8e9291bd026b38c62/Modules/FindThreads.cmake#L135
# [2] https://wiki.gentoo.org/wiki/AddressSanitizer/Problems
# [3] https://github.com/Kitware/CMake/commit/e9a1ddc594de6e6251bf06d732775dae2cabe4c8
#
# TODO: Make the ASAN flags a more unified env var
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan" \
NO_CUDA=1 DEBUG=1 \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan -pthread" \
CXX_FLAGS="-pthread" \
NO_CUDA=1 USE_MKLDNN=0 \
python setup.py install
assert_git_not_dirty


@@ -1,22 +1,41 @@
#!/bin/bash
if [[ "$BUILD_ENVIRONMENT" == "pytorch-linux-xenial-py3-clang5-asan" ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" $*
fi
# TODO: move this to Docker
# TODO: add both NCCL and MPI in CI test by fixing these test first
# sudo apt-get update
# sudo apt-get install libnccl-dev libnccl2
# sudo apt-get install openmpi-bin libopenmpi-dev
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# For distributed, four environmental configs:
# (1) build with only NCCL
# (2) build with NCCL and MPI
# (3) build with only MPI
# (4) build with neither
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
if [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
sudo apt-get -qq install openmpi-bin libopenmpi-dev
else
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
fi
sudo apt-get -qq install --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-py3-clang5-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
echo "Python version:"
python --version
@@ -27,67 +46,133 @@ echo "CMake version:"
cmake --version
# TODO: Don't run this...
pip install -r requirements.txt || true
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export HCC_AMDGPU_TARGET=gfx900
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
sudo chown -R jenkins:jenkins /usr/local
rm -rf "$(dirname "${BASH_SOURCE[0]}")/../../../pytorch_amd/" || true
python "$(dirname "${BASH_SOURCE[0]}")/../../tools/amd_build/build_pytorch_amd.py"
USE_ROCM=1 python setup.py install
exit
fi
pip install -q -r requirements.txt || true
# TODO: Don't install this here
if ! which conda; then
pip install mkl mkl-devel
# In ROCm CIs, we cross-compile on build machines with Intel CPUs and
# later run tests on machines with AMD CPUs.
# Also leave out two builds to make sure non-mkldnn builds still work.
if [[ "$BUILD_ENVIRONMENT" != *rocm* && "$BUILD_ENVIRONMENT" != *-trusty-py3.5-* && "$BUILD_ENVIRONMENT" != *-xenial-cuda8-cudnn7-py3-* ]]; then
pip install -q mkl mkl-devel
export USE_MKLDNN=1
else
export USE_MKLDNN=0
fi
fi
# Use special scripts for Android builds
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
build_args=()
build_args+=("-DBUILD_BINARY=ON")
build_args+=("-DBUILD_TEST=ON")
build_args+=("-DUSE_OBSERVERS=ON")
build_args+=("-DUSE_ZSTD=ON")
exec ./scripts/build_android.sh "${build_args[@]}" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# When hcc runs out of memory, it silently exits without stopping
# the build process, leaving undefined symbols in the shared lib
# which will cause undefined symbol errors when later running
# tests. Setting MAX_JOBS to smaller number to make CI less flaky.
export MAX_JOBS=4
# ROCm CI is using Caffe2 docker images, which needs these wrapper
# scripts to correctly use sccache.
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
python tools/amd_build/build_amd.py
# OPENCV is needed to enable ImageInput operator in caffe2 resnet50_trainer
# LMDB is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 python setup.py install --user
exit 0
fi
# sccache will fail for CUDA builds if all cores are used for compiling
# gcc 7 with sccache seems to have an intermittent OOM issue if all cores are used
if ([[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]) && which sccache > /dev/null; then
export MAX_JOBS=`expr $(nproc) - 1`
if [ -z "$MAX_JOBS" ]; then
if ([[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]) && which sccache > /dev/null; then
export MAX_JOBS=$(($(nproc) - 1))
fi
fi
# Target only our CI GPU machine's CUDA arch to speed up the build
export TORCH_CUDA_ARCH_LIST=5.2
export TORCH_CUDA_ARCH_LIST="5.2"
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
export TORCH_CUDA_ARCH_LIST="6.0"
fi
if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc5.4* ]]; then
export DEBUG=1
fi
WERROR=1 python setup.py install
# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin
# Testing ATen install
if [[ "$BUILD_ENVIRONMENT" != *cuda* ]]; then
echo "Testing ATen install"
time tools/test_aten_install.sh
# Patch required to build xla
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
git clone --recursive https://github.com/pytorch/xla.git
./xla/scripts/apply_patches.sh
fi
# Test C FFI plugins
# cffi install doesn't work for Python 3.7
if [[ "$BUILD_ENVIRONMENT" != *pynightly* ]]; then
# TODO: Don't run this here
pip install cffi
git clone https://github.com/pytorch/extension-ffi.git
pushd extension-ffi/script
python build.py
popd
# check that setup.py would fail with bad arguments
echo "The next three invocations are expected to fail with invalid command error messages."
( ! get_exit_code python setup.py bad_argument )
( ! get_exit_code python setup.py clean] )
( ! get_exit_code python setup.py clean bad_argument )
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
WERROR=1 python setup.py install
else
python setup.py install
fi
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn6-py3* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip install -r requirements.txt || true
make html
pip install -q -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn7-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
make -j
popd
assert_git_not_dirty
fi
# Test no-Python build
@ -95,5 +180,69 @@ if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Building libtorch"
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
WERROR=1 VERBOSE=1 tools/cpp_build/build_all.sh "$PWD/../cpp-build"
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
mkdir -p ../cpp-build/caffe2
pushd ../cpp-build/caffe2
WERROR=1 VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake "$CUSTOM_OP_TEST"
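# (CMAKE_PREFIX_PATH points CMake at the TorchConfig.cmake bundled inside the
# installed torch package, so the custom-op project's find_package(Torch) call
# resolves against the build that was just installed.)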
make VERBOSE=1
popd
assert_git_not_dirty
fi
# Test XLA build
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# TODO: Move this to Dockerfile.
pip install -q lark-parser
# Bazel doesn't work with sccache gcc. https://github.com/bazelbuild/bazel/issues/3642
sudo add-apt-repository "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-7 main"
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
sudo apt-get -qq update
# Install clang-7 clang++-7 for xla
sudo apt-get -qq install clang-7 clang++-7
# Bazel dependencies
sudo apt-get -qq install pkg-config zip zlib1g-dev unzip
# XLA build requires Bazel
wget https://github.com/bazelbuild/bazel/releases/download/0.24.1/bazel-0.24.1-installer-linux-x86_64.sh
chmod +x bazel-*.sh
sudo ./bazel-*.sh
BAZEL="$(which bazel)"
if [ -z "${BAZEL}" ]; then
echo "Unable to find bazel..."
exit 1
fi
# Install bazels3cache for cloud cache
sudo apt-get -qq install npm
npm config set strict-ssl false
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get install -qq nodejs
sudo npm install -g bazels3cache
BAZELS3CACHE="$(which bazels3cache)"
if [ -z "${BAZELS3CACHE}" ]; then
echo "Unable to find bazels3cache..."
exit 1
fi
bazels3cache --bucket=ossci-compiler-cache-circleci-xla --maxEntrySizeBytes=0
pushd xla
export CC=clang-7 CXX=clang++-7
# Use cloud cache to build when available.
sed -i '/bazel build/ a --remote_http_cache=http://localhost:7777 \\' build_torch_xla_libs.sh
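# (The sed above appends a --remote_http_cache flag, pointed at the local
# bazels3cache proxy started earlier, after every `bazel build` line in
# build_torch_xla_libs.sh.)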
python setup.py install
popd
assert_git_not_dirty
fi


@ -25,6 +25,8 @@ set -ex
# system; to find out more, grep for this string in ossci-job-dsl.
echo "ENTERED_USER_LAND"
export IS_PYTORCH_CI=1
# compositional trap taken from https://stackoverflow.com/a/7287873/23845
# note: printf is used instead of echo to avoid backslash
@ -61,17 +63,33 @@ declare -f -t trap_add
trap_add cleanup EXIT
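# A minimal sketch of a compositional trap helper, in the spirit of the
# linked answer (illustrative only: the repo's actual trap_add body is elided
# by the hunk above, and this sketch assumes handlers contain no embedded
# single quotes):
trap_add_sketch() {
  local new_cmd=$1; shift
  local sig old_cmd
  for sig in "$@"; do
    # Recover any handler already installed for this signal...
    old_cmd=$(trap -p "$sig" | sed -n "s/^trap -- '\(.*\)' $sig\$/\1/p")
    # ...and append the new command instead of clobbering it.
    trap -- "${old_cmd:+$old_cmd; }$new_cmd" "$sig"
  done
}
# e.g. trap_add_sketch 'echo first' EXIT; trap_add_sketch 'echo second' EXIT
# leaves both handlers installed.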
function assert_git_not_dirty() {
# TODO: we should add an option to `build_amd.py` that reverts the repo to
# an unmodified state.
if ([[ "$BUILD_ENVIRONMENT" != *rocm* ]] && [[ "$BUILD_ENVIRONMENT" != *xla* ]]) ; then
git_status=$(git status --porcelain)
if [[ $git_status ]]; then
echo "Build left local git repository checkout dirty"
echo "git status --porcelain:"
echo "${git_status}"
exit 1
fi
fi
}
if which sccache > /dev/null; then
# Save sccache logs to file
sccache --stop-server || true
rm ~/sccache_error.log || true
SCCACHE_ERROR_LOG=~/sccache_error.log RUST_LOG=sccache::server=error sccache --start-server
# increasing SCCACHE_IDLE_TIMEOUT so that extension_backend_test.cpp can build after this PR:
# https://github.com/pytorch/pytorch/pull/16645
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=1200 RUST_LOG=sccache::server=error sccache --start-server
# Report sccache stats for easier debugging
sccache --zero-stats
function sccache_epilogue() {
echo '=================== sccache compilation log ==================='
python $(dirname "${BASH_SOURCE[0]}")/print_sccache_log.py ~/sccache_error.log
python "$(dirname "${BASH_SOURCE[0]}")/print_sccache_log.py" ~/sccache_error.log
echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
sccache --show-stats
sccache --stop-server || true
@ -113,7 +131,8 @@ else
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py3 ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7* ]]; then
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch_macos* ]]; then
BUILD_TEST_LIBTORCH=1
else
BUILD_TEST_LIBTORCH=0
@ -122,7 +141,7 @@ fi
# Use conda cmake in some CI build. Conda cmake will be newer than our supported
# min version 3.5, so we only do it in two builds that we know should use conda.
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *cuda8-cudnn6-py2* ]] || \
if [[ "$BUILD_ENVIRONMENT" == *cuda8-cudnn7-py2* ]] || \
[[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py3* ]]; then
if ! which conda; then
echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty"
@ -138,3 +157,11 @@ if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
fi
fi
fi
function get_exit_code() {
set +e
"$@"
retcode=$?
set -e
return $retcode
}
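# Usage sketch: with `set -e` active, a failing command would normally abort
# the script; get_exit_code suspends -e around the call so callers can assert
# on an expected failure, e.g.:
#   ( ! get_exit_code python setup.py bad_argument )
# succeeds only if setup.py rejects the bogus subcommand.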


@ -1,6 +1,8 @@
#!/bin/bash
# shellcheck disable=SC2034
COMPACT_JOB_NAME="docker-build-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
docker build -t pytorch .


@ -5,17 +5,21 @@
# in this file will report a failure (so you don't forget to
# reenable the tests on merge ;)
pytorch-linux-xenial-cuda8-cudnn6-py3-build
pytorch-linux-xenial-cuda8-cudnn6-py3-test
pytorch-linux-xenial-cuda8-cudnn6-py3-multigpu-test
pytorch-linux-xenial-cuda8-cudnn7-py3-build
pytorch-linux-xenial-cuda8-cudnn7-py3-test
pytorch-linux-xenial-cuda8-cudnn7-py3-multigpu-test
pytorch-linux-xenial-cuda8-cudnn7-py3-nogpu-test
pytorch-linux-xenial-cuda9-cudnn7-py2-build
pytorch-linux-xenial-cuda9-cudnn7-py2-test
pytorch-linux-xenial-cuda9-cudnn7-py3-build
pytorch-linux-xenial-cuda9-cudnn7-py3-test
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-test
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-test
pytorch-linux-xenial-py3-clang5-asan-build
pytorch-linux-xenial-py3-clang5-asan-test
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build
pytorch-linux-trusty-py2.7.9-build
pytorch-linux-trusty-py2.7.9-test
pytorch-linux-trusty-py2.7-build
@ -40,4 +44,14 @@ pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
pytorch-docker-build-test
short-perf-test-cpu
short-perf-test-gpu
py2-clang3.8-rocmnightly-ubuntu16.04-build
py2-clang7-rocmdeb-ubuntu16.04
py2-devtoolset7-rocmrpm-centos7.5
pytorch-ppc64le-cuda9.2-cudnn7-py3-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-test
pytorch-ppc64le-cuda9.1-cudnn7-py3-build
pytorch-ppc64le-cuda9.1-cudnn7-py3-test
pytorch-linux-xenial-cuda8-cudnn7-py3-NO_AVX2-test
pytorch-linux-xenial-cuda8-cudnn7-py3-NO_AVX-NO_AVX2-test
pytorch-linux-xenial-cuda8-cudnn7-py3-slow-test
pytorch-xla-linux-trusty-py3.6-gcc5.4-build
pytorch-xla-linux-trusty-py3.6-gcc5.4-test


@ -1,9 +1,9 @@
#!/bin/bash
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-build* ]]; then
if [ -z "${BUILD_ENVIRONMENT}" ] || [[ "${BUILD_ENVIRONMENT}" == *-build* ]]; then
source "$(dirname "${BASH_SOURCE[0]}")/macos-build.sh"
fi
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test* ]]; then
if [ -z "${BUILD_ENVIRONMENT}" ] || [[ "${BUILD_ENVIRONMENT}" == *-test* ]]; then
source "$(dirname "${BASH_SOURCE[0]}")/macos-test.sh"
fi


@ -1,6 +1,8 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
export PATH="/usr/local/bin:$PATH"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
@ -17,11 +19,12 @@ source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
git submodule sync --recursive
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Build PyTorch
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
export CUDA_VERSION=9.2
export TORCH_CUDA_ARCH_LIST=5.2
export PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/bin${PATH:+:${PATH}}
@ -29,11 +32,15 @@ if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
export CUDA_HOME=/Developer/NVIDIA/CUDA-${CUDA_VERSION}
export NO_CUDA=0
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
if [ -z "${IN_CIRCLECI}" ]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
fi
else
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
if [ -z "${IN_CIRCLECI}" ]; then
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
@ -46,7 +53,7 @@ if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${PYTORCH_ENV_DIR}/clang"
chmod a+x "${PYTORCH_ENV_DIR}/clang"
if [[ "${JOB_BASE_NAME}" == *cuda* ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *cuda* ]]; then
printf "#!/bin/sh\nexec sccache $(which nvcc) \$*" > "${PYTORCH_ENV_DIR}/nvcc"
chmod a+x "${PYTORCH_ENV_DIR}/nvcc"
export CUDA_NVCC_EXECUTABLE="${PYTORCH_ENV_DIR}/nvcc"
@ -61,6 +68,10 @@ export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
python setup.py install
assert_git_not_dirty
# Upload torch binaries when the build job is finished
7z a ${IMAGE_COMMIT_TAG}.7z ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp ${IMAGE_COMMIT_TAG}.7z s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z --acl public-read
if [ -z "${IN_CIRCLECI}" ]; then
7z a ${IMAGE_COMMIT_TAG}.7z ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp ${IMAGE_COMMIT_TAG}.7z s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z --acl public-read
fi


@ -1,6 +1,8 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export PATH="/usr/local/bin:$PATH"
@ -15,19 +17,31 @@ if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja six
pip install -q hypothesis "librosa>=0.6.2" psutil
# faulthandler has been built in since Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip install -q faulthandler
fi
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
fi
git submodule sync --recursive
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Test PyTorch
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
else
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
if [ -z "${IN_CIRCLECI}" ]; then
if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
else
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
@ -38,43 +52,87 @@ export MAX_JOBS=2
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
# Download torch binaries in the test jobs
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z ${IMAGE_COMMIT_TAG}.7z
7z x ${IMAGE_COMMIT_TAG}.7z -o"${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages"
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z ${IMAGE_COMMIT_TAG}.7z
7z x ${IMAGE_COMMIT_TAG}.7z -o"${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages"
fi
# Test that OpenMP is enabled
pushd test
if [[ ! $(python -c "import torch; print(int(torch.backends.openmp.is_available()))") == "1" ]]; then
echo "Build should have OpenMP enabled, but torch.backends.openmp.is_available() is False"
exit 1
fi
popd
test_python_all() {
echo "Ninja version: $(ninja --version)"
python test/run_test.py --verbose
assert_git_not_dirty
}
test_cpp_api() {
test_libtorch() {
# C++ API
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
# But still clean it before we perform our own build.
#
CPP_BUILD="$PWD/../cpp-build"
rm -rf $CPP_BUILD
mkdir -p $CPP_BUILD
WERROR=1 VERBOSE=1 tools/cpp_build/build_all.sh "$CPP_BUILD"
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
# But still clean it before we perform our own build.
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
echo "Testing libtorch"
# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
"$CPP_BUILD"/libtorch/bin/test_api
CPP_BUILD="$PWD/../cpp-build"
rm -rf $CPP_BUILD
mkdir -p $CPP_BUILD/caffe2
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd $CPP_BUILD/caffe2
VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/bin/test_api
assert_git_not_dirty
fi
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_custom_script_ops() {
echo "Testing custom script operators"
pushd test/custom_operator
# Build the custom operator library.
rm -rf build && mkdir build
pushd build
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake ..
make VERBOSE=1
popd
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
assert_git_not_dirty
}
if [ -z "${BUILD_ENVIRONMENT}" ] || [[ "${BUILD_ENVIRONMENT}" == *-test ]]; then
test_python_all
test_cpp_api
test_libtorch
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *-test1 ]]; then
test_python_all
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_cpp_api
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 ]]; then
test_libtorch
test_custom_script_ops
fi
fi


@ -4,8 +4,28 @@
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-multigpu-test"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch (distributed only)"
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
fi
time python test/run_test.py --verbose -i distributed
assert_git_not_dirty


@ -10,7 +10,7 @@ get_runtime_of_command () {
TIMEFORMAT=%R
# runtime=$( { time ($@ &> /dev/null); } 2>&1 1>/dev/null)
runtime=$( { time $@; } 2>&1 1>/dev/null)
runtime=$( { time "$@"; } 2>&1 1>/dev/null)
if [[ $runtime == *"Error"* ]]; then
exit 1
fi
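# A sketch of the timing idiom above (the measured command is illustrative):
#   TIMEFORMAT=%R                                     # `time` prints wall-clock seconds only
#   runtime=$( { time sleep 0.2; } 2>&1 1>/dev/null ) # keep the timing (stderr), drop stdout
#   echo "took ${runtime}s"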


@ -1,6 +1,6 @@
import sys
import json
import numpy
import math
import argparse
parser = argparse.ArgumentParser()
@ -35,14 +35,25 @@ else:
print("population mean: ", mean)
print("population sigma: ", sigma)
# Let the test pass if the baseline number is NaN (which happened in
# the past, when we had no logic for catching NaN values)
if math.isnan(mean) or math.isnan(sigma):
mean = sys.maxsize
sigma = 0.001
sample_stats_data = json.loads(args.sample_stats)
sample_mean = sample_stats_data['mean']
sample_sigma = sample_stats_data['sigma']
sample_mean = float(sample_stats_data['mean'])
sample_sigma = float(sample_stats_data['sigma'])
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception('''Error: sample mean is NaN''')
elif math.isnan(sample_sigma):
raise Exception('''Error: sample sigma is NaN''')
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)
@ -50,8 +61,10 @@ print("z-value: ", z_value)
if z_value >= 3:
raise Exception('''\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run `cd .jenkins/pytorch/perf_test/ && bash ''' + test_name + '''.sh` on your local machine and compare the runtime before/after your code change.
''')
To reproduce this regression, run
`cd .jenkins/pytorch/perf_test/ && bash {}.sh` on your local machine
and compare the runtime before/after your code change.
'''.format(test_name))
else:
print("z-value < 3, no perf regression detected.")
if args.update:


@ -19,14 +19,14 @@ test_cpu_speed_mini_sequence_labeler () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py)
SAMPLE_ARRAY+=(${runtime})
done
cd ../../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@ -12,7 +12,7 @@ test_cpu_speed_mnist () {
cd examples/mnist
pip install -r requirements.txt
pip install -q -r requirements.txt
# Download data
python main.py --epochs 0
@ -20,7 +20,7 @@ test_cpu_speed_mnist () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@ -28,7 +28,7 @@ test_cpu_speed_mnist () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@ -1,3 +1,5 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_torch () {
@ -17,7 +19,7 @@ test_cpu_speed_torch () {
fi
if ! python perf-tests/modules/test_cpu_torch.py ${ARGS}; then
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash "${FUNCNAME[0]}".sh\` on your local machine and compare the runtime before/after your code change."
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash ${FUNCNAME[0]}.sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}


@ -1,3 +1,5 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_torch_tensor () {
@ -17,7 +19,7 @@ test_cpu_speed_torch_tensor () {
fi
if ! python perf-tests/modules/test_cpu_torch_tensor.py ${ARGS}; then
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash "${FUNCNAME[0]}".sh\` on your local machine and compare the runtime before/after your code change."
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash ${FUNCNAME[0]}.sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}


@ -19,7 +19,7 @@ test_gpu_speed_cudnn_lstm () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python cudnn_lstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@ -27,7 +27,7 @@ test_gpu_speed_cudnn_lstm () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@ -19,7 +19,7 @@ test_gpu_speed_lstm () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python lstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@ -27,7 +27,7 @@ test_gpu_speed_lstm () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@ -19,7 +19,7 @@ test_gpu_speed_mlstm () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python mlstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@ -27,7 +27,7 @@ test_gpu_speed_mlstm () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@ -12,7 +12,7 @@ test_gpu_speed_mnist () {
cd examples/mnist
pip install -r requirements.txt
pip install -q -r requirements.txt
# Download data
python main.py --epochs 0
@ -20,7 +20,10 @@ test_gpu_speed_mnist () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
# Needs a warm-up run to get accurate numbers
python main.py --epochs 1 --no-log
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@ -28,7 +31,7 @@ test_gpu_speed_mnist () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@ -28,7 +28,7 @@ test_gpu_speed_word_language_model () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --cuda --epochs 1)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@ -36,7 +36,7 @@ test_gpu_speed_word_language_model () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@ -1,13 +1,18 @@
#!/bin/bash
# shellcheck disable=SC2034
COMPACT_JOB_NAME="short-perf-test-cpu"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
SCRIPT_PARENT_DIR=$(dirname "${BASH_SOURCE[0]}")
# shellcheck source=.jenkins/pytorch/common.sh
source "$SCRIPT_PARENT_DIR/common.sh"
cd .jenkins/pytorch/perf_test
echo "Running CPU perf test for PyTorch..."
pip install awscli
pip install -q awscli
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321
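# One way to raise the threshold with the AWS CLI (illustrative; the hunk
# ends before the command the script actually runs):
#   aws configure set default.s3.multipart_threshold 5GB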


@ -1,13 +1,17 @@
#!/bin/bash
# shellcheck disable=SC2034
COMPACT_JOB_NAME="short-perf-test-gpu"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
pushd .jenkins/pytorch/perf_test
echo "Running GPU perf test for PyTorch..."
pip install awscli
# Trying to uninstall PyYAML can cause problems. Workaround according to:
# https://github.com/pypa/pip/issues/5247#issuecomment-415571153
pip install -q awscli --ignore-installed PyYAML
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321


@ -1,32 +1,82 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
echo "Skipping ROCm tests for now"
exit 0
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get -qq install --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == *-slow-* ]]; then
export PYTORCH_TEST_WITH_SLOW=1
export PYTORCH_TEST_SKIP_FAST=1
fi
fi
# JIT C++ extensions require ninja.
git clone https://github.com/ninja-build/ninja --quiet
pushd ninja
python ./configure.py --bootstrap
export PATH="$PWD:$PATH"
popd
# --user breaks ppc64le builds and these packages are already in ppc64le docker
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# JIT C++ extensions require ninja.
pip install -q ninja --user
# ninja is installed in /var/lib/jenkins/.local/bin
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
pip install -q hypothesis --user
# TODO: move this to Docker
PYTHON_VERSION=$(python -c 'import platform; print(platform.python_version())'|cut -c1)
echo $PYTHON_VERSION
if [[ $PYTHON_VERSION == "2" ]]; then
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py2-none-any.whl --user
else
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl --user
fi
# mypy will fail to install on Python <3.4. In that case,
# we just won't run these tests.
pip install mypy --user || true
fi
# faulthandler has been built in since Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip install -q faulthandler --user
fi
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
# if you're not careful. Check this if you made some changes and the
# ASAN test is not working
if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
export UBSAN_OPTIONS=print_stacktrace=1
export ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true
# We suppress the vptr violation, since we have separate copies of
# libprotobuf in both libtorch.so and libcaffe2.so, and it causes
# the following problem:
# test_cse (__main__.TestJit) ... torch/csrc/jit/export.cpp:622:38:
# runtime error: member call on address ... which does not point
# to an object of type 'google::protobuf::MessageLite'
# ...: note: object is of type 'onnx_torch::ModelProto'
#
# This problem should be solved when libtorch.so and libcaffe2.so are
# merged.
export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PWD/ubsan.supp
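# (A UBSan suppression file holds "check:pattern" lines, so the vptr entry in
# ubsan.supp would look something like "vptr:libcaffe2.so", with the pattern
# here being illustrative.)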
export PYTORCH_TEST_WITH_ASAN=1
export PYTORCH_TEST_WITH_UBSAN=1
# TODO: Figure out how to avoid hard-coding these paths
@ -35,13 +85,6 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
# Increase stack size, because ASAN red zones use more stack
ulimit -s 81920
function get_exit_code() {
set +e
"$@"
retcode=$?
set -e
return $retcode
}
(cd test && python -c "import torch")
echo "The next three invocations are expected to crash; if they don't that means ASAN/UBSAN is misconfigured"
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_asan(3)")
@ -49,35 +92,51 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_aten_asan(3)")
fi
export ATEN_DISABLE_AVX=
export ATEN_DISABLE_AVX2=
if [[ "${JOB_BASE_NAME}" == *-NO_AVX-* ]]; then
export ATEN_DISABLE_AVX=1
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export PYTORCH_TEST_WITH_ROCM=1
# ROCm CI uses Caffe2 docker images, which don't have several packages
# needed for testing. We install them here.
pip install -q psutil "librosa>=0.6.2" --user
fi
if [[ "${JOB_BASE_NAME}" == *-NO_AVX2-* ]]; then
export ATEN_DISABLE_AVX2=1
if [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX-* ]]; then
export ATEN_CPU_CAPABILITY=default
elif [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX2-* ]]; then
export ATEN_CPU_CAPABILITY=avx
fi
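# (ATen selects vectorized CPU kernels at runtime; ATEN_CPU_CAPABILITY caps
# the dispatch level, so "default" exercises the scalar kernels and "avx"
# stops short of AVX2.)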
test_python_nn() {
time python test/run_test.py --include nn --verbose
assert_git_not_dirty
}
test_python_all_except_nn() {
time python test/run_test.py --exclude nn --verbose
assert_git_not_dirty
}
test_aten() {
# Test ATen
if [[ "$BUILD_ENVIRONMENT" != *asan* ]]; then
# The following ATen test(s) have already been skipped by caffe2 in the ROCm environment:
# scalar_tensor_test, basic, native_test
if ([[ "$BUILD_ENVIRONMENT" != *asan* ]] && [[ "$BUILD_ENVIRONMENT" != *rocm* ]]); then
echo "Running ATen tests with pytorch lib"
TORCH_LIB_PATH=$(python -c "import site; print(site.getsitepackages()[0])")/torch/lib
# NB: the ATen test binaries don't have RPATH set, so it's necessary to
# put the dynamic libraries somewhere where the dynamic linker can find them.
# This is a bit of a hack.
ln -s "$TORCH_LIB_PATH"/libcaffe2* build/bin
ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
SUDO=sudo
fi
${SUDO} ln -s "$TORCH_LIB_PATH"/libc10* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libcaffe2* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libmkldnn* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
ls build/bin
aten/tools/run_tests.sh build/bin
assert_git_not_dirty
fi
}
@ -95,37 +154,72 @@ test_torchvision() {
# this should be a transient requirement...)
# See https://github.com/pytorch/pytorch/issues/7525
#time python setup.py install
pip install .
pip install -q --user .
popd
rm -rf vision
}
test_libtorch() {
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing libtorch"
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/libtorch/bin/test_jit
else
"$CPP_BUILD"/libtorch/bin/test_jit "[cpu]"
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 "$CPP_BUILD"/libtorch/bin/test_api
echo "Testing libtorch"
python test/cpp/jit/tests_setup.py setup
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/caffe2/bin/test_jit
else
"$CPP_BUILD"/caffe2/bin/test_jit "[cpu]"
fi
python test/cpp/jit/tests_setup.py shutdown
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/bin/test_api
assert_git_not_dirty
fi
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_custom_script_ops() {
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing custom script operators"
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
pushd test/custom_operator
cp -a "$CUSTOM_OP_BUILD" build
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
assert_git_not_dirty
fi
}
test_xla() {
export XLA_USE_XRT=1 XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0"
export XRT_WORKERS="localservice:0;grpc://localhost:40934"
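# (These point the XRT client at a single local CPU worker: XRT_DEVICE_MAP
# names the device and XRT_WORKERS gives the worker's grpc endpoint.)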
pushd xla
python test/test_operations.py
python test/test_train_mnist.py --tidy
popd
assert_git_not_dirty
}
(cd test && python -c "import torch; print(torch.__config__.show())")
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
test_torchvision
test_xla
elif [[ "${BUILD_ENVIRONMENT}" == *-test1 ]]; then
test_torchvision
test_python_nn
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 ]]; then
test_python_all_except_nn
test_aten
test_libtorch
test_custom_script_ops
else
test_torchvision
test_python_nn
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_python_nn
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
fi
test_custom_script_ops
fi


@ -9,147 +9,28 @@ if [ ! -f setup.py ]; then
exit 1
fi
# shellcheck disable=SC2034
COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-build
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
fi
mkdir -p ci_scripts/
export TMP_DIR="${PWD}/build/win_tmp"
export TMP_DIR_WIN=$(cygpath -w "${TMP_DIR}")
export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers
$SCRIPT_HELPERS_DIR/build_pytorch.bat
assert_git_not_dirty
cat >ci_scripts/upload_image.py << EOL
import os
import sys
import boto3
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
session = boto3.session.Session()
s3 = session.resource('s3')
data = open(sys.argv[1], 'rb')
s3.Bucket('ossci-windows-build').put_object(Key='pytorch/'+IMAGE_COMMIT_TAG+'.7z', Body=data)
object_acl = s3.ObjectAcl('ossci-windows-build','pytorch/'+IMAGE_COMMIT_TAG+'.7z')
response = object_acl.put(ACL='public-read')
EOL
cat >ci_scripts/build_pytorch.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\curl-7.57.0-win64-mingw\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install MKL
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/mkl_2018.2.185.7z --output mkl.7z
) else (
aws s3 cp s3://ossci-windows/mkl_2018.2.185.7z mkl.7z --quiet
)
7z x -aoa mkl.7z -omkl
)
set CMAKE_INCLUDE_PATH=%cd%\\mkl\\include
set LIB=%cd%\\mkl\\lib;%LIB%
:: Install MAGMA
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z --output magma_cuda90_release_mkl_2018.2.185.7z
) else (
aws s3 cp s3://ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z magma_cuda90_release_mkl_2018.2.185.7z --quiet
)
7z x -aoa magma_cuda90_release_mkl_2018.2.185.7z -omagma
)
set MAGMA_HOME=%cd%\\magma
:: Install sccache
mkdir %CD%\\tmp_bin
if "%REBUILD%"=="" (
:check_sccache
%CD%\\tmp_bin\\sccache.exe --show-stats || (
taskkill /im sccache.exe /f /t || ver > nul
del %CD%\\tmp_bin\\sccache.exe
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %CD%\\tmp_bin\\sccache.exe
) else (
aws s3 cp s3://ossci-windows/sccache.exe %CD%\\tmp_bin\\sccache.exe
)
goto :check_sccache
)
)
:: Install Miniconda3
if "%REBUILD%"=="" (
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
)
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
if "%REBUILD%"=="" ( call conda install -y -q numpy cffi pyyaml boto3 )
:: Install ninja
if "%REBUILD%"=="" ( pip install ninja )
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
git submodule update --init --recursive
set PATH=%CD%\\tmp_bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set CUDA_PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDA_PATH_V9_0=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set NVTOOLSEXT_PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt
set CUDNN_LIB_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\lib\\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDNN_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
:: Target only our CI GPU machine's CUDA arch to speed up the build
set TORCH_CUDA_ARCH_LIST=5.2
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CC=sccache cl
set CXX=sccache cl
set DISTUTILS_USE_SDK=1
set CMAKE_GENERATOR=Ninja
if not "%USE_CUDA%"=="1" (
if "%REBUILD%"=="" (
set NO_CUDA=1
python setup.py install
)
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
)
if not "%USE_CUDA%"=="0" (
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
rd /s /q C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch
copy %CD%\\tmp_bin\\sccache.exe tmp_bin\\nvcc.exe
)
set CUDA_NVCC_EXECUTABLE=%CD%\\tmp_bin\\nvcc
if "%REBUILD%"=="" set NO_CUDA=0
python setup.py install && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo "NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3\` in Command Prompt before running Git Bash."
) else (
7z a %IMAGE_COMMIT_TAG%.7z C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
)
)
)
EOL
ci_scripts/build_pytorch.bat
if [ ! -f $IMAGE_COMMIT_TAG.7z ] && [ ! ${BUILD_ENVIRONMENT} == "" ]; then
if [ ! -f ${TMP_DIR}/${IMAGE_COMMIT_TAG}.7z ] && [ ! ${BUILD_ENVIRONMENT} == "" ]; then
exit 1
fi
echo "BUILD PASSED"


@ -0,0 +1,78 @@
if "%DEBUG%" == "1" (
set BUILD_TYPE=debug
) ELSE (
set BUILD_TYPE=release
)
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;%PATH%
set INSTALLER_DIR=%SCRIPT_HELPERS_DIR%\installation-helpers
call %INSTALLER_DIR%\install_mkl.bat
call %INSTALLER_DIR%\install_magma.bat
call %INSTALLER_DIR%\install_sccache.bat
call %INSTALLER_DIR%\install_miniconda3.bat
:: Install ninja
if "%REBUILD%"=="" ( pip install -q ninja )
git submodule sync --recursive
git submodule update --init --recursive
set PATH=%TMP_DIR_WIN%\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;%PATH%
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set CUDNN_LIB_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDNN_ROOT_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
:: Target only our CI GPU machine's CUDA arch to speed up the build
set TORCH_CUDA_ARCH_LIST=5.2
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CC=sccache cl
set CXX=sccache cl
set CMAKE_GENERATOR=Ninja
if not "%USE_CUDA%"=="1" (
if "%REBUILD%"=="" (
set NO_CUDA=1
python setup.py install
)
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
)
if not "%USE_CUDA%"=="0" (
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
rd /s /q %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch
for /f "delims=" %%i in ('where /R caffe2\proto *.py') do (
IF NOT "%%i" == "%CD%\caffe2\proto\__init__.py" (
del /S /Q %%i
)
)
copy %TMP_DIR_WIN%\bin\sccache.exe %TMP_DIR_WIN%\bin\nvcc.exe
)
set CUDA_NVCC_EXECUTABLE=%TMP_DIR_WIN%\bin\nvcc
if "%REBUILD%"=="" set NO_CUDA=0
python setup.py install --cmake && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo NOTE: To run `import torch`, please make sure to activate the conda environment by running `call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3` in Command Prompt before running Git Bash.
) else (
mv %CD%\build\bin\test_api.exe %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch\lib
7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\caffe2 && python %SCRIPT_HELPERS_DIR%\upload_image.py %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z
)
)
)


@ -0,0 +1,19 @@
import os
import sys
import boto3
import botocore
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
session = boto3.session.Session()
s3 = session.resource('s3')
BUCKET_NAME = 'ossci-windows-build'
KEY = 'pytorch/' + IMAGE_COMMIT_TAG + '.7z'
LOCAL_FILE_PATH = sys.argv[1]
try:
s3.Bucket(BUCKET_NAME).download_file(KEY, LOCAL_FILE_PATH)
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print("The object does not exist.")
else:
raise


@ -0,0 +1,9 @@
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_2.5.0_cuda90_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.0_cuda90_%BUILD_TYPE%.7z
) else (
aws s3 cp s3://ossci-windows/magma_2.5.0_cuda90_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.0_cuda90_%BUILD_TYPE%.7z --quiet
)
7z x -aoa %TMP_DIR_WIN%\magma_2.5.0_cuda90_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
)
set MAGMA_HOME=%TMP_DIR_WIN%\magma


@ -0,0 +1,15 @@
if "%BUILD_ENVIRONMENT%"=="" (
set CONDA_PARENT_DIR=%CD%
) else (
set CONDA_PARENT_DIR=C:\Jenkins
)
if "%REBUILD%"=="" (
IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3
if "%REBUILD%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy cffi pyyaml boto3
)


@ -0,0 +1,10 @@
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/mkl_2018.2.185.7z --output %TMP_DIR_WIN%\mkl.7z
) else (
aws s3 cp s3://ossci-windows/mkl_2018.2.185.7z %TMP_DIR_WIN%\mkl.7z --quiet
)
7z x -aoa %TMP_DIR_WIN%\mkl.7z -o%TMP_DIR_WIN%\mkl
)
set CMAKE_INCLUDE_PATH=%TMP_DIR_WIN%\mkl\include
set LIB=%TMP_DIR_WIN%\mkl\lib;%LIB%


@ -0,0 +1,15 @@
mkdir %TMP_DIR_WIN%\bin
if "%REBUILD%"=="" (
:check_sccache
%TMP_DIR_WIN%\bin\sccache.exe --show-stats || (
taskkill /im sccache.exe /f /t || ver > nul
del %TMP_DIR_WIN%\bin\sccache.exe
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %TMP_DIR_WIN%\bin\sccache.exe
) else (
aws s3 cp s3://ossci-windows/sccache.exe %TMP_DIR_WIN%\bin\sccache.exe
)
goto :check_sccache
)
)


@ -0,0 +1,39 @@
#!/usr/bin/env python
from __future__ import print_function
import subprocess
TESTS = [
(
"Checking that caffe2.python is available",
"from caffe2.python import core",
),
(
"Checking that MKL is available",
"import torch; exit(0 if torch.backends.mkl.is_available() else 1)",
),
(
"Checking that CUDA archs are setup correctly",
"import torch; torch.randn([3,5]).cuda()",
),
(
"Checking that magma is available",
"import torch; torch.rand(1).cuda(); exit(0 if torch.cuda.has_magma else 1)",
),
(
"Checking that CuDNN is available",
"import torch; exit(0 if torch.backends.cudnn.is_available() else 1)",
),
]
if __name__ == "__main__":
for description, python_commands in TESTS:
print(description)
command_args = ["python", "-c", python_commands]
command_string = " ".join(command_args)
print("Command:", command_string)
subprocess.check_call(command_args)


@ -0,0 +1,54 @@
if exist "%TMP_DIR%/ci_scripts/pytorch_env_restore.bat" (
call %TMP_DIR%/ci_scripts/pytorch_env_restore.bat
exit /b 0
)
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;%PATH%
:: Install Miniconda3
if "%BUILD_ENVIRONMENT%"=="" (
set CONDA_PARENT_DIR=%CD%
) else (
set CONDA_PARENT_DIR=C:\Jenkins
)
if NOT "%BUILD_ENVIRONMENT%"=="" (
IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3
if NOT "%BUILD_ENVIRONMENT%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba
)
pip install -q ninja future hypothesis "librosa>=0.6.2" psutil
:: No need to install faulthandler since we only test Python >= 3.6 on Windows
:: faulthandler is builtin since Python 3.3
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64
popd
set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;%PATH%
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set CUDNN_LIB_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDNN_ROOT_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set PYTHONPATH=%TMP_DIR_WIN%\build;%PYTHONPATH%
set NUMBAPRO_CUDALIB=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin
set NUMBAPRO_LIBDEVICE=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\nvvm\libdevice
set NUMBAPRO_NVVM=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\nvvm\bin\nvvm64_32_0.dll
if NOT "%BUILD_ENVIRONMENT%"=="" (
pushd %TMP_DIR_WIN%\build
python %SCRIPT_HELPERS_DIR%\download_image.py %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z
:: 7z: -aos skips extraction if the file already exists, because this .bat can be called multiple times
7z x %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z -aos
popd
) else (
xcopy /s %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %TMP_DIR_WIN%\build\torch\
)
for /f "usebackq tokens=*" %%i in (`set`) do echo set "%%i" >> %TMP_DIR%/ci_scripts/pytorch_env_restore.bat
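:: The line above snapshots every environment variable as a `set "NAME=value"`
:: command into pytorch_env_restore.bat, which the early-exit check at the top
:: of this script replays to skip the expensive setup on repeated calls.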


@ -0,0 +1,19 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test\custom_operator
:: Build the custom operator library.
mkdir build
cd build
:: Note: Caffe2 does not support MSVC + CUDA + Debug mode (has to be Release mode)
cmake -DCMAKE_PREFIX_PATH=%TMP_DIR_WIN%\build\torch -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja -v
cd ..
:: Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module="build/model.pt"
:: Run tests C++-side and load the exported script module.
cd build
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_custom_ops.exe model.pt


@ -0,0 +1,9 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
dir
dir %TMP_DIR_WIN%\build
dir %TMP_DIR_WIN%\build\torch
dir %TMP_DIR_WIN%\build\torch\lib
cd %TMP_DIR_WIN%\build\torch\lib
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_api.exe --gtest_filter="-IntegrationTest.MNIST*"


@ -0,0 +1,2 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test && python run_test.py --exclude nn --verbose && cd ..


@ -0,0 +1,16 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
pushd test
echo Some smoke tests
python %SCRIPT_HELPERS_DIR%\run_python_nn_smoketests.py
if ERRORLEVEL 1 exit /b 1
echo Run nn tests
python run_test.py --include nn --verbose
if ERRORLEVEL 1 exit /b 1
popd


@ -0,0 +1,12 @@
import os
import sys
import boto3
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
session = boto3.session.Session()
s3 = session.resource('s3')
with open(sys.argv[1], 'rb') as data:
s3.Bucket('ossci-windows-build').put_object(Key='pytorch/' + IMAGE_COMMIT_TAG + '.7z', Body=data)
object_acl = s3.ObjectAcl('ossci-windows-build', 'pytorch/' + IMAGE_COMMIT_TAG + '.7z')
response = object_acl.put(ACL='public-read')

View File

@ -1,93 +1,50 @@
#!/bin/bash
#!/bin/bash -ex
# shellcheck disable=SC2034
COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-test
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
fi
mkdir -p ci_scripts/
export TMP_DIR="${PWD}/build/win_tmp"
export TMP_DIR_WIN=$(cygpath -w "${TMP_DIR}")
mkdir -p $TMP_DIR/build/torch
# This directory is used only to hold "pytorch_env_restore.bat", called via "setup_pytorch_env.bat"
CI_SCRIPTS_DIR=$TMP_DIR/ci_scripts
mkdir -p $CI_SCRIPTS_DIR
if [ -n "$(ls $CI_SCRIPTS_DIR/*)" ]; then
rm $CI_SCRIPTS_DIR/*
fi
export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers
cat >ci_scripts/download_image.py << EOL
import os
import sys
import boto3
import botocore
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
session = boto3.session.Session()
s3 = session.resource('s3')
BUCKET_NAME = 'ossci-windows-build'
KEY = 'pytorch/'+IMAGE_COMMIT_TAG+'.7z'
LOCAL_FILE_PATH = sys.argv[1]
try:
s3.Bucket(BUCKET_NAME).download_file(KEY, LOCAL_FILE_PATH)
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print("The object does not exist.")
else:
raise
EOL
cat >ci_scripts/setup_pytorch_env.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\curl-7.57.0-win64-mingw\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install Miniconda3
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
call conda install -y -q numpy mkl cffi pyyaml boto3
pip install ninja
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
set PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set CUDA_PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDA_PATH_V9_0=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set NVTOOLSEXT_PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt
set CUDNN_LIB_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\lib\\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDNN_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set PYTHONPATH=%CD%\\test;%PYTHONPATH%
cd test/
python ..\\ci_scripts\\download_image.py %IMAGE_COMMIT_TAG%.7z
7z x %IMAGE_COMMIT_TAG%.7z
cd ..
EOL
cat >ci_scripts/test_python_nn.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
cd test/ && python run_test.py --include nn --verbose && cd ..
EOL
cat >ci_scripts/test_python_all_except_nn.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
cd test/ && python run_test.py --exclude nn --verbose && cd ..
EOL
run_tests() {
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
ci_scripts/test_python_nn.bat && ci_scripts/test_python_all_except_nn.bat
$SCRIPT_HELPERS_DIR/test_python_nn.bat && \
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat && \
$SCRIPT_HELPERS_DIR/test_custom_script_ops.bat && \
$SCRIPT_HELPERS_DIR/test_libtorch.bat
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
ci_scripts/test_python_nn.bat
$SCRIPT_HELPERS_DIR/test_python_nn.bat
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
ci_scripts/test_python_all_except_nn.bat
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat && \
$SCRIPT_HELPERS_DIR/test_custom_script_ops.bat && \
$SCRIPT_HELPERS_DIR/test_libtorch.bat
fi
fi
}
run_tests && echo "TEST PASSED"
run_tests && assert_git_not_dirty && echo "TEST PASSED"

.jenkins/run-shellcheck.sh (new executable file)

@ -0,0 +1,10 @@
#!/bin/bash -xe
# One may want to invoke this script locally as follows:
#
# .jenkins/run-shellcheck.sh --color=always | less -R
EXCLUSIONS=SC2086,SC1091,SC2155,SC1090,SC2164,SC1003
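# (Excluded checks, for reference: SC2086 unquoted expansions, SC1090/SC1091
# un-followable `source` targets, SC2155 declare-and-assign on one line,
# SC2164 `cd` without failure handling, SC1003 single-quote escaping.)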
find .jenkins/pytorch -name '*.sh' | xargs shellcheck --exclude=$EXCLUSIONS --external-sources "$@" || true


@ -27,5 +27,5 @@ matrix:
include:
env: LINT_CHECK
python: "2.7"
install: pip install flake8
install: pip install flake8-mypy
script: flake8

Some files were not shown because too many files have changed in this diff.