Summary:
By default, CircleCI runs zero jobs on tags, meaning that when we tag a build,
no job is run if a dependent job does not contain the correct filters.
This adds an explicit configuration to run the setup job on every branch
and every tag that CircleCI can run on.
For more information on CircleCI filters and what they do (and more
importantly what they do not do) visit:
https://circleci.com/docs/2.0/configuration-reference/#filters-1
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35013
Differential Revision: D20535560
Pulled By: seemethere
fbshipit-source-id: 7ee5dddbc0a9416fd76ed198e5447318c53e1873
Summary:
Per title.
In the future we want to make div(), the division operator, and addcdiv perform true division as in Python 3, NumPy, and JAX. To do this without silently breaking users we plan to:
- Warn (once) in 1.5 when a user performs integer division using div or addcdiv
- RuntimeError in 1.6 when a user attempts to perform integer division using div or addcdiv
- Always perform true division in 1.7 using div, /, and addcdiv
Users can use true_divide or floor_divide today to explicitly specify the type of division they like.
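For example (a quick illustrative sketch; the commented results are what integer inputs should produce):
```
import torch

a = torch.tensor([5, 7])
b = torch.tensor([2, 2])

torch.true_divide(a, b)   # tensor([2.5000, 3.5000]) -- explicit true division
torch.floor_divide(a, b)  # tensor([2, 3])           -- explicit integer (floor) division
```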
A test for this behavior is added to test_type_promotion. Unfortunately, because we are only warning once (to avoid a deluge) the test only uses maybeWarnsRegex.
The XLA failure is real but will be solved by https://github.com/pytorch/pytorch/pull/34552. I'll be sure to land that PR first to avoid temporarily breaking the XLA build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34570
Differential Revision: D20529211
Pulled By: mruberry
fbshipit-source-id: 65af5a9641c5825175d029e8413c9e1730c661d0
Summary:
And a few typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34791
Test Plan: CI
Differential Revision: D20524879
Pulled By: malfet
fbshipit-source-id: 58fa03bd6356979e77cd1bffb6370d41a177c409
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34980
We were passing sample inputs to `torch.jit.script` (as if it were
`torch.jit.trace`), but that argument was treated as the optional, deprecated
`optimize` parameter, which caused a warning.
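As a rough illustration of the difference (not the code touched by this diff): `torch.jit.trace` takes example inputs, while `torch.jit.script` takes only the callable, so passing sample inputs to it is interpreted as the deprecated second positional argument.
```
import torch

def f(x):
    return x * 2

traced = torch.jit.trace(f, torch.randn(3))  # trace requires example inputs
scripted = torch.jit.script(f)               # script takes just the callable
```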
Differential Revision: D20520369
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 87b40a5e35bfc4a3d7a5d95494632bfe117e40b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34638
Fixes: https://github.com/pytorch/pytorch/issues/27643
This PR manages notifying workers in the event of a failure during distributed autograd. Gracefully handles propagating errors across all nodes in the backward pass and sets state in the local autograd engines accordingly.
(Note: this ignores all push blocking failures!)
Test Plan: Added 2 new tests checking errors when they are thrown in an intermediate node during distributed autograd. Ensured that all existing distributed autograd tests pass.
Differential Revision: D20164420
fbshipit-source-id: 3d4ed74230969ac70bb763f1b5b1c16d979f66a2
Summary:
The `GetEmptyStringAlreadyInited` invocation pattern in protobuf-generated header files changed to
`::PROTOBUF_NAMESPACE_ID::internal::GetEmptyStringAlreadyInited`, where `PROTOBUF_NAMESPACE_ID` is defined in `protobuf/port_def.inc` as `google::protobuf`.
This likely changed around protobuf 3.8.x, but I've only tested it with protobuf 3.11.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35008
Test Plan: Update `third-party/protobuf` submodule to 3.11.4, compile and run `pattern_net_transform_test`
Differential Revision: D20526949
Pulled By: malfet
fbshipit-source-id: fddaa3622c48ad883612c73c40a20d306d88d66b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34988
In https://github.com/pytorch/pytorch/pull/31893, we introduced a confirmedUsers_ map in RRefContext.
For the case where the fork is shared from the owner, there is no `pendingUsers_` intermediate phase for the fork, so we should put it into `confirmedUsers_` immediately.
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
```
Differential Revision: D7735909
fbshipit-source-id: 14c36a16486f0cc9618dcfb111fe5223781b647d
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. Added a BC-compatibility serialization test for LBFGS.
4. Removed mentions of parameters_ in optimizer.cpp and de-virtualized all functions.
5. Made defaults_ an optional argument in all optimizers except SGD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20518647
Pulled By: anjali411
fbshipit-source-id: 4760d1d29df1784e2d01e2a476d2a08e9df4ea1c
Summary:
**Summary**
This commit parallelizes the invocation of `clang-format` on all files
in `tools/clang_format_new.py` using `asyncio`.
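A minimal sketch of the approach (function and file names here are illustrative, not the script's actual API): each `clang-format` invocation becomes an awaitable subprocess, and `asyncio.gather` runs them concurrently.
```
import asyncio

async def run_clang_format(path):
    # Spawn clang-format without blocking the event loop.
    proc = await asyncio.create_subprocess_exec(
        "clang-format", path, stdout=asyncio.subprocess.PIPE
    )
    stdout, _ = await proc.communicate()
    return path, stdout

async def format_all(paths):
    # Fan out over all files concurrently and collect the results.
    return await asyncio.gather(*(run_clang_format(p) for p in paths))

results = asyncio.run(format_all(["a.cpp", "b.cpp"]))
```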
**Testing**
Ran and timed the script.
*Before*
```
$ time ./tools/clang_format_new.py --diff
...
real 0m7.615s
user 0m6.012s
sys 0m1.634s
```
*After*
```
$ time ./tools/clang_format_new.py --diff
...
Some files not formatted correctly
real 0m2.156s
user 0m8.488s
sys 0m3.201s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34750
Differential Revision: D20523133
Pulled By: SplitInfinity
fbshipit-source-id: 509741a0b4fcfcdcd7c5a45654e3453b4874d256
Summary:
There are three guards related to mobile build:
* AutoGradMode
* AutoNonVariableTypeMode
* GraphOptimizerEnabledGuard
Today we need to set some of these guards before calling libtorch APIs because we customized the mobile build to only support inference (for both OSS and most FB use cases) in order to optimize binary size.
Several changes have been made since the 1.3 release, so there are already inconsistent uses of these guards in the codebase. I did a sweep of all mobile-related model loading & forward() call sites, trying to unify the use of these guards:
Full JIT: still set all three guards. More specifically:
* OSS: Fixed a bug of not setting the guard at model load time correctly in Android JNI.
* FB: Not covered by this diff (as we are using mobile interpreter for most internal builds).
Lite JIT (mobile interpreter): only needs AutoNonVariableTypeMode guard. AutoGradMode doesn't seem to be relevant (so removed from a few places) and GraphOptimizerEnabledGuard definitely not relevant (only full JIT has graph optimizer). More specifically:
* OSS: At this point we are not committed to support Lite-JIT. For Android it shares the same code with FB JNI callsites.
* FB:
  * JNI callsites: use the unified LiteJITCallGuard.
  * For iOS/C++: manually set AutoNonVariableTypeMode for _load_for_mobile() & forward() callsites.
Ideally we should avoid having to set AutoNonVariableTypeMode for mobile interpreter. It's currently needed for dynamic dispatch + inference-only mobile build (where variable kernels are not registered) - without the guard it will try to run `variable_fallback_kernel` and crash (PR #34038). The proper fix will take some time so using this workaround to unblock selective BUCK build which depends on dynamic dispatch.
PS. The current status (of having to set AutoNonVariableTypeMode) should not block running FL model + mobile interpreter - if all necessary variable kernels are registered then it can call _load_for_mobile()/forward() against the FL model without setting the AutoNonVariableTypeMode guard. It's still inconvenient for JAVA callsites as it's set unconditionally inside JNI methods.
Test Plan: - CI
Reviewed By: xta0
Differential Revision: D20498017
fbshipit-source-id: ba6740f66839a61790873df46e8e66e4e141c728
Summary: Add transfer_learning_blob_name_mappings into layer_model_helper to support layer model transfer learning
Reviewed By: mraway
Differential Revision: D20286298
fbshipit-source-id: de3e029611d843f38d3f42ecd4148358f7e14a2b
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
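For example, the new variants can be used roughly like this (a sketch; values chosen for illustration):
```
import torch

x = torch.tensor([5, 7, 9])
y = torch.tensor([2, 2, 2])

z = torch.empty(3, dtype=torch.int64)
torch.floor_divide(x, y, out=z)  # out variant
w = x.floor_divide(y)            # tensor method
x.floor_divide_(y)               # in-place variant, modifies x
```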
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is intentional. The BC issue is that the first parameter name of torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20509850
Pulled By: mruberry
fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8
Summary:
This is causing failures on my Windows build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34926
Differential Revision: D20501850
Pulled By: smessmer
fbshipit-source-id: 92c72dd657b27b1786952dbdccfceff99f4ba743
Summary:
This pull request updates the Torchvision commit to use a ROCm-enabled torchvision in `.jenkins/pytorch/test.sh`.
Pytorch tests:
```
test_SyncBatchNorm_process_group (__main__.TestDistBackend)
test_alexnet (jit.test_models.TestModels)
test_script_module_script_resnet (jit.test_models.TestModels)
test_script_module_trace_resnet18 (jit.test_models.TestModels)
test_torchvision_smoke (__main__.TestTensorBoardPytorchGraph)
```
in `test2` were skipped because torchvision was not installed in `test2`; it was instead installed in `test1`. This PR moves the torchvision test to the correct place, thereby enabling the above-mentioned tests.
cc: ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34909
Differential Revision: D20515333
Pulled By: ezyang
fbshipit-source-id: 69439756a687ba441c1f8107233b4dbc1e108387
Summary:
Per title.
Currently torch.full will always (attempt to) produce a float tensor. This is inconsistent with NumPy in (at least) two cases:
- When integral fill values (including bool) are given
- When complex fill values are given
For example:
```
np.full((1, 2), 1).dtype
: dtype('int64')
np.full((1, 2), (1 + 1j)).dtype
: dtype('complex128')
```
Whereas in PyTorch
```
torch.full((1, 2), 1).dtype
: torch.float32
torch.full((1, 2), (1 + 1j)).dtype
: RuntimeError: value cannot be converted to type float without overflow: (1,1)
```
This PR begins the process of deprecating our current behavior of returning float tensors (by default) when given integer fill values by warning the user that integer fill values will require explicitly specifying the dtype or out kwargs in 1.6, and in 1.7 the behavior will change to return a LongTensor by default (BoolTensor for bool values). The intermediate 1.6 release is to prevent changing the behavior silently and unexpectedly.
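In the meantime, the warning can be avoided (and the code kept forward-compatible) by specifying the dtype explicitly, e.g.:
```
import torch

# Explicit dtype sidesteps the integer-fill deprecation and is forward-compatible:
t = torch.full((1, 2), 1, dtype=torch.long)
print(t.dtype)  # torch.int64
```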
The PR also implements inference for complex types. So that with it:
```
torch.full((1, 2), (1 + 1j)).dtype
: torch.complex64
```
The complex type inference returns a ComplexFloat tensor when given a complex fill value (and no dtype or out kwarg is specified), unless the default dtype is Double, in which case a ComplexDouble tensor is returned.
A test for these behaviors is added to test_torch.py.
Implementation note:
This PR required customizing full's dispatch because currently in eager codegen the TensorOptions object passed to functions improperly sets has_dtype() to true, even if the user did not explicitly provide a dtype. torch.arange already worked around this issue with its own custom implementation. The JIT, however, does pass a properly constructed TensorOptions object.
Future Work:
This PR does not extend torch.full's complex type inference to ONNX. This seems unlikely to come up and will be a clear error if it does. When integer type inference is added to torch.full, however, then porting the behavior to ONNX may be warranted. torch.arange ported its complex type promotion logic to ONNX, for example.
Additionally, this PR mostly leaves existing call sites in PyTorch that would trigger this warning intact. This is to be more minimal (since the PR is BC breaking). I will submit a separate PR fixing PyTorch's call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34709
Differential Revision: D20509387
Pulled By: mruberry
fbshipit-source-id: 129593ba06a1662032bbbf8056975eaa59baf933
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34315
Previously we registered quantization parameter attributes using the debugName of
the observed value, but debugName is not unique. This PR addresses the problem
by making the attribute names unique.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20504455
fbshipit-source-id: 6dd83bdfc4e4dc77ad3af3d5b48750fb01b2fce1
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
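A minimal usage sketch, assuming the `torch.cuda.amp.autocast` context manager this work introduces (out-of-place ops only, per the above):
```
import torch

model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

# Inside the autocast region, eligible out-of-place ops run in reduced precision.
with torch.cuda.amp.autocast():
    y = model(x)
print(y.dtype)  # typically torch.float16 for a matmul-backed layer
```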
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140
Differential Revision: D20346700
Pulled By: ezyang
fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/34736. Both code snippets in that issue can now execute normally. More tests are also added.
This PR is a follow-up on https://github.com/pytorch/pytorch/issues/34519, where one variable was mistakenly missed when updating the max_pool2d kernel.
This PR also uses accumulate type of scalar_t in the backward kernel, which resolves the numerical precision issue when stride < kernel_size on fp16.
cc csarofeen ptrblck jjsjann123 VitalyFedyunin ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34934
Differential Revision: D20512062
Pulled By: VitalyFedyunin
fbshipit-source-id: a461ebbb3e3684aa183ae40e38d8f55bb6f4fee1
Summary:
Throwing from a destructor leads to undefined behaviour (most often a segfault),
so it's better to leak memory than to segfault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34756
Test Plan: Run `test_pytorch_onnx_caffe2`
Differential Revision: D20504228
Pulled By: malfet
fbshipit-source-id: 7a05776fea9036f602e95b8182f8493cb5886dab
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is intentional. The BC issue is that the first parameter name of torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20497453
Pulled By: mruberry
fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34903
Reattempt of D20461609
Moving 2/4-bit SLS and row-wise 2/4-bit conversion operator to open source to be used by DLRM
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20495304
fbshipit-source-id: 66a99677583f50fd40e29c514710c7b1a8cdbc29
Summary:
Follow-ups after this PR:
* Remove `LossClosureOptimizer`, and merge `Optimizer` into `OptimizerBase` (and rename the merged class to Optimizer)
* Merge the LBFGS-specific serialize test function and the generic `test_serialize_optimizer` function, possibly by passing a bool `has_only_global_state` flag into the `test_serialize_optimizer` function to denote whether `size()` should be equal to 1 or 2?
* https://github.com/pytorch/pytorch/pull/34564#discussion_r393780303
* It seems that we don't have the equivalent `XORConvergence_LBFGS` test like the other optimizers, and it would be good to add one
* Remove mentions of `parameters_` in optimizer.cpp, de-virtualize all functions, and remove the `OptimizerBase(std::vector<Tensor> parameters)` constructor from `OptimizerBase`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34564
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20495701
Pulled By: anjali411
fbshipit-source-id: 6d35286d2decb6f7dff93d9d3e57515770666622
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34896
Make TorchScript support calling ref.owner() to get owner worker id and calling ref.owner_name() to get owner worker name.
Differential Revision: D7652208
fbshipit-source-id: a60125bb316ac2cf19a993cbd2affc933c0af7c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34413
In this diff we have made various improvements to ProcessGroupAgent in order to accommodate edge and error cases such as a "non-clean" shutdown (shutdowns in which we abort RPC as quickly as possible, and don't wait for all pending work across all RPC agents to be completed):
1. Catch and log exceptions in `enqueueRecv`. This prevents us from calling `std::terminate()` in a different thread and logs an error message indicating the issue. With this we no longer have crashes caused by exceptions in this thread during non-graceful shutdown.
2. Provide cleaner error messages everywhere (and use `c10::str` where possible). One example is in `agent::send()`.
3. Add the ability to abort pending sends that cause blocking waits in `handleSend`. The reason we need to abort this is since during a non-graceful shutdown, we could become blocked waiting for these since there is no guarantee the remote end is still active and this would result in a long wait and eventual timeout. We abort these by adding them to a map, and go through this map during `shutdown()`.
4. Fix flaky tests: `test_handle_send_exceptions`, `test_backward_node_failure`, and `test_backward_node_failure_python_udf`. These tests were flaky since they dealt with non-graceful shutdown of workers, which is prone to the edge cases explained above.
We have also refactored `createExceptionResponse`, `enqueueRecv`, and some test functions for the above reasons in this diff.
For testing:
Ensured that the tests are no longer flaky with 500 test runs. Previously, these tests were flaky and disabled. Also added a unit test in the internal `ProcessGroupAgentTest.cpp`.
ghstack-source-id: 100311598
Test Plan: Ensured that the tests are no longer flaky with 500 test runs. Previously, these tests were flaky and disabled. Also added a unit test in the internal `ProcessGroupAgentTest.cpp`.
Reviewed By: mrshenli
Differential Revision: D20269074
fbshipit-source-id: de9cad7f7185f9864ffbb6b14cd8ca9f6ff8f465
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34901
init_pg is needed for the dist.barrier call; otherwise the default process group may not be found for some RPC backends.
ghstack-source-id: 100319642
Test Plan: unit test
Differential Revision: D20495321
fbshipit-source-id: a44241bd2ff6e1404eee9b241270a94e9fd114d0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34714 (using the discussed solution). Thanks to jjabo for flagging and suggesting this.
Instead of expanding `probs` to prepend `sample_shape`, it is better to use the `num_samples` argument of `torch.multinomial`, which is faster and consumes less memory.
Existing tests should cover this. I have profiled this on different inputs and the change results in faster `.sample` (e.g. 100X faster on the example in the issue), or at worst is similar to what we have now with the default `sample_shape` argument.
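Roughly, the idea is to draw all requested samples in one `torch.multinomial` call rather than expanding `probs` (an illustrative sketch, not the actual `Categorical.sample` code):
```
import torch

probs = torch.tensor([0.1, 0.2, 0.7])
sample_shape = (1000,)

# One multinomial call with num_samples instead of expanding probs to (1000, 3):
flat = torch.multinomial(probs, num_samples=1000, replacement=True)
samples = flat.reshape(sample_shape)
```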
cc. fritzo, alicanb, ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34900
Differential Revision: D20499065
Pulled By: ngimel
fbshipit-source-id: e5be225e3e219bd268f5f635aaa9bf7eca39f09c
Summary:
This makes PyTorch compilable (but not linkable) with the `CUDA_SEPARABLE_COMPILATION` option enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34899
Test Plan: CI
Differential Revision: D20501050
Pulled By: malfet
fbshipit-source-id: 02903890a827fcc430a26f397d4d05999cf3a441
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34871
We used to configure the root logger in the RPC module, adding a stream handler to `root.handlers`. This is not desired behavior for PyTorch users; we should keep the root logger's handler list untouched.
Instead, we can configure the logger local to the RPC module and set its log level, so it doesn't fall back to its ancestor (usually the root logger, which has no stream handlers in most cases).
https://docs.python.org/3/library/logging.html#logging.Logger.setLevel
We also add a stream handler to make it output to stdout, even if the root logger is not configured and has an empty handler list.
https://docs.python.org/3/library/logging.html#logging.Logger.addHandler
https://docs.python.org/3/library/logging.handlers.html#logging.StreamHandler
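A sketch of the resulting pattern (a module-local logger instead of the root logger):
```
import logging
import sys

# Configure the RPC module's own logger instead of the root logger.
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # effective level no longer depends on the root logger
logger.addHandler(logging.StreamHandler(sys.stdout))  # emit to stdout even if root has no handlers
```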
ghstack-source-id: 100322141
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_wait_all_workers
```
Differential Revision: D7677493
fbshipit-source-id: 88a66079e7348c79a7933e3527701917cbebb7ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34607
Adds quantized version of hardsigmoid activation.
Note: not implementing the `_` and `.out` variants is currently intentional,
because the implementation changes the scale and zero_point, and it's nice to
not allow the user to specify scale and zero_point. Let me know if we should
handle this differently.
Test Plan:
tests
benchmarks
Imported from OSS
Differential Revision: D20480546
fbshipit-source-id: 9febcb44afd920125ed2ca4900492f0b712078ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34843
Currently, we use not_ok_to_boxing to filter out Dimname, which cannot be
converted/constructed into an IValue. The correct way is to SFINAE the
constructor of IValue.
(Note: this ignores all push blocking failures!)
Test Plan:
PyTorch compiled after the code change.
All unit test passed
Imported from OSS
Differential Revision: D20494886
fbshipit-source-id: 91dfba6a41a3ae2d6ceba9d4124cbf612ea3f080
Summary:
Filing this PR since we are in the process of migrating the ROCm CI to ROCm 3.1. This patch ensures the correct functionality of float <-> bfloat16 conversion on ROCm 3.1; `std::isnan` regresses with ROCm 3.1.
iotamudelta ezyang
cc: ashishfarmer (original author of this patch)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34664
Differential Revision: D20440972
Pulled By: ezyang
fbshipit-source-id: 1ccb911c88f05566d94e01878df6c70cf7f31242
Summary:
numpy was originally not a requirement, but we should add it back here since
it's required on import and we require it anyway for our conda packages.
Tested with:
```
❯ pkginfo -f requires_dist *.whl
requires_dist: ['numpy']
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34510
Differential Revision: D20352125
Pulled By: seemethere
fbshipit-source-id: 383e396fe500ed7043d83c3df57d1772d0fff1e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34665
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20493861
Pulled By: ezyang
fbshipit-source-id: 4215e3037a16be460f20cfc2859be5ee074128d3
Summary:
This PR implements channels-last upsampling nearest for 2D/3D.
This is supposed to be faster and, in addition, avoids converting formats going
into and out of the operator.
Will post benchmarking numbers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34597
Test Plan: python test/test_nn.py TestNN.test_upsamplingNearest3d_channels_last
Differential Revision: D20390583
Pulled By: kimishpatel
fbshipit-source-id: e0162fb97604a261887f38fc957d3f787c80954e
Summary:
If the arguments of an `ENDIF()` block are non-empty, they should match the corresponding `IF()` block.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34886
Test Plan: CI
Differential Revision: D20494631
Pulled By: malfet
fbshipit-source-id: 5fed86239b4a0cb4b3aedd02c950c1b800199d2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34842
This PR (hopefully the last one of this kind) merges changes from a side branch
where the tensor-expressions-based fuser work has been done so far. This PR is a
squashed version of the changes in the side branch, which is available here:
https://github.com/bertmaher/pytorch
Differential Revision: D20478208
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 21556e009f1fd88099944732edba72ac40e9b9c0
Summary:
For batch_norm inference contiguous case, we can get a better performance by manually vectorize it.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
for n in [1, 10, 100]:
    for c in [1, 10, 100]:
        for hw in [1, 10, 200]:
            m = nn.BatchNorm2d(c, affine=False)
            m.eval()
            input = torch.randn(20, c, hw, hw)
            # warm up
            for i in range(200):
                output = m(input)
            fwd_t = 0
            for j in range(1000):
                t1 = time.time()
                output = m(input)
                t2 = time.time()
                fwd_t = fwd_t + (t2 - t1)
            fwd_avg = fwd_t / 1000 * 1000
            print("size = (%d, %d, %d, %d); compute time is %.4f(ms)" % (n, c, hw, hw, fwd_avg))
```
Before:
```
size = (1, 1, 1, 1); compute time is 0.0110(ms)
size = (1, 1, 10, 10); compute time is 0.0123(ms)
size = (1, 1, 200, 200); compute time is 0.8166(ms)
size = (1, 10, 1, 1); compute time is 0.0107(ms)
size = (1, 10, 10, 10); compute time is 0.0257(ms)
size = (1, 10, 200, 200); compute time is 8.7533(ms)
size = (1, 100, 1, 1); compute time is 0.0122(ms)
size = (1, 100, 10, 10); compute time is 0.1619(ms)
size = (1, 100, 200, 200); compute time is 123.5674(ms)
size = (10, 1, 1, 1); compute time is 0.0109(ms)
size = (10, 1, 10, 10); compute time is 0.0123(ms)
size = (10, 1, 200, 200); compute time is 0.5629(ms)
size = (10, 10, 1, 1); compute time is 0.0107(ms)
size = (10, 10, 10, 10); compute time is 0.0253(ms)
size = (10, 10, 200, 200); compute time is 8.7817(ms)
size = (10, 100, 1, 1); compute time is 0.0120(ms)
size = (10, 100, 10, 10); compute time is 0.1655(ms)
size = (10, 100, 200, 200); compute time is 123.2488(ms)
size = (100, 1, 1, 1); compute time is 0.0109(ms)
size = (100, 1, 10, 10); compute time is 0.0123(ms)
size = (100, 1, 200, 200); compute time is 0.5740(ms)
size = (100, 10, 1, 1); compute time is 0.0108(ms)
size = (100, 10, 10, 10); compute time is 0.0257(ms)
size = (100, 10, 200, 200); compute time is 8.7201(ms)
size = (100, 100, 1, 1); compute time is 0.0122(ms)
size = (100, 100, 10, 10); compute time is 0.1628(ms)
size = (100, 100, 200, 200); compute time is 123.1739(ms)
```
After:
```
size = (1, 1, 1, 1); compute time is 0.0105(ms)
size = (1, 1, 10, 10); compute time is 0.0114(ms)
size = (1, 1, 200, 200); compute time is 0.5771(ms)
size = (1, 10, 1, 1); compute time is 0.0105(ms)
size = (1, 10, 10, 10); compute time is 0.0160(ms)
size = (1, 10, 200, 200); compute time is 6.9851(ms)
size = (1, 100, 1, 1); compute time is 0.0122(ms)
size = (1, 100, 10, 10); compute time is 0.0848(ms)
size = (1, 100, 200, 200); compute time is 98.6758(ms)
size = (10, 1, 1, 1); compute time is 0.0105(ms)
size = (10, 1, 10, 10); compute time is 0.0115(ms)
size = (10, 1, 200, 200); compute time is 0.2690(ms)
size = (10, 10, 1, 1); compute time is 0.0105(ms)
size = (10, 10, 10, 10); compute time is 0.0159(ms)
size = (10, 10, 200, 200); compute time is 6.6946(ms)
size = (10, 100, 1, 1); compute time is 0.0123(ms)
size = (10, 100, 10, 10); compute time is 0.0854(ms)
size = (10, 100, 200, 200); compute time is 98.7327(ms)
size = (100, 1, 1, 1); compute time is 0.0107(ms)
size = (100, 1, 10, 10); compute time is 0.0116(ms)
size = (100, 1, 200, 200); compute time is 0.2681(ms)
size = (100, 10, 1, 1); compute time is 0.0104(ms)
size = (100, 10, 10, 10); compute time is 0.0159(ms)
size = (100, 10, 200, 200); compute time is 6.7507(ms)
size = (100, 100, 1, 1); compute time is 0.0124(ms)
size = (100, 100, 10, 10); compute time is 0.0852(ms)
size = (100, 100, 200, 200); compute time is 98.6866(ms)
```
For real modle Resnext101, we can also get **~20%** performance improvement for large batch size,
Test script:
```
import torch
import torchvision
import time

torch.manual_seed(0)
#torch.set_num_threads(1)
model = torchvision.models.resnext101_32x8d().eval()
for batch_size in [1, 64]:
    input = torch.randn(batch_size, 3, 224, 224)
    # warm up
    with torch.no_grad():
        for i in range(5):
            output = model(input)
        fwd_t = 0
        for i in range(10):
            t1 = time.time()
            output = model(input)
            t2 = time.time()
            fwd_t = fwd_t + (t2 - t1)
        time_fwd_avg = fwd_t / 10 * 1000
    print("Throughput of resnext101 with batch_size = %d is %10.2f (imgs/s)" % (batch_size, batch_size * 1000 / time_fwd_avg))
```
Before:
```
Throughput of resnext101 with batch_size = 1 is 7.89 (imgs/s)
Throughput of resnext101 with batch_size = 64 is 13.02 (imgs/s)
num_threads =1
Throughput of resnext101 with batch_size = 1 is 2.97 (imgs/s)
Throughput of resnext101 with batch_size = 64 is 2.75 (imgs/s)
```
After:
```
Throughput of resnext101 with batch_size = 1 is 8.95 (imgs/s)
Throughput of resnext101 with batch_size = 64 is 15.52 (imgs/s)
num_threads = 1
Throughput of resnext101 with batch_size = 1 is 3.10 (imgs/s)
Throughput of resnext101 with batch_size = 64 is 2.88 (imgs/s)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34530
Differential Revision: D20479560
Pulled By: ngimel
fbshipit-source-id: 2e788ebcd814556116c90553ec61159eeffb3c16
Summary:
AT_CHECK has been deprecated and provides no more features than
TORCH_CHECK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34846
Differential Revision: D20481339
Pulled By: mrshenli
fbshipit-source-id: 1777e769a069a78e03118270294e5e273d516ca7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34663
Been bitten by this so many times. Never more.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20425480
Pulled By: ezyang
fbshipit-source-id: c4489efacc4149c9b57d1b8207cc872970c2501f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34783
Moving 2/4-bit SLS and row-wise 2/4-bit conversion operator to open source to be used by DLRM
Test Plan: CI
Reviewed By: yinghai
Differential Revision: D20461609
fbshipit-source-id: b3ef73ff10f2433afe06ffa73fe1145282d9ec4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34792
It is not thread-safe to instantiate a script module in multiple threads.
For both test_remote_script_module and test_torchscript_functions_not_supported, it is possible that the client thread is instantiating MyScriptModule while the server thread is instantiating it as well in the same rank's process.
This removes the MyScriptModule instantiation in the client thread; it is not actually needed.
ghstack-source-id: 100266609
Test Plan: unit tests
Differential Revision: D20463234
fbshipit-source-id: 6ff70ad90fa50b0b44c78df2495b4bcaabb4487b
Summary:
To speed up compilation time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34811
Test Plan: CI
Differential Revision: D20476992
Pulled By: malfet
fbshipit-source-id: 922cde93783fbfc04854851d7a05a635d5239792
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34844
The QNNPACK max_pool2d operator does not support ceil_mode, so it can cause crashes in the kernel when ceil_mode is set to true.
We default to the server implementation when ceil_mode is set to true.
Test Plan:
python test/test_quantized.py
Imported from OSS
Differential Revision: D20478701
fbshipit-source-id: 7962444ac493f5c3c32a9aa1a7be465e8b84ccc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33719
We were seeing a strange error where gathering profiler events (specifically `parse_cpu_trace` in `profiler.py`) would fail with the error:
`IndexError: pop from empty list`.
It turned out that this was because for one particular `Event`, there was a pop recorded but not a push. Instead of the `push` event being completely missing, it was overwritten by a completely different event.
After a bunch of debugging, and trying several hypotheses, it turns out that this was a race condition in `RangeEventList::record`. What happened was that different threads would call into `RangeEventList::record` on the same event list instance, and one record would stomp over the data written by the other one. Somehow the data written was a valid `Event` so the error did not manifest itself until the profiler realized a `pop` was missing a matching `push` in the python code.
I fixed this by adding a lock to serialize writes to `RangeEventList::record`.
This PR also makes a small change to pass in the `RecordFunction` name into `popRange`. It makes the debugging easier when investigating the events recorded.
Differential Revision: D20071125
fbshipit-source-id: 70b51a65bcb833a7c88b7462a978fd3a39265f7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34497
Use a thread_local table to intercept UserRRefs created during user
function args deserialization, and then wait for confirmations of
those UserRRefs before launching the given user function.
Differential Revision: D20347464
Test Plan: Imported from OSS
Pulled By: mrshenli
fbshipit-source-id: 087484a2d2f03fbfb156752ab25653f39b412a07
Summary:
PyTorch's expand allows a size with a -1 dim value, which means the dimension is inferred from the input tensor. This can be exported to ONNX Expand with a dim value of 1, since ONNX Expand supports two-way broadcasting.
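For reference, a quick example of the PyTorch-side semantics:
```
import torch

x = torch.randn(1, 4)
y = x.expand(3, -1)  # -1 keeps that dimension's size (4); result shape is (3, 4)
```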
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34069
Reviewed By: hl475
Differential Revision: D20195532
Pulled By: houseroad
fbshipit-source-id: c90e7d51b9d7422c09c5ed6e135ca8263105b8c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34545
This is for common operator coverage, since this is widely used. A future PR
will add the quantized version.
Some initial questions for reviewers, since it's my first FP operator
diff:
* do we need a backwards.out method for this?
* do we need CUDA? If yes, should it be in this PR or is it ok to split it out?
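For reference, a quick sketch of what the op computes (assuming the `torch.nn.functional.hardsigmoid` entry point; names here are assumptions, not confirmed by this summary):
```
import torch
import torch.nn.functional as F

x = torch.linspace(-4.0, 4.0, 9)
# hardsigmoid(x) = clamp(x / 6 + 0.5, 0, 1), a piecewise-linear approximation of sigmoid
y = F.hardsigmoid(x)
```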
Test Plan:
```
// test
python test/test_torch.py TestTorchDeviceTypeCPU.test_hardsigmoid_cpu_float32
// benchmark
python -m pt.hardsigmoid_test
...
Forward Execution Time (us) : 40.315
Forward Execution Time (us) : 42.603
```
Imported from OSS
Differential Revision: D20371692
fbshipit-source-id: 95668400da9577fd1002ce3f76b9777c6f96c327
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34625
These templated function calls are not specifying the template args correctly. The first arg is the index type, not the array data type. That means, right now it's using `T` as the index type as well, which will break if we do a template specialization for uint8_t. If we omit both, it will correctly infer that the index type is `int` and the data type is `T`.
Reviewed By: BIT-silence
Differential Revision: D20358728
fbshipit-source-id: 8cbd8eeb14bce602c02eb6fce2cc141f0121fa24
Summary:
This test is flaky on my computer; the error is:
```
AssertionError: tensor(1.3351e-05) not less than or equal to 1e-05
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34764
Differential Revision: D20476006
Pulled By: ezyang
fbshipit-source-id: dad7e702275346070552c8a98765c37e6ca2c197
Summary:
Replacing <ATen/core/Tensor.h> with <ATen/core/TensorBody.h> speeds up compilation of caffe2 operators by 15%.
For example, it reduces pool_op.cu compilation from 18.8s to 16s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34810
Test Plan: CI
Differential Revision: D20472230
Pulled By: malfet
fbshipit-source-id: e1b261cc24ff577f09e2d5f6428be2063c6d4a8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34230
This PR adds some benchmarks that we used to assess tensor expressions performance.
Differential Revision: D20251830
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: bafd66ce32f63077e3733112d854f5c750d5b1af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34228
This PR adds LLVM codegen to tensor expressions. LLVM is added as an
optional build dependency specified with `USE_LLVM=<path_to_llvm>`
variable. If this variable is not set or LLVM is not found in the
specified path, the LLVM codegen is completely disabled.
Differential Revision: D20251832
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 77e203ab4421eb03afc64f8da17e0daab277ecc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34227
This PR adds a CUDA support to tensor expressions.
Differential Revision: D20251836
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: ab36a55834cceff30c8371fef6cca1054a32f017
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34226
The LLVM and CUDA backends are added in subsequent PRs, so at this point the fuser is pretty useless, but it can still be tested and its logic is not going to change with the addition of the codegens.
Differential Revision: D20251838
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 82b0d221fa89904ed526689d02a6c7676a8ce8de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34224
Our development has been happening on a side branch `pytorch_fusion` in
`bertmaher/pytorch` fork. This PR moves changes to the core classes
representing expressions and transformations on them.
At this moment, the tensor expressions are only used in tests.
Subsequent PRs add LLVM and CUDA codegen for tensor expressions and
implement fuser on top of these.
This PR is huge as it is a squashed version of changes in the side
branch. It is not practical to pull changes one by one from the branch,
so here is the squashed version. If you're interested in seeing the
history of changes, please refer to https://github.com/bertmaher/pytorch
Differential Revision: D20251835
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 1a871acc09cf3c6f7fb4af40d408cdbb82dc7dab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33981
Okay it turns out that https://github.com/pytorch/pytorch/pull/29342
deletes actually useful things from the resulting Python module. In
particular, people like having `ignore`'d methods attached so that they
can invoke them from python.
Test Plan: Imported from OSS
Differential Revision: D20171650
Pulled By: suo
fbshipit-source-id: 71862e932c6a56cd055d0cff6657887ee0ceb9a8
Summary:
This PR refactors RNN / GRU / LSTM layers in C++ API to exactly match the implementation in Python API.
**BC-breaking changes:**
- Instead of returning `RNNOutput`, RNN / GRU forward method now returns `std::tuple<Tensor, Tensor>`, and LSTM forward method now returns `std::tuple<Tensor, std::tuple<Tensor, Tensor>>`, matching Python API.
- RNN / LSTM / GRU forward method now accepts the same inputs (input tensor and optionally hidden state), matching Python API.
- RNN / LSTM / GRU layers now have `forward_with_packed_input` method which accepts `PackedSequence` as input and optionally hidden state, matching the `forward(PackedSequence, ...)` variant in Python API.
- RNN / LSTM / GRU layers no longer have these fields: `w_ih` / `w_hh` / `b_ih` / `b_hh`. Instead, to access the weights and biases of the gates, users should do e.g. `rnn->named_parameters()["weight_ih_l0"]`, which mirrors the Python API `rnn.weight_ih_l0`.
- In `RNNOptions`
- `tanh()` / `relu()` / `activation` are removed. Instead, `nonlinearity` is added which takes either `torch::kTanh` or `torch::kReLU`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
- In `LSTMOptions`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
- In `GRUOptions`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
The majority of the changes in this PR focused on refactoring the implementations in `torch/csrc/api/src/nn/modules/rnn.cpp` to match the Python API. RNN tests are then changed to reflect the revised API design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34322
Differential Revision: D20458302
Pulled By: yf225
fbshipit-source-id: ffff2ae1ddb1c742c966956f6ad4d7fba03dc54d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34280
To make prim ops searchable for the lite interpreter, overloaded names need to be added for operators with the same name but different schemas. For example, aten::add in register_prim_ops.cpp. The difference is a combination of argument and output types.
`"aten::add(str a, str b) ->str"`
`"aten::add(int a, int b) ->int"`
`"aten::add(float a, float b) ->float"`
`"aten::add(int a, float b) ->float"`
`"aten::add(float a, int b) ->float"`
`"aten::add(Scalar a, Scalar b) ->Scalar"`
Solution:
Use the argument type and/or output type (the same to the existing overloaded names). The overloaded name should be minimum as long as the operators can be differentiated. For other operators please look into the source code change for details.
`"aten::add.str(str a, str b) ->str"`
`"aten::add.int(int a, int b) ->int"`
`"aten::add.float(float a, float b) ->float"`
`"aten::add.int_float(int a, float b) ->float"`
`"aten::add.float_int(float a, int b) ->float"`
`"aten::add.Scalar_Scalar(Scalar a, Scalar b) ->Scalar"`
Test Plan: Imported from OSS
Differential Revision: D20456997
Pulled By: iseeyuan
fbshipit-source-id: 2c3dc324b4a4e045559f62c6cc2a10fbb9a72dcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33604
For our current RPC agents, this PR disallows sending CUDA tensors
over RPC and asks users to copy them explicitly to CPU. Currently, this seems
to be the easiest contract to guarantee for our current RPC agents, otherwise
if we do support this transparently it gets a little tricky in terms of whether
a CUDA tensor on the client should be sent to CPU/GPU of the remote end and
also which GPU device on the remote end.
In the future, the TensorPipe RPC agent can have its own specific handling of
CUDA tensors.
Closes https://github.com/pytorch/pytorch/issues/28881
ghstack-source-id: 100166120
Test Plan: waitforbuildbot
Differential Revision: D20020183
fbshipit-source-id: ca4d43d2a24e8fcd3a60b21e654aa0e953e756cb
Summary:
So that in the future we can make policy accept an offset calculator in its constructor for the support of non-contiguous tensors.
The `elementwise_kernel_helper` is now very general and it can handle any cases:
```C++
template<typename func_t, typename policy_t>
__device__ inline void elementwise_kernel_helper(func_t f, policy_t policy) {
using traits = function_traits<func_t>;
using return_t = typename traits::result_type;
using args_t = typename traits::ArgsTuple;
int idx = blockIdx.x;
return_t results[thread_work_size];
cuda9::workaround::enable_default_constructor<args_t> args_[thread_work_size];
args_t *args = reinterpret_cast<args_t *>(&args_);
// load
policy.load(args, idx);
// compute
#pragma unroll
for (int i = 0; i < thread_work_size; i++) {
if (policy.check_inbounds(i)) {
results[i] = c10::guts::apply(f, args[i]);
}
}
// store
policy.store(results, idx);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33720
Differential Revision: D20459652
Pulled By: ngimel
fbshipit-source-id: aa8b122e0e8c6e08ab354785e04753ff778882e2
Summary:
https://github.com/pytorch/pytorch/issues/34563 accidentally introduced a lint error due to an unused import. This PR removes this import.
Jit tests run as expected after this change:
```
> python test/test_jit.py
.....
Ran 2435 tests in 100.077s
OK (skipped=140, expected failures=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34778
Differential Revision: D20459708
Pulled By: tugrulince
fbshipit-source-id: bb742085fafc849ff3d9507d1557556e01fbeb4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34762
So far it's by luck that we somehow include "caffe2/core/tensor.h" before including "caffe2/caffe2/quantization/server/fbgemm_pack_blob.h". This is not safe and this diff fixes it.
Test Plan: unittest
Reviewed By: jianyuh
Differential Revision: D20455352
fbshipit-source-id: 777dae32a23d0ec75fd7e5e1627426b5a5f81f5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34547
This enables threading by passing a threadpool to xnnpack ops.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20370553
fbshipit-source-id: 4db08e73f8c69b9e722b0e11a00621c4e229a31a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34319
Removes prepacking ops and installs them as attributes of the top level
module. Needs to run freezing as the first pass.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20290726
fbshipit-source-id: 633ceaa867ff7d5c8e69bd814c0362018394cb3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34048
Rewrites the graph to insert xnnpack prepack and packed run ops for
conv2d and linear.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20185658
fbshipit-source-id: c4c073c912ad33e822e7beb4ed86c9f895129d55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34047
This PR integrates the added xnnpack conv2d and linear op via
custom class registration for packed weights. The packed struct
is serializable.
Test Plan:
python test test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20185657
fbshipit-source-id: fc7e692d8f913e493b293b02d92f4e78536d7698
Summary:
This PR refactors RNN / GRU / LSTM layers in C++ API to exactly match the implementation in Python API.
**BC-breaking changes:**
- Instead of returning `RNNOutput`, RNN / GRU forward method now returns `std::tuple<Tensor, Tensor>`, and LSTM forward method now returns `std::tuple<Tensor, std::tuple<Tensor, Tensor>>`, matching Python API.
- RNN / LSTM / GRU forward method now accepts the same inputs (input tensor and optionally hidden state), matching Python API.
- RNN / LSTM / GRU layers now have a `forward_with_packed_input` method which accepts `PackedSequence` as input and optionally a hidden state, matching the `forward(PackedSequence, ...)` variant in Python API.
- In `RNNOptions`
- `tanh()` / `relu()` / `activation` are removed. Instead, `nonlinearity` is added which takes either `torch::kTanh` or `torch::kReLU`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
- In `LSTMOptions`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
- In `GRUOptions`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
The majority of the changes in this PR focused on refactoring the implementations in `torch/csrc/api/src/nn/modules/rnn.cpp` to match the Python API. RNN tests are then changed to reflect the revised API design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34322
Differential Revision: D20311699
Pulled By: yf225
fbshipit-source-id: e2b60fc7bac64367a8434647d74c08568a7b28f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34629
Add support for sigmoid in the conversion flow through onnx
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_quantized_sigmoid
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_small_model
Imported from OSS
Differential Revision: D20433680
fbshipit-source-id: 95943e14637d294122e4d102c5c19c06d27064c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33945
Add mapping for this operator in symbolics
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_max_pool2d
Imported from OSS
Differential Revision: D20433681
fbshipit-source-id: 88f02ade698262a6f8824671830bc1f7d40bbfa6
Summary:
This PR adds `RNNCell` / `LSTMCell` / `GRUCell` layers to the C++ frontend, with implementations exactly matching the Python API equivalent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34400
Differential Revision: D20316859
Pulled By: yf225
fbshipit-source-id: bb7cee092622334043c0d0fd0fcb4e75e707699c
Summary:
as title, for bringing up the quantized video model. Will add the batch_norm_relu test in another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34702
Differential Revision: D20436092
Pulled By: lly-zero-one
fbshipit-source-id: 116bd306f7880bfd763d8575654fbd6c92818338
Summary:
Since we've added CUDA 10.2, it is time to retire CUDA 10.0
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34726
Differential Revision: D20453081
Pulled By: seemethere
fbshipit-source-id: fd5bb35325a5f1577d0f0404d16cd7dfe34c86ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34671
Like the python arg parser, this tries to convert to the schema in order.
It introduces schema_match_exception which gets thrown when the schema doesn't match,
allowing the overload handler to try the next option.
Behavior will not 100% match the schema argument parser but should work for
simple cases using custom binding.
Test Plan: Imported from OSS
Differential Revision: D20432206
Pulled By: zdevito
fbshipit-source-id: 280839a2205ea3497db3a9b5741fccc1e2bff9a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34677
1. Remove remaining uses of `script::` namespace from the codebase,
2. Add one more typedef for `script::ExtraFilesMap` which is part of the
public interface.
Pull Request resolved: #34580
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D20431739
Pulled By: suo
fbshipit-source-id: a29d369c755b6506c53447ca1f286b6339222c9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34190
In-place modification of ClassType might affect other tests, so we want to do non-in-place modifications.
Actually, the inplace argument will be removed soon.
Test Plan:
ci
Imported from OSS
Differential Revision: D20451765
fbshipit-source-id: e87ad528c4e7f84f5774b94a8e3e85568269682d
Summary:
Per https://github.com/pytorch/pytorch/issues/19161, PyTorch is incompatible with Python 3.6.0 due to the missing `PySlice_Unpack`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34724
Test Plan: CI + try to load pytorch binary using python-3.6.0
Differential Revision: D20449052
Pulled By: malfet
fbshipit-source-id: 2c787fc64f5d1377c7f935ad2f3c77f46723d7dd
Summary:
This PR is related to [https://github.com/pytorch/pytorch/issues/33953](https://github.com/pytorch/pytorch/issues/33953).
I've created a directory `type_hint_tests` for the example as suggested by zou3519 [here](https://github.com/pytorch/pytorch/issues/33953#issuecomment-597716405). This directory is supposed to contain examples over which mypy will run. I've added the test in `test/test_type_hints.py`.
The test can simply be invoked by
```
$ python3 test/test_type_hints.py
Fail to import hypothesis in common_utils, tests are not derandomized
.b'test/type_hint_tests/size.py:7: error: Tuple index out of range\ntest/type_hint_tests/size.py:8: error: Tuple index out of range\n'
.
----------------------------------------------------------------------
Ran 2 tests in 13.660s
OK
```
Note that I've not made the change of fixing the stub, to show that the test works. The issue can be fixed by changing the definition of Size to `class Size(Tuple[_int, ...]): ...` in `torch/__init__.pyi.in`.
After changing the `Size` definition, the test passes.
```
$ python3 test/test_type_hints.py
Fail to import hypothesis in common_utils, tests are not derandomized
.b''
.
----------------------------------------------------------------------
Ran 2 tests in 19.382s
OK
```
I will do that once I get approval from zou3519. This is an initial implementation, please provide your suggestions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34595
Differential Revision: D20441817
Pulled By: zou3519
fbshipit-source-id: 00a434adf5bca813960f4efea38aa6d6953fe85f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34706
as title
Test Plan: test in stacked diff
Reviewed By: csummersea
Differential Revision: D20436618
fbshipit-source-id: e51ef0a22708425cd296c05f4089fe8c98eda90a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34511
With https://github.com/pytorch/pytorch/pull/34122/files, issues
with using record_function context manager and profiling RPCs were fixed. This
adds a test case to verify that we can use RPC with the `record_function`
decorator.
ghstack-source-id: 100109932
Test Plan: Unit test change
Differential Revision: D20352242
fbshipit-source-id: d6429e4352ad3b8d874dc0f27b23ecb6202e6b2b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34723
Add min function to cuda math compat
Test Plan: unittest
Reviewed By: houseroad
Differential Revision: D20444517
fbshipit-source-id: 1a93343cc57249ef1101eeb7ef373266f6a2873a
Summary:
This commit adds a reference hash for the linux64 clang-format binary and in
doing so, enables this script to be used on Linux machines.
Test Plan:
Ran the script.
```
meghanl@devvm1517:caffe2 (ff25240c|remote/master)$ export http_proxy=fwdproxy:8080
meghanl@devvm1517:caffe2 (ff25240c|remote/master)$ export https_proxy=fwdproxy:8080
meghanl@devvm1517:caffe2 (ff25240c|remote/master)$ python3 ./tools/clang_format_new.py --diff
Downloading clang-format to /data/users/meghanl/fbsource/fbcode/caffe2/.clang-format-bin
0% |################################################################| 100%
Using clang-format located at /data/users/meghanl/fbsource/fbcode/caffe2/.clang-format-bin/clang-format
meghanl@devvm1517:caffe2 (ff25240c|remote/master)$ echo $?
1
```
A non-zero return code indicates that `clang-format` will make changes.
Reviewed By: suo
Differential Revision: D20434291
fbshipit-source-id: fa13766e9d94720d4b0d8a540d2f1507e788f7a5
Summary:
- Clarify that `torch.distributed.autograd.backward()` does not use the current thread-local autograd context; instead, it looks it up based on the `context_id` passed in
- Clarify the same for `torch.distributed.optim.DistributedOptimizer.step()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34670
Differential Revision: D20427645
Pulled By: rohan-varma
fbshipit-source-id: a1a88de346cdd4dbe65fb2b7627157f86fd2b6a3
Summary:
With this PR, we can now support left and right shift operators in the JIT engine for <int, int> and <Tensor, int>.
Updated tests pass as expected:
```
> python test/test_jit.py
...
Ran 2427 tests in 84.861s
OK (skipped=139, expected failures=1)
```
Running the following code with Python results in the output below:
```
> cat ~/expressions.py
import torch
@torch.jit.script
def fn(a, b):
    # type: (int, int)
    return (
        a << b,  # supported
        b >> a,  # supported
        a & b,
        a | b,
        a ^ b
    )
print(fn.graph)
```
```
> python ~/expressions.py
graph(%a.1 : int,
%b.1 : int):
%4 : int = aten::leftshift(%a.1, %b.1) # /home/ince/expressions.py:7:8
%7 : int = aten::rightshift(%b.1, %a.1) # /home/ince/expressions.py:8:8
%10 : int = aten::__and__(%a.1, %b.1) # /home/ince/expressions.py:9:8
%13 : int = aten::__or__(%a.1, %b.1) # /home/ince/expressions.py:10:8
%16 : int = aten::__xor__(%a.1, %b.1) # /home/ince/expressions.py:11:8
%17 : (int, int, int, int, int) = prim::TupleConstruct(%4, %7, %10, %13, %16)
return (%17)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34563
Differential Revision: D20434209
Pulled By: tugrulince
fbshipit-source-id: 886386c59755106e17b84778b8e495b80a6269cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34623
The bandaid of "AT_WARN" keeps introducing new warnings. Let's get rid
of it entirely.
Closes #34502
Test Plan: Imported from OSS
Differential Revision: D20420112
Pulled By: albanD
fbshipit-source-id: 7160c113cb4deb2d2f50a375356f423fe5e86f50
Summary:
How this actually works:
1. Gets a list of URLs from anaconda for pkgs to download, most
likely from pytorch-test
2. Download all of those packages locally in a temp directory
3. Upload all of those packages, with a dry run upload by default
This, along with https://github.com/pytorch/pytorch/issues/34500 basically completes the scripting work for the eventual promotion pipeline.
Currently testing with:
```
TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 PYTORCH_CONDA_FROM=pytorch scripts/release/promote/conda_to_conda.sh
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34659
Differential Revision: D20432687
Pulled By: seemethere
fbshipit-source-id: c2a99f6cbc6a7448e83e666cde11d6875aeb878e
Summary:
…ithout lapack
LAPACK is needed for `at::svd`, which is called from `pinverse()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34686
Test Plan: CI + local run
Differential Revision: D20442637
Pulled By: malfet
fbshipit-source-id: b3531ecc1197b0745ddcf50febb7fb4a7700d612
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/33988 and fix https://github.com/pytorch/pytorch/issues/34083.
Previously, the max_pool2d_nhwc kernels used shared memory with a size proportional to the tensor size (c \* h \* w). When the tensor size is too large, the kernel launch fails.
This PR follows the guidance in AdaptiveAvgPool2d_nhwc by increasing grid_x with a split along the "C" dimension. With that change, there is a maximum limit on the shared memory size (less than 48 KB) regardless of tensor size.
A benchmark can be found [here](0b98146089/max-pool2d/max-pool2d.ipynb). TL;DR: barely any performance drop is found.
cc csarofeen ptrblck jjsjann123 VitalyFedyunin
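A minimal sketch (not from the PR) of the NHWC path this change exercises; the sizes are arbitrary and a CUDA device is assumed:
```python
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    # A large channels_last input that previously could exceed the shared memory limit.
    x = torch.randn(1, 256, 200, 200, device="cuda").to(memory_format=torch.channels_last)
    y = F.max_pool2d(x, kernel_size=3, stride=2)
    print(y.is_contiguous(memory_format=torch.channels_last))
```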
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34519
Differential Revision: D20388848
Pulled By: VitalyFedyunin
fbshipit-source-id: 9454f385f9315afaab4a05303305578bbcd80b87
Summary:
- `torch::nn::functional` functions must provide example for how to use the corresponding functional options
- `torch::nn::functional` functions must link to the corresponding functional options
- remove `TORCH_NN_FUNCTIONAL_USE_MODULE_OPTIONS` macro, and put `torch::nn::functional` options docs inside the functional namespace, right above functional declaration
- `torch::nn::functional` options docs should not link back to torch::nn layers. Instead, they should have links to `torch::nn::functional::xxx`
----
This PR is BC-breaking in the following way:
`TORCH_NN_FUNCTIONAL_USE_MODULE_OPTIONS` macro is removed, and user should explicitly write
```cpp
namespace functional {
using SomeFuncOptions = SomeModuleOptions;
} // namespace functional
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34688
Differential Revision: D20431251
Pulled By: yf225
fbshipit-source-id: 7d4f27dca3aad2a1e523690927d7afb261b9d308
Summary: The last diff enabled operator stats for non-production builds, including AIBench. But the operator latency was off: https://our.intern.facebook.com/intern/aibench/details/414567479798816 because it represented the operator execution end time; since threadLocalDebugInfo was not set, the start time was 0. This diff fixes it by creating a new ThreadLocalDebugInfo object when the op starts to run and storing the model information for logging.
Test Plan:
```buck run mode/mac aibench:run_bench_macos -- -b aibench/specifications/models/pytorch/pytext/pytext_mobile_inference.json --platform android --framework pytorch --remote --devices SM-G960F-8.0.0-26```
https://our.intern.facebook.com/intern/aibench/details/922804117425407
```buck run mode/mac aibench:run_bench_macos -- -b aibench/specifications/models/pytorch/fbnet/fbnet_mobile_inference.json --platform android --framework pytorch --remote --devices SM-G960F-8.0.0-26```
https://our.intern.facebook.com/intern/aibench/details/593403202250750
Reviewed By: xta0
Differential Revision: D20436388
fbshipit-source-id: 740bc94c3f51daef6af9b45c1ed7a708f5fc8836
Summary:
- Update API calls `backward` and `optim.step` now that we require `context_id` (see the sketch after this list)
- Add notes to clarify purpose of distributed autograd context (this was a source of confusion in some feedback)
- Add note that details why optimizer requires context_id
- Clearly specify that we don't have SMART mode yet
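A minimal sketch of the documented calling pattern, assuming an RPC worker group has already been initialized with `rpc.init_rpc`; the model and optimizer here are illustrative, not taken from the docs:
```python
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch.distributed.optim import DistributedOptimizer

# Assumes rpc.init_rpc(...) has already been called for this worker.
model = torch.nn.Linear(4, 1)
param_rrefs = [rpc.RRef(p) for p in model.parameters()]
opt = DistributedOptimizer(torch.optim.SGD, param_rrefs, lr=0.1)

with dist_autograd.context() as context_id:
    loss = model(torch.randn(2, 4)).sum()
    dist_autograd.backward(context_id, [loss])  # context_id is passed explicitly
    opt.step(context_id)                        # and is required by the optimizer as well
```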
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34657
Differential Revision: D20427667
Pulled By: rohan-varma
fbshipit-source-id: 5f8a3539ccf648a78e9e9a0dfdfe389c678b1606
Summary:
Now that lists are no longer specialized, we can register only one operator for list ops that are generic to their element type.
This PR reorgs lists into three sets of ops:
- CREATE_GENERIC_LIST_OPS
- CREATE_SPECIALIZED_LIST_OPS
- CREATE_COMPARATOR_LIST_OPS_SPECIALIZED (we didn't bind certain specialized ops to Tensor)
This is important to land quickly because mobile is finalizing its bytecode soon, after which we could not remove these ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34520
Reviewed By: iseeyuan
Differential Revision: D20429775
Pulled By: eellison
fbshipit-source-id: ae6519f9b0f731eaa2bf4ac20736317d0a66b8a0
Summary:
**Summary**
This commit adds `tools/clang_format_new.py`, which downloads a platform-appropriate
clang-format binary to a `.gitignored` location, verifies the binary by comparing its
SHA1 hash to a reference hash (also included in this commit), and runs it on all files
matching a specific regex in a list of whitelisted subdirectories of pytorch.
This script will eventually replace `tools/clang_format.py`.
**Testing**
Ran the script.
*No Args*
```
pytorch > ./tools/clang_format.py
Downloading clang-format to /Users/<user>/Desktop/pytorch/.clang-format-bin
0% |################################################################| 100%
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
> echo $?
0
> git status
<bunch of files>
```
`--diff` *mode*
```
> ./tools/clang_format.py --diff
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
Some files are not formatted correctly
> echo $?
1
<format files using the script>
> ./tools/clang_format.py --diff
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
All files are formatted correctly
> echo $?
0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34566
Differential Revision: D20431290
Pulled By: SplitInfinity
fbshipit-source-id: 3966f769cfb923e58ead9376d85e97127415bdc6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33927
Test Plan:
test will be added in later PRs
Imported from OSS
Differential Revision: D20354879
fbshipit-source-id: 03976f4b86c46dbdc4e45764a1e72f1a3855a404
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34652
Split from D20006007 because it needs to be synced to open source and also for easy testing & landing.
Test Plan:
```
buck test caffe2/caffe2/fb/tvm:test_tvm_transform
```
CI
Reviewed By: yinghai
Differential Revision: D20414037
fbshipit-source-id: 6e17dd9f8cffe87bc59c6e3cc6fd1f8d8def926b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34635
For a custom op, it was removed in the EliminateDeadCode IR optimization step, causing wrong training results.
EliminateDeadCode decided to remove it because it has no output (so no output is used), has no side effect, and has no untracked mutation. That last assumption is not true: a custom op can have untracked mutation.
The if statement here only allows aten and prim operators to have untracked mutation; that restriction should be removed.
ghstack-source-id: 100001319
Test Plan:
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_jit
buck build mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_jit \
&& buck-out/gen/caffe2/torch/fb/distributed/pytorch/tests/test_jit\#binary.par -r test_use_dense_adagrad_step
```
Reviewed By: wanchaol
Differential Revision: D7440221
fbshipit-source-id: e424417ab397d90075884c7050c59dfc5c84cf77
Summary:
Changelog:
- The MAGMA implementation of LU factorization for batches of small singular square matrices had a bug that resulted in NaN values in the result. This has been fixed in MAGMA 2.5.2. This PR removes the existing patch that was a temporary workaround for this bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34357
Test Plan: - Existing tests for det and lu should pass
Differential Revision: D20422879
Pulled By: seemethere
fbshipit-source-id: 8dd7a30b5c31fc5b844e0a11965efd46067e936a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34626
We need to check has_storage() before looking at it in
cloneSparseTensors(), to avoid gratuitously throwing.
Ideally, we'd add a test for this (I wrote one up but had to disable it),
but it won't work until the JIT Pickler supports sparse tensors.
ghstack-source-id: 100018077
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcAgent/...
Differential Revision: D20399971
fbshipit-source-id: 5debfa8140eb1f949d37336330223962cc320abc
Summary:
This PR enables bfloat16 type for
- Embedding, Index, Sigmoid Ops used in [DLRM](https://github.com/facebookresearch/dlrm)
- Miscellaneous ops like comparison ops, arange op used in unit tests
- Rename types list with the pattern `*_with_bfloat16` in `test_torch.py` to avoid confusion
iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34630
Differential Revision: D20405093
Pulled By: ezyang
fbshipit-source-id: aa9538acf81b3a5a9a46ce5014529707fdf25687
Summary:
Now that lists are no longer specialized, we can register only one operator for list ops that are generic to their element type.
This PR reorgs lists into three sets of ops:
- CREATE_GENERIC_LIST_OPS
- CREATE_SPECIALIZED_LIST_OPS
- CREATE_COMPARATOR_LIST_OPS_SPECIALIZED (we didn't bind certain specialized ops to Tensor)
This is important to land quickly because mobile is finalizing its bytecode soon, after which we could not remove these ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34520
Differential Revision: D20368543
Pulled By: eellison
fbshipit-source-id: ad0c6d70d2a6be6ff0e948d6786052167fc43e27
Summary:
This is a redo of https://github.com/pytorch/pytorch/pull/33791, which was reverted because it introduced a flaky test. The test was flaky and only flaky on Python3.5 because of dict order randomization.
I've fixed the issue with tests clobbering each other in b539fec, and in e0d7402 removed the override tests for `torch.nn.functional.tanh` and `torch.nn.functional.sigmoid`, which are deprecated and shouldn't be overridable. I also verified that no more test clobbering is happening.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34240
Differential Revision: D20252442
Pulled By: cpuhrsch
fbshipit-source-id: 069568e342a41c90e1dc76cbf85ba4aed47f24be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31893
In order to resolve the issue summarized in https://github.com/pytorch/pytorch/issues/31325.
The overall solution is to proactively send out delete fork messages from user nodes, before user nodes detect RRef leaks.
As the first step, we want to have a weak ref tracker to track all user rrefs.
ghstack-source-id: 100023142
Test Plan:
V22 is the version that makes the user wait on the delete UserRRef message.
# Unit tests
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_nested_rref_stress --stress-runs 100
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_nested_rref_stress
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_rref_forward_chain
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_non_garbage_collected_user_rref_due_to_local_circular_dependency
```
Reviewed By: mrshenli
Differential Revision: D19292254
fbshipit-source-id: 92c3e8d0b00f183c5e22f163bdca482cc25a1ce9
Summary:
This PR is BC-breaking in the following way:
- The deprecated `torch::nn::BatchNorm` is removed in favor of `torch::nn::BatchNorm{1,2,3}d`
- The deprecated `torch::nn::FeatureDropout` is removed in favor of `torch::nn::Dropout{2,3}d`
- The deprecated `torch::nn::modules_ordered_dict` is removed. User should do `Sequential sequential({{"m1", MyModule(1)}, {"m2", MyModule(2)}})` instead.
- The deprecated `torch::nn::init::Nonlinearity` is removed, in favor of the following enums:
- `torch::kLinear`
- `torch::kConv1D`
- `torch::kConv2D`
- `torch::kConv3D`
- `torch::kConvTranspose1D`
- `torch::kConvTranspose2D`
- `torch::kConvTranspose3D`
- `torch::kSigmoid`
- `torch::kTanh`
- `torch::kReLU`
- `torch::kLeakyReLU`
- The deprecated `torch::nn::init::FanMode` is removed, in favor of the following enums:
- `torch::kFanIn`
- `torch::kFanOut`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34508
Differential Revision: D20351601
Pulled By: yf225
fbshipit-source-id: cca0cd112f29a31bb023e348ca8f82780e42bea3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34267
Adds quantized ELU.
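A hedged sketch of exercising quantized ELU from Python; whether this exact functional entry point (`torch.nn.quantized.functional.elu`) is the one introduced by this PR is an assumption, not taken from the diff:
```python
import torch
import torch.nn.quantized.functional as qF

x = torch.randn(4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
qy = qF.elu(qx, scale=0.1, zero_point=0)  # assumed exposure of the new quantized op
print(qy.dequantize())
```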
Test Plan:
```
python test/test_quantized.py TestQuantizedOps.test_qelu
```
still need to benchmark, saving that for after the review comments
Imported from OSS
Differential Revision: D20370953
fbshipit-source-id: fe941bf966f72dd9eee2c4b2ef45fe7afb50c866
Summary:
`torch.nn.functional.interpolate` was written as a builtin op when we scripted the standard library, because it has four possible overloads. As a result, whenever we make a change to `interpolate`, we need to make changes in two places, and it also makes it impossible to optimize the interpolate op. The builtin is tech debt.
I talked with ailzhang, and the symbolic script changes are good to remove (i guess that makes a third place we needed to re-implement interpolate).
I'm trying to get rid of unnecessary builtin operators because we're standardizing mobile bytecode soon, so we should try to get this landed as soon as possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34514
Differential Revision: D20391089
Pulled By: eellison
fbshipit-source-id: abc84cdecfac67332bcba6b308fca4db44303121
Summary:
Make sure that there cannot be more than one instance of either `torch::autograd::Engine` or `torch::autograd::python::PythonEngine`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34567
Test Plan: CI
Differential Revision: D20390622
Pulled By: malfet
fbshipit-source-id: c90595032afc88f552dee52901361b58b282dc1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34588
I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema. Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.
Reland of https://github.com/pytorch/pytorch/pull/34160
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20387079
Pulled By: ezyang
fbshipit-source-id: d189f7a6ad8cd186b88b6fbfa3f189994eea14e8
Summary:
TensorIterator already checks for partial overlap, so there is no trivial UB, but TensorIterator allows full overlap, and it is not a bad idea to skip the memcpy in that case.
fixes: https://github.com/pytorch/pytorch/issues/34525
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34548
Differential Revision: D20371643
Pulled By: ngimel
fbshipit-source-id: ff9e2e872537010afe040204e008b2499af963ad
Summary:
This PR updates C++ API torch::nn layer docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34522
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20380832
Pulled By: yf225
fbshipit-source-id: ee99a838ec05c6ce2a23aa97555707e507d09958
Summary:
**Summary**
This commit modifies the JIT implementation of `Tensor.tolist` so that it
can be called on GPU-resident Tensors as well. If the Tensor is not on the
CPU when the operator is invoked, it is copied to the CPU before doing any
of the rest of the work to convert it into a list.
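A minimal sketch (assumed usage, not taken from the PR's tests) of calling `tolist()` on a GPU tensor inside TorchScript; the `annotate` call tells the compiler the element type and dimensionality of the resulting list:
```python
from typing import List
import torch

@torch.jit.script
def to_list(x: torch.Tensor) -> List[int]:
    return torch.jit.annotate(List[int], x.tolist())

if torch.cuda.is_available():
    print(to_list(torch.arange(4, device="cuda")))  # copied to CPU internally
```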
**Testing**
This commit adds GPU versions of some of the existing CPU tests for this
feature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34554
Differential Revision: D20392604
Pulled By: SplitInfinity
fbshipit-source-id: 69c17b98d866428c19d683588046169538aaf1e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34598
as above
Test Plan:
test.txt
```
what time is it now
could you set a reminder at 7 am
waht is the weather today
```
example json
```
{
"model": {
"category": "CNN",
"description": "Assistant Mobile Inference",
"files": {
"model": {
"filename": "model.pt1",
"location": "//everstore/GICWmAB2Znbi_mAAAB0P51IPW8UrbllgAAAP/model.pt1",
"md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
},
"data": {
"filename": "input.txt",
"location": "/home/pengxia/test/input.txt",
"md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
}
},
"format": "pytorch",
"framework": "pytorch",
"kind": "deployment",
"name": "Assistant Mobile Inference"
},
"tests": [
{
"command": "{program} --model {files.model} --input_dims \"1\" --input_type NLUType --warmup {warmup} --iter 5 --input_file {files.data} --report_pep true",
"identifier": "{ID}",
"metric": "delay",
"iter": 15,
"warmup": 2,
"log_output": true
}
]
}
```
iter = 5 (--iter 5) * 3 (3 lines in test.txt) = 15
arbabu123 I will provide a wrapper to compute the iter in the future.
run following command
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/assistant_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices SM-G960U-8.0.0-26
```
results
https://our.intern.facebook.com/intern/aibench/details/275259559594003
**Note: this is compatible with the existing examples.**
Reviewed By: kimishpatel, ljk53
Differential Revision: D20389285
fbshipit-source-id: 80165ef394439a307ac7986cf540a80fdf3d85d6
Summary:
If SELECTED_OP_LIST is specified as a relative path in command line, CMake build will fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33942
Differential Revision: D20392797
Pulled By: ljk53
fbshipit-source-id: dffeebc48050970e286cf263bdde8b26d8fe4bce
Summary:
When a system has ROCm dev tools installed, `scripts/build_mobile.sh` tries to use them.
This PR stops looking up the unused ROCm libraries when building libtorch mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34478
Differential Revision: D20388147
Pulled By: ljk53
fbshipit-source-id: b512c38fa2d3cda9ac20fe47bcd67ad87c848857
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34150
In the distributed setting we commonly have tests in which there are errors where one process
exits but the others do not (since they are, for example, waiting for work from
the process that exited). Currently, when this situation happens we do not
handle this well, and wait for process 0 to timeout. This results in wasted
time waiting for test errors and a less helpful "Process 0 timed out..." error
message when the error was actually something else.
This diff fixes the issue by checking for exited subprocesses and terminating
the test when we see a subprocess that has exited uncleanly. We still enforce
timeouts and return when all processes have exited cleanly in the happy path.
ghstack-source-id: 99921462
Test Plan:
All distributed tests + tested by writing tests that should trigger
the unclean subprocess detection, and verified that we exit quickly instead of
waiting for the entire timeout.
Differential Revision: D20231032
fbshipit-source-id: 3e0d4a20925b7d1098ec4c40ffcc66845425dd62
Summary:
This PR implements the following linear algebra algorithms for low-rank matrices:
- [x] Approximate `A` as `Q Q^H A` - using Algorithm 4.4 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
+ exposed as `torch.lowrank.get_approximate_basis(A, q, niter=2, M=None) -> Q`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] SVD - using Algorithm 5.1 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
+ uses `torch.lowrank.get_approximate_basis`
+ exposed as `torch.svd_lowrank(A, q=6, niter=2, M=None) -> (U, S, V)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] PCA - using `torch.svd_lowrank`
+ uses `torch.svd_lowrank`
+ exposed as `torch.pca_lowrank(A, center=True, q=None, niter=2) -> (U, S, V)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices, uses non-centered sparse matrix algorithm
+ [x] documentation
- [x] generalized eigenvalue solver using the original LOBPCG algorithm [Knyazev, 2001](https://epubs.siam.org/doi/abs/10.1137/S1064827500366124)
+ exposed as `torch.lobpcg(A, B=None, k=1, method="basic", ...)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] generalized eigenvalue solver using robust LOBPCG with orthogonal basis selection [Stathopoulos, 2002](https://epubs.siam.org/doi/10.1137/S1064827500370883)
+ exposed as `torch.lobpcg(A, B=None, k=1, method="ortho", ...)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] generalized eigenvalue solver using the robust and efficient LOBPCG Algorithm 8 from [Duersch et al, 2018](https://epubs.siam.org/doi/abs/10.1137/17M1129830) that switches to orthogonal basis selection automatically
+ the "ortho" method improves iterations so rapidly that in the current test cases it does not make sense to use the basic iterations at all. If users will have matrices for which basic iterations could improve convergence then the `tracker` argument allows breaking the iteration process at user choice so that the user can switch to the orthogonal basis selection if needed. In conclusion, there is no need to implement Algorithm 8 at this point.
- [x] benchmarks
+ [x] `torch.svd` vs `torch.svd_lowrank`, see notebook [Low-rank SVD](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/Low-rank%20SVD.ipynb). In conclusion, the low-rank SVD is going to be useful only for large sparse matrices where the full-rank SVD will fail due to memory limitations.
+ [x] `torch.lobpcg` vs `scipy.sparse.linalg.lobpcg`, see notebook [LOBPCG - pytorch vs scipy](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/LOBPCG%20-%20pytorch%20vs%20scipy.ipynb). In conclusion, both implementations give the same results (up to numerical errors from different methods); the scipy lobpcg implementation is generally faster.
+ [x] On very small tolerance cases, `torch.lobpcg` is more robust than `scipy.sparse.linalg.lobpcg` (see `test_lobpcg_scipy` results)
Resolves https://github.com/pytorch/pytorch/issues/8049.
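A brief usage sketch based on the signatures listed above; the shapes, `q`, and `k` values are illustrative only:
```python
import torch

A = torch.randn(1000, 40) @ torch.randn(40, 300)   # an (approximately) low-rank matrix

U, S, V = torch.svd_lowrank(A, q=6, niter=2)       # approximate rank-6 SVD
A_approx = U @ torch.diag(S) @ V.t()

U2, S2, V2 = torch.pca_lowrank(A, q=6, center=True)

M = A.t() @ A                                      # symmetric positive semi-definite
vals, vecs = torch.lobpcg(M, k=3, method="ortho")  # a few extreme eigenpairs
```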
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29488
Differential Revision: D20193196
Pulled By: vincentqb
fbshipit-source-id: 78a4879912424595e6ea95a95e483a37487a907e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34160
I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema. Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20282846
Pulled By: ezyang
fbshipit-source-id: ba7bca6e8adc3365789639b88e54c4e881b1692e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33838
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20227875
Pulled By: ezyang
fbshipit-source-id: 319855b1f0fa436f9ed5256d2106b07f20e6b833
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34556
According to
https://github.com/pytorch/pytorch/pull/34012#discussion_r388581548,
this `at::globalContext().setQEngine(at::QEngine::QNNPACK);` call isn't
really necessary for mobile.
In Context.cpp it selects the last available QEngine if the engine isn't
set explicitly. For the OSS mobile prebuild it should only include the QNNPACK
engine, so the default behavior should already be the desired behavior.
It makes a difference only when USE_FBGEMM is set - but it should be off
for both OSS mobile build and internal mobile build.
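A small illustration (not from the PR) of the engine-selection behavior described above; the printed values depend on how the binary was built:
```python
import torch

print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm'] or ['none', 'qnnpack']
print(torch.backends.quantized.engine)             # defaults to the last available engine
if 'qnnpack' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'qnnpack'    # explicit selection still works
```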
Test Plan: Imported from OSS
Differential Revision: D20374522
Pulled By: ljk53
fbshipit-source-id: d4e437a03c6d4f939edccb5c84f02609633a0698
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34559
We check the use_count for indices and values when we avoid a clone
for sparse tensors. The sparse tensor grad itself might have a higher refcount
due to DDP hooks/dist autograd structures holding refs, but the indices and
values inside the sparse tensor should always have a refcount of 1.
ghstack-source-id: 99900534
Test Plan: waitforbuildbot
Differential Revision: D20375239
fbshipit-source-id: 6a654549d13071ab3451cef94259caf7627b575c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34505
A thread could hold GIL when calling PythonRpcHandler::getInstance(),
meanwhile another thread could have been doing static data
initialization by calling `new PythonRpcHandler()`, inside of which GIL is
also required. Static data initialization is thread-safe, so the thread
holding the GIL will wait for the other thread to finish static data
initializing before going forward. Because the initialization can't
proceed without GIL, there is a deadlock. We ask the calling thread to
release GIL to avoid this situation.
ghstack-source-id: 99893858
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_spawn -- 'test_backward_simple_script_call \(test_dist_autograd_spawn\.DistAutogradTestWithSpawn\)' --stress-runs 100
```
Differential Revision: D7490489
fbshipit-source-id: 76f63cc7bedf088d3dbff288f53aa0bd33749255
Summary:
Stacked PRs
* #33474 - [jit] Remove list specializations from pickler
* **#33255 - [jit] Add type tags to lists/dicts in pickle**
This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255
Pulled By: driazati
Differential Revision: D20346780
fbshipit-source-id: c8534954ef4adb2e3c880401acbee30cd284f3db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34560
These jobs don't have next phase so we don't really need commit the
docker images.
Should also fix issue #34557.
Test Plan: Imported from OSS
Differential Revision: D20375308
Pulled By: ljk53
fbshipit-source-id: 328cb428fcfb0fbb79b2a233b5f52607158c983c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34376
Vectorized implementation of qmul. qmul is now ~16x faster on my development machine. This implementation works for qint8, quint8 and qint32. Also added some commonly used operations, such as the multiply operator and the requantize operation, to the qint vector classes for future use.
```
#!/usr/bin/env python
import time
import torch
import torch.nn as nn
torch.set_num_threads(1)
# print(torch.__config__.parallel_info())
A = torch.rand(1, 54, 54, 256)
B = torch.rand(1, 54, 54, 256)
scale = .05
zero_point = 50
for dtype in [torch.quint8, torch.qint8]:
    qA = torch.quantize_per_tensor(A, scale=scale, zero_point=zero_point,
                                   dtype=dtype)
    qB = torch.quantize_per_tensor(B, scale=scale, zero_point=zero_point,
                                   dtype=dtype)
    NITER = 1000
    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.mul(qA, qB, scale=scale, zero_point=zero_point)
    time_per_iter = (time.time() - s) / NITER
    print('dtype: {} time per iter ms: {:.3f}'.format(dtype, time_per_iter * 1000))
```
### Before
dtype: torch.quint8 time per iter ms: 6.714
dtype: torch.qint8 time per iter ms: 6.780
### After
dtype: torch.quint8 time per iter ms: 0.431
dtype: torch.qint8 time per iter ms: 0.417
### Test
Modified qmul tests to include qint8 and qint32 data types.
python test/test_quantized.py TestQuantizedOps.test_qmul_relu_same_qparams
python test/test_quantized.py TestQuantizedOps.test_qmul_relu_different_qparams
python test/test_quantized.py TestQuantizedOps.test_qmul_broadcast
ghstack-source-id: 99862681
Differential Revision: D20308515
fbshipit-source-id: 4fa65b2ba433cfd59260fc183a70f53a6fcc36b4
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.
```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN ] JitTest.ADFormulas
[ OK ] JitTest.ADFormulas (82 ms)
[ RUN ] JitTest.Attributes
[ OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN ] JitTest.LiteInterpreterPrim
[ OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN ] JitTest.LiteInterpreterLoadOrigJit
[ OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)
[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[ PASSED ] 75 tests.
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34334
Differential Revision: D20296437
Pulled By: SplitInfinity
fbshipit-source-id: df4e7b0881ae4913424e5a409bfa171a61c3e568
Summary:
Attempt to build pytorch with ASAN on system with gcc-8 fails due to the mismatch system compilation flags.
Address the issue by using original compiler to build `torch._C` extension
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34549
Test Plan: Run `.jenkins/pytorch/build-asan.sh` on FC-30
Differential Revision: D20373781
Pulled By: malfet
fbshipit-source-id: 041c8d25f96b4436385a5e0eb6fc46e9b5fdf3f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26125
We already had some optimized implementations using AVX2 to improve the quantized kernel performance. In this diff, we want to enable runtime dispatch.
Test Plan:
Sandcastle build and test
Also test with a python binary calling into vectorized op.
torch.__config__.show()
PyTorch built with:
- GCC 4.2
- clang 8.0.20181009
- Intel(R) Math Kernel Library Version 2017.0.3 Product Build 20170413 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.18.1 (Git Hash N/A)
- OpenMP 1
- **CPU capability usage: AVX2**
- Build settings:
Reviewed By: jamesr66a
Differential Revision: D17337251
fbshipit-source-id: 8e22d10011a12a4eaf54cea3485353eb1811d828
Summary:
**This PR is BC-breaking in the following way:**
In RMSpropOptions:
1. learning_rate is renamed to lr.
**Test plan before 1.5 release:**
Test that in 1.5 we can load a C++ RMSprop optimizer that was serialized in 1.4, and their states are the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33450
Differential Revision: D20366623
Pulled By: anjali411
fbshipit-source-id: 83250be9b583a766927e0e22a4de8b0765379451
Summary: I'm using this code in an internal Android build, and std::to_string doesn't work in our internal Android builds yet.
Test Plan: Internal build.
Reviewed By: ljk53
Differential Revision: D20234221
fbshipit-source-id: 8fd61235bf9b487e07a1459c452830e732c7afb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33427
This PR is an attempt to avoid clone for sparse tensors similar to how
we avoid clone for dense tensors currently.
As per my understanding, even if the 'indices' and 'values' of a sparse tensor
are non-contiguous, operations like 'add' are still supported. As a result,
the major change in this PR is to create a shallow copy instead of clone()
for sparse tensors.
ghstack-source-id: 99838375
Test Plan: waitforbuildbot
Differential Revision: D19926698
fbshipit-source-id: b5a3f36c2aa273e17f8b7a9f09c1ea00e7478109
Summary:
We updated the default jobs to run in a different PR but neglected to
update this script as well.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34498
Differential Revision: D20368420
Pulled By: seemethere
fbshipit-source-id: 240171b18f397095e3a8d57de3a29d1d2e891d85
Summary:
In DataParallel, replica parameters are not leaves (because they are computed via broadcast from master parameters), and should be treated as such. Fixes https://github.com/pytorch/pytorch/issues/33552
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33907
Differential Revision: D20150199
Pulled By: ngimel
fbshipit-source-id: 5965d4115b6b3a8433063126ff6269567872fbeb
Summary:
The include list seems to be copied from somewhere else, and some totally unrelated files are included.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34528
Differential Revision: D20358622
Pulled By: ngimel
fbshipit-source-id: d8a6260f5f77b0eabdbd68e3728873efd632d9bc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31342
Test Plan: unit test
Differential Revision: D19131704
fbshipit-source-id: 4e91d5933635ee2c7c301caf89a5a7009c5cb7c8
Summary:
Tries to fix https://github.com/pytorch/pytorch/issues/33562 by raising `std::runtime_error` instead of `std::domain_error`.
* The Python tests already expect `RuntimeError` so this shouldn't affect Python users of PyTorch.
* If someone out there is using C10 or ATen from C++ and tries to catch `std::domain_error` specifically, this fix would break their code. Hopefully that's not the case.
An alternative to this PR is for someone to really get to the bottom of why `std::domain_error` isn't being caught.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34301
Differential Revision: D20344579
Pulled By: ezyang
fbshipit-source-id: d5f3045085a2f75b71b864335ebf44991d0cad80
Summary:
cuDNN needs it, MIOpen doesn't. However, since it seems to be the PyTorch preference to not introduce ROCm-specific logic in the python layer, we need to add a C++ function to detect if rnn weight flattening is needed.
This PR will be needed to fix the rnn unit test errors arising for PR https://github.com/pytorch/pytorch/issues/33837.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34265
Differential Revision: D20345105
Pulled By: ezyang
fbshipit-source-id: a2588a6e2ac6f7d1edf2b7872bc6a879a7df96ec
Summary:
This PR enables bfloat16 type for loss criterion ops(and the ops they depend on) and few miscellaneous ops required to train resnet50.
iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34469
Differential Revision: D20348856
Pulled By: ezyang
fbshipit-source-id: 0a8f06c2169cfa3c9cf319120e27150170095f6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33896
Fixes #32625. Previously, we'd receive an error message if we have a
custom function return a view of an input in a no_grad block:
```
class Alias(Function):
    @staticmethod
    def forward(ctx, x):
        return x[:]

    @staticmethod
    def backward(ctx, gx):
        return gx

inp = torch.rand(2, requires_grad=True)

with torch.no_grad():
    # Used to error out
    output = Alias.apply(inp)
```
After this change, the error no longer happens. The behavior changes to
become consistent to if we had implemented an operator that does the
same thing as the custom function:
- the output requires_grad
- we are able to detect (and error out) if the user tries to modify the
output in-place outside of the no_grad block.
Test Plan: - new test
Differential Revision: D20345601
Pulled By: zou3519
fbshipit-source-id: 7f95b4254f52ddbf989d26f449660403bcde1c78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33875
Fixes #33675.
I added a `current_node_name` argument to AnomalyMetadata::print_stack.
This is a mandatory arg because I found only one callsite and making it
a default arg on a virtual function can be confusing.
Test Plan:
- Tested locally:
https://gist.github.com/zou3519/09937387c83efc76e1700374d5c9c9d9
- I don't know how to add a test for this: the message is printed to
stderr but it isn't an exception nor a warning. I considered capturing
the stderr of a subprocess but that seems like asking for flakiness.
Differential Revision: D20349399
Pulled By: zou3519
fbshipit-source-id: 7585ddffe2bf9e1081f4028a9c44de783978a052
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33214
Distributed autograd had some custom logic in terms of how we
accumulated gradients. This was mostly done early on to enable basic
functionality. Although, in the long term we should merge this logic with what
we have in the local autograd engine. A lot of work has gone into ensuring we
accumulate grads correctly and efficiently and we should reuse that as a
starting point.
We can investigate if we need further custom logic for distributed autograd
later on if we need additional optimizations.
In this PR I've merged the gradient accumulation logic and also the gradient
hooks. As a result, now gradient hooks are called in distributed autograd as
well.
ghstack-source-id: 99838019
Test Plan: waitforbuildbot
Differential Revision: D19843284
fbshipit-source-id: 7923d7e871fb6afd3e98dba7de96606264dcb5f3
Summary:
This PR resolves https://github.com/pytorch/pytorch/issues/22534 by adding a converter for the `torch.nn.functional.one_hot` function, and covering it with a test.
Are there other places this should be tested?
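A hedged sketch of exporting a model that uses `one_hot`; the model definition and the opset version shown (ONNX's OneHot op) are assumptions for illustration, not taken from the PR:
```python
import torch
import torch.nn.functional as F

class OneHotModel(torch.nn.Module):
    def forward(self, x):
        return F.one_hot(x, num_classes=10)

# Export via tracing; int64 input as required by one_hot.
torch.onnx.export(OneHotModel(), torch.arange(4), "one_hot.onnx", opset_version=9)
```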
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34454
Reviewed By: hl475
Differential Revision: D20354255
Pulled By: houseroad
fbshipit-source-id: 84224c1610b2cc7986c91441c65647ddc090750d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33807
afaik this is unused, so removing it from the source tree. RIP :(
Test Plan: Imported from OSS
Differential Revision: D20122118
Pulled By: suo
fbshipit-source-id: cb45943f5b9f969482301a2f9fe540326dbc78f2
Summary:
See NumPy's division documentation here: https://numpy.org/doc/1.18/reference/generated/numpy.divide.html#numpy.divide.
True division is the same as PyTorch's default division except when both inputs are integer or bool tensors. In the latter case the inputs are (conceptually) cast to the default floating type before the division is performed.
The function is implemented for dense and sparse tensors and supports exporting to ONNX from PyTorch's eager mode or JIT traces. The function is inherently incompatible with exporting to ONNX via JIT script, and is another datapoint suggesting we should deprecate exporting scripted graphs to ONNX.
Tests are added for the type promotion, named tensor, and ONNX export behavior.
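A short illustration of the difference described above; the values are arbitrary:
```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([2, 2, 2])

print(torch.true_divide(a, b))   # tensor([0.5000, 1.0000, 1.5000]) - promoted to float
print(torch.floor_divide(a, b))  # tensor([0, 1, 1]) - stays integral
```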
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34236
Reviewed By: houseroad
Differential Revision: D20334087
Pulled By: mruberry
fbshipit-source-id: 83d00d886f46f713215d7d9e02ffd043164c57f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34321
Mostly cosmetic as we can infer the shape anyway. It can remove a lot of the noise in the log though.
Note that weight sharing doesn't work yet. I'll add another diff to address this.
Reviewed By: houseroad
Differential Revision: D20290841
fbshipit-source-id: fe6f9b60d05dbe150af15b5d9d7a69fd902e12cc
Summary:
This allows us to enable some double-based pdist tests that previously ran into accrued error from casting down to float.
Addresses https://github.com/pytorch/pytorch/issues/33128
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34103
Differential Revision: D20343279
Pulled By: ezyang
fbshipit-source-id: a2da768259fab34ef326976283b7a15bebbbb979
Summary:
I think this was added when we couldn't compile the function itself. Now we can.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34171
Differential Revision: D20269960
Pulled By: eellison
fbshipit-source-id: 0a60458d639995d9448789c249d405343881b304
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33853
Quant fusion relies on inlining, but inlining will break the CallFunction("linear", ...) into an if block.
It would be hard to recognize this block and swap it with quantized::linear, so in order to
preserve the op, we swap all quantized functional linear calls into aten::linear.
They might produce different backward graphs, but this is called in the step before we get the quantized
model, so it shouldn't affect anything.
We'll integrate this with convert_script later in the new "finalize_quant" API
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20343873
fbshipit-source-id: 423e03bf893b79267d2dc97bc997ee1bfe54ec0f
Summary:
Custom classes via torchbind require runtime type information.
We are trying to enable custom-class-based graph rewrite for XNNPACK in
this stack of PRs: https://github.com/pytorch/pytorch/pull/34047.
They require RTTI to be enabled for mobile. Mobile builds are currently
failing without it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34368
Differential Revision: D20306155
Pulled By: kimishpatel
fbshipit-source-id: 52c61ff5467a619e8f51708a05258eee35dd0a56
Summary:
Previously when emitting subscripts we only emitted actual values, but
now they may sometimes emit a `ModuleValue`, so it should stay as a
`SugaredValue`. This allows for the result of the subscript to be
treated as a real module (i.e. you can just do `self.modlist[1](inputs)`
instead of `self.modlist[1].forward(inputs)`)
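A minimal sketch (assumed, not taken from the PR's tests) of the pattern this enables:
```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.modlist = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 4)])

    def forward(self, x):
        # Previously this had to be written as self.modlist[1].forward(x).
        return self.modlist[1](x)

scripted = torch.jit.script(M())
print(scripted(torch.randn(2, 4)).shape)
```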
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34320
Pulled By: driazati
Differential Revision: D20345642
fbshipit-source-id: 2bedf9a454af747b704422f6bbb8370cbdf4bf61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34398
As part of PR 34109, it was suggested that we track the number of outstanding
async calls for RPC DebugInfo, particularly if we move towards using
at::launch() threads on occasion for continuations.
This particular aspect of the change was distinct from the main purpose of the
diff, and started getting bigger, so split this functionality out as a separate diff.
For completeness, we track client_active_calls, server_active_calls,
server_active_async_calls, and write some very basic unittest coverage.
ghstack-source-id: 99708836
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/...
Differential Revision: D20314994
fbshipit-source-id: 2f7c75d5c511b27ed0c09c7b8a67b6fb49df31a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34410
### Summary
Currently, the iOS jobs are not being run on PRs anymore. This is because all iOS jobs have specified `org-member` as a context, which used to include all pytorch members. But it seems this rule has changed recently. It turns out that only users from the admin group or builder group have access rights to the context values. https://circleci.com/gh/organizations/pytorch/settings#contexts/2b885fc9-ef3a-4b86-8f5a-2e6e22bd0cfe
This PR will remove `org-member` from the iOS simulator build which doesn't require code signing. For the arm64 builds, they'll only be run on master, not on PRs anymore.
### Test plan
- The iOS simulator job should be able to appear in the PR workflow
Test Plan: Imported from OSS
Differential Revision: D20347270
Pulled By: xta0
fbshipit-source-id: 23f37d40160c237dc280e0e82f879c1d601f72ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33481
We have to propagate the observed property of values through ops like max_pool2d and flatten, and
avoid inserting duplicated observers.
For example:
```
x1 = self.conv(x)
x2 = maxpool(x1)
x3 = self.conv(x2)
```
If x1 is observed, we should propagate this information through maxpool and
we should consider x2 as observed as well.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20261897
fbshipit-source-id: 7de354a3ccb2b6e1708f5c743d4d9f7272691a93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34354
The condition `NOT INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE` was
added in #27086, but it seems it's always false on current master:
BUILD_CAFFE2_MOBILE is ON by default - the name is a little bit misleading -
it is ON even when it's building non-mobile PyTorch/Caffe2. It is OFF only
when it's building PyTorch mobile, where INTERN_BUILD_MOBILE is ON.
And when it's building PyTorch mobile, it won't build caffe2/operators
at all (by setting BUILD_CAFFE2_OPS OFF: https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L345)
So I imagine the real intention is to skip when it's building Caffe2 mobile.
We can simply remove the deprecated BUILD_CAFFE2_MOBILE condition.
Test Plan: Imported from OSS
Differential Revision: D20345298
Pulled By: ljk53
fbshipit-source-id: d2cb4e2248fc209d63b2843e0f12e577e323def4
Summary:
`ConcreteModuleTypeBuilder` used to keep parameters together with all other attributes in an `unordered_map`, often leading to reordering them while building up the type. Parameter order is semantically meaningful, so we need to preserve it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34131
Differential Revision: D20331542
Pulled By: suo
fbshipit-source-id: 5b860025f7902654d6099751d3fb14b12f6f5a67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34382
The previous implementation was handling both newWithStorage and newWithSize, which doesn't make much sense.
Test Plan: Imported from OSS
Differential Revision: D20311056
Pulled By: gchanan
fbshipit-source-id: 2696a4566e6203c98338c86cbf4c236bd18d7c49
Summary:
One example in the current docs for `torch::nn::ModuleList` doesn't compile, and this PR fixes it.
Fixes https://github.com/pytorch/pytorch/issues/32414.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34463
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20331120
Pulled By: yf225
fbshipit-source-id: 50bb078fe1a900c9114d5434e92dc40ee13b52bf
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25845.
**Test Plan:**
Check `pytorch_cpp_doc_push` CI job, and see if there is `classat_1_1_tensor` generated (similar to `structat_1_1native_1_1_convolution_descriptor`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34467
Differential Revision: D20338190
Pulled By: yf225
fbshipit-source-id: 52dc05af5e0d742e740de5576d0d2b3e17ef28dd
Summary:
Addresses https://github.com/pytorch/pytorch/issues/5442.
Per title (and see issue). A test is added to test_torch.py to verify the behavior.
Update (with new behavior):
NumPy arrays can be non-writeable (read-only). When converting a NumPy array to a Torch tensor the storage is shared, but the tensor is always writable (PyTorch doesn't have a read-only tensor). Thus, when a non-writeable NumPy array is converted to a PyTorch tensor it can be written to.
In the past, PyTorch would silently copy non-writeable NumPy arrays and then convert those copies into tensors. This behavior violates the from_numpy contract, however, which promises that the tensor and the array share memory.
This PR adds a warning message when a non-writeable NumPy array is converted into a Torch tensor. This will not break any networks, but will make end users aware of the behavior. They can work around the warning message by marking their NumPy arrays as writeable.
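A short illustration of the new behavior; the exact warning text is not quoted here:
```python
import numpy as np
import torch

arr = np.ones(3)
arr.flags.writeable = False
t = torch.from_numpy(arr)    # warns: the resulting tensor shares memory with a
                             # read-only array but is itself writable

arr2 = np.ones(3)            # keeping (or marking) the array writeable avoids the warning
t2 = torch.from_numpy(arr2)
```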
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33615
Differential Revision: D20289894
Pulled By: mruberry
fbshipit-source-id: b76df0077399eb91038b12a6bf1917ef38c2cafd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34162
This avoids the "worker{}".format(..) in our unit tests to something
cleaner.
ghstack-source-id: 99713074
Test Plan: waitforbuildbot
Differential Revision: D20233533
fbshipit-source-id: 5cff952ca68af5a6d26dc5cc01463cf7756d83d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33921
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D20153092/)!
Test Plan: Imported from OSS
Differential Revision: D20177227
Pulled By: jamesr66a
fbshipit-source-id: 87f3e484c4f873d60f76f50f6789c1b4a73bdfde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33900
These functions don't require any libtorch-specific functionality, so move them into the header so they're included in the ATen build
Test Plan: Imported from OSS
Differential Revision: D20175874
Pulled By: jamesr66a
fbshipit-source-id: 1efab1b60e196a635e6c6afadb042b63771170f0
Summary:
This commit fixes overlapping keywords in the CPP Docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34142
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20319949
Pulled By: yf225
fbshipit-source-id: e7bb2efdc286c85792c6f18a260c3bba33c54008
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34393
Clean up the list
Test Plan: CI
Reviewed By: hl475
Differential Revision: D20300530
fbshipit-source-id: 50e7da0a9f8295eff33590982f32f84abee96d9c
Summary:
This PR fixes the documentation for `torch.add` with alpha. It also fixes deprecated Python calls to `torch.add` and `torch.addmm` in tests, which may affect performance in *test/test_sparse.py* and *test/test_nn.py*.
cc csarofeen ptrblck
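A small illustration of the documented semantics, out = input + alpha * other; the note on the deprecated positional form is my reading of the old signature, not quoted from the docs:
```python
import torch

x = torch.ones(3)
y = torch.full((3,), 2.0)

print(torch.add(x, y, alpha=10))   # tensor([21., 21., 21.])
# One deprecated positional form looked like torch.add(x, 10, y),
# with the scalar passed between the two tensors.
```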
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33935
Differential Revision: D20313320
Pulled By: ngimel
fbshipit-source-id: fb08413d7e244865952e3fc0e1be7f1794ce4e9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33717
Because of the special treatment of operator names for the lite interpreter, all the operators used in the lite interpreter are still prepended with "_". Add the necessary registrations for the MNIST model. All the ops with autograd capability are included in torch_mobile_train. After rebase, the selective build from D19649074 can be utilized to strip the unused ops.
Note that this diff is for a feasibility test. Training accuracy is not covered in the test.
ghstack-source-id: 97780066
Test Plan:
```
buck run xplat/caffe2/fb/lite_trainer:lite_trainer -c pt.disable_gen_tracing=1 -c pt.static_dispatch=0 -- --model=/path/MnistModel.bc
```
{F227898221}
Reviewed By: dreiss
Differential Revision: D19743201
fbshipit-source-id: cacadd76f3729faa0018d147a69466bbf54312fd
Summary:
Please merge after https://github.com/pytorch/pytorch/pull/33073
With that PR, we now try different algorithms on OOM, so hopefully some algorithm will work at low memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34259
Differential Revision: D20310094
Pulled By: ngimel
fbshipit-source-id: bccd8162bd06a0e54ac6f42a7fd9a5b766f92cd7
Summary:
Improves explanation of non-determinism when running on GPUs. Adds info about `torch.nn.BCELoss` operating non-deterministically on GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33795
Differential Revision: D20284880
Pulled By: ngimel
fbshipit-source-id: d543959636d261a80c234150304344b19a37ba5d
Summary:
We don't release binaries for macOS with CUDA support so we should just
remove it from our regular PR pipeline
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34333
Differential Revision: D20312565
Pulled By: seemethere
fbshipit-source-id: 376228680aa0e814d1b37f1ff63b7d1262515e44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34378
This fixes a strange symbol-mangling mismatch between `DECLARE_DISPATCH(qbatch_norm_fn, qbatch_norm_stub)` and `REGISTER_DISPATCH(qbatch_norm_stub, &q_batch_norm_kernel<false>);` if the code is built on Windows with clang
Test Plan: CI + build PyTorch on Windows using clang
Reviewed By: EscapeZero
Differential Revision: D20309550
fbshipit-source-id: e97c7c3b6fee2e41ea6b2f8167ce197aec404e3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34099
This change effectively applies to IValue's future impl a few fixes
we discovered when using the torch::utils::Future<T> impl.
The parallel impls should probably eventually be merged, but until then:
- Don't hold the lock when invoking the callbacks. This makes
it effectively impossible (deadlocks) to call value() to get
the value from inside the callback.
- We discovered that it was slightly cleaner in practice to
notify condition variables prior to invoking callbacks
(best to unblock paused threads ASAP, before spawning new work).
- Fix some var naming inconsistency.
- Add some caffe2 cpp test coverage.
ghstack-source-id: 99336569
Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- 'JitTest\.IValueFuture'
```
Differential Revision: D20203278
fbshipit-source-id: 6e805ba547899dab9aab458e4b23049db31f930e
Summary:
Currently testing against the older release `1.4.0` with:
```
PYTORCH_S3_FROM=nightly TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 scripts/release/promote/libtorch_to_s3.sh
PYTORCH_S3_FROM=nightly TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 scripts/release/promote/wheel_to_s3.sh
```
These scripts can also be used for `torchvision`, which may improve the release process there as well.
Later on this should be made into a re-usable module that can be downloaded from anywhere and shared among all pytorch repositories.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34274
Test Plan: sandcastle_will_deliver
Differential Revision: D20294419
Pulled By: seemethere
fbshipit-source-id: c8c31b5c42af5096f09275166ac43d45a459d25c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34159
This fixes `comparison of integers of different sign` warnings
Test Plan: CI
Reviewed By: EscapeZero
Differential Revision: D20232085
fbshipit-source-id: 8f325be54395be54c704335cb7edf2ec7ef75e75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34318
Stop checking whether we have AMD GPU devices on the host, because we may construct a net on a machine without a GPU and run it on another machine with one
Reviewed By: ajauhri
Differential Revision: D20269562
fbshipit-source-id: 1f561086cacdcead3ce7c03c2d02c25336c8b11a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34017
Remove warning
```
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(437): warning: statement is unreachable
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(271): warning: variable "transpose_m1" was set but never used
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(271): warning: variable "transpose_m2" was set but never used
```
Test Plan: CI
Reviewed By: ngimel
Differential Revision: D20181179
fbshipit-source-id: 3665912ba55bffbd8b4555f8a6803e57a502c103
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34018
Remove warning
```
caffe2/c10/util/ArrayRef.h(278): warning: attribute does not apply to any entity
```
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20181191
fbshipit-source-id: 58bd168a87a94fec925c7cde8b8d728a4257446c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34183
https://github.com/pytorch/pytorch/pull/33263 enhanced the RRef Python constructor to infer most types, by `jit::tryToInferType(..)`.
But this helper function can't infer the `ScriptModule` type due to `ScriptModule`'s special per-Module type singleton logic, so it's still not possible for a Python-created RRef to know the JIT type of its contained `ScriptModule`.
Instead of inferring the specific type of a Module, which could lead to too many candidate types (due to the possibility of multiple inheritance for Modules), it's more straightforward to set its type as a user-specified `ModuleInterface` type.
We added an optional argument `type_hint` for users to mark what `ModuleInterface` type an `RRef` holds.
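A hedged sketch of the intended usage; `MyModuleInterface` is a made-up interface name, and the exact `type_hint` spelling should be checked against the constructor added in this PR (rpc.init_rpc must already have been called):
```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.interface
class MyModuleInterface(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

# Python-created RRef holding a ScriptModule, annotated with the interface
# type so TorchScript knows what the RRef contains.
scripted = torch.jit.script(torch.nn.Linear(4, 4))
rref = rpc.RRef(scripted, type_hint=MyModuleInterface)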
ghstack-source-id: 99649379
(Note: this ignores all push blocking failures!)
Test Plan:
Aspects that need to be confirmed in the test cases
https://fb.quip.com/aGxRAh2lCg05
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_create_local_script_class_rref
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_create_local_script_module_rref
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_return_local_script_class_rref_in_py_and_use_in_script
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_return_local_script_module_rref_in_py_and_use_in_script
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_torchscript_function_exception
```
Differential Revision: D7065050
fbshipit-source-id: e10210c0996622969e499e4a35b0659b36787c1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34217
LegacyNoScalar variants cause 0-dim tensors to behave like 1-dim tensors.
LegacyAll variants cause 0-dim tensors to behave like 1-dim tensors, and numel == 0 tensors to be treated like 0-dimensional tensors.
Since this was done by codemod, these are often unneeded and often translated incorrectly to ATen.
Test Plan: Imported from OSS
Differential Revision: D20249577
Pulled By: gchanan
fbshipit-source-id: 6f2876d3e479562c9323f3629357a73a47869150
Summary:
The init-list form of `at::indexing::Slice` (i.e. `tensor.index({{1, None, 2}, ...})` instead of `tensor.index({Slice(1, None, 2), ...})`) in C++ API can be easily confused with the list-form indexing in Python API (e.g. `tensor[[1, 3, 2], ...]`), which is not good from readability perspective. This PR removes the init-list form of `at::indexing::Slice` to make the API less confusing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34255
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20290166
Pulled By: yf225
fbshipit-source-id: abbcbeca0b179219e5e1f196a33ef8aec87ebb76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34203
Currently cmake and mobile build scripts still build libcaffe2 by
default. To build pytorch mobile users have to set environment variable
BUILD_PYTORCH_MOBILE=1 or set cmake option BUILD_CAFFE2_MOBILE=OFF.
PyTorch mobile has been released for a while. It's about time to change
CMake and build scripts to build libtorch by default.
Changed caffe2 CI job to build libcaffe2 by setting BUILD_CAFFE2_MOBILE=1
environment variable. Only found android CI for libcaffe2 - do we ever
have iOS CI for libcaffe2?
Test Plan: Imported from OSS
Differential Revision: D20267274
Pulled By: ljk53
fbshipit-source-id: 9d997032a599c874d62fbcfc4f5d4fbf8323a12e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34122
Earlier work added support for async rpc cases when RecordFunction's
end callbacks might be called in a different thread; in addition some
extra care was needed to handle pointer to parent function;
This PR makes RecordFunction aware of potentially multiple threads in
use, as well as removes unused parent() call and restricts current()
RecordFunction to scope-based record functions (RECORD_FUNCTION macro)
Test Plan: unit tests
Differential Revision: D20297709
Pulled By: ilia-cher
fbshipit-source-id: 46a59e1b2eea0bbd8a59630385e193b38d30f9d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33978
We can directly pass a user callable to the rpc_async API in TorchScript. There is no need for a private API that takes a qualified name.
ghstack-source-id: 99600360
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_torchscript_functions_not_supported
```
Differential Revision: D7420993
fbshipit-source-id: 228c15b21848e67418fab780e3fd6a1c6da5142d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34278
This diff helps check all the ops not supported by lite_interpreter.
Helpful mainly to find all the ops that need to be added instead of adding them
one by one.
Test Plan:
buck run caffe2/binaries:lite_interpreter_model_load --
--model=<bytecode-model-path>
Reviewed By: iseeyuan
Differential Revision: D20266341
fbshipit-source-id: 5a6c7a5bc52f910cea82a72045870da8105ccb87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34118
Previously calc_per_channel_qparams used for loops and Python primitives, which called `item` many times and caused a slowdown during training.
These changes use torch primitives on the tensor to speed up the operation by over 60x
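Not the code from this diff, just a sketch of the vectorized idea: compute per-channel min/max and qparams with tensor ops instead of a Python loop that calls `.item()` per channel:
```python
import torch

def per_channel_qparams(w: torch.Tensor, axis: int = 0, qmin: int = 0, qmax: int = 255):
    # Flatten every dimension except the channel axis, then reduce once.
    w2d = w.transpose(0, axis).reshape(w.size(axis), -1)
    min_vals = torch.min(w2d, dim=1).values.clamp(max=0.0)
    max_vals = torch.max(w2d, dim=1).values.clamp(min=0.0)
    scales = (max_vals - min_vals) / float(qmax - qmin)
    scales = torch.max(scales, torch.tensor(1e-8))      # avoid division by zero
    zero_points = (qmin - torch.round(min_vals / scales)).clamp(qmin, qmax).to(torch.int64)
    return scales, zero_points

scales, zps = per_channel_qparams(torch.randn(8, 3, 3, 3))
```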
Perf results on MobileNetV2 during training using autograd profiler
FP32 forward call -
Self CPU time total: 47.222ms
CUDA time total: 124.001ms
before change
FakeQuant Model -
Self CPU time total: 19.107s
CUDA time total: 27.177s
after change
FakeQuant Model -
Self CPU time total: 404.667ms
CUDA time total: 446.344ms
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D20287841
fbshipit-source-id: 6b706b8206e0d0da3c3c217b014e8da5b71b870d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34232
By default `torch.zeros` creates the tensor on the CPU, so we need to specify the device argument to get it to work correctly on GPU during QAT.
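For illustration, the kind of fix involved looks roughly like this (hypothetical observer code, not the exact diff):
```python
import torch

def init_observer_state(x: torch.Tensor):
    # Wrong on GPU inputs: torch.zeros(...) lands on the CPU and later ops mix devices.
    # zero_point = torch.zeros(x.size(0))
    # Correct: allocate on the same device as the input.
    zero_point = torch.zeros(x.size(0), device=x.device)
    return zero_point
```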
Test Plan:
1. Tested by running QAT on GPU
2. python test/test_quantization.py
Imported from OSS
Differential Revision: D20286351
fbshipit-source-id: 745723c85d902870c56c1c7492f26cb027ae9dc6
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/31336 and https://github.com/pytorch/pytorch/issues/1664
Sometimes cuDNN heuristics return algorithms that cannot be used. Instead of just using the first algorithm returned, we should try these algorithms one by one until one of them succeeds.
Benchmark:
https://github.com/zasdfgbnm/things/blob/master/2020Q1/conv-benchmark.ipynb
```python
i = torch.randn(256, 3, 256, 256).cuda()
c = torch.nn.Conv2d(3, 3, 3, 3).cuda()
%timeit c(i); torch.cuda.synchronize()
```
before vs after = 498 vs 490 µs
The performance is improved, I guess, because before this PR we always called the heuristics to get the algorithm, whereas after this PR we only do so the first time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33073
Differential Revision: D20284755
Pulled By: ngimel
fbshipit-source-id: b03af37c75939ca50c2cb401c706ba26914dd10e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33294
1. Serialize the bytecode of __setstate__ and run it when loading the model (a generic sketch of the pattern follows below).
2. One use case is quantization. To test this use case a few operators are registered temporarily for lite interpreter. The "_" prefix registration will be removed when the operators are all migrated to mobile.
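A generic sketch of the __getstate__/__setstate__ pattern whose bytecode is now serialized (the quantized-op specifics of this diff are not shown):
```python
import torch
from typing import Tuple

class WithState(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 1.0

    @torch.jit.export
    def __getstate__(self) -> Tuple[float, bool]:
        return (self.scale, self.training)

    @torch.jit.export
    def __setstate__(self, state: Tuple[float, bool]) -> None:
        self.scale = state[0]
        self.training = state[1]

    def forward(self, x):
        return x * self.scale

scripted = torch.jit.script(WithState())
torch.jit.save(scripted, "with_state.pt")
# The mobile/lite-interpreter save path is what this diff teaches to also
# emit __setstate__ bytecode and run it at load time.
```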
Test Plan: Imported from OSS
Differential Revision: D20162898
Pulled By: iseeyuan
fbshipit-source-id: 7a3180807bf38fbce594d86993896861f12bb58c
Summary:
Among all ONNX tests, ONNXRuntime tests are taking the most time on CI (almost 60%).
This is because we are testing larger models (mainly torchvision RCNNs) for multiple onnx opsets.
I decided to divide tests between two jobs for older/newer opsets. This is now reducing the test time from 2h to around 1h10mins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33242
Reviewed By: hl475
Differential Revision: D19866498
Pulled By: houseroad
fbshipit-source-id: 446c1fe659e85f5aef30efc5c4549144fcb5778c
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.
```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN ] JitTest.ADFormulas
[ OK ] JitTest.ADFormulas (82 ms)
[ RUN ] JitTest.Attributes
[ OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN ] JitTest.LiteInterpreterPrim
[ OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN ] JitTest.LiteInterpreterLoadOrigJit
[ OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)
[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[ PASSED ] 75 tests.
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33914
Differential Revision: D20150304
Pulled By: SplitInfinity
fbshipit-source-id: c88f5289055a02dc20b7a5dcdf87469f9816d020
Summary:
Currently, putting `outputs: List[Tensor]` instead of `outputs: List[Tensor] = []` in your JITed code results in:
```
Traceback (most recent call last):
File "custom_lstms.py", line 453, in <module>
test_script_stacked_bidir_rnn(5, 2, 3, 7, 4)
File "custom_lstms.py", line 404, in test_script_stacked_bidir_rnn
rnn = script_lstm(input_size, hidden_size, num_layers, bidirectional=True)
File "custom_lstms.py", line 62, in script_lstm
other_layer_args=[LSTMCell, hidden_size * dirs, hidden_size]))
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1267, in script
return torch.jit._recursive.create_script_module(obj, torch.jit._recursive.infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 305, in create_script_module
return create_script_module_impl(nn_module, concrete_type, stubs_fn)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
init_fn(script_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
init_fn(script_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
init_fn(script_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
init_fn(script_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 317, in create_script_module_impl
stubs = stubs_fn(nn_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 511, in infer_methods_to_compile
stubs.append(make_stub_from_method(nn_module, method))
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 41, in make_stub_from_method
return make_stub(func)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 34, in make_stub
ast = torch.jit.get_jit_def(func, self_name="RecursiveScriptModule")
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 173, in get_jit_def
return build_def(ctx, py_ast.body[0], type_line, self_name)
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 206, in build_def
build_stmts(ctx, body))
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 129, in build_stmts
stmts = [build_stmt(ctx, s) for s in stmts]
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 129, in <listcomp>
stmts = [build_stmt(ctx, s) for s in stmts]
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 181, in __call__
return method(ctx, node)
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 294, in build_AnnAssign
rhs = build_expr(ctx, stmt.value)
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 180, in __call__
raise UnsupportedNodeError(ctx, node)
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 116, in __init__
source_range = ctx.make_range(offending_node.lineno,
AttributeError: 'NoneType' object has no attribute 'lineno'
```
This patch makes the error message more reasonable:
```
torch.jit.frontend.UnsupportedNodeError: annotated assignments without assigned value aren't supported:
File "custom_lstms.py", line 221
# type: (Tensor, Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]
inputs = reverse(input.unbind(0))
outputs: List[Tensor]
~ <--- HERE
for i in range(len(inputs)):
out, state = self.cell(inputs[i], state)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34133
Differential Revision: D20249076
Pulled By: ezyang
fbshipit-source-id: 40ec34ad38859f9fe56f379d3f8d08644b00fab9
Summary: I don't know why, but this segfaults on rocm.
Test Plan: Can only be tested on master
Reviewed By: mrshenli
Differential Revision: D20286011
fbshipit-source-id: dde952449bf54ae459d36020f3e3db6fa087b39f
Summary:
This PR enables bfloat16 type for pooling ops on ROCm. Also adds bfloat16 implementation of atomicAdd since pooling ops use it.
Note: the changes in the lambda function blocks are only indentation, as they are now wrapped inside the `AT_SKIP_BFLOAT16_IF_NOT_ROCM` macro.
iotamudelta ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34166
Differential Revision: D20263421
Pulled By: ezyang
fbshipit-source-id: 3f4199ec57522e638ec29f45e22c6ec919b7816d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34184
Add mobile custom build with static dispatch & dynamic dispatch to CI.
Most of mobile code analysis CI should be covered by the custom build +
dynamic dispatch flow, so changing it to running on master only.
Test Plan: Imported from OSS
Differential Revision: D20241774
Pulled By: ljk53
fbshipit-source-id: f34c5748735c536ab6b42c8eb1429d8bbdaefd62
Summary:
There was an error in
https://github.com/pytorch/pytorch/pull/30724/files that resulted in
export_chrome_trace generating invalid JSON. This only came up when the
profiler is run with use_cuda=True from what it looks like. In the future, we
should have tests that ensure we generate valid JSON because we no longer use
the json library.
ghstack-source-id: 99508836
Test Plan: Added a unit test.
Differential Revision: D20237040
fbshipit-source-id: 510befbdf4ec39632ac56544afcddee6c8cc3aca
Summary:
Separating CUDA fuser from CPU fuser.
1. New node in IR - prim::CudaFusionGroup:
This enables the cuda fuser to co-exist along side the old fuser. Allows us
to incrementally build and expand cuda fuser.
2. copied FuseGraph optimization passes to CudaFuserGraph:
We will re-factor & reuse Chunk/Concat in the old fuser logic, which is
handled in the optimization pass at this moment. Unfortunately many code in
the pass is tightly binded with the legacy fuser, which makes code sharing
difficult.
The CudaFusionGraph will support only a subset of operations comparing to
legacy fuser (CUDA only). It is registered as a custom pass post fusion via
```torch._C._jit_register_cuda_fuser()```
To have it in effect, you should also turn off fusion on GPU via
```torch._C._jit_override_can_fuse_on_gpu(False)```
3. We don't have codegen in this PR yet (WIP). Currently we just fall back to
the old fuser.
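Putting the two toggles mentioned above together in a sketch (assumes a CUDA device and the binding registered by this PR):
```python
import torch

# Route fusion through the new CUDA fuser instead of the legacy GPU fuser.
torch._C._jit_override_can_fuse_on_gpu(False)
torch._C._jit_register_cuda_fuser()

@torch.jit.script
def fn(x, y):
    return (x + y) * y

x = torch.randn(8, device="cuda")
y = torch.randn(8, device="cuda")
fn(x, y)                     # warm-up runs trigger fusion
print(fn.graph_for(x, y))    # inspect the optimized graph
```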
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33527
Differential Revision: D20171598
Pulled By: ZolotukhinM
fbshipit-source-id: 9a3c0f06f46da7eaa80ae7551c04869f5b03ef71
Summary:
[This check](019ffdca31/torch/csrc/jit/ir/alias_analysis.cpp (L772)) wasn't being triggered for None outputs of tuples, because `mustBeNone` would return false if `num_outputs != 1`. This caused an assertion to fail in alias analysis. It's kind of a convoluted case to repro and I wasn't able to make a succinct one, but I tested internally and it fixed the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34268
Differential Revision: D20261539
Pulled By: eellison
fbshipit-source-id: 95edea10e2971727cfd3f3bc2b6bdf9dbadca6a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34284
Python 3.5 only supports function type hints.
Variable type hints were introduced in Python 3.6.
So these tests with JIT variable type hints will fail with a SyntaxError in a Python 3.5 environment.
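For context, the difference is just this (plain Python, nothing PyTorch-specific):
```python
from typing import List
import torch
from torch import Tensor

def fn(x: Tensor) -> Tensor:      # function annotations: valid in Python 3.5+
    outputs: List[Tensor] = []    # variable annotation: SyntaxError before Python 3.6
    outputs.append(x)
    return outputs[0]
```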
ghstack-source-id: 99542199
Test Plan:
Differential Revision: D7348891
fbshipit-source-id: c4c71ac021f35b5e6f7ce4d3e6af10dd1d2600cc
Test Plan: Can only really be tested in PyTorch master
Reviewed By: mrshenli
Differential Revision: D20260023
fbshipit-source-id: b5444c376894bfccd6524cf04a71cf76eea72275
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33852
This fixes an issue for QAT models. During eval, if we call `prepare_qat` and `convert` before calling `load_state_dict`, it throws an error because the weight info (num channels) is not updated in the observer module (a hedged repro sketch follows below).
It is not an issue for the per-tensor case.
Fixes issue #33830
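A hedged repro sketch of the failing flow; `qat_checkpoint.pth` is a hypothetical previously-saved state dict and the eager-mode API details may differ slightly:
```python
import torch
import torch.quantization as tq

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = torch.nn.Conv2d(3, 8, 3)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = M().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)
quantized = tq.convert(model.eval(), inplace=False)

# "qat_checkpoint.pth" is a hypothetical state_dict saved from a previously
# trained-and-converted copy of the same model; before this fix the
# per-channel observer buffers here had stale shapes and the load failed.
quantized.load_state_dict(torch.load("qat_checkpoint.pth"))
```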
Test Plan:
python test/test_quantization.py EagerModePostTrainingQuantTest.test_eval_after_train
python test/test_quantization.py EagerModeQuantizationAwareTrainingTest.test_eval_after_train
Imported from OSS
Differential Revision: D20212996
fbshipit-source-id: a04af8fe4df2e555270ae4d6693f5777d86f8a46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34072
This diff helps check all the ops not supported by lite_interpreter.
Helpful mainly to find all the ops that need to be added instead of adding them
one by one.
Test Plan:
buck run caffe2/binaries:lite_interpreter_model_load --
--model=<bytecode-model-path>
Reviewed By: iseeyuan
Differential Revision: D20194092
fbshipit-source-id: 0d596cd0204308027194af7ed738551d0c32a374
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34187
Noticed that a recent PR broke Android/iOS CI but didn't break mobile
build with host toolchain. Turns out one mobile related flag was not
set on PYTORCH_BUILD_MOBILE code path:
```
"set(INTERN_DISABLE_MOBILE_INTERP ON)"
```
First, move the INTERN_DISABLE_MOBILE_INTERP macro below, to stay with
other "mobile + pytorch" options - it's not relevant to "mobile + caffe2"
so doesn't need to be set as common "mobile" option;
Second, rename PYTORCH_BUILD_MOBILE env-variable to
BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN - it's a bit verbose but
becomes more clear what it does - there is another env-variable
"BUILD_PYTORCH_MOBILE" used in scripts/build_android.sh, build_ios.sh,
which toggles between "mobile + pytorch" v.s. "mobile + caffe2";
Third, combine BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN with ANDROID/IOS
to avoid missing common mobile options again in future.
Test Plan: Imported from OSS
Differential Revision: D20251864
Pulled By: ljk53
fbshipit-source-id: dc90cc87ffd4d0bf8a78ae960c4ce33a8bb9e912
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34215
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20251538
Pulled By: ezyang
fbshipit-source-id: c419f0ce869aca4dede7e37ebd274a08632d10bf
Summary:
Effectively backporting c5c00c119f before that PR lands
The bug didn't manifest itself earlier because the MkldnnConv2d constructor didn't reorder the weights, so the issue arose only on the second serialization/deserialization. This also fixes the constructor to deliver better perf right away.
Note that I still serialize a 5d tensor - it was the previous behavior, we have to handle it anyway, and with https://github.com/pytorch/pytorch/issues/32422 the output of `mkldnn_reorder_conv2d_weight` will always be 4d.
cc pinzhenx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34115
Reviewed By: wanchaol
Differential Revision: D20224685
Pulled By: dzhulgakov
fbshipit-source-id: 24ca9227c4eb4c139096a64ae348808d7478d7dc
Summary:
We get seg fault without this in using XNNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34087
Differential Revision: D20199787
Pulled By: kimishpatel
fbshipit-source-id: d3d274e7bb197461632b21688820cd4c10dcd819
Summary:
This PR aims at improving `UpSample` performance with `mode='nearest'` on 1D, 2D and 3D inputs; both inference and training are covered (a quick benchmark sketch follows the speedup list below). The current implementation in ATen has no parallelization.
1. single socket inference speedup for 1d, 2d and 3d: **63x, 57x, 46x**.
2. single core inference speedup for 1d, 2d and 3d: **5.9x, 4.6x, 3.4x**.
3. dual sockets training speedup for 1d, 2d and 3d: **38x, 33x, 65x**
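A quick way to reproduce this kind of measurement (numbers will vary by machine; this is just a sketch for the 2d case):
```python
import time
import torch
import torch.nn.functional as F

x = torch.randn(32, 16, 128, 128)   # NCHW input for the 2d case

start = time.time()
for _ in range(100):
    F.interpolate(x, scale_factor=2, mode="nearest")
print("2d nearest upsample:", (time.time() - start) / 100, "s/iter")
```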
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31452
Differential Revision: D20077828
Pulled By: VitalyFedyunin
fbshipit-source-id: a7815cf2ae344696067d2ec63bd4f4e858eaafff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33849
For integral types, there is no need to manipulate with
`reinterpret_cast` and therefore a cleaner implementation is available.
This might also be helpful on some less optimized compilers or on a less optimized arch (while a
test on gcc 8.3 x64 shows no difference in performance).
Test Plan: Imported from OSS
Differential Revision: D20222675
Pulled By: VitalyFedyunin
fbshipit-source-id: 875890d1479f8abab4c4a19d934fe9807d12dfd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33817
Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey. I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20125163
Pulled By: ezyang
fbshipit-source-id: 026075a4bab81b0bd88b07f0800f6e6bbeb2166a
Summary:
Remove Int8Relu in quantized model
Suppress log warnings if verbose is false
Test Plan: TBD
Reviewed By: yinghai
Differential Revision: D20202474
fbshipit-source-id: 995ef8e665d8edeee810eedac831440b55271a7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33715
Tracing code depends on the full JIT, which is not available in the lite interpreter. Use `-c pt.disable_gen_tracing=1` to turn off generating the tracing part.
ghstack-source-id: 99252322
Test Plan:
```
buck build xplat/caffe2:torch -c pt.disable_gen_tracing=1
```
The tracing part of generated/VariableType_?.cpp will not be generated.
Reviewed By: smessmer
Differential Revision: D19684577
fbshipit-source-id: a1e5b80eca5e51c7bf72b5cc8f0e36c2135fabc2
Summary:
When docs are built, conf.py points to a _templates-stable/layout.html that does not exist.
Adding this file here so future stable docs will build with Google Analytics tags and without the unstable banner that is in _templates/layout.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33770
Differential Revision: D20164895
Pulled By: jlin27
fbshipit-source-id: 5fca9f9b825b1484dab52e2b2d91f92ae6372371
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33329
# Use case
```
@torch.jit.script
def send_rpc_async(dst_worker_name, user_callable_qual_name, tensor):
# type: (str, str, Tensor) -> None
rpc._rpc_async_torchscript(
dst_worker_name, user_callable_qual_name, args=(tensor,)
)
```
# Problem
```
torch.jit.frontend.NotSupportedError: keyword-arg expansion is not supported:
File "/data/users/shihaoxu/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/rpc/rpc_spawn#binary,link-tree/torch/distributed/rpc/api.py", line 722
args = args if args else ()
kwargs = kwargs if kwargs else {}
fut = _invoke_rpc_torchscript(to, qualified_name, *args, **kwargs)
~~~~~~ <--- HERE
return fut
```
# Solution
Register `rpc.rpc_async(..)` as a JIT operator to handle variable-length argument list.
# Plan
This PR contains the required changes to make `rpc.rpc_async(..)` a JIT prim operator that can dynamically handle different numbers of arguments.
- Register "prim::rpc_async" as a `Symbol` in "interned_string.h"
- Add an if branch in "python_sugared_value.cpp" `toSugarValue(py::object, ..)` entry utility function to set up how the JIT frontend converts the `torch.distributed.rpc.rpc_async(..)` Python function (Python object) into a `SpecialFormValue` (IR SugaredValue).
- Add a switch case for the "prim::rpc_async" Symbol in "ir_emitter.cpp" and `emitApplySpecialForm(..)` to set up how the JIT compiler provides inputs to the "prim::rpc_async" Operator.
- Register "prim::rpc_async" as a `jit::Operator` and provide implementation in "register_distributed_ops.cpp".
Note that since the distributed module is an optional part of the PyTorch build, the code added in this PR should be wrapped within a preprocessor macro.
```
#ifdef USE_DISTRIBUTED
new code here
#endif
```
Test Plan:
Items that need to be confirmed in the test cases
https://fb.quip.com/DCvdA9ZLjeO0
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
\
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_call_python_function_remotely_from_script_not_supported
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
```
```
buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:layer_norm_op_test-2.7 -- test_layer_norm_op_jit
```
Differential Revision: D5738300
fbshipit-source-id: a4604fe762e00be062dc8232ca9790df31fb2074
Summary:
`unpickler.cpp` depends on the mobile type parser all the time, so include it regardless of whether it's a mobile build or not
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34180
Pulled By: driazati
Differential Revision: D20241881
fbshipit-source-id: a998dd2b3f1c7f58e55bb7851dc595c8ddf9eacb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34055
Enable custom mobile build with dynamic dispatch for OSS build.
It calls a python util script to calculate transitive dependencies from
the op dependency graph and the list of used root ops, then pass the
result as the op registration whitelist to aten codegen, so that only
these used ops are registered and kept at link time.
For custom build with dynamic dispatch to work correctly, it's critical
to have an accurate list of used ops. The current assumption is that only
those ops referenced by the TorchScript model are used. It works well if
client code doesn't call libtorch API (e.g. tensor methods) directly;
otherwise the extra used ops need to be added to the whitelist manually,
as shown by the HACK in prepare_model.py.
Also, if JIT starts calling extra ops independent of specific model,
then the extra ops need to be added to the whitelist as well.
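One way to obtain that root-op list from a TorchScript model is sketched below; the model path and YAML file name are hypothetical, and the dependency-graph expansion itself happens in the build scripts:
```python
import torch
import yaml

# "mobilenetv2.pt" is a hypothetical scripted model saved with torch.jit.save.
model = torch.jit.load("mobilenetv2.pt")
root_ops = torch.jit.export_opnames(model)   # e.g. ["aten::_convolution", ...]

# The custom-build flow feeds this list (via a YAML file) to the op
# dependency analysis, which expands it to the transitive whitelist
# passed to ATen codegen.
with open("mobilenetv2_ops.yaml", "w") as f:
    yaml.dump(root_ops, f)
```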
Verified the correctness of the whole process with MobileNetV2:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D20193327
Pulled By: ljk53
fbshipit-source-id: 9d369b8864856b098342aea79e0ac8eec04149aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32814
We skip quantization of the intermediate values for patterns like `Conv - ReLU`,
but currently we don't skip quantizing the input/output of the graphs of the matched modules.
Since we have now changed the way we add observers, this also needs to be updated.
Test Plan:
python test/test_jit.py -- 'TestJit.test_insert_observers_skip_values'
Imported from OSS
Differential Revision: D20208785
fbshipit-source-id: ce30f2c4c8ce737500d0b41357c80ec8b33aecf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34168
Redo D19153199. It was reverted because it broke CI, due to the change of `AT_ASSERTM` to `TORCH_INTERNAL_ASSERT_DEBUG_ONLY`. Two problems:
1) bug in `TORCH_INTERNAL_ASSERT_DEBUG_ONLY` about MSVC. I'm sending another diff to fix this bug.
2) BlobTest was expecting `Blob::template Get<T>()` to throw when there is a type mismatch.
For now I'll leave `AT_ASSERTM` as it is.
Test Plan:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- 'BlobTest' --run-disabled
buck test mode/opt //caffe2/caffe2:caffe2_test_cpu -- 'BlobTest' --run-disabled
```
Reviewed By: yinghai
Differential Revision: D20235225
fbshipit-source-id: 594dad97c03c419afaa8f9023408bc5a119b3cfa
Summary:
This PR aims to improve interoperability with [CuPy](https://github.com/cupy/cupy/pulls).
Instead of having two separate and conflicting memory pools, with this PR CuPy can directly allocate memory from the PyTorch allocator by means of this proposal https://github.com/cupy/cupy/pull/3126
We would like to gather feedback to know whether this approach makes sense for PyTorch, or whether other alternative designs would be preferable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33860
Differential Revision: D20212788
Pulled By: ngimel
fbshipit-source-id: bc1e08a66da1992d26021147bf645dc65239581c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34157
`[[noreturn]]` only conflicts with the CUDA `__assert_fail` definition if clang is used as the host compiler
Test Plan: CI
Reviewed By: EscapeZero
Differential Revision: D20232088
fbshipit-source-id: 7182c28a15278e03175865cd0c87410c5de5bf2c
Summary:
Stacked PRs
* #33474 - [jit] Remove list specializations from pickler
* **#33255 - [jit] Add type tags to lists/dicts in pickle**
This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255
Pulled By: driazati
Reviewed By: xman1979, Tianshu-Bao
Differential Revision: D19868637
fbshipit-source-id: 2f1826e6679a786ca209198690269f399a542c04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34081
Before this commit, applications have to do the following to configure
number of threads in ProcessGroup RPC backend:
```
op = ProcessGroupRpcBackendOptions()
op.rpc_timeout = rpc_timeout
op.init_method = init_method
op.num_send_recv_threads = 32
init_rpc(...., rpc_backend_options=op)
```
After this commit, it can be simplified to:
```
init_rpc(...., rpc_backend_options=ProcessGroupRpcBackendOptions(num_send_recv_threads=32))
```
Fixes #34075
Test Plan: Imported from OSS
Differential Revision: D20227344
Pulled By: mrshenli
fbshipit-source-id: def4318e987179b8c8ecca44d7ff935702c8a6e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34169
Valgrind has no insight into how memory is initialized by ioctls()
Test Plan: CI
Reviewed By: seemethere
Differential Revision: D20235974
fbshipit-source-id: 46413afa4842e7d42582bbbda903438b1d98691f
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/34079
I don't know how much we care about the difference between `-G` and `-lineinfo` in `DEBUG` vs `REL_WITH_DEB_INFO`, but since `-G` never worked, let's just use `-lineinfo` on both `DEBUG` and `REL_WITH_DEB_INFO`. This would resolve the failure in `DEBUG=1` build. Locally tested to work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34153
Reviewed By: ljk53
Differential Revision: D20232049
Pulled By: ngimel
fbshipit-source-id: 4e48ff818850ba911298b0cc159522f33a305aaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33825
Partially addresses #20376
I do this by overriding assertEqual in classes that opt into
this. This means I have to fix #33821. The fix is a little
unsatisfactory as idiomatic Python 2 super() calls don't work
(since the class is no longer in scope); hopefully this will just
work when we go to Python 3.
General approach taken:
- A lot of dtype mismatches are because we specified tensor constants
that infer to some dtype, but the actual dtype needed is something else.
Those are easy, just annotate the tensor() constructor (often a legacy
Tensor/FloatTensor call) with dtype
- There are a few cases where the promotion rules are nontrivial. Some of them
I just typed out the expected promotion rules manually (based on trial
and error)
- There are some more complex cases; if it gets too hairy I just
set exact_dtype=False and nope the fuck out
I don't have time to do it for all the other classes. But the setup
should work if people just incrementally add the overrides to classes,
and then eventually flip the default.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20125791
Pulled By: ezyang
fbshipit-source-id: 389c2d1efbd93172af02f13e38ac5e92fe730c57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33926
The UnboundBuffer calls here are already protected by a mutex. We only
need to hold the lock while writing the shared structures completed_ and
exception_.
ghstack-source-id: 99315427
Test Plan:
CI
Differential Revision: D20154546
fbshipit-source-id: d1b74508c917b21acdcd0f6a914eb0455437ca0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33987
There was an error in
https://github.com/pytorch/pytorch/pull/30724/files that resulted in
`export_chrome_trace` generating invalid JSON. This only came up when the
profiler is run with `use_cuda=True` from what it looks like. In the future, we
should have tests that ensure we generate valid JSON because we no longer use
the json library.
Test Plan: Add UT to validate JSON.
Differential Revision: D20171428
fbshipit-source-id: ec135a154ce33f62b78d98468174dce4cf01fedf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33569
Clang reported a few places where a call to `fmaxType` is ambiguous. In all cases one of the arguments is `double` and another is `float`. Fix the error by creating a proper value 0 and remove the unneeded `ZERO_MACRO` code.
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20006926
fbshipit-source-id: ca6cfacd57459b1c48eb5080b822d9509b03544d
Summary: make use of springhill's fma on SpatialBatchnorm
Test Plan:
re-enabled the unit test, ran it a couple of times
pending: net runner
Reviewed By: amylittleyang
Differential Revision: D20227767
fbshipit-source-id: 7c601f185940249c0a32bdf95d74a20552cd2625
Summary:
1. randn and normal_ methods will work for complex tensors after this PR (a small hedged example follows below).
2. Added an internal function for viewing complex tensors as float tensors, which enables us to reuse functions defined for float tensors for complex tensors with a change in the arguments passed (like size, or the standard deviation in the case of normal_). Currently the resultant float tensor doesn't share storage with the input complex tensor, which means the version counter wouldn't be updated if any function is called on the resultant tensor; once the dtype entry is removed from the storage class, this issue will be resolved.
Side notes:
1. didn't add a separate header for the util functions because of this issue https://github.com/pytorch/pytorch/issues/20686#issuecomment-593002293
2. we should eventually have a public API method view_complex_as_float once (2) mentioned above gets resolved
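A small hedged example of what point 1 enables (complex support was still experimental at this point, so treat this purely as an illustration):
```python
import torch

z = torch.randn(4, dtype=torch.complex64)   # sampling a complex tensor
z.normal_(mean=0.0, std=2.0)                # in-place normal_ on the complex tensor
print(z.real.std(), z.imag.std())
```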
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34037
Differential Revision: D20221793
Pulled By: anjali411
fbshipit-source-id: a78f5e83d6104e2f55e0b250c4ec32e8d29a14eb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33182
This adds private API functions that developers of types that implement `__torch_function__` can use to ensure full coverage of the subset of the PyTorch API that can be overridden.
I've refactored some of the code in the tests into a new `torch._overrides.get_overridable_functions` function. I've also changed `TENSOR_LIKE_TORCH_OVERRIDES` into `torch._overrides.get_testing_overrides` and `IGNORED_TORCH_FUNCTIONS` into `torch._overrides.get_ignored_functions`. Making these two static global variables in the tests into functions should allow rewriting their implementation to construct their return values instead of just statically defining the return value as is done here. Currently that is blocked on not being able to inspect function signatures of compiled kernels in PyTorch (see https://github.com/pytorch/pytorch/issues/28233). See the docs I've added for usage examples of these new functions. I also refactored the existing override tests to make use of these new functions, which should be a good forcing function to make sure they're kept up-to-date.
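A sketch of how these helpers might be used together to check coverage (module path `torch._overrides` as named in this PR - it later moved to `torch.overrides` - and the return types in the comments are my reading of them):
```python
from torch._overrides import (
    get_overridable_functions,
    get_testing_overrides,
    get_ignored_functions,
)

# get_overridable_functions(): {namespace: [functions]}
# get_testing_overrides():     {function: dummy lambda with a matching signature}
# get_ignored_functions():     functions that deliberately cannot be overridden
overridable = {fn for fns in get_overridable_functions().values() for fn in fns}
covered = set(get_testing_overrides())
ignored = set(get_ignored_functions())

print("not covered by dummy overrides:", len(overridable - covered - ignored))
```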
Finally, while working on this I discovered that `TestTorchFunctionOverrides.test_mean` and `TestTorchFunctionOverrides.test_mm` weren't ever being run because they were getting clobbered by the other dynamically generated override tests. I fixed that by renaming the tests and then fixing the actual test code. I've verified that all the subclassing semantics is correct and that the updated test answers are correct. I'm happy to put the fixes to the existing tests in as a separate pull request if that would be easier to review.
ping cpuhrsch since the feature request originally came from them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33791
Differential Revision: D20195053
Pulled By: cpuhrsch
fbshipit-source-id: 1585f4e405f5223932b410eae03a288dc8eb627e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33834
This changes how we report Tracebacks to make them more clear when
there are both serialized and non-serialized ranges. It now looks like:
```
Traceback (most recent call last):
File "foo.py", line 25, in <module>
s2(a, b)
File "/scratch/zdevito/pytorch/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__.py", line 7, in forward
x: Tensor,
y: Tensor) -> Tensor:
return (self).bar(x, y, )
~~~~~~~~~ <--- HERE
def bar(self: __torch__.Moo,
x: Tensor,
File "code/__torch__.py", line 11, in bar
x: Tensor,
y: Tensor) -> Tensor:
_0 = (self).baz(x, y, )
~~~~~~~~~ <--- HERE
_1 = torch.ones([3], dtype=None, layout=None, device=None, pin_memory=None)
return torch.add(_0, _1, alpha=1)
File "code/__torch__.py", line 17, in baz
x: Tensor,
y: Tensor) -> Tensor:
return torch.add(x, y, alpha=1)
~~~~~~~~~ <--- HERE
Traceback of TorchScript, original code (most recent call last):
File "foo.py", line 11, in forward
def forward(self, x, y):
return self.bar(x, y)
~~~~~~~~ <--- HERE
File "foo.py", line 9, in bar
def bar(self, x, y):
return self.baz(x, y) + torch.ones(3)
~~~~~~~~ <--- HERE
File "foo.py", line 7, in baz
def baz(self, x, y):
return x + y
~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
```
It follows Python convension of having the most important information last
and reading from the bottom up.
Changes:
* Moved the error message to the end, to copy Python
* Report original traceback separate from serialized traceback
* Make sure root functions have names in the interpreter trace.
Test Plan: Imported from OSS
Differential Revision: D20126136
Pulled By: zdevito
fbshipit-source-id: fd01f9985e5d74e04c4d064c02e8bc320f4fac13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33901
After this change, the pytest profile looks like:
4.83s call test/test_torch.py::TestTorch::test_fft_ifft_rfft_irfft
4.23s call test/test_torch.py::TestTorch::test_var_dim
4.22s call test/test_torch.py::TestTorch::test_std_dim
4.19s call test/test_torch.py::TestTorch::test_max
4.06s call test/test_torch.py::TestTorch::test_min
3.60s call test/test_torch.py::TestTorchDeviceTypeCPU::test_cdist_norm_batch_cpu
2.62s call test/test_torch.py::TestTorchDeviceTypeCPU::test_pow_cpu
2.60s call test/test_torch.py::TestTorch::test_matmul_small_brute_force_1d_Nd
And the entire CPU-only test suite can be run in 88s on my Intel(R) Xeon(R) CPU
E5-2650 v4 @ 2.20GHz
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20222288
Pulled By: ezyang
fbshipit-source-id: 4224a9117f42566e290ae202881d76f1545cebec
Summary:
Add vectorization to dropout kernels for both reads & writes. Moved the `masked_scale_kernel` implementation to `TensorIterator` to pick up recent autovectorization additions by zasdfgbnm , and wrote a vectorized specialization of the dropout training kernel (along with some fairly conservative dispatch logic).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33879
Differential Revision: D20222853
Pulled By: ngimel
fbshipit-source-id: 711f56ca907fbc792a10d4bf069c28adab7d6ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34038
The mobile build doesn't include autograd/VariableType dispatch. As a
result, AutoNonVariableTypeMode needs to be set in the mobile runtime.
With static dispatch this work is done inside the generated jit-dispatch
code - AutoNonVariableTypeMode is set on a per-op basis. Setting
it globally or setting it for the wrong ops might break some `is_variable()`
checks in the codebase.
Thanks to the unification of Variable class and Tensor class, all
is_variable() checks have been removed, so AutoNonVariableTypeMode can
be set globally now.
We never tested inference-only mobile build with dynamic dispatch. It
seems that dynamic dispatch also requires setting AutoNonVariableTypeMode
for our mobile build (where VariableType functions are not registered).
Verified the end-to-end test works with this change:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```
Test Plan: Imported from OSS
Differential Revision: D20193329
Pulled By: ljk53
fbshipit-source-id: cc98414d89d12463dc82b0cdde0b6160dafc0349
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34012
Today some mobile simulator tests only run on landed PRs and it requires
setting up special build environment to repro errors locally.
The goal of the PR is to do end-to-end mobile custom build & integration
tests with host toolchain (using same CMake options as mobile build). This
way, non-mobile engineers can capture & debug mobile related build issues
much more easily.
There are three custom build types that this script supports:
1. `TEST_DEFAULT_BUILD=1 ./build.sh` - it is similar to the prebuilt libtorch
libraries released for Android and iOS (same CMake build options + host
toolchain), which doesn't contain autograd function nor backward ops thus is
smaller than full LibTorch.
2. `TEST_CUSTOM_BUILD_STATIC=1 ./build.sh` - it further optimizes libtorch
size by only including ops used by a specific model.
3. `TEST_CUSTOM_BUILD_DYNAMIC=1 ./build.sh` - similar as 2) except that it
relies on the op dependency graph (instead of static dispatch) to calculate
and keep all transitively dependent ops by the model.
Type 2) will be deprecated by type 3) in the future.
Type 3) custom build has not been fully supported yet so it's expected to fail.
Replacing existing mobile build CI to run Type 1) build & integration test.
Test Plan: Imported from OSS
Differential Revision: D20193328
Pulled By: ljk53
fbshipit-source-id: 48c14cae849fde86e27123f00f9911996c1cf40e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33277
Currently we insert observers in the called graph, which is incorrect since graphs can be shared,
and the decision of whether to insert an observer or not might depend on where the graph is called.
For example, for the call sequence `self.conv1(self.conv2(x))`, we can't insert observers correctly
if `self.conv1` and `self.conv2` share the same type in the current implementation, because right now we insert
the observer in the graph of the forward method of Conv2d, and this call sequence requires us to insert
only one observer for the output of self.conv2 / the input of self.conv1.
We'll need to insert observers for the input/output values of the graph at the call site instead.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20208787
fbshipit-source-id: 739e1d877639c0d0ed24e573bbd36211defa6836
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34105
make parallel_net_test.cc chronos conforming.
exclude gtest asserts that check thrown exceptions when exceptions are disabled.
Test Plan: CI green
Differential Revision: D20153525
fbshipit-source-id: 7371e559da948f46773fed09e3a23a77411d59e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33548
Mostly just moved code.
Index dim and number of indices checks are added to make the checks identical to index_add_cpu_
This is a resubmit of #30573, which got reverted.
Test Plan: Imported from OSS
Differential Revision: D20002248
Pulled By: gchanan
fbshipit-source-id: 46df4047cb3fc1dff37a15b83c70b2cbb7a6460b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33819
These conditions are for the specific implementation; the fallback implementation works without these checks, so use it if any of these checks isn't true.
Resubmit of https://github.com/pytorch/pytorch/pull/33419 (which got reverted due to a problem with XLA, but which now has been fixed)
ghstack-source-id: 99333280
Test Plan: Test included
Differential Revision: D20121460
fbshipit-source-id: c1056b8e26751e24078bbe80c7cb4b223bcca7cb
Summary:
The newly added mixture_same_family distribution should support cdf if the component family has cdf implemented.
This is very useful for flow models where the cdf of a mixture of gaussians/logistics is used to model the flow
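For example (using the torch.distributions API; the cdf support is what this PR adds):
```python
import torch
from torch.distributions import Categorical, Normal, MixtureSameFamily

mix = Categorical(probs=torch.tensor([0.3, 0.7]))
comp = Normal(loc=torch.tensor([-1.0, 1.0]), scale=torch.tensor([0.5, 0.5]))
gmm = MixtureSameFamily(mix, comp)

x = torch.linspace(-3, 3, 5)
print(gmm.cdf(x))   # mixture cdf = sum_k pi_k * Normal_k.cdf(x)
```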
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33408
Differential Revision: D20191552
Pulled By: ezyang
fbshipit-source-id: 0bfd7973aa335c162919398a12ddec7425712297
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33729
ReshapeOp does some useless movement of data between CPU and GPU, which results in a crazy number of kernel calls from this operator. This makes the operator ridiculously slow compared to BatchMatMul for pretty cheap models (for example some versions of GAT).
This diff moves ReshapeOp to leverage CPU storage, reducing the number of kernel calls from num_dims + 3 (for a 3-D
tensor) to 2.
Test Plan:
Unit-tests are still passing.
TODO: perf testing
Reviewed By: akyrola
Differential Revision: D19659491
fbshipit-source-id: 2341b21e57208b988169f2df5fb598be3dc8acb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34102
If nvcc is invoked with clang as the host compiler, it will fail with the following error due to a mismatch between the decorators defined in CUDA and c10:
```
error: attribute "noreturn" did not appear on original declaration
```
Test Plan: Build pytorch with clang
Reviewed By: EscapeZero
Differential Revision: D20204951
fbshipit-source-id: ff7cef0db43436e50590cb4bbf1ae7302c1440fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34107
Updates linter to only lint for python3 instead of linting for python2
Test Plan: good_testplan
Reviewed By: orionr
Differential Revision: D20205395
fbshipit-source-id: 1fa34e5fdf15f7aed96a66d2ce824a7337ee6218
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34035
Fixes a bug in the condition check from https://github.com/pytorch/pytorch/pull/24342; realized we don't have tests in either
python or cpp to catch this, so added tests for both python and cpp.
Thanks hczhu on capturing it!
Test Plan: Imported from OSS
Differential Revision: D20198837
Pulled By: wanchaol
fbshipit-source-id: 33846a14c0a8e7aac2e8328189d10c38a0d7e6ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34092
Disable op in transform map until we get bitwise matching to ice-ref
Test Plan: CI
Reviewed By: hyuen
Differential Revision: D20177936
fbshipit-source-id: e316384184cb264852e63e5edce721a8614742d1
Summary:
## What this will do:
When the repository is tagged the current nightly build pipelines will run and upload to the `test` subdirectory in our S3 bucket for `download.pytorch.org`. Will also upload to the correct organization on anaconda [pytorch-nightly](https://anaconda.org/pytorch-test)
This is only meant for release candidates and will actually not run on any tag that does not match the release candidate regex.
This has been tested on a small scale with: 3ebe0ff2f8
## Related PRs:
* `.circleci: Divert packages to test channel on tag`: https://github.com/pytorch/pytorch/pull/33842
* `.cirlceci: Swap PYTORCH_BUILD_VERSION if on tag`: https://github.com/pytorch/pytorch/pull/33326
## Work to be done later:
- [ ] Figure out how to remove manual step of updating s3 html indices.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34078
Differential Revision: D20204104
Pulled By: seemethere
fbshipit-source-id: 685630e8a04b19fc17374585e9228a13a8c3e407
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33513
These tests require gloo so like the other tests, they should be
skipped if not building with gloo. Otherwise they crash on Mac if not built
with gloo currently.
verified that it does not crash anymore with this PR.
ghstack-source-id: 99303707
Test Plan: Built on Mac and verified that the tests do not fail.
Differential Revision: D19976908
fbshipit-source-id: 6a2a70c3eab83efd0e188e86cabe56de4a869f4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33954
fixes caffe2/core/module_test.cc on windows
misc lint fixes.
Test Plan: CI green
Reviewed By: malfet
Differential Revision: D20153512
fbshipit-source-id: aeae84a028e26edd65c7218611e3c49a8d9bb8c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33938
Making sure we don't silently ignore exceptions from the tasks in the
thread pool
Test Plan: python setup.py clean && python setup.py develop install
Differential Revision: D20178603
Pulled By: ilia-cher
fbshipit-source-id: 34971032205a1a53fb7419ed84ebb986f9e959ad
Summary:
In the examples of `BCEWithLogitsLoss`, `0.999` is passed as the prediction value. The value `0.999` seems to be a probability, but actually it's not. I think it's better to pass a value that is greater than 1, so as not to confuse readers.
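i.e., something along these lines (just an illustration of the suggestion, not the final doc wording):
```python
import torch

loss = torch.nn.BCEWithLogitsLoss()
logits = torch.tensor([1.5, -0.3])   # clearly logits, not probabilities
target = torch.tensor([1.0, 0.0])
print(loss(logits, target))
```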
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34053
Differential Revision: D20195456
Pulled By: ezyang
fbshipit-source-id: 3abbda6232ee1ab141d202d0ce1177526ad59c53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33955
unit tests on windows (clang and cl) were crashing on exit due to racing with static variable destruction.
Test Plan: CI green
Differential Revision: D20153587
fbshipit-source-id: 22e35e591660d49f3a755f93d0c14d7a023ebb2a
Summary:
I think this warning isn't true anymore, and the NCCL backend works without PyTorch needing to be built from source.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34051
Differential Revision: D20195310
Pulled By: ezyang
fbshipit-source-id: 14f879a8c43ea5efdbdf0f638792ea2b90011f4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33957
lots of small preprocessor warning cleanup for windows
Test Plan: CI green
Reviewed By: malfet, albanD
Differential Revision: D20153582
fbshipit-source-id: 18fd61c466fd1f55ededdae4448b3009a9cedc04
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33899
In the issue, we have
```
TypeError("expected %s (got %s)", dispatch_key, toString(other.key_set()).c_str());
```
which results in `dispatch_key` being interpreted as a c-string by `sprintf`. Adding `__attribute__((format))` to the `TypeError` constructor allows gcc or clang to detect this at compile time. Then `-Werror=format` makes it a hard error at compile time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34019
Differential Revision: D20194842
Pulled By: ezyang
fbshipit-source-id: fa4448916c309d91e3d949fa65bb3aa7cca5c6a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33959
make sure clang on windows uses correct attributes.
add support for cl.exe style pragma attributes
Test Plan: CI green
Differential Revision: D20153548
fbshipit-source-id: bfbfd374e8f5e7d7b8598453c3ca2b6693a425f1
Summary:
1. As RRef has been added as a JIT type in https://github.com/pytorch/pytorch/issues/32992, we no longer need to skip them
2. Nightly now knows about Any
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34071
Reviewed By: houseroad
Differential Revision: D20196963
Pulled By: mrshenli
fbshipit-source-id: 1ea79c5682e8be9087b9cb74104e1b914c3fc456
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33958
look for clang intrinsic headers on windows
Test Plan: CI green
Differential Revision: D20153573
fbshipit-source-id: c87da3b0e9950d3df0bf8350df8ae592064d6613
Summary:
This patch enables folding GetAttr nodes with their corresponding
values. The _jit_pass_freeze_module API returns a new TorchScript module
where all function calls and get attributes are inlined.
Usage:
frozen_model = torch._C._freeze_module(scripted_model._c)
frozen_model.forward(...)
This API currently optimizes the forward method. We will follow up
to preserve and optimize methods and attributes that are annotated as
torch.jit.interface.
Several future improvements to JIT optimizations are required to further
clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178
Differential Revision: D19419640
Pulled By: bzinodev
fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b
Summary:
Currently if we run
```bash
DEBUG=1 ONNX_ML=0 MAX_JOBS=8 CMAKE_CXX_COMPILER_LAUNCHER=ccache CMAKE_C_COMPILER_LAUNCHER=ccache CMAKE_CUDA_COMPILER_LAUNCHER=ccache USE_OPENMP=0 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_NCCL=0 USE_CUDA=1 USE_CUDNN=0 USE_STATIC_CUDNN=0 USE_NNPACK=0 USE_QNNPACK=0 USE_FBGEMM=0 BUILD_TEST=0 TORCH_CUDA_ARCH_LIST="6.1" python setup.py develop --cmake-only
```
then `touch build/CMakeCache.txt` (which adjusting build options will
do), then `python setup.py develop`, the following error message will
show up:
```
CMake Error at build/clog-source/CMakeLists.txt:249 (ADD_SUBDIRECTORY):
ADD_SUBDIRECTORY not given a binary directory but the given source
directory "/home/hong/wsrc/pytorch/build/clog-source" is not a subdirectory
of "/home/hong/wsrc/pytorch/build/clog-source". When specifying an
out-of-tree source a binary directory must be explicitly specified.
```
This is due to a conflict between our cpuinfo submodule and XNNPACK's
external clog dependency. Moving our cpuinfo upward and setting
CLOG_SOURCE_DIR resolves the issue.
---
Also reverted https://github.com/pytorch/pytorch/issues/33947, where `CLOG_SOURCE_DIR` as an option is not quite appropriate (given that cpuinfo uses its included clog subdir); the setting of this variable should happen a bit later, when the dir of cpuinfo is known.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33922
Differential Revision: D20193572
Pulled By: ezyang
fbshipit-source-id: 7cdbdc947a6c7e0ef10df33feccb5b20e1b3ba43
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33977
Removing python2 from operator_test so we can retire python2 support for PyTorch.
Test Plan: waitforsandcastle
Reviewed By: seemethere
Differential Revision: D20129500
fbshipit-source-id: d4c82e4acfc795be9bec6a162c713e37ffb9f5ff
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/33345.
The original CUDA kernel looks good. I changed most appearances of `int` to `int64_t` to avoid the CUDA memory access issue. Removed the two `TORCH_CHECK`. Added a unit test.
cc csarofeen ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33753
Differential Revision: D20185005
Pulled By: ngimel
fbshipit-source-id: ef0abdc12ea680e10fe6b85266e2773c7a272f0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33705
The fact that there were two overloads appears to be a historical
artifact that dates back to when goldsborough originally added these
bindings in the first place. If TensorOptions is made optional,
then you only need one overload, not two, as they are exactly redundant
with each other. When MemoryFormat was added, it was made a little
harder to do this, as the C++ syntax at::empty_like(t, memory_format) would
not work if you collapsed the overload; but now it works because TensorOptions
supports MemoryFormat.
The upshot is, I can get rid of all the overloads and just have one overload.
Amazingly, this change is backwards compatible, as the test attests. While
I was at it, I also deleted the overload name from the functions entirely.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20073355
Pulled By: bhosmer
fbshipit-source-id: c6a8908213b32ccf6737ea864d135e2cce34f56b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33704
This diff adds MemoryFormat field to TensorOptions, and teaches
all kernels that take TensorOptions to respect it, but doesn't
teach the codegen about it. As such, it is now possible to specify
memory_format using TensorOptions syntax, e.g.,
at::empty_like(tensor, at::memory_format(MemoryFormat::Contiguous))
in the C++ API, but there isn't any other user visible effect.
The intended end state of this diff stack is to eliminate the
explicit MemoryFormat? arguments from native functions, but
as this change has BC implications I'd prefer to do it separately.
So this starts things off with a non-BC breaking addition to the
API. For all internal functions that are not bound by codegen,
I switch them to exclusively using TensorOptions (eliminating
MemoryFormat); there's only a few, mostly quantized and to().
To keep things screwed down in the short term, it is a HARD ERROR
to specify both the explicit MemoryFormat argument as well as
TensorOptions. This caught a few errors in my diff where I needed
to modify memory format settings and then call code later, especially
in empty_like.
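For illustration, a short C++ sketch of the syntax this enables (a sketch only, assuming the `at::memory_format` convenience function mentioned above and the usual ATen headers):
```cpp
#include <ATen/ATen.h>

int main() {
  at::Tensor src = at::rand({8, 3, 16, 16});

  // memory_format now travels inside TensorOptions, so the same overload
  // handles both of these calls.
  at::Tensor a = at::empty_like(
      src, src.options().memory_format(at::MemoryFormat::Contiguous));
  at::Tensor b = at::empty_like(
      src, at::memory_format(at::MemoryFormat::ChannelsLast));

  bool ok = a.is_contiguous() && b.is_contiguous(at::MemoryFormat::ChannelsLast);
  return ok ? 0 : 1;
}
```
Passing both the explicit MemoryFormat argument and a memory_format inside TensorOptions remains a hard error, as described above.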
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20073356
Pulled By: bhosmer
fbshipit-source-id: 18d310d7ee7cf2ee182994104652afcfc9d613e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33960
Test helper functions should be defined outside of the test functions. It is possible that process 2 launches its test functions more slowly than process 1, and process 1 sends a request to run a helper function on process 2. Process 2 may not have compiled the helper function yet when it starts to serve process 1's request, and thus may return an error like "attempted to get undefined function".
ghstack-source-id: 99205620
Test Plan: test_remote_script_module was flaky for thrift backend in my local stress test runs, due to error "attempted to get undefined function". With fix in this diff, stress runs passed
Differential Revision: D20167969
fbshipit-source-id: 8a2b9cd7bd62462e24bdbcb69ad32dca745d6956
Summary:
HashNode and CompareNode are useful functions for handling jit::Node. This is to unblock https://github.com/pytorch/glow/pull/4235.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34045
Reviewed By: houseroad
Differential Revision: D20184733
Pulled By: yinghai
fbshipit-source-id: 6c829f2f111a490fd2d85017475c1731cd97fb20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33992
Resubmit of https://github.com/pytorch/pytorch/pull/33369 with tweaks to when the RRef type is created, to ensure ivalue->type() holds the correct RRef type for the inner element type.
Test Plan: Imported from OSS
Differential Revision: D20175043
Pulled By: wanchaol
fbshipit-source-id: a08b178e989c995632374e6c868d23c5a85526ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33536
Simple fix, merge the identical string literals that were being inlined into every wrapper for ops that don't support named tensors. E.g.
```
Tensor all(const Tensor & self, int64_t dim, bool keepdim) {
if (self.has_names()) {
AT_ERROR(
"all is not yet supported with named tensors. Please drop names via "
"`tensor = tensor.rename(None)`, call the op with an unnamed tensor, "
"and set names on the result of the operation.");
}
const OptionalDeviceGuard device_guard(device_of(self));
return at::native::all(self, dim, keepdim);
}
```
becomes
```
Tensor all(const Tensor & self, int64_t dim, bool keepdim) {
if (self.has_names()) {
AT_ERROR("all", named_tensors_unsupported_error);
}
const OptionalDeviceGuard device_guard(device_of(self));
return at::native::all(self, dim, keepdim);
}
```
Also updated the generated file comments to include the source template names, e.g.
```
// generated by aten/src/ATen/gen.py from TypeDefault.cpp
```
Test Plan: Imported from OSS
Differential Revision: D19993407
Pulled By: bhosmer
fbshipit-source-id: 88395a649e6ba53191332344123555c217c5eb40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33975
Currently the code analysis script doesn't go beyond the scope of the
registration API call, i.e. calling registration via a wrapper will not
be covered by the analysis - currently the new API is essentially a
wrapper around the old API.
Simply adding the new API signature to the registration API pattern can
solve the problem for now. We might need to change the analyzer code if
things change significantly in the future.
Test Plan:
- update test project to use the new API;
- run analyzer against pytorch codebase;
Differential Revision: D20169549
Pulled By: ljk53
fbshipit-source-id: c7925fa0486eee18f07e791a38c32152fee59004
Summary:
Mainly renames C2's pthread_create, the only conflicting symbol referred to
internally by NNPACK, to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointing XNNPACK to the original repo instead of the fork.
Copy-pasted the new interface and implementation to
caffe2/utils/threadpool, so that for internal builds we compile against
this.
When the threadpool is unified this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869
Differential Revision: D20140580
Pulled By: kimishpatel
fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
Summary:
This bug has been hit a couple times recently. We need to handle all bivariant types, not just optional, when asserting mutability/immutability of pointed-to elements in alias analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33952
Differential Revision: D20166025
Pulled By: eellison
fbshipit-source-id: cf3df9897a639641ef8303a08ba2b13523d01ef1
Summary:
Fixes#30775
This adds TorchScript implementations (copied from `python_variable.cpp`) for the remaining `Tensor` properties that were missing from the JIT, in addition to a test that ensures new properties will trigger a failure so we can decide whether we want to add them as well.
For `some_tensor`, adds:
* `some_tensor.T`
* `some_tensor.ndim`
* `some_tensor.is_leaf`
* `some_tensor.name`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33906
Pulled By: driazati
Differential Revision: D20153288
fbshipit-source-id: 2ddc48a14267077bc176065267e5ce52181b3d6b
Summary:
This adds some machinery so that we use Python to resolve types to a value and the corresponding resolution logic in `annotations.py` instead of using the string.
This PR also marks a random test with `slowTests`, since it was taking > 1 min whereas all the other tests take < 10 seconds.
Fixes #31864, fixes #31950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29623
Pulled By: driazati
Differential Revision: D20144407
fbshipit-source-id: ef3699f6b86039d8b4646ffc42c21bd1132d1681
Summary:
This PR prepares us to allow XLA to use `XLAPreAutograd` to override compound ops.
To do this we'll need to pass all ops, with additional information about whether each is compound or not, for XLA to parse.
Companion PR: https://github.com/pytorch/xla/pull/1698
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33908
Differential Revision: D20149585
Pulled By: ailzhang
fbshipit-source-id: a93140e8a34548fcabcea454386d15df58177c1d
Summary:
With the profiling executor enabled the fuser won't be invoked until the second pass over a script function, so some of these tests weren't correctly comparing the fused output with the interpreter output. I've used the `checkScript` method where applicable, which seems to do the right thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33944
Test Plan: Locally inject obvious errors into the fuser and verify that the updated tests fail when they're supposed to.
Differential Revision: D20162320
Pulled By: bertmaher
fbshipit-source-id: 4a2f3f2d2ff1d81f23db504dc8cd0d5417bdcc50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33559
For sm_60+, CUDA supports the `atomicAdd(double*, double)` function, and for lower compute capabilities the CUDA C Programming Guide [1] suggests a user implementation as in this code. On the other hand, Clang's CUDA wrappers unconditionally define this function, regardless of compute capability, and emit an error if it actually gets used.
So the problem is: when Clang is used for < sm_60, CUDA's `atomicAdd(double*, double)` cannot be used and it cannot be redeclared in standard-compliant C++.
Work around the problem by using Clang's `enable_if` attribute [2], which has the side effect of allowing a function redeclaration.
1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions
2. https://clang.llvm.org/docs/AttributeReference.html#enable-if
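A sketch of the resulting pattern (CUDA device code, assuming the usual CAS-based emulation from the CUDA C Programming Guide; the exact guards in the tree may differ):
```cpp
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
// Clang's CUDA wrappers already declare atomicAdd(double*, double) for every
// architecture, so a plain redefinition would clash. The always-true enable_if
// attribute turns this into a distinct overload that takes precedence.
static inline __device__ double atomicAdd(double* address, double val)
#if defined(__clang__)
    __attribute__((enable_if(true, "")))
#endif
{
  unsigned long long int* address_as_ull = (unsigned long long int*)address;
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);
}
#endif
```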
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20005113
fbshipit-source-id: d0d4bd6514f201af9cdeba1229bd9b798df0d02e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33556
Fix several places exposed by Clang where the order of the member initializer list doesn't match the actual initialization order. The fix is to simply reorder the member initializer lists.
Also accepted formatting changes suggested by the clang-format linter.
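A minimal illustration of the kind of warning being fixed (hypothetical type, not the actual Caffe2 code): members are always initialized in declaration order, so an initializer list written in a different order is misleading and Clang flags it with -Wreorder.
```cpp
struct Pool {
  // Before: listed as size_(n), data_(new int[size_]) even though data_ is
  // declared (and therefore initialized) first, which would read an
  // uninitialized size_. After: the list matches the declaration order below.
  explicit Pool(int n) : data_(new int[n]), size_(n) {}
  ~Pool() { delete[] data_; }

  int* data_;
  int size_;
};

int main() {
  Pool p(8);
  return p.size_ == 8 ? 0 : 1;
}
```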
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20004834
fbshipit-source-id: b61c7c3f1fe8413bbb3512f6b62177a3ddf67682
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33947
XNNPACK was downloading clog because we weren't setting CLOG_SOURCE_DIR.
Actually, it was downloading cpuinfo and pointing to the copy of clog
within that. So let's just point to the copy of clog within the cpuinfo
submodule we already have.
(Note: this ignores all push blocking failures!)
Test Plan:
Ran cmake and didn't see any downloading.
Verified that our clog is the same as the one that was being downloaded
with `diff -Naur`.
Differential Revision: D20169656
Pulled By: suo
fbshipit-source-id: ba0f7d1535f702e504fbc4f0102e567f860db94b
Summary:
This PR comes from a discussion with albanD in https://fb.quip.com/npBHAXaPfnbu. The main goal is to clarify how view ops differ from general out-of-place/in-place ops and remind users about the difference.
For reference, this information is currently only available in code, which is internal and hard to find. Also, changes to this list actually affect users, so we think it's better to expose it as public information. It's also helpful for new backends like XLA when implementing PyTorch ops. 19bbb4fccb/tools/autograd/gen_autograd.py (L32-L68)
Please feel free to comment!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32560
Differential Revision: D20161069
Pulled By: ailzhang
fbshipit-source-id: b5f1fd4353fe7594a427784db288aeb5a37dc521
Summary:
This PR moves glu to ATen (CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time
torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()
device = "cpu"
#warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)
for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backwad avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backwad avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backwad avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backwad avg time is 1.03 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24707, https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179
Differential Revision: D19839835
Pulled By: VitalyFedyunin
fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32521
Not all ops support the templated unboxing wrappers yet. For the ones that don't,
let's use the codegen'ed unboxing wrappers from register_aten_ops.cpp, but register
them with c10 directly instead of JIT.
The `use_c10_dispatcher` setting in `native_functions.yaml` now has a new option 'with_codegenerated_unboxing_wrapper' which means we take the codegened unboxing wrapper from register_aten_ops.cpp and stuff it into c10. This new argument is the default, 'unboxed_only' is not the default anymore. For the (very few) ops that don't support boxed dispatch yet (i.e. ops taking TensorOptions arguments), we set them to 'unboxed_only' and they follow the old behavior of having register_aten_ops.cpp register the jit op.
Next steps here are (1) to make TensorOptions work with boxed dispatch and remove the `unboxed_only` option from `use_c10_dispatcher`, so that all ops go through the new path and (2) make the new path template-only and remove codegen from it (see https://github.com/pytorch/pytorch/issues/32366).
First experiments show that
- For a small JITted model that calls add (i.e. a op with just two arguments that are both tensors) on two tensors in a loop, we see a 2-4% performance improvement (~35-50ns) when compared to the old path. This is a simple op that takes two tensor arguments and no non-tensor arguments, so iterating over it in boxed dispatch is cheap.
- For a small JITted model that calls avgpool1d (i.e. an op that has one tensor arg and 5 non-tensor args) on a tensor in a loop, we see a 3-4% performance regression (~60ns) when compared to the old path. This is an op that takes only one tensor argument and then 6 non-tensor arguments. Unboxed dispatch doesn’t have to look at those but boxed dispatch still needs to iterate over them.
This performance difference is likely due to boxed dispatch iterating over all arguments in a loop and unboxed dispatch not having to look at non-tensor arguments.
ghstack-source-id: 99161484
Test Plan: unit tests that call existing ops through JIT
Differential Revision: D18672405
fbshipit-source-id: bf2a7056082dfad61e7e83e9eeff337060eb6944
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33732
move and forward instead of copy
Benchmarks:
A microbenchmark calling the add operation on two tensors in a tight loop shows a 5% improvement in performance.
No visible change for a model like resnet that does more work in its kernels.
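An illustrative C++ sketch of the general idea (not the actual dispatcher code): forward arguments through the call layer instead of copying them, so small ops are not dominated by argument copies.
```cpp
#include <string>
#include <utility>

// Perfect forwarding: lvalues are passed through as lvalues, rvalues are
// moved, and no extra copies are made inside this wrapper layer.
template <class F, class... Args>
decltype(auto) call_op(F&& f, Args&&... args) {
  return std::forward<F>(f)(std::forward<Args>(args)...);
}

int main() {
  auto concat = [](std::string a, std::string b) { return a + b; };
  std::string hello = "hello ";
  // The temporary "world" string is moved all the way into the lambda.
  std::string out = call_op(concat, hello, std::string("world"));
  return out.size() == 11 ? 0 : 1;
}
```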
ghstack-source-id: 99161486
Test Plan: benchmarks
Differential Revision: D20082642
fbshipit-source-id: eeac59686f8621dd5eaa85d61e6d219bba48c847
Summary:
Hard to get right locally... I can build the docs but never quite match what they look like live. The bullet point indentation was just an oversight.
Removing the `Returns:` formatting tabs because they take up a lot of space when rendered and add no clarity. Some functions in PyTorch [do use them](https://pytorch.org/docs/master/torch.html#torch.eye), but [many don't bother](https://pytorch.org/docs/master/torch.html#torch.is_tensor), so apparently some people shared my feelings (not using them is in line with existing practice).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33832
Differential Revision: D20135581
Pulled By: ngimel
fbshipit-source-id: bc788a7e57b142f95c4fa5baf3fe01f94c45abd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563
When NVCC or Clang are driving CUDA compilation many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC uses both `__host__` and `__device__`. This makes every un-elaborated `min` or `max` function call from a `__host__` function generate a syntax error when Clang is used.
Fix the errors by using `std::min` and `std::max` from `<algorithm>`; since C++14 they are `constexpr` and can be used in `__device__` code [1].
1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm
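A short sketch of the pattern (illustrative helper, not the actual kernel code); it compiles as plain C++ as well, and under Clang-driven CUDA the `std::` versions remain usable from device code because they are `constexpr`:
```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Usable from both host and device paths; the unqualified ::min/::max declared
// by the CUDA headers would be __device__-only under Clang and break host callers.
HOST_DEVICE inline int64_t clamp_index(int64_t i, int64_t lo, int64_t hi) {
  return std::max(lo, std::min(i, hi));
}

int main() {
  std::printf("%lld\n", static_cast<long long>(clamp_index(12, 0, 9)));  // prints 9
}
```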
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20005795
fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
Summary:
- Modified assertEqual to handle complex tensors
- added a test in test_torch.py to test torch.zeros
- added dispatch for complex for index_kernel, index_put_kernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33773
Differential Revision: D20135553
Pulled By: anjali411
fbshipit-source-id: f716604535c0447ecffa335b0fc843431397c988
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33273
- Move the check for bias to valueNeedsToBeQuantized
- Move TORCH_CHECK inside the functions for checking if a value is bias or weight
Test Plan:
.
Imported from OSS
Differential Revision: D20123595
fbshipit-source-id: 4b805d57dcaf41a6436506d021dd5f6518bc88fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33263
This PR allows PyRRef local creation to inspect the pyobject: if it
finds that the object can be turned into an IValue, it converts it to an IValue first;
otherwise it holds it as a PyObjectType.
Test Plan:
Imported from OSS
https://fb.quip.com/aGxRAh2lCg05
Differential Revision: D19871243
Pulled By: wanchaol
fbshipit-source-id: ae5be3c52fb1e6db33c64e64ef64bc8b9ea63a9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33576
A `throw` statement at the end of a `constexpr` function is ill-formed according to Clang. This happens when Clang is driving CUDA compilation and compiles the affected code for the device. Due to its compilation model, it requires host code to be well-formed even when compiling for the device.
Fix the error by guarding the entire definition of `type_index_impl` with a `__CUDA_ARCH__` check.
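A minimal sketch of the pattern with a hypothetical helper (the real change guards c10's `type_index_impl`): the throwing `constexpr` definition is simply compiled out of device passes, where it is never needed.
```cpp
#include <cstddef>
#include <stdexcept>

#if !defined(__CUDA_ARCH__)
// Host-only: Clang rejects the trailing throw when compiling this for the
// device, so the whole definition is hidden from __CUDA_ARCH__ builds.
constexpr size_t find_char(const char* s, char c, size_t i = 0) {
  return s[i] == c ? i
       : s[i] == '\0' ? throw std::logic_error("character not found")
       : find_char(s, c, i + 1);
}
static_assert(find_char("tensor", 's') == 3, "still usable at compile time");
#endif

int main() { return 0; }
```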
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: smessmer
Differential Revision: D20008881
fbshipit-source-id: b0dc9abf0dc308b8b8637b54646a0411baf7fef3
Summary:
The way it works on the Anaconda distribution of Python 3.8 is a bit different. Loading DLLs explicitly (e.g. `ctypes.CDLL`) relies on paths appended by `os.add_dll_directory`. But if you try to load DLLs implicitly (e.g. `from torch._C import *`), it will rely on `PATH`.
Fixes https://github.com/pytorch/vision/issues/1916.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33856
Differential Revision: D20150080
Pulled By: soumith
fbshipit-source-id: cdbe76c138ea259ef7414c6634d4f7e0b1871af3
Summary:
**Summary**
This commit adds an implementation of `Tensor.tolist()` to the JIT interpreter.
**Testing**
This commit adds several unit tests that test that this function works correctly for
0D, 1D, 2D and 3D tensors of type `float`, `int` and `bool`.
```
(base) meghanl-mbp:pytorch meghanl$ python test/test_jit.py TestList.test_to_list -v
Fail to import hypothesis in common_utils, tests are not derandomized
test_to_list (jit.test_list_dict.TestList)
Unit tests for Tensor.tolist() function. ... ok
----------------------------------------------------------------------
Ran 1 test in 0.329s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33472
Differential Revision: D20109738
Pulled By: SplitInfinity
fbshipit-source-id: a6e3fee5e3201d5e1f0c4ca45048488ae2bf5e33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33806
as title
Test Plan: Imported from OSS
Differential Revision: D20122117
Pulled By: suo
fbshipit-source-id: 209d29ed2c873181140c9fb5cdc305c200ce4008
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33885
Fixes: #32835, fixes: #5834
Cannot combine this with CUDA's implementation, as each of them requires an individual `std::once_flag` as well as a different `forked_autograd_child` function. The CUDA version relies on the Python module, while autograd uses TORCH_CHECK to report the error to both Python and C++.
Test Plan: Imported from OSS
Differential Revision: D20144024
Pulled By: VitalyFedyunin
fbshipit-source-id: e7cf30568fff5110e9df7fe5b23f18ed992fa17f
Summary:
In *_like functions we call
`globalLegacyTypeDispatch().initForDispatchKeySet(c10::detail::multi_dispatch_key_set(self, options));` -> `dispatchKeyToBackend`, hence this change.
`self` has both `XLAPreAutograd` and `XLATensorId` in its key set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33848
Differential Revision: D20135898
Pulled By: ailzhang
fbshipit-source-id: a8585f39f3fa77b53718f20d3144f4f2f3cb8e53
Summary:
Conda registers a suffixed slash as a new user so it was failing to
upload the anaconda packages.
In the future this should be handled through a single variable that can
be used for both but until then this will have to do.
Bug was introduced in https://github.com/pytorch/pytorch/issues/33842
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33903
Differential Revision: D20148679
Pulled By: seemethere
fbshipit-source-id: 27c95f5d906ce84aa34bf5d76fd6f1ef5df08fb9
Summary:
…/xla this will result in a failure since it is comparing an XLA tensor with a CPU tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33635
Differential Revision: D20043517
Pulled By: ailzhang
fbshipit-source-id: d84038ea675e4d4a9c02e7a8b0924bdb12f40501
Summary:
`.data` calls are unsafe and should not be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33874
Differential Revision: D20141059
Pulled By: izdeby
fbshipit-source-id: 8e11afc74f0cb04f5b18b458068fb813a6d51708
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.
```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN ] JitTest.ADFormulas
[ OK ] JitTest.ADFormulas (82 ms)
[ RUN ] JitTest.Attributes
[ OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN ] JitTest.LiteInterpreterPrim
[ OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN ] JitTest.LiteInterpreterLoadOrigJit
[ OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)
[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[ PASSED ] 75 tests.
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33595
Differential Revision: D20127441
Pulled By: SplitInfinity
fbshipit-source-id: 56da4f23ac46335227254f606c6481718108f378
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33173
How do we deal with ops that are defined for both floating point and quantized Tensors?
Category of ops: the ones that don't require observers, which means the quantization parameters (scale/zero_point) of the output of the op can be inferred from the quantization parameters of its inputs.
For example:
avg_pool, max_pool, flatten, transpose, upsample
A related topic is how we deal with things like adaptive_avg_pool2d that do not require observation and work with quantized tensors as well. If we insert quant/dequant for them, even the quant fusion becomes a numerically changing operation, because the scale/zero_point for input and output are different.
Proposal
We can swap the operator with dequantize whenever we see it. For example, for pattern
Let’s say aten::general_op is defined for both floating point and quantized
%r = aten::conv(...)
%q = quantize(%r)
%dq = dequantize(%q)
%f = aten::general_op(%dq)
...
When we detect that all inputs of aten::general_op are produced by dequantize, we'll first delete all the dequantize nodes for the inputs and then insert a dequantize for each use of the output of aten::general_op. Note that this should work generally for all the cases we might encounter.
After transformation we’ll have:
%r = aten::conv(...)
%q = quantize(%r)
%x = aten::general_op(%q)
%f = dequantize(%x)
...
1. Multiple inputs
    1. We need to make sure all inputs of the aten::general_op are produced by dequantize before we do this transformation
2. Input used by multiple operators
    1. We already did this by inserting dequantize for each use of the value
3. Output used by multiple operators
    1. We'll reuse the code that inserts dequantize (might need some refactoring)
Note that current concat does not belong to this category right now since it does not inherit quantization parameters from inputs.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20123590
fbshipit-source-id: de2febe1f37e4079457a23acaeccbc6d9c9e1f8a
Summary:
I've been using PyTorch with type hints, and I found errors that can be easily fixed, so I'm creating this PR to fix the type bugs.
I expected the code below to type-check without any errors.
```python
import torch
from torch.nn import Linear
from torch.autograd import Variable
from torch.optim import AdamW
from torch.utils import hooks
# nn.Module should have training attribute
module = Linear(10, 20)
module.training
# torch should have dtype bfloat16
tensor2 = torch.tensor([1,2,3], dtype=torch.bfloat16)
# torch.Tensor.cuda should accept int or str value
torch.randn(5).cuda(1)
torch.tensor(5).cuda('cuda:0')
# optimizer should have default attribute
module = Linear(10, 20)
print(AdamW(module.weight).default)
# torch.Tensor should have these boolean attributes
torch.tensor([1]).is_sparse
torch.tensor([1]).is_quantized
torch.tensor([1]).is_mkldnn
# Size class should tuple of int
a, b = torch.tensor([[1,2,3]]).size()
# check modules can be accessed
torch.nn.parallel
torch.autograd.profiler
torch.multiprocessing
torch.sparse
torch.onnx
torch.jit
torch.hub
torch.random
torch.distributions
torch.quantization
torch.__config__
torch.__future__
torch.ops
torch.classes
# Variable class's constructor should return Tensor
def fn_to_test_variable(t: torch.Tensor):
    return None
v = Variable(torch.tensor(1))
fn_to_test_variable(v)
# check RemovableHandle attributes can be accessed
handle = hooks.RemovableHandle({})
handle.id
handle.next_id
# check torch function hints
torch.is_grad_enabled()
```
But current master branch raises errors. (I checked with pyright)
```
$ pyright test.py
Searching for source files
Found 1 source file
test.py
12:45 - error: 'bfloat16' is not a known member of module
15:21 - error: Argument of type 'Literal[1]' cannot be assigned to parameter 'device' of type 'Optional[device]'
'int' is incompatible with 'device'
Cannot assign to 'None'
16:22 - error: Argument of type 'Literal['cuda:0']' cannot be assigned to parameter 'device' of type 'Optional[device]'
'str' is incompatible with 'device'
Cannot assign to 'None'
23:19 - error: Cannot access member 'is_sparse' for type 'Tensor'
Member 'is_sparse' is unknown
24:19 - error: Cannot access member 'is_quantized' for type 'Tensor'
Member 'is_quantized' is unknown
25:19 - error: Cannot access member 'is_mkldnn' for type 'Tensor'
Member 'is_mkldnn' is unknown
32:7 - error: 'autograd' is not a known member of module
33:7 - error: 'multiprocessing' is not a known member of module
34:7 - error: 'sparse' is not a known member of module
35:7 - error: 'onnx' is not a known member of module
36:7 - error: 'jit' is not a known member of module
37:7 - error: 'hub' is not a known member of module
38:7 - error: 'random' is not a known member of module
39:7 - error: 'distributions' is not a known member of module
40:7 - error: 'quantization' is not a known member of module
41:7 - error: '__config__' is not a known member of module
42:7 - error: '__future__' is not a known member of module
44:7 - error: 'ops' is not a known member of module
45:7 - error: 'classes' is not a known member of module
60:7 - error: 'is_grad_enabled' is not a known member of module
20 errors, 0 warnings
Completed in 1.436sec
```
And the list below is not flagged as errors, but I think these are errors too.
* `nn.Module.training` is not boolean
* return type of `torch.Tensor.size()` is `Tuple[Unknown]`.
---
related issues.
https://github.com/pytorch/pytorch/issues/23731, https://github.com/pytorch/pytorch/issues/32824, https://github.com/pytorch/pytorch/issues/31753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33762
Differential Revision: D20118884
Pulled By: albanD
fbshipit-source-id: 41557d66674a11b8e7503a48476d4cdd0f278eab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33603
This function returns a ScalarType based on its value. This is helpful
to avoid having the code generated in aten_op.h return Scalars that depend on
the `self` argument to determine their type.
Test Plan: Imported from OSS
Differential Revision: D20100218
Pulled By: ezyang
fbshipit-source-id: 337729a7559e6abb3a16b2a563a2b92aa96c7016
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33510
Previously, we would fill in TensorOption with defaults whenever an
item was missing from both the left and right side of the merge. This
is morally incorrect: if we don't have an item on the left or right,
we should keep the entry empty (so the downstream user can apply
the appropriate defaulting rule).
I don't think this caused any bugs, but I noticed this error when
working on a later patch in my diff stack.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20001775
Pulled By: ezyang
fbshipit-source-id: 88139fc268b488cd1834043584a0d73f46c8ecaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33505
This shouldn't change semantics, but it has the benefit of making
torch::empty_like(x, dtype(kFloat)) actually work (previously, this
would just ignore all of the properties from x).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20001776
Pulled By: ezyang
fbshipit-source-id: ba81186d3293abc65da6130b2684d42e9e675208
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32289
This has been fixed upstream as of Python 3.8.2. I think the easiest and least invasive way to ameliorate this is to catch the error condition and print a more informative error asking the user to update their Python version. It might be possible to buffer the data into memory and then read from memory, but that would be an invasive change and might cause memory exhaustion for very large models.
Suggestions for alternate fixes or ways to improve the error message wording are very welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33824
Differential Revision: D20131722
Pulled By: ezyang
fbshipit-source-id: a6e3fbf4bf7f9dcce5772b36f7a622cbf14b5ae4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33610
Our pybind definitions for several RPC functions didn't release the GIL
while we were processing things in C++.
This PR adds asserts that we release GIL appropriately and adds
py::gil_scoped_release and py::gil_scoped_acquire in the appropriate places.
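A minimal sketch of the pattern (illustrative names, not the actual RPC bindings): drop the GIL around blocking C++ work so other Python threads can make progress, and hold it again before touching Python objects.
```cpp
#include <pybind11/pybind11.h>
#include <chrono>
#include <string>
#include <thread>

namespace py = pybind11;

static std::string heavy_cpp_work() {
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  return "done";
}

static py::object sync_wait() {
  std::string result;
  {
    py::gil_scoped_release no_gil;  // other Python threads may run here
    result = heavy_cpp_work();      // blocking call; must not touch Python APIs
  }                                 // GIL re-acquired when the guard is destroyed
  return py::str(result);
}

PYBIND11_MODULE(gil_demo, m) {
  m.def("sync_wait", &sync_wait);
}
```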
ghstack-source-id: 99066749
Test Plan: waitforbuildbot
Differential Revision: D20025847
fbshipit-source-id: 57a778cba0336cf87352b07c89bbfb9254c4bdd7
Summary:
Stacked PRs
* **#33578 - [jit] Unify augmented assign handling**
* #32993 - [jit] Fix aug assign for non-tensor attributes
We handle augmented assignments to `Select` and `Var` statements differently, but the actual in place update is the same for both, so this PR factors it out into a method so we don't have 2 code paths doing the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33578
Pulled By: driazati
Differential Revision: D20127647
fbshipit-source-id: 94f37acbd2551498de9d2ca09a514508266f7d31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33711
Fixed #33480
This makes `dist_autograd.backward` and `dist_optimizer.step` functional by making the user explicitly pass in the `context_id` as opposed to relying on the confusing thread_local context_id.
This diff incorporates these API changes and all places where these functions are called.
More concretely, this code:
```
with dist_autograd.context():
    # Forward pass.
    dist_autograd.backward([loss.sum()])
    dist_optim.step()
```
should now be written as follows:
```
with dist_autograd.context() as context_id:
    # Forward pass.
    dist_autograd.backward(context_id, [loss.sum()])
    dist_optim.step(context_id)
```
Test Plan: Ensuring all existing dist_autograd and dist_optimizer tests pass with the new API. Also added a new test case for input checking.
Differential Revision: D20011710
fbshipit-source-id: 216e12207934a2a79c7223332b97c558d89d4d65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33531
We already insert a dequantize for each use of a value, but there might still be cases where we only
see that the value is used multiple times after inlining. This pass adds support for replicating dequantize
after inlining, to ensure the output of a dequantize is only used by one node, which is necessary to preserve all
quantization patterns like `dequant - conv - quant`.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20123591
fbshipit-source-id: 6edb10a4566538bcf9379d332233f870372b7a63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33779
This should eliminate random warnings and print spew from test_jit.
It also fixes a bug where we weren't properly comparing captured outputs
(!)
Test Plan: Imported from OSS
Differential Revision: D20124224
Pulled By: suo
fbshipit-source-id: 9241d21fdf9470531b0437427b28e325cdf08d3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33369
This PR adds an RRef type inference rule for when we try to infer a type from a
pyobject; this allows script module attributes to contain an RRef
(i.e. List[RRef] as a module attribute).
Test Plan: Imported from OSS
Differential Revision: D19918320
Pulled By: wanchaol
fbshipit-source-id: e5fd99c0ba5693b22ed48f0c0550b5e1dac89990
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33504
Fix resolution of functions that are bound onto torch in torch/functional.py. This does not fix compilation of all of those functions; those will be done in follow-ups. Does torch.stft as a start.
Fixes #21478
Test Plan: Imported from OSS
Differential Revision: D20014591
Pulled By: eellison
fbshipit-source-id: bb362f1b5479adbb890e72a54111ef716679d127
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29495
This PR adds support for `_modules`, making it so we no longer need to special-case support for `nn.Sequential`. I was getting internal errors with the previous approach using `self.define()`, so I am adding this PR as part of the stack.
Fix for https://github.com/pytorch/pytorch/issues/28998
Test Plan: Imported from OSS
Differential Revision: D18412561
Pulled By: eellison
fbshipit-source-id: a8b24ebee39638fccf63b2701f65f8bb0de84faa
Summary:
This sets up PIP_UPLOAD_FOLDER to point to the correct channel for
release candidates as opposed to nightlies.
Removes an old safety check that's not needed anymore for devtoolset3,
and provides a nice default for PIP_UPLOAD_FOLDER, which should clear up
confusion on where it's initially set.
This is a stepping stone towards the promotable pipeline.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33842
Differential Revision: D20130791
Pulled By: seemethere
fbshipit-source-id: dac94ef46299574c36c08c968dd36faddeae6363
Summary:
Port `masked_fill` from TH to ATen with TensorIterator.
Single-core performance roughly stays the same; single-socket performance gets a **3~16x** boost.
`masked_fill` is missing from https://github.com/pytorch/pytorch/issues/24507
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33330
Differential Revision: D20098812
Pulled By: VitalyFedyunin
fbshipit-source-id: ff20712ffc00cc665550997abcfdfb8916c18e40
Summary:
Print a complete and comprehensive error message with a description of the issue when an op is missing during ONNX export (previously an ambiguous "key not in registry" error was thrown which was not helpful for the user to understand the failure).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33593
Reviewed By: hl475
Differential Revision: D20052213
Pulled By: houseroad
fbshipit-source-id: ae3010a97efdab26effad5e4a418e9cc41f5b04e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33735
This apparently used to create a new storage, but I couldn't find anywhere in the code where this actually happens.
Changing it to an assert to see what happens.
Test Plan: Imported from OSS
Differential Revision: D20084029
Pulled By: gchanan
fbshipit-source-id: e9c4db115a25fc2e17a3b166c1ff5a0e6b56d690
Summary:
Stacked PRs
* **#33578 - [jit] Unify augmented assign handling**
* #32993 - [jit] Fix aug assign for non-tensor attributes
We handle augmented assignments to `Select` and `Var` statements differently, but the actual in place update is the same for both, so this PR factors it out into a method so we don't have 2 code paths doing the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33578
Pulled By: driazati
Differential Revision: D20010383
fbshipit-source-id: 52e559ce907e95e5c169ab9d9690d0d235db36f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30426
This PR adds `assert_tensor_equal` and `assert_tensor_not_equal` to `test/cpp/api/support.h`, as better functions for testing whether two tensors are equal / not equal.
Test Plan: Imported from OSS
Differential Revision: D18695900
Pulled By: yf225
fbshipit-source-id: c19b9bc4c4e84d9f444015023649d27618fcbdf5
Summary:
This might lead to silent undefined behaviour (e.g. with out-of-bound indices). This affects `test_multinomial_invalid_probs_cuda` which is now removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32719
Test Plan:
* Build with VERBOSE=1 and manually inspect `less ndebug.build.log | grep 'c++' | grep -v -- -DNDEBUG` (only with nina on Linux)
* CI
Fixes https://github.com/pytorch/pytorch/issues/22745
Differential Revision: D20104340
Pulled By: yf225
fbshipit-source-id: 2ebfd7ddae632258a36316999eeb5c968fb7642c
Summary:
Thanks to pjh5 for continued use of his account to upload binaries but I
think we can start using a bot account now for this.
Just a draft until we can ensure the env variables get injected correctly and the token can actually upload
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33786
Differential Revision: D20122423
Pulled By: seemethere
fbshipit-source-id: 0444584831a40ae730325d258935f6d1b873961b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23925
This fixes the incorrect gradients returned by `F.grid_sample` at image borders under `"border"` and `"reflection"` padding modes.
At nondifferentiable points, the choice of which gradient to return among its super- or subgradients is rather arbitrary and generally does not affect training. Before this change, however, a bug in the code meant that the gradient returned at the exact borders was not selected from among the super- or subgradients.
The gradient is now set to zero at the borders, which is a defensible choice for both the `"border"` and `"reflection"` padding modes:
* For `"border"` padding, this effectively means that the exact borders of the image are now considered out of bounds, and therefore receive zero gradient.
* For `"reflection"` padding, this effectively treats the exact borders as extrema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32829
Differential Revision: D20118564
Pulled By: soumith
fbshipit-source-id: ef8571ff585be35ab1b90a922af299f53ab9c095
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33765
quantize and dequantize methods now make use of multiple threads. This makes use of shz0116's recent parallelization of quantize/dequantize routines in FBGEMM.
Fixes:
https://github.com/pytorch/pytorch/issues/32006, https://github.com/pytorch/FBGEMM/issues/142
Alternative to https://github.com/pytorch/pytorch/pull/30153
```
#!/usr/bin/env python
import time
import torch
import torch.nn as nn
torch.set_num_threads(4)
# print(torch.__config__.parallel_info())
W = torch.rand(1, 54, 54, 256)
NITER = 1000
s = time.time()
for i in range(NITER):
    W_q = torch.quantize_per_tensor(W, scale=1.0, zero_point = 0, dtype=torch.quint8)
time_per_iter = (time.time() - s) / NITER
print('quantize time per iter ms', time_per_iter * 1000)
s = time.time()
for i in range(NITER):
    W_deq = W_q.dequantize()
time_per_iter = (time.time() - s) / NITER
print('dequantize time per iter ms', time_per_iter * 1000)
```
### With 1 thread
quantize time per iter ms 0.22633790969848633
dequantize time per iter ms 0.6573665142059326
### With 4 threads
quantize time per iter ms 0.0905618667602539
dequantize time per iter ms 0.19511842727661133
ghstack-source-id: 98935895
Test Plan: python test/test_quantized.py
Reviewed By: jspark1105
Differential Revision: D20098521
fbshipit-source-id: bd8c45761b4651fcd5b20b95759e3868a136c048
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33667
Pass shared_ptr properly according to C++ guidelines. Thanks to kimishpatel for pointing it out.
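An illustrative sketch of the guideline being applied (hypothetical types, not the actual mobile module code): take `shared_ptr` by value only where ownership is really shared, and move it into place instead of copying, avoiding an extra atomic refcount bump per call.
```cpp
#include <memory>
#include <utility>
#include <vector>

struct Method {
  // Taking by value + std::move: the caller decides whether to copy or move,
  // and the constructor itself adds no refcount traffic.
  explicit Method(std::shared_ptr<std::vector<int>> code)
      : code_(std::move(code)) {}
  std::shared_ptr<std::vector<int>> code_;
};

int main() {
  auto code = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
  Method m(std::move(code));  // hand off the caller's reference explicitly
  return m.code_->size() == 3 ? 0 : 1;
}
```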
Test Plan: Imported from OSS
Differential Revision: D20111001
Pulled By: iseeyuan
fbshipit-source-id: 213a0f950a7f3b9199d789dc0155911f6102d77a
Summary:
Also, windows memory failures responsible for the earlier reversion have been fixed.
This PR (initially) contains 2 commits:
* a revert of the revert
* all changes to implement the original Apex scale update heuristic, squashed into a single commit for easier diff review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33366
Differential Revision: D20099026
Pulled By: ngimel
fbshipit-source-id: 339b9b6bd5134bf055057492cd1eedb7e4461529
Summary:
Fixes an issue with `cdist` backward calculation for large inputs for the euclidean case.
The grid size when launching the kernel exceeded the 2^16 limit for the second dimension, resulting in `RuntimeError: CUDA error: invalid configuration argument`
Code to reproduce:
```
h, w, d = 800, 1216, 12
n = 133
A = torch.randn(n, d).cuda()
B = torch.randn(h, w, d).cuda()
A.requires_grad = True
B.requires_grad = True
B = B.reshape(-1, d).contiguous()
dist = torch.cdist(A, B)
loss = dist.sum()
loss.backward()
```
Thanks to tkerola for the bug report, reproduction and suggesting a solution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31167
Differential Revision: D20035605
Pulled By: ngimel
fbshipit-source-id: ae28ba4b549ee07a8bd937bb1de2438dc24eaa17
Summary:
Removed padding and dilation from the LPPool2d doc, as the function does not support padding and dilation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33714
Differential Revision: D20097021
Pulled By: ngimel
fbshipit-source-id: fc1c2d918b32f4b45c7e6e6bd93f018e867a628f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33749
Disable printing of the histogram when dumping, to make the log cleaner.
Test Plan: CI
Reviewed By: amylittleyang
Differential Revision: D20087735
fbshipit-source-id: 5421cd9d25c340d92f29ce63fed2a58aefef567d
Summary:
Most of the function implementation and test code are translated from the Python version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33652
Differential Revision: D20052211
Pulled By: yf225
fbshipit-source-id: ce6767db54364f91ef4f06674239a12278c2752a
Summary:
The function originally comes from 4279f99847/tensorflow/python/ops/summary_op_util.py (L45-L68)
As its comment says:
```
# In the past, the first argument to summary ops was a tag, which allowed
# arbitrary characters. Now we are changing the first argument to be the node
# name. This has a number of advantages (users of summary ops now can
# take advantage of the tf name scope system) but risks breaking existing
# usage, because a much smaller set of characters are allowed in node names.
# This function replaces all illegal characters with _s, and logs a warning.
# It also strips leading slashes from the name.
```
This function is only for compatibility with TF's operator name restrictions, and is therefore no longer valid in PyTorch. By removing it, TensorBoard summaries can use more characters in the names.
Before/after screenshots (omitted here) showed the rendered tag names in TensorBoard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33133
Differential Revision: D20089307
Pulled By: ezyang
fbshipit-source-id: 3552646dce1d5fa0bde7470f32d5376e67ec31c6
Summary:
CMake only views the first item of `CC` and `CXX` as executable. So calling `sccache.exe` directly won't work. Using a shim executable resolves this problem.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33745
Differential Revision: D20100397
Pulled By: soumith
fbshipit-source-id: 3a130d30dd548b7c2e726c064e66ae4fccb30c44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32813
We need to separate this step to make the logic clearer,
and also to find all the values we want to skip in advance,
without interference from the inserted observers.
Test Plan:
.
Imported from OSS
Differential Revision: D20087841
fbshipit-source-id: ec3654ca561c0d4e2c05011988bb9ecc8671c5c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33225
This removes a redundant assert statement in `record_function_ops`. In
the else branch in question, we are guaranteed to have `current == &rec`, so
this assert will never fire.
Although, maybe we should add an assert failure when `current == &rec` since it
seems that `current` should always be profiler::record_function_exit.
ghstack-source-id: 98852219
Test Plan: Existing autograd profiler UTs past
Differential Revision: D19849145
fbshipit-source-id: 2014a0d3b9d11e5b64942a54e0fb45e21f46cfa2
Summary:
**Summary**
This commit adds a script that fetches a platform-appropriate `clang-format` binary
from S3 for use during PyTorch development. The goal is for everyone to use the exact
same `clang-format` binary so that there are no formatting conflicts.
**Testing**
Ran the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33644
Differential Revision: D20076598
Pulled By: SplitInfinity
fbshipit-source-id: cd837076fd30e9c7a8280665c0d652a33b559047
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33431
Some elementwise operators don't have shape and type inference specified for the output tensor: `BitwiseOr`, `BitwiseAnd`, `BitwiseXor`, `Not`, `Sign`.
This change fixes this issue:
- For `Not` and `Sign` operators, the output has the same type and shape as the input, so `IdenticalTypeAndShapeOfInput` function is used to specify that.
- For bitwise operators created by `CAFFE2_SCHEMA_FOR_BINARY_BITWISE_OP` macro, the type and shape inference rules should be the same as for other binary element-wise operators, so `TensorInferenceFunction(ElementwiseOpShapeInference)` is used to specify that.
Also some tests were modified to ensure that the shape and type are inferred (`ensure_outputs_are_inferred` parameter)
Test Plan:
```
CAFFE2_ASSERT_SHAPEINFERENCE=1 buck test caffe2/caffe2/python/operator_test:elementwise_ops_test
CAFFE2_ASSERT_SHAPEINFERENCE=1 buck test caffe2/caffe2/python/operator_test:math_ops_test
```
Note that the tests have to be executed with `CAFFE2_ASSERT_SHAPEINFERENCE=1` in order to fail upon shape inference failure.
Reviewed By: idning
Differential Revision: D19880164
fbshipit-source-id: 5d7902e045d79e5669e5e98dfb13a39711294939
Summary:
Resolve https://github.com/pytorch/pytorch/issues/33699
`torch/__init__.pyi` will be generated like
```python
# TODO: One downside of doing it this way, is direct use of
# torch.tensor.Tensor doesn't get type annotations. Nobody
# should really do that, so maybe this is not so bad.
class Tensor:
    requires_grad: _bool = ...
    grad: Optional[Tensor] = ...
    # some methods here...
    @overload
    def bernoulli_(self, p: _float=0.5, *, generator: Generator=None) -> Tensor: ...
    def bfloat16(self) -> Tensor: ...
    def bincount(self, weights: Optional[Tensor]=None, minlength: _int=0) -> Tensor: ...
    # some methods here...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33747
Differential Revision: D20090316
Pulled By: ngimel
fbshipit-source-id: b9ce4c0d4ef720c94ccac0a0342a012e8cf3af0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33696
This changes two tests:
- The batchnorm inference cannot change the memory format of the weights as they are 1D. So this is removed.
- The batchnorm test now runs in both affine and non-affine mode.
- I added back the test for type errors using .data. In particular, `.data` allows to change the type of a Tensor inplace (very bad, never do it!) but since it is possible, we should test it until .data is removed.
cc Enealor who did the first version of the PR.
Test Plan: Imported from OSS
Differential Revision: D20069241
Pulled By: albanD
fbshipit-source-id: a0348f40c44df38d654fb2a2b2b526d9d42f598a
Summary:
The following script reproduces the hang
```py
import multiprocessing, logging
logger = multiprocessing.log_to_stderr()
logger.setLevel(multiprocessing.SUBDEBUG)
import torch
class Dataset:
    def __len__(self):
        return 23425
    def __getitem__(self, idx):
        return torch.randn(3, 128, 128), idx % 100
ds = Dataset()
trdl = torch.utils.data.DataLoader(ds, batch_size=64, num_workers=300, pin_memory=True, shuffle=True)
for e in range(1000):
    for ii, (x, y) in enumerate(trdl):
        print(f'tr {e: 5d} {ii: 5d} avg y={y.mean(dtype=torch.double).item()}')
        if ii % 2 == 0:
            print("="*200 + "BEFORE ERROR" + "="*200)
            1/0
```
The process will hang when joining the putting thread of `data_queue` in the **main process**. The root cause is that too many things are put into the queue from the **worker processes**, and the `put` at 062ac6b472/torch/utils/data/dataloader.py (L928) is blocked in a background thread. The `pin_memory_thread` exits because `pin_memory_thread_done_event` is set, without ever getting the `(None, None)`. Hence, the main process needs the same treatment as the workers get at
062ac6b472/torch/utils/data/_utils/worker.py (L198) .
After the patch, the script finishes correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33721
Differential Revision: D20089209
Pulled By: ezyang
fbshipit-source-id: e73fbfdd7631afe1ce5e1edd05dbdeb7b85ba961
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33419
These conditions apply only to the specialized implementation; the fallback implementation works without these checks, so use the fallback when any of the checks fails.
ghstack-source-id: 98836075
Test Plan: Previously got an error for the special case where k=0, which is now gone. The error was in some complicated autograd, and I'm not sure how and where a simple regression test should be added.
Differential Revision: D19941103
fbshipit-source-id: e1c85d1e75744b1c51ad9b71c7b3211af3c5bcc6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33727
Some small changes to adagrad (a tiny bit faster, though there is a more interesting diff on this later in the stack).
Test Plan: Part of the stack
Reviewed By: chocjy
Differential Revision: D20029499
fbshipit-source-id: 7f4fddb9288d7881ef54673b17a0e19ef10d64c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33537
For embeddings smaller than 128, we can get a bit more compute by
allocating fewer threads per block.
Test Plan: Unit-test, benchmark.
Reviewed By: xianjiec
Differential Revision: D19969594
fbshipit-source-id: 6cc6b14fc61302804bed9093ea3591f21e3827d8
Summary:
This PR adds the following items:
- **1st item**: `ArrayRef<TensorIndex>` and `std::initializer_list<TensorIndex>` overloads for `Tensor::index` and `Tensor::index_put_`, to be used specifically for multi-dim indexing purpose.
Design rationale:
* C++ `Tensor::index` and `Tensor::index_put_` are both existing tensor APIs, and they currently (before this PR) only accept a list of tensors (i.e. `ArrayRef<Tensor>`) as indices. If we change their signatures to also accept non-tensors as indices (i.e. `ArrayRef<TensorIndex>`, and `TensorIndex` is convertible from `Tensor` / `Slice` / `None` / `Ellipsis`), it would slow down the original code path (since now it has to go through more steps), which is undesirable.
To get around this problem, the proposed solution is to keep the original `ArrayRef<Tensor>` overload, and add `ArrayRef<TensorIndex>` and `std::initializer_list<TensorIndex>` overloads to `Tensor::index` and `Tensor::index_put_`. This way, the original code path won’t be affected, and the tensor multi-dim indexing API is only used when the user explicitly passes an `ArrayRef<TensorIndex>` or a braced-init-list of `TensorIndex`-convertible types to `Tensor::index` and `Tensor::index_put_`.
Note that the above proposed solution would still affect perf for the user’s original `Tensor::index` or `Tensor::index_put_` call sites that use a braced-init-list of tensors as input, e.g. `tensor.index({...})` or `tensor.index_put_({...}, value)`, since now such function calls would take the multi-dim indexing path instead of the original advanced indexing path. However, there are only two instances of this in our codebase (one in ATen cpp test, one in a C++ API nn init function), and they can be easily changed to explicitly use `ArrayRef<Tensor>` as input (I changed them in this PR). For external user’s code, since this is part of the C++ frontend which is still considered experimental, we will only talk about this change in the release note, and ask users to switch to using `ArrayRef<Tensor>` explicitly if they want to keep using the original advanced indexing code path.
- **2nd item**: Mechanisms for parsing `ArrayRef<TensorIndex>` indices and performing indexing operations (mirroring the functions in `torch/csrc/autograd/python_variable_indexing.cpp`).
- **3rd item**: Simple tests to demonstrate that the `Tensor::index()` and `Tensor::index_put_()` APIs work. I will add more tests after the first few PRs are reviewed.
- **4th item**: Merge Python/C++ indexing code paths, for code simplicity. I tested locally and found that there is no perf regression resulting from the merge. I will get more concrete numbers for common use cases when we settle on the overall design.
This PR supersedes https://github.com/pytorch/pytorch/pull/30425.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32841
Differential Revision: D19919692
Pulled By: yf225
fbshipit-source-id: 7467e64f97fc0e407624809dd183c95ea16b1482
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor one-time operations out of the innermost forward() loop.
The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.
This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509
Test Plan:
Build: CI
Functionality: Not exposed
Reviewed By: dreiss
Differential Revision: D20069796
Pulled By: AshkanAliabadi
fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32812
We'll error out inside the function for the case we can't handle,
instead of checking each time at the call site.
Test Plan:
.
Imported from OSS
Differential Revision: D20087846
fbshipit-source-id: ae6d33a94adf29c4df86d67783e7ef8753c91f90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32891
- Add JitDistAutoGradTest into fork/spawn test launcher
- Add JitRpcTest into fork/spawn test launcher
ghstack-source-id: 98900090
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_spawn
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_spawn
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork_thrift
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn_thrift
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork_thrift
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_spawn
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_spawn_thrift
```
Differential Revision: D5785394
fbshipit-source-id: 335a85424d22f1a83874be81a8139499c9a68ce2
Summary:
This PR improves performance of EmbeddingBag on cuda by removing 5 kernel launches (2 of those are synchronizing memcopies).
- 2 memcopies are checking that the values of offsets[0] and offsets[-1] are in the expected range (0 for the former, less than the number of indices for the latter). It seems strange to check only those 2 values: if users are providing invalid offsets, invalid values can be anywhere in the array, not only in the first and last elements. After this PR, the checks are skipped on cuda, the first value is forced to 0, and if the last value is larger than expected, the cuda kernel will assert. It is less nice than a ValueError, but then again, the kernel could have asserted anyway if other offset values were invalid. On the cpu, the checks are moved inside the cpu implementation from functional.py, and will throw RuntimeError instead of ValueError.
- 3 or 4 initializations (depending on the mode) of the output tensors with .zeros() are unnecessary, because every element of those tensors is written to, so their data can be uninitialized at the start.
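A small sketch of the documented CPU-side behavior after this change (illustration only; the exact error message is not quoted from the code):
```python
import torch

# Per the summary: on CPU the offsets checks now live inside the implementation and
# raise RuntimeError (previously ValueError from functional.py); on CUDA the first
# offset is forced to 0 and an out-of-range last offset triggers a kernel assert.
emb = torch.nn.EmbeddingBag(10, 3, mode="sum")
indices = torch.tensor([1, 2, 4, 5])
bad_offsets = torch.tensor([1, 2])  # invalid: offsets[0] != 0
try:
    emb(indices, bad_offsets)
except RuntimeError as e:
    print("rejected on CPU:", e)
```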
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33589
Reviewed By: jianyuh
Differential Revision: D20078011
Pulled By: ngimel
fbshipit-source-id: 2fb2e2080313af64adc5cf1b9fc6ffbdc6efaf16
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33359
Updated alias analysis kind to FROM_SCHEMA so input tensors can be marked as nonmutable
when appropriate, allowing for constant folding of these tensors.
Needed to update the schemas of the _out variants with annotations to mark the input tensor
that serves as the output as aliased and mutable.
Test Plan:
```
import torch
class M(torch.nn.Module):
def __init__(self):
super(M, self).__init__()
def forward(self, x):
w = torch.tensor([3], dtype=torch.float)
w = torch.quantize_per_tensor(w, 1.0, 0, torch.qint8)
y = torch.tensor([3], dtype=torch.float)
y = torch.quantize_per_tensor(w, 1.0, 0, torch.qint8)
return torch.ops.quantized.add_out(x, w, y)
m = torch.jit.script(M())
torch._C._jit_pass_constant_propagation(m.graph)
print(m.graph)
```
```
graph(%self : __torch__.___torch_mangle_9.M,
%x.1 : Tensor):
%11 : int = prim::Constant[value=12]() # <ipython-input-11-1dd94c30cb58>:9:49
%9 : float = prim::Constant[value=1.]() # <ipython-input-11-1dd94c30cb58>:9:41
%10 : int = prim::Constant[value=0]() # <ipython-input-11-1dd94c30cb58>:9:46
%36 : QInt8(1) = prim::Constant[value={3}]()
%y.2 : Tensor = aten::quantize_per_tensor(%36, %9, %10, %11) # <ipython-input-11-1dd94c30cb58>:11:12
%24 : Tensor = quantized::add_out(%x.1, %36, %y.2) # <ipython-input-11-1dd94c30cb58>:12:15
return (%24)
```
As expected, the aten::quantize_per_tensor() for w is now folded. The aten::quantize_per_tensor()
for y is not folded, since that tensor is aliased/modified.
Differential Revision: D19910667
fbshipit-source-id: 127071909573151dc664500d363399e3643441b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32809
This is a refactor to help further changes to quantization.cpp
We want some operations on the graph happen before we call insertObserver for invoked methods,
especially `addIntermediateValuesToSkipObserver` since we want to skip the input of the ReLU
module in `Conv - ReLU` pattern.
Test Plan:
test_jit.py
test_quantization.py
Imported from OSS
Differential Revision: D20087844
fbshipit-source-id: 28b7fa0c7ce9e254ab9208eb344893fb705e14d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33464
I added a python-exposed knob to register this pass in the custom passes pipeline. If the knob is not used, the pass is not registered and thus not run at all.
Differential Revision: D19958217
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: fecdd98567fcda069fbdf8995c796899a3dbfa5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33682
Previously, there were two APIs for CPU and CUDA. This change keeps one top-level API, i.e. `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`, and uses the device type to dispatch to different backends (CPU and CUDA).
CPU kernel implementation is in QuantizedOpKernels.cpp
CUDA kernel implementation is in fake_quantize_core.cu
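As a quick illustration of the unified dispatch (a sketch; the commented CUDA line assumes a CUDA device is available):
```python
import torch

x = torch.randn(2, 3)
# One top-level op; the kernel (CPU vs CUDA) is picked from the input's device.
# Arguments are: input, scale, zero_point, quant_min, quant_max.
y_cpu = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, -128, 127)
# y_cuda = torch.fake_quantize_per_tensor_affine(x.cuda(), 0.1, 0, -128, 127)
print(y_cpu)
```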
Test Plan:
python test/test_fake_quant.py
Benchmark Results for CPU
FakeQuantize tensor of size (2, 256, 128, 128)
Before:
per tensor quant ms 9.905877113342285
per channel quant ms 74.93825674057007
After:
per tensor quant ms 6.028120517730713
per channel quant ms 44.91588592529297
Imported from OSS
Differential Revision: D20072656
fbshipit-source-id: 0424f763775f88b93380a452e3d6dd0c90cb814b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32379
Folding Conv2d - BatchNorm2d modules means recalculating the weight and bias of the Conv2d module by incorporating the parameters
of BatchNorm2d, and changing the method calls to call only the forward of the Conv2d module. This involves changes to both the module
types and the graph, because the bias of Conv2d is a parameter when it has a value and an attribute when it is
None (since the JIT code assumes in multiple places that parameters are Tensors). Therefore
we'll need to remove the bias attribute when it is None and add a bias attribute later. Since a ClassType might be shared, we separate
the remove and the add into separate steps and also keep track of the processed graphs to avoid modifying the graph and type multiple times.
However, we also have to record the slot index of the bias so we can replay the slot removal on other instances of the Conv2d module.
Test Plan:
tbd
Imported from OSS
Differential Revision: D20078719
fbshipit-source-id: cee5cf3764f3e0c0a4a2a167b78dbada2e3835cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33695
I'm not sure how this stuck around, but it has no effect.
Test Plan: Imported from OSS
Differential Revision: D20068867
Pulled By: gchanan
fbshipit-source-id: 79191338a8bc7a195e2b7265005ca6f00aab3818
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33626
For DDP we require the attributes to be registered as buffers. By doing this the value is broadcast from one device to the rest.
Test Plan:
Tested on actual model on GPU
Imported from OSS
Differential Revision: D20038839
fbshipit-source-id: 82e829fc3baca0b3262c3894a283c375eb08a4a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33205
A number of important use-cases are implemented:
- def(schema): defines a schema, with no implementation (alias
inferred from schema, by default)
- def(schema, fn_ptr): registers fn_ptr as a catch-all kernel
for the operation
- def(schema, lambda): registers lambda as a catch-all kernel
for the operation
- def(schema, torch::dispatch(dispatch_key, fn)), and
def(schema, torch::dispatch(device_type, fn)): registers
the function to only be executed when dispatch_key/device_type
is selected for use
- def(schema, TORCH_OPTIMIZED_FN(fn)): registers the function
as unboxed only, using the inline syntax
All of our code generated registrations in ATen are switched to
the new API.
Some aspects of the API which are not fully implemented:
- It's still not valid to omit the schema when registering a function
pointer, due to #32549
- Although it's possible to take advantage of top-level namespaces
ala torch::import("aten"), we don't use it because this results
in worse code (as we have to cat everything back together). This
is not an essential problem, we just need the internals to be less
stupid.
There are some aspects of the API which don't semantically make sense,
but I chose not to fix them in this PR:
- For some reason, TORCH_OPTIMIZED_FN uses the *runtime* wrapper to
do wrapping, rather than the compile time one which inlines the
function in. This means that there isn't any reason we should be
passing in the function pointer as a template argument; a regular
old argument ought to have worked fine. This is seemingly
consistent with the current API though; needs further investigation.
- There's no reason to use optional<DispatchKey>; DispatchKey would
work just fine (use DispatchKey::Undefined for the nullopt case)
In the long term, we should swap the wrapper around: the new-style
API has the real implementation, and the old-style API is backwards
compatibility. However, this implies a lot of internal refactoring,
so I decided to short circuit around it to get this in faster
Ancillary changes:
- I stopped moving optional<DispatchKey>, it's literally just two
words, pass it by value please.
- Needed to add a & qualified version of RegisterOps::op, since
I'm storing RegisterOps as a member inside the new style
Namespace and I cannot conveniently get a rvalue reference
to it in that situation. (BTW, register_ = std::move(register_)
really doesn't work, don't try it!)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19856626
Pulled By: ezyang
fbshipit-source-id: 104de24b33fdfdde9447c104853479b305cbca9a
Summary: Used by segmentation model.
Test Plan: Ran segmentation model on mobile.
Reviewed By: iseeyuan
Differential Revision: D19881378
fbshipit-source-id: 87f00058050fd173fbff1e88987ce09007622b83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32959
In the RPC TorchScript call path we need to pickle/unpickle RRefs; this diff makes the JIT pickler/unpickler able to pickle/unpickle RRefs. It is similar to what is implemented for PyRef::pickle() and PyRef::unpickle().
The pickling/unpickling design assumes it is always coupled with RPC calls. It is not meant for checkpointing a model with an RRef; before checkpointing the model, the user should call rref.to_here() to get the value inside the RRef.
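A minimal sketch of the recommended pattern from the note above (assumes `init_rpc` has already been called and a peer named "worker1" exists):
```python
import torch
import torch.distributed.rpc as rpc

# Holding an RRef is not checkpointable state: fetch the value first, then save it.
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
value = rref.to_here()          # materialize the value locally
torch.save(value, "ckpt.pt")    # checkpoint the plain tensor, not the RRef
```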
The pickling process is:
1. push the torch.distributed.rpc.rref global string
2. call rref.fork() and create rrefForkData, which is a few IDs plus the type str of the value held inside the rref; the IDs include the rref id, fork id, caller worker id, callee worker id, and owner worker id
3. push the rrefForkData
The unpickling process is:
1. read the torch.distributed.rpc.rref global string, and retrieve the cached global lambda function
2. the global lambda function will get the rrefForkData
3. if the callee is also the owner worker, then get the owner rref based on the IDs inside the rrefForkData and return the ownerRRef
4. if the callee is not the owner worker, then create a user rref using the rrefForkData and return the userRRef
5. meanwhile the owner rref will be notified and do reference counting correctly
During unpickling, a type_resolver is needed to parse the type str. This type_resolver has a python dependency, so we get it from the rpc_agent and pass it to the unpickler during construction. So we added a type_resolver argument to the jit unpickler constructor in this diff.
ghstack-source-id: 98814793
Test Plan: unit test
Differential Revision: D19713293
fbshipit-source-id: 4fd776cdd4ce8f457c4034d79acdfb4cd095c52e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33570
In this PR, we are a bit more careful about avoiding zero-ing the output. Analysis as follows:
1) `mm` doesn't need zero_ because it never calls scal, which is the underlying problem.
2) for `mv`, which does call scal (in certain cases), we can just move the zeroing to where it would actually be a problem, namely when the scalar value is 0.
In this case we just run the non-BLAS version of the code.
Test Plan: Imported from OSS
Differential Revision: D20007665
Pulled By: gchanan
fbshipit-source-id: 1f3a56954501aa9b2940d2f4b35095b2f60089a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31666
List of changes:
1) Fix a case where torch.mv was not handling NaNs correctly. In particular, with a transposed tensor and expanded vector, NaNs in the output are kept, even if beta = 0.
This is handled in the `out=` case by zero-ing out the passed-in Tensor, but this can happen just the same with the non-out variant if the allocated tensor happens to have a NaN.
Also adds tests for this case.
NOTE: we zero out the output tensor in all cases for mv and mm, even though this is probably overkill. I didn't find another case where this would be a problem, but the old code at least
attempted to do this for all mv and mm calls and I didn't add comprehensive testing to be sure that it's not a problem.
2) on CPU: move mv, mv_out, mm, mm_out to be direct wrappers on _th_addmv, _th_addmm, rather than having their own wrappers in Declarations.cwrap.
This is to remove the magic around cpu_zero from the codegen, which simplifies the codegen and makes testing this easier.
Test Plan: Imported from OSS
Differential Revision: D19239953
Pulled By: gchanan
fbshipit-source-id: 27d0748d215ad46d17a8684696d88f4cfd8a917e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33645
Fix bug where we were trying to get a schema for prim::Constant, which is not registered as an operator.
ghstack-source-id: 98785729
Test Plan: buck test mode/dev //pytext/models/test:scripted_seq2seq_generator_test -- 'test_generator \(pytext\.models\.test\.scripted_seq2seq_generator_test\.ScriptedSeq2SeqGeneratorTest\)'
Differential Revision: D20050833
fbshipit-source-id: cc38510b0135b750fdf57fb9c1e66ce1d91ee128
Summary:
The current logic for vectorized/unrolled operations in CUDALoops.cuh applies bounds checking to loads and stores, [but not to the actual functor's execution](16d6c17845/aten/src/ATen/native/cuda/CUDALoops.cuh (L264)). In other words, for a block acting on the tail of a tensor that doesn't require the whole block to participate in memory transactions, many threads execute their functor on uninitialized data. For functors that only communicate with the outside world via the bounds-checked loads and stores, that's ok. The threads acting on garbage data never actually write their results. But [my proposed inf/nan checking kernel](https://github.com/pytorch/pytorch/pull/33366/files#diff-9701a2b34900195d160bdc234e001b79R70-R79) has the additional side effect of writing to a `found_inf` flag in global memory. For irregularly-shaped tensors where tail threads execute the functor on garbage data, these threads would sometimes see and report spurious infs/nans.
In general, we can't guarantee functors won't have side effects. For safety (and efficiency) we should apply bounds checking to the functor execution as well as the loads and stores.
Is it possible that other elementwise kernels (in addition to the strided/vectorized implementation) are also executing functors unconditionally? That would cause similar failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33642
Differential Revision: D20062985
Pulled By: ngimel
fbshipit-source-id: 65b8d75a001ce57865ed1c0cf89105d33f3f4dd4
Summary:
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding **native** implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor one-time operations out of the innermost forward() loop.
The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.
This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509
Reviewed By: dreiss
Differential Revision: D19521853
Pulled By: AshkanAliabadi
fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa
Summary:
Removes almost every usage of `.data` in test_torch to address part of https://github.com/pytorch/pytorch/issues/33629.
Lines 4706-4710 had to be refactored to allow this. The changed test is fundamentally the same, as it appears to be meant to confirm that using an input of a different type than the weight causes an appropriate error.
There is one remaining usage of `.data`, and it is on line 5132. This was left as the `set_` and `resize_` methods still mention `.data` explicitly. I figure the right time to remove this is when those methods have their runtime errors updated.
Note: ~~some tests are skipped locally, and so I am still verifying that nothing has been obviously broken.~~ Appears to be passing early tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33638
Differential Revision: D20062288
Pulled By: albanD
fbshipit-source-id: 672a6d7a20007baedb114a20bf1ddcf6c4c0a16a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33666
It's caused by a revert, so let's skip it.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D20057382
fbshipit-source-id: d71af8efe68b31befcef5dddc372540e8a8ae2ac
Summary:
The same header `<torch/nn/functional/conv.h>` is included twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33656
Differential Revision: D20056913
Pulled By: yf225
fbshipit-source-id: b1563035c9821731b99c26eec130ff0b9cc627a7
Summary:
Addresses https://github.com/pytorch/pytorch/issues/33300.
Calling .numpy() on a CUDA or non-strided (e.g. sparse) tensor segfaults in current PyTorch. This fixes the segfaults and throws the appropriate TypeError, as was intended.
Two tests, one in test_cuda.py and the other in test_sparse.py, are added to verify the behavior.
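A minimal sketch of the fixed behavior (assumes a CUDA device is available):
```python
import torch

t = torch.zeros(3, device="cuda")
try:
    t.numpy()                 # previously segfaulted; now raises TypeError
except TypeError as e:
    print(e)                  # inspect the error instead of crashing
```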
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33612
Differential Revision: D20038210
Pulled By: mruberry
fbshipit-source-id: 265531dacd37c392232fd3ec763489a62ef54795
Summary: Skip the test to unblock dper fbpkg push
Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu -- 'DBSeekTest\.RocksDB' --run-disabled
Reviewed By: cheshen1
Differential Revision: D20043418
fbshipit-source-id: 05ceb2cea08722a671fa211d73680fd4b78f354c
Summary:
this adds enough infrastructure to run bailout checks in `checkScript`. I'll need to figure out the best way to enable it for nightly builds now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32802
Differential Revision: D19974718
Pulled By: Krovatkin
fbshipit-source-id: 40485503f6d3ae14edcce98e1eec1f0559f3ad08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33632
* `inline_container.h` was unnecessarily exposing all includers to caffe2 headers via `caffe2/core/logging.h`
* Add msvc version of hiding unused warnings.
* Make sure clang on windows does not use msvc pragmas.
* Don't redefine math macro.
Test Plan: CI green
Differential Revision: D20017046
fbshipit-source-id: 230a9743eb88aee08d0a4833680ec2f01b7ab1e9
Summary: The first run of the net is noisy sometimes - just run it twice.
Reviewed By: cheshen1
Differential Revision: D20039274
fbshipit-source-id: 639e65646bf52f3efe1ecd4bbcd0e413d9389b29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33529
The current version goes through a GPU -> CPU -> GPU copy and is pretty slow: ~19 ms
for 1M elements with 20 possible buckets based on the benchmark.
This new version is ~0.2 ms on the same benchmark.
Test Plan: benchmark + unit-test
Reviewed By: chocjy
Differential Revision: D19969518
fbshipit-source-id: 51889bc9a232b6d45d9533e53b7b7f4531da481f
Summary:
The detection of the env variable ONNX_ML has been properly handled in tools/setup_helpers/cmake.py,
line 242.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33424
Differential Revision: D20043991
Pulled By: ezyang
fbshipit-source-id: 91d1d49a5a12f719e67d9507cc203c8a40992f03
Summary:
…have different argument types"
This reverts commit 05fb160048b71c1b8b00d2083a08618318158c1a.
Please go to https://github.com/pytorch/pytorch/pull/33558 and check the CUDA9 on CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33553
Differential Revision: D20017575
Pulled By: ngimel
fbshipit-source-id: a5fd78eea00c7b0925ab21fd90a7daeb66725f1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33097
Previously, we had to specify full types because the functions we were registering
might be overloaded, and the type was necessary to resolve the ambiguity. I
disambiguate all of these names by mangling the names of the methods we
place on CPUType/CUDAType/TypeDefault with the overload name (these are
*internal* wrappers which are not user visible), and then can strip
the generation of full function types from the registration.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19837898
Pulled By: ezyang
fbshipit-source-id: 5f557184f6ec84cb0613d4eb2e33b83fd1712090
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33093
In #30187 the aliasAnalysis field on operator registration was updated
so that alias analysis could be specified in only some registration call
sites, rather than requiring it be consistently specified in all call
sites. With this change, we can eliminate the requirement that all
registrations specify aliasAnalysis; as long as we know *one* site
specifies the correct aliasAnalysis, we don't have to specify it at
any of the other sites.
In this patch, the "one site" is TypeDefault.cpp (previously we only
generated these stub declarations for manually registered functions,
but now we generate the stubs for everything). Then I delete aliasAnalysis
anywhere we register an op for an existing function (which is a lot
of places).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19837897
Pulled By: ezyang
fbshipit-source-id: 26a7fbc809ec1553da89ea5c0361f3e81526d4c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33216
All tensor expressions belong to a kernel arena and are freed when the
arena is destroyed. Until it is destroyed, all expressions stay valid.
Test Plan: Imported from OSS
Differential Revision: D19848382
Pulled By: ZolotukhinM
fbshipit-source-id: a581ea2b635b9ba2cc53949616a13d8d3a47caae
Summary:
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for hipify module to be able to be used by a torch extension
cc: ezyang iotamudelta jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32669
Differential Revision: D20033893
Pulled By: zou3519
fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
Summary:
this is a follow up PR to https://github.com/pytorch/pytorch/issues/33602:
torch/nn/utils/rnn.html:
`pack_padded_sequence` has a confusing and incomplete description of the `enforce_sorted` param. Currently it goes:
```
enforce_sorted (bool, optional): if ``True``, the input is expected to
contain sequences sorted by length in a decreasing order. If
``False``, this condition is not checked. Default: ``True``.
```
The second part "this condition is not checked" (1) makes no sense, since the alluded-to condition is not described, and (2) is incomplete, as it doesn't reflect the important part: that it actually does the sorting. I think it should say something like:
```
enforce_sorted (bool, optional): if ``True``, the input is expected to
contain sequences sorted by length in a decreasing order. If
``False``, the input will get sorted unconditionally. Default: ``True``.
```
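For context, a small usage sketch of the behavior the reworded text tries to convey (not part of the doc change itself):
```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Batch deliberately NOT sorted by length: with enforce_sorted=False it is accepted
# and sorted internally (the original order is restored when unpacking).
seqs = [torch.randn(2, 4), torch.randn(5, 4), torch.randn(3, 4)]
padded = pad_sequence(seqs, batch_first=True)   # shape (3, 5, 4)
lengths = torch.tensor([2, 5, 3])
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
```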
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33617
Differential Revision: D20035131
Pulled By: albanD
fbshipit-source-id: 654382eb0cb62b5abc78497faa5b4bca42db5fda
Summary:
This adds `__torch_function__` support for all functions in `torch.functional` and `torch.nn.functional`.
The changes to C++ code and codegen scripts are to facilitate adding `__torch_function__` support for the native functions in `torch._C._nn`. Note that I moved the `handle_torch_function` C++ function to a header that both `python_torch_functions.cpp` and `python_nn_functions.cpp` include. The changes to `python_nn_functions.cpp` mirror the changes I made to `python_torch_functions.cpp` when `__torch_function__` support was first added in https://github.com/pytorch/pytorch/issues/27064. Due to the somewhat different way the `torch._C` and `torch._C._nn` namespaces are initialized I needed to create a new static reference to the `torch._C._nn` namespace (`THPNNVariableFunctions`). I'm not sure if that is the best way to do this. In principle I could import these namespaces in each kernel and avoid the global variable but that would have a runtime cost.
I added `__torch_function__` support to the Python functions in `torch.nn.functional` following the approach in https://github.com/pytorch/pytorch/issues/32194.
I re-enabled the test that checks if all functions in the `torch` namespace are explicitly tested for `__torch_function__` support. I also generalized the check to work for `torch.functional` and `torch.nn.functional` as well. This test was explicitly disabled in https://github.com/pytorch/pytorch/issues/30730 and I'm happy to disable it again if you think that's appropriate. I figured now was as good a time as any to try to re-enable it.
Finally I adjusted the existing torch API tests to suppress deprecation warnings and add keyword arguments used by some of the code in `torch.nn.functional` that were missed when I originally added the tests in https://github.com/pytorch/pytorch/issues/27064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32799
Differential Revision: D19956809
Pulled By: ezyang
fbshipit-source-id: 40d34e0109cc4b9f3ef62f409d2d35a1d84e3d22
Summary:
This is generating a considerable number of warnings, due to the fact
that the header file is included in multiple places.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33524
Differential Revision: D20006604
Pulled By: ezyang
fbshipit-source-id: 0885cd2a708679ba5eeabb172366eb4c5a3bbef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33572
This reverts commit 687a7e4a2566861c53c8fb53a80b198465168b38.
Original PR #33305
Reland with BC tests whitelisted. See https://github.com/pytorch/pytorch/issues/33580 for reasoning why this change is not actually BC breaking.
Test Plan: Imported from OSS
Differential Revision: D20011011
Pulled By: ezyang
fbshipit-source-id: 116374efc93af12b8ad738a0989d6f0daa9569e2
Summary:
IIUC Python does not guarantee when an object is garbage collected. So it is possible that some other test running before `TestCuda.test_memory_stats` creates an object which is only garbage collected during `TestCuda.test_memory_stats`, causing mem stats to change and causing this test to fail. This kind of failure is very hard to debug (it took me and mcarilli and ptrblck quite a while to figure out what is happening), and it is the root cause of mcarilli's gradient scaling PR https://github.com/pytorch/pytorch/pull/26512 failing on Windows.
cc: csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33575
Differential Revision: D20009260
Pulled By: ngimel
fbshipit-source-id: 62f2716aefac3aa6c7d1898aa8a78e6b8aa3075a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33517
I don't think any mobile model uses SparseCPU backend yet so we can skip
generating dispatch code for this backend type.
This will help reduce mobile code size with dynamic dispatch turned on,
roughly ~100K for uncompressed iOS: D19616007 +413K v.s. D19616016 +319K.
It probably doesn't affect the static dispatch build size much, as the unused
static dispatch methods will be stripped by the linker in the end.
ghstack-source-id: 98615810
Test Plan: - CI & BuildSizeBot
Reviewed By: linbinyu
Differential Revision: D19978633
fbshipit-source-id: 27bf6ada2ba98482084cf23724cf400b538b0a03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33557
We should add GIL asserts in some places to keep assumptions documented.
This just adds one in an exception codepath as a placeholder for more.
This change also moves a #define from a .h to the .cpp to reduce scope.
ghstack-source-id: 98673532
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D20005387
fbshipit-source-id: b7eff54a6f1dd69d199f8ca05cdb3001c50b37c4
Summary:
The `not inline_everything` check was causing the jitter check to be skipped whenever we emitted a function. Thanks SplitInfinity for pointing this out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33468
Differential Revision: D19975934
Pulled By: eellison
fbshipit-source-id: 03faf8d2fd93f148100d8cf49cb67b8e15cf1f04
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32863, (together with https://github.com/pytorch/pytorch/issues/33310 for the `TensorIterator` reductions)
This adds 64-bit indexed kernels for `THC_reduceDimIndex` and uses `THCTensor_canUse32BitIndexMath` to switch between the two at runtime.
I have a test for this locally but haven't included it here because `max` is much slower than `argmax`, to the point where the test takes several minutes to call max on just one `2**32`-element tensor. That seems excessive, even for a slow test, but I can push it if preferred.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33405
Differential Revision: D20010769
Pulled By: ezyang
fbshipit-source-id: a8a86f662598d5fade4d90448436418422c699a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33574
Sprinkle the Clang identification macro over the places that would otherwise cause build errors when Clang is used to drive the CUDA compilation.
Note: `__clang__` is defined when either Clang is used as host compiler by NVCC or when Clang drives the compilation. `__CUDA__` is defined only for the latter case.
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Reviewed By: BIT-silence
Differential Revision: D20007440
fbshipit-source-id: 53caa70695b99461a3910d41dc71a9f6d0728a75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33555
A quick fix for the PyText model (in internal production) on the new bytecode format.
Test Plan: Imported from OSS
Differential Revision: D20008266
Pulled By: iseeyuan
fbshipit-source-id: 1916bd0bf41093898713c567c7f6fa546b9ea440
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33554
NVCC/GCC accepts the existing syntax, but not Clang, which requires a proper escape. Here `%laneid` is one of the many registers that CUDA's pseudo-asm provides [1]. Using the extra `%` doesn't change the semantics, as PTX expects the `%laneid` value after it's processed by the asm tool.
1. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Reviewed By: bddppq
Differential Revision: D20003621
fbshipit-source-id: 8e550e55a3455925e7bd92c6df3e504b5d38c2dc
Summary:
We need to run a peephole pass before constant propagation in the profiling pipeline, so that we fold `prim::shape` for inputs with complete tensor types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33337
Differential Revision: D19905624
Pulled By: Krovatkin
fbshipit-source-id: 80fff067941556053847ddc7afe0fd1c7a89a3ba
Summary:
Changelog:
- Add a check to ensure that all inputs to `where` lie on the same device
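A minimal sketch of what the new check rejects (assumes a CUDA device is available):
```python
import torch

cond = torch.tensor([True, False])
a = torch.zeros(2)                  # CPU tensor
b = torch.ones(2, device="cuda")    # CUDA tensor
try:
    torch.where(cond, a, b)         # mixed-device inputs now raise a RuntimeError
except RuntimeError as e:
    print("rejected:", e)
```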
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33432
Test Plan:
- Added test_where_invalid_device
Fixes https://github.com/pytorch/pytorch/issues/33422
Differential Revision: D19981115
Pulled By: VitalyFedyunin
fbshipit-source-id: 745896927edb53f61f3dd48ba9e1e6cd10d35434
Summary:
Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present.
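A small sketch of the effect of the added validation (the exact error message wording is not quoted from the code):
```python
import torch

params = [torch.nn.Parameter(torch.randn(2))]
try:
    torch.optim.Adam(params, lr=1e-3, weight_decay=-0.1)  # negative weight_decay
except ValueError as e:
    print("rejected:", e)  # with this change, Adam/AdamW validate weight_decay too
```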
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126
Differential Revision: D19860366
Pulled By: vincentqb
fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32990
Right now a remote TorchScript call cannot target the calling worker itself; this diff supports that in the same way it is supported for remote Python calls to self.
ghstack-source-id: 98599082
Test Plan: unit test
Differential Revision: D19731910
fbshipit-source-id: 6495db68c3eaa58812aa0c5c1e72e8b6057dc5c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33347
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19975410
Pulled By: ezyang
fbshipit-source-id: eb729870c2d279d7d9ca43c92e514fe38dedb06d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33305
The current TensorOptions code is written to exactly extract out
TensorOptions based on exact struct match, including default arguments.
That meant that tril_indices/triu_indices which had a different
default argument didn't match, and thus needed a special case.
I resolve this special case by instead replacing the explicit long
default argument with a None default argument, and then adjusting
the actual implementations to select the correct dtype when none
was specified. I think the general rule I'm following here is that
it is always acceptable to replace an explicit default argument,
with a None argument (assuming the backend will compute it appropriately);
the documentation gets modestly worse, but everything that was
previously expressible continues to be expressible. Maybe later
we should switch the default argument back to long, but for now
the simplification in code is worth it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19975411
Pulled By: ezyang
fbshipit-source-id: 996598759bed9e8d54fe61e19354ad038ed0e852
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33426
Make 2/4/8-bit fused rowwise conversion operators more general to work for N-dim tensors
Test Plan: CI
Reviewed By: ellie-wen
Differential Revision: D19943136
fbshipit-source-id: 47008544dd7e1d11a346d34f35449e0fcc0e7ee0
Summary:
We want to run the ONNX checker only when the selected operator export type is ONNX, and nowhere else. This PR updates the logic in the exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33522
Reviewed By: hl475
Differential Revision: D19983954
Pulled By: houseroad
fbshipit-source-id: 15db726321637a96fa110051cc54e9833e201133
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33523
When using `ThreadPool::setNumThreads` to set the number of threads, it should not exceed the number of big cores. Otherwise, the performance could degrade significantly.
Test Plan:
```
cd ~/fbsource/xplat
buck test caffe2:caffe2_testAndroid
```
Reviewed By: dreiss
Differential Revision: D19779267
fbshipit-source-id: 4e980e8a0ccc2f37e1c8ed16e2f4651d72924dbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33434
Reland of https://github.com/pytorch/pytorch/pull/33325, since the
unit test was flaky and failed on land.
To ensure that the test is not flaky, I bumped the timeout so the rendezvous
does not timeout (timing out the rendezvous in 1s led to the flakiness). I also
generalized our mechanism for retrying on errors to include retrying on errors
due to timeout in rendezvous.
ghstack-source-id: 98558377
Test Plan: Added UT test_tcp_store_timeout_set
Differential Revision: D19935390
fbshipit-source-id: 56ccf8c333dd2f954a33614d35cd1642d4e9473a
Summary:
Since the tensor iterator supports broadcasting, we will just remove the assertion on input shapes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30442
Differential Revision: D19976562
Pulled By: lly-zero-one
fbshipit-source-id: 91b27fc8b2570f29d110c6df26eacdd16f587b9f
Summary:
The quantizer uses std::vector to store per_channel scales and zero_points, but when querying scales (or zero_points) it has to return tensors. This leads to initializing tensors from std::vector, which costs a lot of time. So I changed the quantizer to store per_channel scales and zero_points as tensors directly.
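A small usage sketch of the accessors affected (the speedup itself is internal to the quantizer):
```python
import torch

x = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2, 0.3])
zero_points = torch.zeros(3, dtype=torch.int64)
q = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)
# With this change the quantizer stores these as tensors, so the queries below
# no longer have to build tensors from std::vector on every call.
print(q.q_per_channel_scales())
print(q.q_per_channel_zero_points())
```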
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31040
Differential Revision: D19701070
Pulled By: jerryzh168
fbshipit-source-id: 9043f16c44b74dd8289b8474e540171765a7f92a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33365
This adds functionality for retrying RPCs that are sent with the function sendWithRetries(). It adds RPCs that will potentially need to be retried to a sorted map that contains the timeout at which to retry the RPC and associated metadata. A separate thread iteratively removes the earliest retry-able RPC from the map, sleeps until the corresponding time point, retries the RPC, and adds it to the map again with a future timeout.
GitHub Issue: https://github.com/pytorch/pytorch/issues/32124
Per the first 4 milestones, the following will be addressed in future PR's:
* enabling RPC Retries for RRef internal messages
Differential Revision: D19915694
fbshipit-source-id: 4a520e32d5084ebcf90e97fd9f26867115a35c0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33508
Ever since we switched to not inlining by default, some users have
complained, since they relied on inlining occurring to, e.g., process the
graph with some other tool. Add an inlined_graph for convenience in
those cases.
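A small sketch of the convenience being added (method calls appear inlined only in the second print):
```python
import torch

class M(torch.nn.Module):
    def helper(self, x):
        return x + 1

    def forward(self, x):
        return self.helper(x) * 2

m = torch.jit.script(M())
print(m.graph)           # helper shows up as a call, since inlining is off by default
print(m.inlined_graph)   # the new convenience property: same graph with calls inlined
```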
Test Plan: Imported from OSS
Differential Revision: D19977638
Pulled By: suo
fbshipit-source-id: fe1fa92ff888959203d5d1995930d488b5f9e24c
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33250
As Title says. FBGEMM has recently added the support for Windows.
ghstack-source-id: 97932881
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D19738268
fbshipit-source-id: e7f3c91f033018f6355edeaf6003bd2803119df4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33515
Previously, if we had a `ModuleDict` with the same value types but
different names for keys, they would share types under certain
conditions. This only happens for `ModuleDict`, because in other cases
a simple Python class check invalidates the class.
Test Plan: Imported from OSS
Differential Revision: D19978552
Pulled By: suo
fbshipit-source-id: f31b2af490064f89b70aa35f83ba740ddaf2a77a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32839
As mentioned in the updated comment in `variable.h`, this disambiguates code like:
```python
base = torch.rand(10, requires_grad=True)
with torch.no_grad():
view = base[1]
view.copy_(var)
torch.autograd.grad(base.sum(), var) # <- what should it return?
```
Given that there is no consensus on what should happen here (does the gradient flow through the view created in the no_grad block or not?), this special case is detected and forbidden.
As mentioned in the error message:
- If you want it to be tracked: move both out of the no_grad
- If you do not want them to be tracked: move both inside the no_grad
This implies that any custom Function that returns views does not allow inplace modification on its output. I'll add a PR to the stack to relax this to be a DeprecationWarning for now. And we will make it into an actual error for 1.6
This replaces https://github.com/pytorch/pytorch/pull/26607
cc sublee
Test Plan: Imported from OSS
Differential Revision: D19814114
Pulled By: albanD
fbshipit-source-id: ff2c9d97c8f876d9c31773a2170e37b06d88bed7
Summary:
This fixes https://github.com/pytorch/pytorch/issues/33001.
When subtracting 1 from the size of an empty array, because `size()` is unsigned, the result becomes a very large number instead of the `-1` that the later code (a while loop) seems to expect. This causes a segfault during the while loop later in the code, where it tries to access the empty array.
This issue seemed to happen only on the Pi with the following example code: `v = torch.FloatTensor(1, 135).fill_(0); v[0, [1]] += 2`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33456
Differential Revision: D19963711
Pulled By: ezyang
fbshipit-source-id: 1dbddd59a5df544cd7e025fc540c9efe2c4e19f4
Summary:
This was old code that isn't tested and is broken, it should have been
deleted in #24874
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33453
Pulled By: driazati
Differential Revision: D19961403
fbshipit-source-id: 94c52360460194d279dad5b0ea756ee366f525e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32880
The PR below made it impossible to construct a SourceRange without a
context, so get rid of its optional-ness
Test Plan: Imported from OSS
Differential Revision: D19670923
Pulled By: suo
fbshipit-source-id: 05936fca2a3d5e613313ade9287b2210bc4a3ccd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32879
An error report without a SourceRange context is bad, because it doesn't
tell the user where something happened. Delete the default constructor
to make it harder to create errors like this (you can still use a fake
SourceRange if you absolutely need to).
Also clean up the only case where the default constructor was used.
Test Plan: Imported from OSS
Differential Revision: D19670924
Pulled By: suo
fbshipit-source-id: 46888a86e5d32b84c8d6d52c0c8d70243722b14a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33440
The constructors make a copy without `std::move` in the initializer list.
Test Plan:
Confirmed manually that without this change, the `data()` pointer of
the vector changes. With this change it does not, as intended.
Reviewed By: mrshenli
Differential Revision: D19948685
fbshipit-source-id: ee4f22e29894b858ad86068722dc2f4651987517
Summary:
There are large models such as GPT2-large which cannot be exported with the current exporter because of the 2GB protobuf limit (e.g. see https://github.com/pytorch/pytorch/issues/19277). ONNX spec specifies a special format for large (> 2GB) models. This PR adds support for exporting large models in ONNX large model format in the PyTorch-ONNX exporter.
This is the first PR for this feature that enables the end-to-end execution. Tests for large model export have been added. We may need follow-up PRs to refine this workflow based on user feedback.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33062
Reviewed By: hl475
Differential Revision: D19782292
Pulled By: houseroad
fbshipit-source-id: e972fcb066065cae6336aa91c03023d9c41c88bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32885
Currently a Tensor bias is registered as a parameter and a None bias is registered as an attribute.
We need the type annotation because when we try to fold ConvBn in graph mode quantization we'll
remove the None bias attribute and add a Tensor bias attribute. Without the type annotation, the
bias Value in the graph would be marked with a different type in these two cases, so we would have to rewrite the
graph to change the type as well in that case. But with the type annotation we don't need to modify the graph,
since in both cases the bias value will have type `Tensor?`.
Test Plan:
.
Imported from OSS
Differential Revision: D19844710
fbshipit-source-id: 52438bc72e481ab78560533467f9379a8b0b0cfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33265
This removes the need for isinstance to keep track of list and tuple
separately by introducing AnyListType and AnyTupleType into the JIT
type system to be the common supertype of any lists or tuples.
This allows us to remove the weird flags from the interpreter for
the isinstance operator.
Test Plan: Imported from OSS
Differential Revision: D19883933
Pulled By: zdevito
fbshipit-source-id: f998041b42d8b4554c5b99f4d95d1d42553c4d81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32889
Common primitive ops that have special inputs make it very hard to
serialize the bytecode for mobile because information about how the
op behaves is hidden in the Node*. This changes how we handle the following
ops so that they are encoded as their own interpreter bytecodes.
```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```
This leaves a state where the _only_ remaining Node*-consuming builtins
are things that are only introduced during JIT optimization and will
not appear in mobile code.
Serialization of bytecode can now be made to directly write the CodeImpl
object without modification.
Test Plan: Imported from OSS
Differential Revision: D19673157
Pulled By: zdevito
fbshipit-source-id: 7b8c633d38a4c783b250fbdb222705e71a83ad26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32804
Constants are interpreter primitives so the op was not actually used.
This cleans up some of the logic around it.
This also fixes constant prop such that failures to look up an op
do not silently stop constant propagation. Instead, only errors
inside the op implementation itself will do this.
Test Plan: Imported from OSS
Differential Revision: D19673156
Pulled By: zdevito
fbshipit-source-id: 7beee59a6a67a6c2f8261d86bd505280fefa999e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32791
When a registered operator has varags (ends with ... in its schema),
the interpreter now appends the number of arguments to the top of
the stack before invoking the operator. This allows the removal of more
uses of Node* in the interpreter.
This PR also then cleans up the constructors for Operator to make
it more likely someone chooses the correct one. After making these ops:
```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```
Into interpreter primitives, we can remove all but two constructors for operators:
one that is (schema_string, operation), and one that is (symbol, op_creator) for
the remaining weird primitives.
Test Plan: Imported from OSS
Differential Revision: D19673158
Pulled By: zdevito
fbshipit-source-id: 95442a001538a6f53c1db4a210f8557ef118de66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33368
reorganizing files that describe sources to ensure the same list is used for both fbcode and ovrsource targets. (BUCK vs TARGETS)
Test Plan: CI green
Reviewed By: malfet
Differential Revision: D19803036
fbshipit-source-id: 69c1fa10877c3f0c0e9c1517784949c3c9939710
Summary:
Closes https://github.com/pytorch/pytorch/issues/30027
The idea here is that you can bind a function with `pybind11` in a single line and without modifying the function:
```cpp
m.def("foo", foo, py::call_guard<torch::PyWarningHandler>());
```
Where warnings are handled by the [`call_guard`](https://pybind11.readthedocs.io/en/stable/advanced/functions.html#call-guard) and exceptions are handled by the `pybind11` exception translator. To do this, I have added support for handling C++ exceptions in `torch::PyWarningHandler`'s destructor without setting the Python error state beforehand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30588
Differential Revision: D19905626
Pulled By: albanD
fbshipit-source-id: 90c0a5e298b123cc0c8ab9c52c91be4e96ea47c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33358
We just translate this code to ATen.
Test Plan: Imported from OSS
Differential Revision: D19911114
Pulled By: gchanan
fbshipit-source-id: 2279e63bb7006f7253620417937e3ce9301e0cdb
Summary:
## problem
```python
class LambdaLR(_LRScheduler):
"""Sets the learning rate of each parameter group to the initial lr
times a given function. When last_epoch=-1, sets initial lr as lr.
Args:
optimizer (Optimizer): Wrapped optimizer.
lr_lambda (function or list): A function which computes a multiplicative
factor given an integer parameter epoch, or a list of such
functions, one for each group in optimizer.param_groups.
last_epoch (int): The index of last epoch. Default: -1.
Example:
>>> # Assuming optimizer has two groups.
>>> lambda1 = lambda epoch: epoch // 30
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>> train(...)
>>> validate(...)
>>> scheduler.step()
"""
```
`LambdaLR` takes a lambda that takes an int and returns a float, or a list of such lambdas.
## related issue
Resolve https://github.com/pytorch/pytorch/issues/32645
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33271
Differential Revision: D19878665
Pulled By: vincentqb
fbshipit-source-id: 50b16caea13de5a3cbd187e688369f33500499d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33346
Fixes #33091
This PR lets users control the number of workers that cpp extensions
use through the environment variable `MAX_JOBS`. If the environment
variable is a non-negative integer we use that many threads; otherwise,
ninja falls back to the default.
I chose to use the name `MAX_JOBS` because we use it in PyTorch already
to control the number of workers PyTorch builds with. There is a risk
that users of cpp extensions already have `MAX_JOBS` set, but we are
hoping that the risk is small and/or that it means semantically the same
thing.
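A minimal sketch of the intended usage (the extension name and source file are hypothetical):
```python
import os
# Must be set before the extension build is triggered.
os.environ["MAX_JOBS"] = "4"

from torch.utils.cpp_extension import load

# Builds the (hypothetical) my_ext.cpp with at most 4 parallel ninja jobs; if
# MAX_JOBS is unset or not a non-negative integer, ninja uses its default.
my_ext = load(name="my_ext", sources=["my_ext.cpp"], verbose=True)
```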
Test Plan: - tested locally
Differential Revision: D19911645
Pulled By: zou3519
fbshipit-source-id: d20ed42de4f845499ed38f1a1c73e9ccb620f780
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33008
Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed.
**Details**:
The current implementation disallows GroupNorm to be applied to tensors of shape e.g. `(1, C, 1, 1)` to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros.
However, in GroupNorm the statistics are calculated across channels. So in the case where one has an input tensor of shape `(1, 256, 1, 1)` for `GroupNorm(32, 256)`, the statistics will be computed over 8 elements and thus be meaningful.
One use case is [Atrous Spatial Pyramid Pooling (ASPPPooling)](791c172a33/torchvision/models/segmentation/deeplabv3.py (L50)), where GroupNorm could be used in place of BatchNorm [here](791c172a33/torchvision/models/segmentation/deeplabv3.py (L55)). However, this is currently prohibited and results in failures.
The proposed solution consists of correcting the computation of the number of elements over which statistics are computed. The number of elements per group is taken into account in the batch size.
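A minimal sketch of the case this change is meant to allow:
```python
import torch
import torch.nn as nn

# Statistics are computed over C / num_groups = 256 / 32 = 8 elements per
# group, so the result is meaningful even though the spatial size is 1x1.
gn = nn.GroupNorm(32, 256)
x = torch.randn(1, 256, 1, 1)
print(gn(x).shape)  # torch.Size([1, 256, 1, 1]); previously rejected
```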
Test Plan: check that existing tests pass
Reviewed By: fmassa
Differential Revision: D19723407
fbshipit-source-id: c85c244c832e6592e9aedb279d0acc867eef8f0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33395
By default the GPU fuser stays enabled, but this function allows users to
manually disable it. It will be useful for working on other
implementations of the fuser.
Test Plan: Imported from OSS
Differential Revision: D19926911
Pulled By: ZolotukhinM
fbshipit-source-id: 7ea9d1dd7821453d640f81c487b63e1d585123c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33027
This PR allows default arguments in module's forward method to be skipped when module is used in `torch::nn::Sequential`, by introducing the `FORWARD_HAS_DEFAULT_ARGS` macro and requiring that all modules that have default arguments in its forward method must have a corresponding `FORWARD_HAS_DEFAULT_ARGS` macro call.
Fixes issue mentioned in https://github.com/pytorch/pytorch/issues/30931#issuecomment-564144468.
Test Plan: Imported from OSS
Differential Revision: D19777815
Pulled By: yf225
fbshipit-source-id: 73282fcf63377530063e0092a9d84b6c139d2e32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33026
This PR contains necessary changes to prepare for https://github.com/pytorch/pytorch/pull/33027. It exposes the following classes to public:
1. `torch::nn::AnyValue`, because if the user has optional arguments in their module's forward method, they must also use the `FORWARD_HAS_DEFAULT_ARGS` macro and pass in the default values for those optional arguments wrapped by `torch::nn::AnyValue`.
2. `torch::nn::AnyModuleHolder`, because `torch::nn::Module` needs to declare it as a friend class for it to be able to access `torch::nn::Module`'s protected methods such as `_forward_has_default_args` / `_forward_num_required_args` / `_forward_populate_default_args`.
Test Plan: Imported from OSS
Differential Revision: D19777814
Pulled By: yf225
fbshipit-source-id: 1c9d5aa24f0689154752c426a83ee98f64c9d02f
Summary:
Although `gpu_kernel_with_index` might look like a quite general helper function at first glance, it actually isn't.
The problem is not only 32-bit indexing, but something more fundamental: `TensorIterator` reorders dims and shapes, so if you have a non-contiguous tensor such as `torch.empty(5, 5).t()`, the index won't be correct. Since the whole point of `TensorIterator` is to manipulate shapes/strides to speed up loops, it is fundamentally impossible to get the correct linear index without a great deal of effort.
The only reason the range factories currently do not fail on an `out=non_contiguous_tensor` is that `has_internal_overlap` happens to classify everything non-contiguous as `TOO_HARD`.
Since `gpu_kernel_with_index` is not general, we should move it from `Loops.cuh` to `RangeFactories.cu`. And since the kernel is so simple to implement, it makes no sense to use `TensorIterator`, which goes through tons of unnecessary checks like `compute_dtypes`.
`torch.range` is not tested for 64-bit indexing, and I will file a new PR to remove it (it was supposed to be removed in 0.5).
Benchmark:
The device is GTX-1650, I don't have a good GPU at home.
Code:
```python
import torch
print(torch.__version__)
for i in range(100):
torch.randn(1000, device='cuda')
torch.cuda.synchronize()
for i in range(15, 29):
%timeit torch.arange(2 ** i, device='cuda'); torch.cuda.synchronize()
```
Before:
```
1.5.0a0+c37a9b8
11.9 µs ± 412 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.7 µs ± 309 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
19.6 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
28.9 µs ± 923 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
48.4 µs ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
85.7 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
162 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
312 µs ± 9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
618 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.22 ms ± 9.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.45 ms ± 97.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.9 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.1 ms ± 378 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
After:
```
1.5.0a0+7960d19
11 µs ± 29.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.4 µs ± 550 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.4 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
27.6 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
46.2 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
83.3 µs ± 5.61 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
158 µs ± 373 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
307 µs ± 1.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
603 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.2 ms ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.4 ms ± 23.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.77 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.51 ms ± 933 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33370
Differential Revision: D19925990
Pulled By: ngimel
fbshipit-source-id: f4a732fe14a5582b35a56618941120d62e82fdce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33147
The log mentioned that it is aborting communicators even if
`blockingWait_` was false. This was incorrect, and I updated the logging to
reflect the appropriate behavior.
ghstack-source-id: 98025017
Test Plan: waitforbuildbot
Differential Revision: D19817967
fbshipit-source-id: fb3415af2cc99eb20981ceaa5203c0a1880fd6f3
Summary:
Add quant_scheme_generator that will be used to interface with dper.
Also updated two related functions:
- Add batch_size option to save_local_dataset() in dataset utils to be more flexible.
Test Plan:
Tested in the stacked diff D19747206.
buck test deeplearning/numeric_suite/toolkit/test:int8_static_utils_test
Reviewed By: csummersea
Differential Revision: D19745159
fbshipit-source-id: a4ac1ef0ffdddc68bdf5e209ae801b8c475d0b96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32974
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/286
Re-attempt of D18805426. Decided to be consistent with PyTorch Adagrad.
There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad. This diff makes them consistent by doing w += lr * grad / (sqrt(moment) + epsilon) in Adagrad and w += lr / (sqrt(moment) + epsilon) * grad in RowWiseSparseAdagrad.
The Adagrad order is consistent with PyTorch (see aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp addcmul_cpu_kernel function). The RowWiseSparseAdagrad order is to make the compute more efficient: in RowWiseSparseAdagrad, lr / (sqrt(moment) + epsilon) is shared among all elements in the row.
Also, we're not going to use FMA, to be consistent with PyTorch (even though it provides a small accuracy benefit).
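For illustration, a minimal sketch (with arbitrary random inputs) of why the order of operations matters in floating point:
```python
import numpy as np

rng = np.random.default_rng(0)
lr, eps = np.float32(0.01), np.float32(1e-5)
grads = rng.random(1000, dtype=np.float32)
moments = rng.random(1000, dtype=np.float32)

a = lr * grads / (np.sqrt(moments) + eps)   # Adagrad order
b = lr / (np.sqrt(moments) + eps) * grads   # RowWiseSparseAdagrad order

# The two orderings round differently for some elements, which is why the
# scalar and SIMD paths must agree on one of them.
print(np.count_nonzero(a != b), "of 1000 elements differ by rounding")
```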
Test Plan: CI
Reviewed By: wx1988
Differential Revision: D19342865
fbshipit-source-id: e950c16f2e1c4a2f2a3ef53b1705db373c67f341
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33325
Closes https://github.com/pytorch/pytorch/issues/32924. There was a bug where for TCPStore, we would not respect the timeout passed into `init_process_group` while constructing the TCPStore. Instead, we'd set the timeout after the rendezvous created the store, meaning that we used the default timeout of 300s while connecting to the server. This diff passes the timeout passed into `init_process_group` to rendezvous so that it can be passed into the constructor for TCPStore, so that we can use the right timeout at construction time.
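A minimal sketch of the user-facing side of this change (single-process group, env:// rendezvous):
```python
import os
from datetime import timedelta
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# With this change, the timeout below is also honored while the TCPStore is
# being constructed, instead of only after rendezvous completes.
dist.init_process_group(
    backend="gloo",
    rank=0,
    world_size=1,
    timeout=timedelta(seconds=30),
)
dist.destroy_process_group()
```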
Question: Should we make this change for FileStore as well? Currently the FileStore constructor does not take in a timeout at all.
ghstack-source-id: 98401875
Test Plan: Added a UT
Differential Revision: D19871946
fbshipit-source-id: dd002180c4c883216645b8a97cc472c6116ac117
Summary: in dper2, local net is hard-coded by whitelisting some layers. Add SparseFeatureGating related layers to local net explicitly.
Test Plan:
* workflow: f167812211
* QRT: fall back looks normal
{F228442018}
Differential Revision: D19852280
fbshipit-source-id: 6fecc3d745c3f742d029575a7b9fe320618f1863
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33387
CI is broken. Skip two functions to fix the problem.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D19926249
fbshipit-source-id: a46d1465c59de8616d2af5fb0b9cc18532359f88
Summary:
Fixes the `TensorIterator` parts of https://github.com/pytorch/pytorch/issues/32863 (THC is still broken)
`TensorIterator::split` now keeps track of the `view_offsets` into the full tensor range. With this, I can take the base offset for the reduced dimension and translate partial results from the sub-iter into the index range of the full tensor. This happens only once for each intermediate result, so we should still benefit from the performance of 32-bit indexing in loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33310
Differential Revision: D19906136
Pulled By: ngimel
fbshipit-source-id: 3372ee4b8d5b115a53be79aeafc52e80ff9c490b
Summary:
Globally define
```C++
constexpr int num_threads = C10_WARP_SIZE * 2;
constexpr int thread_work_size = 4;
constexpr int block_work_size = thread_work_size * num_threads;
```
and kill all the template arguments passing these values.
These are effectively global, but we are now passing them around as template arguments, causing a lot of inconvenience in coding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33308
Differential Revision: D19907250
Pulled By: ngimel
fbshipit-source-id: 4623b69baea7e6e77f460ffdfa07cf9f8cba588a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32962
As per gchanan's comments on
https://github.com/pytorch/pytorch/pull/30445, I've used
`torch.set_default_dtype` in test_data_parallel instead of specifying
dtype=torch.double everywhere. Also, renamed dtype2prec to dtype2prec_DONTUSE
ghstack-source-id: 98388429
Test Plan: waitforbuildbot
Differential Revision: D19714374
fbshipit-source-id: eb55bbca33881625636ba9ea6dd4cb692f25668e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33261
It was requested in #33114.
Test Plan: Imported from OSS
Differential Revision: D19910600
Pulled By: ZolotukhinM
fbshipit-source-id: 827f1744b97f386065a21d1ba5d82c1f90edbe46
Summary:
docker cp was erroring out, so let's just use volume mounts instead, which
should hopefully be more consistent.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33355
Differential Revision: D19913948
Pulled By: seemethere
fbshipit-source-id: 059ddd36a8162f946cfea451b5dcd1706f1209e9
Summary:
Basically just fills out PYTORCH_BUILD_VERSION to the correct version
based on the git tag.
This makes it so that we don't have to continually edit this file
when doing releases.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33326
Differential Revision: D19911035
Pulled By: seemethere
fbshipit-source-id: e27105f3e193a49dd68452d8f60232f8a132acad
Summary:
This PR renames `at::Tensor::base()` to `at::Tensor::_base()`, to achieve parity with Python `torch.Tensor._base` API.
----
This PR is BC-breaking in the following way:
Previously, to get the tensor that this tensor is a view of, the user would call `tensor.base()` in C++. Now, they must call `tensor._base()`.
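For reference, a minimal sketch of the Python-side `_base` behavior that the C++ name now mirrors:
```python
import torch

t = torch.arange(6)
v = t.view(2, 3)
print(v._base is t)  # True: v is a view of t
print(t._base)       # None: t is not a view
```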
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33316
Differential Revision: D19905687
Pulled By: yf225
fbshipit-source-id: 949d97b707b2c82becb99ac89e9ac24359d183e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33318
### Summary
Recently, there was a [discussion](https://discuss.pytorch.org/t/libtorch-on-watchos/69073/14) on the forum about watchOS. This PR adds support for building watchOS libraries.
### Test Plan
- `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=WATCHOS ./scripts/build_ios.sh`
Test Plan: Imported from OSS
Differential Revision: D19896534
Pulled By: xta0
fbshipit-source-id: 7b9286475e895d9fefd998246e7090ac92c4c9b6
Summary:
For both the Caffe2 and PyTorch backends, enable 3D convolutions through MIOpen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33067
Reviewed By: BIT-silence
Differential Revision: D19880495
Pulled By: bddppq
fbshipit-source-id: 8f6f970910654c1c5aa871b48a04c1054875691c
Summary:
Exporting Split with a dynamic list of split_sizes is not supported.
This PR enables export using the ONNX SplitToSequence + SequenceAt ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33161
Reviewed By: hl475
Differential Revision: D19860152
Pulled By: houseroad
fbshipit-source-id: 300afedc22b01923efb23acd1a3627aa146bb251
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32602
This adds functionality for retrying RPCs that are sent with the function `sendWithRetries()`. It adds RPCs that will potentially need to be retried to a sorted map that contains the timeout at which to retry the RPC and associated metadata. A separate thread iteratively removes the earliest retryable RPC from the map, sleeps until the corresponding time point, retries the RPC, and adds it to the map again with a future timeout.
GitHub Issue: https://github.com/pytorch/pytorch/issues/32124
Per the first 3 milestones, the following will be addressed in future PRs:
* enabling RPC Retries for RRef internal messages
Differential Revision: D19560159
fbshipit-source-id: 40cd86f9a25dc24367624d279a3b9720b20824cf
Summary:
Addressing issue https://github.com/pytorch/pytorch/issues/18125
This implements a mixture distribution where all components are from the same distribution family. Right now the implementation supports the `mean`, `variance`, `sample`, and `log_prob` methods.
cc: fritzo and neerajprad
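A minimal usage sketch of the new distribution (a 5-component univariate Gaussian mixture):
```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

mix = Categorical(torch.ones(5))                       # uniform mixing weights
comp = Normal(torch.randn(5), torch.rand(5) + 0.5)     # 5 univariate components
gmm = MixtureSameFamily(mix, comp)

x = gmm.sample((4,))
print(x.shape)                    # torch.Size([4])
print(gmm.log_prob(x))            # log density under the mixture
print(gmm.mean, gmm.variance)     # supported summary statistics
```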
- [x] add import and `__all__` string in `torch/distributions/__init__.py`
- [x] register docs in docs/source/distributions.rst
### Tests
(all tests live in tests/distributions.py)
- [x] add an `Example(MixtureSameFamily, [...])` to the `EXAMPLES` list,
populating `[...]` with three examples:
one with `Normal`, one with `Categorical`, and one with `MultivariateNormal`
(to exercise, `FloatTensor`, `LongTensor`, and nontrivial `event_dim`)
- [x] add a `test_mixture_same_family_shape()` to `TestDistributions`. It would be good to test this with both `Normal` and `MultivariateNormal`
- [x] add a `test_mixture_same_family_log_prob()` to `TestDistributions`.
- [x] add a `test_mixture_same_family_sample()` to `TestDistributions`.
- [x] add a `test_mixture_same_family_shape()` to `TestDistributionShapes`
### Triaged for follow-up PR?
- support batch shape
- implement `.expand()`
- implement `kl_divergence()` in torch/distributions/kl.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22742
Differential Revision: D19899726
Pulled By: ezyang
fbshipit-source-id: 9c816e83a2ef104fe3ea3117c95680b51c7a2fa4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33156
When dist_autograd_spawn_thrift's 'test_backward_node_failure_python_udf' test is
run, it encounters a TSAN error related to holding the mutex while the
underlying data structure is being dealloced.
In this change, we simply get a shared_ptr<> reference to the future and call
set_exception() without holding the lock, to avoid deallocing underneath
the lock.
ghstack-source-id: 98303434
Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/rpc:dist_autograd_spawn_thrift -- 'test_backward_node_failure_python_udf \(test_dist_autograd_spawn\.DistAutogradTestWithSpawn\)'
Differential Revision: D19821362
fbshipit-source-id: 82f735e33f8e608552418ae71592400fa3621e40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33332
We check the input shape of lengths and indices of SLS and add an attribute if they are the same.
Test Plan:
```
buck test glow/fb/test/numerics:test_operator_onnxifinnpi -- test_slws_fused_8bit_rowwise_length1_graph
```
Reviewed By: ipiszy
Differential Revision: D19874903
fbshipit-source-id: 06b643b5351d0ba19ba209b5a5b599fbb38b1dfc
Summary:
Container `Module`s, including `ModuleList`, `ParameterList` and `ParameterDict`, should not be called like a regular `Module`.
This PR adds error messages for these special modules.
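A minimal sketch of the misuse that now produces a clear error:
```python
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])
x = torch.randn(2, 4)

# Wrong: a ModuleList is a container, not a callable module.
# layers(x)  # now raises with an explanatory message

# Right: iterate over the contained modules.
for layer in layers:
    x = layer(x)
```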
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29991
Differential Revision: D19698535
Pulled By: ezyang
fbshipit-source-id: fe156a0bbb033041086734b38f8c6fde034829bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32432
Use JIT'ed fp16 SLS in D19477209 from Caffe2 operators
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D19477208
fbshipit-source-id: ef2ccba10f5f4c475166141bf09c266dedb92d38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33323
Skip the tests until the issue is fixed.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D19894675
fbshipit-source-id: 1cfc153577bf021171f4412115d84719beae7a91
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33190
This enables the initial RRef type to be used inside TorchScript: a user
can pass a Python RRef into a TorchScript function and call to_here
inside it. Specifically, this PR:
- Add RRef schema type parsing
- Add python interop for RRef in Python and into JIT
- register to_here op in register_distributed_ops
More support for RRef in TorchScript will be added in future PRs
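A minimal sketch of the newly supported pattern (the worker name and the commented-out remote call are hypothetical; requires a build with RPC support):
```python
import torch
import torch.distributed.rpc as rpc
from torch.distributed.rpc import RRef

@torch.jit.script
def fetch_and_double(rref: RRef[torch.Tensor]) -> torch.Tensor:
    # to_here() can now be called on an RRef inside TorchScript.
    return rref.to_here() * 2

# On an initialized RPC worker (hypothetical peer name "worker1"):
# rref = rpc.remote("worker1", torch.ones, args=((2, 2),))
# result = fetch_and_double(rref)
```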
Test Plan: Imported from OSS
Differential Revision: D19871244
Pulled By: wanchaol
fbshipit-source-id: 7eca6c491a84666b261c70806254b705603bd663
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32992
This PR adds RRef to IValue and the JIT type system.
- The RRefInterface abstract class inherits from intrusive_ptr_target,
which lets the RRef class be held in an IValue as an intrusive_ptr.
- Add RRefType as a JIT type; it's a container type similar to the
Future type.
Test Plan: Imported from OSS
Differential Revision: D19871242
Pulled By: wanchaol
fbshipit-source-id: cb80ca32605096f9a42ef147109fb368a7c1d4d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33189
Add RRefInterface to ATen/core, which will later be used by IValue.
Switch the entire RPC code base to use intrusive_ptr instead of shared_ptr,
so that we can add it to IValue.
The actual addition to IValue and JIT will be in the next PR.
Test Plan: Imported from OSS
Differential Revision: D19871241
Pulled By: wanchaol
fbshipit-source-id: d7e1fd04b46320e0f26c18591b49c92ad30a4032
Summary:
See https://discuss.pytorch.org/t/bugs-about-torch-from-numpy-array/43312.
This update incorporates albanD's suggestion into the error message, saving future users from having to ask or look on the forums if they encounter this issue and don't mind making their arrays contiguous.
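A minimal sketch of the failure mode the message now explains:
```python
import numpy as np
import torch

arr = np.arange(6)[::-1]            # reversed view -> negative strides
# torch.from_numpy(arr)             # raises, pointing at the fix below
t = torch.from_numpy(np.ascontiguousarray(arr))
print(t)                            # tensor([5, 4, 3, 2, 1, 0])
```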
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33254
Differential Revision: D19885808
Pulled By: mruberry
fbshipit-source-id: 8f0fd994cf8c088bf3c3940ab4dfb3ddbc5b3ede
Summary: update this mapping with the int4 sls ops so we can run netrunner
Test Plan: testing with net_runner
Reviewed By: jfix71
Differential Revision: D19879826
fbshipit-source-id: eac84b10e2365c21cb8a7cfbf3123e26a9945deb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32957
Closes https://github.com/pytorch/pytorch/issues/29703. If there is a
gloo timeout and `recvWork->wait()` times out in `listenLoop()`,
ProcessGroupAgent crashes since there is an unhandled exception in a thread.
This catches the exception and exits the listen loop. In a follow-up diff, we
will enhance these error conditions so that if users attempt to send RPCs
again, they are notified that the RPC agent was in a bad state and it was
shutdown.
This PR also adds a new option, `processGroupTimeout` to PG agent's backend
options. This allows us to control the gloo timeout.
ghstack-source-id: 98236783
Test Plan: Added a unit test.
Differential Revision: D19678979
fbshipit-source-id: 3895ae754f407b84aca76c6ed3cb087d19178c40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26410
I only ported the CPU forward implementation for now to try a CPU-only benchmark.
Test Plan: Imported from OSS
Differential Revision: D17454519
Pulled By: gchanan
fbshipit-source-id: ff757cf972c5627074fea2f92a670129007a49f4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32008
This is similar to CaoZhongZ's patch which runs on all OpenMP threads in the team and selectively exits early to scale the number of threads active. I have also restored the `if` clause from before https://github.com/pytorch/pytorch/issues/26963 so that running on 1 thread should still avoid additional synchronisation.
One comment is that this does slightly change the meaning of `at::get_num_threads` inside of a `parallel_for` loop since it's not guaranteed that the function was called on that many threads. I've looked at the uses within ATen and couldn't see anything that would be problematic. There are a few places in `quantized` that seem to make this assumption but they always use a grain size of 1 so should be safe:
d9e99ab544/aten/src/ATen/native/quantized/cpu/qconv.cpp (L436-L437)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32875
Differential Revision: D19775823
Pulled By: VitalyFedyunin
fbshipit-source-id: 4f843b78cdb9e2766339590d728923786a00af6d
Summary:
- Clean up error checking code
- Avoid unnecessary floating-point computation
- Use float instead of double when possible to avoid massive cast in the tensor
- Use bool instead of uint8_t for clear Boolean purpose
- Improve error message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32665
Differential Revision: D19601920
Pulled By: VitalyFedyunin
fbshipit-source-id: 0c6c6b5ff227b1437a6c1bae79b2c4135a13cd37
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33011
I also reordered some of the keys in non-semantic ways to make the
organizational grouping more clear.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19796584
Pulled By: ezyang
fbshipit-source-id: 3083abadb47e9f382b9fbe981af0b34203c6ea4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33080
Quantized batch norm for cases where batch norm cannot be fused with conv.
AVX2 implementation is from Caffe2.
Test Plan:
python test/test_quantized.py TestQuantizedOps.test_batch_norm
Imported from OSS
Differential Revision: D19861927
fbshipit-source-id: bd8cd101fc063cb6358132ab7c651a160999293c
Summary:
If a value has the type None, we can always replace it with a None constant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33264
Differential Revision: D19878695
Pulled By: eellison
fbshipit-source-id: 5d0e7ffb37c5747997df093fec3183039d8dff4d
Summary:
For reasons similar to https://github.com/pytorch/pytorch/issues/33021. Note that support for the Half type has
not been available in any release yet, so it should be safe to remove (all forward ones concerning this PR were added in daef363b15c8a3aaaed09892004dc655df76ff81 and 8cb05e72c69fdd837548419770f3f1ba9807c16d).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33206
Differential Revision: D19861137
Pulled By: ezyang
fbshipit-source-id: 38a3a398a716a782c26a611c56ddeab7eb7ac79e
Summary:
When building with FFMPEG, I encountered a compilation error due to a missing include/library.
I also find that the change in video_input_op.h will improve the build on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27589
Differential Revision: D19700351
Pulled By: ezyang
fbshipit-source-id: feff25daa43bd2234d5e75c66b9865b672a8fb51
Summary:
This PR implements the gradient scaling API that mruberry, jjsjann123, ngimel, zdevito, gchanan and I have been discussing. Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081.
Volume-wise, this PR is mostly documentation and tests. The Python API (found entirely in `torch/cuda/amp/amp_scaler.py`) is lightweight. The exposed functions are intended to make the implementation and control flow of gradient scaling convenient, intuitive, and performant.
The API is probably easiest to digest by looking at the documentation and examples. `docs/source/amp.rst` is the homepage for the Automatic Mixed Precision package. `docs/source/notes/amp_examples.rst` includes several examples demonstrating common but not-immediately-obvious use cases. Examples are backed by tests in `test_cuda.py` (and thankfully the tests pass :P).
Two small utility kernels have been added in `native/cuda/AmpKernels.cu` to improve performance and avoid host-device synchronizations wherever possible.
Existing optimizers, both in the wild and in Pytorch core, do not need to change to use the scaling API.
However, the API was also designed to establish a contract between user scripts and optimizers such that writers of _new_ custom optimizers have the control points they need to implement fast, optionally sync-free updates. User scripts that obey the scaling API can drop such custom optimizers in and reap performance benefits without having to change anything aside from the optimizer constructor itself. [I know what the contract with custom optimizers should be](35829f24ef/torch/cuda/amp/amp_scaler.py (L179-L184)), but I'm waiting for review on the rest of the API before I go about documenting it (it will be given a dedicated section in `docs/source/notes/amp_examples.rst`).
Currently, the gradient scaling examples do not include the auto-casting API as discussed in https://github.com/pytorch/pytorch/issues/25081. The gradient scaling API is intended to be orthogonal/modular relative to autocasting. Without auto-casting the gradient scaling API is fully use-_able_, but not terribly use-_ful_, so it's up to you guys whether you want to wait until auto-casting is ready before merging the scaling API as well.
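For orientation, a minimal sketch of the control flow the scaling API is built around (assuming the scaler class is exposed as `torch.cuda.amp.GradScaler`, the name it later shipped under):
```python
import torch

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10, device="cuda")).sum()
    scaler.scale(loss).backward()   # backward runs on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # adjusts the scale for the next iteration
```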
### Todo
- [ ] How do I get c10 registered status for my two custom kernels? They're very simple.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26512
Differential Revision: D19859905
Pulled By: mruberry
fbshipit-source-id: bb8ae6966214718dfee11345db824389e4286923
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33174
Closes https://github.com/pytorch/pytorch/issues/32780. It looks like
this is the only callsite where we do `_get_current_rpc_agent().foo()`, and we
can do this directly in the pybind layer to save some overhead.
ghstack-source-id: 98200664
Test Plan: All UTs should pass.
Differential Revision: D19828786
fbshipit-source-id: 5c34a96b5a970e57e6a1fdf7f6e54c1f6b88f3d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33125
Provide histogram collection and weights prepacking interface for Dper to auto quantize the Ads models.
Test Plan:
buck test mode/opt deeplearning/numeric_suite/toolkit/test:int8_static_utils_test
buck test mode/opt deeplearning/numeric_suite/toolkit/test:histogram_utils_test
Reviewed By: amylittleyang
Differential Revision: D19794819
fbshipit-source-id: 6a4f4a6684da0977b7df2feed8a4b961db716da8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33251
Somehow this was preventing `c10::Error` exceptions from ever being thrown on windows when `defined(NDEBUG) == false`. Kinda scary.
Test Plan: sandcastle green, made sure `intrusive_ptr_test.cpp` (givenStackObject_whenReclaimed_thenCrashes) passed inside ovrsource using `mode/win/dev-debug`
Reviewed By: malfet
Differential Revision: D19865667
fbshipit-source-id: c32d5752025c043e57d16c6d14a94b069bed0bc3
Summary:
Stacked PRs
* #32955 - [jit] Fix flipped PackedSequence outputs in script
* **#32953 - [jit] Support properties on `Device`**
PyTorch devices have `index` and `type` properties. This PR adds support for both to TorchScript.
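A minimal sketch of what becomes expressible in TorchScript:
```python
import torch

@torch.jit.script
def is_cuda(x: torch.Tensor) -> bool:
    # `type` (and likewise `index`) is now accessible on Device in TorchScript.
    return x.device.type == "cuda"

print(is_cuda(torch.randn(2)))  # False for a CPU tensor
```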
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32953
Pulled By: driazati
Differential Revision: D19849320
fbshipit-source-id: ce845258c6110058dd9ea1f759ef74b7ed2e786e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32739
As Title says.
ghstack-source-id: 98061467
Test Plan: CI
Differential Revision: D19610810
fbshipit-source-id: f9621cd7d780769941ed77974b19c5226d4b2b30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33243
If a file does not exist in an archive, PyTorchStreamReader throws an exception. However, when PyTorchStreamReader is destructed, another exception is thrown while processing the first exception. As a result of this double exception, the process aborts with SIGABRT.
Thanks dreiss for catching this bug and suggesting the fix. It happened when he used _load_for_mobile to load a TorchScript file without a bytecode session. A unit test is added to cover this case.
Test Plan: Imported from OSS
Differential Revision: D19859205
Pulled By: iseeyuan
fbshipit-source-id: 8f96b6256f1a1f933fce1c256d64604c7e9269e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32761
This replaces ImplicitTensorToNum with result-specific operators like
IntImplicit, FloatImplicit, or ScalarImplicit. Note that ScalarImplicit
was not correctly implemented before and this PR fixes the lapse.
This does not change on-disk serialization because these operators are not
serialized directly but written as e.g. `annotated(int, foo)`.
Test Plan: Imported from OSS
Differential Revision: D19615385
Pulled By: zdevito
fbshipit-source-id: 48575f408e8219d2ec5b46936fc2aa691f283976
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32682
This moves code around so that operator.h/cpp no longer requires a full
definition of Node* nor does it include alias analysis or the pretty printer.
This should make it possible to include in the mobile build.
Functionality for checking whether operators match a Node, and for looking up
an operator for a Node, has moved to the Node object.
Test Plan: Imported from OSS
Differential Revision: D19615386
Pulled By: zdevito
fbshipit-source-id: e38bdf29971183597ef940d061c06ba56e71d9c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33050
Following what gchanan proposed in #30480
- If the (logical) shapes of mean and std are broadcastable, we broadcast them for the output
Done in tensor iterator already.
- If the (logical) shapes of mean and std are not broadcastable and they have the same number of elements, we fall back to the old behavior (pick the shape of mean)
Done by reshaping std to the same shape as mean.
- If the (logical) shapes of mean and std are not broadcastable and don't have the same number of elements, we error out.
Done by tensor iterator already.
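A minimal sketch of the three cases described above:
```python
import torch

# Broadcastable shapes: the output takes the broadcast shape.
print(torch.normal(torch.zeros(2, 1), torch.ones(1, 3)).shape)  # [2, 3]

# Not broadcastable but same number of elements: falls back to mean's shape.
print(torch.normal(torch.zeros(2, 3), torch.ones(6)).shape)     # [2, 3]

# Not broadcastable and different number of elements: error.
# torch.normal(torch.zeros(2), torch.ones(3))  # raises
```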
Test Plan: Imported from OSS
Differential Revision: D19771186
Pulled By: glaringlee
fbshipit-source-id: a0b71063c7f5fdda2d4ceb84e06384414d7b4262
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33194
### Summary
The iOS x86_64 job has been failing for a few days. I haven't found the root cause, but it seems like updating torchvision to its latest version fixes the problem.
### Test Plan
- the x86_64 job works
Test Plan: Imported from OSS
Differential Revision: D19845079
Pulled By: xta0
fbshipit-source-id: 5034e252600b6704b860d68c371a65bef4cf37fc
Summary:
There are cases where we want to recover from CUDA OOM. For example, some cuDNN algorithms use a huge workspace, and we want to recover from OOM to pick a different algorithm; in such cases, there is no reason to catch all errors.
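As an eager-mode analogue (a minimal sketch, assuming `x` lives on a CUDA device), the recover-from-OOM pattern looks like this:
```python
import torch

def run_with_fallback(x):
    try:
        workspace = torch.empty(10**12, device=x.device)  # deliberately huge
        return x * 2, workspace
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise                       # only OOM is worth recovering from
        torch.cuda.empty_cache()
        return x * 2, None              # fall back to a workspace-free path
```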
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33056
Differential Revision: D19795359
Pulled By: ezyang
fbshipit-source-id: a34e23bf6d172dc0257389251dafef5b38d27d2b
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/31603
- A minor spelling typo is corrected: "suitible" --> "suitable"
- A minor quality of life improvement is added: the data format strings are better rendered as fixed width to indicate that they are string constants. "CHW" --> "`CHW`"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31604
Differential Revision: D19697293
Pulled By: ezyang
fbshipit-source-id: ee38b0d4c9ca8a233ac9243c310d9a3b42ad6f32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33184
dnnlowp specific code shouldn't be in the default FC in the first place
Test Plan: Just removing #ifdef #endif
Reviewed By: jianyuh
Differential Revision: D19835301
fbshipit-source-id: 7880cf298bedb3f0bc407d140d342124663ea4a7
Summary:
Collect activation histograms along the model evaluation and aggregate all the histograms from multiple threads/readers into one file.
The original functionality of the bulk_eval workflow is still valid. The output predictions and extra blobs will be exported to a Hive table, which will be very useful for numerical debugging.
Test Plan:
FBL
```flow-cli canary dper.workflows.bulk_eval.export --mode dbg --parameters-file experimental/summerdeng/sparsenn/bulk_eval_input_configs.json --run-as-secure-group team_ai_system_sw_hw_co-design --entitlement gpu_prod --name "Histogram collection with caffe2 logging. Attach histogram observer to the predict net. Use small model 102343030. "
```
f163861773
When the flow is done, we can get all the histogram files under the specified dir. For example:
```
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6ca65cc0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6cde8a80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6d144840
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6d4a9600
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6da303c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6dd1c800
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e0855c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e3e0380
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e95a140
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6eafcf00
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6ed1a100
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f094ec0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f561c80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f783a40
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6fccb7c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7003d580
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb703ae340
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7084ae80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70bc1c40
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70f43a00
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70ff7680
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb71361300
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb716df0c0
-rw-rw-r--. 1 185754 185754 4024538 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7199c780
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb71b72f00
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72330000
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72598100
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7290d880
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72b03980
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72f1f160
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb8bcee9e0
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fd51b457260
-rw-rw-r--. 1 185754 185754 4026659 Jan 23 09:51 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final
```
The aggregated histogram file is /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final. It can be loaded to the following auto quant workflow for int8 static quantization.
######## Code refactoring ########
Moved the utility functions to process activation histograms to the deeplearning/numeric_suite/toolkit:hist_processor and add the dependency in dper.
We also had a hist_compiler in the caffe2/caffe2/fb/fbgemm/numerical_debugger/python_utils/hist_compiler.py. Also refactored the code to reuse the utility functions in deeplearning/numeric_suite/toolkit:hist_processor.
The histograms from bulk_eval and the hist_compiler are identical.
/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.compiled.bak
/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final.bak
Reviewed By: hx89
Differential Revision: D19270090
fbshipit-source-id: c7ecb4f2bbf1ea725c52e903356ad9a7b9ad73ac
Summary:
fixes a compiler warning:
```
torch/aten/src/ATen/native/cuda/MaxUnpooling.cu.cc(402):
warning: variable "batchSize" was set but never used
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32294
Differential Revision: D19697277
Pulled By: ezyang
fbshipit-source-id: b9821be325826dc4785cad7994803b54f1711a0c
Summary:
The extra dashes are breaking the link here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31760
Differential Revision: D19697301
Pulled By: ezyang
fbshipit-source-id: 65de026b9016dc8689c9dac9efb8aafd00b535cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30352
1) TBB forwards us `ident` through a parameter; we don't need to capture it.
2) TBB is being passed steps <= 0, which is bad.
Taken from TBB documentation:
```
The index type must be an integral type. The loop must not wrap around. The step value must be positive. If omitted, it is implicitly 1.
```
I have a build that uses `TBB_USE_DEBUG=1` and there are currently a lot of issues with PyTorch use.
Is TBB version not tested very much right now?
ghstack-source-id: 94459382
Test Plan: CI green
Differential Revision: D18666029
fbshipit-source-id: d5aa8327b03181d349e1964f9c8211298c433d6a
Summary:
1. Use C10_WARP_SIZE instead of hardcoded value "32".
2. `getNumThreads` returns a minimum of 32 for CUDA, which is same as the warp size in CUDA. However, for HIP, it returns a minimum of 16, which is less than the warp size (64) in HIP. This creates an issue in the [reduce function](14548c2d5b/aten/src/ATen/native/cuda/Normalization.cuh (L115)) when it zeroes out the other entries in shared memory [here](14548c2d5b/aten/src/ATen/native/cuda/Normalization.cuh (L137)): since `blockDim.x` is at least equal to the warp size in CUDA, this never zeroes out `shared[0]`, but for HIP, since `blockDim.x` could be 16 or 32, which is less than the warp size (64), this results in `blockDim.x * blockDim.y` being potentially less than the warp size for small cases, which then zeroes out `shared[0]` as well. This results in an erroneous output of zero for the reduce function on ROCm (depending on how the block dimensions are set).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33098
Differential Revision: D19837355
Pulled By: bddppq
fbshipit-source-id: ea526acd82ec08b1acb25be860b7e663c38ff173
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33083
Added more recommendations, some notes and warnings.
Test Plan: cd docs ; make html
Differential Revision: D19829133
Pulled By: ilia-cher
fbshipit-source-id: b9fbd89f5875b3ce35cc42ba75a3b44bb132c506
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30982
This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.
Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.
Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.
-----------
In this PR:
Updating the templates.
-----------
Test Plan: Imported from OSS
Differential Revision: D18912680
Pulled By: izdeby
fbshipit-source-id: 9e3828e42ee5c3aefbf3729f4a8d6db813f2e7c3
Summary:
They were probably mistakenly added, as we do not intend to support Half
on CPUs in general, and in these situations the Half type would probably be
significantly slower than its float and double counterparts due to the
lack of vectorization and the need for additional casting.
cc XiaobingSuper
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33021
Differential Revision: D19795152
Pulled By: VitalyFedyunin
fbshipit-source-id: b19796db88880a46557e1b2fd06e584d46093562
Summary:
This PR aims at improving `cat` performance on CPU.
The current `cat` logic from the `TH` module has no parallelization when the input tensors are all contiguous.
This code also tries to reuse the same `TensorIterator` as much as possible, in order to reduce the overhead of creating a `TensorIterator`; this is helpful when the copied slice is not large enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30806
Differential Revision: D19275026
Pulled By: VitalyFedyunin
fbshipit-source-id: 756e9b86891f725c256b0a6981887ff06d88b053
Summary:
Currently `torch.pdist` yields an illegal CUDA memory access for batch sizes >= 46342 as reported by SsnL in https://github.com/pytorch/pytorch/issues/30583.
Thanks for the minimal code reproduction, btw! ;)
Reason for this bug:
The calculation of `i` in the [`pdist_kerne_cuda_impl`](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L112)) might overflow if a tensor with a `batch size >= 46342` is passed to `torch.pdist`.
Detailed description:
* `result` is resized as `n * (n - 1) / 2 = 1073767311` ([line of code](46ad80c839/aten/src/ATen/native/Distance.cpp (L140)))
* `grid` is initialized as `result.numel()` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L246)))
* `k` is assigned to the `blockIdx.x` as an `int32` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L108)))
* `i` is calculated using `2 * k >= 2147534622` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L112))), which overflows, since `2147534622 > 2147483647 (int32_max)`.
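The overflow arithmetic, spelled out:
```python
n = 46342
numel = n * (n - 1) // 2     # 1073767311 output elements, one CUDA block each
int32_max = 2**31 - 1        # 2147483647
print(2 * numel)             # 2147534622, so 2 * k no longer fits in int32
print(2 * numel > int32_max) # True
```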
Using `const int64_t k = blockIdx.x;` would solve the illegal memory access. This seems also be done for [`cdist_kernel_cuda_impl`](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L198-L201)).
However, we might expect a slowdown, so I've timed the current PyTorch master vs. this PR:
(tested with `x = torch.randn(x.size(0), 128)` on a V100)
|x.size(0) | int32 idx | int64 idx | slowdown |
|----------|-----------|-----------|----------|
| 50000 | - | 4.4460 | - |
| 25000 | 1.02522 | 1.10869 | 7.53% |
| 12500 | 0.25182 | 0.27277 | 7.68% |
| 6250 | 0.06291 | 0.06817 | 7.72% |
| 3125 | 0.01573 | 0.01704 | 7.69% |
| 1562 | 0.00393 | 0.00426 | 7.75% |
While checking the backward kernel, it seems I'm triggering another error with a size limit of
```python
x = torch.randn(1449, 1, device='cuda', requires_grad=True)
out = torch.pdist(x)
out.mean().backward()
> RuntimeError: CUDA error: invalid configuration argument
```
, while `[<=1448, 1]` works.
I'll take another look at this issue. Let me know, if the potential fix should go into this PR or if I should open a new issue.
CC ngimel, csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31593
Differential Revision: D19825571
Pulled By: ngimel
fbshipit-source-id: ace9ccab49f3cf0ce894cdb6daef0795e2e8ec03
Summary:
`where` is special because the arguments do not have the same type, which does not satisfy the assumption in modern https://github.com/pytorch/pytorch/pull/32383. I migrated it to TensorIterator so that there is something to test that this case is not broken. Currently, this case falls back to using legacy (not vectorized, not unrolled) code. It should be supported in the future when I clean up `Loops.cuh`.
I also moved some shared parts of `CUDALoops.cuh` and `ROCmLoops.cuh` into `Loops.cuh` so that the logic for checking whether `func_t` has the same arg types can be shared.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32984
Differential Revision: D19825127
Pulled By: ngimel
fbshipit-source-id: bbf4682349d96b4480c4d657f3c18a3a67a9bf17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32985
This can be useful in many situations to decide whether all elements are
zeros or non-zeros, such as elu as shown in #32986.
Test Plan: Imported from OSS
Differential Revision: D19794549
Pulled By: VitalyFedyunin
fbshipit-source-id: 1be1c863d69b9a19fdcfcdd7cb52343066f740d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30981
This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.
Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.
Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.
-----------
In this PR:
Extended DispatchKeyExtractor logic to expect TensorOptions.
-----------
Test Plan: Imported from OSS
Differential Revision: D18912684
Pulled By: izdeby
fbshipit-source-id: 25cf1c397caa14272ca65b4003f1f03ff282ea77
Summary:
When an error is raised and `__exit__` in a context manager returns `True`, the error is suppressed; otherwise the error is raised. No return value should be given, to maintain the default behavior of the context manager.
Fixes https://github.com/pytorch/pytorch/issues/32639. The `get_lr` function was overridden with a function taking an epoch parameter, which is not allowed. However, the relevant error was not being raised.
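For illustration, a minimal sketch of the suppression rule described above, before the actual reproduction below:
```python
class Suppressing:
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        return True      # truthy return value: the exception is swallowed

class Transparent:
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        return None      # default: the exception propagates

with Suppressing():
    raise TypeError("hidden")        # silently suppressed

try:
    with Transparent():
        raise TypeError("visible")
except TypeError as e:
    print("propagated:", e)
```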
```python
In [1]: import torch
...:
...: class MultiStepLR(torch.optim.lr_scheduler._LRScheduler):
...: def __init__(self, optimizer, gamma, milestones, last_epoch = -1):
...: self.init_lr = [group['lr'] for group in optimizer.param_groups]
...: self.gamma = gamma
...: self.milestones = milestones
...: super().__init__(optimizer, last_epoch)
...:
...: def get_lr(self, step):
...: global_step = self.last_epoch #iteration number in pytorch
...: gamma_power = ([0] + [i + 1 for i, m in enumerate(self.milestones) if global_step >= m])[-1]
...: return [init_lr * (self.gamma ** gamma_power) for init_lr in self.init_lr]
...:
...: optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
...: scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
```
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-7fad6ba050b0> in <module>
14
15 optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
---> 16 scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
<ipython-input-1-7fad6ba050b0> in __init__(self, optimizer, gamma, milestones, last_epoch)
6 self.gamma = gamma
7 self.milestones = milestones
----> 8 super().__init__(optimizer, last_epoch)
9
10 def get_lr(self, step):
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, last_epoch)
75 self._step_count = 0
76
---> 77 self.step()
78
79 def state_dict(self):
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in step(self, epoch)
141 print("1a")
142 # try:
--> 143 values = self.get_lr()
144 # except TypeError:
145 # raise RuntimeError
TypeError: get_lr() missing 1 required positional argument: 'step'
```
May be related to https://github.com/pytorch/pytorch/issues/32898.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32997
Differential Revision: D19737731
Pulled By: vincentqb
fbshipit-source-id: 5cf84beada69b91f91e36b20c3278e9920343655
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30573
Mostly just moved code.
Index dim and number of indices checks are added to make the checks identical to index_add_cpu_.
ghstack-source-id: 98010129
Test Plan: existing tests
Differential Revision: D18749922
fbshipit-source-id: d243be43a3b6a9b9591caf0c35ef2fb6ec0d3ead
Summary:
Bazelisk automatically reads the `.bazelversion` file and installs the required version of Bazel. This saves us from updating the CI script every time we need a Bazel upgrade.
Use clang-8 for consistency with the pytorch/xla repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33036
Differential Revision: D19820819
Pulled By: ailzhang
fbshipit-source-id: 1560ec225cd037a811769a509a704b0df77ea183
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33102
Add a simple main() to build the code analyzer as a binary. This enables
easier integration with the FB internal build environment.
ghstack-source-id: 97958658
Test Plan: - CI
Differential Revision: D19798560
Pulled By: ljk53
fbshipit-source-id: 126230e3bf7568046a309e8a6785230f820e0222
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31998
This change builds on recent torch::from_blob() changes to avoid Tensor
copies on send in more cases.
In particular, this change adds an option which, when enabled, assumes that if the Tensor
Storage's DataPtr has a non-trivial deleter, then the Tensor does in fact
manage the underlying memory. Hence we can reference the Tensor's Storage
via an IOBuf that stays referenced while sending, saving a Tensor copy.
We add appropriate test cases, particularly re: torch::from_blob(), which
would have been problematic with recent changes.
ghstack-source-id: 97778619
Test Plan: buck test mode/dev caffe2/torch/fb/distributed/wireSerializer/test/...
Reviewed By: satgera
Differential Revision: D19306682
fbshipit-source-id: 05f56efb2d5d6279ae4b54dfcbba0f729c2c13fa
Summary:
## Several flags
`/MP[M]`: It is a flag for the compiler `cl`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/maxcpucount:[M]`: It is a flag for the generator `msbuild`. It leads to project-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/p:CL_MPCount=[M]`: It is a flag for the generator `msbuild`. It leads the generator to pass `/MP[M]` to the compiler.
`/j[M]`: It is a flag for the generator `ninja`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
## Reason for the change
1. Object-level multiprocessing is preferred over project-level multiprocessing.
2. ~For ninja, we don't need to set `/MP` otherwise M * M processes will be spawned.~ Actually, this is not correct because in ninja configs there is only one source file in the command. Therefore, the `/MP` switch should be useless.
3. For msbuild, if it is called through Python configuration scripts, then `/p:CL_MPCount=[M]` will be added, otherwise, we add `/MP` to `CMAKE_CXX_FLAGS`.
4. ~It may be a possible fix for https://github.com/pytorch/pytorch/issues/28271, https://github.com/pytorch/pytorch/issues/27463 and https://github.com/pytorch/pytorch/issues/25393. Because `/MP` is also passed to `nvcc`.~ It is probably not true, because `/MP` should not be effective given there is only one source file per command.
## Reference
1. https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=vs-2019
2. https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows
3. https://blog.kitware.com/cmake-building-with-all-your-cores/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33120
Differential Revision: D19817227
Pulled By: ezyang
fbshipit-source-id: f8d01f835016971729c7a8d8a0d1cb8a8c2c6a5f
Summary:
Another pull request to follow up on issue https://github.com/pytorch/pytorch/issues/32531.
Here I implemented the backward operation for `torch.eig` under the condition that all the eigenvalues are real.
This pull request is independent of my other pull request https://github.com/pytorch/pytorch/issues/32932; there is no dependency between the two.
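For illustration, a minimal usage sketch (not part of this PR; it assumes a symmetric input so that all eigenvalues are real, and the sizes are arbitrary):
```python
import torch

# Symmetric matrices have real eigenvalues, which is the case this backward supports.
a = torch.randn(3, 3, dtype=torch.double)
a = a + a.t()                      # symmetrize so all eigenvalues are real
a.requires_grad_(True)
evals, evecs = torch.eig(a, eigenvectors=True)
evals.sum().backward()             # succeeds now that eig has a backward
print(a.grad)
```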
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33090
Differential Revision: D19814347
Pulled By: albanD
fbshipit-source-id: 2fae30964e97987abb690544df8240aedeae56e8
Summary:
`assertWarnsRegex` now prints out any warnings that it caught while failing to find a matching warning. This makes it easier to debug tests by just looking at the CI logs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33099
Differential Revision: D19800021
Pulled By: ezyang
fbshipit-source-id: 1c31ae785c8ffc5d47619aff6597e479263be2de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33069
This PR adds the following (a short sketch of the targeted pattern follows the list):
- Warn when a non-input Tensor is given to `mark_dirty()`, as it is not needed.
- Raise an error if we modify in place an input that is a view and we have multiple outputs. This setting is not handled by `CopySlices` and would raise a cryptic error during the backward.
- Raise an error if an input is modified in place but not returned, as that prevents the graph rewrite from being done correctly.
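A minimal sketch of the pattern these checks target (the Function and names below are illustrative, not from this PR):
```python
import torch

class InplaceScale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale):
        x.mul_(scale)          # modify an *input* in place...
        ctx.mark_dirty(x)      # ...and tell autograd about it
        ctx.scale = scale
        return x               # the dirtied input must also be returned

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * ctx.scale, None

t = torch.randn(3, requires_grad=True).clone()   # non-leaf, safe to modify in place
InplaceScale.apply(t, 2.0).sum().backward()
```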
Test Plan: Imported from OSS
Differential Revision: D19791563
Pulled By: albanD
fbshipit-source-id: 4d8806c27290efe82ef2fe9c8c4dc2b26579abd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33068
The version counter is already tracked if we use PyTorch's functions, but not if the user unpacks the Tensor and modifies it by hand or with a third-party library.
Test Plan: Imported from OSS
Differential Revision: D19791564
Pulled By: albanD
fbshipit-source-id: a73c0f73d8fd0c0e5bf838f14bed54fa66937840
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768, second attempt of https://github.com/pytorch/pytorch/issues/32870
DataParallel creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backward` will propagate gradients onto the original module parameters, but calling `zero_grad` on the replica module doesn't clear the gradients from the parent module. However, any replica calling `backward` was broken anyway, since the replica's parameters are not leaf nodes in autograd. So we should issue a warning; a sketch of the pattern is shown below.
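An illustrative sketch of the pattern that now warns (module, shapes, and device usage are made up; it assumes a CUDA machine):
```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        self.zero_grad()   # on a DataParallel replica this now warns: the replica's
                           # params are not leaves, and the parent's grads stay untouched
        return self.fc(x)

model = nn.DataParallel(Net().cuda())
out = model(torch.randn(8, 4).cuda())
```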
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33064
Differential Revision: D19790178
Pulled By: albanD
fbshipit-source-id: 886f36640acef4834a6fa57a26ce16b42ff0e9ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32979
Since we use prepacked weights in the Fp16 FCs and future Int8 FCs in production Ads models, we provide Python utils to inspect the unpacked format of the weights for debugging purposes. The main interfaces are the following:
```
from deeplearning.numeric_suite.toolkit import packed_weights_inspector
# inspect fp16 packed weights
unpacked_fp16_weights = packed_weights_inspector.extract_fp16_fc_packed_weights(fp16_weight_blob_name)
# inspect int8 packed weights
unpacked_int8_weights, qparams = packed_weights_inspector.extract_int8_fc_packed_weights(int8_weight_blob_name)
```
Test Plan:
```
buck test mode/opt deeplearning/numeric_suite/toolkit/test:packed_weights_inspector_test
```
Reviewed By: amylittleyang
Differential Revision: D19724474
fbshipit-source-id: e937672b3722e61bc44c2587aab2288a86aece9a
Summary:
If using nn.functional avg_pool, stride is an optional arg. If not provided, it is set to kernel_size.
This PR fixes the export of avg_pool with default stride.
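A small sketch of the previously failing case (file name and input shape are placeholders):
```python
import torch
import torch.nn.functional as F

class AvgPool(torch.nn.Module):
    def forward(self, x):
        # stride is omitted, so it defaults to kernel_size
        return F.avg_pool2d(x, kernel_size=2)

torch.onnx.export(AvgPool(), torch.randn(1, 3, 8, 8), "avg_pool.onnx")
```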
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33017
Reviewed By: hl475
Differential Revision: D19759604
Pulled By: houseroad
fbshipit-source-id: b0352db6fbaf427f4cff9ba8a942efdeb39b6f02
Summary:
Fix internal error message due to old version of hypothesis
```
test_suite = self.load_tests()
File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/__fb_test_main__.py", line 678, in load_tests
suite = loader.load_all()
File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/__fb_test_main__.py", line 467, in load_all
__import__(module_name, level=0)
File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/test_quantization.py", line 45, in <module>
hu.assert_deadline_disabled()
File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/torch/testing/_internal/hypothesis_utils.py", line 322, in assert_deadline_disabled
assert settings().deadline is None
File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/hypothesis/_settings.py", line 127, in __getattr__
raise AttributeError('settings has no attribute %s' % (name,))
AttributeError: settings has no attribute deadline
```
Test Plan: buck test mode/dev //caffe2/test:quantization -- --run-disabled runs successfully
Differential Revision: D19795232
fbshipit-source-id: ef1d8be20b4be30e1cfad4cd5019c4779a5f4568
Summary:
split requires an int input; however, in tracing, operators such as
size(axis) return a tensor, which is different behavior than when not
tracing. As such, we need to modify split to handle these cases.
Fixes https://github.com/pytorch/pytorch/issues/27551
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32493
Reviewed By: hl475
Differential Revision: D19538254
Pulled By: houseroad
fbshipit-source-id: c8623009de5926aa38685e08121f4b48604bd8c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33070
The `start_method` parameter is intentionally ignored by `mp.spawn()`. Document this fact and point the user to `start_processes` if they want to use a different `start_method`.
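A sketch of the alternative the warning points to (the worker function is a placeholder):
```python
import torch.multiprocessing as mp

def worker(rank):
    print("running in process", rank)

# mp.spawn() always uses the "spawn" start method; for anything else use:
mp.start_processes(worker, nprocs=2, start_method="fork")
```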
Test Plan:
Warning message looks like:
```
main.py:8: UserWarning: This method only supports start_method=spawn (got: fork).
To use a different start_method use:
torch.multiprocessing.start_process(...)
warnings.warn(msg)
```
Reviewed By: ailzhang
Differential Revision: D19780235
fbshipit-source-id: 4599cd18c3ba6cc401810efe4f390290ffa8023b
Summary:
Currently, custom ops are registered for a specific opset version.
For example, all torchvision custom ops are registered for opset 11, and cannot be exported into higher opset versions. This PR extends op registration to higher opset versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32943
Reviewed By: hl475
Differential Revision: D19739406
Pulled By: houseroad
fbshipit-source-id: dd8b616de3a69a529d135fdd02608a17a8e421bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32506
In this PR, we've introduced a `retain_graph` parameter to distributed
autograd similar to `torch.autograd.backward`.
In terms of design, this parameter is sent over RPC to all nodes and is used to
create the GraphTask on the local nodes. This enables us to run
`dist_autograd.backward()` multiple times in the same context.
The use case currently for this is to benchmark only the backward pass for
distributed autograd. We'd like to measure the QPS for the backward pass and as
a result, running a single forward pass and multiple backward passes in a loop
is one way to benchmark backward pass performance.
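A minimal sketch of the intended pattern (assuming the RPC framework has already been initialized on this worker; the tensor and loop count are arbitrary):
```python
import torch
import torch.distributed.autograd as dist_autograd

t = torch.rand(3, 3, requires_grad=True)
with dist_autograd.context() as context_id:
    loss = (t * 2).sum()
    for _ in range(5):
        # retain_graph=True lets the same graph be reused across iterations,
        # e.g. to benchmark just the backward pass.
        dist_autograd.backward(context_id, [loss], retain_graph=True)
```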
ghstack-source-id: 97868900
Test Plan: waitforbuildbot
Differential Revision: D19521288
fbshipit-source-id: 7ad8521059fd400d7b5a6ab77ce56e1927ced90a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33060
Noticed this when tracking down a partially-related SIGSEGV.
When inserting a non-present key into a memoized map, don't calculate it twice
(probably safer that way anyway).
ghstack-source-id: 97904485
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D19778008
fbshipit-source-id: 95b1d708c034a54b96a22ccbdffb24f72d08dffd
Summary:
The rand-N-like function had required args that were not being used.
As such, the method signature was modified to give them default values, so that
no error is thrown when scripting does not provide these unused arguments.
Additionally, the const checker was modified to handle prim::Constant as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32830
Reviewed By: hl475
Differential Revision: D19731715
Pulled By: houseroad
fbshipit-source-id: a3cacb3977eecb88b122e0ceb654fdbf1c8286c1
Summary:
Supporting the case below. Previously the index for copy_ was only considered as a constant integer, whereas it could be a tensor input as well.
```python
class InPlaceIndexedAssignment(torch.nn.Module):
def forward(self, data, index, new_data):
data[index] = new_data
return data
data = torch.zeros(3, 4)
index = torch.tensor(1)
new_data = torch.arange(4).to(torch.float32)
torch.onnx.export(InPlaceIndexedAssignment(), (data, index, new_data), 'inplace_assign.onnx', opset_version=11)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32801
Reviewed By: hl475
Differential Revision: D19731666
Pulled By: houseroad
fbshipit-source-id: 08703fdccd817f901282e19847e259d93929e702
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32243
Following what gchanan proposed in #30480 (a short sketch of the resulting behavior follows the list):
- If the (logical) shapes of mean and std are broadcastable, we broadcast them for the output. Done by the tensor iterator already.
- If the (logical) shapes of mean and std are not broadcastable but they have the same number of elements, we fall back to the old behavior (pick the shape of mean). Done by reshaping std to the same shape as mean.
- If the (logical) shapes of mean and std are not broadcastable and don't have the same number of elements, we error out. Done by the tensor iterator already.
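A short sketch of the resulting behavior (shapes are arbitrary):
```python
import torch

mean, std = torch.zeros(2, 1), torch.ones(1, 3)
torch.normal(mean, std).shape   # broadcastable -> torch.Size([2, 3])

mean, std = torch.zeros(2, 3), torch.ones(6)
torch.normal(mean, std).shape   # same numel, not broadcastable -> torch.Size([2, 3]) (shape of mean)

mean, std = torch.zeros(2, 3), torch.ones(5)
# not broadcastable and different numel -> RuntimeError
```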
Test Plan: Imported from OSS
Differential Revision: D19417087
Pulled By: glaringlee
fbshipit-source-id: 1c4bc7df923110a803620b9e2abd11a7151fc33e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768
`DataParallel` creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backwards` will propagate gradients onto the original module parameters but calling `zero_grad` on the replica module doesn't clear the gradients from the parent module,
~breaking any model that uses `backward`-`zero_grad` in its `forward`. I fix this by patching the replica module so that `zero_grad` clears grads on the parent as well.~
However, any replica using backwards was broken anyway since the replica's parameters are not leaf nodes in autograd. So, we should raise a warning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32870
Differential Revision: D19730209
Pulled By: ezyang
fbshipit-source-id: cb9b2cb0c2e0aca688ce0ff3e56b40fbd2aa3c66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495
Background
------------------------------
Previously, ninja was used to compile+link inline cpp_extensions and
ahead-of-time cpp_extensions were compiled with distutils. This PR adds
the ability to compile (but not link) ahead-of-time cpp_extensions with ninja.
The main motivation for this is to speed up cpp_extension builds: distutils
does not make use of parallelism. With this PR, using the new option, on my machine,
- torchvision compilation goes from 3m43s to 49s
- nestedtensor compilation goes from 2m0s to 28s.
User-facing changes
------------------------------
I added a `use_ninja` flag to BuildExtension. This defaults to
`True`. When `use_ninja` is True:
- it will attempt to use ninja.
- If we cannot use ninja, then this throws a warning and falls back to
distutils.
- Situations where we cannot use ninja: Windows (NYI, I'll open a new issue
for this), or if ninja cannot be found on the system. (A usage sketch follows below.)
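A minimal setup.py sketch for an ahead-of-time extension (names and sources are placeholders; this only illustrates the default behavior described above, not the exact plumbing in this PR):
```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

# With this change, BuildExtension compiles objects with ninja by default
# (use_ninja=True) and falls back to distutils with a warning if ninja
# is unavailable; distutils still performs the link step.
setup(
    name="my_ext",
    ext_modules=[CppExtension("my_ext", ["my_ext.cpp"])],
    cmdclass={"build_ext": BuildExtension},
)
```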
Implementation Details
------------------------------
This PR makes this change in two steps. Please let me know if it would be
easier to review this if I split it up into a stacked diff.
Those changes are:
1) refactor _write_ninja_file to separate the policy (what compiler flags
to pass) from the mechanism (how to write the ninja file and do compilation).
2) call _write_ninja_file and _run_ninja_build while building
ahead-of-time cpp_extensions. These are only used to compile objects;
distutils still handles the linking.
Change 1: refactor _write_ninja_file to separate policy from mechanism
- I split _write_ninja_file into: _write_ninja_file and
_write_ninja_file_to_build_library
- I renamed _build_extension_module to _run_ninja_build
Change 2: Call _write_ninja_file while building ahead-of-time
cpp_extensions
- _write_ninja_file_and_compile_objects calls _write_ninja_file to only
build object files.
- We monkey-patch distutils.CCompiler.compile to call
_write_ninja_files_and_compile_objects
- distutils still handles the linking step. The linking step is not a
bottleneck so it was not a concern.
- This change only works on unix-based systems. Our code for windows
goes down a different codepath and I did not want to mess with that.
- If a system does not support ninja, we raise a warning and fall back
to the original compilation path.
Test Plan
------------------------------
Adhoc testing
- I built torchvision using pytorch master and printed out the build
commands. Next, I used this branch to build torchvision and looked at
the ninja file. I compared the ninja file with the build commands and
asserted that they were functionally the same.
- I repeated the above for pytorch/nestedtensor.
PyTorch test suite
- I split `test_cpp_extensions` into `test_cpp_extensions_aot` and
`test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests
ahead-of-time and the JIT version tests just-in-time (not to be confused
with TorchScript)
- `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with
a module that was built with ninja, and once with a module that was
built without ninja.
- run_test.py asserts that when we are building with use_ninja=True,
ninja is actually available on the system.
Test Plan: Imported from OSS
Differential Revision: D19730432
Pulled By: zou3519
fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90
Summary:
This will allow us to incrementally enable more tests for scripting as we put in fixes. houseroad spandantiwari
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32654
Reviewed By: hl475
Differential Revision: D19583401
Pulled By: houseroad
fbshipit-source-id: 8dc05e4784df819c939dffdf33b00cbb80bfa364
Summary:
Stacked PRs
* #32958 - Make zip serialization the default
* **#32244 - Fix some bugs with zipfile serialization**
It includes the following changes:
* Split up tests so that we can test both serialization methods
* Loading something within a buffer doesn't work anymore, so those tests are only on the old serialization method (it's possible but introduces a big slowdown since it requires a linear scan of the entire zipfile to find the magic number at the end)
* Call `readinto` on a buffer if possible instead of `read` + a copy
* Disable CRC-32 checks on read (there was some issue where miniz said the CRC was wrong but `zipinfo` and `unzip` said the zip file was fine)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32244
Pulled By: driazati
Reviewed By: eellison
Differential Revision: D19418935
fbshipit-source-id: df140854f52ecd04236225417d625374fd99f573
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32895
When a particular rank calls `ncclCommAbort` on a communicator, it is
important to ensure all other ranks call `ncclCommAbort` on their respective
communicators. If this is not done, the other ranks could get stuck causing the
GPU to spin with 100% utilization.
To alleviate this issue, whenever any rank calls `ncclCommAbort` we put the
unique communicator id in the store. The NCCL watchdog thread then monitors the
store and aborts any communicators whose ids appear there, marking them as "aborted".
A few more general fixes in this PR:
1) Use std::shared_ptr for the store in PrefixStore. PrefixStore was using a
reference to the store and when that reference went out of scope the store
object it was holding onto was invalid. This caused a segfault in the watchdog
thread.
2) Enhanced logging for the watchdog thread.
Test Plan: waitforbuildbot
Differential Revision: D19638159
fbshipit-source-id: 596cd87c9fe6d4aeaaab4cb7319cc37784d06eaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32941
The Python grammar allows single-statement one-line functions, so we
should allow them in the string parser.
Test Plan: Imported from OSS
Differential Revision: D19704153
Pulled By: suo
fbshipit-source-id: 8c06cc9c600aa2a9567b484a1ecc0360aad443e3
Summary:
Enabling the RCCL test on rocm by adding a temporary grace period to clean up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32340
Differential Revision: D19744459
Pulled By: xw285cornell
fbshipit-source-id: 1af3b64113a67f93e622d010ddd3020e5d6c8bc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32982
For masked_scatter_ and masked_fill_ (which already have manually written wrappers), move the broadcasting logic into the manually written wrappers.
Test Plan: Imported from OSS
Differential Revision: D19726830
Pulled By: gchanan
fbshipit-source-id: 1f6e55e19c1314a76e43946b14d58f147c0f8204
Summary:
The way we currently dispatch argmax/argmin to out-of-source devices is bad and has caused issues, e.g., it doesn't work well when the input requires grad. https://github.com/pytorch/xla/issues/1585.
Making argmax/argmin dispatch at the device level resolves this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32961
Differential Revision: D19726826
Pulled By: ailzhang
fbshipit-source-id: f7fb445fd8e7691524afcc47d24d8e6b0171d10c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32788
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628643
Pulled By: ezyang
fbshipit-source-id: 7099b08eff37913144b961dda00b070bd4b939d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32787
Gets rid of a longstanding TODO. TensorList unwrap is only used for cat, which
means we can assume that the inputs are dense, and do something similar to how
we do the dense tensor wrapping above.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628642
Pulled By: ezyang
fbshipit-source-id: 3264439407585fb97995a9a2302c2913efecb421
Summary:
The PR https://github.com/pytorch/pytorch/pull/31791 adds support for float[] constants, which affects some cases of ONNX interpolate support.
This PR adds float[] constants support in ONNX, updates interpolate in ONNX, and re-enables the disabled tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32554
Reviewed By: hl475
Differential Revision: D19566596
Pulled By: houseroad
fbshipit-source-id: 843f62c86126fdf4f9c0117b65965682a776e7e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32936
Closes https://github.com/pytorch/pytorch/issues/32732. Currently if a
UDF run in RPC throws an exception such as ValueError or TypeError, we wrap
this in a RemoteException on the callee side. When raising this on the caller
side, we currently raise a vanilla Exception. This diff changes it so that the
correct exception is thrown. Tested by changing the current rpc tests to assert
on the right type of error rather than just the base `Exception`.
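An illustrative sketch of the new behavior (assuming `rpc.init_rpc` has already been called and a peer named "worker1" exists; the failing UDF is a placeholder):
```python
import torch.distributed.rpc as rpc

def faulty_udf():
    raise ValueError("boom")

# The caller now sees the original exception type instead of a plain Exception.
try:
    rpc.rpc_sync("worker1", faulty_udf)
except ValueError as e:
    print("remote ValueError:", e)
```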
ghstack-source-id: 97706957
Test Plan: Modified unit test.
Differential Revision: D19700434
fbshipit-source-id: e451b772ea6aecc1d2e109e67e7f932eb9151f15
Summary:
Checks the size of each tensor passed to `torch.stack` before calling `cat` to address https://github.com/pytorch/pytorch/issues/29510. This is done in the `get_stack_input` function as that is a common path. The function now compares the size of each tensor in the TensorList to the size of the first tensor and throws an exception when the sizes are not equal.
To compare:
```
x = torch.zeros([1, 2])
y = torch.zeros([1, 3])
torch.stack([x, y]) # Errors due to size differences
```
Current error:
```
RuntimeError: invalid argument 0: Sizes of tensors must match
except in dimension 0. Got 2 and 3 in dimension 2 at (path)\aten\src\TH/generic/THTensor.cpp:612
```
New error:
```
RuntimeError: stack expects each tensor to be equal size, but
got [1, 2] at entry 0 and [1, 3] at entry 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32931
Differential Revision: D19700110
Pulled By: ezyang
fbshipit-source-id: 7e18bb00fa2c137e418e340d719b6b76170b83e3
Summary:
It was causing a build error when compiling on MINGW64
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32015
Differential Revision: D19697296
Pulled By: ezyang
fbshipit-source-id: 71e58783c48f8e99755c091b2027d59740dfca47
Summary:
Closes gh-31771
Also note that the `epoch` attribute is *only* used as a manual seed in each iteration (so it could easily be changed/renamed). Seeding consecutive iterations with `[0, 1, 2, ...]` is low-entropy; however, in practice it probably doesn't matter when using the sampler in combination with a dataloader (because there won't be enough data or epochs to run into statistical issues due to low-entropy seeding). So leaving that as is.
Rendered docstring:
<img width="534" alt="image" src="https://user-images.githubusercontent.com/98330/73701250-35134100-46e9-11ea-97b8-3baeb60fcb37.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32951
Differential Revision: D19729333
Pulled By: ezyang
fbshipit-source-id: 3ddf90a3828b8bbae88aa2195a5d0b7d8ee1b066
Summary:
two instances of if -> it in torch.nn.modules.batchnorm.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29797
Differential Revision: D19698613
Pulled By: ezyang
fbshipit-source-id: 7312b2333f227113e904dfa91db90d00e525affb
Summary:
TensorBoard tests using SummaryWriter() may fail with a pandas import
complaint if TensorFlow packages are installed in the same python
environment as PyTorch:
```
Traceback (most recent call last):
File "test_tensorboard.py", line 212, in test_writer
with self.createSummaryWriter() as writer:
File "test_tensorboard.py", line 64, in createSummaryWriter
return SummaryWriter(temp_dir)
...
File "[...]/site-packages/pandas/core/arrays/categorical.py", line 52, in <module>
import pandas.core.algorithms as algorithms
AttributeError: module 'pandas' has no attribute 'core'
```
The exact failure may depend on the pandas version. We've also seen:
File "[...]/site-packages/pandas/core/arrays/categorical.py", line 9, in <module>
import pandas.compat as compat
AttributeError: module 'pandas' has no attribute 'compat'
The module import chain leading to the failure is: tensorboard imports
tensorflow, which imports tensorflow_estimator, which imports pandas. pandas includes
a submodule named 'bottleneck', whose name collides with the PyTorch
'test/bottleneck/' subdirectory.
So IF tensorboard, tensorflow, tensorflow_estimator, and pandas are
installed in the python environment AND IF testing is run from within
PyTorch's 'test/' directory (or maybe just with 'test/' in PYTHONPATH,
etc.), then TensorBoard tests using SummaryWriter() will fail.
Rename the 'bottleneck/' directory slightly to avoid the name collision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29650
Differential Revision: D19698638
Pulled By: ezyang
fbshipit-source-id: cb59342ed407cb37aefc833d67f768a8809129ac
Summary:
With the Fedora negativo17 repo, the cuDNN headers are installed in the /usr/include/cuda directory, alongside other CUDA libraries.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31755
Differential Revision: D19697262
Pulled By: ezyang
fbshipit-source-id: be80d3467ffb90fd677d551f4403aea65a2ef5b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32897
Moving the default static instance into the method to achieve the same purpose.
ghstack-source-id: 97570792
Test Plan: - CI
Reviewed By: dreiss
Differential Revision: D19674566
fbshipit-source-id: 27f54da66dd7667c34905eddaac6579e64aa1118
Summary:
Understanding which ops return views and which return tensors with new storage is a common user issue, and an issue for developers connecting accelerators to PyTorch, too. This generic test suite verifies that ops which should return views do (and a few ops that shouldn't don't). The documentation has also been updated for .t(), permute(), unfold(), and select() to clarify they return views.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32512
Differential Revision: D19659454
Pulled By: mruberry
fbshipit-source-id: b4334be9b698253a979e1bb8746fdb3ca24aa4e3
Summary:
1. Allows both the memory_format of the weight and of the input to dictate the output
memory_format.
2. Provides a utility function to recursively convert the memory_format of Conv2d and
ConvTranspose2d layers. This allows easy model conversion and ensures that a
memory_format lost through incompatible layers can be restored at a Convolution-like
layer, where a significant performance boost is expected on later-generation CUDA
devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32482
Differential Revision: D19647903
Pulled By: VitalyFedyunin
fbshipit-source-id: 62c96ff6208ff5e84fae1f55b63af9a010ad199a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32888
This kills ~1500 lines of generated code by doing the following:
1) Stop binding _th_clone, which isn't used anymore.
2) Move allocation code out of the switch, because it doesn't need to be there, example:
Now:
```
auto dispatch_scalar_type = infer_scalar_type(self);
auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(c10::Storage(scalarTypeToTypeMeta(dispatch_scalar_type), 0, allocator(), true),DispatchKey::CPUTensorId).release();
auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
switch (dispatch_scalar_type) {
case ScalarType::Bool: {
...
case ScalarType::Byte: {
...
```
Before:
```
auto dispatch_scalar_type = infer_scalar_type(self);
switch(dispatch_scalar_type) {
case ScalarType::Bool: {
auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(caffe2::TypeMeta::Make<bool>(), 0, allocator(), true),DispatchKey::CPUTensorId).release();
auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
case ScalarType::Byte: {
auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(caffe2::TypeMeta::Make<byte>(), 0, allocator(), true),DispatchKey::CPUTensorId).release();
auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
```
Note there's one extra lookup from ScalarType -> TypeMeta, but that can go away once we are able to put everything in a dispatch macro.
3) Prepare for more moves out of the switch by using dispatch_scalar_type where we would have used an explicit ScalarType::Name
More moves are currently blocked by "real" types needing to map scalar_type -> C++ type. Dispatch macros can solve that, but I'll need to wrap the actual TH calls in templates so the entire
thing can be done via dispatch.
4) Kill some codegen that isn't used anymore: ALLOC_WRAP, is_actual_return_long.
Test Plan: Imported from OSS
Differential Revision: D19672613
Pulled By: gchanan
fbshipit-source-id: 753f480842d11757e10182e43b471bd3abaa5446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32952
When the Async() version of clearAndWaitForOutstandingRpcs() was written,
we didn't yet have the generic Future<T> class, and hadn't worked out our
error model fully.
This change fixes that method to properly propagate the first encountered error
to the future, using a bool+CAS.
ghstack-source-id: 97665749
Test Plan: existing test coverage, buck test mode/dev-nosan caffe2/test/...
Differential Revision: D19710337
fbshipit-source-id: 66ce5593a94a16ea624930dbb9409917ef5cfd5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32935
Mock away the content of the onnxified net with some low-cost ops so that we can still mimic the input/output transfer while doing minimal work on the card.
Test Plan:
```
buck run glow/fb/test:sparsenn_test -- --gtest_filter='SparseNNTest.vanillaC2' --onnxifi_debug_mode --onnxifi_loop_test_mode --nocaffe2_predictor_use_memonger
```
Differential Revision: D19631971
fbshipit-source-id: f970c55ccb410702f479255eeb750e01e3f8c2ae
Summary:
Should fix https://github.com/pytorch/pytorch/issues/32346, hopefully. Now when the _flat_weights list is updated, `None` elements are appended to it if some weights are missing; subsequent `setattr` calls for the missing weights should repair _flat_weights and make it suitable for use in the backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32939
Differential Revision: D19710990
Pulled By: ngimel
fbshipit-source-id: c978c7519464e94beeffa9bc33b9172854a2f298
Summary:
The default value is removed because it is explained right below.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32945
Reviewed By: soumith
Differential Revision: D19706567
Pulled By: ailzhang
fbshipit-source-id: 1b7cc87991532f69b81aaae2451d944f70dda427
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32907
All op-specific information used in this logic was available to the
parser itself, so the check can be done in that context, no codegen
needed.
No change in the warning behavior itself, mod minor formatting tweak -
passes existing tests. Saves like ~275K binary size on mac:
```
-rwxr-xr-x 1 bhosmer 1876110778 16502064 Feb 1 00:43 torch/lib/libtorch_python.dylib
-rwxr-xr-x 1 bhosmer 1876110778 16247888 Feb 1 00:44 torch/lib/libtorch_python.dylib
```
[codegen diff](https://github.com/bhosmer/scratch/compare/deprecation_warning_before...deprecation_warning_after)
More important than the size savings is the minimization of codegen. Ideally the generated artifact should express distinctive per-op properties in as minimal a form as practically possible - e.g. here instead of generating check-and-warn behavior into every binding, we generate only the data that triggers the behavior in the parser. (And actually we were generating it already.)
Test Plan: Imported from OSS
Differential Revision: D19679928
Pulled By: bhosmer
fbshipit-source-id: cf0140573118430720c6b797c762fe5be98acd86
Summary:
The `BatchNorm*` part of the issue (see gh-12013) seems to have been fixed in the master branch and these tests would make it concrete.
However I would appreciate comments on https://github.com/pytorch/pytorch/issues/12013#issuecomment-575871264 on whether the current behaviour is satisfactory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32384
Differential Revision: D19704154
Pulled By: ngimel
fbshipit-source-id: 1bbbbf1ae1215a460b22cf26e6b263e518ecf60b
Summary:
SpatialBNFakeLoweredFp16NNPI
this is the fake operator for SpatialBN that gets lowered into add/mul/div, etc.
Test Plan: test_spatialbn
Reviewed By: tracelogfb, amylittleyang
Differential Revision: D19658680
fbshipit-source-id: 2abddbcd9a2023ac75c494f20eaac2051b7139dc
Summary:
Fix for flaky constant folding tests.
It looks like the constant folding test modules are sometimes exported with the ONNX_ATEN op export type, which is causing the CI failures.
I'm unable to repro this issue locally, but my guess is that the op export param is being overwritten in the CI build at some point.
This PR sets the op export type and hopefully fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32546
Reviewed By: hl475
Differential Revision: D19606919
Pulled By: houseroad
fbshipit-source-id: 31793d6857bbbf99b43b4a7c22a045a56ae19e44
Summary:
e.g. `tensor[torch.tensor([0, 1, 0], dtype=torch.bool)]`
Previously the mask was of type uint8. Both uint8 and bool should be supported for export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32445
Reviewed By: hl475
Differential Revision: D19610713
Pulled By: houseroad
fbshipit-source-id: 8df636e0c3cb0b82919a689242a962c79220209c
Summary:
I noticed the description of the initialization of convolutional modules is inconsistent with the actual implementation. There are two such cases:
1) `k` in the initialization of ConvTranspose modules is not dependent on the input channels but on the output channels (`kaiming_uniform_` uses the size of the second dimension of `weight` which is transposed in the first two dimensions).
2) Both the normal convolutions and the transposed ones use `k` divided by `groups`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30079
Differential Revision: D19698511
Pulled By: ezyang
fbshipit-source-id: 1ba938fbbd97663eaf29fd1245872179d2761fff
Summary:
* New ops supported for exporting.
* Updates on support for tensor indexing and dynamic list of tensors.
* lara-hdr, spandantiwari Should we also include updates on torchvision support in this page?
cc houseroad, neginraoof Please review if I have missed anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32805
Reviewed By: hl475
Differential Revision: D19635699
Pulled By: houseroad
fbshipit-source-id: b6be4fce641f852dcbceed20b4433f4037d8024a
Summary:
The need for this is felt because sometimes we change a build script and change the `std=c++XX` flag, which does not get caught until the compilation has progressed for a while.
https://github.com/pytorch/pytorch/issues/31757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32819
Differential Revision: D19697205
Pulled By: ezyang
fbshipit-source-id: b045a1d15e24c4c6007b5d1464756051d32bf911
Summary:
This PR fixes type hints for `torch.optim.optimizer.Optimizer` object, issue also reported in https://github.com/pytorch/pytorch/issues/23731
To test things I used the following optimiser implementation, which is fully covered with type hints:
```python
from typing import Optional, Callable, Union, Iterable
from torch import Tensor
from torch.optim.optimizer import Optimizer

OptClosure = Optional[Callable[[], float]]
_params_t = Union[Iterable[Tensor], Iterable[dict]]


class SGD(Optimizer):
    def __init__(self, params: _params_t, lr: float = 0.1) -> None:
        defaults = dict(lr=lr)
        super(SGD, self).__init__(params, defaults)

    def __setstate__(self, state: dict) -> None:
        super(SGD, self).__setstate__(state)

    def step(self, closure: OptClosure = None) -> Optional[float]:
        loss = None
        if closure is not None:
            loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad.data
                p.data.add_(-group['lr'], d_p)
        return loss
```
Without fix `mypy` reports bunch of inconsistencies in types and missing properties:
```bash
$ mypy torch_optimizer/sgd.py
torch_optimizer/sgd.py:14: error: Too many arguments for "__init__" of "Optimizer"
torch_optimizer/sgd.py:17: error: "__setstate__" undefined in superclass
torch_optimizer/sgd.py:19: error: Return type "Optional[float]" of "step" incompatible with return type "None" in supertype "Optimizer"
torch_optimizer/sgd.py:24: error: "SGD" has no attribute "param_groups"
Found 4 errors in 1 file (checked 1 source file)
```
with fix not issues:
```bash
$ mypy torch_optimizer/sgd.py
Success: no issues found in 1 source file
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32900
Differential Revision: D19697175
Pulled By: ezyang
fbshipit-source-id: d5e2b3c421f69da3df8c32b3d53b4b6d15d61a41
Summary:
Add `torch.jit.is_scripting` to the list of CondValues, i.e. values that, if they are an input to an if statement, cause us to compile only one side of the if. I'm not sure if we actually want this PR. (A sketch of the resulting idiom follows the pros/cons below.)
Pros:
- Makes it easier to add features that are not yet supported in TorchScript (like has_torch_function)
- The current idiom of writing `torch.jit.is_scripting` and factoring out the block to a function annotated with `torch.jit.ignore` is functionally equivalent and much more cumbersome
Cons:
- Makes it easier to add features that are not yet supported in TorchScript
- Perhaps is confusing as a reader what is being compiled. Potentially could give all caps name or otherwise change name to make it more visually stand out.
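A sketch of the resulting idiom (the helper is illustrative; it stands in for logic TorchScript cannot compile):
```python
import torch

def python_only_helper(x):
    # stand-in for logic that TorchScript cannot compile
    return torch.from_numpy(x.numpy() + 1)

def fn(x):
    if torch.jit.is_scripting():
        return x + 1                   # only this branch is compiled when scripting
    else:
        return python_only_helper(x)

scripted = torch.jit.script(fn)        # compiles without touching the helper
```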
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32871
Differential Revision: D19670383
Pulled By: eellison
fbshipit-source-id: 5257b0bd23c66f199d59a7f2c911e948301e5588
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32847
Add support for join on List of strings in TorchScript.
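A small usage sketch of the newly supported builtin (function name is illustrative):
```python
from typing import List

import torch

@torch.jit.script
def join_csv(xs: List[str]) -> str:
    return ", ".join(xs)

print(join_csv(["a", "b", "c"]))   # "a, b, c"
```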
Test Plan:
(pytorch) smummadi@smummadi-mbp pytorch % python test/test_jit_string.py
Fail to import hypothesis in common_utils, tests are not derandomized
.
Ran 1 test in 1.090s
OK
Differential Revision: D19650809
fbshipit-source-id: 387a8f0e3cc3111fd3dadd3d54c90fc8c7774cf9
Summary:
Closes https://github.com/pytorch/pytorch/issues/27368.
Previously, if a function `func` did not exist on worker A but existed on B, and the user ran `rpc.rpc_sync(A, func)`, A would crash with a segmentation fault since it is not able to find the function. B would eventually time out since RPCs by default time out in 60s.
At the root this comes from an unhandled exception when trying to deserialize the `PythonUDF` to run.
This PR makes it so that we can recover from this error, and A reports back a `RemoteException` to B indicating that the function was not found. Now, A will no longer crash and B can handle the exception appropriately and with more information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32726
Differential Revision: D19648825
Pulled By: rohan-varma
fbshipit-source-id: 53847f4bfb68187db41c61d69ddac13613e814b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32749
The test was flaky since the message from owner RRef confirming fork would arrive after the test checked whether the pending User RRefs map was empty - leading to an assertion error. This diff creates a utility function that should be used by any test to wait for this message to complete processing before doing any assertions related to the pending User RRefs map.
GitHub Issue: https://github.com/pytorch/pytorch/issues/30988
Test Plan: Stress tested `test_rref_context_debug_info` 200 times.
Differential Revision: D19612289
fbshipit-source-id: 57a7c19b1cf792b94c263d3efbbbb6da60c07d07
Summary:
Power and x86 are giving slightly different results when scaling images up using `torch.nn.functional.interpolate` and when using OpenCV's `resize`. This is causing `test_upsampling_not_recompute_scale_factor` to fail on Power, but not x86. This changes the expected value to what OpenCV on Power produces if the test case is running on Power as well.
See https://github.com/pytorch/pytorch/issues/31915
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32786
Differential Revision: D19672053
Pulled By: ezyang
fbshipit-source-id: 3497f852bdc6d782646773792f9107c857c7b806
Summary:
If there was a namedtuple with immutable constant inputs that was also the input/output of a function which expected a namedtuple, it would fail. Fix by using the namedtuple constructor on serialization. (No one has run into this bug yet.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32873
Differential Revision: D19668807
Pulled By: eellison
fbshipit-source-id: bae33506e53b6a979b4e65a3e7c989b1408c98f4
Summary:
This PR solves Issue https://github.com/pytorch/pytorch/issues/32750 (a short repro sketch follows the list).
- Changes prod_kernel_impl to use the `out_t` argument instead of `scalar_t` (which caused garbage output for an FP16 input with an FP32 output tensor type).
- Adds a test case for `torch.prod` (for CUDA): tests both `torch.prod` and `torch.Tensor.prod`. Checks all combinations of the dtypes `torch.float16` and `torch.float32`.
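A short repro sketch of the previously broken combination (assumes a CUDA device; shapes are arbitrary):
```python
import torch

# fp16 input reduced into an fp32 result used to produce garbage values.
x = torch.randn(10, device="cuda", dtype=torch.float16)
print(torch.prod(x, dtype=torch.float32))
print(x.prod(dtype=torch.float32))
```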
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32831
Differential Revision: D19664666
Pulled By: ngimel
fbshipit-source-id: c275363355c832899f10325043535949cd12b2f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32738
This is to simplify the codegen layer, with the goal of making it simple enough to just check in.
Test Plan: Imported from OSS
Differential Revision: D19610927
Pulled By: gchanan
fbshipit-source-id: 760734f579b1f655775e6d270918c361985f3743
Summary:
To suppress a clang-tidy warning:
torch/csrc/jit/script/builtin_functions.cpp#L89
[performance-for-range-copy] warning: loop variable is copied but only
used as const reference; consider making it a const reference
Also make the const qualifier of scalar explicit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32852
Differential Revision: D19663277
Pulled By: ezyang
fbshipit-source-id: f4ec5688d3cbea9a5f40db6063b7d111b0bf0cce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32849
We learned that Android NDK's gcc + gnustl combination might produce a
use-after-free for thread_local variables with non-trivial destructors.
This PR removes such a thread_local use case from error_report.cpp for mobile build,
which is the only case included in mobile lite-JIT build.
ghstack-source-id: 97491327
Test Plan: - CI
Reviewed By: dreiss
Differential Revision: D19652702
fbshipit-source-id: ee8d316ad5c6e6c8a8006eb25f3bba1618dd7e6d
Summary:
I didn't see any use case where the functor of `gpu_kernel_with_index` needs to have arguments other than the index. Merge conflict with https://github.com/pytorch/pytorch/pull/32755.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32777
Differential Revision: D19646381
Pulled By: ngimel
fbshipit-source-id: 81d2be74170457e39943274e3689845e83758bfa
Summary:
The Python document <https://www.python.org/dev/peps/pep-0263/> gives
all examples using lowercase letters. Although it doesn't say
straightly, the following paragraph seems to indicate that uppercase
letters aren't legitimate:
> If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'. Any other encoding will cause an error.
My Emacs also complains about the uppercase letters every time I save
the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32850
Differential Revision: D19663281
Pulled By: ezyang
fbshipit-source-id: 48127d3c2fd6e22dd732a2766913735136ec2ebc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32771
It's a patch to #32621, making the API private.
Test Plan: Imported from OSS
Differential Revision: D19657307
Pulled By: iseeyuan
fbshipit-source-id: e604a0cbed6a1e61413daaafc65bea92b90f1f5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32753
Functions to be bound as ATen operators cannot have a Python dependency.
This refactors the code to remove the Python dependency.
ghstack-source-id: 97485800
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_functions_not_supported
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_functions_not_supported
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```
Differential Revision: D5741675
fbshipit-source-id: 31ee60955be8d815d0773f3699e3ff2f1f9d8849
Summary:
Make batch norm with empty inputs return zero parameter gradients. Batch norm, group norm, and convolutions now return zero grads for parameters, so make the tests check that. Fixes some bullet points in https://github.com/pytorch/pytorch/issues/12013 (interpolate is not fixed by this PR; it is being fixed in other PRs).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32820
Differential Revision: D19651470
Pulled By: ngimel
fbshipit-source-id: 96fdd085f9b0e98e91217dd2ac1f30f9c482b8be
Summary:
Remove `needs_dynamic_casting` from TensorIterator and move it to `Loops.cuh`.
The original design of `needs_dynamic_casting` is fundamentally flawed: it injects logic into TensorIterator and uses a bunch of boolean values to test whether dynamic casting is needed. This makes it very fragile, as TensorIterator is so complicated that it is easy to introduce unnecessary dynamic casts. It also makes `gpu_kernel` very inflexible; different cases need to manipulate TensorIterator to make it work.
For example, currently
```python
torch.zeros(10, device='cuda').mul_(0.9)
```
needs a dynamic cast, but it shouldn't.
Testing whether dynamic casting is needed could be easy: just compare the dtypes of the lambda with the dtypes of the operands. If they don't match, dynamically cast; otherwise, don't.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32755
Differential Revision: D19644092
Pulled By: ngimel
fbshipit-source-id: 130bb8bd78d20c2ed1bdfc9d9fb451eb0f0c7e55
Summary:
Should fix https://github.com/pytorch/pytorch/issues/29744 by falling back to the native batch norm implementation if cuDNN cannot execute the provided shape.
Shape numbers were verified for cudnn 7.6.5.32 with tensor shapes:
```python
# for spatial bn
x = torch.Size([880801, 256, 5])
x = torch.Size([65535, 256, 5])
x = torch.Size([880801, 64, 4, 4])
x = torch.Size([65535, 64, 4, 4])
# for per-act bn
x = torch.Size([131070, 2048])
x = torch.Size([262136, 2048])
```
for `training()` and `eval()` mode using `torch.float32` and `torch.float16`.
I've increased the shape of our current smoke test to, but I can also add all use cases of the support matrix, if wanted.
CC ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32763
Differential Revision: D19644328
Pulled By: ngimel
fbshipit-source-id: c2151bf9fe6bac79b8cbc69cff517a4b0b3867aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32843
Fix the CI by skipping aten::join.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D19650584
fbshipit-source-id: 4446eef568ded334217ff9205a795daffebe41a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32734
VariableTensorId is the only key with this treatment today,
but BackendSelect and CompoundOp are coming soon.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628091
Pulled By: ezyang
fbshipit-source-id: 250753f90528fa282af7a18d8d2f7736382754bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32729
When working on the vmap prototype I noticed that this was helpful
as it lets me easily initialize a no-op guard, if I need to do it
at constructor time (which I usually do, because the guards don't
have move constructors).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628092
Pulled By: ezyang
fbshipit-source-id: d6259a3f70d287cdac2e4a5f3984e2880f19bdc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32728
It doesn't have much to do with tensors anymore.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628093
Pulled By: ezyang
fbshipit-source-id: 4d57111cdf44ba347bec8a32bb5b4b47a83c1eaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32807
After this commit, RRefContext no longer depends on pybind.
Test Plan: Imported from OSS
Differential Revision: D19636316
Pulled By: mrshenli
fbshipit-source-id: 88faa101c32e9019e979ae8e5da6706e49842726
Summary:
This PR updates how RNNs handle their "flat weights." In particular, it allows for only some flat weights to be "materialized" when apply is called, and it updates the flattening behavior to only apply if all flat weights are (1) materialized, (2) share a dtype and (3) are acceptable to cuDNN.
One test is modified and another created to test these changes. One practical effect of this change is that weight norm can be successfully applied to a module BEFORE that module is moved to an accelerator. Previously doing so would throw an error.
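A small sketch of the ordering this enables (assumes a CUDA device; the layer sizes are arbitrary):
```python
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20)
rnn = nn.utils.weight_norm(rnn, name="weight_hh_l0")  # applied before moving to the GPU
rnn = rnn.cuda()                                      # previously this ordering raised an error
out, _ = rnn(torch.randn(5, 3, 10, device="cuda"))
```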
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32563
Differential Revision: D19602725
Pulled By: mruberry
fbshipit-source-id: d8f9441d17815c8c9ba15b256d4be36f784a3cf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32574
Previously, we ignored alias annotations when deriving argument mutability
and instead recognized particular signature patterns (in-place, out variant)
and assigned mutability accordingly. Op signatures that didn't fit these
patterns would error (e.g. see #30526, which this fixes).
No change in the generated binding code.
Code changes:
1. in function_wrapper.py, fix the mutability derivation logic used when creating an argument's c++ type property. Note that we temporarily need to trap a special case and apply the old logic, see code comment for details.
2. in gen_jit_dispatch.py, update logic that assumed only one mutable Tensor argument per declaration. Happily this mostly was accomplished by bypassing some now-redundant signature regeneration machinery. Another special case here requires that we keep the old machinery around temporarily.
Test Plan: Imported from OSS
Differential Revision: D19564875
Pulled By: bhosmer
fbshipit-source-id: 5637a9672923676d408c9586f3420bcc0028471a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29986
Previously in addition to generating a python binding for each op,
we would generate an almost-trivial helper for each overload.
This PR eliminates the helpers, simplifying codegen logic a bit and
reducing the source-level indirection by a step.
Perf should be unchanged.
codegen diff: 1f2f07fb60
Note: in the interests of keeping the diff contained, there's only
some light cleanup here beyond what's necessary for the codegen changes.
Plan is to do some more substantial refactoring in followup PRs that
leave generated code unchanged.
Test Plan: Imported from OSS
Differential Revision: D18567980
Pulled By: bhosmer
fbshipit-source-id: eb9a81babb4489abd470842757af45580d4c9906
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32785
Add PythonRpcHandler::handleExceptionWithGIL() so that in PyRRef::localValue()
we don't need to release the GIL and re-acquire it on the following line.
ghstack-source-id: 97418465
Test Plan: existing test coverage
Differential Revision: D19626195
fbshipit-source-id: db694d04b078811f819626789e1e86f1b35adb5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32757
This PR updates the main quantize_dynamic API to use QNNPACK backend for mobile
Test Plan:
python test/test_quantization.py PostTrainingDynamicQuantTest.test_quantized_rnn
Imported from OSS
Differential Revision: D19632220
fbshipit-source-id: b4c51485c281d088524101b97c84dd806438b597
Summary:
When using scripting, there was an error when attempting to access a
specific element from within the size tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32652
Reviewed By: hl475
Differential Revision: D19610726
Pulled By: houseroad
fbshipit-source-id: bca49927bbe71dbe7e7d7edf301908fe79e089b5
Summary: Add support for join on List of strings in TorchScript.
Test Plan:
(pytorch) smummadi@smummadi-mbp pytorch % python test/test_jit_string.py
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 1.090s
OK
Differential Revision: D19611800
fbshipit-source-id: cef66356abc14dfd100a806d25dd1a8bc9af0a11
Summary:
When running ctr_mbl_feed, we've encountered a hang issue related to the Zeus-based rendezvous handshake. It was mitigated by this diff https://our.intern.facebook.com/intern/diff/D19167151/.
This diff resolves the race condition by adding a reference to the rendezvous handler.
Test Plan: x7340282797
Reviewed By: yifuwang
Differential Revision: D19627293
fbshipit-source-id: 560af289db8ef6cf8d6f101f95ec27d5a361fd04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32745
Some parameters (like `bias` in conv) are optional. To achieve this
previously, you had to add `bias` as a constant, which would invoke some
pretty weird behavior in the frontend, summarized as:
```
if bias is not None:
add it as a parameter normally
else: # bias is None
add it as a constant with the value None
```
There are several things bad about this:
1. Bias is not a constant. Marking it `__constants__` is confusing.
2. It basically relies on an implementation detail (the frontend
processes parameters before constants) to work.
Okay, whatever. I don't even know why we did this originally, but
getting rid of it doesn't break anything, so I assume improved NoneType
refinement has made this a non-issue.
Note on perf: this will make no difference; if bias was `None` it's still
folded out today, if bias is a Tensor it would be added as a parameter
both before and after this change
Test Plan: Imported from OSS
Differential Revision: D19628634
Pulled By: suo
fbshipit-source-id: d9128a09c5d096b938fcf567b8c23b09ac9ab37f
Summary:
Resubmitting https://github.com/pytorch/pytorch/issues/32612 after a merge gone wrong. Enables convolution with an empty batch or an empty number of channels for all flavors of convolution (grouped convolution, convTranspose). Would make https://github.com/pytorch/pytorch/issues/31658 unnecessary. Also returns zero gradients for the parameters, which is necessary for correct DDP operation.
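An illustrative sketch of the now-supported case (layer sizes and shapes are arbitrary):
```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3)
x = torch.empty(0, 3, 8, 8, requires_grad=True)   # empty batch
out = conv(x)                                      # shape: (0, 8, 6, 6)
out.sum().backward()                               # parameter grads are zeros, not None
print(conv.weight.grad.abs().sum())
```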
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32709
Differential Revision: D19627968
Pulled By: ngimel
fbshipit-source-id: 7359759bd05ff0df0eb658cac55651c607f1b59f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32683
Pull Request resolved: https://github.com/pytorch/glow/pull/4079
Similar to D17768404, we changed the 8-bit fused version of the EmbeddingBag operator to add the option to include the last offset and to parallelize the op.
ghstack-source-id: 97404645
Test Plan:
To generate the AVX2 code (`embedding_lookup_fused_8bit_rowwise_idx_avx2.cc`):
```
python hp_emblookup_codegen.py --fused --use-offsets
```
To test the correctness:
```
buck test //caffe2/torch/fb/sparsenn:test -- test_embedding_bag_byte_rowwise_offsets --print-passing-details
```
Reviewed By: yinghai
Differential Revision: D19592761
fbshipit-source-id: f009d675ea3f2228f62e9f86b7ccb94700a0dfe0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32704
-Werror is too aggressive a check for the cpp extensions tests because it fails even on deprecation warnings that are included from the core codebase.
Fixes #32136
Test Plan: Imported from OSS
Differential Revision: D19620190
Pulled By: pbelevich
fbshipit-source-id: 0e91566eb5de853559bb59e68a02b0bb15e7341b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32116
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579875
Pulled By: ezyang
fbshipit-source-id: 00393c9dc101967c79231bfae36b23b7b80135fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32114
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579876
Pulled By: ezyang
fbshipit-source-id: d09a231ba891403a06eae0c2203e0ad7dd6d3a12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32112
It turns out we already removed these from the CPU version; copy
the changes over.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579874
Pulled By: ezyang
fbshipit-source-id: e40efbf94e128fd81421b227b76dd9c9c0256d96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32727
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19621858
Pulled By: ezyang
fbshipit-source-id: 5112c849252478d8249de4f8c8c5a2d6caf60672
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32557
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579853
Pulled By: ezyang
fbshipit-source-id: 45f83a7a5ead0344e4c13526abb5fafdedaed4a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32533
Applies renames based on comments in #32439. I also updated some
other documentation and variable names while I was at it.
Fixes #32435.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579854
Pulled By: ezyang
fbshipit-source-id: 85021a92a2a84501f49ee5c16318f81f5df64f8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32043
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19621910
Pulled By: ezyang
fbshipit-source-id: dce00a56ff679548fd9f467661c3c54c71a3dd4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32748
This is a follow-up to PR #30630: we need to hold the GIL when calling jit::toPyObject(), and some bound functions need to be tagged with a GIL release if the underlying C++ code acquires the GIL itself. So:
1. PyRRef::to_here() and PyRRef::local_value() now acquire the GIL
2. PyRRef::pickle() and PyRRef::unpickle() are tagged with a GIL release
3. request_callback_impl also acquires the GIL as needed
4. typeParser uses the cached jitCompilationUnit_, which is also cleaned up in the cleanUp() function
ghstack-source-id: 97373011
Test Plan: unit test
Differential Revision: D19612337
fbshipit-source-id: 4d09f9b52ba626545ae7d31fea6b671301ed3890
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32567
This is a first change to support ProGuard.
Even if these methods are never called from Java, we register them at the JNI level, and this registration will fail if the methods are stripped.
Adding DoNotStrip to all native methods that are registered in OSS.
Once consumerProguardFiles is integrated in fbjni so that ProGuard does not strip DoNotStrip methods, this will fix errors with ProGuard turned on.
Test Plan: Imported from OSS
Differential Revision: D19624684
Pulled By: IvanKobzarev
fbshipit-source-id: cd7d9153e9f8faf31c99583cede4adbf06bab507
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)
Changes:
[x] Fixed performance issue raised in https://github.com/pytorch/pytorch/issues/30704 so that non-complex numbers do not call `conj()` and `real()`.
[x] Fixed tensor_to_numpy() conversion likely broken by a `checkBackend()` in https://github.com/pytorch/pytorch/issues/27064.
[x] Fixed some ReduceOps and TensorCompare Ops that recently added a `checkBackend()`.
- `checkBackend()` is replaced with a device type check and a layout check.
- This ensures the ComplexCPU Type ID is supported.
[x] Added AVX support for complex `exp()`, as requested in https://github.com/pytorch/pytorch/issues/755
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30871
Differential Revision: D19200726
Pulled By: ezyang
fbshipit-source-id: d7e1be0b0a89c5d6e5f4a68ce5fcd2adc5b88277
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32325
The purpose of this PR is to enable PyTorch dispatching on `at::Generator*` parameters and demonstrate how it can be used in cpp extensions to implement custom RNG.
1. `CustomRNGKeyId` value added to DispatchKey enum and `DispatchKeySet key_set_` added to `at::Generator`
2. The overloaded `operator()(at::Generator* gen)` added to MultiDispatchKeySet.
3. The existing CPUGenerator and CUDAGenerator classes are supplied with CPUTensorId and CUDATensorId dispatch keys
4. The implementation of CPU's `cauchy_kernel`(as an example, because it's already moved to ATen) was templatized and moved to `ATen/native/cpu/DistributionTemplates.h` to make it available for cpp extensions
5. Minor CMake changes to make native/cpu tensors available for cpp extensions
6. A RegisterCustomRNG test demonstrates how a CustomCPUGenerator class can be implemented and how a custom_rng_cauchy_ native function can be registered to handle Tensor::cauchy_ calls.
Test Plan: Imported from OSS
Differential Revision: D19604558
Pulled By: pbelevich
fbshipit-source-id: 2619f14076cee5742094a0be832d8530bba72728
Summary:
This code is implemented twice in different places by different people; we should merge the implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32730
Differential Revision: D19622023
Pulled By: ezyang
fbshipit-source-id: a9cbda31428b335bf28a7e4050f51f58e787b94f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32659
Applies linter to RPC test files so that we can use linter shortcuts
without getting unnecessary changes to the whole file.
ghstack-source-id: 97361237
Test Plan: No actual changes.
Differential Revision: D19584742
fbshipit-source-id: a11ce74ee0e2817e6f774fff7c39bcab06e99307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32657
The goal here is to simplify the codegen enough that we can just handwrite the bindings, so anything in here is "bad".
Test Plan: Imported from OSS
Differential Revision: D19584521
Pulled By: gchanan
fbshipit-source-id: 93005b178228c52a1517e911adde2e2fe46d66a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32722
Checked using [this](https://godbolt.org/z/uAaE9R) that it gives the correct assembly.
Test Plan: Imported from OSS
Differential Revision: D19610012
Pulled By: albanD
fbshipit-source-id: 4d1cb812951ae03d412a0fba3c80730f0d286e1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32326
Now that we have type-level granularity we can improve `mayContainAlias` queries. Each new value is initialized as containing the wildcard set of each contained mutable type. Whenever a value is added to a container, it is set to the wildcard set. Now, to check whether any two values contain overlapping values, we can just check whether their `containedMemoryLocations` overlap.
Test Plan: Imported from OSS
Differential Revision: D19563262
Pulled By: eellison
fbshipit-source-id: c6d7489749c14b2054a6d50ef75baca699ada471
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32251
Previously wildcard sets were associated by TypeKind, meaning all Lists were in one alias set, all Classes were in one alias set, etc. We can improve analysis by bucketing wildcard sets by TypePtr instead. Any two mutable types which can unify should be in the same wildcard set bucket.
This also allows us to do much simpler `mayContainAlias` analysis, and it also improves `analyzeConservative` analysis because now we can recurse through all contained memory locations and mark writes, instead of recursing only one level deep into contained elements.
Test Plan: Imported from OSS
Differential Revision: D19563263
Pulled By: eellison
fbshipit-source-id: 371a37d1a8596abc6c53f41c09840b6c140ea362
Summary: ATT. Since the infra is there.
Test Plan: run it
Reviewed By: amylittleyang
Differential Revision: D19605250
fbshipit-source-id: c68be4d7963afa4fa5f8f60c90f1913605eae516
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32501
This diff will address https://github.com/pytorch/pytorch/issues/24699
We require the input `lambda` to be >= 0, to be the same as https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.exponential.html#numpy-random-exponential. This check did not exist in the previous implementation.
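A short illustrative sketch of the intended contract (the Python argument is spelled `lambd`); the negative-rate line is an assumption about how the new check surfaces:
```
import torch

x = torch.empty(128, 512)
x.exponential_(lambd=1.5)      # fills x with Exponential(rate=1.5) samples
# x.exponential_(lambd=-1.0)   # with this check, a negative rate is expected to be rejected
print(x.mean())                # roughly 1 / 1.5
```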
Benchmark I am using PT operator microbenchmark
```
================================================================================
Before the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: exponential_
# Mode: Eager
# Name: exponential__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 21311.746
================================================================================
After the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: exponential_
# Mode: Eager
# Name: exponential__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 20919.914
================================================================================
```
Test Plan: Sandcastle and Github tests
Reviewed By: BIT-silence
Differential Revision: D19518700
fbshipit-source-id: 0e79cb6a999c1278eb08b0d94cf61b119c85a36c
Summary:
Included the ONNX model checker code in the ONNX export;
this will force the ONNX checker to run for all models that get exported.
This should help with validating exported models.
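A minimal sketch of the user-visible effect, assuming a standard export call; the model and output file name are placeholders:
```
import torch
import torch.nn as nn

# Exporting a model now also runs the ONNX checker on the exported result.
model = nn.Linear(3, 2)
torch.onnx.export(model, torch.randn(1, 3), "linear.onnx")
```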
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32298
Reviewed By: hl475
Differential Revision: D19538251
Pulled By: houseroad
fbshipit-source-id: eb20b124fe59200048f862ddaf20f6c59a0174d5
Summary:
This method is pretty hot. In an internal workload, this single
call to at() accounted for ~2% of overall cycles.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31627
Reviewed By: yinghai
Differential Revision: D19607779
Pulled By: qizzzh
fbshipit-source-id: 1684919049a35fdad686d8396c7dce7243ab92d4
Summary:
Stacked PRs
* #32244 - Make zip serialization the default
* **#32241 - Split serialization tests to their own file**
This makes them all easier to run as a batch. This PR is just a code move / fixing up imports. There are still some serialization tests in `test_torch.py` as part of `TestDeviceType`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32241
Pulled By: driazati
Differential Revision: D19415826
fbshipit-source-id: a3f6cfe1626ff2f9b9631c409bf525bd32e4639b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32675
It's good to have one location to do the mapping.
Test Plan: Everything still runs.
Reviewed By: amylittleyang
Differential Revision: D19590354
fbshipit-source-id: d8c0d14e4bdf27da3e13bd4d161cd135d6e3822b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32656
Fixes these flaky tests.
Test Plan: Run the test 500 times and verify that it succeeds every time.
Differential Revision: D19584453
fbshipit-source-id: 07cbc4914211f274182ac0fa74bb5ef6d43392d1
Summary:
Both `test_wait_all_workers` and `test_wait_all_workers_and_shutdown` test the same pattern of initializing RPC, calling `_wait_all_workers`, and calling `rpc.shutdown(graceful=False)`.
`test_wait_all_workers` seems to be more thorough since it tests one worker driving and the others waiting on it as well.
We shouldn't have duplicate tests, so this removes `test_wait_all_workers_and_shutdown`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32588
Differential Revision: D19566294
Pulled By: rohan-varma
fbshipit-source-id: b69519d169b3964649d47ad75532bda5de538241
Summary:
Done by just editing `.circleci/cimodel/data/dimensions.py` to include `3.8` and then regenerated using `.circleci/regenerate.sh`
cc kostmo, mingbowan, ezyang, soumith
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31948
Differential Revision: D19602069
Pulled By: seemethere
fbshipit-source-id: ac57fde9d0c491c7d948a3f5944c3cb324d403c0
Summary:
This handles a corner case where a user schedules a second bailout after the first one and the first one doesn't fire.
Alternatively, we could go back to the implementation that uses a hash set to remember the indices of bailouts that need to fire.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32672
Differential Revision: D19596872
Pulled By: Krovatkin
fbshipit-source-id: 41dcc374cd2501ac20a9892fb31a9c56d6640258
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32621
Export the "_save_for_mobile" method to Python so that the bytecode format for lite interpreter can be added or updated to the original script model.
It's the first step of python binding for lite interpreter, as discussed in this [internal post](https://fb.workplace.com/groups/1144215345733672/permalink/1478900738931796/) and offline.
The next step is to export the load_for_mobile and run methods of the mobile module, so that users can verify the mobile model from Python.
Test: use the following python script to display the bytecode part of the updated model file.
```
#!/usr/bin/env python3
import sys
import pickle
import pprint
import zipfile

class FakeObject(object):
    def __init__(self, module, name, args):
        self.module = module
        self.name = name
        self.args = args
        self.state = None

    def __repr__(self):
        state_str = "" if self.state is None else f"(state={self.state!r})"
        return f"{self.module}.{self.name}{self.args!r}{state_str}"

    def __setstate__(self, state):
        self.state = state

class FakeClass(object):
    def __init__(self, module, name):
        self.module = module
        self.name = name
        self.__new__ = self.fake_new

    def __repr__(self):
        return f"{self.module}.{self.name}"

    def __call__(self, *args):
        return FakeObject(self.module, self.name, args)

    def fake_new(self, *args):
        return FakeObject(self.module, self.name, args)

class DumpUnpickler(pickle._Unpickler):
    def find_class(self, module, name):
        return FakeClass(module, name)

    def persistent_load(self, pid):
        return FakeObject("pers", "obj", (pid,))

def main(argv):
    zfile = zipfile.ZipFile(argv[1])
    names = [i for i in zfile.namelist() if "bytecode.pkl" in i]
    if not names:
        print("bytecode.pkl not found.")
        return
    with zfile.open(names[0], "r") as handle:
        value = DumpUnpickler(handle).load()
        pprint.pprint(value)

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```
Test Plan: Imported from OSS
Differential Revision: D19596359
Pulled By: iseeyuan
fbshipit-source-id: 19a4a771320f95217f5b0f031c2c04db7b4079a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32642
Previously, if we defined `__setstate__` but not `__getstate__`, we
would segfault. This PR turns that into a comprehensible error message
(and improves another error message as well).
Fixes https://github.com/pytorch/pytorch/issues/25886
Test Plan: Imported from OSS
Differential Revision: D19596463
Pulled By: suo
fbshipit-source-id: dbe76bc36bc747d65fb0223184c009e0e9ba072c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32653
This test was flaky since the watchdog thread could abort the
communicator instead of the thread calling `wait()`. As a result, we could
actually see `NCCL error` instead of `Operation timed out` on the user end.
ghstack-source-id: 97250714
Test Plan: waitforbuildbot
Differential Revision: D19583003
fbshipit-source-id: 5c07326d1a16f214dcdbabed97ca613e0a5b42b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32635
With the source of truth for the current RPC agent moved to the C++ world, there is no point in passing the current RPC agent from the Python world to the C++ world.
ghstack-source-id: 97293316
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_process_group_debug_info
```
Differential Revision: D5703519
fbshipit-source-id: ef7c28bdb1efd293eb6cafe0b0fca7ca80fa08a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32633
There were 2 sources of truth for the current RPC agent.
- One is in the Python world, `torch.distributed.rpc.api._agent`.
- The other is in the C++ world, `RpcAgent::defaultRpcAgent_`.
Setting the Python `_agent` to `None` does not necessarily reset the C++ `defaultRpcAgent_` to `nullptr`.
i.e.
```
torch.distributed.rpc.api._agent = None
```
does not translate to
```
RpcAgent::defaultRpcAgent_ = nullptr
```
This PR is to remove this ambiguity, and use the C++ pointer as source of truth.
The solution is to leverage a pybind11 behavior that it implicitly casts C++ `shared_ptr<RpcAgent>(nullptr)` to Python `None`.
ghstack-source-id: 97293315
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_duplicate_name
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_process_group_debug_info
```
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_remote_module
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_embedding
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_pairwise_attention_pooling
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_rpc
```
Differential Revision: D5733066
fbshipit-source-id: b3e6032ee975f19ca556497edbbf40b517b25be8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32624
We need this PR to resolve the issue mentioned in https://github.com/pytorch/pytorch/issues/31325#issuecomment-574918917.
The solution is that for each `_wait_all_workers()` call, a sequence ID is added to identify the different calls.
ghstack-source-id: 97277591
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_wait_all_workers
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_wait_all_workers
```
Differential Revision: D5739520
fbshipit-source-id: a64131e09c365179624700514422f5375afe803f
Summary:
This PR updates how RNNs handle their "flat weights." In particular, it allows for only some flat weights to be "materialized" when apply is called, and it updates the flattening behavior to only apply if all flat weights are (1) materialized, (2) share a dtype and (3) are acceptable to cuDNN.
One test is modified and another created to test these changes. One practical effect of this change is that weight norm can be successfully applied to a module BEFORE that module is moved to an accelerator. Previously doing so would throw an error.
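For illustration, a hedged sketch of the newly supported ordering; the module and weight name below are chosen for the example, not taken from the PR:
```
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Apply weight norm to one of the RNN's flat weights first, move the module after.
rnn = weight_norm(nn.LSTM(10, 20), name="weight_hh_l0")
if torch.cuda.is_available():
    rnn = rnn.cuda()  # previously this ordering raised an error

x = torch.randn(5, 3, 10, device=next(rnn.parameters()).device)
out, _ = rnn(x)
print(out.shape)
```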
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32563
Differential Revision: D19562258
Pulled By: mruberry
fbshipit-source-id: 4fef006e32cdfd8e3e3d519fc2ab5fc203dd7b36
Summary:
This PR adds support for 0-dim batch size input for `torch.nn.functional.interpolate` for various modes of interpolation.
Fixes part of gh-12013
CC: rgommers ezyang
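A quick sketch of the supported case (shapes chosen purely for illustration):
```
import torch
import torch.nn.functional as F

# An empty batch now passes through interpolate.
x = torch.randn(0, 3, 16, 16)
y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(y.shape)  # torch.Size([0, 3, 32, 32])
```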
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32400
Differential Revision: D19557090
Pulled By: ezyang
fbshipit-source-id: 6822f148bb47bfbcacb5e03798bf2744f24a2a32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32476
This makes the handling of FORWARD_AUTOGRAD_REQ in request_callback
nonblocking. Processing this message requires unwrapping the message with
autograd information, processing the original message, and sending back the
message with autograd information wrapped. This makes the processing of the
original message nonblocking by grabbing a future to it and marking the parent
future as completed when this one completes.
ghstack-source-id: 97221251
Test Plan: `test_rpc_spawn.py` and `test_dist_autograd_spawn.py` both pass.
Differential Revision: D19509501
fbshipit-source-id: 84ad2f9c5305ed11ed9bb0144b1aaf5f8698cd2b
Summary:
Changes the linspace functions to be more consistent as requested in https://github.com/pytorch/pytorch/issues/31991. The code has also been updated to avoid an early rounding error; the line `scalar_t step = (scalar_end - scalar_start) / static_cast<static_t>(steps-1)` can result in `step = 0` for integer scalars, and this gives unintended results. I examined the new output using
```
import torch
types = [torch.uint8, torch.int8, torch.short, torch.int, torch.long, torch.half, torch.float, torch.double]
print('Testing linspace:')
for type in types:
    print(type, torch.linspace(-2, 2, 10, dtype=type))
```
which returns
```
Testing linspace:
torch.uint8 tensor([254, 254, 254, 255, 255, 0, 0, 1, 1, 2], dtype=torch.uint8)
torch.int8 tensor([-2, -2, -2, -1, -1, 0, 0, 1, 1, 2], dtype=torch.int8)
torch.int16 tensor([-2, -2, -2, -1, -1, 0, 0, 1, 1, 2], dtype=torch.int16)
torch.int32 tensor([-2, -2, -2, -1, -1, 0, 0, 1, 1, 2], dtype=torch.int32)
torch.int64 tensor([-2, -2, -2, -1, -1, 0, 0, 1, 1, 2])
torch.float16 tensor([-2.0000, -1.5557, -1.1113, -0.6670, -0.2227, 0.2227, 0.6660, 1.1113,
1.5547, 2.0000], dtype=torch.float16)
torch.float32 tensor([-2.0000, -1.5556, -1.1111, -0.6667, -0.2222, 0.2222, 0.6667, 1.1111,
1.5556, 2.0000])
torch.float64 tensor([-2.0000, -1.5556, -1.1111, -0.6667, -0.2222, 0.2222, 0.6667, 1.1111,
1.5556, 2.0000], dtype=torch.float64)
```
which is the expected output: `uint8` overflows as it should, and the result of casting from a floating point to an integer is correct.
This PR does not change the logspace function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32218
Differential Revision: D19544224
Pulled By: ngimel
fbshipit-source-id: 2bbf2b8552900eaef2dcc41b6464fc39bec22e0b
Summary:
This test case had been using the tensor
```
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
```
which is not an invertible tensor and causes the test case to fail, even if magma gets initialized just fine. This change uses a tensor that is invertible, and the inverse doesn't include any elements that are close to zero to avoid floating point rounding errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32547
Differential Revision: D19572316
Pulled By: ngimel
fbshipit-source-id: 1baf3f8601b2ba69fdd6678d7a3d86772d01edbe
Summary:
The constructor of `nn.Parameter` has default values for `data` and `requires_grad`, but in the type stub there are no default values.
Resolve https://github.com/pytorch/pytorch/issues/32481
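For reference, a sketch of the call patterns the stub should now accept, mirroring the runtime defaults:
```
import torch
from torch.nn import Parameter

p1 = Parameter()                 # data defaults to an empty tensor, requires_grad to True
p2 = Parameter(torch.randn(3))   # requires_grad still defaults to True
print(p1.requires_grad, p2.requires_grad)
```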
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32617
Differential Revision: D19571397
Pulled By: ngimel
fbshipit-source-id: fd14298aa472b7575221229cecf5a56f8c84f531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32451
This PR adds a few new parameters to ATen codegen script:
```
1. op_registration_whitelist
Can be used to filter op registrations for selective build;
2. type_whitelist
Can be used to filter types (CPUType, CUDAType, ...) for selective build;
3. per_op_registration
When set it will group function registrations by op name and write to separate files;
```
1 & 2 are introduced for mobile custom build without relying on static dispatch;
3 is introduced to solve custom build with multi-library / multi-model (needed by FB
internal build - see more details: https://fb.quip.com/ZVh1AgOKW8Vv).
These flags should work independently of each other (and independently of USE_STATIC_DISPATCH).
Not setting them should have no effect compared to master.
ghstack-source-id: 97214788
Test Plan: - tested all 3 params with FB internal build changes.
Differential Revision: D19427919
fbshipit-source-id: a381fe5f768fe2e9196563787f08eb9f18316e83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32275
Currently TypeDerived (e.g. `CPUType::`) methods are declared and
defined in anonymous namespace as they are only called from c10
dispatcher - except for STATIC_DISPATCH mode, where they can be directly
called from Functions.h.
We plan to generate c10 op registration into separate files for internal
xplat/BUCK build, thus we need declare these methods in non-anonymous
namespace.
I feel it's easier to simply change it unconditionally, unless there are
some side effect I'm not aware of - `TypeDefault::` methods are in
non-anonymous namespace anyway.
ghstack-source-id: 97214789
Test Plan: - CI
Differential Revision: D19426692
Pulled By: ljk53
fbshipit-source-id: 44aebba15f5e88ef4acfb623844f61d735016959
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32466
It's follow-up work to https://github.com/pytorch/pytorch/pull/32197.
In https://github.com/pytorch/pytorch/pull/32197, `rpc.rpc_sync(..)` and `rpc.rpc_async(..)` support taking a TorchScript-annotated Python function as the user function for RPC.
This PR extends along this direction by making `rpc.remote(..)` support taking a TorchScript-annotated Python function as well.
ghstack-source-id: 97211168
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_function_exception
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_function_exception
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork -- test_backward_simple_script_call
buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```
Differential Revision: D19440633
fbshipit-source-id: d37f6dcdc0b80d35ac7bcba46ad6f9b831c3779b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32479
Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented for the linear operator.
Test Plan:
python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear
Imported from OSS
Differential Revision: D19542980
fbshipit-source-id: c9f6e5e8ded4d62ae0f2ed99e478c8307dde22ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32571
The watchdog thread would erase an element and call `it--` (implicitly
relying on `it++` in the for loop to position correctly). However, `it--`
would cause undefined behavior if the iterator is pointing to begin(). As a
result, I've modified the logic to update the iterator appropriately.
I've also enhanced the watchdog thread to catch and log exceptions.
ghstack-source-id: 97150763
Test Plan: waitforbuildbot
Differential Revision: D19551365
fbshipit-source-id: 426835819ad8d467bccf5846b04d14442a342f78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32548
As Title says.
ghstack-source-id: 97175523
Test Plan: CI
Differential Revision: D19541893
fbshipit-source-id: 96dce6964e6a89393d4159401a59672f041f51d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32371
After constants were added to ClassType, clone was not updated to
clone the constants; this PR adds that support.
Fixes: https://github.com/pytorch/pytorch/issues/32368
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D19564378
fbshipit-source-id: dbb13fb889d6ea9291034313b1f3c9aff4748bda
Summary:
It looks like the jit Future does not have a `wait()` anymore and this throws an error when trying to run this code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32336
Differential Revision: D19559922
Pulled By: rohan-varma
fbshipit-source-id: a5aa67990595e98e0682a20cf5aced17c2ae85bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32380
We'll clone the module first and then fold conv bn and return a new
module
Test Plan:
.
Imported from OSS
Differential Revision: D19508033
fbshipit-source-id: 328e91a2c9420761c904a7f2b62dab4cfaaa31ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32374
Moving all fold conv bn code to a class to prepare for making
it work with shared ClassType
Test Plan:
compiles
Imported from OSS
Differential Revision: D19508032
fbshipit-source-id: 4e9cf714111305d2b5474d4506507078f69f0c84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32556
Out of caution, avoid assuming that there's never a failure in a couple of
request_callback_impl case handlers, but rather propagate the error.
ghstack-source-id: 97128697
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D19544685
fbshipit-source-id: 67c55626960bd42a5b0dec7841e8ba44ab059eb9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31990
This PR does three things:
- Add a new `allow_rebase_history` flag to the differentiable views. If set, trying to rebase their history will raise an error.
- Make sure that the codegen functions verify this flag before doing inplace operations so that they fail before doing the inplace modification.
- Make sure the codegen functions set this flag properly when we don't support rebasing the history of the output.
The codegen change can be found [here](4bf180caa0).
Test Plan: Imported from OSS
Differential Revision: D19409649
Pulled By: albanD
fbshipit-source-id: a2b41c2d231e952ecfe162bdb6bad620ac595703
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32044
Fix the list of views in the codegen:
- Move `narrow` out of the autograd functions since it's now implemented with slice.
- Add `split_with_sizes` that was missing from the list
- Remove special formulas for both `split` and `split_with_sizes`. Both used not to be considered as views. When they are, all the rnn code breaks because it uses them in an invalid way. The generic formula will generate one `narrow` Node for each output. Which is always valid.
The diff for the generated code can be found [here](https://github.com/pytorch/pytorch/compare/16eff6e...albanD:06d6e85) (outdated for last commit)
Test Plan: Imported from OSS
Differential Revision: D19409648
Pulled By: albanD
fbshipit-source-id: 5ebc4c978af500403f7f008c0231b7db0cabab26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32525
Before calling the static code analyzer we need to link all bitcode files into
a single module. The current approach is a bit hacky: cmake still calls "ar"
to pack bitcode files into archives, then we manually unpack these
archives and call llvm-link.
It turns out libtorch_cpu.a contains a few files with the same name, e.g.:
```
aten/src/ATen/native/SoftMax.cpp
aten/src/ATen/native/mkldnn/SoftMax.cpp
```
"ar x" will only keep one of them and cause inaccurate analysis result.
Use this temporary hack to workaround the problem. Ideally should merge
this step into cmake (e.g. directly calling llvm-link to produce target
output?).
Differential Revision: D19530533
Pulled By: ljk53
fbshipit-source-id: 94b292c241abaaa0ff4a23059882abdc3522971e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32539
Before: if something in `_modules` was `None`, we would barf. This is
incorrect because it's allowed for users to put `None` there, in case a
module is optional.
This case ought to be handled correctly during scripting. Fixes https://github.com/pytorch/pytorch/issues/32469
Test Plan: Imported from OSS
Differential Revision: D19552346
Pulled By: suo
fbshipit-source-id: aba7fdc19fd84d195c81cdaca8a75013a8626a8b
Summary:
This API seems to be quite useful to make sure all bailouts in a graph are triggered. I used it for testing torchvision models and I was wondering if this might be something we might actually want to have? zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32518
Differential Revision: D19553147
Pulled By: Krovatkin
fbshipit-source-id: 7542c99051588b622091aec6d041c70731ca5d26
Summary:
## Commit Message:
Refactors Dockerfile to be as parallel as possible with caching and adds a new Makefile to build said Dockerfile.
Also updated the README.md to reflect the changes, as well as some of the verbiage around running our latest Docker images.
Adds the new Dockerfile process to our CircleCI workflows
## How to build:
Building the new images is pretty simple, just requires `docker` > 18.06 since the new build process relies on `buildkit` caching and multi-stage build resolving.
### Development images
For `runtime` images:
```
make -f docker.Makefile runtime-image
```
For `devel` images:
```
make -f docker.Makefile devel-image
```
Builds are tagged as follows:
```bash
docker.io/${docker_user:-whoami}/pytorch:$(git describe --tags)-${image_type}
```
Example:
```
docker.io/seemethere/pytorch:v1.4.0a0-2225-g9eba97b61d-runtime
```
### Official images
Official images are the ones hosted on [`docker.io/pytorch/pytorch`](https://hub.docker.com/r/pytorch/pytorch)
To do official image builds you can simply set the `BUILD_TYPE` variable to `official` and it will do the correct build without building the local binaries:
Example:
```
make -f docker.Makefile BUILD_TYPE=official runtime-image
```
## How to push:
Pushing is also super simple (And will automatically tag the right thing based off of the git tag):
```
make -f docker.Makefile runtime-push
make -f docker.Makefile devel-push
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32515
Differential Revision: D19558619
Pulled By: seemethere
fbshipit-source-id: a06b25cd39ae9890751a60f8f36739ad6ab9ac99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32569
If the dict's contained types cannot be inferred from its contents (for
example, `Dict[str, Tensor]` vs. `Dict[str, Optional[Tensor]]`), we must
explicitly annotate the type.
Also this removes some special handling that omits annotations on empty
containers that have the default type. It makes the code more complex
for not too much value, and was wrong for dicts anyway.
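An illustrative sketch (not from this PR) of a dict whose value type cannot be inferred from its contents and therefore needs an explicit annotation:
```
import torch
from typing import Dict, Optional

@torch.jit.script
def make_map(t: torch.Tensor) -> Dict[str, Optional[torch.Tensor]]:
    # The value type can't be inferred from the contents alone, so it is annotated.
    d: Dict[str, Optional[torch.Tensor]] = {"x": t, "y": None}
    return d

print(make_map(torch.ones(2)))
```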
Test Plan: Imported from OSS
Differential Revision: D19551016
Pulled By: suo
fbshipit-source-id: c529b112e72c10f509a6bc0f5876644caa1be967
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4049
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27477
We would like to add the intra-op parallelization support for the EmbeddingBag operator.
This should bring speedup for the DLRM benchmark:
https://github.com/pytorch/pytorch/pull/24385
Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals
import torch
import time
eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')
input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)
niter = 10000
s = time.time()
for _ in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```
The following results are single core on Skylake T6:
- Before our change (with the original caffe2::EmbeddingLookup)
time_per_iter 6.313693523406982e-05
GB/s 6.341517821789133
- After our change using the EmbeddingLookupIdx API which takes the offsets instead of lengths.
time_per_iter 5.7627105712890626e-05
GB/s 6.947841559053659
- With Intel's PR: https://github.com/pytorch/pytorch/pull/24385
time_per_iter 7.393271923065185e-05
GB/s 5.415518381664018
For multi-core performance, because Clang doesn't work with OMP, I can only see the single-core performance on SKL T6.
ghstack-source-id: 97124557
Test Plan:
With D16990830:
```
buck run mode/dev //caffe2/caffe2/perfkernels:embedding_bench
```
With D17750961:
```
buck run mode/opt //experimental/jianyuhuang/embeddingbag:eb
buck run mode/opt-lto //experimental/jianyuhuang/embeddingbag:eb
```
OSS test
```
python run_test.py -i nn -- TestNNDeviceTypeCPU.test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu
```
Buck test
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
OMP_NUM_THREADS=3 buck test mode/opt -c pytorch.parallel_backend=tbb //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets" --print-passing-details
```
Generate the AVX2 code for embedding_lookup_idx_avx2.cc:
```
python hp_emblookup_codegen.py --use-offsets
```
Differential Revision: D17768404
fbshipit-source-id: 8dcd15a62d75b737fa97e0eff17f347052675700
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30630
This removes the template and all the specializations it had in rpc; we
universally use IValue as the inner value since we support holding a Python
object inside an IValue.
This will also ensure that we have the correct type information when
creating the RRef: we use the return type from the schema when creating
UserRRef and OwnerRRef, which will enable IValue to always have the correct
type if the IValue is an RRef object (next PR)
Test Plan: Imported from OSS
Differential Revision: D19502235
fbshipit-source-id: 0d5decae8a9767e0893f3b8b6456b231653be3c5
Summary:
Capsule Type doesn't appear in the IR; it is purely used in the runtime. So we should not have to handle it in node hashing... Let's see if this breaks anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32540
Differential Revision: D19541357
Pulled By: eellison
fbshipit-source-id: 905ed9f89cf6d03b45ddb4fde02adfa149b477f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32260
This makes it so you can actually pass the custom class as an arg to ScriptFunctions
Test Plan: Imported from OSS
Differential Revision: D19424252
Pulled By: jamesr66a
fbshipit-source-id: c3530186619655781dedbea03c2ad321aaff1cb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32205
to be filled
Test Plan:
python test_jit.py
Imported from OSS
Differential Revision: D19508031
fbshipit-source-id: cbf03d34e52eae62595c34fde6ec645cb6744ad9
Summary:
There was a user who did this and it would seg fault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32503
Differential Revision: D19538481
Pulled By: eellison
fbshipit-source-id: dc3752028b9eff6ac88c025e8a2b5f8fd44ce32f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31531
As suggested by suo, add a unit test for torch.jit.export_opnames with an interface. A submodule is annotated as an interface and assigned to an instance, and then re-assigned to another instance. Make sure the operator names are also updated.
Test Plan: Imported from OSS
Differential Revision: D19539129
Pulled By: iseeyuan
fbshipit-source-id: 71a76ae7790cdd577618ca278afdb132727f08dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32295
Fix for https://github.com/pytorch/pytorch/issues/32045
Calling into the engine with the GIL can deadlock because:
- worker thread initialization acquires the GIL
- Any Node / hook can be a python function that will acquire the GIL
The choice was made here to raise an error, as one of the advantages of using cpp extensions with Python is being able to release the GIL. So we prefer to educate users to do it rather than doing it under the hood.
Test Plan: Imported from OSS
Differential Revision: D19430979
Pulled By: albanD
fbshipit-source-id: e43f57631885f12e573da0fc569c03a943cec519
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31126
The Gloo device creator registry is throwing a warning that confuses users - https://fb.workplace.com/groups/1405155842844877/permalink/3217491788277931/
Create a C10_DEFINE_SHARED_REGISTRY_WITHOUT_WARNING API to skip such warnings
Test Plan:
{F224342749}
Tested both `C10_DEFINE_SHARED_REGISTRY` and `C10_DEFINE_SHARED_REGISTRY_WITHOUT_WARNING`.
Make sure nothing breaks
Reviewed By: d4l3k
Differential Revision: D18904783
fbshipit-source-id: 0e0065d530956249a18325d4ed3cb58dec255d4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe2/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32271
Use the 2-stage EmbeddingSpMDM interface in D19425982 to reduce the overhead of code cache lookup and lock contention.
Fix an issue in sparse_lengths_sum_benchmarks that generated empty indices when the average length is small, like 1.
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D19425987
fbshipit-source-id: d5c5f0d46e0072403901809c31d516fa0f4b9b31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32448
Using binary search to compute the value for the given quantile among the input tensors.
Test Plan: Newly added unittests;
Reviewed By: jspark1105
Differential Revision: D19487604
fbshipit-source-id: 0dc6627b78d1310ac35b3f1d53b89cc89a697ece
Summary:
While putting finishing touches on the gradient scaling PR (https://github.com/pytorch/pytorch/pull/26512), I discovered my multi-GPU test (which uses `to()` to transfer tensors between devices) was intermittently failing with bad numerics. I knew it was going to be [a weird case from the start](https://www.imdb.com/title/tt8946378/quotes/qt4868203) and spent a week descending into madness. It turns out, for backward ops that create gradients on a different device from the device on whose stream the op is executed, the streaming backward synchronizations in [input_buffer.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/input_buffer.cpp#L46-L83) do not properly tell later ops to wait on the population/creation of those gradients. For example, a cross-device `to()` backward (CopyBackward Node) enqueues a cudaMemcpyAsync on the current stream of the source (incoming gradient's) device, then [syncs getCurrentCUDAStream on the destination device with the cudaMemcpyAsync](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/Copy.cu#L76). However, `input_buffer.cpp` in such cases ([case (3)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/input_buffer.cpp#L77-L81)) was not properly telling `opt_consumer_stream` to wait on the current stream of the destination device (`var`'s device).
Circumstances needed to repro in current master (see [my test](https://github.com/pytorch/pytorch/compare/master...mcarilli:backward_to_race_fix#diff-e68a7bc6ba14f212e5e7eb3727394b40R1901)):
- 2 devices, with non-default streams used for forward-pass ops on both devices (which is the default behavior in test_cuda.py)
- A `to()` that transfers a tensor requiring grad from one device to another
- A backward pass that routes back through to()'s backward (aka CopyBackward).
Under these circumstances, backward ops following CopyBackward on CopyBackward's destination device (aka the original forward-pass source device) race with the device-to-device transfer, and execute using partially-transferred data.
The present PR fixes the race condition and ensures that later ops wait on the CopyBackward transfer. This PR should also make streaming backward safe for other backward ops that span devices, as long as they play nice and populate any new gradients they create using the "current stream" of the device(s) on which they create those gradients.
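A hedged repro sketch of the pattern described above (tensor sizes and ops are placeholders; requires at least two CUDA devices):
```
import torch

# A cross-device to() in the forward pass makes CopyBackward span devices in backward;
# later backward ops on the source device must wait on the device-to-device copy.
if torch.cuda.device_count() >= 2:
    a = torch.randn(1024, 1024, device="cuda:0", requires_grad=True)
    b = (a * 2.0).to("cuda:1")
    loss = (b * 3.0).sum()
    loss.backward()
    print(a.grad.abs().sum())
```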
There are a couple minor issues where I'm not sure of the best approach:
- Should we guard onto the var's device for the entire body of InputBuffer::add?
- I'm fairly sure we need to `recordStream` on `var` if the consumer stream is different from the stream on which (we expect) `var` was created, but calling `c10::cuda::CUDACachingAllocator::recordStream` in input_buffer.cpp might break CPU-only builds. I couldn't find a different API call to record streams that seemed CPU-build-agnostic. Could I wrap the call with a macro?
Thanks to mruberry for helpful suggestions and also the organization/naming of the stream pool and streaming backward code that allowed me to (just barely) wrap my head around the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31930
Differential Revision: D19517617
Pulled By: mruberry
fbshipit-source-id: 183d5460aefa5d27366b465b0473b80ec80fa044
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32491
This PR enables IValue to hold a pure PyObject by adding a new enum tag
and a new jit_type to denote PyObject existence in IValue and the JIT type
system. We don't, and do not plan to, expose this to users.
This is the basic piece that enables IValue to be adopted more broadly,
like making RRef always hold an IValue; it might also simplify some
compiler logic.
ghstack-source-id: 97039980
Test Plan: Imported from OSS
Differential Revision: D19502234
fbshipit-source-id: 90be001706d707d376cfbea25980fd82980df84a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32475
As title
Test Plan: CI
Reviewed By: houseroad
Differential Revision: D19508778
fbshipit-source-id: fd9ad63607535980505d155f3e3c3b7c6b95daf7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32203
The type is needed for allowing multiple qconfig configurations for shared
ClassType, see next PR for more details
Test Plan:
.
Imported from OSS
Differential Revision: D19508027
fbshipit-source-id: a3df29dab3038bfa88c55dda98a3e8a78e99e5a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31841
Add Tuple Constants to JIT. The constraint here is that all elements of a tuple must themselves be insertable as a constant. Previously tuples were special-cased in constant propagation, but now that there are more passes that insert constants, such as freezing, we should just have tuples be representable as constants.
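A small hedged sketch of what this enables at the TorchScript level; whether the tuple shows up as a single prim::Constant depends on which passes run:
```
import torch
from typing import Tuple

@torch.jit.script
def scale(x: torch.Tensor) -> torch.Tensor:
    # A tuple whose elements are all constants can now be treated as a constant itself.
    factors: Tuple[int, int] = (2, 3)
    return x * factors[0] + factors[1]

print(scale(torch.ones(2)))  # tensor([5., 5.])
print(scale.graph)
```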
Test Plan: Imported from OSS
Differential Revision: D19439514
Pulled By: eellison
fbshipit-source-id: 3810ba08ee349fa5598f4b53ea64525996637b1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31840
The next PR in this stack makes tuples insertable as constants, so we can remove special handling of tuples in constant propagation.
Test Plan: Imported from OSS
Differential Revision: D19439515
Pulled By: eellison
fbshipit-source-id: c58f153157f1d4eee4c1242decc4f36e41c1aa05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31839
There are a number of improvements that can be made to `mayContainAlias`, which I would like to do in follow ups. For now, this is an easy one.
Test Plan: Imported from OSS
Differential Revision: D19439516
Pulled By: eellison
fbshipit-source-id: 0042fb7eaae6cfb4916bf95dc38280517a4bd987
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32256
Previously two unrelated modules loaded from torch.jit.load
would compare equal because we only considered their data_ attributes which
are initialized blank in torch.jit.load. This changes ConcreteModuleType
to distinguish when the data_ attribute is blank vs when it is empty.
This replaces the poisoned logic.
ghstack-source-id: 96755797
Test Plan: oss
Differential Revision: D19423055
fbshipit-source-id: 79d6a50a3731c6eeb8466ba2a93702b49264bba0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32202
Move some helper functions in ModuleUseDeduper for public use
Test Plan:
.
Imported from OSS
Differential Revision: D19508034
fbshipit-source-id: 2e8e05eff6f3bbcfe6936598371e4afa72f9b11f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32226
Right now, if users call torch.distributed.all_reduce() on dense tensors, outputs are put in the input tensors, but if users call it on sparse tensors, outputs are neither returned explicitly to users nor put in the input tensors.
To make the torch.distributed.all_reduce() API behave the same on both dense and sparse tensors, this diff makes all_reduce() on sparse tensors put the output in the input tensors as well. This is achieved by simply calling input_sparse.copy_(output_sparse); see PR https://github.com/pytorch/pytorch/pull/9005, which implemented copy_ for sparse tensors.
Closes #31413
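A hedged usage sketch (assumes an already-initialized gloo process group, which supports sparse all_reduce; names and values are illustrative):
```
import torch
import torch.distributed as dist

def reduce_sparse():
    # After this change, the reduced result is written back into t, as for dense tensors.
    t = torch.sparse_coo_tensor(indices=[[0, 2]], values=[1.0, 2.0], size=(4,))
    dist.all_reduce(t)
    return t.to_dense()
```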
ghstack-source-id: 96984228
Test Plan: unit test
Differential Revision: D19192952
fbshipit-source-id: 2dd31dc057f20cc42b44b9e55df864afa2918c33
Summary:
Fix the `torch.eq()` docs entry example to match the current output (boolean instead of uint8).
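The behavior the docs example should now reflect:
```
import torch

# torch.eq returns a bool tensor rather than uint8:
print(torch.eq(torch.tensor([1, 2, 3]), torch.tensor([1, 5, 3])))
# tensor([ True, False,  True])
```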
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32399
Differential Revision: D19498104
Pulled By: ezyang
fbshipit-source-id: e7ec1263226766a5c549feed16d22f8f172aa1a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32439
This adds c10::fallthrough_kernel which is a special boxed function which
can be used to implement fallthrough behavior at a dispatch key. A fallthrough
kernel will redispatch to the next valid dispatch key. It is implemented
in such a way that it costs no more to fallthrough than it does to go
straight to the actual implementation of the kernel.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D19503886
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 6ee05bd815c4ef444e612d19f62312dbb76f2787
Summary:
We will now use USE_*, BUILD_* consistently. The backward compatibility
for NO_* and WITH_* is hereby removed in this commit, as promised in the
comment (next release is beyond Feb 20):
# Before we run the setup_helpers, let's look for NO_* and WITH_* variables and hotpatch environment with the USE_*
# equivalent The use of NO_* and WITH_* is deprecated and will be removed in Feb 20, 2020.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32447
Differential Revision: D19515536
Pulled By: ezyang
fbshipit-source-id: 2f2c51e6d4674af690b190a1f0397b8f596b6a15
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31408
We'll error out when a graph is quantized with different QSchemes.
This only occurs when we have two modules that have the same type (e.g. two Conv2d modules initialized with
the same arguments) and are quantized with two configs that would produce different quantized graphs, for example
per tensor affine and per channel affine. This is a rare case, so it should be OK to skip for now.
Actual support will come later.
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D19162366
fbshipit-source-id: 798f06d0ddef0c8458237ce88b62159cc77eec8b
Summary:
Fix https://github.com/pytorch/pytorch/issues/24723.
Benchmark script :
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.log_normal_()

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.log_normal_()
        t2 = _time()
        fwd_t = fwd_t + (t2 - t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test Device: skx-8180.
Before:
```
input size(128, 1) forward time is 0.0114 (ms).
input size(128, 10) forward time is 0.1021 (ms).
input size(128, 100) forward time is 1.0081 (ms).
input size(128, 1000) forward time is 10.1831 (ms).
```
After:
```
input size(128, 1) forward time is 0.0108 (ms).
input size(128, 10) forward time is 0.0969 (ms).
input size(128, 100) forward time is 0.9804 (ms).
input size(128, 1000) forward time is 9.6131 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31854
Differential Revision: D19314586
Pulled By: pbelevich
fbshipit-source-id: 2ea1d9a2c505e36aca9e609b52ccb3e8caf2ba8f
Summary:
While working on https://github.com/pytorch/pytorch/issues/31768 and trying to add tests for `DataParallel`, I discovered that:
- `test_data_parallel.py` can't be run through `run_test.py`
- running it with `pytest` fails with many name errors
`test_data_parallel.py` seems to have been split from `test_nn.py` in https://github.com/pytorch/pytorch/issues/28297 but not in a state where it can actually be run. Presumably `DataParallel` hasn't been tested by CI in the time since.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32428
Differential Revision: D19499345
Pulled By: ezyang
fbshipit-source-id: f9b748a99a5c85fc6675c22506cf10bbfd9c8a4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32148
TSAN would complain about multiple threads reading and writing to the
`cpu_dispatch_ptr` without any sort of synchronization. Although, this is a
valid issue from a TSAN point of view, there wasn't a correctness issue since
both threads would compute the same value.
In order to fix this, I've used std::atomic for cpu_dispatch_ptr with relaxed
ordering guarantees.
ghstack-source-id: 96989435
Test Plan: Verify the TSAN tests pass.
Differential Revision: D19386082
fbshipit-source-id: 1ff0893e02529eddd06b2855d9565edf1bbf1196
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31896
Test Plan: Added new tests to QNNPACK's test suite to cover the new use case. All new tests are passing.
Reviewed By: supriyar
Differential Revision: D19443250
Pulled By: AshkanAliabadi
fbshipit-source-id: fa7b1cffed7266a3c198eb591d709f222141a152
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32338
Timed out ops could linger around if the user doesn't actually call
`wait()` on that OP. As result, to fix this I've introduced the following
functionality in this PR:
1. Keep track of all outstanding work in ProcessGroupNCCL.
2. Enhance NCCL watchdog to sweep through all outstanding work and perform the
following operations:
i. If the work has timed out, abort all communicators for that work and
remove them from the cache.
ii. If the communicators for the work receive an error, abort the
communicators and remove them from the cache.
iii. If the work has completed (successfully/unsuccessfully), remove it from
the list of outstanding work.
ghstack-source-id: 96895704
Test Plan: waitforbuildbot
Differential Revision: D19401625
fbshipit-source-id: 8f6f277ba2750a1e1aa03cdbc76e8c11862e7ce5
Summary:
Without this, dlopen won't look in the proper directory for dependencies
(like libtorch and fbjni).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32247
Test Plan:
Build libpytorch_jni.dylib on Mac, replaced the one from the libtorch
nightly, and was able to run the Java demo.
Differential Revision: D19501498
Pulled By: dreiss
fbshipit-source-id: 13ffdff9622aa610f905d039f951ee9a3fdc6b23
Summary:
The current version check doesn't use proper lexicographic comparison and so will break for future versions of cuSPARSE with `CUSPARSE_VER_MAJOR > 10` and `CUSPARSE_VER_MINOR < 2`. Also, my cusparse headers for CUDA 9 don't seem to include version macros at all, so added `if !defined` to be explicit about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32405
Differential Revision: D19499412
Pulled By: ezyang
fbshipit-source-id: 1593bf1e5a4aae8b75bb3b350d016cc6c3b9c009
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30842
We'd like to profile the time spent on GIL acquisition to debug
performance issues.
Test Plan: Unit tests pass.
Differential Revision: D18837590
fbshipit-source-id: 925968f71c5fb96b8cd93f1eab4647602d2617d1
Summary:
These jobs were taking forever to run, so we decided it's only really
worth it to run them on master.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32378
Differential Revision: D19499301
Pulled By: seemethere
fbshipit-source-id: 22cac5b5baee84e44607a16daeb77048cb0f5974
Summary:
Currently, setting `USE_CUDNN=0` has no effect and any cudnn library found on your system will be used anyway. This is especially problematic when your system has multiple CUDA versions installed, and you are building with a version that lacks a matching cudnn. CMake will find any other cudnn versions and you end up with both CUDA versions added to your compiler include paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32404
Differential Revision: D19499425
Pulled By: ezyang
fbshipit-source-id: a9b3f6f9dc22033481c3c1c5999b1a7ef98468cb
Summary:
qlinear/qconv to be consistent with data update.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32254
Differential Revision: D19422929
Pulled By: kimishpatel
fbshipit-source-id: 595a4f7d6fde4978c94f3e720ec8645f3f2bdb7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32197
This is to reland https://github.com/pytorch/pytorch/pull/30063, the main change is to match a general exception and grep "pickle" error word in "test_script_functions_not_supported" unit test, as Python 3.5 and Python 3.6 throw different types of errors with different error message for the rpc call in the unit test.
[test all] This diff makes the following changes:
1. Provides a new set of private Python RPC APIs. They can accept an annotated TorchScript call, and this call can be serialized, deserialized, and executed in C++ without the GIL. These private APIs will be bound to JIT in the future, and they differ from the public APIs in that the future JIT-bound private APIs will accept a qualified_name, not callables. These private APIs are subject to deprecation once JIT supports a TorchScript function being a JIT type.
Also, these APIs require the TorchScript function to be defined and annotated by users in Python land; it cannot be a script class/module constructor or a class/module method.
2. This diff also allows the public RPC APIs to accept an annotated TorchScript call and execute the same code path the private APIs above run on. Therefore, if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized, and executed in C++ without the GIL as well (see the sketch after this list).
3. The private APIs above call a newly defined C++ function to have the RPC TorchScript call serialized, deserialized, and executed in C++ land. This C++ function returns an ivalue::Future, so that in a follow-up diff it can be called when these private APIs are bound to JIT.
4. script_call.cpp/.h and request_callback_impl.cpp are refactored accordingly so that TorchScript calls and builtin calls can share the same message type and code.
5. Refactored deserializeResponse() and added a new utility to deserialize a response to an IValue.
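As a rough illustration of point 2 (worker names and setup are assumptions for this sketch, not taken from the diff), invoking an annotated TorchScript function over the public RPC API looks like this from Python:
```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def script_add(t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
    return t1 + t2

# On a worker where rpc.init_rpc("worker0", rank=0, world_size=2) has already
# been called, the scripted function can be sent over RPC like any callable:
# result = rpc.rpc_sync("worker1", script_add,
#                       args=(torch.ones(2), torch.ones(2)))
```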
ghstack-source-id: 96879167
ghstack-source-id: 96879167
Test Plan: unit test
Differential Revision: D19402374
fbshipit-source-id: 04efcc7c167d08a6503f29efe55e76f2be4b2c5e
Summary:
This should be covered under recursive script now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32235
Pulled By: driazati
Differential Revision: D19414889
fbshipit-source-id: 85f8132401dbe44c9dbaef7c0350110f90eb9843
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32276
Include mobile interpreter in mobile code analysis pass, which has some
manually registered ops in temporary namespaces.
The mobile interpreter is still under development and these ops will be
removed in the future. This is a temporary step for an internal build
experiment.
Test Plan: Imported from OSS
Differential Revision: D19426818
Pulled By: ljk53
fbshipit-source-id: 507453dc801e5f93208f1baea12400beccda9ca5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32242
TSAN and fork don't play well together, so skip this test if we're
building under TSAN. It will still run in other modes.
Differential Revision: D19416113
fbshipit-source-id: 7e88d63a843356372160c2524c05e8fd1706553e
Summary:
Unchecked cast just refines the type of a value; the value stays the same, so the output should alias the input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32309
Differential Revision: D19439037
Pulled By: eellison
fbshipit-source-id: fe6902d0d9a5a9ef5e9c13e1dbd056576d8c327e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32323
### Summary
Since we released the custom build in 1.4.0, it's time to set up CI for it. This PR adds a new iOS job to the iOS builds. To save time, it only runs the arm64 build.
### Test Plan
- Don't break any iOS jobs
- Custom Build works.
Test Plan: Imported from OSS
Differential Revision: D19451342
Pulled By: xta0
fbshipit-source-id: 9de305c004fc795710ecf01d436ef4792c07760c
Summary:
DistributedDataParallel cannot broadcast None, so when we prepare the model for QAT and then try to save it, it errors out.
fixes: https://github.com/pytorch/pytorch/issues/32082
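A minimal sketch of the setup where this comes up (the toy module and qconfig choice are assumptions for illustration, not taken from the issue):
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 3, 1), nn.ReLU()).train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)

# Wrapping the QAT-prepared model in DDP requires an initialized process group:
# ddp = nn.parallel.DistributedDataParallel(model)
# Before this fix, None-valued attributes introduced by QAT preparation could
# not be broadcast by DDP, so saving/synchronizing the model errored out.
```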
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32318
Differential Revision: D19434801
Pulled By: jerryzh168
fbshipit-source-id: ee70abe4c3dcdd3506fb7dd0316aee2fb1705469
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32138
I personally prefer `throw std::runtime_error("BOOM")`, but we should
probably have asserts here now that it is gtest. Also ensures that the correct
exceptions are thrown by the `testSignal` tests.
ghstack-source-id: 96811000
Differential Revision: D19382905
fbshipit-source-id: 1b00dd70524d03c8bd6f48715baa5070a7985467
Summary:
This is another implementation of the maximum bailout depth.
The first version was implemented in https://github.com/pytorch/pytorch/pull/31521
This one has the advantages that
* the bailout depth only exists in `CodeImpl`, which seems to be an appropriate place to keep it.
* threading through many objects is reduced to threading through CodeImpl and getPlanFor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32073
Differential Revision: D19443432
Pulled By: Krovatkin
fbshipit-source-id: 898384bb2308a1532a50a33d9e05cfca504711e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32134
These tests weren't written in the most correct way and were often
flaky. It was tricky to identify these tests as flaky until we moved this file
to use gtest.
The gist of the issue is that the test previously would not coordinate sends
and recvs properly. For example, we created a single thread to test an
abortRecv and a successful recv. A separate sender thread was used to send 2
messages. What could go wrong here is that the first send could successfully
complete, resulting in the receiving end processing the message before it gets
the abort signal. In this case we would have an error in the test.
ghstack-source-id: 96806879
Differential Revision: D19379395
fbshipit-source-id: 24782ccaf6e6ec6b445378b29d5f10f901e0dee6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31901
ncclCommAbort is not thread-safe, so add a lock for it.
ghstack-source-id: 96829715
Test Plan: unit tests
Differential Revision: D19293869
fbshipit-source-id: 711b4a07605d6e5a81577247d2f90a78041c1809
Summary:
After we removed `Specialize_AutogradZero` from the optimization pipeline of the simple executor mode, we don't need to mark any inputs as undefined in `autodiff`. Also, `needsGradient` in `graph_executor.cpp` never runs on a graph with profiling information, so I removed that code as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32106
Differential Revision: D19374238
Pulled By: Krovatkin
fbshipit-source-id: 4223d3efe3c904a55a28471e5ae9593017ce3e07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32321
Updating the test to test more meaningful semantics
Test Plan:
[xintchen@devvm6308.prn2 ~/fbsource/fbcode] buck test mode/dev //caffe2:ATen-core-test -- 'OperatorRegistrationTest\.whenRegisteringCPUTensorType_thenCanOnlyCallUnboxedWithCPUTensorIdDispatchKey'
Building: finished in 0.4 sec (100%) 517/517 jobs, 0 updated
Total time: 0.5 sec
Trace available for this run at /tmp/testpilot.20200116-132729.2541763.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision e5f315ebe0508d11fc281fa4b4f7b43d2ef1c003 fbpkg 67e8eb96914f400db234fd9af70fdcde at Wed Jan 15 23:38:32 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/762/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/6192449492430045
✓ caffe2:ATen-core-test - OperatorRegistrationTest.whenRegisteringCPUTensorType_thenCanOnlyCallUnboxedWithCPUTensorIdDispatchKey 0.002 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6192449492430045
Summary (total time 1.15s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D19436345
fbshipit-source-id: c1f2383d62627aa4507616b8905ceb42ac563e9d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32316
### Summary
Since the Custom Build was released in 1.4.0, it's time to set up CI. To do that, we need to
1. Add a python script to generate the yaml file
2. Add new build scripts to circle CI (arm64 only).
### Test Plan
- Don't break the current iOS CIs
Test Plan: Imported from OSS
Differential Revision: D19437362
Pulled By: xta0
fbshipit-source-id: 395e27a582c43663af88d11b1ef974a4687e672c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32168
We move the exception raising into the function, saving us a
big pile of instructions for raising at each call site.
After this stack of changes, the compiler is willing to inline, e.g.,
`c10::KernelFunction::callUnboxed<at::Tensor, at::Tensor const&>(c10::OperatorHandle const&, at::Tensor const&) const::__func__`
(whereas previously it refused to do so.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19392948
Pulled By: ezyang
fbshipit-source-id: d5edab00cae48444b308e74438a17a421532c08f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32121
This reduces code size in the call sites of this function (of which
there are many: one for every operator call) since we no longer have
to construct a std::string at the call site.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19392951
Pulled By: ezyang
fbshipit-source-id: 8bc43d46ba635380ff9f8989f7557fdd74b552cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32118
This reduces code size and makes the calling function more likely to inline.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19392950
Pulled By: ezyang
fbshipit-source-id: 5e3829cca5604407229f93c2486eb9a325581ea2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32117
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19392949
Pulled By: ezyang
fbshipit-source-id: 7f579e45d49bddeab36b8dd1a90c83224a368ac8
Summary:
For ppc64le, we no longer plan to run regular builds on Python 2.7, and we wish to stop
publicizing the build status for those two builds (ppc64le/CPU and ppc64le/GPU each on py27).
This pull request simply removes the build status links for these two builds, replacing them
with a generic dash character (consistent with other un-publicized builds within the table).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32315
Differential Revision: D19435939
Pulled By: soumith
fbshipit-source-id: c9f31e7acba83e42f6a758ac011bbef36fd8aaa0
Summary:
x || (!x && y) is equivalent to x || y
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32201
Differential Revision: D19429334
Pulled By: ezyang
fbshipit-source-id: 044dc46c2d9a7e180aa1795703c0097b0c7c3585
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32198
Create a method called `callUnboxedWithDispatchKey`.
Also add tests to make sure it works.
Test Plan: buck test mode/dev //caffe2:ATen-core-test
Differential Revision: D19402815
fbshipit-source-id: b206cf04b1216fbbd5b54ac79aef495cb0c1be06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32232
Previously, we were using `operator<<` as the default way of printing
IValue constants during serialization. The semantics of `operator<<`
were ill-defined, and this bit us in particular with strings and lack of
quoting.
This PR defines the role of `operator<<`: much like Python `str()`, it
is intended to produce a human-readable-ish representation for
debugging purposes.
This PR also defines a new `repr()` function on IValue that is intended
to produce a valid Python expression that can be used to recreate an
object with the same value. `repr()` is not defined on all IValue kinds
(notably tensors!) for this reason.
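For readers less familiar with the Python analogy, a quick illustration of the `str()` vs `repr()` distinction the summary is drawing on:
```python
s = 'he said "hi"'
print(str(s))    # he said "hi"     (no quoting -- ambiguous in serialized output)
print(repr(s))   # 'he said "hi"'   (quoted -- a valid expression that recreates the value)
```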
Test Plan: Imported from OSS
Differential Revision: D19417036
Pulled By: suo
fbshipit-source-id: c102d509eaf95a28b6a62280bc99ca6f09603de5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31456
External request https://discuss.pytorch.org/t/jit-android-debugging-the-model/63950
By default the TorchScript print function goes to stdout. On Android it is not visible in logcat by default.
This change propagates it to logcat.
Test Plan: Imported from OSS
Differential Revision: D19171405
Pulled By: IvanKobzarev
fbshipit-source-id: f9c88fa11d90bb386df9ed722ec9345fc6b25a34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32208
### Summary
The master branch generates `libtorch_cpu.a`, which is different from the release branch, so this PR skips the missing libs before archiving them.
### Test Plan
- don't break the nightly build
Test Plan: Imported from OSS
Differential Revision: D19420042
Pulled By: xta0
fbshipit-source-id: fb28df17b7e95d5c7fdf5f3a21bece235d7be17c
Summary:
An example of a model with such leaf nodes is the faster_rcnn model. This PR helps optimize ONNX ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32077
Reviewed By: hl475
Differential Revision: D19399622
Pulled By: houseroad
fbshipit-source-id: 35c628c6f1514b79f1bcf7982c25f0f4486f8941
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32224
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19416878
Pulled By: ezyang
fbshipit-source-id: 0205d0635658a3328128dcaad94bbbef505342be
Summary:
Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31892
Test Plan: No functional changes, no tests yet.
Differential Revision: D19290739
Pulled By: agolynski
fbshipit-source-id: c2f4947d2980995724c539de7c6d97618e1ba11a
Summary:
The torch.onnx.export docs contain two descriptions of the 'example_outputs' arg,
so this combines that information into a single description alongside the other parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31826
Differential Revision: D19274928
Pulled By: zou3519
fbshipit-source-id: cbcce0a79c51784c1d7aa8981aab8aac118ca9b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31713
- In case the callbacks are heavy/slow, the other threads should be able to start working on the value of the future after the current thread moves the value and unlocks the mutex.
- `completed()` is not inlined; avoid the function call overhead.
ghstack-source-id: 96694593
Test Plan: tdb
Differential Revision: D5624371
fbshipit-source-id: 5762e6e894d20108ec9afedd1a6e64bcd97ee3fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31970
Now that the ClassType can be shared among different module instances, we'll
preserve the sharing in clone as well, that is if the original module has
a ClassType that is shared, we'll clone this ClassType once and share it between
different module instances as well.
Test Plan:
build/test/test_jit
Imported from OSS
Differential Revision: D19406251
fbshipit-source-id: 2881c695f6e718e5432040a3817cf187a62017bf
Summary:
"in_features" and "out_features" are not defined. Possibly a typo. They should be "input_features" and "output_features" instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31682
Differential Revision: D19251685
Pulled By: zou3519
fbshipit-source-id: ac9e524e792a1853a16e8876d76b908495d8f35e
Summary:
Just update the comment to make it accurate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32222
Differential Revision: D19410428
Pulled By: albanD
fbshipit-source-id: ad13596382613c2728e674a47049ea4f563964b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32187
Fixes #32058. Previously we would build documentation during the pytorch
linux cuda build. We don't actually need to do this because we have a
dedicated python_doc_build job that builds the docs. With this change,
the CUDA build should run ~10 minutes faster, giving devs faster signal.
Test Plan: - Check the CUDA (10.1) build on this PR, make sure it doesn't build the docs.
Differential Revision: D19400417
Pulled By: zou3519
fbshipit-source-id: e8fb2b818146f33330e06760377a9afbc18a71ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32185
Previously we would unify the contained types of dictionaries; however, this breaks type safety.
```
from typing import Dict
import torch

@torch.jit.script
def test(input: Dict[str, None], cond: bool):
    if cond:
        out = input
    else:
        out = {"1": 1}
    out["hi"] = 3
```
This would only occur if a dictionary is being re-assigned across an if condition with different contained types, which is pretty unlikely. I tested `model_backward_compatibility` for all fb models and this didn't break anything. This PR is a precursor to alias analysis changes.
Also fixes `Future` type unification. Because `Future` is an immutable type, it is okay to unify the contained type.
Test Plan: Imported from OSS
Differential Revision: D19398585
Pulled By: eellison
fbshipit-source-id: ebc8812cdf5b6dba37b1cfbc2edc7d8c467b258c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32179
Tensors are used as keys in dictionaries, so we need to annotate that inserting a key into a dictionary puts the key into the wildcard set. Also fixes a bug with `listCopyAndSort` not copying the input list.
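A minimal sketch (the function and names are invented for illustration) of the pattern in question: inserting a Tensor key into a TorchScript dictionary, which alias analysis must treat conservatively:
```python
from typing import Dict
import torch

@torch.jit.script
def insert_key(d: Dict[torch.Tensor, int], k: torch.Tensor) -> Dict[torch.Tensor, int]:
    # After this insertion, `k` may be reachable through `d`, so the key has
    # to join the wildcard set for alias analysis.
    d[k] = 1
    return d
```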
Test Plan: Imported from OSS
Differential Revision: D19397555
Pulled By: eellison
fbshipit-source-id: 17acdc22ff5e2dda44fd25c80450396f5592095e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32086
np.clip(1, num_indices // 2, 10) -> np.clip(num_indices // 2, 1, 10)
Also change batchsize -> num_rows to match what the variable actually does
Test Plan: CI
Reviewed By: hx89
Differential Revision: D19361521
fbshipit-source-id: 9ce864c7d7da046dc606afa5207da677ccf80f52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32104
Fixes these warnings:
```
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(96,17): warning: use 'template' keyword to treat 'data' as a dependent template name
W.t.data<uint8_t>(),
^
template
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(97,17): warning: use 'template' keyword to treat 'data' as a dependent template name
B.t.data<int32_t>(),
^
template
```
Test Plan: Tested locally with clang-cl and CI for other toolchains
Reviewed By: boguscoder
Differential Revision: D19353563
fbshipit-source-id: c28afb8c1ad72fd77ef82556ba89fcf09100d1f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32190
We need a backend-agnostic mechanism to do a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent. A sketch of the protocol follows the list.
- Sort the worker names.
- Elect the first name in the ordered worker names as the leader.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent to proceed, the leader sends the command to everyone to proceed.
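A rough, single-process sketch of that ordering logic (the helper is invented for illustration; the real version exchanges these messages over RPC):
```python
def wait_all_workers(worker_names):
    ordered = sorted(worker_names)
    leader = ordered[0]

    # Every worker, including the leader itself, reports its intent to
    # synchronize to the leader.
    reported = set(ordered)

    # Once the leader has heard from everyone, it tells all workers to proceed.
    if reported == set(ordered):
        return leader, {name: "proceed" for name in ordered}

print(wait_all_workers(["worker1", "worker0", "worker2"]))
```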
ghstack-source-id: 96693296
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn
buck-out/gen/caffe2/test/rpc_spawn\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_spawn\#binary.par -r test_rref_leak
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```
# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```
Differential Revision: D19399908
fbshipit-source-id: 1dee607cd49adafe88534621a1c85e2736e2f595
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32133
We should do this to better debug the test.
Differential Revision: D19375479
fbshipit-source-id: 8c2bf61bae605a38252bb793b091ade479bea11a
Summary:
Currently, libtorch build and test are not running in macOS CI. This PR fixes the issue.
**Test Plan:**
Check that libtorch build and test are running again in macOS CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32072
Differential Revision: D19391909
Pulled By: yf225
fbshipit-source-id: 1ab345b099869f78e1124f1a8bd185fa51371b6a
Summary:
This was not tested before; fixes #32139 (which was actually a false positive: functions with kwargs but without defaults on those kwargs are supported). This PR adds testing for both cases and cleans up the error reporting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32146
Pulled By: driazati
Differential Revision: D19385828
fbshipit-source-id: 5eab74df6d02f8e1d7ec054cafb44f909f9d637e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32147
### Summary
Got some security warnings regarding the ruby dependencies. This diff updates the packages in Gemfile.
```
GitHub has detected that a package defined in the ios/TestApp/Gemfile.lock file of the pytorch/pytorch repository contains a security vulnerability.
Package name: excon
Affected versions: < 0.71.0
Fixed in version: 0.71.0
Severity: LOW
Identifier(s):
GHSA-q58g-455p-8vw9
CVE-2019-16779
```
### Test Plan
- Won't affect the existing iOS CI jobs
Test Plan: Imported from OSS
Differential Revision: D19400087
Pulled By: xta0
fbshipit-source-id: 34b548d136cfd6b68fcc53bf0b243461bd7afd64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32170
Stack from [ghstack](https://github.com/ezyang/ghstack):
Change the overload name from being passed by const ref to being passed by value and moved.
* **#32170 Fix the passing-by-ref constructor of OperatorName.**
Test Plan: Imported from OSS
Differential Revision: D19396225
Pulled By: iseeyuan
fbshipit-source-id: e946c47647e1f8d23d7565cfe93f487845e7f24c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31912
### Summary
Clean up the logs from pip-install.
### Test Plan
- Don't break the iOS simulator build
Test Plan: Imported from OSS
Differential Revision: D19395526
Pulled By: xta0
fbshipit-source-id: a638a209cab801ce90c8615e7ea030b1ab0939f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32149
This is an attempt at clarifying some of the preprocessor boolean logic that was getting more and more complicated. The previous logic used constexpr with nvcc on clang, which we were getting compiler failures on in ovrsource with mode/linux/* (based on platform007).
Test Plan:
ovrsource xplat/caffe2 compiles
fbsource sandcastle green
Differential Revision: D19385409
fbshipit-source-id: 60a02bae9854388b87510afdd927709673a6c313
Summary:
Continuation of https://github.com/pytorch/pytorch/issues/31514, fixes https://github.com/pytorch/pytorch/issues/28430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32009
Test Plan:
I verified that the deprecation warnings only occur once on a relevant workflow. Built with:
```
buck build mode/opt //vision/fair/detectron2/tools:train_net
```
Ran with:
```
DETECTRON2_ENV_MODULE=detectron2.fb.env ~/local/train_net.par --config-file configs/quick_schedules/retinanet_R_50_FPN_instant_test.yaml --num-gpus 1 SOLVER.IMS_PER_BATCH 2
```
Inspected log:
```
[01/14 07:28:13 d2.engine.train_loop]: Starting training from iteration 0
buck-out/opt/gen/caffe2/generate-code=python_variable_methods.cpp/python_variable_methods.cpp:1299: UserWarning: This overload of add is deprecated:
add(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add(Tensor other, Number alpha)
buck-out/opt/gen/caffe2/generate-code=python_variable_methods.cpp/python_variable_methods.cpp:1334: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, Number alpha)
[01/14 07:28:25 d2.utils.events]: eta: 0:00:10 iter: 19 total_loss: 1.699 loss_cls: 1.185 loss_box_reg: 0.501 time: 0.5020 data_time: 0.0224 lr: 0.000100 max_mem: 3722M
[01/14 07:28:35 fvcore.common.checkpoint]: Saving checkpoint to ./output/model_final.pth
```
Differential Revision: D19373523
Pulled By: ezyang
fbshipit-source-id: 75756de129645501f43ecc4e3bf8cc0f78c40b90
Summary:
`test_init_ops` calls `orthogonal_`, which fails without LAPACK (this test was just missing a skip condition).
The cpp tests would fail with an `undefined symbol` error if run with `BUILD_TESTS=0`, so this PR skips them if that flag is `0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31965
Pulled By: driazati
Differential Revision: D19320064
fbshipit-source-id: d1dcd36714107688ded25a414e8969abe026bd03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30063
This diff makes the following changes:
1. Provides a new set of private Python RPC APIs. They can accept an annotated TorchScript call, and this call can be serialized, deserialized, and executed in C++ without the GIL. These private APIs will be bound to JIT in the future, and they differ from the public APIs in that the future JIT-bound private APIs will accept a qualified_name, not callables. These private APIs are subject to deprecation once JIT supports a TorchScript function being a JIT type.
Also, these APIs require the TorchScript function to be defined and annotated by users in Python land; it cannot be a script class/module constructor or a class/module method.
2. This diff also allows the public RPC APIs to accept an annotated TorchScript call and execute the same code path the private APIs above run on. Therefore, if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized, and executed in C++ without the GIL as well.
3. The private APIs above call a newly defined C++ function to have the RPC TorchScript call serialized, deserialized, and executed in C++ land. This C++ function returns an ivalue::Future, so that in a follow-up diff it can be called when these private APIs are bound to JIT.
4. script_call.cpp/.h and request_callback_impl.cpp are refactored accordingly so that TorchScript calls and builtin calls can share the same message type and code.
5. Refactored deserializeResponse() and added a new utility to deserialize a response to an IValue.
ghstack-source-id: 96638829
Test Plan: unit test
Differential Revision: D18482934
fbshipit-source-id: bd82a0d820c47a8e45b2e7c616eca06573f7d7ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31830
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19330312
Pulled By: ezyang
fbshipit-source-id: fe2e53e732e946088e983ec45fed2393436f0517
Summary:
While ONNX does not currently directly support the Dim operation on a
tensor, we can provide the same functionality with two ONNX operations.
This allows us to support Dim for all opsets. It may be advantageous to
add support for Dim to a future ONNX opset, and use that for more
efficient code.
While testing the dim op, we found an issue with empty blocks
within if statements. Modified graph generation to prevent generating
empty if blocks.
Fixes https://github.com/pytorch/pytorch/issues/27569
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31928
Reviewed By: hl475
Differential Revision: D19376602
Pulled By: houseroad
fbshipit-source-id: 111682b058a5341f5cca6c1a950c83ae412a4c6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31674
The motivation of this PR was to fix the problem where we would see
"Address already in use" issues for TCPStoreTest due to port conflicts. To
resolve this:
1. We can now pass in port 0 for TCPStore and retrieve the port it actually
bound to using a new getPort() API.
2. Added a `wait` flag to TCPStore constructor indicating whether or not it
should wait for workers (defaults to true).
3. Made `waitForWorkers` a public API to ensure that we can construct TCPStore
without waiting and wait for workers separately. This helps in TCPStoreTest to
ensure we can retrieve the port and pass it to the client stores.
ghstack-source-id: 96486845
Test Plan: waitforbuildbot
Differential Revision: D19240947
fbshipit-source-id: 7b1d1cb2730209fac788764845f1dbbe73d75d9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32102
Previously, the docs CI depended on our CUDA xenial py3 build. This
meant that the turnaround time to get signal for docs was very slow
(I've seen builds that go as much as 3 hours).
Fortunately, the docs CI do not (and should not!) rely on CUDA. This
PR changes it so that the docs CI runs on a CPU-only machine.
Fixes #29995
Test Plan:
- Check CI status on this PR by reading logs for the python and cpp docs
builds.
- I built the docs locally, once for CPU, and once for CUDA, and
verified (via diff) that the pages were exactly the same.
Differential Revision: D19374078
Pulled By: zou3519
fbshipit-source-id: 3eb36f692c3c0632d2543d3439c822d51a87b809
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31978
Currently we keep a `mangleIndex_` that's internal to the compilation unit and
just increment the index when we find the original name is mangled; this doesn't
guarantee the new name is not already defined.
This PR fixes the problem by querying whether the new name is defined or not (a rough sketch follows).
fixes: https://github.com/pytorch/pytorch/issues/31268
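A rough sketch (helper names invented for illustration) of the uniqueness check this adds: keep bumping the mangle index until the compilation unit does not already define the candidate name.
```python
def mangle(base_name, is_defined):
    idx = 0
    candidate = "%s.___torch_mangle_%d" % (base_name, idx)
    while is_defined(candidate):
        idx += 1
        candidate = "%s.___torch_mangle_%d" % (base_name, idx)
    return candidate

existing = {"Foo.___torch_mangle_0", "Foo.___torch_mangle_1"}
print(mangle("Foo", existing.__contains__))  # -> Foo.___torch_mangle_2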
Test Plan:
fixes the issue
Imported from OSS
Differential Revision: D19350535
fbshipit-source-id: fe3262b2838d4208ab72e2cd4a5970b3a792ae86
Summary:
Currently, libtorch build and test are not running in macOS CI. This PR fixes the issue.
**Test Plan:**
Check that libtorch build and test are running again in macOS CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32072
Differential Revision: D19373615
Pulled By: yf225
fbshipit-source-id: 28686ef5895358a2b60db46b1946f21c58c6a18e
Summary:
Currently cumprod crashes for tensors with non-empty dimensions but zero elements, which can happen when some dimension is zero. This commit fixes the error by checking both dim() and numel() in the cumprod backward.
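A minimal repro sketch of the kind of input described above (the shape is chosen for illustration):
```python
import torch

x = torch.randn(2, 0, 3, requires_grad=True)  # dim() == 3 but numel() == 0
y = x.cumprod(dim=1)
y.sum().backward()  # previously crashed in the cumprod backward
print(x.grad.shape)
```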
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32070
Differential Revision: D19373200
Pulled By: ezyang
fbshipit-source-id: d8ecde33f3330b40a7c611f6faa3b1d707ef2a9a
Summary:
This PR adds a more complete list of pytorch header files to be installed at build time. It also fixes one instance of including a header from the local src directory instead of the installed directory.
A more complete set of headers enables other modules to work correctly with PyTorch built for ROCm.
cc: ezyang bddppq iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32076
Differential Revision: D19372933
Pulled By: ezyang
fbshipit-source-id: 3b5f3241c001fa05ea448c359a706ce9a8214aa0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734
What are specialized lists?
The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.
Why do we have specialized lists?
When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.
What is the problem with specialized lists?
We end up with significant special cases through the compiler. Other types like Dict are not
specialized. So in the Pickler, for instance, there is a single piece of logic to handle
their serialization. For Lists, we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different than python, so it is harder to load objects from our IValue serialization
as Python values.
They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like
```
template<typename T>
List(std::vector<T> foo);
```
It would then set up the type tags correctly based on type T, without the need for passing tags.
Do specialized lists improve perf?
Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.
What are the issues with removing them?
This PR removes list specialization but keeps the serialization format, and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector
because they now return by value a copy of the list as a vector.
Further PRs can then clean up the complexity issues that arose from specialization. This will likely
involve removing the isTensorList/isIntList functions, and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. This is forward incompatible, so will go in its own PR.
Benchmark:
```
import torch
import torch.nn as nn
import torch.nn.functional as F
import time

class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x

model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x)
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())

while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```
Results (no observable difference):
```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------
The following replace the convolution operator with a no-op, to show
that even if the conv op was made faster, then we still would not see
a difference:
Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```
Test Plan: Imported from OSS
Differential Revision: D18814702
Pulled By: zdevito
fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31381
This PR adds support for being able to profile both sync and async RPCs, so that users can use the autograd profiler and be able to view metrics such as RPC latency and number of calls in the profiler output.
The way this is implemented is by using the existing `RecordFunction` class provided by the autograd profiler. We create a `RecordFunction` instance when sending an RPC, if autograd profiling is enabled. We also invoke the starting callbacks on this `RecordFunction` instance, this does things such as start the CPU timer. This instance is then persisted across the lifetime of the RPC by attaching it to the `Future` created by the RPC. When the RPC is finished (i.e. when `future->markComplete()` is called), we run the `RecordFunction` instance's end callbacks, which among other things, stops the timer so that we get the correct RPC latency.
The `RecordFunction` and relevant callbacks in `profiler.cpp` are modified slightly to support running end callbacks from a different thread (which is needed since futures are marked as completed by a different thread than the main RPC thread). By default, the autograd profiler uses a `thread_local` list of `Events` and `thread_id`. However, since we'd like to run the `RecordFunction`'s callbacks from a different thread, we would like to access the list of `Events` created by the original thread. This is done by attaching the `thread_id` for the event to the `RecordFunction`, and then looking up the event with that thread in `all_event_lists` (see the changes in `profiler.cpp`). To ensure that the original behavior does not change in the profiler, this described behavior is only run when a user calls `setOverrideThreadId()` on the `RecordFunction` object.
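A minimal usage sketch of what this enables (the destination worker name is an assumption, and an initialized RPC framework is required):
```python
import torch
import torch.distributed.rpc as rpc
from torch.autograd import profiler

def profile_one_rpc(dst="worker1"):
    # Assumes rpc.init_rpc(...) has already been called on this worker.
    with profiler.profile() as prof:
        fut = rpc.rpc_async(dst, torch.add, args=(torch.ones(2), torch.ones(2)))
        fut.wait()
    print(prof.key_averages().table(sort_by="cpu_time_total"))
```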
ghstack-source-id: 96527291
Test Plan: Added a unit test.
Differential Revision: D19053322
fbshipit-source-id: 9a27a60c809fc4fdb16fa5d85085f3b6b21abfbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32016
The previous logic would raise an exception when the URL contains a query string and rank or world_size is specified.
The fix parses the URL, stitches rank and world_size into url.query, and regenerates the URL.
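A rough sketch of the URL manipulation described (illustrative standard-library code, not the actual rendezvous implementation):
```python
from urllib.parse import urlparse, urlencode, parse_qsl, urlunparse

def add_rank_and_world_size(url, rank, world_size):
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update({"rank": str(rank), "world_size": str(world_size)})
    return urlunparse(parts._replace(query=urlencode(query)))

# An existing query string no longer causes an error; rank/world_size are merged in.
print(add_rank_and_world_size("tcp://127.0.0.1:23456?key=value", 0, 2))
```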
Test Plan: f161291877
Differential Revision: D19337929
fbshipit-source-id: 6bb3a07716dda5233553804000b706052ff18db8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30424
`at::indexing::TensorIndex` is used for converting C++ tensor indices such as `{None, "...", Ellipsis, 0, true, {1, None, 2}, torch::tensor({1, 2})}` into its equivalent `std::vector<TensorIndex>`, so that further tensor indexing operations can be performed using the supplied indices.
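For reference, a rough Python-side indexing expression in the same spirit as the C++ index list above (illustrative only, not a one-to-one translation):
```python
import torch

x = torch.randn(2, 3, 4, 5, 6)
y = x[None, ..., 0, 1:None:2, torch.tensor([1, 2])]
print(y.shape)
```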
Test Plan: Imported from OSS
Differential Revision: D18695902
Pulled By: yf225
fbshipit-source-id: d73e14a411cdbec815866b02e75ffd71a9186e89
Summary:
Per discussion with Fei Tian, we need to add a `scale_init_value` to scale down the output of normalization such as batch-norm and layer-norm.
Currently we have `sparse_normalization_options` to normalize the embedding pooling output. By default scale = 1.0; we found it's better to set scale between 0.025 and 0.1 https://fb.quip.com/MiKUAibEaYhH
Besides, I am removing the tags from the normalizers because it makes more sense to calculate the norm ops in the distributed trainers, not the PS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31983
Test Plan:
Testing LN and BN after sum-pooling --
baseline f160348514
LN: f160348609
BN: f160348710
{F226106518}
Layer norm after sum-pooling fwd_net https://fburl.com/sa4j207n
Layer norm after dot-prod fwd_net https://fburl.com/twggwyvb
## Unit Tests
Testing normalization after pooling
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_layer_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_layer_normalization
```
Testing normalization after dot-prod
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_batch_norm
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_layer_norm
```
Differential Revision: D19277618
Pulled By: SilunWang
fbshipit-source-id: ea323e33e3647ba55d2e808ef09d94ad7b45b934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31023
Adds support to catch exceptions in ProcessGroupAgent::enqueueSend and
report them in the future by marking the future as completed with an exception
indicating the error. An example of when this could happen is if the receiving
side aborts when the sender is sending the message, previously, we would hang
until the timeout is hit, and the original exception would be lost.
ghstack-source-id: 96498386
Test Plan: Added a relevant unit test: `test_sender_exceptions` in rpc_test.py
Differential Revision: D18901981
fbshipit-source-id: 08de26936c4ad45b837219a247088cbea644c04c
Summary:
Custom build and internal build will depend on the analysis result so
let's make sure it doesn't break.
Tested locally with LLVM-5.0, LLVM-7 and LLVM-8.
Test Plan: - check CI result
Differential Revision: D18894637
Pulled By: ljk53
fbshipit-source-id: 657854e4bed85a84907e3b6638d158823a56ec80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32027
The test was added in #30985 for #28313. It seems the fix only works for
Python 3 but not Python 2. The current Python 2 CI docker image
doesn't have the `dill` module installed at all, so the issue isn't caught there.
I'm trying to build and push a new CI docker image which has `dill` installed
(I verified it's the latest version, 0.3.1.1), but the fix doesn't seem
to work and blocks me from upgrading the image version. It works for the Python 3
docker image though...
Here is a succeeded job with old image (no dill installed):
https://app.circleci.com/jobs/github/pytorch/pytorch/4192688
Here is a failed job with new image (dill installed):
https://app.circleci.com/jobs/github/pytorch/pytorch/4192679
This PR bypasses the test for Py2 to unblock docker image change. We
can figure out a proper fix for Py2 later.
Test Plan: Imported from OSS
Differential Revision: D19341451
Pulled By: ljk53
fbshipit-source-id: d5768de8cbaf1beba8911da76f4942b8f210f2d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32011
Ran into a build problem with Ninja + the code analysis build, as follows:
```
The install of the torch_global_deps target requires changing an RPATH from
the build tree, but this is not supported with the Ninja generator unless
on an ELF-based platform.
```
It seems we don't need to build the target in static build mode?
Verified code analyzer works with the patch.
Test Plan: Imported from OSS
Differential Revision: D19336818
Pulled By: ljk53
fbshipit-source-id: 37f45a9392c45ce92c1df40d739b23954e50a13a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31972
Since eager mode quantization requires many user modifications, we can't
consistently quantize a given model by just changing qconfig_dict; therefore
the top-level `qconfig_dict` is not that useful.
fixes: https://github.com/pytorch/pytorch/issues/31549
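A minimal sketch of the eager-mode alternative (the toy model and qconfig choice are assumptions for illustration): set qconfig on the specific (sub)modules to be quantized instead of passing a top-level qconfig_dict.
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 3, 1), nn.ReLU()).eval()
model[0].qconfig = torch.quantization.default_qconfig  # quantize only the conv
torch.quantization.prepare(model, inplace=True)
model(torch.randn(1, 3, 4, 4))                         # calibrate the observers
torch.quantization.convert(model, inplace=True)
print(model)
```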
Test Plan:
.
Imported from OSS
Differential Revision: D19330691
fbshipit-source-id: 8aee6e5249e0c14e8a363ac1a83836e88887cd7d
Summary:
Instead of a mixture of direct calls to library-provided atomicAdd overloads, such as float atomicAdd(float*, float), and calls provided internally, such as void atomicAdd(long*, long), abstract to one API, void gpuAtomicAdd(T*, T), in THCAtomics.cuh for the PyTorch backend.
The advantage of this approach is that it allows us to more easily distinguish between capabilities of different platforms (and their versions). Additionally, the abstraction of void-returning atomicAdds allows us, in the future, to support fast HW instructions on some platforms that will not return the previous value.
Call sites that do not satisfy the above conditions and are either highly platform-specific (the __half2 atomicAdd fast path in one operator) or explicitly require the return value (some int atomicAdd invocations) are left untouched. The Caffe2 backend also remains untouched.
While here, add a bunch of includes of THCAtomics.cuh that were missing before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31992
Differential Revision: D19330220
Pulled By: ezyang
fbshipit-source-id: d6ab73ec5168c77e328faeef6c6f48eefba00861
Summary:
This was missing and resulted in the incorrect `name` passed into `_to_worker_info` not being printed out in the error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31969
Differential Revision: D19331927
Pulled By: rohan-varma
fbshipit-source-id: e74d47daec3224c2d9b9da3c0a6404cfa67baf65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31858
Trying to upgrade docker image but ran into the following error:
```
Running test_nn ... [2020-01-04 18:05:12.537860]
Traceback (most recent call last):
File "test_nn.py", line 45, in <module>
from common_cuda import TEST_CUDA, TEST_MULTIGPU, TEST_CUDNN, TEST_CUDNN_VERSION
File "/var/lib/jenkins/workspace/test/common_cuda.py", line 16, in <module>
import numba.cuda
File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 178, in <module>
_ensure_llvm()
File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 100, in _ensure_llvm
raise ImportError(msg)
ImportError: Numba requires at least version 0.30.0 of llvmlite.
Installed version is 0.28.0.
```
Test Plan: Imported from OSS
Differential Revision: D19282923
Pulled By: ljk53
fbshipit-source-id: bdeefbf4f6c0c97df622282f76e77eb1eadba436
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031
This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.
Test Plan: Imported from OSS
Differential Revision: D19334280
Pulled By: z-a-f
fbshipit-source-id: ae14399765a47afdf9b1e072d3967c24ff473e8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31857
According to mingbowan, we will change to using a string docker image
version, because the tag is no longer an integer since we moved the docker
image build job to Circle CI:
http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html
Test Plan: - with stacked PR
Differential Revision: D19282726
Pulled By: ljk53
fbshipit-source-id: 7a12ae89a11cf15163b905734d50fed6dc98cb07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31995
Fixes #31906.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19331259
Pulled By: ezyang
fbshipit-source-id: 5d24bf3555e632211a9b6f8e50ff241603c18b3d
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19420
So after actually writing a C++ JSON dumping class, I figured that
a faster and cleaner way would be to simply rewrite the Python without
the JSON module, since the JSON that we need to output is so simple.
For now I decided not to touch the `parse_cpu_trace` function, since
only changing `export_chrome_trace` already shows a 4x speedup.
Here's the script I used for benchmarking:
``` python
import time
import torch

x = torch.ones(2, 2)
start = time.time()
with torch.autograd.profiler.profile() as prof:
    for _ in range(10000):
        x * x
for i in range(50):
    prof.export_chrome_trace("trace.json")
stop = time.time()
print(stop - start)
```
master branch (using json dump) -> 8.07515025138855
new branch (without json dump) -> 2.0943689346313477
I checked the trace file generated in the [test](https://github.com/pytorch/pytorch/blob/master/test/test_autograd.py#L2659)
and it does work fine.
Please let me know what you think.
If you still insist on the C++ version I can send a new patch soon enough.
CC ezyang rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30724
Differential Revision: D19298955
Pulled By: ezyang
fbshipit-source-id: b0d7324ea5f90884ab8a00dd272f3aa3d9bc0427
Summary:
Fix https://github.com/pytorch/pytorch/issues/24704.
Benchmark script :
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.geometric_(0.5)

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.geometric_(0.5)
        t2 = _time()
        fwd_t = fwd_t + (t2 - t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0092 (ms).
input size(128, 10) forward time is 0.0802 (ms).
input size(128, 100) forward time is 0.7994 (ms).
input size(128, 1000) forward time is 7.8403 (ms).
```
After:
```
input size(128, 1) forward time is 0.0088 (ms).
input size(128, 10) forward time is 0.0781 (ms).
input size(128, 100) forward time is 0.7815 (ms).
input size(128, 1000) forward time is 7.7163 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31878
Differential Revision: D19314510
Pulled By: ezyang
fbshipit-source-id: 2d95bf9938c8becf280890acf9e37223ddd08a39
Summary:
VitalyFedyunin, this PR ports the LogSigmoid activation to ATen.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"
m = nn.LogSigmoid()

# warm up
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Before:**
```
input size(128, 1) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backwad avg time is 0.03 (ms).
input size(128, 100) forward time is 0.90 (ms); backwad avg time is 0.09 (ms).
input size(128, 1000) forward time is 9.04 (ms); backwad avg time is 0.87 (ms).
```
**After:**
```
input size(128, 1) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.28 (ms); backwad avg time is 0.07 (ms).
```
**OMP_NUM_THREADS=1:**
```
Before:
input size(128, 1) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backwad avg time is 0.03 (ms).
input size(128, 100) forward time is 0.88 (ms); backwad avg time is 0.10 (ms).
input size(128, 1000) forward time is 8.72 (ms); backwad avg time is 0.81 (ms).
After:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 100) forward time is 0.07 (ms); backwad avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.63 (ms); backwad avg time is 0.15 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24724, https://github.com/pytorch/pytorch/issues/24725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30958
Differential Revision: D19275111
Pulled By: ezyang
fbshipit-source-id: bbfe82e58fb27a4fb21c1914c6547a9050072e5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31962
I added precision tests for CUDA half, float, and double.
The precision for CUDA half seems bad, but I checked the numbers against
previous versions of pytorch. The output of CUDA half linspace+logspace
is exactly the same when compared with 1.2.0.
Test Plan: - Run CI
Differential Revision: D19320182
Pulled By: zou3519
fbshipit-source-id: 38d3d4dea2807875ed0b0ec2b93b19c10a289988
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31162
This should help us resolve a multitude of weird segfaults and crashes
when PyTorch is imported along with other packages. Those would often
happen because libtorch symbols were exposed globally and could be used
as a source of relocations in shared libraries loaded after libtorch.
Fixes#3059.
Some of the subtleties in preparing this patch:
* Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with `RTLD_LOCAL`, we now may load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I sprayed a few ways of getting things to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and turned off vptr checks entirely. Another possibility is we should have a mode where we use RTLD_GLOBAL to load _C, which would be acceptable in environments where you're sure C++ lines up correctly. There's a long comment in the test script going into more detail about this.
* Making some of our shared library dependencies load with `RTLD_LOCAL` breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols which doesn't work when loaded locally, and if we load a library with `RLTD_LOCAL` we aren't able to subsequently see it with `ctypes`. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library `torch_global_deps` with dependencies on all of the libraries which need to be loaded globally, and then load that with `RTLD_GLOBAL`. As long as none of these libraries have C++ symbols, we can avoid confusion about C++ standard library.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D19262579
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161
Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (aka pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it's now necessary for all the C++ extensions to know what libraries to link with. The resulting code is clearer and more uniform, so it's a win all around.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262578
Pulled By: ezyang
fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f
Summary:
Special-case norm out where p == 2. Instead of calling `pow`,
we use multiplication as a faster code path.
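A quick sanity check of the arithmetic being exploited (an illustration of the equivalence only, not the kernel code):
```
import torch

x = torch.randn(1000)
norm_pow = x.pow(2).sum().sqrt()   # generic path: pow kernel
norm_mul = (x * x).sum().sqrt()    # special-cased path: plain multiplication
print(torch.allclose(norm_pow, norm_mul))      # True
print(torch.allclose(norm_mul, x.norm(p=2)))   # True
```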
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31903
Differential Revision: D19312749
Pulled By: ngimel
fbshipit-source-id: 73732b7b37a243a14438609784795b920271a0b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31800
If we know that two constants are the same object, we can ignore other constraints and pool them together. This fixes an issue introduced by the other PR where quantization relied on constant pooling happening for correctness.
Test Plan: Imported from OSS
Differential Revision: D19269499
Pulled By: eellison
fbshipit-source-id: 9d4396125aa6899cb081863d463d4f024135cbf4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31501
We have a number of places in our code base where we should be checking if it's safe to change the alias relationship between two sets of values. This PR adds an api to Alias Db to consolidate the logic, and refactors Constant Pooling and `CSE` to use the new api. Next steps: add api usage in peephole.cpp where applicable.
Happy to bikeshed `AliasDb::safeToChangeAliasingRelationship`. Previously I suggested `AliasDb::safeToIntroduceAliasing`, however that's not quite accurate, because this API also handles when it is unsafe to remove aliasing.
Alternate suggestions: `safeToChangeAliasing`, `validToChangeAliasing`, `validToChangeAliasingRelationship`
Related: https://github.com/pytorch/pytorch/issues/28360
Test Plan: Imported from OSS
Differential Revision: D19254413
Pulled By: eellison
fbshipit-source-id: 17f7f52ad2d1526d303132767cbbb32f8189ae15
Summary:
This is a first pass attempt at documenting `IValue` to help with problems like in #17165. Most users are probably concerned with
* how to make an `IValue` that matches the input type to their graph (most of the constructors are pretty self explanatory, so as long as they are in the docs I think it's enough)
* how to extract the results after running their graph (there is a small note on the behavior of `.toX()` based on confusions we've had in the past)
Preview:
https://driazati.github.io/pytorch_doc_previews/31904/api/structc10_1_1_i_value.html#exhale-struct-structc10-1-1-i-value
There are also some random CSS fixes to clean up the style.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31904
Pulled By: driazati
Differential Revision: D19318733
fbshipit-source-id: b29dae3349d5a7ea5a3b8e09cd23f7ff8434edb4
Summary:
This hooks up `inspect` so that Python functions get their parameter
names attached instead of naming them `0, 1, 2, ...`. This also fixes
issue #28537 where `ignore` functions were improperly typing `self`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29300
Pulled By: driazati
Differential Revision: D19256434
fbshipit-source-id: 6a1fe7bd0afab708b8439517798955d0abfeb44c
Summary:
Stacked PRs
* **#31908 - Remove C++ docs contributing page**
* #31905 - Add doc previewing instructions
We should have one source of truth for contribution instructions (CONTRIBUTING.md).
This PR moves the instructions from the C++ doc pages there instead of keeping
a separate page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31908
Pulled By: driazati
Differential Revision: D19296366
fbshipit-source-id: c1daf004259342bd09e09dea3b80e34db47066ec
Summary:
Stacked PRs
* #31908 - Remove C++ docs contributing page
* **#31905 - Add doc previewing instructions**
This adds some instructions on how to get started with GitHub Pages so you can show reviewers your documentation changes. Hopefully we can delete this eventually and build docs automatically on relevant PRs in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31905
Pulled By: driazati
Differential Revision: D19296364
fbshipit-source-id: df47fa1a8d7be029c3efcf6521298583ad9f7a95
Summary:
Fix https://github.com/pytorch/pytorch/issues/24684.
Benchmark script :
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.cauchy_()

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.cauchy_()
        t2 = _time()
        fwd_t = fwd_t + (t2 - t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0071 (ms).
input size(128, 10) forward time is 0.0596 (ms).
input size(128, 100) forward time is 0.5798 (ms).
input size(128, 1000) forward time is 5.8395 (ms).
```
After:
```
input size(128, 1) forward time is 0.0070 (ms).
input size(128, 10) forward time is 0.0583 (ms).
input size(128, 100) forward time is 0.5714 (ms).
input size(128, 1000) forward time is 5.7674 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31824
Differential Revision: D19314411
Pulled By: ezyang
fbshipit-source-id: 58098546face3e5971b023f702cfe44ff1cccfbc
Summary:
VitalyFedyunin, this PR ports the Softplus activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Softplus()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.12 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.18 (ms).
CPU:
input size(128, 100) forward time is 1.16 (ms); backwad avg time is 0.69 (ms).
input size(128, 10000) forward time is 60.19 (ms); backwad avg time is 31.86 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
input size(128, 100) forward time is 0.43 (ms); backwad avg time is 0.16 (ms).
input size(128, 10000) forward time is 1.65 (ms); backwad avg time is 0.83 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.53 (ms); backwad avg time is 0.28 (ms).
input size(128, 10000) forward time is 51.33 (ms); backwad avg time is 25.48 (ms).
After:
input size(128, 100) forward time is 0.44 (ms); backwad avg time is 0.16 (ms).
input size(128, 10000) forward time is 42.05 (ms); backwad avg time is 13.97 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24633, https://github.com/pytorch/pytorch/issues/24634, https://github.com/pytorch/pytorch/issues/24766, https://github.com/pytorch/pytorch/issues/24767.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30504
Differential Revision: D19274913
Pulled By: ezyang
fbshipit-source-id: 21b29e8459dcba5a040cc68333887b45a858328e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31897
The previous version only used AVX2. The _simd version uses AVX-512 if the CPU supports it.
Test Plan: Unittest
Reviewed By: tracelogfb
Differential Revision: D19291499
fbshipit-source-id: 3b1ee0ba756e5c9defbd5caf7f68982d9b2ca06c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031
This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.
Test Plan: Imported from OSS
Differential Revision: D18903453
Pulled By: z-a-f
fbshipit-source-id: 0050b1cebb1ddb179b7ecbcb114fe70705070f67
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31412
The root cause is `plan_caches` being resized in one thread while another holds a reference to an existing `CuFFTParamsLRUCache`, which then becomes invalidated.
I was able to reproduce the crash very reliably without this fix applied, and no longer see it with the fix. Being a race condition, it's hard to say for sure though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31861
Differential Revision: D19312314
Pulled By: ezyang
fbshipit-source-id: 06e4561128d503f2d70cdfe1982be0f3db2a8cf8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31313
This is a bugfix. The reason we couldn't enable the constexpr-ness for it before is that it was buggy,
and without constexpr it crashed at runtime instead of at compile time, which unfortunately seems to have slipped past our CI...
ghstack-source-id: 96380160
Test Plan: Now it works even when enabling constexpr for it
Differential Revision: D19087471
fbshipit-source-id: 28be107389f4507d35d08eab4b089a405690529b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31026
This is error prone and probably wrong. Since we don't use LeftRight on the hot path anymore, let's remove this.
ghstack-source-id: 96369644
Test Plan: none
Differential Revision: D18902165
fbshipit-source-id: 7b9478cd7cc071f403d75da20c7c889c27248b5c
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31911
Test Plan:
* CI builds including GPU and OSS-build tests
* The `defined(__HIP_DEVICE_COMPILE__)` instance a few lines below is proof that this is a define/undef flag, not a define01 flag
Reviewed By: hlu1
Differential Revision: D19296560
fbshipit-source-id: 1c45069aec534b0bf4a87751a74680675c985e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31147
The goal here is to add more tests of the current behavior of the autograd to make sure no regressions are introduced when modifying it.
Do let me know if you think of other corner cases I missed.
Test Plan: Imported from OSS
Differential Revision: D19301082
Pulled By: albanD
fbshipit-source-id: 2cb07dcf99e56eb1f2c56a179796f2e6042d5a2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31888
We need a backend-agnostic mechanism to perform a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent.
- Sort worker names.
- Elect the first name as the leader in the ordered worker names.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers report their intent to proceed, the leader sends the command to everyone to proceed (see the sketch after this list).
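A minimal, single-process simulation of that barrier (a hedged sketch using queues and threads; the real implementation exchanges torch.distributed.rpc messages instead):
```
import queue
import threading

def run_barrier(worker_names):
    leader = sorted(worker_names)[0]           # elect the first name as the leader
    intents = queue.Queue()                    # workers -> leader: "I'm ready"
    proceed = {name: threading.Event() for name in worker_names}  # leader -> all

    def worker(name):
        intents.put(name)                      # report intent (the leader reports to itself too)
        if name == leader:
            ready = {intents.get() for _ in worker_names}   # wait for every report
            assert ready == set(worker_names)
            for ev in proceed.values():
                ev.set()                       # tell everyone to proceed
        proceed[name].wait()                   # followers block until told to go

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("all workers passed the barrier")

run_barrier(["worker2", "worker0", "worker1"])
```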
ghstack-source-id: 96386210
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```
# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```
Differential Revision: D19290954
fbshipit-source-id: cdb22203c2f27b5e0d0ad5b2d3b279d438c22dcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31917
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19301480
Pulled By: ezyang
fbshipit-source-id: fcce8868733965b9fbd326b4ec273135759df377
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31351
Clang 4 needs the c10:: namespace specifier on fully_qualified_type_name_impl() to work correctly.
Also, let's add an error message for people using clang 3 and earlier; we don't support those compilers anymore, but before this PR they got a crappy message.
ghstack-source-id: 96380163
Test Plan: testinprod
Differential Revision: D19135587
fbshipit-source-id: c206b56240b36e5c207fb2b69c389bb39f1e62aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30916
These macros said "make it constexpr if we're in C++14". Since we're now always C++14, we can just say "constexpr" instead.
ghstack-source-id: 96369584
Test Plan: waitforsandcastle
Differential Revision: D18869635
fbshipit-source-id: f41751e4e26fad6214ec3a98db2d961315fd73ff
Summary: I think this was wrong before?
Test Plan: Not sure.
Reviewed By: IvanKobzarev
Differential Revision: D19221358
fbshipit-source-id: 27e675cac15dde29e026305f4b4e6cc774e15767
Summary:
These were returning incorrect data before. Now we make a contiguous copy
before converting to Java. Exposing raw data to the user might be faster in
some cases, but it's not clear that it's worth the complexity and code size.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221361
fbshipit-source-id: 22ecdad252c8fd968f833a2be5897c5ae483700c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31584
These were returning incorrect data before.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221360
fbshipit-source-id: b3f01de086857027f8e952a1c739f60814a57acd
Summary: These are valid tensors.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221362
fbshipit-source-id: fa9af2fc539eb7381627b3d473241a89859ef2ba
Summary:
As in the title, this PR disables the `--quiet` flag used in CI as a workaround for a timeout hitting the macOS CI. CircleCI times out when no text has been printed for 10 minutes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31900
Differential Revision: D19302899
Pulled By: bwasti
fbshipit-source-id: 145647da983ee06f40794bda1abd580ea45a0019
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31222
- When constructing torch::from_blob() in the case where the deleter is a nop, switch to using a nullptr context in the DataPtr (with a nop deleter)
- No real extra memory/cpu requirements here, actually saves a minor alloc.
Why? Trying to get a signal that a Tensor might contain non-owned memory from
torch::from_blob(), by detecting the nullptr context.
ghstack-source-id: 96336078
Test Plan:
buck test mode/dev caffe2/test/cpp/api/...
buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18992119
fbshipit-source-id: 4eea642f82d0858b57fdfc6995364a760c10567d
Summary:
For now I'm just removing the decorators from all of the currently overridable functions in `torch.functional`. This means they are no longer overridable, however this should fix the benchmark regressions reported in https://github.com/pytorch/pytorch/issues/30831. Moving forward we'll be looking at reducing the overhead of the python-level override mechanism and failing that, re-implementing all of these operators in C++.
cc hl475
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30839
Differential Revision: D18838848
Pulled By: ezyang
fbshipit-source-id: 22b8015d7b2f7a947f1ebc9632c998e081b48ad8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31343
Fix an issue in TorchScript tracing for modules with `c10::List<at::Tensor>` as an output. TensorList was not supported properly.
Test Plan: unit tests
Reviewed By: wanchaol
Differential Revision: D18850722
fbshipit-source-id: 87a223104d1361fe754d55deceeb1e8bbcad629b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31508
This PR builds on top of https://github.com/pytorch/pytorch/pull/31230
to ensure that distributed autograd doesn't block an RPC thread anymore during
the backward pass.
I've also added a unit test where all ranks hammer rank 0 with about 60
backward calls (which would cause a deadlock earlier); now such a test
passes without any issues.
ghstack-source-id: 96345097
Test Plan: waitforbuildbot
Differential Revision: D19188749
fbshipit-source-id: b21381b38175699afd0f9dce1ddc8ea6a220f589
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28430
The unpythonic signatures for functions such as `torch.addcdiv` are already separated in [`deprecated.yaml`] and the signatures marked as deprecated in `PythonArgParser`. However, nothing was done with this information previously. So, this now emits a warning when the deprecated signatures are used.
One minor complication is that if all arguments are passed as keyword args then there is nothing to differentiate the deprecated overload. This can lead to false warnings being emitted. So, I've also modified `PythonArgParser` to prefer non-deprecated signatures.
[`deprecated.yaml`]: https://github.com/pytorch/pytorch/blob/master/tools/autograd/deprecated.yaml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31514
Differential Revision: D19298735
Pulled By: ezyang
fbshipit-source-id: 03cb78af17658eaab9d577cd2497c6f413f07647
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31909
https://github.com/pytorch/pytorch/pull/31230 introduced a bug where
we would end up calling `graph_task_post_processing` twice for reentrant
backward calls (once when we mark the future completed and then when we called
graph_task_post_processing in execute_with_graph_task).
This PR fixes the issue by verifying that the future we return in that case is
completed, and we remove the call to graph_task_post_processing.
In addition to that I added a test that reproduced the problem and verified it
is fixed by this PR.
ghstack-source-id: 96349102
Test Plan: waitforbuildbot
Differential Revision: D19296363
fbshipit-source-id: dc01a4e95989709ad163bb0357b1d191ef5a4fb2
Summary:
In order to support Ubuntu 18.04, some changes to the scripts are required.
* install dependencies with the -y flag
* mark the install as noninteractive
* install some required dependencies (gpg-agent, python3-distutils, libidn11)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31886
Differential Revision: D19300586
Pulled By: bddppq
fbshipit-source-id: d7fb815a3845697ce63af191a5bc449d661ff1de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31236
It is not compiled on Windows
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262581
Pulled By: ezyang
fbshipit-source-id: 80bfa553333a946f00291aaca6ad26313caaa9e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31707
Change the initialization values for FC weight init and sparse embedding lookup init.
The previous default initialization is uniform(-sqrt(1/input_dim), sqrt(1/input_dim)). Now a flexible hyperparameter, say alpha, is passed in, changing it to uniform(-sqrt(alpha/input_dim), sqrt(alpha/input_dim)).
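A small Python sketch of the parameterized bound (alpha = 1 recovers the previous default; the function name is illustrative, not the actual API):
```
import math
import torch

def uniform_fc_init(input_dim, alpha=1.0):
    # bound = sqrt(alpha / input_dim); alpha == 1.0 matches the old default
    bound = math.sqrt(alpha / input_dim)
    return torch.empty(input_dim).uniform_(-bound, bound)

w = uniform_fc_init(256, alpha=0.5)
print(w.abs().max() <= math.sqrt(0.5 / 256))  # tensor(True)
```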
Reviewed By: chonglinsun
Differential Revision: D18825615
fbshipit-source-id: 4c5f2e07f2b3f5d642fd96d64dbf68892ebeb30b
Summary:
The error message produced by AT_ASSERT() in gather() encouraged users to file a bug report ("please report a bug to PyTorch..."). The assertion should be a regular argument check since it can be triggered by passing tensors with different dimensionality, e.g. `torch.cuda.comm.gather([torch.rand(1, device='cuda'), torch.rand(1, 1, device='cuda')])`.
See: https://github.com/pytorch/pytorch/issues/26400
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27456
Differential Revision: D19300270
Pulled By: ezyang
fbshipit-source-id: ec87d225e23445020b377521e0daccceb4748215
Summary:
This PR adds bfloat16 support for convolutions on ROCm.
- Integrates MIOpen bfloat16 convolution support into PyTorch
- Enables bfloat16 convolution for non-MIOpen paths, i.e. THCUNN, native hip kernels
- Enables the bfloat16 type for probability distribution functions (this is included in this PR since conv unit tests use bfloat16 random number generators)
Native CUDA kernels for convolution and random functions will be compiled for CUDA as well.
iotamudelta bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30948
Differential Revision: D19274164
Pulled By: ezyang
fbshipit-source-id: c0888a6ac72a2c5749b1ebb2195ac6f2209996be
Summary:
Compared to cuDNN bias, PyTorch add has the following advantages:
- faster, especially for backward (see: https://github.com/zasdfgbnm/things/blob/master/2019/conv-backward-profile.md)
- handles 64bit indexing automatically
- has less code, less maintenance effort
ngimel I submit this PR early so the CI can start building it. But I have not tested it locally yet (still waiting for it to compile).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31524
Differential Revision: D19264244
Pulled By: ngimel
fbshipit-source-id: cb483d378a6d8bce0a05c3643a796e544bd8e8f0
Summary:
Closes https://github.com/pytorch/pytorch/issues/31497
This allows `torch.no_grad` and `torch.enable_grad` to be used as decorators for generator functions, in which case grad is disabled/enabled only inside the body of the generator and the surrounding context is restored outside of it.
https://github.com/pytorch/pytorch/issues/31497 doesn't include a complete reproducer but the included test with `torch.is_grad_enabled` show this is working where it failed before.
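A minimal example of the decorated-generator behavior this enables (assuming a build that includes this change):
```
import torch

@torch.no_grad()
def gen():
    # grad is disabled only while the generator body runs
    yield torch.is_grad_enabled()

print(torch.is_grad_enabled())  # True outside the generator
print(next(gen()))              # False inside the decorated generator body
```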
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31792
Differential Revision: D19274971
Pulled By: albanD
fbshipit-source-id: fde6d3fd95d76c8d324ad02db577213a4b68ccbe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31157
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262583
Pulled By: ezyang
fbshipit-source-id: 8fb87b41ab53770329b38e1e2fe679fb868fee12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31155
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262584
Pulled By: ezyang
fbshipit-source-id: 147ac5a9c36e813ea9a2f68b498880942d661be5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31152
Per apaszke: I can't find any reasonable references to libIRC online, so
I decided to remove this.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262582
Pulled By: ezyang
fbshipit-source-id: a1d47462427a3e0ca469062321d608e0badf8548
Summary:
This change is required for cases like:
`x[1:] = data` or `x[:3] = data`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31552
Reviewed By: hl475
Differential Revision: D19238815
Pulled By: houseroad
fbshipit-source-id: 56c9837d86b341ea92b0a71d55034ce189d12e6c
Summary:
For backend integration, a backend (e.g. Glow) needs to check the content of the tensor to determine whether it is a legit byte tensor or some special packed format. This provides a convenient interface for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31290
Reviewed By: jackm321, qizzzh
Differential Revision: D19069684
Pulled By: yinghai
fbshipit-source-id: 63360fa2c4d32695fe9767a40027d446d63efdd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31803
Refactored the following fairly similar functions:
1. `test_context_cleanup_tensor_with_grad`
2. `test_context_cleanup_tensor_no_grad`
3. `test_context_cleanup_no_tensors`
by creating a helper function `context_cleanup_test_helper` that can be invoked with the appropriate arguments.
Test Plan: Verified by running tests.
Differential Revision: D19269246
fbshipit-source-id: bfb42b078ad56b97ceeecf0d68b4169768c2c453
Summary:
When calling the add_images() method on the tensorboard SummaryWriter with a uint8 NCHW tensor, the tensor is incorrectly scaled, resulting in overflow behavior. This leads to incorrect images being displayed in tensorboard.
Issue: https://github.com/pytorch/pytorch/issues/31459
Local testing (ran this code with and without the PR changes and printed scale_factor):
```
import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
x = torch.tensor([[[[1, 2, 3], [4, 5, 6]]]], dtype=torch.uint8)
writer.add_images("images", x)
```
Before: scale_factor = 255; After: scale_factor = 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31778
Differential Revision: D19289189
Pulled By: anjali411
fbshipit-source-id: 350a1650337244deae4fd8f8b7fb0e354ae6986b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31230
A major issue with distributed autograd currently is that we block an
RPC thread when we call Engine::execute_with_graph_task.
To resolve this issue, I've made modifications to the local autograd engine
such that `execute_with_graph_task` returns a Future instead. The `execute()`
methods for Engine::execute() and DistEngine::execute() still wait() on this
Future which ensures there is no change in behavior yet.
In follow up PRs we can modify the distributed autograd engine to take
advantage of this Future.
Closes #26359
ghstack-source-id: 96298057
Test Plan: waitforbuildbot
Differential Revision: D18999709
fbshipit-source-id: 388f54467fd2415a0acb7df17bd063aedc105229
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30710
We need a backend-agnostic mechanism to perform a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent.
- Sort worker names.
- Elect the first name as the leader in the ordered worker names.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers report their intent to proceed, the leader sends the command to everyone to proceed.
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers$
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_forward_chain
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_wait_all_workers$
```
# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```
# Debug
```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_shutdown
```
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_clean_context_during_backward
buck build mode/dev-nosan //caffe2/test:dist_autograd_fork
buck-out/gen/caffe2/test/dist_autograd_fork\#binary.par -r test_clean_context_during_backward
```
https://our.intern.facebook.com/intern/testinfra/diagnostics/281475127895800.844424945328750.1575664368/
```
I1206 12:27:47.491420 185619 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.493880 185630 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.494526 185625 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.495390 185636 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
E1206 12:27:47.544198 185627 pair.cc:642] 1 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544203 185633 pair.cc:642] 2 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544210 185639 pair.cc:642] 3 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
```
This should mean the UDF in the request has been run, so Python proceeded and ran to `_agent.shutdown()`.
The RpcAgents on the followers wanted to send back the response, but the leader had already closed RPC.
Need to re-trigger "pytorch_rpc-buck" to reproduce the rarely-seen issue.
Differential Revision: D18643137
fbshipit-source-id: d669d4fc9ad65ed48bed1329a4eb1c32ba51323c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612
The first version to move prim ops to c10 registration. After the reviewers are fine with the initial changes, more operators will be moved in the same style.
Test Plan: Imported from OSS
Differential Revision: D19237648
Pulled By: iseeyuan
fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29220
Support for accessing constants was added in previous
PRs; this PR re-enables the foldbn tests
Test Plan:
test_jit.py
Imported from OSS
Differential Revision: D18846848
fbshipit-source-id: 90ceaf42539ffee80b984e0d8b2420da66c263c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29219
We added class constants in previous PRs; this PR allows access to
class constants in the object API
Test Plan:
build/bin/test_jit
python test/test_jit.py
Imported from OSS
Differential Revision: D18846851
fbshipit-source-id: 888a6517d5f747d1f8ced283c0c2c30b2f6c72c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30787
This is needed when we fuse conv-bn modules,
where we need to rewrite a constant bias (None) of conv into an attribute
bias of type Tensor
Test Plan:
build/bin/test_jit
Imported from OSS
Differential Revision: D18846850
fbshipit-source-id: 9fd5fe85d93d07226e180b75d2e068fe00ca25fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31012
- getConstant should throw when the item is not found
- add another getConstant overload which takes a slot index as an argument
Test Plan:
test_class_type.cpp
Imported from OSS
Differential Revision: D18898418
fbshipit-source-id: d3a23a4896fdbf5fa98e1c55c9c4d6205840014b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31845
ArrayRef is trivially copyable and should be passed by value. Removing
unnecessary `&`s.
Test Plan: Imported from OSS
Differential Revision: D19278523
Pulled By: suo
fbshipit-source-id: 026db693ea98d19246b02c48d49d1929ecb6478e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29218
We need to be able to access constants in a module.
Test Plan:
tbd
Imported from OSS
Differential Revision: D18846847
fbshipit-source-id: 22d2c485c3c449bc14ad798f6e1a0c64fc8fb346
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31255
This test had 2 issues. A timeout would occasionally happen due to a timeout of 50ms, and CUDA code would get compiled and run on CPU, leading to errors. This PR fixes those issues.
Differential Revision: D19028231
fbshipit-source-id: e50752228affe0021e7c0caa83bce78d76473759
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31575
We need a new exception class specifically for the enforce_finite operator, because we need to map it to a specific python exception ExitException, not the RuntimeError type that all c10::Errors get mapped to by default. This diff includes:
- Define c10::EnforceFiniteNotMet
- API CAFFE_ENFORCE_FINITE to throw c10::EnforceFiniteNotMet
- Map from c10::EnforceFiniteNotMet to python ExitException
- Apply CAFFE_ENFORCE_FINITE in caffe2 op
Test Plan:
- integration test pass: https://fburl.com/fblearner/xwkzbqyo
- integration test with D19213617: https://fburl.com/fblearner/479y4jrj Generate error message as desired
- Example:
- Original error message f157597803
{F225477055}
- Updated error message (with D19213617 to generate the error): f158571327
{F225477071}
Reviewed By: zheng-xq
Differential Revision: D19206240
fbshipit-source-id: bd256862801d5957a26b76d738edf4e531f03827
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31583
But rather use `float *`, which is already registered
Test Plan: CI
Reviewed By: xianjiec
Differential Revision: D19221405
fbshipit-source-id: eb8eabcf828745022bc1e4185a0e65abd19a8f04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31813
Closes https://github.com/pytorch/pytorch/issues/31804. We were using
an `std::vector` for the key for a map that keeps track of futures to mark them
if they timeout, but we can instead use an `unordered_set`. This results in a
faster lookup in the code block where we remove futureIDs from this set when
they complete successfully. Previously we were finding them via a linear
`std::find`. Switching it to a constant time find will help performance in the
case where a large number of futures are scheduled to time out at the same
time, or if there is no timeout enforced.
To benchmark a rough perf improvement, I created 50k futures with the same
timeout. Before this PR, the lookup `std::find(futuresAtTime.begin(),
futuresAtTime.end(), id)` took ~200us, now it takes 1us.
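As a rough Python analogy for that data-structure change (not the C++ code in this PR): membership checks in a set are constant time, while a list needs a linear scan.
```
import timeit

ids = list(range(50000))
as_list = ids
as_set = set(ids)

# Looking up the last id: linear scan in the list vs. hash lookup in the set.
print(timeit.timeit(lambda: 49999 in as_list, number=100))
print(timeit.timeit(lambda: 49999 in as_set, number=100))
```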
ghstack-source-id: 96251355
Test Plan: Unit tests pass.
Differential Revision: D19269798
fbshipit-source-id: 1a0fa84a478ee27a16ab0b9fa6f5413b065a663e
Summary:
This PR aims at improving `index_select` performance on CPU with `TensorIterator`.
The code has equally effective optimization for both contiguous tensor and non-contiguous tensor.
The code will try to parallelize the inner loop when the copied slice is large enough; otherwise it will parallelize the outer loop.
Thus both the user scenarios from DLRM (from `Embedding`) and the Fairseq transformer are covered (a micro-benchmark sketch follows the list below).
1. for contiguous input, single socket: **1.25x** performance speedup
2. for non-contiguous input, single socket: **799x** performance speedup
3. for contiguous input, single core: same performance
4. for non-contiguous input, single core: **31x** performance speedup
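A small benchmark sketch, in the spirit of the other scripts in this log, for comparing contiguous and non-contiguous inputs yourself (the numbers above come from the PR's own benchmarks, not this sketch):
```
import time
import torch

src = torch.randn(100000, 128)              # contiguous source
idx = torch.randint(0, 100000, (50000,))

def bench(t, dim):
    start = time.time()
    for _ in range(100):
        torch.index_select(t, dim, idx)
    return (time.time() - start) / 100 * 1000   # ms per call

print("contiguous:     %.3f ms" % bench(src, 0))
print("non-contiguous: %.3f ms" % bench(src.t(), 1))  # transposed view is non-contiguous
```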
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30598
Differential Revision: D19266892
Pulled By: VitalyFedyunin
fbshipit-source-id: 7aaf8e2c861b4a96250c968c4dd95c8d2c5b92d7
Summary:
VitalyFedyunin, this PR ports the RReLU activation to ATen:
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"
m = nn.RReLU(0.1, 0.3).train()
# for inference
#m = nn.RReLU(0.1, 0.3).eval()

# warm up
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Before:**
```
Training:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.03 (ms).
input size(128, 10) forward time is 0.03 (ms); backwad avg time is 0.04 (ms).
input size(128, 100) forward time is 0.17 (ms); backwad avg time is 0.06 (ms).
input size(128, 1000) forward time is 1.45 (ms); backwad avg time is 0.07 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.01 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.15 (ms).
```
**After:**
```
Training:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.03 (ms).
input size(128, 10) forward time is 0.03 (ms); backwad avg time is 0.04 (ms).
input size(128, 100) forward time is 0.17 (ms); backwad avg time is 0.07 (ms).
input size(128, 1000) forward time is 1.43 (ms); backwad avg time is 0.08 (ms).
inference:
input size(128, 1) forward time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.03 (ms).
```
**OMP_NUM_THREADS=1:**
```
Before:
Training:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 100) forward time is 0.15 (ms); backwad avg time is 0.03 (ms).
input size(128, 1000) forward time is 1.45 (ms); backwad avg time is 0.14 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.01 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.20 (ms).
After:
Training:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 100) forward time is 0.15 (ms); backwad avg time is 0.03 (ms).
input size(128, 1000) forward time is 1.43 (ms); backwad avg time is 0.15 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.02 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.06 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24755, https://github.com/pytorch/pytorch/issues/24756.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31094
Differential Revision: D19270936
Pulled By: VitalyFedyunin
fbshipit-source-id: 11bb3236b1037a558022d3777d1f9a429af2bffe
Summary:
Currently `cumsum` crashes for tensors with non-empty dimensions but with zero elements, which can happen when some dimension is zero. This commit fixes the error by checking both `dim()` and `numel()` in cumsum backward.
Fixes https://github.com/pytorch/pytorch/issues/31515
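A minimal illustration of the kind of input addressed here (a sketch; with this fix the backward pass is expected to run and produce an empty gradient):
```
import torch

# dim() == 2 but numel() == 0 because one dimension is zero.
x = torch.randn(0, 5, requires_grad=True)
y = x.cumsum(dim=1)
y.sum().backward()       # this path crashed before the fix
print(x.grad.shape)      # torch.Size([0, 5])
```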
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31694
Reviewed By: mrshenli
Differential Revision: D19266613
Pulled By: leedtan
fbshipit-source-id: 9407e0aa55440fed911c01a3580bb6c5eab62a16
Summary:
The original `check-and-act` style can raise `FileExistsError` when multiple processes are jit-compiling the extension on the same node.
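One common race-free pattern for this (a sketch of the idea, not necessarily the exact change in the PR; `build_dir` is a placeholder):
```
import os

build_dir = "/tmp/torch_extensions/my_ext"  # placeholder path

# Race-prone check-and-act:
#   if not os.path.exists(build_dir):
#       os.makedirs(build_dir)   # another process may create it in between -> FileExistsError
# Race-free: let makedirs tolerate an existing directory.
os.makedirs(build_dir, exist_ok=True)
```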
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30956
Differential Revision: D19262570
Pulled By: ezyang
fbshipit-source-id: bb18c72e42648770b47f9378ac7c3929c3c03efc
Summary:
This dramatically reduces the number of instantiations and eliminates
~900KB of code from my local build of libtorch_cpu.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31683
Differential Revision: D19258364
Pulled By: resistor
fbshipit-source-id: addb921a26289978ffd14c203325ca7e35a4515b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31539
Adding this metric primarily because it is needed to unblock unit
tests for https://github.com/pytorch/pytorch/pull/31381. It also may be useful
to look at this metric to see the number of pending RRef forks that currently
exist.
ghstack-source-id: 96230360
Test Plan: Modified the relevant unit test.
Differential Revision: D19204158
fbshipit-source-id: 016345e52cd02cc5f46837bffd8d589ba8575f29
Summary:
Add support for printing op dependencies as Python code so that both the custom
build script and BUCK can import it without a YAML parser.
Test Plan:
- generate the file:
```
ANALYZE_TORCH=1 FORMAT=py DEPLOY=1 tools/code_analyzer/build.sh -closure=false
```
- load the file in python:
```
python
>>> from tools.code_analyzer.generated.torch import TORCH_DEPS
>>> print(TORCH_DEPS)
```
Differential Revision: D18894639
Pulled By: ljk53
fbshipit-source-id: e304d0525a07a13cf6e8a9317cd22637200d044c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31215
Install the LLVM-dev package for the code analysis CI job: #30937
The LLVM-dev package is not related to the Android NDK, but the whole code
analysis flow is for the mobile custom build, so this docker image was chosen.
Test Plan: - wait docker image to build?
Differential Revision: D19193223
Pulled By: ljk53
fbshipit-source-id: 54a79daf8d98fa7c8b9eed11f519e1c7b1614be8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31588
Per title. This test can sometimes fail with a different error regex
than the one that is currently tested, so add this error regex to make the test
pass consistently.
Differential Revision: D19222275
fbshipit-source-id: 89c95276d4d9beccf9e0961f970493750d78a96b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31668
This also removes an annoying warning about change of sign conversion
Test Plan: Run unit tests
Reviewed By: ezyang
Differential Revision: D19238631
fbshipit-source-id: 29b50abac635e530d5b0453c3a0f36a4573fbf5b
Summary:
For long format strings, it is good to give the fields names.
When using a dict, a literal is more readable and faster than the dict constructor.
I always appreciate your efforts in creating the world's best frameworks.
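A tiny illustration of the dict point (timings are indicative only):
```
import timeit

# A dict literal is parsed directly into a dict, while dict(...) is a name
# lookup plus a function call, so the literal is a bit faster and reads well.
print(timeit.timeit(lambda: {"a": 1, "b": 2}))
print(timeit.timeit(lambda: dict(a=1, b=2)))
```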
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31352
Differential Revision: D19191967
Pulled By: ngimel
fbshipit-source-id: 21f063b163b67de8cf9761a4db5991f74318e991
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31676
Facebook:
Previously we assumed the mask is passed in as a tensor, which is not feasible for sparse parameters.
Here we allow passing in the mask through a db path, which requires the masks to be stored in some db first.
Test Plan: unit tests
Reviewed By: ellie-wen
Differential Revision: D18928753
fbshipit-source-id: 75ca894de0f0dcd64ce17b13652484b3550cbdac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31675
This test could be flaky since there could be in-flight RPC requests as
part of startup which might not have finished. If they finish
between the different calls to retrieve debug_info, there could be a problem
since we would report different information. As a result, we wait to ensure
the metrics stabilize to avoid flakiness.
ghstack-source-id: 96188488
Test Plan: waitforbuildbot
Differential Revision: D19242588
fbshipit-source-id: 8f3db7e7365acbd3742e6ec0c2ddcca68f27db9e
Summary:
- Fixes https://github.com/pytorch/pytorch/issues/31672
- Adds Bfloat16 dispatch to the indexing operations that were missing it
- index_put on cuda does not have bfloat16 dispatch, because I'm not sure bfloat16 math ops work on cuda
Note: `index_put_` with `accum=True` is enabled for `bool`, which does not make much sense, but I'm not the one who started it, so this behavior is preserved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31692
Differential Revision: D19249561
Pulled By: ngimel
fbshipit-source-id: 1269196194f7b9f611b32be198c001704731a78f
Summary:
Change log:
- [x] Change the order of the argument positions of torch.std and torch.std_mean in the docs.
- [x] Correct a spelling mistake in the torch.std_mean docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31677
Differential Revision: D19247372
Pulled By: ngimel
fbshipit-source-id: 8685f5207c39be524cdc81250430beac9d75f330
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28942
The new abstract RRef class contains only user-facing RRef APIs.
It will be later moved to a common folder so that it can be shared
by jit and distributed packages to provide TorchScript support.
Test Plan: Imported from OSS
Differential Revision: D18240590
Pulled By: mrshenli
fbshipit-source-id: ac28cfc2c8039ab7131b537b2971ed4738710acb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31641
Assuming mask is provided as a tensor
Test Plan: unit test
Reviewed By: ellie-wen
Differential Revision: D18928737
fbshipit-source-id: a4f3dd51769c2b56e5890043e91c18e6128be082
Summary:
7zip and cmake are part of the base image, so there is no need to re-install them. Removing the install step can make build/test more stable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30897
Differential Revision: D19232961
Pulled By: mingbowan
fbshipit-source-id: fa3bbd1325839a2a977bf13fdbd97fda43793b8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31612
Count the number of recent updates on rows. Exponential decay is applied to the counter with decay rate r, such that
r^{counter_halflife} = 0.5.
If counter_halflife is nonpositive, this operator is turned off.
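A quick sketch of what that relation implies for the decay rate (illustrative arithmetic only):
```
def decay_rate(counter_halflife):
    # Solve r ** counter_halflife == 0.5 for r.
    return 0.5 ** (1.0 / counter_halflife)

r = decay_rate(16)
counter = 1.0
for _ in range(16):
    counter *= r
print(round(counter, 3))  # ~0.5 after one half-life worth of updates
```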
Test Plan: added unittest
Reviewed By: chocjy
Differential Revision: D19217921
fbshipit-source-id: 96d850123e339212cc0e0ef352ea8a1b1bf61dfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31602
Pull Request resolved: https://github.com/pytorch/glow/pull/3943
Zero-length input is something we hit fairly frequently in practice. The previous handling of the global TensorPool involves two locks per input (acquire and reclaim). Here we use a specialized anchor tensor to host zero-length input. Note that it is only padded to the max sequence length. If necessary, an easy extension can be added to pad to the max `InputPlaceholder.getType().size()`.
Reviewed By: jfix71
Differential Revision: D19192467
fbshipit-source-id: cafdc1eb7bf9b9d6ead04a0243b0be838f6b71cd
Summary:
Earlier cuDNN versions don't support grouped convolution in NHWC well. A legitimate
configuration in later cuDNN versions might still return CUDNN_STATUS_NOT_SUPPORTED.
We fall back to NCHW when the runtime check of the cuDNN version is < 7.6.0 to
keep the logic simple.
Note:
We might update the heuristics; 7.6.0 is very conservative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31444
Differential Revision: D19232414
Pulled By: VitalyFedyunin
fbshipit-source-id: 4c2d79ed347c49cd388bbe5b2684dbfa233eb2a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31260
1. Update the LiteLM dataset conversion script (fbcode/pytext/fb/tools/lite_lm_dataset_to_tensorproto.py)
2. Created a benchmark json file for byte-aware lstm word model (xplat/aibench/specifications/models/caffe2/assistant/lite_lm_len5.json)
3. In order to run the model -- created an int64 Tensor for the model, added batch gather ops to the BUCK file
Test Plan:
```
1. Create tensorproto of the model input
buck run mode/opt //pytext/fb/tools:byte_lm_dataset_to_tensorproto -- --in-path /mnt/vol/pytext/smart_keyboard/aibench/test_5.txt --out-path /mnt/vol/pytext/smart_keyboard/aibench/byteAwareWordLM/ --hidden_dim 203 --layers_num 2 --max_seq_len 64 --max_byte_len 15
2. Run the aibench command
buck run fbsource//xplat/aibench:run_bench -- -b aibench/specifications/models/caffe2/assistant/lm_byte_lstm_len5.json --remote --devices SM-G960U-8.0.0-26
```
Reviewed By: gardenia22
Differential Revision: D17785682
fbshipit-source-id: 351c3c8bae16449e72ac641522803b23a83349be
Summary:
Originally, we only print one broken schema. With this changeset, all the broken schemas are printed out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31628
Reviewed By: hl475
Differential Revision: D19231444
Pulled By: houseroad
fbshipit-source-id: 3dd5b4609a6a9a9046e95f2f30deb9beeb5dcd56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31490
When this happens, a dense tensor is constructed from a sparse constructor.
Fixes: https://github.com/pytorch/pytorch/issues/16154
Test Plan: Imported from OSS
Reviewed By: cpuhrsch, mrshenli
Differential Revision: D19196498
Pulled By: gchanan
fbshipit-source-id: 57a6324833e35f3e62318587ac74267077675b93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30784
Instead of putting the experimental Masked*Adagrad into OSS, we decided to change D18805278.
Test Plan: CI
Reviewed By: chocjy
Differential Revision: D18824265
fbshipit-source-id: 3d893fe6c441f2ff7af4c497cf81b9c49363e7a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31582
D19124934 removed a dummy pointer passed to strtod_c() that's used only for Android (https://fburl.com/diffusion/zkv34jf1). Without it, jit parsing on Android started throwing SIGSEGV due to null pointer dereferencing. This diff adds the dummy pointer back.
Test Plan: Tests
Reviewed By: driazati, shoumikhin
Differential Revision: D19221071
fbshipit-source-id: 2e230c3fbfa873c3f7b92f73c87ee766ac182115
Summary:
Basically the same as https://github.com/pytorch/pytorch/pull/31379 except that I write a separate function `split_batch_dim_to_32bit_out` for the logic. This function could also be used for convolution forward, and I will rebase this PR after https://github.com/pytorch/pytorch/issues/31379 gets merged and then change `raw_cudnn_convolution_forward_out` to use `split_batch_dim_to_32bit_out` here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31510
Differential Revision: D19210563
Pulled By: ngimel
fbshipit-source-id: e20bb82b6360aa2c0e449e127188c93f44e1e9b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31517
This is going to be used by upsample (which currently uses magic values to represent optionals).
For now, we just introduce a fake function for testing (torch._test_optional_float(x)).
Test Plan: Imported from OSS
Differential Revision: D19198721
Pulled By: gchanan
fbshipit-source-id: 0a1382fde0927c5d277d02d62bfb31fb574b8c74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31533
Fixes this test that was flaky and has been disabled (see
https://github.com/pytorch/pytorch/issues/31112)
ghstack-source-id: 96038999
Test Plan: Run the test 1000 times and ensure that it passes.
Differential Revision: D19203366
fbshipit-source-id: 7978cbb8ca0989a0a370a36349cdd4db3bb8345b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31380
To be able to profile async RPCs, we attach a `RecordFunction` object to the future that is created during the RPC, to persist it across the lifetime of the RPC (this is implemented in the next PR: ). Since we'd only like to do this when profiling is enabled, this PR adds an enabled API to the autograd profiler.
ghstack-source-id: 96053933
Test Plan: Modified unit test.
Differential Revision: D19050391
fbshipit-source-id: aa382110e69d06b4a84c83b31d2bec2d8a81ba10
Summary:
I don't see any reason for not doing so, because it is a common error that people forget to set the stream, and I don't think there is a reason for not running on the current stream.
This is just for cuBLAS; cuSPARSE and cuDNN should be modified as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31537
Differential Revision: D19206908
Pulled By: ngimel
fbshipit-source-id: ba2b2b74e9847f0495c76dbc778751a9f23f8b36
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/22496
This is just a first step towards supporting 64-bit convolution on CUDA. In the convolution forward, if the total tensor size is larger than 2^31, we split it on the batch dimension. I want to get some review feedback before moving forward with the same splitting approach for backward.
There are real-world use cases where, even when N=1, the input is still larger than 2^31. For this case, the splitting would be complicated, so I am planning to modify `use_cudnn` to just dispatch to the slow fallback kernel in PyTorch in a later PR.
Update: `later PR` is https://github.com/pytorch/pytorch/pull/31383
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31379
Differential Revision: D19192018
Pulled By: ngimel
fbshipit-source-id: c26ecc56319ac67c4d5302ffed246b8d9b5eb972
Summary:
Get rid of an f-string; somehow we still have Python 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31536
Differential Revision: D19204187
Pulled By: mingbowan
fbshipit-source-id: da8e17e4dccdd6fd1b0e92eb4740f5a09a8a4209
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30195
1. Added flavorDimensions 'build' local/nightly
to be able to test the latest nightlies
```
cls && gradle clean test_app:installMobNet2QuantNightlyDebug -PABI_FILTERS=x86 --refresh-dependencies && adb shell am start -n org.pytorch.testapp.mobNet2Quant/org.pytorch.testapp.MainActivity
```
2. To be able to change the model setup by editing only `test_app/build.gradle`:
inlined model asset file names into `build.gradle`;
extracted the input tensor shape into `build.gradle` (BuildConfig)
Test Plan: Imported from OSS
Differential Revision: D18893394
Pulled By: IvanKobzarev
fbshipit-source-id: 1fae9989d6f4b02afb42f8e26d0f3261d7ca929b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31407
Remove observers at the end instead of before quantizing the tensor,
since we still need them to find the quantization parameters for each module instance
Test Plan:
.
Imported from OSS
Differential Revision: D19162367
fbshipit-source-id: f817af87183f6c42dc97becea85ddeb7e050e2b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31406
Previously we recorded quantization parameters for a given value when we collected the observer nodes,
but the quantization parameters can actually vary per module instance. To achieve
that, we need to delay the call to a later stage and only record the `Value*` that's needed
in the `collectObserverNodesAndValueToQuantize` function
Test Plan:
.
Imported from OSS
Differential Revision: D19162369
fbshipit-source-id: e0f97e322d18a281bf15b6c7bbb04c3dfacb512f
Summary:
The Python C API documentation states "Access to the [PyObject]
members must be done by using the macros Py_REFCNT and Py_TYPE."
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31388
Differential Revision: D19161790
Pulled By: colesbury
fbshipit-source-id: ac9a3738c913ad290a6d3460d0d657ec5c13b711
Summary:
This is the first stab at running profile-insensitive optimizations on pre-profiled graphs. Running those optimizations has the potential to simplify graphs greatly before GuardElimination, so GuardElimination should be able to remove more guards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31392
Differential Revision: D19173639
Pulled By: Krovatkin
fbshipit-source-id: 2485a2a598c10f9b5445efb30b16439ad4551b3f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31470
Optimize performance of these two operators.
Additionally use nearbyint instead of round to be consistent with 4-bit embedding table quantization.
Reviewed By: hyuen
Differential Revision: D19072103
fbshipit-source-id: efe96f14aeff7958cceb453ed625d3fd693891ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31455
In 15.9, __FUNCSIG__ unwraps `using` definitions and preserves noexcept qualifiers
Test Plan: Build caffe2 on Windows using VS2017
Differential Revision: D19166204
fbshipit-source-id: b6c5f70e5262d13adf585f77b92223cf5f1e78dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30429
also fix a bug in uncoalesced division
The general approach here is that we:
* compute the common dtype based on the input tensors
* error if the output tensor is specified and the common type can't be cast back to the output type (e.g. for in-place ops)
* convert the input tensor (values) to the common dtype
* perform the op as normal (computing at the common dtype instead of the result type)
* convert/copy the result values back to the dtype of the result tensor (for in-place ops)
For uncoalesced division we need to coalesce, because an integral tensor with values=[1,1] at the same index divided by 2 would give 1/2 + 1/2 = 0 instead of 2/2 = 1.
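A small sketch of the uncoalesced-division pitfall described above (illustrative only):
```
import torch

# Two entries at the same index in an uncoalesced sparse tensor.
i = torch.tensor([[0, 0]])
v = torch.tensor([1, 1])
s = torch.sparse_coo_tensor(i, v, (1,))

# Dividing per stored value would give 1 // 2 + 1 // 2 == 0, but the real
# entry is 1 + 1 == 2 and 2 // 2 == 1, so the op must coalesce first.
print(s.coalesce().to_dense())  # tensor([2])
```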
Test Plan: Imported from OSS
Differential Revision: D19143223
Pulled By: nairbv
fbshipit-source-id: 480fa334c0b2b3df046818f2342cfd4e2d9d892a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31507
This script is used to generate a model with bound shape inference and
blob reorder, which are requirements for big model loading on T17.
1. Load existing model.
2. Do bound shape inference and blob reorder (put embedding blobs at the end).
3. Save the modified model.
Test Plan:
Generated a new model and tested it on NNPI.
P124181047 (mismatch is AA variance)
Reviewed By: ipiszy
Differential Revision: D19165467
fbshipit-source-id: c3522fc5dc53b7ec652420558e9e8bf65a1ccfae
Summary:
https://github.com/pytorch/pytorch/pull/30330 got rid of the need to send a `MessageType::SHUTDOWN` message, so we can now remove the logic/utils for this type of message.
I think we can also delete the enum entry in the `enum MessageType`, but we may want to keep it in case the logic in https://github.com/pytorch/pytorch/pull/30710 is ever moved to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31270
Test Plan: All existing unit tests pass
Differential Revision: D19146983
Pulled By: rohan-varma
fbshipit-source-id: 35b185411f9446d7d4dfc37a6cb5477cf041e647
Summary:
Fixes a bad merge that is breaking distributed tests on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31492
Pulled By: driazati
Differential Revision: D19180978
fbshipit-source-id: f69f525e2c7f61194686f07cf75db00eb642882f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31293
Previously we checked the number of elements in scale to determine whether we are using per-channel quantization,
but we should get the qscheme information from the observer module directly; we'll expose this information
to the caller as well
Test Plan:
.
Imported from OSS
Differential Revision: D19146669
fbshipit-source-id: ea430eeae0ef8f441be39aa6dcc1bb530b065554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31473
Mitigates #6313
A common use case for the autograd profiler is to use it to run over an
entire model, including dataloading. The following will crash:
- run autograd profiler in CUDA mode
- Use a multi-worker DataLoader (presumably with the 'fork' spawn
method)
- This crashes because the autograd profiler initializes CUDA, and forking after CUDA is initialized is bad.
This PR puts in a nice error message when this happens so that users
aren't too confused. The new error message looks like:
https://gist.github.com/zou3519/903f15c3e86bad4585b7e5ce14cc1b70
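A rough sketch of the pattern in question (my own illustration, not from the PR; assumes a CUDA-capable Linux machine where DataLoader workers are forked):
```
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 3))
loader = DataLoader(dataset, batch_size=8, num_workers=2)  # workers are forked on Linux

# Profiling in CUDA mode initializes CUDA in the parent process; forking workers
# after CUDA is initialized is unsupported, so on affected setups this combination
# now surfaces a descriptive error instead of an opaque crash.
with torch.autograd.profiler.profile(use_cuda=True):
    for (batch,) in loader:
        batch.cuda().sum()
```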
Test Plan:
- Tested locally.
- I didn't add a test case for this because it's hard to write a test
case that doesn't completely stop the rest of our test suite from
running.
Differential Revision: D19178080
Pulled By: zou3519
fbshipit-source-id: c632525ba1f7b168324f1aa55416e5250f56a086
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31484
See https://github.com/pytorch/pytorch/issues/26123 for context.
Previously, when someone googles for `pytorch "adaptive_max_pool2d"`,
https://pytorch.org/docs/stable/_modules/torch/nn/modules/pooling.html
is the first result. This PR changes the docs build script to exclude
all such generated source docs under `_modules/` from Google.
It does this by doing a search for `<head>` and then appending
`<meta name="robots" content="noindex">`.
The [google developer
docs](https://support.google.com/webmasters/answer/93710?hl=en) suggest
that this is the right way to prevent google from indexing the page.
In the future, when the CI
builds documentation (both master and stable docs), the newly created
docs under _modules will have the meta noindex tag.
Test Plan:
- I ran `find "$install_path/_modules" -name "*.html" -print0 | xargs -0
sed -i '/<head>/a \ \ <meta name="robots" content="noindex">'` on a docs
build locally and checked that it does indeed append the meta noindex
tag after `<head>`.
- In a few days we should rerun the search to see if these pages are
still being indexed.
Differential Revision: D19180300
Pulled By: zou3519
fbshipit-source-id: 5f5aa95a85dd9f065607c2a16f4cdd24ed699a83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31436
Tensor::has_names is slower than it should be for unnamed tensors
because of the following:
- it always tries to access the TLS for NamesMode. Unnamed tensors don't
need to peek at NamesMode to determine if they have names or not.
- There is some virtual function being called because TensorImpl is in
c10 and NamedTensorMeta is in libtorch.
This PR short-circuits Tensor::has_names for unnamed tensors by
checking whether the underlying TensorImpl holds a pointer to NamedTensorMeta.
If the NamedTensorMeta is nullptr, then the tensor is definitely unnamed.
Benchmarks:
- I have a dedicated benchmarking machine where I isolate a single CPU
and make sure it runs at a fixed frequency.
- I benchmarked torch.add, which calls `tensor::has_names` three times.
- The TL;DR is that torch.add between size-1 unnamed tensors gets sped up
by ~200ns after this change, which is a 9% improvement.
- Before, on my machine:
https://gist.github.com/zou3519/dfd648a1941d584711d850754e0694bc
- After on my machine:
https://gist.github.com/zou3519/e78f0d8980b43d0d9c3e3e78ecd0d4d5
Test Plan: - run tests
Differential Revision: D19166510
Pulled By: zou3519
fbshipit-source-id: 1888a4e92d29152a5e3b778a95e531087e532f53
Summary:
Reference: https://github.com/pytorch/pytorch/issues/23159
Currently we don't support reduction operations for tensors with dim >= 64, and we should give a descriptive RuntimeError saying so.
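A small illustration (my own; the exact message may differ by build) of the error this adds:
```
import torch

t = torch.zeros([1] * 65)   # a 65-dimensional tensor
try:
    t.sum()                 # reductions over tensors with >= 64 dims are not supported
except RuntimeError as e:
    print(e)                # now a descriptive RuntimeError rather than an obscure failure
```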
Diff: D19179039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31476
Differential Revision: D19179039
Pulled By: anjali411
fbshipit-source-id: 58568f64627bf3df6b3e00a1498544c030e74a0e
Summary:
Reference: https://github.com/pytorch/pytorch/issues/31385
In the current documentation for NLLLoss, it's unclear what `y` refers to in the math section of the loss description. There was an issue (https://github.com/pytorch/pytorch/issues/31295) filed earlier where there was confusion about whether the loss returned for reduction='mean' is right or not, perhaps because of a lack of clarity in the formula symbol descriptions in the current documentation.
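For illustration only (my own example, not part of the PR), this is the weighted-mean behaviour the clarified symbols describe, where y_n is the target class of sample n:
```
import torch
import torch.nn as nn

weight = torch.tensor([1.0, 2.0, 3.0])
loss = nn.NLLLoss(weight=weight, reduction='mean')

log_probs = torch.log_softmax(torch.randn(4, 3), dim=1)
target = torch.tensor([0, 2, 1, 2])          # these are the y_n in the docs

# reduction='mean' is a weighted mean: sum_n(w[y_n] * l_n) / sum_n(w[y_n])
per_sample = -log_probs[torch.arange(4), target] * weight[target]
manual = per_sample.sum() / weight[target].sum()

print(torch.allclose(loss(log_probs, target), manual))   # True
```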
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31488
Differential Revision: D19181391
Pulled By: anjali411
fbshipit-source-id: 8b75f97aef93c92c26ecbce55b3faf2cd01d3e74
Summary:
The current numba version doesn't appear to actually work with our numba-cuda tests (numba.cuda.is_available() fails).
Previous attempts to upgrade were blocked by https://github.com/numba/numba/issues/4368.
It's a bit unclear to me, but I believe 0.46.0 fixes the above issue. I'm verifying that we catch that issue in CI via https://github.com/pytorch/pytorch/pull/31434.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31435
Differential Revision: D19166865
Pulled By: gchanan
fbshipit-source-id: e01fa48c577e35de178423db7a7f79ac3dd3894d
Summary:
Previously we would only catch `py::cast_error` which led to incomprehensible error messages like: `TypeError: 'NoneType' object is not iterable`. We are running arbitrary pybind code here, and not doing anything with the error message, so we should be less restrictive with the types of errors we catch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31398
Differential Revision: D19166655
Pulled By: eellison
fbshipit-source-id: 84db8b3714c718b475913f2f4bb6f19e62f2d9ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31011
`getAttribute` is supposed to throw when the attribute is not
found rather than return a `nullptr`.
Test Plan:
.
Imported from OSS
Differential Revision: D18898417
fbshipit-source-id: 0fe7d824b978ad19bb5ef094d3aa560e9fc57f87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31357
If a user selects a subset of a Tensor and sends it in an RPC, we were sending
the whole original Tensor Storage over the network.
While this sounds reasonable, in practice, we observed view-like Tensors being sent
over rpc, where only 1% of the data in the provided Tensor's Storage was
actually used/needed.
The simple solution here is to just force a clone in the serializer code if we see that
less than (arbitrary) half the bits are used, and the tensor is more than a nominal few KB.
Add related tests to ensure this doesn't break.
An alternate approach would be to modify the Pickler. That said, since Pickler is shared by more
components, the logic might be harder to tailor appropriately at that layer (particularly
given that the Pickler has explicit logic to share a single Storage* among several Tensors
that commonly point to the same Storage*).
It's possible that we might want to further refine the basic thresholds in this change.
In practice, we've seen a mostly bimodal distribution thus far for the percent of Tensor
Storage referred by a Tensor in observed rpcs (i.e. either 90%+ or sub-10% of the Storage
referenced), hence the existing 50% threshold here is probably not an unreasonable
starting point.
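A minimal sketch (mine, not the serializer code) of the situation being addressed:
```
import torch

big = torch.randn(1000000)
view = big[:100]                      # shares big's Storage

print(view.numel())                   # 100 elements are actually referenced ...
print(view.storage().size())          # ... but the backing Storage holds 1000000

# Cloning before sending keeps the serialized payload proportional to the view.
payload = view.clone()
print(payload.storage().size())       # 100
```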
ghstack-source-id: 95925474
Test Plan: buck test mode/dev caffe2/test/cpp/rpc/...
Differential Revision: D19137056
fbshipit-source-id: e2b3a4dd0cc6e1de820fd0740aa1d59883dbf8d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31393
pytorch build was set up with the include paths (-I) relative to fbcode/. This works well for fbcode builds, but doesn't work for the new fbcode_deps args for xplat build targets that work across xplat and fbcode. When these targets are built, the include paths need to be relative to fbsource, so fbcode/ suffix needs to be added to those paths.
Longer term, to properly fix this, we need to use raw_headers with public_include_directories specified for all of these targets.
Test Plan: buck test mode/dev //papaya/integration/service/local/test:mnist_federated_system_test -- 'MnistFederatedSystemTest\.test' --run-disabled
Reviewed By: mzlee
Differential Revision: D19148465
fbshipit-source-id: a610e84bf4cad5838e54e94bae71b957c4b6d4b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31346
This makes it so that if profiling is enabled/disabled from a different thread while a RecordFunction span is active via an op, it doesn't crash the process.
We currently see this when using torch.distributed.rpc to enable/disable profiling on other nodes while other things are running.
Test Plan: buck test //caffe2/test:autograd -- test_record_function
Reviewed By: albanD
Differential Revision: D19133258
fbshipit-source-id: 30712b06c6aa051789948de2918dcfb9b78967ba
Summary:
Fixes #27495
This adds builtins as another piece of a concrete type. They're separate from normal functions since they represent the `BuiltinFunction` sugared value (which is a direct call to a builtin op). It also moves the builtins related logic from `jit/__init__.py` to `jit/_builtins.py` so it can be used from `jit/_recursive.py` to look up functions in the builtins table.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31269
Pulled By: driazati
Differential Revision: D19149779
fbshipit-source-id: d4e5e5d7d7d528b75a2f503e6004394251a4e82d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24341
ConvTransposeOp doesn't crash for zero-batch, but it doesn't modify the output blob. This leads to buggy behaviour especially when running the same network twice using different input, or backprop during training.
Seems `ConvTransposeUnpoolBase<Context>::GetOutputSize` works for zero-batch, so I remove the check for `input.numel() > 0`, and reshape the output blob before returning.
For CudnnConvTransposeGradientOp, it's a bit verbose to set `dfilter` and `dbias`; it seems cuDNN can handle it, so simply remove the `X.numel() == 0` branch.
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:conv_transpose_test -- --run-disabled
Reviewed By: BIT-silence
Differential Revision: D16807606
fbshipit-source-id: 0d72c5bd8f2e03c34465e7b530cca548d9bdd5e1
Summary:
Stacked PRs
* #29940 - [jit] Fix parsing of big float literals
* **#29935 - [jit] Fix hex literal parsing**
* #29931 - [jit] Throw a better error for int too big for int64_t
Previously these were all parsed as `0`
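A minimal sketch (my own, assuming hex literals are accepted once the fix lands) of the behaviour being fixed:
```
import torch

@torch.jit.script
def mask() -> int:
    return 0xFF   # previously parsed as 0 by the TorchScript frontend

print(mask())     # 255
```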
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29935
Pulled By: driazati
Differential Revision: D19124944
fbshipit-source-id: 1ee0c1dee589933363a5efba069a2cfaf94373c5
Summary:
Add a section for unsupported ops and modules. Automatically generate the properties and attributes that aren't bound, and for ops that have semantic mismatches, set up tests so the docs stay up to date.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31329
Differential Revision: D19164472
Pulled By: eellison
fbshipit-source-id: 46290bb8a64d9de928cfb1eda5ff4558c3799c88
Summary:
Fix: https://github.com/pytorch/pytorch/issues/24631, https://github.com/pytorch/pytorch/issues/24632, https://github.com/pytorch/pytorch/issues/24764, https://github.com/pytorch/pytorch/issues/24765
Port of TH SoftMarginCriterion to ATen using un-fused tensor operators but with custom backward code. This is a follow-up/fix of the reverted PR https://github.com/pytorch/pytorch/issues/27673.
Benchmark results:
CPU became faster, GPU slower. Manual fusion is probably necessary to reach the previous TH performance on GPU.
### WITH patch
```
CPU warmup 1000 took 7.997200009413064e-05
CPU warmup 10000 took 0.0008116499957395718
CPU warmup 100000 took 0.0012691459996858612
CPU warmup TOTAL time 0.0021982479956932366
CPU forward 1000 took 7.320100849028677e-05
CPU forward 10000 took 0.00015837099635973573
CPU forward 100000 took 0.0010471990099176764
CPU forward 1000000 took 0.01238470000680536
CPU forward 10000000 took 0.12747182900784537
CPU forward 100000000 took 1.2076255190040683
CPU forward TOTAL time 1.3488940890092636
CPU for- & backward 1000 took 0.00032587299938313663
CPU for- & backward 10000 took 0.0006926299975020811
CPU for- & backward 100000 took 0.002146183993318118
CPU for- & backward 1000000 took 0.019158899012836628
CPU for- & backward 10000000 took 0.2957490350090666
CPU for- & backward 100000000 took 1.7630806300003314
CPU for- & backward TOTAL time 2.081367089995183
GPU warmup 1000 took 0.0004558280052151531
GPU warmup 10000 took 0.0002567449992056936
GPU warmup 100000 took 0.0001593509950907901
GPU warmup TOTAL time 0.0009442300070077181
GPU forward 1000 took 0.00015061900194268674
GPU forward 10000 took 0.00015258099301718175
GPU forward 100000 took 0.00015409699699375778
GPU forward 1000000 took 0.0008183339959941804
GPU forward 10000000 took 0.004424853003001772
GPU forward 100000000 took 0.04356115800328553
GPU forward TOTAL time 0.04938192600093316
GPU for- & backward 1000 took 0.0008062430133577436
GPU for- & backward 10000 took 0.0006074949924368411
GPU for- & backward 100000 took 0.0007091690058587119
GPU for- & backward 1000000 took 0.001022183001623489
GPU for- & backward 10000000 took 0.009945805999450386
GPU for- & backward 100000000 took 0.0944173600000795
GPU for- & backward TOTAL time 0.28060428200114984
```
### WITHOUT patch
```
CPU warmup 1000 took 6.394000956788659e-05
CPU warmup 10000 took 0.00038220599526539445
CPU warmup 100000 took 0.0034939230099553242
CPU warmup TOTAL time 0.003981974994530901
CPU forward 1000 took 4.7855006414465606e-05
CPU forward 10000 took 0.000347569992300123
CPU forward 100000 took 0.003367935001733713
CPU forward 1000000 took 0.03605044000141788
CPU forward 10000000 took 0.35935167300340254
CPU forward 100000000 took 3.630371332008508
CPU forward TOTAL time 4.029640004009707
CPU for- & backward 1000 took 0.00028494100843090564
CPU for- & backward 10000 took 0.0006738200027029961
CPU for- & backward 100000 took 0.0051178760040784255
CPU for- & backward 1000000 took 0.04925115800870117
CPU for- & backward 10000000 took 0.7172313440096332
CPU for- & backward 100000000 took 5.441953932997421
CPU for- & backward TOTAL time 6.21466830400459
GPU warmup 1000 took 0.001803738996386528
GPU warmup 10000 took 0.00041877900366671383
GPU warmup 100000 took 0.0003870719956466928
GPU warmup TOTAL time 0.0026561370032140985
GPU forward 1000 took 0.00037833399255760014
GPU forward 10000 took 0.00038825398951303214
GPU forward 100000 took 0.0003841099969577044
GPU forward 1000000 took 0.0007090550061548129
GPU forward 10000000 took 0.0016171559982467443
GPU forward 100000000 took 0.013463679002597928
GPU forward TOTAL time 0.017010531009873375
GPU for- & backward 1000 took 0.0007374050037469715
GPU for- & backward 10000 took 0.0006343529967125505
GPU for- & backward 100000 took 0.0006375070079229772
GPU for- & backward 1000000 took 0.0007550300069851801
GPU for- & backward 10000000 took 0.002672752001672052
GPU for- & backward 100000000 took 0.023170708998804912
GPU for- & backward TOTAL time 0.20251446698966902
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28135
Differential Revision: D18001447
Pulled By: VitalyFedyunin
fbshipit-source-id: ad90dc1cca42dcaf3ea9e17e4f8fd79cee0a293e
Summary:
VitalyFedyunin, this PR ports the LeakyReLU activation to ATen.
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.LeakyReLU()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.14 (ms).
input size(128, 10000) forward time is 4.21 (ms); backwad avg time is 8.02 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.07 (ms).
input size(128, 10000) forward time is 1.98 (ms); backwad avg time is 6.21 (ms)
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.04 (ms).
input size(128, 10000) forward time is 0.03 (ms); backwad avg time is 0.09 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10000) forward time is 0.47 (ms); backwad avg time is 1.02 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Fixes https://github.com/pytorch/pytorch/issues/24583, https://github.com/pytorch/pytorch/issues/24584, https://github.com/pytorch/pytorch/issues/24720, https://github.com/pytorch/pytorch/issues/24721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29899
Differential Revision: D18816231
Pulled By: VitalyFedyunin
fbshipit-source-id: afb1e43a99317d17f50cff1b593cd8f7a0a83da2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31335
When an error occurs in a net we end up cancelling all the async ops. If one error occurs it's highly likely other errors will occur as well.
Typically we see:
1. SendOp failed due to a network error
2. async scheduling cancels all other ops via `SetFinished("Cancelled");`
3. Another SendOp fails due to a network error and crashes the process when the exception is thrown.
This changes caffe2 ops to allow failing twice.
Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu
Reviewed By: andrewwdye
Differential Revision: D19106548
fbshipit-source-id: 4b7882258a240894cc16d061a563c83a3214d3d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31404
Multiple "trainers" could each create different instances of DistributedOptimizer, which means we can still have a race condition unless we do a trully global per worker lock.
ghstack-source-id: 95874624
Test Plan: run unit tests -- unfortunately, due to the non-deterministic behavior it's not clear how to unit test this properly.
Differential Revision: D19154248
fbshipit-source-id: fab6286c17212f534f1bd1cbdf9f0de002d48c74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31292
att
Also, we need to do this check after we call `insertObservers` on invoked modules
as well, since qconfig can be None for the parent module while being valid for invoked modules
Test Plan:
.
Imported from OSS
Differential Revision: D19146668
fbshipit-source-id: be6811353d359ed3edd5415ced29a4999d86650b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31364
clang-cl defines both `_MSC_VER` and `__clang__`. Names are mangled clang-style, though. Calling `extract` with the wrong name mangling pattern will throw `std::logic_error`. This crashes on Windows when `get_fully_qualified_type_name` is called because it is marked with `noexcept`.
Test Plan: Windows builds no longer crash on startup.
Reviewed By: mattjgalloway
Differential Revision: D19142064
fbshipit-source-id: 516b9b63daeff30f5c097d192b0971c7a42db57e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31020
Before, the recursive scripting process re-did the concrete type
inference process for every submodule call. This changes things so that
the concrete type inference process only occurs once (at the top level),
and we re-use all the inferred concrete types while recursively
compiling submodules.
This is both more efficient (we don't do n^2 work inferring concrete
types) and less bug-prone (since we infer the concrete type only once,
there is no possibility of a mismatch).
Test Plan: Imported from OSS
Differential Revision: D18904110
Pulled By: suo
fbshipit-source-id: 6560b85ae29fe5e9db1ee982dbf8bc222614b8d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31019
No more `recursive_script`, just direct calls to `create_script_module`.
This reduces the number of pathways through the frontend, and the
uniformity is useful for a future PR.
Test Plan: Imported from OSS
Differential Revision: D18904113
Pulled By: suo
fbshipit-source-id: 7de061dfef0cbdfc9376408fc6c1167b81803f01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31018
Properties are now disallowed so this hack is no longer necessary
Test Plan: Imported from OSS
Differential Revision: D18904112
Pulled By: suo
fbshipit-source-id: 83448da677082d59355729bb72d9f9f4c31ea756
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31017
This arg is now derivable from another one. So we don't need to pass
both
Test Plan: Imported from OSS
Differential Revision: D18904111
Pulled By: suo
fbshipit-source-id: ea74ea9c2ae83d9e0e6977b0eb6629f53545e2e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31401
As title, just a mechanical change
Test Plan: Imported from OSS
Differential Revision: D19152965
Pulled By: suo
fbshipit-source-id: 6bb27df7c8f542c55110286c156358ba0936269f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31373
Just some housekeeping
Test Plan: Imported from OSS
Differential Revision: D19145987
Pulled By: suo
fbshipit-source-id: ae8142dab2bddcf0b628c27c426ca26334c48238
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31372
Keeping it current with the latest changes.
Test Plan: Imported from OSS
Differential Revision: D19145986
Pulled By: suo
fbshipit-source-id: 88122e66fa87a354ef8e87faffe58551074e3f03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31214
This sets up the basic infrastructure for distributed autograd and rpc to
bind their operators to TorchScript. Since the whole distributed package
is built behind the `USE_DISTRIBUTED` flag, we separate the
registration and build it only when the flag is on.
Test Plan: Imported from OSS
Differential Revision: D19137160
fbshipit-source-id: ff47dc4c380ebe273fe0eea9e5e3fccfbd6466d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30918
This is a C++14 feature we can use now
ghstack-source-id: 95811482
Test Plan: waitforsandcastle
Differential Revision: D18869636
fbshipit-source-id: b5b3d78b61b6ceb2deda509131f8502e95b1d057
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30530
Switch some mentions of "C++11" in the docs to "C++14"
ghstack-source-id: 95812049
Test Plan: testinprod
Differential Revision: D18733733
fbshipit-source-id: b9d0490eb3f72bad974d134bbe9eb563f6bc8775
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31071
Previously the profiler would think Tensors would require grad, even
when the no_grad flag is enabled during execution. This makes the profiling
and guards respect the no_grad flag, which eliminates extra differentiable
graphs that appear in the backward graph (where no_grad is typically enabled).
Test Plan: Imported from OSS
Differential Revision: D18915468
Pulled By: zdevito
fbshipit-source-id: 1ae816a16ab78ae5352825cc6b4a68ed7681a089
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30978
This particular approach queries our issue tracker for test titles that
match the following format:
```
DISABLED test_async_grad_guard_with_grad (jit.test_async.TestAsync)
```
And then skips the python tests for them. There is a 1 second timeout so that
if the internet flakes, we still run the test suite without disabling any
tests.
This is intended as a quick fix, similar to ninja unland, to get to a green
master. Long term test disables should go into the code.
Test Plan: Imported from OSS
Pulled By: zdevito
Differential Revision: D18890532
fbshipit-source-id: fe9447e59a6d5c9ad345f7c3ff15d63b6d2a09e2
Summary:
Upgrade the ONNX IR version from 4 to 6; below is the change log from ONNX. The upgrade should be backward compatible.
```
// IR VERSION 5 published on March 18, 2019
// - Add message TensorAnnotation.
// - Add quantization annotation in GraphProto to map tensor with its scale and zero point quantization parameters.
IR_VERSION_2019_3_18 = 0x0000000000000005;
// IR VERSION 6 published on Sep 19, 2019
// - Add support for sparse tensor constants stored in model.
// - Add message SparseTensorProto
// - Add sparse initializers
IR_VERSION = 0x0000000000000006;
```
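A quick way to check which IR version a saved model carries (my own sketch; the file path is a placeholder):
```
import onnx

model = onnx.load("exported_model.onnx")   # placeholder path
print(model.ir_version)                    # expected to report 6 after this upgrade
```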
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31025
Reviewed By: hl475
Differential Revision: D18935444
Pulled By: houseroad
fbshipit-source-id: 9ba47f9657fa1a668db291cf04af07d5e8d73c21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31334
The wipe cache logic was introduced in the hope of reducing variation in the benchmark results. Based on our experiment results, it didn't actually help with that. In addition, several engineers had encountered the issue of a missing cpuinfo.h, which was used in the wipe cache logic. So this diff removes that feature to ensure smooth installation and running of the op bench.
Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N1_K1_cpu
# Input: M: 1, N: 1, K: 1, device: cpu
Forward Execution Time (us) : 111.192
```
A/B test also passes: Benchmark Run #2476535015
Reviewed By: hl475
Differential Revision: D19126970
fbshipit-source-id: 9b1ab48c121838836ba6e0ae664a48fe2d18efdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31206
Improvement on #25525.
- DistAutogradContext::getKnownWorkerIds() returns an unordered_map as a temporary value. There is no need to copy this temporary value A into another temporary value B.
ghstack-source-id: 95736296
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_worker_ids_recorded
```
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork_thrift -- test_context_cleanup_tensor_with_grad
```
Differential Revision: D5707771
fbshipit-source-id: 9fea83dc69b02047aef8b02a73028a260ac0be40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31217
It doesn't seem to be used.
Test Plan: Imported from OSS
Differential Revision: D18986642
Pulled By: gchanan
fbshipit-source-id: 96d615df82731d2224d403ab6e2cad6d4c6674fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917
This is a C++14 feature, we can use this now.
ghstack-source-id: 95255753
Test Plan: waitforsandcastle
Differential Revision: D18869637
fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30501
**Motivation**:
In the current state, the output of libtorch Module forward/runMethod is memcopied to a java ByteBuffer, which is allocated, at least in some versions of Android, on the java heap. That could lead to intensive garbage collection.
**Change**:
The output java tensor becomes the owner of the output at::Tensor and keeps it alive (as the `pytorch_jni::TensorHybrid::tensor_` field) until the java part is destroyed by GC. For that, org.pytorch.Tensor becomes a 'Hybrid' class in fbjni naming and holds the member field `HybridData mHybridData;`.
If construction starts from the java side, the java constructors of the subclasses call `this.mHybridData = super.initHybrid();` to initialize the cpp part (`at::Tensor tensor_`). (We need all the fields initialized; because of this, `mHybridData` is not declared final but is treated as final.)
If construction starts from the cpp side, the cpp side is initialized from the provided at::Tensor with `makeCxxInstance(std::move(tensor))` and is passed to the java method `org.pytorch.Tensor#nativeNewTensor` as the parameter `HybridData hybridData`, which holds a native pointer to the cpp side.
In that case the `initHybrid()` method is not called; instead a parallel set of subclass ctors is used, which stores `hybridData` in `mHybridData`.
Renaming:
`JTensor` -> `TensorHybrid`
Removed method:
`JTensor::newAtTensorFromJTensor(JTensor)` becomes trivial `TensorHybrid->cthis()->tensor()`
Test Plan: Imported from OSS
Differential Revision: D18893320
Pulled By: IvanKobzarev
fbshipit-source-id: df94775d2a010a1ad945b339101c89e2b79e0f83
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31271
This fixes copy kernel speed regression introduced in https://github.com/pytorch/pytorch/issues/29631.
The previous implementation forces the compiler to instantiate `static_cast_with_inter_type` because it is passed as an argument of a function. This behavior makes it impossible for compilers to do optimizations like automatic vectorization, and the function call itself is expensive compared to a single casting instruction.
To check the change, run
```
readelf -Ws /home/xgao/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so | grep static_cast_with_inter_type
```
On nightly build, we have output
```
168217: 0000000001852bf0 5 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIsdE5applyEd
168816: 0000000001852d30 33 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEaE5applyEa
168843: 00000000018531f0 7 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIblE5applyEl
168930: 0000000001852c20 3 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIslE5applyEl
168935: 00000000018528d0 124 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIfNS_4HalfEE5applyES1_
169023: 0000000001852f30 17 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEhE5applyEh
169713: 00000000018525c0 3 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIahE5applyEh
170033: 0000000001852c10 3 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIsiE5applyEi
170105: 0000000001852bd0 5 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIshE5applyEh
170980: 0000000001852fc0 27 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIdES1_IfEE5applyES3_
171398: 0000000001852810 13 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIdbE5applyEb
171574: 00000000018532e0 35 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIbNS_8BFloat16EE5applyES1_
171734: 0000000001852b20 6 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIlSt7complexIdEE5applyES2_
172422: 0000000001853350 54 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EaE5applyEa
172704: 00000000018533c0 38 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EfE5applyEf
172976: 0000000001852890 10 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIflE5applyEl
173038: 0000000001852f80 9 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEfE5applyEf
173329: 00000000018531c0 20 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIbfE5applyEf
173779: 00000000018524d0 3 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIhiE5applyEi
174032: 0000000001852960 14 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIfNS_8BFloat16EE5applyES1_
174334: 0000000001852d60 29 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEdE5applyEd
174470: 0000000001852c60 124 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIsNS_4HalfEE5applyES1_
174770: 0000000001852bc0 15 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIlNS_8BFloat16EE5applyES1_
176408: 0000000001853980 144 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeINS_4HalfEbE5applyEb
176475: 0000000001852790 128 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIdNS_4HalfEE5applyES1_
....
```
And after this PR, we get empty output
```
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31279
Differential Revision: D19075587
Pulled By: ngimel
fbshipit-source-id: c20088241f39fa40c1d055f0a46eb5b9ece52e71
Summary:
Closes https://github.com/pytorch/pytorch/issues/31198, see the issue for more details. We throw an error when `local_value()` is called on a non-owned rref, but the incorrect node name is printed in the error message. This PR fixes that and adds a relevant unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31199
Differential Revision: D19072014
Pulled By: rohan-varma
fbshipit-source-id: 760c20bfd2fbf286eaaca19500469509a575cfec
Summary:
Make the following changes:
- When there are more than 10k errors, cuda-memcheck only shows 10k errors; in this case we shouldn't raise an Exception.
- Add UNDER_CUDA_MEMCHECK environment to allow disabling `pin_memory` tests when running cuda-memcheck.
- Add a `--ci` command option, when turned on, then this script would run output to stdout instead of writing a file, and exit with an error if cuda-memcheck fails
- Add a `--nohang` command option. When turned on, then hang would be treated as pass instead of error
- Do simple filtering on the tests to run: skip a test if `'cpu'` is in its name but `'cuda'` is not.
- Add `--split` and `--rank` to allowing splitting the work (NVIDIA CI has a limitation of 3 hours, we have to split the work to satisfy this limitation)
- The error summary could be `ERROR SUMMARY: 1 error` or `ERROR SUMMARY: 2 errors`; the tail could be `error` or `errors`, so it is not always the same length. The script is fixed to handle this case.
- Ignore errors from `cufft`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29243
Differential Revision: D18941701
Pulled By: mruberry
fbshipit-source-id: 2048428f32b66ef50c67444c03ce4dd9491179d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31276
Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail()
This is similar to https://github.com/pytorch/pytorch/pull/13902 in caffe2 land.
Test Plan: wait for CI to clear
Reviewed By: bddppq
Differential Revision: D19047582
fbshipit-source-id: 34703b03786c8eee9c78d2459eb54bde8dc21a57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30391
A Type parser to parse the python string of a Type. For example,
"Tuple[str, Optional[float], Dict[str, List[Tensor]], int]".
Please refer to test_type_parser.cpp for the usage.
One of the use cases is in lite interpreter, types needs to be serialized (directly calling the python_str() of the Type) and deserialized (calling parseType(str)).
Test Plan: Imported from OSS
Differential Revision: D18924268
Pulled By: iseeyuan
fbshipit-source-id: 830d411563abfbeec023f01e7f8f4a1796f9a59a
Summary:
https://github.com/pytorch/pytorch/issues/28294 DDP should not set grad for globally unused parameters
DDP currently computes the param-to-bucket mapping upfront and allreduces grads for all params in every iteration. Even if params are unused, it will just set their grads to zero. With such behavior, the optimizer cannot tell whether a param indeed has a zero grad or was simply not used in the current iteration. This could trigger convergence problems for optimizers with weight decay and momentum, such as SGD. However, DDP cannot simply set grad to None for locally unused parameters, as locally unused parameters might be used in other processes, and hence we still need to allreduce their grads. Instead, DDP should figure out the globally unused parameters and skip touching their grads at the end of the backward pass.
Implementation summary:
* Add locally used parameter map for each model replica.
* Mark the locally unused parameters in the end of forward and then reduce to get the globally unused parameters.
* In the end of backward skip touching grad for those globally unused parameters.
* Add a unit test test_global_local_unused_params_grad
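As a single-process illustration (mine, not the DDP code path) of the grad-is-None vs. grad-is-zero distinction the optimizer relies on:
```
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)

    def forward(self, x):
        return self.used(x)   # self.unused plays no role this iteration

net = Net()
net(torch.randn(2, 4)).sum().backward()

print(net.used.weight.grad is None)     # False
print(net.unused.weight.grad is None)   # True -- the optimizer can tell it was unused,
                                        # whereas a zeroed grad would still be stepped
                                        # by weight decay / momentum
```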
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28883
Differential Revision: D18491530
Pulled By: mrshenli
fbshipit-source-id: 24e9b5f20df86c34ddbf9c7106250fd6ce186699
Summary:
Fixes https://github.com/pytorch/pytorch/pull/28378#issuecomment-562597033
To reproduce the failure I had to downgrade to `cmake 3.9` (Ubuntu 18 uses 3.10 apparently). These older `cmake` versions unfortunately don't seem to allow `target_link_libraries(INTERFACE)` to be used with imported libraries. Switching back to `set_property(TARGET)` fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30935
Differential Revision: D18956912
Pulled By: albanD
fbshipit-source-id: a2b728ee3268599a428b7878c988e1edef5d9dda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26618
Implement a mechanism to get type names at compile time
In a future diff, I'm planning to introduce this to caffe2::TypeMeta and a few other places.
ghstack-source-id: 95337871
Test Plan: unit tests
Differential Revision: D17519253
fbshipit-source-id: e14017f962fd181d147accb3f53fa8d6ee42a3f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31200
We do not hipify these files when doing out of place.
Test Plan: wait for CI to clear.
Differential Revision: D18963683
fbshipit-source-id: eeba8597143f26417d0a8181a4c746139afefa24
Summary:
Tests for unique_dim will be refactored in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31211
Differential Revision: D19034968
Pulled By: ngimel
fbshipit-source-id: 855d326b37638b5944f11fbbce03394cf000daf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31207
Cleanup after #30914.
In #30914, `autogradContext->addKnownWorkerId(dst);` was moved out of `addSendRpcBackward()`.
So `addSendRpcBackward()` does not need `dstId` as its argument anymore.
ghstack-source-id: 95509218
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_context_cleanup_tensor_no_grad
```
Differential Revision: D5742365
fbshipit-source-id: accd041a594ec18d369231f5590289828d87baa7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31224
If a future coming back to a rpc_agent server is satisfied with an
exception, ensure this information is propagated back over the wire.
ghstack-source-id: 95522418
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/...
Differential Revision: D18979185
fbshipit-source-id: 99848ae805cc2d48948809a238f61a2e0ef234c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31128
When an operation times out due to errors that are not detected by the NCCL communicators, ncclCommWatchdog cannot detect the timeout and thus cannot abort the ncclComms accordingly. So we explicitly abort the ncclComms here before throwing the timeout exception to users; after this, ncclCommWatchdog can detect that the NCCL communicators are aborted and clean up devNCCLCommMap_ accordingly. If we threw the timeout exception without aborting the NCCL communicators here, it was observed that the CUDA GPU would sit at 100% utilization and could not run new events successfully.
ghstack-source-id: 95528488
Test Plan: the newly revised test _test_nccl_errors_blocking passed with the changes in this diff; the revised test failed without the changes in this diff
Reviewed By: isunjin
Differential Revision: D18928607
fbshipit-source-id: be65a05ce4ff005f0c7fed36ae8e28903e8ffe2b
Summary:
It was a random coding exercise so I wasn't putting much effort into it; but I was like "hey, is the current intrusive_ptr implementation optimized enough?", so I compared it with shared_ptr (using std::enable_shared_from_this).
My benchmark result shows that intrusive_ptr is actually slower. On my macbook the speed is:
```
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
BM_IntrusivePtrCtorDtor 14 ns 14 ns 52541902
BM_SharedPtrCtorDtor 10 ns 10 ns 71898849
BM_IntrusivePtrArray 14285 ns 14112 ns 49775
BM_SharedPtrArray 13821 ns 13384 ns 51602
```
Wanted to share the results so someone could probably take a look if interested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30810
Reviewed By: yinghai
Differential Revision: D18828785
Pulled By: bddppq
fbshipit-source-id: 202e9849c9d8a3da17edbe568572a74bb70cb6c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30175
fbjni was open-sourced and the java part is published as 'com.facebook.fbjni:fbjni-java-only:0.0.3';
we are switching to it.
We still need the fbjni submodule inside the repo (which already points to https://github.com/facebookincubator/fbjni) for .so linking.
**Packaging changes**:
Before that, `libfbjni.so` came from the pytorch_android_fbjni dependency; as we also linked fbjni in `pytorch_android/CMakeLists.txt`, it was built in pytorch_android but excluded from publishing. As we had 2 copies of libfbjni.so, there was a hack to exclude it for publishing and resolve the duplication locally:
```
if (rootProject.isPublishing()) {
exclude '**/libfbjni.so'
} else {
pickFirst '**/libfbjni.so'
}
```
After this change fbjni.so will be packaged inside the pytorch_android.aar artifact and we do not need this gradle logic.
I will update the README in a separate PR, after landing the previous README PR (https://github.com/pytorch/pytorch/pull/30128), to avoid conflicts.
Test Plan: Imported from OSS
Differential Revision: D18982235
Pulled By: IvanKobzarev
fbshipit-source-id: 5097df2557858e623fa480625819a24a7e8ad840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29579
Per #28923, this diff moves Future<Message> to torch::utils and extends it to Future<T>; most of the implementation is copied from FutureMessage and ivalue::Future. Merging ivalue::Future with Future<T> will be done separately.
The main difference between Future<T> and FutureMessage is error handling: instead of checking the message type inside the Future to handle errors, this Future<T> owns has_error_ and error_ states.
This future also passes the value_, has_error_ and error_ states to callbacks so that callbacks can easily read the future's state.
In the next diff, a TorchScript rpc async API will be created. Before the API returns, it will create an ivalue::Future and pass it to Future<T>'s callback, where the state of the ivalue::Future will be set. In this way, the TorchScript rpc async API can still return an ivalue::Future and call wait() to get its state appropriately afterwards.
ghstack-source-id: 95479525
Test Plan: unit tests
Differential Revision: D18263023
fbshipit-source-id: 48a65712656a72c2feb0bb3ec8b308c0528986a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31212
To be able to use this function more broadly.
Test Plan: unit tests
Reviewed By: jackm321
Differential Revision: D18978913
fbshipit-source-id: d998dc7c7f9540f491a8a4bc5d6d25d9c3bf8764
Summary:
Update ONNX Flatten to accept negative indices in opset 11.
With this change, some cases of flatten do not rely on the input rank being available.
Fixes : https://github.com/pytorch/pytorch/issues/30512 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30751
Reviewed By: hl475
Differential Revision: D18946904
Pulled By: houseroad
fbshipit-source-id: a6fa30a9182fff92211e505a19325525c6112f19
Summary:
All jobs are currently running with "--dry-run", so you can verify that the jobs are doing the right thing. I'll remove the flag and make it run every hour, same as on Jenkins, once this PR is approved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30996
Differential Revision: D18971001
Pulled By: mingbowan
fbshipit-source-id: 2384bdb50ebdf47aad265395f26be3843f0ce05e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31163
The purpose is to unblock integration with TorchScript. Currently,
an OwnerRRef will be created by either a remote call or a to_here
call, whichever arrives first. However, when making RRef an IValue,
we need to know the type of value held by the RRef, which is
retrieved by checking the return type of the TorchScript function.
The TorchScript function is only available during the remote call
but not in the to_here() call. Hence, an OwnerRRef can only be
created when processing a remote call. This commit implements this
behavior by introducing a condition variable for every OwnerRRef
in the RRefContext, and letting the to_here() call and PyRRef::unpickle
block on the CV until the value is ready.
Test Plan: Imported from OSS
Differential Revision: D18949591
Pulled By: mrshenli
fbshipit-source-id: 17513c6f1fd766885ea8e1cd38f672a403fa4222
Summary:
Remove most of the testing for `weak_script`, since we removed it. Refactor a few of the existing tests to use recursive scripting api.
Fix for https://github.com/pytorch/pytorch/issues/23965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31193
Differential Revision: D18966291
Pulled By: eellison
fbshipit-source-id: 6b1e18c293f55017868a14610d87b69be42bde12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31127
Original commit changeset: d22448b90843
On Skylake T6:
Single Core:
(Note that our benchmark generates batch_size=47 for the first case and batch_size=56 for the second case. In spite of that, the vectorized version is still faster than the original reference C version without vectorization.)
- Before the PR:
```
native_layer_norm 0.81% 5.884ms 0.81% 5.884ms 122.580us NaN 0.000us 0.000us 48 [[47, 1, 1024], [1024], [1024]]
```
- After the PR:
```
native_layer_norm 0.68% 5.053ms 0.68% 5.053ms 105.272us NaN 0.000us 0.000us 48 [[56, 1, 1024], [1024], [1024]]
```
20 Cores:
- Before the PR:
```
native_layer_norm 1.65% 41.682ms 1.65% 41.682ms 868.365us NaN 0.000us 0.000us 48 [[61, 64, 1024], [1024], [1024]]
```
- After the PR:
```
native_layer_norm 1.34% 33.829ms 1.34% 33.829ms 704.771us NaN 0.000us 0.000us 48 [[61, 64, 1024], [1024], [1024]]
```
ghstack-source-id: 95420889
Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"
python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval
Differential Revision: D18936428
fbshipit-source-id: 8cae33d35fb338b5ac49b1597c2709152612d6e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31088
Original issue:
https://github.com/pytorch/pytorch/issues/31027
The problem is that for stacked PRs, CircleCI does not set the environment variable `CIRCLE_PULL_REQUEST` for non-leaf PRs; this variable is used to filter out some jobs that should run only on `master`.
(The Android job for master includes all 4 ABIs (x86, x86_64, armeabi-v7a, arm64-v8a) and the gradle build tries to get results from all 4 ABIs; for PRs we run only the x86 build to save resources. That's why the unfiltered master android job fails, as ABIs other than x86 were not scheduled.)
The env variable `CIRCLE_BRANCH` is set fine and can be used as a workaround to distinguish that this is a PR (published with ghstack).
Test Plan: Imported from OSS
Differential Revision: D18966385
Pulled By: IvanKobzarev
fbshipit-source-id: 644c5ef07fcf2d718b72695da2cc303da8b94ef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31117
After this diff, we will have completely removed the named tensor
feature flagging. This means that named tensors are always on and that
there is no mechanism to turn them off. There should be no more follow-up
diffs.
I performed the deletion of the header with
```
find . -type f -print0 | xargs -0 sed -i '/#include <ATen\/core\/EnableNamedTensor.h>/d'
```
Test Plan: - wait for CI
Differential Revision: D18934952
Pulled By: zou3519
fbshipit-source-id: 253d059074b910fef15bdf885ebf71e0edf5bea5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116
Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR
Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.
Test Plan: - run CI
Differential Revision: D18934951
Pulled By: zou3519
fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/30356 and https://github.com/pytorch/pytorch/pull/31014 :'(
The last commit contains the fix. There was an internal fbcode compile error on the previous `impl_default->second.equal(default_val.second))` line. I tried various fixes in C++ internally but couldn't figure anything out. This is a good example of the programming costs of going from python -> c++ for different types of objects, because the conceptual overhead has expanded in scope from (python) -> (python, c++, pybind).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31123
Differential Revision: D18936128
Pulled By: eellison
fbshipit-source-id: 7d8fd66a6dd4a3e9838f3a0b68c219b6565a9462
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30909
`fold_prepack` doesn't work anymore after we change `scale`, `zero_point`
to be attributes, but since the freeze API is coming up, I don't want to
spend time to make this work since this will be thrown away later.
Test Plan:
.
Imported from OSS
Differential Revision: D18864537
fbshipit-source-id: 649e6b91f2b04b8babacc0afb6bc1530ed7259d3
Summary:
**Patch Description**
Round out the rest of the optimizer types in torch.optim by creating stubs for them.
**Testing**:
I ran mypy looking for just errors in that optim folder. There's no *new* mypy errors created.
```
$ mypy torch/optim | grep optim
$ git checkout master; mypy torch/optim | wc -l
968
$ git checkout typeoptims; mypy torch/optim | wc -l
968
```
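A small sketch (mine, not from the PR) of the kind of constructor calls the new stubs let mypy check:
```
import torch
from torch.optim import AdamW, RMSprop

params = [torch.nn.Parameter(torch.randn(2, 2))]

opt1 = AdamW(params, lr=1e-3, weight_decay=0.01)   # argument names/types now visible to mypy
opt2 = RMSprop(params, lr=1e-2, momentum=0.9)
```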
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31130
Reviewed By: stephenroller
Differential Revision: D18947145
Pulled By: vincentqb
fbshipit-source-id: 5b8582223833b1d9123d829acc1ed8243df87561
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30355
- Make processTimedOutFutures hold lock.
- Reduce unnecessary scan on future and future timeout maps.
- Reduce the scope of lock at a spot.
- Avoid repeatedly waking up if the user sets timeout = 0.
ghstack-source-id: 95409528
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_rpc_timeouts
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rpc_timeouts
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_rpc_timeouts
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_rpc_timeouts
```
Differential Revision: D5516149
fbshipit-source-id: 4bb0bd59fa31d9bfaef9f07ac0126782da17f762
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31164
We have a small number of internal projects that still are on Python 2.
Until we can figure out how to get rid of them, we need to continue
supporting Python 2 for PyTorch.
Test Plan: Imported from OSS
Differential Revision: D18949698
Pulled By: suo
fbshipit-source-id: 4a9d7e4306ed81576e05f243de472937a2bb1176
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31151
Same as the title. I am not sure why this was not added in the first place.
Test Plan: wait for build to succeed.
Reviewed By: bddppq, xw285cornell
Differential Revision: D18880216
fbshipit-source-id: 8b17d4fbd5dd08c28c52df8b1da77b69d56d65dc
Summary:
Currently, both `Conv{1,2,3}dOptions` and `ConvTranspose{1,2,3}dOptions` are aliases of the `ConvOptions<{1,2,3}>` class, which causes confusion because the `ConvOptions` class has parameters such as `transposed` that shouldn't be exposed to the end user. (This has caused issues such as https://github.com/pytorch/pytorch/issues/30931.) This PR makes the following improvements:
1. Rename the original `torch::nn::ConvOptions<N>` class to `torch::nn::detail::ConvNdOptions<N>` class, to signify that it's an implementation detail and should not be used publicly.
2. Create new classes `torch::nn::ConvOptions<N>` and `torch::nn::ConvTransposeOptions<N>`, which have parameters that exactly match the constructor of `torch.nn.Conv{1,2,3}d` and `torch.nn.ConvTranspose{1,2,3}d` in Python API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31005
Differential Revision: D18898048
Pulled By: yf225
fbshipit-source-id: 7663d646304c8cb004ca7f4aa4e70d3612c7bc75
Summary:
Fix for https://github.com/pytorch/pytorch/issues/30015
We had a model that failed in shape propagation because we could not unify `Tensor` and `Optional[BoolTensor]`. Tensor not subtyping Optional[BoolTensor] was correct, but we should have unified those two types to `Optional[Tensor]`.
The fix here is that for immutable type containers (Optional, Tuple), we should first attempt to unify with complete shape information, and if that fails, then try to unify those types with unshaped types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31076
Differential Revision: D18921802
Pulled By: eellison
fbshipit-source-id: aa6890277470c60b349ed1da4d81cc5d71d377f6
Summary:
Adding support for the new ATen op floor_divide which was introduced in https://github.com/pytorch/pytorch/pull/30493/files.
This operation is used in Torchvision/FasterRCNN-MaskRCNN, which are now failing after the new op was introduced.
This PR fixes the failure.
cc: neginraoof
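A minimal export sketch (my own, not from the PR) of the kind of graph this enables:
```
import io
import torch

class FloorDiv(torch.nn.Module):
    def forward(self, x, y):
        return torch.floor_divide(x, y)

buf = io.BytesIO()
torch.onnx.export(FloorDiv(), (torch.arange(6.0), torch.tensor(2.0)), buf, opset_version=11)
print(len(buf.getvalue()) > 0)   # a serialized ONNX graph containing the new op
```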
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31081
Reviewed By: houseroad
Differential Revision: D18945316
Pulled By: eellison
fbshipit-source-id: 09919c237d618ce7db293c7770f48f7304949dcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31086
This change leverages the new future response framework so that server
threads don't block until setValue is called. Particularly, we add a
getFuture() method to OwnerRRef so that we get a future that is satisfied
once setValue is called.
ghstack-source-id: 95402273
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18925272
fbshipit-source-id: 2caf51019e5b5fd7ec45539544780067deb28610
Summary:
Previously list elements were only unified for tensor lists.
This improves error messages and expands the unification logic
to include all types.
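A minimal sketch (not from the PR) of a list whose elements only unify under the broader rule:
```
import torch

@torch.jit.script
def pack(x: torch.Tensor):
    # the element types Tensor and None unify, so this literal is typed
    # List[Optional[Tensor]] rather than failing with a unification error
    return [x, None]
```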
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30777
Pulled By: driazati
Differential Revision: D18837726
fbshipit-source-id: c4d275562a8429700987569426d694faa8f6002e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31137
Our Test CI is broken because:
- hypothesis recently did a new release that reorganized their internal
modules
- we were importing something from their internal module structure.
This PR fixes the CI by doing the following:
- import SearchStrategy from the correct (public) location
- Pin the hypothesis version to avoid future surprises.
In the long term, we should stop installing hypothesis every time the CI
runs and instead install it as a part of our docker build process. See
https://github.com/pytorch/pytorch/issues/31136 for details.
Test Plan:
- I tested this locally; before this PR test/test_nn.py fails to run but
after it does run.
- Wait for CI
Differential Revision: D18940817
Pulled By: zou3519
fbshipit-source-id: c1ef78faa5a33ddf4d923f947c03cf075a590bb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31069
Just to clarify that they are still experimental.
Test Plan: Imported from OSS
Differential Revision: D18920496
Pulled By: suo
fbshipit-source-id: d2f3014592a01a21f7fc60a4ce46dd0bfe5e19e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30994
The flakiness we saw was due to missing barriers(), which caused
state to leak into previous or subsequent checks. This commit
attempts to fix the problem by adding barriers before and after each
check.
Test Plan: Imported from OSS
Differential Revision: D18893457
Pulled By: mrshenli
fbshipit-source-id: 42bcc12efa7e6e43e2841ef23e4bc2543b0236c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19705
Optimizing for the case where a run of consecutive dims that are not broadcast is followed by a run of consecutive dims that are broadcast.
For example, MulGradient(["dC", "A", "B"], ["dA", "dB"], broadcast=True, axis=0) where A.shape == dC.shape == [9508, 80] and B.shape == [80] .
Test Plan:
In SKL T6,
Running mul_gradient_benchmark without this optimization
Operator #0 (dA, MulGradient) 11.9119 ms/iter
After this optimization,
Operator #0 (dA, MulGradient) 0.672759 ms/iter
Need to land D15291800 first to fix the unit test error
Reviewed By: dmudiger
Differential Revision: D15075415
fbshipit-source-id: 0f97be17cf8f1dacbafa34cd637fb8bc1c5e5387
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30979
This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.
Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.
Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.
--------------
In this PR:
Add tracing support for optional Device and Layout types.
--------------
Test Plan: Imported from OSS
Differential Revision: D18912685
Pulled By: izdeby
fbshipit-source-id: 4a9514ce2eee0041f9bc96636d3ddb4f077675e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30980
This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.
Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.
Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.
--------------
In this PR:
Add a test to check that C++ API behavior stays the same after all the changes.
While working on it a bug related to `requires_grad` was found and logged in the master task.
--------------
Test Plan: Imported from OSS
Differential Revision: D18912681
Pulled By: izdeby
fbshipit-source-id: 19772a37c92dde820839b79055f348689b99fa77
Summary:
This makes `nn.Transformer` usable from TorchScript. It preserves backwards compatibility via `__setstate__` on the encoder/decoder.
Fixes https://github.com/pytorch/pytorch/issues/24173
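A small smoke test, with arbitrary sizes, of what this enables:
```
import torch
import torch.nn as nn

# scripting nn.Transformer should now succeed
model = torch.jit.script(nn.Transformer(d_model=16, nhead=2,
                                         num_encoder_layers=1,
                                         num_decoder_layers=1))
out = model(torch.randn(10, 2, 16), torch.randn(7, 2, 16))
print(out.shape)  # torch.Size([7, 2, 16])
```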
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28561
Differential Revision: D18124753
Pulled By: driazati
fbshipit-source-id: 7314843e5aa9c9bf974c4672e4edb24ed8ef4a6f
Summary:
VitalyFedyunin, This PR is about port ELU activation to Aten:
**Test script:**
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
if torch.cuda.is_available():
torch.cuda.synchronize()
return time.time()
device = "cpu"
m = nn.ELU()
if torch.cuda.is_available():
device = "cuda"
m = m.cuda()
#warm up
for n in [100, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n, device=device)
for i in range(1000):
output = m(input)
output.backward(grad_output)
for n in [100, 10000]:
fwd_t = 0
bwd_t = 0
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n, device=device)
for i in range(10000):
t1 = _time()
output = m(input)
t2 = _time()
output.backward(grad_output)
t3 = _time()
fwd_t = fwd_t + (t2 -t1)
bwd_t = bwd_t + (t3 - t2)
fwd_avg = fwd_t / 10000 * 1000
bwd_avg = bwd_t / 10000 * 1000
print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
% (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.28 (ms); backwad avg time is 0.18 (ms).
input size(128, 10000) forward time is 23.53 (ms); backwad avg time is 14.46 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.16 (ms); backwad avg time is 0.08 (ms).
input size(128, 10000) forward time is 15.53 (ms); backwad avg time is 6.60 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.24 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.73 (ms); backwad avg time is 1.11 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.15 (ms); backwad avg time is 0.07 (ms).
input size(128, 10000) forward time is 14.40 (ms); backwad avg time is 6.00 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29275
Differential Revision: D18587389
Pulled By: VitalyFedyunin
fbshipit-source-id: bea8f3f006c6893090f863d047c01886d195437a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31047
Changelist:
- remove BUILD_NAMEDTENSOR from .cu files
- remove BUILD_NAMEDTENSOR special handling in function_wrapper.py
- remove BUILD_NAMEDTENSOR from cpp_extension.py. This code actually
did nothing because we always compile with BUILD_NAMEDTENSOR.
Test Plan: - run tests
Differential Revision: D18908442
Pulled By: zou3519
fbshipit-source-id: b239e24de58580adaf3cef573350773a38b1e4f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29104
We would like to provide the vectorized implementation for layer norm. This PR reuses https://github.com/pytorch/pytorch/pull/23349.
Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"
python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval
Differential Revision: D18293522
fbshipit-source-id: f4cfed6e62bac1b43ee00c32b495ecc836bd9ec5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31100
This appears to not work right now. Disabling pending an investigation.
Test Plan: Imported from OSS
Differential Revision: D18928777
Pulled By: suo
fbshipit-source-id: 63089131bad98902979e5cf4373732c85badef9d
Summary:
Exported weight_norm was incorrectly reducing over axis 0 as well when dim is set to 0.
The previous test case only covered a weight with size(0) == 1, which yields the same result whether reduced over or not.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31015
Reviewed By: hl475
Differential Revision: D18900894
Pulled By: houseroad
fbshipit-source-id: 19004f51933b37f848dbe4138e617a7a8e35a9ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30912
Add a new data type ZERO_COLLISION_HASH .
Test Plan: ci
Reviewed By: boryiingsu
Differential Revision: D18843626
fbshipit-source-id: b2d8280f13c78b4a656cf95822198df59de7b64c
Summary:
Peephole optimize out type refinements when they are no longer refining the type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31024
Differential Revision: D18920958
Pulled By: eellison
fbshipit-source-id: 6d05d9812b9f9dcf001de760a78a2042fb832773
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31068
Let's get it out of the early parts now that the recursive API has been
around for a while
Test Plan: Imported from OSS
Differential Revision: D18920498
Pulled By: suo
fbshipit-source-id: 6f4389739dd9e7e5f3014811b452249cc21d88e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30637
RequestCallback api currently forces work to be always synchronous, which,
as we scale, means we're going to need to throw a large number of (mostly
blocked) threads at the rpc problem. For some activities like dependent
autograd rpcs, there's not a necessary reason to block in these threads.
In this change, the RequestCallback api is updated to return a
shared_ptr<FutureMessage> rather than a Message:
std::shared_ptr<FutureMessage> operator()(Message& request) const;
With a futures-style api, RPC ops that wish to be async can then be async,
while short-lived blocking functions (or Python UDFs) can just block.
In this change, we keep all of the current ops as synchronous (i.e. we block
and then return a completed FutureMessage). We also update the rpc_agents in
a manner compatible with this sort of parallelism.
Here, we only want to incur overhead when we use the async behavior.
Some modest extra cost seems unavoidable here (e.g. the allocation for the
std::make_shared<>), but we can trivially detect the synchronous/completed
case in the rpc_agent and avoid the extra thread-switches/etc. in that case.
ghstack-source-id: 95287026
Test Plan:
- Basic: buck test mode/dev-nosan caffe2/test/...
- Additional testcase in ThriftRpcAgentTest for deferred work.
Differential Revision: D18774322
fbshipit-source-id: cf49922a71707cfb1726de16f93af23b160385d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30887
Support to convert quantized concat from pytorch to caffe2
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_cat
Imported from OSS
Differential Revision: D18855676
fbshipit-source-id: 5d0cf3f03c61819e168b080afa368b1255d0419c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30683
Assume that a node can work with autograd only if it is not a fusion
group and in prim or aten namespaces.
Test Plan: CI
Reviewed By: lly-zero-one
Differential Revision: D18795171
Pulled By: ilia-cher
fbshipit-source-id: 301090557e330b58be70e956784f7f0dc343c684
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29357
As title
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D18920562
Pulled By: suo
fbshipit-source-id: b5dd559cfb0ba6c64b9ccf3655417afb56a7b472
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29353
First step to killing Python 2 everywhere. I don't really know that much
about the caffe2 circle jobs so I left them alone for now.
Test Plan: Imported from OSS
Differential Revision: D18920563
Pulled By: suo
fbshipit-source-id: b37d8427a6ecd4b8a7e16c1ff948e0ce13b5798f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31057
The current signature basically will always fail to type check, because
mypy enforces that the subclass method's input types must be "wider"
than their superclass method's input types (i.e. they can vary
contravariantly). And nothing is wider than `Any`.
This change makes it so that any input params are allowed in
`forward()`. Fixes #29099
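For illustration (the module below is made up), this is the kind of subclass signature that previously tripped mypy and is now accepted:
```
import torch

class Scale(torch.nn.Module):
    # narrowing forward() is fine now that the base stub accepts any inputs;
    # previously mypy reported a Liskov substitution violation here
    def forward(self, x: torch.Tensor, factor: float = 2.0) -> torch.Tensor:
        return x * factor
```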
Test Plan: Imported from OSS
Differential Revision: D18918034
Pulled By: suo
fbshipit-source-id: 9940e9f769b55d580d9d7f23abf6f88edb92627f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31030
DistAutogradContext held a shared_ptr reference to RecvRpcBackward and
RecvRpcBackward held a shared_ptr reference to the context. This circular
dependency caused significant memory leaks. As a result, I'm changing the
reference in RecvRpcBackward to be a weak_ptr.
Test Plan: waitforbuildbot
Differential Revision: D18896389
fbshipit-source-id: e5bc588b6f998885854e3a67de1e82452e8475ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30874
These have all been disabled at this point, so there is no difference in the generated code.
Test Plan: Imported from OSS
Differential Revision: D18855990
Pulled By: gchanan
fbshipit-source-id: 03796b2978e23ef9060063f33241a1cbb39f1cf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30926
Calling the JITed FBGEMM kernel for Fused 8 Bit Sparse Length Sum (Fused8BitRowwiseEmbeddingLookup)
Test Plan:
buck test mode/dbg //caffe2/caffe2/python:lengths_reducer_fused_8bit_rowwise_ops_test
All tests pass.
Reviewed By: jspark1105
Differential Revision: D18058128
fbshipit-source-id: 0dfa936eb503712c39e53748e015fc156afde86f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29766
Add FbgemmPackTranspose op to support the packing on FCTransposed weights
Add FCTransposed to FbFCPacked transformation to Dper fp16 exporter
Test Plan:
```
buck test mode/opt caffe2/caffe2/fb/fbgemm:fb_fc_packed_op_test
```
```
buck test mode/opt caffe2/caffe2/python:layers_test
```
Differential Revision: D18482306
fbshipit-source-id: e8f1947b3d0d04892293509ebf88742f5f0f5997
Summary:
After several discussions, we agreed not to put any extra safety check for recordStream, as either the check would cause failures in certain scenarios or there is no need to throw for user errors.
As a summary, it simply does what is described in https://github.com/pytorch/pytorch/issues/27405: check whether a tensor was indeed allocated by a CUDACachingAllocator instance, and if it was, throw an internal error if a block cannot be retrieved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30870
Differential Revision: D18851669
Pulled By: yxia11
fbshipit-source-id: c2f01798cd24f1fd0f35db8764057d5d333dab95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30894
This PR begins the process of removing BUILD_NAMEDTENSOR macros. There
will be followups.
Reasons for removing the macros:
- BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0.
- Since we don't test building without it, it is useless to keep around.
- Code becomes nicer to read without the macros
Reasons for not removing the macros:
- potential for feature flagging
Now, I argue against needing to feature flag. The main reason why we
might want to feature flag is if we need to disable the feature.
We'd need a fast switch to disable the feature if someone discovers
in the future that named tensors caused some regression in some existing workflows.
In https://github.com/pytorch/pytorch/pull/25798, I did a variety of
macro- and micro- benchmarks to determine the performance impact of named
tensors on regular tensors.
[The
microbenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-529014810)
were not very stable, and running the
microbenchmarks for more iterations doesn't actually help because the
noise is not distributed in a nice way. Instead of microbenchmarks I ran
a [profiler
(perf)](https://github.com/pytorch/pytorch/pull/25798#issuecomment-555707645)
to estimate how much overhead named tensors add to unnamed code. I
estimated the overhead to be less than 100ns for `add` and even smaller
for `mm`; there are ways to optimize even futher if we find this to be a
problem.
[Initial
macrobenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-530539104)
were also not very stable. I ran imagenet for some number of epochs. To
make them more stable, I got rid of the data loading (which seemed to
vary between runs). [In some benchmarkers without data
loading](https://github.com/pytorch/pytorch/pull/25798#issuecomment-562214053),
we can see that the results are less noisy now. These results support
no noticeable regressions in speed.
Test Plan: - wait for CI
Differential Revision: D18858543
Pulled By: zou3519
fbshipit-source-id: 08bf3853a9f506c6b084808dc9ddd1e835f48c13
Summary:
Adds `torch.floor_divide` following the numpy's `floor_divide` api. I only implemented the out-of-place version, I can add the inplace version if requested.
Also fixes https://github.com/pytorch/pytorch/issues/27512
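A quick illustration of the out-of-place op (values chosen so the result is the same under floor and truncation semantics):
```
import torch

a = torch.tensor([7, 9, 10])
# out-of-place floor division, mirroring numpy.floor_divide
print(torch.floor_divide(a, 3))   # tensor([2, 3, 3])
```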
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30493
Differential Revision: D18896211
Pulled By: eellison
fbshipit-source-id: ee401c96ab23a62fc114ed3bb9791b8ec150ecbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30802
Change shape_hints from map<string, TensorShape> to ShapeInfoMap to catch dimType info from model file.
Reviewed By: ipiszy
Differential Revision: D18821486
fbshipit-source-id: c5d9ed72e158d3698aba38900aeda00f776745b4
Summary:
Updates to the export API:
When calling this API, a dict containing the custom opsets (domain and version) used to export the model can be provided.
We allow registering one custom opset (domain, version) per ONNX opset. So, when exporting an operator from a custom domain, users need to pass this pair. Default custom opset version is 1.
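A hypothetical usage sketch, assuming the new keyword argument is named `custom_opsets` and using a made-up domain name:
```
import io
import torch

class Plus(torch.nn.Module):
    def forward(self, x):
        return x + 1

# map a custom domain to the opset version it should be exported against
torch.onnx.export(Plus(), torch.randn(2, 3), io.BytesIO(),
                  custom_opsets={"com.example.mydomain": 2})
```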
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29752
Reviewed By: hl475
Differential Revision: D18703662
Pulled By: houseroad
fbshipit-source-id: 84d22557d132b526169051193d730761798fce60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30356
This finishes up the `torch.jit.overload` API for free functions.
- defaults now required on the implementation function itself
- fully follows [overload spec](https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading) such that the following is supported
```
@overload
def mouse_event(x1: int, y1: int) -> ClickEvent: ...
def mouse_event(x1: int,
                y1: int,
                x2: Optional[int] = None,
                y2: Optional[int] = None): ...
```
Note: `jit.overload` isn't supported yet for UDTs, but is supported for modules. This PR doesn't make the same changes for modules; if reviewers think I should include them I could do so in a follow-up PR or wait to land this. Since that's still an internal API I think it's fine, and the changes here would allow us to expose `torch.jit.overload` on free functions.
Test Plan: Imported from OSS
Differential Revision: D18864774
Pulled By: eellison
fbshipit-source-id: 6c566738bd6f0551a000a9ea8d56e403636b7856
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30749
Add a check that schemas are sane.
I removed the defaults from symbolic_script because they were in some cases wrong and don't actually do anything. At the point they're invoked the forward should already have matched all arguments.
Test Plan: Imported from OSS
Differential Revision: D18864775
Pulled By: eellison
fbshipit-source-id: 273d7e96d65b8a3d3de72e2d7bfcdf2417046c6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30877
Previously, when the environment tried to reassign variables which had been assigned to "inf" or "nan" it would fail because they are not simple values. Constant prop exposed this; a test was failing internally because of it.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D18861016
Pulled By: eellison
fbshipit-source-id: b9b72978a26a0b00b13bf8ea7685825551f5a541
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30544
Run Constant Propagation upon compilation only on ops with non-aliasing inputs and outputs. This speeds up the first run of `torchvision.models.resnet18` by over 50% and speeds up compilation by about 25% (although the effects didn't seem additive with https://github.com/pytorch/pytorch/pull/30503, so I'm going to land this PR first and then see if caching still has a sizable impact).
Running constant prop only with non-aliasing types does a lot of graph cleanup by removing constant ifs and a bunch of other smaller ops. It also avoids all the jitter problems we had when we tried running full constant prop previously. Because it is idempotent it doesn't jitter, and it doesn't jitter graphs constructed from tracing because tracing doesn't emit any ops that only involve non-aliasing inputs.
Full constant prop isn't idempotent because which ops are run depends on the state of mutation in the alias db, which will often change upon successive iterations of constant propagation, and because it affects graphs constructed from tracing.
Edit: if we were okay with running constant propagation on graphs constructed from tracing (potentially making them hard to debug), an alternative would be to run constant propagation until the graph reaches a fixed point.
Test Plan: Imported from OSS
Differential Revision: D18833607
Pulled By: eellison
fbshipit-source-id: 92a0adb4882d67ed5a0db5c279f5e122aeeba54a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30543
`shouldAnnotate` doesn't make a ton of sense as a public API
Test Plan: Imported from OSS
Differential Revision: D18833608
Pulled By: eellison
fbshipit-source-id: 460ee05d0fa91b1edc640c037be2a6ee8eaf50a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30853
Right now we print a one-element tuple as `(val)`, which will
be interpreted as `val` in parsing. This PR changes it
to `(val,)` so we can recognize the one-element tuple in parsing.
Test Plan:
.
Imported from OSS
Differential Revision: D18846849
fbshipit-source-id: 42959b9190c2567ef021a861497077c550324b7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30859
We can use a dictionary of quantization parameters to simplify the code
handling these things a bit
Test Plan:
.
Imported from OSS
Differential Revision: D18849023
fbshipit-source-id: 09e9860b2656a1affa8776016e16794529bcee3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30927
Classes that are used virtually (e.g. have virtual methods) must have a virtual destructor or bad things happen
ghstack-source-id: 95144736
Test Plan: waitforsandcastle
Differential Revision: D18870351
fbshipit-source-id: 333af4e95469fdd9103aa9ef17b40cbc4a343f82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30519
Re-enable them and write a few additional ones
ghstack-source-id: 95143051
Test Plan: unit tests
Differential Revision: D18729561
fbshipit-source-id: 8cefd8320913d72a450a3324bfd7c88faed072d7
Summary:
VitalyFedyunin, This PR is about port Softshrink activation to Aten:
**Test script:**
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
if torch.cuda.is_available():
torch.cuda.synchronize()
return time.time()
device = "cpu"
m = nn.Softshrink()
if torch.cuda.is_available():
device = "cuda"
m = m.cuda()
#warm up
for n in [100, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n, device=device)
for i in range(1000):
output = m(input)
output.backward(grad_output)
for n in [100, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n, device=device)
fwd_t = 0
bwd_t = 0
for i in range(10000):
t1 = _time()
output = m(input)
t2 = _time()
output.backward(grad_output)
t3 = _time()
fwd_t = fwd_t + (t2 -t1)
bwd_t = bwd_t + (t3 - t2)
fwd_avg = fwd_t / 10000 * 1000
bwd_avg = bwd_t / 10000 * 1000
print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
% (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.12 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.18 (ms).
CPU:
input size(128, 100) forward time is 0.19 (ms); backwad avg time is 0.23 (ms).
input size(128, 10000) forward time is 17.23 (ms); backwad avg time is 16.83 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
input size(128, 100) forward time is 0.08 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.32 (ms); backwad avg time is 0.08 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.08 (ms); backwad avg time is 0.10 (ms).
input size(128, 10000) forward time is 7.58 (ms); backwad avg time is 7.91 (ms).
After:
input size(128, 100) forward time is 0.08 (ms); backwad avg time is 0.02 (ms).
input size(128, 10000) forward time is 7.30 (ms); backwad avg time is 1.02 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30229
Differential Revision: D18810054
Pulled By: VitalyFedyunin
fbshipit-source-id: e19074824396570db45ba488ae4f9fe1b07a5839
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30914
When tensors don't require grad, we don't call `addSendRpcBackward`, where we record known workerIDs to clean up the dist autograd context later. But since https://github.com/pytorch/pytorch/pull/29781, we always include the autograd context ID in RPCs, even if tensors do not require grad. So, it could be possible that we don't release the contexts on some nodes.
This can contribute to OOMs since the contexts will not be cleaned up in this case, which can be checked by running the unit test without this patch. We can fix this issue by moving the `addKnownWorkerIds` call to the `getMessageWithAutograd` function.
ghstack-source-id: 95178561
Test Plan: Added a unit test: `test_context_cleanup_tensor_no_grad`
Differential Revision: D18869191
fbshipit-source-id: b80f66bfd0dd7d01960abe1691d3f44095bb1b2b
Summary:
This simplifies the generated code a bit, saving about 40K off of libtorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30466
Differential Revision: D18836215
Pulled By: resistor
fbshipit-source-id: ad75c9e04783bb29cc06afd2022f73f9625dd52b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30715
Changed the caffe2/caffe2/TARGETS file to define USE_FBGEMM for x86 when USE_SSE_ONLY is not defined.
Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- Float16
Reviewed By: jianyuh
Differential Revision: D18806067
fbshipit-source-id: 1b44b90a9f6dc3c27f81a46038c0f7542ed2bab3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30642
Adding a couple of basic metrics for distributed autograd which would
help in determining stuckness.
ghstack-source-id: 95156189
Test Plan: waitforbuildbot
Differential Revision: D18776478
fbshipit-source-id: a0556ad6fe2b7c3cd0082ee2350c1c78cafaaec5
Summary:
- [x] Add more comments and refactor the logic of `ReshapeToAdvancedIndexingFormat`
- [x] Add more description here. Cases that are/aren't supported, and how they are supported.
- [x] Need to merge this PR https://github.com/pytorch/pytorch/issues/27186 to enable testing inplace operators.
We are now supporting exporting aten::copy_ and aten::index_put to ONNX.
Here's a breakdown of the different cases in PyTorch code.
```
# Case 1: Scalar Indices
x[0, 1, 2] = data
# Case 2: Slice Indices
x[1:3, :, ::2] = data
# Case 3: Ellipsis Indices
x[..., 0] = data
# Case 4: Tensor Indices
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data
# Case 5: Mixing all the above cases
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[1:3, ind1, ind2, ..., 3] = data
```
Limitations:
Tensor indices must be consecutive, and 1-d tensors.
```
# Supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data
# Not supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
ind3 = torch.tensor([[0], [1]])
x[ind1, :, ind2] = data
x[ind3] = data
```
Negative indices are not supported.
```
# Not supported
x[-1] = data
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26941
Differential Revision: D17951030
Pulled By: houseroad
fbshipit-source-id: 4357777072f53aa0bc4b297aa1ee53457a7f8dec
Summary:
```python
import torch
from torch.autograd.profiler import profile, record_function

@record_function('my_func')
def f(x, y):
    return x + y

with profile() as prof:
    f(1, 2)
print(prof.key_averages().table())
```
```
------------------------------------ --------------- --------------- --------------- --------------- --------------- ---------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls
------------------------------------ --------------- --------------- --------------- --------------- --------------- ---------------
my_func 85.42% 86.796us 87.27% 88.670us 88.670us 1
------------------------------------ --------------- --------------- --------------- --------------- --------------- ---------------
Self CPU time total: 101.606us
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30861
Differential Revision: D18857993
Pulled By: bddppq
fbshipit-source-id: eb6b8e2a8d4f3a7f8e5b4cb3da1ee3320acb1ae7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30904
When we sent tensors over RPC, on the server side we would call
addRecvRpcBackward which would call `set_history` on all tensors. This was
incorrect and set the `requires_grad` flag on tensors that didn't actually need
grad.
To fix this, we only attach autograd edges to tensors that need grads.
ghstack-source-id: 95113672
ghstack-source-id: 95113999
Test Plan: waitforbuildbot
Differential Revision: D18828561
fbshipit-source-id: d8942b76e9e4c567f8f1821f125c00d275ea0f90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892
Fixes all outstanding lints and actually installs a properly configured
flake8
Test Plan: Imported from OSS
Differential Revision: D18862825
Pulled By: suo
fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30906
Add mobile module observer to measure performance of each method run.
ghstack-source-id: 95120194
Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to new feed and scroll down until find an ads which will direct you to offsite webpage;
4. Click on the ads, wait for the offsite ads loads;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/4fghwp0b and see all the operator runs have been logged:
{F223456981}
Reviewed By: ljk53
Differential Revision: D18702116
fbshipit-source-id: a9f07eee684e3022cef5ba3c5934f30f20192a85
Summary:
Copy-paste comment from code for reasoning:
```
# NOTE [ IterableDataset and __len__ ]
#
# For `IterableDataset`, `__len__` could be inaccurate when one naively
# does multi-processing data loading, since the samples will be duplicated.
# However, no real use case should be actually using that behavior, so
# it should count as a user error. We should generally trust user
# code to do the proper thing (e.g., configure each replica differently
# in `__iter__`), and give us the correct `__len__` if they choose to
# implement it (this will still throw if the dataset does not implement
# a `__len__`).
#
# To provide a further warning, we track if `__len__` was called on the
# `DataLoader`, save the returned value in `self._len_called`, and warn
# if the iterator ends up yielding more than this number of samples.
```
Fixes https://github.com/pytorch/pytorch/issues/30184
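A minimal sketch (dataset and sizes made up) of the scenario the warning targets: naive multi-process loading duplicates samples, so iteration yields more than `__len__` claims:
```
from torch.utils.data import DataLoader, IterableDataset

class Stream(IterableDataset):
    def __iter__(self):
        return iter(range(10))

    def __len__(self):
        return 10

if __name__ == "__main__":
    loader = DataLoader(Stream(), num_workers=2)
    print(len(loader))              # 10; the value saved in _len_called
    print(sum(1 for _ in loader))   # 20: each worker replays the full stream,
                                    # exceeding __len__ and triggering the warning
```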
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23587
Differential Revision: D18852625
Pulled By: ailzhang
fbshipit-source-id: aea8d4d70c7f21aaa69b35908a6f43026493d826
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30908
Same as title.
Test Plan: Wait for CI to clear.
Reviewed By: bddppq, xw285cornell
Differential Revision: D18862837
fbshipit-source-id: bc34356b85774fc20ba46d321c8a2bb5d5c727f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30890
We've received way too many complaints about this functionality making tests flaky, and it's not providing value to us anyway. Let's cut the shit and kill deadline testing
Test Plan: Imported from OSS
Differential Revision: D18857597
Pulled By: jamesr66a
fbshipit-source-id: 67e3412795ef2fb7b7ee896169651084e434d2f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30858
This is not needed since we have `values_to_qparams_`
Test Plan:
.
Imported from OSS
Differential Revision: D18848992
fbshipit-source-id: dc81f59967a93abdd5562f1010f02de4f4e60db0
Summary: Add mobile operator observer to measure performance of each operator run, the result will also log into QPL event: [MOBILE_OPERATOR_STATS ](https://fburl.com/quicklog/8773a00a).
Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to new feed and scroll down until find an ads which will direct you to offsite webpage;
4. Click on the ads, wait for the offsite ads loads;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/er7t4g9u and see all the operator runs have been logged:
{F223250762}
Reviewed By: ljk53
Differential Revision: D18131224
fbshipit-source-id: 23e2f6e2a9851c04b29511b45dc53f3cce03e8a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30709
intrusive_ptr doesn't provide an explicit incref method. When users want to
incref the target, they create an intrusive_ptr to wrap the target, then make
a copy which does the actual incref, then release both the first intrusive_ptr
and the copy to prevent a decref at destruction time. This is very
inefficient. Instead, do the incref/decref directly.
Differential Revision: D18798505
fbshipit-source-id: 524d4f30d07d733df09d54423b044d80e4651454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30649
Operators in VariableTypeManual are now no longer registered against the VariableTypeId key, but they are registered as compound ops. See https://github.com/pytorch/pytorch/issues/30102 for background.
This also requires the non-variable codegen to ignore them and requires removal of VariableMethodStubs.cpp.
So, because function_wrapper.py now also needs to know which ops are manual, instead of having a hard-coded list in gen_variable_type.cpp for ops with manual implementation, we now have a `manual_kernel_registration` flag in native_functions.yaml that disables the registration of operator kernels for this operator (the schema is still registered). Then, we manually register the right kernels for the operator.
ghstack-source-id: 95082204
Test Plan: unit tests
Differential Revision: D18778191
fbshipit-source-id: 0af6f9e43ff4fb9800ce19b286dfccd0fd22cc41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30552
For upcoming changes to support quantizing shared class type
Test Plan:
.
Imported from OSS
Differential Revision: D18818653
fbshipit-source-id: 393a55db69b20a1c00ffa0157ab568cb097915b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30826
Previously the scalar_check for the reduction None case was:
input.dim() <= 1, but it should be target based, i.e.:
target.dim() == 0. This follows from the "correct cases", i.e.
(N, C) X (N,) -> (N,)
(C,) X () -> ()
Test Plan: Imported from OSS
Differential Revision: D18833660
Pulled By: gchanan
fbshipit-source-id: 26338b842a8311718c4b89da3e2f1b726d5409b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30790
The index_select documentation reads:
"The returned tensor has the same number of dimensions as the original tensor (input)."
But the implementation would return a 0-dimensional tensor iff both the input and index were 0-dimensional.
This change makes it so we return a 0-dimensional tensor iff the input is 0-dimensional.
Restacked version of: https://github.com/pytorch/pytorch/pull/30502
Test Plan: Imported from OSS
Differential Revision: D18825717
Pulled By: gchanan
fbshipit-source-id: aeb10c5107e748af3e264fbdc81fff5dd4833cc4
Summary:
When converting a contiguous CuPy ndarray to a Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes the problem and makes `torch.tensor(cupy_ndarray)` work for contiguous inputs.
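A tiny sketch of the now-working path, assuming a CUDA build with CuPy installed:
```
import cupy
import torch

x = cupy.ones((2, 3), dtype=cupy.float32)  # contiguous; the interface may
                                            # report strides as None
t = torch.tensor(x)                         # previously raised, now copies
print(t.shape, t.device)
```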
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24947
Differential Revision: D18838986
Pulled By: ezyang
fbshipit-source-id: 2d827578f54ea22836037fe9ea8735b99f2efb42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30821
While investigating why our tests didn't catch #30704, I noticed that none
of our tests in method_tests() were being run on CUDA. This diff moves
those tests into the new device-generic test framework so that we also get
CUDA coverage. For expediency, I blacklisted all tests which didn't work
on CUDA (rather than fix them); that's something we can leave for future PRs.
This is done by way of a new expectedFailure gadget.
Note that all occurences of skipIfNoLapack needed to be replaced with
skipCPUIfNoLapack.
I punted for test_jit; it's possible those tests should also run CUDA but a JIT
expert should take a look here.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18840089
Pulled By: ezyang
fbshipit-source-id: 66b613b5024c91d3e391c456bb642be7e73d4785
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30551
To enable quantizing with shared types, we need to insert GetAttr nodes for
quantization parameters since the code might be shared by multiple module instances
and we'd like to make quantized module instance also share the same code but with
different values of attributes.
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18818652
fbshipit-source-id: fc95623cac59dcedd9e3f95397524eae515e7a11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30837
This test would get very occasional flakes, with an error saying the
RPC timed out. This happened because one worker would still be waiting for the
return value of an RPC, but another worker had already performed its local
shutdown, so it would not have sent the response. This didn't show up in
initial testing since the flakiness is very rare (< 1/100 test runs). This diff
fixes the issue by not erroring if these RPCs timeout. The reason this is okay
is because with a local shutdown, we should not expect for all outstanding RPCs
to be completed, since workers are free to shut down without completing/waiting
on outstanding work.
ghstack-source-id: 95021672
ghstack-source-id: 95021672
Test Plan: Ran the test 1000 times to ensure that it is not flaky.
Differential Revision: D18775731
fbshipit-source-id: 21074e8b4b4bbab2be7b0a59e80cb31bb471ea46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30474
There are some common parts in `isBiasOfConvOrLinear` and `isWeightOfConvOrLinear`; we can factor
them out. The refactor will allow for easier extension to new patterns.
Test Plan:
python test/test_jit.py
python test/test_quantization.py
Imported from OSS
Differential Revision: D18795725
fbshipit-source-id: 446463da5e3fa8464db441ed0d9651930487b3b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30679
Caffe2 expects quantized ops to be in NHWC format while pytorch inputs are in NCHW.
Add a JIT pass that inserts an NCHW-to-NHWC permute before each conv op and an NHWC-to-NCHW permute after it.
A graph rewriter then finds consecutive redundant permutes and removes them from the graph.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps
Imported from OSS
Differential Revision: D18790518
fbshipit-source-id: 4dd39cf0b31b21f5586c0edfdce2260d4e245112
Summary:
we prefer "_" over "-" in build names, so change checks in test script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30836
Differential Revision: D18840736
Pulled By: mingbowan
fbshipit-source-id: 6fdf736496225c5f8ab44906d8f4681b7bf894a7
Summary:
VitalyFedyunin, This PR is about port Hardtanh activation to Aten:
**Test script:**
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
if torch.cuda.is_available():
torch.cuda.synchronize()
return time.time()
device = "cpu"
m = nn.Hardtanh()
if torch.cuda.is_available():
device = "cuda"
m = m.cuda()
#warm up
for n in [100, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n, device=device)
for i in range(1000):
output = m(input)
output.backward(grad_output)
for n in [100, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n, device=device)
fwd_t = 0
bwd_t = 0
for i in range(10000):
t1 = _time()
output = m(input)
t2 = _time()
output.backward(grad_output)
t3 = _time()
fwd_t = fwd_t + (t2 -t1)
bwd_t = bwd_t + (t3 - t2)
fwd_avg = fwd_t / 10000 * 1000
bwd_avg = bwd_t / 10000 * 1000
print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
% (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.06 (ms).
input size(128, 10000) forward time is 0.84 (ms); backwad avg time is 0.44 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.61 (ms); backwad avg time is 0.10 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.07 (ms).
input size(128, 10000) forward time is 5.21 (ms); backwad avg time is 5.25 (ms).
After:
input size(128, 100) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10000) forward time is 1.09 (ms); backwad avg time is 1.09 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30152
Differential Revision: D18815545
Pulled By: VitalyFedyunin
fbshipit-source-id: d23b6b340a7276457f22dce826bcbe3b341d755f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29944
This particular approach queries our issue tracker for test titles that
match the following format:
```
DISABLED test_async_grad_guard_with_grad (jit.test_async.TestAsync)
```
And then skips the python test for them. There is 1 second timeout so
if the internet flakes we still run the test suite, without disabling any
tests.
This is intended as a quick fix, similar to ninja unland, to get to a green
master. Long term test disables should go into the code.
Test Plan: Imported from OSS
Differential Revision: D18621773
Pulled By: zdevito
fbshipit-source-id: 5532f1d5fa3f83f77fc3597126cbb7dba09a3c33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30825
It didn't verify in the 1-d case that the targets were size 1..
Test Plan: Imported from OSS
Differential Revision: D18833659
Pulled By: gchanan
fbshipit-source-id: 9b0276e7b0423fdaf2ba7cfa34bde541558c61f9
Summary:
We didn't have ATen/native/*.h in the torch target before, and we would like it to be exposed for external use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30835
Differential Revision: D18836160
Pulled By: zrphercule
fbshipit-source-id: 7330a9c9d8b65f173cc332b1cfeeb18c7dca20a8
Summary:
This PR adds docs for how we expose declarations in `at::` to `torch::`, to make the semantics more clear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30760
Differential Revision: D18833081
Pulled By: yf225
fbshipit-source-id: eff4d8815c67f681ce3a930ce99771cf2e55dbd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30800
SparseNN benchmark crashed due to this.
Wrap the warning handler in a function to avoid the static initialization order fiasco (SIOF).
Test Plan: Tested locally, SparseNN benchmark no longer crashes.
Reviewed By: yinghai
Differential Revision: D18826731
fbshipit-source-id: 8fcab8a3f38cc20f775409c0686363af3c27d0a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30769
The TorchConfig.cmake is the public cmake we produce in install folder for
3rd party client code to get all libtorch dependencies easily.
Apparently this build flow is not well covered by our CI (which is focused
on 1st party build / shared libraries?) as the little dummy project for
code analysis testing purpose was broken by #30315 without fail any CI.
Fixed the problem for mobile build and add the dummy project build to mobile
CI as well.
Test Plan: - make sure new CI pass;
Differential Revision: D18825054
Pulled By: ljk53
fbshipit-source-id: 80506f3875ffbc1a191154bb9e3621c621e08b12
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.
I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281
Differential Revision: D18830818
Pulled By: ezyang
fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
Summary:
Support for variadic inputs to `checkpoint_sequential` was deprecated in https://github.com/pytorch/pytorch/issues/21006. This case should be warned about with `DeprecationWarning` for PyTorch 1.2, but it should simply fail with `TypeError` since PyTorch 1.3. This patch removes the `DeprecationWarning` for PyTorch 1.2.
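For reference, a minimal sketch of the supported (single-input) call; extra positional inputs now raise `TypeError` instead of warning:
```
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
x = torch.randn(2, 8, requires_grad=True)
out = checkpoint_sequential(model, 2, x)  # one input tensor is the supported form
out.sum().backward()
```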
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25985
Differential Revision: D18809875
Pulled By: albanD
fbshipit-source-id: e84dd8629c04979c4b2dc63e8ada94292e8cedd0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30768
The behavior didn't match the documentation, because the documentation (for 'none' reduction) reads:
input X target -> output
(N, C) X (N, C) -> (N,)
(C,) X (C,) -> ()
but the latter case would output (1,). This also changes the case to:
() X (C,) -> ()
from:
() X (C,) -> (C,)
which makes more sense with the above formulas.
Restacked version of: https://github.com/pytorch/pytorch/pull/30748
Test Plan: Imported from OSS
Differential Revision: D18821554
Pulled By: gchanan
fbshipit-source-id: 3df77c51cf25648cb5fab62a68b09f49c91dab4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30765
It is already supported on CPU and is pretty easy to add for consistency.
Restacked version of: https://github.com/pytorch/pytorch/pull/30727
Test Plan: Imported from OSS
Differential Revision: D18821557
Pulled By: gchanan
fbshipit-source-id: e6aa3e91000ff3fd63941defc7d30aef58ae2f82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30746
This diff should be safe as long as open source build succeeds and should have no impact to cuda.
Differential Revision: D18811302
fbshipit-source-id: a7adab993816cba51842701898fac5019438b664
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)
Changes so far (a small illustrative snippet follows this list):
- [x] Added complex support of torch.empty and torch.fill()
- [x] Added complex support of CopyKernels
- The 'static_cast_with_inter_type' template function is specialized for the following cases
- `dest_t = thrust::complex<dest_value_t>`, `src_t = std::complex<src_value_t>`
- `dest_t = std::complex<dest_value_t>`, `src_t = thrust::complex<src_value_t>`
- This handles the compile-time case where `dest_value_t=double` and `src_value_t=float`.
- [x] Added complex support of BinaryOp kernels
- `using thrust_t = typename ztype_cuda<scalar_t>::thrust_t;` converts std::complex<T> ScalarTypes to thrust types and is a no-op of other Scalar Types.
- The operator is performed using complex number support defined in `thrust/complex.h`
- This could be extended to work with ROCm by using `rocm/complex.h`
- [x] Added complex support of UnaryOp kernels
- Added CUDA support for `angle()`, `real()`, `imag()`, `conj()`
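Illustrative only: a tiny CPU snippet exercising the kinds of ops listed above, assuming a build with complex dtypes enabled:
```
import torch

z = torch.empty(3, dtype=torch.complex64).fill_(1 + 2j)
print(torch.real(z), torch.imag(z), torch.conj(z), torch.angle(z))
```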
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30295
Differential Revision: D18781954
Pulled By: ezyang
fbshipit-source-id: 25d204c0b8143ee27fda345a5d6a82f095da92a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28443
We're now on C++14, so we don't need the else branch of these ifdef's anymore
ghstack-source-id: 94904074
Test Plan: waitforsandcastle
Differential Revision: D18069136
fbshipit-source-id: f1613cab9a99ee30f99775e4a60a1b06fd0a03ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30550
Right now we have an `InsertQuantDeQuantHelper` for each module, but we need
it to be global because we need to know which graphs have been quantized before;
based on this information we can decide how to handle the module instance.
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18818651
fbshipit-source-id: bfcaf37094ce20a257171a0c99b05b9348ebc13d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30037
Support quantization for modules with reused submodules, e.g. relu (automatically make them unique).
We first do a pass on the graph to find all duplicate uses of the same module and record the `Value`s of the
module instance; for each of these values we create a new module and change the access to that module, as sketched below.
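A sketch (module made up) of the reuse pattern this handles:
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 1)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        # self.relu is used twice; the pass clones it so each use can carry
        # its own observers and quantization parameters
        return self.relu(self.relu(self.conv(x)))
```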
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D18821483
fbshipit-source-id: 1698b981e9e9f0c728d9f03fcbcfbd260151f679
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30473
Invoked the `ConstantPooling` and `FuseLinear` passes before
`insertObservers`.
`ConstantPooling` cleans up the traced graph, e.g. when we
have two constant nodes that have the same value, this pass will merge them;
this allows us to have fewer quantization patterns.
`FuseLinear` merges the exploded linear function into `aten::linear` so
that we can quantize this function properly. We need to fuse it because right now
the way we recognize weight and bias is by matching the argument position in certain function
calls, e.g. the weight argument of aten::conv2d. Therefore we have to preserve
the boundary of the linear function to recognize the weight of linear, since in the exploded
linear code the input to addmm is the transposed weight rather than the original weight of linear.
ghstack-source-id: 94887831
Test Plan:
This is needed for quantizing traced model tests to pass
Imported from OSS
Differential Revision: D18795722
fbshipit-source-id: 192d9d1e56307e2e1d90e30dce0502e31cb4f829
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30737
Original commit changeset: 2a8b2a3f5401
Reverting this to be safe until we address test failures in T58528495
Test Plan: CI
Reviewed By: wx1988
Differential Revision: D18812384
fbshipit-source-id: 2a3ac554024773022ec827f259127e4c8cffe6e2
Summary:
For system pybind11 installs this is a system header location that should not get installed since it might include other unrelated headers. Since the header is already installed for a system install there's no need to install the headers, so only do the install when we use the bundled pybind11 version.
Closes https://github.com/pytorch/pytorch/issues/29823. Closes https://github.com/pytorch/pytorch/issues/30627.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30758
Differential Revision: D18820189
Pulled By: bddppq
fbshipit-source-id: fcc9fa657897e18c07da090752c912e3be513b17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29217
We want to preserve constant information in ClassType so that
users can access the constants in the module by name.
This is also used later for freezing some attributes (converting
attributes to constants).
Test Plan:
tbd
Imported from OSS
Differential Revision: D18799955
fbshipit-source-id: fbfbcd5d3f7f560368b96e2a87e270c822a3d03a
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). This was landed at the same time as other work that added new operators to the `torch` namespace so the check for whether the `torch` namespace is exhaustively checked for overridability was triggering test failures.
I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730
Differential Revision: D18813270
Pulled By: ezyang
fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
Summary:
VitalyFedyunin, This PR is about port Tanh backward to Aten:
**Test script:**
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
if torch.cuda.is_available():
torch.cuda.synchronize()
return time.time()
device = "cpu"
m = nn.Tanh()
if torch.cuda.is_available():
device = "cuda"
m = m.cuda()
#warm up
for n in [100, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n, device=device)
for i in range(1000):
output = m(input)
output.backward(grad_output)
for n in [100, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n, device=device)
bwd_t = 0
for i in range(10000):
output = m(input)
t1 = _time()
output.backward(grad_output)
t2 = _time()
bwd_t = bwd_t + (t2 - t1)
bwd_avg = bwd_t / 10000 * 1000
print("input size(128, %d) backwad avg time is %.2f (ms)." % (n, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) backwad avg time is 0.12 (ms).
input size(128, 10000) backwad avg time is 0.17 (ms).
CPU
input size(128, 100) backwad avg time is 0.05 (ms).
input size(128, 10000) backwad avg time is 0.35 (ms).
```
After:
```
GPU:
input size(128, 100) backwad avg time is 0.12 (ms).
input size(128, 10000) backwad avg time is 0.17 (ms).
CPU
input size(128, 100) backwad avg time is 0.04 (ms).
input size(128, 10000) backwad avg time is 0.25 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) backwad avg time is 0.03 (ms).
input size(128, 10000) backwad avg time is 1.85 (ms).
After:
input size(128, 100) backwad avg time is 0.02 (ms).
input size(128, 10000) backwad avg time is 1.16 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30224
Differential Revision: D18810045
Pulled By: VitalyFedyunin
fbshipit-source-id: ab37948ab8f76bdaf9f3d1388562eaf29dacc0ea
Summary: As title
Test Plan: buck test caffe2/caffe2/fb/optimizers:masked_adagrad_test
Reviewed By: chocjy
Differential Revision: D18736639
fbshipit-source-id: d0d73f75228604d3448651bff2cf34ecc21f9ba6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30670
Also turn off scalar_check for grad_input: it isn't necessary because the input can't be 0-dimensional.
Test Plan: Imported from OSS
Differential Revision: D18784523
Pulled By: gchanan
fbshipit-source-id: 246d30970457075a0403dd0089317659a2cd2dd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30669
The inputs can't be 0-d, so we don't need that check in the scalar_check.
Test Plan: Imported from OSS
Differential Revision: D18784524
Pulled By: gchanan
fbshipit-source-id: d44222dffc91880a6e8c7be69e6e146e60040d43
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30665
total_weight is a "hidden" output just for autograd, so it's not user visible. The existing test_nn tests cover this (I verified that the new code is executed) and this matches the CPU behavior.
Test Plan: Imported from OSS
Differential Revision: D18782709
Pulled By: gchanan
fbshipit-source-id: 6d1c20eeaeffa14d06f375b37f11e866587f5fa0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30549
Preparing for later refactoring
Test Plan:
.
Imported from OSS
Differential Revision: D18802464
fbshipit-source-id: 0b5afb143549d93eed4c429125d3d5fd253093a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30548
ClassTypes can be shared among different module instances, but previously we assumed
they would be unique. This PR enables the insert_observers pass to work with shared class types.
Test Plan:
python test/test_jit.py
python test/test_quantization.py
Imported from OSS
Differential Revision: D18802465
fbshipit-source-id: b782e71e44a043af45577ac2b5c83e695155bb8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30558
Most c10 op registration/invocation cases are generated by aten codegen
following some fixed pattern, but a handful of them were written
manually, mainly for quantized ops. Added these "irregular" cases to the
test project to verify that the static code analyzer can handle them as well.
Test:
- build and run the test project;
Test Plan: Imported from OSS
Differential Revision: D18811098
Pulled By: ljk53
fbshipit-source-id: 7bdf17175dfec41c56c0d70f124cc96478135bc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.
Some things of note:
* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archive linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. I am not too sure why the old code did it that way in the first place, but switching doesn't seem to have broken anything.
* There's some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706
Fixes #27215 (as our libraries are smaller), and executes on part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18790941
Pulled By: ezyang
fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
Summary:
[Why static dispatch]
Static dispatch was introduced to allow stripping out unused ops at link
time (with “gc-sections” linker flag) for mobile build.
The alternative approaches to do "non-static" dispatch are:
* virtual methods - old ATen dispatcher, which has already been deprecated;
* registry pattern - used by caffe2, c10 and JIT;
However, none of them are “gc-sections” friendly. Global registers are
root symbols - linker cannot strip out any op if we use registry pattern
for mobile.
[Why static dispatch isn’t great]
* One more code path to maintain;
* Need recompile framework to add new backends/ops;
* Doesn’t support AutoGrad yet thus blocks on-device training;
[Static Code Analysis]
This PR introduces an LLVM analysis pass. It takes LLVM bitcode /
assembly as input and generates a dependency graph among aten ops. From a
set of root ops used by a model, we can calculate the transitive closure of
all dependent ops, and then ask codegen to only register those ops.
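As a rough illustration of that closure step (the graph and op names below are made up for the example, not analyzer output), a minimal sketch in Python:
```
def transitive_closure(dep_graph, root_ops):
    """Collect every op/function reachable from the root ops in the dependency graph."""
    seen, stack = set(), list(root_ops)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dep_graph.get(node, ()))
    return seen

deps = {
    "quantized::add": ["at::empty"],
    "at::empty": ["aten::empty"],
}
print(sorted(transitive_closure(deps, ["quantized::add"])))
# ['at::empty', 'aten::empty', 'quantized::add']
```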
[Approach]
To generate the dependency graph it searches for 3 types of connections in
LLVM bitcode / assembly:
1) op registration: op name (schema string literal) -> registered function;
2) regular function call: function -> function;
3) op invocation: function -> op name (schema string literal)
For 2) it uses a similar algorithm to llvm::LazyCallGraph - it not only looks into
call/invoke instructions but also recursively searches for function pointers
in each instruction's operands.
For 1) and 3) it searches for connections between operator name string
literals / function pointers and c10 op registration/invocation API calls in
LLVM IR graph via "use" edges (bi-directional):
1. llvm::Value has "users()" method to get other llvm::Value nodes that use
the value;
2. most of types derive from llvm::User which has "operands()" method to get
other llvm::Value nodes being used by the value;
[Limitation]
For now the search doesn't go beyond the function boundary because the
reference to op name string literals and c10 op registration/invocation
APIs are almost always in the same function.
The script uses regular expression to identify c10 API calls:
* op_schema_pattern="^(aten|quantized|profiler|_test)::[^ ]+"
* op_register_pattern="c10::RegisterOperators::(op|checkSchemaAndRegisterOp_)"
* op_invoke_pattern="c10::Dispatcher::findSchema|callOp"
If we create helper functions around the c10 API (e.g. the "callOp" method
defined in aten/native), we can simply add them to the regular expressions
used to identify c10 API calls.
[Example]
In the following example, it finds out:
1) the registered function for "quantized:add" operator;
2) one possible call path to at::empty() function;
3) the called operator name "aten::empty":
- "quantized::add"
- c10::detail::wrap_kernel_functor_unboxed_<at::native::(anonymous namespace)::QAdd<false>, at::Tensor (at::Tensor, at::Tensor, double, long)>::call(c10::OperatorKernel*, at::Tensor, at::Tensor, double, long)
- at::native::(anonymous namespace)::QAdd<false>::operator()(at::Tensor, at::Tensor, double, long)
- void at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::operator()<at::Tensor&, at::Tensor const&, at::Tensor const&>(c10::DeviceType, at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::choose_cpu_impl()
- void at::native::(anonymous namespace)::qadd_kernel<false>(at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool)
- at::TensorIterator::build()
- at::TensorIterator::fast_set_up()
- at::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>)
- "aten::empty"
[How do we know it’s correct?]
* Built a test project that contains different op registration/invocation
patterns found in pytorch codebase, including both codegen and non-codegen
cases.
* Tried different optimization flags “-O0”, “-O3” - the result seems to
be stable.
* Filtered by common patterns: “aten::”, “at::”, “at::native”,
“at::CPUType”, “at::TypeDefault” - manually checked the relationship
between function schema strings and corresponding implementations were
captured.
* It can print instruction level data flow and show warning message if it
encounters unexpected cases (e.g.: found 0 or multiple op names per
registration/invocation API call, found 0 registered functions, etc).
* Verified consistent results on different Linux / macOS hosts. It can
handle different STL library ABIs reliably, including rare corner cases
for short string literals.
[Known issues]
* Doesn’t handle C code yet;
* Doesn’t handle overload name yet (all variants are collapsed into the
main op name);
Test Plan:
```
LLVM_DIR=... ANALYZE_TEST=1 CHECK_RESULT=1 scripts/build_code_analyzer.sh
```
Differential Revision: D18428118
Pulled By: ljk53
fbshipit-source-id: d505363fa0cbbcdae87492c1f2c29464f6df2fed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30713
It should use moveToIntrusivePtr.
This function is very hot and is used a lot in the interpreter loop, e.g.
GET_ATTR, SET_ATTR. Making a copy and doing incref/decref caused significant overhead.
Reviewed By: yinghai
Differential Revision: D18805212
fbshipit-source-id: 3a9368604f71638a21300ad086739c4b50f0644e
Summary:
Move the shell script into this separate PR to make the original PR
smaller and less scary.
Test Plan:
- With stacked PRs:
1. analyze test project and compare with expected results:
```
ANALYZE_TEST=1 CHECK_RESULT=1 tools/code_analyzer/build.sh
```
2. analyze LibTorch:
```
ANALYZE_TORCH=1 tools/code_analyzer/build.sh
```
Differential Revision: D18474749
Pulled By: ljk53
fbshipit-source-id: 55c5cae3636cf2b1c4928fd2dc615d01f287076a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30467
Introduce function jit.export_opnames(module), which returns a list of all operator names used in the module and its submodules. One use case is to have a mobile custom build link only the operators in the returned list to reduce the mobile binary size.
Example:
import torch
m = torch.jit.load("example.pt")
print(torch.jit.export_opnames(m))
The outputs are in alphabetical order:
['aten::_convolution', 'aten::add.Tensor', 'aten::add_.Tensor', 'aten::addmm', 'aten::append.Tensor', 'aten::cat', 'aten::dropout', 'aten::embedding', 'aten::matmul', 'aten::max.dim', 'aten::mul.Tensor', 'aten::permute', 'aten::relu', 'aten::t', 'aten::tanh', 'prim::ListConstruct', 'prim::TupleConstruct', 'prim::TupleUnpack']
Test Plan: Imported from OSS
Differential Revision: D18801619
Pulled By: iseeyuan
fbshipit-source-id: f9b198d3e82b095daf704ee595d8026ad889bb13
Summary:
With the CI failure caused in 8bbafa0b32d2899ef6101172d62c6049427c977b fixed (incorrect return type of the lambdas in CUDA kernels)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30521
Differential Revision: D18770151
Pulled By: ailzhang
fbshipit-source-id: 02f0fe1d5718c34d24da6dbb5884ee8b247ce39a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30197
This default constructor was added because std::map's operator[]
requires a default constructor. However, instead of using operator[], we can
use emplace and remove the constructor, to ensure that the FutureInfo struct
doesn't get constructed with garbage values.
ghstack-source-id: 94802453
Test Plan: Unit tests pass.
Differential Revision: D18627675
fbshipit-source-id: c4cb000e60081478c0fd7308e17103ebbc4dc554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30677
Currently you can only add FunctionEvents to FunctionEventAvg. This makes it so you can add multiple FunctionEventAvg objects together. This is useful for merging multiple profiles together such as when dealing with distributed training.
Test Plan:
added unit test
buck test //caffe2/test:autograd -- test_profiler
Reviewed By: bddppq
Differential Revision: D18785578
fbshipit-source-id: 567a441dec885db7b0bd8f6e0ac9a60b18092278
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389
Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
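A rough sketch of the scenario this enables (the dataset and core assignment below are made up; os.sched_setaffinity is Linux-only). With the atfork handler, OpenMP has already applied its affinity by the time `worker_init_fn` runs, so the user's setting is no longer overwritten afterwards:
```
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Pin each worker process to its own CPU core (illustrative policy).
    os.sched_setaffinity(0, {worker_id})

dataset = TensorDataset(torch.randn(64, 3))
loader = DataLoader(dataset, num_workers=2, worker_init_fn=worker_init_fn)
for batch in loader:
    pass
```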
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006
Differential Revision: D18782456
Pulled By: ezyang
fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
Summary:
This fixes the second issue reported in https://github.com/pytorch/pytorch/issues/29909, namely that a loop counter is assigned the wrong values after transitioning to a bailout graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30186
Differential Revision: D18646845
Pulled By: Krovatkin
fbshipit-source-id: 1f7c601dd9f35892979385ffa132fb0886a4f203
Summary:
This PR removes `namespace F = torch::nn::functional` from `torch/nn/modules/batchnorm.h`, so that people don't have to define `torch::nn::functional` as `F` if they don't want to.
Fixes https://github.com/pytorch/pytorch/issues/30682.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30684
Differential Revision: D18795717
Pulled By: yf225
fbshipit-source-id: c9feffbeb632cc6b4ce3e6c22c0a78533bab69ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30659
I could only find one usage of TupleParser and it doesn't seem worth maintaining just for that one usage.
Test Plan: Imported from OSS
Differential Revision: D18795979
Pulled By: nairbv
fbshipit-source-id: 6e50d65fc8fade0944f36ab20d00f1539a3d4cb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30498
Updated Int8SliceOp to accept dim, start and end index similar to Pytorch.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_slice
Imported from OSS
Differential Revision: D18740519
fbshipit-source-id: 2313f37a4936edb150ce04911b241e591e191801
Summary:
To ensure synchronization between the copying of weights into the RNN weight buffer and the operation itself, both the PyTorch operator and the underlying MIOpen call must be on the same HIP stream. This is also consistent with MIOpen calls in other PyTorch operators.
ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30672
Differential Revision: D18785683
Pulled By: bddppq
fbshipit-source-id: 144611046cb70cfe450680295734203f253ac6e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30345
Skip ProcessGroupGlooAsyncTest if CUDA is not available; otherwise on sandcastle non-GPU hosts the test will abort, failing to load the CUDA library.
ghstack-source-id: 94771241
Test Plan: test skipped on non GPU host
Differential Revision: D18665322
fbshipit-source-id: 8c7b89aeecc6ec007bee12d864a6058384254e61
Summary:
This improved the multi-d microbenchmark by ~100 ns; empty_tensor_restride used to be 13% of iteration time and is now about 5%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30452
Test Plan: Covered by existing tests
Differential Revision: D18704233
Pulled By: ngimel
fbshipit-source-id: be527f09183bc31e9d1f63fd49bfbe0998fe167f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30636
Currently DeQuantStub is still in the whitelist because set union has
lower precedence than set difference.
Fixes issue: https://github.com/pytorch/pytorch/issues/29646
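A quick illustration of the precedence pitfall (the sets below are made up, not the actual whitelist):
```
module_types = {"Conv2d", "DeQuantStub"}
extra_types = {"Linear"}

# '-' binds tighter than '|', so DeQuantStub from the left operand survives:
buggy = module_types | extra_types - {"DeQuantStub"}
fixed = (module_types | extra_types) - {"DeQuantStub"}
print("DeQuantStub" in buggy)  # True
print("DeQuantStub" in fixed)  # False
```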
Test Plan:
verified locally that we don't attach qconfig for DeQuantStub
Imported from OSS
Differential Revision: D18775275
fbshipit-source-id: 8da07e40963555671b3d4326c9291706103f858e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30327
### Summary
Seems like starting from macOS 10.15, we can no longer get access to the `Downloads` folder in our macOS machines.
```
permissionError: [Errno 1] Operation not permitted: '/Users/distiller/Downloads'
```
The fix is to change the conda download directory to ${HOME}
### Test Plan
- iOS jobs are back to normal
- Don't break other jobs
Test Plan: Imported from OSS
Differential Revision: D18717380
Pulled By: xta0
fbshipit-source-id: cad754076bf4ae5035741aa57a310ad87c76726e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30314
Somehow we forgot to define it!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762356
Pulled By: ezyang
fbshipit-source-id: 28afc605ad986266071e3831049ec8a7f71fd695
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30313
See comments in code about the bug.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762360
Pulled By: ezyang
fbshipit-source-id: 406a01f2f0c3722b381428c89afd67b3c3c19142
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30312
It's not necessary because it's already defined in the header.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762363
Pulled By: ezyang
fbshipit-source-id: 418bf355d460dd171ac449559f20bf55415e54ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30311
multinomial_stub must be in scope to register against it. Somehow,
this works today, but when I split torch_cpu and torch_cuda it
doesn't.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762358
Pulled By: ezyang
fbshipit-source-id: ef9c111292cd02d816af1c94c8bbaadabffaabe5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30310
- Annotate CUDAGenerator.h with correct TORCH_CUDA_API.
This is actually CUDA related functionality with its implementation living
in the cuda/ folder. For some reason it lives at the top level; it
should be moved (but that should be handled in another PR.)
- Add missing TORCH/CAFFE_API annotations. All of
these functions are used from CUDA code, which means that
we need to correctly annotate them if we split CPU/CUDA code
into separate libraries.
Test Plan: Imported from OSS
Differential Revision: D18762357
Pulled By: ezyang
fbshipit-source-id: c975a8e4f082fe9f4196c2cca40977623caf4148
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30308
Dispatch is declared in non-anonymous namespace, so it definitely
shouldn't be defined in an anonymous namespace. This doesn't seem
to matter today, but it matters when we split libtorch into two
libraries.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762361
Pulled By: ezyang
fbshipit-source-id: 484f0fab183c385dd889db9dad3e48e92e0a3900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30307
DispatchStub will stop working when I split CPU/CUDA libraries, because
there are some symbols from the templates in DispatchStub stubs which aren't
properly exported and I couldn't figure out how to make them dispatch properly.
This is the only case where DispatchStub is being used to dispatch to CUDA,
anyway.
This partially addresses #29844 but I need to also just completely delete
the CUDA registration logic from DispatchStub entirely.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762362
Pulled By: ezyang
fbshipit-source-id: bdfa8739c0daf23badf3c5af61890a934af00813
Summary:
Convolution nodes are traced as aten::_convolution and are currently supported in ONNX.
Scripting convolution uses aten::conv<1,2,3>d, which are currently not supported in ONNX.
This PR adds the symbolics for aten::conv<1,2,3>d and aten::conv_transpose<1,2,3>d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30618
Reviewed By: hl475
Differential Revision: D18778145
Pulled By: houseroad
fbshipit-source-id: 4af0379f29974a1ce8443024d1d87b3eb8d2dd36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30546
factor out this function for later support of quantizing shared types
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18776304
fbshipit-source-id: f5a736b0f69019cefe17ec4517da1ae5462f78e1
Summary:
Improve .view() performance by not calling set_ and instead restriding the returned alias. This improves the performance of the .view() operation from ~500 ns to ~360 ns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30554
Test Plan: covered by existing tests
Differential Revision: D18759896
Pulled By: ngimel
fbshipit-source-id: 9757c93158bc55e9c87dc30ac3415ba8f8b849e5
Summary:
This test seems to only test that we throw exceptions in the `WorkerInfo` constructor when invalid names are passed in, so I don't think we need to complicate it by initializing RPC and exposing ourselves to potential flakiness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30620
Differential Revision: D18766955
Pulled By: rohan-varma
fbshipit-source-id: 11643de4d57431e5f46e096c7766de3ab0b9b05a
Summary:
Previous behaviour: a user runs tests from the `TestCppExtension` class, so `/tmp/torch_extensions` is created under their ownership and not removed afterwards;
another user's run of the same tests might then result in a 'Permission denied' exception upon deleting `/tmp/torch_extensions`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30095
Differential Revision: D18770234
Pulled By: ezyang
fbshipit-source-id: 4c6b972e4c4327a94c8b4bf6b0b9998a01c218bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30527
When we introduced dtype.is_signed we allowed for support of
quantized types, but we're not sure what the correct result should be.
See discussion at https://github.com/pytorch/pytorch/pull/29511
Test Plan: Imported from OSS
Differential Revision: D18765410
Pulled By: nairbv
fbshipit-source-id: c87cfe999b604cfcbbafa561e04d0d5cdbf41e6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30603
Pickler object needs to be kept in scope until data is written out to the
final serialized string. tensorData in particular is a reference to memory
owned by the descoped Pickle object.
Noticed this by inspection. In practice, this potential read-after-free here
is limited to non-cpu tensors, and any such use was very soon after free.
ghstack-source-id: 94756036
Test Plan: existing test suite at buck test mode/dev-nosan caffe2/test:rpc_fork
Differential Revision: D18760463
fbshipit-source-id: 9de890d66626aa48f13ca376dd9bd50b92e0cb00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30354
TCPStoreTest would time out since the TCPStore constructor for the
server would block the main thread waiting for workers. The workers themselves
were spawned later on once the server store is created. As a result, this test
would always time out.
To fix the test, I moved the server store to a thread so that the workers can
register with the server in parallel.
In addition to this, I made a few improvements to tcputils::connect. When
tcputils::connect() encountered an exception, it always looked at `errno` for
the error code. In some cases `errno` could be overwritten and the real error
code would be stored in `std::system_error`. As a result, I've modified the
code to look at the error code in `std::system_error` if we catch an exception
of that type.
ghstack-source-id: 94758939
Test Plan: waitforbuildbot
Differential Revision: D18668454
fbshipit-source-id: d5a3c57b066b094bfecda9a79d9d31bfa32e17f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30529
We started to see build failures for multiple services with the top-of-trunk LLVM compiler. The failures point to a warning that was treated as an error for implicit conversion from long to double. Per discussion on D18642524, I'm disabling this warning from the containing TARGET file. T58053069 opened for the code owner to track this - a proper source code fix and more unit tests are needed.
Test Plan: local build, sandcastle
Reviewed By: smessmer
Differential Revision: D18668396
fbshipit-source-id: 28c0ff3258c5ba3afd41a0053f9fe1b356a496a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30490
Add symbolic mapping to Int8AvgPool2d and Int8Reshape op in C2
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps
Imported from OSS
Differential Revision: D18740520
fbshipit-source-id: 1606125500c4b549fbc984e7929b7fd5204396a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30624
These tests were flaky since we would end up calling the 'verify'
methods before some of the RPCs were done. The `check_rpc_done` function might
not guarantee this since set_rpc_done sets an appropriate flag in python which
causes `check_rpc_done` to pass. However, there are a few steps after that,
like attaching the send functions for the response of the RPC, that might not
have executed by then.
ghstack-source-id: 94781954
Test Plan: Run the tests 100 times.
Reviewed By: zhaojuanmao
Differential Revision: D18768786
fbshipit-source-id: a14c3f4b27de14fe5ecc6e90854dc52652f769b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30351
Not sure what the proper fix is; clang is having trouble with the loop pragmas. This at least gets things compiling.
ghstack-source-id: 94458450
Test Plan: CI passes
Differential Revision: D18665812
fbshipit-source-id: b8a899ce4138010cbe308eaa2c0838dd9e15573f
Summary:
This TOC is manually generated, but `CONTRIBUTING.md` seems like it's
stable enough for that to be okay.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29671
Pulled By: driazati
Differential Revision: D18771604
fbshipit-source-id: 0d6c9c6cf1083d3be413219d3cead79c2fe5050b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30434
These are all pointwise ops that are implemented correctly wrt shapes in THC.
Test Plan: Imported from OSS
Differential Revision: D18699087
Pulled By: gchanan
fbshipit-source-id: 82cb91b00c77bfaca75be497c87fc7ae52daf46c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30449
There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad.
In this diff we first compute effective_lr = lr / (sqrt(moment) + epsilon) and then multiply by the gradient.
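As a rough sketch of the two orderings (plain Python for illustration only, not the Caffe2 kernels): both are mathematically equivalent but round differently in floating point, which is why the scalar and SIMD paths need to agree on one of them.
```
import math

def adagrad_step_old(param, grad, moment, lr, epsilon):
    moment = moment + grad * grad
    # multiply lr by grad first, then divide
    return param - lr * grad / (math.sqrt(moment) + epsilon), moment

def adagrad_step_new(param, grad, moment, lr, epsilon):
    moment = moment + grad * grad
    effective_lr = lr / (math.sqrt(moment) + epsilon)  # compute effective_lr first
    return param - effective_lr * grad, moment         # then multiply by the gradient
```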
Test Plan: CI
Reviewed By: protonu
Differential Revision: D18703416
fbshipit-source-id: 2a8b2a3f5401466549561412bd22f07abac3c598
Summary:
${CMAKE_HOST_SYSTEM_PROCESSOR} gets the processor name via `uname -p` on Linux and `%PROCESSOR_ARCHITECTURE%` on Windows.
1. %PROCESSOR_ARCHITECTURE% has a value in (AMD64|IA64|ARM64) for 64-bit processors, and (x86) for 32-bit processors
2. `uname -p` has a value like "(x86_64|i[3-6]+86)"
We cannot tell Intel CPUs from other CPUs by ${CMAKE_HOST_SYSTEM_PROCESSOR}. It is the architecture, not the vendor.
e.g. an Intel i7-9700K CPU on Windows reports "AMD64"
reference:
[MSDN](https://docs.microsoft.com/zh-cn/windows/win32/winprog64/wow64-implementation-details?redirectedfrom=MSDN)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30564
Differential Revision: D18763031
Pulled By: ezyang
fbshipit-source-id: 11ae20e66b4b89bde1dcf4df6177606a3374c671
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30594
This test case started breaking; disabling it to clean up the build.
ghstack-source-id: 94736837
Test Plan: Unittest disabling change
Differential Revision: D18758635
fbshipit-source-id: 05df1158ff0ccd75e401f352da529fb663b1cae0
Summary:
On the latest master, I get link errors when building one of the tests:
```sh
/home/pbell/git/pytorch/build/../test/cpp/rpc/test_wire_serialization.cpp:23:
undefined reference to `torch::distributed::rpc::wireDeserialize(void const*, unsigned long)'
```
This seems to be caused by PR https://github.com/pytorch/pytorch/issues/29785 not working with `USE_DISTRIBUTED=0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30587
Differential Revision: D18758625
Pulled By: jjlilley
fbshipit-source-id: 0ad0703acdbbac22bb4b8317370fbe2606fcb67e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30491
Our RPC API docs present the APIs well but miss a general
introduction to the APIs. Readers might be a little lost the first
time landing on this page. This commit reorganizes the APIs into
four components from the user's perspective: RPC, RRef, dist autograd,
and dist optimizer. It also adds an intro to each and briefly
describes why we provide them.
Test Plan: Imported from OSS
Differential Revision: D18723294
Pulled By: mrshenli
fbshipit-source-id: 4aced4ab537b070aa780aaaf9724659fd47cb3cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29785
TLDR: This change improves process_group's serialization speed:
Serialize_Tensor64: 12.38us -> 1.99us (~-84%)
Deserialize_Tensor64: 33.89us -> 5.62us (~-84%)
Serialize_Tensor1M: 525.74us -> 285.43us (~-45%)
Deserialize_Tensor1M: 892.61us -> 273.68us (~-70%)
After speaking with the jit team, we had consensus that torch::save()/load()
are somewhat high-overhead for RPC serialization, mostly intended for
persistent disk data.
(Particularly, for large tensors, 35% of the time is spent in CRC checking, even
with the fb-side changes to substitute 40x faster SSE-accelerated crc checking;
Also, for small tensors, the zip container overhead is considerable, as is the
overhead of lexing/parsing an embedded text python program for each RPC).
The jit team encouraged us to use jit::pickler, with the WriteableTensorData
way of outputting result tensors (not the default side-tensor table, or
with pickling the actual tensors). This ends up just pickling some tensor
metadata, and giving us some tensor blobs that we can mindlessly
blit over the wire (they copy to cpu memory if needed).
There is yet no standardized container format for the pickled data
(there is jit::pickle_save() checked in, but it's experimental,
no load function is yet provided), but they encouraged us to just use
something sensible for this, and possibly revisit later. For now, I made
the directory headers slightly http-inspired.
Note that serialization is just one component of the pipeline, but that
said, we also see reasonable reductions in end-to-end echo times (noisier):
ProcessGroupAgent_Echo(Tensor_Small) 855.25us -> 492.65us (~-42%)
ProcessGroupAgent_Echo(Tensor_1M) 10.82ms -> 6.94ms (~-35%)
ProcessGroupAgent_Echo(Small_NoTensor) 688.82us -> 301.72us (~-56%)
ProcessGroupAgent_Echo(1MB_NoTensor) 4.65ms -> 3.71ms (~-20%)
I moved the "wire serialization" logic to a separate file to assist with
unittesting.
ghstack-source-id: 94694682
Test Plan:
buck test mode/dev-nosan caffe2/test/cpp/api:serialize
buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18493938
fbshipit-source-id: 07ddfe87dbe56472bc944f7d070627052c94a8f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.
ghstack-source-id: 94673884
ghstack-source-id: 94673884
Test Plan: Unit tests pass.
Reviewed By: mrshenli
Differential Revision: D18661775
fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30522
This is in preparation for moving the docs push CI jobs to depend on
`pytorch-linux-xenial-py3.6-gcc5.4` rather than
`pytorch-linux-xenial-cuda9-cudnn7-py3`.
Test Plan: Imported from OSS
Differential Revision: D18731108
Pulled By: zou3519
fbshipit-source-id: fd753a5ca818fa73a14e4276c33368a247cc40e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30361
### Summary
By default, the compiler will choose `clock_gettime` for the iOS build. However, that API is not available until iOS 10. Since the Facebook app still supports iOS 9.0, we have to use `gettimeofday` instead.
```shell
xplat/caffe2/torch/csrc/autograd/profiler.h:86:3: error: 'clock_gettime' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]
xplat/caffe2/torch/csrc/autograd/profiler.h:86:17: error: '_CLOCK_MONOTONIC' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]
```
P.S. the open-sourced version is iOS 12.0 and above, so we don't have this problem.
### Test Plan
- buck build works
- Don't break CIs
Test Plan: Imported from OSS
Differential Revision: D18730262
Pulled By: xta0
fbshipit-source-id: fe6d954b8d3c23cbc9d1e25a2e72e0b0c1d4eaa9
Summary:
PyTorch dim and ONNX axis have different meanings.
ONNX only supports log_softmax with dim = -1. Transpose must be added before and after log_softmax to support other cases.
This requires input rank to be known at export time.
Fixes https://github.com/pytorch/pytorch/issues/17918
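A minimal sketch of an export that hits this path (the module and output file name are illustrative):
```
import torch
import torch.nn.functional as F

class M(torch.nn.Module):
    def forward(self, x):
        return F.log_softmax(x, dim=1)  # dim != -1 needs the transpose handling

torch.onnx.export(M(), torch.randn(2, 3, 4), "log_softmax_dim1.onnx")
```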
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30433
Reviewed By: hl475
Differential Revision: D18723520
Pulled By: houseroad
fbshipit-source-id: d0ed3b3f051d08d46495a7abfa854edd120dca3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25768
The round robin process group can be constructed from multiple other
process groups. Every collective call against this new process group
is delegated to the specified process groups in a round robin fashion.
Doing so may benefit performance when calling into multiple NCCL
process groups. Instead of adding support for round-robin usage of
NCCL communicators, we achieve the same without changing the NCCL
process group and adding this wrapper class.
The API to create this round robin process group is a bit harsh. If we
find it adds significant benefit we can revisit and make this a first
class citizen in the torch.distributed module.
ghstack-source-id: 94578376
Test Plan: The newly added test passes.
Reviewed By: chenyangyu1988
Differential Revision: D17226323
fbshipit-source-id: ec9f754b66f33b983fee30bfb86a1c4c5d74767d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30415
This enables subclassing of c10d.Store and implementing its interface in Python.
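A minimal, partial sketch of what that enables (class name and in-memory storage are made up; a real subclass would implement the full Store interface - add, wait, etc. - and this assumes the Store base class is exposed as torch.distributed.Store):
```
import torch.distributed as dist

class DictStore(dist.Store):
    """Toy in-memory store; only two of the interface methods are shown."""
    def __init__(self):
        super().__init__()
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]
```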
ghstack-source-id: 94586627
Test Plan: New tests passes.
Reviewed By: vladbelous
Differential Revision: D18693018
fbshipit-source-id: fa1eba4bd11cc09a3d6bf3f35369c885033c63c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29934
Previously, when doing boxed dispatch (e.g. custom ops), the dispatcher manually removed the VariableTensorId flag before dispatching
because custom ops don't have variable kernels.
This is one of the blockers that prevented us from using the boxed dispatch mechanism for ops from native_functions.yaml because they define variable kernels and need them to be called for autograd.
This PR changes that. The dispatcher doesn't remove the VariableTensorId flag anymore.
Instead, to make custom ops work, we implement a variable fallback kernel that is called whenever no other variable kernel was found.
ghstack-source-id: 94618474
Test Plan: unit tests
Differential Revision: D18542342
fbshipit-source-id: a30ae35d98f89f7ae507151f55c42cfbed54a451
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30451
TORCH_CHECK takes __VA_ARGS__ so there is no need to concatenate strings
before calling it. This way it won't call Formatting::print() on the
tensor when STRIP_ERROR_MESSAGES macro is set. Formatting::print() calls
several specific tensor methods that brings in unnecessary inter-op
dependencies for static code analysis.
Test Plan: - builds
Differential Revision: D18703784
Pulled By: ljk53
fbshipit-source-id: 1c0628e3ddcb2fd42c475cb161edbef09dfe8eb5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30120
The example given for functional conv2d didn't work. This diff fixes the example in docs so that it works.
Fixes https://github.com/pytorch/pytorch/issues/29649
ghstack-source-id: 94601559
Test Plan: Tried the example locally
Differential Revision: D18604606
fbshipit-source-id: ff1a4f903e2843efe30d962d4ff00e5065cd1d7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30428
Reported issue https://discuss.pytorch.org/t/incomprehensible-behaviour/61710
Steps to reproduce:
```
import torch
from torch import nn, Tensor
from typing import Dict

class WrapRPN(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, features):
        # type: (Dict[str, Tensor]) -> int
        return 0
```
```
#include <torch/script.h>
int main() {
  torch::jit::script::Module module = torch::jit::load("dict_str_tensor.pt");
  torch::Tensor tensor = torch::rand({2, 3});
  at::IValue ivalue{tensor};
  c10::impl::GenericDict dict{c10::StringType::get(), ivalue.type()};
  dict.insert("key", ivalue);
  module.forward({dict});
}
```
The ValueType of `c10::impl::GenericDict` is taken from the first specified element, here `ivalue.type()`.
It fails the type check `!value.type()->isSubtypeOf(argument.type())` in `function_schema_inl.h`,
as `DictType::isSubtypeOf` requires equal KeyType and ValueType, while the `TensorType`s are different.
Fix:
Use c10::unshapedType for creating Generic List/Dict
Test Plan: Imported from OSS
Differential Revision: D18717189
Pulled By: IvanKobzarev
fbshipit-source-id: 1e352a9c776a7f7e69fd5b9ece558f1d1849ea57
Summary:
using `buck build mode/opt mode/no-gpu //experimental/ngimel/benchmark_framework_overheads:cpp_benchmark`
```
devvm497.prn3.facebook.com:/data/users/bwasti/fbsource/fbcode $ ./cpp_benchmark --niter 10000
creating inputs, number of dimensions 1
starting op
benchmarking 10000 iterations
using cpp frontend
elapsed time per iteration 0.90638 us
```
```
devvm497.prn3.facebook.com:/data/users/bwasti/fbsource/fbcode $ ./cpp_benchmark --niter 10000 --disable_variable_dispatch
creating inputs, number of dimensions 1
starting op
benchmarking 10000 iterations
using cpp frontend
elapsed time per iteration 0.775436 us
```
Test Plan: let all tests run
Reviewed By: smessmer
Differential Revision: D18654276
fbshipit-source-id: 362812b2c87ec428448b2ac65baac45f492fdce4
Summary:
This PR adds `gpu_kernel_with_index` as an addition to the element-wise kernel template. It allows a kernel to operate not only on input tensor values, but also on each value's index (viewed as 1d, so from 0 to numel) within the lambda.
The direct use case here is to replace thrust::tabulate used in range/arange/linspace. Benefits are:
- thrust::tabulate causes additional unnecessary synchronization on the cpu.
- It now works with the tensor iterator, the output no longer needs to be contiguous, and a memcpy is saved
It can also potentially be reused to add new functions to pytorch later, if we see use cases where both value and index are needed (for example unify tril/triu into tensor iterator element-wise? add other patterns?).
Known issues:
https://github.com/pytorch/pytorch/pull/23586 is needed to enable non-contiguous case work properly, since overlapping needs to be checked. Currently non-contiguous tensor falls into TOO_HARD. I could write proper check in this file but I figured using exist method is better. jjsjann123
It does not work beyond 32-bit indexing. But thrust was erroring on those cases too. We could split the tensor in the caller to enable this. The index changes after the split, so it is easier for the caller to pass a different lambda, and harder for the template to handle it in general.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28175
Differential Revision: D18708649
Pulled By: ngimel
fbshipit-source-id: 382081c96f266ae7b61095fc1f2af41c6b210fa9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30472
Add DoNotStrip to nativeNewTensor method.
ghstack-source-id: 94596624
Test Plan:
Triggered build on diff for automation_fbandroid_fallback_release.
buck install -r fb4a
Tested BI cloaking using pytext lite interpreter.
Obverse that logs are sent to scuba table:
{F223408345}
Reviewed By: linbinyu
Differential Revision: D18709087
fbshipit-source-id: 74fa7a0665640c294811a50913a60ef8d6b9b672
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29953
The underlying function handles it correctly.
Test Plan: Imported from OSS
Differential Revision: D18548055
Pulled By: gchanan
fbshipit-source-id: cc2d0ae37d9689423363d115c6a653cb64840528
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29952
The underlying op handles the check correctly.
Test Plan: Imported from OSS
Differential Revision: D18548048
Pulled By: gchanan
fbshipit-source-id: 9ac6fde743408e59ccdfc61bd574ebe6e2862238
Summary:
In ONNX opset 11, a series of sequence ops were added. Operators that are related to Tensor[] in PyTorch can be exported using these sequence ops.
In this PR, unbind/split that produces Tensor[], and __getitem__ that takes Tensor[] as input, are exported correctly to ONNX opset 11.
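A minimal sketch of the kind of model this enables exporting (module and file name are illustrative; the exact exported graph depends on whether tracing or scripting is used, but unbind producing a Tensor[] followed by __getitem__ is the case handled here):
```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        slices = torch.unbind(x, dim=0)  # produces a Tensor[]
        return slices[1]                 # __getitem__ on the list

torch.onnx.export(M(), torch.randn(3, 4), "unbind.onnx", opset_version=11)
```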
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29136
Reviewed By: hl475
Differential Revision: D18309222
Pulled By: houseroad
fbshipit-source-id: be12c96bf8d0a56900683ef579f1c808c0a1af21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30202
Pytorch Upsample operator has output_size as an argument.
For quantized tensor inputs we cannot get the input_size to calculate the width and height scale factor.
Instead we pass the output_size directly to caffe2 to calculate the scale factors.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_upsample
Imported from OSS
Differential Revision: D18631478
fbshipit-source-id: 38a39129bc863f4ecf2293acc068e40ab7edc825
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30217
Before this commit, RRefContext throws an error if it detects any
RRef leak during shutdown. However, this requires applications to
make sure that it has freed all references to RRefs in application
code, which can be a bad debugging experience for large
applications. Besides, this also relies on Python GC to free things
up in time, which might not always be true. After this commit,
RRefContext would ignore leaking RRefs during shutdown, as shutdown
is called when the application has finished training and no longer
cares about local states. Hence, it should be OK to just ignore
those leaks and destroy OwnerRRefs. If application would like to
enforce no leaks, just set torch.distributed.rpc.api._ignore_rref_leak
to False.
Test Plan: Imported from OSS
Differential Revision: D18632546
Pulled By: mrshenli
fbshipit-source-id: 2744b2401dafdd16de0e0a76cf8e07777bed0f38
Summary:
The PyTorch exporter does not add any name to the ONNX operators in the exported graph. A common request is to add names to op nodes by default. This helps the readability of the graph in visualization tools such as Netron, or when the ONNX graph is printed as a string. It also helps with the debuggability of the ONNX graph.
Therefore this PR adds names to operators in the exporter. The names follow a simple format, <op_type>_<index>. Expect files for tests in `test/onnx/test_operators.py` have been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27342
Reviewed By: hl475
Differential Revision: D17790979
Pulled By: houseroad
fbshipit-source-id: 1eaae88b5f51f152735a2ff96e22827837e34d9d
Summary:
This should resolve https://github.com/pytorch/pytorch/issues/29008. This flag has two effects on the tracer.
- Remove the trailing underscore for inplace operators, e.g. index_put_ ==> index_put. This is handled in utils.py separately as well.
- Add out as input for backward computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29466
Reviewed By: hl475
Differential Revision: D18422815
Pulled By: houseroad
fbshipit-source-id: 317b6a3c8a5751fe6fe49d7543e429d281ed0d6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357
Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
ghstack-source-id: 94468814
Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.
Differential Revision: D18668517
fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30430
When a module isn't a TracedModule, attempt to get name information with `original_name` property on module and default to 'Module' when no such property exists.
Test Plan:
### Change child module to scripted module:
```
model = torchvision.models.alexnet()
model.classifier = torch.jit.script(model.classifier)
```
### Add graph
```
w = SummaryWriter()
w.add_graph(model, torch.rand((2, 3, 224, 224)))
w.close()
```
### No errors
However, graph is disconnected at parts and hard to understand.
{F223327878}
Reviewed By: sanekmelnikov
Differential Revision: D18690836
fbshipit-source-id: 42295d06b7c1d48d5401776dca1e0d12cd64b49d
Summary:
This adds a listing of the parts of the `typing` module that are unsupported.
This is also a first pass at deciding which features are 'unlikely to be implemented' vs 'not implemented', so they're open to discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30344
Pulled By: driazati
Differential Revision: D18665628
fbshipit-source-id: 22b8ebbde23df03839306cdb4344ca18a44f2c29
Summary:
There is no `out` argument to `argsort` according to the source code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24335
Differential Revision: D16829134
Pulled By: vincentqb
fbshipit-source-id: 8f91154984cd4a753ba1d6105fb8a9bfa0da22b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30362
Right now the qat modules (qat.ConvBn2d, qat.ConvBnReLU2d, qat.Conv2d)
are not convenient for supporting other dimensions of Conv. This PR refactors
these modules so that we can support Conv1d/Conv3d better.
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D18691152
fbshipit-source-id: 5b561e6b054eadd31b98cabdf1ac67a61ee9b805
Summary:
In this PR, we mainly handle the case where there are multiple usages of a Value when inserting the quant-dequant pair. This change adds one dequant for each usage of the Value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30145
Differential Revision: D18671600
Pulled By: lly-zero-one
fbshipit-source-id: 61324a98861da85b80dcf7e930381311118ae53b
Summary:
Currently, the way the compare kernels handle dtypes is very funny (this behavior was introduced in https://github.com/pytorch/pytorch/pull/28427 and I just realized it today):
Let's say `a, b` are two float tensors on CUDA.
If you do `a < b`, this is what would happen inside the loop:
- Step 1: Fetch `a` and `b`, dynamically cast them from `float` to `float`. (i.e. check the scalar type to figure out if a cast is needed; it isn't, so do nothing.)
- Step 2: compute `a < b`, get a `bool` result
- Step 3: statically cast the result into `float`
- Step 4: do a dynamic cast of the result from `float` to `bool` and store the value
And if you do `a.lt_(b)`, this is what would happen:
- Step 1: Fetch `a` and `b`, no casting
- Step 2: compute `a < b`, get a `bool` result
- Step 3: statically cast the result into `float`
- Step 4: store the result to memory, no casting
Although dynamic casting happens on registers, it still hurt the performance a bit (~8%).
This PR fixes this issue. Now for compare kernels, if the output is bool and inputs have the same dtype, then there is no dynamic casting. Otherwise, there will be dynamic casting for each input and output. That is, the dynamic casting behavior of the two cases described above are swapped.
Benchmark on `a < b` for tensor of 1000000000 fp32 elements:
Before https://github.com/pytorch/pytorch/issues/28427 6.35 ms
Current master: 6.88 ms
With this PR: 6.36 ms
Benchmark on `a.lt_(b)` does not show any difference across versions.
Besides this, what worries me most is, with type promotion, the logic for tensor iterator is becoming super complicated, and it is hard to see if one change causes the performance regression of others. I suggest we create scripts that could benchmark tensor iterator entirely, review that code and put it somewhere inside the repository (maybe under `/tools` or `/test/scripts`?), and whenever we are not certain about the performance we could run it to check. (I guess not on this PR but on PRs after the script is done. If there are worries about performance, the author of PRs should run the script manually, and the reviewer should remind PR author to do so if necessary) If this is a good idea, I will send a PR for the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29743
Differential Revision: D18671269
Pulled By: ngimel
fbshipit-source-id: 89a9c1c8b5fd45d5ae8fe907d65c2fe1a7dfd2dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30208
Adds default arg for init_method so users don't have to pass this in,
and moves it to `RpcBackendOptions` struct. Removes `init_method` arg from rpc.init_rpc. Also fixes some docs.
ghstack-source-id: 94500475
Test Plan: Unit tests pass.
Reviewed By: mrshenli
Differential Revision: D18630074
fbshipit-source-id: 04b7dd7ec96f4c4da311b71d250233f1f262135a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30359
We need this for C++14 support
ghstack-source-id: 94519850
Test Plan: unit tests
Differential Revision: D18668868
fbshipit-source-id: 87e8eadf0e60a1699fba4524aea53b306b9a7f24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29945
Both functions require at least one 2-dimensional tensor, so they can never return an inferred scalar.
Test Plan: Imported from OSS
Differential Revision: D18548056
Pulled By: gchanan
fbshipit-source-id: f99a41d490b9a5ab5717534c92e4f2e848c743e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29923
Note that this changes the behavior of masked_select when both "self" and "mask" are 0-dimensional.
In previous versions of PyTorch, this would return a 0-dimensional tensor. But the documentation reads:
"Returns a new 1-D tensor which indexes the input tensor according to the boolean mask mask which is a BoolTensor."
Test Plan: Imported from OSS
Differential Revision: D18539560
Pulled By: gchanan
fbshipit-source-id: 1637ed2c434fcf8ceead0073aa610581f4a19d21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30320
Fixes #30296
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18665704
Pulled By: ezyang
fbshipit-source-id: f09a953137fcc105959382254f9b8886af5aea3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30390
Fix crashes where C++ is not able to find the Java class through JNI.
ghstack-source-id: 94499644
Test Plan: buck install -r fb4a
Reviewed By: ljk53
Differential Revision: D18667992
fbshipit-source-id: aa1b19c6dae39d46440f4a3e691054f7f8b1d42e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30282
The atomic increment/decrements in LeftRight::read() were measurable in perf benchmarks. Let's improve their perf.
ghstack-source-id: 94443230
Test Plan: unit tests, perf benchmarks
Differential Revision: D18650228
fbshipit-source-id: d184ce8288510ab178e7c7da73562609d1ca3c9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29682
This PR re-introduces backend_fallback_test.cpp, which was previously called boxed_fallback_test.cpp and showed how to use the backend fallback API.
ghstack-source-id: 94481314
Test Plan: unit tests
Differential Revision: D18462654
fbshipit-source-id: 3e9b5c8f35c05f9cd795f44a5fefd1a0aaf03509
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29681
Remove callUnboxedOnly() and instead use metaprogramming to figure out if an operator can use a boxed fallback or not.
This enables boxed fallback for ops in native_functions.yaml even if they don't have `use_c10_dispatcher: full` set, as long as they're in the range of supported types.
ghstack-source-id: 94481320
Test Plan: unit tests
Differential Revision: D18462653
fbshipit-source-id: 2955e3c4949267520a1734a6a2b919ef5e9684a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337
This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316
Test Plan: unit tests
Differential Revision: D18361991
fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29330
This makes for a nicer API, especially in backend fallback kernels who get an OperatorHandle instance and can directly call these methods on it.
ghstack-source-id: 94481322
Test Plan: unit tests stacked on top
Differential Revision: D18357424
fbshipit-source-id: fa8c638335f246c906c8e16186507b4c486afb3f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201
This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313
Test Plan: I will add unit tests in a diff stacked on top
Differential Revision: D18282746
fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30340
We already made OperatorEntry::dispatchTable_ an array to be able to avoid the concurrency primitives there,
but Dispatcher::backendFallbackKernels_ has the same issue. Let's make it a table too.
Since there is some code duplication here, we also factor out the concept of a KernelFunctionTable to be used in both places.
ghstack-source-id: 94481317
Test Plan: unit tests
Differential Revision: D18663426
fbshipit-source-id: ba82ca5c4cae581eea359d5c0c3a5e23b0f8838c
Summary:
In the PR, we enhance the graph-mode quantization for aten::_convolution, which could be generated from tracing path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30245
Differential Revision: D18671597
Pulled By: lly-zero-one
fbshipit-source-id: 78a2470fbb0fe0def55d63c6bda7cbb5c89f7848
Summary:
This PR updates `torch::pickle_save` to use the new zipfile format introduced in #29232 and adds `torch::pickle_load` which can decode the zipfile format. Now that `torch.save/load` use this format as well (if the `_use_new_zipfile_serialization` flag is `True`), raw values saved in Python can be loaded in C++ and vice versa.
Fixes #20356
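A minimal sketch of the Python side (file name and contents are illustrative): values saved this way use the zipfile container, so they can be read back with `torch::pickle_load` in C++ as well.
```
import torch

torch.save({"weights": torch.randn(3)}, "data.pt",
           _use_new_zipfile_serialization=True)
reloaded = torch.load("data.pt")
```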
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30108
Pulled By: driazati
Differential Revision: D18607087
fbshipit-source-id: 067cdd5b1cf9c30ddc7e2e5021a8cceee62d8a14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30367
use the SLS emulations that match the hardware
Test Plan: replayer test
Differential Revision: D18667605
fbshipit-source-id: 89aee630184737b86ecfb09717437e5c7473e42c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30241
We need an API to get all worker infos. This will be used by backend-agnostic `rpc.wait_all_workers()` API.
ghstack-source-id: 94454935
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_get_worker_infos
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_get_worker_infos
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_get_worker_infos
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_get_worker_infos
```
Differential Revision: D5693412
fbshipit-source-id: 5123c8248b6d44fd36b8a5f381dbabb2660e6f0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29164
- Created GlooDeviceFactory to hide device creation details
- Added a transport option to the Python interface
The reason for making the factory class is to make it easier to extend gloo transports in the future.
Test Plan: Imported from OSS
Reviewed By: satgera, d4l3k
Differential Revision: D18596527
fbshipit-source-id: e8114162ee8d841c0e0769315b48356b37d6ca0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29207
The logic calling c10 ops from JIT did some variable wrapping to make sure all results are always variables.
Thanks to ezyang, this is not needed anymore because everything is a variable now.
ghstack-source-id: 93345590
Test Plan: waitforsandcastle
Differential Revision: D18327507
fbshipit-source-id: 86512c5e19d6972d70f125feae172461c25e3cb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30193
Featuring:
- Added a NoNamesGuard::reset() function that sets NamesMode back to
what it was before the guard. This makes it so that we don't have to
create a new context to run code in an unnamed way.
- Added a diagonal(Tensor, *, Dimname outdim, Dimname dim1, Dimname dim2, int64_t offset=0)
overload. All of the non-tensor arguments are keyword-only for
readability; something like `tensor.diagonal("A", "B", "C")`
would be really confusing (see the sketch below).
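A small hedged example of that overload (named tensors were a prototype feature at this point, so exact behavior may vary by version):
```python
import torch

# The diagonal is taken over the named dims "A" and "B"; the resulting
# dimension is named "C" via the keyword-only outdim argument.
x = torch.randn(2, 3, 3, names=("N", "A", "B"))
d = x.diagonal(outdim="C", dim1="A", dim2="B")
print(d.names)  # expected: ('N', 'C')
print(d.shape)  # expected: torch.Size([2, 3])
```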
Test Plan: - Added new tests
Differential Revision: D18638363
Pulled By: zou3519
fbshipit-source-id: ea37b52a19535f84a69be38e95e569e88f307381
Summary:
This PR looks for a `constants.pkl` file at the top level of a zip file
in `torch.load`. If found, it calls `torch.jit.load` instead and issues
a warning telling the user to call `torch.jit.load` directly.
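A hedged sketch of the resulting behavior (module and file names are illustrative):
```python
import warnings
import torch

# A TorchScript archive has constants.pkl at its top level.
scripted = torch.jit.script(torch.nn.Linear(4, 2))
scripted.save("model.pt")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loaded = torch.load("model.pt")  # delegates to torch.jit.load and warns

print(type(loaded))  # a ScriptModule, not a plain pickled object
```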
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29339
Differential Revision: D18611095
Pulled By: driazati
fbshipit-source-id: f070a02f6b5509054fc3876b3e8356bbbcc183e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29943
This was apparently the same as "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest",
so standardize on that name.
Test Plan:
This PR, which is stacked on top of a commit that puts one of the jobs
using that container into the set of PR builds.
Imported from OSS
Differential Revision: D18653554
fbshipit-source-id: 40e6c52db02265d61e8166bb1211376faccfc53a
* Slack: The [PyTorch Slack](https://pytorch.slack.com/) hosts a primary audience of moderate to experienced PyTorch users and developers for general chat, online discussions, collaboration etc. If you are a beginner looking for help, the primary medium is [PyTorch Forums](https://discuss.pytorch.org). If you need a slack invite, please fill this form: https://goo.gl/forms/PP1AGvNHpSaJP8to1
* Newsletter: A no-noise, one-way email newsletter with important announcements about PyTorch. You can sign up here: https://eepurl.com/cbG0rv
* For brand guidelines, please visit our website at [pytorch.org](https://pytorch.org/)
-The current nightly(snapshots) version is the value of `VERSION_NAME` in `gradle.properties` in current folder, at this moment it is `1.4.0-SNAPSHOT`.
+The current nightly(snapshots) version is the value of `VERSION_NAME` in `gradle.properties` in current folder, at this moment it is `1.5.0-SNAPSHOT`.
## Building PyTorch Android from Source
@@ -49,6 +49,7 @@ For this you can use `./scripts/build_pytorch_android.sh` script.
```
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive
sh ./scripts/build_pytorch_android.sh
```
@@ -59,7 +60,7 @@ The workflow contains several steps:
2\. Create symbolic links to the results of those builds:
`android/pytorch_android/src/main/jniLibs/${abi}` to the directory with output libraries
`android/pytorch_android/src/main/cpp/libtorch_include/${abi}` to the directory with headers. These directories are used to build the `libpytorch.so` library that will be loaded on the Android device.
3\. Finally, run `gradle` in the `android/pytorch_android` directory with the task `assembleRelease`.
The script requires that the Android SDK, Android NDK, and gradle are installed.
We also have to add all transitive dependencies of our aars.
As `pytorch_android` [depends](https://github.com/pytorch/pytorch/blob/master/android/pytorch_android/build.gradle#L62-L63) on `'com.android.support:appcompat-v7:28.0.0'` and `'com.facebook.soloader:nativeloader:0.8.0'`, we need to add them.
(When using maven dependencies, they are added automatically from `pom.xml`.)
At the moment, when using aar files directly, additional configuration is needed due to a packaging-specific issue (`libfbjni.so` is packaged in both `pytorch_android_fbjni.aar` and `pytorch_android.aar`).