Compare commits

..

2176 Commits

Author SHA1 Message Date
6d488714a7 .circleci: Specify setup job to run on everything (#35013)
Summary:
By default, CircleCI chooses to run 0 jobs on tags, meaning that when we
tag a build, no job is run unless the dependent jobs contain the
correct filters.

This adds an explicit configuration to run the setup job on every branch
and every tag that CircleCI can run on.

For more information on CircleCI filters and what they do (and more
importantly what they do not do) visit:

https://circleci.com/docs/2.0/configuration-reference/#filters-1

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35013

Differential Revision: D20535560

Pulled By: seemethere

fbshipit-source-id: 7ee5dddbc0a9416fd76ed198e5447318c53e1873
2020-03-19 09:36:27 -07:00
35d9874a35 in test_data_parallel.py, remove skipIfRocm from tests that pass (#34978)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34978

Differential Revision: D20535920

Pulled By: mrshenli

fbshipit-source-id: 3baa8608dd3b0dd5578bc32e56a2e6c1fe69492d
2020-03-19 09:16:43 -07:00
1f4a4aaf64 functional autograd api (#34066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34066

Basic implementation of https://github.com/pytorch/pytorch/issues/30632

Test Plan: Imported from OSS

Differential Revision: D20260307

Pulled By: albanD

fbshipit-source-id: 7db5c2411ddc3e954ff8fbbe93eb3b96a2bcfb2f
2020-03-19 08:24:07 -07:00
96860af870 Revert D20164420: [1.5 Release][Dist Autograd][Better Engineering] Notify Workers on Failure during Distributed Autograd
Test Plan: revert-hammer

Differential Revision:
D20164420

Original commit changeset: 3d4ed7423096

fbshipit-source-id: 67f0f9c11cee84df6dbe37db7821dd601227df66
2020-03-19 08:02:07 -07:00
7c06b86e42 Revert D20518647: [pytorch][PR] [C++ API Parity] [Optimizers] Merged Optimizer and LossClosureOptimizer
Test Plan: revert-hammer

Differential Revision:
D20518647

Original commit changeset: 4760d1d29df1

fbshipit-source-id: b84f1a06c2de27e147716279223a6844ef89f760
2020-03-19 07:53:43 -07:00
5d92a6cc30 Revert D7778113: Reland "[RPC] Use qualified name str directly in RPC torch script code path"
Test Plan: revert-hammer

Differential Revision:
D7778113

Original commit changeset: b830c03ac946

fbshipit-source-id: ef08b287a6db58320c738cde0c99b3333f5724eb
2020-03-19 06:05:23 -07:00
9c4683e8e3 Revert D20312366: [pytorch][PR] Added type promotion logic for complex numbers
Test Plan: revert-hammer

Differential Revision:
D20312366

Original commit changeset: 90f00a1a916d

fbshipit-source-id: 4510739a888b2eec5d8a72e792998ac46da6d82a
2020-03-19 05:55:57 -07:00
0d8447a9b8 Warns when performing integer division with div and addcdiv (#34570)
Summary:
Per title.

In the future we want to make div(), the division operator, and addcdiv perform true division as in Python 3, NumPy, and JAX. To do this without silently breaking users we plan to:

- Warn (once) in 1.5 when a user performs integer division using div or addcdiv
- RuntimeError in 1.6 when a user attempts to perform integer division using div or addcdiv
- Always perform true division in 1.7 using div, /, and addcdiv

Users can use true_divide or floor_divide today to explicitly specify the type of division they want.
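For example, a minimal sketch of the explicit alternatives (values are illustrative):

```
import torch

a = torch.tensor([5, 3])
b = torch.tensor([2, 2])

torch.floor_divide(a, b)  # tensor([2, 1]) -- explicit integer division
torch.true_divide(a, b)   # tensor([2.5000, 1.5000]) -- explicit true division
```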

A test for this behavior is added to test_type_promotion. Unfortunately, because we are only warning once (to avoid a deluge) the test only uses maybeWarnsRegex.

The XLA failure is real but will be solved by https://github.com/pytorch/pytorch/pull/34552. I'll be sure to land that PR first to avoid temporarily breaking the XLA build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34570

Differential Revision: D20529211

Pulled By: mruberry

fbshipit-source-id: 65af5a9641c5825175d029e8413c9e1730c661d0
2020-03-19 04:10:55 -07:00
6f737dd4a3 Fix signed-unsigned warnings (#34791)
Summary:
And a few typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34791

Test Plan: CI

Differential Revision: D20524879

Pulled By: malfet

fbshipit-source-id: 58fa03bd6356979e77cd1bffb6370d41a177c409
2020-03-19 00:29:56 -07:00
c8f665dcb6 Added type promotion logic for complex numbers (#34093)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/33780
After this PR:
1. dtype promotion logic will correctly work for ops involving complex scalars
2. torch.ComplexFloatTensor, torch.ComplexDoubleTensor works
3. added alias for complex64 (cfloat) and complex128 (cdouble)
4. added an internal function get_complex_default_dtype (consciously not exposed in public API)

>>> 1j*torch.ones(2)
tensor([(0.0000 + 1.0000j), (0.0000 + 1.0000j)], dtype=torch.complex64)

>>> torch.set_default_dtype(torch.float64)
>>> 1j*torch.ones(2)
tensor([(0.0000 + 1.0000j), (0.0000 + 1.0000j)], dtype=torch.complex128)

>>> 1j + torch.ones(2)
tensor([(1.0000 + 1.0000j), (1.0000 + 1.0000j)], dtype=torch.complex128)

>>> torch.tensor(1j) + torch.ones(2,2)
tensor([[(1.0000 + 1.0000j), (1.0000 + 1.0000j)],
        [(1.0000 + 1.0000j), (1.0000 + 1.0000j)]], dtype=torch.complex128)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34093

Differential Revision: D20312366

Pulled By: anjali411

fbshipit-source-id: 90f00a1a916d9c8eeda101eb6e9d250fce569815
2020-03-18 23:36:13 -07:00
d616cad676 Reland "[RPC] Use qualified name str directly in RPC torch script code path" (#34962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34962

Relanding #34733. Fix is in https://github.com/pytorch/pytorch/pull/34988.

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_class_rref_in_py_and_use_in_script

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_module_rref_in_py_and_use_in_script
```

```
buck test mode/dev //caffe2/test/distributed/rpc/jit:rpc_fork_thrift -- test_return_local_script_module_rref_in_py_and_use_in_script
```

Differential Revision: D7778113

fbshipit-source-id: b830c03ac9463075fca248eba75be364b0e8b080
2020-03-18 22:25:09 -07:00
be82e554fe Revert D20524479: [pytorch][PR] [C++ API Parity] Add xor_convergence test for lbfgs
Test Plan: revert-hammer

Differential Revision:
D20524479

Original commit changeset: 3413779676ab

fbshipit-source-id: ef8007ed6c184bc8b8751eb713aac2a891260048
2020-03-18 21:56:17 -07:00
153b16ef4c Doxygen for torchbind (#35007)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35007

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D20525680

Pulled By: jamesr66a

fbshipit-source-id: aaa768f395e30dcec8007d50e17f21837c306719
2020-03-18 21:49:24 -07:00
eef17edaa3 Fix warnings in test/test_jit_fuser.py (#34980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34980

We were passing sample inputs to `torch.jit.script` (as if it were
`torch.jit.trace`), but that argument was treated as the optional
`optimize` parameter. That parameter is deprecated, which caused a
warning.
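For reference, a minimal sketch of the misuse and the fix (the function and inputs are illustrative):

```
import torch

def fn(x):
    return x * 2

example_inputs = (torch.ones(2),)

# Incorrect: torch.jit.script takes no example inputs; a second positional
# argument used to be interpreted as the deprecated `optimize` flag and
# triggered the warning.
# scripted = torch.jit.script(fn, example_inputs)

# Correct: only trace takes example inputs.
scripted = torch.jit.script(fn)
traced = torch.jit.trace(fn, example_inputs)
```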

Differential Revision: D20520369

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 87b40a5e35bfc4a3d7a5d95494632bfe117e40b7
2020-03-18 19:55:25 -07:00
55b254e114 update gitignore to include clangd index (#35018)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35018

Test Plan: Imported from OSS

Differential Revision: D20528402

Pulled By: suo

fbshipit-source-id: badb487a4fbb0299b49c1b1022bcd7b61eba1e88
2020-03-18 19:53:03 -07:00
d3b6099366 [build] Update gloo submodule (#34969)
Summary:
Update the gloo submodule to `113bde13035594cafdca247be953610b53026553` to be compatible with the separate compilation introduced by
https://github.com/facebookincubator/gloo/pull/251
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34969

Test Plan: CI

Differential Revision: D20527163

Pulled By: malfet

fbshipit-source-id: 300d83d8fe95d57b8d740543efada3c56ac7b493
2020-03-18 19:24:23 -07:00
5f67c923f1 [1.5 Release][Dist Autograd][Better Engineering] Notify Workers on Failure during Distributed Autograd (#34638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34638

Fixes: https://github.com/pytorch/pytorch/issues/27643

This PR handles notifying workers in the event of a failure during distributed autograd. It gracefully propagates errors across all nodes in the backward pass and sets state in the local autograd engines accordingly.

(Note: this ignores all push blocking failures!)

Test Plan: Added 2 new tests checking errors when they are thrown in an intermediate node during distributed autograd. Ensured that all existing distributed autograd tests pass.

Differential Revision: D20164420

fbshipit-source-id: 3d4ed74230969ac70bb763f1b5b1c16d979f66a2
2020-03-18 18:56:14 -07:00
a73dfcf8cf Adjust ProtoBufPatch to protobuf-3.11.x (#35008)
Summary:
The `GetEmptyStringAlreadyInited` invocation pattern in protobuf-generated header files changed to `::PROTOBUF_NAMESPACE_ID::internal::GetEmptyStringAlreadyInited`, where `PROTOBUF_NAMESPACE_ID` is defined in `protobuf/port_def.inc` as `google::protobuf`.

This likely changed around protobuf 3.8.x, but I've only tested it using protobuf 3.11.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35008

Test Plan: Update `third-party/protobuf` submodule to 3.11.4, compile and run `pattern_net_transform_test`

Differential Revision: D20526949

Pulled By: malfet

fbshipit-source-id: fddaa3622c48ad883612c73c40a20d306d88d66b
2020-03-18 18:35:23 -07:00
e5ee95e448 [RPC] Add to confirmed users immediately if the fork is shared from owner, instead of adding nothing to pending users (#34988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34988

In https://github.com/pytorch/pytorch/pull/31893, we introduced a confirmedUsers_ map in RRefContext.

For the case where the fork is shared from the owner, there is no `pendingUsers_` intermediate phase for the fork, so we should put it into `confirmedUsers_` immediately.

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
```

Differential Revision: D7735909

fbshipit-source-id: 14c36a16486f0cc9618dcfb111fe5223781b647d
2020-03-18 18:17:41 -07:00
b8e043abca [C++ API Parity] [Optimizers] Merged Optimizer and LossClosureOptimizer (#34957)
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualize all functions
5. Made defaults_ optional argument in all optimizers except SGD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20518647

Pulled By: anjali411

fbshipit-source-id: 4760d1d29df1784e2d01e2a476d2a08e9df4ea1c
2020-03-18 17:28:57 -07:00
2a1c83823d [tools] Parallelize tools/clang_format_new.py (#34750)
Summary:
**Summary**
This commit parallelizes the invocation of `clang-format` on all files
in `tools/clang_format_new.py` using `asyncio`.

**Testing**
Ran and timed the script.

*Before*
```
$ time ./tools/clang_format_new.py  --diff
...
real	0m7.615s
user	0m6.012s
sys	0m1.634s
```

*After*
```
$ time ./tools/clang_format_new.py  --diff
...
Some files not formatted correctly

real	0m2.156s
user	0m8.488s
sys	0m3.201s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34750

Differential Revision: D20523133

Pulled By: SplitInfinity

fbshipit-source-id: 509741a0b4fcfcdcd7c5a45654e3453b4874d256
2020-03-18 17:27:02 -07:00
6e47e7bf52 [pytorch][mobile] fixed AutoGradMode/AutoNonVariableTypeMode uses for mobile callsites
Summary:
There are three guards related to mobile build:
* AutoGradMode
* AutoNonVariableTypeMode
* GraphOptimizerEnabledGuard

Today we need to set some of these guards before calling libtorch APIs because we customized the mobile build to only support inference (for both OSS and most FB use cases) to optimize binary size.

Several changes were made since the 1.3 release, so there are already inconsistent uses of these guards in the codebase. I did a sweep of all mobile-related model loading & forward() call sites, trying to unify the use of these guards:

Full JIT: still set all three guards. More specifically:
* OSS: Fixed a bug of not setting the guard at model load time correctly in Android JNI.
* FB: Not covered by this diff (as we are using mobile interpreter for most internal builds).

Lite JIT (mobile interpreter): only needs AutoNonVariableTypeMode guard. AutoGradMode doesn't seem to be relevant (so removed from a few places) and GraphOptimizerEnabledGuard definitely not relevant (only full JIT has graph optimizer). More specifically:
* OSS: At this point we are not committed to support Lite-JIT. For Android it shares the same code with FB JNI callsites.
* FB:
  * JNI callsites: use the unified LiteJITCallGuard.
  * iOS/C++: manually set AutoNonVariableTypeMode for _load_for_mobile() & forward() callsites.

Ideally we should avoid having to set AutoNonVariableTypeMode for mobile interpreter. It's currently needed for dynamic dispatch + inference-only mobile build (where variable kernels are not registered) - without the guard it will try to run `variable_fallback_kernel` and crash (PR #34038). The proper fix will take some time so using this workaround to unblock selective BUCK build which depends on dynamic dispatch.

PS. The current status (of having to set AutoNonVariableTypeMode) should not block running FL model + mobile interpreter - if all necessary variable kernels are registered then it can call _load_for_mobile()/forward() against the FL model without setting the AutoNonVariableTypeMode guard. It's still inconvenient for JAVA callsites as it's set unconditionally inside JNI methods.

Test Plan: - CI

Reviewed By: xta0

Differential Revision: D20498017

fbshipit-source-id: ba6740f66839a61790873df46e8e66e4e141c728
2020-03-18 17:19:35 -07:00
a4048b4703 port ge changes from bert/pytorch_fusion (#34942)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34942

Differential Revision: D20505894

Pulled By: Krovatkin

fbshipit-source-id: 7b442fae6aa2b1a29891b94f824094a1fddae4a2
2020-03-18 17:13:24 -07:00
4521477f83 [C++ API Parity] Add xor_convergence test for lbfgs (#35001)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35001

Differential Revision: D20524479

Pulled By: anjali411

fbshipit-source-id: 3413779676ab95c1ee82298f95d3441a89873107
2020-03-18 17:06:53 -07:00
bcbde490e4 Fix flake (#34974)
Summary:
fix flake, add overload names
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34974

Differential Revision: D20519191

Pulled By: eellison

fbshipit-source-id: d08d36b64397287cad484690074e694d8a0e472e
2020-03-18 16:45:33 -07:00
b2e5e0cad6 [quant][graphmode] quantization support for aten::reshape (#34803)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34803

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20504457

fbshipit-source-id: 5ca691ef4880c72d30d62390e63e3288b2f06dce
2020-03-18 15:40:43 -07:00
69e701fbf9 Add transfer_learning_blob_name_mappings into layer_model_helper to support layer model transfer learning
Summary: Add transfer_learning_blob_name_mappings into layer_model_helper to support layer model transfer learning

Reviewed By: mraway

Differential Revision: D20286298

fbshipit-source-id: de3e029611d843f38d3f42ecd4148358f7e14a2b
2020-03-18 15:28:00 -07:00
e35dd4f603 [jit] Include call stack in OSError message (#34669)
Summary:
Previously there was no indication of why you would get an `OSError` for something (such as the generated methods of a `dataclass`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34669

Pulled By: driazati

Differential Revision: D20426570

fbshipit-source-id: 45d63631984fa26a87c03de5523fb10d8abbc6db
2020-03-18 15:10:23 -07:00
3b7e1cd2cc Makes floor_divide a method, adds sparse floor division (#34552)
Summary:
(Updated per review feedback)

`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:

- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors

Tests are added to test_sparse.py and test_torch.py for these new behaviors.

In addition, this PR:

- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU

Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this break is intentional. The BC issue is that the first parameter name of torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).

The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
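For reference, a minimal sketch of the call forms listed above (values are illustrative):

```
import torch

x = torch.tensor([7, 9])
y = torch.tensor([2, 3])

torch.floor_divide(x, y)         # function form: tensor([3, 3])
x.floor_divide(y)                # method form
z = torch.empty_like(x)
torch.floor_divide(x, y, out=z)  # out variant
x.floor_divide_(y)               # in-place variant
```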

There are two potential follow-up issues suggested by this PR:

- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552

Differential Revision: D20509850

Pulled By: mruberry

fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8
2020-03-18 15:00:53 -07:00
d77d907f0e [quant][graphmode] Add quantization support for aten::dropout (#34347)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34347

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20504453

fbshipit-source-id: 1bab29e21d0564ed88cdeb4894addfe00ebbd390
2020-03-18 14:35:27 -07:00
c747f09846 Add operator [] to c10::impl::ListIterator (#34926)
Summary:
This is causing failures on my Windows build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34926

Differential Revision: D20501850

Pulled By: smessmer

fbshipit-source-id: 92c72dd657b27b1786952dbdccfceff99f4ba743
2020-03-18 12:57:38 -07:00
064f6285af Torchvision in jenkins testing (#34909)
Summary:
This pull request updates the Torchvision commit to use ROCm enabled torchvision in `.jenkins/pytorch/test.sh`.
Pytorch tests:
```
test_SyncBatchNorm_process_group (__main__.TestDistBackend)
test_alexnet (jit.test_models.TestModels)
test_script_module_script_resnet (jit.test_models.TestModels)
test_script_module_trace_resnet18 (jit.test_models.TestModels)
test_torchvision_smoke (__main__.TestTensorBoardPytorchGraph)
```
in `test2` were skipped because torchvision was not installed in `test2`; instead it was installed in `test1`. This PR moves the torchvision installation to the correct place, thereby enabling the above-mentioned tests.

cc: ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34909

Differential Revision: D20515333

Pulled By: ezyang

fbshipit-source-id: 69439756a687ba441c1f8107233b4dbc1e108387
2020-03-18 12:45:51 -07:00
1afc584188 Deprecates current torch.full integral type inference, adds torch.full complex type inference (#34709)
Summary:
Per title.

Currently torch.full will always (attempt to) produce a float tensor. This is inconsistent with NumPy in (at least) two cases:

- When integral fill values (including bool) are given
- When complex fill values are given

For example:

```
np.full((1, 2), 1).dtype
: dtype('int64')

np.full((1, 2), (1 + 1j)).dtype
: dtype('complex128')
```

Whereas in PyTorch

```
torch.full((1, 2), 1).dtype
: torch.float32

torch.full((1, 2), (1 + 1j)).dtype
: RuntimeError: value cannot be converted to type float without overflow: (1,1)
```

This PR begins the process of deprecating our current behavior of returning float tensors (by default) when given integer fill values by warning the user that integer fill values will require explicitly specifying the dtype or out kwargs in 1.6, and in 1.7 the behavior will change to return a LongTensor by default (BoolTensor for bool values). The intermediate 1.6 release is to prevent changing the behavior silently and unexpectedly.
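For example, explicitly passing a dtype avoids the deprecation warning and pins down the intended result type (a minimal sketch):

```
import torch

torch.full((1, 2), 1, dtype=torch.long)     # tensor([[1, 1]])
torch.full((1, 2), True, dtype=torch.bool)  # tensor([[True, True]])
```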

The PR also implements inference for complex types. So that with it:

```
torch.full((1, 2), (1 + 1j)).dtype
: torch.complex64
```

The complex type inference returns a ComplexFloat tensor when given a complex fill value (and no dtype or out kwarg is specified), unless the default dtype is Double, in which case a ComplexDouble tensor is returned.

A test for these behaviors is added to test_torch.py.

Implementation note:

This PR required customizing full's dispatch because currently in eager codegen the TensorOptions object passed to functions improperly sets has_dtype() to true, even if the user did not explicitly provide a dtype. torch.arange already worked around this issue with its own custom implementation. The JIT, however, does pass a properly constructed TensorOptions object.

Future Work:

This PR does not extend torch.full's complex type inference to ONNX. This seems unlikely to come up and will be a clear error if it does. When integer type inference is added to torch.full, however, then porting the behavior to ONNX may be warranted. torch.arange ported its complex type promotion logic to ONNX, for example.

Additionally, this PR mostly leaves existing call sites in PyTorch that would trigger this warning intact. This is to be more minimal (since the PR is BC breaking). I will submit a separate PR fixing PyTorch's call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34709

Differential Revision: D20509387

Pulled By: mruberry

fbshipit-source-id: 129593ba06a1662032bbbf8056975eaa59baf933
2020-03-18 12:19:31 -07:00
f3b8a470e1 Added functionality for all to take Lists as input (#34582)
Summary:
New pull request after rebase error in pull request https://github.com/pytorch/pytorch/issues/33923
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34582

Differential Revision: D20447689

Pulled By: eellison

fbshipit-source-id: 4296b64185eccb136b1b614b532deb3af20c7544
2020-03-18 12:01:30 -07:00
d0577e19f0 Revert D20346700: [pytorch][PR] Eager autocasting, out-of-place ops only
Test Plan: revert-hammer

Differential Revision:
D20346700

Original commit changeset: 12d77b391731

fbshipit-source-id: 108d72bf24232f443c0be293ec932c0c478d6a60
2020-03-18 11:42:51 -07:00
b35e544772 Minor fixes for RPC API doc (#34955)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34955

Test Plan: Imported from OSS

Differential Revision: D20512262

Pulled By: mrshenli

fbshipit-source-id: 86ed099638fd32dc8fbde5a6f284239b146fd5e9
2020-03-18 11:20:32 -07:00
d29f450e63 Revert D20442573: [RPC] Use qualified name str directly in RPC torch script code path
Test Plan: revert-hammer

Differential Revision:
D20442573

Original commit changeset: 87f8b7d94adc

fbshipit-source-id: db0f10c28352d2b3ca21b5357e8e09c01a50018c
2020-03-18 11:00:09 -07:00
689598df0b [quant][graphmode] insert quant/dequant work for duplicated debugName (#34315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34315

Previously we registered quantization parameter attributes using the debugName of
the observed value, but debugName is not unique. This PR addresses the problem
by making the attribute names unique.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20504455

fbshipit-source-id: 6dd83bdfc4e4dc77ad3af3d5b48750fb01b2fce1
2020-03-18 10:49:25 -07:00
aaa8f02156 Eager autocasting, out-of-place ops only (#32140)
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081

In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests.  Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140

Differential Revision: D20346700

Pulled By: ezyang

fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
2020-03-18 10:28:21 -07:00
fa5bc9fa2e Fix problem in NHWC max_pool2d; use accumulate type in NHWC max_pool2d (#34934)
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/34736. Both code snippets in that issue can now execute normally. More tests are also added.

This PR is a follow-up on https://github.com/pytorch/pytorch/issues/34519, where one variable was mistakenly missed when updating the max_pool2d kernel.

This PR also uses accumulate type of scalar_t in the backward kernel, which resolves the numerical precision issue when stride < kernel_size on fp16.

cc csarofeen ptrblck jjsjann123 VitalyFedyunin ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34934

Differential Revision: D20512062

Pulled By: VitalyFedyunin

fbshipit-source-id: a461ebbb3e3684aa183ae40e38d8f55bb6f4fee1
2020-03-18 08:32:10 -07:00
d927d58c2a Revert D20289209: Support RowWiseSparseAdam on GPU
Test Plan: revert-hammer

Differential Revision:
D20289209

Original commit changeset: a7a8a21bd18c

fbshipit-source-id: 4a8ae684d099a5499c28b7e65578fc7ab10b248d
2020-03-18 07:35:07 -07:00
a1eaaea288 Revert D20497453: [pytorch][PR] Makes floor_divide a method, adds sparse floor division
Test Plan: revert-hammer

Differential Revision:
D20497453

Original commit changeset: ac326f2007d8

fbshipit-source-id: b94b89b1a25521506e3d0a6b072d3d4d8c55e63d
2020-03-18 01:48:50 -07:00
a3de359464 Do not throw from CUDAContext destructor (#34756)
Summary:
Throwing from a destructor leads to undefined behaviour (most often a segfault),
so it's better to leak memory than to segfault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34756

Test Plan: Run `test_pytorch_onnx_caffe2`

Differential Revision: D20504228

Pulled By: malfet

fbshipit-source-id: 7a05776fea9036f602e95b8182f8493cb5886dab
2020-03-18 00:13:18 -07:00
b7129050e7 Makes floor_divide a method, adds sparse floor division (#34552)
Summary:
(Updated per review feedback)

`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:

- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors

Tests are added to test_sparse.py and test_torch.py for these new behaviors.

In addition, this PR:

- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU

Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this break is intentional. The BC issue is that the first parameter name of torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).

The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.

There are two potential follow-up issues suggested by this PR:

- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552

Differential Revision: D20497453

Pulled By: mruberry

fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d
2020-03-18 00:01:45 -07:00
bcbdba450c [caffe2] open source 2/4-bit SLS operators (#34903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34903

Reattempt of D20461609

Moving 2/4-bit SLS and row-wise 2/4-bit conversion operator to open source to be used by DLRM

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D20495304

fbshipit-source-id: 66a99677583f50fd40e29c514710c7b1a8cdbc29
2020-03-17 22:55:10 -07:00
d7e4a379a0 [C++ API Parity] LBFGS optimizer step() update and added closure to the Optimizer step() function (#34564)
Summary:
Follow-ups after this PR:

* Remove `LossClosureOptimizer`, and merge `Optimizer` into `OptimizerBase` (and rename the merged class to Optimizer)
* Merge the LBFGS-specific serialize test function and the generic `test_serialize_optimizer` function, possibly by passing a bool `has_only_global_state` flag into the `test_serialize_optimizer` function to denote whether `size()` should be equal to 1 or 2?
    * https://github.com/pytorch/pytorch/pull/34564#discussion_r393780303
* It seems that we don't have the equivalent `XORConvergence_LBFGS` test like the other optimizers, and it would be good to add one
* Remove mentions of `parameters_` in optimizer.cpp, de-virtualize all functions, and remove the `OptimizerBase(std::vector<Tensor> parameters)` constructor from `OptimizerBase`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34564

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20495701

Pulled By: anjali411

fbshipit-source-id: 6d35286d2decb6f7dff93d9d3e57515770666622
2020-03-17 22:27:24 -07:00
df20f5b374 Updating submodules
Summary:
GitHub commits:

70331595ce
51ae830b00

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 045a70a24059fc1120d54d5b85ffe0e2831d2161
2020-03-17 21:34:16 -07:00
130e720784 [torchbind] Add more comprehensive docstrings (#34906)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34906

Test Plan: Imported from OSS

Differential Revision: D20496221

Pulled By: jamesr66a

fbshipit-source-id: 3863ec77324564f6f0f1c54b0cbd6c29d12f3c74
2020-03-17 20:41:18 -07:00
09a7788a2f [torchbind] Improve IValue custom class API and remove most Capsule stuff (#34848)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34848

Test Plan: Imported from OSS

Differential Revision: D20480514

Pulled By: jamesr66a

fbshipit-source-id: 1c595faf34e00aab0a6202a8902426bd310551c3
2020-03-17 20:39:34 -07:00
c4fdba326d Support using self as the destination in rpc.remote for builtin operators (#34931)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34931

Test Plan: Imported from OSS

Differential Revision: D20503571

Pulled By: mrshenli

fbshipit-source-id: ed1454a349798b18b9953bbf13c86bc43d3b559d
2020-03-17 20:30:19 -07:00
b5edf329f8 [JIT] Make RPC RRef Owner WorkerInfo.name available to TorchScript (#34896)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34896

Make TorchScript support calling ref.owner() to get owner worker id and calling ref.owner_name() to get owner worker name.

Differential Revision: D7652208

fbshipit-source-id: a60125bb316ac2cf19a993cbd2affc933c0af7c9
2020-03-17 20:28:18 -07:00
95f1cb34b9 Revert D20480546: adds quantized implementation of hard sigmoid
Test Plan: revert-hammer

Differential Revision:
D20480546

Original commit changeset: 9febcb44afd9

fbshipit-source-id: 4461b455e63448cf45237e23c988b492c3e0f1b0
2020-03-17 19:58:08 -07:00
ff3d205ee5 [rpc] handle exceptions in ProcessGroupAgent::enqueueRecv (#34413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34413

In this diff we have made various improvements to ProcessGroupAgent in order to accommodate edge and error cases, such as a "non-clean" shutdown (shutdowns in which we abort RPC as quickly as possible, and don't wait for all pending work across all RPC agents to be completed):

1. Catch and log exceptions in `enqueueRecv`. This prevents us from calling `std::terminate()` in a different thread and logs an error message indicating the issue. With this we no longer have crashes caused by exceptions in this thread during non-graceful shutdown.

2. Provide cleaner error messages everywhere (and use `c10::str` where possible). One example is in `agent::send()`.

3. Add the ability to abort pending sends that cause blocking waits in `handleSend`. We need to abort these because, during a non-graceful shutdown, we could become blocked waiting for them: there is no guarantee the remote end is still active, which would result in a long wait and an eventual timeout. We abort these by adding them to a map and going through this map during `shutdown()`.

4. Fix flaky tests: `test_handle_send_exceptions`, `test_backward_node_failure`, and `test_backward_node_failure_python_udf`. These tests were flaky since they dealt with non-graceful shutdown of workers, which is subject to the edge cases explained above.

We have also refactored `createExceptionResponse`, `enqueueRecv`, and some test functions for the above reasons in this diff.

ghstack-source-id: 100311598

Test Plan: Ensured that the tests are no longer flaky over 500 test runs. Previously, these tests were flaky and disabled. Also added a unit test in the internal `ProcessGroupAgentTest.cpp`.

Reviewed By: mrshenli

Differential Revision: D20269074

fbshipit-source-id: de9cad7f7185f9864ffbb6b14cd8ca9f6ff8f465
2020-03-17 19:01:41 -07:00
1c8e086537 [quant][graphmode][refactor] Change QParamMap to QParamVector (#34314)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34314

Test Plan:
.

Imported from OSS

Differential Revision: D20493032

fbshipit-source-id: fd945b861ae08e1d97f154aa2b1fb3099761882b
2020-03-17 18:35:15 -07:00
4bd3e9b41b fix barrier in jit test (#34901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34901

init_pg is needed for the dist.barrier call; otherwise the default process group may not be found for some RPC backends
ghstack-source-id: 100319642

Test Plan: unit  test

Differential Revision: D20495321

fbshipit-source-id: a44241bd2ff6e1404eee9b241270a94e9fd114d0
2020-03-17 18:19:08 -07:00
74a28ff1dd Make checkInputs more robust (#34838)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34838

Differential Revision: D20500828

Pulled By: Krovatkin

fbshipit-source-id: 7eff720dff2698423f3e65b3809ff6f598f936d7
2020-03-17 17:51:12 -07:00
e43c2d59dd Reduce memory overhead of categorical.sample (#34900)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34714 (using the discussed solution). Thanks to jjabo for flagging and suggesting this.

Instead of expanding `probs` to prepend `sample_shape`, it is better to use the `num_samples` argument of `torch.multinomial`, which is faster and consumes less memory.
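A minimal sketch of the idea (shapes are illustrative, not the exact implementation):

```
import torch

probs = torch.rand(1000, 10)                # a batch of categorical distributions
probs = probs / probs.sum(-1, keepdim=True)

# Old approach: expand probs to prepend sample_shape and draw one sample per
# expanded row (memory-heavy). New approach: draw all requested samples per
# distribution in a single call.
samples = torch.multinomial(probs, 100, replacement=True)  # shape (1000, 100)
```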

Existing tests should cover this. I have profiled this on different inputs and the change results in faster `.sample` (e.g. 100X faster on the example in the issue), or at worst is similar to what we have now with the default `sample_shape` argument.

cc. fritzo, alicanb, ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34900

Differential Revision: D20499065

Pulled By: ngimel

fbshipit-source-id: e5be225e3e219bd268f5f635aaa9bf7eca39f09c
2020-03-17 17:49:41 -07:00
85c51a8c10 Fix dist autograd context Example block format (#34921)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34921

Test Plan: Imported from OSS

Differential Revision: D20500012

Pulled By: mrshenli

fbshipit-source-id: 6c81123ad347726032c29630d7bf58feb6d8c5fd
2020-03-17 17:44:14 -07:00
f05abd1259 Fix example block format in Distributed Optimizer API doc (#34919)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34919

Test Plan: Imported from OSS

Differential Revision: D20500013

Pulled By: mrshenli

fbshipit-source-id: d28cbdd1ec207e1e8501ce389b7040fb764f12ca
2020-03-17 17:44:09 -07:00
e87db8a77b Fix example format in Distributed Autograd doc (#34914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34914

Test Plan: Imported from OSS

Differential Revision: D20500015

Pulled By: mrshenli

fbshipit-source-id: 55715fd1ffce143952d3f6ffcf60ee83ade0efb4
2020-03-17 17:44:01 -07:00
552f9d3a68 Minor fixes for RPC API docs (#34890)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34890

Test Plan: Imported from OSS

Differential Revision: D20491788

Pulled By: mrshenli

fbshipit-source-id: 95a9821d70e0afe51f586b891845b3106c7105ce
2020-03-17 17:43:55 -07:00
3c48aadd98 Update descriptions for transmitting CUDA tensors (#34888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34888

Test Plan: Imported from OSS

Differential Revision: D20491408

Pulled By: mrshenli

fbshipit-source-id: 4ca35ac9edd4c1af4f2bae2cfb0f1f6060658d5c
2020-03-17 17:43:48 -07:00
800bdcf000 Removing experimental tag for RPC and adding experimental tag for RPC+TorchScript (#34887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34887

Test Plan: Imported from OSS

Differential Revision: D20491409

Pulled By: mrshenli

fbshipit-source-id: ce79c9706eb70a3a52a4032de4f0bd538b694332
2020-03-17 17:43:42 -07:00
6446ccce76 Adding warnings for async Tensor serialization in remote and rpc_async (#34885)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34885

Test Plan: Imported from OSS

Differential Revision: D20491279

Pulled By: mrshenli

fbshipit-source-id: 8c861e7c7e9ea39f9427f80bc4e75c72c0087366
2020-03-17 17:43:35 -07:00
0d857d55b9 Add a warning for RRef serialization (#34884)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34884

Test Plan: Imported from OSS

Differential Revision: D20491278

Pulled By: mrshenli

fbshipit-source-id: fd00701fd0090639ffe392f40610426c78bc9269
2020-03-17 17:40:55 -07:00
f87cd83d11 Append multiple arguments to list of flags as multiple items (#34899)
Summary:
This makes PyTorch compilable (but not linkable) with the `CUDA_SEPARABLE_COMPILATION` option enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34899

Test Plan: CI

Differential Revision: D20501050

Pulled By: malfet

fbshipit-source-id: 02903890a827fcc430a26f397d4d05999cf3a441
2020-03-17 16:48:32 -07:00
841f7600bb [quant][graphmode] Quantization pattern for aten::linear (#33854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33854

att

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20493031

fbshipit-source-id: bafd0a3ba5d07327d451b3915f043db33b012b53
2020-03-17 16:36:30 -07:00
71f02a481b [RPC] Avoid polluting Python root logger on importing "torch" module (#34871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34871

We used to configure the root logger in the RPC module: a stream handler was added to `root.handlers`. This is not desired behavior for PyTorch users. We should instead keep the root logger's handler list untouched.

We can configure the logger local to the rpc module and set its log level, so it doesn't fall back to its ancestor, which is usually the root logger and in most cases has no stream handlers.
https://docs.python.org/3/library/logging.html#logging.Logger.setLevel

And add a stream handler to make it output to stdout, even if the root logger is not configured and has an empty handler list.
https://docs.python.org/3/library/logging.html#logging.Logger.addHandler
https://docs.python.org/3/library/logging.handlers.html#logging.StreamHandler
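A minimal sketch of the pattern described above, using only the standard library:

```
import logging
import sys

# Configure a logger local to this module instead of the root logger,
# so importing the module leaves root.handlers untouched.
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # don't inherit the effective level from the ancestor
logger.addHandler(logging.StreamHandler(sys.stdout))  # output to stdout
```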
ghstack-source-id: 100322141

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_wait_all_workers
```

Differential Revision: D7677493

fbshipit-source-id: 88a66079e7348c79a7933e3527701917cbebb7ba
2020-03-17 16:07:06 -07:00
58c5b6d306 adds quantized implementation of hard sigmoid (#34607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34607

Adds quantized version of hardsigmoid activation.

Note: not implementing the `_` and `.out` versions is
currently intentional, because the implementation changes the scale and
zero point, and it's nice to not allow the user to specify them.
Let me know if we should handle this differently.

Test Plan:
tests
benchmarks

Imported from OSS

Differential Revision: D20480546

fbshipit-source-id: 9febcb44afd920125ed2ca4900492f0b712078ea
2020-03-17 16:01:39 -07:00
97757dca79 Format register_distributed_ops.cpp (#34922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34922

format

Test Plan:

Differential Revision: D7717743

fbshipit-source-id: 207bd46a6b0579adbd35f6417af239ec717c7a41
2020-03-17 15:42:18 -07:00
0216c76e12 SFINAE Template Constructors of IValue (#34647) (#34843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34843

Currently, we use not_ok_to_boxing to filter out Dimname, which cannot be
converted/constructed to an IValue. The correct way is to SFINAE the
constructors of IValue.

(Note: this ignores all push blocking failures!)

Test Plan:
PyTorch compiled after the code change.

All unit test passed

Imported from OSS

Differential Revision: D20494886

fbshipit-source-id: 91dfba6a41a3ae2d6ceba9d4124cbf612ea3f080
2020-03-17 15:40:48 -07:00
959a7138fd Support RowWiseSparseAdam on GPU (#34341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34341

Implement RowWiseSparseAdam on CUDA

Reviewed By: xianjiec

Differential Revision: D20289209

fbshipit-source-id: a7a8a21bd18c1b9891f04f202d3ecaf183e30cad
2020-03-17 15:08:24 -07:00
72e3d66f50 [ROCm] Fix for std::isnan regression in ROCm (#34664)
Summary:
Filing this PR since we are in the process of migrating ROCm CI to ROCm version 3.1. This patch is to ensure the correct functionality of float <-> bfloat16 conversion in rocm3.1. `std::isnan` regresses with rocm3.1.

iotamudelta ezyang

cc: ashishfarmer (original author of this patch)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34664

Differential Revision: D20440972

Pulled By: ezyang

fbshipit-source-id: 1ccb911c88f05566d94e01878df6c70cf7f31242
2020-03-17 15:03:17 -07:00
b227ea955e .circleci: Remove should_run_job, no longer needed (#34326)
Summary:
Done at the recommendation of ezyang

TODO:

- [x] Sync `XImportant`
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34326

Differential Revision: D20496786

Pulled By: seemethere

fbshipit-source-id: 8c84e097d81db28d7dcda8720973bce77f6eb4f7
2020-03-17 15:01:59 -07:00
5857a125df Turn on exact_dtype by default on test_optim.py (#34825)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34825

Test Plan: Imported from OSS

Differential Revision: D20498111

Pulled By: great-way

fbshipit-source-id: e689ca40c496b6b4cccb0df30bdae89b2c024f31
2020-03-17 14:41:13 -07:00
a4224886f3 Eliminate guards through max_pool ops. (#34512)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34512

Differential Revision: D20478962

Pulled By: resistor

fbshipit-source-id: 86fc926305f95cae8b334ed344d8e0cdd1ef7b2b
2020-03-17 14:00:00 -07:00
6b701de130 Add types argument to __torch_function__ (#34303)
Summary:
This PR adds the `types` argument to `__torch_function__` as per RFC 0001: https://github.com/pytorch/rfcs/pull/3
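For reference, a minimal sketch of an override using the new argument (the wrapper class and its handling are illustrative; the signature follows RFC 0001):

```
import torch

class WrapperTensor(object):
    def __init__(self, t):
        self.t = t

    def __torch_function__(self, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # `types` holds the unique types of all arguments that implement
        # __torch_function__, so an override can refuse types it does not
        # know how to handle.
        if not all(issubclass(t, (torch.Tensor, WrapperTensor)) for t in types):
            return NotImplemented
        unwrapped = tuple(a.t if isinstance(a, WrapperTensor) else a for a in args)
        return func(*unwrapped, **kwargs)
```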
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34303

Differential Revision: D20474992

Pulled By: ezyang

fbshipit-source-id: cdd40b3b38f3bda4ece8812a629f5db87e919d01
2020-03-17 13:32:00 -07:00
275f5c8049 setup.py: Add numpy as required for install_requires (#34510)
Summary:
numpy was originally not a requirement, but we should add it back here since
it's required on import and we require it anyway for our conda
packages.

Tested with:

```
❯ pkginfo -f requires_dist *.whl
requires_dist: ['numpy']
```

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34510

Differential Revision: D20352125

Pulled By: seemethere

fbshipit-source-id: 383e396fe500ed7043d83c3df57d1772d0fff1e6
2020-03-17 13:31:55 -07:00
940e678da9 Add back cudaHostRegister to cudart API. (#34665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34665

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20493861

Pulled By: ezyang

fbshipit-source-id: 4215e3037a16be460f20cfc2859be5ee074128d3
2020-03-17 13:30:39 -07:00
7a3cf67fd8 Implement channels last upsample2d/3d forward pass kernel. (#34597)
Summary:
This PR implements channels-last nearest upsampling for 2D/3D.
This is supposed to be faster, and it avoids converting formats going into
and out of the operator.
Will post benchmarking numbers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34597

Test Plan: python test/test_nn.py TestNN.test_upsamplingNearest3d_channels_last

Differential Revision: D20390583

Pulled By: kimishpatel

fbshipit-source-id: e0162fb97604a261887f38fc957d3f787c80954e
2020-03-17 13:04:42 -07:00
3ad7dfa2cf move emulation libraries to contrib (#34861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34861

start with unary ops

Test Plan:
buck test //glow/fb/test/numerics/...

```
[hyz@devgpu019.snc1 ~/fbsource/fbcode/caffe2/caffe2/contrib] buck test //glow/fb/test/numerics/...
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 2.0 sec
Building: finished in 9.8 sec (100%) 14826/14826 jobs, 23 updated
  Total time: 11.9 sec
Trace available for this run at /tmp/testpilot.20200316-143829.59858.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 7228e74a7f7e8e4934ab79a135930e665ca0e589 fbpkg e6db8251dbeb46b68a52a862744deff4 at Sun Mar  8 21:16:39 2020 by twsvcscm from /data/fbprojects/packages/testinfra.testpilot/795/t.par
/proc/self/fd/4/__monkeytype_main_wrapper__.py:934: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
Discovering tests
Running 34 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874425505432
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_slsw_all_one_tenth_mel_25 (glow.fb.test.numerics.test_operator_onnxifi.SLSTest) 0.000 1/34 (passed)
      ✓ glow/fb/test/numerics:test_batchnorm_nnpi_fp16nnpi - test_bn (glow.fb.test.numerics.test_batchnorm_nnpi_fp16.BatchnormTest) 1.974 2/34 (passed)
      ✓ glow/fb/test/numerics:test_fc_nnpi_fp16nnpi - test_clip (glow.fb.test.numerics.test_fc_nnpi_fp16.FCTest) 1.371 3/34 (passed)
      ✓ glow/fb/test/numerics:test_batchmatmul_nnpi_fp16nnpi - test_batch_matmul (glow.fb.test.numerics.test_batchmatmul_nnpi_fp16.TestBatchMatMul) 2.993 4/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_clip_graph (glow.fb.test.numerics.test_operator_onnxifi.CommonOpsTest) 0.536 5/34 (passed)
      ✓ glow/fb/test/numerics:test_numerics_nnpinnpi - test_accumulator_limits (glow.fb.test.numerics.test_numerics_nnpi.AccTest) 0.472 6/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_mat_mul_graph (glow.fb.test.numerics.test_operator_onnxifi.MatMulTest) 0.495 7/34 (passed)
      ✓ glow/fb/test/numerics:test_op_nnpi_fp16nnpi - test_tanh (glow.fb.test.numerics.test_op_nnpi_fp16.UnaryOpTest) 0.573 8/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_fc_graph (glow.fb.test.numerics.test_operator_onnxifi.FCTest) 0.793 9/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_concat_graph_sampe_shape (glow.fb.test.numerics.test_operator_onnxifi.ConcatTest) 0.441 10/34 (passed)
      ✓ glow/fb/test/numerics:test_sls_nnpi_fp16nnpi - test_small_sls (glow.fb.test.numerics.test_sls_nnpi_fp16.SparseLengthsSumTest) 0.463 11/34 (passed)
      ✓ glow/fb/test/numerics:test_op_nnpi_fp16nnpi - test_add_graph (glow.fb.test.numerics.test_op_nnpi_fp16.ArithmeticOpsTest) 0.772 12/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_fp16fc_graph (glow.fb.test.numerics.test_operator_onnxifi.Fp16FCTest) 0.481 13/34 (passed)
      ✓ glow/fb/test/numerics:test_fc_nnpi_fp16nnpi - test_fc_exercise (glow.fb.test.numerics.test_fc_nnpi_fp16.FCTest) 0.495 14/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_tanh_graph (glow.fb.test.numerics.test_operator_onnxifi.TanhTest) 0.538 15/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_add_graph (glow.fb.test.numerics.test_operator_onnxifi.ArithmeticOpsTest) 0.517 16/34 (passed)
      ✓ glow/fb/test/numerics:test_fc_nnpi_fp16nnpi - test_fc_numeric_cases (glow.fb.test.numerics.test_fc_nnpi_fp16.FCTest) 0.555 17/34 (passed)
      ✓ glow/fb/test/numerics:test_op_nnpi_fp16nnpi - test_sub_graph (glow.fb.test.numerics.test_op_nnpi_fp16.ArithmeticOpsTest) 0.692 18/34 (passed)
      ✓ glow/fb/test/numerics:test_op_nnpi_fp16nnpi - test_sigmoid (glow.fb.test.numerics.test_op_nnpi_fp16.UnaryOpTest) 1.038 19/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_sigmoid_graph (glow.fb.test.numerics.test_operator_onnxifi.SigmoidTest) 0.530 20/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_div_graph (glow.fb.test.numerics.test_operator_onnxifi.ArithmeticOpsTest) 0.590 21/34 (passed)
      ✓ glow/fb/test/numerics:test_sls_4bit_nnpi_fp16nnpi - test_slws_fused_4bit_rowwise_all_same (glow.fb.test.numerics.test_sls_4bit_nnpi_fp16.SparseLengthsSumTest) 0.607 22/34 (passed)
      ✓ glow/fb/test/numerics:test_op_nnpi_fp16nnpi - test_div_graph (glow.fb.test.numerics.test_op_nnpi_fp16.ArithmeticOpsTest) 0.583 23/34 (passed)
      ✓ glow/fb/test/numerics:test_op_nnpi_fp16nnpi - test_mul_graph (glow.fb.test.numerics.test_op_nnpi_fp16.ArithmeticOpsTest) 0.803 24/34 (passed)
      ✓ glow/fb/test/numerics:test_numerics_nnpinnpi - test_accumulator_simple (glow.fb.test.numerics.test_numerics_nnpi.AccTest) 0.484 25/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_slws_fused_8bit_rowwise_length1_graph (glow.fb.test.numerics.test_operator_onnxifi.SLSTest) 9.069 26/34 (passed)
      ✓ glow/fb/test/numerics:test_sls_nnpi_fp16nnpi - test_slws_fused_8bit_rowwise_intel2 (glow.fb.test.numerics.test_sls_nnpi_fp16.SparseLengthsSumTest) 1.741 27/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_mul_graph (glow.fb.test.numerics.test_operator_onnxifi.ArithmeticOpsTest) 0.902 28/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_sub_graph (glow.fb.test.numerics.test_operator_onnxifi.ArithmeticOpsTest) 0.678 29/34 (passed)
      ✓ glow/fb/test/numerics:test_sls_nnpi_fp16nnpi - test_slws_fused_8bit_rowwise_all_same (glow.fb.test.numerics.test_sls_nnpi_fp16.SparseLengthsSumTest) 0.726 30/34 (passed)
      ✓ glow/fb/test/numerics:test_fc_nnpi_fp16nnpi - test_fc_num0 (glow.fb.test.numerics.test_fc_nnpi_fp16.FCTest) 1.621 31/34 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_slws_fused_8bit_rowwise_graph (glow.fb.test.numerics.test_operator_onnxifi.SLSTest) 10.121 32/34 (passed)
     ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - test_gather_graph (glow.fb.test.numerics.test_operator_onnxifi.CommonOpsTest) 99.675 33/34 (passed)
      ✓ glow/fb/test/numerics:fp16_op_test - FP16Test.4BitFusedSLS_NNPI 0.156 34/34 (passed)
      {emoji:2702} glow/fb/test/numerics:fp16_op_test - FP16Test.4BitFusedSLS_Interpreter 0.000 (OMITTED)
Test output:
> This test was disabled.
> To run this test locally, add the command line flag --run-disabled to your test command (prefix with -- if using buck).
> To view why this is disabled or re-enable this test in the test console, visit https://our.intern.facebook.com/intern/testinfra/testdetail/281474992503783
      ✓ glow/fb/test/numerics:fp16_op_test - main 3.986 (passed)
      ✓ glow/fb/test/numerics:test_numerics_nnpinnpi - main 12.606 (passed)
      ✓ glow/fb/test/numerics:test_sls_nnpi_fp16nnpi - main 12.622 (passed)
      ✓ glow/fb/test/numerics:test_fc_nnpi_fp16nnpi - main 12.688 (passed)
      ✓ glow/fb/test/numerics:test_operator_onnxifinnpi - main 12.688 (passed)
      ✓ glow/fb/test/numerics:test_batchnorm_nnpi_fp16nnpi - main 12.744 (passed)
      ✓ glow/fb/test/numerics:test_batchmatmul_nnpi_fp16nnpi - main 12.763 (passed)
      ✓ glow/fb/test/numerics:test_op_nnpi_fp16nnpi - main 12.800 (passed)
      ✓ glow/fb/test/numerics:test_sls_4bit_nnpi_fp16nnpi - main 13.034 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874425505432
Summary (total time 134.18s):
  PASS: 43
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 1
    glow/fb/test/numerics:fp16_op_test - FP16Test.4BitFusedSLS_Interpreter

```

Reviewed By: yinghai

Differential Revision: D20471053

fbshipit-source-id: 0bd8e69fbb843a02dc031f45a060aa78c602b42c
2020-03-17 12:50:41 -07:00
cfab65d90d Fix CMake Dev warning in caffe2/CMakeLists.txt (#34886)
Summary:
If arguments of `ENDIF()` block are non-empty, they should match corresponding `IF()` BLOCK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34886

Test Plan: CI

Differential Revision: D20494631

Pulled By: malfet

fbshipit-source-id: 5fed86239b4a0cb4b3aedd02c950c1b800199d2d
2020-03-17 12:19:42 -07:00
3e68d0c5d0 Revert D20461609: [caffe2] open source 2/4-bit SLS operators
Test Plan: revert-hammer

Differential Revision:
D20461609

Original commit changeset: b3ef73ff10f2

fbshipit-source-id: e90ee5e34b1feab5b0bd582ed7e96e37de7044b0
2020-03-17 11:10:10 -07:00
95833a49e6 [TensorExpr] Pull changes from bertmaher/pytorch_fusion. (#34842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34842

This PR (hopefully the last one of this kind) merges changes from a
side branch where the tensor-expression-based fuser work has been done so
far. This PR is a squashed version of the changes in the side branch,
which is available here: https://github.com/bertmaher/pytorch

Differential Revision: D20478208

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 21556e009f1fd88099944732edba72ac40e9b9c0
2020-03-17 11:02:48 -07:00
ecd7c0f84c [RPC] Use qualified name str directly in RPC torch script code path (#34733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34733

simplify
ghstack-source-id: 100292435

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_class_rref_in_py_and_use_in_script

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_return_local_script_module_rref_in_py_and_use_in_script
```

Differential Revision: D20442573

fbshipit-source-id: 87f8b7d94adc03544f8e2955d01cd4702bb31a34
2020-03-17 10:28:52 -07:00
a0b7a39a92 Updating submodules
Summary:
GitHub commits:

eff7e6d11d
7812ac2fa9

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: a3f94dd5b48240169296d773b2828cd97b0871dd
2020-03-17 10:02:37 -07:00
65889388d1 Use randomtemp to resolve intermittent cuda build errors (#34777)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25393.
Core logic of randomtemp: https://github.com/peterjc123/randomtemp/blob/master/randomtemp/randomtemp.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34777

Differential Revision: D20491243

Pulled By: ezyang

fbshipit-source-id: 76b0e1819ac1e3f760d5451197bd75ea13df1f0b
2020-03-17 09:56:01 -07:00
67cb018462 Print cuda install logs for Windows CI (#34858)
Summary:
Related to https://github.com/pytorch/pytorch/issues/34821.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34858

Differential Revision: D20491248

Pulled By: ezyang

fbshipit-source-id: c6ddd59197a7bce31c1a3ea5dc28b0ee95d5c216
2020-03-17 09:37:25 -07:00
acbca57d18 improve batch_norm contiguous case's performance (#34530)
Summary:
For the batch_norm inference contiguous case, we can get better performance by manually vectorizing it.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

for n in [1, 10, 100]:
    for c in [1, 10, 100]:
        for hw in [1, 10, 200]:
            m = nn.BatchNorm2d(c, affine=False)
            m.eval()
            input = torch.randn(n, c, hw, hw)
            # warm up
            for i in range(200):
                output = m(input)
            fwd_t = 0
            for j in range(1000):
                t1 = time.time()
                output = m(input)
                t2 = time.time()
                fwd_t = fwd_t + (t2 - t1)

            fwd_avg = fwd_t / 1000 * 1000
            print("size = (%d, %d, %d, %d); compute time is %.4f(ms)" % (n, c, hw, hw, fwd_avg))
```

Before:
```
size = (1, 1, 1, 1); compute time is 0.0110(ms)
size = (1, 1, 10, 10); compute time is 0.0123(ms)
size = (1, 1, 200, 200); compute time is 0.8166(ms)
size = (1, 10, 1, 1); compute time is 0.0107(ms)
size = (1, 10, 10, 10); compute time is 0.0257(ms)
size = (1, 10, 200, 200); compute time is 8.7533(ms)
size = (1, 100, 1, 1); compute time is 0.0122(ms)
size = (1, 100, 10, 10); compute time is 0.1619(ms)
size = (1, 100, 200, 200); compute time is 123.5674(ms)
size = (10, 1, 1, 1); compute time is 0.0109(ms)
size = (10, 1, 10, 10); compute time is 0.0123(ms)
size = (10, 1, 200, 200); compute time is 0.5629(ms)
size = (10, 10, 1, 1); compute time is 0.0107(ms)
size = (10, 10, 10, 10); compute time is 0.0253(ms)
size = (10, 10, 200, 200); compute time is 8.7817(ms)
size = (10, 100, 1, 1); compute time is 0.0120(ms)
size = (10, 100, 10, 10); compute time is 0.1655(ms)
size = (10, 100, 200, 200); compute time is 123.2488(ms)
size = (100, 1, 1, 1); compute time is 0.0109(ms)
size = (100, 1, 10, 10); compute time is 0.0123(ms)
size = (100, 1, 200, 200); compute time is 0.5740(ms)
size = (100, 10, 1, 1); compute time is 0.0108(ms)
size = (100, 10, 10, 10); compute time is 0.0257(ms)
size = (100, 10, 200, 200); compute time is 8.7201(ms)
size = (100, 100, 1, 1); compute time is 0.0122(ms)
size = (100, 100, 10, 10); compute time is 0.1628(ms)
size = (100, 100, 200, 200); compute time is 123.1739(ms)
```
After:
```
size = (1, 1, 1, 1); compute time is 0.0105(ms)
size = (1, 1, 10, 10); compute time is 0.0114(ms)
size = (1, 1, 200, 200); compute time is 0.5771(ms)
size = (1, 10, 1, 1); compute time is 0.0105(ms)
size = (1, 10, 10, 10); compute time is 0.0160(ms)
size = (1, 10, 200, 200); compute time is 6.9851(ms)
size = (1, 100, 1, 1); compute time is 0.0122(ms)
size = (1, 100, 10, 10); compute time is 0.0848(ms)
size = (1, 100, 200, 200); compute time is 98.6758(ms)
size = (10, 1, 1, 1); compute time is 0.0105(ms)
size = (10, 1, 10, 10); compute time is 0.0115(ms)
size = (10, 1, 200, 200); compute time is 0.2690(ms)
size = (10, 10, 1, 1); compute time is 0.0105(ms)
size = (10, 10, 10, 10); compute time is 0.0159(ms)
size = (10, 10, 200, 200); compute time is 6.6946(ms)
size = (10, 100, 1, 1); compute time is 0.0123(ms)
size = (10, 100, 10, 10); compute time is 0.0854(ms)
size = (10, 100, 200, 200); compute time is 98.7327(ms)
size = (100, 1, 1, 1); compute time is 0.0107(ms)
size = (100, 1, 10, 10); compute time is 0.0116(ms)
size = (100, 1, 200, 200); compute time is 0.2681(ms)
size = (100, 10, 1, 1); compute time is 0.0104(ms)
size = (100, 10, 10, 10); compute time is 0.0159(ms)
size = (100, 10, 200, 200); compute time is 6.7507(ms)
size = (100, 100, 1, 1); compute time is 0.0124(ms)
size = (100, 100, 10, 10); compute time is 0.0852(ms)
size = (100, 100, 200, 200); compute time is 98.6866(ms)
```
For a real model, ResNeXt101, we can also get a **~20%** performance improvement for large batch sizes.
Test script:
```
import torch
import torchvision
import time

torch.manual_seed(0)
#torch.set_num_threads(1)

model = torchvision.models.resnext101_32x8d().eval()

for batch_size in [1, 64]:
    input = torch.randn(batch_size, 3, 224, 224)
    # warm up
    with torch.no_grad():
        for i in range(5):
            output = model(input)

        fwd_t = 0
        for i in range(10):
            t1 = time.time()
            output = model(input)
            t2 = time.time()
            fwd_t = fwd_t + (t2 - t1)

        time_fwd_avg = fwd_t / 10 * 1000
        print("Throughput of resnext101 with batch_size = %d is %10.2f (imgs/s)" % (batch_size, batch_size * 1000 / time_fwd_avg))
```
Before:
```
Throughput of resnext101 with batch_size = 1 is       7.89 (imgs/s)
Throughput of resnext101 with batch_size = 64 is      13.02 (imgs/s)

num_threads =1
Throughput of resnext101 with batch_size = 1 is       2.97 (imgs/s)
Throughput of resnext101 with batch_size = 64 is       2.75 (imgs/s)
```
After:
```
Throughput of resnext101 with batch_size = 1 is       8.95 (imgs/s)
Throughput of resnext101 with batch_size = 64 is      15.52 (imgs/s)

num_threads = 1
Throughput of resnext101 with batch_size = 1 is       3.10 (imgs/s)
Throughput of resnext101 with batch_size = 64 is       2.88 (imgs/s)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34530

Differential Revision: D20479560

Pulled By: ngimel

fbshipit-source-id: 2e788ebcd814556116c90553ec61159eeffb3c16
2020-03-17 09:22:35 -07:00
a8ca340ad6 Remove all uses of AT_CHECK and replace them with TORCH_CHECK (#34846)
Summary:
AT_CHECK has been deprecated and provides no more features than TORCH_CHECK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34846

Differential Revision: D20481339

Pulled By: mrshenli

fbshipit-source-id: 1777e769a069a78e03118270294e5e273d516ca7
2020-03-17 08:59:02 -07:00
76d9e76b4a Default to erroring when failing to return from non-void function. (#34663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34663

Been bitten by this so many times.  Never more.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20425480

Pulled By: ezyang

fbshipit-source-id: c4489efacc4149c9b57d1b8207cc872970c2501f
2020-03-17 07:31:56 -07:00
d9b97a4ffd [caffe2] open source 2/4-bit SLS operators (#34783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34783

Moving 2/4-bit SLS and row-wise 2/4-bit conversion operator to open source to be used by DLRM

Test Plan: CI

Reviewed By: yinghai

Differential Revision: D20461609

fbshipit-source-id: b3ef73ff10f2433afe06ffa73fe1145282d9ec4c
2020-03-17 01:00:31 -07:00
089a0a2117 [torchbind] Test moving custom classes to/from IValue (#34847)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34847

Test Plan: Imported from OSS

Differential Revision: D20480512

Pulled By: jamesr66a

fbshipit-source-id: 87f5f8ea8764e26d383b17e4f72538166ddd0655
2020-03-16 23:57:42 -07:00
699a4ed8f5 [testing][do not land] (#34605)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34605

Test Plan: Imported from OSS

Differential Revision: D20393219

Pulled By: jamesr66a

fbshipit-source-id: c74d886f5f01061294203a002b72b75a3c446f09
2020-03-16 23:56:00 -07:00
89cbc0edea fix tests that could have racy script module instantiation (#34792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34792

It is not thread-safe to instantiate a script module from multiple threads.

For both test_remote_script_module and test_torchscript_functions_not_supported, it is possible that the client thread is instantiating MyScriptModule while the server thread is instantiating it as well in the same rank's process.

This removes the MyScriptModule instantiation from the client thread; it is not actually needed.
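
A minimal sketch of the safe pattern (the module definition here is a stand-in for the test's MyScriptModule):

```python
import torch

class MyScriptModule(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        return x + 1

# construct the ScriptModule once, single-threaded, up front
module = MyScriptModule()

def client_thread(x):
    # threads may safely *call* the already-built module;
    # they just must not each construct their own instance
    return module(x)
```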
ghstack-source-id: 100266609

Test Plan: unit tests

Differential Revision: D20463234

fbshipit-source-id: 6ff70ad90fa50b0b44c78df2495b4bcaabb4487b
2020-03-16 23:14:07 -07:00
e70c28856f [Caffe2] Move more method implementations from tensor.h to tensor.cc (#34811)
Summary:
To speed up compilation time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34811

Test Plan: CI

Differential Revision: D20476992

Pulled By: malfet

fbshipit-source-id: 922cde93783fbfc04854851d7a05a635d5239792
2020-03-16 22:15:18 -07:00
471ddacd8b Add retry decorator and use it for Hub tests. (#34829)
Summary:
fix https://github.com/pytorch/pytorch/issues/34751
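
The message does not show the decorator itself; the following is only a guess at what a retry helper for flaky, network-bound Hub tests might look like (all names and defaults are assumptions):

```python
import functools
import time

def retry(exc_type, tries=3, delay=1.0):
    """Re-run a test up to `tries` times when it raises `exc_type`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return fn(*args, **kwargs)
                except exc_type:
                    if attempt == tries - 1:
                        raise  # out of retries; surface the failure
                    time.sleep(delay)
        return wrapper
    return decorator
```
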
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34829

Differential Revision: D20476231

Pulled By: ailzhang

fbshipit-source-id: eb38ee655e28250352b15e8e37b3b39310a7c378
2020-03-16 20:19:45 -07:00
b336deb6ee [quant][mobile] Not use qnnpack max_pool2d if ceil_mode is true (#34844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34844

The QNNPACK max_pool2d operator does not support ceil_mode, so the kernel can crash when it is set to true.
We now default to the server implementation when ceil_mode is set to true.
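
For reference, a small sketch of ceil_mode's effect on output size (sizes are illustrative; the actual crash occurred in the quantized QNNPACK path):

```python
import torch

x = torch.randn(1, 1, 5, 5)
pool = torch.nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
# ceil_mode=True rounds the output size up: ceil((5 - 2) / 2) + 1 = 3,
# and such calls now route to the server (non-QNNPACK) kernel.
print(pool(x).shape)  # torch.Size([1, 1, 3, 3])
```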

Test Plan:
python test/test_quantized.py

Imported from OSS

Differential Revision: D20478701

fbshipit-source-id: 7962444ac493f5c3c32a9aa1a7be465e8b84ccc2
2020-03-16 19:27:04 -07:00
1e140c353c [profiler][rpc] fix a race condition in the profiler when multiple threads call (#33719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33719

We were seeing a strange error where gathering profiler events (specifically `parse_cpu_trace` in `profiler.py`) would fail with the error:
`IndexError: pop from empty list`.

It turned out that this was because for one particular `Event`, there was a pop recorded but not a push. Instead of the `push` event being completely missing, it was overwritten by a completely different event.

After a bunch of debugging, and trying several hypotheses, it turns out that this was a race condition in `RangeEventList::record`. What happened was that different threads would call into `RangeEventList::record` on the same event list instance, and one record would stomp over the data written by the other one. Somehow the data written was a valid `Event` so the error did not manifest itself until the profiler realized a `pop` was missing a matching `push` in the python code.

I fixed this by adding a lock to serialize writes to `RangeEventList::record`.
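
Conceptually (sketched here in Python; the actual fix is in the C++ `RangeEventList::record`), the change serializes concurrent writers:

```python
import threading

class RangeEventList:
    """Conceptual stand-in for the C++ class of the same name."""

    def __init__(self):
        self._events = []
        self._lock = threading.Lock()

    def record(self, event):
        # The lock is the essence of the fix: without it, two threads
        # writing at once could stomp over each other's event data.
        with self._lock:
            self._events.append(event)
```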

This PR also makes a small change to pass in the `RecordFunction` name into `popRange`. It makes the debugging easier when investigating the events recorded.

Differential Revision: D20071125

fbshipit-source-id: 70b51a65bcb833a7c88b7462a978fd3a39265f7e
2020-03-16 18:41:16 -07:00
422e348619 Don't run user function until all UserRRefs in the args are confirmed (#34497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34497

Use a thread_local table to intercept UserRRefs created during user
function args deserialization, and then wait for confirmations of
those UserRRefs before launching the given user function.

Differential Revision: D20347464

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: 087484a2d2f03fbfb156752ab25653f39b412a07
2020-03-16 18:30:06 -07:00
d876fef743 Fix send count for local RPC (#34809)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34809

Test Plan: Imported from OSS

Differential Revision: D20470495

Pulled By: mrshenli

fbshipit-source-id: 2d6e2a2889be07fb074443f05db5089291daf8cf
2020-03-16 18:30:01 -07:00
38b2856c71 Split deserialize from runPythonUdf and remove generatePythonUDFResult (#34496)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34496

Differential Revision: D20347469

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: b832a3a9e2ef61f149175f737b26f65d63bf797b
2020-03-16 18:28:07 -07:00
ae0c88d6aa .circleci: Add manywheel builds for python 3.8 (#34732)
Summary:
Not entirely sure why this wasn't here before but we definitely need to
test for this.

Closes https://github.com/pytorch/pytorch/issues/34727

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34732

Differential Revision: D20480508

Pulled By: seemethere

fbshipit-source-id: 43bcff679ca35993f6bf1b10980acd7c86f780b1
2020-03-16 17:28:46 -07:00
480d1849b0 [ONNX] Fix for expand -1 dim value (#34069)
Summary:
PyTorch expand allows a size with -1 dim values; -1 means to infer that dimension from the input tensor. This can be exported to ONNX Expand with a dim value of 1, since ONNX Expand supports two-way broadcasting.
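
A small sketch of the semantics being exported (illustrative values):

```python
import torch

x = torch.randn(3, 1)
y = x.expand(-1, 4)  # -1 keeps dim 0 at its input size (3)
print(y.shape)       # torch.Size([3, 4])
# For export, the -1 can be replaced by 1: ONNX Expand broadcasts both
# ways, so expanding against shape (1, 4) also yields (3, 4).
```
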
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34069

Reviewed By: hl475

Differential Revision: D20195532

Pulled By: houseroad

fbshipit-source-id: c90e7d51b9d7422c09c5ed6e135ca8263105b8c9
2020-03-16 15:30:20 -07:00
1bac5fd0d3 add hardsigmoid FP operator to PyTorch (#34545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34545

This is for common operator coverage, since this is widely used.  A future PR
will add the quantized version.

Some initial questions for reviewers, since it's my first FP operator
diff:
* do we need a backwards.out method for this?
* do we need CUDA? If yes, should it be this PR or is it ok to split
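
For reference, a quick usage sketch of the new operator (the clamp formulation below is the standard hardsigmoid definition, stated here as an assumption about this implementation):

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-5, 5, steps=11)
y = F.hardsigmoid(x)
ref = torch.clamp((x + 3.0) / 6.0, 0.0, 1.0)  # assumed reference definition
print(torch.allclose(y, ref))  # True
```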

Test Plan:
```
// test
python test/test_torch.py TestTorchDeviceTypeCPU.test_hardsigmoid_cpu_float32

// benchmark
python -m pt.hardsigmoid_test
...
Forward Execution Time (us) : 40.315

Forward Execution Time (us) : 42.603
```

Imported from OSS

Differential Revision: D20371692

fbshipit-source-id: 95668400da9577fd1002ce3f76b9777c6f96c327
2020-03-16 15:24:12 -07:00
6d8649dc53 [caffe2] fix Transpose2D calls in NHWC<->NCHW (#34625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34625

These templated function calls are not specifying the template args correctly.  The first arg is the index type, not the array data type.  That means, right now it's using `T` as the index type as well, which will break if we do a template specialization for uint8_t.  If we omit both, it will correctly infer that the index type is `int` and the data type is `T`.

Reviewed By: BIT-silence

Differential Revision: D20358728

fbshipit-source-id: 8cbd8eeb14bce602c02eb6fce2cc141f0121fa24
2020-03-16 15:18:44 -07:00
31eaeba38a Increase the prec of test_baddbmm (#34764)
Summary:
This test is flaky on my computer; the error is:
```
AssertionError: tensor(1.3351e-05) not less than or equal to 1e-05
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34764

Differential Revision: D20476006

Pulled By: ezyang

fbshipit-source-id: dad7e702275346070552c8a98765c37e6ca2c197
2020-03-16 15:06:01 -07:00
8bae1ed144 PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem - copy (#34721)
Summary:
This is a copy of PR https://github.com/pytorch/pytorch/issues/29488 to help the merging process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34721

Differential Revision: D20444270

Pulled By: vincentqb

fbshipit-source-id: 042c56c8c0dae37834f52b4aee2deae7dd6fa659
2020-03-16 14:13:30 -07:00
976d6aaa51 Revert D20251830: [TensorExpr] Add tensorexpr benchmarks.
Test Plan: revert-hammer

Differential Revision:
D20251830

Original commit changeset: bafd66ce32f6

fbshipit-source-id: d8aea4b26441d8aba90c11d7350d3424df494052
2020-03-16 13:20:16 -07:00
ef78fa8668 caffe2::OperatorBase do not need to be aware of at::Tensor functions (#34810)
Summary:
Replacing <ATen/core/Tensor.h> with <ATen/core/TensorBody.h> speeds up compilation of caffe2 operators by 15%.
For example, it reduces pool_op.cu compilation from 18.8s to 16s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34810

Test Plan: CI

Differential Revision: D20472230

Pulled By: malfet

fbshipit-source-id: e1b261cc24ff577f09e2d5f6428be2063c6d4a8b
2020-03-16 12:58:05 -07:00
e93e7b2795 [TensorExpr] Add tensorexpr benchmarks. (#34230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34230

This PR adds some benchmarks that we used to assess tensor expression performance.

Differential Revision: D20251830

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: bafd66ce32f63077e3733112d854f5c750d5b1af
2020-03-16 11:49:39 -07:00
ea5c86c276 [TensorExpr] Add LLVM codegen. (#34228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34228

This PR adds LLVM codegen to tensor expressions. LLVM is added as an
optional build dependency specified with `USE_LLVM=<path_to_llvm>`
variable. If this variable is not set or LLVM is not found in the
specified path, the LLVM codegen is completely disabled.

Differential Revision: D20251832

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 77e203ab4421eb03afc64f8da17e0daab277ecc2
2020-03-16 11:49:34 -07:00
35e7efeb9a [TensorExpr] Add CUDA codegen. (#34227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34227

This PR adds a CUDA support to tensor expressions.

Differential Revision: D20251836

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: ab36a55834cceff30c8371fef6cca1054a32f017
2020-03-16 11:49:29 -07:00
42b2c8c65d [TensorExpr] Add a fuser pass based on tensor expressions. (#34226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34226

LLVM and CUDA backends are added in subsequent PRs, so at this point the fuser is pretty useless, but it can still be tested, and its logic is not going to change with the addition of the codegens.

Differential Revision: D20251838

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 82b0d221fa89904ed526689d02a6c7676a8ce8de
2020-03-16 11:49:24 -07:00
e31d462e92 [TensorExpr] Pull changes to core classes for representing expressions and statements from the side branch. (#34224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34224

Our development has been happening on a side branch `pytorch_fusion` in
`bertmaher/pytorch` fork. This PR moves changes to the core classes
representing expressions and transformations on them.

At this moment, the tensor expressions are only used in tests.
Subsequent PRs add LLVM and CUDA codegen for tensor expressions and
implement fuser on top of these.

This PR is huge as it is a squashed version of changes in the side
branch. It is not practical to pull changes one by one from the branch,
so here is the squashed version. If you're interested in seeing the
history of changes, please refer to https://github.com/bertmaher/pytorch

Differential Revision: D20251835

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 1a871acc09cf3c6f7fb4af40d408cdbb82dc7dab
2020-03-16 11:47:47 -07:00
99b91ee2ad [fix][tiny][caffe2] Avoid triggering errors when allow ratio is 100% (#34757)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34757

Reviewed By: Wakeupbuddy

Differential Revision: D20451255

fbshipit-source-id: 07997cf31dba653b61d082ec3f28357c3b90c4eb
2020-03-16 11:39:32 -07:00
24c9e61e79 Enable JIT tests on Windows (#27029)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27029

Reviewed By: eellison

Differential Revision: D20458664

Pulled By: jamesr66a

fbshipit-source-id: 22be918543703869f471e89b3478423198351bf3
2020-03-16 11:26:21 -07:00
1af6002321 Initial implementation of NNPI Int8FC op
Test Plan:
```
 buck test mode/no-gpu glow/fb/test/numerics:test_fc_nnpi_int8nnpi -- --print-passing-detail
```

Reviewed By: hyuen

Differential Revision: D20450490

fbshipit-source-id: c4811cdc994548b6e319d57115434dfc199e07c2
2020-03-16 10:46:17 -07:00
a57f92e4de [jit] copy unused/ignored methods to ScriptModule during compilation (#33981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33981

Okay it turns out that https://github.com/pytorch/pytorch/pull/29342
deletes actually useful things from the resulting Python module. In
particular, people like having `ignore`'d methods attached so that they
can invoke them from python.
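
For reference, a small sketch of the behavior being restored (module and method names are illustrative):

```python
import torch

class M(torch.nn.Module):
    @torch.jit.ignore
    def helper(self, x):
        # plain Python; not compiled, but still callable on the ScriptModule
        return x.tolist()

    def forward(self, x):
        return x + 1

m = torch.jit.script(M())
print(m.helper(torch.ones(2)))  # callable from Python again after this change
```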

Test Plan: Imported from OSS

Differential Revision: D20171650

Pulled By: suo

fbshipit-source-id: 71862e932c6a56cd055d0cff6657887ee0ceb9a8
2020-03-16 10:38:59 -07:00
cec9758afa [quant][graphmode] Add quantization pattern for quantized::add_relu (#33532)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33532

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20354880

fbshipit-source-id: ea608a5ace395a909851f9e577ffdcb51512a3af
2020-03-16 10:20:57 -07:00
8eaafbd99b Remove unused newWithSize declaration. (#34730)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34730

Test Plan: Imported from OSS

Differential Revision: D20446078

Pulled By: gchanan

fbshipit-source-id: 0effc088dcba4f60385e3b23fa656cb772a3b7bc
2020-03-16 09:17:54 -07:00
b94d650868 Remove unused newView declaration. (#34729)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34729

Test Plan: Imported from OSS

Differential Revision: D20446077

Pulled By: gchanan

fbshipit-source-id: b68471aeaf673851bdfc6bb0615aba8ebb883a4c
2020-03-16 09:16:14 -07:00
a66b837b19 Migrate dirichlet_grad from CUDA_tensor_apply4 to TensorIterator (#33996)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33996

Test Plan: Imported from OSS

Differential Revision: D20196789

Pulled By: VitalyFedyunin

fbshipit-source-id: 69ee720f4f3d8a2df91874b77ee3918ce1b951b2
2020-03-16 08:56:32 -07:00
c3c0cf1591 Migrate binary_cross_entropy_backward from CUDA_tensor_apply4 to TensorIterator (#33995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33995

Test Plan: Imported from OSS

Differential Revision: D20196790

Pulled By: VitalyFedyunin

fbshipit-source-id: c0c231a20e6e69fc3c68c3ac5082b20f2feb6158
2020-03-16 08:54:49 -07:00
762be86e63 [C++ API Parity] [Optimizers] added closure to optimizers (#34790)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34790

Differential Revision: D20468361

Pulled By: anjali411

fbshipit-source-id: 1c6115d735b211dc2bedf002d58931cb32cf657a
2020-03-16 07:51:44 -07:00
bdd7dbfd4b [C++ API] RNN / GRU / LSTM layer refactoring (#34322)
Summary:
This PR refactors RNN / GRU / LSTM layers in C++ API to exactly match the implementation in Python API.

**BC-breaking changes:**
- Instead of returning `RNNOutput`, RNN / GRU forward method now returns `std::tuple<Tensor, Tensor>`, and LSTM forward method now returns `std::tuple<Tensor, std::tuple<Tensor, Tensor>>`, matching Python API.
- RNN / LSTM / GRU forward method now accepts the same inputs (input tensor and optionally hidden state), matching Python API.
- RNN / LSTM / GRU layers now have a `forward_with_packed_input` method which accepts `PackedSequence` as input and optionally a hidden state, matching the `forward(PackedSequence, ...)` variant in the Python API.
- RNN / LSTM / GRU layers no longer have these fields: `w_ih` / `w_hh` / `b_ih` / `b_hh`. Instead, to access the weights and biases of the gates, users should do e.g. `rnn->named_parameters()["weight_ih_l0"]`, which mirrors the Python API `rnn.weight_ih_l0`.
- In `RNNOptions`
    - `tanh()` / `relu()` / `activation` are removed. Instead, `nonlinearity` is added which takes either `torch::kTanh` or `torch::kReLU`
    - `layers` -> `num_layers`
    - `with_bias` -> `bias`
- In `LSTMOptions`
    - `layers` -> `num_layers`
    - `with_bias` -> `bias`
- In `GRUOptions`
    - `layers` -> `num_layers`
    - `with_bias` -> `bias`

The majority of the changes in this PR focused on refactoring the implementations in `torch/csrc/api/src/nn/modules/rnn.cpp` to match the Python API. RNN tests are then changed to reflected the revised API design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34322

Differential Revision: D20458302

Pulled By: yf225

fbshipit-source-id: ffff2ae1ddb1c742c966956f6ad4d7fba03dc54d
2020-03-15 17:48:29 -07:00
d4f182d06b Add overloaded name to prim operators (#34280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34280

To make prim ops searchable for the lite interpreter, overloaded names need to be added for operators with the same name but different schemas. For example, aten::add in register_prim_ops.cpp. The difference is a combination of argument and output types.
`"aten::add(str a, str b) ->str"`
`"aten::add(int a, int b) ->int"`
`"aten::add(float a, float b) ->float"`
`"aten::add(int a, float b) ->float"`
`"aten::add(float a, int b) ->float"`
`"aten::add(Scalar a, Scalar b) ->Scalar"`

Solution:
Use the argument type and/or output type (the same as the existing overloaded names). The overloaded name should be minimal, as long as the operators can still be differentiated. For other operators, please look at the source code changes for details.

`"aten::add.str(str a, str b) ->str"`
`"aten::add.int(int a, int b) ->int"`
`"aten::add.float(float a, float b) ->float"`
`"aten::add.int_float(int a, float b) ->float"`
`"aten::add.float_int(float a, int b) ->float"`
`"aten::add.Scalar_Scalar(Scalar a, Scalar b) ->Scalar"`

Test Plan: Imported from OSS

Differential Revision: D20456997

Pulled By: iseeyuan

fbshipit-source-id: 2c3dc324b4a4e045559f62c6cc2a10fbb9a72dcf
2020-03-15 17:05:54 -07:00
c86d1361b8 Removes unused THCTensor_(triu), THCTensor_(div) (#34712)
Summary:
Per title. Dead code removal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34712

Differential Revision: D20442618

Pulled By: mruberry

fbshipit-source-id: b03aa4984328f94021c1480e21375fd868d6d550
2020-03-15 16:42:35 -07:00
c258e4732a solve conv3d backward get incorrect result problem (#34358)
Summary:
Fix https://github.com/pytorch/pytorch/issues/34344.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34358

Differential Revision: D20461698

Pulled By: ngimel

fbshipit-source-id: 472624d0037ab65d9dcc221f647ec68818be5fc9
2020-03-15 16:15:53 -07:00
7848c229b8 Move min and max(reduce all) to Aten(CPU) (#33936)
Summary:
This PR ports min and max (reduce-all) to ATen.
Performance test script:
```
import torch
import timeit

torch.manual_seed(0)
#torch.set_num_threads(1)

device = "cpu"
print(f'device: {device}')
for op in ('max', 'min'):
    for dtype in ('torch.double', 'torch.float', 'torch.int16', 'torch.int32', 'torch.int64'):
        for n, t in [(20_000, 200000),
                     (200_000, 20000)]:
            print(f'a.{op}(), numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit(f'a.{op}()', setup=f'import torch; a =(torch.torch.randn({n}) * 100).to({dtype})', number=t))
```
Test device: **skx-8180, 2 sockets**
Before:
```
a.max(), numel() == 20000 for 200000 times, dtype=torch.double
2.773961597122252
a.max(), numel() == 200000 for 20000 times, dtype=torch.double
2.3256353894248605
a.max(), numel() == 20000 for 200000 times, dtype=torch.float
3.800648272037506
a.max(), numel() == 200000 for 20000 times, dtype=torch.float
3.31692426931113
a.max(), numel() == 20000 for 200000 times, dtype=torch.int16
2.735901520587504
a.max(), numel() == 200000 for 20000 times, dtype=torch.int16
2.2510280115529895
a.max(), numel() == 20000 for 200000 times, dtype=torch.int32
2.723656536079943
a.max(), numel() == 200000 for 20000 times, dtype=torch.int32
2.228839812800288
a.max(), numel() == 20000 for 200000 times, dtype=torch.int64
2.703160767443478
a.max(), numel() == 200000 for 20000 times, dtype=torch.int64
2.3175809988752007
a.min(), numel() == 20000 for 200000 times, dtype=torch.double
2.820106916129589
a.min(), numel() == 200000 for 20000 times, dtype=torch.double
2.325718787498772
a.min(), numel() == 20000 for 200000 times, dtype=torch.float
3.833602518774569
a.min(), numel() == 200000 for 20000 times, dtype=torch.float
3.316444822587073
a.min(), numel() == 20000 for 200000 times, dtype=torch.int16
2.7308286419138312
a.min(), numel() == 200000 for 20000 times, dtype=torch.int16
2.198460517451167
a.min(), numel() == 20000 for 200000 times, dtype=torch.int32
2.730219766497612
a.min(), numel() == 200000 for 20000 times, dtype=torch.int32
2.2268200274556875
a.min(), numel() == 20000 for 200000 times, dtype=torch.int64
2.7342184390872717
a.min(), numel() == 200000 for 20000 times, dtype=torch.int64
2.320415544323623
```
After:
```
a.max(), numel() == 20000 for 200000 times, dtype=torch.double
1.7767417253926396
a.max(), numel() == 200000 for 20000 times, dtype=torch.double
0.550495645031333
a.max(), numel() == 20000 for 200000 times, dtype=torch.float
1.1113408291712403
a.max(), numel() == 200000 for 20000 times, dtype=torch.float
0.44446005020290613
a.max(), numel() == 20000 for 200000 times, dtype=torch.int16
0.5246349424123764
a.max(), numel() == 200000 for 20000 times, dtype=torch.int16
0.47057845536619425
a.max(), numel() == 20000 for 200000 times, dtype=torch.int32
0.6597231412306428
a.max(), numel() == 200000 for 20000 times, dtype=torch.int32
0.40366593934595585
a.max(), numel() == 20000 for 200000 times, dtype=torch.int64
1.767227927222848
a.max(), numel() == 200000 for 20000 times, dtype=torch.int64
0.6187495030462742
a.min(), numel() == 20000 for 200000 times, dtype=torch.double
1.7881382443010807
a.min(), numel() == 200000 for 20000 times, dtype=torch.double
0.5440589748322964
a.min(), numel() == 20000 for 200000 times, dtype=torch.float
1.1090848250314593
a.min(), numel() == 200000 for 20000 times, dtype=torch.float
0.4293213738128543
a.min(), numel() == 20000 for 200000 times, dtype=torch.int16
0.5207074657082558
a.min(), numel() == 200000 for 20000 times, dtype=torch.int16
0.41422136034816504
a.min(), numel() == 20000 for 200000 times, dtype=torch.int32
0.6145811947062612
a.min(), numel() == 200000 for 20000 times, dtype=torch.int32
0.4172037309035659
a.min(), numel() == 20000 for 200000 times, dtype=torch.int64
1.7397673893719912
a.min(), numel() == 200000 for 20000 times, dtype=torch.int64
0.596766366623342
```
Single thread:
Before:
```
a.max(), numel() == 20000 for 200000 times, dtype=torch.double
2.5068740313872695
a.max(), numel() == 200000 for 20000 times, dtype=torch.double
2.234461876563728
a.max(), numel() == 20000 for 200000 times, dtype=torch.float
3.5549037409946322
a.max(), numel() == 200000 for 20000 times, dtype=torch.float
3.2497852174565196
a.max(), numel() == 20000 for 200000 times, dtype=torch.int16
2.493077039718628
a.max(), numel() == 200000 for 20000 times, dtype=torch.int16
2.171935741789639
a.max(), numel() == 20000 for 200000 times, dtype=torch.int32
2.469274105504155
a.max(), numel() == 200000 for 20000 times, dtype=torch.int32
2.273881389759481
a.max(), numel() == 20000 for 200000 times, dtype=torch.int64
2.5818942049518228
a.max(), numel() == 200000 for 20000 times, dtype=torch.int64
2.2394551979377866
a.min(), numel() == 20000 for 200000 times, dtype=torch.double
2.5894540259614587
a.min(), numel() == 200000 for 20000 times, dtype=torch.double
2.331936141476035
a.min(), numel() == 20000 for 200000 times, dtype=torch.float
3.590122046880424
a.min(), numel() == 200000 for 20000 times, dtype=torch.float
3.255849950015545
a.min(), numel() == 20000 for 200000 times, dtype=torch.int16
2.5205496419221163
a.min(), numel() == 200000 for 20000 times, dtype=torch.int16
2.168218174017966
a.min(), numel() == 20000 for 200000 times, dtype=torch.int32
2.658622432500124
a.min(), numel() == 200000 for 20000 times, dtype=torch.int32
2.3376982398331165
a.min(), numel() == 20000 for 200000 times, dtype=torch.int64
2.496626536361873
a.min(), numel() == 200000 for 20000 times, dtype=torch.int64
2.2504652086645365
```
After:
```
a.max(), numel() == 20000 for 200000 times, dtype=torch.double
1.9525171788409352
a.max(), numel() == 200000 for 20000 times, dtype=torch.double
1.6108122132718563
a.max(), numel() == 20000 for 200000 times, dtype=torch.float
1.2444602297618985
a.max(), numel() == 200000 for 20000 times, dtype=torch.float
0.7705567870289087
a.max(), numel() == 20000 for 200000 times, dtype=torch.int16
0.6575072864070535
a.max(), numel() == 200000 for 20000 times, dtype=torch.int16
0.13242999743670225
a.max(), numel() == 20000 for 200000 times, dtype=torch.int32
0.829406064003706
a.max(), numel() == 200000 for 20000 times, dtype=torch.int32
0.35575105529278517
a.max(), numel() == 20000 for 200000 times, dtype=torch.int64
1.6426756298169494
a.max(), numel() == 200000 for 20000 times, dtype=torch.int64
1.4049720335751772
a.min(), numel() == 20000 for 200000 times, dtype=torch.double
2.029639278538525
a.min(), numel() == 200000 for 20000 times, dtype=torch.double
1.6363644907251
a.min(), numel() == 20000 for 200000 times, dtype=torch.float
1.3821239182725549
a.min(), numel() == 200000 for 20000 times, dtype=torch.float
0.834847847931087
a.min(), numel() == 20000 for 200000 times, dtype=torch.int16
0.6913397628813982
a.min(), numel() == 200000 for 20000 times, dtype=torch.int16
0.1370067736133933
a.min(), numel() == 20000 for 200000 times, dtype=torch.int32
0.8190992185845971
a.min(), numel() == 200000 for 20000 times, dtype=torch.int32
0.3640836915001273
a.min(), numel() == 20000 for 200000 times, dtype=torch.int64
1.6516661625355482
a.min(), numel() == 200000 for 20000 times, dtype=torch.int64
1.4111155439168215
```
Fixes: https://github.com/pytorch/pytorch/issues/33197

Fix https://github.com/pytorch/pytorch/issues/24728, https://github.com/pytorch/pytorch/issues/24729
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33936

Differential Revision: D20461658

Pulled By: ngimel

fbshipit-source-id: 5749260114ace3ea7b513e32edc805c844a19c8a
2020-03-15 16:09:58 -07:00
f058c03b15 Disallow sending CUDA tensors over RPC for current RPC agents. (#33604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33604

For our current RPC agents, this PR disallows sending CUDA tensors
over RPC and asks users to copy them explicitly to CPU. Currently, this seems
to be the easiest contract to guarantee for our current RPC agents; otherwise,
if we supported this transparently, it would get tricky in terms of whether
a CUDA tensor on the client should land on the CPU or a GPU of the remote end,
and if a GPU, which device.

In the future, the TensorPipe RPC agent can have its own specific handling of
CUDA tensors.
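
A sketch of the required pattern under the new contract (assumes an initialized RPC framework and a peer named "worker1"; both are illustrative):

```python
import torch
import torch.distributed.rpc as rpc

# Assumes rpc.init_rpc(...) has already run on this worker.
t = torch.ones(2, 2, device="cuda")
# Current agents reject CUDA tensors in RPC messages; copy to CPU first.
result = rpc.rpc_sync("worker1", torch.add, args=(t.cpu(), 1))
```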

Closes https://github.com/pytorch/pytorch/issues/28881
ghstack-source-id: 100166120

Test Plan: waitforbuildbot

Differential Revision: D20020183

fbshipit-source-id: ca4d43d2a24e8fcd3a60b21e654aa0e953e756cb
2020-03-15 15:01:46 -07:00
f404537c26 CUDA Loops: move address computation into policy, make policy.load load all arguments (#33720)
Summary:
This moves the address computation into the policy so that, in the future, the policy can accept an offset calculator in its constructor to support non-contiguous tensors.

The `elementwise_kernel_helper` is now very general and can handle all cases:

```C++
template<typename func_t, typename policy_t>
__device__ inline void elementwise_kernel_helper(func_t f, policy_t policy) {
  using traits = function_traits<func_t>;
  using return_t = typename traits::result_type;
  using args_t = typename traits::ArgsTuple;

  int idx = blockIdx.x;

  return_t results[thread_work_size];
  cuda9::workaround::enable_default_constructor<args_t> args_[thread_work_size];
  args_t *args = reinterpret_cast<args_t *>(&args_);

  // load
  policy.load(args, idx);

  // compute
  #pragma unroll
  for (int i = 0; i < thread_work_size; i++) {
    if (policy.check_inbounds(i)) {
      results[i] = c10::guts::apply(f, args[i]);
    }
  }

  // store
  policy.store(results, idx);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33720

Differential Revision: D20459652

Pulled By: ngimel

fbshipit-source-id: aa8b122e0e8c6e08ab354785e04753ff778882e2
2020-03-15 14:41:05 -07:00
528aabd373 Fix backward compatibility check test for schemas containing "torch.classes" (#34782)
Summary:
The BC check test skips adding torch.classes-based schemas to the existing
schemas. Removed the skip.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34782

Test Plan:
cd test/backward_compatibility
python dump_all_function_schemas.py --filename new_schemas.txt
python check_backward_compatibility.py --new-schemas new_schemas.txt

Before this PR fails with:
```
Mar 15 11:12:20 Broken ops: [
Mar 15 11:12:20 	_xnnpack::conv2d_packed(Tensor X, __torch__.torch.classes.XNNPackConv2dOpContext W_prepack) -> (Tensor Y)
Mar 15 11:12:20 	_xnnpack::conv2d_prepack(Tensor W, Tensor? B, int[2] stride, int[2] padding, int[2] dilation, int groups) -> (__torch__.torch.classes.XNNPackConv2dOpContext)
Mar 15 11:12:20 	_xnnpack::linear_packed(Tensor X, __torch__.torch.classes.XNNPackLinearOpContext W_prepack) -> (Tensor Y)
Mar 15 11:12:20 	_xnnpack::linear_prepack(Tensor W, Tensor? B=None) -> (__torch__.torch.classes.XNNPackLinearOpContext)
Mar 15 11:12:20 ]
```
After this PR, it passes.

Reviewed By: houseroad

Differential Revision: D20461994

Pulled By: kimishpatel

fbshipit-source-id: de692644ee7d49accf2d8260cd3a10f6e147653a
2020-03-15 14:35:19 -07:00
15c84c37b6 [PyTorch BC] Clean up the BC whitelist (#34784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34784

Remove stale items

Test Plan: ci

Reviewed By: hl475

Differential Revision: D20461740

fbshipit-source-id: 46dcc39f3a867165aadee182033b09ca65ee8551
2020-03-15 12:46:57 -07:00
08bc3c6cbf Remove unnecessary import (#34778)
Summary:
https://github.com/pytorch/pytorch/issues/34563 accidentally introduced a lint error due to an unused import. This PR removes this import.

Jit tests run as expected after this change:
```
> python test/test_jit.py
.....
Ran 2435 tests in 100.077s

OK (skipped=140, expected failures=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34778

Differential Revision: D20459708

Pulled By: tugrulince

fbshipit-source-id: bb742085fafc849ff3d9507d1557556e01fbeb4b
2020-03-15 09:56:55 -07:00
1d81bd02cc Export roi_align_gradient_op to c10 (#34776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34776

Export roi_align_gradient_op to c10

Test Plan: unittest

Reviewed By: houseroad

Differential Revision: D20459210

fbshipit-source-id: 80bf065f83bb44b39a150bae25b3591c16f522fa
2020-03-15 02:43:39 -07:00
373c80ee90 Fix missing header (#34762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34762

So far it's by luck that we somehow include "caffe2/core/tensor.h" before including "caffe2/caffe2/quantization/server/fbgemm_pack_blob.h". This is not safe and this diff fixes it.

Test Plan: unittest

Reviewed By: jianyuh

Differential Revision: D20455352

fbshipit-source-id: 777dae32a23d0ec75fd7e5e1627426b5a5f81f5a
2020-03-15 00:19:42 -07:00
6c555e1508 Revert D20311699: [pytorch][PR] [C++ API] RNN / GRU / LSTM layer refactoring
Test Plan: revert-hammer

Differential Revision:
D20311699

Original commit changeset: e2b60fc7bac6

fbshipit-source-id: 72f4a762189490998d6b716857eeac053a11742d
2020-03-14 16:18:48 -07:00
84bd71dbd4 Enable threading for XNNPACK ops. (#34547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34547

This enables threading by passing a threadpool to xnnpack ops.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20370553

fbshipit-source-id: 4db08e73f8c69b9e722b0e11a00621c4e229a31a
2020-03-14 12:53:36 -07:00
4da5569300 Pass to remove prepacking ops. (#34319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34319

Removes prepacking ops and installs them as attributes of the top-level
module. Freezing needs to run as the first pass.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20290726

fbshipit-source-id: 633ceaa867ff7d5c8e69bd814c0362018394cb3a
2020-03-14 12:53:31 -07:00
7dd5da2026 JIT pass to insert XNNPACK ops (#34048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34048

Rewrites the graph to insert xnnpack prepack and packed run ops for
conv2d and linear.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20185658

fbshipit-source-id: c4c073c912ad33e822e7beb4ed86c9f895129d55
2020-03-14 12:53:27 -07:00
4c30fc7238 Integrate XNNPACK with custom class for packing weights. (#34047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34047

This PR integrates the added xnnpack conv2d and linear ops via
custom class registration for packed weights. The packed struct
is serializable.

Test Plan:
python test test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20185657

fbshipit-source-id: fc7e692d8f913e493b293b02d92f4e78536d7698
2020-03-14 12:51:56 -07:00
e23a9dc140 [C++ API] RNN / GRU / LSTM layer refactoring (#34322)
Summary:
This PR refactors RNN / GRU / LSTM layers in C++ API to exactly match the implementation in Python API.

**BC-breaking changes:**
- Instead of returning `RNNOutput`, RNN / GRU forward method now returns `std::tuple<Tensor, Tensor>`, and LSTM forward method now returns `std::tuple<Tensor, std::tuple<Tensor, Tensor>>`, matching Python API.
- RNN / LSTM / GRU forward method now accepts the same inputs (input tensor and optionally hidden state), matching Python API.
- RNN / LSTM / GRU now have a `forward_with_packed_input` method which accepts `PackedSequence` as input and optionally a hidden state, matching the `forward(PackedSequence, ...)` variant in the Python API.
- In `RNNOptions`
    - `tanh()` / `relu()` / `activation` are removed. Instead, `nonlinearity` is added which takes either `torch::kTanh` or `torch::kReLU`
    - `layers` -> `num_layers`
    - `with_bias` -> `bias`
- In `LSTMOptions`
    - `layers` -> `num_layers`
    - `with_bias` -> `bias`
- In `GRUOptions`
    - `layers` -> `num_layers`
    - `with_bias` -> `bias`

The majority of the changes in this PR focused on refactoring the implementations in `torch/csrc/api/src/nn/modules/rnn.cpp` to match the Python API. RNN tests are then changed to reflected the revised API design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34322

Differential Revision: D20311699

Pulled By: yf225

fbshipit-source-id: e2b60fc7bac64367a8434647d74c08568a7b28f7
2020-03-14 12:09:04 -07:00
5710374e4e [reland][quant][graphmode] Add quantized conv2d-relu fusion pattern (#33279) (#34744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34744

att

Test Plan: python test/test_jit.py

Differential Revision: D20449667

Pulled By: jerryzh168

fbshipit-source-id: 01bbc26604fac421dcaacaf4fa1b57731f1f08b7
2020-03-14 01:03:18 -07:00
fb20621b3b Move torchbind out of jit namespace (#34745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34745

Test Plan: Imported from OSS

Differential Revision: D20450239

Pulled By: jamesr66a

fbshipit-source-id: 3f5597626f21d7b5e329b57da358c76b531bf806
2020-03-13 23:03:14 -07:00
8a395882ce [quant][onnx] Support conversion of quantized sigmoid operator from pytorch to caffe2 (#34629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34629

Add support for sigmoid in the conversion flow through onnx

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_quantized_sigmoid
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_small_model

Imported from OSS

Differential Revision: D20433680

fbshipit-source-id: 95943e14637d294122e4d102c5c19c06d27064c6
2020-03-13 22:42:06 -07:00
af28915164 [quant][onnx] Add support to convert max_pool2d quantized pytorch op to C2 (#33945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33945

Add mapping for this operator in symbolics

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_max_pool2d

Imported from OSS

Differential Revision: D20433681

fbshipit-source-id: 88f02ade698262a6f8824671830bc1f7d40bbfa6
2020-03-13 22:40:49 -07:00
d041d0784e [C++ API] RNNCell / LSTMCell / GRUCell layers (#34400)
Summary:
This PR adds `RNNCell` / `LSTMCell` / `GRUCell` layers to the C++ frontend, with implementations exactly matching the Python API equivalent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34400

Differential Revision: D20316859

Pulled By: yf225

fbshipit-source-id: bb7cee092622334043c0d0fd0fcb4e75e707699c
2020-03-13 21:52:24 -07:00
68758b2fa0 Add the quantized batch_norm3d and also batch_norm3d fused with relu operators (#34702)
Summary:
as title, for bringing up the quantized video model. Will add the batch_norm_relu test in another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34702

Differential Revision: D20436092

Pulled By: lly-zero-one

fbshipit-source-id: 116bd306f7880bfd763d8575654fbd6c92818338
2020-03-13 20:30:28 -07:00
da11646db1 [C++ API] Link to module options doc for functional that has same options as module (#34752)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34752

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20452681

Pulled By: yf225

fbshipit-source-id: 06b56a08bd480999353ebbff39c035225e4070df
2020-03-13 20:19:43 -07:00
7dee36a061 .circleci: Remove CUDA 10.0, no longer needed (#34726)
Summary:
Since we've added CUDA 10.2, it is time to retire CUDA 10.0

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34726

Differential Revision: D20453081

Pulled By: seemethere

fbshipit-source-id: fd5bb35325a5f1577d0f0404d16cd7dfe34c86ad
2020-03-13 18:55:45 -07:00
52005b551c invokeOperatorFromPython: support overloaded operator calling (#34671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34671

Like the python arg parser, this tries each schema in order.
It introduces schema_match_exception, which gets thrown when the schema doesn't match,
allowing the overload handler to try the next option.

Behavior will not 100% match the schema argument parser but should work for
simple cases using custom binding.
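
For illustration only, the same try-each-schema-in-order strategy sketched in Python (the real implementation is C++; `SchemaMatchError` and the toy matcher stand in for schema_match_exception and real schema matching):

```python
class SchemaMatchError(Exception):
    pass

def match_schema(arg_types, args):
    # toy matcher: require exact arity and isinstance on each argument
    if len(arg_types) != len(args) or not all(
            isinstance(a, t) for a, t in zip(args, arg_types)):
        raise SchemaMatchError()
    return args

def invoke_overloaded(overloads, *args):
    for arg_types, fn in overloads:
        try:
            bound = match_schema(arg_types, args)
        except SchemaMatchError:
            continue  # no match; fall through to the next overload, in order
        return fn(*bound)
    raise TypeError("no overload matched the given arguments")

overloads = [((int, int), lambda a, b: a + b),
             ((str, str), lambda a, b: a + b)]
print(invoke_overloaded(overloads, 1, 2))      # 3
print(invoke_overloaded(overloads, "a", "b"))  # ab
```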

Test Plan: Imported from OSS

Differential Revision: D20432206

Pulled By: zdevito

fbshipit-source-id: 280839a2205ea3497db3a9b5741fccc1e2bff9a8
2020-03-13 18:46:03 -07:00
ab76a8206f [JIT][mobile] Support built-in Function call in lite interpreter (#34676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34676

Test Plan: Imported from OSS

Differential Revision: D20427938

Pulled By: jamesr66a

fbshipit-source-id: 79eebfa858776f26da55ffd49d3f78fa7ae0df9b
2020-03-13 18:24:18 -07:00
af3a7e2b50 [jit] small cleanups after script:: removal (#34677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34677

1. Remove remaining uses of `script::` namespace from the codebase,
2. Add one more typedef for `script::ExtraFilesMap` which is part of the
public interface.

Pull Request resolved: #34580

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D20431739

Pulled By: suo

fbshipit-source-id: a29d369c755b6506c53447ca1f286b6339222c9a
2020-03-13 17:56:16 -07:00
e7910aa9e5 [fix] use non-inplace for insert observer pass (#34190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34190

In-place modification of ClassType might affect other tests, so we want to do non-in-place modifications.
The inplace argument will be removed soon anyway.

Test Plan:
ci

Imported from OSS

Differential Revision: D20451765

fbshipit-source-id: e87ad528c4e7f84f5774b94a8e3e85568269682d
2020-03-13 17:25:07 -07:00
1734bd6871 skip mask_rcnn test (#34734)
Summary:
fix master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34734

Differential Revision: D20447607

Pulled By: eellison

fbshipit-source-id: 165c64f0484abf068b7d3a204a6bcb623ffe0910
2020-03-13 15:50:49 -07:00
6d790c3611 Mark PyTorch incompatible with python-3.6.0 (#34724)
Summary:
Per https://github.com/pytorch/pytorch/issues/19161 PyTorch is incompatible with 3.6.0 due to the missing `PySlice_Unpack`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34724

Test Plan: CI + try to load pytorch binary using python-3.6.0

Differential Revision: D20449052

Pulled By: malfet

fbshipit-source-id: 2c787fc64f5d1377c7f935ad2f3c77f46723d7dd
2020-03-13 15:22:34 -07:00
aedffdf7d8 Support for Tensor Shape Type Hint (#34595)
Summary:
This PR is related to [https://github.com/pytorch/pytorch/issues/33953](https://github.com/pytorch/pytorch/issues/33953).
I've created a directory `type_hint_tests` for the example as suggested by zou3519 [here](https://github.com/pytorch/pytorch/issues/33953#issuecomment-597716405). This directory is supposed to contain examples over which mypy will run. I've added the test in `test/test_type_hints.py`.
The test can simply be invoked by
```
$ python3 test/test_type_hints.py
Fail to import hypothesis in common_utils, tests are not derandomized
.b'test/type_hint_tests/size.py:7: error: Tuple index out of range\ntest/type_hint_tests/size.py:8: error: Tuple index out of range\n'
.
----------------------------------------------------------------------
Ran 2 tests in 13.660s

OK

```
Note that I have not yet fixed the stub, in order to show that the test works. The issue can be fixed by changing the definition of Size to `class Size(Tuple[_int, ...]): ...` in `/torch/__init__.pyi.in`.
After changing the `Size` definition, the test passes.
```
$ python3 test/test_type_hints.py
Fail to import hypothesis in common_utils, tests are not derandomized
.b''
.
----------------------------------------------------------------------
Ran 2 tests in 19.382s

OK
```
I will do that once I get approval from zou3519. This is an initial implementation; please provide your suggestions.
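
A hypothetical reconstruction of what `test/type_hint_tests/size.py` might contain (the actual file contents are not shown in this message):

```python
import torch

s = torch.randn(2, 3).size()
h: int = s[0]  # per the mypy output, line 7 errors until Size is Tuple[_int, ...]
w: int = s[1]  # line 8
print(h, w)
```
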
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34595

Differential Revision: D20441817

Pulled By: zou3519

fbshipit-source-id: 00a434adf5bca813960f4efea38aa6d6953fe85f
2020-03-13 15:16:24 -07:00
c9ed111894 [caffe2][quantization] Add initializer and precision as read-only property to QueryTensorQparam (#34706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34706

as title

Test Plan: test in stacked diff

Reviewed By: csummersea

Differential Revision: D20436618

fbshipit-source-id: e51ef0a22708425cd296c05f4089fe8c98eda90a
2020-03-13 15:09:35 -07:00
c371c3aba7 [rpc][profiler] add a test case to verify record_function context manager works (#34511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34511

With https://github.com/pytorch/pytorch/pull/34122/files, issues
with using the record_function context manager while profiling RPCs were fixed. This
adds a test case to verify that we can use RPC with the `record_function`
context manager.
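
For reference, a minimal local sketch of the `record_function` context manager being exercised (the RPC setup itself is omitted; the label and workload are illustrative):

```python
import torch

with torch.autograd.profiler.profile() as prof:
    with torch.autograd.profiler.record_function("my_rpc_block"):
        # RPC calls issued here would be attributed to "my_rpc_block"
        y = torch.ones(2, 2) + 1
print(prof.key_averages().table())
```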
ghstack-source-id: 100109932

Test Plan: Unit test change

Differential Revision: D20352242

fbshipit-source-id: d6429e4352ad3b8d874dc0f27b23ecb6202e6b2b
2020-03-13 15:03:30 -07:00
0f3b6f3dec Add min function to cuda math compat (#34723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34723

Add min function to cuda math compat

Test Plan: unittest

Reviewed By: houseroad

Differential Revision: D20444517

fbshipit-source-id: 1a93343cc57249ef1101eeb7ef373266f6a2873a
2020-03-13 14:31:09 -07:00
a730abd997 [PyTorch][tools] Add linux64 clang-format hash
Summary:
This commit adds a reference hash for the linux64 clang-format binary and in
doing so, enables this script to be used on Linux machines.

Test Plan:
Ran the script.

```
meghanl@devvm1517:caffe2  (ff25240c|remote/master)$ export http_proxy=fwdproxy:8080
meghanl@devvm1517:caffe2  (ff25240c|remote/master)$ export https_proxy=fwdproxy:8080
meghanl@devvm1517:caffe2  (ff25240c|remote/master)$ python3 ./tools/clang_format_new.py --diff
Downloading clang-format to /data/users/meghanl/fbsource/fbcode/caffe2/.clang-format-bin
0% |################################################################| 100%
Using clang-format located at /data/users/meghanl/fbsource/fbcode/caffe2/.clang-format-bin/clang-format
meghanl@devvm1517:caffe2  (ff25240c|remote/master)$ echo $?
1
```
A non-zero return code indicates that `clang-format` will make changes.

Reviewed By: suo

Differential Revision: D20434291

fbshipit-source-id: fa13766e9d94720d4b0d8a540d2f1507e788f7a5
2020-03-13 14:22:17 -07:00
f933fa3613 [docs][1.5] update RPC docs to reflect correct use of dist_autograd backwards and dist_optim step() (#34670)
Summary:
- Clarify that `torch.distributed.autograd.backward()` does not use the current thread-local autograd context; instead, it looks the context up based on the context_id passed in (see the sketch after this list)
- Clarify the same for `torch.distributed.optim.DistributedOptimizer.step()`
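
A minimal single-process sketch of the documented usage (the worker name, port, and toy loss are illustrative):

```python
import os
import torch
import torch.distributed.rpc as rpc
import torch.distributed.autograd as dist_autograd

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

t = torch.ones(2, 2, requires_grad=True)
with dist_autograd.context() as context_id:
    loss = (t * 2).sum()
    # backward() finds the autograd context via the explicit context_id,
    # not via thread-local state; the distributed optimizer's step() takes
    # the same context_id.
    dist_autograd.backward(context_id, [loss])
    grads = dist_autograd.get_gradients(context_id)
    print(grads[t])

rpc.shutdown()
```
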
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34670

Differential Revision: D20427645

Pulled By: rohan-varma

fbshipit-source-id: a1a88de346cdd4dbe65fb2b7627157f86fd2b6a3
2020-03-13 14:09:23 -07:00
c9023e3b12 Support left and right shift operators in JIT (#34563)
Summary:
With this PR, we can now support left and right shift operators in the JIT engine for <int, int> and <Tensor, int>.

Updated tests pass as expected:
```
> python test/test_jit.py
...
Ran 2427 tests in 84.861s

OK (skipped=139, expected failures=1)
```

Running the following code with Python results in the output below:
```
> cat ~/expressions.py
import torch

torch.jit.script
def fn(a, b):
    # type: (int, int)
    return (
        a << b,  # supported
        b >> a,  # supported
        a & b,
        a | b,
        a ^ b
    )
print(fn.graph)
```

```
> python ~/expressions.py
graph(%a.1 : int,
      %b.1 : int):
  %4 : int = aten::leftshift(%a.1, %b.1) # /home/ince/expressions.py:7:8
  %7 : int = aten::rightshift(%b.1, %a.1) # /home/ince/expressions.py:8:8
  %10 : int = aten::__and__(%a.1, %b.1) # /home/ince/expressions.py:9:8
  %13 : int = aten::__or__(%a.1, %b.1) # /home/ince/expressions.py:10:8
  %16 : int = aten::__xor__(%a.1, %b.1) # /home/ince/expressions.py:11:8
  %17 : (int, int, int, int, int) = prim::TupleConstruct(%4, %7, %10, %13, %16)
  return (%17)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34563

Differential Revision: D20434209

Pulled By: tugrulince

fbshipit-source-id: 886386c59755106e17b84778b8e495b80a6269cd
2020-03-13 13:00:33 -07:00
c34ee4fb6e [JIT] disable test (#34722)
Summary:
I opened https://github.com/pytorch/pytorch/issues/34658 but it didn't work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34722

Differential Revision: D20444547

Pulled By: eellison

fbshipit-source-id: 90aa06098587b48c9760a9c6df9bec01d642fcdb
2020-03-13 12:48:27 -07:00
027d7f7ba5 Delete AT_WARN and replace all AT_WARN with TORCH_WARN (#34623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34623

The bandaid of "AT_WARN" keeps introducing new warnings. Let's get rid
of it entirely.

Close #34502

Test Plan: Imported from OSS

Differential Revision: D20420112

Pulled By: albanD

fbshipit-source-id: 7160c113cb4deb2d2f50a375356f423fe5e86f50
2020-03-13 12:27:22 -07:00
4a599f47fb scripts: Add script to promote conda packages (#34659)
Summary:
How this actually works:
  1. Gets a list of URLs from anaconda for pkgs to download, most
  likely from pytorch-test
  2. Downloads all of those packages locally in a temp directory
  3. Uploads all of those packages, with a dry-run upload by default

This, along with https://github.com/pytorch/pytorch/issues/34500, basically completes the scripting work for the eventual promotion pipeline.

Currently testing with:
```
TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 PYTORCH_CONDA_FROM=pytorch scripts/release/promote/conda_to_conda.sh
```

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34659

Differential Revision: D20432687

Pulled By: seemethere

fbshipit-source-id: c2a99f6cbc6a7448e83e666cde11d6875aeb878e
2020-03-13 12:14:58 -07:00
b1dbe33056 Skip TestNN.test_spectral_norm_load_state_ if PyTorch is compiled w… (#34686)
Summary:
…ithout LAPACK

LAPACK is needed for `at::svd`, which is called from `pinverse()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34686

Test Plan: CI + local run

Differential Revision: D20442637

Pulled By: malfet

fbshipit-source-id: b3531ecc1197b0745ddcf50febb7fb4a7700d612
2020-03-13 11:36:33 -07:00
40eff454ce Fix max_pool2d NHWC for large tensors; fix incorrect use of cudaGetLastError() (#34519)
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/33988 and fix https://github.com/pytorch/pytorch/issues/34083.

Previously, the max_pool2d_nhwc kernels used shared memory with a size proportional to the tensor size (c \* h \* w). When the tensor is too large, the kernel launch fails.

This PR follows the guidance in AdaptiveAvgPool2d_nhwc by increasing grid_x and splitting along the "C" dimension. With that change, the shared memory size has a fixed upper bound (less than 48 KB) regardless of tensor size.

A benchmark can be found [here](0b98146089/max-pool2d/max-pool2d.ipynb). TL;DR: barely any performance drop was found.

cc csarofeen ptrblck jjsjann123 VitalyFedyunin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34519

Differential Revision: D20388848

Pulled By: VitalyFedyunin

fbshipit-source-id: 9454f385f9315afaab4a05303305578bbcd80b87
2020-03-13 11:28:49 -07:00
3924c55f4c [C++ API] Update torch::nn functional docs (#34688)
Summary:
- `torch::nn::functional` functions must provide example for how to use the corresponding functional options
- `torch::nn::functional` functions must link to the corresponding functional options
- remove `TORCH_NN_FUNCTIONAL_USE_MODULE_OPTIONS` macro, and put `torch::nn::functional` options docs inside the functional namespace, right above functional declaration
- `torch::nn::functional` options docs should not link back to torch::nn layers. Instead, they should  have links to `torch::nn::functional::xxx`

----

This PR is BC-breaking in the following way:
`TORCH_NN_FUNCTIONAL_USE_MODULE_OPTIONS` macro is removed, and user should explicitly write
```cpp
namespace functional {
using SomeFuncOptions = SomeModuleOptions;
} // namespace functional
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34688

Differential Revision: D20431251

Pulled By: yf225

fbshipit-source-id: 7d4f27dca3aad2a1e523690927d7afb261b9d308
2020-03-13 10:27:28 -07:00
27410318ad [PyTorch][Mobile] Fix the operator latency issue.
Summary: The last diff enabled operator stats for non-production builds, including AIBench. But the reported operator latency is off: https://our.intern.facebook.com/intern/aibench/details/414567479798816. It represents the operator execution end time, and because threadLocalDebugInfo was not set, the start time is 0. So this diff fixes it by creating a new ThreadLocalDebugInfo object when an op starts to run and storing the model information there for logging.

Test Plan:
```buck run mode/mac aibench:run_bench_macos -- -b aibench/specifications/models/pytorch/pytext/pytext_mobile_inference.json --platform android --framework pytorch --remote --devices SM-G960F-8.0.0-26```
https://our.intern.facebook.com/intern/aibench/details/922804117425407

```buck run mode/mac aibench:run_bench_macos -- -b aibench/specifications/models/pytorch/fbnet/fbnet_mobile_inference.json --platform android --framework pytorch --remote --devices SM-G960F-8.0.0-26```
https://our.intern.facebook.com/intern/aibench/details/593403202250750

Reviewed By: xta0

Differential Revision: D20436388

fbshipit-source-id: 740bc94c3f51daef6af9b45c1ed7a708f5fc8836
2020-03-13 09:49:54 -07:00
8e8a37d746 Fix bug in baddbmm corner case (#33467) (#33538)
Summary:
Ensure `torch.baddbmm(c, a, b)` returns `beta*c` when `a @ b` has an empty inner dimension.

Fixes https://github.com/pytorch/pytorch/issues/33467.
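
For illustration, a minimal sketch of the fixed behavior (shapes and values are arbitrary, not taken from the PR):
```python
import torch

# `a @ b` has an empty inner dimension (k == 0), so the batched matmul
# term is all zeros and the result must reduce to beta * c.
c = torch.ones(2, 3, 4)
a = torch.randn(2, 3, 0)
b = torch.randn(2, 0, 4)
out = torch.baddbmm(c, a, b, beta=0.5)
print(torch.allclose(out, 0.5 * c))  # True with this fix
```
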
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33538

Differential Revision: D20352352

Pulled By: albanD

fbshipit-source-id: a7021c1979f82402ecea4784d6cc39783392ea16
2020-03-13 09:30:20 -07:00
8f854fb9e2 [1/n][multi-tower] add partition info in predictor construction (#34175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34175

to incorporate PartitionInfo added in D20015493

Test Plan: unit tests

Reviewed By: yinghai

Differential Revision: D20133759

fbshipit-source-id: 130db2d80bca3c05a7ec91292159f857046718e0
2020-03-13 09:23:39 -07:00
14c1ab049d [Codemod][FBSourceGoogleJavaFormatLinter] Daily arc lint --take GOOGLEJAVAFORMAT
Reviewed By: zertosh

Differential Revision: D20415422

fbshipit-source-id: 860f8dd9dce0a2420792bafb7d3e58bd883ab7e4
2020-03-13 06:27:03 -07:00
b93518a662 Revert D20422879: [pytorch][PR] Remove hotpatches that circumvent MAGMA bug
Test Plan: revert-hammer

Differential Revision:
D20422879

Original commit changeset: 8dd7a30b5c31

fbshipit-source-id: a44dda3220d426a92b0e158e9903566be8701374
2020-03-13 06:00:11 -07:00
6791ae51a5 Updating submodules
Summary:
GitHub commits:

e8f09733c7
7e1606a407
674cf41732
e961892c6c
a5dffd2784

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: eb2e20f65ba40bacbfeb1d0cb54ed373cca564ff
2020-03-13 04:17:59 -07:00
fd35596585 [docs][1.5] Update distributed autograd note (#34657)
Summary:
- Update API calls `backward` and `optim.step` now that we require `context_id`
- Add notes to clarify purpose of distributed autograd context (this was a source of confusion in some feedback)
- Add note that details why optimizer requires context_id
- Clearly specify that we don't have SMART mode yet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34657

Differential Revision: D20427667

Pulled By: rohan-varma

fbshipit-source-id: 5f8a3539ccf648a78e9e9a0dfdfe389c678b1606
2020-03-12 22:56:32 -07:00
808f84ee35 [Shape Inference] Update shape inference in dper3 backend - C2 part (#34474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34474

Add InferQuantization - set current_dim_type_ to CONSTANT for quantization ops.

Test Plan: buck test mode/opt-clang caffe2/caffe2/opt:bound_shape_inference_test

Reviewed By: yinghai

Differential Revision: D20332703

fbshipit-source-id: 36fa9bc81ae9f49dd00d8393d99ccce0884542df
2020-03-12 22:20:51 -07:00
ad4bc8c9b8 Best-effort Error Detection for Using Deleted UserRRefs (#34673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34673

Test Plan: Imported from OSS

Differential Revision: D20427839

Pulled By: mrshenli

fbshipit-source-id: b1b12ca42a9ed5294806c53fa7d6f54e7dc8b188
2020-03-12 21:39:15 -07:00
f9aa0c870f Use c10::str in py_rref.cpp (#34681)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34681

Test Plan: Imported from OSS

Differential Revision: D20428827

Pulled By: mrshenli

fbshipit-source-id: 847486b3114f0e9a2ad5f80c5e44db82d977c6a2
2020-03-12 21:39:10 -07:00
673d56c838 Use c10::str in process_group_agent.cpp (#34679)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34679

Test Plan: Imported from OSS

Differential Revision: D20428467

Pulled By: mrshenli

fbshipit-source-id: 2bfde4e383347c6e709109f074f55b9bc8068a49
2020-03-12 21:38:14 -07:00
e9a660a160 Revert D20354878: [quant][graphmode] Add quantized conv2d-relu fusion pattern
Test Plan: revert-hammer

Differential Revision:
D20354878

Original commit changeset: 2b19797d4b3f

fbshipit-source-id: 18f447074794af0d579e145df02af47d01746921
2020-03-12 21:29:08 -07:00
5d65b5cd01 Add the 3d upsample quantized op for video model (#34594)
Summary:
As titled: we are currently missing this 3d op, which is required for video-related models.

Performance benchmark:
```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 64, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 4, 1, 2, 3])

    x = x.permute([0, 4, 1, 2, 3])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.interpolate(x, size=30, scale_factor=None, mode="nearest", align_corners=None)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.functional.interpolate(q_x, size=30, scale_factor=None, mode="nearest", align_corners=None)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

```
**** torch.qint8 *****
time/iter ms (float)  time/iter ms (quant)  quant/float
1136.8209528923035  1.294245719909668 0.0011384780660638283
GB/s float  GB/s quant
0.20510608588517917 45.03953391792442
**** torch.quint8 *****
time/iter ms (float)  time/iter ms (quant)  quant/float
827.9890131950378 1.11464262008667  0.0013462046021426
GB/s float  GB/s quant
0.28160868355034036 52.29678369508914
**** torch.qint32 *****
time/iter ms (float)  time/iter ms (quant)  quant/float
834.6958303451538 7.481417655944824 0.008963046638020456
GB/s float  GB/s quant
0.2793459455806586  31.16640544920269
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34594

Differential Revision: D20389106

Pulled By: lly-zero-one

fbshipit-source-id: d3a8c2cac58087d8b29e9cae64822f5b2d4c03ba
2020-03-12 21:06:38 -07:00
d5f8c8f3ba Revert D20121169: [pytorch][PR] ONNX Export Support for CrossEntropyLoss
Test Plan: revert-hammer

Differential Revision:
D20121169

Original commit changeset: 7b56617e8c60

fbshipit-source-id: d7f302d1e54f3c978c3be0a0ad1ee600790a5b27
2020-03-12 20:30:54 -07:00
4ae74b3b25 [DPER3][Shape Inference] Initial Shape Inference in DPER3 frontend (#33607)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33607

Differential Revision: D20025048

fbshipit-source-id: 8b3a3bcfeb450de4d38c555bf2bb116ddedad3ec
2020-03-12 20:25:50 -07:00
0ff4d37933 [quant][graphmode] Add quantized conv2d-relu fusion pattern (#33279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33279

As titled.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20354878

fbshipit-source-id: 2b19797d4b3fd96918164a58bfbd768211ad6c6d
2020-03-12 19:49:57 -07:00
44256199a9 [JIT] remove specialized list ops (#34520)
Summary:
Now that lists are no longer specialized, we can register only one operator for list ops that are generic to their element type.
This PR reorgs lists into three sets of ops:
- CREATE_GENERIC_LIST_OPS
- CREATE_SPECIALIZED_LIST_OPS
- CREATE_COMPARATOR_LIST_OPS_SPECIALIZED (we didn't bind certain specialized ops to Tensor)

This is important to land quickly because mobile is finalizing its bytecode soon, after which we could not remove these ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34520

Reviewed By: iseeyuan

Differential Revision: D20429775

Pulled By: eellison

fbshipit-source-id: ae6519f9b0f731eaa2bf4ac20736317d0a66b8a0
2020-03-12 17:49:23 -07:00
c78eacb5ee scripts: Add promotion script for s3 to pypi (#34500)
Summary:
This relies on the scripts for promotion from s3 to s3 having already run.

A continuation of the work done in https://github.com/pytorch/pytorch/issues/34274

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34500

Test Plan: yeah_sandcastle

Differential Revision: D20389101

Pulled By: seemethere

fbshipit-source-id: 5e5b554cff964630c5414d48be35f14ba6894021
2020-03-12 17:21:23 -07:00
52787388d2 [tools] Add clang_format_new.py to download, verify and run clang-format binary (#34566)
Summary:
**Summary**
This commit adds `tools/clang_format_new.py`, which downloads a platform-appropriate
clang-format binary to a `.gitignored` location, verifies the binary by comparing its
SHA1 hash to a reference hash (also included in this commit), and runs it on all files
matching a specific regex in a list of whitelisted subdirectories of pytorch.

This script will eventually replace `tools/clang_format.py`.

**Testing**
Ran the script.

*No Args*
```
pytorch > ./tools/clang_format.py
Downloading clang-format to /Users/<user>/Desktop/pytorch/.clang-format-bin
0% |################################################################| 100%
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
> echo $?
0
> git status
<bunch of files>
```

`--diff` *mode*
```
> ./tools/clang_format.py --diff
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
Some files are not formatted correctly
> echo $?
1

<format files using the script>

> ./tools/clang_format.py --diff
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
All files are formatted correctly
> echo $?
0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34566

Differential Revision: D20431290

Pulled By: SplitInfinity

fbshipit-source-id: 3966f769cfb923e58ead9376d85e97127415bdc6
2020-03-12 17:08:54 -07:00
90ca7a1feb [quant][graphmode] Add Finalize function that inlines graph and produce quantized ops (#33927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33927

Test Plan:
test will be added in later PRs

Imported from OSS

Differential Revision: D20354879

fbshipit-source-id: 03976f4b86c46dbdc4e45764a1e72f1a3855a404
2020-03-12 14:52:58 -07:00
9f05fc9322 [Aten] First argument of check_names_valid_for() should be an unsigned value (#34158)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34158

Test Plan: CI

Reviewed By: EscapeZero

Differential Revision: D20232089

fbshipit-source-id: d74b5e36a139998e6967b7b6339001c49d9d58e8
2020-03-12 13:46:37 -07:00
721bd11cc3 [caffe2] Refactor out common util functions from tvm_transformer (#34652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34652

Split from D20006007 because it needs to synced to open source and also for easy testing & landing.

Test Plan:
```
buck test caffe2/caffe2/fb/tvm:test_tvm_transform
```
CI

Reviewed By: yinghai

Differential Revision: D20414037

fbshipit-source-id: 6e17dd9f8cffe87bc59c6e3cc6fd1f8d8def926b
2020-03-12 13:30:15 -07:00
787c307e63 Revert D20368543: [pytorch][PR] [JIT] remove specialized list ops
Test Plan: revert-hammer

Differential Revision:
D20368543

Original commit changeset: ad0c6d70d2a6

fbshipit-source-id: b8b1a64ac830d5f544567714b940c57274194d3f
2020-03-12 12:55:49 -07:00
8c332ff84f [JIT] EliminateDeadCode shouldn't remove custom operator node that has untracked mutation (#34635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34635

A custom op could be removed in the EliminateDeadCode IR optimization step, causing wrong training results.

EliminateDeadCode decides to remove the op because it has no output, so no output is used. It also assumes the op has no side effects and no untracked mutations, which is not true: a custom op can have untracked mutations.

The if statement here only allows aten and prim operators to have untracked mutations; that restriction should be removed.
ghstack-source-id: 100001319

Test Plan:
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_jit

buck build mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_jit \
&& buck-out/gen/caffe2/torch/fb/distributed/pytorch/tests/test_jit\#binary.par -r test_use_dense_adagrad_step
```

Reviewed By: wanchaol

Differential Revision: D7440221

fbshipit-source-id: e424417ab397d90075884c7050c59dfc5c84cf77
2020-03-12 12:37:32 -07:00
fe9b4e3cba [DPER3] Blob Reorder (#33579)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33579

Differential Revision: D20008865

fbshipit-source-id: f35aded311d9d1d7d438d828ccabd2bab5575e5c
2020-03-12 12:28:12 -07:00
9e6cd98c3f Ensure torch_cuda is linked against on Windows (#34288)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31611.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34288

Differential Revision: D20314251

Pulled By: seemethere

fbshipit-source-id: 15ab2d4de665d553a1622a2d366148697deb6c02
2020-03-12 12:16:44 -07:00
31cd893899 remove some TH dead code (#34644)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34644

Test Plan: Imported from OSS

Differential Revision: D20423063

Pulled By: ngimel

fbshipit-source-id: 2783345ea9b3ed65e51a7d0e17cfa29f2c12cc43
2020-03-12 12:10:32 -07:00
cb06cb7b9f Remove hotpatches that circumvent MAGMA bug (#34357)
Summary:
Changelog:
- The magma implementation of small singular square batch matrices had a bug that resulted in nan values in the LU factorization result. This has been fixed in MAGMA 2.5.2. This PR removes the existing patch that was a temporary workaround for this bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34357

Test Plan: - Existing tests for det and lu should pass

Differential Revision: D20422879

Pulled By: seemethere

fbshipit-source-id: 8dd7a30b5c31fc5b844e0a11965efd46067e936a
2020-03-12 11:59:23 -07:00
a74fbea345 Continuous bernoulli distribution (take 2) (#34619)
Summary:
We recently had a NeurIPS paper (https://arxiv.org/abs/1907.06845 and https://papers.nips.cc/paper/9484-the-continuous-bernoulli-fixing-a-pervasive-error-in-variational-autoencoders) where we introduce a new [0,1]-supported distribution: the continuous Bernoulli. This pull request implements this distribution in pytorch.
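
A minimal usage sketch (parameter values are arbitrary):
```python
import torch
from torch.distributions import ContinuousBernoulli

dist = ContinuousBernoulli(probs=torch.tensor([0.3]))
x = dist.rsample()       # reparameterized sample in [0, 1]
lp = dist.log_prob(x)    # properly normalized log-density
print(x, lp)
```
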
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34619

Differential Revision: D20403123

Pulled By: ngimel

fbshipit-source-id: d807c7d0d372c6daf6cb6ef09df178bc7491abb2
2020-03-12 11:53:18 -07:00
944ea4c334 ONNX Export Support for CrossEntropyLoss (#33767)
Summary:
Add ONNX export support for torch.nn.CrossEntropyLoss.
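
A hedged export sketch; the wrapper module, the file name, and the opset version are illustrative assumptions, not taken from the PR:
```python
import torch
import torch.nn as nn

class WithLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.loss = nn.CrossEntropyLoss()

    def forward(self, logits, target):
        return self.loss(logits, target)

logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))
# SoftmaxCrossEntropyLoss is an opset-12 ONNX op, hence the version here.
torch.onnx.export(WithLoss(), (logits, target), "ce_loss.onnx", opset_version=12)
```
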
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33767

Reviewed By: hl475

Differential Revision: D20121169

Pulled By: houseroad

fbshipit-source-id: 7b56617e8c60617b922949fc8b4ecc626eedf7ed
2020-03-12 11:46:58 -07:00
352e9b11e0 Attempt to resolve inconsistent dll linkage warnings on MSVC (#34639)
Summary:
Continue the work in https://github.com/pytorch/pytorch/pull/19242.
Remove the template declarations that imply different dll linkage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34639

Differential Revision: D20419400

Pulled By: ezyang

fbshipit-source-id: 5c7c30f0a4c3ba555589629f352ddb1c006c0c54
2020-03-12 11:41:02 -07:00
fff6fe83a7 [pytorch-rpc] WireSerializer should check has_storage() (#34626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34626

We need to check has_storage() before looking at it in
cloneSparseTensors(), to avoid gratuitously throwing.

Ideally, we'd add a test for this (I wrote one up but had to disable it),
but it won't work until the JIT Pickler supports sparse tensors.
ghstack-source-id: 100018077

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcAgent/...

Differential Revision: D20399971

fbshipit-source-id: 5debfa8140eb1f949d37336330223962cc320abc
2020-03-12 11:35:21 -07:00
2f32b92763 [ROCm] Enable BFloat16 type for EmbeddingBag ops et al (#34630)
Summary:
This PR enables bfloat16 type for

- Embedding, Index, Sigmoid Ops used in [DLRM](https://github.com/facebookresearch/dlrm)
- Miscellaneous ops like comparison ops, arange op used in unit tests
- Also renames type lists with the pattern `*_with_bfloat16` in `test_torch.py` to avoid confusion

iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34630

Differential Revision: D20405093

Pulled By: ezyang

fbshipit-source-id: aa9538acf81b3a5a9a46ce5014529707fdf25687
2020-03-12 11:30:33 -07:00
1e6c47413a Updating submodules
Summary:
GitHub commits:

87f3feae5a
cd6c8897f5

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 0c961541c715da74ae417ad25bf29f48e74e45d1
2020-03-12 11:23:39 -07:00
d81d65b2f7 Add entry for distributed tests to CODEOWNERS. (#34637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34637

ghstack-source-id: 100003837

Test Plan: waitforbuildbot

Differential Revision: D20404552

fbshipit-source-id: a7f35beb8b78ad25e5cd000cd940dd7e94cc65de
2020-03-12 11:17:51 -07:00
f9f8424386 [JIT] remove specialized list ops (#34520)
Summary:
Now that lists are no longer specialized, we can register only one operator for list ops that are generic to their element type.
This PR reorgs lists into three sets of ops:
- CREATE_GENERIC_LIST_OPS
- CREATE_SPECIALIZED_LIST_OPS
- CREATE_COMPARATOR_LIST_OPS_SPECIALIZED (we didn't bind certain specialized ops to Tensor)

This is important to land quickly because mobile is finalizing its bytecode soon, after which we could not remove these ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34520

Differential Revision: D20368543

Pulled By: eellison

fbshipit-source-id: ad0c6d70d2a6be6ff0e948d6786052167fc43e27
2020-03-12 10:48:14 -07:00
3f1ba3c465 Redo of "Add API for listing functions overridable by __torch_function__" (#34240)
Summary:
This is a redo of https://github.com/pytorch/pytorch/pull/33791, which was reverted because it introduced a flaky test. The test was flaky and only flaky on Python3.5 because of dict order randomization.

I've fixed the issue with tests clobbering each other in b539fec and removed the override tests for `torch.nn.functional.tanh` and `torch.nn.functional.sigmoid`, which are deprecated and shouldn't be overridable in e0d7402. I also verified that no more test clobbering is happening.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34240

Differential Revision: D20252442

Pulled By: cpuhrsch

fbshipit-source-id: 069568e342a41c90e1dc76cbf85ba4aed47f24be
2020-03-12 10:33:17 -07:00
4e07c35679 Delete all user forks tracked in RRefContext before graceful shutting down (#31893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31893

In order to resolve the issue summarized in https://github.com/pytorch/pytorch/issues/31325.

The overall solution is to proactively send out delete-fork messages from user nodes, before user nodes detect RRef leaks.

As the first step, we want to have a weak-ref tracker to track all user RRefs.
ghstack-source-id: 100023142

Test Plan:
V22 is the version that make User to wait on delete UseerRRef message.

# Unit tests

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_nested_rref_stress --stress-runs 100

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_nested_rref_stress

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_rref_forward_chain

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_non_garbage_collected_user_rref_due_to_local_circular_dependency
```

Reviewed By: mrshenli

Differential Revision: D19292254

fbshipit-source-id: 92c3e8d0b00f183c5e22f163bdca482cc25a1ce9
2020-03-12 10:23:08 -07:00
dd313f314e Stop creating unnecessary Storage with newWithStorage1d. (#34389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34389

Test Plan: Imported from OSS

Differential Revision: D20311060

Pulled By: gchanan

fbshipit-source-id: 6d681e0a78e3ea3982d11cfd2eedca843f48302a
2020-03-12 10:18:28 -07:00
518e9f94c2 Kill newWithStorage. (#34388)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34388

Test Plan: Imported from OSS

Differential Revision: D20311059

Pulled By: gchanan

fbshipit-source-id: 4619a99c7bea76b54b7938b798eedc5bc2983dd5
2020-03-12 10:18:23 -07:00
9fd08b9c37 Get rid of newWithSize. (#34387)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34387

Test Plan: Imported from OSS

Differential Revision: D20311058

Pulled By: gchanan

fbshipit-source-id: b62653fd31a181d06aa73cda68abe75614cea0a9
2020-03-12 10:17:15 -07:00
a54416d208 [C++ API] Remove deprecated torch::nn::BatchNorm / FeatureDropout / modules_ordered_dict and torch::nn::init::Nonlinearity / FanMode (#34508)
Summary:
This PR is BC-breaking in the following way:
- The deprecated `torch::nn::BatchNorm` is removed in favor of `torch::nn::BatchNorm{1,2,3}d`
- The deprecated `torch::nn::FeatureDropout` is removed in favor of `torch::nn::Dropout{2,3}d`
- The deprecated `torch::nn::modules_ordered_dict` is removed. User should do `Sequential sequential({{"m1", MyModule(1)}, {"m2", MyModule(2)}})` instead.
- The deprecated `torch::nn::init::Nonlinearity` is removed, in favor of the following enums:
    - `torch::kLinear`
    - `torch::kConv1D`
    - `torch::kConv2D`
    - `torch::kConv3D`
    - `torch::kConvTranspose1D`
    - `torch::kConvTranspose2D`
    - `torch::kConvTranspose3D`
    - `torch::kSigmoid`
    - `torch::kTanh`
    - `torch::kReLU`
    - `torch::kLeakyReLU`
- The deprecated `torch::nn::init::FanMode` is removed, in favor of the following enums:
    - `torch::kFanIn`
    - `torch::kFanOut`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34508

Differential Revision: D20351601

Pulled By: yf225

fbshipit-source-id: cca0cd112f29a31bb023e348ca8f82780e42bea3
2020-03-12 10:09:58 -07:00
e95657b87e [C++ API] AdaptiveLogSoftmaxWithLoss (#29076)
Summary:
Implemented AdaptiveLogSoftmaxWithLoss and some tests for modules. Reference https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29076

Differential Revision: D20404588

Pulled By: yf225

fbshipit-source-id: edbadf432b8173cbcc6caf83c9c03dd92dc31a37
2020-03-12 09:53:58 -07:00
157d2d7825 Fix version check for grad_fn for views (#34145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34145

This fix the following behavior:
```python
import torch

class MyFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, inplace):
        view = inp.clone()[:3]
        if inplace:
            view += 2
        return view

    @staticmethod
    def backward(ctx, grad):
        return grad, None

base = torch.rand(10, requires_grad=True)
foo = MyFn.apply(base, False)

print(foo.grad_fn)
# <torch.autograd.function.MyFnBackward object at 0x7f5fd28c4d18>

foo = MyFn.apply(base, True)

print(foo.grad_fn)
# <AsStridedBackward object at 0x7f601c0c3cf0>
```

Where both should be printing `MyFnBackward`.

Test Plan: Imported from OSS

Differential Revision: D20229907

Pulled By: albanD

fbshipit-source-id: 5ebd315d459023017d51760c5bafe43acd5fc3e2
2020-03-12 09:47:56 -07:00
43c9cc7a9c add quantized ELU activation (#34267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34267

Adds quantized ELU.

Test Plan:
```
python test/test_quantized.py TestQuantizedOps.test_qelu
```

still need to benchmark, saving that for after the review comments

Imported from OSS

Differential Revision: D20370953

fbshipit-source-id: fe941bf966f72dd9eee2c4b2ef45fe7afb50c866
2020-03-12 09:31:00 -07:00
514cba0661 [JIT] remove builtin interpolate functions (#34514)
Summary:
`torch.nn.functional.interpolate` was written as a builtin op when we scripted the standard library, because it has four possible overloads. As a result, whenever we make a change to `interpolate`, we need to make changes in two places, and it also makes it impossible to optimize the interpolate op. The builtin is tech debt.

I talked with ailzhang, and the symbolic script changes are good to remove (I guess that makes a third place where we needed to re-implement interpolate).

I'm trying to get rid of unnecessary builtin operators because we're standardizing mobile bytecode soon, so we should try to get this landed as soon as possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34514

Differential Revision: D20391089

Pulled By: eellison

fbshipit-source-id: abc84cdecfac67332bcba6b308fca4db44303121
2020-03-12 09:21:33 -07:00
962e362427 Fix _cat operator (#34591)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34591

Test Plan: Imported from OSS

Differential Revision: D20388000

Pulled By: VitalyFedyunin

fbshipit-source-id: 8ae7593dbddc1a96a03193a99afc9a4ce46203ad
2020-03-12 09:20:10 -07:00
a22008f91e Prohibit copying autograd engines (#34567)
Summary:
Make sure that there cannot be more than one instance of either `torch::autograd::Engine` or `torch::autograd::python::PythonEngine`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34567

Test Plan: CI

Differential Revision: D20390622

Pulled By: malfet

fbshipit-source-id: c90595032afc88f552dee52901361b58b282dc1a
2020-03-12 08:06:53 -07:00
3c76b2aeea Replace THPLayout with at::Layout in Python Argument Parser (#34543) (#34584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34584

Test Plan:
```
python setup.py develop
python test/test_torch.py
```
Output:
```
...
Ran 3834 tests in 198.825s

OK (skipped=180)
```

Imported from OSS

Differential Revision: D20403330

fbshipit-source-id: 41474d5e7001db070f98ac8379f909f0ac74deb6
2020-03-12 07:19:00 -07:00
f70945b1c3 fix the quantized batchnorm2d (#34579)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34579

Differential Revision: D20382783

Pulled By: lly-zero-one

fbshipit-source-id: dadfc4974cb4c808f1eedf8cc4ec52ec8d3ea1b0
2020-03-12 00:48:40 -07:00
c235be42dd [jit] kill script namespace (#34515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515

Once upon a time we thought this was necessary. In reality it is not, so
removing it.

For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.

There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.

Test Plan: Imported from OSS

Differential Revision: D20353503

Pulled By: suo

fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
2020-03-11 23:32:48 -07:00
cf8b728255 Delete OperatorOptions, absorb AliasAnalysisKind into FunctionSchema. (#34588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34588

I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema.  Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.

Reland of https://github.com/pytorch/pytorch/pull/34160

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20387079

Pulled By: ezyang

fbshipit-source-id: d189f7a6ad8cd186b88b6fbfa3f189994eea14e8
2020-03-11 20:59:46 -07:00
b039bca4db Fix typo in data.rst (#34624)
Summary:
Fix minor typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34624

Differential Revision: D20401946

Pulled By: ngimel

fbshipit-source-id: 0c6a7d838aa15120b3ecb8b9ba4b57550c9bcd32
2020-03-11 19:40:18 -07:00
2fe7fc681d [PT] add macro to expose caffe2 ops to PyTorch mobile (#34578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34578

Right now C10_EXPORT_CAFFE2_OP_TO_C10_CPU doesn't work on mobile since we disabled some code paths. This diff adds a new macro to enable these code paths so we can register caffe2 ops in PT mobile.

Test Plan:
verified caffe2 ops are registered in PT mobile
(on the whole stack)

```
_caffe2::BBoxConcatBatchSplits(Tensor[] input_list, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output)
_caffe2::BBoxTransform(Tensor rois, Tensor deltas, Tensor im_info, float[] weights, bool apply_scale, bool rotated, bool angle_bound_on, int angle_bound_lo, int angle_bound_hi, float clip_angle_thresh, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output_0, Tensor output_1)
_caffe2::BoxWithNMSLimit(Tensor scores, Tensor boxes, Tensor batch_splits, float score_thresh, float nms, int detections_per_im, bool soft_nms_enabled, str soft_nms_method, float soft_nms_sigma, float soft_nms_min_score_thres, bool rotated, bool cls_agnostic_bbox_reg, bool input_boxes_include_bg_cls, bool output_classes_include_bg_cls, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor scores, Tensor boxes, Tensor classes, Tensor batch_splits, Tensor keeps, Tensor keeps_size)
_caffe2::GenerateProposals(Tensor scores, Tensor bbox_deltas, Tensor im_info, Tensor anchors, float spatial_scale, int pre_nms_topN, int post_nms_topN, float nms_thresh, float min_size, bool angle_bound_on, int angle_bound_lo, int angle_bound_hi, float clip_angle_thresh, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output_0, Tensor output_1)
_caffe2::HeatmapMaxKeypoint(Tensor heatmaps, Tensor bboxes_in, bool should_output_softmax=True, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor keypoints)
_caffe2::ResizeNearest(Tensor X, str order, float width_scale, float height_scale, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor Y)
_caffe2::RoIAlign(Tensor features, Tensor rois, str order, float spatial_scale, int pooled_h, int pooled_w, int sampling_ratio, bool aligned, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor)
```

Reviewed By: dreiss

Differential Revision: D20128254

fbshipit-source-id: 49a837dddc431eb528b5c72ffdfe0d0131cd10b4
2020-03-11 19:15:14 -07:00
0dc0fffca1 [net_transform] only skip ConstantFill for autogen_grad (#34628)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34628

Differential Revision: D20370564

fbshipit-source-id: 854c8ab44ba262e5020383447ed6bb629064ec33
2020-03-11 19:09:52 -07:00
86fb522acd Remove cudaMemcpy on full memory overlap (#34548)
Summary:
TensorIterator already checks for partial overlap, so there is no trivial UB, but TensorIterator allows full overlap, and it is not a bad idea to skip the memcpy in that case, as sketched below.
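
A minimal sketch of the full-overlap case that now skips the copy:
```python
import torch

x = torch.randn(1024, device="cuda")
# Source and destination share the same storage with identical strides:
# full overlap, so the copy is a no-op and the memcpy is skipped.
x.copy_(x)
```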

fixes: https://github.com/pytorch/pytorch/issues/34525
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34548

Differential Revision: D20371643

Pulled By: ngimel

fbshipit-source-id: ff9e2e872537010afe040204e008b2499af963ad
2020-03-11 17:36:03 -07:00
adb8e26182 Fix for handling batch size 0. (#34599)
Summary:
Separating this out into a different diff; since most of the
XNNPACK integration is not tested until PR https://github.com/pytorch/pytorch/issues/34047, this was not
caught until then.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34599

Test Plan: Tested in test/test_xnnpack_integration.py via https://github.com/pytorch/pytorch/issues/34047.

Differential Revision: D20391000

Pulled By: kimishpatel

fbshipit-source-id: 596a3e54445072ab63f700d425d07c7f44586683
2020-03-11 16:36:28 -07:00
9064fafb6e [C++ API] Update torch::nn layer docs (#34522)
Summary:
This PR updates C++ API torch::nn layer docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34522

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20380832

Pulled By: yf225

fbshipit-source-id: ee99a838ec05c6ce2a23aa97555707e507d09958
2020-03-11 16:09:09 -07:00
56832bf7f3 [JIT] Add support for tolist for GPU-resident Tensors (#34554)
Summary:
**Summary**
This commit modifies the JIT implementation of `Tensor.tolist` so that it
can be called on GPU-resident Tensors as well. If the Tensor is not on the
CPU when the operator is invoked, it is copied to the CPU before doing any
of the rest of the work to convert it into a list.
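
A minimal TorchScript sketch; the explicit `torch.jit.annotate` call follows the pattern of the existing CPU tests:
```python
import torch
from typing import List

@torch.jit.script
def gpu_tolist(x: torch.Tensor) -> List[float]:
    # tolist() in TorchScript needs the element type and dimension
    # annotated; with this change, x may also live on the GPU.
    li = torch.jit.annotate(List[float], x.tolist())
    return li

if torch.cuda.is_available():
    print(gpu_tolist(torch.arange(3.0, device="cuda")))  # [0.0, 1.0, 2.0]
```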

**Testing**
This commit adds GPU versions of some of the existing CPU tests for this
feature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34554

Differential Revision: D20392604

Pulled By: SplitInfinity

fbshipit-source-id: 69c17b98d866428c19d683588046169538aaf1e3
2020-03-11 15:14:12 -07:00
866505b100 [ci] try to fix rocm builds (#34600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34600

They are failing with:
```
E: The method driver /usr/lib/apt/methods/https could not be found.
```

Trying the solution recommended in: https://unix.stackexchange.com/questions/263801/apt-get-fails-the-method-driver-usr-lib-apt-methods-https-could-not-be-found

The long-term solution is to move all this to be pre-installed in the
docker image.

Test Plan: Imported from OSS

Differential Revision: D20391153

Pulled By: suo

fbshipit-source-id: 959dff2ea9e77bb52739c0659e9d800cdbe4cb01
2020-03-11 15:01:12 -07:00
2de4f245c6 Fix typo in documentation (#34581)
Summary:
Update the parameter description of `total_steps` in `OneCycleLR`. References https://github.com/pytorch/pytorch/issues/34531
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34581

Differential Revision: D20386306

Pulled By: albanD

fbshipit-source-id: f8b424a01760e8f5d4de5367b6c60fb342019689
2020-03-11 13:57:10 -07:00
25e4e9eb86 [On-device Benchmark] speed_benchmark_torch switch to log latency from dataset level to row level (#34598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34598

as above

Test Plan:
test.txt
```
what time is it now
could you set a reminder at 7 am
waht is the weather today
```
example json
```
{
    "model": {
      "category": "CNN",
      "description": "Assistant Mobile Inference",
      "files": {
        "model": {
          "filename": "model.pt1",
          "location": "//everstore/GICWmAB2Znbi_mAAAB0P51IPW8UrbllgAAAP/model.pt1",
          "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
        },
        "data": {
          "filename": "input.txt",
          "location": "/home/pengxia/test/input.txt",
          "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
        }
      },
      "format": "pytorch",
      "framework": "pytorch",
      "kind": "deployment",
      "name": "Assistant Mobile Inference"
    },
    "tests": [
      {
        "command": "{program} --model {files.model}  --input_dims \"1\" --input_type NLUType --warmup {warmup} --iter 5 --input_file {files.data} --report_pep true",
        "identifier": "{ID}",
        "metric": "delay",
        "iter": 15,
        "warmup": 2,
        "log_output": true
      }
    ]
  }

```

iter = 5 (`--iter 5`) * 3 (3 lines in test.txt) = 15

arbabu123 I will provide a wrapper to compute the iter in the future.

run following command
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/assistant_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices  SM-G960U-8.0.0-26
```

results
https://our.intern.facebook.com/intern/aibench/details/275259559594003

**Note: this is compatible with the existing examples.**

Reviewed By: kimishpatel, ljk53

Differential Revision: D20389285

fbshipit-source-id: 80165ef394439a307ac7986cf540a80fdf3d85d6
2020-03-11 13:51:42 -07:00
70f3298684 Fix SELECTED_OP_LIST file path issue (#33942)
Summary:
If SELECTED_OP_LIST is specified as a relative path on the command line, the CMake build will fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33942

Differential Revision: D20392797

Pulled By: ljk53

fbshipit-source-id: dffeebc48050970e286cf263bdde8b26d8fe4bce
2020-03-11 13:19:31 -07:00
1f834b5c2a [JIT] Torchbind error if python instantiate class that doesnt exist (#34568)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34568

Test Plan: Imported from OSS

Differential Revision: D20378106

Pulled By: jamesr66a

fbshipit-source-id: 395a3b05d23727b9cfd074440b2d0e8ef002ec09
2020-03-11 13:13:08 -07:00
12fb8148e4 Disable ROCM when building mobile libtorch. (#34478)
Summary:
When a system has the ROCm dev tools installed, `scripts/build_mobile.sh` tries to use them.
This PR stops looking up the unused ROCm library when building libtorch mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34478

Differential Revision: D20388147

Pulled By: ljk53

fbshipit-source-id: b512c38fa2d3cda9ac20fe47bcd67ad87c848857
2020-03-11 11:28:32 -07:00
b553e6911a [distributed] quicker exit in the case of failed tests in distributed (#34150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34150

In the distributed setting we commonly have tests in which one process
exits on an error but the others do not (since they are, for example, waiting for work from
the process that exited). Currently, when this situation happens we do not
handle it well and wait for process 0 to time out. This results in wasted
time waiting for test errors and a less helpful "Process 0 timed out..." error
message when the error was actually something else.

This diff fixes the issue by checking for exited subprocesses and terminating
the test when we see a subprocess that has exited uncleanly. We still enforce
timeouts and return when all processes have exited cleanly in the happy path.
ghstack-source-id: 99921462

Test Plan:
All distributed tests + tested by writing tests that should trigger
the unclean subprocess detection, and verified that we exit quickly instead of
waiting for the entire timeout.

Differential Revision: D20231032

fbshipit-source-id: 3e0d4a20925b7d1098ec4c40ffcc66845425dd62
2020-03-11 11:27:17 -07:00
2cf576e9ea small typos (#34589)
Summary:
Spotted a couple of small typos 🙏
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34589

Differential Revision: D20387653

Pulled By: ngimel

fbshipit-source-id: 3089fe606ccb8c8ee57cf7a900aba714fd0ce567
2020-03-11 11:01:31 -07:00
82cdd3abae Stop last usage of newWithSize. (#34386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34386

Test Plan: Imported from OSS

Differential Revision: D20311061

Pulled By: gchanan

fbshipit-source-id: 1e90a90db2efa1a566d4a78a6d1b8d918b91cf66
2020-03-11 09:58:30 -07:00
4b929e5466 Revert D20193196: [pytorch][PR] PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem
Test Plan: revert-hammer

Differential Revision:
D20193196

Original commit changeset: 78a487991242

fbshipit-source-id: 8da4f8cb17c45af41e8c0ce80bc72581eb10dbb8
2020-03-11 09:24:34 -07:00
6f8a8e4e47 Revert D20282846: Delete OperatorOptions, absorb AliasAnalysisKind into FunctionSchema.
Test Plan: revert-hammer

Differential Revision:
D20282846

Original commit changeset: ba7bca6e8adc

fbshipit-source-id: b9e15d2b2c3d1dbc6e971ab3c0bdf380e769dcf1
2020-03-11 07:50:29 -07:00
63964175b5 Revert D20379910: [pytorch][PR] Set USE_RCCL cmake option (dependent on USE_NCCL)
Test Plan: revert-hammer

Differential Revision:
D20379910

Original commit changeset: 981f924be93d

fbshipit-source-id: 2cfc2eebe6ebabf801f0ea6a183aad2342ada79f
2020-03-11 07:41:13 -07:00
2ec779d46c PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem (#29488)
Summary:
This PR implements the following linear algebra algorithms for low-rank matrices:
- [x] Approximate `A` as `Q Q^H A` - using Algorithm 4.4 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
  + exposed as `torch.lowrank.get_approximate_basis(A, q, niter=2, M=None) -> Q`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] SVD - using Algorithm 5.1 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
  + uses `torch.lowrank.get_approximate_basis`
  + exposed as `torch.svd_lowrank(A, q=6, niter=2, M=None) -> (U, S, V)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] PCA - using `torch.svd_lowrank`
  + uses `torch.svd_lowrank`
  + exposed as `torch.pca_lowrank(A, center=True, q=None, niter=2) -> (U, S, V)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices, uses non-centered sparse matrix algorithm
  + [x] documentation
- [x] generalized eigenvalue solver using the original LOBPCG algorithm [Knyazev, 2001](https://epubs.siam.org/doi/abs/10.1137/S1064827500366124)
  + exposed as `torch.lobpcg(A, B=None, k=1, method="basic", ...)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] generalized eigenvalue solver using robust LOBPCG with orthogonal basis selection [Stathopoulos, 2002](https://epubs.siam.org/doi/10.1137/S1064827500370883)
  + exposed as `torch.lobpcg(A, B=None, k=1, method="ortho", ...)`
  + [x] dense matrices
  + [x] batches of dense matrices
  + [x] sparse matrices
  + [x] documentation
- [x] generalized eigenvalue solver using the robust and efficient LOBPCG Algorithm 8 from [Duersch et al, 2018](https://epubs.siam.org/doi/abs/10.1137/17M1129830) that switches to orthogonal basis selection automatically
  + the "ortho" method improves iterations so rapidly that in the current test cases it does not make sense to use the basic iterations at all. If users will have matrices for which basic iterations could improve convergence then the `tracker` argument allows breaking the iteration process at user choice so that the user can switch to the orthogonal basis selection if needed. In conclusion, there is no need to implement Algorithm 8 at this point.
- [x] benchmarks
  + [x] `torch.svd` vs `torch.svd_lowrank`, see notebook [Low-rank SVD](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/Low-rank%20SVD.ipynb). In conclusion, the low-rank SVD is going to be useful only for large sparse matrices where the full-rank SVD will fail due to memory limitations.
  + [x] `torch.lobpcg` vs `scipy.sparse.linalg.lobpcg`, see notebook [LOBPCG - pytorch vs scipy](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/LOBPCG%20-%20pytorch%20vs%20scipy.ipynb). In conclusion, both implementations give the same results (up to numerical errors from different methods); the scipy lobpcg implementation is generally faster.
  + [x] On very small tolerance cases, `torch.lobpcg` is more robust than `scipy.sparse.linalg.lobpcg` (see `test_lobpcg_scipy` results)
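
A quick usage sketch of the new routines listed above (shapes and ranks are arbitrary):
```python
import torch

A = torch.randn(100, 40)
U, S, V = torch.svd_lowrank(A, q=6)      # approximate rank-6 SVD
U2, S2, V2 = torch.pca_lowrank(A, q=6)   # PCA of the (centered) data

# LOBPCG on a symmetric positive-definite matrix.
M = A.t() @ A + 40.0 * torch.eye(40)
eigenvalues, eigenvectors = torch.lobpcg(M, k=3)
```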

Resolves https://github.com/pytorch/pytorch/issues/8049.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29488

Differential Revision: D20193196

Pulled By: vincentqb

fbshipit-source-id: 78a4879912424595e6ea95a95e483a37487a907e
2020-03-11 07:33:49 -07:00
5fc5cf6571 Stop using ctypes to interface with CUDA libraries. (#33678)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33016, Continuation of https://github.com/pytorch/pytorch/issues/31160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33678

Differential Revision: D20249187

Pulled By: ezyang

fbshipit-source-id: 172ce4a0fee7fbe01436a421d1af22ef6173b6ed
2020-03-11 07:22:46 -07:00
9d42177a31 Delete OperatorOptions, absorb AliasAnalysisKind into FunctionSchema. (#34160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34160

I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema.  Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20282846

Pulled By: ezyang

fbshipit-source-id: ba7bca6e8adc3365789639b88e54c4e881b1692e
2020-03-11 07:15:18 -07:00
b2344b70da Beef up documentation on Dispatcher.h, reorder methods for clarity. (#33838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33838

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20227875

Pulled By: ezyang

fbshipit-source-id: 319855b1f0fa436f9ed5256d2106b07f20e6b833
2020-03-11 07:13:39 -07:00
fbbeee0983 Port remainder from TH to ATen (CPU and CUDA) (#34136)
Summary:
CPU issue https://github.com/pytorch/pytorch/issues/24753
CUDA issue https://github.com/pytorch/pytorch/issues/24615
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34136

Differential Revision: D20375458

Pulled By: ezyang

fbshipit-source-id: 1a9fb39a7e2d17a0d31bd14b211eaacea060e834
2020-03-11 07:08:11 -07:00
7aca9afdfb [pytorch] remove boilerplate setQEngine() from PyTorch mobile predictors (#34556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34556

According to
https://github.com/pytorch/pytorch/pull/34012#discussion_r388581548,
this `at::globalContext().setQEngine(at::QEngine::QNNPACK);` call isn't
really necessary for mobile.

In Context.cpp it selects the last available QEngine if the engine isn't
set explicitly. For the OSS mobile prebuild it should only include the QNNPACK
engine, so the default behavior should already be the desired behavior.

It makes a difference only when USE_FBGEMM is set, but that should be off
for both the OSS mobile build and the internal mobile build.

Test Plan: Imported from OSS

Differential Revision: D20374522

Pulled By: ljk53

fbshipit-source-id: d4e437a03c6d4f939edccb5c84f02609633a0698
2020-03-11 00:55:14 -07:00
2ce9513b0c AccumulateGrad: ensure sparse tensor indices and values refcount is always 1 (#34559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34559

We check the use_count for indices and values when we avoid a clone
for sparse tensors. The sparse tensor grad itself might have a higher refcount
due to DDP hooks/dist autograd structures holding refs, but the indices and
values inside the sparse tensor should always have a refcount of 1.
ghstack-source-id: 99900534

Test Plan: waitforbuildbot

Differential Revision: D20375239

fbshipit-source-id: 6a654549d13071ab3451cef94259caf7627b575c
2020-03-10 23:41:44 -07:00
ab2297dfe6 Add Tensor overload for start in narrow. (#34317)
Summary:
https://github.com/pytorch/pytorch/issues/31558
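
A minimal sketch of the new overload, assuming a 0-dim integer Tensor is now accepted as `start`:
```python
import torch

x = torch.arange(10)
start = torch.tensor(2)          # 0-dim Tensor instead of a Python int
print(x.narrow(0, start, 4))     # tensor([2, 3, 4, 5])
```
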
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34317

Differential Revision: D20294333

Pulled By: ailzhang

fbshipit-source-id: 47c6646ae298e04a455923bd5048db026a5e3c7c
2020-03-10 22:33:22 -07:00
2e88a78d2e add quantized_hardtanh (#34097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34097

Adds quantized hardtanh.  Calls the clamp kernel behind the
scenes.
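
A minimal usage sketch, assuming the op is exposed through `torch.nn.quantized.functional.hardtanh` like other quantized activations (scale/zero_point values are arbitrary):
```python
import torch
import torch.nn.quantized.functional as qF

x = torch.randn(2, 8)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=128, dtype=torch.quint8)
qy = qF.hardtanh(qx, min_val=-1.0, max_val=1.0)
print(qy.dequantize())
```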

Test Plan:
```
python test/test_quantized.py
```

Imported from OSS

Differential Revision: D20208860

fbshipit-source-id: 165a6a1c22f1dcc479679e5ea0c990d0e9c3b6c5
2020-03-10 22:27:15 -07:00
8d84c5f1c7 Fix static data initialization deadlock on GIL (#34505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34505

A thread could hold the GIL when calling PythonRpcHandler::getInstance(),
while another thread could be doing static data
initialization by calling `new PythonRpcHandler()`, inside of which the GIL is
also required. Static data initialization is thread-safe, so the thread
holding the GIL will wait for the other thread to finish static data
initialization before going forward. Because the initialization can't
proceed without the GIL, there is a deadlock. We ask the calling thread to
release the GIL to avoid this situation.
ghstack-source-id: 99893858

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_spawn -- 'test_backward_simple_script_call \(test_dist_autograd_spawn\.DistAutogradTestWithSpawn\)' --stress-runs 100
```

Differential Revision: D7490489

fbshipit-source-id: 76f63cc7bedf088d3dbff288f53aa0bd33749255
2020-03-10 20:40:22 -07:00
ce77d4a316 Set USE_RCCL cmake option (dependent on USE_NCCL) (#31341)
Summary:
so that Gloo build has RCCL path enabled for ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31341

Differential Revision: D20379910

Pulled By: ezyang

fbshipit-source-id: 981f924be93ddcc0705c1934f92d938c29aaf312
2020-03-10 20:26:09 -07:00
23b2fba79a [jit] Add type tags to lists/dicts in pickle (#33255)
Summary:
Stacked PRs
 * #33474 - [jit] Remove list specializations from pickler
 * **#33255 - [jit] Add type tags to lists/dicts in pickle**

This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255

Pulled By: driazati

Differential Revision: D20346780

fbshipit-source-id: c8534954ef4adb2e3c880401acbee30cd284f3db
2020-03-10 19:17:01 -07:00
4167db11f7 [pytorch][ci] add build_only flag to mobile CI jobs (#34560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34560

These jobs don't have a next phase, so we don't really need to commit the
Docker images.
Should also fix issue #34557.

Test Plan: Imported from OSS

Differential Revision: D20375308

Pulled By: ljk53

fbshipit-source-id: 328cb428fcfb0fbb79b2a233b5f52607158c983c
2020-03-10 17:45:51 -07:00
a09c4d3997 [pt][quant] Vectorized qmul and more methods on qint data types (#34376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34376

Vectorized implementation of qmul. qmul is now ~16x faster on my development machine. This implementation works for qint8, quint8 and qint32. Also added some commonly used operations, such as multiply operator, requantize operation etc., to qint vector classes for future use.

```
#!/usr/bin/env python

import time
import torch
import torch.nn as nn
torch.set_num_threads(1)
# print(torch.__config__.parallel_info())

A = torch.rand(1, 54, 54, 256)
B = torch.rand(1, 54, 54, 256)

scale = .05
zero_point = 50

for dtype in [torch.quint8, torch.qint8]:

    qA = torch.quantize_per_tensor(A, scale=scale, zero_point=zero_point,
            dtype=dtype)
    qB = torch.quantize_per_tensor(B, scale=scale, zero_point=zero_point,
            dtype=dtype)

    NITER = 1000
    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.mul(qA, qB, scale=scale, zero_point=zero_point)
    time_per_iter = (time.time() - s) / NITER

    print('dtype: {} time per iter ms: {:.3f}'.format(dtype, time_per_iter * 1000))
```
### Before
dtype: torch.quint8 time per iter ms: 6.714
dtype: torch.qint8 time per iter ms: 6.780

### After
dtype: torch.quint8 time per iter ms: 0.431
dtype: torch.qint8 time per iter ms: 0.417

### Test
Modified qmul tests to include qint8 and qint32 data types.

python test/test_quantized.py TestQuantizedOps.test_qmul_relu_same_qparams
python test/test_quantized.py TestQuantizedOps.test_qmul_relu_different_qparams
python test/test_quantized.py TestQuantizedOps.test_qmul_broadcast
ghstack-source-id: 99862681

Differential Revision: D20308515

fbshipit-source-id: 4fa65b2ba433cfd59260fc183a70f53a6fcc36b4
2020-03-10 16:51:41 -07:00
903ad90325 [JIT] Introduce a fake Tensor creation node for IR unit tests (#34334)
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
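
For illustration, a hand-written IR test case can now create a Tensor without committing to any real creation function's defaults (a sketch; `torch._C.parse_ir` is the internal IR parser used by the Python tests):
```python
import torch

graph_str = """
graph():
  %t : Tensor = prim::MakeTestTensor()
  return (%t)
"""
g = torch._C.parse_ir(graph_str)
print(g)
```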

**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.

```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN      ] JitTest.ADFormulas
[       OK ] JitTest.ADFormulas (82 ms)
[ RUN      ] JitTest.Attributes
[       OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN      ] JitTest.LiteInterpreterPrim
[       OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN      ] JitTest.LiteInterpreterLoadOrigJit
[       OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)

[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[  PASSED  ] 75 tests.
```

**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34334

Differential Revision: D20296437

Pulled By: SplitInfinity

fbshipit-source-id: df4e7b0881ae4913424e5a409bfa171a61c3e568
2020-03-10 16:12:45 -07:00
d0834c5b64 Preserve memory format for torch.cat on CUDA (#34526)
Summary:
fix https://github.com/pytorch/pytorch/issues/34084

cc: ptrblck VitalyFedyunin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34526

Differential Revision: D20371847

Pulled By: ngimel

fbshipit-source-id: e3b1a34caff2db8099ad9afe91bf9b473d5da6e8
2020-03-10 16:06:10 -07:00
be3bc1deb1 convert counter back to list #33229 (#33356)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33356

Differential Revision: D20003196

Pulled By: vincentqb

fbshipit-source-id: 96f9e0fc7e99a7c2e202f932d1a2ffa158afad92
2020-03-10 15:46:24 -07:00
dd7cec680c Do not use clang if it can not parse system extensions (#34549)
Summary:
Attempting to build pytorch with ASAN on a system with gcc-8 fails due to mismatched system compilation flags.
Address the issue by using the original compiler to build the `torch._C` extension.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34549

Test Plan: Run `.jenkins/pytorch/build-asan.sh` on FC-30

Differential Revision: D20373781

Pulled By: malfet

fbshipit-source-id: 041c8d25f96b4436385a5e0eb6fc46e9b5fdf3f1
2020-03-10 15:40:08 -07:00
09296c34a4 Add the build for runtime dispatch for AVX, AVX2 instruction set (#26125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26125

We already had some optimized implementations using AVX2 to improve quantized kernel performance. In this diff, we want to enable runtime dispatch.

Test Plan:
Sandcastle build and test

Also test with a python binary calling into vectorized op.

torch.__config__.show()
PyTorch built with:
  - GCC 4.2
  - clang 8.0.20181009
  - Intel(R) Math Kernel Library Version 2017.0.3 Product Build 20170413 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.18.1 (Git Hash N/A)
  - OpenMP 1
  - **CPU capability usage: AVX2**
  - Build settings:

Reviewed By: jamesr66a

Differential Revision: D17337251

fbshipit-source-id: 8e22d10011a12a4eaf54cea3485353eb1811d828
2020-03-10 15:32:57 -07:00
259d7299db [caffe2] do not declare __assert_fail in clang builds (#33893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33893

It appears that when Clang drives CUDA compilation, `__assert_fail` is always defined as a device function.

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true -c cxx.untracked_headers=ignore //fblearner/flow/projects/dper:workflow
```

Reviewed By: ngimel

Differential Revision: D20145034

fbshipit-source-id: 23153411ed631e05421c7afcf41b7ea5619cdd96
2020-03-10 14:45:03 -07:00
2d24005d18 [C++ API Parity] rmsprop optimizer update (#33450)
Summary:
**This PR is BC-breaking in the following way:**

In RMSpropOptions:
1. learning_rate is renamed to lr.

**Test plan before 1.5 release:**

Test that in 1.5 we can load a C++ RMSprop optimizer that was serialized in 1.4, and their states are the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33450

Differential Revision: D20366623

Pulled By: anjali411

fbshipit-source-id: 83250be9b583a766927e0e22a4de8b0765379451
2020-03-10 13:30:56 -07:00
6f12145c60 Change std::to_string call to c10::to_string
Summary: I'm using this code in an internal Android build, and std::to_string doesn't work in our internal Android builds yet.

Test Plan: Internal build.

Reviewed By: ljk53

Differential Revision: D20234221

fbshipit-source-id: 8fd61235bf9b487e07a1459c452830e732c7afb0
2020-03-10 13:18:27 -07:00
2cf344be4c Turn on exact_dtype by default on test_sparse.py (#34489) (#34542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34542

Turn on exact_dtype by default on test_sparse.py (#34489)

Pull Request resolved: #34489

Test Plan:
```
python test/test_sparse.py
```

Imported from OSS

Differential Revision: D20369764

fbshipit-source-id: ade2434f77af8ae419bda653b4c46616c052a8b2
2020-03-10 12:52:09 -07:00
b185359fb4 Avoid clone for sparse tensors during accumulation of grads. (#33427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33427

This PR is an attempt to avoid clone for sparse tensors similar to how
we avoid clone for dense tensors currently.

As per my understanding, even if the 'indices' and 'values' of a sparse tensor
are non-contiguous, operations like 'add' are still supported. As a result,
the major change in this PR is to create a shallow copy instead of a clone()
for sparse tensors.
ghstack-source-id: 99838375

Test Plan: waitforbuildbot

Differential Revision: D19926698

fbshipit-source-id: b5a3f36c2aa273e17f8b7a9f09c1ea00e7478109
2020-03-10 12:41:47 -07:00
5f61f42c79 .circleci: Switch should_run_job cuda 10.1 -> 10.2 (#34498)
Summary:
We updated the default jobs to run in a different PR but neglected to
update this script as well.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34498

Differential Revision: D20368420

Pulled By: seemethere

fbshipit-source-id: 240171b18f397095e3a8d57de3a29d1d2e891d85
2020-03-10 12:25:09 -07:00
cd9d9a2235 fix handling of replica parameters in DataParallel (#33907)
Summary:
In DataParallel, replica parameters are not leaves (because they are computed via broadcast from master parameters), and should be treated as such. Fixes https://github.com/pytorch/pytorch/issues/33552
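
As a small illustration of the leaf distinction (illustrative only, not the DataParallel code itself):

```python
import torch

master = torch.nn.Parameter(torch.randn(2, 2))  # created by the user: a leaf
replica = master * 1.0  # computed from the master (as broadcast does): not a leaf

print(master.is_leaf)   # True
print(replica.is_leaf)  # False
```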
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33907

Differential Revision: D20150199

Pulled By: ngimel

fbshipit-source-id: 5965d4115b6b3a8433063126ff6269567872fbeb
2020-03-10 10:35:44 -07:00
0dbfb26e53 Clean up include list of Shape.cu (#34528)
Summary:
The include list seems to be copied from somewhere else, and some totally unrelated files are included.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34528

Differential Revision: D20358622

Pulled By: ngimel

fbshipit-source-id: d8a6260f5f77b0eabdbd68e3728873efd632d9bc
2020-03-10 10:29:20 -07:00
cb689a5d68 remove duplicated process group gloo timeout (#31342)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31342

Test Plan: unit test

Differential Revision: D19131704

fbshipit-source-id: 4e91d5933635ee2c7c301caf89a5a7009c5cb7c8
2020-03-10 09:08:02 -07:00
c7dd5f89a2 Fix #33562 (uncaught domain_error on macOS) (#34301)
Summary:
Tries to fix https://github.com/pytorch/pytorch/issues/33562 by raising `std::runtime_error` instead of `std::domain_error`.
* The Python tests already expect `RuntimeError` so this shouldn't affect Python users of PyTorch.
* If someone out there is using C10 or ATen from C++ and tries to catch `std::domain_error` specifically, this fix would break their code. Hopefully that's not the case.

An alternative to this PR is for someone to really get to the bottom of why `std::domain_error` isn't being caught.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34301

Differential Revision: D20344579

Pulled By: ezyang

fbshipit-source-id: d5f3045085a2f75b71b864335ebf44991d0cad80
2020-03-10 08:56:38 -07:00
9e94e46453 Check if rnn weights need to be flattened (#34265)
Summary:
cuDNN needs it, MIOpen doesn't. However, since the PyTorch preference seems to be not to introduce ROCm-specific logic in the Python layer, we need to add a C++ function to detect whether RNN weight flattening is needed.

This PR will be needed to fix the rnn unit test errors arising for PR https://github.com/pytorch/pytorch/issues/33837.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34265

Differential Revision: D20345105

Pulled By: ezyang

fbshipit-source-id: a2588a6e2ac6f7d1edf2b7872bc6a879a7df96ec
2020-03-10 08:45:29 -07:00
29b673392f [ROCm] Enable BFloat16 type for loss functions and few misc ops required for resnet50 (#34469)
Summary:
This PR enables the bfloat16 type for loss criterion ops (and the ops they depend on) and a few miscellaneous ops required to train resnet50.

iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34469

Differential Revision: D20348856

Pulled By: ezyang

fbshipit-source-id: 0a8f06c2169cfa3c9cf319120e27150170095f6c
2020-03-10 08:39:07 -07:00
20b18a58f1 Update compiler warning about ABI compatibility (#34472)
Summary:
3ac42677633a39c588c3fea19d2d4121f114edb3 already forces pytorch to use gcc>=5 everywhere
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34472

Differential Revision: D20345134

Pulled By: ezyang

fbshipit-source-id: 3ce706405e8784cac5c314500466b5f988ad31bf
2020-03-10 08:12:07 -07:00
f5ee46f1cf Remove custom function in no_grad block error message (#33896)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33896

Fixes #32625. Previously, we'd receive an error message if a
custom function returned a view of an input in a no_grad block:
```
import torch
from torch.autograd import Function

class Alias(Function):
    @staticmethod
    def forward(ctx, x):
        return x[:]

    @staticmethod
    def backward(ctx, gx):
        return gx

inp = torch.rand(2, requires_grad=True)

with torch.no_grad():
    # Used to error out
    output = Alias.apply(inp)
```

After this change, the error no longer happens. The behavior changes to
become consistent with having implemented an operator that does the
same thing as the custom function:
- the output requires_grad
- we are able to detect (and error out) if the user tries to modify the
output in-place outside of the no_grad block.

Test Plan: - new test

Differential Revision: D20345601

Pulled By: zou3519

fbshipit-source-id: 7f95b4254f52ddbf989d26f449660403bcde1c78
2020-03-10 07:58:55 -07:00
3e6e2e9b7b Print the current Node name in anomaly mode (#33875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33875

Fixes #33675.

I added a `current_node_name` argument to AnomalyMetadata::print_stack.
This is a mandatory arg because I found only one callsite and making it
a default arg on a virtual function can be confusing.

Test Plan:
- Tested locally:
https://gist.github.com/zou3519/09937387c83efc76e1700374d5c9c9d9
- I don't know how to add a test for this: the message is printed to
stderr but it isn't an exception nor a warning. I considered capturing
the stderr of a subprocess but that seems like asking for flakiness.

Differential Revision: D20349399

Pulled By: zou3519

fbshipit-source-id: 7585ddffe2bf9e1081f4028a9c44de783978a052
2020-03-10 07:51:52 -07:00
d30fa4837e Unify gradient accumulation between distributed autograd and local autograd (#33214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33214

Distributed autograd had some custom logic in terms of how we
accumulated gradients. This was mostly done early on to enable basic
functionality. However, in the long term we should merge this logic with what
we have in the local autograd engine. A lot of work has gone into ensuring we
accumulate grads correctly and efficiently and we should reuse that as a
starting point.

We can investigate if we need further custom logic for distributed autograd
later on if we need additional optimizations.

In this PR I've merged the gradient accumulation logic and also the gradient
hooks. As a result, now gradient hooks are called in distributed autograd as
well.
ghstack-source-id: 99838019

Test Plan: waitforbuildbot

Differential Revision: D19843284

fbshipit-source-id: 7923d7e871fb6afd3e98dba7de96606264dcb5f3
2020-03-10 01:56:08 -07:00
4f62cbe7de [ONNX] Support one_hot (#34454)
Summary:
This PR resolves https://github.com/pytorch/pytorch/issues/22534 by adding a converter for the `torch.nn.functional.one_hot` function, and covering it with a test.

Are there other places this should be tested?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34454

Reviewed By: hl475

Differential Revision: D20354255

Pulled By: houseroad

fbshipit-source-id: 84224c1610b2cc7986c91441c65647ddc090750d
2020-03-09 22:26:36 -07:00
965146b818 [jit] delete netdef converter (#33807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33807

afaik this is unused, so removing it from the source tree. RIP :(

Test Plan: Imported from OSS

Differential Revision: D20122118

Pulled By: suo

fbshipit-source-id: cb45943f5b9f969482301a2f9fe540326dbc78f2
2020-03-09 22:25:16 -07:00
3671036ef3 Adds true_divide function, analogous to Python 's, JAX's, NumPy's (true) division (#34236)
Summary:
See NumPy's division documentation here: https://numpy.org/doc/1.18/reference/generated/numpy.divide.html#numpy.divide.

True division is the same as PyTorch's default division except when both inputs are integer or bool tensors. In the latter case the inputs are (conceptually) cast to the default floating type before the division is performed.
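
For example, contrasting the two explicit forms on integer tensors:

```python
import torch

a = torch.tensor([5], dtype=torch.int64)
b = torch.tensor([2], dtype=torch.int64)

print(torch.true_divide(a, b))   # tensor([2.5000]) -- cast to floating point
print(torch.floor_divide(a, b))  # tensor([2]) -- stays integral
```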

The function is implemented for dense and sparse tensors and supports exporting to ONNX from PyTorch's eager mode or JIT traces. The function is inherently incompatible with exporting to ONNX via JIT script, and is another datapoint suggesting we should deprecate exporting scripted graphs to ONNX.

Tests are added for the type promotion, named tensor, and ONNX export behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34236

Reviewed By: houseroad

Differential Revision: D20334087

Pulled By: mruberry

fbshipit-source-id: 83d00d886f46f713215d7d9e02ffd043164c57f1
2020-03-09 21:06:33 -07:00
e408d46477 Print pytorch version before running ASAN tests (#34521)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34521

Test Plan: CI

Differential Revision: D20357233

Pulled By: malfet

fbshipit-source-id: 1c1b5a94a66d828383676a7a1403bbc13bb21c83
2020-03-09 20:52:46 -07:00
b9c32209db Use SerializedPyObj in PythonRpcHandler::generatePythonUDFResult (#34495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34495

Differential Revision: D20347466

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: 79625adb4ac3c9c6da4f40016e973bf17466c693
2020-03-09 20:41:05 -07:00
b82658810e Split deserialize from _run_function in RPC internal.py (#34494)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34494

Differential Revision: D20347463

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: e6fd886622f26c46bb83ac118e67abb2f5b296b9
2020-03-09 20:41:00 -07:00
544fb64440 Use SerializedPyObj in PythonRpcHandler (#34493)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34493

Differential Revision: D20347462

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: 9edda9eb95b1994464459271bb53ee77b760e474
2020-03-09 20:40:55 -07:00
18ef09f5ac Remove _load_return_value from RPC internal.py (#34492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34492

Differential Revision: D20347468

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: 92388d0d50a08fb895bacacf94c7b5495b4ae2b6
2020-03-09 20:40:50 -07:00
6d1c4df660 Consolidate Python Messages to use SerializedPyObj (#34491)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34491

Differential Revision: D20347467

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: efae4111d961f3a528cede77c863fb049cda9029
2020-03-09 20:40:45 -07:00
3b661eb84c Avoid copy contents in SerializedPyObj (#34490)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34490

Differential Revision: D20347465

Test Plan: Imported from OSS

Pulled By: mrshenli

fbshipit-source-id: d59e74e3ee9122992a5c50a083e43ab31b7a70f5
2020-03-09 20:38:54 -07:00
2de4fa702b [JIT] Preserve qualified names on traced modules (#34395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34395

fixes: https://github.com/pytorch/pytorch/issues/33913

Test Plan: Imported from OSS

Differential Revision: D20347778

Pulled By: jamesr66a

fbshipit-source-id: 7b5a35b6f9678c34cb6127d531fa3bfe65703116
2020-03-09 19:23:53 -07:00
79e1305519 [net_runner] Get shape info from qtensors (#34321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34321

Mostly cosmetic as we can infer the shape anyway. It can remove a lot of the noise in the log though.

Note that weight sharing doesn't work yet. I'll add another diff to address this.

Reviewed By: houseroad

Differential Revision: D20290841

fbshipit-source-id: fe6f9b60d05dbe150af15b5d9d7a69fd902e12cc
2020-03-09 18:34:16 -07:00
e16908cb1f profile block outputs; helps guard elimination (#33889)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33889

Reviewed By: zdevito

Differential Revision: D20294979

Pulled By: Krovatkin

fbshipit-source-id: 2a68710ec8f8f854c99dfe173f49da442a39e498
2020-03-09 17:12:58 -07:00
2c1a302d6a [ROCm] Enable double __shfl_down (#34103)
Summary:
This allows us to enable some double-based pdist tests that previously ran into accumulated error from casting down to float.

Addresses https://github.com/pytorch/pytorch/issues/33128
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34103

Differential Revision: D20343279

Pulled By: ezyang

fbshipit-source-id: a2da768259fab34ef326976283b7a15bebbbb979
2020-03-09 16:23:56 -07:00
0a4a558c2c Dictionary Constants (#32869)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32869

Differential Revision: D19909339

Pulled By: Krovatkin

fbshipit-source-id: 6fe2a9b470768f84b957c69cdf9af3a1bd9b1ca9
2020-03-09 16:12:36 -07:00
90ff3b56d0 Kill some unused TH(C)Storage functions. (#34385)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34385

Test Plan: Imported from OSS

Differential Revision: D20311064

Pulled By: gchanan

fbshipit-source-id: 6dc50621dc417e9ea4624cdebd0970453fa75a77
2020-03-09 16:03:56 -07:00
4e357089b4 Stop calling newWithSize directly. (#34384)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34384

Test Plan: Imported from OSS

Differential Revision: D20311057

Pulled By: gchanan

fbshipit-source-id: 1e1a1f9b757b62f20d8d806f21abdd70f07b12aa
2020-03-09 16:03:51 -07:00
fea618b524 [JIT] remove list with default builtin (#34171)
Summary:
I think this was added when we couldn't compile the function itself. Now we can.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34171

Differential Revision: D20269960

Pulled By: eellison

fbshipit-source-id: 0a60458d639995d9448789c249d405343881b304
2020-03-09 16:02:26 -07:00
34688d2c48 Add brand guidelines link (#34503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34503

Differential Revision: D20349273

Pulled By: soumith

fbshipit-source-id: 6b085377741ace5d200ca0d536de433b9bb7825c
2020-03-09 15:55:52 -07:00
2e7eef41ac [quant][graphmode] Swap quantized functional linear with aten::linear (#33853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33853

Quant fusion relies on inlining, but inlining will break the CallFunction("linear", ...) into an if block,
and it would be hard to recognize this block and swap it with quantized::linear. In order to
preserve the op, we will swap all quantized functional linear calls with aten::linear.
They might produce a different backward graph, but this is called in the step before we get the quantized
model, so it shouldn't affect anything.
We'll integrate this with convert_script later in the new "finalize_quant" API.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20343873

fbshipit-source-id: 423e03bf893b79267d2dc97bc997ee1bfe54ec0f
2020-03-09 15:45:20 -07:00
7688ca631a Enable RTTI for mobile builds, to enable custom class via torchbind in mobile (#34368)
Summary:
Custom classes via torchbind require runtime type information.
We are trying to enable custom-class-based graph rewrites for XNNPACK in
this stack of PRs: https://github.com/pytorch/pytorch/pull/34047.
They require RTTI to be enabled for mobile; mobile builds are currently
failing without it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34368

Differential Revision: D20306155

Pulled By: kimishpatel

fbshipit-source-id: 52c61ff5467a619e8f51708a05258eee35dd0a56
2020-03-09 15:43:55 -07:00
2c0f3536b6 [jit] Make ModuleLists a sugared value (#34320)
Summary:
Previously when emitting subscripts we only emitted actual values, but
now they may sometimes emit a `ModuleValue`, so it should stay as a
`SugaredValue`. This allows for the result of the subscript to be
treated as a real module (i.e. you can just do `self.modlist[1](inputs)`
instead of `self.modlist[1].forward(inputs)`)
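
A minimal sketch of what this enables when scripting (the module and its names are illustrative):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.modlist = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])

    def forward(self, x):
        # The subscripted module can now be called directly,
        # without spelling out .forward(...)
        return self.modlist[1](self.modlist[0](x))

scripted = torch.jit.script(Net())
print(scripted(torch.randn(2, 4)).shape)  # torch.Size([2, 4])
```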
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34320

Pulled By: driazati

Differential Revision: D20345642

fbshipit-source-id: 2bedf9a454af747b704422f6bbb8370cbdf4bf61
2020-03-09 15:36:46 -07:00
c218963270 fix more errors (#34480)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34480

Differential Revision: D20345198

Pulled By: ezyang

fbshipit-source-id: 583246acd02850ead96f1f0574d01ef6697c6352
2020-03-09 14:54:15 -07:00
15a7b9cf0a [RpcAgent] Metrics for current num active/async rpc calls. (#34398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34398

As part of PR 34109, it was suggested that we track the number of outstanding
async calls for RPC DebugInfo, particularly if we move towards using
at::launch() threads on occasion for continuations.

This particular aspect of the change was distinct from the main purpose of the
diff, and started getting bigger, so we split this functionality out as a separate diff.
For completeness, we track client_active_calls, server_active_calls,
server_active_async_calls, and write some very basic unittest coverage.
ghstack-source-id: 99708836

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/...

Differential Revision: D20314994

fbshipit-source-id: 2f7c75d5c511b27ed0c09c7b8a67b6fb49df31a5
2020-03-09 13:34:59 -07:00
8294db8f15 [iOS][CI] Remove org-member from iOS Simulator Builds (#34410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34410

### Summary

Currently, the iOS jobs are no longer being run on PRs. This is because all iOS jobs specify `org-member` as a context, which used to include all pytorch members. But it seems this rule has recently changed: it turns out that only users from the admin group or builder group have access rights to the context values. https://circleci.com/gh/organizations/pytorch/settings#contexts/2b885fc9-ef3a-4b86-8f5a-2e6e22bd0cfe

This PR will remove `org-member` from the iOS simulator build, which doesn't require code signing. The arm64 builds will only be run on master, not on PRs anymore.

### Test plan

- The iOS simulator job should be able to appear in the PR workflow

Test Plan: Imported from OSS

Differential Revision: D20347270

Pulled By: xta0

fbshipit-source-id: 23f37d40160c237dc280e0e82f879c1d601f72ac
2020-03-09 13:22:54 -07:00
776d2a1e8f [quant][graphmode] Handling ops doesn't require observation in insertObservers (#33481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33481

We have to propagate the observed property of values through ops like max_pool2d and flatten, and
avoid inserting duplicated observers.
For example:
```
x1 = self.conv(x)
x2 = maxpool(x1)
x3 = self.conv(x2)
```
If x1 is observed, we should propagate this information through maxpool and
we should consider x2 as observed as well.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20261897

fbshipit-source-id: 7de354a3ccb2b6e1708f5c743d4d9f7272691a93
2020-03-09 13:15:54 -07:00
2b45368e50 Fix cudnn 64bit indexing issue (#34407)
Summary:
Fix https://github.com/pytorch/pytorch/issues/33143
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34407

Differential Revision: D20325106

Pulled By: ngimel

fbshipit-source-id: 5aa52295f5491f189b7a8bea0987f28de0589d98
2020-03-09 12:35:55 -07:00
e025677e3c Remove **kwargs from torch.meshgrid (#34356)
Summary:
Changelog:
- Remove **kwargs from torch.meshgrid as they serve no purpose

Closes https://github.com/pytorch/pytorch/issues/34206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34356

Differential Revision: D20310971

Pulled By: zou3519

fbshipit-source-id: 97250051504aa3ec1e2a9af9296e7cc71872e5bf
2020-03-09 12:07:43 -07:00
70fe508c26 [pytorch] fix BUILD_CAFFE2_MOBILE gating around caffe2/operators/experimental/c10/cpu (#34354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34354

The condition `NOT INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE` was
added in #27086, but it seems it's always false on current master:

BUILD_CAFFE2_MOBILE is ON by default - the name is a little bit misleading -
it is ON even when it's building non-mobile PyTorch/Caffe2. It is OFF only
when it's building PyTorch mobile, where INTERN_BUILD_MOBILE is ON.

And when it's building PyTorch mobile, it won't build caffe2/operators
at all (by setting BUILD_CAFFE2_OPS OFF: https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L345)

So I imagine the real intention was to skip this when building Caffe2 mobile.
We can simply remove the deprecated BUILD_CAFFE2_MOBILE condition.

Test Plan: Imported from OSS

Differential Revision: D20345298

Pulled By: ljk53

fbshipit-source-id: d2cb4e2248fc209d63b2843e0f12e577e323def4
2020-03-09 12:00:57 -07:00
6d3783a6bc Clean up unused newWithSize variants. (#34383)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34383

Test Plan: Imported from OSS

Differential Revision: D20311065

Pulled By: gchanan

fbshipit-source-id: 9fc2cc4377f32c865401b04868a7405c49929c64
2020-03-09 11:19:30 -07:00
91e922a338 [AI Bench] Add support for nlu model
Summary: Add support for NLU-specific input

Test Plan:
tested

```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/assistant_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices  SM-G950U-7.0-24
```
make sure it compatible with previous test
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/fbnet_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices  SM-G950U-7.0-24
```

```
{
  "model": {
    "category": "CNN",
    "description": "Assistant Mobile Inference",
    "files": {
      "model": {
        "filename": "model.pt1",
        "location": "//everstore/GICWmAB2Znbi_mAAAB0P51IPW8UrbllgAAAP/model.pt1",
        "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
      },
      "data": {
        "filename": "input.txt",
        "location": "/home/pengxia/test/input.txt",
        "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
      }
    },
    "format": "pytorch",
    "framework": "pytorch",
    "kind": "deployment",
    "name": "Assistant Mobile Inference"
  },
  "tests": [
    {
      "command": "{program} --model {files.model}  --input_dims \"1\" --input_type NLUType --warmup {warmup} --iter {iter} --input_file {files.data} --report_pep true",
      "identifier": "{ID}",
      "metric": "delay",
      "iter": 5,
      "warmup": 2,
      "log_output": true
    }
  ]
}

```
input.txt
```
what is weather today
what time it is
set a reminder for tomorrow
```

result
https://our.intern.facebook.com/intern/aibench/details/137241352201417

Reviewed By: kimishpatel

Differential Revision: D20300947

fbshipit-source-id: 7c1619541a2e9514a560a9acb9029cfc4669f37a
2020-03-09 10:39:49 -07:00
bcfd348858 [ONNX] Export new_zeros (#34077)
Summary:
ONNX export for new_zeros op added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34077

Reviewed By: hl475

Differential Revision: D20332074

Pulled By: houseroad

fbshipit-source-id: 4235c4f2c279c37aa8dde6d13c1b26f621967768
2020-03-09 10:38:22 -07:00
baeb359e7a Remove using namespace torch::autograd from header files (#34423)
Summary:
This PR prevents leaking symbols from `torch::autograd` namespace to the root namespace.
Fixes https://github.com/pytorch/pytorch/issues/34371.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34423

Differential Revision: D20338404

Pulled By: yf225

fbshipit-source-id: e7ff3348193667a0cee5d38f9a003ae36cc704ca
2020-03-09 10:31:21 -07:00
e3d50c4dda Retain the order of parameters while generating ConcreteModuleTypes (#34131)
Summary:
`ConcreteModuleTypeBuilder` used to keep parameters together with all other attributes in an `unordered_map`, often leading to reordering them while building up the type. Parameter order is semantically meaningful, so we need to preserve it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34131

Differential Revision: D20331542

Pulled By: suo

fbshipit-source-id: 5b860025f7902654d6099751d3fb14b12f6f5a67
2020-03-09 10:25:45 -07:00
f62a7e7efb Simplify implementation of newWithStorage1d. (#34382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34382

The previous implementation was handling both newWithStorage and newWithSize, which doesn't make much sense.

Test Plan: Imported from OSS

Differential Revision: D20311056

Pulled By: gchanan

fbshipit-source-id: 2696a4566e6203c98338c86cbf4c236bd18d7c49
2020-03-09 10:18:44 -07:00
b1bd950a4d Fixed stub for AdamW (#34299)
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/33757](https://github.com/pytorch/pytorch/issues/33757)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34299

Differential Revision: D20337844

Pulled By: ezyang

fbshipit-source-id: 54bf174a09b8db9bf6e0c3c717730dd7c795d76b
2020-03-09 08:45:51 -07:00
739d4609c3 [C++ API] Fix ModuleList compile error: error: 'begin' was not declared in this scope (#34463)
Summary:
One example in the current docs for `torch::nn::ModuleList` doesn't compile, and this PR fixes it.
Fixes https://github.com/pytorch/pytorch/issues/32414.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34463

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20331120

Pulled By: yf225

fbshipit-source-id: 50bb078fe1a900c9114d5434e92dc40ee13b52bf
2020-03-09 08:15:50 -07:00
b09e90af1e Fix C++ at::Tensor docs generation (#34467)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25845.

**Test Plan:**
Check `pytorch_cpp_doc_push` CI job, and see if there is `classat_1_1_tensor` generated (similar to `structat_1_1native_1_1_convolution_descriptor`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34467

Differential Revision: D20338190

Pulled By: yf225

fbshipit-source-id: 52dc05af5e0d742e740de5576d0d2b3e17ef28dd
2020-03-09 08:04:32 -07:00
6e2bb1c054 End of the .data removal in torch/optim (#34211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34211

Test Plan: Imported from OSS

Differential Revision: D20248684

Pulled By: albanD

fbshipit-source-id: 2294bfa41b82ff47f000bc98860780f59d7d4421
2020-03-09 06:40:39 -07:00
7e55494502 Warns on read-only Numpy array->tensor conversion (#33615)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/5442.

Per title (and see issue). A test is added to test_torch.py to verify the behavior.

Update (with new behavior):

NumPy arrays can be non-writeable (read-only). When converting a NumPy array to a Torch tensor the storage is shared, but the tensor is always writable (PyTorch doesn't have a read-only tensor). Thus, when a non-writeable NumPy array is converted to a PyTorch tensor it can be written to.

In the past, PyTorch would silently copy non-writeable NumPy arrays and then convert those copies into tensors. This behavior violates the from_numpy contract, however, which promises that the tensor and the array share memory.

This PR adds a warning message when a non-writeable NumPy array is converted into a Torch tensor. This will not break any networks, but will make end users aware of the behavior. They can work around the warning by marking their NumPy arrays as writeable.
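
A minimal sketch of the new behavior and the work-around:

```python
import numpy as np
import torch

arr = np.zeros(3)
arr.flags.writeable = False   # a read-only array

t = torch.from_numpy(arr)     # now emits a UserWarning: memory is shared,
                              # but the resulting tensor is writable

arr.flags.writeable = True    # work-around: mark the array writeable first
t = torch.from_numpy(arr)     # no warning
```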
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33615

Differential Revision: D20289894

Pulled By: mruberry

fbshipit-source-id: b76df0077399eb91038b12a6bf1917ef38c2cafd
2020-03-08 20:03:50 -07:00
79d47c1c5f Fix the missing ';' in Conv.cpp (#34448)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34415.
BTW, isn't this tested on CI? Maybe we need to introduce some tests with legacy versions of cuDNN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34448

Differential Revision: D20325104

Pulled By: ngimel

fbshipit-source-id: f03dec30ffa6e50a28ee8103d7d49cd6fc0a6d69
2020-03-07 21:43:18 -08:00
7d9f611b64 Add worker_name helper to dist_utils. (#34162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34162

This changes the "worker{}".format(..) pattern in our unit tests to something
cleaner.
ghstack-source-id: 99713074

Test Plan: waitforbuildbot

Differential Revision: D20233533

fbshipit-source-id: 5cff952ca68af5a6d26dc5cc01463cf7756d83d9
2020-03-07 13:24:45 -08:00
8a17dc65af [quantization] Make FP16 RNN use new prepack op (#34339)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34339

Test Plan: Imported from OSS

Differential Revision: D20297194

Pulled By: jamesr66a

fbshipit-source-id: 8bf6d0f2cb047e90bbdd184aaad337b143040d10
2020-03-07 10:04:01 -08:00
45a504dd2d [JIT] Introduce BuiltinOpFunction and integrate into torchbind (#34098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34098

* #33900 [JIT] Move stuff out of class_type.cpp

Test Plan: Imported from OSS

Differential Revision: D20229166

Pulled By: jamesr66a

fbshipit-source-id: d658a63a5d6e372e675f35b8456adc8de82b49f3
2020-03-07 10:03:56 -08:00
60e8615a6d [JIT] Virtualize Function (#33921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33921

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D20153092/)!

Test Plan: Imported from OSS

Differential Revision: D20177227

Pulled By: jamesr66a

fbshipit-source-id: 87f3e484c4f873d60f76f50f6789c1b4a73bdfde
2020-03-07 10:03:50 -08:00
bb1114258c [JIT] Move stuff out of class_type.cpp (#33900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33900

These functions don't require any libtorch-specific functionality, so move them into the header so they're included in the ATen build

Test Plan: Imported from OSS

Differential Revision: D20175874

Pulled By: jamesr66a

fbshipit-source-id: 1efab1b60e196a635e6c6afadb042b63771170f0
2020-03-07 10:02:32 -08:00
65bad41cbe Fixed typos in quantization docs / docstrings (#34182)
Summary:
Removed an extra backtick character.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34182

Differential Revision: D20320146

Pulled By: jerryzh168

fbshipit-source-id: 33c347711a052cc55f7d1a41ed959dadf99a3d7d
2020-03-06 21:53:52 -08:00
c5e822b7bb Back out "[jit] Add type tags to lists/dicts in pickle" (#34406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34406

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34405

Original commit changeset: 2f1826e6679a

Test Plan: reverting, see S197156

Reviewed By: akyrola, volkhin

Differential Revision: D20317456

fbshipit-source-id: 89298a9c022edba1d54bcdc7541804cb919e33f5
2020-03-06 20:02:16 -08:00
392afb9f8b Fix overlapping keywords (#34142)
Summary:
This commit fixes overlapping keywords in the CPP Docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34142

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20319949

Pulled By: yf225

fbshipit-source-id: e7bb2efdc286c85792c6f18a260c3bba33c54008
2020-03-06 19:16:21 -08:00
b0479506a8 Add the 3d avg pool for video related model (#33339)
Summary:
```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 5, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 4, 1, 2, 3])

    x = x.permute([0, 4, 1, 2, 3])

    NITER = 10

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.avg_pool3d(x, kernel_size=3, stride=None, padding=0)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.avg_pool3d(q_x, kernel_size=3, stride=None, padding=0)
    time_per_iter_quant = (time.time() - s) / NITER
    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')
```

```
**** torch.qint8 *****
time/iter ms (float)  time/iter ms (quant)  quant/float
16.286182403564453  0.7308721542358398  0.04487682479080417
**** torch.quint8 *****
time/iter ms (float)  time/iter ms (quant)  quant/float
15.364313125610352  0.6497383117675781  0.042288796541418254
**** torch.qint32 *****
time/iter ms (float)  time/iter ms (quant)  quant/float
15.649032592773438  13.879132270812988  0.8869003363966556
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33339

Differential Revision: D19900904

Pulled By: lly-zero-one

fbshipit-source-id: 4522cc6b4a0751aeda6c7edc258e0cb3f55a8fe3
2020-03-06 17:44:34 -08:00
d98516026e [PyTorch BC] Clean up the BC whitelist (#34393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34393

Clean up the list

Test Plan: CI

Reviewed By: hl475

Differential Revision: D20300530

fbshipit-source-id: 50e7da0a9f8295eff33590982f32f84abee96d9c
2020-03-06 16:10:20 -08:00
ccf6fab65e Fix doc and type hints for "torch.add"; fix deprecated python calls in tests (#33935)
Summary:
This PR fixed the documentation for `torch.add` with alpha. It also fixed deprecated Python calls to `torch.add` and `torch.addmm` in tests, which may affect performance in *test/test_sparse.py* and *test/test_nn.py*.

cc csarofeen ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33935

Differential Revision: D20313320

Pulled By: ngimel

fbshipit-source-id: fb08413d7e244865952e3fc0e1be7f1794ce4e9a
2020-03-06 15:53:58 -08:00
01edb7450f [Lite Trainer] Add necessary registrations for MNIST model (#33717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33717

Because of the special treatment of operator names for the lite interpreter, all the operators used in the lite interpreter are still prefixed with "_". Add the necessary registrations for the MNIST model. All the ops with autograd capability are included in torch_mobile_train. After rebasing, the selective build from D19649074 can be used to strip the unused ops.

Note that this diff is a feasibility test. Training accuracy is not covered by the test.
ghstack-source-id: 97780066

Test Plan:
```
buck run xplat/caffe2/fb/lite_trainer:lite_trainer -c pt.disable_gen_tracing=1 -c pt.static_dispatch=0 -- --model=/path/MnistModel.bc
```
{F227898221}

Reviewed By: dreiss

Differential Revision: D19743201

fbshipit-source-id: cacadd76f3729faa0018d147a69466bbf54312fd
2020-03-06 15:49:03 -08:00
96ca06cfce Add nhwc memory format test for dropout (#34379)
Summary:
cc: ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34379

Differential Revision: D20310118

Pulled By: ngimel

fbshipit-source-id: a9bafd6b8fbcb57443e22181cf6bd9879b6f6051
2020-03-06 15:43:21 -08:00
37dfc6c498 Reenable large conv tests (#34259)
Summary:
Please merge after https://github.com/pytorch/pytorch/pull/33073

With that PR, we are now trying different algorithms when OOM, so hopefully there will be some algo working at low memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34259

Differential Revision: D20310094

Pulled By: ngimel

fbshipit-source-id: bccd8162bd06a0e54ac6f42a7fd9a5b766f92cd7
2020-03-06 15:36:54 -08:00
516a587438 Enhance reproducibility documentation (#33795)
Summary:
Improves explanation of non-determinism when running on GPUs. Adds info about `torch.nn.BCELoss` operating non-deterministically on GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33795

Differential Revision: D20284880

Pulled By: ngimel

fbshipit-source-id: d543959636d261a80c234150304344b19a37ba5d
2020-03-06 15:32:04 -08:00
079de7f376 .circleci: Remove macOS builds related to CUDA (#34333)
Summary:
We don't release binaries for macOS with CUDA support, so we should just
remove these builds from our regular PR pipeline.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34333

Differential Revision: D20312565

Pulled By: seemethere

fbshipit-source-id: 376228680aa0e814d1b37f1ff63b7d1262515e44
2020-03-06 13:18:06 -08:00
2d3f6cbf03 .circleci: Update default smoke tests from cuda 10.0 -> 10.2 (#34328)
Summary:
Now that https://github.com/pytorch/pytorch/issues/34241 is merged, we can update these to the latest cuda version to get a better signal.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34328

Differential Revision: D20312552

Pulled By: seemethere

fbshipit-source-id: 8e6bf797e067500d5dd9a607c6c19465028637bc
2020-03-06 13:11:58 -08:00
5608ffc46c [PyTorch] Remove const modifiers from passed by value integers in qbatch_norm_fn (#34378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34378

This fixes a strange symbol-mangling mismatch between `DECLARE_DISPATCH(qbatch_norm_fn, qbatch_norm_stub)` and `REGISTER_DISPATCH(qbatch_norm_stub, &q_batch_norm_kernel<false>);` when the code is built on Windows with clang.

Test Plan: CI + build PyTorch on Windows using clang

Reviewed By: EscapeZero

Differential Revision: D20309550

fbshipit-source-id: e97c7c3b6fee2e41ea6b2f8167ce197aec404e3d
2020-03-06 13:04:54 -08:00
c6ea71b6e8 Fix Conv.cpp, &&= is not a C++ operator (#34381)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34381

Differential Revision: D20310674

Pulled By: ngimel

fbshipit-source-id: a453c1d07bcf7aead7402f091bccb4af7b1ec690
2020-03-06 12:38:58 -08:00
5f641f93f1 [aten] Don't deadlock in IValue::Future impl, tests. (#34099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34099

This change effectively applies into IValue's future impl a few fixes
we discovered when using the torch::utils::Future<T> impl.

The parallel impls should probably eventually be merged, but until then:

  - Don't hold the lock when invoking the callbacks. Otherwise it is
    effectively impossible (it deadlocks) to call value() to get
    the value from inside a callback (see the sketch after this list).

  - We discovered that it was slightly cleaner in practice to
    notify condition variables prior to invoking callbacks
    (best to unblock paused threads ASAP, before spawning new work).

  - Fix some var naming inconsistency.
  - Add some caffe2 cpp test coverage.
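
For illustration, a minimal sketch of that locking pattern, written in Python for brevity (this is not the actual C++ IValue::Future implementation):

```python
import threading

class SketchFuture:
    def __init__(self):
        self._lock = threading.Lock()
        self._cv = threading.Condition(self._lock)
        self._completed = False
        self._value = None
        self._callbacks = []

    def mark_completed(self, value):
        with self._lock:
            self._value = value
            self._completed = True
            callbacks, self._callbacks = self._callbacks, []
            self._cv.notify_all()  # unblock waiters before running callbacks
        # Invoke callbacks *outside* the lock, so a callback may call
        # value() without deadlocking.
        for cb in callbacks:
            cb(self)

    def value(self):
        with self._lock:
            self._cv.wait_for(lambda: self._completed)
            return self._value
```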
ghstack-source-id: 99336569

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- 'JitTest\.IValueFuture'

```

Differential Revision: D20203278

fbshipit-source-id: 6e805ba547899dab9aab458e4b23049db31f930e
2020-03-06 12:34:50 -08:00
0489b8da42 Add scripts to promote S3 artifacts from test channels to stable channels (#34274)
Summary:
Currently testing against the older release `1.4.0` with:
```
PYTORCH_S3_FROM=nightly TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 scripts/release/promote/libtorch_to_s3.sh
PYTORCH_S3_FROM=nightly TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 scripts/release/promote/wheel_to_s3.sh
```

These scripts can also be used for `torchvision`, which may improve the release process there too.

Later on this should be made into a re-usable module that can be downloaded from anywhere and used amongst all pytorch repositories.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34274

Test Plan: sandcastle_will_deliver

Differential Revision: D20294419

Pulled By: seemethere

fbshipit-source-id: c8c31b5c42af5096f09275166ac43d45a459d25c
2020-03-06 12:18:16 -08:00
879a90b322 [ModelLoading] Use byte encoding for uint8, fp16 etc. instead of int32 (#34343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34343

Use byte encoding for uint8, fp16 etc. instead of int32 in TensorProto serialization/deserialization

tl;dr
- fp16 tensor deserialization 12x faster, serialized size 25% lower
- uint8 tensor deserialization 36x faster, serialized size 25% lower

Test Plan:
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpp    relative  time/iter  iters/s
============================================================================
BlobProtoInt32DeserializationFloat16                        12.37ms    80.82
BlobProtoByteDeserializationFloat16             1125.46%     1.10ms   909.64
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8                          17.57ms    56.92
BlobProtoByteDeserializationUInt8               3629.45%   484.02us    2.07K
============================================================================
```

Reviewed By: yinghai

Differential Revision: D20137451

fbshipit-source-id: 8ed4be2286a6d4c7e134fcb0832f22bc645039a1
2020-03-06 11:58:30 -08:00
98afce3c56 Remove unnecessary assert in autograd engine (#34307)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34307

Test Plan: Imported from OSS

Differential Revision: D20283401

Pulled By: albanD

fbshipit-source-id: 34f6eb8955b7d9cb259260abc1056ddd9f354107
2020-03-06 11:45:46 -08:00
6d8a0f6731 [Aten] Init container iterators to an unsigned type (#34159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34159

This fixes `comparison of integers of different sign` warnings

Test Plan: CI

Reviewed By: EscapeZero

Differential Revision: D20232085

fbshipit-source-id: 8f325be54395be54c704335cb7edf2ec7ef75e75
2020-03-06 10:35:43 -08:00
4c99351de6 [AMD] Remove num_gpu check for remote execution (#34318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34318

Stop checking whether we have AMD GPU devices on the host, because we may be constructing a net on a machine without a GPU and running the net on another one with a GPU.

Reviewed By: ajauhri

Differential Revision: D20269562

fbshipit-source-id: 1f561086cacdcead3ce7c03c2d02c25336c8b11a
2020-03-06 09:53:57 -08:00
4872b126fd [aten] remove stmt unreachable, variable never used warnings (#34017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34017

Remove warning
```
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(437): warning: statement is unreachable
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(271): warning: variable "transpose_m1" was set but never used
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(271): warning: variable "transpose_m2" was set but never used
```

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D20181179

fbshipit-source-id: 3665912ba55bffbd8b4555f8a6803e57a502c103
2020-03-06 09:52:43 -08:00
82a177c07f [c10] remove warning attribute does not apply to any entity (#34018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34018

Remove warning
```
caffe2/c10/util/ArrayRef.h(278): warning: attribute does not apply to any entity
```

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D20181191

fbshipit-source-id: 58bd168a87a94fec925c7cde8b8d728a4257446c
2020-03-06 09:47:10 -08:00
17ceb6941f [RPC] Create local RRef<ModuleInterface> remotely in Python, use it remotely in TorchScript (#34183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34183

https://github.com/pytorch/pytorch/pull/33263 enhanced the RRef Python constructor to infer most types via `jit::tryToInferType(..)`.

But this helper function can't infer the `ScriptModule` type due to `ScriptModule`'s special per-Module type singleton logic, so it's still not possible for a Python-created RRef to know the JIT type of its contained `ScriptModule`.

Instead of inferring the specific type of a Module, which could lead to too many candidate types (due to Module's multiple-inheritance possibility), it's more straightforward to set its type to a user-specified `ModuleInterface` type.

We added an optional argument `type_hint` that lets users mark an `RRef` with the `ModuleInterface` type it holds.
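
A rough sketch of the resulting usage (the interface and module here are made up for illustration, and rpc.init_rpc(...) is assumed to have been called already):

```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.interface
class MyModuleInterface(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class MyModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

scripted = torch.jit.script(MyModule())

# The type_hint marks which ModuleInterface type the RRef holds,
# so the RRef can be used from TorchScript.
rref = rpc.RRef(scripted, type_hint=MyModuleInterface)
```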

ghstack-source-id: 99649379

(Note: this ignores all push blocking failures!)

Test Plan:
Aspects that need to be confirmed in the test cases

https://fb.quip.com/aGxRAh2lCg05

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_create_local_script_class_rref

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_create_local_script_module_rref

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_return_local_script_class_rref_in_py_and_use_in_script

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_return_local_script_module_rref_in_py_and_use_in_script

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_torchscript_function_exception
```

Differential Revision: D7065050

fbshipit-source-id: e10210c0996622969e499e4a35b0659b36787c1c
2020-03-06 08:28:22 -08:00
a7da4490cc Clean up some legacy scalar/empty handling. (#34217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34217

LegacyNoScalar variants cause 0-dim tensors to behave like 1-dim tensors.
LegacyAll variants cause 0-dim tensors to behave like 1-dim tensors, and numel == 0 tensors to be treated like 0-dimensional tensors.

Since this was done by codemod, these are often unneeded and were often translated incorrectly to ATen.

Test Plan: Imported from OSS

Differential Revision: D20249577

Pulled By: gchanan

fbshipit-source-id: 6f2876d3e479562c9323f3629357a73a47869150
2020-03-06 08:13:31 -08:00
9c5578fd0a Make sure Vec256 int32_t and int16_t loadu temprary arrays are properly initialized (#34281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34281

It seems #32722 missed two loadu functions.

Test Plan: Imported from OSS

Differential Revision: D20287731

Pulled By: albanD

fbshipit-source-id: d959b2508de3f9f660368152d7260026d7fbccbe
2020-03-06 07:55:45 -08:00
35b6d2945d Tensor.random_ check that from and to are in tensor dtype bounds (#34033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34033

Test Plan: Imported from OSS

Differential Revision: D20182414

Pulled By: pbelevich

fbshipit-source-id: 3704570ead7de169ce13c81164be0aff0806fb46
2020-03-06 07:22:47 -08:00
30680196e4 Revert D20121915: [JIT] Add support for list()
Test Plan: revert-hammer

Differential Revision:
D20121915

Original commit changeset: c6c4ef444dbf

fbshipit-source-id: 829adb58780f4d0f41acebb3e7640a9c68bdbc1b
2020-03-06 07:16:40 -08:00
f9f135c5d8 ChannelsLast3d support is_contiguous, contiguous, suggest_memory_format, caching (#33033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33033

Test Plan: Imported from OSS

Differential Revision: D19759661

Pulled By: glaringlee

fbshipit-source-id: 6c4798fa93589338c0c71c5308b9fd1151330245
2020-03-06 06:02:03 -08:00
415595ace4 [C++ API] Remove init-list form of at::indexing::Slice (#34255)
Summary:
The init-list form of `at::indexing::Slice` (i.e. `tensor.index({{1, None, 2}, ...})` instead of `tensor.index({Slice(1, None, 2), ...})`) in the C++ API can be easily confused with the list-form indexing in the Python API (e.g. `tensor[[1, 3, 2], ...]`), which is not good from a readability perspective. This PR removes the init-list form of `at::indexing::Slice` to make the API less confusing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34255

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20290166

Pulled By: yf225

fbshipit-source-id: abbcbeca0b179219e5e1f196a33ef8aec87ebb76
2020-03-06 05:51:53 -08:00
b8fd88319a C++ make torch::nn::Sequential push_back(AnyModule) methods public (#34208)
Summary:
Issue https://github.com/pytorch/pytorch/issues/33192
Moves Sequential::push_back methods with AnyModule from private -> public
Allows adding an existing AnyModule via something like:

```
  torch::nn::Sequential q;
  auto a = torch::nn::AnyModule(torch::nn::Linear(1, 2));
  q->push_back(a);
  q->push_back("fc", a);
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34208

Differential Revision: D20300278

Pulled By: yf225

fbshipit-source-id: 4525319bb7fb6667e43a006c9f446a2193781005
2020-03-06 05:47:14 -08:00
9a5e9d8cec [pytorch][mobile] change mobile build scripts to build PyTorch by default (#34203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34203

Currently CMake and the mobile build scripts still build libcaffe2 by
default. To build PyTorch mobile, users have to set the environment variable
BUILD_PYTORCH_MOBILE=1 or the CMake option BUILD_CAFFE2_MOBILE=OFF.

PyTorch mobile has been released for a while. It's about time to change
CMake and the build scripts to build libtorch by default.

Changed the caffe2 CI job to build libcaffe2 by setting the BUILD_CAFFE2_MOBILE=1
environment variable. I only found Android CI for libcaffe2 - did we ever
have iOS CI for libcaffe2?

Test Plan: Imported from OSS

Differential Revision: D20267274

Pulled By: ljk53

fbshipit-source-id: 9d997032a599c874d62fbcfc4f5d4fbf8323a12e
2020-03-05 23:40:47 -08:00
b50825e011 Make RecordFunction more robust for async use cases (#34122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34122

Earlier work added support for async RPC cases where RecordFunction's
end callbacks might be called in a different thread; in addition, some
extra care was needed to handle the pointer to the parent function.

This PR makes RecordFunction aware of potentially multiple threads in
use; it also removes the unused parent() call and restricts the current()
RecordFunction to scope-based record functions (the RECORD_FUNCTION macro).

Test Plan: unit tests

Differential Revision: D20297709

Pulled By: ilia-cher

fbshipit-source-id: 46a59e1b2eea0bbd8a59630385e193b38d30f9d1
2020-03-05 22:28:53 -08:00
38857734f0 [JIT] fix py35 test (#34350)
Summary:
test_module_interfaces was using syntax only supported in Python >= 3.6.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34350

Reviewed By: mrshenli

Differential Revision: D20298869

Pulled By: eellison

fbshipit-source-id: 22319ca403113cff2eedf57767bb34d9580e6db3
2020-03-05 21:31:19 -08:00
76035f050b [C++ API Parity] Adam: updated step and class design (#33730)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33730

Differential Revision: D20292073

Pulled By: anjali411

fbshipit-source-id: a7b4a70f29027ab355aebb91873ea55d5cb51783
2020-03-05 19:15:24 -08:00
f4da78f1b3 Remove RPC TorchScript private API (#33978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33978

We can directly pass a user callable to the rpc_async API in TorchScript. There is no need to have a private API that takes a qualified name.
ghstack-source-id: 99600360

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_torchscript_functions_not_supported
```

Differential Revision: D7420993

fbshipit-source-id: 228c15b21848e67418fab780e3fd6a1c6da5142d
2020-03-05 18:35:05 -08:00
02478984d6 Add support to dump unsupported ops. Add lite_interpter_load test. (#34278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34278

This diff helps check all the ops not supported by lite_interpreter.
Helpful mainly to find all the ops that need to be added instead of adding them
one by one.

Test Plan:
buck run caffe2/binaries:lite_interpreter_model_load --
--model=<bytecode-model-path>

Reviewed By: iseeyuan

Differential Revision: D20266341

fbshipit-source-id: 5a6c7a5bc52f910cea82a72045870da8105ccb87
2020-03-05 18:31:31 -08:00
434af5d94a [quant] Speed up per-channel min-max observer (#34118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34118

Previously calc_per_channel_qparams was using for loops and Python primitives, which called `item` many times, causing slowdown during training.
    These changes use torch primitives on the tensor to speed up the operation by over 60x.

    Perf results on MobileNetV2 during training using autograd profiler

    FP32 forward call -
    Self CPU time total: 47.222ms
    CUDA time total: 124.001ms

    before change
    FakeQuant Model -
    Self CPU time total: 19.107s
    CUDA time total: 27.177s

    after change
    FakeQuant Model -
    Self CPU time total: 404.667ms
    CUDA time total: 446.344ms
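
As an illustration only (not the observer's actual code), a minimal sketch of computing per-channel min/max with one tensor reduction instead of a Python loop over channels:

```python
import torch

def per_channel_min_max(w, axis=0):
    # Move the channel axis to the front, flatten everything else, and
    # reduce once over the whole tensor instead of calling .item() per channel.
    w2d = w.transpose(0, axis).flatten(start_dim=1)
    return w2d.min(dim=1).values, w2d.max(dim=1).values

mins, maxs = per_channel_min_max(torch.randn(8, 3, 3, 3), axis=0)
```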

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D20287841

fbshipit-source-id: 6b706b8206e0d0da3c3c217b014e8da5b71b870d
2020-03-05 18:29:41 -08:00
d2b5eb2a45 [ONNX] Fix for random generators export (#33789)
Summary:
Export random generator with dynamic input size
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33789

Reviewed By: hl475

Differential Revision: D20121175

Pulled By: houseroad

fbshipit-source-id: c16d11eb07678166d125759d97aadfcd7c80ef14
2020-03-05 17:58:54 -08:00
89d314b5d5 [pytorch] update mobile docker image version (#34337)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34337

Test Plan: Imported from OSS

Differential Revision: D20296975

Pulled By: ljk53

fbshipit-source-id: bc4a39689dca22e4530f25225f1884eda9bc74de
2020-03-05 17:47:36 -08:00
1cf12b7e53 [quant] Fix histogram observer to work with QAT on GPU (#34232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34232

By default `torch.zeros` creates the tensor on CPU. Need to specify the device argument to get it to work correctly on GPU during QAT.
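
A minimal sketch of the fix pattern; the helper name is hypothetical and not the observer's actual code:

```python
import torch

def make_histogram(x, bins=2048):
    # torch.zeros defaults to CPU; allocate on the observed tensor's device
    # so the observer keeps working when QAT runs on GPU.
    return torch.zeros(bins, device=x.device)
```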

Test Plan:
1. Tested by running QAT on GPU

2. python test/test_quantization.py

Imported from OSS

Differential Revision: D20286351

fbshipit-source-id: 745723c85d902870c56c1c7492f26cb027ae9dc6
2020-03-05 17:19:12 -08:00
e4a883e601 cuDNN convolution try multiple algo (#33073)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/31336 https://github.com/pytorch/pytorch/issues/1664

Sometimes cuDNN heuristics return algorithms that cannot be used. Instead of just using the first algorithm returned, we should try these algorithms one by one until one of them succeeds.

Benchmark:
https://github.com/zasdfgbnm/things/blob/master/2020Q1/conv-benchmark.ipynb
```python
i = torch.randn(256, 3, 256, 256).cuda()
c = torch.nn.Conv2d(3, 3, 3, 3).cuda()

%timeit c(i); torch.cuda.synchronize()
```
before vs after = 498 vs 490 µs

The performance is improved, I guess, because before this PR we always called the heuristics to get the algorithm, whereas after this PR we only do so the first time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33073

Differential Revision: D20284755

Pulled By: ngimel

fbshipit-source-id: b03af37c75939ca50c2cb401c706ba26914dd10e
2020-03-05 17:06:21 -08:00
5500c3de0a Revert D20150304: [pytorch][PR] [JIT] Introduce a fake Tensor creation node for IR unit tests
Test Plan: revert-hammer

Differential Revision:
D20150304

Original commit changeset: c88f5289055a

fbshipit-source-id: 14ac0e46145e9fb4f200c6318b63edd541380aeb
2020-03-05 16:25:08 -08:00
78aebbcb88 [JIT] add other module apis (#34106)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34106

Test Plan: Imported from OSS

Differential Revision: D20283996

Pulled By: eellison

fbshipit-source-id: 88e7bc4547e96717d6c8efe0b25ede0d198d9e68
2020-03-05 16:12:29 -08:00
2af64ba3ed Allow output to zero-strided tensors if the size is <= 1 along that dim (#34100)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33812
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34100

Differential Revision: D20267778

Pulled By: ngimel

fbshipit-source-id: 1b84c4f6e6bf5d29c3698daa3cb71554b25c1eee
2020-03-05 16:01:33 -08:00
ccf4d69b75 [Lite Interpreter] Enable __setstate__ (#33294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33294

1. Serialize bytecode of __setstate__ and run it when loading the model.
2. One use case is quantization. To test this use case, a few operators are registered temporarily for the lite interpreter. The "_" prefix registration will be removed when the operators are all migrated to mobile.
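
A minimal sketch of a module that round-trips state through `__getstate__`/`__setstate__`; the `_save_for_lite_interpreter` call is an assumption about the saving entry point, not taken from this diff:

```python
import torch

class Scale(torch.nn.Module):
    def __init__(self, s: float):
        super().__init__()
        self.s = s

    @torch.jit.export
    def __getstate__(self):
        return self.s

    @torch.jit.export
    def __setstate__(self, state: float):
        # Runs in the (lite) interpreter when the model is loaded.
        self.s = state

    def forward(self, x):
        return x * self.s

m = torch.jit.script(Scale(3.0))
m._save_for_lite_interpreter("scale.ptl")  # assumed saving API
```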

Test Plan: Imported from OSS

Differential Revision: D20162898

Pulled By: iseeyuan

fbshipit-source-id: 7a3180807bf38fbce594d86993896861f12bb58c
2020-03-05 15:24:21 -08:00
765c5b1c95 .circleci: Add CUDA 10.2 to CI (#34241)
Summary:
Basically a re-do of https://github.com/pytorch/pytorch/pull/33471

Should be safe to merge now that https://github.com/pytorch/pytorch/issues/34135 has been merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34241

Differential Revision: D20292711

Pulled By: seemethere

fbshipit-source-id: c508b5ef58f52aa3a263fd33b0373f31719fa0a4
2020-03-05 15:06:34 -08:00
f218842f2e [JIT] Add support for list() (#33818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33818

Test Plan: Imported from OSS

Differential Revision: D20121915

Pulled By: eellison

fbshipit-source-id: c6c4ef444dbf1d4134dccb28c13315e225945b64
2020-03-05 14:48:20 -08:00
479c3b0aa5 [JIT] add support for torch.norm (#33783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33783

Fix for https://github.com/pytorch/pytorch/issues/20113

Test Plan: Imported from OSS

Differential Revision: D20121917

Pulled By: eellison

fbshipit-source-id: ffedcc40678cd80f5529ff9323088eed544e5158
2020-03-05 14:46:24 -08:00
beb4309406 [ONNX] Reduce ONNX test time on CI (#33242)
Summary:
Among all ONNX tests, ONNXRuntime tests are taking the most time on CI (almost 60%).
This is because we are testing larger models (mainly torchvision RCNNs) for multiple ONNX opsets.
I decided to divide the tests between two jobs for older/newer opsets. This reduces the test time from 2h to around 1h10min.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33242

Reviewed By: hl475

Differential Revision: D19866498

Pulled By: houseroad

fbshipit-source-id: 446c1fe659e85f5aef30efc5c4549144fcb5778c
2020-03-05 14:38:34 -08:00
ff2731b45c Revert "Disable MNIST test in test_xla() (#34261)" (#34316)
Summary:
Should be passing now ;)
This reverts commit 4a194f89aadc7cd1d7e24622b53855cfb885da75.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34316

Reviewed By: mrshenli

Differential Revision: D20287196

Pulled By: ailzhang

fbshipit-source-id: 1cc48a11edcc48a0ec4161c94487912eba63c9a5
2020-03-05 14:27:26 -08:00
9651088228 Tuck the packing logic into Int8FCPackWeight op (#34289)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34289

Test Plan:
```
 buck test caffe2/caffe2/quantization/server:fully_connected_dnnlowp_op_test
```

Reviewed By: csummersea

Differential Revision: D20275538

fbshipit-source-id: 699ca2a145c7c9a50b0fdab7bd68d8557a031ac0
2020-03-05 13:43:08 -08:00
9ce833879f [JIT] Introduce a fake Tensor creation node for IR unit tests (#33914)
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.

**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.

```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN      ] JitTest.ADFormulas
[       OK ] JitTest.ADFormulas (82 ms)
[ RUN      ] JitTest.Attributes
[       OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN      ] JitTest.LiteInterpreterPrim
[       OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN      ] JitTest.LiteInterpreterLoadOrigJit
[       OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)

[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[  PASSED  ] 75 tests.
```

**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33914

Differential Revision: D20150304

Pulled By: SplitInfinity

fbshipit-source-id: c88f5289055a02dc20b7a5dcdf87469f9816d020
2020-03-05 12:42:42 -08:00
75d29f8d3e Allow converting IValue to vector<string> (#34269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34269

follow up for https://github.com/pytorch/pytorch/pull/16519

Test Plan: unit tests

Reviewed By: houseroad

Differential Revision: D20261495

fbshipit-source-id: 947f3cbd469d9258ec2dbb36cb68efe15a3b19eb
2020-03-05 12:31:23 -08:00
3a4bac5c76 Throw a proper error when parsing local variable annotations without assignments (#34133)
Summary:
Currently, putting `outputs: List[Tensor]` instead of `outputs: List[Tensor] = []` in your JITed code results in:
```
Traceback (most recent call last):
  File "custom_lstms.py", line 453, in <module>
    test_script_stacked_bidir_rnn(5, 2, 3, 7, 4)
  File "custom_lstms.py", line 404, in test_script_stacked_bidir_rnn
    rnn = script_lstm(input_size, hidden_size, num_layers, bidirectional=True)
  File "custom_lstms.py", line 62, in script_lstm
    other_layer_args=[LSTMCell, hidden_size * dirs, hidden_size]))
  File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1267, in script
    return torch.jit._recursive.create_script_module(obj, torch.jit._recursive.infer_methods_to_compile)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 305, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
    init_fn(script_module)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
    init_fn(script_module)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
    init_fn(script_module)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
    init_fn(script_module)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 317, in create_script_module_impl
    stubs = stubs_fn(nn_module)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 511, in infer_methods_to_compile
    stubs.append(make_stub_from_method(nn_module, method))
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 41, in make_stub_from_method
    return make_stub(func)
  File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 34, in make_stub
    ast = torch.jit.get_jit_def(func, self_name="RecursiveScriptModule")
  File "/home/apaszke/pytorch/torch/jit/frontend.py", line 173, in get_jit_def
    return build_def(ctx, py_ast.body[0], type_line, self_name)
  File "/home/apaszke/pytorch/torch/jit/frontend.py", line 206, in build_def
    build_stmts(ctx, body))
  File "/home/apaszke/pytorch/torch/jit/frontend.py", line 129, in build_stmts
    stmts = [build_stmt(ctx, s) for s in stmts]
  File "/home/apaszke/pytorch/torch/jit/frontend.py", line 129, in <listcomp>
    stmts = [build_stmt(ctx, s) for s in stmts]
  File "/home/apaszke/pytorch/torch/jit/frontend.py", line 181, in __call__
    return method(ctx, node)
  File "/home/apaszke/pytorch/torch/jit/frontend.py", line 294, in build_AnnAssign
    rhs = build_expr(ctx, stmt.value)
  File "/home/apaszke/pytorch/torch/jit/frontend.py", line 180, in __call__
    raise UnsupportedNodeError(ctx, node)
  File "/home/apaszke/pytorch/torch/jit/frontend.py", line 116, in __init__
    source_range = ctx.make_range(offending_node.lineno,
AttributeError: 'NoneType' object has no attribute 'lineno'
```

This patch makes the error message more reasonable:
```
torch.jit.frontend.UnsupportedNodeError: annotated assignments without assigned value aren't supported:
  File "custom_lstms.py", line 221
        # type: (Tensor, Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]
        inputs = reverse(input.unbind(0))
        outputs: List[Tensor]
        ~ <--- HERE
        for i in range(len(inputs)):
            out, state = self.cell(inputs[i], state)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34133

Differential Revision: D20249076

Pulled By: ezyang

fbshipit-source-id: 40ec34ad38859f9fe56f379d3f8d08644b00fab9
2020-03-05 11:23:07 -08:00
ed11e2536a [pytorch_ci] Skip determination tests in rocm
Summary: I don't know why, but this segfaults on rocm.

Test Plan: Can only be tested on master

Reviewed By: mrshenli

Differential Revision: D20286011

fbshipit-source-id: dde952449bf54ae459d36020f3e3db6fa087b39f
2020-03-05 11:23:02 -08:00
e907128caf [ROCm] Enable BFloat16 type for pooling ops (#34166)
Summary:
This PR enables bfloat16 type for pooling ops on ROCm. Also adds bfloat16 implementation of atomicAdd since pooling ops use it.

Note: the changes in the lambda function blocks are indentation only, as the code is now wrapped inside the `AT_SKIP_BFLOAT16_IF_NOT_ROCM` macro.

iotamudelta ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34166

Differential Revision: D20263421

Pulled By: ezyang

fbshipit-source-id: 3f4199ec57522e638ec29f45e22c6ec919b7816d
2020-03-05 11:20:54 -08:00
8216d9ae64 ONNX Export Support for NLLLoss (#33509)
Summary:
Adding ONNX export support for torch.nn.NLLLoss().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33509

Reviewed By: hl475

Differential Revision: D20052212

Pulled By: houseroad

fbshipit-source-id: 62efcff4efa1e0e97c65ad1b670c2fc1da08d28f
2020-03-05 11:13:21 -08:00
e642a65bea [pytorch][CI] add e2e mobile custom build jobs to CI (#34184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34184

Add mobile custom build with static dispatch & dynamic dispatch to CI.
Most of mobile code analysis CI should be covered by the custom build +
dynamic dispatch flow, so changing it to running on master only.

Test Plan: Imported from OSS

Differential Revision: D20241774

Pulled By: ljk53

fbshipit-source-id: f34c5748735c536ab6b42c8eb1429d8bbdaefd62
2020-03-05 10:26:45 -08:00
d98bd5e1f5 [test all] Back out "Revert D20171428: [profiler] fix chrome tracing for profiler run with cuda"
Summary:
There was an error in
https://github.com/pytorch/pytorch/pull/30724/files that resulted in
export_chrome_trace generating invalid JSON. This only came up when the
profiler is run with use_cuda=True from what it looks like. In the future, we
should have tests that ensure we generate valid JSON because we no longer use
the json library.
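
A hedged sketch of what such a round-trip test could look like (requires a CUDA device):

```python
import json
import torch

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    torch.randn(8, 8, device="cuda").mm(torch.randn(8, 8, device="cuda"))
prof.export_chrome_trace("trace.json")
with open("trace.json") as f:
    json.load(f)  # raises ValueError if the emitted trace is not valid JSON
```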
ghstack-source-id: 99508836

Test Plan: Added a unit test.

Differential Revision: D20237040

fbshipit-source-id: 510befbdf4ec39632ac56544afcddee6c8cc3aca
2020-03-05 09:05:56 -08:00
4a194f89aa Disable MNIST test in test_xla() (#34261)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34261

Test Plan: Imported from OSS

Differential Revision: D20260350

Pulled By: mrshenli

fbshipit-source-id: b92a6b79e59bdfdf8e68b5dd73f87ea1dfd0daed
2020-03-05 07:55:52 -08:00
Jie
2b79bab029 [CUDA_FUSER] Fork CUDA fuser (#33527)
Summary:
Separating CUDA fuser from CPU fuser.

1. New node in IR - prim::CudaFusionGroup:
   This enables the cuda fuser to co-exist along side the old fuser. Allows us
   to incrementally build and expand cuda fuser.

2. copied FuseGraph optimization passes to CudaFuserGraph:
   We will re-factor & reuse Chunk/Concat in the old fuser logic, which is
   handled in the optimization pass at this moment. Unfortunately many code in
   the pass is tightly binded with the legacy fuser, which makes code sharing
   difficult.
   The CudaFusionGraph will support only a subset of operations comparing to
   legacy fuser (CUDA only). It is registered as a custom pass post fusion via
     ```torch._C._jit_register_cuda_fuser()```
   To have it in effect, you should also turn off fusion on GPU via
     ```torch._C._jit_override_can_fuse_on_gpu(False)```

3. We don't have codegen in this PR yet (WIP). Currently we just fall back to
   the old fuser.
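
Putting the two toggles together, a minimal usage sketch; the scripted function `f` is just an arbitrary fusible example:

```python
import torch

# Register the CUDA fusion pass and disable the legacy GPU fuser so the
# new pass produces prim::CudaFusionGroup nodes instead of prim::FusionGroup.
torch._C._jit_register_cuda_fuser()
torch._C._jit_override_can_fuse_on_gpu(False)

@torch.jit.script
def f(x, y):
    return torch.relu(x + y) * 2.0
```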
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33527

Differential Revision: D20171598

Pulled By: ZolotukhinM

fbshipit-source-id: 9a3c0f06f46da7eaa80ae7551c04869f5b03ef71
2020-03-04 20:25:08 -08:00
e132047f1b [JIT] fix alias assertion (#34268)
Summary:
This check (torch/csrc/jit/ir/alias_analysis.cpp:772, at commit 019ffdca31) wasn't being triggered for None outputs of tuples, because `mustBeNone` would return false if `num_outputs != 1`. This caused an assertion to fail in alias analysis. It's kind of a convoluted case to repro and I wasn't able to make a succinct one, but I tested internally and it fixed the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34268

Differential Revision: D20261539

Pulled By: eellison

fbshipit-source-id: 95edea10e2971727cfd3f3bc2b6bdf9dbadca6a9
2020-03-04 19:00:58 -08:00
e2ddf935bb Run RPC JIT tests with variable type hints only in Python >=3.6 (#34284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34284

Python 3.5 only supports function type hints.
Variable type hints were introduced in Python 3.6.
So these tests with JIT type hints will fail with a "SyntaxError" in a Python 3.5 environment.
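
For illustration, the distinction looks like this:

```python
import torch

# Function type comments parse fine on Python 3.5:
def f(x):
    # type: (torch.Tensor) -> torch.Tensor
    return x + 1

# A variable annotation is a SyntaxError on Python 3.5, fine on >= 3.6:
def g(x):
    y: torch.Tensor = x + 1
    return y
```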

ghstack-source-id: 99542199

Test Plan:

Differential Revision: D7348891

fbshipit-source-id: c4c71ac021f35b5e6f7ce4d3e6af10dd1d2600cc
2020-03-04 18:59:08 -08:00
c62de4286e Add test to verify dist_autograd doesn't populate .grad field. (#33949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33949

ghstack-source-id: 99419830

Test Plan: waitforbuildbot

Differential Revision: D20165254

fbshipit-source-id: ef4413637b1568d81e4aca053838230025df6bba
2020-03-04 17:08:48 -08:00
e1c6f93f14 Clean warning message (#34143)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34143

Test Plan: Imported from OSS

Differential Revision: D20228174

Pulled By: VitalyFedyunin

fbshipit-source-id: 7ab873e87be8621b0f72e8300942fd82cbc19b29
2020-03-04 15:02:19 -08:00
1546d2afeb [pytorch_ci] Don't run determination tests in py35
Test Plan: Can only really be tested in PyTorch master

Reviewed By: mrshenli

Differential Revision: D20260023

fbshipit-source-id: b5444c376894bfccd6524cf04a71cf76eea72275
2020-03-04 14:23:40 -08:00
e236e15934 [quant] Run weight_post_process for QAT (#33852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33852

This fixes an issue for QAT models. During eval, if we call `prepare_qat` and `convert` before calling `load_state_dict`, it throws an error because the weight info (number of channels) is not updated in the observer module.
It is not an issue for the per-tensor case.
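
A hedged sketch of the call order that used to fail; the checkpoint path is hypothetical:

```python
import torch
import torch.quantization as tq

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)
quantized = tq.convert(model.eval(), inplace=False)
# Loading a per-channel quantized checkpoint here used to fail because the
# observer had not yet recorded the weight's channel count.
quantized.load_state_dict(torch.load("qat_checkpoint.pt"))  # hypothetical checkpoint
```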

Fixes issue #33830

Test Plan:
python test/test_quantization.py EagerModePostTrainingQuantTest.test_eval_after_train
python test/test_quantization.py EagerModeQuantizationAwareTrainingTest.test_eval_after_train

Imported from OSS

Differential Revision: D20212996

fbshipit-source-id: a04af8fe4df2e555270ae4d6693f5777d86f8a46
2020-03-04 14:01:32 -08:00
d59e036f4d Revert D20194092: Add support to dump unsupported ops. Add lite_interpter_load test.
Test Plan: revert-hammer

Differential Revision:
D20194092

Original commit changeset: 0d596cd02043

fbshipit-source-id: 17b4bae27543f231bd6c12d90368d399ca55ebdf
2020-03-04 13:53:58 -08:00
17a5c67796 Add support to dump unsupported ops. Add lite_interpter_load test. (#34072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34072

This diff helps check all the ops not supported by lite_interpreter.
Helpful mainly to find all the ops that need to be added instead of adding them
one by one.

Test Plan:
buck run caffe2/binaries:lite_interpreter_model_load --
--model=<bytecode-model-path>

Reviewed By: iseeyuan

Differential Revision: D20194092

fbshipit-source-id: 0d596cd0204308027194af7ed738551d0c32a374
2020-03-04 13:18:12 -08:00
385067ed4f [pytorch][cmake] improve build mobile with host toolchain (#34187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34187

Noticed that a recent PR broke Android/iOS CI but didn't break the mobile
build with the host toolchain. It turns out one mobile-related flag was not
set on the PYTORCH_BUILD_MOBILE code path:
```
"set(INTERN_DISABLE_MOBILE_INTERP ON)"
```

First, move the INTERN_DISABLE_MOBILE_INTERP macro below, to stay with
other "mobile + pytorch" options - it's not relevant to "mobile + caffe2"
so doesn't need to be set as common "mobile" option;

Second, rename PYTORCH_BUILD_MOBILE env-variable to
BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN - it's a bit verbose but
becomes more clear what it does - there is another env-variable
"BUILD_PYTORCH_MOBILE" used in scripts/build_android.sh, build_ios.sh,
which toggles between "mobile + pytorch" v.s. "mobile + caffe2";

Third, combine BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN with ANDROID/IOS
to avoid missing common mobile options again in future.

Test Plan: Imported from OSS

Differential Revision: D20251864

Pulled By: ljk53

fbshipit-source-id: dc90cc87ffd4d0bf8a78ae960c4ce33a8bb9e912
2020-03-04 11:43:16 -08:00
93990bab58 Make use of our S3 mirror if Yann LeCun's website is not accessible (#34215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34215

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20251538

Pulled By: ezyang

fbshipit-source-id: c419f0ce869aca4dede7e37ebd274a08632d10bf
2020-03-04 11:35:34 -08:00
67608cc018 Fix MKLDNN conv2d 5d weight handling (#34115)
Summary:
Effectively backporting c5c00c119f before that PR lands

The bug didn't manifest itself earlier because the MkldnnConv2d constructor didn't reorder the weights, so the issue was arising only on the second serialization/deserialization. This also fixes the constructor to deliver better perf right away.

Note that I still serialize a 5d tensor - it was the previous behavior, we have to handle it anyway, and with https://github.com/pytorch/pytorch/issues/32422 the output of `mkldnn_reorder_conv2d_weight` will always be 4d.

cc pinzhenx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34115

Reviewed By: wanchaol

Differential Revision: D20224685

Pulled By: dzhulgakov

fbshipit-source-id: 24ca9227c4eb4c139096a64ae348808d7478d7dc
2020-03-04 11:26:38 -08:00
9dd5d51b01 [ATen] Exclude CUDA tests when running basic under valgrind (#34181)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34181

Test Plan: CI

Reviewed By: orionr, seemethere

Differential Revision: D20241021

fbshipit-source-id: a7371afc45acc2c07a36c8216036338e14170a56
2020-03-04 11:24:33 -08:00
8269c4f3d3 Added nullptr check for pthreadpool_get_threads_count (#34087)
Summary:
We get a segfault without this when using XNNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34087

Differential Revision: D20199787

Pulled By: kimishpatel

fbshipit-source-id: d3d274e7bb197461632b21688820cd4c10dcd819
2020-03-04 11:10:53 -08:00
ac6e75a165 Revert D20195053: [pytorch][PR] Add API for listing functions overridable by __torch_function__
Test Plan: revert-hammer

Differential Revision:
D20195053

Original commit changeset: 1585f4e405f5

fbshipit-source-id: 3c1aab9c60e3138d40d200ae4238bda0cddf8896
2020-03-04 10:13:54 -08:00
78b81dad83 [Dist Autograd][Better Engineering] Enhanced Error Reporting in Dist Autograd/RPC (#34179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34179

Fixes: https://github.com/pytorch/pytorch/issues/27644

Test Plan: Asserted `test_backward_autograd_engine_error` throws an exception with node information.

Differential Revision: D20238150

fbshipit-source-id: a49b279b77416a7e0e09043aa44ed616023d8e70
2020-03-04 10:13:49 -08:00
45b8c8dbcb [torch] Fix sign-compare warning in torch::utils::rnn:pack_sequence (#34185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34185

ArrayRef<T>::size() is size_t

Test Plan: CI

Reviewed By: EscapeZero

Differential Revision: D20241552

fbshipit-source-id: 73cd062db810ebc5a4e34e094dfe6c7e6571ef2d
2020-03-04 10:13:45 -08:00
39f78db7ec optimize UpSampleNearest 1d 2d and 3d performance on CPU (#31452)
Summary:
This PR aims at improving `UpSample` performance with `mode='nearest'` for 1D, 2D and 3D inputs; both inference and training are covered. The current implementation in ATen is not parallelized.

1. single socket inference speedup for 1d, 2d and 3d: **63x, 57x, 46x**.
2. single core inference speedup for 1d, 2d and 3d: **5.9x, 4.6x, 3.4x**.
3. dual sockets training speedup for 1d, 2d and 3d: **38x, 33x, 65x**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31452

Differential Revision: D20077828

Pulled By: VitalyFedyunin

fbshipit-source-id: a7815cf2ae344696067d2ec63bd4f4e858eaafff
2020-03-04 10:13:41 -08:00
112cecc440 Remove the use of macros when defining division between integers (#34104)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34104

Test Plan: Imported from OSS

Differential Revision: D20222676

Pulled By: VitalyFedyunin

fbshipit-source-id: fb026ce7843e7931324ea82542fb07784e40efdb
2020-03-04 10:13:36 -08:00
438f4ea0ac Cleaner implementation of bitwise operations of integeral types (#33849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33849

For integral types there is no need for `reinterpret_cast` manipulation,
and therefore a cleaner implementation is available.
This might also be helpful on less optimized compilers or on a less optimized arch (a
test on gcc 8.3 x64 shows no difference in performance).

Test Plan: Imported from OSS

Differential Revision: D20222675

Pulled By: VitalyFedyunin

fbshipit-source-id: 875890d1479f8abab4c4a19d934fe9807d12dfd2
2020-03-04 10:13:32 -08:00
3a3fcbbc39 Use templates instead of macros when defining bitwise operators. (#33835)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33835

Test Plan: Imported from OSS

Differential Revision: D20131414

Pulled By: VitalyFedyunin

fbshipit-source-id: ec7eb7cb14e037a277cc8d71d5c9df27abf51752
2020-03-04 10:11:36 -08:00
78ad3dc174 Fix Lint (#34218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34218

Test Plan: Imported from OSS

Differential Revision: D20249788

Pulled By: mrshenli

fbshipit-source-id: 5ca2acaff5344fc4455c70af60576f8e93e54cbf
2020-03-04 09:48:57 -08:00
6f52562e75 [quant][graphmode] Add add_relu pattern in skip values (#32816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32816

att

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20208786

fbshipit-source-id: ef84b77f46f88b192a75c123aabaa203836a7dfb
2020-03-04 09:36:02 -08:00
22506ae71d Reduce code duplication in OperatorEntry by keying hash map on optional<DispatchKey> (#33817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33817

With this change, nullopt denotes the catch-all entry, whereas everything else is
specific to a DispatchKey. This lets me delete the second copy of the methods.
The refactor should eventually be pushed all the way to the frontend, but I am
doing it one step at a time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20125163

Pulled By: ezyang

fbshipit-source-id: 026075a4bab81b0bd88b07f0800f6e6bbeb2166a
2020-03-04 08:57:22 -08:00
c688eb28a2 Minor fix for quantizing the Ads complex model
Summary:
Remove Int8Relu in quantized model
Suppress log warnings if verbose is false

Test Plan: TBD

Reviewed By: yinghai

Differential Revision: D20202474

fbshipit-source-id: 995ef8e665d8edeee810eedac831440b55271a7b
2020-03-04 08:34:59 -08:00
5f4a01b2ea Update MAGMA to 2.5.2 for Windows (#34205)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34205

Differential Revision: D20248224

Pulled By: soumith

fbshipit-source-id: f5e0fe06aa8f8ee551abe45db1d55d06e95ab928
2020-03-04 08:28:09 -08:00
f6c883ccea TH: Defer to ATen's AVX detection code (#34088)
Summary:
As per https://github.com/pytorch/pytorch/issues/22338#issuecomment-593028168, this removes the AVX detection code from TH. Now the environment variable `ATEN_CPU_CAPABILITY` is the only setting needed to disable AVX/AVX2.
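
For example, assuming `default` is the name of the non-vectorized capability, AVX/AVX2 can now be disabled with just:

```python
import os

# Must be set before importing torch, since the kernel variant is
# selected when the dispatch tables are initialized.
os.environ["ATEN_CPU_CAPABILITY"] = "default"
import torch
```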
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34088

Differential Revision: D20236039

Pulled By: ezyang

fbshipit-source-id: eecec64b41a7a6ca7e42c1c2762032eb47af535c
2020-03-04 08:22:02 -08:00
fdd771c90f Make tracing in code gen optional (#33715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33715

Tracing code depends on the full JIT, which is not available in the lite interpreter. Use `-c pt.disable_gen_tracing=1` to turn off generating the tracing part.
ghstack-source-id: 99252322

Test Plan:
```
buck build xplat/caffe2:torch -c pt.disable_gen_tracing=1
```
The tracing part of generated/VariableType_?.cpp will not be generated.

Reviewed By: smessmer

Differential Revision: D19684577

fbshipit-source-id: a1e5b80eca5e51c7bf72b5cc8f0e36c2135fabc2
2020-03-04 08:16:31 -08:00
790274bff2 [caffe2] Fix signed unsigned comparison warning (#34161)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34161

Test Plan: CI

Reviewed By: EscapeZero

Differential Revision: D20232087

fbshipit-source-id: 09dc8d452c5923cd2941e0cc01eac7a6677b38e8
2020-03-04 08:02:44 -08:00
6d78882158 Add layout.html to template for stable docs (#33770)
Summary:
When docs are built, conf.py points to a _templates-stable/layout.html that does not exist.
Adding this file here so future stable docs will build with Google Analytics tags and without the unstable able that is in _templates/layout.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33770

Differential Revision: D20164895

Pulled By: jlin27

fbshipit-source-id: 5fca9f9b825b1484dab52e2b2d91f92ae6372371
2020-03-04 03:14:52 -08:00
fc6dce6033 [c10] Fix TORCH_INTERNAL_ASSERT_DEBUG_ONLY MSVC bug (#34173)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34173

Test Plan:
Temporarily change `AT_ASSERTM` to `TORCH_INTERNAL_ASSERT_DEBUG_ONLY` to test MSVC fix.

```
buck test mode/opt //caffe2/caffe2:caffe2_test_cpu -- 'BlobTest'
```

& CI

Reviewed By: yinghai

Differential Revision: D20235886

fbshipit-source-id: 2b7d618e924a0ede95f4a6b8f60cc08e9d58b09d
2020-03-04 02:45:35 -08:00
f097ca503d Add and test training in lite interpreter. (#32359)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32359

Test Plan: Imported from OSS

Differential Revision: D19450614

Pulled By: iseeyuan

fbshipit-source-id: 6bafff39d7880a5b7fb9cd70c33a4e584812be12
2020-03-03 23:33:43 -08:00
2ba74b741e Add backward Int8Quantize shape inference (#34152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34152

Propagate the input shape of Int8Quantize backwards.

Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```

Reviewed By: csummersea

Differential Revision: D20231521

fbshipit-source-id: a77c61b0d5bc570241e62553cecd9ff38553ff44
2020-03-03 22:04:25 -08:00
57c1b80ec2 [pytorch]Migrate _th_ger to Aten and kill resize_scalar in codegen (#33792)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33792

Test Plan: Imported from OSS

Differential Revision: D20107158

Pulled By: glaringlee

fbshipit-source-id: bceddb2d39d3abf36f277daba537677312449c9c
2020-03-03 20:27:54 -08:00
7d01888a75 [JIT] Register rpc.rpc_async(..) as a JIT operator (#33329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33329

# Use case

```
@torch.jit.script
def send_rpc_async(dst_worker_name, user_callable_qual_name, tensor):
    # type: (str, str, Tensor) -> None
    rpc._rpc_async_torchscript(
        dst_worker_name, user_callable_qual_name, args=(tensor,)
    )
```

# Problem

```
torch.jit.frontend.NotSupportedError: keyword-arg expansion is not supported:
  File "/data/users/shihaoxu/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/rpc/rpc_spawn#binary,link-tree/torch/distributed/rpc/api.py", line 722
    args = args if args else ()
    kwargs = kwargs if kwargs else {}
    fut = _invoke_rpc_torchscript(to, qualified_name, *args, **kwargs)
                                                               ~~~~~~ <--- HERE
    return fut
```

# Solution

Register `rpc.rpc_async(..)` as a JIT operator to handle variable-length argument list.

# Plan

This PR contains the required changes to make `rpc.rpc_async(..)` a JIT prim operator, which can dynamically handle different numbers of arguments.

- Register "prim::rpc_async" as a `Symbol` in "interned_string.h"
- Add an if branch in "python_sugared_value.cpp" `toSugarValue(py::object, ..)` entry utility function to set up how the JIT frontend converts the `torch.distributed.rpc.rpc_async(..)` Python function (a Python object) into a `SpecialFormValue` (an IR SugaredValue).
- Add a switch case for the "prim::rpc_async" Symbol in "ir_emitter.cpp" and `emitApplySpecialForm(..)` to set up how the JIT compiler provides inputs to the "prim::rpc_async" Operator.
- Register "prim::rpc_async" as a `jit::Operator` and provide an implementation in "register_distributed_ops.cpp".

Note: since the distributed module is an optional part of the PyTorch build, the code added in this PR should be wrapped within a preprocessor macro.
```
#ifdef USE_DISTRIBUTED
new code here
#endif
```

Test Plan:
Items that need to be confirmed in the test cases

https://fb.quip.com/DCvdA9ZLjeO0

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork  \
\
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_call_python_function_remotely_from_script_not_supported
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
```

```
buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:layer_norm_op_test-2.7 -- test_layer_norm_op_jit
```

Differential Revision: D5738300

fbshipit-source-id: a4604fe762e00be062dc8232ca9790df31fb2074
2020-03-03 19:57:42 -08:00
9b39ad7f2c [jit] Fix iOS build (#34180)
Summary:
`unpickler.cpp` depends on the mobile type parser all the time, so include it regardless of whether it's a mobile build or not
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34180

Pulled By: driazati

Differential Revision: D20241881

fbshipit-source-id: a998dd2b3f1c7f58e55bb7851dc595c8ddf9eacb
2020-03-03 19:44:43 -08:00
3c042a6ab9 [pytorch][mobile] support for custom mobile build with dynamic dispatch (#34055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34055

Enable custom mobile build with dynamic dispatch for OSS build.

It calls a python util script to calculate transitive dependencies from
the op dependency graph and the list of used root ops, then pass the
result as the op registration whitelist to aten codegen, so that only
these used ops are registered and kept at link time.

For custom build with dynamic dispatch to work correctly, it's critical
to have an accurate list of used ops. The current assumption is that only
those ops referenced by the TorchScript model are used. This works well if
client code doesn't call the libtorch API (e.g. tensor methods) directly;
otherwise the extra used ops need to be added to the whitelist manually,
as shown by the HACK in prepare_model.py.

Also, if JIT starts calling extra ops independent of specific model,
then the extra ops need to be added to the whitelist as well.

Verified the correctness of the whole process with MobileNetV2:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D20193327

Pulled By: ljk53

fbshipit-source-id: 9d369b8864856b098342aea79e0ac8eec04149aa
2020-03-03 19:25:16 -08:00
e5bbd23ca7 [quant][graphmode] Skip quantizing input and output in matched module (#32814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32814

We skip quantization for the intermediate values for patterns like `Conv - ReLU`,
but currently we don't skip quantizing the input/output of the graphs of matched modules.
Since we have changed the way we add observers, this also needs to be updated.

Test Plan:
python test/test_jit.py -- 'TestJit.test_insert_observers_skip_values'

Imported from OSS

Differential Revision: D20208785

fbshipit-source-id: ce30f2c4c8ce737500d0b41357c80ec8b33aecf9
2020-03-03 18:38:36 -08:00
7cee787a19 [pytorch_ci] Python target determinator (#33577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33577

Pull Request resolved: https://github.com/pytorch/pytorch/pull/33221

This makes it so that if a pull request touches only Python files, we'll run only the Python tests that are connected, through the dependency graph, to the touched files.

Assumptions made:
- the Python code does not do dynamic imports
- test_X.py never imports from test_Y.py

Right now this is only done for test_nn (presumably the largest test entrypoint), but it's not much more work to do it for all the other test entrypoints too.
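
A hedged sketch (not the actual determinator) of the static-import analysis this approach relies on:

```python
import ast

def imported_modules(path):
    # Statically collect the top-level module names a file imports; dynamic
    # imports are invisible here, matching the assumption above.
    tree = ast.parse(open(path).read())
    mods = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            mods.update(a.name.split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module.split(".")[0])
    return mods

def should_run(test_file, touched_modules):
    # A real implementation would chase the import graph transitively.
    return bool(imported_modules(test_file) & set(touched_modules))
```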

Test Plan:
CircleCI results when touching just a few Python files:
- pytorch_macos_10_13_py3_test: 41 ->13 minutes https://circleci.com/gh/pytorch/pytorch/4550574?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
- pytorch_windows_vs2019_py36_cuda10.1_test1: 11 -> 2 minutes https://circleci.com/gh/pytorch/pytorch/4550846?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
- pytorch_windows_vs2019_py36_cuda10.1_test2: 51 -> 21 minutes https://circleci.com/gh/pytorch/pytorch/4550845?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
- pytorch_linux_xenial_py3_6_gcc5_4_test: 41 -> 14 minutes https://circleci.com/gh/pytorch/pytorch/4550543?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Differential Revision: D20009089

fbshipit-source-id: 41708cc301d1c866eb92a04421d8346feb0e3cb5
2020-03-03 18:01:12 -08:00
7c20578794 NNPI op mapping correct SpatialBN NNPI op name (#34176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34176

Wrong operator name for the NNPI SpatialBN

Test Plan: flow canary

Reviewed By: hyuen

Differential Revision: D20237933

fbshipit-source-id: dfde658dcbf2482320e36d549f7d83c27df264a0
2020-03-03 17:57:28 -08:00
a19db54b36 [Redo][ATen] Remove AT_ASSERTM from Blob::free_() (#34168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34168

Redo D19153199. It was reverted because it broke CI due to the change of `AT_ASSERTM` to `TORCH_INTERNAL_ASSERT_DEBUG_ONLY`. Two problems:
1) a bug in `TORCH_INTERNAL_ASSERT_DEBUG_ONLY` with MSVC. I'm sending another diff to fix this bug.
2) BlobTest was expecting `Blob::template Get<T>()` to throw when there is a type mismatch.

For now I'll leave `AT_ASSERTM` as it is.

Test Plan:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- 'BlobTest' --run-disabled
buck test mode/opt //caffe2/caffe2:caffe2_test_cpu -- 'BlobTest' --run-disabled
```

Reviewed By: yinghai

Differential Revision: D20235225

fbshipit-source-id: 594dad97c03c419afaa8f9023408bc5a119b3cfa
2020-03-03 17:54:05 -08:00
31cc311143 Expose CUDACachingAllocator raw_alloc and raw_delete to python (#33860)
Summary:
This PR aims to improve the interoperability with [CuPy](https://github.com/cupy/cupy/pulls).

Instead of having two separate and conflicting memory pools, with this PR CuPy can directly allocate memory from the PyTorch allocator by means of this proposal https://github.com/cupy/cupy/pull/3126

We would like to gather feedback to know if this approach makes sense for PyTorch, or other alternative designs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33860

Differential Revision: D20212788

Pulled By: ngimel

fbshipit-source-id: bc1e08a66da1992d26021147bf645dc65239581c
2020-03-03 17:50:11 -08:00
4edff32f81 [c10] Fix typo in __assert_fail noreturn modifier guard (#34157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34157

`[[noreturn]]` only conflicts with the CUDA `__assert_fail` definition if clang is used as the host compiler.

Test Plan: CI

Reviewed By: EscapeZero

Differential Revision: D20232088

fbshipit-source-id: 7182c28a15278e03175865cd0c87410c5de5bf2c
2020-03-03 17:25:25 -08:00
99e211e661 [jit] Add type tags to lists/dicts in pickle (#33255)
Summary:
Stacked PRs
 * #33474 - [jit] Remove list specializations from pickler
 * **#33255 - [jit] Add type tags to lists/dicts in pickle**

This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255

Pulled By: driazati

Reviewed By: xman1979, Tianshu-Bao

Differential Revision: D19868637

fbshipit-source-id: 2f1826e6679a786ca209198690269f399a542c04
2020-03-03 16:48:21 -08:00
7da24b36b1 Apply clang-format to RPC files (#34139)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34139

Test Plan: Imported from OSS

Differential Revision: D20227342

Pulled By: mrshenli

fbshipit-source-id: 01b478bde1f6a51f69eb5277fa90ba6ac2d4b5dc
2020-03-03 16:44:35 -08:00
3af0dffe84 Use double quotes in C++ to stay consistent with Python RPC docs (#34095)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34095

Test Plan: Imported from OSS

Differential Revision: D20227343

Pulled By: mrshenli

fbshipit-source-id: 69c556beee1f9e944eb1053b5ff0ac368dd99c60
2020-03-03 16:44:30 -08:00
f1085a8e41 Improve ProcessGroup RpcBackendOptions Constructor API (#34081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34081

Before this commit, applications have to do the following to configure
number of threads in ProcessGroup RPC backend:

```
op = ProcessGroupRpcBackendOptions()
op.rpc_timeout = rpc_timeout
op.init_method = init_method
op.num_send_recv_threads = 32
init_rpc(...., rpc_backend_options=op)
```

After this commit, it can be simplified to:

```
init_rpc(...., rpc_backend_options=ProcessGroupRpcBackendOptions(num_send_recv_threads=32))
```

Fixes #34075

Test Plan: Imported from OSS

Differential Revision: D20227344

Pulled By: mrshenli

fbshipit-source-id: def4318e987179b8c8ecca44d7ff935702c8a6e7
2020-03-03 16:43:29 -08:00
9d1c971b11 [Aten] Suppress valgrind leaks in libcuda (#34169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34169

Valgrind have no insight how memory is being initialized by ioctls()

Test Plan: CI

Reviewed By: seemethere

Differential Revision: D20235974

fbshipit-source-id: 46413afa4842e7d42582bbbda903438b1d98691f
2020-03-03 16:00:17 -08:00
1beb309e03 Make DEBUG == REL_WITH_DEB_INFO on CUDA build (#34153)
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/34079

I don't know how much we care about the difference between `-G` and `-lineinfo` in `DEBUG` vs `REL_WITH_DEB_INFO`, but since `-G` never worked, let's just use `-lineinfo` for both `DEBUG` and `REL_WITH_DEB_INFO`. This would resolve the failure in the `DEBUG=1` build. Locally tested to work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34153

Reviewed By: ljk53

Differential Revision: D20232049

Pulled By: ngimel

fbshipit-source-id: 4e48ff818850ba911298b0cc159522f33a305aaa
2020-03-03 15:07:42 -08:00
cb3905e8cf .circleci: Re-do run nightly pipelines on tag (#34148)
Summary:
The commit that this commit relied on was found to be causing issues with
valgrind: https://github.com/pytorch/pytorch/issues/33471

Re-does https://github.com/pytorch/pytorch/issues/34078 after revert.

This reverts commit 1aff3e2dd3c3937aa1fedbfeee2143cfca25abcc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34148

Differential Revision: D20234451

Pulled By: seemethere

fbshipit-source-id: cb5e496a3f761beeeb0cc8df71f9ebc0b271737b
2020-03-03 15:00:59 -08:00
7cda964e20 Remove deprecated codepath for old-style autograd.Function (#30696) (#33956)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33956

Test Plan: Imported from OSS

Differential Revision: D20167359

Pulled By: glaringlee

fbshipit-source-id: 9b323bd29eca97bce0475225ad2b3b2ded29005d
2020-03-03 14:58:02 -08:00
04378eb618 [JIT] Add modulelist indexing for integer literal (#29236)
Summary:
Allow indexing into modulelists for integer literals.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29236

Differential Revision: D19583935

Pulled By: eellison

fbshipit-source-id: 24d54051422a69769dac5e82f3bf622ded2bd8a6
2020-03-03 14:47:31 -08:00
ba1bd41767 Turn on strict dtype checking for test_torch.py (#33825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33825

Partially addresses #20376

I do this by overriding assertEqual in classes that opt into
this.  This means I have to fix #33821.  The fix is a little
unsatisfactory as idiomatic Python 2 super() calls don't work
(since the class is no longer in scope); hopefully this will just
work when we go to Python 3.

General approach taken:
- A lot of dtype mismatches are because we specified tensor constants
  that infer to some dtype, but the actual dtype needed is something else.
  Those are easy, just annotate the tensor() constructor (often a legacy
  Tensor/FloatTensor call) with dtype
- There are a few cases where the promotion rules are nontrivial.  Some of them
  I just typed out the expected promotion rules manually (based on trial
  and error)
- There are some more complex cases; if it gets too hairy I just
  set exact_dtype=False and nope the fuck out
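
For instance, the first kind of fix typically looks like this (a made-up test constant):

```python
import torch

# Before: torch.tensor([1, 2, 3]) infers int64 and fails a strict dtype
# comparison against a float32 result; stating the dtype explicitly fixes it.
expected = torch.tensor([1, 2, 3], dtype=torch.float32)
```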

I don't have time to do it for all the other classes.  But the setup
should work if people just incrementally add the overrides to classes,
and then eventually flip the default.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20125791

Pulled By: ezyang

fbshipit-source-id: 389c2d1efbd93172af02f13e38ac5e92fe730c57
2020-03-03 14:45:53 -08:00
c579976603 Revert D20171428: [profiler] fix chrome tracing for profiler run with cuda
Test Plan: revert-hammer

Differential Revision:
D20171428

Original commit changeset: ec135a154ce3

fbshipit-source-id: 51ef4351a0df33fd087edbca1b7cd753cdbf1fdf
2020-03-03 14:36:01 -08:00
f299c2d6e1 Completely kill CUDA_tensor_apply3 (#34026)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34026

Test Plan: Imported from OSS

Differential Revision: D20196078

Pulled By: VitalyFedyunin

fbshipit-source-id: 502184f412edee90a4f4c030def277a99a7369d4
2020-03-03 14:18:17 -08:00
1affaf8d10 Migrate lerp from CUDA_tensor_apply3 to TensorIterator (#34025)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34025

Test Plan: Imported from OSS

Differential Revision: D20196079

Pulled By: VitalyFedyunin

fbshipit-source-id: 150d1de6632c58850020b73ee72e0ed380072926
2020-03-03 14:18:12 -08:00
27f56632a4 Migrate bce loss from CUDA_tensor_apply3 to TensorIterator (#34023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34023

Test Plan: Imported from OSS

Differential Revision: D20196084

Pulled By: VitalyFedyunin

fbshipit-source-id: bd000f09139cb848562e5310f10067db85e1b935
2020-03-03 14:16:40 -08:00
92083f31b5 [gloo] dont hold locks in calls to buffer in ProcessGroupGloo:RecvWork::wait() and (#33926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33926

The UnboundBuffer calls here are already protected by a mutex. We only
need to hold the lock while writing the shared structures completed_ and
exception_.
ghstack-source-id: 99315427

Test Plan:
CI

Differential Revision: D20154546

fbshipit-source-id: d1b74508c917b21acdcd0f6a914eb0455437ca0e
2020-03-03 13:28:45 -08:00
c93b1d427c [profiler] fix chrome tracing for profiler run with cuda (#33987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33987

There was an error in
https://github.com/pytorch/pytorch/pull/30724/files that resulted in
`export_chrome_trace` generating invalid JSON. This only came up when the
profiler is run with `use_cuda=True` from what it looks like. In the future, we
should have tests that ensure we generate valid JSON because we no longer use
the json library.

Test Plan: Add UT to validate JSON.

Differential Revision: D20171428

fbshipit-source-id: ec135a154ce33f62b78d98468174dce4cf01fedf
2020-03-03 13:27:26 -08:00
6a97777f72 Remove use of .data from optimizers (#33640)
Summary:
Removes all uses of `.data` from optimizers.

Or tries to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33640

Reviewed By: vincentqb

Differential Revision: D20203216

Pulled By: albanD

fbshipit-source-id: 9bfe78bbed00fd4aaa690801cff0201f0bd680a0
2020-03-03 13:21:55 -08:00
f26bbb5f86 [fix] flake8 lint error (#34146)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34146

Test Plan:
.

Imported from OSS

Differential Revision: D20228830

fbshipit-source-id: 41de3c27c10256939ae6309d25b0499f708a3dca
2020-03-03 13:15:27 -08:00
a8fc3d8c2a Fix HistogramObserver to not do detach on input (#34114)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33545, added a unittest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34114

Differential Revision: D20224719

Pulled By: dzhulgakov

fbshipit-source-id: 053d3b3b0c86340027ba1b95b5f3c247aa151aee
2020-03-03 13:15:22 -08:00
9650253d70 [caffe2] fix ambiguous call to 'fmaxType' THCHalfAutoNumerics.cuh (#33569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33569

Clang reported a few places where a call to `fmaxType` is ambiguous. In all cases one of the arguments is `double` and the other is `float`. Fix the error by creating a properly typed zero value and removing the unneeded `ZERO_MACRO` code.

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: ngimel

Differential Revision: D20006926

fbshipit-source-id: ca6cfacd57459b1c48eb5080b822d9509b03544d
2020-03-03 13:13:19 -08:00
49586a2a7e fix sph batchnorm to use sph fma
Summary: make use of springhill's fma on SpatialBatchnorm

Test Plan:
re-enabled the unit test, ran it a couple of times
pending: net runner

Reviewed By: amylittleyang

Differential Revision: D20227767

fbshipit-source-id: 7c601f185940249c0a32bdf95d74a20552cd2625
2020-03-03 12:53:08 -08:00
49921cad28 Minimum build should also exclude XNNPACK (#34110)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34110

Differential Revision: D20228129

Pulled By: ezyang

fbshipit-source-id: 24e1482f6a6ff423de966bb7a7a45ad3815791e9
2020-03-03 12:51:37 -08:00
fbc9c61c81 randn and normal_ for complex tensors (#34037)
Summary:
1. `randn` and `normal_` methods will work for complex tensors after this PR.
2. Added an internal function for viewing complex tensors as float tensors, which enables us to reuse functions defined for float tensors for complex tensors, with a change in the arguments passed (like size, or the standard deviation in the case of `normal_`). Currently the resultant float tensor doesn't share storage with the input complex tensor, which means the version counter wouldn't be updated if a function is called on the resultant tensor; once the dtype entry is removed from the storage class, this issue will be resolved.

Side notes:
1. didn't add a separate header for the util functions because of this issue https://github.com/pytorch/pytorch/issues/20686#issuecomment-593002293
2. we should eventually have a public API method view_complex_as_float once (2) mentioned above gets resolved
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34037

Differential Revision: D20221793

Pulled By: anjali411

fbshipit-source-id: a78f5e83d6104e2f55e0b250c4ec32e8d29a14eb
2020-03-03 12:46:01 -08:00
ad2825a2c9 Add API for listing functions overridable by __torch_function__ (#33791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33182

This adds private API functions that developers of types that implement `__torch_function__` can use to ensure full coverage of the subset of the PyTorch API that can be overrided.

I've refactored some of the code in the tests into a new `torch._overrides.get_overridable_functions` function. I've also changed `TENSOR_LIKE_TORCH_OVERRIDES` into `torch._overrides.get_testing_overrides` and `IGNORED_TORCH_FUNCTIONS` into `torch._overrides.get_ignored_functions`. Making these two static global variables in the tests into functions should allow rewriting their implementation to construct their return values instead of just statically defining the return value as is done here. Currently that is blocked on not being able to inspect function signatures of compiled kernels in PyTorch (see https://github.com/pytorch/pytorch/issues/28233). See the docs I've added for usage examples of these new functions. I also refactored the existing override tests to make use of these new functions, which should be a good forcing function to make sure they're kept up-to-date.
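
A short usage sketch of the three functions named above, assuming the return shapes described in the docs added here:

```python
import torch
from torch._overrides import (
    get_ignored_functions, get_overridable_functions, get_testing_overrides)

overridable = get_overridable_functions()  # dict: namespace -> list of functions
dummies = get_testing_overrides()          # dict: function -> dummy lambda
ignored = get_ignored_functions()          # functions __torch_function__ cannot override

assert torch.add in dummies  # e.g. verify an override implementation covers torch.add
```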

Finally, while working on this I discovered that `TestTorchFunctionOverrides.test_mean` and `TestTorchFunctionOverrides.test_mm` weren't ever being run because they were getting clobbered by the other dynamically generated override tests. I fixed that by renaming the tests and then fixing the actual test code. I've verified that all the subclassing semantics is correct and that the updated test answers are correct. I'm happy to put the fixes to the existing tests in as a separate pull request if that would be easier to review.

ping cpuhrsch since the feature request originally came from them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33791

Differential Revision: D20195053

Pulled By: cpuhrsch

fbshipit-source-id: 1585f4e405f5223932b410eae03a288dc8eb627e
2020-03-03 12:40:34 -08:00
358450e02b improved TorchScript traceback (#33834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33834

This changes how we report Tracebacks to make them more clear when
there are both serialized and non-serialized ranges. It now looks like:

```
Traceback (most recent call last):
  File "foo.py", line 25, in <module>
    s2(a, b)
  File "/scratch/zdevito/pytorch/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 7, in forward
    x: Tensor,
    y: Tensor) -> Tensor:
    return (self).bar(x, y, )
            ~~~~~~~~~ <--- HERE
  def bar(self: __torch__.Moo,
    x: Tensor,
  File "code/__torch__.py", line 11, in bar
    x: Tensor,
    y: Tensor) -> Tensor:
    _0 = (self).baz(x, y, )
          ~~~~~~~~~ <--- HERE
    _1 = torch.ones([3], dtype=None, layout=None, device=None, pin_memory=None)
    return torch.add(_0, _1, alpha=1)
  File "code/__torch__.py", line 17, in baz
    x: Tensor,
    y: Tensor) -> Tensor:
    return torch.add(x, y, alpha=1)
           ~~~~~~~~~ <--- HERE

Traceback of TorchScript, original code (most recent call last):
  File "foo.py", line 11, in forward
    def forward(self, x, y):
        return self.bar(x, y)
               ~~~~~~~~ <--- HERE
  File "foo.py", line 9, in bar
    def bar(self, x, y):
        return self.baz(x, y) + torch.ones(3)
               ~~~~~~~~ <--- HERE
  File "foo.py", line 7, in baz
    def baz(self, x, y):
        return x + y
               ~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
```

It follows the Python convention of putting the most important information
last and reading from the bottom up.

Changes:
* Moved the error message to the end, to copy Python
* Report original traceback separate from serialized traceback
* Make sure root functions have names in the interpreter trace.

Test Plan: Imported from OSS

Differential Revision: D20126136

Pulled By: zdevito

fbshipit-source-id: fd01f9985e5d74e04c4d064c02e8bc320f4fac13
2020-03-03 12:27:38 -08:00
74a0663afd In torch_test, mark every test that takes >5s on a DEBUG CPU-only build as slow test (#33901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33901

After this change, the pytest profile looks like:

4.83s call     test/test_torch.py::TestTorch::test_fft_ifft_rfft_irfft
4.23s call     test/test_torch.py::TestTorch::test_var_dim
4.22s call     test/test_torch.py::TestTorch::test_std_dim
4.19s call     test/test_torch.py::TestTorch::test_max
4.06s call     test/test_torch.py::TestTorch::test_min
3.60s call     test/test_torch.py::TestTorchDeviceTypeCPU::test_cdist_norm_batch_cpu
2.62s call     test/test_torch.py::TestTorchDeviceTypeCPU::test_pow_cpu
2.60s call     test/test_torch.py::TestTorch::test_matmul_small_brute_force_1d_Nd

And the entire CPU-only test suite can be run in 88s on my Intel(R) Xeon(R) CPU
E5-2650 v4 @ 2.20GHz

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20222288

Pulled By: ezyang

fbshipit-source-id: 4224a9117f42566e290ae202881d76f1545cebec
2020-03-03 11:49:49 -08:00
9b527b35bb CUDA Vectorized Dropout (#33879)
Summary:
Add vectorization to the dropout kernels for both reads & writes. Moved the `masked_scale_kernel` implementation to `TensorIterator` to pick up recent autovectorization additions by zasdfgbnm, and wrote a vectorized specialization of the dropout training kernel (along with some fairly conservative dispatch logic).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33879

Differential Revision: D20222853

Pulled By: ngimel

fbshipit-source-id: 711f56ca907fbc792a10d4bf069c28adab7d6ad7
2020-03-03 11:43:45 -08:00
0cf34cf672 [pytorch][mobile] make sure mobile build work with dynamic dispatch (#34038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34038

The mobile build doesn't include autograd/VariableType dispatch. As a
result, AutoNonVariableTypeMode needs to be set in the mobile runtime.

With static dispatch this work is done inside the generated jit-dispatch
code - AutoNonVariableTypeMode needs to be set on a per-op basis. Setting
it globally or setting it for the wrong ops might break some `is_variable()`
checks in the codebase.

Thanks to the unification of Variable class and Tensor class, all
is_variable() checks have been removed, so AutoNonVariableTypeMode can
be set globally now.

We never tested inference-only mobile build with dynamic dispatch. It
seems that dynamic dispatch also requires setting AutoNonVariableTypeMode
for our mobile build (where VariableType functions are not registered).

Verified the end-to-end test works with this change:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```

Test Plan: Imported from OSS

Differential Revision: D20193329

Pulled By: ljk53

fbshipit-source-id: cc98414d89d12463dc82b0cdde0b6160dafc0349
2020-03-03 11:34:08 -08:00
51936c5ea4 [pytorch][CI] end-to-end custom build script (#34012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34012

Today some mobile simulator tests only run on landed PRs and it requires
setting up special build environment to repro errors locally.

The goal of the PR is to do end-to-end mobile custom build & integration
tests with host toolchain (using same CMake options as mobile build). This
way, non-mobile engineers can capture & debug mobile related build issues
much more easily.

There are three custom build types that this script supports:

1. `TEST_DEFAULT_BUILD=1 ./build.sh` - it is similar to the prebuilt libtorch
libraries released for Android and iOS (same CMake build options + host
toolchain), which doesn't contain autograd function nor backward ops thus is
smaller than full LibTorch.

2. `TEST_CUSTOM_BUILD_STATIC=1 ./build.sh` - it further optimizes libtorch
size by only including ops used by a specific model.

3. `TEST_CUSTOM_BUILD_DYNAMIC=1 ./build.sh` - similar to 2) except that it
relies on the op dependency graph (instead of static dispatch) to calculate
and keep all ops the model transitively depends on.

Type 2) will be deprecated by type 3) in the future.
Type 3) custom build has not been fully supported yet so it's expected to fail.

Replaces the existing mobile build CI with the Type 1) build & integration test.

Test Plan: Imported from OSS

Differential Revision: D20193328

Pulled By: ljk53

fbshipit-source-id: 48c14cae849fde86e27123f00f9911996c1cf40e
2020-03-03 10:55:17 -08:00
5b9f1ada30 [quant][graphmode] Observing input/output values in call site (#33277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33277

Currently we insert observers in the called graph, which is incorrect since graphs can be shared
and the decision of whether to insert an observer might depend on where the graph is called.
For example, for a call sequence `self.conv1(self.conv2(x))`, we can't insert observers correctly
if `self.conv1` and `self.conv2` share the same type in the current implementation, because right now we insert
the observer in the graph of the forward method of Conv2d, and this call sequence requires us to insert
only one observer for the output of self.conv2 / the input of self.conv1.
We'll need to insert observers for the input/output values of the graph at the call site instead.
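
For illustration, a hypothetical module (not from the PR) exhibiting the call sequence described above, where `conv1` and `conv2` share one type and hence one graph:

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Same module type => same forward graph under TorchScript sharing.
        self.conv1 = torch.nn.Conv2d(3, 3, 1)
        self.conv2 = torch.nn.Conv2d(3, 3, 1)

    def forward(self, x):
        # Exactly one observer belongs between conv2's output and conv1's
        # input - a per-call-site decision that per-graph insertion can't make.
        return self.conv1(self.conv2(x))
```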

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20208787

fbshipit-source-id: 739e1d877639c0d0ed24e573bbd36211defa6836
2020-03-03 10:53:24 -08:00
7289e8e865 [caffe2] std::numeric_limits<double>::quiet_NaN() use instead of ::nan("") (#33566)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33566

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: ngimel

Differential Revision: D20006447

fbshipit-source-id: ec522bc2065ad033ee2eeedd26d4a8a7a27e5f56
2020-03-03 10:42:58 -08:00
1702152ef9 fixup unit tests (#34105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34105

Make parallel_net_test.cc chronos-conforming.
Exclude gtest asserts that check thrown exceptions when exceptions are disabled.

Test Plan: CI green

Differential Revision: D20153525

fbshipit-source-id: 7371e559da948f46773fed09e3a23a77411d59e0
2020-03-03 10:33:21 -08:00
5082839de5 Migrate Lerp from CUDA_tensor_apply4 to TensorIterator (#33994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33994

Test Plan: Imported from OSS

Differential Revision: D20196788

Pulled By: VitalyFedyunin

fbshipit-source-id: e5e281460e8cca7ea3911fe56549e1ab62d50e76
2020-03-03 09:38:49 -08:00
4074d559e4 Migrate kl_div_backward from CUDA_tensor_apply3 to TensorIterator (#34022)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34022

Test Plan: Imported from OSS

Differential Revision: D20196080

Pulled By: VitalyFedyunin

fbshipit-source-id: 265884dc01c3260197776ee5baaadbe6b523fede
2020-03-03 09:33:31 -08:00
3def76583a [RESUBMIT] [pytorch] Migrating index_add cuda to ATen (#33548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33548

Mostly just moved code.
Index-dim and number-of-indices checks are added to make the checks identical to index_add_cpu_.

This is a resubmit of #30573, which got reverted.

Test Plan: Imported from OSS

Differential Revision: D20002248

Pulled By: gchanan

fbshipit-source-id: 46df4047cb3fc1dff37a15b83c70b2cbb7a6460b
2020-03-03 09:06:13 -08:00
f29110fdf8 [pytorch] blas gemm fix for k=0 (#33819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33819

These conditions are for the specific implementation; the fallback implementation works without these checks, so use the fallback if any of these checks isn't true.

Resubmit of https://github.com/pytorch/pytorch/pull/33419 (which got reverted due to a problem with XLA, but which now has been fixed)
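
For illustration, a minimal sketch of the k = 0 case this fix covers (example constructed for this note, not taken from the PR):

```python
import torch

a = torch.randn(3, 0)  # m x k with k == 0
b = torch.randn(0, 4)  # k x n
torch.mm(a, b)         # well-defined: a 3x4 tensor of zeros via the fallback
```
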
ghstack-source-id: 99333280

Test Plan: Test included

Differential Revision: D20121460

fbshipit-source-id: c1056b8e26751e24078bbe80c7cb4b223bcca7cb
2020-03-03 08:56:05 -08:00
b1fd7ba019 Revert D20169501: [pytorch][PR] .circleci: Add CUDA 10.2 to our CI pipeline
Test Plan: revert-hammer

Differential Revision:
D20169501

Original commit changeset: 43b7ca680200

fbshipit-source-id: dbeb0315ccc06b8e082d019cd1ffcd97e1d38e04
2020-03-03 08:15:36 -08:00
1aff3e2dd3 Revert D20204104: [pytorch][PR] .circleci: Add filter to run nightly builds on tag
Test Plan: revert-hammer

Differential Revision:
D20204104

Original commit changeset: 685630e8a04b

fbshipit-source-id: 1f4c890b0b199b406bac51e30febb8c6482e7e31
2020-03-03 08:03:03 -08:00
cyy
5be8a4e027 find mkl installed by nuget (#34031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34031

Differential Revision: D20221807

Pulled By: ezyang

fbshipit-source-id: 827e2775956f408febb287676bbf9a96a70fe2d4
2020-03-03 07:44:20 -08:00
a23e8099dd Fix typo (#34008)
Summary:
This PR removes apparently unnecessary dots in the documentation of `torch.t`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34008

Differential Revision: D20195084

Pulled By: ezyang

fbshipit-source-id: a34022de6b7a32d05a0bb3da197ee3507f4b8d8e
2020-03-03 07:38:40 -08:00
2ce9d26809 Support cdf for mixture_same_family distribution (#33408)
Summary:
The newly added mixture_same_family should support cdf if the component family has cdf implemented.

This is very useful for flow models, where the cdf of a mixture of Gaussians/logistics is used to model the flow.
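
For illustration, a minimal sketch (assumed usage, based on the summary above) of calling cdf on a mixture of Gaussians:

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

mix = Categorical(probs=torch.ones(3) / 3)                   # mixture weights
comp = Normal(loc=torch.tensor([-1.0, 0.0, 1.0]), scale=torch.ones(3))
gmm = MixtureSameFamily(mix, comp)
gmm.cdf(torch.tensor(0.5))  # weighted sum of the component cdfs
```
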
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33408

Differential Revision: D20191552

Pulled By: ezyang

fbshipit-source-id: 0bfd7973aa335c162919398a12ddec7425712297
2020-03-03 07:31:24 -08:00
e0b90b87a4 [C2] Fix slowness of the ReshapeOp. (#33729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33729

ReshapeOp does some useless movement of data between CPU and GPU, which results in a crazy number of kernel calls from this operator. This makes the operator ridiculously slow compared to BatchMatMul for pretty cheap models (for example, some versions of GAT).

This diff moves ReshapeOp to leverage CPU storage, reducing the number of kernel calls from num_dims + 3 for the 3-D
tensor case to 2 calls.

Test Plan:
Unit-tests are still passing.

TODO: perf testing

Reviewed By: akyrola

Differential Revision: D19659491

fbshipit-source-id: 2341b21e57208b988169f2df5fb598be3dc8acb2
2020-03-03 00:44:22 -08:00
0afee0c20b [rpc][metrics] add initial metric handler classes. (#33153)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33153

Test Plan: Added unit tests.

Reviewed By: pritamdamania87

Differential Revision: D19615364

fbshipit-source-id: e0447463651390b08ad48e134cb73764d8dcf4f3
2020-03-02 22:03:12 -08:00
0689cf8fc1 [c10] Make __assert_fail CUDA definition compilable with clang host compiler (#34102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34102

If nvcc is invoked with a clang host compiler, it will fail with the following error, due to a decorator mismatch between the definitions in cuda and c10:
```
 error: attribute "noreturn" did not appear on original declaration
```

Test Plan: Build pytorch with clang

Reviewed By: EscapeZero

Differential Revision: D20204951

fbshipit-source-id: ff7cef0db43436e50590cb4bbf1ae7302c1440fa
2020-03-02 20:11:49 -08:00
cyy
8a14b41617 fix warnings reported by PVS (#33868)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33868

Differential Revision: D20169059

Pulled By: ailzhang

fbshipit-source-id: ec12226ae27ddd89fa5bacdd35151981ebfedcfd
2020-03-02 18:51:38 -08:00
0729ad733d Change lint from python2 -> python3 (#34107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34107

Updates linter to only lint for python3 instead of linting for python2

Test Plan: good_testplan

Reviewed By: orionr

Differential Revision: D20205395

fbshipit-source-id: 1fa34e5fdf15f7aed96a66d2ce824a7337ee6218
2020-03-02 18:11:42 -08:00
f909b5535e [autograd] fix allow_unused checking for C++ API (#34035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34035

Fixes a bug in the condition check from https://github.com/pytorch/pytorch/pull/24342. We realized we don't have tests in either
Python or C++ to catch this, so tests were added for both.

Thanks to hczhu for catching it!
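
For illustration, a minimal sketch of the `allow_unused` behavior being tested (example written for this note, not taken from the PR):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
out = (2 * x).sum()  # out does not depend on y

# Without allow_unused=True this call raises, since y is unused.
gx, gy = torch.autograd.grad(out, (x, y), allow_unused=True)
assert gy is None  # unused inputs get None gradients
```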

Test Plan: Imported from OSS

Differential Revision: D20198837

Pulled By: wanchaol

fbshipit-source-id: 33846a14c0a8e7aac2e8328189d10c38a0d7e6ee
2020-03-02 17:57:15 -08:00
0759191f12 blacklist spatialBN until bitwise matching (#34092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34092

Disable op in transform map until we get bitwise matching to ice-ref

Test Plan: CI

Reviewed By: hyuen

Differential Revision: D20177936

fbshipit-source-id: e316384184cb264852e63e5edce721a8614742d1
2020-03-02 17:55:00 -08:00
3b93928ada .circleci: Add filter to run nightly builds on tag (#34078)
Summary:
## What this will do:

When the repository is tagged, the current nightly build pipelines will run and upload to the `test` subdirectory in our S3 bucket for `download.pytorch.org`. They will also upload to the correct organization on Anaconda: [pytorch-nightly](https://anaconda.org/pytorch-test)

This is only meant for release candidates and will not actually run on any tag that does not match the release-candidate regex.

This has been tested on a small scale with: 3ebe0ff2f8

## Related PRs:
* `.circleci: Divert packages to test channel on tag`: https://github.com/pytorch/pytorch/pull/33842
* `.cirlceci: Swap PYTORCH_BUILD_VERSION if on tag`: https://github.com/pytorch/pytorch/pull/33326

## Work to be done later:
- [ ] Figure out how to remove manual step of updating s3 html indices.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34078

Differential Revision: D20204104

Pulled By: seemethere

fbshipit-source-id: 685630e8a04b19fc17374585e9228a13a8c3e407
2020-03-02 17:20:21 -08:00
ad3f4a32bd [pytorch][buck] fix selective buck build (#34090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34090

Update the per-op-registration template file to use the new c10 registration API.
ghstack-source-id: 99318973

Test Plan:
```
buck build -c pt.selective_build=1 \
fbandroid/mode/dev_clang_libcxx fbandroid/mode/server \
xplat/caffe2/fb/lite_predictor:lite_predictor_resnet
```

Differential Revision: D20200452

fbshipit-source-id: dc619cb6bdfc0c787b87475eb24b6a2da29e70e2
2020-03-02 17:13:08 -08:00
1ed950e1b6 [distributed] skip use_ignore_output tests in c10d if not built with gloo (#33513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33513

These tests require gloo so like the other tests, they should be
skipped if not building with gloo. Otherwise they crash on Mac if not built
with gloo currently.

verified that it does not crash anymore with this PR.
ghstack-source-id: 99303707

Test Plan: Built on Mac and verified that the tests do not fail.

Differential Revision: D19976908

fbshipit-source-id: 6a2a70c3eab83efd0e188e86cabe56de4a869f4d
2020-03-02 16:43:21 -08:00
ff1fc402a8 Migrate dirichlet from CUDA_tensor_apply3 to TensorIterator (#34021)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34021

Test Plan: Imported from OSS

Differential Revision: D20196082

Pulled By: VitalyFedyunin

fbshipit-source-id: 9736a0ebbc529975e95a4f996dbc28e070cf1e63
2020-03-02 16:31:32 -08:00
77b9016a8e Migrate gamma grad from CUDA_tensor_apply3 to TensorIterator (#34020)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34020

Test Plan: Imported from OSS

Differential Revision: D20196083

Pulled By: VitalyFedyunin

fbshipit-source-id: 8659bc004678a656071263c94e929f2e1a686812
2020-03-02 16:29:45 -08:00
bb4465f9f5 .circleci: Add CUDA 10.2 to our CI pipeline (#33471)
Summary:
Adds support for CUDA 10.2 builds on our nightly pipelines / regular test pipelines.

Depends on https://github.com/pytorch/builder/pull/404
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33471

Test Plan: sandcastle_will_deliver

Reviewed By: ezyang

Differential Revision: D20169501

Pulled By: seemethere

fbshipit-source-id: 43b7ca680200a67fa88ad4f7b5a121954c9f089d
2020-03-02 15:50:48 -08:00
b874c039f6 Allow checking for cached module before asserting (#33954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33954

Fixes caffe2/core/module_test.cc on Windows.
Includes miscellaneous lint fixes.

Test Plan: CI green

Reviewed By: malfet

Differential Revision: D20153512

fbshipit-source-id: aeae84a028e26edd65c7218611e3c49a8d9bb8c0
2020-03-02 15:43:50 -08:00
a4716d0e26 Fix lint (#34094)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34094

Pulled By: driazati

Differential Revision: D20201433

fbshipit-source-id: d8292b329aebd232556db517b71daeee3f266bfc
2020-03-02 15:34:52 -08:00
c206b4398d Show errors from the tasks in the thread pool (#33938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33938

Making sure we don't silently ignore exceptions from tasks in the
thread pool.

Test Plan: python setup.py clean && python setup.py develop install

Differential Revision: D20178603

Pulled By: ilia-cher

fbshipit-source-id: 34971032205a1a53fb7419ed84ebb986f9e959ad
2020-03-02 14:49:52 -08:00
a57a7b4c29 Change input value in examples of BCEWithLogitsLoss (#34053)
Summary:
In the examples of `BCEWithLogitsLoss`, `0.999` is passed as the prediction value. The value `0.999` looks like a probability, but it actually isn't one. I think it's better to pass a value greater than 1, so as not to confuse readers.
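
For illustration, a minimal sketch along the lines suggested above (the concrete value is an assumption for this note, not the one chosen in the PR):

```python
import torch

loss_fn = torch.nn.BCEWithLogitsLoss()
logits = torch.tensor([1.5])   # a raw score > 1: clearly not a probability
target = torch.tensor([1.0])
loss_fn(logits, target)        # sigmoid is applied internally
```
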
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34053

Differential Revision: D20195456

Pulled By: ezyang

fbshipit-source-id: 3abbda6232ee1ab141d202d0ce1177526ad59c53
2020-03-02 14:35:56 -08:00
15bf4892f2 prevent crash on exit from static destructor race (#33955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33955

Unit tests on Windows (clang and cl) were crashing on exit due to a race with static variable destruction.

Test Plan: CI green

Differential Revision: D20153587

fbshipit-source-id: 22e35e591660d49f3a755f93d0c14d7a023ebb2a
2020-03-02 14:28:13 -08:00
e568c039bd Enable Tensor.random_(from, to) for half on CPU (#34030)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34030
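
For illustration, a minimal sketch of the newly enabled call (example written for this note, not taken from the PR):

```python
import torch

t = torch.empty(8, dtype=torch.half)  # CPU half tensor
t.random_(0, 10)  # discrete uniform draws in [0, 10), now supported for half on CPU
```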

Test Plan: Imported from OSS

Differential Revision: D20182412

Pulled By: pbelevich

fbshipit-source-id: b7439e6d66e1c0b9ffa8b397cab057c9146f5714
2020-03-02 14:22:35 -08:00
384a4feab6 Fix bad math typesetting (#34027)
Summary:
Fixing documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34027

Differential Revision: D20195235

Pulled By: ezyang

fbshipit-source-id: 0281bc0e8718e700e0982ced1342969b367ba57c
2020-03-02 14:16:34 -08:00
11843049d5 [jit] Fix flipped PackedSequence outputs in script (#32955)
Summary:
Stacked PRs
 * **#32955 - [jit] Fix flipped PackedSequence outputs in script**
 * #32953 - [jit] Support properties on `Device`

Fixes #32605

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32955

Pulled By: driazati

Differential Revision: D20165514

fbshipit-source-id: a130c438b40e51ec27d36f021b0dc7869570aa6a
2020-03-02 13:50:36 -08:00
45c45195cd Remove warning about building from source to use the NCCL backend (#34051)
Summary:
I think this warning isn't true anymore, and the NCCL backend works without PyTorch needing to be built from source.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34051

Differential Revision: D20195310

Pulled By: ezyang

fbshipit-source-id: 14f879a8c43ea5efdbdf0f638792ea2b90011f4a
2020-03-02 13:43:43 -08:00
51d969e86a preprocessor cleanup (#33957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33957

Lots of small preprocessor warning cleanups for Windows.

Test Plan: CI green

Reviewed By: malfet, albanD

Differential Revision: D20153582

fbshipit-source-id: 18fd61c466fd1f55ededdae4448b3009a9cedc04
2020-03-02 13:37:19 -08:00
4b3ae7e0af Enable -Werror=format compile errors on torch exception types (#34019)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33899

In the issue, we have
```
TypeError("expected %s (got %s)", dispatch_key, toString(other.key_set()).c_str());
```
which results in `dispatch_key` being interpreted as a c-string by `sprintf`. Adding `__attribute__((format))` to the `TypeError` constructor allows gcc or clang to detect this at compile time. Then `-Werror=format` makes it a hard error at compile time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34019

Differential Revision: D20194842

Pulled By: ezyang

fbshipit-source-id: fa4448916c309d91e3d949fa65bb3aa7cca5c6a8
2020-03-02 13:25:39 -08:00
9239608037 fix windows clang attributes (#33959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33959

Make sure clang on Windows uses the correct attributes.
Add support for cl.exe-style pragma attributes.

Test Plan: CI green

Differential Revision: D20153548

fbshipit-source-id: bfbfd374e8f5e7d7b8598453c3ca2b6693a425f1
2020-03-02 13:20:51 -08:00
87b3f87f27 Migrate prelu from CUDA_tensor_apply2 to TensorIterator (#34003)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34003

Test Plan: Imported from OSS

Differential Revision: D20196994

Pulled By: VitalyFedyunin

fbshipit-source-id: 1749a968b1ec6636e08c11c93de43b5599e7cf4b
2020-03-02 12:49:32 -08:00
9956a231b9 Fix backward compatibility tests (#34071)
Summary:
1. As RRef has been added as a JIT type in https://github.com/pytorch/pytorch/issues/32992, we no longer need to skip them
2. Nightly now knows about Any
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34071

Reviewed By: houseroad

Differential Revision: D20196963

Pulled By: mrshenli

fbshipit-source-id: 1ea79c5682e8be9087b9cb74104e1b914c3fc456
2020-03-02 12:42:33 -08:00
ec0f2184ba clang intrinsics targeting (#33958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33958

Look for clang intrinsic headers on Windows.

Test Plan: CI green

Differential Revision: D20153573

fbshipit-source-id: c87da3b0e9950d3df0bf8350df8ae592064d6613
2020-03-02 12:37:07 -08:00
ba4cff2ffc [dtype inference] Following pytorch default for float vs double (#33713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33713

Differential Revision: D20193387

Pulled By: anjali411

fbshipit-source-id: d802ec395df4e75e2be02e91d7288ae6fb7cf8e0
2020-03-02 11:56:34 -08:00
cab8772c6c Freezing Torchscript modules (#32178)
Summary:
This patch enables folding GetAttr nodes with their corresponding
values. The _jit_pass_freeze_module API returns a new TorchScript module
where all function calls and attribute accesses are inlined.
Usage:

```
frozen_model = torch._C._freeze_module(scripted_model._c)
frozen_model.forward(...)
```

This API currently optimizes the forward method. We will follow up
to preserve and optimize methods and attributes that are annotated as
torch.jit.interface.

Several future improvements to JIT optimizations are required to further
clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178

Differential Revision: D19419640

Pulled By: bzinodev

fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b
2020-03-02 11:38:36 -08:00
e73d4286b0 Fix conflict between XNNPACK's clog dependency and our cpuinfo dependency (#33922)
Summary:
Currently if we run

```bash
DEBUG=1 ONNX_ML=0 MAX_JOBS=8 CMAKE_CXX_COMPILER_LAUNCHER=ccache CMAKE_C_COMPILER_LAUNCHER=ccache CMAKE_CUDA_COMPILER_LAUNCHER=ccache USE_OPENMP=0 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_NCCL=0 USE_CUDA=1 USE_CUDNN=0 USE_STATIC_CUDNN=0 USE_NNPACK=0 USE_QNNPACK=0 USE_FBGEMM=0 BUILD_TEST=0 TORCH_CUDA_ARCH_LIST="6.1" python setup.py develop --cmake-only
```

then `touch build/CMakeCache.txt` (which adjusting build options will
do), then `python setup.py develop`, the following error message will
show up:

```
CMake Error at build/clog-source/CMakeLists.txt:249 (ADD_SUBDIRECTORY):
ADD_SUBDIRECTORY not given a binary directory but the given source
directory "/home/hong/wsrc/pytorch/build/clog-source" is not a subdirectory
of "/home/hong/wsrc/pytorch/build/clog-source".  When specifying an
out-of-tree source a binary directory must be explicitly specified.
```

This is due to a conflict between our cpuinfo submodule and XNNPACK's
external clog dependency. Moving our cpuinfo upward and setting
CLOG_SOURCE_DIR resolves the issue.

 ---

Also reverted https://github.com/pytorch/pytorch/issues/33947, where `CLOG_SOURCE_DIR` as an option is not quite appropriate (given that cpuinfo uses its included clog subdir), and where the setting of this variable should happen a bit later, once the dir of cpuinfo is known.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33922

Differential Revision: D20193572

Pulled By: ezyang

fbshipit-source-id: 7cdbdc947a6c7e0ef10df33feccb5b20e1b3ba43
2020-03-02 10:40:12 -08:00
Jie
e54b8e1a47 [CUDNN NHWC CONVOLUTION] Re-stride input tensors for wgrad in cudnn_convolution (#33784)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33784

Differential Revision: D20127485

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d893ffe7ff9499e7e9a7e8bed720e9441d1018e
2020-03-02 10:05:59 -08:00
31737e989d [aten] remove shadowed declaration warning (#34014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34014

Remove warning
```
caffe2/aten/src/ATen/core/op_registration/op_registration.h: In lambda function:
caffe2/aten/src/ATen/core/op_registration/op_registration.h:704:47: warning: declaration of ‘c10::DeviceType t’ shadows a parameter [-Wshadow=compatible-local]
   auto deviceTypeToDispatchKey = [](DeviceType t){
                                               ^
caffe2/aten/src/ATen/core/op_registration/op_registration.h:703:21: note: shadowed declaration is here
 inline CppFunction dispatch(DeviceType t, Func&& raw_f) {
          ~~~~~~~~~~~^
```

Test Plan: CI

Reviewed By: dzhulgakov

Differential Revision: D20181155

fbshipit-source-id: 41947d171369b9bd7a87e3e367492f9b2165fd6b
2020-03-02 09:22:13 -08:00
ad17dafc50 [caffe2] Remove python2 from operator_test (#33977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33977

Removing python2 from operator_test so we can retire python2 support for PyTorch.

Test Plan: waitforsandcastle

Reviewed By: seemethere

Differential Revision: D20129500

fbshipit-source-id: d4c82e4acfc795be9bec6a162c713e37ffb9f5ff
2020-03-02 08:55:53 -08:00
f4532d7542 Fix typo (#33925)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33925

Differential Revision: D20171970

Pulled By: vincentqb

fbshipit-source-id: 5c1a8553760f74cecebaea7e88463b767ab81211
2020-03-02 08:13:55 -08:00
71f8624ecb Revert D19153199: [ATen] Remove AT_ASSERTM from Blob::free_()
Test Plan: revert-hammer

Differential Revision:
D19153199

Original commit changeset: f93983d5bf32

fbshipit-source-id: d79cf659f3cb26427196b9d9d1fe44e15874ad79
2020-03-02 07:35:35 -08:00
6631c2a627 [doc] Add grad context manager doc to toplevel torch module. (#33877)
Summary:
fixes https://github.com/pytorch/pytorch/issues/32014
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33877

Differential Revision: D20141801

Pulled By: albanD

fbshipit-source-id: bac713382a71666dd5e2499f710c51a55cc579ba
2020-03-02 06:32:36 -08:00
a500491cbc Fix index_put when tensor length > int_max (#33753)
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/33345.

The original CUDA kernel looks good. I changed most appearances of `int` to `int64_t` to avoid the CUDA memory access issue. Removed the two `TORCH_CHECK`. Added a unit test.

cc csarofeen ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33753

Differential Revision: D20185005

Pulled By: ngimel

fbshipit-source-id: ef0abdc12ea680e10fe6b85266e2773c7a272f0d
2020-03-01 21:51:23 -08:00
f857fe18cd [ATen] Remove AT_ASSERTM from Blob::free_() (#33929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33929

`Blob::~Blob()` calls `Blob::free_()`. `Blob::free_()` throws and destructors should not throw.

A few other minor tweaks include:
- Remove `static_cast<void*>()` in `ShareExternal`
- Remove default values of `pointer_` and `has_ownership_`

Test Plan:
```
buck test caffe2/caffe2:caffe2_test_cpu
```

https://our.intern.facebook.com/intern/ads/canary/424941782651397826
https://our.intern.facebook.com/intern/ads/canary/424941799628450155

Reviewed By: yinghai

Differential Revision: D19153199

fbshipit-source-id: f93983d5bf324b9a464ad2d1ed0dba13f807d2f6
2020-03-01 21:09:04 -08:00
e017b1e9fb Updating submodules
Summary:
GitHub commits:

af57f36db0

Test Plan: n/a

Reviewed By: wittgenst

fbshipit-source-id: 4bd71218aee5e2a20a3496f2a51d464a19c0f879
2020-03-01 20:54:32 -08:00
ad769d74d9 Collapse _like overloads into a single overload. (#33705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33705

The fact that there were two overloads appears to be a historical
artifact that dates back to when goldsborough originally added these
bindings in the first place.  If TensorOptions is made optional,
then you only need one overload, not two, as they are exactly redundant
with each other.  When MemoryFormat was added, it was made a little
harder to do this, as the C++ syntax at::empty_like(t, memory_format) would
not work if you collapsed the overload; but now it works because TensorOptions
supports MemoryFormat.

The upshot is, I can get rid of all the overloads and just have one overload.
Amazingly, this change is backwards compatible, as the test attests.  While
I was at it, I also deleted the overload name from the functions entirely.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20073355

Pulled By: bhosmer

fbshipit-source-id: c6a8908213b32ccf6737ea864d135e2cce34f56b
2020-03-01 19:40:22 -08:00
b98bce8cd4 Add MemoryFormat to TensorOptions, but not codegen. (#33704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33704

This diff adds MemoryFormat field to TensorOptions, and teaches
all kernels that take TensorOptions to respect it, but doesn't
teach the codegen about it.  As such, it is now possible to specify
memory_format using TensorOptions syntax, e.g.,
at::empty_like(tensor, at::memory_format(MemoryFormat::Contiguous))
in the C++ API, but there isn't any other user visible effect.

The intended end state of this diff stack is to eliminate the
explicit MemoryFormat? arguments from native functions, but
as this change has BC implications I'd prefer to do it separately.
So this starts things off with a non-BC breaking addition to the
API.  For all internal functions that are not bound by codegen,
I switch them to exclusively using TensorOptions (eliminating
MemoryFormat); there's only a few, mostly quantized and to().

To keep things screwed down in the short term, it is a HARD ERROR
to specify both the explicit MemoryFormat argument as well as
TensorOptions.  This caught a few errors in my diff where I needed
to modify memory format settings and then call code later, esp
in empty_like.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20073356

Pulled By: bhosmer

fbshipit-source-id: 18d310d7ee7cf2ee182994104652afcfc9d613e2
2020-03-01 18:22:12 -08:00
9f7708eecb Updating submodules
Summary:
GitHub commits:

8c1badaa4a
ce1ee42199
b23caba073
aa48f50c9a
f7695cddae
8a386d9549
baab5386e2

Test Plan: n/a

Reviewed By: wittgenst

fbshipit-source-id: 6c036499de97418afd9337979e89365ce13ceee7
2020-03-01 16:05:00 -08:00
15caf3b516 move test helper functions out of test funciton (#33960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33960

Test helper functions should live outside the test function. It is possible for process 2 to launch its test functions more slowly than process 1, while process 1 sends a request to run a helper function on process 2. Process 2 may not have compiled the helper function yet when it starts to serve process 1's request, and thus may return an error like "attempted to get undefined function".
ghstack-source-id: 99205620

Test Plan: test_remote_script_module was flaky for thrift backend in my local stress test runs, due to error "attempted to get undefined function". With fix in this diff, stress runs passed

Differential Revision: D20167969

fbshipit-source-id: 8a2b9cd7bd62462e24bdbcb69ad32dca745d6956
2020-03-01 14:16:56 -08:00
84ec5357d3 Make HashNode visible (#34045)
Summary:
HashNode and CompareNode are useful functions for handling jit::Node. This is to unblock https://github.com/pytorch/glow/pull/4235.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34045

Reviewed By: houseroad

Differential Revision: D20184733

Pulled By: yinghai

fbshipit-source-id: 6c829f2f111a490fd2d85017475c1731cd97fb20
2020-03-01 12:28:18 -08:00
ace2b4f37f [resubmit] try to infer rref type from python (#33992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33992

Resubmit of https://github.com/pytorch/pytorch/pull/33369, with tweaks to when the RRef type is created, to ensure ivalue->type() holds the correct RRef type for the inner element type.

Test Plan: Imported from OSS

Differential Revision: D20175043

Pulled By: wanchaol

fbshipit-source-id: a08b178e989c995632374e6c868d23c5a85526ae
2020-02-29 20:26:40 -08:00
7747fe81c4 reuse named tensor error message in generated code (#33536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33536

Simple fix, merge the identical string literals that were being inlined into every wrapper for ops that don't support named tensors. E.g.
```
Tensor all(const Tensor & self, int64_t dim, bool keepdim) {
    if (self.has_names()) {
        AT_ERROR(
            "all is not yet supported with named tensors. Please drop names via "
            "`tensor = tensor.rename(None)`, call the op with an unnamed tensor, "
            "and set names on the result of the operation.");
    }
    const OptionalDeviceGuard device_guard(device_of(self));
    return at::native::all(self, dim, keepdim);
}
```
becomes
```
Tensor all(const Tensor & self, int64_t dim, bool keepdim) {
    if (self.has_names()) {
        AT_ERROR("all", named_tensors_unsupported_error);
    }
    const OptionalDeviceGuard device_guard(device_of(self));
    return at::native::all(self, dim, keepdim);
}
```

Also updated the generated file comments to include the source template names, e.g.
```
// generated by aten/src/ATen/gen.py from TypeDefault.cpp
```

Test Plan: Imported from OSS

Differential Revision: D19993407

Pulled By: bhosmer

fbshipit-source-id: 88395a649e6ba53191332344123555c217c5eb40
2020-02-29 17:00:13 -08:00
7f7ea685c0 Revert D18672405: Use codegen'ed unboxing wrappers
Test Plan: revert-hammer

Differential Revision:
D18672405

Original commit changeset: bf2a7056082d

fbshipit-source-id: b7ef1529fc266b4856e49e4dbd1fe8c7ba3d455d
2020-02-29 15:27:54 -08:00
3acfccafbb Revert D20172782: Fix mobile build
Test Plan: revert-hammer

Differential Revision:
D20172782

Original commit changeset: e4bfca2a6076

fbshipit-source-id: 3093efd4a135f8d6c3174887ad1e3362aad9aa7c
2020-02-29 15:21:07 -08:00
595445e889 Revert D20178827: Fix mobile build
Test Plan: revert-hammer

Differential Revision:
D20178827

Original commit changeset: 980ac3d1ab3d

fbshipit-source-id: 9af6cb319e80c9b6a916bbdeffd69920075c7aec
2020-02-29 15:04:35 -08:00
c596ec7eb3 [pytorch] update code analyzer script to cover new c10::Module::def API (#33975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33975

Currently the code analysis script doesn't go beyond the scope of the
registration API call, i.e. calling registration via a wrapper will not
be covered by the analysis - and the new API is currently essentially a
wrapper around the old API.

Simply adding the new API signature to the registration API pattern can
solve the problem for now. We might need change the analyzer code if
things change significantly in the future.

Test Plan:
- update test project to use the new API;
- run analyzer against pytorch codebase;

Differential Revision: D20169549

Pulled By: ljk53

fbshipit-source-id: c7925fa0486eee18f07e791a38c32152fee59004
2020-02-29 10:29:45 -08:00
5a8562a6af Fix mobile build (#34000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34000

-
ghstack-source-id: 99241400

Test Plan: liujiakai

Differential Revision: D20178827

fbshipit-source-id: 980ac3d1ab3d47c12613c20ee9b8dc7d083f56a9
2020-02-28 23:28:00 -08:00
1494005cfd C++ tensor indexing: more indexing tests (#30427)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30427

Test Plan: Imported from OSS

Differential Revision: D18695899

Pulled By: yf225

fbshipit-source-id: 74455fe52ef922556fabe65aefca9ec93fe2346d
2020-02-28 22:07:41 -08:00
0e52627358 Fixing pthreadpool symbol conflict issue. (#33869)
Summary:
Mainly renames C2's pthread_create - the only conflicting symbol referred to
internally by NNPACK - to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointed XNNPACK at the original repo instead of the fork.

Copy-pasted the new interface and implementation to
caffe2/utils/threadpool, so that internal builds compile against
this.

This will be removed when the threadpool is unified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869

Differential Revision: D20140580

Pulled By: kimishpatel

fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
2020-02-28 21:23:18 -08:00
85b1c45a45 [JIT] fix alias assertion (#33952)
Summary:
This bug has been hit a couple of times recently. We need to handle all bivariant types, not just Optional, when asserting mutability/immutability of pointed-to elements in alias analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33952

Differential Revision: D20166025

Pulled By: eellison

fbshipit-source-id: cf3df9897a639641ef8303a08ba2b13523d01ef1
2020-02-28 19:54:29 -08:00
2111c4ff0c [jit] Add missing tensor properties (#33906)
Summary:
Fixes #30775

This adds TorchScript implementations (copied from `python_variable.cpp`) for the remaining `Tensor` properties that were missing from the jit, in addition to a test that ensures new properties will trigger a failure so we can decide whether we want to add them as well (a short sketch follows the list below).

For `some_tensor`, adds:

* `some_tensor.T`
* `some_tensor.ndim`
* `some_tensor.is_leaf`
* `some_tensor.name`
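
For illustration, a minimal sketch (written for this note, not taken from the PR) of these properties being used inside TorchScript:

```python
import torch

@torch.jit.script
def inspect(t: torch.Tensor):
    # Properties this PR makes available in TorchScript.
    return t.T, t.ndim, t.is_leaf
```
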
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33906

Pulled By: driazati

Differential Revision: D20153288

fbshipit-source-id: 2ddc48a14267077bc176065267e5ce52181b3d6b
2020-02-28 19:06:11 -08:00
6e70b2da62 Fix mobile build (#33985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33985

This was broken by https://github.com/pytorch/pytorch/pull/32521 but only showed up in master CI builds
ghstack-source-id: 99220995

Test Plan: CI

Differential Revision: D20172782

fbshipit-source-id: e4bfca2a6076f1bc1c562fca9c7dfcb156bfbf3e
2020-02-28 18:43:18 -08:00
2f6ffe8c39 [jit] Resolve type annotation names to types (#29623)
Summary:
This adds some machinery so that we use Python to resolve type annotations to a value, together with the corresponding resolution logic in `annotations.py`, instead of using the string.

This PR also marks a random test with `slowTest` since it was taking > 1 min whereas all the other tests take < 10 seconds.

Fixes #31864
Fixes #31950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29623

Pulled By: driazati

Differential Revision: D20144407

fbshipit-source-id: ef3699f6b86039d8b4646ffc42c21bd1132d1681
2020-02-28 18:35:10 -08:00
55b44f6746 Throw an exception when method cannot be found from mobile module. (#33972)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33972

Test Plan: Imported from OSS

Differential Revision: D20168965

Pulled By: iseeyuan

fbshipit-source-id: 2efe5dcb1fb80407cd88a47c50cb382ecd8aa275
2020-02-28 18:28:09 -08:00
de55e47a4b Pass all ops to XLA with additional info about whether it's compound (#33908)
Summary:
This PR prepares us to allow XLA to use `XLAPreAutograd` to override compound ops.
To do this we'll need to pass all ops, with additional information about whether each is compound or not, for XLA to parse.
Companion PR: https://github.com/pytorch/xla/pull/1698
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33908

Differential Revision: D20149585

Pulled By: ailzhang

fbshipit-source-id: a93140e8a34548fcabcea454386d15df58177c1d
2020-02-28 18:17:23 -08:00
38b6cb479b Check fuser results when profiling (#33944)
Summary:
With the profiling executor enabled the fuser won't be invoked until the second pass over a script function, so some of these tests weren't correctly comparing the fused output with the interpreter output.  I've used the `checkScript` method where applicable, which seems to do the right thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33944

Test Plan: Locally inject obvious errors into the fuser and verify that the updated tests fail when they're supposed to.

Differential Revision: D20162320

Pulled By: bertmaher

fbshipit-source-id: 4a2f3f2d2ff1d81f23db504dc8cd0d5417bdcc50
2020-02-28 17:01:34 -08:00
4377061baf [caffe2] fix atomicAdd redeclaration Clang error (#33559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33559

For sm_60+, CUDA supports the `atomicAdd(double*, double*)` function, and for lower compute capabilities the CUDA C Programming Guide [1] suggests a user implementation, as in this code. On the other hand, Clang's CUDA wrappers unconditionally define this function, regardless of compute capability, and emit an error if it actually gets used.

So the problem is: when Clang is used for < sm_60, CUDA's `atomicAdd(double*, double*)` cannot be used, and it cannot be redeclared in standard-compliant C++.

Work around the problem by using Clang's `enable_if` attribute [2], which has a side effect of function redeclaration.

1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions
2. https://clang.llvm.org/docs/AttributeReference.html#enable-if

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: ngimel

Differential Revision: D20005113

fbshipit-source-id: d0d4bd6514f201af9cdeba1229bd9b798df0d02e
2020-02-28 15:48:19 -08:00
4fb8679218 [caffe2] fix field initialization after base Clang errors (#33556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33556

Fix several places exposed by Clang where order of member initializer list doesn't actually match the actual initialization order. The fix is to simply reorder member initializer lists.

Also accepted formatting changes suggested by clang-format linter.

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: ngimel

Differential Revision: D20004834

fbshipit-source-id: b61c7c3f1fe8413bbb3512f6b62177a3ddf67682
2020-02-28 15:42:49 -08:00
991f7a20f2 Use clog from cpuinfo/deps instead of downloading (#33947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33947

XNNPACK was downloading clog because we weren't setting CLOG_SOURCE_DIR.
Actually, it was downloading cpuinfo and pointing to the copy of clog
within that.  So let's just point to the copy of clog within the cpuinfo
submodule we already have.

(Note: this ignores all push blocking failures!)

Test Plan:
Ran cmake and didn't see any downloading.
Verified that our clog is the same as the one that was being downloaded
with `diff -Naur`.

Differential Revision: D20169656

Pulled By: suo

fbshipit-source-id: ba0f7d1535f702e504fbc4f0102e567f860db94b
2020-02-28 15:19:03 -08:00
69d2741480 Add list of view ops to public doc. (#32560)
Summary:
This PR comes from a discussion with albanD in https://fb.quip.com/npBHAXaPfnbu. The main goal is to distinguish view ops from general out-of-place/in-place ops and remind users about the difference.
For reference, this information was previously only available in code that is internal and hard to find. Also, changes to this list actually affect users, so we think it's better to expose it as public information. It's also helpful for new backends like XLA when implementing PyTorch ops. 19bbb4fccb/tools/autograd/gen_autograd.py (L32-L68)
Please feel free to comment!
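
For illustration, a minimal sketch (written for this note) of what makes an op a view op - the returned tensor shares storage with its base:

```python
import torch

base = torch.randn(3, 4)
v = base.view(4, 3)        # a view op: v aliases base's storage
v[0, 0] = 42.0
assert base[0, 0] == 42.0  # mutating the view is visible through the base
```
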
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32560

Differential Revision: D20161069

Pulled By: ailzhang

fbshipit-source-id: b5f1fd4353fe7594a427784db288aeb5a37dc521
2020-02-28 15:05:55 -08:00
b678256bfb Move glu to Aten(CPU) (#33179)
Summary:
This PR moves glu to ATen (CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"

#warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)

for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backwad avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backwad avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backwad avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backwad avg time is 1.03 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24707, https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179

Differential Revision: D19839835

Pulled By: VitalyFedyunin

fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
2020-02-28 14:54:38 -08:00
3c5677a676 Use codegen'ed unboxing wrappers (#32521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32521

Not all ops support the templated unboxing wrappers yet. For the ones that don't,
let's use the codegen'ed unboxing wrappers from register_aten_ops.cpp, but register
them with c10 directly instead of JIT.

The `use_c10_dispatcher` setting in `native_functions.yaml` now has a new option 'with_codegenerated_unboxing_wrapper' which means we take the codegened unboxing wrapper from register_aten_ops.cpp and stuff it into c10. This new argument is the default, 'unboxed_only' is not the default anymore. For the (very few) ops that don't support boxed dispatch yet (i.e. ops taking TensorOptions arguments), we set them to 'unboxed_only' and they follow the old behavior of having register_aten_ops.cpp register the jit op.

Next steps here are (1) to make TensorOptions work with boxed dispatch and remove the `unboxed_only` option from `use_c10_dispatcher`, so that all ops go through the new path and (2) make the new path template-only and remove codegen from it (see https://github.com/pytorch/pytorch/issues/32366).

First experiments show that
- For a small JITted model that calls add (i.e. a op with just two arguments that are both tensors) on two tensors in a loop, we see a 2-4% performance improvement (~35-50ns) when compared to the old path. This is a simple op that takes two tensor arguments and no non-tensor arguments, so iterating over it in boxed dispatch is cheap.
- For a small JITted model that calls avgpool1d (i.e. an op that has one tensor arg and 5 non-tensor args) on a tensor in a loop, we see a 3-4% performance regression (~60ns) when compared to the old path. This is an op that takes only one tensor argument and then 6 non-tensor arguments. Unboxed dispatch doesn’t have to look at those but boxed dispatch still needs to iterate over them.

This performance difference is likely due to boxed dispatch iterating over all arguments in a loop and unboxed dispatch not having to look at non-tensor arguments.

ghstack-source-id: 99161484

Test Plan: unit tests that call existing ops through JIT

Differential Revision: D18672405

fbshipit-source-id: bf2a7056082dfad61e7e83e9eeff337060eb6944
2020-02-28 14:48:25 -08:00
2fa51dde28 Remove unnecessary tensor copies (#33732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33732

move and forward instead of copy

Benchmarks:
A microbenchmark calling the add operation on two tensors in a tight loop shows a 5% improvement in performance.
No visible change for a model like resnet that does more work in its kernels.
ghstack-source-id: 99161486

Test Plan: benchmarks

Differential Revision: D20082642

fbshipit-source-id: eeac59686f8621dd5eaa85d61e6d219bba48c847
2020-02-28 14:47:04 -08:00
917e56e950 Throw an error if nbytes is called on a sparse tensor. (#33897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33897

Test Plan: Imported from OSS

Differential Revision: D20146388

Pulled By: gchanan

fbshipit-source-id: b5853096e290fa7fb50be41446b138ebdf71009f
2020-02-28 14:12:50 -08:00
f5d92fbc25 Get rid of newWithStorage2d calls. (#33823)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33823

Test Plan: Imported from OSS

Differential Revision: D20122448

Pulled By: gchanan

fbshipit-source-id: b249372c93ee71b84a293dfb5c298a8fb664da16
2020-02-28 14:07:44 -08:00
56d9906083 update mapping of fake operators (#33946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33946

update mapping of fake operators to model nnpi
update SpatialBN to non-lowered

Test Plan:
compilation

https://github.com/pytorch/pytorch/pull/33946

Reviewed By: amylittleyang

Differential Revision: D20156136

fbshipit-source-id: e6ed87c3c5eba692a49376f0d9dae37ae185f185
2020-02-28 14:01:02 -08:00
ad44394f15 Updating submodules
Summary:
GitHub commits:

e5b1164ad7
6df461c14e
41535d0218
30c57a1a0e
3b9aeb2ebe

Test Plan: n/a

Reviewed By: wittgenst

fbshipit-source-id: 8361b5814c531edc99f96f11db97d6b2adcc5280
2020-02-28 13:29:48 -08:00
9fd1a7697f Create CODE_OF_CONDUCT.md 2020-02-28 13:20:00 -08:00
a726827ec8 Formatting changes for gradient scaling (#33832)
Summary:
Hard to get right locally... I can build the docs, but they never quite match what they look like live. The bullet-point indentation was just an oversight.

Removing `Returns:` formatting tabs because they take up a lot of space when rendered and add no clarity.  Some functions in Pytorch [do use them](https://pytorch.org/docs/master/torch.html#torch.eye), but [many don't bother](https://pytorch.org/docs/master/torch.html#torch.is_tensor), so apparently some people shared my feelings (Not using them is in line with existing practice).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33832

Differential Revision: D20135581

Pulled By: ngimel

fbshipit-source-id: bc788a7e57b142f95c4fa5baf3fe01f94c45abd8
2020-02-28 11:40:48 -08:00
5dde8cd483 [caffe2] fix no matching function min/max Clang errors (#33563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563

When NVCC or Clang are driving CUDA compilation many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC uses both `__host__` and `__device__`. This makes every un-elaborated `min` or `max` function call from a `__host__` function generate a syntax error when Clang is used.

Fix the errors by using `std::min` and `std::max` from `<algorithm>`, since C++14 they are `constexpr` and can be used in the `__device__` code [1].

1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: ngimel

Differential Revision: D20005795

fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
2020-02-28 11:33:24 -08:00
c6d301220a Fix torch.cat() performance regression on single core CPU (#33534)
Summary:
This PR addresses the performance regression of `torch.cat()` on CPU with a single thread.
The previous optimization https://github.com/pytorch/pytorch/issues/30806 introduced a regression for several cases in the PyTorch operator benchmark.
See https://github.com/pytorch/pytorch/issues/33334 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33534

Differential Revision: D20129963

Pulled By: VitalyFedyunin

fbshipit-source-id: 3fa6cd266978e5b54fa37105555502b77352df3e
2020-02-28 11:22:08 -08:00
890242254b Updating submodules
Summary:
GitHub commits:

6f4df6e0cd
6b7df86da1
f873713ad6
2b3b76cc4d
b990727d33

Test Plan: n/a

Reviewed By: wittgenst

fbshipit-source-id: bf7b1639ee23e1e823bc2217f56c87dc7befaf7f
2020-02-28 10:42:20 -08:00
04dc0e6973 Split Distribution.cu into smaller files to reduce compilation time. (#33892)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33892

Test Plan: Imported from OSS

Differential Revision: D20148925

Pulled By: gchanan

fbshipit-source-id: 955e6ff22ee5fb24000b9f2ee58a243e76edf993
2020-02-28 09:21:51 -08:00
dece155335 Modified assertEqual to handle complex tensors (#33773)
Summary:
- Modified assertEqual to handle complex tensors
- Added a test in test_torch.py to test torch.zeros with complex dtypes
- Added complex dispatch for index_kernel and index_put_kernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33773
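
As a rough illustration of what the change enables (a minimal sketch, assuming complex dtypes are available in this build; not code from the PR):

```python
import torch

# Complex tensors can now be created and compared in tests.
a = torch.zeros(3, dtype=torch.complex64)
b = torch.zeros(3, dtype=torch.complex64)
# Inside a test_torch.py TestCase, self.assertEqual(a, b) now handles
# complex tensors, checking both real and imaginary parts.
```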

Differential Revision: D20135553

Pulled By: anjali411

fbshipit-source-id: f716604535c0447ecffa335b0fc843431397c988
2020-02-28 08:43:28 -08:00
09046713cc removed .data from test_autograd.py (#33886)
Summary:
issue: https://github.com/pytorch/pytorch/issues/33630
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33886

Differential Revision: D20160292

Pulled By: anjali411

fbshipit-source-id: 14a42d8148bd60db2dd8ec39f83f99c061ae19c1
2020-02-28 08:24:07 -08:00
f5f1e5e7f6 [quant][graphmode][refactor] Factor out getInvokedMethod (#33649)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33649

Test Plan:
.

Imported from OSS

Differential Revision: D20123589

fbshipit-source-id: 0853d757434fb85c6d86666ff9fc991f8c4cb4bc
2020-02-27 23:48:09 -08:00
7f1112820a [quant][graphmode][refactor] Move check for weight outside of insertObserverFor (#33276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33276

att

Test Plan:
.

Imported from OSS

Differential Revision: D20123593

fbshipit-source-id: 45dc8488ddf02225ba2c20374c9385edd77a4912
2020-02-27 23:48:04 -08:00
7c13f576ea [quant][graphmode][refactor] Checks for bias and weight (#33273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33273

- Move the check for bias to valueNeedsToBeQuantized
- Move TORCH_CHECK inside the functions for checking if a value is bias or weight

Test Plan:
.

Imported from OSS

Differential Revision: D20123595

fbshipit-source-id: 4b805d57dcaf41a6436506d021dd5f6518bc88fd
2020-02-27 23:47:59 -08:00
97541a5106 [quant][graphmode][refactor] Move values_to_skip check inside valueNeedsToBeQuantized (#33275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33275

att

Test Plan:
.

Imported from OSS

Differential Revision: D20123592

fbshipit-source-id: 2b56ea8bab27eb9ea2bf792c83e48a7af8917e1a
2020-02-27 23:46:29 -08:00
64aab3260a [jit] allow RRef local creation with IValue objects (#33263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33263

This PR allows PyRRef local creation to inspect the pyobject: if it finds
that the object can be turned into an IValue, it converts it to an IValue
first; otherwise it holds it as a PyObjectType.

Test Plan:
Imported from OSS

https://fb.quip.com/aGxRAh2lCg05

Differential Revision: D19871243

Pulled By: wanchaol

fbshipit-source-id: ae5be3c52fb1e6db33c64e64ef64bc8b9ea63a9a
2020-02-27 22:49:53 -08:00
1507573a52 [caffe2] fix no return statement in constexpr function Clang error in TypeIndex.h (#33576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33576

A `throw` statement at the end of a `constexpr` function is ill-formed according to Clang. This happens when Clang drives CUDA compilation and compiles the affected code for device. Due to its compilation model, Clang requires host code to be well-formed even when compiling for device.

Fix the error by guarding the entire definition of `type_index_impl` with `__CUDA_ARCH__` check.

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: smessmer

Differential Revision: D20008881

fbshipit-source-id: b0dc9abf0dc308b8b8637b54646a0411baf7fef3
2020-02-27 22:29:58 -08:00
c18cb1eb52 Improve dll loading logic on Windows (#33856)
Summary:
The way DLL loading works on the Anaconda distribution of Python 3.8 is a bit different. Loading DLLs explicitly (e.g. `ctypes.CDLL`) relies on paths appended by `os.add_dll_directory`. But if you try to load DLLs implicitly (e.g. `from torch._C import *`), it will rely on `PATH`.
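
A minimal sketch of the two loading paths described above (Python 3.8 on Windows; the paths are placeholders, not actual install locations):

```python
import os
import ctypes

# Explicit loading: dependent DLLs are resolved via directories
# registered with os.add_dll_directory (new in Python 3.8).
os.add_dll_directory(r"C:\placeholder\torch\lib")
ctypes.CDLL(r"C:\placeholder\torch\lib\some_dependency.dll")

# Implicit loading, e.g. `from torch._C import *`, instead resolves
# dependent DLLs via the PATH environment variable.
```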

Fixes https://github.com/pytorch/vision/issues/1916.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33856

Differential Revision: D20150080

Pulled By: soumith

fbshipit-source-id: cdbe76c138ea259ef7414c6634d4f7e0b1871af3
2020-02-27 21:58:35 -08:00
cb8d9f99aa [JIT] Implement Tensor.tolist() (#33472)
Summary:
**Summary**
This commit adds an implementation of `Tensor.tolist()` to the JIT interpreter.
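
A minimal usage sketch (assuming, as in the unit tests, that the list type is annotated so the compiler knows the output dimension and element type):

```python
from typing import List
import torch

@torch.jit.script
def matrix_to_list(x: torch.Tensor) -> List[List[float]]:
    # The annotation tells the JIT that tolist() produces a 2D list of floats.
    return torch.jit.annotate(List[List[float]], x.tolist())

print(matrix_to_list(torch.eye(2)))  # [[1.0, 0.0], [0.0, 1.0]]
```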

**Testing**
This commit adds several unit tests that test that this function works correctly for
0D, 1D, 2D and 3D tensors of type `float`, `int` and `bool`.

```
(base) meghanl-mbp:pytorch meghanl$ python test/test_jit.py TestList.test_to_list -v
Fail to import hypothesis in common_utils, tests are not derandomized
test_to_list (jit.test_list_dict.TestList)
Unit tests for Tensor.tolist() function. ... ok

----------------------------------------------------------------------
Ran 1 test in 0.329s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33472

Differential Revision: D20109738

Pulled By: SplitInfinity

fbshipit-source-id: a6e3fee5e3201d5e1f0c4ca45048488ae2bf5e33
2020-02-27 21:45:46 -08:00
5029ff001b [Revert] manual revert of D19918320 (#33920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33920

revert D19918320

Test Plan: revert diff

Reviewed By: zhaojuanmao

Differential Revision: D20151299

fbshipit-source-id: c346554ae9074991331479e434e54b0cc513f1a4
2020-02-27 21:22:36 -08:00
8f84deddd1 [jit] fix up refs in overview.md (#33919)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33919

Test Plan: Imported from OSS

Differential Revision: D20154953

Pulled By: suo

fbshipit-source-id: 2ef83cce8da88212bed7edc813c9b233267ea81b
2020-02-27 19:22:51 -08:00
d6485b411b [jit] add top-level readme to csrc/jit (#33916)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33916

Test Plan: Imported from OSS

Differential Revision: D20150771

Pulled By: suo

fbshipit-source-id: c7550954ddd6a294ce833348bf9fa058503e9bd7
2020-02-27 19:21:05 -08:00
bd7e9c490a [jit] stop printing crap in test_jit (#33917)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33917

Test Plan: Imported from OSS

Differential Revision: D20150750

Pulled By: suo

fbshipit-source-id: 9a35298a8856d423fb6b9043174853cccf968706
2020-02-27 19:06:43 -08:00
d66c320b10 disable leaky_relu_ backward calculation with negative slope (#33639)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33639

Test Plan: Imported from OSS

Differential Revision: D20045735

Pulled By: glaringlee

fbshipit-source-id: b3becf30a8fe9ee178792bd88f6ee10102504ed5
2020-02-27 18:54:57 -08:00
997b5b5797 [quant][graphmode][refactor] Simplify signature for insertObserverFor (#33274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33274

att

Test Plan:
.

Imported from OSS

Differential Revision: D20123588

fbshipit-source-id: e656d96e0b6004bfcca5df2ab222184d4e1dd6ad
2020-02-27 17:24:41 -08:00
db4a24e008 [jit] remove some unused/redundant files (#33806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33806

as title

Test Plan: Imported from OSS

Differential Revision: D20122117

Pulled By: suo

fbshipit-source-id: 209d29ed2c873181140c9fb5cdc305c200ce4008
2020-02-27 17:16:12 -08:00
877ab3afe3 Better handing of Autograd+Fork errors. (#33885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33885

Fixes: #32835
Fixes: #5834

This can not be combined with CUDA's implementation, as each requires its own `std::once_flag` as well as a different `forked_autograd_child` function. The CUDA version relies on a Python module, while autograd uses TORCH_CHECK to report the error to both Python and C++.

Test Plan: Imported from OSS

Differential Revision: D20144024

Pulled By: VitalyFedyunin

fbshipit-source-id: e7cf30568fff5110e9df7fe5b23f18ed992fa17f
2020-02-27 16:07:29 -08:00
746e5218e7 Mistake in MSELoss documentation (#33836)
Summary:
Replaced `sum` with `mean` in [line 392](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/loss.py#L392)
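
For reference, `'mean'` is the default reduction, which is what the corrected formula describes (a minimal sketch, not from the PR):

```python
import torch

loss_fn = torch.nn.MSELoss()  # reduction='mean' by default
out = loss_fn(torch.ones(4), torch.zeros(4))
print(out)  # tensor(1.) -- the mean, not the sum, of squared errors
```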
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33836

Differential Revision: D20142053

Pulled By: ailzhang

fbshipit-source-id: 2bfe19944ffc5534902dd9087023e70ddf5746c3
2020-02-27 15:34:46 -08:00
48fd410e44 Try fix XLAPreAutograd with *_like functions. (#33848)
Summary:
In *_like functions we call
`globalLegacyTypeDispatch().initForDispatchKeySet(c10::detail::multi_dispatch_key_set(self, options));` -> `dispatchKeyToBackend` and thus this change.
`self` has both `XLAPreAutograd` and `XLATensorId` in key set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33848

Differential Revision: D20135898

Pulled By: ailzhang

fbshipit-source-id: a8585f39f3fa77b53718f20d3144f4f2f3cb8e53
2020-02-27 15:28:40 -08:00
87e97ced20 Split UnaryOpsKernel into smaller files for faster compilation. (#33888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33888

Test Plan: Imported from OSS

Differential Revision: D20143653

Pulled By: gchanan

fbshipit-source-id: de708030e93e96091e0c01a89b4342872d0657dd
2020-02-27 15:13:01 -08:00
aff1da5aac .circleci: Remove trailing slash, fix conda upload (#33903)
Summary:
Conda registers a name with a trailing slash as a new user, so it was failing
to upload the Anaconda packages.

In the future this should be handled through a single variable that can
be used for both but until then this will have to do.

Bug was introduced in https://github.com/pytorch/pytorch/issues/33842

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33903

Differential Revision: D20148679

Pulled By: seemethere

fbshipit-source-id: 27c95f5d906ce84aa34bf5d76fd6f1ef5df08fb9
2020-02-27 14:56:02 -08:00
a7fe200f5f [caffe2] simplify caffe2 code with fbgemm handling block size 1 emb (#33774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33774

Simplify caffe2 code using D19246900

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D20102410

fbshipit-source-id: 8de4d9cfac66898db0718ac6477339fd5e5428e3
2020-02-27 14:45:28 -08:00
524dad13a8 Add device to the test tensor. Default device type is CPU, in pytorch… (#33635)
Summary:
…/xla this will result in a failure since it is comparing an XLA tensor with a CPU tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33635

Differential Revision: D20043517

Pulled By: ailzhang

fbshipit-source-id: d84038ea675e4d4a9c02e7a8b0924bdb12f40501
2020-02-27 14:40:07 -08:00
edd5c009f7 fix docs mistakes in lr_scheduler.MultiplicativeLR (#33805)
Summary:
This PR addresses the issue [The docs of `MultiplicativeLR` use `LambdaLR` as example](https://github.com/pytorch/pytorch/issues/33752#issue-570374087).

https://github.com/pytorch/pytorch/issues/33752
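
A minimal sketch of what a correct `MultiplicativeLR` example looks like (illustrative; the model and decay factor are arbitrary):

```python
import torch
from torch.optim.lr_scheduler import MultiplicativeLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Unlike LambdaLR (lr = base_lr * lmbda(epoch)), MultiplicativeLR
# multiplies the *current* lr by lmbda(epoch) at every step.
scheduler = MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.95)

for epoch in range(3):
    optimizer.step()
    scheduler.step()
    print(optimizer.param_groups[0]["lr"])  # 0.095, 0.09025, ...
```
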
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33805

Differential Revision: D20121314

Pulled By: mruberry

fbshipit-source-id: 5afa63bbe83d35ce4e55705b8cbd96326a907651
2020-02-27 14:11:57 -08:00
d97560999b Split BinaryCompareKernel.cu into a file-per-kernel to speed up compilation. (#33871)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33871

Test Plan: Imported from OSS

Differential Revision: D20140862

Pulled By: gchanan

fbshipit-source-id: a4fde38c1c7c5905e3855fa490ea2e87bb24c703
2020-02-27 13:48:36 -08:00
5eacdfb21f Revert D20127441: [pytorch][PR] [JIT] Introduce a fake Tensor creation node for IR unit tests
Test Plan: revert-hammer

Differential Revision:
D20127441

Original commit changeset: 56da4f23ac46

fbshipit-source-id: 7d4602e5011bec6f6871eab16af05a3198694e5d
2020-02-27 13:48:31 -08:00
c4d611a0f5 Split BinaryMiscOpsKernels into more files for faster build times. (#33873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33873

Test Plan: Imported from OSS

Differential Revision: D20140974

Pulled By: gchanan

fbshipit-source-id: 88b982881e8034f3b03cdb6911ae4239d2bb1596
2020-02-27 13:47:06 -08:00
910acafc79 Revert D20124224: [jit] stop printing crap in test_jit
Test Plan: revert-hammer

Differential Revision:
D20124224

Original commit changeset: 9241d21fdf94

fbshipit-source-id: 0680f9db922f9a33a4e859eedd142b87a51bbede
2020-02-27 13:40:34 -08:00
53630f7681 Updating submodules
Summary:
GitHub commits:

ae68f84fcd
6cb0beaf0e
401fb54029
fe8777e593
44fcf005eb
72ee067b90
01a3c124d4
c94f8f43b9
a09b292a28
472e40a902
967d4bc051

Test Plan: n/a

Reviewed By: wittgenst

fbshipit-source-id: e8e43b1cbc365fd7f5b068d625c4020240358690
2020-02-27 13:35:14 -08:00
243af17d65 Revert D20103905: [jit] Fix flipped PackedSequence outputs in script
Test Plan: revert-hammer

Differential Revision:
D20103905

Original commit changeset: 84081213ed21

fbshipit-source-id: 2b260654fac87e52fbaf8035018e4ea484928af1
2020-02-27 13:29:35 -08:00
a7cf5c859f Revert D20136865: fix lint
Test Plan: revert-hammer

Differential Revision:
D20136865

Original commit changeset: 4bf7ac324a0a

fbshipit-source-id: 94cc83cda180f744cec174d269f1b82babff0e5c
2020-02-27 13:21:44 -08:00
908eee5583 remove .data from test/distributed/ (#33874)
Summary:
`.data` calls are unsafe and should not be used.
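
A minimal sketch of the usual replacement pattern (illustrative, not taken from the PR's diff):

```python
import torch

p = torch.nn.Parameter(torch.zeros(3))

# Unsafe: p.data.fill_(1.0) bypasses autograd's version tracking.

# Preferred: mutate under no_grad so autograd sees a proper in-place op.
with torch.no_grad():
    p.fill_(1.0)
```
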
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33874

Differential Revision: D20141059

Pulled By: izdeby

fbshipit-source-id: 8e11afc74f0cb04f5b18b458068fb813a6d51708
2020-02-27 13:14:29 -08:00
390d4d6df3 [JIT] Introduce a fake Tensor creation node for IR unit tests (#33595)
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
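
A sketch of how a hand-written IR test might use the primitive (an assumption for illustration, using `torch._C.parse_ir`, which Python-side JIT tests use to parse hand-written graphs):

```python
import torch

graph_str = """
graph():
  %t : Tensor = prim::MakeTestTensor()
  return (%t)
"""
# No tensor-creation defaults (dtype, layout, device) are involved.
graph = torch._C.parse_ir(graph_str)
print(graph)
```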

**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.

```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN      ] JitTest.ADFormulas
[       OK ] JitTest.ADFormulas (82 ms)
[ RUN      ] JitTest.Attributes
[       OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN      ] JitTest.LiteInterpreterPrim
[       OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN      ] JitTest.LiteInterpreterLoadOrigJit
[       OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)

[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[  PASSED  ] 75 tests.
```

**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33595

Differential Revision: D20127441

Pulled By: SplitInfinity

fbshipit-source-id: 56da4f23ac46335227254f606c6481718108f378
2020-02-27 13:10:20 -08:00
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
afbd04449e [quant][graphmode] Swap dequantize after inline for ops that doesn't require observation (#33173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33173

How do we deal with ops that are defined for both floating point and quantized Tensors?

Category of ops: the ones that don't require observers, which means the quantization parameters (scale/zero_point) of the output of the op can be inferred from the quantization parameters of its inputs.
For example:
avg_pool, max_pool, flatten, transpose, upsample

A related question is how we deal with ops like adaptive_avg_pool2d that do not need to be observed and work with quantized tensors as well. If we insert quant/dequant for them, even the quant fusion becomes a numerically changing operation, because the scale/zero_point for input and output are different.

Proposal

We can swap the operator with dequantize whenever we see it. For example, take the following pattern, where `aten::general_op` is defined for both floating point and quantized tensors:

%r = aten::conv(...)
%q = quantize(%r)
%dq = dequantize(%q)
%f = aten::general_op(%dq)
...

When we detect that all inputs of `aten::general_op` are produced by dequantize, we first delete all the dequantize nodes for the inputs and then insert a dequantize for each use of the output of `aten::general_op`. Note that this should work generally for all the cases we might encounter.

After transformation we’ll have:

%r = aten::conv(...)
%q = quantize(%r)
%x = aten::general_op(%q)
%f = dequantize(%x)
...

1. Multiple inputs
    1. We need to make sure all inputs of the aten::general_op are produced by dequantize before we do this transformation
2. Input used by multiple operators
    1. We already did this by inserting dequantize for each use of the value
3. Output used by multiple operators
    1. We’ll reuse the code that inserts dequantize(might need some refactor)

Note that concat does not currently belong to this category, since it does not inherit quantization parameters from its inputs.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20123590

fbshipit-source-id: de2febe1f37e4079457a23acaeccbc6d9c9e1f8a
2020-02-27 12:42:29 -08:00
6647a44e8c Automatic update of fbcode/onnx to 9fdae4c68960a2d44cd1cc871c74a6a9d469fa1f (#33858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33858

Previous import was 04a29addfd5b912812addb8dea5f8763fbfaad01

Included changes:
- **[9fdae4c6](https://github.com/onnx/onnx/commit/9fdae4c6)**: Copy sizes in some optimizers to remain shape information (#2574) <daquexian>
- **[c978d102](https://github.com/onnx/onnx/commit/c978d102)**: Implement CELU node as a Function (#2575) <Jeremy Cochoy>
- **[c677aef4](https://github.com/onnx/onnx/commit/c677aef4)**: Fix CI build break (#2603) <Changming Sun>
- **[d343755d](https://github.com/onnx/onnx/commit/d343755d)**: Allow function body to rely on other operator sets (#2597) <Ke Zhang>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D20135343

fbshipit-source-id: d719c4ba2ae26892a5fa921691c84eba64b59291
2020-02-27 12:40:39 -08:00
bd77abffe3 Kill some unused (TH)Storage-based APIs. (#33815)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33815

Test Plan: Imported from OSS

Differential Revision: D20119333

Pulled By: gchanan

fbshipit-source-id: 15042ca0fabdc88b53d662b6dd964968f64997f4
2020-02-27 12:23:25 -08:00
b10761d890 fix type stub errors (#33762)
Summary:
I've been using PyTorch with type hints, and I found errors that can be easily fixed. So I'm creating this PR to fix these type bugs.

I expected the code below to type-check without any errors.

```python
import torch
from torch.nn import Linear
from torch.autograd import Variable
from torch.optim import AdamW
from torch.utils import hooks

# nn.Module should have training attribute
module = Linear(10, 20)
module.training

# torch should have dtype bfloat16
tensor2 = torch.tensor([1,2,3], dtype=torch.bfloat16)

# torch.Tensor.cuda should accept int or str value
torch.randn(5).cuda(1)
torch.tensor(5).cuda('cuda:0')

# optimizer should have default attribute
module = Linear(10, 20)
print(AdamW(module.weight).default)

# torch.Tensor should have these boolean attributes
torch.tensor([1]).is_sparse
torch.tensor([1]).is_quantized
torch.tensor([1]).is_mkldnn

# Size class should tuple of int
a, b = torch.tensor([[1,2,3]]).size()

# check modules can be accessed
torch.nn.parallel
torch.autograd.profiler
torch.multiprocessing
torch.sparse
torch.onnx
torch.jit
torch.hub
torch.random
torch.distributions
torch.quantization
torch.__config__
torch.__future__

torch.ops
torch.classes

# Variable class's constructor should return Tensor
def fn_to_test_variable(t: torch.Tensor):
    return None

v = Variable(torch.tensor(1))
fn_to_test_variable(v)

# check RemovableHandle attributes can be accessed
handle = hooks.RemovableHandle({})
handle.id
handle.next_id

# check torch function hints
torch.is_grad_enabled()
```

But the current master branch raises errors (I checked with pyright).

```
$ pyright test.py
Searching for source files
Found 1 source file
test.py
  12:45 - error: 'bfloat16' is not a known member of module
  15:21 - error: Argument of type 'Literal[1]' cannot be assigned to parameter 'device' of type 'Optional[device]'
  'int' is incompatible with 'device'
  Cannot assign to 'None'
  16:22 - error: Argument of type 'Literal['cuda:0']' cannot be assigned to parameter 'device' of type 'Optional[device]'
  'str' is incompatible with 'device'
  Cannot assign to 'None'
  23:19 - error: Cannot access member 'is_sparse' for type 'Tensor'
  Member 'is_sparse' is unknown
  24:19 - error: Cannot access member 'is_quantized' for type 'Tensor'
  Member 'is_quantized' is unknown
  25:19 - error: Cannot access member 'is_mkldnn' for type 'Tensor'
  Member 'is_mkldnn' is unknown
  32:7 - error: 'autograd' is not a known member of module
  33:7 - error: 'multiprocessing' is not a known member of module
  34:7 - error: 'sparse' is not a known member of module
  35:7 - error: 'onnx' is not a known member of module
  36:7 - error: 'jit' is not a known member of module
  37:7 - error: 'hub' is not a known member of module
  38:7 - error: 'random' is not a known member of module
  39:7 - error: 'distributions' is not a known member of module
  40:7 - error: 'quantization' is not a known member of module
  41:7 - error: '__config__' is not a known member of module
  42:7 - error: '__future__' is not a known member of module
  44:7 - error: 'ops' is not a known member of module
  45:7 - error: 'classes' is not a known member of module
  60:7 - error: 'is_grad_enabled' is not a known member of module
20 errors, 0 warnings
Completed in 1.436sec
```

The list below is not flagged as errors, but I think these are errors too.

* `nn.Module.training` is not boolean
* return type of `torch.Tensor.size()` is `Tuple[Unknown]`.

 ---

Related issues:

https://github.com/pytorch/pytorch/issues/23731, https://github.com/pytorch/pytorch/issues/32824, https://github.com/pytorch/pytorch/issues/31753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33762

Differential Revision: D20118884

Pulled By: albanD

fbshipit-source-id: 41557d66674a11b8e7503a48476d4cdd0f278eab
2020-02-27 06:58:53 -08:00
095de1e872 Migrate random_ from the TH to Aten (CPU and CUDA) (#33663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33663

Test Plan: Imported from OSS

Differential Revision: D20056350

Pulled By: pbelevich

fbshipit-source-id: f9859b79ffdec70c48d6ee3ec70fd6fad593a9f5
2020-02-27 05:05:42 -08:00
f5952cf7cb fix lint (#33861)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33861

Test Plan: Imported from OSS

Differential Revision: D20136865

Pulled By: suo

fbshipit-source-id: 4bf7ac324a0abce9b45121ac5ab438448a6f3149
2020-02-27 00:33:51 -08:00
9733711394 [JIT] Support calling Tensor.element_size() in TorchScript (#33808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33808

# Problem

https://github.com/pytorch/pytorch/issues/33620
ghstack-source-id: 99073701
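
A minimal usage sketch of the newly supported call (illustrative):

```python
import torch

@torch.jit.script
def bytes_per_element(x: torch.Tensor) -> int:
    # Tensor.element_size() is now callable from TorchScript.
    return x.element_size()

print(bytes_per_element(torch.zeros(2, dtype=torch.float32)))  # 4
```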

Test Plan:
```
buck test mode/dev-nosan //caffe2/test:jit -- test_numel

buck test mode/dev-nosan //caffe2/test:jit -- test_element_size

buck build mode/dev-nosan //caffe2/test:jit \
&& buck-out/gen/caffe2/test/jit\#binary.par -r test_numel

buck build mode/dev-nosan //caffe2/test:jit \
&& buck-out/gen/caffe2/test/jit\#binary.par -r test_element_size
```

Compile error

P126667043

Generated code,
```
buck-out/dev/gen/caffe2/generate-code=register_aten_ops_0.cpp/register_aten_ops_0.cpp

buck-out/dev/gen/caffe2/generate-code=register_aten_ops_2.cpp/register_aten_ops_2.cpp
```
P126667064

Differential Revision: D7050644

fbshipit-source-id: 20dbdb9c500b6d7683c23e3049d43ed0ca06d831
2020-02-26 22:30:44 -08:00
00f685d2d8 Add Scalar::type() (#33603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33603

This function returns a ScalarType based on the Scalar's value. This is helpful
to avoid having the code generated in aten_op.h depend on the `self` argument
to determine the type of returned Scalars.

Test Plan: Imported from OSS

Differential Revision: D20100218

Pulled By: ezyang

fbshipit-source-id: 337729a7559e6abb3a16b2a563a2b92aa96c7016
2020-02-26 22:25:18 -08:00
d41c8d0461 Correctly preserve "not set anywhere" TensorOptions when merging. (#33510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33510

Previously, we would fill in TensorOption with defaults whenever an
item was missing from both the left and right side of the merge.  This
is morally incorrect: if we don't have an item on the left or right,
we should keep the entry empty (so the downstream user can apply
the appropriate defaulting rule).

I don't think this caused any bugs, but I noticed this error when
working on a later patch in my diff stack.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20001775

Pulled By: ezyang

fbshipit-source-id: 88139fc268b488cd1834043584a0d73f46c8ecaa
2020-02-26 21:46:39 -08:00
ca002a0f6b Switch empty_like to use merge_in to process TensorOptions. (#33505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33505

This shouldn't change semantics, but it has the benefit of making
torch::empty_like(x, dtype(kFloat)) actually work (previously, this
would just ignore all of the properties from x).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20001776

Pulled By: ezyang

fbshipit-source-id: ba81186d3293abc65da6130b2684d42e9e675208
2020-02-26 21:44:33 -08:00
84101f353e Avoid problematic pickle usages on Python 3.8.0 and 3.8.1 (#33824)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32289

This has been fixed upstream as of Python 3.8.2. I think the easiest and least invasive way to ameliorate this is to catch the error condition and print a more informative error asking the user to update their Python version. It might be possible to buffer the data into memory and then read from memory, but that would be an invasive change and might cause memory exhaustion for very large models.

Suggestions for alternate fixes or ways to improve the error message wording are very welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33824

Differential Revision: D20131722

Pulled By: ezyang

fbshipit-source-id: a6e3fbf4bf7f9dcce5772b36f7a622cbf14b5ae4
2020-02-26 21:15:38 -08:00
421e3e9a54 Release GIL for RPC pybind functions. (#33610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33610

Our pybind definitions for several RPC functions didn't release the GIL
once we started processing in C++.

This PR adds asserts that we release GIL appropriately and adds
py::gil_scoped_release and py::gil_scoped_acquire in the appropriate places.
ghstack-source-id: 99066749

Test Plan: waitforbuildbot

Differential Revision: D20025847

fbshipit-source-id: 57a778cba0336cf87352b07c89bbfb9254c4bdd7
2020-02-26 20:56:06 -08:00
cea0cc8ca8 [jit] Unify augmented assign handling (#33578)
Summary:
Stacked PRs
 * **#33578 - [jit] Unify augmented assign handling**
 * #32993 - [jit] Fix aug assign for non-tensor attributes

We handle augmented assignments to `Select` and `Var` statements differently, but the actual in-place update is the same for both, so this PR factors it out into a method to avoid having two code paths doing the same thing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/33578

Pulled By: driazati

Differential Revision: D20127647

fbshipit-source-id: 94f37acbd2551498de9d2ca09a514508266f7d31
2020-02-26 19:13:15 -08:00
24dd800e6a [Dist Autograd] Functional API for Dist Autograd and Dist Optimizer (#33711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33711

Fixed #33480

This makes `dist_autograd.backward` and `dist_optimizer.step` functional by making the user explicitly pass in the `context_id` as opposed to relying on the confusing thread_local context_id.

This diff incorporates these API changes and all places where these functions are called.

More concretely, this code:

```
with dist_autograd.context():
    # Forward pass.
    dist_autograd.backward([loss.sum()])
    dist_optim.step()
```

should now be written as follows:

```
with dist_autograd.context() as context_id:
    # Forward pass.
    dist_autograd.backward(context_id, [loss.sum()])
    dist_optim.step(context_id)
```

Test Plan: Ensuring all existing dist_autograd and dist_optimizer tests pass with the new API. Also added a new test case for input checking.

Differential Revision: D20011710

fbshipit-source-id: 216e12207934a2a79c7223332b97c558d89d4d65
2020-02-26 19:08:28 -08:00
4c33222c51 [quant][graphmode] Replicate dequantize nodes (#33531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33531

We already insert dequantize for each use of the value, but there might still be cases where we only
see that the value is used multiple times after inlining. This pass adds support for replicating dequantize
after inlining to ensure the output of dequantize is only used by one node, which is necessary to preserve
quantization patterns like `dequant - conv - quant`.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20123591

fbshipit-source-id: 6edb10a4566538bcf9379d332233f870372b7a63
2020-02-26 18:59:16 -08:00
2b9fa4a756 [jit] Fix flipped PackedSequence outputs in script (#32955)
Summary:
Stacked PRs
 * **#32955 - [jit] Fix flipped PackedSequence outputs in script**
 * #32953 - [jit] Support properties on `Device`

Fixes #32605
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32955

Pulled By: driazati

Differential Revision: D20103905

fbshipit-source-id: 84081213ed214846e563b9f05bcab0210bb1a71b
2020-02-26 18:53:27 -08:00
150e025be8 [jit] stop printing crap in test_jit (#33779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33779

This should eliminate random warnings and print spew from test_jit.

It also fixes a bug where we weren't properly comparing captured outputs
(!)

Test Plan: Imported from OSS

Differential Revision: D20124224

Pulled By: suo

fbshipit-source-id: 9241d21fdf9470531b0437427b28e325cdf08d3a
2020-02-26 18:46:03 -08:00
4dad00b64b [rpc] special case tensor type check when getting RRef (#33582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33582

Test Plan: Imported from OSS

Differential Revision: D20009837

Pulled By: wanchaol

fbshipit-source-id: 7e9ab87d4dddb822c7575891a2b620eff83bfa00
2020-02-26 18:44:40 -08:00
d494986171 [jit] make RRef type annotation available in Python (#33526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33526

Test Plan: Imported from OSS

Differential Revision: D19988848

Pulled By: wanchaol

fbshipit-source-id: aeebc946d08b38dac0b656617bf395e86bcea558
2020-02-26 18:44:35 -08:00
2448c97a53 [jit] infer RRef type as container type (#33369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33369

This PR adds an RRef type inference rule for when we try to infer a type from a
pyobject. This allows script module attributes to contain an RRef
(i.e. List[RRef] as a module attribute).

Test Plan: Imported from OSS

Differential Revision: D19918320

Pulled By: wanchaol

fbshipit-source-id: e5fd99c0ba5693b22ed48f0c0550b5e1dac89990
2020-02-26 18:43:13 -08:00
857eb4145e [JIT] add support for torch.cdist (#33737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33737

Test Plan: Imported from OSS

Differential Revision: D20121916

Pulled By: eellison

fbshipit-source-id: b0427bbfd3ade1f3129c4a95a542fbc32c3abd76
2020-02-26 18:37:37 -08:00
f31b1d3453 [JIT] add support for lu_unpack (#33736)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33736

Test Plan: Imported from OSS

Differential Revision: D20121914

Pulled By: eellison

fbshipit-source-id: 1136f4d7678a2233129aefe3e30234af385b8353
2020-02-26 18:37:33 -08:00
4543cf4eb1 [JIT] add support for torch.lu to torchscript (#33724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33724

Fix for https://github.com/pytorch/pytorch/issues/33381, partial fix of https://github.com/pytorch/pytorch/issues/30786

Test Plan: Imported from OSS

Differential Revision: D20077321

Pulled By: eellison

fbshipit-source-id: a1e6a0370712b36c9f66979098ac2f9d500ca5f6
2020-02-26 18:37:28 -08:00
fddf73250d [JIT] fix resolving of functions in torch/functional. fix compilation of torch.stft (#33504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33504

Fix resolution of functions that are bound onto torch in torch/functional.py. This does not fix compilation of all of those functions; those will be done in follow-ups. torch.stft is done as a start.

Fixes #21478

Test Plan: Imported from OSS

Differential Revision: D20014591

Pulled By: eellison

fbshipit-source-id: bb362f1b5479adbb890e72a54111ef716679d127
2020-02-26 18:35:43 -08:00
057fd5e10d add support for _modules, reducing special casing of nn.Sequential (#29495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29495

This PR adds support for `_modules`, making it so we no longer need to special-case support for `nn.Sequential`. I was getting internal errors around the previous approach using `self.define()`, so I am adding this PR as part of the stack.

Fix for https://github.com/pytorch/pytorch/issues/28998

Test Plan: Imported from OSS

Differential Revision: D18412561

Pulled By: eellison

fbshipit-source-id: a8b24ebee39638fccf63b2701f65f8bb0de84faa
2020-02-26 18:07:19 -08:00
6eef66e1f4 .circleci: Divert packages to test channel on tag (#33842)
Summary:
This sets up PIP_UPLOAD_FOLDER to point to the correct channel for
release candidates as opposed to nightlies.

It also removes an old safety check that's no longer needed for devtoolset3,
and provides a nice default for PIP_UPLOAD_FOLDER, which should clear up
confusion about where it's initially set.

This is a stepping stone towards the promotable pipeline.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33842

Differential Revision: D20130791

Pulled By: seemethere

fbshipit-source-id: dac94ef46299574c36c08c968dd36faddeae6363
2020-02-26 17:25:18 -08:00
cd0acf4374 port masked_fill from TH to ATen (#33330)
Summary:
Port `masked_fill` from TH to ATen with TensorIterator.

Single-core performance roughly stays the same; single-socket performance gets a **3~16x** boost.

`masked_fill` is missing from https://github.com/pytorch/pytorch/issues/24507
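
A minimal usage sketch of the ported op (illustrative):

```python
import torch

x = torch.zeros(2, 3)
mask = torch.tensor([[True, False, True],
                     [False, True, False]])
x.masked_fill_(mask, 1.0)  # in-place; fills positions where mask is True
print(x)
```
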
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33330

Differential Revision: D20098812

Pulled By: VitalyFedyunin

fbshipit-source-id: ff20712ffc00cc665550997abcfdfb8916c18e40
2020-02-26 17:20:07 -08:00
a0e90e1b45 ONNX Error Message on Missing Op (#33593)
Summary:
Print a complete and comprehensive error message with a description of the issue when an op is missing during ONNX export. Previously an ambiguous "key not in registry" error was thrown, which did not help the user understand the failure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33593

Reviewed By: hl475

Differential Revision: D20052213

Pulled By: houseroad

fbshipit-source-id: ae3010a97efdab26effad5e4a418e9cc41f5b04e
2020-02-26 15:18:16 -08:00
02908dfa67 remove setStorage with null StorageImpl support. (#33735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33735

This apparently used to create a new storage, but I couldn't find anywhere in the code where this actually happens.

Changing it to an assert to see what happens.

Test Plan: Imported from OSS

Differential Revision: D20084029

Pulled By: gchanan

fbshipit-source-id: e9c4db115a25fc2e17a3b166c1ff5a0e6b56d690
2020-02-26 15:12:41 -08:00
04f88a3a7b Add partition info message to NetDef (#33616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33616

Att. We start by assigning the `node_name` of the DeviceOption in each op in the net. Then for each unique node_name we have a PartitionInfo describing the partition, including the logical devices it can be assigned to, and we establish the link by partition names.

Test Plan:
unittests

Canaries:
AF: https://our.intern.facebook.com/intern/ads/canary/424817103900710410
AI: https://our.intern.facebook.com/intern/ads/canary/424737510862189908

Reviewed By: ipiszy, bangshengtang, jfix71

Differential Revision: D20015493

fbshipit-source-id: 0bb0f30cfc3892f7b8709d87b8bc1fbab2f2c46d
2020-02-26 14:54:58 -08:00
51e405743f Revert D20010383: [jit] Unify augmented assign handling
Test Plan: revert-hammer

Differential Revision:
D20010383

Original commit changeset: 52e559ce907e

fbshipit-source-id: 7ca938070d5e98c91e7a7b8485a3c1e790c3ceb2
2020-02-26 14:22:14 -08:00
867990dc17 [jit] Unify augmented assign handling (#33578)
Summary:
Stacked PRs
 * **#33578 - [jit] Unify augmented assign handling**
 * #32993 - [jit] Fix aug assign for non-tensor attributes

We handle augmented assignments to `Select` and `Var` statements differently, but the actual in-place update is the same for both, so this PR factors it out into a method to avoid having two code paths doing the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33578

Pulled By: driazati

Differential Revision: D20010383

fbshipit-source-id: 52e559ce907e95e5c169ab9d9690d0d235db36f3
2020-02-26 14:09:40 -08:00
c32fa465a5 Preserve Backward compatibility of models serialized before #31040 (#33796)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33796

Test Plan: Imported from OSS

Differential Revision: D20109662

Pulled By: jerryzh168

fbshipit-source-id: 9bc936a59fd6dd1031fbf05eb90f98ae9677b936
2020-02-26 13:40:38 -08:00
5c33d98b0d Add assert_tensor_equal and assert_tensor_not_equal to test/cpp/api/support.h (#30426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30426

This PR adds `assert_tensor_equal` and `assert_tensor_not_equal` to `test/cpp/api/support.h`, as better functions for testing whether two tensors are equal / not equal.

Test Plan: Imported from OSS

Differential Revision: D18695900

Pulled By: yf225

fbshipit-source-id: c19b9bc4c4e84d9f444015023649d27618fcbdf5
2020-02-26 13:25:25 -08:00
8aa09de19e build: set -DNDEBUG in Release (#32719)
Summary:
This might lead to silent undefined behaviour (e.g. with out-of-bounds indices), since asserts are compiled out. This affects `test_multinomial_invalid_probs_cuda`, which is now removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32719

Test Plan:
* Build with VERBOSE=1 and manually inspect `less ndebug.build.log | grep 'c++' | grep -v -- -DNDEBUG` (only with nina on Linux)
* CI

Fixes https://github.com/pytorch/pytorch/issues/22745

Differential Revision: D20104340

Pulled By: yf225

fbshipit-source-id: 2ebfd7ddae632258a36316999eeb5c968fb7642c
2020-02-26 12:53:31 -08:00
93e30c16cb .circleci: Switch to using robot token for conda uploads (#33786)
Summary:
Thanks to pjh5 for continued use of his account to upload binaries but I
think we can start using a bot account now for this.

Just a draft until we can ensure the env variables get injected correctly and the token can actually upload

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33786

Differential Revision: D20122423

Pulled By: seemethere

fbshipit-source-id: 0444584831a40ae730325d258935f6d1b873961b
2020-02-26 11:37:40 -08:00
45e4b614d1 Per channel quantization performance improvement (#33772)
Summary:
Benchmark:
NVIDIA GTX 1650 + AMD Ryzen Threadripper 3970X
```python
import torch
print(torch.__version__)

for i in range(1000):
    torch.randn(1024 * 128, device='cuda')

def cuda(e):
    a = torch.randn(2 ** e, 32, device='cuda')
    s = torch.randn(32, device='cuda')
    z = torch.randn(32, device='cuda')
    torch.cuda.synchronize()
    %timeit torch.fake_quantize_per_channel_affine(a, s, z, 1, -999, 999); torch.cuda.synchronize()

def cpu(e):
    a = torch.randn(2 ** e, 32, device='cpu')
    s = torch.randn(32, device='cpu')
    z = torch.randn(32, device='cpu')
    %timeit torch.fake_quantize_per_channel_affine(a, s, z, 1, -999, 999);

for i in range(10, 24):
    cuda(i)
print()
for i in range(10, 32):
    cpu(i)
```
Before
```
1.5.0a0+9bc922d
849 µs ± 44.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
817 µs ± 30.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
814 µs ± 2.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.11 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.19 ms ± 4.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.6 ms ± 5.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.44 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.14 ms ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.41 ms ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.9 ms ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
26.9 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.6 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
104 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
207 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

249 µs ± 158 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
420 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
766 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.45 ms ± 574 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.84 ms ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.69 ms ± 83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.29 ms ± 2.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.32 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.4 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
47.5 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
187 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
379 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
652 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.22 s ± 4.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.34 s ± 8.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.56 s ± 7.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.97 s ± 33.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.8 s ± 32.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
35.2 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
After
```
1.5.0a0+a7ec8cc
92.5 µs ± 2.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
97.7 µs ± 469 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
109 µs ± 4.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
119 µs ± 6.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
146 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
211 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
347 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
624 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.17 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.25 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.43 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.51 ms ± 44.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.9 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
33.7 ms ± 7.64 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

201 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
285 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 761 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
347 µs ± 399 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
675 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.34 ms ± 643 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.82 ms ± 34.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.7 ms ± 88.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
20.3 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.4 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
78.8 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
153 ms ± 786 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
285 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
541 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.97 s ± 8.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.81 s ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Fixes https://github.com/pytorch/pytorch/issues/33647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33772

Differential Revision: D20112531

Pulled By: ngimel

fbshipit-source-id: f90e3ef1b5be8276851637f3e1251cb8f1af411f
2020-02-26 10:19:25 -08:00
f597ac6efc Fix grid_sample gradients at image borders (#32829)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23925

This fixes the incorrect gradients returned by `F.grid_sample` at image borders under `"border"` and `"reflection"` padding modes.

At nondifferentiable points, the choice of which gradient to return among its super- or subgradients is rather arbitrary and generally does not affect training. Before this change, however, a bug in the code meant that the gradient returned at the exact borders was not selected from among the super- or subgradients.

The gradient is now set to zero at the borders, which is a defensible choice for both the `"border"` and `"reflection"` padding modes:
* For `"border"` padding, this effectively means that the exact borders of the image are now considered out of bounds, and therefore receive zero gradient.
* For `"reflection"` padding, this effectively treats the exact borders as extrema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32829

Differential Revision: D20118564

Pulled By: soumith

fbshipit-source-id: ef8571ff585be35ab1b90a922af299f53ab9c095
2020-02-26 10:10:42 -08:00
b8f0acf50f Fix examples with updated pruning naming convention (#33144)
Summary:
Fix in docs requested by vainaijr.
Closes issue https://github.com/pytorch/pytorch/issues/32991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33144

Differential Revision: D20104640

Pulled By: albanD

fbshipit-source-id: 9b1be2c1cbde1964967967a9581bb6932a305d81
2020-02-26 10:02:50 -08:00
a8e7ed48f4 [pt][quant] Parallelize quantize and dequantize (#33765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33765

The quantize and dequantize methods now make use of multiple threads, taking advantage of shz0116's recent parallelization of the quantize/dequantize routines in FBGEMM.

Fixes:
https://github.com/pytorch/pytorch/issues/32006
https://github.com/pytorch/FBGEMM/issues/142

Alternative to https://github.com/pytorch/pytorch/pull/30153

```
#!/usr/bin/env python

import time
import torch
import torch.nn as nn
torch.set_num_threads(4)
# print(torch.__config__.parallel_info())

W = torch.rand(1, 54, 54, 256)

NITER = 1000
s = time.time()
for i in range(NITER):
    W_q = torch.quantize_per_tensor(W, scale=1.0, zero_point = 0, dtype=torch.quint8)
time_per_iter = (time.time() - s) / NITER

print('quantize time per iter ms', time_per_iter * 1000)

s = time.time()
for i in range(NITER):
    W_deq = W_q.dequantize()
time_per_iter = (time.time() - s) / NITER

print('dequantize time per iter ms', time_per_iter * 1000)
```

### With 1 thread
quantize time per iter ms 0.22633790969848633
dequantize time per iter ms 0.6573665142059326

### With 4 threads
quantize time per iter ms 0.0905618667602539
dequantize time per iter ms 0.19511842727661133
ghstack-source-id: 98935895

Test Plan: python test/test_quantized.py

Reviewed By: jspark1105

Differential Revision: D20098521

fbshipit-source-id: bd8c45761b4651fcd5b20b95759e3868a136c048
2020-02-26 10:00:40 -08:00
2eb95d8f4a Migrate fmod and fmod_ from TH to ATen (CPU) (#33592)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24701
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33592

Differential Revision: D20043875

Pulled By: ezyang

fbshipit-source-id: b8c0a4e73a3cef6e55e91bbd35f8aadca8114c56
2020-02-26 09:35:16 -08:00
f87b0b2515 Remove the use of macros in defining binary ops for base Vec256 (#33733)
Summary:
This greatly improves readability and maintainability (e.g., debugging)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33733

Differential Revision: D20103187

Pulled By: ezyang

fbshipit-source-id: e539e46f5d378a2b01da7ecaa6b850655e0fa866
2020-02-26 09:21:35 -08:00
c1dd70688a Fix deprecated python "add" calls (#33428)
Summary:
This PR fixes the Python `add` calls that use the deprecated signature `add(Scalar, Tensor)`. The alternative signature `add(Tensor, alpha=Scalar)` is used instead, as sketched below.
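
A minimal sketch of the migration (illustrative):

```python
import torch

x = torch.ones(3)
y = torch.ones(3)

# Deprecated: scalar-first signature, add(Scalar, Tensor)
# z = x.add(2, y)

# Preferred: add(Tensor, alpha=Scalar), computing x + 2 * y
z = x.add(y, alpha=2)
print(z)  # tensor([3., 3., 3.])
```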

cc csarofeen zasdfgbnm ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428

Differential Revision: D20002534

Pulled By: vincentqb

fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130
2020-02-26 09:02:31 -08:00
24659d28a1 Feature/vonmises upstream (#33418)
Summary:
Third try of https://github.com/pytorch/pytorch/issues/33177 😄
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33418

Differential Revision: D20069683

Pulled By: ezyang

fbshipit-source-id: f58e45e91b672bfde2e41a4480215ba4c613f9de
2020-02-26 08:19:12 -08:00
758ad516f3 [Lite interpreter] Pass shared_ptr properly (#33667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33667

Pass shared_ptr properly according to C++ guidelines. Thanks to kimishpatel for pointing it out.

Test Plan: Imported from OSS

Differential Revision: D20111001

Pulled By: iseeyuan

fbshipit-source-id: 213a0f950a7f3b9199d789dc0155911f6102d77a
2020-02-25 21:40:05 -08:00
fc6a153688 [WIP] Reanimate gradient scaling API with original scale update heuristic (#33366)
Summary:
Also, the Windows memory failures responsible for the earlier reversion have been fixed.

This PR (initially) contains 2 commits:
* a revert of the revert
* all changes to implement the original Apex scale update heuristic, squashed into a single commit for easier diff review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33366

Differential Revision: D20099026

Pulled By: ngimel

fbshipit-source-id: 339b9b6bd5134bf055057492cd1eedb7e4461529
2020-02-25 19:00:34 -08:00
a836c4ca78 Skip manual backward for cdist with case p=2 (#31167)
Summary:
Fixes an issue with the `cdist` backward calculation for large inputs in the Euclidean (p=2) case.

The grid size when launching the kernel exceeded the 2^16 limit for the second dimension, resulting in `RuntimeError: CUDA error: invalid configuration argument`

Code to reproduce:

```
h, w, d = 800, 1216, 12
n = 133
A = torch.randn(n, d).cuda()
B = torch.randn(h, w, d).cuda()
A.requires_grad = True
B.requires_grad = True

B = B.reshape(-1, d).contiguous()
dist = torch.cdist(A, B)
loss = dist.sum()
loss.backward()
```

Thanks to tkerola for the bug report, the reproduction, and for suggesting a solution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31167

Differential Revision: D20035605

Pulled By: ngimel

fbshipit-source-id: ae28ba4b549ee07a8bd937bb1de2438dc24eaa17
2020-02-25 18:19:30 -08:00
9a5ea71380 pad_packed_sequence: doc improvement (#33768)
Summary:
pad_packed_sequence:
1. clarify that batch's order is restored to the original one
2. add example

This is a follow up to https://github.com/pytorch/pytorch/issues/33746
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33768

Differential Revision: D20102792

Pulled By: ngimel

fbshipit-source-id: 5ef511e5e3833edcb85cc01af0e92568b6d7a3cf
2020-02-25 18:00:04 -08:00
5bac7febad removed padding and dilation from LPPool2d Doc (#33714)
Summary:
Removed padding and dilation from the LPPool2d doc, as the function does not support padding or dilation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33714

Differential Revision: D20097021

Pulled By: ngimel

fbshipit-source-id: fc1c2d918b32f4b45c7e6e6bd93f018e867a628f
2020-02-25 17:54:38 -08:00
038ee01393 Disable printing of the histogram when dump (#33749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33749

Disable printing of the histogram on dump, to make the log cleaner.

Test Plan: CI

Reviewed By: amylittleyang

Differential Revision: D20087735

fbshipit-source-id: 5421cd9d25c340d92f29ce63fed2a58aefef567d
2020-02-25 17:37:55 -08:00
8667379133 [quant][graphmode][refactor] Factor out insertDequantCall (#33172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33172

For code reuse

Test Plan:
.

Imported from OSS

Differential Revision: D20087842

fbshipit-source-id: 797868d31b96c4ff8640121ea4bee1396deb6b57
2020-02-25 17:22:35 -08:00
a13ee18982 [quant][graphmode] refactor nodeQuantizable (#33171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33171

For better code reuse

Test Plan:
.

Imported from OSS

Differential Revision: D20087845

fbshipit-source-id: f88cffb410bd54a1b3f937786104f46bcd1190d3
2020-02-25 15:20:22 -08:00
8159316714 Revert D19941103: [pytorch] blas gemm fix for k=0
Test Plan: revert-hammer

Differential Revision:
D19941103

Original commit changeset: e1c85d1e7574

fbshipit-source-id: da12747130c60b61452aa46e269c66546a1075f9
2020-02-25 13:30:38 -08:00
4d203c6fc8 Move cumprod and cumsum to Aten(CPU) (#33280)
Summary:
This PR moves cumprod and cumsum to ATen.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

#torch.set_num_threads(1)

#warm up
for n in [10, 300]:
    input = torch.randn(n, n, n, requires_grad=False, device=device)
    input = input * 0.01 + 1
    for dim in range(input.dim()):
        for i in range(100):
            #output = input.cumsum(dim)
            output = input.cumprod(dim)

for n in [10, 300]:
    input = torch.randn(n, n, n, requires_grad=False, device=device)
    input = input * 0.01 + 1
    for dim in range(input.dim()):
        fwd_t = 0
        for i in range(1000):
            t1 = _time()
            #output = input.cumsum(dim)
            output = input.cumprod(dim)
            t2 = _time()
            fwd_t = fwd_t + (t2 - t1)
        fwd_avg = fwd_t / 1000 * 1000
        print("size = (%d, %d, %d); reduce dim=%d; compute time is %.4f(ms)" % (n, n, n, dim, fwd_avg))
```
Test device: **skx-8180**.
Performance:
```
cumsum:
Before:
size = (10, 10, 10); reduce dim=0; compute time is 0.0098(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0089(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0089(ms)
size = (300, 300, 300); reduce dim=0; compute time is 208.9403(ms)
size = (300, 300, 300); reduce dim=1; compute time is 241.5989(ms)
size = (300, 300, 300); reduce dim=2; compute time is 66.2587(ms)
After:
size = (10, 10, 10); reduce dim=0; compute time is 0.0065(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0063(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0053(ms)
size = (300, 300, 300); reduce dim=0; compute time is 36.0139(ms)
size = (300, 300, 300); reduce dim=1; compute time is 36.0776(ms)
size = (300, 300, 300); reduce dim=2; compute time is 21.0111(ms)
number_threads = 1:
size = (10, 10, 10); reduce dim=0; compute time is 0.0053(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0052(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0051(ms)
size = (300, 300, 300); reduce dim=0; compute time is 81.8831(ms)
size = (300, 300, 300); reduce dim=1; compute time is 88.5687(ms)
size = (300, 300, 300); reduce dim=2; compute time is 54.9922(ms)

cumprod:
Before:
size = (10, 10, 10); reduce dim=0; compute time is 0.0096(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0088(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0088(ms)
size = (300, 300, 300); reduce dim=0; compute time is 221.2601(ms)
size = (300, 300, 300); reduce dim=1; compute time is 249.7894(ms)
size = (300, 300, 300); reduce dim=2; compute time is 71.5182(ms)
number_threads = 1:
size = (10, 10, 10); reduce dim=0; compute time is 0.0100(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0093(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0093(ms)
size = (300, 300, 300); reduce dim=0; compute time is 207.6287(ms)
size = (300, 300, 300); reduce dim=1; compute time is 241.6693(ms)
size = (300, 300, 300); reduce dim=2; compute time is 66.2977(ms)
After:
size = (10, 10, 10); reduce dim=0; compute time is 0.0063(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0062(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0053(ms)
size = (300, 300, 300); reduce dim=0; compute time is 36.4283(ms)
size = (300, 300, 300); reduce dim=1; compute time is 38.1139(ms)
size = (300, 300, 300); reduce dim=2; compute time is 20.9140(ms)
number_threads =1:
size = (10, 10, 10); reduce dim=0; compute time is 0.0052(ms)
size = (10, 10, 10); reduce dim=1; compute time is 0.0052(ms)
size = (10, 10, 10); reduce dim=2; compute time is 0.0050(ms)
size = (300, 300, 300); reduce dim=0; compute time is 82.6926(ms)
size = (300, 300, 300); reduce dim=1; compute time is 90.1265(ms)
size = (300, 300, 300); reduce dim=2; compute time is 55.0196(ms)
```
Fixes https://github.com/pytorch/pytorch/issues/24668 and https://github.com/pytorch/pytorch/issues/24669.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33280

Differential Revision: D20076997

Pulled By: VitalyFedyunin

fbshipit-source-id: 12225767da8cfdc5e44257462a432bffa04cd469
2020-02-25 13:03:16 -08:00
0dded4026e [C++ API] Add PackedSequence / pack_padded_sequence / pad_packed_sequence / pack_sequence (#33652)
Summary:
Most of the function implementation and test code are translated from the Python version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33652

Differential Revision: D20052211

Pulled By: yf225

fbshipit-source-id: ce6767db54364f91ef4f06674239a12278c2752a
2020-02-25 12:53:41 -08:00
c20628c5f6 Remove clean_tag from tensorboard (#33133)
Summary:
The function originally comes from 4279f99847/tensorflow/python/ops/summary_op_util.py (L45-L68)

As its comment says:
```
    # In the past, the first argument to summary ops was a tag, which allowed
    # arbitrary characters. Now we are changing the first argument to be the node
    # name. This has a number of advantages (users of summary ops now can
    # take advantage of the tf name scope system) but risks breaking existing
    # usage, because a much smaller set of characters are allowed in node names.
    # This function replaces all illegal characters with _s, and logs a warning.
    # It also strips leading slashes from the name.
```

This function exists only for compatibility with TF's operator name restrictions, and is therefore no longer needed in PyTorch. By removing it, TensorBoard summaries can use a wider set of characters in their names.

Before:
![0209-12:10:14](https://user-images.githubusercontent.com/1381301/74109072-37382e00-4b35-11ea-8c9f-ab37a8bd5808.png)

After:
![0209-12:10:57](https://user-images.githubusercontent.com/1381301/74109081-4323f000-4b35-11ea-9dab-447f8466a41e.png)
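
A rough sketch of the user-visible effect (the tag below is a hypothetical example; previously, characters outside TF's allowed set would have been replaced with underscores):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='runs/demo')
# with clean_tag removed, the tag is logged as-is instead of being
# rewritten to something like 'loss_train_step'
writer.add_scalar('loss/train:step', 0.5, global_step=1)
writer.close()
```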
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33133

Differential Revision: D20089307

Pulled By: ezyang

fbshipit-source-id: 3552646dce1d5fa0bde7470f32d5376e67ec31c6
2020-02-25 12:41:58 -08:00
72288e82e2 Use shim executable sccache-cl as the compiler instead of sccache cl (#33745)
Summary:
CMake treats only the first item of `CC` and `CXX` as the executable, so invoking `sccache.exe` with `cl` as an argument won't work. Using a shim executable resolves this problem.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33745

Differential Revision: D20100397

Pulled By: soumith

fbshipit-source-id: 3a130d30dd548b7c2e726c064e66ae4fccb30c44
2020-02-25 12:24:05 -08:00
0e74cbcc54 Revert "Revert "Revert D19975411: Remove special case codegen for tril_indices/triu_indices." (#33572)" (#33742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33742

This reverts commit 90f4c5695e1785883d9ae7c86ad3fabd1963a4cb.

Test Plan: Imported from OSS

Differential Revision: D20095103

Pulled By: ezyang

fbshipit-source-id: ff47dae21c278570b4ca497d76deedb75823d6d7
2020-02-25 12:09:49 -08:00
9bc922d518 Extend cuda install timeout for Windows jobs (#33755)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33755

Differential Revision: D20100372

Pulled By: soumith

fbshipit-source-id: 8b39177d3e87d248857f0582de6c9e203d09d4a7
2020-02-25 11:51:43 -08:00
7eba36b1f6 [quant][graphmode][refactor] Separate preprocess step for insertObserver (#32813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32813

We need to separate this step to make the logic clearer, and also to find all the values we want to skip in advance, without interference from the inserted observers.

Test Plan:
.

Imported from OSS

Differential Revision: D20087841

fbshipit-source-id: ec3654ca561c0d4e2c05011988bb9ecc8671c5c2
2020-02-25 11:26:22 -08:00
d82093e665 [profiler] remove redundant assert in record_function_ops (#33225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33225

This removes a redundant assert statement in `record_function_ops`. In
the else branch in question, we are guaranteed to have `current == &rec`, so
this assert will never fire.

Although, maybe we should add an assert failure when `current == &rec` since it
seems that `current` should always be profiler::record_function_exit.
ghstack-source-id: 98852219

Test Plan: Existing autograd profiler UTs pass

Differential Revision: D19849145

fbshipit-source-id: 2014a0d3b9d11e5b64942a54e0fb45e21f46cfa2
2020-02-25 10:59:10 -08:00
2b404de347 [scripts] Add script to fetch clang-format binary from AWS S3 (#33644)
Summary:
**Summary**
This commit adds a script that fetches a platform-appropriate `clang-format` binary
from S3 for use during PyTorch development. The goal is for everyone to use the exact
same `clang-format` binary so that there are no formatting conflicts.

**Testing**
Ran the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33644

Differential Revision: D20076598

Pulled By: SplitInfinity

fbshipit-source-id: cd837076fd30e9c7a8280665c0d652a33b559047
2020-02-25 10:47:03 -08:00
98526c7444 Migrate fake_quant_slice to TensorIterator (#33744)
Summary:
This is a quick improvement for per-tensor quantization.

For per-channel, we should also remove the loop in https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/fake_quant_per_channel_affine.cpp

# Benchmark:
device = GTX-1650
```python
import torch
print(torch.__version__)

for i in range(1000):
    torch.randn(1024 * 128, device='cuda')

def f(e):
    a = torch.randn(2 ** e, device='cuda')
    torch.cuda.synchronize()
    %timeit torch.fake_quantize_per_tensor_affine(a, 0.5, 0, 0, 1); torch.cuda.synchronize()

for i in range(15, 27):
    f(i)
```
Before
```
1.5.0a0+bf00b4d
14.5 µs ± 981 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.2 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
25.6 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
38.6 µs ± 135 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
70.2 µs ± 5.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
125 µs ± 4.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
231 µs ± 1.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
461 µs ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
891 µs ± 88.2 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.77 ms ± 8.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.77 ms ± 80.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.16 ms ± 216 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
After
```
1.5.0a0+3f18ac3
12.5 µs ± 738 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
13.7 µs ± 195 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.9 µs ± 850 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29.7 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
50.4 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
95 µs ± 8.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
173 µs ± 7.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
348 µs ± 29.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
657 µs ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.33 ms ± 77.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.71 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.33 ms ± 439 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33744

Differential Revision: D20090129

Pulled By: ngimel

fbshipit-source-id: 5dd48a0c5455a2b6c5c638d747c1767cb259255d
2020-02-25 10:44:21 -08:00
8196ec0115 Remove some dead THStorage related code. (#33734)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33734

Test Plan: Imported from OSS

Differential Revision: D20084030

Pulled By: gchanan

fbshipit-source-id: 29aa5459e8ecc8af8af31157797f44057d6a786e
2020-02-25 09:44:05 -08:00
5ef1c2c5d2 Back out "[pt][quant] RNN debug test" (#33750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33750

Original commit changeset: 8c38d8f067e5
ghstack-source-id: 98911215

Test Plan: CI

Differential Revision: D20090521

fbshipit-source-id: 73df43ad60574e44e80b36ebf6392030c3efb66e
2020-02-25 09:28:00 -08:00
ee23944f46 [Caffe2] Fix shape inference for element-wise operators (#33431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33431

Some elementwise operators don't have shape and type inference specified for the output tensor: `BitwiseOr`, `BitwiseAnd`, `BitwiseXor`, `Not`, `Sign`.

This change fixes this issue:
- For `Not` and `Sign` operators, the output has the same type and shape as the input, so `IdenticalTypeAndShapeOfInput` function is used to specify that.
- For bitwise operators created by `CAFFE2_SCHEMA_FOR_BINARY_BITWISE_OP` macro, the type and shape inference rules should be the same as for other binary element-wise operators, so `TensorInferenceFunction(ElementwiseOpShapeInference)` is used to specify that.

Also some tests were modified to ensure that the shape and type are inferred (`ensure_outputs_are_inferred` parameter)

Test Plan:
```
CAFFE2_ASSERT_SHAPEINFERENCE=1 buck test caffe2/caffe2/python/operator_test:elementwise_ops_test
CAFFE2_ASSERT_SHAPEINFERENCE=1 buck test caffe2/caffe2/python/operator_test:math_ops_test
```

Note that the tests have to be executed with `CAFFE2_ASSERT_SHAPEINFERENCE=1` in order to fail upon shape inference failure.

Reviewed By: idning

Differential Revision: D19880164

fbshipit-source-id: 5d7902e045d79e5669e5e98dfb13a39711294939
2020-02-25 09:03:06 -08:00
819ca2c285 add bfloat16 conversion method in type stub (__init__.pyi) (#33747)
Summary:
Resolve https://github.com/pytorch/pytorch/issues/33699

`torch/__init__.pyi` will be generated like

```python
# TODO: One downside of doing it this way, is direct use of
# torch.tensor.Tensor doesn't get type annotations.  Nobody
# should really do that, so maybe this is not so bad.
class Tensor:
    requires_grad: _bool = ...
    grad: Optional[Tensor] = ...

    # some methods here...

    @overload
    def bernoulli_(self, p: _float=0.5, *, generator: Generator=None) -> Tensor: ...
    def bfloat16(self) -> Tensor: ...
    def bincount(self, weights: Optional[Tensor]=None, minlength: _int=0) -> Tensor: ...

    # some methods here...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33747

Differential Revision: D20090316

Pulled By: ngimel

fbshipit-source-id: b9ce4c0d4ef720c94ccac0a0342a012e8cf3af0c
2020-02-25 08:49:47 -08:00
fd175fa8a2 fix bugs in gen_pyi.py (#33748)
Summary:
This loop should generate type hints for in-place binary operator methods (the `binop` variable) but had been using the `name` variable. That's why the wrong type hints were being generated.

Resolve https://github.com/pytorch/pytorch/issues/33698

 ---

Current `__init__.pyi` has these type hints.

```python
class Tensor:

    # some codes here...

    @overload
    def zeros_like_(self, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def zeros_like_(self, value: Number, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def zeros_like_(self, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def zeros_like_(self, value: Number, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def zeros_like__(self, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def zeros_like__(self, value: Number, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def zeros_like__(self, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def zeros_like__(self, value: Number, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def zeros_like___(self, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def zeros_like___(self, value: Number, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def zeros_like___(self, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def zeros_like___(self, value: Number, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def zeros_like____(self, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def zeros_like____(self, value: Number, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def zeros_like____(self, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def zeros_like____(self, value: Number, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...

    # some codes here...
```

But `__init__.pyi` should generate these type hints.

```python
class Tensor:

    # some codes here...

    @overload
    def add_(self, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def add_(self, value: Number, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def add_(self, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def add_(self, value: Number, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...

    # some codes here...

    @overload
    def div_(self, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def div_(self, value: Number, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def div_(self, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def div_(self, value: Number, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...

    # some codes here...

    @overload
    def mul_(self, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def mul_(self, value: Number, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def mul_(self, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def mul_(self, value: Number, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...

    # some codes here...

    @overload
    def sub_(self, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def sub_(self, value: Number, other: Union[Tensor, Number]) -> Tensor: ...
    @overload
    def sub_(self, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...
    @overload
    def sub_(self, value: Number, other: Union[Tensor, Number], *, out: Optional[Tensor]=None) -> Tensor: ...

    # some codes here...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33748

Differential Revision: D20090444

Pulled By: ngimel

fbshipit-source-id: e4a5dd08126629ec4c54b630a87ee540e669ec9a
2020-02-25 08:45:19 -08:00
6bdb59539f follow-up test_torch .data removal (#33696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33696

This changes two tests:
- The batchnorm inference test cannot change the memory format of the weights, as they are 1D, so this part is removed.
- The batchnorm test now runs in both affine and non-affine mode.
- I added back the test for type errors using .data. In particular, `.data` allows changing the type of a Tensor in place (very bad, never do it!), but since it is possible, we should test it until .data is removed.

cc Enealor who did the first version of the PR.

Test Plan: Imported from OSS

Differential Revision: D20069241

Pulled By: albanD

fbshipit-source-id: a0348f40c44df38d654fb2a2b2b526d9d42f598a
2020-02-25 07:36:42 -08:00
4ef854b4b4 Fix potential hang when exiting main process (#33721)
Summary:
The following script reproduces the hang
```py
import multiprocessing, logging
logger = multiprocessing.log_to_stderr()
logger.setLevel(multiprocessing.SUBDEBUG)

import torch

class Dataset:
    def __len__(self):
        return 23425

    def __getitem__(self, idx):
        return torch.randn(3, 128, 128), idx % 100

ds = Dataset()
trdl = torch.utils.data.DataLoader(ds, batch_size=64, num_workers=300, pin_memory=True, shuffle=True)

for e in range(1000):
    for ii, (x, y) in enumerate(trdl):
        print(f'tr {e: 5d} {ii: 5d} avg y={y.mean(dtype=torch.double).item()}')
        if ii % 2 == 0:
            print("="*200 + "BEFORE ERROR" + "="*200)
            1/0
```

The process hangs when joining the putting thread of `data_queue` in the **main process**. The root cause is that too many items are put into the queue from the **worker processes**, so the `put` at 062ac6b472/torch/utils/data/dataloader.py (L928) blocks in a background thread. The `pin_memory_thread` exits via the set `pin_memory_thread_done_event` without ever getting the `(None, None)` sentinel. Hence, the main process needs the same treatment the workers already receive at
062ac6b472/torch/utils/data/_utils/worker.py (L198).

After the patch, the script finishes correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33721

Differential Revision: D20089209

Pulled By: ezyang

fbshipit-source-id: e73fbfdd7631afe1ce5e1edd05dbdeb7b85ba961
2020-02-25 07:04:41 -08:00
7a8b6c2c6b [pytorch] blas gemm fix for k=0 (#33419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33419

These conditions are specific to the optimized implementation; the fallback implementation works without them. So use the fallback if any of these checks fails.
ghstack-source-id: 98836075

Test Plan: Previously, the special case k=0 raised an error, which is now gone. The error surfaced in some complicated autograd code, and I'm not sure how and where a simple regression test should be added.
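
A minimal sketch of the k=0 shape that used to error (the actual failing case was inside a larger autograd graph; this is an assumed reduction of it):

```python
import torch

a = torch.randn(2, 0, requires_grad=True)  # inner dimension k = 0
b = torch.randn(0, 3)
out = a @ b                 # gemm with k=0: a 2x3 tensor of zeros
out.sum().backward()        # previously could fail in the BLAS path
```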

Differential Revision: D19941103

fbshipit-source-id: e1c85d1e75744b1c51ad9b71c7b3211af3c5bcc6
2020-02-25 06:49:50 -08:00
4460c8b034 [C2] Tiny changes to adagrad to make it slightly better. (#33727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33727

Some small changes to adagrad (tiny bit faster, though there is more interesting diff in the stack on this).

Test Plan: Part of the stack

Reviewed By: chocjy

Differential Revision: D20029499

fbshipit-source-id: 7f4fddb9288d7881ef54673b17a0e19ef10d64c0
2020-02-24 23:02:17 -08:00
65864d3634 [C2] Small improvement for elementwise_mul operator. (#33537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33537

For embeddings smaller than 128, we can get a bit more compute by
allocating fewer threads per block.

Test Plan: Unit-test, benchmark.

Reviewed By: xianjiec

Differential Revision: D19969594

fbshipit-source-id: 6cc6b14fc61302804bed9093ea3591f21e3827d8
2020-02-24 23:00:27 -08:00
adbe289870 Update MKL to 2020.0.166 for Windows (#33690)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33690

Differential Revision: D20089300

Pulled By: ezyang

fbshipit-source-id: 887c006fbdb2c837f0a1c607a196811f44f1fb35
2020-02-24 22:43:34 -08:00
36919278cc C++ tensor multi-dim indexing: add index() and index_put_() overloads, simple indexing tests, merge with Python indexing path (#32841)
Summary:
This PR adds the following items:
- **1st item**: `ArrayRef<TensorIndex>` and `std::initializer_list<TensorIndex>` overloads for `Tensor::index` and `Tensor::index_put_`, to be used specifically for multi-dim indexing purpose.

Design rationale:
* C++ `Tensor::index` and `Tensor::index_put_` are both existing tensor APIs, and they currently (before this PR) only accept a list of tensors (i.e. `ArrayRef<Tensor>`) as indices. If we change their signatures to also accept non-tensors as indices (i.e. `ArrayRef<TensorIndex>`, and `TensorIndex` is convertible from `Tensor` / `Slice` / `None` / `Ellipsis`), it would slow down the original code path (since now it has to go through more steps), which is undesirable.

    To get around this problem, the proposed solution is to keep the original `ArrayRef<Tensor>` overload, and add `ArrayRef<TensorIndex>` and `std::initializer_list<TensorIndex>` overloads to `Tensor::index` and `Tensor::index_put_`. This way, the original code path won’t be affected, and the tensor multi-dim indexing API is only used when the user explicitly pass an `ArrayRef<TensorIndex>` or a braced-init-list of `TensorIndex`-convertible types to `Tensor::index` and `Tensor::index_put_` .

    Note that the above proposed solution would still affect perf for the user’s original `Tensor::index` or `Tensor::index_put_` call sites that use a braced-init-list of tensors as input, e.g. `tensor.index({...})` or `tensor.index_put_({...}, value)`, since now such function calls would take the multi-dim indexing path instead of the original advanced indexing path. However, there are only two instances of this in our codebase (one in ATen cpp test, one in a C++ API nn init function), and they can be easily changed to explicitly use `ArrayRef<Tensor>` as input (I changed them in this PR). For external user’s code, since this is part of the C++ frontend which is still considered experimental, we will only talk about this change in the release note, and ask users to switch to using `ArrayRef<Tensor>` explicitly if they want to keep using the original advanced indexing code path.

- **2nd item**: Mechanisms for parsing `ArrayRef<TensorIndex>` indices and performing indexing operations (mirroring the functions in `torch/csrc/autograd/python_variable_indexing.cpp`).
- **3rd item**: Simple tests to demonstrate that the `Tensor::index()` and `Tensor::index_put_()` APIs work. I will add more tests after the first few PRs are reviewed.
- **4th item**: Merge Python/C++ indexing code paths, for code simplicity. I tested locally and found that there is no perf regression resulting from the merge. I will get more concrete numbers for common use cases when we settle on the overall design.

This PR supersedes https://github.com/pytorch/pytorch/pull/30425.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32841

Differential Revision: D19919692

Pulled By: yf225

fbshipit-source-id: 7467e64f97fc0e407624809dd183c95ea16b1482
2020-02-24 22:04:00 -08:00
6aecfd1e80 Mobile Backend: NHWC memory layout + XNNPACK integration. (#33722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722

In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.

XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards.  This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs.  This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.

Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed.  The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.

Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, that integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute one-time operations ahead of time and factor them out of the innermost forward() loop.

The better solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules could be introduced that directly call into these implementations, either through c10 or some other mechanism, also allowing op creation to be decoupled from op execution.

This PR does not include any of the front end changes  mentioned above.  Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644.  Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.

Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509

Test Plan:
Build: CI
Functionality: Not exposed

Reviewed By: dreiss

Differential Revision: D20069796

Pulled By: AshkanAliabadi

fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
2020-02-24 21:58:56 -08:00
2a4aad7466 Don't activate vc env again for cuda with ninja on Windows (#33700)
Summary:
Possibly get rid of https://github.com/pytorch/pytorch/issues/28271, https://github.com/pytorch/pytorch/issues/27463 and https://github.com/pytorch/pytorch/issues/25393.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33700

Differential Revision: D20089251

Pulled By: ezyang

fbshipit-source-id: 0cfe62b869fb874e25f06894aa76fadc44cf6817
2020-02-24 21:56:29 -08:00
7caf3c396b [quant][graphmode][refactor] Change signature of getModuleAccessPath (#32812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32812

We'll error out inside the function for the cases we can't handle,
instead of checking at each call site.

Test Plan:
.

Imported from OSS

Differential Revision: D20087846

fbshipit-source-id: ae6d33a94adf29c4df86d67783e7ef8753c91f90
2020-02-24 21:52:43 -08:00
a1862468d0 Add missing test launchers for JitRpcTest and JitDistAutogradTest (#32891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32891

- Add JitDistAutoGradTest into fork/spawn test launcher
- Add JitRpcTest into fork/spawn test launcher

ghstack-source-id: 98900090

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_spawn
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_spawn
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork_thrift

buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn_thrift
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork_thrift

buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_spawn
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_spawn_thrift
```

Differential Revision: D5785394

fbshipit-source-id: 335a85424d22f1a83874be81a8139499c9a68ce2
2020-02-24 21:42:47 -08:00
a9cef05f5d improve EmbeddingBag performance on cuda (#33589)
Summary:
This PR improves the performance of EmbeddingBag on CUDA by removing 5 kernel launches (2 of which are synchronizing memcopies).
- 2 memcopies check that the values of offsets[0] and offsets[-1] are in the expected range (0 for the former, less than the number of indices for the latter). It seems strange to check only those 2 values: if users provide invalid offsets, invalid values can be anywhere in the array, not only in the first and last element. After this PR, the checks are skipped on CUDA; the first value is forced to 0, and if the last value is larger than expected, the CUDA kernel will assert. That is less nice than a ValueError, but then again, the kernel could already have asserted if other offset values were invalid. On the CPU, the checks are moved from functional.py into the CPU implementation, and will throw RuntimeError instead of ValueError. (A sketch of the offsets convention follows below.)
- 3 or 4 initializations (depending on the mode) of the output tensors with .zeros() are unnecessary, because every element of those tensors is written to, so their data can start out uninitialized.
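A minimal sketch of the offsets convention the removed checks were validating (the values here are made up):

```python
import torch

emb = torch.nn.EmbeddingBag(10, 3, mode='sum')
indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])  # flat indices for all bags
# offsets[i] marks where bag i starts; offsets[0] must be 0 and no offset
# may exceed len(indices) -- the two values the old code copied back to
# the CPU to check
offsets = torch.tensor([0, 4, 6])
out = emb(indices, offsets)                       # 3 bags -> shape (3, 3)
```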
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33589

Reviewed By: jianyuh

Differential Revision: D20078011

Pulled By: ngimel

fbshipit-source-id: 2fb2e2080313af64adc5cf1b9fc6ffbdc6efaf16
2020-02-24 21:37:34 -08:00
3cf97bc23c Fix typing error of torch/nn/modules/container.pyi.in (#33686)
Summary:
* `Sequential` has an `__iter__` method, but the type stub doesn't
* `ModuleList.__getitem__` returns `Module`, but the type stub doesn't
* The type stub says `ParameterList` has an `insert` method, but the actual `ParameterList` doesn't
* `ParameterDict.__getitem__` should return `Parameter`
* `ParameterList` and `ParameterDict` have `extra_repr` methods
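
A minimal sketch of the runtime behavior the stubs should reflect (the modules below are arbitrary examples):

```python
import torch
import torch.nn as nn

seq = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
for layer in seq:                  # Sequential is iterable at runtime
    print(type(layer).__name__)

modules = nn.ModuleList([nn.Linear(4, 4)])
first: nn.Module = modules[0]      # __getitem__ returns a Module

params = nn.ParameterDict({'w': nn.Parameter(torch.ones(2))})
w: nn.Parameter = params['w']      # __getitem__ should return a Parameter
```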

 ---

torch/nn/modules/container.py: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/container.py
torch/nn/modules/container.pyi.in: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/container.pyi.in
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33686

Differential Revision: D20086730

Pulled By: ngimel

fbshipit-source-id: a8271489417461c67ff84a239c4cd96c3aa17b5c
2020-02-24 21:20:38 -08:00
d6ea4be153 Fix minor problems in index_put_ docs (#33689)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/33641
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33689

Differential Revision: D20086967

Pulled By: ngimel

fbshipit-source-id: d9dde8edb904de1cf56b9337920cb29e008b72fb
2020-02-24 21:15:36 -08:00
54aac4af1f Update hypothesis_utils.py (#33739)
Summary:
A typo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33739

Differential Revision: D20088096

Pulled By: jerryzh168

fbshipit-source-id: d8b5d263c25f8c779698607be87bf76aca1811ab
2020-02-24 20:56:42 -08:00
cba8af9b24 [pytorch] Set alias analysis kind to FROM_SCHEMA for qadd, qmul, qclamp, qconcat (#33359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33359

Updated alias analysis kind to FROM_SCHEMA so input tensors can be marked as nonmutable
when appropriate, allowing for constant folding of these tensors.

Needed to update the schemas of the _out variants with annotations to mark the output input
tensor as aliased and mutable.

Test Plan:
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super(M, self).__init__()

    def forward(self, x):
        w = torch.tensor([3], dtype=torch.float)
        w = torch.quantize_per_tensor(w, 1.0, 0, torch.qint8)
        y = torch.tensor([3], dtype=torch.float)
        y = torch.quantize_per_tensor(w, 1.0, 0, torch.qint8)
        return torch.ops.quantized.add_out(x, w, y)

m = torch.jit.script(M())
torch._C._jit_pass_constant_propagation(m.graph)
print(m.graph)
```
```
graph(%self : __torch__.___torch_mangle_9.M,
      %x.1 : Tensor):
  %11 : int = prim::Constant[value=12]() # <ipython-input-11-1dd94c30cb58>:9:49
  %9 : float = prim::Constant[value=1.]() # <ipython-input-11-1dd94c30cb58>:9:41
  %10 : int = prim::Constant[value=0]() # <ipython-input-11-1dd94c30cb58>:9:46
  %36 : QInt8(1) = prim::Constant[value={3}]()
  %y.2 : Tensor = aten::quantize_per_tensor(%36, %9, %10, %11) # <ipython-input-11-1dd94c30cb58>:11:12
  %24 : Tensor = quantized::add_out(%x.1, %36, %y.2) # <ipython-input-11-1dd94c30cb58>:12:15
  return (%24)
```
As expected, the aten::quantize_per_tensor() for w is now folded. The aten::quantize_per_tensor()
for y is not folded, since that tensor is aliased/modified.

Differential Revision: D19910667

fbshipit-source-id: 127071909573151dc664500d363399e3643441b7
2020-02-24 20:08:06 -08:00
bc5e9e0d55 [quant][graphmode][refactor] Move the check for qconfig inside insertObserver call (#32809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32809

This is a refactor to help further changes to quantization.cpp. We want some operations on the graph to happen before we call insertObserver for invoked methods, especially `addIntermediateValuesToSkipObserver`, since we want to skip the input of the ReLU module in the `Conv - ReLU` pattern.

Test Plan:
test_jit.py
test_quantization.py

Imported from OSS

Differential Revision: D20087844

fbshipit-source-id: 28b7fa0c7ce9e254ab9208eb344893fb705e14d9
2020-02-24 20:03:33 -08:00
bf00b4d305 [TensorExpr] Add a boilerplate pass for future TensorExpr fusion pass. (#33464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33464

I added a python-exposed knob to register this pass in custom passes pipeline. If the knob is not used, the pass is not registered and thus not run at all.

Differential Revision: D19958217

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: fecdd98567fcda069fbdf8995c796899a3dbfa5c
2020-02-24 18:47:31 -08:00
9278196d89 scatter_add uses src, not other (#32307)
Summary:
Using the `other` kwarg gives `TypeError: scatter_add_() missing 1 required positional arguments: "src"`.
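A minimal sketch of the corrected keyword (the tensors are made up):

```python
import torch

x = torch.zeros(3, 5)
index = torch.tensor([[0, 1, 2, 0, 0], [2, 0, 0, 1, 2]])
src = torch.ones(2, 5)

x.scatter_add_(0, index, src=src)   # `src` is the keyword, not `other`
```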
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32307

Differential Revision: D20076859

Pulled By: zou3519

fbshipit-source-id: dfb417c087d5be41fad02dc0b2cf0506c89b1b02
2020-02-24 18:01:34 -08:00
98af01ee7c [quant] Make FakeQuant use REGISTER_DISPATCH (#33682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33682

Previously, there were two APIs for CPU and CUDA. This change keeps one top-level API, i.e. `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`, and uses the device type to dispatch to the different backends (CPU and CUDA).
CPU kernel implementation is in QuantizedOpKernels.cpp
CUDA kernel implementation is in fake_quantize_core.cu
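
A minimal sketch of the single entry point dispatching by device (the quantization parameters below are arbitrary):

```python
import torch

x = torch.randn(4)
# same top-level op; the device of the input selects the CPU or CUDA kernel
y_cpu = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255)
if torch.cuda.is_available():
    y_cuda = torch.fake_quantize_per_tensor_affine(x.cuda(), 0.1, 0, 0, 255)
```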

Test Plan:
python test/test_fake_quant.py

Benchmark Results for CPU
FakeQuantize tensor of size (2, 256, 128, 128)

Before:
per tensor quant ms 9.905877113342285
per channel quant ms 74.93825674057007

After:
per tensor quant ms 6.028120517730713
per channel quant ms 44.91588592529297

Imported from OSS

Differential Revision: D20072656

fbshipit-source-id: 0424f763775f88b93380a452e3d6dd0c90cb814b
2020-02-24 17:48:13 -08:00
b10a39bb32 Migrate _cat from TH to ATen (CUDA) (#33237)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24520

Benchmarks:

Upstream:

```
$ python -m pt.cat_test --tag_filter all --device cuda  --omp_num_threads 1 --mkl_num_threads 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 17.355

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 30.718

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 17.329

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 30.176

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 74.417

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 75.728

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 190.165

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa8876fcf28>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa8876fcf28>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 57.711

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7fa886237048>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7fa886237048>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 49.903

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7fa7b57bb840>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7fa7b57bb840>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 84.181

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bba60>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bba60>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 82.339

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7fa7b57bbae8>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7fa7b57bbae8>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 82.312

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7fa7b57bbb70>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7fa7b57bbb70>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 90.715

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 129.021

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 142.966

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 387.023

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbbf8>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbbf8>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 36.647

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbc80>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbc80>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 278.890

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbd08>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbd08>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 557.752

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7fa7b57bbd90>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7fa7b57bbd90>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 842.512

```

New version:

```
$ python -m pt.cat_test --tag_filter all --device cuda  --omp_num_threads 1 --mkl_num_threads 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 24.419

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 25.025

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 24.247

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 25.098

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 74.441

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 74.866

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 189.280

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1c9b056048>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1c9b056048>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 57.629

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f1c9b0560d0>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f1c9b0560d0>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 49.975

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f1bce8f38c8>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f1bce8f38c8>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 83.643

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3ae8>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3ae8>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 82.307

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f1bce8f3b70>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f1bce8f3b70>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 82.323

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f1bce8f3bf8>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f1bce8f3bf8>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 90.549

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 129.022

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 142.969

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 386.973

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3c80>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3c80>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 43.800

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3d08>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3d08>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 279.023

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3d90>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3d90>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 565.790

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bce8f3e18>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bce8f3e18>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 845.153
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33237

Differential Revision: D20069181

Pulled By: ngimel

fbshipit-source-id: b392e1ffd72c0d8df0c5a2d3ac96f59b37c84e32
2020-02-24 17:41:16 -08:00
97da60d511 Updating submodules
Summary:
GitHub commits:

ea8bae1f0f
134472ee45
37e6cf9d62
eb367d45c0
76de6e15c0
e1b1a55309

Test Plan: n/a

Reviewed By: wittgenst

fbshipit-source-id: 9d0d688d81be822900475223a787c5649e143e85
2020-02-24 17:34:59 -08:00
479e474a37 [quant][graphmode] FoldConvBatchNorm2d support shared ClassTypes (#32379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32379

Folding Conv2d - BatchNorm2d modules means recalculating the weight and bias of the Conv2d module by incorporating the parameters
of BatchNorm2d, and changing the method calls to call only forward of the Conv2d module. This involves changing both the module
types and the graph, because the bias of Conv2d is a parameter when it has a value and an attribute when it is
None (since JIT code assumes in multiple places that a parameter is a Tensor). Therefore
we'll need to remove the bias attribute when it is None and add a bias attribute later. Since a ClassType might be shared, we separate
the remove and add into separate steps and also keep track of the processed graphs, to avoid modifying the graph and type multiple times.
However, we'll have to record the slot index of the bias as well, so we can replay the slot removal on other instances of the Conv2d module.
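
For intuition, a minimal sketch of the standard Conv2d-BatchNorm2d folding arithmetic (this is not the JIT pass itself; the names are illustrative):

```python
import torch

def fold_conv_bn(conv_w, conv_b, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    scale = bn_gamma / torch.sqrt(bn_var + eps)      # one factor per out channel
    folded_w = conv_w * scale.reshape(-1, 1, 1, 1)   # rescale conv weights
    if conv_b is None:                               # bias may be None (attribute)
        conv_b = torch.zeros_like(bn_mean)
    folded_b = (conv_b - bn_mean) * scale + bn_beta
    return folded_w, folded_b
```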

Test Plan:
tbd

Imported from OSS

Differential Revision: D20078719

fbshipit-source-id: cee5cf3764f3e0c0a4a2a167b78dbada2e3835cc
2020-02-24 17:29:13 -08:00
54e41a87eb Make ELU great again (#33244)
Summary:
Due to a compiler bug, we had to add a workaround for ELU on CUDA. A necessary condition for the bug to trigger is the `invoke_with_array` function in `Loops.cuh`. Now, https://github.com/pytorch/pytorch/issues/33222 will kill that function, and we need to remove the workaround once https://github.com/pytorch/pytorch/issues/33222 lands.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33244

Differential Revision: D20076197

Pulled By: ngimel

fbshipit-source-id: 39f99783014c78cecad1c39cb46092278ff220b9
2020-02-24 17:18:30 -08:00
5b031d961d [pt][quant] RNN debug test (#33621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33621

ghstack-source-id: 98746093

Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details

Differential Revision: D20036968

fbshipit-source-id: 7cbb027a6afbe28bc250fc663089c6a9406e880b
2020-02-24 16:15:17 -08:00
696527e659 [caffe2] Add embedding empty ratio checker (disabled by default) (#33145)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33145

Reviewed By: xianjiec

Differential Revision: D19716574

fbshipit-source-id: 42a636600ac3977910d35093916865790bbe5b10
2020-02-24 16:10:01 -08:00
5090d7082b add propagate flag USE_DISTRIBUTED for libtorch_python_source
Reviewed By: pritamdamania87

Differential Revision: D20070789

fbshipit-source-id: fdb8a2eefb5bfc1ae1d80e29bd15eb1d70920c87
2020-02-24 16:02:47 -08:00
330b69fef8 Kill dead scalar_check. (#33695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33695

I'm not sure how this stuck around, but it has no effect.

Test Plan: Imported from OSS

Differential Revision: D20068867

Pulled By: gchanan

fbshipit-source-id: 79191338a8bc7a195e2b7265005ca6f00aab3818
2020-02-24 14:53:24 -08:00
996c0adb53 [quant] Register fake_quant and observer attributes as buffers (#33626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33626

For DDP we require the attributes to be registered as buffers. By doing this the value is broadcast from one device to the rest.
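
A minimal sketch of the registration pattern (the module and attribute names are illustrative, not the actual observer code):

```python
import torch
import torch.nn as nn

class TinyObserver(nn.Module):
    def __init__(self):
        super().__init__()
        # buffers, unlike plain tensor attributes, are broadcast by DDP
        # from one replica to the others
        self.register_buffer('scale', torch.tensor([1.0]))
        self.register_buffer('zero_point', torch.tensor([0]))
```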

Test Plan:
Tested on actual model on GPU

Imported from OSS

Differential Revision: D20038839

fbshipit-source-id: 82e829fc3baca0b3262c3894a283c375eb08a4a4
2020-02-24 14:16:03 -08:00
dc3d47110a [docs] add experimental warning to TorchScript classes in language reference (#33697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33697

Test Plan: Imported from OSS

Differential Revision: D20070220

Pulled By: suo

fbshipit-source-id: 9828d876afed59203cc472eaf0134d52d399069e
2020-02-24 14:01:19 -08:00
533b973fd0 Fix visibility of torch::nn::RNNImpl::options (#33718)
Summary:
In PR https://github.com/pytorch/pytorch/issues/33027, `options` in RNNImpl was mistakenly changed to `protected` (it was `public` before)

```
 protected:
  FORWARD_HAS_DEFAULT_ARGS({1, AnyValue(Tensor())})

  RNNOptions options;
```

This PR changes it back to `public` again.

Fixes https://github.com/pytorch/pytorch/issues/33694.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33718

Differential Revision: D20075149

Pulled By: yf225

fbshipit-source-id: 82901369eeaacd82df849e17df64dc1aaf98f9fe
2020-02-24 13:50:39 -08:00
062ac6b472 Bring up new-style registration API as wrapper around old-style (#33205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33205

A number of important use-cases are implemented:

- def(schema): defines a schema, with no implementation (alias
  inferred from schema, by default)
- def(schema, fn_ptr): registers fn_ptr as a catch-all kernel
  for the operation
- def(schema, lambda): registers lambda as a catch-all kernel
  for the operation
- def(schema, torch::dispatch(dispatch_key, fn)), and
  def(schema, torch::dispatch(device_type, fn)): registers
  the function to only be executed when dispatch_key/device_type
  is selected for use
- def(schema, TORCH_OPTIMIZED_FN(fn)): registers the function
  as unboxed only, using the inline syntax

All of our code generated registrations in ATen are switched to
the new API.

Some aspects of the API which are not fully implemented:

- It's still not valid to omit the schema when registering a function
  pointer, due to #32549
- Although it's possible to take advantage of top-level namespaces
  ala torch::import("aten"), we don't use it because this results
  in worse code (as we have to cat everything back together).  This
  is not an essential problem, we just need the internals to be less
  stupid.

There are some aspects of the API which don't semantically make sense,
but I chose not to fix them in this PR:

- For some reason, TORCH_OPTIMIZED_FN uses the *runtime* wrapper to
  do wrapping, rather than the compile time one which inlines the
  function in.  This means that there isn't any reason we should be
  passing in the function pointer as a template argument; a regular
  old argument ought to have worked fine.  This is seemingly
  consistent with the current API though; needs further investigation.
- There's no reason to optional<DispatchKey>, DispatchKey would
  work just fine (use DispatchKey::Undefined for the nullopt case)

In the long term, we should swap the wrapper around: the new-style
API has the real implementation, and the old-style API is backwards
compatibility.  However, this implies a lot of internal refactoring,
so I decided to short circuit around it to get this in faster

Ancillary changes:
- I stopped moving optional<DispatchKey>, it's literally just two
  words, pass it by value please.
- Needed to add a & qualified version of RegisterOps::op, since
  I'm storing RegisterOps as a member inside the new style
  Namespace and I cannot conveniently get a rvalue reference
  to it in that situation.  (BTW, register_ = std::move(register_)
  really doesn't work, don't try it!)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19856626

Pulled By: ezyang

fbshipit-source-id: 104de24b33fdfdde9447c104853479b305cbca9a
2020-02-24 11:45:14 -08:00
ced8865d91 Add sigmoid to mobile ops
Summary: Used by segmentation model.

Test Plan: Ran segmentation model on mobile.

Reviewed By: iseeyuan

Differential Revision: D19881378

fbshipit-source-id: 87f00058050fd173fbff1e88987ce09007622b83
2020-02-24 11:37:24 -08:00
32c93099c4 Add typing info for data members of utils.data.sampler classes (#33679)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33490
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33679

Differential Revision: D20063099

Pulled By: ngimel

fbshipit-source-id: 1bbf71a65408d117019ab38d7d095cfd337f5d1e
2020-02-24 11:29:59 -08:00
4d9b649261 jit pickling rref (#32959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32959

In the RPC TorchScript call path we need to pickle/unpickle rrefs; this diff makes the JIT pickler/unpickler able to do so. It is similar to what is implemented for PyRef::pickle() and PyRef::unpickle().
The pickling/unpickling design assumes it is always coupled with RPC calls. It is not meant for checkpointing a model with an rref: before checkpointing the model, the user should call rref.to_here() to get the value inside the rref (see the sketch after the steps below).

The pickling process is:
1. push the torch.distributed.rpc.rref global string
2. call rref.fork() to create rrefForkData, which holds a few IDs and the type str of the value held inside the rref; the IDs include the rref id, fork id, caller worker id, callee worker id, and owner worker id
3. push the rrefForkData

The unpickling process is:
1. read the torch.distributed.rpc.rref global string and retrieve the cached global lambda function
2. the global lambda function receives the rrefForkData
3. if the callee worker is also the owner worker, get the owner rref based on the IDs inside the rrefForkData and return the OwnerRRef
4. if the callee is not the owner, create a user rref from the rrefForkData and return the UserRRef
5. meanwhile, the owner rref is notified and reference counting is handled correctly

During unpickling, a type_resolver is needed to parse the type str. This type_resolver has a Python dependency, so we get it from the rpc_agent and pass it to the unpickler during construction. Accordingly, this diff adds a type_resolver argument to the JIT unpickler constructor.
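
A minimal sketch of the to_here() pattern mentioned above, assuming an already-initialized RPC worker group with a peer named "worker1":

```python
import torch
import torch.distributed.rpc as rpc

rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
value = rref.to_here()          # materialize the value held by the rref
torch.save(value, "ckpt.pt")    # checkpoint the value, not the rref itself
```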
ghstack-source-id: 98814793

Test Plan: unit test

Differential Revision: D19713293

fbshipit-source-id: 4fd776cdd4ce8f457c4034d79acdfb4cd095c52e
2020-02-24 11:16:35 -08:00
481e7f2e78 catch and propagate warnings for JIT ScriptMethods (#33010)
Summary:
We align it with ScriptFunctions by using the HANDLE_TH_ERRORS/END_HANDLE_TH_ERRORS_PYBIND macros.
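
A hedged sketch of the behavior this enables (illustrative, not the
test code from the PR):

```python
import warnings
import torch

class M(torch.nn.Module):
    def forward(self, x):
        warnings.warn("emitted inside a ScriptMethod")
        return x + 1

m = torch.jit.script(M())
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    m(torch.ones(1))
# With this change the warning surfaces in Python instead of being lost.
assert any("ScriptMethod" in str(w.message) for w in caught)
```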

Fixes https://github.com/pytorch/pytorch/issues/24155  or https://github.com/pytorch/pytorch/issues/24828 ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33010

Differential Revision: D20053585

Pulled By: suo

fbshipit-source-id: c8876b54069285ba9638bb2328fd8738b59c396d
2020-02-24 10:28:17 -08:00
6a76433b9d [Update independent.py]add explicit string representation (#33676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33676

Differential Revision: D20069202

Pulled By: ngimel

fbshipit-source-id: 48b609d4fb7a098e9e3383553103a9441673d63f
2020-02-24 10:15:00 -08:00
6a275b696e adding IterableDataset to utils.data.__init__ (#33543)
Summary:
this shall fix issue https://github.com/pytorch/pytorch/issues/27820 again
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33543

Differential Revision: D20002446

Pulled By: vincentqb

fbshipit-source-id: 7563a56fd6238efe8ea5626b02ba5e8fcda0780e
2020-02-24 10:09:38 -08:00
e3ba533c8b Minimize the cases where we have to cpu_zero. (#33570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33570

In this PR, we are a bit more careful about avoiding zero-ing the output.  Analysis as follows:
1) `mm` doesn't need zero_ because it never calls scal, which is the underlying problem.
2) for `mv`, which does call scal (in certain cases), we can just move the zeroing to where it would actually be a problem, namely when the scalar value is 0.
In this case we just run the non-BLAS version of the code.

Test Plan: Imported from OSS

Differential Revision: D20007665

Pulled By: gchanan

fbshipit-source-id: 1f3a56954501aa9b2940d2f4b35095b2f60089a8
2020-02-24 07:47:36 -08:00
641750e33c Fix NaN handling in torch.mv. (#31666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31666

List of changes:
1) Fix a case where torch.mv was not handling NaNs correctly.  In particular, with a transposed tensor and expanded vector, NaNs in the output are kept, even if beta = 0.
This is handled in the `out=` case by zero-ing out the passed-in Tensor, but this can happen just the same with the non-out variant if the allocated tensor happens to have a NaN.
Also adds tests for this case; a minimal repro sketch follows this list.
NOTE: we zero out the output tensor in all cases for mv and mm, even though this is probably overkill.  I didn't find another case where this would be a problem, but the old code at least
attempted to do this for all mv and mm calls and I didn't add comprehensive testing to be sure that it's not a problem.

2) on CPU: move mv, mv_out, mm, mm_out to be direct wrappers on _th_addmv, _th_addmm, rather than having their own wrappers in Declarations.cwrap.
This is to remove the magic around cpu_zero from the codegen, which simplifies the codegen and makes testing this easier.
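
A minimal repro sketch for item 1 (hedged; the shapes are illustrative):

```python
import torch

mat = torch.randn(4, 3).t()            # transposed (non-contiguous) matrix, 3 x 4
vec = torch.randn(1).expand(4)         # expanded vector of length 4
out = torch.full((3,), float("nan"))   # output buffer pre-filled with NaN
torch.mv(mat, vec, out=out)            # with beta == 0, out must be overwritten
assert not torch.isnan(out).any()      # no NaNs should leak from the buffer
```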

Test Plan: Imported from OSS

Differential Revision: D19239953

Pulled By: gchanan

fbshipit-source-id: 27d0748d215ad46d17a8684696d88f4cfd8a917e
2020-02-24 07:46:08 -08:00
039dc90854 Revert D19521853: [pytorch][PR] Mobile Backend: NHWC memory layout + XNNPACK integration.
Test Plan: revert-hammer

Differential Revision:
D19521853

Original commit changeset: 99a1fab31d0e

fbshipit-source-id: 76dfc1f481797ba2386997533cf19957637687d6
2020-02-23 22:07:19 -08:00
9d834cc889 [JIT] Fix FunctionType::python_str() (#33680)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33680

Test Plan: Imported from OSS

Differential Revision: D20062777

Pulled By: jamesr66a

fbshipit-source-id: fcdb0527ca6776ff161cd535794e9c12bb32bdde
2020-02-23 21:52:09 -08:00
5fa03d4dbb Fix bug where we were trying to get a schema for prim::Constant, which is not registered as an operator. (#33645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33645

Fix bug where we were trying to get a schema for prim::Constant, which is not registered as an operator.
ghstack-source-id: 98785729

Test Plan: buck test mode/dev //pytext/models/test:scripted_seq2seq_generator_test -- 'test_generator \(pytext\.models\.test\.scripted_seq2seq_generator_test\.ScriptedSeq2SeqGeneratorTest\)'

Differential Revision: D20050833

fbshipit-source-id: cc38510b0135b750fdf57fb9c1e66ce1d91ee128
2020-02-23 21:37:35 -08:00
e1bddbbaf6 Bounds checking for functor execution in vectorized/unrolled kernels (#33642)
Summary:
The current logic for vectorized/unrolled operations in CUDALoops.cuh applies bounds checking to loads and stores, [but not to the actual functor's execution](16d6c17845/aten/src/ATen/native/cuda/CUDALoops.cuh (L264)).  In other words, for a block acting on the tail of a tensor that doesn't require the whole block to participate in memory transactions, many threads execute their functor on uninitialized data.  For functors that only communicate with the outside world via the bounds-checked loads and stores, that's ok.  The threads acting on garbage data never actually write their results.  But [my proposed inf/nan checking kernel](https://github.com/pytorch/pytorch/pull/33366/files#diff-9701a2b34900195d160bdc234e001b79R70-R79) has the additional side effect of writing to a `found_inf` flag in global memory.  For irregularly-shaped tensors where tail threads execute the functor on garbage data, these threads would sometimes see and report spurious infs/nans.

In general, we can't guarantee functors won't have side effects.  For safety (and efficiency) we should apply bounds checking to the functor execution as well as the loads and stores.

Is it possible that other elementwise kernels (in addition to the strided/vectorized implementation) are also executing functors unconditionally?  That would cause similar failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33642

Differential Revision: D20062985

Pulled By: ngimel

fbshipit-source-id: 65b8d75a001ce57865ed1c0cf89105d33f3f4dd4
2020-02-23 21:17:31 -08:00
941b42428a Mobile Backend: NHWC memory layout + XNNPACK integration. (#32509)
Summary:
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.

XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards.  This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs.  This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.

Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed.  The less efficient implementation would be to hook these operators into their corresponding **native** implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.

Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor one-time operations out of the innermost forward() loop.

The better solution, and one we will decide on soon, would involve providing a JIT pass that maps nn operators onto these newly introduced operators while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.

This PR does not include any of the front end changes  mentioned above.  Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644.  Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.

Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509

Reviewed By: dreiss

Differential Revision: D19521853

Pulled By: AshkanAliabadi

fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa
2020-02-23 19:08:42 -08:00
7aa605ed92 Remove uses of .data in test_torch (#33638)
Summary:
Removes almost every usage of `.data` in test_torch to address part of https://github.com/pytorch/pytorch/issues/33629.

Lines 4706-4710 had to be refactored to allow this. The changed test is fundamentally the same, as it appears to be meant to confirm that using an input of a different type than the weight causes an appropriate error.

There is one remaining usage of `.data`, and it is on line 5132. This was left as the `set_` and `resize_` methods still mention `.data` explicitly. I figure the right time to remove this is when those methods have their runtime errors updated.

Note: ~~some tests are skipped locally, and so I am still verifying that nothing has been obviously broken.~~ Appears to be passing early tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33638

Differential Revision: D20062288

Pulled By: albanD

fbshipit-source-id: 672a6d7a20007baedb114a20bf1ddcf6c4c0a16a
2020-02-23 14:11:21 -08:00
6d448acb34 [PyTorch BC] Skip aten::random_ to fix BC CI (#33666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33666

It's caused by a revert, so let's skip it.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D20057382

fbshipit-source-id: d71af8efe68b31befcef5dddc372540e8a8ae2ac
2020-02-22 21:28:18 -08:00
9e384f9ce4 Remove duplicate header include. (#33656)
Summary:
The same header `<torch/nn/functional/conv.h>` is included twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33656

Differential Revision: D20056913

Pulled By: yf225

fbshipit-source-id: b1563035c9821731b99c26eec130ff0b9cc627a7
2020-02-22 14:17:07 -08:00
312627a7c3 Revert D19776613: Migrate random_ from the TH to Aten (CPU)
Test Plan: revert-hammer

Differential Revision:
D19776613

Original commit changeset: a8d262bccf5f

fbshipit-source-id: 36389ffa3d8377743f55f97221d7a7ee25a409f6
2020-02-22 08:15:27 -08:00
a2f3c6c26f Call RandomNumberSeed() on-demand (#33539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33539

We rarely use `random_seed_` in the context, but we always initialize it with `RandomNumberSeed()`, which isn't trivial. This diff changes it so that `RandomNumberSeed()` is called only once, when `random_seed_` is actually used.
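
A sketch of the lazy-initialization pattern described here, written in
Python for brevity (the names are illustrative, not the actual Caffe2
API):

```python
import os

class Context:
    def __init__(self):
        self._random_seed = None          # no eager RandomNumberSeed() call

    def random_seed(self):
        # Compute the seed only on first use, then cache it.
        if self._random_seed is None:
            self._random_seed = int.from_bytes(os.urandom(4), "little")
        return self._random_seed
```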

Test Plan:
unittests.

Canaries:
AF: https://our.intern.facebook.com/intern/ads/canary/424753437441438410
AI: https://our.intern.facebook.com/intern/ads/canary/424753467414318838
Prospector: https://our.intern.facebook.com/intern/ads/canary/424753976999968569

Reviewed By: ipiszy

Differential Revision: D19993190

fbshipit-source-id: 1d2606bd65476ff3b519c69f9cbfa3b80f75cdff
2020-02-22 01:22:18 -08:00
8291e06f8f Fixes cuda->numpy and non-strided->numpy segfaults (#33612)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/33300.

Calling .numpy() on a CUDA or non-strided (e.g. sparse) tensor segfaults in current PyTorch. This fixes the segfaults and throws the appropriate TypeError, as was intended.

Two tests, one in test_cuda.py and the other in test_sparse.py, are added to verify the behavior.
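
A hedged illustration of the fixed behavior:

```python
import torch

if torch.cuda.is_available():
    t = torch.ones(3, device="cuda")
    try:
        t.numpy()                 # unsupported: raises TypeError, not a segfault
    except TypeError:
        arr = t.cpu().numpy()     # supported path: copy to CPU first
```
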
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33612

Differential Revision: D20038210

Pulled By: mruberry

fbshipit-source-id: 265531dacd37c392232fd3ec763489a62ef54795
2020-02-21 22:23:08 -08:00
59daf1611b [Caffe2] Skip //caffe2/caffe2:caffe2_test_cpu -- 'DBSeekTest\.RocksDB'
Summary: Skip the test to unblock dper fbpkg push

Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu -- 'DBSeekTest\.RocksDB' --run-disabled

Reviewed By: cheshen1

Differential Revision: D20043418

fbshipit-source-id: 05ceb2cea08722a671fa211d73680fd4b78f354c
2020-02-21 21:30:02 -08:00
1c08fa7051 [Caffe2] Skip caffe2/caffe2:caffe2_test_cpu - DBSeekTest.LMDB
Summary: skip broken tests in https://fburl.com/svc/zsbsrc7a to unblock dper fbpkg push.

Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu -- 'DBSeekTest\.LMDB' --run-disabled

Reviewed By: cheshen1

Differential Revision: D20042330

fbshipit-source-id: 5b86e66da2a219c915c471b8e87f33239bdc5ba9
2020-02-21 21:28:31 -08:00
a7e22b4c6a add bailout checks to checkScript (#32802)
Summary:
This adds enough infrastructure to run bailout checks in `checkScript`. I'll need to figure out the best way to enable it for nightly builds now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32802

Differential Revision: D19974718

Pulled By: Krovatkin

fbshipit-source-id: 40485503f6d3ae14edcce98e1eec1f0559f3ad08
2020-02-21 21:18:54 -08:00
9b2b15f4fc misc windows warning fixes (#33632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33632

* `inline_container.h` was unnecessarily exposing all includers to caffe2 headers via `caffe2/core/logging.h`
* Add msvc version of hiding unused warnings.
* Make sure clang on windows does not use msvc pragmas.
* Don't redefine math macro.

Test Plan: CI green

Differential Revision: D20017046

fbshipit-source-id: 230a9743eb88aee08d0a4833680ec2f01b7ab1e9
2020-02-21 19:36:25 -08:00
d971007c29 Migrate random_ from the TH to Aten (CPU) (#32534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32534

Fixes #24752
Fixes #32510

Test Plan: Imported from OSS

Differential Revision: D19776613

Pulled By: pbelevich

fbshipit-source-id: a8d262bccf5f2807f6125c83080aa16d77491b19
2020-02-21 16:13:58 -08:00
e10aa6b72f Fix flaky DagNetTest unittest
Summary: The first run of the net is noisy sometimes - just run it twice.

Reviewed By: cheshen1

Differential Revision: D20039274

fbshipit-source-id: 639e65646bf52f3efe1ecd4bbcd0e413d9389b29
2020-02-21 16:08:04 -08:00
6474ea404d [C2] Native GPU implementation for bucketize (#33529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33529

The current version goes through a GPU -> CPU -> GPU copy and is pretty slow: ~19 ms
for 1M elements with 20 possible buckets, based on a benchmark.

This new version takes ~0.2 ms on the same benchmark.
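
A pure-Python sketch of what the bucketize op computes (hedged: this
shows the semantics only, not the C2 GPU kernel):

```python
import bisect

def bucketize(values, boundaries):
    # For each value, return the index of the first boundary >= value.
    return [bisect.bisect_left(boundaries, v) for v in values]

assert bucketize([0.5, 2.5, 9.0], [1.0, 3.0, 5.0]) == [0, 1, 3]
```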

Test Plan: benchmark + unit-test

Reviewed By: chocjy

Differential Revision: D19969518

fbshipit-source-id: 51889bc9a232b6d45d9533e53b7b7f4531da481f
2020-02-21 15:47:04 -08:00
15ba902c08 Turn ONNX_ML into a proper build option. (#33424)
Summary:
The detection of the env variable ONNX_ML has been properly handled in tools/setup_helpers/cmake.py,
line 242.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33424

Differential Revision: D20043991

Pulled By: ezyang

fbshipit-source-id: 91d1d49a5a12f719e67d9507cc203c8a40992f03
2020-02-21 15:42:33 -08:00
16d6c17845 improve roll performance (#33623)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33544
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33623

Differential Revision: D20037643

Pulled By: ngimel

fbshipit-source-id: 9fd293eca5242daf414c116344b2e1fde9f9ebc5
2020-02-21 15:09:51 -08:00
f62f1b2ef0 Revert "Revert D19964089: [pytorch][PR] Allow vectorized gpu loop to … (#33553)
Summary:
…have different argument types"

This reverts commit 05fb160048b71c1b8b00d2083a08618318158c1a.

Please go to https://github.com/pytorch/pytorch/pull/33558 and check the CUDA9 on CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33553

Differential Revision: D20017575

Pulled By: ngimel

fbshipit-source-id: a5fd78eea00c7b0925ab21fd90a7daeb66725f1a
2020-02-21 14:56:30 -08:00
a72946dbab Stop generating out full function type for registration, use decltype or infer it (#33097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33097

Previously, we had to specify full types because the functions we were registering
might be overloaded, and the type was necessary to resolve the ambiguity.  I
disambiguate all of these names by mangling the names of the methods we
place on CPUType/CUDAType/TypeDefault with the overload name (these are
*internal* wrappers which are not user visible), and then can strip
the generation of full function types from the registration.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19837898

Pulled By: ezyang

fbshipit-source-id: 5f557184f6ec84cb0613d4eb2e33b83fd1712090
2020-02-21 14:26:14 -08:00
22963f42ec Delete unnecessary aliasAnalysis specification from operator registrations. (#33093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33093

In #30187 the aliasAnalysis field on operator registration was updated
so that alias analysis could be specified in only some registration call
sites, rather than requiring it be consistently specified in all call
sites.  With this change, we can eliminate the requirement that all
registrations specify aliasAnalysis; as long as we know *one* site
specifies the correct aliasAnalysis, we don't have to specify it
any of the other sites.

In this patch, the "one site" is TypeDefault.cpp (previously we only
generated these stub declarations for manually registered functions,
but now we generate the stubs for everything).  Then I delete aliasAnalysis
anywhere we register an op for an existing function (which is a lot
of places).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19837897

Pulled By: ezyang

fbshipit-source-id: 26a7fbc809ec1553da89ea5c0361f3e81526d4c2
2020-02-21 14:24:44 -08:00
d5b768dffd refactor strongTypePtr (#33590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33590

ghstack-source-id: 98713798

Test Plan: unit test

Differential Revision: D20015521

fbshipit-source-id: 8c744a6f30f12671bef89c3555110ce26609d9a3
2020-02-21 13:32:18 -08:00
47e90d774e C++/Python API Parity: add pad_sequence (#32387)
Summary:
- add `pad_sequence` and tests
- related issue https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32387

Differential Revision: D20025421

Pulled By: yf225

fbshipit-source-id: caa9ae2114bece8db387a3a1610f24a3e06b1324
2020-02-21 13:16:09 -08:00
bb5181b716 [TensorExpr] Add IR Printer. (#33220)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33220

Test Plan: Imported from OSS

Differential Revision: D19848379

Pulled By: ZolotukhinM

fbshipit-source-id: 1c6ab4f63080d4506dedc3c47938de92fb4bfba2
2020-02-21 13:10:26 -08:00
fc70fc3610 [TensorExpr] Add IR visitor, IR mutator, and IR evaluator. (#33219)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33219

Test Plan: Imported from OSS

Differential Revision: D19848381

Pulled By: ZolotukhinM

fbshipit-source-id: 44ca7cd99c25e290a8ffd8146785c19f9c785dfd
2020-02-21 13:10:22 -08:00
49af9425a7 [TensorExpr] Add core classes for representing expressions and statements. (#33218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33218

Test Plan: Imported from OSS

Differential Revision: D19848378

Pulled By: ZolotukhinM

fbshipit-source-id: 48399f8651324d5ad0607e08573d5d7b2026bb23
2020-02-21 13:10:17 -08:00
1a4f997178 [TensorExpr] Add a class for representing data type. (#33217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33217

Test Plan: Imported from OSS

Differential Revision: D19848380

Pulled By: ZolotukhinM

fbshipit-source-id: d8683f8fc4555d2456cd2a7c827d8e8231915b49
2020-02-21 13:10:12 -08:00
089d658153 [TensorExpr] Add classes for memory management in tensor expressions. (#33216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33216

All tensor expressions belong to a kernel arena and are freed when the
arena is destroyed. Until it is destroyed, all expressions stay valid.

Test Plan: Imported from OSS

Differential Revision: D19848382

Pulled By: ZolotukhinM

fbshipit-source-id: a581ea2b635b9ba2cc53949616a13d8d3a47caae
2020-02-21 13:08:50 -08:00
616beb1412 [ROCm] Added support for pytorch extensions to use HIP (#32669)
Summary:
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for hipify module to be able to be used by a torch extension

cc: ezyang iotamudelta jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32669

Differential Revision: D20033893

Pulled By: zou3519

fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
2020-02-21 12:10:02 -08:00
ca8e025cdf improve the doc of enforce_sorted in pack_padded_sequence (#33617)
Summary:
this is a follow up PR to https://github.com/pytorch/pytorch/issues/33602:

torch/nn/utils/rnn.html:

`pack_padded_sequence` has a confusing and incomplete description of the `enforce_sorted` param. Currently it goes:

```
        enforce_sorted (bool, optional): if ``True``, the input is expected to
            contain sequences sorted by length in a decreasing order. If
            ``False``, this condition is not checked. Default: ``True``.
```

The second part, "this condition is not checked", (1) makes no sense since the alluded-to condition is not described, and (2) is incomplete as it doesn't reflect the important part: that it actually does the sorting. I think it should say something like:

```
        enforce_sorted (bool, optional): if ``True``, the input is expected to
            contain sequences sorted by length in a decreasing order. If
            ``False``, the input will get sorted unconditionally. Default: ``True``.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33617
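
A hedged usage example of the flag in question:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded = torch.zeros(3, 2, 5)    # (max_seq_len, batch, features)
lengths = [2, 3]                 # NOT sorted in decreasing order
# enforce_sorted=False sorts the batch internally; the default
# enforce_sorted=True would raise an error for these lengths.
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)
```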

Differential Revision: D20035131

Pulled By: albanD

fbshipit-source-id: 654382eb0cb62b5abc78497faa5b4bca42db5fda
2020-02-21 11:51:08 -08:00
293fa5fc44 [Documentation] Fix minor typo in torch.serialization (#33549)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33549

Differential Revision: D20002545

Pulled By: albanD

fbshipit-source-id: 46fe2002329e5250c009eb066432909b71ecd74d
2020-02-21 09:29:13 -08:00
e77abb9a5b Normalize reward-to-go in C++ actor-critic (#33550)
Summary:
Comparing to the [Python implementation](https://github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py), it seems like the tensor of normalized reward-to-go is computed but never used. Even if it's just an integration test, this PR switches to the normalized version for better convergence.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33550

Differential Revision: D20024393

Pulled By: yf225

fbshipit-source-id: ebcf0fee14ff39f65f6744278fb0cbf1fc92b919
2020-02-21 09:19:39 -08:00
ee28831341 [jit] Fix aug assign for non-tensor attributes (#32993)
Summary:
Instead of erroring out this de-sugars augmented assignments to class
members from `self.a += 1` to `self.a = self.a + 1`.

Fixes #32973
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32993
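
A hedged sketch of the pattern this enables in TorchScript:

```python
import torch

class Counter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = 0               # non-tensor class attribute

    def forward(self, x):
        self.a += 1              # de-sugared to self.a = self.a + 1
        return x + self.a

m = torch.jit.script(Counter())  # previously this aug-assign errored out
```
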

Pulled By: driazati

Differential Revision: D19737636

fbshipit-source-id: 07307cde88d8c348a7affdafe26db21c74e28ec0
2020-02-21 08:42:35 -08:00
fa80299bdf __torch_function__ overrides for torch.functional and torch.nn.functional (#32799)
Summary:
This adds `__torch_function__` support for all functions in `torch.functional` and `torch.nn.functional`.

The changes to C++ code and codegen scripts are to facilitate adding `__torch_function__` support for the native functions in `torch._C._nn`. Note that I moved the `handle_torch_function` C++ function to a header that both `python_torch_functions.cpp` and `python_nn_functions.cpp` include. The changes to `python_nn_functions.cpp` mirror the changes I made to `python_torch_functions.cpp` when `__torch_function__` support was first added in https://github.com/pytorch/pytorch/issues/27064. Due to the somewhat different way the `torch._C` and `torch._C._nn` namespaces are initialized I needed to create a new static reference to the `torch._C._nn` namespace (`THPNNVariableFunctions`). I'm not sure if that is the best way to do this. In principle I could import these namespaces in each kernel and avoid the global variable but that would have a runtime cost.

I added `__torch_function__` support to the Python functions in `torch.nn.functional` following the approach in https://github.com/pytorch/pytorch/issues/32194.

I re-enabled the test that checks if all functions in the `torch` namespace are explicitly tested for `__torch_function__` support. I also generalized the check to work for `torch.functional` and `torch.nn.functional` as well. This test was explicitly disabled in https://github.com/pytorch/pytorch/issues/30730 and I'm happy to disable it again if you think that's appropriate. I figured now was as good a time as any to try to re-enable it.

Finally I adjusted the existing torch API tests to suppress deprecation warnings and add keyword arguments used by some of the code in `torch.nn.functional` that were missed when I originally added the tests in https://github.com/pytorch/pytorch/issues/27064.
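
A hedged, minimal sketch of the protocol these changes extend to
`torch.functional` and `torch.nn.functional` (the exact signature of
`__torch_function__` has evolved across releases; this follows the
modern classmethod form):

```python
import torch
import torch.nn.functional as F

class Wrapped:
    def __init__(self, t):
        self.t = t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap arguments, dispatch to the plain implementation, rewrap.
        unwrapped = [a.t if isinstance(a, Wrapped) else a for a in args]
        return Wrapped(func(*unwrapped, **kwargs))

y = F.relu(Wrapped(torch.tensor([-1.0, 2.0])))  # dispatches via the protocol
```
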
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32799

Differential Revision: D19956809

Pulled By: ezyang

fbshipit-source-id: 40d34e0109cc4b9f3ef62f409d2d35a1d84e3d22
2020-02-21 08:38:37 -08:00
6cec555926 Replace AT_CHECK with TORCH_CHECK in torch/csrc/jit/pybind_utils.h (#33524)
Summary:
This was generating a considerable number of warnings, since the header
file is included in multiple places.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33524

Differential Revision: D20006604

Pulled By: ezyang

fbshipit-source-id: 0885cd2a708679ba5eeabb172366eb4c5a3bbef4
2020-02-21 08:38:32 -08:00
90f4c5695e Revert "Revert D19975411: Remove special case codegen for tril_indices/triu_indices." (#33572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33572

This reverts commit 687a7e4a2566861c53c8fb53a80b198465168b38.

Original PR #33305

Reland with BC tests whitelisted. See https://github.com/pytorch/pytorch/issues/33580 for reasoning why this change is not actually BC breaking.

Test Plan: Imported from OSS

Differential Revision: D20011011

Pulled By: ezyang

fbshipit-source-id: 116374efc93af12b8ad738a0989d6f0daa9569e2
2020-02-21 08:36:32 -08:00
e2a9ea0f72 Ensure that lambda is no less than zero in softshrink (#33201)
Summary:
Softshrink is ill-defined when `lambda < 0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33201
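
A hedged illustration of the added validation:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4)
y = F.softshrink(x, lambd=0.5)    # valid: lambda >= 0
try:
    F.softshrink(x, lambd=-0.5)   # ill-defined: now rejected
except RuntimeError:
    pass
```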

Differential Revision: D19899571

Pulled By: ezyang

fbshipit-source-id: ac0dd8edea3435810a76a3a88152f83a024c7859
2020-02-21 08:34:06 -08:00
a6a72ac68f Fix all occurrences of C416. (#33429)
Summary:
C416: Unnecessary (list/set) comprehension - rewrite using list/set().

See https://pypi.org/project/flake8-comprehensions/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33429

Differential Revision: D19972858

Pulled By: ezyang

fbshipit-source-id: faac042a94c59d737bd5ae983121a0a029346e23
2020-02-21 08:32:22 -08:00
4588f49f68 Kill cudaDeviceAllocator in THCState (#33380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33380

Differential Revision: D19973151

Pulled By: ezyang

fbshipit-source-id: 41634c43b28ca723e39e761afd32e5015e122368
2020-02-21 08:06:11 -08:00
a943b0518b strict check for a device type in Fuser (#33025)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33025

Differential Revision: D19975873

Pulled By: Krovatkin

fbshipit-source-id: 57f160bec9e4285dda63611f12665264754aac32
2020-02-20 23:53:27 -08:00
e8a03438cc Make TestCuda.test_memory_stats more robust (#33575)
Summary:
IIUC Python does not guarantee when an object is garbage collected, so it is possible that some other test running before `TestCuda.test_memory_stats` creates an object that is only garbage collected during `TestCuda.test_memory_stats`, causing the memory stats to change and the test to fail. This kind of failure is very hard to debug (it took me, mcarilli, and ptrblck quite a while to figure out what was happening), and it is the root cause of mcarilli's gradient scaling PR https://github.com/pytorch/pytorch/pull/26512 failing on Windows.
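
A hedged sketch of the stabilizing pattern (illustrative, not the exact
test code):

```python
import gc
import torch

# Force collection so objects leaked by earlier tests cannot be freed
# mid-test and skew the CUDA memory statistics we are about to compare.
gc.collect()
torch.cuda.empty_cache()
baseline = torch.cuda.memory_allocated()
```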

cc: csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33575

Differential Revision: D20009260

Pulled By: ngimel

fbshipit-source-id: 62f2716aefac3aa6c7d1898aa8a78e6b8aa3075a
2020-02-20 21:02:55 -08:00
009293ec5c [pytorch][size] remove unused SparseCPUType from mobile build (#33517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33517

I don't think any mobile model uses the SparseCPU backend yet, so we can skip
generating dispatch code for this backend type.

This will help reduce mobile code size with dynamic dispatch turned on,
roughly ~100K for uncompressed iOS: D19616007 +413K v.s. D19616016 +319K.

It probably doesn't affect static dispatch build size much, as the unused
static dispatch methods will be stripped by the linker in the end.
ghstack-source-id: 98615810

Test Plan: - CI & BuildSizeBot

Reviewed By: linbinyu

Differential Revision: D19978633

fbshipit-source-id: 27bf6ada2ba98482084cf23724cf400b538b0a03
2020-02-20 20:12:36 -08:00
ac9b40164d Use cheaper check in isTensorList (#33528)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33528

Test Plan: Imported from OSS

Reviewed By: ajyu

Differential Revision: D19989166

Pulled By: zdevito

fbshipit-source-id: b0c484e037ca48226ed4d9204a06982e0c627ff0
2020-02-20 20:10:51 -08:00
d3d975cbf6 Updating submodules
Summary:
GitHub commits:

a16cb11a77
d92f4e3e1e
d021412065
a7c056b5b4
ac6d53d1c9
d75ce0a8ae
622abbcbb3
e1f7368d51
dc2e654b75
50c9e44631

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 452151a75a70f744cba309b2700f274275d476bd
2020-02-20 18:25:57 -08:00
9266bde970 [pytorch] Minor: add GIL assert to PythonRpcHandler::handleExceptionGILHeld (#33557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33557

We should add GIL asserts in some places to keep assumptions documented.
This just adds one in an exception codepath as a placeholder for more.

This change also moves a #define from a .h to the .cpp to reduce scope.
ghstack-source-id: 98673532

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D20005387

fbshipit-source-id: b7eff54a6f1dd69d199f8ca05cdb3001c50b37c4
2020-02-20 18:15:44 -08:00
0bde610c14 Re-sync with internal repository (#33591) 2020-02-20 16:46:16 -08:00
3498c000e2 [TVM] Remove dynamic batch size dispatching (#33584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33584

- Remove dynamic batch size dispatching
- Set caffe2_tvm_min_ops to 8
- Set caffe2_tvm_profiling_based_jit to false
- Rename some variable names

Test Plan: buck test caffe2/caffe2/fb/tvm:test_tvm_transform

Reviewed By: yinghai

Differential Revision: D19850620

fbshipit-source-id: 2ec9bbd9fa72f953e79f3e27609ad00d4e135710
2020-02-20 16:13:29 -08:00
faa800eb5b [JIT] remove inline everything jitter skip (#33468)
Summary:
The `not inline_everything` check was causing the jitter check to be skipped whenever we emitted a function. Thanks SplitInfinity for pointing this out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33468

Differential Revision: D19975934

Pulled By: eellison

fbshipit-source-id: 03faf8d2fd93f148100d8cf49cb67b8e15cf1f04
2020-02-20 15:58:25 -08:00
c882425c24 Add 64-bit indexing support to THC index reductions (#33405)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32863 (together with https://github.com/pytorch/pytorch/issues/33310 for the `TensorIterator` reductions)

This adds 64-bit indexed kernels for `THC_reduceDimIndex` and uses `THCTensor_canUse32BitIndexMath` to switch between the two at runtime.

I have a test for this locally but haven't included it here because `max` is much slower than `argmax`, to the point where the test takes several minutes to call max on just one `2**32`-element tensor. That seems excessive, even for a slow test, but I can push it if preferred.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33405

Differential Revision: D20010769

Pulled By: ezyang

fbshipit-source-id: a8a86f662598d5fade4d90448436418422c699a3
2020-02-20 15:20:14 -08:00
23846d5a38 [caffe2] use Clang identification macro in various places (#33574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33574

Sprinkle the Clang identification macro over places that would otherwise cause build errors when Clang is used to drive the CUDA compilation.

Note: `__clang__` is defined when either Clang is used as host compiler by NVCC or when Clang drives the compilation. `__CUDA__` is defined only for the latter case.

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```

Reviewed By: BIT-silence

Differential Revision: D20007440

fbshipit-source-id: 53caa70695b99461a3910d41dc71a9f6d0728a75
2020-02-20 15:16:11 -08:00
5782758b54 Add instructions and operators for new bytecode format of PyText model (#33555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33555

A quick fix for the PyText model (in internal production) on the new bytecode format.

Test Plan: Imported from OSS

Differential Revision: D20008266

Pulled By: iseeyuan

fbshipit-source-id: 1916bd0bf41093898713c567c7f6fa546b9ea440
2020-02-20 15:05:37 -08:00
108fc78395 [caffe2] fix invalid % escape in inline assembly strings (#33554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33554

NVCC/GCC accept the existing syntax, but Clang does not; it requires a proper escape. Here `%laneid` is one of the many registers that CUDA's pseudo-asm provides [1]. And using the extra `%` doesn't change the semantics, as PTX expects the `%laneid` value after it's processed by the asm tool.

1. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```

Reviewed By: bddppq

Differential Revision: D20003621

fbshipit-source-id: 8e550e55a3455925e7bd92c6df3e504b5d38c2dc
2020-02-20 14:31:52 -08:00
e5cf7afd0a torch.tensor can infer complex dtype now (#33361)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33361

Test Plan: Imported from OSS

Differential Revision: D19943477

Pulled By: anjali411

fbshipit-source-id: ff6d7d2a6fdb6c58390f33bdd8be2f3fa182518b
2020-02-20 14:24:15 -08:00
13e4ee7883 Added tensor.is_complex(), is_complex and dtype.is_complex py binding, tensor printing, and dixed the scalar type returned for complex float (#33268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33268

Test Plan: Imported from OSS

Differential Revision: D19907698

Pulled By: anjali411

fbshipit-source-id: c3ce2e99fc09da91a90a8fb94e5525a00bb23703
2020-02-20 13:38:01 -08:00
36d724c963 run peephole to do profile-based optimizations (#33337)
Summary:
We need to run a peephole pass before constant propagation in the profiling pipeline, so that we fold `prim::shape` for inputs with complete tensor types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33337

Differential Revision: D19905624

Pulled By: Krovatkin

fbshipit-source-id: 80fff067941556053847ddc7afe0fd1c7a89a3ba
2020-02-20 12:39:22 -08:00
1a25747342 Check for consistent devices in at::where (#33432)
Summary:
Changelog:
- Add a check to ensure that all inputs to `where` lie on the same device
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33432

Test Plan:
- Added test_where_invalid_device

Fixes https://github.com/pytorch/pytorch/issues/33422
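
A hedged illustration of the new check:

```python
import torch

cond = torch.tensor([True, False])
a = torch.zeros(2)
b = torch.ones(2)
out = torch.where(cond, a, b)   # fine: all inputs on the same device
# If e.g. b lived on a CUDA device while cond and a were on CPU,
# torch.where would now raise a clear error instead of misbehaving.
```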

Differential Revision: D19981115

Pulled By: VitalyFedyunin

fbshipit-source-id: 745896927edb53f61f3dd48ba9e1e6cd10d35434
2020-02-20 12:18:01 -08:00
71225ecc8c Revert D20006312: Revert D19975410: Update documentation on why _cudnn_init_dropout_state looks the way it is.
Test Plan: revert-hammer

Differential Revision:
D20006312

Original commit changeset: 4d4cc8ae78ad

fbshipit-source-id: 4bd4b9d1331dc97f5b83e0df491be5fd0a11214a
2020-02-20 12:05:13 -08:00
687a7e4a25 Revert D19975411: Remove special case codegen for tril_indices/triu_indices.
Test Plan: revert-hammer

Differential Revision:
D19975411

Original commit changeset: 996598759bed

fbshipit-source-id: 6bdb4b8f903e13815fc146e6f3260e5bb04c1045
2020-02-20 11:29:53 -08:00
d19a50bf27 Add missing weight_decay parameter validation for Adam and AdamW (#33126)
Summary:
Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126
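
A hedged illustration of the added check:

```python
import torch

params = [torch.randn(2, requires_grad=True)]
opt = torch.optim.Adam(params, lr=1e-3, weight_decay=0.01)   # valid
try:
    torch.optim.Adam(params, lr=1e-3, weight_decay=-0.01)    # now rejected
except ValueError:
    pass
```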

Differential Revision: D19860366

Pulled By: vincentqb

fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc
2020-02-20 11:11:51 -08:00
cdf381c967 Fix LambdaLR scheduler side effects (#32848)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32756
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32848

Differential Revision: D19859736

Pulled By: vincentqb

fbshipit-source-id: 43b3cbb2b6bed208c75aad37aebc2a8a9565fe0d
2020-02-20 11:09:56 -08:00
3233033a17 Revert D19975410: Update documentation on why _cudnn_init_dropout_state looks the way it is.
Test Plan: revert-hammer

Differential Revision:
D19975410

Original commit changeset: eb729870c2d2

fbshipit-source-id: 4d4cc8ae78ad18751c126b93d82932ac2732f1b5
2020-02-20 11:01:44 -08:00
718c538ff9 Add ability to enable/disable MIOpen at runtime (#33118)
Summary:
1. Set `torch._C.has_cudnn` to `True` for ROCm
2. Make MIOpen invocations respect value of `cudnn_enabled` or `at::globalContext().userEnabledCuDNN()`
3. `torch/backends/cudnn/__init__.py`: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33118
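
A hedged example of the runtime toggle this exposes (on ROCm builds
these flags now govern MIOpen):

```python
import torch

torch.backends.cudnn.enabled = False   # disable cuDNN/MIOpen-backed kernels
torch.backends.cudnn.enabled = True    # re-enable them
print(torch.backends.cudnn.is_available())
```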

Differential Revision: D19977719

Pulled By: bddppq

fbshipit-source-id: 64d4dd1d78afcf96201360d85b8be5950f96dfad
2020-02-20 10:47:57 -08:00
01e1de8220 allow remote torchscript call to itself (#32990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32990

Right now a remote TorchScript call cannot target the caller itself. This diff supports that in the same way it is supported for remote Python calls.
ghstack-source-id: 98599082

Test Plan: unit test

Differential Revision: D19731910

fbshipit-source-id: 6495db68c3eaa58812aa0c5c1e72e8b6057dc5c4
2020-02-20 09:44:10 -08:00
a9e4448dff Update documentation on why _cudnn_init_dropout_state looks the way it is. (#33347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33347

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19975410

Pulled By: ezyang

fbshipit-source-id: eb729870c2d279d7d9ca43c92e514fe38dedb06d
2020-02-20 09:36:26 -08:00
196fda5a79 Remove special case codegen for tril_indices/triu_indices. (#33305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33305

The current TensorOptions code is written to exactly extract out
TensorOptions based on exact struct match, including default arguments.
That meant that tril_indices/triu_indices which had a different
default argument didn't match, and thus needed a special case.

I resolve this special case by instead replacing the explicit long
default argument with a None default argument, and then adjusting
the actual implementations to select the correct dtype when none
was specified.  I think the general rule I'm following here is that
it is always acceptable to replace an explicit default argument,
with a None argument (assuming the backend will compute it appropriately);
the documentation gets modestly worse, but everything that was
previously expressible continues to be expressible.  Maybe later
we should switch the default argument back to long, but for now
the simplification in code is worth it.
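
A hedged illustration of the resulting behavior:

```python
import torch

idx = torch.tril_indices(3, 3)                      # dtype defaults to long
idx32 = torch.tril_indices(3, 3, dtype=torch.int)   # explicit override
assert idx.dtype == torch.long and idx32.dtype == torch.int32
```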

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19975411

Pulled By: ezyang

fbshipit-source-id: 996598759bed9e8d54fe61e19354ad038ed0e852
2020-02-20 09:34:28 -08:00
ffe327f7d9 Revert "Disable flaky test TestCppExtensionAOT.test_cuda_extension in… (#33404)
Summary:
… Windows CI (https://github.com/pytorch/pytorch/issues/33282)"

This reverts commit 5b922918d023126ad1f468c68577c9b599ad202d.

Fixes https://github.com/pytorch/pytorch/issues/33270.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33404

Differential Revision: D19972594

Pulled By: ezyang

fbshipit-source-id: c8f67536fd6e4b7135171d621ad671b1b2a21fd4
2020-02-20 09:08:29 -08:00
05fb160048 Revert D19964089: [pytorch][PR] Allow vectorized gpu loop to have different argument types
Test Plan: revert-hammer

Differential Revision:
D19964089

Original commit changeset: a1e8e62d1ebc

fbshipit-source-id: fee9423d5924714f0e92eea712cde2d2163b3cf0
2020-02-20 08:19:21 -08:00
883b18ea70 Delete build_variables.bzl following configerator change.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2020-02-20 10:26:49 -05:00
e95282ab28 [caffe2] make fused rowwise quant/dequant op work for N-dim tensors (#33426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33426

Make 2/4/8-bit fused rowwise conversion operators more general to work for N-dim tensors

Test Plan: CI

Reviewed By: ellie-wen

Differential Revision: D19943136

fbshipit-source-id: 47008544dd7e1d11a346d34f35449e0fcc0e7ee0
2020-02-19 23:29:42 -08:00
bf0951d937 Updating ONNX checker logic. (#33522)
Summary:
We want to run ONNX checker only when selected operator type is ONNX, and nowhere else. This PR updates the logic in the exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33522

Reviewed By: hl475

Differential Revision: D19983954

Pulled By: houseroad

fbshipit-source-id: 15db726321637a96fa110051cc54e9833e201133
2020-02-19 19:30:29 -08:00
1fe635be3c Allow vectorized gpu loop to have different argument types (#33222)
Summary:
Although currently the only user of GPU loops that has args with different dtypes is `where`, it sounds strange to restrict the args to have the same dtype. Allowing args to have different dtypes also makes it possible for me to clean up legacy code by reusing current code to implement unrolled GPU loop for non-contiguous tensors.

The stack storage of `elementwise_kernel_helper` is changed from `arg_t args[nt][arity]` to `traits::ArgsTuple args[nt]`. Due to this change, we can no longer get an element by `operator[]`; instead we should use `std::get`. As a result, we can no longer unroll the loop wrt arity using pragma; we have to create a `static_unroll` that uses template meta-programming to do the same job.

A good side effect of this change is that `invoke_with_array` is no longer needed and can be replaced with the already existing `c10::guts::apply`. And we don't need the `namespace arg_type` workaround either. This makes the code less ugly.

The same approach might also work for ROCm loops, but I didn't change anything on ROCm in this PR, because I don't want potential compilation error or perf regression to delay this PR. But after this gets merged, I will try on ROCm and send a separate PR to make the code less diverge if the same approach trivially applies (trivially apply means a mindless copy-paste doesn't introduce unexpected compilation error or perf regression).

Assembly (https://github.com/zasdfgbnm/things/blob/master/2020Q1/disassembly-elementwise-vec.ipynb#33222):
```
**Symbol:**
void at::native::modern::elementwise_kernel<4, 64, 4, at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar)::{lambda()https://github.com/pytorch/pytorch/issues/1}::operator()() const::{lambda()https://github.com/pytorch/pytorch/issues/4}::operator()() const::{lambda(float, float)https://github.com/pytorch/pytorch/issues/1}, at::detail::Array<char*, 3> >(int, at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar)::{lambda()https://github.com/pytorch/pytorch/issues/1}::operator()() const::{lambda()https://github.com/pytorch/pytorch/issues/4}::operator()() const::{lambda(float, float)https://github.com/pytorch/pytorch/issues/1}, at::detail::Array<char*, 3>)

**ASM:**

	.section	.text._ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,"ax",progbits
	.sectioninfo	@"SHI_REGISTERS=20"
	.align	128
        .global         _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_
        .type           _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,function
        .size           _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,(.L_40520 - _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_)
        .other          _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,@"STO_CUDA_ENTRY STV_DEFAULT"
_ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_:
.text._ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_:
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 253
        /*0000*/                   IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
        /*0020*/                   S2R R9, SR_CTAID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 39
        /*0030*/                   S2R R0, SR_TID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 253
        /*0040*/                   IMAD.SHL.U32 R9, R9, 0x100, RZ ;
        /*0050*/                   IADD3 R5, -R9, c[0x0][0x160], RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227
        /*0060*/                   SHF.R.S32.HI R17, RZ, 0x1f, R9 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 255
        /*0070*/                   ISETP.GE.AND P0, PT, R5, 0x100, PT ;
        /*0080*/              @!P0 BRA `(.L_2919) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227
        /*0090*/                   IMAD.SHL.U32 R12, R9.reuse, 0x4, RZ ;
        /*00a0*/                   SHF.L.U64.HI R17, R9, 0x2, R17 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 229
        /*00b0*/                   IADD3 R8, P0, R12.reuse, c[0x0][0x188], RZ ;
        /*00c0*/                   IADD3 R2, P1, R12, c[0x0][0x190], RZ ;
        /*00d0*/                   IADD3.X R9, R17.reuse, c[0x0][0x18c], RZ, P0, !PT ;
        /*00e0*/                   IADD3.X R3, R17, c[0x0][0x194], RZ, P1, !PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 82
        /*00f0*/                   IMAD.WIDE R8, R0, 0x10, R8 ;
        /*0100*/                   IMAD.WIDE R2, R0, 0x10, R2 ;
        /*0110*/                   LDG.E.128.SYS R8, [R8] ;
        /*0120*/                   LDG.E.128.SYS R4, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227
        /*0130*/                   IADD3 R12, P0, R12, c[0x0][0x180], RZ ;
        /*0140*/                   IADD3.X R13, R17, c[0x0][0x184], RZ, P0, !PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102
        /*0150*/                   IMAD.WIDE R12, R0, 0x10, R12 ;
	//## File "/usr/include/c++/8/tuple", line 1315
        /*0160*/                   FFMA R7, R7, c[0x0][0x168], R11 ;
        /*0170*/                   FFMA R6, R6, c[0x0][0x168], R10 ;
        /*0180*/                   FFMA R5, R5, c[0x0][0x168], R9 ;
        /*0190*/                   FFMA R4, R4, c[0x0][0x168], R8 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102
        /*01a0*/                   STG.E.128.SYS [R12], R4 ;
        /*01b0*/                   EXIT ;
.L_2919:
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*01c0*/                   ISETP.GE.AND P0, PT, R0, R5, PT ;
        /*01d0*/                   BMOV.32.CLEAR RZ, B0 ;
        /*01e0*/                   BSSY B0, `(.L_2920) ;
        /*01f0*/                   IMAD.MOV.U32 R4, RZ, RZ, RZ ;
        /*0200*/                   CS2R R6, SRZ ;
        /*0210*/                   IMAD.MOV.U32 R8, RZ, RZ, RZ ;
        /*0220*/                   IMAD.MOV.U32 R10, RZ, RZ, RZ ;
        /*0230*/               P0 BRA `(.L_2921) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*0240*/                   IADD3 R3, P1, R9, R0, RZ ;
        /*0250*/                   LEA.HI.X.SX32 R6, R0, R17, 0x1, P1 ;
        /*0260*/                   LEA R2, P1, R3, c[0x0][0x188], 0x2 ;
        /*0270*/                   LEA.HI.X R3, R3, c[0x0][0x18c], R6, 0x2, P1 ;
        /*0280*/                   LDG.E.SYS R10, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*0290*/                   IADD3 R6, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*02a0*/                   ISETP.GE.AND P1, PT, R6, R5, PT ;
        /*02b0*/               P1 BRA `(.L_2922) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*02c0*/                   LDG.E.SYS R6, [R2+0x100] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*02d0*/                   IADD3 R8, R0, 0x80, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*02e0*/                   ISETP.GE.AND P1, PT, R8, R5, PT ;
        /*02f0*/               P1 BRA `(.L_2923) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*0300*/                   IADD3 R8, R0, 0xc0, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*0310*/                   ISETP.GE.AND P1, PT, R8, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*0320*/                   LDG.E.SYS R8, [R2+0x200] ;
        /*0330*/              @!P1 LDG.E.SYS R7, [R2+0x300] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102
        /*0340*/               P1 IMAD.MOV.U32 R7, RZ, RZ, RZ ;
        /*0350*/                   BRA `(.L_2921) ;
.L_2923:
        /*0360*/                   IMAD.MOV.U32 R7, RZ, RZ, RZ ;
        /*0370*/                   IMAD.MOV.U32 R8, RZ, RZ, RZ ;
        /*0380*/                   BRA `(.L_2921) ;
.L_2922:
        /*0390*/                   CS2R R6, SRZ ;
        /*03a0*/                   IMAD.MOV.U32 R8, RZ, RZ, RZ ;
.L_2921:
        /*03b0*/                   BSYNC B0 ;
.L_2920:
        /*03c0*/                   BMOV.32.CLEAR RZ, B0 ;
        /*03d0*/                   BSSY B0, `(.L_2924) ;
        /*03e0*/               P0 BRA `(.L_2925) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*03f0*/                   IADD3 R3, P1, R9, R0, RZ ;
        /*0400*/                   LEA.HI.X.SX32 R12, R0, R17, 0x1, P1 ;
        /*0410*/                   LEA R2, P1, R3, c[0x0][0x190], 0x2 ;
        /*0420*/                   LEA.HI.X R3, R3, c[0x0][0x194], R12, 0x2, P1 ;
        /*0430*/                   LDG.E.SYS R11, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*0440*/                   IADD3 R12, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*0450*/                   ISETP.GE.AND P1, PT, R12, R5, PT ;
        /*0460*/               P1 BRA `(.L_2926) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*0470*/                   LDG.E.SYS R13, [R2+0x100] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*0480*/                   IADD3 R12, R0, 0x80, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*0490*/                   ISETP.GE.AND P1, PT, R12, R5, PT ;
        /*04a0*/               P1 BRA `(.L_2927) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*04b0*/                   LDG.E.SYS R15, [R2+0x200] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*04c0*/                   IADD3 R12, R0, 0xc0, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*04d0*/                   ISETP.GE.AND P1, PT, R12, R5, PT ;
        /*04e0*/               P1 BRA `(.L_2928) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*04f0*/                   LDG.E.SYS R4, [R2+0x300] ;
        /*0500*/                   BRA `(.L_2928) ;
.L_2927:
        /*0510*/                   IMAD.MOV.U32 R15, RZ, RZ, RZ ;
        /*0520*/                   BRA `(.L_2928) ;
.L_2926:
        /*0530*/                   IMAD.MOV.U32 R15, RZ, RZ, RZ ;
        /*0540*/                   IMAD.MOV.U32 R13, RZ, RZ, RZ ;
        /*0550*/                   BRA `(.L_2928) ;
.L_2925:
        /*0560*/                   IMAD.MOV.U32 R15, RZ, RZ, RZ ;
        /*0570*/                   IMAD.MOV.U32 R13, RZ, RZ, RZ ;
        /*0580*/                   IMAD.MOV.U32 R11, RZ, RZ, RZ ;
.L_2928:
        /*0590*/                   BSYNC B0 ;
.L_2924:
	//## File "/usr/include/c++/8/tuple", line 1315
        /*05a0*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*05b0*/                   IADD3 R9, P0, R9, R0, RZ ;
	//## File "/usr/include/c++/8/tuple", line 1315
        /*05c0*/                   FFMA R11, R11, c[0x0][0x168], R10 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59
        /*05d0*/                   IADD3 R14, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*05e0*/                   LEA.HI.X.SX32 R12, R0, R17, 0x1, P0 ;
        /*05f0*/                   LEA R2, P0, R9.reuse, c[0x0][0x180], 0x2 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*0600*/                   ISETP.GE.AND P1, PT, R14, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*0610*/                   LEA.HI.X R3, R9, c[0x0][0x184], R12, 0x2, P0 ;
        /*0620*/                   STG.E.SYS [R2], R11 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*0630*/               P1 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59
        /*0640*/                   IADD3 R10, R0, 0x80, RZ ;
	//## File "/usr/include/c++/8/tuple", line 1315
        /*0650*/                   FFMA R13, R13, c[0x0][0x168], R6 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*0660*/                   ISETP.GE.AND P0, PT, R10, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*0670*/                   STG.E.SYS [R2+0x100], R13 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*0680*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59
        /*0690*/                   IADD3 R0, R0, 0xc0, RZ ;
	//## File "/usr/include/c++/8/tuple", line 1315
        /*06a0*/                   FFMA R15, R15, c[0x0][0x168], R8 ;
        /*06b0*/                   FFMA R7, R4, c[0x0][0x168], R7 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*06c0*/                   ISETP.GE.AND P0, PT, R0, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*06d0*/                   STG.E.SYS [R2+0x200], R15 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*06e0*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*06f0*/                   STG.E.SYS [R2+0x300], R7 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 260
        /*0700*/                   EXIT ;
.L_2929:
        /*0710*/                   BRA `(.L_2929);
        /*0720*/                   NOP;
        /*0730*/                   NOP;
        /*0740*/                   NOP;
        /*0750*/                   NOP;
        /*0760*/                   NOP;
        /*0770*/                   NOP;
.L_40520:
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33222

Differential Revision: D19964089

Pulled By: ngimel

fbshipit-source-id: a1e8e62d1ebcc67fb49f00d87c02bcdd13194024
2020-02-19 18:41:27 -08:00
81394581a3 [Caffe2][ThreadPool] Make sure numThreads does not exceed the number of big cores (#33523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33523

When using `ThreadPool::setNumThreads` to set the number of threads, it should not exceed the number of big cores. Otherwise, the performance could degrade significantly.

Test Plan:
```
cd ~/fbsource/xplat
buck test caffe2:caffe2_testAndroid
```

Reviewed By: dreiss

Differential Revision: D19779267

fbshipit-source-id: 4e980e8a0ccc2f37e1c8ed16e2f4651d72924dbd
2020-02-19 18:24:24 -08:00
602ef0d9d0 [WIP] migrate scatter_ to ATen CPU (+multithreading, nondeterministic) (#33139)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24757 and partially addresses https://github.com/pytorch/pytorch/issues/33094. Uses the fix introduced in https://github.com/pytorch/pytorch/issues/33108 to avoid regressions for some compilers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33139

Differential Revision: D19882462

Pulled By: ngimel

fbshipit-source-id: 5016f186a4aadc3cc32edcfd9abdea11786f27e9
2020-02-19 18:17:37 -08:00
6cb9e6b015 Back out "Revert D19871946: [distributed] pass in timeout to TCP store when initializing" (#33434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33434

Reland of https://github.com/pytorch/pytorch/pull/33325, since the
unit test was flaky and failed on land.
To ensure that the test is not flaky, I bumped the timeout so the rendezvous
does not timeout (timing out the rendezvous in 1s led to the flakiness). I also
generalized our mechanism for retrying on errors to include retrying on errors
due to timeout in rendezvous.
ghstack-source-id: 98558377

Test Plan: Added UT test_tcp_store_timeout_set

Differential Revision: D19935390

fbshipit-source-id: 56ccf8c333dd2f954a33614d35cd1642d4e9473a
2020-02-19 17:17:17 -08:00
ecb05f12c3 Support broadcast for quantized mul kernel (#30442)
Summary:
Since the tensor iterator supports broadcasting, we just remove the assertion on input shapes.
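
To illustrate, a minimal sketch of the newly allowed behavior; the `quantized::mul` signature with an output scale and zero point is an assumption here:
```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3)  # broadcastable against `a`
qa = torch.quantize_per_tensor(a, scale=0.05, zero_point=0, dtype=torch.quint8)
qb = torch.quantize_per_tensor(b, scale=0.05, zero_point=0, dtype=torch.quint8)

# Assumed op signature: quantized::mul(qa, qb, output_scale, output_zero_point).
qc = torch.ops.quantized.mul(qa, qb, 0.05, 0)
print(qc.shape)  # torch.Size([2, 3]) -- shapes were broadcast, no assertion fires
```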
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30442

Differential Revision: D19976562

Pulled By: lly-zero-one

fbshipit-source-id: 91b27fc8b2570f29d110c6df26eacdd16f587b9f
2020-02-19 16:52:31 -08:00
ea514c819a Make slow_conv_transpose2d_backward tensors contiguous (#33462)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33462

Test Plan: Imported from OSS

Differential Revision: D19956516

Pulled By: VitalyFedyunin

fbshipit-source-id: 4fa9dcba0dd02b891ab36e6ecee8fc59e049c15c
2020-02-19 16:44:14 -08:00
e5a02aa2fe [caffe2] simplify relative error expr (#32999)
Summary:
simplify relative error expr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32999

Differential Revision: D19739382

Pulled By: jerryzh168

fbshipit-source-id: 95e0c68f6d9cb6708f400cc1cdb311af83b0621e
2020-02-19 16:35:44 -08:00
bd3c6e8e91 avoid large vector copy when query per_channel q_params (#31040)
Summary:
The quantizer uses std::vector to store per-channel scales and zero_points, but queries for scales (zero_points) are required to return tensors. This leads to initializing tensors from std::vector, which costs a lot of time. So I changed the quantizer to store per-channel scales and zero_points as tensors directly.
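
For context, a small sketch of the API this affects; per-channel scales and zero points are now stored (and returned) as tensors rather than rebuilt from std::vector:
```python
import torch

w = torch.randn(4, 8)
scales = torch.rand(4)                       # one scale per output channel
zero_points = torch.zeros(4, dtype=torch.long)
qw = torch.quantize_per_channel(w, scales, zero_points, axis=0, dtype=torch.qint8)

# These queries return tensors; with this change they come straight from the
# quantizer's stored tensors instead of being constructed from std::vector.
print(qw.q_per_channel_scales())
print(qw.q_per_channel_zero_points())
```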
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31040

Differential Revision: D19701070

Pulled By: jerryzh168

fbshipit-source-id: 9043f16c44b74dd8289b8474e540171765a7f92a
2020-02-19 16:24:24 -08:00
8527ba8b70 [jit] Add None parameter as parameter instead of attributes (#32964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32964

att

Test Plan:
.

Imported from OSS

Differential Revision: D19913188

fbshipit-source-id: 9cdd93cbaf9892f4311656c786637765a675a68c
2020-02-19 16:06:56 -08:00
507f963aa6 [RPC Reliability] Enabled retries for RPCs with exponential backoff (#33365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33365

This adds functionality for retrying RPCs that are sent with the function sendWithRetries(). It adds RPCs that will potentially need to be retried to a sorted map keyed by the timeout at which to retry the RPC, along with associated metadata. A separate thread iteratively removes the earliest retryable RPC from the map, sleeps until the corresponding time point, retries the RPC, and re-adds it to the map with a future timeout.
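
As a rough illustration of the bookkeeping described above, here is a minimal Python sketch of retry-with-exponential-backoff; all names are hypothetical and this is not the actual C++ implementation:
```python
import time

def send_with_retries(send_fn, rpc, max_retries=5, base_delay=0.1, backoff=2.0):
    # Hypothetical sketch: retry a failed send with exponentially growing
    # delays, giving up after max_retries attempts.
    for attempt in range(max_retries):
        try:
            return send_fn(rpc)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * backoff ** attempt)
```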

GitHub Issue: https://github.com/pytorch/pytorch/issues/32124

Per the first 4 milestones, the following will be addressed in future PR's:
* enabling RPC Retries for RRef internal messages

Differential Revision: D19915694

fbshipit-source-id: 4a520e32d5084ebcf90e97fd9f26867115a35c0c
2020-02-19 15:59:29 -08:00
416413dec4 [jit] add inlined_graph method to ScriptFunctions (#33508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33508

Ever since we switched to not inlining by default, some users have
complained, since they relied on inlining occurring in order to, e.g., process the
graph with some other tool. Add an inlined_graph property for convenience in
those cases.
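
For example (a minimal sketch):
```python
import torch

@torch.jit.script
def inner(x):
    return torch.relu(x)

@torch.jit.script
def outer(x):
    return inner(x) + 1

print(outer.graph)          # `inner` shows up as a call, not inlined
print(outer.inlined_graph)  # the call to `inner` is inlined into one flat graph
```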

Test Plan: Imported from OSS

Differential Revision: D19977638

Pulled By: suo

fbshipit-source-id: fe1fa92ff888959203d5d1995930d488b5f9e24c
2020-02-19 15:41:25 -08:00
5e80ca12bb [pt][fbgemm] Turn on USE_FBGEMM on Windows env (#297)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/297

Pull Request resolved: https://github.com/pytorch/pytorch/pull/33250

As Title says. FBGEMM has recently added the support for Windows.

ghstack-source-id: 97932881

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D19738268

fbshipit-source-id: e7f3c91f033018f6355edeaf6003bd2803119df4
2020-02-19 15:09:21 -08:00
cbf8657945 [jit] Fix ModuleDict type sharing (#33515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33515

Previously, if we had a `ModuleDict` with the same value types but
different names for keys, they would share types under certain
conditions. This only happens for `ModuleDict`, because in other cases
a simple Python class check invalidates the class.

Test Plan: Imported from OSS

Differential Revision: D19978552

Pulled By: suo

fbshipit-source-id: f31b2af490064f89b70aa35f83ba740ddaf2a77a
2020-02-19 15:01:46 -08:00
8908b62fb2 Clean views created inside no_grad that are modified inplace (#32839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32839

As mentioned in the updated comment in `variable.h`, this disambiguates code like:
```python
import torch

var = torch.rand((), requires_grad=True)  # source tensor, undefined in the original snippet
base = torch.rand(10, requires_grad=True)
with torch.no_grad():
    view = base[1]
view.copy_(var)
torch.autograd.grad(base.sum(), var)  # <- what should it return?
```
Given that there is no consensus on what should happen here (does the gradient flow through the view created inside the no_grad block or not?), this special case is detected and forbidden.
As mentioned in the error message:
- If you want the copy to be tracked, move both operations out of the no_grad block.
- If you do not want it to be tracked, move both inside the no_grad block.

This implies that any custom Function that returns views does not allow in-place modification of its output. I'll add a PR to the stack to relax this to a DeprecationWarning for now, and we will make it an actual error in 1.6.

This replaces https://github.com/pytorch/pytorch/pull/26607
cc sublee

Test Plan: Imported from OSS

Differential Revision: D19814114

Pulled By: albanD

fbshipit-source-id: ff2c9d97c8f876d9c31773a2170e37b06d88bed7
2020-02-19 14:55:53 -08:00
20c1e25832 Re-sync with internal repository (#33519) 2020-02-19 14:33:44 -08:00
1d9fcf8bd2 Correct documentation for torch.unsqueeze (#33478)
Summary:
"out" argument in torch.unsqueeze is not actually implemented, fixed documentation https://github.com/pytorch/pytorch/issues/29800
After: ![image](https://user-images.githubusercontent.com/33493903/74796371-6289ee00-5296-11ea-8493-e8c18ac63bdf.png)

Before: ![image](https://user-images.githubusercontent.com/33493903/74796444-96651380-5296-11ea-816c-2adacfa79e35.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33478

Differential Revision: D19978477

Pulled By: yf225

fbshipit-source-id: 42337326c1ec04975307366c94591ee32a11b091
2020-02-19 14:01:06 -08:00
62c953b348 Fix svd tests between devices. (#33470)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33470

Differential Revision: D19974449

Pulled By: ailzhang

fbshipit-source-id: e456608fe95d270d822e786a5955cce7c746165c
2020-02-19 13:53:10 -08:00
a8bd1d24c9 [Documentation] cummin doc fix (#33492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33492

Differential Revision: D19976082

Pulled By: anjali411

fbshipit-source-id: c9f8f541783fded98b8aba54e293f824c926496e
2020-02-19 13:51:38 -08:00
d4e4513a64 [JIT] Add more ops to 'removableGuard' in guard elimination pass. (#33465)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33465

Differential Revision: D19958385

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: f89b6a2ead279b55af286072223fc9ea1b5fe3b3
2020-02-19 11:47:23 -08:00
07e5e42713 [jit][fix] Remove slot in parameter slot (#32846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32846

att

Test Plan:
build/bin/test_jit

Imported from OSS

Differential Revision: D19844711

fbshipit-source-id: 3d29e5e97e97781f5dc00069827971baed52d76e
2020-02-19 11:15:15 -08:00
1e3664b6ef Remove c/pdist tests from _internal/common_utils.py (#33409)
Summary:
* remove brute_test from `torch/testing/_internal/common_utils.py`
* add these tests as internal tests to `test_torch.py`

CC ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33409

Differential Revision: D19951729

Pulled By: ailzhang

fbshipit-source-id: b1126aaf26fa64a0f17cbb582dc8038b79cfe3eb
2020-02-19 10:27:30 -08:00
60339a38ed Fixes #33001 (#33456)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/33001.

When subtracting 1 from an empty array's size, the result becomes a very large number instead of the `-1` the later code (the while loop) seems to expect, because `size()` is unsigned. This causes a segfault during the while loop later in the code, where it tries to access the empty array.

This issue seemed to happen only on the Pi with the following example code: `v = torch.FloatTensor(1, 135).fill_(0); v[0, [1]] += 2`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33456

Differential Revision: D19963711

Pulled By: ezyang

fbshipit-source-id: 1dbddd59a5df544cd7e025fc540c9efe2c4e19f4
2020-02-19 09:57:52 -08:00
165b1ad8e8 Kill THCState_getNumDevices (#33375)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33375

Differential Revision: D19973163

Pulled By: ezyang

fbshipit-source-id: d8edede3a3ac5012e4208bb30b6e66d8a2d1019f
2020-02-19 09:52:40 -08:00
96e5dea9f4 Remove unused variable (#33484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33484

att

Test Plan: unittests

Reviewed By: jfix71

Differential Revision: D19862090

fbshipit-source-id: c6a33604e2fc78fb90ae2b5fcc72421ee89a02aa
2020-02-19 08:51:56 -08:00
d7f00b1b45 Remove using declaration from widely-used header file. (#33293)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33293

Test Plan: Imported from OSS

Differential Revision: D19904992

Pulled By: gchanan

fbshipit-source-id: b5ac76db2e5cdb422671c6c5424858e1d97c323e
2020-02-19 08:19:11 -08:00
a67691e508 Fix isnan for integral types in MSVC (#33483)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/32537#discussion_r381077989.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33483

Differential Revision: D19970623

Pulled By: anjali411

fbshipit-source-id: 53502101822672a333ab5349d93b6e93f7ee4265
2020-02-19 08:13:03 -08:00
53ad596342 [jit] Remove `torch.jit._dump_trace (#33453)
Summary:
This was old code that isn't tested and is broken; it should have been
deleted in #24874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/33453

Pulled By: driazati

Differential Revision: D19961403

fbshipit-source-id: 94c52360460194d279dad5b0ea756ee366f525e1
2020-02-19 07:49:44 -08:00
8b6a898d2b Updating submodules
Summary:
GitHub commits:

d9ead2de34

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 6c245f2a656d30b7baf8d0bff85a49090174c289
2020-02-19 05:09:56 -08:00
d13c1b8af8 [jit] de-optionalize SourceRange context (#32880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32880

The PR below made it impossible to construct a SourceRange without a
context, so get rid of its optional-ness

Test Plan: Imported from OSS

Differential Revision: D19670923

Pulled By: suo

fbshipit-source-id: 05936fca2a3d5e613313ade9287b2210bc4a3ccd
2020-02-18 23:46:05 -08:00
d85c913bfd [jit] Delete the ErrorReport default constructor (#32879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32879

An error report without a SourceRange context is bad, because it doesn't
tell the user where something happened. Delete the default constructor
to make it harder to create errors like this (you can still use a fake
SourceRange if you absolutely need to).

Also clean up the only case where the default constructor was used.

Test Plan: Imported from OSS

Differential Revision: D19670924

Pulled By: suo

fbshipit-source-id: 46888a86e5d32b84c8d6d52c0c8d70243722b14a
2020-02-18 23:44:32 -08:00
e9ac92a242 Make RPC message constructor actually move (#33440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33440

The constructors make a copy without `std::move` in the initializer list.

Test Plan:
Confirmed manually that without this change, the `data()` pointer of
the vector changes. With this change it does not, as intended.

Reviewed By: mrshenli

Differential Revision: D19948685

fbshipit-source-id: ee4f22e29894b858ad86068722dc2f4651987517
2020-02-18 23:31:33 -08:00
d50305e2f3 Updating submodules
Summary:
GitHub commits:

7903fc3142
462eaef5fc
e2966a7507
09013ed8c4
df7e47c39b
f40e6d1dbf

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 37553007eb60438d5ddd9cb16f0edc24e4637c25
2020-02-18 23:27:08 -08:00
a5f01846c2 Kill THCState_getCurrentStream (#33376)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33376

Differential Revision: D19964101

Pulled By: ngimel

fbshipit-source-id: d6b76327191a469f3a88a54d8ffe07121139ab16
2020-02-18 21:24:27 -08:00
96989a2a11 [ONNX] Adding ONNX large model export support in exporter (#33062)
Summary:
There are large models such as GPT2-large which cannot be exported with the current exporter because of the 2GB protobuf limit (e.g. see https://github.com/pytorch/pytorch/issues/19277). ONNX spec specifies a special format for large (> 2GB)  models. This PR adds support for exporting large models in ONNX large model format in the PyTorch-ONNX exporter.

This is the first PR for this feature that enables the end-to-end execution. Tests for large model export have been added. We may need follow-up PRs to refine this workflow based on user feedback.
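
As a hedged usage sketch, assuming the feature is surfaced through a `use_external_data_format` flag on `torch.onnx.export` (the exact argument name may differ):
```python
import torch

model = torch.nn.Linear(10, 10)   # stand-in for a model exceeding 2GB
dummy = torch.randn(1, 10)

# Assumed flag: weights are written as external files next to the .onnx
# file instead of being embedded in a single >2GB protobuf.
torch.onnx.export(model, dummy, "model.onnx", use_external_data_format=True)
```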
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33062

Reviewed By: hl475

Differential Revision: D19782292

Pulled By: houseroad

fbshipit-source-id: e972fcb066065cae6336aa91c03023d9c41c88bd
2020-02-18 20:51:43 -08:00
3ad59734d7 Add type annotation for bias in _ConvNd (#32885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32885

Currently a Tensor bias is registered as a parameter and a None bias is registered as an attribute.
We need the type annotation because when we try to fold ConvBn in graph mode quantization we'll
remove the None bias attribute and add a Tensor bias attribute. Without the type annotation, the
bias Value in the graph would be marked with a different type in these two cases, so we would have
to rewrite the graph to change the type as well. With the type annotation we don't need to modify
the graph, since in both cases the bias value has type `Tensor?`.

Test Plan:
.

Imported from OSS

Differential Revision: D19844710

fbshipit-source-id: 52438bc72e481ab78560533467f9379a8b0b0cfa
2020-02-18 20:09:18 -08:00
feaa622fc6 [Update transforms.py] Add TanhTransform (#19785)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/33195
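
A minimal usage sketch: composing the new transform with a base Normal yields a distribution on (-1, 1), e.g. for bounded action spaces:
```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

base = Normal(torch.zeros(3), torch.ones(3))
squashed = TransformedDistribution(base, [TanhTransform()])

x = squashed.rsample()        # values lie strictly in (-1, 1)
logp = squashed.log_prob(x)   # includes the log|det J| of tanh
```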
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19785

Differential Revision: D19642395

Pulled By: ezyang

fbshipit-source-id: 73c386fb89cd195201757b5fa47d6c01914a1f8f
2020-02-18 17:42:10 -08:00
43e015f4b1 Bug fix in dynamic quantization kernels + better test coverage. (#33320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33320

Reviewed By: supriyar

Differential Revision: D19893911

Pulled By: AshkanAliabadi

fbshipit-source-id: e79dd06af333c6629e3412315550814da28d9c24
2020-02-18 15:32:44 -08:00
f1b73799d5 Clean up isinstance flags (#33265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33265

This removes the need for isinstance to keep track of list and tuple
separately by introducing AnyListType and AnyTupleType into the JIT
type system to be the common supertype of any lists or tuples.

This allows us to remove the weird flags from the interpreter for
the isinstance operator.

Test Plan: Imported from OSS

Differential Revision: D19883933

Pulled By: zdevito

fbshipit-source-id: f998041b42d8b4554c5b99f4d95d1d42553c4d81
2020-02-18 15:07:06 -08:00
7f2c25b6fa Move special ops into interpreter (#32889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32889

Common primitive ops that have special inputs make it very hard to
serialize the bytecode for mobile because information about how the
op behaves is hidden in the Node*. This changes how we handle the following
ops so that they are encoded as their own interpreter bytecodes.

```
    USES NODE: prim::TupleUnpack(...) -> (...)
    USES NODE: prim::TupleSlice(...) -> (...)
    USES NODE: prim::TupleConstruct(...) -> (...)
    USES NODE: prim::ListUnpack(...) -> (...)
    USES NODE: prim::ListConstruct(...) -> (...)
    USES NODE: prim::DictConstruct(...) -> (...)
    USES NODE: prim::Constant() -> (...)
    USES NODE: prim::isinstance(...) -> (...)
    USES NODE: prim::CreateObject(...) -> (...)
    USES NODE: prim::fork(...) -> (...)
    USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```

This leaves a state where the _only_ remaining Node*-consuming builtins
are things that are only introduced during JIT optimization and will
not appear in mobile code.

Serialization of bytecode can now be made to directly write the CodeImpl
object without modification.

Test Plan: Imported from OSS

Differential Revision: D19673157

Pulled By: zdevito

fbshipit-source-id: 7b8c633d38a4c783b250fbdb222705e71a83ad26
2020-02-18 15:07:01 -08:00
83c347ff4a Remove prim::Constant op (#32804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32804

Constants are interpreter primitives so the op was not actually used.
This cleans up some of the logic around it.

This also fixes constant prop such that failures to look up an op
do not silently stop constant propagation. Instead, only errors
inside the op implementation itself will do this.

Test Plan: Imported from OSS

Differential Revision: D19673156

Pulled By: zdevito

fbshipit-source-id: 7beee59a6a67a6c2f8261d86bd505280fefa999e
2020-02-18 15:06:56 -08:00
c59e35b147 interpreter handling for varargs to remove need for looking at Node (#32791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32791

When a registered operator has varargs (ends with ... in its schema),
the interpreter now appends the number of arguments to the top of
the stack before invoking the operator. This allows the removal of more
uses of Node* in the interpreter.

This PR also then cleans up the constructors for Operator to make
it more likely someone chooses the correct one. After making these ops:

```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```

into interpreter primitives, we can remove all but two constructors for operators:
one that is (schema_string, operation), and one that is (symbol, op_creator) for
the remaining weird primitives.

Test Plan: Imported from OSS

Differential Revision: D19673158

Pulled By: zdevito

fbshipit-source-id: 95442a001538a6f53c1db4a210f8557ef118de66
2020-02-18 15:04:48 -08:00
da015c77a1 Cummax and Cummin doc update and performance benchmark (#32537)
Summary:
Benchmark results for cummax and cummin (CUDA tensor `x`, then CPU tensor `y`):

```
In [1]: import torch

In [2]: x=torch.randn(5,6,7).cuda()

In [3]: %timeit x.cummax(0)
134 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: %timeit x.max(0)
114 µs ± 560 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit x.cummax(1)
134 µs ± 760 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit x.max(1)
118 µs ± 514 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit x.cumsum(0)
97.1 µs ± 6.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit x.cumprod(0)
83.6 µs ± 689 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit x.cumprod(1)
86.3 µs ± 528 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [10]: y=torch.randn(5,6,7)

In [11]: %timeit y.cummax(0)
148 µs ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit y.max(0)
111 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [13]: %timeit y.cumsum(0)
54.8 µs ± 311 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [14]: %timeit y.cumprod(0)
56.2 µs ± 836 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32537

Differential Revision: D19951171

Pulled By: anjali411

fbshipit-source-id: cf972c550189473e9ce62e24ac7dd34b9373fef9
2020-02-18 14:12:25 -08:00
016d73bd74 remove Complex CPU/CUDA backend enum keys (#33267)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33267

Test Plan: Imported from OSS

Differential Revision: D19907696

Pulled By: anjali411

fbshipit-source-id: 78cc55344313387c4b05bb003688915cee64e3be
2020-02-18 13:38:39 -08:00
1d743e3154 Add guard elimination support for aten::unsqueeze. (#33371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33371

Differential Revision: D19920041

Pulled By: resistor

fbshipit-source-id: 906af47676dba014c31eef069a4753207f2efc60
2020-02-18 13:22:58 -08:00
1af30451e5 sync srcs between fbcode and ovrsource targets (#33368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33368

reorganizing files that describe sources to ensure the same list is used for both fbcode and ovrsource targets. (BUCK vs TARGETS)

Test Plan: CI green

Reviewed By: malfet

Differential Revision: D19803036

fbshipit-source-id: 69c1fa10877c3f0c0e9c1517784949c3c9939710
2020-02-18 13:00:43 -08:00
44af8ee6cd Add pybind11 exception translator (#30588)
Summary:
Closes https://github.com/pytorch/pytorch/issues/30027

The idea here is that you can bind a function with `pybind11` in a single line and without modifying the function:
```cpp
m.def("foo", foo, py::call_guard<torch::PyWarningHandler>());
```
Where warnings are handled by the [`call_guard`](https://pybind11.readthedocs.io/en/stable/advanced/functions.html#call-guard) and exceptions are handled by the `pybind11` exception translator. To do this, I have added support for handling C++ exceptions in `torch::PyWarningHandler`'s destructor without setting the Python error state beforehand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30588

Differential Revision: D19905626

Pulled By: albanD

fbshipit-source-id: 90c0a5e298b123cc0c8ab9c52c91be4e96ea47c6
2020-02-18 11:33:29 -08:00
4c8064c9e1 Fix avx-512 detection logic for jit fuser with MSVC 2019 (#33403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33401.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33403

Differential Revision: D19949812

Pulled By: soumith

fbshipit-source-id: 00dc3c99b5ba1c13394d5d38bcb148720434b0a3
2020-02-18 11:04:18 -08:00
abbf6e7f53 fix clang-tidy lint (#33448)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33448

Test Plan: Imported from OSS

Differential Revision: D19952962

Pulled By: suo

fbshipit-source-id: db04bf74f6156edd1bd0716b12f6ca911c84a6bf
2020-02-18 11:02:57 -08:00
4468a7b7b3 Updating submodules
Summary:
GitHub commits:

efc34423b6
75bb459654
fc1945c2e0
332a31a145
2b6eef4dc9

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: d105b9aa5001c53f884f007406684b73809a7680
2020-02-18 10:21:04 -08:00
f938b3b4e0 Remove TH binding of set_(Tensor). (#33358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33358

We just translate this code to ATen.

Test Plan: Imported from OSS

Differential Revision: D19911114

Pulled By: gchanan

fbshipit-source-id: 2279e63bb7006f7253620417937e3ce9301e0cdb
2020-02-18 10:10:00 -08:00
879cf0b15a fix typing bug of LambdaLR.__init__ (#33271)
Summary:
## problem

```python
class LambdaLR(_LRScheduler):
    """Sets the learning rate of each parameter group to the initial lr
    times a given function. When last_epoch=-1, sets initial lr as lr.

    Args:
        optimizer (Optimizer): Wrapped optimizer.
        lr_lambda (function or list): A function which computes a multiplicative
            factor given an integer parameter epoch, or a list of such
            functions, one for each group in optimizer.param_groups.
        last_epoch (int): The index of last epoch. Default: -1.

    Example:
        >>> # Assuming optimizer has two groups.
        >>> lambda1 = lambda epoch: epoch // 30
        >>> lambda2 = lambda epoch: 0.95 ** epoch
        >>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
        >>> for epoch in range(100):
        >>>     train(...)
        >>>     validate(...)
        >>>     scheduler.step()
    """
```

`LambdaLR` takes a lambda that takes an int and returns a float, or a list of such lambdas.

## related issue

Resolve https://github.com/pytorch/pytorch/issues/32645
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33271

Differential Revision: D19878665

Pulled By: vincentqb

fbshipit-source-id: 50b16caea13de5a3cbd187e688369f33500499d0
2020-02-18 09:10:00 -08:00
2c99ea8654 Dirac init compatibility with group convolutions (#32825)
Summary:
Initializing the weights of a grouped convolution with init.dirac_ and applying it previously resulted in an output that makes no sense:
```
x = torch.randn([1, 3, 3, 3])
print('input:\n', x)
conv_layer = torch.nn.Conv2d(3, 3, 3, padding=1, groups=3, bias=False)
torch.nn.init.dirac_(conv_layer.weight.data)
print('\noutput (before this PR):\n',conv_layer(x))

input:
 tensor([[[[ 0.5369, -1.1428,  0.1031],
          [ 0.4638, -0.0854, -0.6553],
          [ 0.8321, -2.5926, -0.3214]],

         [[-0.2289, -0.0895,  0.4407],
          [ 1.2309, -1.2096, -1.5216],
          [-0.1798,  1.1694,  0.3469]],

         [[ 0.1905,  0.8095,  0.5490],
          [-0.4525, -0.4284, -0.1141],
          [ 1.1857, -0.9246, -0.5119]]]])

output (before this PR):
 tensor([[[[ 0.5369, -1.1428,  0.1031],
          [ 0.4638, -0.0854, -0.6553],
          [ 0.8321, -2.5926, -0.3214]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]]]], grad_fn=<MkldnnConvolutionBackward>)
````

This PR allows introducing groups to the initialization:
```
torch.nn.init.dirac_(conv_layer.weight.data, groups=3)
print('output (after this PR):\n', conv_layer(x))

output (after this PR):
 tensor([[[[ 0.5369, -1.1428,  0.1031],
          [ 0.4638, -0.0854, -0.6553],
          [ 0.8321, -2.5926, -0.3214]],

         [[-0.2289, -0.0895,  0.4407],
          [ 1.2309, -1.2096, -1.5216],
          [-0.1798,  1.1694,  0.3469]],

         [[ 0.1905,  0.8095,  0.5490],
          [-0.4525, -0.4284, -0.1141],
          [ 1.1857, -0.9246, -0.5119]]]], grad_fn=<MkldnnConvolutionBackward>)
```

When out_channels differs from in_channels, it does the natural thing, which is applying the identity in each group separately:

```
x = torch.randn([1, 2, 3, 3])
print('input:\n', x)
conv_layer = torch.nn.Conv2d(2, 4, 3, padding=1, groups=2, bias=False)
torch.nn.init.dirac_(conv_layer.weight.data, groups=2)
print('\noutput:\n', conv_layer(x))

input:
 tensor([[[[ 1.2205, -0.6608,  0.8640],
          [-0.5464,  1.1288,  1.4726],
          [-0.6693,  0.4000, -1.7613]],

         [[-0.8760, -0.8814, -0.4705],
          [ 0.6283, -0.5943,  0.6873],
          [-0.6852,  1.4723,  0.3325]]]])

output:
 tensor([[[[ 1.2205, -0.6608,  0.8640],
          [-0.5464,  1.1288,  1.4726],
          [-0.6693,  0.4000, -1.7613]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]],

         [[-0.8760, -0.8814, -0.4705],
          [ 0.6283, -0.5943,  0.6873],
          [-0.6852,  1.4723,  0.3325]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]]]], grad_fn=<MkldnnConvolutionBackward>)
```

The 'groups' argument defaults to 1, so the change is backward compatible.

Tests are modified to include cases with groups > 1 while still covering groups = 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32825

Differential Revision: D19859926

Pulled By: vincentqb

fbshipit-source-id: 9dfdd24471ff14d79c442dfd28c1891aff812fdf
2020-02-18 09:00:12 -08:00
28c5213a97 Add mechanism to pass a number of workers to cpp extensions (#33346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33346

Fixes #33091

This PR lets users control the number of workers that cpp extensions
use through the environment variable `MAX_JOBS`. If the environment
variable is a non-negative integer we use that many threads; otherwise,
ninja falls back to the default.

I chose to use the name `MAX_JOBS` because we use it in PyTorch already
to control the number of workers PyTorch builds with. There is a risk
that users of cpp extensions already have `MAX_JOBS` set but we are
hoping that that risk is small and/or it means semantically the same
thing.
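
A small sketch of the intended usage (the extension name and source path are placeholders):
```python
import os

# Cap ninja at 4 parallel compile jobs before building the extension.
os.environ["MAX_JOBS"] = "4"

from torch.utils.cpp_extension import load

# ext = load(name="my_ext", sources=["my_ext.cpp"])  # placeholder paths
```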

Test Plan: - tested locally

Differential Revision: D19911645

Pulled By: zou3519

fbshipit-source-id: d20ed42de4f845499ed38f1a1c73e9ccb620f780
2020-02-18 06:48:11 -08:00
cfb4862673 [pytorch] correct input size check for GroupNorm (#33008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33008

Corrects D19373507 to allow valid use cases that currently fail. Multiplies the batch size by the number of elements in a group to get the correct number of elements over which statistics are computed.

**Details**:
The current implementation disallows applying GroupNorm to tensors of shape e.g. `(1, C, 1, 1)`, to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros.
However, in GroupNorm the statistics are calculated across channels. So in the case where one has an input tensor of shape `(1, 256, 1, 1)` for `GroupNorm(32, 256)`, the statistics will be computed over 8 elements and thus be meaningful.

One use case is [Atrous Spatial Pyramid Pooling (ASPPPooling)](791c172a33/torchvision/models/segmentation/deeplabv3.py (L50)), where GroupNorm could be used in place of BatchNorm [here](791c172a33/torchvision/models/segmentation/deeplabv3.py (L55)). However, now this is prohibited and results in failures.

The proposed solution corrects the computation of the number of elements over which statistics are computed: the number of elements per group is folded into the batch size.
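
A concrete example of a shape that was rejected before and is valid after this change:
```python
import torch

gn = torch.nn.GroupNorm(32, 256)
x = torch.randn(1, 256, 1, 1)  # 8 elements per group, so stats are meaningful
y = gn(x)                      # previously raised; now allowed
print(y.shape)                 # torch.Size([1, 256, 1, 1])
```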

Test Plan: check that existing tests pass

Reviewed By: fmassa

Differential Revision: D19723407

fbshipit-source-id: c85c244c832e6592e9aedb279d0acc867eef8f0c
2020-02-18 06:43:53 -08:00
dde2ff4608 [Fuser] Add a knob for disabling/enabling CUDA fuser. (#33395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33395

By default the GPU fuser stays enabled, but this function allows disabling
it manually. It will be useful for working on other fuser implementations.
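
A sketch of how such a knob is typically toggled from Python; the binding name `torch._C._jit_override_can_fuse_on_gpu` is an assumption here:
```python
import torch

# Assumed Python binding for the C++ knob described above.
torch._C._jit_override_can_fuse_on_gpu(False)  # disable the CUDA fuser
# ... run TorchScript code without GPU fusion ...
torch._C._jit_override_can_fuse_on_gpu(True)   # re-enable it
```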

Test Plan: Imported from OSS

Differential Revision: D19926911

Pulled By: ZolotukhinM

fbshipit-source-id: 7ea9d1dd7821453d640f81c487b63e1d585123c4
2020-02-17 21:28:09 -08:00
a203dc2e6d [C++ API] Allow skipping default arguments in module's forward method when module is used in Sequential (#33027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33027

This PR allows default arguments in module's forward method to be skipped when module is used in `torch::nn::Sequential`, by introducing the `FORWARD_HAS_DEFAULT_ARGS` macro and requiring that all modules that have default arguments in its forward method must have a corresponding `FORWARD_HAS_DEFAULT_ARGS` macro call.

Fixes issue mentioned in https://github.com/pytorch/pytorch/issues/30931#issuecomment-564144468.

Test Plan: Imported from OSS

Differential Revision: D19777815

Pulled By: yf225

fbshipit-source-id: 73282fcf63377530063e0092a9d84b6c139d2e32
2020-02-17 20:38:02 -08:00
4724964810 [C++ API] Expose AnyValue and AnyModuleHolder classes (#33026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33026

This PR contains necessary changes to prepare for https://github.com/pytorch/pytorch/pull/33027. It exposes the following classes to public:
1. `torch::nn::AnyValue`, because if the user has optional arguments in their module's forward method, they must also use the `FORWARD_HAS_DEFAULT_ARGS` macro and pass in the default values for those optional arguments wrapped by `torch::nn::AnyValue`.
2. `torch::nn::AnyModuleHolder`, because `torch::nn::Module` needs to declare it as a friend class for it to be able to access `torch::nn::Module`'s protected methods such as `_forward_has_default_args` / `_forward_num_required_args` / `_forward_populate_default_args`.

Test Plan: Imported from OSS

Differential Revision: D19777814

Pulled By: yf225

fbshipit-source-id: 1c9d5aa24f0689154752c426a83ee98f64c9d02f
2020-02-17 20:35:22 -08:00
5d7f42847c Add at::Tensor::retain_grad API (#33349)
Summary:
This PR adds `at::Tensor::retain_grad`, and its implementation mirrors the Python `torch.Tensor.retain_grad` API:
c6271c63f2/torch/tensor.py (L292-L315)
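
Since the C++ API mirrors the Python one, the semantics can be sketched in Python:
```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2            # non-leaf tensor: its .grad is normally not kept
y.retain_grad()      # ask autograd to populate y.grad anyway
y.sum().backward()
print(y.grad)        # tensor([1., 1., 1.])
```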
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33349

Differential Revision: D19944524

Pulled By: yf225

fbshipit-source-id: e61d5d761996b6d1b860c04c4b4650c1a49a6a8c
2020-02-17 20:03:48 -08:00
55fa133cdc Remove gpu_kernel_with_index (#33370)
Summary:
Although `gpu_kernel_with_index` might look like a quite general helper function at first glance, it actually isn't.

The problem is not only 32-bit indexing, but something more fundamental: `TensorIterator` reorders dims and shapes, so if you have a non-contiguous tensor such as `torch.empty(5, 5).t()`, the index won't be correct. Since the whole point of `TensorIterator` is to manipulate shapes/strides to speed up loops, it is fundamentally impossible to get the correct linear index without tons of effort.

Currently, the only reason the range factories are not failing on `out=non_contiguous_tensor` is that, luckily, `has_internal_overlap` is stupid enough to return everything not contiguous as `TOO_HARD`.

Since `gpu_kernel_with_index` is not general, we should move it from `Loops.cuh` to `RangeFactories.cu`. And since the kernel is so simple to implement, it makes no sense to use `TensorIterator` which goes through tons of unnecessary checks like `compute_dtypes`.

`torch.range` is not tested for 64-bit indexing, and I will file a new PR to remove it (it was supposed to be removed in 0.5).

Benchmark:
The device is GTX-1650, I don't have a good GPU at home.

Code:
```python
import torch
print(torch.__version__)

for i in range(100):
    torch.randn(1000, device='cuda')
torch.cuda.synchronize()

for i in range(15, 29):
    %timeit torch.arange(2 ** i, device='cuda'); torch.cuda.synchronize()
```

Before:
```
1.5.0a0+c37a9b8
11.9 µs ± 412 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.7 µs ± 309 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
19.6 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
28.9 µs ± 923 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
48.4 µs ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
85.7 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
162 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
312 µs ± 9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
618 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.22 ms ± 9.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.45 ms ± 97.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.9 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.1 ms ± 378 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After:
```
1.5.0a0+7960d19
11 µs ± 29.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.4 µs ± 550 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.4 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
27.6 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
46.2 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
83.3 µs ± 5.61 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
158 µs ± 373 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
307 µs ± 1.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
603 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.2 ms ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.4 ms ± 23.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.77 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.51 ms ± 933 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33370

Differential Revision: D19925990

Pulled By: ngimel

fbshipit-source-id: f4a732fe14a5582b35a56618941120d62e82fdce
2020-02-17 17:15:04 -08:00
ebb008eb68 Optimize Unfold3dAcc to improve performance of conv3d backward (#33317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33317

Optimize Unfold3dAcc to improve performance of conv3d backward

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "Conv3d"

Reviewed By: houseroad

Differential Revision: D19892678

fbshipit-source-id: 18873dd1d1409263d9925840db302b21fb3b490d
2020-02-17 14:49:02 -08:00
c90b393c00 Fix logging for aborted communicators in ProcessGroupNCCL. (#33147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33147

The log mentioned that it is aborting communicators even if
`blockingWait_` was false. This was incorrect, and I updated the logging to
reflect the appropriate behavior.
ghstack-source-id: 98025017

Test Plan: waitforbuildbot

Differential Revision: D19817967

fbshipit-source-id: fb3415af2cc99eb20981ceaa5203c0a1880fd6f3
2020-02-17 14:42:51 -08:00
1a589f50bd [auto quant] Add quant_scheme_generator to interface with dper
Summary:
Add quant_scheme_generator that will be used to interface with dper.

Also updated two related functions:

- Add batch_size option to save_local_dataset() in dataset utils to be more flexible.

Test Plan:
Tested in the stacked diff D19747206.

buck test deeplearning/numeric_suite/toolkit/test:int8_static_utils_test

Reviewed By: csummersea

Differential Revision: D19745159

fbshipit-source-id: a4ac1ef0ffdddc68bdf5e209ae801b8c475d0b96
2020-02-17 10:41:22 -08:00
87dc2dbcce Updating submodules
Summary:
GitHub commits:

19c040cb01

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: ddc41000622a682874ab3a11fdf4a91038f9c15f
2020-02-16 23:57:14 -08:00
c57f8984e6 [caffe2] make order btw div and mul in adgrad consistent (#32974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32974

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/286

Re-attempt of D18805426. Decided to be consistent with PyTorch Adagrad.

There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad. This diff makes them consistent by doing w += lr * grad / (sqrt(moment) + epsilon) in Adagrad and w += lr / (sqrt(moment) + epsilon) * grad in RowWiseSparseAdagrad.

The Adagrad order is consistent with PyTorch (see the addcmul_cpu_kernel function in aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp). The RowWiseSparseAdagrad order makes the computation more efficient: in RowWiseSparseAdagrad, lr / (sqrt(moment) + epsilon) is shared among all elements in the row.

Also, we're not going to use FMA, to be consistent with PyTorch (even though it would provide a small accuracy benefit).

Test Plan: CI

Reviewed By: wx1988

Differential Revision: D19342865

fbshipit-source-id: e950c16f2e1c4a2f2a3ef53b1705db373c67f341
2020-02-16 22:45:59 -08:00
d29997373e Updating submodules
Summary:
GitHub commits:

80dda47903
797af57bb6
b2fceb9d05

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: dde5fb9abca185422df11dc61c658dc333ad63ca
2020-02-16 21:01:37 -08:00
d4e4beddc4 Revert D19871946: [distributed] pass in timeout to TCP store when initializing
Test Plan: revert-hammer

Differential Revision:
D19871946

Original commit changeset: dd002180c4c8

fbshipit-source-id: 40b0676c51e43366c0700e81d16cc7927ee8efc2
2020-02-16 19:37:44 -08:00
df47a3abe0 [distributed] pass in timeout to TCP store when initializing (#33325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33325

Closes https://github.com/pytorch/pytorch/issues/32924. There was a bug where for TCPStore, we would not respect the timeout passed into `init_process_group` while constructing the TCPStore. Instead, we'd set the timeout after the rendezvous created the store, meaning that we used the default timeout of 300s while connecting to the server. This diff passes the timeout passed into `init_process_group` to rendezvous so that it can be passed into the constructor for TCPStore, so that we can use the right timeout at construction time.

Question: Should we make this change for FileStore as well? Currently the FileStore constructor does not take in a timeout at all.
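
A usage sketch showing where the timeout now takes effect (address and port are placeholders):
```python
from datetime import timedelta
import torch.distributed as dist

# The timeout is now honored while connecting to the TCPStore server,
# not just after rendezvous completes.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",  # placeholder address
    rank=0,
    world_size=1,
    timeout=timedelta(seconds=30),
)
```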
ghstack-source-id: 98401875

Test Plan: Added a UT

Differential Revision: D19871946

fbshipit-source-id: dd002180c4c883216645b8a97cc472c6116ac117
2020-02-16 17:59:44 -08:00
c75d06d854 Move gating part of SparseFeatureGating to local
Summary: In dper2, the local net is hard-coded by whitelisting some layers. Add SparseFeatureGating-related layers to the local net explicitly.

Test Plan:
* workflow: f167812211
* QRT: fall back looks normal

{F228442018}

Differential Revision: D19852280

fbshipit-source-id: 6fecc3d745c3f742d029575a7b9fe320618f1863
2020-02-16 14:18:27 -08:00
f6808df75f [BC] Temporarily fix the BC check (#33387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33387

CI is broken. Skip two functions to fix the problem.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D19926249

fbshipit-source-id: a46d1465c59de8616d2af5fb0b9cc18532359f88
2020-02-15 18:31:25 -08:00
495bd5818b Fix index truncation in argmin/max for large tensors (#33310)
Summary:
Fixes the `TensorIterator` parts of https://github.com/pytorch/pytorch/issues/32863 (THC is still broken)

`TensorIterator::split` now keeps track of the `view_offsets` into the full tensor range. With this, I can take the base offset for the reduced dimension and translate partial results from the sub-iter into the index range of the full tensor. This happens only once for each intermediate result, so we should still benefit from the performance of 32-bit indexing in loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33310

Differential Revision: D19906136

Pulled By: ngimel

fbshipit-source-id: 3372ee4b8d5b115a53be79aeafc52e80ff9c490b
2020-02-15 17:24:55 -08:00
cd038c0ae9 Get rid of some template arguments in GPU loop (#33308)
Summary:
Globally define
```C++
constexpr int num_threads = C10_WARP_SIZE * 2;
constexpr int thread_work_size = 4;
constexpr int block_work_size = thread_work_size * num_threads;
```
and kill all the template arguments passing these values.

These are effectively global, but we were passing them around as template arguments, causing much inconvenience in coding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33308

Differential Revision: D19907250

Pulled By: ngimel

fbshipit-source-id: 4623b69baea7e6e77f460ffdfa07cf9f8cba588a
2020-02-15 15:17:46 -08:00
fd684cc312 Use torch.set_default_dtype in test_data_parallel and rename dtype2prec (#32962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32962

As per gchanan's comments on
https://github.com/pytorch/pytorch/pull/30445, I've used
`torch.set_default_dtype` in test_data_parallel instead of specifying
dtype=torch.double everywhere. Also, renamed dtype2prec to dtype2prec_DONTUSE
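
For reference, a minimal sketch of the pattern the tests switch to:
```python
import torch

torch.set_default_dtype(torch.double)
x = torch.randn(3)                    # now created as float64
assert x.dtype == torch.float64
torch.set_default_dtype(torch.float)  # restore the usual default
```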
ghstack-source-id: 98388429

Test Plan: waitforbuildbot

Differential Revision: D19714374

fbshipit-source-id: eb55bbca33881625636ba9ea6dd4cb692f25668e
2020-02-15 14:07:54 -08:00
6dd6b0bfae Revert D19900566: [pytorch][PR] Simplify prim::shape when we have complete tensor types.
Test Plan: revert-hammer

Differential Revision:
D19900566

Original commit changeset: c8eaad70c8ea

fbshipit-source-id: 764f2139fdf19f22a397694d011078ec525f5e8a
2020-02-15 11:37:35 -08:00
d35a4c202e Add support for aten::slice to guard elimination. (#33311)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33311

Differential Revision: D19911105

Pulled By: resistor

fbshipit-source-id: 402cfe5f2e03a62b78ed13157e1462cefd9eeafb
2020-02-14 22:54:37 -08:00
c37a9b874b Updating submodules
Summary:
GitHub commits:

65758fd3b1
fb73204584
618f71a795

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 814ebbcf35bcecc62ec64854a26ea645d651fbc2
2020-02-14 20:48:09 -08:00
1e76649d30 fast setup for output tensor in tensor iterator (#33165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33165

Test Plan: Imported from OSS

Differential Revision: D19825853

Pulled By: glaringlee

fbshipit-source-id: 8f908f2e93a4e377306a77e8a771208603b20e72
2020-02-14 20:34:50 -08:00
c6271c63f2 Updating submodules
Summary:
GitHub commits:

46fd5fed10
87cd6087c6

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 402427af823fe31ac1f6e18c5a020ec6ec7cc1af
2020-02-14 20:04:48 -08:00
e1a895858f Allow to register custom passes both before and after fusion. (#33261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33261

It was requested in #33114.

Test Plan: Imported from OSS

Differential Revision: D19910600

Pulled By: ZolotukhinM

fbshipit-source-id: 827f1744b97f386065a21d1ba5d82c1f90edbe46
2020-02-14 16:28:52 -08:00
3359871f5d .circleci: Use volume mounts instead of docker cp (#33355)
Summary:
docker cp was erroring out, so let's just use volume mounts instead, which
should hopefully be more consistent

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33355

Differential Revision: D19913948

Pulled By: seemethere

fbshipit-source-id: 059ddd36a8162f946cfea451b5dcd1706f1209e9
2020-02-14 15:32:57 -08:00
dfafe2aad1 .cirlceci: Swap PYTORCH_BUILD_VERSION if on tag (#33326)
Summary:
Basically just fills out PYTORCH_BUILD_VERSION to the correct version
based on the git tag.

This makes it so that we don't have to continually edit this file
when doing releases.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33326

Differential Revision: D19911035

Pulled By: seemethere

fbshipit-source-id: e27105f3e193a49dd68452d8f60232f8a132acad
2020-02-14 14:43:29 -08:00
5cab54e0db Revert D19560159: [RPC Reliability] Implemented retries for RPCs with exponential backoff
Test Plan: revert-hammer

Differential Revision:
D19560159

Original commit changeset: 40cd86f9a25d

fbshipit-source-id: 70f5b19bc05fc34e3c912f42f9d32b9fb80aed06
2020-02-14 14:29:59 -08:00
0b5b2b864a [BC-Breaking] Rename at::Tensor::base() to _base() (#33316)
Summary:
This PR renames `at::Tensor::base()` to `at::Tensor::_base()`, to achieve parity with Python `torch.Tensor._base` API.

----

This PR is BC-breaking in the following way:

Previously, to get the tensor that this tensor is a view of, the user would call `tensor.base()` in C++. Now, they must call `tensor._base()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33316

Differential Revision: D19905687

Pulled By: yf225

fbshipit-source-id: 949d97b707b2c82becb99ac89e9ac24359d183e6
2020-02-14 14:06:58 -08:00
9c0625b004 [iOS] Add watchOS support (#33318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33318

### Summary

Recently, we have a [discussion](https://discuss.pytorch.org/t/libtorch-on-watchos/69073/14) in the forum about watchOS. This PR adds the support for building watchOS  libraries.

### Test Plan

- `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=WATCHOS ./scripts/build_ios.sh`

Test Plan: Imported from OSS

Differential Revision: D19896534

Pulled By: xta0

fbshipit-source-id: 7b9286475e895d9fefd998246e7090ac92c4c9b6
2020-02-14 14:02:22 -08:00
ecd9a5ad12 Simplify prim::shape when we have complete tensor types. (#33336)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33336

Differential Revision: D19900566

Pulled By: resistor

fbshipit-source-id: c8eaad70c8ea57ebbe920dcfbdaf6a9435b49506
2020-02-14 13:53:08 -08:00
9c8b67b179 Revert D19905015: Revert D19858239: [pytorch][PR] Refactor and add VS 14.16 and 2019 CI for Windows
Test Plan: revert-hammer

Differential Revision:
D19905015

Original commit changeset: b117e44d5552

fbshipit-source-id: a10c78aed953434f69f466bdd36f914334ba82f3
2020-02-14 13:42:29 -08:00
b730c5a3bd remove dispatch key (#33266)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33266

Test Plan: Imported from OSS

Differential Revision: D19907697

Pulled By: anjali411

fbshipit-source-id: 99fc06b7c41229e8d9ed4271de62247cda12ee6e
2020-02-14 13:26:15 -08:00
6ade7e3a15 [ROCm] Enable 3D convolutions through ROCm (#33067)
Summary:
For both the Caffe2 and PyTorch backends, enable 3D convolutions through MIOpen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33067

Reviewed By: BIT-silence

Differential Revision: D19880495

Pulled By: bddppq

fbshipit-source-id: 8f6f970910654c1c5aa871b48a04c1054875691c
2020-02-14 13:19:10 -08:00
9823662b43 [ONNX] Export split with list of sizes (#33161)
Summary:
Exporting Split with a dynamic list of split_sizes was not supported.
This PR enables export using the ONNX SplitToSequence + SequenceAt ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33161

Reviewed By: hl475

Differential Revision: D19860152

Pulled By: houseroad

fbshipit-source-id: 300afedc22b01923efb23acd1a3627aa146bb251
2020-02-14 12:46:33 -08:00
e9e9331927 Fractional Max Pooling: output ratios defined as double (#33304)
Summary:
References https://github.com/pytorch/pytorch/issues/33240
Changes options.output_ratio from a long integer to a double to allow ratios to be used to calculate the output size from the inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33304

Differential Revision: D19887318

Pulled By: yf225

fbshipit-source-id: 228c2c6bf4158307700c2a983d27d539c6b9eded
2020-02-14 12:31:39 -08:00
243cc20451 Enable inplace relu fusion for training (#33105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33105

Support inplace relu for Conv+BN+Relu fusion during training.
ghstack-source-id: 97944659

Test Plan: buck test caffe2/test:quantization --  'test_fuse_module_train \(test_quantization\.FusionTest\)' --print-passing-details

Differential Revision: D19795221

fbshipit-source-id: 056dc06050d145750c4d0044c0fc1c3febcfdafc
2020-02-14 12:15:58 -08:00
8245641091 Re-activate binary_macos_libtorch_2_7_cpu_build and binary_macos_li… (#33321)
Summary:
Re-send the PR as Intel has restored the relevant packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33321

Differential Revision: D19894221

Pulled By: zhangguanheng66

fbshipit-source-id: bc19dcfa5b17ff047f9ae09ebd8eadfb01f7ed68
2020-02-14 12:01:56 -08:00
92b67c03e4 [RPC Reliability] Implemented retries for RPCs with exponential backoff (#32602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32602

This adds functionality for retrying RPCs that are sent with the function `sendWithRetries()`. It adds RPCs that will potentially need to be retried to a sorted map keyed by the timeout at which to retry the RPC, along with associated metadata. A separate thread iteratively removes the earliest retryable RPC from the map, sleeps until the corresponding time point, retries the RPC, and re-adds it to the map with a future timeout.

GitHub Issue: https://github.com/pytorch/pytorch/issues/32124

Per the first 3 milestones, the following will be addressed in future PR's:
* enabling RPC Retries for RRef internal messages

Differential Revision: D19560159

fbshipit-source-id: 40cd86f9a25dc24367624d279a3b9720b20824cf
2020-02-14 11:57:24 -08:00
ae53f8dd25 Revert D19859905: [pytorch][PR] Gradient scaling API
Test Plan: revert-hammer

Differential Revision:
D19859905

Original commit changeset: bb8ae6966214

fbshipit-source-id: 28f1c93e8a00e3a4bbe8cc981499b15468f0b970
2020-02-14 11:03:27 -08:00
b276ddda38 remove THC dist code which is never used (#33283)
Summary:
Remove THC dist code which is never used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33283

Differential Revision: D19905361

Pulled By: gchanan

fbshipit-source-id: 367fd31e2209d36b30af31511554fdbdd67c98e4
2020-02-14 10:37:23 -08:00
4bef344210 Implementation of mixture distributions (#22742)
Summary:
Addressing issue https://github.com/pytorch/pytorch/issues/18125
This implements a mixture distribution where all components are from the same distribution family. Right now the implementation supports the `mean`, `variance`, `sample`, and `log_prob` methods.

cc: fritzo and neerajprad
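A minimal usage sketch (assuming the constructor takes the mixing `Categorical` first and the component distribution second):

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

mix = Categorical(torch.tensor([0.3, 0.7]))                          # mixing weights
comp = Normal(torch.tensor([-1.0, 1.0]), torch.tensor([0.5, 0.5]))   # two Gaussian components
gmm = MixtureSameFamily(mix, comp)

samples = gmm.sample((5,))
print(samples.shape)           # torch.Size([5])
print(gmm.log_prob(samples))   # per-sample log density under the mixture
print(gmm.mean, gmm.variance)
```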

- [x] add import and `__all__` string in `torch/distributions/__init__.py`
- [x] register docs in docs/source/distributions.rst

### Tests
(all tests live in tests/distributions.py)
- [x] add an `Example(MixtureSameFamily, [...])` to the `EXAMPLES` list,
     populating `[...]` with three examples:
     one with `Normal`, one with `Categorical`, and one with `MultivariateNormal`
     (to exercise, `FloatTensor`, `LongTensor`, and nontrivial `event_dim`)
- [x] add a `test_mixture_same_family_shape()` to `TestDistributions`. It would be good to test this with both `Normal` and `MultivariateNormal`
- [x] add a `test_mixture_same_family_log_prob()` to `TestDistributions`.
- [x] add a `test_mixture_same_family_sample()` to `TestDistributions`.
- [x] add a `test_mixture_same_family_shape()` to `TestDistributionShapes`

### Triaged for follow-up PR
- support batch shape
- implement `.expand()`
- implement `kl_divergence()` in torch/distributions/kl.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22742

Differential Revision: D19899726

Pulled By: ezyang

fbshipit-source-id: 9c816e83a2ef104fe3ea3117c95680b51c7a2fa4
2020-02-14 10:31:56 -08:00
7dde91b0ae Vectorize elu and its backward function on CPU (#32986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32986

Benchmark: (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz)

```python
import timeit
for op in ('ELU',):
    print('Forward')
    for dtype in ('torch.double', 'torch.float'):
        for n, t in [(10_000, 100000),
                    (100_000, 10000)]:
            print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit('m(a)', setup=f'import torch; m = torch.nn.{op}(); a = torch.linspace(-1, 1, {n}, dtype={dtype})', number=t))
    print('Backward')
    for dtype in ('torch.double', 'torch.float'):
        for n, t in [(20_000, 100000),
                    (200_000, 10000)]:
            print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit('y.backward(retain_graph=True)',
                                setup=f'import torch; m = torch.nn.{op}(); a = torch.linspace(-1, 1, {n}, requires_grad=True, dtype={dtype}); x = m(a); y = x.sum()',
                                number=t))
```

Before:

```
Forward
torch.nn.ELU()(a), numel() == 10000 for 100000 times, dtype=torch.double
5.292799739996553
torch.nn.ELU()(a), numel() == 100000 for 10000 times, dtype=torch.double
4.828570917001343
torch.nn.ELU()(a), numel() == 10000 for 100000 times, dtype=torch.float
3.1359513780043926
torch.nn.ELU()(a), numel() == 100000 for 10000 times, dtype=torch.float
2.7030876770004397
Backward
torch.nn.ELU()(a), numel() == 20000 for 100000 times, dtype=torch.double
4.568238995998399
torch.nn.ELU()(a), numel() == 200000 for 10000 times, dtype=torch.double
1.8908141480060294
torch.nn.ELU()(a), numel() == 20000 for 100000 times, dtype=torch.float
3.8652471189998323
torch.nn.ELU()(a), numel() == 200000 for 10000 times, dtype=torch.float
1.13068484600808
```

After:

```
Forward
torch.nn.ELU()(a), numel() == 10000 for 100000 times, dtype=torch.double
2.1265591429983033
torch.nn.ELU()(a), numel() == 100000 for 10000 times, dtype=torch.double
1.6708065870043356
torch.nn.ELU()(a), numel() == 10000 for 100000 times, dtype=torch.float
1.1806934149935842
torch.nn.ELU()(a), numel() == 100000 for 10000 times, dtype=torch.float
0.77735430400935
Backward
torch.nn.ELU()(a), numel() == 20000 for 100000 times, dtype=torch.double
4.494567882007686
torch.nn.ELU()(a), numel() == 200000 for 10000 times, dtype=torch.double
2.007220732004498
torch.nn.ELU()(a), numel() == 20000 for 100000 times, dtype=torch.float
3.615133151994087
torch.nn.ELU()(a), numel() == 200000 for 10000 times, dtype=torch.float
1.105554559995653
```

Test Plan: Imported from OSS

Differential Revision: D19794595

Pulled By: VitalyFedyunin

fbshipit-source-id: c319ec04676ced22179b8b34789ac8bf6428deab
2020-02-14 09:45:17 -08:00
1b2d2ba504 [PyTorch] Fix write-after-free (TSAN) in GraphTask::set_error() (#33156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33156

When dist_autograd_spawn_thrift's 'test_backward_node_failure_python_udf' test is
run, it encounters a TSAN error related to holding the mutex while the
underlying data structure is being deallocated.

In this change, we simply get a shared_ptr<> reference to the future and call
set_exception() without holding the lock, to avoid deallocating underneath
the lock.
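An illustrative sketch of the pattern in Python (the real fix is in C++): take a strong reference under the lock, then complete the future after the lock is released.

```python
import threading
from concurrent.futures import Future

_lock = threading.Lock()
_state = {"future": Future()}

def set_error(exc: Exception) -> None:
    with _lock:
        fut = _state["future"]   # grab a strong reference while the lock is held
    fut.set_exception(exc)       # complete it with the lock released, so completion
                                 # callbacks cannot free state still guarded by the lock
```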
ghstack-source-id: 98303434

Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/rpc:dist_autograd_spawn_thrift -- 'test_backward_node_failure_python_udf \(test_dist_autograd_spawn\.DistAutogradTestWithSpawn\)'

Differential Revision: D19821362

fbshipit-source-id: 82f735e33f8e608552418ae71592400fa3621e40
2020-02-14 09:32:17 -08:00
0c98939b7b Revert D19899550: [pytorch][PR] Second try on Von Mises: Make it JIT compatible
Test Plan: revert-hammer

Differential Revision:
D19899550

Original commit changeset: fbcdd9bc9143

fbshipit-source-id: c8a675a8b53f884acd0e6c57bc7aa15faf83d5d6
2020-02-14 08:42:16 -08:00
ff5f38f53b Revert D19858239: [pytorch][PR] Refactor and add VS 14.16 and 2019 CI for Windows
Test Plan: revert-hammer

Differential Revision:
D19858239

Original commit changeset: f068d8505886

fbshipit-source-id: b117e44d5552e157747920d8098ce3b86a29c6bf
2020-02-14 07:35:08 -08:00
b1583ceb1e Second try on Von Mises: Make it JIT compatible (#33177)
Summary:
Follow up from https://github.com/pytorch/pytorch/issues/17168 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33177

Differential Revision: D19899550

Pulled By: ezyang

fbshipit-source-id: fbcdd9bc91438164bcb2b1cbc314c765520754e1
2020-02-14 07:16:41 -08:00
ecd3c252b4 Support all length-one SLS op lowering: C2 part (#33332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33332

We check the input shape of lengths and indices of SLS and add an attribute if they are the same.

Test Plan:
```
buck test glow/fb/test/numerics:test_operator_onnxifinnpi -- test_slws_fused_8bit_rowwise_length1_graph
```

Reviewed By: ipiszy

Differential Revision: D19874903

fbshipit-source-id: 06b643b5351d0ba19ba209b5a5b599fbb38b1dfc
2020-02-13 22:53:11 -08:00
0150f40dde don't force MSVC /Ox flag which can conflict with /RTC1 in debug config (#33164)
Summary:
Relates to https://github.com/pytorch/pytorch/issues/33132

This fix doesn't add the full multi-configuration support described in https://github.com/pytorch/pytorch/issues/33132, but it at least avoids the error presented in the issue when `CMAKE_BUILD_TYPE=Debug` is used with MSVC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33164

Differential Revision: D19899727

Pulled By: ezyang

fbshipit-source-id: 28a364d920c4a3fb577c6b484ccd69a133fbcf5d
2020-02-13 22:15:20 -08:00
602aec325d Kill old cuda support (#33302)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33302

Differential Revision: D19899586

Pulled By: ezyang

fbshipit-source-id: 11293475795b4bfee9a65133bb6718649e220787
2020-02-13 21:48:07 -08:00
e5218e3e12 Add missing error messages for container modules (#29991)
Summary:
Container `Module`s, including `ModuleList`, `ParameterList`, and `ParameterDict`, should not be called like a regular `Module`.
This PR adds error messages for these special modules.
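For example (the exact error text is illustrative):

```python
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])
x = torch.randn(2, 4)

# layers(x) now raises a clear error (a ModuleList should not be called),
# instead of failing with a confusing message; iterate over it instead:
for layer in layers:
    x = layer(x)
```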
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29991

Differential Revision: D19698535

Pulled By: ezyang

fbshipit-source-id: fe156a0bbb033041086734b38f8c6fde034829bf
2020-02-13 21:34:27 -08:00
92fbf7cf97 [caffe2] use JIT'ed fp16 SLS (#32432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32432

Use JIT'ed fp16 SLS in D19477209 from Caffe2 operators

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D19477208

fbshipit-source-id: ef2ccba10f5f4c475166141bf09c266dedb92d38
2020-02-13 21:15:39 -08:00
642bd51043 [ONNX] Skip problematic ONNX test to unblock CI (#33323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33323

Skip the tests until they are fixed.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D19894675

fbshipit-source-id: 1cfc153577bf021171f4412115d84719beae7a91
2020-02-13 21:08:27 -08:00
e5c7b7b8b5 Automatic update of fbcode/onnx to 04a29addfd5b912812addb8dea5f8763fbfaad01 (#33328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33328

Previous import was 8b3f7e2e7a0f2aba0e629e23d89f07c7fc0e6a5e

Included changes:
- **[04a29add](https://github.com/onnx/onnx/commit/04a29add)**: Use // instead of # (#2598) <Lu Fang>
- **[f8e140a9](https://github.com/onnx/onnx/commit/f8e140a9)**: Kezhan/function update (#2596) <Ke Zhang>
- **[6185faae](https://github.com/onnx/onnx/commit/6185faae)**: fix the attribute types section in IR.md (#2590) <Ke Zhang>
- **[f254647a](https://github.com/onnx/onnx/commit/f254647a)**: Allow Constant operator to promote scalar and list to tensors. (#2592) <Jeremy Cochoy>
- **[f12ec799](https://github.com/onnx/onnx/commit/f12ec799)**: Add NegativeLogLikelihood(NllLoss) op (#2551) <liqunfu>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D19897554

fbshipit-source-id: d8efb5c5ac8f9d71727de33c67af681ed8ec8123
2020-02-13 21:03:17 -08:00
93179b1c1c [jit] Initial use RRef in TorchScript (#33190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33190

This enables the initial RRef type to be used inside TorchScript: a user
can pass a Python RRef into a TorchScript function and call to_here
inside it. Specifically, this PR:

- Add RRef schema type parsing
- Add python interop for RRef in Python and into JIT
- register to_here op in register_distributed_ops

More support for RRef in TorchScript will be added in future PRs
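A minimal sketch (assumes the RPC framework has already been initialized and that `RRef` is accepted as a TorchScript type annotation, as this PR enables):

```python
import torch
from torch.distributed.rpc import RRef

@torch.jit.script
def fetch_and_double(rref: RRef[torch.Tensor]) -> torch.Tensor:
    return rref.to_here() * 2  # pull the value from the owning worker
```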

Test Plan: Imported from OSS

Differential Revision: D19871244

Pulled By: wanchaol

fbshipit-source-id: 7eca6c491a84666b261c70806254b705603bd663
2020-02-13 20:17:25 -08:00
b2c5896432 [jit] Add RRef to IValue and JIT type system (#32992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32992

This PR adds RRef to IValue and the JIT type system.

- The RRefInterface abstract class inherits from intrusive_ptr_target,
  which makes it possible to hold an RRef in an IValue as an intrusive_ptr

- Add RRefType as a JIT type; it's a container type similar to the
  Future type.

Test Plan: Imported from OSS

Differential Revision: D19871242

Pulled By: wanchaol

fbshipit-source-id: cb80ca32605096f9a42ef147109fb368a7c1d4d3
2020-02-13 20:17:20 -08:00
9ae4d38a21 [rpc] Switch RRef to be managed by intrusive_ptr (#33189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33189

Add RRefInterface to Aten/Core, which will later be used by IValue

Switch the entire RPC code base to use intrusive_ptr instead of shared_ptr
so that we can add it to IValue.

Actually adding it to IValue and the JIT will be done in the next PR.

Test Plan: Imported from OSS

Differential Revision: D19871241

Pulled By: wanchaol

fbshipit-source-id: d7e1fd04b46320e0f26c18591b49c92ad30a4032
2020-02-13 20:15:31 -08:00
cb4e6d025a Updates numpy to tensor negative stride error message (#33254)
Summary:
See https://discuss.pytorch.org/t/bugs-about-torch-from-numpy-array/43312.

This update incorporates albanD's suggestion into the error message, saving future users from having to ask or look on the forums if they encounter this issue and don't mind making their arrays contiguous.
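For reference, the failure and the suggested workaround look roughly like this (the exact error message depends on the PyTorch version):

```python
import numpy as np
import torch

a = np.arange(6)[::-1]                 # reversed view -> negative stride
try:
    torch.from_numpy(a)                # raises: negative strides are unsupported
except ValueError as e:
    print(e)

t = torch.from_numpy(np.ascontiguousarray(a))  # a contiguous copy works
```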
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33254

Differential Revision: D19885808

Pulled By: mruberry

fbshipit-source-id: 8f0fd994cf8c088bf3c3940ab4dfb3ddbc5b3ede
2020-02-13 15:38:52 -08:00
a80d0330e4 add int4 fake fp16 mappings
Summary: Update this mapping with the int4 SLS ops so we can run net_runner

Test Plan: testing with net_runner

Reviewed By: jfix71

Differential Revision: D19879826

fbshipit-source-id: eac84b10e2365c21cb8a7cfbf3123e26a9945deb
2020-02-13 15:37:23 -08:00
eb9b4b1f29 handle errors in ProcessGroupAgent::listenLoop(). (#32957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32957

Closes https://github.com/pytorch/pytorch/issues/29703. If there is a
gloo timeout and `recvWork->wait()` times out in `listenLoop()`,
ProcessGroupAgent crashes since there is an unhandled exception in a thread.
This catches the exception and exits the listen loop. In a follow-up diff, we
will enhance these error conditions so that if users attempt to send RPCs
again, they are notified that the RPC agent was in a bad state and was
shut down.

This PR also adds a new option, `processGroupTimeout` to PG agent's backend
options. This allows us to control the gloo timeout.
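Sketched in Python for illustration (the actual change is in C++, and all method names here are made up):

```python
def listen_loop(agent):
    while not agent.shutting_down:
        try:
            work = agent.recv_anysource()
            work.wait()                # may raise on a gloo timeout
        except Exception as exc:
            agent.record_failure(exc)  # mark the agent as failed...
            return                     # ...and exit the loop instead of crashing
```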
ghstack-source-id: 98236783

Test Plan: Added a unit test.

Differential Revision: D19678979

fbshipit-source-id: 3895ae754f407b84aca76c6ed3cb087d19178c40
2020-02-13 14:50:05 -08:00
7ae1e023e7 glu: port cpu forward implementation to ATen (#26410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26410

I only ported the CPU forward implementation for now to try a CPU-only benchmark.

Test Plan: Imported from OSS

Differential Revision: D17454519

Pulled By: gchanan

fbshipit-source-id: ff757cf972c5627074fea2f92a670129007a49f4
2020-02-13 14:32:25 -08:00
0808485c6a Workaround performance bug / memory leak in GOMP (#32875)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32008

This is similar to CaoZhongZ's patch which runs on all OpenMP threads in the team and selectively exits early to scale the number of threads active. I have also restored the `if` clause from before https://github.com/pytorch/pytorch/issues/26963 so that running on 1 thread should still avoid additional synchronisation.

One comment is that this does slightly change the meaning of `at::get_num_threads` inside of a `parallel_for` loop since it's not guaranteed that the function was called on that many threads. I've looked at the uses within ATen and couldn't see anything that would be problematic. There are a few places in `quantized` that seem to make this assumption but they always use a grain size of 1 so should be safe:
d9e99ab544/aten/src/ATen/native/quantized/cpu/qconv.cpp (L436-L437)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32875

Differential Revision: D19775823

Pulled By: VitalyFedyunin

fbshipit-source-id: 4f843b78cdb9e2766339590d728923786a00af6d
2020-02-13 14:31:08 -08:00
bbdc5b7bd0 Optimize error checking in mvlgamma (#32665)
Summary:
- Clean up error-checking code
- Avoid unnecessary floating-point computation
- Use float instead of double when possible to avoid a massive cast over the tensor
- Use bool instead of uint8_t to make the Boolean intent clear
- Improve the error message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32665

Differential Revision: D19601920

Pulled By: VitalyFedyunin

fbshipit-source-id: 0c6c6b5ff227b1437a6c1bae79b2c4135a13cd37
2020-02-13 14:05:19 -08:00
5b922918d0 Disable flaky test TestCppExtensionAOT.test_cuda_extension in Windows CI (#33282)
Summary:
See https://github.com/pytorch/pytorch/issues/33270 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33282

Differential Revision: D19886975

Pulled By: yf225

fbshipit-source-id: 7e6756095b1bb8c55fc5acb8fc2cb02c1e89b032
2020-02-13 13:10:44 -08:00
0c93c2b142 Add a warning sign for anomaly detection (#33176) (#33239)
Summary:
Fixes [33176](https://github.com/pytorch/pytorch/issues/33176)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33239

Differential Revision: D19879847

Pulled By: albanD

fbshipit-source-id: 594b936c10f98c364331e782b64f42059413a741
2020-02-13 12:52:21 -08:00
6c6a814a2c Beef up documentation on DispatchKey.h (#33011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33011

I also reordered some of the keys in non-semantic ways to make the
organizational grouping more clear.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19796584

Pulled By: ezyang

fbshipit-source-id: 3083abadb47e9f382b9fbe981af0b34203c6ea4d
2020-02-13 12:26:19 -08:00
2e88d3d703 [quant] Add Quantized BatchNorm2d module (#33109)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33109
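A brief usage sketch (assumes the module is exposed as `torch.nn.quantized.BatchNorm2d`):

```python
import torch

x = torch.randn(2, 3, 8, 8)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

bn = torch.nn.quantized.BatchNorm2d(3)  # operates directly on quantized tensors
out = bn(qx)
print(out.dtype)  # torch.quint8
```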

Test Plan:
python test/test_quantized_nn_mods.py ModuleAPITest.test_batch_norm

Imported from OSS

Differential Revision: D19861926

fbshipit-source-id: 67315e49b4b3577b965d422ca707d927d977feeb
2020-02-13 12:15:43 -08:00
d0435604a5 [quant] Add a quantized batch_norm operator (#33080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33080

Quantized batch norm for cases where batch norm cannot be fused with conv.
AVX2 implementation is from Caffe2.

Test Plan:
python test/test_quantized.py TestQuantizedOps.test_batch_norm

Imported from OSS

Differential Revision: D19861927

fbshipit-source-id: bd8cd101fc063cb6358132ab7c651a160999293c
2020-02-13 12:15:38 -08:00
b28a834813 [codemod][lint][fbcode] Apply google-java-format
Test Plan: Sandcastle. Visual inspection.

Reviewed By: scottrice

Differential Revision: D19878711

fbshipit-source-id: be56f70b35825140676be511903e5274d1808f25
2020-02-13 12:14:14 -08:00
bf16688538 [JIT] peephole optimize values with NoneType (#33264)
Summary:
If a value has the type None, we can always replace it with a None constant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33264

Differential Revision: D19878695

Pulled By: eellison

fbshipit-source-id: 5d0e7ffb37c5747997df093fec3183039d8dff4d
2020-02-13 12:03:49 -08:00
0c474d95d9 Remove Half support in binary cross entropy and some activation functions on CPU (#33206)
Summary:
For reasons similar to https://github.com/pytorch/pytorch/issues/33021. Note that half-type support has
not shipped in any release yet, so it should be safe to remove (all forward implementations concerning this PR were added in daef363b15c8a3aaaed09892004dc655df76ff81 and 8cb05e72c69fdd837548419770f3f1ba9807c16d)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33206

Differential Revision: D19861137

Pulled By: ezyang

fbshipit-source-id: 38a3a398a716a782c26a611c56ddeab7eb7ac79e
2020-02-13 11:47:42 -08:00
946f3a9ed7 Refactor and add VS 14.16 and 2019 CI for Windows (#33117)
Summary:
Changes according to https://github.com/pytorch/pytorch/issues/18319.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33117

Differential Revision: D19858239

Pulled By: ezyang

fbshipit-source-id: f068d8505886b92c9388c9c636eab5bd20377ceb
2020-02-13 11:45:41 -08:00
2635055229 [ROCm] Enable 3D batch norms through MIOpen (#33262)
Summary:
Enable test for Caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33262

Differential Revision: D19880486

Pulled By: bddppq

fbshipit-source-id: af663a11137a53302e55198f38117ab6bdc9ec89
2020-02-13 11:29:51 -08:00
acea368095 Fix compilation error when buildng with FFMPEG (#27589)
Summary:
When building with FFMPEG, I encountered a compilation error due to a missing include/library.
I also find that the change in video_input_op.h improves the build on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27589

Differential Revision: D19700351

Pulled By: ezyang

fbshipit-source-id: feff25daa43bd2234d5e75c66b9865b672a8fb51
2020-02-13 11:23:48 -08:00
40246fa63c Gradient scaling API (#26512)
Summary:
This PR implements the gradient scaling API that mruberry, jjsjann123, ngimel, zdevito, gchanan and I have been discussing.  Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081.

Volume-wise, this PR is mostly documentation and tests.  The Python API (found entirely in `torch/cuda/amp/amp_scaler.py`) is lightweight.  The exposed functions are intended to make the implementation and control flow of gradient scaling convenient, intuitive, and performant.

The API is probably easiest to digest by looking at the documentation and examples. `docs/source/amp.rst` is the homepage for the Automatic Mixed Precision package.  `docs/source/notes/amp_examples.rst` includes several examples demonstrating common but not-immediately-obvious use cases.  Examples are backed by tests in `test_cuda.py` (and thankfully the tests pass :P).

Two small utility kernels have been added in `native/cuda/AmpKernels.cu` to improve performance and avoid host-device synchronizations wherever possible.

Existing optimizers, both in the wild and in Pytorch core, do not need to change to use the scaling API.

However, the API was also designed to establish a contract between user scripts and optimizers such that writers of _new_ custom optimizers have the control points they need to implement fast, optionally sync-free updates.  User scripts that obey the scaling API can drop such custom optimizers in and reap performance benefits without having to change anything aside from the optimizer constructor itself.  [I know what the contract with custom optimizers should be](35829f24ef/torch/cuda/amp/amp_scaler.py (L179-L184)), but I'm waiting for review on the rest of the API before I go about documenting it (it will be given a dedicated section in `docs/source/notes/amp_examples.rst`).

Currently, the gradient scaling examples do not include the auto-casting API as discussed in https://github.com/pytorch/pytorch/issues/25081.  The gradient scaling API is intended to be orthogonal/modular relative to autocasting.  Without auto-casting the gradient scaling API is fully use-_able_, but not terribly use-_ful_, so it's up to you guys whether you want to wait until auto-casting is ready before merging the scaling API as well.
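The canonical training-loop usage looks roughly like the following (a sketch only; it assumes the scaler class is exposed as `torch.cuda.amp.GradScaler`, its eventual public name):

```python
import torch

device = "cuda"
model = torch.nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    inputs = torch.randn(8, 16, device=device)
    targets = torch.randn(8, 4, device=device)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()  # scale loss so small fp16 grads don't flush to zero
    scaler.step(optimizer)         # unscales grads; skips the step if they contain inf/nan
    scaler.update()                # adjusts the scale factor for the next iteration
```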

### Todo
- [ ] How do I get c10 registered status for my two custom kernels?  They're very simple.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26512

Differential Revision: D19859905

Pulled By: mruberry

fbshipit-source-id: bb8ae6966214718dfee11345db824389e4286923
2020-02-13 11:06:06 -08:00
d613bd0522 [rpc][easy] move unnecessary python call directly to pybind (#33174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33174

Closes https://github.com/pytorch/pytorch/issues/32780. It looks like
this is the only callsite where we do `_get_current_rpc_agent().foo()`, and we
can do this directly in the pybind layer to save some overhead.
ghstack-source-id: 98200664

Test Plan: All UTs should pass.

Differential Revision: D19828786

fbshipit-source-id: 5c34a96b5a970e57e6a1fdf7f6e54c1f6b88f3d8
2020-02-13 09:14:13 -08:00
0bf60e348f Revert D19878241: [pytorch][PR] Restore tests binary_macos_libtorch_2_7_cpu_build and binary_macos_li…
Test Plan: revert-hammer

Differential Revision:
D19878241

Original commit changeset: 07bce43e4667

fbshipit-source-id: 7f76717d73e264f30e8f56fb7bc38c8928dea092
2020-02-13 09:09:11 -08:00
ff7d147732 Restore tests binary_macos_libtorch_2_7_cpu_build and binary_macos_li… (#33291)
Summary:
Fix https://github.com/pytorch/pytorch/issues/33209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33291

Differential Revision: D19878241

Pulled By: zhangguanheng66

fbshipit-source-id: 07bce43e466708dacd37b87ba3419435c6a7cde5
2020-02-13 08:48:16 -08:00
d554b112e3 Add histogram collection and weight prepacking utils (#33125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33125

Provide a histogram collection and weight-prepacking interface for Dper to auto-quantize the Ads models.

Test Plan:
buck test mode/opt deeplearning/numeric_suite/toolkit/test:int8_static_utils_test

buck test mode/opt deeplearning/numeric_suite/toolkit/test:histogram_utils_test

Reviewed By: amylittleyang

Differential Revision: D19794819

fbshipit-source-id: 6a4f4a6684da0977b7df2feed8a4b961db716da8
2020-02-13 01:40:20 -08:00
b98c7d34ed [TVM] Add clip op to c2_frontend (#33257)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33257

Test Plan: buck test caffe2/caffe2/fb/tvm:test_tvm_transform

Reviewed By: yinghai

Differential Revision: D19866406

fbshipit-source-id: e903e15178af323d0bd1f804e09919023c0a2989
2020-02-12 22:30:43 -08:00
16685d93e9 [TVM] Add ReplaceNaN op (#33256)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33256

Test Plan: buck test caffe2/caffe2/fb/tvm:test_tvm_transform

Reviewed By: yinghai

Differential Revision: D19851553

fbshipit-source-id: dee048c52ade16d9e531256b90e5d3391632cd8e
2020-02-12 22:29:30 -08:00
03e9b9ce18 [PyTorch BC] Remove unnecessary items in whitelist (#33247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33247

remove stale items.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D19861294

fbshipit-source-id: 2b112e5908c19a1ff190e3850085038065d21c53
2020-02-12 21:34:18 -08:00
e45343fa14 TORCH_INTERNAL_ASSERT_DEBUG_ONLY not eating message string (#33251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33251

Somehow this was preventing `c10::Error` exceptions from ever being thrown on windows when `defined(NDEBUG) == false`. Kinda scary.

Test Plan: sandcastle green, made sure `intrusive_ptr_test.cpp` (givenStackObject_whenReclaimed_thenCrashes) passed inside ovrsource using `mode/win/dev-debug`

Reviewed By: malfet

Differential Revision: D19865667

fbshipit-source-id: c32d5752025c043e57d16c6d14a94b069bed0bc3
2020-02-12 21:23:34 -08:00
f61b45fc89 [jit] Support properties on Device (#32953)
Summary:
Stacked PRs
 * #32955 - [jit] Fix flipped PackedSequence outputs in script
 * **#32953 - [jit] Support properties on `Device`**

PyTorch devices have an `index` and a `type` property. This PR adds support for both in TorchScript.
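A small sketch of what this enables:

```python
import torch

@torch.jit.script
def describe(x: torch.Tensor) -> str:
    d = x.device
    idx = d.index                       # Optional[int]; None for plain "cpu"
    if idx is not None:
        return d.type + ":" + str(idx)  # e.g. "cuda:0"
    return d.type                       # e.g. "cpu"

print(describe(torch.zeros(1)))  # cpu
```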
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32953

Pulled By: driazati

Differential Revision: D19849320

fbshipit-source-id: ce845258c6110058dd9ea1f759ef74b7ed2e786e
2020-02-12 18:59:10 -08:00
806e7daa1f Rename TorchScript compiler to IR emitter to better reflect its function. (#33127)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33127

Test Plan: Imported from OSS

Differential Revision: D19806503

Pulled By: ZolotukhinM

fbshipit-source-id: ab78bdbbac5f12dbcc6c2e2573f5862a16ffcf3d
2020-02-12 18:45:13 -08:00
91744907d4 SGD: updated step and class design (#32592)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32592

Differential Revision: D19868154

Pulled By: anjali411

fbshipit-source-id: ce888efc68b1531d97e8b0abf2b146198e012d2f
2020-02-12 18:38:55 -08:00
914610d079 [pytorch][quant] Add assert for min, max, qmin, qmax for ChooseQuantizationParams (#32739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32739

As Title says.
ghstack-source-id: 98061467

Test Plan: CI

Differential Revision: D19610810

fbshipit-source-id: f9621cd7d780769941ed77974b19c5226d4b2b30
2020-02-12 16:49:31 -08:00
bc0ab07064 Optimize Unfold3d to improve performance of Conv3d (#33191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33191

Optimize Unfold3d to improve the performance of the Conv3d forward pass

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "Conv3d"

Reviewed By: houseroad

Differential Revision: D19821946

fbshipit-source-id: 937adafddb9a1aef5f1d1423dd99884c59e465f9
2020-02-12 16:34:55 -08:00
0e753b2818 Fix SIGABORT caused by double exception in PyTorchStreamReader when file not found. (#33243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33243

If a file does not exist in an archive, PyTorchStreamReader throws an exception. However, when PyTorchStreamReader is destructed, another exception is thrown while processing the first one. As a result of this double exception, the process aborts with SIGABRT.

Thanks dreiss for catching this bug and suggesting the fix. It happened when he used _load_for_mobile to load a TorchScript file without a bytecode session. A unit test is added to cover this case.

Test Plan: Imported from OSS

Differential Revision: D19859205

Pulled By: iseeyuan

fbshipit-source-id: 8f96b6256f1a1f933fce1c256d64604c7e9269e4
2020-02-12 16:27:15 -08:00
ac8511a21e Updating submodules
Summary:
GitHub commits:

927d8afa7a
e64508917b
40d690970f

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 9135af67550f83a598a0a0baa1f9f6b1e4311ddf
2020-02-12 15:43:34 -08:00
f9ad5528e0 Fix for rand_like as well. (#33095)
Summary:
This is a follow-up PR to https://github.com/pytorch/pytorch/issues/32830. It solves the same issue for RandLike that we saw in RandNLike.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33095

Reviewed By: hl475

Differential Revision: D19848625

Pulled By: houseroad

fbshipit-source-id: 147921becf79490027a93606d52c5bc41d9eaf7f
2020-02-12 14:54:39 -08:00
f045dab3dd Remove ImplicitTensorToNum (#32761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32761

This replaces ImplicitTensorToNum with result-specific operators like
IntImplicit, FloatImplicit, or ScalarImplicit. Note that ScalarImplicit
was not correctly implemented before and this PR fixes the lapse.

This does not change on-disk serialization because these operators are not
serialized directly but written as, e.g., `annotated(int, foo)`.

Test Plan: Imported from OSS

Differential Revision: D19615385

Pulled By: zdevito

fbshipit-source-id: 48575f408e8219d2ec5b46936fc2aa691f283976
2020-02-12 14:49:07 -08:00
99349defc1 remove unnecessary Node* ops (#32760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32760

Minor changes to the way ops are implemented to remove incidental use of Node*
in the operator implementation.

Current state for operators that previously took Node:

```
TBD:

USES NODE: prim::DifferentiableGraph(...) -> (...)
USES NODE: prim::profile(...) -> (...)
USES NODE: prim::FusionGroup(...) -> (...)
USES NODE: prim::PythonOp(...) -> (...)
USES NODE: prim::ImplicitTensorToNum(Tensor a) -> Scalar # next PR

Should be made interpreter primitives:

USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack

Should be made into vararg operators, i.e. the operators last argument should be an IValue
that contains the number of arguments.

USES NODE: prim::FusedConcat(...) -> (...)
USES NODE: prim::MMTreeReduce(...) -> (...)
USES NODE: prim::MMBatchSide(...) -> (...)
USES NODE: prim::ConstantChunk(...) -> (...)
USES NODE: prim::AutogradAnyNonZero(...) -> bool
USES NODE: prim::BroadcastSizes(...) -> (...)
USES NODE: prim::ChunkSizes(...) -> (...)
USES NODE: aten::format(str self, ...) -> str
USES NODE: prim::Print(...) -> (...)

fixed:

USES NODE: aten::extend(Tensor[](a!) self, Tensor [] other) -> ()
USES NODE: aten::copy(Tensor[](a) self) -> Tensor[]
USES NODE: aten::extend(int[](a!) self, int [] other) -> ()
USES NODE: aten::copy(int[](a) self) -> int[]
USES NODE: aten::extend(float[](a!) self, float [] other) -> ()
USES NODE: aten::copy(float[](a) self) -> float[]
USES NODE: aten::extend(bool[](a!) self, bool [] other) -> ()
USES NODE: aten::copy(bool[](a) self) -> bool[]
USES NODE: aten::extend(t[](a!) self, t [] other) -> ()
USES NODE: aten::copy(t[](a) self) -> t[]
USES NODE: aten::keys(Dict(str, t) self) -> str[](*)
USES NODE: aten::values(Dict(str, t) self) -> t[](*)
USES NODE: aten::dict((str, tVal)[] inputs) -> Dict(str, tVal)
USES NODE: aten::keys(Dict(int, t) self) -> int[](*)
USES NODE: aten::values(Dict(int, t) self) -> t[](*)
USES NODE: aten::dict((int, tVal)[] inputs) -> Dict(int, tVal)
USES NODE: aten::keys(Dict(float, t) self) -> float[](*)
USES NODE: aten::values(Dict(float, t) self) -> t[](*)
USES NODE: aten::dict((float, tVal)[] inputs) -> Dict(float, tVal)
USES NODE: aten::keys(Dict(Tensor, t) self) -> Tensor[](*)
USES NODE: aten::values(Dict(Tensor, t) self) -> t[](*)
USES NODE: aten::dict((Tensor, tVal)[] inputs) -> Dict(Tensor, tVal)
USES NODE: aten::test_vartype2(t a, t[] b) -> (t[])
USES NODE: aten::_ncf_unsqueeze(Tensor self, int ndim) -> Tensor
USES NODE: aten::_ncf_view(Tensor self, int[] input_shape, int normalized_ndim) -> Tensor
USES NODE: prim::is_none(int? a) -> bool
USES NODE: aten::__interpolate(Tensor input, int? size = None, float[]? scale_factor = None, str mode = 'nearest', bool? align_corners = None, bool? recompute_scale_factor = None) -> Tensor
USES NODE: aten::__interpolate(Tensor input, int[]? size = None, float[]? scale_factor = None, str mode = 'nearest', bool? align_corners = None, bool? recompute_scale_factor = None) -> Tensor
USES NODE: aten::__interpolate(Tensor input, int? size = None, float? scale_factor = None, str mode = 'nearest', bool? align_corners = None, bool? recompute_scale_factor = None) -> Tensor
USES NODE: aten::__interpolate(Tensor input, int[]? size = None, float? scale_factor = None, str mode = 'nearest', bool? align_corners = None, bool? recompute_scale_factor = None) -> Tensor
USES NODE: aten::sorted(t[](a) self) -> (t[])
USES NODE: aten::sort(t[](a!) self, bool reverse=False) -> ()
USES NODE: aten::test_vartype(t[] a, t b) -> (t)
USES NODE: prim::unchecked_unwrap_optional(t(a)? optional) -> t(a)
USES NODE: prim::unchecked_cast(...) -> (...)
USES NODE: aten::dict() -> Dict(str, Tensor)
USES NODE: prim::Load(...) -> (...)
USES NODE: prim::Store(...) -> (...)
USES NODE: prim::Drop(...) -> (...)
USES NODE: aten::tensor(t[] data, *, ScalarType? dtype=None, Device? device=None, bool requires_grad=False) -> Tensor
USES NODE: aten::as_tensor(t[] data, *, ScalarType? dtype=None, Device? device=None) -> Tensor
```

Test Plan: Imported from OSS

Differential Revision: D19615387

Pulled By: zdevito

fbshipit-source-id: 95298c3c4249b9f812c332d13f0fb79daeecb662
2020-02-12 14:49:02 -08:00
72a00a8a9c Remove Node dependencies from operator.h (#32682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32682

This moves code around so that operator.h/cpp no longer requires a full
definition of Node* nor does it include alias analysis or the pretty printer.

This should make it possible to include in the mobile build.

Functionality for checking whether operators match a Node and for looking up
an operator for a Node has moved to the Node object.

Test Plan: Imported from OSS

Differential Revision: D19615386

Pulled By: zdevito

fbshipit-source-id: e38bdf29971183597ef940d061c06ba56e71d9c5
2020-02-12 14:47:26 -08:00
ab14375b08 Workaround for CUDA10.2.89 CUDA extension compilation error (#33230)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/33203
PR based on https://github.com/mpark/variant/pull/73

Verified locally on CUDA10.2.89 and 10.1.243

Thanks ngimel for the hint and gridley for the initial fix in the variant repo! :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33230

Differential Revision: D19858083

Pulled By: ngimel

fbshipit-source-id: b9438084f5688712c6aa6b17813c68ccde237bbb
2020-02-12 14:23:30 -08:00
40265e2d66 prevent various warnings related to undef and redef (#33196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33196

Test Plan: Sandcastle green

Reviewed By: malfet

Differential Revision: D19842268

fbshipit-source-id: 47bc3d7a75e803041491e11a648b4a9e7d9cc72c
2020-02-12 13:28:35 -08:00
323b0e0a0f fix #30480 torch.normal shape checking is broken (#32243) (#33050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33050

Following what gchanan proposed in #30480
- If the (logical) shapes of mean and std are broadcastable, we broadcast them for the output.
  Done in the tensor iterator already.
- If the (logical) shapes of mean and std are not broadcastable but they have the same number of elements, we fall back to the old behavior (pick the shape of mean).
  Done by reshaping std to the same shape as mean.
- If the (logical) shapes of mean and std are not broadcastable and don't have the same number of elements, we error out.
  Done by the tensor iterator already. (The cases are illustrated in the sketch below.)
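A short sketch of the three cases (shapes are illustrative):

```python
import torch

# Broadcastable shapes: the output takes the broadcast shape.
print(torch.normal(torch.zeros(2, 3), torch.ones(3)).shape)   # torch.Size([2, 3])

# Not broadcastable but same numel: falls back to the shape of mean.
print(torch.normal(torch.zeros(6), torch.ones(2, 3)).shape)   # torch.Size([6])

# Not broadcastable and different numel: raises an error.
try:
    torch.normal(torch.zeros(2, 3), torch.ones(5))
except RuntimeError as e:
    print(e)
```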

Test Plan: Imported from OSS

Differential Revision: D19771186

Pulled By: glaringlee

fbshipit-source-id: a0b71063c7f5fdda2d4ceb84e06384414d7b4262
2020-02-12 12:43:09 -08:00
2e9b7c5fe1 Migrate dist from TH to ATen(CPU, CUDA) (#29714)
Summary:
[https://github.com/pytorch/pytorch/issues/24691](https://github.com/pytorch/pytorch/issues/24691)
[https://github.com/pytorch/pytorch/issues/24551](https://github.com/pytorch/pytorch/issues/24551)

Benchmark:

**Speed**
```python
import time, sys
import torch
import math

inf = math.inf

torch.manual_seed(0)
devices = ["cpu", "cuda"]
ps = [0, 1, 2, 3, 4, inf, -inf]

# Warm up
for device in devices:
    for n in [1, 10, 100, 1000]:
        x = torch.randn(100, n, requires_grad=False, device=device)
        y = torch.randn(100, n, requires_grad=False, device=device)
        for i in range(1000):
            for p in ps:
                dist_xy = torch.dist(x, y, p)

for device in devices:
    print('On {}'.format(device))
    for n in [1, 10, 100, 1000]:
        total_time = 0
        x = torch.randn(100, n, requires_grad=False, device=device)
        y = torch.randn(100, n, requires_grad=False, device=device)
        for i in range(10000):
            for p in ps:
                t1 = time.time()
                dist_xy = torch.dist(x, y, p)
                t2 = time.time()
                total_time += (t2 - t1)
        average_time = total_time / 10000 / len(ps) * 1000
        print("input size(100, %d) average time is %.8f (ms)." % (n, average_time))
```

Output
Before:
```shell
On cpu
input size(100, 1) average time is 0.0079491 (ms).
input size(100, 10) average time is 0.0364167 (ms).
input size(100, 100) average time is 0.3120752 (ms).
input size(100, 1000) average time is 3.0605820 (ms).
On cuda
input size(100, 1) average time is 0.04745627 (ms).
input size(100, 10) average time is 0.04919453 (ms).
input size(100, 100) average time is 0.06601572 (ms).
input size(100, 1000) average time is 0.07849015 (ms).
```

After:
```shell
On cpu
input size(100, 1) average time is 0.0099936 (ms).
input size(100, 10) average time is 0.0340414 (ms).
input size(100, 100) average time is 0.2793379 (ms).
input size(100, 1000) average time is 0.7858076 (ms).
On cuda
input size(100, 1) average time is 0.04410237 (ms).
input size(100, 10) average time is 0.03326339 (ms).
input size(100, 100) average time is 0.03314828 (ms).
input size(100, 1000) average time is 0.03990038 (ms).
```

**Precision**

```python
for device in devices:
    torch.manual_seed(0)
    print('On {}'.format(device))
    for n in [1, 10, 100, 1000]:
        x = torch.randn(100, n, requires_grad=False).to(device)
        y = torch.randn(100, n, requires_grad=False).to(device)
        for p in ps:
            dist_xy_float = torch.dist(x, y, p)
            dist_xy_double = torch.dist(x.double(), y.double(), p)
            difference = torch.abs(dist_xy_double - dist_xy_float)
            print('input size (100, {}), p: {}, float: {}, double: {}, difference: {}'.format(n, p, dist_xy_float, dist_xy_double, difference))
```

Part of [output](https://gist.github.com/rivergold/dd95014dc7f163b22f72699d1134cdd2)
Before:
```shell
On cpu
input size (100, 100), p: 0, float: 10000.0, double: 10000.0, difference: 0.0
input size (100, 100), p: 1, float: 11474.1806640625, double: 11474.185433543797, difference: 0.00476948129653465
input size (100, 100), p: 2, float: 143.50729370117188, double: 143.5073391487937, difference: 4.5447621829453055e-05
input size (100, 100), p: 3, float: 36.045475006103516, double: 36.04550275212738, difference: 2.774602386779179e-05
input size (100, 100), p: 4, float: 18.796083450317383, double: 18.79609807865317, difference: 1.4628335787136848e-05
input size (100, 100), p: inf, float: 5.540258407592773, double: 5.5402586460113525, difference: 2.384185791015625e-07
input size (100, 100), p: -inf, float: 3.4868717193603516e-06, double: 3.4868717193603516e-06, difference: 0.0
On cuda
input size (100, 100), p: 0, float: 10000.0, double: 10000.0, difference: 0.0
input size (100, 100), p: 1, float: 11474.1865234375, double: 11474.185433543797, difference: 0.00108989370346535
input size (100, 100), p: 2, float: 143.50733947753906, double: 143.5073391487933, difference: 3.2874575595087663e-07
input size (100, 100), p: 3, float: 36.04550552368164, double: 36.045502752127405, difference: 2.7715542358919265e-06
input size (100, 100), p: 4, float: 18.796098709106445, double: 18.796098078653177, difference: 6.304532682577246e-07
input size (100, 100), p: inf, float: 5.540258407592773, double: 5.5402586460113525, difference: 2.384185791015625e-07
input size (100, 100), p: -inf, float: 3.4868717193603516e-06, double: 3.4868717193603516e-06, difference: 0.0
```

After
```shell
On cpu
input size (100, 100), p: 0, float: 10000.0, double: 10000.0, difference: 0.0
input size (100, 100), p: 1, float: 11474.1806640625, double: 11474.185433543797, difference: 0.00476948129653465
input size (100, 100), p: 2, float: 143.50729370117188, double: 143.5073391487937, difference: 4.5447621829453055e-05
input size (100, 100), p: 3, float: 36.045475006103516, double: 36.04550275212738, difference: 2.774602386779179e-05
input size (100, 100), p: 4, float: 18.796083450317383, double: 18.79609807865317, difference: 1.4628335787136848e-05
input size (100, 100), p: inf, float: 5.540258407592773, double: 5.5402586460113525, difference: 2.384185791015625e-07
input size (100, 100), p: -inf, float: 3.4868717193603516e-06, double: 3.4868717193603516e-06, difference: 0.0
On cuda
input size (100, 100), p: 0, float: 10000.0, double: 10000.0, difference: 0.0
input size (100, 100), p: 1, float: 11474.185546875, double: 11474.185433543797, difference: 0.00011333120346534997
input size (100, 100), p: 2, float: 143.50733947753906, double: 143.5073391487933, difference: 3.2874575595087663e-07
input size (100, 100), p: 3, float: 36.04550552368164, double: 36.045502752127405, difference: 2.7715542358919265e-06
input size (100, 100), p: 4, float: 18.796096801757812, double: 18.796098078653177, difference: 1.2768953645547754e-06
input size (100, 100), p: inf, float: 5.540258407592773, double: 5.5402586460113525, difference: 2.384185791015625e-07
input size (100, 100), p: -inf, float: 3.4868717193603516e-06, double: 3.4868717193603516e-06, difference: 0.0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29714

Differential Revision: D19769518

Pulled By: albanD

fbshipit-source-id: 69b79b64f1f190b410efe884662b6601e903eccf
2020-02-12 12:26:48 -08:00
97bf41ca22 Fix iOS x86_64 CI failure (#33194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33194

### Summary

The iOS x86_64 job has been failing for a few days. I haven't found the root cause, but it seems like updating torchvision to its latest version fixes the problem.

### Test Plan

- the x86_64 job works

Test Plan: Imported from OSS

Differential Revision: D19845079

Pulled By: xta0

fbshipit-source-id: 5034e252600b6704b860d68c371a65bef4cf37fc
2020-02-12 11:07:48 -08:00
87640570b3 Make CUDA OOM error a type (#33056)
Summary:
There are cases where we want to recover from CUDA OOM. For example, some cuDNN algorithms use huge workspaces, and we want to recover from OOM to pick a different algorithm; in such cases, there is no reason to catch all errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33056

Differential Revision: D19795359

Pulled By: ezyang

fbshipit-source-id: a34e23bf6d172dc0257389251dafef5b38d27d2b
2020-02-12 10:45:40 -08:00
a389f8fa18 Revert D18912680: Prepare templates
Test Plan: revert-hammer

Differential Revision:
D18912680

Original commit changeset: 9e3828e42ee5

fbshipit-source-id: 9ef81991394f4e36f0652dfe594d5122969bd9cf
2020-02-12 10:39:09 -08:00
3cfea39968 Document how BCELoss avoids infinite results (#33160)
Summary:
Issue https://github.com/pytorch/pytorch/issues/31453
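The documented behavior in a nutshell: BCELoss clamps its log terms to be >= -100, so a prediction of exactly 0 or 1 yields a large finite loss instead of infinity.

```python
import torch

loss = torch.nn.BCELoss()
pred = torch.tensor([0.0])    # log(0) would be -inf without clamping
target = torch.tensor([1.0])
print(loss(pred, target))     # tensor(100.) rather than inf
```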
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33160

Differential Revision: D19835527

Pulled By: albanD

fbshipit-source-id: 82fd2dd46ffbc87e90ca8e100db411b6ff6bfe32
2020-02-12 07:56:19 -08:00
05281a5671 Add nice error message if missing overrides in custom autograd.Function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33142

Test Plan: Imported from OSS

Differential Revision: D19815786

Pulled By: albanD

fbshipit-source-id: 5513d900c7b711b625383686fcf03f822ab7ea80
2020-02-12 07:55:06 -08:00
09915ad570 [TensorBoard] Correct typo and wrap dataformats. (#31604)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/31603

- A minor spelling typo is corrected: "suitible" --> "suitable"
- A minor quality of life improvement is added: the data format strings are better rendered as fixed width to indicate that they are string constants.  "CHW" --> "`CHW`"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31604

Differential Revision: D19697293

Pulled By: ezyang

fbshipit-source-id: ee38b0d4c9ca8a233ac9243c310d9a3b42ad6f32
2020-02-12 07:51:04 -08:00
c6e0360812 Minor change of docstring example of WeightedRandomSampler (#30846)
Summary:
Previous example
```python
>>> list(WeightedRandomSampler([0.1, 0.9, 0.4, 0.7, 3.0, 0.6], 5, replacement=True))
        [0, 0, 0, 1, 0]
```
may seem misleading according to provided weights.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30846

Differential Revision: D19697367

Pulled By: ezyang

fbshipit-source-id: 3d6e3cd0cecb5272a368707ba35bc7acdbd82c30
2020-02-12 07:46:39 -08:00
1767ae8daf [caffe2] remove dnnlowp log code (#33184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33184

dnnlowp specific code shouldn't be in the default FC in the first place

Test Plan: Just removing #ifdef #endif

Reviewed By: jianyuh

Differential Revision: D19835301

fbshipit-source-id: 7880cf298bedb3f0bc407d140d342124663ea4a7
2020-02-12 00:47:09 -08:00
9d9fa2eace [2/3] Bind Bucketize to PyTorch (#33014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33014

Export Bucketize to PyTorch.

Test Plan: buck test caffe2/caffe2/python/operator_test:torch_integration_test

Reviewed By: bddppq

Differential Revision: D19737534

fbshipit-source-id: be1c892bb8d01da9892f221f150f1a2788ac732e
2020-02-11 23:20:10 -08:00
47e589eb6e Disable flaky tests test_DistributedDataParallel and test_backend_group for ROCm (#33211)
Summary:
Getting intermittent errors in CI runs:

**TestDistBackend.test_DistributedDataParallel**
```
02:36:32   File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/serialization.py", line 442, in _legacy_save
02:36:32     pickler.dump(obj)
02:36:32 AttributeError: Can't pickle local object 'Module._replicate_for_data_parallel.<locals>.zero_grad'
```
Some CI runs where it failed:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/16163/console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/16165/console

**TestDistBackend.test_backend_group**
```
test_backend_group (__main__.TestDistBackend) ... Memory access fault by GPU node-5 (Agent handle: 0x265c670) on address 0x7fded754a000. Reason: Page not present or supervisor privilege.
```
Some CI runs where it failed:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/16288/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33211

Differential Revision: D19849089

Pulled By: bddppq

fbshipit-source-id: 5e997653cc344f4c6819d46bedc6d3bd75b5d854
2020-02-11 22:50:03 -08:00
5bc5dd58f3 [jit] fix a typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29107

Differential Revision: D19698662

Pulled By: ezyang

fbshipit-source-id: e7eea3246008e2c6d560ff5e4d84b90f65ff1afd
2020-02-11 22:45:28 -08:00
b9a5353fee Move where cuda implementation to TensorIterator (#33228)
Summary:
Reopen of https://github.com/pytorch/pytorch/pull/32984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33228

Differential Revision: D19850862

Pulled By: ngimel

fbshipit-source-id: b92446a49b4980188fa4788220a2164650e905c2
2020-02-11 22:28:27 -08:00
7863d2413d Updating submodules
Summary:
GitHub commits:

9fd0d1a3c7
bcaf9cdf1f
3e49249d30
98307ea1ec
f48ebb4d48
353f9c9f29
1caef25fc0
805ab665f2

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 609187c69ba2c6b31a05dcfdb1770054002ddb6e
2020-02-11 22:00:54 -08:00
d609497dde bulk_eval_collect_histograms
Summary:
Collect activation histograms during model evaluation and aggregate all the histograms from multiple threads/readers into one file.
The original functionality of the bulk_eval workflow is still valid. The output predictions and extra blobs will be exported to a Hive table, which will be very useful for numerical debugging.

Test Plan:
FBL
```flow-cli canary dper.workflows.bulk_eval.export --mode dbg --parameters-file experimental/summerdeng/sparsenn/bulk_eval_input_configs.json  --run-as-secure-group team_ai_system_sw_hw_co-design --entitlement gpu_prod --name "Histogram collection with caffe2 logging. Attach histogram observer to the predict net. Use small model 102343030. "
```
f163861773

When the flow is done, we can get all the histogram files under the specified dir. For example:
```
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6ca65cc0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6cde8a80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6d144840
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6d4a9600
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6da303c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6dd1c800
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e0855c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e3e0380
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e95a140
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6eafcf00
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6ed1a100
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f094ec0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f561c80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f783a40
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6fccb7c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7003d580
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb703ae340
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7084ae80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70bc1c40
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70f43a00
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70ff7680
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb71361300
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb716df0c0
-rw-rw-r--. 1 185754 185754 4024538 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7199c780
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb71b72f00
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72330000
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72598100
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7290d880
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72b03980
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72f1f160
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb8bcee9e0
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fd51b457260
-rw-rw-r--. 1 185754 185754 4026659 Jan 23 09:51 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final
```

The aggregated histogram file is /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final. It can be loaded into the following auto-quant workflow for int8 static quantization.

######## Code refactoring ########

Moved the utility functions that process activation histograms to deeplearning/numeric_suite/toolkit:hist_processor and added the dependency in dper.

We also had a hist_compiler in caffe2/caffe2/fb/fbgemm/numerical_debugger/python_utils/hist_compiler.py; it was also refactored to reuse the utility functions in deeplearning/numeric_suite/toolkit:hist_processor.

The histograms from bulk_eval and the hist_compiler are identical.
/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.compiled.bak
/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final.bak

Reviewed By: hx89

Differential Revision: D19270090

fbshipit-source-id: c7ecb4f2bbf1ea725c52e903356ad9a7b9ad73ac
2020-02-11 21:39:47 -08:00
9e7638f7c1 "batchSize" was set but never used (#32294)
Summary:
fixes a compiler warning:
```
torch/aten/src/ATen/native/cuda/MaxUnpooling.cu.cc(402):
warning: variable "batchSize" was set but never used
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32294

Differential Revision: D19697277

Pulled By: ezyang

fbshipit-source-id: b9821be325826dc4785cad7994803b54f1711a0c
2020-02-11 21:28:49 -08:00
66ee4f1c81 [ROCm] Enable Bfloat16 type for activation and batch-norm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32065

Differential Revision: D19728858

Pulled By: ezyang

fbshipit-source-id: 8f828c558bfe6c5f43f476ff8a0f967341f8f351
2020-02-11 21:04:20 -08:00
f255b7a3ac Drop support of the build option USE_GLOO_IBVERBS (#33163)
Summary:
Two releases have passed since its deprecation:
8a026d4f74b71944ac2860c315996165a40f5626
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33163

Differential Revision: D19850713

Pulled By: ezyang

fbshipit-source-id: 30a60df470b88e8c40e33112296e437cde29c49f
2020-02-11 20:35:50 -08:00
1487137c5b add missing default value for LRScheduler.step() (#32411)
Summary:
see also other type errors in https://github.com/pytorch/pytorch/pull/30576 and https://github.com/pytorch/pytorch/pull/30441
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32411

Differential Revision: D19697245

Pulled By: ezyang

fbshipit-source-id: d0295d747541adec5d6fad646f4cf4bb2f04abf5
2020-02-11 20:34:33 -08:00
139afd0ea7 Fix link to py-spy content in contribution guide TOC (#31760)
Summary:
The extra dashes are breaking the link here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31760

Differential Revision: D19697301

Pulled By: ezyang

fbshipit-source-id: 65de026b9016dc8689c9dac9efb8aafd00b535cd
2020-02-11 20:27:35 -08:00
74c8a8f7bc Revert D19825127: [pytorch][PR] Move where cuda implementation to TensorIterator
Test Plan: revert-hammer

Differential Revision:
D19825127

Original commit changeset: bbf4682349d9

fbshipit-source-id: 0c439b8c9a00a5aa46fd196396cf7cc83cddb1b4
2020-02-11 19:49:18 -08:00
000a5e2b7f bad tbb lambda capture, bad chunk size (#30352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30352

1) TBB forwards us `ident` through a parameter; we don't need to capture it.
2) TBB is being passed step values <= 0, which is invalid.

Taken from TBB documentation:
```
The index type must be an integral type. The loop must not wrap around. The step value must be positive. If omitted, it is implicitly 1.
```

I have a build that uses `TBB_USE_DEBUG=1`, and it currently surfaces a lot of issues with PyTorch's usage.
Is the TBB version not tested very much right now?
ghstack-source-id: 94459382

Test Plan: CI green

Differential Revision: D18666029

fbshipit-source-id: d5aa8327b03181d349e1964f9c8211298c433d6a
2020-02-11 18:46:32 -08:00
a23009f98f Quantized leaky relu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33004

Test Plan: Imported from OSS

Differential Revision: D19740193

Pulled By: z-a-f

fbshipit-source-id: 32542d5465db44190366a2f8b737305a03b5fa76
2020-02-11 17:56:02 -08:00
769abddfa3 Build ahead-of-time C++ extensions with ninja on windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33084

Differential Revision: D19817361

Pulled By: ezyang

fbshipit-source-id: 95a6d0ffa9beb6885c8a41688621b33da51706ae
2020-02-11 17:50:09 -08:00
acd51e13f7 TorchScript add check if quantized
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32890

Test Plan: Imported from OSS

Differential Revision: D19673463

Pulled By: z-a-f

fbshipit-source-id: 453ff662810845fcaeb8e6d5919afa8e2d395768
2020-02-11 17:38:49 -08:00
cb39a5400c Use C10_WARP_SIZE to fix functionality on HIP vs CUDA for batch_norm_backward_reduce (#33098)
Summary:
1. Use C10_WARP_SIZE instead of hardcoded value "32".
2. `getNumThreads` returns a minimum of 32 for CUDA, which is same as the warp size in CUDA. However, for HIP, it returns a minimum of 16, which is less than the warp size (64) in HIP. This creates an issue in the [reduce function](14548c2d5b/aten/src/ATen/native/cuda/Normalization.cuh (L115)) when it zeroes out the other entries in shared memory [here](14548c2d5b/aten/src/ATen/native/cuda/Normalization.cuh (L137)): since `blockDim.x` is at least equal to the warp size in CUDA, this never zeroes out `shared[0]`, but for HIP, since `blockDim.x` could be 16 or 32, which is less than the warp size (64), this results in `blockDim.x * blockDim.y` being potentially less than the warp size for small cases, which then zeroes out `shared[0]` as well. This results in an erroneous output of zero for the reduce function on ROCm (depending on how the block dimensions are set).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33098

Differential Revision: D19837355

Pulled By: bddppq

fbshipit-source-id: ea526acd82ec08b1acb25be860b7e663c38ff173
2020-02-11 16:47:22 -08:00
44723a1c24 [ONNX] Fix ONNX CI (#33200)
Summary:
Move the data to aws
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33200

Reviewed By: hl475

Differential Revision: D19843193

Pulled By: houseroad

fbshipit-source-id: bb0451d211cfc951ddb66264b92586c43b6e8841
2020-02-11 16:38:26 -08:00
af4d6120bd Temporarily disable failing 'binary_macos_libtorch_2_7_cpu_build' and… (#33207)
Summary:
… 'binary_macos_wheel_3_6_cpu_build' jobs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33207

Differential Revision: D19844787

Pulled By: kostmo

fbshipit-source-id: d44a0e26bf76afe4a5f94d7f1ad2d558de6f5d47
2020-02-11 15:44:35 -08:00
04829e924a Update CPU threading doc (#33083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33083

Added more recommendations, some notes and warning

Test Plan: cd docs ; make html

Differential Revision: D19829133

Pulled By: ilia-cher

fbshipit-source-id: b9fbd89f5875b3ce35cc42ba75a3b44bb132c506
2020-02-11 14:13:51 -08:00
6706c3f457 Prepare templates (#30982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30982

This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.

Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.

Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.

-----------
In this PR:
Updating the templates.

-----------

Test Plan: Imported from OSS

Differential Revision: D18912680

Pulled By: izdeby

fbshipit-source-id: 9e3828e42ee5c3aefbf3729f4a8d6db813f2e7c3
2020-02-11 13:10:14 -08:00
45818a3de4 Remove some Half support in some binary CPU kernels (#33021)
Summary:
They were probably added by mistake: we do not intend to support Half
on CPU in general, and in these situations the Half type would likely be
significantly slower than its float and double counterparts due to the
lack of vectorization and the need for additional casting.

cc XiaobingSuper
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33021

Differential Revision: D19795152

Pulled By: VitalyFedyunin

fbshipit-source-id: b19796db88880a46557e1b2fd06e584d46093562
2020-02-11 12:54:47 -08:00
7b50e76255 optimize cat performance on CPU with TensorIterator (#30806)
Summary:
This PR aims at improving `cat` performance on CPU.
The current `cat` logic from the `TH` module has no parallelization when the input tensors are all contiguous.
This code also tries to reuse the same `TensorIterator` as much as possible, in order to reduce the overhead of creating one; this helps when the copied slices are not large.
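
A minimal timing sketch for the contiguous case this PR parallelizes (sizes are illustrative, not taken from this PR's benchmarks):

```python
# Minimal micro-benchmark sketch; tensor sizes are illustrative.
import timeit

import torch

xs = [torch.randn(1000, 1000) for _ in range(10)]  # contiguous inputs
print(timeit.timeit(lambda: torch.cat(xs, dim=0), number=100))
```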
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30806

Differential Revision: D19275026

Pulled By: VitalyFedyunin

fbshipit-source-id: 756e9b86891f725c256b0a6981887ff06d88b053
2020-02-11 12:49:56 -08:00
ad90c97c0a Removes flaky check (#33146)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/32949.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33146

Differential Revision: D19836001

Pulled By: mruberry

fbshipit-source-id: 773069ae0c181e1a050b65b888c87590c1dddb32
2020-02-11 12:21:07 -08:00
a64d0ffe81 Use int64 in pdist kernel to handle batches >= 46342 #30583 (#31593)
Summary:
Currently `torch.pdist` yields an illegal CUDA memory access for batch sizes >= 46342 as reported by SsnL in https://github.com/pytorch/pytorch/issues/30583.
Thanks for the minimal code reproduction, btw! ;)
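
A minimal repro sketch of the failing case (assumes a CUDA device with enough free memory for the inputs and the n*(n-1)/2-element output):

```python
# Sketch of the failing case from the issue; requires a CUDA device with
# several GB free for the ~1.07e9-element output.
import torch

x = torch.randn(46342, 128, device='cuda')
torch.pdist(x)  # illegal CUDA memory access before this fix
```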

Reason for this bug:
The calculation of `i` in the [`pdist_kernel_cuda_impl`](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L112)) might overflow if a tensor with a `batch size >= 46342` is passed to `torch.pdist`.

Detailed description:
* `result` is resized to `n * (n - 1) / 2 = 1073767311` elements ([line of code](46ad80c839/aten/src/ATen/native/Distance.cpp (L140)))
* `grid` is initialized as `result.numel()` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L246)))
* `k` is assigned to the `blockIdx.x` as an `int32` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L108)))
* `i` is calculated using `2 * k`, which reaches `2147534622` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L112))) and overflows, since `2147534622 > 2147483647 (int32_max)`.

Using `const int64_t k = blockIdx.x;` would solve the illegal memory access. This seems also be done for [`cdist_kernel_cuda_impl`](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L198-L201)).

However, we might expect a slowdown, so I've timed the current PyTorch master vs. this PR:
(tested with `x = torch.randn(x.size(0), 128)` on a V100)

| x.size(0) | int32 idx | int64 idx | slowdown |
|-----------|-----------|-----------|----------|
| 50000     | -         | 4.4460    | -        |
| 25000     | 1.02522   | 1.10869   | 7.53%    |
| 12500     | 0.25182   | 0.27277   | 7.68%    |
| 6250      | 0.06291   | 0.06817   | 7.72%    |
| 3125      | 0.01573   | 0.01704   | 7.69%    |
| 1562      | 0.00393   | 0.00426   | 7.75%    |

While checking the backward kernel, it seems I'm triggering another error with a size limit of
```python
x = torch.randn(1449, 1, device='cuda', requires_grad=True)
out = torch.pdist(x)
out.mean().backward()
> RuntimeError: CUDA error: invalid configuration argument
```
, while `[<=1448, 1]` works.

I'll take another look at this issue. Let me know, if the potential fix should go into this PR or if I should open a new issue.

CC ngimel, csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31593

Differential Revision: D19825571

Pulled By: ngimel

fbshipit-source-id: ace9ccab49f3cf0ce894cdb6daef0795e2e8ec03
2020-02-11 12:00:39 -08:00
367488b001 Move where cuda implementation to TensorIterator (#32984)
Summary:
`where` is special because the arguments do not have the same type, which does not satisfy the assumption made in modern https://github.com/pytorch/pytorch/pull/32383. I migrated it to TensorIterator so that there is something to test that this case is not broken. Currently, this case falls back to legacy (not vectorized, not unrolled) code. It should be supported in the future when I clean up `Loops.cuh`.

I also moved the shared parts of `CUDALoops.cuh` and `ROCmLoops.cuh` into `Loops.cuh` so that the logic for checking whether `func_t` has the same arg types can be shared.
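
A small sketch of why `where` has mixed argument types (shapes are illustrative):

```python
# torch.where takes a bool condition plus two value tensors, so its
# argument types differ -- the same-type vectorized path doesn't apply.
import torch

cond = torch.rand(4, device='cuda') > 0.5  # dtype torch.bool
a = torch.randn(4, device='cuda')
b = torch.randn(4, device='cuda')
print(torch.where(cond, a, b))
```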
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32984

Differential Revision: D19825127

Pulled By: ngimel

fbshipit-source-id: bbf4682349d96b4480c4d657f3c18a3a67a9bf17
2020-02-11 11:10:06 -08:00
31370949be Add zero_mask function for vectorized functions. (#32985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32985

This can be useful in many situations to decide whether all elements are
zeros or non-zeros, such as elu as shown in #32986.

Test Plan: Imported from OSS

Differential Revision: D19794549

Pulled By: VitalyFedyunin

fbshipit-source-id: 1be1c863d69b9a19fdcfcdd7cb52343066f740d3
2020-02-11 11:01:29 -08:00
855ee6446f Revert D18749922: [pytorch] Migrating index_add cuda to ATen
Test Plan: revert-hammer

Differential Revision:
D18749922

Original commit changeset: d243be43a3b6

fbshipit-source-id: 15dafa644d84ff8803bd9ab3cdd40e12d805924a
2020-02-11 10:33:20 -08:00
857bae39e0 Updated DispatchKeyExtractor to expect TensorOptions (#30981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30981

This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.

Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.

Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.

-----------
In this PR:
Extended DispatchKeyExtractor logic to expect TensorOptions.

-----------

Test Plan: Imported from OSS

Differential Revision: D18912684

Pulled By: izdeby

fbshipit-source-id: 25cf1c397caa14272ca65b4003f1f03ff282ea77
2020-02-11 10:09:08 -08:00
e7f0b15473 Remove return value for __exit__ (#32997)
Summary:
When an error is raised and `__exit__` in a context manager returns `True`, the error is suppressed; otherwise the error is raised. No return value should be given, in order to maintain the default behavior of the context manager.
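
A minimal sketch of the context-manager semantics at play here:

```python
# If __exit__ returns True, the in-flight exception is suppressed;
# returning None (the implicit default) lets it propagate.
class Suppressing:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return True  # swallows the exception below

with Suppressing():
    raise TypeError("never surfaces")
print("execution continues")
```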

Fixes https://github.com/pytorch/pytorch/issues/32639. The `get_lr` function was overridden with a function taking an epoch parameter, which is not allowed. However, the relevant error was not being raised.

```python
In [1]: import torch
   ...:
   ...: class MultiStepLR(torch.optim.lr_scheduler._LRScheduler):
   ...:     def __init__(self, optimizer, gamma, milestones, last_epoch = -1):
   ...:         self.init_lr = [group['lr'] for group in optimizer.param_groups]
   ...:         self.gamma = gamma
   ...:         self.milestones = milestones
   ...:         super().__init__(optimizer, last_epoch)
   ...:
   ...:     def get_lr(self, step):
   ...:         global_step = self.last_epoch #iteration number in pytorch
   ...:         gamma_power = ([0] + [i + 1 for i, m in enumerate(self.milestones) if global_step >= m])[-1]
   ...:         return [init_lr * (self.gamma ** gamma_power) for init_lr in self.init_lr]
   ...:
   ...: optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
   ...: scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-7fad6ba050b0> in <module>
     14
     15 optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
---> 16 scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])

<ipython-input-1-7fad6ba050b0> in __init__(self, optimizer, gamma, milestones, last_epoch)
      6         self.gamma = gamma
      7         self.milestones = milestones
----> 8         super().__init__(optimizer, last_epoch)
      9
     10     def get_lr(self, step):

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, last_epoch)
     75         self._step_count = 0
     76
---> 77         self.step()
     78
     79     def state_dict(self):

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in step(self, epoch)
    141                 print("1a")
    142                 # try:
--> 143                 values = self.get_lr()
    144                 # except TypeError:
    145                     # raise RuntimeError

TypeError: get_lr() missing 1 required positional argument: 'step'
```

May be related to https://github.com/pytorch/pytorch/issues/32898.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32997

Differential Revision: D19737731

Pulled By: vincentqb

fbshipit-source-id: 5cf84beada69b91f91e36b20c3278e9920343655
2020-02-11 09:27:29 -08:00
6c0dc66cb4 [caffe2] use JIT'ed fp32 SLS (#33123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33123

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32413

Use JIT'ed fp32 SLS in Caffe2 operators

Test Plan:
```
./fblearner/flow/run_integration_tests --regex dper.workflows.canary.canary_workflow --wait
```
f167043951 was killed due to a 3hr timeout rather than failing.

Reviewed By: jianyuh

Differential Revision: D19680711

fbshipit-source-id: efaca333edcfeab0007ad88f4f5168b2229e7e66
2020-02-11 08:59:17 -08:00
3655975565 Add allow_rebase_history flag and fix codegen functions for multiple views (#32790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32790

Same as https://github.com/pytorch/pytorch/pull/31990 but without the first commit in the stack that is problematic for a lot of people.

Test Plan: Imported from OSS

Differential Revision: D19814116

Pulled By: albanD

fbshipit-source-id: d104911a5b098a5807b4bc08b69803ebd4f69fa6
2020-02-11 07:16:02 -08:00
330d051bd5 [pytorch] Migrating index_add cuda to ATen (#30573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30573

Mostly just moved code.
Index dim and number-of-indices checks are added to make the checks identical to index_add_cpu_.
ghstack-source-id: 98010129

Test Plan: existing tests

Differential Revision: D18749922

fbshipit-source-id: d243be43a3b6a9b9591caf0c35ef2fb6ec0d3ead
2020-02-11 06:03:53 -08:00
9857d9b4cd fix gather regression by not materializing loop vars in the error mes… (#33108)
Summary:
…sage

Per title, fixes regression reported in https://github.com/pytorch/pytorch/issues/32425. cc nikitaved
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33108

Differential Revision: D19816116

Pulled By: ngimel

fbshipit-source-id: 9f4a84c8e4533873b71bb7bbf3a7915b05308845
2020-02-10 18:27:02 -08:00
6f46962f21 [1/3] Bind IndexHash to PyTorch (#33015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33015

Export IndexHash to PyTorch

Test Plan:
buck test caffe2/caffe2/python/operator_test:torch_integration_test

      ✓ caffe2/caffe2/python/operator_test:torch_integration_test-2.7 - test_index_hash_op (caffe2.caffe2.python.operator_test.torch_integration_test.TorchIntegration) 0.151 44/50 (passed)

Reviewed By: bddppq

Differential Revision: D19727301

fbshipit-source-id: a65c954539e81a15577fe5c3c0deb3614e983534
2020-02-10 17:47:38 -08:00
61ac14a483 Updating submodules
Summary:
GitHub commits:

543b39c9ad
38c2e0ee44
552c07c32b
4369f2c7bb
07dbb5d2f4

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 803108a618a5be9ea58a38644c851486bad3bfbc
2020-02-10 17:19:07 -08:00
a3e69d3405 Use bazelisk instead of specifying bazel version manually. (#33036)
Summary:
Bazelisk automatically reads the `.bazelversion` file and installs the required version of Bazel. This saves us from updating the CI script every time we need a Bazel upgrade.
Use clang-8 for consistency with the pytorch/xla repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33036

Differential Revision: D19820819

Pulled By: ailzhang

fbshipit-source-id: 1560ec225cd037a811769a509a704b0df77ea183
2020-02-10 17:14:08 -08:00
524fe8a96c Updating submodules
Summary:
GitHub commits:

4bc5213b66
9ae570bb89
b2bc1da561
dcde8696bd

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: c5ca30dab73f80cd13f5a5bf6e3867083b2512ac
2020-02-10 15:07:12 -08:00
d672779339 [CI][treehug] Disable xenial_py2.7 tests due to mypy min version py3.5
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33159

Test Plan: Imported from OSS

Differential Revision: D19822400

Pulled By: IvanKobzarev

fbshipit-source-id: 8e7b561e6a6181ec1f9b6f56a539ddcb538b3858
2020-02-10 14:52:29 -08:00
495c1df510 [pytorch] convert code analyzer to a binary (#33102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33102

Add a simple main() to build code analyzer as a binary. This enables
easier integration with FB internal build environment.
ghstack-source-id: 97958658

Test Plan: - CI

Differential Revision: D19798560

Pulled By: ljk53

fbshipit-source-id: 126230e3bf7568046a309e8a6785230f820e0222
2020-02-10 14:46:29 -08:00
e8c4f5a74b Temporarily disable failing iOS builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33154

Differential Revision: D19820655

Pulled By: kostmo

fbshipit-source-id: fc3e22b1bf4ec112085ea846c3999efd0f3e26f3
2020-02-10 13:47:57 -08:00
3bde97d5a5 Move a resize from codegen to code.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33024

Test Plan: Imported from OSS

Differential Revision: D19774147

Pulled By: gchanan

fbshipit-source-id: 08cb099f1695b28117e4236e214976b548aec7a1
2020-02-10 12:47:14 -08:00
3c4cec56aa Enable test_distributed for ROCm but only with nccl backend [REDUX] (#32551)
Summary:
This is a redux of the original PR https://github.com/pytorch/pytorch/issues/28814, which was reverted in PR https://github.com/pytorch/pytorch/issues/29736 because test_DistributedDataParallel was suspected of being flaky. Further investigation revealed it wasn't flakiness, but a bug in the PyTorch source code, which has now been fixed in PR https://github.com/pytorch/pytorch/issues/32356. This PR is another attempt at enabling the test_distributed unit test suite, only for the nccl backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32551

Differential Revision: D19729966

Pulled By: bddppq

fbshipit-source-id: 12a0d850991a903cc7723d63693b6157071d7115
2020-02-10 12:42:36 -08:00
f4fbe9549d Revert D19800021: [pytorch][PR] Improve error message for assertWarnsRegex
Test Plan: revert-hammer

Differential Revision:
D19800021

Original commit changeset: 1c31ae785c8f

fbshipit-source-id: d7b340d678562c25a84d48be66c576075000b50d
2020-02-10 12:17:52 -08:00
6be4ec100f [pytorch] Elide more Thrift Tensor send copies. (#31998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31998

This change builds on recent torch::from_blob() changes to avoid Tensor
copies on send in more cases.

Particularly, this change adds an enabled option to assume that if the Tensor
Storage's DataPtr has a non-trivial deleter, then the Tensor does in fact
manage the underlying memory. Hence we can reference the Tensor's Storage
via an IOBuf that is referenced while sending, saving a Tensor copy.

We add appropriate test cases, particularly re: torch::from_blob(), which
would have been problematic without the recent changes.
ghstack-source-id: 97778619

Test Plan: buck test mode/dev caffe2/torch/fb/distributed/wireSerializer/test/...

Reviewed By: satgera

Differential Revision: D19306682

fbshipit-source-id: 05f56efb2d5d6279ae4b54dfcbba0f729c2c13fa
2020-02-10 11:34:33 -08:00
ebed008dd4 Correct /MP usage in MSVC (#33120)
Summary:
## Several flags
`/MP[M]`: It is a flag for the compiler `cl`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/maxcpucount:[M]`: It is a flag for the generator `msbuild`. It leads to project-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/p:CL_MPCount=[M]`: It is a flag for the generator `msbuild`. It leads the generator to pass `/MP[M]` to the compiler.
`/j[M]`: It is a flag for the generator `ninja`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.

## Reason for the change
1. Object-level multiprocessing is preferred over project-level multiprocessing.
2. ~For ninja, we don't need to set `/MP`, otherwise M * M processes will be spawned.~ Actually, this is not correct, because in ninja configs there is only one source file per command. Therefore, the `/MP` switch should be useless.
3. For msbuild, if it is called through Python configuration scripts, then `/p:CL_MPCount=[M]` will be added, otherwise, we add `/MP` to `CMAKE_CXX_FLAGS`.
4. ~It may be a possible fix for https://github.com/pytorch/pytorch/issues/28271, https://github.com/pytorch/pytorch/issues/27463 and https://github.com/pytorch/pytorch/issues/25393, because `/MP` is also passed to `nvcc`.~ This is probably not true, because `/MP` should not be effective given there is only one source file per command.

## Reference
1. https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=vs-2019
2. https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows
3. https://blog.kitware.com/cmake-building-with-all-your-cores/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33120

Differential Revision: D19817227

Pulled By: ezyang

fbshipit-source-id: f8d01f835016971729c7a8d8a0d1cb8a8c2c6a5f
2020-02-10 11:29:25 -08:00
9d94f56ce0 Backward operation of torch.eig for real eigenvalues (#33090)
Summary:
Another pull request following up on issue https://github.com/pytorch/pytorch/issues/32531.
Here I implemented the backward operation for `torch.eig` under the condition that all the eigenvalues are real.

This pull request is independent of my other pull request https://github.com/pytorch/pytorch/issues/32932; there is no dependency between the two.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33090

Differential Revision: D19814347

Pulled By: albanD

fbshipit-source-id: 2fae30964e97987abb690544df8240aedeae56e8
2020-02-10 09:52:56 -08:00
c917a247a8 Improve error message for assertWarnsRegex (#33099)
Summary:
`assertWarnsRegex` now prints out any warnings that it caught while failing to find a matching warning. This makes it easier to debug tests by just looking at the CI logs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33099

Differential Revision: D19800021

Pulled By: ezyang

fbshipit-source-id: 1c31ae785c8ffc5d47619aff6597e479263be2de
2020-02-10 07:27:59 -08:00
3e8d813263 Add more checks to custom Function (#33069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33069

This PR adds the following:
- Warn when a non-input Tensor is given to `mark_dirty()` as it is not needed.
- Raise an error if we modify in place an input that is a view and there are multiple outputs. This setting is not handled by `CopySlices` and would raise a cryptic error during the backward.
- Raise an error if an input is modified inplace but not returned, as that would prevent the graph rewrite from being done correctly. A sketch of the targeted pattern follows this list.
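
A minimal sketch of the pattern these checks target (the Function below is hypothetical):

```python
import torch

# Hypothetical custom Function illustrating the rules above: an input
# modified in place must be marked dirty and must be returned.
class InplaceDouble(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        x.mul_(2)
        ctx.mark_dirty(x)  # only inputs need to be marked dirty
        return x           # the modified input must be returned

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 2

x = torch.randn(3, requires_grad=True).clone()  # non-leaf, so inplace is allowed
y = InplaceDouble.apply(x)
```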

Test Plan: Imported from OSS

Differential Revision: D19791563

Pulled By: albanD

fbshipit-source-id: 4d8806c27290efe82ef2fe9c8c4dc2b26579abd1
2020-02-10 07:25:24 -08:00
e1c53a5c86 Fix version counter bump in cpp Function (#33068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33068

The version counter is already tracked if we use PyTorch's functions, but not if the user unpacks the Tensor and modifies it by hand or with a third-party library.

Test Plan: Imported from OSS

Differential Revision: D19791564

Pulled By: albanD

fbshipit-source-id: a73c0f73d8fd0c0e5bf838f14bed54fa66937840
2020-02-10 07:22:29 -08:00
efba630287 Issue a warning when zero_grad is used in DataParallel (#33064)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768, second attempt of https://github.com/pytorch/pytorch/issues/32870

DataParallel creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backward` will propagate gradients onto the original module parameters, but calling `zero_grad` on the replica module doesn't clear the gradients from the parent module. However, any replica using backward was broken anyway, since the replica's parameters are not leaf nodes in autograd. So, we should issue a warning.
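
A minimal sketch of the pattern that now warns (the module is hypothetical; replication, and hence the warning, requires more than one GPU):

```python
import torch
import torch.nn as nn

# Hypothetical module that calls zero_grad inside forward; under
# DataParallel this runs on a replica and has no effect on the parent
# module's gradients.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        self.zero_grad()  # runs on the replica; now emits a warning
        return self.fc(x)

model = nn.DataParallel(Net().cuda())
out = model(torch.randn(8, 4, device='cuda'))
```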
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33064

Differential Revision: D19790178

Pulled By: albanD

fbshipit-source-id: 886f36640acef4834a6fa57a26ce16b42ff0e9ad
2020-02-10 07:04:27 -08:00
e2f1288514 Add utils to inspect fp16/int8 packed weights (#32979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32979

Since we use prepacked weights in the Fp16 FCs and future Int8 FCs in production Ads models, we provide the Python utils to inspect the unpacked format of the weights for debugging purposes. The main interfaces are the following:

```
from deeplearning.numeric_suite.toolkit import packed_weights_inspector
# inspect fp16 packed weights
unpacked_fp16_weights = packed_weights_inspector.extract_fp16_fc_packed_weights(fp16_weight_blob_name)

# inspect int8 packed weights
unpacked_int8_weights, qparams = packed_weights_inspector.extract_int8_fc_packed_weights(int8_weight_blob_name)
```

Test Plan:
```
buck test mode/opt deeplearning/numeric_suite/toolkit/test:packed_weights_inspector_test
```

Reviewed By: amylittleyang

Differential Revision: D19724474

fbshipit-source-id: e937672b3722e61bc44c2587aab2288a86aece9a
2020-02-08 18:18:56 -08:00
6249d7302b [ONNX] Fix export for avg_pool with default stride (#33017)
Summary:
If using nn.functional avg_pool, stride is an optional arg. If not provided, it is set to kernel_size.
This PR fixes the export of avg_pool with default stride.
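
A minimal export sketch exercising the default-stride path (file name and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

class AvgPool(torch.nn.Module):
    def forward(self, x):
        # stride omitted -> defaults to kernel_size
        return F.avg_pool2d(x, kernel_size=2)

torch.onnx.export(AvgPool(), torch.randn(1, 3, 8, 8), 'avg_pool.onnx')
```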
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33017

Reviewed By: hl475

Differential Revision: D19759604

Pulled By: houseroad

fbshipit-source-id: b0352db6fbaf427f4cff9ba8a942efdeb39b6f02
2020-02-07 22:46:46 -08:00
0e29e9e0f6 Re-enable internal test runs
Summary:
Fix internal error message due to old version of hypothesis
   test_suite = self.load_tests()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/__fb_test_main__.py", line 678, in load_tests
    suite = loader.load_all()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/__fb_test_main__.py", line 467, in load_all
    __import__(module_name, level=0)
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/test_quantization.py", line 45, in <module>
    hu.assert_deadline_disabled()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/torch/testing/_internal/hypothesis_utils.py", line 322, in assert_deadline_disabled
    assert settings().deadline is None
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/hypothesis/_settings.py", line 127, in __getattr__
    raise AttributeError('settings has no attribute %s' % (name,))
AttributeError: settings has no attribute deadline

Test Plan: buck test mode/dev //caffe2/test:quantization -- --run-disabled runs successfully

Differential Revision: D19795232

fbshipit-source-id: ef1d8be20b4be30e1cfad4cd5019c4779a5f4568
2020-02-07 18:08:18 -08:00
17d4ef9e9e Support using scalar tensor for split (#32493)
Summary:
split requires an int input; however, under tracing, operators such as
size(axis) return a tensor, which is different behavior than when not
tracing. As such, we need to modify split to handle these cases.

Fixes https://github.com/pytorch/pytorch/issues/27551
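
A minimal sketch of the traced pattern in question (shapes are illustrative):

```python
import torch

def f(x):
    # under tracing, the value derived from x.size(0) is recorded as a
    # tensor rather than a plain int
    return torch.split(x, x.size(0) // 2)

traced = torch.jit.trace(f, torch.randn(4, 3))
```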
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32493

Reviewed By: hl475

Differential Revision: D19538254

Pulled By: houseroad

fbshipit-source-id: c8623009de5926aa38685e08121f4b48604bd8c0
2020-02-07 17:16:43 -08:00
7314f1c281 [torch/multiprocessing] Update documentation indicating that start_method is ignored for mp.spawn() (#33070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33070

The `start_method` parameter is intentionally ignored by `mp.spawn()`. Document this fact and point the user to `start_processes` if they want to use a different `start_method`.
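
A minimal sketch of the two entry points (the `start_processes` call is an assumption based on the public API of this era):

```python
import torch.multiprocessing as mp

def worker(rank):
    print(f"worker {rank} started")

if __name__ == '__main__':
    mp.spawn(worker, nprocs=2)  # start_method is ignored; always "spawn"
    # to pick a different start method, use start_processes instead
    # ("fork" is only available on platforms that support it)
    mp.start_processes(worker, nprocs=2, start_method='fork')
```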

Test Plan:
Warning message looks like:
```
main.py:8: UserWarning: This method only supports start_method=spawn (got: fork).
To use a different start_method use:
         torch.multiprocessing.start_process(...)
  warnings.warn(msg)
```

Reviewed By: ailzhang

Differential Revision: D19780235

fbshipit-source-id: 4599cd18c3ba6cc401810efe4f390290ffa8023b
2020-02-07 15:26:00 -08:00
c6fa6d82ae move Decompose before profiling to prevent clearing shape info
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33100

Differential Revision: D19793346

Pulled By: Krovatkin

fbshipit-source-id: fdc5927f4970eabbb5a8f62a499d5b79117af2a9
2020-02-07 14:04:40 -08:00
868db903ae ONNX support for torch.take (#33061)
Summary:
Adding ONNX export support for torch.take()
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33061

Reviewed By: hl475

Differential Revision: D19782651

Pulled By: houseroad

fbshipit-source-id: 0168fb941e166acda4ca607165248b8e0b260ace
2020-02-07 13:41:26 -08:00
a9583c1f75 Vectorize softplus and its backward function on CPU (#32944)
Summary:
The benchmarking shows a huge performance gain (2-7x faster).

Also note that I removed Half support because it isn't generally supported on CPU.

Benchmark: (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz)

```python
import timeit
for op in ('Softplus',):
    print('Forward')
    for dtype in ('torch.double', 'torch.float'):
        for n, t in [(10_000, 10000),
                    (100_000, 1000)]:
            print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit('m(a)', setup=f'import torch; m = torch.nn.{op}(); a = torch.randn({n}, dtype={dtype})', number=t))
    print('Backward')
    for dtype in ('torch.double', 'torch.float'):
        for n, t in [(10_000, 40000),
                    (100_000, 4000)]:
            print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}')
            print(timeit.timeit('y.backward(retain_graph=True)',
                                setup=f'import torch; m = torch.nn.{op}(); a = torch.randn({n}, dtype={dtype}, requires_grad=True); x = m(a); y = x.sum()',
                                number=t))
```

Before:

```
Forward
torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.double
3.73130346799735
torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.double
3.6790116359916283
torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.float
2.7477027159911813
torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.float
2.7382752639969112
Backward
torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.double
7.037510035006562
torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.double
5.855093962003593
torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.float
3.413616877005552
torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.float
2.5485514330066508
```

After:

```
Forward
torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.double
0.9465823079954134
torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.double
0.8799468770012027
torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.float
0.39715987400268205
torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.float
0.3563060039887205
Backward
torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.double
2.400547721001203
torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.double
1.4740848699875642
torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.float
1.6684603010071442
torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.float
0.6815649690106511
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32944

Differential Revision: D19725407

Pulled By: VitalyFedyunin

fbshipit-source-id: 7430de838df731bd17617eff63f10107d5ad6b8b
2020-02-07 11:28:49 -08:00
e7b42209eb Added sparkspot model.
Summary: The lite interpreter does not have the softplus and sub ops needed for this model.

Test Plan:
buck run fbsource//xplat/aibench:run_bench -- -b ../xplat/aibench/specifications/models/pytorch/mobile_migration/sparkspot.json --platform android --framework pytorch --remote --devices SM-G960U-8.0.0-26

 https://our.intern.facebook.com/intern/aibench/details/890521439770638

buck run fbsource//xplat/aibench:run_bench -- -b ../xplat/aibench/specifications/models/pytorch/mobile_migration/sparkspot.json --platform android/arm64 --framework pytorch --remote --devices SM-G960U-8.0.0-26

https://our.intern.facebook.com/intern/aibench/details/485779747361527

For Caffe2:
buck run fbsource//xplat/aibench:run_bench -- -b ../xplat/aibench/specifications/models/caffe2/mobile_migration/sparkspot.json --platform android --framework caffe2 --remote --devices SM-G950U-7.0-24

https://our.intern.facebook.com/intern/aibench/details/177482569133423

Reviewed By: ljk53, iseeyuan

Differential Revision: D19757721

fbshipit-source-id: cdd4b39d072925fc8de17184f2c90918de6245ba
2020-02-07 11:22:06 -08:00
de27f4261d [jit] remove redundant variables from JIT TestCase
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29091

Differential Revision: D19746083

Pulled By: suo

fbshipit-source-id: 76fd71740fe7a3f52da361d96a7b694ec208de24
2020-02-07 10:42:33 -08:00
d678093907 [ONNX] Extend op registration to next opsets (#32943)
Summary:
Currently, custom ops are registered for a specific opset version.
For example, all torchvision custom ops are registered for opset 11, and cannot be exported into higher opset versions. This PR extends op registration to higher opset versions.
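
A minimal registration sketch (the op name and namespace are hypothetical):

```python
import torch.onnx

# hypothetical symbolic for a custom op in a user namespace
def my_op_symbolic(g, x):
    return g.op('mydomain::MyOp', x)

# registered for opset 11; with this PR, the registration also applies
# when exporting at opset versions above 11
torch.onnx.register_custom_op_symbolic('mynamespace::my_op', my_op_symbolic, 11)
```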
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32943

Reviewed By: hl475

Differential Revision: D19739406

Pulled By: houseroad

fbshipit-source-id: dd8b616de3a69a529d135fdd02608a17a8e421bc
2020-02-07 10:37:50 -08:00
3b2f267ad8 add to codeowner to get better inbox notification for PR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33087

Differential Revision: D19790389

Pulled By: albanD

fbshipit-source-id: 360ee1fc47a9b0b8d8ddbe47b77f2cbffaead9c8
2020-02-07 07:56:47 -08:00
674dca0831 Automatic update of fbcode/onnx to 8b3f7e2e7a0f2aba0e629e23d89f07c7fc0e6a5e (#33075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33075

Previous import was 65020daafa9183c769938b4512ce543fd5740f8f

Included changes:
- **[8b3f7e2e](https://github.com/onnx/onnx/commit/8b3f7e2e)**: Update Dropout and  BatchNorm to be Training Friendly (#2568) <Lara Haidar>
- **[61f0bbc5](https://github.com/onnx/onnx/commit/61f0bbc5)**: Fix a bug in ScatterND shape inference (#2577) <Bowen Bao>
- **[05bce9cf](https://github.com/onnx/onnx/commit/05bce9cf)**: add utility function to make reference attribute whose name is not the same as the attribute it refers. (#2583) <Ke Zhang>
- **[71181c83](https://github.com/onnx/onnx/commit/71181c83)**: Clarify spec for constant of shape with dim_n = 0 (#2567) <Negin Raoof>
- **[eadba733](https://github.com/onnx/onnx/commit/eadba733)**: Update sigs.md with link to calendar page (#2579) <Prasanth Pulavarthi>
- **[08562f8e](https://github.com/onnx/onnx/commit/08562f8e)**: Update working-groups.md (#2580) <Prasanth Pulavarthi>
- **[0e718913](https://github.com/onnx/onnx/commit/0e718913)**: Fix Slice op's shape inference logic (#2526) <Hariharan Seshadri>
- **[12111410](https://github.com/onnx/onnx/commit/12111410)**: Add missing spaces to Random*Like doc (#2572) <Takeshi Watanabe>
- **[7e6e61d6](https://github.com/onnx/onnx/commit/7e6e61d6)**: Contributing: fix typos (#2571) <Maher Jendoubi>
- **[bbd604ef](https://github.com/onnx/onnx/commit/bbd604ef)**: Add Einsum op (#2504) <Negin Raoof>
- **[fd3ab73a](https://github.com/onnx/onnx/commit/fd3ab73a)**: Clarify split supports zero length splits (#2544) <Negin Raoof>
- **[6dd73774](https://github.com/onnx/onnx/commit/6dd73774)**: Fix circleci build and drop unsupported Windows builds (#2565) <Wei-Sheng Chin>
- **[b3d201a2](https://github.com/onnx/onnx/commit/b3d201a2)**: Fix the formula of intermediate zero calculation for DynamicQuantizeLinear (#2556) <Yufeng Li>
- **[3613eb25](https://github.com/onnx/onnx/commit/3613eb25)**: Add wording to clarify. (#2555) <Dwayne Robinson>
- **[dfa4384c](https://github.com/onnx/onnx/commit/dfa4384c)**: Fix shape inference for Split with split attribute (#2328) <Shinichiro Hamaji>
- **[684fc1bc](https://github.com/onnx/onnx/commit/684fc1bc)**: Keep symbolic dims in Concat with a single input (#2418) <Shinichiro Hamaji>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D19784487

fbshipit-source-id: 421cdc3394faeff0168853f4ff065fc599ca3967
2020-02-07 02:18:57 -08:00
e025f393f6 windows template specialization bug (#33076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33076

attempt at fixing https://github.com/pytorch/pytorch/issues/30886

Test Plan: circleCI with `call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=14.16` passes

Differential Revision: D19784550

fbshipit-source-id: 9fb42c3854d1d00d96cd7179bef9dd1aa2972ea6
2020-02-07 00:41:22 -08:00
05d18ffaf5 Distributed Autograd: Allow multiple backward passes to accumulate gradients. (#32506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32506

In this PR, we've introduced a `retain_graph` parameter to distributed
autograd similar to `torch.autograd.backward`.

In terms of design, this parameter is sent over RPC to all nodes and is used to
create the GraphTask on the local nodes. This enables us to run
`dist_autograd.backward()` multiple times in the same context.

The current use case for this is to benchmark only the backward pass for
distributed autograd. We'd like to measure the QPS for the backward pass, and
running a single forward pass followed by multiple backward passes in a loop
is one way to benchmark backward-pass performance.
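
A sketch of the benchmarking loop this enables (the model is a stand-in, the RPC framework is assumed to be initialized elsewhere, and the exact backward signature is an assumption based on the later public API):

```python
import torch
import torch.distributed.autograd as dist_autograd

# hypothetical local model standing in for an RPC-based one;
# assumes torch.distributed.rpc has been initialized elsewhere
model = torch.nn.Linear(4, 4)

with dist_autograd.context() as context_id:
    loss = model(torch.randn(2, 4)).sum()  # one forward pass
    for _ in range(100):                   # many backward passes, same graph
        dist_autograd.backward(context_id, [loss], retain_graph=True)
```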
ghstack-source-id: 97868900

Test Plan: waitforbuildbot

Differential Revision: D19521288

fbshipit-source-id: 7ad8521059fd400d7b5a6ab77ce56e1927ced90a
2020-02-06 23:27:21 -08:00
f0d7bd41b9 [jit] Minor: avoid recalculating some keys for map accesses in pickler. (#33060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33060

Noticed this when tracking down a partially-related SIGSEGV.
If inserting a non-present key into a memoized map, don't re-calculate it twice
(probably safer that way anyway).
ghstack-source-id: 97904485

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D19778008

fbshipit-source-id: 95b1d708c034a54b96a22ccbdffb24f72d08dffd
2020-02-06 21:25:04 -08:00
10db323b75 Updating submodules
Summary:
GitHub commits:

4121390031
fdd24faa6c
94471e632b
0a24425afd
8b79c69b6c
99f3917826
3853cef0ba
5db0cb90fc
714edbb20f
880ade1420

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: a63558a8df40c936d8959287f815835502b6cbd9
2020-02-06 21:01:50 -08:00
afa8cbf8c2 Modifed randNLike for scripting (#32830)
Summary:
The randn_like function had required args which were not being used.
As such, we modified the method signature to give them default values, so
that no error is thrown when scripting does not provide these unused
arguments.

Additionally, we modified the const checker to handle prim::Constant as
well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32830

Reviewed By: hl475

Differential Revision: D19731715

Pulled By: houseroad

fbshipit-source-id: a3cacb3977eecb88b122e0ceb654fdbf1c8286c1
2020-02-06 18:19:42 -08:00
432858c960 [ONNX] Fix exporting copy_ with index as tensor input (#32801)
Summary:
Supporting the case below. Previously, the index for copy_ was only treated as a constant integer, whereas it could be a tensor input as well.

```python
class InPlaceIndexedAssignment(torch.nn.Module):
    def forward(self, data, index, new_data):
        data[index] = new_data
        return data

data = torch.zeros(3, 4)
index = torch.tensor(1)
new_data = torch.arange(4).to(torch.float32)
torch.onnx.export(InPlaceIndexedAssignment(), (data, index, new_data), 'inplace_assign.onnx', opset_version=11)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32801

Reviewed By: hl475

Differential Revision: D19731666

Pulled By: houseroad

fbshipit-source-id: 08703fdccd817f901282e19847e259d93929e702
2020-02-06 18:11:47 -08:00
ca33aeba09 [JIT] Add Exit Transform / Convert To SSA to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24114

Differential Revision: D19780828

Pulled By: eellison

fbshipit-source-id: d481ad886b2ad6349a1646672e507336d45759fb
2020-02-06 18:04:06 -08:00
b0476dc6e6 Fix Typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33038

Differential Revision: D19769127

Pulled By: zou3519

fbshipit-source-id: 53a7fa603b097d7070ca484997a587ec74e87357
2020-02-06 11:16:56 -08:00
38820a7014 [JIT] Resolve custom classes in source importer (#32977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32977
ghstack-source-id: 97736042

Test Plan: Imported from OSS

Differential Revision: D19724588

fbshipit-source-id: b31b6ae14d2881d3604922e611fe4749108e674d
2020-02-06 10:45:40 -08:00
757cea92a4 [c10] Allow taking a std::tuple as arg (#32948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32948
ghstack-source-id: 97736044

Test Plan: Imported from OSS

Differential Revision: D19709119

fbshipit-source-id: 26b069a95ae7a79a2d5cbe3845eb1a5dcd398be1
2020-02-06 10:44:31 -08:00
8195961f20 Revert D19730209: [pytorch][PR] Issue a warning when using zero_grad in DataParallel
Test Plan: revert-hammer

Differential Revision:
D19730209

Original commit changeset: cb9b2cb0c2e0

fbshipit-source-id: 5bf53ea3c37a7ed2411a2acc34e40d07eff144c9
2020-02-06 07:05:51 -08:00
ec1e9a1ae2 Revert D19417087: fix #30480 torch.normal shape checking is broken
Test Plan: revert-hammer

Differential Revision:
D19417087

Original commit changeset: 1c4bc7df9231

fbshipit-source-id: ee579304cd79e48a6ce87daf490b53baabc655a8
2020-02-06 07:01:29 -08:00
e76fa9822d [C2] Introduce extra_info force CPU tags for auto-generated iteration counter blobs (#32607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32607

As desc.

Test Plan: Unit-test.

Reviewed By: xw285cornell, chocjy

Differential Revision: D19551567

fbshipit-source-id: 3a121351d2b4016e99a1536dec746be970698664
2020-02-05 23:49:27 -08:00
3c17cbb6c8 fix #30480 torch.normal shape checking is broken (#32243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32243

Following what gchanan proposed in #30480
- If the (logical) shapes of mean and std are broadcastable, we broadcast them for the output
  Done in tensor iterator already.
- If the (logical) shapes of mean and std are not broadcastable and they have the same number of elements, we fall back to the old behavior (pick the shape of mean)
  Done by reshaping std to the same shape as mean.
- If the (logical) shapes of mean and std are not broadcastable and don't have the same number of elements, we error out.
  Done by the tensor iterator already. (A sketch of the three cases follows this list.)
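
A short sketch illustrating the semantics proposed here (shapes are illustrative):

```python
import torch

mean = torch.zeros(2, 3)
print(torch.normal(mean, torch.ones(3)).shape)  # broadcastable -> (2, 3)
print(torch.normal(mean, torch.ones(6)).shape)  # same numel, not broadcastable -> mean's shape (2, 3)
# torch.normal(mean, torch.ones(5))             # different numel -> error
```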

Test Plan: Imported from OSS

Differential Revision: D19417087

Pulled By: glaringlee

fbshipit-source-id: 1c4bc7df923110a803620b9e2abd11a7151fc33e
2020-02-05 23:47:14 -08:00
b00345a6f2 Move normal distribution to Aten(CPU) (#32031)
Summary:
Fix https://github.com/pytorch/pytorch/issues/24746
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32031

Differential Revision: D19729002

Pulled By: ezyang

fbshipit-source-id: f571368a8a2ac4068c937062167a2fd89e64098c
2020-02-05 20:39:40 -08:00
46c3c18bcc Issue a warning when using zero_grad in DataParallel (#32870)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768

`DataParallel` creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backward` will propagate gradients onto the original module parameters, but calling `zero_grad` on the replica module doesn't clear the gradients from the parent module,

~breaking any model that uses `backward`-`zero_grad` in its `forward`. I fix this by patching the replica module so that `zero_grad` clears grads on the parent as well.~

However, any replica using backward was broken anyway, since the replica's parameters are not leaf nodes in autograd. So, we should raise a warning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32870

Differential Revision: D19730209

Pulled By: ezyang

fbshipit-source-id: cb9b2cb0c2e0aca688ce0ff3e56b40fbd2aa3c66
2020-02-05 20:25:04 -08:00
6209412647 Add option to use ninja to compile ahead-of-time cpp_extensions (#32495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495

Background
------------------------------
Previously, ninja was used to compile+link inline cpp_extensions and
ahead-of-time cpp_extensions were compiled with distutils. This PR adds
the ability to compile (but not link) ahead-of-time cpp_extensions with ninja.

The main motivation for this is to speed up cpp_extension builds: distutils
does not make use of parallelism. With this PR, using the new option, on my machine,
- torchvision compilation goes from 3m43s to 49s
- nestedtensor compilation goes from 2m0s to 28s.

User-facing changes
------------------------------

I added a `use_ninja` flag to BuildExtension. This defaults to
`True`. When `use_ninja` is True:
- it will attempt to use ninja.
- If we cannot use ninja, then this throws a warning and falls back to
distutils.
- Situations where we cannot use ninja: on Windows (NYI, I'll open a new issue
for this), or if ninja cannot be found on the system. A usage sketch follows this list.
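
A minimal setup.py sketch (the extension name and source file are hypothetical):

```python
# Sketch of an ahead-of-time build script; per this PR, use_ninja
# defaults to True and falls back to distutils (with a warning) when
# ninja is unavailable. Names/paths are hypothetical.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='my_ext',
    ext_modules=[CppExtension('my_ext', ['my_ext.cpp'])],
    cmdclass={'build_ext': BuildExtension},
)
```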

Implementation Details
------------------------------

This PR makes this change in two steps. Please let me know if it would be
easier to review if I split this up into a stacked diff.
Those changes are:
1) refactor _write_ninja_file to separate the policy (what compiler flags
to pass) from the mechanism (how to write the ninja file and do compilation).
2) call _write_ninja_file and _run_ninja_build while building
ahead-of-time cpp_extensions. These are only used to compile objects;
distutils still handles the linking.

Change 1: refactor _write_ninja_file to separate policy from mechanism
- I split _write_ninja_file into: _write_ninja_file and
_write_ninja_file_to_build_library
- I renamed _build_extension_module to _run_ninja_build

Change 2: Call _write_ninja_file while building ahead-of-time
cpp_extensions
- _write_ninja_file_and_compile_objects calls _write_ninja_file to only
build object files.
- We monkey-patch distutils.CCompiler.compile to call
_write_ninja_files_and_compile_objects
- distutils still handles the linking step. The linking step is not a
bottleneck so it was not a concern.
- This change only works on unix-based systems. Our code for windows
goes down a different codepath and I did not want to mess with that.
- If a system does not support ninja, we raise a warning and fall back
to the original compilation path.

Test Plan
------------------------------

Adhoc testing
- I built torchvision using pytorch master and printed out the build
commands. Next, I used this branch to build torchvision and looked at
the ninja file. I compared the ninja file with the build commands and
asserted that they were functionally the same.
- I repeated the above for pytorch/nestedtensor.

PyTorch test suite
- I split `test_cpp_extensions` into `test_cpp_extensions_aot` and
`test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests
ahead-of-time and the JIT version tests just-in-time (not to be confused
with TorchScript)
- `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with
a module that was built with ninja, and once with a module that was
built without ninja.
- run_test.py asserts that when we are building with use_ninja=True,
ninja is actually available on the system.

Test Plan: Imported from OSS

Differential Revision: D19730432

Pulled By: zou3519

fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90
2020-02-05 18:49:29 -08:00
e54d954572 [ONNX] Add flag to enable script tests (#32654)
Summary:
This will allow us to incrementally enable more tests for scripting as we put in fixes. houseroad spandantiwari
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32654

Reviewed By: hl475

Differential Revision: D19583401

Pulled By: houseroad

fbshipit-source-id: 8dc05e4784df819c939dffdf33b00cbb80bfa364
2020-02-05 17:51:00 -08:00
1b746b95fb Consider hub_dir alongside TORCH_HOME env variable for storing hub models (#32844)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32844

Differential Revision: D19747566

Pulled By: ailzhang

fbshipit-source-id: caca41a3a057d7d280d4783515aba2cc48c82012
2020-02-05 15:35:53 -08:00
74ce3a032c Fix some bugs with zipfile serialization (#32244)
Summary:
Stacked PRs
 * #32958 - Make zip serialization the default
 * **#32244 - Fix some bugs with zipfile serialization**

It includes the following changes:
* Split up tests so that we can test both serialization methods
    * Loading something within a buffer doesn't work anymore, so those tests are only on the old serialization method (it's possible but introduces a big slowdown since it requires a linear scan of the entire zipfile to find the magic number at the end)
* Call `readinto` on a buffer if possible instead of `read` + a copy
* Disable CRC-32 checks on read (there was some issue where miniz said the CRC was wrong but `zipinfo` and `unzip` said the zip file was fine)
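
A minimal sketch of opting into the zipfile format explicitly (flag name per this era's API; it becomes the default in the stacked PR above, and the file name is illustrative):

```python
import torch

# _use_new_zipfile_serialization opts into the zip-based format
torch.save({'w': torch.randn(3)}, 'ckpt.pt', _use_new_zipfile_serialization=True)
state = torch.load('ckpt.pt')
```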
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32244

Pulled By: driazati

Reviewed By: eellison

Differential Revision: D19418935

fbshipit-source-id: df140854f52ecd04236225417d625374fd99f573
2020-02-05 15:32:14 -08:00
ab75d64e6e Add ability to abort NCCL communicators from the store. (#32895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32895

When a particular rank calls `ncclCommAbort` on a communicator, it is
important to ensure all other ranks call `ncclCommAbort` on their respective
communicators. If this is not done, the other ranks could get stuck causing the
GPU to spin with 100% utilization.

To alleviate this issue, whenever any rank calls `ncclCommAbort` we put the
unique communicator id in the store. The NCCL watchdog thread then monitors the
store and aborts any communicators whose ids appear in the store, marking them as "aborted".

A few more general fixes in this PR:

1) Use std::shared_ptr for the store in PrefixStore. PrefixStore was using a
reference to the store and when that reference went out of scope the store
object it was holding onto was invalid. This caused a segfault in the watchdog
thread.
2) Enhanced logging for the watchdog thread.

Test Plan: waitforbuildbot

Differential Revision: D19638159

fbshipit-source-id: 596cd87c9fe6d4aeaaab4cb7319cc37784d06eaa
2020-02-05 15:28:05 -08:00
df1d68d52e [jit] fix parser for one-line functions (#32941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32941

The Python grammar allows single-statement, one-line functions, so we
should allow them in the string parser.
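
A minimal sketch of a source string the parser should now accept:

```python
import torch

# a single-statement, one-line function that the string frontend
# should now parse
cu = torch.jit.CompilationUnit("def f(x): return x + 1")
print(cu.f(torch.tensor(1)))
```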

Test Plan: Imported from OSS

Differential Revision: D19704153

Pulled By: suo

fbshipit-source-id: 8c06cc9c600aa2a9567b484a1ecc0360aad443e3
2020-02-05 13:11:47 -08:00
908b451efb Enabling the nccl/rccl test for ROCM environment (#32340)
Summary:
Enabling the RCCL test on rocm by adding a temporary grace period to clean up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32340

Differential Revision: D19744459

Pulled By: xw285cornell

fbshipit-source-id: 1af3b64113a67f93e622d010ddd3020e5d6c8bc8
2020-02-05 12:02:31 -08:00
e8581869f2 Properly update _flat_weights in RNN models (#32989)
Summary:
Resubmitting https://github.com/pytorch/pytorch/issues/32939
Should hopefully fix https://github.com/pytorch/pytorch/issues/32346. Now, when the _flat_weights list is updated, None elements are appended to it if some weights are missing; subsequent setattr calls for the missing weights repair _flat_weights and make it suitable for use in the backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32989

Differential Revision: D19731952

Pulled By: ngimel

fbshipit-source-id: 2118a19840491e7ab0fef15185fad982f42795a6
2020-02-05 11:53:41 -08:00
72b9412be2 Move some broadcasting logic away from codegen. (#32982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32982

For masked_scatter_ and masked_fill_ (which already have manually written wrappers), move the broadcasting logic into the manually written wrappers.

Test Plan: Imported from OSS

Differential Revision: D19726830

Pulled By: gchanan

fbshipit-source-id: 1f6e55e19c1314a76e43946b14d58f147c0f8204
2020-02-05 10:23:49 -08:00
fbde3c05b6 [aten] fix vector memory leak (#32478)
Summary:
free(y) missing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32478

Differential Revision: D19728471

Pulled By: agolynski

fbshipit-source-id: 73e7933c832f9c19f3fe09df76699c7b335a87bd
2020-02-05 10:18:54 -08:00
81a9046301 Fix dispatch of argmax/argmin. (#32961)
Summary:
The way we currently dispatch argmax/argmin to out-of-source devices is bad and has caused issues, e.g., it doesn't work well when the input requires grad (https://github.com/pytorch/xla/issues/1585).
Making argmax/argmin dispatch at the device level resolves this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32961

Differential Revision: D19726826

Pulled By: ailzhang

fbshipit-source-id: f7fb445fd8e7691524afcc47d24d8e6b0171d10c
2020-02-05 10:17:50 -08:00
3531f99384 Kill _th_max, _th_min overloads that aren't used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32981

Test Plan: Imported from OSS

Differential Revision: D19726831

Pulled By: gchanan

fbshipit-source-id: 22b5b9115838360850c4ee250ed95742f3444dc8
2020-02-05 09:20:21 -08:00
16c166e2ea Add XLAPreAutograd key for XLA use cases that need custom autograd. (#32788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32788

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19628643

Pulled By: ezyang

fbshipit-source-id: 7099b08eff37913144b961dda00b070bd4b939d4
2020-02-05 08:10:02 -08:00
6b0813ea5d Stop using dispatchTypeId to do checks for tensor list unwrap. (#32787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32787

Gets rid of a longstanding TODO.  TensorList unwrap is only used for cat, which
means we can assume that the inputs are dense, and do something similar to how
we do the dense tensor wrapping above.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19628642

Pulled By: ezyang

fbshipit-source-id: 3264439407585fb97995a9a2302c2913efecb421
2020-02-05 08:08:16 -08:00
1b446aa2ee Expose Channel Last 3d enum
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32947

Test Plan: Imported from OSS

Differential Revision: D19707716

Pulled By: glaringlee

fbshipit-source-id: 03824769376043bc6151a4580aba27654de5077f
2020-02-04 23:33:19 -08:00
836b4c9e64 Attempt to workaround MSVC17 static constexpr bug
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33002

Test Plan: Imported from OSS

Differential Revision: D19739097

Pulled By: jamesr66a

fbshipit-source-id: 7ce54ddb1f56a741d88d3215b154192171c54dfa
2020-02-04 22:33:22 -08:00
f393adc0ed [JIT] Fix python pickle serialization for torchbind (#32878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32878
ghstack-source-id: 97736045

Test Plan: Imported from OSS

Differential Revision: D19669879

fbshipit-source-id: 23ea91cffe7344d1eed014e2509983c281dd18d3
2020-02-04 19:29:55 -08:00
23a4800708 [JIT] Make IRParser use op schema (#32854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32854
ghstack-source-id: 97736043

Test Plan: Imported from OSS

Differential Revision: D19656881

fbshipit-source-id: 509d09fdbd765ca5cd153bec6440aedfb4e6d23b
2020-02-04 19:29:50 -08:00
bc4790b3aa [JIT] Trace uses of torchbind classes as module attributes (#32833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32833
ghstack-source-id: 97736046

Test Plan: Imported from OSS

Differential Revision: D19645714

fbshipit-source-id: 10a7271f13c3588aea666b44b916e90ba7b3c666
2020-02-04 19:28:37 -08:00
d141465713 Fix torch::allclose to handle std::numeric_limits<T>::lowest() for integral types (#32978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32978

Fixes #32946
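
Conceptually, via the equivalent Python-side API (an illustrative sketch; the actual fix is in the C++ `torch::allclose`):

```python
import torch

lowest = torch.tensor([torch.iinfo(torch.int64).min])
# Comparing a tensor holding the lowest int64 value against itself
# previously risked overflow inside allclose; it should simply be True.
print(torch.allclose(lowest, lowest))
```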

Test Plan: Imported from OSS

Differential Revision: D19726013

Pulled By: pbelevich

fbshipit-source-id: ada4aeabc8e39016d24f1a40f02fb7c56f069cd3
2020-02-04 19:06:52 -08:00
e4f633ba0b Updating submodules
Summary:
GitHub commits:

619d2503cb
c442208177
75d9b18eba
ed5142083a

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 11a53fea064f8e40c2a89d3068421d7cad231d00
2020-02-04 16:36:24 -08:00
4502d8c391 Interpolate Float [] support in ONNX (#32554)
Summary:
The PR https://github.com/pytorch/pytorch/pull/31791 adds support for float[] constants, which affects some cases of ONNX interpolate support.
This PR adds float[] constant support in ONNX, updates interpolate in ONNX, and re-enables the disabled tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32554

Reviewed By: hl475

Differential Revision: D19566596

Pulled By: houseroad

fbshipit-source-id: 843f62c86126fdf4f9c0117b65965682a776e7e9
2020-02-04 16:14:40 -08:00
bda874b480 [rpc] throw correct Exception on local client based on the RemoteException (#32936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32936

Closes https://github.com/pytorch/pytorch/issues/32732. Currently if a
UDF run in RPC throws an exception such as ValueError or TypeError, we wrap
this in a RemoteException on the callee side. When raising this on the caller
side, we currently raise a vanilla Exception. This diff changes it so that the
correct exception is thrown. Tested by changing the current rpc tests to assert
on the right type of error rather than just the base `Exception`.
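
Illustratively, the caller-side behavior (a sketch; it assumes `rpc.init_rpc(...)` has already run on both workers, that a worker named "worker1" exists, and that `buggy` is importable on the callee):

```python
import torch.distributed.rpc as rpc

def buggy(x):
    raise ValueError("bad input: {}".format(x))

try:
    rpc.rpc_sync("worker1", buggy, args=(1,))
except ValueError as err:  # previously this surfaced as a plain Exception
    print("caught:", err)
```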
ghstack-source-id: 97706957

Test Plan: Modified unit test.

Differential Revision: D19700434

fbshipit-source-id: e451b772ea6aecc1d2e109e67e7f932eb9151f15
2020-02-04 16:08:25 -08:00
a9141dd240 Patch Half.h for compiling CUDA with clang (#29027)
Summary:
Following discussion: https://github.com/pytorch/pytorch/issues/28417
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29027

Differential Revision: D19698745

Pulled By: ezyang

fbshipit-source-id: fab4be3bcbac8f3b334d7e0a56e6a790e2c6b6d8
2020-02-04 15:05:52 -08:00
7ea6559658 Add size checks to torch.stack (#32931)
Summary:
Checks the size of each tensor passed to `torch.stack` before calling `cat` to address https://github.com/pytorch/pytorch/issues/29510. This is done in the `get_stack_input` function as that is a common path. The function now compares the size of each tensor in the TensorList to the size of the first tensor and throws an exception when the sizes are not equal.

To compare:
```
x = torch.zeros([1, 2])
y = torch.zeros([1, 3])
torch.stack([x, y]) # Errors due to size differences
```
Current error:
```
RuntimeError: invalid argument 0: Sizes of tensors must match
except in dimension 0. Got 2 and 3 in dimension 2 at (path)\aten\src\TH/generic/THTensor.cpp:612
```
New error:
```
RuntimeError: stack expects each tensor to be equal size, but
got [1, 2] at entry 0 and [1, 3] at entry 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32931

Differential Revision: D19700110

Pulled By: ezyang

fbshipit-source-id: 7e18bb00fa2c137e418e340d719b6b76170b83e3
2020-02-04 15:00:54 -08:00
58e8d5588a [ONNX] Export bitwise_not for bool (logical_not) (#28439)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25805 (for bool tensors as in the issue)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28439

Differential Revision: D19700156

Pulled By: ezyang

fbshipit-source-id: 0706ada6a8d259dce381ba2d009f226e14c3c14f
2020-02-04 14:45:58 -08:00
4f5908d5d7 Remove unneded TORCH_API (#32015)
Summary:
It was causing a build error when compiling on MINGW64
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32015

Differential Revision: D19697296

Pulled By: ezyang

fbshipit-source-id: 71e58783c48f8e99755c091b2027d59740dfca47
2020-02-04 14:44:35 -08:00
6305e4a88f Add warning and example for seeding to DistributedSampler (#32951)
Summary:
Closes gh-31771

Also note that the `epoch` attribute is *only* used as a manual seed in each iteration (so it could easily be changed/renamed). Seeding consecutive iterations with `[0, 1, 2, ...]` is low-entropy; however, in practice it probably doesn't matter when using the sampler in combination with a dataloader (because there won't be enough data or epochs to run into statistical issues due to low-entropy seeding). So leaving that as is.
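
The per-epoch seeding pattern the docs now describe looks roughly like this (`set_epoch` is the real API; the tiny dataset and the explicit `num_replicas`/`rank` are just to keep the snippet self-contained):

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

dataset = list(range(10))
# num_replicas/rank passed explicitly so no process group is required here
sampler = DistributedSampler(dataset, num_replicas=2, rank=0)
loader = DataLoader(dataset, sampler=sampler, batch_size=2)

for epoch in range(3):
    sampler.set_epoch(epoch)  # without this, every epoch yields the same order
    print(epoch, [batch.tolist() for batch in loader])
```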

Rendered docstring:

<img width="534" alt="image" src="https://user-images.githubusercontent.com/98330/73701250-35134100-46e9-11ea-97b8-3baeb60fcb37.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32951

Differential Revision: D19729333

Pulled By: ezyang

fbshipit-source-id: 3ddf90a3828b8bbae88aa2195a5d0b7d8ee1b066
2020-02-04 14:36:59 -08:00
b0d5ce3848 Revert D19710990: [pytorch][PR] properly update _flat_weights in RNN modules
Test Plan: revert-hammer

Differential Revision:
D19710990

Original commit changeset: c978c7519464

fbshipit-source-id: 8710bc2f4f1d01d9c93d038b59caf1e6859375dd
2020-02-04 14:35:55 -08:00
cyy
27e1fecabd let user specify CUDA_HOST_COMPILER
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32904

Differential Revision: D19729047

Pulled By: ezyang

fbshipit-source-id: c233e3924f71a025c51d25a7e3a8d728dac8730a
2020-02-04 14:32:12 -08:00
d3a0bdd06b proofreading (#29797)
Summary:
two instances of if -> it in torch.nn.modules.batchnorm.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29797

Differential Revision: D19698613

Pulled By: ezyang

fbshipit-source-id: 7312b2333f227113e904dfa91db90d00e525affb
2020-02-04 14:30:36 -08:00
ea968f5cc3 fix possible pandas import error during tensorboard tests (#29650)
Summary:
TensorBoard tests using SummaryWriter() may fail with a pandas import
complaint if TensorFlow packages are installed in the same python
environment as PyTorch:

Traceback (most recent call last):
  File "test_tensorboard.py", line 212, in test_writer
    with self.createSummaryWriter() as writer:
  File "test_tensorboard.py", line 64, in createSummaryWriter
    return SummaryWriter(temp_dir)
...
  File "[...]/site-packages/pandas/core/arrays/categorical.py", line 52, in <module>
    import pandas.core.algorithms as algorithms
AttributeError: module 'pandas' has no attribute 'core'

The exact failure may depend on the pandas version. We've also seen:

  File "[...]/site-packages/pandas/core/arrays/categorical.py", line 9, in <module>
    import pandas.compat as compat
AttributeError: module 'pandas' has no attribute 'compat'

The module import chain leading to the failure is tensorboard imports
tensorflow imports tensorflow_estimator imports pandas. pandas includes
a submodule named 'bottleneck', whose name collides with the PyTorch
'test/bottleneck/' subdirectory.

So IF tensorboard, tensorflow, tensorflow_estimator, and pandas are
installed in the python environment AND IF testing is run from within
PyTorch's 'test/' directory (or maybe just with 'test/' in PYTHONPATH,
etc.), then TensorBoard tests using SummaryWriter() will fail.

Rename the 'bottleneck/' directory slightly to avoid the name collision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29650

Differential Revision: D19698638

Pulled By: ezyang

fbshipit-source-id: cb59342ed407cb37aefc833d67f768a8809129ac
2020-02-04 14:27:46 -08:00
478356aeec Fix broken links in governance.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30815

Differential Revision: D19697401

Pulled By: ezyang

fbshipit-source-id: d7e1a1b54039624f471b6cfb568428feb73060f4
2020-02-04 14:26:09 -08:00
18d1896ba0 Fix confusing "does not have GPU support" warning message (#30721)
Summary:
Many people who use caffe2 are confused by the "does not have GPU support" warning message.
https://github.com/facebookresearch/video-nonlocal-net/issues/6
facebookarchive/caffe2#346
facebookarchive/caffe2#1634
facebookarchive/caffe2#197

Many non-GPU issues can cause this warning message, so it is better to surface the underlying error info.
![image](https://user-images.githubusercontent.com/13826327/70129721-41175e00-16ba-11ea-85df-a4b1a1690149.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30721

Differential Revision: D19697413

Pulled By: ezyang

fbshipit-source-id: bd24b7c814e7e677352068b9e9f77a68de080159
2020-02-04 14:20:00 -08:00
67706187fb Fix a broken link in contribution_guide.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30814

Differential Revision: D19697403

Pulled By: ezyang

fbshipit-source-id: b01fd0e189b3bc7ccaa197c9c64e12fee70a6310
2020-02-04 14:14:25 -08:00
b69c685c4a try to find cudnn header in /usr/include/cuda (#31755)
Summary:
With the Fedora negativo17 repo, the cuDNN headers are installed in the /usr/include/cuda directory, alongside the other CUDA libraries.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31755

Differential Revision: D19697262

Pulled By: ezyang

fbshipit-source-id: be80d3467ffb90fd677d551f4403aea65a2ef5b3
2020-02-04 14:10:32 -08:00
e999095594 Updating submodules
Summary:
GitHub commits:

8f3d7019bb
a5df50cf5c
b896a52075
3a073234da
7c05bee055
90f0aa9665
5cdd1abbb9

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 70dd062814f68bda77e119bb9deaefbf71c551e6
2020-02-04 13:00:26 -08:00
d3fa68eeec Fix for MKL detection script on Windows (#32970)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32914.
1. Use `DEFINED ENV{MKLProductDir}` instead of `$ENV{MKLProductDir}`
2. Cache `INTEL_COMPILER_DIR` and `INTEL_MKL_DIR`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32970

Differential Revision: D19727677

Pulled By: soumith

fbshipit-source-id: 065c6bee35a2295f1c478df1460cad7668b25af5
2020-02-04 12:41:39 -08:00
e922826dda [pytorch] simplify lazy initialization of DefaultCPUGenerator singleton (#32897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32897

Moving the default static instance into the method to achieve the same purpose.
ghstack-source-id: 97570792

Test Plan: - CI

Reviewed By: dreiss

Differential Revision: D19674566

fbshipit-source-id: 27f54da66dd7667c34905eddaac6579e64aa1118
2020-02-04 11:37:14 -08:00
aa3c871739 Adds TestViewOps, updates documentation (#32512)
Summary:
Understanding which ops return views and which return tensors with new storage is a common user issue, and an issue for developers connecting accelerators to PyTorch, too. This generic test suite verifies that ops which should return views do (and a few ops that shouldn't don't).  The documentation has also been updated for .t(), permute(), unfold(), and select() to clarify they return views.
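
For instance, the view relationship can be observed by mutating through the result (a small illustrative snippet, not taken from the test suite):

```python
import torch

base = torch.zeros(2, 3)
t = base.t()        # .t() returns a view: same storage, different strides
t[0, 0] = 1.0
print(base[0, 0])   # tensor(1.) -- the mutation is visible through base

c = base.clone()    # clone() allocates new storage
c[0, 1] = 2.0
print(base[0, 1])   # tensor(0.) -- base is unchanged
```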
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32512

Differential Revision: D19659454

Pulled By: mruberry

fbshipit-source-id: b4334be9b698253a979e1bb8746fdb3ca24aa4e3
2020-02-04 11:10:34 -08:00
341fb6d11d Make caffe2/caffe2/python/models/seq2seq python3 compatible
Test Plan: waitforsandcastle

Reviewed By: dzhulgakov

Differential Revision: D19698403

fbshipit-source-id: 36b73e07e598c848abbe368e522484da9ba4c78f
2020-02-04 10:51:47 -08:00
Jie
9e7c47644f [NHWC CUDNN CONV]Update cudnn convolution memory_format behavior (#32482)
Summary:
1. Allows both the memory_format of weight & input to dictate the output
memory_format.
2. Provides a utility function to recursively convert the memory_format of Conv2d and
ConvTranspose2d layers. This allows easy model conversion and ensures that a
memory_format lost through incompatible layers can be restored at a Convolution-like
layer, where a significant performance boost is expected on later-generation CUDA
devices.
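
A sketch of what this enables (the `torch.channels_last` memory-format API is real; whether the printed check is True depends on backend support for the layout):

```python
import torch
import torch.nn as nn

# Convert a conv layer and its input to channels-last (NHWC) layout.
model = nn.Conv2d(8, 16, kernel_size=3).to(memory_format=torch.channels_last)
x = torch.randn(2, 8, 32, 32).contiguous(memory_format=torch.channels_last)

y = model(x)
# With this change, the memory format of input/weight dictates the output.
print(y.is_contiguous(memory_format=torch.channels_last))
```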
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32482

Differential Revision: D19647903

Pulled By: VitalyFedyunin

fbshipit-source-id: 62c96ff6208ff5e84fae1f55b63af9a010ad199a
2020-02-04 09:50:57 -08:00
ec2c974bd5 Simplify some TH codegen by moving code out of the switch and killing dead code. (#32888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32888

This kills ~1500 lines of generated code by doing the following:
1) Stop binding _th_clone, which isn't used anymore.

2) Move allocation code out of the switch, because it doesn't need to be there, example:
Now:
```
auto dispatch_scalar_type = infer_scalar_type(self);
auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(c10::Storage(scalarTypeToTypeMeta(dispatch_scalar_type), 0, allocator(), true),DispatchKey::CPUTensorId).release();
auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
switch (dispatch_scalar_type) {
    case ScalarType::Bool: {
        ...
    case ScalarType::Byte: {
	    ...
```
Before:
```
auto dispatch_scalar_type = infer_scalar_type(self);
switch(dispatch_scalar_type) {
    case ScalarType::Bool: {
       	auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(caffe2::TypeMeta::Make<bool>(), 0, allocator(), true),DispatchKey::CPUTensorId).release();
        auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
    case ScalarType::Byte: {
        auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(caffe2::TypeMeta::Make<byte>(), 0, allocator(), true),DispatchKey::CPUTensorId).release();
        auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
```

Note there's one extra lookup from ScalarType -> TypeMeta, but that can go away once we are able to put everything in a dispatch macro.

3) Prepare for more moves out of the switch by using dispatch_scalar_type where we would have used an explicit ScalarType::Name
More moves are currently blocked by "real" types needing to map scalar_type -> C++ type.  Dispatch macros can solve that, but I'll need to wrap the actual TH calls in templates so the entire
thing can be done via dispatch.

4) Kill some codegen that isn't used anymore: ALLOC_WRAP, is_actual_return_long.

Test Plan: Imported from OSS

Differential Revision: D19672613

Pulled By: gchanan

fbshipit-source-id: 753f480842d11757e10182e43b471bd3abaa5446
2020-02-04 08:41:20 -08:00
820410b505 Added upsample_neartest2d op for lite interpreter. (#32913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32913

This enables mobile detection and tracking models.

Test Plan: buck test caffe2/test/cpp/jit:jit -- JitTest.LiteInterpreterUpsampleNearest2d

Reviewed By: iseeyuan

Differential Revision: D19664502

fbshipit-source-id: 1c7270dcf394aba7b510c5aa80552c58a5038f24
2020-02-04 07:59:03 -08:00
b894dc06de [Pytorch] Propagate errors in clearAndWaitForOutstandingRpcsAsync. (#32952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32952

When the Async() version of clearAndWaitForOutstandingRpcs() was written,
we didn't yet have the generic Future<T> class, and hadn't worked out our
error model fully.

This change fixes that method to properly propagate the first encountered error
to the future, using a bool+CAS.
ghstack-source-id: 97665749

Test Plan: existing test coverage, buck test mode/dev-nosan caffe2/test/...

Differential Revision: D19710337

fbshipit-source-id: 66ce5593a94a16ea624930dbb9409917ef5cfd5d
2020-02-03 20:47:51 -08:00
b4b1b100bd Add a loop test for onnxified net (#32935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32935

Mock away the content of the onnxified net with some low-cost ops so that we can still mimic the input/output transfer while doing minimal work on the card.

Test Plan:
```
buck run glow/fb/test:sparsenn_test -- --gtest_filter='SparseNNTest.vanillaC2' --onnxifi_debug_mode --onnxifi_loop_test_mode --nocaffe2_predictor_use_memonger
```

Differential Revision: D19631971

fbshipit-source-id: f970c55ccb410702f479255eeb750e01e3f8c2ae
2020-02-03 18:35:41 -08:00
df71b3e23a properly update _flat_weights in RNN modules (#32939)
Summary:
Should hopefully fix https://github.com/pytorch/pytorch/issues/32346. Now, when the _flat_weights list is updated, `None` elements are appended to it if some weights are missing; subsequent `setattr` calls for the missing weights then repair _flat_weights and make it suitable for use in the backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32939

Differential Revision: D19710990

Pulled By: ngimel

fbshipit-source-id: c978c7519464e94beeffa9bc33b9172854a2f298
2020-02-03 18:27:00 -08:00
3cac9900ca Clarify when softplus is reverted to linear. (#32945)
Summary:
The default value is removed because it is explained right below.
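
For reference, the reversion being documented (these are the real `torch.nn.functional.softplus` parameters; the values are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 30.0])
# For numerical stability, softplus reverts to the identity (linear)
# once input * beta exceeds threshold (defaults: beta=1, threshold=20).
print(F.softplus(x, beta=1, threshold=20))  # second entry is exactly 30.0
```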
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32945

Reviewed By: soumith

Differential Revision: D19706567

Pulled By: ailzhang

fbshipit-source-id: 1b7cc87991532f69b81aaae2451d944f70dda427
2020-02-03 17:54:31 -08:00
544eab37d0 Move deprecation warning out of generated code into python_arg_parser. (#32907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32907

All op-specific information used in this logic was available to the
parser itself, so the check can be done in that context, no codegen
needed.

No change in the warning behavior itself, modulo a minor formatting tweak;
passes existing tests. Saves roughly ~275K of binary size on Mac:
```
-rwxr-xr-x  1 bhosmer  1876110778   16502064 Feb  1 00:43 torch/lib/libtorch_python.dylib
-rwxr-xr-x  1 bhosmer  1876110778   16247888 Feb  1 00:44 torch/lib/libtorch_python.dylib
```

[codegen diff](https://github.com/bhosmer/scratch/compare/deprecation_warning_before...deprecation_warning_after)

More important than the size savings is the minimization of codegen. Ideally the generated artifact should express distinctive per-op properties in as minimal a form as practically possible - e.g. here instead of generating check-and-warn behavior into every binding, we generate only the data that triggers the behavior in the parser. (And actually we were generating it already.)

Test Plan: Imported from OSS

Differential Revision: D19679928

Pulled By: bhosmer

fbshipit-source-id: cf0140573118430720c6b797c762fe5be98acd86
2020-02-03 17:47:04 -08:00
612e621da0 Improve CHECK_OP macro (#29539)
Summary:
- Show values in question like glog.
- Handle expressions with logical operators properly by adding
  parentheses around expressions.
- Allow outputting nullptr (some build failed without this)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29539

Reviewed By: dreiss

Differential Revision: D19698991

Pulled By: ljk53

fbshipit-source-id: e329c01622cfc386ac009904092519a4adfe94a8
2020-02-03 17:27:41 -08:00
5ca7bf453d Tests for verifying behaviour of BatchNorm using 0-dim batch sizes. (#32384)
Summary:
The `BatchNorm*` part of the issue (see gh-12013) seems to have been fixed in the master branch, and these tests make that concrete.

However I would appreciate comments on https://github.com/pytorch/pytorch/issues/12013#issuecomment-575871264 on whether the current behaviour is satisfactory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32384

Differential Revision: D19704154

Pulled By: ngimel

fbshipit-source-id: 1bbbbf1ae1215a460b22cf26e6b263e518ecf60b
2020-02-03 16:58:23 -08:00
9c2ed2574a Vectorized memory access in TensorIterator GPU loop for 1d contiguous case (#32383)
Summary:
Step 2 of https://github.com/pytorch/pytorch/issues/31975

Vectorized memory access is enabled. Generated code: https://github.com/zasdfgbnm/things/blob/master/2020Q1/disassembly-elementwise-vec.ipynb

```
void at::native::modern::elementwise_kernel<4, 64, 4, at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar)::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda(float, float)#1}, at::detail::Array<char*, 3> >(int, at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar)::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda(float, float)#1}, at::detail::Array<char*, 3>)

**ASM:**

	.section	.text._ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,"ax",progbits
	.sectioninfo	@"SHI_REGISTERS=20"
	.align	128
        .global         _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_
        .type           _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,function
        .size           _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,(.L_40898 - _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_)
        .other          _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,@"STO_CUDA_ENTRY STV_DEFAULT"
_ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_:
.text._ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_:
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 294
        /*0000*/                   IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
        /*0020*/                   S2R R9, SR_CTAID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 177
        /*0030*/                   S2R R0, SR_TID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 294
        /*0040*/                   IMAD.SHL.U32 R9, R9, 0x100, RZ ;
        /*0050*/                   IADD3 R5, -R9, c[0x0][0x160], RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 256
        /*0060*/                   SHF.R.S32.HI R17, RZ, 0x1f, R9 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 296
        /*0070*/                   ISETP.GE.AND P0, PT, R5, 0x100, PT ;
        /*0080*/              @!P0 BRA `(.L_3173) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 256
        /*0090*/                   IMAD.SHL.U32 R12, R9.reuse, 0x4, RZ ;
        /*00a0*/                   SHF.L.U64.HI R17, R9, 0x2, R17 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 260
        /*00b0*/                   IADD3 R8, P0, R12.reuse, c[0x0][0x188], RZ ;
        /*00c0*/                   IADD3 R2, P1, R12, c[0x0][0x190], RZ ;
        /*00d0*/                   IADD3.X R9, R17.reuse, c[0x0][0x18c], RZ, P0, !PT ;
        /*00e0*/                   IADD3.X R3, R17, c[0x0][0x194], RZ, P1, !PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 218
        /*00f0*/                   IMAD.WIDE R8, R0, 0x10, R8 ;
        /*0100*/                   IMAD.WIDE R2, R0, 0x10, R2 ;
        /*0110*/                   LDG.E.128.SYS R8, [R8] ;
        /*0120*/                   LDG.E.128.SYS R4, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 256
        /*0130*/                   IADD3 R12, P0, R12, c[0x0][0x180], RZ ;
        /*0140*/                   IADD3.X R13, R17, c[0x0][0x184], RZ, P0, !PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 238
        /*0150*/                   IMAD.WIDE R12, R0, 0x10, R12 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 196
        /*0160*/                   FFMA R7, R7, c[0x0][0x168], R11 ;
        /*0170*/                   FFMA R6, R6, c[0x0][0x168], R10 ;
        /*0180*/                   FFMA R5, R5, c[0x0][0x168], R9 ;
        /*0190*/                   FFMA R4, R4, c[0x0][0x168], R8 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 238
        /*01a0*/                   STG.E.128.SYS [R12], R4 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 301
        /*01b0*/                   EXIT ;
.L_3173:
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 180
        /*01c0*/                   ISETP.GE.AND P0, PT, R0, R5, PT ;
        /*01d0*/                   BMOV.32.CLEAR RZ, B0 ;
        /*01e0*/                   BSSY B0, `(.L_3174) ;
        /*01f0*/               P0 BRA `(.L_3175) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 183
        /*0200*/                   IADD3 R3, P1, R9, R0, RZ ;
        /*0210*/                   LEA.HI.X.SX32 R4, R0, R17, 0x1, P1 ;
        /*0220*/                   LEA R2, P1, R3, c[0x0][0x188], 0x2 ;
        /*0230*/                   LEA.HI.X R3, R3, c[0x0][0x18c], R4, 0x2, P1 ;
        /*0240*/                   LDG.E.SYS R8, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 184
        /*0250*/                   IADD3 R4, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 180
        /*0260*/                   ISETP.GE.AND P1, PT, R4, R5, PT ;
        /*0270*/               P1 BRA `(.L_3175) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 183
        /*0280*/                   LDG.E.SYS R4, [R2+0x100] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 184
        /*0290*/                   IADD3 R6, R0, 0x80, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 180
        /*02a0*/                   ISETP.GE.AND P1, PT, R6, R5, PT ;
        /*02b0*/               P1 BRA `(.L_3175) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 184
        /*02c0*/                   IADD3 R10, R0, 0xc0, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 183
        /*02d0*/                   LDG.E.SYS R7, [R2+0x200] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 180
        /*02e0*/                   ISETP.GE.AND P1, PT, R10, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 183
        /*02f0*/              @!P1 LDG.E.SYS R6, [R2+0x300] ;
.L_3175:
        /*0300*/                   BSYNC B0 ;
.L_3174:
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 180
        /*0310*/                   BMOV.32.CLEAR RZ, B0 ;
        /*0320*/                   BSSY B0, `(.L_3176) ;
        /*0330*/               P0 BRA `(.L_3177) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 183
        /*0340*/                   IADD3 R3, P1, R9, R0, RZ ;
        /*0350*/                   LEA.HI.X.SX32 R10, R0, R17, 0x1, P1 ;
        /*0360*/                   LEA R2, P1, R3, c[0x0][0x190], 0x2 ;
        /*0370*/                   LEA.HI.X R3, R3, c[0x0][0x194], R10, 0x2, P1 ;
        /*0380*/                   LDG.E.SYS R11, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 184
        /*0390*/                   IADD3 R10, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 180
        /*03a0*/                   ISETP.GE.AND P1, PT, R10, R5, PT ;
        /*03b0*/               P1 BRA `(.L_3177) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 183
        /*03c0*/                   LDG.E.SYS R13, [R2+0x100] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 184
        /*03d0*/                   IADD3 R10, R0, 0x80, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 180
        /*03e0*/                   ISETP.GE.AND P1, PT, R10, R5, PT ;
        /*03f0*/               P1 BRA `(.L_3177) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 184
        /*0400*/                   IADD3 R10, R0, 0xc0, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 180
        /*0410*/                   ISETP.GE.AND P1, PT, R10, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 183
        /*0420*/                   LDG.E.SYS R10, [R2+0x200] ;
        /*0430*/              @!P1 LDG.E.SYS R15, [R2+0x300] ;
.L_3177:
        /*0440*/                   BSYNC B0 ;
.L_3176:
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 193
        /*0450*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 196
        /*0460*/                   IADD3 R9, P0, R9, R0, RZ ;
        /*0470*/                   FFMA R11, R11, c[0x0][0x168], R8 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 197
        /*0480*/                   IADD3 R14, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 196
        /*0490*/                   LEA.HI.X.SX32 R12, R0, R17, 0x1, P0 ;
        /*04a0*/                   LEA R2, P0, R9.reuse, c[0x0][0x180], 0x2 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 193
        /*04b0*/                   ISETP.GE.AND P1, PT, R14, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 196
        /*04c0*/                   LEA.HI.X R3, R9, c[0x0][0x184], R12, 0x2, P0 ;
        /*04d0*/                   STG.E.SYS [R2], R11 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 193
        /*04e0*/               P1 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 197
        /*04f0*/                   IADD3 R8, R0, 0x80, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 196
        /*0500*/                   FFMA R13, R13, c[0x0][0x168], R4 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 193
        /*0510*/                   ISETP.GE.AND P0, PT, R8, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 196
        /*0520*/                   STG.E.SYS [R2+0x100], R13 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 193
        /*0530*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 197
        /*0540*/                   IADD3 R0, R0, 0xc0, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 196
        /*0550*/                   FFMA R7, R10, c[0x0][0x168], R7 ;
        /*0560*/                   FFMA R15, R15, c[0x0][0x168], R6 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 193
        /*0570*/                   ISETP.GE.AND P0, PT, R0, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 196
        /*0580*/                   STG.E.SYS [R2+0x200], R7 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 193
        /*0590*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 196
        /*05a0*/                   STG.E.SYS [R2+0x300], R15 ;
        /*05b0*/                   EXIT ;
.L_3178:
        /*05c0*/                   BRA `(.L_3178);
        /*05d0*/                   NOP;
        /*05e0*/                   NOP;
        /*05f0*/                   NOP;
.L_40898:
```

We can clearly see the `LDG.E.128` in it, which is a result of vectorization.

Benchmark: https://github.com/zasdfgbnm/things/blob/master/2020Q1/benchmark-vec.ipynb

Benchmark on P100, dtype `uint8`:

before:
```
1.4.0a0+a5b4d78
e1d97025eeeddcf083e9bee0c8f6a53168991a71
22.2 µs ± 89.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
34.7 µs ± 38.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
52 µs ± 312 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
86.9 µs ± 135 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
154 µs ± 204 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
291 µs ± 668 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
566 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.18 ms ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.29 ms ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.4 ms ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

after:
```
1.4.0a0+a5b4d78
1281cdfd8188fe86241ecaf71d001809d016c3a3
24 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
30.5 µs ± 355 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
43.1 µs ± 300 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
67.6 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
116 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
215 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
413 µs ± 791 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
824 µs ± 891 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.63 ms ± 478 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.19 ms ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Benchmark on P100, dtype `half`:

Before:
```
1.4.0a0+a5b4d78
1c017f0c14c91bd5125ab387a90441b0c0e2f3ad
30.8 µs ± 226 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
43.4 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
69.1 µs ± 83 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
119 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
224 µs ± 99.1 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
418 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
865 µs ± 237 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.69 ms ± 695 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.3 ms ± 527 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.77 ms ± 741 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After

```
1.4.0a0+a5b4d78
7e50ee27333e7047072d328d03767b4845286356
28.9 µs ± 61.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
40.2 µs ± 244 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
63.8 µs ± 350 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
109 µs ± 196 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
199 µs ± 157 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
380 µs ± 446 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
743 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.47 ms ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.91 ms ± 9.17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.8 ms ± 296 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

cc: csarofeen ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32383

Differential Revision: D19697455

Pulled By: ngimel

fbshipit-source-id: 0707481c2f334e6634c000b4afd275b2fee8fbe1
2020-02-03 16:20:40 -08:00
4baadd54d7 add SpatialBN lowered fake fp16
Summary:
SpatialBNFakeLoweredFp16NNPI

this is the fake operator for SpatialBN that gets lowered into add/mul/div, etc.

Test Plan: test_spatialbn

Reviewed By: tracelogfb, amylittleyang

Differential Revision: D19658680

fbshipit-source-id: 2abddbcd9a2023ac75c494f20eaac2051b7139dc
2020-02-03 15:03:34 -08:00
5c019fede3 [ONNX] Fix for constant folding flaky tests (#32546)
Summary:
Fix for constant folding flaky tests
Looks like the constant folding test modules are sometimes exported with ONNX_ATEN op export type, which is causing the CI failures.
I'm unable to repro this issue locally, but my guess is that the op export param is being overwritten in the CI build at some point.
This PR sets the op export type and hopefully fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32546

Reviewed By: hl475

Differential Revision: D19606919

Pulled By: houseroad

fbshipit-source-id: 31793d6857bbbf99b43b4a7c22a045a56ae19e44
2020-02-03 14:23:50 -08:00
a751ddaaa5 Use leaky singletons for torch.distributed. (#32923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32923

As per
https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 and
https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use-members, we
should be using leaky singletons to avoid static initialization order problem.

Closes https://github.com/pytorch/pytorch/issues/27412
ghstack-source-id: 97601384

Test Plan: waitforbuildbot

Differential Revision: D19688986

fbshipit-source-id: 8c1935fb7da8a7116dbca55eb43dc04bc02695ac
2020-02-03 14:15:18 -08:00
6996f8d880 Add missing default_collate in dataloader.pyi
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28935

Differential Revision: D19698781

Pulled By: ezyang

fbshipit-source-id: abdd735c98656ed16cd326529441d1fcec2ace3e
2020-02-03 14:01:49 -08:00
1c42b9466b [ONNX] Update support of exporting bool type index mask (#32445)
Summary:
e.g. `tensor[torch.tensor([0, 1, 0], dtype=torch.bool)]`
Previously the mask is of type uint8. Both uint8 and bool should be supported for export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32445

Reviewed By: hl475

Differential Revision: D19610713

Pulled By: houseroad

fbshipit-source-id: 8df636e0c3cb0b82919a689242a962c79220209c
2020-02-03 13:01:14 -08:00
e03e4f3a2d [ONNX] Add einsum export (#32716)
Summary:
Adding symbolic for onnx einsum as part of opset 12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32716

Reviewed By: hl475

Differential Revision: D19626168

Pulled By: houseroad

fbshipit-source-id: d8cc8af5f05f36aca3cd55dead602261ccdfec51
2020-02-03 12:56:50 -08:00
167a892e99 Add missing shuffle attribute to DistributedSampler typing file
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28763

Differential Revision: D19698808

Pulled By: ezyang

fbshipit-source-id: 7820acd7b0715ebf1d9ae954dca0058b6759075e
2020-02-03 12:02:58 -08:00
48eff08256 Fix the level of headers in pytorch/CONTRIBUTING.md (#28412)
Summary:
**Running Clang-Tidy**, **Pre-commit Tidy/Linting Hook**, and **Building PyTorch with ASAN** shouldn't belong under **Windows development tips**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28412

Differential Revision: D19700228

Pulled By: ezyang

fbshipit-source-id: 39d999c68e4bd9264f4ae1fdab517871c883a663
2020-02-03 11:50:25 -08:00
14c15eb3b0 Py2 -> py3 for caffe2/caffe2/contrib/tensorboard (#32882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32882

Update tensorboard binary and unit tests to python 3

Test Plan:
```
> buck test //caffe2/caffe2/contrib/tensorboard:tensorboard_test
```
```
> buck test //caffe2/caffe2/contrib/tensorboard:tensorboard_exporter_test
```

Reviewed By: sanekmelnikov

Differential Revision: D19670873

fbshipit-source-id: f5eb65ccbb4ecfdc801b9fa05a60d4c5c29dc428
2020-02-03 11:36:35 -08:00
00c6b90327 Fix in documentation of convolutional modules (#30079)
Summary:
I noticed the description of the initialization of convolutional modules is inconsistent with the actual implementation. There are two such cases:

1) `k` in the initialization of ConvTranspose modules is not dependent on the input channels but on the output channels (`kaiming_uniform_` uses the size of the second dimension of `weight` which is transposed in the first two dimensions).

2) Both the normal convolutions and the transposed ones use `k` divided by `groups`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30079

Differential Revision: D19698511

Pulled By: ezyang

fbshipit-source-id: 1ba938fbbd97663eaf29fd1245872179d2761fff
2020-02-03 11:22:36 -08:00
37953d92d1 raise when jit-load.ing a folder (#27836)
Summary:
Very similar to https://github.com/pytorch/pytorch/issues/16267 but handling directories.

Stoked to contribute!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27836

Differential Revision: D19698398

Pulled By: ezyang

fbshipit-source-id: eabc3a44d258124f860babb47ab91e22c2c3d6cc
2020-02-03 11:19:57 -08:00
3fa907c145 [docs] Fix argument type of torch.masked_select (#30385)
Summary:
This should be `BoolTensor`
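
For reference (illustrative values):

```python
import torch

x = torch.tensor([1, 2, 3])
mask = torch.tensor([True, False, True])  # a BoolTensor, as the docs now say
print(torch.masked_select(x, mask))       # tensor([1, 3])
```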
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30385

Differential Revision: D19698414

Pulled By: ezyang

fbshipit-source-id: 68f1e10eb9d4b99552bb158f6ad7e6ff0f7cc1c4
2020-02-03 11:15:11 -08:00
10183061eb [ONNX] Update ONNX landing page since 1.3 (#32805)
Summary:
* New ops supported for exporting.
* Updates on support for tensor indexing and dynamic list of tensors.
* lara-hdr, spandantiwari Should we also include updates on torchvision support in this page?

cc houseroad, neginraoof Please review if I have missed anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32805

Reviewed By: hl475

Differential Revision: D19635699

Pulled By: houseroad

fbshipit-source-id: b6be4fce641f852dcbceed20b4433f4037d8024a
2020-02-03 10:38:29 -08:00
ef50161ec9 [JIT] Update OVERVIEW.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28870

Differential Revision: D19698758

Pulled By: ezyang

fbshipit-source-id: 23167ec5bf9f7ab81012a124206bb4c2bdd6ca06
2020-02-03 10:32:36 -08:00
7cddc302e5 min, max: check that operand and outputs are on the same device type (#32862)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32001
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32862

Differential Revision: D19695935

Pulled By: ezyang

fbshipit-source-id: bb37eb7a187214aa69259828024366f479a258d7
2020-02-03 10:16:22 -08:00
b34e0dda24 Emit the C++ version when compiling pytorch from source. (#32819)
Summary:
This is needed because sometimes a build-script change to the `std=c++XX` flag does not get caught until compilation has progressed for a while.

https://github.com/pytorch/pytorch/issues/31757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32819

Differential Revision: D19697205

Pulled By: ezyang

fbshipit-source-id: b045a1d15e24c4c6007b5d1464756051d32bf911
2020-02-03 10:12:03 -08:00
c841ab403c add missing method annotations to torch.Tensor (#30576)
Summary:
Looks like some of the tensor methods defined in https://github.com/pytorch/pytorch/blob/master/torch/tensor.py#L393 were missing.

Also add missing self object to `map_`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30576

Differential Revision: D19698355

Pulled By: ezyang

fbshipit-source-id: 6df99f17d5de11715dbe89aecb292612405c08ac
2020-02-03 09:59:14 -08:00
e085c55e53 Fix \\ warnings/errors when building optim documentation (#32911)
Summary:
This PR fixes the warnings and errors attributed to the use of `\\` outside of a proper environment. While rendered correctly in the documentation, it produces the warning
```
LaTeX-incompatible input and strict mode is set to 'warn': In LaTeX, \\ or \newline does nothing in display mode [newLineInDisplayMode]
```
on the CI tools and errors with
```
ParseError: KaTeX parse error: Expected 'EOF', got '\\' at position (x): ...
```
when not set to warn.

This PR also makes minor formatting adjustments. The `CosineAnnealingLR` documentation has been adjusted to remove an unnecessarily large fraction and to improve spacing. The `SGD` documentation has been adjusted so that variables are consistently typeset and so that it follows the convention of punctuating equations. I attached images of the current documentation, the new documentation and a marked version to highlight differences.

* SGD:
New: ![new_sgd](https://user-images.githubusercontent.com/53704971/73596383-98795500-44d6-11ea-97ce-bac02a0a1638.png)
Current: ![current_sgd](https://user-images.githubusercontent.com/53704971/73596384-98795500-44d6-11ea-86d3-b407cebbb513.png)
Marked new: ![marked_sgd](https://user-images.githubusercontent.com/53704971/73596385-98795500-44d6-11ea-9e06-9ac5e5e27270.png)

* CosineAnnealingLR:
New: ![new_calr](https://user-images.githubusercontent.com/53704971/73596382-98795500-44d6-11ea-9c90-02406d297bae.png)
Current: ![current_calr](https://user-images.githubusercontent.com/53704971/73596387-9911eb80-44d6-11ea-93fb-ee72d695312a.png)
Marked new: ![marked_calr](https://user-images.githubusercontent.com/53704971/73596386-9911eb80-44d6-11ea-91a6-ed7a62b4e255.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32911

Differential Revision: D19697114

Pulled By: ezyang

fbshipit-source-id: 567304bd4adcfa4086eae497cb818cf74375fe5d
2020-02-03 09:54:38 -08:00
7101f6b5c0 Properly handle NaN in binary max and min (#32541)
Summary:
The output depends asymmetrically on whether the first or the second
argument is NaN. See https://github.com/pytorch/pytorch/issues/25016 for detail of the issue.

This is part of a continuing effort that was dropped in https://github.com/pytorch/pytorch/issues/30851

The failure in https://github.com/pytorch/pytorch/issues/27185 is resolved by explicitly casting a half type number to float when applying `isnan`.

Close https://github.com/pytorch/pytorch/issues/25016
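
After this change, NaN should propagate regardless of argument order (a small illustrative check):

```python
import torch

nan = torch.tensor([float("nan")])
one = torch.tensor([1.0])
# Both orders should now produce NaN rather than depending on position.
print(torch.max(nan, one), torch.max(one, nan))
```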
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32541

Differential Revision: D19644643

Pulled By: VitalyFedyunin

fbshipit-source-id: 8d49e6ed5a9996a817df7a9419dc5eee601430bc
2020-02-03 09:04:39 -08:00
e87887ccb4 Update type hints for torch.optim.optimizer.Optimizer (#32900)
Summary:
This PR fixes type hints for `torch.optim.optimizer.Optimizer` object, issue also reported in https://github.com/pytorch/pytorch/issues/23731

To test things I used following optimiser implementation, that is fully covered with type hints:

```python
from typing import Optional, Callable, Union, Iterable

from torch import Tensor
from torch.optim.optimizer import Optimizer

OptClosure = Optional[Callable[[], float]]
_params_t = Union[Iterable[Tensor], Iterable[dict]]

class SGD(Optimizer):
    def __init__(self, params: _params_t, lr: float = 0.1) -> None:
        defaults = dict(lr=lr)
        super(SGD, self).__init__(params, defaults)

    def __setstate__(self, state: dict) -> None:
        super(SGD, self).__setstate__(state)

    def step(self, closure: OptClosure = None) -> Optional[float]:
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad.data
                p.data.add_(-group['lr'], d_p)
        return loss
```

Without fix `mypy` reports bunch of inconsistencies in types and missing properties:

```bash
$ mypy  torch_optimizer/sgd.py
torch_optimizer/sgd.py:14: error: Too many arguments for "__init__" of "Optimizer"
torch_optimizer/sgd.py:17: error: "__setstate__" undefined in superclass
torch_optimizer/sgd.py:19: error: Return type "Optional[float]" of "step" incompatible with return type "None" in supertype "Optimizer"
torch_optimizer/sgd.py:24: error: "SGD" has no attribute "param_groups"
Found 4 errors in 1 file (checked 1 source file)
```

with fix not issues:
```bash
$ mypy  torch_optimizer/sgd.py
Success: no issues found in 1 source file
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32900

Differential Revision: D19697175

Pulled By: ezyang

fbshipit-source-id: d5e2b3c421f69da3df8c32b3d53b4b6d15d61a41
2020-02-03 09:00:01 -08:00
29e6f13cd1 Enable MKL on MacOS if installed (#32905)
Summary:
Fix cmake script that missed MKL directories

Signed-off-by: caozhong <zhong.z.cao@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32905

Differential Revision: D19688496

Pulled By: ezyang

fbshipit-source-id: d04a608eea5f983e153a48b0b1eb0390aebbe6c0
2020-02-02 14:57:43 -08:00
f8dd65f2a1 Updating submodules
Summary:
GitHub commits:

e384ddc186

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 18d4371821439388a6b546a1953c31856c80ec85
2020-02-02 14:56:10 -08:00
ff0ba563d5 Updating submodules
Summary:
GitHub commits:

6eb4ee98ba

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 74dda0be26516756cd4d4d2df2167392fc48074a
2020-02-02 12:22:16 -08:00
71ad88199a Clarify the searched string is displayed in the error message
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32789

Differential Revision: D19646635

Pulled By: suo

fbshipit-source-id: 18233fee7c75f7da2a1826fb66f78a519e6d9c77
2020-02-01 17:24:37 -08:00
b564eaf7a8 Bug fixes: torch::tensor(floating-point values) -> default dtype, and torch::tensor(integer values) -> at::kLong (#32367)
Summary:
Some of the `torch::tensor` behavior is updated to better match Python API. Fixes https://github.com/pytorch/pytorch/issues/32234.

This PR is BC-breaking in the following way:
- `torch::tensor({1.0f, 2.0f})`: float -> default dtype
- `torch::tensor(at::ArrayRef<int>({1, 2, 3}))`: int -> at::kLong
- `torch::tensor(std::vector<int>({1, 2, 3}))`: int -> at::kLong
- `torch::tensor(at::ArrayRef<float>({1.f, 2.f, 3.f}))`: float -> default dtype
- `torch::tensor(std::vector<float>({1.f, 2.f, 3.f}))`: float -> default dtype
- `torch::tensor(at::ArrayRef<double>({1., 2., 3.}))`: double -> default dtype
- `torch::tensor(std::vector<double>({1., 2., 3.}))`: double -> default dtype
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32367

Differential Revision: D19498484

Pulled By: yf225

fbshipit-source-id: 19c8dc2a56476266153cff4c404e7f84d309eb12
2020-02-01 15:00:07 -08:00
4cc6e6bbbe Adding scalar to the c10 registration type check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32886

Test Plan: Imported from OSS

Differential Revision: D19673484

Pulled By: z-a-f

fbshipit-source-id: ea8478a4fe6788dcb044ec1ab7d51dc50ab3fa60
2020-02-01 13:15:50 -08:00
ce07fb26c0 Updating submodules
Summary:
GitHub commits:

3f4acb24bb
930ea23548
c0c5daf3db

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 878178c5412375d74e7f64d7e4142f57ddbc931f
2020-02-01 13:14:30 -08:00
c83f984906 Updating submodules
Summary:
GitHub commits:

5adba3596a
d8b4f2ff66
daa254211a
9c4684ff10
fdb82b21cb

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 4e74f7e888cc2004ba937d3bb253645fbd2388c5
2020-01-31 23:24:51 -08:00
040bc1d0e1 [JIT] make is_scripting a condvalue (#32871)
Summary:
Add `torch.jit.is_scripting` to the list of CondValues, i.e. values that, when used as the condition of an if statement, cause only one side of the if to be compiled (see the sketch after the list below). I'm not sure if we actually want this PR.

Pros:
- Makes it easier to add features that are not yet supported in TorchScript (like has_torch_function)
- The current idiom of writing `torch.jit.is_scripting` and factoring out the block to a function annotated with `torch.jit.ignore` is functionally equivalent and much more cumbersome

Cons:
- Makes it easier to add features that are not yet supported in TorchScript
- Perhaps it is confusing to a reader what is being compiled. We could potentially give it an all-caps name, or otherwise rename it, to make it stand out visually.
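
A minimal sketch of the new idiom (assuming this lands; `torch.jit.is_scripting` and `torch.jit.script` are real APIs, while the helper below is illustrative):

```python
import torch

def eager_only_helper(x):
    # stand-in for code TorchScript cannot compile (e.g. numpy interop)
    return torch.as_tensor(x.numpy().sum())

@torch.jit.script
def f(x: torch.Tensor) -> torch.Tensor:
    if torch.jit.is_scripting():
        return x.sum()               # only this branch is compiled
    else:
        return eager_only_helper(x)  # pruned at script-compile time

print(f(torch.ones(3)))  # tensor(3.) via the scripted branch
```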
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32871

Differential Revision: D19670383

Pulled By: eellison

fbshipit-source-id: 5257b0bd23c66f199d59a7f2c911e948301e5588
2020-01-31 18:23:42 -08:00
4d7ab255d3 [PyTorch][TorchScript] Add support for join on List of strings in TorchScript (#32847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32847

Add support for join on List of strings in TorchScript.
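
A minimal example of what this enables (illustrative):

```python
from typing import List

import torch

@torch.jit.script
def join_names(names: List[str]) -> str:
    return ", ".join(names)

print(join_names(["alpha", "beta"]))  # alpha, beta
```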

Test Plan:
(pytorch) smummadi@smummadi-mbp pytorch % python test/test_jit_string.py
Fail to import hypothesis in common_utils, tests are not derandomized
.
Ran 1 test in 1.090s
OK

Differential Revision: D19650809

fbshipit-source-id: 387a8f0e3cc3111fd3dadd3d54c90fc8c7774cf9
2020-01-31 18:20:38 -08:00
144eb59756 [rpc] don't crash callee when function does not exist on it, instead return Exception (#32726)
Summary:
Closes https://github.com/pytorch/pytorch/issues/27368.
Previously, if a function `func` did not exist on worker A but existed on worker B, and the user on B ran `rpc.rpc_sync(A, func)`, A would crash with a segmentation fault since it was not able to find the function, and B would eventually time out since RPCs time out in 60s by default.

At the root this comes from an unhandled exception when trying to deserialize the `PythonUDF` to run.

This PR makes it so that we can recover from this error, and A reports back a `RemoteException` to B indicating that the function was not found. Now, A will no longer crash and B can handle the exception appropriately and with more information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32726

Differential Revision: D19648825

Pulled By: rohan-varma

fbshipit-source-id: 53847f4bfb68187db41c61d69ddac13613e814b4
2020-01-31 18:02:12 -08:00
a8d39a7937 Updating submodules
Summary:
GitHub commits:

e0fd90427f
c892e21dc6
3cdc99f2b2
800d24ddc5
74326cdb3c
e4af160c09
6c2fb05f6d
a0555ecf37
e4122f77fc

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 9e3e0a7231c3e5cc0167cd935541dd7a8a4ea84d
2020-01-31 17:56:39 -08:00
4493b10500 [PyTorch] Gate out mobile operator logging observer.
Summary: Introduce separate gating for mobile operator logging observer.

Reviewed By: ljk53

Differential Revision: D19665993

fbshipit-source-id: b81a228c55110a02edb8c2b6f9fd02e750b2ad69
2020-01-31 17:25:53 -08:00
10bd21d550 [JIT] fix nested select assign (#32877)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/31902

```
self.sub.a = 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32877

Differential Revision: D19670322

Pulled By: eellison

fbshipit-source-id: 6d8f350b4d1169be1d2a56050fccd7c246ad9212
2020-01-31 16:58:26 -08:00
ad78c0f4fc Fixed the flaky test_rref_context_debug_info (#32749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32749

The test was flaky since the message from the owner RRef confirming the fork would arrive after the test checked whether the pending User RRefs map was empty, leading to an assertion error. This diff creates a utility function that should be used by any test to wait for this message to finish processing before making any assertions related to the pending User RRefs map.

GitHub Issue: https://github.com/pytorch/pytorch/issues/30988

Test Plan: Stress tested `test_rref_context_debug_info` 200 times.

Differential Revision: D19612289

fbshipit-source-id: 57a7c19b1cf792b94c263d3efbbbb6da60c07d07
2020-01-31 16:53:18 -08:00
d03c9aaa05 Fix upsampling test case on ppc (#32786)
Summary:
Power and x86 are giving slightly different results when scaling images up using `torch.nn.functional.interpolate` and when using OpenCV's `resize`. This is causing `test_upsampling_not_recompute_scale_factor` to fail on Power, but not x86. This changes the expected value to what OpenCV on Power produces if the test case is running on Power as well.

See https://github.com/pytorch/pytorch/issues/31915

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32786

Differential Revision: D19672053

Pulled By: ezyang

fbshipit-source-id: 3497f852bdc6d782646773792f9107c857c7b806
2020-01-31 16:40:56 -08:00
fe01376ffe [JIT] namedtuple constants (#32873)
Summary:
If a namedtuple with immutable constant inputs was also the input/output of a function which expected a namedtuple, it would fail. Fix by using the namedtuple constructor on serialization. (No one has run into this bug yet.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32873

Differential Revision: D19668807

Pulled By: eellison

fbshipit-source-id: bae33506e53b6a979b4e65a3e7c989b1408c98f4
2020-01-31 15:25:31 -08:00
fbe121e395 Quantized sigmoid function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31851

Test Plan: Imported from OSS

Differential Revision: D19280716

Pulled By: z-a-f

fbshipit-source-id: f47d37e32a675756fcaca293e2c14f90c43891de
2020-01-31 14:40:21 -08:00
7b65acdf9e Solves Issue #32750 - torch.prod now works fine with FP16 Input Tensor and FP32 Output Tensor (#32831)
Summary:
This PR solves Issue https://github.com/pytorch/pytorch/issues/32750.

- Changes the function `prod_kernel_impl` to use the `out_t` argument instead of `scalar_t` (which caused the garbage output for the FP16-input/FP32-output tensor combination).
- Adds a test case for `torch.prod` (for CUDA): tests both the `torch.prod` function and the `Tensor.prod` method, checking all dtype combinations of `torch.float16` and `torch.float32` (see the sketch below).
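
A small sketch of the previously broken combination (assumes a CUDA device is available):

```python
import torch

x = torch.ones(4, 4, dtype=torch.float16, device="cuda")
out = torch.empty(4, dtype=torch.float32, device="cuda")

# An FP16 input reduced into an FP32 `out` tensor used to produce garbage;
# after this fix it accumulates correctly in the output dtype.
torch.prod(x, dim=0, out=out)
print(out)  # tensor([1., 1., 1., 1.], device='cuda:0')
```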
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32831

Differential Revision: D19664666

Pulled By: ngimel

fbshipit-source-id: c275363355c832899f10325043535949cd12b2f8
2020-01-31 14:25:08 -08:00
8ddd5bb0e9 Don't serialize None values in observer (#32733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32733

Similar to https://github.com/pytorch/pytorch/pull/32318, we should stop serializing None values since they can't be broadcasted

Test Plan: Imported from OSS

Differential Revision: D19611586

Pulled By: jerryzh168

fbshipit-source-id: 369881de0567ed8eb25bdada892227f49bb5b29d
2020-01-31 13:28:43 -08:00
1760d5b83c Remove wrap_dim from codegen layer. (#32738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32738

This is to simplify the codegen layer, with the goal of making it simple enough to just check in.

Test Plan: Imported from OSS

Differential Revision: D19610927

Pulled By: gchanan

fbshipit-source-id: 760734f579b1f655775e6d270918c361985f3743
2020-01-31 13:13:35 -08:00
660a93c558 Code cleaning: Some iterating variables in builtin_functions.cpp can be const (#32852)
Summary:
To suppress a clang-tidy warning:

    torch/csrc/jit/script/builtin_functions.cpp#L89

    [performance-for-range-copy] warning: loop variable is copied but only
    used as const reference; consider making it a const reference

Also make the const qualifier of scalar explicit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32852

Differential Revision: D19663277

Pulled By: ezyang

fbshipit-source-id: f4ec5688d3cbea9a5f40db6063b7d111b0bf0cce
2020-01-31 12:55:20 -08:00
ada966b7d7 [pytorch] avoid thread_local std::vector<Call> for mobile build (#32849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32849

We learned that Android NDK's gcc + gnustl combination might produce a
use-after-free for thread_local variables with non-trivial destructors.

This PR removes such a thread_local use case from error_report.cpp for the mobile build,
which is the only such case included in the mobile lite-JIT build.
ghstack-source-id: 97491327

Test Plan: - CI

Reviewed By: dreiss

Differential Revision: D19652702

fbshipit-source-id: ee8d316ad5c6e6c8a8006eb25f3bba1618dd7e6d
2020-01-31 12:48:57 -08:00
d9e99ab544 Loops.cuh legacy code cleanup -- gpu_kernel_with_index (#32777)
Summary:
I didn't see any use case where the functor of `gpu_kernel_with_index` needs any argument other than the index. This has a merge conflict with https://github.com/pytorch/pytorch/pull/32755.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32777

Differential Revision: D19646381

Pulled By: ngimel

fbshipit-source-id: 81d2be74170457e39943274e3689845e83758bfa
2020-01-31 12:02:50 -08:00
fd3bd7777d Updating submodules
Summary:
GitHub commits:

01fc273e29
53222db222
dea724242e
3dd493b166
ec496347bc
03f4ec299e

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: e362b5df2099f1c3dd2ef7702d4bbd5bb85e4b27
2020-01-31 11:54:30 -08:00
b16dab8a41 Coding header is better specified in lowercase letters (#32850)
Summary:
The Python document <https://www.python.org/dev/peps/pep-0263/> gives
all examples using lowercase letters. Although it doesn't say
straightly, the following paragraph seems to indicate that uppercase
letters aren't legitimate:

> If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'.  Any other encoding will cause an error.

My Emacs also complains about the uppercase letters every time I save
the file.
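
That is, the PEP 263 header should be spelled in lowercase:

```python
# -*- coding: utf-8 -*-
```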
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32850

Differential Revision: D19663281

Pulled By: ezyang

fbshipit-source-id: 48127d3c2fd6e22dd732a2766913735136ec2ebc
2020-01-31 10:02:30 -08:00
22466552e3 Updating submodules
Summary:
GitHub commits:

edc4a4f551
72c7112964
62c8286307

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 92dd070a28091dda81e315591d6d12cddfecf00f
2020-01-31 10:01:15 -08:00
ed10408cc6 Updating submodules
Summary:
GitHub commits:

a3394d248c
91f92d0106
e50c78af57
d49bb54c3d
504fda5cda
42086f8764
d5b454a9c0
0e31e0a8b0

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 7ce9d3444d653c6889ffe080425aa082c33f137a
2020-01-30 22:05:39 -08:00
03557a9838 Make save_for_lite_interpreter private (#32771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32771

It's a patch to #32621, make the api private.

Test Plan: Imported from OSS

Differential Revision: D19657307

Pulled By: iseeyuan

fbshipit-source-id: e604a0cbed6a1e61413daaafc65bea92b90f1f5d
2020-01-30 21:01:54 -08:00
c3b4bfcfed Add knobs to set the number of profiling runs and bailout depth (#32735)
Summary:
Diagnostic API to simplify debugging and experiments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32735

Differential Revision: D19626708

Pulled By: Krovatkin

fbshipit-source-id: aa8c0da94d4559329fd7c8093329aea4e0271b6a
2020-01-30 18:50:56 -08:00
12bcfa7c77 Remove Python dependency (toPyTuple/fromPyTuple, jitCompilationUnit, deserialize) in rref_impl.h/cpp (#32753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32753

Functions to be bound as an Aten operator could not have Python dependency.

This is to refactor and remove Python dependency.
ghstack-source-id: 97485800

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_functions_not_supported

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_functions_not_supported
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```

Differential Revision: D5741675

fbshipit-source-id: 31ee60955be8d815d0773f3699e3ff2f1f9d8849
2020-01-30 17:52:48 -08:00
29fabb1fbc make tests for empty inputs check zero parameter grads (#32820)
Summary:
Make batch norm with empty inputs return zero parameter gradients. Batch norm, group norm, and convolutions now all return zero grads for their parameters, so make the tests check that. Fixes some bullet points in https://github.com/pytorch/pytorch/issues/12013 (interpolate is not fixed by this PR; it is being fixed in other PRs).
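
A minimal sketch of what the updated tests assert (assuming the post-PR behavior described above):

```python
import torch

bn = torch.nn.BatchNorm2d(3)
x = torch.empty(0, 3, 4, 4, requires_grad=True)

bn(x).sum().backward()

# Parameter grads come back as zeros rather than None.
assert bn.weight.grad is not None and torch.all(bn.weight.grad == 0)
assert bn.bias.grad is not None and torch.all(bn.bias.grad == 0)
```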
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32820

Differential Revision: D19651470

Pulled By: ngimel

fbshipit-source-id: 96fdd085f9b0e98e91217dd2ac1f30f9c482b8be
2020-01-30 17:42:55 -08:00
bc2e05a398 Update Docs for building PyTorch for Android.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32578

Reviewed By: ljk53

Differential Revision: D19588904

Pulled By: dreiss

fbshipit-source-id: 2934752b9c5b94f2f141417669d8385be44d703b
2020-01-30 17:12:03 -08:00
fcf9fcedf4 Remove needs_dynamic_casting from TensorIterator and move it to Loops.cuh (#32755)
Summary:
Remove `needs_dynamic_casting` from TensorIterator and move it to `Loops.cuh`.

The original design of `needs_dynamic_casting` is fundamentally flawed: it injects logic into TensorIterator and uses a bunch of boolean values to test whether dynamic casting is needed. This makes it very fragile, as TensorIterator is so complicated that it is easy to introduce unnecessary dynamic casts. It also makes `gpu_kernel` very inflexible: different cases need to manipulate TensorIterator to make it work.

For example, currently
```python
torch.zeros(10, device='cuda').mul_(0.9)
```
needs a dynamic cast, but it shouldn't.

Testing whether dynamic casting is needed could be easy: just compare the dtypes of the lambda with the dtypes of the operands. If they don't match, dynamically cast; otherwise, don't.
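
In pseudocode, the proposed check reduces to a plain dtype comparison (a hedged sketch with hypothetical names; the real check lives in C++ in `Loops.cuh`):

```python
def needs_dynamic_casting(lambda_dtypes, operand_dtypes):
    # Cast only when the lambda's argument/return dtypes differ from
    # the dtypes of the actual operands.
    return any(l != o for l, o in zip(lambda_dtypes, operand_dtypes))
```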
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32755

Differential Revision: D19644092

Pulled By: ngimel

fbshipit-source-id: 130bb8bd78d20c2ed1bdfc9d9fb451eb0f0c7e55
2020-01-30 17:06:23 -08:00
0f0972051a Cudnn bn size fix (#32763)
Summary:
Should fix https://github.com/pytorch/pytorch/issues/29744 by falling back to the native batch norm implementation if cuDNN cannot execute the provided shape.

Shape numbers were verified for cudnn 7.6.5.32 with tensor shapes:
```python
# for spatial bn
x = torch.Size([880801, 256, 5])
x = torch.Size([65535, 256, 5])
x = torch.Size([880801, 64, 4, 4])
x = torch.Size([65535, 64, 4, 4])

# for per-act bn
x = torch.Size([131070, 2048])
x = torch.Size([262136, 2048])
```
for `training()` and `eval()` mode using `torch.float32` and `torch.float16`.

I've increased the shape used in our current smoke test, but I can also add all use cases of the support matrix, if wanted.

CC ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32763

Differential Revision: D19644328

Pulled By: ngimel

fbshipit-source-id: c2151bf9fe6bac79b8cbc69cff517a4b0b3867aa
2020-01-30 16:57:15 -08:00
bcb7c22679 [PyTorch BC] Fix the ci (#32843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32843

Fix the CI by skipping aten::join.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D19650584

fbshipit-source-id: 4446eef568ded334217ff9205a795daffebe41a1
2020-01-30 16:05:03 -08:00
5380e16db9 Updating submodules
Summary:
GitHub commits:

73638a8795
7a83deaa83
969d173d11

Test Plan: n/a

Reviewed By: wittgenst

fbshipit-source-id: 399ed7a972876727a6bfd1409667c735c406fef5
2020-01-30 15:41:49 -08:00
765904f1b9 [torch] fd error check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32797

Differential Revision: D19642262

Pulled By: mrshenli

fbshipit-source-id: 1720812166dd583dca6d72cb7e24b65ec013a62b
2020-01-30 15:30:03 -08:00
94ddc2c462 Resubmit more code fakefp16 mapping unification (#32798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32798

ATT

Test Plan: unittests

Reviewed By: amylittleyang

Differential Revision: D19632251

fbshipit-source-id: 670004050d67415bb24392f3520afa32b64ce740
2020-01-30 12:48:48 -08:00
690d41f24e Centralize addition of "always on" dispatch keys. (#32734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32734

VariableTensorId is the only key with this treatment today,
but BackendSelect and CompoundOp are coming soon.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19628091

Pulled By: ezyang

fbshipit-source-id: 250753f90528fa282af7a18d8d2f7736382754bd
2020-01-30 11:49:40 -08:00
5ddd2cd92b Make DispatchKeyGuards accept DispatchKey::Undefined (#32729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32729

When working on the vmap prototype I noticed that this was helpful
as it lets me easily initialize a no-op guard, if I need to do it
at constructor time (which I usually do, because the guards don't
have move constructors).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19628092

Pulled By: ezyang

fbshipit-source-id: d6259a3f70d287cdac2e4a5f3984e2880f19bdc2
2020-01-30 11:49:35 -08:00
3d0a470d89 Rename DispatchKey::UndefinedTensorId to Undefined (#32728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32728

It doesn't have much to do with tensors anymore.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19628093

Pulled By: ezyang

fbshipit-source-id: 4d57111cdf44ba347bec8a32bb5b4b47a83c1eaf
2020-01-30 11:47:40 -08:00
a40a19ccab Remove GIL from RRefContext (#32807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32807

After this commit, RRefContext no longer depends on pybind.

Test Plan: Imported from OSS

Differential Revision: D19636316

Pulled By: mrshenli

fbshipit-source-id: 88faa101c32e9019e979ae8e5da6706e49842726
2020-01-30 10:53:25 -08:00
413c0f6c29 Fixes moving after weight norm application (#32563)
Summary:
This PR updates how RNNs handle their "flat weights." In particular, it allows for only some flat weights to be "materialized" when apply is called, and it updates the flattening behavior to only apply if all flat weights are (1) materialized, (2) share a dtype and (3) are acceptable to cuDNN.

One test is modified and another created to test these changes. One practical effect of this change is that weight norm can be successfully applied to a module BEFORE that module is moved to an accelerator. Previously doing so would throw an error.
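
A short sketch of the ordering that now works (assumes a CUDA device; previously moving a weight-normed module raised an error):

```python
import torch
from torch.nn.utils import weight_norm

rnn = torch.nn.LSTM(10, 20)
rnn = weight_norm(rnn, name="weight_hh_l0")
rnn = rnn.cuda()  # applying weight norm first, then moving, now works
```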
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32563

Differential Revision: D19602725

Pulled By: mruberry

fbshipit-source-id: d8f9441d17815c8c9ba15b256d4be36f784a3cf9
2020-01-30 10:31:11 -08:00
9bab617b3e Make python version a parameterizable option for Windows CI.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32823

Differential Revision: D19642347

Pulled By: ezyang

fbshipit-source-id: a4d461aa29a06bb7f5e5d359a2df2c90e9a4fd41
2020-01-30 08:16:43 -08:00
cc35c876cb Fix backcompat for linear_relu_dynamic_fp16 (#32803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32803

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#32803 Fix backcompat for linear_relu_dynamic_fp16**

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D19642281

Pulled By: albanD

fbshipit-source-id: 3b6ae4dd81bf8a70dd81ccbb02fffd7653bbd08c
2020-01-30 08:08:29 -08:00
fa65859270 Re-enable non-deterministic autograd tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32793

Test Plan: Imported from OSS

Differential Revision: D19634632

Pulled By: albanD

fbshipit-source-id: 9dda29536c2ed4afb81ecbea471ba615241bbac2
2020-01-30 08:00:19 -08:00
85bd3e5bdb Removing @expectedFailureXLA from test_nll_loss_empty_tensor_reduction_mean (#32701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32701

Because it's disabled in XLA (https://github.com/pytorch/xla/pull/1563).
Discussed in https://github.com/pytorch/xla/issues/1539

Test Plan: Imported from OSS

Differential Revision: D19633349

Pulled By: pbelevich

fbshipit-source-id: b9a81c976a96b325356ff210ff838dfcd5352db7
2020-01-30 07:38:12 -08:00
6874278985 Revert D19611800: [PyTorch][TorchScript] Add support for join on List of strings in TorchScript
Test Plan: revert-hammer

Differential Revision:
D19611800

Original commit changeset: cef66356abc1

fbshipit-source-id: 41af9e0de83b1fb808b17255ec905e137909457d
2020-01-30 06:46:28 -08:00
b0923acb29 Reduce RPC branches for Python/BuiltinOp/TorchScript (#32689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32689

As described in https://github.com/pytorch/pytorch/issues/32565
ghstack-source-id: 97440343

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_functions_not_supported

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_functions_not_supported
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```

Differential Revision: D5721814

fbshipit-source-id: 9079e81764be1e7c7b85dd72a18c76f3ecfd2547
2020-01-30 01:19:35 -08:00
affd598c1f Fix/simplify alias annotation handling in op codegen. (#32574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32574

Previously, we ignored alias annotations when deriving argument mutability
and instead recognized particular signature patterns (in-place, out variant)
and assigned mutability accordingly. Op signatures that didn't fit these
patterns would error (e.g. see #30526, which this fixes).

No change in the generated binding code.

Code changes:
1. in function_wrapper.py, fix the mutability derivation logic used when creating an argument's c++ type property. Note that we temporarily need to trap a special case and apply the old logic, see code comment for details.

2. in gen_jit_dispatch.py, update logic that assumed only one mutable Tensor argument per declaration. Happily this mostly was accomplished by bypassing some now-redundant signature regeneration machinery. Another special case here requires that we keep the old machinery around temporarily.

Test Plan: Imported from OSS

Differential Revision: D19564875

Pulled By: bhosmer

fbshipit-source-id: 5637a9672923676d408c9586f3420bcc0028471a
2020-01-30 00:31:03 -08:00
fb159b5236 Some work on eager op binding codegen (gen_python_functions.py) (#29986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29986

Previously in addition to generating a python binding for each op,
we would generate an almost-trivial helper for each overload.
This PR eliminates the helpers, simplifying codegen logic a bit and
reducing the source-level indirection by a step.
Perf should be unchanged.

codegen diff: 1f2f07fb60

Note: in the interests of keeping the diff contained, there's only
some light cleanup here beyond what's necessary for the codegen changes.
Plan is to do some more substantial refactoring in followup PRs that
leave generated code unchanged.

Test Plan: Imported from OSS

Differential Revision: D18567980

Pulled By: bhosmer

fbshipit-source-id: eb9a81babb4489abd470842757af45580d4c9906
2020-01-30 00:29:53 -08:00
821b6aa769 [pytorch] Minor: avoid acquiring GIL twice in PyRRef::localValue() (#32785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32785

Add PythonRpcHandler::handleExceptionWithGIL() so that in PyRRef::localValue(),
we don't need to release the GIL and re-acquire it on the following line.
ghstack-source-id: 97418465

Test Plan: existing test coverage

Differential Revision: D19626195

fbshipit-source-id: db694d04b078811f819626789e1e86f1b35adb5b
2020-01-29 21:27:43 -08:00
c2d736cefb Add support for Dynamic LSTM quantization on Mobile (#32757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32757

This PR updates the main quantize_dynamic API to use QNNPACK backend for mobile

Test Plan:
python test/test_quantization.py PostTrainingDynamicQuantTest.test_quantized_rnn

Imported from OSS

Differential Revision: D19632220

fbshipit-source-id: b4c51485c281d088524101b97c84dd806438b597
2020-01-29 20:55:48 -08:00
55c382e62b Fixed access to element in size tensor for scripting (#32652)
Summary:
When using scripting, there was an error when attempting to access a
specific element within the size tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32652

Reviewed By: hl475

Differential Revision: D19610726

Pulled By: houseroad

fbshipit-source-id: bca49927bbe71dbe7e7d7edf301908fe79e089b5
2020-01-29 18:33:46 -08:00
8ead65a946 [PyTorch][TorchScript] Add support for join on List of strings in TorchScript
Summary: Add support for join on List of strings in TorchScript.

Test Plan:
(pytorch) smummadi@smummadi-mbp pytorch % python test/test_jit_string.py
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 1.090s

OK

Differential Revision: D19611800

fbshipit-source-id: cef66356abc14dfd100a806d25dd1a8bc9af0a11
2020-01-29 18:22:52 -08:00
cccf5e7011 Resolve rendezvous race condition
Summary:
When running the ctr_mbl_feed, we've encountered a hang issue related to the zeus-based rendezvous handshake. It was mitigated by this diff https://our.intern.facebook.com/intern/diff/D19167151/.

This diff resolves the race condition by adding a reference to the rendezvous handler.

Test Plan: x7340282797

Reviewed By: yifuwang

Differential Revision: D19627293

fbshipit-source-id: 560af289db8ef6cf8d6f101f95ec27d5a361fd04
2020-01-29 17:49:07 -08:00
3552be1090 [jit] fix the NoneType param/buffer hack (#32745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32745

Some parameters (like `bias` in conv) are optional. To achieve this
previously, you had to add `bias` as a constant, which would invoke some
pretty weird behavior in the frontend, summarized as:
```
if bias is not None:
  add it as a parameter normally
else: # bias is None
  add it as a constant with the value None
```

There are several things bad about this:
1. Bias is not a constant. Marking it `__constants__` is confusing.
2. It basically relies on an implementation detail (the frontend
processes parameters before constants) to work.

Okay, whatever. I don't even know why we did this originally, but
getting rid of it doesn't break anything, so I assume improved NoneType
refinement has made this a non-issue.

Note on perf: this will make no difference; if bias was `None`, it's still
folded out today, and if bias is a Tensor, it is added as a parameter
both before and after this change.

Test Plan: Imported from OSS

Differential Revision: D19628634

Pulled By: suo

fbshipit-source-id: d9128a09c5d096b938fcf567b8c23b09ac9ab37f
2020-01-29 17:04:39 -08:00
2e359ef86d enable empty batch for all flavor of convolutions (#32709)
Summary:
Resubmitting https://github.com/pytorch/pytorch/issues/32612 after a merge gone wrong. Enables convolution with an empty batch or an empty number of channels for all flavors of convolution (grouped convolution, convTranspose); see the sketch below. Would make https://github.com/pytorch/pytorch/issues/31658 unnecessary. Also returns zero gradients for the parameters, which is necessary for correct DDP operation.
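
For instance, a hedged sketch of the now-supported case:

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3)
x = torch.empty(0, 3, 8, 8, requires_grad=True)

y = conv(x)            # output shape is (0, 8, 6, 6)
y.sum().backward()

# Parameter gradients are zeros (as DDP requires), not None.
assert torch.all(conv.weight.grad == 0)
assert torch.all(conv.bias.grad == 0)
```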
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32709

Differential Revision: D19627968

Pulled By: ngimel

fbshipit-source-id: 7359759bd05ff0df0eb658cac55651c607f1b59f
2020-01-29 16:33:48 -08:00
a840afbeb4 [pytorch][embeddingbag_8bit] Add include_last_offset option to Fused 8bit EmbeddingBag and parallelize the op (#32683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32683

Pull Request resolved: https://github.com/pytorch/glow/pull/4079

Similar to D17768404, we changed the EmbeddingBag operator for 8-bit fused version to add the option to include the last offset and parallelize the op.
ghstack-source-id: 97404645

Test Plan:
To generate the AVX2 code (`embedding_lookup_fused_8bit_rowwise_idx_avx2.cc`):
```
python hp_emblookup_codegen.py --fused --use-offsets
```

To test the correctness:

```
buck test //caffe2/torch/fb/sparsenn:test -- test_embedding_bag_byte_rowwise_offsets  --print-passing-details
```

Reviewed By: yinghai

Differential Revision: D19592761

fbshipit-source-id: f009d675ea3f2228f62e9f86b7ccb94700a0dfe0
2020-01-29 16:04:56 -08:00
b565d9b356 Logspace fixes (#32744)
Summary:
Reopening of PR https://github.com/pytorch/pytorch/issues/32631 with `viable/strict` base for testing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32744

Differential Revision: D19626090

Pulled By: ngimel

fbshipit-source-id: ed0fc759198ee2edc23afdcb1e190a11d70ec4c8
2020-01-29 15:17:00 -08:00
fc2ff7912f [quantization] Remove incorrect fp16 dynamic linear/relu op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32774

Test Plan: Imported from OSS

Differential Revision: D19624471

Pulled By: jamesr66a

fbshipit-source-id: eb6cb11fabf2ddd5edf345aff35b86b83c3af94c
2020-01-29 14:50:24 -08:00
9357b91180 Remove -Werror from test/cpp_extensions/setup.py (#32704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32704

-Werror is too aggressive a check for the test cpp extensions because it fails even on deprecation warnings, which are included from the core codebase.

Fixes #32136

Test Plan: Imported from OSS

Differential Revision: D19620190

Pulled By: pbelevich

fbshipit-source-id: 0e91566eb5de853559bb59e68a02b0bb15e7341b
2020-01-29 14:12:32 -08:00
8b187e8f2a Fix ivalue_inl.h:353:29: warning: comparison of unsigned expression >= 0 is always true (#32778)
Summary:
`slot` is an unsigned integer, which is always `>= 0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32778

Differential Revision: D19625789

Pulled By: ngimel

fbshipit-source-id: c92c35c65d4372be934283e87aeba99e9e0ef353
2020-01-29 14:04:05 -08:00
c47c78d0bf Revert D19597036: More code fakefp16 mapping unification
Test Plan: revert-hammer

Differential Revision:
D19597036

Original commit changeset: deed61945884

fbshipit-source-id: c057e57810a99464aefb00b645613ecd6a7c5533
2020-01-29 13:32:42 -08:00
3ee6673e99 Refreshing numel on a stride update is pointless. (#32116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32116

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19579875

Pulled By: ezyang

fbshipit-source-id: 00393c9dc101967c79231bfae36b23b7b80135fb
2020-01-29 13:26:28 -08:00
8c6f52ac24 Delete resize_dim() (#32114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32114

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19579876

Pulled By: ezyang

fbshipit-source-id: d09a231ba891403a06eae0c2203e0ad7dd6d3a12
2020-01-29 13:26:23 -08:00
b371eab8c7 Expunge last two sites of resize_dim (#32112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32112

It turns out we already removed these from the CPU version; copy
the changes over.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19579874

Pulled By: ezyang

fbshipit-source-id: e40efbf94e128fd81421b227b76dd9c9c0256d96
2020-01-29 13:25:22 -08:00
c7df28a2a3 Delete copy/move constructors on these RAII guards. (#32727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32727

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19621858

Pulled By: ezyang

fbshipit-source-id: 5112c849252478d8249de4f8c8c5a2d6caf60672
2020-01-29 13:20:15 -08:00
5ffa1efa52 Add missing C10_API to dispatch key TLS setter/getters (#32557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32557

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19579853

Pulled By: ezyang

fbshipit-source-id: 45f83a7a5ead0344e4c13526abb5fafdedaed4a4
2020-01-29 13:20:09 -08:00
3b47922855 Improve documentation in dispatcher; remove unnecessary optional (#32533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32533

Applies renames based on comments in #32439.  I also updated some
other documentation and variable names while I was at it.

Fixes #32435.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19579854

Pulled By: ezyang

fbshipit-source-id: 85021a92a2a84501f49ee5c16318f81f5df64f8d
2020-01-29 13:18:29 -08:00
8cb05e72c6 Port BCELoss to ATen to increase accuracy (#31365)
Summary:
Fixes issue https://github.com/pytorch/pytorch/issues/24933
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31365

Differential Revision: D19557712

Pulled By: ezyang

fbshipit-source-id: 3ae78c949b2f6c21b294d986d28e09daa9b0c526
2020-01-29 12:58:37 -08:00
50d82f5122 Make VC++ version a parametrizable option for Windows CI. (#32043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32043

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19621910

Pulled By: ezyang

fbshipit-source-id: dce00a56ff679548fd9f467661c3c54c71a3dd4e
2020-01-29 12:11:47 -08:00
e84f9d9d0c Fix TensorProtosDBInput AttributeError (#32274)
Summary:
https://github.com/pytorch/pytorch/issues/6794
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32274

Differential Revision: D19621889

Pulled By: ezyang

fbshipit-source-id: 1bdd042b6421a2798c7f1e9030dfc6dfc1246989
2020-01-29 12:05:43 -08:00
8693164acb Randomize xla port (#32718)
Summary:
fixes https://github.com/pytorch/pytorch/issues/30717
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32718

Differential Revision: D19607998

Pulled By: ailzhang

fbshipit-source-id: 81ba9c7c71988a64cdc8fa5500967509657438fe
2020-01-29 12:04:01 -08:00
b5d8982ae2 clean up GIL usage (#32748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32748

This is a follow-up to PR #30630: we need to hold the GIL when calling jit::toPyObject(), and some bound functions need to be tagged with a GIL release if the underlying C++ code requires the GIL. So:
1. pyRef::to_here() and pyRef::local_value() now acquire the GIL
2. pyRef::pickle() and pyRef::unpickle() got the GIL release tag
3. in request_callback_impl, the GIL is also acquired as needed
4. for typeParser, use the cached jitCompilationUnit_, and also clean it up in the cleanUp() function
ghstack-source-id: 97373011

Test Plan: unit test

Differential Revision: D19612337

fbshipit-source-id: 4d09f9b52ba626545ae7d31fea6b671301ed3890
2020-01-29 11:58:46 -08:00
eab99ab08e [android] fbjni DoNotStrip annotation for oss native methods (#32567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32567

As a first change to support ProGuard: even if these methods are never called from Java, we register them at the JNI level, and this registration will fail if the methods are stripped.

Adding DoNotStrip to all native methods that are registered in OSS.

After the integration of consumerProguardFiles in fbjni, which prevents ProGuard from stripping DoNotStrip-annotated methods, this will fix errors with ProGuard enabled.

Test Plan: Imported from OSS

Differential Revision: D19624684

Pulled By: IvanKobzarev

fbshipit-source-id: cd7d9153e9f8faf31c99583cede4adbf06bab507
2020-01-29 11:52:53 -08:00
2471ddc96c Improved speed of frobenous norm for non-complex dtype (#30871)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)

Changes:
[x] Fixed the performance issue raised in https://github.com/pytorch/pytorch/issues/30704 so that non-complex numbers do not call `conj()` and `real()`.
[x] Fixed tensor_to_numpy() conversion likely broken by a `checkBackend()` in https://github.com/pytorch/pytorch/issues/27064.
[x] Fixed some ReduceOps and TensorCompare Ops that recently added a `checkBackend()`.
    - `checkBackend()` is replaced with a device type check and a layout check.
    - This ensures the ComplexCPU Type ID is supported.
[x] Added AVX support for complex `exp()`, as requested in https://github.com/pytorch/pytorch/issues/755
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30871

Differential Revision: D19200726

Pulled By: ezyang

fbshipit-source-id: d7e1be0b0a89c5d6e5f4a68ce5fcd2adc5b88277
2020-01-29 11:43:53 -08:00
b1c85dd916 Custom RNG DispatchKey (#32325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32325

The purpose of this PR is to enable PyTorch dispatching on `at::Generator*` parameters and demonstrate how it can be used in cpp extensions to implement custom RNG.
1. `CustomRNGKeyId` value added to DispatchKey enum and `DispatchKeySet key_set_` added to `at::Generator`
2. The overloaded `operator()(at::Generator* gen)` added to MultiDispatchKeySet.
3. The existing CPUGenerator and CUDAGenerator class are supplied with CPUTensorId and CUDATensorId dispatch keys
4. The implementation of CPU's `cauchy_kernel`(as an example, because it's already moved to ATen) was templatized and moved to `ATen/native/cpu/DistributionTemplates.h` to make it available for cpp extensions
5. Minor CMake changes to make native/cpu tensors available for cpp extensions
6. RegisterCustomRNG test that demonstrates how CustomCPUGenerator class can be implemented and how custom_rng_cauchy_ native function can be registered to handle Tensor::cauchy_ calls.

Test Plan: Imported from OSS

Differential Revision: D19604558

Pulled By: pbelevich

fbshipit-source-id: 2619f14076cee5742094a0be832d8530bba72728
2020-01-29 11:30:04 -08:00
642c9ef922 More code fakefp16 mapping unification
Summary: ATT

Reviewed By: amylittleyang

Differential Revision: D19597036

fbshipit-source-id: deed61945884fb4b01d058f3c72c75f5a937a41c
2020-01-29 11:01:24 -08:00
d119de8abd Deduplication of type casting codes (#32730)
Summary:
This code is implemented twice, at different places by different people; we should merge the implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32730

Differential Revision: D19622023

Pulled By: ezyang

fbshipit-source-id: a9cbda31428b335bf28a7e4050f51f58e787b94f
2020-01-29 10:13:15 -08:00
cbb744f00f apply linter to rpc test files (#32659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32659

Applies linter to RPC test files so that we can use linter shortcuts
without getting unnecessary changes to the whole file.
ghstack-source-id: 97361237

Test Plan: No actual changes.

Differential Revision: D19584742

fbshipit-source-id: a11ce74ee0e2817e6f774fff7c39bcab06e99307
2020-01-29 09:49:45 -08:00
8bc889e502 Fix crash of SobolEngine if default tensor type is cuda (#32496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32496

Addresses https://github.com/pytorch/pytorch/issues/32494

Test Plan:
```
import torch
from torch.quasirandom import SobolEngine

torch.set_default_tensor_type(torch.cuda.FloatTensor)
se = SobolEngine(3)
```

Reviewed By: 2timesjay

Differential Revision: D19517571

fbshipit-source-id: 02eb499ffbd4260474d348e9bb536fb8c36c2c31
2020-01-29 08:49:18 -08:00
c7bf4d22fe added exception args to the returned error message (#32693)
Summary:
addresses https://github.com/pytorch/pytorch/issues/32692
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32693

Differential Revision: D19606757

Pulled By: mrshenli

fbshipit-source-id: 79fc09f8bb6a33e1b73ce0bbc45387544c7adc1b
2020-01-29 08:26:27 -08:00
c35ca84eee Get rid of some unused THGenerate*Type defines. (#32657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32657

The goal here is to simplify the codegen enough that we can just handwrite the bindings, so anything in here is "bad".

Test Plan: Imported from OSS

Differential Revision: D19584521

Pulled By: gchanan

fbshipit-source-id: 93005b178228c52a1517e911adde2e2fe46d66a5
2020-01-29 08:12:45 -08:00
594cadeb8f Make sure temporary vectors are properly initialized in avx2 code (#32722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32722

Checked using [this](https://godbolt.org/z/uAaE9R) that it gives the correct assembly.

Test Plan: Imported from OSS

Differential Revision: D19610012

Pulled By: albanD

fbshipit-source-id: 4d1cb812951ae03d412a0fba3c80730f0d286e1f
2020-01-29 07:58:25 -08:00
5e2311033e fix windows build (#32762)
Summary:
remove windows visibility macro
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32762

Differential Revision: D19616367

Pulled By: eellison

fbshipit-source-id: d824162fe92bff4cb2b1a170312cd14b6d7bd99d
2020-01-28 22:55:48 -08:00
fd850685da Updating submodules
Summary:
GitHub commits:

b81d0657df

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 82d39025e331083e58c0d0cc9b47985e590bb289
2020-01-28 21:03:34 -08:00
62d652f922 replaces .at with [] in getSlot (#32677)
Summary:
per title. cc qizzzh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32677

Differential Revision: D19596094

Pulled By: ngimel

fbshipit-source-id: 06177b9e12d203d84b541205437ef2ad51db0fac
2020-01-28 20:49:03 -08:00
c729614997 [JIT] Improve May Contain Alias Using Contained Elements (#32326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32326

Now that we have type-level granularity, we can improve `mayContainAlias` queries. Each new value is initialized as containing the wildcard set of each contained mutable type. Whenever a value is added to a container, it is set to the wildcard set. Now, to check if any two values contain overlapping values, we can just check whether the `containedMemoryLocations` of the two sets overlap.

Test Plan: Imported from OSS

Differential Revision: D19563262

Pulled By: eellison

fbshipit-source-id: c6d7489749c14b2054a6d50ef75baca699ada471
2020-01-28 18:08:56 -08:00
25d33a2ee8 [JIT] Use Type Level Granularity in Alias Analysis Wildcards (#32251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32251

Previously wildcard sets were associated by TypeKind, meaning all Lists were in one alias set, all Classes were in one alias set, etc. We can improve analysis by bucketing wildcard sets by TypePtr instead. Any two mutable types which can unify should be in the same wildcard set bucket.

This also allows us to do much simpler `mayContainAlias` analysis, and it improves `analyzeConservative` analysis because now we can recurse through all contained memory locations and mark writes, instead of recursing only one level deep into contained elements.

Test Plan: Imported from OSS

Differential Revision: D19563263

Pulled By: eellison

fbshipit-source-id: 371a37d1a8596abc6c53f41c09840b6c140ea362
2020-01-28 18:07:48 -08:00
02f055ffd9 Add mapping for FbFCPacked in fakefp16 transform
Summary: ATT. Since the infra is there.

Test Plan: run it

Reviewed By: amylittleyang

Differential Revision: D19605250

fbshipit-source-id: c68be4d7963afa4fa5f8f60c90f1913605eae516
2020-01-28 17:00:24 -08:00
18aab32959 Move exponential_ from TH to Aten (CPU) (#32501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32501

This diff will address https://github.com/pytorch/pytorch/issues/24699

We require the input `lambda` to be >= 0, matching https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.exponential.html#numpy-random-exponential. This check did not exist in the previous implementation.
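
A brief sketch of the new constraint (parameter name per the existing `Tensor.exponential_(lambd=...)` signature):

```python
import torch

t = torch.empty(5)
t.exponential_(lambd=1.5)     # fine: non-negative rate

# t.exponential_(lambd=-1.0)  # now raises, matching numpy.random.exponential
```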

Benchmark I am using PT operator microbenchmark
```
================================================================================
Before the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: exponential_
# Mode: Eager
# Name: exponential__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 21311.746

================================================================================
After the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: exponential_
# Mode: Eager
# Name: exponential__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 20919.914

================================================================================
```

Test Plan: Sandcastle and Github tests

Reviewed By: BIT-silence

Differential Revision: D19518700

fbshipit-source-id: 0e79cb6a999c1278eb08b0d94cf61b119c85a36c
2020-01-28 16:59:22 -08:00
1f78bd0774 [caffe2] Early error throwing for corrupted embeddings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32717

Reviewed By: xianjiec

Differential Revision: D19604954

fbshipit-source-id: c02eccf048c0dba3f66d729ab1fda50f3cacef63
2020-01-28 16:55:29 -08:00
6f7d5bb3e1 Temporarily disable the test_quantized_rnn test (#32742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32742

As Title says (Check https://github.com/pytorch/pytorch/issues/32644).
ghstack-source-id: 97352793

Test Plan: CI

Differential Revision: D19611029

fbshipit-source-id: 9f4a155c909f419e41c1d7078eb2796dd17cedd2
2020-01-28 16:50:59 -08:00
43d31ae4c3 Added ONNX model checker to ONNX export (#32298)
Summary:
Included the ONNX model checker code in the ONNX export;
this will force the ONNX checker to run for all models that get exported.
This should help with validating exported models.
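
A minimal sketch of an export that now runs the checker implicitly (the model and file path are arbitrary examples):

```python
import torch

model = torch.nn.Linear(3, 2)
dummy = torch.randn(1, 3)

# The ONNX checker now validates the exported graph as part of the export call.
torch.onnx.export(model, dummy, "linear.onnx")
```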
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32298

Reviewed By: hl475

Differential Revision: D19538251

Pulled By: houseroad

fbshipit-source-id: eb20b124fe59200048f862ddaf20f6c59a0174d5
2020-01-28 16:28:54 -08:00
99228086a6 Added missing period in README.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32723

Differential Revision: D19607256

Pulled By: mlacayo

fbshipit-source-id: 2993014d4d90fa26acd5bc01ed7494cc43a29a62
2020-01-28 16:25:04 -08:00
e74e1ccc47 Use direct vector indexing in Object::getSlot() instead of at(). (#31627)
Summary:
This method is pretty hot.  In an internal workload, this single
call to at() accounted for ~2% of overall cycles.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31627

Reviewed By: yinghai

Differential Revision: D19607779

Pulled By: qizzzh

fbshipit-source-id: 1684919049a35fdad686d8396c7dce7243ab92d4
2020-01-28 16:17:16 -08:00
ee60cd9124 Back out "fix view listing in autograd codegen" (#32720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32720

Original commit changeset: 5ebc4c978af5

Test Plan: existing tests

Reviewed By: chenyangyu1988

Differential Revision: D19603336

fbshipit-source-id: 56051a716c4eedf49cfe7367ff447b4b9c5429ea
2020-01-28 16:10:47 -08:00
2060e0a9dd Split serialization tests to their own file (#32241)
Summary:
Stacked PRs
 * #32244 - Make zip serialization the default
 * **#32241 - Split serialization tests to their own file**

This makes them all easier to run as a batch. This PR is just a code move / fixing up imports. There are still some serialization tests in `test_torch.py` as part of `TestDeviceType`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32241

Pulled By: driazati

Differential Revision: D19415826

fbshipit-source-id: a3f6cfe1626ff2f9b9631c409bf525bd32e4639b
2020-01-28 15:04:05 -08:00
0327e75e14 Back out "[caffe2] use JIT'ed fp32 SLS" (#32711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32711

Original commit changeset: 4f29d34523ef

Test Plan: CI

Differential Revision: D19603967

fbshipit-source-id: af3f647fff416a84290a42217747948bac4d73c6
2020-01-28 14:07:11 -08:00
ffdcbadeaa Minor refactoring to improve code reuse (#32675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32675

It's good to have one location to do the mapping.

Test Plan: Everything still runs.

Reviewed By: amylittleyang

Differential Revision: D19590354

fbshipit-source-id: d8c0d14e4bdf27da3e13bd4d161cd135d6e3822b
2020-01-28 13:31:48 -08:00
9de3208449 [rpc][flaky-tests] fix for test_handle_send_exceptions and (#32656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32656

Fixes these flaky tests.

Test Plan: Run the test 500 times and verify that it succeeds every time.

Differential Revision: D19584453

fbshipit-source-id: 07cbc4914211f274182ac0fa74bb5ef6d43392d1
2020-01-28 12:40:12 -08:00
6e7e595c1d [rpc][easy] remove redundant test in rpc_test.py (#32588)
Summary:
Both `test_wait_all_workers` and `test_wait_all_workers_and_shutdown` test the same pattern: initialize RPC, call `_wait_all_workers`, then `rpc.shutdown(graceful=False)`.

`test_wait_all_workers` seems to be more thorough since it tests one worker driving and the others waiting on it as well.

We shouldn't have duplicate tests, so this removes `test_wait_all_workers_and_shutdown`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32588

Differential Revision: D19566294

Pulled By: rohan-varma

fbshipit-source-id: b69519d169b3964649d47ad75532bda5de538241
2020-01-28 11:55:17 -08:00
0ea65d63cf [JIT] Fix stateful lambda stuff and simplify code in custom C++ binding API
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32658

Test Plan: Imported from OSS

Differential Revision: D19584701

Pulled By: jamesr66a

fbshipit-source-id: d556c7db2f32900eb1122348402789b59516a7d7
2020-01-28 11:03:04 -08:00
465ebd58ba [JIT] pickle serialization for custom bound classes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32604

Test Plan: Imported from OSS

Differential Revision: D19566633

fbshipit-source-id: 9387d3ff45cbd6ccde49ce190a52859481cc301c
2020-01-28 11:02:59 -08:00
34ccfba403 [JIT] Include custom_class.h in torch/script.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32586

Test Plan: Imported from OSS

Differential Revision: D19558716

fbshipit-source-id: be540d8ed7de0834e64be89ae621ae50befc83b0
2020-01-28 11:02:54 -08:00
06c19263d3 [JIT] Serialize attributes and types in ClassType serialization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32555

Test Plan: Imported from OSS

Differential Revision: D19544737

Pulled By: jamesr66a

fbshipit-source-id: 2256cfba414a850cdc986bb5872dd4cb177b456c
2020-01-28 11:02:49 -08:00
1719da13f9 [JIT] Support for registering C++ lambdas as methods on custom C++ class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32553

Test Plan: Imported from OSS

Differential Revision: D19543269

Pulled By: jamesr66a

fbshipit-source-id: 7e566650295e9d1c4f2f716470e061308a6210a0
2020-01-28 11:01:07 -08:00
da390914bd .circleci: Add workflows for Python 3.8 (#31948)
Summary:
Done by just editing `.circleci/cimodel/data/dimensions.py` to include `3.8` and then regenerated using `.circleci/regenerate.sh`

cc kostmo, mingbowan, ezyang, soumith

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31948

Differential Revision: D19602069

Pulled By: seemethere

fbshipit-source-id: ac57fde9d0c491c7d948a3f5944c3cb324d403c0
2020-01-28 10:26:03 -08:00
0dc38be407 consider FAIL_GUARD while counting indices for GUARDs (#32672)
Summary:
This handles a corner case where a user schedules a second bailout after the first one and the first one doesn't fire.
Alternatively, we could go back to the implementation that uses a hash set to remember the indices of bailouts that need to fire.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32672

Differential Revision: D19596872

Pulled By: Krovatkin

fbshipit-source-id: 41dcc374cd2501ac20a9892fb31a9c56d6640258
2020-01-28 08:59:25 -08:00
c64dec1993 Python binding to export bytecode format for lite interpreter (#32621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32621

Export the "_save_for_mobile" method to Python so that the bytecode format for lite interpreter can be added or updated to the original script model.

It's the first step of python binding for lite interpreter, as discussed in this [internal post](https://fb.workplace.com/groups/1144215345733672/permalink/1478900738931796/) and offline.

Next step is to export the load_for_mobile and run method of mobile module, so that users could verify the mobile model from Python.

Test: use the following python script to display the bytecode part of the updated model file.
```
#!/usr/bin/env python3
import sys
import pickle
import pprint
import zipfile

class FakeObject(object):
    def __init__(self, module, name, args):
        self.module = module
        self.name = name
        self.args = args
        self.state = None

    def __repr__(self):
        state_str = "" if self.state is None else f"(state={self.state!r})"
        return f"{self.module}.{self.name}{self.args!r}{state_str}"

    def __setstate__(self, state):
        self.state = state

class FakeClass(object):
    def __init__(self, module, name):
        self.module = module
        self.name = name
        self.__new__ = self.fake_new

    def __repr__(self):
        return f"{self.module}.{self.name}"

    def __call__(self, *args):
        return FakeObject(self.module, self.name, args)

    def fake_new(self, *args):
        return FakeObject(self.module, self.name, args)

class DumpUnpickler(pickle._Unpickler):
    def find_class(self, module, name):
        return FakeClass(module, name)

    def persistent_load(self, pid):
        return FakeObject("pers", "obj", (pid,))

def main(argv):
    zfile = zipfile.ZipFile(argv[1])
    names = [i for i in zfile.namelist() if "bytecode.pkl" in i]
    if not names:
        print("bytecode.pkl not found.")
        return
    with zfile.open(names[0], "r") as handle:
        value = DumpUnpickler(handle).load()
    pprint.pprint(value)

if __name__ == "__main__":
    sys.exit(main(sys.argv))

```

Test Plan: Imported from OSS

Differential Revision: D19596359

Pulled By: iseeyuan

fbshipit-source-id: 19a4a771320f95217f5b0f031c2c04db7b4079a8
2020-01-28 08:30:20 -08:00
e24ce0e524 Kill some more unused code in function_wrapper.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32600

Test Plan: Imported from OSS

Differential Revision: D19565654

Pulled By: gchanan

fbshipit-source-id: 993c3dc5467639a7690109d07911951a165a412f
2020-01-28 07:38:51 -08:00
9a2691f2fc Fix spelling errors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32673

Differential Revision: D19597118

Pulled By: pietern

fbshipit-source-id: f88c1da7548fcee141ed248f5f49d25c1d639955
2020-01-28 04:46:15 -08:00
63170431f9 [jit] fix segfault on missing getstate (#32642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32642

Previously, if we defined `__setstate__` but not `__getstate__`, we
would segfault. This PR turns that into a comprehensible error message
(and improves another error message as well).

Fixes https://github.com/pytorch/pytorch/issues/25886
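
A hedged sketch of the kind of module that used to hit the segfault (exactly where the new error surfaces, scripting vs. saving, follows the PR's implementation):

```python
import torch

class M(torch.nn.Module):
    def __setstate__(self, state):  # __getstate__ intentionally missing
        pass

    def forward(self, x):
        return x

# Previously this path segfaulted; now a comprehensible error is raised.
m = torch.jit.script(M())
torch.jit.save(m, "m.pt")
```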

Test Plan: Imported from OSS

Differential Revision: D19596463

Pulled By: suo

fbshipit-source-id: dbe76bc36bc747d65fb0223184c009e0e9ba072c
2020-01-28 01:25:37 -08:00
8e4161517e div_kernel: throw when dividing by integer zero (#32629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/327
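
A tiny sketch of the new behavior (using this release's integer `div` semantics):

```python
import torch

a = torch.tensor([4, 2], dtype=torch.int64)
b = torch.tensor([2, 0], dtype=torch.int64)

# Integer division by zero now raises a RuntimeError instead of
# returning an undefined value.
torch.div(a, b)
```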
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32629

Differential Revision: D19595782

Pulled By: ezyang

fbshipit-source-id: f5bbb298f150efe63a698e8a0b53a84871d16560
2020-01-27 21:41:00 -08:00
b3848c568e Fix flaky test_nccl_timeout. (#32653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32653

This test was flaky since the watchdog thread could abort the
communicator instead of the thread calling `wait()`. As a result, we could
actually see `NCCL error` instead of `Operation timed out` on the user end.
ghstack-source-id: 97250714

Test Plan: waitforbuildbot

Differential Revision: D19583003

fbshipit-source-id: 5c07326d1a16f214dcdbabed97ca613e0a5b42b9
2020-01-27 21:09:40 -08:00
d68592a440 [JIT] Fix classes as attributes in recursive scripting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32594

Test Plan: Imported from OSS

Differential Revision: D19562951

Pulled By: jamesr66a

fbshipit-source-id: 3d5491c1c23456f107390a78be16da687de951e6
2020-01-27 20:37:48 -08:00
b9f764b1c7 Use the C++ current RpcAgent pointer to eliminate the unnecessary argument passing from Python world (#32635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32635

With the source of truth of the current RPC agent moved to the C++ world, there is no point in passing the current RPC agent from the Python world to the C++ world.
ghstack-source-id: 97293316

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_process_group_debug_info
```

Differential Revision: D5703519

fbshipit-source-id: ef7c28bdb1efd293eb6cafe0b0fca7ca80fa08a6
2020-01-27 20:24:32 -08:00
666e5430f8 Clean up mvlgamma doc (including a weird way to link to reference) (#32667)
Summary:
Intentionally left blank
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32667

Differential Revision: D19594683

Pulled By: ezyang

fbshipit-source-id: 5a6eb0a74f569d3c0db2a35e0ed4b329792a18e4
2020-01-27 20:12:17 -08:00
db8ce7ea2d Back out "Make autogen functions correct for multiple outputs and views" (#32681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32681

Original commit changeset: a2b41c2d231e

Test Plan: fb and oss tests

Reviewed By: hudeven

Differential Revision: D19591864

fbshipit-source-id: 7068b5563e37bc9a5d415fd535c73fd9d71fe131
2020-01-27 19:54:34 -08:00
5c8535d5b0 Make C++ RpcAgent::currentRPCAgent_ the source of truth of current RPC Agent (#32633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32633

There were 2 sources of current RPC agent.

- One is in the Python world, `torch.distributed.rpc.api._agent`.
- The other is in the C++ world, `RpcAgent::defaultRpcAgent_`.

Setting Python `_agent` to `None`, does not necessarily reset the C++ `defaultRpcAgent_` to `nullptr`.

i.e.
```
 torch.distributedrpc.api._agent = None
```
does not translate to
```
RpcAgent::defaultRpcAgent_ = nullptr
```

This PR is to remove this ambiguity, and use the C++ pointer as source of truth.

The solution is to leverage a pybind11 behavior that it implicitly casts C++ `shared_ptr<RpcAgent>(nullptr)` to Python `None`.
ghstack-source-id: 97293315

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_duplicate_name

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_process_group_debug_info
```

```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_remote_module

buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_embedding

buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_pairwise_attention_pooling

buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_rpc
```

Differential Revision: D5733066

fbshipit-source-id: b3e6032ee975f19ca556497edbbf40b517b25be8
2020-01-27 19:34:12 -08:00
1217c9b364 Updating submodules
Summary:
GitHub commits:

3f156207e8
135cff30a5
7aa66c704f
1dc4136644
9166d9f767

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: fb27e09060ecb4278b4002c02bce48fe9f4dc361
2020-01-27 18:34:38 -08:00
1695915371 Make _wait_all_workers() support being called for multiple times (#32624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32624

We need this PR to resolve the issue mentioned in https://github.com/pytorch/pytorch/issues/31325#issuecomment-574918917.

The solution is to add a sequence ID to each `_wait_all_workers()` call, to identify different calls.
ghstack-source-id: 97277591
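A minimal sketch of the sequence-ID idea (names here are illustrative, and `_barrier` is a hypothetical stand-in for the actual RPC round-trip):

```python
import itertools

_sequence = itertools.count()

def _barrier(sequence_id: int) -> None:
    pass  # placeholder so the sketch is self-contained

def _wait_all_workers_sketch():
    # every call gets a fresh ID, so messages from one shutdown barrier
    # are never confused with those of a later call
    sequence_id = next(_sequence)
    _barrier(sequence_id)
```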

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_wait_all_workers

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_wait_all_workers
```

Differential Revision: D5739520

fbshipit-source-id: a64131e09c365179624700514422f5375afe803f
2020-01-27 17:04:02 -08:00
39987de9e4 [vulkan][caffe2] Add logging for descriptor extensions, fp16 storage
Summary:
`fbcode/caffe2/caffe2/mobile/contrib/libvulkan-stub/BUCK` changes comment:

libvulkan-stub contains vulkan headers `VK_HEADER_VERSION 29`

fbandroid uses NDK r17, which includes Vulkan `VK_HEADER_VERSION 76`
and contains the defines for the extensions that we need.

Changing `("include", "**/*.h")` -> `("include", "*.h")` means the NDK Vulkan headers are used instead.

For fp16_storage logging we need to add boilerplate for `vkGetPhysicalDeviceFeatures2KHR`.

Test Plan:
scuba employees device_event

logcat getVulkanInfo().
```
instance ext.name:VK_KHR_surface
instance ext.name:VK_KHR_android_surface
instance ext.name:VK_EXT_swapchain_colorspace
instance ext.name:VK_KHR_get_surface_capabilities2
instance ext.name:VK_EXT_debug_report
instance ext.name:VK_KHR_device_group_creation
instance ext.name:VK_KHR_external_fence_capabilities
instance ext.name:VK_KHR_external_memory_capabilities
instance ext.name:VK_KHR_get_physical_device_properties2
instance ext.name:VK_KHR_external_semaphore_capabilities
device ext.name:VK_KHR_incremental_present
device ext.name:VK_EXT_hdr_metadata
device ext.name:VK_KHR_shared_presentable_image
device ext.name:VK_GOOGLE_display_timing
device ext.name:VK_KHR_push_descriptor
device ext.name:VK_KHR_image_format_list
device ext.name:VK_EXT_queue_family_foreign
device ext.name:VK_ANDROID_external_memory_android_hardware_buffer
device ext.name:VK_KHR_external_semaphore_fd
device ext.name:VK_KHR_external_fence_fd
device ext.name:VK_KHR_external_memory_fd
device ext.name:VK_KHR_external_memory
device ext.name:VK_KHR_swapchain
device ext.name:VK_KHR_external_semaphore
device ext.name:VK_KHR_driver_properties
device ext.name:VK_KHR_sampler_mirror_clamp_to_edge
device ext.name:VK_KHR_multiview
device ext.name:VK_KHR_relaxed_block_layout
device ext.name:VK_KHR_maintenance1
device ext.name:VK_KHR_maintenance3
device ext.name:VK_KHR_maintenance2
device ext.name:VK_EXT_global_priority
device ext.name:VK_KHR_get_memory_requirements2
device ext.name:VK_KHR_descriptor_update_template
device ext.name:VK_KHR_bind_memory2
device ext.name:VK_KHR_shader_draw_parameters
device ext.name:VK_KHR_dedicated_allocation
device ext.name:VK_KHR_create_renderpass2
device ext.name:VK_KHR_draw_indirect_count
device ext.name:VK_KHR_sampler_ycbcr_conversion
device ext.name:VK_KHR_device_group
device ext.name:VK_KHR_external_fence
device ext.name:VK_KHR_variable_pointers
device ext.name:VK_EXT_sampler_filter_minmax
device ext.name:VK_KHR_storage_buffer_storage_class
VULKAN_SYMBOL_WRAPPER_LOAD_INSTANCE_SYMBOL(vkGetPhysicalDeviceFeatures2KHR) res=1
mChipsetInfoUtilInfo.getVulkanInfo():{vk_driver_version=2149056512, vk_device_id=100859905, vk_extension_descriptor_update_template=1, vk_api_version=4198487, vk_support_fp16_storage=0, vk_platform_dlopen=success, vk_shader_int16=1, vk_device_type=1, vk_shader_float64=0, vk_extension_push_descriptor=1, vk_shader_int64=0, vk_wrapper_init=true, vk_vendor_id=20803, vk_max_compute_shared_memory_size=32768, vk_device_name=Adreno (TM) 630, vk_max_compute_work_group_invocations=1024, vk_device_count=1}
```

Reviewed By: dreiss

Differential Revision: D19564664

fbshipit-source-id: 908b34bdcc24d9b03ecc185edbc5cfb6e7aa27c9
2020-01-27 16:34:47 -08:00
812b1ad869 [quantization] FP16 dynamic quantized Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32331

Test Plan: Imported from OSS

Differential Revision: D19441158

Pulled By: jamesr66a

fbshipit-source-id: c04247ffe707be68718c486c31bc6c6040f7dc11
2020-01-27 15:45:32 -08:00
389b9c180b Updating submodules
Summary:
GitHub commits:

9ae8cbb0a1
986df37135
ef4d11b6e1

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 04e7a5ad02cb412ef36672ec30e10a898c525232
2020-01-27 14:43:34 -08:00
57519bd829 Revert "Fix iterator for ncclCommWatchdog. (#32571)" (#32649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32649

This reverts commit 59dbece3716775c3e6f3a428f73fbf1bde8fac4f.

Revert "Enhance NCCL watchdog to acitvely abort communicators for timed out ops. (#32338)"

This reverts commit f86d6c6afd0e981642d20b4269837334ec46c140.

Test Plan: Imported from OSS

Differential Revision: D19584224

Pulled By: ezyang

fbshipit-source-id: 6cc0ad56ba1f3aec5b48db44e8c6c24c8105db4a
2020-01-27 14:25:30 -08:00
897b6908d4 Kill THIntegerTensor, THDenseTensor, THDenseIndexTensor. (#32599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32599

These aren't used anymore.

Test Plan: Imported from OSS

Differential Revision: D19565655

Pulled By: gchanan

fbshipit-source-id: c0da31365df7342352f9850ae2a2e1e611a6886b
2020-01-27 13:26:31 -08:00
f6c46df856 Adding native qconcat
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32252

Test Plan: Imported from OSS

Differential Revision: D19422889

Pulled By: z-a-f

fbshipit-source-id: 23dd5f50009cc4c46b36c39ae1168b57f9a977a4
2020-01-27 11:24:46 -08:00
f0917dce7f Revert D19562258: [pytorch][PR] Fixes moving after weight norm application
Test Plan: revert-hammer

Differential Revision:
D19562258

Original commit changeset: 4fef006e32cd

fbshipit-source-id: 62e40de19331a61f4a65b7371460fe7dc28f23ea
2020-01-27 11:18:19 -08:00
64323ae177 Back out "Use simd version for fp16 conversions" (#32640)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32640

Original commit changeset: 3b1ee0ba756e

Reverting according to https://our.intern.facebook.com/intern/diff/D19291499/?transaction_id=1347995678706116&dest_fbid=465672071047258

Test Plan: unittests.

Reviewed By: jspark1105, jianyuh

Differential Revision: D19576708

fbshipit-source-id: bec92318523498067935234ab702c925ece71da6
2020-01-27 10:01:24 -08:00
e36cbb8f2f Fixes moving after weight norm application (#32563)
Summary:
This PR updates how RNNs handle their "flat weights." In particular, it allows for only some flat weights to be "materialized" when apply is called, and it updates the flattening behavior to only apply if all flat weights are (1) materialized, (2) share a dtype and (3) are acceptable to cuDNN.

One test is modified and another created to test these changes. One practical effect of this change is that weight norm can be successfully applied to a module BEFORE that module is moved to an accelerator. Previously doing so would throw an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32563
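A sketch of the pattern this change enables (module type and dimensions are arbitrary):

```python
import torch
from torch import nn
from torch.nn.utils import weight_norm

rnn = nn.LSTM(input_size=10, hidden_size=20)
rnn = weight_norm(rnn, name="weight_hh_l0")  # apply weight norm first...
if torch.cuda.is_available():
    rnn = rnn.cuda()  # ...then move; previously this ordering threw an error
```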

Differential Revision: D19562258

Pulled By: mruberry

fbshipit-source-id: 4fef006e32cdfd8e3e3d519fc2ab5fc203dd7b36
2020-01-27 09:57:43 -08:00
5ac2593d4f [ROCm] Adjust elementwise_kernel settings on ROCm (#32609)
Summary:
Recent PR https://github.com/pytorch/pytorch/issues/31974 and upcoming PR https://github.com/pytorch/pytorch/issues/32383 are changing the behavior of the elementwise_kernel infrastructure on CUDA.

In order to stay in sync, change the nd-loop behavior to match ROCm and CUDA for now. Once the full rework is done, the ROCm settings will likely diverge again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32609

Differential Revision: D19580121

Pulled By: ezyang

fbshipit-source-id: 4c8dcf6db3ac973e48ece6a665615cfe7d7cb764
2020-01-27 09:26:28 -08:00
ca9dc67094 0-dim batch size input for interpolate. (#32400)
Summary:
This PR adds support for 0-dim batch size input for `torch.nn.functional.interpolate` for various modes of interpolation.

Fixes part of gh-12013

CC: rgommers  ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32400
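A small example of the newly supported input (sizes are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.empty(0, 3, 8, 8)  # 0-dim batch size
out = F.interpolate(x, scale_factor=2, mode="nearest")
print(out.shape)  # torch.Size([0, 3, 16, 16])
```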

Differential Revision: D19557090

Pulled By: ezyang

fbshipit-source-id: 6822f148bb47bfbcacb5e03798bf2744f24a2a32
2020-01-27 09:24:46 -08:00
602394e996 verify input sizes for instance norm and group norm (#29082)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19250
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29082

Differential Revision: D19373507

Pulled By: ezyang

fbshipit-source-id: 231a79280f4cd7db2c26218a60869356a124bf77
2020-01-27 09:05:56 -08:00
19bb496a0d Enable mkldnn on windows (#31355)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15982.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31355

Differential Revision: D19428979

Pulled By: ezyang

fbshipit-source-id: bee304c5913e70e8dead3098e9796051861cd666
2020-01-27 09:00:02 -08:00
957a07ffbd [ROCm] Enable Caffe2 video operators for ROCm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32610

Differential Revision: D19580129

Pulled By: ezyang

fbshipit-source-id: 16d620173dcc231068e041d599aa09c94e677a9e
2020-01-27 08:29:07 -08:00
5b321a0985 [rpc] make handling of FORWARD_AUTOGRAD_REQ in request_callback_impl (#32476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32476

This makes the handling of FORWARD_AUTOGRAD_REQ in request_callback
nonblocking. Processing this message requires unwrapping the message with
autograd information, processing the original message, and sending back the
message with autograd information wrapped. This change makes processing the
original message nonblocking by grabbing a future to it and marking the parent
future as completed when this one completes.
ghstack-source-id: 97221251
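A minimal sketch of that chaining pattern, using `concurrent.futures` purely for illustration (`wrap_with_autograd` is a hypothetical stand-in for re-attaching the autograd information):

```python
from concurrent.futures import Future

def wrap_with_autograd(payload):
    # hypothetical stand-in for wrapping the response with autograd info
    return ("autograd", payload)

def process_wrapped(inner: Future) -> Future:
    parent = Future()
    # The parent future is marked complete only when the inner one
    # completes, so no thread blocks waiting on the original message.
    inner.add_done_callback(
        lambda f: parent.set_result(wrap_with_autograd(f.result())))
    return parent
```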

Test Plan: `test_rpc_spawn.py` and `test_dist_autograd_spawn.py` both pass.

Differential Revision: D19509501

fbshipit-source-id: 84ad2f9c5305ed11ed9bb0144b1aaf5f8698cd2b
2020-01-27 00:47:27 -08:00
1e5aead35b Make cuda search process of cpp extension quiet (#32620)
Summary:
Fixes https://discuss.pytorch.org/t/error-with-cpp-extentions/67559.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32620

Differential Revision: D19576164

Pulled By: soumith

fbshipit-source-id: 076229322375774bec03ef2632fc233000c15391
2020-01-26 20:26:43 -08:00
8fbe1ccd16 faster bailout tests (#32266)
Summary:
Reduces the overhead of `prim::BailOut` nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32266

Differential Revision: D19503336

Pulled By: Krovatkin

fbshipit-source-id: daa0c373f0fa17edd689600b75e7e4ba98b4670a
2020-01-26 19:44:00 -08:00
12d5933969 Bug fix of norm minimization for dev mode (#31462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31462

Fix the divide-by-zero issue in norm minimization in dev mode.

Test Plan: buck run mode/dev vision/video_modeling/classification/tools:test_octGloRe_quantization -- --test_data=/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_video/deep_vision_video_yufei_test_data_fcc_v4p2_10.csv --output_dir /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_video/octGloRe --load_model_path=/mnt/vol/gfsfblearner-oregon/flow/data/2019-10-15/e2681db8-e4f5-4b70-ae18-45bf0b8fbfbc/train_model_epoch0_inputcount0_final.mdl --dataset_name="FCC V4P2" --num_labels=1099 --column_handle="handle" --clip_per_video=1 --num_groups=24 --width_per_group=2 --batch_size=32 --histogram_file=/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/xray_video/octGloRe/hist_octGloRe_final_24x2_fcc_v4p2_1clip_f144586257_nullfix_100k_compiled.hist --int8_model_type="pb"  --int8_predict_net_path="reproduce_octGloRe_final_24x2_predict_net_int8_l2approx_wminmax_from_mdl.pb" --int8_init_net_path="reproduce_octGloRe_final_24x2_init_net_int8_l2approx_wminmax_from_mdl.pb" --weight_quant="l2_approx" --activation_quant="l2_approx"  --print_model --int8_model_saved --num_iter 10

Reviewed By: jspark1105

Differential Revision: D19172591

fbshipit-source-id: 994a20e3364b0dc33623a11281e0bdbc2e06159d
2020-01-26 12:44:14 -08:00
90a259e1e2 Add warning regarding pickle insecurity on torch.load documentation (#32593)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31875

Added a small warning box based on the one presented on the [pickle](https://docs.python.org/3/library/pickle.html) module regarding the safety issues of unpickling files. i.e., unwanted code execution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32593

Differential Revision: D19572292

Pulled By: ngimel

fbshipit-source-id: 69e7de390133ea77bddcadcd5b6820193c8abcc9
2020-01-25 22:12:37 -08:00
3bbb36e02d Update linspace types (#32218)
Summary:
Changes the linspace functions to be more consistent as requested in https://github.com/pytorch/pytorch/issues/31991. The code has also been updated to avoid an early rounding error; the line `scalar_t step = (scalar_end - scalar_start) / static_cast<scalar_t>(steps-1)` can result in `step = 0` for integer scalars, which gives unintended results. I examined the new output using
```
import torch

types = [torch.uint8, torch.int8, torch.short, torch.int, torch.long, torch.half, torch.float, torch.double]

print('Testing linspace:')
for type in types:
    print(type, torch.linspace(-2, 2, 10, dtype=type))
```
which returns
```
Testing linspace:
torch.uint8 tensor([254, 254, 254, 255, 255,   0,   0,   1,   1,   2], dtype=torch.uint8)
torch.int8 tensor([-2, -2, -2, -1, -1,  0,  0,  1,  1,  2], dtype=torch.int8)
torch.int16 tensor([-2, -2, -2, -1, -1,  0,  0,  1,  1,  2], dtype=torch.int16)
torch.int32 tensor([-2, -2, -2, -1, -1,  0,  0,  1,  1,  2], dtype=torch.int32)
torch.int64 tensor([-2, -2, -2, -1, -1,  0,  0,  1,  1,  2])
torch.float16 tensor([-2.0000, -1.5557, -1.1113, -0.6670, -0.2227,  0.2227,  0.6660,  1.1113,
         1.5547,  2.0000], dtype=torch.float16)
torch.float32 tensor([-2.0000, -1.5556, -1.1111, -0.6667, -0.2222,  0.2222,  0.6667,  1.1111,
         1.5556,  2.0000])
torch.float64 tensor([-2.0000, -1.5556, -1.1111, -0.6667, -0.2222,  0.2222,  0.6667,  1.1111,
         1.5556,  2.0000], dtype=torch.float64)
```
which is the expected output: `uint8` overflows as it should, and the result of casting from a floating point to an integer is correct.

This PR does not change the logspace function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32218

Differential Revision: D19544224

Pulled By: ngimel

fbshipit-source-id: 2bbf2b8552900eaef2dcc41b6464fc39bec22e0b
2020-01-25 20:23:54 -08:00
5fd037ce44 Fix MagmaInitializesCorrectly_CUDA by using an invertible matrix (#32547)
Summary:
This test case had been using the tensor

```
1  2  3  4
5  6  7  8
9  10 11 12
13 14 15 16
```

which is a singular (non-invertible) matrix and causes the test case to fail even if MAGMA is initialized just fine. This change uses a matrix that is invertible, whose inverse contains no elements close to zero, avoiding floating-point rounding errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32547

Differential Revision: D19572316

Pulled By: ngimel

fbshipit-source-id: 1baf3f8601b2ba69fdd6678d7a3d86772d01edbe
2020-01-25 20:00:54 -08:00
320d1a1573 Fix wrong typing (torch/nn/parameter.pyi) (#32617)
Summary:
A constructor of `nn.Parameter` has default values on `data` and `requires_grad`, but in type stub, there are no default values.

Resolve https://github.com/pytorch/pytorch/issues/32481
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32617
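A sketch of the corrected stub (the exact annotations in the `.pyi` file may differ slightly; treat this as illustrative):

```python
from torch import Tensor

class Parameter(Tensor):
    # defaults now mirror the runtime constructor, so `Parameter()` type-checks
    def __init__(self, data: Tensor = ..., requires_grad: bool = ...) -> None: ...
```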

Differential Revision: D19571397

Pulled By: ngimel

fbshipit-source-id: fd14298aa472b7575221229cecf5a56f8c84f531
2020-01-25 16:19:33 -08:00
69283388ca [pytorch] codegen flags to whitelist op registrations / generate to separate files (#32451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32451

This PR adds a few new parameters to ATen codegen script:

```
1. op_registration_whitelist
Can be used to filter op registrations for selective build;

2. type_whitelist
Can be used to filter types (CPUType, CUDAType, ...) for selective build;

3. per_op_registration
When set it will group function registrations by op name and write to separate files;
```

1 & 2 are introduced for mobile custom build without relying on static dispatch;
3 is introduced to solve custom build with multi-library / multi-model (needed by FB
internal build - see more details: https://fb.quip.com/ZVh1AgOKW8Vv).

These flags should work independently of each other (and independently of USE_STATIC_DISPATCH).

Not setting them should have no effect compared to master.
ghstack-source-id: 97214788

Test Plan: - tested all 3 params with FB internal build changes.

Differential Revision: D19427919

fbshipit-source-id: a381fe5f768fe2e9196563787f08eb9f18316e83
2020-01-25 15:27:29 -08:00
0afe195046 [pytorch] move type_derived_methods out of anonymous namespace (#32275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32275

Currently TypeDerived (e.g. `CPUType::`) methods are declared and
defined in anonymous namespace as they are only called from c10
dispatcher - except for STATIC_DISPATCH mode, where they can be directly
called from Functions.h.

We plan to generate c10 op registration into separate files for internal
xplat/BUCK build, thus we need declare these methods in non-anonymous
namespace.

I feel it's easier to simply change it unconditionally, unless there are
some side effects I'm not aware of - `TypeDefault::` methods are in a
non-anonymous namespace anyway.
ghstack-source-id: 97214789

Test Plan: - CI

Differential Revision: D19426692

Pulled By: ljk53

fbshipit-source-id: 44aebba15f5e88ef4acfb623844f61d735016959
2020-01-25 15:24:32 -08:00
bd20274e8f [caffe2] use JIT'ed fp32 SLS (#32413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32413

Use JIT'ed fp32 SLS in Caffe2 operators

Test Plan: CI

Reviewed By: jianyuh

Differential Revision: D19460555

fbshipit-source-id: 4f29d34523efb6ea1e4c324cc8c93c96990c6aad
2020-01-25 12:57:18 -08:00
6ad9e5c70d Support TorchScript call over remote API (RRef) (#32466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32466

It's a follow-up to https://github.com/pytorch/pytorch/pull/32197.

In https://github.com/pytorch/pytorch/pull/32197, `rpc.sync_rpc(..)` and `rpc.rpc_async(..)` gained support for taking a TorchScript-annotated Python function as the user function for RPC.

This PR extends in that direction by making `rpc.remote(..)` support taking a TorchScript-annotated Python function as well, as sketched below.

ghstack-source-id: 97211168
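A usage sketch, assuming the RPC framework has been initialized and a peer named "worker1" exists (the remote calls are left commented since they need a running group):

```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def scripted_add(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a + b

# rref = rpc.remote("worker1", scripted_add, args=(torch.ones(2), torch.ones(2)))
# print(rref.to_here())  # tensor([2., 2.])
```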

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_function_exception

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_function_exception
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork -- test_backward_simple_script_call

buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```

Differential Revision: D19440633

fbshipit-source-id: d37f6dcdc0b80d35ac7bcba46ad6f9b831c3779b
2020-01-25 02:18:27 -08:00
e0ffe72649 [aten] fix shadowing variable warning (#32573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32573

Fix the following warning
```
caffe2/aten/src/ATen/ParallelOpenMP.h:36:9: warning: declaration of ‘num_threads’ shadows a previous local [-Wshadow=compatible-local]
     int64_t num_threads = omp_get_num_threads();
         ^~~~~~~~~~~
caffe2/aten/src/ATen/ParallelOpenMP.h:29:9: note: shadowed declaration is here
   int64_t num_threads = omp_in_parallel() ? 1 : omp_get_max_threads();
         ^~~~~~~~~~~
```

Test Plan: CI

Reviewed By: ilia-cher

Differential Revision: D19552578

fbshipit-source-id: b8388de1aaa2bb7676b777c93b8ba9c25f5a3d51
2020-01-24 18:48:07 -08:00
169541871a Add operator support for dynamic quant on mobile (#32479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32479

Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented for the linear operator.
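For reference, a sketch of driving the eager-mode dynamic quantization API over linear layers (the model and sizes here are arbitrary, and whether the mobile path is exercised depends on the build):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)  # only nn.Linear is swapped
print(qmodel)
```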

Test Plan:
python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear

Imported from OSS

Differential Revision: D19542980

fbshipit-source-id: c9f6e5e8ded4d62ae0f2ed99e478c8307dde22ed
2020-01-24 17:51:54 -08:00
59dbece371 Fix iterator for ncclCommWatchdog. (#32571)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32571

The watchdog thread would erase an element and call `it--` (implicitly
relying on `it++` in the for loop to position correctly). However, `it--`
causes undefined behavior if the iterator points to begin(). As a
result, I've modified the logic to update the iterator appropriately.

I've also enhanced the watchdog thread to catch and log exceptions.
ghstack-source-id: 97150763

Test Plan: waitforbuildbot

Differential Revision: D19551365

fbshipit-source-id: 426835819ad8d467bccf5846b04d14442a342f78
2020-01-24 17:34:36 -08:00
1218a16aae [pytorch][refactor] Explicitly use auto* for pointers (#32548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32548

As Title says.
ghstack-source-id: 97175523

Test Plan: CI

Differential Revision: D19541893

fbshipit-source-id: 96dce6964e6a89393d4159401a59672f041f51d3
2020-01-24 17:20:38 -08:00
e7edc5f20e [jit] Cloning constants in ClassType (#32371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32371

After constants were added to ClassType, `clone` was not updated to
clone them; this PR adds that support.
Fixes: https://github.com/pytorch/pytorch/issues/32368

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D19564378

fbshipit-source-id: dbb13fb889d6ea9291034313b1f3c9aff4748bda
2020-01-24 16:48:38 -08:00
666472a38d [docs] Change fut.wait() to torch.jit._wait(fut) in jit overview docs (#32336)
Summary:
It looks like the JIT Future no longer has a `wait()` method, so running the code in the docs throws an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32336
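A minimal sketch of the corrected pattern, pairing `torch.jit._wait` with its private `torch.jit._fork` counterpart (the function here is arbitrary):

```python
import torch

@torch.jit.script
def add(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a + b

fut = torch.jit._fork(add, torch.ones(2), torch.ones(2))
out = torch.jit._wait(fut)  # instead of fut.wait()
```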

Differential Revision: D19559922

Pulled By: rohan-varma

fbshipit-source-id: a5aa67990595e98e0682a20cf5aced17c2ae85bb
2020-01-24 16:40:22 -08:00
6412ca3ce9 duplicate symbols with AT_PARALLEL_OPENMP=0 (#32568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32568

Explicitly disabling OpenMP actually caused it to be used, resulting in duplicate symbols.

Test Plan: CI passes

Reviewed By: ilia-cher

Differential Revision: D19549732

fbshipit-source-id: 767b92148f47a1450ded46e101cd3d9b331a5d40
2020-01-24 16:27:50 -08:00
91f10a1de1 [quant][graphmode][refactor] Better API for fold_convbn (#32380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32380

We'll clone the module first, then fold conv-bn, and return a new
module.

Test Plan:
.

Imported from OSS

Differential Revision: D19508033

fbshipit-source-id: 328e91a2c9420761c904a7f2b62dab4cfaaa31ac
2020-01-24 15:46:47 -08:00
52f8f031ac add diag into pt operator microbenchmark (#32597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32597

Currently, there is no benchmark test about diag operator. This diff will add one into the suite.

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim1_M64_N64_diagonal0_outTrue_cpu
# Input: dim: 1, M: 64, N: 64, diagonal: 0, out: True, device: cpu
Forward Execution Time (us) : 28.496

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim2_M128_N128_diagonal-10_outFalse_cpu
# Input: dim: 2, M: 128, N: 128, diagonal: -10, out: False, device: cpu
Forward Execution Time (us) : 45.179

# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim1_M256_N256_diagonal20_outTrue_cpu
# Input: dim: 1, M: 256, N: 256, diagonal: 20, out: True, device: cpu
Forward Execution Time (us) : 49.009
```

Reviewed By: mingzhe09088

Differential Revision: D19564024

fbshipit-source-id: 828a3e0e0e06810a77eb5ddb734efd30e4a63acf
2020-01-24 15:41:04 -08:00
9e0ce72e9e [pytorch] change op dependency output to use double-quoted strings (#32464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32464

Changed to double quoted strings to make FB linter happy.

Test Plan: Imported from OSS

Differential Revision: D19507859

Pulled By: ljk53

fbshipit-source-id: fa70535c7fbea73214b3b0efb0532184b5ee6854
2020-01-24 15:27:28 -08:00
2bfd33b4ab [refactor] Adding FoldConvBatchNorm2dHelper (#32374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32374

Moving all conv-bn folding code into a class, to prepare for making
it work with shared ClassTypes. A sketch of the underlying folding arithmetic follows.
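For context, the standard conv-bn folding arithmetic this helper implements, as a plain-tensor illustration (not the JIT pass itself; names are illustrative):

```python
import torch

def fold_conv_bn(w, b, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    # BN(conv(x)) == conv'(x) with rescaled weights and shifted bias:
    #   w' = w * gamma / sqrt(var + eps)
    #   b' = (b - mean) * gamma / sqrt(var + eps) + beta
    scale = bn_gamma / torch.sqrt(bn_var + eps)
    return w * scale.reshape(-1, 1, 1, 1), (b - bn_mean) * scale + bn_beta
```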

Test Plan:
compiles

Imported from OSS

Differential Revision: D19508032

fbshipit-source-id: 4e9cf714111305d2b5474d4506507078f69f0c84
2020-01-24 14:41:20 -08:00
573a30270c [pytorch] Minor: boilerplate to propagate errors in request_callback_impl (#32556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32556

Out of caution, avoid assuming that there's never a failure in a couple of
request_callback_impl case handlers, and instead propagate the error.
ghstack-source-id: 97128697

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D19544685

fbshipit-source-id: 67c55626960bd42a5b0dec7841e8ba44ab059eb9
2020-01-24 14:37:33 -08:00
3ab30753e9 Make autogen functions correct for multiple outputs and views (#31990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31990

This PR does three things:
- Add a new `allow_rebase_history` flag to the differentiable views. If set, trying to rebase their history will raise an error.
- Make sure that the codegen functions verify this flag before doing inplace operations so that they fail before doing the inplace modification.
- Make sure the codegen functions set this flag properly when we don't support rebasing the history of the output.

The codegen change can be found [here](4bf180caa0).

Test Plan: Imported from OSS

Differential Revision: D19409649

Pulled By: albanD

fbshipit-source-id: a2b41c2d231e952ecfe162bdb6bad620ac595703
2020-01-24 14:32:28 -08:00
9e59244b53 fix view listing in autograd codegen (#32044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32044

Fix the list of views in the codegen:
- Move `narrow` out of the autograd functions since it's now implemented with slice.
- Add `split_with_sizes` that was missing from the list
- Remove special formulas for both `split` and `split_with_sizes`. Neither used to be considered a view. When they are, all the RNN code breaks because it uses them in an invalid way. The generic formula will generate one `narrow` Node for each output, which is always valid.

The diff for the generated code can be found [here](https://github.com/pytorch/pytorch/compare/16eff6e...albanD:06d6e85) (outdated for last commit)

Test Plan: Imported from OSS

Differential Revision: D19409648

Pulled By: albanD

fbshipit-source-id: 5ebc4c978af500403f7f008c0231b7db0cabab26
2020-01-24 14:31:21 -08:00
d2bda53f6d [quant][graphmode] Call _jit_pass_dedup_module_ueses in quantize_script (#32303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32303

att

Test Plan:
.

Imported from OSS

Differential Revision: D19508029

fbshipit-source-id: 468ed53fc8bb3c8fdf5d79aea186949e64be711a
2020-01-24 13:34:40 -08:00
fe3eb09da5 [quant] Re-enable fold_convbn in quantize_script (#32302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32302

att

Test Plan:
.

Imported from OSS

Differential Revision: D19508035

fbshipit-source-id: 2ac26585396ec8a115acd0e1d7ccb84098a76824
2020-01-24 13:03:53 -08:00
fd1a4f18ee [pytorch] update code analyzer build.sh to handle srcs with same name (#32525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32525

Before calling the static code analyzer we need to link all bitcode files
into a single module. The current approach is a bit hacky: cmake still calls
"ar" to pack bitcode files into archives, then we manually unpack these
archives and call llvm-link.

Turns out libtorch_cpu.a contains a few files with same name, e.g.:
```
aten/src/ATen/native/SoftMax.cpp
aten/src/ATen/native/mkldnn/SoftMax.cpp
```

"ar x" will only keep one of them and cause inaccurate analysis result.

Use this temporary hack to workaround the problem. Ideally should merge
this step into cmake (e.g. directly calling llvm-link to produce target
output?).

Differential Revision: D19530533

Pulled By: ljk53

fbshipit-source-id: 94b292c241abaaa0ff4a23059882abdc3522971e
2020-01-24 12:37:30 -08:00
ef5637f85e [jit] allow compilation using optional modules (#32539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32539

Before: if something in `_modules` was `None`, we would barf. This is
incorrect because it's allowed for users to put `None` there, in case a
module is optional.

This case ought to be handled correctly during scripting. Fixes https://github.com/pytorch/pytorch/issues/32469
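A hedged sketch of the now-accepted pattern (names are illustrative, and the scripting call is commented so the snippet runs standalone):

```python
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self, use_head: bool):
        super().__init__()
        # an optional submodule: None is a legal entry in _modules
        self.head = nn.Linear(4, 4) if use_head else None

    def forward(self, x):
        if self.head is not None:
            x = self.head(x)
        return x

# scripting with the None entry used to barf; now it should compile:
# torch.jit.script(Net(use_head=False))
```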

Test Plan: Imported from OSS

Differential Revision: D19552346

Pulled By: suo

fbshipit-source-id: aba7fdc19fd84d195c81cdaca8a75013a8626a8b
2020-01-24 11:51:47 -08:00
7d0f0b62de API for testing bailouts (#32518)
Summary:
This API seems to be quite useful to make sure all bailouts in a graph are triggered. I used it for testing torchvision models and I was wondering if this might be something we might actually want to have? zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32518

Differential Revision: D19553147

Pulled By: Krovatkin

fbshipit-source-id: 7542c99051588b622091aec6d041c70731ca5d26
2020-01-24 11:19:41 -08:00
f0c85571ed docker: Refactor Dockerfile process for official images (#32515)
Summary:
## Commit Message:

Refactors Dockerfile to be as parallel as possible with caching and adds a new Makefile to build said Dockerfile.

Also updated the README.md to reflect the changes as well as updated some of the verbage around running our latest Docker images.

Adds the new Dockerfile process to our CircleCI workflows

## How to build:

Building the new images is pretty simple, just requires `docker` > 18.06 since the new build process relies on `buildkit` caching and multi-stage build resolving.

### Development images
For `runtime` images:
```
make -f docker.Makefile runtime-image
```

For `devel` images:
```
make -f docker.Makefile devel-image
```

Builds are tagged as follows:
```bash
docker.io/${docker_user:-whoami}/pytorch:$(git describe --tags)-${image_type}
```

Example:
```
docker.io/seemethere/pytorch:v1.4.0a0-2225-g9eba97b61d-runtime
```

### Official images

Official images are the ones hosted on [`docker.io/pytorch/pytorch`](https://hub.docker.com/r/pytorch/pytorch)

To build official images you can simply set the `BUILD_TYPE` variable to `official` and it will do the correct build without building the local binaries:

Example:
```
make -f docker.Makefile BUILD_TYPE=official runtime-image
```

## How to push:

Pushing is also super simple (and will automatically tag the right thing based on the git tag):

```
make -f docker.Makefile runtime-push
make -f docker.Makefile devel-push
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32515

Differential Revision: D19558619

Pulled By: seemethere

fbshipit-source-id: a06b25cd39ae9890751a60f8f36739ad6ab9ac99
2020-01-24 10:27:20 -08:00
8fd3eaed25 [jit] Fix dict type serialization (#32569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32569

If the dict's contained types cannot be inferred from its contents (for
example, `Dict[str, Tensor]` vs. `Dict[str, Optional[Tensor]]`), we must
explicitly annotate the type.

Also this removes some special handling that omits annotations on empty
containers that have the default type. It makes the code more complex
for not too much value, and was wrong for dicts anyway.
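An illustration of a case where the contained type cannot be inferred from the contents alone, so the serialized form must carry an explicit annotation (the function is illustrative):

```python
from typing import Dict, Optional
import torch

@torch.jit.script
def make() -> Dict[str, Optional[torch.Tensor]]:
    # from the contents alone this looks like Dict[str, Tensor]
    d = torch.jit.annotate(Dict[str, Optional[torch.Tensor]],
                           {"x": torch.ones(1)})
    return d
```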

Test Plan: Imported from OSS

Differential Revision: D19551016

Pulled By: suo

fbshipit-source-id: c529b112e72c10f509a6bc0f5876644caa1be967
2020-01-24 03:19:55 -08:00
3ada2e0d64 [pytorch][embeddingbag] Parallelize the EmbeddingBag operator (#4049)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4049

Pull Request resolved: https://github.com/pytorch/pytorch/pull/27477

We would like to add the intra-op parallelization support for the EmbeddingBag operator.

This should bring speedup for the DLRM benchmark:
https://github.com/pytorch/pytorch/pull/24385

Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals

import torch
import time

eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')

input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)

niter = 10000
s = time.time()
for _ in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```

The following results are single core on Skylake T6:
- Before our change (with the original caffe2::EmbeddingLookup)
time_per_iter 6.313693523406982e-05
GB/s 6.341517821789133

- After our change using the EmbeddingLookupIdx API which takes the offsets instead of lengths.
time_per_iter 5.7627105712890626e-05
GB/s 6.947841559053659

- With Intel's PR: https://github.com/pytorch/pytorch/pull/24385
time_per_iter 7.393271923065185e-05
GB/s 5.415518381664018

For multi-core performance, because Clang doesn't work with OMP, I can only see the single-core performance on SKL T6.
ghstack-source-id: 97124557

Test Plan:
With D16990830:
```
buck run mode/dev //caffe2/caffe2/perfkernels:embedding_bench
```

With D17750961:
```
buck run mode/opt //experimental/jianyuhuang/embeddingbag:eb
buck run mode/opt-lto //experimental/jianyuhuang/embeddingbag:eb
```

OSS test
```
python run_test.py -i nn -- TestNNDeviceTypeCPU.test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu
```

Buck test
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"

OMP_NUM_THREADS=3 buck test mode/opt -c pytorch.parallel_backend=tbb //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets"  --print-passing-details
```

Generate the AVX2 code for embedding_lookup_idx_avx2.cc:
```
python hp_emblookup_codegen.py --use-offsets
```

Differential Revision: D17768404

fbshipit-source-id: 8dcd15a62d75b737fa97e0eff17f347052675700
2020-01-23 21:29:44 -08:00
b474c351dd [rpc] Remove template on RRef and add Type to RRef creation (#30630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30630

This remove template and all the specializations it have in rpc, we
universally use IValue as the inner value since we support making python
object to be hold inside IValue.

This will also ensure that we have the correct type information when
creating the RRef, we use the return type from the schema when creating
userRRef and OwnerRRef, it will enable IValue to always have the correct
type if the IValue is the RRef object (next PR)

Test Plan: Imported from OSS

Differential Revision: D19502235

fbshipit-source-id: 0d5decae8a9767e0893f3b8b6456b231653be3c5
2020-01-23 21:15:46 -08:00
ef2d4e67d1 Updating submodules
Summary:
GitHub commits:

08e28edc08
6884ecfc67
685144514f
ed665880aa

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 7b19dca06ad7e8751de21efc48f5eada37b446fb
2020-01-23 21:09:43 -08:00
6f146e1768 [JIT] Remove capsule type handling of node hashing (#32540)
Summary:
Capsule Type doesn't appear in the IR; it is purely used at runtime. So we should not have to handle it in node hashing... Let's see if this breaks anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32540

Differential Revision: D19541357

Pulled By: eellison

fbshipit-source-id: 905ed9f89cf6d03b45ddb4fde02adfa149b477f8
2020-01-23 17:44:28 -08:00
d2f66083c5 porting gather to ATen using TensorIterator with multithreading support. (#32425)
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/24702](https://github.com/pytorch/pytorch/issues/24702).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32425

Differential Revision: D19538265

Pulled By: ngimel

fbshipit-source-id: 78821a16b6948916e956a04f984e0956f86cf582
2020-01-23 16:14:47 -08:00
4cd6b5cda6 [quant] Re-enable test_nested that has different qconfig for shared ClassType (#32206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32206

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D19508028

fbshipit-source-id: 5de3c2ef17de146feca03d7135a7e04f393de398
2020-01-23 15:32:57 -08:00
6745bfc31c Revert "Remove __torch__ from custom class qualname" (#32514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32514

This reverts commit c7fdf5b251c6fecd5d78b4f33d30bd77ca3f841c.

Test Plan: Imported from OSS

Differential Revision: D19525532

Pulled By: jamesr66a

fbshipit-source-id: 126f4e87250a2ac739bd7aa161a0f7b39f143d38
2020-01-23 14:56:25 -08:00
8ed1dd528e [JIT] Add torch.classes.load_library
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32508

Test Plan: Imported from OSS

Differential Revision: D19525175

Pulled By: jamesr66a

fbshipit-source-id: b9f07113f551bdfb56d49d24d12989be2b8fc7e4
2020-01-23 14:56:20 -08:00
69f9bf8893 [JIT] Support returning tuple from custom bound C++ method
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32477

Test Plan: Imported from OSS

Differential Revision: D19509927

Pulled By: jamesr66a

fbshipit-source-id: 7d407150402cc19344c3ec3b4a27b3d7c464e8ac
2020-01-23 14:56:15 -08:00
ae42e232ce [JIT] Fix custom class method binding for const methods
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32471

Test Plan: Imported from OSS

Differential Revision: D19508249

Pulled By: jamesr66a

fbshipit-source-id: 3a0bce6845072bb03567049a73b9982b54d8daf9
2020-01-23 14:56:11 -08:00
7e14c420ae [JIT] Test __getstate__ and __setstate__ for custom bound C++ classes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32470

Test Plan: Imported from OSS

Differential Revision: D19508250

Pulled By: jamesr66a

fbshipit-source-id: 481299fb3c18fa874c2a1d2993984bb6b3193bac
2020-01-23 14:56:06 -08:00
dbd29e5668 [JIT] Passing custom class as arg (#32260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32260

This makes it so you can actually pass the custom class as an arg to ScriptFunctions

Test Plan: Imported from OSS

Differential Revision: D19424252

Pulled By: jamesr66a

fbshipit-source-id: c3530186619655781dedbea03c2ad321aaff1cb8
2020-01-23 14:54:59 -08:00
ad4fba0ce4 Only run test_conv_large and test_conv_transposed_large_cuda on 32GB device (#32473)
Summary:
For some reason, these two tests started to fail on 16GB Volta on Linux...

Also fixes https://github.com/pytorch/pytorch/issues/31650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32473

Differential Revision: D19538314

Pulled By: ngimel

fbshipit-source-id: 266195f19d8cf76b035795e0e318c152ae72adc2
2020-01-23 14:50:24 -08:00
49cd83d735 no more build_pytorch_libs.sh/.bat (#32319)
Summary:
https://github.com/pytorch/pytorch/issues/12918
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32319

Differential Revision: D19544272

Pulled By: soumith

fbshipit-source-id: dd32fa61efa78af908f21c7e54cb6484bf895e54
2020-01-23 14:45:54 -08:00
d234626267 [quant][graphmode] Support quantizing shared ClassType with different qconfigs (#32205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32205

to be filled

Test Plan:
python test_jit.py

Imported from OSS

Differential Revision: D19508031

fbshipit-source-id: cbf03d34e52eae62595c34fde6ec645cb6744ad9
2020-01-23 14:32:55 -08:00
ef94496b36 [JIT] throw if no self arg on ignored methods (#32503)
Summary:
A user defined an ignored method without a `self` argument, and it would segfault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32503

Differential Revision: D19538481

Pulled By: eellison

fbshipit-source-id: dc3752028b9eff6ac88c025e8a2b5f8fd44ce32f
2020-01-23 14:27:00 -08:00
db02a4e4ce Support 3D attention mask in MultiheadAttention. (#31996)
Summary:
Support a 3D attention mask for MultiheadAttention. If `attn_mask` has the batch dimension, it will not be unsqueezed. Fix https://github.com/pytorch/pytorch/issues/30678
Relevant issues/pr:
https://github.com/pytorch/pytorch/pull/25359
https://github.com/pytorch/pytorch/issues/29520
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31996
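A usage sketch of the batched mask (shapes assumed: a 3D mask is per-head, i.e. `(batch * num_heads, L, S)`, used here as an additive float mask of zeros):

```python
import torch
from torch.nn import MultiheadAttention

mha = MultiheadAttention(embed_dim=8, num_heads=2)
q = k = v = torch.randn(4, 2, 8)   # (seq_len, batch, embed_dim)
mask = torch.zeros(2 * 2, 4, 4)    # (batch * num_heads, L, S), not unsqueezed
out, weights = mha(q, k, v, attn_mask=mask)
```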

Differential Revision: D19332816

Pulled By: zhangguanheng66

fbshipit-source-id: 3448af4b219607af60e02655affe59997ad212d9
2020-01-23 13:16:48 -08:00
b6b8620871 Add unit test on export_opnames with interface. (#31531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31531

As suggested by suo, add a unit test on torch.jit.export_opnames with an interface. A submodule is annotated as an interface and assigned to an instance, then re-assigned to another instance. Make sure the operator names are also updated.

Test Plan: Imported from OSS

Differential Revision: D19539129

Pulled By: iseeyuan

fbshipit-source-id: 71a76ae7790cdd577618ca278afdb132727f08dc
2020-01-23 12:27:22 -08:00
9af5a97b1d Fix nll_loss to support empty tensors on GPU (#31491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31491

Fixes #31472

Test Plan: Imported from OSS

Differential Revision: D19537231

Pulled By: pbelevich

fbshipit-source-id: 20a43251a0f68a7a3557dd8234daee2d4814e5dd
2020-01-23 11:45:59 -08:00
583bb97618 [quant][graphmode] Default to non-inplace in graph mode quantization API (#32204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32204

att

Test Plan:
.

Imported from OSS

Differential Revision: D19508030

fbshipit-source-id: 94814c3c126a196f3938f944abfa5ae2a24d8dde
2020-01-23 10:39:46 -08:00
ea7bebb7fe [PyTorch BC] Clean up the whitelist for PyTorch Op BC check (#32523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32523

remove stale items

Test Plan: cont build

Reviewed By: hl475

Differential Revision: D19526918

fbshipit-source-id: ee7392ae84e5ddf88284020775119e59c9b6533e
2020-01-23 09:25:37 -08:00
02aa3ba331 Raise error for code that risk deadlock (#32295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32295

Fix for https://github.com/pytorch/pytorch/issues/32045

Calling into the engine with the GIL can deadlock because:
- worker thread initialization acquires the GIL
- Any Node / hook can be a python function that will acquire the GIL

The choice was made here to raise an error, as one of the advantages of using cpp extensions with Python is being able to release the GIL. So we prefer to educate users to do it rather than doing it under the hood.

Test Plan: Imported from OSS

Differential Revision: D19430979

Pulled By: albanD

fbshipit-source-id: e43f57631885f12e573da0fc569c03a943cec519
2020-01-23 08:53:59 -08:00
21d475e20d [gloo] Skip registry warning (#31126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31126

The Gloo device creator registry throws a warning that confuses users - https://fb.workplace.com/groups/1405155842844877/permalink/3217491788277931/
Create the C10_DEFINE_SHARED_REGISTRY_WITHOUT_WARNING API to skip this warning.

Test Plan:
{F224342749}

Tested both `C10_DEFINE_SHARED_REGISTRY` and `C10_DEFINE_SHARED_REGISTRY_WITHOUT_WARNING`.
Make sure nothing breaks

Reviewed By: d4l3k

Differential Revision: D18904783

fbshipit-source-id: 0e0065d530956249a18325d4ed3cb58dec255d4c
2020-01-22 22:46:27 -08:00
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe2/test for better
management of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00
e735395fc6 [caffe2] use 2-stage EmbeddingSpMDM interface (#32271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32271

Use the 2-stage EmbeddingSpMDM interface in D19425982 to reduce the overhead of code cache lookup and lock contention.
Fix an issue in sparse_lengths_sum_benchmarks that generated empty indices when the average length is small, like 1.

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D19425987

fbshipit-source-id: d5c5f0d46e0072403901809c31d516fa0f4b9b31
2020-01-22 19:05:36 -08:00
685f090ac8 [Rowwise Pruning][c2 op] Add Quantile Op (#32448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32448

Using binary search to compute the value for the given quantile among the input tensors.
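A sketch of the binary-search idea (function name and iteration count are illustrative, not the C2 op itself):

```python
import torch

def quantile_bisect(values: torch.Tensor, q: float, iters: int = 64) -> float:
    # find t such that roughly a q-fraction of the elements are <= t
    lo, hi = float(values.min()), float(values.max())
    n = values.numel()
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if (values <= mid).sum().item() / n < q:
            lo = mid
        else:
            hi = mid
    return hi
```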

Test Plan: Newly added unittests;

Reviewed By: jspark1105

Differential Revision: D19487604

fbshipit-source-id: 0dc6627b78d1310ac35b3f1d53b89cc89a697ece
2020-01-22 16:59:56 -08:00
4bdfc71421 Fix race condition for to() backward that spans devices (#31930)
Summary:
While putting finishing touches on the gradient scaling PR (https://github.com/pytorch/pytorch/pull/26512), I discovered my multi-GPU test (which uses `to()` to transfer tensors between devices) was intermittently failing with bad numerics.  I knew it was going to be [a weird case from the start](https://www.imdb.com/title/tt8946378/quotes/qt4868203) and spent a week descending into madness.  It turns out, for backward ops that create gradients on a different device from the device on whose stream the op is executed, the streaming backward synchronizations in [input_buffer.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/input_buffer.cpp#L46-L83) do not properly tell later ops to wait on the population/creation of those gradients.  For example, a cross-device `to()` backward (CopyBackward Node) enqueues a cudaMemcpyAsync on the current stream of the source (incoming gradient's) device, then [syncs getCurrentCUDAStream on the destination device with the cudaMemcpyAsync](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/Copy.cu#L76).  However, `input_buffer.cpp` in such cases ([case (3)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/input_buffer.cpp#L77-L81)) was not properly telling `opt_consumer_stream` to wait on the current stream of the destination device (`var`'s device).

Circumstances needed to repro in current master (see [my test](https://github.com/pytorch/pytorch/compare/master...mcarilli:backward_to_race_fix#diff-e68a7bc6ba14f212e5e7eb3727394b40R1901)):
- 2 devices, with non-default streams used for forward-pass ops on both devices (which is the default behavior in test_cuda.py)
- A `to()` that transfers a tensor requiring grad from one device to another
- A backward pass that routes back through to()'s backward (aka CopyBackward).

Under these circumstances, backward ops following CopyBackward on CopyBackward's destination device (aka the original forward-pass source device) race with the device-to-device transfer, and execute using partially-transferred data.
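A condensed repro sketch of those circumstances (two GPUs required; the non-default-stream setup from test_cuda.py is elided here):

```python
import torch

a = torch.randn(8, device="cuda:0", requires_grad=True)
b = a.to("cuda:1")        # cross-device to(): backward is CopyBackward
loss = (b * 2.0).sum()
loss.backward()           # later ops raced with the D2D copy before this fix
```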

The present PR fixes the race condition and ensures that later ops wait on the CopyBackward transfer.  This PR should also make streaming backward safe for other backward ops that span devices, as long as they play nice and populate any new gradients they create using the "current stream" of the device(s) on which they create those gradients.

There are a couple minor issues where I'm not sure of the best approach:
- Should we guard onto the var's device for the entire body of InputBuffer::add?
- I'm fairly sure we need to `recordStream` on `var` if the consumer stream is different from the stream on which (we expect) `var` was created, but calling `c10::cuda::CUDACachingAllocator::recordStream` in input_buffer.cpp might break CPU-only builds.  I couldn't find a different API call to record streams that seemed CPU-build-agnostic.  Could I wrap the call with a macro?

Thanks to mruberry for helpful suggestions and also the organization/naming of the stream pool and streaming backward code that allowed me to (just barely) wrap my head around the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31930

Differential Revision: D19517617

Pulled By: mruberry

fbshipit-source-id: 183d5460aefa5d27366b465b0473b80ec80fa044
2020-01-22 16:32:24 -08:00
193ac31441 [jit] Enable IValue to hold a PyObject (#32491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32491

This PR enables IValue to hold a pure PyObject by adding a new enum tag
and a new jit_type to denote the existence of PyObject in IValue and the
JIT type system. We don't, and do not plan to, expose this to users.

This is the basic piece that enables IValue to be adopted more broadly,
e.g. making RRef always hold an IValue; it might also simplify some
compiler logic.
ghstack-source-id: 97039980

Test Plan: Imported from OSS

Differential Revision: D19502234

fbshipit-source-id: 90be001706d707d376cfbea25980fd82980df84a
2020-01-22 15:48:32 -08:00
556c0b063d Updating submodules
Summary:
GitHub commits:

87b81e7cb2
3a9a0976f2
9294f3b2fa
c8addc5ad4
9a9f1a849a
27cb280170

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 73beec64bf9c17fa6c42dd09ea85350e8c9c66ea
2020-01-22 15:30:31 -08:00
14e0bec9f2 [caffe2] remove unnecessary np.set_printoptions and fix test errors (#32475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32475

As title

Test Plan: CI

Reviewed By: houseroad

Differential Revision: D19508778

fbshipit-source-id: fd9ad63607535980505d155f3e3c3b7c6b95daf7
2020-01-22 14:49:47 -08:00
faffd2141a Corrected logical boolean expression (#32249)
Summary:
Changed bitwise & to logical && in the boolean expression.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32249

Differential Revision: D19501586

Pulled By: eellison

fbshipit-source-id: afe374cfc9661182703cc82810d9cb735fbb8180
2020-01-22 13:54:16 -08:00
43eb931c0f Remove mis-exposed abort API on ProcessGroup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32292

Test Plan: Imported from OSS

Differential Revision: D19430252

Pulled By: mrshenli

fbshipit-source-id: 4ec594e1be54afe774bdcecc0f1c9bda2edf5e0d
2020-01-22 12:51:20 -08:00
b7c6277c53 Adding QConfigTypePtrMap (#32203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32203

The type is needed for allowing multiple qconfig configurations for shared
ClassType, see next PR for more details

Test Plan:
.

Imported from OSS

Differential Revision: D19508027

fbshipit-source-id: a3df29dab3038bfa88c55dda98a3e8a78e99e5a1
2020-01-22 12:40:12 -08:00
38d122eca9 implement tuple constants (#31841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31841

Add tuple constants to JIT. The constraint here is that all elements of a tuple must themselves be insertable as constants. Previously tuples were special-cased in constant propagation, but now that there are more passes that insert constants, such as freezing, we should just have tuples be representable as constants.

Test Plan: Imported from OSS

Differential Revision: D19439514

Pulled By: eellison

fbshipit-source-id: 3810ba08ee349fa5598f4b53ea64525996637b1a
2020-01-22 12:13:31 -08:00
69492ad6ac remove tuple logic in constant propagation (#31840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31840

The next PR in this stack makes tuples insertable as constants, so we can remove special handling of tuples in constant propagation.

Test Plan: Imported from OSS

Differential Revision: D19439515

Pulled By: eellison

fbshipit-source-id: c58f153157f1d4eee4c1242decc4f36e41c1aa05
2020-01-22 12:13:26 -08:00
b01d824a78 improve mayContainAlias (#31839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31839

There are a number of improvements that can be made to `mayContainAlias`, which I would like to do in follow-ups. For now, this is an easy one.

Test Plan: Imported from OSS

Differential Revision: D19439516

Pulled By: eellison

fbshipit-source-id: 0042fb7eaae6cfb4916bf95dc38280517a4bd987
2020-01-22 12:13:20 -08:00
adf0916606 Add str[] float[] constants resubmit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31791

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D19439513

Pulled By: eellison

fbshipit-source-id: a04c7401687b051f0d4fb4794963931ebe004194
2020-01-22 12:11:58 -08:00
e184a8843c Fix comparisions for ConcreteModuleType (#32256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32256

Previously, two unrelated modules loaded from torch.jit.load would compare
equal because we only considered their data_ attributes, which are
initialized blank in torch.jit.load. This changes ConcreteModuleType to
distinguish between the data_ attribute being blank and it being empty.

This replaces the poisoned logic.
ghstack-source-id: 96755797

Test Plan: oss

Differential Revision: D19423055

fbshipit-source-id: 79d6a50a3731c6eeb8466ba2a93702b49264bba0
2020-01-22 11:59:38 -08:00
8e689378c7 Move some of the helper functions for public use (#32202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32202

Move some helper functions in ModuleUseDeduper for public use

Test Plan:
.

Imported from OSS

Differential Revision: D19508034

fbshipit-source-id: 2e8e05eff6f3bbcfe6936598371e4afa72f9b11f
2020-01-22 11:35:37 -08:00
510a122d27 add missing align_corners annotation (#32492)
Summary:
Adds the missing align_corners annotation in the grid_sample and affine_grid functionals.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32492

Differential Revision: D19516550

Pulled By: ezyang

fbshipit-source-id: 064c8c99bf6eae6744237c0b151b3ce4c82ada96
2020-01-22 11:29:07 -08:00
1c017f0c14 Migrate max and min (binary) from TH to ATen. (#30851)
Summary:
TH implementation will be removed after the unary max and min are
migrated.

Benchmark: (Debian 10, Release build, gcc 7.4, no turbo)

```python
import timeit
for device in ('cpu', 'cuda'):
    print(f'device: {device}')
    for op in ('max', 'min'):
        for dtype in ('torch.double', 'torch.float', 'torch.int16',
                      'torch.int32', 'torch.int64'):
            for n, t in [(10_000, 200000),
                         (100_000, 20000)]:
                print(f'torch.{op}(a, b), numel() == {n} for {t} times, dtype={dtype}')
                print(timeit.timeit(
                    f'torch.{op}(a, b)'
                    + (';torch.cuda.synchronize()' if device == 'cuda' else ''),
                    setup=(f'import torch; '
                           f'a = torch.arange({n}, dtype={dtype}, device="{device}"); '
                           f'b = torch.ones({n}, dtype={dtype}, device="{device}") * ({n} // 2)'),
                    number=t))
    print()
```

Before:

```
device: cpu
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.241763713000182
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.7138833169992722
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.2183356810000987
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.7031846980007685
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7704679510006827
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.289198366999699
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.7937613740014058
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2930124340000475
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8032857640009752
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.2908709189996443
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
1.8829010000008566
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.2994690759987861
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
1.8037853410005482
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.2929310759991495
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.8075240359994496
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2932477679987642
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.7868400779989315
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2885970789993735
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8389664830010588
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.29402057399966

device: cuda
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
4.787109836999662
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.842438002999188
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.429616614999759
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.835390076999829
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.940423873000327
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4108991760003846
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.9318018840003788
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4168134739993548
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9610764919998473
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4189234130008117
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.960172712999338
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.4162539499993727
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.8985912560001452
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.4113489299998037
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.9160250799995993
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4128787690005993
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.8806865219994506
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4086357010000938
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9362181240012433
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4151225870009512

```

After:

```
device: cpu
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.2685823729998447
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.72004808300062
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.212242640000113
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.7089235590001408
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7767087259999244
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2916517639996528
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.8265984959998605
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.3002885240002797
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8084679720004715
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.3012119999993956
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
1.8800218449996464
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.3060645710002063
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.4905043950002437
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.9126290209997023
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7972335520007618
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2918074379995232
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.8047651860006226
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2992197730000044
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8526509560006161
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.3030709570002728

device: cuda
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
4.700986622000528
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.8415469050005413
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.3051693249999516
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.8321999460004008
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.8086475109994353
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.405110773999695
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.913458047999484
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4236377289998927
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9386842409994642
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4230227469997772
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
3.0341797270002644
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.4289592409995748
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.6091147850002017
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
2.036691903999781
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.8256167649997224
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4078955400000268
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.8631781489993955
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4210130069996012
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
3.0112479260005784
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4297719679998409

```

Partly solves https://github.com/pytorch/pytorch/issues/24594 and #24595

Close https://github.com/pytorch/pytorch/issues/25016

Continuing https://github.com/pytorch/pytorch/issues/27185
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30851

Differential Revision: D19515694

Pulled By: ezyang

fbshipit-source-id: 1764897f912d6ae24b0c361f19a1aacf96e0826e
2020-01-22 09:03:18 -08:00
b77c25dec0 Fix dll load logic for Python 3.8 on Windows (#32215)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31181 and https://github.com/pytorch/pytorch/pull/31162#discussion_r362495611.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32215

Differential Revision: D19501869

Pulled By: ezyang

fbshipit-source-id: 363824e52d2592ad968ecf1df345aa4c0daff915
2020-01-22 08:33:34 -08:00
c342c354a9 Put sparse all reduce results to input tensors (#32226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32226

Right now, if users call torch.dist.all_reduce() on dense tensors, the outputs are put in the input tensors. But if users call torch.dist.all_reduce() on sparse tensors, the outputs are neither returned explicitly to users nor put in the input tensors.

To make the torch.dist.all_reduce() API behave the same on both dense and sparse tensors, this diff makes torch.dist.all_reduce() on sparse tensors put the output in the input tensors as well. This is achieved by simply calling input_sparse.copy_(output_sparse); see PR https://github.com/pytorch/pytorch/pull/9005, which implemented copy_ for sparse tensors.
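
A minimal sketch of the in-place semantics this relies on, using a locally computed stand-in for the all-reduced result instead of a real process group:

```python
import torch

i = torch.tensor([[0, 2]])
v = torch.tensor([1.0, 2.0])
inp = torch.sparse_coo_tensor(i, v, (4,))

out = inp * 2          # stand-in for the reduced sparse result
inp.copy_(out)         # the sparse copy_ from PR #9005 puts it in the input
print(inp.to_dense())  # tensor([2., 0., 4., 0.])
```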

close #31413
ghstack-source-id: 96984228

Test Plan: unit test

Differential Revision: D19192952

fbshipit-source-id: 2dd31dc057f20cc42b44b9e55df864afa2918c33
2020-01-22 08:06:56 -08:00
e37a24b044 Always return a new tensor from nn.functional.pad (#32350)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31734
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32350

Differential Revision: D19501845

Pulled By: ezyang

fbshipit-source-id: ea79496d23dc0016f3caa233c53d283b08f60371
2020-01-22 08:03:42 -08:00
8abaa322da fix torch.eq() doc entry (#32399)
Summary:
fix `torch.eq()` entry example to match the current output (boolean, instead of uint8)
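
For reference, the current behavior the example is updated to match:

```python
import torch

result = torch.eq(torch.tensor([1, 2, 3]), torch.tensor([1, 1, 3]))
print(result)        # tensor([ True, False,  True])
print(result.dtype)  # torch.bool, not torch.uint8 as in older releases
```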
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32399

Differential Revision: D19498104

Pulled By: ezyang

fbshipit-source-id: e7ec1263226766a5c549feed16d22f8f172aa1a3
2020-01-22 07:43:10 -08:00
248f6d0485 Implement backend fallback fallthrough (#32439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32439

This adds c10::fallthrough_kernel, a special boxed function that can be
used to implement fallthrough behavior at a dispatch key. A fallthrough
kernel redispatches to the next valid dispatch key. It is implemented
in such a way that falling through costs no more than going
straight to the actual implementation of the kernel.
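
A toy model of the idea (not the real c10 API; all names here are invented for illustration):

```python
# Dispatch keys are tried in priority order; a FALLTHROUGH entry means
# "skip me and redispatch to the next key that has a real kernel".
FALLTHROUGH = object()

DISPATCH_ORDER = ["Autograd", "CPU"]
KERNELS = {
    "Autograd": FALLTHROUGH,   # nothing to do at this key: fall through
    "CPU": lambda x: x + 1,    # the actual implementation
}

def call(x):
    for key in DISPATCH_ORDER:
        kernel = KERNELS.get(key)
        if kernel is None or kernel is FALLTHROUGH:
            continue
        return kernel(x)
    raise RuntimeError("no kernel registered")

print(call(41))  # 42
```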

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D19503886

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 6ee05bd815c4ef444e612d19f62312dbb76f2787
2020-01-22 07:32:08 -08:00
0d610b4821 Remove the support of build options like NO_*, WITH_* (#32447)
Summary:
We will now use USE_*, BUILD_* consistently. The backward compatibility
for NO_* and WITH_* is hereby removed in this commit, as promised in the
comment (next release is beyond Feb 20):

    # Before we run the setup_helpers, let's look for NO_* and WITH_* variables and hotpatch environment with the USE_*
    # equivalent The use of NO_* and WITH_* is deprecated and will be removed in Feb 20, 2020.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32447

Differential Revision: D19515536

Pulled By: ezyang

fbshipit-source-id: 2f2c51e6d4674af690b190a1f0397b8f596b6a15
2020-01-22 07:25:29 -08:00
44b270d892 insert_quant_dequant pass support shared class types (#31408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31408

We'll error out when a graph is quantized with different QSchemes.
This only occurs when we have two modules of the same type (e.g. two Conv2d modules initialized with
the same arguments) that are quantized with two configs that would produce different quantized graphs, for example
per-tensor affine and per-channel affine. This is a rare case, so it should be OK to skip for now.
Actual support will come later.

Test Plan:
test_jit.py, test_quantization.py

Imported from OSS

Differential Revision: D19162366

fbshipit-source-id: 798f06d0ddef0c8458237ce88b62159cc77eec8b
2020-01-21 22:18:49 -08:00
60b6c99aa7 Updating submodules
Summary:
GitHub commits:

d2ee8a1a3f
a1543b168d

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: a1394f1c4a48920d3ce1403c70351e2c56eaecf0
2020-01-21 19:18:29 -08:00
64de93d8e7 Move log_normal to Aten(CPU) (#31854)
Summary:
Fix https://github.com/pytorch/pytorch/issues/24723.
Benchmark script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

#warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.log_normal_()

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.log_normal_()
        t2 = _time()
        fwd_t = fwd_t + (t2 -t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test Device: skx-8180.
Before:
```
input size(128, 1) forward time is 0.0114 (ms).
input size(128, 10) forward time is 0.1021 (ms).
input size(128, 100) forward time is 1.0081 (ms).
input size(128, 1000) forward time is 10.1831 (ms).
```
After:
```
input size(128, 1) forward time is 0.0108 (ms).
input size(128, 10) forward time is 0.0969 (ms).
input size(128, 100) forward time is 0.9804 (ms).
input size(128, 1000) forward time is 9.6131 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31854

Differential Revision: D19314586

Pulled By: pbelevich

fbshipit-source-id: 2ea1d9a2c505e36aca9e609b52ccb3e8caf2ba8f
2020-01-21 19:07:31 -08:00
4973695268 Updating submodules
Summary:
GitHub commits:

d45f7b4f09
e6e8b9e871
da618022d2
2df47f519a

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: c4af09e70a56d11e845150ba3d90a570a3758e51
2020-01-21 17:16:46 -08:00
7fdc6cb74e Fix test_data_parallel name errors and add to run_test.py (#32428)
Summary:
While working on https://github.com/pytorch/pytorch/issues/31768 and trying to add tests for `DataParallel`, I discovered that:
- `test_data_parallel.py` can't be run through `run_test.py`
- running it with `pytest` fails with many name errors

`test_data_parallel.py` seems to have been split from `test_nn.py` in https://github.com/pytorch/pytorch/issues/28297 but not in a state where it can actually be run. Presumably `DataParallel` hasn't been tested by CI in the time since.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32428

Differential Revision: D19499345

Pulled By: ezyang

fbshipit-source-id: f9b748a99a5c85fc6675c22506cf10bbfd9c8a4d
2020-01-21 15:11:03 -08:00
0b606a4a7c Enhance DispatchStub to be thread safe from a TSAN point of view. (#32148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32148

TSAN would complain about multiple threads reading and writing to the
`cpu_dispatch_ptr` without any sort of synchronization. Although this is a
valid issue from a TSAN point of view, there wasn't a correctness issue, since
both threads would compute the same value.

In order to fix this, I've used std::atomic for cpu_dispatch_ptr with relaxed
ordering guarantees.
ghstack-source-id: 96989435

Test Plan: Verify the TSAN tests pass.

Differential Revision: D19386082

fbshipit-source-id: 1ff0893e02529eddd06b2855d9565edf1bbf1196
2020-01-21 14:59:57 -08:00
be6ffac1b6 Adagrad optimizer - updated step function, added param_groups, state to optimizers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29335

Differential Revision: D19449382

Pulled By: anjali411

fbshipit-source-id: ee238801ed9cdf15a80f2ce31cc4aab8ba582aea
2020-01-21 14:41:12 -08:00
0ed04bfdf6 Updating submodules
Summary:
GitHub commits:

40b08129cf
8cd8d286e6
d305f13e21
2957bd45f1

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 3b76eb7c8b6b5cf617aca7bd143e1ee404c4f0ed
2020-01-21 14:11:17 -08:00
e1d97025ee QNNPACK: Add support for dynamic quantization.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31896

Test Plan: Added new tests to QNNPACK's test suite to cover the new use case.  All new tests are passing.

Reviewed By: supriyar

Differential Revision: D19443250

Pulled By: AshkanAliabadi

fbshipit-source-id: fa7b1cffed7266a3c198eb591d709f222141a152
2020-01-21 12:33:08 -08:00
bc6005281b Updating submodules
Summary:
GitHub commits:

47e0b9b97e
6d225aaf95
ab4da8f60a

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 27bcdf08b6f5e47a5c948e094aca26bf67a6fb66
2020-01-21 12:12:31 -08:00
9e853e7090 Revert "Temporary workaround for BC test due to schema parser changes" (#32441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32441

This reverts commit ceffdbd2179e7dafdc6407909a00f4267db040de.

Test Plan: Imported from OSS

Reviewed By: houseroad

Differential Revision: D19500043

Pulled By: jamesr66a

fbshipit-source-id: 3bd22c55e4a81ff8b89d27f6e7438e3bdfc18606
2020-01-21 12:07:46 -08:00
f86d6c6afd Enhance NCCL watchdog to actively abort communicators for timed out ops. (#32338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32338

Timed-out ops could linger around if the user doesn't actually call
`wait()` on that op. As a result, I've introduced the following
functionality in this PR:

1. Keep track of all outstanding work in ProcessGroupNCCL.
2. Enhance NCCL watchdog to sweep through all outstanding work and perform the
following operations:
  i.   If the work has timed out, abort all communicators for that work and
       remove them from the cache.
  ii.  If the communicators for the work receive an error, abort the
       communicators and remove them from the cache.
  iii. If the work has completed (successfully/unsuccessfully), remove it from
       the list of outstanding work.
ghstack-source-id: 96895704

Test Plan: waitforbuildbot

Differential Revision: D19401625

fbshipit-source-id: 8f6f277ba2750a1e1aa03cdbc76e8c11862e7ce5
2020-01-21 12:05:40 -08:00
ec4be4e58c Redundant condition (#32396)
Summary:
Optimize expression: 'A || (!A && B)' <=> 'A || B'

A: relErr <= maxRelErr
!A : relErr > maxRelErr
B: absErr <= absErrForRelErrFailure
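
The equivalence can be checked exhaustively:

```python
from itertools import product

# A || (!A && B) is equivalent to A || B for all boolean inputs.
for a, b in product([False, True], repeat=2):
    assert (a or (not a and b)) == (a or b)
```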
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32396

Differential Revision: D19499370

Pulled By: ezyang

fbshipit-source-id: c19bdcb2d4e7ff7806a8cd181c6e7e9e276b9979
2020-01-21 11:30:49 -08:00
839fe714de Fix BC test after TorchBind changes (#32429)
Summary:
It was broken by https://github.com/pytorch/pytorch/issues/32320. Let's be on the safe side and just whitelist all testing ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32429

Differential Revision: D19501016

Pulled By: dzhulgakov

fbshipit-source-id: 9cc1d363edb4579905bee1976a2b57255ce41738
2020-01-21 11:30:44 -08:00
e4f43bf7a5 Set rpath for JNI library on Mac (#32247)
Summary:
Without this, dlopen won't look in the proper directory for dependencies
(like libtorch and fbjni).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32247

Test Plan:
Build libpytorch_jni.dylib on Mac, replaced the one from the libtorch
nightly, and was able to run the Java demo.

Differential Revision: D19501498

Pulled By: dreiss

fbshipit-source-id: 13ffdff9622aa610f905d039f951ee9a3fdc6b23
2020-01-21 11:30:39 -08:00
9482683065 Remove dead includes in caffe2/test
Reviewed By: ezyang

Differential Revision: D19273220

fbshipit-source-id: 3dfc3388914e60611c84472e3fc529f5b5e40534
2020-01-21 11:30:34 -08:00
c13df8b688 Fix cusparse version check (#32405)
Summary:
The current version check doesn't use proper lexicographic comparison and so will break for future versions of cuSPARSE with `CUSPARSE_VER_MAJOR > 10` and `CUSPARSE_VER_MINOR < 2`. Also, my cusparse headers for CUDA 9 don't seem to include version macros at all, so I added `if !defined` to be explicit about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32405

Differential Revision: D19499412

Pulled By: ezyang

fbshipit-source-id: 1593bf1e5a4aae8b75bb3b350d016cc6c3b9c009
2020-01-21 11:30:30 -08:00
9ce25cce91 add an option to record time spent waiting for GIL (#30842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30842

We'd like to profile the time spent on GIL acquisition to debug
performance issues.

Test Plan: Unit tests pass.

Differential Revision: D18837590

fbshipit-source-id: 925968f71c5fb96b8cd93f1eab4647602d2617d1
2020-01-21 11:29:23 -08:00
1177191c8e Synchronize with ShipIt.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2020-01-21 13:39:28 -05:00
cc2d5b15ad F.normalize uses clamp_min_ inplace (#32360)
Summary:
We don't care about autograd when `out!=None` anyways
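
A rough sketch of what the out= path computes (illustrative, not the exact library code):

```python
import torch

x = torch.randn(4, 8)
p, dim, eps = 2.0, 1, 1e-12

# Clamping in place is safe here because the out= path skips autograd.
denom = x.norm(p, dim, keepdim=True).clamp_min_(eps).expand_as(x)
out = x / denom  # matches torch.nn.functional.normalize(x, p, dim, eps)
```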
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32360

Differential Revision: D19452402

Pulled By: colesbury

fbshipit-source-id: c54775289f8a700019ca61e951d59ff4894ac980
2020-01-21 10:38:06 -08:00
0c03304bdf .circleci: Only run macos libtorch on master (#32378)
Summary:
These jobs were taking forever to run, so we decided it's only really
worth running them on master.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32378

Differential Revision: D19499301

Pulled By: seemethere

fbshipit-source-id: 22cac5b5baee84e44607a16daeb77048cb0f5974
2020-01-21 10:38:01 -08:00
a2641e6005 Make type of Tensor.type() more specific (#32353)
Summary:
Fixes the following issue:

```
$ cat test.py
import torch

t = torch.tensor(1.5)
t.type(torch.float32)[None]

$ mypy test.py
test.py:4: error: Invalid index type "None" for "Union[str, Tensor]"; expected type "Union[int, slice]"
Found 1 error in 1 file (checked 1 source file)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32353

Differential Revision: D19499388

Pulled By: ezyang

fbshipit-source-id: 715111e934aea020b20f850d27e32c4f70b82572
2020-01-21 10:37:56 -08:00
418ebc827b Build: Respect USE_CUDNN=0, even if cudnn is found (#32404)
Summary:
Currently, setting `USE_CUDNN=0` has no effect and any cudnn library found on your system will be used anyway. This is especially problematic when your system has multiple CUDA versions installed, and you are building with a version that lacks a matching cudnn. CMake will find any other cudnn versions and you end up with both CUDA versions added to your compiler include paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32404

Differential Revision: D19499425

Pulled By: ezyang

fbshipit-source-id: a9b3f6f9dc22033481c3c1c5999b1a7ef98468cb
2020-01-21 10:36:03 -08:00
ecbf6f99e6 Removed unused weight update in prepack; moved zero point update to qlinear/qconv (#32254)
Summary:
Moved the zero point update to qlinear/qconv to be consistent with the data update.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32254

Differential Revision: D19422929

Pulled By: kimishpatel

fbshipit-source-id: 595a4f7d6fde4978c94f3e720ec8645f3f2bdb7a
2020-01-19 19:08:37 -08:00
b543e3cd6f support empty batch in group normalization (#32401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32401

https://github.com/pytorch/pytorch/issues/12013
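
A quick illustration of the newly supported case:

```python
import torch

m = torch.nn.GroupNorm(num_groups=2, num_channels=4)
x = torch.empty(0, 4, 8, 8)  # batch of size zero
y = m(x)                     # previously errored out on an empty batch
print(y.shape)               # torch.Size([0, 4, 8, 8])
```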

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- 'test_GroupNorm_empty'

Differential Revision: D19463720

fbshipit-source-id: 8ae44590fc5eeb1adc69a2345d7cc2187d3307ac
2020-01-19 19:04:54 -08:00
7fbfb7eef2 Updating submodules
Summary:
GitHub commits:

ea6039a6c9
0d30b8e0fc
7acedd4723
4db6e3b785
cd898afb5e
cf5dd11204
08bdcfd87e
fc84c09b8f
454d37976b
a22e6b8cb4

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: b87550b26e69216be2a8e40870a6e7dab825261c
2020-01-19 03:30:58 -08:00
58234c0254 support torch script call over rpc (#32197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32197

This relands https://github.com/pytorch/pytorch/pull/30063; the main change is to match a general exception and grep for the "pickle" error word in the "test_script_functions_not_supported" unit test, as Python 3.5 and Python 3.6 throw different types of errors with different error messages for the rpc call in that test.
[test all] This diff makes the following changes:
1. Provides a new set of private Python rpc APIs that can accept an annotated TorchScript call; this call can be serialized, deserialized, and executed in C++ without the GIL. These private APIs will be bound to JIT in the future, and they differ from the public APIs in that the future JIT-bound private APIs will accept a qualified_name, not callables. These private APIs are subject to deprecation once JIT supports a torch script function being a JIT type.

Also, these APIs require the torch script function to be defined and annotated by users in Python land; it cannot be a script class/module constructor or a class/module method.

2. This diff also allows the public rpc APIs to accept an annotated TorchScript call and execute the same code path that the above private APIs run on. Therefore, if users invoke an annotated TorchScript call over RPC, it can be serialized, deserialized, and executed in C++ without the GIL as well (see the sketch below).

3. The above private APIs call a newly defined C++ function that serializes, deserializes, and executes the rpc torch script call in C++ land. This C++ function returns an ivalue::Future, so that in a follow-up diff it can be called when these private APIs are bound to JIT.

4. The script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that torch script calls and builtin calls can share the same message type and code.

5. Refactored deserializeResponse() and added a new utility to deserialize a response to an IValue.

ghstack-source-id: 96879167
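
A minimal sketch of the public-API usage this enables, assuming an RPC agent has already been initialized and a peer named "worker1" exists:

```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def scripted_add(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a + b

# The annotated TorchScript function is serialized, sent over RPC, and
# executed in C++ on the callee without holding the GIL.
fut = rpc.rpc_async("worker1", scripted_add,
                    args=(torch.ones(2), torch.ones(2)))
print(fut.wait())  # tensor([2., 2.])
```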

Test Plan: unit test

Differential Revision: D19402374

fbshipit-source-id: 04efcc7c167d08a6503f29efe55e76f2be4b2c5e
2020-01-18 09:24:17 -08:00
1ecad2bb2b Test passing custom class instance to bound method
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32320

Test Plan: Imported from OSS

Differential Revision: D19437335

Pulled By: jamesr66a

fbshipit-source-id: 8f5166dbe6fc5704b12b6224932460b12be0d39b
2020-01-17 23:09:38 -08:00
c7078a1ce8 Fix returning instance of custom class from method
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32312

Test Plan: Imported from OSS

Differential Revision: D19433511

Pulled By: jamesr66a

fbshipit-source-id: f048d5f60eaba992ee42fea2d318a59b3a156578
2020-01-17 23:09:34 -08:00
c7fdf5b251 Remove __torch__ from custom class qualname
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32301

Test Plan: Imported from OSS

Differential Revision: D19431645

Pulled By: jamesr66a

fbshipit-source-id: 198522a1641cb9f90fa4c614da4ca4162fadf456
2020-01-17 23:09:29 -08:00
ceffdbd217 Temporary workaround for BC test due to schema parser changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32324

Test Plan: Imported from OSS

Differential Revision: D19438085

Pulled By: jamesr66a

fbshipit-source-id: 3dd2586e73c890a7bdadd6cbb3df2c186f93199d
2020-01-17 23:08:20 -08:00
61ee8c972f porting scatter_add to ATen (CPU) (#31662)
Summary:
Fixes [https://github.com/pytorch/pytorch/issues/24758](https://github.com/pytorch/pytorch/issues/24758).
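
For reference, the behavior being ported:

```python
import torch

x = torch.zeros(3)
index = torch.tensor([0, 1, 0, 2])
src = torch.tensor([1.0, 2.0, 3.0, 4.0])
x.scatter_add_(0, index, src)  # accumulates src values at each index
print(x)                       # tensor([4., 2., 4.])
```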
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31662

Differential Revision: D19440824

Pulled By: ngimel

fbshipit-source-id: b13443cfcc8bcb9ec21f1cddb5c6fbc0ef4bb0f2
2020-01-17 21:36:54 -08:00
53429680d5 Remove stray @script (#32235)
Summary:
This should be covered under recursive script now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32235

Pulled By: driazati

Differential Revision: D19414889

fbshipit-source-id: 85f8132401dbe44c9dbaef7c0350110f90eb9843
2020-01-17 19:22:09 -08:00
8c40a78277 Back out "Calling JITed 8 Bit Fused SLS in FBGEMM from C2" (#32381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32381
Original commit changeset: 0dfa936eb503

"Facebook"
Temporary remedy for SEV:
https://our.intern.facebook.com/intern/sevmanager/view/s/193726

Test Plan: Run CI tests

Reviewed By: jspark1105

Differential Revision: D19458382

fbshipit-source-id: 731790f96b341ade5e70ff13e4b0b5fafad0fea6
2020-01-17 19:08:48 -08:00
25e62ebac9 Updating submodules
Summary:
GitHub commits:

9b13f58aa1
044b292acc
e1f67bbf3d

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 21df26f60f436eb8c1766f66afac4a0d93dd33d1
2020-01-17 18:32:53 -08:00
10c2bd35af Fix cudnn channels_last descriptors problem (#31952)
Summary:
This appends fixes to https://github.com/pytorch/pytorch/issues/31783 so we can pull the fixes in without breaking tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31952

Differential Revision: D19433839

Pulled By: ngimel

fbshipit-source-id: 5b3d2f0b2a86aacd1d100dd86996ee0d63e5ee92
2020-01-17 17:45:07 -08:00
824e649d40 Specify requires_grad for Parameter replica so it's not always set to True by default (#32356)
Summary:
This is the proposed fix for issue https://github.com/pytorch/pytorch/issues/32018
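
A sketch of the fix's effect; the replication itself happens inside DataParallel, so this only shows the flag being carried over:

```python
import torch

p = torch.nn.Parameter(torch.zeros(3), requires_grad=False)

# Before the fix the replica used the default requires_grad=True;
# now the source parameter's flag is passed along explicitly.
replica = torch.nn.Parameter(p.data.clone(), requires_grad=p.requires_grad)
assert replica.requires_grad is False
```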
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32356

Differential Revision: D19450648

Pulled By: mrshenli

fbshipit-source-id: c63eeb6e9f5a87ebe613dd7013907559f295a7ea
2020-01-17 17:41:10 -08:00
0ac31a99be run code analysis against mobile interpreter (#32276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32276

Include the mobile interpreter, which has some manually registered ops in
temporary namespaces, in the mobile code analysis pass.

The mobile interpreter is still under development and these ops will be
removed in the future. This is a temporary step for internal build
experiment.

Test Plan: Imported from OSS

Differential Revision: D19426818

Pulled By: ljk53

fbshipit-source-id: 507453dc801e5f93208f1baea12400beccda9ca5
2020-01-17 17:21:28 -08:00
5bc44fb6ea TensorIterator unrolling and vectorized load - step 0, 1 (#31974)
Summary:
These are steps 0 and 1 for https://github.com/pytorch/pytorch/issues/31975:

- Old code is moved to namespace `legacy`
- New `elementwise_kernel` and `launch_kernel` added to namespace `modern`, they only support 1d contiguous case for now
- In `gpu_kernel_impl`, dispatch to the new code if the problem is trivial 1d contiguous.

In terms of performance, this PR affects elementwise operators on contiguous tensors. The performance improves slightly (up to 8%) for medium-size tensors on Volta.

## compiled code
See https://github.com/zasdfgbnm/things/blob/master/2020Q1/disassembly-elementwise.ipynb

We can see that, previously, the add kernel compiles to
```
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 71
        /*0000*/                   IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
        /*0020*/                   S2R R0, SR_TID.X ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 73
        /*0030*/                   S2R R3, SR_CTAID.X ;
        /*0040*/                   IMAD R0, R3, 0x200, R0 ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 76
        /*0050*/                   ISETP.GE.AND P0, PT, R0, c[0x0][0x160], PT ;
        /*0060*/               P0 EXIT ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 110
        /*0070*/                   IMAD R3, R0.reuse, c[0x0][0x194], RZ ;
        /*0080*/                   IMAD R6, R0, c[0x0][0x198], RZ ;
        /*0090*/                   IADD3 R4, P0, R3.reuse, c[0x0][0x178], RZ ;
        /*00a0*/                   IADD3 R2, P1, R6.reuse, c[0x0][0x180], RZ ;
        /*00b0*/                   LEA.HI.X.SX32 R5, R3, c[0x0][0x17c], 0x1, P0 ;
        /*00c0*/                   LEA.HI.X.SX32 R3, R6, c[0x0][0x184], 0x1, P1 ;
        /*00d0*/                   LDG.E.SYS R5, [R4] ;
        /*00e0*/                   LDG.E.SYS R2, [R2] ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 77
        /*00f0*/                   IMAD R0, R0, c[0x0][0x190], RZ ;
        /*0100*/                   IADD3 R6, P0, R0, c[0x0][0x170], RZ ;
        /*0110*/                   LEA.HI.X.SX32 R7, R0, c[0x0][0x174], 0x1, P0 ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 110
        /*0120*/                   FFMA R9, R2, c[0x0][0x1a0], R5 ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 170
        /*0130*/                   STG.E.SYS [R6], R9 ;
	//## File "/home/xgao/pytorch-master/aten/src/ATen/native/cuda/Loops.cuh", line 81
        /*0140*/                   EXIT ;
.L_16826:
        /*0150*/                   BRA `(.L_16826);
        /*0160*/                   NOP;
        /*0170*/                   NOP;
.L_29063:
```
Now it compiles to
```
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 210
        /*0000*/                   MOV R1, c[0x0][0x28] ;
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
        /*0020*/                   S2R R6, SR_CTAID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 217
        /*0030*/                   MOV R7, 0x4 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 208
        /*0040*/                   S2R R3, SR_TID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 210
        /*0050*/                   LEA R6, R6, R3, 0x8 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 225
        /*0060*/                   IADD3 R2, R6.reuse, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 217
        /*0070*/                   IMAD.WIDE R4, R6.reuse, R7.reuse, c[0x0][0x190] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 225
        /*0080*/                   IADD3 R3, R6, 0x80, RZ ;
        /*0090*/                   ISETP.GE.AND P1, PT, R2, c[0x0][0x160], PT ;
        /*00a0*/                   ISETP.GE.AND P0, PT, R6.reuse, c[0x0][0x160], PT ;
        /*00b0*/                   ISETP.GE.AND P2, PT, R3, c[0x0][0x160], PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 217
        /*00c0*/                   IMAD.WIDE R2, R6.reuse, R7, c[0x0][0x188] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 225
        /*00d0*/                   IADD3 R14, R6, 0xc0, RZ ;
        /*00e0*/                   ISETP.GE.AND P3, PT, R14, c[0x0][0x160], PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 228
        /*00f0*/              @!P1 LDG.E.SYS R11, [R4+0x100] ;
        /*0100*/              @!P0 LDG.E.SYS R0, [R2] ;
        /*0110*/              @!P0 LDG.E.SYS R9, [R4] ;
        /*0120*/              @!P1 LDG.E.SYS R8, [R2+0x100] ;
        /*0130*/              @!P2 LDG.E.SYS R10, [R2+0x200] ;
        /*0140*/              @!P2 LDG.E.SYS R13, [R4+0x200] ;
        /*0150*/              @!P3 LDG.E.SYS R12, [R2+0x300] ;
        /*0160*/              @!P3 LDG.E.SYS R15, [R4+0x300] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 245
        /*0170*/                   IMAD.WIDE R6, R6, R7, c[0x0][0x180] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 191
        /*0180*/                   FFMA R9, R9, c[0x0][0x168], R0 ;
        /*0190*/                   FFMA R11, R11, c[0x0][0x168], R8 ;
        /*01a0*/                   FFMA R13, R13, c[0x0][0x168], R10 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 245
        /*01b0*/              @!P0 STG.E.SYS [R6], R9 ;
        /*01c0*/              @!P1 STG.E.SYS [R6+0x100], R11 ;
        /*01d0*/              @!P2 STG.E.SYS [R6+0x200], R13 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 191
        /*01e0*/                   FFMA R15, R15, c[0x0][0x168], R12 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 244
        /*01f0*/               P3 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 245
        /*0200*/                   STG.E.SYS [R6+0x300], R15 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/Loops.cuh", line 248
        /*0210*/                   EXIT ;
.L_727:
        /*0220*/                   BRA `(.L_727);
        /*0230*/                   NOP;
        /*0240*/                   NOP;
        /*0250*/                   NOP;
        /*0260*/                   NOP;
        /*0270*/                   NOP;
.L_32233:
```

## benchmark

The benchmark is for the add kernel on Volta.

See https://github.com/zasdfgbnm/things/blob/master/2020Q1/benchmark-unroll.ipynb

For tensors of size from 2^20 to 2^30, previously we had
```
1.5.0a0+dedd16b
dedd16b4181cae81e37e978cd3bf24c1ba35ca05
33 µs ± 31.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
48.7 µs ± 75 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
78.9 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
140 µs ± 51.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
261 µs ± 71.4 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
506 µs ± 159 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
993 µs ± 189 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.96 ms ± 139 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.9 ms ± 955 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.79 ms ± 187 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Now we have
```
1.5.0a0+b1a239b
b1a239be8d529e89875fe47cd09964ef3a9516ac
30.4 µs ± 18 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
45.2 µs ± 46.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
75 µs ± 476 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
134 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
253 µs ± 354 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
489 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
961 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.91 ms ± 578 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.8 ms ± 88.8 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.57 ms ± 763 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
It is slightly better.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31974

Differential Revision: D19450765

Pulled By: ngimel

fbshipit-source-id: 79601bfceb5da84ff87384ba8193793eb4095a2e
2020-01-17 17:16:23 -08:00
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
c8ca70e39d Updating submodules
Summary:
GitHub commits:

54b290f00f
e8df50310d
ef5c9efe12

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 7b6dc88d40e8fd8c396d4d12846db43b0fb4258c
2020-01-17 15:48:29 -08:00
7e3c438913 Renaming IValue List functions (#32093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32093

toGenericListRef -> toListRef
isGenericList -> isList
toGenericList -> toList
toXListRef -> toXVector

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D19369767

Pulled By: zdevito

fbshipit-source-id: 4f0078f95b83e6586524c03f7bcf206722fdd9ae
2020-01-17 15:17:45 -08:00
bdd5e15437 skip testExceptions in ProcessGroupGloo if built with TSAN (#32242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32242

TSAN and fork don't play well together, so skip this test if we're
building under TSAN. It will still run in other modes.

Differential Revision: D19416113

fbshipit-source-id: 7e88d63a843356372160c2524c05e8fd1706553e
2020-01-17 14:17:06 -08:00
5a58c16722 Updating submodules
Summary:
GitHub commits:

29aba0a287
37a97eb4de
0efdd57292
6d886fc7eb
2e5854752a
931d1c643b
781986ef71
2e6d2903d7
e04348ff63
e8650fd560

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: abd7ee4aaec8401b2c885335940773a0655b4496
2020-01-17 12:48:36 -08:00
9b6ec61bfd exposing CPU/GPU Copy ops (#32248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32248

expose CPU/GPU copy ops

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:torch_integration_test

Reviewed By: houseroad

Differential Revision: D19405856

fbshipit-source-id: 1df4aa202e26647cb81e9fe7e4478e594a5f7f3e
2020-01-17 12:40:43 -08:00
e7bc1663bd fix unchecked cast alias analysis (#32309)
Summary:
Unchecked cast just refines the type of a value; the value stays the same, so the output should alias the input.
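
A sketch of where the compiler inserts such a cast, via optional-type refinement:

```python
import torch
from typing import Optional

@torch.jit.script
def f(x: Optional[torch.Tensor]) -> torch.Tensor:
    assert x is not None  # refines Optional[Tensor] to Tensor
    # The refined value is the same tensor as the input, so alias
    # analysis should treat the cast's output as aliasing its input.
    return x
```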
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32309

Differential Revision: D19439037

Pulled By: eellison

fbshipit-source-id: fe6902d0d9a5a9ef5e9c13e1dbd056576d8c327e
2020-01-17 12:29:28 -08:00
df514fd8c0 C++ C2/Glow operator unittest
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32258

Test Plan:
```
 buck test glow/fb/test/numerics:fp16_op_test
```

Reviewed By: bddppq

Differential Revision: D19401786

fbshipit-source-id: 1382b5208be6172d3e6f768dedad7ebec31cffc9
2020-01-17 12:13:34 -08:00
e133d8be3b Fix ASAN / potential segfault in quantized Tensor memory allocations.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29882

Differential Revision: D18522039

Pulled By: AshkanAliabadi

fbshipit-source-id: 1fdc68491aa2ac176633b9ecc3ee78c9175a97aa
2020-01-17 12:09:25 -08:00
4e69352713 Add 64bit atomic fetch add (#32354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32354

Adding an int64 version of AtomicFetchAdd.

Reviewed By: bwasti

Differential Revision: D19434349

fbshipit-source-id: b2358e8c5c6b7cd7e7b21de974b4ee1b5258fcf4
2020-01-17 11:43:43 -08:00
aa61d1ee85 Add a new job to support custom build (#32323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32323

### Summary

Since we released the custom build in 1.4.0, it's time to set up CI for it. This PR adds a new iOS job to the iOS builds. To save time, it only runs the arm64 build.

### Test Plan

- Don't break any iOS jobs
- Custom Build works.

Test Plan: Imported from OSS

Differential Revision: D19451342

Pulled By: xta0

fbshipit-source-id: 9de305c004fc795710ecf01d436ef4792c07760c
2020-01-17 11:39:08 -08:00
7732924501 Delete unused bernoulli_Tensor from THTensorRandom.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32328

Test Plan: Imported from OSS

Differential Revision: D19448736

Pulled By: pbelevich

fbshipit-source-id: 92380ca1e0c0ac88d100e6fba8d216a46d0b181e
2020-01-17 11:09:19 -08:00
8c1268aad3 Use default scale/zero_point in fake_quantize module instead of None (#32318)
Summary:
Distributed data parallel cannot broadcast None, so when we prepare the model for QAT and try to save it, the save errors out.
Fixes: https://github.com/pytorch/pytorch/issues/32082
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32318

Differential Revision: D19434801

Pulled By: jerryzh168

fbshipit-source-id: ee70abe4c3dcdd3506fb7dd0316aee2fb1705469
2020-01-17 11:04:08 -08:00
5b815d980e Added cummin
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32238

Differential Revision: D19416791

Pulled By: anjali411

fbshipit-source-id: 5aadc0a7a55af40d76f444ab7d7d47ec822f55a5
2020-01-17 10:51:58 -08:00
78d8f691ad Don't dispatch to integral types in smooth_l1_kernel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32333

Differential Revision: D19442787

Pulled By: ngimel

fbshipit-source-id: 9578483202614d7406eceb13cbf15b253c04f237
2020-01-17 10:47:43 -08:00
6a5a55d573 use gtest asserts in ProcessGroupGlooTest instead of other checks (#32138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32138

I personally prefer `throw std::runtime_error("BOOM")`, but we should
probably have asserts here now that it is gtest. Also ensures that the correct
exceptions are thrown by the `testSignal` tests.
ghstack-source-id: 96811000

Differential Revision: D19382905

fbshipit-source-id: 1b00dd70524d03c8bd6f48715baa5070a7985467
2020-01-17 10:31:59 -08:00
4968bc2450 cap the maximum depth of bailout chains at 1 (#32073)
Summary:
This is another implementation of the maximum bailout depth.
The first version was implemented in https://github.com/pytorch/pytorch/pull/31521
This one has the advantages that
* the bailout depth only exists in `CodeImpl`, which seems to be an appropriate place to keep it
* threading through many objects is reduced to threading through CodeImpl and getPlanFor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32073

Differential Revision: D19443432

Pulled By: Krovatkin

fbshipit-source-id: 898384bb2308a1532a50a33d9e05cfca504711e6
2020-01-17 09:42:46 -08:00
61a2b34113 Updating submodules
Summary:
GitHub commits:

2d9c2bb401

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: ea12c419c4bab8ce60793deecb10a8ead086a4d5
2020-01-17 05:54:26 -08:00
904ab092c2 fix testSend and testRecv in ProcessGroupGlooTest (#32134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32134

These tests weren't written in the most correct way and were often
flaky. It was tricky to identify these tests as flaky until we moved this file
to use gtest.

The gist of the issue is that the test previously would not coordinate sends
and recvs properly. For example, we created a single thread to test an
abortRecv and a successful recv. A separate sender thread was used to send 2
messages. What could go wrong here is that the first send could successfully
complete, resulting in the receiving end processing the message before it gets
the abort signal. In this case we would have an error in the test.
ghstack-source-id: 96806879

Differential Revision: D19379395

fbshipit-source-id: 24782ccaf6e6ec6b445378b29d5f10f901e0dee6
2020-01-17 04:00:39 -08:00
7a9c920bac add lock for ncclCommAbort (#31901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31901

ncclCommAbort is not thread safe, so this adds a lock for it.
ghstack-source-id: 96829715

Test Plan: unit tests

Differential Revision: D19293869

fbshipit-source-id: 711b4a07605d6e5a81577247d2f90a78041c1809
2020-01-17 03:57:08 -08:00
91bdb872ce fix spelling mistake: excpected -> expected
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28817

Differential Revision: D18544562

Pulled By: dgisser

fbshipit-source-id: 51f728e807f9c4bb30f58585d5b6f436cb880153
2020-01-17 00:11:08 -08:00
ef5ae4823a Register RoIAlignRotated with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30785

Reviewed By: wat3rBro

Differential Revision: D18415056

fbshipit-source-id: e00376bec948309d53f2172697cd477449f769b2
2020-01-16 16:32:28 -08:00
b79030d6c8 remove unused code after refactoring optimizations into profiling-sensitive and profiling-insensitive (#32106)
Summary:
After we removed `Specialize_AutogradZero` from the optimization pipeline of the simple executor mode, we don't need to mark any inputs as undefined in `autodiff`. Also, `needsGradient` in `graph_executor.cpp` never runs on graph with profiling information, so I removed that code as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32106

Differential Revision: D19374238

Pulled By: Krovatkin

fbshipit-source-id: 4223d3efe3c904a55a28471e5ae9593017ce3e07
2020-01-16 16:31:16 -08:00
c2761490fc Enhancing the test (#32321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32321

Updating the test to test more meaningful semantics

Test Plan:
[xintchen@devvm6308.prn2 ~/fbsource/fbcode] buck test mode/dev //caffe2:ATen-core-test -- 'OperatorRegistrationTest\.whenRegisteringCPUTensorType_thenCanOnlyCallUnboxedWithCPUTensorIdDispatchKey'
Building: finished in 0.4 sec (100%) 517/517 jobs, 0 updated
  Total time: 0.5 sec
Trace available for this run at /tmp/testpilot.20200116-132729.2541763.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision e5f315ebe0508d11fc281fa4b4f7b43d2ef1c003 fbpkg 67e8eb96914f400db234fd9af70fdcde at Wed Jan 15 23:38:32 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/762/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/6192449492430045
      ✓ caffe2:ATen-core-test - OperatorRegistrationTest.whenRegisteringCPUTensorType_thenCanOnlyCallUnboxedWithCPUTensorIdDispatchKey 0.002 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6192449492430045
Summary (total time 1.15s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Differential Revision: D19436345

fbshipit-source-id: c1f2383d62627aa4507616b8905ceb42ac563e9d
2020-01-16 15:56:34 -08:00
53708e21ed classic fixed-point liveness
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31724

Differential Revision: D19426570

Pulled By: Krovatkin

fbshipit-source-id: 3387dfb25e6e9456d5d0517eac1d2e44e61d6813
2020-01-16 15:13:22 -08:00
8c8bd79f32 Add CI scripts for Custom Build (#32316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32316

### Summary

Since the Custom Build was released in 1.4.0, it's time to set up CI. To do that, we need to:

1.  Add a python script to generate the yaml file
2. Add new build scripts to circle CI (arm64 only).

### Test Plan

- Don't break the current iOS CIs

Test Plan: Imported from OSS

Differential Revision: D19437362

Pulled By: xta0

fbshipit-source-id: 395e27a582c43663af88d11b1ef974a4687e672c
2020-01-16 14:46:16 -08:00
34c751c263 Eliminate exception throwing code from dispatch call sites (#32168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32168

We move the exception raising into the function, saving us a
big pile of instructions at each call site that would otherwise be spent setting up the raise.

After this stack of changes, the compiler is willing to inline, e.g.,
`c10::KernelFunction::callUnboxed<at::Tensor, at::Tensor const&>(c10::OperatorHandle const&, at::Tensor const&) const::__func__`
(whereas previously it refused to do so.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19392948

Pulled By: ezyang

fbshipit-source-id: d5edab00cae48444b308e74438a17a421532c08f
2020-01-16 14:43:16 -08:00
b85dbe8f7b Out-of-line construction of OperatorName. (#32121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32121

This reduces code size in the call sites of this function (of which
there are many: one for every operator call) since we no longer have
to construct std::string at the site.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19392951

Pulled By: ezyang

fbshipit-source-id: 8bc43d46ba635380ff9f8989f7557fdd74b552cf
2020-01-16 14:43:12 -08:00
36d09197ab Move error reporting code out-of-line from header. (#32118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32118

This reduces code size and makes the calling function more likely to inline.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19392950

Pulled By: ezyang

fbshipit-source-id: 5e3829cca5604407229f93c2486eb9a325581ea2
2020-01-16 14:43:07 -08:00
7b7390778c Make an assert on a hotpath trigger only in DEBUG mode. (#32117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32117

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19392949

Pulled By: ezyang

fbshipit-source-id: 7f579e45d49bddeab36b8dd1a90c83224a368ac8
2020-01-16 14:42:18 -08:00
8746f90cf6 Fix weight backward for cudnn conv of large tensor (#31889)
Summary:
This is the last PR for https://github.com/pytorch/pytorch/issues/22496
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31889

Differential Revision: D19431371

Pulled By: ngimel

fbshipit-source-id: 754fa91d49ad03549cb07aa30dde34bf9e851302
2020-01-16 14:15:52 -08:00
b26ee54176 For ppc64le, stop presenting the python 2.7 builds (we will no longer… (#32315)
Summary:
For ppc64le, we no longer plan to run regular builds on Python 2.7, and we wish to stop
publicizing the build status for those two builds (ppc64le/CPU and ppc64le/GPU each on py27).

This pull request simply removes the build status links for these two builds, replacing them
with a generic dash character (consistent with other un-publicized builds within the table).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32315

Differential Revision: D19435939

Pulled By: soumith

fbshipit-source-id: c9f31e7acba83e42f6a758ac011bbef36fd8aaa0
2020-01-16 13:49:40 -08:00
cd99b3706a Pin Pillow to latest and use a torchvision that works with it (#32290)
Summary:
Follow on from https://github.com/pytorch/pytorch/pull/31777, as suggested in https://github.com/pytorch/pytorch/pull/31777#issuecomment-575166543.

Pillow 7.0.0 removed `PILLOW_VERSION` and `__version__` should be used instead.

torchvision 0.5.0 switched from using `PILLOW_VERSION` to `__version__`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32290

Differential Revision: D19430280

Pulled By: mrshenli

fbshipit-source-id: be8d6317a4948d71e818adeafe61dfe567df5601
2020-01-16 10:48:22 -08:00
f94aab45fd Logical condition reduction (#32201)
Summary:
x || (!x && y)  <=>  x || y
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32201

Differential Revision: D19429334

Pulled By: ezyang

fbshipit-source-id: 044dc46c2d9a7e180aa1795703c0097b0c7c3585
2020-01-16 07:57:12 -08:00
14548c2d5b out variant for native_batch_norm forward (#29192)
Summary:
This deals with the forward pass of the native BatchNorm CUDA impl to support in-place operation. The larger issue: https://github.com/pytorch/pytorch/issues/26288

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29192

Differential Revision: D19410370

Pulled By: ezyang

fbshipit-source-id: a6889c96bdd848f3a1cb2d943d06e054d22fb7ab
2020-01-16 07:24:13 -08:00
bab87e4b60 reimplement __torch_function__ overrides for torch.functional using inline logic (#32194)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30831.

This improves the performance of operators in the `torch.functional` namespace that are overridable by `__torch_function__` implementations when supplied with `Tensor` operands.
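
For context, a small sketch of the protocol being dispatched on, written against the modern form of the API (the exact signature has changed across releases, so treat this as illustrative):

```python
import torch

class Logged:
    def __init__(self, t):
        self.t = t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print(f"dispatched: {func.__name__}")
        unwrapped = [a.t if isinstance(a, Logged) else a for a in args]
        return func(*unwrapped, **kwargs)

torch.split(Logged(torch.arange(6)), 2)  # prints "dispatched: split"
```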

Running the split benchmark in various configurations produces the following timings:

<details>
<summary>Expand for timings on <code>master</code> </summary>

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cpu
# Input: M: 8, N: 8, parts: 2, device: cpu
Forward Execution Time (us) : 3.340

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cuda
# Input: M: 8, N: 8, parts: 2, device: cuda
Forward Execution Time (us) : 3.333

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 3.366

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cuda
# Input: M: 256, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 3.385

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cpu
# Input: M: 512, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 3.468

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cuda
# Input: M: 512, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 3.416
```
</details>

<details>
<summary>Expand for timings with this pull request applied</summary>

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cpu
# Input: M: 8, N: 8, parts: 2, device: cpu
Forward Execution Time (us) : 2.261

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cuda
# Input: M: 8, N: 8, parts: 2, device: cuda
Forward Execution Time (us) : 2.223

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 2.237

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cuda
# Input: M: 256, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 2.218

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cpu
# Input: M: 512, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 2.259

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cuda
# Input: M: 512, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 2.234
```

</details>

<details>
<summary>Expand for timings on <code>master</code> with <code>__torch_function__</code> dispatch disabled </summary>

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cpu
# Input: M: 8, N: 8, parts: 2, device: cpu
Forward Execution Time (us) : 2.180

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cuda
# Input: M: 8, N: 8, parts: 2, device: cuda
Forward Execution Time (us) : 2.172

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 2.171

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cuda
# Input: M: 256, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 2.146

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cpu
# Input: M: 512, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 2.175

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cuda
# Input: M: 512, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 2.152
```

</details>

So at least on the machine I'm testing on, this brings the overhead down to less than 100 ns. For comparison, the overhead for `__array_function__` in NumPy is about 850 ns on the same machine.

<details>
<summary>Expand for timings for NumPy <code>__array_function__</code> dispatch </summary>

```
In [1]: import numpy as np

In [2]: %timeit np.mean([1])
8.89 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [3]: %timeit np.mean._implementation([1])
8.04 µs ± 28.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

See [the implementation in NumPy](https://github.com/numpy/numpy/blob/master/numpy/core/overrides.py#L195) for why this measures `__array_function__` overhead.

</details>
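
For reference, a minimal sketch of the protocol whose dispatch this PR optimizes. The classmethod form below matches later PyTorch releases; the exact signature at the time of this PR may differ slightly:

```
import torch

class Wrapper:
    def __init__(self, t):
        self.t = t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap the operands, note the interception, run the real op.
        unwrapped = [a.t if isinstance(a, Wrapper) else a for a in args]
        print("intercepted:", func.__name__)
        return func(*unwrapped, **kwargs)

# torch.split sees a Wrapper operand and routes through __torch_function__:
parts = torch.split(Wrapper(torch.arange(6)), 2)
```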
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32194

Differential Revision: D19410396

Pulled By: ezyang

fbshipit-source-id: ada788a4399c81cd7eb2d548aa04a2459e96634a
2020-01-16 07:10:38 -08:00
7df5dc2775 Creating callUnboxedWithDispatchKey method (#32198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32198

Creates a method called "callUnboxedWithDispatchKey".

Also adds tests to make sure it works.

Test Plan: buck test mode/dev //caffe2:ATen-core-test

Differential Revision: D19402815

fbshipit-source-id: b206cf04b1216fbbd5b54ac79aef495cb0c1be06
2020-01-16 01:37:41 -08:00
d75b6b3f9d Support shape inference and lowering of SparseLengthsWeightedSumFused4BitRowwise (#32257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32257

Pull Request resolved: https://github.com/pytorch/glow/pull/4018

att.

Test Plan:
Unit tests:
```
buck test glow:masterCaffe2ImporterTest -- caffe2.SparseLengthsSumFused4BitRowwise
buck test caffe2/caffe2/opt:bound_shape_inference_test
```

Reviewed By: jfix71

Differential Revision: D19389014

fbshipit-source-id: 5f6863443adee5d3bf7a50a105866441eefb9560
2020-01-15 23:49:06 -08:00
f3b62d4b1c Updating submodules
Summary:
GitHub commits:

191bbb1069
9d5a6e33e3
2bdfe1544a
1600bee8de
b7f1b3e51c
3220376f13
1ba747dfb4
0d5b08cbfc
481179a38e
9bc4f9c40f

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 79135519c3449c2b77ff1ca7d4f13724e2390f6e
2020-01-15 21:37:32 -08:00
851a7e861b Add CAFFE2_API to video decoding functions (#31187)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31132
Also closes old issue https://github.com/pytorch/pytorch/issues/11735
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31187

Differential Revision: D19147172

Pulled By: pbelevich

fbshipit-source-id: e959058eec3489061f431fbecc99ded0d4dc1704
2020-01-15 19:39:02 -08:00
89c6e18c43 Updating submodules
Summary:
GitHub commits:

9915834ced
3cdb0d61d6
93a4e9f4cc
dafd450683
b5d5670e40
bab52dcc84
d2b4d42d4b
83479196c3
f2ec66095a
99561fee3b
eacaa4f35d
4ce4667b20
89291814cc

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 2a3c90f0a7615441dae746b18b9048cfddf0f4de
2020-01-15 17:54:21 -08:00
90c65b81c3 Define repr() on IValues (#32232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32232

Previously, we were using `operator<<` as the default way of printing
IValue constants during serialization. The semantics of `operator<<`
were ill-defined; and this bit us in particular with strings and lack of
quoting.

This PR defines the role of `operator<<`: much like Python `str()`, it
is intended to produce a human-readable-ish representation for
debugging purposes.

This PR also defines a new `repr()` function on IValue that is intended
to produce a valid Python expression that can be used to recreate an
object with the same value. `repr()` is not defined on all IValue kinds
(notably tensors!) for this reason.
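
The Python analogy this distinction is modeled on:

```
s = "hi"
print(str(s))   # hi    -- human-readable, like operator<<
print(repr(s))  # 'hi'  -- a valid expression that recreates the value
```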

Test Plan: Imported from OSS

Differential Revision: D19417036

Pulled By: suo

fbshipit-source-id: c102d509eaf95a28b6a62280bc99ca6f09603de5
2020-01-15 17:35:41 -08:00
104b2c610b Tensor prep from image in native (#31426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31426

Tensor conversion from a YUV image is moved to native code, with optimizations to eliminate branching inside the loop, avoid variable declarations in the loop body, and use fewer ops.
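
A generic illustration of the kind of optimization described (not the actual YUV kernel):

```
# Before: branch re-evaluated on every pixel.
def convert_slow(src, fast_path):
    dst = []
    for v in src:
        scale = 2 if fast_path else 3
        dst.append(v * scale)
    return dst

# After: decide once, keep the per-pixel loop branch-free.
def convert_fast(src, fast_path):
    scale = 2 if fast_path else 3
    return [v * scale for v in src]
```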

Perf stats from local devices, measuring conversion of a 320x240 camera image to a 1x3x224x224 tensor.
Legend:
Java - current Java impl
JavaOpt - current Java impl + the same optimizations (no if/else in the loop, variables declared outside the loop, inlining, etc.)
C - C impl

```
Nexus 5
JavaOpt N:25 avg:119.24 min: 87 max:177 p10:102 p25:105 p50:115 p75:127 p90:150
      C N:25 avg: 17.24 min: 14 max: 39 p10: 14 p25: 15 p50: 15 p75: 16 p90: 23
   Java N:25 avg:139.96 min: 70 max:214 p10: 89 p25:110 p50:139 p75:173 p90:181
avg C vs JavaOpt 6.91x

Pixel 3 XL
JavaOpt N:19 avg: 16.11 min: 12 max: 19 p10: 14 p25: 15 p50: 16 p75: 18 p90: 19
      C N:19 avg:  5.79 min:  3 max: 10 p10:  4 p25:  5 p50:  6 p75:  6 p90:  9
   Java N:19 avg: 16.21 min: 12 max: 20 p10: 14 p25: 15 p50: 16 p75: 18 p90: 20
avg C vs JavaOpt 2.78x

Full build with 4 abis inside:
Pixel 3 XL
JavaOpt N:25 avg: 18.84 min: 16 max: 24 p10: 16 p25: 17 p50: 18 p75: 20 p90: 22
      C N:25 avg:  7.96 min:  5 max: 10 p10:  7 p25:  7 p50:  8 p75:  9 p90:  9
avg C vs JavaOpt 2.36x
```

Test Plan: Imported from OSS

Differential Revision: D19165429

Pulled By: IvanKobzarev

fbshipit-source-id: 3b54e545f6fbecbc5bb43216aca81061e70bd369
2020-01-15 17:10:00 -08:00
de5821d291 Torchscript print to logcat (#31456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31456

External request https://discuss.pytorch.org/t/jit-android-debugging-the-model/63950

By default, the TorchScript print function writes to stdout, which is not visible in Android logcat.
This change propagates the output to logcat as well.

Test Plan: Imported from OSS

Differential Revision: D19171405

Pulled By: IvanKobzarev

fbshipit-source-id: f9c88fa11d90bb386df9ed722ec9345fc6b25a34
2020-01-15 16:44:56 -08:00
31b7d0873c Add File existence checking (#32208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32208

### Summary

The master branch generates `libtorch_cpu.a`, which differs from the release branch, so this PR skips the missing libs before archiving them.

### Test Plan

- don't break the nightly build

Test Plan: Imported from OSS

Differential Revision: D19420042

Pulled By: xta0

fbshipit-source-id: fb28df17b7e95d5c7fdf5f3a21bece235d7be17c
2020-01-15 15:35:50 -08:00
8b4c695e47 Added const folding for ONNX mul, div, sqrt ops (#32077)
Summary:
An example of a model with such leaf nodes is the faster_rcnn model. This PR helps optimize ONNX ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32077

Reviewed By: hl475

Differential Revision: D19399622

Pulled By: houseroad

fbshipit-source-id: 35c628c6f1514b79f1bcf7982c25f0f4486f8941
2020-01-15 15:31:34 -08:00
ffc8e255c4 Sort export w/ negative axes (#31971)
Summary:
Fixing export of Sort on negative axes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31971

Reviewed By: hl475

Differential Revision: D19325874

Pulled By: houseroad

fbshipit-source-id: 18ab2bf39221970c8ab65a1355f5759f88faa54f
2020-01-15 15:13:23 -08:00
4460a86cd6 Support op registration if name starts with underscore (_) (#32017)
Summary:
This is required for registering the torchvision::_new_empty_tensor op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32017

Reviewed By: hl475

Differential Revision: D19399606

Pulled By: houseroad

fbshipit-source-id: 43e1f2d78d2a0310af347b42f7e9b54cd503a20d
2020-01-15 14:57:57 -08:00
01010f5705 Add comments to torch::nn::ConvTranspose{1,2,3}d modules explaining how to use them in a Sequential module (#32223)
Summary:
Following changes in https://github.com/pytorch/pytorch/pull/31005.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32223

Differential Revision: D19415328

Pulled By: yf225

fbshipit-source-id: f6f74f10ba3b5cc7e1a92f8b02ea4c9747018ae8
2020-01-15 14:53:33 -08:00
a5161c7022 Update out-of-date comment on Docker image updates. (#32224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32224

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19416878

Pulled By: ezyang

fbshipit-source-id: 0205d0635658a3328128dcaad94bbbef505342be
2020-01-15 14:30:58 -08:00
322f34b245 Adding DDP Design Note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32158

Test Plan: Imported from OSS

Differential Revision: D19405980

Pulled By: mrshenli

fbshipit-source-id: 808ef1c71b637546f8872375bf1828967b1a5a60
2020-01-15 14:10:45 -08:00
74621ca926 Add allgather_base as per our discussion re: ProcessGroup interface. (#31892)
Summary:
Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31892

Test Plan: No functional changes, no tests yet.

Differential Revision: D19290739

Pulled By: agolynski

fbshipit-source-id: c2f4947d2980995724c539de7c6d97618e1ba11a
2020-01-15 14:05:23 -08:00
81048c41ab remove simple .data from torch/nn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31482

Test Plan: Imported from OSS

Differential Revision: D19303243

Pulled By: albanD

fbshipit-source-id: 5afdfeb4b8382c09b9ec65acd545148ed76d4285
2020-01-15 12:40:38 -08:00
3363ca20a7 example_outputs Doc Edit (#31826)
Summary:
The torch.onnx.export docs contained two descriptions of the 'example_outputs' arg,
so this combines the information into a single description alongside the other parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31826

Differential Revision: D19274928

Pulled By: zou3519

fbshipit-source-id: cbcce0a79c51784c1d7aa8981aab8aac118ca9b4
2020-01-15 12:34:34 -08:00
3d01e3d16f Notify other threads before running callbacks (#31713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31713

- In case the callbacks are heavy/slow, other threads should be able to start working on the value of the future as soon as the current thread moves the value and unlocks the mutex (see the sketch below).
- Inline `completed()` to avoid function call overhead.
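
A runnable sketch of the ordering change (a simplified stand-in, not the actual Future code):

```
import threading

class MiniFuture:
    def __init__(self):
        self._cv = threading.Condition()
        self._done = False
        self._value = None
        self._callbacks = []

    def add_done_callback(self, cb):
        with self._cv:
            if self._done:
                cb(self._value)
            else:
                self._callbacks.append(cb)

    def set_result(self, value):
        with self._cv:
            self._value = value
            self._done = True
            callbacks = list(self._callbacks)
            self._cv.notify_all()   # wake waiters first...
        for cb in callbacks:        # ...then run heavy callbacks outside the lock
            cb(value)

    def wait(self):
        with self._cv:
            self._cv.wait_for(lambda: self._done)
            return self._value
```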

ghstack-source-id: 96694593

Test Plan: tbd

Differential Revision: D5624371

fbshipit-source-id: 5762e6e894d20108ec9afedd1a6e64bcd97ee3fe
2020-01-15 12:03:07 -08:00
0392e8384b Fix simple typo: whos -> whose (#31288)
Summary:
Closes https://github.com/pytorch/pytorch/issues/31287
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31288

Differential Revision: D19166753

Pulled By: zou3519

fbshipit-source-id: da31ad323b8fafa7cbc502fda4e2eb6e02facfb6
2020-01-15 11:47:21 -08:00
4314620ba0 [jit] Module clone work with shared ClassType (#31970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31970

Now that a ClassType can be shared among different module instances, we preserve
the sharing in clone as well: if the original module has a shared ClassType,
we clone that ClassType once and share it among the cloned module instances.

Test Plan:
build/test/test_jit

Imported from OSS

Differential Revision: D19406251

fbshipit-source-id: 2881c695f6e718e5432040a3817cf187a62017bf
2020-01-15 11:24:53 -08:00
62b06b9fae Rename TensorTypeId to DispatchKey (#32154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32154

TensorTypeId -> DispatchKey
	c10/core/TensorTypeId.h -> c10/core/DispatchKey.h
	c10/core/TensorTypeId.cpp -> c10/core/DispatchKey.cpp
	TensorTypeId::* -> DispatchKey::*
	TensorTypeId type_id -> DispatchKey dispatch_key
		type_id -> dispatch_key
	TensorTypeId::NumTensorIds -> DispatchKey::NumDispatchKeys
	RealTensorTypeId -> RealDispatchKey
TensorTypeSet -> DispatchKeySet
	TensorTypeIds -> DispatchKeys
	c10/core/TensorTypeSet.h -> c10/core/DispatchKeySet.h
	c10/core/TensorTypeSet.cpp -> c10/core/DispatchKeySet.cpp
	type_set() -> key_set()
	type_set_ -> key_set_
	typeSet -> keySet
ExcludeTensorTypeIdGuard -> ExcludeDispatchKeyGuard
IncludeTensorTypeIdGuard -> IncludeDispatchKeyGuard
LocalTensorTypeSet -> LocalDispatchKeySet
	c10/core/impl/LocalTensorTypeSet.h -> c10/core/impl/LocalDispatchKeySet.h
	c10/core/impl/LocalTensorTypeSet.cpp -> c10/core/impl/LocalDispatchKeySet.cpp
	tls_local_tensor_type_set -> tls_local_dispatch_key_set
	tls_is_tensor_type_id_excluded -> tls_is_dispatch_key_excluded
	tls_set_tensor_type_id_excluded -> tls_set_dispatch_key_excluded
	tls_is_tensor_type_id_included -> tls_is_dispatch_key_included
	tls_set_tensor_type_id_included -> tls_set_dispatch_key_included
MultiDispatchTensorTypeSet -> MultiDispatchKeySet
	multi_dispatch_tensor_type_set -> multi_dispatch_key_set
tensorTypeIdToBackend -> dispatchKeyToBackend
backendToTensorTypeId -> backendToDispatchKey
initForTensorTypeSet -> initForDispatchKeySet
inferred_type_set -> inferred_key_set
computeTensorTypeId -> computeDispatchKey
PODLocalTensorTypeSet raw_local_tensor_type_set -> PODLocalDispatchKeySet raw_local_dispatch_key_set
get_default_tensor_type_id -> get_default_dispatch_key
inferred_type_id -> inferred_dispatch_key
actual_type_id -> actual_dispatch_key
typeSetToDispatchKey_ -> dispatchKeySetToDispatchKey_
get_type_id() -> get_dispatch_key()
legacyExtractTypeId -> legacyExtractDispatchKey
extractTypeId -> extractDispatchKey

Test Plan: Imported from OSS

Differential Revision: D19398900

Pulled By: pbelevich

fbshipit-source-id: 234ad19f93d33e00201b61e153b740a339035776
2020-01-15 11:16:08 -08:00
8c3ee9f2ba [Python] Deprecate use of scipy.misc.logsumexp and scipy.misc.comb (#32209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32209

* Deprecate use of scipy.misc.logsumexp and scipy.misc.comb.
* Removed in 1.0.0 https://docs.scipy.org/doc/scipy-1.1.0/reference/generated/scipy.misc.logsumexp.html and https://docs.scipy.org/doc/scipy-1.2.1/reference/generated/scipy.misc.comb.html
* Use scipy.special.logsumexp and scipy.special.comb instead (see the snippet below).
* This diff updates most usages of except those in experimental folders.
* This diff does NOT fix existing lint/code/TARGETS issues.
* This diff does NOT autoformat codes.
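
The replacement imports, as recommended by SciPy:

```
# scipy.misc.logsumexp and scipy.misc.comb were removed; use scipy.special.
from scipy.special import logsumexp, comb
```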

Test Plan: sandcastle auto unittests

Differential Revision: D19406460

fbshipit-source-id: 2103fa0d674d9671a0175f4ce54b3c887d22f04e
2020-01-15 10:40:47 -08:00
05088da8e9 [pytorch][PR] Fixed error in sample code of documentation (#31682)
Summary:
"in_features" and "out_features" are not defined. Possibly a typo. They should be "input_features" and "output_features" instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31682

Differential Revision: D19251685

Pulled By: zou3519

fbshipit-source-id: ac9e524e792a1853a16e8876d76b908495d8f35e
2020-01-15 10:34:07 -08:00
ef0f96e92f [pytorch][PR] update comment in autograd.h for locking (#32222)
Summary:
Just update the comment to make it accurate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32222

Differential Revision: D19410428

Pulled By: albanD

fbshipit-source-id: ad13596382613c2728e674a47049ea4f563964b9
2020-01-15 09:42:24 -08:00
19bbb4fccb Stop building documentation in pytorch_linux_xenial_cuda*_build (#32187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32187

Fixes #32058. Previously we would build documentation during the pytorch
linux cuda build. We don't actually need to do this because we have a
dedicated python_doc_build job that builds the docs. With this change,
the CUDA build should run ~10 minutes faster, giving devs faster signal.

Test Plan: - Check the CUDA (10.1) build on this PR, make sure it doesn't build the docs.

Differential Revision: D19400417

Pulled By: zou3519

fbshipit-source-id: e8fb2b818146f33330e06760377a9afbc18a71ed
2020-01-15 07:48:42 -08:00
4dce482acb dict type unification fix (#32185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32185

Previously we would unify the contained types of dictionaries; however, this breaks type safety:
```
from typing import Dict

@torch.jit.script
def test(input: Dict[str, None], cond: bool):
    if cond:
        out = input
    else:
        out = {"1": 1}
    out["hi"] = 3
```

This would only occur if a dictionary is being re-assigned across an if condition with different contained types, which is pretty unlikely. I tested `model_backward_compatibility` for all fb models and this didn't break anything. This PR is a precursor to alias analysis changes.

Also fixes `Future` type unification. Because `Future` is an immutable type, it is okay to unify the contained type.

Test Plan: Imported from OSS

Differential Revision: D19398585

Pulled By: eellison

fbshipit-source-id: ebc8812cdf5b6dba37b1cfbc2edc7d8c467b258c
2020-01-14 23:02:05 -08:00
c70bb0a4f8 Fixes to prim ops (#32179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32179

Tensors are used as keys in dictionaries, so we need to annotate that key insertion into a dictionary inserts the key into the wildcard set. Also fixes a bug with `listCopyAndSort` not copying the input list.

Test Plan: Imported from OSS

Differential Revision: D19397555

Pulled By: eellison

fbshipit-source-id: 17acdc22ff5e2dda44fd25c80450396f5592095e
2020-01-14 22:58:29 -08:00
879620e85e [caffe2] fix how np.clip is used in lengths_reducer_fused_{4,8}_rowwise_ops_test (#32086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32086

np.clip(1, num_indices // 2, 10) -> np.clip(num_indices // 2, 1, 10)
Also renames batchsize -> num_rows to match what the variable actually does
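
The first positional argument of `np.clip` is the value being clipped, which is what made the original call wrong:

```
import numpy as np

np.clip(7, 1, 10)  # -> 7: clips the value 7 into [1, 10]
np.clip(0, 1, 10)  # -> 1
# The bug: np.clip(1, num_indices // 2, 10) clipped the constant 1,
# not num_indices // 2.
```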

Test Plan: CI

Reviewed By: hx89

Differential Revision: D19361521

fbshipit-source-id: 9ce864c7d7da046dc606afa5207da677ccf80f52
2020-01-14 22:53:28 -08:00
7ad03855dc Fix 'template' keyword warning with clang-cl and clang.exe (#32104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32104

Fixes these warnings:
```
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(96,17): warning: use 'template' keyword to treat 'data' as a dependent template name
            W.t.data<uint8_t>(),
                ^
                template
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(97,17): warning: use 'template' keyword to treat 'data' as a dependent template name
            B.t.data<int32_t>(),
                ^
                template
```

Test Plan: Tested locally with clang-cl and CI for other toolchains

Reviewed By: boguscoder

Differential Revision: D19353563

fbshipit-source-id: c28afb8c1ad72fd77ef82556ba89fcf09100d1f9
2020-01-14 20:09:35 -08:00
02f09a1bbd Implement backend-agnostic rpc._wait_all_workers() utility (#32190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32190

We need a backend-agnostic mechanism to do barrier-like operation before locally destroy RRef context and shutdown RPC Agent.

- Sort worker names.
- Elect the first name as the leader in the ordered worker names.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers report their intent to proceed, the leader sends the command to everyone to proceed (see the sketch below).
ghstack-source-id: 96693296
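
A runnable single-process sketch of the scheme. The real version exchanges RPC messages; here reporting is simulated with shared state, and in the distributed version the first sorted name (the leader) owns the reported set and broadcasts the proceed command:

```
import threading

def make_barrier(worker_names):
    expected = set(worker_names)
    lock = threading.Lock()
    reported = set()
    proceed = threading.Event()

    def wait_all_workers(my_name):
        with lock:
            reported.add(my_name)      # report intent to synchronize
            if reported == expected:
                proceed.set()          # the "proceed" broadcast
        proceed.wait()                 # block until everyone has reported

    return wait_all_workers

barrier = make_barrier(["worker1", "worker0", "worker2"])
threads = [threading.Thread(target=barrier, args=(n,))
           for n in ("worker0", "worker1", "worker2")]
for t in threads: t.start()
for t in threads: t.join()
```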

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn

buck-out/gen/caffe2/test/rpc_spawn\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_spawn\#binary.par -r test_rref_leak
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```

# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```

Differential Revision: D19399908

fbshipit-source-id: 1dee607cd49adafe88534621a1c85e2736e2f595
2020-01-14 19:19:14 -08:00
7572501d40 move ProcessGroupGlooTest to gtest (#32133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32133

We should do this to better debug the test.

Differential Revision: D19375479

fbshipit-source-id: 8c2bf61bae605a38252bb793b091ade479bea11a
2020-01-14 17:42:42 -08:00
8dc67a014f Add cummax
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32169

Differential Revision: D19393236

Pulled By: anjali411

fbshipit-source-id: 5dac6b0a4038eb48458d4a0b253418daeccbb6bc
2020-01-14 17:19:10 -08:00
02c3493a84 Fix an invalid peephole transformation if input/output values are written to (#28455)
Summary:
fixes https://github.com/pytorch/pytorch/issues/28360
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28455

Differential Revision: D19374601

Pulled By: Krovatkin

fbshipit-source-id: 622f24b40aba03e79e55a6b8d25d88417f7d8bad
2020-01-14 16:28:07 -08:00
2bd179147a Fix typo in config script to re-enable libtorch build and test in macOS CI (#32072)
Summary:
Currently, libtorch build and test are not running in macOS CI. This PR fixes the issue.

**Test Plan:**
Check that libtorch build and test are running again in macOS CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32072

Differential Revision: D19391909

Pulled By: yf225

fbshipit-source-id: 1ab345b099869f78e1124f1a8bd185fa51371b6a
2020-01-14 16:23:57 -08:00
f6f1e0aef5 Automatic update of fbcode/onnx to 65020daafa9183c769938b4512ce543fd5740f8f (#32125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32125

Previous import was 57ebc587fcf3913b4be93653b0dd58c686447298

Included changes:
- **[65020daa](https://github.com/onnx/onnx/commit/65020daa)**: better error message for undefined inputs (#2540) <Yuxin Wu>
- **[8afff0e9](https://github.com/onnx/onnx/commit/8afff0e9)**: bump ORT version (#2538) <Lu Fang>
- **[3d9ca57e](https://github.com/onnx/onnx/commit/3d9ca57e)**: fix name of directory (#2537) <Prasanth Pulavarthi>
- **[df8fa2c9](https://github.com/onnx/onnx/commit/df8fa2c9)**: Repository guidelines (#2539) <Prasanth Pulavarthi>
- **[49cc2f02](https://github.com/onnx/onnx/commit/49cc2f02)**: Update CircleCI job to use Python3.6 (#2527) <bddppq>
- **[25ff79a4](https://github.com/onnx/onnx/commit/25ff79a4)**: Fix wrong model version, it's not 12 (the onnx_opset_version()), not 11 (the opset version of the latest stable), but 10 (#2478) <daquexian>
- **[7cebaed5](https://github.com/onnx/onnx/commit/7cebaed5)**: Fix Windows py3.5 CI (#2529) <bddppq>
- **[eddae00e](https://github.com/onnx/onnx/commit/eddae00e)**: Correct the order of arguments of InferShapes (#2500) <Shinichiro Hamaji>
- **[41b5afe6](https://github.com/onnx/onnx/commit/41b5afe6)**: Include <ostream> in common/status.h (#2519) <Casey Carter>
- **[423f1977](https://github.com/onnx/onnx/commit/423f1977)**: add 8 bit support to maxpool op (#2510) <Ashwini Khade>
- **[78593c2f](https://github.com/onnx/onnx/commit/78593c2f)**: add 8 bit support to reducemin and reducemax ops (#2516) <Ashwini Khade>

Test Plan: cont build

Reviewed By: benoitsteiner

Differential Revision: D19380034

fbshipit-source-id: ddce8450864a611773b2a32e2f0254c9bb6b6906
2020-01-14 15:21:37 -08:00
f3b67bf750 Fix frontend kwarg defaults error (#32146)
Summary:
This was not tested before; fixes #32139 (which was actually a false positive: functions with kwargs but without defaults on those kwargs are supported). This PR adds testing for both cases and cleans up the error reporting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32146

Pulled By: driazati

Differential Revision: D19385828

fbshipit-source-id: 5eab74df6d02f8e1d7ec054cafb44f909f9d637e
2020-01-14 14:59:36 -08:00
ecc3497172 Update Gemfile (#32147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32147

### Summary

Got some security warnings regarding the ruby dependencies. This diff updates the packages in Gemfile.

```
GitHub has detected that a package defined in the ios/TestApp/Gemfile.lock file of the pytorch/pytorch repository contains a security vulnerability.

Package name: excon
Affected versions: < 0.71.0
Fixed in version: 0.71.0
Severity: LOW

Identifier(s):
GHSA-q58g-455p-8vw9
CVE-2019-16779
```

### Test Plan

- Won't affect the existing iOS CI jobs

Test Plan: Imported from OSS

Differential Revision: D19400087

Pulled By: xta0

fbshipit-source-id: 34b548d136cfd6b68fcc53bf0b243461bd7afd64
2020-01-14 14:52:50 -08:00
9bf0479b65 Fix the passing-by-ref constructor of OperatorName. (#32170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32170

Stack from [ghstack](https://github.com/ezyang/ghstack):
Change the overload name from being passed by const ref to being passed by value and moved.
* **#32170 Fix the passing-by-ref constructor of OperatorName.**

Test Plan: Imported from OSS

Differential Revision: D19396225

Pulled By: iseeyuan

fbshipit-source-id: e946c47647e1f8d23d7565cfe93f487845e7f24c
2020-01-14 13:52:12 -08:00
51a34545e9 Revert D18482934: support torch script call over rpc
Test Plan: revert-hammer

Differential Revision:
D18482934

Original commit changeset: bd82a0d820c4

fbshipit-source-id: ca5e50fb0a883ee311aeb310198d84ad28062158
2020-01-14 13:30:56 -08:00
4a26bb9b18 Suppress pip logs (#31912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31912

### Summary

Clean up the logs from pip-install.

### Test Plan

- Don't break the iOS simulator build

Test Plan: Imported from OSS

Differential Revision: D19395526

Pulled By: xta0

fbshipit-source-id: a638a209cab801ce90c8615e7ea030b1ab0939f3
2020-01-14 12:04:53 -08:00
2bb9dbeffa omit constexpr with nvcc on clang (#32149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32149

This is an attempt at clarifying some of the preprocessor boolean logic that was getting more and more complicated. The previous logic used constexpr with nvcc on clang, which caused compiler failures in ovrsource with mode/linux/* (based on platform007).

Test Plan:
ovrsource xplat/caffe2 compiles
fbsource sandcastle green

Differential Revision: D19385409

fbshipit-source-id: 60a02bae9854388b87510afdd927709673a6c313
2020-01-14 11:49:16 -08:00
b0ac425dc4 Emit warning from deprecated torch function signatures (#32009)
Summary:
Continuation of https://github.com/pytorch/pytorch/issues/31514, fixes https://github.com/pytorch/pytorch/issues/28430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32009

Test Plan:
I verified that the deprecation warnings only occur once on a relevant workflow. Built with:

```
buck build mode/opt //vision/fair/detectron2/tools:train_net
```

Ran with:

```
DETECTRON2_ENV_MODULE=detectron2.fb.env ~/local/train_net.par --config-file configs/quick_schedules/retinanet_R_50_FPN_instant_test.yaml --num-gpus 1 SOLVER.IMS_PER_BATCH 2
```

Inspected log:

```
[01/14 07:28:13 d2.engine.train_loop]: Starting training from iteration 0
buck-out/opt/gen/caffe2/generate-code=python_variable_methods.cpp/python_variable_methods.cpp:1299: UserWarning: This overload of add is deprecated:
add(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add(Tensor other, Number alpha)
buck-out/opt/gen/caffe2/generate-code=python_variable_methods.cpp/python_variable_methods.cpp:1334: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, Number alpha)
[01/14 07:28:25 d2.utils.events]: eta: 0:00:10  iter: 19  total_loss: 1.699  loss_cls: 1.185  loss_box_reg: 0.501  time: 0.5020  data_time: 0.0224  lr: 0.000100  max_mem: 3722M
[01/14 07:28:35 fvcore.common.checkpoint]: Saving checkpoint to ./output/model_final.pth
```
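
The migration the warning points users toward, in short:

```
import torch

t = torch.ones(3)
# Deprecated overload (now warns): add(Number alpha, Tensor other)
# t.add(2, torch.ones(3))
# Recommended overload: add(Tensor other, Number alpha)
t.add(torch.ones(3), alpha=2)
```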

Differential Revision: D19373523

Pulled By: ezyang

fbshipit-source-id: 75756de129645501f43ecc4e3bf8cc0f78c40b90
2020-01-14 11:44:29 -08:00
61e509b992 Skip un-runnable tests (#31965)
Summary:
`test_init_ops` calls `orthogonal_` which fails without lapack (this test was just missing a skip condition)

The cpp tests would fail with an `undefined symbol` error if run with `BUILD_TESTS=0`, so this PR skips them if that flag is `0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31965

Pulled By: driazati

Differential Revision: D19320064

fbshipit-source-id: d1dcd36714107688ded25a414e8969abe026bd03
2020-01-14 11:36:52 -08:00
0664c6bbfd Add ccls cache to gitignore (#31437)
Summary:
`ccls` [puts a cache](https://github.com/MaskRay/ccls/wiki/Customization#cachedirectory) in the working directory by default, this PR adds it to gitignore so git doesn't pick it up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31437

Pulled By: driazati

Differential Revision: D19165007

fbshipit-source-id: 41012eb0ece2df60b8566d7929710b154c38ee66
2020-01-14 11:27:18 -08:00
b783a75aa3 Fix scalar^tensor derivative for scalars that are zero
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32063

Test Plan: Imported from OSS

Differential Revision: D19394258

Pulled By: agolynski

fbshipit-source-id: 3eed0f9cc1b8c677c6948c927d007044be67fe7f
2020-01-14 11:11:23 -08:00
fa60e1150d Fix tensor^tensor derivative for 0 base entries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32062

Test Plan: Imported from OSS

Differential Revision: D19394259

Pulled By: agolynski

fbshipit-source-id: 836525e03573af838511ad5b4cc87ec2c1536a5e
2020-01-14 11:10:25 -08:00
1487582ba7 Switch important CI from CUDA 9 to 10.1 (#31951)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31427
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31951

Differential Revision: D19393566

Pulled By: ezyang

fbshipit-source-id: 06f9637791494a453d3fbef765840dc9f9805196
2020-01-14 09:38:55 -08:00
dbd737158b support torch script call over rpc (#30063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30063

This diff makes the following changes:
1. Provides a new set of private Python RPC APIs that can accept an annotated TorchScript call; such a call can be serialized, deserialized, and executed in C++ without the GIL. These private APIs will be bound to JIT in the future, and they differ from the public APIs in that the future JIT-bound private APIs will accept a qualified_name, not callables. These private APIs are subject to deprecation once JIT supports a TorchScript function as a JIT type.

Also, these APIs require the TorchScript function to be defined and annotated by users in Python land; it can not be a script class/module constructor or a class/module method.

2. This diff also allows public RPC APIs to accept an annotated TorchScript call and execute the same code path the above private APIs run on. Therefore, if users invoke an annotated TorchScript call over RPC, that call can be serialized, deserialized, and executed in C++ without the GIL as well (see the sketch after this list).

3. The above private APIs call a newly defined C++ function to make the RPC TorchScript call serializable, deserializable, and executable in C++ land. This C++ function returns an ivalue::Future, so that in a follow-up diff it can be called when these private APIs are bound to JIT.

4. script_call.cpp/.h and request_callback_impl.cpp are refactored accordingly so that TorchScript calls and builtin calls can share the same message type and code.

5. Refactored deserializeResponse() and added a new utility to deserialize a response to an IValue.
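
A sketch of the user-facing shape this enables ("worker1" and the `init_rpc` setup are assumed and omitted here):

```
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def script_add(t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
    return t1 + t2

# On an initialized RPC worker:
# result = rpc.rpc_sync("worker1", script_add,
#                       args=(torch.ones(2), torch.ones(2)))
```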

ghstack-source-id: 96638829

Test Plan: unit test

Differential Revision: D18482934

fbshipit-source-id: bd82a0d820c47a8e45b2e7c616eca06573f7d7ea
2020-01-14 09:27:04 -08:00
5f1a881cb8 Add private user tensor type IDs for experimentation. (#31830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31830

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19330312

Pulled By: ezyang

fbshipit-source-id: fe2e53e732e946088e983ec45fed2393436f0517
2020-01-14 09:01:03 -08:00
8d472bab6b Make torch.backends.mkldnn usable without import
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32055

Differential Revision: D19373220

Pulled By: ezyang

fbshipit-source-id: 50ab3ff70fc893c81123419c4d3cf2e3e48a0a93
2020-01-14 08:19:19 -08:00
77c78b7d28 remove .data from torch/nn doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31481

Test Plan: Imported from OSS

Differential Revision: D19303242

Pulled By: albanD

fbshipit-source-id: 4f650df9e9e302a299175967bcc6e30a5099fa2a
2020-01-14 07:30:42 -08:00
c036fbdc5c remove .data from torch/jit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31480

Test Plan: Imported from OSS

Differential Revision: D19303244

Pulled By: albanD

fbshipit-source-id: ec66b32353f2f9b16072185ecde3ae8abbe09a35
2020-01-14 07:30:37 -08:00
26621d101f remove simple .data from torch/nn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31482

Test Plan: Imported from OSS

Differential Revision: D19303185

Pulled By: albanD

fbshipit-source-id: 610eae096bab24a7b9f651b9af2e3ecd19df55b0
2020-01-14 07:29:24 -08:00
62b1a5f846 Updating submodules
Summary:
GitHub commits:

2156e48924
8c5b4af317
be69716784
4f76ad1fab
0b12b2f13c
0449b53cb1
1481689822
43ffa9bbf0
787d6b6c93

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: b0080fd1a4c26efbe8f26245fbba7740fbac08f3
2020-01-13 20:15:38 -08:00
a472f0201f Added support for Dim operation in ONNX export (#31928)
Summary:
While ONNX does not currently support the Dim operation on a tensor directly,
we can provide the same functionality with two ONNX operations, which lets us
support Dim for all opsets. It may be advantageous to add support for Dim to a
future ONNX opset and use that for more efficient code.
While testing the dim op, we found an issue with empty blocks within if
statements, and modified graph generation to prevent generation of empty if
blocks.

Fixes https://github.com/pytorch/pytorch/issues/27569
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31928

Reviewed By: hl475

Differential Revision: D19376602

Pulled By: houseroad

fbshipit-source-id: 111682b058a5341f5cca6c1a950c83ae412a4c6c
2020-01-13 19:42:43 -08:00
c474952b5d Updating submodules
Summary:
GitHub commits:

1f8321394d
024c1d0b43
1d57089fc3
3c6f1f782c
21a27b0f8e
23bb716b62
894c6d21af
e3e241d700
ac4e11d84a
c35803ad68
647388f265
50a3288630
b197f0c95a

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 1807ac876a126d221c257edbd4732f9a1240e869
2020-01-13 18:07:08 -08:00
470c496eb2 use cholesky_inverse to compute precision matrix (#32092)
Summary:
Resolves a long-standing TODO. :D

I also fix the docs of lowrank_mvn, an issue raised on the [forum](https://discuss.pytorch.org/t/lowrankmultivariatenormal-example-raises-valueerror/65381).

cc vishwakftw
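
A minimal sketch of the idea (not the distributions-internal code; `torch.cholesky` is the factorization API of this era):

```
import torch

sigma = torch.tensor([[2.0, 0.5], [0.5, 1.0]])  # covariance matrix
L = torch.cholesky(sigma)                       # lower-triangular factor
precision = torch.cholesky_inverse(L)           # inverse via the factor,
                                                # preferable to torch.inverse(sigma)
```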
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32092

Differential Revision: D19373912

Pulled By: ezyang

fbshipit-source-id: b13129d7c30e87c6f8a6ced86601762a3f5c5624
2020-01-13 16:35:46 -08:00
f003008d6e Allow TCPStore to pick a port to bind to. (#31674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31674

The motivation of this PR was to fix the problem where we would see
"Address already in use" issues for TCPStoreTest due to port conflicts. To
resolve this:

1. We can now pass in port 0 for TCPStore and retrieve the port it actually
bound to using a new getPort() API (see the socket sketch below).
2. Added a `wait` flag to TCPStore constructor indicating whether or not it
should wait for workers (defaults to true).
3. Made `waitForWorkers` a public API to ensure that we can construct TCPStore
without waiting and wait for workers separately. This helps in TCPStoreTest to
ensure we can retrieve the port and pass it to the client stores.
ghstack-source-id: 96486845
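
The OS-level trick item 1 relies on, in plain Python sockets:

```
import socket

s = socket.socket()
s.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
print(s.getsockname()[1])     # query the port actually bound to
```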

Test Plan: waitforbuildbot

Differential Revision: D19240947

fbshipit-source-id: 7b1d1cb2730209fac788764845f1dbbe73d75d9b
2020-01-13 14:23:31 -08:00
632d6fc583 Revert D19373615: Fix typo in config script to re-enable libtorch build and test in macOS CI
Test Plan: revert-hammer

Differential Revision:
D19373615

Original commit changeset: 28686ef58953

fbshipit-source-id: 432b04adfd9d010e1965846a386f117ebc80e013
2020-01-13 14:11:30 -08:00
701ca68882 Docs entry for the is_quantized
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32075

Test Plan: Imported from OSS

Differential Revision: D19353861

Pulled By: z-a-f

fbshipit-source-id: 4249216ac9a4af354a251c62181d65bc14cbfd3e
2020-01-13 13:54:35 -08:00
d53ce5e4cd Updating submodules
Summary:
GitHub commits:

b5718e35c8
e1af1b0550
8a34e7f444
e9e70ade5b
d9e693ece0
329347c63c
671b5aa064
7f3bb0bf37
6207e92b9b
d4b95d87d4

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 3c9131bdee0bf8a8ca5c679a95e8ff8a6f805762
2020-01-13 13:30:11 -08:00
d97413eb7a Change python/cpp docs CI to use a CPU-only image (#32102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32102

Previously, the docs CI depended on our CUDA xenial py3 build. This
meant that the turnaround time to get signal for docs was very slow
(I've seen builds that go as much as 3 hours).

Fortunately, the docs CI do not (and should not!) rely on CUDA. This
PR changes it so that the docs CI runs on a CPU-only machine.

Fixes #29995

Test Plan:
- Check CI status on this PR by reading logs for the python and cpp docs
builds.
- I built the docs locally, once for CPU, and once for CUDA, and
verified (via diff) that the pages were exactly the same)

Differential Revision: D19374078

Pulled By: zou3519

fbshipit-source-id: 3eb36f692c3c0632d2543d3439c822d51a87b809
2020-01-13 12:01:49 -08:00
1f34801460 More robust mangling (#31978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31978

Currently we keep a `mangleIndex_` that's internal to the compilation unit and
just increment the index when we find the original name is mangled; this doesn't
guarantee the new name is not defined.
This PR fixes the problem by querying whether the new name is defined or not (a sketch follows).
fixes: https://github.com/pytorch/pytorch/issues/31268
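
A sketch of the fix, where `defined` stands in for the compilation unit's name lookup:

```
def mangle(name, defined):
    i = 0
    # Don't just bump a counter: verify the candidate is actually unused.
    while f"{name}.{i}" in defined:
        i += 1
    return f"{name}.{i}"

assert mangle("foo", {"foo.0", "foo.1"}) == "foo.2"
```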

Test Plan:
fixes the issue

Imported from OSS

Differential Revision: D19350535

fbshipit-source-id: fe3262b2838d4208ab72e2cd4a5970b3a792ae86
2020-01-13 11:11:50 -08:00
a3dd44653f Fix typo in config script to re-enable libtorch build and test in macOS CI (#32072)
Summary:
Currently, libtorch build and test are not running in macOS CI. This PR fixes the issue.

**Test Plan:**
Check that libtorch build and test are running again in macOS CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32072

Differential Revision: D19373615

Pulled By: yf225

fbshipit-source-id: 28686ef5895358a2b60db46b1946f21c58c6a18e
2020-01-13 10:25:10 -08:00
5988d36f58 Fix cumprod error for tensors with zero elements (#32070)
Summary:
Currently cumprod crashes for tensors with non-empty dimensions but zero elements, which can happen when some dimension is zero. This commit fixes the error by checking both dim() and numel() in cumprod backward.
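
A minimal repro of the shape class described (with the fix, this now runs cleanly):

```
import torch

x = torch.randn(0, 5, requires_grad=True)  # non-empty dims, zero elements
y = x.cumprod(dim=1)
y.sum().backward()                         # previously errored in backward
```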
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32070

Differential Revision: D19373200

Pulled By: ezyang

fbshipit-source-id: d8ecde33f3330b40a7c611f6faa3b1d707ef2a9a
2020-01-13 09:50:27 -08:00
695c4f1bab Fix a typo in function name: liner -> linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32068

Test Plan: Imported from OSS

Differential Revision: D19373360

Pulled By: nairbv

fbshipit-source-id: 7696300b5c1dbcd7991fda3311d68807b2960982
2020-01-13 09:33:50 -08:00
8e93159fb6 CUDA 8 cleanup (#32013)
Summary:
CUDA 8 is no longer supported
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32013

Differential Revision: D19372963

Pulled By: ezyang

fbshipit-source-id: e584d7d5d5908933221ea4400234b3e6e7c32e7a
2020-01-13 08:48:48 -08:00
9a4219eb39 Install complete set of headers for ROCm build (#32076)
Summary:
This PR adds a more complete list of pytorch header files to be installed at build time. It also fixes one instance of including a header from local src directory instead of installed directory.
A more complete set of headers enable other modules to correctly work with pyTorch built for ROCm.

cc: ezyang bddppq iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32076

Differential Revision: D19372933

Pulled By: ezyang

fbshipit-source-id: 3b5f3241c001fa05ea448c359a706ce9a8214aa0
2020-01-13 08:33:28 -08:00
4002fec509 Display NVCC version in CI for convenience to look at
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32069

Differential Revision: D19372943

Pulled By: ezyang

fbshipit-source-id: c78e5779d4139e42df1f235db65d8c0399ffa1a2
2020-01-13 08:16:52 -08:00
e74a215ade Changed clip_grad_norm_ total_norm calculation (#32020)
Summary:
Redefines the computation of the total_norm to increase performance as shown in https://github.com/pytorch/pytorch/issues/31474.
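
A sketch of the vectorized norm-of-norms computation along the lines of the linked issue (the gradient setup here is purely illustrative):

```
import torch

params = [torch.randn(4, requires_grad=True) for _ in range(3)]
torch.stack(params).sum().backward()   # populate .grad for the illustration
grads = [p.grad for p in params if p.grad is not None]
# One stacked norm-of-norms instead of a Python-level accumulation:
total_norm = torch.norm(torch.stack([torch.norm(g, 2.0) for g in grads]), 2.0)
```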
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32020

Differential Revision: D19353309

Pulled By: ngimel

fbshipit-source-id: bf7530dcd39f56614a211b5f21445864d4f2e875
2020-01-13 08:13:46 -08:00
77c2c78e01 Fix typographical error in torch.triu docstring (#32067)
Summary:
below --> above

Fixes https://github.com/pytorch/pytorch/issues/32032
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32067

Differential Revision: D19355788

Pulled By: zou3519

fbshipit-source-id: dc7a2538a78cd11e72d47ad923ef50599a5a87e2
2020-01-13 07:21:33 -08:00
14593f077f remove list specialization from ivalue (#30734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734

What are specialized lists?

The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.

Why do we have specialized lists?

When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.

What is the problem with specialized lists?

We end up with significant special-casing throughout the compiler. Other types like Dict are not
specialized, so in the Pickler, for instance, there is a single piece of logic to handle
their serialization, while for Lists we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different from Python's, so it is harder to load objects from our IValue serialization
as Python values.

They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like

```
template<typename T>
List(std::vector<T> foo);
```

It would then set up the type tags correctly based on type T, without the need for passing tags.

Do specialized lists improve perf?

Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.

What are the issues with removing them?

This PR removes list specialization but keeps the serialization format and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector
because they now return by value a copy of the list as a vector.

Further PRs can then clean up the complexity issues that arose from specialization. This will likely
involve removing the isTensorList/isIntList functions, and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. This is forward incompatible, so will go in its own PR.

Benchmark:
```
import torch

import torch.nn as nn
import torch.nn.functional as F
import time

class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x

model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x )
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())

while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```

Results (no observable difference):

```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------

The following replace the convolution operator with a no-op, to show
that even if the conv op was made faster, then we still would not see
a difference:

Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```

Test Plan: Imported from OSS

Differential Revision: D18814702

Pulled By: zdevito

fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
2020-01-12 18:28:25 -08:00
46f32e136a Revert "Support PyTorch ROCm CI on Ubuntu18.04 (#31886)" (#31946)
Summary:
This reverts commit 4ee9c562188ae930cb2520cfce7805f55acaf968.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31946

Differential Revision: D19368391

Pulled By: bddppq

fbshipit-source-id: 63d032a5256ff4da7247fb1092be314c5b133eb6
2020-01-12 14:04:38 -08:00
927c2a02b0 enable autograd profiler to work with RPC and RRef. (#31381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31381

This PR adds support for being able to profile both sync and async RPCs, so that users can use the autograd profiler and be able to view metrics such as RPC latency and number of calls in the profiler output.

The way this is implemented is by using the existing `RecordFunction` class provided by the autograd profiler. We create a `RecordFunction` instance when sending an RPC, if autograd profiling is enabled. We also invoke the starting callbacks on this `RecordFunction` instance, this does things such as start the CPU timer.  This instance is then persisted across the lifetime of the RPC by attaching it to the `Future` created by the RPC. When the RPC is finished (i.e. when `future->markComplete()` is called), we run the `RecordFunction` instance's end callbacks, which among other things, stops the timer so that we get the correct RPC latency.

The `RecordFunction` and relevant callbacks in `profiler.cpp` are modified slightly to support running end callbacks from a different thread (which is needed since futures are marked as completed by a different thread than the main RPC thread). By default, the autograd profiler uses a `thread_local` list of `Events` and `thread_id`. However, since we'd like to run the `RecordFunction`'s callbacks from a different thread, we would like to access the list of `Events` created by the original thread. This is done by attaching the `thread_id` for the event to the `RecordFunction`, and then looking up the event with that thread in `all_event_lists` (see the changes in `profiler.cpp`). To ensure that the original behavior does not change in the profiler, this described behavior is only run when a user calls `setOverrideThreadId()` on the `RecordFunction` object.
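
What this enables for users, roughly (RPC setup omitted; the profiled op below is a local stand-in for an `rpc_sync`/`rpc_async` call):

```
import torch

with torch.autograd.profiler.profile() as prof:
    torch.add(torch.ones(2), torch.ones(2))
print(prof.key_averages().table(sort_by="cpu_time_total"))
```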
ghstack-source-id: 96527291

Test Plan: Added a unit test.

Differential Revision: D19053322

fbshipit-source-id: 9a27a60c809fc4fdb16fa5d85085f3b6b21abfbb
2020-01-10 21:26:18 -08:00
20e5c90d82 accept url query when rank or world_size is specified (#32016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32016

The previous logic raised an exception when the URL contained a query and rank or world_size was specified.
The fix parses the URL, stitches rank and world_size into url.query, and regenerates the URL; a sketch follows.
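
A sketch of the stitching with `urllib` (the `use_libuv` query parameter is just an example):

```
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

url = "tcp://127.0.0.1:23456?use_libuv=0"
parts = urlparse(url)
query = parse_qs(parts.query)
query.update({"rank": ["0"], "world_size": ["2"]})
new_url = urlunparse(parts._replace(query=urlencode(query, doseq=True)))
print(new_url)  # tcp://127.0.0.1:23456?use_libuv=0&rank=0&world_size=2
```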

Test Plan: f161291877

Differential Revision: D19337929

fbshipit-source-id: 6bb3a07716dda5233553804000b706052ff18db8
2020-01-10 18:27:06 -08:00
b6cee03e29 C++ tensor indexing: add Slice / TensorIndex (#30424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30424

`at::indexing::TensorIndex` is used for converting C++ tensor indices such as `{None, "...", Ellipsis, 0, true, {1, None, 2}, torch::tensor({1, 2})}` into its equivalent `std::vector<TensorIndex>`, so that further tensor indexing operations can be performed using the supplied indices.
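
The Python indexing that the new C++ vocabulary mirrors:

```
import torch

t = torch.arange(64).reshape(4, 4, 4)
a = t[None, ..., 0]          # None -> unsqueeze, Ellipsis, integer select
b = t[:, 1:None:2]           # {1, None, 2} corresponds to slice(1, None, 2)
c = t[torch.tensor([1, 2])]  # advanced indexing with a tensor index
```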

Test Plan: Imported from OSS

Differential Revision: D18695902

Pulled By: yf225

fbshipit-source-id: d73e14a411cdbec815866b02e75ffd71a9186e89
2020-01-10 17:53:41 -08:00
638e4ad8b9 Updated function definition for torch.mode and torch.median in torch docs (#32003)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/32002
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32003

Differential Revision: D19334306

Pulled By: anjali411

fbshipit-source-id: fe6a7cc7295b2d582a0b528f353ec64d9085e8c5
2020-01-10 13:13:54 -08:00
28c1258f18 Scale init for batch-norm and layer-norm (#31983)
Summary:
Per discussion with Fei Tian, we need to add a `scale_init_value` to scale down the output of normalization layers such as batch-norm and layer-norm.

Currently we have `sparse_normalization_options` to normalize embedding pooling output. The default is scale = 1.0; we found it's better to set scale between 0.025 and 0.1 https://fb.quip.com/MiKUAibEaYhH

Besides, I am removing the tags from normalizers because it makes more sense to compute norm ops in distributed trainers, not on the PS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31983

Test Plan:
Testing LN and BN after sum-pooling --
baseline f160348514
LN: f160348609
BN: f160348710

{F226106518}

Layer norm after sum-pooling fwd_net https://fburl.com/sa4j207n
Layer norm after dot-prod fwd_net https://fburl.com/twggwyvb

## Unit Tests
Testing normalization after pooling
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_layer_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_layer_normalization
```

Testing normalization after dot-prod
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_batch_norm
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_layer_norm
```

Differential Revision: D19277618

Pulled By: SilunWang

fbshipit-source-id: ea323e33e3647ba55d2e808ef09d94ad7b45b934
2020-01-10 11:55:56 -08:00
c5af0afdcb catch exceptions in ProcessGroupAgent::enqueueSend and report them. (#31023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31023

Adds support to catch exceptions in ProcessGroupAgent::enqueueSend and
report them in the future by marking the future as completed with an exception
indicating the error. An example of when this could happen is if the receiving
side aborts while the sender is sending the message; previously, we would hang
until the timeout was hit, and the original exception would be lost.
ghstack-source-id: 96498386

Test Plan: Added a relevant unit test: `test_sender_exceptions` in rpc_test.py

Differential Revision: D18901981

fbshipit-source-id: 08de26936c4ad45b837219a247088cbea644c04c
2020-01-10 11:39:57 -08:00
346005d3ed integrate op dependency analysis process into CI
Summary:
Custom build and internal build will depend on the analysis result, so
let's make sure it doesn't break.

Tested locally with LLVM-5.0, LLVM-7 and LLVM-8.

Test Plan: - check CI result

Differential Revision: D18894637

Pulled By: ljk53

fbshipit-source-id: 657854e4bed85a84907e3b6638d158823a56ec80
2020-01-10 11:37:37 -08:00
16b8ca56b6 update docker image version (#31848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31848

Trigger docker image build and bump up docker image version.

Test Plan: - Check tag at: http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html

Differential Revision: D19282725

Pulled By: ljk53

fbshipit-source-id: a27b2831a92ff54d80ccbae0f18dadff0469254c
2020-01-10 11:37:32 -08:00
03ff3eb94d skip TEST_DILL on Python2 (#32027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32027

The test was added in #30985 for #28313. It seems the fix only works on
Python3 and fails on Python2. The current Python2 CI docker image
doesn't have the `dill` module installed at all, so the failure wasn't caught.

I'm trying to build and push a new CI docker image which has `dill` installed
(I verified it's the latest version, 0.3.1.1), but the fix doesn't seem
to work, which blocks me from upgrading the image version. It does work with the
Python3 docker image though...

Here is a succeeded job with old image (no dill installed):
https://app.circleci.com/jobs/github/pytorch/pytorch/4192688

Here is a failed job with new image (dill installed):
https://app.circleci.com/jobs/github/pytorch/pytorch/4192679

This PR bypasses the test on Py2 to unblock the docker image change. We
can figure out a proper fix for Py2 later.
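
For reference, a minimal sketch of the kind of version gate this amounts to (the guard and test names below are illustrative; the real change lives in the test suite):
```
import sys
import unittest

try:
    import dill  # noqa: F401
    HAS_DILL = True
except ImportError:
    HAS_DILL = False

class TestDillSerialization(unittest.TestCase):
    @unittest.skipIf(not HAS_DILL or sys.version_info[0] < 3,
                     "dill-based serialization fix only verified on Python 3")
    def test_dill_roundtrip(self):
        pass  # placeholder for the real dill save/load checks

if __name__ == "__main__":
    unittest.main()
```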

Test Plan: Imported from OSS

Differential Revision: D19341451

Pulled By: ljk53

fbshipit-source-id: d5768de8cbaf1beba8911da76f4942b8f210f2d2
2020-01-10 11:37:28 -08:00
ab5eb65e74 gate torch_global_deps with BUILD_SHARED_LIBS flag (#32011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32011

Run into build problem with Ninja + code analysis build as follows:
```
The install of the torch_global_deps target requires changing an RPATH from
the build tree, but this is not supported with the Ninja generator unless
on an ELF-based platform.
```

It seems we don't need to build this target in static build mode?

Verified that the code analyzer works with the patch.

Test Plan: Imported from OSS

Differential Revision: D19336818

Pulled By: ljk53

fbshipit-source-id: 37f45a9392c45ce92c1df40d739b23954e50a13a
2020-01-10 11:37:24 -08:00
f995ec2076 Remove qconfig_dict in top level eager mode quantization API (#31972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31972

Since eager mode quantization requires many user modifications, we can't
consistently quantize a given model by just changing the qconfig_dict; therefore
the top-level `qconfig_dict` argument is not that useful.
Fixes: https://github.com/pytorch/pytorch/issues/31549
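
With the top-level argument gone, the qconfig is attached to the model (or to submodules) directly. A rough sketch of that style, assuming the eager-mode entry points in torch.quantization and an x86 build with fbgemm (illustrative, not the exact migration):
```
import torch
import torch.nn as nn
import torch.quantization

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = M().eval()
# qconfig is set on the module tree, not passed as a top-level qconfig_dict.
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
prepared = torch.quantization.prepare(model)
prepared(torch.randn(1, 3, 8, 8))  # calibration pass
quantized = torch.quantization.convert(prepared)
print(quantized(torch.randn(1, 3, 8, 8)).shape)
```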

Test Plan:
.

Imported from OSS

Differential Revision: D19330691

fbshipit-source-id: 8aee6e5249e0c14e8a363ac1a83836e88887cd7d
2020-01-10 11:04:37 -08:00
c5a362a96d Updating submodules
Summary:
GitHub commits:

b14a430062
c1c5426018
42d18a93c4
a4e11e8721
25c971b0c3
b2ea65322f
e86573b6de
31d721301c
687119aeaf
25cad9547d
428862c045
95640f80d8
0e4db05b37
5cb83de9cc
4fdb800074

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: bcd533c540c1170844dbf2b23538d72c95a0d304
2020-01-10 11:01:20 -08:00
8098ae455c Move rshift to Aten (#31594)
Summary:
VitalyFedyunin, this PR moves rshift to ATen.
Benchmark script:
```
import timeit
import torch
torch.manual_seed(1)

for n, t in [(10, 100000),(1000, 10000)]:
    print('__rshift__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a >> b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))
        for dtype in ('torch.float32', 'torch.float64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a >> b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randn({n}, dtype = {dtype}, device="{device}"); b = torch.randn({n}, dtype = {dtype}, device="{device}")', number=t))

for n, t in [(10, 100000),(1000, 10000)]:
    print('__irshift__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a >> b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
        for dtype in ('torch.float32', 'torch.float64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a >> b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randn({n}, dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
Cuda verison: **9.0.176**

Before:
```
__rshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.17183916084468365
device: cpu, dtype: torch.uint8, 100000 times           0.16587729007005692
device: cpu, dtype: torch.int16, 100000 times           0.16659130714833736
device: cpu, dtype: torch.int32, 100000 times           0.17177579551935196
device: cpu, dtype: torch.int64, 100000 times           0.17860156949609518
device: cpu, dtype: torch.float32, 100000 times         0.23938780091702938
device: cpu, dtype: torch.float64, 100000 times         0.22591270506381989
device: cuda, dtype: torch.int8, 100000 times           1.2709560776129365
device: cuda, dtype: torch.uint8, 100000 times          1.2692269310355186
device: cuda, dtype: torch.int16, 100000 times          1.2785452520474792
device: cuda, dtype: torch.int32, 100000 times          1.2733035255223513
device: cuda, dtype: torch.int64, 100000 times          1.2785427365452051
device: cuda, dtype: torch.float32, 100000 times                1.2980637094005942
device: cuda, dtype: torch.float64, 100000 times                1.3062487514689565
__rshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.03122080024331808
device: cpu, dtype: torch.uint8, 10000 times            0.030290847644209862
device: cpu, dtype: torch.int16, 10000 times            0.024531075730919838
device: cpu, dtype: torch.int32, 10000 times            0.024743229150772095
device: cpu, dtype: torch.int64, 10000 times            0.025563121773302555
device: cpu, dtype: torch.float32, 10000 times          0.6707976600155234
device: cpu, dtype: torch.float64, 10000 times          0.5344798369333148
device: cuda, dtype: torch.int8, 10000 times            0.12768010422587395
device: cuda, dtype: torch.uint8, 10000 times           0.12681372743099928
device: cuda, dtype: torch.int16, 10000 times           0.12995595764368773
device: cuda, dtype: torch.int32, 10000 times           0.12989260721951723
device: cuda, dtype: torch.int64, 10000 times           0.12804713658988476
device: cuda, dtype: torch.float32, 10000 times         0.13013121113181114
device: cuda, dtype: torch.float64, 10000 times         0.1406280631199479
__irshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3805475188419223
device: cpu, dtype: torch.uint8, 100000 times           0.36341007333248854
device: cpu, dtype: torch.int16, 100000 times           0.36908434610813856
device: cpu, dtype: torch.int32, 100000 times           0.3669992135837674
device: cpu, dtype: torch.int64, 100000 times           0.37847711704671383
device: cpu, dtype: torch.float32, 100000 times         0.4311870699748397
device: cpu, dtype: torch.float64, 100000 times         0.44503832422196865
device: cuda, dtype: torch.int8, 100000 times           1.4343859804794192
device: cuda, dtype: torch.uint8, 100000 times          1.4298221375793219
device: cuda, dtype: torch.int16, 100000 times          1.4460898758843541
device: cuda, dtype: torch.int32, 100000 times          1.4518025070428848
device: cuda, dtype: torch.int64, 100000 times          1.4456725595518947
device: cuda, dtype: torch.float32, 100000 times                1.4610810624435544
device: cuda, dtype: torch.float64, 100000 times                1.4736663019284606
__irshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.05944254994392395
device: cpu, dtype: torch.uint8, 10000 times            0.058085592463612556
device: cpu, dtype: torch.int16, 10000 times            0.05094402376562357
device: cpu, dtype: torch.int32, 10000 times            0.050842881202697754
device: cpu, dtype: torch.int64, 10000 times            0.06223891582340002
device: cpu, dtype: torch.float32, 10000 times          0.7006897022947669
device: cpu, dtype: torch.float64, 10000 times          0.5614962242543697
device: cuda, dtype: torch.int8, 10000 times            0.1461706068366766
device: cuda, dtype: torch.uint8, 10000 times           0.14335164614021778
device: cuda, dtype: torch.int16, 10000 times           0.1448021186515689
device: cuda, dtype: torch.int32, 10000 times           0.14513055887073278
device: cuda, dtype: torch.int64, 10000 times           0.1439579650759697
device: cuda, dtype: torch.float32, 10000 times         0.14666561130434275
device: cuda, dtype: torch.float64, 10000 times         0.1540807681158185
```
After:
```
__rshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.16366520430892706
device: cpu, dtype: torch.uint8, 100000 times           0.16091545950621367
device: cpu, dtype: torch.int16, 100000 times           0.1659633992239833
device: cpu, dtype: torch.int32, 100000 times           0.1682385364547372
device: cpu, dtype: torch.int64, 100000 times           0.17289020214229822
device: cpu, dtype: torch.float32, 100000 times         0.24359441827982664
device: cpu, dtype: torch.float64, 100000 times         0.21783945057541132
device: cuda, dtype: torch.int8, 100000 times           1.2517220517620444
device: cuda, dtype: torch.uint8, 100000 times          1.260181212797761
device: cuda, dtype: torch.int16, 100000 times          1.2681935774162412
device: cuda, dtype: torch.int32, 100000 times          1.2764465296640992
device: cuda, dtype: torch.int64, 100000 times          1.294325228780508
device: cuda, dtype: torch.float32, 100000 times                1.3062216322869062
device: cuda, dtype: torch.float64, 100000 times                1.303224254399538
__rshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.027045012451708317
device: cpu, dtype: torch.uint8, 10000 times            0.026978280395269394
device: cpu, dtype: torch.int16, 10000 times            0.025594274513423443
device: cpu, dtype: torch.int32, 10000 times            0.02593063935637474
device: cpu, dtype: torch.int64, 10000 times            0.02668109256774187
device: cpu, dtype: torch.float32, 10000 times          0.09746317192912102
device: cpu, dtype: torch.float64, 10000 times          0.1644029449671507
device: cuda, dtype: torch.int8, 10000 times            0.12530914042145014
device: cuda, dtype: torch.uint8, 10000 times           0.12615622486919165
device: cuda, dtype: torch.int16, 10000 times           0.12741118855774403
device: cuda, dtype: torch.int32, 10000 times           0.1284919548779726
device: cuda, dtype: torch.int64, 10000 times           0.12974756956100464
device: cuda, dtype: torch.float32, 10000 times         0.13044228963553905
device: cuda, dtype: torch.float64, 10000 times         0.13918257877230644
__irshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.19456563983112574
device: cpu, dtype: torch.uint8, 100000 times           0.190769555978477
device: cpu, dtype: torch.int16, 100000 times           0.2002257639542222
device: cpu, dtype: torch.int32, 100000 times           0.20456529594957829
device: cpu, dtype: torch.int64, 100000 times           0.2043834924697876
device: cpu, dtype: torch.float32, 100000 times         0.2832390898838639
device: cpu, dtype: torch.float64, 100000 times         0.2582795573398471
device: cuda, dtype: torch.int8, 100000 times           1.304957083426416
device: cuda, dtype: torch.uint8, 100000 times          1.3216373259201646
device: cuda, dtype: torch.int16, 100000 times          1.3238621400669217
device: cuda, dtype: torch.int32, 100000 times          1.333009460940957
device: cuda, dtype: torch.int64, 100000 times          1.3835567953065038
device: cuda, dtype: torch.float32, 100000 times                1.4483617274090648
device: cuda, dtype: torch.float64, 100000 times                1.4179155295714736
__irshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.03196091763675213
device: cpu, dtype: torch.uint8, 10000 times            0.03048650734126568
device: cpu, dtype: torch.int16, 10000 times            0.03048624936491251
device: cpu, dtype: torch.int32, 10000 times            0.030591044574975967
device: cpu, dtype: torch.int64, 10000 times            0.031246556900441647
device: cpu, dtype: torch.float32, 10000 times          0.10918692220002413
device: cpu, dtype: torch.float64, 10000 times          0.18057993799448013
device: cuda, dtype: torch.int8, 10000 times            0.13614848721772432
device: cuda, dtype: torch.uint8, 10000 times           0.130373639985919
device: cuda, dtype: torch.int16, 10000 times           0.1332557238638401
device: cuda, dtype: torch.int32, 10000 times           0.1331850504502654
device: cuda, dtype: torch.int64, 10000 times           0.1363008264452219
device: cuda, dtype: torch.float32, 10000 times         0.1370363561436534
device: cuda, dtype: torch.float64, 10000 times         0.1442740885540843
```
Fixes https://github.com/pytorch/pytorch/issues/24512, #24516, https://github.com/pytorch/pytorch/issues/24659, and https://github.com/pytorch/pytorch/issues/24663
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31594

Differential Revision: D19346542

Pulled By: ezyang

fbshipit-source-id: 37dd00b86898810b850cf4769c3af8aea6d4596b
2020-01-10 10:52:15 -08:00
a201027e93 Abstract atomic add calls (#31992)
Summary:
Instead of a mixture of direct calls to library-provided atomicAdd functions, such as float atomicAdd(float*, float), and internally provided ones, such as void atomicAdd(long*, long), this change abstracts them into one API, void gpuAtomicAdd(T*, T), in THCAtomics.cuh for the PyTorch backend.

The advantage of this approach is that it allows us to more easily distinguish between the capabilities of different platforms (and their versions). Additionally, the abstraction to void-returning atomicAdds allows us to, in the future, use fast HW instructions on platforms where they do not return the previous value.

Call sites that do not satisfy the above conditions and are either highly platform specific (the __half2 atomicAdd fast path in one operator) or explicitly require the return value (some int atomicAdd invocations) are left untouched. The Caffe2 backend also remains untouched.

While here, add a bunch of includes of THCAtomics.cuh that were missing before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31992

Differential Revision: D19330220

Pulled By: ezyang

fbshipit-source-id: d6ab73ec5168c77e328faeef6c6f48eefba00861
2020-01-10 09:48:42 -08:00
c6f41ae01b Fix and add more padding mode support for Conv (#31784)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29712 and #29668; adds argument checking, documentation, and support for the reflection and replication padding modes.
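
A short sketch of what the added modes enable on the module (the output shape is unchanged; only how the border is padded differs):
```
import torch
import torch.nn as nn

x = torch.randn(1, 2, 8, 8)
for mode in ('zeros', 'reflect', 'replicate', 'circular'):
    conv = nn.Conv2d(2, 4, kernel_size=3, padding=1, padding_mode=mode)
    print(mode, conv(x).shape)  # all (1, 4, 8, 8); border values differ
```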
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31784

Differential Revision: D19301974

Pulled By: ezyang

fbshipit-source-id: a0ed4815c0c22e416b16e256bba04324e376b2f8
2020-01-10 08:14:58 -08:00
b6f43afaca Fix tensordot allowing negative dims (#31954)
Summary:
fixes https://github.com/pytorch/pytorch/issues/31926
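
A quick example of the now-accepted negative-dim spelling, which is equivalent to its positive counterpart:
```
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(5, 6)
# Contract a's last dimension with b's first; -1 previously raised an error.
out_neg = torch.tensordot(a, b, dims=([-1], [0]))
out_pos = torch.tensordot(a, b, dims=([2], [0]))
print(torch.allclose(out_neg, out_pos))  # True
```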
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31954

Differential Revision: D19331847

Pulled By: zou3519

fbshipit-source-id: e30dd9517917c056a52be7d16f23247fe28f4e28
2020-01-10 07:42:04 -08:00
8ea49e7a08 add missing braces for format in rpc _to_worker_info (#31969)
Summary:
These braces were missing, so the incorrect `name` passed into `_to_worker_info` was not printed in the error message.
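
The bug class is the usual `str.format` pitfall; a hypothetical reduction (the real message lives in the RPC code):
```
name = "worker_7"

# Without "{}" the argument is silently dropped from the message.
print("Unknown worker name".format(name))      # -> Unknown worker name
# With the braces restored, the offending name is actually reported.
print("Unknown worker name: {}".format(name))  # -> Unknown worker name: worker_7
```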
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31969

Differential Revision: D19331927

Pulled By: rohan-varma

fbshipit-source-id: e74d47daec3224c2d9b9da3c0a6404cfa67baf65
2020-01-09 23:18:46 -08:00
4e84661139 update llvmlite to 0.30.0 (#31858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31858

Trying to upgrade docker image but ran into the following error:

```
Running test_nn ... [2020-01-04 18:05:12.537860]
Traceback (most recent call last):
  File "test_nn.py", line 45, in <module>
    from common_cuda import TEST_CUDA, TEST_MULTIGPU, TEST_CUDNN, TEST_CUDNN_VERSION
  File "/var/lib/jenkins/workspace/test/common_cuda.py", line 16, in <module>
    import numba.cuda
  File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 178, in <module>
    _ensure_llvm()
  File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 100, in _ensure_llvm
    raise ImportError(msg)
ImportError: Numba requires at least version 0.30.0 of llvmlite.
Installed version is 0.28.0.
```

Test Plan: Imported from OSS

Differential Revision: D19282923

Pulled By: ljk53

fbshipit-source-id: bdeefbf4f6c0c97df622282f76e77eb1eadba436
2020-01-09 19:28:08 -08:00
62f93443e5 Explain RPC behavior when using Tensor as arg or return value
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31968

Test Plan: Imported from OSS

Differential Revision: D19321380

Pulled By: mrshenli

fbshipit-source-id: e3431f1f02963cc8d8266a420ab03866106f26ac
2020-01-09 16:42:24 -08:00
6abfa9ad8a Quantized H Tangent function (#31031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031

This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.

Test Plan: Imported from OSS

Differential Revision: D19334280

Pulled By: z-a-f

fbshipit-source-id: ae14399765a47afdf9b1e072d3967c24ff473e8d
2020-01-09 16:16:17 -08:00
021e1e20c1 Revert D19320493: Javadoc changes
Test Plan: revert-hammer

Differential Revision:
D19320493

Original commit changeset: cc76b2a2acbe

fbshipit-source-id: 3b36dd2d2591acc60a06a421dd625c21adbe578a
2020-01-09 14:23:30 -08:00
700d1c5cbc update CI script to take string docker image version (#31857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31857

According to mingbowan, we will change to a string docker image
version, because the tag is no longer an integer now that the docker
image build job has moved to CircleCI:
http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html

Test Plan: - with stacked PR

Differential Revision: D19282726

Pulled By: ljk53

fbshipit-source-id: 7a12ae89a11cf15163b905734d50fed6dc98cb07
2020-01-09 14:15:10 -08:00
67ff051ddd Remove temporary fix for torchbind in BC check (#31982)
Summary:
Remove the patch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31982

Reviewed By: hl475

Differential Revision: D19333205

Pulled By: houseroad

fbshipit-source-id: 1d16fd31ede7266789141238520d47b762a7a340
2020-01-09 13:58:16 -08:00
2968faf154 Update doc about output_differentiability keyword in derivatives.yaml
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31925

Test Plan: Imported from OSS

Differential Revision: D19303833

Pulled By: albanD

fbshipit-source-id: 291a9f122720844a5f8386b22cf6abc66ae86e4d
2020-01-09 13:48:06 -08:00
67c1d930eb Lock graph_task before writing leaf_streams. (#31995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31995

Fixes #31906.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19331259

Pulled By: ezyang

fbshipit-source-id: 5d24bf3555e632211a9b6f8e50ff241603c18b3d
2020-01-09 13:26:36 -08:00
1296e2d55e C++ API parity: isinf (#31099)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31021: ports the legacy binding method of `isinf` to C++, thereby supporting JIT.
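
A small sketch of what the port enables, calling `torch.isinf` from TorchScript:
```
import torch

@torch.jit.script
def find_inf(x: torch.Tensor) -> torch.Tensor:
    # Now a native op, so it is visible to the JIT.
    return torch.isinf(x)

print(find_inf(torch.tensor([1.0, float('inf'), float('-inf')])))
# tensor([False,  True,  True])
```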
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31099

Differential Revision: D19314733

Pulled By: yf225

fbshipit-source-id: 5725c51d19c33b4fddd0fc9e7034078580bd534e
2020-01-09 13:16:13 -08:00
cfdfdf70d7 remove JSON dumping dependency (#30724)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19420

So after actually writing a C++ JSON dumping class, I figured that
a faster and cleaner way would be to simply rewrite the Python without
the JSON module, since the JSON we need to output is so simple.

For now I decided not to touch the `parse_cpu_trace` function, since
changing `export_chrome_trace` alone already shows a 4x speedup.
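
To make the idea concrete, a rough sketch of hand-formatting chrome-trace events without the json module (the field names follow the chrome trace event format; this is not the literal PR code):
```
events = [
    {"name": "mul", "ts": 0, "dur": 12, "tid": 0},
    {"name": "mul", "ts": 12, "dur": 11, "tid": 0},
]
# Every event has the same flat shape, so plain string formatting suffices.
chunks = [
    '{{"name": "{name}", "ph": "X", "ts": {ts}, "dur": {dur}, '
    '"tid": {tid}, "pid": "CPU functions"}}'.format(**e)
    for e in events
]
print("[" + ", ".join(chunks) + "]")
```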

Here's the script I used for benchmarking:
``` python
import time
import torch

x = torch.ones(2, 2)

start = time.time()
with torch.autograd.profiler.profile() as prof:
  for _ in range(10000):
    x * x

for i in range(50):
  prof.export_chrome_trace("trace.json")

stop = time.time()

print(stop-start)
```
master branch (using json dump) -> 8.07515025138855
new branch (without json dump) ->  2.0943689346313477

I checked the trace file generated in the [test](https://github.com/pytorch/pytorch/blob/master/test/test_autograd.py#L2659)
and it does work fine.

Please let me know what you think.

If you still insist on the C++ version I can send a new patch soon enough.

CC ezyang rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30724

Differential Revision: D19298955

Pulled By: ezyang

fbshipit-source-id: b0d7324ea5f90884ab8a00dd272f3aa3d9bc0427
2020-01-09 12:56:16 -08:00
bc68a8745f Spelling fix in transformer docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31973

Differential Revision: D19330660

Pulled By: zou3519

fbshipit-source-id: 29ea1e790a34f0241cb7aba85110f087cdc069ba
2020-01-09 11:13:23 -08:00
26f552a3d1 Javadoc changes (#31956)
Summary:
- Add Javadoc URL in index.rst
- Delete no-longer-needed Java rst files
- Remove intersphinx extension from conf.py
- Remove javasphinx from docs/requirements.txt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31956

Differential Revision: D19320493

Pulled By: jlin27

fbshipit-source-id: cc76b2a2acbe2ecdabcd3339e1cc3182f0c906ae
2020-01-09 10:55:24 -08:00
e59e5ba5a3 Move geometric to Aten(CPU) (#31878)
Summary:
Fix https://github.com/pytorch/pytorch/issues/24704.
Benchmark script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

#warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.geometric_(0.5)

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.geometric_(0.5)
        t2 = _time()
        fwd_t = fwd_t + (t2 -t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0092 (ms).
input size(128, 10) forward time is 0.0802 (ms).
input size(128, 100) forward time is 0.7994 (ms).
input size(128, 1000) forward time is 7.8403 (ms).
```
After:
```
input size(128, 1) forward time is 0.0088 (ms).
input size(128, 10) forward time is 0.0781 (ms).
input size(128, 100) forward time is 0.7815 (ms).
input size(128, 1000) forward time is 7.7163 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31878

Differential Revision: D19314510

Pulled By: ezyang

fbshipit-source-id: 2d95bf9938c8becf280890acf9e37223ddd08a39
2020-01-09 10:47:56 -08:00
99b3f9cac4 Move log_sigmoid to Aten(CPU) (#30958)
Summary:
VitalyFedyunin, this PR ports the LogSigmoid activation to ATen.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"
m = nn.LogSigmoid()
#warm up
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Before:**
```
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backward avg time is 0.03 (ms).
input size(128, 100) forward time is 0.90 (ms); backward avg time is 0.09 (ms).
input size(128, 1000) forward time is 9.04 (ms); backward avg time is 0.87 (ms).
```
**After:**
```
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.28 (ms); backward avg time is 0.07 (ms).
```
**OMP_NUM_THREADS=1:**
```
Before:
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backward avg time is 0.03 (ms).
input size(128, 100) forward time is 0.88 (ms); backward avg time is 0.10 (ms).
input size(128, 1000) forward time is 8.72 (ms); backward avg time is 0.81 (ms).
After:
input size(128, 1) forward time is 0.01 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 100) forward time is 0.07 (ms); backward avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.63 (ms); backward avg time is 0.15 (ms).
```

Fix https://github.com/pytorch/pytorch/issues/24724, https://github.com/pytorch/pytorch/issues/24725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30958

Differential Revision: D19275111

Pulled By: ezyang

fbshipit-source-id: bbfe82e58fb27a4fb21c1914c6547a9050072e5c
2020-01-09 10:30:00 -08:00
5a76335aaa Move lshift to Aten (#31566)
Summary:
VitalyFedyunin, this PR moves lshift to ATen.
Benchmark script:
```
import timeit
import torch
torch.manual_seed(1)

for n, t in [(10, 100000),(1000, 10000)]:
    print('__lshift__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a << b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))
        for dtype in ('torch.float32', 'torch.float64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a << b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randn({n}, dtype = {dtype}, device="{device}"); b = torch.randn({n}, dtype = {dtype}, device="{device}")', number=t))

for n, t in [(10, 100000),(1000, 10000)]:
    print('__ilshift__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a << b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
        for dtype in ('torch.float32', 'torch.float64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a << b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randn({n}, dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
Cuda verison: **9.0.176**

Before:
```
__lshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.31618343852460384
device: cpu, dtype: torch.uint8, 100000 times           0.31258584931492805
device: cpu, dtype: torch.int16, 100000 times           0.3140896391123533
device: cpu, dtype: torch.int32, 100000 times           0.34389012958854437
device: cpu, dtype: torch.int64, 100000 times           0.339566046372056
device: cpu, dtype: torch.float32, 100000 times         0.4180623721331358
device: cpu, dtype: torch.float64, 100000 times         0.4165227338671684
device: cuda, dtype: torch.int8, 100000 times           1.7851383443921804
device: cuda, dtype: torch.uint8, 100000 times          1.7842160519212484
device: cuda, dtype: torch.int16, 100000 times          1.789359962567687
device: cuda, dtype: torch.int32, 100000 times          1.7822618428617716
device: cuda, dtype: torch.int64, 100000 times          1.7968465769663453
device: cuda, dtype: torch.float32, 100000 times                1.8066061967983842
device: cuda, dtype: torch.float64, 100000 times                1.8046843251213431
__lshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.04618230368942022
device: cpu, dtype: torch.uint8, 10000 times            0.04634759668260813
device: cpu, dtype: torch.int16, 10000 times            0.040676115080714226
device: cpu, dtype: torch.int32, 10000 times            0.04404774494469166
device: cpu, dtype: torch.int64, 10000 times            0.04511771444231272
device: cpu, dtype: torch.float32, 10000 times          0.6887832451611757
device: cpu, dtype: torch.float64, 10000 times          0.5559549620375037
device: cuda, dtype: torch.int8, 10000 times            0.17996764183044434
device: cuda, dtype: torch.uint8, 10000 times           0.17970609478652477
device: cuda, dtype: torch.int16, 10000 times           0.17873135022819042
device: cuda, dtype: torch.int32, 10000 times           0.1781835313886404
device: cuda, dtype: torch.int64, 10000 times           0.17846618220210075
device: cuda, dtype: torch.float32, 10000 times         0.18056879844516516
device: cuda, dtype: torch.float64, 10000 times         0.18132662680000067
__ilshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.61110960226506
device: cpu, dtype: torch.uint8, 100000 times           0.6333359787240624
device: cpu, dtype: torch.int16, 100000 times           0.6345370784401894
device: cpu, dtype: torch.int32, 100000 times           0.6470990972593427
device: cpu, dtype: torch.int64, 100000 times           0.6587044578045607
device: cpu, dtype: torch.float32, 100000 times         0.7269002720713615
device: cpu, dtype: torch.float64, 100000 times         0.7217964073643088
device: cuda, dtype: torch.int8, 100000 times           1.9880435159429908
device: cuda, dtype: torch.uint8, 100000 times          1.986489498987794
device: cuda, dtype: torch.int16, 100000 times          2.0059875370934606
device: cuda, dtype: torch.int32, 100000 times          1.995262237265706
device: cuda, dtype: torch.int64, 100000 times          1.9974954994395375
device: cuda, dtype: torch.float32, 100000 times                2.00442770216614
device: cuda, dtype: torch.float64, 100000 times                2.009664717130363
__ilshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.08199594635516405
device: cpu, dtype: torch.uint8, 10000 times            0.08096733782440424
device: cpu, dtype: torch.int16, 10000 times            0.0734213450923562
device: cpu, dtype: torch.int32, 10000 times            0.0769620593637228
device: cpu, dtype: torch.int64, 10000 times            0.08650507684797049
device: cpu, dtype: torch.float32, 10000 times          0.7196345143020153
device: cpu, dtype: torch.float64, 10000 times          0.597336508333683
device: cuda, dtype: torch.int8, 10000 times            0.19723015930503607
device: cuda, dtype: torch.uint8, 10000 times           0.19754122477024794
device: cuda, dtype: torch.int16, 10000 times           0.19710093270987272
device: cuda, dtype: torch.int32, 10000 times           0.19611249305307865
device: cuda, dtype: torch.int64, 10000 times           0.19750046730041504
device: cuda, dtype: torch.float32, 10000 times         0.19680574722588062
device: cuda, dtype: torch.float64, 10000 times         0.19689027685672045
```
After:
```
__lshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3031281465664506
device: cpu, dtype: torch.uint8, 100000 times           0.30772678554058075
device: cpu, dtype: torch.int16, 100000 times           0.3088294789195061
device: cpu, dtype: torch.int32, 100000 times           0.30907699652016163
device: cpu, dtype: torch.int64, 100000 times           0.31315001379698515
device: cpu, dtype: torch.float32, 100000 times         0.38823566399514675
device: cpu, dtype: torch.float64, 100000 times         0.39300001971423626
device: cuda, dtype: torch.int8, 100000 times           1.3225595457479358
device: cuda, dtype: torch.uint8, 100000 times          1.31739442050457
device: cuda, dtype: torch.int16, 100000 times          1.3198596313595772
device: cuda, dtype: torch.int32, 100000 times          1.309600466862321
device: cuda, dtype: torch.int64, 100000 times          1.3264533821493387
device: cuda, dtype: torch.float32, 100000 times                1.3377520674839616
device: cuda, dtype: torch.float64, 100000 times                1.3343619462102652
__lshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.02718757465481758
device: cpu, dtype: torch.uint8, 10000 times            0.02701799664646387
device: cpu, dtype: torch.int16, 10000 times            0.025483975186944008
device: cpu, dtype: torch.int32, 10000 times            0.025557605549693108
device: cpu, dtype: torch.int64, 10000 times            0.026179466396570206
device: cpu, dtype: torch.float32, 10000 times          0.0962932649999857
device: cpu, dtype: torch.float64, 10000 times          0.1611471576616168
device: cuda, dtype: torch.int8, 10000 times            0.13165222201496363
device: cuda, dtype: torch.uint8, 10000 times           0.13358880020678043
device: cuda, dtype: torch.int16, 10000 times           0.1342075066640973
device: cuda, dtype: torch.int32, 10000 times           0.1328689968213439
device: cuda, dtype: torch.int64, 10000 times           0.13336248509585857
device: cuda, dtype: torch.float32, 10000 times         0.1345295710489154
device: cuda, dtype: torch.float64, 10000 times         0.14084953162819147
__ilshift__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.19080814253538847
device: cpu, dtype: torch.uint8, 100000 times           0.18541878275573254
device: cpu, dtype: torch.int16, 100000 times           0.19136024825274944
device: cpu, dtype: torch.int32, 100000 times           0.1916898973286152
device: cpu, dtype: torch.int64, 100000 times           0.1973192635923624
device: cpu, dtype: torch.float32, 100000 times         0.2668355852365494
device: cpu, dtype: torch.float64, 100000 times         0.24472137168049812
device: cuda, dtype: torch.int8, 100000 times           1.3581306440755725
device: cuda, dtype: torch.uint8, 100000 times          1.3522163443267345
device: cuda, dtype: torch.int16, 100000 times          1.366145665757358
device: cuda, dtype: torch.int32, 100000 times          1.3674909211695194
device: cuda, dtype: torch.int64, 100000 times          1.3734915973618627
device: cuda, dtype: torch.float32, 100000 times                1.3831533305346966
device: cuda, dtype: torch.float64, 100000 times                1.396162535995245
__ilshift__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.02847585454583168
device: cpu, dtype: torch.uint8, 10000 times            0.02960751298815012
device: cpu, dtype: torch.int16, 10000 times            0.028516249731183052
device: cpu, dtype: torch.int32, 10000 times            0.02842544950544834
device: cpu, dtype: torch.int64, 10000 times            0.029186096973717213
device: cpu, dtype: torch.float32, 10000 times          0.0999628696590662
device: cpu, dtype: torch.float64, 10000 times          0.16676222812384367
device: cuda, dtype: torch.int8, 10000 times            0.13856443110853434
device: cuda, dtype: torch.uint8, 10000 times           0.13766566663980484
device: cuda, dtype: torch.int16, 10000 times           0.13652489613741636
device: cuda, dtype: torch.int32, 10000 times           0.13678150344640017
device: cuda, dtype: torch.int64, 10000 times           0.13749946560710669
device: cuda, dtype: torch.float32, 10000 times         0.13879029918462038
device: cuda, dtype: torch.float64, 10000 times         0.14587809145450592
```

Fixes https://github.com/pytorch/pytorch/issues/24510, #24514, https://github.com/pytorch/pytorch/issues/24657, and https://github.com/pytorch/pytorch/issues/24661
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31566

Differential Revision: D19314251

Pulled By: ezyang

fbshipit-source-id: 52df17b2c18ef1880374c6dbcf18fb1118086552
2020-01-09 09:41:36 -08:00
5c423cae72 Add precision tests for CUDA half linspace+logspace (#31962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31962

I added precision tests for CUDA half, float, and double.

The precision for CUDA half seems bad, but I checked the numbers against
previous versions of PyTorch: the output of CUDA half linspace+logspace
is exactly the same as in 1.2.0.
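
The flavor of spot check involved, assuming a CUDA device is available (the sizes and the comparison are illustrative):
```
import torch

if torch.cuda.is_available():
    half = torch.linspace(0, 10, steps=50, dtype=torch.half, device='cuda')
    ref = torch.linspace(0, 10, steps=50, dtype=torch.double, device='cuda')
    # Half output drifts from the double reference but should match 1.2.0.
    print((half.double() - ref).abs().max().item())
```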

Test Plan: - Run CI

Differential Revision: D19320182

Pulled By: zou3519

fbshipit-source-id: 38d3d4dea2807875ed0b0ec2b93b19c10a289988
2020-01-09 07:35:52 -08:00
5d5f156558 Revert D18903453: Quantized H Tangent function
Test Plan: revert-hammer

Differential Revision:
D18903453

Original commit changeset: 0050b1cebb1d

fbshipit-source-id: 205978f71d5688d4068861f7cf2dff40fbb311c6
2020-01-09 07:30:49 -08:00
ddff4efa26 Don't use RTLD_GLOBAL to load _C. (#31162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31162

This should help us resolve a multitude of weird segfaults and crashes
when PyTorch is imported along with other packages. Those would often
happen because libtorch symbols were exposed globally and could be used
as a source of relocations in shared libraries loaded after libtorch.

Fixes #3059.

Some of the subtleties in preparing this patch:

* Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with `RTLD_LOCAL`, we may now load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I applied a few workarounds to get things to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and turned off vptr checks entirely. Another possibility is to have a mode where we use RTLD_GLOBAL to load _C, which would be acceptable in environments where you're sure the C++ lines up correctly. There's a long comment in the test script going into more detail about this.
* Making some of our shared library dependencies load with `RTLD_LOCAL` breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols, which doesn't work when loaded locally, and if we load a library with `RTLD_LOCAL` we aren't able to subsequently see it with `ctypes`. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library `torch_global_deps` with dependencies on all of the libraries which need to be loaded globally, and then load that with `RTLD_GLOBAL`. As long as none of these libraries have C++ symbols, we can avoid confusion about the C++ standard library. (A minimal sketch of this load pattern follows below.)
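
A minimal sketch of the load pattern described above, using ctypes (the library path here is illustrative):
```
import ctypes
import os

# Load the dummy aggregator with RTLD_GLOBAL so MKL/OpenMPI-style symbol
# lookups keep working; everything else (including _C) can then be loaded
# with the default RTLD_LOCAL.
lib_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "lib")
ctypes.CDLL(os.path.join(lib_dir, "libtorch_global_deps.so"),
            mode=ctypes.RTLD_GLOBAL)
```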

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D19262579

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2
2020-01-09 07:28:15 -08:00
8614860210 Uniformly apply Windows logic in cpp_extensions everywhere (#31161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161

Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (i.e., pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it becomes necessary for all the C++ extensions to know which libraries to link with. The resulting code is clearer and more uniform, so it's a win all around.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262578

Pulled By: ezyang

fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f
2020-01-09 07:28:11 -08:00
0dbd5c0bfe Added torchvision tests as part of ORT tests (#31835)
Summary:
Added torchvision tests as part of ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31835

Reviewed By: hl475

Differential Revision: D19278607

Pulled By: houseroad

fbshipit-source-id: 18a6a85ce3019bcc9aee9517af1378964b585afd
2020-01-08 21:04:29 -08:00
6d9a9e379d Fix segfault in caffe2 slice test (#31801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31801

Try to fix issue #30764

Test Plan:
python test/onnx/test_utility_funs.py TestUtilityFuns

Imported from OSS

Differential Revision: D19315046

fbshipit-source-id: de3595969280e4ebe762cb098ff0891f8b5a9a90
2020-01-08 17:13:29 -08:00
9e9ca6ec37 add conversion functions to embedding tables (#31083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31083

add (fp32/fp16)<->(int8 rowwise quantized fp32/fp16 scale biases)

Test Plan:
added unit tests
enhanced shape inference tests

Reviewed By: jspark1105

Differential Revision: D18920547

fbshipit-source-id: 6b3d7cb93f9d1669ecf511817d73976177632891
2020-01-08 16:56:12 -08:00
eb23171bce TensorIterator norm update (#31903)
Summary:
Special case for the norm `out` variant where p == 2: instead of calling `pow`,
we use multiplication as a faster code path.
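
The identity being exploited, sketched in Python (the actual change is inside the TensorIterator kernel):
```
import torch

x = torch.randn(1000)
norm_pow = x.abs().pow(2).sum().pow(0.5)   # general p-norm path
norm_mul = (x * x).sum().sqrt()            # specialized p == 2 path
print(torch.allclose(norm_pow, norm_mul))  # True
```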
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31903

Differential Revision: D19312749

Pulled By: ngimel

fbshipit-source-id: 73732b7b37a243a14438609784795b920271a0b5
2020-01-08 16:50:42 -08:00
8ecd3f783d check for object equality in constant pooling (#31800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31800

If we know that two constants are the same object, we can ignore other constraints and pool them together. This fixes an issue introduced by the other PR where quantization relied on constant pooling happening for correctness.

Test Plan: Imported from OSS

Differential Revision: D19269499

Pulled By: eellison

fbshipit-source-id: 9d4396125aa6899cb081863d463d4f024135cbf4
2020-01-08 16:47:07 -08:00
319cc21108 Add AliasDb API For Changing Aliasing (#31501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31501

We have a number of places in our code base where we should be checking if it's safe to change the alias relationship between two sets of values. This PR adds an API to AliasDb to consolidate the logic, and refactors Constant Pooling and `CSE` to use the new API. Next steps: add API usage in peephole.cpp where applicable.

Happy to bikeshed `AliasDb::safeToChangeAliasingRelationship`. Previously I suggested `AliasDb::safeToIntroduceAliasing`, however that's not quite accurate, because this API also handles when it is unsafe to remove aliasing.

Alternate suggestions: `safeToChangeAliasing`, `validToChangeAliasing`, `validToChangeAliasingRelationship`

Related:  https://github.com/pytorch/pytorch/issues/28360

Test Plan: Imported from OSS

Differential Revision: D19254413

Pulled By: eellison

fbshipit-source-id: 17f7f52ad2d1526d303132767cbbb32f8189ae15
2020-01-08 16:47:03 -08:00
5cc49ed45f Document IValue (#31904)
Summary:
This is a first-pass attempt at documenting `IValue` to help with problems like in #17165. Most users are probably concerned with
 * how to make an `IValue` that matches the input type to their graph (most of the constructors are pretty self-explanatory, so as long as they are in the docs I think it's enough)
 * how to extract the results after running their graph (there is a small note on the behavior of `.toX()` based on confusions we've had in the past)

Preview:
https://driazati.github.io/pytorch_doc_previews/31904/api/structc10_1_1_i_value.html#exhale-struct-structc10-1-1-i-value

There are also some random CSS fixes to clean up the style.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31904

Pulled By: driazati

Differential Revision: D19318733

fbshipit-source-id: b29dae3349d5a7ea5a3b8e09cd23f7ff8434edb4
2020-01-08 16:08:35 -08:00
883fb5434a Use real argument names for Python functions (#29300)
Summary:
This hooks up `inspect` so that Python functions get their parameter
names attached instead of naming them `0, 1, 2, ...`. This also fixes
issue #28537 where `ignore` functions were improperly typing `self`.
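
The mechanism is ordinary standard-library introspection; a tiny sketch (not the literal compiler code):
```
import inspect

def scale(input, factor=2.0):
    return input * factor

sig = inspect.signature(scale)
print(list(sig.parameters))  # ['input', 'factor'] rather than ['0', '1']
```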
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29300

Pulled By: driazati

Differential Revision: D19256434

fbshipit-source-id: 6a1fe7bd0afab708b8439517798955d0abfeb44c
2020-01-08 15:41:28 -08:00
09a22f3301 Remove C++ docs contributing page (#31908)
Summary:
Stacked PRs
 * **#31908 - Remove C++ docs contributing page**
 * #31905 - Add doc previewing instructions

We should have one source of truth for contribution instructions (CONTRIBUTING.md).
This PR moves the instructions from the C++ doc pages there instead of keeping
a separate page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31908

Pulled By: driazati

Differential Revision: D19296366

fbshipit-source-id: c1daf004259342bd09e09dea3b80e34db47066ec
2020-01-08 15:37:35 -08:00
8c59d48281 Add doc previewing instructions (#31905)
Summary:
Stacked PRs
 * #31908 - Remove C++ docs contributing page
 * **#31905 - Add doc previewing instructions**

This adds some instructions on how to get started with GitHub Pages so you can show reviewers your documentation changes. Hopefully we can delete this eventually and build docs automatically on relevant PRs in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31905

Pulled By: driazati

Differential Revision: D19296364

fbshipit-source-id: df47fa1a8d7be029c3efcf6521298583ad9f7a95
2020-01-08 15:37:31 -08:00
dedd16b418 remove THConv code which never be used (#31879)
Summary:
Just remove dead code in TH.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31879

Differential Revision: D19315818

Pulled By: ezyang

fbshipit-source-id: dbeb2475e19e9ebf769df2649cc859c08d3d184d
2020-01-08 15:14:27 -08:00
9a3cb1e859 Move cauchy to Aten(CPU) (#31824)
Summary:
Fix https://github.com/pytorch/pytorch/issues/24684.
Benchmark script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"

#warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.cauchy_()

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.cauchy_()
        t2 = _time()
        fwd_t = fwd_t + (t2 -t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0071 (ms).
input size(128, 10) forward time is 0.0596 (ms).
input size(128, 100) forward time is 0.5798 (ms).
input size(128, 1000) forward time is 5.8395 (ms).
```
After:
```
input size(128, 1) forward time is 0.0070 (ms).
input size(128, 10) forward time is 0.0583 (ms).
input size(128, 100) forward time is 0.5714 (ms).
input size(128, 1000) forward time is 5.7674 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31824

Differential Revision: D19314411

Pulled By: ezyang

fbshipit-source-id: 58098546face3e5971b023f702cfe44ff1cccfbc
2020-01-08 15:10:53 -08:00
9ba6a768de Add op bitwise_or (#31559)
Summary:
ezyang, this PR adds the bitwise_or operator, following https://github.com/pytorch/pytorch/pull/31104.
Benchmark script:
```
import timeit
import torch
torch.manual_seed(1)

for n, t in [(10, 100000),(1000, 10000)]:
    print('__or__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a | b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))

for n, t in [(10, 100000),(1000, 10000)]:
    print('__ior__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a | b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
Cuda verison: **9.0.176**

Before:
```
__or__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.17616272252053022
device: cpu, dtype: torch.uint8, 100000 times           0.17148233391344547
device: cpu, dtype: torch.int16, 100000 times           0.17616403382271528
device: cpu, dtype: torch.int32, 100000 times           0.17717823758721352
device: cpu, dtype: torch.int64, 100000 times           0.1801931718364358
device: cuda, dtype: torch.int8, 100000 times           1.270583058707416
device: cuda, dtype: torch.uint8, 100000 times          1.2636413089931011
device: cuda, dtype: torch.int16, 100000 times          1.2839747751131654
device: cuda, dtype: torch.int32, 100000 times          1.2548385225236416
device: cuda, dtype: torch.int64, 100000 times          1.2650810535997152
__or__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.031136621721088886
device: cpu, dtype: torch.uint8, 10000 times            0.030786747112870216
device: cpu, dtype: torch.int16, 10000 times            0.02391665056347847
device: cpu, dtype: torch.int32, 10000 times            0.024147341027855873
device: cpu, dtype: torch.int64, 10000 times            0.024414129555225372
device: cuda, dtype: torch.int8, 10000 times            0.12741921469569206
device: cuda, dtype: torch.uint8, 10000 times           0.1249831635504961
device: cuda, dtype: torch.int16, 10000 times           0.1283819805830717
device: cuda, dtype: torch.int32, 10000 times           0.12591975275427103
device: cuda, dtype: torch.int64, 10000 times           0.12655890546739101
__ior__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3908365070819855
device: cpu, dtype: torch.uint8, 100000 times           0.38267823681235313
device: cpu, dtype: torch.int16, 100000 times           0.38239253498613834
device: cpu, dtype: torch.int32, 100000 times           0.3817988149821758
device: cpu, dtype: torch.int64, 100000 times           0.3901665909215808
device: cuda, dtype: torch.int8, 100000 times           1.4211318120360374
device: cuda, dtype: torch.uint8, 100000 times          1.4215159295126796
device: cuda, dtype: torch.int16, 100000 times          1.4307750314474106
device: cuda, dtype: torch.int32, 100000 times          1.4123614141717553
device: cuda, dtype: torch.int64, 100000 times          1.4480243818834424
__ior__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.06468924414366484
device: cpu, dtype: torch.uint8, 10000 times            0.06442475505173206
device: cpu, dtype: torch.int16, 10000 times            0.05267547257244587
device: cpu, dtype: torch.int32, 10000 times            0.05286940559744835
device: cpu, dtype: torch.int64, 10000 times            0.06211103219538927
device: cuda, dtype: torch.int8, 10000 times            0.15332304500043392
device: cuda, dtype: torch.uint8, 10000 times           0.15353196952492
device: cuda, dtype: torch.int16, 10000 times           0.15300503931939602
device: cuda, dtype: torch.int32, 10000 times           0.15274472255259752
device: cuda, dtype: torch.int64, 10000 times           0.1512152962386608
```
After:
```
__or__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.2465507509186864
device: cpu, dtype: torch.uint8, 100000 times           0.2472386620938778
device: cpu, dtype: torch.int16, 100000 times           0.2469814233481884
device: cpu, dtype: torch.int32, 100000 times           0.2535214088857174
device: cpu, dtype: torch.int64, 100000 times           0.24855613708496094
device: cuda, dtype: torch.int8, 100000 times           1.4351346511393785
device: cuda, dtype: torch.uint8, 100000 times          1.4434308474883437
device: cuda, dtype: torch.int16, 100000 times          1.4520929995924234
device: cuda, dtype: torch.int32, 100000 times          1.4456610176712275
device: cuda, dtype: torch.int64, 100000 times          1.4580101007595658
__or__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.029985425993800163
device: cpu, dtype: torch.uint8, 10000 times            0.03024935908615589
device: cpu, dtype: torch.int16, 10000 times            0.026356655173003674
device: cpu, dtype: torch.int32, 10000 times            0.027377349324524403
device: cpu, dtype: torch.int64, 10000 times            0.029163731262087822
device: cuda, dtype: torch.int8, 10000 times            0.14540370367467403
device: cuda, dtype: torch.uint8, 10000 times           0.1456305105239153
device: cuda, dtype: torch.int16, 10000 times           0.1450125053524971
device: cuda, dtype: torch.int32, 10000 times           0.1472016740590334
device: cuda, dtype: torch.int64, 10000 times           0.14709716010838747
__ior__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.27195510920137167
device: cpu, dtype: torch.uint8, 100000 times           0.2692424338310957
device: cpu, dtype: torch.int16, 100000 times           0.27726674638688564
device: cpu, dtype: torch.int32, 100000 times           0.2815811652690172
device: cpu, dtype: torch.int64, 100000 times           0.2852728571742773
device: cuda, dtype: torch.int8, 100000 times           1.4743850827217102
device: cuda, dtype: torch.uint8, 100000 times          1.4766502184793353
device: cuda, dtype: torch.int16, 100000 times          1.4774163831025362
device: cuda, dtype: torch.int32, 100000 times          1.4749693805351853
device: cuda, dtype: torch.int64, 100000 times          1.5772947426885366
__ior__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.03614502027630806
device: cpu, dtype: torch.uint8, 10000 times            0.03619729354977608
device: cpu, dtype: torch.int16, 10000 times            0.0319912089034915
device: cpu, dtype: torch.int32, 10000 times            0.03319283854216337
device: cpu, dtype: torch.int64, 10000 times            0.0343862259760499
device: cuda, dtype: torch.int8, 10000 times            0.1581476852297783
device: cuda, dtype: torch.uint8, 10000 times           0.15974601730704308
device: cuda, dtype: torch.int16, 10000 times           0.15957212820649147
device: cuda, dtype: torch.int32, 10000 times           0.16002820804715157
device: cuda, dtype: torch.int64, 10000 times           0.16129320487380028
```

Fix  https://github.com/pytorch/pytorch/issues/24511, https://github.com/pytorch/pytorch/issues/24515, https://github.com/pytorch/pytorch/issues/24658, https://github.com/pytorch/pytorch/issues/24662.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31559

Differential Revision: D19315875

Pulled By: ezyang

fbshipit-source-id: 4a3ca88fdafbeb796079687e676228111eb44aad
2020-01-08 15:06:30 -08:00
4f9d2f74e2 Port softplus activation to Aten(CPU+CUDA) (#30504)
Summary:
VitalyFedyunin, this PR ports the Softplus activation to ATen.
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Softplus()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.12 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.18 (ms).
CPU:
input size(128, 100) forward time is 1.16 (ms); backward avg time is 0.69 (ms).
input size(128, 10000) forward time is 60.19 (ms); backward avg time is 31.86 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.17 (ms).
CPU:
input size(128, 100) forward time is 0.43 (ms); backward avg time is 0.16 (ms).
input size(128, 10000) forward time is 1.65 (ms); backward avg time is 0.83 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.53 (ms); backward avg time is 0.28 (ms).
input size(128, 10000) forward time is 51.33 (ms); backward avg time is 25.48 (ms).
After:
input size(128, 100) forward time is 0.44 (ms); backward avg time is 0.16 (ms).
input size(128, 10000) forward time is 42.05 (ms); backward avg time is 13.97 (ms).
```

Fix https://github.com/pytorch/pytorch/issues/24633, https://github.com/pytorch/pytorch/issues/24634, https://github.com/pytorch/pytorch/issues/24766, https://github.com/pytorch/pytorch/issues/24767.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30504

Differential Revision: D19274913

Pulled By: ezyang

fbshipit-source-id: 21b29e8459dcba5a040cc68333887b45a858328e
2020-01-08 15:03:53 -08:00
d2fdf140af Combine all the user inputs together and convert them to fp16 (#31898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31898

Att

Reviewed By: tracelogfb

Differential Revision: D19291357

fbshipit-source-id: 747ed5234ca042ceeaff2d094701ead7597ac3ee
2020-01-08 14:36:42 -08:00
8b4feff01d Use simd version for fp16 conversions (#31897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31897

The previous version only used AVX2. The _simd version uses AVX512 if the CPU supports it.

Test Plan: Unitttest

Reviewed By: tracelogfb

Differential Revision: D19291499

fbshipit-source-id: 3b1ee0ba756e5c9defbd5caf7f68982d9b2ca06c
2020-01-08 14:36:38 -08:00
1314f7f4f4 Ensure the original grad_mode is restored during backward (#31884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31884

Fix #31715

Test Plan: Imported from OSS

Differential Revision: D19301076

Pulled By: albanD

fbshipit-source-id: 2d20c01bfb6364fa96c8fe5aa5ce7ea39defa3ce
2020-01-08 14:16:51 -08:00
c299cb05ef temporary fix for jit test backward compatibility issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31949

Test Plan: Imported from OSS

Differential Revision: D19314763

Pulled By: albanD

fbshipit-source-id: b5eff0ed53a371d260596ca85d914c8bddb0a8aa
2020-01-08 13:32:08 -08:00
462bfc7fe7 docker hub image info (#31923)
Summary:
result: http://docker.pytorch.org/docker_hub.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31923

Differential Revision: D19316770

Pulled By: mingbowan

fbshipit-source-id: 57f34d8983d26772bb0d310fa0a4085674c860e5
2020-01-08 13:20:06 -08:00
5dfcfeebb8 Revert D19298735: Emit warning from deprecated torch function signatures
Test Plan: revert-hammer

Differential Revision:
D19298735

Original commit changeset: 03cb78af1765

fbshipit-source-id: 304a6d4412f53a8fc822d36897c96815432e0f70
2020-01-08 13:04:41 -08:00
620060cb0c Quantized H Tangent function (#31031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031

This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.

Test Plan: Imported from OSS

Differential Revision: D18903453

Pulled By: z-a-f

fbshipit-source-id: 0050b1cebb1ddb179b7ecbcb114fe70705070f67
2020-01-08 12:59:39 -08:00
54777b1e73 Avoid reference invalidation in cuda SpectralOps' plan_caches (#31861)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31412

The root cause is `plan_caches` being resized in one thread while another holds a reference to an existing `CuFFTParamsLRUCache` which then becomes invalidated.

I was able to reproduce the crash very reliably without this fix applied and no longer see it. Being a race condition, it's hard to say for sure though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31861

Differential Revision: D19312314

Pulled By: ezyang

fbshipit-source-id: 06e4561128d503f2d70cdfe1982be0f3db2a8cf8
2020-01-08 11:50:05 -08:00
7f723cbd8a Revert D19290954: Implement backend-agnostic rpc._wait_all_workers() utility
Test Plan: revert-hammer

Differential Revision:
D19290954

Original commit changeset: cdb22203c2f2

fbshipit-source-id: 2ae194a06a645e4f48879271eccf0588b0956cd3
2020-01-08 10:25:51 -08:00
c66ca74f03 Add device debug info to CUDA build (#31929)
Summary:
Also print NVCC flags in the summary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31929

Differential Revision: D19312079

Pulled By: ezyang

fbshipit-source-id: cd20d5a385f61174c1907a9ad883c04de66ef037
2020-01-08 09:56:20 -08:00
f0072b3af5 Remove C++11 compatibility from c10::optional (#30919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30919

deletecode
ghstack-source-id: 96383227

Test Plan: waitforsandcastle

Differential Revision: D18869641

fbshipit-source-id: c08345d17a291cea3749af20473b6acddc78ab27
2020-01-08 09:19:59 -08:00
f67851d69a Fix c10::util::get_fully_qualified_type_name for MSVC (#31313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31313

This is a bugfix. The reason we couldn't enable constexpr for it before is that it was buggy,
and without constexpr it crashed at runtime rather than at compile time, which unfortunately seems to have slipped past our CI...
ghstack-source-id: 96380160

Test Plan: Now it works even when enabling constexpr for it

Differential Revision: D19087471

fbshipit-source-id: 28be107389f4507d35d08eab4b089a405690529b
2020-01-08 09:11:10 -08:00
2a294aace6 Remove memory ordering from LeftRight (#31026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31026

This is error prone and probably wrong. Since we don't use LeftRight on the hot path anymore, let's remove this.
ghstack-source-id: 96369644

Test Plan: none

Differential Revision: D18902165

fbshipit-source-id: 7b9478cd7cc071f403d75da20c7c889c27248b5c
2020-01-08 08:59:30 -08:00
84dfa96f62 Fix -Wundef warning in conversions.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31911

Test Plan:
* CI builds including GPU and OSS-build tests
* The `defined(__HIP_DEVICE_COMPILE__)` instance a few lines below is proof that this is a define/undef flag, not a define01 flag

Reviewed By: hlu1

Differential Revision: D19296560

fbshipit-source-id: 1c45069aec534b0bf4a87751a74680675c985e06
2020-01-08 08:39:37 -08:00
ee817012b2 Add more tests to the autograd wrt view and inplace (#31147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31147

The goal here is to add more tests of the current behavior of the autograd to make sure no regressions are introduced when modifying it.
Do let me know if you think of other corner cases I missed.

Test Plan: Imported from OSS

Differential Revision: D19301082

Pulled By: albanD

fbshipit-source-id: 2cb07dcf99e56eb1f2c56a179796f2e6042d5a2d
2020-01-08 07:14:52 -08:00
6664703842 Implement backend-agnostic rpc._wait_all_workers() utility (#31888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31888

We need a backend-agnostic mechanism to do barrier-like operation before locally destroy RRef context and shutdown RPC Agent.

- Sort worker names.
- Elect the first name as the leader in the ordered worker names.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent, the leader sends the command to everyone to proceed (a runnable sketch of this barrier follows).
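
For intuition, here is a runnable single-process sketch of that leader-based barrier, using threads in place of RPC workers; the names are illustrative only and not the actual torch.distributed.rpc internals:
```
import threading

def make_barrier(worker_names):
    leader = sorted(worker_names)[0]          # elect the first sorted name as leader
    reported, lock = set(), threading.Lock()
    all_reported, proceed = threading.Event(), threading.Event()

    def wait_all_workers(self_name):
        with lock:
            reported.add(self_name)           # every worker reports its intent
            if reported == set(worker_names):
                all_reported.set()
        if self_name == leader:
            all_reported.wait()               # leader waits for all reports
            proceed.set()                     # then tells everyone to proceed
        proceed.wait()

    return wait_all_workers

barrier = make_barrier(["worker1", "worker0", "worker2"])
threads = [threading.Thread(target=barrier, args=(name,))
           for name in ("worker0", "worker1", "worker2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all workers passed the barrier")
```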
ghstack-source-id: 96386210

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```

# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```

Differential Revision: D19290954

fbshipit-source-id: cdb22203c2f27b5e0d0ad5b2d3b279d438c22dcf
2020-01-08 01:00:25 -08:00
9116f02beb Rename TORCH_DCHECK to TORCH_INTERNAL_ASSERT_DEBUG_ONLY (#31917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31917

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19301480

Pulled By: ezyang

fbshipit-source-id: fcce8868733965b9fbd326b4ec273135759df377
2020-01-07 17:28:47 -08:00
ab60cca488 Make c10::util::get_fully_qualified_type_name() backwards compatible with clang 4 (#31351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31351

Clang 4 needs the c10:: namespace specifier on fully_qualified_type_name_impl() to work correctly.

Also, let's add an error message for people using clang 3 and earlier; we don't support those compilers anymore, but before this PR they got a crappy message.
ghstack-source-id: 96380163

Test Plan: testinprod

Differential Revision: D19135587

fbshipit-source-id: c206b56240b36e5c207fb2b69c389bb39f1e62aa
2020-01-07 17:07:54 -08:00
0dca9c30ca constexpr typeid improvements (#31312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31312

ghstack-source-id: 96369343

Test Plan: unit tests

Differential Revision: D19087198

fbshipit-source-id: 7f9a7169f11973759b9ecabcc755c211d34e2742
2020-01-07 17:07:49 -08:00
c21f89970f Remove c++14-conditional constexpr (#30916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30916

These macros said "make it constexpr if we're in C++14". Since we're now always C++14, we can just say "constexpr" instead.
ghstack-source-id: 96369584

Test Plan: waitforsandcastle

Differential Revision: D18869635

fbshipit-source-id: f41751e4e26fad6214ec3a98db2d961315fd73ff
2020-01-07 16:40:11 -08:00
4daa3dedbe Fix IValue.isList
Summary: I think this was wrong before?

Test Plan: Not sure.

Reviewed By: IvanKobzarev

Differential Revision: D19221358

fbshipit-source-id: 27e675cac15dde29e026305f4b4e6cc774e15767
2020-01-07 16:33:36 -08:00
1b4d3d5748 Properly return data from non-contiguous tensors in Java
Summary:
These were returning incorrect data before.  Now we make a contiguous copy
before converting to Java.  Exposing raw data to the user might be faster in
some cases, but it's not clear that it's worth the complexity and code size.

Test Plan: New unit test.

Reviewed By: IvanKobzarev

Differential Revision: D19221361

fbshipit-source-id: 22ecdad252c8fd968f833a2be5897c5ae483700c
2020-01-07 16:33:31 -08:00
2d6a2c898c Support tensors with a storage offset in Java (#31584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31584

These were returning incorrect data before.

Test Plan: New unit test.

Reviewed By: IvanKobzarev

Differential Revision: D19221360

fbshipit-source-id: b3f01de086857027f8e952a1c739f60814a57acd
2020-01-07 16:33:26 -08:00
6d1fa8296b Support tensors with empty shape in Java
Summary: These are valid tensors.

Test Plan: New unit test.

Reviewed By: IvanKobzarev

Differential Revision: D19221362

fbshipit-source-id: fa9af2fc539eb7381627b3d473241a89859ef2ba
2020-01-07 16:33:21 -08:00
3c07eb33bb Better error for torch::jit::loading an eager file (#31709)
Summary:
This adds a check to catch the case where someone `torch.save`s something then `torch::jit::load`s it in C++.

Relevant for #31620
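
In Python terms (torch.jit.load mirrors torch::jit::load), the scenario looks like this sketch; the exact message depends on the PyTorch version:
```
import torch

m = torch.nn.Linear(2, 2)
torch.save(m, "eager.pt")        # eager, pickle-based checkpoint
try:
    torch.jit.load("eager.pt")   # TorchScript loader on an eager file
except RuntimeError as e:
    print(e)                     # now a clearer error explaining the mismatch
```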
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31709

Pulled By: driazati

Differential Revision: D19252172

fbshipit-source-id: f2a9b4442647285418b2778306629b4ff77c15e5
2020-01-07 16:20:42 -08:00
a730920a3d Make RRef leak detection always print a warning log (#31922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31922

For better debugging, `test_rref_leak` failure in https://app.circleci.com/jobs/github/pytorch/pytorch/4135881, as per discussion in https://github.com/pytorch/pytorch/pull/31888.

ghstack-source-id: 96375261

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```

# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```

Differential Revision: D19302814

fbshipit-source-id: 51632aede98e01689f8bc0f266788a9b020daa15
2020-01-07 15:18:00 -08:00
227d1a43a4 Revert D18838848: disable __torch_function__ overrides for operators in torch.functional
Test Plan: revert-hammer

Differential Revision:
D18838848

Original commit changeset: 22b8015d7b2f

fbshipit-source-id: fdaeffcd112990ed379782cf7216d3f1beeb2cb1
2020-01-07 15:03:15 -08:00
8a0503b355 Run a non-quiet submodule update to prevent timeouts on Circle CI (#31900)
Summary:
As in title, this PR will disable the `--quiet` flag used in the CI as a workaround to a timeout hitting Mac OS CI.  Circle CI works by timing out when no text has been printed for 10 min.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31900

Differential Revision: D19302899

Pulled By: bwasti

fbshipit-source-id: 145647da983ee06f40794bda1abd580ea45a0019
2020-01-07 14:01:05 -08:00
114562cf93 For torch::from_blob() add clue when memory is non-owned. (#31222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31222

 - When constructing torch::from_blob() in the case where the deleter is a nop, switch to using a nullptr context in the DataPtr (with a nop deleter)

 - No real extra memory/cpu requirements here, actually saves a minor alloc.

Why? Trying to get a signal that a Tensor might contain non-owned memory from
torch::from_blob(), by detecting the nullptr context.
ghstack-source-id: 96336078

Test Plan:
buck test mode/dev caffe2/test/cpp/api/...
buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18992119

fbshipit-source-id: 4eea642f82d0858b57fdfc6995364a760c10567d
2020-01-07 13:12:30 -08:00
ca72df06ae disable __torch_function__ overrides for operators in torch.functional (#30839)
Summary:
For now I'm just removing the decorators from all of the currently overridable functions in `torch.functional`. This means they are no longer overridable, however this should fix the benchmark regressions reported in https://github.com/pytorch/pytorch/issues/30831. Moving forward we'll be looking at reducing the overhead of the python-level override mechanism and failing that, re-implementing all of these operators in C++.

cc hl475
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30839

Differential Revision: D18838848

Pulled By: ezyang

fbshipit-source-id: 22b8015d7b2f7a947f1ebc9632c998e081b48ad8
2020-01-07 12:27:28 -08:00
bb279c5c63 named tensor max pooling support
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31669

Test Plan: Imported from OSS

Differential Revision: D19240348

Pulled By: glaringlee

fbshipit-source-id: 004387aa753e4e41afdede66647abbb0bcbd9808
2020-01-07 12:03:18 -08:00
3a2757c682 Fix tracing for modules with List[Tensor] as output (#31343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31343

Fix an issue in TorchScript tracing for modules with `c10::List<at::Tensor>` as an output. TensorList was not supported properly.

Test Plan: unit tests

Reviewed By: wanchaol

Differential Revision: D18850722

fbshipit-source-id: 87a223104d1361fe754d55deceeb1e8bbcad629b
2020-01-07 11:57:25 -08:00
74d69e296e Raise an error if torch.cat is given out as one of the input tensors (#30577)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30562 for both CPU and CUDA.
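
A sketch of the now-rejected pattern (shown illustratively; with this change the aliasing is caught instead of silently corrupting data):
```
import torch

x = torch.ones(2)
try:
    torch.cat([x, x], out=x)     # `out` aliases one of the inputs
except RuntimeError as e:        # rejected after this change
    print(e)
```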
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30577

Differential Revision: D19298732

Pulled By: ezyang

fbshipit-source-id: ea539c97493ee17d8f60b1134d100a44c8717578
2020-01-07 11:30:33 -08:00
c888473b57 Restructure docs organization and naming (#31849)
Summary:
* Rename “Other Languages” → “Language Bindings”
* Move the Community section to the bottom
* Move "Language Bindings" above "Python API"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31849

Differential Revision: D19290966

Pulled By: jlin27

fbshipit-source-id: 30b579e032a9fb1636e4afc7bbbd85a2708f637d
2020-01-07 11:16:53 -08:00
bf8e1c0710 Integrate async mode for autograd engine with distributed autograd. (#31508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31508

This PR builds on top of https://github.com/pytorch/pytorch/pull/31230
to ensure that distributed autograd doesn't block an RPC thread anymore during
the backward pass.

I've also added a unit test where all ranks hammer rank 0 with about 60
backward calls (which would have caused a deadlock earlier); now such a test
passes without any issues.
ghstack-source-id: 96345097

Test Plan: waitforbuildbot

Differential Revision: D19188749

fbshipit-source-id: b21381b38175699afd0f9dce1ddc8ea6a220f589
2020-01-07 11:01:16 -08:00
0e5a6700cc Emit warning from deprecated torch function signatures (#31514)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28430

The unpythonic signatures for functions such as `torch.addcdiv` are already separated out in [`deprecated.yaml`] and marked as deprecated in `PythonArgParser`. However, nothing was done with this information previously. So, this now emits a warning when the deprecated signatures are used.

One minor complication is that if all arguments are passed as keyword args then there is nothing to differentiate the deprecated overload. This can lead to false warnings being emitted. So, I've also modified `PythonArgParser` to prefer non-deprecated signatures.

[`deprecated.yaml`]: https://github.com/pytorch/pytorch/blob/master/tools/autograd/deprecated.yaml
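
As a hedged illustration (assuming `addcdiv`'s old positional-`value` overload, which `deprecated.yaml` lists; the exact warning text may differ):
```
import warnings
import torch

t = torch.ones(3)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    torch.addcdiv(t, 0.5, torch.ones(3), torch.ones(3))    # deprecated: positional `value`
print(len(caught) >= 1)                                    # should capture the deprecation warning

torch.addcdiv(t, torch.ones(3), torch.ones(3), value=0.5)  # preferred overload, no warning
```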
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31514

Differential Revision: D19298735

Pulled By: ezyang

fbshipit-source-id: 03cb78af17658eaab9d577cd2497c6f413f07647
2020-01-07 10:57:53 -08:00
5cc62f2913 Ensure autograd callbacks are called only once for reentrant backward. (#31909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31909

https://github.com/pytorch/pytorch/pull/31230 introduced a bug where
we would end up calling `graph_task_post_processing` twice for reentrant
backward calls (once when we mark the future completed and then we we called
graph_task_post_processing in execute_with_graph_task).

This PR fixes the issues by verifying the future we return in that case is
completed and we remove the call to graph_task_post_processing.

In addition to that I added a test that reproduced the problem and verified it
is fixed by this PR.
ghstack-source-id: 96349102

Test Plan: waitforbuildbot

Differential Revision: D19296363

fbshipit-source-id: dc01a4e95989709ad163bb0357b1d191ef5a4fb2
2020-01-07 10:35:04 -08:00
4ee9c56218 Support PyTorch ROCm CI on Ubuntu18.04 (#31886)
Summary:
In order to support Ubuntu18.04, some changes to the scripts are required.
* install dependencies with -y flag
* mark install noninteractive
* install some required dependencies (gpg-agent, python3-distutils, libidn11)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31886

Differential Revision: D19300586

Pulled By: bddppq

fbshipit-source-id: d7fb815a3845697ce63af191a5bc449d661ff1de
2020-01-07 10:32:47 -08:00
2f5eefe525 Raise ValueError if CUDA device is specified without specifying the : (#29087)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19076
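
An illustrative sketch of the rejected string (the exact call site and exception type follow the PR; the generic device-string parser is shown here as an assumption):
```
import torch

print(torch.device("cuda:0"))    # OK: ordinal separated by ':'
try:
    torch.device("cuda0")        # ordinal fused to the type, no ':'
except Exception as e:           # rejected with an informative error
    print(type(e).__name__, e)
```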
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29087

Differential Revision: D19298959

Pulled By: ezyang

fbshipit-source-id: 878ea4840682012f07177d8d159a77c0e5afada6
2020-01-07 10:29:49 -08:00
3c7db5ccbc Don't unconditionally compile runJITCPPTests (#31236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31236

It is not compiled on Windows

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262581

Pulled By: ezyang

fbshipit-source-id: 80bfa553333a946f00291aaca6ad26313caaa9e6
2020-01-07 10:24:52 -08:00
809ee9d04c Enable personalized FC weight_init and sparse_emb weight_init (#31707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31707

Change the initialization for FC weights and sparse embedding lookups.

The previous default initialization was uniform(-\sqrt(1/input_dim), \sqrt(1/input_dim)); a flexible hyperparameter \alpha is now passed in, changing it to uniform(-\sqrt(\alpha/input_dim), \sqrt(\alpha/input_dim)).
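
A minimal NumPy sketch of the described scheme (the function name and defaults here are hypothetical; the actual change lives in the Caffe2 layer code):
```
import math
import numpy as np

def fc_weight_init(input_dim, output_dim, alpha=1.0, seed=0):
    # uniform(-sqrt(alpha / input_dim), sqrt(alpha / input_dim));
    # alpha = 1 recovers the previous default
    bound = math.sqrt(alpha / input_dim)
    rng = np.random.default_rng(seed)
    return rng.uniform(-bound, bound, size=(output_dim, input_dim)).astype(np.float32)

w = fc_weight_init(input_dim=64, output_dim=16, alpha=0.5)
print(w.min(), w.max())  # values lie within +/- sqrt(0.5 / 64)
```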

Reviewed By: chonglinsun

Differential Revision: D18825615

fbshipit-source-id: 4c5f2e07f2b3f5d642fd96d64dbf68892ebeb30b
2020-01-07 10:10:54 -08:00
22044c6f7c Use TORCH_CHECK instead of AT_ASSERT in torch::cuda::gather() (#27456)
Summary:
The error message produced by AT_ASSERT() in gather() encouraged users to file a bug report ("please report a bug to PyTorch..."). The assertion should be a regular argument check since it can be triggered by passing tensors with different dimensionality, e.g. `torch.cuda.comm.gather([torch.rand(1, device='cuda'), torch.rand(1, 1, device='cuda')])`.

See: https://github.com/pytorch/pytorch/issues/26400
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27456

Differential Revision: D19300270

Pulled By: ezyang

fbshipit-source-id: ec87d225e23445020b377521e0daccceb4748215
2020-01-07 10:04:24 -08:00
20c5dd59bd Add stub for transformer.py and MultiheadAttention Class. (#28396)
Summary:
Add stub for `transformer.py` and `class MultiheadAttention`. Add import for `transformer.py` and `class MultiheadAttention` in `__init__.pyi.in`. I've tested the code hints in PyCharm and everything works fine.
Relate issue: [https://github.com/pytorch/pytorch/issues/27842](https://github.com/pytorch/pytorch/issues/27842)
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28396

Differential Revision: D19300287

Pulled By: ezyang

fbshipit-source-id: 1a79d6518b5edd4643892c46a959108385c739ad
2020-01-07 09:13:36 -08:00
346a349111 Update all instances of 1.4.0 -> 1.5.0 (#31785)
Summary:
Done with:

```
❯ sed -i 's/1\.4\.0/1.5.0/g' $(find -type f -not -path "./third_party/*")
```

This was previously done in separate commits, but it would be beneficial to bump all included projects within this repository at the same time.

Old bumps for reference:
* [iOS]Update Cocoapods to 1.4.0: https://github.com/pytorch/pytorch/pull/30326
* [android] Change nightly builds version to 1.4.0-SNAPSHOT: https://github.com/pytorch/pytorch/pull/27381
* Roll master to 1.4.0: https://github.com/pytorch/pytorch/pull/27374

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31785

Differential Revision: D19277925

Pulled By: seemethere

fbshipit-source-id: f72ad082f0566004858c9374879f4b1bee169f9c
2020-01-07 08:00:17 -08:00
985fd970aa Enable BFloat16 support for Convolutions on ROCm (#30948)
Summary:
This PR adds bfloat16 support for convolutions on ROCm.

- Integrates MIOpen bfloat16 convolution support into PyTorch

- Enables bfloat16 convolution for non-MIOpen paths, i.e. THCUNN and native HIP kernels

- Enables the bfloat16 type for probability distribution functions (this is included in this PR since the conv unit tests use bfloat16 random number generators)

Native cuda kernels for convolution and random functions will be compiled for CUDA as well.

iotamudelta bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30948

Differential Revision: D19274164

Pulled By: ezyang

fbshipit-source-id: c0888a6ac72a2c5749b1ebb2195ac6f2209996be
2020-01-07 06:57:35 -08:00
a561a8448b minor doc tweak to use mp.spawn in example (#30381)
Summary:
Per pietern's comment in https://github.com/pytorch/pytorch/issues/30022, we can make this example launcher a bit simpler by using `torch.multiprocessing`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30381

Differential Revision: D19292080

Pulled By: rohan-varma

fbshipit-source-id: 018ace945601166ef3af05d8c3e69d900bd77c3b
2020-01-06 22:19:01 -08:00
34561dadcd Don't handle bias inside cudnn_convolution* (#31524)
Summary:
Compared to cuDNN bias, PyTorch add has the following advantage:
- faster, especially for backward (see: https://github.com/zasdfgbnm/things/blob/master/2019/conv-backward-profile.md)
- handles 64bit indexing automatically
- has less code, less maintenance effort

ngimel I'm submitting this PR early so the CI can start building it, but I have not tested it locally yet (still waiting for it to compile).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31524

Differential Revision: D19264244

Pulled By: ngimel

fbshipit-source-id: cb483d378a6d8bce0a05c3643a796e544bd8e8f0
2020-01-06 16:47:54 -08:00
5d80f63478 no_grad, enable_grad: support for decorating generator functions (#31792)
Summary:
Closes https://github.com/pytorch/pytorch/issues/31497

This allows `torch.no_grad` and `torch.enable_grad` to be used as decorators for generator functions, in which case they disable/enable grad only inside the body of the generator and restore the context outside of it.

https://github.com/pytorch/pytorch/issues/31497 doesn't include a complete reproducer, but the included test with `torch.is_grad_enabled` shows this is working where it failed before.
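
A small example of the new behavior:
```
import torch

@torch.no_grad()
def gen():
    yield torch.is_grad_enabled()   # grad is disabled inside the generator body

print(torch.is_grad_enabled())      # True outside
print(next(gen()))                  # False inside, restored correctly around the yield
```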
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31792

Differential Revision: D19274971

Pulled By: albanD

fbshipit-source-id: fde6d3fd95d76c8d324ad02db577213a4b68ccbe
2020-01-06 15:21:20 -08:00
58cffbff91 Add missing TORCH_CUDA_API annotation to throw_nccl_error (#31157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31157

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262583

Pulled By: ezyang

fbshipit-source-id: 8fb87b41ab53770329b38e1e2fe679fb868fee12
2020-01-06 14:39:51 -08:00
4ef9daf7b2 Remove dead CAFFE2_LIBS variable (#31155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31155

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262584

Pulled By: ezyang

fbshipit-source-id: 147ac5a9c36e813ea9a2f68b498880942d661be5
2020-01-06 14:39:47 -08:00
a9dae70bae Remove LibIRC logic from cmake. (#31152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31152

Per apaszke: I can't find any reasonable references to libIRC online, so
I decided to remove this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19262582

Pulled By: ezyang

fbshipit-source-id: a1d47462427a3e0ca469062321d608e0badf8548
2020-01-06 14:39:43 -08:00
112196fdee Fix index put (#31552)
Summary:
This change is required for cases like:
`x[1:] = data` or `x[:3] = data`
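
For illustration, the kind of module this enables exporting (the export call is commented out as assumed usage):
```
import torch

class M(torch.nn.Module):
    def forward(self, x, data):
        x[1:] = data                 # slice assignment, lowered to index_put on export
        return x

print(M()(torch.zeros(4), torch.ones(3)))   # tensor([0., 1., 1., 1.])
# torch.onnx.export(M(), (torch.zeros(4), torch.ones(3)), "m.onnx")
```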
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31552

Reviewed By: hl475

Differential Revision: D19238815

Pulled By: houseroad

fbshipit-source-id: 56c9837d86b341ea92b0a71d55034ce189d12e6c
2020-01-06 14:09:48 -08:00
78cba90a8c Enable constant folding for Reshape (#31054)
Summary:
Enabled constant folding for onnx::Reshape
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31054

Reviewed By: hl475

Differential Revision: D18946951

Pulled By: houseroad

fbshipit-source-id: 499e8bf5fb091a94f7a27cbdf4311a23b1a6e3d3
2020-01-06 13:35:44 -08:00
492ca46e71 Fix androidTest - exclude host tests from it
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31522

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D19200861

Pulled By: IvanKobzarev

fbshipit-source-id: a6024f3013398f9e0d237e06c984a20493d42f11
2020-01-06 11:29:46 -08:00
c65305e991 Add a check method for custom type tensor (#31290)
Summary:
For backend integration, a backend (e.g. Glow) needs to check the content of a tensor to determine whether it is a legit byte tensor or some special packed format. This provides a convenient interface for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31290

Reviewed By: jackm321, qizzzh

Differential Revision: D19069684

Pulled By: yinghai

fbshipit-source-id: 63360fa2c4d32695fe9767a40027d446d63efdd4
2020-01-06 11:15:33 -08:00
1f2b6d632a Refactor tests in pytorch's test/dist_autograd_test.py file (#31803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31803

Refactored the following fairly similar functions:
  1. `test_context_cleanup_tensor_with_grad`
  2. `test_context_cleanup_tensor_no_grad`
  3. `test_context_cleanup_no_tensors`
by creating a helper function `context_cleanup_test_helper` that can be invoked with the appropriate arguments.

Test Plan: Verified by running tests.

Differential Revision: D19269246

fbshipit-source-id: bfb42b078ad56b97ceeecf0d68b4169768c2c453
2020-01-06 10:59:00 -08:00
ddff014b79 fixed scale_factor calculation for uint8 tensor (#31778)
Summary:
When calling the add_images() method on the tensorboard SummaryWriter with a uint8 NCHW tensor, the tensor is incorrectly scaled, resulting in overflow behavior. This leads to incorrect images being displayed in tensorboard.

Issue: https://github.com/pytorch/pytorch/issues/31459

Local Testing (ran this code with and without the PR changes and printed scale_factor):
```
import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
x = torch.tensor([[[[1, 2, 3], [4, 5, 6]]]], dtype=torch.uint8)
writer.add_images("images", x)
```
Before: scale_factor = 255. After: scale_factor = 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31778

Differential Revision: D19289189

Pulled By: anjali411

fbshipit-source-id: 350a1650337244deae4fd8f8b7fb0e354ae6986b
2020-01-06 10:27:35 -08:00
1ba1799a66 C++ added 3rd arg of false to BatchNorm/InstanceNorm register_parameter … (#31873)
Summary:
Fix for issue https://github.com/pytorch/pytorch/issues/31680
C++ BatchNorm & InstanceNorm attempt to register undefined tensors when affine is false.

Fixes https://github.com/pytorch/pytorch/issues/31680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31873

Differential Revision: D19287087

Pulled By: yf225

fbshipit-source-id: 0d57f10c49083386919b703d72b520a73a8e9e7f
2020-01-06 01:46:24 -08:00
33430cf094 Revert D18643137: Implement backend-agnostic rpc._wait_all_workers() utility
Test Plan: revert-hammer

Differential Revision:
D18643137

Original commit changeset: d669d4fc9ad6

fbshipit-source-id: fe1f8ed77c1c5760638fef06e67ba100b86c33e9
2020-01-05 11:58:51 -08:00
fde94e7556 Provide async mode for local autograd engine. (#31230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31230

A major issue with distributed autograd currently is that we block an
RPC thread when we call Engine::execute_with_graph_task.

To resolve this issue, I've made modifications to the local autograd engine
such that `execute_with_graph_task` returns a Future instead. The `execute()`
methods (Engine::execute() and DistEngine::execute()) still wait() on this
Future, which ensures there is no change in behavior yet.

In follow up PRs we can modify the distributed autograd engine to take
advantage of this Future.

Closes #26359
ghstack-source-id: 96298057

Test Plan: waitforbuildbot

Differential Revision: D18999709

fbshipit-source-id: 388f54467fd2415a0acb7df17bd063aedc105229
2020-01-05 00:29:28 -08:00
3f0b330736 corrected keyword argument name in docs for Tensor.scatter (#31617)
Summary:
See https://github.com/pytorch/pytorch/issues/31601
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31617

Differential Revision: D19268872

Pulled By: mruberry

fbshipit-source-id: 52f0213f4aab991fd549b7623556a2ced61631a6
2020-01-04 21:48:30 -08:00
9020d30fc9 Updating submodules
Summary:
GitHub commits:

d7f0e32081
f2a603d2df
323a2bc3e5
04c07965ef
c179d38294
6fac956f22

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 558f35dbf1adb3b45179629c61d77488e441d4e3
2020-01-04 21:43:31 -08:00
502533cfe6 Implement backend-agnostic rpc._wait_all_workers() utility (#30710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30710

We need a backend-agnostic mechanism to do barrier-like operation before locally destroy RRef context and shutdown RPC Agent.

- Sort worker names.
- Elect the first name as the leader in the ordered worker names.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent, the leader sends the command to everyone to proceed.

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_wait_all_workers

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers$
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_forward_chain
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_wait_all_workers

buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_wait_all_workers$
```

# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```

# Debug

```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_shutdown
```

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_clean_context_during_backward

buck build mode/dev-nosan //caffe2/test:dist_autograd_fork

buck-out/gen/caffe2/test/dist_autograd_fork\#binary.par -r test_clean_context_during_backward
```

https://our.intern.facebook.com/intern/testinfra/diagnostics/281475127895800.844424945328750.1575664368/

```
I1206 12:27:47.491420 185619 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.493880 185630 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.494526 185625 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.495390 185636 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
E1206 12:27:47.544198 185627 pair.cc:642] 1 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544203 185633 pair.cc:642] 2 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544210 185639 pair.cc:642] 3 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
```
This should mean the UDF in the request has been run, so Python proceeded and ran to `_agent.shutdown()`.

Meanwhile, the RpcAgents on the followers wanted to send back their responses, but the leader had already closed RPC.

Need to re-trigger "pytorch_rpc-buck" to reproduce the rarely-seen issue.

Differential Revision: D18643137

fbshipit-source-id: d669d4fc9ad65ed48bed1329a4eb1c32ba51323c
2020-01-04 17:13:44 -08:00
f362cd510d Move prim ops from JIT registration to C10 (#30612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612

The first version to move prim ops to c10 registration. After the reviewers are fine with the initial changes, more operators will be moved in the same style.

Test Plan: Imported from OSS

Differential Revision: D19237648

Pulled By: iseeyuan

fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9
2020-01-04 13:47:44 -08:00
5579611544 Enable foldbn tests (#29220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29220

Support for accessing constants was added in previous
PRs; this PR re-enables the foldbn tests

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18846848

fbshipit-source-id: 90ceaf42539ffee80b984e0d8b2420da66c263c3
2020-01-04 11:47:01 -08:00
ebe69236d1 Expose class constant through attr and setattr in object (#29219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29219

We added class constants in previous PRs; this PR allows access to
class constants in the object API

Test Plan:
build/bin/test_jit
python test/test_jit.py

Imported from OSS

Differential Revision: D18846851

fbshipit-source-id: 888a6517d5f747d1f8ced283c0c2c30b2f6c72c6
2020-01-04 11:09:35 -08:00
6f62c311a1 Add unsafeRemoveConstant for ClassType (#30787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30787

This is needed when we fuse conv-bn modules,
where we need to rewrite conv's constant bias (None) into a
Tensor-typed bias attribute

Test Plan:
build/bin/test_jit

Imported from OSS

Differential Revision: D18846850

fbshipit-source-id: 9fd5fe85d93d07226e180b75d2e068fe00ca25fe
2020-01-04 01:11:59 -08:00
2bac76969c Fix getConstant (#31012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31012

- getConstant should throw when the item is not found
- add another getConstant which takes a slot index as argument

Test Plan:
test_class_type.cpp

Imported from OSS

Differential Revision: D18898418

fbshipit-source-id: d3a23a4896fdbf5fa98e1c55c9c4d6205840014b
2020-01-03 23:06:11 -08:00
8420f205ee Remove refs from ArrayRef arguments (#31845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31845

ArrayRef is trivially copyable and should be passed by value. Removing
unnecessary `&`s.

Test Plan: Imported from OSS

Differential Revision: D19278523

Pulled By: suo

fbshipit-source-id: 026db693ea98d19246b02c48d49d1929ecb6478e
2020-01-03 22:50:55 -08:00
b0a2765103 move docker image html to correct bucket (#31832)
Summary:
Save the docker image version to the docker.pytorch.org bucket so it can be served from http://docker.pytorch.org

test result: https://s3.amazonaws.com/docker.pytorch.org/pytorch.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31832

Differential Revision: D19281263

Pulled By: mingbowan

fbshipit-source-id: d906a72d419876c81a570a2086b2d8d2c47d5d17
2020-01-03 21:38:58 -08:00
5fe3604987 Preserve constant from ConcreteModuleType to ClassType (#29218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29218

We need to be able to access constants in modules.

Test Plan:
tbd

Imported from OSS

Differential Revision: D18846847

fbshipit-source-id: 22d2c485c3c449bc14ad798f6e1a0c64fc8fb346
2020-01-03 21:30:04 -08:00
e5b7231edc Adding version check for hypothesis deadline
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31262

Test Plan: Imported from OSS

Differential Revision: D19036700

Pulled By: z-a-f

fbshipit-source-id: 8e898a6f064dfb4876aa0d3cc299288b5af7b37d
2020-01-03 19:17:55 -08:00
28c9dd4436 fix ProcessGroupGlooTest (#31255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31255

This test had two issues: a timeout would occasionally happen due to the short 50ms timeout, and CUDA code would get compiled and run on CPU, leading to errors. This PR fixes both.

Differential Revision: D19028231

fbshipit-source-id: e50752228affe0021e7c0caa83bce78d76473759
2020-01-03 18:35:29 -08:00
27488773b0 Updating submodules
Summary:
GitHub commits:

8c7c0e201e
b84db9a971
0524fa0b36
2df7b2ba54
80553514ed
4eb66bc7aa

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 97d0605beabcfc15236038215208acf034f8eba4
2020-01-03 17:04:54 -08:00
c829c6f3d2 Disable flaky test_debug_info
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31847

Test Plan: Imported from OSS

Differential Revision: D19278009

Pulled By: mrshenli

fbshipit-source-id: 652fa6741a48f35d9f8f54534e84d64fdd96b439
2020-01-03 17:01:27 -08:00
6b1db202bc Add tanh to c10::cuda::compat (#31844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31844

Add tanh to c10::cuda::compat

Test Plan: unittest

Reviewed By: bddppq

Differential Revision: D19277230

fbshipit-source-id: d2cceea58722393ecb90aacec05b692dbb92d467
2020-01-03 14:27:36 -08:00
9407137102 Update the descriptive error message for enforce fail (#31575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31575

We need a new exception class specifically for the enforce_finite operator, because we need to map it to a specific python exception ExitException, not the RuntimeError type that all c10::Errors get mapped to by default. This diff includes:
- Define c10::EnforceFiniteNotMet
- API CAFFE_ENFORCE_FINITE to throw c10::EnforceFiniteNotMet
- Map from c10::EnforceFiniteNotMet to python ExitException
- Apply CAFFE_ENFORCE_FINITE in caffe2 op

Test Plan:
- integration test pass: https://fburl.com/fblearner/xwkzbqyo
- integration test with D19213617: https://fburl.com/fblearner/479y4jrj Generate error message as desired

- Example:
  - Original error message  f157597803
{F225477055}

  - Updated error message  (with D19213617 to generate the error): f158571327
{F225477071}

Reviewed By: zheng-xq

Differential Revision: D19206240

fbshipit-source-id: bd256862801d5957a26b76d738edf4e531f03827
2020-01-03 13:53:20 -08:00
40e720282c Using _floats_wrapper in per_channel_tensor generation (#31780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31780

We need to specify the width to ensure the generated float is representable by `float32`.
Fixes: https://github.com/pytorch/pytorch/issues/31774

Test Plan:
ci

Imported from OSS

Differential Revision: D19275165

fbshipit-source-id: 50560b4208c562b6bcd2abccadd234f29fbb4b0a
2020-01-03 13:40:08 -08:00
86a4e2135d Do not register const float * type on utiliy_ops.cu (#31583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31583

Use `float *` instead, which is already registered

Test Plan: CI

Reviewed By: xianjiec

Differential Revision: D19221405

fbshipit-source-id: eb8eabcf828745022bc1e4185a0e65abd19a8f04
2020-01-03 13:28:26 -08:00
457c57d9f7 use unordered_set instead of vector for futureTimeouts key in (#31813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31813

Closes https://github.com/pytorch/pytorch/issues/31804. We were using
an `std::vector` as the key for a map that keeps track of futures to mark them
if they time out, but we can instead use an `unordered_set`. This results in a
faster lookup in the code block where we remove futureIDs from this set when
they complete successfully. Previously we were finding them via a linear
`std::find`. Switching it to a constant time find will help performance in the
case where a large number of futures are scheduled to time out at the same
time, or if there is no timeout enforced.

To benchmark a rough perf improvement, I created 50k futures with the same
timeout. Before this PR, the lookup `std::find(futuresAtTime.begin(),
futuresAtTime.end(), id)` took ~200us, now it takes 1us.
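
A toy Python analogue of the complexity change (list membership scans linearly, like `std::find` on a vector; set membership is a hashed lookup, like `unordered_set::find`; the numbers are illustrative only):
```
import timeit

future_ids = list(range(50000))
as_list, as_set = list(future_ids), set(future_ids)

# membership test for the last-scheduled future ID, repeated 100 times each
print(timeit.timeit(lambda: 49999 in as_list, number=100))  # linear scan
print(timeit.timeit(lambda: 49999 in as_set, number=100))   # constant-time lookup
```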
ghstack-source-id: 96251355

Test Plan: Unit tests pass.

Differential Revision: D19269798

fbshipit-source-id: 1a0fa84a478ee27a16ab0b9fa6f5413b065a663e
2020-01-03 13:21:23 -08:00
b44c0f328e Skip same tests in ONNX Python3 CI as in Python2 (#31827)
Summary:
resolve https://github.com/pytorch/pytorch/issues/31103

vgg models were not tested in Python2 but are turned on in Python3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31827

Reviewed By: houseroad

Differential Revision: D19274123

Pulled By: bddppq

fbshipit-source-id: c48beb574e8b03b2adbd6c9d8ca3f600bee93024
2020-01-03 12:42:42 -08:00
79e30ff3f8 optimize index_select performance on CPU with TensorIterator (#30598)
Summary:
This PR aims at improving `index_select` performance on CPU with `TensorIterator`.
The code has equally effective optimizations for both contiguous and non-contiguous tensors.
The code tries to parallelize the inner loop when the copied slice is large enough; otherwise it parallelizes the outer loop.
Thus both user scenarios, DLRM (via `Embedding`) and the Fairseq transformer, are covered; a small timing sketch follows the numbers below.

1. for contiguous input, single socket: **1.25x** performance speedup
2. for non-contiguous input, single socket: **799x** performance speedup
3. for contiguous input, single core: same performance
4. for non-contiguous input, single core: **31x** performance speedup
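
A minimal timing sketch (not the PR's benchmark) for comparing the two layouts on CPU:
```
import time
import torch

idx = torch.randint(0, 10000, (4096,))
contig = torch.randn(10000, 512)
noncontig = torch.randn(512, 10000).t()  # same shape, non-contiguous

for label, src in (("contiguous", contig), ("non-contiguous", noncontig)):
    start = time.time()
    for _ in range(100):
        torch.index_select(src, 0, idx)
    print("%s: %.3f ms/iter" % (label, (time.time() - start) / 100 * 1000))
```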
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30598

Differential Revision: D19266892

Pulled By: VitalyFedyunin

fbshipit-source-id: 7aaf8e2c861b4a96250c968c4dd95c8d2c5b92d7
2020-01-03 11:59:43 -08:00
0ae063d5d9 Fixed concatenation benchmark + added it to the microbenchmarking runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31587

Test Plan: Imported from OSS

Differential Revision: D19221813

Pulled By: z-a-f

fbshipit-source-id: ee0eb60da7899b23fdc63326302d1e2fd4b540ee
2020-01-03 11:23:12 -08:00
9c9d3cd550 Revert D19262570: Fix race condition when creating build dir
Test Plan: revert-hammer

Differential Revision:
D19262570

Original commit changeset: bb18c72e4264

fbshipit-source-id: 40675ef6ef4c98629deaaef0b25956f92534ff50
2020-01-03 11:17:42 -08:00
a02a5129a8 Move rrelu to Aten(CPU) (#31094)
Summary:
VitalyFedyunin, this PR ports the rrelu activation to ATen.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    return time.time()

device = "cpu"
m = nn.RReLU(0.1, 0.3).train()
# for inference
#m = nn.RReLU(0.1, 0.3).eval()
#warm up
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Before:**
```
Training:
input size(128, 1) forward time is 0.01 (ms); backward avg time is 0.03 (ms).
input size(128, 10) forward time is 0.03 (ms); backward avg time is 0.04 (ms).
input size(128, 100) forward time is 0.17 (ms); backward avg time is 0.06 (ms).
input size(128, 1000) forward time is 1.45 (ms); backward avg time is 0.07 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.01 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.15 (ms).
```
**After:**
```
Training:
input size(128, 1) forward time is 0.01 (ms); backward avg time is 0.03 (ms).
input size(128, 10) forward time is 0.03 (ms); backward avg time is 0.04 (ms).
input size(128, 100) forward time is 0.17 (ms); backward avg time is 0.07 (ms).
input size(128, 1000) forward time is 1.43 (ms); backward avg time is 0.08 (ms).
inference:
input size(128, 1) forward time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.03 (ms).
```
**OMP_NUM_THREADS=1:**
```
Before:
Training:
input size(128, 1) forward time is 0.01 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 100) forward time is 0.15 (ms); backward avg time is 0.03 (ms).
input size(128, 1000) forward time is 1.45 (ms); backward avg time is 0.14 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.01 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.20 (ms).

After:
Training:
input size(128, 1) forward time is 0.01 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 100) forward time is 0.15 (ms); backward avg time is 0.03 (ms).
input size(128, 1000) forward time is 1.43 (ms); backward avg time is 0.15 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.02 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.06 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24755, https://github.com/pytorch/pytorch/issues/24756.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31094

Differential Revision: D19270936

Pulled By: VitalyFedyunin

fbshipit-source-id: 11bb3236b1037a558022d3777d1f9a429af2bffe
2020-01-03 11:10:00 -08:00
b47e9b97a2 Add op bitwise_and (#31104)
Summary:
Following https://github.com/pytorch/pytorch/pull/25665, this adds the `bitwise_and` operator.
Benchmark script:
```
import timeit
#for __and__
for n, t in [(10, 100000),(1000, 10000)]:
    print('__and__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a & b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))
#for __iand__
for n, t in [(10, 100000),(1000, 10000)]:
    print('__iand__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a & b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```
Device: **Tesla P100, skx-8180**
Cuda verison: **9.0.176**

Before:
```
__and__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.1766007635742426
device: cpu, dtype: torch.uint8, 100000 times           0.17322628945112228
device: cpu, dtype: torch.int16, 100000 times           0.17650844901800156
device: cpu, dtype: torch.int32, 100000 times           0.17711848113685846
device: cpu, dtype: torch.int64, 100000 times           0.18240160401910543
device: cuda, dtype: torch.int8, 100000 times           1.273967768996954
device: cuda, dtype: torch.uint8, 100000 times          1.2778537990525365
device: cuda, dtype: torch.int16, 100000 times          1.2753686187788844
device: cuda, dtype: torch.int32, 100000 times          1.2797665279358625
device: cuda, dtype: torch.int64, 100000 times          1.2933144550770521
__and__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.031139614060521126
device: cpu, dtype: torch.uint8, 10000 times            0.03091452084481716
device: cpu, dtype: torch.int16, 10000 times            0.022756479680538177
device: cpu, dtype: torch.int32, 10000 times            0.025045674294233322
device: cpu, dtype: torch.int64, 10000 times            0.024164282716810703
device: cuda, dtype: torch.int8, 10000 times            0.12820732593536377
device: cuda, dtype: torch.uint8, 10000 times           0.12775669433176517
device: cuda, dtype: torch.int16, 10000 times           0.12697868794202805
device: cuda, dtype: torch.int32, 10000 times           0.12832533661276102
device: cuda, dtype: torch.int64, 10000 times           0.1280576130375266
__iand__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.3687064303085208
device: cpu, dtype: torch.uint8, 100000 times           0.36253443732857704
device: cpu, dtype: torch.int16, 100000 times           0.362891579978168
device: cpu, dtype: torch.int32, 100000 times           0.37680106051266193
device: cpu, dtype: torch.int64, 100000 times           0.3689364707097411
device: cuda, dtype: torch.int8, 100000 times           1.419940729625523
device: cuda, dtype: torch.uint8, 100000 times          1.4247053815051913
device: cuda, dtype: torch.int16, 100000 times          1.4191444097086787
device: cuda, dtype: torch.int32, 100000 times          1.4305962566286325
device: cuda, dtype: torch.int64, 100000 times          1.4567416654899716
__iand__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.06224383972585201
device: cpu, dtype: torch.uint8, 10000 times            0.06205617543309927
device: cpu, dtype: torch.int16, 10000 times            0.05016433447599411
device: cpu, dtype: torch.int32, 10000 times            0.05216377507895231
device: cpu, dtype: torch.int64, 10000 times            0.06139362137764692
device: cuda, dtype: torch.int8, 10000 times            0.14827249851077795
device: cuda, dtype: torch.uint8, 10000 times           0.14801877550780773
device: cuda, dtype: torch.int16, 10000 times           0.14952312968671322
device: cuda, dtype: torch.int32, 10000 times           0.14999118447303772
device: cuda, dtype: torch.int64, 10000 times           0.14951884001493454
```
After:
```
__and__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.23157884553074837
device: cpu, dtype: torch.uint8, 100000 times           0.23063660878688097
device: cpu, dtype: torch.int16, 100000 times           0.23005440644919872
device: cpu, dtype: torch.int32, 100000 times           0.23748818412423134
device: cpu, dtype: torch.int64, 100000 times           0.24106105230748653
device: cuda, dtype: torch.int8, 100000 times           1.4394256137311459
device: cuda, dtype: torch.uint8, 100000 times          1.4436759827658534
device: cuda, dtype: torch.int16, 100000 times          1.4631587155163288
device: cuda, dtype: torch.int32, 100000 times          1.459101552143693
device: cuda, dtype: torch.int64, 100000 times          1.4784048134461045
__and__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.028442862443625927
device: cpu, dtype: torch.uint8, 10000 times            0.028130197897553444
device: cpu, dtype: torch.int16, 10000 times            0.025318274274468422
device: cpu, dtype: torch.int32, 10000 times            0.02519288007169962
device: cpu, dtype: torch.int64, 10000 times            0.028299466706812382
device: cuda, dtype: torch.int8, 10000 times            0.14342594426125288
device: cuda, dtype: torch.uint8, 10000 times           0.145280827768147
device: cuda, dtype: torch.int16, 10000 times           0.14673697855323553
device: cuda, dtype: torch.int32, 10000 times           0.14499565307050943
device: cuda, dtype: torch.int64, 10000 times           0.14582364354282618
__iand__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times            0.25548241566866636
device: cpu, dtype: torch.uint8, 100000 times           0.2552562616765499
device: cpu, dtype: torch.int16, 100000 times           0.25905191246420145
device: cpu, dtype: torch.int32, 100000 times           0.26635489892214537
device: cpu, dtype: torch.int64, 100000 times           0.26269810926169157
device: cuda, dtype: torch.int8, 100000 times           1.485458506271243
device: cuda, dtype: torch.uint8, 100000 times          1.4742380809038877
device: cuda, dtype: torch.int16, 100000 times          1.507783885113895
device: cuda, dtype: torch.int32, 100000 times          1.4926990242674947
device: cuda, dtype: torch.int64, 100000 times          1.519851053133607
__iand__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times             0.03425929415971041
device: cpu, dtype: torch.uint8, 10000 times            0.03293587639927864
device: cpu, dtype: torch.int16, 10000 times            0.029559112153947353
device: cpu, dtype: torch.int32, 10000 times            0.030915481969714165
device: cpu, dtype: torch.int64, 10000 times            0.03292469773441553
device: cuda, dtype: torch.int8, 10000 times            0.15792148280888796
device: cuda, dtype: torch.uint8, 10000 times           0.16000914946198463
device: cuda, dtype: torch.int16, 10000 times           0.1600684942677617
device: cuda, dtype: torch.int32, 10000 times           0.16162546630948782
device: cuda, dtype: torch.int64, 10000 times           0.1629159888252616
```
Fixes https://github.com/pytorch/pytorch/issues/24508, https://github.com/pytorch/pytorch/issues/24509, https://github.com/pytorch/pytorch/issues/24655, https://github.com/pytorch/pytorch/issues/24656.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31104

Differential Revision: D18938930

Pulled By: VitalyFedyunin

fbshipit-source-id: a77e805a0b84e8ace16c6e648c2f67dad44f2e44
2020-01-03 10:32:36 -08:00
68f3782106 remove std_single and var_single code in TH (#31608)
Summary:
std_single and var_single in TH are never used, so remove them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31608

Differential Revision: D19270920

Pulled By: VitalyFedyunin

fbshipit-source-id: e106a42383bf224f7e2c1c092b95484d23af4b0a
2020-01-03 10:16:52 -08:00
0b9cd410a9 Fix cumsum error for tensors with zero elements (#31694)
Summary:
Currently `cumsum` crashes for tensors with non-empty dimensions but zero elements, which can happen when some dimension is zero. This commit fixes the error by checking both `dim()` and `numel()` in cumsum backward.
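A minimal repro sketch of the case described above (shape chosen for illustration):

```python
import torch

# Non-empty dimensions but zero elements: dim() == 2, numel() == 0.
x = torch.randn(0, 3, requires_grad=True)
y = x.cumsum(dim=0)   # forward already worked
y.sum().backward()    # backward used to crash; with the fix it is a no-op gradient
print(x.grad.shape)   # torch.Size([0, 3])
```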

Fixes https://github.com/pytorch/pytorch/issues/31515
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31694

Reviewed By: mrshenli

Differential Revision: D19266613

Pulled By: leedtan

fbshipit-source-id: 9407e0aa55440fed911c01a3580bb6c5eab62a16
2020-01-03 10:16:46 -08:00
daf00beaba Remove duplicated Numa detection code. (#30628)
Summary:
cmake/Dependencies.cmake (1111a6b810/cmake/Dependencies.cmake (L595-L609)) already detects NUMA. Duplicate detection and variables may lead to
incorrect results.

Close https://github.com/pytorch/pytorch/issues/29968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30628

Differential Revision: D18782479

Pulled By: ezyang

fbshipit-source-id: f74441f03367f11af8fa59b92d656c6fa070fbd0
2020-01-03 08:48:46 -08:00
8c425dd201 Fix race condition when creating build dir (#30956)
Summary:
The original `check-and-act` style can raise `FileExistsError` when multiple processes are jit-compiling the extension on the same node.
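A minimal sketch of the race-free pattern (the path is a placeholder; this may not match the PR's exact code):

```python
import os

build_dir = "/tmp/torch_extensions/my_ext"  # hypothetical build directory

# Racy check-then-act: another process can create the directory between
# the exists() check and mkdir(), raising FileExistsError:
#     if not os.path.exists(build_dir):
#         os.mkdir(build_dir)

# Race-free: makedirs tolerates a directory created concurrently.
os.makedirs(build_dir, exist_ok=True)
```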
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30956

Differential Revision: D19262570

Pulled By: ezyang

fbshipit-source-id: bb18c72e42648770b47f9378ac7c3929c3c03efc
2020-01-03 07:58:26 -08:00
f56c59ead6 clarify when to use as_tuple in torch.nonzero
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31798
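For context, a small example of the two forms being documented (values are illustrative):

```python
import torch

t = torch.tensor([[0.0, 1.0], [2.0, 0.0]])

idx = torch.nonzero(t)                        # (n, ndim) tensor of indices
rows, cols = torch.nonzero(t, as_tuple=True)  # tuple of 1-D index tensors
print(t[rows, cols])                          # tensor([1., 2.]) -- handy for indexing
```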

Differential Revision: D19272332

Pulled By: zou3519

fbshipit-source-id: 954d086a7b9f1a719e0dac303a4253bf7ec8e9f4
2020-01-03 07:43:35 -08:00
95cb66570a Erase array sizes from types in c10::str(). (#31683)
Summary:
This dramatically reduces the number of instantiations and eliminates
~900KB of code from my local build of libtorch_cpu.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31683

Differential Revision: D19258364

Pulled By: resistor

fbshipit-source-id: addb921a26289978ffd14c203325ca7e35a4515b
2020-01-02 22:30:57 -08:00
f39105b68f add num_pending_users to debug info (#31539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31539

Adding this metric primarily because it is needed to unblock unit
tests for https://github.com/pytorch/pytorch/pull/31381. It also may be useful
to look at this metric to see the number of pending RRef forks that currently
exist.
ghstack-source-id: 96230360

Test Plan: Modified the relevant unit test.

Differential Revision: D19204158

fbshipit-source-id: 016345e52cd02cc5f46837bffd8d589ba8575f29
2020-01-02 21:28:03 -08:00
5be8dac329 Remove non-ascii character from torch/onnx/symbolic_opset11.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31814

Reviewed By: houseroad

Differential Revision: D19270742

Pulled By: bddppq

fbshipit-source-id: 80800d588e63701d6e1b5838d7ada993f0246a81
2020-01-02 20:54:32 -08:00
fc598f9023 generate op dependency graph as python code
Summary:
Add support to print op dependence as python code so that both custom
build script and BUCK can import it without yaml parser.

Test Plan:
- generate the file:
```
ANALYZE_TORCH=1 FORMAT=py DEPLOY=1 tools/code_analyzer/build.sh -closure=false
```

- load the file in python:
```
python
>>> from tools.code_analyzer.generated.torch import TORCH_DEPS
>>> print(TORCH_DEPS)
```

Differential Revision: D18894639

Pulled By: ljk53

fbshipit-source-id: e304d0525a07a13cf6e8a9317cd22637200d044c
2020-01-02 20:26:28 -08:00
fa0424f224 add LLVM-dev package to android docker image (#31215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31215

Install LLVM-dev package for code analysis CI job: #30937

LLVM-dev package is not related to android NDK but the whole code
analysis thing is for mobile custom build so choose this docker image.

Test Plan: - wait docker image to build?

Differential Revision: D19193223

Pulled By: ljk53

fbshipit-source-id: 54a79daf8d98fa7c8b9eed11f519e1c7b1614be8
2020-01-02 20:26:24 -08:00
dc43f9dc54 fix test_backward_node_failure flakiness (#31588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31588

Per title. This test can sometimes fail with a different error regex
than the one that is currently tested, so add this error regex to make the test
pass consistently.

Differential Revision: D19222275

fbshipit-source-id: 89c95276d4d9beccf9e0961f970493750d78a96b
2020-01-02 15:44:16 -08:00
155376721c Pin hypothesis package to 4.57.1 to avoid test failures
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31794

Test Plan: Imported from OSS

Differential Revision: D19266039

Pulled By: mrshenli

fbshipit-source-id: 4b1839c4de2b4476c8173a79582c861bf4fa998f
2020-01-02 15:33:03 -08:00
5f8308e32d Pin Pillow to v6 as PILLOW_VERSION is removed in v7
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31777

Test Plan: Imported from OSS

Differential Revision: D19264247

Pulled By: mrshenli

fbshipit-source-id: 52b0a3629e3a96ef2f9d3e289b9f7bb6a2745786
2020-01-02 15:32:58 -08:00
feb0ccdbfd Updating submodules
Summary:
GitHub commits:

123ae291fc
b9e9d4f7d9
86ea03e727
1cd1bfb668
917504ac42
06cc652030
e63819cbe3
6d21d8cfd3
b636829d55
19d0faece2
9860344e10

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 1de7509af788dc7861cfc779936fbc9e0146a5a5
2020-01-02 14:35:41 -08:00
ed5cd0d742 Use numeric limits to define TensorTypeSet(FULL) representation (#31668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31668

This also removes an annoying warning about change of sign conversion

Test Plan: Run unit tests

Reviewed By: ezyang

Differential Revision: D19238631

fbshipit-source-id: 29b50abac635e530d5b0453c3a0f36a4573fbf5b
2020-01-02 12:54:02 -08:00
d770fbc1d2 Some modifications to improve readability (#31352)
Summary:
For long strings, it is better for readability to give the string a name.

When building a dict, a literal is more readable and faster than the dict() constructor (see the comparison below).
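For illustration, a minimal comparison (names and values are hypothetical):

```python
# dict literal: a single bytecode op, no name lookup
config = {"mode": "train", "epochs": 10}

# dict() constructor: a global name lookup plus a function call
config = dict(mode="train", epochs=10)
```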

I always appreciate your efforts in creating the world's best frameworks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31352

Differential Revision: D19191967

Pulled By: ngimel

fbshipit-source-id: 21f063b163b67de8cf9761a4db5991f74318e991
2020-01-02 12:48:34 -08:00
7078f4b27d skip _test_optional_float in BC check (#31786)
Summary:
Skip _test_optional_float
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31786

Reviewed By: hl475

Differential Revision: D19265059

Pulled By: houseroad

fbshipit-source-id: 6b95bd3b8cad83a4c459c0603befaaeeade6cdff
2020-01-02 11:12:38 -08:00
37fc59e847 Updating submodules
Summary:
GitHub commits:

17caab3d7b

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: f4828cd5c81615d0df86f915b3abb6a58509aa79
2020-01-02 10:57:58 -08:00
9e9bfbfd8d Update old scheduler example usage (#31358)
Summary:
Update the old example usage in CosineAnnealingWarm: `scheduler.step()` should be called after `optimizer.step()`.
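A minimal sketch of the corrected ordering (model and hyperparameters are placeholders):

```python
import torch
from torch import nn, optim

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)

for epoch in range(20):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    optimizer.step()   # update the parameters first...
    scheduler.step()   # ...then advance the learning-rate schedule
```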

https://github.com/pytorch/pytorch/issues/20028#issuecomment-566061580
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31358

Differential Revision: D19199311

Pulled By: vincentqb

fbshipit-source-id: cb29b95f8277d2dfa75ec2a83c1af03a5c9c9a69
2020-01-02 09:15:04 -08:00
c4f10e0fe7 Renaming scales parameter for interpolate (#31526)
Summary:
PR separated from https://github.com/pytorch/pytorch/pull/31274.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31526

Reviewed By: zou3519

Differential Revision: D19221931

Pulled By: gchanan

fbshipit-source-id: 81958a9910867ac9d62f2b47abc49384526c4e51
2020-01-02 08:19:30 -08:00
236b0a318c Delete ATen/stub (#31763)
Summary:
This folder contained an empty CombinedStub file which isn't explicitly used anywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31763

Differential Revision: D19262563

Pulled By: ezyang

fbshipit-source-id: 5d095c93d6f7a1cc35f5919aa6006b31c2376b18
2020-01-02 07:04:07 -08:00
cb1af5f61f Revert D19233558: add float[] str[] constants
Test Plan: revert-hammer

Differential Revision:
D19233558

Original commit changeset: 4f7c6d9ddbe7

fbshipit-source-id: a5020a9169e349a5970323471d673e8cd7818c66
2019-12-31 11:57:34 -08:00
7a3ed36309 Fix nvcc math functions for MSVC 2019 (#31704)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31108.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31704

Differential Revision: D19256110

Pulled By: mingbowan

fbshipit-source-id: a4aba2830aba002497f70a75ef995e5e7de08393
2019-12-31 10:52:12 -08:00
1499b894c4 Apply clang-format to csrc/distributed/rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31681

Test Plan: Imported from OSS

Differential Revision: D19247085

Pulled By: mrshenli

fbshipit-source-id: ce6c1710663eecda3641d8dcf80ef16f9d21b93e
2019-12-31 07:25:50 -08:00
b102550d2c Allow to pass in masks through db (#31676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31676

Facebook:

Previously we assumed the mask was passed in as a tensor, which is not feasible for a sparse parameter.
Here we allow passing the mask through a db path, which requires the masks to be stored in some db first.

Test Plan: unit tests

Reviewed By: ellie-wen

Differential Revision: D18928753

fbshipit-source-id: 75ca894de0f0dcd64ce17b13652484b3550cbdac
2019-12-30 20:54:27 -08:00
39297bfe08 Fix flaky test_debug_info. (#31675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31675

This test could be flaky since there could be in-flight RPC requests from
startup that might not have finished. If they finish between the different
calls to retrieve debug_info, there could be a problem since we would report
inconsistent information. To avoid this flakiness, we wait for the metrics
to stabilize.
ghstack-source-id: 96188488

Test Plan: waitforbuildbot

Differential Revision: D19242588

fbshipit-source-id: 8f3db7e7365acbd3742e6ec0c2ddcca68f27db9e
2019-12-30 18:07:26 -08:00
f4e955ff62 Change PackSegments to ensure consistent behavior between CPU and GPU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31673

Reviewed By: Wakeupbuddy, BIT-silence

Differential Revision: D18925762

fbshipit-source-id: e0c318e97f69b14a54f43c176af57d98fbc16c9f
2019-12-30 13:31:45 -08:00
dd0f2f0c19 add float[] str[] constants (#31503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31503

Add support for float list and string list constants, which enables better constant propagation + constant pooling + freezing.

Test Plan: Imported from OSS

Differential Revision: D19233558

Pulled By: eellison

fbshipit-source-id: 4f7c6d9ddbe7623757a9a20606ce5f394e14e93d
2019-12-30 11:58:17 -08:00
6064223808 @slowTest some slow tests (#31706)
Summary:
These are all the jit tests that take > 10 seconds according to `pytest test/test_jit.py --durations=15`

```
32.76s call     test/test_jit.py::TestModels::test_super_resolution
32.20s call     test/test_jit.py::TestModels::test_neural_style
30.90s call     test/test_jit.py::TestJit::test_export_batchnorm
25.95s call     test/test_jit.py::TestJit::test_dropout_module_requires_grad
22.24s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Transformer
12.38s call     test/test_jit.py::TestScript::test_fuser_double_float_codegen
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31706

Pulled By: driazati

Differential Revision: D19251567

fbshipit-source-id: 8e76f717506b8bf28d1a63ce302feb0446dc9141
2019-12-30 11:45:24 -08:00
ee87b01f40 add additional types to indexing operations dispatch (#31692)
Summary:
- Fixes https://github.com/pytorch/pytorch/issues/31672
- Adds Bfloat16 dispatch to the indexing operations that were missing it
    - index_put on cuda does not have bfloat16 dispatch, because I'm not sure bfloat16 math ops work on cuda

Note: `index_put_` with `accum=True` is enabled for `bool`, which does not make much sense, but I'm not the one who started it, so this behavior is preserved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31692

Differential Revision: D19249561

Pulled By: ngimel

fbshipit-source-id: 1269196194f7b9f611b32be198c001704731a78f
2019-12-29 23:03:54 -08:00
22d84204f7 Expose torch.poisson in documentation (#31667)
Summary:
Changelog:
- Add a docstring for torch.poisson describing current behavior (see the usage sketch below)
- Check for non-positive entries in the tensor passed as input to torch.poisson
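A small usage sketch of the documented behavior (values are illustrative):

```python
import torch

rates = torch.rand(4) * 5     # rates must be non-negative
print(torch.poisson(rates))   # one sample per rate, e.g. tensor([3., 1., 0., 4.])
```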

Closes https://github.com/pytorch/pytorch/issues/31646
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31667

Differential Revision: D19247371

Pulled By: ngimel

fbshipit-source-id: b53d105e73bf59a45beeb566f47365c3eb74efca
2019-12-28 21:32:26 -08:00
3b7916fccd Modify the order of arguments position of torch.std and torch.std_mean in doc (#31677)
Summary:
Change log:

- [x] Change the order of arguments position of torch.std and torch.std_mean in doc.
- [x] Correct a spelling mistake of torch.std_mean in doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31677

Differential Revision: D19247372

Pulled By: ngimel

fbshipit-source-id: 8685f5207c39be524cdc81250430beac9d75f330
2019-12-28 20:36:26 -08:00
e8e47c0a1b Split RRef class into abstract RRef and RRefBase (#28942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28942

The new abstract RRef class contains only user-facing RRef APIs.
It will be later moved to a common folder so that it can be shared
by jit and distributed packages to provide TorchScript support.

Test Plan: Imported from OSS

Differential Revision: D18240590

Pulled By: mrshenli

fbshipit-source-id: ac28cfc2c8039ab7131b537b2971ed4738710acb
2019-12-28 20:01:02 -08:00
90a187618e Integrate masked sparse Adagrad (#31641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31641

Assuming mask is provided as a tensor

Test Plan: unit test

Reviewed By: ellie-wen

Differential Revision: D18928737

fbshipit-source-id: a4f3dd51769c2b56e5890043e91c18e6128be082
2019-12-27 18:40:50 -08:00
ae214f67a5 updated code to ensure error check for negative dims
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31636

Differential Revision: D19233031

Pulled By: anjali411

fbshipit-source-id: c29265ddd1f887f1a0b98aca56a2691d7584353d
2019-12-27 14:39:57 -08:00
647569e546 get rid of choco install (#30897)
Summary:
7zip and cmake are part of the base image, so there is no need to re-install them. Removing the install step can make build/test more stable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30897

Differential Revision: D19232961

Pulled By: mingbowan

fbshipit-source-id: fa3bbd1325839a2a977bf13fdbd97fda43793b8d
2019-12-27 13:12:04 -08:00
35bee0c729 separate op for rowwise counter (#31612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31612

Count the number of recent updates on rows. Exponential decay is applied to the counter with decay rate r, such that
    r^{counter_halflife} = 0.5.
If counter_halflife is nonpositive, this operator is turned off.
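The half-life relation pins down the decay rate; a quick numeric check (the counter_halflife value is arbitrary):

```python
# r ** counter_halflife = 0.5  =>  r = 0.5 ** (1.0 / counter_halflife)
counter_halflife = 100
r = 0.5 ** (1.0 / counter_halflife)

count = 1.0
for _ in range(counter_halflife):
    count *= r
print(round(count, 6))  # 0.5 -- the counter halves after one half-life
```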

Test Plan: added unittest

Reviewed By: chocjy

Differential Revision: D19217921

fbshipit-source-id: 96d850123e339212cc0e0ef352ea8a1b1bf61dfa
2019-12-27 12:18:39 -08:00
e84e7ec556 Kill aten_custom_call.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25613

Test Plan: Imported from OSS

Differential Revision: D17172503

Pulled By: gchanan

fbshipit-source-id: 1456ecca8f459d008e335412cd7084bdfcb93439
2019-12-27 11:08:42 -08:00
b522a8e1ff Optimize zero length input (#31602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31602

Pull Request resolved: https://github.com/pytorch/glow/pull/3943

Zero length input is something we hit fairly frequently in practice. Previous handling of global TensorPool involves two locks per input (acquire and reclaim). Here we use a specialized anchor tensor to host zero length input. Note that it is only padded to max sequence length. If necessary, an easy extension can be added to pad to max `InputPlaceholder.getType().size()`.

Reviewed By: jfix71

Differential Revision: D19192467

fbshipit-source-id: cafdc1eb7bf9b9d6ead04a0243b0be838f6b71cd
2019-12-26 22:31:15 -08:00
204939b401 Automatic update of fbcode/onnx to 57ebc587fcf3913b4be93653b0dd58c686447298 (#31642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31642

Previous import was c08a7b76cf7c1555ae37186f12be4d62b2c39b3b

Included changes:
- **[57ebc587](https://github.com/onnx/onnx/commit/57ebc587)**: python_out does not recognize dllexport_decl. (#2482) <xkszltl>
- **[477a9b87](https://github.com/onnx/onnx/commit/477a9b87)**: Edited PythonAPIOverview.md (#2491) <AlexMuresan>
- **[59b9f908](https://github.com/onnx/onnx/commit/59b9f908)**: Minor correction type (#2411) <Jhuo IH>
- **[cdc8b861](https://github.com/onnx/onnx/commit/cdc8b861)**: fix the optimize pass of fuse_consecutive_transposes (#2471) <XavierAtShanghai>
- **[ad1f5567](https://github.com/onnx/onnx/commit/ad1f5567)**: Add clarification for bias quantization in QlinearConv Op spec (#2464) <Ashwini Khade>
- **[d9a73ccc](https://github.com/onnx/onnx/commit/d9a73ccc)**: Add remove operator and function requirements to the add new op doc. (#2486) <Emad Barsoum>

Test Plan: cont build

Reviewed By: hl475

Differential Revision: D19234753

fbshipit-source-id: 4b7de1407d9b64e584f6e6d68cbe03fa1b4c854d
2019-12-26 21:25:04 -08:00
ffcac9ad37 Clean White List for BC Checks (#31629)
Summary:
Delete obsolete items
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31629

Reviewed By: hl475

Differential Revision: D19231522

Pulled By: houseroad

fbshipit-source-id: 393ed630f7854b643c8fa8c5f3f576718934de96
2019-12-26 21:21:39 -08:00
4983ef8de1 Integrating MaskedAdagrad
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31640

Test Plan: unit test

Reviewed By: ellie-wen

Differential Revision: D18805278

fbshipit-source-id: 1def4a89b7e4e04385c762bf127d95c5e513180e
2019-12-26 17:18:39 -08:00
Jie
909b8eba0d cudnn grouped convolution nhwc patch (#31444)
Summary:
Earlier cuDNN versions don't support grouped convolution in NHWC well; a legitimate
configuration on a later cuDNN version might still return CUDNN_STATUS_NOT_SUPPORTED.
We fall back to NCHW when the runtime cuDNN version is < 7.6.0 to
keep the logic simple.

Note:
We might update the heuristics, 7.6.0 is very conservative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31444

Differential Revision: D19232414

Pulled By: VitalyFedyunin

fbshipit-source-id: 4c2d79ed347c49cd388bbe5b2684dbfa233eb2a3
2019-12-26 17:16:02 -08:00
39508501a4 Create byte-aware word lstm benchmark (#31260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31260

1. Update the LiteLM dataset conversion script (fbcode/pytext/fb/tools/lite_lm_dataset_to_tensorproto.py)
2. Created a benchmark json file for byte-aware lstm word model (xplat/aibench/specifications/models/caffe2/assistant/lite_lm_len5.json)
3. In order to run the model -- created an int64 Tensor for the model, added batch gather ops to the BUCK file

Test Plan:
```
1. Create tensorproto of the model input
buck run mode/opt //pytext/fb/tools:byte_lm_dataset_to_tensorproto -- --in-path /mnt/vol/pytext/smart_keyboard/aibench/test_5.txt --out-path /mnt/vol/pytext/smart_keyboard/aibench/byteAwareWordLM/ --hidden_dim 203 --layers_num 2 --max_seq_len 64 --max_byte_len 15

2. Run the aibench command
buck run fbsource//xplat/aibench:run_bench -- -b aibench/specifications/models/caffe2/assistant/lm_byte_lstm_len5.json --remote --devices SM-G960U-8.0.0-26
```

Reviewed By: gardenia22

Differential Revision: D17785682

fbshipit-source-id: 351c3c8bae16449e72ac641522803b23a83349be
2019-12-26 16:44:30 -08:00
91eb7c26cd Fix Typos
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31630

Differential Revision: D19233162

Pulled By: zou3519

fbshipit-source-id: c2716a2df2b2ccfeda7718b484e9605515ecdf01
2019-12-26 15:47:10 -08:00
34dce8e348 Updating submodules
Summary:
GitHub commits:

a40d608341
50e0ea13e5
bcbdec74f4

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 3de13d5b9b20ec18927ee3f0224df789172a3e9c
2019-12-26 15:06:04 -08:00
ec4e347744 Add Python language reference docs (#30686)
Summary:
This exposes our audit of https://docs.python.org/3/reference/ with descriptions for each line item.

To generate the `.rst` from the Quip:

```bash
pip install m2r
m2r jit_language_reference.md
```

https://driazati.github.io/pytorch_doc_previews/30686/jit.html#python-functions-and-modules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30686

Pulled By: driazati

Differential Revision: D19219587

fbshipit-source-id: 249db9b5ee20e38804d4302bbfeca7d54f27d0bd
2019-12-26 13:21:36 -08:00
5d95a9ca79 Print all broken ops instead of the first one (#31628)
Summary:
Originally, we only printed one broken schema. With this changeset, all the broken schemas are printed out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31628

Reviewed By: hl475

Differential Revision: D19231444

Pulled By: houseroad

fbshipit-source-id: 3dd5b4609a6a9a9046e95f2f30deb9beeb5dcd56
2019-12-26 12:51:43 -08:00
cf46bcace8 Updating submodules
Summary:
GitHub commits:

faebc336da
23d8703808

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 0368879112c318607821bbf3a081669dade19148
2019-12-26 12:27:04 -08:00
866c1b1fcc Ensure legacy sparse constructor/new doesn't interpret python data as tensor data. (#31490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31490

When this happens, a dense tensor is constructed from a sparse constructor.

Fixes: https://github.com/pytorch/pytorch/issues/16154

Test Plan: Imported from OSS

Reviewed By: cpuhrsch, mrshenli

Differential Revision: D19196498

Pulled By: gchanan

fbshipit-source-id: 57a6324833e35f3e62318587ac74267077675b93
2019-12-26 10:46:18 -08:00
e2951d586d Updating submodules
Summary:
GitHub commits:

11a904583d

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: f00bf65aebddb4541faa2626d42ac436e090ee89
2019-12-26 09:49:33 -08:00
29f345831e Error out if legacy Tensor.new is called on alternate layouts / dtypes (#31485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31485

Fixes: https://github.com/pytorch/pytorch/issues/22158

Test Plan: Imported from OSS

Differential Revision: D19196499

Pulled By: gchanan

fbshipit-source-id: a01ea7641b5fcd00a9d267243539ff64a5492e5f
2019-12-26 07:27:24 -08:00
a54dc87e8e revert D18805532 and make numerics of masked adagrad consistent with unmasked adagrad (#30784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30784

Instead of putting experimental Masked*Adagrad into OSS, we decided to change D18805278.

Test Plan: CI

Reviewed By: chocjy

Differential Revision: D18824265

fbshipit-source-id: 3d893fe6c441f2ff7af4c497cf81b9c49363e7a8
2019-12-24 10:02:13 -08:00
363d8be787 Bypass _TorchScriptTesting_StackString::pop in BC check now (#31586)
Summary:
Failed result: https://circleci.com/gh/pytorch/pytorch/4054919?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console

Original PR: https://github.com/pytorch/pytorch/pull/30242
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31586

Reviewed By: hl475

Differential Revision: D19222086

Pulled By: houseroad

fbshipit-source-id: 96db2bf18fa06eaebdd558e86615e26b95f34516
2019-12-23 22:00:20 -08:00
46ad80c839 Fix null pointer dereference on Android for strtod_c (#31582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31582

D19124934 removed a dummy pointer passed to strtod_c() that's used only for Android (https://fburl.com/diffusion/zkv34jf1). Without it, jit parsing on Android started throwing SIGSEGV due to a null pointer dereference. This diff adds the dummy pointer back.

Test Plan: Tests

Reviewed By: driazati, shoumikhin

Differential Revision: D19221071

fbshipit-source-id: 2e230c3fbfa873c3f7b92f73c87ee766ac182115
2019-12-23 20:08:13 -08:00
446e9af5b9 Fix parsing of big float literals (#29940)
Summary:
Stacked PRs
 * **#29940 - [jit] Fix parsing of big float literals**
 * #29935 - [jit] Fix hex literal parsing
 * #29931 - [jit] Throw a better error for int too big for int64_t
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29940

Pulled By: driazati

Differential Revision: D19186604

fbshipit-source-id: 6ef66588a5cf956f281e7bd1e5584ef06f5296e9
2019-12-23 17:21:07 -08:00
218cfd568d Conv transpose/backward split 32bit (#31510)
Summary:
Basically the same as https://github.com/pytorch/pytorch/pull/31379, except that I wrote a separate function `split_batch_dim_to_32bit_out` for the logic. This function could also be used for convolution forward, so I will rebase this PR after https://github.com/pytorch/pytorch/issues/31379 gets merged and then change `raw_cudnn_convolution_forward_out` to use `split_batch_dim_to_32bit_out` here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31510

Differential Revision: D19210563

Pulled By: ngimel

fbshipit-source-id: e20bb82b6360aa2c0e449e127188c93f44e1e9b4
2019-12-23 11:34:17 -08:00
fb63c0e2c9 Remove -Wno-unused-private-field
Test Plan: Sanity check

Reviewed By: nlutsenko

Differential Revision: D18833450

fbshipit-source-id: c69b6679b4caa3e868ca41113cd502c8905a776b
2019-12-23 10:59:00 -08:00
68e5172382 Support optional float parameters (float?, optional<double>). (#31517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31517

This is going to be used by upsample (which currently uses magic values to represent optionals).

For now, we just introduce a fake function for testing (torch._test_optional_float(x)).

Test Plan: Imported from OSS

Differential Revision: D19198721

Pulled By: gchanan

fbshipit-source-id: 0a1382fde0927c5d277d02d62bfb31fb574b8c74
2019-12-23 08:33:39 -08:00
9459db86bf Raise warning for schedulers following chainable schedulers (#31125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29697.

Raise warning for schedulers following chainable schedulers in https://github.com/pytorch/pytorch/issues/26423. See explanation for
* [new warning when load/save](https://github.com/pytorch/pytorch/issues/29697#issuecomment-564655802)
* [change from deprecation to user warning](https://github.com/pytorch/pytorch/issues/29697#issuecomment-564659775).

gchanan -- This should go in the upcoming release following https://github.com/pytorch/pytorch/issues/26423.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31125

Differential Revision: D19143740

Pulled By: vincentqb

fbshipit-source-id: 35b55fe6c5b39ca5a68b1a6e19f14eb95b9a784e
2019-12-23 08:24:22 -08:00
fe76af96ed fix test_process_group_debug_info flaky test (#31533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31533

Fixes this test that was flaky and has been disabled (see
https://github.com/pytorch/pytorch/issues/31112)
ghstack-source-id: 96038999

Test Plan: Run the test 1000 times and ensure that it passes.

Differential Revision: D19203366

fbshipit-source-id: 7978cbb8ca0989a0a370a36349cdd4db3bb8345b
2019-12-22 18:01:21 -08:00
cc2d5ca37f add enabled API to autograd profiler (#31380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31380

To be able to profile async RPCs, we attach a `RecordFunction` object to the future that is created during the RPC so it persists across the lifetime of the RPC (this is implemented in the next PR). Since we'd only like to do this when profiling is enabled, this PR adds an enabled API to the autograd profiler.
ghstack-source-id: 96053933

Test Plan: Modified unit test.

Differential Revision: D19050391

fbshipit-source-id: aa382110e69d06b4a84c83b31d2bec2d8a81ba10
2019-12-22 16:24:59 -08:00
7d630278da Separate torchbind from Python (#30242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30242

Pull Request resolved: https://github.com/pytorch/pytorch/pull/29501

Currently blocked on schema serialization issue

Test Plan: Imported from OSS

Differential Revision: D18463063

Pulled By: jamesr66a

fbshipit-source-id: c12a1b644eb9bf04e68ff93cccf91d6cb3e75359
2019-12-21 22:52:40 -08:00
700109eb63 set stream everytime when we get a cuDNN handle (#31541)
Summary:
cudnn version of https://github.com/pytorch/pytorch/pull/31537

https://github.com/pytorch/pytorch/pull/31532 is a quick fix and this is a bigger change. This would deprecate https://github.com/pytorch/pytorch/pull/31532, but we could also merge https://github.com/pytorch/pytorch/pull/31532 first for a quick fix and then work on this later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31541

Differential Revision: D19206753

Pulled By: ngimel

fbshipit-source-id: 3352f923d13a9baf0971f64f8b7ce03e9a8b42b1
2019-12-20 21:34:40 -08:00
b5bbec7bad set stream everytime when we get a cuSparse handle (#31538)
Summary:
cuSparse version of https://github.com/pytorch/pytorch/pull/31537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31538

Differential Revision: D19206895

Pulled By: ngimel

fbshipit-source-id: a32c0bc310189a89a0098837438d62458b5c0a7c
2019-12-20 21:31:17 -08:00
8d8e82883e set stream everytime when we get a cuBlas handle (#31537)
Summary:
I don't see any reason for not doing so, because it is a common error that people forget to set the stream. And I don't think there is a reason for not running on the current stream.

This is just for cuBLAS; cuSPARSE and cuDNN should be modified as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31537

Differential Revision: D19206908

Pulled By: ngimel

fbshipit-source-id: ba2b2b74e9847f0495c76dbc778751a9f23f8b36
2019-12-20 21:31:13 -08:00
0b0f90f53c Split on batch dimension when 32bit indexing not enough for convolution forward (#31379)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/22496

This is just a first step towards supporting 64-bit convolution on CUDA. In the convolution forward, if the total tensor size is larger than 2^31, we split it on the batch dimension. I want to get some review feedback before applying the same splitting approach to backward.

There are real-world use cases where, even with N=1, the input is still larger than 2^31. For this case, splitting would be complicated, so I am planning to modify `use_cudnn` to just dispatch to the slow fallback kernel in PyTorch in a later PR.

Update: `later PR` is https://github.com/pytorch/pytorch/pull/31383
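At a high level the approach looks like this sketch (a hypothetical Python-level illustration; the real change lives in the cuDNN convolution bindings):

```python
import torch

def conv_forward_split(conv, x, max_elems=2**31 - 1):
    # If the input is small enough for 32-bit indexing, run it directly.
    if x.numel() <= max_elems:
        return conv(x)
    # Otherwise split on the batch dimension and concatenate the outputs.
    # When the batch size is 1 this cannot help; that case is dispatched
    # to a fallback kernel instead (see the later PR mentioned above).
    n_chunks = -(-x.numel() // max_elems)  # ceiling division
    parts = x.chunk(max(n_chunks, 2), dim=0)
    return torch.cat([conv(p) for p in parts], dim=0)
```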
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31379

Differential Revision: D19192018

Pulled By: ngimel

fbshipit-source-id: c26ecc56319ac67c4d5302ffed246b8d9b5eb972
2019-12-20 21:27:06 -08:00
3820d6f6b9 make gc script python2 compatible (#31536)
Summary:
get rid of f-string, somehow we still have python2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31536

Differential Revision: D19204187

Pulled By: mingbowan

fbshipit-source-id: da8e17e4dccdd6fd1b0e92eb4740f5a09a8a4209
2019-12-20 16:34:33 -08:00
c808eed04a Nightly dimension, input shape in gradle (#30195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30195

1. Added flavorDimensions 'build' local/nightly
to be able to test the latest nightlies

```
cls && gradle clean test_app:installMobNet2QuantNightlyDebug -PABI_FILTERS=x86 --refresh-dependencies && adb shell am start -n org.pytorch.testapp.mobNet2Quant/org.pytorch.testapp.MainActivity
```

2. To be able to change the whole new-model setup by editing only `test_app/build.gradle`:
   - Inlined model asset file names into `build.gradle`
   - Extracted the input tensor shape into `build.gradle` (BuildConfig)

Test Plan: Imported from OSS

Differential Revision: D18893394

Pulled By: IvanKobzarev

fbshipit-source-id: 1fae9989d6f4b02afb42f8e26d0f3261d7ca929b
2019-12-20 16:08:04 -08:00
3a19980b78 Tensor class created from java does not call native methods
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31520

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D19199477

Pulled By: IvanKobzarev

fbshipit-source-id: ba51454586a9385dba4ab73936f907346e0105d1
2019-12-20 14:40:54 -08:00
11854bcd38 Add test to torch.jit.export_opnames, make the _C function private
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31446

Test Plan: Imported from OSS

Differential Revision: D19172851

Pulled By: iseeyuan

fbshipit-source-id: f06d8766ed73c9abe4ebf41c402ee64880d745be
2019-12-20 13:38:43 -08:00
81329c907d Updating submodules
Summary:
GitHub commits:

cbce6d17bb
4762e080cf
174107c0a4
8dee0e0058
ce52b27b4d
f89dea4fec
b269fc595c
5b014c641e
ae2d7e11a2

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 252ea5198c3fe4ecfe24e878ea701c48c57618de
2019-12-20 13:35:02 -08:00
35b249769d Exclude lite interpreter Java files from OSS host build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31204

Test Plan: Imported from OSS

Differential Revision: D19200610

Pulled By: dreiss

fbshipit-source-id: 0cf41c99b4c2604afc2dccfebbea213c0e1f9638
2019-12-20 13:32:27 -08:00
08de70cad1 Remove observers in the end (#31407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31407

Remove observers at the end instead of before quantizing tensors,
since we still need them to find the quantization parameters for each module instance.

Test Plan:
.

Imported from OSS

Differential Revision: D19162367

fbshipit-source-id: f817af87183f6c42dc97becea85ddeb7e050e2b1
2019-12-20 13:17:26 -08:00
b4c48b7e29 Call getQSchemeAndQParamMap later in quantizeTensors (#31406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31406

Previously we recorded quantization parameters for a given value when we collected the observer nodes,
but the quantization parameters can actually vary per module instance. To support that,
we need to delay the call to a later stage and only record the `Value*` that's needed
in the `collectObserverNodesAndValueToQuantize` function.

Test Plan:
.

Imported from OSS

Differential Revision: D19162369

fbshipit-source-id: e0f97e322d18a281bf15b6c7bbb04c3dfacb512f
2019-12-20 13:17:21 -08:00
df9d5b8a77 Use macros instead of directly accessing Python object fields (#31388)
Summary:
The Python C API documentation states "Access to the [PyObject]
members must be done by using the macros Py_REFCNT and Py_TYPE."
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31388

Differential Revision: D19161790

Pulled By: colesbury

fbshipit-source-id: ac9a3738c913ad290a6d3460d0d657ec5c13b711
2019-12-20 12:11:17 -08:00
5375ceae80 run optimizations on pre-profiled graph (#31392)
Summary:
This is the first stab at running profile-insensitive optimizations on pre-profiled graphs. Running those optimizations has the potential to simplify graphs greatly before GuardElimination, so GuardElimination should be able to remove more guards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31392

Differential Revision: D19173639

Pulled By: Krovatkin

fbshipit-source-id: 2485a2a598c10f9b5445efb30b16439ad4551b3f
2019-12-20 10:49:08 -08:00
256db1e61b Add fake parsing for torchbind classes in schema type parser
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31506

Test Plan: Imported from OSS

Differential Revision: D19187722

Pulled By: jamesr66a

fbshipit-source-id: 4529409454d64393a821b8fa795db39bc82da8fc
2019-12-20 10:28:57 -08:00
7a12ccd003 optimize FloatToFused8BitRowwiseQuantized and Fused8BitRowwiseQuantizedToFloat (#31470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31470

Optimize performance of these two operators.
Additionally use nearbyint instead of round to be consistent with 4-bit embedding table quantization.

Reviewed By: hyuen

Differential Revision: D19072103

fbshipit-source-id: efe96f14aeff7958cceb453ed625d3fd693891ff
2019-12-20 10:09:26 -08:00
0b57b383b1 Im2col export (#30972)
Summary:
Added im2col to opset 11.
This symbolic is used to export torch.nn.Unfold
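A minimal export sketch (shapes and file name are illustrative):

```python
import torch

unfold = torch.nn.Unfold(kernel_size=(3, 3))
x = torch.randn(1, 2, 8, 8)

# im2col is only available from opset 11 onward, so request it explicitly.
torch.onnx.export(unfold, x, "unfold.onnx", opset_version=11)
```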
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30972

Reviewed By: hl475

Differential Revision: D18946921

Pulled By: houseroad

fbshipit-source-id: 13dd0cbae899700df32fd74d6dff1f29033a2b4c
2019-12-20 09:45:45 -08:00
6cd987e7c0 Make fully_qualified_type_name_impl() compatible with VS2017 15.9 (#31455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31455

In 15.9, __FUNCSIG__ unwraps `using` definitions as well as preserving noexcept qualifiers.

Test Plan: Build caffe2 on Windows using VS2017

Differential Revision: D19166204

fbshipit-source-id: b6c5f70e5262d13adf585f77b92223cf5f1e78dd
2019-12-20 09:17:44 -08:00
2099cfa13d Fix input_channels divisibility check in concat_split_op (#31448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31448

Replace `(!x%y)` with `(x%y != 0)`

Test Plan: CI

Reviewed By: orionr

Differential Revision: D19165492

fbshipit-source-id: 246635fb8ddd5823196bcef9d0e6cdf1c349015e
2019-12-20 09:12:54 -08:00
b38901aa15 Test reading __cuda_array_interface__ inferred strides. (#31451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31451

The PR that fixed this, https://github.com/pytorch/pytorch/pull/24947, didn't add a test.

Fixes: https://github.com/pytorch/pytorch/issues/31443

Test Plan: Imported from OSS

Differential Revision: D19170020

Pulled By: gchanan

fbshipit-source-id: bdbf09989ac8a61b1b70bb1ddee103caa8ef435b
2019-12-20 08:21:39 -08:00
d0d6e0b5e3 add type promotion support for sparse tensors (#30429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30429

also fix a bug in uncoalesced division

General approach here is that we:
* compute the common dtype based on input tensors
* error if the output tensor is specified and the common type can't be cast back to the output type (e.g. for inplace ops)
* convert input tensor (values) to the common dtype
* perform the op as normal (computing at the common dtype instead of the result type).
* convert/copy the result values back to that of the result tensor (for in-place ops).

For uncoalesced division we need to coalesce, because an integral tensor with values=[1,1] at the same index divided by 2 would give 1/2 + 1/2 = 0 instead of 2/2 = 1.
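A tiny illustration of the pitfall (sketch):

```python
import torch

i = torch.tensor([[0, 0]])               # the same index stored twice
v = torch.tensor([1, 1])
s = torch.sparse_coo_tensor(i, v, (1,))  # uncoalesced; logical value is 2

# Dividing each stored value would compute 1/2 + 1/2 = 0 in integer math;
# coalescing first gives the correct 2/2 = 1.
print(s.coalesce().to_dense() // 2)      # tensor([1])
```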

Test Plan: Imported from OSS

Differential Revision: D19143223

Pulled By: nairbv

fbshipit-source-id: 480fa334c0b2b3df046818f2342cfd4e2d9d892a
2019-12-20 08:01:00 -08:00
e9ef087d2d Updating submodules
Summary:
GitHub commits:

357842e091
d62f47c763
dc94cd4972

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: dcb9813e1469cc867d9c826daa873c535ef408ab
2019-12-20 00:57:39 -08:00
4c341582ea modify model to enable loading by blob (#31507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31507

This script is used to generate a model with bound shape inference and
blob reorder, which are requirements for big model loading on T17.
1. Load existing model.
2. Do bound shape inference and blob reorder (put embedding blobs at the end).
3. Save the modified model.

Test Plan:
Generated a new model and tested on NNPI.
P124181047 (mismatch is AA variance)

Reviewed By: ipiszy

Differential Revision: D19165467

fbshipit-source-id: c3522fc5dc53b7ec652420558e9e8bf65a1ccfae
2019-12-19 21:57:22 -08:00
06dbef663d Add support for del (#31273)
Summary:
Adds the `del` keyword to the parser and corresponding `aten::Delete` op for lists and dicts

Fixes #20615
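A short sketch of the new syntax in TorchScript (illustrative):

```python
from typing import Dict

import torch

@torch.jit.script
def remove_key(d: Dict[str, int], key: str) -> Dict[str, int]:
    del d[key]  # lowered to the new aten::Delete op
    return d

print(remove_key({"a": 1, "b": 2}, "a"))  # {'b': 2}
```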
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31273

Pulled By: driazati

Differential Revision: D19181473

fbshipit-source-id: c42a2d43ec361a98e0c425232981edc9c39388c4
2019-12-19 21:48:11 -08:00
624088e444 Don't dispatch to cudnn if it is not possible to make it 32bit by splitting batch dim (#31383)
Summary:
Also a step towards supporting 64bit indexing in convolution.

See also: https://github.com/pytorch/pytorch/pull/31379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31383

Differential Revision: D19183443

Pulled By: ngimel

fbshipit-source-id: 0c2030fac147e629d7be0c29f0683ec2b3f28c71
2019-12-19 18:00:03 -08:00
87768e5ade Updating submodules
Summary:
GitHub commits:

286867987e
09cbf47ea5
db100834c1
1ba92b8582
60240e3f08
beb5c4798e
c37eb5d377
1ada29037c
f12539bbc9

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 75b16ea1bc038b599b3540d0615dd9eb9ecfda74
2019-12-19 17:30:48 -08:00
457286a383 fix missing type check in dictionary literal
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31375

Test Plan: Imported from OSS

Differential Revision: D19145440

Pulled By: zdevito

fbshipit-source-id: 69909089586149ef766b4858d3420864a81b2493
2019-12-19 16:22:36 -08:00
348d42114e Kill MessageType::SHUTDOWN related logic in pg agent (#31270)
Summary:
https://github.com/pytorch/pytorch/pull/30330 got rid of the need to send a `MessageType::SHUTDOWN` message, so we can now remove the logic/utils for this type of message.

I think we can also delete the enum entry in the `enum MessageType`, but we may want to keep it in case the logic in https://github.com/pytorch/pytorch/pull/30710 is ever moved to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31270

Test Plan: All existing unit tests pass

Differential Revision: D19146983

Pulled By: rohan-varma

fbshipit-source-id: 35b185411f9446d7d4dfc37a6cb5477cf041e647
2019-12-19 13:47:43 -08:00
57caeb3fc1 Fix builtins table (#31492)
Summary:
Fixes a bad merge that is breaking distributed tests on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31492

Pulled By: driazati

Differential Revision: D19180978

fbshipit-source-id: f69f525e2c7f61194686f07cf75db00eb642882f
2019-12-19 13:33:15 -08:00
226c2d79ce Get QScheme from observer module (#31293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31293

Previously we checked the number of elements in scale to determine whether we are using per-channel quantization,
but we should get the qscheme information from the observer module directly; we'll expose this information
to the caller as well.

Test Plan:
.

Imported from OSS

Differential Revision: D19146669

fbshipit-source-id: ea430eeae0ef8f441be39aa6dcc1bb530b065554
2019-12-19 13:33:11 -08:00
dbe2f265d0 Better error msg for autograd profiler + multi-worker dataloader crash (#31473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31473

Mitigates #6313

A common use case for the autograd profiler is to use it to run over an
entire model, including dataloading. The following will crash:
- run autograd profiler in CUDA mode
- Use a multi-worker DataLoader (presumably with the 'fork' spawn
method)
- because the autograd profiler initializes CUDA and forking after CUDA is
initialized is bad.

This PR puts in a nice error message when this happens so that users
aren't too confused. The new error message looks like:
https://gist.github.com/zou3519/903f15c3e86bad4585b7e5ce14cc1b70

Test Plan:
- Tested locally.
- I didn't add a test case for this because it's hard to write a test
case that doesn't completely stop the rest of our test suite from
running.

Differential Revision: D19178080

Pulled By: zou3519

fbshipit-source-id: c632525ba1f7b168324f1aa55416e5250f56a086
2019-12-19 13:30:19 -08:00
e67064a96f Exclude generated source docs from Google (#31484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31484

See https://github.com/pytorch/pytorch/issues/26123 for context.

Previously, when someone googles for `pytorch "adaptive_max_pool2d"`,
https://pytorch.org/docs/stable/_modules/torch/nn/modules/pooling.html
is the first result. This PR changes the docs build script to exclude
all such generated source docs under `_modules/` from Google.

It does this by doing a search for `<head>` and then appending
`<meta name="robots" content="noindex">`.
The [google developer
docs](https://support.google.com/webmasters/answer/93710?hl=en) suggest
that this is the right way to prevent google from indexing the page.

In the future, when the CI
builds documentation (both master and stable docs), the newly created
docs under _modules will have the meta noindex tag.

Test Plan:
- I ran `find "$install_path/_modules" -name "*.html" -print0 | xargs -0
sed -i '/<head>/a \ \ <meta name="robots" content="noindex">'` on a docs
build locally and checked that it does indeed append the meta noindex
tag after `<head>`.
- In a few days we should rerun the search to see if these pages are
still being indexed.

Differential Revision: D19180300

Pulled By: zou3519

fbshipit-source-id: 5f5aa95a85dd9f065607c2a16f4cdd24ed699a83
2019-12-19 13:27:12 -08:00
8f3c0d541e Speed up Tensor::has_names for unnamed tensors (#31436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31436

Tensor::has_names is slower than it should be for unnamed tensors
because of the following:
- it always tries to access the TLS for NamesMode. Unnamed tensors don't
need to peek at NamesMode to determine if they have names or not.
- There is some virtual function being called because TensorImpl is in
c10 and NamedTensorMeta is in libtorch.

This PR short-circuits Tensor::has_names for unnamed tensors by
checking if the underlying TensorImpl hold a pointer to NamedTensorMeta
or not. If the NamedTensorMeta is nullptr; then the tensor is definitely
unnamed.

Benchmarks:
- I have a dedicated benchmarking machine where I isolate a single CPU
and make sure it runs at a fixed frequency.
- I benchmarked torch.add, which calls `tensor::has_names` three times.
- The TL;DR is that torch.add between size-1 unnamed tensors gets sped up
~200ns after this change which is a 9% improvement.
- Before, on my machine:
https://gist.github.com/zou3519/dfd648a1941d584711d850754e0694bc
- After on my machine:
https://gist.github.com/zou3519/e78f0d8980b43d0d9c3e3e78ecd0d4d5

Test Plan: - run tests

Differential Revision: D19166510

Pulled By: zou3519

fbshipit-source-id: 1888a4e92d29152a5e3b778a95e531087e532f53
2019-12-19 13:19:30 -08:00
9d9bc93bfb Added error message to indicate that reduction operations are not supported for dim>=64 (#31476)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/23159
Currently we don't support reduction operations for dim >= 64, so we should give a descriptive RuntimeError indicating this.
Diff: D19179039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31476

Differential Revision: D19179039

Pulled By: anjali411

fbshipit-source-id: 58568f64627bf3df6b3e00a1498544c030e74a0e
2019-12-19 13:00:53 -08:00
779b128872 add back in reference to jit_unsupported section (#31486)
Summary:
It was added in https://github.com/pytorch/pytorch/pull/31329 and removed in a bad merge in https://github.com/pytorch/pytorch/pull/31138/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31486

Differential Revision: D19181967

Pulled By: eellison

fbshipit-source-id: 7e4b4a9b2042c30ec18f7f737bc4a9a56fac7d92
2019-12-19 12:44:16 -08:00
49fe7a7401 Updated documentation for NLLLoss to explain what x, y and w refer to (#31488)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/31385

In the current documentation for NLLLoss, it's unclear what `y` refers to in the math section of the loss description. There was an issue (https://github.com/pytorch/pytorch/issues/31295) filed earlier where there was confusion about whether the loss returned for reduction='mean' is correct, perhaps because of the unclear symbol descriptions in the current documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31488

Differential Revision: D19181391

Pulled By: anjali411

fbshipit-source-id: 8b75f97aef93c92c26ecbce55b3faf2cd01d3e74
2019-12-19 12:28:16 -08:00
d6acc87c93 Guard against copying from quantized Tensor to non-quantized Tensor (#29660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29660

att

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D18799897

fbshipit-source-id: 5d1b4ef84f5ae8eba830784b74485d78fa1e6fcf
2019-12-19 12:16:44 -08:00
c4121ed8db Fix is_fundamental template for MSVC (#30959)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30932
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30959

Differential Revision: D18891797

Pulled By: mingbowan

fbshipit-source-id: e6c36ee80065e66117873e768f86f507c48aaef1
2019-12-19 12:10:22 -08:00
6d6a91fb0f Updating submodules
Summary:
GitHub commits:

58a1ec274c
24da1c8b66
77d5ba7887
c7b80d7ab5

Test Plan: n/a

Reviewed By: tgreenidge

fbshipit-source-id: be872df9014b795b279b93bd81efbaa41f2d0fd7
2019-12-19 12:05:29 -08:00
28376e826d Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31463

Pulled By: driazati

Differential Revision: D19173580

fbshipit-source-id: 6e5bb24949ec357c4d5b29a16d1733b664f21e05
2019-12-19 10:17:01 -08:00
540b9da41e Bump numba version in circleCI config to 0.46.0. (#31435)
Summary:
The current numba version doesn't appear to actually work with our numba-cuda tests (numba.cuda.is_available() fails).

Previous attempts to upgrade were blocked by https://github.com/numba/numba/issues/4368.

It's a bit unclear to me, but I believe 0.46.0 fixes the above issue. I'm verifying that we catch that issue in CI via https://github.com/pytorch/pytorch/pull/31434.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31435

Differential Revision: D19166865

Pulled By: gchanan

fbshipit-source-id: e01fa48c577e35de178423db7a7f79ac3dd3894d
2019-12-19 07:55:55 -08:00
fc3103b116 fixing a naming issue in creating a residual loop node in a bailout graph (#31400)
Summary:
This addresses the issue of differentiating between the `%4` in
`%12 : int, %y.1 : Tensor = prim::Loop(%9, %6, %4, %3)` and the `%4` in `%y.5 : Double(3) = aten::cat(%22, %4) # test_jit.py:3772:24` inside the loop's body of a residual continuation loop, because these should be different values.

```
[DUMP profiling_graph_executor_impl.cpp:124] with prim::BailoutTemplate_0 = graph(%z.1 : int,
[DUMP profiling_graph_executor_impl.cpp:124]       %size.1 : int):
[DUMP profiling_graph_executor_impl.cpp:124]   %2 : Tensor = prim::Constant[value= 1  1 [ CPUDoubleType{2} ]]()
[DUMP profiling_graph_executor_impl.cpp:124]   %3 : Double(2) = prim::BailOut[index=0](%2, %z.1, %size.1)
[DUMP profiling_graph_executor_impl.cpp:124]   %4 : int = prim::Constant[value=0]() # test_jit.py:3772:54
[DUMP profiling_graph_executor_impl.cpp:124]   %5 : None = prim::Constant()
[DUMP profiling_graph_executor_impl.cpp:124]   %6 : bool = prim::Constant[value=1]() # test_jit.py:3770:16
[DUMP profiling_graph_executor_impl.cpp:124]   %counters.1 : int[] = prim::ListConstruct()
[DUMP profiling_graph_executor_impl.cpp:124]   %8 : int = prim::Constant[value=8]()
[DUMP profiling_graph_executor_impl.cpp:124]   %9 : int = aten::__round_to_zero_floordiv(%size.1, %8)
[DUMP profiling_graph_executor_impl.cpp:124]   %10 : int = aten::mul(%9, %8)
[DUMP profiling_graph_executor_impl.cpp:124]   %11 : int = aten::sub(%size.1, %10)
[DUMP profiling_graph_executor_impl.cpp:124]   %12 : int, %y.1 : Tensor = prim::Loop(%9, %6, %4, %3) # test_jit.py:3770:16
[DUMP profiling_graph_executor_impl.cpp:124]     block0(%i.2 : int, %15 : int, %y.7 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:124]       %17 : Double(2) = prim::BailOut[index=1](%y.7, %z.1, %counters.1, %9, %11, %i.2, %15)
[DUMP profiling_graph_executor_impl.cpp:124]       %18 : int[] = aten::append(%counters.1, %15) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %19 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %20 : Tensor = aten::ones(%19, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %21 : Double(1) = prim::BailOut[index=2](%20, %z.1, %counters.1, %9, %11, %i.2, %15, %17)
[DUMP profiling_graph_executor_impl.cpp:124]       %22 : Tensor[] = prim::ListConstruct(%17, %21)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.5 : Double(3) = aten::cat(%22, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %24 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:124]       %25 : int = aten::add(%15, %24)
[DUMP profiling_graph_executor_impl.cpp:124]       %26 : int[] = aten::append(%counters.1, %25) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %27 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %28 : Tensor = aten::ones(%27, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %29 : Double(1) = prim::BailOut[index=3](%28, %z.1, %counters.1, %9, %11, %i.2, %y.5, %25)
[DUMP profiling_graph_executor_impl.cpp:124]       %30 : Tensor[] = prim::ListConstruct(%y.5, %29)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.9 : Double(4) = aten::cat(%30, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %32 : int = aten::add(%25, %24)
[DUMP profiling_graph_executor_impl.cpp:124]       %33 : int[] = aten::append(%counters.1, %32) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %34 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %35 : Tensor = aten::ones(%34, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %36 : Double(1) = prim::BailOut[index=4](%35, %z.1, %counters.1, %9, %11, %i.2, %y.9, %32)
[DUMP profiling_graph_executor_impl.cpp:124]       %37 : Tensor[] = prim::ListConstruct(%y.9, %36)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.10 : Double(5) = aten::cat(%37, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %39 : int = aten::add(%32, %24)
[DUMP profiling_graph_executor_impl.cpp:124]       %40 : int[] = aten::append(%counters.1, %39) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %41 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %42 : Tensor = aten::ones(%41, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %43 : Double(1) = prim::BailOut[index=5](%42, %z.1, %counters.1, %9, %11, %i.2, %y.10, %39)
[DUMP profiling_graph_executor_impl.cpp:124]       %44 : Tensor[] = prim::ListConstruct(%y.10, %43)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.11 : Double(6) = aten::cat(%44, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %46 : int = aten::add(%39, %24)
[DUMP profiling_graph_executor_impl.cpp:124]       %47 : int[] = aten::append(%counters.1, %46) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %48 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %49 : Tensor = aten::ones(%48, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %50 : Double(1) = prim::BailOut[index=6](%49, %z.1, %counters.1, %9, %11, %i.2, %y.11, %46)
[DUMP profiling_graph_executor_impl.cpp:124]       %51 : Tensor[] = prim::ListConstruct(%y.11, %50)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.12 : Double(7) = aten::cat(%51, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %53 : int = aten::add(%46, %24)
[DUMP profiling_graph_executor_impl.cpp:124]       %54 : int[] = aten::append(%counters.1, %53) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %55 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %56 : Tensor = aten::ones(%55, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %57 : Double(1) = prim::BailOut[index=7](%56, %z.1, %counters.1, %9, %11, %i.2, %y.12, %53)
[DUMP profiling_graph_executor_impl.cpp:124]       %58 : Tensor[] = prim::ListConstruct(%y.12, %57)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.13 : Double(8) = aten::cat(%58, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %60 : int = aten::add(%53, %24)
[DUMP profiling_graph_executor_impl.cpp:124]       %61 : int[] = aten::append(%counters.1, %60) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %62 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %63 : Tensor = aten::ones(%62, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %64 : Double(1) = prim::BailOut[index=8](%63, %z.1, %counters.1, %9, %11, %i.2, %y.13, %60)
[DUMP profiling_graph_executor_impl.cpp:124]       %65 : Tensor[] = prim::ListConstruct(%y.13, %64)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.14 : Double(9) = aten::cat(%65, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %67 : int = aten::add(%60, %24)
[DUMP profiling_graph_executor_impl.cpp:124]       %68 : int[] = aten::append(%counters.1, %67) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %69 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %70 : Tensor = aten::ones(%69, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %71 : Double(1) = prim::BailOut[index=9](%70, %z.1, %counters.1, %9, %11, %i.2, %y.14, %67)
[DUMP profiling_graph_executor_impl.cpp:124]       %72 : Tensor[] = prim::ListConstruct(%y.14, %71)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.15 : Tensor = aten::cat(%72, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %74 : int = aten::add(%67, %24)
[DUMP profiling_graph_executor_impl.cpp:124]       -> (%6, %74, %y.15)
[DUMP profiling_graph_executor_impl.cpp:124]   %75 : Double(10) = prim::BailOut[index=10](%y.1, %z.1, %counters.1, %11, %12)
[DUMP profiling_graph_executor_impl.cpp:124]   %76 : int, %y : Tensor = prim::Loop(%11, %6, %12, %75) # test_jit.py:3770:16
[DUMP profiling_graph_executor_impl.cpp:124]     block0(%i.1 : int, %79 : int, %y.6 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:124]       %81 : Double(*) = prim::BailOut[index=11](%y.6, %z.1, %counters.1, %11, %i.1, %79)
[DUMP profiling_graph_executor_impl.cpp:124]       %82 : int[] = aten::append(%counters.1, %79) # test_jit.py:3771:20
[DUMP profiling_graph_executor_impl.cpp:124]       %83 : int[] = prim::ListConstruct(%z.1)
[DUMP profiling_graph_executor_impl.cpp:124]       %84 : Tensor = aten::ones(%83, %5, %5, %5, %5) # test_jit.py:3772:38
[DUMP profiling_graph_executor_impl.cpp:124]       %85 : Double(1) = prim::BailOut[index=12](%84, %counters.1, %11, %i.1, %79, %81)
[DUMP profiling_graph_executor_impl.cpp:124]       %86 : Tensor[] = prim::ListConstruct(%81, %85)
[DUMP profiling_graph_executor_impl.cpp:124]       %y.4 : Tensor = aten::cat(%86, %4) # test_jit.py:3772:24
[DUMP profiling_graph_executor_impl.cpp:124]       %88 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:124]       %89 : int = aten::add(%79, %88)
[DUMP profiling_graph_executor_impl.cpp:124]       -> (%6, %89, %y.4)
[DUMP profiling_graph_executor_impl.cpp:124]   %90 : Double(12) = prim::BailOut[index=13](%y, %counters.1)
[DUMP profiling_graph_executor_impl.cpp:124]   %91 : (Tensor, int[]) = prim::TupleConstruct(%90, %counters.1)
[DUMP profiling_graph_executor_impl.cpp:124]   return (%91)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31400

Differential Revision: D19172750

Pulled By: Krovatkin

fbshipit-source-id: 85d3aac4e80b65b83b6be3c0bca8075a731a2b7e
2019-12-19 00:34:50 -08:00
1e116a5089 Revert D19054937: Add support for del
Test Plan: revert-hammer

Differential Revision:
D19054937

Original commit changeset: c535ea16a9e6

fbshipit-source-id: e57d31811441947b7ee38c8c2b16eecde5005792
2019-12-18 22:39:41 -08:00
489dd6cb90 Add TORCH_DCHECK macro that checks only in debug builds (#31240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31240

Follow up on discoveries/discussions in https://github.com/pytorch/pytorch/pull/30810

Mimic the `DCHECK` macro from https://github.com/pytorch/pytorch/blob/e5eb871/c10/util/logging_is_not_google_glog.h#L117-L125

With this change the perf gap is eliminated:

```
================================================================================
Program Output:
================================================================================
Run on (36 X 1601 MHz CPU s)
2019-12-12 20:12:13
-----------------------------------------------------------------
Benchmark                          Time           CPU Iterations
-----------------------------------------------------------------
BM_IntrusivePtrCtorDtor           23 ns         23 ns   30914703
BM_SharedPtrCtorDtor              27 ns         27 ns   25895944
BM_IntrusivePtrArray/16          503 ns        503 ns    1392139
BM_IntrusivePtrArray/32         1006 ns       1006 ns     695749
BM_IntrusivePtrArray/64         2013 ns       2013 ns     347714
BM_IntrusivePtrArray/128        4024 ns       4024 ns     173964
BM_IntrusivePtrArray/256        8047 ns       8047 ns      86994
BM_IntrusivePtrArray/512       16106 ns      16106 ns      43461
BM_IntrusivePtrArray/1024      32208 ns      32207 ns      21731
BM_IntrusivePtrArray/2048      64431 ns      64430 ns      10865
BM_IntrusivePtrArray/4096     128940 ns     128938 ns       5429
BM_SharedPtrArray/16             503 ns        503 ns    1392128
BM_SharedPtrArray/32            1006 ns       1006 ns     695940
BM_SharedPtrArray/64            2012 ns       2012 ns     347817
BM_SharedPtrArray/128           4024 ns       4023 ns     173927
BM_SharedPtrArray/256           8069 ns       8069 ns      86741
BM_SharedPtrArray/512          16143 ns      16142 ns      43357
BM_SharedPtrArray/1024         32283 ns      32283 ns      21685
BM_SharedPtrArray/2048         64718 ns      64717 ns      10817
BM_SharedPtrArray/4096        129469 ns     129466 ns       5407
================================================================================
```
```
================================================================================
Program Output:
================================================================================
Run on (80 X 2001 MHz CPU s)
2019-12-12 20:12:23
-----------------------------------------------------------------
Benchmark                          Time           CPU Iterations
-----------------------------------------------------------------
BM_IntrusivePtrCtorDtor           18 ns         18 ns   38630411
BM_SharedPtrCtorDtor              22 ns         22 ns   32356114
BM_IntrusivePtrArray/16          402 ns        402 ns    1739637
BM_IntrusivePtrArray/32          805 ns        805 ns     869818
BM_IntrusivePtrArray/64         1610 ns       1609 ns     434881
BM_IntrusivePtrArray/128        3218 ns       3218 ns     217437
BM_IntrusivePtrArray/256        6436 ns       6436 ns     108739
BM_IntrusivePtrArray/512       12882 ns      12882 ns      54356
BM_IntrusivePtrArray/1024      25763 ns      25763 ns      27177
BM_IntrusivePtrArray/2048      51532 ns      51531 ns      13590
BM_IntrusivePtrArray/4096     103091 ns     103091 ns       6778
BM_SharedPtrArray/16             402 ns        402 ns    1740165
BM_SharedPtrArray/32             804 ns        804 ns     869035
BM_SharedPtrArray/64            1610 ns       1610 ns     434975
BM_SharedPtrArray/128           3218 ns       3218 ns     217505
BM_SharedPtrArray/256           6457 ns       6457 ns     108510
BM_SharedPtrArray/512          12909 ns      12909 ns      54249
BM_SharedPtrArray/1024         25810 ns      25810 ns      27127
BM_SharedPtrArray/2048         51763 ns      51763 ns      13531
BM_SharedPtrArray/4096        103506 ns     103505 ns       6759
================================================================================
```

Test Plan:
buck test caffe2/c10/...
buck test mode/opt caffe2/c10/...

Differential Revision: D18998243

fbshipit-source-id: ddf0a118a80efe032b52d403867c1f416c721590
2019-12-18 21:55:58 -08:00
fb24f7c4ad catch all exceptions in converting default values to ivalues (#31398)
Summary:
Previously we would only catch `py::cast_error` which led to incomprehensible error messages like: `TypeError: 'NoneType' object is not iterable`. We are running arbitrary pybind code here, and not doing anything with the error message, so we should be less restrictive with the types of errors we catch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31398

Differential Revision: D19166655

Pulled By: eellison

fbshipit-source-id: 84db8b3714c718b475913f2f4bb6f19e62f2d9ec
2019-12-18 20:27:46 -08:00
1bb6c51421 Fix getAttribute (#31011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31011

`getAttribute` is supposed to throw when the attribute is not
found rather than return a `nullptr`.

Test Plan:
.

Imported from OSS

Differential Revision: D18898417

fbshipit-source-id: 0fe7d824b978ad19bb5ef094d3aa560e9fc57f87
2019-12-18 19:27:39 -08:00
dff7b945bf Avoid sending large unneeded data over wire in process_group_agent. (#31357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31357

If a user selects a subset of a Tensor and sends it in an RPC, we were sending
the whole original Tensor Storage over the network.

While this sounds reasonable, in practice, we observed view-like Tensors being sent
over rpc, where only 1% of the data in the provided Tensor's Storage was
actually used/needed.

The simple solution here is to just force a clone in the serializer code if we see that
less than half the bits are used (an arbitrary threshold) and the tensor is more than a nominal few KB.
Add related tests to ensure this doesn't break.

An alternate approach would be to modify the Pickler. That said, since Pickler is shared by more
components, the logic might be harder to tailor appropriately at that layer (particularly
given that the Pickler has explicit logic to share a single Storage* among several Tensors
that commonly point to the same Storage*).

It's possible that we might want to further refine the basic thresholds in this change.
In practice, we've seen a mostly bimodal distribution thus far for the percent of Tensor
Storage referred by a Tensor in observed rpcs (i.e. either 90%+ or sub-10% of the Storage
referenced), hence the existing 50% threshold here is probably not an unreasonable
starting point.
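A minimal Python sketch of that heuristic (the names and thresholds below are illustrative assumptions; the real logic lives in the C++ serializer):

```python
# illustrative sketch only, not the actual serializer code
MIN_STORAGE_BYTES = 4 * 1024   # assumed "nominal few KB" floor
USED_FRACTION_CUTOFF = 0.5     # clone when under half the storage is referenced

def maybe_clone_for_wire(t):
    used_bytes = t.numel() * t.element_size()
    storage_bytes = t.storage().size() * t.element_size()
    if storage_bytes > MIN_STORAGE_BYTES and used_bytes < USED_FRACTION_CUTOFF * storage_bytes:
        return t.clone()  # ship only the bytes the view actually needs
    return t
```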
ghstack-source-id: 95925474

Test Plan: buck test mode/dev caffe2/test/cpp/rpc/...

Differential Revision: D19137056

fbshipit-source-id: e2b3a4dd0cc6e1de820fd0740aa1d59883dbf8d4
2019-12-18 19:24:24 -08:00
1bb800cf5c Updating submodules
Summary:
GitHub commits:

f5d37bdcfd
21ba9e3692
576eeaee27
7ba1f57d53
e520f8f5b3
54f9092b0c
88bb770ce1
d91888de6c
ff06eb0881
fdaeb6ea30
1fd432f00f
60b7cb3408

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: f63bd0a879f4d08e159f530f595067f5a09ffe70
2019-12-18 18:41:23 -08:00
fe707c7849 Use default_observer and default_weight_observer in tests (#31424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31424

att

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D19162368

fbshipit-source-id: 33b95ba643eeeae942283bbc33f7ceda8d14c431
2019-12-18 18:35:07 -08:00
e1509cb468 Add support for del (#31273)
Summary:
Adds the `del` keyword to the parser and corresponding `aten::Delete` op for lists and dicts

Fixes #20615
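For illustration, the kind of code this enables in TorchScript (a hedged usage sketch, not taken from the PR's tests):

```python
from typing import Dict, List

import torch

@torch.jit.script
def drop(xs: List[int], d: Dict[str, int], key: str):
    del xs[0]   # delete a list element by index
    del d[key]  # delete a dict entry by key
    return xs, d
```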
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31273

Pulled By: driazati

Differential Revision: D19054937

fbshipit-source-id: c535ea16a9e62d176f8ad45947670fc3535af77c
2019-12-18 18:19:22 -08:00
e7d25a3e4d add a suggested alternative to _get_trace_graph
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31441

Test Plan: Imported from OSS

Differential Revision: D19165646

Pulled By: suo

fbshipit-source-id: 96a264bc55ceafd798d92b986d319cddbb0d9c69
2019-12-18 17:34:25 -08:00
d2e66b44cc Temporary fix to support building pytorch from fbsource (for xplat dependencies) (#31393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31393

pytorch build was set up with the include paths (-I) relative to fbcode/. This works well for fbcode builds, but doesn't work for the new fbcode_deps args for xplat build targets that work across xplat and fbcode. When these targets are built, the include paths need to be relative to fbsource, so the fbcode/ prefix needs to be added to those paths.

Longer term, to properly fix this, we need to use raw_headers with public_include_directories specified for all of these targets.

Test Plan: buck test mode/dev //papaya/integration/service/local/test:mnist_federated_system_test -- 'MnistFederatedSystemTest\.test' --run-disabled

Reviewed By: mzlee

Differential Revision: D19148465

fbshipit-source-id: a610e84bf4cad5838e54e94bae71b957c4b6d4b5
2019-12-18 17:30:57 -08:00
a3cdb7eca3 Fix default instantation of dynamic quantized LSTM
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31433

Test Plan: Imported from OSS

Differential Revision: D19164539

Pulled By: jamesr66a

fbshipit-source-id: 7045817ab3dfb530c4480a10523c4c6bcdbfc7eb
2019-12-18 16:59:00 -08:00
1e80ff7a67 autograd/profiler: make record_function more threadsafe (#31346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31346

This makes it so that if profiling is enabled/disabled from a different thread while a RecordFunction span is active via an op, it doesn't crash the process.

We currently see this when using torch.distributed.rpc to enable/disable profiling on other nodes while other things are running.
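For context, a minimal illustration of the kind of span involved (not the crashing repro itself):

```python
import torch

# previously, toggling profiling from another thread while a span like
# this was active could crash the process
with torch.autograd.profiler.profile() as prof:
    with torch.autograd.profiler.record_function("my_span"):
        torch.ones(2, 2).sum()
```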

Test Plan: buck test //caffe2/test:autograd -- test_record_function

Reviewed By: albanD

Differential Revision: D19133258

fbshipit-source-id: 30712b06c6aa051789948de2918dcfb9b78967ba
2019-12-18 16:27:42 -08:00
148bcd3ee5 Add support for builtins as attributes (#31269)
Summary:
Fixes #27495

This adds builtins as another piece of a concrete type. They're separate from normal functions since they represent the `BuiltinFunction` sugared value (which is a direct call to a builtin op). It also moves the builtins related logic from `jit/__init__.py` to `jit/_builtins.py` so it can be used from `jit/_recursive.py` to look up functions in the builtins table.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31269

Pulled By: driazati

Differential Revision: D19149779

fbshipit-source-id: d4e5e5d7d7d528b75a2f503e6004394251a4e82d
2019-12-18 15:24:45 -08:00
503a4e9019 Cleanup after moving language reference (#31146)
Summary:
Stacked PRs
 * **#31146 - [jit] Cleanup after moving language reference**
 * #31138 - [jit] Move TorchScript language reference to its own page

Preview: https://driazati.github.io/pytorch_doc_previews/jit.html#torchscript-language

Pull Request resolved: https://github.com/pytorch/pytorch/pull/31146

Pulled By: driazati

Differential Revision: D19167390

fbshipit-source-id: f28daed36754a553264fc8ac142ed22c3e26d63e
2019-12-18 15:09:35 -08:00
ae2487bf4d Move TorchScript language reference to its own page (#31138)
Summary:
Stacked PRs
 * #31146 - [jit] Cleanup after moving language reference
 * **#31138 - [jit] Move TorchScript language reference to its own page**

Preview: https://driazati.github.io/pytorch_doc_previews/jit.html#torchscript-language

Pull Request resolved: https://github.com/pytorch/pytorch/pull/31138

Pulled By: driazati

Differential Revision: D19167375

fbshipit-source-id: d37110d85fc8b8d2c741be49846e873de1357c2a
2019-12-18 15:09:31 -08:00
d08250c223 fix zero-batch handling in convtranspose (#24341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24341

ConvTransposeOp doesn't crash for zero-batch, but it doesn't modify the output blob. This leads to buggy behaviour, especially when running the same network twice with different inputs, or when backpropagating during training.

Seems `ConvTransposeUnpoolBase<Context>::GetOutputSize` works for zero-batch, so I remove the check for `input.numel() > 0`, and reshape the output blob before returning.

For CudnnConvTransposeGradientOp, it's a bit verbose to set `dfilter` and `dbias` explicitly; it seems cuDNN can handle it, so simply remove the `X.numel() == 0` branch.

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:conv_transpose_test -- --run-disabled

Reviewed By: BIT-silence

Differential Revision: D16807606

fbshipit-source-id: 0d72c5bd8f2e03c34465e7b530cca548d9bdd5e1
2019-12-18 15:06:36 -08:00
7692494c67 Fix hex literal parsing (#29935)
Summary:
Stacked PRs
 * #29940 - [jit] Fix parsing of big float literals
 * **#29935 - [jit] Fix hex literal parsing**
 * #29931 - [jit] Throw a better error for int too big for int64_t

Previously these were all parsed as `0`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29935

Pulled By: driazati

Differential Revision: D19124944

fbshipit-source-id: 1ee0c1dee589933363a5efba069a2cfaf94373c5
2019-12-18 14:00:22 -08:00
1f50cfc24d Throw a better error for int too big for int64_t
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29931

Pulled By: driazati

Differential Revision: D19124934

fbshipit-source-id: 91841d7ba4f2f6142c51fba07b7faa14bb817e3a
2019-12-18 14:00:16 -08:00
fb30a48b4e add unsupported section (#31329)
Summary:
Add a section for unsupported ops and modules. Automatically generate the lists of properties and attributes that aren't bound, and for ops that have semantic mismatches, set up tests so the docs stay up to date.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31329

Differential Revision: D19164472

Pulled By: eellison

fbshipit-source-id: 46290bb8a64d9de928cfb1eda5ff4558c3799c88
2019-12-18 13:56:02 -08:00
5e8bac24b4 Migrate soft_margin_loss from the TH to Aten (CUDA+CPU) (#28135)
Summary:
Fix: https://github.com/pytorch/pytorch/issues/24631, https://github.com/pytorch/pytorch/issues/24632, https://github.com/pytorch/pytorch/issues/24764, https://github.com/pytorch/pytorch/issues/24765

Port of TH SoftMarginCriterion to ATen using un-fused tensor operators but with custom backward code. This is a follow-up/fix of the reverted PR https://github.com/pytorch/pytorch/issues/27673.

Benchmark results:

CPU became faster, GPU slower. To reach the previous TH performance, manual fusion is probably necessary.
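For reference, the ported math expressed with un-fused tensor ops looks roughly like this (a sketch assuming the default 'mean' reduction, not the ATen kernel itself):

```python
import torch

def soft_margin_loss(x, y):
    # loss(x, y) = mean(log(1 + exp(-y * x)))
    return torch.log1p(torch.exp(-y * x)).mean()
```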

### WITH patch
```
CPU warmup 1000 took 7.997200009413064e-05
CPU warmup 10000 took 0.0008116499957395718
CPU warmup 100000 took 0.0012691459996858612
CPU warmup TOTAL time 0.0021982479956932366
CPU forward 1000 took 7.320100849028677e-05
CPU forward 10000 took 0.00015837099635973573
CPU forward 100000 took 0.0010471990099176764
CPU forward 1000000 took 0.01238470000680536
CPU forward 10000000 took 0.12747182900784537
CPU forward 100000000 took 1.2076255190040683
CPU forward TOTAL time 1.3488940890092636
CPU for- & backward 1000 took 0.00032587299938313663
CPU for- & backward 10000 took 0.0006926299975020811
CPU for- & backward 100000 took 0.002146183993318118
CPU for- & backward 1000000 took 0.019158899012836628
CPU for- & backward 10000000 took 0.2957490350090666
CPU for- & backward 100000000 took 1.7630806300003314
CPU for- & backward TOTAL time 2.081367089995183

GPU warmup 1000 took 0.0004558280052151531
GPU warmup 10000 took 0.0002567449992056936
GPU warmup 100000 took 0.0001593509950907901
GPU warmup TOTAL time 0.0009442300070077181
GPU forward 1000 took 0.00015061900194268674
GPU forward 10000 took 0.00015258099301718175
GPU forward 100000 took 0.00015409699699375778
GPU forward 1000000 took 0.0008183339959941804
GPU forward 10000000 took 0.004424853003001772
GPU forward 100000000 took 0.04356115800328553
GPU forward TOTAL time 0.04938192600093316
GPU for- & backward 1000 took 0.0008062430133577436
GPU for- & backward 10000 took 0.0006074949924368411
GPU for- & backward 100000 took 0.0007091690058587119
GPU for- & backward 1000000 took 0.001022183001623489
GPU for- & backward 10000000 took 0.009945805999450386
GPU for- & backward 100000000 took 0.0944173600000795
GPU for- & backward TOTAL time 0.28060428200114984
```

### WITHOUT patch
```
CPU warmup 1000 took 6.394000956788659e-05
CPU warmup 10000 took 0.00038220599526539445
CPU warmup 100000 took 0.0034939230099553242
CPU warmup TOTAL time 0.003981974994530901
CPU forward 1000 took 4.7855006414465606e-05
CPU forward 10000 took 0.000347569992300123
CPU forward 100000 took 0.003367935001733713
CPU forward 1000000 took 0.03605044000141788
CPU forward 10000000 took 0.35935167300340254
CPU forward 100000000 took 3.630371332008508
CPU forward TOTAL time 4.029640004009707
CPU for- & backward 1000 took 0.00028494100843090564
CPU for- & backward 10000 took 0.0006738200027029961
CPU for- & backward 100000 took 0.0051178760040784255
CPU for- & backward 1000000 took 0.04925115800870117
CPU for- & backward 10000000 took 0.7172313440096332
CPU for- & backward 100000000 took 5.441953932997421
CPU for- & backward TOTAL time 6.21466830400459

GPU warmup 1000 took 0.001803738996386528
GPU warmup 10000 took 0.00041877900366671383
GPU warmup 100000 took 0.0003870719956466928
GPU warmup TOTAL time 0.0026561370032140985
GPU forward 1000 took 0.00037833399255760014
GPU forward 10000 took 0.00038825398951303214
GPU forward 100000 took 0.0003841099969577044
GPU forward 1000000 took 0.0007090550061548129
GPU forward 10000000 took 0.0016171559982467443
GPU forward 100000000 took 0.013463679002597928
GPU forward TOTAL time 0.017010531009873375
GPU for- & backward 1000 took 0.0007374050037469715
GPU for- & backward 10000 took 0.0006343529967125505
GPU for- & backward 100000 took 0.0006375070079229772
GPU for- & backward 1000000 took 0.0007550300069851801
GPU for- & backward 10000000 took 0.002672752001672052
GPU for- & backward 100000000 took 0.023170708998804912
GPU for- & backward TOTAL time 0.20251446698966902
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28135

Differential Revision: D18001447

Pulled By: VitalyFedyunin

fbshipit-source-id: ad90dc1cca42dcaf3ea9e17e4f8fd79cee0a293e
2019-12-18 13:33:59 -08:00
7cf8b9bada Move leaky_relu to Aten(CPU, CUDA) (#29899)
Summary:
VitalyFedyunin, this PR ports the LeakyReLU activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.LeakyReLU()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
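For reference, the op being ported is elementwise (a sketch with the default slope of 0.01; the ATen kernel is the authoritative implementation):

```python
import torch

def leaky_relu(x, negative_slope=0.01):
    # identity for non-negative inputs, scaled-down slope for negative ones
    return torch.where(x >= 0, x, negative_slope * x)
```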
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).

CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.14 (ms).
input size(128, 10000) forward time is 4.21 (ms); backwad avg time is 8.02 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.07 (ms).
input size(128, 10000) forward time is 1.98 (ms); backwad avg time is 6.21 (ms)
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).

CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.04 (ms).
input size(128, 10000) forward time is 0.03 (ms); backwad avg time is 0.09 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10000) forward time is 0.47 (ms); backwad avg time is 1.02 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
```
and run `./run.sh num_threads test.py`.

Fixes https://github.com/pytorch/pytorch/issues/24583 #24584 https://github.com/pytorch/pytorch/issues/24720 #24721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29899

Differential Revision: D18816231

Pulled By: VitalyFedyunin

fbshipit-source-id: afb1e43a99317d17f50cff1b593cd8f7a0a83da2
2019-12-18 13:14:11 -08:00
b0bd35ff13 caffe2/event: allow multiple errors such as when cancelled (#31335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31335

When an error occurs in a net we end up cancelling all the async ops. If one error occurs it's highly likely other errors will occur as well.

Typically we see:
1. SendOp failed due to a network error
2. async scheduling cancels all other ops via `SetFinished("Cancelled");`
3. Another SendOp fails due to a network error and crashes the process when the exception is thrown.

This changes caffe2 ops to allow failing twice.

Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu

Reviewed By: andrewwdye

Differential Revision: D19106548

fbshipit-source-id: 4b7882258a240894cc16d061a563c83a3214d3d9
2019-12-18 13:10:57 -08:00
4d22c3ba01 fix docker login, add docker image tag list after purge as html (#31328)
Summary:
example of the generated html: http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31328

Differential Revision: D19147113

Pulled By: mingbowan

fbshipit-source-id: 5104e92d4490f047a6474e2b12aed3293b52a9df
2019-12-18 12:08:51 -08:00
47766e648f C++ API parity: MultiheadAttention
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27309

Test Plan: Imported from OSS

Differential Revision: D17766736

Pulled By: pbelevich

fbshipit-source-id: 7a5f2399f081945d31d4c13d7a8d248c387fc1a6
2019-12-18 10:13:29 -08:00
c63f8e5ebe Fix typo in data.rst docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31395

Differential Revision: D19160010

Pulled By: zou3519

fbshipit-source-id: cbc4e719e69117e8747617729d240c72e7a4e3dd
2019-12-18 09:52:10 -08:00
285cc13435 check devices for all input tensors in index_put (#31280)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/30960
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31280

Differential Revision: D19149114

Pulled By: ngimel

fbshipit-source-id: af185a98ac6ea614f43bbf865de02ea113d4ed56
2019-12-18 09:25:40 -08:00
913323750d CODEOWNERS for distributed optimizer. (#31403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31403

ghstack-source-id: 95874532

Test Plan: waitforbuildbot

Differential Revision: D19154217

fbshipit-source-id: a18ebe646b97c83cc0eb0821b10b4c76d5ce2878
2019-12-18 09:25:35 -08:00
359c39b3c2 Use global lock instead of per instance lock. (#31404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31404

Multiple "trainers" could each create different instances of DistributedOptimizer, which means we can still have a race condition unless we do a trully global per worker lock.
ghstack-source-id: 95874624

Test Plan: run unit tests -- unfortunatelly due to the non-deterministic behavior it's not clear how to unit test this properly.

Differential Revision: D19154248

fbshipit-source-id: fab6286c17212f534f1bd1cbdf9f0de002d48c74
2019-12-18 09:22:54 -08:00
386cd59d44 Remove redundant queries of qconfig in insertObservers (#31292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31292

att
Also, we need to do this check after we call `insertObservers` on invoked modules
as well, since qconfig can be None for the parent module while being valid for invoked modules.

Test Plan:
.

Imported from OSS

Differential Revision: D19146668

fbshipit-source-id: be6811353d359ed3edd5415ced29a4999d86650b
2019-12-18 09:15:52 -08:00
58d2dd5b73 Enabled flip for bool tensors (#31267)
Summary:
Fix this [issue](https://github.com/pytorch/pytorch/issues/31213)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31267

Differential Revision: D19047249

Pulled By: izdeby

fbshipit-source-id: f58ca3ac88aab28742b8d345400270f7d31c3856
2019-12-18 09:01:32 -08:00
3e59e80429 Revert D18941024: Move TorchScript language reference to its own page
Test Plan: revert-hammer

Differential Revision:
D18941024

Original commit changeset: d0ff600870a1

fbshipit-source-id: 01c0eac4c9741f27b91d710616e71a0d769f6f6a
2019-12-18 08:55:50 -08:00
3694749cd1 Detect dill version in torch.save/load (#30985)
Summary:
Fix for issue https://github.com/pytorch/pytorch/issues/28313
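For context, the code path being guarded is the custom `pickle_module` hook (a usage sketch, assuming dill is installed):

```python
import dill
import torch

# torch.save/load accept a custom pickle module; the fix detects whether
# the installed dill version is recent enough to be used this way
torch.save({"w": torch.randn(3)}, "ckpt.pt", pickle_module=dill)
obj = torch.load("ckpt.pt", pickle_module=dill)
```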
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30985

Differential Revision: D19142947

Pulled By: zou3519

fbshipit-source-id: 10e3a182a99e80ca8c9c8328b6f8764b27d78eb3
2019-12-18 08:05:08 -08:00
74e59c6fed caffe2::TypeInfo fix when using clang-cl on Windows (#31364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31364

clang-cl defines both `_MSC_VER` and `__clang__`. Names are mangled clang-style, though. Calling `extract` with the wrong name mangling pattern will throw `std::logic_error`. This crashes on Windows when `get_fully_qualified_type_name` is called because it is marked `noexcept`.

Test Plan: Windows builds no longer crash on startup.

Reviewed By: mattjgalloway

Differential Revision: D19142064

fbshipit-source-id: 516b9b63daeff30f5c097d192b0971c7a42db57e
2019-12-18 07:51:07 -08:00
c05538b831 Move TorchScript language reference to its own page (#31138)
Summary:
Preview: https://driazati.github.io/pytorch_doc_previews/jit.html#torchscript-language
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31138

Pulled By: driazati

Differential Revision: D18941024

fbshipit-source-id: d0ff600870a14c4a7c6ce54867d152072a12c48c
2019-12-18 00:46:19 -08:00
3c8892aa0c avoid doing quadratic work in concrete type inference (#31020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31020

Before, the recursive scripting process re-did the concrete type
inference process for every submodule call. This changes things so that
the concrete type inference process only occurs once (at the top level),
and we re-use all the inferred concrete types while recursively
compiling submodules.

This is both more efficient (we don't do n^2 work inferring concrete
types) and less bug-prone (since we infer the concrete type only once,
there is no possibility of a mismatch).

Test Plan: Imported from OSS

Differential Revision: D18904110

Pulled By: suo

fbshipit-source-id: 6560b85ae29fe5e9db1ee982dbf8bc222614b8d8
2019-12-17 21:55:55 -08:00
878b0e35f7 Simplify recursive script compilation flow. (#31019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31019

No more `recursive_script`, just direct calls to `create_script_module`.
This reduces the number of pathways through the frontend, and the
uniformity is useful for a future PR.

Test Plan: Imported from OSS

Differential Revision: D18904113

Pulled By: suo

fbshipit-source-id: 7de061dfef0cbdfc9376408fc6c1167b81803f01
2019-12-17 21:55:50 -08:00
82d52bc718 remove remnants of properties hack (#31018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31018

Properties are now disallowed so this hack is no longer necessary

Test Plan: Imported from OSS

Differential Revision: D18904112

Pulled By: suo

fbshipit-source-id: 83448da677082d59355729bb72d9f9f4c31ea756
2019-12-17 21:55:45 -08:00
7e81d72d12 remove unnecessary arg from create_script_module (#31017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31017

This arg is now derivable from another one, so we don't need to pass
both.

Test Plan: Imported from OSS

Differential Revision: D18904111

Pulled By: suo

fbshipit-source-id: ea74ea9c2ae83d9e0e6977b0eb6629f53545e2e4
2019-12-17 21:55:41 -08:00
e5631119f6 use expect instead of casting in register_c10_ops (#31401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31401

As title, just a mechanical change

Test Plan: Imported from OSS

Differential Revision: D19152965

Pulled By: suo

fbshipit-source-id: 6bb27df7c8f542c55110286c156358ba0936269f
2019-12-17 21:37:59 -08:00
4ec2448580 Update OVERVIEW.md (#31373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31373

Just some housekeeping

Test Plan: Imported from OSS

Differential Revision: D19145987

Pulled By: suo

fbshipit-source-id: ae8142dab2bddcf0b628c27c426ca26334c48238
2019-12-17 21:29:16 -08:00
e0ab255a51 Updates to serialization.md (#31372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31372

Keeping it current with the latest changes.

Test Plan: Imported from OSS

Differential Revision: D19145986

Pulled By: suo

fbshipit-source-id: 88122e66fa87a354ef8e87faffe58551074e3f03
2019-12-17 21:29:12 -08:00
e169e02836 Refactor custom op tests (#31282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31282

Introduce a helper to easily call stack ops
ghstack-source-id: 95855728

Test Plan: unit tests

Differential Revision: D19061515

fbshipit-source-id: a7d6329e26cd3d94730d88c8a6393e10bfbd8e9b
2019-12-17 20:48:01 -08:00
c5d2758c35 Disable flaky TestMomentumSGD.test_fp16momentum_sgd (#31369)
Summary:
Related to https://github.com/pytorch/pytorch/issues/31368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31369

Differential Revision: D19147072

Pulled By: VitalyFedyunin

fbshipit-source-id: 6fad13be7b35f992d84a20f23877cad05ff18616
2019-12-17 19:16:54 -08:00
e3fecabdcb Setup operator registration for distributed package (#31214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31214

This sets up the basic infrastructure for distributed autograd and rpc to
bind their operators to TorchScript. Since the whole distributed package
is built behind the `USE_DISTRIBUTED` flag, we separate the
registration and build it only when the flag is on.

Test Plan: Imported from OSS

Differential Revision: D19137160

fbshipit-source-id: ff47dc4c380ebe273fe0eea9e5e3fccfbd6466d7
2019-12-17 17:26:43 -08:00
e33dea6e4e dynamically quantized lstm benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30149

Test Plan: Imported from OSS

Differential Revision: D18613005

Pulled By: z-a-f

fbshipit-source-id: 966bfe2c862b1b4006b228bd9115c5c1cd3ad8cf
2019-12-17 16:52:04 -08:00
f0243ea712 Use [[deprecated]] instead of C10_DEPRECATED (#30918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30918

This is a C++14 feature we can use now
ghstack-source-id: 95811482

Test Plan: waitforsandcastle

Differential Revision: D18869636

fbshipit-source-id: b5b3d78b61b6ceb2deda509131f8502e95b1d057
2019-12-17 15:21:34 -08:00
d9c3913dfc move BatchPermutationOp to caffe2/operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31350

Reviewed By: houseroad

Differential Revision: D19053527

fbshipit-source-id: 50d11f137d0f5c07e8ad899a3a84d56a042bbc32
2019-12-17 14:58:27 -08:00
0b8332efb4 Remove c++11 examples from doc comments (#30925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30925

-
ghstack-source-id: 95810835

Test Plan: it's just comments

Differential Revision: D18869634

fbshipit-source-id: 346498ae2472dbfe23ef40533bff891fde9922c4
2019-12-17 14:58:22 -08:00
5554e5b793 Docs: c++11 -> c++14 (#30530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30530

Switch some mentions of "C++11" in the docs to "C++14"
ghstack-source-id: 95812049

Test Plan: testinprod

Differential Revision: D18733733

fbshipit-source-id: b9d0490eb3f72bad974d134bbe9eb563f6bc8775
2019-12-17 14:09:02 -08:00
cc8d6342fc make profiling take no_grad flags into account (#31071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31071

Previously the profiler would assume Tensors required grad, even
when the no_grad flag was enabled during execution. This makes the profiling
and guards respect the no_grad flag, which eliminates the extra differentiable
graphs that appear in the backward graph (where no_grad is typically enabled).
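A minimal illustration of the behavior change (a hedged sketch, not the PR's test):

```python
import torch

@torch.jit.script
def f(x):
    return x * x

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    # profiled runs here now record requires_grad=False, so no extra
    # differentiable graphs are created for the backward
    f(x)
```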

Test Plan: Imported from OSS

Differential Revision: D18915468

Pulled By: zdevito

fbshipit-source-id: 1ae816a16ab78ae5352825cc6b4a68ed7681a089
2019-12-17 13:22:16 -08:00
dab5f72543 we should have a config-based way to skip flaky tests (#30978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30978

This particular approach queries our issue tracker for test titles that
match the following format:

```
DISABLED test_async_grad_guard_with_grad (jit.test_async.TestAsync)
```

It then skips the Python tests for them. There is a 1-second timeout, so
if the internet flakes we still run the test suite without disabling any
tests.
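A rough Python sketch of the approach (the endpoint, query, and parsing below are assumptions for illustration; the real logic lives in the test harness):

```python
import re
import requests

DISABLED_RE = re.compile(r"^DISABLED (\S+) \((\S+)\)$")

def fetch_disabled_tests(timeout=1.0):
    """Return {test_name: suite} parsed from issue titles, or {} on any failure."""
    try:
        resp = requests.get(
            "https://api.github.com/search/issues",
            params={"q": '"DISABLED test_" in:title is:open repo:pytorch/pytorch'},
            timeout=timeout,  # ~1s: if the internet flakes, run everything
        )
        resp.raise_for_status()
    except Exception:
        return {}
    disabled = {}
    for item in resp.json().get("items", []):
        m = DISABLED_RE.match(item["title"])
        if m:
            disabled[m.group(1)] = m.group(2)
    return disabled
```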

This is intended as a quick fix, similar to ninja unland, to get to a green
master. Long term test disables should go into the code.

Test Plan: Imported from OSS

Pulled By: zdevito

Differential Revision: D18890532

fbshipit-source-id: fe9447e59a6d5c9ad345f7c3ff15d63b6d2a09e2
2019-12-17 11:58:43 -08:00
d2067569e7 Kill THTensor_(bhistc). (#31254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31254

It's not used.

Test Plan: Imported from OSS

Differential Revision: D19022923

Pulled By: gchanan

fbshipit-source-id: caa5e6b7a133f24f8f3349fd1e53147f8dd3fd97
2019-12-17 08:54:17 -08:00
49eff2f43c Kill THSize. (#31218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31218

It isn't used.

Test Plan: Imported from OSS

Differential Revision: D18986641

Pulled By: gchanan

fbshipit-source-id: 0a434941d12193941f097232c18ffe4268bf5f82
2019-12-17 08:54:13 -08:00
52b8a52e4d move AliasWithNameOp to caffe2/operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31281

Reviewed By: houseroad

Differential Revision: D19053453

fbshipit-source-id: 350bfd5c001db9c17916dcae7ade8f56db1e9841
2019-12-17 02:39:40 -08:00
0e548a76eb Upgrade exported ONNX IR version to 6 (#31025)
Summary:
Upgrade the IR version from 4 to 6; below is the change log from ONNX. The upgrade should be backward compatible.

```
  // IR VERSION 5 published on March 18, 2019
  // - Add message TensorAnnotation.
  // - Add quantization annotation in GraphProto to map tensor with its scale and zero point quantization parameters.
  IR_VERSION_2019_3_18 = 0x0000000000000005;

  // IR VERSION 6 published on Sep 19, 2019
  // - Add support for sparse tensor constants stored in model.
  //   - Add message SparseTensorProto
  //   - Add sparse initializers
  IR_VERSION = 0x0000000000000006;
```
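One way to confirm the exported version from Python (a sketch assuming the onnx package and an exported model.onnx):

```python
import onnx

model = onnx.load("model.onnx")
print(model.ir_version)  # expected: 6 after this change
```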
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31025

Reviewed By: hl475

Differential Revision: D18935444

Pulled By: houseroad

fbshipit-source-id: 9ba47f9657fa1a668db291cf04af07d5e8d73c21
2019-12-16 23:18:22 -08:00
10ce1765be Introducing ScalarTypeType and LayoutType (#31074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31074

As the title says; it's step 1 in https://github.com/pytorch/pytorch/pull/30694#issuecomment-564205276.

These types are not used in any other place yet.

Test Plan: Making sure all unit tests and build pass successfully.

Differential Revision: D18916246

fbshipit-source-id: c8213307ed196e1b51ce1a2a7c10869dcd45b79e
2019-12-16 21:46:47 -08:00
f9010d7648 remove wipe cache from op bench (#31334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31334

The wipe cache logic was introduced in the hope of reducing variation in the benchmark results. Based on our experiment results, it didn't actually help with that. In addition, several engineers had encountered the issue of a missing cpuinfo.h, which was used in the wipe cache logic. So this diff removes that feature to ensure smooth installation and running of the op bench.

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N1_K1_cpu
# Input: M: 1, N: 1, K: 1, device: cpu
Forward Execution Time (us) : 111.192
```

A/B test also passes: Benchmark Run #2476535015

Reviewed By: hl475

Differential Revision: D19126970

fbshipit-source-id: 9b1ab48c121838836ba6e0ae664a48fe2d18efdd
2019-12-16 16:34:14 -08:00
229ce89b92 Fix coverage and hypothesis conflict (#31320)
Summary:
Temporarily enforcing versions for all envs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31320

Differential Revision: D19122781

Pulled By: VitalyFedyunin

fbshipit-source-id: fe6473b177367371387d4b3b873131e7ecfbc0f8
2019-12-16 15:52:42 -08:00
c5d3be1102 Remove the second copy on calling dist_autograd_context._known_worker_ids() (#31206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31206

Improvement on #25525.

- DistAutogradContext::getKnownWorkerIds() returns an unordered_map as a temporary value. There is no need to copy this temporary value A into another temporary value B.
ghstack-source-id: 95736296

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork --  test_worker_ids_recorded
```

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork_thrift -- test_context_cleanup_tensor_with_grad
```

Differential Revision: D5707771

fbshipit-source-id: 9fea83dc69b02047aef8b02a73028a260ac0be40
2019-12-16 15:07:39 -08:00
643ca5def2 Replace c10::guts::stuff with std::stuff (#30915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915

Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609

Test Plan: waitforsandcastle

Differential Revision: D18869639

fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
2019-12-16 13:57:19 -08:00
c6a8f884d8 add copy_ operator the op bench (#31327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31327

Adds copy_ operator to the benchmark suite

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:binary_test -- --iterations 1 --operators copy_
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: copy_
# Mode: Eager
# Name: copy__M1_N1_K1_cpu_dtype_onetorch.int32_dtype_twotorch.int32
# Input: M: 1, N: 1, K: 1, device: cpu, dtype_one: torch.int32, dtype_two: torch.int32
Forward Execution Time (us) : 60.645
```

Reviewed By: hl475

Differential Revision: D19122910

fbshipit-source-id: e5f0b0e2612daae0201b1b4a87f52b971e0cc4a8
2019-12-16 13:45:12 -08:00
d401ba1417 benchmark binary ops in binary_test (#31326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31326

as title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:binary_test -- --iterations 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_in_one[64,1,64]_in_two[1,64,1]_cpu_dtypetorch.float32
# Input: in_one: [64, 1, 64], in_two: [1, 64, 1], device: cpu, dtype: torch.float32
Forward Execution Time (us) : 28080.802
```

Reviewed By: hl475

Differential Revision: D19120113

fbshipit-source-id: 1105de208f7609cc6d74f0b5bc6fe75f19146b28
2019-12-16 13:45:08 -08:00
455e85a2f1 Fix unflatten when dim is a negative integer (#31208)
Summary:
Changelog:
- Wrap dim to be a positive integer when dim is negative
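The usual dim-wrapping rule, sketched in Python for illustration (the actual change is in the C++ implementation):

```python
def wrap_dim(dim: int, ndim: int) -> int:
    # map negative dims such as -1 onto the valid range [0, ndim)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for a {ndim}-d tensor")
    return dim + ndim if dim < 0 else dim
```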
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31208

Test Plan:
- Updated tests in test_namedtensor.py

Fixes https://github.com/pytorch/pytorch/issues/31184

Differential Revision: D19036569

Pulled By: zou3519

fbshipit-source-id: 86e01e20988dee7c4b6c73232f66282d687f9a2c
2019-12-16 12:48:03 -08:00
9ca61aec0f Kill THLogAdd (#31217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31217

It doesn't seem to be used.

Test Plan: Imported from OSS

Differential Revision: D18986642

Pulled By: gchanan

fbshipit-source-id: 96d615df82731d2224d403ab6e2cad6d4c6674fd
2019-12-16 12:30:16 -08:00
409151e1bb Use [[noreturn]] instead of C10_NORETURN or CAFFE_NORETURN (#30917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917

This is a C++14 feature, we can use this now.
ghstack-source-id: 95255753

Test Plan: waitforsandcastle

Differential Revision: D18869637

fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
2019-12-15 23:54:16 -08:00
c95d46abbd Remove C++11 compatibility from c10::util::crc64_t (#30920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30920

deletecode
ghstack-source-id: 95255641

Test Plan: waitforsandcastle

Differential Revision: D18869640

fbshipit-source-id: c3d7f4e1a29caff9fd8a8141c258f6f1c3fd830c
2019-12-15 23:43:02 -08:00
0d7391f8b2 Test cases for custom ops with autograd (#31003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31003

-
ghstack-source-id: 95663728

Test Plan: unit tests

Differential Revision: D18896189

fbshipit-source-id: d71f7678fff644536fe30452ee21a5a7df1f1f0b
2019-12-15 22:37:24 -08:00
930d0751e6 Java Tensor hybrid, owns at::Tensor, no memcopy for java outputs. (#30501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30501

**Motivation**:
In the current state, the output of a libtorch Module forward/runMethod is memcopied to a java ByteBuffer, which is allocated, at least in some versions of Android, on the Java heap. That can lead to intensive garbage collection.

**Change**:
The output java tensor becomes the owner of the output at::Tensor and keeps it alive (as the `pytorch_jni::TensorHybrid::tensor_` field) until the java part is destroyed by GC. For that, org.pytorch.Tensor becomes a 'Hybrid' class in fbjni naming and holds the member field `HybridData mHybridData;`.

If construction starts from the java side, the java constructors of the subclasses call `this.mHybridData = super.initHybrid();` to initialize the cpp part (`at::Tensor tensor_`). (We need all the fields initialized; because of this, `mHybridData` is not declared final but works as final.)

If construction starts from the cpp side, the cpp side is initialized from the provided at::Tensor with `makeCxxInstance(std::move(tensor))` and is passed to the java method `org.pytorch.Tensor#nativeNewTensor` as the parameter `HybridData hybridData`, which holds a native pointer to the cpp side.

In that case the `initHybrid()` method is not called; instead, a parallel set of subclass ctors is used, which stores `hybridData` in `mHybridData`.

Renaming:
`JTensor` -> `TensorHybrid`

Removed method:
`JTensor::newAtTensorFromJTensor(JTensor)` becomes trivial `TensorHybrid->cthis()->tensor()`

Test Plan: Imported from OSS

Differential Revision: D18893320

Pulled By: IvanKobzarev

fbshipit-source-id: df94775d2a010a1ad945b339101c89e2b79e0f83
2019-12-15 21:36:20 -08:00
60ec53c7fd Fix copy kernel speed regression introduced in #29631 (#31279)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31271

This fixes copy kernel speed regression introduced in https://github.com/pytorch/pytorch/issues/29631.

The previous implementation forces the compiler to instantiate `static_cast_with_inter_type` because it is passed as an argument to a function. This behavior makes it impossible for compilers to do optimizations like automatic vectorization, and the function call itself is expensive compared to a single cast instruction.

To check the change, run
```
readelf -Ws /home/xgao/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so | grep static_cast_with_inter_type
```

On nightly build, we have output
```
168217: 0000000001852bf0     5 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsdE5applyEd
168816: 0000000001852d30    33 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEaE5applyEa
168843: 00000000018531f0     7 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIblE5applyEl
168930: 0000000001852c20     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIslE5applyEl
168935: 00000000018528d0   124 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIfNS_4HalfEE5applyES1_
169023: 0000000001852f30    17 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEhE5applyEh
169713: 00000000018525c0     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIahE5applyEh
170033: 0000000001852c10     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsiE5applyEi
170105: 0000000001852bd0     5 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIshE5applyEh
170980: 0000000001852fc0    27 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdES1_IfEE5applyES3_
171398: 0000000001852810    13 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIdbE5applyEb
171574: 00000000018532e0    35 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIbNS_8BFloat16EE5applyES1_
171734: 0000000001852b20     6 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIlSt7complexIdEE5applyES2_
172422: 0000000001853350    54 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EaE5applyEa
172704: 00000000018533c0    38 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EfE5applyEf
172976: 0000000001852890    10 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIflE5applyEl
173038: 0000000001852f80     9 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEfE5applyEf
173329: 00000000018531c0    20 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIbfE5applyEf
173779: 00000000018524d0     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIhiE5applyEi
174032: 0000000001852960    14 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIfNS_8BFloat16EE5applyES1_
174334: 0000000001852d60    29 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEdE5applyEd
174470: 0000000001852c60   124 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsNS_4HalfEE5applyES1_
174770: 0000000001852bc0    15 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIlNS_8BFloat16EE5applyES1_
176408: 0000000001853980   144 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_4HalfEbE5applyEb
176475: 0000000001852790   128 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIdNS_4HalfEE5applyES1_
....
```

And after this PR, we get empty output
```
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31279

Differential Revision: D19075587

Pulled By: ngimel

fbshipit-source-id: c20088241f39fa40c1d055f0a46eb5b9ece52e71
2019-12-15 14:01:31 -08:00
9dc3d8738c fix view call on discontiguous tensor in to_sparse_backward (#31223)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30820
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31223

Differential Revision: D19044172

Pulled By: ngimel

fbshipit-source-id: ac9fa71197d4f6c5b90a26e8d23360250745a2e2
2019-12-15 11:51:47 -08:00
0e50c1b0d9 Replace assert with cuda assert macro (#31297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31297

Follow-up to https://github.com/pytorch/pytorch/pull/31276

This is final replacement needed for aten out of place hipification.

Test Plan: wait for CI to clear.

Reviewed By: bddppq

Differential Revision: D19070209

fbshipit-source-id: 1428cd0ddfb5a8f4e234fabce822285e898047ea
2019-12-15 05:43:00 -08:00
ec92711aac Fix error message in incorrect rref.localValue() call (#31199)
Summary:
Closes https://github.com/pytorch/pytorch/issues/31198, see the issue for more details. We throw an error when `local_value()` is called on a non-owned rref, but the incorrect node name is printed in the error message. This PR fixes that and adds a relevant unit test.
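A minimal sketch of the failure mode, assuming RPC is already initialized on the workers (worker names are illustrative):
```
import torch
import torch.distributed.rpc as rpc

# The RRef is owned by "worker1", so calling local_value() here, on a
# non-owner, raises -- and the error message now names the correct node.
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
rref.local_value()
```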
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31199

Differential Revision: D19072014

Pulled By: rohan-varma

fbshipit-source-id: 760c20bfd2fbf286eaaca19500469509a575cfec
2019-12-14 22:51:00 -08:00
ffe0c1ae4d Make test_torch.py pass cuda-memcheck (#29243)
Summary:
Make the following changes:
- When there are more than 10k errors, cuda-memcheck only shows 10k errors, in this case we shouldn't raise an Exception
- Add UNDER_CUDA_MEMCHECK environment to allow disabling `pin_memory` tests when running cuda-memcheck.
- Add a `--ci` command option; when turned on, the script writes output to stdout instead of to a file, and exits with an error if cuda-memcheck fails
- Add a `--nohang` command option; when turned on, a hang is treated as a pass instead of an error
- Do simple filtering on the tests to run: skip a test when `'cpu'` is in its name but `'cuda'` is not
- Add `--split` and `--rank` to allow splitting the work (NVIDIA CI has a 3-hour limit, so we have to split the work to satisfy it)
- The error summary could be `ERROR SUMMARY: 1 error` or `ERROR SUMMARY: 2 errors`; the tail can be `error` or `errors`, so the lines differ in length. The script is fixed to handle both cases (a parsing sketch follows this list)
- Ignore errors from `cufft`
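A minimal sketch of parsing that tolerates both tails, assuming cuda-memcheck lines like `ERROR SUMMARY: 2 errors` (the script's actual parsing code may differ):
```
import re

def parse_error_count(line):
    # Matches "ERROR SUMMARY: 1 error" and "ERROR SUMMARY: 2 errors" alike.
    m = re.search(r"ERROR SUMMARY: (\d+) errors?", line)
    return int(m.group(1)) if m else None
```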
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29243

Differential Revision: D18941701

Pulled By: mruberry

fbshipit-source-id: 2048428f32b66ef50c67444c03ce4dd9491179d2
2019-12-14 20:29:58 -08:00
701e05dcbb Buck test targets for robolectric, instrumentation
Summary:
Buck targets for robolectric and instrumentation tests for pytorch android:
```
buck test fbsource//fbandroid/mode/server //xplat/caffe2/android:test_host
```
```
buck test //xplat/caffe2/android:test_instrumentation
```
For both:
```
buck test fbsource//fbandroid/mode/server //xplat/caffe2/android:pytorch
```

Models in assets:
`pt_android_test_asset` - creates a buck target, usable in both robolectric and instrumentation tests, that contains an asset (a separate file) created from the provided TorchScript sources, using the latest libtorch binaries.

`pt_gen_test_asset_bin` does that tracing; usage format:
```
generate_test_asset input_file.jit output_file.py
```

Example of test-host setup for users of pytorch android:
robolectric tests:

```
load("fbsource//xplat/caffe2:pt_defs.bzl", "pt_android_test_asset", "pt_predictor_binary", "PT_ANDRIOID_TEST_HOST_JNI_DEPS")

pt_android_test_asset(
    name = "test_asset",
    src = "test_asset.jit",
    asset_name = "test_asset.pt",
)

robolectric3_test(
    name = "example_test_host",
    srcs = [...],
    jni_deps = PT_ANDRIOID_TEST_HOST_JNI_DEPS,
    deps = [
        ":pytorch_common",
        ":test_asset",
        "//fbandroid/java/com/facebook/soloader/annotation:annotation",
        "//fbandroid/java/com/facebook/testing/robolectric/v3:v3",
        "//fbandroid/libraries/soloader/java/com/facebook/soloader:soloader",
        "//fbandroid/third-party/java/robolectric3/robolectric:robolectric",
    ],
)
```

COMMON_LINKER_FLAGS = ["-Wl,--no-as-needed"] cannot be applied on macOS

Test Plan:
```
[twsvcscm@od0187.atn1 /data/sandcastle/boxes/fbsource (b416b20a)]$ buck test fbsource//fbandroid/mode/server //xplat/caffe2/android:pytorch
Parsing buck files: finished in 7.2 sec
Creating action graph: finished in 0.7 sec
Building: finished in 11.9 sec (100%) 791/791 jobs, 0 updated
  Total time: 19.9 sec
Testing: finished in 11.0 sec (30 PASS/0 FAIL)
RESULTS FOR //xplat/caffe2/android:test_host //xplat/caffe2/android:test_instrumentation
PASS     159ms 15 Passed   0 Skipped   0 Failed   org.pytorch.PytorchHostTests
PASS     152ms 15 Passed   0 Skipped   0 Failed   org.pytorch.PytorchInstrumentedTests (localhost:31930)
TESTS PASSED
```

OSS changes test:
```
gradle -p android pytorch_android:cAT passes
```

Reviewed By: dreiss

Differential Revision: D18799005

fbshipit-source-id: 881609826a837efebc8526aee40355c5a62947d0
2019-12-14 20:29:52 -08:00
57ee7dab87 Wraps assert statements in cuda kernels (#31276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31276

Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail()

This is similar to https://github.com/pytorch/pytorch/pull/13902 in caffe2 land.

Test Plan: wait for CI to clear

Reviewed By: bddppq

Differential Revision: D19047582

fbshipit-source-id: 34703b03786c8eee9c78d2459eb54bde8dc21a57
2019-12-14 20:29:47 -08:00
58eb15f41c JIT Type parser for mobile (#30391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30391

A Type parser to parse the python string of a Type. For example,
"Tuple[str, Optional[float], Dict[str, List[Tensor]], int]".
Please refer to test_type_parser.cpp for the usage.

One of the use cases is in the lite interpreter, where types need to be serialized (directly calling the python_str() of the Type) and deserialized (calling parseType(str)).

Test Plan: Imported from OSS

Differential Revision: D18924268

Pulled By: iseeyuan

fbshipit-source-id: 830d411563abfbeec023f01e7f8f4a1796f9a59a
2019-12-14 20:29:42 -08:00
065685180d Loading module from android asset (#30378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30378

Loading module directly from android assets. Iteration on https://github.com/pytorch/pytorch/pull/30109
Loading Module:
```
mModule = AndroidUtils.loadModuleFromAsset(assetName, getAssets());
```

`org.pytorch.AndroidUtils` is excluded from pytorch_jni host build

Testing:
test_app module load switched to this approach and works fine
```
gradle test_app:installMobNet2QuantDebug -PABI_FILTERS=x86 && adb shell am start -n org.pytorch.testapp.mobNet2Quant/org.pytorch.testapp.MainActivity
```

Test Plan: Imported from OSS

Differential Revision: D18893269

Pulled By: IvanKobzarev

fbshipit-source-id: a7c73776f40e9c67bef233da05db56cc6efbe76a
2019-12-14 20:29:37 -08:00
70013415c7 DDP should not set grad for globally unused params (#28883)
Summary:
https://github.com/pytorch/pytorch/issues/28294 DDP should not set grad for globally unused parameters

DDP currently computes the param-to-bucket mapping upfront and allreduces grads for all params in every iteration. Even if params are unused, it will just set their grad to zero. With such behavior, the optimizer cannot tell whether a param indeed has a zero grad or was simply not used in the current iteration. This could trigger convergence problems for optimizers with weight decay and momentum such as SGD. However, DDP cannot simply set grad to None for locally unused parameters, as locally unused parameters might be used in other processes, and hence we still need to allreduce their grads. Instead, DDP should figure out the globally unused parameters and skip touching their grad at the end of backward.

Implementation summary (a conceptual sketch follows this list):
* Add a locally used parameter map for each model replica.
* Mark the locally unused parameters at the end of forward and then reduce to get the globally unused parameters.
* At the end of backward, skip touching the grad for those globally unused parameters.
* Add a unit test test_global_local_unused_params_grad
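A conceptual Python sketch of the reduction step; the real implementation lives in DDP's C++ reducer, and the names here are illustrative:
```
import torch
import torch.distributed as dist

def globally_unused_param_indices(locally_used):
    # locally_used[i] is 1 if parameter i received a grad on this rank, else 0.
    used = torch.tensor(locally_used, dtype=torch.int64)
    dist.all_reduce(used, op=dist.ReduceOp.SUM)  # combine usage across all ranks
    # A parameter is globally unused only if no rank used it.
    return [i for i, u in enumerate(used.tolist()) if u == 0]
```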
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28883

Differential Revision: D18491530

Pulled By: mrshenli

fbshipit-source-id: 24e9b5f20df86c34ddbf9c7106250fd6ce186699
2019-12-14 20:29:32 -08:00
7cb83bea3b Fix static cuda builds on older cmake versions (#30935)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/28378#issuecomment-562597033

To reproduce the failure I had to downgrade to `cmake 3.9` (Ubuntu 18 uses 3.10 apparently). These older `cmake` versions unfortunately don't seem to allow `target_link_libraries(INTERFACE)` to be used with imported libraries. Switching back to `set_property(TARGET)` fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30935

Differential Revision: D18956912

Pulled By: albanD

fbshipit-source-id: a2b728ee3268599a428b7878c988e1edef5d9dda
2019-12-14 20:29:27 -08:00
7c1b5084a7 Enable equality operator for bfloat16 CPU scalar types. (#30817)
Summary:
See https://github.com/pytorch/xla/issues/1330 for reference.

mruberry ailzhang FYI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30817

Differential Revision: D18847375

Pulled By: mruberry

fbshipit-source-id: d1efedf8b975b8d9b55cf0ddf141818eaa7c91f0
2019-12-14 20:29:21 -08:00
2950530031 caffe2::TypeMeta uses compile time type names (#26619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26619

ghstack-source-id: 95348564

Test Plan: unit tests

Differential Revision: D17519252

fbshipit-source-id: 337ec76d17172dd1af60a1676d69964a41dcb7a1
2019-12-14 20:29:16 -08:00
6e1e09fd10 Compile time type names (#26618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26618

Implement a mechanism to get type names at compile time
In a future diff, I'm planning to introduce this to caffe2::TypeMeta and a few other places.
ghstack-source-id: 95337871

Test Plan: unit tests

Differential Revision: D17519253

fbshipit-source-id: e14017f962fd181d147accb3f53fa8d6ee42a3f8
2019-12-14 20:29:11 -08:00
c35cddb306 Switch default memory format of clone operator to Preserve
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30089

Test Plan: Imported from OSS

Differential Revision: D18624985

Pulled By: VitalyFedyunin

fbshipit-source-id: 8d315b08b7b5858fd0a81d3375b44ccb94787ad4
2019-12-14 20:29:06 -08:00
fde3d707ad Switch default memory format of to (and similar) operators to Preserve
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30088

Test Plan: Imported from OSS

Differential Revision: D18624984

Pulled By: VitalyFedyunin

fbshipit-source-id: 54901786d7496c7dce785140b0585ac9093b1d86
2019-12-14 20:29:01 -08:00
927588df8e Switch default memory format of _like operators to Preserve
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30087

Test Plan: Imported from OSS

Differential Revision: D18624986

Pulled By: VitalyFedyunin

fbshipit-source-id: 8e434966f872ffaddf1249248ea445cbbab300ce
2019-12-14 20:28:57 -08:00
1ec989404c Kill some unnecessary function declarations.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31216

Test Plan: Imported from OSS

Differential Revision: D18986640

Pulled By: gchanan

fbshipit-source-id: 30630d9ea025bb510f85e9627cbb4ba46de5e93d
2019-12-14 20:28:52 -08:00
d7d07e7caf thrust is included in SortingKthValue.cu but never used
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31263

Differential Revision: D19042793

Pulled By: ngimel

fbshipit-source-id: 28f06c46a53e15f106ebee6c36e2ad25a3676bd2
2019-12-14 20:28:47 -08:00
cd3f05b44d Small fixes for hipification (#31200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31200

We do not hipify these files when doing out-of-place hipification.

Test Plan: wait for CI to clear.

Differential Revision: D18963683

fbshipit-source-id: eeba8597143f26417d0a8181a4c746139afefa24
2019-12-14 20:28:43 -08:00
9954739956 Refactor test for unique and unique_consecutive and fix some bugs (#31211)
Summary:
Tests for unique_dim will be refactored in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31211

Differential Revision: D19034968

Pulled By: ngimel

fbshipit-source-id: 855d326b37638b5944f11fbbce03394cf000daf9
2019-12-14 20:28:38 -08:00
3587f769dc use propagate_names instead of propagate_names_for_reduction for cumsum and cumprod
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31134

Differential Revision: D18964172

Pulled By: anjali411

fbshipit-source-id: 3050c6d283a469a858378c44ac2ab9102baefce5
2019-12-14 20:28:33 -08:00
a9ad98fb25 Remove unused argument "destId" in addSendRpcBackward (#31207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31207

Cleanup after #30914.

In #30914, `autogradContext->addKnownWorkerId(dst);` was moved out of `addSendRpcBackward()`.

So `addSendRpcBackward()` does not need `dstId` as its argument anymore.
ghstack-source-id: 95509218

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_context_cleanup_tensor_no_grad
```

Differential Revision: D5742365

fbshipit-source-id: accd041a594ec18d369231f5590289828d87baa7
2019-12-14 20:28:29 -08:00
8fea7a49d6 pinning hypothesis for windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31169

Differential Revision: D19036734

Pulled By: mingbowan

fbshipit-source-id: 2205a40720329cb53e741c9827c9049142759588
2019-12-14 20:28:24 -08:00
b64baa963f Robustify rpc_agent handlers with generic Future<T> (#31224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31224

If a future coming back to a rpc_agent server is satisfied with an
exception, ensure this information is propagated back over the wire.
ghstack-source-id: 95522418

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/...

Differential Revision: D18979185

fbshipit-source-id: 99848ae805cc2d48948809a238f61a2e0ef234c9
2019-12-14 20:28:20 -08:00
36d17f4105 abort nccl communicators before throwing operation timed out (#31128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31128

When an operation times out due to errors that are not detected by the NCCL communicators, ncclCommWatchdog cannot catch this timeout error and thus cannot abort the ncclComms accordingly. So we explicitly abort the ncclComms here before throwing the timeout exception to users; after this, ncclCommWatchdog can detect that the NCCL communicators were aborted and clean up devNCCLCommMap_ accordingly. If we threw the timeout exception without aborting the NCCL communicators, it was observed that the CUDA GPU stays at 100% utilization and cannot run new events successfully.
ghstack-source-id: 95528488

Test Plan: the newly revised test _test_nccl_errors_blocking passed with the changes in this diff; the revised test failed without the changes in this diff

Reviewed By: isunjin

Differential Revision: D18928607

fbshipit-source-id: be65a05ce4ff005f0c7fed36ae8e28903e8ffe2b
2019-12-13 00:33:36 -08:00
1ef99cf0ab Intrusive_ptr implementation slower than shared_ptr (#30810)
Summary:
It was a random coding exercise so I wasn't putting much effort into it; but I wondered whether the current intrusive_ptr implementation is optimized enough, so I compared it with shared_ptr (using std::enable_shared_from_this).

My benchmark result shows that intrusive_ptr is actually slower. On my macbook the speed is:

```
---------------------------------------------------------------
Benchmark                        Time           CPU Iterations
---------------------------------------------------------------
BM_IntrusivePtrCtorDtor         14 ns         14 ns   52541902
BM_SharedPtrCtorDtor            10 ns         10 ns   71898849
BM_IntrusivePtrArray         14285 ns      14112 ns      49775
BM_SharedPtrArray            13821 ns      13384 ns      51602
```

Wanted to share the results so someone could probably take a look if interested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30810

Reviewed By: yinghai

Differential Revision: D18828785

Pulled By: bddppq

fbshipit-source-id: 202e9849c9d8a3da17edbe568572a74bb70cb6c5
2019-12-13 00:25:36 -08:00
f7c92f60ba Fix typo in filename to align with classname
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31235

Test Plan: Imported from OSS

Differential Revision: D19001793

Pulled By: IvanKobzarev

fbshipit-source-id: ae7f410be6b3c291f1feb3027b5b4a6b7ce15ab3
2019-12-12 23:16:29 -08:00
db90a5b992 Switch to open sourced fbjni (#30175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30175

fbjni was open-sourced and the Java part is published as 'com.facebook.fbjni:fbjni-java-only:0.0.3';
switching to it.
We still need the fbjni submodule inside the repo (which already points to https://github.com/facebookincubator/fbjni) for .so linking.

**Packaging changes**:
Before this change, `libfbjni.so` came from the pytorch_android_fbjni dependency; we also linked fbjni in `pytorch_android/CMakeLists.txt`, so it was built in pytorch_android but excluded from publishing. As we had two copies of libfbjni.so, there was a hack to exclude it for publishing and resolve the duplication locally.
```
        if (rootProject.isPublishing()) {
            exclude '**/libfbjni.so'
        } else {
            pickFirst '**/libfbjni.so'
        }
```

After this change, fbjni.so will be packaged inside the pytorch_android.aar artifact, and we no longer need this gradle logic.

I will update the README in a separate PR, after landing the previous README PR (https://github.com/pytorch/pytorch/pull/30128), to avoid conflicts

Test Plan: Imported from OSS

Differential Revision: D18982235

Pulled By: IvanKobzarev

fbshipit-source-id: 5097df2557858e623fa480625819a24a7e8ad840
2019-12-12 20:05:22 -08:00
199e1fb348 Use AVX2 to increase frequency for FP16<->FP32 Caffe2 ops (#31203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31203

For multi-instance environment, AVX2 should help increase the clock frequency.
ghstack-source-id: 95502576

Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu -- "Float16"

Reviewed By: jspark1105

Differential Revision: D18962649

fbshipit-source-id: 6532d929a99f41f2f6ad1a1a1962e38ae3ddaecb
2019-12-12 19:42:29 -08:00
ca8cb3241a Expose setNumThreads to android api (#31205)
Summary:
PR https://github.com/pytorch/pytorch/pull/31033 was unlanded due to macos build failure:
https://app.circleci.com/jobs/github/pytorch/pytorch/3916388

This PR changes `setNumThreads` to be Android-only, moving it to the separate class `org.pytorch.PytorchAndroid` as a static function, which is better as it has a global effect
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31205

Reviewed By: dreiss

Differential Revision: D18977250

Pulled By: IvanKobzarev

fbshipit-source-id: 4995859808af498c82933c4db52bd7c7dfae90e5
2019-12-12 18:57:27 -08:00
b7c148013f fix torch square_ benchmark runtime error (#31221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31221

This fixes the runtime error introduced in https://github.com/pytorch/pytorch/pull/30719, which added the torch square_ operator to the benchmark suite.

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: square_
# Mode: Eager
# Name: square__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 66.291
```

Reviewed By: hl475

Differential Revision: D18987889

fbshipit-source-id: 09c56e3a73aab5ab661aac2b06429063b3a82fac
2019-12-12 18:48:02 -08:00
f30b14dead Fix handling of type comments in body (#30590)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30477. Any type comment after `# type: (...) -> ` is ignored.
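A hypothetical example of the pattern: only the first type comment is the signature; the later one annotates a local and previously confused the parser:
```
from typing import List  # noqa: F401 (referenced only in type comments)

def scale(xs, k):
    # type: (List[int], int) -> List[int]
    out = []  # type: List[int]  # body type comment, now ignored by the signature parser
    for x in xs:
        out.append(x * k)
    return out
```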
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30590

Differential Revision: D18887351

Pulled By: driazati

fbshipit-source-id: 162c652f6d7610d14609bbcb25aaa27cdd947a76
2019-12-12 18:19:30 -08:00
20a2e526ef build a generic future<T> (#29579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29579

Per #28923, this diff moves Future<Message> to torch::utils and extends it to be Future<T>; most of the implementation is copied from FutureMessage and ivalue::Future. Merging ivalue::Future with Future<T> will be done separately.

The main difference between Future<T> and FutureMessage is the error handling: instead of checking the message type inside the Future to handle errors, this Future<T> owns has_error_ and error_ states.

This future also passes the value_, has_error_ and error_ states to callbacks so they can easily read the future's state.

In the next diff, a TorchScript RPC async API will be created; before the API returns, it will create an ivalue::Future and pass it to Future<T>'s callback, where the state of the ivalue::Future will be set. In this way, the TorchScript RPC async API can still return an ivalue::Future and call wait() to get its state appropriately afterwards.
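A conceptual Python sketch of the states described; the real type is C++, and the names mirror the description rather than the exact API:
```
class Future:
    def __init__(self):
        self._callbacks = []
        self._value, self._has_error, self._error = None, False, None
        self._done = False

    def _fire(self, cb):
        # Callbacks receive value_, has_error_ and error_ directly.
        cb(self._value, self._has_error, self._error)

    def set_value(self, value):
        self._value, self._done = value, True
        for cb in self._callbacks:
            self._fire(cb)

    def set_error(self, error):
        self._error, self._has_error, self._done = error, True, True
        for cb in self._callbacks:
            self._fire(cb)

    def add_callback(self, cb):
        if self._done:
            self._fire(cb)
        else:
            self._callbacks.append(cb)
```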
ghstack-source-id: 95479525

Test Plan: unit tests

Differential Revision: D18263023

fbshipit-source-id: 48a65712656a72c2feb0bb3ec8b308c0528986a6
2019-12-12 16:57:14 -08:00
c08f2ea254 Updating submodules
Summary:
GitHub commits:

367861fec0
22f5444c09
11c103407d
34507cb383
16d5e3e5ac
c4ce8e637f
0f7ef79620
330fa43933

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 2b6847af7ccba6b53a866e3fded2edf9995b0aaf
2019-12-12 16:53:44 -08:00
5ef0d6f854 Remove subgraphNode kind assert in unmergeSubgraph (#31212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31212

To be able to use this function more broadly.

Test Plan: unit tests

Reviewed By: jackm321

Differential Revision: D18978913

fbshipit-source-id: d998dc7c7f9540f491a8a4bc5d6d25d9c3bf8764
2019-12-12 15:59:55 -08:00
a2463cbc38 Adding quantized clamp kernel (#30541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30541

ghstack-source-id: 95450749

Adding quantized clamp kernel
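A hedged usage sketch, assuming the kernel is reachable through `torch.clamp` on quantized tensors (as the `test_qclamp` name suggests):
```
import torch

x = torch.randn(4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
qy = torch.clamp(qx, min=0.1, max=0.5)  # dispatches to the quantized clamp kernel
```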

Test Plan:
Added test.

buck test mode/dev //caffe2/test:quantized -- 'test_qclamp \(test_quantized\.TestQuantizedOps\)' --print-passing-details

Differential Revision: D18739628

fbshipit-source-id: 38a029ab96c5b0689bb15c67dc4f274883e74975
2019-12-12 15:54:40 -08:00
1d5af9599d Update ONNX Flatten to accept negative indices in opset 11 (#30751)
Summary:
Update ONNX Flatten to accept negative indices in opset 11.
With this change, some cases of flatten do not rely on the input rank being available.
Fixes: https://github.com/pytorch/pytorch/issues/30512
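A minimal export sketch with a negative end index (shapes and file name are illustrative):
```
import torch

class Flatten(torch.nn.Module):
    def forward(self, x):
        return torch.flatten(x, start_dim=1, end_dim=-2)  # negative index

torch.onnx.export(Flatten(), torch.randn(2, 3, 4, 5), "flatten.onnx",
                  opset_version=11)
```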
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30751

Reviewed By: hl475

Differential Revision: D18946904

Pulled By: houseroad

fbshipit-source-id: a6fa30a9182fff92211e505a19325525c6112f19
2019-12-12 15:27:54 -08:00
84d6796658 move AWS ECR gc jobs to circleci (#30996)
Summary:
All jobs are currently running with "--dry-run", so you can verify that the jobs are doing the right thing. I'll remove the flag and make the jobs run every hour, same as on Jenkins, once this PR is approved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30996

Differential Revision: D18971001

Pulled By: mingbowan

fbshipit-source-id: 2384bdb50ebdf47aad265395f26be3843f0ce05e
2019-12-12 14:28:20 -08:00
5c936845cf fix torch_train build (#30497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30497

fix torch_train build

Test Plan: buck build //xplat/caffe2:torch_trainAndroid

Reviewed By: dreiss

Differential Revision: D18719662

fbshipit-source-id: a3d06b4068d502dbe29681d9f26906f2b8c7b622
2019-12-12 14:20:17 -08:00
a38184dbab Only create OwnerRRefs when processing remote calls (#31163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31163

The purpose is to unblock integration with TorchScript. Currently,
an OwnerRRef will be created by either a remote call or a to_here
call, whichever arrives first. However, when making RRef an IValue,
we need to know the type of the value held by the RRef, which is
retrieved by checking the return type of the TorchScript function.
The TorchScript function is only available during the remote call
but not in the to_here() call. Hence, an OwnerRRef can only be
created when processing a remote call. This commit implements this
behavior by introducing a condition variable for every OwnerRRef
in the RRefContext, and lets the to_here() call and PyRRef::unpickle
block on the CV until the value is ready.
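A conceptual Python sketch of the blocking behavior; the real code is C++ in RRefContext:
```
import threading

class OwnerRRef:
    def __init__(self):
        self._cv = threading.Condition()
        self._value, self._ready = None, False

    def set_value(self, v):
        # Called while processing the remote call, once the value exists.
        with self._cv:
            self._value, self._ready = v, True
            self._cv.notify_all()

    def wait_value(self):
        # What to_here() and PyRRef::unpickle conceptually block on.
        with self._cv:
            self._cv.wait_for(lambda: self._ready)
            return self._value
```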

Test Plan: Imported from OSS

Differential Revision: D18949591

Pulled By: mrshenli

fbshipit-source-id: 17513c6f1fd766885ea8e1cd38f672a403fa4222
2019-12-12 14:02:04 -08:00
f6c31f61c5 Enabled roll for bool tensor (#31194)
Summary:
Fixed this [issue](https://github.com/pytorch/pytorch/issues/31079).
Tested via unit test
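A minimal sketch of the newly enabled case:
```
import torch

mask = torch.tensor([True, False, False, True])
torch.roll(mask, shifts=1)  # now supported for bool tensors
```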
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31194

Differential Revision: D18958141

Pulled By: izdeby

fbshipit-source-id: 119bf4d31df10ee02c277f5a4663038470cf7780
2019-12-12 13:48:14 -08:00
bee6344d4e remove / rewrite weak module tests (#31193)
Summary:
Remove most of the testing for `weak_script`, since we removed it. Refactor a few of the existing tests to use recursive scripting api.

Fix for https://github.com/pytorch/pytorch/issues/23965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31193

Differential Revision: D18966291

Pulled By: eellison

fbshipit-source-id: 6b1e18c293f55017868a14610d87b69be42bde12
2019-12-12 13:33:38 -08:00
066e3ed953 Re-apply "[bert/RoBERTa] Optimize LayerNorm with explicit vectorization using Vec256" (#31127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31127

Original commit changeset: d22448b90843

On Skylake T6:

Single Core:
(Note that our benchmark generates batch_size=47 for the first case and batch_size=56 for the second case. In spite of that, the vectorized version is still faster than the original reference C version without vectorization.)
- Before the PR:
```
native_layer_norm        0.81%            5.884ms          0.81%            5.884ms          122.580us        NaN              0.000us          0.000us          48               [[47, 1, 1024], [1024], [1024]]
```

- After the PR:
```
native_layer_norm        0.68%            5.053ms          0.68%            5.053ms          105.272us        NaN              0.000us          0.000us          48               [[56, 1, 1024], [1024], [1024]]
```

20 Cores:
- Before the PR:
```
native_layer_norm        1.65%            41.682ms         1.65%            41.682ms         868.365us        NaN              0.000us          0.000us          48               [[61, 64, 1024], [1024], [1024]]
```

- After the PR:
```
native_layer_norm        1.34%            33.829ms         1.34%            33.829ms         704.771us        NaN              0.000us          0.000us          48               [[61, 64, 1024], [1024], [1024]]
```
ghstack-source-id: 95420889

Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"

buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"

 python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval

Differential Revision: D18936428

fbshipit-source-id: 8cae33d35fb338b5ac49b1597c2709152612d6e5
2019-12-12 13:31:12 -08:00
66f2bba852 Adding function to convert Module to channels last
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28991

Test Plan: Imported from OSS

Differential Revision: D18430810

Pulled By: VitalyFedyunin

fbshipit-source-id: 0693d4e31fc6f9831722c29fc83517f16ddfc028
2019-12-12 11:38:35 -08:00
4ead2e8996 Fix CircleCI behavior for non-leaf stack PRs (#31088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31088

Original issue:
https://github.com/pytorch/pytorch/issues/31027

The problem is that for the stacks of PRs for non-leaf PRs circleCI does not set environment variable `CIRCLE_PULL_REQUEST` which is used to filter out some jobs that should run only on `master`.

(The Android job for master includes all 4 ABIs (x86, x86_64, armeabi-v7a, arm64-v8a) and the gradle build tries to get results from all 4 ABIs; for PRs we run only the x86 build to economize on resources. That's why the unfiltered master Android job fails, as ABIs other than x86 were not scheduled.)

The env variable `CIRCLE_BRANCH` is set correctly and can be used as a workaround to detect that this is a PR (published with ghstack).

Test Plan: Imported from OSS

Differential Revision: D18966385

Pulled By: IvanKobzarev

fbshipit-source-id: 644c5ef07fcf2d718b72695da2cc303da8b94ef4
2019-12-12 11:33:14 -08:00
bcb0bb7e0e Remove unnecessary ATen/core/EnableNamedTensor.h (#31117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31117

After this diff, we will have completely removed the named tensor
feature flagging. This means that named tensors are always on and that
there is no mechanism to turn them off. There should be no more follow-up
diffs.

I performed the deletion of the header with
```
find . -type f -print0 | xargs -0 sed -i '/#include
<ATen\/core\/EnableNamedTensor.h>/d'
```

Test Plan: - wait for CI

Differential Revision: D18934952

Pulled By: zou3519

fbshipit-source-id: 253d059074b910fef15bdf885ebf71e0edf5bea5
2019-12-12 09:53:07 -08:00
9047d4df45 Remove all remaining usages of BUILD_NAMEDTENSOR (#31116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116

Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR

Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.

Test Plan: - run CI

Differential Revision: D18934951

Pulled By: zou3519

fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
2019-12-12 09:53:03 -08:00
c0bcfd0445 Revert D18923167: Expose setNumThreads to android api
Test Plan: revert-hammer

Differential Revision:
D18923167

Original commit changeset: 8d98c2edbff4

fbshipit-source-id: 7db37cff298c511d0dd9eb373811c769e4a73be9
2019-12-12 09:23:58 -08:00
56de8853da Resubmit overload v2 (#31123)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/30356 and https://github.com/pytorch/pytorch/pull/31014 :'(

The last commit contains the fix. There was an internal fbcode error: the previous `impl_default->second.equal(default_val.second))` line would not compile. I tried various fixes in C++ internally but couldn't figure anything out. This is a good example of the programming cost of going from Python to C++ for different types of objects, because the conceptual overhead has expanded in scope from (python) to (python, c++, pybind).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31123

Differential Revision: D18936128

Pulled By: eellison

fbshipit-source-id: 7d8fd66a6dd4a3e9838f3a0b68c219b6565a9462
2019-12-12 07:54:23 -08:00
3a02ed822b Remove insert_prepack_unpack and fold_prepack for now (#30909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30909

`fold_prepack` doesn't work anymore after we changed `scale` and `zero_point`
to be attributes, but since the freeze API is coming up, I don't want to
spend time making this work, since it will be thrown away later.

Test Plan:
.

Imported from OSS

Differential Revision: D18864537

fbshipit-source-id: 649e6b91f2b04b8babacc0afb6bc1530ed7259d3
2019-12-12 07:44:31 -08:00
159835e666 Add types for the remaining optimizers. (#31130)
Summary:
**Patch Description**
Round out the optimizer types in torch.optim by creating stubs for the remaining ones.

**Testing**:
I ran mypy looking for just errors in that optim folder. There's no *new* mypy errors created.
```
$ mypy torch/optim | grep optim
$ git checkout master; mypy torch/optim | wc -l
968
$ git checkout typeoptims; mypy torch/optim | wc -l
968
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31130

Reviewed By: stephenroller

Differential Revision: D18947145

Pulled By: vincentqb

fbshipit-source-id: 5b8582223833b1d9123d829acc1ed8243df87561
2019-12-12 06:36:41 -08:00
2488231fe3 Tweak pollTimedOutRPCs thread synchronization (#30355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30355

- Make processTimedOutFutures hold lock.
- Reduce unnecessary scan on future and future timeout maps.
- Reduce the scope of lock at a spot.
- Avoid repeatedly wake up if user set timeout = 0.

ghstack-source-id: 95409528

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_rpc_timeouts

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rpc_timeouts
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_rpc_timeouts

buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_rpc_timeouts
```

Differential Revision: D5516149

fbshipit-source-id: 4bb0bd59fa31d9bfaef9f07ac0126782da17f762
2019-12-11 22:02:32 -08:00
0db6c01301 Re-enable python 2 builds (#31164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31164

We have a small number of internal projects that still are on Python 2.
Until we can figure out how to get rid of them, we need to continue
supporting Python 2 for PyTorch.

Test Plan: Imported from OSS

Differential Revision: D18949698

Pulled By: suo

fbshipit-source-id: 4a9d7e4306ed81576e05f243de472937a2bb1176
2019-12-11 22:02:28 -08:00
4f5a4be45f Add native/quantized to the list of header rewrites (#31151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31151

Same as the title. I am not sure why this was not added in the first place.

Test Plan: wait for build to succeed.

Reviewed By: bddppq, xw285cornell

Differential Revision: D18880216

fbshipit-source-id: 8b17d4fbd5dd08c28c52df8b1da77b69d56d65dc
2019-12-11 21:59:29 -08:00
6ab2d1b1a4 Partially support tensor lists in loop/concat/stack (#30126)
Summary:
This is a follow-up PR after https://github.com/pytorch/pytorch/pull/29136 ~~and https://github.com/pytorch/pytorch/pull/29171~~

ONNX::Loop does not support Sequence type as loop-carried dependencies. Only tensors are supported.
This PR adds a pass that converts Sequence loop-carried dependencies to scan_outputs.
In opset 11, only the below pattern is supported.
```
PTIR graph:
 ...
 %res.1 : Tensor[] = prim::ListConstruct()
 %res : Tensor[] = prim::Loop(%11, %22, %res.1)
   block0(%i.1 : Tensor, %res.6 : Tensor[]):
     ...
     %res.3 : Tensor[] = aten::append(%res.6, %17)
     -> (%22, %res.3)
 return (%res.3)

ONNX graph:
 ...
 %res : Tensor = onnx::Loop(%11, %22)
   block0(%i.1 : Tensor):
     ...
     -> (%22, %17)
 %res_seq : Tensor[] = onnx::SplitToSequence[keepdims=0](%res)
 return (%res_seq)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30126

Reviewed By: hl475

Differential Revision: D18946880

Pulled By: houseroad

fbshipit-source-id: 67ee65700513e8a942344a3d647e2e73c19ee3d2
2019-12-11 21:24:41 -08:00
a3ed350eb2 Change type of timeoutFutures_ key to time_point instead of duration (#31078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31078

Make `ProcessGroupAgent::pollTimedOutRPCs` code more conventional.

- Use `std::chrono::time_point` to represent `endTime` instead of `std::chrono::duration`.
- Replace `std::condition_variable::wait_for(lock, endTime)` with `std::condition_variable::wait_until(lock, endTime)`.
- Remove the unnecessary `::getRPCRemainingTime()`.
ghstack-source-id: 95408482

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_rpc_timeouts

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rpc_timeouts
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_rpc_timeouts

buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_rpc_timeouts
```

Differential Revision: D5705442

fbshipit-source-id: ba54b7bdb84bc02d05c22360b01290d044bbfcf5
2019-12-11 21:01:31 -08:00
49a5841a9f Make Conv{1,2,3}dOptions and ConvTranspose{1,2,3}dOptions different classes (#31005)
Summary:
Currently, both `Conv{1,2,3}dOptions` and `ConvTranspose{1,2,3}dOptions` are aliases of the `ConvOptions<{1,2,3}>` class, which causes confusion because the `ConvOptions` class has parameters such as `transposed` that shouldn't be exposed to the end user. (This has caused issues such as https://github.com/pytorch/pytorch/issues/30931.) This PR makes the following improvements:
1. Rename the original `torch::nn::ConvOptions<N>` class to `torch::nn::detail::ConvNdOptions<N>` class, to signify that it's an implementation detail and should not be used publicly.
2. Create new classes `torch::nn::ConvOptions<N>` and `torch::nn::ConvTransposeOptions<N>`, which have parameters that exactly match the constructor of `torch.nn.Conv{1,2,3}d` and `torch.nn.ConvTranspose{1,2,3}d` in Python API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31005

Differential Revision: D18898048

Pulled By: yf225

fbshipit-source-id: 7663d646304c8cb004ca7f4aa4e70d3612c7bc75
2019-12-11 20:31:48 -08:00
85107e72b4 Fix type unification With Specialized Tensor Shapes (#31076)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/30015

We had a model that failed in shape propagation because we could not unify `Tensor` and `Optional[BoolTensor]`. `Tensor` not subtyping `Optional[BoolTensor]` was correct, but we should have unified those two types to `Optional[Tensor]`.
The fix here is that for immutable type containers (Optional, Tuple), we should first attempt to unify with complete shape information, and if that fails, then try to unify the types with shape information erased (a conceptual sketch follows).
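A self-contained toy sketch of the two-phase unification, with types modeled as nested tuples (purely illustrative, not the JIT's representation):
```
def erase_shapes(t):
    if t[0] == "Tensor":
        return ("Tensor", None)  # drop the dtype/shape refinement
    if t[0] == "Optional":
        return ("Optional", erase_shapes(t[1]))
    return t

def try_unify(a, b):
    if a == b:
        return a
    if a[0] == "Optional" and try_unify(a[1], b) == a[1]:
        return a
    if b[0] == "Optional" and try_unify(a, b[1]) == b[1]:
        return b
    return None

def unify(a, b):
    # First with complete shape information, then with shapes erased.
    return try_unify(a, b) or try_unify(erase_shapes(a), erase_shapes(b))

tensor = ("Tensor", None)
opt_bool_tensor = ("Optional", ("Tensor", "Bool"))
print(unify(tensor, opt_bool_tensor))  # ('Optional', ('Tensor', None))
```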
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31076

Differential Revision: D18921802

Pulled By: eellison

fbshipit-source-id: aa6890277470c60b349ed1da4d81cc5d71d377f6
2019-12-11 20:11:34 -08:00
97c1e90f46 ONNX Interpolate Add Scales Params (#28324)
Summary:
Fix for: https://github.com/pytorch/pytorch/issues/27176
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28324

Reviewed By: hl475

Differential Revision: D18309133

Pulled By: houseroad

fbshipit-source-id: 348bb41393442c6b107d88fc2cd3224e0afa3ccf
2019-12-11 20:09:15 -08:00
79c27ba4ef Add ONNX Export Support to floor_divide (#31081)
Summary:
Adding support for the new ATen op floor_divide which was introduced in https://github.com/pytorch/pytorch/pull/30493/files.

This operation is used in TorchVision's FasterRCNN/MaskRCNN, which started failing after the new op was introduced.
This PR fixes the failure.

cc: neginraoof
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31081

Reviewed By: houseroad

Differential Revision: D18945316

Pulled By: eellison

fbshipit-source-id: 09919c237d618ce7db293c7770f48f7304949dcf
2019-12-11 19:39:11 -08:00
d81c6bde3b Updating submodules
Summary:
GitHub commits:

36ab9debf5
55e5070f0a
5fed1a6da7
9f0f470fce
e1dfe80fe0
786d2c588c
6c2b9d596d

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 1242688c93ba233f19f3afac174c814ae4c455dc
2019-12-11 18:58:37 -08:00
efe683fb2a dynamicly quantized linear benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30148

Test Plan: Imported from OSS

Differential Revision: D18613006

Pulled By: z-a-f

fbshipit-source-id: 3851189a2822fd09a5dd97c9d54774727822d2bf
2019-12-11 18:39:57 -08:00
73f9e81660 Make rref fetch calls async. (#31086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31086

This change leverages the new future response framework so that server
threads don't block until setValue is called. Particulurly, we add a
getFuture() method to OwnerRRef so that we get a future that is satisfied
once setValue is called.
ghstack-source-id: 95402273

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18925272

fbshipit-source-id: 2caf51019e5b5fd7ec45539544780067deb28610
2019-12-11 18:30:09 -08:00
679b20b1e4 Unify list elements for all list types (#30777)
Summary:
Previously list elements were only unified for tensor lists.
This improves error messages and expands the unification logic
to include all types.
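A hypothetical illustration of the expanded unification, assuming int and None now unify to Optional[int]:
```
import torch

@torch.jit.script
def f():
    xs = [1, None]  # list elements unify to Optional[int] instead of erroring
    return xs
```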
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30777

Pulled By: driazati

Differential Revision: D18837726

fbshipit-source-id: c4d275562a8429700987569426d694faa8f6002e
2019-12-11 17:00:52 -08:00
0414463007 doc fix for max method: a warning about different behaviour on CPU and GPU (#31115)
Summary:
Fixes [30708](https://github.com/pytorch/pytorch/issues/30708),
Adds warning regarding different behaviour of the method depending on device type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31115

Differential Revision: D18937365

Pulled By: zou3519

fbshipit-source-id: 7c731dd80f8b371de08d7fdfcc2196be15a593e1
2019-12-11 16:02:33 -08:00
e5a550cd1d Fix Test CI by pinning hypothesis and correcting the import (#31137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31137

Our Test CI is broken because:
- hypothesis recently did a new release that reorganized their internal
modules
- we were importing something from their internal module structure.

This PR fixes the CI by doing the following:
- import SearchStrategy from the correct (public) location
- Pin the hypothesis version to avoid future surprises.

In the long term, we should stop installing hypothesis every time the CI
runs and instead install it as part of our docker build process. See
https://github.com/pytorch/pytorch/issues/31136 for details.

Test Plan:
- I tested this locally; before this PR test/test_nn.py fails to run but
after it does run.
- Wait for CI

Differential Revision: D18940817

Pulled By: zou3519

fbshipit-source-id: c1ef78faa5a33ddf4d923f947c03cf075a590bb8
2019-12-11 15:42:59 -08:00
945ce71b18 Correctly handle scalar types, fix parse of numpy ints (#30486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30486

Fixes: https://github.com/pytorch/pytorch/issues/29252

There is some incorrect code in the handling of parsing python numbers that led to issue #29252:

When we allow interpretation of a zero-dim numpy integer value
as a scalar in pytorch, we incorrectly parse the int as a float.
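A hypothetical illustration of the class of bug described, at any spot that accepts a Python number:
```
import numpy as np
import torch

i = np.int64(3)                        # zero-dim numpy integer scalar
x = torch.zeros(5, dtype=torch.int64)
x[i] = 1                               # must parse as an int, not a float
```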

This PR also fixes the issue described in the "FIXME" here:
https://github.com/pytorch/pytorch/pull/27628/files#diff-f539198dd366265fb8dc2d661bc5d5bcR1487

Test Plan: Added a unit test based on the example given in the issue.

Differential Revision: D18932520

Pulled By: nairbv

fbshipit-source-id: f6416f28dfd73ac72c1042042851d76beb5fcf65
2019-12-11 15:35:57 -08:00
293a139d79 add a warning for script classes (#31069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31069

Just to clarify that they are still experimental.

Test Plan: Imported from OSS

Differential Revision: D18920496

Pulled By: suo

fbshipit-source-id: d2f3014592a01a21f7fc60a4ce46dd0bfe5e19e9
2019-12-11 14:48:55 -08:00
6225443009 Expose setNumThreads to android api (#31033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31033

Intention:
There are requests from users to control number of threads from android side:
https://discuss.pytorch.org/t/android-pytorch-forward-method-running-in-a-separate-thread-slow-down-ui-thread/63516/2
https://discuss.pytorch.org/t/threading-of-model-pytorch-android/62490/2

At the moment `setNumThreads` is placed in `org.pytorch.Module`, but this method changes the global threadPool size; in the future we will move it to a separate class to mirror the Python binding structure, which has torch.set_num_threads()

Test Plan: Imported from OSS

Differential Revision: D18923167

Pulled By: IvanKobzarev

fbshipit-source-id: 8d98c2edbff42e9b673509672dce3f2dd03a923e
2019-12-11 14:20:14 -08:00
06d874f95b Change startTime_ to endTime_ in FutureInfo (#30342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30342

This can eliminate the unnecessary calls to getRPCEndTime(). Reduce lines of code for simplicity.

ghstack-source-id: 95377162

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_rpc_timeouts

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rpc_timeouts
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_rpc_timeouts

buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_rpc_timeouts
```

Differential Revision: D5705624

fbshipit-source-id: aca4c4917718124022c09ee0d13cf5ca483402af
2019-12-11 14:04:49 -08:00
7a8261e962 Updating submodules
Summary:
GitHub commits:

06033e7eb2
c56d2fa73f
972f299a62
3717a88289
ea64a080c6
b4e0237162

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 73d2d91c851f1905d6d4606a9f8002eb47246852
2019-12-11 12:52:00 -08:00
4b2d356ac1 Re-enable test_rref_context_debug_info after enforcing proper synchronization (#30994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30994

The flakiness we saw was due to missing barriers(), which caused
states to leak into previous or subsequent checks. This commit
attempts to fix the problem by adding barriers before and after each
check.

Test Plan: Imported from OSS

Differential Revision: D18893457

Pulled By: mrshenli

fbshipit-source-id: 42bcc12efa7e6e43e2841ef23e4bc2543b0236c6
2019-12-11 12:38:14 -08:00
5b03ff0a09 Update embedding renorm comment to reference fixed issue (#29140)
Summary:
Address last comment in https://github.com/pytorch/pytorch/issues/28546
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29140

Differential Revision: D18915091

Pulled By: albanD

fbshipit-source-id: 756ff5bb6a92d47c80aa9f96ff6f0edea5fd24de
2019-12-11 11:58:55 -08:00
dbc8b00816 Document WorkerInfo and RpcBackendOptions structures in RPC docs. (#31077)
Summary:
We mention `WorkerInfo` and `RpcBackendOptions` in a couple of different locations in our docs, and these are public classes that the user may use, so we should add these classes to the documentation.
<img width="978" alt="Screen Shot 2019-12-10 at 1 42 22 PM" src="https://user-images.githubusercontent.com/8039770/70571759-47db2080-1b53-11ea-9d61-c83985a29dd9.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31077

Differential Revision: D18928162

Pulled By: rohan-varma

fbshipit-source-id: 67f11eedd87523c469377b791a0ba23704ec3723
2019-12-11 11:39:57 -08:00
4a751dfc20 optimize MulGradient for common shapes (#19705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19705

Optimizing for a case when there's a consecutive dims that are not broadcasted followed by another consecutive dims that are broadcasted.
For example, MulGradient(["dC", "A", "B"], ["dA", "dB"], broadcast=True, axis=0) where A.shape == dC.shape == [9508, 80] and B.shape == [80] .

Test Plan:
In SKL T6,

Running mul_gradient_benchmark without this optimization
Operator #0 (dA, MulGradient) 11.9119 ms/iter

After this optimization,
Operator #0 (dA, MulGradient) 0.672759 ms/iter

Need to land D15291800 before to fix the unit test error

Reviewed By: dmudiger

Differential Revision: D15075415

fbshipit-source-id: 0f97be17cf8f1dacbafa34cd637fb8bc1c5e5387
2019-12-11 11:39:52 -08:00
a53b39f09d Disable flaky test_process_group_debug_info
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31113

Test Plan: Imported from OSS

Differential Revision: D18932365

Pulled By: mrshenli

fbshipit-source-id: a2996b6a8d3881be4ffc174b85509aeee8c51c96
2019-12-11 11:36:58 -08:00
44ecc3a70b Add tracing support for optional Device and Layout (#30979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30979

This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.

Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.

Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.

--------------
In this PR:
Add tracing support for optional Device and Layout types.

--------------

Test Plan: Imported from OSS

Differential Revision: D18912685

Pulled By: izdeby

fbshipit-source-id: 4a9514ce2eee0041f9bc96636d3ddb4f077675e1
2019-12-11 11:32:52 -08:00
672f4cfad9 Added C++ API test (#30980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30980

This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.

Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.

Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.

--------------
In this PR:
Add a test to check that C++ API behavior stays the same after all the changes.
While working on it a bug related to `requires_grad` was found and logged in the master task.

--------------

Test Plan: Imported from OSS

Differential Revision: D18912681

Pulled By: izdeby

fbshipit-source-id: 19772a37c92dde820839b79055f348689b99fa77
2019-12-11 11:21:05 -08:00
1f87e823b8 Make nn.Transformer TorchScript compatible (#28561)
Summary:
This makes `nn.Transformer` usable from TorchScript. It preserves backwards compatibility via `__setstate__` on the encoder/decoder.
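A minimal usage sketch with hypothetical hyperparameters:
```
import torch
import torch.nn as nn

model = nn.Transformer(d_model=32, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)
scripted = torch.jit.script(model)  # compiles after this change
src, tgt = torch.randn(10, 1, 32), torch.randn(8, 1, 32)
out = scripted(src, tgt)
```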

Fixes https://github.com/pytorch/pytorch/issues/24173
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28561

Differential Revision: D18124753

Pulled By: driazati

fbshipit-source-id: 7314843e5aa9c9bf974c4672e4edb24ed8ef4a6f
2019-12-11 10:57:31 -08:00
a929d312ac Add dill>=0.3.1 as testing dependency (#31121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31121

For https://github.com/pytorch/pytorch/pull/30985 .

Test Plan:
- run `pip install "dill>=0.3.1"` locally, check that it actually
installs dill>=0.3.1.

Differential Revision: D18934871

Pulled By: zou3519

fbshipit-source-id: 688a489b9e81134ccb5ab4b099116e3fe6b6b7ae
2019-12-11 10:33:00 -08:00
3593981976 Updating submodules
Summary:
GitHub commits:

9b38c6430e

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 8801c415c9b00bec46efc102c0daceba59397449
2019-12-11 09:50:33 -08:00
717274c001 Add useful warnings for t.grad when it won't be populated for known reasons (#30531)
Summary:
Fix https://github.com/pytorch/pytorch/issues/2362 and https://github.com/pytorch/pytorch/issues/19778

To avoid issues with frozen models, we only consider warning for tensors that require gradients and are neither leaves nor retain gradients (a sketch of the warning case follows).
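A minimal sketch of when the new warning fires:
```
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                 # non-leaf that does not retain grad
y.sum().backward()
print(y.grad)             # None; accessing it now emits a warning
# Calling y.retain_grad() before backward() would populate y.grad instead.
```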
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30531

Differential Revision: D18832767

Pulled By: albanD

fbshipit-source-id: 743e863dc14ab57713e66da78b2e4d759dfba0ff
2019-12-11 09:47:18 -08:00
3301794855 Port ELU activation to Aten (#29275)
Summary:
VitalyFedyunin, this PR ports the ELU activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.ELU()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.28 (ms); backwad avg time is 0.18 (ms).
input size(128, 10000) forward time is 23.53 (ms); backwad avg time is 14.46 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.16 (ms); backwad avg time is 0.08 (ms).
input size(128, 10000) forward time is 15.53 (ms); backwad avg time is 6.60 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.24 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.73 (ms); backwad avg time is 1.11 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.15 (ms); backwad avg time is 0.07 (ms).
input size(128, 10000) forward time is 14.40 (ms); backwad avg time is 6.00 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29275

Differential Revision: D18587389

Pulled By: VitalyFedyunin

fbshipit-source-id: bea8f3f006c6893090f863d047c01886d195437a
2019-12-11 09:44:34 -08:00
4aa30d3c0c Revert D18293522: Optimize LayerNorm with explicit vectorization using Vec256
Test Plan: revert-hammer

Differential Revision:
D18293522

Original commit changeset: f4cfed6e62ba

fbshipit-source-id: cdd6d9d36c00b516aecdab549abeeffc4a473829
2019-12-11 08:55:28 -08:00
9305f44854 Remove BUILD_NAMEDTENSOR from codegen and .cu files (#31047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31047

Changelist:
- remove BUILD_NAMEDTENSOR from .cu files
- remove BUILD_NAMEDTENSOR special handling in function_wrapper.py
- remove BUILD_NAMEDTENSOR from cpp_extension.py. This code actually
did nothing because we always compile with BUILD_NAMEDTENSOR.

Test Plan: - run tests

Differential Revision: D18908442

Pulled By: zou3519

fbshipit-source-id: b239e24de58580adaf3cef573350773a38b1e4f0
2019-12-11 08:49:56 -08:00
65f6e449c7 Updating submodules
Summary:
GitHub commits:

0f94976f31
be15abd839
034086d70f
aa131abdf5
a3f268f1b5
6394aabc99

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: fa99a0a096de1f088e5fa8cd92fdf5fd6c330740
2019-12-11 07:25:34 -08:00
d6d6075573 Optimize LayerNorm with explicit vectorization using Vec256 (#29104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29104

We would like to provide the vectorized implementation for layer norm. This PR reuses https://github.com/pytorch/pytorch/pull/23349.

Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"

buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"

 python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval

Differential Revision: D18293522

fbshipit-source-id: f4cfed6e62bac1b43ee00c32b495ecc836bd9ec5
2019-12-11 06:01:45 -08:00
28ee309c9a disable onnx py3 gcc5 build (#31100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31100

This appears to not work right now. Disabling pending an investigation.

Test Plan: Imported from OSS

Differential Revision: D18928777

Pulled By: suo

fbshipit-source-id: 63089131bad98902979e5cf4373732c85badef9d
2019-12-11 00:26:15 -08:00
8013ffd400 Fix weight_norm export for dim=0 (#31015)
Summary:
Exported weight_norm incorrectly reduces over axis 0 as well when dim is set to 0.
The previous test case only covers a weight with size(0) == 1, which yields the same result whether reduced over or not.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31015

Reviewed By: hl475

Differential Revision: D18900894

Pulled By: houseroad

fbshipit-source-id: 19004f51933b37f848dbe4138e617a7a8e35a9ec
2019-12-10 23:43:56 -08:00
9a5fd2eb07 Fix conflicts in CMAKE_GENERATOR and generator (#30971)
Summary:
...specified in -G

https://cmake.org/cmake/help/latest/variable/CMAKE_GENERATOR.html
According to the document, the generator could be determined through two methods:
1. Specify in `-G`
2. Read from `CMAKE_GENERATOR`

We should avoid conflicts in these two methods. This fixes https://github.com/pytorch/pytorch/issues/30910.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30971

Differential Revision: D18927529

Pulled By: mingbowan

fbshipit-source-id: e9a179ceb32d6fbabfaeac6cfe9e6170ca170b20
2019-12-10 22:22:26 -08:00
7f5f2e8871 add ZERO_COLLISION_HASH to caffe2 data type (#30912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30912

Add a new data type ZERO_COLLISION_HASH .

Test Plan: ci

Reviewed By: boryiingsu

Differential Revision: D18843626

fbshipit-source-id: b2d8280f13c78b4a656cf95822198df59de7b64c
2019-12-10 21:36:24 -08:00
c72dd526a7 kill py2 onnx builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31082

Differential Revision: D18922689

Pulled By: suo

fbshipit-source-id: 98c91b90ee3b1dd13c6020597a0ace741a1597da
2019-12-10 20:25:42 -08:00
9f3fe78239 peephole optimize type refinements (#31024)
Summary:
Peephole optimize out type refinements when they are no longer refining the type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31024

Differential Revision: D18920958

Pulled By: eellison

fbshipit-source-id: 6d05d9812b9f9dcf001de760a78a2042fb832773
2019-12-10 18:32:28 -08:00
d02280b432 move migration guide to appendix (#31068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31068

Let's get it out of the early parts now that the recursive API has been
around for a while

Test Plan: Imported from OSS

Differential Revision: D18920498

Pulled By: suo

fbshipit-source-id: 6f4389739dd9e7e5f3014811b452249cc21d88e7
2019-12-10 18:04:02 -08:00
d088bd0bad Updating submodules
Summary:
GitHub commits:

c6506e2698
4427c1a832
a653857178
558f42bd6c
3839cbaf52

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 4a253bba6de9a2c2a11a82e33809a370e1b4fd04
2019-12-10 16:58:08 -08:00
e7e6d56b77 Allow async work in rpc RequestCallback processing. (#30637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30637

The RequestCallback api currently forces work to always be synchronous, which,
as we scale, means we're going to need to throw a large number of (mostly
blocked) threads at the rpc problem. For some activities, like dependent
autograd rpcs, there's no real need to block in these threads.

In this change, the RequestCallback api is updated to return a
shared_ptr<FutureMessage> rather than a Message:

   std::shared_ptr<FutureMessage> operator()(Message& request) const;

With a futures-style api, RPC ops that wish to be async can then be async,
while short-lived blocking functions (or Python UDFs) can just block.

In this change, we keep all of the current ops as synchronous (i.e. we block
and then return a completed FutureMessage). We also update the rpc_agents in
a manner compatible with this sort of parallelism.

Here, we only want to incur overhead when we use the async behavior.
Some modest extra cost seems unavoidable here (e.g. the allocation for the
std::make_shared<>), but we can trivially detect the synchronous/completed
case in the rpc_agent and avoid the extra thread-switches/etc. in that case.
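A rough Python sketch of the futures-style pattern described above; the real API is C++, and `process` plus the timer thread are illustrative stand-ins:
```python
import threading
from concurrent.futures import Future

def process(request):
    return request.upper()  # stand-in for the real message handling

def handle_sync(request):
    # Short-lived blocking handler: do the work inline and return an
    # already-completed future, which the caller can detect cheaply.
    fut = Future()
    fut.set_result(process(request))
    return fut

def handle_async(request):
    # Async handler: return a pending future and complete it later.
    fut = Future()
    threading.Timer(0.01, lambda: fut.set_result(process(request))).start()
    return fut

print(handle_sync("ping").result())   # PING
print(handle_async("ping").result())  # PING (waits briefly)
```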
ghstack-source-id: 95287026

Test Plan:
- Basic: buck test mode/dev-nosan caffe2/test/...
  - Additional testcase in ThriftRpcAgentTest for deferred work.

Differential Revision: D18774322

fbshipit-source-id: cf49922a71707cfb1726de16f93af23b160385d8
2019-12-10 16:11:05 -08:00
e42af97349 Add quantized concat conversion (#30887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30887

Support to convert quantized concat from pytorch to caffe2

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_cat

Imported from OSS

Differential Revision: D18855676

fbshipit-source-id: 5d0cf3f03c61819e168b080afa368b1255d0419c
2019-12-10 15:46:16 -08:00
3de8584de8 Correct definition of nodes that work with Autograd (#30683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30683

Assume that a node can work with autograd only if it is not a fusion
group and in prim or aten namespaces.

Test Plan: CI

Reviewed By: lly-zero-one

Differential Revision: D18795171

Pulled By: ilia-cher

fbshipit-source-id: 301090557e330b58be70e956784f7f0dc343c684
2019-12-10 15:39:38 -08:00
b7652a2f81 remove py2 flake8 lint (#29357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29357

As title

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D18920562

Pulled By: suo

fbshipit-source-id: b5dd559cfb0ba6c64b9ccf3655417afb56a7b472
2019-12-10 15:31:10 -08:00
d113b22571 kill PyTorch py2 circle jobs (#29353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29353

First step to killing Python 2 everywhere. I don't really know that much
about the caffe2 circle jobs so I left them alone for now.

Test Plan: Imported from OSS

Differential Revision: D18920563

Pulled By: suo

fbshipit-source-id: b37d8427a6ecd4b8a7e16c1ff948e0ce13b5798f
2019-12-10 15:31:06 -08:00
5edfe9cb80 add torch.square (#30719)
Summary:
fixes https://github.com/pytorch/pytorch/issues/30524
This adds a new operator `torch.square` to PyTorch.

I think it is ready for a first review now, albanD.
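For reference, the new op squares elementwise and matches `x * x`:
```python
import torch

x = torch.tensor([-2.0, 3.0])
print(torch.square(x))  # tensor([4., 9.])
assert torch.equal(torch.square(x), x * x)
```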
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30719

Differential Revision: D18909268

Pulled By: albanD

fbshipit-source-id: 5626c445d8db20471a56fc1d7a3490e77812662b
2019-12-10 15:22:46 -08:00
e3d40f857b Make nn.Module forward() type annotation more permissive (#31057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31057

The current signature basically will always fail to type check, because
mypy enforces that the subclass method's input types must be "wider"
than their superclass method's input types (i.e. they can vary
contravariantly). And nothing is wider than `Any`.

This change makes it so that any input params are allowed in
`forward()`. Fixes #29099
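A minimal sketch of the variance rule at play; the class names are illustrative:
```python
from typing import Any

class Base:
    # The old-style annotation: overrides must accept at least as much
    # as this (inputs vary contravariantly), so a concrete forward()
    # taking one required argument is rejected by mypy.
    def forward(self, *input: Any) -> Any: ...

class Child(Base):
    def forward(self, x: int) -> int:  # mypy: incompatible with supertype
        return x + 1
```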

Test Plan: Imported from OSS

Differential Revision: D18918034

Pulled By: suo

fbshipit-source-id: 9940e9f769b55d580d9d7f23abf6f88edb92627f
2019-12-10 14:36:13 -08:00
8fd85d70be Updating submodules
Summary:
GitHub commits:

163b6e2428
1d7a0e1a4b
b8031f09d7
7fd86a8f64

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 98b2487b39fb56641641c0947ed09f883755126a
2019-12-10 14:19:31 -08:00
ed20937231 Remove TensorImpl::maybe_zero_dim.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30878

Test Plan: Imported from OSS

Differential Revision: D18855989

Pulled By: gchanan

fbshipit-source-id: 44087b6136ec40d0a3de5b5a9f03c60d002a1107
2019-12-10 13:21:47 -08:00
0cbbe050bb Updating submodules
Summary:
GitHub commits:

b459fcc89f
2b060c1498
13a2c072c4

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 59fb11a977dcb7b2c09acb7fe997b0d5e52f27c4
2019-12-10 12:48:07 -08:00
cc319659e3 qnnpack TanH
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31013

Test Plan: Imported from OSS

Differential Revision: D18898903

Pulled By: z-a-f

fbshipit-source-id: aa126a98627b808678f629f39853c3b9c70eb2bf
2019-12-10 12:23:37 -08:00
b01b05790e Fix memory leak due to circular dependency. (#31030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31030

DistAutogradContext held a shared_ptr reference to RecvRpcBackward and
RecvRpcBackward held a shared_ptr reference to the context. This circular
dependency caused significant memory leaks. As a result, I'm changing the
reference in RecvRpcBackward to be a weak_ptr.
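A rough Python analogue of the fix, assuming the same ownership shape (the real change is C++ `std::weak_ptr`):
```python
import weakref

class Context:
    def __init__(self):
        self.recv_backward = None  # strong reference forward

class RecvBackward:
    def __init__(self, ctx):
        self.ctx = weakref.ref(ctx)  # weak back-reference breaks the cycle

ctx = Context()
ctx.recv_backward = RecvBackward(ctx)
assert ctx.recv_backward.ctx() is ctx  # dereference while ctx is alive
```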

Test Plan: waitforbuildbot

Differential Revision: D18896389

fbshipit-source-id: e5bc588b6f998885854e3a67de1e82452e8475ce
2019-12-10 12:20:43 -08:00
57f29a44c7 Bug fix of the histogram observers (#30970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30970

Check null tensors in the histogram observers

Test Plan: f154576636 vs f154820243

Reviewed By: hx89

Differential Revision: D18865771

fbshipit-source-id: 669c014d914525deee36142e12f013afaf3caf1d
2019-12-10 11:45:20 -08:00
27d7dba9ab Remove scalar_check specification and codegen. (#30874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30874

These have all been disabled at this point, so there is no difference in the generated code.

Test Plan: Imported from OSS

Differential Revision: D18855990

Pulled By: gchanan

fbshipit-source-id: 03796b2978e23ef9060063f33241a1cbb39f1cf3
2019-12-10 11:41:20 -08:00
47033b49f3 Suppress XCode build warnings (#31000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31000

## Summary

Add Fastlane configurations to suppress the build warnings from XCode.

Test Plan: Imported from OSS

Differential Revision: D18912489

Pulled By: xta0

fbshipit-source-id: f2c54d54a12ad2415695d1fcb1800684c7a9e560
2019-12-10 11:37:52 -08:00
2da3b9a0f6 Updating submodules
Summary:
GitHub commits:

fd8771904e
6bf51e234f
6380df5e10
696c2a2359

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 188670fcdc50ccf060eea137698ecfb45484e059
2019-12-10 11:23:13 -08:00
78a00d72b4 Revert D18899127: resubmit polish up overloads on free functions
Test Plan: revert-hammer

Differential Revision:
D18899127

Original commit changeset: 9049b8718926

fbshipit-source-id: c70a8aa4120aa757dce0926a8ab3cc5c92cd6041
2019-12-10 10:51:07 -08:00
394d2f7037 Fix the rendering of the doc of max. (#30779)
Summary:
Close https://github.com/pytorch/pytorch/issues/30731
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30779

Differential Revision: D18837317

Pulled By: zou3519

fbshipit-source-id: b9b5ba414756a68d4b39a7a7c2d89fee1e3c040f
2019-12-10 10:48:16 -08:00
313c211f3f Calling JITed 8 Bit Fused SLS in FBGEMM from C2 (#30926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30926

Calling the JITed FBGEMM kernel for Fused 8 Bit Sparse Length Sum (Fused8BitRowwiseEmbeddingLookup)

Test Plan:
buck test  mode/dbg //caffe2/caffe2/python:lengths_reducer_fused_8bit_rowwise_ops_test

All tests pass.

Reviewed By: jspark1105

Differential Revision: D18058128

fbshipit-source-id: 0dfa936eb503712c39e53748e015fc156afde86f
2019-12-10 10:44:05 -08:00
bb7befb12c Support loading by blob in predictor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30805

Reviewed By: ipiszy

Differential Revision: D18827383

fbshipit-source-id: b97f958768618ca29a02b057667a9b4ee313ad3c
2019-12-10 10:34:14 -08:00
a42d093db2 FCTransposed to FbFCPacked (#29766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29766

Add FbgemmPackTranspose op to support the packing on FCTransposed weights

Add FCTransposed to FbFCPacked transformation to Dper fp16 exporter

Test Plan:
```
buck test mode/opt caffe2/caffe2/fb/fbgemm:fb_fc_packed_op_test
```

```
buck test mode/opt caffe2/caffe2/python:layers_test
```

Differential Revision: D18482306

fbshipit-source-id: e8f1947b3d0d04892293509ebf88742f5f0f5997
2019-12-10 10:18:21 -08:00
c34ef1aa2e Automatic update of fbcode/onnx to c08a7b76cf7c1555ae37186f12be4d62b2c39b3b (#30619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30619

Previous import was fea8568cac61a482ed208748fdc0e1a8e47f62f5

Included changes:
- **[c08a7b76](https://github.com/onnx/onnx/commit/c08a7b76)**: doc: fix some typos at ONNXIFI (#2473) <Yorkie Liu>
- **[4be12d46](https://github.com/onnx/onnx/commit/4be12d46)**: remove workshop update since it is done (#2460) <Prasanth Pulavarthi>
- **[86107d1b](https://github.com/onnx/onnx/commit/86107d1b)**: Updated with correct URL to LICENSE (#2468) <Ryan Loney>
- **[9bf6fbb6](https://github.com/onnx/onnx/commit/9bf6fbb6)**: Update Argmin/Argmax (#2461) <Lara Haidar>
- **[748d81b8](https://github.com/onnx/onnx/commit/748d81b8)**: Fix windows conda build (#2452) <Ashwini Khade>
- **[a32db1c5](https://github.com/onnx/onnx/commit/a32db1c5)**: Delete duplicate word in comment (#2439) <Haibo Hao>
- **[e108da9a](https://github.com/onnx/onnx/commit/e108da9a)**: Fix bug in function body verifier (#2390) <G. Ramalingam>
- **[c3d3ef82](https://github.com/onnx/onnx/commit/c3d3ef82)**: docs: fix typo in IR.md (#2441) <Elliot Waite>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D18766132

fbshipit-source-id: 13c04f21399579acb87a8f9fac2e4c329b0720b8
2019-12-10 10:15:08 -08:00
06c7420fa2 Raise error if a block can not be found from a CUDA tensor (#30870)
Summary:
After several discussions, we agreed not to add any extra safety check for recordStream, since such a check would either cause failures in certain scenarios or throw unnecessarily for user errors.

In summary, it simply does what is described in https://github.com/pytorch/pytorch/issues/27405: check whether a tensor was indeed allocated by a CUDACachingAllocator instance, and if it was, throw an internal error when a block cannot be retrieved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30870

Differential Revision: D18851669

Pulled By: yxia11

fbshipit-source-id: c2f01798cd24f1fd0f35db8764057d5d333dab95
2019-12-10 08:04:00 -08:00
af4040d808 resubmit polish up overloads on free functions (#31014)
Summary:
Resubmitting https://github.com/pytorch/pytorch/pull/30356

Second commit has reintroduces deleted function which caused revert previously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31014

Differential Revision: D18899127

Pulled By: eellison

fbshipit-source-id: 9049b8718926c329d9cb46bb96eac6c278e9b866
2019-12-10 07:57:47 -08:00
e05ee4c421 Remove BUILD_NAMEDTENSOR macros (#30894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30894

This PR begins the process of removing BUILD_NAMEDTENSOR macros. There
will be followups.

Reasons for removing the macros:
- BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0.
- Since we don't test building without it, it is useless to keep around.
- Code becomes nicer to read without the macros

Reasons for not removing the macros:
- potential for feature flagging

Now, I argue against needing to feature flag. The main reason why we
might want to feature flag is if we need to disable the feature.
We'd need a fast switch to disable the feature if someone discovers
in the future that named tensors caused some regression in some existing workflows.

In https://github.com/pytorch/pytorch/pull/25798, I did a variety of
macro- and micro- benchmarks to determine the performance impact of named
tensors on regular tensors.

[The
microbenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-529014810)
were not very stable, and running the
microbenchmarks for more iterations doesn't actually help because the
noise is not distributed in a nice way. Instead of microbenchmarks I ran
a [profiler
(perf)](https://github.com/pytorch/pytorch/pull/25798#issuecomment-555707645)
to estimate how much overhead named tensors add to unnamed code. I
estimated the overhead to be less than 100ns for `add` and even smaller
for `mm`; there are ways to optimize even further if we find this to be a
problem.

[Initial
macrobenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-530539104)
were also not very stable. I ran imagenet for some number of epochs. To
make them more stable, I got rid of the data loading (which seemed to
vary between runs). [In some benchmarks without data
loading](https://github.com/pytorch/pytorch/pull/25798#issuecomment-562214053),
we can see that the results are less noisy now. These results support
no noticeable regressions in speed.

Test Plan: - wait for CI

Differential Revision: D18858543

Pulled By: zou3519

fbshipit-source-id: 08bf3853a9f506c6b084808dc9ddd1e835f48c13
2019-12-10 07:54:05 -08:00
f48a8901c5 Add floor_divide function (#30493)
Summary:
Adds `torch.floor_divide`, following numpy's `floor_divide` api. I only implemented the out-of-place version; I can add the inplace version if requested.

Also fixes  https://github.com/pytorch/pytorch/issues/27512
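A quick usage sketch of the new out-of-place op (nonnegative values shown):
```python
import torch

a = torch.tensor([7, 8, 9])
print(torch.floor_divide(a, 2))  # tensor([3, 4, 4])
```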
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30493

Differential Revision: D18896211

Pulled By: eellison

fbshipit-source-id: ee401c96ab23a62fc114ed3bb9791b8ec150ecbd
2019-12-10 07:51:39 -08:00
44428d0ee2 Updating submodules
Summary:
GitHub commits:

6c87dc4d3c
5ec43afc1d
1e3cb8283f
3af1c72471
dc8e6e6e68
405e596d50
f40ae54a52
479a143912
e63b40cb4b
cb5f0670a6
470a664def
6e8f70b2d9
0fb026ca58
3595e0cf38
79b171ffa3
fb5322d98d
cd48fc606b

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 99bee659ea0fca0247d67d2dac12a821e1bd402d
2019-12-10 07:45:23 -08:00
42324cb6e8 Change interface from map of TensorShape to shapeInfoMap (#30802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30802

Change shape_hints from map<string, TensorShape> to ShapeInfoMap to catch dimType info from model file.

Reviewed By: ipiszy

Differential Revision: D18821486

fbshipit-source-id: c5d9ed72e158d3698aba38900aeda00f776745b4
2019-12-10 00:35:11 -08:00
5205556782 Export custom ops (#29752)
Summary:
Updates to the export API:
When calling this API, a dict containing the custom opsets (domain and version) used to export the model can be provided.
We allow registering one custom opset (domain, version) per ONNX opset, so when exporting an operator from a custom domain, users need to pass this pair. The default custom opset version is 1.
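A hedged usage sketch; the module and domain name below are illustrative, and a real use would invoke an operator from that custom domain:
```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1  # stand-in body

# Map each custom domain to the opset version to record in the model.
torch.onnx.export(M(), torch.randn(2), "model.onnx",
                  custom_opsets={"my_domain": 2})
```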
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29752

Reviewed By: hl475

Differential Revision: D18703662

Pulled By: houseroad

fbshipit-source-id: 84d22557d132b526169051193d730761798fce60
2019-12-09 18:48:50 -08:00
04b9324476 Factor out getInvokedMethod in InsertQuantDeQuantHelper (#30860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30860

att

Test Plan:
.

Imported from OSS

Differential Revision: D18849021

fbshipit-source-id: e5ff260f2f4e88075b0c6b32ccfd8272053ccc41
2019-12-09 16:10:58 -08:00
fa6661422f Disable flaky test_rref_context_debug_info
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30990

Test Plan: Imported from OSS

Differential Revision: D18893023

Pulled By: mrshenli

fbshipit-source-id: 80b36927f243fa53c4d64f7e7c51097290ffdeee
2019-12-09 15:55:51 -08:00
73dd8c005a Revert D18864774: polish up overloads on free functions
Test Plan: revert-hammer

Differential Revision:
D18864774

Original commit changeset: 6c566738bd6f

fbshipit-source-id: 669192605a3bc1a6ba06bbb5cae54f61637a45ae
2019-12-09 15:41:45 -08:00
446488960a polish up overloads on free functions (#30356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30356

This finishes up the `torch.jit.overload` api for free-functions.
- defaults now required on the implementation function itself
- fully follows [overload spec](https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading) such that the following is supported

```
@overload
def mouse_event(x1: int, y1: int) -> ClickEvent: ...
def mouse_event(x1: int,
                y1: int,
                x2: Optional[int] = None,
                y2: Optional[int] = None): ...
```

Note: `jit.overload` isn't supported yet for UDTs, but is supported for modules. This PR doesn't make the same changes for modules; if reviewers think I should include them, I could do so in a follow-up PR or wait to land this. Since that's still an internal api I think it's fine, and the changes here would allow us to expose `torch.jit.overload` on free functions.

Test Plan: Imported from OSS

Differential Revision: D18864774

Pulled By: eellison

fbshipit-source-id: 6c566738bd6f0551a000a9ea8d56e403636b7856
2019-12-09 15:12:18 -08:00
a03581b927 add tests that schemas are valid (#30749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30749

Add check to schemas that the schema is sane.

I removed the defaults from symbolic_script because they were in some cases wrong and don't actually do anything. At the point they're invoked the forward should already have matched all arguments.

Test Plan: Imported from OSS

Differential Revision: D18864775

Pulled By: eellison

fbshipit-source-id: 273d7e96d65b8a3d3de72e2d7bfcdf2417046c6b
2019-12-09 15:12:13 -08:00
e9ca13d7f5 Add glue code to collect debug info from all components
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30888

Test Plan: Imported from OSS

Differential Revision: D18857139

Pulled By: mrshenli

fbshipit-source-id: 5c1bfb83a21a4a57c4297bb94f14baa09520b791
2019-12-09 14:39:11 -08:00
8a57362000 Fix index out of bound error in Engine::ready_queue_size when called before start_threads
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30967

Test Plan: Imported from OSS

Differential Revision: D18887178

Pulled By: mrshenli

fbshipit-source-id: 67baeac9214a4749ce7e9b4d89862c93620b2d5e
2019-12-09 14:39:07 -08:00
a38c9b1ade Adding debugging metrics to process group agent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30884

Test Plan: Imported from OSS

Differential Revision: D18857140

Pulled By: mrshenli

fbshipit-source-id: 4ec61d13778dd49467159d0db4b6dd51feaf282b
2019-12-09 14:39:03 -08:00
82268bf300 handle reassignment to inf and nan (#30877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30877

Previously, when the environment tried to reassign variables which had been assigned to "inf" or "nan", it would fail because they are not simple values. Constant prop exposed this; a test was failing internally because of it.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D18861016

Pulled By: eellison

fbshipit-source-id: b9b72978a26a0b00b13bf8ea7685825551f5a541
2019-12-09 14:20:17 -08:00
3eefc06feb add constant prop for immutable types (#30544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30544

Run Constant Propagation upon compilation only on ops with non-aliasing inputs and outputs. This speeds up the first run of `torchvision.models.resnet18` by over 50% and speeds up compilation by about 25% (although the effects didn't seem additive with https://github.com/pytorch/pytorch/pull/30503, so I'm going to land this PR first and then see if caching still has a sizable impact).

Running constant prop only with non-aliasing types does a lot of graph cleanup by removing constant ifs and a bunch of other smaller ops. It also avoids all the jitter problems we had when we tried running full constant prop previously. Because it is idempotent it doesn't jitter, and it doesn't jitter graphs constructed from tracing because tracing doesn't emit any ops that only involve non-aliasing inputs.

Full constant prop isn't idempotent because which ops are run depends on the state of mutation in the alias db, which will often change upon successive iterations of constant propagation, and because it affects graphs constructed from tracing.

Edit: if we were okay with running constant propagation on graphs constructed from tracing (potentially making them hard to debug), an alternative would be to run constant propagation until the graph reaches a fixed point.
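A small sketch of the kind of fold this enables; the exact printed graph depends on the TorchScript version:
```python
import torch

@torch.jit.script
def f() -> int:
    return 2 + 3  # ints are immutable/non-aliasing, so this can fold

# The compiled graph is expected to carry a prim::Constant for 5
# rather than an aten::add node.
print(f.graph)
```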

Test Plan: Imported from OSS

Differential Revision: D18833607

Pulled By: eellison

fbshipit-source-id: 92a0adb4882d67ed5a0db5c279f5e122aeeba54a
2019-12-09 14:20:12 -08:00
648bb501a1 rename shouldAnnotate api (#30543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30543

`shouldAnnotate` doesn't make a ton of sense as a public api

Test Plan: Imported from OSS

Differential Revision: D18833608

Pulled By: eellison

fbshipit-source-id: 460ee05d0fa91b1edc640c037be2a6ee8eaf50a6
2019-12-09 14:20:07 -08:00
45f0556ba0 Proper print for one element tuple (#30853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30853

Right now we print a one-element tuple as `(val)`, which is
interpreted as `val` when parsed. This PR changes it
to `(val,)` so we can recognize the one-element tuple in parsing.
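Plain Python shows why the trailing comma matters for round-tripping:
```python
print(repr((5,)))  # prints (5,): a one-element tuple
print(repr((5)))   # prints 5: parentheses alone are just grouping
```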

Test Plan:
.

Imported from OSS

Differential Revision: D18846849

fbshipit-source-id: 42959b9190c2567ef021a861497077c550324b7c
2019-12-09 14:15:40 -08:00
5bf58274cc getQParams return a dictionary of qparams (#30859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30859

We can use a dictionary of quantization parameters to simplify the code
handling these things a bit.

Test Plan:
.

Imported from OSS

Differential Revision: D18849023

fbshipit-source-id: 09e9860b2656a1affa8776016e16794529bcee3b
2019-12-09 13:42:21 -08:00
fb36f1c334 Updating submodules
Summary:
GitHub commits:

0f96b98cec
8090b337a4
e43d2c4424
70d1c268bf
fc6140865b
4caba2ed65

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 5b4edf4267942ab0cbd2980dc500227e3ce353e3
2019-12-09 13:02:10 -08:00
536481d9de Fix missing virtual destructor (#30927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30927

Classes that are used virtually (e.g. have virtual methods) must have a virtual destructor, or bad things happen.
ghstack-source-id: 95144736

Test Plan: waitforsandcastle

Differential Revision: D18870351

fbshipit-source-id: 333af4e95469fdd9103aa9ef17b40cbc4a343f82
2019-12-09 12:25:26 -08:00
528fa737ba Custom op autograd tests (#30519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30519

Re-enable them and write a few additional ones
ghstack-source-id: 95143051

Test Plan: unit tests

Differential Revision: D18729561

fbshipit-source-id: 8cefd8320913d72a450a3324bfd7c88faed072d7
2019-12-09 12:25:22 -08:00
daef363b15 Move Softshrink activation to Aten(CPU+CUDA) (#30229)
Summary:
VitalyFedyunin, this PR ports the Softshrink activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Softshrink()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.12 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.18 (ms).
CPU:
input size(128, 100) forward time is 0.19 (ms); backward avg time is 0.23 (ms).
input size(128, 10000) forward time is 17.23 (ms); backward avg time is 16.83 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.17 (ms).
CPU:
input size(128, 100) forward time is 0.08 (ms); backward avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.32 (ms); backward avg time is 0.08 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.08 (ms); backward avg time is 0.10 (ms).
input size(128, 10000) forward time is 7.58 (ms); backward avg time is 7.91 (ms).
After:
input size(128, 100) forward time is 0.08 (ms); backward avg time is 0.02 (ms).
input size(128, 10000) forward time is 7.30 (ms); backward avg time is 1.02 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30229

Differential Revision: D18810054

Pulled By: VitalyFedyunin

fbshipit-source-id: e19074824396570db45ba488ae4f9fe1b07a5839
2019-12-09 12:19:46 -08:00
4f342a61c1 add the worker IDs outside of addSendRpcBackward to ensure they are (#30914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30914

When tensors don't require grad, we don't call `addSendRpcBackward`, where we record known workerIDs to clean up the dist autograd context later. But since https://github.com/pytorch/pytorch/pull/29781, we always include the autograd context ID in RPCs, even if tensors do not require grad. So, it is possible that we don't release the contexts on some nodes.

This can contribute to OOMs since the contexts will not be cleaned up in this case, which can be checked by running the unit test without this patch. We can fix this issue by moving the `addKnownWorkerIds` call to the `getMessageWithAutograd` function.
ghstack-source-id: 95178561

Test Plan: Added a unit test: `test_context_cleanup_tensor_no_grad`

Differential Revision: D18869191

fbshipit-source-id: b80f66bfd0dd7d01960abe1691d3f44095bb1b2b
2019-12-09 11:38:34 -08:00
c75bc9067c MultiMarginCriterion: move scalar_check from codegen to code.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30827

Test Plan: Imported from OSS

Differential Revision: D18833658

Pulled By: gchanan

fbshipit-source-id: decd42789d92d4fbfeea9b470b3d7333e3862263
2019-12-09 07:48:58 -08:00
190dac13e3 Use universal references and perfect forwarding in Loops.h. (#30466)
Summary:
This simplifies the generated code a bit, saving about 40K off of libtorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30466

Differential Revision: D18836215

Pulled By: resistor

fbshipit-source-id: ad75c9e04783bb29cc06afd2022f73f9625dd52b
2019-12-08 23:31:10 -08:00
6848f9abb8 call fp16<->fp32 routines in fbgemm from Half2Float and Float2Half operators (#30715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30715

Changed caffe2/caffe2/TARGETS file to define USE_FBGEMM for x86 and USE_SSE_ONLY is not defined.

Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- Float16

Reviewed By: jianyuh

Differential Revision: D18806067

fbshipit-source-id: 1b44b90a9f6dc3c27f81a46038c0f7542ed2bab3
2019-12-07 19:46:47 -08:00
776fdda753 Add debug info API for distributed autograd. (#30642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30642

Adding a couple of basic metrics for distributed autograd which would
help in determining stuckness.
ghstack-source-id: 95156189

Test Plan: waitforbuildbot

Differential Revision: D18776478

fbshipit-source-id: a0556ad6fe2b7c3cd0082ee2350c1c78cafaaec5
2019-12-07 13:56:51 -08:00
0b33080992 Updating submodules
Summary:
GitHub commits:

452ebf30a8
8e85afc8a1
39d204760c
5760376392

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: aa1ff805dbe1a1cbe5eb256ed2ba30af587a8707
2019-12-07 13:48:58 -08:00
4bb497b38e MultiheadAttention fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30666

Test Plan: Imported from OSS

Differential Revision: D18864094

Pulled By: pbelevich

fbshipit-source-id: f7a634b2c7f526282bf918d47b9cc82aa0c0af1d
2019-12-07 09:42:10 -08:00
8b6d7698d6 Updating submodules
Summary:
GitHub commits:

40ac0e57c1

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: ac74c10651a5a4ef67c93a38dc6673f0687e38ae
2019-12-07 02:43:38 -08:00
f1bd8cc286 Fix lint issues in dist_autograd_test.py (#30928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30928

ghstack-source-id: 95152373

Test Plan: waitforbuildbot

Differential Revision: D18872870

fbshipit-source-id: 2cd1ef228da4bd90c13e2f067a0c89b975fa3179
2019-12-07 01:44:37 -08:00
63f1b780ba Support exporting aten::copy_ and aten::index_put to ONNX opset 11 (#26941)
Summary:
- [x] Add more comments and refactor the logic of `ReshapeToAdvancedIndexingFormat`
- [x] Add more description here. Cases that are/aren't supported, and how they are supported.
- [x] Need to merge this PR https://github.com/pytorch/pytorch/issues/27186 to enable testing inplace operators.

We are now supporting exporting aten::copy_ and aten::index_put to ONNX.
Here's a breakdown of the different cases in PyTorch code.

```
# Case 1: Scalar Indices
x[0, 1, 2] = data

# Case 2: Slice Indices
x[1:3, :, ::2] = data

# Case 3: Ellipsis Indices
x[..., 0] = data

# Case 4: Tensor Indices
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data

# Case 5: Mixing all the above cases
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[1:3, ind1, ind2, ..., 3] = data
```

Limitations:

Tensor indices must be consecutive, and 1-d tensors.

```
# Supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data

# Not supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
ind3 = torch.tensor([[0], [1]])
x[ind1, :, ind2] = data
x[ind3] = data
```

Negative indices are not supported.
```
# Not supported
x[-1] = data
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26941

Differential Revision: D17951030

Pulled By: houseroad

fbshipit-source-id: 4357777072f53aa0bc4b297aa1ee53457a7f8dec
2019-12-06 22:48:46 -08:00
a26238da57 Enable using torch.autograd.profiler.record_function as decorator (#30861)
Summary:
```python
from torch.autograd.profiler import profile, record_function

@record_function('my_func')
def f(x, y):
    return x + y

with profile() as prof:
    f(1, 2)
print(prof.key_averages().table())
```

```
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
my_func                               85.42%           86.796us         87.27%           88.670us         88.670us         1
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 101.606us
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30861

Differential Revision: D18857993

Pulled By: bddppq

fbshipit-source-id: eb6b8e2a8d4f3a7f8e5b4cb3da1ee3320acb1ae7
2019-12-06 21:38:35 -08:00
5c56986738 Attach autograd edges only for tensors requiring grad. (#30904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30904

When we sent tensors over RPC, on the server side we would call
addRecvRpcBackward which would call `set_history` on all tensors. This was
incorrect and set the `requires_grad` flag on tensors that didn't actually need
grad.

To fix this, we only attach autograd edges to tensors that need grads.
ghstack-source-id: 95113672
ghstack-source-id: 95113999

Test Plan: waitforbuildbot

Differential Revision: D18828561

fbshipit-source-id: d8942b76e9e4c567f8f1821f125c00d275ea0f90
2019-12-06 18:05:57 -08:00
62b10721fb Actually make flake8 do something (#30892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892

Fixes all outstanding lints and actually installs a properly configured
flake8

Test Plan: Imported from OSS

Differential Revision: D18862825

Pulled By: suo

fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
2019-12-06 17:50:50 -08:00
8d35b6cec7 embedding_bag make_bag_size optimization (#30701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30701

From James' PR https://github.com/pytorch/pytorch/pull/19715

embedding_bag microbenchmarks:
Baseline: P123020983
Refactor make_bag_size, without changing at::zeros to at::empty (this diff): P123021393
Inference benchmark on T6_SKL - _embedding_bag self time only:
bs=40, baseline: .302 ms/iter
bs=40, with diff: .244 ms/iter
bs=1 baseline: .148 ms/iter
bs=1 with diff: .124 ms/iter
The bigger gap comes from fb::embedding_bag_byte_rowwise_offsets; I'm looking into that one too.

Test Plan:
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./inference_benchmark_nolr_emb.par --pt-scripted-model=traced_model.pt --pt-inputs="batch_size_40/pt_inputs.pth" --iters=3000 --warmup-iters=100
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 3000 --operators embeddingbag

Reviewed By: yinghai, qizzzh

Differential Revision: D18800166

fbshipit-source-id: 820e6ece0b6ade72ee42409661f92c548f43a4cb
2019-12-06 16:17:16 -08:00
cd6167ff63 Upgrade bazel to 1.2.0. (#30885)
Summary:
Companion diff for https://github.com/pytorch/xla/pull/1464. Should land only after the pytorch/xla PR is in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30885

Differential Revision: D18866835

Pulled By: ailzhang

fbshipit-source-id: 51f4d2770f8ef873a659579ddd81a42957ffb885
2019-12-06 16:08:24 -08:00
7b97eaeba5 Add module level qpl logging. (#30906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30906

Add a mobile module observer to measure the performance of each method run.
ghstack-source-id: 95120194

Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to new feed and scroll down until find an ads which will direct you to offsite webpage;
4. Click on the ads, wait for the offsite ads loads;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/4fghwp0b and see all the operator runs have been logged:

{F223456981}

Reviewed By: ljk53

Differential Revision: D18702116

fbshipit-source-id: a9f07eee684e3022cef5ba3c5934f30f20192a85
2019-12-06 15:52:26 -08:00
118f1c633b refactor the way we are handling bailout counts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30410

Differential Revision: D18733370

Pulled By: Krovatkin

fbshipit-source-id: 0ea9dc0f3dd1a47bcc09f1d54745460f9bd71886
2019-12-06 15:45:38 -08:00
c37de32b23 Enable len(dataloader) for iterable dataset (#23587)
Summary:
Copy-paste comment from code for reasoning:

```
            # NOTE [ IterableDataset and __len__ ]
            #
            # For `IterableDataset`, `__len__` could be inaccurate when one naively
            # does multi-processing data loading, since the samples will be duplicated.
            # However, no real use case should be actually using that behavior, so
            # it should count as a user error. We should generally trust user
            # code to do the proper thing (e.g., configure each replica differently
            # in `__iter__`), and give us the correct `__len__` if they choose to
            # implement it (this will still throw if the dataset does not implement
            # a `__len__`).
            #
            # To provide a further warning, we track if `__len__` was called on the
            # `DataLoader`, save the returned value in `self._len_called`, and warn
            # if the iterator ends up yielding more than this number of samples.
```

Fixes https://github.com/pytorch/pytorch/issues/30184
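A minimal sketch of the newly supported call; the dataset class and length are illustrative:
```python
from torch.utils.data import DataLoader, IterableDataset

class Stream(IterableDataset):
    def __iter__(self):
        return iter(range(10))
    def __len__(self):
        return 10  # trusted; duplication across replicas is a user error

# len() now delegates to the dataset's __len__ instead of raising.
loader = DataLoader(Stream(), batch_size=None)
print(len(loader))  # 10
```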
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23587

Differential Revision: D18852625

Pulled By: ailzhang

fbshipit-source-id: aea8d4d70c7f21aaa69b35908a6f43026493d826
2019-12-06 15:38:05 -08:00
a77eafa1d8 Fix 'initialized after field' error (#30908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30908

Same as title.

Test Plan: Wait for CI to clear.

Reviewed By: bddppq, xw285cornell

Differential Revision: D18862837

fbshipit-source-id: bc34356b85774fc20ba46d321c8a2bb5d5c727f6
2019-12-06 15:04:18 -08:00
baccd26df7 update code analyzer script to handle splitted torch libraries (#30864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30864

Change it to handle all archive files under install folder.

Test Plan:
```
ANALYZE_TEST=1 CHECK_RESULT=1 tools/code_analyzer/build.sh
ANALYZE_TORCH=1 tools/code_analyzer/build.sh
```

Differential Revision: D18850317

Pulled By: ljk53

fbshipit-source-id: 7c57ae16c82b6ded53aa7df385f3b6074190fc04
2019-12-06 14:38:30 -08:00
223f46f5fa Fix flake8 warning (#30905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30905

-
ghstack-source-id: 95117983

Test Plan: -

Differential Revision: D18861981

fbshipit-source-id: b794a7fbe05af29471286c7f665cf3f86541eb5a
2019-12-06 14:19:35 -08:00
4fd20c0816 Kill hypothesis deadline testing (#30890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30890

We've received way too many complaints about this functionality making tests flaky, and it's not providing value to us anyway. Let's cut the shit and kill deadline testing
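For context, this is the hypothesis knob in question; a sketch of the per-test form of the opt-out (the PR disables it harness-wide, and the test below is illustrative):
```python
from hypothesis import given, settings, strategies as st

@settings(deadline=None)  # no per-example time limit, so slow CI can't flake
@given(st.integers())
def test_roundtrip(x):
    assert int(str(x)) == x
```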

Test Plan: Imported from OSS

Differential Revision: D18857597

Pulled By: jamesr66a

fbshipit-source-id: 67e3412795ef2fb7b7ee896169651084e434d2f6
2019-12-06 13:36:14 -08:00
26c51468c5 Fix examples in RRef API doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30857

Test Plan: Imported from OSS

Differential Revision: D18847527

Pulled By: mrshenli

fbshipit-source-id: 7dc9d28277597f8fc3ef97fa9ac98a312e76e6fb
2019-12-06 13:14:11 -08:00
642469b706 Fix examples in API doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30856

Test Plan: Imported from OSS

Differential Revision: D18847528

Pulled By: mrshenli

fbshipit-source-id: 57f666d9d4b634fb77b1b65debd2b07e2bebd57a
2019-12-06 13:14:06 -08:00
5e6c3fb23b Add more details to explain rpc_backend_options arg in init_rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30855

Test Plan: Imported from OSS

Differential Revision: D18847529

Pulled By: mrshenli

fbshipit-source-id: b4f0d5797f3b41cce155b7821d6bd34b268bd24e
2019-12-06 13:14:02 -08:00
6d06b925ba Remove values_to_quantize_ (#30858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30858

This is not needed since we have `values_to_qparams_`

Test Plan:
.

Imported from OSS

Differential Revision: D18848992

fbshipit-source-id: dc81f59967a93abdd5562f1010f02de4f4e60db0
2019-12-06 12:15:13 -08:00
81e4739141 Move QScheme ops to c10 (#30134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30134

ghstack-source-id: 95055387

Test Plan: buck build mode/dev caffe2:generate-code

Differential Revision: D18609716

fbshipit-source-id: fec39359e0b97387a9b13f8179d72a731cc61808
2019-12-06 12:04:51 -08:00
d6ddfab11f save linux build binary size to Scuba (#30832)
Summary:
example: https://fburl.com/scuba/mjheume7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30832

Differential Revision: D18857146

Pulled By: mingbowan

fbshipit-source-id: 66bcd352922944c227f337a66e8a75e2d7393fd3
2019-12-06 11:55:35 -08:00
78254eab45 Add mobile operator observer for qpl logging.
Summary: Add a mobile operator observer to measure the performance of each operator run; the results will also be logged to the QPL event [MOBILE_OPERATOR_STATS](https://fburl.com/quicklog/8773a00a).

Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to new feed and scroll down until find an ads which will direct you to offsite webpage;
4. Click on the ads, wait for the offsite ads loads;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/er7t4g9u and see all the operator runs have been logged:

{F223250762}

Reviewed By: ljk53

Differential Revision: D18131224

fbshipit-source-id: 23e2f6e2a9851c04b29511b45dc53f3cce03e8a0
2019-12-06 11:55:32 -08:00
44ff7b08d8 Reduce intrusive_ptr incref/decref costs (#30709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30709

Intrusive_ptr doesn't provide an explicit incref method. When users want to
incref the target, they create an intrusive_ptr to wrap the target, then make
a copy which does the actual incref, then release both the first intrusive_ptr
and the copy to prevent a decref at destruction time. This is very
inefficient. Instead, do the incref/decref directly.

Differential Revision: D18798505

fbshipit-source-id: 524d4f30d07d733df09d54423b044d80e4651454
2019-12-06 11:52:20 -08:00
e123d90a93 Back out "Back out "Back out "Revert D18542342: Boxed variable dispatch""" (#30650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30650

Original commit changeset: 51bb7aac7cb7
ghstack-source-id: 95082205

Test Plan: CI

Differential Revision: D18778190

fbshipit-source-id: 7e9577e88fd0492006b6ea836ec081aea9da6b0c
2019-12-06 11:45:09 -08:00
37435d36ed Refactor VariableTypeManual (#30649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30649

Operators in VariableTypeManual are now no longer registered against the VariableTypeId key, but they are registered as compound ops. See https://github.com/pytorch/pytorch/issues/30102 for background.

This also requires the non-variable codegen to ignore them and requires removal of VariableMethodStubs.cpp.

So, because function_wrapper.py now also needs to know which ops are manual, instead of having a hard-coded list in gen_variable_type.cpp for ops with manual implementation, we now have a `manual_kernel_registration` flag in native_functions.yaml that disables the registration of operator kernels for this operator (the schema is still registered). Then, we manually register the right kernels for the operator.
ghstack-source-id: 95082204

Test Plan: unit tests

Differential Revision: D18778191

fbshipit-source-id: 0af6f9e43ff4fb9800ce19b286dfccd0fd22cc41
2019-12-06 11:45:05 -08:00
b0e7db5b31 Revert D18840736: make sure windows tests get triggered
Test Plan: revert-hammer

Differential Revision:
D18840736

Original commit changeset: 6fdf73649622

fbshipit-source-id: 719576e9c717847bfb4b057875a273123e941db3
2019-12-06 11:26:37 -08:00
4ed2eae2d0 Add registerQParams function (#30552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30552

For upcoming changes to support quantizing shared class type

Test Plan:
.

Imported from OSS

Differential Revision: D18818653

fbshipit-source-id: 393a55db69b20a1c00ffa0157ab568cb097915b2
2019-12-06 11:17:35 -08:00
0051467118 Update CITATION from Workshop paper to Conference paper (#30872)
Summary:
The conference paper is finally published at NeurIPS 2019: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30872

Differential Revision: D18854253

Pulled By: soumith

fbshipit-source-id: 4f91838b1953e976542997959d5571884f739872
2019-12-06 09:16:17 -08:00
377131b0eb MultiMarginCriterion: fix scalar_check in the case where reduction == None. (#30826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30826

Previously the scalar_check for the reduction None case was:
input.dim() <= 1, but it should be target based, i.e.:
target.dim() == 0.  This follows from the "correct cases", i.e.
(N, C) X (N,) -> (N,)
(C,) X () -> ()
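A quick check of the second shape case above (values are illustrative):
```python
import torch
import torch.nn.functional as F

x = torch.randn(5)   # (C,) input
t = torch.tensor(2)  # ()  target
out = F.multi_margin_loss(x, t, reduction='none')
print(out.shape)     # torch.Size([]) after this fix
```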

Test Plan: Imported from OSS

Differential Revision: D18833660

Pulled By: gchanan

fbshipit-source-id: 26338b842a8311718c4b89da3e2f1b726d5409b8
2019-12-06 09:04:38 -08:00
5687ee1d85 added a serialize function in SGD class to utilize the existing macro for serialization/deserialization calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30739

Differential Revision: D18842908

Pulled By: anjali411

fbshipit-source-id: 7dc13ff9c4fc126790b88b1b4b5d03425c349d38
2019-12-06 08:38:07 -08:00
e5d571ae25 Remove scalar_check from topk, move it to the THC implementation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30852

Test Plan: Imported from OSS

Differential Revision: D18842662

Pulled By: gchanan

fbshipit-source-id: b5e8a4367fce9441be2ddbd026495f1911038221
2019-12-06 07:50:20 -08:00
60714dfb64 change index_select scalar_check to retain dimensionality of input. (#30790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30790

The index_select documentation reads:
"The returned tensor has the same number of dimensions as the original tensor (input)."

But the implementation would return a 0-dimensional tensor iff both the input and index were 0-dimensional.
This change makes it so we return a 0-dimensional tensor iff the input is 0-dimensional.

Restacked version of: https://github.com/pytorch/pytorch/pull/30502
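A small sketch of the documented invariant:
```python
import torch

x = torch.randn(5)
out = torch.index_select(x, 0, torch.tensor([2]))
print(out.dim() == x.dim())  # True: output keeps the input's dimensionality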

Test Plan: Imported from OSS

Differential Revision: D18825717

Pulled By: gchanan

fbshipit-source-id: aeb10c5107e748af3e264fbdc81fff5dd4833cc4
2019-12-06 07:47:53 -08:00
1d7b40f1c4 Fix reading __cuda_array_interface__ without strides (#24947)
Summary:
When converting a contiguous CuPy ndarray to a Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes the problem and makes `torch.tensor(cupy_ndarray)` work for contiguous inputs.
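A hedged repro sketch; it requires CuPy and a CUDA device:
```python
import cupy
import torch

a = cupy.arange(6).reshape(2, 3)  # contiguous; the interface may omit strides
t = torch.tensor(a)               # previously raised, now succeeds
print(t.shape, t.device)          # torch.Size([2, 3]) cuda:0
```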
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24947

Differential Revision: D18838986

Pulled By: ezyang

fbshipit-source-id: 2d827578f54ea22836037fe9ea8735b99f2efb42
2019-12-06 07:36:27 -08:00
11b3065323 Run method_tests on CUDA. (#30821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30821

While investigating why our tests didn't catch #30704, I noticed that none
of our tests in method_tests() were being run on CUDA.  This diff moves
those tests into the new device-generic test framework so that we also get
CUDA coverage.  For expediency, I blacklisted all tests which didn't work
on CUDA (rather than fix them); that's something we can leave for future PRs.
This is done by way of a new expectedFailure gadget.

Note that all occurences of skipIfNoLapack needed to be replaced with
skipCPUIfNoLapack.

I punted for test_jit; it's possible those tests should also run CUDA but a JIT
expert should take a look here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18840089

Pulled By: ezyang

fbshipit-source-id: 66b613b5024c91d3e391c456bb642be7e73d4785
2019-12-06 07:24:27 -08:00
9a858aba5f Moving checks related to options.aliasAnalysis and schema.hasAliasInfo to read callsite (#30671)
Summary:
**Context:**
In D18530964, we allow aliasAnalysis to be left unset in an earlier registration call and then updated to the correct value in a following registration call.

But it's not working E2E due to those existing checks.

So we want to remove or delay those TORCH_CHECKs.

Here are the three existing callsites for operator.aliasAnalysisKind():
https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/torch/csrc/jit/ir.cpp?lines=994%2C995%2C996%2C1001%2C1004

https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/torch/csrc/jit/operator.cpp?lines=147%2C155

https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/torch/csrc/jit/passes/alias_analysis.cpp?lines=260%2C277%2C380

**Things to check**
1. Those two checks are different. But since the original op_registration code converts options.schemaOrName_ to a FunctionSchema when is_right() is FALSE, the read callsites only need to check the following: options.aliasAnalysisKind_ == AliasAnalysisKind::FROM_SCHEMA || !schema.hasAnyAliasInfo()

2. Whether the three callsites above indeed need those checks.

3. Here we assume that reads from the JIT or other places always happen after all registration calls are done; we are trying to make sure this is a valid assumption.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30671

Test Plan: Will update and refactor the tests soon.

Differential Revision: D18784623

Pulled By: charliechen0401

fbshipit-source-id: 75edea140d0ae3e54820e1aeef010c81fe26416a
2019-12-06 01:36:22 -08:00
619e2ffe23 Replace deprecated AT_* with TORCH_* to reduce warnings in c10d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30795

Test Plan: Imported from OSS

Differential Revision: D18826310

Pulled By: mrshenli

fbshipit-source-id: 0041ac2e5788e874e0a566abd57a8a90e658da9b
2019-12-06 01:28:30 -08:00
b0cba8ceae Replace deprecated AT_ERROR with TORCH_CHECK to reduce warnings in rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30794

Test Plan: Imported from OSS

Differential Revision: D18826311

Pulled By: mrshenli

fbshipit-source-id: bfd58d30f386bbe9535264b2afce4acbe7ac5b0e
2019-12-06 01:28:26 -08:00
2011cc1e91 Fix half->float case of softmax backward when inner_size is not 1 (#30838)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30572

The added unit test fails on master and succeeds with this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30838

Differential Revision: D18841066

Pulled By: ngimel

fbshipit-source-id: 86a7ccdb3016c98d62dd0946daff101704cd1f68
2019-12-06 00:25:34 -08:00
d32aec5ad6 Add get_metrics and get_debug_info to rpc agent (#30833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30833

[rpc] Add get_metrics and get_debug_info to rpc agent

Test Plan: UT and builds

Reviewed By: mrshenli

Differential Revision: D18835068

fbshipit-source-id: f552cf196bb6d54ccd38a44ba981e7d5b15513f0
2019-12-05 23:52:42 -08:00
58cdf1429c Add tests for quantizing traced models (#30476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30476

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18795724

fbshipit-source-id: 9253e102bf458d9185f68848071a4e4eff9f9b08
2019-12-05 23:03:45 -08:00
f1755d9aea Insert GetAttr for quantization parameters instead of Constant (#30551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30551

To enable quantizing with shared types, we need to insert GetAttr nodes for
quantization parameters, since the code might be shared by multiple module instances
and we'd like quantized module instances to also share the same code, but with
different values for the attributes.

Test Plan:
test_jit.py, test_quantization.py

Imported from OSS

Differential Revision: D18818652

fbshipit-source-id: fc95623cac59dcedd9e3f95397524eae515e7a11
2019-12-05 22:52:45 -08:00
1fa4908ac0 Refactor test_quantization.py and enable test_nested (#30475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30475

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18795727

fbshipit-source-id: c9942c5361e0a34e91a08b8fc27405799db7ff4f
2019-12-05 21:56:03 -08:00
ef95a72690 modify test_local_shutdown_with_rpc to not be flaky (#30837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30837

This test would get very occasional flakes, with an error saying the
RPC timed out. This happened because one worker would still be waiting for the
return value of an RPC, but another worker had already performed its local
shutdown, so it would not have sent the response. This didn't show up in
initial testing since the flakiness is very rare (< 1/100 test runs). This diff
fixes the issue by not erroring if these RPCs time out. The reason this is okay
is because with a local shutdown, we should not expect for all outstanding RPCs
to be completed, since workers are free to shut down without completing/waiting
on outstanding work.
ghstack-source-id: 95021672
ghstack-source-id: 95021672

Test Plan: Ran the test 1000 times to ensure that it is not flaky.

Differential Revision: D18775731

fbshipit-source-id: 21074e8b4b4bbab2be7b0a59e80cb31bb471ea46
2019-12-05 21:46:39 -08:00
7af9d77290 Update persons_of_interest.rst
Updating to add POI for mobile, quantization and an addition to optimizers.
2019-12-05 21:20:40 -08:00
a7406516d1 Refactor bias and weight check and add aten::linear pattern (#30474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30474

There are some common parts in `isBiasOfConvOrLinear` and `isWeightOfConvOrLinear`; we can factor
them out. The refactor will allow for easier extension with new patterns.

Test Plan:
python test/test_jit.py
python test/test_quantization.py

Imported from OSS

Differential Revision: D18795725

fbshipit-source-id: 446463da5e3fa8464db441ed0d9651930487b3b7
2019-12-05 21:00:39 -08:00
a51c5f5cbf Add JIT pass to insert permutes for conv ops (#30679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30679

Caffe2 expects quantized ops to be in NHWC format, while PyTorch inputs are in NCHW.
This adds a JIT pass that inserts an nchw2nhwc permute before each conv op and an nhwc2nchw permute after it.
A graph rewrite then finds consecutive redundant permutes and removes them from the graph, as illustrated below.
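
For illustration, a minimal sketch with plain eager-mode tensors (not the JIT pass itself) of the permutes being inserted, and of why back-to-back inverse permutes are redundant:
```python
import torch

x_nchw = torch.randn(1, 3, 8, 8)
x_nhwc = x_nchw.permute(0, 2, 3, 1)   # nchw2nhwc inserted before the conv
# ... the quantized conv would run on NHWC data here ...
x_back = x_nhwc.permute(0, 3, 1, 2)   # nhwc2nchw inserted after the conv

# consecutive inverse permutes cancel out, which is what the graph
# rewriter detects and removes
assert torch.equal(x_back, x_nchw)
```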

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps

Imported from OSS

Differential Revision: D18790518

fbshipit-source-id: 4dd39cf0b31b21f5586c0edfdce2260d4e245112
2019-12-05 18:51:16 -08:00
c1159494a6 Revert D18621773: we should have a config-based way to skip flaky tests
Test Plan: revert-hammer

Differential Revision:
D18621773

Original commit changeset: 5532f1d5fa3f

fbshipit-source-id: 22239b88a6f9551938e6e2178bf9162e3385b011
2019-12-05 17:08:20 -08:00
4034aa7621 make sure windows tests get triggered (#30836)
Summary:
We prefer "_" over "-" in build names, so this changes the checks in the test script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30836

Differential Revision: D18840736

Pulled By: mingbowan

fbshipit-source-id: 6fdf736496225c5f8ab44906d8f4681b7bf894a7
2019-12-05 15:47:56 -08:00
82c3f4861f Move hardtanh activation to Aten(CPU, CUDA) (#30152)
Summary:
VitalyFedyunin, this PR ports the Hardtanh activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Hardtanh()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.17 (ms).
CPU
input size(128, 100) forward time is 0.02 (ms); backward avg time is 0.06 (ms).
input size(128, 10000) forward time is 0.84 (ms); backward avg time is 0.44 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.17 (ms).
CPU
input size(128, 100) forward time is 0.02 (ms); backward avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.61 (ms); backward avg time is 0.10 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.07 (ms).
input size(128, 10000) forward time is 5.21 (ms); backward avg time is 5.25 (ms).
After:
input size(128, 100) forward time is 0.01 (ms); backward avg time is 0.02 (ms).
input size(128, 10000) forward time is 1.09 (ms); backward avg time is 1.09 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30152

Differential Revision: D18815545

Pulled By: VitalyFedyunin

fbshipit-source-id: d23b6b340a7276457f22dce826bcbe3b341d755f
2019-12-05 15:28:03 -08:00
6e38d50352 Revert D18117070: Migrate max and min (binary) from TH to ATen.
Test Plan: revert-hammer

Differential Revision:
D18117070

Original commit changeset: e06d37a8a140

fbshipit-source-id: 49dd33f52e7e3ffcaafc02109a0a0a67545ec7e8
2019-12-05 14:43:29 -08:00
e5bd7a7942 we should have a config-based way to skip flaky tests (#29944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29944

This particular approach queries our issue tracker for test titles that
match the following format:

```
DISABLED test_async_grad_guard_with_grad (jit.test_async.TestAsync)
```

It then skips the matching Python tests. There is a 1-second timeout, so
if the internet flakes we still run the full test suite without disabling
any tests.
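
A rough sketch of the idea (real implementation details such as the exact query, auth, and pagination differ):
```python
import json
import unittest
import urllib.request

def fetch_disabled_tests(url="https://api.github.com/repos/pytorch/pytorch/issues"):
    try:
        # 1 second timeout: if the internet flakes, disable nothing
        with urllib.request.urlopen(url, timeout=1) as r:
            issues = json.load(r)
    except Exception:
        return set()
    return {i["title"][len("DISABLED "):].strip() for i in issues
            if i["title"].startswith("DISABLED ")}

DISABLED_TESTS = fetch_disabled_tests()

def check_disabled(test_id):
    if test_id in DISABLED_TESTS:
        raise unittest.SkipTest("disabled via the issue tracker")
```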

This is intended as a quick fix, similar to ninja unland, to get to a green
master. Long term test disables should go into the code.

Test Plan: Imported from OSS

Differential Revision: D18621773

Pulled By: zdevito

fbshipit-source-id: 5532f1d5fa3f83f77fc3597126cbb7dba09a3c33
2019-12-05 14:28:27 -08:00
0974dcc244 Fix error checking of CUDA multi_margin_loss. (#30825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30825

It didn't verify in the 1-d case that the targets were size 1.

Test Plan: Imported from OSS

Differential Revision: D18833659

Pulled By: gchanan

fbshipit-source-id: 9b0276e7b0423fdaf2ba7cfa34bde541558c61f9
2019-12-05 14:23:00 -08:00
2ced81f289 Revert "Default to not build Caffe2 operators on Windows. (#29061)" (#30740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30740

This reverts commit 7102aceaf88ab71781c6019458bd7a07e86a532f.

Test Plan: Imported from OSS

Differential Revision: D18834315

Pulled By: ezyang

fbshipit-source-id: 2dbd1cf686864b9840365083182cd6188a285399
2019-12-05 14:01:59 -08:00
f874230d33 Vectorize smooth L1 loss backward function on CPU. (#30046)
Summary:
Benchmark (Intel i7-8850H, turbo off, release build, RHEL 7.7):

```
import timeit

for dtype in ('torch.float', 'torch.double'):
    print(f'dtype={dtype}')
    for n, t in [(10_000, 100000),
                (100_000, 20000)]:
        print(f'numel() == {n} for {t} times')
        print(timeit.timeit('output.backward(retain_graph=True)', number=t, setup=f"""
import torch
loss = torch.nn.SmoothL1Loss()
input = torch.randn({n}, requires_grad=True)
target = torch.randn({n})
output = loss(input, target)
"""))
```

Before:

```
dtype=torch.float
numel() == 10000 for 100000 times
6.154701935998673
numel() == 100000 for 20000 times
5.157296671999575
dtype=torch.double
numel() == 10000 for 100000 times
6.195317157000318
numel() == 100000 for 20000 times
5.099748799999361
```

After:

```
dtype=torch.float
numel() == 10000 for 100000 times
4.968745516000126
numel() == 100000 for 20000 times
2.4029395039997326
dtype=torch.double
numel() == 10000 for 100000 times
4.9910852479988534
numel() == 100000 for 20000 times
2.4867371629989066
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30046

Differential Revision: D18602399

Pulled By: VitalyFedyunin

fbshipit-source-id: 4c6c7b7b69ad6bce759786ddd7d6bc1e88ecf6ab
2019-12-05 13:57:42 -08:00
6486bdfb90 Fix os.register_at_fork not defined on Windows (#30809)
Summary:
According to https://docs.python.org/3.8/library/os.html#os.register_at_fork, this function is only available in Unix platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30809

Differential Revision: D18828777

Pulled By: bddppq

fbshipit-source-id: 3325a984da488bb0a80a5c27131553fbcf78921f
2019-12-05 13:36:53 -08:00
c564d794ed Add ATen/native/ headers to torch target (#30835)
Summary:
We didn't have ATen/native/*.h in the torch target before, and we would like them to be exposed for external use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30835

Differential Revision: D18836160

Pulled By: zrphercule

fbshipit-source-id: 7330a9c9d8b65f173cc332b1cfeeb18c7dca20a8
2019-12-05 13:24:21 -08:00
244b0bd1a5 Add docs for how we expose declarations in at:: to torch:: (#30760)
Summary:
This PR adds docs for how we expose declarations in `at::` to `torch::`, to make the semantics more clear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30760

Differential Revision: D18833081

Pulled By: yf225

fbshipit-source-id: eff4d8815c67f681ce3a930ce99771cf2e55dbd9
2019-12-05 13:05:28 -08:00
be55874f2c style fixes to code analyzer (#30808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30808

Addressed some comments on #29550 after it's landed.

Test Plan:
```
LLVM_DIR=... ANALYZE_TEST=1 CHECK_RESULT=1 tools/code_analyzer/build.sh
LLVM_DIR=... ANALYZE_TORCH=1 tools/code_analyzer/build.sh -closure=false -debug_path=true
```

Differential Revision: D18835100

Pulled By: ljk53

fbshipit-source-id: 991d292ddc0211a88b04d0bdc24719f471c7786e
2019-12-05 11:25:37 -08:00
9617d07bd5 Wrap warning handler in a function to avoid siof (#30800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30800

SparseNN benchmark crashed due to this.
Wrap warning handler in a function to avoid siof.

Test Plan: Tested locally, SparseNN benchmark no longer crashes.

Reviewed By: yinghai

Differential Revision: D18826731

fbshipit-source-id: 8fcab8a3f38cc20f775409c0686363af3c27d0a6
2019-12-05 11:22:15 -08:00
bf1b4b6fef add torch_cpu to the static library list in TorchConfig.cmake.in (#30769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30769

TorchConfig.cmake is the public cmake file we produce in the install folder
so that 3rd-party client code can pick up all libtorch dependencies easily.

Apparently this build flow is not well covered by our CI (which focuses
on 1st-party builds / shared libraries?), as the little dummy project used for
code-analysis testing was broken by #30315 without failing any CI.

This fixes the problem for the mobile build and adds the dummy project build to
mobile CI as well.

Test Plan: - make sure new CI pass;

Differential Revision: D18825054

Pulled By: ljk53

fbshipit-source-id: 80506f3875ffbc1a191154bb9e3621c621e08b12
2019-12-05 11:13:32 -08:00
f531815526 Deprecate tensor.type() (#30281)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.

I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
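
For readers hitting the warning, a small sketch of the usual replacements (illustrative only):
```python
import torch

t = torch.zeros(3)
t.type()              # deprecated: returns the string 'torch.FloatTensor'
t.dtype               # preferred: torch.float32
t.device              # preferred: device(type='cpu')
t.to(torch.float64)   # preferred over t.type(torch.DoubleTensor)
```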
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281

Differential Revision: D18830818

Pulled By: ezyang

fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
2019-12-05 10:55:34 -08:00
2171f91053 reenable cuda_kernel_loop_overflow_large test (#30797)
Summary:
The fix (https://github.com/pytorch/pytorch/issues/30771) has landed; the original issue https://github.com/pytorch/pytorch/issues/26838 is now closed.

cc peterjc123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30797

Differential Revision: D18827307

Pulled By: ngimel

fbshipit-source-id: 41b3db5fc9db85daeaa1b53c55b468976c996285
2019-12-05 10:09:39 -08:00
1578a28692 Migrate max and min (binary) from TH to ATen. (#27185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27185

TH implementation will be removed after the unary max and min are migrated.

Benchmark: (Debian 10, Release build, gcc 7.4, no turbo)

```python
import timeit
for device in ('cpu', 'cuda'):
    print(f'device: {device}')
    for op in ('max', 'min'):
        for dtype in ('torch.double', 'torch.float', 'torch.int16', 'torch.int32', 'torch.int64'):
            for n, t in [(10_000, 200000),
                        (100_000, 20000)]:
                print(f'torch.{op}(a, b), numel() == {n} for {t} times, dtype={dtype}')
                # time the binary op itself, with both operands on the device under test
                print(timeit.timeit(f'torch.{op}(a, b)' + (';torch.cuda.synchronize()' if device == 'cuda' else ''),
                                    setup=f'import torch; a = torch.arange({n}, dtype={dtype}, device="{device}"); b = torch.ones({n}, dtype={dtype}, device="{device}") * ({n} // 2)', number=t))
    print()
```

Before:

```
device: cpu
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.241763713000182
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.7138833169992722
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.2183356810000987
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.7031846980007685
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7704679510006827
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.289198366999699
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.7937613740014058
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2930124340000475
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8032857640009752
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.2908709189996443
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
1.8829010000008566
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.2994690759987861
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
1.8037853410005482
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.2929310759991495
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.8075240359994496
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2932477679987642
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.7868400779989315
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2885970789993735
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8389664830010588
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.29402057399966

device: cuda
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
4.787109836999662
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.842438002999188
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.429616614999759
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.835390076999829
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.940423873000327
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4108991760003846
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.9318018840003788
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4168134739993548
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9610764919998473
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4189234130008117
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.960172712999338
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.4162539499993727
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.8985912560001452
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.4113489299998037
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.9160250799995993
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4128787690005993
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.8806865219994506
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4086357010000938
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9362181240012433
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4151225870009512

```

After:

```
device: cpu
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
2.2685823729998447
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.72004808300062
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.212242640000113
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.7089235590001408
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7767087259999244
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2916517639996528
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.8265984959998605
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.3002885240002797
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8084679720004715
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.3012119999993956
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
1.8800218449996464
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.3060645710002063
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
2.4905043950002437
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.9126290209997023
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
1.7972335520007618
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.2918074379995232
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
1.8047651860006226
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.2992197730000044
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
1.8526509560006161
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.3030709570002728

device: cuda
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.double
4.700986622000528
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.8415469050005413
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.3051693249999516
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.float
1.8321999460004008
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.8086475109994353
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.405110773999695
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.913458047999484
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4236377289998927
torch.max(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
2.9386842409994642
torch.max(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4230227469997772
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.double
3.0341797270002644
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.double
1.4289592409995748
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.float
3.6091147850002017
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.float
2.036691903999781
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int16
2.8256167649997224
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int16
1.4078955400000268
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int32
2.8631781489993955
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int32
1.4210130069996012
torch.min(a, b), numel() == 10000 for 200000 times, dtype=torch.int64
3.0112479260005784
torch.min(a, b), numel() == 100000 for 20000 times, dtype=torch.int64
1.4297719679998409

```

Partly addresses #24594 and #24595.

Closes #25016

Test Plan: Imported from OSS

Differential Revision: D18117070

Pulled By: VitalyFedyunin

fbshipit-source-id: e06d37a8a1405848ba0b9e398870a77eb52bae8b
2019-12-05 09:55:56 -08:00
fa251cfd97 Fully deprecate variadic inputs of checkpoint_sequential (#25985)
Summary:
Support for variadic inputs to `checkpoint_sequential` was deprecated in https://github.com/pytorch/pytorch/issues/21006. This case warned with a `DeprecationWarning` in PyTorch 1.2, but it should simply fail with a `TypeError` since PyTorch 1.3. This patch removes the `DeprecationWarning` path added for PyTorch 1.2, as shown below.
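
A minimal sketch of the supported call pattern (module and shapes are illustrative):
```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
x = torch.randn(2, 4, requires_grad=True)

out = checkpoint_sequential(model, 2, x)    # single input: supported
# checkpoint_sequential(model, 2, x, x)     # variadic inputs: now a TypeError
```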
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25985

Differential Revision: D18809875

Pulled By: albanD

fbshipit-source-id: e84dd8629c04979c4b2dc63e8ada94292e8cedd0
2019-12-05 09:23:28 -08:00
2607772959 Turn off scalar_checks for SpatialDepthwiseConvolution and SpatialConvolutionMM. (#30789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30789

The input(s) can't be 0-dimensional, so it's irrelevant.

Restacked version of: https://github.com/pytorch/pytorch/pull/30438

Test Plan: Imported from OSS

Differential Revision: D18825716

Pulled By: gchanan

fbshipit-source-id: a4883b795163efcb9d8dba6166d0f2102b6728a2
2019-12-05 08:07:31 -08:00
f12332eb51 Move scalar_check from codegen to code in MultiLabelMarginCriterion. (#30770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30770

Restacked version of: https://github.com/pytorch/pytorch/pull/30753

Test Plan: Imported from OSS

Differential Revision: D18821556

Pulled By: gchanan

fbshipit-source-id: 64b7311b1eb3855c4f1981d060accc918b99088d
2019-12-05 08:07:26 -08:00
50625798df Fix scalar check of MultiLabelMarginLoss. (#30768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30768

The behavior didn't match the documentation, because the documentation (for 'none' reduction) reads:
input X target -> output
(N, C) X (N, C) -> (N,)
(C,) X (C,) -> ()

but the latter case would output (1,). This also changes the case to:
() X (C,) -> ()
from:
() X (C,) -> (C,)
which makes more sense with the above formulas.
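
A small example of the documented shapes (values are illustrative):
```python
import torch

loss = torch.nn.MultiLabelMarginLoss(reduction='none')
x = torch.randn(5)                    # (C,)
y = torch.tensor([2, 0, -1, -1, -1])  # (C,), -1-padded target indices
loss(x, y).shape                      # torch.Size([]) after this change
```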

Restacked version of: https://github.com/pytorch/pytorch/pull/30748

Test Plan: Imported from OSS

Differential Revision: D18821554

Pulled By: gchanan

fbshipit-source-id: 3df77c51cf25648cb5fab62a68b09f49c91dab4e
2019-12-05 08:07:20 -08:00
473a044835 Fix a CUDA memory leak in MultiLabelMarginCriterion error checking. (#30767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30767

Restacked version of: https://github.com/pytorch/pytorch/pull/30733

Test Plan: Imported from OSS

Differential Revision: D18821553

Pulled By: gchanan

fbshipit-source-id: 8bf0365ce54dd2f07a5d6d0937332d0baf75b350
2019-12-05 08:07:15 -08:00
ba1a9871cb Turn off scalar_check for is_target for MultiLabelMarginCriterion, which is handled correctly in code. (#30766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30766

Restacked version of: https://github.com/pytorch/pytorch/pull/30728

Test Plan: Imported from OSS

Differential Revision: D18821555

Pulled By: gchanan

fbshipit-source-id: 27acc72f82e94eddeea675ae66e010cfb2fc7421
2019-12-05 08:07:10 -08:00
35a6997863 Support 0-d tensors in CUDA MultiLabelMarginCriterion. (#30765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30765

It is already supported on CPU and is pretty easy to add for consistency.

Restacked version of: https://github.com/pytorch/pytorch/pull/30727

Test Plan: Imported from OSS

Differential Revision: D18821557

Pulled By: gchanan

fbshipit-source-id: e6aa3e91000ff3fd63941defc7d30aef58ae2f82
2019-12-05 08:07:05 -08:00
c4e9748bc6 Provide full path for buck hipification (#30746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30746

This diff should be safe as long as open source build succeeds and should have no impact to cuda.

Differential Revision: D18811302

fbshipit-source-id: a7adab993816cba51842701898fac5019438b664
2019-12-05 07:57:52 -08:00
f2a2fec47c CUDA-strided-complex Binary and Unary Op support (#30295)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)

Changes so far:

- [x]  Added complex support of torch.empty and torch.fill()
- [x]  Added complex support of CopyKernels
    - The 'static_cast_with_inter_type' template function is specialized for the following cases
        - `dest_t = thrust::complex<dest_value_t>`, `src_t = std::complex<src_value_t>`
        - `dest_t = std::complex<dest_value_t>`, `src_t = thrust::complex<src_value_t>`
     - This handles the compile-time case where `dest_value_t=double` and `src_value_t=float`.
- [x]  Added complex support of BinaryOp kernels
    - `using thrust_t = typename ztype_cuda<scalar_t>::thrust_t;` converts std::complex<T> ScalarTypes to thrust types and is a no-op for other ScalarTypes.
    - The operator is performed using complex number support defined in `thrust/complex.h`
    - This could be extended to work with ROCm by using `rocm/complex.h`
- [x]  Added complex support of UnaryOp kernels
    - Added CUDA support for `angle()`, `real()`, `imag()`, `conj()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30295

Differential Revision: D18781954

Pulled By: ezyang

fbshipit-source-id: 25d204c0b8143ee27fda345a5d6a82f095da92a7
2019-12-05 07:30:39 -08:00
139aa51962 Clean up non-C++14 code (#28443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28443

We're now on C++14, so we don't need the else branch of these ifdefs anymore
ghstack-source-id: 94904074

Test Plan: waitforsandcastle

Differential Revision: D18069136

fbshipit-source-id: f1613cab9a99ee30f99775e4a60a1b06fd0a03ff
2019-12-05 00:41:29 -08:00
a939b52ddb fix AvgPool2d for 2^31-1 sized inputs, and get test_cuda_kernel_loop_… (#30771)
Summary:
…overflow_large to working state
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30771

Differential Revision: D18821529

Pulled By: ngimel

fbshipit-source-id: c5cbf56e686a2a3cfc7274dd96db37289dac7588
2019-12-04 20:58:30 -08:00
1d20c32bf1 Make InsertQuantDeQuantHelper global (#30550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30550

Right now we have an `InsertQuantDeQuantHelper` per module, but we need
it to be global because we need to know which graphs have been quantized before;
based on this information we can decide how to handle each module instance.

Test Plan:
test_jit.py, test_quantization.py

Imported from OSS

Differential Revision: D18818651

fbshipit-source-id: bfcaf37094ce20a257171a0c99b05b9348ebc13d
2019-12-04 20:03:00 -08:00
c4c2e23385 Supporting making submodules unique (#30037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30037

Support quantization for modules with reused submodules, e.g. relu (automatically make them unique).
We first do a pass on the graph to find all duplicate uses of the same module and record the `Value`s of the
module instance; for each of these values we create a new module and redirect the access to that new module.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18821483

fbshipit-source-id: 1698b981e9e9f0c728d9f03fcbcfbd260151f679
2019-12-04 19:26:56 -08:00
7a2889b014 Stop producing op_version_set version numbers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28122

Test Plan: Imported from OSS

Differential Revision: D17959565

Pulled By: zdevito

fbshipit-source-id: 701101bd870700eb0c9882c69e2cfdd2524b555e
2019-12-04 19:14:43 -08:00
3c1bb21cf5 Invoke more passes in insertObservers (#30473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30473

Invoked the `ConstantPooling` and `FuseLinear` passes before
`insertObservers`.
`ConstantPooling` cleans up the traced graph: e.g. when we
have two constant nodes with the same value, this pass merges them,
which leaves fewer quantization patterns to match.
`FuseLinear` merges the exploded linear function back into `aten::linear` so
that we can quantize it properly. We need to fuse it because right now
the way we recognize weight and bias is by matching argument positions in certain
function calls, e.g. the weight is the second argument of aten::conv2d. Therefore we have to preserve
the boundary of the linear function to recognize the weight of linear, since in the exploded
linear code the input to addmm is the transposed weight rather than the original weight (see the illustration below).
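
To illustrate, the eager-mode equivalent of what the traced graph contains (a sketch, not the pass itself):
```python
import torch

x = torch.randn(2, 3)
w = torch.randn(4, 3)
b = torch.randn(4)

exploded = torch.addmm(b, x, w.t())           # addmm sees w.t(), not w
fused = torch.nn.functional.linear(x, w, b)   # aten::linear keeps w intact
assert torch.allclose(exploded, fused)
```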
ghstack-source-id: 94887831

Test Plan:
This is needed for quantizing traced model tests to pass

Imported from OSS

Differential Revision: D18795722

fbshipit-source-id: 192d9d1e56307e2e1d90e30dce0502e31cb4f829
2019-12-04 18:45:04 -08:00
e09c415387 Back out "make the order btw div and mul in adagrad update consistent" (#30737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30737

Original commit changeset: 2a8b2a3f5401

Reverting this to be safe until we address test failures in T58528495

Test Plan: CI

Reviewed By: wx1988

Differential Revision: D18812384

fbshipit-source-id: 2a3ac554024773022ec827f259127e4c8cffe6e2
2019-12-04 17:43:45 -08:00
1f1ce53e8e Don't install pybind11 header directory for system pybind11 installs (#30758)
Summary:
For system pybind11 installs, this is a system header location that should not be installed, since it might pull in other unrelated headers. Since the headers are already present for a system install, there's no need to install them; only do the install when we use the bundled pybind11 version.

Closes https://github.com/pytorch/pytorch/issues/29823. Closes https://github.com/pytorch/pytorch/issues/30627.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30758

Differential Revision: D18820189

Pulled By: bddppq

fbshipit-source-id: fcc9fa657897e18c07da090752c912e3be513b17
2019-12-04 16:43:21 -08:00
569ea63f3b fix anynonzero op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29423

Test Plan: Imported from OSS

Differential Revision: D18820523

fbshipit-source-id: 55c7a1911121f0aed008bd684b448151bbbf0a8a
2019-12-04 16:40:43 -08:00
1d8a13147c Updating submodules
Summary:
GitHub commits:

1e345af4de
61d54df22c
dab87e19bf

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 88e55e94c7473a7a310338eaaf508e7fc71e0df6
2019-12-04 16:40:39 -08:00
cd032c7f6a Updating submodules
Summary:
GitHub commits:

b94ef9fb23
4462a7f00a
16e629c415
50770702ad
5b632a5deb
d2fa2cbcd6
4e152f651e
54c89b5f03

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 766783d00f8440c1264f13045ae6411233355af6
2019-12-04 14:56:01 -08:00
1707774417 AddConstant and findConstant for ClassType (#29217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29217

We want to preserve constant information in ClassType so that
users can access the constants in the module by name.
This is also used later for freezing some attributes (converting
attributes to constants).

Test Plan:
tbd

Imported from OSS

Differential Revision: D18799955

fbshipit-source-id: fbfbcd5d3f7f560368b96e2a87e270c822a3d03a
2019-12-04 14:17:13 -08:00
2308a0ec1b Improve documentation around builtin functions (#30347)
Summary:
This breaks the builtins page into some more sections and adds details about Python built-in functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30347

Pulled By: driazati

Reviewed By: wanchaol

Differential Revision: D18718166

fbshipit-source-id: bf43260ab7bcf92cccef684a5ce68cb16020771d
2019-12-04 13:50:40 -08:00
42e79d7e8a Kill THNN version of MultiMarginCriterion; it's not used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30725

Test Plan: Imported from OSS

Differential Revision: D18808767

Pulled By: gchanan

fbshipit-source-id: bcc4a6e272036f3d167fc158a53fe7aa1dec51f9
2019-12-04 13:46:32 -08:00
9d3402e4cb Add the __torch_function__ API override mechanism (#30730)
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). The original landed at the same time as other work that added new operators to the `torch` namespace, so the check that the `torch` namespace is exhaustively covered for overridability was triggering test failures.

I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730

Differential Revision: D18813270

Pulled By: ezyang

fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
2019-12-04 13:19:07 -08:00
289e9a07fd Move Tanh backward to Aten(CPU+CUDA) (#30224)
Summary:
VitalyFedyunin, this PR ports the Tanh backward function to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Tanh()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    bwd_t = 0
    for i in range(10000):
        output = m(input)
        t1 = _time()
        output.backward(grad_output)
        t2 = _time()
        bwd_t = bwd_t + (t2 - t1)
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d)  backwad avg time is %.2f (ms)." % (n, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) backward avg time is 0.12 (ms).
input size(128, 10000) backward avg time is 0.17 (ms).
CPU
input size(128, 100) backward avg time is 0.05 (ms).
input size(128, 10000) backward avg time is 0.35 (ms).
```
After:
```
GPU:
input size(128, 100) backward avg time is 0.12 (ms).
input size(128, 10000) backward avg time is 0.17 (ms).
CPU
input size(128, 100) backward avg time is 0.04 (ms).
input size(128, 10000) backward avg time is 0.25 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) backward avg time is 0.03 (ms).
input size(128, 10000) backward avg time is 1.85 (ms).
After:
input size(128, 100) backward avg time is 0.02 (ms).
input size(128, 10000) backward avg time is 1.16 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30224

Differential Revision: D18810045

Pulled By: VitalyFedyunin

fbshipit-source-id: ab37948ab8f76bdaf9f3d1388562eaf29dacc0ea
2019-12-04 12:55:33 -08:00
d38f9117fd Cache compilation of free functions (#30503)
Summary:
We don't have to recompile free functions if we've already compiled them.

Improves the compilation time of resnet18 by 27%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30503

Differential Revision: D18796501

Pulled By: eellison

fbshipit-source-id: 2dee0fc5fcf9adc5b92213f8cb813730d71b376f
2019-12-04 12:45:35 -08:00
9d69c55b0d add MaskedRowWiseSparseAdagrad
Summary: As title

Test Plan: buck test caffe2/caffe2/fb/optimizers:masked_adagrad_test

Reviewed By: chocjy

Differential Revision: D18736639

fbshipit-source-id: d0d73f75228604d3448651bff2cf34ecc21f9ba6
2019-12-04 12:36:09 -08:00
786de33832 Move scalar_check logic from codegen to code in NLLLoss. (#30670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30670

Also turn off scalar_check for grad_input: it isn't necessary because the input can't be 0-dimensional.

Test Plan: Imported from OSS

Differential Revision: D18784523

Pulled By: gchanan

fbshipit-source-id: 246d30970457075a0403dd0089317659a2cd2dd4
2019-12-04 12:30:23 -08:00
fa2aa245cf Simplify scalar_check of nll_loss. (#30669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30669

The inputs can't be 0-d, so we don't need that check in the scalar_check.

Test Plan: Imported from OSS

Differential Revision: D18784524

Pulled By: gchanan

fbshipit-source-id: d44222dffc91880a6e8c7be69e6e146e60040d43
2019-12-04 12:30:19 -08:00
6918f0ce86 Move scalar_check for total_weight in NLLLoss functions to code from codegen. (#30665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30665

total_weight is a "hidden" output just for autograd, so it's not user visible.  The existing test_nn tests cover this (I verified that the new code is executed) and this matches the CPU behavior.

Test Plan: Imported from OSS

Differential Revision: D18782709

Pulled By: gchanan

fbshipit-source-id: 6d1c20eeaeffa14d06f375b37f11e866587f5fa0
2019-12-04 12:30:14 -08:00
756f279d95 Rename QuantizeHelper to InsertQuantDeQuantHelper (#30549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30549

Preparing for later refactoring

Test Plan:
.

Imported from OSS

Differential Revision: D18802464

fbshipit-source-id: 0b5afb143549d93eed4c429125d3d5fd253093a9
2019-12-04 10:40:22 -08:00
f73cd28082 InsertObservers for shared class types (#30548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30548

ClassTypes can be shared among different module instances, but previously we assumed
they would be unique; this PR enables the insert_observers pass to work with shared class types.

Test Plan:
python test/test_jit.py
python test/test_quantization.py

Imported from OSS

Differential Revision: D18802465

fbshipit-source-id: b782e71e44a043af45577ac2b5c83e695155bb8b
2019-12-04 09:34:47 -08:00
6e145b4614 add irregular c10 op registration/invocation cases to test project (#30558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30558

Most c10 op registration/invocation cases are generated by the ATen codegen
following fixed patterns, but a handful of them were written
manually, mainly for quantized ops. This adds these "irregular" cases to the
test project to verify that the static code analyzer can handle them as well.

Test:
- build and run the test project;

Test Plan: Imported from OSS

Differential Revision: D18811098

Pulled By: ljk53

fbshipit-source-id: 7bdf17175dfec41c56c0d70f124cc96478135bc4
2019-12-04 08:46:00 -08:00
a55f125e3b Check the error return of nvrtcGetProgramLogSize and nvrtcGetProgramLog (#30663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30663

Yes they can fail.  See https://github.com/ROCm-Developer-Tools/HIP/issues/1706

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18810088

Pulled By: ezyang

fbshipit-source-id: 96186e71c9a195bdbbed811e7ba8dc40bec09eae
2019-12-04 08:37:43 -08:00
ca072951d5 move MaskedAdagrad to caffe2/operators/experimental/optimizers (#30714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30714

Move Masked*Adagrad operators so caffe2/python/optimizer.py can use them.

Test Plan: buck test caffe2/caffe2/operators/experimental/optimizers:masked_adagrad_test

Reviewed By: chocjy

Differential Revision: D18805532

fbshipit-source-id: 49b1f755b31296c62e7a6a8134313b962ad9690c
2019-12-04 08:29:13 -08:00
d0af07ca4c Fix capitalization inconsistency in optim.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30608

Differential Revision: D18808516

Pulled By: ezyang

fbshipit-source-id: 4be68be9a8c8c3da7a0b98162bc1050b588fab43
2019-12-04 08:17:03 -08:00
38986e1dea Split libtorch.so back into libtorch_{cpu,cuda,hip} (#30315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.

Some things of note:

* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archived linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. I am not too sure why the old code did it this way in the first place, but switching it doesn't seem to have broken anything.
* There's some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18790941

Pulled By: ezyang

fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
2019-12-04 08:04:57 -08:00
1189595875 Fix Tensor.argsort -> torch.argsort documentation link
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30464

Differential Revision: D18717657

Pulled By: zou3519

fbshipit-source-id: 9894f63c6cb1b5311117441e78805230d1bc09f3
2019-12-04 07:49:38 -08:00
b8792c0438 Revert D18645954: add __torch_function__ API override mechanism
Test Plan: revert-hammer

Differential Revision:
D18645954

Original commit changeset: 54b5e4344d7a

fbshipit-source-id: 4a7aebb483e6b001130d6f384ccc53c5a808ab13
2019-12-04 07:41:47 -08:00
a68b790293 fix ref to nonexistent torch.repeat
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30614

Differential Revision: D18808517

Pulled By: ezyang

fbshipit-source-id: 27f9bda6fbbd1c3c751a0e96fdc336bf724c0b31
2019-12-04 07:27:01 -08:00
ec7bb9de1c format tri[lu]_indices doc better
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30377

Differential Revision: D18689152

Pulled By: zou3519

fbshipit-source-id: 7fab1e39ecd39ef6a3869befcbe217f8d3b6a87e
2019-12-04 07:16:34 -08:00
d6ca93b353 add doc for F.softplus
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30055

Differential Revision: D18762624

Pulled By: zou3519

fbshipit-source-id: 61da88cbb8cd0f37ac26b0fb8aaacdbe85c724ba
2019-12-04 07:16:30 -08:00
d12786b24f add __torch_function__ API override mechanism (#27064)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24015 (see description of that issue for more details).

For a toy example, see the `DiagonalTensor` and `SubDiagonalTensor` classes in test/test_overrides.py.

This PR currently contains:

* tests for `__torch_function__` behavior
* modifications to `gen_python_functions` to parse function signatures and dispatch to the correct overloaded argument.

This feature is inspired by and analogous to NumPy's `__array_function__` protocol ([see NumPy Enhancement Proposal 18](https://numpy.org/neps/nep-0018-array-function-protocol.html#trying-array-function-methods-until-the-right-one-works)).

### Benchmarks:
See Nathan's comment below: https://github.com/pytorch/pytorch/pull/27064#issuecomment-554601189
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27064

Differential Revision: D18645954

Pulled By: ezyang

fbshipit-source-id: 54b5e4344d7afdbcf996bb57191b0bdadc7b1767
2019-12-04 05:56:46 -08:00
c0299d2707 add LLVM code analyzer in order to replace static dispatch
Summary:
[Why static dispatch]
Static dispatch was introduced to allow stripping out unused ops at link
time (with “gc-sections” linker flag) for mobile build.

The alternative approaches to do "non-static" dispatch are:
* virtual methods - old ATen dispatcher, which has already been deprecated;
* registry pattern - used by caffe2, c10 and JIT;

However, none of them are “gc-sections” friendly. Global registrations are
root symbols; the linker cannot strip out any op if we use the registry pattern
for mobile.

[Why static dispatch isn’t great]
* One more code path to maintain;
* Need to recompile the framework to add new backends/ops;
* Doesn't support autograd yet, which blocks on-device training;

[Static Code Analysis]
This PR introduces an LLVM analysis pass. It takes LLVM bitcode /
assembly as input and generates a dependency graph among ATen ops. From the
set of root ops used by a model, we can compute the transitive closure of
all dependent ops (sketched below), then ask codegen to register only these ops.
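
The closure computation itself is straightforward; a minimal sketch, assuming the analyzer has already emitted the dependency graph as an adjacency map:
```python
def transitive_closure(graph, roots):
    needed, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed.add(node)
        stack.extend(graph.get(node, ()))
    return needed

# toy graph: quantized::add transitively needs aten::empty
graph = {"quantized::add": {"aten::empty"}, "aten::empty": set()}
print(transitive_closure(graph, {"quantized::add"}))
```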

[Approach]
To generate the dependency graph it searches for 3 types of connections in
LLVM bitcode / assembly:
 1) op registration: op name (schema string literal) -> registered function;
 2) regular function call: function -> function;
 3) op invocation: function -> op name (schema string literal)

For 2) it uses similar algorithm as llvm::LazyCallGraph - not only looks into
call/invoke instructions but also recursively searches for function pointers
in each instruction's operands.

For 1) and 3) it searches for connections between operator name string
literals / function pointers and c10 op registration/invocation API calls in
LLVM IR graph via "use" edges (bi-directional):
 1. llvm::Value has "users()" method to get other llvm::Value nodes that use
    the value;
 2. most of types derive from llvm::User which has "operands()" method to get
    other llvm::Value nodes being used by the value;

[Limitation]
For now the search doesn't go beyond the function boundary because the
references to op name string literals and the c10 op registration/invocation
APIs are almost always in the same function.

The script uses regular expression to identify c10 API calls:
* op_schema_pattern="^(aten|quantized|profiler|_test)::[^ ]+"
* op_register_pattern="c10::RegisterOperators::(op|checkSchemaAndRegisterOp_)"
* op_invoke_pattern="c10::Dispatcher::findSchema|callOp"

If we create helper functions around the c10 API (e.g. the "callOp" method
defined in aten/native), we can simply add them to the regular expressions
used to identify c10 API calls.

[Example]
In the following example, it finds out:
 1) the registered function for the "quantized::add" operator;
 2) one possible call path to at::empty() function;
 3) the called operator name "aten::empty":

- "quantized::add"
- c10::detail::wrap_kernel_functor_unboxed_<at::native::(anonymous namespace)::QAdd<false>, at::Tensor (at::Tensor, at::Tensor, double, long)>::call(c10::OperatorKernel*, at::Tensor, at::Tensor, double, long)
- at::native::(anonymous namespace)::QAdd<false>::operator()(at::Tensor, at::Tensor, double, long)
- void at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::operator()<at::Tensor&, at::Tensor const&, at::Tensor const&>(c10::DeviceType, at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::choose_cpu_impl()
- void at::native::(anonymous namespace)::qadd_kernel<false>(at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool)
- at::TensorIterator::build()
- at::TensorIterator::fast_set_up()
- at::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>)
- "aten::empty"

[How do we know it’s correct?]
* Built a test project that contains different op registration/invocation
  patterns found in pytorch codebase, including both codegen and non-codegen
  cases.
* Tried different optimization flags “-O0”, “-O3” - the result seems to
  be stable.
* Filtered by common patterns: “aten::”, “at::”, “at::native”,
  “at::CPUType”, “at::TypeDefault” - manually checked the relationship
  between function schema strings and corresponding implementations were
  captured.
* It can print instruction level data flow and show warning message if it
  encounters unexpected cases (e.g.: found 0 or multiple op names per
  registration/invocation API call, found 0 registered functions, etc).
* Verified consistent results on different Linux / macOS hosts. It can
  handle different STL library ABIs reliably, including rare corner cases
  for short string literals.

[Known issues]
* Doesn’t handle C code yet;
* Doesn’t handle overload name yet (all variants are collapsed into the
  main op name);

Test Plan:
```
LLVM_DIR=... ANALYZE_TEST=1 CHECK_RESULT=1 scripts/build_code_analyzer.sh
```

Differential Revision: D18428118

Pulled By: ljk53

fbshipit-source-id: d505363fa0cbbcdae87492c1f2c29464f6df2fed
2019-12-04 01:02:33 -08:00
f5c9452beb Fix toObject() r-value version (#30713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30713

It should use moveToIntrusivePtr.
This function is very hot and used heavily in the interpreter loop, e.g. for
GET_ATTR and SET_ATTR. Making a copy and doing an incref/decref caused significant overhead.

Reviewed By: yinghai

Differential Revision: D18805212

fbshipit-source-id: 3a9368604f71638a21300ad086739c4b50f0644e
2019-12-04 00:19:35 -08:00
d456a538f9 op dependency analysis bash driver
Summary:
Move the shell script into this separate PR to make the original PR
smaller and less scary.

Test Plan:
- With stacked PRs:
1. analyze test project and compare with expected results:
```
ANALYZE_TEST=1 CHECK_RESULT=1 tools/code_analyzer/build.sh
```

2. analyze LibTorch:
```
ANALYZE_TORCH=1 tools/code_analyzer/build.sh
```

Differential Revision: D18474749

Pulled By: ljk53

fbshipit-source-id: 55c5cae3636cf2b1c4928fd2dc615d01f287076a
2019-12-04 00:12:24 -08:00
7e472679ff pin actions/checkout version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30703

Test Plan: Imported from OSS

Differential Revision: D18805447

Pulled By: suo

fbshipit-source-id: d58ebe0e90b81c9282d3977f36c53c54cac750d9
2019-12-03 20:52:54 -08:00
b26401f965 Dump operator names of a script module (#30467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30467

Introduces the function jit.export_opnames(module), which returns a list of all operator names used in the module and its submodules. One use is to have the mobile custom build link only the operators in the returned list, to reduce the mobile binary size.

Example:
import torch
m = torch.jit.load("example.pt")
print(torch.jit.export_opnames(m))

The outputs are in alphabetical order:
['aten::_convolution', 'aten::add.Tensor', 'aten::add_.Tensor', 'aten::addmm', 'aten::append.Tensor', 'aten::cat', 'aten::dropout', 'aten::embedding', 'aten::matmul', 'aten::max.dim', 'aten::mul.Tensor', 'aten::permute', 'aten::relu', 'aten::t', 'aten::tanh', 'prim::ListConstruct', 'prim::TupleConstruct', 'prim::TupleUnpack']

Test Plan: Imported from OSS

Differential Revision: D18801619

Pulled By: iseeyuan

fbshipit-source-id: f9b198d3e82b095daf704ee595d8026ad889bb13
2019-12-03 20:20:33 -08:00
63a1542ed2 Adding Debug Info for RRef Context
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30610

Test Plan: Imported from OSS

Differential Revision: D18763592

Pulled By: mrshenli

fbshipit-source-id: ad8854bdb6250c29eaa0f582d66cfd31394312e5
2019-12-03 19:16:31 -08:00
6dda241ab8 Add RRef.__str__() API
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30609

Test Plan: Imported from OSS

Differential Revision: D18763593

Pulled By: mrshenli

fbshipit-source-id: 20f1eea2d6cfe9ab2a27a9677d97dde07c1dca9b
2019-12-03 19:16:26 -08:00
bb5dcaf24f Add logical_and and logical_or (#30521)
Summary:
This relands the change, with the CI failure caused in 8bbafa0b32d2899ef6101172d62c6049427c977b fixed (the lambdas in the CUDA kernels had an incorrect return type).
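
Usage of the two new ops:
```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])
torch.logical_and(a, b)   # tensor([ True, False, False])
torch.logical_or(a, b)    # tensor([ True,  True,  True])
```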
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30521

Differential Revision: D18770151

Pulled By: ailzhang

fbshipit-source-id: 02f0fe1d5718c34d24da6dbb5884ee8b247ce39a
2019-12-03 18:24:54 -08:00
ab834d5093 Remove exp10 in TH (unused)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30422

Test Plan: Imported from OSS

Differential Revision: D18764280

Pulled By: VitalyFedyunin

fbshipit-source-id: 626b88a115f2efce4a53c6784f0a6660b36c97f9
2019-12-03 18:17:24 -08:00
76acf5b553 Remove many unused bfloat16 functions in TH
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30329

Test Plan: Imported from OSS

Differential Revision: D18764281

Pulled By: VitalyFedyunin

fbshipit-source-id: bc3f91c6d09d4f73c77fe1492a358128744aee76
2019-12-03 18:17:19 -08:00
4ac614191a Remove exp10 in TH (unused)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30422

Test Plan: Imported from OSS

Differential Revision: D18764186

Pulled By: VitalyFedyunin

fbshipit-source-id: 9343a5a7e4edf61ba3b85eaf846b2e149ed6529a
2019-12-03 18:17:15 -08:00
ea3697db69 inline to prevent duplicate obj when linking (#30363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30363

We were getting duplicate definition errors when linking the test.
ghstack-source-id: 94472892

Test Plan: CI passes

Differential Revision: D18669686

fbshipit-source-id: 3d3bfc38e4247cf8bea655537824b891b84f67bc
2019-12-03 15:59:25 -08:00
3cf8382984 detect_anomaly() for SparseTensors (#29803)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28649

1. Modified detect_anomaly() to use isnan()
2. isnan() for SparseTensors returns a bool Tensor of _values.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29803

Differential Revision: D18594299

Pulled By: ezyang

fbshipit-source-id: 3f4190c569f53219be330584fc604ca43c4a6c7a
2019-12-03 15:42:51 -08:00
fef4360536 remove default constructor in futureInfo (#30197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30197

This default constructor was added because std::map's operator[]
requires a default constructor. However, instead of using operator[], we can
use emplace and remove the constructor, to ensure that the FutureInfo struct
doesn't get constructed with garbage values.
ghstack-source-id: 94802453

Test Plan: Unit tests pass.

Differential Revision: D18627675

fbshipit-source-id: c4cb000e60081478c0fd7308e17103ebbc4dc554
2019-12-03 15:36:22 -08:00
59151d3e43 autograd/profiler: support merging FunctionEventAvg (#30677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30677

Currently you can only add FunctionEvents to FunctionEventAvg. This makes it so you can add multiple FunctionEventAvg objects together. This is useful for merging multiple profiles together such as when dealing with distributed training.
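
A sketch of merging two profiles, assuming `FunctionEventAvg.add` now accepts another `FunctionEventAvg` as this change describes:
```python
import torch
from torch.autograd import profiler

def profile_once():
    with profiler.profile() as prof:
        torch.randn(100, 100).mm(torch.randn(100, 100))
    return prof.key_averages()  # EventList of FunctionEventAvg

merged = {}
for evt in list(profile_once()) + list(profile_once()):
    if evt.key in merged:
        merged[evt.key].add(evt)  # merge FunctionEventAvg objects
    else:
        merged[evt.key] = evt
```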

Test Plan:
added unit test

  buck test //caffe2/test:autograd -- test_profiler

Reviewed By: bddppq

Differential Revision: D18785578

fbshipit-source-id: 567a441dec885db7b0bd8f6e0ac9a60b18092278
2019-12-03 15:28:58 -08:00
dcd1216efe Force early initialization of OpenMP in forked children (#29006)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
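
Conceptually, the handler just has to touch an OpenMP parallel region in the child before any user code runs; a Python sketch of the idea (the actual fix lives in C++ inside PyTorch):
```python
import os
import torch

def _warm_up_openmp():
    # a reduction large enough to enter an OpenMP parallel region,
    # forcing libomp to initialize (and set affinity) right after fork
    torch.ones(1 << 20).sum()

if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=_warm_up_openmp)
```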
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006

Differential Revision: D18782456

Pulled By: ezyang

fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
2019-12-03 15:23:31 -08:00
a376dd344c Added check for torch.where on CPU that both arguments have same dtype (#30662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30662

Cherry picked from: https://github.com/pytorch/pytorch/pull/29081
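
An illustration of what the new check rejects (toy values):
```python
import torch

cond = torch.tensor([True, False])
a = torch.tensor([1.0, 2.0])   # float32
b = torch.tensor([3, 4])       # int64
# torch.where(cond, a, b)      # with this check, errors on CPU: mismatched dtypes
```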

Test Plan: Imported from OSS

Differential Revision: D18782295

Pulled By: nairbv

fbshipit-source-id: 897ab25ddf8819ca34f5e86c5d3f41debb56cb04

Co-authored-by: ifedan
2019-12-03 15:19:52 -08:00
56dd2836ec Make zeros argument of torch.where same dtype as other argument (#30661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30661

Cherry-picked from https://github.com/pytorch/pytorch/pull/29080

Test Plan: Imported from OSS

Differential Revision: D18781870

Pulled By: nairbv

fbshipit-source-id: 9de85aa91bf7e0856f35c7c6238a8923315ed27f

Co-authored-by: ifedan
2019-12-03 15:19:48 -08:00
2ba03e0287 Enable test_trainer_ps in dist_autograd_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30341

Test Plan: Imported from OSS

Differential Revision: D18769574

Pulled By: mrshenli

fbshipit-source-id: caf25742fa1fc9dbf6486f5ec981fae3f29784bc
2019-12-03 15:12:36 -08:00
d4c25add45 make sure the counter stays correct in between bailout transitions (#30186)
Summary:
This fixes the second issue reported in https://github.com/pytorch/pytorch/issues/29909, namely that a loop counter is assigned the wrong value after transitioning to a bailout graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30186

Differential Revision: D18646845

Pulled By: Krovatkin

fbshipit-source-id: 1f7c601dd9f35892979385ffa132fb0886a4f203
2019-12-03 14:59:08 -08:00
03a73cb9ac Remove namespace F = torch::nn::functional from torch/nn/modules/batchnorm.h (#30684)
Summary:
This PR removes `namespace F = torch::nn::functional` from `torch/nn/modules/batchnorm.h`, so that people don't have to define `torch::nn::functional` as `F` if they don't want to.

Fixes https://github.com/pytorch/pytorch/issues/30682.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30684

Differential Revision: D18795717

Pulled By: yf225

fbshipit-source-id: c9feffbeb632cc6b4ce3e6c22c0a78533bab69ad
2019-12-03 14:52:23 -08:00
604a27361f remove tuple_parser (#30659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30659

I could only find one usage of TupleParser and it doesn't seem worth maintaining just for that one usage.

Test Plan: Imported from OSS

Differential Revision: D18795979

Pulled By: nairbv

fbshipit-source-id: 6e50d65fc8fade0944f36ab20d00f1539a3d4cb8
2019-12-03 14:49:59 -08:00
4d4d8e0dce Update persons_of_interest.rst (#30647)
Summary:
Adding back the 3 names for the MSFT team - re: ONNX Governance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30647

Differential Revision: D18781163

Pulled By: jlin27

fbshipit-source-id: 7284ba29841ab41b9807c9d92694630b50de7b6a
2019-12-03 14:46:15 -08:00
4e6379379c fetch before checking out PR tip
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30680

Test Plan: Imported from OSS

Differential Revision: D18796189

Pulled By: suo

fbshipit-source-id: 99da48e5fd510ffdf4e606c2393eb55d4f6ca8d5
2019-12-03 14:43:19 -08:00
980aead1f8 Add support for quantized slice conversion (#30498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30498

Updated Int8SliceOp to accept dim, start and end index, similar to PyTorch.

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_slice

Imported from OSS

Differential Revision: D18740519

fbshipit-source-id: 2313f37a4936edb150ce04911b241e591e191801
2019-12-03 14:37:59 -08:00
bc2e6d10fa Back out "Revert D17908478: Switch PyTorch/Caffe2 to C++14"
Summary: Original commit changeset: 775d2e29be0b

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D18775520

fbshipit-source-id: a350b3f86b66d97241f208786ee67e9a51172eac
2019-12-03 14:33:43 -08:00
aff693ab1c Ensure MIOpen is called on same stream as operator for RNN (#30672)
Summary:
To ensure synchronization between the copying of weights into the RNN weight buffer and the operation itself, both the PyTorch operator and the underlying MIOpen call must be on the same HIP stream. This is also consistent with MIOpen calls in other PyTorch operators.

ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30672

Differential Revision: D18785683

Pulled By: bddppq

fbshipit-source-id: 144611046cb70cfe450680295734203f253ac6e2
2019-12-03 14:28:45 -08:00
40146eb48e Skip ProcessGroupGlooAyncTest if there is no CUDA available (#30345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30345

Skip ProcessGroupGlooAyncTest if there is no CUDA available; otherwise, on Sandcastle non-GPU hosts the test will abort, failing to load the CUDA library.
ghstack-source-id: 94771241

Test Plan: test skipped on non GPU host

Differential Revision: D18665322

fbshipit-source-id: 8c7b89aeecc6ec007bee12d864a6058384254e61
2019-12-03 13:27:34 -08:00
19cd90d303 Globally record observer nodes (#30547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30547

att

Test Plan:
test_jit.py test_quantization.py

Imported from OSS

Differential Revision: D18784752

fbshipit-source-id: 000e140aa86ff12a240d98da71871a5a5053401f
2019-12-03 12:16:00 -08:00
1b5ce05924 don't use size()/stride() functions in TensorImpl, use size_[d]/stride_[d] instead (#30452)
Summary:
This improved a multi-d microbenchmark by ~100 ns; empty_tensor_restride used to be 13% of iteration time and is now about 5%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30452

Test Plan: Covered by existing tests

Differential Revision: D18704233

Pulled By: ngimel

fbshipit-source-id: be527f09183bc31e9d1f63fd49bfbe0998fe167f
2019-12-03 11:38:07 -08:00
7023e13fbb Fix mapping white list (#30636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30636

Currently DeQuantStub is still in the whitelist because set union has
lower precedence than set difference.
Fixes issue: https://github.com/pytorch/pytorch/issues/29646
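
A minimal illustration with hypothetical sets; in Python, `-` binds tighter than `|`, so `a | b - c` parses as `a | (b - c)`:

```
base = {"Linear", "DeQuantStub"}
extra = {"Conv2d"}

wl = base | extra - {"DeQuantStub"}    # only `extra` gets filtered
assert "DeQuantStub" in wl             # the bug: DeQuantStub survives

wl = (base | extra) - {"DeQuantStub"}  # intended behavior
assert "DeQuantStub" not in wl
```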

Test Plan:
verified locally that we don't attach qconfig for DeQuantStub

Imported from OSS

Differential Revision: D18775275

fbshipit-source-id: 8da07e40963555671b3d4326c9291706103f858e
2019-12-03 11:34:28 -08:00
f114c33e69 Fix iOS CI (#30327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30327

### Summary

Seems like starting from macOS 10.15, we can no longer get access to the `Downloads` folder in our macOS machines.

```
permissionError: [Errno 1] Operation not permitted: '/Users/distiller/Downloads'
```

The fix is to change the conda download directory to ${HOME}

### Test Plan

- iOS jobs are back to normal
- Don't break other jobs

Test Plan: Imported from OSS

Differential Revision: D18717380

Pulled By: xta0

fbshipit-source-id: cad754076bf4ae5035741aa57a310ad87c76726e
2019-12-03 11:24:21 -08:00
1b12fd33ed Add missing trigamma_stub definition. (#30314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30314

Somehow we forgot to define it!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18762356

Pulled By: ezyang

fbshipit-source-id: 28afc605ad986266071e3831049ec8a7f71fd695
2019-12-03 10:46:52 -08:00
a009fc14be Workaround hcc bug regarding extern "C" definitions (#30313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30313

See comments in code about the bug.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18762360

Pulled By: ezyang

fbshipit-source-id: 406a01f2f0c3722b381428c89afd67b3c3c19142
2019-12-03 10:46:48 -08:00
8269f7b652 Delete redundant THC_API on THCStorage_new (#30312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30312

It's not necessary because it's already defined in the header.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18762363

Pulled By: ezyang

fbshipit-source-id: 418bf355d460dd171ac449559f20bf55415e54ae
2019-12-03 10:46:43 -08:00
d43e205026 Properly include declaration of dispatch in file that registers it. (#30311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30311

multinomial_stub must be in scope to register against it.  Somehow,
this works today, but when I split torch_cpu and torch_cuda it
doesn't.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18762358

Pulled By: ezyang

fbshipit-source-id: ef9c111292cd02d816af1c94c8bbaadabffaabe5
2019-12-03 10:46:38 -08:00
a5b1f6e7d7 Add missing _API definitions. (#30310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30310

- Annotate CUDAGenerator.h with correct TORCH_CUDA_API.
  This is actually CUDA related functionality with its implementation living
  in the cuda/ folder.  For some reason it lives at the top level; it
  should be moved (but that should be handled in another PR.)
- Add missing TORCH/CAFFE_API annotations.  All of
  these functions are used from CUDA code, which means that
  we need to correctly annotate them if we split CPU/CUDA code
  into separate libraries.

Test Plan: Imported from OSS

Differential Revision: D18762357

Pulled By: ezyang

fbshipit-source-id: c975a8e4f082fe9f4196c2cca40977623caf4148
2019-12-03 10:46:32 -08:00
08394cede3 DEFINE_DISPATCH in the correct namespace. (#30308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30308

Dispatch is declared in non-anonymous namespace, so it definitely
shouldn't be defined in an anonymous namespace.  This doesn't seem
to matter today, but it matters when we split libtorch into two
libraries.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18762361

Pulled By: ezyang

fbshipit-source-id: 484f0fab183c385dd889db9dad3e48e92e0a3900
2019-12-03 10:46:27 -08:00
9740011f10 Use normal dispatch to get to CUDA threshold kernels, instead of DispatchStub. (#30307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30307

DispatchStub will stop working when I split CPU/CUDA libraries, because
there are some symbols from the templates in DispatchStub stubs which aren't
properly exported and I couldn't figure out how to make them dispatch properly.

This is the only case where DispatchStub is being used to dispatch to CUDA,
anyway.

This partially addresses #29844 but I need to also just completely delete
the CUDA registration logic from DispatchStub entirely.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18762362

Pulled By: ezyang

fbshipit-source-id: bdfa8739c0daf23badf3c5af61890a934af00813
2019-12-03 10:46:22 -08:00
a997f224ac Add torch.multiprocessing.create_processes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28493

Differential Revision: D18766066

Pulled By: ailzhang

fbshipit-source-id: 7f424c8fae3012be2416cf9bc72ee2dde40c1f89
2019-12-03 10:38:19 -08:00
4d30415f12 Add ONNX Scripting Conv Support (#30618)
Summary:
Convolution nodes are traced as aten::_convolution and are currently supported in ONNX.
Scripted convolution uses aten::conv<1,2,3>d, which is currently not supported in ONNX.
This PR adds the symbolics for aten::conv<1,2,3>d and aten::conv_transpose<1,2,3>d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30618

Reviewed By: hl475

Differential Revision: D18778145

Pulled By: houseroad

fbshipit-source-id: 4af0379f29974a1ce8443024d1d87b3eb8d2dd36
2019-12-03 10:28:38 -08:00
89be1a22d4 split getInvokedMethods (#30546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30546

factor out this function for later support of quantizing shared types

Test Plan:
test_jit.py, test_quantization.py

Imported from OSS

Differential Revision: D18776304

fbshipit-source-id: f5a736b0f69019cefe17ec4517da1ae5462f78e1
2019-12-03 10:11:57 -08:00
d5c136097a improve .view() performance (#30554)
Summary:
Improve .view() performance by not calling set_ and instead restriding the returned alias. This improves the performance of the .view() operation from ~500 ns to ~360 ns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30554

Test Plan: covered by existing tests

Differential Revision: D18759896

Pulled By: ngimel

fbshipit-source-id: 9757c93158bc55e9c87dc30ac3415ba8f8b849e5
2019-12-03 09:17:43 -08:00
5a484245d9 Change test_invalid_names test to only test constructor of WorkerInfo (#30620)
Summary:
This test seems to only check that we throw exceptions in the `WorkerInfo` constructor when invalid names are passed in, so I don't think we need to complicate it by initializing RPC and exposing ourselves to potential flakiness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30620

Differential Revision: D18766955

Pulled By: rohan-varma

fbshipit-source-id: 11643de4d57431e5f46e096c7766de3ab0b9b05a
2019-12-03 09:07:10 -08:00
f9f54201d3 Remove deprecated fromIvalue in RRefForkData
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30646

Test Plan: Imported from OSS

Differential Revision: D18777610

Pulled By: mrshenli

fbshipit-source-id: 7a749c1035e36bbb464332d3829fd53e2c6cf727
2019-12-03 09:01:40 -08:00
b446572997 TestCppExtension now removes /tmp/torch_extensions folder so that it can be used by other users in a multi-user environment. (#30095)
Summary:
Previous behaviour: a user runs tests from the `TestCppExtension` class, so `/tmp/torch_extensions` is created under their ownership and not removed afterwards;
another user's run of the same tests might then result in a 'Permission denied' exception upon deleting `/tmp/torch_extensions`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30095

Differential Revision: D18770234

Pulled By: ezyang

fbshipit-source-id: 4c6b972e4c4327a94c8b4bf6b0b9998a01c218bb
2019-12-03 07:44:27 -08:00
8b29701ae5 Turn off scalar_checks for _th_reciprocal. (#30436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30436

The underlying TH implementation is correct.

Test Plan: Imported from OSS

Differential Revision: D18699088

Pulled By: gchanan

fbshipit-source-id: e75a588ae4afb0506922ba98208546d5c0de623a
2019-12-03 07:04:53 -08:00
61798865e3 Turn off scalar_checks for torch.clamp. (#30435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30435

The underlying THC implementations are correct.

Test Plan: Imported from OSS

Differential Revision: D18699089

Pulled By: gchanan

fbshipit-source-id: f5d1319bf48eae36903296dad0b98ed80661f732
2019-12-03 07:04:47 -08:00
e5b947a3a8 Raise an error for is_signed on quantized types (#30527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30527

When we introduced dtype.is_signed we allowed for support of
quantized types, but we're not sure what the correct result should be.

See discussion at https://github.com/pytorch/pytorch/pull/29511
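
A minimal sketch of the new behavior (assuming the error surfaces as a RuntimeError):

```
import torch

print(torch.float32.is_signed)  # True
print(torch.int64.is_signed)    # True
try:
    torch.quint8.is_signed      # now raises instead of guessing an answer
except RuntimeError as e:
    print("is_signed is not defined for quantized dtypes:", e)
```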

Test Plan: Imported from OSS

Differential Revision: D18765410

Pulled By: nairbv

fbshipit-source-id: c87cfe999b604cfcbbafa561e04d0d5cdbf41e6d
2019-12-03 06:34:53 -08:00
18ec4632b3 Exclude undefined tensors in the result of Module::parameters() / named_paramters() / buffers() / named_buffers() (#30626)
Summary:
PR https://github.com/pytorch/pytorch/pull/30523 attempted to fix https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462, but the fix wasn't complete. This PR makes the following improvements:
1. Fixes https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462 properly by excluding undefined tensors in the result of `Module::parameters()` / `named_parameters()` / `buffers()` / `named_buffers()`, which mirrors the Python API behavior.
2. Audits all use sites of `Module::parameters_` / `buffers_` and change them to `Module::named_parameters(/*recurse=*/false)` / `named_buffers(/*recurse=*/false)` when appropriate, so that use sites of module parameters / buffers never need to worry about undefined tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30626

Differential Revision: D18777507

Pulled By: yf225

fbshipit-source-id: 55b64b69779e1186342efd3c44857f416334ed6b
2019-12-02 21:59:58 -08:00
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
0bebfe2143 Add the explicit per-tensor/per-channel quant info when we print the module (#30591)
Summary:
As the title says. We would like to explicitly distinguish the per-tensor/per-channel scheme when we print the module.

Here is an example for LeNet after applying per-channel dynamic quantization:

Before this PR:
```
FloatModel(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))
  (fc1): DynamicQuantizedLinear(
    in_features=800, out_features=500
    (_packed_params): LinearPackedParams()
  )
  (fc2): DynamicQuantizedLinear(
    in_features=500, out_features=10
    (_packed_params): LinearPackedParams()
  )
)
```

After this PR:
```
FloatModel(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))
  (fc1): DynamicQuantizedLinear(
    in_features=800, out_features=500, qscheme=torch.per_channel_affine
    (_packed_params): LinearPackedParams()
  )
  (fc2): DynamicQuantizedLinear(
    in_features=500, out_features=10, qscheme=torch.per_channel_affine
    (_packed_params): LinearPackedParams()
  )
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30591

Differential Revision: D18764366

Pulled By: jianyuh

fbshipit-source-id: e897ab42ace6b82b2a90729ba788313c7873de1a
2019-12-02 20:14:46 -08:00
4dab29a2bd Fix serialization memory lifetime issue. (#30603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30603

The Pickler object needs to be kept in scope until data is written out to the
final serialized string. tensorData in particular is a reference to memory
owned by the descoped Pickler object.

Noticed this by inspection. In practice, this potential read-after-free
is limited to non-CPU tensors, and any such use happened very soon after the free.
ghstack-source-id: 94756036

Test Plan: existing test suite at buck test mode/dev-nosan caffe2/test:rpc_fork

Differential Revision: D18760463

fbshipit-source-id: 9de890d66626aa48f13ca376dd9bd50b92e0cb00
2019-12-02 20:10:28 -08:00
db81e13d6b Fix TCPStoreTest and improve tcputils::connect() (#30354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30354

TCPStoreTest would time out since the TCPStore constructor for the
server would block the main thread waiting for workers. The workers themselves
were spawned later, once the server store was created. As a result, this test
would always time out.

To fix the test, I moved the server store to a thread so that the workers can
register with the server in parallel.

In addition to this made a few improvements to tcputils::connect. When
tcputils::connect() encountered an exception, it always looked at `errno` for
the error code. In some cases `errno` could be overwritten and the real error
code would be stored in `std::system_error`. As a result, I've modified the
code to look at the error code in `std::system_error` if we catch an exception
of that type.
ghstack-source-id: 94758939

Test Plan: waitforbuildbot

Differential Revision: D18668454

fbshipit-source-id: d5a3c57b066b094bfecda9a79d9d31bfa32e17f0
2019-12-02 19:52:34 -08:00
9e3d19412b Disable implicit conversion warning (#30529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30529

We started to see build failures for multiple services with the top-of-trunk LLVM compiler. The failures point to a warning, treated as an error, for implicit conversion from long to double. Per discussion on D18642524, I'm disabling this warning from the containing TARGET file. T58053069 is open for the code owner to track this - a proper source code fix and more unit tests are needed.

Test Plan: local build, sandcastle

Reviewed By: smessmer

Differential Revision: D18668396

fbshipit-source-id: 28c0ff3258c5ba3afd41a0053f9fe1b356a496a8
2019-12-02 18:30:03 -08:00
968c0d4a46 Add support for converting quantized AvgPool2d and Reshape operations (#30490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30490

Add symbolic mapping to Int8AvgPool2d and Int8Reshape op in C2

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps

Imported from OSS

Differential Revision: D18740520

fbshipit-source-id: 1606125500c4b549fbc984e7929b7fd5204396a0
2019-12-02 18:15:01 -08:00
2d0a4e42e9 Add barriers to fix flaky test_graph_for_py_nested_call and (#30624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30624

These tests were flaky since we would end up calling the 'verify'
methods before some of the RPCs were done. The `check_rpc_done` function might
not guarantee this, since set_rpc_done sets an appropriate flag in Python, which
causes `check_rpc_done` to pass. However, there are a few steps after that,
like attaching the send functions for the response of the RPC, that might not
have executed by then.
ghstack-source-id: 94781954

Test Plan: Run the tests 100 times.

Reviewed By: zhaojuanmao

Differential Revision: D18768786

fbshipit-source-id: a14c3f4b27de14fe5ecc6e90854dc52652f769b8
2019-12-02 18:12:28 -08:00
98ab55fc51 PRAGMA missing for clang (#30351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30351

Not sure what the proper fix is; clang is having trouble with the loop pragmas. This at least gets things compiling.
ghstack-source-id: 94458450

Test Plan: CI passes

Differential Revision: D18665812

fbshipit-source-id: b8a899ce4138010cbe308eaa2c0838dd9e15573f
2019-12-02 17:50:22 -08:00
9c02b88791 Add pickler support for Device (#30131)
Summary:
This PR adds (un)pickling support for `c10::Device`. It also adds `torch.device` as a type annotation for device attributes.
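
A minimal sketch of what this enables (the module and attribute names are illustrative):

```
import io
import torch

class M(torch.nn.Module):
    device: torch.device  # torch.device now works as an attribute annotation

    def __init__(self):
        super().__init__()
        self.device = torch.device("cpu")

    def forward(self, x):
        return x.to(self.device)

m = torch.jit.script(M())
buf = io.BytesIO()
torch.jit.save(m, buf)  # the Device attribute round-trips via the pickler
buf.seek(0)
torch.jit.load(buf)
```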
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30131

Pulled By: driazati

Differential Revision: D18664421

fbshipit-source-id: 64378fb42b2d1bbe2bd86259e5ed10f24b5d1e49
2019-12-02 17:43:08 -08:00
19b7d49fac Add TOC to CONTRIBUTING.md (#29671)
Summary:
This TOC is manually generated, but `CONTRIBUTING.md` seems
stable enough for that to be okay.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29671

Pulled By: driazati

Differential Revision: D18771604

fbshipit-source-id: 0d6c9c6cf1083d3be413219d3cead79c2fe5050b
2019-12-02 16:47:59 -08:00
569729527b Turn off scalar_checks for exp, cos, cosh, tan, atan, tanh, erf, erfc. (#30434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30434

These are all pointwise ops that are implemented correctly wrt shapes in THC.

Test Plan: Imported from OSS

Differential Revision: D18699087

Pulled By: gchanan

fbshipit-source-id: 82cb91b00c77bfaca75be497c87fc7ae52daf46c
2019-12-02 16:10:25 -08:00
9082123038 Back out "Back out "Revert D18542342: Boxed variable dispatch""
Summary: Original commit changeset: 7f3e32a6ee0c

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D18766763

fbshipit-source-id: 51bb7aac7cb7ce3df94681e838949e7a156e3ad9
2019-12-02 16:06:36 -08:00
3636cb0364 windows build (#30556)
Summary:
based on https://github.com/pytorch/pytorch/pull/28677
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30556

Differential Revision: D18764040

Pulled By: mingbowan

fbshipit-source-id: 53104636800f5887b74a82c154bc5e9603de9322
2019-12-02 14:54:22 -08:00
d32f261f16 make the order btw div and mul in adagrad update consistent (#30449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30449

There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad.
In this diff we first compute effective_lr = lr / (sqrt(moment) + epsilon) and then multiply by the gradient.
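
A hedged scalar sketch of the now-consistent ordering (names are illustrative; the real kernels are vectorized):

```
import math

def adagrad_update(param, grad, moment, lr=0.01, epsilon=1e-10):
    moment = moment + grad * grad
    # compute the effective learning rate first, then multiply by the gradient
    effective_lr = lr / (math.sqrt(moment) + epsilon)
    return param - effective_lr * grad, moment

p, m = 1.0, 0.0
p, m = adagrad_update(p, grad=0.5, moment=m)
```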

Test Plan: CI

Reviewed By: protonu

Differential Revision: D18703416

fbshipit-source-id: 2a8b2a3f5401466549561412bd22f07abac3c598
2019-12-02 13:53:38 -08:00
1111a6b810 Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/29095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30274

Differential Revision: D18762293

Pulled By: ezyang

fbshipit-source-id: d3d50c2dd12bcb678ab25fa708eb6587cc4b66f9
2019-12-02 12:19:58 -08:00
6deb41c88d Update magma to 2.5.1 for Windows and switch CUDA in CI to 9.2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30513

Differential Revision: D18764184

Pulled By: ezyang

fbshipit-source-id: 4992869fd6a89471a5d25eb6a9b44ad8eceb480f
2019-12-02 11:56:10 -08:00
b68d1fc316 add small input shapes to some ops (#30617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30617

as title

Test Plan: buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1 --operator add,as_strided,cat,chunk,fill,linear,matmul,split

Reviewed By: hl475

Differential Revision: D18764248

fbshipit-source-id: 510cf83542822acfa1b7b5e475b0cc7432f7ac19
2019-12-02 10:46:43 -08:00
8ee61e0be4 Fix CPU_INTEL flag error on windows (#30564)
Summary:
${CMAKE_HOST_SYSTEM_PROCESSOR} gets the processor name from `uname -p` on Linux and `%PROCESSOR_ARCHITECTURE%` on Windows.
1. %PROCESSOR_ARCHITECTURE% has a value in (AMD64|IA64|ARM64) for a 64-bit processor, and (x86) for a 32-bit processor.
2. `uname -p` has a value like "(x86_64|i[3-6]+86)".
We cannot tell an Intel CPU from other CPUs by ${CMAKE_HOST_SYSTEM_PROCESSOR}: it is the architecture, not the vendor.
E.g., an Intel i7-9700K CPU on Windows reports "AMD64".

reference:
[MSDN](https://docs.microsoft.com/zh-cn/windows/win32/winprog64/wow64-implementation-details?redirectedfrom=MSDN)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30564

Differential Revision: D18763031

Pulled By: ezyang

fbshipit-source-id: 11ae20e66b4b89bde1dcf4df6177606a3374c671
2019-12-02 08:43:01 -08:00
e6000a7c04 Temporarily disable test_numerical_consistency_per_tensor (#30600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30600

test_numerical_consistency_per_tensor in test_fake_quant is failing on Windows.
ghstack-source-id: 94742124

Test Plan: CircleCI tests

Differential Revision: D18760287

fbshipit-source-id: 7f59355eab74e811bb370ad2836ed2f1def1f621
2019-12-02 06:57:14 -08:00
c780610f2d Disable test_backward_per_tensor in test_fake_quant (#30594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30594

This test case started breaking; disabling it to clean up the build.
ghstack-source-id: 94736837

Test Plan: Unittest disabling change

Differential Revision: D18758635

fbshipit-source-id: 05df1158ff0ccd75e401f352da529fb663b1cae0
2019-12-01 22:26:28 -08:00
53785771a7 Don't build test_cpp_rpc if torch is built without distributed support (#30587)
Summary:
On the latest master, I get link errors when building one of the tests:

```sh
/home/pbell/git/pytorch/build/../test/cpp/rpc/test_wire_serialization.cpp:23:
undefined reference to `torch::distributed::rpc::wireDeserialize(void const*, unsigned long)'
```

This seems to be caused by PR https://github.com/pytorch/pytorch/issues/29785 not working with `USE_DISTRIBUTED=0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30587

Differential Revision: D18758625

Pulled By: jjlilley

fbshipit-source-id: 0ad0703acdbbac22bb4b8317370fbe2606fcb67e
2019-12-01 16:43:12 -08:00
dd52f50fc8 Add examples to RRef doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30516

Test Plan: Imported from OSS

Differential Revision: D18728183

Pulled By: mrshenli

fbshipit-source-id: af472ebed0e6dd0a85653b080abd3ac4d482bd26
2019-11-28 15:34:26 -08:00
30d70d5378 Make doc source format consistent in rpc/init.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30515

Test Plan: Imported from OSS

Differential Revision: D18728184

Pulled By: mrshenli

fbshipit-source-id: 7b643c7f8225943113fbd7130ff6aadb30c1d4e9
2019-11-28 15:34:22 -08:00
ec5e471647 Reorganize rpc API doc and add introduction (#30491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30491

Our RPC API docs present the APIs well but miss a general
introduction to the APIs. Readers might be a little lost the first
time they land on this page. This commit reorganizes the APIs into
four components from the user's perspective: RPC, RRef, dist autograd,
and dist optimizer. It also adds an intro to each and briefly
describes why we provide them.

Test Plan: Imported from OSS

Differential Revision: D18723294

Pulled By: mrshenli

fbshipit-source-id: 4aced4ab537b070aa780aaaf9724659fd47cb3cb
2019-11-28 15:34:18 -08:00
f4e7e9039d Improve process_group_agent() serialization speed (#29785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29785

TLDR: This change improves process_group's serialization speed:
  Serialize_Tensor64:     12.38us ->   1.99us  (~-84%)
  Deserialize_Tensor64:   33.89us ->   5.62us  (~-84%)
  Serialize_Tensor1M:    525.74us -> 285.43us  (~-45%)
  Deserialize_Tensor1M:  892.61us -> 273.68us  (~-70%)

After speaking with the jit team, we had consensus that torch::save()/load()
are somewhat high-overhead for RPC serialization, mostly intended for
persistent disk data.

(Particularly, for large tensors, 35% of the time is spent in CRC checking, even
with the fb-side changes to substitute 40x faster SSE-accelerated CRC checking;
also, for small tensors, the zip container overhead is considerable, as is the
overhead of lexing/parsing an embedded text python program for each RPC).

The jit team encouraged us to use jit::pickler, with the WriteableTensorData
way of outputting result tensors (not the default side-tensor table, or
with pickling the actual tensors). This ends up just pickling some tensor
metadata, and giving us some tensor blobs that we can mindlessly
blit over the wire (they copy to cpu memory if needed).

There is as yet no standardized container format for the pickled data
(there is jit::pickle_save() checked in, but it's experimental;
no load function is yet provided), but they encouraged us to just use
something sensible for this, and possibly revisit later. For now, I made
the directory headers slightly http-inspired.

Note that serialization is just one component of the pipeline, but that
said, we also see reasonable reductions in end-to-end echo times (noisier):
   ProcessGroupAgent_Echo(Tensor_Small)   855.25us -> 492.65us  (~-42%)
   ProcessGroupAgent_Echo(Tensor_1M)       10.82ms -> 6.94ms    (~-35%)
   ProcessGroupAgent_Echo(Small_NoTensor) 688.82us -> 301.72us  (~-56%)
   ProcessGroupAgent_Echo(1MB_NoTensor)     4.65ms -> 3.71ms    (~-20%)

I moved the "wire serialization" logic to a separate file to assist with
unittesting.
ghstack-source-id: 94694682

Test Plan:
buck test mode/dev-nosan caffe2/test/cpp/api:serialize
  buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18493938

fbshipit-source-id: 07ddfe87dbe56472bc944f7d070627052c94a8f4
2019-11-28 09:57:52 -08:00
1350b99de4 Add local shutdown to process group agent (#30330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330

This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.
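
A hedged sketch of the call sequence under the new API split, using a single process just to show the ordering (the setup details are assumptions):

```
import os
import torch
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

rpc.init_rpc("worker0", rank=0, world_size=1)
ret = rpc.rpc_sync("worker0", torch.add, args=(torch.ones(2), 1))
rpc.shutdown()  # now tears down the agent; wait_all_workers no longer does
```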

ghstack-source-id: 94673884

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18661775

fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
2019-11-27 22:34:08 -08:00
7ac8efa689 Skip undefined tensors when moving torch::nn module to a different device (#30523)
Summary:
This fixes high-pri issues such as https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30523

Differential Revision: D18732904

Pulled By: yf225

fbshipit-source-id: fe5a7a43838000f5803bd9c01ecfba0c3f02df5d
2019-11-27 21:21:02 -08:00
640109ae5d Back out "Revert D18542342: Boxed variable dispatch"
Summary: Original commit changeset: 082992125447

Test Plan: waitforsandcastle

Reviewed By: akinh

Differential Revision: D18737627

fbshipit-source-id: 7f3e32a6ee0c330002ae7fdcc8a35e8b540bb4db
2019-11-27 17:39:09 -08:00
87f29557bd Ignore logical_and and logical_or in op BC check for now (#30537)
Summary:
Get the CI happy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30537

Reviewed By: hl475

Differential Revision: D18738567

Pulled By: houseroad

fbshipit-source-id: f30a87e22653b83ebdb1b54851460ec245866ecf
2019-11-27 16:59:37 -08:00
a2ed50c920 Revert D17908478: Switch PyTorch/Caffe2 to C++14
Test Plan: revert-hammer

Differential Revision:
D17908478

Original commit changeset: 6e340024591e

fbshipit-source-id: 775d2e29be0bc3a0db64f164c8960c44d4877d5d
2019-11-27 14:57:05 -08:00
0b25371f5d Turn off scalar_check for _th_normal.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29955

Test Plan: Imported from OSS

Differential Revision: D18548051

Pulled By: gchanan

fbshipit-source-id: c652999ac9e37d2592aa85ef022040fe0700b5cf
2019-11-27 14:52:06 -08:00
f3631c2464 Revert D18542342: Boxed variable dispatch
Test Plan: revert-hammer

Differential Revision:
D18542342

Original commit changeset: a30ae35d98f8

fbshipit-source-id: 082992125447c814c90f7934fadf00995e146e0e
2019-11-27 14:01:40 -08:00
7d2b0aa693 add retries to network operations (curl, conda install, git clone) (#30479)
Summary:
Addresses some of the top network-related flakiness occurrences.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30479

Differential Revision: D18736386

Pulled By: kostmo

fbshipit-source-id: 9eb5dca0cd0281894a0b304fbaf59a0341d3ff58
2019-11-27 13:58:15 -08:00
c1c5622a6a Add katex to pytorch-linux-xenial-py3.6-gcc5.4 docker image (#30522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30522

This is in preparation for moving the docs push CI jobs to depend on
`pytorch-linux-xenial-py3.6-gcc5.4` rather than
`pytorch-linux-xenial-cuda9-cudnn7-py3`.

Test Plan: Imported from OSS

Differential Revision: D18731108

Pulled By: zou3519

fbshipit-source-id: fd753a5ca818fa73a14e4276c33368a247cc40e1
2019-11-27 12:41:58 -08:00
a69be8123a Use gettimeofday on iOS (#30361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30361

### Summary

By default, the compiler will choose `clock_gettime` for the iOS build. However, that API is not available until iOS 10. Since the Facebook app still supports iOS 9.0,  we have to use `gettimeofday` instead.

```shell
xplat/caffe2/torch/csrc/autograd/profiler.h:86:3: error: 'clock_gettime' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]

xplat/caffe2/torch/csrc/autograd/profiler.h:86:17: error: '_CLOCK_MONOTONIC' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]
```

P.S. the open-sourced version is iOS 12.0 and above, so we don't have this problem.

### Test Plan

- buck build works
- Don't break CIs

Test Plan: Imported from OSS

Differential Revision: D18730262

Pulled By: xta0

fbshipit-source-id: fe6d954b8d3c23cbc9d1e25a2e72e0b0c1d4eaa9
2019-11-27 11:48:41 -08:00
2f42488d36 Updating submodules
Summary:
GitHub commits:

64dc8e79e9
3b2aa3c218
dc6c17ca9e
4508ea4e06
6150034ff3
12b7a89a4b
9befbe9b40
2fd96cc070
68bf04ce46
19bd96d453
7229ad4fd7
b2bb2b465b
4c65c9023d

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: e7dc6a4ebafdc6a01aff89f4038f5679ed6e7011
2019-11-27 11:44:54 -08:00
106ab487eb fix typo in doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30518

Differential Revision: D18729361

Pulled By: albanD

fbshipit-source-id: 4e386b99e898b9cd8f9a21dff642d0f40355899f
2019-11-27 11:19:13 -08:00
fcb7371e65 Update docs for cpp_extension on Windows (#30392)
Summary:
Targets https://github.com/pytorch/pytorch/issues/30379.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30392

Differential Revision: D18730438

Pulled By: albanD

fbshipit-source-id: f718d006ee8aaaa356c1e15e53a0469f15e8ed41
2019-11-27 10:56:29 -08:00
d0acc9c085 Switch PyTorch/Caffe2 to C++14 (#30406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30406

ghstack-source-id: 94642238

Test Plan: waitforsandcastle

Differential Revision: D17908478

fbshipit-source-id: 6e340024591ec2c69521668022999df4a33b4ddb
2019-11-27 10:47:31 -08:00
ec5c08de74 Revert D18580867: Add logical_and and logical_or
Test Plan: revert-hammer

Differential Revision:
D18580867

Original commit changeset: 7e4d7c37da4d

fbshipit-source-id: 81fb604c7aef8d847f518f5faa016e7bd0423016
2019-11-27 09:27:00 -08:00
1e8ed021c6 Support logsoftmax with dim != -1 (#30433)
Summary:
PyTorch dim and ONNX axis have different meanings.
ONNX only supports log_softmax with dim = -1. Transpose must be added before and after log_softmax to support other cases.
This requires input rank to be known at export time.
Fixes https://github.com/pytorch/pytorch/issues/17918
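
A hedged sketch of the equivalent pattern the exporter emits for a non-last dim:

```
import torch
import torch.nn.functional as F

# ONNX LogSoftmax only reduces over the last axis, so dim=1 on a rank-3
# input is emulated by moving that axis to the end and back.
x = torch.randn(2, 3, 4)
y = F.log_softmax(x, dim=1)
y_via_transpose = F.log_softmax(x.transpose(1, -1), dim=-1).transpose(1, -1)
assert torch.allclose(y, y_via_transpose)
```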
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30433

Reviewed By: hl475

Differential Revision: D18723520

Pulled By: houseroad

fbshipit-source-id: d0ed3b3f051d08d46495a7abfa854edd120dca3a
2019-11-27 08:34:38 -08:00
0282c5ae69 Add helper to aggregate multiple process groups (#25768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25768

The round robin process group can be constructed from multiple other
process groups. Every collective call against this new process group
is delegated to the specified process groups in a round robin fashion.

Doing so may benefit performance when calling into multiple NCCL
process groups. Instead of adding support for round-robin usage of
NCCL communicators, we achieve the same without changing the NCCL
process group and adding this wrapper class.

The API to create this round robin process group is a bit harsh. If we
find it adds significant benefit we can revisit and make this a first
class citizen in the torch.distributed module.
ghstack-source-id: 94578376

Test Plan: The newly added test passes.

Reviewed By: chenyangyu1988

Differential Revision: D17226323

fbshipit-source-id: ec9f754b66f33b983fee30bfb86a1c4c5d74767d
2019-11-27 08:34:34 -08:00
1d3f3a1a0c Add pybind11 trampoline class for c10d.Store (#30415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30415

This enables subclassing of c10d.Store and implementing its interface in Python.
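
A hedged sketch of what the trampoline enables; the Python-side class and the subset of methods shown are illustrative:

```
import torch.distributed as dist

# An in-memory Store implemented in Python, made possible by the trampoline.
class DictStore(dist.Store):
    def __init__(self):
        super().__init__()
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

store = DictStore()
store.set("k", b"v")
assert store.get("k") == b"v"
```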
ghstack-source-id: 94586627

Test Plan: New tests passes.

Reviewed By: vladbelous

Differential Revision: D18693018

fbshipit-source-id: fa1eba4bd11cc09a3d6bf3f35369c885033c63c0
2019-11-27 08:34:29 -08:00
d2336edcfb Boxed variable dispatch (#29934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29934

Previously, when doing boxed dispatch (e.g. custom ops), the dispatcher manually removed the VariableTensorId flag before dispatching
because custom ops don't have variable kernels.
This is one of the blockers that prevented us from using the boxed dispatch mechanism for ops from native_functions.yaml because they define variable kernels and need them to be called for autograd.

This PR changes that. The dispatcher doesn't remove the VariableTensorId flag anymore.
Instead, to make custom ops work, we implement a variable fallback kernel that is called whenever no other variable kernel was found.
ghstack-source-id: 94618474

Test Plan: unit tests

Differential Revision: D18542342

fbshipit-source-id: a30ae35d98f89f7ae507151f55c42cfbed54a451
2019-11-27 08:34:25 -08:00
512c2a2df5 Enable constant folding (#29834)
Summary:
Set default do_constant_folding = True
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29834

Reviewed By: hl475

Differential Revision: D18588037

Pulled By: houseroad

fbshipit-source-id: b35c06161321629c886e177ea666eff31cebf06a
2019-11-27 08:34:20 -08:00
c1c8105de0 Make the warning of using SparseTensor in JIT less noisy
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30499

Test Plan: waitforsandcastle

Reviewed By: wanchaol

Differential Revision: D18705553

fbshipit-source-id: d6e16e3285a74a1c031a5312f7a690f1baf392f8
2019-11-27 08:34:16 -08:00
829499e626 avoid Formatting::print() when STRIP_ERROR_MESSAGES is set (#30451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30451

TORCH_CHECK takes __VA_ARGS__ so there is no need to concatenate strings
before calling it. This way it won't call Formatting::print() on the
tensor when STRIP_ERROR_MESSAGES macro is set. Formatting::print() calls
several specific tensor methods that bring in unnecessary inter-op
dependencies for static code analysis.

Test Plan: - builds

Differential Revision: D18703784

Pulled By: ljk53

fbshipit-source-id: 1c0628e3ddcb2fd42c475cb161edbef09dfe8eb5
2019-11-26 17:38:45 -08:00
2d6b2f39e9 Fix docs so that the example works (#30120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30120

The example given for functional conv2d didn't work. This diff fixes the example in the docs so that it works.

Fixes https://github.com/pytorch/pytorch/issues/29649
ghstack-source-id: 94601559

Test Plan: Tried the example locally

Differential Revision: D18604606

fbshipit-source-id: ff1a4f903e2843efe30d962d4ff00e5065cd1d7e
2019-11-26 17:38:40 -08:00
5ada5363fc GenericDict/List type use unshapedType() (#30428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30428

Reported issue https://discuss.pytorch.org/t/incomprehensible-behaviour/61710

Steps to reproduce:

```
class WrapRPN(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, features):
        # type: (Dict[str, Tensor]) -> int
        return 0
```

```
#include <torch/script.h>

int main() {
  torch::jit::script::Module module = torch::jit::load("dict_str_tensor.pt");

  torch::Tensor tensor = torch::rand({2, 3});
  at::IValue ivalue{tensor};
  c10::impl::GenericDict dict{c10::StringType::get(),ivalue.type()};
  dict.insert("key", ivalue);
  module.forward({dict});
}
```

The ValueType of `c10::impl::GenericDict` is taken from the first specified element, as `ivalue.type()`.
It fails the type check in `function_schema_inl.h` (`!value.type()->isSubtypeOf(argument.type())`)
because `DictType::isSubtypeOf` requires equal KeyType and ValueType, while the `TensorType`s are different.

Fix:
Use c10::unshapedType for creating Generic List/Dict

Test Plan: Imported from OSS

Differential Revision: D18717189

Pulled By: IvanKobzarev

fbshipit-source-id: 1e352a9c776a7f7e69fd5b9ece558f1d1849ea57
2019-11-26 17:38:36 -08:00
6bd8937aee FunctionParameter::set_default_str replace || with &&
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30471

Test Plan: Imported from OSS

Differential Revision: D18710958

Pulled By: pbelevich

fbshipit-source-id: 7e5339175c7e16cd975a90bf6b123df728045e4d
2019-11-26 17:38:31 -08:00
21d7532dfe Add more comment on NumPy detection in Python scripts.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30417

Differential Revision: D18716502

Pulled By: albanD

fbshipit-source-id: 0b1b86f882e0e24cb6845e4a44708048e7e3b4a8
2019-11-26 17:38:27 -08:00
8bbafa0b32 Add logical_and and logical_or (#28162)
Summary:
Superseding https://github.com/pytorch/pytorch/issues/24379 as type promotion has been implemented.

Close https://github.com/pytorch/pytorch/issues/24379
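
Quick usage sketch of the new ops:

```
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])
print(torch.logical_and(a, b))  # [True, False, False]
print(torch.logical_or(a, b))   # [True, True, True]
```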
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28162

Differential Revision: D18580867

Pulled By: ailzhang

fbshipit-source-id: 7e4d7c37da4dc8df87314bd4f1f6a7539e46586a
2019-11-26 17:38:22 -08:00
92e27c5e89 Flag to disable Variable
Summary:
using `buck build mode/opt mode/no-gpu //experimental/ngimel/benchmark_framework_overheads:cpp_benchmark`

```
devvm497.prn3.facebook.com:/data/users/bwasti/fbsource/fbcode $ ./cpp_benchmark --niter 10000
creating inputs, number of dimensions 1
starting op
benchmarking 10000 iterations
using cpp frontend
elapsed time per iteration 0.90638 us
```

```
devvm497.prn3.facebook.com:/data/users/bwasti/fbsource/fbcode $ ./cpp_benchmark --niter 10000 --disable_variable_dispatch
creating inputs, number of dimensions 1
starting op
benchmarking 10000 iterations
using cpp frontend
elapsed time per iteration 0.775436 us
```

Test Plan: let all tests run

Reviewed By: smessmer

Differential Revision: D18654276

fbshipit-source-id: 362812b2c87ec428448b2ac65baac45f492fdce4
2019-11-26 17:38:18 -08:00
4eff2f2007 Fix missing closing quotes in docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30448

Differential Revision: D18711396

Pulled By: zou3519

fbshipit-source-id: 6e35e0779716185791273eedca7a93667a6cda90
2019-11-26 17:38:13 -08:00
05a1644ce3 Fix BC for quantized linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30481

Test Plan: Imported from OSS

Differential Revision: D18714602

Pulled By: jamesr66a

fbshipit-source-id: d51206c22cf2446e98053446789c6324c0481321
2019-11-26 17:38:09 -08:00
976d91d30a Comment on a set of ops bound at the python layer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30420

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D18713999

Pulled By: eellison

fbshipit-source-id: 3a8d6e4431cbfe6a78ca047217c1c53c47403841
2019-11-26 17:38:04 -08:00
634f370c63 Add comment to ops bound at python layer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30419

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D18714000

Pulled By: eellison

fbshipit-source-id: 22ccb941b2db24031921f378c600e68fe70e1346
2019-11-26 17:37:59 -08:00
c5a6c4d6c9 Adding elementwise kernel also operating on index (#28175)
Summary:
This PR adds `gpu_kernel_with_index` as an addition to the element-wise kernel template. It allows a kernel to operate not only on input tensor values, but also on each value's index (viewed as 1-d, so from 0 to numel) within the lambda.
The direct use case here is to replace thrust::tabulate, used in range/arange/linspace. Benefits are:
- thrust::tabulate causes additional unnecessary synchronization on the CPU.
- It now works with the tensor iterator; the output no longer needs to be contiguous, and a memcpy is saved.

It can also potentially be reused to add new functions to PyTorch later, if we see use cases where both value and index are needed (for example, unifying tril/triu into tensor-iterator element-wise? other patterns?).

Known issues:
https://github.com/pytorch/pytorch/pull/23586 is needed to make the non-contiguous case work properly, since overlap needs to be checked. Currently a non-contiguous tensor falls into TOO_HARD. I could write a proper check in this file, but I figured using the existing method is better. jjsjann123
It does not work beyond 32-bit indexing, but thrust was erroring on those cases too. We could split the tensor in the caller to enable this. The index changes after a split, so it is easier for the caller to pass a different lambda, and harder for the template to handle it in general.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28175

Differential Revision: D18708649

Pulled By: ngimel

fbshipit-source-id: 382081c96f266ae7b61095fc1f2af41c6b210fa9
2019-11-26 17:37:55 -08:00
e9cc4a5942 Add @DoNotStrip to nativeNewTensor method. (#30472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30472

Add DoNotStrip to nativeNewTensor method.
ghstack-source-id: 94596624

Test Plan:
Triggered build on diff for automation_fbandroid_fallback_release.

buck install -r fb4a

Tested BI cloaking using pytext lite interpreter.

Obverse that logs are sent to scuba table:

{F223408345}

Reviewed By: linbinyu

Differential Revision: D18709087

fbshipit-source-id: 74fa7a0665640c294811a50913a60ef8d6b9b672
2019-11-26 12:16:33 -08:00
fec903ce00 Fix test case after get_qparams refactor (#30470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30470

att

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18710775

fbshipit-source-id: b1c7c0afbc538ff1d3e19c5d3d6bd425e4f94f06
2019-11-26 12:16:29 -08:00
b0871f211b Make all optimizers consistent so that they don't change gradients inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30257

Test Plan: Imported from OSS

Differential Revision: D18665461

Pulled By: albanD

fbshipit-source-id: cfdafef919468a41007881b82fd288b7128baf95
2019-11-26 12:16:25 -08:00
45880f4246 Change logging to remove the word "error" from info log
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30468

Reviewed By: xianjiec

Differential Revision: D18702959

fbshipit-source-id: a777445bea735dce89182dd95f38907963fab556
2019-11-26 12:16:21 -08:00
dcd9f49809 Specify ordering on singular values and eigenvalues output from torch… (#30389)
Summary:
….svd/symeig respectively

Changelog:
- Adds a note to the docstrings of both functions specifying the ordering

Fixes https://github.com/pytorch/pytorch/issues/30301
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30389

Differential Revision: D18707608

Pulled By: zou3519

fbshipit-source-id: b0f73631578f39a24fae9af4997c6491de8be9a8
2019-11-26 10:23:47 -08:00
dbce53fe32 Turn off scalar_check for _th_gather. (#29954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29954

The underlying op handles scalar_check correctly.

Test Plan: Imported from OSS

Differential Revision: D18548054

Pulled By: gchanan

fbshipit-source-id: a1b44afa80c2928b78abbfba8b8b5d3608ac0fd3
2019-11-26 10:23:42 -08:00
72ac45662b Turn off scalar_checks for torch.take. (#29953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29953

The underlying function handles it correctly.

Test Plan: Imported from OSS

Differential Revision: D18548055

Pulled By: gchanan

fbshipit-source-id: cc2d0ae37d9689423363d115c6a653cb64840528
2019-11-26 10:23:37 -08:00
79a830af56 Turn off scalar_check for Tensor.set_(Tensor) (#29952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29952

The underlying op handles the check correctly.

Test Plan: Imported from OSS

Differential Revision: D18548048

Pulled By: gchanan

fbshipit-source-id: 9ac6fde743408e59ccdfc61bd574ebe6e2862238
2019-11-26 10:23:33 -08:00
0febff36ac Export dynamic unbind/split and __getitem__ (#29136)
Summary:
In ONNX opset 11, a series of sequence ops were added. Operators that are related to Tensor[] in PyTorch can be exported using these sequence ops.
In this PR, unbind/split, which produce Tensor[], and __getitem__, which takes Tensor[] as input, are exported correctly to ONNX opset 11.
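
A minimal sketch of an export that exercises this (the model is illustrative):

```
import io
import torch

class M(torch.nn.Module):
    def forward(self, x):
        # unbind produces Tensor[]; __getitem__ consumes it
        return x.unbind(0)[1]

torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO(), opset_version=11)
```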
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29136

Reviewed By: hl475

Differential Revision: D18309222

Pulled By: houseroad

fbshipit-source-id: be12c96bf8d0a56900683ef579f1c808c0a1af21
2019-11-26 06:54:06 -08:00
2599b9b551 Add output_size argument to caffe2 Int8ResizeNearest (#30202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30202

The PyTorch Upsample operator has output_size as an argument.
For quantized tensor inputs we cannot get the input_size to calculate the width and height scale factor.
Instead we pass the output_size directly to caffe2 to calculate the scale factors.

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_upsample

Imported from OSS

Differential Revision: D18631478

fbshipit-source-id: 38a39129bc863f4ecf2293acc068e40ab7edc825
2019-11-26 06:54:02 -08:00
efe1859ad9 By default ignore RRef leaks during shutdown (#30217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30217

Before this commit, RRefContext throws an error if it detects any
RRef leak during shutdown. However, this requires applications to
make sure they have freed all references to RRefs in application
code, which can be a bad debugging experience for large
applications. Besides, this also relies on Python GC to free things
up in time, which might not always be true. After this commit,
RRefContext ignores leaked RRefs during shutdown, as shutdown
is called when the application has finished training and no longer
cares about local state. Hence, it should be OK to just ignore
those leaks and destroy the OwnerRRefs. If an application would like to
enforce no leaks, just set torch.distributed.rpc.api._ignore_rref_leak
to False.
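
For example, to opt back into strict leak checking:

```
import torch.distributed.rpc

# flag named in this commit; the default is now to ignore leaks
torch.distributed.rpc.api._ignore_rref_leak = False
```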

Test Plan: Imported from OSS

Differential Revision: D18632546

Pulled By: mrshenli

fbshipit-source-id: 2744b2401dafdd16de0e0a76cf8e07777bed0f38
2019-11-26 06:53:58 -08:00
06db5ad707 Provide names for operator nodes in ONNX exported graph. (#27342)
Summary:
The PyTorch exporter does not add any name to the ONNX operators in the exported graph. A common request is to add names to op nodes by default. This helps the readability of the graph in visualization tools such as Netron, or when the ONNX graph is printed as a string. It also helps with the debuggability of the ONNX graph.

Therefore this PR adds names to operators in the exporter. The names follow a simple format, <op_type>_<index>. Expect files for tests in `test/onnx/test_operators.py` have been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27342

Reviewed By: hl475

Differential Revision: D17790979

Pulled By: houseroad

fbshipit-source-id: 1eaae88b5f51f152735a2ff96e22827837e34d9d
2019-11-26 06:53:53 -08:00
584be86c3f Try exporting ONNX with force_outplace=False (#29466)
Summary:
This should resolve https://github.com/pytorch/pytorch/issues/29008. This flag has two effects on the tracer.
- Remove the trailing underscore for inplace operators, e.g. index_put_ ==> index_put. This is handled in utils.py separately as well.
- Add out as input for backward computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29466

Reviewed By: hl475

Differential Revision: D18422815

Pulled By: houseroad

fbshipit-source-id: 317b6a3c8a5751fe6fe49d7543e429d281ed0d6d
2019-11-26 06:53:49 -08:00
eccf42fd15 Bug fix: Handle missing keys in observer state dict during load (#30357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357

Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
ghstack-source-id: 94468814

Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.

Differential Revision: D18668517

fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
2019-11-26 06:53:45 -08:00
ab5774547a Add info about transitive dependencies in case of using local aars (#30128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30128

Preview: https://github.com/pytorch/pytorch/tree/gh/IvanKobzarev/23/head/android

Based on a user's issue: https://discuss.pytorch.org/t/android-somethings-went-wrong-with-pytorch-android-1-4-0-snapshot/61009/3

Test Plan: Imported from OSS

Differential Revision: D18702658

Pulled By: IvanKobzarev

fbshipit-source-id: 14928baccd58ddbe633fad03038271d8333c4b49
2019-11-26 06:53:40 -08:00
085dde5965 Fix for when PyTorch model trace has RecursiveScriptModules (#30430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30430

When a module isn't a TracedModule, attempt to get name information via the `original_name` property on the module, defaulting to 'Module' when no such property exists.

Test Plan:
### Change child module to scripted module:
```
model = torchvision.models.alexnet()
model.classifier = torch.jit.script(model.classifier)
```
### Add graph
```
w = SummaryWriter()
w.add_graph(model, torch.rand((2, 3, 224, 224)))
w.close()
```
### No errors
However, graph is disconnected at parts and hard to understand.
{F223327878}

Reviewed By: sanekmelnikov

Differential Revision: D18690836

fbshipit-source-id: 42295d06b7c1d48d5401776dca1e0d12cd64b49d
2019-11-26 06:53:35 -08:00
8199596d7e Add missing std::move (#30411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30411

-
ghstack-source-id: 94526555

Test Plan: unit tests

Differential Revision: D18690385

fbshipit-source-id: fd348c0887c279694c2f6d287b361c8e07f02ffb
2019-11-26 06:53:31 -08:00
661a6c8ef2 Add get_qparams and revert the changes to calculate_qparams (#30262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30262

`get_qparams` returns all parameters that are needed to call the quantize function

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18645047

fbshipit-source-id: e57c11a66dac2d589778d412a996796ad5b6f86a
2019-11-26 06:53:26 -08:00
46e7f31fa3 Document unsupported types (#30344)
Summary:
This adds a listing of the parts of the `typing` module that are unsupported

This is also a first pass at deciding which features are 'unlikely to be implemented' vs 'not implemented', so they're open to discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30344

Pulled By: driazati

Differential Revision: D18665628

fbshipit-source-id: 22b8ebbde23df03839306cdb4344ca18a44f2c29
2019-11-26 06:53:22 -08:00
ab2ec4d835 Fix inexistent parameter in document (#24335)
Summary:
There is no `out` argument to `argsort` according to the source code.
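For reference, `argsort` returns its result instead of taking an `out` tensor:
```
import torch

idx = torch.argsort(torch.randn(3))       # fine
# torch.argsort(torch.randn(3), out=idx)  # TypeError: unexpected keyword 'out'
```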
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24335

Differential Revision: D16829134

Pulled By: vincentqb

fbshipit-source-id: 8f91154984cd4a753ba1d6105fb8a9bfa0da22b3
2019-11-26 06:53:17 -08:00
0b71e7e1fd Refactor QAT Conv module for better extensibility (#30362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30362

Right now the QAT modules (qat.ConvBn2d, qat.ConvBnReLU2d, qat.Conv2d)
are not convenient for supporting other dimensions of Conv. This PR refactors
these modules so that we can support Conv1d/Conv3d better.
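A usage sketch of the existing 2d module that this refactor generalizes (eager-mode QAT API):
```
import torch
import torch.nn.qat as nnqat

qat_conv = nnqat.Conv2d(3, 8, kernel_size=3,
                        qconfig=torch.quantization.default_qat_qconfig)
out = qat_conv(torch.randn(1, 3, 8, 8))  # weight is fake-quantized on the fly
```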

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18691152

fbshipit-source-id: 5b561e6b054eadd31b98cabdf1ac67a61ee9b805
2019-11-26 06:53:12 -08:00
b8f50d9cc8 Support to add dequant for each use of Value (#30145)
Summary:
In this PR, we mainly handle the case where a Value has multiple uses when inserting the quant-dequant pair. This change adds one dequant node for each use of the Value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30145

Differential Revision: D18671600

Pulled By: lly-zero-one

fbshipit-source-id: 61324a98861da85b80dcf7e930381311118ae53b
2019-11-25 14:52:58 -08:00
25f4ba7c1b Improve compare kernel (#29743)
Summary:
Currently, the way the compare kernels handle dtypes is very odd (this behavior was introduced in https://github.com/pytorch/pytorch/pull/28427 and I just realized it today):

Let's say `a, b` are two float tensors on CUDA.

If you do `a < b`, this is what would happen inside the loop:
- Step 1: Fetch `a` and `b`, dynamically casting them from `float` to `float` (i.e., check the scalar type to figure out whether a cast is needed; it isn't, so do nothing).
- Step 2: Compute `a < b`, getting a `bool` result.
- Step 3: Statically cast the result to `float`.
- Step 4: Dynamically cast the result from `float` back to `bool` and store the value.

And if you do `a.lt_(b)`, this is what would happen:
- Step 1: Fetch `a` and `b`, no casting
- Step 2: compute `a < b`, get a `bool` result
- Step 3: statically cast the result into `float`
- Step 4: store the result to memory, no casting

Although dynamic casting happens in registers, it still hurts performance a bit (~8%).

This PR fixes this issue. Now for compare kernels, if the output is bool and the inputs have the same dtype, there is no dynamic casting. Otherwise, there is dynamic casting for each input and output. That is, the dynamic casting behaviors of the two cases described above are swapped.
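For reference, the two cases (CPU shows the same dtype behavior):
```
import torch

a, b = torch.randn(3), torch.randn(3)
print((a < b).dtype)   # torch.bool    -- out-of-place allocates a bool result
print(a.lt_(b).dtype)  # torch.float32 -- in-place stores 0./1. in a's own dtype
```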

Benchmark on `a < b` for a tensor of 1000000000 fp32 elements:
Before https://github.com/pytorch/pytorch/issues/28427 6.35 ms
Current master: 6.88 ms
With this PR: 6.36 ms
Benchmark on `a.lt_(b)` does not show any difference across versions.

Besides this, what worries me most is that, with type promotion, the logic for the tensor iterator is becoming super complicated, and it is hard to see whether one change causes a performance regression elsewhere. I suggest we create scripts that benchmark the tensor iterator end to end, review that code, and put it somewhere inside the repository (maybe under `/tools` or `/test/scripts`?), so that whenever we are not certain about the performance we can run it to check. (Not on this PR, but on PRs after the script is done: if there are worries about performance, the PR author should run the script manually, and the reviewer should remind the author to do so if necessary.) If this is a good idea, I will send a PR for the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29743

Differential Revision: D18671269

Pulled By: ngimel

fbshipit-source-id: 89a9c1c8b5fd45d5ae8fe907d65c2fe1a7dfd2dc
2019-11-25 14:52:53 -08:00
5c6705e62c add default arg for init_method (#30208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30208

Adds a default arg for init_method so users don't have to pass it in,
and moves it to the `RpcBackendOptions` struct. Removes the `init_method` arg from rpc.init_rpc. Also fixes some docs.
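A minimal sketch after this change (assumption: the default backend options carry an env:// init_method, so only the usual rendezvous env vars are needed):
```
import os
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

rpc.init_rpc("worker0", rank=0, world_size=1)  # no init_method argument needed
```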
ghstack-source-id: 94500475

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18630074

fbshipit-source-id: 04b7dd7ec96f4c4da311b71d250233f1f262135a
2019-11-25 14:52:48 -08:00
d64e2581cc Add list of supported XCode/CUDA versions to README
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30407

Differential Revision: D18689043

Pulled By: smessmer

fbshipit-source-id: cd772451ef31356ed3045ebb1a9c4f5e5e91bb45
2019-11-25 14:52:42 -08:00
0517323dad Update osx CI to XCode 9.4 / CUDA 10.0, cudnn 7.6.5 (#30359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30359

We need this for C++14 support

ghstack-source-id: 94519850

Test Plan: unit tests

Differential Revision: D18668868

fbshipit-source-id: 87e8eadf0e60a1699fba4524aea53b306b9a7f24
2019-11-25 14:52:37 -08:00
c12f9a12a8 Fix quantized ConvReLU3d test (#30266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30266

Fix quantized ConvReLU3d test

Test Plan: buck test mode/dev-nosan //caffe2/test:quantized -- "conv"

Reviewed By: hl475

Differential Revision: D18645717

fbshipit-source-id: bbe93f9daf5046f2aa05363efc7d0e59eaff37bf
2019-11-25 14:52:32 -08:00
d7ac90e2ef Stop binding std_single and var_single from TH; they aren't used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29951

Test Plan: Imported from OSS

Differential Revision: D18548057

Pulled By: gchanan

fbshipit-source-id: 0143f694517fa8229e53bd2bc636501804a3f80b
2019-11-25 14:52:27 -08:00
0c67311878 Turn off scalar_check for set_(Storage, ...) (#29950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29950

The underlying code handles it correctly.

Test Plan: Imported from OSS

Differential Revision: D18548052

Pulled By: gchanan

fbshipit-source-id: 88b737572c816fb0026ac5e66da7e3f4ab686773
2019-11-25 14:52:22 -08:00
7160300638 Turn off scalar_check for reductions _th_max, _th_min. (#29949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29949

The underlying functions handle this already.

Test Plan: Imported from OSS

Differential Revision: D18548047

Pulled By: gchanan

fbshipit-source-id: 123c9297db4e4315da9b1d996ac8b41aa1b4c7bc
2019-11-25 14:52:17 -08:00
16606e1725 Turn off scalar_check for mode; the underlying code is correct.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29948

Test Plan: Imported from OSS

Differential Revision: D18548053

Pulled By: gchanan

fbshipit-source-id: 15cdfc24d3e5123497c72dc09c5e6b28cb5e1f88
2019-11-25 14:52:12 -08:00
b8eba7aca9 Turn off scalar_check for ormqr. (#29947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29947

It requires > 0-dimensional tensors.

Test Plan: Imported from OSS

Differential Revision: D18548049

Pulled By: gchanan

fbshipit-source-id: ce80a42515b59513a0e5ef2b32e2c2b90b4d64f5
2019-11-25 14:52:07 -08:00
7c6cc1d6d4 Turn off scalar_checks for _th_multinomial_alias_draw. (#29946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29946

It requires > 0-dimensional tensors.

Test Plan: Imported from OSS

Differential Revision: D18548050

Pulled By: gchanan

fbshipit-source-id: 4d1e3b53bd701137cc2cb674f95627a5e064a274
2019-11-25 14:52:02 -08:00
6e88ddf352 Turn off scalar_check for _th_addmv and _th_eig as they can never pass. (#29945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29945

Both functions require at least one 2-dimensional tensor, so they can never return an inferred scalar.

Test Plan: Imported from OSS

Differential Revision: D18548056

Pulled By: gchanan

fbshipit-source-id: f99a41d490b9a5ab5717534c92e4f2e848c743e8
2019-11-25 14:51:56 -08:00
ce5f1a1b25 Turn off scalar_check for masked_select. (#29923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29923

Note that this changes the behavior of masked_select when both "self" and "mask" are 0-dimensional.

In previous versions of PyTorch, this would return a 0-dimensional tensor.  But the documentation reads:
"Returns a new 1-D tensor which indexes the input tensor according to the boolean mask mask which is a BoolTensor."

Test Plan: Imported from OSS

Differential Revision: D18539560

Pulled By: gchanan

fbshipit-source-id: 1637ed2c434fcf8ceead0073aa610581f4a19d21
2019-11-25 14:51:51 -08:00
0c9c62ba6e Turn off scalar_checks for __and__ and clone.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29880

Test Plan: Imported from OSS

Differential Revision: D18521732

Pulled By: gchanan

fbshipit-source-id: 7fdf5d8a7b93b43ac32067222cb8df5e790900de
2019-11-25 14:51:46 -08:00
94ad7544ae Turn off scalar_check for __or__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29879

Test Plan: Imported from OSS

Differential Revision: D18521745

Pulled By: gchanan

fbshipit-source-id: 93d17d5e9cad5dd6d2c20221d87408c838d74eca
2019-11-25 14:51:40 -08:00
f994377d28 Turn off scalar_check for lshift, rshift.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29878

Test Plan: Imported from OSS

Differential Revision: D18521746

Pulled By: gchanan

fbshipit-source-id: 11fd7db79ac8ae76b1a5df25fb0ff59d81fcf394
2019-11-25 14:51:34 -08:00
99a46b44ea Use correct API macro in VariableHooksInterface. (#30320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30320

Fixes #30296

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18665704

Pulled By: ezyang

fbshipit-source-id: f09a953137fcc105959382254f9b8886af5aea3b
2019-11-25 14:51:29 -08:00
20dfae4099 Fix the crashes for c++ not able to find java class through Jni (#30390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30390

Fix crashes where C++ is not able to find a Java class through JNI.
ghstack-source-id: 94499644

Test Plan: buck install -r fb4a

Reviewed By: ljk53

Differential Revision: D18667992

fbshipit-source-id: aa1b19c6dae39d46440f4a3e691054f7f8b1d42e
2019-11-25 14:51:23 -08:00
3990e9d1ca Improve performance of LeftRight::read() (#30282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30282

The atomic increment/decrements in LeftRight::read() were measurable in perf benchmarks. Let's improve their perf.
ghstack-source-id: 94443230

Test Plan: unit tests, perf benchmarks

Differential Revision: D18650228

fbshipit-source-id: d184ce8288510ab178e7c7da73562609d1ca3c9f
2019-11-23 15:25:13 -08:00
0c7e4c1d62 backend fallback test (#29682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29682

This PR re-introduces backend_fallback_test.cpp, which was previously called boxed_fallback_test.cpp and showed how to use the backend fallback API.
ghstack-source-id: 94481314

Test Plan: unit tests

Differential Revision: D18462654

fbshipit-source-id: 3e9b5c8f35c05f9cd795f44a5fefd1a0aaf03509
2019-11-23 15:25:09 -08:00
959a849a23 better boxing (#29681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29681

Remove callUnboxedOnly() and instead use metaprogramming to figure out if an operator can use a boxed fallback or not.
This enables boxed fallback for ops in native_functions.yaml even if they don't have `use_c10_dispatcher: full` set, as long as they're in the range of supported types.
ghstack-source-id: 94481320

Test Plan: unit tests

Differential Revision: D18462653

fbshipit-source-id: 2955e3c4949267520a1734a6a2b919ef5e9684a2
2019-11-23 15:25:05 -08:00
aa2862b843 Hide the OperatorKernel* argument from the stack based kernel API (#29337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337

This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316

Test Plan: unit tests

Differential Revision: D18361991

fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
2019-11-23 15:25:01 -08:00
afdc0bd4ec OperatorHandle::callBoxed/callUnboxed (#29330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29330

This makes for a nicer API, especially in backend fallback kernels, which get an OperatorHandle instance and can directly call these methods on it.
ghstack-source-id: 94481322

Test Plan: unit tests stacked on top

Differential Revision: D18357424

fbshipit-source-id: fa8c638335f246c906c8e16186507b4c486afb3f
2019-11-23 15:24:57 -08:00
fb8c17dde1 Test cases for backend fallback kernels (#29214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29214

-
ghstack-source-id: 94481312

Test Plan: unit tests

Differential Revision: D18329308

fbshipit-source-id: 1dbae401f2255c69ed16d436f891b9b60c333d81
2019-11-23 15:24:53 -08:00
583c288232 Add a OperatorHandle argument to boxed kernels (#29201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201

This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313

Test Plan: I will add unit tests in a diff stacked on top

Differential Revision: D18282746

fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
2019-11-23 15:24:49 -08:00
24aabe439a Make Dispatcher::backendFallbackKernels_ an array (#30340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30340

We already made OperatorEntry::dispatchTable_ an array to be able to avoid the concurrency primitives there,
but Dispatcher::backendFallbackKernels_ has the same issue. Let's make it an array too.

Since there is some code duplication here, we also factor out the concept of a KernelFunctionTable to be used in both places.
ghstack-source-id: 94481317

Test Plan: unit tests

Differential Revision: D18663426

fbshipit-source-id: ba82ca5c4cae581eea359d5c0c3a5e23b0f8838c
2019-11-23 15:24:45 -08:00
7b5045be9d Remove LeftRight from OperatorEntry and DispatchTable. (#30333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30333

re-export of https://github.com/pytorch/pytorch/pull/30328
ghstack-source-id: 94481321

Differential Revision: D18661518

fbshipit-source-id: 5a35a1ed2fae3b21a43614957a91d648c21bcca1
2019-11-23 15:24:41 -08:00
4aa692fc91 Convert KernelTable to a flat-indexed array rather than a hashtable. (#30332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30332

-
ghstack-source-id: 94481315

Reviewed By: resistor

Differential Revision: D18660421

fbshipit-source-id: 9f11434f1c3c234c45f586719182053fa81731f0
2019-11-23 15:24:37 -08:00
7c4b9042ab Updates to quantization documentation (#30288)
Summary:
This pull request includes fixes for six quantization doc bugs.

https://github.com/pytorch/pytorch/issues/30283 - Rendering issue on QConfig
https://github.com/pytorch/pytorch/issues/26305 - Minor doc issue on fuse_modules()
https://github.com/pytorch/pytorch/issues/27451 - Issues with ConvReLU2d, ConvReLU3d, and LinearReLU doc issues
https://github.com/pytorch/pytorch/issues/26899 - Missing docstrings in torch.nn.intrinsic fused functions
https://github.com/pytorch/pytorch/issues/29735 - add discussion of QNNPack to quantization doc page
https://github.com/pytorch/pytorch/issues/27938 - some of the quantized functions lack documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30288

Differential Revision: D18653368

Pulled By: gottbrath

fbshipit-source-id: 410b3dd81ff10909a7f1a7736ca42d7cabf0beb1
2019-11-23 09:29:30 -08:00
7570b2798a updating citation (#30267)
Summary:
NIPS -> NeurIPS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30267

Differential Revision: D18672928

Pulled By: soumith

fbshipit-source-id: c20f26a0547f94ff39f8ee40e5f0ccc5fcc814af
2019-11-23 07:24:14 -08:00
59ca9b7430 Graph-mode quantization for convolution from traced model (#30245)
Summary:
In this PR, we enhance graph-mode quantization for aten::_convolution, which can be generated from the tracing path.
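For reference, a traced convolution records the internal op this PR now handles:
```
import torch

conv = torch.nn.Conv2d(1, 1, 1)
traced = torch.jit.trace(conv, torch.randn(1, 1, 4, 4))
print(traced.graph)  # contains aten::_convolution rather than aten::conv2d
```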
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30245

Differential Revision: D18671597

Pulled By: lly-zero-one

fbshipit-source-id: 78a2470fbb0fe0def55d63c6bda7cbb5c89f7848
2019-11-23 01:24:50 -08:00
2a7a39c1af (de)serialization of values between C++ and Python (#30108)
Summary:
This PR updates `torch::pickle_save` to use the new zipfile format introduced in #29232 and adds `torch::pickle_load` which can decode the zipfile format. Now that `torch.save/load` use this format as well (if the `_use_new_zipfile_serialization` flag is `True`), raw values saved in Python can be loaded in C++ and vice versa.

Fixes #20356
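A sketch of the Python side; the resulting archive can then be decoded from C++ with `torch::pickle_load`:
```
import torch

torch.save({"w": torch.ones(2)}, "values.pt",
           _use_new_zipfile_serialization=True)
obj = torch.load("values.pt")  # round-trips in Python as well
```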
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30108

Pulled By: driazati

Differential Revision: D18607087

fbshipit-source-id: 067cdd5b1cf9c30ddc7e2e5021a8cceee62d8a14
2019-11-23 00:06:07 -08:00
ee20e66c48 replace the SLSRQ for their right emulations in the replayer test (#30367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30367

use the SLS emulations that match the hardware

Test Plan: replayer test

Differential Revision: D18667605

fbshipit-source-id: 89aee630184737b86ecfb09717437e5c7473e42c
2019-11-23 00:06:03 -08:00
328ec5460f refactor the observer removal and quantize tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30360

Differential Revision: D18670373

Pulled By: lly-zero-one

fbshipit-source-id: 1481d6e4d5ce40376577b8deb0a0f74d5559076e
2019-11-22 21:25:23 -08:00
6a00191fc2 Add RpcAgent::getWorkerInfos() (#30241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30241

We need an API to get all worker infos. This will be used by the backend-agnostic `rpc.wait_all_workers()` API.
ghstack-source-id: 94454935

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_get_worker_infos

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_get_worker_infos
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_get_worker_infos

buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_get_worker_infos
```

Differential Revision: D5693412

fbshipit-source-id: 5123c8248b6d44fd36b8a5f381dbabb2660e6f0f
2019-11-22 18:26:30 -08:00
c7f988b8c6 transport open registration (#30167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30167

Pull Request resolved: https://github.com/pytorch/pytorch/pull/29164

- Created GlooDeviceFactory to hide device creation details
- Added a transport option to the Python interface

The reason for making the factory class is to make it easier to extend the gloo transport in the future.

Test Plan: Imported from OSS

Reviewed By: satgera, d4l3k

Differential Revision: D18596527

fbshipit-source-id: e8114162ee8d841c0e0769315b48356b37d6ca0a
2019-11-22 17:41:52 -08:00
ac103a5d78 Remove variable wrapping from register_c10_ops (#29207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29207

The logic calling c10 ops from JIT did some variable wrapping to make sure all results are always variables.
Thanks to ezyang, this is not needed anymore because everything is a variable now.
ghstack-source-id: 93345590

Test Plan: waitforsandcastle

Differential Revision: D18327507

fbshipit-source-id: 86512c5e19d6972d70f125feae172461c25e3cb6
2019-11-22 15:32:55 -08:00
9fb879934e Revert D18641413: add unit tests to iOS CI jobs
Test Plan: revert-hammer

Differential Revision:
D18641413

Original commit changeset: 12942206f1de

fbshipit-source-id: 4fa76d50fb897db4342d10a4e46a9887e37ef233
2019-11-22 15:24:27 -08:00
6c9b188262 Support in-place update in IndexHashOp (#30275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30275

`IndexHash` did not support in-place update.

Reviewed By: kennyhorror

Differential Revision: D18612231

fbshipit-source-id: adeccdf1ceb6107454555ff9cdf66fd5e5773f2a
2019-11-22 14:49:28 -08:00
99a2a0b1ca Implement torch.diagonal for named tensors (#30193)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30193

Featuring:
- Added a NoNamesGuard::reset() function that sets NamesMode back to
what it was before the guard. This makes it so that we don't have to
create a new context to run code in an unnamed way.
- Added a diagonal(Tensor, *, Dimname outdim, Dimname dim1, Dimname dim2, int64_t offset=0)
overload. All of the non-tensor arguments are keyword only for
readability purposes; something like `tensor.diagonal("A", "B", "C")`
would be really confusing. A usage sketch follows.
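A usage sketch of the new overload (dimension names illustrative):
```
import torch

t = torch.randn(2, 3, 3, names=("N", "A", "B"))
d = t.diagonal(outdim="C", dim1="A", dim2="B")
print(d.names)  # ("N", "C"); the diagonal dimension is appended last
```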

Test Plan: - Added new tests

Differential Revision: D18638363

Pulled By: zou3519

fbshipit-source-id: ea37b52a19535f84a69be38e95e569e88f307381
2019-11-22 14:49:23 -08:00
2e709763a3 add wrapper to exclude XLA when running device tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30316

Test Plan: Imported from OSS

Differential Revision: D18659286

Pulled By: nairbv

fbshipit-source-id: 86d035bb0c54c612868590c3188cfcd969c3f686
2019-11-22 13:04:59 -08:00
8c6f0c0587 Detect TorchScript archives in torch.load (#29339)
Summary:
This PR looks for a `constants.pkl` file at the top level of a zip file
in `torch.load`. If found, it calls `torch.jit.load` instead and issues
a warning to call `torch.jit.load` directly.
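Concretely (sketch):
```
import torch

scripted = torch.jit.script(torch.nn.Linear(2, 2))
scripted.save("model.pt")
m = torch.load("model.pt")  # warns and dispatches to torch.jit.load
```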
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29339

Differential Revision: D18611095

Pulled By: driazati

fbshipit-source-id: f070a02f6b5509054fc3876b3e8356bbbcc183e1
2019-11-22 12:30:30 -08:00
90cb1e67ff Fix exception message in Java Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30205

Test Plan: Imported from OSS

Reviewed By: linbinyu

Differential Revision: D18653568

Pulled By: dreiss

fbshipit-source-id: a5fcb809eba641a7fbd0e99e835eceeb248e680c
2019-11-22 12:04:49 -08:00
0c18de2623 Add inferBoundShapeOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30101

Reviewed By: ipiszy

Differential Revision: D18387803

fbshipit-source-id: 5edb6b949257370b62fa6da477bd6ed2f16a9bd1
2019-11-22 12:04:45 -08:00
35e6c1763e Switch Docker image conda-cuda-cxx11-ubuntu1604 to new uniform name (#29943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29943

This was apparently the same as "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest",
so standardize on that name.

Test Plan:
This PR, which is stacked on top of a commit that puts one of the jobs
using that container into the set of PR builds.

Imported from OSS

Differential Revision: D18653554

fbshipit-source-id: 40e6c52db02265d61e8166bb1211376faccfc53a
2019-11-22 11:39:55 -08:00
2723 changed files with 152809 additions and 55755 deletions

View File

@ -39,6 +39,7 @@ LINUX_PACKAGE_VARIANTS = OrderedDict(
"3.5m",
"3.6m",
"3.7m",
"3.8m",
],
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[

View File

@ -24,7 +24,7 @@ class Conf(object):
def gen_docker_image(self):
if self.gcc_config_variant == 'gcc5.4_cxx11-abi':
return miniutils.quote("pytorch/conda-cuda-cxx11-ubuntu1604:latest")
return miniutils.quote("pytorch/pytorch-binary-docker-image-ubuntu16.04:latest")
docker_word_substitution = {
"manywheel": "manylinux",
@ -33,11 +33,9 @@ class Conf(object):
docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
# The cpu nightlies are built on the pytorch/manylinux-cuda100 docker image
alt_docker_suffix = self.cuda_version or "100"
# The cpu nightlies are built on the pytorch/manylinux-cuda102 docker image
alt_docker_suffix = self.cuda_version or "102"
docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
if self.cuda_version == "101":
return "soumith/manylinux-cuda101@sha256:5d62be90d5b7777121180e6137c7eed73d37aaf9f669c51b783611e37e0b4916"
return miniutils.quote("pytorch/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
def get_name_prefix(self):
@ -69,7 +67,17 @@ class Conf(object):
job_def["requires"].append("update_s3_htmls_for_nightlies_devtoolset7")
job_def["filters"] = {"branches": {"only": "postnightly"}}
else:
job_def["filters"] = {"branches": {"only": "nightly"}}
job_def["filters"] = {
"branches": {
"only": "nightly"
},
# Will run on tags like v1.5.0-rc1, etc.
"tags": {
# Using a raw string here to avoid having to escape
# anything
"only": r"/v[0-9]+(\.[0-9]+)*-rc[0-9]+/"
}
}
if self.libtorch_variant:
job_def["libtorch_variant"] = miniutils.quote(self.libtorch_variant)
if phase == "test":

View File

@ -5,7 +5,9 @@ from cimodel.lib.conf_tree import Ver
CONFIG_TREE_DATA = [
(Ver("ubuntu", "16.04"), [
([Ver("gcc", "5")], [XImportant("onnx_py2")]),
([Ver("clang", "7")], [XImportant("onnx_py3.6")]),
([Ver("clang", "7")], [XImportant("onnx_main_py3.6"),
XImportant("onnx_ort1_py3.6"),
XImportant("onnx_ort2_py3.6")]),
]),
]
@ -27,7 +29,9 @@ class TreeConfigNode(ConfigNode):
return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]
def is_build_only(self):
if str(self.find_prop("language_version")) == "onnx_py3.6":
if str(self.find_prop("language_version")) == "onnx_main_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort1_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort2_py3.6":
return False
return set(str(c) for c in self.find_prop("compiler_version")).intersection({
"clang3.8",
@ -36,6 +40,12 @@ class TreeConfigNode(ConfigNode):
"android",
}) or self.find_prop("distro_version").name == "macos"
def is_test_only(self):
if str(self.find_prop("language_version")) == "onnx_ort1_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort2_py3.6":
return True
return False
class TopLevelNode(TreeConfigNode):
def __init__(self, node_name, subtree):
@ -68,6 +78,7 @@ class LanguageConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["language_version"] = node_name
self.props["build_only"] = self.is_build_only()
self.props["test_only"] = self.is_test_only()
def child_constructor(self):
return ImportantConfigNode

View File

@ -12,7 +12,7 @@ from dataclasses import dataclass
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"
DOCKER_IMAGE_VERSION = 345
DOCKER_IMAGE_VERSION = "345"
@dataclass
@ -23,6 +23,7 @@ class Conf:
# for gpu files and host compiler (gcc/clang) for cpu files)
compilers: [Ver]
build_only: bool
test_only: bool
is_important: bool
@property
@ -33,7 +34,9 @@ class Conf:
def get_cudnn_insertion(self):
omit = self.language == "onnx_py2" \
or self.language == "onnx_py3.6" \
or self.language == "onnx_main_py3.6" \
or self.language == "onnx_ort1_py3.6" \
or self.language == "onnx_ort2_py3.6" \
or set(self.compiler_names).intersection({"android", "mkl", "clang"}) \
or str(self.distro) in ["ubuntu14.04", "macos10.13"]
@ -50,6 +53,13 @@ class Conf:
def construct_phase_name(self, phase):
root_parts = self.get_build_name_root_parts()
build_name_substitutions = {
"onnx_ort1_py3.6": "onnx_main_py3.6",
"onnx_ort2_py3.6": "onnx_main_py3.6",
}
if phase == "build":
root_parts = [miniutils.override(r, build_name_substitutions) for r in root_parts]
return "_".join(root_parts + [phase]).replace(".", "_")
def get_platform(self):
@ -62,7 +72,9 @@ class Conf:
lang_substitutions = {
"onnx_py2": "py2",
"onnx_py3.6": "py3.6",
"onnx_main_py3.6": "py3.6",
"onnx_ort1_py3.6": "py3.6",
"onnx_ort2_py3.6": "py3.6",
"cmake": "py2",
}
@ -74,7 +86,9 @@ class Conf:
parameters = OrderedDict()
lang_substitutions = {
"onnx_py2": "onnx-py2",
"onnx_py3.6": "onnx-py3.6",
"onnx_main_py3.6": "onnx-main-py3.6",
"onnx_ort1_py3.6": "onnx-ort1-py3.6",
"onnx_ort2_py3.6": "onnx-ort2-py3.6",
}
lang = miniutils.override(self.language, lang_substitutions)
@ -136,6 +150,7 @@ def instantiate_configs():
distro=fc.find_prop("distro_version"),
compilers=fc.find_prop("compiler_version"),
build_only=fc.find_prop("build_only"),
test_only=fc.find_prop("test_only"),
is_important=fc.find_prop("important"),
)
@ -150,10 +165,11 @@ def get_workflow_jobs():
x = []
for conf_options in configs:
phases = ["build"]
if not conf_options.build_only:
phases = dimensions.PHASES
if conf_options.test_only:
phases = ["test"]
for phase in phases:
x.append(conf_options.gen_workflow_job(phase))

View File

@ -3,8 +3,8 @@ PHASES = ["build", "test"]
CUDA_VERSIONS = [
None, # cpu build
"92",
"100",
"101",
"102",
]
STANDARD_PYTHON_VERSIONS = [
@ -12,4 +12,5 @@ STANDARD_PYTHON_VERSIONS = [
"3.5",
"3.6",
"3.7",
"3.8"
]

View File

@ -4,17 +4,15 @@ from cimodel.lib.conf_tree import ConfigNode, X, XImportant
CONFIG_TREE_DATA = [
("xenial", [
(None, [
XImportant("2.7.9"),
X("2.7"),
XImportant("3.5"), # Not run on all PRs, but should be included on [test all]
X("3.5"),
X("nightly"),
]),
("gcc", [
("5.4", [ # All this subtree rebases to master and then build
XImportant("3.6"),
("3.6", [
("parallel_tbb", [XImportant(True)]),
("parallel_native", [XImportant(True)]),
("parallel_tbb", [X(True)]),
("parallel_native", [X(True)]),
]),
]),
# TODO: bring back libtorch test
@ -24,11 +22,11 @@ CONFIG_TREE_DATA = [
("5", [
XImportant("3.6"), # This is actually the ASAN build
]),
# ("7", [
# ("3.6", [
# ("xla", [XImportant(True)]),
# ]),
# ]),
("7", [
("3.6", [
("xla", [XImportant(True)]),
]),
]),
]),
("cuda", [
("9", [
@ -39,14 +37,16 @@ CONFIG_TREE_DATA = [
# and
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L153
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453144)
X("3.6"),
]),
("9.2", [X("3.6")]),
("10.1", [X("3.6")]),
("10.2", [
XImportant("3.6"),
("3.6", [
("libtorch", [XImportant(True)])
]),
]),
("9.2", [X("3.6")]),
("10", [X("3.6")]),
("10.1", [X("3.6")]),
]),
("android", [
("r19c", [

View File

@ -13,7 +13,7 @@ DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/"
# ARE YOU EDITING THIS NUMBER? MAKE SURE YOU READ THE GUIDANCE AT THE
# TOP OF .circleci/config.yml
DOCKER_IMAGE_VERSION = 405
DOCKER_IMAGE_VERSION = "f990c76a-a798-42bb-852f-5be5006f8026"
@dataclass
@ -160,6 +160,11 @@ def gen_dependent_configs(xenial_parent_config):
configs.append(c)
return configs
def gen_docs_configs(xenial_parent_config):
configs = []
for x in ["pytorch_python_doc_push", "pytorch_cpp_doc_push"]:
configs.append(HiddenConf(x, parent_build=xenial_parent_config))
@ -210,7 +215,6 @@ def instantiate_configs():
android_abi = fc.find_prop("android_abi")
parms_list_ignored_for_docker_image.append(android_abi)
restrict_phases = ["build"]
fc.props["is_important"] = True
elif compiler_name:
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
@ -222,7 +226,7 @@ def instantiate_configs():
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if cuda_version in ["9.2", "10", "10.1"]:
if cuda_version in ["9.2", "10", "10.1", "10.2"]:
# TODO The gcc version is orthogonal to CUDA version?
parms_list.append("gcc7")
@ -248,7 +252,16 @@ def instantiate_configs():
parallel_backend=parallel_backend,
)
if cuda_version == "9" and python_version == "3.6" and not is_libtorch:
# run docs builds on "pytorch-linux-xenial-py3.6-gcc5.4". Docs builds
# should run on a CPU-only build that runs on all PRs.
if distro_name == 'xenial' and fc.find_prop("pyver") == '3.6' \
and cuda_version is None \
and parallel_backend is None \
and compiler_name == 'gcc' \
and fc.find_prop('compiler_version') == '5.4':
c.dependent_tests = gen_docs_configs(c)
if cuda_version == "10.1" and python_version == "3.6" and not is_libtorch:
c.dependent_tests = gen_dependent_configs(c)
if (compiler_name == "gcc"
@ -275,7 +288,7 @@ def get_workflow_jobs():
config_list = instantiate_configs()
x = ["setup"]
x = []
for conf_options in config_list:
phases = conf_options.restrict_phases or dimensions.PHASES

File diff suppressed because it is too large

View File

@ -27,6 +27,8 @@ elif [[ "$image" == *-bionic* ]]; then
UBUNTU_VERSION=18.04
fi
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64"
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
# from scratch
@ -54,6 +56,13 @@ case "$image" in
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.8)
# TODO: This is a hack, get rid of this as soon as you get rid of the travis downloads
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/16.04/x86_64"
TRAVIS_PYTHON_VERSION=3.8
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc4.8)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=4.8
@ -67,6 +76,7 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
@ -87,22 +97,6 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda8-cudnn7-py2)
CUDA_VERSION=8.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=2.7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda8-cudnn7-py3)
CUDA_VERSION=8.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda9-cudnn7-py2)
CUDA_VERSION=9.0
CUDNN_VERSION=7
@ -118,7 +112,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7)
CUDA_VERSION=9.2
@ -146,6 +139,17 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
@ -157,6 +161,7 @@ case "$image" in
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
@ -184,6 +189,7 @@ tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | fold -w 32 | head -n 1)"
# Build image
docker build \
--no-cache \
--build-arg "TRAVIS_DL_URL_PREFIX=${TRAVIS_DL_URL_PREFIX}" \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "THRIFT=${THRIFT:-}" \

View File

@ -10,7 +10,7 @@ apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
pushd /tmp
curl -Os https://dl.google.com/android/repository/android-ndk-${ANDROID_NDK}-linux-x86_64.zip
curl -Os --retry 3 https://dl.google.com/android/repository/android-ndk-${ANDROID_NDK}-linux-x86_64.zip
popd
_ndk_dir=/opt/ndk
mkdir -p "$_ndk_dir"

View File

@ -8,7 +8,7 @@ sed -e 's|PATH="\(.*\)"|PATH="/opt/cache/bin:\1"|g' -i /etc/environment
export PATH="/opt/cache/bin:$PATH"
# Setup compiler cache
curl https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
chmod a+x /opt/cache/bin/sccache
function write_sccache_stub() {

View File

@ -10,7 +10,7 @@ file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"
# Download and install specific CMake version in /usr/local
pushd /tmp
curl -Os "https://cmake.org/files/${path}/${file}"
curl -Os --retry 3 "https://cmake.org/files/${path}/${file}"
tar -C /usr/local --strip-components 1 --no-same-owner -zxf cmake-*.tar.gz
rm -f cmake-*.tar.gz
popd

View File

@ -65,9 +65,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
conda_install numpy pyyaml mkl mkl-include setuptools cffi typing future six
if [[ "$CUDA_VERSION" == 8.0* ]]; then
conda_install magma-cuda80 -c pytorch
elif [[ "$CUDA_VERSION" == 9.0* ]]; then
if [[ "$CUDA_VERSION" == 9.0* ]]; then
conda_install magma-cuda90 -c pytorch
elif [[ "$CUDA_VERSION" == 9.1* ]]; then
conda_install magma-cuda91 -c pytorch
@ -88,7 +86,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# scikit-learn is pinned because of
# https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5
# only)
as_jenkins pip install --progress-bar off pytest scipy==1.1.0 scikit-learn==0.20.3 scikit-image librosa>=0.6.2 psutil numba==0.43.1 llvmlite==0.28.0
as_jenkins pip install --progress-bar off pytest scipy==1.1.0 scikit-learn==0.20.3 scikit-image librosa>=0.6.2 psutil numba==0.46.0 llvmlite==0.30.0
popd
fi

View File

@ -14,7 +14,7 @@ if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
# Download Python binary from Travis
pushd tmp
as_jenkins wget --quiet https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64/python-$TRAVIS_PYTHON_VERSION.tar.bz2
as_jenkins wget --quiet ${TRAVIS_DL_URL_PREFIX}/python-$TRAVIS_PYTHON_VERSION.tar.bz2
# NB: The tarball also comes with /home/travis virtualenv that we
# don't care about. (Maybe we should, but we've worked around the
# "how do I install to python" issue by making this entire directory
@ -88,6 +88,9 @@ if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
# Install psutil for dataloader tests
as_jenkins pip install psutil
# Install dill for serialization tests
as_jenkins pip install "dill>=0.3.1"
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

View File

@ -46,6 +46,7 @@ RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ARG TRAVIS_DL_URL_PREFIX
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh

View File

@ -0,0 +1,13 @@
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python-pip && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log
ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
ADD gc.py /usr/bin/gc.py
ADD docker_hub.py /usr/bin/docker_hub.py
ENTRYPOINT ["/usr/bin/gc.py"]

View File

@ -0,0 +1,125 @@
#!/usr/bin/env python
from collections import namedtuple
import boto3
import requests
import os
IMAGE_INFO = namedtuple(
"IMAGE_INFO", ("repo", "tag", "size", "last_updated_at", "last_updated_by")
)
def build_access_token(username, password):
r = requests.post(
"https://hub.docker.com/v2/users/login/",
data={"username": username, "password": password},
)
r.raise_for_status()
token = r.json().get("token")
return {"Authorization": "JWT " + token}
def list_repos(user, token):
r = requests.get("https://hub.docker.com/v2/repositories/" + user, headers=token)
r.raise_for_status()
ret = sorted(
repo["user"] + "/" + repo["name"] for repo in r.json().get("results", [])
)
if ret:
print("repos found:")
print("".join("\n\t" + r for r in ret))
return ret
def list_tags(repo, token):
r = requests.get(
"https://hub.docker.com/v2/repositories/" + repo + "/tags", headers=token
)
r.raise_for_status()
return [
IMAGE_INFO(
repo=repo,
tag=t["name"],
size=t["full_size"],
last_updated_at=t["last_updated"],
last_updated_by=t["last_updater_username"],
)
for t in r.json().get("results", [])
]
def save_to_s3(tags):
table_content = ""
client = boto3.client("s3")
for t in tags:
table_content += (
"<tr><td>{repo}</td><td>{tag}</td><td>{size}</td>"
"<td>{last_updated_at}</td><td>{last_updated_by}</td></tr>"
).format(
repo=t.repo,
tag=t.tag,
size=t.size,
last_updated_at=t.last_updated_at,
last_updated_by=t.last_updated_by,
)
html_body = """
<html>
<head>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
crossorigin="anonymous">
<link rel="stylesheet" type="text/css"
href="https://cdn.datatables.net/1.10.20/css/jquery.dataTables.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">
</script>
<script type="text/javascript" charset="utf8"
src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.js"></script>
<title> docker image info</title>
</head>
<body>
<table class="table table-striped table-hover" id="docker">
<caption>Docker images on docker hub</caption>
<thead class="thead-dark">
<tr>
<th scope="col">repo</th>
<th scope="col">tag</th>
<th scope="col">size</th>
<th scope="col">last_updated_at</th>
<th scope="col">last_updated_by</th>
</tr>
</thead>
<tbody>
{table_content}
</tbody>
</table>
</body>
<script>
$(document).ready( function () {{
$('#docker').DataTable({{paging: false}});
}} );
</script>
</html>
""".format(
table_content=table_content
)
client.put_object(
Bucket="docker.pytorch.org",
ACL="public-read",
Key="docker_hub.html",
Body=html_body,
ContentType="text/html",
)
if __name__ == "__main__":
username = os.environ.get("DOCKER_HUB_USERNAME")
password = os.environ.get("DOCKER_HUB_PASSWORD")
token = build_access_token(username, password)
tags = []
for repo in list_repos("pytorch", token):
tags.extend(list_tags(repo, token))
save_to_s3(tags)

.circleci/ecr_gc_docker/gc.py (new executable file, 202 lines)
View File

@ -0,0 +1,202 @@
#!/usr/bin/env python
import argparse
import datetime
import boto3
import pytz
import sys
def save_to_s3(project, data):
table_content = ""
client = boto3.client("s3")
for repo, tag, window, age, pushed in data:
table_content += "<tr><td>{repo}</td><td>{tag}</td><td>{window}</td><td>{age}</td><td>{pushed}</td></tr>".format(
repo=repo, tag=tag, window=window, age=age, pushed=pushed
)
html_body = """
<html>
<head>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
crossorigin="anonymous">
<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/1.10.20/css/jquery.dataTables.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.js"></script>
<title>{project} nightly and permanent docker image info</title>
</head>
<body>
<table class="table table-striped table-hover" id="docker">
<thead class="thead-dark">
<tr>
<th scope="col">repo</th>
<th scope="col">tag</th>
<th scope="col">keep window</th>
<th scope="col">age</th>
<th scope="col">pushed at</th>
</tr>
</thead>
<tbody>
{table_content}
</tbody>
</table>
</body>
<script>
$(document).ready( function () {{
$('#docker').DataTable({{paging: false}});
}} );
</script>
</html>
""".format(
project=project, table_content=table_content
)
# for pytorch, file can be found at
# http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html
# and later one we can config docker.pytorch.org to point to the location
client.put_object(
Bucket="docker.pytorch.org",
ACL="public-read",
Key="{project}.html".format(project=project),
Body=html_body,
ContentType="text/html",
)
def repos(client):
paginator = client.get_paginator("describe_repositories")
pages = paginator.paginate(registryId="308535385114")
for page in pages:
for repo in page["repositories"]:
yield repo
def images(client, repository):
paginator = client.get_paginator("describe_images")
pages = paginator.paginate(
registryId="308535385114", repositoryName=repository["repositoryName"]
)
for page in pages:
for image in page["imageDetails"]:
yield image
parser = argparse.ArgumentParser(description="Delete old Docker tags from registry")
parser.add_argument(
"--dry-run", action="store_true", help="Dry run; print tags that would be deleted"
)
parser.add_argument(
"--keep-stable-days",
type=int,
default=14,
help="Days of stable Docker tags to keep (non per-build images)",
)
parser.add_argument(
"--keep-unstable-days",
type=int,
default=1,
help="Days of unstable Docker tags to keep (per-build images)",
)
parser.add_argument(
"--filter-prefix",
type=str,
default="",
help="Only run cleanup for repositories with this prefix",
)
parser.add_argument(
"--ignore-tags",
type=str,
default="",
help="Never cleanup these tags (comma separated)",
)
args = parser.parse_args()
if not args.ignore_tags or not args.filter_prefix:
print(
"""
Missing required arguments --ignore-tags and --filter-prefix
You must specify --ignore-tags and --filter-prefix to avoid accidentally
pruning a stable Docker tag which is being actively used. This will
make you VERY SAD. So pay attention.
First, which filter-prefix do you want? The list of valid prefixes
is in jobs/private.groovy under the 'docker-registry-cleanup' job.
You probably want either pytorch or caffe2.
Second, which ignore-tags do you want? It should be whatever the most
up-to-date DockerVersion for the repository in question is. Follow
the imports of jobs/pytorch.groovy to find them.
"""
)
sys.exit(1)
client = boto3.client("ecr", region_name="us-east-1")
stable_window = datetime.timedelta(days=args.keep_stable_days)
unstable_window = datetime.timedelta(days=args.keep_unstable_days)
now = datetime.datetime.now(pytz.UTC)
ignore_tags = args.ignore_tags.split(",")
def chunks(chunkable, n):
""" Yield successive n-sized chunks from l.
"""
for i in range(0, len(chunkable), n):
yield chunkable[i : i + n]
stable_window_tags = []
for repo in repos(client):
repositoryName = repo["repositoryName"]
if not repositoryName.startswith(args.filter_prefix):
continue
# Keep list of image digests to delete for this repository
digest_to_delete = []
print(repositoryName)
for image in images(client, repo):
tags = image.get("imageTags")
if not isinstance(tags, (list,)) or len(tags) == 0:
continue
tag = tags[0]
created = image["imagePushedAt"]
age = now - created
# new images build on circle ci use workflow ID as tag, which has 4 "-"
if tag.isdigit() or tag.count("-") == 4 or tag in ignore_tags:
window = stable_window
if tag in ignore_tags:
stable_window_tags.append((repositoryName, tag, "", age, created))
elif age < window:
stable_window_tags.append((repositoryName, tag, window, age, created))
else:
window = unstable_window
if tag in ignore_tags:
print("Ignoring tag {} (age: {})".format(tag, age))
continue
if age < window:
print("Not deleting manifest for tag {} (age: {})".format(tag, age))
continue
if args.dry_run:
print("(dry run) Deleting manifest for tag {} (age: {})".format(tag, age))
else:
print("Deleting manifest for tag {} (age: {})".format(tag, age))
digest_to_delete.append(image["imageDigest"])
# Issue batch delete for all images to delete for this repository
# Note that as of 2018-07-25, the maximum number of images you can
# delete in a single batch is 100, so chunk our list into batches of
# 100
for c in chunks(digest_to_delete, 100):
client.batch_delete_image(
registryId="308535385114",
repositoryName=repositoryName,
imageIds=[{"imageDigest": digest} for digest in c],
)
save_to_s3(args.filter_prefix, stable_window_tags)

View File

@ -0,0 +1,3 @@
boto3
pytz
requests

View File

@ -88,8 +88,11 @@ YAML_SOURCES = [
File("job-specs-custom.yml"),
File("binary_update_htmls.yml"),
File("binary-build-tests.yml"),
File("docker_build_job.yml"),
File("docker_jobs.yml"),
File("workflows.yml"),
File("workflows-setup-job.yml"),
File("windows-build-test.yml"),
Listgen(pytorch_build_definitions.get_workflow_jobs, 3),
File("workflows-pytorch-macos-builds.yml"),
File("workflows-pytorch-android-gradle-build.yml"),
@ -102,12 +105,14 @@ YAML_SOURCES = [
Listgen(binary_build_definitions.get_binary_build_jobs, 3),
File("workflows-nightly-ios-binary-builds.yml"),
File("workflows-nightly-android-binary-builds.yml"),
Header("Nightly tests"),
Listgen(binary_build_definitions.get_nightly_tests, 3),
File("workflows-nightly-uploads-header.yml"),
Listgen(binary_build_definitions.get_nightly_uploads, 3),
File("workflows-s3-html.yml"),
File("workflows-docker-builder.yml")
File("workflows-docker-builder.yml"),
File("workflows-ecr-gc.yml"),
]

View File

@ -1,5 +1,11 @@
#!/bin/bash
set -eux -o pipefail
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
@ -17,7 +23,7 @@ export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
# Clone the Pytorch branch
git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"
retry git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"
pushd "$PYTORCH_ROOT"
if [[ -n "${CIRCLE_PR_NUMBER:-}" ]]; then
# "smoke" binary build on PRs
@ -33,13 +39,13 @@ else
echo "Can't tell what to checkout"
exit 1
fi
git submodule update --init --recursive --quiet
retry git submodule update --init --recursive
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder master repo
git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
retry git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
echo "Using builder from "
git --no-pager log --max-count 1

View File

@ -31,9 +31,9 @@ fi
conda_sh="$workdir/install_miniconda.sh"
if [[ "$(uname)" == Darwin ]]; then
retry curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
curl --retry 3 -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
else
retry curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
curl --retry 3 -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
fi
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"

View File

@ -5,20 +5,24 @@ echo ""
echo "DIR: $(pwd)"
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl -o ~/Downloads/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/Downloads/conda.sh
/bin/bash ~/Downloads/conda.sh -b -p ~/anaconda
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
# Install dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
@ -26,13 +30,13 @@ cat ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
export BUILD_PYTORCH_MOBILE=1
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
#store the binary
cd ${WORKSPACE}
DEST_DIR=${WORKSPACE}/ios
mkdir -p ${DEST_DIR}
cp -R ${PROJ_ROOT}/build_ios/install ${DEST_DIR}
mv ${DEST_DIR}/install ${DEST_DIR}/${IOS_ARCH}
mv ${DEST_DIR}/install ${DEST_DIR}/${IOS_ARCH}

View File

@ -14,11 +14,13 @@ mkdir -p ${ZIP_DIR}/src
cp -R ${ARTIFACTS_DIR}/arm64/include ${ZIP_DIR}/install/
# build a FAT binary
cd ${ZIP_DIR}/install/lib
target_libs=(libc10.a libclog.a libcpuinfo.a libeigen_blas.a libpytorch_qnnpack.a libtorch.a)
target_libs=(libc10.a libclog.a libcpuinfo.a libeigen_blas.a libpytorch_qnnpack.a libtorch_cpu.a libtorch.a libXNNPACK.a)
for lib in ${target_libs[*]}
do
libs=(${ARTIFACTS_DIR}/x86_64/lib/${lib} ${ARTIFACTS_DIR}/arm64/lib/${lib})
lipo -create "${libs[@]}" -o ${ZIP_DIR}/install/lib/${lib}
if [ -f "${ARTIFACTS_DIR}/x86_64/lib/${lib}" ] && [ -f "${ARTIFACTS_DIR}/arm64/lib/${lib}" ]; then
libs=("${ARTIFACTS_DIR}/x86_64/lib/${lib}" "${ARTIFACTS_DIR}/arm64/lib/${lib}")
lipo -create "${libs[@]}" -o ${ZIP_DIR}/install/lib/${lib}
fi
done
# for nnpack, we only support arm64 build
cp ${ARTIFACTS_DIR}/arm64/lib/libnnpack.a ./

View File

@ -11,11 +11,15 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
source activate testenv >/dev/null
elif [[ "$DESIRED_PYTHON" == 2.7mu ]]; then
export PATH="/opt/python/cp27-cp27mu/bin:\$PATH"
elif [[ "$DESIRED_PYTHON" == 3.8m ]]; then
export PATH="/opt/python/cp38-cp38/bin:\$PATH"
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
export PATH="/opt/python/cp\$python_nodot-cp\${python_nodot}m/bin:\$PATH"
python_path="/opt/python/cp\$python_nodot-cp\${python_nodot}"
# Prior to Python 3.8 paths were suffixed with an 'm'
if [[ -d "\${python_path}/bin" ]]; then
export PATH="\${python_path}/bin:\$PATH"
elif [[ -d "\${python_path}m/bin" ]]; then
export PATH="\${python_path}m/bin:\$PATH"
fi
fi
# Install the package
@ -28,11 +32,11 @@ pkg="/final_pkgs/\$(ls /final_pkgs)"
if [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "\$pkg" --offline
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
conda install -y cpuonly -c pytorch
retry conda install -y cpuonly -c pytorch
fi
retry conda install -yq future numpy protobuf six
if [[ "$DESIRED_CUDA" != 'cpu' ]]; then
# DESIRED_CUDA is in format cu90 or cu100
# DESIRED_CUDA is in format cu90 or cu102
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else

View File

@ -5,15 +5,6 @@ set -eu -o pipefail
set +x
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
cat >/home/circleci/project/login_to_anaconda.sh <<EOL
set +x
echo "Trying to login to Anaconda"
yes | anaconda login \
--username "$PYTORCH_BINARY_PJH5_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_PJH5_CONDA_PASSWORD"
set -x
EOL
chmod +x /home/circleci/project/login_to_anaconda.sh
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
@ -21,12 +12,18 @@ chmod +x /home/circleci/project/login_to_anaconda.sh
set -eux -o pipefail
export PATH="$MINICONDA_ROOT/bin:$PATH"
# This gets set in binary_populate_env.sh, but lets have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
# Upload the package to the final location
pushd /home/circleci/project/final_pkgs
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry timeout 30 /home/circleci/project/login_to_anaconda.sh
anaconda upload "$(ls)" -u pytorch-nightly --label main --no-progress --force
anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"

View File

@ -4,15 +4,6 @@ set -eu -o pipefail
set +x
export AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
cat >/Users/distiller/project/login_to_anaconda.sh <<EOL
set +x
echo "Trying to login to Anaconda"
yes | anaconda login \
--username "$PYTORCH_BINARY_PJH5_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_PJH5_CONDA_PASSWORD"
set -x
EOL
chmod +x /Users/distiller/project/login_to_anaconda.sh
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
@ -22,11 +13,17 @@ set -eux -o pipefail
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
# This gets set in binary_populate_env.sh, but lets have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
pushd "$workdir/final_pkgs"
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry /Users/distiller/project/login_to_anaconda.sh
retry anaconda upload "$(ls)" -u pytorch-nightly --label main --no-progress --force
retry anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"

View File

@ -40,25 +40,25 @@ if [[ -z "$DOCKER_IMAGE" ]]; then
fi
fi
# Upload to parallel folder for devtoolsets
# All nightlies used to be devtoolset3, then devtoolset7 was added as a build
# option, so the upload was redirected to nightly/devtoolset7 to avoid
# conflicts with other binaries (there shouldn't be any conflicts). Now we are
# making devtoolset7 the default.
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' || "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* || "$(uname)" == 'Darwin' ]]; then
export PIP_UPLOAD_FOLDER='nightly/'
else
# On linux machines, this shouldn't actually be called anymore. This is just
# here for extra safety.
export PIP_UPLOAD_FOLDER='nightly/devtoolset3/'
fi
# Default to nightly, since that's where this normally uploads to
PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
# TODO: We should be pulling the semver version from the base version.txt
BASE_BUILD_VERSION="1.5.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
if git describe --tags --exact >/dev/null 2>/dev/null; then
# Switch upload folder to 'test/' if we are on a tag
PIP_UPLOAD_FOLDER='test/'
# Grab git tag, remove prefixed v and remove everything after -
# Used to clean up tags that are for release candidates like v1.5.0-rc1
# Turns tag v1.5.0-rc1 -> v1.5.0
BASE_BUILD_VERSION="$(git describe --tags | sed -e 's/^v//' -e 's/-.*$//')"
fi
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu101" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="1.4.0.dev$DATE"
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}"
else
export PYTORCH_BUILD_VERSION="1.4.0.dev$DATE+$DESIRED_CUDA"
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
fi
export PYTORCH_BUILD_NUMBER=1
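
Taken together, the version logic above gives nightlies a 1.5.0.dev$DATE version and gives tagged builds a version derived from the tag. A small sketch of both paths (pure illustration; dates and tags invented):

import re
from datetime import datetime, timezone

def base_build_version(tag=None):
    if tag:
        # git describe --tags | sed -e 's/^v//' -e 's/-.*$//'
        return re.sub(r"-.*$", "", re.sub(r"^v", "", tag))
    return "1.5.0.dev" + datetime.now(timezone.utc).strftime("%Y%m%d")

assert base_build_version("v1.5.0-rc1") == "1.5.0"
print(base_build_version())  # e.g. 1.5.0.dev20200319
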
@ -96,7 +96,7 @@ export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.4.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.5.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"

View File

@ -16,31 +16,12 @@ set -eux -o pipefail
# Expect actual code to be written to this file
chmod +x /home/circleci/project/ci_test_script.sh
VOLUME_MOUNTS="-v /home/circleci/project/:/circleci_stuff -v /home/circleci/project/final_pkgs:/final_pkgs -v ${PYTORCH_ROOT}:/pytorch -v ${BUILDER_ROOT}:/builder"
# Run the docker
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d "${DOCKER_IMAGE}")
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia ${VOLUME_MOUNTS} -t -d "${DOCKER_IMAGE}")
else
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d "${DOCKER_IMAGE}")
fi
# Copy the envfile and script with all the code to run into the docker.
docker cp /home/circleci/project/. "$id:/circleci_stuff"
# Copy built packages into the docker to test. This should only exist on the
# binary test jobs. The package should've been created from a binary build job,
# which persisted the package to a CircleCI workspace, which this job then
# copies into a GPU-enabled docker for testing
if [[ -d "/home/circleci/project/final_pkgs" ]]; then
docker cp /home/circleci/project/final_pkgs "$id:/final_pkgs"
fi
# Copy the needed repos into the docker. These do not exist in the smoke test
# jobs, since the smoke test jobs do not need the Pytorch source code.
if [[ -d "$PYTORCH_ROOT" ]]; then
docker cp "$PYTORCH_ROOT" "$id:/pytorch"
fi
if [[ -d "$BUILDER_ROOT" ]]; then
docker cp "$BUILDER_ROOT" "$id:/builder"
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined ${VOLUME_MOUNTS} -t -d "${DOCKER_IMAGE}")
fi
# Execute the test script that was populated by an earlier section

View File

@ -57,7 +57,6 @@ time python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THNN/generic/THNN.h \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml
@ -73,10 +72,10 @@ time python tools/setup_helpers/generate_code.py \
# Build the docs
pushd docs/cpp
pip install breathe==4.11.1 bs4 lxml six
pip install breathe>=4.13.0 bs4 lxml six
pip install --no-cache-dir -e "git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme"
pip install exhale>=0.2.1
pip install sphinx==1.8.5
pip install sphinx>=2.0
# Uncomment once it is fixed
# pip install -r requirements.txt
time make VERBOSE=1 html -j
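
One hazard worth noting in the hunk above: in pip install breathe>=4.13.0 the unquoted >= is parsed by the shell as an output redirection, so only pip install breathe actually runs (with stdout sent to a file named =4.13.0), and the same applies to exhale>=0.2.1 and sphinx>=2.0. Quoting the requirement, or bypassing the shell entirely, avoids this; a sketch of the latter:

import subprocess

# Each requirement is a single argv element, so ">=" never reaches the shell
subprocess.check_call(["pip", "install", "breathe>=4.13.0", "sphinx>=2.0"])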

View File

@ -90,6 +90,12 @@ else
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>$version \&#x25BC</a>@g"
fi
# Prevent Google from indexing $install_path/_modules. This folder contains
# generated source files.
# NB: the following only works on gnu sed. The sed shipped with mac os is different.
# One can `brew install gnu-sed` on a mac and then use "gsed" instead of "sed".
find "$install_path/_modules" -name "*.html" -print0 | xargs -0 sed -i '/<head>/a \ \ <meta name="robots" content="noindex">'
git add "$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"
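
The GNU-sed command above appends the robots meta tag after every line matching <head>. A rough Python rendering of the same edit, for illustration (it inserts after the tag itself rather than after the matching line):

def add_noindex(html):
    # Counterpart of: sed -i '/<head>/a \ \ <meta name="robots" content="noindex">'
    return html.replace("<head>", '<head>\n  <meta name="robots" content="noindex">', 1)

print(add_noindex("<html><head></head></html>"))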

View File

@ -2,7 +2,7 @@
set -ex -o pipefail
# Set up NVIDIA docker repo
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L --retry 3 https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
@ -45,7 +45,7 @@ retry () {
retry sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-430.40.run"
DRIVER_FN="NVIDIA-Linux-x86_64-440.59.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
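
The --retry 3 flag added here (and in several hunks below) makes curl re-attempt transient failures rather than failing the whole job on one flaky download. For comparison, the closest analogue in Python, which the scripts in this diff already use via requests:

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=3))  # roughly curl --retry 3
key = session.get("https://nvidia.github.io/nvidia-docker/gpgkey", timeout=30).text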

View File

@ -2,7 +2,7 @@
set -eux -o pipefail
# Set up CircleCI GPG keys for apt, if needed
curl -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add -
curl --retry 3 -s -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add -
# Stop background apt updates. Hypothetically, the kill should not
# be necessary, because stop is supposed to send a kill signal to

View File

@ -1,140 +0,0 @@
import argparse
import re
import sys
# Modify this variable if you want to change the set of default jobs
# which are run on all pull requests.
#
# WARNING: Actually, this is a lie; we're currently also controlling
# the set of jobs to run via the Workflows filters in CircleCI config.
default_set = set([
# PyTorch CPU
# Selected oldest Python 2 version to ensure Python 2 coverage
'pytorch-linux-xenial-py2.7.9',
# PyTorch CUDA
'pytorch-linux-xenial-cuda9-cudnn7-py3',
# PyTorch ASAN
'pytorch-linux-xenial-py3-clang5-asan',
# PyTorch DEBUG
'pytorch-linux-xenial-py3.6-gcc5.4',
# LibTorch
'pytorch-libtorch-linux-xenial-cuda9-cudnn7-py3',
# Caffe2 CPU
'caffe2-py2-mkl-ubuntu16.04',
# Caffe2 CUDA
'caffe2-py3.5-cuda10.1-cudnn7-ubuntu16.04',
# Caffe2 ONNX
'caffe2-onnx-py2-gcc5-ubuntu16.04',
'caffe2-onnx-py3.6-clang7-ubuntu16.04',
# Caffe2 Clang
'caffe2-py2-clang7-ubuntu16.04',
# Caffe2 CMake
'caffe2-cmake-cuda9.0-cudnn7-ubuntu16.04',
# Caffe2 CentOS
'caffe2-py3.6-devtoolset7-cuda9.0-cudnn7-centos7',
# Binaries
'manywheel 2.7mu cpu devtoolset7',
'libtorch 2.7m cpu devtoolset7',
'libtorch 2.7m cpu gcc5.4_cxx11-abi',
'libtorch 2.7 cpu',
'libtorch-ios-11.2.1-nightly-x86_64-build',
'libtorch-ios-11.2.1-nightly-arm64-build',
'libtorch-ios-11.2.1-nightly-binary-build-upload',
# Caffe2 Android
'caffe2-py2-android-ubuntu16.04',
# Caffe2 OSX
'caffe2-py2-system-macos10.13',
# PyTorch OSX
'pytorch-macos-10.13-py3',
'pytorch-macos-10.13-cuda9.2-cudnn7-py3',
# PyTorch Android
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build',
'pytorch-linux-xenial-py3-clang5-android-ndk-r19',
# PyTorch Android gradle
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32',
# Pytorch iOS builds
'pytorch-ios-11.2.1-x86_64_build',
'pytorch-ios-11.2.1-arm64_build',
# PyTorch Mobile builds
'pytorch-linux-xenial-py3-clang5-mobile-build',
# Pytorch backward compatibility check
'pytorch-linux-backward-compatibility-check-test',
# XLA
'pytorch-xla-linux-xenial-py3.6-clang7',
# GraphExecutor config jobs
'pytorch-linux-xenial-py3.6-gcc5.4-ge_config_simple-test',
'pytorch-linux-xenial-py3.6-gcc5.4-ge_config_legacy-test',
# Other checks
'pytorch-short-perf-test-gpu',
'pytorch-python-doc-push',
'pytorch-cpp-doc-push',
])
# Collection of jobs that are *temporarily* excluded from running on PRs.
# Use this if there is a long-running job breakage that we can't fix with a
# single revert.
skip_override = {
# example entry:
# 'pytorch-cpp-doc-push': "https://github.com/pytorch/pytorch/issues/<related issue>"
}
# Takes in commit message to analyze via stdin
#
# This script will query Git and attempt to determine if we should
# run the current CI job under question
#
# NB: Try to avoid hard-coding names here, so there are fewer places to update when jobs
# are updated/renamed
#
# Semantics in the presence of multiple tags:
# - Let D be the set of default builds
# - Let S be the set of explicitly specified builds
# - Let O be the set of temporarily skipped builds
# - Run S \/ (D - O)
parser = argparse.ArgumentParser()
parser.add_argument('build_environment')
args = parser.parse_args()
commit_msg = sys.stdin.read()
# Matches anything that looks like [foo ci] or [ci foo] or [foo test]
# or [test foo]
RE_MARKER = re.compile(r'\[(?:([^ \[\]]+) )?(?:ci|test)(?: ([^ \[\]]+))?\]')
markers = RE_MARKER.finditer(commit_msg)
for m in markers:
if m.group(1) and m.group(2):
print("Unrecognized marker: {}".format(m.group(0)))
continue
spec = m.group(1) or m.group(2)
if spec is None:
print("Unrecognized marker: {}".format(m.group(0)))
continue
if spec in args.build_environment or spec == 'all':
print("Accepting {} due to commit marker {}".format(args.build_environment, m.group(0)))
sys.exit(0)
skip_override_set = set(skip_override.keys())
should_run_set = default_set - skip_override_set
for spec in should_run_set:
if spec in args.build_environment:
print("Accepting {} as part of default set".format(args.build_environment))
sys.exit(0)
print("Rejecting {}".format(args.build_environment))
for spec, issue in skip_override.items():
if spec in args.build_environment:
print("This job is temporarily excluded from running on PRs. Reason: {}".format(issue))
break
sys.exit(1)
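
For reference, the marker grammar and the S \/ (D - O) semantics that this now-deleted script implemented behave as follows (example messages invented):

import re

RE_MARKER = re.compile(r'\[(?:([^ \[\]]+) )?(?:ci|test)(?: ([^ \[\]]+))?\]')
for msg in ("[xla ci]", "[ci all]", "[test pytorch-linux]"):
    m = RE_MARKER.search(msg)
    print(msg, "->", m.group(1) or m.group(2))  # xla, all, pytorch-linux

defaults, skipped, explicit = {"a", "b", "c"}, {"b"}, {"d"}
print(explicit | (defaults - skipped))  # S \/ (D - O) == {"a", "c", "d"}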

View File

@ -1,29 +0,0 @@
#!/usr/bin/env bash
set -exu -o pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# Check if we should actually run
echo "BUILD_ENVIRONMENT: ${BUILD_ENVIRONMENT:-}"
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
if [ -z "${BUILD_ENVIRONMENT:-}" ]; then
echo "Cannot run should_run_job.sh if BUILD_ENVIRONMENT is not defined!"
echo "CircleCI scripts are probably misconfigured."
exit 1
fi
if ! [ -e "$SCRIPT_DIR/COMMIT_MSG" ]; then
echo "Cannot run should_run_job.sh if you don't have COMMIT_MSG"
echo "written out. Are you perhaps running the wrong copy of this script?"
echo "You should be running the copy in ~/workspace; SCRIPT_DIR=$SCRIPT_DIR"
exit 1
fi
if [ -n "${CIRCLE_PULL_REQUEST:-}" ]; then
if [[ $CIRCLE_BRANCH != "ci-all/"* ]] && [[ $CIRCLE_BRANCH != "nightly" ]] && [[ $CIRCLE_BRANCH != "postnightly" ]] ; then
# Don't swallow a "script doesn't exist" error
[ -e "$SCRIPT_DIR/should_run_job.py" ]
if ! python "$SCRIPT_DIR/should_run_job.py" "${BUILD_ENVIRONMENT:-}" < "$SCRIPT_DIR/COMMIT_MSG" ; then
circleci step halt
exit
fi
fi
fi

View File

@ -0,0 +1,87 @@
import glob
import json
import logging
import os
import os.path
import re
import sys
import time
import requests
def get_size(file_dir):
try:
# we should only expect one file; if not, something is wrong
file_name = glob.glob(os.path.join(file_dir, "*"))[0]
return os.stat(file_name).st_size
except:
logging.exception(f"error getting file from: {file_dir}")
return 0
def build_message(size):
pkg_type, py_ver, cu_ver, *_ = os.environ.get("BUILD_ENVIRONMENT", "").split() + [
None,
None,
None,
]
os_name = os.uname()[0].lower()
if os_name == "darwin":
os_name = "macos"
return {
"normal": {
"os": os_name,
"pkg_type": pkg_type,
"py_ver": py_ver,
"cu_ver": cu_ver,
"pr": os.environ.get("CIRCLE_PR_NUMBER"),
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
},
"int": {
"time": int(time.time()),
"size": size,
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
},
}
def send_message(message):
access_token = os.environ.get("SCRIBE_GRAPHQL_ACCESS_TOKEN")
if not access_token:
raise ValueError("Can't find access token from environment variable")
url = "https://graph.facebook.com/scribe_logs"
r = requests.post(
url,
data={
"access_token": access_token,
"logs": json.dumps(
[
{
"category": "perfpipe_pytorch_binary_size",
"message": json.dumps(message),
"line_escape": False,
}
]
),
},
)
print(r.text)
r.raise_for_status()
if __name__ == "__main__":
file_dir = os.environ.get(
"PYTORCH_FINAL_PACKAGE_DIR", "/home/circleci/project/final_pkgs"
)
if len(sys.argv) == 2:
file_dir = sys.argv[1]
print("checking dir: " + file_dir)
size = get_size(file_dir)
if size != 0:
try:
send_message(build_message(size))
except:
logging.exception("can't send message")

View File

@ -0,0 +1,25 @@
$VS_DOWNLOAD_LINK = "https://aka.ms/vs/15/release/vs_buildtools.exe"
$VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
"--add Microsoft.VisualStudio.Component.VC.Tools.14.11",
"--add Microsoft.Component.MSBuild",
"--add Microsoft.VisualStudio.Component.Roslyn.Compiler",
"--add Microsoft.VisualStudio.Component.TextTemplating",
"--add Microsoft.VisualStudio.Component.VC.CoreIde",
"--add Microsoft.VisualStudio.Component.VC.Redist.14.Latest",
"--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Core",
"--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
"--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Win81")
curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS 2017 installer failed"
exit 1
}
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_INSTALL_ARGS -NoNewWindow -Wait -PassThru
Remove-Item -Path vs_installer.exe -Force
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "VS 2017 installer exited with code $exitCode, which should be one of [0, 3010]."
exit 1
}
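
Exit code 3010 is the standard Windows installer code ERROR_SUCCESS_REBOOT_REQUIRED, which is why the check above accepts it alongside 0. The same acceptance test in Python form (the failing code 1603 is just an example):

ACCEPTABLE = {0, 3010}  # success, and success-but-reboot-required
for code in (0, 3010, 1603):
    print(code, "ok" if code in ACCEPTABLE else "installer failed")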

View File

@ -1,43 +1,43 @@
#!/usr/bin/env python3
import urllib.request
import re
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.caffe2_build_definitions as caffe2_build_definitions
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
from yaml import load
RE_VERSION = re.compile(r'allDeployedVersions = "([0-9,]+)"')
try:
from yaml import CLoader as Loader
except ImportError:
from yaml import Loader
URL_TEMPLATE = (
"https://raw.githubusercontent.com/pytorch/ossci-job-dsl/"
"master/src/main/groovy/ossci/{}/DockerVersion.groovy"
)
def check_version(job, expected_version):
url = URL_TEMPLATE.format(job)
with urllib.request.urlopen(url) as f:
contents = f.read().decode('utf-8')
m = RE_VERSION.search(contents)
if not m:
raise RuntimeError(
"Unbelievable! I could not find the variable allDeployedVersions in "
"{}; did the organization of ossci-job-dsl change?\n\nFull contents:\n{}"
.format(url, contents)
)
valid_versions = [int(v) for v in m.group(1).split(',')]
if expected_version not in valid_versions:
raise RuntimeError(
"We configured {} to use Docker version {}; but this "
"version is not deployed in {}. Non-deployed versions will be "
"garbage collected two weeks after they are created. DO NOT LAND "
"THIS TO MASTER without also updating ossci-job-dsl with this version."
"\n\nDeployed versions: {}"
.format(job, expected_version, url, m.group(1))
)
def load_config(filename=".circleci/config.yml"):
with open(filename, "r") as fh:
return load("".join(fh.readlines()), Loader)
def load_tags_for_projects(workflow_config):
return {
v["ecr_gc_job"]["project"]: v["ecr_gc_job"]["tags_to_keep"]
for v in workflow_config["workflows"]["ecr_gc"]["jobs"]
if isinstance(v, dict) and "ecr_gc_job" in v
}
def check_version(job, tags, expected_version):
valid_versions = tags[job].split(",")
if expected_version not in valid_versions:
raise RuntimeError(
"We configured {} to use Docker version {}; but this "
"version is not configured in job ecr_gc_job_for_{}. Non-deployed versions will be "
"garbage collected two weeks after they are created. DO NOT LAND "
"THIS TO MASTER without also updating ossci-job-dsl with this version."
"\n\nDeployed versions: {}".format(job, expected_version, job, tags[job])
)
def validate_docker_version():
check_version('pytorch', pytorch_build_definitions.DOCKER_IMAGE_VERSION)
check_version('caffe2', caffe2_build_definitions.DOCKER_IMAGE_VERSION)
tags = load_tags_for_projects(load_config())
check_version("pytorch", tags, pytorch_build_definitions.DOCKER_IMAGE_VERSION)
check_version("caffe2", tags, caffe2_build_definitions.DOCKER_IMAGE_VERSION)
if __name__ == "__main__":
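
The config.yml shape that load_tags_for_projects expects, shown as a toy dict (values invented):

workflow_config = {
    "workflows": {
        "ecr_gc": {
            "jobs": [
                {"ecr_gc_job": {"project": "pytorch", "tags_to_keep": "405,327"}},
                "docker_hub_index_job",  # plain strings fail the dict check and are skipped
            ]
        }
    }
}
# load_tags_for_projects(workflow_config) -> {"pytorch": "405,327"}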

View File

@ -12,9 +12,3 @@
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_2.7m_cu100_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cu100"
# resource_class: gpu.medium
# <<: *binary_linux_test

View File

@ -2,7 +2,7 @@
<<: *binary_linux_build_params
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- run:
<<: *binary_checkout
- run:
@ -19,8 +19,8 @@
elif [[ "$OS_NAME" == *"Ubuntu"* ]]; then
retry apt-get update
retry apt-get -y install expect moreutils
conda install -y -c eumetsat expect
conda install -y cmake
retry conda install -y -c eumetsat expect
retry conda install -y cmake
fi
- run:
name: Update compiler to devtoolset7
@ -41,6 +41,16 @@
no_output_timeout: "1h"
command: |
source "/pytorch/.circleci/scripts/binary_linux_build.sh"
- run:
name: save binary size
no_output_timeout: "5m"
command: |
source /env
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
pip3 install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 /pytorch/.circleci/scripts/upload_binary_size_to_scuba.py || exit 0
- persist_to_workspace:
root: /
paths: final_pkgs
@ -56,7 +66,7 @@
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
# TODO: We shouldn't attach the workspace multiple times
- attach_workspace:
at: /home/circleci/project
@ -79,7 +89,7 @@
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- setup_linux_system_environment
- setup_ci_environment
- attach_workspace:
@ -130,7 +140,7 @@
smoke_mac_test:
<<: *binary_linux_test_upload_params
macos:
xcode: "9.0"
xcode: "9.4.1"
steps:
- attach_workspace:
at: ~/workspace
@ -158,10 +168,10 @@
binary_mac_build:
<<: *binary_mac_params
macos:
xcode: "9.0"
xcode: "9.4.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- run:
<<: *binary_checkout
- run:
@ -199,10 +209,10 @@
binary_mac_upload: &binary_mac_upload
<<: *binary_mac_params
macos:
xcode: "9.0"
xcode: "9.4.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- run:
<<: *binary_checkout
- run:
@ -227,7 +237,7 @@
steps:
- attach_workspace:
at: ~/workspace
- should_run_job
- attach_scripts
- checkout
- run_brew_for_ios_build
- run:
@ -247,15 +257,15 @@
- persist_to_workspace:
root: /Users/distiller/workspace/
paths: ios
binary_ios_upload:
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "11.2.1"
steps:
- attach_workspace:
at: ~/workspace
- should_run_job
- attach_scripts
- checkout
- run_brew_for_ios_build
- run:

View File

@ -4,7 +4,7 @@
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- setup_linux_system_environment
- checkout
- setup_ci_environment
@ -64,7 +64,7 @@
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -124,10 +124,10 @@
caffe2_macos_build:
<<: *caffe2_params
macos:
xcode: "9.0"
xcode: "9.4.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- checkout
- run_brew_for_macos_build
- run:
@ -151,7 +151,7 @@
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl -o ${TMPDIR}/conda.sh https://repo.continuum.io/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
curl --retry 3 -o ${TMPDIR}/conda.sh https://repo.continuum.io/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
chmod +x ${TMPDIR}/conda.sh
/bin/bash ${TMPDIR}/conda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/conda.sh
@ -162,7 +162,7 @@
pip -q install numpy
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2

View File

@ -3,16 +3,12 @@ commands:
# attaches the workspace at ~/workspace; this workspace is generated
# by the setup job. Note that ~/workspace is not the default working
# directory (that's ~/project).
should_run_job:
description: "Test if the job should run or not"
attach_scripts:
description: "Attach the scripts that power everything else"
steps:
- attach_workspace:
name: Attaching workspace
at: ~/workspace
- run:
name: Should run job
no_output_timeout: "2m"
command: ~/workspace/.circleci/scripts/should_run_job.sh
# This system setup script is meant to run before the CI-related scripts, e.g.,
# installing Git client, checking out code, setting up CI env, and

View File

@ -1,21 +0,0 @@
docker_build_job:
parameters:
image_name:
type: string
default: ""
machine:
image: ubuntu-1604:201903-01
resource_class: large
environment:
IMAGE_NAME: << parameters.image_name >>
steps:
- checkout
- run:
name: build_docker_image_<< parameters.image_name >>
no_output_timeout: "1h"
command: |
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
cd .circleci/docker && ./build_docker.sh

View File

@ -0,0 +1,84 @@
docker_build_job:
parameters:
image_name:
type: string
default: ""
machine:
image: ubuntu-1604:201903-01
resource_class: large
environment:
IMAGE_NAME: << parameters.image_name >>
steps:
- checkout
- run:
name: build_docker_image_<< parameters.image_name >>
no_output_timeout: "1h"
command: |
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
cd .circleci/docker && ./build_docker.sh
docker_for_ecr_gc_build_job:
machine:
image: ubuntu-1604:201903-01
steps:
- checkout
- run:
name: build_docker_image_for_ecr_gc
no_output_timeout: "1h"
command: |
cd .circleci/ecr_gc_docker
docker build . -t 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
eval $(aws ecr get-login --no-include-email --region us-east-1)
set -x
docker push 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
ecr_gc_job:
parameters:
project:
type: string
default: "pytorch"
tags_to_keep: # comma-separated values
type: string
environment:
PROJECT: << parameters.project >>
IMAGE_TAG: << parameters.tags_to_keep >>
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
aws_auth:
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
steps:
- run:
name: garbage collecting for ecr images
no_output_timeout: "1h"
command: |
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
/usr/bin/gc.py --filter-prefix ${PROJECT} --ignore-tags ${IMAGE_TAG}
docker_hub_index_job:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
aws_auth:
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
steps:
- run:
name: garbage collecting for ecr images
no_output_timeout: "1h"
command: |
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
export DOCKER_HUB_USERNAME=${CIRCLECI_DOCKER_HUB_USERNAME}
export DOCKER_HUB_PASSWORD=${CIRCLECI_DOCKER_HUB_PASSWORD}
set -x
/usr/bin/docker_hub.py

View File

@ -1,15 +1,9 @@
# WARNING: DO NOT EDIT THIS FILE DIRECTLY!!!
# See the README.md in this directory.
# IMPORTANT: To update Docker image version, please first update
# https://github.com/pytorch/ossci-job-dsl/blob/master/src/main/groovy/ossci/pytorch/DockerVersion.groovy and
# https://github.com/pytorch/ossci-job-dsl/blob/master/src/main/groovy/ossci/caffe2/DockerVersion.groovy,
# and then update DOCKER_IMAGE_VERSION at the top of the following files:
# * cimodel/data/pytorch_build_definitions.py
# * cimodel/data/caffe2_build_definitions.py
# And the inline copies of the variable in
# * verbatim-sources/job-specs-custom.yml
# (grep for DOCKER_IMAGE)
# IMPORTANT: To update Docker image version, please follow
# the instructions at
# https://github.com/pytorch/pytorch/wiki/Docker-image-build-on-CircleCI
version: 2.1
@ -19,3 +13,17 @@ docker_config_defaults: &docker_config_defaults
# This IAM user only allows read-write access to ECR
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4}
executors:
windows-with-nvidia-gpu:
machine:
resource_class: windows.gpu.nvidia.medium
image: windows-server-2019-nvidia:stable
shell: bash.exe
windows-cpu-with-nvidia-cuda:
machine:
# we will change to CPU host when it's ready
resource_class: windows.xlarge
image: windows-server-2019-vs2019:stable
shell: bash.exe

View File

@ -2,13 +2,13 @@
environment:
BUILD_ENVIRONMENT: pytorch-python-doc-push
# TODO: stop hardcoding this
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:405"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:f990c76a-a798-42bb-852f-5be5006f8026"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -47,13 +47,13 @@
pytorch_cpp_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:405"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:f990c76a-a798-42bb-852f-5be5006f8026"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -93,10 +93,10 @@
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
macos:
xcode: "9.0"
xcode: "9.4.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- checkout
- run_brew_for_macos_build
- run:
@ -107,7 +107,7 @@
export IN_CIRCLECI=1
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
@ -133,11 +133,11 @@
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
macos:
xcode: "9.0"
xcode: "9.4.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
# This workspace also carries binaries from the build job
- should_run_job
- attach_scripts
- run_brew_for_macos_build
- run:
name: Test
@ -154,64 +154,16 @@
- store_test_results:
path: test/test-reports
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- checkout
- run_brew_for_macos_build
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# Install CUDA 9.2
sudo rm -rf ~/cuda_9.2.64_mac_installer.app || true
curl https://s3.amazonaws.com/ossci-macos/cuda_9.2.64_mac_installer.zip -o ~/cuda_9.2.64_mac_installer.zip
unzip ~/cuda_9.2.64_mac_installer.zip -d ~/
sudo ~/cuda_9.2.64_mac_installer.app/Contents/MacOS/CUDAMacOSXInstaller --accept-eula --no-window
sudo cp /usr/local/cuda/lib/libcuda.dylib /Developer/NVIDIA/CUDA-9.2/lib/libcuda.dylib
sudo rm -rf /usr/local/cuda || true
# Install cuDNN 7.1 for CUDA 9.2
curl https://s3.amazonaws.com/ossci-macos/cudnn-9.2-osx-x64-v7.1.tgz -o ~/cudnn-9.2-osx-x64-v7.1.tgz
rm -rf ~/cudnn-9.2-osx-x64-v7.1 && mkdir ~/cudnn-9.2-osx-x64-v7.1
tar -xzvf ~/cudnn-9.2-osx-x64-v7.1.tgz -C ~/cudnn-9.2-osx-x64-v7.1
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/include/
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/lib/libcudnn* /Developer/NVIDIA/CUDA-9.2/lib/
sudo chmod a+r /Developer/NVIDIA/CUDA-9.2/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/lib/libcudnn*
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
git submodule sync && git submodule update -q --init --recursive
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
pytorch_android_gradle_build:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- should_run_job
- attach_scripts
- setup_linux_system_environment
- checkout
- setup_ci_environment
@ -291,13 +243,13 @@
pytorch_android_publish_snapshot:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- should_run_job
- attach_scripts
- setup_linux_system_environment
- checkout
- setup_ci_environment
@ -327,13 +279,13 @@
pytorch_android_gradle_build-x86_32:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
- should_run_job
- attach_scripts
- run:
name: filter out not PR runs
no_output_timeout: "5m"
@ -376,9 +328,9 @@
xcode: "11.2.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- checkout
- run_brew_for_ios_build
- run_brew_for_ios_build
- run:
name: Run Fastlane
no_output_timeout: "1h"
@ -410,30 +362,44 @@
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl -o ~/Downloads/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/Downloads/conda.sh
/bin/bash ~/Downloads/conda.sh -b -p ~/anaconda
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
# Install dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive
# export
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
export BUILD_PYTORCH_MOBILE=1
#check the custom build flag
echo "SELECTED_OP_LIST: ${SELECTED_OP_LIST}"
if [ -n "${SELECTED_OP_LIST}" ]; then
export SELECTED_OP_LIST="${PROJ_ROOT}/ios/TestApp/custom_build/${SELECTED_OP_LIST}"
fi
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
- run:
name: Run Build Tests
name: Run Build Test
no_output_timeout: "30m"
command: |
set -e
@ -445,7 +411,11 @@
exit 1
fi
echo ${IOS_DEV_TEAM_ID}
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
else
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM}
fi
if ! [ "$?" -eq "0" ]; then
echo 'xcodebuild failed!'
exit 1
@ -455,15 +425,14 @@
no_output_timeout: "2h"
command: |
set -e
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
echo "not SIMULATOR build, skip it."
exit 0
fi
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
source ~/anaconda/bin/activate
#install the latest version of PyTorch and TorchVision
pip install torch torchvision
pip install torch torchvision --progress-bar off
#run unit test
cd ${PROJ_ROOT}/ios/TestApp/benchmark
python trace_model.py
@ -471,4 +440,3 @@
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
fastlane scan
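
The retry helper defined inline in the iOS build step above makes five attempts with a 1/2/4/8-second backoff between them. A minimal Python rendering of the same pattern (illustrative, not part of the scripts):

import subprocess
import time

def retry(cmd, delays=(1, 2, 4, 8)):
    # Try once, then back off before each of four more attempts
    for delay in delays:
        if subprocess.call(cmd) == 0:
            return
        time.sleep(delay)
    subprocess.check_call(cmd)  # final attempt; raises if it still fails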

View File

@ -12,10 +12,14 @@ pytorch_params: &pytorch_params
use_cuda_docker_runtime:
type: string
default: ""
build_only:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
DOCKER_IMAGE: << parameters.docker_image >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
BUILD_ONLY: << parameters.build_only >>
resource_class: << parameters.resource_class >>
pytorch_ios_params: &pytorch_ios_params
@ -29,11 +33,46 @@ pytorch_ios_params: &pytorch_ios_params
ios_platform:
type: string
default: ""
op_list:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
IOS_PLATFORM: << parameters.ios_platform >>
SELECTED_OP_LIST: << parameters.op_list >>
pytorch_windows_params: &pytorch_windows_params
parameters:
test_name:
type: string
default: ""
cuda_version:
type: string
default: "10"
python_version:
type: string
default: "3.6"
vc_version:
type: string
default: "14.11"
vc_year:
type: string
default: "2017"
vc_product:
type: string
default: "BuildTools"
use_cuda:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: "pytorch-win-ws2019-cuda10-cudnn7-py3"
SCCACHE_BUCKET: "ossci-compiler-cache"
CUDA_VERSION: <<parameters.cuda_version>>
PYTHON_VERSION: <<parameters.python_version>>
VC_VERSION: <<parameters.vc_version>>
VC_YEAR: <<parameters.vc_year>>
VC_PRODUCT: <<parameters.vc_product>>
USE_CUDA: <<parameters.use_cuda>>
TORCH_CUDA_ARCH_LIST: "7.5"
JOB_BASE_NAME: <<parameters.test_name>>

View File

@ -5,7 +5,7 @@ jobs:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- setup_linux_system_environment
- checkout
- setup_ci_environment
@ -19,28 +19,27 @@ jobs:
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
# NB: Temporarily disable the rebase logic in v1.4.0, don't merge this change into master
# # TODO We may want to move the rebase logic to a separate step after checkout
# # Rebase to master only if in xenial_py3_6_gcc5_4 case
# if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
# echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
# set -x
# git config --global user.email "circleci.ossci@gmail.com"
# git config --global user.name "CircleCI"
# git config remote.origin.url https://github.com/pytorch/pytorch.git
# git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
# git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
# export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
# echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
# export GIT_COMMIT=${CIRCLE_SHA1}
# echo "GIT_COMMIT: " ${GIT_COMMIT}
# git checkout -f ${GIT_COMMIT}
# git reset --hard ${GIT_COMMIT}
# git merge --allow-unrelated-histories --no-edit --no-ff ${GIT_MERGE_TARGET}
# set +x
# else
# echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
# fi
# TODO We may want to move the rebase logic to a separate step after checkout
# Rebase to master only if in xenial_py3_6_gcc5_4 case
if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
set -x
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
export GIT_COMMIT=${CIRCLE_SHA1}
echo "GIT_COMMIT: " ${GIT_COMMIT}
git checkout -f ${GIT_COMMIT}
git reset --hard ${GIT_COMMIT}
git merge --allow-unrelated-histories --no-edit --no-ff ${GIT_MERGE_TARGET}
set +x
else
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
fi
git submodule sync && git submodule update -q --init --recursive
@ -89,7 +88,7 @@ jobs:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- attach_scripts
- setup_linux_system_environment
- setup_ci_environment
- run:
@ -132,10 +131,119 @@ jobs:
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export CIRCLE_PULL_REQUEST=${CIRCLE_PULL_REQUEST}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
retrieve_test_reports
- store_test_results:
path: test-reports
pytorch_windows_build:
<<: *pytorch_windows_params
parameters:
test_name:
type: string
default: ""
cuda_version:
type: string
default: "10"
python_version:
type: string
default: "3.6"
vc_version:
type: string
default: "14.11"
vc_year:
type: string
default: "2017"
vc_product:
type: string
default: "BuildTools"
use_cuda:
type: string
default: ""
executor: windows-cpu-with-nvidia-cuda
steps:
- checkout
- run:
name: Install VS2017
command: |
if [[ "${VC_YEAR}" == "2017" ]]; then
powershell .circleci/scripts/vs_install.ps1
fi
- run:
name: Install Cuda
no_output_timeout: 30m
command: |
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/cuda_10.1.243_426.00_win10.exe
mkdir cuda_install_logs
./cuda_10.1.243_426.00_win10.exe -s -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
cat cuda_install_logs/LOG.setup.exe.log
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe"
then
echo "CUDA installation failed"
exit 1
fi
rm -rf ./cuda_install_logs
rm -f ./cuda_10.1.243_426.00_win10.exe
- run:
name: Install Cudnn
command : |
cd c:/
curl --retry 3 -O https://ossci-windows.s3.amazonaws.com/cudnn-10.1-windows10-x64-v7.6.4.38.zip
7z x cudnn-10.1-windows10-x64-v7.6.4.38.zip -ocudnn
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/"
- run:
name: Build
no_output_timeout: "90m"
command: |
set -e
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
set -x
.jenkins/pytorch/win-build.sh
pytorch_windows_test:
<<: *pytorch_windows_params
parameters:
test_name:
type: string
default: ""
cuda_version:
type: string
default: "10"
python_version:
type: string
default: "3.6"
vc_version:
type: string
default: "14.11"
vc_year:
type: string
default: "2017"
vc_product:
type: string
default: "BuildTools"
use_cuda:
type: string
default: ""
executor: windows-with-nvidia-gpu
steps:
- checkout
- run:
name: Install VS2017
command: |
if [[ "${VC_YEAR}" == "2017" ]]; then
powershell .circleci/scripts/vs_install.ps1
fi
- run:
name: Test
no_output_timeout: "30m"
command: |
set -e
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
set -x
.jenkins/pytorch/win-test.sh

View File

@ -0,0 +1,134 @@
# Warning: indentation here matters!
- pytorch_windows_build:
name: pytorch_windows_vs2017_14.11_py36_cuda10.1_build
cuda_version: "10"
python_version: "3.6"
vc_version: "14.11"
vc_year: "2017"
vc_product: "BuildTools"
use_cuda: "1"
requires:
- setup
filters:
branches:
only:
- master
- /ci-all\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.11_py36_cuda10.1_test1
test_name: pytorch-windows-test1
cuda_version: "10"
python_version: "3.6"
vc_version: "14.11"
vc_year: "2017"
vc_product: "BuildTools"
use_cuda: "1"
requires:
- setup
- pytorch_windows_vs2017_14.11_py36_cuda10.1_build
filters:
branches:
only:
- master
- /ci-all\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.11_py36_cuda10.1_test2
test_name: pytorch-windows-test2
cuda_version: "10"
python_version: "3.6"
vc_version: "14.11"
vc_year: "2017"
vc_product: "BuildTools"
use_cuda: "1"
requires:
- setup
- pytorch_windows_vs2017_14.11_py36_cuda10.1_build
filters:
branches:
only:
- master
- /ci-all\/.*/
- pytorch_windows_build:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_build
cuda_version: "10"
python_version: "3.6"
vc_version: "14.16"
vc_year: "2017"
vc_product: "BuildTools"
use_cuda: "1"
requires:
- setup
filters:
branches:
only:
- master
- /ci-all\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_test1
test_name: pytorch-windows-test1
cuda_version: "10"
python_version: "3.6"
vc_version: "14.16"
vc_year: "2017"
vc_product: "BuildTools"
use_cuda: "1"
requires:
- setup
- pytorch_windows_vs2017_14.16_py36_cuda10.1_build
filters:
branches:
only:
- master
- /ci-all\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_test2
test_name: pytorch-windows-test2
cuda_version: "10"
python_version: "3.6"
vc_version: "14.16"
vc_year: "2017"
vc_product: "BuildTools"
use_cuda: "1"
requires:
- setup
- pytorch_windows_vs2017_14.16_py36_cuda10.1_build
filters:
branches:
only:
- master
- /ci-all\/.*/
- pytorch_windows_build:
name: pytorch_windows_vs2019_py36_cuda10.1_build
cuda_version: "10"
python_version: "3.6"
vc_version: ""
vc_year: "2019"
vc_product: "Community"
use_cuda: "1"
requires:
- setup
- pytorch_windows_test:
name: pytorch_windows_vs2019_py36_cuda10.1_test1
test_name: pytorch-windows-test1
cuda_version: "10"
python_version: "3.6"
vc_version: ""
vc_year: "2019"
vc_product: "Community"
use_cuda: "1"
requires:
- setup
- pytorch_windows_vs2019_py36_cuda10.1_build
- pytorch_windows_test:
name: pytorch_windows_vs2019_py36_cuda10.1_test2
test_name: pytorch-windows-test2
cuda_version: "10"
python_version: "3.6"
vc_version: ""
vc_year: "2019"
vc_product: "Community"
use_cuda: "1"
requires:
- setup
- pytorch_windows_vs2019_py36_cuda10.1_build

View File

@ -1,3 +1,5 @@
# TODO: Refactor circleci/cimodel/data/binary_build_data.py to generate this file
# instead of doing one-offs here
# Binary builds (subset, to smoke test that they'll work)
#
# NB: If you modify this file, you need to also modify
@ -10,13 +12,17 @@
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
docker_image: "pytorch/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_build:
name: binary_linux_manywheel_3_7m_cu100_devtoolset7_build
build_environment: "manywheel 3.7m cu100 devtoolset7"
name: binary_linux_manywheel_3_7m_cu102_devtoolset7_build
build_environment: "manywheel 3.7m cu102 devtoolset7"
requires:
- setup
docker_image: "pytorch/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda102"
filters:
branches:
only:
- master
- binary_linux_build:
name: binary_linux_conda_2_7_cpu_devtoolset7_build
build_environment: "conda 2.7 cpu devtoolset7"
@ -31,7 +37,7 @@
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "pytorch/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
@ -46,33 +52,54 @@
build_environment: "wheel 3.6 cpu"
requires:
- setup
filters:
branches:
only:
- master
- binary_mac_build:
name: binary_macos_conda_2_7_cpu_build
build_environment: "conda 2.7 cpu"
requires:
- setup
filters:
branches:
only:
- master
# This job has an average run time of 3 hours o.O
# Now only running this on master to reduce overhead
- binary_mac_build:
name: binary_macos_libtorch_2_7_cpu_build
build_environment: "libtorch 2.7 cpu"
requires:
- setup
filters:
branches:
only:
- master
- binary_linux_test:
name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_test
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
- binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
docker_image: "pytorch/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda102"
filters:
branches:
only:
- master
- binary_linux_test:
name: binary_linux_manywheel_3_7m_cu100_devtoolset7_test
build_environment: "manywheel 3.7m cu100 devtoolset7"
name: binary_linux_manywheel_3_7m_cu102_devtoolset7_test
build_environment: "manywheel 3.7m cu102 devtoolset7"
requires:
- setup
- binary_linux_manywheel_3_7m_cu100_devtoolset7_build
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_manywheel_3_7m_cu102_devtoolset7_build
docker_image: "pytorch/manylinux-cuda102"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
filters:
branches:
only:
- master
- binary_linux_test:
name: binary_linux_conda_2_7_cpu_devtoolset7_test
build_environment: "conda 2.7 cpu devtoolset7"
@ -89,7 +116,7 @@
- setup
- binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "pytorch/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"

View File

@ -7,6 +7,7 @@
only:
- master
jobs:
- docker_for_ecr_gc_build_job
- docker_build_job:
name: "pytorch-linux-bionic-clang9-thrift-llvmdev"
image_name: "pytorch-linux-bionic-clang9-thrift-llvmdev"
@ -17,11 +18,8 @@
name: "pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-cuda8-cudnn7-py2"
image_name: "pytorch-linux-xenial-cuda8-cudnn7-py2"
- docker_build_job:
name: "pytorch-linux-xenial-cuda8-cudnn7-py3"
image_name: "pytorch-linux-xenial-cuda8-cudnn7-py3"
name: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
@ -46,6 +44,9 @@
- docker_build_job:
name: "pytorch-linux-xenial-py3.5"
image_name: "pytorch-linux-xenial-py3.5"
- docker_build_job:
name: "pytorch-linux-xenial-py3.8"
image_name: "pytorch-linux-xenial-py3.8"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-clang7"
image_name: "pytorch-linux-xenial-py3.6-clang7"

View File

@ -0,0 +1,28 @@
ecr_gc:
triggers:
- schedule:
cron: "45 * * * *"
filters:
branches:
only:
- master
jobs:
- ecr_gc_job:
name: ecr_gc_job_for_pytorch
project: pytorch
tags_to_keep: "271,262,256,278,282,291,300,323,327,347,389,401,402,403,405,a8006f9a-272d-4478-b137-d121c6f05c83,6e7b11da-a919-49e5-b2ba-da66e3d4bb0a,f990c76a-a798-42bb-852f-5be5006f8026"
- ecr_gc_job:
name: ecr_gc_job_for_caffe2
project: caffe2
tags_to_keep: "348,345,336,325,324,315,306,301,287,283,276,273,266,253,248,238,230,213"
- ecr_gc_job:
name: ecr_gc_job_for_translate
project: translate
tags_to_keep: "8"
- ecr_gc_job:
name: ecr_gc_job_for_tensorcomp
project: tensorcomp
tags_to_keep: "34"
- docker_hub_index_job

View File

@ -3,7 +3,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"
filters:
branches:
only: nightly
@ -12,7 +12,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"
filters:
branches:
only: nightly
@ -21,7 +21,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"
filters:
branches:
only: nightly
@ -30,7 +30,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"
filters:
branches:
only: nightly

View File

@ -4,8 +4,5 @@
#- binary_linux_libtorch_2.7m_cu90_test:
# requires:
# - binary_linux_libtorch_2.7m_cu90_build
#- binary_linux_libtorch_2.7m_cu100_test:
# requires:
# - binary_linux_libtorch_2.7m_cu100_build
# Nightly uploads

View File

@ -1,10 +1,18 @@
- pytorch_android_gradle_build-x86_32:
name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32
filters:
branches:
only:
- master
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
- pytorch_android_gradle_build:
name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
filters:
branches:
only:
- master
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build

View File

@ -4,7 +4,7 @@
- setup
- pytorch_linux_xenial_py3_6_gcc5_4_build
build_environment: "pytorch-linux-xenial-py3.6-gcc5.4-ge_config_legacy-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:405"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:f990c76a-a798-42bb-852f-5be5006f8026"
resource_class: large
- pytorch_linux_test:
name: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test
@ -12,5 +12,5 @@
- setup
- pytorch_linux_xenial_py3_6_gcc5_4_build
build_environment: "pytorch-linux-xenial-py3.6-gcc5.4-ge_config_simple-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:405"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:f990c76a-a798-42bb-852f-5be5006f8026"
resource_class: large

View File

@ -1,7 +1,6 @@
# Pytorch iOS PR builds
- pytorch_ios_build:
name: pytorch_ios_11_2_1_x86_64_build
context: org-member
build_environment: "pytorch-ios-11.2.1-x86_64_build"
ios_arch: "x86_64"
ios_platform: "SIMULATOR"
@ -15,3 +14,13 @@
ios_platform: "OS"
requires:
- setup
- pytorch_ios_build:
name: pytorch_ios_11_2_1_arm64_custom_build
context: org-member
build_environment: "pytorch-ios-11.2.1-arm64_custom_build"
ios_arch: "arm64"
ios_platform: "OS"
op_list: "mobilenetv2.yaml"
requires:
- setup

View File

@ -8,6 +8,3 @@
requires:
- setup
- pytorch_macos_10_13_py3_build
- pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
requires:
- setup

View File

@ -4,4 +4,34 @@
requires:
- setup
build_environment: "pytorch-linux-xenial-py3-clang5-mobile-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:405"
build_only: "1"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:f990c76a-a798-42bb-852f-5be5006f8026"
- pytorch_linux_build:
name: pytorch_linux_xenial_py3_clang5_mobile_custom_build_static
requires:
- setup
build_environment: "pytorch-linux-xenial-py3-clang5-mobile-custom-build-static"
build_only: "1"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:f990c76a-a798-42bb-852f-5be5006f8026"
- pytorch_linux_build:
name: pytorch_linux_xenial_py3_clang5_mobile_custom_build_dynamic
requires:
- setup
build_environment: "pytorch-linux-xenial-py3-clang5-mobile-custom-build-dynamic"
build_only: "1"
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"
- pytorch_linux_build:
name: pytorch_linux_xenial_py3_clang5_mobile_code_analysis
requires:
- setup
# Most of this CI is already covered by the "mobile-custom-build-dynamic" job
filters:
branches:
only:
- master
- /ci-all\/.*/
build_environment: "pytorch-linux-xenial-py3-clang5-mobile-code-analysis"
build_only: "1"
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:f990c76a-a798-42bb-852f-5be5006f8026"


@ -0,0 +1,8 @@
- setup:
# Run this job on everything since it is
# the dependency for everything.
filters:
tags:
only: /.*/
branches:
only: /.*/
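A quick way to sanity-check edits to these generated workflow snippets is to re-run the consistency scripts used by the lint job shown later in this diff; a minimal sketch, with working directories taken from that job:

```bash
# Regenerate the CircleCI config and verify it matches the checked-in config.yml.
pip install -r requirements.txt pyyaml
(cd .circleci && ./ensure-consistency.py)
# Verify the Docker image versions referenced in the config are deployed.
.circleci/validate-docker-version.py
```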


@ -8,6 +8,6 @@ ignore =
# these ignores are from flake8-bugbear; please fix!
B007,B008,
# these ignores are from flake8-comprehensions; please fix!
C400,C401,C402,C403,C404,C405,C407,C411,
C400,C401,C402,C403,C404,C405,C407,C411,C413,C414,C415
per-file-ignores = __init__.py: F401
exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi,.git
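The new C413/C414/C415 ignores only matter if the flake8-comprehensions plugin is installed; to reproduce the CI lint run locally, a sketch based on the flake8 job in the workflow below:

```bash
# Install flake8 plus the plugins the CI job installs, then lint;
# the ignore and exclude lists above are picked up from this .flake8 file.
pip install flake8 flake8-mypy flake8-bugbear flake8-comprehensions \
    flake8-executable flake8-pyi mccabe pycodestyle pyflakes
flake8 --version
flake8
```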


@ -22,7 +22,9 @@ jobs:
pip install -r requirements.txt
cd .circleci && ./ensure-consistency.py
- name: Ensure Docker version is correctly deployed
run: .circleci/validate-docker-version.py
run: |
pip install pyyaml
.circleci/validate-docker-version.py
- name: Shellcheck Jenkins scripts
run: |
sudo apt-get install -y shellcheck
@ -65,7 +67,8 @@ jobs:
- name: Run flake8
run: |
set -eux
pip install flake8
pip install flake8 flake8-mypy flake8-bugbear flake8-comprehensions flake8-executable flake8-pyi mccabe pycodestyle pyflakes
flake8 --version
flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
cat ${GITHUB_WORKSPACE}/flake8-output.txt
- name: Add annotations
@ -102,7 +105,7 @@ jobs:
run: |
set -eux
pip install flake8
rm -rf .circleci
rm -rf .circleci tools/clang_format_new.py
flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
cat ${GITHUB_WORKSPACE}/flake8-output.txt
- name: Add annotations
@ -150,7 +153,7 @@ jobs:
# Install dependencies
pip install pyyaml
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-8 main"
sudo apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-8 main"
sudo apt-get update
sudo apt-get install -y clang-tidy-8
sudo update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-8 1000
@ -175,7 +178,6 @@ jobs:
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THNN/generic/THNN.h \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml

.gitignore

@ -57,11 +57,6 @@ torch/csrc/jit/generated/*
torch/csrc/jit/fuser/config.h
torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THCUNN.cwrap
torch/csrc/nn/THNN_generic.cpp
torch/csrc/nn/THNN_generic.cwrap
torch/csrc/nn/THNN_generic.h
torch/csrc/nn/THNN.cpp
torch/csrc/nn/THNN.cwrap
torch/bin/
torch/cmake/
torch/lib/*.a*
@ -250,3 +245,13 @@ GSYMS
GPATH
tags
TAGS
# ccls file
.ccls-cache/
# clang-format storage location used by apply_clang_format.py
.clang-format-bin
# clangd background index
.clangd/

.gitmodules

@ -111,10 +111,14 @@
path = third_party/foxi
url = https://github.com/houseroad/foxi.git
[submodule "third_party/tbb"]
path = third_party/tbb
url = https://github.com/01org/tbb
branch = tbb_2018
path = third_party/tbb
url = https://github.com/01org/tbb
branch = tbb_2018
[submodule "android/libs/fbjni"]
ignore = dirty
path = android/libs/fbjni
url = https://github.com/facebookincubator/fbjni.git
[submodule "third_party/XNNPACK"]
ignore = dirty
path = third_party/XNNPACK
url = https://github.com/google/XNNPACK.git


@ -104,7 +104,7 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
build_args+=("BUILD_TEST=ON")
build_args+=("USE_OBSERVERS=ON")
build_args+=("USE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" $(build_to_cmake ${build_args[@]}) "$@"
BUILD_CAFFE2_MOBILE=1 "${ROOT_DIR}/scripts/build_android.sh" $(build_to_cmake ${build_args[@]}) "$@"
exit 0
fi
@ -130,7 +130,7 @@ if [[ $BUILD_ENVIRONMENT == *py2-cuda9.0-cudnn7-ubuntu16.04* ]]; then
# removing http:// duplicate in favor of nvidia-ml.list
# which is https:// version of the same repo
sudo rm -f /etc/apt/sources.list.d/nvidia-machine-learning.list
curl -o ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
curl --retry 3 -o ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
sudo dpkg -i ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
sudo apt-key add /var/nvinfer-runtime-trt-repo-5.0.2-ga-cuda9.0/7fa2af80.pub
sudo apt-get -qq update
@ -175,18 +175,12 @@ if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_amd.py"
fi
# building bundled nccl in this config triggers a bug in nvlink. For
# more, see https://github.com/pytorch/pytorch/issues/14486
if [[ "${BUILD_ENVIRONMENT}" == *-cuda8*-cudnn7* ]]; then
build_args+=("USE_SYSTEM_NCCL=ON")
fi
# Try to include Redis support for Linux builds
if [ "$(uname)" == "Linux" ]; then
build_args+=("USE_REDIS=ON")
fi
# Use a speciallized onnx namespace in CI to catch hardcoded onnx namespace
# Use a specialized onnx namespace in CI to catch hardcoded onnx namespace
build_args+=("ONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI")
###############################################################################


@ -40,6 +40,9 @@ for test in $(find "$cpp_test_dir" -executable -type f); do
LD_LIBRARY_PATH="$ld_library_path" "$test"
fi
;;
*/*_benchmark)
LD_LIBRARY_PATH="$ld_library_path" "$test" --benchmark_color=false
;;
*)
# Currently, we use a mixture of gtest (caffe2) and Catch2 (ATen). While
# planning to migrate to gtest as the common PyTorch c++ test suite, we
@ -82,7 +85,7 @@ fi
EXTRA_TESTS=()
# CUDA builds always include NCCL support
if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *-rocm* ]]; then
EXTRA_TESTS+=("$caffe2_pypath/contrib/nccl")
fi


@ -0,0 +1,25 @@
#!/usr/bin/env bash
# DO NOT ADD 'set -x', to avoid revealing CircleCI secret context environment variables
set -eu -o pipefail
# This script builds and runs the code analyzer tool to generate the aten op
# dependency graph for custom mobile builds.
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Clang version:"
clang --version
export LLVM_DIR="$(llvm-config-5.0 --prefix)"
echo "LLVM_DIR: ${LLVM_DIR}"
# Run the following 2 steps together because they share the same (reusable)
# time-consuming process to build LibTorch into LLVM assembly.
# 1. Run code analysis test first to fail fast
time ANALYZE_TEST=1 CHECK_RESULT=1 tools/code_analyzer/build.sh
# 2. Run code analysis on mobile LibTorch
time ANALYZE_TORCH=1 tools/code_analyzer/build.sh -closure=false
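This script is not invoked directly; build.sh (also updated in this diff) dispatches to it based on the job name. A sketch of that dispatch, copied from the build.sh hunk later in this diff (the script's exact location in the tree is an assumption):

```bash
# In build.sh: route mobile code-analysis jobs to this script.
if [[ "$BUILD_ENVIRONMENT" == *-mobile-code-analysis* ]]; then
  exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile-code-analysis.sh" "$@"
fi
```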


@ -0,0 +1,26 @@
#!/usr/bin/env bash
# DO NOT ADD 'set -x', to avoid revealing CircleCI secret context environment variables
set -eu -o pipefail
# This script uses linux host toolchain + mobile build options in order to
# build & test mobile libtorch without having to setup Android/iOS
# toolchain/simulator.
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Install torch & torchvision - used to download & trace test model.
pip install torch torchvision --progress-bar off
# Run end-to-end process of building mobile library, linking into the predictor
# binary, and running forward pass with a real model.
if [[ "$BUILD_ENVIRONMENT" == *-mobile-custom-build-static* ]]; then
TEST_CUSTOM_BUILD_STATIC=1 test/mobile/custom_build/build.sh
elif [[ "$BUILD_ENVIRONMENT" == *-mobile-custom-build-dynamic* ]]; then
export LLVM_DIR="$(llvm-config-5.0 --prefix)"
echo "LLVM_DIR: ${LLVM_DIR}"
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
else
TEST_DEFAULT_BUILD=1 test/mobile/custom_build/build.sh
fi


@ -14,13 +14,13 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# (2) build with NCCL and MPI
# (3) build with only MPI
# (4) build with neither
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
if [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
@ -36,10 +36,12 @@ if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-mobile* ]]; then
# Use linux host toolchain + mobile build options in order to build & test
# mobile libtorch without having to setup Android/iOS toolchain/simulator.
exec ./scripts/build_mobile.sh -DBUILD_BINARY=ON "$@"
if [[ "$BUILD_ENVIRONMENT" == *-mobile-*build* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile.sh" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *-mobile-code-analysis* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile-code-analysis.sh" "$@"
fi
echo "Python version:"
@ -51,6 +53,11 @@ gcc --version
echo "CMake version:"
cmake --version
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
echo "NVCC version:"
nvcc --version
fi
# TODO: Don't run this...
pip_install -r requirements.txt || true
@ -59,7 +66,7 @@ if ! which conda; then
# In ROCm CIs, we are doing cross compilation on build machines with
# intel cpu and later run tests on machines with amd cpu.
# Also leave out two builds to make sure non-mkldnn builds still work.
if [[ "$BUILD_ENVIRONMENT" != *rocm* && "$BUILD_ENVIRONMENT" != *-trusty-py3.5-* && "$BUILD_ENVIRONMENT" != *-xenial-cuda9-cudnn7-py3-* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *rocm* && "$BUILD_ENVIRONMENT" != *-trusty-py3.5-* && "$BUILD_ENVIRONMENT" != *-xenial-cuda10.1-cudnn7-py3-* ]]; then
pip_install mkl mkl-devel
export USE_MKLDNN=1
else
@ -98,7 +105,6 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *-x86_64* ]]; then
build_args+=("-DANDROID_ABI=x86_64")
fi
export BUILD_PYTORCH_MOBILE=1
exec ./scripts/build_android.sh "${build_args[@]}" "$@"
fi
@ -198,16 +204,6 @@ if [[ "$BUILD_ENVIRONMENT" != *libtorch* ]]; then
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip_install -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
fi
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
@ -221,7 +217,7 @@ if [[ "$BUILD_ENVIRONMENT" != *libtorch* ]]; then
assert_git_not_dirty
else
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda10.1-cudnn7-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
@ -248,30 +244,24 @@ if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
pip_install lark-parser
# Bazel doesn't work with sccache gcc. https://github.com/bazelbuild/bazel/issues/3642
sudo add-apt-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main"
sudo add-apt-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-8 main"
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
sudo apt-get -qq update
# Install clang-7 clang++-7 for xla
sudo apt-get -qq install clang-7 clang++-7
# Install clang-8 clang++-8 for xla
sudo apt-get -qq install clang-8 clang++-8
# Bazel dependencies
sudo apt-get -qq install pkg-config zip zlib1g-dev unzip
# XLA build requires Bazel
wget https://github.com/bazelbuild/bazel/releases/download/0.24.1/bazel-0.24.1-installer-linux-x86_64.sh
chmod +x bazel-*.sh
sudo ./bazel-*.sh
BAZEL="$(which bazel)"
if [ -z "${BAZEL}" ]; then
echo "Unable to find bazel..."
exit 1
fi
# Install bazels3cache for cloud cache
sudo apt-get -qq install npm
npm config set strict-ssl false
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
curl -sL --retry 3 https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get install -qq nodejs
# XLA build requires Bazel
# We use bazelisk to avoid updating Bazel version manually.
sudo npm install -g @bazel/bazelisk
sudo ln -s "$(command -v bazelisk)" /usr/bin/bazel
# Install bazels3cache for cloud cache
sudo npm install -g bazels3cache
BAZELS3CACHE="$(which bazels3cache)"
if [ -z "${BAZELS3CACHE}" ]; then
@ -281,7 +271,7 @@ if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
bazels3cache --bucket=${XLA_CLANG_CACHE_S3_BUCKET_NAME} --maxEntrySizeBytes=0
pushd xla
export CC=clang-7 CXX=clang++-7
export CC=clang-8 CXX=clang++-8
# Use cloud cache to build when available.
sed -i '/bazel build/ a --remote_http_cache=http://localhost:7777 \\' build_torch_xla_libs.sh


@ -128,7 +128,7 @@ if [ -z "$COMPACT_JOB_NAME" ]; then
exit 1
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py3* ]] || \
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda10.1-cudnn7-py3* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch_macos* ]]; then
BUILD_TEST_LIBTORCH=1
@ -140,7 +140,7 @@ fi
# min version 3.5, so we only do it in two builds that we know should use conda.
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py2* ]] || \
[[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py3* ]]; then
[[ "$BUILD_ENVIRONMENT" == *cuda10.1-cudnn7-py3* ]]; then
if ! which conda; then
echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty"
exit 1
@ -179,3 +179,11 @@ function get_exit_code() {
set -e
return $retcode
}
function file_diff_from_base() {
# The fetch may fail on Docker hosts, but it's not always necessary.
set +e
git fetch origin master --quiet
set -e
git diff --name-only "$(git merge-base origin master HEAD)" > "$1"
}
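Callers only compute the changed-file list on pull requests, where `CIRCLE_PULL_REQUEST` is set; a sketch mirroring how the test scripts later in this diff consume it:

```bash
# Collect the changed-file list and hand it to the test runner.
if [ -n "$CIRCLE_PULL_REQUEST" ]; then
  DETERMINE_FROM=$(mktemp)
  file_diff_from_base "$DETERMINE_FROM"
fi
python test/run_test.py --verbose --determine-from="$DETERMINE_FROM"
```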


@ -13,7 +13,7 @@ mkdir -p ${WORKSPACE_DIR}
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
mkdir -p ${WORKSPACE_DIR}
retry curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
curl --retry 3 https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
retry bash ${WORKSPACE_DIR}/miniconda3.sh -b -p ${WORKSPACE_DIR}/miniconda3
fi
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"


@ -54,7 +54,14 @@ test_python_all() {
# using the address associated with the loopback interface.
export GLOO_SOCKET_IFNAME=lo0
echo "Ninja version: $(ninja --version)"
python test/run_test.py --verbose
if [ -n "$CIRCLE_PULL_REQUEST" ]; then
DETERMINE_FROM=$(mktemp)
file_diff_from_base "$DETERMINE_FROM"
fi
python test/run_test.py --verbose --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
@ -102,7 +109,6 @@ test_custom_script_ops() {
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python test_custom_classes.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt


@ -14,13 +14,13 @@ if [ -n "${IN_CIRCLECI}" ]; then
# TODO move this to docker
pip_install unittest-xml-reporting
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
@ -31,7 +31,7 @@ fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" build/bin/test_api
time python test/run_test.py --verbose -i distributed
time python test/run_test.py --verbose -i c10d
time python test/run_test.py --verbose -i c10d_spawn
time python test/run_test.py --verbose -i distributed/test_distributed
time python test/run_test.py --verbose -i distributed/test_c10d
time python test/run_test.py --verbose -i distributed/test_c10d_spawn
assert_git_not_dirty


@ -15,13 +15,13 @@ if [ -n "${IN_CIRCLECI}" ]; then
# TODO move this to docker
pip_install unittest-xml-reporting
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
@ -37,6 +37,7 @@ fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# TODO: Move this to Docker
sudo apt-get -qq install --no-install-recommends apt-transport-https ca-certificates
sudo apt-get -qq update
sudo apt-get -qq install --no-install-recommends libsndfile1
fi
@ -48,7 +49,8 @@ if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# ninja is installed in /var/lib/jenkins/.local/bin
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
# TODO: Please move this to Docker
# The version is fixed to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
pip_install --user "hypothesis==4.53.2"
# TODO: move this to Docker
@ -74,27 +76,49 @@ fi
# if you're not careful. Check this if you made some changes and the
# ASAN test is not working
if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
# Suppress vptr violations arising from multiple copies of pybind11
export ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true
# We suppress the vptr violation, since we have separate copies of
# libprotobuf in both libtorch.so and libcaffe2.so, and it causes
# the following problem:
# test_cse (__main__.TestJit) ... torch/csrc/jit/export.cpp:622:38:
# runtime error: member call on address ... which does not point
# to an object of type 'google::protobuf::MessageLite'
# ...: note: object is of type 'onnx_torch::ModelProto'
#
# This problem should be solved when libtorch.so and libcaffe2.so are
# merged.
export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PWD/ubsan.supp
export PYTORCH_TEST_WITH_ASAN=1
export PYTORCH_TEST_WITH_UBSAN=1
# TODO: Figure out how to avoid hard-coding these paths
export ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-5.0/bin/llvm-symbolizer
export TORCH_USE_RTLD_GLOBAL=1
# NB: We load libtorch.so with RTLD_GLOBAL for UBSAN, unlike our
# default behavior.
#
# The reason for this is that without RTLD_GLOBAL, if we load multiple
# libraries that depend on libtorch (as is the case with C++ extensions), we
# will get multiple copies of libtorch in our address space. When UBSAN is
# turned on, it will do a bunch of virtual pointer consistency checks which
# won't work correctly. When this happens, you get a violation like:
#
# member call on address XXXXXX which does not point to an object of
# type 'std::_Sp_counted_base<__gnu_cxx::_Lock_policy::_S_atomic>'
# XXXXXX note: object is of type
# 'std::_Sp_counted_ptr<torch::nn::LinearImpl*, (__gnu_cxx::_Lock_policy)2>'
#
# (NB: the textual types of the objects here are misleading, because
# they actually line up; it just so happens that there's two copies
# of the type info floating around in the address space, so they
# don't pointer compare equal. See also
# https://github.com/google/sanitizers/issues/1175
#
# UBSAN is kind of right here: if we relied on RTTI across C++ extension
# modules they would indeed do the wrong thing; but in our codebase, we
# don't use RTTI (because it doesn't work in mobile). To appease
# UBSAN, however, it's better if we ensure all the copies agree!
#
# By the way, an earlier version of this code attempted to load
# libtorch_python.so with LD_PRELOAD, which has a similar effect of causing
# it to be loaded globally. This isn't really a good idea though, because
# it depends on a ton of dynamic libraries that most programs aren't gonna
# have, and it applies to child processes.
export LD_PRELOAD=/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux/libclang_rt.asan-x86_64.so
# Increase stack size, because ASAN red zones use more stack
ulimit -s 81920
(cd test && python -c "import torch")
(cd test && python -c "import torch; print(torch.__version__, torch.version.git_version)")
echo "The next three invocations are expected to crash; if they don't that means ASAN/UBSAN is misconfigured"
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_asan(3)")
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_ubsan(0)")
@ -107,23 +131,28 @@ elif [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX2-* ]]; then
export ATEN_CPU_CAPABILITY=avx
fi
if [ -n "$CIRCLE_PULL_REQUEST" ]; then
DETERMINE_FROM=$(mktemp)
file_diff_from_base "$DETERMINE_FROM"
fi
test_python_nn() {
time python test/run_test.py --include nn --verbose
time python test/run_test.py --include test_nn --verbose --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
test_python_ge_config_simple() {
time python test/run_test.py --include jit_simple --verbose
time python test/run_test.py --include test_jit_simple --verbose --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
test_python_ge_config_legacy() {
time python test/run_test.py --include jit_legacy jit_fuser_legacy --verbose
time python test/run_test.py --include test_jit_legacy test_jit_fuser_legacy --verbose --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
test_python_all_except_nn() {
time python test/run_test.py --exclude nn jit_simple jit_legacy jit_fuser_legacy --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods
time python test/run_test.py --exclude test_nn test_jit_simple test_jit_legacy test_jit_fuser_legacy --verbose --bring-to-front test_quantization test_quantized test_quantized_tensor test_quantized_nn_mods --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
@ -153,7 +182,7 @@ test_aten() {
}
test_torchvision() {
pip_install --user git+https://github.com/pytorch/vision.git@44a5bae933655ed7ff798669a43452b833f9ce01
pip_install --user git+https://github.com/pytorch/vision.git@43e94b39bcdda519c093ca11d99dfa2568aa7258
}
test_libtorch() {
@ -180,7 +209,6 @@ test_custom_script_ops() {
cp -a "$CUSTOM_OP_BUILD" build
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python test_custom_classes.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
@ -191,7 +219,9 @@ test_custom_script_ops() {
test_xla() {
export XLA_USE_XRT=1 XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0"
export XRT_WORKERS="localservice:0;grpc://localhost:40934"
# Issue #30717: randomize the port the XLA/gRPC workers listen on to reduce test flakiness.
XLA_PORT=`shuf -i 40701-40999 -n 1`
export XRT_WORKERS="localservice:0;grpc://localhost:$XLA_PORT"
pushd xla
echo "Running Python Tests"
./test/run_tests.sh
@ -214,7 +244,7 @@ test_backward_compatibility() {
pushd test/backward_compatibility
python dump_all_function_schemas.py --filename new_schemas.txt
pip_uninstall torch
pip_install torch==1.3.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
pip_install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
python check_backward_compatibility.py --new-schemas new_schemas.txt
popd
set +x
@ -240,9 +270,9 @@ elif [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then
# TODO: run some C++ tests
echo "no-op at the moment"
elif [[ "${BUILD_ENVIRONMENT}" == *-test1 || "${JOB_BASE_NAME}" == *-test1 ]]; then
test_torchvision
test_python_nn
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 || "${JOB_BASE_NAME}" == *-test2 ]]; then
test_torchvision
test_python_all_except_nn
test_aten
test_libtorch


@ -10,7 +10,7 @@ if [ ! -f setup.py ]; then
fi
# shellcheck disable=SC2034
COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-build
COMPACT_JOB_NAME=pytorch-win-ws2019-cuda10-cudnn7-py3-build
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"


@ -4,7 +4,7 @@ if "%DEBUG%" == "1" (
set BUILD_TYPE=release
)
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;%PATH%
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;C:\Program Files\Amazon\AWSCLI\bin;%PATH%
:: This inflates our log size slightly, but it is REALLY useful to be
:: able to see what our cl.exe commands are (since you can actually
@ -35,17 +35,28 @@ goto cuda_build_end
:: Override VS env here
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
if "%VC_VERSION%" == "" (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64
) else (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%VC_VERSION%
)
@echo on
popd
set DISTUTILS_USE_SDK=1
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=%CUDA_PATH%
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2
set CUDA_PATH_V9_2=%CUDA_PATH%
goto cuda_build_common
:cuda_build_10
pushd .
if "%VC_VERSION%" == "" (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64
) else (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%VC_VERSION%
)
@echo on
popd
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
set CUDA_PATH_V10_1=%CUDA_PATH%
@ -54,6 +65,8 @@ goto cuda_build_common
:cuda_build_common
set DISTUTILS_USE_SDK=1
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
@ -64,14 +77,16 @@ set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
set PATH=%TMP_DIR_WIN%\bin;%PATH%
:: Target only our CI GPU machine's CUDA arch to speed up the build
set TORCH_CUDA_ARCH_LIST=5.2
:: Target only our CI GPU machine's CUDA arch to speed up the build; it can be overridden with an env var
:: default on circleci is Tesla T4 which has capability of 7.5, ref: https://developer.nvidia.com/cuda-gpus
:: jenkins has M40, which is 5.2
if "%TORCH_CUDA_ARCH_LIST%" == "" set TORCH_CUDA_ARCH_LIST=5.2
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CC=sccache cl
set CXX=sccache cl
set CC=sccache-cl
set CXX=sccache-cl
set CMAKE_GENERATOR=Ninja
@ -107,7 +122,19 @@ if not "%USE_CUDA%"=="0" (
copy %TMP_DIR_WIN%\bin\sccache.exe %TMP_DIR_WIN%\bin\nvcc.exe
)
set CUDA_NVCC_EXECUTABLE=%TMP_DIR_WIN%\bin\nvcc
:: randomtemp is used to resolve the intermittent build error related to CUDA.
:: code: https://github.com/peterjc123/randomtemp
:: issue: https://github.com/pytorch/pytorch/issues/25393
::
:: Previously, CMake uses CUDA_NVCC_EXECUTABLE for finding nvcc and then
:: the calls are redirected to sccache. sccache looks for the actual nvcc
:: in PATH, and then passes the arguments to it.
:: Currently, randomtemp is placed before sccache (%TMP_DIR_WIN%\bin\nvcc)
:: so we are actually wrapping sccache rather than nvcc itself.
curl -kL https://github.com/peterjc123/randomtemp/releases/download/v0.2/randomtemp.exe --output %TMP_DIR_WIN%\bin\randomtemp.exe
set RANDOMTEMP_EXECUTABLE=%TMP_DIR_WIN%\bin\nvcc.exe
set CUDA_NVCC_EXECUTABLE=%TMP_DIR_WIN%\bin\randomtemp.exe
set RANDOMTEMP_BASEDIR=%TMP_DIR_WIN%\bin
if "%REBUILD%"=="" set USE_CUDA=1


@ -1,4 +1,4 @@
if "%CUDA_VERSION%" == "9" set CUDA_SUFFIX=cuda90
if "%CUDA_VERSION%" == "9" set CUDA_SUFFIX=cuda92
if "%CUDA_VERSION%" == "10" set CUDA_SUFFIX=cuda101
if "%CUDA_SUFFIX%" == "" (
@ -8,10 +8,10 @@ if "%CUDA_SUFFIX%" == "" (
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z
curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/magma_2.5.2_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.2_%CUDA_SUFFIX%_%BUILD_TYPE%.7z
) else (
aws s3 cp s3://ossci-windows/magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet
aws s3 cp s3://ossci-windows/magma_2.5.2_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.2_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet
)
7z x -aoa %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
7z x -aoa %TMP_DIR_WIN%\magma_2.5.2_%CUDA_SUFFIX%_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
)
set MAGMA_HOME=%TMP_DIR_WIN%\magma


@ -5,11 +5,11 @@ if "%BUILD_ENVIRONMENT%"=="" (
)
if "%REBUILD%"=="" (
IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
curl --retry 3 -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3
if "%REBUILD%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy cffi pyyaml boto3
call conda install -y -q python=%PYTHON_VERSION% numpy cffi pyyaml boto3
call conda install -y -q -c conda-forge cmake
)


@ -1,8 +1,8 @@
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/mkl_2019.4.245.7z --output %TMP_DIR_WIN%\mkl.7z
curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/mkl_2020.0.166.7z --output %TMP_DIR_WIN%\mkl.7z
) else (
aws s3 cp s3://ossci-windows/mkl_2019.4.245.7z %TMP_DIR_WIN%\mkl.7z --quiet
aws s3 cp s3://ossci-windows/mkl_2020.0.166.7z %TMP_DIR_WIN%\mkl.7z --quiet
)
7z x -aoa %TMP_DIR_WIN%\mkl.7z -o%TMP_DIR_WIN%\mkl
)


@ -4,11 +4,14 @@ if "%REBUILD%"=="" (
:check_sccache
%TMP_DIR_WIN%\bin\sccache.exe --show-stats || (
taskkill /im sccache.exe /f /t || ver > nul
del %TMP_DIR_WIN%\bin\sccache.exe
del %TMP_DIR_WIN%\bin\sccache.exe || ver > nul
del %TMP_DIR_WIN%\bin\sccache-cl.exe || ver > nul
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %TMP_DIR_WIN%\bin\sccache.exe
curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %TMP_DIR_WIN%\bin\sccache.exe
curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/sccache-cl.exe --output %TMP_DIR_WIN%\bin\sccache-cl.exe
) else (
aws s3 cp s3://ossci-windows/sccache.exe %TMP_DIR_WIN%\bin\sccache.exe
aws s3 cp s3://ossci-windows/sccache-cl.exe %TMP_DIR_WIN%\bin\sccache-cl.exe
)
goto :check_sccache
)


@ -3,7 +3,7 @@ if exist "%TMP_DIR%/ci_scripts/pytorch_env_restore.bat" (
exit /b 0
)
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;%PATH%
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;C:\Program Files\Amazon\AWSCLI\bin;%PATH%
:: Install Miniconda3
if "%BUILD_ENVIRONMENT%"=="" (
@ -13,7 +13,7 @@ if "%BUILD_ENVIRONMENT%"=="" (
)
if NOT "%BUILD_ENVIRONMENT%"=="" (
IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
curl --retry 3 https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3
@ -21,8 +21,10 @@ if NOT "%BUILD_ENVIRONMENT%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
:: Numba is pinned to 0.44.0 to avoid https://github.com/numba/numba/issues/4352
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba==0.44.0
call conda install -y -q -c conda-forge cmake
)
pip install -q ninja future "hypothesis==4.53.2" "librosa>=0.6.2" psutil pillow
:: The version is fixed to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
pip install ninja future "hypothesis==4.53.2" "librosa>=0.6.2" psutil pillow
:: No need to install faulthandler since we only test Python >= 3.6 on Windows
:: faulthandler is builtin since Python 3.3
@ -33,19 +35,27 @@ goto cuda_build_end
:cuda_build_9
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
if "%VC_VERSION%" == "" (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64
) else (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%VC_VERSION%
)
@echo on
popd
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=%CUDA_PATH%
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2
set CUDA_PATH_V9_2=%CUDA_PATH%
goto cuda_build_common
:cuda_build_10
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
if "%VC_VERSION%" == "" (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64
) else (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%VC_VERSION%
)
@echo on
popd
@ -56,6 +66,7 @@ goto cuda_build_common
:cuda_build_common
set DISTUTILS_USE_SDK=1
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%

View File

@ -1,3 +1,3 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test && python run_test.py --exclude nn jit_simple jit_legacy jit_fuser_legacy --verbose && cd ..
cd test && python run_test.py --exclude test_nn test_jit_simple test_jit_legacy test_jit_fuser_legacy --verbose --determine-from="%1" && cd ..
if ERRORLEVEL 1 exit /b 1


@ -8,7 +8,7 @@ python %SCRIPT_HELPERS_DIR%\run_python_nn_smoketests.py
if ERRORLEVEL 1 exit /b 1
echo Run nn tests
python run_test.py --include nn --verbose
python run_test.py --include test_nn --verbose --determine-from="%1"
if ERRORLEVEL 1 exit /b 1
popd


@ -1,7 +1,7 @@
#!/bin/bash -ex
# shellcheck disable=SC2034
COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-test
COMPACT_JOB_NAME=pytorch-win-ws2019-cuda10-cudnn7-py3-test
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"
@ -30,18 +30,22 @@ fi
export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers
if [ -n "$CIRCLE_PULL_REQUEST" ]; then
DETERMINE_FROM="${TMP_DIR}/determine_from"
file_diff_from_base "$DETERMINE_FROM"
fi
run_tests() {
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
$SCRIPT_HELPERS_DIR/test_python_nn.bat && \
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat && \
$SCRIPT_HELPERS_DIR/test_python_nn.bat "$DETERMINE_FROM" && \
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat "$DETERMINE_FROM" && \
$SCRIPT_HELPERS_DIR/test_custom_script_ops.bat && \
$SCRIPT_HELPERS_DIR/test_libtorch.bat
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
$SCRIPT_HELPERS_DIR/test_python_nn.bat
$SCRIPT_HELPERS_DIR/test_python_nn.bat "$DETERMINE_FROM"
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat && \
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat "$DETERMINE_FROM" && \
$SCRIPT_HELPERS_DIR/test_custom_script_ops.bat && \
$SCRIPT_HELPERS_DIR/test_libtorch.bat
fi


@ -1,6 +1,10 @@
@inproceedings{paszke2017automatic,
title={Automatic Differentiation in {PyTorch}},
author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
booktitle={NIPS Autodiff Workshop},
year={2017}
@incollection{NEURIPS2019_9015,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {8024--8035},
year = {2019},
publisher = {Curran Associates, Inc.},
url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}


@ -30,7 +30,7 @@ endif()
set(CMAKE_INSTALL_MESSAGE NEVER)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD 14)
if (NOT MSVC)
set(CMAKE_C_STANDARD 11)
endif()
@ -81,12 +81,24 @@ if(APPLE)
set(CMAKE_MACOSX_RPATH ON)
endif()
if (${CMAKE_HOST_SYSTEM_PROCESSOR} MATCHES "(x86_64|i[3-6]+86)")
set(CPU_INTEL ON)
if (WIN32)
# On Windows, CMAKE_HOST_SYSTEM_PROCESSOR is calculated through `PROCESSOR_ARCHITECTURE`,
# which only has the value `x86` or `AMD64`. We cannot infer whether it's an Intel
# CPU from that alone; however, the environment variable `PROCESSOR_IDENTIFIER` can be used.
if ($ENV{PROCESSOR_IDENTIFIER} MATCHES "Intel")
set(CPU_INTEL ON)
else ()
set(CPU_INTEL OFF)
endif ()
else ()
set(CPU_INTEL OFF)
if (${CMAKE_HOST_SYSTEM_PROCESSOR} MATCHES "(x86_64|i[3-6]+86)")
set(CPU_INTEL ON)
else ()
set(CPU_INTEL OFF)
endif ()
endif ()
# For non-supported platforms, turn USE_DISTRIBUTED off by default.
# It is not tested and likely won't work without additional changes.
if(NOT LINUX)
@ -107,11 +119,9 @@ option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
option(BUILD_PYTHON "Build Python binaries" ON)
cmake_dependent_option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON
"NOT MSVC" OFF)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
option(BUILD_CAFFE2_MOBILE "Build libcaffe2 for mobile (deprecating)" ON)
option(BUILD_NAMEDTENSOR "Experimental: compile with namedtensor support" OFF)
option(BUILD_CAFFE2_MOBILE "Build libcaffe2 for mobile (deprecating)" OFF)
option(USE_STATIC_DISPATCH "Use static dispatch for ATen operators" OFF)
cmake_dependent_option(
CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON
@ -175,6 +185,7 @@ option(USE_SNPE "Use Qualcomm's SNPE library" OFF)
option(USE_SYSTEM_EIGEN_INSTALL
"Use system Eigen instead of the one under third_party" OFF)
option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
option(USE_XNNPACK "Use XNNPACK" ON)
option(USE_ZMQ "Use ZMQ" OFF)
option(USE_ZSTD "Use ZSTD" OFF)
cmake_dependent_option(
@ -192,6 +203,7 @@ cmake_dependent_option(
USE_GLOO "Use Gloo. Only available if USE_DISTRIBUTED is on." ON
"USE_DISTRIBUTED" OFF)
option(USE_TBB "Use TBB" OFF)
option(ONNX_ML "Enable traditional ONNX ML API." ON)
# Used when building Caffe2 through setup.py
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" ON)
@ -207,6 +219,8 @@ cmake_dependent_option(
set(ONNX_NAMESPACE "onnx_torch" CACHE STRING "A namespace for ONNX; needed to build with other frameworks that share ONNX.")
set(SELECTED_OP_LIST "" CACHE STRING
"Path to the yaml file that contains the list of operators to include for custom build. Include all operators by default.")
set(OP_DEPENDENCY "" CACHE STRING
"Path to the yaml file that contains the op dependency graph for custom build.")
# This is a fix for a rare build issue on Ubuntu:
# symbol lookup error: miniconda3/envs/pytorch-py3.7/lib/libmkl_intel_lp64.so: undefined symbol: mkl_blas_dsyrk
@ -260,8 +274,16 @@ if (MSVC)
endif()
# /bigobj increases number of sections in .obj file, which is needed to link
# against libaries in Python 2.7 under Windows
set(${flag_var} "${${flag_var}} /MP /bigobj")
# against libraries in Python 2.7 under Windows
# For Visual Studio generators, we need to add /MP ourselves to enable
# parallel compilation.
# For other generators like ninja, we don't need to add /MP because it is
# already handled by the generator itself.
if(CMAKE_GENERATOR MATCHES "Visual Studio" AND NOT ${flag_var} MATCHES "/MP")
set(${flag_var} "${${flag_var}} /MP /bigobj")
else()
set(${flag_var} "${${flag_var}} /bigobj")
endif()
endforeach(flag_var)
foreach(flag_var
@ -283,17 +305,24 @@ if (MSVC)
list(APPEND CUDA_NVCC_FLAGS "-Xcompiler /w -w")
endif(MSVC)
IF(NOT MSVC)
list(APPEND CUDA_NVCC_FLAGS_DEBUG "-g" "-lineinfo" "--source-in-ptx")
list(APPEND CUDA_NVCC_FLAGS_RELWITHDEBINFO "-g" "-lineinfo" "--source-in-ptx")
ENDIF(NOT MSVC)
# Set INTERN_BUILD_MOBILE for all mobile builds. Components that are not
# applicable to mobile are disabled by this variable.
if (ANDROID OR IOS)
# Setting the `BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN` environment variable can
# force a mobile build with the host toolchain, which is useful for testing
# purposes.
if (ANDROID OR IOS OR DEFINED ENV{BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN})
set(INTERN_BUILD_MOBILE ON)
endif()
# Setting `PYTORCH_BUILD_MOBILE` environment variable can force it to do mobile
# build with host toolchain.
if (DEFINED ENV{PYTORCH_BUILD_MOBILE})
set(INTERN_BUILD_MOBILE ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DC10_MOBILE")
if (DEFINED ENV{BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN})
# C10_MOBILE is derived from Android/iOS toolchain macros in
# c10/macros/Macros.h, so it needs to be explicitly set here.
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DC10_MOBILE")
endif()
endif()
# INTERN_BUILD_ATEN_OPS is used to control whether to build ATen/TH operators.
@ -318,11 +347,13 @@ if (INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE)
set(FEATURE_TORCH_MOBILE ON)
set(NO_API ON)
set(USE_FBGEMM OFF)
set(USE_PYTORCH_QNNPACK ON)
set(USE_QNNPACK OFF)
set(INTERN_DISABLE_ONNX ON)
set(INTERN_DISABLE_AUTOGRAD ON)
set(INTERN_USE_EIGEN_BLAS ON)
# Disable the in-development mobile interpreter for actual mobile builds;
# it stays enabled elsewhere so that build errors are still caught.
set(INTERN_DISABLE_MOBILE_INTERP ON)
endif()
# ---[ Utils
@ -383,10 +414,6 @@ if(USE_FBGEMM)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_FBGEMM")
endif()
if(BUILD_NAMEDTENSOR)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBUILD_NAMEDTENSOR")
endif()
if(USE_QNNPACK)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_QNNPACK")
endif()
@ -395,6 +422,10 @@ if(USE_PYTORCH_QNNPACK)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_PYTORCH_QNNPACK")
endif()
if(USE_XNNPACK)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL")
endif()
# ---[ Whitelist file if whitelist is specified
include(cmake/Whitelist.cmake)
@ -405,8 +436,8 @@ if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 4.8.0
endif()
# ---[ Build flags
set(CMAKE_C_STANDARD 99)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_C_STANDARD 11)
set(CMAKE_CXX_STANDARD 14)
if(NOT MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -fPIC")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-narrowing")
@ -414,6 +445,7 @@ if(NOT MSVC)
# Details at http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1459
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror=return-type")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-field-initializers")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-type-limits")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-array-bounds")
@ -475,6 +507,10 @@ if(NOT MSVC)
set (CMAKE_LINKER_FLAGS_DEBUG "${CMAKE_STATIC_LINKER_FLAGS_DEBUG} -fno-omit-frame-pointer -O0")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-math-errno")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-trapping-math")
check_cxx_compiler_flag("-Werror=format" HAS_WERROR_FORMAT)
if (HAS_WERROR_FORMAT)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror=format")
endif()
endif()
if (USE_ASAN)


@ -10,13 +10,13 @@
/test/test_c10d.py @pietern @mrshenli @zhaojuanmao
/torch/utils/cpp_extension.py @goldsborough @fmassa @soumith @ezyang
# Not there to stricly require the approval, but to be tagged as a reviewer
# Not there to strictly require the approval, but to be tagged as a reviewer
# on the PRs to push them into a high priority inbox.
/torch/csrc/api/data/ @apaszke
/torch/csrc/autograd/ @apaszke
/torch/csrc/autograd/ @apaszke @albanD
/torch/csrc/jit/ @apaszke
/torch/nn/ @apaszke
/torch/autograd/ @apaszke
/torch/autograd/ @apaszke @albanD
/torch/jit/ @apaszke
/torch/utils/data/ @apaszke
@ -25,3 +25,8 @@
/torch/csrc/distributed/autograd @mrshenli @pritamdamania87 @zhaojuanmao
/torch/distributed/rpc @mrshenli @pritamdamania87 @zhaojuanmao
/torch/distributed/autograd @mrshenli @pritamdamania87 @zhaojuanmao
/torch/distributed/optim @mrshenli @pritamdamania87 @zhaojuanmao @aazzolini
# Distributed tests
/test/distributed @mrshenli @pritamdamania87 @zhaojuanmao
/torch/testing/_internal/distributed @mrshenli @pritamdamania87 @zhaojuanmao

CODE_OF_CONDUCT.md

@ -0,0 +1,76 @@
# Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <conduct@pytorch.org>. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq


@ -1,3 +1,37 @@
- [Contributing to PyTorch](#contributing-to-pytorch)
- [Developing PyTorch](#developing-pytorch)
- [Codebase structure](#codebase-structure)
- [Unit testing](#unit-testing)
* [Better local unit tests with pytest](#better-local-unit-tests-with-pytest)
- [Writing documentation](#writing-documentation)
* [Building documentation](#building-documentation)
+ [Tips](#tips)
+ [Building C++ Documentation](#building-c---documentation)
* [Previewing changes](#previewing-changes)
+ [Submitting changes for review](#submitting-changes-for-review)
* [Adding documentation tests](#adding-documentation-tests)
- [Profiling with `py-spy`](#profiling-with-py-spy)
- [Managing multiple build trees](#managing-multiple-build-trees)
- [C++ development tips](#c---development-tips)
* [Build only what you need](#build-only-what-you-need)
* [Code completion and IDE support](#code-completion-and-ide-support)
* [Make no-op build fast](#make-no-op-build-fast)
+ [Use Ninja](#use-ninja)
+ [Use CCache](#use-ccache)
+ [Use a faster linker](#use-a-faster-linker)
* [C++ frontend development tips](#c---frontend-development-tips)
- [CUDA development tips](#cuda-development-tips)
- [Windows development tips](#windows-development-tips)
* [Known MSVC (and MSVC with NVCC) bugs](#known-msvc--and-msvc-with-nvcc--bugs)
* [Running clang-tidy](#running-clang-tidy)
* [Pre-commit tidy/linting hook](#pre-commit-tidy-linting-hook)
* [Building PyTorch with ASAN](#building-pytorch-with-asan)
+ [Getting `ccache` to work](#getting--ccache--to-work)
+ [Why this stuff with `LD_PRELOAD` and `LIBASAN_RT`?](#why-this-stuff-with--ld-preload--and--libasan-rt--)
+ [Why LD_PRELOAD in the build function?](#why-ld-preload-in-the-build-function-)
+ [Why no leak detection?](#why-no-leak-detection-)
- [Caffe2 notes](#caffe2-notes)
## Contributing to PyTorch
If you are interested in contributing to PyTorch, your contributions will fall
@ -95,7 +129,6 @@ and `python setup.py clean`. Then you can install in `develop` mode again.
* [src](aten/src)
* [TH](aten/src/TH)
[THC](aten/src/THC)
[THNN](aten/src/THNN)
[THCUNN](aten/src/THCUNN) - Legacy library code from the original
Torch. Try not to add things here; we're slowly porting these to
[native](aten/src/ATen/native).
@ -185,26 +218,13 @@ pytest test/test_nn.py -k Loss -v
The above is an example of testing a change to Loss functions: this command runs tests such as
`TestNN.test_BCELoss` and `TestNN.test_MSELoss` and can be useful to save keystrokes.
## Writing Documentation
## Writing documentation
PyTorch uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
for formatting docstrings. Line length inside docstring blocks must be limited to 80 characters so
that they fit into Jupyter documentation popups.
For C++ documentation (https://pytorch.org/cppdocs), we use
[Doxygen](http://www.doxygen.nl/) and then convert it to
[Sphinx](http://www.sphinx-doc.org/) via
[Breathe](https://github.com/michaeljones/breathe) and
[Exhale](https://github.com/svenevs/exhale). Check the [Doxygen
reference](http://www.stack.nl/~dimitri/doxygen/manual/index.html) for more
information on the documentation syntax. To build the documentation locally,
`cd` into `docs/cpp` and then `make html`.
We run Doxygen in CI (Travis) to verify that you do not use invalid Doxygen
commands. To run this check locally, run `./check-doxygen.sh` from inside
`docs/cpp`.
### Building Documentation
### Building documentation
To build the documentation:
@ -229,28 +249,6 @@ cd docs
make html
```
4. To view HTML files, you must start an HTTP server. For example
```bash
# Start a server from the current directory (Python 3 only)
cd docs/build/html
python -m http.server
```
If you are developing on a remote machine, you can set up an SSH tunnel so that
you can access the HTTP server on the remote machine on your local machine. To map
remote port 8086 to local port 8086, use either of the following commands.
```bash
# For SSH
ssh my_machine -L 8086:my_machine:8086
# For Eternal Terminal
et my_machine -t="8086:8086"
```
Then navigate to `localhost:8086` in your web browser.
#### Tips
The `.rst` source files live in [docs/source](docs/source). Some of the `.rst`
@ -267,12 +265,84 @@ ls | grep rst | grep -v index | grep -v jit | xargs rm
# Make your changes, build the docs, etc.
# Don't commit the deletions!
git add index.rst jit.rst
...
```
#### Building C++ Documentation
For C++ documentation (https://pytorch.org/cppdocs), we use
[Doxygen](http://www.doxygen.nl/) and then convert it to
[Sphinx](http://www.sphinx-doc.org/) via
[Breathe](https://github.com/michaeljones/breathe) and
[Exhale](https://github.com/svenevs/exhale). Check the [Doxygen
reference](http://www.stack.nl/~dimitri/doxygen/manual/index.html) for more
information on the documentation syntax.
### Adding Documentation Tests
We run Doxygen in CI (Travis) to verify that you do not use invalid Doxygen
commands. To run this check locally, run `./check-doxygen.sh` from inside
`docs/cpp`.
To build the documentation, follow the same steps as above, but run them from
`docs/cpp` instead of `docs`.
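Concretely, a minimal sketch of the local C++ docs workflow:

```bash
cd docs/cpp
./check-doxygen.sh   # the same invalid-Doxygen check that CI runs
make html            # build the C++ docs
```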
### Previewing changes
To view the built HTML files locally, open them directly in your web browser: for
example, navigate to `file:///your_pytorch_folder/docs/build/html/index.html`.
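The remote workflow below assumes an HTTP server is serving the built docs; a minimal way to start one (port 8000 chosen to match the tunnel commands):

```bash
# Serve the built docs over HTTP on port 8000.
cd docs/build/html
python -m http.server 8000
```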
If you are developing on a remote machine, you can set up an SSH tunnel so that
you can access the HTTP server on the remote machine from your local machine. To map
remote port 8000 to local port 8000, use either of the following commands.
```bash
# For SSH
ssh my_machine -L 8000:my_machine:8000
# For Eternal Terminal
et my_machine -t="8000:8000"
```
Then navigate to `localhost:8000` in your web browser.
#### Submitting changes for review
It is helpful when submitting a PR that changes the docs to provide a rendered
version of the result. If your change is small, you can add a screenshot of the
changed docs to your PR.
If your change to the docs is large and affects multiple pages, you can host
the docs yourself with the following steps, then add a link to the output in your
PR. These instructions use GitHub pages to host the docs
you have built. To do so, follow [these steps](https://guides.github.com/features/pages/)
to make a repo to host your changed documentation.
GitHub pages expects to be hosting a Jekyll-generated website, which does not work
well with the static resource paths used in the PyTorch documentation. To get around
this, you must add an empty file called `.nojekyll` to your repo.
```bash
cd your_github_pages_repo
touch .nojekyll
git add .
git commit
git push
```
Then, copy built documentation and push the changes:
```bash
cd your_github_pages_repo
cp -r ~/my_pytorch_path/docs/build/html/* .
git add .
git commit
git push
```
Then you should be able to see the changes at your_github_username.github.io/your_github_pages_repo.
### Adding documentation tests
It is easy for code snippets in docstrings and `.rst` files to get out of date. The docs
build includes the [Sphinx Doctest Extension](https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html),
@ -337,7 +407,7 @@ privileges.
tweaked to adjust the stack sampling rate, see the `py-spy` readme for more
details.
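For instance, a sketch of recording a profile at a higher sampling rate (the flags are standard `py-spy record` options; the PID is a placeholder):
```bash
# Sample the target Python process 100 times per second and write a flame graph
py-spy record --pid 12345 --rate 100 --output profile.svg
```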
## Managing Multiple Build Trees
## Managing multiple build trees
One downside to using `python setup.py develop` is that your development
version of PyTorch will be installed globally on your account (e.g., if
@ -356,7 +426,7 @@ source activate pytorch-myfeature
python setup.py develop
```
## C++ Development tips
## C++ development tips
If you are working on the C++ code, there are a few important things that you
will want to keep in mind:
@ -364,7 +434,7 @@ will want to keep in mind:
1. How to rebuild only the code you are working on.
2. How to make rebuilds in the absence of changes go faster.
### Build only what you need.
### Build only what you need
`python setup.py build` will build everything by default, but sometimes you are
only interested in a specific component.
@ -387,10 +457,11 @@ variables `DEBUG`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_CUDA`, `BUILD_TEST`, `U
- `USE_FBGEMM=0` will disable using FBGEMM (quantized 8-bit server operators).
- `USE_NNPACK=0` will disable compiling with NNPACK.
- `USE_QNNPACK=0` will disable QNNPACK build (quantized 8-bit operators).
- `USE_XNNPACK=0` will disable compiling with XNNPACK.
For example:
```bash
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 python setup.py develop
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 python setup.py develop
```
For subsequent builds (i.e., when `build/CMakeCache.txt` exists), the build
@ -407,7 +478,7 @@ C++ code. You need to `pip install ninja` to generate accurate
information for the code in `torch/csrc`. More information at:
- https://sarcasm.github.io/notes/dev/compilation-database.html
### Make no-op build fast.
### Make no-op build fast
#### Use Ninja
@ -534,7 +605,7 @@ The easiest way to use `lld` is to download the
ln -s /path/to/downloaded/ld.lld /usr/local/bin/ld
```
## CUDA Development tips
### C++ frontend development tips
We have very extensive tests in the [test/cpp/api](test/cpp/api) folder. The
tests are a great way to see how certain components are intended to be used.
When compiling PyTorch from source, the test runner binary will be written to
`build/bin/test_api`. The tests use the [GoogleTest](https://github.com/google/googletest/blob/master/googletest)
framework, which you can read up on to learn how to configure the test runner. When
submitting a new feature, we care very much that you write appropriate tests.
Please follow the lead of the other tests to see how to write a new test case.
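For example, you can run a focused subset of the suite with GoogleTest's standard filter flag (the pattern below is hypothetical; match it to the test names you are working on):
```bash
# Run only the C++ API tests whose full names match the pattern
build/bin/test_api --gtest_filter='ModulesTest.*'
```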
## CUDA development tips
If you are working on the CUDA code, here are some useful CUDA debugging tips:
@ -543,7 +624,7 @@ If you are working on the CUDA code, here are some useful CUDA debugging tips:
slow down the build process by about 50% (compared to only `DEBUG=1`), so use it wisely.
2. `cuda-gdb` and `cuda-memcheck` are your best CUDA debugging friends. Unlike `gdb`,
`cuda-gdb` can display actual values in a CUDA tensor (rather than all zeros).
3. CUDA supports a lot of C++11 features such as, `std::numeric_limits`, `std::nextafter`,
3. CUDA supports a lot of C++11/14 features such as `std::numeric_limits`, `std::nextafter`,
`std::tuple` etc. in device code. Many such features are possible because of the
[--expt-relaxed-constexpr](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#constexpr-functions)
nvcc flag. There is a known [issue](https://github.com/ROCm-Developer-Tools/HIP/issues/374)
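To experiment with this outside the PyTorch build, the flag can be passed to `nvcc` directly (a sketch; `demo.cu` is a placeholder):
```bash
# Allow constexpr std:: functions to be called from device code
nvcc --expt-relaxed-constexpr -o demo demo.cu
```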
@ -620,7 +701,7 @@ two dynamic libraries, one linking with the other:
```CMake
project(myproject CXX)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD 14)
add_library(foo SHARED foo.cpp)
add_library(bar SHARED bar.cpp)
# NB: don't forget to __declspec(dllexport) at least one symbol from foo,
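# For example (hypothetical), foo.cpp could contain:
#   extern "C" __declspec(dllexport) int foo_export_anchor() { return 0; }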
@ -694,7 +775,7 @@ static_assert(std::is_same(A*, decltype(A::singleton()))::value, "hmm");
we have AliasAnalysisKind::PURE_FUNCTION and not AliasAnalysisKind::PURE.
The same is likely true for other identifiers that we just didn't try to use yet.
### Running Clang-Tidy
## Running clang-tidy
[Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/index.html) is a C++
linter and static analysis tool based on the clang compiler. We run clang-tidy
@ -725,7 +806,7 @@ root folder if you used `setup.py build`. You can use `-c <clang-tidy-binary>`
to change the clang-tidy binary this script uses. Make sure you have PyYAML installed,
which is in PyTorch's `requirements.txt`.
### Pre-commit Tidy/Linting Hook
## Pre-commit tidy/linting hook
We use clang-tidy and flake8 (installed with flake8-bugbear,
flake8-comprehensions, flake8-mypy, and flake8-pyi) to perform additional
@ -740,7 +821,7 @@ You'll need to install an appropriately configured flake8; see
[Lint as you type](https://github.com/pytorch/pytorch/wiki/Lint-as-you-type)
for documentation on how to do this.
### Building PyTorch with ASAN
## Building PyTorch with ASAN
[ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer) is very
useful for debugging memory errors in C++. We run it in CI, but here's how to
@ -788,7 +869,7 @@ suo-devfair ~/pytorch build_with_asan
suo-devfair ~/pytorch run_with_asan python test/test_jit.py
```
#### Getting `ccache` to work
### Getting `ccache` to work
The scripts above specify the `clang` and `clang++` binaries directly, which
bypasses `ccache`. Here's how to get `ccache` to work:
@ -799,7 +880,7 @@ bypasses `ccache`. Here's how to get `ccache` to work:
3. Change the `CC` and `CXX` variables in `build_with_asan()` to point
directly to `clang` and `clang++`.
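A sketch of the ccache setup this relies on, using masquerade symlinks on an illustrative path:
```bash
# Make `clang`/`clang++` resolve to ccache wrappers that come first on PATH
mkdir -p ~/ccache/bin
ln -sf "$(which ccache)" ~/ccache/bin/clang
ln -sf "$(which ccache)" ~/ccache/bin/clang++
export PATH="$HOME/ccache/bin:$PATH"
```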
#### Why this stuff with `LD_PRELOAD` and `LIBASAN_RT`?
### Why this stuff with `LD_PRELOAD` and `LIBASAN_RT`?
The “standard” workflow for ASAN assumes you have a standalone binary:
@ -820,7 +901,7 @@ workaround for cases like this:
More information can be found
[here](https://github.com/google/sanitizers/wiki/AddressSanitizerAsDso).
#### Why LD_PRELOAD in the build function?
### Why LD_PRELOAD in the build function?
We need `LD_PRELOAD` because there is a cmake check that ensures that a
simple program builds and runs. If we are building with ASAN as a shared
@ -829,7 +910,7 @@ dynamic linker errors and the check will fail.
We don't actually need either of these if we fix the cmake checks.
#### Why no Leak detection?
### Why no leak detection?
Python leaks a lot of memory. Possibly we could configure a suppression file,
but we haven't gotten around to it.
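If you want to experiment anyway, LeakSanitizer accepts a suppression file through `LSAN_OPTIONS` (a sketch; the `leak:Py` pattern matching CPython symbols is a guess, not a vetted list):
```bash
# Hypothetical suppressions: ignore leaks from symbols containing "Py"
cat > lsan.supp <<'EOF'
leak:Py
EOF
LSAN_OPTIONS=suppressions=lsan.supp python test/test_jit.py
```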
Dockerfile (new file)
@ -0,0 +1,73 @@
# syntax = docker/dockerfile:experimental
#
# NOTE: To build this you will need a docker version > 18.06 with
# experimental enabled and DOCKER_BUILDKIT=1
#
# If you do not use buildkit you are not going to have a good time
#
# For reference:
# https://docs.docker.com/develop/develop-images/build_enhancements/
ARG BASE_IMAGE=ubuntu:18.04
ARG PYTHON_VERSION=3.7
FROM ${BASE_IMAGE} as dev-base
RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
apt-get update && apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
ccache \
cmake \
curl \
git \
libjpeg-dev \
libpng-dev && \
rm -rf /var/lib/apt/lists/*
RUN /usr/sbin/update-ccache-symlinks
RUN mkdir /opt/ccache && ccache --set-config=cache_dir=/opt/ccache
ENV PATH /opt/conda/bin:$PATH
FROM dev-base as conda
RUN curl -v -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda install -y python=${PYTHON_VERSION} conda-build pyyaml numpy ipython&& \
/opt/conda/bin/conda clean -ya
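# Stage that initializes git submodules so the build stage copies a complete source tree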
FROM dev-base as submodule-update
WORKDIR /opt/pytorch
COPY . .
RUN git submodule update --init --recursive
FROM conda as build
WORKDIR /opt/pytorch
COPY --from=conda /opt/conda /opt/conda
COPY --from=submodule-update /opt/pytorch /opt/pytorch
RUN --mount=type=cache,target=/opt/ccache \
TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
python setup.py install
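# Stage that installs published conda packages; the `official` image below copies conda from here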
FROM conda as conda-installs
ARG INSTALL_CHANNEL=pytorch-nightly
RUN /opt/conda/bin/conda install -c "${INSTALL_CHANNEL}" -y pytorch torchvision && \
/opt/conda/bin/conda clean -ya
FROM ${BASE_IMAGE} as official
LABEL com.nvidia.volumes.needed="nvidia_driver"
RUN --mount=type=cache,id=apt-final,target=/var/cache/apt \
apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
libjpeg-dev \
libpng-dev && \
rm -rf /var/lib/apt/lists/*
COPY --from=conda-installs /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
WORKDIR /workspace
FROM official as dev
# Should override the already installed version from the official-image stage
COPY --from=build /opt/conda /opt/conda
@ -25,8 +25,8 @@ You can reuse your favorite Python packages such as NumPy, SciPy and Cython to e
| Linux CPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | <center></center> |
| Linux GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | <center></center> |
| Windows CPU / GPU | <center></center> | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/) | <center></center> |
| Linux (ppc64le) CPU | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py2-linux-ppc64le/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py2-linux-ppc64le/) | — | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/) |
| Linux (ppc64le) GPU | [![Build Status](https://powerci.osuosl.org/job/pytorch-linux-cuda9-cudnn7-py2-mpi-build-test-gpu/badge/icon)](https://powerci.osuosl.org/job/pytorch-linux-cuda9-cudnn7-py2-mpi-build-test-gpu/) | — | [![Build Status](https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/badge/icon)](https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/) |
| Linux (ppc64le) CPU | <center></center> | <center></center> | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/) |
| Linux (ppc64le) GPU | <center></center> | <center></center> | [![Build Status](https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/badge/icon)](https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/) |
See also the [ci.pytorch.org HUD](https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master).
@ -53,7 +53,7 @@ Elaborating further:
### A GPU-Ready Tensor Library
If you use NumPy, then you have used Tensors (a.k.a ndarray).
If you use NumPy, then you have used Tensors (a.k.a. ndarray).
![Tensor illustration](./docs/source/_static/img/tensor_illustration.png)
@ -200,6 +200,14 @@ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
```
Each CUDA version only supports one particular Xcode version. The following combinations have been reported to work with PyTorch.
| CUDA version | Xcode version |
| ------------ | ------------- |
| 10.0 | Xcode 9.4 |
| 10.1 | Xcode 10.1 |
On Windows
At least Visual Studio 2017 Update 3 (version 15.3.3 with the toolset 14.11) and [NVTX](https://docs.nvidia.com/gameworks/content/gameworkslibrary/nvtx/nvidia_tools_extension_library_nvtx.htm) are needed.
@ -272,20 +280,30 @@ ccmake build # or cmake-gui build
### Docker Image
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass `-e PYTHON_VERSION=x.y` flag to specify which Python version is to be used by Miniconda, or leave it unset to use the default. Build from pytorch repo directory as docker needs to copy git repo into docker filesystem while building the image.
```
docker build -t pytorch -f docker/pytorch/Dockerfile . # [optional] --build-arg WITH_TORCHVISION=0
#### Using pre-built images
You can also pull a pre-built docker image from Docker Hub and run with docker v19.03+
```bash
docker run --gpus all --rm -ti --ipc=host pytorch/pytorch:latest
```
You can also pull a pre-built docker image from Docker Hub and run with nvidia-docker,
but this is not currently maintained and will pull PyTorch 0.2.
```
nvidia-docker run --rm -ti --ipc=host pytorch/pytorch:latest
```
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g.
for multithreaded data loaders) the default shared memory segment size that the container runs with is not enough, and you
should increase the shared memory size with either the `--ipc=host` or `--shm-size` command line option to `nvidia-docker run`.
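For example (a sketch; the size is illustrative):
```bash
# Give the container a larger shared-memory segment for DataLoader workers
docker run --gpus all --rm -ti --shm-size=2g pytorch/pytorch:latest
```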
#### Building the image yourself
**NOTE:** Must be built with a docker version > 18.06
The `Dockerfile` is supplied to build images with cuda support and cudnn v7.
You can pass the `PYTHON_VERSION=x.y` make variable to specify which Python version is to be used by Miniconda, or leave it
unset to use the default.
```bash
make -f docker.Makefile
# images are tagged as docker.io/${your_docker_username}/pytorch
```
### Building the Documentation
To build documentation in various formats, you will need [Sphinx](http://www.sphinx-doc.org) and the
@ -316,6 +334,7 @@ Three pointers to get you started:
* GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* Slack: The [PyTorch Slack](https://pytorch.slack.com/) hosts a primary audience of moderate to experienced PyTorch users and developers for general chat, online discussions, collaboration etc. If you are a beginner looking for help, the primary medium is [PyTorch Forums](https://discuss.pytorch.org). If you need a slack invite, please fill this form: https://goo.gl/forms/PP1AGvNHpSaJP8to1
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign up here: https://eepurl.com/cbG0rv
* for brand guidelines, please visit our website at [pytorch.org](https://pytorch.org/)
## Releases and Contributing
@ -34,12 +34,12 @@ repositories {
dependencies {
...
implementation 'org.pytorch:pytorch_android:1.4.0-SNAPSHOT'
implementation 'org.pytorch:pytorch_android_torchvision:1.4.0-SNAPSHOT'
implementation 'org.pytorch:pytorch_android:1.5.0-SNAPSHOT'
implementation 'org.pytorch:pytorch_android_torchvision:1.5.0-SNAPSHOT'
...
}
```
The current nightly(snapshots) version is the value of `VERSION_NAME` in `gradle.properties` in current folder, at this moment it is `1.4.0-SNAPSHOT`.
The current nightly (snapshot) version is the value of `VERSION_NAME` in `gradle.properties` in the current folder; at this moment it is `1.5.0-SNAPSHOT`.
## Building PyTorch Android from Source
@ -49,6 +49,7 @@ For this you can use `./scripts/build_pytorch_android.sh` script.
```
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive
sh ./scripts/build_pytorch_android.sh
```
@ -59,7 +60,7 @@ The workflow contains several steps:
2\. Create symbolic links to the results of those builds:
`android/pytorch_android/src/main/jniLibs/${abi}` to the directory with output libraries
`android/pytorch_android/src/main/cpp/libtorch_include/${abi}` to the directory with headers. These directories are used to build the `libpytorch.so` library that will be loaded on the android device.
3\. And finally run `gradle` in `android/pytorch_android` directory with task `assembleRelease`
The script requires that the Android SDK, Android NDK, and gradle are installed.
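Step 3 corresponds to (a sketch, run from the repo root):
```bash
# Build the release AAR once the native libraries and symlinks are in place
cd android/pytorch_android
gradle assembleRelease
```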
@ -103,8 +104,15 @@ dependencies {
implementation(name:'pytorch_android', ext:'aar')
implementation(name:'pytorch_android_torchvision', ext:'aar')
implementation(name:'pytorch_android_fbjni', ext:'aar')
...
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
}
```
We also have to add all transitive dependencies of our aars.
As `pytorch_android` [depends](https://github.com/pytorch/pytorch/blob/master/android/pytorch_android/build.gradle#L62-L63) on `'com.android.support:appcompat-v7:28.0.0'` and `'com.facebook.soloader:nativeloader:0.8.0'`, we need to add them.
(When using maven dependencies, they are added automatically from `pom.xml`.)
At the moment, when using the aar files directly, we need additional configuration due to a packaging specific: `libfbjni.so` is packaged in both `pytorch_android_fbjni.aar` and `pytorch_android.aar`.
```
@ -11,6 +11,10 @@ allprojects {
runnerVersion = "1.2.0"
rulesVersion = "1.2.0"
junitVersion = "4.12"
androidSupportAppCompatV7Version = "28.0.0"
fbjniJavaOnlyVersion = "0.0.3"
soLoaderNativeLoaderVersion = "0.8.0"
}
repositories {
@ -34,8 +38,6 @@ allprojects {
}
}
ext.isPublishing = { ['uploadArchives', 'bintrayUpload'].any { gradle.startParameter.taskNames.contains(it) } }
ext.deps = [
jsr305: 'com.google.code.findbugs:jsr305:3.0.1',
]
@ -62,7 +62,7 @@ mkdir -p $OUT_DIR
pushd $PYTORCH_DIR
python $PYTORCH_DIR/setup.py clean
ANDROID_ABI=$abi BUILD_PYTORCH_MOBILE=1 VERBOSE=1 ANDROID_DEBUG_SYMBOLS=1 $PYTORCH_DIR/scripts/build_android.sh -DANDROID_CCACHE=$(which ccache)
ANDROID_ABI=$abi VERBOSE=1 ANDROID_DEBUG_SYMBOLS=1 $PYTORCH_DIR/scripts/build_android.sh -DANDROID_CCACHE=$(which ccache)
cp -R $PYTORCH_DIR/build_android/install/lib $OUT_DIR/
cp -R $PYTORCH_DIR/build_android/install/include $OUT_DIR/
@ -97,4 +97,3 @@ find $PYTORCH_ANDROID_DIR -type f -name *apk
find $PYTORCH_ANDROID_DIR -type f -name *apk | xargs echo "To install apk run: $ANDROID_HOME/platform-tools/adb install -r "
popd
@ -1,6 +1,6 @@
ABI_FILTERS=armeabi-v7a,arm64-v8a,x86,x86_64
VERSION_NAME=1.4.0-SNAPSHOT
VERSION_NAME=1.5.0-SNAPSHOT
GROUP=org.pytorch
MAVEN_GROUP=org.pytorch
POM_URL=https://github.com/pytorch/pytorch/tree/master/android
@ -27,3 +27,4 @@ android.useAndroidX=true
android.enableJetifier=true
nativeLibsDoNotStrip=false
testAppAllVariantsEnabled=false
@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.4.1)
project(pytorch_jni CXX)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_VERBOSE_MAKEFILE ON)
set(TRACE_ENABLED OFF)
@ -48,6 +48,14 @@ add_library(pytorch_jni SHARED
${pytorch_android_SOURCES}
)
if (APPLE)
# Need to add rpath so dlopen can find dependencies.
add_custom_command(TARGET pytorch_jni
POST_BUILD COMMAND
${CMAKE_INSTALL_NAME_TOOL} -add_rpath "@loader_path"
$<TARGET_FILE:pytorch_jni>)
endif()
target_compile_options(pytorch_jni PRIVATE
-fexceptions
)
@ -72,8 +80,10 @@ if (ANDROID_ABI)
endfunction(import_static_lib)
import_static_lib(libtorch)
import_static_lib(libtorch_cpu)
import_static_lib(libc10)
import_static_lib(libnnpack)
import_static_lib(libXNNPACK)
import_static_lib(libpytorch_qnnpack)
import_static_lib(libeigen_blas)
import_static_lib(libcpuinfo)
@ -85,9 +95,11 @@ if (ANDROID_ABI)
-Wl,--gc-sections
-Wl,--whole-archive
libtorch
libtorch_cpu
-Wl,--no-whole-archive
libc10
libnnpack
libXNNPACK
libpytorch_qnnpack
libeigen_blas
libcpuinfo
@ -100,8 +112,10 @@ else()
target_link_libraries(pytorch_jni
fbjni
torch
torch_cpu
c10
nnpack
XNNPACK
pytorch_qnnpack
cpuinfo
clog
@ -33,6 +33,11 @@ android {
}
jniLibs.srcDirs = ['src/main/jniLibs']
}
androidTest {
java {
exclude 'org/pytorch/PytorchHostTests.java'
}
}
}
externalNativeBuild {
cmake {
@ -41,11 +46,6 @@ android {
}
packagingOptions {
if (rootProject.isPublishing()) {
exclude '**/libfbjni.so'
} else {
pickFirst '**/libfbjni.so'
}
if (nativeLibsDoNotStrip.toBoolean()) {
doNotStrip "**/*.so"
logger.warn('WARNING: nativeLibsDoNotStrip==true; debug symbols included')
@ -58,10 +58,9 @@ android {
}
dependencies {
api project(':fbjni')
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
implementation 'com.facebook.fbjni:fbjni-java-only:' + rootProject.fbjniJavaOnlyVersion
implementation 'com.android.support:appcompat-v7:' + rootProject.androidSupportAppCompatV7Version
implementation 'com.facebook.soloader:nativeloader:' + rootProject.soLoaderNativeLoaderVersion
testImplementation 'junit:junit:' + rootProject.junitVersion
testImplementation 'androidx.test:core:' + rootProject.coreVersion
@ -0,0 +1,20 @@
#include <torch/jit.h>
#include <torch/script.h>
#include <torch/csrc/jit/api/module.h>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
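// Reads TorchScript source from argv[1], compiles it into a Module, and saves the module to argv[2].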
int main(int argc, char* argv[]) {
std::string input_file_path{argv[1]};
std::string output_file_path{argv[2]};
std::ifstream ifs(input_file_path);
std::stringstream buffer;
buffer << ifs.rdbuf();
torch::jit::Module m("TestModule");
m.define(buffer.str());
m.save(output_file_path);
}
@ -14,7 +14,12 @@ repositories {
sourceSets {
main {
java.srcDir '../src/main/java'
java {
srcDir '../src/main/java'
exclude 'org/pytorch/PyTorchAndroid.java'
exclude 'org/pytorch/LiteModuleLoader.java'
exclude 'org/pytorch/LiteNativePeer.java'
}
}
test {
java {

Some files were not shown because too many files have changed in this diff.