Compare commits


4470 Commits

Author SHA1 Message Date
cb90e36684 Update onnx.rst (#40605)
Correcting the link (current 404s)
2020-06-30 06:46:40 -07:00
54a63e0420 Update header names and add in C++ docs (#27172)
1. Update "Package Reference" to "Python API"
2. Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
3. Add "Other Languages" section and add in C++ docs
2019-10-01 21:52:53 -04:00
8554416a19 delete C_CONTIGUOUS assertions to be compatible with particular builds of numpy 2019-08-08 05:54:09 -07:00
64069120e4 fix install_requires properly 2019-08-08 05:16:06 -07:00
11d7e8d85b Fix regression in triangular_solve when number of batches = 1 for CUDA (#23997)
Changelog:
- When number of batches = 1, dispatch to trsm instead of trsm_batched in MAGMA

Test Plan:
- All triangular_solve tests should pass to ensure that the change is valid
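A hedged sketch of the batch-size-1 case that regressed, written against the modern `torch.linalg.solve_triangular` API (the successor of the `triangular_solve` of this era; not part of the original commit):

```python
import torch

torch.manual_seed(0)
# One batch of a well-conditioned lower-triangular system A x = b,
# i.e. the number-of-batches = 1 case this commit special-cases.
A = torch.randn(1, 3, 3).tril() + 3 * torch.eye(3)
b = torch.randn(1, 3, 2)
x = torch.linalg.solve_triangular(A, b, upper=False)
```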
2019-08-07 20:36:21 -07:00
4d1d843d18 Hotpatch CXXFLAGS to be the same as CFLAGS if CXXFLAGS is not set. 2019-08-07 16:01:08 -07:00
3e8e88b7f4 Use prerendered KaTeX in docs. (#23376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23376

This uses the master version of sphinxcontrib-katex, as it only
recently got prerender support.

Fixes #20984

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16582064

Pulled By: ezyang

fbshipit-source-id: 9ef24c5788c19572515ded2db2e8ebfb7a5ed44d
2019-08-07 15:58:36 -07:00
214dc0244b Adds torch.random to docs/toc 2019-08-07 15:57:29 -07:00
db37022260 [jit] prefix module qualified names with __module__ (#23633)
This is temporary, won't be needed with the new serialization format.
But for now, since the main module gets its name from the archive name,
we need this for safety; otherwise something like
`torch.jit.save("torch.pt")` will break things.

ghstack-source-id: f36febe1025ff04e7f79617e548819d4876dc7fa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23630
2019-08-07 18:55:48 -04:00
a0e9dd3190 [jit] don't try to set training after ScriptModule has been initialized. (#23681)
Now when initializing a ScriptModule during the torch.jit.load()
process, there is already a cpp module backing the thing. That means
that setting training will overwrite whatever the initialized
ScriptModule had.

This PR splits apart the common "set up internal state" part of the
Module __init__ and calls that from ScriptModule.__init__ and
Module.__init__, leaving the "nn.Module-specific" part (setting
`self.training`) for the nn.Module __init__

ghstack-source-id: 9b2ba8a15c43cf230363e4cd10ba4ad3ac4931f7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23680
2019-08-07 18:54:45 -04:00
5e36beca0f [jit] Support nn.GRU and Make nn.LSTM accept PackedSequence (#23700)
* [jit] Support nn.GRU and Make nn.LSTM accept PackedPaddedSequence

* fix

* add link to comments
2019-08-07 18:54:08 -04:00
0d7d1d080d Fix argmax docstring (#23855) 2019-08-07 18:52:54 -04:00
e6f9422aca Fix expansion of stride argument (#23969)
In max_pool2d, max_pool3d, avg_pool2d, avg_pool3d.

There is only one substantive change: when stride.size() == 1,
we expand it to size 2.  However, I also took the opportunity
to give a better error message.

Testing here is bare minimum, because I'm in a hurry.  Just make
sure C++ API with all size 1 inputs works.

This is a squash of four commits.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
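A small check of the expanded-stride behavior, sketched with today's functional API (not from the commit itself):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 8, 8)
# A one-element stride is expanded to both spatial dimensions,
# so stride=[2] behaves the same as stride=(2, 2).
y = F.max_pool2d(x, kernel_size=2, stride=[2])
y2 = F.max_pool2d(x, kernel_size=2, stride=(2, 2))
```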
2019-08-07 16:18:52 -04:00
805556cbb6 [v1.2] [RETRY] Fixed Bool in IsIntegralType bug (plus review comments) (#23955)
* Fixed Bool in IsIntegralType bug

* Added deprecation message

* Resolved PR comments

* Update for review comments.

* Get rid of tab.
2019-08-07 16:10:56 -04:00
7bdc5b6a63 Fix unused imports in torch/onnx/symbolic_opset8.py (#23678)
Summary:
Which causes lint errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23678

Differential Revision: D16622458

Pulled By: mrshenli

fbshipit-source-id: 145ad30dfb452dd556573c1b3d4cdd9cd7852752
2019-08-07 13:09:35 -07:00
5ad90d39ab No need to handle the dependency of INSTALL_TEST on BUILD_TEST in cmake.py (#23806)
Summary:
Simplifying https://github.com/pytorch/pytorch/issues/23793: The dependency relationship between
{INSTALL,BUILD}_TEST is already properly handled in CMakeLists.txt. All
we need to do is to pass down INSTALL_TEST.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23806

Differential Revision: D16691833

Pulled By: soumith

fbshipit-source-id: 7607492b2d82db3f79b174373a92e2810a854a61
2019-08-07 13:03:41 -07:00
8893c32cd8 Upload OS X binaries to pytorch-nightly please...
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2019-08-07 15:56:40 -04:00
cdb032efea [v1.2] Update CosineAnnealingWarmRestarts to follow PyTorch 1.1+ Step Order. (#23833) (#23952)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/23480.

I only verified that the schedule reaches the restart at the expected step as specified in the issue; it would be good to have someone else verify correctness here.

Script:
```
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(torch.optim.SGD([torch.randn(1, requires_grad=True)], lr=0.5), T_0=1, T_mult=2)
for i in range(9):
    print(i)
    print(scheduler.get_lr())
    scheduler.step()
```
Output:
```
0
[0.5]
1
[0.5]
2
[0.25]
3
[0.5]
4
[0.42677669529663687]
5
[0.25]
6
[0.07322330470336313]
7
[0.5]
8
[0.4809698831278217]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23833

Differential Revision: D16657251

Pulled By: gchanan

fbshipit-source-id: 713973cb7cbfc85dc333641cbe9feaf917718eb9
2019-08-07 13:39:24 -04:00
2ad6f427e8 [v1.2] fix torch.frac documentation (#23830) (#23951)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/13968 .

Following the math formula in wiki: https://en.wikipedia.org/wiki/Fractional_part
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23830

Differential Revision: D16656871

Pulled By: ailzhang

fbshipit-source-id: a71467870cf9566e0c7b1a045f72607dada81e1f
2019-08-07 13:39:12 -04:00
060f67723c Updated docs and added deprecation warnings to acknowledge a bool tensor (#22261) (#23811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22261
ghimport-source-id: 1611d62d056a04c0ad15ef662e594a3d206a78e2

Test Plan: Imported from OSS

Differential Revision: D16005990

Pulled By: izdeby

fbshipit-source-id: 2413824aa75a0755719e4df11acd21e6607e5a85
2019-08-07 13:00:03 -04:00
fb50b52949 allow INSTALL_TEST to pass through from env to cmake (#23793)
Summary:
This allows `INSTALL_*` to pass through to cmake.
An additional fix: if `INSTALL_TEST` is specified, it won't use `BUILD_TEST` as the default value for `INSTALL_TEST`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23793

Differential Revision: D16648668

Pulled By: soumith

fbshipit-source-id: 52c2a0d8033bc556355b87a6731a577940de9859
2019-08-05 13:09:06 -04:00
3314b57c33 Changed tensor comparison return type from uint8 to bool (#21113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21113
ghimport-source-id: 9c4ba63457a72bfc41894387e0b01be3fd9a9baf

Test Plan: Imported from OSS

Differential Revision: D15552204

Pulled By: izdeby

fbshipit-source-id: a608213668649d058e22b510d7755cb99e7d0037
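The changed return type can be observed directly (a minimal illustration using current PyTorch, not part of the commit):

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([3, 2, 1])
# Comparison ops now return torch.bool tensors rather than torch.uint8.
mask = a.eq(b)
```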
2019-08-05 07:43:49 -07:00
1ac19ea8ba Fix dataloader._shutdown_workers if not all workers are started (#23762) 2019-08-04 23:28:47 -04:00
93a8dda495 cpu binary builds are built with cu100 docker image now instead of cu80 2019-08-04 21:08:06 -04:00
de276b1d14 add appropriate install_requires 2019-08-04 20:00:30 -04:00
47828fd9fc Document empty_strided (#23740)
Changelog:
- Add doc string for torch.empty_strided
- Remove empty file named `python` in test/
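The newly documented function can be sketched as follows (illustrative values, not from the commit):

```python
import torch

# empty_strided builds an uninitialized tensor with an explicit size
# and stride; here a 2x3 tensor laid out column-major style.
t = torch.empty_strided((2, 3), (1, 2))
```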
2019-08-03 01:09:33 -04:00
f0a65e660c Allowing batching for det/logdet/slogdet operations (#22909) (#23634)
Summary:
Changelog:
- Add batching for det / logdet / slogdet operations
- Update derivative computation to support batched inputs (and consequently batched outputs)
- Update docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22909

Test Plan:
- Add a `test_det_logdet_slogdet_batched` method in `test_torch.py` to test `torch.det`, `torch.logdet` and `torch.slogdet` on batched inputs. This relies on the correctness of `torch.det` on single matrices (tested by `test_det_logdet_slogdet`). A port of this test is added to `test_cuda.py`
- Add autograd tests for batched inputs

Differential Revision: D16580988

Pulled By: ezyang

fbshipit-source-id: b76c87212fbe621f42a847e3b809b5e60cfcdb7a
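The batched behavior added here can be sketched as (a minimal example against current PyTorch, not part of the commit):

```python
import torch

torch.manual_seed(0)
a = torch.randn(4, 3, 3)  # a batch of four 3x3 matrices
d = torch.det(a)          # one determinant per batch element
sign, logabsdet = torch.slogdet(a)
```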
2019-08-02 00:15:52 -04:00
7e327f259b [1.2.0] fix align_corners doc (#23709)
* fix align_corners doc

* Update torch/nn/functional.py

Co-Authored-By: bnehoran <bnehoran@users.noreply.github.com>

* Update torch/nn/functional.py

Co-Authored-By: bnehoran <bnehoran@users.noreply.github.com>
2019-08-02 00:14:50 -04:00
ccc7b5c225 Use dst dir for temp file (#23629) (#23713)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23607
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23629

Differential Revision: D16594223

Pulled By: soumith

fbshipit-source-id: db0275415111f08fc13ab6be00b76737a20f92df
2019-08-02 00:14:32 -04:00
6f5fb78e9e Fix CTC loss for zero-length targets on GPU (#23298) (#23715)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/18215 at last!

Also sprinkle tests...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23298

Differential Revision: D16582145

Pulled By: soumith

fbshipit-source-id: bc8b1a629de0c2606e70a2218ccd135f4a9cdc5d
2019-08-02 00:14:11 -04:00
4f5211691f [jit] Recursive compilation error hot fixes (#23686)
* [jit] Recursive compilation error hot fixes

This is a combination of #23454 and #23682, which are needed for
error reporting on recursively compiled code

* #23682
2019-08-01 18:40:15 -04:00
7a7cfcbf0f add setup metadata to help PyPI flesh out content on pypi package page 2019-08-01 14:36:17 -04:00
129939132b [v1.2.0] Slightly improve dataloader docs on when auto-batching is disabled (#23672)
* Slightly improve dataloader docs on when auto-batching is disabled

* fix typo
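The documented behavior can be sketched as follows (`batch_size=None` disables auto-batching; illustrative values, not from the commit):

```python
import torch
from torch.utils.data import DataLoader

ds = [torch.tensor([i]) for i in range(4)]
# batch_size=None disables automatic batching: each sample is yielded
# individually instead of being collated into a batch.
loader = DataLoader(ds, batch_size=None)
items = list(loader)
```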
2019-08-01 14:26:33 -04:00
bb4ff00f33 fix pin_memory_thread not exiting quickly (#23647) 2019-08-01 11:08:20 -04:00
49e32ffc9f Enable stable docs push to pytorch.github.io:site-v1.2.0 2019-08-01 06:22:00 -07:00
d6df9575f9 Fix regression in torch.qr (#23591) (#23606)
Summary:
Changelog:
- Use narrow instead of narrow_copy while returning
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23591

Test Plan:
- All tests should pass to ensure that the change is correct

Fixes https://github.com/pytorch/pytorch/issues/23580

Differential Revision: D16581174

Pulled By: ezyang

fbshipit-source-id: 1b6bf7d338ddd138ea4c6aa6901834dd202ec79c
2019-07-31 21:04:02 -04:00
c3bad0de3e at::view (#23452) (#23604)
Summary:
at::view accidentally calls clone, but what we want is to create an empty tensor and set its storage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23452
ghstack-source-id: 87438096

Differential Revision: D16442756

fbshipit-source-id: 6d5663f82c9bd4e9de8fc846c52992477843af6a
2019-07-31 20:25:35 -04:00
18cbf11329 Prepare the stable docs build for v1.2.0
This sets up the docs build in dry-run mode. If everything looks okay I
will enable it.
2019-07-31 13:25:47 -07:00
a9dc2a15b7 Refactor the pytorch_doc_push_script to take a branch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23556

Test Plan:
- Run ci

Imported from OSS

Differential Revision: D16563747

Pulled By: zou3519

fbshipit-source-id: 104371b3712c00b073a82e5145090e7bd6fd2d53
2019-07-31 13:18:55 -07:00
e7abff0778 Delete re_worker_requirements 2019-07-30 13:02:20 -04:00
b3a9a7a9b9 Rename gels to lstsq (#23460)
Summary:
Changelog:
- Rename `gels` to `lstsq`
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lstsq` under the name `gels` and add a deprecation warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23460

Test Plan: - All tests should pass to confirm that the patch is correct

Differential Revision: D16547834

Pulled By: colesbury

fbshipit-source-id: b3bdb8f4c5d14c7716c3d9528e40324cc544e496
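For reference, a hedged sketch of a least-squares solve: the `torch.lstsq` introduced by this rename was itself later removed, and the present-day equivalent is `torch.linalg.lstsq` (used below; not part of the commit):

```python
import torch

torch.manual_seed(0)
# Solve the overdetermined system A x = b in the least-squares sense.
A = torch.randn(5, 3)
b = torch.randn(5, 2)
solution = torch.linalg.lstsq(A, b).solution
```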
2019-07-30 09:56:04 -07:00
cfe9400996 Remove preprocessing of CFLAGS, CPPFLAGS, and LDFLAGS in Python scripts. (#23528)
Summary:
After https://github.com/pytorch/pytorch/issues/23455, there is no need of this preprocessing in Python scripts.
They will be automatically processed in CMake (plus CPPFLAGS here
probably meant to be CXXFLAGS).

Reference:

- https://cmake.org/cmake/help/v3.15/envvar/CFLAGS.html
- https://cmake.org/cmake/help/v3.15/envvar/CXXFLAGS.html
- https://cmake.org/cmake/help/v3.15/envvar/LDFLAGS.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23528

Differential Revision: D16561561

Pulled By: ezyang

fbshipit-source-id: 962a27a2b0a18db0f95477ad067a2611e4128187
2019-07-30 08:07:36 -07:00
fd61cc9ebc Moved at::assert_no_internal_overlap to TensorIterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22917

Differential Revision: D16521429

Pulled By: pbelevich

fbshipit-source-id: 80ae583c6486d6948431b79e1452902bdf2cfbc3
2019-07-30 07:47:33 -07:00
4b78ce1ba4 Clean cmake infrastructure up (#23527)
Summary:
Only check for cmake dependencies we directly depend on (e.g., hipsparse but not rocsparse)

Use cmake targets for ROCm where possible.

While there, update the docker CI build infrastructure to only pull in packages by name we directly depend on (anticipating the demise of, e.g., miopengemm). I do not anticipate a docker rebuild to be necessary at this stage as the changes are somewhat cosmetic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23527

Differential Revision: D16561010

Pulled By: ezyang

fbshipit-source-id: 87cd9d8a15a74caf9baca85a3e840e9d19ad5d9f
2019-07-30 07:26:48 -07:00
437a8b3eed Named inference rule for copy_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23229

Test Plan: Imported from OSS

Differential Revision: D16494413

Pulled By: zou3519

fbshipit-source-id: 4acb85e5a4ad09bf5f7cbb84cc8d4ceac0cd9967
2019-07-30 07:17:34 -07:00
16da355b30 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
LARGE: 66
MEDIUM: 649
XLARGE: 1

Updated actions:
From LARGE to MEDIUM: 18
From LARGE to XLARGE: 2
From MEDIUM to LARGE: 20
From XLARGE to LARGE: 1

Differential Revision: D16559356

fbshipit-source-id: a51ef034265649314661ab0e283089a069a20437
2019-07-30 02:53:11 -07:00
4e59055c4d optimize matmul memory usage for certain cases (#23433)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23433

Differential Revision: D16524135

Pulled By: ailzhang

fbshipit-source-id: e7684fec60c9b9db9a09f8ac157b13c8dde1bdd2
2019-07-29 22:35:45 -07:00
7b081e5d1e Improve error message for changing tensor metadata after .data or .detach() (#23504)
Summary:
When a user tries to change the metadata of a tensor created from `.data` or `.detach()`, we currently show an error message "<function_name> is not allowed on Tensor created from .data or .detach()". However, this error message doesn't suggest what the right fix should look like. This PR improves the error message.

Closes https://github.com/pytorch/pytorch/issues/23393.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23504

Differential Revision: D16547415

Pulled By: yf225

fbshipit-source-id: 37f4a0385442e2b0966386fb14d3d938ecf4230c
2019-07-29 22:25:14 -07:00
db1e9b1d6c Fix a few clang warnings.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23524

Differential Revision: D16549562

fbshipit-source-id: 58351fc2858d495b135023626116f6f565c8e9b1
2019-07-29 22:08:50 -07:00
30bc19d751 dictKeys and dictItems ops on typed dicts return typed lists (#23270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23270
ghstack-source-id: 87389530

Differential Revision: D16448942

fbshipit-source-id: e6b578f0e97776112259d7ea38e143e4716ec273
2019-07-29 20:00:34 -07:00
c8817f9436 fix default value for script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23542

Test Plan: Imported from OSS

Differential Revision: D16557122

Pulled By: suo

fbshipit-source-id: c86578aa2c55f44ed5d573d33874a82244df3d09
2019-07-29 19:51:26 -07:00
6314af6e57 Revert D16526027: [jit] Include recursive class compilations in error call stack
Differential Revision:
D16526027

Original commit changeset: 109f2968430d

fbshipit-source-id: c27252540ec6b7da60739eb7dcc8b1650672c226
2019-07-29 19:02:39 -07:00
6574f6167c Revert D16554694: [jit] add a test for inline tracing
Differential Revision:
D16554694

Original commit changeset: 0fae4458f18c

fbshipit-source-id: 08aa0c292fa5b2dbdd0d1f0e59f531416edef760
2019-07-29 18:57:06 -07:00
265b498de2 add a test for inline tracing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23535

Test Plan: Imported from OSS

Differential Revision: D16554694

Pulled By: suo

fbshipit-source-id: 0fae4458f18c06ffbd484905ad7836dce9ce69cc
2019-07-29 18:05:15 -07:00
52b95fd4be Include recursive class compilations in error call stack (#23454)
Summary:
Previously these were left out which would lead to confusing messages,
now it looks something like:

```
torch.jit.frontend.UnsupportedNodeError: import statements aren't
supported
:
at ../test.py:13:9
    def bad_fn(self):
        import pdb
        ~~~~~~ <--- HERE
'__torch__.X' is being compiled since it was called from 'fn'
at ../test.py:16:12
def fn(x):
    return X(10)
           ~~~~ <--- HERE
```

Fixes #23453
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23454

Pulled By: driazati

Differential Revision: D16526027

fbshipit-source-id: 109f2968430dbf51ee91b1b3409badfd557d19a4
2019-07-29 18:00:05 -07:00
696642ae8d Change docs to use recursive script API (#21612)
Summary:
Use the recursive script API in the existing docs

TODO:
* Migration guide for 1.1 -> 1.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21612

Pulled By: driazati

Differential Revision: D16553734

fbshipit-source-id: fb6be81a950224390bd5d19b9b3de2d97b3dc515
2019-07-29 17:51:22 -07:00
bfee46f8e2 Update argument list for non-fbgemm path for qconv_prepack (#23521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23521

The non-fbgemm path should have the same arguments as the fbgemm path.

Reviewed By: jianyuh

Differential Revision: D16547637

fbshipit-source-id: bb00d725fb968cbee32defb8facd2799a7e79bb4
2019-07-29 17:41:11 -07:00
65a89472c4 Put all modules in the global Python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23154

Test Plan: Imported from OSS

Differential Revision: D16441913

Pulled By: suo

fbshipit-source-id: a79f2c3e06a33cbd79b2e3333f16c069f356f451
2019-07-29 16:38:20 -07:00
e366af7d87 Add TORCH_CHECK to disable sub for bool tensors (#23519)
Summary:
This resolves two issues in one shot:

- sub shouldn't be available for bool type.
- When sub is applied to an unsupported type, the current error message
  shows "add_cpu/add_cuda is not implemented for [type]". It should be
  "sub_cpu/sub_cuda" instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23519

Differential Revision: D16548770

Pulled By: izdeby

fbshipit-source-id: fe404a2a97b8d11bd180ec41364bf8e68414fb15
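The disallowed case can be checked as follows (a minimal sketch against current PyTorch, not part of the commit):

```python
import torch

a = torch.tensor([True, False])
b = torch.tensor([True, True])
try:
    a - b  # subtraction is not defined for bool tensors
    raised = False
except RuntimeError:
    raised = True
```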
2019-07-29 16:28:35 -07:00
3c986dff77 introduce auto_set to simplify benchmarking the backward path of operators (#23276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23276

This diff introduces a new feature to simplify benchmarking the backward path of ops. Here is an example:

```
...
self.input_one = torch.rand(M, N, K, requires_grad=self.auto_set())
self.input_two = torch.rand(M, N, K, requires_grad=self.auto_set())
...
```

In this way, the benchmark will generate three different test cases.
1. input_one requires grad
2. input_two requires grad
3. both inputs require grad

Here is a sample output:
```
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwdall
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 863.744

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd1
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 727.915

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N8_K8_bwd2
# Input: M: 1, N: 8, K: 8
Backward Execution Time (us) : 687.626
```

Reviewed By: zheng-xq

Differential Revision: D16450355

fbshipit-source-id: 50ae0916e81c3ff9f0c482ed6d386319eb15b305
2019-07-29 15:58:41 -07:00
41dfe7204b Threading and CPU Inference note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23417

Test Plan:
cd docs; make html

Imported from OSS

Differential Revision: D16523781

Pulled By: ilia-cher

fbshipit-source-id: d6c09e8a85d39e6185bbdc4b312fea44fcdfff06
2019-07-29 15:45:49 -07:00
f4eb93f7bc Support pack_padded_sequence and pad_packed_sequence
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23249

Test Plan: Imported from OSS

Differential Revision: D16466587

Pulled By: wanchaol

fbshipit-source-id: a721da01b2da0ef90cac80b77f1285102e3b1118
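The eager-mode API being made scriptable here looks like this (illustrative shapes, not from the commit):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
x = torch.randn(4, 2, 8)        # (seq_len, batch, features)
lengths = torch.tensor([4, 2])  # per-sequence lengths, sorted descending
packed = pack_padded_sequence(x, lengths)
out, out_lengths = pad_packed_sequence(packed)
```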
2019-07-29 15:36:47 -07:00
c384fbf4c8 support torch._C._get_tracing_state in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23248

Test Plan: Imported from OSS

Differential Revision: D16466588

Pulled By: wanchaol

fbshipit-source-id: 3c3d5dec2cea2f9cb080eadaef457cc62ac3fbe0
2019-07-29 15:05:50 -07:00
e1f8985973 Specify onnxruntime version to install for CI tests (#23517)
Summary:
No real change on the CI since currently the default latest is 0.4.0. houseroad bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23517

Differential Revision: D16550375

Pulled By: bddppq

fbshipit-source-id: a669b8af678c79c4d6909300b28458fe6b7cd30c
2019-07-29 14:58:15 -07:00
c779eff579 support torch.as_tensor in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23247

Test Plan: Imported from OSS

Differential Revision: D16466590

Pulled By: wanchaol

fbshipit-source-id: cf52721eacd177d9040564790382db13a9fcc2fe
2019-07-29 14:38:22 -07:00
3a568c9a2b CI: install clang-tidy (#23518)
Summary:
Install clang-tidy (from LLVM 8) for the `clang_tidy` job.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23518

Differential Revision: D16549621

Pulled By: ezyang

fbshipit-source-id: b1d20641380cdfdb0589249770b98163528fa69f
2019-07-29 14:28:26 -07:00
a8edc2b5d2 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22926

Differential Revision: D16546369

Pulled By: colesbury

fbshipit-source-id: 56f7ef4476e586dee19366fdb720085d1c2f2027
2019-07-29 13:47:05 -07:00
9219a37c12 avoid Include the same header file twice (#23418)
Summary:
Avoid including the same header file twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23418

Differential Revision: D16546422

Pulled By: colesbury

fbshipit-source-id: 5cd868cce73d9199ced9b6f2f6f57bf42e5a5d5b
2019-07-29 13:34:11 -07:00
dec4eacae4 fix fbcode weak ordering (#23511)
Summary:
There is an internal fbcode assert that fails if I do not add these checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23511

Differential Revision: D16545606

Pulled By: eellison

fbshipit-source-id: cd3a799850bae8f052f9d81c1e4a2678fda19317
2019-07-29 13:14:39 -07:00
0c9979dd7d Fix TestCuda.test_events_wait (#23520)
Summary:
PyTorch test sets a policy() method to assertLeaksNoCudaTensors.
Whenever a test is run, assertLeaksNoCudaTensors is called,
which in turn calls CudaMemoryLeakCheck, which in turn calls
initialize_cuda_context_rng, where it executes torch.randn
on each device, where a kernel is launched on each device.

Since the kernel may not finish on device 1, the assertion
self.assertTrue(s1.query()) fails.

The fix is to insert

        torch.cuda.synchronize(d0)
        torch.cuda.synchronize(d1)

at the beginning of the test so that previously launched kernels finish before the real
test begins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23520

Differential Revision: D16547701

Pulled By: soumith

fbshipit-source-id: 42ad369f909d534e15555493d08e9bb99dd64b6a
2019-07-29 13:09:41 -07:00
e982e46de3 Add multiprocessing_context= argument to DataLoader (#22990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22131
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22990

Differential Revision: D16539052

Pulled By: colesbury

fbshipit-source-id: b1c48ae2fb54065dd96a67be263254129e02eaa2
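The new argument can be sketched as follows (construction only; illustrative values, not from the commit):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(8.0))
# The context string is validated at construction time; worker
# processes are only started once iteration begins.
loader = DataLoader(ds, batch_size=4, num_workers=1,
                    multiprocessing_context="spawn")
```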
2019-07-29 12:58:40 -07:00
56664c2c65 Untap caskroom/homebrew-cask in attempt to unbreak OS X builds.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23514

Test Plan: Imported from OSS

Differential Revision: D16546841

Pulled By: ezyang

fbshipit-source-id: 96d2e988cb0dddfeec0174875761dfa26f25a8c1
2019-07-29 12:45:01 -07:00
31f1928096 add sorting policy to ChunkDataset (#23053)
Summary:
Add a sorting policy to ChunkDataset.

This is considered an advanced parameter for developers who want to apply a 'sorting policy' to the chunk data before sampling into minibatch.

Different than the collate method, this policy is applied on the chunk level instead of minibatch level. When a chunk of data is loaded (multiple chunks if cross_chunk_shuffle_count_ is greater than 1), this policy is targeting to the full loaded data. It will be useful if developers want to perform some pre-processing (like bucketing) to the chunk data before example sampler samples the data.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23053

Differential Revision: D16537692

Pulled By: colesbury

fbshipit-source-id: cd21ed40ab787a18b8c6dd304e5b806a7a45e6ba
2019-07-29 12:34:02 -07:00
a356276d79 add note to Contribution Guide around recently released research (#23513)
Summary:
Thanks adefazio for the feedback; adding a note to the Contribution Guide so that folks don't start working on code without checking with the maintainers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23513

Differential Revision: D16546685

Pulled By: soumith

fbshipit-source-id: 1ee8ade963703c88374aedecb8c9e5ed39d7722d
2019-07-29 12:24:59 -07:00
06762b4721 Fix distributions.Categorical.sample bug from .view() (#23328)
Summary:
This modernizes distributions code by replacing a few uses of `.contiguous().view()` with `.reshape()`, fixing a sample bug in the `Categorical` distribution.

The bug is exercised by the following test:
```py
batch_shape = (1, 2, 1, 3, 1)
sample_shape = (4,)
cardinality = 2
logits = torch.randn(batch_shape + (cardinality,))
dist.Categorical(logits=logits).sample(sample_shape)
# RuntimeError: invalid argument 2: view size is not compatible with
#   input tensor's size and stride (at least one dimension spans across
#   two contiguous subspaces). Call .contiguous() before .view().
#   at ../aten/src/TH/generic/THTensor.cpp:203
```
I have verified this works locally, but I have not added this as a regression test because it is unlikely to regress (the code is now simpler).
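The fixed behavior can be sketched against current PyTorch as follows (the shapes mirror the test above; not part of the commit):

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)
batch_shape = (1, 2, 1, 3, 1)
logits = torch.randn(batch_shape + (2,))  # trailing dim = cardinality
# Sampling now succeeds; result shape is sample_shape + batch_shape.
s = Categorical(logits=logits).sample((4,))
```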
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23328

Differential Revision: D16510678

Pulled By: colesbury

fbshipit-source-id: c125c1a37d21d185132e8e8b65241c86ad8ad04b
2019-07-29 12:09:50 -07:00
be644d822b fixes #20178 (#23297)
Summary:
fixes https://github.com/pytorch/pytorch/issues/20178
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23297

Differential Revision: D16497552

Pulled By: VitalyFedyunin

fbshipit-source-id: 386933b15c27d02351f042be71b153bc9439004d
2019-07-29 12:04:44 -07:00
236149edc5 Make randperm works properly on non-contiguous tensors. (#23043)
Summary:
Close https://github.com/pytorch/pytorch/issues/22710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23043

Differential Revision: D16446340

Pulled By: VitalyFedyunin

fbshipit-source-id: 1760af310fee71b369e1aaaf96546277058611c9
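The fixed case can be sketched as follows (a minimal example with an assumed non-contiguous `out` view; not part of the commit):

```python
import torch

torch.manual_seed(0)
# A non-contiguous view as the output tensor: every other element
# of a (10, 2) buffer.
out = torch.empty(10, 2, dtype=torch.long)[:, 0]
torch.randperm(10, out=out)
```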
2019-07-29 11:59:04 -07:00
d6ff78fd00 fix an over-indented return in trace_module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23358

Differential Revision: D16519010

Pulled By: Krovatkin

fbshipit-source-id: a7e4225b70e915d91c74874e3eca9bcb87baf84c
2019-07-29 11:15:55 -07:00
505fa83b2f Implement named inference rule for mul
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23193

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23193 gh/zou3519/75/head

Imported from OSS

Differential Revision: D16494401

Pulled By: zou3519

fbshipit-source-id: 0e2395d7de39158ec51feed5da0389715ec52600
2019-07-29 09:58:18 -07:00
d3fcb4ccd3 Try another version of apt/dpkg killing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23499

Test Plan: Imported from OSS

Differential Revision: D16542875

Pulled By: ezyang

fbshipit-source-id: 05aa97f2d61e4fc00a819768448944f85701cab8
2019-07-29 09:13:24 -07:00
8ada7c9920 Remove two CMAKE_ build options from additional_options. (#23451)
Summary:
Following up 915261c8bef85e3b845a0384d3fb55e55707b609
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23451

Differential Revision: D16542303

Pulled By: ezyang

fbshipit-source-id: 1406c311c198eb237f85d6d8f1f0d58626be8257
2019-07-29 08:13:59 -07:00
09ba4df031 Whether MKLDNN should be built under native arch should respect USE_NATIVE_ARCH (#23445)
Summary:
Currently there is no way to build MKLDNN with optimizations beyond SSE4. This commit makes the MKLDNN build respect USE_NATIVE_ARCH.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23445

Differential Revision: D16542275

Pulled By: ezyang

fbshipit-source-id: 550976531d6a52db9128c0e3d4589a33715feee2
2019-07-29 08:13:56 -07:00
b335f3910f Remove redundant MSVC_Z7_OVERRIDE processing and combine "/EHa" flag setup (#23455)
Summary:
- MSVC_Z7_OVERRIDE is already handled in CMakeLists.txt. No need to process it once more in the Python scripts.
- Option MSVC_Z7_OVERRIDE should be visible to the user only if MSVC is used.
- Move the setting of "/EHa" flag to CMakeLists.txt, where other MSVC-specific flags are processed. This also further prepares the removal of redundant cflags setup in Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23455

Differential Revision: D16542274

Pulled By: ezyang

fbshipit-source-id: 4d3b8b07161478bbba8a21feb6ea24c9024e21ac
2019-07-29 08:08:47 -07:00
81e46d4f78 Fix build issue. CUDA may be installed in $CUDA_HOME/lib on macOS. (#23491)
Summary:
Closes gh-16955.
Closes https://github.com/pytorch/vision/issues/977

On Linux both `lib64` and `lib` may be present (symlinked). The reports
seem to all be about macOS, but it seems like this is also possibly more
robust on Linux and can't hurt. So not treating platforms differently.

Note that Eigen has a similar check in its CMake:

```
if(CUDA_64_BIT_DEVICE_CODE AND (EXISTS "${CUDA_TOOLKIT_ROOT_DIR}/lib64"))
  link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib64")
else()
  link_directories("${CUDA_TOOLKIT_ROOT_DIR}/lib")
endif()
 ```

There may be other issues for building from source on macOS, can't test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23491

Differential Revision: D16538973

Pulled By: soumith

fbshipit-source-id: cc309347b7d16e718e06878d3824d0a6e40b1019
2019-07-29 08:08:43 -07:00
97f129bf0a Let set_rng_state and get_rng_state accept string parameter (#23448)
Summary:
Currently set_rng_state and get_rng_state do not accept strings as their parameters. This commit lets them accept strings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23448

Differential Revision: D16527172

Pulled By: soumith

fbshipit-source-id: 8f9a2129979706e16877cc110f104770fbbe952c
2019-07-29 08:08:39 -07:00
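A minimal sketch of the argument widening, assuming a hypothetical `normalize_device` helper (not PyTorch's actual code) that turns a string such as `"cuda:1"` into a device index alongside plain ints:

```python
def normalize_device(device):
    # Hypothetical helper mirroring the intent of the patch: accept a
    # device string like "cuda:1" in addition to an int index.
    # torch.device objects are omitted from this sketch.
    if isinstance(device, str):
        _, _, idx = device.partition(":")
        return int(idx) if idx else 0  # bare "cuda" -> current device (0 here)
    if isinstance(device, int):
        return device
    raise TypeError("unsupported device spec: %r" % (device,))
```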
7a82066282 Update PyTorch Docker image to 323. (#23389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23389

Test Plan: Imported from OSS

Differential Revision: D16541971

Pulled By: ezyang

fbshipit-source-id: 2b7e483f4d6eedef7f5c140ffc0fac21fecd179b
2019-07-29 07:29:54 -07:00
f546a3b8d8 fixing documentation, issue 22697 (#23268)
Summary:
As fmassa commented :

> Agree, it should probably be weight, start, end
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23268

Differential Revision: D16493403

Pulled By: zou3519

fbshipit-source-id: 51ed07f6f7abdbd41dc323570aed41d804fa9c1b
2019-07-29 07:24:49 -07:00
19858f7cc6 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 981
LARGE: 56

Updated actions:
From MEDIUM to LARGE: 10
From LARGE to MEDIUM: 3
From LARGE to XLARGE: 1

Differential Revision: D16532427

fbshipit-source-id: c58bf59e6c571627b3994f8cdfa79758fb85892b
2019-07-29 04:37:23 -07:00
91d28026f8 Remove unused cuBLAS driver functions for getrs (#23375)
Summary:
Changelog:
- Remove getrs driver functions from THCBlas{.h/.cpp}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23375

Test Plan: - Build to pass to confirm no callsites were missed.

Differential Revision: D16539079

Pulled By: soumith

fbshipit-source-id: b5c285a2d36714ddf3393337eec7d85b1eaf3f51
2019-07-28 21:29:18 -07:00
54c280863c Add some compiler flags for building cpp extensions on Windows (#23472)
Summary:
(1) Add `COMMON_MSVC_FLAGS` to the flags in the ninja codepath
(2) Add `/EHsc` to `COMMON_MSVC_FLAG`
(3) Remove `-fPIC` and `-std=c++11` from the flags in the windows codepath
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23472

Differential Revision: D16532993

Pulled By: soumith

fbshipit-source-id: bc2d983f5f8b4eae9c7385bf170f155679e92e87
2019-07-28 20:33:18 -07:00
ef6356133e Revert D16428208: [pytorch][PR] only scatter in forward if multi-device per process
Differential Revision:
D16428208

Original commit changeset: eaa3876b2b95

fbshipit-source-id: 9db3bc86bf419dd06fdaaff434f72b92ecb5a427
2019-07-27 22:41:20 -07:00
64e4152064 Clarify that torch.device without an index will always represent the current device (#23468)
Summary:
Per discussion in https://github.com/pytorch/pytorch/issues/23448
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23468

Differential Revision: D16532950

Pulled By: soumith

fbshipit-source-id: 48c97060aaf55f1d7589afab42c6cd623d71a9a7
2019-07-27 06:49:52 -07:00
ffef0e03b7 Enabling GPU device runs for operators (#23461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23461

Enabling GPU device runs for production operator shapes.

Reviewed By: xw285cornell, mingzhe09088

Differential Revision: D16526928

fbshipit-source-id: 46657963f4b0bc43d14205ccf1b63d588657e388
2019-07-26 18:53:40 -07:00
23e526e6ff Fix SourceRange comparison
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23341

Test Plan: Imported from OSS

Differential Revision: D16505398

Pulled By: jamesr66a

fbshipit-source-id: 0bf6a1a054c7749c0a3334654d5746dd9f5dee96
2019-07-26 18:08:43 -07:00
3497891c14 add sorted keyword for lists and dicts (#23274)
Summary:
Add the `sorted` keyword to the JIT for lists and dicts. This desugars to a list copy and a call to `list.sort()`. Since we don't have interfaces yet, I implement it in terms of `list.sort()`. When we do, we can revisit implementing this op in a different manner.

The test fails because of a fix to specialized lists that is landing here: https://github.com/pytorch/pytorch/pull/23267

Ignore the first commit because it is formatting, plz use clang_format ppl :'(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23274

Differential Revision: D16527323

Pulled By: eellison

fbshipit-source-id: aed8faef23cb790b9af036cd6c1b9b1d7066345d
2019-07-26 17:44:15 -07:00
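The desugaring described above, sketched in plain Python (roughly what the JIT emits; not the actual generated IR):

```python
def desugared_sorted(xs):
    # `sorted(xs)` becomes: copy the list, then sort the copy in place,
    # leaving the original untouched.
    out = list(xs)
    out.sort()
    return out
```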
f0ebf769de allow accepting empty input to the benchmark (#23462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23462

as title

Reviewed By: hl475

Differential Revision: D16527176

fbshipit-source-id: 7a8ff4f3c6122ae7b3205e0b446fec06fd95eedc
2019-07-26 17:30:42 -07:00
522cca5040 Support IntList in Dict's shallowEquals (#23205)
Summary:
Required when comparing default values of IntList type

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23205
ghstack-source-id: 87208341

Reviewed By: zrphercule

Differential Revision: D16433809

fbshipit-source-id: 3f60d67d708129be31198161423d819108468077
2019-07-26 17:30:38 -07:00
d6d7a5f075 only scatter in forward if multi-device per process (#22384)
Summary:
Scatter is unnecessary if only using one device, and it breaks on some custom data structures like namedtuple, so we'd like to avoid it :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22384

Differential Revision: D16428208

Pulled By: soumith

fbshipit-source-id: eaa3876b2b95c1006ccaaacdb62f54c5280e730c
2019-07-26 17:30:34 -07:00
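The fast path this PR adds can be sketched as follows; `inputs_per_device` and `scatter` are hypothetical names standing in for the real DataParallel routines:

```python
def inputs_per_device(inputs, device_ids, scatter):
    # With a single device there is nothing to split, so pass the inputs
    # through untouched -- which also keeps custom structures like
    # namedtuples intact. `scatter` stands in for the real
    # chunk-and-copy-to-devices routine.
    if len(device_ids) == 1:
        return [inputs]
    return scatter(inputs, device_ids)
```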
e1ae3a75c8 gate module::save logic on mobile (#23415)
Summary:
This is part of the effort to shrink OSS libtorch mobile build size.
We shouldn't need the Module::save function on mobile: it depends on
csrc/jit/export.cpp, which in turn depends on ONNX. By gating these two
methods we can avoid those dependencies in the libtorch mobile build.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23415
ghstack-source-id: 87288228

Reviewed By: dreiss

Differential Revision: D16511143

fbshipit-source-id: fd031f91fcf9b7be54cbe1436506965af94ab537
2019-07-26 17:23:38 -07:00
23f963e4a8 Update distributed.rst (#23289)
Summary:
Different backend is supported since https://github.com/pytorch/pytorch/pull/18595
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23289

Differential Revision: D16528229

Pulled By: soumith

fbshipit-source-id: 57753e84c015817661ba30835278ee3a899aa2d0
2019-07-26 16:55:52 -07:00
ca76c82ce3 Add early returns to JIT (#19179)
Summary:
Add early returns to JIT with minimal changes to compiler.cpp and an IR->IR pass that will transform the graph so that there is only one return value.

In compiler.cpp, record when a block will exit so that the following example will work:
```
if cond:
    a = torch.zeros([2])
else:
    return 2
a += 2
...
```
To match block outputs with values that will not be used, like in the above example with `a`, I add a Bottom Type that subtypes everything else. This allows shape propagation to continue to work, and makes it so that we don't need many extra nodes filling up the graph.

The IR transform currently doesn't work on loops. I didn't add that to this PR to avoid too much complexity, but will add it in a stacked PR (it should be very little extra code). The IR transform is commented at the top of the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19179

Differential Revision: D16519819

Pulled By: eellison

fbshipit-source-id: 322a27f69966d1fd074ebe723c3e948b458b0e68
2019-07-26 16:42:43 -07:00
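A plain-Python sketch of what the single-return IR->IR transform does to the example above (names are hypothetical; the real pass rewrites graph IR, not Python):

```python
def f(cond):
    # early-return form, as written by the user
    if cond:
        a = [0.0, 0.0]
    else:
        return 2
    return [x + 2 for x in a]

def f_single_exit(cond):
    # roughly what the transform produces: one return value, with code
    # after the if gated on whether the block already exited
    did_exit, ret = False, None
    if cond:
        a = [0.0, 0.0]
    else:
        did_exit, ret = True, 2
    if not did_exit:
        ret = [x + 2 for x in a]
    return ret
```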
9223fa1c46 Add support to serialize qtensor in JIT. (#23356)
Summary:
Adds qtensor specific fields to the proto file so that they get serialized into the model.json

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428

Differential Revision: D16473237

fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
2019-07-26 15:52:15 -07:00
9dad13e1f0 Revert "Add fbgemm_qlinear_dynamic op (#23104)"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23449

Test Plan: Imported from OSS

Differential Revision: D16524768

Pulled By: ezyang

fbshipit-source-id: 9eb01b021011d1172317b5adb774c10c42ac2b86
2019-07-26 15:02:33 -07:00
953459f29e Dont allow conversions with QInt.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22931

Test Plan: Imported from OSS

Differential Revision: D16467985

Pulled By: gchanan

fbshipit-source-id: 3925fc96a641e66b92fa65c542a2a23190c915a5
2019-07-26 14:45:14 -07:00
190d255d2e Add FLOAT_MODULE to quantized conv (#23414)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23414
ghstack-source-id: 87225586

Differential Revision: D16511055

fbshipit-source-id: c617733f60cfe38f4791e35e57e9551f2b5d8c09
2019-07-26 14:02:20 -07:00
83d6c6be07 ONNX export for index_select (#21866)
Summary:
ONNX export for index_select
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21866

Reviewed By: zrphercule

Differential Revision: D16471345

Pulled By: houseroad

fbshipit-source-id: 745c23ba8a3223b5ec59b924df7358a36a92518c
2019-07-26 13:56:15 -07:00
74f8094ea5 Rename threading build options
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23407

Test Plan:
USE_CUDA=0 ATEN_THREADING=TBB USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB
BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python
setup.py develop install --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D16522538

Pulled By: ilia-cher

fbshipit-source-id: 75c4761d93a7f5936f28e4c5eedcd27d8490d0c5
2019-07-26 13:09:14 -07:00
aae48748f2 Avoid unnecessary tensor clone in Cloneable (#20995)
Summary:
As pointed out by SsnL in https://github.com/pytorch/pytorch/issues/20910, when the clone destination is different from the module's device,
`Cloneable` currently calls `clone()` and then `to()` on every parameter and buffer, where the first clone is unnecessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20995

Differential Revision: D15517353

Pulled By: mrshenli

fbshipit-source-id: 6b6dc01560540a63845663f863dea0a948021fa5
2019-07-26 12:46:42 -07:00
53182e53f0 fix observer name in the benchmark output (#23443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23443

as title

Reviewed By: hl475

Differential Revision: D16520962

fbshipit-source-id: 7a0ccbece487837c204f242d2a5c6f69b32cbc8c
2019-07-26 12:20:41 -07:00
828c08b4c7 allow passing a list of operators to benchmark (#23442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23442

Replace the argument name from `operator` to `operators` which can take a list of operators to test.

Reviewed By: hl475

Differential Revision: D16520779

fbshipit-source-id: 94284a87c64471793e319f5bd3143f89b9a192bb
2019-07-26 12:20:36 -07:00
0bc90194fb Catch and print exception traceback in parallel_apply() workers (#18055)
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.

This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.

Before:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
    raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

After:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
    ''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
  ...
  File "../models/foo.py", line 319, in bar
    baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055

Differential Revision: D16444972

Pulled By: zhangguanheng66

fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
2019-07-26 11:41:22 -07:00
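The mechanism described above, as a minimal standalone sketch: each worker stores `sys.exc_info()` instead of the bare exception, so the main thread can embed the original traceback in the re-raised error's message. `run_replicas` is a hypothetical stand-in for `parallel_apply()`:

```python
import sys
import threading
import traceback

def run_replicas(fns):
    results = {}

    def worker(i, fn):
        try:
            results[i] = fn()
        except Exception:
            results[i] = sys.exc_info()  # save type, value, and traceback

    threads = [threading.Thread(target=worker, args=(i, fn))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    outputs = []
    for i in range(len(fns)):
        out = results[i]
        if isinstance(out, tuple) and len(out) == 3 and isinstance(out[1], Exception):
            exc_type, exc_value, tb = out
            msg = ("Caught exception in replica %d. Original traceback and message:\n%s"
                   % (i, "".join(traceback.format_exception(exc_type, exc_value, tb))))
            raise exc_type(msg)  # same exception type, augmented message
        outputs.append(out)
    return outputs
```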
7499fe72e9 remove c2 tests from benchmark_all_test (#23437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23437

as title

Reviewed By: hl475

Differential Revision: D16519770

fbshipit-source-id: 63fc269e18c264d399e25f44b03f81fc3ae01113
2019-07-26 11:12:53 -07:00
e5e2face8f Change handling of DataParallel in ONNX exporter (#23365)
Summary:
Don't automatically unwrap the top-layer DataParallel for users. Instead, we provide useful error information and tell users what action to take.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23365

Reviewed By: zrphercule

Differential Revision: D16514273

Pulled By: houseroad

fbshipit-source-id: f552de5c53fb44807e9d9ad62126c98873ed106e
2019-07-26 11:12:49 -07:00
c8c5e11fba Support variadic returns in Schema's operator<< (#23204)
Summary:
old: prim::PythonOp(...) ->
new: prim::PythonOp(...) -> ...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23204
ghstack-source-id: 87208343

Reviewed By: zrphercule

Differential Revision: D16433592

fbshipit-source-id: 36cbb329188f112e09c3b1708a8090781b830dfe
2019-07-26 10:58:00 -07:00
34f53564b4 Don't warn when using conda compilers with utils.cpp_extension (#23396)
Summary:
The conda compilers are gcc/c++ 7.3.0, but have custom version strings
for clarity:

    x86_64-conda_cos6-linux-gnu-cc
    x86_64-conda_cos6-linux-gnu-c++

Using these compilers to build a C++ or CUDA extension now gives this warning (unnecessarily):

```
                               !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (/home/rgommers/anaconda3/envs/pytorch-nightly/bin/x86_64-conda_cos6-linux-gnu-c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux.
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23396

Differential Revision: D16500637

Pulled By: soumith

fbshipit-source-id: 5b2fc3593e22e9a7d07dc2c0456dbb4934ffddb2
2019-07-26 10:17:14 -07:00
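A hedged sketch of the relaxed check: treat conda's renamed toolchain binaries as g++-compatible instead of warning. The helper name and exact matching rule are assumptions for illustration; the real logic lives in `torch.utils.cpp_extension`:

```python
import re

def compiler_is_gxx_like(compiler_path):
    # Accept plain g++/gcc as well as prefixed aliases such as
    # "x86_64-conda_cos6-linux-gnu-c++". Sketch only.
    name = compiler_path.rsplit("/", 1)[-1]
    if name in ("g++", "gcc", "cc", "c++"):
        return True
    return re.search(r"-(g\+\+|gcc|cc|c\+\+)$", name) is not None
```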
47a54295ee Add fbgemm_qlinear_dynamic op (#23104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23104

ghstack-source-id: 87247148

As suggested in https://github.com/pytorch/pytorch/pull/22891, we will add an overload for ```torch.fbgemm_linear_int8_weight``` (the dynamically quantized version of the linear function) that takes PackedLinearWeight as input and has pretty much the same signature as the regular aten::linear.

Differential Revision: D16381552

fbshipit-source-id: 1ccc4174fd02c546eee328940ac4b0da48fc85e8
2019-07-26 10:11:56 -07:00
b7984fa8a7 Remove cases of AT_FORALL_SCALAR_TYPES_EXCEPT_HALF.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22890

Test Plan: Imported from OSS

Differential Revision: D16467980

Pulled By: gchanan

fbshipit-source-id: 93ddbd041b7f65cafe8520b095289f14ad6d667f
2019-07-26 09:58:35 -07:00
0dcb8755c8 Implement tensor.set_names_, tensor.names setter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23172

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23172 gh/zou3519/74/head

Imported from OSS

Differential Revision: D16494364

Pulled By: zou3519

fbshipit-source-id: 8d0e26b33346d4eadba30b2e76610f6d7be7c373
2019-07-26 08:50:49 -07:00
c8a50a26d2 Named inference rule for torch.prod
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23106

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D16419175

Pulled By: zou3519

fbshipit-source-id: beb9ef838525c1ea7d7839cb9b8d68028fb4917f
2019-07-26 08:50:45 -07:00
9817d7e16b Implement named inference rule for torch.sum
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23081

Test Plan:
- New tests [namedtensor ci]

Imported from OSS

Differential Revision: D16419174

Pulled By: zou3519

fbshipit-source-id: 8679f77f121664d0398d7f062a53c0fa37482481
2019-07-26 08:50:40 -07:00
4104e80eae qconv+relu and qlinear+relu modules (#23410)
Summary:
adding qconv+relu and qlinear+relu modules in nn/_intrinsic/quantized
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23410

Test Plan:
Extended tests to test these new modules as well

buck test mode/dev caffe2/test:quantized -- 'test_linear_api'  --print-passing-details
```
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799820197379
      ✓ caffe2/test:quantized - test_linear_api (test_nn_quantized.ModuleAPITest) 4.055 1/1 (passed)
Test output:
> test_linear_api (test_nn_quantized.ModuleAPITest)
> test API functionality for nn.quantized.linear and nn._intrinsic.quantized.linear_relu ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 4.056s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/2251799820197379
Summary (total time 10.66s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

buck test mode/dev caffe2/test:quantized -- 'test_conv_api'  --print-passing-details
```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074607089664
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.QuantizedConvTest) 5.195 1/2 (passed)
Test output:
> test_conv_api (test_quantized_conv.QuantizedConvTest)
> Tests the correctness of the conv functional. ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 5.195s
>
> OK
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 10.616 2/2 (passed)
Test output:
> test_conv_api (test_nn_quantized.ModuleAPITest)
> Tests the correctness of the conv module. ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 10.616s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074607089664
Summary (total time 17.31s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16505333

Pulled By: dskhudia

fbshipit-source-id: 04f45cd0e76dc55f4694d558b913ab2958b7d727
2019-07-26 08:50:36 -07:00
fb8725fdbd Update doc about subsequent builds: options can be changed in build/CMakeCache.txt
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23331

Test Plan: Imported from OSS

Differential Revision: D16517622

Pulled By: ezyang

fbshipit-source-id: d2d15b8bb2599b3b9abb7a1aa6bc91bfc0d8e5d0
2019-07-26 08:50:32 -07:00
0b4c0b95e9 For second-time build, let build_type be inferred from CMakeCache.txt.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23323

Test Plan: Imported from OSS

Differential Revision: D16517621

Pulled By: ezyang

fbshipit-source-id: 22984df214d01246a7868980e148936698940ea8
2019-07-26 08:50:28 -07:00
beb109f6f1 Enable cpp api test in multigpu-test.sh (#23380)
Summary:
blocking https://github.com/pytorch/pytorch/issues/20995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23380

Differential Revision: D16517013

Pulled By: mrshenli

fbshipit-source-id: 3f44ecf0e8d1e235165f2ce4396795ca38e2d837
2019-07-26 07:44:07 -07:00
46224ef89e Update ONNX docs (#23185)
Summary:
This is still work in progress.

There are several more items to add to complete this doc, including

- [x] LHS indexing, index assignments.
- [x] Tensor List.
- [x] ~Shape/Type propagation.~
- [x] FAQs

Please review and share your thoughts, feel free to add anything that you think should be included as well. houseroad spandantiwari lara-hdr neginraoof
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23185

Differential Revision: D16459647

Pulled By: houseroad

fbshipit-source-id: b401c005f848d957541ba3b00e00c93ac2f4609b
2019-07-26 00:14:54 -07:00
7a0ae0079f export sort to onnx
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21913

Differential Revision: D15982801

Pulled By: houseroad

fbshipit-source-id: 96dbd738c557478fffd48000db7263ae1f9754f5
2019-07-26 00:02:20 -07:00
1c00e0fc3f Added a flatten module (#22245)
Summary:
https://github.com/pytorch/pytorch/issues/2118

I'm not sure I'm doing it correctly, so I'll add tests if we decide that it's roughly correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22245

Differential Revision: D16508957

Pulled By: Chillee

fbshipit-source-id: a8dc7af999ba698c921006889f71cb1bc5a59d50
2019-07-25 22:48:52 -07:00
5b0484d977 Fix forwarding of arguments into kernel function (#23412)
Summary:
They should be forwarded by their actual type, not their rvalue reference.
This looked like perfect forwarding but actually wasn't.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23412
ghstack-source-id: 87214575

Reviewed By: dzhulgakov

Differential Revision: D16507872

fbshipit-source-id: 2b20a37df83067dd53e917fe87407ad687bb147c
2019-07-25 22:00:40 -07:00
3516f3c235 handle exit from init method (#21211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21211

There are cases where the `init` method used to create inputs can exit with an error. When this happens, that specific input should be skipped.

Reviewed By: zheng-xq

Differential Revision: D15466410

fbshipit-source-id: 55e86764b2ec56f7730349ff1df6e50efc0239d7
2019-07-25 21:41:06 -07:00
dd79d45c5a Added torch.bitwise_not docstr (#23397)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/23311
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23397

Differential Revision: D16505107

Pulled By: pbelevich

fbshipit-source-id: 8d515fc27e253469393941c8da23d8e0510e64df
2019-07-25 18:32:58 -07:00
58a3e4f71f Automatic update of fbcode/onnx to 28ca699b69b5a31892619defca2391044a9a6052 (#23404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23404

Previous import was 707064980b9825b8705b9d1c9aad34d8b022d5dd

Included changes:
- **[28ca699b](https://github.com/onnx/onnx/commit/28ca699b)**: Member Company logo guidelines (#2196) <Prasanth Pulavarthi>
- **[47acb06a](https://github.com/onnx/onnx/commit/47acb06a)**: remove link to outdated issue for contributions wanted (#2186) <Prasanth Pulavarthi>
- **[168519f6](https://github.com/onnx/onnx/commit/168519f6)**: Create sigs.md (#2103) <Prasanth Pulavarthi>
- **[b9320746](https://github.com/onnx/onnx/commit/b9320746)**: mintor format update (#2180) <Prasanth Pulavarthi>
- **[65b8e0f9](https://github.com/onnx/onnx/commit/65b8e0f9)**: add more types support for Equal op (#2176) <Ke Zhang>
- **[dc5e62a9](https://github.com/onnx/onnx/commit/dc5e62a9)**: Update AddNewOP document. (#2172) <Emad Barsoum>
- **[bae8b530](https://github.com/onnx/onnx/commit/bae8b530)**: Add missing space (#2150) <Takeshi Watanabe>
- **[5952b7f5](https://github.com/onnx/onnx/commit/5952b7f5)**: python api example typo fix (#2155) <LeicongLi>
- **[904cb842](https://github.com/onnx/onnx/commit/904cb842)**: Fix errors in RoiAlign shape inference code (#2167) <G. Ramalingam>

Reviewed By: zrphercule

Differential Revision: D16502373

fbshipit-source-id: 81f1e8f0db6828fd551089ae2e0be65153739532
2019-07-25 18:26:04 -07:00
bd54608bd2 fused qconv2d+relu kernel (#23353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23353

Adding support for fused qconv2d + relu

Reviewed By: jianyuh

Differential Revision: D16473318

fbshipit-source-id: cd3c3476a21ffe946dbd9812e833b957c0fd206c
2019-07-25 17:55:47 -07:00
6a8c2758d5 Add better performing versions for groupwise and depthwise convolutions (#22869)
Summary:
Groupwise and depthwise convolutions become faster with this diff
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22869

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qconv'  --print-passing-details

```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/562950091484224
      ✓ caffe2/test:quantized - test_qconv (test_quantized.TestQuantizedConv) 2.731 1/2 (passed)
Test output:
> test_qconv (test_quantized.TestQuantizedConv) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 2.732s
>
> OK
      ✓ caffe2/test:quantized - test_qconv_unpack (test_quantized.TestQuantizedConv) 5.187 2/2 (passed)
Test output:
> test_qconv_unpack (test_quantized.TestQuantizedConv) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 5.188s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/562950091484224
Summary (total time 15.66s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

```

buck test mode/dev caffe2/test:quantized -- 'test_conv_api'
```
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649676010406
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 0.040 1/2 (passed)
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 5.402 2/2 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/3940649676010406
Summary (total time 11.83s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D16264144

Pulled By: dskhudia

fbshipit-source-id: 32fa43e5c3d97c8aaa6e0858327a2ac0aef8df5c
2019-07-25 17:55:43 -07:00
2409e6a475 C++ at::Tensor, torch::tensor constructors should not accept QInts.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22889

Test Plan: Imported from OSS

Differential Revision: D16467984

Pulled By: gchanan

fbshipit-source-id: 6e2b1bf2a6a8dbc60138cd437b9cf814a0fc297d
2019-07-25 16:30:25 -07:00
0e3a359a38 Align the operator<< for Argument with FunctionSchema parser (#23203)
Summary:
Align Argument's operator<< with the parser. Additional support for:
1) List size
2) real default value
3) Alias information

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23203
ghstack-source-id: 87118985

Reviewed By: zrphercule

Differential Revision: D16433188

fbshipit-source-id: aea5711f93feacd94d1732e2f0d61218a31a0c5c
2019-07-25 15:28:17 -07:00
83b0fbc38d Remove TensorIterator::Builder (#23329)
Summary:
The builder pattern doesn't seem to work well with return-value optimization (RVO).
Removing it saves ~100 ns in the construction of TensorIterator::binary_op.

```
import torch
x = torch.rand(1)
y = torch.rand(1)
z = torch.rand(1)
%timeit torch.add(x, y, out=z)  # ~1.76 us vs ~1.88 us on my machine
```

cc resistor zheng-xq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23329

Differential Revision: D16495070

Pulled By: VitalyFedyunin

fbshipit-source-id: 8ce116075fa4c7149dabfcdfa25885c1187c8e2f
2019-07-25 15:15:49 -07:00
2cfe949d45 DEPRECATE_MESSAGE doesn't actually get expanded; inline it at both sites.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23379

Test Plan: Imported from OSS

Differential Revision: D16495278

Pulled By: ezyang

fbshipit-source-id: 596438fbf3285d6eee7b5d27a014f87b6c261cf1
2019-07-25 14:26:00 -07:00
b755bc1e31 fix importing for module defs that are named "foo.bar"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23367

Test Plan: Imported from OSS

Differential Revision: D16478637

Pulled By: suo

fbshipit-source-id: 30c6e7bfe377ef35d8c39e2d31615075ca0a6a19
2019-07-25 14:07:56 -07:00
b22adeb007 Fix error message for a wrong fork CUDA (#23322)
Summary:
Re-land https://github.com/pytorch/pytorch/pull/23030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23322

Differential Revision: D16469442

Pulled By: zhangguanheng66

fbshipit-source-id: 70b63ab6265efa3f289111ef0ce46bb3c0d353bc
2019-07-25 12:58:14 -07:00
d18529eb93 Fix upload path for macos binaries (#23386)
Summary:
peterjc123  will this affect windows too?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23386

Differential Revision: D16499443

Pulled By: pjh5

fbshipit-source-id: a3bec32d16f2cd06a097082deae0b020dd8bc5ac
2019-07-25 12:48:04 -07:00
7ee62d3d91 Fix the iOS build (#23293)
Summary:
The legacy iOS build script (`build_ios.sh`) is still working, but the output is Caffe2, not PyTorch. To enable the PyTorch iOS build, we can set `BUILD_CAFFE2_MOBILE` to `NO` and turn on another CMake arg, `INTERN_BUILD_MOBILE`, which ljk53 created for Android.

There is a trivial issue in `used_kernel.cpp` that causes a compile error when running `build_ios.sh`, as it uses a `system` API that has been deprecated since iOS 11. The fix below bypasses this file, since it's not needed on mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23293

Test Plan:
The `build_ios.sh` completed successfully, and all the generated static libraries can be compiled and linked successfully on iOS devices.

### Build script

```shell
./scripts/build_ios.sh \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```

Differential Revision: D16456100

Pulled By: xta0

fbshipit-source-id: 38c73e1e3a0c219a38ddc28b31acc181690f34e8
2019-07-25 12:41:20 -07:00
071536f895 Fix the operator== for Argument (#23202)
Summary:
type() returns a shared pointer; we should compare its content, not the pointer itself

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23202
ghstack-source-id: 87125582

Reviewed By: zrphercule

Differential Revision: D16432634

fbshipit-source-id: 639e730dcdc1cec02f280efeea53019b36d9ae37
2019-07-25 11:59:28 -07:00
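A Python analogy of the fix (the class here is a toy, not PyTorch's Argument): equality must compare what the type handle refers to, not the handle's identity. In the C++ code, `type()` returns a `shared_ptr`, so comparing the pointers compared addresses rather than contents.

```python
class Argument:
    def __init__(self, name, type_):
        self.name = name
        self.type = type_  # stands in for shared_ptr<Type>

    def __eq__(self, other):
        # compare contents, not object identity of self.type
        return self.name == other.name and self.type == other.type
```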
bbc53bffef AliasAnalysisKind::CONSERVATIVE/FROM_SCHEMA (#22175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22175

- Rename AliasAnalysisKind::DEFAULT to AliasAnalysisKind::CONSERVATIVE
- Introduce AliasAnalysisKind::FROM_SCHEMA that means the alias annotations of the schema should be honored
- Introduce AliasAnalysisKind::INTERNAL_SPECIAL_CASE to be able to run assertions that internal special cased ops are treated correctly

- aten:: and prim:: ops are not treated as special cases anymore, but just use AliasAnalysisKind::FROM_SCHEMA
- There's a set of assertions to ensure that aten:: and prim:: ops are all correctly set up to use AliasAnalysisKind::FROM_SCHEMA. Once this PR lands and passes all tests, we will remove those assertions and open up for the possibility of different AliasAnalysisKind settings for aten:: and prim:: ops

Differential Revision: D15929595

fbshipit-source-id: 7c6a9d4d29e13b8c9a856062cd6fb3f8a46a2e0d
2019-07-25 11:53:51 -07:00
b9202d459a Polish torch::Dict (#23344)
Summary:
torch::List recently received some polishing; the same is now done for Dict. This should be done before the PyTorch 1.2 release for backwards-compatibility reasons.

- Dict is just a reference type, so "const Dict" should have the same capabilities as "Dict", constness is not guaranteed in any way.
- DictIterator gets comparison operators <, <=, >, >=

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23344
ghstack-source-id: 87170304

Differential Revision: D16468800

fbshipit-source-id: 2978c3b9cdcfb2cfb3f26516b15bd455d9a48ba9
2019-07-25 11:35:36 -07:00
722f80a07d Align String str() with the format in FunctionSchema (#23201)
Summary:
old: string
new: str

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23201
ghstack-source-id: 87125580

Reviewed By: zrphercule

Differential Revision: D16432340

fbshipit-source-id: 56bc7e8efbc2276315f464958cf38704f75dd06e
2019-07-25 11:31:00 -07:00
7c383ba4a0 Remove superfluous check (#23370)
Summary:
This check is not needed. Even if it were, the assignment is clobbered anyway.

Closes #23300.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23370
ghstack-source-id: 87157671

Differential Revision: D16485329

fbshipit-source-id: 8ccac79e81f5e0d0d20099d550411c161f58c233
2019-07-25 11:26:16 -07:00
39de49c7ec Fix a tiny bug and refactor (#22808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22808

- Use ```size_to_dim_```.
- ```mod``` is not in the scope. Should be ```module```.

Reviewed By: mingzhe09088

Differential Revision: D16225799

fbshipit-source-id: 9a263227d2d508eefdfddfee15fd0822819de946
2019-07-25 11:26:12 -07:00
39fd264799 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23381

Differential Revision: D16496327

Pulled By: pjh5

fbshipit-source-id: 529029544a5f8c8106bcb7cebdc71aee33e3b86c
2019-07-25 10:39:37 -07:00
ed316c0ca0 Align Dict str() with the format in FunctionSchema (#23200)
Summary:
Old: Dict[int, str]
New: Dict(int, str)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23200
ghstack-source-id: 87125581

Reviewed By: zrphercule

Differential Revision: D16432159

fbshipit-source-id: a3dc5fa397697a53e78290d25e19589f757c1eb8
2019-07-25 10:27:41 -07:00
f477cab2dc Add type checks in _intrinsics.modules.fused (#23361)
Summary:
recreated

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23361
ghstack-source-id: 87142339

Reviewed By: zafartahirov

Differential Revision: D16455500

fbshipit-source-id: ab2c9d10c7c025ae77f5b80f28e6bd261620a5f7
2019-07-25 09:49:01 -07:00
25e06618c8 Support parsing variadic return schema (#23199)
Summary:
All such cases should be prim ops, but let's support it anyway. A variadic return schema is expected to look like prim::PythonOp(...) -> ...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23199
ghstack-source-id: 87113845

Differential Revision: D16431635

fbshipit-source-id: 798b6957ce5d800f7fcf981c86fdcb009cd77a78
2019-07-25 09:39:49 -07:00
cf50249bde Disable fusion of grad_sum_to_size (#23372)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/22833

grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add.

Chillee did actually do most of the tracking down to the fusion of grad_sum_to_size and pinging me when he had found the cause. Thank you!

About the choice of removing the fusion completely instead of being more precise:
- We do have grad_sum_to_size elimination, which works for cases where broadcasting does not actually happen in the forward, so the cases where fusing grad_sum_to_size is actually beneficial are much rarer than when it was initially proposed.
- There will be less fusion; in terms of the tests, IOU stops being fully fused. I vaguely think that is a case we could handle with refined logic.
- Keeping it would add complexity in checking when to merge fusion groups to the complexities that this PR removes.
- The future of fusion probably lies more in more complete solutions including reductions (TVM or KeOps or our own or ...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372

Differential Revision: D16489930

Pulled By: soumith

fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4
2019-07-25 08:55:33 -07:00
82545ecc71 Specify build dir as a global variable in BUILD_DIR in the build system.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23318

Test Plan: Imported from OSS

Differential Revision: D16493987

Pulled By: ezyang

fbshipit-source-id: 497e9dd924280f61dde095b4f2b50f5402d9da97
2019-07-25 07:19:47 -07:00
915261c8be Let users pass CMake-specific options starting with CMAKE_ to CMake. (#22776)
Summary:
This should make it more convenient to follow https://github.com/pytorch/pytorch/issues/8433's suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22776

Differential Revision: D16493553

Pulled By: ezyang

fbshipit-source-id: 852f4779e70f84a4c9f7bab4c2ae4927248ffc93
2019-07-25 07:19:44 -07:00
f91b19c2aa Do not explicitly set USE_FBGEMM in tools/setup_helpers/cmake.py (#23314)
Summary:
Instead, defer its default value to CMakeLists.txt

NO_FBGEMM has already been handled in tools/setup_helpers/env.py
(although deprecated)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23314

Differential Revision: D16493580

Pulled By: ezyang

fbshipit-source-id: 7255eb1df5e8a6dd0362507d68da0986a9ed46e2
2019-07-25 07:11:52 -07:00
ba6f65cf33 Add document of functions nn.init.ones_/zeros_ (#23145)
Summary:
Functions `nn.init.ones_` and `nn.init.zeros_` were not documented. As mentioned in https://github.com/pytorch/pytorch/issues/9886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23145

Differential Revision: D16427108

Pulled By: soumith

fbshipit-source-id: 4fac31e79717a436411ef5e107a829b403e576c9
2019-07-25 06:09:50 -07:00
252710262f (#22775)
Summary:
Passing FusionCallback and Symbol to recursive GraphFuser calls ensures
consistent fusion in nested Blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22775

Differential Revision: D16439979

Pulled By: soumith

fbshipit-source-id: 18d4b13f52b03708b8580c73f75450adbb672ac1
2019-07-25 05:54:03 -07:00
0c79753c0d Improve documentation for torch.enable_grad , torch.no_grad and torch.set_grad_enabled (#23310)
Summary:
Modified documentation for ` torch.enable_grad` , ` torch.no_grad` and `torch.set_grad_enabled`.

Fixes https://github.com/pytorch/pytorch/issues/19189
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23310

Differential Revision: D16489626

Pulled By: soumith

fbshipit-source-id: f0926e4f51ffd97521e67bee3a16ad954458247a
2019-07-25 05:48:33 -07:00
2938299de1 Fix lint failure introduced in #23346
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23371

Differential Revision: D16489985

Pulled By: pietern

fbshipit-source-id: 914048563bbe7bf5ab897c6f12f4a1bb4bff30e1
2019-07-25 05:17:15 -07:00
0842624d50 Fix upload issue with linux libtorch nightlies (#23368)
Summary:
This is a small fix on top of gh-23348, which fixed the libtorch
nightly build timeouts.

For the latest nighly build (25 July), see
https://circleci.com/workflow-run/33d0a24a-b77c-4a8f-9ecd-5646146ce684
The only failures are these uploads, which is because `aws s3 cp`
can only deal with one file at a time. The only way to make it do
multiple files at once is:
```
aws s3 cp . "$s3_dir" --exclude "*"  --include "libtorch-*.zip" --recursive --acl public-read
```
which is much more verbose. Executing one `cp` per file should be fine,
and this is also what's done in `binary_macos_upload.sh`

Closes gh-23039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23368

Differential Revision: D16488853

Pulled By: soumith

fbshipit-source-id: 6dc04b4de2f6cd2de5ae9ad57a6e980f56896498
2019-07-25 04:52:43 -07:00
95e822622b Enhance interpretation of GLOO_SOCKET_IFNAME (#22978)
Summary:
With this change you can now list multiple interfaces separated by
comma. ProcessGroupGloo creates a single Gloo context for every device
in the list (a context represents a connection to every other
rank). For every collective that is called, it will select the context
in a round robin fashion. The number of worker threads responsible for
executing the collectives is set to be twice the number of devices.

If you have a single physical interface, and wish to employ increased
parallelism, you can also specify
`GLOO_SOCKET_IFNAME=eth0,eth0,eth0,eth0`.  This makes ProcessGroupGloo
use 4 connections per rank, 4 I/O threads, and 8 worker threads
responsible for executing the collectives.
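A toy sketch of the round-robin context selection described above (names and the `context(...)` strings are illustrative, not the real ProcessGroupGloo internals). Listing the same interface repeatedly, e.g. `eth0,eth0,eth0,eth0`, would simply create multiple contexts on one NIC:

```python
import itertools

def make_contexts(ifname_env):
    # one "context" (full mesh of connections) per comma-separated entry
    return ["context(%s)" % d for d in ifname_env.split(",")]

class RoundRobin:
    def __init__(self, contexts):
        self._cycle = itertools.cycle(contexts)

    def next_context(self):
        # each collective picks the next context in round-robin order
        return next(self._cycle)

contexts = make_contexts("eth0,eth1")
rr = RoundRobin(contexts)
picked = [rr.next_context() for _ in range(5)]  # wraps after every 2 picks
```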

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22978
ghstack-source-id: 87006270

Differential Revision: D16339962

fbshipit-source-id: 9aa1dc93d8e131c1714db349b0cbe57e9e7266f1
2019-07-25 04:52:38 -07:00
1dd4d55565 Improve FindMKLDNN.cmake to avoid binary compatibility issue in MKL-DNN (#23292)
Summary:
An illegal instruction is encountered in the pre-built MKL-DNN package. https://github.com/pytorch/pytorch/issues/23231
To avoid such binary compatibility issue, the HostOpts option in MKL-DNN is disabled in order to build MKL-DNN for generic arch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23292

Differential Revision: D16488773

Pulled By: soumith

fbshipit-source-id: 9e13c76fb9cb9338103cb767d7463c10891d294a
2019-07-25 04:42:26 -07:00
e8ad167211 Remove usage of FOLDER constant in test_distributed.py (#23223)
Summary:
This is step 1 in trying to get rid of constants that are set prior to
executing the test runner. All setup logic should be concentrated in
the setupClass() function of the TestCase subclass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23223
ghstack-source-id: 87005260

Reviewed By: zhaojuanmao

Differential Revision: D16439147

fbshipit-source-id: 7a929ad4b1c8e368e33d1165becbd4d91220882c
2019-07-25 02:55:30 -07:00
711be82951 Make optimize a thread_local flag
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23170

Test Plan: Imported from OSS

Differential Revision: D16441912

Pulled By: suo

fbshipit-source-id: a33485178a329d54e41e364c4f14950f88481c55
2019-07-24 23:09:21 -07:00
b3980f46a2 Replace uint8 with int8 in Linear and LSTM quantization path (#23347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23347

This diff replaces uint8 with int8 to match the underlying kernel implementation. When we do int8 quantization, we compute uint8 (input activation) * int8 (weight) -> uint8 (output activation). The weight is quantized to int8.
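A scalar sketch of the uint8 x int8 -> uint8 arithmetic described above (the scales and zero points are made-up illustration values, not the real kernel's):

```python
def quantize_uint8(x, scale, zero_point):
    # asymmetric quantization for activations
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def quantize_int8(x, scale):
    # symmetric quantization for weights (zero point = 0)
    q = round(x / scale)
    return max(-128, min(127, q))

# toy 1x1 "matmul": one activation times one weight
x_scale, x_zp = 0.1, 128
w_scale = 0.05
qx = quantize_uint8(1.0, x_scale, x_zp)   # uint8 activation
qw = quantize_int8(-0.4, w_scale)         # int8 weight
acc = (qx - x_zp) * qw                    # integer accumulator
y = acc * x_scale * w_scale               # dequantized fp32 result, ~= 1.0 * -0.4
```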

Reviewed By: jianyuh

Differential Revision: D16469435

fbshipit-source-id: a697655b0e97833fc601e5980970aec4dba53c39
2019-07-24 22:25:12 -07:00
fba325be34 Try kill -9ing apt-get (#23354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23354

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16474254

Pulled By: ezyang

fbshipit-source-id: 0dd7ce02e1aa1a42a24d2af066ebd0ac5206c9a0
2019-07-24 19:27:10 -07:00
ff3cc795c8 Build binaries with local version string specifying CUDA version (#23325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23325

Fixes #19990

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16473826

Pulled By: ezyang

fbshipit-source-id: 466db2c22fabd7b574f0a08aec67a18318ddb431
2019-07-24 18:32:32 -07:00
cf0f3556f6 Enabled cumsum and cumprod for bool tensors (#23346)
Summary:
```
a = torch.tensor([[True, False, True],
                  [False, False, False],
                  [True, True, True]])

>>> torch.cumsum(a, 0)
tensor([[1, 0, 1],
        [1, 0, 1],
        [2, 1, 2]])

>>> torch.cumsum(a, 1)
tensor([[1, 1, 2],
        [0, 0, 0],
        [1, 2, 3]])
```

Tested via unit tests.
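The dim=1 result above is just an integer cumulative sum per row with True counted as 1; a stdlib sketch:

```python
from itertools import accumulate

def bool_cumsum(row):
    # True counts as 1, False as 0 -- same as the integer output above
    return list(accumulate(int(v) for v in row))

rows = [[True, False, True], [False, False, False], [True, True, True]]
per_row = [bool_cumsum(r) for r in rows]   # the dim=1 result
```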
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23346

Differential Revision: D16469393

Pulled By: izdeby

fbshipit-source-id: b55f3ca0588f9961a771def40f6ef58932021e1a
2019-07-24 18:16:01 -07:00
c9312e1a8b Open source 3D depthwise conv (#23164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23164

for open source CSN model

Reviewed By: weiyaowang

Differential Revision: D16412312

fbshipit-source-id: bb4e7748e697271563f974ca05878f8832d83653
2019-07-24 17:56:56 -07:00
73dee44ec5 Specifying libtorch variant in libtorch builds (#23348)
Summary:
This should fix all libtorch timeout issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23348

Differential Revision: D16472782

Pulled By: pjh5

fbshipit-source-id: b1f3a783d044eb881f6e8755e0c891093e93c93e
2019-07-24 17:31:43 -07:00
3297d8e203 Switch keys to be sequential and stable in pickle serialization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23280

Test Plan: Imported from OSS

Differential Revision: D16452816

Pulled By: zdevito

fbshipit-source-id: e143780b8e834298a575ac76d49576df94fbe27b
2019-07-24 17:13:51 -07:00
93da1030df Fix pickler bug where it would not load if no tensors were saved
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23263

Test Plan: Imported from OSS

Differential Revision: D16446928

Pulled By: zdevito

fbshipit-source-id: f70f86b28c3901a97b65b4d7654e39dc6e1aab6a
2019-07-24 17:13:46 -07:00
7922b5057d Memoize storages in pickler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23262

Test Plan: Imported from OSS

Differential Revision: D16446927

Pulled By: zdevito

fbshipit-source-id: 92d26f64ff6269b1deef821edae31745158b5137
2019-07-24 17:13:42 -07:00
71a047c3e3 Unwrap DataParallel automatically (#23334)
Summary:
Handle DataParallel for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23334

Differential Revision: D16467844

Pulled By: houseroad

fbshipit-source-id: 696aeada437c6c0612ac4ef9c4d51e3386625de0
2019-07-24 16:29:48 -07:00
c23ba35009 Skip QNNpack tests on ppc64le (where support is not enabled) (#23343)
Summary:
Proposed PR for
https://github.com/pytorch/pytorch/issues/23342

Disables execution of QNNpack tests if IS_PPC.
This parallels the existing skipping of the same tests for IS_WINDOWS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23343

Differential Revision: D16469218

Pulled By: soumith

fbshipit-source-id: 80b651d00e5d413e359cf418f79e20d74cd9c8e1
2019-07-24 15:24:00 -07:00
22c169fb9c Improve the error message for ONNX export (#23317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23317

Print out the kind type when fail to export

Reviewed By: zrphercule

Differential Revision: D16462641

fbshipit-source-id: 27157c0bd597362f90ac8cfb33e1808bac0ec48b
2019-07-24 15:03:05 -07:00
87d3f66506 max_pool_with_indices: return valid indices if all input elements are -inf (#23161)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20465.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23161

Differential Revision: D16442672

Pulled By: ezyang

fbshipit-source-id: 8c2ee13acd73954c7307720c01c732f460266a63
2019-07-24 14:51:39 -07:00
b7d90332ea add notes about overshoot in bicubic mode (#23321)
Summary:
fix https://github.com/pytorch/pytorch/issues/21044

Bicubic interpolation can cause overshoot.

OpenCV keeps the result dtype aligned with the input dtype:
- If input is uint8, the result is clamped [0, 255]
- If input is float, the result is unclamped.

In PyTorch's case, we only accept float input, so we'll keep the result unclamped and add some notes so that users can explicitly call `torch.clamp()` when necessary.
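The overshoot comes from the cubic kernel's negative side lobes. A minimal 1-D sketch, using a Catmull-Rom kernel purely for illustration (PyTorch's bicubic uses a slightly different cubic coefficient, but the overshoot mechanism is the same):

```python
def catmull_rom(p0, p1, p2, p3, t):
    # cubic interpolation between p1 and p2; the negative lobes
    # (the -p0 and -p3 terms) allow results outside [min(p), max(p)]
    return 0.5 * (2 * p1
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t)

samples = [0.0, 1.0, 1.0, 0.0]
y = catmull_rom(*samples, 0.5)    # 1.125: above the max input value (1.0)
clamped = max(0.0, min(1.0, y))   # what an explicit clamp would give
```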
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23321

Differential Revision: D16464796

Pulled By: ailzhang

fbshipit-source-id: 177915e525d1f54c2209e277cf73e40699ed1acd
2019-07-24 14:46:37 -07:00
d522b3ca58 BlackBoxPredictor OSS part N: ThreadLocalPtr, InferenceGraph (#23257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23257

Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (thread safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison
This specific diff:
There should be no harm in moving transformation code to
OSS. On the advantages side we will be able to compare production
Caffe2 setup with PyTorch in the most fair way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to build any other significant investments into
transformation logic except existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came up to the
conclusion of moving to OSS the whole thing.

Reviewed By: zrphercule

Differential Revision: D16428124

fbshipit-source-id: b35deada5c015cd97b91ae12a7ea4aac53bd14b8
2019-07-24 14:35:30 -07:00
2915d53096 Move OptionalType wrapping out of constants.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23234

Pulled By: driazati

Differential Revision: D16460880

fbshipit-source-id: d4e6b747615dbfe73a92ce571d3b2aaae7179f1b
2019-07-24 14:35:26 -07:00
48ca64dbf7 Better error for compiling a module type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23312

Pulled By: driazati

Differential Revision: D16461299

fbshipit-source-id: 11e56c44d561c3fbf70a96c22c5fd494eea0cf19
2019-07-24 14:24:50 -07:00
d6dcec37b6 Add docs about prod ecosystem features (#23010)
Summary:
Covering fleet-wide profiling, api logging, etc.

It's my first time writing rst, so suggestions are definitely welcomed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23010

Differential Revision: D16456721

Pulled By: dzhulgakov

fbshipit-source-id: 3d3018f41499d04db0dca865bb3a9652d8cdf90a
2019-07-24 14:15:33 -07:00
mal
87482bb15a Remove torch::autograd::Node::get_shared_ptr()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23307

Test Plan: Imported from OSS

Differential Revision: D16460972

fbshipit-source-id: 0678627e05dd4c69c4dafa6b717db5cd1a531f56
2019-07-24 13:50:47 -07:00
8fdbe1e10b Support LSTM with FP16 weight (#23291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23291

This diff implements LSTM with FP16 weights based on FBGEMM.

At a high level, here are the steps:
1. Quantize and pack weight in every layer of LSTM
2. Pass weights from step 1 to the ATen `quantized_lstm` function which does matrix multiplication with FP16 weight. The following code shows the dtype of each variable used in MM:
Y  =   X * W + B
(fp32, fp32, fp16, fp32)
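Assuming the dtype mix above, a toy sketch using the stdlib's IEEE 754 half-precision (`'e'`) struct format to emulate an fp16-stored weight inside an otherwise fp32 computation (illustrative only, not the FBGEMM path):

```python
import struct

def to_fp16(x):
    # round-trip through IEEE 754 binary16 ('e' struct format)
    return struct.unpack('e', struct.pack('e', x))[0]

w = 0.1234567            # an fp32 weight value
w16 = to_fp16(w)         # what survives fp16 storage (~3 decimal digits)
x, b = 2.0, 0.5          # fp32 activation and bias
y = x * w16 + b          # the multiply/accumulate stays in fp32
```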

Reviewed By: jianyuh

Differential Revision: D16389595

fbshipit-source-id: c26ae4e153c667a941f4af64e9d07fc251403cee
2019-07-24 12:40:11 -07:00
1f608d09cf Revert D16440000: [pytorch][PR] Re-land "Fix error message for a wrong fork CUDA"
Differential Revision:
D16440000

Original commit changeset: e05683275522

fbshipit-source-id: b688f24c1e6d3d8f63c2d415262a3f0ab1b85914
2019-07-24 12:05:36 -07:00
1a9edfcd36 Prevent xla-build from clobbering pytorch_linux_trusty_py3_6_gcc5_4_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23304

Test Plan: Imported from OSS

Differential Revision: D16459166

Pulled By: zou3519

fbshipit-source-id: 8f5c35ebf1fe6e86705e7fb4fff4c720bcd8f97b
2019-07-24 11:58:44 -07:00
0c0ffccbb6 deepCopy also copies type information of lists (#23271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23271
ghstack-source-id: 87088503

Differential Revision: D16449220

fbshipit-source-id: 551b7cef8f6d0d2d5a56b24ddbe2e0bb2c0c3dbe
2019-07-24 11:53:01 -07:00
895e79adf1 Revert D16441000: Switch from KaTeX to imgmath for documentation rendering.
Differential Revision:
D16441000

Original commit changeset: c1ab557cb816

fbshipit-source-id: cbfec2ca648b614b291debd6b3e215db9fbeb57b
2019-07-24 11:43:17 -07:00
94711d7471 Quantized conv avoid functional usage (#22733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22733

This refactor changes the conv module to avoid the usage of the functional ops.

Reviewed By: jerryzh168

Differential Revision: D15835572

fbshipit-source-id: f2294cd708fbe8372eb3a15cc60d83777d4f7029
2019-07-24 11:43:12 -07:00
67179d71f7 Reduce number of processes spawned for gloo_test.TestCase.test_forked_cw (#23221)
Summary:
It used to be run with comm_size=8, which causes flaky results in a
stress run. The flakiness was caused by too many listening sockets
being created by Gloo context initialization (8 processes times 7
sockets times 20-way concurrency, plus TIME_WAIT).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23221
ghstack-source-id: 86995596

Reviewed By: d4l3k

Differential Revision: D16437834

fbshipit-source-id: 998d0e2b087c0ab15eca64e308059c35e1b51e7b
2019-07-24 11:31:20 -07:00
3ed79f4b6c Fix argument names in torch doc (#22973)
Summary:
I manually went through all functions in `torch.*` and corrected any mismatch between the arguments mentioned in doc and the ones actually taken by the function. This fixes https://github.com/pytorch/pytorch/issues/8698.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22973

Differential Revision: D16419602

Pulled By: yf225

fbshipit-source-id: 5562c9b0b95a0759abee41f967c45efacf2267c2
2019-07-24 11:22:45 -07:00
eb51131fb4 Revert D16423217: [pytorch][PR] Update sleef to master, fixes #20535
Differential Revision:
D16423217

Original commit changeset: 587de3f10e83

fbshipit-source-id: 466e56eab73ce669cc179d08b7f39d2c8b0ffb34
2019-07-24 11:10:15 -07:00
803af9988c Fix the problem in parseOctal and throw exception if use \xhh to specify hex value (#23198)
Summary:
follow the rules:
1) https://docs.python.org/2.0/ref/strings.html
2) https://en.cppreference.com/w/cpp/language/escape
didn't find anything about the format \h
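A minimal stdlib sketch of the rules referenced above -- up to three octal digits for `\ooo`, and an exception when `\x` is not followed by exactly two hex digits. The function name and error handling are illustrative, not the actual lexer:

```python
def parse_escapes(s):
    # handles only \ooo and \xhh; other escaped chars pass through literally
    out, i = [], 0
    while i < len(s):
        c = s[i]
        if c != "\\":
            out.append(c); i += 1; continue
        nxt = s[i + 1]
        if nxt == "x":
            digits = s[i + 2:i + 4]
            if len(digits) != 2 or any(d not in "0123456789abcdefABCDEF" for d in digits):
                raise ValueError(r"\x must be followed by exactly two hex digits")
            out.append(chr(int(digits, 16))); i += 4
        elif nxt in "01234567":
            j = i + 1
            while j < len(s) and j < i + 4 and s[j] in "01234567":
                j += 1                       # consume up to three octal digits
            out.append(chr(int(s[i + 1:j], 8))); i = j
        else:
            out.append(nxt); i += 2
    return "".join(out)
```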

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23198
ghstack-source-id: 86986691

Reviewed By: zrphercule

Differential Revision: D16431215

fbshipit-source-id: 7b342708d1984e08b3cbbcf6d487623f1dc63a14
2019-07-24 10:41:59 -07:00
b9a7fc529a Suppress warnings in tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23264

Pulled By: driazati

Differential Revision: D16449965

fbshipit-source-id: ff7d6ddf2275e5f44883a19b6a24c6beaa42ccf4
2019-07-24 10:36:46 -07:00
200cb836f1 Enabled 'add_cuda' for bool and fixed alpha scalar bug (#23044)
Summary:
Enabled 'add_cuda' for bool
Tested via unit tests

Fixed https://github.com/pytorch/pytorch/issues/22431 #22430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23044

Differential Revision: D16368095

Pulled By: izdeby

fbshipit-source-id: 033d28095ff1c5df4078905c52782cf4cf9ed6b0
2019-07-24 10:31:34 -07:00
fbf28b5458 Change TensorIterator to be stack allocated, using named return value optimization to elide copies.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22519

Differential Revision: D16451460

fbshipit-source-id: 6ca6ae2fdf1af5a2f792b42e55279413971b3c46
2019-07-24 10:22:19 -07:00
7203612f85 Update sleef to master, fixes #20535 (#23168)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20535

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23168

Differential Revision: D16423217

Pulled By: ezyang

fbshipit-source-id: 587de3f10e839b94f51f673741b5fda8849e32f6
2019-07-24 08:18:14 -07:00
fd1d06e317 Let Python build scripts accept both CMAKE_BUILD_TYPE and the oldschool DEBUG and REL_WITH_DEB_INFO variables. (#22875)
Summary:
Currently the build type is decided by the environment variable DEBUG
and REL_WITH_DEB_INFO. This commit also lets CMAKE_BUILD_TYPE be
effective. This makes the interface more consistent with CMake. This
also prepares https://github.com/pytorch/pytorch/issues/22776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22875

Differential Revision: D16281663

Pulled By: ezyang

fbshipit-source-id: 952f92aad85ff59f1c7abe8256eca8a4a0936026
2019-07-24 08:07:47 -07:00
aa660b8eb7 Re-land "Fix error message for a wrong fork CUDA" (#23209)
Summary:
Re-land https://github.com/pytorch/pytorch/pull/23030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23209

Differential Revision: D16440000

Pulled By: zhangguanheng66

fbshipit-source-id: e05683275522835a33d5a7e6d76b7e94774e4d98
2019-07-24 07:01:04 -07:00
4cd726c7b3 Update ROCm CI to python3.6 (#23088)
Summary:
Rehash of https://github.com/pytorch/pytorch/issues/22322 .

Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Added a pattern-match skip for anything but the ROCm CI, compared to #22322, for the Python find step in the PyTorch build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23088

Differential Revision: D16448261

Pulled By: bddppq

fbshipit-source-id: 69ece1a213418d9abf1444c496dce1c190ee07c8
2019-07-23 23:07:45 -07:00
91bef6c168 format sugared_value & compiler.cpp (#23283)
Summary:
there are a lot of formatting changes, which make diffs in other PRs noisy & hard to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23283

Differential Revision: D16453590

Pulled By: eellison

fbshipit-source-id: 97b4bf1dbbbfb09c44c57402f61ea27287060044
2019-07-23 22:29:22 -07:00
bc15a20db9 Minor refactor: propagating messages in TestCase
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23146

Test Plan: Imported from OSS

Differential Revision: D16413801

Pulled By: zafartahirov

fbshipit-source-id: 8009a7afe77e127ddd220fb71c6c272b0d44c733
2019-07-23 21:18:44 -07:00
8a77098247 Make Module::register_module / register_parameter / register_buffer public (#23196)
Summary:
In Python, `register_module` / `register_parameter` / `register_buffer` method in `nn.Module` is public. This PR makes those APIs public for C++ `nn::Module` as well. Closes https://github.com/pytorch/pytorch/issues/23140.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23196

Differential Revision: D16440239

Pulled By: yf225

fbshipit-source-id: e0eff6e1db592961fba891ec417dc74fa765e968
2019-07-23 21:18:41 -07:00
24601daa12 Adding check for a single batch in adaptive_avg_pool
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23137

Test Plan: Imported from OSS

Differential Revision: D16403804

Pulled By: zafartahirov

fbshipit-source-id: df79a8c768ffabeceb4c0044c967a623c5885484
2019-07-23 21:11:06 -07:00
mal
e7a9b0d62f Rename torch::autograd::Function to torch::autograd::Node
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23269

Test Plan: Imported from OSS

Differential Revision: D16454878

fbshipit-source-id: b1e840fc2d3901955280d141e5ad6efd5e9d66af
2019-07-23 20:52:22 -07:00
0ab19d66ee Port lu_solve to ATen (#22379)
Summary:
Changelog:
- Port TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Remove TH/THC implementations
- Update doc strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22379

Test Plan: - Added new tests in test_torch.py (port to test_cuda.py exists)

Differential Revision: D16089645

Pulled By: zou3519

fbshipit-source-id: dc8561aadacacb23e80c375b4fec687df2b6bbc8
2019-07-23 19:11:35 -07:00
2197bee3da add cudnn.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23265

Differential Revision: D16453611

Pulled By: bddppq

fbshipit-source-id: b49e01b6d019781097ec5072cdc6d37a2988bfbe
2019-07-23 18:09:23 -07:00
a936a90391 caffe2/caffe2/fb/operators/cc_amrc: drop SIMD OpenMP vectorization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23235

Reviewed By: ajtulloch

Differential Revision: D16384612

Pulled By: luciang

fbshipit-source-id: a4c8257c6d3e151ba99167a152ad824b0dde7671
2019-07-23 17:25:00 -07:00
7ed9622fdf Read number of workspaces from argument in recurrent_network_op (#23272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23272

We see significant performance improvements by limiting concurrency
at caffe2 level on mobile. This diff enables setting the number of caffe2
workspaces used during rnn inference.

Reviewed By: akyrola

Differential Revision: D16448611

fbshipit-source-id: 28abaddb4ea60bacb084ceb28cb7a4d1e67ccc17
2019-07-23 17:19:40 -07:00
a35136dd73 Add support for onnx tensor index export (#21716)
Summary:
Support exporting
* Standard tensor indexing like
```
x = torch.ones(4, 5)
ind = torch.tensor([0, 1])

return x[ind]
```
* [Advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing) like
```
x = torch.ones(4,5,6,7,8)
ind1 = torch.tensor([0, 1])
ind2 = torch.tensor([[3], [2]])
ind3 = torch.tensor([[2, 2], [4, 5]])

return x[2:4, ind1, None, ind2, ind3, :]
```
It would be ideal if ONNX can natively support indexing in future opsets, but for opset <= 10 it will always need this kind of workarounds.

There are still various limitations, such as not supporting advanced indexing with negative indices, not supporting mask indices of rank > 1, etc. My feeling is that these are less common cases that require great effort to support using the current opset, and it's better not to make the index export more cumbersome than it already is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21716

Reviewed By: zrphercule

Differential Revision: D15902199

Pulled By: houseroad

fbshipit-source-id: 5f1cc687fc9f97da18732f6a2c9dfe8f6fdb34a6
2019-07-23 17:11:28 -07:00
1de44a6f54 fix specialized list from dict keys (#23267)
Summary:
Previously we weren't specializing the list returned from `dict.keys()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23267

Differential Revision: D16448512

Pulled By: eellison

fbshipit-source-id: fcd2a37ac680bdf90219b099a94aa36a80f4067c
2019-07-23 17:02:19 -07:00
a6ccd62a81 BlackBoxPredictor OSS part 5: glow transforms
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (thread safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison
This specific diff:
There should be no harm in moving transformation code to
OSS. On the advantages side we will be able to compare production
Caffe2 setup with PyTorch in the most fair way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to build any other significant investments into
transformation logic except existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came up to the
conclusion of moving to OSS the whole thing.

Reviewed By: bertmaher

Differential Revision: D16367134

fbshipit-source-id: fc6bacc1be3ff6336beb57cdad58168d3a2b8c28
2019-07-23 16:39:23 -07:00
bdb1e1305d exclude some caffe2 modules from libtorch mobile build (#20000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20000
ghimport-source-id: f47773ef1c6849cd0c0e65080400416c6b370d39

Test Plan:
- verified libtorch mobile library builds and links successfully;

Imported from OSS

Differential Revision: D15169024

Pulled By: ljk53

fbshipit-source-id: 20ac89c6e7053239c93e51f00c5c5dc3595bea74
2019-07-23 16:20:27 -07:00
1c0309a9a9 make OMP_NUM_THREADS default in launch.py (#22501)
Summary:
per https://github.com/pytorch/pytorch/issues/22260, the default number of OpenMP threads spawned equals the number of cores available; in multi-process data-parallel cases this can spawn too many threads, overload the CPU, and cause a performance regression.

This change sets OMP_NUM_THREADS = number of CPU cores / number of processes by default, so CPU threads are neither overloaded nor wasted.
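A sketch of that default (helper names are illustrative, not the actual launch.py code):

```python
import os

def default_omp_num_threads(num_cpus, nproc_per_node):
    # max(1, num_cpus / num_processes), as described above
    return max(1, num_cpus // nproc_per_node)

def maybe_set_omp(nproc_per_node, environ):
    # only supply a default when the user hasn't set one explicitly
    if "OMP_NUM_THREADS" not in environ:
        n = default_omp_num_threads(os.cpu_count() or 1, nproc_per_node)
        environ["OMP_NUM_THREADS"] = str(n)

env = {}
maybe_set_omp(2, env)
```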
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22501

Test Plan:
1. without and with this change, example codes result in same result
      python ~/local/fbsource-fbcode/fbcode/caffe2/torch/distributed/launch.py --nproc_per_node=2 pytorch/examples/yanlizhao/distributed_launch_example.py

  Setting OMP_NUM_THREADS environment variable for each process to be: 24, which
  is max(1, num_cpus / num_processes), you can further tune the variable for optimal performance in your application if needed.
  final loss =  tensor(0.5211, device='cuda:0', grad_fn=<MseLossBackward>)

Differential Revision: D16092225

Pulled By: zhaojuanmao

fbshipit-source-id: b792a4c27a7ffae40e4a59e96669209c6a85e27f
2019-07-23 16:14:24 -07:00
058645acb1 Fusion and _intrinsic modules (#23003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23003

torch.quantization.fuse_module and torch.nn._intrinsic convRelu and LinearRelu

Fusion function to combine specific modules: (conv,bn) and  (conv,bn,relu).
In all cases, replace modules in place. The first module is replaced with the _intrinsic fused module and the remaining modules are replaced by nn.Identity.
Support both training and eval. For training, the modules are "fused" with a sequential container. This is to allow for further module swaps for quantization aware training.
Also add: torch.nn._intrinsic for convRelu and LinearRelu.
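For eval mode, the conv+bn fold is pure arithmetic on the weights; a per-channel sketch under the standard batch-norm formula (names hypothetical, scalars standing in for per-channel tensors):

```python
import math

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    # Eval-mode fold for one output channel:
    # bn(conv(x)) = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

# With gamma=2 and sqrt(var+eps)=1 the weight simply doubles:
w2, b2 = fold_bn_into_conv(w=0.5, b=1.0, gamma=2.0, beta=0.1,
                           mean=1.0, var=1.0 - 1e-5)
print(w2, b2)  # approximately 1.0 and 0.1
```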

TODO: Add tests for _intrinsic modules.

Conv BN fusion code is based on DsKhudia's implementation

Differential Revision: D16199720

fbshipit-source-id: 95fb9ffe72b361d280313b2ec57de2acd4f9dda2
2019-07-23 14:54:19 -07:00
7b229342ca Renamed CosineAnnealingLr to CosineAnnealingLR (#23242)
Summary:
fixing https://github.com/pytorch/pytorch/issues/23160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23242

Differential Revision: D16443348

Pulled By: pbelevich

fbshipit-source-id: af0edf4e841e04a8016c98bfee72696581f3f070
2019-07-23 14:54:15 -07:00
8d4956fd02 hook up dropout sparse with replacement operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23183

Reviewed By: ffjiang

Differential Revision: D16428262

fbshipit-source-id: 0d6e17d15c898629bbd2826441f2c9701a78b0bd
2019-07-23 14:34:25 -07:00
6f01d13728 Implement dropout with replacement for id list features. (#22880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22880

Implement sparse dropout with replacement value.
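A rough sketch of the intended semantics, assuming each dropped id is swapped for a fixed replacement id rather than removed (names and exact semantics hypothetical):

```python
import random

def dropout_with_replacement(ids, ratio, replacement_id, seed=0):
    # Each id is independently replaced with probability `ratio`;
    # the list keeps its length, unlike plain sparse dropout.
    rng = random.Random(seed)
    return [replacement_id if rng.random() < ratio else i for i in ids]

print(dropout_with_replacement([3, 7, 7, 42], 0.5, 0))
```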

Reviewed By: xianjiec

Differential Revision: D16267012

fbshipit-source-id: 8c4878230f61bb3ac333291e2c6aaf2fbdc5f9ce
2019-07-23 14:34:21 -07:00
e0f632c58b pickler.cpp: respect __getstate__/__setstate__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23190

Test Plan: Imported from OSS

Differential Revision: D16431553

Pulled By: zdevito

fbshipit-source-id: 680ea1507c12727fd17aedb3067f522cf490e306
2019-07-23 14:27:51 -07:00
bae10db522 Incorporating arguments to pull production operators and adding device type. (#23197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23197

Incorporating arguments to pull production operators and adding device type.

Reviewed By: mingzhe09088

Differential Revision: D16387263

fbshipit-source-id: e20ed82225eb1e4b7ab1756ec157967b055d85bf
2019-07-23 13:43:26 -07:00
d8220b0599 add simple inheritance support to AST
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23109

Test Plan: Imported from OSS

Differential Revision: D16441914

Pulled By: suo

fbshipit-source-id: 18a57762d376759b98c18bc160eacbcc99f78ee9
2019-07-23 12:21:27 -07:00
017870a633 kill module_lookup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23097

Test Plan: Imported from OSS

Differential Revision: D16383329

Pulled By: suo

fbshipit-source-id: 282f8bac2245d584b66139daf4e5ea7b2b317295
2019-07-23 12:21:23 -07:00
2a37740a86 make RHS of assignment optional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23033

Test Plan: Imported from OSS

Differential Revision: D16383330

Pulled By: suo

fbshipit-source-id: 63c55fae06f0cd534eb5053f91a773431ad052d4
2019-07-23 12:21:19 -07:00
3be0a2b4be Parse all stmts in class defs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23031

Test Plan: Imported from OSS

Differential Revision: D16383327

Pulled By: suo

fbshipit-source-id: 6485109a66e653b7f26d30b91a97af8d71594e22
2019-07-23 12:21:15 -07:00
0dabaad819 Add Module::replace_module to C++ api (#22546)
Summary:
This adds a replace_module method to the C++ API, which is needed to be able to swap out submodules of a module.

The primary use case I am aware of is to enable finetuning of models.
Given that finetuning is fairly popular these days, I think it would be good to facilitate this in the C++ api as well.

This has been reported by Jean-Christophe Lombardo on the [forums](https://discuss.pytorch.org/t/finetuning-a-model-on-multiple-gpu-in-c/49195).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22546

Differential Revision: D16440289

Pulled By: yf225

fbshipit-source-id: c136f914b8fc5c0f1975d877ea817fda5c851cda
2019-07-23 11:50:06 -07:00
f112c522af LinearReLU module (#23022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23022

will be tested in later diffs.

Added LinearReLU module for qat, allows conversion from torch.nn._intrisic.LinearReLU to torch.nn._intrinsic.qat.LinearReLU

Reviewed By: zafartahirov

Differential Revision: D16286800

fbshipit-source-id: 84cce3551d46e649781b9b6107d4076e10e51018
2019-07-23 11:17:25 -07:00
192dd8faf1 Set correct list type in pybind_utils (#23188)
Summary:
-

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23188
ghstack-source-id: 87008828

Differential Revision: D16430911

fbshipit-source-id: 9d9d29bf42402e0fff323dfd0ed65fcfd5564fd3
2019-07-23 10:52:38 -07:00
19be7ece15 Fix erase_number_types test (#23181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23181

We can't run dead code elimination after erasing number types because dce relies on graph invariants that erase_number_types breaks.

Reviewed By: houseroad

Differential Revision: D16427819

fbshipit-source-id: d1b98a74d2558b14d4be692219691149689a93d8
2019-07-23 10:23:10 -07:00
e56f11b750 Fix onnx export (#23180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23180

This pass needs to be run later because it breaks jit graph invariants and the lower_all_tuples pass still needs a valid jit graph.

Reviewed By: houseroad

Differential Revision: D16427680

fbshipit-source-id: 427c7e74c59a3d7d62f2855ed626cf6258107509
2019-07-23 10:23:06 -07:00
60afcabc6f DictConstruct sets correct types (#23171)
Summary:
-

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23171
ghstack-source-id: 87009037

Differential Revision: D16423640

fbshipit-source-id: 0f4f9b12759b8a9defaae775e33e2b0af9bb7791
2019-07-23 10:23:01 -07:00
67aede98c3 Exclude unused onnx targets (#23195)
Summary:
e.g. onnxifi_dummy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23195

Differential Revision: D16441493

Pulled By: bddppq

fbshipit-source-id: 76816e7a7c73f60f3c7abea10fbdbf086cea0476
2019-07-23 10:22:57 -07:00
9d03133c14 ListConstruct sets correct element type (#23189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23189
ghstack-source-id: 86971099

Differential Revision: D16430987

fbshipit-source-id: 9af255075b670e6f811e1a9d104f2738a38e9515
2019-07-23 10:14:35 -07:00
2073cc73f8 Use concrete types in jit test for generic lists (#23192)
Summary:
Creating an untyped generic list is deprecated, we always want type information to be present.

This fixes test cases and removes one that used lists with ambiguous types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23192
ghstack-source-id: 86972891

Differential Revision: D16431482

fbshipit-source-id: 4ca5cd142118a3f0a4dcb8cd77383127c54abb29
2019-07-23 10:04:12 -07:00
21f52ce0d4 Remove trailing semicolon from TORCH_CHECK macros.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22339

Test Plan: Imported from OSS

Differential Revision: D16182743

Pulled By: ezyang

fbshipit-source-id: 3c4ac0abe49ce83901bd5b07279a135857035f80
2019-07-23 09:58:50 -07:00
174f7a586f Switch from KaTeX to imgmath for documentation rendering.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23025

Test Plan: Imported from OSS

Differential Revision: D16441000

Pulled By: ezyang

fbshipit-source-id: c1ab557cb8163e9c69585c32d237c076582a6d73
2019-07-23 09:44:37 -07:00
792d527746 Fix typos in comments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23130

Differential Revision: D16402755

Pulled By: ezyang

fbshipit-source-id: 8bf9767c0012aed8ad91289bbaf2d979f130d728
2019-07-23 09:44:33 -07:00
60c46dd4df Let CMake handle NCCL detection instead of our handcrafted Python script. (#22930)
Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code deferred the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code, which searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu`, but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`; see https://unix.stackexchange.com/a/226180/38242 ).

 ---

Regarding for relevant issues:

- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 did not change NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81ef61d19b884194cdbcd6c7089636d46 A versioned library detection is added, but the order is reversed: the unversioned library is now preferred. This is because unversioned libraries are normally symlinked to versioned libraries and preferred by users, and local installations by users are often unversioned. As the documentation of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
2019-07-23 08:45:51 -07:00
e4b75c6580 Fix typo in dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23132

Differential Revision: D16402759

Pulled By: ezyang

fbshipit-source-id: 9500570f6b7492a67a2af853bfb63a5667e6b7b5
2019-07-23 08:45:47 -07:00
45d3f495ef Add document of function torch.as_strided (#22842)
Summary:
Documentation of `torch.as_strided` and `Tensor.as_strided` is missing. As mentioned in https://github.com/pytorch/pytorch/issues/9886
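The stride arithmetic that `as_strided` exposes can be sketched in plain Python (helper name hypothetical):

```python
def strided_get(storage, size, stride, offset, index):
    # Element `index` of a strided view over a flat storage:
    # position = offset + sum(index[k] * stride[k])
    assert len(index) == len(size) == len(stride)
    pos = offset
    for k, i in enumerate(index):
        assert 0 <= i < size[k]
        pos += i * stride[k]
    return storage[pos]

storage = list(range(12))  # flat buffer 0..11
# A 3x2 view with strides (4, 1): each row starts 4 elements apart.
print(strided_get(storage, (3, 2), (4, 1), 0, (2, 1)))  # 9
```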
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22842

Differential Revision: D16254106

Pulled By: soumith

fbshipit-source-id: dee142483fb9ef7bea84bd44a970b6eccdcdc471
2019-07-23 06:06:00 -07:00
c9e62f6988 Update nccl to 2.4.8-1 (#23186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23186

Differential Revision: D16438723

Pulled By: soumith

fbshipit-source-id: ff4f5b9c7383b92e5cf2053a87caf2ac11be7aeb
2019-07-23 05:35:32 -07:00
9a6ae5c0b1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: aab8ded966d718befb664a6e968eedc6bbe7cb5e
2019-07-22 22:47:52 -07:00
d7448c7812 quantized conv module (#23178)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23178
ghstack-source-id: 86973164

Differential Revision: D16426871

fbshipit-source-id: a2ebb38997acfeb61b7dfd6b11dd8ee9b3a7a8ed
2019-07-22 20:47:40 -07:00
f3a37278cc ConvReLU2d module (#23008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23008

Added ConvReLU2d module to convert from nn._intrinsic.ConvReLU2d to nn._intrinsic.qat.ConvReLU2d

Differential Revision: D16286670

fbshipit-source-id: 2903d825175911c0095497369f313bf2a2eb3833
2019-07-22 20:47:36 -07:00
eb5137a5d1 Export torch.arange to ONNX (#22601)
Summary:
Some overlap with https://github.com/pytorch/pytorch/pull/21716 regarding caffe2 nonzero. Will rebase the other one accordingly, depending on which gets merged first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22601

Reviewed By: zrphercule

Differential Revision: D16224660

Pulled By: houseroad

fbshipit-source-id: dbfd1b8776cb626601e0bf83b3fcca291806e653
2019-07-22 20:30:39 -07:00
06d11f0434 Revert D16368004: [pytorch][PR] Fix error message for a wrong fork CUDA
Differential Revision:
D16368004

Original commit changeset: 44b6977790ce

fbshipit-source-id: c81a232bd52219e56a19c64650c4b6dedeb167cb
2019-07-22 18:46:48 -07:00
3861520603 Verify flatten works for quantized Tensor (#23121)
Summary:
Added a test in `test_torch.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23121
ghstack-source-id: 86983227

Differential Revision: D16391409

fbshipit-source-id: 04e72b2f753a0a6ddbf58d55b794e443b18a2156
2019-07-22 18:34:25 -07:00
a24f6c13a3 Fix broken indexing when using None and ellipses indexing together (#22905)
Summary:
https://github.com/pytorch/pytorch/issues/20153

I believe you need 2 passes for this. Take this example
```python
@torch.jit.script
def f():
    x = torch.ones(10, 9, 8, 7, 6)
    return x[..., None, None].shape
```
which results in `[10, 9, 8, 7, 6, 1, 1]`
vs
```python
@torch.jit.script
def f():
    x = torch.ones(10, 9, 8, 7, 6)
    return x[..., None, None, :].shape
```
which results in `[10, 9, 8, 7, 1, 1, 6]`
After only processing `x[..., None, None` we don't know whether we should be creating a new dimension at the end of the dimension list or somewhere in the middle. What we do depends on the elements to the right of it.

Thus, I do 2 passes - one to collect all the dimensions that the index operations operate on, and another that executes the index operations.

This still doesn't work for an ellipsis index followed by a tensor index, but it wasn't working previously either.
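The two-pass shape computation can be sketched in plain Python, using `':'`, `None`, and `'...'` as stand-ins for the index elements (helper name hypothetical):

```python
def index_shape(shape, idx):
    # Pass 1: count how many real dims the explicit slices consume,
    # so we know how many dims the ellipsis must swallow.
    consumed = sum(1 for e in idx if e == ':')
    assert idx.count('...') <= 1
    # Pass 2: build the result shape left to right.
    out, d = [], 0
    for e in idx:
        if e == ':':
            out.append(shape[d]); d += 1
        elif e is None:
            out.append(1)              # new size-1 dimension
        else:                          # '...' swallows remaining dims
            take = len(shape) - consumed - d
            out.extend(shape[d:d + take]); d += take
    out.extend(shape[d:])              # dims not touched by the index
    return out

print(index_shape([10, 9, 8, 7, 6], ['...', None, None]))
print(index_shape([10, 9, 8, 7, 6], ['...', None, None, ':']))
```

The two printed shapes match the two examples above: the trailing `:` forces the new axes into the middle instead of the end.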
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22905

Differential Revision: D16433558

Pulled By: Chillee

fbshipit-source-id: c1b303cb97b1af8b6e405bad33495ef3b4c27c4a
2019-07-22 18:11:23 -07:00
648f10be16 Fix load op to return the shape info as before when loading multiple blobs (#23182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23182

This fixes the issue seen in D16390551

Changing the load op to take in shapes vector needs changes in lots of places (almost all usages of load op).
Instead this is a small and safe change where the behavior is unchanged if we are loading multiple blobs and when loading a single blob without shape information.

If you are loading just one blob and the shape information is provided, then this returns the right shape info back.

For all other cases, behavior is unchanged as before we introduced the issue.

This fixes the issue reported by Andrey in D16229465

Reviewed By: boryiingsu

Differential Revision: D16428140

fbshipit-source-id: 8ef6705ab2efb346819489e1f166e23269f7ef8a
2019-07-22 15:53:40 -07:00
1c574458b0 nn_quantized test (#23169)
Summary:
- scale/zero_point in quantized modules should be Tensor
- fix conv module permutation API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23169
ghstack-source-id: 86956383

Reviewed By: zafartahirov

Differential Revision: D16423570

fbshipit-source-id: d29498e07bdd8f71a33b4e16e089f80847bbca6d
2019-07-22 15:53:36 -07:00
e08f8f45ff Turning on fbgemm for nightlies (#22784)
Summary:
fbgemm requires a AVX512 which requires a more recent compiler, so this also switches all the nightlies from devtoolset3 to devtoolset7. Since CUDA 9.0 doesn't support devtoolset7, we also switch from CUDA 9.0 to CUDA 9.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22784

Differential Revision: D16428165

Pulled By: pjh5

fbshipit-source-id: c1af3729d8edce88a96fa9069d4c5a1808c25f99
2019-07-22 15:09:11 -07:00
a6e45a69a8 Fix error message for a wrong fork CUDA (#23030)
Summary:
Fix https://github.com/pytorch/pytorch/issues/17357
Unblock 1.2 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23030

Differential Revision: D16368004

Pulled By: zhangguanheng66

fbshipit-source-id: 44b6977790ce768efa4777bae41d4b26dae5f288
2019-07-22 15:04:32 -07:00
3ca7c0ffdb Add get_accessed_features function to ModelLayer class (#23036)
Summary:
We need a way to get a complete list of features that are used in training a model.  One way to do this is to make it possible to get the list of features used in each model layer.  Then once the model is complete we can go through the layers and aggregate the features.

I've introduced a function to expose that information here, get_accessed_features, and implemented it in the FeatureSparseToDense layer to start with.

I've tried to include the minimum amount of information to make this useful, while making it easy to integrate into the variety of model layers.  This is, for example, why AccessedFeatures does not contain feature_names, which are not always present in a model layer.  I debated whether or not to include feature_type, but I think it's useful enough, and easy enough to figure out in a model layer, that it's worth including.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23036

Test Plan:
Added a unit test to verify the behavior of get_accessed_features in FeatureSparseToDense.

aml_dper2-fblearner-flow-integration-tests failed due to a known issue D16355865
aml_dper3-fblearner-flow-integration-tests failed due to a known issue T47197113

I verified no tests in the integration tests failed to issues other than those known ones.

DPER2 canaries: https://fburl.com/fblearner/1217voga

Reviewed By: volkhin

Differential Revision: D16365380

Pulled By: kevinwilfong

fbshipit-source-id: 2dbb4d832628180336533f29f7d917cbad171950
2019-07-22 15:04:28 -07:00
ff23a02ac4 Pin numba to 0.44.0 to fix Windows CI.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23176

Test Plan: Imported from OSS

Differential Revision: D16426873

Pulled By: ezyang

fbshipit-source-id: 10d800db78416137504c396711dc45109f6f5ca4
2019-07-22 14:59:15 -07:00
b6d06d5496 Remove empty THCThreadLocal{.h/.cpp} (#23157)
Summary:
These files were removed from the build process and cleaned in https://github.com/pytorch/pytorch/pull/9735.

Closes https://github.com/pytorch/pytorch/issues/22572
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23157

Differential Revision: D16426819

Pulled By: soumith

fbshipit-source-id: aa01aec9fe0e3af456ba8b75ae85d0b1df2a8ed9
2019-07-22 14:59:11 -07:00
fdfc676eb6 Invert ownership between PyFunction and THPFunction.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22983

Test Plan: Imported from OSS

Differential Revision: D16422209

Pulled By: ezyang

fbshipit-source-id: d6e41a1606484fbbd7a95a547b83a4199151be68
2019-07-22 14:13:14 -07:00
ae5b52086e Support converting Python number to IValue in pybind_utils.h (#22817)
Summary:
I ran into the following error when trying to pass a Python int as an arg to `torch::jit::createStackForSchema`, and I think it is due to the missing support for `NumberType` in [toIValue](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/pybind_utils.h#L448).

> RuntimeError: Missing cases in toIValue for type: Scalar! File a bug report. (toIValue at ../torch/csrc/jit/pybind_utils.h:449)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22817

Differential Revision: D16276006

Pulled By: mrshenli

fbshipit-source-id: 7f63519bb37219445e836ec1f51ca4f98bf52c44
2019-07-22 14:01:30 -07:00
2becbd3faa BlackBoxPredictor OSS part 4: Open-source other transforms (#23099)
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.
This specific diff:
There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare a production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would be a significant
engineering effort as well as a production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan to build any significant new investments into
transformation logic beyond the existing ones (like TVM and Glow), and
those also relate to open-source technologies, I came to the
conclusion that the whole thing should be moved to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23099

Test Plan:
salex@devvm4218:caffe2 { (fcdaf96|HISTEDIT)}$ submit_canary --q tw_adindexer_canary_on_canary_tier && submit_canary --q tw_adfinder_canary_on_canary_tier && submit_canary prospector_repl
ay_canary
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851419/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717789681292057
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GBYe_ANnNNBnbWsDAAAAAABJPvJBbjEQAAAz
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851536/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717806884923980
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GArl_QPncP7tc30IAAAAAACfza93bjEQAAAz
/proc/self/fd/4/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Patch Phabricator Link: differential/diff/86851661/
Submit job request to the thrift service
https://our.intern.facebook.com/intern/ads/canary/419717823090263325
DONE
Everpaste link: https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GNcyAwRrfFd0MIUIAAAAAABLOINibjEQAAAz

Differential Revision: D16288332

Pulled By: salexspb

fbshipit-source-id: 95899dede6b11a2ae14703b9aaea8e1a677f0aaa
2019-07-22 13:53:43 -07:00
27031dccb2 Updating producer_version in exported ONNX models to pytorch 1.2. (#23120)
Summary:
Bumping up the producer_version in exported ONNX models in view of the next release. Updating tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23120

Reviewed By: zrphercule

Differential Revision: D16420917

Pulled By: houseroad

fbshipit-source-id: 6686b10523c102e924ecaf96fd3231240b4219a9
2019-07-22 13:45:39 -07:00
7e31c02afe Fixed deprecated use of yaml.load
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22985

Test Plan: Imported from OSS

Differential Revision: D16425112

Pulled By: Chillee

fbshipit-source-id: ef0c764c3fd2518b9284d9a20e84d677ebd8f277
2019-07-22 13:25:27 -07:00
76291829ba Refactor named inference rule for reductions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23075

Test Plan: Imported from OSS

Differential Revision: D16419173

Pulled By: zou3519

fbshipit-source-id: 187639b563336f935e5f06351dd0b680de1aadfd
2019-07-22 13:12:03 -07:00
b4b51ed5ec Implement tensor.size(Dimname), tensor.stride(Dimname)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22989

Test Plan: Imported from OSS

Differential Revision: D16364437

Pulled By: zou3519

fbshipit-source-id: 393a93fecac27b5d3b1a7f7692590d8fd5e95a5d
2019-07-22 13:11:59 -07:00
965b97f5f0 Bidirectional GRU and LSTM C++ API forward fix (#22850)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/17998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22850

Differential Revision: D16420854

Pulled By: pbelevich

fbshipit-source-id: 76f38be40d8479fb9cafba92939cea61d81fd336
2019-07-22 12:59:47 -07:00
e5797e9350 Revert D16390551: Fix load op to return the shape info as before when loading multiple blobs
Differential Revision:
D16390551

Original commit changeset: 1055b481a7a9

fbshipit-source-id: ea50a71e3d446a74bd04d9945710cc4ccee63c87
2019-07-22 12:48:14 -07:00
fcdfc35d1c Support get/setstate with no args (#23119)
Summary:
`pickle` supports this, and a lot of the quantized use cases for get/set
state follow this pattern
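A minimal illustration of the `__getstate__`/`__setstate__` pattern being supported, with a toy class standing in for a quantized module (all names hypothetical):

```python
import pickle

class PackedLinear:
    """Toy stand-in for a quantized module that drops a derived,
    non-picklable field on save and rebuilds it on load."""
    def __init__(self, weight):
        self.weight = weight
        self.packed = self._pack(weight)   # rebuilt on load

    @staticmethod
    def _pack(w):
        return tuple(w)                    # placeholder for a packed format

    def __getstate__(self):
        return {'weight': self.weight}     # omit the derived field

    def __setstate__(self, state):
        self.weight = state['weight']
        self.packed = self._pack(self.weight)

m = pickle.loads(pickle.dumps(PackedLinear([1.0, 2.0])))
print(m.weight, m.packed)
```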
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23119

Pulled By: driazati

Differential Revision: D16391234

fbshipit-source-id: 9f63e0a1679daa61b17aa64b5995e2be23b07b50
2019-07-22 12:32:29 -07:00
858d4a6a04 Cleanup API and remove 'experimental' warning (#23000)
Summary:
This fixes ASAN test issues with https://github.com/pytorch/pytorch/pull/21786 seen at https://circleci.com/api/v1.1/project/github/pytorch/pytorch/2212325/output/105/0?file=true and lands it again.

This cleans up the `torch.utils.tensorboard` API to remove all kwargs usage (which isn't clear to the  user) and removes the "experimental" warning in prep for our 1.2 release.

We also don't need the additional PyTorch version checks now that we are in the codebase itself.

cc yf225, lanpa, natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23000

Reviewed By: sanekmelnikov

Differential Revision: D16349734

Pulled By: orionr

fbshipit-source-id: 604a9cad56868a55e08b509a0c6f42b84f68de95
2019-07-22 12:10:05 -07:00
fad3031b5c Fix type hints for None constants (#23029)
Summary:
The type hint was being ignored when emitting `None` constants; this also de-dups some testing code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23029

Pulled By: driazati

Differential Revision: D16364572

fbshipit-source-id: 64f3abd3e37ee49c209480a85ed4f1b8802e5d93
2019-07-22 11:55:05 -07:00
2891784a72 Resolve with closed over variables instead of stack frame (#22270)
Summary:
Previously we looked at the stack frame of the function that called
`script` to resolve variables. This doesn't work if someone calls script
with a function defined somewhere else that references captured
variables. We already have a mechanism to look at the closed over
variables for a function, so this changes the `rcb` to use that.
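The closed-over-variables mechanism can be sketched with plain Python introspection (helper name hypothetical):

```python
def make_resolver(fn):
    # Resolve a name from fn's closure cells first, then its globals --
    # roughly what a resolution callback needs to do.
    free = dict(zip(fn.__code__.co_freevars,
                    (c.cell_contents for c in fn.__closure__ or ())))
    def resolve(name):
        if name in free:
            return free[name]
        return fn.__globals__.get(name)
    return resolve

scale = 3
def outer():
    offset = 10
    def f(x):
        return x * scale + offset
    return f

r = make_resolver(outer())
print(r('offset'), r('scale'))  # 10 3
```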

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22270

Pulled By: driazati

Differential Revision: D16391346

fbshipit-source-id: ad9b314ae86c249251b106079e76a5d7cf6c04c2
2019-07-22 11:44:36 -07:00
fd90b967b2 Fix load op to return the shape info as before when loading multiple blobs (#23166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23166

Changing the load op to take in shapes vector needs changes in lots of places (almost all usages of load op).
Instead this is a small and safe change where the behavior is unchanged if we are loading multiple blobs and when loading a single blob without shape information.

If you are loading just one blob and the shape information is provided, then this returns the right shape info back.

For all other cases, behavior is unchanged as before we introduced the issue.

This fixes the issue reported by Andrey in D16229465

Reviewed By: boryiingsu

Differential Revision: D16390551

fbshipit-source-id: 1055b481a7a9e83021209e59f38a7cc0b49003cf
2019-07-22 11:27:59 -07:00
82db5dceb6 Added running via throughput benchmark options. (#23077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23077

Although the difference between running from Python and this is not much when
the forward method's loop is long enough (like 1000 iterations in this case).

Reviewed By: mingzhe09088

Differential Revision: D16122343

fbshipit-source-id: 5c1d1b98ae82c996baf9d42bcd04995e2ba60c78
2019-07-22 11:27:55 -07:00
2ba516d5b6 Added add op framework overhead benchmark for C2 (#23078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23078

C2 benchmark.

Reviewed By: mingzhe09088

Differential Revision: D16122337

fbshipit-source-id: bf56e60c6e60eda2be2938d9f613708a4bc1669a
2019-07-22 11:27:50 -07:00
0621068cdc Add simple add op based framework overhead benchmark. (#23076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23076

Added both tracing-based and non-tracing-based variants

Reviewed By: mingzhe09088

Differential Revision: D16097280

fbshipit-source-id: 3a137092f7ccc3dd2d29d95e10178ec89d3ce892
2019-07-22 11:27:45 -07:00
4223e2f9e9 fix qat tests (#23124)
Summary:
missed instantiating observers in linear

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23124
ghstack-source-id: 86886705

Reviewed By: raghuramank100

Differential Revision: D16401066

fbshipit-source-id: f9f0f359caeca855c62192d13261a33eef57715a
2019-07-22 10:28:35 -07:00
8bc28cc898 Remove cuda free mutex (#23040)
Summary:
Revision of https://github.com/pytorch/pytorch/issues/22173 to address CI failure after merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23040

Differential Revision: D16366872

Pulled By: mrshenli

fbshipit-source-id: 747b6ecf2dc195c25f82b8f732ae9ff52cd3a394
2019-07-22 07:58:29 -07:00
22f7c9e31b (#23105)
Summary:
Fixed a [bug](https://github.com/pytorch/pytorch/issues/22992) where passing a result tensor into masked_select didn't work with a bool mask.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23105

Differential Revision: D16386676

Pulled By: izdeby

fbshipit-source-id: 93a1e9bfbc916c8a8eaa149a70a5553f3711f53e
2019-07-22 07:49:30 -07:00
aeee49d51d Revert "Temporarily skip mypy-0.720 to unbreak master type checks"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23095

Test Plan: Imported from OSS

Differential Revision: D16383149

Pulled By: zou3519

fbshipit-source-id: ca6bdfe0f51f6bdbd4d95142a880f3902f60676d
2019-07-22 06:54:22 -07:00
b8c8977be7 Update ScatterWeightedSum Op (#23087)
Summary:
Update the ScatterWeightedSum op for the case where there is only one weighted X used to update a slice of Y, which is usually the case when the op is used for gradient updates. The change removes the copy overhead, yielding significant operator performance improvements:

- 25-50% improvement on CUDA, depending on input configuration

- ~50% improvement on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23087

Differential Revision: D16385194

Pulled By: bddppq

fbshipit-source-id: 3189e892940fb9c26305269eb0d47479b9b71af0
2019-07-21 22:21:40 -07:00
ff8cb9f622 hipify: do not overwrite files that stay the same (#23112)
Summary:
This is a small patch to not overwrite unchanged files to help a bit with building.
It is not as incremental as one might like, given that one has to pass `--out-of-place-only` to avoid the patching step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23112

Differential Revision: D16402623

Pulled By: bddppq

fbshipit-source-id: 531ce0078bc716ae31bd92c5248080ef02a065b9
2019-07-21 22:00:53 -07:00
2ac9abf759 Fix memory leak in Adam, Adagrad, RMSProp (#23125)
Summary:
As reported in LaurentMazare/tch-rs#76, memory grows when weight_decay is present when using Adam. This applies the same fix as https://github.com/pytorch/pytorch/issues/23007 to Adam, Adagrad and RMSProp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23125

Differential Revision: D16402421

Pulled By: soumith

fbshipit-source-id: 59eb4bd81b8bd9e1a5f7c068ed841f70a4c38a80
2019-07-21 10:06:18 -07:00
96b6797fc0 improve enforce in cross_entropy_op (#23062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23062

as title

Reviewed By: xianjiec, BIT-silence

Differential Revision: D16374601

fbshipit-source-id: 62219c6abde311ebc8a0e6a03cfb517d80bb52b5
2019-07-21 00:07:58 -07:00
3e66385002 Lint fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23135

Test Plan: Imported from OSS

Differential Revision: D16403272

Pulled By: zafartahirov

fbshipit-source-id: 31f9eb11216c494a8327bcb5dc37e47a77611e2b
2019-07-20 21:46:18 -07:00
963707c5ea MaxPool2d in the torch (#22765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22765

the pooling signature is the same as the non-quantized one. Adding it to the native_functions.yaml

Reviewed By: jerryzh168

Differential Revision: D16102608

fbshipit-source-id: 7627ad8f02a231f488b74d1a245b853f89d9c419
2019-07-20 21:41:09 -07:00
cf3e6478ad Concat with out (#22408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22408

Quantized Concatenation with out argument

Reviewed By: jianyuh

Differential Revision: D16061526

fbshipit-source-id: 61487cf87763665df19feb8e678da72fd66e8740
2019-07-20 16:13:14 -07:00
05f088ec22 make jit logging visible, so it can be used in a TVM compiler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23041

Differential Revision: D16402934

Pulled By: Krovatkin

fbshipit-source-id: 715f9821809527e94bd7f01f1680db046c888e6c
2019-07-20 14:37:49 -07:00
bb9119f67d Use set -x to help investigate doc push errors (#23111)
Summary:
I couldn't find any verbosity options in the [`docker pull` command docs](https://docs.docker.com/engine/reference/commandline/pull/), but
`docker pull` [got a `--quiet` option](https://github.com/docker/cli/pull/882) in a recent version (not sure if we're using that version), and `--quiet` for `docker push` [is forthcoming](https://github.com/docker/cli/pull/1221).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23111

Differential Revision: D16402993

Pulled By: kostmo

fbshipit-source-id: 52f77b11b839d28f8cf1ecb58518ca69632d7fbe
2019-07-20 12:36:05 -07:00
a62c687445 Remove unused atomics detection code. (#23089)
Summary:
USE_{C11,MSC,GCC}_ATOMICS are not used in PyTorch or submodules. Now we remove their underlying detection code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23089

Differential Revision: D16402750

Pulled By: ezyang

fbshipit-source-id: fde84b958eb0b5b4d3f0406acefa92ab30ea43be
2019-07-20 10:52:53 -07:00
4e5f70089f fix indexing for more than 65535 elems in non-indexed first dim (#23123)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22843, also adds test from https://github.com/pytorch/pytorch/issues/23102
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23123

Differential Revision: D16402422

Pulled By: soumith

fbshipit-source-id: aa7a79159ed947be03ce3725ec8abcf5246a60bf
2019-07-20 06:17:43 -07:00
6791f395f9 support at::view and at::reshape for quantized tensor (#23046)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23046
ghstack-source-id: 86840501

Differential Revision: D16368897

fbshipit-source-id: 9da232c11f21af5f850cd9545e56996a81791d00
2019-07-19 23:34:04 -07:00
a03205ed66 Move THTensor_compute_stride to ATen (#23045)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23045
ghstack-source-id: 86842517

Differential Revision: D16368860

fbshipit-source-id: 8970a73758afadbc9a6a3e263cdcfe5e2fd9cc0d
2019-07-19 23:14:11 -07:00
0d8324b18a Add fused modules in nn._intrinsic (#23085)
Summary:
Using nn.Sequential to represent fused modules

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23085
ghstack-source-id: 86883096

Differential Revision: D16379521

fbshipit-source-id: 57d67cb947de8665bd758848595a4a000366153a
2019-07-19 23:04:25 -07:00
47af41fe72 Quantized concatenation (+fused relu). (#21749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21749

This is the first version without "requantization"

Reviewed By: jerryzh168

Differential Revision: D15807940

fbshipit-source-id: 19bb0482abed8ed9d1521a3fa1f15bda8e6a6a7c
2019-07-19 22:23:41 -07:00
9f4df63c2c Moving np function to test area
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23118

Test Plan: Imported from OSS

Differential Revision: D16400634

Pulled By: zafartahirov

fbshipit-source-id: 44872fdf64b20a6b67e5176042fe58c8c2359738
2019-07-19 22:11:21 -07:00
77353636de Conv module (#23084)
Summary:
Added Conv module for qat

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23084
ghstack-source-id: 86862445

Differential Revision: D16379417

fbshipit-source-id: 742cc8b8e0f132070ca4943a1c2e3db60c2b5bdc
2019-07-19 18:49:52 -07:00
b964bdb53a Fbgemm fp16 tensor support (#23101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23101

Support for
- Shape inference
- Tensor info extraction

Reviewed By: zrphercule

Differential Revision: D16345251

fbshipit-source-id: 53ef674b5b1581e6267e6d2070e34355280dae79
2019-07-19 17:08:03 -07:00
2a8d5a132c Fix workspace destruction ordering (#23096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096

nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destroyed first.

Reviewed By: ajyu

Differential Revision: D16382987

fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
2019-07-19 16:49:50 -07:00
79c4f83fbe Include module names in recursive error stacks (#22921)
Summary:
Following on to #22280, this adds module names so they're included in
the call stacks of an error message (e.g. so it appears as `M.forward`
instead of `forward`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22921

Pulled By: driazati

Differential Revision: D16287925

fbshipit-source-id: 6f31d72caa87ba2dc527805d36f7d62eb94c0808
2019-07-19 16:09:14 -07:00
7cc029cb75 Quantization aware training in eager mode (#23082)
Summary:
Add support for quantization aware training in eager mode

Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weight with corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
```
# Previously we were thinking about modifying the weight in a
# forward_pre hook and changing it back in a forward hook:
def forward_pre_hook(self, input):
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input):
    self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it's probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function

## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules were changed in the prepare step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23082
ghstack-source-id: 86824650

Differential Revision: D16379374

fbshipit-source-id: 7d16d1acd87025065a24942ff92abf18e9fc8070
2019-07-19 14:57:25 -07:00
c09e92255c Add initial support for serializing classes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22953

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D16340214

Pulled By: zdevito

fbshipit-source-id: 70fb1968eca34e14492e0d2be52e28b27813f821
2019-07-19 14:51:59 -07:00
6334edc2d8 BlackBoxPredictor OSS: open-source NQL and custom transforms (#22877)
Summary:
Overall context: open-source BlackBoxPredictor as the entry
point for inference in Caffe2 (a thread-safe abstraction for Caffe2
inference). This should be used in ThroughputBenchmark for the purpose
of framework comparison.

This specific diff:

There should be no harm in moving the transformation code to
OSS. On the advantages side, we will be able to compare a production
Caffe2 setup with PyTorch in the fairest way via
ThroughputBenchmark. This approach avoids any complicated
transformation registries. Building those properly would require significant
engineering effort as well as production risk. In the past we had SEVs
related to transforms being turned off due to various refactors. Given
that we don't plan any other significant investments in transformation
logic beyond the existing ones (like TVM and Glow), and those also
relate to open-source technologies, I came to the conclusion of moving
the whole thing to OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22877

Test Plan:
ran a bunch of unit tests locally, and now

waitforsandcastle

AdFinder canary:
https://our.intern.facebook.com/intern/ads/canary/419623727275650390

adindexer:
https://our.intern.facebook.com/intern/ads/canary/419623750891549182

prospector:
https://our.intern.facebook.com/intern/ads/canary/419644899887610977
https://our.intern.facebook.com/intern/ads/canary/419645123742738405

Differential Revision: D16267765

Pulled By: salexspb

fbshipit-source-id: 776a1cd5415e0695eae28254b3f155e7a9bd8c2b
2019-07-19 14:37:56 -07:00
f2f3e8ad8c fix overspecializing constants in compilation (#22816)
Summary:
When we specialize the tensor type of constants during compilation, it causes all sorts of problems.

Fix for https://github.com/pytorch/pytorch/issues/22809
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22816

Differential Revision: D16384094

Pulled By: eellison

fbshipit-source-id: f33c00d92d87108749d09bf037a6e74c5d9adaa2
2019-07-19 14:19:49 -07:00
a302821c5d Adding more binary documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23093

Differential Revision: D16384838

Pulled By: pjh5

fbshipit-source-id: 0ce91c2f3f0ec8f5c026622f27039b36c42a81d4
2019-07-19 14:06:34 -07:00
Jie
a28ffaf350 (#22827)
Summary:
1. Fix out of range memory access for reduction on all dimensions for non-packed
tensor.

2. Enabling a launch config that maps block width to reduction on the fastest
striding dimension. This mapping was previously only active when reducing on the
fastest striding dimension of a packed tensor, which is not necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22827

Differential Revision: D16271897

Pulled By: zdevito

fbshipit-source-id: 20763f6cf9a58e44ffc0e7ec27724dfec8fe2c5d
2019-07-19 13:38:17 -07:00
818828e8a8 Only import PIL when needed (#23023)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22389

In most cases we only import `PIL` methods when we need them, but we missed a spot.

cc lanpa natalialunova sanekmelnikov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23023

Reviewed By: sanekmelnikov

Differential Revision: D16373492

Pulled By: orionr

fbshipit-source-id: b08bf8a9b5a861390eadf62eda21ac055777180f
2019-07-19 13:30:43 -07:00
mal
44493a623e Pass variable_list of inputs to _wrap_outputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23037

Test Plan: Imported from OSS

Differential Revision: D16380071

fbshipit-source-id: ae3333c02ef8a3c09b95bec7b8e92ce649553615
2019-07-19 12:31:23 -07:00
2ee0f0bc3a add break continue to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23091

Differential Revision: D16382604

Pulled By: eellison

fbshipit-source-id: 47432d844c811ecd87ad97155e835b07ae8056cc
2019-07-19 12:17:00 -07:00
6dfecc7e01 Remove deprecated linear algebra functions (and methods) (#22841)
Summary:
Changelog:
- Removed the following linear algebra functions in PyTorch in favor of the renamed operations
  - `btrifact` (use `lu` instead)
  - `btrifact_with_info` (use `lu` with `get_infos=True` instead)
  - `btrisolve` (use `lu_solve` instead)
  - `btriunpack` (use `lu_unpack` instead)
  - `gesv` (use `solve` instead)
  - `pstrf` (use `cholesky` instead)
  - `potrf` (use `cholesky` instead)
  - `potri` (use `cholesky_inverse` instead)
  - `potrs` (use `cholesky_solve` instead)
  - `trtrs` (use `triangular_solve` instead)

- Removed dead code after the removal of `pstrf`
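For migration purposes, the renames above can be expressed as a simple lookup table (a sketch only; `suggest_replacement` is a hypothetical helper, not part of PyTorch):

```python
# Mapping of removed linear algebra functions to their renamed
# replacements, taken directly from the changelog above.
DEPRECATED_TO_NEW = {
    "btrifact": "lu",
    "btrifact_with_info": "lu (with get_infos=True)",
    "btrisolve": "lu_solve",
    "btriunpack": "lu_unpack",
    "gesv": "solve",
    "pstrf": "cholesky",
    "potrf": "cholesky",
    "potri": "cholesky_inverse",
    "potrs": "cholesky_solve",
    "trtrs": "triangular_solve",
}

def suggest_replacement(name):
    """Return the renamed op for a removed function, or None if unknown."""
    return DEPRECATED_TO_NEW.get(name)
```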
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22841

Test Plan:
- All existing tests should pass to verify that the removal is clean

Closes https://github.com/pytorch/pytorch/issues/22832

Differential Revision: D16346184

Pulled By: zou3519

fbshipit-source-id: f748d16ed7609c028de6adcbc28684d5a1af0678
2019-07-19 11:43:06 -07:00
61a683c212 Delete aten/src/ATen/out.txt (#23050)
Summary:
A file introduced in 5c0e0589509540fc991a88ffc48e96cc76fd799d, probably by mistake
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23050

Differential Revision: D16379947

Pulled By: ezyang

fbshipit-source-id: b7fa8995028e180603d7830b6f170a7a57310385
2019-07-19 10:27:59 -07:00
5417ddbdae Fix get_all_math_dtypes for device='cuda' returning None (#23028)
Summary:
This PR fixes the invalid None return when calling get_all_math_dtypes(device='cuda').

The issue came from the `append` method, which has no return value, being used as `return dtypes.append(...)`.
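The underlying pitfall is plain Python, independent of PyTorch: `list.append` mutates the list in place and returns None, so `return dtypes.append(...)` always returns None. A minimal sketch:

```python
def get_dtypes_buggy():
    dtypes = ["float32", "float64"]
    # Bug: list.append mutates in place and returns None,
    # so this function returns None, not the list.
    return dtypes.append("float16")

def get_dtypes_fixed():
    dtypes = ["float32", "float64"]
    dtypes.append("float16")
    return dtypes

print(get_dtypes_buggy())  # None
print(get_dtypes_fixed())  # ['float32', 'float64', 'float16']
```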
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23028

Differential Revision: D16362732

Pulled By: colesbury

fbshipit-source-id: 0bbc30a0c663749d768159f1bc37b99f7263297b
2019-07-19 09:29:16 -07:00
84c2c89e2c Revert D16199356: [qat] Quantization aware training in eager mode
Differential Revision:
D16199356

Original commit changeset: 62aeaf47c12c

fbshipit-source-id: d06a96b0a617ae38029ffb246173ec065454b666
2019-07-19 03:18:48 -07:00
f19aa12ae5 Revert D16274792: [qat] Conv module
Differential Revision:
D16274792

Original commit changeset: 1da10194123b

fbshipit-source-id: 71b34774b463f2350289bd39b8cfd798e095ffa5
2019-07-19 03:18:45 -07:00
c362e72d4a Revert D16349133: [quant] Add fused modules in nn._intrinsic
Differential Revision:
D16349133

Original commit changeset: 04d862ac4a0d

fbshipit-source-id: d96d9d98e9b29fddf93d4106621752abb00947eb
2019-07-19 03:18:41 -07:00
2401a05aae Revert D16373996: [fix] conv module missing return
Differential Revision:
D16373996

Original commit changeset: 1ec85d23c9dd

fbshipit-source-id: e507db59405aa240d20f132c3d6df323b241a542
2019-07-19 03:06:39 -07:00
25f0dc3490 BERT CPU performance optimization: use mkldnn for nn.Linear() when input is dense layout (#21851)
Summary:
This PR aims at improving BERT performance on CPU by using `mkldnn` inner product for `nn.Linear()`.
The current logic is to use `mkldnn` only when the `input` tensor has mkldnn layout. This PR loosens this condition: `mkldnn` will be used for `nn.Linear()` when the `input` tensor has dense layout. The aten tensor is viewed in place by `mkldnn` without an additional memory copy.
1. when `input.dim() >= 3`, it is viewed as a 2d tensor, e.g. `[T, N, C]` is treated as `[TN, C]`;
2. when `input` is not contiguous, it is copied so as to be contiguous, since `mkldnn` inner product can't handle non-contiguous memory.
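Point 1 above amounts to collapsing all leading dimensions into the rows of a 2d matrix. A shape-only sketch (plain Python, no mkldnn involved; `flatten_for_linear` is an illustrative helper, not the actual implementation):

```python
def flatten_for_linear(shape):
    """View an N-d input as 2-d for an inner product:
    all leading dims collapse into the rows, the last dim stays
    as the feature dimension."""
    *leading, features = shape
    rows = 1
    for d in leading:
        rows *= d
    return (rows, features)

print(flatten_for_linear((10, 32, 768)))  # (320, 768): [T, N, C] -> [T*N, C]
```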

With this PR, BERT on `glue/MRPC` inference (batch size = 1) on Xeon 6148 single socket (20 cores@2.5GHz) improves by `44%`:

1. before (unit: iterations/sec):
```bash
408/408 [00:24<00:00, 16.69it/s]
```
2. after (unit: iterations/sec):
```bash
408/408 [00:16<00:00, 24.06it/s]
```

The latency correspondingly drops from `59.92 ms` to `41.56 ms`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21851

Differential Revision: D16056334

Pulled By: dzhulgakov

fbshipit-source-id: 9b70ed58323b5e2f3f4e3ebacc766a74a8b68a8a
2019-07-19 00:54:29 -07:00
12ac9171db fix error message
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22982

Differential Revision: D16356464

Pulled By: soumith

fbshipit-source-id: 3ddd5de4cf5c000dcf5b2faed39283dc715cba25
2019-07-18 23:38:55 -07:00
cdfdeb74af conv module missing return (#23058)
Summary:
att

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23058
ghstack-source-id: 86807313

Reviewed By: jianyuh

Differential Revision: D16373996

fbshipit-source-id: 1ec85d23c9ddd9975bc32f6c5d30cde04eb1109e
2019-07-18 22:24:56 -07:00
6601978012 Use ProfiledTensorType in peephole.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22767

Differential Revision: D16342954

Pulled By: Krovatkin

fbshipit-source-id: a577ea942ff4bab6ae15f14d6ba04a68675c70aa
2019-07-18 21:49:45 -07:00
d153b0b58b Updating submodules
Reviewed By: yns88

fbshipit-source-id: 87bb7a817dea65783436d6d6dfbbd492724d20a7
2019-07-18 20:43:55 -07:00
23badc60f3 Fix TBB build for older versions of cmake
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23038

Test Plan:
with-proxy pip install --upgrade cmake==3.11.0
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

with-proxy pip install --upgrade cmake==3.13.3
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

with-proxy pip install --upgrade cmake==3.6.3
python setup.py clean
USE_CUDA=0 PARALLEL_BACKEND=NATIVE USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python setup.py develop --cmake

Imported from OSS

Differential Revision: D16365699

Pulled By: ilia-cher

fbshipit-source-id: cbf779dff63e4e186d9b4c2fc21539a24ce0d5a2
2019-07-18 20:12:26 -07:00
e57b682abf Add fused modules in nn._intrinsic (#22999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22999

Using nn.Sequential to represent fused modules

Reviewed By: zafartahirov

Differential Revision: D16349133

fbshipit-source-id: 04d862ac4a0d20e83dc9d6de6b7d0d0c26bdedfd
2019-07-18 18:58:11 -07:00
12d9d768b8 Conv module (#22899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22899

Added Conv module for qat

Reviewed By: zafartahirov

Differential Revision: D16274792

fbshipit-source-id: 1da10194123b2759a6a35c60d1c2d2c0b569ccdc
2019-07-18 18:58:07 -07:00
65ef671d11 Quantization aware training in eager mode (#22732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22732

Add support for quantization aware training in eager mode

Modifications to Post training flow:
## Prepare
* Fusion: e.g. (Conv, Bn) → ConvBn (float)
* Swapping: To insert fake_quant for the weight, we need to swap the float modules that have weight with corresponding qat modules, e.g. Conv → torch.nn.qat.Conv, ConvBn → torch.nn._intrinsic.qat.ConvBn
```
# Previously we were thinking about modifying the weight in a
# forward_pre hook and changing it back in a forward hook:
def forward_pre_hook(self, input):
    self.float_weight = self.weight
    self.weight = self.fake_quantize(self.float_weight)

def forward_hook(self, input):
    self.weight = self.float_weight
```

* Assignments to self.weight are needed because we can't change the forward function, and the forward function uses self.weight.
* But we would need to keep two copies of the weight in this case, so it's probably better to just swap the module
* So we want to just swap Conv to torch.nn.qat.Conv and Linear to torch.nn.qat.Linear
* qat modules will have fake_quant for output and weights inserted in forward function

## Convert
* flow should be identical to ptq, but the swapping dictionary is slightly different since modules were changed in the prepare step.

Reviewed By: zafartahirov

Differential Revision: D16199356

fbshipit-source-id: 62aeaf47c12c62a87d9cac208f25f7592e245d6c
2019-07-18 18:58:03 -07:00
8dfbbf7bf2 Add nn.qat.Linear (#22714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22714

We need this module for add fake_quant for weight

Reviewed By: zafartahirov

Differential Revision: D16193585

fbshipit-source-id: ed6c04ecf574ca1fe1dcded22c225da05976f7a3
2019-07-18 18:27:27 -07:00
b6011c3caf Update torchvision in CI. (#22754)
Summary:
To include dea1afbf5e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22754

Differential Revision: D16366676

Pulled By: zhangguanheng66

fbshipit-source-id: abfcb785973f9caa2a5aa1154fa689bbba8ff2dd
2019-07-18 18:22:24 -07:00
358e0d3d44 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 3998c7b50a8b0377e8f1748a8dbd3b7d2afc99a4
2019-07-18 16:36:25 -07:00
9897ec4701 Recursively compile class types (#22475)
Summary:
Try to compile for class types encountered in recursive script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22475

Pulled By: driazati

Differential Revision: D16340717

fbshipit-source-id: 5e1a46db517be2412f57156efbc4eb3347b01a8a
2019-07-18 15:43:16 -07:00
425d28c30a Reapply: optimize topk on cpu using parallel and partial sort (#19736) (#22865)
Summary:
https://github.com/pytorch/pytorch/issues/19736 was reverted as it was suspected of breaking master; trying to reapply.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22865

Differential Revision: D16265457

Pulled By: VitalyFedyunin

fbshipit-source-id: 784bd6405471f15a8a49ebd0f3e98160d7d0679e
2019-07-18 14:15:54 -07:00
c1c4014bba Add warning for legacy autograd function (#22922)
Summary:
When working on https://github.com/pytorch/pytorch/pull/22762, we discovered that we hadn't actually deprecated legacy autograd functions. This PR puts up the deprecation warning for 1.2, with the goal of removing legacy function support completely in the near future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22922

Differential Revision: D16363916

Pulled By: yf225

fbshipit-source-id: 4b554010a3d1f87a3fa45cc1aa29d019c8f1033c
2019-07-18 14:02:17 -07:00
a2b3403962 Mark protobuf include path as system include (#23012)
Summary:
To suppress (many) compiler warnings from protobuf headers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23012

Differential Revision: D16364573

Pulled By: bddppq

fbshipit-source-id: adbc4921e29389131d43e7bcc1e6fcba19450c76
2019-07-18 13:44:39 -07:00
84d892b645 Remove DistributedDataParallelCPU as DDP now supports CPU models (#22864)
Summary:
cc ailzhang aazzolini yifuwang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22864

Differential Revision: D16358011

Pulled By: mrshenli

fbshipit-source-id: 8db2dc035dea03f07a32c749e754f625fda1bf28
2019-07-18 12:50:45 -07:00
a5e6586618 Revert D16357177: [pytorch][PR] Fix race condition, bad lock hierarchy. Move getFreeMutex() into AutoNcclGroup.
Differential Revision:
D16357177

Original commit changeset: f4ca9cd46cc6

fbshipit-source-id: 49e66e7e59df6cbc7f5d847bacc07da134067956
2019-07-18 12:28:46 -07:00
11d257e5df Fix SGD memory leak when there is weight_decay (#23007)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/20146. I am working on another PR that adds CPU and CUDA memory leak checking to all C++ API tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23007

Differential Revision: D16358973

Pulled By: yf225

fbshipit-source-id: 5ee7ed4e61e60424031540a633e1fae09d9df171
2019-07-18 12:10:10 -07:00
502766e99e Add the mathematical definition of torch.sign to clarify this is the sgn function.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22894
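For reference, the mathematical definition being added is the standard sign (sgn) function:

```latex
\operatorname{sgn}(x) =
\begin{cases}
  -1 & \text{if } x < 0 \\
   0 & \text{if } x = 0 \\
   1 & \text{if } x > 0
\end{cases}
```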

Differential Revision: D16345027

Pulled By: ezyang

fbshipit-source-id: 1421571f1f8764539a35b9060d90ea6075f889d3
2019-07-18 11:45:27 -07:00
662fe699c5 Named inference rules for some initializer fns
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22972

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D16342782

Pulled By: zou3519

fbshipit-source-id: 25277688ab51e1e98af0e19aeb9c79399171d2fb
2019-07-18 10:04:29 -07:00
57cec0a720 Named inference rules for split/chunk
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22971

Test Plan: Imported from OSS

Differential Revision: D16342783

Pulled By: zou3519

fbshipit-source-id: 379edc8eb2f45a82ee8a6320f8285f8f81ea0b1b
2019-07-18 10:04:25 -07:00
6b70217a7e Adding README for binaries to OSS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23001

Differential Revision: D16359617

Pulled By: pjh5

fbshipit-source-id: bfe3f0e1dcb00f34e9362a74227e8a0bb90a8aaf
2019-07-18 10:04:21 -07:00
b91ab177a0 Add support to print QTensor in cpp (#22950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22950

Print a quantized tensor by first dequantizing it and then printing. Also print the scale, zero_point, size, and type of the tensor.

Reviewed By: jerryzh168

Differential Revision: D16286397

fbshipit-source-id: 2d6fb1796e5b329a77c022b18af0a39f6edde0d7
2019-07-18 09:44:20 -07:00
0c091380cc disable non-deterministic cudnn ctcloss (#22977)
Summary:
Associated issue: https://github.com/pytorch/pytorch/issues/21680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22977

Differential Revision: D16357873

Pulled By: nairbv

fbshipit-source-id: 58711bac7d3e8390e868d594dc265ba053a1537c
2019-07-18 08:28:42 -07:00
29347cc9cf Fix race condition, bad lock hierarchy. Move getFreeMutex() into AutoNcclGroup. (#22173)
Summary:
There are two mutexes within CUDACachingAllocator that cause a deadlock.  One of the mutexes was added in order to work around the issue of NCCL interacting poorly with cudaFree.  See

- 68ff58d771
- https://github.com/pytorch/pytorch/pull/880

As of NCCL version 2 and its new group start/end APIs, the protection surrounding cudaFree() is no longer needed.  The PyTorch code was updated to use the NCCL2 group start/end API, but the corresponding cuda_free_mutex and its getter getFreeMutex() were not revised.  This PR removes the use of the getFreeMutex() when NCCL2 is used by moving calls to getFreeMutex() into the AutoNcclGroup.  That way, depending on the NCCL version used, we either use the mutex or we use the new group APIs.

The race condition is as follows, thanks to skeelyamd:

The deadlock occurs between hip_free_mutex (aka cuda_free_mutex in github) (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L165) and mutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L162).

hip_free_mutex is exported from THCCachingAllocator in getFreeMutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L660) and is acquired in ProcessGroupNCCL::collective (https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L397), which then calls back into THCCachingAllocator via c10::cuda::CUDACachingAllocator::recordStream (https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L416 to https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L655 to https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L379).  At this point it acquires mutex (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L384).

This requires hip_free_mutex to be locked before mutex.

However, in free_blocks (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L505) THCCachingAllocator locks hip_free_mutex.  Free_blocks is called from emptyCache (https://github.com/pytorch/pytorch/blob/master/c10/cuda/CUDACachingAllocator.cpp#L328) which locks mutex.

That requires mutex to be locked before hip_free_mutex.

emptyCache and ProcessGroupNCCL::collective must not be executed concurrently, but this is occurring and deadlocking the CPU.

free_blocks is also called by malloc (via cuda_malloc_retry -> free_cached_blocks -> free_blocks), which also locks mutex first, and so malloc must not execute concurrently with ProcessGroupNCCL::collective.
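The inversion described above is the classic two-lock deadlock. As an illustration only (plain Python threading, not the actual CUDACachingAllocator code; `free_mutex`/`alloc_mutex` stand in for cuda_free_mutex and the allocator's internal mutex), the fix amounts to enforcing one consistent acquisition order on every path:

```python
import threading

free_mutex = threading.Lock()   # stand-in for cuda_free_mutex
alloc_mutex = threading.Lock()  # stand-in for the allocator's internal mutex

def collective():
    # Mirrors ProcessGroupNCCL::collective: free_mutex first, then
    # (via recordStream) the allocator mutex.
    with free_mutex:
        with alloc_mutex:
            return "recordStream"

def empty_cache():
    # The buggy path took alloc_mutex first and then free_mutex inside
    # free_blocks, inverting the order above. Here both paths agree on
    # one order, which cannot deadlock.
    with free_mutex:
        with alloc_mutex:
            return "free_blocks"

threads = [threading.Thread(target=collective) for _ in range(4)]
threads += [threading.Thread(target=empty_cache) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # completes because every path uses the same lock order
```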
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22173

Differential Revision: D16357177

Pulled By: pietern

fbshipit-source-id: f4ca9cd46cc6d5e15290d99577d88be3f4fa8972
2019-07-18 07:31:02 -07:00
14ecf92d42 Slightly improve irfft doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22995

Differential Revision: D16356435

Pulled By: soumith

fbshipit-source-id: f6cfd9990fd79faebfb566704359c866ddf36525
2019-07-18 03:12:49 -07:00
c2df54d6d0 avg_pool2d avg_pool3d for LongTensor (#22433)
Summary:
Generate avg_pool2d/avg_pool3d for LongTensor for CPU.
Added divisor_override parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22433

Differential Revision: D16108809

Pulled By: ifedan

fbshipit-source-id: 8de7ff585a0479702cceafb5ccf9dfea62a9cc50
2019-07-17 19:59:09 -07:00
52bf38007b Remove usage of legacy autograd function (#22925)
Summary:
We are planning to put up a deprecation warning for legacy autograd function in 1.2: https://github.com/pytorch/pytorch/pull/22922. This PR removes all usage of legacy function in PyTorch core and test suite, to prepare for the eventual removal of legacy function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22925

Differential Revision: D16344834

Pulled By: yf225

fbshipit-source-id: 8bf4cca740398835a08b7a290f3058c3e46781ba
2019-07-17 19:50:36 -07:00
29853293d7 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 4f801d353ee14ec0bd6fd24830f0e7a4343d67f8
2019-07-17 18:05:09 -07:00
992f3860a3 Quantized relu to native_functions (#22316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22316

Adding the quantized ReLU to native_functions.yaml, as it has the same signature as the non-quantized relu

Reviewed By: jerryzh168

Differential Revision: D16038441

fbshipit-source-id: 1cfbb594eb9bca1b7ec49ca486defcf1908b0d26
2019-07-17 17:31:02 -07:00
e24f18cea0 Revert D15854892: [pytorch][PR] [tensorboard] Cleanup API and remove 'experimental' warning
Differential Revision:
D15854892

Original commit changeset: 06b849882694

fbshipit-source-id: 588edc4616d020a23645f8c8181782c8412c4c6e
2019-07-17 16:45:54 -07:00
a0ef4abeed Add missing comment from #22103 (#22984)
Summary:
One important comment is missing from https://github.com/pytorch/pytorch/issues/22103 (not sure what happened).
This commit adds it back.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22984

Differential Revision: D16347044

Pulled By: ezyang

fbshipit-source-id: 0903909a5fb6740b43195136f1a23c28cfb2a02f
2019-07-17 16:21:38 -07:00
442dd7b906 Implement "trimmed lasso" regularization and support all available regularization in a single interface (#22966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22966

We want to implement "trimmed lasso" for feature selection with learnable and regularizable weights. Trimmed lasso is a simple yet powerful improvement over the traditional lasso. More references can be found at https://arxiv.org/abs/1708.04527 and http://proceedings.mlr.press/v97/yun19a.html. For a quick introduction, please refer to pp. 1-3 of the paper at https://arxiv.org/abs/1708.04527.

Given n weights, the traditional lasso sums all weights' l1 norms. The trimmed lasso takes an input integer k (how many weights you want to select from the n) and only sums over the smallest n - k weights. Given lambda as the regularization constant, the penalty term applies only to the smallest n - k weights, not to the larger ones. If lambda becomes larger than a certain threshold, the smallest n - k weights are shrunk to zero. That means those weights are "dropped". With this property, k is the number of weights left after lasso, which we can easily control.

Meanwhile, we further support all available regularization in a single interface. Currently supported regularizers on weights include no reg, l1, l2, elastic, trimmed l1, elastic with trimmed l1, group l1, and logbarrier.
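A minimal sketch of the penalty itself (pure Python, illustrative only — not the interface added in this diff):

```python
def trimmed_lasso_penalty(weights, k, lam):
    # Penalize only the n - k smallest-magnitude weights; the k largest
    # escape the l1 penalty entirely (sketch of the paper's definition).
    mags = sorted(abs(w) for w in weights)
    return lam * sum(mags[: len(mags) - k])
```

With k = 0 this reduces to the ordinary lasso penalty.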

Differential Revision: D16326492

fbshipit-source-id: 6e1fd75606005d9bc09d6650435c96a7984ba69c
2019-07-17 16:12:31 -07:00
eb76b7a564 Revert D16199862: [pytorch][PR] [ROCm] Update ROCm CI to python3.6
Differential Revision:
D16199862

Original commit changeset: 46ca6029a232

fbshipit-source-id: 2843b919f2655674e39dc764053621994061a12b
2019-07-17 14:26:56 -07:00
796a39ba85 Automatic update of fbcode/onnx to 707064980b9825b8705b9d1c9aad34d8b022d5dd (#22981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22981

Previous import was 806aa863020fa180e57f576cb032ec44ce8ddcca

Included changes:
- **[70706498](https://github.com/onnx/onnx/commit/70706498)**: TensorProto::INT8 & INT16 were missed here (#2164) <ZINEKS>
- **[8218a4ea](https://github.com/onnx/onnx/commit/8218a4ea)**: Fix LabelEncoder's shape inference (#2170) <Wei-Sheng Chin>
- **[0f1a9a1c](https://github.com/onnx/onnx/commit/0f1a9a1c)**: Fixing a unit test in Cumsum Operator (#2157) <Jeff Saremi>
- **[2c03cff0](https://github.com/onnx/onnx/commit/2c03cff0)**: [New Operator] CumSum (#2030) <Jeff Saremi>
- **[220b8300](https://github.com/onnx/onnx/commit/220b8300)**: Fix globalpool output shape (#2147) <daquexian>

Reviewed By: benoitsteiner

Differential Revision: D16341736

fbshipit-source-id: 7e7a2684d8c821991231bfd6558f9f6cb4fb05fb
2019-07-17 14:05:14 -07:00
031b406c38 Update ROCm CI to python3.6 (#22322)
Summary:
Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6.

This PR adds the skip tests and some semantic changes for PyTorch.

Open tasks/questions:
* RoiAlignTest.CheckCPUGPUEqual fails in the Caffe2 unit tests. Is this expected / can it be skipped?
* For testing, I've used update-alternatives on CentOS/Ubuntu to select python == python 3.6. Is this the preferred way?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22322

Differential Revision: D16199862

Pulled By: ezyang

fbshipit-source-id: 46ca6029a232f7d23f3fdb5efc33ae39a379fca8
2019-07-17 13:42:30 -07:00
5adba33c01 Use integer floor division for pooling shape computation (#22304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21935 by using the integer floor division that was introduced for convolution shapes in https://github.com/pytorch/pytorch/issues/9640. Without this fix, the pooling operators can produce a 1-element output in cases they shouldn't.
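The shape formula in question, sketched in Python (a simplified stand-in for the C++ change, ignoring dilation and ceil_mode):

```python
def pool_output_size(input_size, kernel_size, stride, padding=0):
    # Integer floor division. With a negative numerator (input smaller
    # than the kernel), C-style truncation toward zero would wrongly
    # yield a 1-element output where floor division yields 0.
    return (input_size + 2 * padding - kernel_size) // stride + 1
```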

Disclaimer: I couldn't properly test it locally (it's not picking up the modified version for some reason). I'm marking this WIP until I've checked what the CI tools say...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22304

Differential Revision: D16181955

Pulled By: ezyang

fbshipit-source-id: a2405372753572548b40616d1206848b527c8121
2019-07-17 13:23:29 -07:00
332824551c Fix F.one_hot doc signature
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22929

Differential Revision: D16290741

Pulled By: ezyang

fbshipit-source-id: d8b979e64d92b94c5a70bb4ffe2a83042ed6abfc
2019-07-17 13:23:25 -07:00
074afd7143 Remove unneeded IValue copy in unpickler.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22883

Test Plan: Imported from OSS

Differential Revision: D16270330

Pulled By: zdevito

fbshipit-source-id: ffd05b8c6860889d75172a288f339a434af76d45
2019-07-17 11:00:38 -07:00
b6adb568fb Cleanup some logic in pickler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22882

Test Plan: Imported from OSS

Differential Revision: D16270332

Pulled By: zdevito

fbshipit-source-id: 714f293493965b13e471945fde11831a04875604
2019-07-17 11:00:34 -07:00
3c0814ffeb add docs to onnx APIs (#22938)
Summary:
Add docs to onnx APIs, including
  - export
  - export_to_pretty_string
  - is_in_onnx_export

Fix https://github.com/pytorch/pytorch/issues/14698
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22938

Differential Revision: D16296182

Pulled By: zhangguanheng66

fbshipit-source-id: 1a1fa769b430db6428e6dfafba5447e6e2a75517
2019-07-17 10:50:41 -07:00
4861527446 Cleanup API and remove 'experimental' warning (#21786)
Summary:
This cleans up the `torch.utils.tensorboard` API to remove all kwargs usage (which isn't clear to the user) and removes the "experimental" warning in prep for our 1.2 release.

We also don't need the additional PyTorch version checks now that we are in the codebase itself.

cc ezyang lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21786

Reviewed By: natalialunova

Differential Revision: D15854892

Pulled By: orionr

fbshipit-source-id: 06b8498826946e578824d4b15c910edb3c2c20c6
2019-07-17 10:34:00 -07:00
2630109727 always restore dlopen flag in dyndep (#22958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22958

When we use `extension_loader.DlopenGuard()` to dyndep or import modules, it sets the `RTLD_GLOBAL` flag and restores the original flags after the `yield`. However, if the module is not there, the yield will fail and the flags won't be restored, creating all kinds of symbol conflict problems.
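The fix follows the standard try/finally pattern; a minimal sketch (not the actual `extension_loader` code):

```python
import contextlib
import os
import sys

@contextlib.contextmanager
def dlopen_guard(extra_flags=os.RTLD_GLOBAL):
    # Restore the saved dlopen flags in a finally block, so an
    # ImportError raised inside the with-body cannot leak RTLD_GLOBAL.
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | extra_flags)
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)
```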

Reviewed By: bddppq

Differential Revision: D16311949

fbshipit-source-id: 7b9ec6d60423ec5e78cae694b66c2f17493840b0
2019-07-17 10:26:25 -07:00
35b6cdc2eb Rewriting hypothesis_utils (#22830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22830

Separating the tensor generation and the generation of the quantization parameters

- Introducing hypothesis filter `assume_not_overflowing`, which makes sure that the generated tensor and qparams play well with each other. **Note: This is an expensive filter!**
- `qtensor` -> Renamed to `tensor`
- `qtensor_conv` -> Renamed to `tensor_conv2d`
- The tensors don't return the quantization parameters anymore; use `qparams` for them
- The `dtypes` argument is just a quantized dtype now.
- The enforcement for zero_point is predefined as before: if set to `None`, the zero_point will be sampled. However, you can override the sampling with `zero_point_min` and `zero_point_max`
- Scale sampling can also be overridden using `scale_min` and `scale_max`

Reviewed By: jerryzh168

Differential Revision: D16234314

fbshipit-source-id: 5b538a5aa9772b7add4f2ce5eff6fd0decd48f8e
2019-07-17 10:16:13 -07:00
b96610bf5a fix the CI job for onnx (#22946)
Summary:
ONNX uses virtualenv, and PyTorch doesn't, so the --user flag is causing problems in the ONNX CI...

Fixing it by moving the flag to PyTorch-only scripts; ninja will be installed in the ONNX CI separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22946

Reviewed By: bddppq

Differential Revision: D16297781

Pulled By: houseroad

fbshipit-source-id: 52991abac61beaf3cfbcc99af5bb1cd27b790485
2019-07-17 09:50:06 -07:00
f72d754877 qlinear operator level benchmark (#22914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22914

Adding op level benchmarking for qlinear operator

Reviewed By: mingzhe09088

Differential Revision: D16285204

fbshipit-source-id: 99b734ddfa0af6aada820cac7b2f38ef7a5868cb
2019-07-17 09:13:17 -07:00
7a99f3987b Update note about tensors on CPU for certain MAGMA functions, elimina… (#22618)
Summary:
…te argument in macro

Changelog:
- Update note about tensors on CPU for the following MAGMA functions
  - magma_(d/s)getrf_gpu and magma_getrf_nopiv_gpu require tensors on CPU for pivots
  - magma_(d/s)geqrf2_gpu requires tensors on CPU for elementary reflectors
  - magma_(d/s)syevd_gpu requires tensors on CPU for eigenvalues
- Remove dummy tensor in ALLOCATE_ARRAY MACRO
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22618

Test Plan:
- All existing tests should pass to verify that the patch is correct

This PR has been proposed to eliminate confusion due to the previous comments, as indicated in https://github.com/pytorch/pytorch/issues/22573

Differential Revision: D16286198

Pulled By: zou3519

fbshipit-source-id: a5a6ec829084bdb752ca6006b8795227cbaf63b1
2019-07-17 07:38:23 -07:00
5911cb8e5c Make load() create only one CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22727

Differential Revision: D16197603

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 3eaefe6f229032b109d63a151fe0a20268b5cf56
2019-07-16 20:08:10 -07:00
ec57d9215f Updating submodules
Reviewed By: yns88

fbshipit-source-id: 7eb29a58ff20b8ff0b793a84eb2f00e0a1bbe4b5
2019-07-16 19:53:06 -07:00
7ed82ea461 Added generation of transpose and dilated 2D and 3D for LongTensor (#22594)
Summary:
Added implementations of transpose2D, transpose3D, dilated2D, and dilated3D for LongTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22594

Differential Revision: D16155462

Pulled By: ifedan

fbshipit-source-id: af57330314bc2c3e0a38b9e75105b20030a1f9bb
2019-07-16 18:58:39 -07:00
bcfa023a00 hardshrink_cpu and hardshrink_backward_cpu refactoring with at::native::cpu_kernel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22459

Differential Revision: D16132625

Pulled By: pbelevich

fbshipit-source-id: d7eb1cd6ed04eba3d0c54feaca1e5ab2836211b5
2019-07-16 18:58:35 -07:00
ef36046ad7 Better error message for using Python builtin_function_or_method (#22935)
Summary:
* better error in `toSugaredValue`
* removes a bunch of periods from error messages; `ErrorReport` already adds a `:` at the end of the message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22935

Pulled By: driazati

Differential Revision: D16291079

fbshipit-source-id: 478724fc7d1ae79093f4ede18553ffeafa2c7964
2019-07-16 16:49:04 -07:00
25b69997c3 Tensorboard Metrics (#22492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22492

Collect metrics about Tensorboard usage
[internal] fbcode/pytorch/tensorboardX/tensorboardX/writer.py
[OSS] fbcode/caffe2/torch/utils/tensorboard/writer.py
Tensorboard Ondemand
https://fb.quip.com/JRvqAKtzgy6z

Reviewed By: dzhulgakov

Differential Revision: D16105544

fbshipit-source-id: de14e6ec781889e367a6eba39fc777f707628263
2019-07-16 16:18:00 -07:00
7793ab0871 More documentation about the pyobj field.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22885

Test Plan: Imported from OSS

Differential Revision: D16283076

Pulled By: ezyang

fbshipit-source-id: 4f6a87d900c4d430eedc90661de89e0f6916347e
2019-07-16 14:47:38 -07:00
cd11109c2e Fix messed up tests for dropout (#22893)
Summary:
Fix https://github.com/pytorch/pytorch/issues/22109.

I've confirmed with suo that this wasn't intentional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22893

Differential Revision: D16288640

Pulled By: Chillee

fbshipit-source-id: 00fd6fe418ecefb304866a723051d0e5451ba4d5
2019-07-16 14:17:11 -07:00
8ced53d62b Correct the check of whether src is defined in copy_. (#22715)
Summary:
(intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22715

Differential Revision: D16205243

Pulled By: ezyang

fbshipit-source-id: 9bf5a7885691d057198ae482259b36c1773457dd
2019-07-16 14:03:43 -07:00
798d5d9771 Revert D16281714: Add sanity checks for NCCL detection.
Differential Revision:
D16281714

Original commit changeset: 396bcbf099bd

fbshipit-source-id: a22cc112d1b6a62d689f9d8a7f93e8be3abe2a44
2019-07-16 13:58:27 -07:00
7586ffdc57 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 30926ecb8fabee3f020ae183bb568a11145bcada
2019-07-16 13:37:24 -07:00
01f03d56ee Revert D16283037: Add sanity checks for NCCL detection.
Differential Revision:
D16283037

Original commit changeset: fc09c9443a56

fbshipit-source-id: 30cdf7b1ad91498ee615d018de5571ba36f4383e
2019-07-16 13:20:43 -07:00
7a370dbb41 Enable recursive script mode as the default (#22887)
Summary:
This fixes up the test suite (mostly just adding `ignore` decorations
to tests that need to call Python function) so that it all passes with
recursive script enabled.

The main user-facing result of this change is that Python functions are
compiled without any decorators, so non-TorchScriptable code must be
decorated with `torch.jit.ignore` (or
`torch.jit.ignore(drop_on_export=True)` to maintain the functionality of
the current `ignore`)

Details can be found in #20939
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22887

Pulled By: driazati

Differential Revision: D16277608

fbshipit-source-id: 0abd0dc4291cf40651a1719bff813abb2b559640
2019-07-16 13:00:08 -07:00
eaee0c6cd9 Make classtypes hold a weak_ptr to their CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22902

Test Plan: Imported from OSS

Differential Revision: D16278159

Pulled By: suo

fbshipit-source-id: 6aa682e347847e808b44218d38ff1dae66945a07
2019-07-16 12:04:20 -07:00
b6a88b3344 Make traced fns also go into the global python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22901

Test Plan: Imported from OSS

Differential Revision: D16278160

Pulled By: suo

fbshipit-source-id: f3e7d83b48d5f5b5cb1548ccc5b9bd382a3c411a
2019-07-16 12:04:16 -07:00
c6fe864db3 Add key_padding_mask kwarg to Transformer (#22588)
Summary:
Motivation:
The forward method of MultiheadAttention has a kwarg key_padding_mask. This mask is of shape (N,S) where N is batch and S is sequence length. It is applied prior to the attention softmax: positions that are True in the mask have their scores set to float('-inf'). This lets you mask position j from attention for all positions i in the input sequence. It's typically used to mask padded inputs, so for a sample in a batch we can make sure no encoder outputs depend on padding inputs. Currently Transformer, TransformerEncoder, and TransformerEncoderLayer do not have this kwarg; they only have options for (S,S), (T,T), and (S,T) masks, which are applied equally across the batch for the source input, target output, and target-source memory respectively. These masks can't be used for padding and are instead used for things like subsequent masking in language modeling, by masking the attention of position i to position j.

This diff exposes the key_padding_mask to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods which is ultimately passed to MultiheadAttention forward.

Open question: should we also allow a key_padding_mask for the decoder layer? As padding is usually at the end of each sentence in a batch and sentences are usually decoding from left to right, usually people deal with padding on decoded outputs by just masking those outputs at the loss layer. There might be some scenarios where it's needed though I don't think it would be common. People can also still just subclass and override the layers. We could also pass the input key_padding_mask to the memory <> decoder attention layer. Not sure if that's necessary though because the output of position i from each attention encoder layer won't depend on any masked positions in the input (even if position i is a masked position itself) so there's not really any point in masking position i again.
Adds the key_padding_mask kwarg to Transformer, TransformerEncoder, and TransformerEncoderLayer forward methods.
The standard TransformerEncoderLayer uses a MultiheadAttention layer as self_attn. MultiheadAttention forward method has a key_padding_mask kwarg that allows for masking of values such as padding per sequence in a batch, in contrast to the attn_mask kwarg which is usually of shape (S,S) and applied equally across the batch.

MultiheadAttention calls functional.multi_head_attention_forward, which has the same key_padding_mask kwarg of shape (N,S). Masked (True) values are set to float('-inf').
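The masking step described above can be sketched in pure Python (illustrative helper, not the library implementation):

```python
def apply_key_padding_mask(scores, key_padding_mask):
    # scores: per-query lists of raw attention scores over S key
    # positions; mask entries that are True (padding) are set to -inf
    # so the subsequent softmax assigns them zero attention weight.
    neg_inf = float("-inf")
    return [
        [neg_inf if key_padding_mask[j] else s for j, s in enumerate(row)]
        for row in scores
    ]
```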
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22588

Test Plan:
buck test mode/dev caffe2/test:nn -- 'test_transformerencoderlayer \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_Transformer_cell \(test_nn\.TestNN\)'
buck test mode/dev caffe2/test:nn -- 'test_transformer_args_check \(test_nn\.TestNN\)'

Differential Revision: D16112263

Pulled By: lucasgadams

fbshipit-source-id: dc4147dd1f89b55a4c94e8c701f16f0ffdc1d1a2
2019-07-16 11:57:22 -07:00
9b9546a498 replace ByteTensor with bool in fill_test (#22913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22913

as title

Reviewed By: hl475

Differential Revision: D16285248

fbshipit-source-id: 78b13d48d547760e59e0e5c8875ab09a3cd24828
2019-07-16 11:51:55 -07:00
31497799b9 Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16283037

Pulled By: ezyang

fbshipit-source-id: fc09c9443a568d9af1c78a847282a7d707c49dd6
2019-07-16 11:32:36 -07:00
e2046f8c1d Add sanity checks for NCCL detection.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22819

Test Plan: Imported from OSS

Differential Revision: D16281714

Pulled By: ezyang

fbshipit-source-id: 396bcbf099bd07b996cf779c6b43092096b52d90
2019-07-16 11:32:32 -07:00
3ea04b59c0 Resolve the doc issue in which two asterisks have weird links. (#22896)
Summary:
Asterisks start emphasis in reST. We should either escape them or mark them up as interpreted text.
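For illustration, the ways an asterisk can be written in reST (not the exact lines changed here):

```rst
*emphasized*   -- paired asterisks render as emphasis
\*literal\*    -- backslash-escaped asterisks render literally
``*args``      -- inline literal keeps the asterisks verbatim
```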
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22896

Differential Revision: D16282869

Pulled By: zou3519

fbshipit-source-id: 15ec4286434db55fb8357b1a12e6f70ef54f8c66
2019-07-16 11:23:06 -07:00
3f3f5d042a Revert D16227440: [pytorch][PR] Update note about tensors on CPU for certain MAGMA functions, elimina…
Differential Revision:
D16227440

Original commit changeset: 97d5537c5da9

fbshipit-source-id: 2dacfcc821e1fb64466e185efa0f6abd0c9ba526
2019-07-16 11:13:59 -07:00
52de340629 Export torch.masked_fill with onnx::where
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22521
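The exported equivalence is masked_fill(input, mask, value) == where(mask, value, input), elementwise; a 1-D pure-Python sketch:

```python
def masked_fill_via_where(values, mask, fill_value):
    # Elementwise where(mask, fill_value, value): positions where the
    # mask is True take the fill value, others keep the input value.
    return [fill_value if m else v for v, m in zip(values, mask)]
```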

Reviewed By: zrphercule

Differential Revision: D16155168

Pulled By: houseroad

fbshipit-source-id: 5d419f08213324d474b839ba1ae13c799aeee92a
2019-07-16 10:55:30 -07:00
6c997538b7 Unwrap sccache post-build for ROCm compilations. (#22743)
Summary:
The sccache wrapping strategy causes problems for at-runtime kernel
compilation of MIOpen kernels. We therefore - after the builds of
caffe2/pytorch are complete - unwrap sccache again by moving the actual
clang-9 binary back into its original place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22743

Differential Revision: D16283329

Pulled By: bddppq

fbshipit-source-id: 4fcdc92be295d5ea9aba75c30e39af1a18a80c13
2019-07-16 10:28:16 -07:00
ba38445cfd Fix alias annotations for dict ops (#22900)
Summary:
Fixes #22553
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22900

Pulled By: driazati

Differential Revision: D16277794

fbshipit-source-id: 657f18c50c9a87597ec1a7d568cc532638cfe386
2019-07-16 10:28:12 -07:00
8482efb203 pin_memory malloc now uses existing context if available. (#22229)
Summary:
This is achieved by using `cuDevicePrimaryCtxGetState` to check whether a primary context exists on a device. It is not too slow; here is a benchmark of a single call to it on CUDA 10.1, Titan Xp, driver 415.27:
```
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_cuDevicePrimaryCtxGetState        301 ns        301 ns    2319746
```

Commits:

1. Add `CUDAHooks::getDeviceWithPrimaryContext` which returns a device index with primary context (if exists).
    Link `c10/cuda` against `libcuda` for device API calls.
2. Use `getDeviceWithPrimaryContext` to check primary context in `pin_memory`.
    Fix `OptionalDeviceGuard` doc.
3. Refactor `test_cuda_primary_ctx.py` to support multiple tests.
    Add test for this in that file.

Fixes https://github.com/pytorch/pytorch/issues/21081.
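The selection logic described in commit 2 can be sketched as follows (hypothetical helper name, a pure-Python stand-in for the C++ `getDeviceWithPrimaryContext` check):

```python
def device_for_pin_memory(primary_ctx_active, current_device):
    # Prefer the current device if its primary context is already
    # active; otherwise any device with an active context; otherwise
    # fall back to device 0 (which would create a new context).
    if primary_ctx_active[current_device]:
        return current_device
    for dev, active in enumerate(primary_ctx_active):
        if active:
            return dev
    return 0
```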
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22229

Differential Revision: D16170194

Pulled By: zou3519

fbshipit-source-id: 485a45f211b7844c9e69c63f3b3b75194a796c5d
2019-07-16 10:18:30 -07:00
054c7eb0f4 Update note about tensors on CPU for certain MAGMA functions, elimina… (#22618)
Summary:
…te argument in macro

Changelog:
- Update note about tensors on CPU for the following MAGMA functions
  - magma_(d/s)getrf_gpu and magma_getrf_nopiv_gpu require tensors on CPU for pivots
  - magma_(d/s)geqrf2_gpu requires tensors on CPU for elementary reflectors
  - magma_(d/s)syevd_gpu requires tensors on CPU for eigenvalues
- Remove dummy tensor in ALLOCATE_ARRAY MACRO
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22618

Test Plan:
- All existing tests should pass to verify that the patch is correct

This PR has been proposed to eliminate confusion due to the previous comments, as indicated in https://github.com/pytorch/pytorch/issues/22573

Differential Revision: D16227440

Pulled By: zou3519

fbshipit-source-id: 97d5537c5da98c0ed3edc4668a09294794fc426b
2019-07-16 10:09:10 -07:00
f8ad65adb1 Fix torch.triu / torch.tril on contiguous tensors with non-default st… (#22730)
Summary:
…rides

Changelog:
- Fix behavior of `torch.triu` / `torch.tril` on certain unsqueezed tensors that lead to uninitialized values on CPU
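For reference, the intended semantics of `triu`, written so that every output element is explicitly assigned (pure-Python sketch; the bug was in the strided CPU path, not in these semantics):

```python
def triu(matrix, diagonal=0):
    # Keep entries on or above the given diagonal, zero the rest.
    # Every output entry is written (kept or zeroed), so nothing is
    # left uninitialized regardless of the input's memory layout.
    return [
        [v if j - i >= diagonal else 0 for j, v in enumerate(row)]
        for i, row in enumerate(matrix)
    ]
```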
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22730

Test Plan:
- Add tests for these cases in test_triu_tril in test_torch

Fixes https://github.com/pytorch/pytorch/issues/22581

Differential Revision: D16222897

Pulled By: zou3519

fbshipit-source-id: b86b060187797e5cd2a7731421dff1ba2b5c9596
2019-07-16 10:09:03 -07:00
0ea8e61f03 For consistent CUDA_HOME behavior (#22845)
Summary:
Align the behavior of `torch.utils.cpp_extension.CUDA_HOME` with that of `tools.setup_helpers.cuda.CUDA_HOME`.

Specifically, I swapped the position of guess 2 and guess 3 in `torch.utils.cpp_extension.CUDA_HOME`.

Fixing issue https://github.com/pytorch/pytorch/issues/22844
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22845

Differential Revision: D16276241

Pulled By: zou3519

fbshipit-source-id: 3b62b439b2f794a6f3637a5fee58991f430985fe
2019-07-16 09:55:56 -07:00
560d847da6 add benchmark for PT fill_ op (#22867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22867

as title

Reviewed By: hl475

Differential Revision: D16263458

fbshipit-source-id: 55b0e62023c117aaa0c2b9a4d65b234a388f086d
2019-07-16 09:50:41 -07:00
3b1c3996e1 remove RTTI check for TensorImpl shadow copy (#22773)
Summary:
We introduced RTTI in recent change: https://github.com/pytorch/pytorch/pull/21613

For the internal mobile build we don't enable '-frtti' yet. This diff replaces
RTTI with an alternative approach.

According to dzhulgakov we can compare two tensors' type_ids directly in most cases -
this is stricter than comparing TensorImpl subclass types, since the TensorImpl -> type_id
mapping is 1-to-n, but it is more appropriate for this use case.

The only two cases where we can relax direct type comparison (for legacy reason) are:
1. CPUTensor <-> CUDATensor;
2. SparseCPUTensor <-> SparseCUDATensor;
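The comparison rule can be sketched as (hypothetical string ids, for illustration only):

```python
def shadow_copy_allowed(src, dst, relaxed_pairs=frozenset({
        frozenset({"CPU", "CUDA"}),
        frozenset({"SparseCPU", "SparseCUDA"})})):
    # Direct type_id equality, relaxed only for the two legacy pairs
    # listed above.
    return src == dst or frozenset({src, dst}) in relaxed_pairs
```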
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22773

Differential Revision: D16277696

Pulled By: ljk53

fbshipit-source-id: 043e264fbacc37b7a11af2046983c70ddb62a599
2019-07-15 23:21:57 -07:00
c5afdd0b55 Revert D16197605: [jit] Make traced fns also go into the global python CU
Differential Revision:
D16197605

Original commit changeset: d32c975486b0

fbshipit-source-id: a00f0490cc23824792f3e745d7b5a003b1a33d20
2019-07-15 22:31:33 -07:00
a326aad816 Revert D16197608: [jit] Make classtypes hold a weak_ptr to their CU
Differential Revision:
D16197608

Original commit changeset: 22250d6f0d24

fbshipit-source-id: 47a8cdeb62b1033252070ecb92906358014b551a
2019-07-15 19:49:41 -07:00
5f05037de6 Updating submodules
Reviewed By: yns88

fbshipit-source-id: a7af5bc022abbfb81af31dbb653e25a3b8d54c4f
2019-07-15 18:07:21 -07:00
94d99f2522 add num_runs flag to the benchmark (#22892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22892

Think of num_runs as manually running the binary <num_runs> times. Each run runs the operator for many iterations.

Reviewed By: hl475

Differential Revision: D16271597

fbshipit-source-id: b6f509ee0332c70f85bec0d447b84940c5c0cecd
2019-07-15 17:18:25 -07:00
6ffacd5f02 Use original module's class name for ScriptModules (#22873)
Summary:
Since recursive script creates a ScriptModule from an `nn.Module`,
there are no ties to the original module to pull a type name from, so we
have to explicitly pass it in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22873

Pulled By: driazati

Differential Revision: D16268547

fbshipit-source-id: 902a30e6e36427c6ba7033ded027a29d9dcbc1ee
2019-07-15 15:27:29 -07:00
248336946e remove stray print
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22825

Differential Revision: D16266401

Pulled By: Krovatkin

fbshipit-source-id: 214f90578061aad83eab143381b3c05386edee3d
2019-07-15 14:54:10 -07:00
f7de9be3c0 Add FakeQuantize Module (#21767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21767

Adding FakeQuantize Module
for quantization aware training

Reviewed By: dzhulgakov

Differential Revision: D15728503

fbshipit-source-id: 2a9a6a362812ede3deac42b93dddca35987bd8e6
2019-07-15 14:08:55 -07:00
0cddd3e751 update README (#21312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21312

This diff updates the README of op-bench.

Reviewed By: zheng-xq

Differential Revision: D15612665

fbshipit-source-id: b33119fd4f9d086b03b5e28fbe8a4015b282b15c
2019-07-15 13:34:05 -07:00
7d055c21b3 Port SVD to ATen, enable batching for matrix inputs (#21588)
Summary:
Changelog:
- Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Allow batches of matrices as arguments to `torch.svd`
- Remove existing implementations in TH and THC
- Update doc string
- Update derivatives to support batching
- Modify nuclear norm implementation to use at::svd instead of _batch_svd
- Remove _batch_svd as it is redundant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588

Test Plan:
- Add new test suite for SVD in test_torch.py with port to test_cuda.py
- Add tests in common_methods_invocations.py for derivative testing

Differential Revision: D16266115

Pulled By: nairbv

fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb
2019-07-15 13:34:01 -07:00
260b0e8476 Make classtypes hold a weak_ptr to their CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22726

Differential Revision: D16197608

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 22250d6f0d249f61f269afb4fe8e7d1af0be1205
2019-07-15 13:13:16 -07:00
5fc1260e0a Make traced fns also go into the global python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22725

Differential Revision: D16197605

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: d32c975486b0cb4808687f0aa89325571f2817c4
2019-07-15 13:13:12 -07:00
16aa235f43 _script_compile and _script_class_compile add to the python CU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22724

Differential Revision: D16197609

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: e12b31f8c8ce14b0968f4ac9445e7d225126b210
2019-07-15 13:13:08 -07:00
f2f80744be Close loophole to create untyped tuples (#22518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22518

-

Reviewed By: dzhulgakov

Differential Revision: D16115216

fbshipit-source-id: 1afae3666f7acd7d7833db8a72168364fed4879d
2019-07-15 11:33:45 -07:00
800f4936f0 Deprecate untyped Lists (#22517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22517

Force anybody creating an untyped Dict to call c10::impl::deprecatedUntypedDict().
This should hopefully make it clear that this is not public API and prevent people from using it.

Reviewed By: dzhulgakov

Differential Revision: D16115214

fbshipit-source-id: 2c8d0e4e375339c699d583995f79c05c59693c3e
2019-07-15 11:33:35 -07:00
bd88fd0793 Added .bfloat16() (#22852)
Summary:
Add conversion method for bfloat16
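The conversion itself amounts to keeping the top 16 bits of the float32 representation; a truncating sketch (PyTorch's actual conversion may round differently):

```python
import struct

def float32_to_bfloat16_bits(x):
    # bfloat16 keeps float32's sign bit and 8 exponent bits but only
    # the top 7 mantissa bits; dropping the low 16 bits is the simplest
    # (round-toward-zero) conversion.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16
```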
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22852

Differential Revision: D16256760

Pulled By: izdeby

fbshipit-source-id: 01d75495f9df513a0cdf78791c3eb013ab92bd95
2019-07-15 09:32:18 -07:00
8399197df6 Set up CI with Azure Pipelines (#22839)
Summary:
Introduce Azure Pipelines for the linting checks.  This is meant to be equivalent to the existing Travis linting phase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22839

Differential Revision: D16260376

Pulled By: ezyang

fbshipit-source-id: 1e535c3096358be67a0dad4cd920a92082b2d18e
2019-07-15 06:41:56 -07:00
535c5540bc Back out "Back out "[pytorch][PR] Move thnvrtc and DynamicLibrary to ATen"" (#22794)
Summary:
Original commit changeset: 227df3b85316

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22794
ghstack-source-id: 86400904

Differential Revision: D16222777

fbshipit-source-id: 0b198ac59e640df0b8204b4ed30f8e822c15fd9a
2019-07-15 06:28:56 -07:00
317cf7c874 Remove tensor_data() call in Python Variable() and nn.Parameter() constructors (#22821)
Summary:
As part of the Variable/Tensor merge, `variable.tensor_data()` should be removed in favor of `variable.detach()`. This PR removes  `tensor_data()` call sites in Python `Variable()` and `nn.Parameter()` constructor paths.

Note that this PR is BC-breaking in the following way:
- For Python `Variable()` constructor:
Previously, in-place updating a tensor after it's been used to create a Variable does not bump the Variable's version counter, which causes the following problem:
```python
t = torch.ones(2, 3)
v = torch.autograd.Variable(t).requires_grad_()
y = v * v
t.add_(1)  # This bumps version counter of `t`
y.sum().backward()  # This computes `v`'s gradient incorrectly before this patch, and throws error after this patch
```
After this patch, in-place updating a tensor after it's been used to create a Variable will also bump the Variable's version counter, thus preserving the correctness of the Variable's version counter.

- For Python `nn.Parameter()` constructor:
Previously, in-place updating a tensor after it's been used to create an nn.Parameter does not bump the nn.Parameter's version counter, which causes the following problem:
```python
t = torch.ones(2, 3)
v = torch.nn.Parameter(t)
y = v * v
t.add_(1)  # This bumps version counter of `t`
y.sum().backward()  # This computes `v`'s gradient incorrectly before this patch, and throws error after this patch
```
After this patch, in-place updating a tensor after it's been used to create an nn.Parameter will also bump the nn.Parameter's version counter, thus preserving the correctness of the nn.Parameter's version counter.
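The shared-counter mechanism described above can be sketched in pure Python (class names and fields here are illustrative, not PyTorch's actual internals):

```python
class VersionCounter:
    """Counter shared by a tensor and every variable/parameter aliasing it."""
    def __init__(self):
        self.value = 0

class Tensor:
    def __init__(self, data, version_counter=None):
        self.data = list(data)
        self._vc = version_counter if version_counter is not None else VersionCounter()

    def add_(self, x):
        # In-place op: mutate the data and bump the shared counter.
        self.data = [d + x for d in self.data]
        self._vc.value += 1
        return self

def make_parameter(t):
    # After this patch: the parameter shares t's version counter, so
    # in-place updates to t are visible to autograd's staleness check.
    return Tensor(t.data, version_counter=t._vc)

t = Tensor([1.0, 1.0, 1.0])
p = make_parameter(t)
saved = p._vc.value   # autograd records the version when p is used in a graph
t.add_(1)             # bumps the counter shared with p
stale = p._vc.value != saved
assert stale          # backward() can now raise instead of silently computing a wrong gradient
```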
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22821

Differential Revision: D16258030

Pulled By: yf225

fbshipit-source-id: 9a6d68cea1864893193dbefbb6ef0c1d5ca12d78
2019-07-14 21:09:29 -07:00
14e8fb70a1 Make the signature of fill_out consistent with fill_.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22761

Test Plan: Imported from OSS

Differential Revision: D16257779

Pulled By: ezyang

fbshipit-source-id: b1201500042ae1f4678835da957de1777c1038a3
2019-07-14 19:20:59 -07:00
1c266c2738 Move the body of fill_kernel_impl into fill_kernel_cuda
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22760

Test Plan: Imported from OSS

Differential Revision: D16257782

Pulled By: ezyang

fbshipit-source-id: d214d2d77affd937109b33ca841af76004f85834
2019-07-14 19:20:53 -07:00
fc297b8e83 Move fill and fill_diagonal to Fill.cpp, Fill.h, and FillKernel.{cpp,cu}
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22758

Test Plan: Imported from OSS

Differential Revision: D16257781

Pulled By: ezyang

fbshipit-source-id: 9e5ed06e95ef65b036eb388488faad981f1e8012
2019-07-14 19:20:46 -07:00
815e73bc20 make_variable consumes the Tensor if it only has one reference
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22705

Test Plan: Imported from OSS

Differential Revision: D16192220

Pulled By: jamesr66a

fbshipit-source-id: 9c42bb759077b74a1370d3a2d7114ed3593f333b
2019-07-14 18:36:20 -07:00
b5fa9a340a Temporarily skip mypy-0.720 to unbreak master type checks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22835

Differential Revision: D16239190

Pulled By: bddppq

fbshipit-source-id: e97fd3aae0676de8a06dc9fb498f36ed28dc92c3
2019-07-14 09:49:24 -07:00
1a93b96815 Revert da315a4
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22837

Differential Revision: D16239667

Pulled By: llyfacebook

fbshipit-source-id: 1a625d78d633927129dd2791e65b333b3902f94f
2019-07-13 01:54:20 -07:00
92468f0a6b Revert D16238204: Revert D16224780: [pytorch][PR] [ROCm] MIOpen integration into pytorch RNN operators
Differential Revision:
D16238204

Original commit changeset: a6b5eb3f4820

fbshipit-source-id: bdfae93c522c1ce734ab8dc736ced66411fe50ee
2019-07-12 22:58:50 -07:00
da315a4e2a Revert D16037021: Support GRU module quantization in Pytorch
Differential Revision:
D16037021

Original commit changeset: 71145c67d869

fbshipit-source-id: 33cd2e57eba30ea33cc4f3116732a721c26f6efb
2019-07-12 21:05:34 -07:00
fcfefc3439 Revert D16224780: [pytorch][PR] [ROCm] MIOpen integration into pytorch RNN operators
Differential Revision:
D16224780

Original commit changeset: 331dafbb7689

fbshipit-source-id: a6b5eb3f4820fbb58d4a329aa4c93b40a111ff27
2019-07-12 20:55:05 -07:00
ead1193241 Transfer Learning: Caffe2 load op changes to return shape inference (#22829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22829

Sending out the caffe2 load op changes separately since we want to cherry-pick them to open source.

This change is needed because the shape information of the blobs is determined from the load operator and that shape information is needed in our download_group.

Reviewed By: boryiingsu

Differential Revision: D16229465

fbshipit-source-id: f78b2df9a7f26968d70eca68dde75cd11ab6f7a2
2019-07-12 19:45:13 -07:00
d8c1b86135 Support GRU module quantization in Pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22498

Reviewed By: BIT-silence

Differential Revision: D16037021

fbshipit-source-id: 71145c67d8696e525b686cd3313033e5b6771718
2019-07-12 18:31:08 -07:00
ba9d559a12 Get rid of torch.mean shape analysis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22810

Test Plan: Imported from OSS

Differential Revision: D16226973

Pulled By: jamesr66a

fbshipit-source-id: ad23f48782e8d21788ecae39fc512ff4502716bf
2019-07-12 17:50:10 -07:00
9eb039334f Use Linear Operator with fp16 weights in JIT (#22323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22323

This diff adds an interface to use quantized Linear op in JIT.

Reviewed By: jamesr66a

Differential Revision: D16040724

fbshipit-source-id: 90e90aff9973c96ea076ed6a21ae02c349ee2bcf
2019-07-12 15:59:17 -07:00
573d9e6975 Support Linear operation with fp16 weights in ATen (#22023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22023

This diff implements a Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y  =   X * W + B with dtypes:
(fp32, fp32, fp16, fp32)

To do that, three steps are needed:
1. Quantize weights from fp32 to fp16, this is done using `PackedGemmMatrixFP16` in the `fbgemm_pack_gemm_matrix_fp16`
2. Conduct matrix multiplication with quantized weights using `cblas_gemm_compute` in `fbgemm_linear_fp16_weight`
3. Add bias to the result from step2 and return the final Y
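A stdlib-only sketch of the three steps above, using Python's `struct` half-precision format as a stand-in for `PackedGemmMatrixFP16` and a plain loop for `cblas_gemm_compute` (the helper names are hypothetical, not FBGEMM's API):

```python
import struct

def to_fp16(x):
    # Round-trip a float through IEEE-754 half precision ('e' struct format).
    return struct.unpack('e', struct.pack('e', x))[0]

def linear_fp16_weight(X, W, B):
    """Y = X @ W.T + B with fp32 activations, fp16 weights, fp32 bias."""
    Wq = [[to_fp16(w) for w in row] for row in W]          # step 1: quantize weights to fp16
    Y = []
    for x in X:
        row = []
        for wrow, b in zip(Wq, B):
            acc = sum(xi * wi for xi, wi in zip(x, wrow))  # step 2: matmul with fp16 weights
            row.append(acc + b)                            # step 3: add fp32 bias
        Y.append(row)
    return Y

Y = linear_fp16_weight([[1.0, 2.0]], [[0.5, 0.25]], [0.1])
# 1*0.5 + 2*0.25 + 0.1 = 1.1 (0.5 and 0.25 are exactly representable in fp16)
```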

Reviewed By: jianyuh

Differential Revision: D15921768

fbshipit-source-id: dc4e5b366f846ce9d58975876940a9b3372b8b8d
2019-07-12 15:59:13 -07:00
35ee4bf4e5 Revert D16204820: [pytorch][PR] optimize topk on cpu using parallel and partial sort
Differential Revision:
D16204820

Original commit changeset: ea70562c9149

fbshipit-source-id: c8f8e262c7c681593d243f035bf1f0d84675c9dc
2019-07-12 15:14:06 -07:00
cf2889ad8f add support for breaks and continues (#21692)
Summary:
Add support for breaks and continues in the jit. We do this with a pre-SSA graph transform.

A graph of the form
```
def test():
    while i < 5:
        if i == 3:
            break
        i += 1
        print(i)
```
has the body of the loop transformed to
```
if i == 3:
    did_break = True
else:
    did_break = False
if did_break:
    loop_exit = True
else:
    i += 1
    print(i)
    loop_exit = i < 5
```
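The lowering above can be made executable. In the sketch below, `print(i)` is replaced by a list append so the two versions can be compared, and the flag names are illustrative (chosen so the flag reads as "keep looping" rather than the transform's `loop_exit`):

```python
def original():
    i, out = 0, []
    while i < 5:
        if i == 3:
            break
        i += 1
        out.append(i)
    return out

def transformed():
    # `break` lowered into did_break / loop-continuation flags,
    # in the spirit of the pre-SSA transform above.
    i, out = 0, []
    keep_going = i < 5
    while keep_going:
        did_break = i == 3
        if did_break:
            keep_going = False
        else:
            i += 1
            out.append(i)
            keep_going = i < 5
    return out

assert original() == transformed() == [1, 2, 3]
```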

I am going to add more tests but I think it is ready for review now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21692

Differential Revision: D16215807

Pulled By: eellison

fbshipit-source-id: 365102f42de4861d9323caaeb39a96de7619a667
2019-07-12 15:02:44 -07:00
b3147bc674 PyTorch export to ONNX Opset 7 and 8 - Cont (#22421)
Summary:
This is an extension to the original PR https://github.com/pytorch/pytorch/pull/21765

1. Increase the coverage of support for different opsets, comments, and blacklisting.
2. Adding backend tests for both caffe2 and onnxruntime on opset 7 and opset 8.
3. Reusing onnx model tests in caffe2 for onnxruntime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22421

Reviewed By: zrphercule

Differential Revision: D16225518

Pulled By: houseroad

fbshipit-source-id: 01ae3eed85111a83a0124e9e95512b80109d6aee
2019-07-12 14:52:48 -07:00
9f8e2c067f MIOpen integration into pytorch RNN operators (#22774)
Summary:
This PR enables pytorch RNN operators to use the MIOpen engine

ezyang bddppq

cc: lcskrishna iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22774

Differential Revision: D16224780

Pulled By: bddppq

fbshipit-source-id: 331dafbb76892d7390b620a95d8f384d38ee5533
2019-07-12 14:47:48 -07:00
30e03df638 Speeds up fast-path for 1D tensors (#22756)
Summary:
Measured with PMCTest (https://www.agner.org/optimize/) on TensorIterator
construction, this results in ~600 fewer instructions
retired (~300 fewer cycles) for constructing a TensorIterator on a 1D
tensor. (Should be roughly ~100 ns, but it's hard to measure that
precisely end-to-end.)

```
Before:
     Clock   Core cyc   Instruct       Uops   L1D Miss
      5082       2768       5690       7644          3

After:
     Clock   Core cyc   Instruct       Uops   L1D Miss
      4518       2437       5109       6992          0
```

Note that Instruct is reliable, Core cyc is a little noisy, and Clock
is a little more noisy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22756

Differential Revision: D16207777

Pulled By: VitalyFedyunin

fbshipit-source-id: bcc453a90472d9951a1c123bcb1b7a243fde70ac
2019-07-12 12:33:38 -07:00
02bc06a683 avoid kernel launches for zero-sized tensor inputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22790

Test Plan: Imported from OSS

Differential Revision: D16226168

Pulled By: wanchaol

fbshipit-source-id: 081607c9acc1540c753b080c5f727dc4e8c22acc
2019-07-12 12:24:52 -07:00
b1b65f34a9 Make PythonArgs::tensor and PythonArgs::scalar faster (#22782)
Summary:
Speeds up the common case where Tensor is a torch.Tensor (not a
subclass). This reduces the number of executed instructions for a
torch.add(tensor1, tensor2) by ~328 (should be ~65 ns faster).

Note that most of the PythonArgs accessors are too large to be inlined.
We should move most of them to the cpp file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22782

Differential Revision: D16223592

Pulled By: colesbury

fbshipit-source-id: cc20f8989944389d5a5e3fab033cdd70d581ffb1
2019-07-12 11:57:29 -07:00
10c14ad17c optimize topk on cpu using parallel and partial sort (#19736)
Summary:
This PR aims at improving `topk()` performance on CPU. This is useful when computing **beam search** in `Transformer` and `BERT`.

Given a tensor x of size `[N, C]` to which we apply `x.topk(K)`, the current logic **sequentially** loops over the dimension of `N` and does a **quick select** along the dimension of `C` to find the top K elements.

Performance can be further improved by:

- parallelizing the work over the dimension of `N`
- using a faster sorting algorithm for `topk` (after a bunch of experimenting, `std::partial_sort` seems to be the most promising)

So i compared 3 versions:

1. vanilla: sequential + quick select
2. reference PR https://github.com/pytorch/pytorch/issues/19737: parallel + quick select
3. this PR: parallel + partial sort

with the following benchmark, on `Xeon 8180, 2*28 cores@2.5 GHz`:
```python
import torch
from time import time

num_iters = 1000

def bench_topk(N=8, C=168560, k=10):
    a = torch.randn(N, C)
    # warm up
    for i in range(100):
        torch.topk(a, k)

    t = 0
    for i in range(num_iters):
        a = torch.randn(N, C)
        start = time()
        value, indice = torch.topk(a, k)
        t += time() - start
    print("#[%d, %d] times: %f ms" % (N, C, t / num_iters * 1000))

Ns = [10, 20, 30]
Cs = [10000, 20000, 40000, 80000, 160000, 320000]

for n in Ns:
    for c in Cs:
        bench_topk(N=n, C=c)

```
### vanilla: sequential + quick select
```
#[10, 10000] times: 0.746740 ms
#[10, 20000] times: 1.437399 ms
#[10, 40000] times: 2.832455 ms
#[10, 80000] times: 5.649426 ms
#[10, 160000] times: 11.309466 ms
#[10, 320000] times: 22.798765 ms
#[20, 10000] times: 1.511303 ms
#[20, 20000] times: 2.822024 ms
#[20, 40000] times: 5.564770 ms
#[20, 80000] times: 11.443044 ms
#[20, 160000] times: 22.747731 ms
#[20, 320000] times: 46.234449 ms
#[30, 10000] times: 2.214045 ms
#[30, 20000] times: 4.236179 ms
#[30, 40000] times: 8.418577 ms
#[30, 80000] times: 17.067578 ms
#[30, 160000] times: 33.826214 ms
#[30, 320000] times: 68.109420 ms
```
### reference PR: parallel + quick select
```
#[10, 10000] times: 0.271649 ms
#[10, 20000] times: 0.593016 ms
#[10, 40000] times: 1.133518 ms
#[10, 80000] times: 2.082355 ms
#[10, 160000] times: 4.049928 ms
#[10, 320000] times: 7.321285 ms
#[20, 10000] times: 0.315255 ms
#[20, 20000] times: 0.539054 ms
#[20, 40000] times: 1.000675 ms
#[20, 80000] times: 1.914586 ms
#[20, 160000] times: 4.437122 ms
#[20, 320000] times: 8.822445 ms
#[30, 10000] times: 0.347209 ms
#[30, 20000] times: 0.589947 ms
#[30, 40000] times: 1.102814 ms
#[30, 80000] times: 2.112201 ms
#[30, 160000] times: 5.186837 ms
#[30, 320000] times: 10.523023 ms
```
### this PR: parallel + partial sort
```
#[10, 10000] times: 0.150284 ms
#[10, 20000] times: 0.220089 ms
#[10, 40000] times: 0.521875 ms
#[10, 80000] times: 0.965593 ms
#[10, 160000] times: 2.312356 ms
#[10, 320000] times: 4.759422 ms
#[20, 10000] times: 0.167630 ms
#[20, 20000] times: 0.265607 ms
#[20, 40000] times: 0.471477 ms
#[20, 80000] times: 0.974572 ms
#[20, 160000] times: 3.269645 ms
#[20, 320000] times: 6.538608 ms
#[30, 10000] times: 0.204976 ms
#[30, 20000] times: 0.342833 ms
#[30, 40000] times: 0.589381 ms
#[30, 80000] times: 1.398579 ms
#[30, 160000] times: 3.904077 ms
#[30, 320000] times: 9.681224 ms
```
In summary, `2` is **5x** faster than `vanilla` on average, and `3` is **8.6x** faster than `vanilla`.
On `Fairseq Transformer`, the default parameters on the `wmt14` dataset yield a `topk` input of size `[8, 168560]`, and this operator gets `3x` faster with this PR.
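A small Python sketch of the partial-sort strategy, with `heapq.nlargest` standing in for `std::partial_sort` (the real kernel parallelizes the outer loop over `N`; here it runs sequentially):

```python
import heapq

def topk_rows(x, k):
    """Row-wise top-k for a list-of-lists `x` of shape [N, C].

    Each row is independent, so a native implementation can parallelize
    this outer loop; heapq.nlargest does the partial sort per row.
    """
    values, indices = [], []
    for row in x:
        top = heapq.nlargest(k, enumerate(row), key=lambda p: p[1])
        indices.append([i for i, _ in top])
        values.append([v for _, v in top])
    return values, indices

vals, idx = topk_rows([[3, 1, 4, 1, 5], [9, 2, 6, 5, 3]], k=2)
# vals == [[5, 4], [9, 6]], idx == [[4, 2], [0, 2]]
```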
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19736

Differential Revision: D16204820

Pulled By: VitalyFedyunin

fbshipit-source-id: ea70562c9149a0d832cf5872a891042ebd74fc63
2019-07-12 11:10:20 -07:00
fc23d7f3bd Speed up TensorIterator::compute_strides a little (#22779)
Summary:
For three 1-D operands, compute_strides now takes 298 instructions instead
of 480. (Saves ~36 ns).  We'll want to make Tensor::sizes(), strides(), and
element_size() trivially inlinable to speed this up more.

(Using PMCTest from https://www.agner.org/optimize/ to measure instructions retired)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22779

Differential Revision: D16223595

Pulled By: colesbury

fbshipit-source-id: e4730755f29a0aea9cbc82c2d376a8e6a0c7bce8
2019-07-12 10:57:32 -07:00
f266a63eeb Initiate checkCuda90Bug warning (#22757)
Summary:
Add the checkCuda90Bug warning to THCudaBlas_Sgemm and THCudaBlas_Hgemm.
https://github.com/pytorch/pytorch/pull/22034
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22757

Differential Revision: D16223085

Pulled By: zhangguanheng66

fbshipit-source-id: 470c6cbaba16a3cec295993c2673f02008a602a6
2019-07-12 09:55:09 -07:00
ccb28939bf Revert D16222539: [pytorch][PR] Let users pass CMake-specific options starting with CMAKE_ to CMake.
Differential Revision:
D16222539

Original commit changeset: 1cc6e69c85cd

fbshipit-source-id: c79d68976ac1047c54b32c093429b23e9482cd8f
2019-07-12 07:57:57 -07:00
612eed31a9 Let users pass CMake-specific options starting with CMAKE_ to CMake. (#22776)
Summary:
This should make it more convenient to follow https://github.com/pytorch/pytorch/issues/8433's suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22776

Differential Revision: D16222539

Pulled By: ezyang

fbshipit-source-id: 1cc6e69c85cdf0d7f8074653445410d85746847c
2019-07-12 07:28:32 -07:00
7eb0319339 add new tests to benchmark_all_test (#22787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22787

as title

Reviewed By: hl475

Differential Revision: D16219329

fbshipit-source-id: 097ee73e7644d5ca482ad044d0fd2c3e7dc2c10b
2019-07-11 22:50:55 -07:00
1878800f47 make custom op work in OSS environment (#22781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22781

The custom op is required to make the op benchmark work with JIT. Run `python setup.py install` in the pt_extension directory to install it.

Reviewed By: hl475

Differential Revision: D16214430

fbshipit-source-id: c9221c532011f9cf0d5453ac8535a6cde65e8376
2019-07-11 21:17:17 -07:00
8ec712da30 Add the support of handle Bias being nullptr for torch.ops.quantized.fbgemm_linear (#22403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22403

- C10 Operator Registration (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/op_registration/op_registration.cpp) supports None type.

- ATen has None Tensor support, e.g., https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml#L1078

Reviewed By: zafartahirov

Differential Revision: D16069522

fbshipit-source-id: 3acaec783fc138ff36b14ffc0582d0764be4ad34
2019-07-11 17:33:08 -07:00
9d11004ee4 Update ONNX constant folding to support opset 10. (#22515)
Summary:
Currently ONNX constant folding (`do_constant_folding=True` arg in `torch.onnx.export` API) supports only opset 9 of ONNX. For opset 10, it is a no-op. This change enables ONNX constant folding for opset 10. Specifically there are three main changes:
1) Turn on constant folding ONNX pass for opset 10.
2) Update support for opset 10 version of `onnx::Slice` op for backend computation during constant folding.
3) Enable constant folding tests in `test/onnx/test_utility_funs.py` for multiple opsets (9 and 10).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22515

Reviewed By: zrphercule

Differential Revision: D16189336

Pulled By: houseroad

fbshipit-source-id: 3e2e748a06e4228b69a18c5458ca71491bd13875
2019-07-11 16:29:03 -07:00
291570e085 make CompilationUnit::define return defined functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22723

Test Plan: Imported from OSS

Differential Revision: D16197604

Pulled By: suo

fbshipit-source-id: b22491a58aa9ea476acab06614093ff004291407
2019-07-11 14:55:43 -07:00
de819be93e refactor self to be a class again
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22722

Test Plan: Imported from OSS

Differential Revision: D16197607

Pulled By: suo

fbshipit-source-id: b4dd96b3f9cc46b48678aab0ff89afc3666e2185
2019-07-11 14:55:39 -07:00
22d70e0d4b Give functions qualified names
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22721

Test Plan: Imported from OSS

Differential Revision: D16197606

Pulled By: suo

fbshipit-source-id: 94718fcdb0d3b651f16674af3cfd6249ed4533ae
2019-07-11 14:55:34 -07:00
4b48ae4aec Suppress progress bar only for pip install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22708

Test Plan: Imported from OSS

Differential Revision: D16206329

Pulled By: ezyang

fbshipit-source-id: 4ec29e0e9e48a168e88ec716ee8e270c56a38cdb
2019-07-11 13:50:29 -07:00
05d56bd1b6 Remove hard-coded NVRTC specific constant from fuser header
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22699

Test Plan: Imported from OSS

Differential Revision: D16192290

Pulled By: bwasti

fbshipit-source-id: 4dccaf3e6e0151e86d35474c36e1ddb7f2afb5cf
2019-07-11 13:44:25 -07:00
513b7a7a06 assert_no_internal_overlap pass op name by const ref
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22729

Test Plan: Imported from OSS

Differential Revision: D16205448

Pulled By: jamesr66a

fbshipit-source-id: b383c461dd58e8a3d0bfeae43ebfd1e021668f80
2019-07-11 13:38:10 -07:00
9690f8629d Move the storage in empty_cpu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22728

Test Plan: Imported from OSS

Differential Revision: D16205449

Pulled By: jamesr66a

fbshipit-source-id: 6fd198d0d526b5de393e2988906dac2a63064f24
2019-07-11 13:38:07 -07:00
a797815198 bucketize op shape inference (#22716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22716

add shape inference func to bucketize op

Reviewed By: ipiszy

Differential Revision: D16193718

fbshipit-source-id: 6e893356b6408255538545673047dd5124837e70
2019-07-11 12:44:29 -07:00
ac78a86e1d Back out "[pytorch][PR] Move thnvrtc and DynamicLibrary to ATen" (#22749)
Summary:
Original commit changeset: add2ee8a8865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22749
ghstack-source-id: 86323899

Differential Revision: D16203552

fbshipit-source-id: 227df3b85316315c15d2cb7b6a5c884096a82e9e
2019-07-11 12:21:21 -07:00
8bdda03ae1 optimize RNN on CPU (#22512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22512

optimize RNN on CPU

Reviewed By: llyfacebook

Differential Revision: D16113360

fbshipit-source-id: 9ee53b3b4bb9b636e7be1ccdf25420e2caa60762
2019-07-11 12:16:27 -07:00
Jie
3135298dde (#22602)
Summary:
1. Restrict block.z <= 64, compliant with the CUDA maximum z-dimension of a
block;
2. clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22602

Differential Revision: D16203857

Pulled By: ezyang

fbshipit-source-id: 567719ae175681a48eb0f818ca0aba409dca2550
2019-07-11 12:02:58 -07:00
1682d38a25 Improve hypothesis_utils.py for qtensor (#22693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22693

change np.finfo to torch.finfo

Differential Revision: D16185556

fbshipit-source-id: 594f8ba1d6317ac2de47af754a8bd6015d40ea15
2019-07-11 11:56:01 -07:00
3fabb9f105 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22737

Differential Revision: D16200090

Pulled By: bddppq

fbshipit-source-id: 3819716a9b01f073966fc8b420c6a0b8d13232ac
2019-07-11 11:09:24 -07:00
45cf33a731 add fill_diagonal function (#21892)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21796
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21892

Differential Revision: D16164678

Pulled By: colesbury

fbshipit-source-id: 85df8ae9b7a6a91b6023fe7295b3a8124e4526ea
2019-07-11 09:20:44 -07:00
89d6e88042 Add environment variables used in CONTRIBUTING example (#22736)
Summary:
Some other environment variables can be added to speed things up for development.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22736

Differential Revision: D16200904

Pulled By: soumith

fbshipit-source-id: 797ef91a863a244a6c96e0adf64d9f9b4c9a9582
2019-07-11 04:15:51 -07:00
5147819f9d enabled MIOpen depthwise convolutions (#22696)
Summary:
They mistakenly got removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22696

Differential Revision: D16191442

Pulled By: bddppq

fbshipit-source-id: 7ceda274c557879e11f84596040efe9e0c9b861f
2019-07-11 00:14:58 -07:00
d21e476dcd Quantized Conv2d Module (#21323)
Summary:
Stack:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; https://github.com/pytorch/pytorch/issues/21808 Quantized conv avoid functional usage&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D15835572/)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **https://github.com/pytorch/pytorch/issues/21323 Quantized Conv2d Module**&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D15551835/)

Quantized Conv2d Module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21323

Test Plan:
Tests are split into two parts: functional and API.

`buck test mode/dev caffe2/test:quantized -- test_conv_api` : https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491

```
Parsing buck files: finished in 1.4 sec
Building: finished in 4.6 sec (100%) 7136/7136 jobs, 2 updated
  Total time: 6.1 sec
Trace available for this run at /tmp/testpilot.20190703-153023.392592.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 7149de230b9e1cdc7a872bb31fe099f0616dee09 fbpkg e59e6ab0fe8e47a496f915d34555c3ad at Fri Jun 28 12:20:54 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/647/t.par
Discovering tests
Running 2 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491
      ✓ caffe2/test:quantized - test_conv_api (test_nn_quantized.ModuleAPITest) 0.044 1/2 (passed)
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 5.109 2/2 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/4785074605318491
Summary (total time 9.08s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D15551835

Pulled By: zafartahirov

fbshipit-source-id: 481a7df4b8a88e485437e1596eefb08d5e6766fa
2019-07-10 21:31:24 -07:00
ad634875d0 Mark Unpickler data ptr arg as const
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22690

Differential Revision: D16184299

Pulled By: mrshenli

fbshipit-source-id: 332954028533952dad01df03eca8e95bf6fe67a9
2019-07-10 20:07:13 -07:00
4240220926 Revert D16183577: Delegate Python ~ (invert operator) to Tensor.bitwise_not().
Differential Revision:
D16183577

Original commit changeset: f86838c407db

fbshipit-source-id: bbf53ce52a20b1e90b1fe522d73e558d8044c4ba
2019-07-10 18:29:22 -07:00
1ecc945ab2 Revert D15998762: [jit] Give functions qualified names
Differential Revision:
D15998762

Original commit changeset: bc2b734f626a

fbshipit-source-id: a118cc4e9a34233279e8380529a8d8120a25839d
2019-07-10 16:10:28 -07:00
a1ca32409f Revert D15998758: [jit] refactor self to be a class again
Differential Revision:
D15998758

Original commit changeset: 14bad87bb6e4

fbshipit-source-id: f2c29974d4afc4d8f88a36e9c266e6d5a22a6191
2019-07-10 16:10:24 -07:00
e6eb17303f Revert D16184799: [jit] make CompilationUnit::define return defined functions
Differential Revision:
D16184799

Original commit changeset: 9f77a7ca2223

fbshipit-source-id: a0e08220d924a6ca55bf2f1f77754553d0133595
2019-07-10 16:10:20 -07:00
fffa7200c1 fixing lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22703

Differential Revision: D16188326

Pulled By: izdeby

fbshipit-source-id: 72e6b6f957068c3995010a1b811f24cd2304ff6f
2019-07-10 16:02:21 -07:00
67c634d58e add a comment to native_functions explaining softmax interfaces (#22651)
Summary:
Address the review comment made by gchanan here:

https://github.com/pytorch/pytorch/pull/22456#discussion_r300715866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22651

Differential Revision: D16181828

Pulled By: nairbv

fbshipit-source-id: 0d41a9024c2664298c281e198a997be73e7f8499
2019-07-10 15:34:29 -07:00
0196e0bafb add line numbers to jit_log.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22630

Differential Revision: D16172090

Pulled By: Krovatkin

fbshipit-source-id: 26cdb0077a0bfbf9981e39359472f3251546db53
2019-07-10 15:28:29 -07:00
c49a71f91f make CompilationUnit::define return defined functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22667

Test Plan: Imported from OSS

Differential Revision: D16184799

Pulled By: suo

fbshipit-source-id: 9f77a7ca2223237fbcb4b12a4734b7d334f7be13
2019-07-10 15:19:11 -07:00
ee9c8a75f4 refactor self to be a class again (#22207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22207
ghimport-source-id: 36ee8bd17411a2e220665ad2a27364653061070e

Test Plan: Imported from OSS

Differential Revision: D15998758

Pulled By: suo

fbshipit-source-id: 14bad87bb6e44bf1a43ae86339d8cc7b311c76dd
2019-07-10 15:19:07 -07:00
c0674cebf1 Give functions qualified names (#22206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22206
ghimport-source-id: d453219d907e048f24eb7f63c096b2c300307c83

Test Plan: Imported from OSS

Differential Revision: D15998762

Pulled By: suo

fbshipit-source-id: bc2b734f626ab07f97dc50ddf1b021e8b46de312
2019-07-10 15:19:03 -07:00
86fc417147 Move Quantization Models to common_quantization (#22706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22706

Moved the models used for quantization test from the test_quantization.py file to common_quantization.py

Reviewed By: jerryzh168

Differential Revision: D16189865

fbshipit-source-id: 409b43454b6b3fe278ac16b1affb9085d6ed6835
2019-07-10 15:05:49 -07:00
ebafa2e15f Turn on USE_DIRECT_NVRTC in fbcode again. (#22685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22685
ghstack-source-id: 86247780

Reviewed By: bddppq

Differential Revision: D16182352

fbshipit-source-id: fc51aa7c1112904b8cccd055dc87e10c836cf2fb
2019-07-10 15:05:45 -07:00
edeb4dbdcb register __getitem__ builtin
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22276

Test Plan: Imported from OSS

Differential Revision: D16060595

Pulled By: wanchaol

fbshipit-source-id: e1e27d6be8d62fc1a841860a783aff108980d9d3
2019-07-10 14:53:35 -07:00
368dbb9ab3 Fix a FIXME in test_nn (#22675)
Summary:
https://github.com/pytorch/pytorch/issues/17262 is already resolved, so this should pass now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22675

Differential Revision: D16188003

Pulled By: zou3519

fbshipit-source-id: 32693229a0590b274ed1bf76b815f17e77c2d3ea
2019-07-10 13:12:50 -07:00
00df49c984 Fix Trace inlining of graphs with optional inputs (#22686)
Summary:
Previously in tracing when we called a script function we would inline the graph and set the graph inputs equal to the types the graph was invoked with.

This breaks for optional arguments invoked with None since we rely on None being set to Optional[T] in schema matching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22686

Differential Revision: D16186372

Pulled By: eellison

fbshipit-source-id: e25c807c63527bf442eb8b31122d50689c7822f5
2019-07-10 12:57:06 -07:00
3e3e6ee335 Add common_quantized test case utilities (#22694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22694

Move quantization and quantized utility functions for testing to common_quantized.py and common_quantization.py. Additionally, add a quantized test case base class which contains common methods for checking the results of quantization on modules. As a consequence of the move, fixed the imports at the top of test_quantized.py and test_quantization.py to use the new utilities.

Reviewed By: jerryzh168

Differential Revision: D16172012

fbshipit-source-id: 329166af5555fc829f26bf1383d682c25c01a7d9
2019-07-10 12:23:36 -07:00
7750cae722 Refactor and improve randperm tests.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22121

Test Plan: Imported from OSS

Differential Revision: D16153794

Pulled By: li-roy

fbshipit-source-id: 4dbfa6cfcc79f6d431918a6646664215fa9ea0b9
2019-07-10 12:23:33 -07:00
32709af8f4 Swap detection order in randperm_out_cuda to avoid unnecessary conversion from float when the input is small.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22103

Test Plan: Imported from OSS

Differential Revision: D16153585

Pulled By: li-roy

fbshipit-source-id: 0801b91e7b352c8de8fdfbe929be85d69182b8da
2019-07-10 12:23:29 -07:00
0f7c3710dd Support Half type in randperm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22102

Test Plan: Imported from OSS

Differential Revision: D16153586

Pulled By: li-roy

fbshipit-source-id: d58e3dbc5da893005f4eaf521a28b0d752274eff
2019-07-10 12:23:25 -07:00
9c4c9c3af0 Delegate Python ~ (invert operator) to Tensor.bitwise_not().
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22326

Test Plan: Imported from OSS

Differential Revision: D16183577

Pulled By: colesbury

fbshipit-source-id: f86838c407db4ded9ce70998bf1ab1ffd75b3b58
2019-07-10 12:17:52 -07:00
574e808680 Add a bitwise NOT operator for integer and Boolean types (CUDA).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22320

Test Plan: Imported from OSS

Differential Revision: D16183578

Pulled By: colesbury

fbshipit-source-id: 2f72cce5e10fd637be1ac87e1bbfe0937a661034
2019-07-10 12:17:48 -07:00
e2dc1fc715 Add a bitwise NOT operator for integer and Boolean types (CPU).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22283

Test Plan: Imported from OSS

Differential Revision: D16183576

Pulled By: colesbury

fbshipit-source-id: 2e539fab8ff885dddb9bff334d1d784b28d65b8f
2019-07-10 12:17:44 -07:00
mal
58e20638f7 Refactoring _wrap_outputs to remove python dependence.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22631

Test Plan:
test suite

Imported from OSS

Differential Revision: D16185040

fbshipit-source-id: 9b83749f6c9cd05d13f54a3bb4801e263293252b
2019-07-10 12:12:16 -07:00
ec1b669d23 fix dce over loops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22632

Test Plan: Imported from OSS

Differential Revision: D16184469

Pulled By: suo

fbshipit-source-id: b7cc2d20a7dd8b287e1b6128ddb70d3936032a7e
2019-07-10 12:03:19 -07:00
9b8d771733 skip import nccl and gloo_gpu in cpu machine (#22522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22522

Skip importing nccl and gloo_gpu modules in cpu machine

Reviewed By: bddppq

Differential Revision: D16115827

fbshipit-source-id: 329b7a0bb5eccb78c9e772bdab5db7c79b546d55
2019-07-10 11:56:56 -07:00
b984b0ab4b fix print (#22689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22689

att

Reviewed By: Lucaskabela

Differential Revision: D16184260

fbshipit-source-id: 1a6ad51a37918d0c81d6e3baa0ca0baa32cb9673
2019-07-10 11:26:34 -07:00
f81395b3e3 Enable more passes in ProfilingGraphExecutor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22079

Differential Revision: D16119322

Pulled By: Krovatkin

fbshipit-source-id: 301fcc42d0e1f031d9de5bcd9679fb8c2d742fef
2019-07-10 10:44:18 -07:00
10c60b601a Added Bfloat16 tensor for cpu with very limited support (#21860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21860
ghimport-source-id: 5290755b63033cdfdeb911a4ecf4aa282b3db02d

Test Plan: Imported from OSS

Differential Revision: D15856091

Pulled By: izdeby

fbshipit-source-id: 54e7e17be1b5c5a2e80a41feaeaeba75dbb8108f
2019-07-10 09:08:52 -07:00
6eb3969ac7 keep requires_grad unchanged after converting bn to syncbn (#22569)
Summary:
After converting BN layers to SyncBN layers, the function will set all `requires_grad = True` regardless of the original requires_grad states. I think it is a bug and have fixed it in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22569

Differential Revision: D16151647

Pulled By: zou3519

fbshipit-source-id: e2ad1886c94d8882485e7fb8be51ad76469ecc67
2019-07-10 08:38:04 -07:00
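With the fix above, converting a partially frozen model should preserve each parameter's `requires_grad` state. A sketch using `torch.nn.SyncBatchNorm.convert_sync_batchnorm` (the conversion itself does not require a process group):

```python
import torch

bn = torch.nn.BatchNorm2d(3)
bn.weight.requires_grad_(False)  # freeze only the affine weight

sync = torch.nn.SyncBatchNorm.convert_sync_batchnorm(bn)
# The original per-parameter requires_grad states survive the conversion.
print(sync.weight.requires_grad, sync.bias.requires_grad)  # False True
```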
cbb0b8166d Revert D16161144: [pytorch][PR] Add traces to LowerGradOf and SpecializeAutoGrad
Differential Revision:
D16161144

Original commit changeset: 9e206fcfb179

fbshipit-source-id: 8f9eecb5cd6ca715bd0c647c32cf77cd9d88e6ac
2019-07-10 06:55:01 -07:00
3a8d7463bd Enabled BFloat16 storage (#21523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21523
ghimport-source-id: 698b3cbd6b21c09b9ff8bf8011980df8e35c33b0

Test Plan: Imported from OSS

Differential Revision: D15819368

Pulled By: izdeby

fbshipit-source-id: f6b3bba7b3ca8ee677bd80a231dbb3920c07d61c
2019-07-09 21:51:06 -07:00
932ec8aa9f Updating submodules
Reviewed By: zpao

fbshipit-source-id: f5636ab0457c1b2e15df95a5677a7194978d9cd0
2019-07-09 21:39:57 -07:00
e72b617eb5 Introducing bfloat16 type (#21522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21522
ghimport-source-id: 4803f197ec04938501fdb10c1741280331c349d2

Test Plan: Imported from OSS

Differential Revision: D15819369

Pulled By: izdeby

fbshipit-source-id: 46408dc316a5c4dc644a736dc42da2422b34bcb9
2019-07-09 21:14:10 -07:00
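The bfloat16 dtype introduced here is a 16-bit float with float32's exponent range but an 8-bit mantissa. A minimal sketch of creating and inspecting a bfloat16 CPU tensor (the values chosen are exactly representable in bfloat16):

```python
import torch

t = torch.tensor([1.5, 2.25], dtype=torch.bfloat16)
print(t.dtype)                # torch.bfloat16
print(t.element_size())       # 2 bytes per element
print(t.float().tolist())     # [1.5, 2.25] — exact for these values
```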
de5a481c6e add forward declaration in stateful dataset (#22562)
Summary:
Addresses a potential dependency issue by adding forward declarations for OutputArchive/InputArchive.

This change follows the same pattern as 'torch/csrc/api/include/torch/data/samplers/base.h'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22562

Differential Revision: D16161524

Pulled By: soumith

fbshipit-source-id: d03f8a2ece5629762f9fa8a27b15b0d037e8f07b
2019-07-09 16:41:56 -07:00
3cf5f22f02 Enable C2 operators running with {cpu, gpu} * {forward, backward} (#22664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22664

This diff enables c2 operators to run the combination of {cpu, gpu} * {forward, backward}.

Reviewed By: hl475

Differential Revision: D15781789

fbshipit-source-id: e9843e3c46ea144042829860638d406f6a33792b
2019-07-09 16:41:53 -07:00
95a5da175d change c2 bench to use new tensor creation interface (#22663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22663

as title

Reviewed By: hl475

Differential Revision: D15744502

fbshipit-source-id: 441ab9fb7580ca87c3f2027d0a63ba18b8d35016
2019-07-09 16:41:49 -07:00
e1fdf8a46f Add comments about adding new build options. (#22641)
Summary:
Also reverts the change to cmake.py in
c97829d7011bd59d662f6af9c3a0ec302e7e75fc. The comments are added to
prevent similar incidents in the future (which have occurred a couple of times in the past).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22641

Differential Revision: D16171763

Pulled By: ezyang

fbshipit-source-id: 5a65f9fbb3c1c798ebd25521932bfde0ad3d16fc
2019-07-09 16:41:46 -07:00
e2216ada65 Properly formats errors rising up from C++ extension compilation (#22445)
Summary:
Here's a C++ extension with a missing semicolon:
```python
torch.utils.cpp_extension.load_inline('test', 'int main() { return 0 }')
```
which currently generates this error
```
RuntimeError: Error building extension 'test_v6': b'[1/2] c++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test_v6 -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
-isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o\nFAILED: main.o \nc++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test_v6 -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o\n/tmp/torch_extensions/test/main.cpp: In
function \xe2\x80\x98int main()\xe2\x80\x99:\n/tmp/torch_extensions/test/main.cpp:2:23:
error: expected \xe2\x80\x98;\xe2\x80\x99 before \xe2\x80\x98}\xe2\x80\x99 token\n int
main() { return 0 }\n                       ^\nninja: build stopped: subcommand failed.\n'
```

After this PR, the error is
```
RuntimeError: Error building extension 'test': [1/2] c++ -MMD -MF main.o.d -
DTORCH_EXTENSION_NAME=test -DTORCH_API_INCLUDE_EXTENSION_H -isystem
/opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o
FAILED: main.o
c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=test -
DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-
packages/torch/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-
packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC
 -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c
/tmp/torch_extensions/test/main.cpp -o main.o
/tmp/torch_extensions/test/main.cpp: In function ‘int main()’:
/tmp/torch_extensions/test/main.cpp:2:23: error: expected ‘;’ before ‘}’ token
 int main() { return 0 }
                       ^
ninja: build stopped: subcommand failed.
```
which is a lot easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22445

Differential Revision: D16094205

Pulled By: ezyang

fbshipit-source-id: 21043344aac260dc3e4e04d6a42898507bb840e4
2019-07-09 16:41:42 -07:00
50901be9fb Add traces to LowerGradOf and SpecializeAutoGrad
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22599

Differential Revision: D16161144

Pulled By: Krovatkin

fbshipit-source-id: 9e206fcfb1796e9448e80f178b75d0c277bd348f
2019-07-09 16:41:39 -07:00
0c2cd93e43 Avoid potential extra copy in _lu_with_info_cuda (#22634)
Summary:
No need to `clone` if the expanded size matches original size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22634

Differential Revision: D16171091

Pulled By: ezyang

fbshipit-source-id: 3d8f116398f02952488e321c0ee0ff2868768a0c
2019-07-09 16:41:36 -07:00
45aad2e680 change unary, pool, max ops to use new interface (#22661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22661

as title

Reviewed By: hl475

Differential Revision: D16170825

fbshipit-source-id: d80944224b8717e7aa35980907ff48e587b85217
2019-07-09 16:41:32 -07:00
2b2fe525b9 introduce a new interface to add a list of operators (#21209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21209

This diff introduces a new interface to add a list of operators. Here are the steps to add ops using this interface:

- create an op_list:
```
unary_ops_list = op_bench.op_list(
    attr_names=["op_name", "op_function"],
    attrs=[
        ["abs", torch.abs],
        ["abs_", torch.abs_],
    ],
)
```
- create a benchmark class:
```
class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, op_function):
        self.input_one = torch.rand(M, N)
        self.op_func = op_function

    def forward(self):
        return self.op_func(self.input_one)
```
- register those ops:
```
op_bench.generate_pt_tests_from_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)
```

Reviewed By: zheng-xq

Differential Revision: D15514188

fbshipit-source-id: f09b359cab8175eeb8d51b3ad7bbbcfbc9f6430f
2019-07-09 16:41:29 -07:00
164388150a fix lint (#22654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22654

att

Reviewed By: bddppq

Differential Revision: D16168219

fbshipit-source-id: db1a5e2161e7be70b2f6e6b4beaa27ea91f853f2
2019-07-09 16:41:26 -07:00
8a233b99cb Report errors through call stack (#22280)
Summary:
The error for `test_error_stack_module`:

```
Traceback (most recent call last):
  File "../test.py", line 35, in <module>
    scripted = torch.jit.script(M())
  File "/home/davidriazati/other/pytorch/torch/jit/__init__.py", line 1119, in script
    return _convert_to_script_module(obj)
  File "/home/davidriazati/other/pytorch/torch/jit/__init__.py", line 1825, in _convert_to_script_module
    raise e
RuntimeError:

d(int x) -> int:
Expected a value of type 'int' for argument 'x' but instead found type 'str'.
:
at ../test.py:11:12
def c(x):
    return d("hello") + d(x)
           ~ <--- HERE

'c' is being compiled since it was called from 'b'
at ../test.py:14:12
def b(x):
    return c(x)
           ~~~ <--- HERE

'b' is being compiled since it was called from 'forward'
at ../test.py:22:16
    def forward(self, x):
        return b(x)
               ~~~ <--- HERE

'forward' is being compiled since it was called from 'forward'
at ../test.py:31:20
    def forward(self, x):
        return x + self.submodule(x)
                   ~~~~~~~~~~~~~~~~ <--- HERE
```

This also unifies our error reporting in the front end with `ErrorReport`

TODO
* Include module names in message, #22207 should make this easy

https://our.intern.facebook.com/intern/diff/16060781/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22280

Pulled By: driazati

Differential Revision: D16060781

fbshipit-source-id: c42968b53aaddb774ac69d5abbf7e60c23df8eed
2019-07-09 16:41:22 -07:00
13d58fd9f5 README for the quantized op creation (#22165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22165

Workflow description for the quantized ops design.

Reviewed By: jerryzh168

Differential Revision: D15975977

fbshipit-source-id: ef73b172f609adef149c157c404bb452b5457a9f
2019-07-09 16:41:19 -07:00
dd4982e287 Cleanup integer sign warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22560

Test Plan: Imported from OSS

Differential Revision: D16151479

Pulled By: bwasti

fbshipit-source-id: a54139d8f95ed964530862f96723e4365724b2da
2019-07-09 16:41:16 -07:00
046c4589df lu: When not using pivoting, return the identity permutation instead of zeros (#22242)
Summary:
Some of my qpth users have told me that updating to the latest version of PyTorch and replacing the btrifact/btrisolve calls with the LU ones wasn't working and I didn't believe them until I tried it myself :)

These updates have broken unpivoted LU factorizations/solves on CUDA. The LU factorization code used to return the identity permutation when pivoting wasn't used but now returns all zeros as the pivots. This PR reverts it back to return the identity permutation. I've not yet tested this code as I'm having some trouble compiling PyTorch with this and am hitting https://github.com/pytorch/pytorch/issues/21700 and am not sure how to disable that option.

Here's a MWE to reproduce the broken behavior, and my fix.

```python
torch.manual_seed(0)

n = 4
L = torch.randn(n,n)
A = L.mm(L.t()).unsqueeze(0)
b = torch.randn(1, n)

A_lu_cpu = torch.lu(A)
A_lu_cuda_nopivot = torch.lu(A.cuda(), pivot=False)
A_lu_cuda_pivot = torch.lu(A.cuda(), pivot=True)
print('A_lu_cuda_nopivot\n', A_lu_cuda_nopivot)
print('-----\nA_lu_cuda_pivot\n', A_lu_cuda_nopivot)

x_cpu = b.lu_solve(*A_lu_cpu)
x_cuda_nopivot = b.cuda().lu_solve(*A_lu_cuda_nopivot)
x_cuda_nopivot_fixed = b.cuda().lu_solve(
    A_lu_cuda_nopivot[0], torch.arange(1, n+1, device='cuda:0').int())
x_cuda_pivot = b.cuda().lu_solve(*A_lu_cuda_pivot)

print(x_cpu, x_cuda_nopivot, x_cuda_nopivot_fixed, x_cuda_pivot)
```

Output:

```
A_lu_cuda_nopivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

-----

A_lu_cuda_pivot
 (tensor([[[ 2.8465, -0.7560,  0.8716, -1.7337],
         [-0.2656,  5.5724, -1.1316,  0.6678],
         [ 0.3062, -0.2031,  1.4206, -0.5438],
         [-0.6091,  0.1198, -0.3828,  1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32))

(tensor([[-0.3121, -0.1673, -0.4450, -0.2483]]),
 tensor([[-0.1661, -0.1875, -0.5694, -0.4772]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'),
 tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22242

Differential Revision: D16049334

Pulled By: ezyang

fbshipit-source-id: 7eacae810d87ffbdf8e07159bbbc03866dd9979d
2019-07-09 11:16:50 -07:00
7fcfed19e7 Fix interpreter lines for files with python2-only syntax.
Reviewed By: lisroach

Differential Revision: D15362271

fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5
2019-07-09 10:51:43 -07:00
5040d52a5a torch.quantization conversion utilities, observers for eager mode quantization (#22010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22010

torch.quantization module with observers and conversion routines

Reviewed By: zafartahirov

Differential Revision: D15554183

fbshipit-source-id: 05a3fabe28dd701978b8ecebf5bfc3a4c044ba5c
2019-07-09 10:51:38 -07:00
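The `torch.quantization` module introduced here provides observers that record tensor statistics used to derive quantization parameters. A minimal sketch using `MinMaxObserver` (names per the current public API; a simple per-tensor observer):

```python
import torch

obs = torch.quantization.MinMaxObserver()   # defaults: quint8, per-tensor affine
obs(torch.tensor([-1.0, 0.0, 2.0]))         # observe a tensor; records min/max

scale, zero_point = obs.calculate_qparams()
print(obs.min_val.item(), obs.max_val.item())  # -1.0 2.0
print(scale.item() > 0)                        # True
```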
073fa6f411 add GRAPH_UPDATE logging to guard_elimination.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22497

Differential Revision: D16165106

Pulled By: Krovatkin

fbshipit-source-id: aeb48d81d92c71f7038903b1656d760b6b95c562
2019-07-09 10:09:35 -07:00
a3346e100e Performance improvements for depthwise convolutions in FP16 (#22302)
Summary:
This PR activates faster depthwise convolution kernels for Volta and Turing GPUs using cudnn >= 7600.
The script to benchmark the current PyTorch master branch and this PR branch can be found [here](https://gist.github.com/ptrblck/4590cf20721d8f43296c9903abd4a774).
(50 warmup iterations, 1000 iterations for timing)

I've used https://github.com/pytorch/pytorch/issues/3265 to create a similar benchmark and added a few additional setups.
Since the results are quite long, I've uploaded them in a spreadsheet [here](https://docs.google.com/spreadsheets/d/13ByXcqg7LQUr3DVG3XpLwnJ-CXg3GUZJ3puyTMw9n2I/edit?usp=sharing).
Times are given in ms per iteration.
We've benchmarked this PR on a DGX1 using V100 GPUs.

The current workload check in `check_cudnn_depthwise_workload` is quite long and can be moved to another file, if wanted.

CC ngimel (Thanks for the support while benchmarking it ;) )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22302

Differential Revision: D16115057

Pulled By: ezyang

fbshipit-source-id: bad184658518e73b4d6b849d77e408f5a7a757de
2019-07-09 07:28:31 -07:00
31d821e267 Move thnvrtc and DynamicLibrary to ATen (#22362)
Summary:
Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking https://github.com/pytorch/pytorch/pull/22229.

`DynamicLibrary` is also moved as it is used in the stub code, and seems general enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22362

Differential Revision: D16131787

Pulled By: ezyang

fbshipit-source-id: add2ee8a8865229578aa00001a00d5a6671e0e73
2019-07-09 07:28:27 -07:00
74883d4865 Fix typos in gradcheck error message
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22357

Differential Revision: D16065935

Pulled By: ezyang

fbshipit-source-id: f131655eaca27f9df4cd6c511faabf0b8f2bf0de
2019-07-09 07:12:56 -07:00
92e8d04098 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 488
LARGE: 29
XXLARGE: 2

Updated actions:
From MEDIUM to LARGE: 227
From XLARGE to MEDIUM: 1
From XLARGE to LARGE: 1
From XLARGE to XXLARGE: 1
From LARGE to MEDIUM: 2
From LARGE to XLARGE: 2

Differential Revision: D16161669

fbshipit-source-id: 67a4e0d883ca3f1ca3185a8285903c0961537757
2019-07-09 05:24:19 -07:00
9fb4386c14 Add a higher-level log traces to DCE
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22511

Differential Revision: D16153694

Pulled By: Krovatkin

fbshipit-source-id: 5edbc04bdccfa89f5ad0bf37f51e1bd2cb28962a
2019-07-08 21:55:02 -07:00
5395db22a4 Typo fixed in documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22600

Differential Revision: D16156989

Pulled By: mrshenli

fbshipit-source-id: e491b083d872eaceb829028dadbab2e28ecfc785
2019-07-08 19:29:07 -07:00
b5860b5572 torchvision version changed to the latest one (#22598)
Summary:
Version changed to 487c9bf4b7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22598

Differential Revision: D16155471

Pulled By: ifedan

fbshipit-source-id: 2d54883c91add28c0f076858f292363eb95340a9
2019-07-08 17:13:59 -07:00
738aba171b use caffe2_dnnlowp_force_slow_path in FC (#22143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22143

Like the Conv DNNLOWP operator, allow FC to run the slow path to debug numerical issues caused by Intel's int8 instruction that does horizontal addition of two int8 multiplication results in 16 bits

Reviewed By: hx89

Differential Revision: D15966885

fbshipit-source-id: c6726376a3e39d341fd8aeb0e54e0450d2af8920
2019-07-08 17:01:04 -07:00
905b9a89b2 fix uninitialized variable in BailOutInserter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22596

Differential Revision: D16156883

Pulled By: Krovatkin

fbshipit-source-id: 8926262a2d3115f34400c9ebb0c98c540e1cc623
2019-07-08 16:45:51 -07:00
c97829d701 Adding FC and Relu QNNPACK ops to C10 registry (#22174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22174

This is a preliminary change outlining the approach we plan to follow to integrate QNNPACK operators into the pytorch backend. The operators will not be made visible to the user in the python world, so ultimately we will have a function that calls qnnpack backend based on the environment being run on.

The goal of the project is to integrate QNNPACK library with PyTorch to achieve good performance for quantized mobile models.

Reviewed By: ljk53

Differential Revision: D15806325

fbshipit-source-id: c14e1d864ac94570333a7b14031ea231d095c2ae
2019-07-08 14:21:42 -07:00
0e7b65e48a Convert VariableVersion counter to intrusive_ptr, saving a memory allocation on every Tensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22514

Differential Revision: D16114714

fbshipit-source-id: 441043d18938710869b64cb67884f49cd6060727
2019-07-08 13:40:58 -07:00
0c1ecf19e1 Simplify the flow control in div_kernel_cuda. (#22555)
Summary:
Some duplicated code is removed. It also becomes clear that there is only one special case `div_kernel_cuda` is handling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22555

Differential Revision: D16152091

Pulled By: zou3519

fbshipit-source-id: bb875370077c1f84efe4b766b3e1acc461e73e6c
2019-07-08 12:13:10 -07:00
478d480d37 Add Module.requires_grad_ (#22576)
Summary:
addresses https://github.com/pytorch/pytorch/issues/20241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22576

Differential Revision: D16149314

Pulled By: zou3519

fbshipit-source-id: 1cc4c1ec084df30e00e9ae73ce1a53494a034d5c
2019-07-08 12:13:07 -07:00
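`Module.requires_grad_` toggles `requires_grad` on every parameter of a module in place, mirroring `Tensor.requires_grad_`. A minimal sketch of freezing and unfreezing a module:

```python
import torch

model = torch.nn.Linear(4, 2)

model.requires_grad_(False)  # freeze all parameters in place
frozen = all(not p.requires_grad for p in model.parameters())

model.requires_grad_(True)   # unfreeze them again
unfrozen = all(p.requires_grad for p in model.parameters())

print(frozen, unfrozen)  # True True
```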
456d27dff0 Update module.h (Fix a grammatical error of the comment in line 233) (#22548)
Summary:
Fix a grammatical error of the comment in line 233.
change from " Returns an `OrderedDict` of he submodules of this `Module`"
to " Returns an `OrderedDict` of the submodules of this `Module`"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22548

Differential Revision: D16134534

Pulled By: zou3519

fbshipit-source-id: 33b1dd0fbc3a24bef99b6e0192566e2839292842
2019-07-08 11:51:49 -07:00
3a12520844 Pass Variable into Caffe2 ops, by requiring that the Variable doesn't require grad (#22473)
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing extra shallow copy, to improve performance and also allow for in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.

Specifically, we can have the assumption that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variable into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use the `torch::NoGradGuard` to make sure gradients are not tracked.

This supersedes https://github.com/pytorch/pytorch/pull/22418.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473

Differential Revision: D16099042

Pulled By: yf225

fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
2019-07-08 11:31:10 -07:00
304552b61a Enabled masked fill and scatter for bool tensors (#22491)
Summary:
Enabled masked_fill, masked_fill_, masked_scatter_, masked_scatter on bool tensors.
Tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22491

Differential Revision: D16108817

Pulled By: izdeby

fbshipit-source-id: 8b1808f41d7a4f65fe6d3797a3c83b2dac3446c7
2019-07-08 10:49:40 -07:00
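A minimal sketch of the newly enabled ops on Boolean tensors — `masked_fill_` fills positions where the mask is `True`, while `masked_scatter_` copies source elements into those positions in order:

```python
import torch

t = torch.zeros(4, dtype=torch.bool)
mask = torch.tensor([True, False, True, False])

t.masked_fill_(mask, True)   # in-place fill where mask is True
print(t.tolist())            # [True, False, True, False]

src = torch.tensor([True, True])
out = torch.zeros(4, dtype=torch.bool).masked_scatter_(mask, src)
print(out.tolist())          # [True, False, True, False]
```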
a48cf8f52d Fixed RNNImplBase reset and flat_weights methods to handle bidirectional flag correctly (#22493)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/19545:
Changed torch/csrc/api/src/nn/modules/rnn.cpp to be consistent with torch/nn/modules/rnn.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22493

Differential Revision: D16111433

Pulled By: pbelevich

fbshipit-source-id: edfa41e8a9889d64918998dc7c46b8763fdf5765
2019-07-08 10:34:04 -07:00
595e344769 Add type stubs to import 'nn' modules (#22411)
Summary:
Forgot to mirror the `nn/__init__.py` semantics in the new `nn` type stub.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22411

Differential Revision: D16149798

Pulled By: ezyang

fbshipit-source-id: 0ffa256fbdc5e5383a7b9c9c3ae61acd11de1dba
2019-07-08 09:22:37 -07:00
9bafe5d4da Fix torch.normal with CUDA tensors (#22533)
Summary:
`addcmul_out` overwrote the samples, which led to constant values being output by `torch.normal`.

Changelog:
- Replace the `addcmul_out` calls with combo of inplace `mul` and `add` and justification for this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22533

Test Plan:
- Enable tests for test_normal on all devices

Fixes https://github.com/pytorch/pytorch/issues/22529

Differential Revision: D16141337

Pulled By: ezyang

fbshipit-source-id: 567a399042e0adcd154582f362318ce95a244c62
2019-07-08 08:27:38 -07:00
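After the fix, `torch.normal` with tensor mean/std arguments yields properly spread-out samples instead of a constant. A quick sanity sketch (CPU shown for portability; the regression itself was CUDA-specific):

```python
import torch

torch.manual_seed(0)
samples = torch.normal(torch.zeros(1000), torch.ones(1000))

# Samples should vary rather than collapse to one constant value,
# with statistics close to the requested N(0, 1).
print(samples.unique().numel() > 1)   # True
print(abs(samples.mean().item()) < 0.5, samples.std().item() > 0.5)
```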
80e2fab952 Deprecate and set a date for removing NO_* and WITH_* (user) build options (#22474)
Summary:
Currently specifying different build options in respect to the "USE_"
series is in quite a disarray. There are a lot of build options that
accept three variants: USE_OPTION, WITH_OPTION, and NO_OPTION. Some
build options only accept USE_ and NO_ variant. Some accept only USE_.
This inconsistency is quite confusing and hard to maintain.

To resolve this inconsistency, we can either let all these build options
support all three variants, or we only support the USE_ variant.

This commit makes a step to the latter choice, i.e., deprecates and sets
a date for removing the NO_ and WITH_ variants and keeps only the
USE_ variant. This is likely better than the former solution because:

- NO_ and WITH_ variants are not documented.
- CMakeLists.txt only has the USE_ variants for relevant build options
  defined. It would be a surprise that when user pass these variables to
  CMake during rebuild and find them ineffective.
- Multiple variants are difficult to maintain.
- The behavior is confusing if more than one variant is passed. For
  example, what to be expected if one sets "NO_CUDA=1 USE_CUDA=1"?

The downside is that this will break backward compatibility for existing
build scripts in the future (if they used the undocumented build
options).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22474

Differential Revision: D16149396

Pulled By: ezyang

fbshipit-source-id: 7145b88ad195db2051772b9665dd708dfcf50b7d
2019-07-08 08:22:08 -07:00
43d36415b9 torch.utils.data.Dataloader: documentation about RNG state consumption (#22540)
Summary:
The outcome of the PyTorch forum thread: https://discuss.pytorch.org/t/dataloader-problem-problem-arises-when-shuffle-true/45631

The discussion is here: https://github.com/pytorch/pytorch/pull/20749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22540

Differential Revision: D16131777

Pulled By: ezyang

fbshipit-source-id: 566deda1b44dc7fae54250e9b508d120851a2848
2019-07-08 08:22:04 -07:00
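The behavior being documented: creating a `DataLoader` iterator with `shuffle=True` draws from the global RNG, so reseeding reproduces the same shuffle order. A minimal sketch:

```python
import torch
from torch.utils.data import DataLoader

data = list(range(5))

torch.manual_seed(0)
first = [batch.item() for batch in DataLoader(data, shuffle=True)]

torch.manual_seed(0)  # restore the RNG state consumed by the first loader
second = [batch.item() for batch in DataLoader(data, shuffle=True)]

print(first == second)        # True: same seed, same shuffle order
print(sorted(first) == data)  # True: a permutation of the dataset
```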
ce8c9d9bd5 Fix cuda detection script (#22527)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22507
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22527

Differential Revision: D16126220

Pulled By: ezyang

fbshipit-source-id: eb05141282b0f058324da1b3d3cb34566f222a67
2019-07-08 07:06:59 -07:00
d4464d3418 Use system locale in collect_env.py (#22579)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22570.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22579

Differential Revision: D16147304

Pulled By: soumith

fbshipit-source-id: db73447bffbfdf54f7b830447d4b9584f363f05f
2019-07-07 20:55:31 -07:00
d48cbd62cd Fix spectral_norm load_state_dict with strict=False (#22545)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21251

also fixes some missing hook removals.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22545

Differential Revision: D16139506

Pulled By: soumith

fbshipit-source-id: 552a9f9f91be328a47ee8f1e1d29c1f59b0ebca3
2019-07-07 19:08:48 -07:00
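With the fix, loading a spectrally normalized module's state dict with `strict=False` should no longer fail inside the spectral-norm load hook. A sketch (the `weight_orig` name is the parameter spectral norm registers internally):

```python
import torch

src = torch.nn.utils.spectral_norm(torch.nn.Linear(4, 4))
dst = torch.nn.utils.spectral_norm(torch.nn.Linear(4, 4))

# strict=False previously raised; now the hooks handle the partial load.
dst.load_state_dict(src.state_dict(), strict=False)
print(torch.equal(dst.weight_orig, src.weight_orig))  # True
```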
94bd5ddf7f Add some essentials for building c++ extensions on Windows (#22563)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22489.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22563

Differential Revision: D16142615

Pulled By: ezyang

fbshipit-source-id: d7c27a874f788dd27065fad6699485e4a6372ec4
2019-07-06 19:29:25 -07:00
9db7bc8bc7 fix uninitialized variable warning (#22477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22477

There is actually no use of uninitialized variable but some compilers are not smart enough to reason about two if branches are already taken together.

Reviewed By: hx89

Differential Revision: D16100211

fbshipit-source-id: 25f01d668063603d7aaa776451afe8a10415d2ea
2019-07-06 00:36:45 -07:00
42c6ea5faa ONNX Export Topk with Dynamic k (+ add test cases)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21104

Differential Revision: D16061592

Pulled By: houseroad

fbshipit-source-id: 855b310a138fdde9c25869ffe9f127189dc2eaf5
2019-07-05 23:46:36 -07:00
221af09ca7 Move GradMode / AutoGradMode / NoGradGuard to ATen core (#18573)
Summary:
After the Variable/Tensor merge, code paths in ATen need to be able to check whether a tensor requires gradient, and throw errors in places where a `requires_grad=true` tensor is not allowed (such as https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Utils.h#L76-L78 and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/SparseTensorImpl.cpp#L86). Since the `GradMode` thread-local variable controls whether a tensor should accumulate gradients, we need to be able to check this variable from ATen when we determine whether a tensor requires gradient, hence the PR to move `GradMode` / `AutoGradMode` / `NoGradGuard` to ATen.

Note that we intentionally don't merge `at::GradMode` and `at::NonVariableTypeMode`, with the following reasoning:
Semantically, `at::GradMode` and `at::NonVariableTypeMode` actually mean different things: `at::GradMode` controls whether a tensor should accumulate gradients, and `at::NonVariableTypeMode` controls whether a Variable should be treated as a non-Variable tensor in type dispatches. There are places where we *don't* want the tensor to accumulate gradients, but *still* want the Variable to be treated as a Variable. Here is one example:
```python
#  torch/tensor.py
with torch.no_grad():
   ...
   new_tensor = self.new()    # `at::GradMode` is false at this point
   ...
```
```cpp
// tools/autograd/templates/python_variable_methods.cpp
static PyObject * THPVariable_new(PyObject* self, PyObject* args, PyObject* kwargs)
{
  ...
  // if we merge `at::GradMode` and `at::NonVariableTypeMode`, since `at::GradMode` is false and `self_.type()` checks `at::GradMode` to decide whether to return non-Variable type, it will return a non-Variable type here, which is not what we want (and throws a "Tensor that was converted to Variable was not actually a Variable" error)
  return THPVariable_Wrap(torch::utils::legacy_tensor_new(self_.type(), args, kwargs));
  ...
}
```
For the above reason, we cannot merge `at::GradMode` and `at::NonVariableTypeMode`, as they have different purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18573

Differential Revision: D16134413

Pulled By: yf225

fbshipit-source-id: 6140347e78bc54206506499c264818eb693cdb8a
2019-07-05 23:41:37 -07:00
39a4ec4141 Make device_of take tensor by const ref
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22556

Test Plan: Imported from OSS

Differential Revision: D16137540

Pulled By: jamesr66a

fbshipit-source-id: 8a6be6edd602c53edc954508ea27d8a6071bd964
2019-07-05 23:34:43 -07:00
7730346853 Make shuffling optional in DistributedSampler (#22479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22479

In some cases, for example, when we are training on CTR data, we would like to start training from old samples and finish on recent samples.

This diff adds the option to disable shuffling in DistributedSampler to accommodate this use case.
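A minimal pure-Python sketch of the resulting behavior (the helper name is hypothetical; the real `torch.utils.data.DistributedSampler` also handles epoch-based seeding and padding):

```python
import random

def partition_indices(num_samples, num_replicas, rank, shuffle, seed=0):
    """Return this replica's index list, optionally shuffling the global order first."""
    indices = list(range(num_samples))
    if shuffle:
        # All replicas must use the same seed so their shuffles agree.
        random.Random(seed).shuffle(indices)
    # Round-robin partition across replicas.
    return indices[rank::num_replicas]

# With shuffle disabled, each replica walks the dataset in its original order,
# so training can proceed from old samples toward recent ones.
print(partition_indices(8, num_replicas=2, rank=0, shuffle=False))  # [0, 2, 4, 6]
```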

Reviewed By: soumith

Differential Revision: D16100388

fbshipit-source-id: 35566581f5250040b2db5ec408a63037b47a9f5d
2019-07-05 18:56:28 -07:00
9e1ae4b264 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 8932f509ab9b14988428a1b9a42d3388ff5a4ad5
2019-07-05 18:36:03 -07:00
577042a3cc Better Constant Propagation through Tuples (#22561)
Summary:
Replaces https://github.com/pytorch/pytorch/pull/21501 because ghimport had errors when I tried to import the stack that I couldn't figure out :'(

It contains the two commits that were previously accepted plus the merge commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22561

Differential Revision: D16135743

Pulled By: eellison

fbshipit-source-id: f0a98842ccb334c7ceab04d1437e09dc76be0eb1
2019-07-05 18:06:46 -07:00
a09150adc0 Deprecate untyped Dicts (#22516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22516

Force anybody creating an untyped Dict to call c10::impl::deprecatedUntypedDict().
This should hopefully make it clear that this is not public API and prevent people from using it.

Differential Revision: D16115215

fbshipit-source-id: 2ef4cb443da1cdf4ebf5b99851f69de0be730b97
2019-07-05 18:00:13 -07:00
595d2dbb4d Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 1a6c249837151564f48f675ced6a221ec739aae9
2019-07-05 15:39:56 -07:00
91706d1044 Primitive Jit Logging
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22278

Differential Revision: D16134598

Pulled By: Krovatkin

fbshipit-source-id: e64b14d0d68801189fc78c059a4e8b322acce3fa
2019-07-05 15:27:38 -07:00
ed60d9fcf9 List/Dict remember and check their element type (#22005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22005

When a Dict or List is created with type information, it will remember that.
If, at any point later, the list is instantiated as a List&lt;T&gt; with a concrete type, it will assert that T is the correct type.
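The behavior can be illustrated with a small Python analogue (hypothetical names; the actual implementation lives in c10's C++ List/Dict classes):

```python
class TypedList:
    """Remembers its element type and asserts on later concrete instantiation."""
    def __init__(self, elem_type=None, items=()):
        self.elem_type = elem_type
        self.items = list(items)

    def to(self, concrete_type):
        # Instantiating to List<T>: T must match the remembered type, if any.
        if self.elem_type is not None and self.elem_type is not concrete_type:
            raise TypeError(f"expected {self.elem_type.__name__}, "
                            f"got {concrete_type.__name__}")
        self.elem_type = concrete_type
        return self

ints = TypedList(int, [1, 2, 3])
ints.to(int)    # fine: matches the remembered type
# ints.to(str)  # would raise TypeError
```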

Differential Revision: D15914462

fbshipit-source-id: a8c3d91cb6d28d0c1ac0b57a4c4c6ac137153ff7
2019-07-05 15:17:51 -07:00
mal
042da2171e Skip test_slogdet_sign if LAPACK library is not installed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22551

Test Plan:
ran test locally

Imported from OSS

Differential Revision: D16132182

fbshipit-source-id: 5b9efbf883efa66c4d8b7c400bdb804ac668a631
2019-07-05 11:57:24 -07:00
3507eaf3ea Add clone() implementation for QTensor (#22510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22510

Added a new function to implement the clone operation on quantized tensors. Also added a test case, which can be run as shown in the test plan.

This change is required to be able to call torch.jit.trace on quantized models.
Clone implementation calls copy_ on QTensor internally.

Differential Revision: D16059576

fbshipit-source-id: 226918cd475521b664ed72ee336a3da8212ddcdc
2019-07-05 11:24:53 -07:00
mal
0140a756d8 Prioritize reentrant tasks and execute them recursively until close to limit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22397

Test Plan:
Added a test for reentrant backwards with checkpointing, a test for a recursive backwards function (which should fail if we run all the reentrant tasks recursively in the same thread), and a test for the priority of reentrant tasks.
~~Will add a test for priority of reentrant tasks in future pr.~~

Imported from OSS

Differential Revision: D16131955

fbshipit-source-id: 18301d45c1ec9fbeb566b1016dbaf7a84a09c7ac
2019-07-05 08:51:06 -07:00
e5d640341f Set stream for softmax kernel launch (#22470)
Summary:
Currently, the **stream** parameter is not set when launching these two kernels: softmax_warp_forward() and softmax_warp_backward(), i.e. the kernels are always put on the default stream, which may fail to respect the stream that was set previously. Add **at::cuda::getCurrentCUDAStream()** as a launch argument to fix this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22470

Differential Revision: D16115051

Pulled By: izdeby

fbshipit-source-id: 38b27e768bb5fcecc1a06143ab5d63b0e68a279e
2019-07-05 07:33:55 -07:00
d2ceab2766 update video input (#22471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22471

update C2 video input with latest augmentation

Reviewed By: HengCV

Differential Revision: D16096127

fbshipit-source-id: bb07394e211cd52b50005d801b6d03250248ea9e
2019-07-05 00:56:33 -07:00
4ba1c4f798 Add the support of handle Bias being nullptr for torch.ops.quantized.fbgemm_conv (#22472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22472

As Title says.

Reviewed By: dskhudia, bddppq

Differential Revision: D16097594

fbshipit-source-id: 7f56b7906dd9c2792e21a8aa553c0b8d05b19012
2019-07-04 19:37:37 -07:00
57dbc79674 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22543

Test Plan: Imported from OSS

Differential Revision: D16127755

Pulled By: suo

fbshipit-source-id: 5f6ba507bf5b671987e2cabf03c2118271800595
2019-07-04 18:26:04 -07:00
b952011bae use save/load for emitFunctionHook
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22314

Test Plan: Imported from OSS

Differential Revision: D16121781

Pulled By: suo

fbshipit-source-id: b376afb082726d78f082a0ff6902c4b501435d4b
2019-07-04 17:12:16 -07:00
bc24593a80 remove unused argument in import.cpp (#22205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22205
ghimport-source-id: afdf13f6515a1352cde4cbb7b45bf01e717bcf4d

Test Plan: Imported from OSS

Differential Revision: D15998763

Pulled By: suo

fbshipit-source-id: 6e5c1c668b9de5e352d2aa7ca27197deb42ca94b
2019-07-04 17:12:12 -07:00
4b9b7d6f03 improvements to QualifiedName (#22204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22204
ghimport-source-id: 319afc622f7137ca9075efefca1a05acedc19a4a

Test Plan: Imported from OSS

Differential Revision: D15998759

Pulled By: suo

fbshipit-source-id: 4534443aef61255af0fa3d2ed1be5e87266e2f2c
2019-07-04 17:12:08 -07:00
f5919dba45 refactoring of module/object (#22203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22203
ghimport-source-id: 6b3807ac8aa53df2fdd770b43d8e54b8f0d69c20

Test Plan: Imported from OSS

Differential Revision: D15998760

Pulled By: suo

fbshipit-source-id: dd51edbcb66561189ae9d94a129434092bcad01b
2019-07-04 17:12:04 -07:00
3b2844eeea Make CompilationUnit own Functions (#22202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22202
ghimport-source-id: de6c963af1df76d2d6357155e64a5913ab879f76

Test Plan: Imported from OSS

Differential Revision: D15998761

Pulled By: suo

fbshipit-source-id: 5414a6424953738d823b265d20dc67dde6e5b2d8
2019-07-04 17:12:00 -07:00
66378c7025 make test context managers exception safe
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22502

Test Plan: Imported from OSS

Differential Revision: D16121783

Pulled By: suo

fbshipit-source-id: e5f991b189261f596355541cc331ef92667bd1a5
2019-07-04 17:11:56 -07:00
2b06e7cd50 add #pragma once to jit headers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22505

Differential Revision: D16119310

Pulled By: Krovatkin

fbshipit-source-id: 8b742411f40d66690ce28726c213741e0c2de618
2019-07-04 11:10:59 -07:00
6f6a680481 remove erase_fork_wait.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22509

Differential Revision: D16119307

Pulled By: Krovatkin

fbshipit-source-id: 3251f594be6d365652b847bdc35dd4f4b62c35e6
2019-07-03 22:47:57 -07:00
799633e4cd move casting ops from prim to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22275

Test Plan: Imported from OSS

Differential Revision: D16060597

Pulled By: wanchaol

fbshipit-source-id: a11d8ad3b037e15bd670cc7cd3fefd4f0abd0bba
2019-07-03 22:22:28 -07:00
97a604ef57 Rereapply optional ScalarType interface changes that were reverted in D16079809 (#22456)
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412

Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but seems to currently be incompatible with jit serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456

Differential Revision: D16097159

Pulled By: nairbv

fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
2019-07-03 20:03:25 -07:00
10c4b98ade Remove weak script (#22212)
Summary:
* Deletes all weak script decorators / associated data structures / methods
   * In order to keep supporting the standard library in script, this enables recursive script on any function defined in `torch.nn`
   * Most changes in `torch/nn` are the result of `ag -Q "weak" torch/nn/ -l | xargs sed -i '/weak/d'`, only `rnn.py` needed manual editing to use the `ignore` and `export` to continue supporting the overloaded `forward` methods
* `Sequential`/`ModuleList` no longer need to be added to constants since they are compiled on demand

This should also fix https://github.com/pytorch/pytorch/issues/22212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22212

Differential Revision: D15988346

Pulled By: driazati

fbshipit-source-id: af223e3ad0580be895377312949997a70e988e4f
2019-07-03 17:28:25 -07:00
b93f29ded3 add JIT path to the benchmark (#22309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22309

This diff enables PT operators to run with JIT mode. Users can control eager and JIT mode using the `use_jit` flag.

In this diff, we are putting operators in a loop and passing it to JIT. One extra step, which wraps the operator with the `_consume` op, is introduced to avoid JIT's dead code elimination optimization. With that, the reported time includes the real operator execution time plus the `_consume` op (which directly returns its input; nothing else happens inside).

Reviewed By: zheng-xq

Differential Revision: D16033082

fbshipit-source-id: e03be89fd5a505e44e81015dfc63db9cd76fb8a1
2019-07-03 17:18:03 -07:00
29ec4769bb Fix SyncBatchNorm running var update issue (#22248)
Summary:
## Fix https://github.com/pytorch/pytorch/issues/22192

+ change signature: `func: batch_norm_gather_stats(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, Tensor counts) -> (Tensor, Tensor)`
+ change cuda & cuda head
```cuda
// before:
std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(
    const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean,
    const Tensor& running_var, double momentum, double epsilon, int64_t count);
// after:
std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(
    const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean,
    const Tensor& running_var, double momentum, double epsilon, const Tensor& counts);
```
+ change python interface
```python
class SyncBatchNorm(Function):
    def forward(self, input, weight, bias, running_mean, running_var, eps, momentum, process_group, world_size):
        ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22248

Differential Revision: D16002146

Pulled By: mrshenli

fbshipit-source-id: 9007e83928267b89df4d3847aabfbdb63e456956
2019-07-03 17:17:59 -07:00
325ec2327f create tensor based on provided datatype (#22468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22468

as title

Reviewed By: ajauhri

Differential Revision: D15744503

fbshipit-source-id: 050b32dd7f135512385fc04f098c376c664211a9
2019-07-03 17:08:23 -07:00
319ef3bcbb Fix onnx custom op export & add initial test case (#21321)
Summary:
- Fix typo in ```torch/onnx/utils.py``` when looking up registered custom ops.
- Add a simple test case
    1. Register custom op with ```TorchScript``` using ```cpp_extension.load_inline```.
    2. Register custom op with ```torch.onnx.symbolic``` using ```register_custom_op_symbolic```.
    3. Export model with custom op, and verify with Caffe2 backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21321

Differential Revision: D16101097

Pulled By: houseroad

fbshipit-source-id: 084f8b55e230e1cb6e9bd7bd52d7946cefda8e33
2019-07-03 16:59:12 -07:00
9c44f6c723 generate tests based on op metadata (#21432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21432

This diff introduce a new interface to generate tests based on the metadata of operators.

Reviewed By: ajauhri

Differential Revision: D15675542

fbshipit-source-id: ba60e803ea553d8b9eb6cb2bcdc6a0368ef62b1c
2019-07-03 16:48:41 -07:00
2732a5e534 Another dce fix (#22499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22499

Another place where onnx export is running dead code elimination after making the jit graph invalid. Fixing it.

Reviewed By: houseroad

Differential Revision: D16111969

fbshipit-source-id: 5ba80340c06d091988858077f142ea4e3da0638c
2019-07-03 16:37:53 -07:00
d9e15bccb0 Perform weight re-init for embedding table in sparse_lookup.py (#22348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22348

This is the last step of LRU hash eviction weight re-init. This diff checks whether there are evicted values in sparse_lookup and, if so, calls the op created in D15709866 to re-init the values at the indices in evicted_values. Also created a gradient op for the operator; the gradient op just passes the output gradient through as the input gradient.

Reviewed By: itomatik

Differential Revision: D16044736

fbshipit-source-id: 9afb85209b0de1038c5153bcb7dfc5f52e0b2abb
2019-07-03 10:33:40 -07:00
c9f41e9bc0 Add device guard around MPI operations (#22446)
Summary:
If the current CUDA device is not the same as the device that hosts
the tensor the operation works on then OpenMPI will segfault, as
reported in https://github.com/pytorch/pytorch/issues/21922. This change adds a device guard for every
operation to ensure the correct device is set.

Fixes https://github.com/pytorch/pytorch/issues/21922.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22446

Differential Revision: D16106823

Pulled By: pietern

fbshipit-source-id: 99d762eb3851c0a0e0b4fe81cf27c1c8d35596cc
2019-07-03 02:01:53 -07:00
abb2e68989 Don't construct a single element array for unary ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22469

Test Plan: Imported from OSS

Differential Revision: D16105794

Pulled By: jamesr66a

fbshipit-source-id: 6bb5a6703c8dba3cda20a6db192d2a15711751a1
2019-07-02 23:29:56 -07:00
7fef0b7b72 Take const refs in TensorIterator::mark_outputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22465

Test Plan: Imported from OSS

Differential Revision: D16105795

Pulled By: jamesr66a

fbshipit-source-id: 22945fc3f02f8450ae6b92492a0f7baad80c5cb5
2019-07-02 23:29:52 -07:00
0d63619414 Deprecate vector/unordered_map again (#22478)
Summary:
The deprecation was temporarily removed in https://github.com/pytorch/pytorch/pull/21999 because it threw warnings on our codebase.

https://github.com/pytorch/pytorch/pull/22162 fixed these internal call sites so we can now re-enable the deprecation.

I checked locally that this PR doesn't add any new warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22478

Differential Revision: D16101342

Pulled By: smessmer

fbshipit-source-id: 378eb208f6a3dd3a28d2efe2e001fd71a9569297
2019-07-02 21:59:05 -07:00
17cc79865d Fix dead code elimination in onnx export (#22476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22476

Dead code elimination assumes a valid jit graph because it checks if operators have side effects.
The onnx export path destroys the jit graph right before calling dead code elimination, but it actually doesn't care about side effects.
We can just call dead code elimination and disable side effect lookup and things should work.

Reviewed By: houseroad

Differential Revision: D16100172

fbshipit-source-id: 8c790055e0d76c4227394cafa93b07d1310f2cea
2019-07-02 21:28:57 -07:00
76e14c1e51 remove caffe2/core dependency from ATen/core/jit_type.h (#22441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22441

This include doesn't seem to be needed. Remove it to simplify mobile build dependency.

Reviewed By: dreiss

Differential Revision: D16088224

fbshipit-source-id: f6aec21655e259726412e26a006d785912436c2a
2019-07-02 21:07:59 -07:00
e210c65097 Add torch.where overload with only condition argument (#21986)
Summary:
Requested in https://github.com/pytorch/pytorch/issues/21798
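The single-argument overload returns the indices at which the condition is true, one index sequence per dimension. A pure-Python sketch of that semantics for a 2-D mask (illustrative only; the real op returns a tuple of index tensors):

```python
def where(condition):
    """Single-argument where: per-dimension index lists of the true entries."""
    rows, cols = [], []
    for i, row in enumerate(condition):
        for j, val in enumerate(row):
            if val:
                rows.append(i)
                cols.append(j)
    return rows, cols

mask = [[True, False],
        [False, True]]
print(where(mask))  # ([0, 1], [0, 1])
```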
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21986

Differential Revision: D16081577

Pulled By: zhangguanheng66

fbshipit-source-id: 658c0f451b833aceb1a41ee424c7990eec00bc02
2019-07-02 18:18:15 -07:00
dcd902bdde provide "size" parameter in torch.normal when called with two floats (#20545)
Summary:
This has been requested in https://github.com/pytorch/pytorch/issues/20323

(It is still not exactly the same as NumPy, which allows you to pass tensors at mean/std and broadcast them with size, but the present PR is extremely simple and does the main thing people are asking for)
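A sketch of what the new overload provides, using only the standard library (the helper is hypothetical; the real op fills a tensor of the requested size on the requested device/dtype):

```python
import random

def normal(mean, std, size, seed=None):
    """Draw a flat list of `size` samples from N(mean, std)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(size)]

samples = normal(0.0, 1.0, size=1000, seed=42)
print(len(samples))  # 1000
```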
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20545

Differential Revision: D15358736

Pulled By: zhangguanheng66

fbshipit-source-id: 762ea5eab5b8667afbac2df0137df017ba6e413c
2019-07-02 18:18:11 -07:00
bb0f299f27 Update MultiheadAttention module support key/value with different number of features and allow static key/value (#21288)
Summary:
The changes include:

1. Allow key/value to have different number of features with query. It supports the case when key and value have different feature dimensions.
2. Support three separate proj_weights, in addition to a single in_proj_weight. The proj_weight of key and value may have a different dimension from that of query, so three separate proj_weights are necessary. In case key and value have the same dimension as query, it is preferred to use a single large proj_weight for performance reasons. It should be noted, however, that using a single large weight or three separate weights is a size-dependent decision.
3. Give an option to use static k and v in the multihead_attn operator (see saved_k and saved_v). Those static key/value tensors can now be re-used when training the model.
4. Add more test cases to cover the arguments.

Note: current users should not be affected by the changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21288

Differential Revision: D15738808

Pulled By: zhangguanheng66

fbshipit-source-id: 288b995787ad55fba374184b3d15b5c6fe9abb5c
2019-07-02 18:06:25 -07:00
d684112ec9 Output sequence probability with CTC beam search, optional multiple output sequences (#21927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21927

Add `OUTPUT_PROB` output to CTCBeamSearchDecoderOp to return a probability for each sequence.

Add argument to output top-k instead of top-1 decoded sequences.

Reviewed By: SuperIRabbit

Differential Revision: D15797371

fbshipit-source-id: 737ca5cc4f90a0bcc3660ac9f58519a175977b69
2019-07-02 17:29:13 -07:00
830c6590ef EraseNumberTypes cleans itself up (#22461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22461

We shouldn't call dead code elimination after EraseNumberTypes because dead code elimination assumes a valid jit graph which EraseNumberTypes just broke.
Let's have it clean up after itself instead.

Reviewed By: houseroad

Differential Revision: D16094656

fbshipit-source-id: f2752277d764e78ab276c57d56b2724b872b136f
2019-07-02 17:06:53 -07:00
a6441c00d6 Remove build variable NCCL_EXTERNAL (#22467)
Summary:
It's always set to equal USE_NCCL; we made Gloo depend on the Caffe2 NCCL
build. See 30da84fbe1614138d6d9968c1475cb7dc459cd4b
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22467

Differential Revision: D16098581

Pulled By: ezyang

fbshipit-source-id: f706ec7cebc2e6315bafca013b669f5a72e04815
2019-07-02 15:36:44 -07:00
34f950c800 Create C2 operator to replace values in embedding table (#22279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22279

This new operator is used for embedding table weight re-init. After we get the evicted indices, they are the rows that need resetting in the embedding table. We can then create a 1-D tensor with default values and apply this operator to copy that tensor to all evicted rows in the embedding table.

Will add gradient op in next diff

Reviewed By: itomatik

Differential Revision: D15709866

fbshipit-source-id: 2297b70a7326591524d0be09c73a588da245cc08
2019-07-02 15:26:22 -07:00
040a4bd914 include conv_op_impl.h from conv_dnnlowp_op.cc (#22458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22458

To make sure the templates get instantiated.

Reviewed By: jianyuh

Differential Revision: D16094183

fbshipit-source-id: 7861df0b303bec42ab80a53477c4b608edebb61d
2019-07-02 15:09:34 -07:00
474dec4b00 Warn on conditions that can trigger cuBLAS sgemm bug (#22034)
Summary:
The sgemm in cuBLAS 9.0 has some issues with sizes above 2M on Maxwell and Pascal architectures. Warn in this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22034

Differential Revision: D15949930

Pulled By: zhangguanheng66

fbshipit-source-id: 0af977ec7900c76328d23898071de9c23778ff8b
2019-07-02 15:09:31 -07:00
f5b3f9ecd9 Remove unnecessary ROCm detection code in Python scripts. (#22464)
Summary:
ROCm is already detected in cmake/public/LoadHIP.cmake. No need to
detect it twice. Plus, the Python script reads the environment variable
ROCM_HOME, but what is really used in the CMake scripts is ROCM_PATH -- a
user would have to get both environment variables right. Since ROCM_HOME is
undocumented, this commit completely eradicates it.

 ---

ezyang A remake of https://github.com/pytorch/pytorch/issues/22228 because its dependency has been dismissed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22464

Differential Revision: D16096833

Pulled By: bddppq

fbshipit-source-id: fea461e80ee61ec77fa3a7b476f7aec4fc453d5d
2019-07-02 14:46:03 -07:00
e68dc899d1 Fix compiler warnings (#22162)
Summary:
Fix various compiler warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22162

Differential Revision: D16085339

Pulled By: smessmer

fbshipit-source-id: d36a4b334315f1a5942cac46443a7d166ca36d0d
2019-07-02 14:12:55 -07:00
53a52f574f infer shape until no more change (#22425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22425

Currently, in bound_shape_inference.cc: InferBoundShapeAndType, we first infer ops in order and then infer the inputs of concat in reverse order. In the tiny version of ctr_instagram_model, concat is right before FC, so we can infer the inputs for concat. But in the production version, we found there are some ops between concat and FC (or other ops whose shapes we know), so the shapes of these ops cannot be inferred.
This diff is a temporary solution for this problem: infer shapes in order and in reverse order repeatedly until nothing changes.
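The fixed-point loop looks roughly like this schematic Python (the pass names and toy shape map are illustrative, not the C2 code):

```python
def infer_until_stable(shapes, passes, max_iters=100):
    """Repeatedly run inference passes until the shape map stops changing."""
    for _ in range(max_iters):
        before = dict(shapes)
        for p in passes:        # e.g. a forward pass then a reverse pass
            p(shapes)
        if shapes == before:    # fixed point: no pass learned anything new
            break
    return shapes

# Toy passes: each can only propagate one hop, so a single sweep is not enough.
def pass_c_from_b(s):
    if 'b' in s:
        s.setdefault('c', s['b'])

def pass_b_from_a(s):
    if 'a' in s:
        s.setdefault('b', s['a'])

shapes = infer_until_stable({'a': (2, 3)}, [pass_c_from_b, pass_b_from_a])
print(shapes)  # {'a': (2, 3), 'b': (2, 3), 'c': (2, 3)}
```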

Reviewed By: yinghai, ipiszy

Differential Revision: D16082521

fbshipit-source-id: d5066509368029c6736dce156030adf5c38653d7
2019-07-02 13:13:55 -07:00
07ef85e326 Add USE_MKLDNN_CBLAS build option. (#19014)
Summary:
MKL-DNN is the main library for computation when we use ideep device. It can use kernels implemented by different algorithms (including JIT, CBLAS, etc.) for computation. We add the "USE_MKLDNN_CBLAS" (default OFF) build option so that users can decide whether to use CBLAS computation methods or not.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19014

Differential Revision: D16094090

Pulled By: ezyang

fbshipit-source-id: 3f0b1d1a59a327ea0d1456e2752f2edd78d96ccc
2019-07-02 12:29:54 -07:00
6d5871300b Use concrete types on call sites for Dict/List (#22004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22004

In future, we want all dicts/lists to store information about the types they contain.
This is only possible if the creation API doesn't allow creating lists/dicts without type information.
This diff removes some call sites that don't specify type information and makes them specify it.

Reviewed By: dzhulgakov

Differential Revision: D15906387

fbshipit-source-id: 64766a2534b52c221e8a5501a85eaad13812e7bd
2019-07-02 11:52:35 -07:00
693871ded3 Rename macros and build options NAMEDTENSOR_ENABLED to BUILD_NAMEDTENSOR (#22360)
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_").  This commit eradicate this issue before it
is made into a stable release.

The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.

 ---

Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360

Differential Revision: D16074509

Pulled By: zou3519

fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
2019-07-02 11:46:13 -07:00
bb07f2d063 Pass LRU hash output evicted_values to SparseLookup (#21389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21389

As titled. To do weight re-init on evicted rows in an embedding table, we need to pass the info about the evicted hashed values to SparseLookup, which is the layer responsible for constructing the embedding table and doing pooling.

To pass the evicted values, we need to adjust the output record of lru_sparse_hash to include them, and add an optional input to all processors that need to take in sparse segments. For SparseLookup to get the evicted values, its input record needs to be adjusted. Now the input record can have type IdList, IdScoreList, or a struct of feature + evicted values.

Reviewed By: itomatik

Differential Revision: D15590307

fbshipit-source-id: e493881909830d5ca5806a743a2a713198c100c2
2019-07-02 11:27:37 -07:00
869ce89474 use feenableexcept when glibc is available (#22241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22241

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20387

glibc has a non-standard function, feenableexcept, that triggers the floating-point exception handler. Compared to feclearexcept + fetestexcept, this approach allows us to see precisely where the exception is raised from the stack trace.

Reviewed By: jspark1105

Differential Revision: D15301095

fbshipit-source-id: 94f6e72456b2280f78d7d01c2ee069ae46d609bb
2019-07-02 10:49:55 -07:00
e74b0fc87c Fix empty_like for quantized tensors. (#21978)
Summary:
empty_like uses the tensor options of `self`, rather than the passed in tensor options.  This means it messes up variable/tensor types, and ignores specifications like different dtypes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21978

Differential Revision: D15903948

Pulled By: gchanan

fbshipit-source-id: f29946be01c543f888daef2e99fe928e7b7d9d74
2019-07-02 10:28:59 -07:00
7235532df3 Revert D16088193: Refactored math tests to iterate over all math ops
Differential Revision:
D16088193

Original commit changeset: 81b6e536b450

fbshipit-source-id: 81ee8857c3d5353955d75e05203cfbf2d3b8dacd
2019-07-02 10:18:46 -07:00
a845d02cd5 Revert D16088191: Added math.log2 and hypot
Differential Revision:
D16088191

Original commit changeset: 5d80c480243d

fbshipit-source-id: 12ea2617e3af5bf81b1f2a57f8633ca06a99db5b
2019-07-02 10:18:42 -07:00
2dd71b18c4 Fix PoolWindow crash from thread_local (#22405)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19394

See https://developercommunity.visualstudio.com/content/problem/124121/thread-local-variables-fail-to-be-initialized-when.html for context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22405

Differential Revision: D16090822

Pulled By: ezyang

fbshipit-source-id: 9fdd2c272fa7723fb62b906336d2e2620411b12b
2019-07-02 09:48:14 -07:00
7ca7edc307 ONNX Export LayerNorm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22265

Reviewed By: zrphercule

Differential Revision: D16076268

Pulled By: houseroad

fbshipit-source-id: 29b4ecab2fa0dc7250c9d1ad6924903181a66ab2
2019-07-02 09:37:07 -07:00
a4b2f3e213 Implement AdamW optimizer (#21250)
Summary:
# What is this?
This is an implementation of the AdamW optimizer as implemented in [the fastai library](803894051b/fastai/callback.py) and as initially introduced in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). It decouples the weight decay regularization step from the optimization step during training.

There have already been several abortive attempts to push this into pytorch in some form or fashion: https://github.com/pytorch/pytorch/pull/17468, https://github.com/pytorch/pytorch/pull/10866, https://github.com/pytorch/pytorch/pull/3740, https://github.com/pytorch/pytorch/pull/4429. Hopefully this one goes through.
# Why is this important?
Via a simple reparameterization, it can be shown that L2 regularization has a weight decay effect in the case of SGD optimization. Because of this, L2 regularization became synonymous with the concept of weight decay. However, it can be shown that the equivalence of L2 regularization and weight decay breaks down for more complex adaptive optimization schemes. It was shown in the paper [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101) that this is the reason why models trained with SGD achieve better generalization than those trained with Adam. Weight decay is a very effective regularizer. L2 regularization, in and of itself, is much less effective. By explicitly decaying the weights, we can achieve state-of-the-art results while also taking advantage of the quick convergence properties that adaptive optimization schemes have.
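The decoupling itself is small: the decay shrinks the parameter directly instead of being folded into the gradient. A minimal scalar sketch of one update (simplified from the paper's algorithm; names and hyperparameters are illustrative):

```python
import math

def adamw_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update on a scalar parameter."""
    state['t'] += 1
    b1, b2 = betas
    state['m'] = b1 * state['m'] + (1 - b1) * grad          # first moment
    state['v'] = b2 * state['v'] + (1 - b2) * grad * grad   # second moment
    m_hat = state['m'] / (1 - b1 ** state['t'])             # bias correction
    v_hat = state['v'] / (1 - b2 ** state['t'])
    p = p - lr * weight_decay * p   # decoupled decay: applied to p, not to grad
    return p - lr * m_hat / (math.sqrt(v_hat) + eps), state

# Minimize f(p) = p**2, whose gradient is 2 * p.
p, state = 5.0, {'t': 0, 'm': 0.0, 'v': 0.0}
for _ in range(200):
    p, state = adamw_step(p, 2 * p, state, lr=0.1)
```

With L2 regularization the decay term would instead be added to `grad` and then rescaled by the adaptive denominator, which is exactly the coupling the paper argues against.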
# How was this tested?
There were test cases added to `test_optim.py` and I also ran a [little experiment](https://gist.github.com/mjacar/0c9809b96513daff84fe3d9938f08638) to validate that this implementation is equivalent to the fastai implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21250

Differential Revision: D16060339

Pulled By: vincentqb

fbshipit-source-id: ded7cc9cfd3fde81f655b9ffb3e3d6b3543a4709
2019-07-02 09:09:10 -07:00
c9a8413306 Numerical stability of embedding kernels (#22401)
Summary:
Address the issue raised in https://github.com/pytorch/pytorch/issues/22377.

The PR https://github.com/pytorch/pytorch/issues/22016 introduces a temporary tensor of weights `grad_weight_per_segment` of the same dtype as the end result, which can be a problem when using `float16`.
In this PR, a `float32` temporary tensor is used instead when the input is `float16`.
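The underlying precision issue can be reproduced in pure Python using `struct`'s half-precision format code `'e'` (a sketch of the numerical failure mode, not the CUDA kernel):

```python
import struct

def fp16(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

expected = 4096 * fp16(0.001)

# Accumulating in fp16: every partial sum is rounded back to half
# precision, so once the sum grows large enough the small addends are
# partially or entirely lost.
acc16 = 0.0
for _ in range(4096):
    acc16 = fp16(acc16 + fp16(0.001))

# Accumulating in a wider type and rounding only once at the end keeps
# the per-step rounding error out of the loop.
acc32 = 0.0
for _ in range(4096):
    acc32 += fp16(0.001)
acc32 = fp16(acc32)
```

The wide-accumulator result lands within one half-precision ulp of the true sum, while the fp16 accumulator stalls well short of it.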

ngimel, can I get you to review? I think I have fixed the issues you have pointed out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22401

Differential Revision: D16077319

Pulled By: mrshenli

fbshipit-source-id: 7cfad7f40b4d41a244052baa2982ab51bbbd7309
2019-07-02 08:53:22 -07:00
b76877728a Added math.log2 and hypot
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21512

Test Plan: Imported from OSS

Differential Revision: D16088191

Pulled By: Chillee

fbshipit-source-id: 5d80c480243d2644c96df26337cf65918d79443e
2019-07-02 06:28:34 -07:00
3d3d07b7dd Refactored math tests to iterate over all math ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21511

Test Plan: Imported from OSS

Differential Revision: D16088193

Pulled By: Chillee

fbshipit-source-id: 81b6e536b4505178c829a9d925c30cd185b7a706
2019-07-02 06:28:30 -07:00
0ffda97aa4 Make Gloo an optional c10d dependency (#22257)
Summary:
The CMake modifications include removal of some unnecessary paths
(e.g. find_package(CUDA) and friends) that are no longer used since
c10d is always part of the larger torch build. The macro
`C10D_USE_...` was ambiguous and is now removed in favor of only
having top level `USE_...`. The c10d test suite is changed to include
skip annotations for the tests that depend on Gloo as well.

Now, if you compile with `USE_DISTRIBUTED=1` and `USE_GLOO=0` you get
a functioning build for which the tests actually pass.

Closes https://github.com/pytorch/pytorch/issues/18851.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22257

Differential Revision: D16087993

Pulled By: pietern

fbshipit-source-id: 0cea66bd5cbd9736b06fa1d45ee13a18cab88adb
2019-07-02 02:39:48 -07:00
b9ede6600e Remove the USE_MIOPEN build option as MIOpen is always used when built with ROCm. (#22420)
Summary:
Close https://github.com/pytorch/pytorch/issues/22200
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22420

Differential Revision: D16087538

Pulled By: bddppq

fbshipit-source-id: ecf3e7eb8213bb093e1c5290d096c233284a2ff9
2019-07-02 00:05:59 -07:00
6721e67c10 Remove hacky stub for quantized ops (#22388)
Summary:
Effectively reverts https://github.com/pytorch/pytorch/pull/18267 - this was a temporary measure and is not used any more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22388

Differential Revision: D16070725

Pulled By: dzhulgakov

fbshipit-source-id: ee5db11a608f248b0da981169d4cc90470fd482f
2019-07-01 23:21:42 -07:00
2dd1323379 Fix the GPU trainer for NoneCalibration and RNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22385

Reviewed By: Wakeupbuddy

Differential Revision: D16053190

fbshipit-source-id: 6304c5c51f33691c201c78d4c921a9c250d9b4f5
2019-07-01 22:55:18 -07:00
5bd97be309 Fix lint error in format_time() in throughput_benchmark.py and clean it up a bit. (#22424)
Summary:
The `assert False` lint error has been causing CI to fail:

    ./torch/utils/throughput_benchmark.py:14:13: B011 Do not call assert False since python -O removes these calls. Instead callers should raise AssertionError().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22424

Differential Revision: D16083464

Pulled By: bddppq

fbshipit-source-id: 6d96e36c8fcbb391d071b75fe79c22d526c1ba3c
2019-07-01 22:15:37 -07:00
edd5b770be Remove API-level guard on NeuralNetworks.h (#22429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22429

Android NDK r20 removes the guard `(__ANDROID_API__ <= __ANDROID_API_O_MR1__)`, so we do it here also. There is insufficient reason to keep these decls undefined for earlier API levels. NDK r15 and earlier don't even define `__ANDROID_API_O_MR1__`, so the preprocessor defaults it to 0 and the guard evaluates as TRUE.

Reviewed By: smeenai, hlu1

Differential Revision: D16084105

fbshipit-source-id: f0857b3eb0573fe219f0d6c5e6583f89e2b5518f
2019-07-01 22:09:11 -07:00
de84104059 Lint ONNX Related Code (#22423)
Summary:
Lint the code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22423

Differential Revision: D16086518

Pulled By: houseroad

fbshipit-source-id: c6e5143f42c73a70beeaa2e089df4164f6265c32
2019-07-01 21:44:16 -07:00
ffa15d2285 Load original SourceRanges on import (#22180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22180
ghimport-source-id: efa46dcb845c099f0a746f523901ab2c2cd3b004

Test Plan: Imported from OSS

Differential Revision: D15981425

Pulled By: jamesr66a

fbshipit-source-id: bef682bd13c1a5be95bdb97e025690c6f2d523d3
2019-07-01 21:14:39 -07:00
2c2a913a4f Preserve SourceRanges across serialization (#22179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22179
ghimport-source-id: 9879551127da09d78ca348b9e436db5a09a92a38

Test Plan: Imported from OSS

Differential Revision: D15981423

Pulled By: jamesr66a

fbshipit-source-id: a2506f5a2f05916b6e8226841b0229110e758671
2019-07-01 21:14:35 -07:00
e05942c09b Serialization methods for SourceRange and Source (#22178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22178
ghimport-source-id: 85ca4d4454c6d4b57a82f211004c4bb712d1c980

Test Plan: Imported from OSS

Differential Revision: D15981426

Pulled By: jamesr66a

fbshipit-source-id: f81f5ee3b66fc4a0d4a708b8109712b5df9f241a
2019-07-01 21:14:31 -07:00
671782d88a Refactor file:line:col to be less ugly (#22177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22177
ghimport-source-id: e35f068c2d39bd8fa2058a9bfc0b1a3856f9383d

Test Plan: Imported from OSS

Differential Revision: D15981424

Pulled By: jamesr66a

fbshipit-source-id: b7748c5cfd4f8ea594314cb601a2b8045173700a
2019-07-01 21:14:28 -07:00
dff2c07183 Manual revert of D16012838
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22412

Reviewed By: nairbv, houseroad

Differential Revision: D16079809

fbshipit-source-id: ee0d805ff7a2bc5f98bcc65f90b8199751c840f6
2019-07-01 19:58:21 -07:00
2c18bf21be Fix ScriptModule.__dir__() (#22426)
Summary:
`_method_names` is on `_c`, so `dir(script_module)` calls previously
didn't work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22426

Differential Revision: D16085330

Pulled By: driazati

fbshipit-source-id: 6f9f1bef5da4306c0f26aa0be1bcec6dd3a6f0fb
2019-07-01 19:33:14 -07:00
f0f2331a1c Add support for cross-chunk shuffling in ChunkDataset (#22347)
Summary:
This change adds advanced support for cross-chunk shuffling.

For training with a static dataset, the default configuration is sufficient. However, in some use cases new data is added to the dataset over each epoch, so the dataset's size grows dynamically. To mix the new data with the old for better random sampling, one approach is to shuffle examples drawn from more than one chunk. This change adds that support: by specifying `cross_chunk_shuffle_count_` at construction, advanced users can control how many chunks examples are shuffled across.
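The pooling idea can be sketched as follows (names are illustrative, not the C++ `ChunkDataset` API):

```python
import random

def cross_chunk_examples(chunks, cross_chunk_shuffle_count, seed=0):
    """Pool `cross_chunk_shuffle_count` chunks at a time and shuffle the
    pooled examples together before yielding them (sketch only)."""
    rng = random.Random(seed)
    for i in range(0, len(chunks), cross_chunk_shuffle_count):
        # Flatten the next group of chunks into one pool...
        pool = [ex for chunk in chunks[i:i + cross_chunk_shuffle_count]
                for ex in chunk]
        # ...so examples from old and new chunks are interleaved.
        rng.shuffle(pool)
        yield from pool
```

With a count of 1 this degenerates to per-chunk shuffling, which matches the default static-dataset behavior.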
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22347

Differential Revision: D16081378

Pulled By: zhangguanheng66

fbshipit-source-id: fd001dfb9e66947839adecfb9893156fbbce80d0
2019-07-01 19:13:34 -07:00
1f9c4fdb5e split onnx passes (#22413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22413

_jit_pass_erase_number_types invalidates the jit graph but parts of _jit_pass_onnx rely on having a valid jit graph.

This splits _jit_pass_onnx into _jit_pass_onnx_remove_print and _jit_pass_onnx_preprocess_caffe2 (which rely on the valid jit graph), runs these before _jit_pass_erase_number_types,
and then runs the rest of _jit_pass_onnx after _jit_pass_erase_number_types.

Reviewed By: houseroad

Differential Revision: D16079890

fbshipit-source-id: ae68b87dced077f76cbf1335ef3bf89984413224
2019-07-01 18:16:53 -07:00
a54acd3755 Update the way boolean tensor are being printed (#22238)
Summary:
When a boolean tensor is printed, the dtype no longer needs to be specified.

Example:
```
>> x = torch.tensor([[True, True, True], [True, True, True]])
>> print(x)
tensor([[True, True, True],
        [True, True, True]])

>> x = torch.tensor([True])
>> print(x)
tensor([True])

>> x = torch.tensor(True)
>> print(x)
tensor(True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22238

Differential Revision: D15996304

Pulled By: izdeby

fbshipit-source-id: 5699acf3e00abca8a2bbb5384f8271eeb063dce7
2019-07-01 18:04:42 -07:00
cbf572671d update mkldnn-bridge to avoid mem leak (#22392)
Summary:
fix the memory leak issue exposed by https://github.com/pytorch/pytorch/issues/21537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22392

Test Plan: {F164886124}

Reviewed By: yinghai

Differential Revision: D16074150

Pulled By: bddppq

fbshipit-source-id: b70192aad3d531f349fea5d2d477b827715a2363
2019-07-01 17:12:48 -07:00
402b9f9a6d add PT chunk op to the benchmark (#22409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22409

as title

Reviewed By: hl475

Differential Revision: D16079031

fbshipit-source-id: 109060ffc953f2357b2783b13f9b9dc87bd3f98a
2019-07-01 16:37:05 -07:00
8a726f5815 add PT split op to the benchmark (#22410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22410

as title

Reviewed By: hl475

Differential Revision: D16078705

fbshipit-source-id: 29e1cc19d0e93a561d07c47e5678a311e6de3e3b
2019-07-01 16:37:01 -07:00
8281909e73 add PT cat operator to the benchmark (#22404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22404

as title

Reviewed By: hl475

Differential Revision: D16078395

fbshipit-source-id: 4ff5c558036af1dce6ac0001a1a1fc3a373a981f
2019-07-01 16:36:57 -07:00
007fd01e9b Enable PT operators running with {cpu, gpu} * {forward, backward} (#22416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22416

This diff tests the combinations of cpu/gpu and forward/backward paths for the PT add operator.

Reviewed By: hl475

Differential Revision: D15770792

fbshipit-source-id: 38cc648361d2501d774db407f988c3cb5115b2ae
2019-07-01 16:30:58 -07:00
dfa6fca1c6 Supporting Manifold DB in Predictor Exporter (#22334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22334

Improve the function signatures of save_to_db and load_from_db in predictor_exporter.

Reviewed By: akyrola

Differential Revision: D16047208

fbshipit-source-id: a4e947f86e00ef3b3dd32c57efe58f76a38fcec7
2019-07-01 16:17:02 -07:00
30fedeae4a Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 26112fb218995b292bb28e65332f6259b3c289f6
2019-07-01 15:51:30 -07:00
10e4137396 Optimize InstanceNormGradientOp (#22288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22288

Optimize InstanceNormGradientOp

Benchmarks:

CPU with [N, C, H, W] = [128, 256, 56, 56],
NCHW order: 616ms -> 128ms
NHWC order: 1612ms -> 174ms

GPU with [N, C, H, W] = [128, 256, 112, 112],
NCHW order: 6450ms -> 37ms
NHWC order: 1419ms -> 82ms

Reviewed By: houseroad

Differential Revision: D16023630

fbshipit-source-id: 5af9bf1103cde2fc2bcb5cd5a057d039732f052e
2019-07-01 15:10:17 -07:00
d0348c0ef9 ThroughputBenchmark: improve formatting for ExecutionStats (#22293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22293

Just wrapping the C++ class with a nicer Python interface; you can now
just print it directly to get all the data. Later we can add various
visualizations there.

Differential Revision: D16023999

fbshipit-source-id: 8436e37e36965821a690035617784dcdc352dcd1
2019-07-01 14:24:34 -07:00
d0db2a76a0 PyTorch ThroughputBenchmark: fix inaccuracy in number of iterations reporting (#22292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22292

Since we do an atomic fetch_add to check whether a thread should
finish, we should not count the terminating iteration. As a result, the
total number of iterations is exactly the same as what the user sets
via config.num_iters.
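The claim-then-check pattern can be sketched as below (a sketch of the termination logic only; a lock stands in for the C++ atomic, and the names are illustrative):

```python
import itertools
import threading

def run_iters(num_iters, num_threads, work):
    """Each thread atomically claims the next iteration index and runs
    `work` only if the index is still in range, so exactly `num_iters`
    iterations execute across all threads."""
    counter = itertools.count()
    lock = threading.Lock()          # stands in for atomic fetch_add
    executed = [0]

    def worker():
        while True:
            with lock:
                i = next(counter)    # claim an index
            if i >= num_iters:
                return               # the claiming read itself is not work
            work(i)
            with lock:
                executed[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed[0]
```

Counting the out-of-range claim as an iteration is exactly the off-by-`num_threads` inaccuracy this commit removes.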

Now when running a unit test I see exact number of iterations reported

Differential Revision: D16023963

fbshipit-source-id: 3b12ee17276628ecd7b0979f28cd6deb777a1543
2019-07-01 14:24:29 -07:00
813b01e4a8 Use at::AutoNonVariableTypeMode before calling ATen tensor factory functions (#22364)
Summary:
As part of the Variable/Tensor merge, one invariant for tensor libraries such as ATen / Caffe2 / XLA is that they should only deal with Tensors, not Variables. However, currently in `variable_factories.h` we are potentially passing Variables into those tensor libraries without the `at::AutoNonVariableTypeMode` guard, which will cause those libraries to treat those Variables as Variables (i.e. their `is_variable()` is true), not Tensors.

Consider the following example for `full_like`:
```cpp
inline at::Tensor full_like(const at::Tensor & self, at::Scalar fill_value) {
  ...
  // Both ATen and XLA rely on `at::full_like` to dispatch to library specific implementations.
  //
  // When `self` is a Variable, since we are not using `at::AutoNonVariableTypeMode`,
  // `at::full_like` will also use `self` as a Variable (and it will see that `self.is_variable()` is true),
  // which breaks the invariant that ATen / XLA should never deal with Variables.
  at::Tensor tensor = at::full_like(self, fill_value, self.options().is_variable(false));
  at::Tensor result =
    autograd::make_variable_consuming(std::move(tensor), /*requires_grad=*/false);
  ...
  return result;
}
```

Instead, the invariant-preserving implementation would be:
```cpp
inline at::Tensor full_like(const at::Tensor & self, at::Scalar fill_value) {
  ...
  at::Tensor tensor = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    // Both ATen and XLA rely on `at::full_like` to dispatch to library specific implementations.
    //
    // When `self` is a Variable, since we have `at::AutoNonVariableTypeMode` in the scope,
    // `at::full_like` will use `self` as a Tensor (and it will see that `self.is_variable()` is false),
    // which preserves the invariant that ATen / XLA should only deal with Tensors.
    return at::full_like(self, fill_value, self.options().is_variable(false));
  })();
  at::Tensor result =
    autograd::make_variable_consuming(std::move(tensor), /*requires_grad=*/false);
  ...
  return result;
}
```
This PR makes the suggested change for all variable factory functions.

cc. ailzhang This should allow us to remove all `tensor_data()` calls in the XLA codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22364

Differential Revision: D16074862

Pulled By: yf225

fbshipit-source-id: 3deba94b90bec92a757041ec05d604401a30c353
2019-07-01 14:08:28 -07:00
d632b1ff3c Expose is_mkldnn to python and register it as torchscript prim op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22386

Differential Revision: D16074722

Pulled By: bddppq

fbshipit-source-id: b9b2a05a894847640084f063fba68d9db4e6aec1
2019-07-01 12:31:59 -07:00
2ab6ff42d1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: c9d3be641389f3c45a9e5a65280d8bfd20e38ea0
2019-07-01 12:25:21 -07:00
577c04c490 add mutation support for forward_pre_hook and forward_hook (#22285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22285

Previously, forward hooks were expected to return None. This PR adds support for overwriting the input and output in `forward_pre_hook` and `forward_hook`; this is used to insert quant/dequant function calls around forward functions.
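The dispatch logic can be sketched with a toy module (the real logic lives in `nn.Module.__call__`; the class and attribute names here are illustrative):

```python
class MiniModule:
    """Toy sketch of hook dispatch with mutation support."""
    def __init__(self):
        self.pre_hooks = []
        self.hooks = []

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        for hook in self.pre_hooks:
            result = hook(self, x)
            if result is not None:    # non-None return overwrites the input
                x = result
        out = self.forward(x)
        for hook in self.hooks:
            result = hook(self, x, out)
            if result is not None:    # non-None return overwrites the output
                out = result
        return out
```

A "quant" pre-hook could then return a quantized input, and a "dequant" forward hook a dequantized output, while hooks that return None keep the old behavior.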

Differential Revision: D16022491

fbshipit-source-id: 02340080745f22c8ea8a2f80c2c08e3a88e37253
2019-07-01 11:06:42 -07:00
f7421b82ad Remove versions constraints from external_deps (#22113)
Summary:
As per attached tasks, these are noops and are being deprecated/removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22113

Reviewed By: philipjameson

Differential Revision: D15901131

fbshipit-source-id: 3acf12208f692548afe4844be13717a49d74af32
2019-07-01 10:55:30 -07:00
bfeff1eb8f Stubs for torch.nn (#19089)
Summary:
Closes https://github.com/pytorch/pytorch/issues/18724
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19089

Differential Revision: D16073654

Pulled By: ezyang

fbshipit-source-id: 5642179651ce45ab7c5a46cc1fcc4fd6b37fa71c
2019-07-01 09:50:17 -07:00
a43d9af52c Comment on why Windows build_pytorch.bat builds twice (#22363)
Summary:
I've noticed that Windows CI seems to build twice, e.g., https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-build/60304/console

This adds a comment explaining why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22363

Differential Revision: D16073609

Pulled By: zou3519

fbshipit-source-id: ddb422b7c7e18cc436caff2c5838373a82f69429
2019-07-01 09:45:01 -07:00
451c907a47 Adding qconv unpack operator for serialization (#22354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22354

qconv weight unpack operator

Reviewed By: zafartahirov, jianyuh

Differential Revision: D16059668

fbshipit-source-id: b068b1a13bcf6a9148d864db384db780d474bfbf
2019-07-01 09:39:14 -07:00
f894ef7263 Add smoke test for information fn/method/attrs to test_namedtensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22341

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 22341 gh/zou3519/66/head

Imported from OSS

Differential Revision: D16053440

Pulled By: zou3519

fbshipit-source-id: 400f2e1c136cd7db4346a42b58813e42595ca755
2019-07-01 07:24:54 -07:00
496e35f76b More named inference rules for pointwise unary ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22308

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 22308 gh/zou3519/65/head

Imported from OSS

Differential Revision: D16053441

Pulled By: zou3519

fbshipit-source-id: 2e8d4cc11d7a711d2b789752a316a11fffc0996e
2019-07-01 07:24:51 -07:00
2a698682e4 Remove Type dispatch (#21964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21964
ghimport-source-id: fdfb555ac4efbf31ae7d2c700a5aa44ad0cc4d7f

Test Plan: Imported from OSS

Differential Revision: D15897424

Pulled By: li-roy

fbshipit-source-id: 3cd6744254e34d70e6875ffde749b5cf959b663c
2019-06-30 04:11:35 -07:00
6c454ff14c Stop using Type in Python bindings (#21963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21963
ghimport-source-id: 4d9d66ba2c8587503d892b67f535cc2a62e2d19e

Test Plan: Imported from OSS

Differential Revision: D15897423

Pulled By: li-roy

fbshipit-source-id: 2dd55ceb80971df7c86545b7bfff733387f13572
2019-06-30 04:11:32 -07:00
9c8f9f0ecb Remove many usages of Type (#21941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21941
ghimport-source-id: f20cca6229daba9eb8652adb3d959266ae081ef1

Test Plan: Imported from OSS

Differential Revision: D15893331

Pulled By: li-roy

fbshipit-source-id: c988b16008ff0e2725a88c6025afd4aabdaca45a
2019-06-30 04:11:28 -07:00
3cba9e8aaa Error Message Paraphrasing (#22369)
Summary:
Saying `I` in an error message is too subjective to be used in a framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369

Differential Revision: D16067712

Pulled By: soumith

fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6
2019-06-30 00:13:02 -07:00
41e51ce142 Fix QNNPACK and NNPACK settings (#22367)
Summary:
`setup.py` recommends setting `USE_QNNPACK=0` and `USE_NNPACK=0` to disable building QNNPACK and NNPACK respectively. However this wasn't reflected correctly because we were looking for `NO_QNNPACK` and `NO_NNPACK`. This PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22367

Differential Revision: D16067393

Pulled By: soumith

fbshipit-source-id: 6491865ade9a6d41b7a79d68fd586a7854051f28
2019-06-29 21:15:59 -07:00
d8de69d621 Adds symbolic op for logsumexp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22306

Differential Revision: D16046027

Pulled By: houseroad

fbshipit-source-id: 7319fd58321220941250c5b8eff024914798e392
2019-06-29 00:09:06 -07:00
b52621c870 Revise error message for invalid Reduction (#22160)
Summary:
Say the user passes reduction=False. Of course, we can't concatenate a bool and a string, so constructing the ValueError message itself raises a TypeError, which is more confusing to the user. Instead, we should use string formatting. I would use `f"{reduction} is not..."` but am unsure whether we are OK with using f-strings.
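A minimal sketch of the formatting fix (the function name and message here are illustrative, not the exact `nn.functional` code):

```python
def check_reduction(reduction):
    """Validate a reduction argument.  str.format copes with non-string
    inputs such as False, whereas string concatenation would raise its
    own TypeError inside the error path."""
    valid = ('none', 'mean', 'sum')
    if reduction not in valid:
        raise ValueError("{} is not a valid value for reduction".format(reduction))
    return reduction
```

With concatenation, `check_reduction(False)` would die with `TypeError: can only concatenate str ...` instead of the intended ValueError.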
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22160

Differential Revision: D15981826

Pulled By: soumith

fbshipit-source-id: 279f34bb64a72578c36bdbabe2da83d2fa4b93d8
2019-06-28 22:37:04 -07:00
9e18234109 Automatic update of fbcode/onnx to 806aa863020fa180e57f576cb032ec44ce8ddcca (#22359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22359

Previous import was 355a4954ea4e5836a5e943589509951c44feb6b4

Included changes:
- **[806aa863](https://github.com/onnx/onnx/commit/806aa863)**: Expose ONNX_ML build option to python (#2138) <bddppq>
- **[8f6e60db](https://github.com/onnx/onnx/commit/8f6e60db)**: Missing newline fix   (#2128) <Chris Seymour>
- **[d94f99d2](https://github.com/onnx/onnx/commit/d94f99d2)**: Avoid unnecessary copies of names by checker (#2098) <Scott McKay>
- **[01f77251](https://github.com/onnx/onnx/commit/01f77251)**: update qlinear conv test (#2120) <Ashwini Khade>
- **[1f0c13d3](https://github.com/onnx/onnx/commit/1f0c13d3)**: Add shape inference for LinearClassifier (#2077) <Hariharan Seshadri>
- **[eb798fcf](https://github.com/onnx/onnx/commit/eb798fcf)**: Fix inconsistency in describing graph's initializer. The initializer (#2115) <xykong58>

Reviewed By: bddppq, zrphercule

Differential Revision: D16061494

fbshipit-source-id: 6ccb63c135c27b307048aa42c11313675027ffb7
2019-06-28 22:22:24 -07:00
7cc8f37f56 Reduce needless copying when returning lists of tensors in the JIT interpreter. (#21690)
Summary:
This fixes the JIT performance gap reported in https://twitter.com/VahidK/status/1138677898439561216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21690

Differential Revision: D15783709

fbshipit-source-id: 23bb4acda6b60c27e95667e1d53c7d261a87167d
2019-06-28 19:00:05 -07:00
737f8a7638 Fix onnx passes (#22319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22319

The onnx pass replacing ints with Tensors produces an invalid JIT graph. It should only be called right before the onnx pass.
Also, it should only be called if we actually export to onnx.

Reviewed By: houseroad

Differential Revision: D16040374

fbshipit-source-id: e78849ee07850acd897fd9eba60b6401fdc4965b
2019-06-28 17:08:55 -07:00
e76c9751c4 Use lazy initialization in autograd record_function to avoid static (#22317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22317

About to add an observer that is also statically initialized in a different
file, so we need to enforce initialization order.

Reviewed By: ilia-cher

Differential Revision: D16012275

fbshipit-source-id: f26e57149a5e326fd34cb51bde93ee99e65403c4
2019-06-28 14:52:56 -07:00
3a198400f8 modify pool benchmarks
Summary: as title

Reviewed By: hl475

Differential Revision: D16058193

fbshipit-source-id: 8f4e04a0356960f6483d6ef58e64876740434849
2019-06-28 14:35:23 -07:00
89c709d217 modify unary operators benchmark
Summary: as title

Reviewed By: hl475

Differential Revision: D16057665

fbshipit-source-id: 07e31a17450fbfd88b5bd330c31c729de5300eaa
2019-06-28 14:03:41 -07:00
6cf4df5d06 add PT softmax ops to the benchmark suite (#21208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21208

The diff adds softmax, softmax2d, and logsoftmax to the benchmark suite.

Reviewed By: zheng-xq

Differential Revision: D15526265

fbshipit-source-id: b7ba63032dba7146765513c8cb1ac5a6a7bd1a68
2019-06-28 13:58:20 -07:00
2132ea1d8d Fix "python: can't open file '.jenkins/pytorch/print_sccache_log.py': [Errno 2] No such file or directory"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22315

Test Plan: Imported from OSS

Differential Revision: D16049862

Pulled By: ezyang

fbshipit-source-id: d9c83208e6b5ee7eb009ddb585dbfa0ea1cbb9e6
2019-06-28 07:15:33 -07:00
042a2fd810 Sync worker requirement mismatches
Summary:
Syncing worker requirement mismatches to improve remote build time.

Created actions:
MEDIUM: 445
LARGE: 354

Updated actions:
From MEDIUM to LARGE: 21
From LARGE to XLARGE: 34
From LARGE to MEDIUM: 9
From XLARGE to MEDIUM: 1

Differential Revision: D16047893

fbshipit-source-id: 7afab2ef879277f114d67fd1da9f5102ec04ed7f
2019-06-28 04:13:06 -07:00
e259894e83 Test raising TypeError in torch.from_numpy() (#21607)
Summary:
With some additional cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21607

Differential Revision: D16046063

Pulled By: li-roy

fbshipit-source-id: 15256a0e94afea39db3cb581c546c2a18a8a7fda
2019-06-27 23:54:47 -07:00
0804452709 fix lint in torch/nn/quantized/modules/linear.py (#22325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22325

att

Reviewed By: bddppq

Differential Revision: D16042464

fbshipit-source-id: 0610896c08667fdaa95983f49140193ecb9ede16
2019-06-27 23:18:42 -07:00
1bea27be9d Remove three cpu sigmoid functions that are identical to IMPLEMENT_UNARY_OP_VEC (#22271)
Summary:
This does not occur in CUDA code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22271

Differential Revision: D16024605

Pulled By: bddppq

fbshipit-source-id: bb4f16bacbdc040faa59751fba97958f4c2d33cd
2019-06-27 23:05:05 -07:00
5e77111486 nn.quantized.Relu and nn.quantize.Quantize/DeQuantize modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21930

Differential Revision: D15554224

fbshipit-source-id: 1de9ac7412468106be60e53852c23318ead37bc6
2019-06-27 16:15:17 -07:00
6f0f7e316d Support building caffe2 with clang-cl on Windows (#22307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22307

The MSVC-specific pragma doesn't silence the warning about a throwing constructor, and therefore `clang-cl` fails to compile this file. This diff fixes the problem by adding an additional check for the `clang` compiler.

Reviewed By: smessmer

Differential Revision: D16032324

fbshipit-source-id: 6dbce0ebf0a533d3e42b476294720590b43a8448
2019-06-27 15:43:38 -07:00
83768f0756 Add ONNX export support for multidim torch.sum. (#22240)
Summary:
This change fixes the issue reported in https://github.com/pytorch/pytorch/issues/22066.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22240

Reviewed By: zrphercule

Differential Revision: D15996934

Pulled By: houseroad

fbshipit-source-id: 3a842ba26f54aa710233fbe87d727fc1f2568d9c
2019-06-27 15:02:33 -07:00
2832e33a94 Add serialization for nn.quantized.Linear module (#21925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21925

att

Differential Revision: D15483071

fbshipit-source-id: 3a218dad5b653b38a0885339889ff70c75a13bef
2019-06-27 14:57:22 -07:00
5c46e701fc Implementation of nn.quantized.linear module (#21921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21921

Call FBGEMM kernels to implement quantized linear operator. This operator is used only for inference.

Differential Revision: D15375695

fbshipit-source-id: b9ca6c156fd60481fea83e55603b2897f7bfc3eb
2019-06-27 14:09:48 -07:00
7a40412158 Delay reduction of unused parameters until first autograd hook is called (#22219)
Summary:
Reduction of gradients for unused parameters should happen as soon as
possible, because they potentially block reduction of gradients for
used parameters. This used to happen instantly when
`prepare_for_backward` was called and it found parameters that didn't
contribute. This meant that if you have a model with unused
parameters, and you want to discard the model output (i.e. not call
backward on some loss), reduction of the gradients of those unused
parameters would have been kicked off, and you'd see an error the next
time you called `forward`.

In this commit, this original approach is slightly changed to delay
reduction of the gradients of those unused parameters until the first
autograd hook is called. This means that you can now discard the model
output regardless of the model having unused parameters or not.

This is a prerequisite for making the `find_unused_parameters`
argument to DDP default to `True`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22219

Differential Revision: D16028698

Pulled By: pietern

fbshipit-source-id: c6aec2cd39c4a77746495d9cb1c9fb9c5ac61983
2019-06-27 14:09:44 -07:00
ac39869370 Fixed list() not making a copy (#22093)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22087
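The eager-Python semantics the fix restores in TorchScript:

```python
a = [1, 2, 3]
b = list(a)      # list() must build a new list object, not an alias
b.append(4)

assert a == [1, 2, 3]      # the original is unchanged
assert b == [1, 2, 3, 4]
assert a is not b
```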
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22093

Differential Revision: D16036814

Pulled By: Chillee

fbshipit-source-id: 3c7106f907415ed0f600acaf45d2c61e1c60867a
2019-06-27 13:55:43 -07:00
b1096995d5 Update ThroughputBenchmark to reflect new script::Module API (no (#22291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22291

There was a race between landing the benchmark diff and
https://github.com/pytorch/pytorch/pull/21934 from zdevito. This PR
should fix the issue.

Reviewed By: zdevito

Differential Revision: D16023640

fbshipit-source-id: 931714352e656f045f9ef3cd17422db51b168384
2019-06-27 12:57:27 -07:00
177b8bf6e7 Named inference rule for more pointwise ops. (#22268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22268
ghimport-source-id: c722f9fbb3fc529c872dcccbf58ba1a8c5fcda8e

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

Imported from OSS

Differential Revision: D16030549

Pulled By: zou3519

fbshipit-source-id: 5cbb2c8626335a32a22ed8079245a5faa7cf553f
2019-06-27 12:49:36 -07:00
7732b1a604 Enable named inference for some unary pointwise ops (#22267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22267
ghimport-source-id: 1566df9a20712cada6ea1209e000c5ff757daa14

Test Plan: Imported from OSS

Differential Revision: D16030550

Pulled By: zou3519

fbshipit-source-id: 183ca1d14dc0fb6f1ee6e114b48c2703c61e11ce
2019-06-27 12:49:32 -07:00
69b702a6eb Implement unify_from_right (#22223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22223
ghimport-source-id: b88bd2a13c1c9c699945a69ec05300c6e598e95a

Test Plan:
- `build/bin/NamedTensor_test` [namedtensor ci]

gh-metadata: pytorch pytorch 22223 gh/zou3519/62/head

Imported from OSS

Differential Revision: D16030551

Pulled By: zou3519

fbshipit-source-id: f3d53e3f9b2428a4926c61a02631e6cd29f89e4b
2019-06-27 12:49:29 -07:00
6386e4d244 Named inference rule for abs. (#22151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22151
ghimport-source-id: 54c1726b578ac162af817f78df6f540b764e46e3

Test Plan:
- `python test/test_namedtensor.py` [namedtensor ci]

Imported from OSS

Differential Revision: D15970326

Pulled By: zou3519

fbshipit-source-id: 4ea25f0a73bbc24b604d3ded2027eeb4ce800de0
2019-06-27 12:49:25 -07:00
2913f6a26d Adding modules for Python 3 compatibility (#22295)
Summary:
To improve Python 3 compatibility and make the linter happy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22295

Reviewed By: zrphercule

Differential Revision: D16024957

Pulled By: houseroad

fbshipit-source-id: c0eddf731891b2f547ba619b3c2f6b2d7a32f034
2019-06-27 12:06:40 -07:00
6947e192f7 Remove unused param in Caffe2 LayerNormGradientOp (#22282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22282

Remove unused param in Caffe2 LayerNormGradientOp

Reviewed By: bddppq, houseroad

Differential Revision: D16017117

fbshipit-source-id: bdd0bd2aca009e549dfd2bf622494dfc791589e3
2019-06-27 11:22:44 -07:00
be0631b6ee Add the rest of the dict API (#21979)
Summary:
This adds the rest of the `dict.???` methods that were missing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/21979

Pulled By: driazati

Differential Revision: D16023573

fbshipit-source-id: 3ea9bd905090e2a176af654a8ca98c7d965ea679
2019-06-27 11:08:18 -07:00
c9626a11cc Made a += b for lists do an in place add (#21896)
Summary:
In talks with smessmer, we decided that it'd be better to put the logic in `list`, as optimal behavior requires knowing `.capacity()`

Results on my cpu (for the benchmark here: https://twitter.com/VahidK/status/1138674536679821312) now look like this:
```
Pytorch batch_gather took 0.018311 seconds.
Pytorch batch_gather jit took 0.013921 seconds.
Pytorch vectorized batch_gather took 0.001384 seconds.
```
Previously, `batch_gather jit` took 3x as long as `batch_gather`.

Some logic taken from https://github.com/pytorch/pytorch/pull/21690. Note that these two PR's are somewhat orthogonal. That PR handles this benchmark by looking at the alias analysis, while this PR specializes for `+=`.

Note that we can't jit the vectorized version as we think `torch.arange` returns a float tensor.
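The `+=` specialization mirrors plain Python semantics, where `a += b` on a list extends the existing object in place (and can reuse spare capacity) while `a = a + b` allocates and copies a new list. A minimal pure-Python sketch of that distinction (illustrative only, not the TorchScript implementation):

```python
def inplace_extend(target, source):
    # a += b on a Python list calls list.__iadd__, which extends
    # the existing object instead of allocating a new one.
    before_id = id(target)
    target += source
    assert id(target) == before_id  # same object, mutated in place
    return target

def copy_concat(target, source):
    # a = a + b allocates a fresh list and copies both operands.
    result = target + source
    assert id(result) != id(target)
    return result

a = [1, 2]
alias = a
a = inplace_extend(a, [3, 4])
# the alias observes the mutation because no new list was created
print(alias)  # [1, 2, 3, 4]
```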
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21896

Differential Revision: D15998628

Pulled By: Chillee

fbshipit-source-id: b0085960da4613578b94deb98ac62c0a4532a8c3
2019-06-27 10:59:24 -07:00
bf677b8849 Set MKLDNN (default) build variables in CMakeLists.txt, not in Python build scripts (#22215)
Summary:
This is yet another step to disentangle Python build scripts and CMake
and improve their integration (Let CMake handle more build environment
detections, and less by our handcrafted Python scripts).

The processor detection logic also changed a bit: Instead of detecting
whether the system processor is PPC or ARM, this PR changes to detect
Intel CPUs, because this is more precise as MKL only supports Intel
CPUs. The build option `USE_MKLDNN` will also not be presented to
users on non-Intel processors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22215

Differential Revision: D16005953

Pulled By: ezyang

fbshipit-source-id: bf3f74d53609b3f835e280f63a872ff3c9352763
2019-06-27 10:21:55 -07:00
d2bad941f4 Fix lint issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22303

Differential Revision: D16030302

Pulled By: ifedan

fbshipit-source-id: 5564f6f810382f31f9416e5881978b03f51e53a9
2019-06-27 09:27:16 -07:00
zaf
e9d1b852c4 Functional conv2d (#21225)
Summary:
Stack:
    https://github.com/pytorch/pytorch/issues/21323 Quantized Conv2d Module [💛](https://our.intern.facebook.com/intern/diff/D15551835/)
    **https://github.com/pytorch/pytorch/issues/21225 Functional conv2d** [💛](https://our.intern.facebook.com/intern/diff/D15544061/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/21225

Test Plan:
`buck test mode/dev caffe2/test:quantized -- test_conv_api`: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929

```
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 1.1 sec
Building: finished in 5.1 sec (100%) 6958/6958 jobs, 2 updated
  Total time: 6.3 sec
Trace available for this run at /tmp/testpilot.20190603-163323.4026295.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 17661db57af88ec71497f5c21efa86531c07662b fbpkg ce57c6c1c73f45c4aa890e9df65820c3 at Sat Jun  1 17:06:32 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/625/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929
      ✓ caffe2/test:quantized - test_conv_api (test_quantized_conv.FunctionalAPITest) 6.962 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1407375016833929
Summary (total time 10.65s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Reviewed By: dskhudia

Differential Revision: D15544061

Pulled By: zafartahirov

fbshipit-source-id: 700c0c78b5915bf7e54bda7c44f44b7b1e247f4d
2019-06-27 09:19:54 -07:00
59c42595e0 Enabled gather and scatter for bool tensor (#21924)
Summary:
- moving stuff around in order to enable bool.
- Added implementation of atomicAdd(bool, bool)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21924

Differential Revision: D15883711

Pulled By: izdeby

fbshipit-source-id: 733f35c2bc3d87cec9f9687d72b62d2d2cd7c03e
2019-06-27 09:07:50 -07:00
f13fadd510 fix python2 corner-case in torch.distributed.launch (#20996)
Summary:
Small fix for the comment raised in 4cf76574b9 (r33134850)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20996

Differential Revision: D15991510

Pulled By: pietern

fbshipit-source-id: 4e5a35864b5a4ec9402aa83a19c4a3ba0df2f01f
2019-06-27 05:19:37 -07:00
f39b6624ba ChunkDataset checkpoint support (#21889)
Summary:
When dealing with a large-scale dataset, it is handy to be able to save the dataset status and resume later, especially when an unexpected crash happens: users don't need to start over the whole dataset from the beginning. Instead, they can reload it from the last checkpoint.

This change adds support for checkpoint save/load logic in ChunkDataset.

On ChunkDataset construction, the user can specify a file name from which to load the checkpoint. If it is empty, the dataset defaults to starting fresh; otherwise the ChunkDataset will 'fast forward' the chunk sampler to the corresponding checkpoint.

The user can also call ChunkDataset::save() to serialize current status to a file, which can be used later.
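The save/fast-forward idea can be sketched with a tiny resumable sampler (a hedged, pure-Python illustration; the class and method names here are hypothetical, not the real ChunkDataset API):

```python
import pickle
import tempfile

class ChunkSampler:
    """Minimal sketch of a resumable chunk sampler; illustrative only."""
    def __init__(self, num_chunks):
        self.num_chunks = num_chunks
        self.next_chunk = 0

    def next(self):
        chunk = self.next_chunk
        self.next_chunk += 1
        return chunk

    def save(self, path):
        # Serialize the current position so a later run can resume.
        with open(path, "wb") as f:
            pickle.dump(self.next_chunk, f)

    def load(self, path):
        # "Fast forward" the sampler to the saved position.
        with open(path, "rb") as f:
            self.next_chunk = pickle.load(f)

with tempfile.NamedTemporaryFile(suffix=".ckpt", delete=False) as tmp:
    ckpt_path = tmp.name

sampler = ChunkSampler(num_chunks=10)
for _ in range(3):           # consume chunks 0, 1, 2
    sampler.next()
sampler.save(ckpt_path)      # simulate stopping after chunk 2

resumed = ChunkSampler(num_chunks=10)
resumed.load(ckpt_path)      # resume from the checkpoint
print(resumed.next())        # 3
```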
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21889

Differential Revision: D16024582

Pulled By: ailzhang

fbshipit-source-id: 1862ab5116f94c9d29da174ce04a91041d06cad5
2019-06-26 22:54:14 -07:00
30d890c672 Removed an outdated comment above IMPLEMENT_UNARY_OP_VEC(abs) (#22272)
Summary:
due to 82b570528db0a43fc04bb90f5d4538c01e4a5582
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22272

Differential Revision: D16024544

Pulled By: bddppq

fbshipit-source-id: 37955bff3301975c0dd6abde8a3ba79af0555111
2019-06-26 22:24:13 -07:00
f144b9ebef Fix two overindent lint errors in test/common_nn.py. (#22287)
Summary:
This keeps causing lint tests to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22287

Differential Revision: D16024524

Pulled By: bddppq

fbshipit-source-id: a3e3780a55943283e9c854e94ac06ea4715e5319
2019-06-26 21:41:41 -07:00
e6d4a2d289 Remove unused file cmake/Modules/FindMIOpen.cmake (#22244)
Summary:
`cmake/public/LoadHIP.cmake` calls `find_package(miopen)`, which uses the CMake module in MIOpen installation (It includes the line `set(miopen_DIR ${MIOPEN_PATH}/lib/cmake/miopen)`). `cmake/Modules/FindMIOpen.cmake` is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22244

Differential Revision: D16000771

Pulled By: bddppq

fbshipit-source-id: 07bb40fdf033521e8427fc351715d47e6e30ed34
2019-06-26 21:21:46 -07:00
5e0a74dd70 Rename copy_tensor_data to copy_tensor_metadata (#22266)
Summary:
The original name `copy_tensor_data` could be confusing because users are not sure whether it deep-copies data in the tensor's storage or just copies the tensor's metadata. The renaming makes it more clear.

cc. ailzhang This might break XLA build, but I think the renaming makes it more clear why we use `copy_tensor_data` in XLATensorImpl's shallow-copy functions.
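As a rough mental model of the distinction the rename draws (a toy Python stand-in, not the actual TensorImpl code): copying tensor metadata duplicates sizes and strides but keeps pointing at the same storage, so no data in the storage is deep-copied.

```python
class FakeTensorImpl:
    """Toy stand-in for a tensor implementation: metadata plus a
    reference to shared storage (illustrative, not ATen code)."""
    def __init__(self, sizes, strides, storage):
        self.sizes = sizes
        self.strides = strides
        self.storage = storage  # shared, not owned

def copy_tensor_metadata(src):
    # Copies sizes/strides but keeps pointing at the SAME storage,
    # which is why the old name 'copy_tensor_data' was misleading.
    return FakeTensorImpl(list(src.sizes), list(src.strides), src.storage)

storage = [1.0, 2.0, 3.0, 4.0]
a = FakeTensorImpl([2, 2], [2, 1], storage)
b = copy_tensor_metadata(a)
b.storage[0] = 99.0
print(a.storage[0])  # 99.0: storage is shared, only metadata was copied
```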
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22266

Differential Revision: D16014724

Pulled By: yf225

fbshipit-source-id: f6ee966927d4d65d828b68264b3253b2f8fd768d
2019-06-26 21:16:57 -07:00
45c6fa0007 Refactor Tests for Multiple ONNX Opsets (#20036)
Summary:
Refactor tests for https://github.com/pytorch/pytorch/pull/19294.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20036

Reviewed By: zrphercule

Differential Revision: D16016593

Pulled By: houseroad

fbshipit-source-id: eaae324e347679acf3d0ac1c14be03919f54496e
2019-06-26 17:06:57 -07:00
f51de8b61a Back out "Revert D15435461: [pytorch][PR] PyTorch ThroughputBenchmark" (#22185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22185

Original commit changeset: 72a0eac1658b

Differential Revision: D15981928

fbshipit-source-id: d2455d79e81c26ee90d41414cde8ac0f9b703bc3
2019-06-26 16:05:51 -07:00
3f2a839dda Add comments to bailout_graph.*
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22161

Differential Revision: D15975355

Pulled By: Krovatkin

fbshipit-source-id: dca0095b4f05cff8277663ad38b65eeb44417f40
2019-06-26 15:39:38 -07:00
04fe2453c4 conv2d/conv3d for LongTensor (#20730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20730

Generates forward conv2d function for LongTensor

Differential Revision: D15423753

fbshipit-source-id: 0e770b61257cc4c6559581796bf104ef68155c84
2019-06-26 15:29:56 -07:00
3ba72a11db Revert D15999938: [jit] Add the rest of the dict API
Differential Revision:
D15999938

Original commit changeset: 7bc2a55e3f79

fbshipit-source-id: e377c00e990d6f058960936e69712b77851c06fa
2019-06-26 14:16:37 -07:00
7707dee761 Re apply optional ScalarType changes (#22237)
Summary:
This is (mostly) the re-application of:
https://github.com/pytorch/pytorch/pull/21088

which was reverted due to an issue conflicting with changes in:
https://github.com/pytorch/pytorch/pull/22104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22237

Differential Revision: D16012838

Pulled By: nairbv

fbshipit-source-id: 35f4a73c97ab68b4e2648aca96b2176f07b5a883
2019-06-26 13:36:25 -07:00
8b02522b93 Avoid copy in ArrayRef<->vector comparison (#22218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22218

-

Differential Revision: D15990763

fbshipit-source-id: 53c98f915fadc8a65aea896c80292d5804d967a4
2019-06-26 13:36:21 -07:00
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing a BC-breaking change, empty_like returns a contiguous tensor by default.

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
like_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```
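Under the hood, `torch.channels_last` is just a different stride layout over the same logical (N, C, H, W) shape, with channels varying fastest. A torch-free sketch of the two stride layouts (assuming the standard row-major stride formulas):

```python
def contiguous_strides(shape):
    # Row-major ("contiguous" NCHW) strides: rightmost dim varies fastest.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

def channels_last_strides(shape):
    # For a 4-D (N, C, H, W) tensor, channels_last stores C innermost:
    # the physical order of dimensions is N, H, W, C.
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)

shape = (2, 3, 4, 5)
print(contiguous_strides(shape))     # (60, 20, 5, 1)
print(channels_last_strides(shape))  # (60, 1, 15, 3)
```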
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
5bdc4db26e Refactor named tensor helper code (#22150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22150
ghimport-source-id: 460022febc24f49b86f0d5dbf8dc227564bde6cb

Test Plan: Imported from OSS

Differential Revision: D15970325

Pulled By: zou3519

fbshipit-source-id: 86a3e3ca82bbf4ff815431e25c5f9a35fcd23be0
2019-06-26 11:33:29 -07:00
29b53b0259 Fix bug in caffe2 transpose on GPU (#22233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22233

Fix bug in caffe2 transpose on GPU

Reviewed By: hl475

Differential Revision: D15994973

fbshipit-source-id: 542dc8757b51a6322fffa55826c1d4e32927398d
2019-06-26 11:33:25 -07:00
2dc9643080 Better error message for mismatched dict key type (#22231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22231

Pulled By: driazati

Differential Revision: D15993936

fbshipit-source-id: 6822ef01477a3b32beb8c037a621fa71abd022c8
2019-06-26 10:46:45 -07:00
af9e0085f2 Add the rest of the dict API (#21979)
Summary:
This adds the rest of the `dict.???` methods that were missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21979

Pulled By: driazati

Differential Revision: D15999938

fbshipit-source-id: 7bc2a55e3f791015a0ff2e3731703075cf0770ee
2019-06-26 10:40:29 -07:00
25eae3ed08 Disable test_proper_exit flaky worker_kill (#22208)
Summary:
I learned from https://github.com/pytorch/pytorch/pull/22058 that `worker_kill` is just flaky, regardless of `hold_iter_reference`. So let's disable it altogether for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22208

Differential Revision: D15990307

Pulled By: soumith

fbshipit-source-id: d7d3f4fe7eaac4987f240cb8fd032c73a84157d7
2019-06-26 09:47:40 -07:00
a4f281446b introduce flags to set omp and mkl threads (#21472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21472

as title

Reviewed By: hl475

Differential Revision: D15695846

fbshipit-source-id: 44437f6b94a9c583275fcc711bb6ccf2b04f90fc
2019-06-26 09:33:05 -07:00
5f84f372a6 Use variable_data() in tensor_to_numpy (#22214)
Summary:
As part of the Variable/Tensor merge, we want to gradually remove call sites of `tensor_data()` and the API itself, and instead uses `variable_data()`. This PR removes the `tensor_data()` call in the tensor_to_numpy conversion path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22214

Differential Revision: D15997397

Pulled By: yf225

fbshipit-source-id: 6fcab7b14e138824fc2adb5434512bcf868ca375
2019-06-26 08:57:47 -07:00
f176950a67 Use lower case for strong wolfe option. (#22092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22092
ghimport-source-id: ccc53ed2f1e16865237334a4dde4d162e21762e5

Test Plan: Imported from OSS

Differential Revision: D15955996

Pulled By: vincentqb

fbshipit-source-id: 8ffbea3b9ef8ff7021d42524fa46112da8a3438e
2019-06-26 08:20:25 -07:00
9f22805cc6 Refactor function_wrapper.create_generic (#22077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22077
ghimport-source-id: 39cf0a2e66e7fa2b6866af72782a22a4bd025e4c

Test Plan:
- Compared the build/aten/src folder before and after this change
locally and verified they are identical (`diff -r`).
- Wait for CI + Also, [namedtensor ci]

Imported from OSS

Differential Revision: D15941967

Pulled By: zou3519

fbshipit-source-id: d8607df78f48325fba37e0d00fce0ecfbb78cb36
2019-06-26 08:20:21 -07:00
b297552887 Make nn functions configurable for different scalar types (#20729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20729

Currently there is no way to specify what scalar types each nn function will support.

This change allows specifying the supported scalar types for each function/backward function and device. By default each function supports Float, Double, and Half.

If you want to specify any extra supported scalar types beyond the defaults, you will need to change nn.yaml:

- name: _some_func(Tensor self)
  cname: SomeFunction
  CPU:
    forward_scalar_types: ['Float', 'Double', 'Long']
    backward_scalar_types: ['Float', 'Double']

Differential Revision: D15423752

fbshipit-source-id: b3c157316d6e629bc39c1b377a3b23c71b1656cf
2019-06-26 07:53:38 -07:00
95b5718007 Prevent VS from emitting errors when using swap in Optional.h (#22182)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21706
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22182

Differential Revision: D15981740

Pulled By: ezyang

fbshipit-source-id: d58b3ca3aea8d3d383150208b87fa4bbd4f6fe33
2019-06-26 07:29:35 -07:00
fde75a33e1 update IterableDataset doc to be consistent with current behavior
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22230

Differential Revision: D15994680

Pulled By: ezyang

fbshipit-source-id: 9e47e8369aa08a550987c4468ce75aa7650ee1d4
2019-06-26 06:49:22 -07:00
655a370859 restoring HEADs for ideep and onnx to more recent versions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22250

Differential Revision: D16003227

Pulled By: Krovatkin

fbshipit-source-id: bf906a8e9e5e0f79391e5984c6cdfb9638d84981
2019-06-26 02:19:17 -07:00
17b37eb353 Bump gloo (#22225)
Summary:
This includes:
* Removal of builder classes
* Add allgatherv
* Add bcube allreduce algorithm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22225

Differential Revision: D16003629

Pulled By: pietern

fbshipit-source-id: fd062b82bfeeddb8190206d9931a781c7daff6f9
2019-06-26 00:55:36 -07:00
c1fc2f25c2 export deleteFunction in torch/csrc/autograd/function.h (#22236)
Summary:
In `torch/csrc/autograd/function.h` we define `torch::autograd::Function`, a (the?) central autograd record-holding class. `Function` is declared public API (`TORCH_API`).

We also define a custom deleter `deleteFunction` which we use throughout PyTorch's own use of `Function`. This trivial PR declares the deleter public API as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22236

Differential Revision: D16001335

Pulled By: yf225

fbshipit-source-id: 6ef0a3630e8f82f277a0e6e26cc64455ef7ee43e
2019-06-25 20:46:09 -07:00
e8bc992b03 print device when it's not on default device (#22094)
Summary:
We used to not print the device when a tensor is on XLA. This is sometimes confusing, as it looks the same as a CPU tensor...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22094

Differential Revision: D15975405

Pulled By: ailzhang

fbshipit-source-id: f19ceb9e26f5f2f6e7d659de12716f0dfe065f42
2019-06-25 20:28:50 -07:00
1a164bf30b remove unused mkldnn include (#22217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22217

Seems introduced in https://github.com/pytorch/pytorch/pull/19209/files which is
no longer used. Remove it to simplify mobile build.

Reviewed By: bddppq

Differential Revision: D15983344

fbshipit-source-id: 37ee0bfbd022da09af6bc44c6e3fec1c99a8e732
2019-06-25 17:45:39 -07:00
de85abf226 Allow default construction of Dict/List (#22084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084

For DictPtr/ListPtr, default construction was disallowed because it was ambiguous whether it was supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.

Differential Revision: D15948098

fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
2019-06-25 17:40:48 -07:00
e425789286 Fix "missing return statement" warning (#22216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22216

-

Differential Revision: D15989670

fbshipit-source-id: d0534a3bf1eef29657738e271d35503a2f75a043
2019-06-25 16:57:42 -07:00
f7a126f941 fix optional type subtype relation (#22186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22186
ghimport-source-id: 05ef8c3a176fe2a67d4835888e6db52b57a6d199

Test Plan: Imported from OSS

Differential Revision: D15994644

Pulled By: wanchaol

fbshipit-source-id: 7c5c4eebd421f6c9470661c2c2eb38bafdff8bbd
2019-06-25 16:57:38 -07:00
defd23b8b9 Clean up old uses of checkScript (#22002)
Summary:
This cleans up the `checkScript` API and some old tests that were hardcoding outputs. It also now runs the Python function when a string is passed in to verify the outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22002

Differential Revision: D15924485

Pulled By: driazati

fbshipit-source-id: ee870c942d804596913601cb411adc31bd988558
2019-06-25 16:24:19 -07:00
7b1ffba3bf ArgumentStash for Scalar arguments (#21931)
Summary:
Scalars are being traced as constants.
This PR is to fix this issue.

The ONNX Graph for Test_Full_op() before and after this change:

def Test_Full_op():
  class Test_Full(nn.Module):
    def forward(self, x):
      return torch.full((3, 4), x, dtype=torch.long)
  model = Test_Full()
  x = torch.tensor(12)
  output = model(x)

Before this change:
graph(%input1 : Long()):
%output1 : Float(3, 4) = onnx::Constant[value=<Tensor>]
return (%output1)

After this change:
graph(%input1 : Long()):
%1 : int[] = onnx::Constant[value= 3 4 [ Variable[CPULongType]{2} ]]
%2 : Tensor = onnx::ConstantOfShape[value={0}]
%output1 : Float(3, 4) = onnx::Add(%2, %input1)
return (%output1)

Similar PR : https://github.com/pytorch/pytorch/pull/12939
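The rewritten graph expresses `full` as a zero tensor of the right shape plus the runtime scalar, so the fill value stays a graph input instead of being folded into a constant. A pure-Python sketch of that equivalence, with lists of lists standing in for tensors (illustrative helper names, not ONNX ops):

```python
def constant_of_shape(shape, value=0):
    # Mirrors the role of onnx::ConstantOfShape for the 2-D case.
    rows, cols = shape
    return [[value] * cols for _ in range(rows)]

def full_via_add(shape, fill):
    # full(shape, x) traced as ConstantOfShape(0) + x keeps the fill
    # value 'x' as a runtime input rather than a baked-in constant.
    zeros = constant_of_shape(shape)
    return [[elem + fill for elem in row] for row in zeros]

print(full_via_add((3, 4), 12))
# [[12, 12, 12, 12], [12, 12, 12, 12], [12, 12, 12, 12]]
```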
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21931

Reviewed By: zrphercule

Differential Revision: D15950066

Pulled By: houseroad

fbshipit-source-id: 3470665d88fa34faa600940ef16b069a06002cd5
2019-06-25 15:22:08 -07:00
7ee82d48a8 Removed work around for convolution transpose op since the bug has been fixed in v0.18 (#22184)
Summary:
The workaround is no longer needed now that the bug has been fixed in v0.18.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22184

Differential Revision: D15982627

Pulled By: bddppq

fbshipit-source-id: 8725d5b5e5b68e029ffb08af12b416bd310c9638
2019-06-25 14:34:34 -07:00
5b87049c66 remove uses of std::shared_ptr<Module> (#21934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21934
ghimport-source-id: e64ab9096f43749ead3ac5567675b815da295664

Test Plan: Imported from OSS

Differential Revision: D15892401

Pulled By: zdevito

fbshipit-source-id: 6424139206593ff944556c69d8a54723884eacaf
2019-06-25 13:24:38 -07:00
1d705b4b07 Run clang-format on c10d bits (#22194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22194

TSIA

Differential Revision: D15983780

fbshipit-source-id: 1365bcf9bbc262a3657f646e81d2fc9c32f24c97
2019-06-25 12:34:52 -07:00
f5a1ea170b SIMD version average pooling added (#22148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22148

Average pooling is added into dnnlowp optimization code.

Reviewed By: jspark1105

Differential Revision: D15936556

fbshipit-source-id: 6177ee62529801898f230c6fb89e9c4b598593a5
2019-06-25 12:19:21 -07:00
a7cb07eb0f Add missing algorithm header to Array utility (#22157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22157

This header uses `std::swap_ranges` function which is defined in `<algorithm>` header (https://en.cppreference.com/w/cpp/algorithm/swap_ranges). Therefore this file isn't guaranteed to compile on all platforms.

This diff fixes the problem by adding the missing header.

Reviewed By: smessmer

Differential Revision: D15971425

fbshipit-source-id: e3edcec131f72d729161f5644ee152f66489201a
2019-06-25 12:19:17 -07:00
6ff0c6ca3f Remove THD (#22065)
Summary:
It's been ~9 months since moving THD to the `torch.distributed.deprecated` namespace (see https://github.com/pytorch/pytorch/issues/11405) and we haven't seen issues related to it, so it's time to remove it.

Closes https://github.com/pytorch/pytorch/issues/18967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22065

Reviewed By: mrshenli

Differential Revision: D15983669

Pulled By: pietern

fbshipit-source-id: 2a2f5866f9a63040bc7cef3956d5fd215aba7165
2019-06-25 12:19:13 -07:00
bcb5fd8f06 Port symeig to ATen and enable batching of inputs (#21858)
Summary:
Changelog:
- Port `symeig` from TH/THC to ATen
- Enable batching of matrix inputs for `symeig`
- Modify derivative computation based on batching
- Update docs to reflect the change
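For intuition about what `symeig` computes (and what the batched version maps over the leading dimensions), the smallest case has a closed form. A torch-free sketch for a symmetric 2x2 matrix, under the standard formula for its eigenvalues:

```python
import math

def symeig_2x2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]],
    in ascending order (a toy sketch of what symeig computes for
    the smallest case; batching maps this over leading dims)."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return (mean - radius, mean + radius)

print(symeig_2x2(2.0, 1.0, 2.0))  # (1.0, 3.0)
```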
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21858

Test Plan: - Added additional tests in `test_torch.py` (with a port to `test_cuda.py`) and `common_methods_invocations.py` to test if both the port and batching work.

Differential Revision: D15981789

Pulled By: soumith

fbshipit-source-id: ab9af8361f8608db42318aabc8421bd99a1ca7ae
2019-06-25 12:13:27 -07:00
4ec6fbefa6 Show deprecation warning when stateful lambdas are used as kernels (#21885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21885

If a kernel is defined as a stateful lambda

    static auto registry = torch::RegisterOperators().op("my::op", [some_closure] (Tensor a) {...});

this can have very unexpected behavior when kernels are instantiated. There is no guarantee that the state is kept.

In the options based API, state is already disallowed:

    // this is a compiler error
    static auto registry = torch::RegisterOperators().op("my::op", torch::RegisterOperators::options().kernel([some_closure] (Tensor a) {...}));

but we can't disallow it in the non-options-based API for backwards compatibility reasons.

We can, however, show a deprecation warning. This is what this diff introduces.

Differential Revision: D15867089

fbshipit-source-id: 300fa4772fad8e7d177eb7cb910063d360537a4a
2019-06-25 11:53:18 -07:00
c68119387d serialize torch.Size object (#20952)
Summary:
fixes https://github.com/pytorch/pytorch/issues/20823
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20952

Differential Revision: D15514274

Pulled By: ailzhang

fbshipit-source-id: 8340a40fadfd06063f7f33b0d99d693e74d5defb
2019-06-25 10:44:35 -07:00
7daa96a3ce porting convtranspose3d to ATen (#22019)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/18353

CPU and GPU porting for convolution transpose 3d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22019

Differential Revision: D15985353

Pulled By: ezyang

fbshipit-source-id: 1c579577a32db24a1ce38f5ab9b3f1cb9c8f2a6e
2019-06-25 10:22:34 -07:00
9af8ea1ce5 Not expose mkldnn reshape and transpose (#22193)
Summary:
This PR is to make mkldnn reshape and transpose not exposes as Tensor API, please see the comments in https://github.com/pytorch/pytorch/pull/21943.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22193

Differential Revision: D15983434

Pulled By: bddppq

fbshipit-source-id: ad3514dfd8a3b0d89442eef752864e5d3f3d04f0
2019-06-25 09:52:47 -07:00
mal
c8b5f1d2f8 Switch autograd to use a pool of workers for each device (#21911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21911
ghimport-source-id: 3b7d37481201aa4b4ca8f7767603d0dfd13f871f

Test Plan:
Tested on https://github.com/pytorch/pytorch/issues/6959 and ensured no Recursion Error.
Performance testing:
[word_language_model](https://gist.github.com/malvika2147/34c214871d549f9275812f2d20506990) (no significant change)
[mnist](https://gist.github.com/malvika2147/77890eef102099490a1029122fb20dd0) (no significant change)
[Comparison of performance](https://gist.github.com/malvika2147/c0a8790910b8513bd2e20b224bdd6300) on https://github.com/pytorch/pytorch/issues/6959 with smaller inputs. (slower by about ~25%, expected)
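The RecursionError in #6959 comes from walking very deep graphs on the call stack; a pool of workers instead pulls ready tasks from an explicit queue. A simplified, framework-free sketch of that general idea (not the actual autograd engine):

```python
from collections import deque

def count_nodes_iterative(root):
    # Explicit work queue: graph depth no longer consumes call-stack
    # frames, so deep chains cannot hit a RecursionError.
    queue = deque([root])
    seen = 0
    while queue:
        node = queue.popleft()
        seen += 1
        queue.extend(node["next"])
    return seen

# Build a chain far deeper than the default recursion limit (~1000).
root = {"next": []}
node = root
for _ in range(10000):
    child = {"next": []}
    node["next"].append(child)
    node = child

print(count_nodes_iterative(root))  # 10001
```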

Imported from OSS

Differential Revision: D15985852

fbshipit-source-id: ca172690857fd1718462b80f3a244af9d8825d6c
2019-06-25 09:08:26 -07:00
94e83da55c Optimization of the Embedding and Embedding-Bag CUDA Kernel (#22016)
Summary:
Re-implementation of the `embedding_dense_backward_cuda()` and the `embedding_bag_backward_cuda_sum_avg()` functions.

#### Performance
Running a [Mortgage Workflow](https://github.com/EvenOldridge/MortgageWorkflowA) with a block size of 100K on a DXG-2 (single GPU), we see a 270% speedup:
```
Original version:    370,168 example/s
Optimized version: 1,034,228 example/s
```
The original version is bounded by the `EmbeddingBag_accGradParametersKernel_sum_avg`, which takes 70% of the CUDA execution time. In the optimized version, the optimized kernel now takes only 17% of the time.

#### Greater Numerical Stability
An added benefit is greater numerical stability. Instead of a flat sum where a single variable accumulates the weights, this code uses two steps: each GPU thread computes a sub-result over `NROWS_PER_THREAD` rows before the final result is accumulated.
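A simplified sketch of the two-step accumulation pattern (integer data keeps the check exact; the stability benefit shows up with floating point, where bounded per-thread partial sums accumulate less rounding error than one long running sum):

```python
def two_step_sum(rows, nrows_per_thread):
    # Step 1: each "thread" accumulates a bounded slice of rows,
    # mirroring the NROWS_PER_THREAD sub-results in the kernel.
    partials = [
        sum(rows[start:start + nrows_per_thread])
        for start in range(0, len(rows), nrows_per_thread)
    ]
    # Step 2: combine the per-thread partial sums into the final result.
    return sum(partials)

rows = list(range(1, 33))                      # 32 "gradient rows"
print(two_step_sum(rows, nrows_per_thread=4))  # 528, same as sum(rows)
```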
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22016

Differential Revision: D15944339

Pulled By: mrshenli

fbshipit-source-id: 398d5f48826a017fc4b31c24c3f8b56d01830bf0
2019-06-25 08:14:15 -07:00
b0bd8758fc Further remove redundant CMake option passing code for those CMake variables that are directly controlled by environment variables but with a different name. (#22154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22154
ghimport-source-id: 714b98566e70063925c4c9e10940a4fe46fb5a3d

Test Plan: Imported from OSS

Differential Revision: D15985376

Pulled By: ezyang

fbshipit-source-id: 60710125009cd8bf60b5600a3f05854d931d9844
2019-06-25 07:23:06 -07:00
ce1a9653a8 Remove more build options not needed to be explicitly set in Python build scripts. (#22153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22153
ghimport-source-id: 129d90626a8e64079477a744fbbaba58e139a852

Test Plan: Imported from OSS

Differential Revision: D15985375

Pulled By: ezyang

fbshipit-source-id: 925bb1c886633b002beb1da0754bb055aa971e21
2019-06-25 07:23:03 -07:00
839b496fbd Fixes bugs in torch.multinomial without replacement (#22183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22183
ghimport-source-id: f03c17178de115adbe983953a8f9f205e3df7721

Test Plan: Imported from OSS

Differential Revision: D15985324

Pulled By: ezyang

fbshipit-source-id: 6e9dc3b54d448f4bb374b004d7f1dd1ac5c014f6
2019-06-25 07:15:18 -07:00
b61693c0ed Optimize InstanceNormOp forward (#22130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22130

Optimize InstanceNormOp forward

For InstanceNormOp on CPU with order = NHWC, N = 128, C = 256, H = 56, W = 56: 183ms -> 115ms.

For InstanceNormOp on GPU with N = 256, C = 256, H = 112, W = 112:
NCHW: 1475ms -> 45ms
NHWC: 1597ms -> 79ms

Reviewed By: houseroad

Differential Revision: D15963711

fbshipit-source-id: 3fa03109326456b9f301514fecbefa7809438d3e
2019-06-25 01:04:53 -07:00
ac4913ee62 support both regularizable and sofmax re-weighting on sparse features in dot product (#22176)
Summary:
In order to select the more important features for the dot product among a list of candidate sparse features, we can assign a learnable weight to each feature and reweight each feature by multiplying its embedding by that weight before the dot product. We finally select features based on weight magnitude after training.

We can perform L1 and/or L2 regularization on the weights. To summarize, the weights tend to shrink (avoiding overfitting) due to L2 regularization, and some weights vanish to zero due to L1. To avoid sparse feature embeddings being ignored because of an early collapse of the weights, a piecewise learning-rate warm-up policy is used for the regularization term, so that regularization is weak in the first stage and gets stronger afterwards (a small lr constant for iters below threshold 1, a medium lr constant in stage 2, and a final, reasonably large lr constant for all iters after threshold 2). The features with nonzero and relatively large weights (in absolute value) will be selected for the module.
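The three-piece warm-up can be sketched as a simple step function (the constants and thresholds here are illustrative; the real policy is the `piecewarmup` functor added in `caffe2/sgd/learning_rate_functors.h`):

```python
def piecewise_warmup_lr(iter_num, thresholds, lrs):
    """Three-piece constant learning-rate warm-up.

    thresholds: (t1, t2); lrs: (small, medium, large).
    """
    t1, t2 = thresholds
    small, medium, large = lrs
    if iter_num < t1:
        return small    # weak regularization early on
    if iter_num < t2:
        return medium   # ramp up in the middle stage
    return large        # full-strength regularization afterwards

for it in (0, 500, 1500):
    print(piecewise_warmup_lr(it, thresholds=(100, 1000),
                              lrs=(0.01, 0.1, 1.0)))
# prints 0.01, then 0.1, then 1.0
```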

We can also apply softmax on the original weights to make them sum to 1. We can even boost the softmaxed weights by multiplying by the number of softmax components, which essentially makes them sum to the number of components and average to 1. With this approach, all the weights are positive and sum to a constant. Regularization is not a must, since we can rely on the competition between the softmax weights themselves to achieve reasonable re-weighting. We expect these weights to be denser than the sparse ones from L1 regularization, and we can select features based on the top-K weights.

Overall, we aim to demonstrate that the selected feature set outperforms the current v0 feature set in experiments. Special acknowledgement goes to Shouyuan Chen, who initiated the work on regularizable weighting.

 ---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22176

The diff will export updates to the GitHub repository, as stated below.

Basically, the updates on the files are summarized as below:

- adding logger messages
`caffe2/python/layer_model_helper.py`
- add ElasticNet regularizer, which combines both L1 and L2 regularization
`caffe2/python/regularizer.py`
- implement piecewarmup, specifically warm up with three constant pieces
`caffe2/sgd/learning_rate_functors.h, caffe2/sgd/learning_rate_op.cc, caffe2/sgd/learning_rate_op.h`
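The three-piece constant warm-up described above can be sketched as follows; the function name, thresholds, and lr constants are illustrative, not the actual Caffe2 `learning_rate_functors.h` implementation:

```python
def piecewise_warmup_lr(iteration, thresholds=(1000, 5000),
                        lrs=(1e-4, 1e-3, 1e-2)):
    """Illustrative three-piece constant lr schedule for the
    regularization term: weak at first, stronger afterwards."""
    t1, t2 = thresholds
    if iteration < t1:
        return lrs[0]   # small constant below threshold 1
    elif iteration < t2:
        return lrs[1]   # medium constant in stage 2
    return lrs[2]       # larger constant after threshold 2
```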

Differential Revision: D15923430

fbshipit-source-id: ee18902cb88c23b1b7b367cc727d690a21e4cda9
2019-06-24 21:27:33 -07:00
299ea84a70 Use latest stable flake8-bugbear in CI and fix B011 flake8 error. (#21944)
Summary:
- PyCQA/flake8-bugbear#53 has been fixed (but not yet closed on their side) and a new version of flake8-bugbear has been released on Mar 28, 2019. Switch CI to use the latest stable version.
- Fix the new B011 errors that flake8-bugbear catches in the current codebase.

 ---

B011: Do not call assert False since python -O removes these calls. Instead callers should raise AssertionError().
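The B011 pattern being fixed can be illustrated with a small hypothetical function (the names here are invented for illustration):

```python
# Before (flagged by B011): under `python -O`, asserts are stripped,
# so this function would silently return None for unknown kinds.
def get_handler_bad(kind):
    if kind == "known":
        return 1
    assert False, "unreachable kind"

# After: an explicit raise survives the -O optimization flag.
def get_handler_good(kind):
    if kind == "known":
        return 1
    raise AssertionError("unreachable kind")
```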
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21944

Differential Revision: D15974842

Pulled By: soumith

fbshipit-source-id: de5c2c07015f7f1c50cb3904c651914b8c83bf5c
2019-06-24 20:48:15 -07:00
f5df0c9104 Don't end on inplace operators in einsum (#22111)
Summary:
Returning the result of an inplace `squeeze_` in `einsum` (which itself is traced) interacts badly with `autograd.Function`.

I must admit that I'm not 100% certain whether it should be necessary to change this, but I consider this a good change overall.

Fixes: https://github.com/pytorch/pytorch/issues/22072
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22111

Differential Revision: D15974990

Pulled By: soumith

fbshipit-source-id: 477e7f23833f02999085f665c175d062e7d32acd
2019-06-24 20:39:20 -07:00
ede08492e1 Enabled mul for bool tensors on CUDA (#21771)
Summary:
Enable mul_cuda for bool tensors.
This is a helper PR to fix a [test failure](https://circleci.com/gh/pytorch/pytorch/1992191?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link) in the other [PR](https://github.com/pytorch/pytorch/pull/21113).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21771

Differential Revision: D15883737

Pulled By: izdeby

fbshipit-source-id: 4c39644bbe8e80da4d14570862589944285d4bfe
2019-06-24 18:37:29 -07:00
3b700a43d5 Add missing whitespace in error message (#21904)
Summary:
The current error message displays as:
     `RuntimeError: index koccurs twice in output`
A whitespace is missing between the index and 'occurs'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21904

Differential Revision: D15878941

Pulled By: colesbury

fbshipit-source-id: 163dda1829bf4956978cd01fd0e751673580722d
2019-06-24 15:32:46 -07:00
88cdc16835 AveragePool: expand incomplete kernel_size for the C++ API
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22075

Differential Revision: D15945260

Pulled By: mrshenli

fbshipit-source-id: 827660c19ebbdb5f0aae2f4eadb6025ae2f93674
2019-06-24 15:32:41 -07:00
2372e7ed2e DilatedMaxPool: expand incomplete kernel_size for the C++ API (#22073)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22032.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22073

Differential Revision: D15944471

Pulled By: mrshenli

fbshipit-source-id: 84b265be00d67aa7f13508ede0646763d2339f1d
2019-06-24 15:32:36 -07:00
b2a39314e7 Make Dropout.__repr__ consistent with other modules (#22110)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22106.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22110

Differential Revision: D15958821

Pulled By: ezyang

fbshipit-source-id: 89381dc3bfa79544580e20fea906cef4f5101b61
2019-06-24 15:27:06 -07:00
273b6c5bae Cast return value of vector.at() to void to avoid nodiscard warning in MSVC. (#22061)
Summary:
Fix https://github.com/pytorch/pytorch/issues/22053
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22061

Differential Revision: D15957983

Pulled By: ezyang

fbshipit-source-id: e4416c5f0db2bc6b8bfaa27be52b942148ec7b3d
2019-06-24 15:27:02 -07:00
0ac28c8966 Quick fix for #18215, the CPU case (#21910)
Summary:
The bug is that when target_length == 0, there is no preceding BLANK state, and the original implementation leads to an out-of-bounds pointer access.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21910

Differential Revision: D15960239

Pulled By: ezyang

fbshipit-source-id: 7bbbecb7bf91842735c14265612c7e5049c4d9b3
2019-06-24 15:26:58 -07:00
41d0525de3 Improve repr for IncompatibleKeys (#22119)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20128.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22119

Differential Revision: D15961965

Pulled By: ezyang

fbshipit-source-id: 9cc397726e6bea5580e79d291cfc1ee75337fa0c
2019-06-24 15:26:54 -07:00
f1775796dd Fix minor issues with #21736 (#22074)
Summary:
cc mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22074

Differential Revision: D15965376

Pulled By: mrshenli

fbshipit-source-id: 50ff96de6390817d8ea52c04322c6bee3d649b32
2019-06-24 15:18:26 -07:00
a45898931c Document the Boolean tensor type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21601

Differential Revision: D15971573

Pulled By: gchanan

fbshipit-source-id: c07c57f989980149cb1307dcca6ba64dce52d0ef
2019-06-24 14:16:36 -07:00
7c4206499e Fix in ivalue::Future (#22114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22114
ghimport-source-id: 249b76c078e7af8ebb6cab113dd48dbd3e31e8dc

Test Plan:
ran intra_inter_benchmark with PARALLEL_BACKEND=NATIVE build

Imported from OSS

Differential Revision: D15958901

Pulled By: ilia-cher

fbshipit-source-id: 1c3dedc4cf1ff8166aeb26899a06c7287a499562
2019-06-24 12:56:46 -07:00
6350dbddd1 Fix sequential MKL case (#22062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22062
ghimport-source-id: a30255d7453c4ffecf40215a785c1e06b7296368

Test Plan:
USE_CUDA=0 PARALLEL_BACKEND=OPENMP BLAS=MKL USE_MKLDNN=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ BUILD_BINARY=1 python setup.py develop --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D15938079

Pulled By: ilia-cher

fbshipit-source-id: e7ef0c5bc75ebb845ebe66bf76a4070d45305b35
2019-06-24 12:56:43 -07:00
21da33f0f9 Better trace comments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22090

Differential Revision: D15968440

Pulled By: Krovatkin

fbshipit-source-id: e55e03a4303adbaa576c4384e7a42410bd99da6e
2019-06-24 12:51:27 -07:00
f1c7fa0503 De-deprecate some warnings that hurt usability (#21999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21999
ghimport-source-id: a77b3aea3d3ed33f328e143203730f2655371837

Test Plan: Imported from OSS

Differential Revision: D15925892

Pulled By: bwasti

fbshipit-source-id: 2b4e0af40bc1c6d12c617ba8701d3a5f7a6d833d
2019-06-24 12:35:00 -07:00
2347a4032b Fix tracing docs and add more comprehensive examples (#22082)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21857
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22082

Differential Revision: D15968306

Pulled By: Krovatkin

fbshipit-source-id: a76e500b0b7192bd814931ec48bbe9c37b8b92e0
2019-06-24 12:10:19 -07:00
85cbe0d825 Fix Concat Dimension Bug (#22088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22088

This diff is similar to D14163001. We need to handle the edge case when add_axis=1.

Reviewed By: jspark1105

Differential Revision: D15949003

fbshipit-source-id: 328d1e07b78b69bde81eee78c9ff5a8fb81f629b
2019-06-24 10:32:48 -07:00
322261a4de Fix dispatching of backwards kernel for ROCm. (#22125)
Summary:
Use WARP_SIZE consistently also for the dispatch dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22125

Differential Revision: D15966661

Pulled By: bddppq

fbshipit-source-id: 93eb663e01aff3b49474504a2f96f060919edf0c
2019-06-24 10:32:44 -07:00
e016a424ef Revert D15944971: [pytorch][PR] merge interfaces that have an optional scalartype parameter
Differential Revision:
D15944971

Original commit changeset: 53473c370813

fbshipit-source-id: a18158b448cb8993b12e1a3bf2c2a3e0d6df6b10
2019-06-24 09:41:33 -07:00
6edaa11e5a fix broken link
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22064

Differential Revision: D15951107

Pulled By: mrshenli

fbshipit-source-id: 0b8f97bd2bbac26855cd2889e1fc619770974ee2
2019-06-24 07:34:16 -07:00
77eda8de8e Support sparse gradients in DistributedDataParallel (#22037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22037

This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out-of-band signal
is needed to indicate whether or not a dense parameter (e.g. an
embedding) is expected to receive a sparse gradient. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.
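The bucket-assignment rule described above can be sketched as follows (a naive illustration, not the real c10d routine; the dense grouping cap is invented):

```python
def assign_buckets(expect_sparse, cap=2):
    """Every parameter that expects a sparse gradient gets its own
    bucket; dense parameters are grouped together (here, naively,
    at most `cap` per bucket)."""
    buckets, dense = [], []
    for idx, is_sparse in enumerate(expect_sparse):
        if is_sparse:
            buckets.append([idx])      # sparse: cannot be grouped
        else:
            dense.append(idx)
            if len(dense) == cap:
                buckets.append(dense)
                dense = []
    if dense:
        buckets.append(dense)
    return buckets
```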

Reviewed By: mrshenli

Differential Revision: D15926383

fbshipit-source-id: 39c0d5dbd95bf0534314fdf4d44b2385d5321aaf
2019-06-24 07:34:12 -07:00
a7ec889de4 Add sparse tensor allreduce (#22036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22036

Implemented only on ProcessGroupGloo, as an allgather of metadata
(sparse_dim, dense_dim, and nnz), followed by an allgather of indices,
followed by an allgather of values. Once these operations have
finished, all ranks locally compute a reduction over these sparse
tensors. Works for both CPU and CUDA tensors.

This surfaced a problem with the existing assumption of only modifying
tensors that are passed at the call site, because for sparse tensors
we don't know the dimensions of the output tensors before we run the
collective. To deal with this unknown, this commit adds a `result`
function to the `c10d::ProcessGroup::Work` class that returns a vector
of tensors.

It's a bit odd to have to retrieve the result through this function
only for operations on sparse tensors. To make this work irrespective
of tensor layout, we can create a follow-up commit to make all in
place operations make their results accessible through this function
as well. This doesn't break any existing contracts but does have the
potential to add interface ambiguity.

This is a resubmission of #19146.
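The local reduction step described above can be sketched in plain Python, modeling each rank's sparse tensor as an {index: value} dict (the Gloo allgathers and real COO tensors are out of scope here):

```python
def sparse_allreduce(rank_tensors):
    """After an (implied) allgather of indices and values, every rank
    computes the same local reduction over all gathered sparse tensors.
    Duplicate indices are summed, as in a COO-tensor coalesce."""
    reduced = {}
    for t in rank_tensors:              # one dict per rank
        for idx, val in t.items():
            reduced[idx] = reduced.get(idx, 0.0) + val
    return reduced
```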

Reviewed By: mrshenli

Differential Revision: D15926384

fbshipit-source-id: b6ee5d81606bfa8ed63c3d63a9e307613491e0ae
2019-06-24 07:34:09 -07:00
313960d52e Use at::detail::* instead of detail::* to avoid ambiguity in windows (#22029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22029
ghimport-source-id: d1a26a07faf101c644775267a141ba56cbd3f1c9

Test Plan: Imported from OSS

Differential Revision: D15965039

Pulled By: ezyang

fbshipit-source-id: 31baf405da6f7c6d9e31f5954ec827889dadf769
2019-06-24 07:18:02 -07:00
142361a7e4 merge interfaces that have an optional scalartype parameter (#21088)
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but now to specify both the dim and dtype will require the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```

[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088

Reviewed By: ailzhang

Differential Revision: D15944971

Pulled By: nairbv

fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
2019-06-24 07:17:58 -07:00
cd0d8480d3 Remove many build options redundantly specified in Python build scripts. (#21877)
Summary:
Currently many build options are explicitly passed from Python build scripts to CMake. But this is unnecessary, at least for many of them. This commit removes the build options that have the same name in CMakeLists.txt and in environment variables (e.g., `USE_REDIS`). Additionally, many build options that are not explicitly passed to CMake are lost.

For `ONNX_ML`, `ONNX_NAMESPACE`, and `BUILDING_WITH_TORCH_LIBS`, I changed their default values in CMake scripts (as consistent with what the `CMake.defines` call meant), to avoid their default values being redundantly set in the Python build scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21877

Differential Revision: D15964996

Pulled By: ezyang

fbshipit-source-id: 127a46af7e2964885ffddce24e1a62995e0c5007
2019-06-24 07:17:54 -07:00
1b34ccfc78 Porting SpatialDilatedConvolution and VolumetricDilatedConvolution to ATen (#20983)
Summary:
This PR tackles issue https://github.com/pytorch/pytorch/issues/18352 .

Progress:
- [x] conv_dilated2d CPU
- [x] conv_dilated3d CPU
- [x] conv_dilated2d CUDA
- [x] conv_dilated3d CUDA
- [x] RocM port
- [x] Port of CUDA gemm and gemv
- [x] Refactored 2d and 3d functions as well as output and gradient computations into a single C++ template function
- [x] Cleanup
  + [x] eliminate forward functions
  + [x] eliminate buffers `columns` and `ones` from functions API
  + [x] eliminate out functions
  + [x] eliminate using `ones`

Note that col2im, im2col, col2vol, vol2col implementations are exposed in `ATen/native/im2col.h` and `ATen/native/vol2col.h`. The corresponding operators (not ported in this PR) should use these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20983

Differential Revision: D15958088

Pulled By: ezyang

fbshipit-source-id: 1897f6e15abbf5710e9413cd1e443c2e1dc7d705
2019-06-24 07:12:54 -07:00
3ba654e6d5 Add finding thnvrtc_library into torchconfig.cmake (#22126)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/21861#issuecomment-504805368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22126

Differential Revision: D15964930

Pulled By: ezyang

fbshipit-source-id: 0fb749784bec9af5a8ccbcf775fa7d9d4d34a4c6
2019-06-24 07:04:44 -07:00
08060e898b Revert D15435461: [pytorch][PR] PyTorch ThroughputBenchmark
Differential Revision:
D15435461

Original commit changeset: db08829dc3f4

fbshipit-source-id: 72a0eac1658b2d3f885bc9a21c49fcc23030ae3e
2019-06-23 22:55:05 -07:00
d96ce9b9fe add for in dict support (#22006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22006
ghimport-source-id: d9686c0b61b0eea3787f48adce567249e4e8faf0

Test Plan: Imported from OSS

Differential Revision: D15948548

Pulled By: wanchaol

fbshipit-source-id: 4227502ca050099085ad481aef725ac2cab06d74
2019-06-23 20:49:35 -07:00
c9344fc9c4 add for in string support (#21990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21990
ghimport-source-id: 69b4882f8602c4088e7a833c43fd3cd37501a3c0

Test Plan: Imported from OSS

Differential Revision: D15948547

Pulled By: wanchaol

fbshipit-source-id: 057e7f4fb67c6dca98458ceb14414368e1a86260
2019-06-23 20:49:30 -07:00
eab35756d8 support iteration tuple unpacking (#21985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21985
ghimport-source-id: 1f20a8db7b6bad23b18ac1caefcb46b3fa141697

Test Plan: Imported from OSS

Differential Revision: D15948549

Pulled By: wanchaol

fbshipit-source-id: 758c9c3dfad40c4158aee21ddebcd25b711111d7
2019-06-23 20:49:26 -07:00
9b45237618 PyTorch ThroughputBenchmark (#20766)
Summary:
This is useful for measuring the inference performance of your
models. This is a very basic benchmark for now. We don't support
batching on the benchmark side, and no inter- or intra-op parallelism
is supported yet, just caller-based parallelism.

The main philosophy here is that the user should be able to provide
inputs from Python and just stack them within the benchmark. The API
should be exactly the same as passing inputs to module.forward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20766

Test Plan: Added a new unit test

Differential Revision: D15435461

Pulled By: salexspb

fbshipit-source-id: db08829dc3f4398bb1d8aa16cc4a58b6c72f16c6
2019-06-23 13:03:18 -07:00
c0f96aaf01 Restore default values on premature test exit (#22115)
Summary:
Previously any assert failures would leave the updated setting, making
the test suite semantics dependent on the order in which the tests are run.

The diff is large only due to the indentation change (might be good to review without whitespace changes).

cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22115

Differential Revision: D15960875

Pulled By: soumith

fbshipit-source-id: 9313695277fc2d968786f13371719e03fff18519
2019-06-23 12:55:00 -07:00
887ecf797c Fix DictType isSubtypeOf (#22104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22104
ghimport-source-id: 9db14020f424cf2e021d63e9c0fe4017ac7cd6c8

Test Plan: Imported from OSS

Differential Revision: D15956726

Pulled By: jamesr66a

fbshipit-source-id: 85448deab70c5e5b7ab1132652836ed575581868
2019-06-22 16:36:34 -07:00
45b91bd326 refactor all for in range/tensor tests to be together with other for loop tests (#21950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21950
ghimport-source-id: b2491313bc2e0fcc10f77167c261cbae4d884ebb

Test Plan: Imported from OSS

Differential Revision: D15948546

Pulled By: wanchaol

fbshipit-source-id: 34dde28902ae5b8affbf6e4deaaffdb1d8ddd6ec
2019-06-22 01:38:14 -07:00
e0f5ab2c2e Tree based Iterator infrastructure: for in range/list/tensor/zip/enumerate (#21801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21801
ghimport-source-id: b019d3e9a6f9bf152991a01b40e424dff176ffaa

Test Plan: Imported from OSS

Differential Revision: D15948545

Pulled By: wanchaol

fbshipit-source-id: 6110a0f3ab08cbbb398441e8330f56083ecd2d99
2019-06-22 01:00:42 -07:00
a256b09ce9 Backout Liveness Tests again :-(
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22100

Differential Revision: D15956214

Pulled By: Krovatkin

fbshipit-source-id: 9b0c8ecf5b479bf878ffc31acc416bd8dbfe4b50
2019-06-22 00:18:21 -07:00
7b1d6c8912 Update intra_inter_benchmark (#22051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22051
ghimport-source-id: 70710b3866b1a5e21656b77d2695ada74d00254e

Test Plan:
PARALLEL_BACKEND=NATIVE_TBB USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ USE_CUDA=0 BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop --cmake

./build/bin/intra_inter_benchmark

Imported from OSS

Differential Revision: D15933951

Pulled By: ilia-cher

fbshipit-source-id: 88ad8f7a1634c1612ffaa68f22721ffc73d9b2ba
2019-06-21 23:06:27 -07:00
91bf0a9f9d Move quantized tensor tests in test_torch.py to test_quantized_tensor.py (#22089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22089

att

Reviewed By: jianyuh

Differential Revision: D15950101

fbshipit-source-id: 70acdeeef3a05201d72f986d5a0005832efd75ff
2019-06-21 22:48:34 -07:00
b19b20efef fix minor comment (#21576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21576

Fix comment regarding original_tensor

Reviewed By: jianyuh

Differential Revision: D15733294

fbshipit-source-id: e2957f32dcf90859b77e61c931b64abdd066aabb
2019-06-21 22:23:53 -07:00
f7b2778cb1 s/uniqueName/debugName/ (#22096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22096
ghimport-source-id: 8f1d994b98432942b5beeb10bf6d30e447d51997

Test Plan: Imported from OSS

Differential Revision: D15956004

Pulled By: jamesr66a

fbshipit-source-id: 319d2d20ef0863249a8a2bdd228b4f792d37bfab
2019-06-21 20:54:53 -07:00
7d637de771 Reduce excessive CI printing in TestHub (#22043)
Summary:
https://github.com/pytorch/pytorch/pull/21132 reverted https://github.com/pytorch/pytorch/pull/19606.

Now these tests again print roughly 40% of the lines of CI output (e.g., https://circleci.com/gh/pytorch/pytorch/2041825?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link)

This PR now uses the functionality introduced in https://github.com/pytorch/vision/issues/862.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22043

Differential Revision: D15947268

Pulled By: ailzhang

fbshipit-source-id: f84f4d6b86203dbe8687e04ae3ed8c99df0bdff8
2019-06-21 20:08:44 -07:00
63ca908026 Updating submodules
Reviewed By: yns88

fbshipit-source-id: d3374d2ee514cc0526559ffbac6dc11918ea71cf
2019-06-21 18:51:07 -07:00
856268c716 Revert D15947873: [JIT] s/uniqueName/debugName
Differential Revision:
D15947873

Original commit changeset: 31a2b30d0ce9

fbshipit-source-id: ef1c0f120c1835184d8106d176cea58ec6ad40b7
2019-06-21 18:51:03 -07:00
36e4b54420 s/uniqueName/debugName (#22048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22048
ghimport-source-id: a82d80ceec1d8055ce4cf62df10ade4a224109f8

Test Plan: Imported from OSS

Differential Revision: D15947873

Pulled By: jamesr66a

fbshipit-source-id: 31a2b30d0ce911edf5791ca10040a1e968750b06
2019-06-21 17:59:38 -07:00
4bc89bd5a6 Implement tensor.select(Dimname,int) (#21795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21795
ghimport-source-id: d13af6078a47de1d6045cfbb7d278c378fe734fe

Test Plan: Imported from OSS

Differential Revision: D15833457

Pulled By: zou3519

fbshipit-source-id: fa52aff25ce0e12f31da3eef83ea948b4f7a5d9f
2019-06-21 16:16:45 -07:00
18a904c12e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 494a8fe00cbdb782bbdb05eefb17e9166d117599
2019-06-21 15:14:46 -07:00
f164c01f9c Adding liveness test cases back
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21762

Differential Revision: D15943509

Pulled By: Krovatkin

fbshipit-source-id: 4b65bf63ab15a2347da5f7269cc0f2dbb226b330
2019-06-21 15:09:09 -07:00
38aa5a519e Experimental option to use single thread pool (#22047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22047
ghimport-source-id: 8731538a091997fd31d6aff59152dc9241de2ba4

Test Plan:
EXPERIMENTAL_SINGLE_THREAD_POOL=1 PARALLEL_BACKEND=NATIVE_TBB
USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1 MKLDNN_THREADING=SEQ USE_CUDA=0
BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1 python setup.py develop --cmake
./build/bin/parallel_info
./build/bin/thread_init_test
./build/bin/test_parallel
./build/bin/intra_inter_benchmark

Imported from OSS

Differential Revision: D15931188

Pulled By: ilia-cher

fbshipit-source-id: 1ca1b190b6e16ce5398f2dad72deaf3cb083a43b
2019-06-21 14:54:16 -07:00
5ff06a7b0b more complete tuple assignments (#21949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21949
ghimport-source-id: 458793d74af3728bf0338867b081157905a7635a

Test Plan: Imported from OSS

Differential Revision: D15948550

Pulled By: wanchaol

fbshipit-source-id: 9ed69e0859e052816f06fc9c288b905551b2e48c
2019-06-21 14:49:38 -07:00
4009089d1f Sparse BLAS: Remove workaround to check zero length inputs. (#22080)
Summary:
Fix was released with ROCm 2.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22080

Differential Revision: D15947427

Pulled By: bddppq

fbshipit-source-id: b6b66f4cfc334ddc6140d1d519792d4783ba0efa
2019-06-21 14:45:06 -07:00
04e9278306 First round of optimizations for segment_reduction_op kernels. (#22081)
Summary:
Apply launch bounds annotations for ROCm as the maximum threads per
block (1024) is higher than the ROCm internal default (256).

Reduce the minBlocksPerMultiprocessor for ROCm to 8 from 16 as this
improves performance in some microbenchmarks by (statistically
significant) 4%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22081

Differential Revision: D15947426

Pulled By: bddppq

fbshipit-source-id: b4b7015417f99e14dfdedb62639e4d837c38e4fd
2019-06-21 14:33:12 -07:00
1c5fe2e8c4 Add support for Python 3.8 Constant node (#22007)
Summary:
We can't really test these until we get Python 3.8 in the CI, but these all work locally and won't be invoked at all for Python 3.7 and lower so this should be pretty safe.

Fixes #21710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22007

Pulled By: driazati

Differential Revision: D15914735

fbshipit-source-id: 83833cebe7e38b162719a4f53cbe52c3fc638edd
2019-06-21 14:22:06 -07:00
f9b3989206 handle slice with negative indices and indices exceeding tensor dimen… (#21811)
Summary:
handle slice with negative indices and indices exceeding tensor dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21811

Reviewed By: zrphercule

Differential Revision: D15944243

Pulled By: houseroad

fbshipit-source-id: f7d987e9d8d704ade9d489599df14afbf1333428
2019-06-21 13:37:54 -07:00
38c9bb8261 Remove most usages of THCHalfAutoNumerics. (#21878)
Summary:
This was originally introduced before at::Half overloaded a number of operators; since this isn't necessary anymore, get rid of it.

Note in many cases, these files still need THCNumerics.cuh (which was included by THCHalfAutoNumerics); I was not careful about isolating these usages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21878

Differential Revision: D15941236

Pulled By: gchanan

fbshipit-source-id: 65f30a20089fcd618e8f3e9646cf03147a15ccba
2019-06-21 12:40:38 -07:00
06c3bd0302 Improve ListPtr::extract() (#21753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21753

- it accidentally didn't move non-IValue-based lists before. This is fixed now.
- it only needs to recreate a T() for IValue-based lists

Reviewed By: resistor

Differential Revision: D15809220

fbshipit-source-id: 944badf1920ee05f0969fff0d03284a641dae4a9
2019-06-21 12:26:01 -07:00
fe580e850e Rewrite lerp operator to use TensorIterator and support compile-time vectorization. (#22038)
Summary:
This benefits from compile-time vectorization and multi-threading.

Before:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
2.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
452 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After, with multi-threading:

```python
In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
167 µs ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
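The kernel computes the standard linear-interpolation formula; as a plain-Python sketch of the elementwise math (not the TensorIterator implementation):

```python
def lerp(start, end, weight):
    """Elementwise linear interpolation: start + weight * (end - start),
    the same math torch.lerp applies per element."""
    return [s + weight * (e - s) for s, e in zip(start, end)]
```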
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22038

Differential Revision: D15941468

Pulled By: VitalyFedyunin

fbshipit-source-id: fa8a5126187df4e6c849452e035b00b22be25739
2019-06-21 11:39:27 -07:00
28630529ac Limit overall number of threads used by TBB (#22045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22045
ghimport-source-id: ea49ae04d86677f7a73a07968ce454eb1128fb84

Test Plan:
PARALLEL_BACKEND=NATIVE_TBB USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ USE_CUDA=0 BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop --cmake

./build/bin/parallel_info
./build/bin/thread_init_test
./build/bin/test_parallel

Imported from OSS

Differential Revision: D15930319

Pulled By: ilia-cher

fbshipit-source-id: 4c33ae395965e5708f8d7ceb67495b303fc4d22c
2019-06-21 11:39:18 -07:00
82dd69326b Split nn.Module._save_to_state_dict to make it overridable (#21933)
Summary:
# Motivation

We allow to override JIT module serialization with `__getstate__/__setstate__` in order to cover cases where parameters are not serializable. Use cases include: MKLDNN integration: a388c78350/torch/utils/mkldnn.py (L18-L26)
and also fbgemm prepacked format integration for quantized tensors.

However many Eager scripts use `torch.save(module.state_dict())` form of serialization. There are several ways to make it work:

* make packed_weight itself pickleable (e.g. by binding `__getstate__/__setstate__` on C++ UDT level)
    * change: we’d need to allow module buffers to be of arbitrary, non-Tensor types
    * pro: no change to state_dict behavior
    * cons: might not be directly inspectable by user calling .state_dict(), especially if packed weights represent several tensors fused together
* make packed_weight a proper Tensor layout
    * pro: no change to state_dict or buffers behavior
    * cons: adding new tensor layouts is pretty costly today
    * cons: doesn’t work if multiple tensors are packed in one interleaved representation
* *[this approach]* allow Modules to override state_dict and return regular tensors
    * pro: most flexible and hackable
    * pro: maintains the semantic meaning of state_dict as all the data necessary to represent the module's state
    * cons: complicates state_dict logic
    * cons: potential code duplication between `__getstate__/__setstate__`

Based on discussions with zdevito and gchanan we decided to pick the last approach. Rationale: this behavior is fully opt-in and will impact only the modules that need it. For those modules the requirement listed above won't be true. But we do preserve the requirement that all elements of state_dict are tensors. (https://fburl.com/qgybrug4 for internal discussion)

In the future we might also implement one of the approaches above but those are more involved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21933

Differential Revision: D15937678

Pulled By: dzhulgakov

fbshipit-source-id: 3cb5d1a8304d04def7aabc0969d0a2e7be182367
2019-06-21 09:55:22 -07:00
b2197ef2b0 Adding support for JIT Fusion on Windows for CUDA (#21861)
Summary:
This pull request adds the necessary Windows DLL code to be able to support JIT fusion for CUDA. CPU JIT Fusion isn't supported. This also adds all the non-CPU JIT tests back in on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21861

Differential Revision: D15940939

Pulled By: soumith

fbshipit-source-id: e11f6af1ac258fcfd3a077e6e2f2e6fa38be4ef1
2019-06-21 09:44:17 -07:00
edb5a1662e Remove getDeviceFromPtr and allocator from Type (#21940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21940
ghimport-source-id: 0a618878ae030f663b05662f83ac4b549a90ba29

Test Plan: Imported from OSS

Differential Revision: D15893330

Pulled By: li-roy

fbshipit-source-id: a3dfb6b4ed0c72f7f3efd00192fb63aabc9c5967
2019-06-21 01:05:33 -07:00
b36a041d6f Move UnsafeTensorFromTH and UnsafeStorageFromTH off Type (#21923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21923
ghimport-source-id: f015c8521ef9071eaa982cbf73c13aa925035956

Test Plan: Imported from OSS

Differential Revision: D15883390

Pulled By: li-roy

fbshipit-source-id: 6a7a7ffbe6000199d41cdca5efb97371f46dd8fe
2019-06-21 01:05:29 -07:00
5d7cf66862 add Int8SpatialBNRelu (#22014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22014

Add Int8SpatialBN + Relu fused operator.

Reviewed By: dskhudia

Differential Revision: D15916551

fbshipit-source-id: a938e0f0e105ab5f823a3cb6144f50aa2ab944c1
2019-06-20 23:23:04 -07:00
7d81e62562 Add mkldnn tests for running end to end resnet models
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22041

Differential Revision: D15928786

Pulled By: bddppq

fbshipit-source-id: 4b12e5bda2da13aba2d63d357a0a854d59317362
2019-06-20 22:42:49 -07:00
71741ba115 rename test to be more consistent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22057

Differential Revision: D15936870

Pulled By: soumith

fbshipit-source-id: ab6194219da2582efdf324b89b2bc87dfe4e5d69
2019-06-20 22:02:36 -07:00
a3fc6ed046 Hook up liveness into profiling pipeline.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21881

Differential Revision: D15931627

Pulled By: Krovatkin

fbshipit-source-id: dc825a563c7aceb5f66a2ed2a600d550b70941b2
2019-06-20 21:23:16 -07:00
3838324539 Add max/min/argmax/argmin/sort/argsort for quantized Tensor (#21546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21546

Added following methods for QTensor:
- max, min
- argmax, argmin
- sort, argsort

Reviewed By: dzhulgakov

Differential Revision: D15718117

fbshipit-source-id: 746b978d5722cb75e216fc65585bf206d45a7969
2019-06-20 21:00:03 -07:00
95aee81dd7 more general fusion logic (#22015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22015

Previous fusion logic only worked for operators back-to-back in the linear order of the protobuf file.
This diff generalizes it to work for any predecessor-successor operators in the graph without any "interfering" use/def of the related blobs.
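The generalized fusion condition can be sketched in a few lines (the op/blob representation and `can_fuse` helper below are illustrative stand-ins, not caffe2's actual IR or fusion pass):

```python
# Hedged sketch: two ops may fuse if the consumer reads the producer's output
# and no op scheduled between them uses or redefines that intermediate blob.
def can_fuse(producer, consumer, ops):
    blob = producer["out"]
    if blob not in consumer["in"]:
        return False
    between = ops[ops.index(producer) + 1 : ops.index(consumer)]
    # any interfering use/def of the intermediate blob blocks fusion
    return all(blob not in op["in"] and blob != op["out"] for op in between)

conv = {"in": ["x"], "out": "y"}
relu = {"in": ["y"], "out": "z"}
ok = can_fuse(conv, relu, [conv, relu])                        # no interference
blocked = can_fuse(conv, relu, [conv, {"in": ["y"], "out": "w"}, relu])
```

Note the ops no longer need to be adjacent in the serialized order, only free of interfering uses/defs in between.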

Reviewed By: csummersea

Differential Revision: D15916709

fbshipit-source-id: 82fe4911a8250845a8bea3427d1b77ce2442c495
2019-06-20 20:44:26 -07:00
88921feafd change return type for q_scale and q_zero_point (#21709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21709

Change the return type from Scalar to double/int64_t so we don't need to do a conversion when we call other quantize-related aten functions

Differential Revision: D15793003

fbshipit-source-id: 510936c69fa17a4d67340a31ebb03415647feb04
2019-06-20 20:30:39 -07:00
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure for that PR is quite messy.

1. Add `IterableDataset`.
2. So we have 2 data loader modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise.

3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful in doing things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset obj copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration.
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import torch.utils.data in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
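As a hedged illustration of the `get_worker_info` hook described above (the dataset and sharding scheme here are invented for this sketch, not part of the PR):

```python
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeIterableDataset(IterableDataset):
    """Streams integers in [start, end), sharding the range across workers."""
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:  # single-process loading: yield the whole range
            lo, hi = self.start, self.end
        else:             # worker process: take only this worker's shard
            per_worker = -(-(self.end - self.start) // info.num_workers)
            lo = self.start + info.id * per_worker
            hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

loader = DataLoader(RangeIterableDataset(0, 8), batch_size=4, num_workers=0)
batches = [batch.tolist() for batch in loader]  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With `num_workers > 0`, each worker receives its own copy of the dataset and `get_worker_info()` returns its id and worker count, so the shards partition the range without duplication.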
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
d4119f8fcb Automatic update of fbcode/onnx to 355a4954ea4e5836a5e943589509951c44feb6b4 (#22030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22030

Previous import was dd599b05f424eb161a31f3e059566a33310dbe5e

Included changes:
- **[355a4954](https://github.com/onnx/onnx/commit/355a4954)**: Update codeowners to have community folder changes assigned to steering committee (#2104) <Prasanth Pulavarthi>
- **[ceaa5da7](https://github.com/onnx/onnx/commit/ceaa5da7)**: Fix Resize/Upsample Shape inference function (#2085) <Raymond Yang>
- **[4de8dc0d](https://github.com/onnx/onnx/commit/4de8dc0d)**: Clarify shape inference requirements for new operators (#2088) <Hariharan Seshadri>
- **[52aa1fad](https://github.com/onnx/onnx/commit/52aa1fad)**: Fix NN defs file (#2083) <Hariharan Seshadri>

Reviewed By: bddppq

Differential Revision: D15924221

fbshipit-source-id: 91ba64ef3e1a2de4a7dd0b02ee6393508cc44a73
2019-06-20 15:52:45 -07:00
84a2d5d7aa Add hashing to bucket-weighted pooling (#20673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20673

Add option to bucket-weighted pooling to hash the bucket so that any cardinality score can be used.

Reviewed By: huginhuangfb

Differential Revision: D15003509

fbshipit-source-id: 575a149de395f18fd7759f3edb485619f8aa5363
2019-06-20 15:12:36 -07:00
1aae4b02df Fix 'error : detail is ambiguous' on Windows (#22025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22025
ghimport-source-id: 0fb408ad185a989507f7509a2a3574e1a7e60ab2

Test Plan: Imported from OSS

Differential Revision: D15926651

Pulled By: ezyang

fbshipit-source-id: 298340bfbfe44dcd81cde8f0d56f8dbde92fb7bd
2019-06-20 13:23:21 -07:00
19ef15709f Updating submodules
Reviewed By: yns88

fbshipit-source-id: 0be0694d6adf1ae9baa408a4b372101a26a14ba4
2019-06-20 12:59:31 -07:00
4cd7d78718 correct arange docs (#21992)
Summary:
https://github.com/pytorch/pytorch/issues/21579 correctly points out an inaccuracy in the docs for `arange`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21992

Differential Revision: D15914411

Pulled By: umanwizard

fbshipit-source-id: 3eb1734b29af3f3858f0f4d54c71e28dbda5c75b
2019-06-20 12:36:00 -07:00
08facca1a1 Support accumulating DDP grads using a context manager (#21736)
Summary:
The first attempt and more discussions are available in https://github.com/pytorch/pytorch/issues/19577

#### Goal

Allow toggling DDP gradient synchronization across iterations. With this feature, users may accumulate grads in module variables, and only kick off expensive grad synchronize every a few iterations.

#### Concerns

Our first attempt in https://github.com/pytorch/pytorch/issues/19577 tried to do it using a variable or a function, but apaszke made a good point that this would be error-prone, and favored a context manager instead.

#### Proposed Solution

Instead of providing a `accumulate_grads` variable/function/context, we provide a `DistributedDataParallel.no_sync()` context manager. And it does exactly what the name suggests, i.e., disable DDP grad synchronization within the context. Note that `accumulate_grads` means `no_sync` + no optimizer step, where the latter is not controlled by DDP.

It is true that users need to call another `model(input).backward()` after exiting the context, and this is indeed more verbose. But I think it is OK, as one major concern in the previous discussion was to prevent users from running into errors without knowing it. This API should reaffirm the expected behavior, and does not interfere with other use cases when accumulating grads is not required.

The application would then look like:

```python
with ddp.no_sync():
  for input in inputs:
    ddp(input).backward()

ddp(one_more_input).backward()
optimizer.step()
```

chenyangyu1988 myleott
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21736

Differential Revision: D15805215

Pulled By: mrshenli

fbshipit-source-id: 73405797d1e39965c52016af5cf45b15525ce21c
2019-06-20 12:23:52 -07:00
40b9f8f0a0 Added more descriptive error message for index out of range (#21758)
Summary:
https://github.com/pytorch/pytorch/issues/21535
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21758

Differential Revision: D15922915

Pulled By: Chillee

fbshipit-source-id: dcb301a661c359f27869200ee241ec272ef50d3a
2019-06-20 12:12:03 -07:00
6bd58b7548 Move list / dict tests to TestList and TestDict (#22000)
Summary:
There aren't any substantive changes aside from some test renames (e.g. `TestScript.test_dict_membership` -> `TestDict.test_membership`) and the addition of `TestDict.dict()`.

Adding the rest of the dict ops was making the tests a mess and `TestScript` is already > 10000 lines by itself, so breaking them up should make things cleaner
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22000

Pulled By: driazati

Differential Revision: D15911383

fbshipit-source-id: 614428e03fbc14252f0e9cde74ab9a707169a860
2019-06-20 11:17:35 -07:00
0702b5f345 Partially parallelize randperm on CPU. (#21529)
Summary:
This commit parallelizes the variable initialization (from 1 to n) step on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21529

Differential Revision: D15855402

Pulled By: VitalyFedyunin

fbshipit-source-id: f1ba54587451f9cb0eb5e542c3c5b458b48e1a3d
2019-06-20 10:44:01 -07:00
e388f70499 Move cppdocs build to CircleCI (#19768)
Summary:
The cppdocs build job (originally run on Chronos as a cron job) was frequently broken because it was not run on every PR. This PR moves it to CircleCI and enables it on every PR, so that we can get the build failure signal much earlier.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19768

Differential Revision: D15922289

Pulled By: yf225

fbshipit-source-id: e36ef59a2e42f78b7d759ee02f2d94dc90f88fff
2019-06-20 10:24:21 -07:00
76fe91bb2f Revert D14889547: Add sparse tensor allreduce
Differential Revision:
D14889547

Original commit changeset: 34f3de4d6a2e

fbshipit-source-id: 24d2239da0b865280af88dce3d8fb25883fc0174
2019-06-20 10:07:27 -07:00
cb4c213f55 Revert D15007365: Support sparse gradients in DistributedDataParallel
Differential Revision:
D15007365

Original commit changeset: f298e83fd3ca

fbshipit-source-id: ef5e556d2df37f0c64652bd3563956afd8d9fd7f
2019-06-20 10:07:22 -07:00
f8f583cbae Port convtranspose2d (#20994)
Summary:
this PR partially resolves https://github.com/pytorch/pytorch/issues/18353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20994

Differential Revision: D15876052

Pulled By: ezyang

fbshipit-source-id: 5896e0cbb656d0530e39fd681808adc685841b37
2019-06-20 07:11:38 -07:00
365de7bda1 Support sparse gradients in DistributedDataParallel (#19443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19443

This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out of band signal
is needed whether or not a dense parameter (e.g. an embedding) is
expected to receive a sparse gradient or not. This information is
passed to the bucket assignment computation routine and the reducer as
a vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, as we cannot easily group
multiple unrelated sparse tensors.

Reviewed By: mrshenli

Differential Revision: D15007365

fbshipit-source-id: f298e83fd3ca828fae9e80739e1db89d045c99ac
2019-06-20 07:06:28 -07:00
aee6a412e9 Add sparse tensor allreduce (#19146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19146

Implemented only on ProcessGroupGloo, as an allgather of metadata
(sparse_dim, dense_dim, and nnz), followed by an allgather of indices,
followed by an allgather of values. Once these operations have
finished, all ranks locally compute a reduction over these sparse
tensors. Works for both CPU and CUDA tensors.

This surfaced a problem with the existing assumption of only modifying
tensors that are passed at the call site, because for sparse tensors
we don't know the dimensions of the output tensors before we run the
collective. To deal with this unknown, this commit adds a `result`
function to the `c10d::ProcessGroup::Work` class that returns a vector
of tensors.

It's a bit odd to have to retrieve the result through this function
only for operations on sparse tensors. To make this work irrespective
of tensor layout, we can create a follow-up commit to make all in
place operations make their results accessible through this function
as well. This doesn't break any existing contracts but does have the
potential to add interface ambiguity.
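The allgather-then-local-reduce scheme described above can be sketched with plain Python lists (purely illustrative; `allreduce_sparse` and its data layout are inventions of this sketch, not the c10d API):

```python
def allreduce_sparse(contributions):
    """Simulate the scheme above: every rank ends up seeing all
    (indices, values) pairs -- standing in for the allgathers of
    metadata, indices, and values -- then reduces them locally."""
    gathered = list(contributions)  # one (indices, values) pair per rank
    reduced = {}
    for indices, values in gathered:
        for i, v in zip(indices, values):
            reduced[i] = reduced.get(i, 0.0) + v
    # every rank computes the same reduction, so no further communication
    return sorted(reduced.items())

# two "ranks" contributing overlapping sparse gradients
out = allreduce_sparse([([0, 2], [1.0, 2.0]), ([2, 3], [3.0, 4.0])])
# out == [(0, 1.0), (2, 5.0), (3, 4.0)]
```

The overlap at index 2 shows why each rank must reduce after gathering: values at the same index are summed, while disjoint indices pass through unchanged.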

Reviewed By: mrshenli

Differential Revision: D14889547

fbshipit-source-id: 34f3de4d6a2e09c9eba368df47daad0dc11b333e
2019-06-20 07:06:24 -07:00
97ea44b34a Fix issue in quantization error measurement when followed by Relu (#21890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21890

As title

Reviewed By: jspark1105

Differential Revision: D15739808

fbshipit-source-id: 8fbcca04f0711fd9f994d67e1f4a604ef9fa42c6
2019-06-19 22:29:54 -07:00
b6f542f8a1 Add aten mkldnn transpose (#21943)
Summary:
This PR is about:

1.  Make mkldnn reshape share the same memory for plain format tensors.

2.  Add mkldnn transpose operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21943

Differential Revision: D15916063

Pulled By: bddppq

fbshipit-source-id: d1971c67341f277c1e80c1fa34e213b6c27f4062
2019-06-19 22:20:46 -07:00
3d44cd6d19 Replace Type dispatch with ATenDispatch (#22008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22008
ghimport-source-id: b0a5cc3da283b195f88636e2a61939d2facd11d9

Test Plan: Imported from OSS

Differential Revision: D15914756

Pulled By: li-roy

fbshipit-source-id: 5bc300ec525a3ee9e6491dd4c55e78bbd977d691
2019-06-19 21:42:54 -07:00
5d67c606ea Added error for classes that don't have an init function (#21880)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21761
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21880

Differential Revision: D15879205

Pulled By: Chillee

fbshipit-source-id: 8b614970196b381357b6032a73eeaab0b7a4f667
2019-06-19 21:33:37 -07:00
4fee532de6 Pass loop_over optional parameter for cached reader properly. (#21929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21929

Just need to pass `loop_over` argument properly.

Reviewed By: noname01

Differential Revision: D15885401

fbshipit-source-id: f1928277262a80e5b41f4c4f3945c2f378a4e233
2019-06-19 18:15:32 -07:00
96c0bd3722 ListPtr->List DictPtr->Dict step 3 (#21938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21938

After having changed all call sites, we can now remove the old naming scheme.

Reviewed By: zdevito

Differential Revision: D15892402

fbshipit-source-id: 1f5b53a12fa657f6307811e8657c2e14f6285d2f
2019-06-19 18:02:08 -07:00
275087383b ListPtr->List DictPtr->Dict step 2 (#21937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937

This changes call sites to use the new naming scheme

Reviewed By: zdevito

Differential Revision: D15892404

fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
2019-06-19 18:02:05 -07:00
093c78f854 ListPtr->List DictPtr->Dict step 1 (#21936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21936

This introduces torch::List and torch::Dict as aliases to ListPtr/DictPtr.
After this lands, we can step by step change the call sites to the new naming
and finally remove the old spellings.

Reviewed By: zdevito

Differential Revision: D15892405

fbshipit-source-id: 67b38a6253c42364ff349a0d4049f90f03ca0d44
2019-06-19 18:02:01 -07:00
cba79f4872 Revert D15637222: [wip] Replace Type dispatch with ATenDispatch
Differential Revision:
D15637222

Original commit changeset: fcfaea0b5480

fbshipit-source-id: 9bca7ebb91d7a3609b86663089140d7c5a33f58d
2019-06-19 17:36:52 -07:00
15be5483c0 Move NamedType method definitions into cpp file (#21983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21983
ghimport-source-id: fe9c1eba5f4c737e1442b877d396b9e8e5298cfb

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D15907633

Pulled By: jamesr66a

fbshipit-source-id: bd2dfdca117cdc3ae35fdd9d29cf521d82636069
2019-06-19 16:43:11 -07:00
f6aac41391 Defining object destructor in c10 (#21984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21984
ghimport-source-id: 5767592e37ed388422eed5639f8ba0722aec66e2

Test Plan: Imported from OSS

Differential Revision: D15906530

Pulled By: zdevito

fbshipit-source-id: bec8c8b0b5b9dcc2e8fc69b5031fcfa6bb22d54e
2019-06-19 16:27:40 -07:00
24a6c32407 Replace Type dispatch with ATenDispatch (#21320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21320
ghimport-source-id: cc18f746a1c74df858cb0f6d8b7d4de4315683c7

Test Plan: Imported from OSS

Differential Revision: D15637222

Pulled By: li-roy

fbshipit-source-id: fcfaea0b5480ab966175341cce92e3aa0be7e3cb
2019-06-19 15:46:45 -07:00
00fdb2cf95 Enable XLA by default on pull requests. (#21991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21991
ghimport-source-id: 5077e62a613c36256d2b5a2427aa9c3887c4a797

Test Plan: Imported from OSS

Differential Revision: D15907913

Pulled By: ezyang

fbshipit-source-id: c67bb999f02760836d1568c1a3911add3f1538f0
2019-06-19 15:01:49 -07:00
effcc398c4 Refactor Random Number Generators in ATen (#21555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21555
ghimport-source-id: dd900a8c3e1ef9ef1e011b8bb5476626d18cc462

Test Plan: Imported from OSS

Differential Revision: D15875780

Pulled By: ezyang

fbshipit-source-id: 6e04e90af62ab9c9593d74f344a3a084aaaf6f43
2019-06-19 13:54:09 -07:00
34aee933f9 ONNX Export Interpolate (Resize) for opset version 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21434

Reviewed By: zrphercule

Differential Revision: D15777197

Pulled By: houseroad

fbshipit-source-id: 517b06a54a234ffdb762401e83f5a732023ed259
2019-06-19 13:40:27 -07:00
44128e09f0 Speed up op lookup and registration (#21806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21806

Dispatcher::findSchema(op_name) now uses a lookup table instead of iterating through the list of operators to find it.

This speeds up op lookup (as in finding the operator handle from the name, not as in finding a kernel when you already have the operator handle)
and it also speeds up op registration, since registration needs to check whether an op with the same name already exists.
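The effect of the lookup table can be sketched in a few lines (the class below illustrates the data-structure change only; it is not the actual `Dispatcher` code):

```python
# Hedged sketch: replace a linear scan over registered operators with a
# name -> handle lookup table, making both lookup and the duplicate check
# during registration O(1) instead of O(n).
class Dispatcher:
    def __init__(self):
        self._operators = []   # registration order, as before
        self._by_name = {}     # the new lookup table

    def register(self, name, handle):
        if name in self._by_name:  # duplicate check no longer scans the list
            raise KeyError(f"operator {name} already registered")
        self._operators.append((name, handle))
        self._by_name[name] = handle

    def find_schema(self, name):
        return self._by_name.get(name)  # hash lookup, not iteration

d = Dispatcher()
d.register("aten::add", "add_handle")
found = d.find_schema("aten::add")
```

The list is kept alongside the table so anything relying on registration order still works; the table is purely an index over it.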

Differential Revision: D15834256

fbshipit-source-id: c3639d7b567e4ed5e3627c3ebfd01b7d08b55ac1
2019-06-19 12:05:14 -07:00
d1c80300ce Better stringification of dispatch keys in error messages (#21809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21809

Many error messages show dispatch keys, for example when the dispatcher didn't find a kernel to dispatch to.
Previously, this was a string like "CPU" or "CUDA" for known backends and just an arbitrary number for other backends.

Now, tensor type id registration also registers a name for the dispatch key and shows that in the error messages.

There is no API change, just the error messages are better now.

Differential Revision: D15835809

fbshipit-source-id: 4f0c9d0925c6708b02d79c653a2fae75b6623bb9
2019-06-19 11:44:24 -07:00
dd046bef8d NamedTuple serialization (#21839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21839
ghimport-source-id: b9d82018fbf26b22d58cad3a033cbfe4e879a8fe

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860002

Pulled By: jamesr66a

fbshipit-source-id: 0fc97c4adefa9ae4937f21179c7afa817f4099e5
2019-06-19 10:43:55 -07:00
5a37f8c63f Refactor TupleType to take a NamedTupleSpec (#21836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21836
ghimport-source-id: 91cab735765ff875046b42864188e86b8487b0ae

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860003

Pulled By: jamesr66a

fbshipit-source-id: 62a99a212ae6f9af83a90305e443f2dd05588292
2019-06-19 10:43:51 -07:00
c0be6e6290 Introduce SerializableType (#21835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21835
ghimport-source-id: e674048a56b9a573ba89e484f4b41818d3f08234

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860004

Pulled By: jamesr66a

fbshipit-source-id: 2d2905296939903ed4586932bea0a504b542bbdb
2019-06-19 10:43:47 -07:00
74104f383e Some small fixes for NamedTuple (#21813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21813
ghimport-source-id: a1edca8ad0384a9e493ef2f3b0aa5005a668a8f3

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15860005

Pulled By: jamesr66a

fbshipit-source-id: 4a43432d2dacebde1a676a93ac57f675db857154
2019-06-19 10:43:43 -07:00
6b972795e4 Add torch.__future__._overwrite_module_params_on_conversion global flag, and check it in nn.Module._apply() (#21613)
Summary:
https://github.com/pytorch/pytorch/pull/17072 breaks `model.to(xla_device)`, because moving `model` to XLA device involves changing its parameters' TensorImpl type, and the current implementation of `nn.Module.to()` doesn't support changing module parameters' TensorImpl type:
```python
# 6dc445e1a8/torch/nn/modules/module.py (L192-L208)
def _apply(self, fn):
    ...
    for param in self._parameters.values():
        if param is not None:
            # Tensors stored in modules are graph leaves, and we don't
            # want to create copy nodes, so we have to unpack the data.
            param.data = fn(param.data)  # NOTE: this doesn't allow changing `param.data`'s TensorImpl type
            if param._grad is not None:
                param._grad.data = fn(param._grad.data)  # NOTE: this doesn't allow changing `param._grad.data`'s TensorImpl type
   ...
```

yf225 TODO: fix the description here when we finish the implementation

To fix this problem, we introduce a new API `model.to_()` that always assigns new tensors to the parameters (thus supporting changing the parameters to any TensorImpl type), and also bumps the version counter of the original parameters correctly so that they are invalidated in any autograd graph they participate in.

We also add warning to the current `model.to()` API to inform users about the upcoming behavior change of `model.to()`: in future releases, it would create and return a new model instead of in-place updating the current model.

This unblocks adding XLA to our CI test suite, which also allows XLA to catch up with other changes in our codebase, notably the c10 dispatcher.

[xla ci]

cc. resistor ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21613

Differential Revision: D15895387

Pulled By: yf225

fbshipit-source-id: b79f230fb06019122a37fdf0711bf2130a016fe6
2019-06-19 10:30:02 -07:00
Jie
056a033cdc updating upsampling bilinear2d kernel: (#21879)
Summary:
1. faster atomicAdd trick for fp16 backward kernel
2. better launch configs for backward kernel
3. removed unnecessary buffer initialization for forward kernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21879

Differential Revision: D15898680

Pulled By: ezyang

fbshipit-source-id: 1fc81e6c078f1538d82e4f36921b630499eb504f
2019-06-19 07:42:21 -07:00
34536e207a Fix: convert Onnx DynamicSlice operator with 4 inputs to caffe2 fa… (#20846)
Summary:
I reported issue https://github.com/pytorch/pytorch/issues/20743
and made this pull request for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20846

Reviewed By: zrphercule

Differential Revision: D15569135

Pulled By: houseroad

fbshipit-source-id: 96a2c818ef666a7d79b96decfa347d7154b34d5c
2019-06-19 00:09:15 -07:00
4b1df5c1f5 Use fn(param) instead of fn(param.data) in nn.Module._apply (#21865)
Summary:
When we pass `fn` to `nn.Module._apply()` and `fn` is an in-place operation, the correct behavior should also include bumping the parameters' and their gradients' version counters. This PR fixes the old incorrect behavior and makes sure the new behavior is right.

Note that this PR is BC-breaking in the following way:

Previously, passing an in-place operation to `nn.Module._apply()` does not bump the module's parameters' and their gradients' version counters. After this PR, the module's parameters' and their gradients' version counters will be correctly bumped by the in-place operation, which will invalidate them in any autograd graph they previously participate in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21865

Differential Revision: D15881952

Pulled By: yf225

fbshipit-source-id: 62f9244a4283a110147e9f20145ff232a5579fbd
2019-06-18 20:45:40 -07:00
abd6cffe55 Added some extra tests for std_mean and var_mean for multiple dims. (#20650)
Summary:
Added some extra tests for std_mean and var_mean for multiple dims.
Some refactoring of previously created tests based on PR comments: https://github.com/pytorch/pytorch/pull/18731
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20650

Differential Revision: D15396101

Pulled By: ifedan

fbshipit-source-id: d15c3c2c7084a24d6cfea4018173552fcc9c03a9
2019-06-18 20:36:32 -07:00
fa5263af2c Add set_quantizer_ for QTensor (#21852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21852

To enable change of q_scale and q_zero_point in `copy_`

Differential Revision: D15793427

fbshipit-source-id: a7040b5b956d161fd6af6176287f4a4aa877c9be
2019-06-18 19:50:12 -07:00
e239e31da6 Fix lint error (#21932)
Summary:
https://github.com/pytorch/pytorch/issues/21916 has broken python lint on master
https://travis-ci.org/pytorch/pytorch/jobs/547354937
```
./tools/build_variables.py:167:39: E261 at least two spaces before inline comment
./tools/build_variables.py:379:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:379:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:380:17: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:380:19: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:381:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:381:20: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:382:23: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:382:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:387:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:387:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:388:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:388:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:389:16: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:389:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:390:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:390:27: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:391:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:391:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:395:22: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:395:24: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:402:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:402:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:403:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:403:15: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:404:16: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:404:18: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:405:25: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:405:27: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:406:13: E251 unexpected spaces around keyword / parameter equals
./tools/build_variables.py:406:15: E251 unexpected spaces around keyword / parameter equals
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21932

Differential Revision: D15892041

Pulled By: bddppq

fbshipit-source-id: f62949a7617f8ea9c036ea9b48ab1e340a7af83e
2019-06-18 19:08:24 -07:00
3bdde56907 Fix incorrect usage of __HIP_PLATFORM_HCC__ (#21757)
Summary:
This avoids using `__HIP_PLATFORM_HCC__` directly, in case it changes in the future.

Following up https://github.com/pytorch/pytorch/issues/21718
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21757

Reviewed By: xw285cornell

Differential Revision: D15891867

Pulled By: bddppq

fbshipit-source-id: 5de55687ab1c86eddf6b4d8d25fee48d96ec72ad
2019-06-18 18:56:32 -07:00
a388c78350 fix bug in CompilationUnit::define (#21886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21886
ghimport-source-id: fefbd758bbe2fbcaaad84a376ac5f69c40bccb80

Test Plan: Imported from OSS

Differential Revision: D15867647

Pulled By: suo

fbshipit-source-id: 3e0f5bbc98ec93ccf26442c4c574626e45e53888
2019-06-18 15:41:55 -07:00
52e1cea057 Fix recusive method compilation (#21862)
Summary:
The code in `python_sugared_value.cpp` to recursively compile methods
was not being tested, so this adds a test for it and fixes some errors
in it

It was necessary to disable any hooks set since (at least in our tests) they would try to export
a half-finished graph since they were being called on recursively
compiled methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21862

Differential Revision: D15860314

Pulled By: driazati

fbshipit-source-id: e8afe9d4c75c345b6e1471072d67c5e335b61337
2019-06-18 14:08:56 -07:00
eda08b0aae script::Module as a view. (#21814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21814
ghimport-source-id: 49cfea6101ad9ca438600c465762e23252e05ff3

Test Plan: Imported from OSS

Differential Revision: D15839583

Pulled By: zdevito

fbshipit-source-id: ab4ef31a523b3ac1477aa7e6d4d9513e7408c560
2019-06-18 13:58:49 -07:00
94c61d4f32 Fix infinite loop in del_post_hook (#21914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21914

https://github.com/pytorch/pytorch/pull/21591 added a needed feature to clean up grad accumulator post hooks when the DistributedDataParallel model object is cleaned up. There's a minor typo that causes it to loop infinitely over the first element.

Differential Revision: D15878884

fbshipit-source-id: b7fd0bbd51eb187579d639b1709c6f7b62b85e7a
2019-06-18 13:43:59 -07:00
c0f51142cd Added a test case for an index error for the index_copy_ (#21912)
Summary:
Follow up PR with a test for the [fixed](4b45f08f87) [bug](https://github.com/pytorch/pytorch/issues/20322).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21912

Differential Revision: D15878674

Pulled By: izdeby

fbshipit-source-id: c8fef2214606c796d174d0faaaf633531a7bea88
2019-06-18 13:43:56 -07:00
ad00c12379 Clean up //caffe2/torch-cpp to avoid duplicated symbols (#21916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21916

Hopefully fixes https://fb.workplace.com/groups/1405155842844877/permalink/2832659000094547/

Reviewed By: rutyrinott

Differential Revision: D15862128

fbshipit-source-id: 77c01a57bddc39b267e307b50942e029a381711b
2019-06-18 13:05:22 -07:00
081cd3a293 Change AT_CHECK to TORCH_CHECK in python_arg_parser.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21887

Differential Revision: D15869483

Pulled By: jerryzh168

fbshipit-source-id: f3d9d73078e7c1c08ec79694105e18084e7f9caf
2019-06-18 10:48:38 -07:00
28ecc104f4 Fix WeakIValueEq (#21891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21891
ghimport-source-id: a037850c96fe803540412db9a88548fa41f2d4f0

Test Plan: Imported from OSS

Differential Revision: D15871588

Pulled By: jamesr66a

fbshipit-source-id: ecfdece1285c0737d0b1dc2afe959c43d9413001
2019-06-18 10:35:35 -07:00
010f238d17 Retry pip install to make pytorch rocm CI more stable (#21895)
Summary:
pip install randomly core dumps, examples:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/25720//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/25723//console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21895

Differential Revision: D15873197

Pulled By: bddppq

fbshipit-source-id: 2c967bc0a47bef9d3f7af83e99514c93b54e353f
2019-06-18 10:10:56 -07:00
5eb25c3704 Support in membership checks (#21527)
Summary:
This PR adds support for `in` checks like `key in my_dict`

For now it leaves lists as a follow-up, due to the changes around `IValue` lists and the need for an `IValue` equality op.

For objects it uses the magic method `__contains__(self, key)`
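As a plain-Python illustration (not TorchScript itself), the `in` operator on an object dispatches to exactly this magic method:

```python
class Vocab:
    """Toy class: `word in vocab` dispatches to __contains__."""
    def __init__(self, words):
        self.index = {w: i for i, w in enumerate(words)}

    def __contains__(self, word):
        return word in self.index

v = Vocab(["hello", "world"])
assert "hello" in v
assert "pytorch" not in v
```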
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21527

Pulled By: driazati

Differential Revision: D15811203

fbshipit-source-id: 95745060394f8a9450efaaf8ab09d9af83bea01e
2019-06-18 09:49:12 -07:00
afad3e4954 Add support for class annotations (#21379)
Summary:
This adds support for inferred attributes (everything except empty lists, dicts, and tuples) as well as using the PEP 526 style annotations on a class, so this eliminates the need for `torch.jit.Attribute`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21379

Differential Revision: D15718537

Pulled By: driazati

fbshipit-source-id: b7481ae3d7ee421613e931b7dc3427ef2a99757f
2019-06-18 09:49:09 -07:00
85528feb40 Mark test_snli as a slow test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21908

Pulled By: driazati

Differential Revision: D15875846

fbshipit-source-id: 98b79e7beee5ffd72e1f41d22e07e618547b23e9
2019-06-18 09:44:12 -07:00
0998a32588 Backward function will set a flag if it released variables (#21533)
Summary:
This is a fix for https://github.com/pytorch/pytorch/issues/21469
Currently there is no way to tell whether a backward function has released its saved variables once they have been moved into a vector. This change sets a flag when a function that has saved variables releases them, so that calling the function again with already-released variables raises an error instead of misbehaving.
Functions that do not have saved variables can still be called multiple times, for backward compatibility.
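A minimal plain-Python sketch of the behavior described above (the names are hypothetical, not the actual C++ implementation):

```python
class BackwardFn:
    """Hypothetical sketch: set a flag on release, raise on reuse."""
    def __init__(self, saved):
        self.saved = list(saved)
        self.has_saved = bool(saved)
        self.released = False

    def apply(self):
        if self.has_saved and self.released:
            raise RuntimeError("saved variables already released; "
                               "specify retain_graph=True to backward twice")
        out = sum(self.saved)
        if self.has_saved:
            self.saved.clear()    # release the saved variables
            self.released = True
        return out

fn = BackwardFn([1.0, 2.0])
fn.apply()   # first call succeeds and releases the variables
# a second fn.apply() now raises RuntimeError instead of reading freed data;
# a BackwardFn([]) with no saved variables can be applied repeatedly
```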
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21533

Differential Revision: D15810481

Pulled By: ifedan

fbshipit-source-id: 5663e0c14f1b65727abc0d078aef348078d6a543
2019-06-18 09:21:17 -07:00
f363a33e10 Set __file__ for torch.ops (#21888)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19351 https://github.com/pytorch/lockdown/issues/93
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21888

Differential Revision: D15871142

Pulled By: ailzhang

fbshipit-source-id: 339e9d493e2e13f09e118814bdd1d7a5942804b8
2019-06-18 08:46:23 -07:00
31e1e63bc2 Port avg_pool3d() to ATen (#21732)
Summary:
This will need a conflict resolution once avg_pool2d() has been merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21732

Differential Revision: D15824923

Pulled By: ezyang

fbshipit-source-id: 83341e0209b660aecf788272079d8135d78b6ff1
2019-06-18 08:33:30 -07:00
Jie
c471a63a39 UpSample-nearest cuda kernel update (#21694)
Summary:
updating upsampling kernel:
1. avoids atomicAdd for better fp16 performance.
2. better launch configuration for 2D input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21694

Differential Revision: D15875791

Pulled By: ezyang

fbshipit-source-id: 426fc5d5f0c0cdf58bfa1a2b564f17a9ea286fa4
2019-06-18 08:24:25 -07:00
998efb48c3 Add at::dimname_to_position helper. (#21789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21789
ghimport-source-id: 42c0a58280f3645dd38ea11d39311a0c53f90488

Test Plan:
- `build/bin/NamedTensor_test` [namedtensor ci]

Imported from OSS

Differential Revision: D15833455

Pulled By: zou3519

fbshipit-source-id: 8dd51a7b785972668984a7c161b94b92039a1cb1
2019-06-18 07:44:04 -07:00
8f9e0f77dd Turn off non-default stream testing. (#21793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21793
ghimport-source-id: 5264fa90ca77fbc79898cfa2f0ee02f47dec27d4

Test Plan: Imported from OSS

Differential Revision: D15874814

Pulled By: ezyang

fbshipit-source-id: 5c51ab9ae431faf2db549b88b07ba00783acab25
2019-06-18 07:00:08 -07:00
08a0ac84d7 Removed unused variable from closure in range (#21897)
Summary:
This was some code I added :^)

Time for me to remove it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21897

Differential Revision: D15873213

Pulled By: Chillee

fbshipit-source-id: 769c3bd71c542be4afddc02dc2f65aa5c751b10d
2019-06-18 02:21:50 -07:00
6042012a93 Fixed "tried to access to nonexistent attribute" -> "tried to access nonexistent attribute"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21863

Differential Revision: D15873204

Pulled By: Chillee

fbshipit-source-id: c5d85487b287ee9dd8318161ef9399ffd1ee0b68
2019-06-18 02:13:09 -07:00
df787cf079 Fixed a warning in test_jit.py (#21898)
Summary:
What's the point of having warnings if we never fix them :^)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21898

Differential Revision: D15873280

Pulled By: Chillee

fbshipit-source-id: a8274bab2badd840d36a9d2e1354677a6114ae1d
2019-06-18 01:15:15 -07:00
f1c1d1a964 Export the cosine_similarity op as an ATenOp correctly (#21884)
Summary:
cosine_similarity has two non-tensor parameters, so it needs some special handling. This diff adds support for exporting it.
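For reference, the op computes the following, where `dim` and `eps` are the two non-tensor parameters needing the special handling (1-D sketch in plain Python, not the exporter code):

```python
import math

def cosine_similarity(x, y, eps=1e-8):
    """Reference formula for 1-D inputs; eps guards against division by zero."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / max(norm_x * norm_y, eps)

assert abs(cosine_similarity([1.0, 0.0], [1.0, 0.0]) - 1.0) < 1e-12
assert abs(cosine_similarity([1.0, 0.0], [0.0, 1.0])) < 1e-12
```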
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21884

Reviewed By: zrphercule

Differential Revision: D15866807

Pulled By: houseroad

fbshipit-source-id: a165fbc00c65c44b276df89ae705ca8960349d48
2019-06-17 23:34:59 -07:00
3ed8acdf59 Fixes lint error in py3 (#21883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21883
ghimport-source-id: c4330d71033929178ef10f2a0fcd8b0b2b468cb5

Test Plan: Imported from OSS

Differential Revision: D15866746

Pulled By: bwasti

fbshipit-source-id: c3d23f3396a95d5b1d689a07662e82e48cb3ab7a
2019-06-17 22:20:06 -07:00
2ba164b943 Future interface for ATen/Parallel (#21764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21764
ghimport-source-id: fca083c09d814a0411020871f49429509fc0e8b5

Test Plan:
Imported from OSS
see https://github.com/pytorch/pytorch/pull/21764

Differential Revision: D15816658

Pulled By: ilia-cher

fbshipit-source-id: 0e25ca6ff66a837d4f69f37a47e59927ab10e216
2019-06-17 22:05:59 -07:00
d8314a6260 Replace nullary/unary/binary loops with generic implementation (#21475)
Summary:
```
This replaces the kernel helpers in Loops.h/cuh with the following:

  cpu_kernel
  cpu_kernel_vec

  gpu_kernel
  gpu_kernel_with_scalars

These work with functions with any number of input arguments, with the
exception of 'gpu_kernel_with_scalars', which is limited to binary
operations. Previously, we only supported functions of 0, 1, or 2 input
arguments. Adding support for 3 or 4 input argument functions required a
significant amount of additional code.

This makes a few other changes:

Remove 'ntensors' from the for_each/serial_for_each loop. Most loops
assume a fixed number of tensors, and the value is accessible from
TensorIterator::ntensors()

Only lift CPU scalars to parameters in 'gpu_kernel_with_scalars'.
Previously, we performed this recursively in gpu_unary_kernel and
gpu_binary_kernel, so something like `torch.add(3, 4, out=cuda_tensor)`
would specialize to a "nullary" kernel. Now, only the first
scalar input is lifted to a kernel parameter. Any additional scalar
inputs are copied to CUDA tensors. Note that operations like `x + 5`
and `5 + x` still work efficiently. This avoids generating an exponential
number of specializations in the number of input arguments.
```
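A toy Python analogue of the generic n-ary loop (the real helpers are C++ templates driven by TensorIterator; this only illustrates the "any number of inputs" idea):

```python
def elementwise_kernel(op, *inputs):
    """Apply an op with any number of input arguments, elementwise."""
    return [op(*vals) for vals in zip(*inputs)]

# the same helper serves unary, binary, and ternary ops alike
assert elementwise_kernel(lambda a: -a, [1, 2]) == [-1, -2]
assert elementwise_kernel(lambda a, b, c: a + b * c,
                          [1, 2], [3, 4], [5, 6]) == [16, 26]
```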

**Performance measurements**
Timing numbers are unchanged for basic elementwise operations. Linked below is a script to measure torch.add perf on PR vs. master CPU+GPU (GCC 7.3):
[miniperf.py](https://gist.github.com/colesbury/4a61893a22809cb0931f08cd37127be4)

**Generated assembly**
cpu_kernel and cpu_kernel_vec still generate good vectorized code with
both GCC 7.3 and GCC 4.8.5. Below is the assembly for the "hot" inner loop of
torch.add as well as an auto-vectorized torch.mul implementation using cpu_kernel/
binary_kernel. (The real torch.mul uses cpu_kernel_vec but I wanted to check that
auto vectorization still works well):

[torch.add GCC 7.3](https://gist.github.com/colesbury/927ddbc71dc46899602589e85aef1331)
[torch.add GCC 4.8](https://gist.github.com/colesbury/f00e0aafd3d1c54e874e9718253dae16)
[torch.mul auto vectorized GCC 7.3](https://gist.github.com/colesbury/3077bfc65db9b4be4532c447bc0f8628)
[torch.mul auto vectorized GCC 4.8](https://gist.github.com/colesbury/1b38e158b3f0aaf8aad3a76963fcde86)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21475

Differential Revision: D15745116

Pulled By: colesbury

fbshipit-source-id: 914277d7930dc16e94f15bf87484a4ef82890f91
2019-06-17 19:08:33 -07:00
7f057f00cc Update mkldnn-bridge to fix MKLDNN grouped conv issue (#21854)
Summary:
1. Fix grouped conv issue in https://github.com/pytorch/pytorch/issues/21597
2. Fix build error in 731670f40a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21854

Test Plan: buck run experimental/jbai/pt_issue_21597_mkldnn_conv_2d_repro:run

Reviewed By: yinghai

Differential Revision: D15861105

Pulled By: bddppq

fbshipit-source-id: fe3e2943a15aab4294f8e6bb15db15829a94420f
2019-06-17 18:21:26 -07:00
5237835a17 Make script::Method a value type (#21675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21675
ghimport-source-id: 90ee7ba00e58b0151ca4c17e91fd17303c9d5d08

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D15777725

Pulled By: zdevito

fbshipit-source-id: 8482cd2e1dcd7dd77a9cacbb76743bd190c7c4cf
2019-06-17 18:14:50 -07:00
cc4498a54a Always enable P2P access for GPU copies (#21872)
Summary:
PR https://github.com/pytorch/pytorch/issues/20685 incorrectly only enabled P2P access for non-contiguous copies.
This can make cudaMemcpy slow for inter-gpu copies, especially on ROCm
devices.  I didn't notice a difference on CUDA 10, but ngimel says it's
important for CUDA too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21872

Differential Revision: D15863965

Pulled By: colesbury

fbshipit-source-id: 0a858f3c338fa2a5d05949d7f65fc05a70a9dfe1
2019-06-17 17:48:28 -07:00
76a250d590 Add new regression loss function type to FBLearner (#21080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21080

Add Huber loss as a new option for regression training (refer to TensorFlow implementation: https://fburl.com/9va71wwo)
  # Huber loss (per example; averaged over the batch in practice)
  def huber(true, pred, delta):
      error = abs(true - pred)
      return 0.5 * min(error, delta) ** 2 + delta * max(error - delta, 0)

As a combination of MSE loss (`x < delta`) and MAE loss (`x >= delta`), the advantage of the Huber loss is that it reduces the training dependence on outliers.

One thing worth noting is that the Huber loss is not twice differentiable at `x = delta`. To address this, one could consider the `log(cosh(x))` loss instead.

Reviewed By: chintak

Differential Revision: D15524377

fbshipit-source-id: 73acbe2728ce160c075f9acc65a1c21e3eb64e84
2019-06-17 17:43:00 -07:00
8aeb4ef4bf Add python string standard lib (#21807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21807
ghimport-source-id: dcb2c78b8facb90a323ab9212b7703e553354273

Test Plan: Imported from OSS

Differential Revision: D15835509

Pulled By: bwasti

fbshipit-source-id: bc8bc5ae5a4fb4a1581aa94485973ed87af4eaaf
2019-06-17 15:48:36 -07:00
d329dffd92 improve error message on recursive class defs (#21842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21842
ghimport-source-id: 33569714b18fc476c4e6b3bc976b53b1f107273d

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15857568

Pulled By: suo

fbshipit-source-id: 6307597b9741cfdccd5c55216ebdc7c4391a5e23
2019-06-17 15:23:21 -07:00
cdae8b93a7 improve recursive scripting error message (#21841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21841
ghimport-source-id: fbca813d12ca4bfad7967e12c8dafe5eaba77cab

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D15857569

Pulled By: suo

fbshipit-source-id: 152eba10565cf7119508079e98512f116eb3a5a8
2019-06-17 15:23:17 -07:00
c0420d9618 Attempt to fix TRT build after library merge (#21775)
Summary:
After fixing https://github.com/pytorch/pytorch/issues/20774 the TRT build was broken

Because of missing annotations, pybind_state_gpu.so was missing symbols while pybind_state.so was not. This caused a weird failure mode: importing pybind_state_gpu first left the system in a semi-initialized state and led to a segfault.

Minimal repro:
```
>>> import ctypes

>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so: undefined symbol: _ZN6caffe219TensorRTTransformer9TransformEPNS_9WorkspaceEPNS_6NetDefERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_11TensorShapeESt4hashISB_ESt8equal_toISB_ESaISt4pairIKSB_SC_EEE

>>> ctypes.CDLL('/var/lib/jenkins/.local/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state.so')
Segmentation fault (core dumped)
```

Too lazy to repro locally, let's see if CI passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21775

Differential Revision: D15829605

Pulled By: dzhulgakov

fbshipit-source-id: 1adb2bde56b0cd68f84cfca67bc050adcf787cd9
2019-06-17 14:16:45 -07:00
0408697317 Followup cleanup in cmake.py and add a comment in setup.py (#21792)
Summary:
Following up b811b6d5c03596d789a33d7891b606842e01f7d2

* Use property instead of __setattr__ in CMake.
* Add a comment clarifying when build_ext.run is called.

 ---

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21792

Differential Revision: D15860606

Pulled By: umanwizard

fbshipit-source-id: ba1fa07f58d4eac81ac27fa9dc7115d1cdd3dec0
2019-06-17 13:46:25 -07:00
7279e07c8b Don't use anonymous namespace in header. (#21790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21790
ghimport-source-id: ff648a1e9a1b8627f0742307e2e7810d6445d597

Test Plan: Imported from OSS

Differential Revision: D15827311

Pulled By: ezyang

fbshipit-source-id: 996bfd3a93fcda5934dcc523adae0648cba1c4fa
2019-06-17 13:26:02 -07:00
1aa16d356e named inference rule for tensor.select (#21752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21752
ghimport-source-id: 95e17087b8c29c9bd88003ae225cb7329d0b67e6

Test Plan:
- `python test/test_namedtensor.py` [namedtensor ci]

gh-metadata: pytorch pytorch 21752 gh/zou3519/50/head

Imported from OSS

Differential Revision: D15833453

Pulled By: zou3519

fbshipit-source-id: 7b51e4137e54712aa9c6274a9e6bb48ab7191b8d
2019-06-17 13:12:49 -07:00
b403b10ff9 Fix #11752: fix numerical issue in log_softmax (#21672)
Summary:
https://github.com/pytorch/pytorch/issues/11866 has corrected this issue in function `host_softmax` (aten/src/ATen/native/SoftMax.cpp). But I tried the example proposed in https://github.com/pytorch/pytorch/issues/11752. `log_softmax` is still not working for big logits.

I looked into the source code and found that the example calls `vec_host_softmax_lastdim`, not `host_softmax`.

This code fixes the issue in `_vec_log_softmax_lastdim` and has a test for `log_softmax`.
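The underlying numerical issue is the classic log-sum-exp overflow; a stable reference implementation (plain Python, for illustration only) subtracts the max before exponentiating:

```python
import math

def stable_log_softmax(xs):
    # subtract the max so exp() never overflows, then add it back inside the log
    m = max(xs)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum_exp for x in xs]

out = stable_log_softmax([1000.0, 1000.0])
# each entry is log(1/2); a naive exp(1000.0) would overflow to inf
```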
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21672

Differential Revision: D15856327

Pulled By: VitalyFedyunin

fbshipit-source-id: 7a1fd3c0a03d366c99eb873e235361e4fcfa7567
2019-06-17 12:59:08 -07:00
0f675f9cbc Port im2col and vol2col (#21769)
Summary:
resolves partially https://github.com/pytorch/pytorch/issues/18353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21769

Differential Revision: D15854530

Pulled By: ezyang

fbshipit-source-id: 574853c068010d1b7588047d2ab7450077471447
2019-06-17 10:06:26 -07:00
2b23fac8da Disallow creation of tensors with duplicate names (#21781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21781
ghimport-source-id: d77e0c97fe0104b4b29571fd5828967399d34fb1

Test Plan:
- `python test/test_namedtensor.py -v` [namedtensor ci]

gh-metadata: pytorch pytorch 21781 gh/zou3519/51/head

Imported from OSS

Differential Revision: D15833454

Pulled By: zou3519

fbshipit-source-id: fca4de83fba4bced615ec3cbd4ce4c441ddfcaf2
2019-06-17 09:59:50 -07:00
44707dd3ca Rename Dimname::name to Dimname::full_name (#21803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21803
ghimport-source-id: e0bc5a746e745e18f19215c6551d79cb0cd5f9c5

Test Plan:
- [namedtensor ci]

Imported from OSS

Differential Revision: D15833452

Pulled By: zou3519

fbshipit-source-id: 7aa4d78ff436bd6a622a5ea235b75135d9798d33
2019-06-17 08:32:32 -07:00
7c1528bab6 Copy NamedTensorMeta in TensorImpl::copy_tensor_data() (#21735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21735
ghimport-source-id: 4a4289693e372880e3d36e579c83d9e8745e70ed

Test Plan:
- I'm not sure how to test this other than making sure it compiles.
- [namedtensor ci]

gh-metadata: pytorch pytorch 21735 gh/zou3519/49/head

Imported from OSS

Differential Revision: D15833456

Pulled By: zou3519

fbshipit-source-id: ea2fa6d5c5f1eb2d7970d47189d6e4fcd947146d
2019-06-17 08:32:28 -07:00
da4e60226c Keep Reducer hooks in a vector instead of an unordered_map (#21783)
Summary:
kuttas pointed out that the DDP Reducer only needs to remember `(uintptr, Function)` pairs, and hence does not need the unordered map added by https://github.com/pytorch/pytorch/issues/21591. Using a vector should speed it up a bit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21783

Differential Revision: D15854312

Pulled By: mrshenli

fbshipit-source-id: 153ba035b8d658c7878a613f16a42de977d89c43
2019-06-17 08:24:19 -07:00
76713fb564 Fix remote build + clean up disable feature hack (#21816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21816

Clean up disable feature hack.

Reviewed By: bddppq

Differential Revision: D15833285

fbshipit-source-id: a2ae5d0f15e47b835dbd3997bbaa0add7e868f20
2019-06-17 08:08:34 -07:00
4a6aa1d806 Populate producer_info.json in any PyTorch model at FB (#21662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21662

Use hook added in https://github.com/pytorch/pytorch/pull/20863 to auto-write a file with environment information (including user, machine, Flow, etc).

Reviewed By: natalialunova

Differential Revision: D15690185

fbshipit-source-id: ccaaeda9562db32925041d18f394fb98fab8db99
2019-06-16 20:12:23 -07:00
c9ba3f699d Bag of documentation fixes (#21846)
Summary:
Thanks henon for raising the issues.

Fixes https://github.com/pytorch/pytorch/issues/21830
Fixes https://github.com/pytorch/pytorch/issues/21831
Fixes https://github.com/pytorch/pytorch/issues/21832
Fixes https://github.com/pytorch/pytorch/issues/21827
Fixes https://github.com/pytorch/pytorch/issues/21822
Fixes https://github.com/pytorch/pytorch/issues/21820
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21846

Differential Revision: D15847389

Pulled By: soumith

fbshipit-source-id: 421cc48af646a2618af731697de7d4de83d3eabe
2019-06-16 19:35:27 -07:00
972ec676b2 Remove lowered execution (#21674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21674
ghimport-source-id: b8e27f0ce9b8b362daf73556ee67457fb5355062

Reviewed By: eellison

Differential Revision: D15777726

Pulled By: zdevito

fbshipit-source-id: 718ac676c9a1bcf99b856862fd29631d825645da
2019-06-16 14:29:18 -07:00
ff1172d705 high pri Jit builtins (#21451)
Summary:
bin/hex/oct/round/chr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21451

Differential Revision: D15702863

Pulled By: ailzhang

fbshipit-source-id: 9f69896b79e7584f12353e9f2ee2969dbe1ec6d6
2019-06-16 09:48:38 -07:00
4f75da3b41 change ClassType::compilation_unit to return owning ptr (#21787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21787
ghimport-source-id: eed7b98b0f02745066164b8ef3906291931e2ecb

Test Plan: Imported from OSS

Differential Revision: D15831353

Pulled By: suo

fbshipit-source-id: 50695c35dba8ffea710cbc9aca8aba6a75512fa0
2019-06-16 02:37:07 -07:00
263b1985a8 Revert D15833924: [jit] Fix stdout capturing, remove some expect files
Differential Revision:
D15833924

Original commit changeset: 152972b4c240

fbshipit-source-id: 1d5a2258bc134fdc7bd2cb557bcc05f2289443b6
2019-06-15 20:39:11 -07:00
04f09d4235 Move unwrap logic from c10 to caffe2 (#21620)
Summary:
After https://github.com/pytorch/pytorch/pull/17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path.

Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21620

Differential Revision: D15763560

Pulled By: yf225

fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
2019-06-14 22:02:43 -07:00
794ee6d00c Switch to out-source builds for LibTorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21772

Differential Revision: D15839332

Pulled By: yf225

fbshipit-source-id: 017cf61c5682c6a8ffeaf2ca952e1418c27be30e
2019-06-14 21:00:18 -07:00
4a2fc00db0 Revert D15830704: [jit] Add Python string standard lib
Differential Revision:
D15830704

Original commit changeset: e55a8c6bf910

fbshipit-source-id: 1ec953bfaabab0288e953f48cde0a32370ac3fc6
2019-06-14 20:52:58 -07:00
97b92eede1 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 979417253fab9142059bdfb6e834f44bb1cc8d0d
2019-06-14 17:41:30 -07:00
220efdbdc4 Refactor pybind_utils.h (#21550)
Summary:
This refactors pybind_utils so we can have all our type-inferring stuff in
1 place (e.g. for #21379)

There is some follow up work to make the error messages better, but I think that's fine to save for another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21550

Pulled By: driazati

Differential Revision: D15727002

fbshipit-source-id: a6974f2e1e5879f0503a18efc138da31cda7afa2
2019-06-14 17:27:45 -07:00
a85305fdea Hook up profiled execution in the interpreter (#21799)
Summary:
Rebasing https://github.com/pytorch/pytorch/pull/21616 onto master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21799

Differential Revision: D15832854

Pulled By: Krovatkin

fbshipit-source-id: 88d754446df2abc25ea86e46764848d48ee3a5fc
2019-06-14 16:56:13 -07:00
4bcc72fe95 Support for NamedTuple (#21428)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/18

This implements NamedTuple by taking advantage of the existing `names` field in `TupleType`.

TODO: This currently doesn't retain the NamedTuple-ness through serialization. Discussed with suo offline, we can probably make a way to define an anonymous NamedTuple in script (e.g. `NamedTuple('Foo', [('a', int), ('b', float), ('c', List[float])])` and serialize that
TODO: implement support for calling the constructor with kwargs
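In plain Python, the kind of class this enables in script corresponds to:

```python
from typing import List, NamedTuple

class Foo(NamedTuple):
    a: int
    b: float
    c: List[float]

f = Foo(1, 2.0, [3.0, 4.0])
assert f.a == 1        # named field access
assert f[1] == 2.0     # still behaves like a plain tuple
```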
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21428

Differential Revision: D15741564

Pulled By: jamesr66a

fbshipit-source-id: c077cbcea1880675ca6deb340a9ec78f824a136c
2019-06-14 16:45:56 -07:00
ac8d1a1f76 fix some issues found by enabling -Wshorten-64-to-32 (#18187)
Summary:
When enabling this flag there were a lot of warnings. This PR focuses on the warnings where the narrowing could affect array indices, since those are the ones most prone to failure.

The good news is that I didn't find anything obviously concerning.

One degenerate case could be very skinny matrices (dim1 = 1, with dim2 needing to hold a big number), which could run into issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18187

Differential Revision: D14527182

Pulled By: hyuen

fbshipit-source-id: b9f46b6f68ab912c55368961758a7a5af1805555
2019-06-14 16:29:32 -07:00
94f903654c Add qscheme() method (#20608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608

Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc.

Reviewed By: zafartahirov

Differential Revision: D15364354

fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7
2019-06-14 16:29:29 -07:00
d0021b3ac7 Fix stdout capturing, remove some expect files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21805

Pulled By: driazati

Differential Revision: D15833924

fbshipit-source-id: 152972b4c24041b8a459d5b8ef8789543a6b8153
2019-06-14 16:05:06 -07:00
07fea3f5b6 Add new get_batch() method to ChunkDataset API (#21797)
Summary:
We plan on generating Python bindings for the C++ ChunkDataset API using the current PyTorch DataLoader class, which must call get_batch() instead of get_batch(size).

This change doesn't break the current API; it just adds one more method that will make future extensions easier (WIP).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21797

Differential Revision: D15830522

Pulled By: soumith

fbshipit-source-id: 7208f305b48bf65d2783eaff43ff57a05e62c255
2019-06-14 13:39:54 -07:00
dddc65db9e Add Python string standard lib (#21059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21059
ghimport-source-id: f813585cde1b275c134b19009a2f5c0b3d70fc6e

Reviewed By: jamesr66a

Differential Revision: D15830704

Pulled By: bwasti

fbshipit-source-id: e55a8c6bf910a163b9a5260235e315af9532b129
2019-06-14 13:34:42 -07:00
65a3dbdfb0 Remove hip device sync in miopen Conv implementation (#21791)
Summary:
xw285cornell bddppq
Note there are other optimizations coming.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21791

Differential Revision: D15829238

Pulled By: bddppq

fbshipit-source-id: 66c62c646f315d65b4e432ca20890faded843db4
2019-06-14 12:32:50 -07:00
1fc240e59a add tests for add_custom_scalars and others (#20987)
Summary:
Originally, the tests for the tensorboard writer were smoke tests only. This PR lets CI compare the output with expected results at a low level. The randomness of the tensors in the test has also been removed.
ps. I found that how protobuf serializes data differs between Python environments. One way to solve this is to write the data and then read it back immediately (comparing the data at a higher level).

For `add_custom_scalars`, the data to be written is a dictionary, and the serialized result might differ (it is not an `OrderedDict`), so we only smoke-test that one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20987

Reviewed By: NarineK, lanpa

Differential Revision: D15804871

Pulled By: orionr

fbshipit-source-id: 69324c11ff823b19960d50def73adff36eb4a2ac
2019-06-14 12:27:07 -07:00
0d6eb209e6 Expose torch.empty(sizes, *, names, ...) to Python (#21648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21648
ghimport-source-id: 583f155c8ee95967d2f8b9d8df27d94b9e725694

Differential Revision: D15804482

Pulled By: zou3519

fbshipit-source-id: f86520dda479100be2a752e4db8a902167413a83
2019-06-14 11:52:47 -07:00
710821875a Fix flaky nuclear_norm() test (#21638)
Summary:
Try to fix a sporadic failure on some CIs.

I've run this test hundreds of times on my machine (GeForce 1060, MAGMA) but I cannot reproduce this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21638

Differential Revision: D15827779

Pulled By: ezyang

fbshipit-source-id: 3586075e48907b3b84a101c560a34cc733514a02
2019-06-14 11:40:03 -07:00
zaf
ff8c3fd54e Adding the quantized namespace to torch.nn and importing it from torch (#21600)
Summary:
Stack:
**https://github.com/pytorch/pytorch/issues/21600 Adding the quantized namespace to torch** [💛](https://our.intern.facebook.com/intern/diff/D15742149/)

Add nn.quantized name space to torch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21600

Differential Revision: D15742149

Pulled By: zafartahirov

fbshipit-source-id: 60dede12c81861f369d208b06f5b68e9384312f6
2019-06-14 11:05:45 -07:00
9a1dc43f34 Deprecate unordered_map and vector in IValues (#21712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21712

Warn when people use unordered_map or vector with IValues. These APIs are deprecated.
The unordered_map API is slow because it requires copying the whole map.
The vector API is slow for some types (e.g. std::string) because for them it also requires copying the whole vector.
Also, the vector API would get slow for all types if we decide to switch to SmallVector.

Differential Revision: D15792428

fbshipit-source-id: 1b72406b3a8d56521c862858c9f0ed01e56f2757
2019-06-14 11:05:41 -07:00
029a968212 Define __setstate__ on _ConvNd to handle pre-padding_mode pickles. (#21687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21687
ghimport-source-id: df49530d25239ac4d62eae83c5d7b0d8f00f836a

Differential Revision: D15807402

Pulled By: ezyang

fbshipit-source-id: f51b221444afc4e017db7544642a9c0a7d2a3efb
2019-06-14 11:00:21 -07:00
7284f448ba Fix handling of kwargs from common method invocations (#21499)
Summary:
When kwargs are specified in a test defined via common_method_invocations, it doesn't work if there isn't also a positional argument (`{'foo':'foo'}` without a positional arg generates a python call like: `self.method(, foo=foo)`, erroring on the `,`). I wanted to test something in a different PR and noticed I couldn't.
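A hypothetical sketch of the fix (names invented for illustration): build the call string only from the argument parts that are non-empty, so an empty positional list no longer leaves a dangling comma:

```python
def build_call(method, args, kwargs):
    """Join positional and keyword argument parts, skipping empty ones."""
    parts = []
    if args:
        parts.append(", ".join(repr(a) for a in args))
    parts.extend("{}={!r}".format(k, v) for k, v in sorted(kwargs.items()))
    return "self.{}({})".format(method, ", ".join(parts))

# kwargs only: no stray leading comma like `self.method(, foo='foo')`
assert build_call("method", [], {"foo": "foo"}) == "self.method(foo='foo')"
assert build_call("method", [1], {"foo": "foo"}) == "self.method(1, foo='foo')"
```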

Also fixed some flake8 warnings I was seeing locally.

I replaced `lambda x: x` with `ident` since it seems a bit cleaner to me, but happy to revert that if others don't agree?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21499

Differential Revision: D15826974

Pulled By: nairbv

fbshipit-source-id: a3f37c80ba2303c7d9ae06241df06c7475b64e36
2019-06-14 10:47:33 -07:00
c1744a6c39 Add ONNX py3 CI cases (#21715)
Summary:
So far, we only have py2 CI for ONNX. I think py3 support is important. And we plan to add onnxruntime backend tests, which only support py3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21715

Reviewed By: bddppq

Differential Revision: D15796885

Pulled By: houseroad

fbshipit-source-id: 8554dbb75d13c57b67ca054446a13a016983326c
2019-06-14 10:20:14 -07:00
c06ccbe663 Add aten mkldnn zero_ operator (#20573)
Summary:
### mkldnn backward ops list:
 - [ ] \(https://github.com/pytorch/pytorch/pull/20567) Add aten mkldnn conv2d backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20570) Add aten mkldnn backward ops: relu, linear and reshape 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20571) Add aten mkldnn backward ops: max_pool2d, avg_pool2d and adaptive_avg_poo2d 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20572) Add aten mkldnn batchnorm backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20573) Add aten mkldnn zero_ operator💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20575) Add mkldnn mul operator 💚
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20573

Differential Revision: D15820477

Pulled By: bddppq

fbshipit-source-id: 35d95f5b4e013c8db1911f52148550a2e40a2e68
2019-06-14 09:48:49 -07:00
bc6281028c rebuild_storage_fd retry on EINTR (#21723)
Summary:
Some data loader tests are flaky on py 2 with the following error
```
Jun 12 22:17:31 Traceback (most recent call last):
Jun 12 22:17:31   File "test_dataloader.py", line 798, in test_iterable_dataset
Jun 12 22:17:31     fetched = sorted([d.item() for d in dataloader_iter])
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 697, in __next__
Jun 12 22:17:31     idx, data = self._get_data()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 664, in _get_data
Jun 12 22:17:31     success, data = self._try_get_data()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 617, in _try_get_data
Jun 12 22:17:31     data = self.data_queue.get(timeout=timeout)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Jun 12 22:17:31     res = self._recv()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Jun 12 22:17:31     return pickle.loads(buf)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Jun 12 22:17:31     return Unpickler(file).load()
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Jun 12 22:17:31     dispatch[key](self)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Jun 12 22:17:31     value = func(*args)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Jun 12 22:17:31     fd = multiprocessing.reduction.rebuild_handle(df)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
Jun 12 22:17:31     new_handle = recv_handle(conn)
Jun 12 22:17:31   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
Jun 12 22:17:31     return _multiprocessing.recvfd(conn.fileno())
Jun 12 22:17:31 OSError: [Errno 4] Interrupted system call
```

Apparently, Python 2.7's `recvfd` calls `recvmsg` without EINTR retry: https://github.com/python/cpython/blob/2.7/Modules/_multiprocessing/multiprocessing.c#L174
So we should call it with an outer try-catch loop.
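A generic EINTR retry wrapper (a sketch of the idea, not the literal patch to `rebuild_storage_fd`) looks like:

```python
import errno

def retry_on_eintr(fn, *args, **kwargs):
    # Retry a system call that may be interrupted by a signal (EINTR).
    # Python 2's recvfd does not retry internally, so the caller must.
    while True:
        try:
            return fn(*args, **kwargs)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
            # interrupted by a signal; loop and retry the call
```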
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21723

Differential Revision: D15806247

Pulled By: ezyang

fbshipit-source-id: 16cb661cc0fb418fd37353a1fef7ceeb634f02b7
2019-06-14 09:10:00 -07:00
deb2140c6e Throwing errors for min and max reductions in empty CUDA tensors (#19612)
Summary:
Related to https://github.com/pytorch/pytorch/issues/17750.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19612

Differential Revision: D15813649

Pulled By: gchanan

fbshipit-source-id: aa3dc34dd1e6d8bb24fa4c18891204108759bb35
2019-06-14 08:34:30 -07:00
b811b6d5c0 When building extensions, honor options set in CMake. (#21653)
Summary:
Currently when building extensions, variables such as USE_CUDA, USE_CUDNN are used to determine what libraries should be linked. But we should use what CMake has detected, because:

1. If CMake found them unavailable but the variables say some libraries should be linked, the build would fail.
2. If the first build is made using a set of non-default build options, rebuilds must have these options passed to setup.py again, otherwise the extension build process is inconsistent with CMake. For example,

```bash
# First build
USE_CUDA=0 python setup.py install
# Subsequent builds like this would fail, unless "build/" is deleted
python setup.py install
```

This commit addresses the above issues by using variables from CMakeCache.txt when building the extensions.
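Reading build options back out of the CMake cache can be sketched as a small parser over `CMakeCache.txt`'s `VARIABLE:TYPE=VALUE` lines (the function name `read_cmake_cache` is illustrative, not necessarily what setup.py uses):

```python
def read_cmake_cache(text):
    # Parse lines of the form "VARIABLE:TYPE=VALUE" from CMakeCache.txt,
    # skipping blank lines and "#"/"//" comment lines.
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(('#', '//')):
            continue
        name_type, _, value = line.partition('=')
        name, _, _ = name_type.partition(':')
        variables[name] = value
    return variables
```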

 ---

The changes in `setup.py` may look lengthy, but the biggest changed block mostly moves existing code into a function `configure_extension_build` (along with renaming some variables to `cmake_cache_vars['variable name']` and other minor changes), because it must be called after CMake has run (and thus the options used and the system environment detected by CMake become available).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21653

Differential Revision: D15824506

Pulled By: ezyang

fbshipit-source-id: 1e1eb7eec7debba30738f65472ccad966ee74028
2019-06-14 08:13:40 -07:00
4001e71547 When converting to NumPy, throw TypeError when type is not supported (#21608)
Summary:
This makes the error thrown in aten_to_numpy_dtype consistent with that in numpy_dtype_to_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21608

Differential Revision: D15816035

Pulled By: gchanan

fbshipit-source-id: 392e8b9ea37003a859e7ed459911a1700fcbd695
2019-06-14 07:35:03 -07:00
2d5ce519f2 Fix with emit_nvtx, also allow shape information to appear in nvtx ranges. (#21691)
Summary:
This PR is intended as a fix for https://github.com/pytorch/pytorch/issues/21644.

It allows the `with emit_nvtx` context manager to take an additional `record_shapes` argument. `record_shapes` is False by default, but if True, the nvtx ranges generated for each autograd op will append additional information about the sizes of Tensors received by that op.

The format of shape information is equivalent to what the CPU-side profiler spits out.  For example,
```
M = torch.randn(2, 3)
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)

with torch.cuda.profiler.profile():
    with torch.autograd.profiler.emit_nvtx(record_shapes=True):
        torch.addmm(M, mat1, mat2)
```
produces the following nvtx range label for addmm:
![Screenshot from 2019-06-12 10-48-01](https://user-images.githubusercontent.com/7799218/59374008-b7d13100-8cff-11e9-9245-58410073d965.png)
(cf the "Input Shapes" shown in 864cfbc216 (diff-115b6d48fa8c0ff33fa94b8fce8877b6))

I also took the opportunity to do some minor docstring cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21691

Differential Revision: D15816226

Pulled By: gchanan

fbshipit-source-id: b2b01ea10fea61a6409a32b41e85b6c8b4851bed
2019-06-14 07:35:00 -07:00
b9675efb5a Fix the issue of sizes vs size for tensor creation ops (#21686)
Summary:
Related to [pytorch#20921](https://github.com/pytorch/pytorch/issues/20921)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21686

Differential Revision: D15816109

Pulled By: gchanan

fbshipit-source-id: 4428b8e77b6c8b297ddb77e58fc1cb916c9cc46e
2019-06-14 07:34:56 -07:00
1e7bd7586d Query caffe2 operator stats for detailed execution info (#20924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20924

I found a Python 3 bug when deserializing Caffe2 code: the exception thrown is a Unicode-related error instead of just a decode error, and we need to catch that as well.

Reviewed By: ipiszy

Differential Revision: D15293221

fbshipit-source-id: 29820800d1b4cbe5bf3f5a189fe2023e655d0508
2019-06-13 23:41:04 -07:00
d9eec4ef0d backend.py: _getattr__ must raise AttributeError (#21763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21763

Custom __getattr__ functions can only raise AttributeError. This code threw NotImplementedError, which caused upstream trouble when hasattr() was called.

Differential Revision: D15815176

fbshipit-source-id: 0982e2382de4578d3fc05c5d2a63f624d6b4765e
2019-06-13 23:17:57 -07:00
044809f1f3 Handling numel() == 0 in convTranspose (#21652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21652

Diff fixes issue of empty ROIs for convTranspose

Issue StackTrace: P65374505

Reviewed By: jerryzh168

Differential Revision: D15766739

fbshipit-source-id: 39cf8feca66b6aae22ff4ec5c1b6a4e3f20f378d
2019-06-13 23:02:26 -07:00
5c0e058950 Implement at::empty(IntArrayRef, DimnameList?, TensorOptions) in aten (#21647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21647
ghimport-source-id: 1db4ec31f047f7854a39c28e2b38918dc6b44f42

Differential Revision: D15804425

Pulled By: zou3519

fbshipit-source-id: 575cc3de09287efe75e7052df129626748208d0d
2019-06-13 20:38:19 -07:00
3e79036382 Make it possible to trigger all tests by pushing to ci-all/ branch (#21750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21750
ghimport-source-id: 4792aa5ccab7e4b54c21f23d0b78802f85bbeb8d

Differential Revision: D15819367

Pulled By: ezyang

fbshipit-source-id: db91ee727c66469ac78e59b3662f29db53a916bc
2019-06-13 19:53:35 -07:00
16b4a12ed8 better example for local weights (#21685)
Summary:
fixes https://github.com/pytorch/hub/issues/29
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21685

Differential Revision: D15817774

Pulled By: ailzhang

fbshipit-source-id: d2f615e5d431186d45a21d8300fb9ba3c37b246c
2019-06-13 17:56:25 -07:00
adc99efb46 Add batch id to tracer event (#21446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21446

This is used for easier tracing of the iteration id when looking at the trace diagram.

Reviewed By: ilia-cher

Differential Revision: D15628950

fbshipit-source-id: ee75b3bdb14a36abc18c7bddc49d8ec9789b724d
2019-06-13 17:13:42 -07:00
fbecb4621f schema_matching.cpp: improve error messages.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21141

Differential Revision: D15808354

Pulled By: ZolotukhinM

fbshipit-source-id: 16d938fd5acafb445a0c433cabc9a55cab563165
2019-06-13 17:04:38 -07:00
cfd8c58b45 Tune elementwise ops for ROCm (#21754)
Summary:
```
The stride calculation using OffsetCalculator performs poorly with
MAX_DIMS=25. This reduces MAX_DIMS (after coalescing) to 16 on ROCm.
I think it's unlikely that anyone will exceed this limit. If they do,
we can add additional specializations for ROCm with more dimensions.
```

I'm not sure about the underlying cause. With MAX_DIM=25, the add kernel's parameter struct is ~648 bytes vs. ~424 bytes with MAX_DIM=16. The kernel instruction footprint is bigger too, but most of these instructions are never executed, and most kernel parameters are never loaded, because the typical dimensionality is much smaller.

Mini benchmark here:
https://gist.github.com/colesbury/1e917ae6a0ca9d24712121b92fed4c8f

(broadcasting operations are much faster)

cc iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21754

Reviewed By: bddppq

Differential Revision: D15811906

Pulled By: colesbury

fbshipit-source-id: 063f92c083d26e2ef2edc98df7ff0400f9432b9d
2019-06-13 16:25:26 -07:00
f59581218f Fix spelling errors (#21665)
Summary:
alloctor -> allocator
excutable -> executable
excution -> execution
foward -> forward
initiaize -> initialize
paralell -> parallel
preprocesor -> preprocessor
tranpose -> transpose
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21665

Differential Revision: D15806155

Pulled By: soumith

fbshipit-source-id: d92b21ec8650a2b32f05faf9af0b7d2b073e992c
2019-06-13 15:21:55 -07:00
efd20de276 fix multihead attention for half (#21658)
Summary:
Currently multihead attention for half type is broken
```
  File "/home/ngimel/pytorch/torch/nn/functional.py", line 3279, in multi_head_attention_forward
    attn_output = torch.bmm(attn_output_weights, v)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument https://github.com/pytorch/pytorch/issues/2 'mat2'
```
because softmax converts half inputs into fp32 inputs. This is unnecessary - all the computations in softmax will be done in fp32 anyway, and the results need to be converted into fp16 for the subsequent batch matrix multiply, so nothing is gained by writing them out in fp32. This PR gets rid of type casting in softmax, so that half works.
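The intended behavior — accumulate in fp32 internally, return the input's half precision — can be sketched with NumPy standing in for the CUDA kernel (a conceptual illustration, not the actual softmax implementation):

```python
import numpy as np

def softmax_half(x):
    # x: float16 array. Compute in float32 for numerical accuracy, then
    # cast the result back to float16 so downstream half-precision ops
    # (e.g. the batched matmul in multihead attention) see matching dtypes.
    x32 = x.astype(np.float32)
    x32 -= x32.max(axis=-1, keepdims=True)   # stabilize the exponent
    e = np.exp(x32)
    out = e / e.sum(axis=-1, keepdims=True)
    return out.astype(np.float16)
```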
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21658

Differential Revision: D15807487

Pulled By: zhangguanheng66

fbshipit-source-id: 4709ec71a36383d0d35a8f01021e12e22b94992d
2019-06-13 15:17:04 -07:00
4716409f30 Use expect to fill in pytorchbot token (#20459)
Summary:
In this PR, we use `expect` to fill in the token for pytorchbot when doing `git push`, so that we don't need to save the token in the git remote URL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20459

Differential Revision: D15811676

Pulled By: yf225

fbshipit-source-id: cd3b780da05d202305f76878e55c3435590f15a8
2019-06-13 14:56:38 -07:00
b858f42e16 Document that no_grad is thread local. (#21755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21755
ghimport-source-id: dfb53759024d9ba9d104fdb2a8151ab996e55234

Differential Revision: D15811172

Pulled By: ezyang

fbshipit-source-id: c8c7c1c15277d8fe8cc513e20af449257d7ff15c
2019-06-13 13:47:09 -07:00
3e8dc565bd Bug fix: ONNX export full operator (#21669)
Summary:
Fix an obvious bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21669

Reviewed By: zrphercule

Differential Revision: D15806614

Pulled By: houseroad

fbshipit-source-id: d0f6e934252e0057f3dbcc7f160236ee6f4497ac
2019-06-13 13:20:21 -07:00
4b45f08f87 Added dim check for index_copy_ (#21617)
Summary:
Fixing reported [bug](https://github.com/pytorch/pytorch/issues/20322)
The issue was related to not checking the dimensions of source vs destination tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21617

Differential Revision: D15749963

Pulled By: izdeby

fbshipit-source-id: acff114c729fd9c0a9a51325e0ebd8b42e1f2fc1
2019-06-13 13:15:23 -07:00
aa6887e6ef add error message to missing function backend (#21742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21742

Add error message to NotImplementedError so we know which function it is about.

Reviewed By: bddppq

Differential Revision: D15806379

fbshipit-source-id: 14eab9d03aa5b44ab95c5caeadc0e01d51f22188
2019-06-13 13:10:48 -07:00
756a20de93 Add/edit docs for nn.transformer (#21746)
Summary:
Add docs for TransformerEncoder and TransformerDecoder, plus minor edits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21746

Differential Revision: D15807498

Pulled By: zhangguanheng66

fbshipit-source-id: 388efb5821c4c3d25865cecea70902e9b2bf5d15
2019-06-13 12:27:26 -07:00
7c7d5be033 Skip failing test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21751

Pulled By: driazati

Differential Revision: D15809091

fbshipit-source-id: 3cc96e632a7b89b4d86d68d2a76021d971447e12
2019-06-13 12:21:56 -07:00
51ee048709 improve torch.load & torch.save doc formatting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21747

Differential Revision: D15808189

Pulled By: ezyang

fbshipit-source-id: 5413eaaa901be098c6bad135f702ba103bc79d6c
2019-06-13 12:13:04 -07:00
63a7c7bb2a Add event and event_counter columns to caffe2_usage_tracer table (#21739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21739

Added event and event_counter columns for PyTorch/Caffe2 API usage metrics

Reviewed By: dzhulgakov

Differential Revision: D15119119

fbshipit-source-id: a71010bd659109a8e4f3a8bad84b22c1d15dc528
2019-06-13 12:06:02 -07:00
f87d5cc191 Fix first reshape in pixel_shuffle conversion (#21486)
Summary:
When converting pixel_shuffle to reshape + transpose + reshape, the first reshape should
be:
[N, C * r^2, H, W] => [N, C, r, r, H, W]
in order to match pytorch's implementation (see ATen PixelShuffle.cpp).

This previously wasn't caught by the test case, since it uses C = r = 4. Updated test case to
have C = 2, r = 4.
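The decomposition described above can be reproduced with NumPy (a sketch mirroring the reshape/permute/reshape sequence in ATen's PixelShuffle.cpp, not the converter code itself):

```python
import numpy as np

def pixel_shuffle(x, r):
    # [N, C*r^2, H, W] -> [N, C, r, r, H, W] -> permute -> [N, C, H*r, W*r]
    n, crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(n, c, r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)   # [N, C, H, r, W, r]
    return x.reshape(n, c, h * r, w * r)
```

Note that using C = 2, r = 4 in the test (rather than C = r = 4) is what distinguishes the correct first reshape `[N, C, r, r, H, W]` from the buggy one.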
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21486

Reviewed By: houseroad

Differential Revision: D15700945

Pulled By: houseroad

fbshipit-source-id: 47019691fdc20e152e867c7f6fd57da104a12948
2019-06-13 11:44:54 -07:00
fc3f702ba8 at::launch benchmark (#21581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21581
ghimport-source-id: 6a65d73694b17611d6ad45db0b39b86c318a68c7

Differential Revision: D15736495

Pulled By: ilia-cher

fbshipit-source-id: 6b9109ad3611ff3c8b1a37796e9149bef0c2ad36
2019-06-13 10:46:35 -07:00
eca42a5122 Fix failing test for Final annotations (#21725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21725

Pulled By: driazati

Differential Revision: D15800009

fbshipit-source-id: 5409c213161e3f2031710933897b85872aad2a83
2019-06-13 10:41:44 -07:00
5485f09f18 Native TBB parallel backend (#20480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20480
ghimport-source-id: c710f897c4c9b9616fc3dd76d80b4845aea43a1f

Differential Revision: D15333692

Pulled By: ilia-cher

fbshipit-source-id: 61e476dd5c737fe144e3aec000d8ebb11fbc0547
2019-06-13 10:11:16 -07:00
ab0d5ab99d Port avg_pool2d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21635

Differential Revision: D15768487

Pulled By: ezyang

fbshipit-source-id: 85e1d883aded0f4d3ac5100719df335f5a337fc5
2019-06-13 10:03:58 -07:00
5a7e2ccc0b Add use_rocm flag to detect AMD build in the runtime (#21718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21718

Adds a detection method for whether the package is built for AMD.

Reviewed By: bddppq

Differential Revision: D15795893

fbshipit-source-id: 91a21ee76b2273b1032507bdebe57e016717181d
2019-06-13 09:30:49 -07:00
556af7c19d ROCm 2.5
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21724

Differential Revision: D15799149

Pulled By: bddppq

fbshipit-source-id: c72689e73470f2ca145556a2ac8cb34e36e341ef
2019-06-13 01:32:46 -07:00
42770e1370 Improving Categorical Distribution Docs' (#16291) (#21707)
Summary:
**Closes:** Confusing documentation about logits in distributions.Categorical https://github.com/pytorch/pytorch/issues/16291

**Solution**: Changes the documentation on the Categorical distribution from `log probabilities` to `event log-odds`. This should reduce the confusion raised by the issue, and is consistent with other distributions such as `torch.Binomial`.

More than happy to make any other changes if they fit :).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21707

Differential Revision: D15799181

Pulled By: soumith

fbshipit-source-id: f11acca7a5c130102a3ff6674640235ee5aa69bf
2019-06-12 23:54:02 -07:00
a3db2844e1 Support tuples in ScriptModule inputs/outputs (#20784)
Summary:
- [x] Add tests after https://github.com/pytorch/pytorch/pull/20256 is merged

- Support exporting ScriptModule with inputs/outputs of arbitrarily constructed tuples.

- Moved the assigning of output shapes to after graph conversion to ONNX is completed. By then all tuples in the IR have already been lowered by the pass ```_jit_pass_lower_all_tuples```. If assigning output shapes were required to happen before that, we'd need to hand-parse the tuple structures in the graph and repeat the same logic as ```_jit_pass_lower_all_tuples```. Handling inputs is easier because all tuple information is encoded within the input tensor type.

- Swap the order of ```_jit_pass_lower_all_tuples``` and ```_jit_pass_erase_number_types```. Ops like ```prim::TupleIndex``` rely on the index being a scalar. ```_jit_pass_erase_number_types``` will convert these kinds of scalars to tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20784

Reviewed By: zrphercule

Differential Revision: D15484171

Pulled By: houseroad

fbshipit-source-id: 4767a84038244c929f5662758047af6cb92228d3
2019-06-12 23:37:28 -07:00
4c03ac7ac4 Allow batch sizes > 65535 for inverse, solve, cholesky_solve and tria… (#21689)
Summary:
…ngular_solve

Changelog:
- Iterate over mini batches of 65535 matrices (maximum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21689

Differential Revision: D15800254

Pulled By: soumith

fbshipit-source-id: c743ff13f1ba25d26874429d44e41a3c0ed21d6a
2019-06-12 23:30:19 -07:00
b599bb3836 Add mkldnn mul operator (#20575)
Summary:
### mkldnn backward ops list:
 - [ ] \(https://github.com/pytorch/pytorch/pull/20567) Add aten mkldnn conv2d backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20570) Add aten mkldnn backward ops: relu, linear and reshape 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20571) Add aten mkldnn backward ops: max_pool2d, avg_pool2d and adaptive_avg_pool2d 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20572) Add aten mkldnn batchnorm backward operator 💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20573) Add aten mkldnn zero_ operator💛
 - [ ] \(https://github.com/pytorch/pytorch/pull/20575) Add mkldnn mul operator 💛
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20575

Differential Revision: D15799529

Pulled By: bddppq

fbshipit-source-id: 4887d8ef1a0e316ad9db199b657d9481fc13e486
2019-06-12 22:41:51 -07:00
d3b3cbe26e Revert D15769066: [pytorch][PR] schema_matching.cpp: improve error messages.
Differential Revision:
D15769066

Original commit changeset: 5853e0360581

fbshipit-source-id: ac6fa8429136abf4c7835919009f936eea11ea7b
2019-06-12 20:17:38 -07:00
49481d576d Torch rename (#20774)
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants).  Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.

The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774

Differential Revision: D15769965

Pulled By: kostmo

fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
2019-06-12 20:12:34 -07:00
e9121e27ce remove liveness tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21719

Differential Revision: D15797628

Pulled By: Krovatkin

fbshipit-source-id: 87742bdde0b05aff4341ababb1f55c51991768ec
2019-06-12 19:04:41 -07:00
f5c00345b3 Profiling Programs section in README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21695

Differential Revision: D15795716

Pulled By: Krovatkin

fbshipit-source-id: e14a44210ea4312a247157a6681fce449e40f779
2019-06-12 17:52:05 -07:00
8dd670657b Liveness for BailOut graphs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21615

Differential Revision: D15793434

Pulled By: Krovatkin

fbshipit-source-id: d89f1bf61ea57a1e3b75f8e2b200c27beb8b46cf
2019-06-12 17:22:33 -07:00
8c57ce87b0 make tests pass with enable_first_class_module() enabled. (#21565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21565
ghimport-source-id: d1fe735fb7821eadc59116fb921d8fe39a49f818

Reviewed By: driazati

Differential Revision: D15729503

Pulled By: zdevito

fbshipit-source-id: fabb678f040d21fae7545e3b2be1d098e24c544e
2019-06-12 17:13:00 -07:00
d8056cb832 Update quantization to work with first-class modules. (#21660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21660
ghimport-source-id: f9a11b2748f49042ee636755358d79c547aa249e

Reviewed By: suo

Differential Revision: D15770237

Pulled By: zdevito

fbshipit-source-id: 41fa8577028eef247bc545635cd93192a0b19db4
2019-06-12 17:12:57 -07:00
56f4602630 Add WeakIValue, use in tracer. (#21515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21515
ghimport-source-id: 7898a68791db2b5050164ab01d6ca6991e05746d

Reviewed By: suo

Differential Revision: D15719981

Pulled By: zdevito

fbshipit-source-id: 42cf26cf6541bcdf95f1343da3b9228fe2c229da
2019-06-12 17:12:53 -07:00
0293cf5bb6 Add Final[T] annotated members to __constants__ (#21603)
Summary:
Class member annotations can be marked with `Final[T]` instead of adding them to `__constants__`. `Final` comes from the `typing_extensions` module (which will be used if it is present). If not, the polyfill from `_jit_internal` is exposed as `torch.jit.Final` for users that don't want to install `typing_extensions`.

This keeps around `__constants__` since a lot of code is still using it, but in documentation follow ups we should change the examples to all to use `Final`.

TODO: install typing_extensions on CI, move tests to a Python3 only file when #21489 lands
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21603

Pulled By: driazati

Differential Revision: D15746274

fbshipit-source-id: d2c9b5643b4abba069b130c26fd42714c906ffac
2019-06-12 16:40:40 -07:00
0481a7710d Support for type annotations instead of torch.jit.annotate() (#21390)
Summary:
This adds support for PEP 526 style annotations on assignments in place of
`torch.jit.annotate()`, so

```python
a = torch.jit.annotate(List[int], [])
```

turns into

```python
a : List[int] = []
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21390

Differential Revision: D15790937

Pulled By: driazati

fbshipit-source-id: 0cc204f7209a79839d330663cc6ba8320d3a4120
2019-06-12 15:51:46 -07:00
699de487db numerical integration "trapz" function. (#21610)
Summary:
This is intended to match [numpy.trapz](https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html): numerical integration based on the trapezoid rule.
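The trapezoid rule itself is simple enough to state in a few lines of plain Python (a reference sketch of the math, not the ATen kernel):

```python
def trapz(y, x=None, dx=1.0):
    # Trapezoid-rule integral of samples y, optionally at points x.
    # Mirrors the numpy.trapz signature this operator is modeled on.
    n = len(y)
    if x is None:
        widths = [dx] * (n - 1)
    else:
        widths = [x[i + 1] - x[i] for i in range(n - 1)]
    return sum((y[i] + y[i + 1]) * w / 2.0 for i, w in enumerate(widths))
```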
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21610

Differential Revision: D15747618

Pulled By: umanwizard

fbshipit-source-id: 8eadb2e75c9877b07592d875ca0b2cca6cb72297
2019-06-12 15:30:13 -07:00
b527e48588 Use c10::List (#21177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177

- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. Also, this means that nested types like Dicts of Lists of Optional of Dict of ... do work as expected now

Differential Revision: D15476433

fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
2019-06-12 13:58:24 -07:00
ae342fd076 Refactor Random Number Generators in ATen (#21364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21364
ghimport-source-id: ca7d37e10190ba46dc8512f437404ca9216d3369

Differential Revision: D15696497

Pulled By: ezyang

fbshipit-source-id: 2e713b8566ae915e175b5a79ac1dd9b86cc2a23d
2019-06-12 13:01:30 -07:00
96910251e0 schema_matching.cpp: improve error messages.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21141

Differential Revision: D15769066

Pulled By: ZolotukhinM

fbshipit-source-id: 5853e0360581c44e42b068add3bf2bc68e671b2b
2019-06-12 12:43:12 -07:00
28adca82ea Add some named tensor helper functions (#21636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21636
ghimport-source-id: 5eff5744cd3c80f75bdb02576be1407a64e0434d

Differential Revision: D15780269

Pulled By: zou3519

fbshipit-source-id: 87ff40ffbe0ebd5fc4d105709c9f6f8dda5f9952
2019-06-12 12:34:44 -07:00
20b0acf057 Add some more namedtensor builds to the CI (#21632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21632
ghimport-source-id: 6a8da97ce153c6d279017af920edd0d20765c32c

Differential Revision: D15760331

Pulled By: zou3519

fbshipit-source-id: b2f4c65df5f6f9322d47da995c76851387e5df47
2019-06-12 12:34:40 -07:00
3e6eb3dcab Add virtual dtor to NamedTensorMetaInterface (#21633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21633
ghimport-source-id: 6cdf0b1559e696a19e282ff6d5ba79c6b119e8c0

Differential Revision: D15760589

Pulled By: zou3519

fbshipit-source-id: 537882c05ab7b19889a31c648c5efeb1949831a8
2019-06-12 12:34:37 -07:00
83cec5f3ee nn.Transformer (#20170)
Summary:
Accidentally rebased the old PR and make it too messy. Find it here (https://github.com/pytorch/pytorch/pull/19274)

Creating this PR for comments. The model is still WIP, but I want some feedback before moving too far. The transformer model depends on several modules, like MultiheadAttention (landed).

Transformer is implemented based on the paper (https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). Users have the flexibility to build a transformer with self-defined and/or built-in components (i.e encoder, decoder, encoder_layer, decoder_layer). Users could use Transformer class to build a standard transformer model and modify sub-layers as needed.

Add a few unit tests for the transformer module, as follow:
TestNN.test_Transformer_cell
TestNN.test_transformerencoderlayer
TestNN.test_transformerdecoderlayer
TestNN.test_transformer_args_check
TestScript.test_scriptmodule_transformer_cuda

There is another demonstration example for applying transformer module on the word language problem. https://github.com/pytorch/examples/pull/555
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20170

Differential Revision: D15417983

Pulled By: zhangguanheng66

fbshipit-source-id: 7ce771a7e27715acd9a23d60bf44917a90d1d572
2019-06-12 12:22:12 -07:00
180aa234fc Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 5cbf562652b9d7cf3877b5f819141f88c9b857d3
2019-06-12 12:17:42 -07:00
8f40164517 Add libtorch Linux CPU binary build to PR CI (#21671)
Summary:
Currently we don't have any Linux libtorch binary build in the PR CI, which led to nightly build failures such as https://circleci.com/gh/pytorch/pytorch/1939687. This PR adds a Linux libtorch CPU binary build to prevent such breakage from happening in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21671

Differential Revision: D15785003

Pulled By: yf225

fbshipit-source-id: d1f2e4235e48296ddecb3367f8e5a0df16f4ea49
2019-06-12 12:07:31 -07:00
39d412194f Fix ProcessGroupGloo allgather for tensors with shared storage (#21490)
Summary:
Fix https://github.com/pytorch/pytorch/issues/20421

`ProcessGroupGloo` only requires input/output tensors to be contiguous. Contiguous tensors might not start from the beginning of the underlying storage, e.g., `chunk(..., dim=0)[1]`. The current implementation passes the `tensor.storage().data()` pointer to the Gloo buffer. This leads to wrong results if the tensor has a non-zero storage offset.

The proposed solution is to use `tensor.data_ptr()` instead. Let's see if this breaks any tests.
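The distinction between an allocation's base pointer and a view's own pointer can be illustrated with NumPy (an analogy for `storage().data()` vs. `data_ptr()`, not the DDP code itself): the second half of a split is contiguous, yet it does not start at the base address, so reading from the base would pick up the wrong elements.

```python
import numpy as np

base = np.arange(8, dtype=np.float32)
second_half = np.split(base, 2)[1]   # contiguous view with non-zero offset

base_addr = base.__array_interface__['data'][0]
view_addr = second_half.__array_interface__['data'][0]
offset_bytes = view_addr - base_addr  # analogous to storage_offset * itemsize
```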

cc qijianan777
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21490

Differential Revision: D15768907

Pulled By: mrshenli

fbshipit-source-id: 9d7d1e9baf0461b31187c7d21a4a53b1fbb07397
2019-06-12 11:59:17 -07:00
ad73ea22f7 Add strong Wolfe line search for lbfgs (#8824)
Summary:
This pull request adds a line search for lbfgs. "strong Wolfe" is the default line search method in [minFunc](https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html) and it is also recommended in the [Numerical Optimization](https://www.springer.com/gp/book/9780387303031) book.

The implementation is based on four sources:
+ https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html
+ https://www.springer.com/gp/book/9780387303031 Algorithms 3.5, 3.6, formula 3.59
+ https://github.com/torch/optim/blob/master/lswolfe.lua
+ https://github.com/torch/optim/blob/master/polyinterp.lua

The Lua version is based on an old version of `minFunc`, which was updated in 2012; I made a couple of small changes based on the updated version. Because of that, the comparison test against the `.lua` version is not exact (that is why I changed a learning rate in the test).
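For reference, the strong Wolfe conditions that the line search must satisfy — sufficient decrease plus a curvature bound (Nocedal & Wright, Algorithm 3.5) — can be checked for a 1-D problem as follows (a sketch of the acceptance criterion, not the bracketing/zoom search added in this PR):

```python
def satisfies_strong_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    # 1-D check: sufficient decrease (Armijo) and strong curvature
    # condition for a step of size alpha along descent direction p.
    f0, g0 = f(x), grad(x) * p
    fa, ga = f(x + alpha * p), grad(x + alpha * p) * p
    armijo = fa <= f0 + c1 * alpha * g0
    curvature = abs(ga) <= c2 * abs(g0)
    return armijo and curvature
```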
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8824

Differential Revision: D15783067

Pulled By: vincentqb

fbshipit-source-id: 5316d9088233981120376d79c7869d5f97e51b69
2019-06-12 11:32:41 -07:00
2c91ba3bbc Add div hashing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21422

Reviewed By: xianjiec

Differential Revision: D15589181

fbshipit-source-id: f6ff0726164f88da45e4b090b4d5ad05305b3225
2019-06-12 11:27:37 -07:00
76e01542ed Fix the shape of PReLU weight (#21330)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/21271
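For context, `nn.PReLU` keeps one learnable slope per channel, so the weight is a 1-D tensor of length `num_parameters` (an illustration of the expected shape, not the export-side fix itself):

```python
import torch

m = torch.nn.PReLU(num_parameters=4)
assert m.weight.shape == (4,)    # one slope per channel

out = m(torch.randn(2, 4, 8))    # slopes applied channel-wise (dim 1)
assert out.shape == (2, 4, 8)
```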
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21330

Reviewed By: zrphercule

Differential Revision: D15776459

Pulled By: houseroad

fbshipit-source-id: 4e0aef88e9c91c79faa3da6fa66f7466dee52018
2019-06-12 11:03:40 -07:00
7123c6ca04 Enable groupwise for qconv (#21592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21592

We now support groupwise convolutions for qconv2d

Reviewed By: zafartahirov

Differential Revision: D15739239

fbshipit-source-id: 80b9b4fef5b9ee3d22ebecbaf205b970ab3d4250
2019-06-12 11:03:36 -07:00
8cc8e15473 Back out "[pytorch][PR] [Re-landing] Fix caffe2 windows CI for new Windows AMI" (#21670)
Summary:
Original commit changeset: e65c1d6bfcc9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21670

Differential Revision: D15776087

Pulled By: yf225

fbshipit-source-id: cbb55cbbcb133cae1aeb2fe75cc52e7350cc6c88
2019-06-12 10:37:19 -07:00
cbcb2b5ad7 Delete DDP hooks in Reducer destructor (#21591)
Summary:
Closes https://github.com/pytorch/pytorch/issues/21344

DDP assigns the original module to the first module replica instead of creating a new one. Then, it creates a new Reducer to add post hooks to sync gradients. However, because every reconstructed DDP instance wraps the same original module, all their reducers will add hooks to the same set of variables. This PR deletes DDP hooks from variables when the Reducer is destructed, to make DDP failures recoverable.

pietern kuttas and I discussed the following solutions:

#### Solution 1

Keep the `add_post_hook` API intact, and do a `dynamic_cast` in `del_post_hook` to check the hook type. If the type matches the Reducer's hook, delete it. As pietern mentioned, this will not work if we create multiple DDP instances from the same original model.

#### Solution 2

Use a counter to generate a unique key for every hook in `Function`, and keep them in a map. Return the key to the caller of `add_post_hook`, and ask the caller to provide the key if it needs to delete the hook.

Con: this would add extra overhead to `add_post_hook` and every `Function` object.

#### Solution 3 [Current implementation]

kuttas suggests that, instead of generating a unique key, directly using the address of the pointer would be better. In order to avoid messing up dereferencing, let `add_post_hook` return a `uintptr_t`.
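A rough Python analogue of Solution 3 (the real implementation lives in C++ on `Function`; class and method names here are illustrative only):

```python
class Function:
    """Toy stand-in for autograd Function post-hook bookkeeping."""

    def __init__(self):
        self._post_hooks = {}

    def add_post_hook(self, hook):
        key = id(hook)             # analogous to using the hook pointer's address
        self._post_hooks[key] = hook
        return key                 # the caller keeps this uintptr_t-like handle

    def del_post_hook(self, key):
        # deleting by handle removes only this caller's hook
        self._post_hooks.pop(key, None)

fn = Function()
handle = fn.add_post_hook(lambda *args: None)
assert handle in fn._post_hooks
fn.del_post_hook(handle)
assert handle not in fn._post_hooks
```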
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21591

Differential Revision: D15745706

Pulled By: mrshenli

fbshipit-source-id: e56d2d48de0c65f6667790ab16337eac7f7d8b76
2019-06-12 07:08:28 -07:00
1e4af2b969 Pin torchvision version. (#20811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20811
ghimport-source-id: 52e043453272d8441a2c0efd7f005b71ded024d6

Differential Revision: D15779416

Pulled By: ezyang

fbshipit-source-id: 1b3c2d9aeab57e580038f0c2a8bfbfcae9d7b62a
2019-06-12 06:16:20 -07:00
1ffa9d3d3b correct measure quantization error when followed_by=Relu and dequantize_output=1 (#21664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21664

As title

Reviewed By: csummersea

Differential Revision: D15770947

fbshipit-source-id: 57f5842e1a250300703b02134c314e4f06b767b8
2019-06-11 23:36:15 -07:00
c2a18a6702 Override print when python is present (#21625)
Summary:
This makes it so we can see the output of prim::Print in environments like IPython notebooks, which override `sys.stdout`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21625

Differential Revision: D15756793

Pulled By: jamesr66a

fbshipit-source-id: 7d9a14b2e229ed358e784318e9d862677db2c461
2019-06-11 22:58:22 -07:00
aa7e27fa70 Emit Loop Condition as Separate Block (#21611)
Summary:
Emit the loop condition as a separate block in loops, then inline it before conversion to SSA. This is needed for breaks & continues, where we will inline the condition block after the continue pass and before the break pass.

I also considered emitting a prim::For and a prim::While, but I think it's easier to just have one pathway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21611

Differential Revision: D15775820

Pulled By: eellison

fbshipit-source-id: de17c5e65f6e4a0256a660948b1eb630e41b04fb
2019-06-11 22:03:26 -07:00
341a7e4bb5 Fix issue in backward path (#21663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21663

as title

Reviewed By: hl475

Differential Revision: D15770793

fbshipit-source-id: b3d0dd030237c4d62bddc388984a273153fac4a6
2019-06-11 21:09:25 -07:00
afd202be9f StoreMatrixInMatrixMarketFormat can store both integer and float tensors (#21606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21606

StoreMatrixInMatrixMarketFormat could only dump quantized tensors, but sometimes we want to dump float tensors.

Reviewed By: csummersea

Differential Revision: D15741611

fbshipit-source-id: 95b03c2fdf1bd8407f7d925171d9dc9f25677464
2019-06-11 17:28:19 -07:00
c2a08d339b Automatic update of fbcode/onnx to dd599b05f424eb161a31f3e059566a33310dbe5e (#21641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21641

Previous import was 5160f3ac3380302224998f1c95e111cd961c4bc5

Included changes:
- **[dd599b05](https://github.com/onnx/onnx/commit/dd599b05)**: Fix type s/depracted/deprecated/ (#2092) <Takeshi Watanabe>
- **[abb1702a](https://github.com/onnx/onnx/commit/abb1702a)**: Add shape inference for Tile op (#2076) <Hariharan Seshadri>
- **[67638d9c](https://github.com/onnx/onnx/commit/67638d9c)**: [New Operator] Round (#2053) <Jeff Saremi>
- **[584e4477](https://github.com/onnx/onnx/commit/584e4477)**: Add dilations support in ConvTranspose shape inference and update docs (#2068) <daquexian>

Reviewed By: zrphercule

Differential Revision: D15762382

fbshipit-source-id: 590f25fb733e1565eb90fcdeb797b0ba34e2d2c3
2019-06-11 16:54:47 -07:00
968114ae3d Revert D15769256: [jit] Add python string standard lib
Differential Revision:
D15769256

Original commit changeset: 1af487446361

fbshipit-source-id: 96bea4a49664dad68762bef75ae28e64c673f8b1
2019-06-11 16:54:43 -07:00
039629cedd fix incorrect use of TeX in docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21649

Differential Revision: D15766392

Pulled By: umanwizard

fbshipit-source-id: a362ec06e971ee12c47a45bc9c15cc773ec878e3
2019-06-11 16:19:40 -07:00
1bd21d3f14 test_jit: Remove tests checking non-guaranteed properties from 'test_insert_observers'. (#21657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21657
ghimport-source-id: e9c7e45c00db55bf3b7895d06d77f0d99bfc1afe

Differential Revision: D15769295

Pulled By: ZolotukhinM

fbshipit-source-id: cfb40bc5d7116b1d99f5e0f5c4f5577f5aa33804
2019-06-11 16:12:09 -07:00
ee33afe2b1 randomized testing for qconv (#21436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21436

Test many different options

Reviewed By: zafartahirov

Differential Revision: D15683754

fbshipit-source-id: 60d0fc697b53c7e4adadbe80995d45f28729bca4
2019-06-11 16:07:22 -07:00
cf5c3bb3fe make range functions respect current stream (#21619)
Summary:
The current stream is not respected by the range/linspace/logspace functions, which contributes to https://github.com/pytorch/pytorch/issues/21589 (this is not a complete solution for that issue).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21619

Differential Revision: D15769666

Pulled By: ezyang

fbshipit-source-id: 7c036f7aecb3119430c4d432775cad98a5028fa8
2019-06-11 15:46:48 -07:00
9241c4b3c6 Add python string standard lib (#21656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21656
ghimport-source-id: cc7d7f68e33e95a97f6274c50823138aa4bacabb

Differential Revision: D15769256

Pulled By: bwasti

fbshipit-source-id: 1af487446361d90d03dce004c3e2169a3e62667d
2019-06-11 15:23:23 -07:00
9737b166a4 Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes https://github.com/pytorch/pytorch/issues/21257, fixes https://github.com/pytorch/pytorch/issues/21508

cc: LeviViana neerajprad
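A sanity check along the lines the tests would use — with the fix, sampled frequencies should track the input probabilities (this is statistical, so the tolerance is deliberately loose):

```python
import torch

torch.manual_seed(0)
probs = torch.tensor([0.1, 0.2, 0.7])
samples = torch.multinomial(probs, 20000, replacement=True)
freq = torch.bincount(samples, minlength=3).float() / 20000
# empirical frequencies should be close to the target distribution
assert torch.allclose(freq, probs, atol=0.03)
```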
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15761029

Pulled By: colesbury

fbshipit-source-id: 2aeb51e2d3cfdb8356806a7d5b12d4b9910e37fb
2019-06-11 15:18:17 -07:00
fb9fbc009c Fix momentum bug in CyclicLR (#20401)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/19003

The author of this issue also asked that `cycle_momentum` default to `False` if the optimizer does not have a momentum parameter, but I'm not sure what the best way to do this would be. Silently changing the value based on the optimizer may confuse the user in some cases (say the user explicitly set `cycle_momentum=True` but doesn't know that the Adam optimizer doesn't use momentum).

Maybe printing a warning when switching this argument's value would suffice?
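For reference, this is how `cycle_momentum` interacts with a momentum-bearing optimizer (a usage sketch; the argument values are arbitrary):

```python
import torch

model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=0.001, max_lr=0.1,
    cycle_momentum=True, base_momentum=0.8, max_momentum=0.9)

for _ in range(10):
    opt.step()
    sched.step()

# momentum is cycled inversely to the learning rate, staying in range
assert 0.79 <= opt.param_groups[0]['momentum'] <= 0.91
```

With an optimizer that has no momentum parameter (e.g. Adam), `cycle_momentum=True` is the problematic case this issue discusses.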
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20401

Differential Revision: D15765463

Pulled By: ezyang

fbshipit-source-id: 88ddabd9e960c46f3471f37ea46013e6b4137eaf
2019-06-11 15:10:28 -07:00
cdbc20677c Add len to OrderedDict types (#21651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21651
ghimport-source-id: 0bba5c1930865e2d18b18782ba8c8990b0761d4d

Differential Revision: D15767795

Pulled By: bwasti

fbshipit-source-id: 70e27176897b0f977c9034ffb3ad21091c91e12e
2019-06-11 14:53:40 -07:00
7a040f4b0b Revert D15706021: [jit] Support for type annotations instead of torch.jit.annotate()
Differential Revision:
D15706021

Original commit changeset: 8bf1459f229d

fbshipit-source-id: 7ae34578560e2dccd0f04af2220445b3999771fe
2019-06-11 14:33:28 -07:00
b46e87cd3d Fix catch block to fix 'error: catching polymorphic type' (#21637)
Summary:
Fix catch block to fix 'error: catching polymorphic type `class c10::Error` by value [-Werror=catch-value=]'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21637

Differential Revision: D15761860

Pulled By: VitalyFedyunin

fbshipit-source-id: befc18a9c217440381cdb50a1319b0b5db5710e9
2019-06-11 12:30:52 -07:00
dd439bc39e Rename hubconf.conf to hubconf.py in docs (#21631)
Summary:
It's a typo I guess. cc fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21631

Differential Revision: D15764909

Pulled By: soumith

fbshipit-source-id: 5ffc7bde181c13e151332e7de3c0da36505b495e
2019-06-11 12:22:43 -07:00
bbcd6cc782 Support for type annotations instead of torch.jit.annotate() (#21390)
Summary:
This adds support for PEP 526 style annotations on assignments in place of
`torch.jit.annotate()`, so

```python
a = torch.jit.annotate(List[int], [])
```

turns into

```python
a : List[int] = []
```
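End to end, the annotated form scripts directly (assuming a build that includes this change):

```python
import torch
from typing import List

@torch.jit.script
def make_list() -> List[int]:
    a: List[int] = []    # previously: a = torch.jit.annotate(List[int], [])
    a.append(1)
    return a

assert make_list() == [1]
```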
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21390

Pulled By: driazati

Differential Revision: D15706021

fbshipit-source-id: 8bf1459f229d5fd0e16e59953b9656e85a2207fb
2019-06-11 12:03:57 -07:00
25d1496d58 Fix Process Group for tensors shared across processes (#21449)
Summary:
Ops on a Process Group (pg) instance will hit an error when input/output tensors are created on a different process, because the pg calls `recordStream` on `CUDACachingAllocator`, which only knows tensors created within the same process.

The proposed solution is to add a `suppressError` arg (suggestions for better names?) to `recordStream`. See comments in code for arguments.

CC pichuang1984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21449

Differential Revision: D15689736

Pulled By: mrshenli

fbshipit-source-id: e7fc81b167868f8666536067eaa7ae2c8584d88e
2019-06-11 11:50:25 -07:00
50ee1f3fa7 better error msg when seeing a unsupported builtin function (#21068)
Summary:
Fixes https://github.com/pytorch/lockdown/issues/39.
Hopefully it doesn't break other tests....
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21068

Differential Revision: D15761895

Pulled By: ailzhang

fbshipit-source-id: 60cbb16cfc930b377d753b81e10b7edaea9a1281
2019-06-11 11:32:44 -07:00
4610347fdf Breaks up NN module in docs so it loads faster.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21291

Differential Revision: D15760935

Pulled By: ezyang

fbshipit-source-id: 114da4c52b78949e631e9adcae4eb620546124fb
2019-06-11 09:38:41 -07:00
646a7f99bb Move management of calls of "cmake --build" to setup_helper/cmake.py and refactor as a CMake class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21493

Differential Revision: D15759279

Pulled By: ezyang

fbshipit-source-id: 157e1de36f1c5a51caf2a25b363a94369c442012
2019-06-11 07:04:05 -07:00
835a6b9da2 Fix namedtensor build (#21609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21609
ghimport-source-id: 648a0bcd28db2cdda1bf2fa6a904ca8f851088c2

Differential Revision: D15747687

Pulled By: zou3519

fbshipit-source-id: 2a972a15fa7399391617fc6e6b19879b86568c3a
2019-06-11 06:53:50 -07:00
29c849ff34 implement transpose operator for MKLDNN (#19955)
Summary:
implement transpose operator for MKLDNN
1. upgrade mkldnn-bridge to support ND transpose
2. implement transpose operator in caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19955

Differential Revision: D15701832

Pulled By: bddppq

fbshipit-source-id: e4337cd0ba6f8180a35c8c70cbb6830a0a84182f
2019-06-11 01:55:13 -07:00
731670f40a upgrade mkldnn-bridge (#20569)
Summary:
1. reduce the overhead of mkldnn-bridge itself
2. remove redundant code and useless APIs
3. provide new operators, including int8 inner_product, ND permute/transpose, elem_add/mul, etc.
4. improve inner_product to support io format weights without implicit reorder
5. add SoftMax support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20569

Reviewed By: houseroad

Differential Revision: D15558663

Pulled By: bddppq

fbshipit-source-id: 79a63aa139037924e9ffb1069f7e7f1d334efe3a
2019-06-11 00:47:11 -07:00
f2623c74a9 add PT pointwise unary ops to the benchmark suite (#21207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21207

This diff adds 80 PT pointwise unary ops to the benchmark suite. Most of the ops are added using the generate_pt_tests_from_list interface. The rest are handled separately.

Reviewed By: zheng-xq

Differential Revision: D15471597

fbshipit-source-id: 8ea36e292a38b1dc50f064a48c8cd07dbf78ae56
2019-06-10 21:35:44 -07:00
4e3c97a0be add separate path for op with JIT (#21210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21210

This diff introduces a new path to run op with JIT. There are two steps involved here:
1. Users need to script the op. This should happen in the `init` method.
2. The generated graph from step 1 is passed to `jit_forward`, which will be executed by the benchmark backend

Reviewed By: zheng-xq

Differential Revision: D15460831

fbshipit-source-id: 48441d9cd4be5d0acebab901f45544616e6ed2ee
2019-06-10 19:53:58 -07:00
a82feee07c Method-only entries in native functions should have self as first argument (#21549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21549
ghimport-source-id: a98fd7a18b4c523d9facb328a3b80a35416834ce

Differential Revision: D15724794

Pulled By: li-roy

fbshipit-source-id: a0f218cf6fd32d9694921685fc805d868156fce3
2019-06-10 19:32:34 -07:00
fff22125a5 AT_CHECK -> TORCH_CHECK (#21547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21547
ghimport-source-id: d99d3fdcd9abde4e1126716d32ed05aaf8508c50

Differential Revision: D15747676

Pulled By: bwasti

fbshipit-source-id: ae9824436e8316e2d0002d2973df4833a18c5f23
2019-06-10 16:58:09 -07:00
f5c24fc66d Deprecate torch::jit::RegisterOperators (#21552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21552

Original commit changeset: a142c22be3fd

https://github.com/pytorch/pytorch/pull/21368 got reverted because of a MSVC issue. This commit re-introduces that change and fixes the MSVC issue.

Differential Revision: D15727526

fbshipit-source-id: 8eb0eb9a7108dc049911b79342c364ac1b8623c8
2019-06-10 16:52:24 -07:00
cab3e726df Split out Function into its own file (#21539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21539
ghimport-source-id: f1e4396a0bec6e30d3179f926ec4da68807942f7

Differential Revision: D15741979

Pulled By: suo

fbshipit-source-id: 4cd0ed36bcbf8db0b36a101dda6f58975f806889
2019-06-10 16:37:58 -07:00
512c9d8c76 add PT gather op to the benchmark suite (#21614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21614

as title

Reviewed By: kimishpatel

Differential Revision: D15525115

fbshipit-source-id: 6a17e1d791bdb432cc3d51e45c5e82b96268127d
2019-06-10 16:31:52 -07:00
32a0440209 Publish torch::Dict and torch::OperatorKernel (#20723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20723

These classes already existed but only as c10::Dict and c10::OperatorKernel.
Since they're now part of torch::RegisterOperators(), they should also live in the torch namespace.

Differential Revision: D15421575

fbshipit-source-id: d64ebd8664fadc264bbbae7eca1faa182529a32b
2019-06-10 16:19:42 -07:00
a93a1ccbb3 Run test_c10d.py in multi-gpu environment (#21598)
Summary:
yf225 helped me discover that our CI does not run multi-gpu tests in `test_c10d.py`. There are quite a few multi-gpu c10d tests. This PR tries to enable those tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21598

Differential Revision: D15744256

Pulled By: mrshenli

fbshipit-source-id: 0a1524a862946128321f66fc8b7f331eff10e52a
2019-06-10 15:58:38 -07:00
74f6c55f0f support negative axis in concat and split operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17955

Differential Revision: D14476031

Pulled By: ezyang

fbshipit-source-id: e0e57e8595ed2005ded9e923572a40fe62aca5a7
2019-06-10 15:26:29 -07:00
3889855a5b Revert "Redefine scheduler to set learning rate using recursive formula" #14010 (#21463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21463
ghimport-source-id: 1b0ea4a282b41388d5c6f6a5d18d37c14ae874ad

Differential Revision: D15747426

Pulled By: ezyang

fbshipit-source-id: 0708394f907b98a9f45bcfa26e5cc450fda8cf76
2019-06-10 15:26:25 -07:00
8b9b215dc5 Add a 'dim' argument to nuclear norm (#21022)
Summary:
Addresses #18275.
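With the new argument, the nuclear norm can be taken over a pair of dimensions batch-wise (a sketch of the intended API):

```python
import torch

x = torch.randn(3, 4, 5)
# nuclear norm of each 4x5 slice, one value per batch element
n = torch.norm(x, p='nuc', dim=(1, 2))
assert n.shape == (3,)
assert (n >= 0).all()   # nuclear norms are non-negative
```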
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21022

Differential Revision: D15743515

Pulled By: ezyang

fbshipit-source-id: e4aaea0bd7f863a2abad45c4322d6a9fb02a88e3
2019-06-10 15:18:34 -07:00
2378c120e6 Implements divmod function (#20979)
Summary:
This PR refers to issue #18627.
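The builtin then works inside TorchScript, mirroring Python semantics (a usage sketch, assuming a build with this change):

```python
import torch

@torch.jit.script
def quot_rem(a: int, b: int):
    return divmod(a, b)   # quotient and remainder in one call

assert quot_rem(7, 3) == (2, 1)
```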
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20979

Differential Revision: D15743929

Pulled By: wanchaol

fbshipit-source-id: 967fc3fd519501e427176e10b112c8be1390540b
2019-06-10 15:00:56 -07:00
8a88d33103 Uninitialized Ivalue (#21387)
Summary:
Create an uninitialized IValue. This will be needed for breaks & continues to match up if-block outputs for values that are guaranteed not to be used but need to escape the block scope. It is not exposed to users.

Was previously part of final returns but I was asked to make a separate PR for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21387

Differential Revision: D15745124

Pulled By: eellison

fbshipit-source-id: ae6a6f766b4a70a71b9033987a630cfbf044e296
2019-06-10 14:51:24 -07:00
dd0ffd6864 Use schema string specification in derivatives.yaml. (#20916)
Summary:
For consistency, derivatives.yaml now uses the same schema specification as native_functions.yaml.

Note that there are some small downsides, e.g. changing the default values or return parameter names in native_functions.yaml now requires updating derivatives.yaml as well.  But this has a few nice properties:
1) Able to copy-paste definitions from native_functions to derivatives.
2) Makes it impossible to write derivatives for operators without schemas (e.g. old TH operators).
3) Moves us closer to the ideal situation of co-locating forward and backwards declarations.

Note that this doesn't change any generated code; in particular, this has the same behavior of mapping in-place and out-of-place definitions together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20916

Differential Revision: D15497800

Pulled By: gchanan

fbshipit-source-id: baee5caf56b675ce78dda4aaf6ce6a34575a6432
2019-06-10 13:47:55 -07:00
5f25a252d6 Allow tensors with requires_grad=True in c10 ops (#21599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21599

We prevented this because c10 ops can't have a backwards yet and calling them with requires_grad=True would do the wrong thing
if the c10 op is not purely implemented by calling other autograd-able ops.

However, it is a valid use case to have c10 ops that just call other autograd-aware ops, and these ops should be callable with requires_grad=True.

This should fix https://github.com/pytorch/pytorch/issues/21584.

Differential Revision: D15744692

fbshipit-source-id: ba665365c850ef63fc9c51498fd69afe49e5d7ec
2019-06-10 13:37:06 -07:00
5a48642fde Revert D15717575: [pytorch][PR] Fix bug in multinomial_alias_draw
Differential Revision:
D15717575

Original commit changeset: b1154e226d42

fbshipit-source-id: 305ca010bfda88c9295c52e0626d867452c72f84
2019-06-10 13:28:11 -07:00
4fb302eb34 fix optional type promotion for classes (#21593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21593
ghimport-source-id: f68730618bccf2326218e08d0a2a70171fdd8921

Differential Revision: D15741471

Pulled By: suo

fbshipit-source-id: 7ac1a0f6d9d2ff4bc819caff43a7a5b6d37cbc98
2019-06-10 12:51:00 -07:00
a436822c40 Consider contained types in alias analysis (#21431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21431
ghimport-source-id: d86ce974a065ec572e71cfa14a8f6bdf48216da7

Reviewed By: jamesr66a

Differential Revision: D15718560

Pulled By: suo

fbshipit-source-id: a36ce907ab26be22f12bab6175797fe8b34721f1
2019-06-10 12:42:10 -07:00
bb4aff2680 cleanups to memory_dag (#21430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21430
ghimport-source-id: 2dc5a0df8512e796c12d65d3ecc5981638122ce6

Reviewed By: jamesr66a

Differential Revision: D15718561

Pulled By: suo

fbshipit-source-id: 1ef31c08c8a757b632451eb07a47a8227e76c67f
2019-06-10 12:42:06 -07:00
ae144032aa cleanups to alias analysis interfaces (#21397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21397
ghimport-source-id: 8733e1af2fe66a3f4494a2c24c82a039375a982e

Reviewed By: jamesr66a

Differential Revision: D15642662

Pulled By: suo

fbshipit-source-id: ae66b7b4f19f255d6fe0e7e804bd0df6d86cb8d1
2019-06-10 12:42:02 -07:00
ddac8da813 avoid calling front() on empty working set (#21396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21396
ghimport-source-id: 7e57282099d2fd57c58c990b51ae933e427aecb2

Reviewed By: jamesr66a

Differential Revision: D15642663

Pulled By: suo

fbshipit-source-id: f9b467ba53f03438879bf3929da522aabaff2343
2019-06-10 12:41:58 -07:00
bb1dbdb99b Fix bug in multinomial_alias_draw (#21324)
Summary:
An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution

Changelog:
- Remove the incorrect increment / decrement operation

Fixes #21257, fixes #21508

cc: LeviViana neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324

Differential Revision: D15717575

Pulled By: ezyang

fbshipit-source-id: b1154e226d426c0d412d360c15f7c64aec95d101
2019-06-10 12:05:48 -07:00
30d6933016 BailOut Graphs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21381

Differential Revision: D15724412

Pulled By: Krovatkin

fbshipit-source-id: 18e4a1916c7cd1baea76953d0087d6257e58c55b
2019-06-10 11:49:38 -07:00
3df5a46a99 Skip triangular_solve CUDA test on non-default stream
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21590

Differential Revision: D15742549

Pulled By: ezyang

fbshipit-source-id: fd5b2cbce86e5f229c2ffba114ef362934296d07
2019-06-10 11:38:42 -07:00
6f99bcda8a fix test (#21594)
Summary:
Fixes a test that wasn't run on the CI, but is tested internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21594

Differential Revision: D15742157

Pulled By: eellison

fbshipit-source-id: 11fc82d1fc0281ffedd674ed96100e0c783c0599
2019-06-10 11:23:18 -07:00
91ea2cd5a7 clip sigmoid to prevent transforms return inf/nan values (#20288)
Summary:
This PR addresses some numerical issues of Sigmoid/StickBreakingTransform, where these transforms give +-inf when the unconstrained values move to +-20 areas.

For example, with
```
t = torch.distributions.SigmoidTransform()
x = torch.tensor(20.)
t.inv(t(x)), t.log_abs_det_jacobian(x, t(x))
```
current behaviour the inverse will return `inf` and logdet return `-inf` while this PR makes it to `15.9424` and `-15.9424`.

And for
```
t = torch.distributions.StickBreakingTransform()
x = torch.tensor([20., 20.])
t.inv(t(x)), t.log_abs_det_jacobian(x, t(x))
```
current value is `(inf, nan)` and `-inf` for logdet, while this PR makes it `[16.6355, 71.3942]` and `-47.8272` for logdet.

Although these finite values are wrong and that seems unavoidable, it is better than returning `inf` or `nan` in my opinion. This is useful in HMC: even though the gradient will be zero when the unconstrained parameter moves into an unstable area (due to clipping), the velocity variable will force the parameter into another area, which by chance can move it out of the unstable region. On the other hand, inf/nan can be useful for stopping inference early, so the changes in this PR might be inappropriate.

I also fix some small issues with the `_Simplex` and `_RealVector` constraints, where the batch shape of the input was not respected during validation.
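After the change, round-tripping an extreme value stays finite (the exact magnitudes depend on the clipping internals):

```python
import torch

t = torch.distributions.SigmoidTransform()
x = torch.tensor(20.)
y = t(x)   # saturates very close to 1.0 in float32

# with clipping, the inverse and the log-det remain finite
assert torch.isfinite(t.inv(y))
assert torch.isfinite(t.log_abs_det_jacobian(x, y))
```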
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20288

Differential Revision: D15742047

Pulled By: ezyang

fbshipit-source-id: b427ed1752c41327abb3957f98d4b289307a7d17
2019-06-10 11:16:31 -07:00
4bdbd30b96 Add python binding to deserialize blob (#21532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21532

Add python binding to deserialize blob

Reviewed By: yinghai

Differential Revision: D15706816

fbshipit-source-id: f498c7e0f7392f055b13810bbf81cba59f25e1d2
2019-06-10 10:49:21 -07:00
e4fae884f6 Change compiler to use Load/Stores, then transform to SSA (#21101)
Summary:
This changes our compiler so it first emits Loads & Stores, and then transforms the graph to SSA in a follow up pass. When a variable is set, we emit a prim::Store, and when a variable is referenced, we emit a prim::Load.
```
a = 1
print(a)
```
becomes:
```
%a.1 : int = prim::Constant[value=1]()
prim::Store[name="a"](%a.1)
%a : int = prim::Load[name="a"]()
prim::Print(%a)
```
In the follow up pass, convertToSSA, the values are turned into SSA form with the Loads & Stores removed. This change will enable breaks and continues because you can transform the graph with the variable naming information still intact.

There are still some remaining jitter and edge-case issues that I have to look through, but I think this is still ready for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21101

Differential Revision: D15723353

Pulled By: eellison

fbshipit-source-id: 3269934d4bc24ddaf3a87fdd20620b0f954d83d0
2019-06-10 10:26:43 -07:00
1e6c99a6e0 update hub doc (#21568)
Summary:
update doc as pointed out in https://github.com/pytorch/hub/pull/22
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21568

Differential Revision: D15732927

Pulled By: ailzhang

fbshipit-source-id: 78ab026539e5ee59e7c3a8144e2c9fcbbc225733
2019-06-10 09:39:35 -07:00
f308b07e8c Don't leak threads on exit (#21438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21438
ghimport-source-id: 33f145f5b3508163365442c22a223c4a44e677d8

Differential Revision: D15738856

fbshipit-source-id: 656e8d0e3d0d22f116e3ab66bf0282608d6f1a76
2019-06-10 09:14:13 -07:00
c294d64eff fix concat and split tensor inference function (#21382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21382

The concat tensor inference function did not correctly handle the case where the axis argument points to the last dimension, so input tensors don't need to have the same number of dimensions.
The split tensor inference function did not correctly handle the case where split information is provided as a second input tensor rather than as an argument.

Reviewed By: mdschatz

Differential Revision: D15633148

fbshipit-source-id: d566af44dc882457ee9efe83d2461b28408c2c5d
2019-06-10 08:23:53 -07:00
9deab0cf0e Documentation for locking discipline in engine.cpp/.h (#21548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21548

Added documentation as titled.

Reviewed By: ezyang

Differential Revision: D15723146

fbshipit-source-id: fab4a35c62f07256673318c0874701f7628b2f7a
2019-06-10 07:50:01 -07:00
547fcaa977 Add named_guard to native_functions options (#21373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21373
ghimport-source-id: acab6d3ab0b287d504afa98eaefa2aed6fe99453

Differential Revision: D15717925

Pulled By: zou3519

fbshipit-source-id: 8515c448b368be79f71681833b5edf960da44fe8
2019-06-10 07:29:41 -07:00
8ffcbfb7d4 Add unique_ptr<NamedTensorMeta> field to TensorImpl (#21341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21341
ghimport-source-id: 06021b06864746571a904a1cfc0aaea5f8a12325

Differential Revision: D15717907

Pulled By: zou3519

fbshipit-source-id: 48ee76cf2f11a8b092be75ecac8d5faee68ca0d9
2019-06-10 07:29:36 -07:00
f9c4d0d7a9 Fix NVTX path on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21580

Differential Revision: D15738060

Pulled By: ezyang

fbshipit-source-id: 05a2e97279816753d574678252bf9b35913c99b1
2019-06-10 06:05:44 -07:00
c4e0d61646 Regularization is not supported in FP16 (#21319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21319

Add an assertion to raise an exception when regularization is applied in FP16.

Reviewed By: bddppq

Differential Revision: D15528486

fbshipit-source-id: c887c90d1d9ccfdaded3b5fa16816c6f29910e2e
2019-06-09 23:59:48 -07:00
b1bf16eeab Enabled _th_ixor_ and _th_equal for bool (#21538)
Summary:
Following up on the feedback in this [PR](https://github.com/pytorch/pytorch/pull/21113/files?file-filters%5B%5D=.cwrap&owned-by%5B%5D=)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21538

Differential Revision: D15721390

Pulled By: izdeby

fbshipit-source-id: 1b5265bf8726c1051f306f7674d731e25a6c7d03
2019-06-09 15:28:38 -07:00
e447a733a1 Update module.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21570

Differential Revision: D15732665

Pulled By: ezyang

fbshipit-source-id: caa12a8619ad1396540f787b5c849d29cc5b03bd
2019-06-09 15:28:35 -07:00
04e6564f0c clean up the TracingState API (#21564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21564
ghimport-source-id: b6f71e2238f6f7c8de6cfbf6969a5e08e07be46c

Reviewed By: suo

Differential Revision: D15729497

Pulled By: zdevito

fbshipit-source-id: aacfea6058fadb572df692aa9ebd6cab0bcd03fc
2019-06-09 15:28:32 -07:00
69aa2b2814 Collapse tracing_state.h into tracer.h (#21563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21563
ghimport-source-id: de87e5e621da33326a9d2cb8a57d82d355166479

Reviewed By: suo

Differential Revision: D15729499

Pulled By: zdevito

fbshipit-source-id: 17b3e2e71d004f08c4413e80091388ae9ac2df2b
2019-06-09 15:28:29 -07:00
ea822d9626 Interpreter support for CallFunction/CallMethod (#21562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21562
ghimport-source-id: 17e5e183f730f50d97ef48973aafc6249d54978f

Reviewed By: suo

Differential Revision: D15729500

Pulled By: zdevito

fbshipit-source-id: efa8a133b617b1498810392a8da6b513ce00b5eb
2019-06-09 15:28:26 -07:00
ad0c08f950 Expose ExecutionPlan in prep for function calls (#21561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21561
ghimport-source-id: 4bf28d8140610a0cefef0c0a17f0a513ae855dde

Reviewed By: suo

Differential Revision: D15729498

Pulled By: zdevito

fbshipit-source-id: b26458336da1efaba71d8a577c3917c6622dae0d
2019-06-09 15:28:22 -07:00
de31f6719c Add flag to temporarily enable first class modules (#21560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21560
ghimport-source-id: a555ca33fcd3efd1147aaf90f26a8e63da1c1a67

Reviewed By: suo

Differential Revision: D15729502

Pulled By: zdevito

fbshipit-source-id: d6c11472bfc791e2ad1e9aa695b0439d72b79681
2019-06-09 15:28:19 -07:00
18996a8952 unfinished push/pop reduction (#21559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21559
ghimport-source-id: 81ba4a5638577781e1ea706599966c033c37e814

Reviewed By: suo

Differential Revision: D15729501

Pulled By: zdevito

fbshipit-source-id: 3423bff61e89617c40078d5fab726b77d21bfa27
2019-06-09 15:28:16 -07:00
13edda417d Prepare interpreter for function calling (#21558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21558
ghimport-source-id: a8a19dbefea869ca1401e5afea6c02f31f95b99a

Reviewed By: suo

Differential Revision: D15729491

Pulled By: zdevito

fbshipit-source-id: 9629664608a2379a2ddcafaf741fa8463c4fb917
2019-06-09 15:28:13 -07:00
8ae7b1c486 Update functional.py doc (#21510)
Summary:
- Fixes a typo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21510

Differential Revision: D15731277

Pulled By: ezyang

fbshipit-source-id: c3f8e110f5c61e797b857477b495168ea8d63cd5
2019-06-09 15:28:09 -07:00
74828be4a7 fix segfault in cat on CPU with tensors that can't be indexed with 32-bit ints. (#21530)
Summary:
Should be self-explanatory. This `int` variable is overflowing.
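The wraparound itself can be reproduced from Python with a stdlib sketch (this only emulates the C `int` arithmetic, it is not the kernel code):

```python
import ctypes

# Emulate a C 32-bit signed `int` holding an element index. Once the element
# count passes INT_MAX, the index wraps negative, which in C/C++ turns into an
# out-of-bounds access and a segfault.
idx = ctypes.c_int32(2**31 - 1)   # INT_MAX
idx.value += 1                    # ctypes wraps silently, like the hardware does
print(idx.value)                  # -2147483648
```

Using a 64-bit index type (`int64_t`) for element counts avoids the wrap entirely.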

Reported in #21526
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21530

Differential Revision: D15719275

Pulled By: umanwizard

fbshipit-source-id: 24e917a00a5b78bc3af29ef3b8b72eea7e89d5d5
2019-06-09 15:28:06 -07:00
406374657a Optimize batch mm op when broadcast the second input (#21556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21556

Optimize the batch mm op when broadcasting the second input

Reviewed By: houseroad

Differential Revision: D15728914

fbshipit-source-id: c60441d69d4997dd32a3566780496c7ccda5e67a
2019-06-09 15:28:03 -07:00
d71501259b Revert D15572818: Prepare interpreter for function calling
Differential Revision:
D15572818

Original commit changeset: 3a9b5f053664

fbshipit-source-id: b932411e8e88c7414c8db332d6049fe4e26bd83e
2019-06-07 22:20:54 -07:00
d4bcab0dba Revert D15590900: Reduce number of stack manipulation instructions in interpreter.
Differential Revision:
D15590900

Original commit changeset: 98829979feba

fbshipit-source-id: eb7f1d396bb2b98d2852af81c69db81430eba33c
2019-06-07 22:20:50 -07:00
03641413e5 Revert D15600068: Add flag to temporarily enable first class modules
Differential Revision:
D15600068

Original commit changeset: 9b68e23d7f8b

fbshipit-source-id: 45f36b3aaa4f1c457c27490579496456cbbc680b
2019-06-07 22:20:47 -07:00
e616a5e8b8 Revert D15600067: Expose ExecutionPlan in prep for function calls
Differential Revision:
D15600067

Original commit changeset: 82b7de458dd6

fbshipit-source-id: ca26a362cd73bdb9e8c4eba15dd5c10986fa79fe
2019-06-07 22:20:44 -07:00
bfb235b8c9 Revert D15618275: Interpreter support for CallFunction/CallMethod
Differential Revision:
D15618275

Original commit changeset: 038ae27e5416

fbshipit-source-id: 8dbe0f564ba103fe445dacc471085c659171705f
2019-06-07 22:20:40 -07:00
c27cabe2d7 Revert D15719982: Collapse tracing_state.h into tracer.h
Differential Revision:
D15719982

Original commit changeset: 56bb021dd949

fbshipit-source-id: 2eb3e2c9745c35a84ebcc0fc7ac62b5f1fdd6437
2019-06-07 22:20:37 -07:00
3cfe914191 Revert D15719980: clean up the TracingState API
Differential Revision:
D15719980

Original commit changeset: 3de2746c3f3c

fbshipit-source-id: 4610e215936b2476a0271355ef3b8f1f480bdea8
2019-06-07 22:20:34 -07:00
dd0faf4366 clean up the TracingState API (#21514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21514
ghimport-source-id: 6a9b6fdd7e696ea29e8715482708efe897230e4d

Reviewed By: jamesr66a

Differential Revision: D15719980

Pulled By: zdevito

fbshipit-source-id: 3de2746c3f3c3de4111b4cb73f4c4acedbf28862
2019-06-07 20:57:05 -07:00
8c5f3acfc0 Collapse tracing_state.h into tracer.h (#21513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21513
ghimport-source-id: 86278929818a8fc65684bd8f2ffac31460772fe9

Reviewed By: jamesr66a

Differential Revision: D15719982

Pulled By: zdevito

fbshipit-source-id: 56bb021dd949668562ea481c5ff0115a9ea2b02e
2019-06-07 20:57:01 -07:00
5f6afafdef Interpreter support for CallFunction/CallMethod (#21325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21325
ghimport-source-id: eeca1176f5e00c85a69cd016acccf5105e670e02

Reviewed By: jamesr66a

Differential Revision: D15618275

Pulled By: zdevito

fbshipit-source-id: 038ae27e5416f1ce338009627c839a4d61a00658
2019-06-07 20:56:58 -07:00
1517ff66a1 Expose ExecutionPlan in prep for function calls (#21273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21273
ghimport-source-id: b92c1e07fbe4122467a21b98d29635295093e0c2

Reviewed By: jamesr66a

Differential Revision: D15600067

Pulled By: zdevito

fbshipit-source-id: 82b7de458dd65c175f55b0f383bfc3fcf4704032
2019-06-07 20:56:55 -07:00
7e08bc42d5 Add flag to temporarily enable first class modules (#21272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21272
ghimport-source-id: 43e73d1b93ccbe0dd6845eb3f7444c9d0abd444b

Reviewed By: jamesr66a

Differential Revision: D15600068

Pulled By: zdevito

fbshipit-source-id: 9b68e23d7f8b6046a5a0d6d9fd16138ac384b863
2019-06-07 20:56:52 -07:00
dde27958dd Reduce number of stack manipulation instructions in interpreter. (#21240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21240
ghimport-source-id: 5e9cbe8b3df3ac721135d2f652a420ae0b14ac55

Reviewed By: jamesr66a

Differential Revision: D15590900

Pulled By: zdevito

fbshipit-source-id: 98829979feba23685f0ba98ba3cb840157f7259a
2019-06-07 20:56:49 -07:00
c53e4d012d Prepare interpreter for function calling (#21185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21185
ghimport-source-id: 6b9cb92d1f1f59bb980dcfa0d29dfe985ee955d1

Reviewed By: jamesr66a

Differential Revision: D15572818

Pulled By: zdevito

fbshipit-source-id: 3a9b5f053664c09212b97f1391d8d006337b5550
2019-06-07 20:56:46 -07:00
c36dc35853 Revert D15576968: Turn on Werror for deprecated-declarations.
Differential Revision:
D15576968

Original commit changeset: fb73a8986a5b

fbshipit-source-id: 1ae19afc6816f764b895a47162728433a319ac0b
2019-06-07 19:15:56 -07:00
b849f101b1 Fix slow unpickling (#21542)
Summary:
This was looking at the number of elements in the memo table, not the total capacity, and was thus calling reserve() a lot more than it should have
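A toy model of the two reserve policies (illustrative names only, not the actual unpickler code) shows why checking capacity rather than element count matters:

```python
def count_reserves(n_inserts, check_capacity):
    """Count reserve() calls over n_inserts insertions.

    check_capacity=True models the fix: grow (doubling) only when the element
    count reaches the reserved capacity, giving O(log n) reserve calls.
    check_capacity=False models the bug: the threshold tracks the element
    count itself, so reserve() fires on every single insert.
    """
    size, capacity, calls = 0, 0, 0
    for _ in range(n_inserts):
        threshold = capacity if check_capacity else size
        if size >= threshold:
            capacity = max(2 * capacity, 1)
            calls += 1
        size += 1
    return calls

print(count_reserves(1000, check_capacity=True))   # 11  -- logarithmic growth
print(count_reserves(1000, check_capacity=False))  # 1000 -- one reserve per insert
```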
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21542

Reviewed By: driazati

Differential Revision: D15723132

Pulled By: jamesr66a

fbshipit-source-id: 20e1f9099b6a51a33994ea9dbc3f22eb3bc0c8f9
2019-06-07 17:28:55 -07:00
66d596645a Turn on Werror for deprecated-declarations. (#21195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21195

The motivation is that, while we shouldn't break USER code for using
deprecated declarations, we should keep our internal code base
deprecation clean.

Differential Revision: D15576968

fbshipit-source-id: fb73a8986a5b60bf49ee18260653100319bb1030
2019-06-07 17:24:17 -07:00
a5cf6d5100 reorganize op bench directory (#21543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21543

No code change in this diff.

Reviewed By: hl475

Differential Revision: D15721419

fbshipit-source-id: 06212cc882f5297064153417dc4d80bce9ec2667
2019-06-07 16:06:51 -07:00
5b4a188a95 add support for steps(strides) in tensor slices
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20929

Differential Revision: D15632636

Pulled By: Krovatkin

fbshipit-source-id: 0e127bbd7b339784c4be2e0a57f28024727d5ad3
2019-06-07 15:55:26 -07:00
5744fb3007 Add mkldnn softmax operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21516

Differential Revision: D15712759

Pulled By: bddppq

fbshipit-source-id: bf515135263156bea1a2b3e53a47edf697b8b1e2
2019-06-07 15:22:18 -07:00
a947d98282 Set "scalar_check: false" for some LAPACK functions that can't return scalars. (#21498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21498
ghimport-source-id: 33ce2f3f083616f633561e4871585f439e2647c0

Differential Revision: D15715477

Pulled By: gchanan

fbshipit-source-id: 7772573ba74cdf7a5f2d86d2e581652ebd85e1c6
2019-06-07 15:00:54 -07:00
fe5ceea580 Rename caffe2<->c10 operator wrappers (#21322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21322

Naming is everything.

- Rename c10_operator.h -> export_caffe2_op_to_c10.h
- Rename operator_c10wrapper.h -> export_c10_op_to_caffe2.h
- Rename corresponding macros

This hugely improves readability and explains what these things are doing.

Reviewed By: dzhulgakov

Differential Revision: D15616816

fbshipit-source-id: d976aefcb43a0f55d85c3424fdd9aca7e71c3603
2019-06-07 13:48:10 -07:00
dad85b7e69 clang-format by line (#21531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21531
ghimport-source-id: 711867e19cc3948a5e2a6aa8c4f2cd631abb04d2

Reviewed By: zdevito

Differential Revision: D15719260

Pulled By: suo

fbshipit-source-id: e88c5d3e14e6ecc956ce30ab0246ed606f4b0a38
2019-06-07 13:42:44 -07:00
fa0c5c31d4 Turn namedtensor build back on (#21520)
Summary:
namedtensor build + test should run on PRs only if the commit message
includes [namedtensor ci].
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21520

Differential Revision: D15718404

Pulled By: zou3519

fbshipit-source-id: ce8b5df2682e795e64958a9d49e2e3c091599b33
2019-06-07 13:37:48 -07:00
2b902e9738 Fix the offset numerical bug when casting (#21484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21484

cast<int32_t*> => cast<int32_t>

Also fixed reserve problem which might cause incorrect pointer.

Reviewed By: yinghai

Differential Revision: D15699866

fbshipit-source-id: 374418476bddd60f5c5306c8c57319ccf28b9990
2019-06-07 12:33:18 -07:00
ac8c3fa7b6 Revert D15717337: [pytorch][PR] [precommit hook] clang-format by line
Differential Revision:
D15717337

Original commit changeset: 57e65a679a8f

fbshipit-source-id: f73794087a23d56d03497b29d9a9e4e7d54deaad
2019-06-07 11:50:42 -07:00
a77802cf56 clang-format by line (#15657)
Summary:
This should further reduce noise by only clang-formatting the lines you actually touched in the precommit hook.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15657

Differential Revision: D15717337

Pulled By: suo

fbshipit-source-id: 57e65a679a8fdee5c3ff28e241c74ced9398eb0c
2019-06-07 11:46:36 -07:00
b7f5d1e4c6 Fix size of histc return on CPU when input is 0-dimensional and bins=1. (#21497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21497
ghimport-source-id: bc03f27408aa772f78d5351afe404b5e91a7c4ce

Differential Revision: D15715478

Pulled By: gchanan

fbshipit-source-id: 90e1b65249b4b12f936ee8877cc0bc5a972d9ceb
2019-06-07 11:23:46 -07:00
881adb5bcd fix tuple indexing bug (#21521)
Summary:
lower tuples pass didn't check bounds for tuple index
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21521

Differential Revision: D15716813

Pulled By: eellison

fbshipit-source-id: 8eead98c2c63118e7d24a8c8bf6184b02afb7dcd
2019-06-07 11:17:59 -07:00
a5cca4d342 add failback for Sign operator (#21343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21343

Needed to binarise features

Reviewed By: yinghai

Differential Revision: D15625653

fbshipit-source-id: 52f48259a040dac35a7000bb1eea9feb5c7ef1ab
2019-06-07 10:56:22 -07:00
51fb42ebcf Updating submodules
Reviewed By: yns88

fbshipit-source-id: 5778cdb5173fc16e5d5474fefa2ea89264101184
2019-06-07 10:43:12 -07:00
54413cf91e replace LegacyTracedModule with torchscript used in add_graph (#21339)
Summary:
The new implementation of tracing supports more modules, so much of the error-handling code can be removed by replacing the old one (LegacyTracedModule).

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21339

Reviewed By: natalialunova

Differential Revision: D15695154

Pulled By: orionr

fbshipit-source-id: af7d35754e9f34bd1a0ad7b72a9ebe276ff8ab98
2019-06-07 10:43:08 -07:00
b144ba66d5 Change PyTorch tests to use non-default CUDA stream (#21474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21474
ghimport-source-id: b2477765362248a80557d1a20db02a1290bdcde3

Differential Revision: D15699700

Pulled By: fbhuba

fbshipit-source-id: 1aa4309fec0982c8477cfab29ca5f42d2b171f97
2019-06-07 10:24:48 -07:00
5cc3a3e2bf Set "scalar_check: false" for TH methods that can't return scalars. (#21496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21496
ghimport-source-id: d7197bccfe9e4d807f38a66e02ca6f0bf32bdc2b

Differential Revision: D15715479

Pulled By: gchanan

fbshipit-source-id: fa59eb808d26119b33eb97bb90ef70e95e58458d
2019-06-07 10:19:23 -07:00
cc85c3dbbc ONNX Export Slice and Flip ops for opset 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20533

Reviewed By: zrphercule

Differential Revision: D15579713

Pulled By: houseroad

fbshipit-source-id: 91f3ac0cb14ef226f980362b0013b6b92cb8b8da
2019-06-07 10:03:26 -07:00
3eced796cd Make torchvision install chatty. (#21476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21476
ghimport-source-id: adfd08b818f31ebbdf3da89d6bb95d33e14a9403

Differential Revision: D15715270

Pulled By: ezyang

fbshipit-source-id: dde02579d9960ac960306d0a024b8e17846ae0ff
2019-06-07 08:41:13 -07:00
1503c734ce updating gemmlowp tp 3fb5c
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21488

Differential Revision: D15715264

Pulled By: ezyang

fbshipit-source-id: 86978f294720e0ce6f60b748a71f0604d6cfa00c
2019-06-07 08:32:49 -07:00
8c9a88bdab Make test_cuda.py work on Python 2. (#21466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21466
ghimport-source-id: 0a235c8b8cf994621a5a5afe022340dd35764c91

Differential Revision: D15698096

Pulled By: ezyang

fbshipit-source-id: 1759c2681071e9c7e83de3de86daf4333c5f8f3a
2019-06-07 08:13:03 -07:00
c60465873c Fix batch norm multiplier init (#13774)
Summary:
Fixes #12259, needs to make sure tests (see #13766) don't break due to numerical precision issues. Not sure what would need to be adjusted here...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13774

Differential Revision: D15715021

Pulled By: ezyang

fbshipit-source-id: 20ce2beee1b39ebe9f023c5f2b25be53acccb5f3
2019-06-07 07:50:39 -07:00
c604847602 Implement at::match(Dimname, Dimname) and at::unify(Dimname, Dimname) (#21281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21281
ghimport-source-id: 4b241d54a60c188b8566065c90b227b40914a5ca

Differential Revision: D15699063

Pulled By: zou3519

fbshipit-source-id: c0f00c370d266a4ea5211aae943041fd899e960a
2019-06-07 06:30:45 -07:00
4727685ea1 Added at::Dimname (#21280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21280
ghimport-source-id: 921848326e4828ffd422868be26c409c6490e1ab

Differential Revision: D15698516

Pulled By: zou3519

fbshipit-source-id: 502b9b019d51dd46327e6caf2af69aa520c70cb6
2019-06-07 06:30:42 -07:00
e27c2f1437 Revert "Revert D15632268: [pytorch][PR] Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen" (#21427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21427
ghimport-source-id: 930c2fb29320f70e78f94e7eaaffe8e2ab62e7f3

Differential Revision: D15698423

Pulled By: ezyang

fbshipit-source-id: 891c94c24b6d377cd6dd94d86cc66465b582359f
2019-06-07 05:52:27 -07:00
d6af6588c2 Super resolution export to Caffe2 is broken, skip it. (#21479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21479
ghimport-source-id: 60fa97fb2dfb37a758c0e8b9c2bc0fb2819fd2f7

Differential Revision: D15713609

Pulled By: ezyang

fbshipit-source-id: a3d9c49e2db985f4373508cd44e94d43ae6e24da
2019-06-07 05:46:26 -07:00
78a376592d add cancelAsyncCallback method to OperatorBase (#21492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21492

If one async operator fails, the async_scheduling net currently only marks all scheduled async operators as finished without cancelling the callbacks.

The new behavior is to cancel the callbacks first, then set event status to finished.

Reviewed By: ilia-cher

Differential Revision: D15702475

fbshipit-source-id: 55a1774d768b2e238bab859b83332f1877a001ca
2019-06-06 20:57:12 -07:00
696b2c89b4 Adding gradient to Boolean Mask operator (#21423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21423

- add gradient for boolean mask
- add test for gradient checking

Reviewed By: BIT-silence

Differential Revision: D15640036

fbshipit-source-id: 79f40c6901e805bf1b8e9b01b57903e30b00f654
2019-06-06 20:48:47 -07:00
d3d195e0b1 Updating submodules
Reviewed By: yns88

fbshipit-source-id: af5812e3d071e66f9d0272c36bf639eb04bde7e4
2019-06-06 20:42:34 -07:00
772fd79d40 Defer constructing error strings for definitions under If's until they're needed. (#21429)
Summary:
This saves ~7% DenseNet load time (4.3s -> 4.0s) on my laptop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21429

Differential Revision: D15681374

fbshipit-source-id: 9925a6154d51f2d592e26cb5ff8bf7ab3ee2519b
2019-06-06 20:32:57 -07:00
abc0d3e544 Fix unused variable warning
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21444

Differential Revision: D15701786

Pulled By: ezyang

fbshipit-source-id: 8348e08f9b8f3047b30736f9a944786ab84e6b68
2019-06-06 19:37:54 -07:00
bad67015fe Add warning for Turing GPUs and CUDA <= 9000 (#21468)
Summary:
Turing GPUs (compute capability 7.5) require CUDA10 to work properly.
We've seen some issues for these GPUs using PyTorch binaries with CUDA9 or older:
[Discussion Board #1](https://discuss.pytorch.org/t/cudnn-status-execution-failed-error/38575)
[Discussion Board #2](https://discuss.pytorch.org/t/cublas-runtime-error-on-gpu-running-but-works-on-cpu/46545/6)

Tested on using CUDA9 with an RTX 2080Ti.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21468

Differential Revision: D15696170

Pulled By: ezyang

fbshipit-source-id: ed43f4e4948d3f97ec8e7d7952110cbbfeafef2a
2019-06-06 19:33:02 -07:00
63d4bbb0ec Turn XLA back on for default set (but filtered out using should_run_job) (#21470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21470
ghimport-source-id: 69800c1ce1187591b7bcdb8a63973b4fd8d0e326

Differential Revision: D15696930

Pulled By: ezyang

fbshipit-source-id: fafbcba38d9572a23ee9c1d81cdcce3a154ae4c6
2019-06-06 19:18:45 -07:00
f433913996 add more info back to BenchResult (#21502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21502

In BenchResult, we keep name, avg_fwd, std_fwd, avg_bwd, and std_bwd. There is no per-iteration information. In this diff, I am adding more info to BenchResult to include the number reported from each iteration.

Reviewed By: wanchaol

Differential Revision: D15706306

fbshipit-source-id: 3f14be4ba91f1f6da473995783bd7af1d067938d
2019-06-06 18:43:51 -07:00
d51bd2191c Revert D15629687: Deprecate torch::jit::RegisterOperators
Differential Revision:
D15629687

Original commit changeset: 2f87f18be655

fbshipit-source-id: a142c22be3fdf14a2b3c29b8766b218fb0883927
2019-06-06 18:09:01 -07:00
37ab35c8fc Move jit testing utils to their own file (#21491)
Summary:
This moves `JitTestCase` to its own file so that we can have other jit
test files (ex. `test_jit_py3.py`)

There aren't any code changes, just a move and cleaning up the imports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21491

Pulled By: driazati

Differential Revision: D15703060

fbshipit-source-id: 6082e8b482100bb7b0cd9ae69738f1273e626171
2019-06-06 15:52:45 -07:00
e87f77def6 Fix typo (#21426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21426

-

Differential Revision: D15679789

fbshipit-source-id: 5fd448e66af159fd79883aa874065424ec9694ad
2019-06-06 15:44:16 -07:00
d714abf597 Deprecate torch::jit::RegisterOperators (#21368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21368

-

Differential Revision: D15629687

fbshipit-source-id: 2f87f18be65552f3eb3f4c945d7f19ba4bae0eb8
2019-06-06 15:44:12 -07:00
cb2ec07fa2 ReshapeOp supports empty tensor (#21230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21230

tsia; this diff adds empty-tensor support to the reshape operator

Reviewed By: jerryzh168

Differential Revision: D15583356

fbshipit-source-id: 6d44c04e95ca3546509bfb12102e29c878f9a7c7
2019-06-06 15:02:11 -07:00
b161832f10 support ceil mode by padding changes (#21310)
Summary:
Modify MKLDNN pooling operation to support ceil mode by adjusting the right/bottom padding accordingly. This is done similarly as in Caffe (see discussion https://github.com/pytorch/pytorch/pull/19205#discussion_r276903751).

To make this possible, I split the padding into left and right (top/bottom) components. This naming is confusing but actually follows mkldnn's own naming for pooling::compute(). We increase the right padding so that the output size matches the one expected in ceil mode.
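The adjustment can be sketched as follows — a simplified model of the pooling output-size formulas, not the MKLDNN integration itself:

```python
import math

def pool_out_size(in_size, kernel, stride, pad_l, pad_r, ceil_mode):
    """Standard pooling output-size formula, in floor or ceil mode."""
    rounder = math.ceil if ceil_mode else math.floor
    return rounder((in_size + pad_l + pad_r - kernel) / stride) + 1

def adjust_right_padding(in_size, kernel, stride, pad_l, pad_r):
    """Grow pad_r until floor-mode pooling matches the ceil-mode output size."""
    target = pool_out_size(in_size, kernel, stride, pad_l, pad_r, ceil_mode=True)
    while pool_out_size(in_size, kernel, stride, pad_l, pad_r, ceil_mode=False) < target:
        pad_r += 1
    return pad_r

# in_size=8, kernel=3, stride=2: ceil mode wants 4 outputs but floor mode gives
# only 3, so one extra column of right padding is added.
print(adjust_right_padding(8, 3, 2, 0, 0))  # 1
```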

Strengthened the test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21310

Reviewed By: bddppq

Differential Revision: D15611664

Pulled By: akyrola

fbshipit-source-id: 46b40015dafef69a8fd5e7b2c261d8dbf448cd20
2019-06-06 14:47:35 -07:00
80a083ef92 Remove unneeded headers (#21393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21393

Result of splitting the base diff. We moved a header from src/* to include/fbgemm/*

Reviewed By: jianyuh

Differential Revision: D15635188

fbshipit-source-id: ad7d0ddba964ff1cb8b2e33f5f98e457a4d2eac9
2019-06-06 14:23:54 -07:00
8a9ea55b25 Add autograd for to_sparse. (#20458)
Summary:
https://github.com/pytorch/pytorch/issues/18111
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20458

Differential Revision: D15699732

Pulled By: nairbv

fbshipit-source-id: f7a5424c1f1d3b0e4eba0d503d75ae8a18ef7ff4
2019-06-06 14:23:51 -07:00
87d10d49f4 Bilinear Upsampling increased throughput (#19306)
Summary:
changed `UpsampleBilinearKernel` s.t. the throughput increased 40~50%.

I tested locally with my local test code -- **not pytorch's provided test code** -- because I am having a build problem (which I made an issue about [here](https://github.com/pytorch/pytorch/issues/19184)). I tested with various tensor sizes and across all the sizes, it showed a significant increase in throughput.

1. added `__restrict__`
2. instead of launching as many threads as there are output elements, I launched only `output_height * output_width` threads and had each thread iterate through the channel and batch dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19306

Differential Revision: D15701840

Pulled By: ezyang

fbshipit-source-id: 53c54d4f4e4a28b58ecc7d7ae6b864cbfc760e27
2019-06-06 13:58:57 -07:00
c5d5d45f40 Fix numerically instability of SigmoidTransform (#19802)
Summary:
fix #18254, the numerical instability of `SigmoidTransform`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19802

Differential Revision: D15701837

Pulled By: ezyang

fbshipit-source-id: fe6c755c523487c8bbdcc3bfb8455801617c70a4
2019-06-06 13:58:53 -07:00
f8cab38578 Address precision matrix instability of MVN distribution (#21366)
Summary:
Currently, when the input of MVN is a precision matrix, we take its inverse to convert the result to a covariance matrix. This, however, can easily make the covariance matrix non-positive-definite, and hence trigger a Cholesky error.

For example,
```
import torch
torch.manual_seed(0)
x = torch.randn(10)
P = torch.exp(-(x - x.unsqueeze(-1)) ** 2)
torch.distributions.MultivariateNormal(loc=torch.ones(10), precision_matrix=P)
```
will trigger `RuntimeError: cholesky_cpu: U(8,8) is zero, singular U.`

This PR uses some math tricks ([ref](https://nbviewer.jupyter.org/gist/fehiepsi/5ef8e09e61604f10607380467eb82006#Precision-to-scale_tril)) to only take the inverse of a triangular matrix, hence increasing stability.
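A minimal NumPy sketch of that trick (assuming the `precision_to_scale_tril` name from the linked notebook; the real code operates on batched torch tensors):

```python
import numpy as np

def precision_to_scale_tril(P):
    """Lower-triangular L with L @ L.T == inv(P), computed without ever
    inverting the full precision matrix P."""
    Lf = np.linalg.cholesky(P[::-1, ::-1])  # Cholesky of the row/column-flipped P
    L_inv = Lf[::-1, ::-1].T                # lower triangular; L_inv.T @ L_inv == P
    return np.linalg.inv(L_inv)             # triangular inverse (a solve in real code)

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
P = A @ A.T + 10 * np.eye(10)               # a positive-definite precision matrix
L = precision_to_scale_tril(P)
assert np.allclose(np.tril(L), L)           # result is lower triangular
assert np.allclose(L @ L.T, np.linalg.inv(P))
```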

cc fritzo, neerajprad , SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21366

Differential Revision: D15696972

Pulled By: ezyang

fbshipit-source-id: cec13f7dfdbd06dee94b8bed8ff0b3e720c7a188
2019-06-06 13:54:46 -07:00
8ece538a79 Addresses bad behavior with overridden optimizer.step by #20124 (#21460)
Summary:
This PR addresses the problem described in the comment: https://github.com/pytorch/pytorch/pull/20203#issuecomment-499231276
and the previously coded bad behaviour:
- a warning was raised every time lr scheduling was initialized

Now the code checks that:
- on the second call of `lr_scheduler.step`, ensure that `optimizer.step` has been already called, otherwise raise a warning (as it was done in #20203 )
- if the optimizer's step is overridden -> raise another warning once to make the user aware of the new pattern:
`opt.step()` -> `lrs.step()`, as we cannot check this.

Now tests check that
- at initialization (`lrs = StepLR(...)`) there are no warnings
- if we replace `optimizer.step` by something else (similarly to the [code of nvidia/apex](https://github.com/NVIDIA/apex/blob/master/apex/amp/_process_optimizer.py#L287)) there is another warning raised.
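A stripped-down sketch of the ordering check (illustrative `Fake*` classes; the actual patch wraps `optimizer.step` with a counter):

```python
import warnings

class FakeOptimizer:
    def __init__(self):
        self._step_count = 0
    def step(self):
        self._step_count += 1

class FakeScheduler:
    def __init__(self, optimizer):
        self.optimizer = optimizer
        self._step_count = 0
    def step(self):
        self._step_count += 1
        # Warn once, on the scheduler's second step: by then the optimizer
        # must have stepped at least once, or the call order is reversed.
        if self._step_count == 2 and self.optimizer._step_count < 1:
            warnings.warn("Detected call of lr_scheduler.step() before optimizer.step()")

opt = FakeOptimizer()
sched = FakeScheduler(opt)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sched.step()
    sched.step()      # scheduler stepped twice, optimizer never -> warn
print(len(caught))    # 1
```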

cc ezyang

PS. honestly I would say that there is a lot of overhead introduced for simple warnings. I hope all these checks will be removed in a future version (`1.2.0` or later)...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21460

Differential Revision: D15701776

Pulled By: ezyang

fbshipit-source-id: eac5712b9146d9d3392a30f6339cd33d90c497c7
2019-06-06 13:54:42 -07:00
51d0da2802 Improve build docs and process for Windows (#21190)
Summary:
Fixes #21026.
1. Improve build docs for Windows
2. Change `BUILD_SHARED_LIBS=ON` for Caffe2 local builds
3. Change to out-of-source builds for LibTorch and Caffe2 (transferred to #21452)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21190

Differential Revision: D15695223

Pulled By: ezyang

fbshipit-source-id: 0ad69d7553a40fe627582c8e0dcf655f6f63bfdf
2019-06-06 13:46:52 -07:00
6fc702f384 Per-callback sampling (#21394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21394
ghimport-source-id: 2607c7b456031a1ddb19fabc3b6fe2585c276d76

Differential Revision: D15639723

Pulled By: ilia-cher

fbshipit-source-id: 938d02c1daf5bec5afa5d3cd021d2dae7e7160ce
2019-06-06 13:46:48 -07:00
bb788631ce Fix caffe2 windows CI for new Windows AMI (#21452)
Summary:
The alternative of #21410.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21452

Differential Revision: D15701767

Pulled By: ezyang

fbshipit-source-id: e65c1d6bfcc98e88460f4a57e5b99c2f395c0ceb
2019-06-06 13:46:45 -07:00
3feb40d602 pack_padded_sequence: Check for empty (zero-element) tensors (#21461)
Summary:
Fixes: #20529

Thank you, JamieCT, for the bug report with a reproducing script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21461

Differential Revision: D15696183

Pulled By: ezyang

fbshipit-source-id: a93cde2c924f8447563c64ce8a1cf75fcee60a01
2019-06-06 13:41:52 -07:00
3b6362d98e Remove NodeExecStats and AllocatorMemoryUsed (#21419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21419

Removed ```node_stats``` and unused imports

Reviewed By: orionr

Differential Revision: D15672824

fbshipit-source-id: 6167c80c081d925f02a1d279f3af0e1b8de66752
2019-06-06 13:35:52 -07:00
0a3fb45d3d allow passing Python built-in types as dtypes (#21215)
Summary:
Another simple bit of syntax that NumPy supports and we don't.

Support int, float, and bool.

```python
>>> torch.randn((2,3), dtype=float)
tensor([[-0.1752, -0.3240, -0.6148],
        [ 0.1861,  1.6472,  0.1687]], dtype=torch.float64)
```

A bit confusingly, Python's "float" actually means double, but nothing we can do about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21215

Differential Revision: D15697012

Pulled By: umanwizard

fbshipit-source-id: 9a38d960a610b8e67023486b0c9265edd3c22246
2019-06-06 13:17:23 -07:00
b647804a55 Fix embedding bag nan output when input is empty (#21400)
Summary:
```
import torch

Embed = torch.nn.EmbeddingBag(100, 10, sparse=True)

print(Embed(input=torch.LongTensor([]), offsets=torch.LongTensor([0])))
print(Embed(input=torch.LongTensor([]), offsets=torch.LongTensor([0, 0])))
```

Before this fix:
```
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
```

After this fix:
```
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21400

Differential Revision: D15643357

Pulled By: bddppq

fbshipit-source-id: 119eba38129dc0a3757c331304a18044714fcca5
2019-06-06 13:03:17 -07:00
f4f32cecfd numpy like nonzero (called nonzero_tuple) (#20293)
Summary:
No performance degradation compared to NumPy when indexing:

```
In [15]: x=torch.randn((1000,1000))

In [16]: %timeit x[x.nonzero_tuple()]
4.63 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: y=x.numpy()

In [18]: %timeit y[y.nonzero()]
14.6 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]: x=x.t()

In [22]: %timeit x[x.nonzero_tuple()]
9.01 ms ± 626 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [24]: y=x.numpy()

In [25]: %timeit y[y.nonzero()]
16.8 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20293

Differential Revision: D15358754

Pulled By: umanwizard

fbshipit-source-id: 1344aabd95c969eeda9780c475a39551231879e1
2019-06-06 12:50:59 -07:00
8a2985eb05 Support recursive ModuleList / Sequential (#21306)
Summary:
Adds support for recursively compiling `nn.Sequential` and
`nn.ModuleList`. When either is used, it is converted to a
`jit._ConstModuleList` or `jit._ConstSequential` as necessary. Due to
this, we don't need to add it to `__constants__` since it's made
constant on demand.

This PR also moves the recursive script tests out to their own class
`TestRecursiveScript` (the added test is called `test_iterable_modules`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/21306

Pulled By: driazati

Differential Revision: D15611738

fbshipit-source-id: fac52993990bd2dfad71d044c463a58a3759932a
2019-06-06 12:23:59 -07:00
2e37ab85af Enable bool support for several index methods (#21435)
Summary:
Enable bool tensors for these index methods:
- index_select
- index_copy
- put
- take
- index_fill

Tested via unit tests

TODO:
Enable index_add in a separate PR as it requires more "side" changes.
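A quick sketch of two of the newly enabled methods on a bool tensor:

```python
import torch

src = torch.tensor([True, False, True, False])
picked = src.index_select(0, torch.tensor([0, 2]))  # select rows 0 and 2
taken = src.take(torch.tensor([1, 3]))              # flat-index lookup
assert picked.tolist() == [True, True]
assert taken.tolist() == [False, False]
```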
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21435

Differential Revision: D15684964

Pulled By: izdeby

fbshipit-source-id: 48440e4d44873d70c4577e017dd0d8977e0fa15a
2019-06-06 12:14:01 -07:00
61cc03fb8d Make ScriptModule.training an attribute instead of a parameter (#21078)
Summary:
Redo of #19587
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21078

Pulled By: driazati

Differential Revision: D15560540

fbshipit-source-id: f415775d87c163f93b3bbdd5f87c9ff73f58b049
2019-06-06 12:06:49 -07:00
f1adddd1c6 Updated sum() logic to properly deal with bool tensor (#21421)
Summary:
`torch.tensor([True, False, True], dtype=torch.bool).sum()` should return **2** instead of **True** as it does now.

Tested via unit tests
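The behavior after the fix, as a one-liner:

```python
import torch

t = torch.tensor([True, False, True], dtype=torch.bool)
s = t.sum()
assert s.item() == 2           # an integer count, not a bool
assert s.dtype == torch.int64  # bool sums promote to int64
```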
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21421

Differential Revision: D15674203

Pulled By: izdeby

fbshipit-source-id: b00e3d0ca809c9b92b750adc05632522dad50c74
2019-06-06 12:02:23 -07:00
b7b6b612a7 Fix C++ data parallel (#20910)
Summary:
Fixes #19540

CC nmerrill67

C++ data parallel was using Module.clone() to create module replicas on every destination device. However, clone() does not set up gradient edges to point from replicas to the original module. As a result, the gradient will not be aggregated into the original module. This commit fixes the problem by manually setting gradient edges from every parameter X in every replica to the same parameter X in the original module.

## Failed Attempt

Initially I tried implementing what we did in `replicate.py`, which
1. create module replicas
2. use Python `Broadcast` autograd function to broadcast every parameter in the original module to all destination devices.
3. assign the broadcast result params to module replicas' `_parameters` dict.

This works in Python because derived module member field params (e.g., `Linear.weight`) and base module `_parameters` (e.g., `Linear._parameters['weight']`) are referencing the same parameter instance. Assigning one of them will apply to both. However, in C++, even though I can modify Module's `parameters_` values and gradient edges to point to the broadcast source, I cannot touch the weight and bias member fields in Linear, because replicate cannot (and should not) add special-case handlers to every different module. (See `Linear` [.h](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/modules/linear.h), [.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/src/nn/modules/linear.cpp)) Although they initially point to the same `TensorImpl` instance, after assigning to `Module.parameters_['weight']`, it will be different from `Linear.weight`.

## Solution Options

gchanan and I had several discussions on this issue and figured two solutions to this problem.

### Option One [implemented in this PR]

Replicate the module in two steps:
1. call `Module.clone()` to create a module replica on every destination device.
2. manually setting gradient edges from every parameter in every replica to the same parameter in the original module.

* Pro: Does not need to change any existing module, and relatively easier to implement
* Con: It is a little hackish.

### Option Two

Implement a `Replicatable` class (similar to `Cloneable`), and make it a friend class of `Module`. For more details see `Note [Replicating Modules]` in the code change.

* Pro: Maybe this aligns more with our existing approach implemented in `Cloneable`?
* Con: Require changes to every existing module.

I am inclined to go with option one, because `replicate` will only be used for data parallel. Changing every existing module implementation to satisfy a data parallel requirement feels like overkill.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20910

Differential Revision: D15556426

Pulled By: mrshenli

fbshipit-source-id: aa836290ec657b32742e2bea80bd0ac2404ef3b0
2019-06-06 11:57:31 -07:00
da4f3629c5 Add missing shebangs to Python files with executable permissions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21305

Differential Revision: D15613078

Pulled By: ezyang

fbshipit-source-id: 1fedf4368d65db406b617a51402ee8a20968aff7
2019-06-06 10:53:40 -07:00
52596164d4 Fix 32-bit env. model load issue (#20900)
Summary:
Fixed an issue where models cannot be loaded in a 32-bit environment like Raspbian.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20900

Differential Revision: D15696709

Pulled By: ezyang

fbshipit-source-id: 37a81f05f235d3b9fc6244e12d3320ced3d1465e
2019-06-06 10:30:29 -07:00
f891b4338a Test the exceptions raised by isfinite and isinf (#21168)
Summary:
Following up ef1fdc27a3779586efad631d698cec2d6d19503f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21168

Differential Revision: D15696615

Pulled By: ezyang

fbshipit-source-id: 46904974ef3c4cb87c7a1d06871bf01543e61ef2
2019-06-06 10:30:26 -07:00
dffff3218b Improves NVRTC Error messages (#21174)
Summary:
Current versions of NVRTC incorrectly map error code 7 to the error string "NVRTC unknown error." This update maps error code 7 to the correct string explicitly in PyTorch. See the documentation at: https://docs.nvidia.com/cuda/nvrtc/index.html#group__error.

This may give us a better idea of the source of NVRTC errors that some community members, like Uber, have reported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21174

Differential Revision: D15696593

Pulled By: ezyang

fbshipit-source-id: f5c7b5876c07b311ab5f2d7c8e375e93273912c6
2019-06-06 10:30:22 -07:00
6615797837 Add derivative for QR decomposition (#21274)
Summary:
Changelog:
- Implement derivative for QR decomposition for tall and square matrices i.e., num rows >= num cols
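A quick sketch of the new backward path, using the modern `torch.linalg.qr` spelling (the commit itself targeted `torch.qr`):

```python
import torch

a = torch.randn(5, 3, requires_grad=True)  # tall: num rows >= num cols
q, r = torch.linalg.qr(a)
(q.sum() + r.sum()).backward()             # differentiable through Q and R
assert a.grad is not None and a.grad.shape == a.shape
```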
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21274

Differential Revision: D15696506

Pulled By: ezyang

fbshipit-source-id: 1c77bb8369818112c84139360f6e2650f92bf2fd
2019-06-06 10:11:21 -07:00
26bcadcc61 Gumbel-Softmax Arxiv Docs Link Fix (#21376)
Summary:
Links separated #20297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21376

Differential Revision: D15696413

Pulled By: ezyang

fbshipit-source-id: 513bd430e41c109aa2d0fbaa9a242acb2a12059b
2019-06-06 10:11:18 -07:00
ee15ad1bd6 "CharTensor" numpy conversion is supported now (#21458)
Summary:
Fixed #21269 by removing the expected `ValueError` when converting a tensor to a NumPy `int8` array in the Numba interoperability test.
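The now-supported conversion, as a sketch:

```python
import torch

t = torch.tensor([1, -2, 3], dtype=torch.int8)  # a "CharTensor"
arr = t.numpy()  # no longer raises ValueError
assert arr.dtype.name == 'int8'
assert arr.tolist() == [1, -2, 3]
```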
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21458

Differential Revision: D15696363

Pulled By: ezyang

fbshipit-source-id: f4ee9910173aab0b90a757e75c35925b026d1cc4
2019-06-06 10:06:41 -07:00
c8083e0292 Include named_any.h in modules.h (#21437)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19462.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21437

Differential Revision: D15684880

Pulled By: yf225

fbshipit-source-id: db23c7e4e0f62d22b0b6c18f15420c3bb66af366
2019-06-06 09:57:33 -07:00
856e3518c5 Parallelize eye() on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21077

Differential Revision: D15695329

Pulled By: ezyang

fbshipit-source-id: 9841777238dac7c08cde2db3cd9401853f633af3
2019-06-06 09:52:13 -07:00
ae18f8e761 Fix latex formular error about *normal (#21000)
Summary:
issue:
 https://github.com/pytorch/pytorch/issues/20903
the LaTeX for the normal distribution should be `\mathcal{N}(\text{mean}, \text{std}^2)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21000

Differential Revision: D15695335

Pulled By: ezyang

fbshipit-source-id: 34dcca0acb20c297f876287e081cd44d11a3e516
2019-06-06 08:47:42 -07:00
4e02d3c0a1 insert default parameters in binary cross entropy with logits (#21336)
Summary:
Inserted default `weight` and `reduction` parameters in the C++ `binary_cross_entropy_with_logits` function. These defaults already exist in the Python API and in the C++ `binary_cross_entropy` function.
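The Python-side equivalent, where these defaults already existed:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3)
target = torch.rand(3)
# weight defaults to None and reduction to 'mean', matching the added C++ defaults
loss = F.binary_cross_entropy_with_logits(logits, target)
assert loss.dim() == 0  # scalar after 'mean' reduction
```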
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21336

Differential Revision: D15628917

Pulled By: ezyang

fbshipit-source-id: 38e5f53851125238842df1bd71cb6149c8603be1
2019-06-06 08:47:39 -07:00
d50dca4075 fix two typos: "a the" => "the"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20437

Differential Revision: D15321243

Pulled By: zou3519

fbshipit-source-id: 6e1690132769b8ef2fd679cb5898c378efac2112
2019-06-06 08:42:57 -07:00
63a55d4932 Support gather export with OneHot + Mul (#21235)
Summary:
This could serve as an alternative solution for exporting ```torch.gather``` until something similar goes into the ONNX spec. The exported model is verified to be correct against the onnxruntime backend. We weren't able to test against the Caffe2 backend because it doesn't seem to support OneHot in opset 9.
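The decomposition can be sketched in PyTorch itself: a `gather` along the last dimension is a one-hot selection followed by a multiply-and-sum.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2.], [3., 4.]])
idx = torch.tensor([[0], [1]])
# gather along dim=1 decomposed as OneHot + Mul (+ ReduceSum)
onehot = F.one_hot(idx.squeeze(-1), num_classes=x.size(1)).to(x.dtype)
out = (x * onehot).sum(dim=1, keepdim=True)
assert torch.equal(out, x.gather(1, idx))
```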
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21235

Differential Revision: D15613039

Pulled By: houseroad

fbshipit-source-id: 7fc097f85235c071474730233ede7d83074c347f
2019-06-06 08:35:28 -07:00
240d62fbaa Move redundant code that checks NumPy during build to a helper module and add an option to disable building with NumPy
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21417

Reviewed By: ezyang

Differential Revision: D15694357

Pulled By: fmassa

fbshipit-source-id: bc1bda23349ba4531f19619fa4adecb846225c20
2019-06-06 08:15:19 -07:00
a68d2e817b Kill apt-get even harder, and before we purge. (#21464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21464
ghimport-source-id: 81beb6ef39e3d412e755f0ae06c9186d8e11a8bc

Differential Revision: D15694828

Pulled By: ezyang

fbshipit-source-id: 0791fe017a1318425528795f576fb96e54b14dae
2019-06-06 07:49:39 -07:00
12528990f8 change output of ai_pep_format (#21440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21440

This diff modifies the output format when ai_pep_format is enabled.

Reviewed By: hl475

Differential Revision: D15681042

fbshipit-source-id: df5f2dbb38d1bd866ca7f74ef4e63459d480be6e
2019-06-05 21:54:24 -07:00
4e679e30a8 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 060bf204b6400515bbc8f1b9b3ef34bef9d32560
2019-06-05 20:06:53 -07:00
7e300fbb21 Added degrees, radians, ldexp (#21131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21131
ghimport-source-id: 62b9cb71a17f9c9a7999a6e33c2d8b840ce097ff

Differential Revision: D15563184

Pulled By: Chillee

fbshipit-source-id: e2c47fb9f9c0fe9f039cfd001c5e6d5b455e034c
2019-06-05 19:17:02 -07:00
bd2d318e23 Modify quant-dequant node api to take module object and method name (#21407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21407

The modified API takes a module object and the name of the method whose graph is
instrumented to insert the quant-dequant nodes.

Differential Revision: D15651624

fbshipit-source-id: 1ff1ae446c986184c724504c8fdd0dcd43864016
2019-06-05 19:08:56 -07:00
505ae5f51d Updating submodules
Reviewed By: yns88

fbshipit-source-id: 9ab609a16522eb233f128df024903eb880742224
2019-06-05 19:03:34 -07:00
f8202d85a0 Added frexp, isinf, isnan, isfinite (#21130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21130
ghimport-source-id: fa771086da13deed232e142db6f940439bcc67bc

Differential Revision: D15563186

Pulled By: Chillee

fbshipit-source-id: fe33dbc454af2a9626ad810a5304300eb17d7530
2019-06-05 18:46:39 -07:00
26db46b324 change the epilogue of SLS to match the simd section (#21439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21439

This bug was exposed by accuracy testing on shapes that are not multiples of 8.

Reviewed By: jspark1105

Differential Revision: D15684759

fbshipit-source-id: 2950f2bd87ee1d8e539148285a14c755f606b3a7
2019-06-05 18:41:55 -07:00
7e6d932208 Make strtod_c compatible with different gcc abi (#21293)
Summary:
We encountered a `std::bad_cast` error when running a PyTorch binary built with the cxx11 ABI on CentOS 7. Stack trace:
```
#0  0x00007fec10160207 in raise () from /lib64/libc.so.6
#1  0x00007fec101618f8 in abort () from /lib64/libc.so.6
#2  0x00007fec015767d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007fec01574746 in ?? () from /lib64/libstdc++.so.6
#4  0x00007fec01574773 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007fec01574993 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x00007fec015c94d2 in std::__throw_bad_cast() () from /lib64/libstdc++.so.6
#7  0x00007feb2ab3c2d7 in std::__cxx11::numpunct<char> const& std::use_facet<std::__cxx11::numpunct<char> >(std::locale const&) ()
   from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
#8  0x00007feb28643d62 in torch::jit::script::strtod_c(char const*, char**) () from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
```

We suspect this line gets compiled to a GCC-ABI-dependent symbol:
```
char decimal_point = std::use_facet<std::numpunct<char>>(std::locale()).decimal_point();
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21293

Differential Revision: D15609910

Pulled By: bddppq

fbshipit-source-id: e247059729863868e4b36d6fec4fcbc36fbc4bb1
2019-06-05 18:10:09 -07:00
e07d94558d Updating submodules
Reviewed By: yns88

fbshipit-source-id: 6608ac4be8c338ff5a8116b275bbad487d317972
2019-06-05 16:28:40 -07:00
991c557270 Fix an incorrect implementation of celu (#21213)
Summary:
Fixing an incorrect implementation of the CELU activation function. The existing implementation works by a chance combination of errors that seem to cancel each other out. This change makes the code more readable, aligns the parameter names correctly, and is consistent with the cuda implementation.

I came across this issue while working on version counters... I attempted to specify a gradient in derivatives.yaml for CELU due to a failed test, but the derivative couldn't be specified correctly without fixing the celu implementation.
https://github.com/pytorch/pytorch/pull/20612
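As a reference point, the correct CELU definition checked elementwise (a sketch against the public functional API, not the fixed kernel itself):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
alpha = 0.5
# CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
expected = torch.where(x > 0, x, alpha * (torch.exp(x / alpha) - 1))
assert torch.allclose(F.celu(x, alpha=alpha), expected)
```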
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21213

Differential Revision: D15678823

Pulled By: nairbv

fbshipit-source-id: 29fa76b173a66c2c44ed2e0b7959e77f95d19c43
2019-06-05 15:45:50 -07:00
335869e833 Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21425

Differential Revision: D15678631

fbshipit-source-id: 3da2c694de13ad03019e2b3ff451e762199265bb
2019-06-05 15:40:29 -07:00
6b9f46b2d0 Fix "warning: missing return statement at end of non-void function" (#21424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21424

Fixes #21418

Differential Revision: D15676140

fbshipit-source-id: cfadce164c6cfefb16f8bf7bc09529ba8b910769
2019-06-05 15:30:54 -07:00
0e3c4a054b Remove curandStateMTGP32 usage (#21301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21301
ghimport-source-id: d4516237a8fb46d1f74c47532e849e5926fc6a79

Differential Revision: D15632929

Pulled By: ezyang

fbshipit-source-id: b5147edb95dc3d71f87581aa2ab002e48c3fef30
2019-06-05 14:06:25 -07:00
793b302653 ensure version_counter gets incremented for non-differentiable outputs (#20612)
Summary:
issue:
https://github.com/pytorch/pytorch/issues/14571

To reproduce I:
1) added these lines to derivatives.yaml:
```
- name: add_(Tensor self, Scalar other, Scalar alpha)
  output_differentiability: [False, False, False]
- name: add_(Tensor self, Tensor other, Scalar alpha)
  output_differentiability: [False, False, False]
```

2) Ran this code:
```
import torch

scalar = torch.tensor(5)
var1 = torch.randn(4,2,requires_grad=True)
var2 = var1.detach().requires_grad_()
output1 = var1 * scalar
output2 = var2 * scalar
output1.sum().backward()
scalar.add_(5, 1)
output2.sum().backward()
print(var1.grad)
print(var2.grad)
```
Observed modified var2.grad in the output:
```
tensor([[5., 5.],
        [5., 5.],
        [5., 5.],
        [5., 5.]])
tensor([[10., 10.],
        [10., 10.],
        [10., 10.],
        [10., 10.]])
```

After making this change, re-running the above code produces the expected error:
```
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    output2.sum().backward()
  File "/home/bvaughan/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/bvaughan/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.LongTensor []] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20612

Differential Revision: D15661958

Pulled By: nairbv

fbshipit-source-id: af3373135a1a589a635b49e0ff62622a210258e6
2019-06-05 13:36:05 -07:00
8215f44405 Revert D15660575: [pytorch][PR] Fix Caffe2 CI job for new Windows AMI
Differential Revision:
D15660575

Original commit changeset: cfc0f325b0fb

fbshipit-source-id: cb7d87605c9019b9e563bf5ce4325a919263938e
2019-06-05 12:15:34 -07:00
98e3aaeb78 Adding support for exporting models with variable length input/output to ONNX (#20034)
Summary:
Proposal: https://gist.github.com/pk-g/cc45ff8c5891b5699bffd883a87f13ae?fbclid=IwAR17bRA7Fks4APoZRYiNa93UkLdoFCpRDuIYEx0lNVyPTyaDAShbEnytiQo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20034

Reviewed By: zrphercule

Differential Revision: D15606731

Pulled By: houseroad

fbshipit-source-id: 247251e07b4893cb3f7a1287948b1f57aadb7851
2019-06-05 12:02:23 -07:00
ba2bdf8d0e Added factorial (#21129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21129
ghimport-source-id: a676dd33c4d0b2b60c3e9ce725bda0abeb22375f

Differential Revision: D15563183

Pulled By: Chillee

fbshipit-source-id: 641cae34c181a16c772665f5f7ed01c96a67ea9c
2019-06-05 11:51:03 -07:00
7a1c9076ac Revert D15632268: [pytorch][PR] Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen
Differential Revision:
D15632268

Original commit changeset: 8e337e8dc17a

fbshipit-source-id: de98b1af51a53105c97fb076b09efb6fa8eb08a7
2019-06-05 11:41:50 -07:00
f172fadd80 Make warnings be UserWarnings with source file info (#21231)
Summary:
Redo of #15201; this makes `warnings.warn` calls match their Python
behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21231

Pulled By: driazati

Differential Revision: D15605266

fbshipit-source-id: 5931fd720b0c40d52dd492fbd1f5a76abefaab5c
2019-06-05 11:09:11 -07:00
3068a945ce Retry awscli install. (#21383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21383
ghimport-source-id: e518a4f8bf498694b6d504b8a695c5f11e7c681f

Differential Revision: D15664738

Pulled By: ezyang

fbshipit-source-id: d645db505de906e65c057f0d6964b5ce0fb6ff52
2019-06-05 10:38:01 -07:00
bf0e3b62ae Minor preparational JIT changes (#21096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21096
ghimport-source-id: 169f8b4b70cd77b0f9b07cca81d2b4cde2c46456

Reviewed By: ezyang

Differential Revision: D15546176

Pulled By: izdeby

fbshipit-source-id: cdd0a1c87263955eef9d3174ec2f36d1d2935f48
2019-06-05 10:30:01 -07:00
c15254d4ab Expunge some more deprecated uses of AT_CHECK.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21194

Differential Revision: D15576898

fbshipit-source-id: f030195f5bffe0027d4081aece57e2852aaf9ecb
2019-06-05 10:25:25 -07:00
ec7dc52e60 Fix a bug in qconv (#21294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21294

Returned output tensor wasn't of correct shape

Reviewed By: zafartahirov

Differential Revision: D15605081

fbshipit-source-id: f79a9d5b93b8b97e79c09411b9dc681987a61e44
2019-06-05 10:19:31 -07:00
03617574d3 Сhange type of a tensor with bools (#19097)
Summary:
**This is a bc-breaking change.**
Change dtype of a tensor which was created from bool data.
Old behavior: torch.tensor([True, False]) -> uint8 tensor
Now: torch.tensor([True, False]) -> bool tensor

Tested via tests.
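The new behavior in one line:

```python
import torch

t = torch.tensor([True, False])
assert t.dtype == torch.bool  # previously this was torch.uint8
```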
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19097

Reviewed By: ezyang

Differential Revision: D15632553

Pulled By: izdeby

fbshipit-source-id: b019150844c561a6845710a3c62b12f06b68bbe3
2019-06-05 10:19:27 -07:00
22ddddfb80 Continuation of Port max_unpool1d, max_unpool2d and max_unpool3d to ATen (#19465)
Summary:
This PR is a continuation of #15310, which itself is a continuation of #14845, #14941, & #15293. It should be synced up with the pytorch/master branch as of yesterday.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19465

Differential Revision: D15632268

Pulled By: ezyang

fbshipit-source-id: 8e337e8dc17ac31439935ccb530a7caf77f960e6
2019-06-05 10:13:58 -07:00
6874c4058d Add type annotation to stft (#21302)
Summary:
We want to be able to call stft from TorchScript, which requires that stft have a type annotation.
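A sketch of the call this enables; note that `return_complex` is required by current releases and did not exist at the time of this commit:

```python
import torch

x = torch.randn(1, 400)
spec = torch.stft(x, n_fft=64, return_complex=True)
assert spec.is_complex()
```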
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21302

Differential Revision: D15607973

Pulled By: cpuhrsch

fbshipit-source-id: c4a5c09cdaafe7e81cf487a3ad216d1b03464a21
2019-06-05 10:06:48 -07:00
7c6f2836d4 Fix Caffe2 CI job for new Windows AMI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21410

Differential Revision: D15660575

Pulled By: ezyang

fbshipit-source-id: cfc0f325b0fbc22282686a4d12c7a53236d973d4
2019-06-05 06:35:39 -07:00
6251c563eb Add CUDA support for _dirichlet_grad (#21191)
Summary:
Changelog:
- Migrate _dirichlet_grad implementation from TH to ATen
- Add CUDA support for _dirichlet_grad

Closes #11030.
Closes #15773.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21191

Differential Revision: D15660330

Pulled By: ezyang

fbshipit-source-id: c8ad5b80366e5348139ce9be10400f22fc430344
2019-06-05 06:35:35 -07:00
b460a1987e Per discussion at https://github.com/pytorch/pytorch/pull/21244, fix bugs in (#21392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21392

as discussed at https://github.com/pytorch/pytorch/pull/21244, we
found that some values in log_beta were not properly initialized. This diff will 1)
initialize all log_beta to -inf; 2) fix a tricky compare condition; 3) zero out all
the gradient elements corresponding to padding.

Offline experiments show that this diff can fix the previously seen NaN loss.

Differential Revision: D15637977

fbshipit-source-id: 477008a5e11aae946bd2aa401ab7e0c513421af0
2019-06-05 00:28:45 -07:00
42b2f56124 Fixing race condition at Module::forward method (#21398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21398

The Module::forward method calls the find_method() function, potentially from multiple threads.
Internally, find_method() calls find_offset(), which reads the dict_ object.
If the name is not in the dictionary, the thread calls insert(), which modifies dict_.
While the first thread modifies dict_, another thread can enter the forward()->find_method()->find_offset() path
and read dict_ while it is being modified -> crash.
Moved the mutex protection up to cover both find_offset() and insert().
Consider using a C++17 shared_mutex instead of recursive_mutex.

Reviewed By: bddppq

Differential Revision: D15638942

fbshipit-source-id: ca6a453448302a0b3666c87724755fa4e9ce242f
2019-06-04 23:03:25 -07:00
95eb9339c1 Adds CUDA C++11 and Profiling Notes (#21386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21386
ghimport-source-id: 9430c7640b90d9add38d9bf2f1bd0c8f62b7f239

Differential Revision: D15640102

Pulled By: ezyang

fbshipit-source-id: 98a5efdea9b1de05207ebd3624cb20acda9fe96b
2019-06-04 19:18:55 -07:00
eadac840f7 Speedup bernoulli_scalar_cuda_kernel with grid-stride loop (#21300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21300
ghimport-source-id: c314c28cb693b554d6f24de235c11ba24ed6bf61

Reviewed By: jerryzh168

Differential Revision: D15632935

Pulled By: ezyang

fbshipit-source-id: 9bb24f17d78151bf50942905c967bdcfe1ff00cb
2019-06-04 19:13:57 -07:00
c82bf8ef10 Move THCTensor_(lognormal) to ATen (#21299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21299
ghimport-source-id: 2c63f289f02087f023feda8bff6b90ed49737889

Reviewed By: jerryzh168

Differential Revision: D15632930

Pulled By: ezyang

fbshipit-source-id: 85c17cdca486b46942c5b500e4fd4d95bb5657f9
2019-06-04 19:13:53 -07:00
4671bed0f3 Move THCTensor_(geometric) to ATen (#21298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21298
ghimport-source-id: c0e2604aa25cc5da2b67293cafd88c2e77e476f9

Reviewed By: jerryzh168

Differential Revision: D15632932

Pulled By: ezyang

fbshipit-source-id: 248ca4b56967116f27174cda44893ecfe4ca9a99
2019-06-04 19:13:50 -07:00
d341bcb3dc Move THCTensor_(exponential) to ATen (#21297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21297
ghimport-source-id: 5f45154e714ab44dec961dabf1c64e54aaa063a2

Reviewed By: jerryzh168

Differential Revision: D15632931

Pulled By: ezyang

fbshipit-source-id: 0367eec0a9ef6812b1b3ab7597817ee40a011bb8
2019-06-04 19:13:46 -07:00
92b76df8f6 Finished trigonometric functions (#21128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21128
ghimport-source-id: d566de103f2aefc59e6423181de325d8f42620f4

Differential Revision: D15563190

Pulled By: Chillee

fbshipit-source-id: ad2e09cac5c7dae9978a7bd61098c2828620cdc4
2019-06-04 17:59:09 -07:00
7309cb60fd Finished the high-priority functions (#21127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21127
ghimport-source-id: 609021958e76ea01299f62b9491038005e6b4f27

Differential Revision: D15563189

Pulled By: Chillee

fbshipit-source-id: 5c6155a69fff7447689ef012ea303dc358d50486
2019-06-04 17:59:05 -07:00
622588d8fd Added remainder of high-priority trigonometric math ops (#21126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21126
ghimport-source-id: e310f3cfb28436b99ad038691887ca82068ca2c9

Differential Revision: D15563191

Pulled By: Chillee

fbshipit-source-id: 7135ddd5bc9eebc818694fa8b67eaade907fa8a1
2019-06-04 17:59:02 -07:00
e268fc97c3 Re-add Tensor.T (#21175)
Summary:
Something flaky is going on with `test_inplace_view_saved_output` on Windows.

With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted.

Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output...

I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing.

**Test Plan:** I will wait to make sure the Windows CI suite passes before landing this.
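For context, the property being re-added, as a sketch:

```python
import torch

x = torch.randn(2, 3)
assert x.T.shape == (3, 2)      # .T reverses the dimensions
assert torch.equal(x.T, x.t())  # same as t() for 2-D tensors
```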
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175

Differential Revision: D15566970

Pulled By: umanwizard

fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3
2019-06-04 17:38:25 -07:00
ba08cf336d Reorganize cmake related functions to tools/setup_helpers/cmake.py (#21367)
Summary:
Currently tools/build_pytorch_libs.py looks quite convoluted. This commit reorganizes CMake-related functions into a separate file to make the code clearer.

 ---

This is hopefully helpful for further contribution for better integration with cmake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21367

Differential Revision: D15636991

Pulled By: soumith

fbshipit-source-id: 44d76e4e77aec0ce33cb32962b6a79a7f82785da
2019-06-04 17:01:38 -07:00
6ee9e87ff5 Back out "[pytorch][PR] don't materialize constants" (#21374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21374

Original commit changeset: d5609b0a5697

Not materializing constants slows compilation time significantly

Differential Revision: D15630632

fbshipit-source-id: c6b5026ee6eae2ef290628f350f49a657495bd5d
2019-06-04 16:32:09 -07:00
45d2305732 fix incorrect default on Graph::toString (#21370)
Summary:
This default was incorrect and caused printing from Python to omit file:line:col information.

This wasn't caught because FileCheck internally uses operator<< to print the graph, which has `true` hardcoded as the value. I've added more comprehensive tests to catch this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21370

Differential Revision: D15631135

Pulled By: jamesr66a

fbshipit-source-id: c809e06fff4f0174eefeb89062024384b4944ef7
2019-06-04 16:15:38 -07:00
0dc7286e15 Better error message when trying to instantiate NamedTuple
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21309

Differential Revision: D15630564

Pulled By: jamesr66a

fbshipit-source-id: 82753feee65bbe6c8b2f827cc2664628f3b9f4a3
2019-06-04 16:11:05 -07:00
d348d6405c cdist: pairwise distances between two sets of tensors with batch mode (#20934)
Summary:
Batch implementation for cdist function
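The batched call shape, as a sketch:

```python
import torch

a = torch.randn(10, 5, 3)  # batch of 10 sets of 5 points in R^3
b = torch.randn(10, 7, 3)  # batch of 10 sets of 7 points in R^3
d = torch.cdist(a, b)      # pairwise L2 distances within each batch element
assert d.shape == (10, 5, 7)
```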
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20934

Differential Revision: D15609458

Pulled By: ifedan

fbshipit-source-id: 31c12e120d168baec6a6af913f599838a44034d7
2019-06-04 15:52:52 -07:00
6a3ebdbbc5 Remove all conda 3.5 nightly configs, remove libtorch smoketests (#21380)
Summary:
| | Before | After
------------ | ------------ | -------------
Binary builds | ![binarybuilds-config-dimensions](https://user-images.githubusercontent.com/261693/58915716-77a5f900-86d6-11e9-8a39-7ef587e56281.png) | ![binarybuilds-config-dimensions](https://user-images.githubusercontent.com/261693/58915620-4a594b00-86d6-11e9-9e5f-95cf085e6fc8.png) |
Smoke tests | ![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/58915728-812f6100-86d6-11e9-80c1-182242fdfd0e.png) | ![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/58915686-68bf4680-86d6-11e9-8cd2-e65a47384b4f.png) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21380

Differential Revision: D15634729

Pulled By: kostmo

fbshipit-source-id: aef44b0e5b9997be55d93969ab85effca68c5c88
2019-06-04 15:48:47 -07:00
ca32563999 add suggestion to use lld to CONTRIBUTING.md (#21334)
Summary:
I found this significantly speeds up incremental builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21334

Differential Revision: D15632994

Pulled By: suo

fbshipit-source-id: bb4af90f4400bffa90d168d82ff30fece5e3835c
2019-06-04 15:40:49 -07:00
4940e41d16 Fix mkl-dnn tautological compare error (#21371)
Summary:
```
../third_party/ideep/mkl-dnn/src/cpu/jit_avx512_common_convolution.hpp:144:821: error: self-comparison always evaluates to true [-Werror,-Wtautological-compare]
        virtual pd_t *clone() const override { return new pd_t(*this); } virtual status_t create_primitive(primitive_t **primitive, const primitive_at_t *inputs, const primitive_t **outputs) const override { double ms = get_msec(); primitive_t::input_vector ins(inputs, inputs + this->n_inputs()); primitive_t::outpu
t_vector outs(outputs, outputs + this->n_outputs()); auto ret = safe_ptr_assign<primitive_t>(*primitive, new (jit_avx512_common_convolution_bwd_data_t)(this, ins, outs)); ms = get_msec() - ms; if (mkldnn_verbose()->level >= 2) { printf("mkldnn_verbose,create,%s,%g\n", this->info(), ms); fflush(0); } return ret; } v
irtual const char *name() const override { return (avx512_common == sse42 ? "jit:" "sse42" : (avx512_common == avx ? "jit:" "avx" : (avx512_common == avx2 ? "jit:" "avx2" : (avx512_common == avx512_common ? "jit:" "avx512_common" : (avx512_common == avx512_core ? "jit:" "avx512_core" : (avx512_common == avx512_mic
? "jit:" "avx512_mic" : (avx512_common == avx512_mic_4ops ? "jit:" "avx512_mic_4ops" : "jit:" ""))))))); };
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21371

Differential Revision: D15631392

Pulled By: bddppq

fbshipit-source-id: 3b0008acab8ae53ce61327686bd8367e7fb5d298
2019-06-04 15:27:07 -07:00
403ca41142 make analyzeConservative more conservative (#21227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21227
ghimport-source-id: cac97ba20cb020f3edc4e83e7641201f0826f40a

Reviewed By: jamesr66a

Differential Revision: D15592316

Pulled By: suo

fbshipit-source-id: b311f73a5d81d6d0b0331678b6a625e446588ebd
2019-06-04 15:09:46 -07:00
0dbae7eddb cleanup templated implementation of mayAlias (#21224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21224
ghimport-source-id: 6ec4ea015043bbddd031f92c5149e8313f21977d

Reviewed By: jamesr66a

Differential Revision: D15592318

Pulled By: suo

fbshipit-source-id: 47c52342f2a1360752306908e2f394ef52e47504
2019-06-04 15:09:43 -07:00
adf6f6c442 use memory locations instead of values for working set (#21223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21223
ghimport-source-id: 82800a465a4273e185bfffe2f67835b2f7f3a519

Reviewed By: jamesr66a

Differential Revision: D15592317

Pulled By: suo

fbshipit-source-id: 5e87c803a928b61c923200888a3ff1ac7b2523e0
2019-06-04 15:09:39 -07:00
f330168570 remove multisets from work set (#21222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21222
ghimport-source-id: 0eb6daa92bef68a35bef918c3f3a791b401812aa

Reviewed By: jamesr66a

Differential Revision: D15592319

Pulled By: suo

fbshipit-source-id: 895d26538ba1edcd73b83147a68b7e4069084230
2019-06-04 15:09:36 -07:00
df0b83654a cleanups to alias analysis (#21221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21221
ghimport-source-id: 778e7317bbe874d35a903d89af5e0bc9721c8680

Reviewed By: jamesr66a

Differential Revision: D15592313

Pulled By: suo

fbshipit-source-id: d6f6d2be8cd80b40dd26d0bb3be30f074e356105
2019-06-04 15:09:33 -07:00
77c2f5dd75 fix copyright notice in docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21372

Differential Revision: D15631889

Pulled By: umanwizard

fbshipit-source-id: cf764432c27cb1b01d8137ed60ec7de361450d0e
2019-06-04 14:53:45 -07:00
57f932a638 Enable 'empty' function for mkldnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21184

Differential Revision: D15625296

Pulled By: bddppq

fbshipit-source-id: 47d26798bcf48e227ffd813f299959a7b8993641
2019-06-04 14:16:13 -07:00
b869a3b4ac add new ops to benchmark_all_test (#21365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21365

This diff adds new operators to benchmark_all_test so all the supported ops can be built as one binary

Reviewed By: hl475

Differential Revision: D15627328

fbshipit-source-id: b7ca550a279f485102a6a6bd47e4032c7beb9940
2019-06-04 13:54:26 -07:00
2ed6f017ed Added better tests for math ops and unified them (#21125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21125
ghimport-source-id: 2a576b563208ce3d83e6771643e20d24bc72af86

Differential Revision: D15563188

Pulled By: Chillee

fbshipit-source-id: 0e77471729f715063d6bee075d2fc65f8db8b6c3
2019-06-04 13:15:54 -07:00
6938de8851 made floor/ceil return ints (#21124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21124
ghimport-source-id: e3e45bd50c9af1ee03fd58f2f4d631ce23d9612e

Differential Revision: D15563187

Pulled By: Chillee

fbshipit-source-id: 6504a41da883a8287d64db20d40cf958edb7404c
2019-06-04 10:32:16 -07:00
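The change above aligns TorchScript's `math.floor`/`math.ceil` with Python 3 semantics, where both return `int` rather than `float`. A minimal plain-Python illustration of the semantics being matched:

```python
import math

# Python 3 semantics that the change matches: results are ints, not floats.
assert isinstance(math.floor(2.7), int) and math.floor(2.7) == 2
assert isinstance(math.ceil(2.1), int) and math.ceil(2.1) == 3
assert math.floor(-2.7) == -3  # floor rounds toward negative infinity
print("floor/ceil return ints")
```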
87690d2b77 Move THCTensor_(cauchy) to ATen (#21289)
Summary:
## Effective Bandwidth Benchmark
- using https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
- on V100
### Float Type
#### Before:
```
cauchy, size, elements 65536 forward 4.980564117431641e-06 bandwidth (GB/s) 52.63339529803734
cauchy, size, elements 131072 forward 6.232261657714844e-06 bandwidth (GB/s) 84.12483762631982
cauchy, size, elements 262144 forward 9.548664093017577e-06 bandwidth (GB/s) 109.81389540833959
cauchy, size, elements 524288 forward 1.59454345703125e-05 bandwidth (GB/s) 131.52052963827754
cauchy, size, elements 1048576 forward 2.86865234375e-05 bandwidth (GB/s) 146.21165262978724
cauchy, size, elements 2097152 forward 5.4748058319091796e-05 bandwidth (GB/s) 153.2220184158516
cauchy, size, elements 4194304 forward 0.00010075807571411133 bandwidth (GB/s) 166.50988897012377
cauchy, size, elements 8388608 forward 0.0001935744285583496 bandwidth (GB/s) 173.34124269355965
cauchy, size, elements 16777216 forward 0.00038077831268310545 bandwidth (GB/s) 176.24129779641603
cauchy, size, elements 33554432 forward 0.0006851387023925781 bandwidth (GB/s) 195.8986224705994
```
#### After:
```
cauchy, size, elements 65536 forward 6.077289581298828e-06 bandwidth (GB/s) 43.13501874366419
cauchy, size, elements 131072 forward 6.2131881713867184e-06 bandwidth (GB/s) 84.38308731972373
cauchy, size, elements 262144 forward 6.46829605102539e-06 bandwidth (GB/s) 162.11008150033175
cauchy, size, elements 524288 forward 6.8783760070800785e-06 bandwidth (GB/s) 304.8905726935182
cauchy, size, elements 1048576 forward 9.505748748779296e-06 bandwidth (GB/s) 441.23867681003264
cauchy, size, elements 2097152 forward 1.5070438385009766e-05 bandwidth (GB/s) 556.6266744001266
cauchy, size, elements 4194304 forward 2.4406909942626954e-05 bandwidth (GB/s) 687.396152951685
cauchy, size, elements 8388608 forward 4.6243667602539064e-05 bandwidth (GB/s) 725.6005792706125
cauchy, size, elements 16777216 forward 9.100198745727539e-05 bandwidth (GB/s) 737.4439380404413
cauchy, size, elements 33554432 forward 0.00017449140548706055 bandwidth (GB/s) 769.1939188944922
```
### Double Type
#### Before:
```
cauchy, size, elements 65536 forward 4.885196685791015e-06 bandwidth (GB/s) 53.660889593753055
cauchy, size, elements 131072 forward 6.229877471923828e-06 bandwidth (GB/s) 84.15703235943361
cauchy, size, elements 262144 forward 9.605884552001953e-06 bandwidth (GB/s) 109.15975455706132
cauchy, size, elements 524288 forward 1.5976428985595704e-05 bandwidth (GB/s) 131.26537863315923
cauchy, size, elements 1048576 forward 2.9621124267578124e-05 bandwidth (GB/s) 141.59840666786866
cauchy, size, elements 2097152 forward 5.5103302001953126e-05 bandwidth (GB/s) 152.23421637604707
cauchy, size, elements 4194304 forward 0.00010124444961547851 bandwidth (GB/s) 165.70998275677383
cauchy, size, elements 8388608 forward 0.0001944279670715332 bandwidth (GB/s) 172.58027487195184
cauchy, size, elements 16777216 forward 0.00034950494766235353 bandwidth (GB/s) 192.01119883668116
cauchy, size, elements 33554432 forward 0.0007002186775207519 bandwidth (GB/s) 191.67973135938277
```
#### After:
```
cauchy, size, elements 65536 forward 5.91278076171875e-06 bandwidth (GB/s) 44.33514628129032
cauchy, size, elements 131072 forward 6.234645843505859e-06 bandwidth (GB/s) 84.09266751632889
cauchy, size, elements 262144 forward 7.433891296386719e-06 bandwidth (GB/s) 141.05344807902503
cauchy, size, elements 524288 forward 1.1401176452636719e-05 bandwidth (GB/s) 183.94171941045587
cauchy, size, elements 1048576 forward 1.960039138793945e-05 bandwidth (GB/s) 213.99082890665372
cauchy, size, elements 2097152 forward 3.434181213378906e-05 bandwidth (GB/s) 244.26806504326578
cauchy, size, elements 4194304 forward 6.517410278320313e-05 bandwidth (GB/s) 257.4215107465028
cauchy, size, elements 8388608 forward 0.0001229524612426758 bandwidth (GB/s) 272.9057365819818
cauchy, size, elements 16777216 forward 0.00023239374160766602 bandwidth (GB/s) 288.77225150621814
cauchy, size, elements 33554432 forward 0.00046050310134887696 bandwidth (GB/s) 291.4589013773367
```
Resubmit of https://github.com/pytorch/pytorch/pull/20622
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21289

Differential Revision: D15622713

Pulled By: ezyang

fbshipit-source-id: abe8bd57794bd1c3a0b92395367a9653c5d0f2db
2019-06-04 08:24:42 -07:00
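The effective-bandwidth numbers in the tables above are bytes moved by the kernel divided by elapsed time. A sketch of that arithmetic, reproducing the first float data point (the 4-byte element size and GB/s convention are assumptions inferred from the float figures shown, not taken from the linked gist):

```python
def effective_bandwidth_gbps(elements, seconds, bytes_per_element=4):
    """Bytes written by the kernel divided by elapsed time, in GB/s."""
    return elements * bytes_per_element / seconds / 1e9

# First float "Before" data point from the table above.
bw = effective_bandwidth_gbps(65536, 4.980564117431641e-06)
print(round(bw, 2))  # ~52.63 GB/s, matching the table
```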
f9e746e9c8 Use "important" node to toggle whether or not to build on PR (#21308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21308
ghimport-source-id: 75fba872a658d8257a3f6ff9d9e33a320c6e523e

Differential Revision: D15621909

Pulled By: ezyang

fbshipit-source-id: 6d016d9ffdeb6414d70a1b48ed4766b5dc626353
2019-06-04 08:05:56 -07:00
1291d95e82 Revert "Fix the caffe2_gpu linkage with torch on Windows" (#21335)
Summary:
The original PR (#16071) no longer works after `caffe2` and `torch` were unified. What's more, it bloats the binary, since the optimization flag is now disabled on a very big project (the `torch` library used to be small, but the flag now applies to the whole `caffe2` and `caffe2_gpu` libraries). We need to get it reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21335

Differential Revision: D15622163

Pulled By: soumith

fbshipit-source-id: 900bd400106d27a1512eed1e9f2288114f5f41bb
2019-06-04 07:49:49 -07:00
38d68ad803 Update randomness.rst (#21337)
Summary:
Following [this question on the forums](https://discuss.pytorch.org/t/reproducibility-and-performance/46504), I propose the following doc change. It clarifies that 'performance reduction' concerns the processing speed (and not the training accuracy).

Related website commit: https://github.com/pytorch/pytorch.github.io/pull/211
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21337

Differential Revision: D15622151

Pulled By: soumith

fbshipit-source-id: f0edeb20049f2ee715c400e7c57abb966864d621
2019-06-04 07:38:00 -07:00
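The clarified doc point is that determinism trades processing speed, not accuracy: a fixed seed yields an identical result stream. A stand-in sketch using Python's `random` module (in PyTorch, `torch.manual_seed` and the deterministic cuDNN flags play the analogous role):

```python
import random

def sample(seed, n=5):
    """Draw n floats from a generator with a fixed seed."""
    rng = random.Random(seed)  # fixed seed -> deterministic stream
    return [rng.random() for _ in range(n)]

# Same seed reproduces the exact sequence; the cost of determinism in
# frameworks is slower kernels, not worse results.
assert sample(42) == sample(42)
assert sample(42) != sample(43)
print("reproducible")
```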
ae42a11ab2 Make .circleci Conf class uses dataclasses; use types. (#21284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21284
ghimport-source-id: a628af85b7a085e15903168e957bed1e273d6636

Differential Revision: D15621908

Pulled By: ezyang

fbshipit-source-id: 64e2da8b96cdc1b53c0b314771d225eebf3d4b2d
2019-06-04 07:28:53 -07:00
25a6ff10f0 Add gtest for TensorIterator (#21253)
Summary:
This adds a regression test for the bug fix in #21236. Operations
involving CUDA tensors and CPU scalars should not copy the CPU scalar to
the device (because that is slow). They should instead "lift" the scalar
to a kernel parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21253

Reviewed By: bddppq

Differential Revision: D15604080

Pulled By: colesbury

fbshipit-source-id: c14ded5d584499eaa5ea83337ffc50278205f3d6
2019-06-04 07:23:42 -07:00
fecd5fa171 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 1a30f65182aead9145cb02fb544e2b7a25043f44
2019-06-04 07:23:39 -07:00
2ee2d78a29 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 0256f2f4afaeaa2c16074dbca3b9a03ca434c7de
2019-06-03 23:36:35 -07:00
af4c24153f Honor OMP/MKL environment variables in AT_PARALLEL_NATIVE case (#21189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21189
ghimport-source-id: 4dcfaf04880346ff5ca79ca4dd11c94dcb645ce5

Differential Revision: D15574578

Pulled By: ilia-cher

fbshipit-source-id: 919fccb58b997f9a7add5486a79f9cd4cabaa1ee
2019-06-03 23:22:58 -07:00
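The commit above makes the native parallel backend honor the OMP/MKL environment variables. The precedence can be sketched as follows (the variable names come from the title; the exact fallback order and validation are assumptions for illustration):

```python
import os

def resolve_num_threads(default):
    """Prefer OMP_NUM_THREADS, then MKL_NUM_THREADS, else the default."""
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        val = os.environ.get(var)
        if val is not None and val.isdigit() and int(val) > 0:
            return int(val)
    return default

os.environ["OMP_NUM_THREADS"] = "4"
print(resolve_num_threads(default=8))  # 4: the environment wins over the default
```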
f251416d70 Update fbgemm submodule (#21328)
Summary:
Fix master breakage

cc jianyuh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21328

Differential Revision: D15618649

Pulled By: bddppq

fbshipit-source-id: bce279705520dbd9c6df5fb794cdaeaed48a1a5a
2019-06-03 22:17:04 -07:00
113a27ee45 bake constants into the traced graph, get rid of getNestedValueTrace (#21046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21046
ghimport-source-id: 5cb3efb1896fbe42336e24c14fbf0bb5e646528e

Differential Revision: D15530991

Pulled By: wanchaol

fbshipit-source-id: b096ca5a1cdce496742b7f7e1de3ef8d21e9a8b0
2019-06-03 21:48:11 -07:00
cf356a342b Fix a bug in loop unrolling (#21239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21239
ghimport-source-id: 68256b752be795b32ab3f426848ed1d64fc5ea3e

Reviewed By: suo

Differential Revision: D15590901

Pulled By: zdevito

fbshipit-source-id: 8700aab723d4486fd20d3414df8160b36a3cc5da
2019-06-03 21:35:14 -07:00
6e657c5586 Add CallMethod, inline eagerly (#21116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21116
ghimport-source-id: 3c47e335dd80f52216e50e0a215cedc1862a9e78

Reviewed By: eellison

Differential Revision: D15552816

Pulled By: zdevito

fbshipit-source-id: 708fe87439d94117dca0a26c98f0917f497f718f
2019-06-03 21:35:11 -07:00
0f58d20fe4 Add quantized::fbgemm_linear_unpack operator for serialization (#97)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/97

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20721

- FBGEMM: Add unpack function for PackBMatrix class: Unpack pmat buffer to the origin_buf (Used for the serialization to recover weight matrix).
- PyTorch Quantizer: Add quantized::fbgemm_linear_unpack operator for serialization.

Reviewed By: zafartahirov

Differential Revision: D15314568

fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407
2019-06-03 20:36:30 -07:00
4b576e5184 Do not hardcode build_dir in build_caffe2. Use the build_dir parameter.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21296

Differential Revision: D15613035

Pulled By: bddppq

fbshipit-source-id: 19313cbe0135581990d489f489d366d00962a3c3
2019-06-03 20:31:30 -07:00
702ba3d2fb build torch for libtorch mobile build (#21234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21234
ghimport-source-id: 8d401691a811991c79acf5e09e60389910910365

Differential Revision: D15616540

Pulled By: ljk53

fbshipit-source-id: 150e706630911bf14c55f47f4058eaada1edf1cc
2019-06-03 19:51:05 -07:00
82ceeaeca2 Add options to jit's operator constructor (#21315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21315
ghimport-source-id: 168ddecb333a8cb309e7b859683de9b077123205

Differential Revision: D15614506

Pulled By: bwasti

fbshipit-source-id: ae013a88e2069c38845b5b8ff805db96ab2c29e9
2019-06-03 19:30:22 -07:00
457c0f164e insert missing #pragma once in VariableTypeUtils.h (#21134)
Summary:
Insert the missing #pragma once directive to prevent redefinition errors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21134

Differential Revision: D15607673

Pulled By: li-roy

fbshipit-source-id: 0000fa18e3c55e5d36a64b171d6e85eb4bc211a1
2019-06-03 17:50:56 -07:00
1c5bd1fa65 Automatic update of fbcode/onnx to 5160f3ac3380302224998f1c95e111cd961c4bc5 (#21311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21311

Previous import was 9005291283e943f1a91da5f0acf218bc4e8eb2ca

Included changes:
- **[5160f3ac](https://github.com/onnx/onnx/commit/5160f3ac)**: Fix typo (#2069) <Takeshi Watanabe>
- **[ac218ac6](https://github.com/onnx/onnx/commit/ac218ac6)**: Add a missing step when upgrading an operator (#2071) <daquexian>
- **[5972eed9](https://github.com/onnx/onnx/commit/5972eed9)**: Clarify the axis/size in pads, strides, dilations (#2048) <daquexian>

Reviewed By: bddppq

Differential Revision: D15612734

fbshipit-source-id: 235dc3d49e4a6ccd4f43e6c2f648e87611d52697
2019-06-03 17:35:53 -07:00
02fd1878e3 Cast dropout to float in RNN (#21304)
Summary:
This solves the situation where, for example, someone instantiates LSTM with `dropout=0`, a Python integer. This works fine in Python, but the JIT throws a type error because it expected a float but got an int.

Resolves https://github.com/pytorch/lockdown/issues/65
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21304

Differential Revision: D15613153

Pulled By: jamesr66a

fbshipit-source-id: eabff76e3af3de0612583b37dbc5f7eab7e248a4
2019-06-03 16:59:04 -07:00
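The fix coerces an integer `dropout` argument to float before the typed JIT signature sees it. A minimal sketch of that normalization (the function name is illustrative, not the actual RNN code):

```python
def normalize_dropout(dropout):
    """Accept Python ints like 0 from callers but always hand a float on,
    so a float-typed downstream signature never sees an int."""
    if not isinstance(dropout, (int, float)):
        raise TypeError("dropout must be a number")
    return float(dropout)

print(normalize_dropout(0))    # 0.0 -- int input no longer trips the float check
print(normalize_dropout(0.5))  # 0.5
```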
45de3ef6a7 Export feature length information for onnxifi operator (#21303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21303

Export feature length information for onnxifi operator
recommit for D15548138
disable caffe2_extract_feature_length_for_shape_inference by default
change LOG(INFO) to VLOG(4)
change LOG(WARNING) to LOG_EVERY_N(WARNING, 1000)

Reviewed By: yinghai, ipiszy

Differential Revision: D15608620

fbshipit-source-id: f96410366fe6bae954fea9d6b50ee72f4969d024
2019-06-03 16:53:06 -07:00
7c823312d3 hub doc improvements
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21307

Differential Revision: D15610441

Pulled By: ailzhang

fbshipit-source-id: 2b2a28ed808936cf7c93db31afc6b5ea888ab1b1
2019-06-03 16:29:39 -07:00
22865d4ce1 Add ONNX export support for torch.rand. (#20559)
Summary:
This PR adds support for torch.rand export in the PyTorch ONNX exporter. There are other generator ops that need to be supported for export, and they will be added in subsequent PRs. This op is needed with priority for a model on our end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20559

Differential Revision: D15379653

Pulled By: houseroad

fbshipit-source-id: d590db04a4cbb256c966f4010a9361ab8eb3ade3
2019-06-03 16:09:01 -07:00
7d84ca6e06 clean code to unify the logic to use fp16 by the optimizer engine (#20915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20915

Cleans the unary processor code. Some questions are added in the comments to seek suggestions.

Reviewed By: pjh5

Differential Revision: D15448502

fbshipit-source-id: ef0c45718c1a06187e3fe2e4e59b7f20c641d9c5
2019-06-03 15:03:35 -07:00
3004b397f0 change test_name to be globally unique value across tests (#21206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21206

This diff changes the default test_name to a globally unique value across tests. With that, users can list all the tests and choose to run a specific test.

Reviewed By: zheng-xq

Differential Revision: D15543508

fbshipit-source-id: 0814ef6a60d41637fed5245e30c282497cf21bb8
2019-06-03 14:55:11 -07:00
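Globally unique names let a user list every benchmark and pick one to run. One way such naming can be sketched (the counter-suffix scheme is an assumption, not the benchmark suite's actual implementation):

```python
import itertools

_counter = itertools.count()

def unique_test_name(op, config):
    """Append a global sequence number so repeated op/config pairs stay distinct."""
    return f"{op}_{config}_{next(_counter)}"

names = [unique_test_name("add", "N64"), unique_test_name("add", "N64")]
print(names)  # e.g. ['add_N64_0', 'add_N64_1'] -- no collisions
```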
ca80ec7c97 introduce a new interface to add op [PT changes] (#21149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21149

The diff modifies the interface for PyTorch operators in the benchmark suite

Reviewed By: zheng-xq

Differential Revision: D15433897

fbshipit-source-id: e858183431eb37d90313356716c2de8709372b58
2019-06-03 14:55:08 -07:00
88d033f842 don't materialize constants (#21229)
Summary:
This doesn't affect anything because we run constant pooling, and materializing constants in the case of Closures and Forks creates unnecessary closures over constants.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21229

Differential Revision: D15587764

Pulled By: eellison

fbshipit-source-id: d5609b0a5697071fab5050eb9e03876ab9ebb27a
2019-06-03 13:36:57 -07:00
9a41f44732 Improve ONNX Loop export (#20445)
Summary:
~~This is work in progress due to its dependency on multiple pending PRs.~~

- [x] ONNX: Relax constraint on subgraph input/output type & shape check. https://github.com/onnx/onnx/pull/2009
- [x] PyTorch: Add infra to test_pytorch_onnx_caffe2.py to test ScriptModule models. https://github.com/pytorch/pytorch/pull/20256

This PR should partially resolve https://github.com/pytorch/pytorch/issues/17531. However, ideally we shouldn't need to insert a cast (and reshape) node to help convert the loop condition.

- Added cast node for condition values before entering loop node. The ONNX spec only accepts Bool type, while in PyTorch if the condition value is an output from other node it could potentially have any integral type.
- Tidying up the exported ONNX loop subgraph input type & shape. According to the ONNX spec, input "M" is exported as a 0-d scalar tensor of type int64. Input "Cond" is exported as an incomplete tensor of type Bool without shape information, because throughout the iterations the rank of the condition value is dynamic — either 0-d or 1-d — as long as it holds a single value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20445

Differential Revision: D15534188

Pulled By: houseroad

fbshipit-source-id: d174e778529def05ee666afeee4b8fb27786e320
2019-06-03 13:00:00 -07:00
4980b8b95c Renaming member variables in engine.cpp/h (#21283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21283
ghimport-source-id: 360a138e420ace3cd4ca6ccbc761c8e68319440d

Differential Revision: D15607428

fbshipit-source-id: f8df6b42796a49c4d68fa8366b6a68d5715f6421
2019-06-03 12:54:50 -07:00
37fed9b24a Rename FC to Linear for the function name (#21268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21268

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15599232

fbshipit-source-id: 0046f933657f60807fdca7009676bfb052748d91
2019-06-03 11:55:35 -07:00
63b3c5a66a Replace AT_ASSERTM with TORCH_CHECK (#21267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21267

Replace AT_ASSERTM with TORCH_CHECK: AT_ASSERTM is deprecated.

Not sure when ```AT_ASSERT``` will be deprecated in favor of some new TORCH assert function.

Reviewed By: zafartahirov

Differential Revision: D15599242

fbshipit-source-id: 23f21a9a23dc3c147dc817e6d278066d0832e08d
2019-06-03 11:47:14 -07:00
ad971a37d0 Improve performance of advanced indexing backward (#20557)
Summary:
This PR improves performance of advanced indexing backward, partially solving #15245 (performance is still worse than gather, but not by such outrageous margins). Before, using benchmarking harness from #15245, cuda 10/V100:
```
Indexing is faster by at most -270.61607820767887 us on N: 16 D: 256 K: 1
Indexing is slower by at most 11127.466280784833 us on N: 16 D: 4096 K: 4096
```
after:
```
Indexing is faster by at most 23.524456737696028 us on N: 512 D: 4096 K: 4096
Indexing is slower by at most 186.24056029472553 us on N: 16 D: 1024 K: 4096
```
The strategy is to reuse the embedding backward kernel, adapting it to handle unindexed dimensions at the beginning by launching additional threadblocks, and also allowing it to handle slices bigger than `65K*128`, which is hardly ever a problem for embedding. Still, integer indexing is baked into the kernel and is important for performance, so tensors bigger than 2G elements are not supported for now.
The main savings come from not having to expand index to all unindexed dimensions, and not sorting expanded index with incoming gradient values, but rather only sorting unexpanded index.
There are ways to make sorting overhead smaller (thanks mcarilli for suggestions) but I'll get to it when it becomes a real problem, or rather, when cuda graphs will force us to get rid of thrust::sort calls.
I've also added tests for indexing backward; previously, tests for index_put_ and indexing backward were non-existent.
This PR also fixes #20457 by casting indices to `self` backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20557

Differential Revision: D15582434

Pulled By: ezyang

fbshipit-source-id: 91e8f2769580588ec7d18823d99a26f1c0da8e2a
2019-06-03 11:38:53 -07:00
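The key semantic the reused embedding-backward kernel must preserve is that gradients for repeated indices accumulate rather than overwrite each other. A pure-Python sketch of that accumulation (a toy stand-in for the CUDA kernel, not the actual implementation):

```python
def index_backward(grad_output, indices, input_size):
    """Accumulate grad_output entries into grad_input at the given indices.

    Mirrors the semantics of index_put_(..., accumulate=True) / embedding
    backward: duplicate indices sum their contributions instead of racing.
    """
    grad_input = [0.0] * input_size
    for idx, g in zip(indices, grad_output):
        grad_input[idx] += g
    return grad_input

# Index 2 appears twice, so its gradient contributions sum: 1.0 + 3.0.
print(index_backward([1.0, 2.0, 3.0], [2, 0, 2], input_size=4))
# -> [2.0, 0.0, 4.0, 0.0]
```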
4ac732ed7a file:line for tracing (#21247)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/21217

This adds support for recording file and line information during tracing, by extracting the top Python interpreter frame
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21247

Reviewed By: suo, driazati

Differential Revision: D15594553

Pulled By: jamesr66a

fbshipit-source-id: 72e1b3a46f1dabe3e83a608ec1a7d083bd1720f9
2019-06-03 11:13:49 -07:00
27d1daab45 Export ONNX Dropout for opset 10 (#20710)
Summary:
Remove Dropout from the opset 10 blacklist.
ONNX Dropout was modified in opset 10, but only the "mask" output was changed, and that output is not exported in PyTorch's opset 9 export. So we can still fall back on the opset 9 op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20710

Differential Revision: D15571248

Pulled By: houseroad

fbshipit-source-id: 15267eb63308a29a435261034b2f07324db1dea6
2019-06-03 10:59:56 -07:00
770089c2b8 math module support: isnan, asinh, atanh, cosh, sinh, and tanh (#19337)
Summary:
driazati and eellison, please review. This PR is for #19026; specifically, it adds isnan, asinh, atanh, cosh, sinh, and tanh.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19337

Differential Revision: D15580932

Pulled By: driazati

fbshipit-source-id: 38513fa59088e038264f9f6f0d6374a13a165589
2019-06-03 10:54:42 -07:00
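The newly supported functions follow Python's own `math` module semantics, which in plain Python behave as:

```python
import math

# The six functions added to TorchScript's math support in this change.
assert math.isnan(float("nan"))
assert not math.isnan(1.0)
assert math.asinh(0.0) == 0.0
assert math.atanh(0.0) == 0.0
assert math.cosh(0.0) == 1.0
assert math.sinh(0.0) == 0.0
assert math.tanh(0.0) == 0.0
print("all math checks passed")
```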
fb72625267 Remove onnx export expects (#21238)
Summary:
We're not getting much from checking the export strings, and they are noisy and slow development. Didn't realize they existed until now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21238

Differential Revision: D15604256

Pulled By: eellison

fbshipit-source-id: 488e9401231228cffe132dab99d519563fa63afc
2019-06-03 10:30:12 -07:00
2e59a0a646 add contiguous function type hint for tensor (#21285)
Summary:
Fixes #21261
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21285

Differential Revision: D15604270

Pulled By: soumith

fbshipit-source-id: c1c02348e338477a507052de0a1065cf42a99387
2019-06-03 10:17:03 -07:00
96667dfe41 Write add_scalars data in the same file (#21100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21100

Added a multifile flag to write scalar data into separate files; this can slow down dashboard loading.

Reviewed By: orionr

Differential Revision: D15548913

fbshipit-source-id: dd39a7f76f93025d28f14babbf933e39860e6910
2019-06-03 09:53:27 -07:00
5b33698776 Fix build error in c10 on Windows (#21005)
Summary:
Targets https://github.com/pytorch/pytorch/issues/20635#issuecomment-496265510
Reference:
1. https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=vs-2015#microsoft-specific-predefined-macros
2. https://docs.microsoft.com/en-us/cpp/cpp/deprecated-cpp?view=vs-2019
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21005

Differential Revision: D15543134

Pulled By: ezyang

fbshipit-source-id: f32709b018a7de651cb31575fc6117bfc4dd3bd1
2019-06-03 09:53:24 -07:00
155f767382 Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen (#21287)
Summary:
## Effective Bandwidth Benchmark
- using https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
- on V100
### Float Type
#### Before:
```
normal, size, elements 65536 forward 4.956722259521484e-06 bandwidth (GB/s) 52.88656218258779
normal, size, elements 131072 forward 5.285739898681641e-06 bandwidth (GB/s) 99.18914098114568
normal, size, elements 262144 forward 7.548332214355469e-06 bandwidth (GB/s) 138.91492454529376
normal, size, elements 524288 forward 1.1980533599853516e-05 bandwidth (GB/s) 175.0466273076219
normal, size, elements 1048576 forward 2.091646194458008e-05 bandwidth (GB/s) 200.52645667862762
normal, size, elements 2097152 forward 3.9961338043212894e-05 bandwidth (GB/s) 209.91809610901498
normal, size, elements 4194304 forward 7.39765167236328e-05 bandwidth (GB/s) 226.79110538115253
normal, size, elements 8388608 forward 0.0001377725601196289 bandwidth (GB/s) 243.5494555001696
normal, size, elements 16777216 forward 0.0002710080146789551 bandwidth (GB/s) 247.62686107087774
normal, size, elements 33554432 forward 0.0005375170707702637 bandwidth (GB/s) 249.69947058177252
```
#### After:
```
normal, size, elements 65536 forward 6.198883056640625e-06 bandwidth (GB/s) 42.288908760615385
normal, size, elements 131072 forward 6.756782531738281e-06 bandwidth (GB/s) 77.59432800112916
normal, size, elements 262144 forward 7.560253143310547e-06 bandwidth (GB/s) 138.6958849291706
normal, size, elements 524288 forward 7.550716400146485e-06 bandwidth (GB/s) 277.7421225831386
normal, size, elements 1048576 forward 1.1034011840820313e-05 bandwidth (GB/s) 380.1250225673293
normal, size, elements 2097152 forward 1.802682876586914e-05 bandwidth (GB/s) 465.34019427102237
normal, size, elements 4194304 forward 2.8417110443115234e-05 bandwidth (GB/s) 590.3913430460946
normal, size, elements 8388608 forward 4.8711299896240235e-05 bandwidth (GB/s) 688.8428777608927
normal, size, elements 16777216 forward 9.685993194580078e-05 bandwidth (GB/s) 692.8444265018856
normal, size, elements 33554432 forward 0.00018213510513305663 bandwidth (GB/s) 736.9130069787966
```
### Double Type
#### Before:
```
normal, size, elements 65536 forward 5.8841705322265624e-06 bandwidth (GB/s) 44.55071425348461
normal, size, elements 131072 forward 8.018016815185547e-06 bandwidth (GB/s) 65.38873789925661
normal, size, elements 262144 forward 1.2989044189453124e-05 bandwidth (GB/s) 80.72772597474304
normal, size, elements 524288 forward 2.2075176239013673e-05 bandwidth (GB/s) 95.00046465285668
normal, size, elements 1048576 forward 4.1041374206542965e-05 bandwidth (GB/s) 102.19696784254678
normal, size, elements 2097152 forward 7.57598876953125e-05 bandwidth (GB/s) 110.72624650312186
normal, size, elements 4194304 forward 0.00013725996017456056 bandwidth (GB/s) 122.22949779865557
normal, size, elements 8388608 forward 0.0002614736557006836 bandwidth (GB/s) 128.32815569921402
normal, size, elements 16777216 forward 0.0005080199241638184 bandwidth (GB/s) 132.0988819689674
normal, size, elements 33554432 forward 0.0009479570388793945 bandwidth (GB/s) 141.58629821311564
```
#### After:
```
normal, size, elements 65536 forward 5.991458892822265e-06 bandwidth (GB/s) 43.75294977222444
normal, size, elements 131072 forward 7.293224334716797e-06 bandwidth (GB/s) 71.88699756626349
normal, size, elements 262144 forward 8.094310760498048e-06 bandwidth (GB/s) 129.54481623281296
normal, size, elements 524288 forward 1.2805461883544922e-05 bandwidth (GB/s) 163.7701177100726
normal, size, elements 1048576 forward 2.2592544555664064e-05 bandwidth (GB/s) 185.64991604491345
normal, size, elements 2097152 forward 3.801822662353516e-05 bandwidth (GB/s) 220.6470092112881
normal, size, elements 4194304 forward 6.761550903320313e-05 bandwidth (GB/s) 248.1267425164457
normal, size, elements 8388608 forward 0.00013209104537963867 bandwidth (GB/s) 254.02503177684966
normal, size, elements 16777216 forward 0.0002667689323425293 bandwidth (GB/s) 251.56176699703818
normal, size, elements 33554432 forward 0.0004705166816711426 bandwidth (GB/s) 285.25604559501795
```

Resubmit of #20621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21287

Differential Revision: D15603695

Pulled By: ezyang

fbshipit-source-id: f8c5032678d503d45ac99fb1475a929df7c2b361
2019-06-03 09:45:02 -07:00
21113c2d36 EliminateGuards
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21070

Differential Revision: D15603561

Pulled By: Krovatkin

fbshipit-source-id: 03056688e8b99eddcb30d80cc20ab37ad3f13af2
2019-06-03 09:39:45 -07:00
c8539be962 Make is_contiguous checks generic in number of arguments (#21106)
Summary:
Loops.h contains specializations for cases where all the inputs are
contiguous as well as cases where one input is a scalar and all other
inputs are contiguous.

Previously, there were separate checks for functions that take
zero, one, or two input arguments. This is getting unwieldy, especially
once we add support for functions that take three inputs (#21025).

This requires the use of recursive templates (which have their own
downsides), but this seems better than the alternative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21106

Differential Revision: D15562430

Pulled By: colesbury

fbshipit-source-id: 5f19ab2212e16e29552887f4585c2b4a70309772
2019-06-03 09:19:19 -07:00
b159e0ce08 Significantly simplify the spawning of pytorch libs building process. (#21105)
Summary:
Instead of attempting to hardcode calls to "ninja" or "make", we should always let cmake do it. This better integrates build configurations (DEBUG or REL_WITH_DEB_INFO) and better handles the case in which the native build tool is not in PATH (cmake has some capacity to find them and has options for users to specify their locations).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21105

Differential Revision: D15602883

Pulled By: soumith

fbshipit-source-id: 32ac46d438af00e791defde6ae5ac21c437d0bb0
2019-06-03 08:28:19 -07:00
f62a006097 Retry Fix Python DataParallel RNN in no_grad mode (#21262)
Summary:
Retry #21197

The previous one failed because it uses some Python3 only syntax.

ezyang Do we still have multi-GPU py2 tests? I am curious why the CI tests did not catch this error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21262

Differential Revision: D15598941

Pulled By: mrshenli

fbshipit-source-id: 95f416589448c443685d6d236d205b011998a715
2019-06-03 08:04:35 -07:00
0c6efbd410 Fix gelu documents (#21265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21265

Fix gelu documents

Reviewed By: hl475

Differential Revision: D15598958

fbshipit-source-id: 483040069102daada705401c36c8990598142d3d
2019-06-02 20:17:56 -07:00
eaa3ba6587 Add autograd for layer_norm on CPU (#20883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20883

Add autograd for layer_norm on CPU, after this diff, both PyTorch and jit model can automatically benefit from performance improvement of nn.functional.layer_norm

Reviewed By: zheng-xq

Differential Revision: D15483790

fbshipit-source-id: 94ed3b16ab6d83ca6c254dbcfb224ff7d88837f3
2019-06-02 16:55:32 -07:00
31c79b71ff Add gelu gradient for pytorch (#21237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21237

Add gelu gradient for pytorch

Reviewed By: zheng-xq

Differential Revision: D15589816

fbshipit-source-id: 76fda7c413afed5b6cc3abe3a26c258d393a53ce
2019-06-02 09:42:42 -07:00
93ae040ff0 Add gelu activation in pytorch (#20665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20665

Add gelu activation forward on CPU in pytorch

Compare to current python implemented version of gelu in BERT model like

  def gelu(self, x):
      return x * 0.5 * (1.0 + torch.erf(x / self.sqrt_two))

The torch.nn.functional.gelu function can reduce the forward time from 333ms to 109ms (with MKL) / 112ms (without MKL) for input size = [64, 128, 56, 56] on a devvm.

Reviewed By: zheng-xq

Differential Revision: D15400974

fbshipit-source-id: f606b43d1dd64e3c42a12c4991411d47551a8121
2019-06-02 09:08:47 -07:00
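For reference, the erf-based formula computed by the BERT-style Python implementation above can be sketched in plain Python (a minimal sketch using `math.erf`; the actual `torch.nn.functional.gelu` kernel is implemented in C++):

```python
import math

def gelu(x):
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF,
    # computed exactly via the error function
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

For large positive inputs the function approaches the identity, and for large negative inputs it approaches zero, which is the behavior the fused kernel must reproduce.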
aac424a6c4 Revert D15577342: [pytorch][PR] Fix Python DataParallel RNN in no_grad mode
Differential Revision:
D15577342

Original commit changeset: 1a024c572171

fbshipit-source-id: 9a3ddc14ebb2d75d9dc3ee1fe69df9ffba3529de
2019-06-01 22:17:19 -07:00
360e6d1b0b Fixes a bug in the test (#21146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21146

The error was reported by https://our.intern.facebook.com/intern/test/562949965807317?ref_report_id=1837062

The API changed from `a.quantize_linear(...)` to `torch.quantize_linear(a, ...)`

Reviewed By: dskhudia

Differential Revision: D15557418

fbshipit-source-id: 88463e09fdf1f574f1b8128f6a00c2810091cd03
2019-06-01 18:00:33 -07:00
62ae348d1a Exclude file:line from graphs used for fuser kernel cache (#21252)
Summary:
cc ezyang this is meant to fix the fuser failures on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21252

Differential Revision: D15594283

Pulled By: jamesr66a

fbshipit-source-id: 85f37e78b2de051c92ade3fe4c44c7530b4542e5
2019-06-01 16:18:55 -07:00
7c40576c61 Save the weight shape info the first time we have chance to extract it (#21233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21233

It is possible that OnnxifiOp is created in a thread where weights have been cleaned from the workspace, which is a legitimate use case, as we can create the backend once and lower all the weights. So we need to extract the weight shape info the first time we create the backend and save it.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D15587237

fbshipit-source-id: 1f264dc32c0398c42b618e9c41c119eb13e1c9f1
2019-06-01 12:55:29 -07:00
0efc527dd1 Revert D15548138: Export feature length information for onnxifi operator
Differential Revision:
D15548138

Original commit changeset: 460118648bb4

fbshipit-source-id: 1a25ca2942d804f6c88e96c436f09f68c260b9be
2019-06-01 12:41:47 -07:00
51ebbe970a Fix Python DataParallel RNN in no_grad mode (#21197)
Summary:
Fixes #21108

When grad is disabled, Python autograd function outputs are [wrapped as detached aliases](8cde4c4d22/torch/csrc/autograd/python_function.cpp (L395-L399)), which prevents calling `Tensor.set_()` on them after recent changes to Tensors and Variables. This becomes a problem when users call `rnn.flatten_parameters()` in the forward pass, as that function [calls `set_()`](9d09f5df6c/aten/src/ATen/native/cudnn/RNN.cpp (L669)).

The proposed solution is to avoid using an autograd Broadcast if in no_grad mode.

apsdehal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21197

Differential Revision: D15577342

Pulled By: mrshenli

fbshipit-source-id: 1a024c572171a3f2daca9454fd3ee6450d112f7c
2019-06-01 10:37:57 -07:00
f051fbd4a8 Fix typo in test_dataloader
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21226

Differential Revision: D15592797

Pulled By: soumith

fbshipit-source-id: b9a83e574c7b10fb0d661332ab68e376409a4724
2019-06-01 10:30:14 -07:00
d168a8533f compare scalar device with common device (#21236)
Summary:
I think there was a typo in #20690 here https://github.com/pytorch/pytorch/pull/20690/files#diff-b47a50873394e38a005b4c1acd151957R130.
The original conditional was `common_backend == Backend::CUDA && op.tensor.type().backend() == Backend::CPU`; now it is `op.device.is_cuda() && op.tensor.device().is_cpu()`. Since `op.device` and `op.tensor.device()` should be the same, this conditional is never true, which leads to spurious host-to-device copies for operations between CUDA tensors and CPU scalars: CPU scalars are now sent to the GPU instead of being passed to the lambdas directly.
Unfortunately, I don't know how to test this change, because functionally everything was fine after #20690; it was just a performance regression.

cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21236

Differential Revision: D15592754

Pulled By: soumith

fbshipit-source-id: 105bfecc61c222cfdb7294a03c9ecae3cc7f5817
2019-06-01 10:24:31 -07:00
41b17e2458 Fix wrong type hints for Tensor.is_cuda, is_leaf (#21192)
Summary:
`Tensor.is_cuda` and `is_leaf` is not a predicate function but a `bool` attribute. This patch fixes the type hints in `torch/__init__.pyi` for those attributes.

```diff
- def is_cuda(self) -> bool: ...
+ is_cuda: bool
- def is_leaf(self) -> bool: ...
+ is_leaf: bool
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21192

Differential Revision: D15592766

Pulled By: soumith

fbshipit-source-id: 8c4ecd6939df8b8a8a19e1c9db6d40193bca7e4a
2019-06-01 10:04:52 -07:00
be7fc40621 Fix sccache not being used on Windows (#21248)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21167.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21248

Differential Revision: D15592742

Pulled By: soumith

fbshipit-source-id: 4add002698c13301f142526cd783c866d345bf5e
2019-06-01 09:47:39 -07:00
619261d7a7 Add file-line info for jit.load and string frontend (#21217)
Summary:
This makes file-line reporting also work for things loaded using `torch.jit.load()` as well as the string frontend (via `CompilationUnit`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21217

Differential Revision: D15590838

Pulled By: jamesr66a

fbshipit-source-id: 6b6a12574bf9eca0b83f24f0b50535fda5863243
2019-05-31 23:43:15 -07:00
b663eec119 Lazily build error strings in schema matching using replay. (#21241)
Summary:
Saves ~20% (5.3s -> 4.3s) loading DenseNet on my laptop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21241

Differential Revision: D15590338

fbshipit-source-id: 2c8aebc829d4ea46f358d74d396cc44f5f57fcf5
2019-05-31 23:34:20 -07:00
5bc7c1f83d fix contribution and governance links (#21243)
Summary:
Updated web links on contribution_guide and governance documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21243

Differential Revision: D15591065

Pulled By: soumith

fbshipit-source-id: fdcfc518605a08a2ac35a10c146122d7d0a3f609
2019-05-31 21:02:13 -07:00
85786bea7d Export feature length information for onnxifi operator (#21110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21110

Export feature length information for onnxifi operator

Reviewed By: ipiszy

Differential Revision: D15548138

fbshipit-source-id: 460118648bb4467c096f79dea524060c9524f23d
2019-05-31 20:25:34 -07:00
516ea33f6a add PT maxpool and avgpool ops to the benchmark suite (#21200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21200

This diff adds MaxPool1d/2d/3d and AvgPool1d/2d/3d to the benchmark suite.

Reviewed By: hl475

Differential Revision: D15541980

fbshipit-source-id: 394d136ee94a16ee24285939323ca5fe317e99d3
2019-05-31 19:35:29 -07:00
dceea73460 add PT conv and convtranspose ops to the benchmark suite (#21199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21199

This diff adds Conv1d, ConvTranspose1d, Conv2d, ConvTranspose2d, Conv3d, and ConvTranspose3d operators to the benchmark suite.

Reviewed By: hl475

Differential Revision: D15520817

fbshipit-source-id: 5512afec2be8a1036fbcd170f70265c7e455fcde
2019-05-31 19:35:25 -07:00
2d75d31398 add PT linear op to the benchmark suite (#21204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21204

as title

Reviewed By: hl475

Differential Revision: D15484743

fbshipit-source-id: 7094a983e370e1c3952021146b58b844874b7d5e
2019-05-31 19:35:22 -07:00
00b3e69211 add PT batchnorm op to the benchmark suite (#21201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21201

as title

Reviewed By: hl475

Differential Revision: D15482581

fbshipit-source-id: d93713a35be41e76d077df419cb24585f69d72eb
2019-05-31 19:35:18 -07:00
ed1078bde3 migrate matmul operator to the new interface (#21198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21198

as title

Reviewed By: hl475

Differential Revision: D15325768

fbshipit-source-id: a5d7c6837cd09445e75846660d12807dd26af6cc
2019-05-31 19:35:15 -07:00
c8dc707fee avoid multiple writes to files on export (#21186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21186
ghimport-source-id: 2f62fed50e0d74f4162b74b6a2f44b8baa376316

Differential Revision: D15581527

Pulled By: suo

fbshipit-source-id: b1150cfa47d8df6f217f048c742a5ba9fa7f7935
2019-05-31 19:14:46 -07:00
4c19421f16 Register gradient op with engine (#21205)
Summary:
cc dreiss
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21205

Differential Revision: D15578948

Pulled By: bddppq

fbshipit-source-id: ef285174e8637daef624c8088ebd903a70582345
2019-05-31 18:48:47 -07:00
daa1e2de1a Add file:line:graph to graph printout (#21180)
Summary:
Example:

```
import torch

torch.jit.script
def foo(x):
    y = torch.neg(x)
    return x - y

print(foo.graph.debug_str())
```

```
graph(%x : Tensor):
  %2 : int = prim::Constant[value=1]()
  %y : Tensor = aten::neg(%x) # demo.py:5:9
  %3 : Tensor = aten::sub(%x, %y, %2) # demo.py:6:12
  return (%3)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21180

Differential Revision: D15583548

Pulled By: jamesr66a

fbshipit-source-id: 0c6dc2fb7555c01dde9c563b78422ef234b2681b
2019-05-31 18:14:18 -07:00
678dc44d4c use _sparse_coo_tensor_unsafe in coalesce for speedup (#21214)
Summary:
Studied why sparse tensor coalesce was slow:  issue #10757.

Using nvprof and a simple benchmark, I determined that the bulk of the time was spent in ``kernelTransformReduceInnermostDimIndex``, which is called when a sparse tensor is constructed with sparse_coo_tensor and it sanity-checks the minimum and maximum indices. However, we do not need this sanity check here, because coalescing the tensor cannot change these min/max values.

On my benchmark with 1 million non-zeros, the runtime of coalesce dropped from 0.52 s to 0.005 s.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21214

Reviewed By: bddppq

Differential Revision: D15584338

Pulled By: akyrola

fbshipit-source-id: a08378baa018dbd0b45d7aba661fc9aefd3791e0
2019-05-31 17:10:05 -07:00
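The semantics that make the bounds re-check redundant can be sketched in plain Python: coalescing only merges duplicate coordinates and sums their values, so the set of distinct indices (and hence their min/max) cannot change. This is a toy model; real COO tensors store a 2×nnz index tensor and a values tensor:

```python
def coalesce_coo(indices, values):
    # indices: list of (row, col) coordinates, possibly with duplicates
    # values: matching list of floats
    acc = {}
    for idx, v in zip(indices, values):
        acc[idx] = acc.get(idx, 0.0) + v  # duplicate coordinates are summed
    keys = sorted(acc)  # coalesced tensors keep indices sorted
    return keys, [acc[k] for k in keys]

idx, val = coalesce_coo([(0, 1), (0, 1), (1, 0)], [1.0, 2.0, 3.0])
# Every output coordinate already appeared in the input, so the min/max
# sanity check done by sparse_coo_tensor is redundant after coalescing.
```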
9e5f1db66b Reuse common options between ONNXIFI and TVM transformations (#21163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21163

These two backend transformation share some common traits. Therefore we want to reuse the data struct/code as much as possible.

Reviewed By: hlu1

Differential Revision: D15561177

fbshipit-source-id: 35f5d63b2b5b3657f4ba099634fd27c3af545f1b
2019-05-31 17:01:36 -07:00
b12a5f6155 schema_matching.cpp: mark internal functions as static. (#21140)
Summary:
Some of the functions are only used in this file - mark them `static`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21140

Differential Revision: D15578076

Pulled By: Krovatkin

fbshipit-source-id: 71ae67baabebd40c38ecb9292b5b8202ad2b9fc1
2019-05-31 16:40:16 -07:00
668dbcc41b migrate intraop benchmarks to the new interface (#21202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21202

Migrate Ilia's op benchmarks to the new interface

Reviewed By: hl475

Differential Revision: D15322577

fbshipit-source-id: 8e75d51e7ddacbd56896c55f2996a9358491d83e
2019-05-31 16:19:04 -07:00
c62d476206 migrate add operator to the new interface (#21152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21152

Migrate existing add benchmark to use the new op front-end

Reviewed By: zheng-xq

Differential Revision: D15325524

fbshipit-source-id: 34e969e1bd289913d881c476711bce9f8ac18a29
2019-05-31 16:19:00 -07:00
fd19d06db4 remaining use of t.quantize_linear (#21219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21219

att

Differential Revision: D15583802

fbshipit-source-id: 742e8b799d67485b2d48b1458839f3f3b000f200
2019-05-31 16:05:44 -07:00
4dbeb87e52 PyTorch Dockerfile should update submodules recursively.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21216

Differential Revision: D15584114

Pulled By: bddppq

fbshipit-source-id: dbe0c3a54024a90fcd2c6689f8b9689ed0cd639b
2019-05-31 14:56:57 -07:00
0aeb971622 conditionally defined var better error message (#20911)
Summary:
I will handle loops in a follow-up after some other changes I am working on have landed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20911

Differential Revision: D15497205

Pulled By: eellison

fbshipit-source-id: 8cac197c6a6045b27b552cbb39e6fc86ca747b18
2019-05-31 14:32:03 -07:00
2f4824b2fb Add support for recursive compilation on Modules (#20708)
Summary:
Following on #19747, this implements most of the `torch.jit.script()` changes laid out in #20939.

Still to do:
* Accessing a method from Python does not add it as a `ScriptMethod` (so only `export`ed methods and `forward` are compiled)
* Calling a method other than `forward` on a submodule doesn't work

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20708

Pulled By: driazati

Differential Revision: D15560490

fbshipit-source-id: cc7ef3a1c2772eff9beba5f3e66546d2b7d7198a
2019-05-31 14:27:16 -07:00
834d678eb8 Remove old custom op implementation (#21085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21085

Now that torch::jit::RegisterOperators() always passes through to torch::RegisterOperators() (see diffs stacked below this), we can remove the old custom op implementation.

Reviewed By: dzhulgakov

Differential Revision: D15542261

fbshipit-source-id: ef437e6c71950e58fdd237d6abd035826753c2e4
2019-05-31 13:51:14 -07:00
384d828ea5 Add aliasAnalysis to torch::RegisterOperators() (#21084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21084

- Now AliasAnalysisKind can be set using the torch::RegisterOperators() API
- This also allows us to remove the last place in torch::jit::RegisterOperators that didn't use c10 yet.

Reviewed By: dzhulgakov

Differential Revision: D15542097

fbshipit-source-id: ea127ecf051a5c1e567e035692deed44e04faa9e
2019-05-31 13:51:07 -07:00
80556761c8 c10::OperatorOptions (#21181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21181

Implement c10::OperatorOptions as a class to store metadata about operators.
This is meant to replace torch::jit::OperatorOptions.

Reviewed By: dzhulgakov

Differential Revision: D15569897

fbshipit-source-id: 95bf0bf917c1ef2bdf32702405844e1a116d9a64
2019-05-31 13:51:00 -07:00
b91e0d14a7 registration options should only be callable on rvalues (#21079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21079

They're invalidating *this, so they shouldn't be callable on non-rvalues.

Reviewed By: dzhulgakov

Differential Revision: D15541583

fbshipit-source-id: a2a9dafb29af03477486ea2ce9029399f557c728
2019-05-31 13:50:54 -07:00
181792176d Implement various AliasAnalysis operations directly on top of MemoryLocations. (#21203)
Summary:
This reduces DenseNet load time by about 25% (down to 5.3s on my laptop) and gets AliasAnalysis out of the profile top hits entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21203

Differential Revision: D15578155

fbshipit-source-id: ddbb1ad25c9540b5214702830084aa51cc6fd3cb
2019-05-31 13:38:32 -07:00
e098878d75 Cuda persistent softmax (#20827)
Summary:
Adds persistent cuda kernels that speed up SoftMax applied over the fast dimension, i.e. torch.nn.Softmax(dim=-1) and torch.nn.LogSoftmax(dim=-1). When the size is <= 1024, this code is 2-10x faster than the current code, speedup is higher for smaller sizes. This code works for half, float and double tensors with 1024 or fewer elements in the fast dimension. Numerical accuracy is on par with the current code, i.e. relative error is ~1e-8 for float tensors and ~1e-17 for double tensors. Relative error was computed against the CPU code.

The attached image shows kernel time in us for torch.nn.Softmax(dim=-1) applied to a half precision tensor of shape [16384,n], n is plotted along the horizontal axis. Similar uplifts can be seen for the backward pass and for LogSoftmax.

![image](https://user-images.githubusercontent.com/41591019/58212822-b63ebb00-7cb5-11e9-910d-1fc7d8585d58.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20827

Differential Revision: D15582509

Pulled By: ezyang

fbshipit-source-id: 65805db37487cebbc4ceefb1a1bd486d24745f80
2019-05-31 13:20:15 -07:00
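The row-wise computation these kernels perform is the standard max-subtracted (numerically stable) softmax; the persistent variant's speedup comes from keeping the whole row in registers, not from different math. A minimal Python sketch of one row:

```python
import math

def softmax_row(xs):
    m = max(xs)  # subtract the row max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def log_softmax_row(xs):
    # same trick in log space: x - m - log(sum(exp(x - m)))
    m = max(xs)
    log_s = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - log_s for x in xs]
```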
052bab7069 Move legacy TH functions(sinh,cosh) to TensorIterator + Vec256 (#21115)
Summary:
This is a follow up on Jame's PR: https://github.com/pytorch/pytorch/pull/19041. The idea is to replace the legacy `sinh` / `cosh` ops that are being dispatched to TH with the operations defined in `Vec256` for better performance.

benchmark(from Jame's script):

```python
import torch, time
ops = ['sinh', 'cosh']
x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')
for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')
```
code on master:

```
op	time per iter (ms)	gops/s	GB/s
sinh	3.37614369392395	0.3105839369002935	2.484671495202348
cosh	3.480502033233643	0.3012714803748572	2.4101718429988574
```
after change (on Macbook pro 2018):

```
op	time per iter (ms)	gops/s	GB/s
sinh	0.8956503868103027	1.1707425301677301	9.365940241341841
cosh	0.9392147302627564	1.1164390487217428	8.931512389773943
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21115

Reviewed By: ljk53

Differential Revision: D15574580

Pulled By: xta0

fbshipit-source-id: 392546a0df11ed4f0945f2bc84bf5dea2750b60e
2019-05-31 12:06:26 -07:00
7f960a9c01 remove quantize_linear from Tensor method (#21196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21196

we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```

Differential Revision: D15577123

fbshipit-source-id: d0abeea488418fa9ab212f84b0b97ee237124240
2019-05-31 12:01:10 -07:00
c185145d8c remove dependency to caffe2::math and eigen (#21169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21169

We should minimize dependency from perfkernels (we were including eigen header files only in cc files not compiled with avx or avx2 options but better to be very strict because it's easy to introduce illegal instruction errors in perfkernels)

Reviewed By: salexspb

Differential Revision: D15563839

fbshipit-source-id: d4b1bca22d7f2e6f20f23664d4b99498e5984586
2019-05-31 11:55:16 -07:00
8c927b208c improve test_docs_coverage error messages (#21029)
Summary:
Most important fix: Correct "tensor.rst" to "tensors.rst"

Secondary fix: some minor English spelling/grammar fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21029

Differential Revision: D15523230

Pulled By: umanwizard

fbshipit-source-id: 6052d8609c86efa41a4289cd3a099b2f1037c810
2019-05-31 11:13:39 -07:00
e13b483f58 Fix weak module cuda() _flat_weights bug (#21107)
Summary:
Dynamically creating a type at runtime was messing up the MRO and has been causing many other problems. I think it's best to delete it. This causes a regression, since
```python
self.linear = nn.Linear(10, 10)
isinstance(self.linear, nn.Linear)
```
will now be `False` again, but this will be fixed once recursive script mode is the default (#20939)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21107

Pulled By: driazati

Differential Revision: D15560549

fbshipit-source-id: 7bd6b958acb4f353d427d66196bb4ee577ecb1a6
2019-05-31 10:35:30 -07:00
0223d3744a introduce a new intrace to add op [C2 changes] (#21148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21148

The diff modifies the interface for Caffe2 operators in the benchmark suite

Reviewed By: zheng-xq

Differential Revision: D15433888

fbshipit-source-id: c264a95906422d7a26c10b1f9836ba8b35e36b53
2019-05-31 09:21:07 -07:00
31089b02ce introduce a new interface to add op [core changes] (#21147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21147

This diff introduces a new interface to add PT/C2 operators to the benchmark suite.

The following steps are needed to add a new operator:
1. Specify the input shapes, args to an operator in configs
2. Create a PT/C2 benchmark class which includes ```init``` (create tensors),  ```forward``` (specify the operator to be tested.), and ```backward```(gradient of an op.) methods
3. call generate_pt_test/generate_c2_test to create test cases based on configs

Reviewed By: zheng-xq

Differential Revision: D15250380

fbshipit-source-id: 1025a7cf60d2427baa0f3f716455946d3d3e6a27
2019-05-31 09:21:04 -07:00
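The three steps above can be sketched with a toy, framework-free benchmark class. All names here — the config list, `AddBenchmark`, and the manual generation loop standing in for `generate_pt_test`/`generate_c2_test` — are hypothetical illustrations, not the suite's real API:

```python
class AddBenchmark:
    # step 2: a benchmark class with init/forward (and, for ops with
    # gradients, a backward method)
    def init(self, M, N):
        self.a = [[1.0] * N for _ in range(M)]
        self.b = [[2.0] * N for _ in range(M)]

    def forward(self):
        # the operator under test: elementwise add
        return [[x + y for x, y in zip(ra, rb)]
                for ra, rb in zip(self.a, self.b)]

configs = [(2, 3), (4, 4)]  # step 1: input shapes / args

results = []
for M, N in configs:        # step 3: one generated test case per config
    bm = AddBenchmark()
    bm.init(M, N)
    results.append(bm.forward())
```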
012069ca8f Revert D15454048: Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen
Differential Revision:
D15454048

Original commit changeset: 8bfc57bf015b

fbshipit-source-id: 98c562ab4cf7a00e9041b2aa50eb7fb0f0c48f69
2019-05-31 07:49:22 -07:00
dc8f306b8e Revert D15454052: Move THCTensor_(cauchy) to ATen
Differential Revision:
D15454052

Original commit changeset: 4f4d33ec11cf

fbshipit-source-id: 832a738796e6b6bdf969a44bb2cdcf171cbd5f77
2019-05-31 07:49:18 -07:00
be9ce6318e remove import torchvision when testing torch.hub (#21132)
Summary:
This should pass once https://github.com/pytorch/vision/pull/971 is merged.
To remove torchvision as a baseline, we just compare against the sum of all `param.sum()` values in the pretrained resnet18 model, which means we only need to manually update the number when those pretrained weights change, which is generally rare.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21132

Differential Revision: D15563078

Pulled By: ailzhang

fbshipit-source-id: f28c6874149a1e6bd9894402f6847fd18f38b2b7
2019-05-31 07:38:30 -07:00
e161360b62 Revert D15558784: [reland][pt1][quant] remove quantize_linear from Tensor method
Differential Revision:
D15558784

Original commit changeset: 0b194750c423

fbshipit-source-id: d180a7f76bb05ad7470f17bc3d2bd614fab16529
2019-05-31 06:20:05 -07:00
5fcd37bd8f List (#21164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21164

Write a List type to be used in operator kernels. This abstracts away from the concrete list type used (e.g. std::vector vs SmallVector)
and allows us to change these implementation details without breaking the kernel API.
Also, this class allows for handling List<bool>, which would not work with ArrayRef because vector<bool> is a bitset and can't be converted to ArrayRef<bool>.

Reviewed By: ezyang

Differential Revision: D15476434

fbshipit-source-id: 5855ae36b45b70437f996c81580f34a4c91ed18c
2019-05-31 04:15:39 -07:00
f91f24764e remove quantize_linear from Tensor method (#21156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21156

we'll add `quantize(quantizer)` as a tensor method later when we expose `quantizer` in Python frontend
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```

Differential Revision: D15558784

fbshipit-source-id: 0b194750c423f51ad1ad5e9387a12b4d58d969a9
2019-05-30 22:02:12 -07:00
0a0ff83124 replace num_bits with quant_min and quant_max (#21097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21097

att

Differential Revision: D15547166

fbshipit-source-id: 60bc7f7d82c424558b67881627fb74f1eff515af
2019-05-30 20:57:57 -07:00
277bf69fa0 Add torch.load/torch.save for QTensor (#20830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20830

att

Reviewed By: dzhulgakov

Differential Revision: D15340701

fbshipit-source-id: 677038c8101f66dec4856c2eccf9f9e394012226
2019-05-30 20:52:19 -07:00
eb4d43df3b Make CUDA triu / tril support batches of size > 65535 (#21067)
Summary:
In the previous implementation of triu / tril, we passed the batch size in the 2nd dimension of a grid. This is limited to 65535, which means that performing triu / tril on a tensor with batch size > 65535 will throw an error. This PR removes the dependence on the 2nd dimension, and corresponding non-contiguity constraints.

Changelog:
- Compute offset, row and col in the kernel
- Use 1st dimension of grid alone
- Remove unnecessary contiguity checks on tensors as a result of this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21067

Differential Revision: D15572501

Pulled By: ezyang

fbshipit-source-id: 93851cb661918ce794d43eeb12c8a38762e1358c
2019-05-30 20:16:11 -07:00
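The index arithmetic described in the changelog — recovering (batch, row, col) from a single flat index so only the grid's first dimension is needed — can be sketched in Python. This assumes a contiguous batch of rows×cols matrices; the real kernel performs this computation per CUDA thread:

```python
def decompose(linear_idx, rows, cols):
    # split a flat element index into (batch, row, col)
    per_batch = rows * cols
    batch, rem = divmod(linear_idx, per_batch)
    row, col = divmod(rem, cols)
    return batch, row, col

def tril_keeps(linear_idx, rows, cols, diagonal=0):
    # an element survives tril if it lies on or below the shifted diagonal
    _, row, col = decompose(linear_idx, rows, cols)
    return col - row <= diagonal
```

Because the batch index is derived from the flat index rather than from the grid's second dimension, batch sizes well beyond 65535 pose no problem.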
057ddab766 on import, register class before defining it (#21182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21182
ghimport-source-id: 2457a4306c0a72888bb8359a267fcd12b43f103a

Differential Revision: D15571334

Pulled By: suo

fbshipit-source-id: 26ca9dddb25df1b1eac2e17c70f682e20e08cb6d
2019-05-30 20:09:01 -07:00
d6438c956b Move THCTensor_(cauchy) to ATen (#20622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20622
ghimport-source-id: b100d6cededf6f2c2020c3d7961271f16497bbdc

Differential Revision: D15454052

Pulled By: ezyang

fbshipit-source-id: 4f4d33ec11cf36b91c67759bd27252d1e457cff1
2019-05-30 18:13:16 -07:00
26d16ae515 Move THCTensor_{normal, normal_means, normal_stddevs, normal_means_stddevs} to ATen (#20621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20621
ghimport-source-id: f461d7f1eb6b5a8306dd8175cbb0a7fcc9f64c76

Differential Revision: D15454048

Pulled By: ezyang

fbshipit-source-id: 8bfc57bf015b85f57ed99a54176926386aab4e34
2019-05-30 18:01:31 -07:00
07ac00d21a Automatic update of fbcode/onnx to 9005291283e943f1a91da5f0acf218bc4e8eb2ca (#21057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21057

Previous import was cc2333a3f929caca7223b98699237f19388dd585

Included changes:
- **[90052912](https://github.com/onnx/onnx/commit/90052912)**: Fix wrong condition and add --user in update_doc.sh (#2050) <daquexian>
- **[a4f44a20](https://github.com/onnx/onnx/commit/a4f44a20)**: Add bit-shift operators for supporting hashing (#1931) <Wei-Sheng Chin>
- **[0098752c](https://github.com/onnx/onnx/commit/0098752c)**: Add shape inference logic for Expand op (#2041) <Hariharan Seshadri>
- **[fbe8addb](https://github.com/onnx/onnx/commit/fbe8addb)**: update qops tests (#2040) <Ashwini Khade>
- **[874fb37c](https://github.com/onnx/onnx/commit/874fb37c)**: Fix torchvision installation (#2054) <bddppq>
- **[1f5f6582](https://github.com/onnx/onnx/commit/1f5f6582)**: Fix bug that kernel_shape rather than effective_kernel_shape is used in dilated conv (#2043) <daquexian>
- **[38b6c44e](https://github.com/onnx/onnx/commit/38b6c44e)**: Changes done internally at Facebook (#2035) <Lu Fang>
- **[5c51f0db](https://github.com/onnx/onnx/commit/5c51f0db)**: Explicitly specify type of integers in the input tensor. (#2034) <Dmitri Smirnov>

Reviewed By: benoitsteiner

Differential Revision: D15534241

fbshipit-source-id: 8d2b78a986e5b7fbeb248f2d7b80c1a07230654e
2019-05-30 17:33:18 -07:00
ff0d00f921 Updated scalar type to onnx mapping (#21095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21095
ghimport-source-id: 32a79eace02216de9170f163027b1aa93756b821

Differential Revision: D15546175

Pulled By: izdeby

fbshipit-source-id: 4e47c8538aaf30b4af198baac7279133e4d74b36
2019-05-30 17:11:12 -07:00
726caeace3 Use QTensor for bias (#21038)
Summary:
Use QTesnor for bias tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21038

Differential Revision: D15524980

Pulled By: dskhudia

fbshipit-source-id: c7bf2efc8fe3f4b5574c721c2f64ff073045ecc4
2019-05-30 16:16:03 -07:00
64f06d4964 Enable all and any for bool tensors (#21033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21033
ghimport-source-id: 35fdcf27b0bde8ec3e5b3051cf0d730f20f94783

Differential Revision: D15530497

Pulled By: izdeby

fbshipit-source-id: 9c15cc960055f59a05ce0276f9d51c567626d966
2019-05-30 16:16:00 -07:00
9a22cb9f49 Enabled add, sum and mul for bool tensor (#21032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21032
ghimport-source-id: 6ab21752b4af451e8b10a0e02cd5d726aa7472f0

Differential Revision: D15530496

Pulled By: izdeby

fbshipit-source-id: f4f83aa80eafbb4f307aadc1a13d8cdcf3055c24
2019-05-30 16:11:43 -07:00
fe39602451 Support for rudimentary f-strings (#21037)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/51

This adds support for converting simple f-string literals to calls to `string.format()`. It does not support conversion specifiers or format strings.

This also does not support the string parser frontend, since that implementation would be more involved and likely would require modifying our TorchScript AST
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21037

Reviewed By: zdevito

Differential Revision: D15541183

Pulled By: jamesr66a

fbshipit-source-id: ae9df85e73f646d7219c1349f5b7683becbcef20
2019-05-30 15:50:45 -07:00
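The rewrite this commit describes can be illustrated in plain Python: a simple f-string literal (no conversion specifiers or format specs) is equivalent to a call to `str.format()`, which is what the compiler emits.

```python
name, n = "world", 3

# a simple f-string literal...
fstring_form = f"hello {name}, n={n}"
# ...is equivalent to the str.format() call the compiler converts it into
desugared_form = "hello {}, n={}".format(name, n)
```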
76deb450c6 Record source/line info in SourceRange and report in highlight (#21157)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/20898 with flake8 fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21157

Reviewed By: zdevito

Differential Revision: D15560324

Pulled By: jamesr66a

fbshipit-source-id: fc4e429eac03d2768f758b19c9d43e0bb614c2b8
2019-05-30 15:45:30 -07:00
416357648c Optimize alias analysis (#20899)
Summary:
# Overall Improvements
1. Switched from using `unordered_set` to sparse bitset.
1. Prevent some excessive memory allocations (thanks to resistor )
1. Take advantage of the sparse bitset operations
1. Switch to `flat_hash_map` instead of `unordered_map` in some places.

# Benchmarks (somewhat approximate, best of a couple runs)
1. InceptionNet (load + one forward pass): 19.8->13.3
1. GoogleNet(load + one forward pass): 10.0 -> 7.24
1. DenseNet (only load): 7.3 -> 5.3

I use the `sparse bitset` taken from https://llvm.org/doxygen/SparseBitVector_8h_source.html. I had to make some modifications to use `__builtin_popcountl` and instructions like that instead of other transitive clang dependencies.

## Some notes on our graph topologies
In general, our graphs are very sparse, and most of the components aren't connected. For GoogleNet, we have 200k nodes and do 2k `mayAlias` queries, and the sum of magnitudes of the sets at each node is 500k (i.e., every node, on average, reaches 2.5 leaves).

PS: Holy crap macbooks throttle an insane amount with the default fan settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20899

Differential Revision: D15564612

Pulled By: Chillee

fbshipit-source-id: 2a293a21a9be25f942ca888c8f225cab32bbfcd0
2019-05-30 15:37:50 -07:00
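As a rough illustration of the data structure (not the PR's C++ code), a Python int can stand in for the bitset of leaves each node may reach; `mayAlias` is then just a bitwise intersection test, and LLVM's `SparseBitVector` makes this cheap by storing only the non-zero words of the set.

```python
# each set bit marks a memory location ("leaf") the node may point to;
# the sets are sparse, so most words of the bit vector are zero
node_a = (1 << 5) | (1 << 200)
node_b = (1 << 200)
node_c = (1 << 7)

def may_alias(x, y):
    # two nodes may alias iff their reachable-leaf sets intersect
    return (x & y) != 0

# popcount over the set gives the number of reachable leaves
leaves_of_a = bin(node_a).count("1")
```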
31aefd9b09 Adding models to jenkins benchmark script (#21010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21010

Adding and editing the Jenkins benchmark script to accommodate both
the ResNext and ShuffleNet models.

Reviewed By: bddppq

Differential Revision: D15515354

fbshipit-source-id: 2a92c272b0b74ed3ecc78af6544a06337c7753cf
2019-05-30 15:17:40 -07:00
f6e5846a67 add handle to run all jit tests (#21161)
Summary:
Now you can run `python test/run_tests --jit` to run all jit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21161

Differential Revision: D15563912

Pulled By: eellison

fbshipit-source-id: 4bb0285cda4168b72a3dc4bba471485566a59873
2019-05-30 14:12:21 -07:00
7f308b88b9 Only populate net_pos in ssaRewrite if the op doesn't already have a net_pos argument (#21051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21051

In net transforms, we perform an SSARewrite where we update the 'net_pos' for all the ops in the net. The transform function also takes an unordered set of net positions for blacklisting. It's possible that SSARewrite will change the indices of the ops, so the blacklist gets applied to the wrong ops. We fix this issue by having SSARewrite only assign a new net_pos if the op doesn't already have one.

Reviewed By: yinghai

Differential Revision: D15532795

fbshipit-source-id: e020492a7b5196a91cdc39d0eda761b1ca612cdb
2019-05-30 13:37:35 -07:00
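A minimal sketch of the fix, with ops modeled as plain dicts (the function name and representation are hypothetical, not the Caffe2 code): a fresh `net_pos` is assigned only to ops that don't already carry one, so a blacklist keyed on `net_pos` keeps pointing at the same ops across the rewrite.

```python
def ssa_rewrite_net_pos(ops):
    # start numbering after the largest net_pos already present
    next_pos = max((op.get("net_pos", -1) for op in ops), default=-1) + 1
    for op in ops:
        # only assign a new net_pos if the op doesn't already have one
        if "net_pos" not in op:
            op["net_pos"] = next_pos
            next_pos += 1
    return ops

ops = [{"type": "Relu", "net_pos": 0}, {"type": "Concat"}]
ssa_rewrite_net_pos(ops)
```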
80020306ef Added base parameter to math.log (#21151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21151
ghimport-source-id: 76dc0852022a87a000888a787de1391f71923074

Differential Revision: D15563185

Pulled By: Chillee

fbshipit-source-id: 6ed7cc32ed7c103f360022b97f6df47ccd0403e7
2019-05-30 13:32:52 -07:00
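The added overload mirrors Python's own two-argument form, where `math.log(x, base)` computes `log(x) / log(base)`:

```python
import math

# two-argument form: log of x in an explicit base
assert math.isclose(math.log(8, 2), 3.0)
assert math.isclose(math.log(1000, 10), 3.0)
# one-argument form is still the natural log
assert math.isclose(math.log(math.e), 1.0)
```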
4e3e4d7ff5 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 80a731a1b8b04df01cb0d68ec39d4af10e0b61b7
2019-05-30 13:07:20 -07:00
4aee92833c Update libtorch docs (#21150)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21150

Differential Revision: D15559590

Pulled By: pjh5

fbshipit-source-id: 4063bf91464425e8efe4765dc17bb7e9b7bfccc7
2019-05-30 12:49:56 -07:00
313ef4f5d5 Make data_ptr a method on Tensor (#20878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20878
ghimport-source-id: f19993d97ecb8cfcd60b371d9ed49e3ad2e051c7

Differential Revision: D15482061

Pulled By: li-roy

fbshipit-source-id: c0563ce849fc3277e86a1a58bd384e38365786b2
2019-05-30 11:47:59 -07:00
d17aa72373 Added more regression test for groupconv w/o bias. (#18519)
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/18218, which was fixed by https://github.com/pytorch/pytorch/pull/18463 with mkl-dnn upgraded to v0.18.1.
Covering the special case when group > 1, input-channel / group < 16, and output-channel is a multiple of 16.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18519

Differential Revision: D14643071

Pulled By: soumith

fbshipit-source-id: d0ebed59326c67089e042b50583b87ed2c3ccc2f
2019-05-30 11:36:07 -07:00
6dc445e1a8 Conservative alias analysis rules for CallFunction/CallMethod (#21087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21087
ghimport-source-id: 4fa6763ffecc7d2974b902dd9bd2bd9ac467bab7

Differential Revision: D15542512

Pulled By: zdevito

fbshipit-source-id: 2dcd673cd4c200d7a854347429d4f33a11793cbc
2019-05-30 11:01:56 -07:00
b6d1a72f48 improve error message on inferred type (#21058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21058
ghimport-source-id: 7fad3a0567022dd417f4bd079a50a22e3c1dc020

Differential Revision: D15547218

Pulled By: suo

fbshipit-source-id: 5dbd567c79e6d01e9af4b8552777f7f0043df5b2
2019-05-30 10:50:34 -07:00
ec76976a7a Remove all devtoolset7 jobs (#21153)
Summary:
These do not work. We'll save time and cpu until someone has the time to fix these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21153

Differential Revision: D15558601

Pulled By: pjh5

fbshipit-source-id: f9bfe580aa7962a88506f9af0032647f553637a4
2019-05-30 10:39:26 -07:00
fffffde2f8 Delete more tabs, fix lint. (#21142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21142
ghimport-source-id: 4666c0731d9c08e9990ffafd0ae88fa1e7896348

Differential Revision: D15555285

Pulled By: ezyang

fbshipit-source-id: 9e5bfacf202ceba37bd29cfd5dcb651b7f79068d
2019-05-30 06:36:47 -07:00
e9df9e7960 Revert D15552424: [pytorch][PR] [JIT] Record source/line info in SourceRange and report in highlight
Differential Revision:
D15552424

Original commit changeset: 78d0f0de03f7

fbshipit-source-id: cc24f62189b7bbcdc1406912cfb3d4ca52b8e67e
2019-05-30 05:17:15 -07:00
c4a90ca18e Revert D15477933: [pt1][quant] remove quantize_linear and dequantize from Tensor method
Differential Revision:
D15477933

Original commit changeset: c8aa81f681e0

fbshipit-source-id: ec494fbbab72e20da262bdd8657887e1fdd173cb
2019-05-30 05:04:12 -07:00
3805490d6a Typo fix (#21122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21122

fix a typo

Reviewed By: dzhulgakov

Differential Revision: D15553921

fbshipit-source-id: 260b0be5975d49bb6d70e45d83505efcecf02875
2019-05-30 00:16:01 -07:00
52ded63128 Revert D15546045: [jit] Add support for recursive compilation on Modules
Differential Revision:
D15546045

Original commit changeset: c2c8fe179088

fbshipit-source-id: c921fb92cf9f5c6c94c77fa5070f9c5775c91b77
2019-05-29 23:42:50 -07:00
3083c71cde First class functions in IR, inlined eagerly (#21052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21052
ghimport-source-id: cc476b9cc301967dde5de6212ca144cdb252e84c

Differential Revision: D15533353

Pulled By: zdevito

fbshipit-source-id: 4d25461969cfcc9e5f641d585584cc100c7b34ae
2019-05-29 23:04:18 -07:00
6b099edb53 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21118

Differential Revision: D15553121

Pulled By: jamesr66a

fbshipit-source-id: 14ebf0e4cb33f8155ac86a9538beb8570bdfe8c8
2019-05-29 21:50:12 -07:00
7cea6d9b71 Redesign the output shape adjustment of OnnxifiOp (#21027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21027

Previously, we were only able to adjust batch size when the output shape had batch size conditioned at its first dim. Although not common, there are cases where we want to slice back an output whose batch size is conditioned on a non-first dim, or whose output shape doesn't really have batch size in it but rather is an expression of it. Examples are shapes at the output of `Transpose` or `Tile`. This diff redesigns how we handle the output size. The key is that when we run OnnxifiOp, the input shapes are given, so we can actually do shape inference to derive the real output shapes, no matter how they got transformed. We then compare the real output shape with the max-batch-sized output shape, dim by dim, and use a `Slice` op to cut the max output back to the real output shape.

Notice that general `Slice` op is slow and in most of the cases, we still prefer adjusting batch size by shrinking its first dim, which is just an operation on meta info without data allocation/manipulation. Therefore, we add a flag `fast_path` to detect this situation and operate accordingly.

Reviewed By: tracelogfb

Differential Revision: D15515189

fbshipit-source-id: 9c1fff161f82d0bc20eeac07ca4a2756e964e9fd
2019-05-29 21:39:00 -07:00
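The general (non-fast-path) cut can be sketched on nested lists; `slice_to_real_shape` is a hypothetical stand-in for the inserted `Slice` op, trimming each dim of the max-batch-sized output back to the inferred real shape.

```python
def slice_to_real_shape(nested, real_shape):
    # cut a max-batch-sized output back to the inferred real shape, dim by dim
    if not real_shape:
        return nested
    head, rest = real_shape[0], real_shape[1:]
    return [slice_to_real_shape(row, rest) for row in nested[:head]]

padded = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]  # max shape (3, 4)
real = slice_to_real_shape(padded, (2, 3))             # inferred shape (2, 3)
```

The fast path (batch size in the first dim) is the special case where only the outermost slice is needed, which is just a metadata change.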
6875018793 Record source/line info in SourceRange and report in highlight (#20898)
Summary:
Resolves https://github.com/pytorch/lockdown/issues/29

Examples:

```
import torch

torch.jit.script
def foobar(x):
    return torch.blargh(xyz)

==

RuntimeError:
object has no attribute blargh:
at compile.py:5:12
torch.jit.script
def foo(x):
    return torch.blargh(x)
           ~~~~~~~~~~~~ <--- HERE
```

It also gets the correct column number in the case where the original source file has common leading whitespace in front of the callable:

```
import torch

with torch.no_grad():
            torch.jit.script
            def foo(x):
                return torch.blargh(x)

==
RuntimeError:
object has no attribute blargh:
at compile_leading.py:6:24
torch.jit.script
def foo(x):
    return torch.blargh(x)
           ~~~~~~~~~~~~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20898

Differential Revision: D15552424

Pulled By: jamesr66a

fbshipit-source-id: 78d0f0de03f7ccbf3e7ea193a1b4eced57ea5d69
2019-05-29 21:32:33 -07:00
57f4f98c40 Fix borked SourceRanges
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21109

Reviewed By: zdevito

Differential Revision: D15551392

Pulled By: jamesr66a

fbshipit-source-id: 4f29214049b8feced0e740f84007b5751703ee20
2019-05-29 20:13:14 -07:00
67291ba74f remove quantize_linear and dequantize from Tensor method (#20874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20874

A criterion for what should be a Tensor method is whether numpy has it; numpy does not have this one,
so we are removing it as a Tensor method. We can still call it as a function.
Python
```
torch.quantize_linear(t, ...), torch.dequantize(t)
```
C++
```
at::quantize_linear(t, ...), at::dequantize(t)
```

Reviewed By: dzhulgakov

Differential Revision: D15477933

fbshipit-source-id: c8aa81f681e02f038d72e44f0c700632f1af8437
2019-05-29 19:17:16 -07:00
8d3388aef2 Add support for recursive compilation on Modules (#20708)
Summary:
Following on #19747, this implements most of the `torch.jit.script()` changes laid out in #20939.

Still to do:
* Accessing a method from Python does not add it as a `ScriptMethod` (so only `export`ed methods and `forward` are compiled)
* Calling a method other than `forward` on a submodule doesn't work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20708

Pulled By: driazati

Differential Revision: D15546045

fbshipit-source-id: c2c8fe179088ffbdad47198e799a456560655b86
2019-05-29 18:52:36 -07:00
33d35f5f93 Fixed isinstance typos
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21102

Differential Revision: D15549564

Pulled By: Chillee

fbshipit-source-id: 6746dc9e01b5a30d55d544beb70b7005f0cfd8ae
2019-05-29 17:51:27 -07:00
990e63f587 Remove unnecessary sources from base CircleCI AMI (#21103)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21103

Differential Revision: D15550213

Pulled By: kostmo

fbshipit-source-id: b4a2c38d168f722b30c96494079ccdd468b9ece8
2019-05-29 17:46:08 -07:00
12b0dede39 Support exporting tensor factories from scripting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20255

Differential Revision: D15534186

Pulled By: houseroad

fbshipit-source-id: 182e117a35fa31445fcad8cb492160500f71599a
2019-05-29 16:53:49 -07:00
9be72ce44f Convert Tree to use intrusive_ptr instead of shared_ptr.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20815

Differential Revision: D15453817

fbshipit-source-id: 569ab807d32fb3dcebfe201a049c770b1600e5c7
2019-05-29 16:33:02 -07:00
4900edebcf QTensor permute, transpose and contiguous (#20869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20869

Adding support for the functions listed in the title, by implementing the copy kernel.

Differential Revision: D15474060

fbshipit-source-id: 9264df6e442cca1cc5d952e3e5dcc9f4a426f317
2019-05-29 16:05:53 -07:00
99b057d89c Failing assertions is unlikely (#20876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20876

Tell the compiler that assertions are likely to succeed.
This allows the compiler to generate better code and optimize for the success case.

Differential Revision: D15480066

fbshipit-source-id: 4485154d66b2ee0ef8a401718712dbd61d811aee
2019-05-29 15:59:33 -07:00
9daf48525e Quantized Max Pool op (#20474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20474

Parallel implementation of MaxPool (no ReLU).

Reviewed By: dskhudia

Differential Revision: D15327923

fbshipit-source-id: ca6475e7fe1434b55d4b7730a074bb7ff50355fd
2019-05-29 15:01:01 -07:00
154029a6ff Revert D15534670: [jit] improve error message on inferred type
Differential Revision:
D15534670

Original commit changeset: 8bbfd6e9c1af

fbshipit-source-id: fe62cf954292e8ef1d00a3cc569206f73cedcd31
2019-05-29 14:56:08 -07:00
5dacf6b048 improve error message on inferred type (#21058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21058
ghimport-source-id: e7d6e082b0faf4f3d3e683f2c98863ee269439f0

Differential Revision: D15534670

Pulled By: suo

fbshipit-source-id: 8bbfd6e9c1afbc3006d7d55ed633e18618e05021
2019-05-29 14:47:00 -07:00
6ea9044d3c add 'all' builtin (#20521)
Summary:
[jit] add 'all' builtin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20521

Differential Revision: D15527657

Pulled By: driazati

fbshipit-source-id: eaa3c1c560810581150646858339369e4305fdf2
2019-05-29 14:46:56 -07:00
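The builtin mirrors Python's `all()`:

```python
# true iff every element is truthy; vacuously true on an empty list
assert all([True, True, True])
assert not all([True, False])
assert all([])
```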
8fcd80af20 Fix "cuda: unknown error" on Windows (#21062)
Summary:
Thanks Jonas1312 for validating this workground.
Fixes #20635.
However, I don't know exactly why this one is needed.
The following are my guesses:
1. It is a CUDA bug. Static linking against `cudart` is the default now, so they didn't run enough tests for dynamic ones.
2. It is related to UCRT. But (1) according to MSDN, shared DLLs should share the same CRT. (2) The CUDA-related objects like `CUDevice` passed to `cudart` are stored on the stack, not the heap. (3) If this were the case, it should always fail, not just sometimes. https://docs.microsoft.com/en-us/cpp/c-runtime-library/potential-errors-passing-crt-objects-across-dll-boundaries?view=vs-2019
3. It is a bug on our side. However, I was unable to find it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21062

Differential Revision: D15543557

Pulled By: ezyang

fbshipit-source-id: c23af45ebf582fad93ce5f029af6e1f06cf1d49d
2019-05-29 14:34:02 -07:00
157fcfc07d Add quantize_linear_per_channel (#20765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20765

att

Reviewed By: dskhudia

Differential Revision: D15435455

fbshipit-source-id: 77770044411ce8ee02d26d63eb7e79cd10db103e
2019-05-29 14:29:16 -07:00
53ccba004f New torch assertion macros (#20887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20887

Switch AT_xxx assertion macros to the TORCH_ variants and make sure the separation between TORCH_CHECK and TORCH_INTERNAL_ASSERT makes sense.

Differential Revision: D15484658

fbshipit-source-id: 490ae64cc36946756c30971f1b685048bc5f77da
2019-05-29 14:15:04 -07:00
449a2c3555 Fixes #20124 (#20203)
Summary:
Fixes #20124

Description:
Code wraps `optimizer.step()` method to detect whether user is following new pattern or old pattern. In case of old pattern detected, a UserWarning is raised. Documentation is also updated to reflect the change:

![Screen Shot 2019-05-07 at 11 05 17](https://user-images.githubusercontent.com/2459423/57287527-04e63580-70b8-11e9-9ddd-5d159ef0ed2f.png)

cc SsnL, bado-lee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20203

Differential Revision: D15543060

Pulled By: ezyang

fbshipit-source-id: 3605e1afdb6ffc2dfd5e75e92e01b967c4d065b5
2019-05-29 14:15:01 -07:00
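The detection can be sketched in plain Python (all names here are hypothetical, not the PyTorch code): wrapping `optimizer.step()` records that it ran, so the scheduler can raise a `UserWarning` when it is stepped first under the old pattern.

```python
import functools
import warnings

def wrap_step(step_fn):
    # mark the optimizer's step() so the scheduler can tell whether it ran
    @functools.wraps(step_fn)
    def wrapped(*args, **kwargs):
        wrapped.called = True
        return step_fn(*args, **kwargs)
    wrapped.called = False
    return wrapped

def scheduler_step(optimizer_step):
    # old pattern: scheduler.step() before optimizer.step() -> UserWarning
    if not optimizer_step.called:
        warnings.warn("Detected call of lr_scheduler.step() before optimizer.step()",
                      UserWarning)

def optimizer_step_impl():
    pass  # stands in for the real parameter update

step = wrap_step(optimizer_step_impl)
```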
74375299e0 add torch.nn._intrinsic and torch.nn._intrinsic.quantized namespace (#20940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20940

- `torch.nn._intrinsic` will contain normal (unquantized) fused modules like Conv2DRelu, Conv2DBnRelu, FakeQuantize ops etc.
- `torch.nn._intrinsic.quantized` will contain fused and quantized modules like Quantized Conv2DRelu, Quantized LinearRelu etc.
Right now I only added the FakeQuantize op in the `torch.nn._intrinsic` namespace; we'll have more later

Differential Revision: D15505228

fbshipit-source-id: d380929e38af7a5bcfbea27474d5b80f95d43b03
2019-05-29 14:09:37 -07:00
736bf7b46c Fix __constants__ for some nn modules (#21071)
Summary:
A bunch of modules were missing entries for `__constants__`, which was making their `__repr__`s not work. Others had `__constants__` entries that were unnecessary since they were already provided by a parent class.

Fixes #20978
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21071

Pulled By: driazati

Differential Revision: D15539518

fbshipit-source-id: 24bdd1ef41ef636eefd5d2bad4ab2d79646ed4f0
2019-05-29 13:55:53 -07:00
1e1f2c85f0 remove constant pooling expect (#21003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21003
ghimport-source-id: c1e0d0555758cab12ce82e0283bab559c7e8e4e2

Differential Revision: D15523443

Pulled By: wanchaol

fbshipit-source-id: 40973c1c0c0ab07fe4b1334e9ae0e4b16b5add8e
2019-05-29 13:55:50 -07:00
0ffd20c268 Fix empty tensor for unique_dim (#19000)
Summary:
Fixes: #18408

cc: zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19000

Reviewed By: ezyang

Differential Revision: D15470136

Pulled By: VitalyFedyunin

fbshipit-source-id: daf71566b4dbdc91927d164f813b5ee8645af1a2
2019-05-29 13:50:32 -07:00
2cd1c78632 Revert D15523444: [jit] move casting ops from prim to aten
Differential Revision:
D15523444

Original commit changeset: 642342bf1cce

fbshipit-source-id: 29de1c7e19cbb3273230c280346e786e61d2d445
2019-05-29 13:42:05 -07:00
7cb1aa67b0 Enabled min, max, minall, maxall, cmin, cmax, cmaxValue, cminValue for bool tensors (#21031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21031
ghimport-source-id: 379b3e9d20872eb5ad14403ed6751cdb0e730bc5

Reviewed By: ezyang

Differential Revision: D15530499

Pulled By: izdeby

fbshipit-source-id: f113d6974ee18ac3dfb5c0bcff66865345d137d2
2019-05-29 13:22:54 -07:00
85777b92b2 Assert against using Operator methods not supported when exporting it to c10, part 2 (#17946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17946

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14430749

fbshipit-source-id: 2b0037a9ed227a22aa7376a90e6d3d09d3e04707
2019-05-29 13:16:00 -07:00
a0111aaf0d move casting ops from prim to aten (#21002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21002
ghimport-source-id: 4c88a54a3ecb76c5ca3c2c328b749350860a166d

Differential Revision: D15523444

Pulled By: wanchaol

fbshipit-source-id: 642342bf1ccea83c88897bc023979a32ee01addf
2019-05-29 12:36:47 -07:00
dd903eb645 Add start and step parameters for range in torchscript (#20795)
Summary:
Fixes #18440

I calculate a derived index from `start,stop,step` as `start + step*index`. When `start=0` and `step=1` (the defaults/`range(n)`), this is the same behavior as before.

Unluckily, it seems that we do not optimize out operations like `x*1` or `x+0`. That means that we're doing lots of redundant operations when we don't need to. EDIT: More specifically, it seems like we only do this optimization for (tensor, scalar): https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/peephole.cpp#L128

The most annoying part of this code is calculating the number of iterations, given `start, stop, step`. I ended up going with the formula `(abs(stop-start) + abs(step)-1)//abs(step)`. Other intuitively appealing formulas like `(stop-start + step -1)//step` don't work for negative numbers.

I tried using `SymbolicVariable` for the calculations, but it seems that `symbolicvariable` only outputs ops for `tensors`, not the integers we have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20795

Differential Revision: D15446869

Pulled By: Chillee

fbshipit-source-id: 6085545ace04e25985c6ac870226f7a651f670d5
2019-05-29 12:31:29 -07:00
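The derived index and iteration count from the description above can be checked against Python's own `range` (the helper names are mine, not the TorchScript code):

```python
def trip_count(start, stop, step):
    # iterations = (|stop - start| + |step| - 1) // |step|, the PR's formula
    # (it assumes step actually points from start toward stop)
    return (abs(stop - start) + abs(step) - 1) // abs(step)

def derived_index(start, step, i):
    # the loop variable is computed as start + step * index
    return start + step * i

for args in [(0, 10, 1), (2, 11, 3), (10, 0, -2)]:
    start, stop, step = args
    assert trip_count(start, stop, step) == len(range(start, stop, step))
    values = [derived_index(start, step, i) for i in range(trip_count(start, stop, step))]
    assert values == list(range(start, stop, step))
```

Note that with the defaults `start=0, step=1` the derived index reduces to the index itself, matching the old behavior.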
fa8c132e24 Revert D15502768: [pytorch][PR] [jit] Make ScriptModule.training an attribute instead of a parameter
Differential Revision:
D15502768

Original commit changeset: 3022f2d57ec6

fbshipit-source-id: 5cd08d3c3a75e38e3aa9b75a0c0059a2c6c85a1e
2019-05-29 12:18:18 -07:00
94b9706017 fix dequantize_linear (#21035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21035

Fix the dtype error in `dequantize_linear`, it should accept the same dtype argument as `quantize_linear`

Differential Revision: D15521931

fbshipit-source-id: 0114c046a3f1046e42fca49c74c85e487fee8616
2019-05-29 12:18:15 -07:00
cbf2a4f5c4 print a warning if a type annotation prefix is invalid according to mypy (#20884)
Summary:
This PR adds a check that prints a warning if a type annotation prefix isn't what mypy expects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20884

Differential Revision: D15511043

Pulled By: Krovatkin

fbshipit-source-id: 9038e074807832931faaa5f4e69628f94f51fd72
2019-05-29 11:56:55 -07:00
a6bb15493d Removed accidental TensorFlow dependency (#21066)
Summary:
I accidentally added a TensorFlow dependency in #20413 by using `from tensorboard.plugins.mesh.summary import _get_json_config`.

I'm removing it at the cost of code duplication.

orionr, Please review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21066

Reviewed By: natalialunova

Differential Revision: D15538746

Pulled By: orionr

fbshipit-source-id: 8a822719a4a9f5d67f1badb474e3a73cefce507f
2019-05-29 11:18:10 -07:00
f2199a34eb Hook to store additional metadata about environment (#20863)
Summary:
In larger system environment, there's usually a need to store some information about how the model was created (e.g. from which process, workflow, by which user, etc). It's almost like JPEG metadata written by camera.

This PR adds a low-level c++ hook to allow population of additional files in zip container based on environment. The reason to have it a low-level hook instead of top-level API wrapper (e.g. `m.save_with_metadata`) is to capture all usages of the saving API transparently for user.

Let me know if there are concerns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20863

Differential Revision: D15487941

Pulled By: dzhulgakov

fbshipit-source-id: 120c5a4c9758aa82846bb51a1207f923e3da1333
2019-05-29 10:11:58 -07:00
00c1584979 Added possibility to index scalars by bool masks (#21030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21030
ghimport-source-id: 7a66ca096c62d050a38a6fcc9f6b2d61e387eb34

Differential Revision: D15530498

Pulled By: izdeby

fbshipit-source-id: d5d38f9610caa55fb7179d41f568c5ea5fa1f2e2
2019-05-29 09:32:55 -07:00
1d4685c20f Improve test_proper_exit error printing (#20166)
Summary:
This doesn't have `strace` yet, but it still has `faulthandler` to print stack traces on hang. Also part of an attempt to isolate changes from #19228.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20166

Differential Revision: D15536504

Pulled By: ezyang

fbshipit-source-id: fe6e6e2e9899f30d8167436d7bc62b42883a3356
2019-05-29 07:51:31 -07:00
aa42742df0 ctc_loss: fix backward when 2d target tensor is larger than max_target_length (#20971)
Summary:
Previously, the backward didn't work when 2d target tensors had extra columns at the end. Now we just ignore those.
Also fix the confusion in the doc example regarding the number of classes.

Thank you, ypw-rich for the report with reproducing example.

Fixes: #20522
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20971

Differential Revision: D15535481

Pulled By: ezyang

fbshipit-source-id: 397e44e20165fc4fa2547bee9390d4c0b688df93
2019-05-29 05:13:00 -07:00
55f5eb3c47 DilatedMaxPool2d: small cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20936

Differential Revision: D15514542

Pulled By: ezyang

fbshipit-source-id: 6341a4bb8a9ee0b632c32a013ea609d842a21962
2019-05-29 05:06:33 -07:00
f8565121d9 Port dilated_max_pool3d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20933

Differential Revision: D15525485

Pulled By: ezyang

fbshipit-source-id: 6ff44f11d984903cd20d79cfad04963e6443e6ca
2019-05-29 04:58:42 -07:00
0544a491d5 Revert D15499749: [pytorch][PR] Add Tensor.T attribute to reverse dimensions
Differential Revision:
D15499749

Original commit changeset: f3306b496667

fbshipit-source-id: 7f50431d2ea37bc41bfed62f386ddedea1412878
2019-05-29 04:29:48 -07:00
3038cf8eee Remove THSTensor and SparseTensorRef (#20877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20877
ghimport-source-id: a07f53ca158f9a3dce7a25ef5a169871e98ea3ea

Differential Revision: D15480353

Pulled By: li-roy

fbshipit-source-id: 1152dbc4df827ded3be1a57f007a6b7de12f567f
2019-05-29 01:37:03 -07:00
9faa409b56 Fix __irshift__ dispatch (#21047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21047
ghimport-source-id: 8c781c9882eebb07325a1fc7aa6f340bbec18886

Differential Revision: D15529160

Pulled By: li-roy

fbshipit-source-id: d9a444e42df5c509ae10849ba6f8006fbec830c5
2019-05-29 01:03:34 -07:00
8dda19b79f Remove extraneous TensorId checks in as_strided (#21045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21045
ghimport-source-id: e95fbf50bccf6ebc613bb13fb16915254912f22d

Differential Revision: D15528971

Pulled By: li-roy

fbshipit-source-id: c721cc6280dff6e14c5533681d0b35aaa8f98f00
2019-05-29 00:53:53 -07:00
d76546a463 Fix tracing bugs where using 1 - x in C++ would cause the size of 1 to get hardcoded. (#20932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20932
ghimport-source-id: f0a7f12ffd77aec063a088b18c6b1d108c712df8

Differential Revision: D15501251

Pulled By: zdevito

fbshipit-source-id: 91e6e5944d2663b673afde45fc6eed22f31101c4
2019-05-29 00:14:25 -07:00
5c53aa4869 Make build with makefiles less noisy (#21053)
Summary:
https://github.com/pytorch/pytorch/pull/17783 made ninja and makefile builds print out build commands unconditionally, which has made the build log very verbose (the ROCm CI build log, for example, grows to >13MB). Large build logs make searching for the real error hard.
https://github.com/pytorch/pytorch/pull/20508 has reverted the ninja change, and this one reverts the makefile change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21053

Differential Revision: D15533412

Pulled By: bddppq

fbshipit-source-id: ad89b617d06acc670d75d4cf25111a4081e9c95e
2019-05-29 00:08:45 -07:00
9b147961c4 Fix get_gpu_memory_info in non-cuda builds (#21054)
Summary:
#21024 broke master
cc akyrola
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21054

Reviewed By: akyrola

Differential Revision: D15533406

Pulled By: bddppq

fbshipit-source-id: 0dcfa0ce865e109b46280ef1786dbc7a8af30739
2019-05-28 23:05:15 -07:00
ffdce79078 Deprecate variadic inputs of checkpoint_sequential (#21006)
Summary:
I've reported inconsistency between `checkpoint_sequential` and `nn.Sequential` at https://github.com/pytorch/pytorch/issues/19260. Both should provide the same input signature but they don't. I think the consistency is important and I agree with apaszke that `nn.Sequential`'s semantics should be kept instead of `checkpoint_sequential`.

I hope `checkpoint_sequential` will raise a `TypeError` on variadic arguments starting with PyTorch 1.2.0. But for now, it's okay just to warn with a `DeprecationWarning`. I've talked about this approach with soumith.

Please review this pull request. Any comment will be my pleasure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21006

Differential Revision: D15530801

Pulled By: soumith

fbshipit-source-id: 0ceb2cc6a17dcc547d0d00ebaf9df8603be53183
2019-05-28 21:33:45 -07:00
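The deprecation path can be sketched as follows (a simplified hypothetical stand-in, not the real `checkpoint_sequential`, which also does the actual checkpointing): variadic inputs still work but emit a `DeprecationWarning`, since `nn.Sequential`'s semantics take a single input.

```python
import warnings

def checkpoint_sequential_sketch(functions, segments, *inputs):
    # nn.Sequential takes a single input, so multiple positional
    # inputs are deprecated in favor of passing one input (or a tuple)
    if len(inputs) > 1:
        warnings.warn("variadic inputs to checkpoint_sequential are deprecated; "
                      "pass a single input", DeprecationWarning)
    x = inputs[0] if len(inputs) == 1 else inputs
    for fn in functions:
        x = fn(x)
    return x

def double(v):
    return v * 2
```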
d23d04f17f Allow nondet_tol for nondeterminism in gradcheck and gradgradcheck (#20980)
Summary:
gradcheck currently includes a determinism check (although it only tries twice and sees if the results match).
This can lead to flaky tests, e.g. in #20971, but also #13818.
This adds nondet_tol for both gradcheck and gradgradcheck. It does not change / reenable any tests yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20980

Differential Revision: D15530129

Pulled By: soumith

fbshipit-source-id: 04d7f85b5b59cd62867820c74b064ba14f4fa7f8
2019-05-28 21:26:13 -07:00
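The idea of the check, in toy form (not gradcheck's actual code): evaluate twice and require the results to agree within `nondet_tol`, so kernels with benign run-to-run jitter stop failing spuriously.

```python
def passes_determinism_check(f, x, nondet_tol=0.0):
    # gradcheck-style determinism probe: run twice, compare within nondet_tol
    a, b = f(x), f(x)
    return abs(a - b) <= nondet_tol

calls = []
def noisy_square(x):
    # simulates a nondeterministic kernel: tiny run-to-run jitter
    calls.append(x)
    return x * x + (1e-7 if len(calls) % 2 else -1e-7)
```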
d190450a35 Fix typo in CyclicLR docs (#21021)
Summary:
Fixes a typo in the CyclicLR docs by adding the lr_scheduler directory and puts in other required arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21021

Differential Revision: D15530109

Pulled By: soumith

fbshipit-source-id: 98781bdab8d82465257229e50fa3bd0015da1286
2019-05-28 21:18:50 -07:00
f1fe4b1114 add simple memory analyzer and log warning if GPU underutilized (#21024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21024

Add a new pybinded call to CUDAGetMemoryInfo.

Reviewed By: wesolwsk

Differential Revision: D15520607

fbshipit-source-id: f6d04e48f7d7cb089fc52fa8835cfee3f452d2f1
2019-05-28 19:58:54 -07:00
1bed5f39f4 Fix warning in register_c10_ops by making index unsigned (#20964)
Summary:
Just an annoying warning that's been popping up a lot.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20964

Differential Revision: D15531064

Pulled By: Chillee

fbshipit-source-id: 9580115676c5e246481054bbfc749a551a3cca5e
2019-05-28 18:02:09 -07:00
f6ec464890 Enable batched QR decomposition and add a some option (#20689)
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
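A short sketch of both additions together, batching and the `some` flag (shapes chosen arbitrarily; newer releases spell this `torch.linalg.qr`):

```python
import torch

A = torch.randn(4, 5, 3)            # a batch of four 5x3 matrices
Q, R = torch.qr(A)                  # reduced factorization (some=True)
print(Q.shape, R.shape)             # torch.Size([4, 5, 3]) torch.Size([4, 3, 3])

Qc, Rc = torch.qr(A, some=False)    # complete factorization
print(Qc.shape)                     # torch.Size([4, 5, 5])
```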
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689

Differential Revision: D15529230

Pulled By: soumith

fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
2019-05-28 17:52:37 -07:00
c1048182be Use constants from math.h for gelu op (#20974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20974

Use constants from math.h for gelu op
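For reference, the exact (erf-based) GELU the op computes, sketched in plain Python; `M_SQRT1_2` from math.h equals 1/sqrt(2), computed directly here:

```python
import math

def gelu(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + math.erf(x * (1.0 / math.sqrt(2.0))))

print(round(gelu(1.0), 4))  # 0.8413
```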

Reviewed By: hl475, houseroad

Differential Revision: D15511736

fbshipit-source-id: 7d069888fb5c7c310774d056f18711365b39b8e4
2019-05-28 17:52:34 -07:00
0290897bca tracing for intra_op_parallel (#20603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603

When we use intra_op_parallel operators, Caffe2 tracing was generating a trace only for the master task, giving the false impression that many threads were underutilized.
This diff also traces child tasks.

Reviewed By: ilia-cher

Differential Revision: D14820008

fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9
2019-05-28 17:39:23 -07:00
9a989ec469 Add an option to stop the build process once cmake terminates. (#21034)
Summary:
Add an option to setup.py to stop the build process once cmake terminates. This gives users a chance to fine-tune build options. Also update README accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21034

Differential Revision: D15530096

Pulled By: soumith

fbshipit-source-id: 71ac6ff8483c3ee77c38d88f0d059db53a7d3901
2019-05-28 17:11:00 -07:00
9294de8c9f Add Tensor.T attribute to reverse dimensions (#20598)
Summary:
For compatibility with numpy
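A quick sketch of the new attribute on a 2-D tensor (where it matches `x.t()`; like numpy's `ndarray.T`, it reverses the dimensions):

```python
import torch

x = torch.randn(2, 3)
print(x.T.shape)                  # torch.Size([3, 2])
print(torch.equal(x.T, x.t()))    # True
```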
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20598

Differential Revision: D15499749

Pulled By: umanwizard

fbshipit-source-id: f3306b496667f20169e9b28db3150d12183703bc
2019-05-28 16:59:06 -07:00
2791a44948 Renaming the relu kernel and adding hypothesis tests (#20647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20647

The initial assumption was that `qint8` would be unsigned. After the introduction of `quint8` and `qint8`, some tests break.

Reviewed By: jerryzh168

Differential Revision: D15332106

fbshipit-source-id: 6ed18da428915aea918a363c5f38754a3c75d06b
2019-05-28 16:46:44 -07:00
d6d192e0af Added engine information to the profiling result. (#20493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20493

This helps distinguish if the op was a quantized op or not.

Reviewed By: salexspb

Differential Revision: D15337854

fbshipit-source-id: 43c7aef143085cfaeb4ec2102a7f36cc454e0e94
2019-05-28 16:41:12 -07:00
7afa75006e Enable operator profiling via command line (#20173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20173

Enabled op profiling even when net type is not dag or prof dag. Also added
engine type info to summary.

Reviewed By: salexspb, ilia-cher

Differential Revision: D15177813

fbshipit-source-id: 5be0efeaabc9a961cf1d73b0703749c08bb1adbb
2019-05-28 16:41:08 -07:00
2ba608b4a0 Fixed gcd to use 64 bit integers (#21041)
Summary:
Not much to say. Fixes implementation introduced here: https://github.com/pytorch/pytorch/pull/19115
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21041

Differential Revision: D15528801

Pulled By: Chillee

fbshipit-source-id: bacd709eb711ca00156bd70480d6051b437517ed
2019-05-28 16:20:55 -07:00
28079c3906 Make ScriptModule.training an attribute instead of a parameter (#19587)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19587 [jit] Make ScriptModule.training an attribute instead of a parameter**

Remove the hack we had previously where `training` was a buffer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19587

Differential Revision: D15502768

Pulled By: driazati

fbshipit-source-id: 3022f2d57ec6849868f9225d9bc2bfb7828cb318
2019-05-28 16:06:46 -07:00
18809f7b0b Better error message in __get_state__ to let a user know that ScriptModules can't be deep-copied atm (#20885)
Summary:
Before we look into supporting `deepcopy` we could at least improve an error msg.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20885

Differential Revision: D15511023

Pulled By: Krovatkin

fbshipit-source-id: 93b8730a2cc663eee0147f14d3341d0606748eaf
2019-05-28 15:09:07 -07:00
07c4e45ca6 Some minor fixes for the changes in #20945 (#21008)
Summary:
Fixes after #20945
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21008

Differential Revision: D15526193

Pulled By: ezyang

fbshipit-source-id: 4cfabc482c149e0aeb92ae7fff04098771fe33ed
2019-05-28 14:48:50 -07:00
0885dd28c8 refactor register_prim_ops (#21001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21001
ghimport-source-id: f1b8e3999bf18fb0f3b857a13c3e3f609e1e4b4e

Differential Revision: D15523445

Pulled By: wanchaol

fbshipit-source-id: c1e29b0985bde580703a1fca9df46da773826df6
2019-05-28 14:11:04 -07:00
b85c52923b Re-land "Fix advanced indexing on "huge" Tensors" (#21019)
Summary:
This #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

cc bddppq

Original summary:

This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTS in Loops.cuh,
OffsetCalculator, and IntDivider checking that sizes fit in a signed 32-bit
integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21019

Differential Revision: D15518477

Pulled By: colesbury

fbshipit-source-id: 4db5626fda76eb58250793e8aa7d4f2832db3a34
2019-05-28 12:45:56 -07:00
52d27890dc Improve error message for missing attribute (#20779)
Summary:
Fixes #20495 .

Now for
```python
        class A(torch.jit.ScriptModule):
            def __init__(self):
                super(A, self).__init__()

            @torch.jit.script_method
            def forward(self, x):
                return x + self.whatisgoingon

        class B(A):
            def __init__(self):
                super(B, self).__init__()
            @torch.jit.script_method
            def bar(self, x):
                return x * x

        A()
```
it does
```
RuntimeError:
attribute 'whatisgoingon' does not exist:
torch.jit.script_method
def forward(self, x):
    return x + self.whatisgoingon
               ~~~~~~~~~~~~~~~~~~ <--- HERE

```

I added a test in `test_jit.py` as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20779

Differential Revision: D15441138

Pulled By: Chillee

fbshipit-source-id: 88f458c36b5e32a1ffc467b27bbc28a3c5c07321
2019-05-28 12:27:52 -07:00
bc10677fcb Some name and variable cleanup (#20861)
Summary:
As a part of https://github.com/pytorch/pytorch/pull/20580 I noticed that we had some unusual variable naming in `summary.py`. This cleans it up and also removes some variables that weren't being used.

I'll wait until we have an `add_custom_scalars` test to land this.

cc lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20861

Differential Revision: D15503420

Pulled By: orionr

fbshipit-source-id: 86d105a346198a1ca543d1c5d297804402ab5a0c
2019-05-28 12:22:47 -07:00
99674eb86f Re-enable test_dag_net_forking on ROCm (#21013)
Summary:
Fixes #16229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21013

Differential Revision: D15515824

Pulled By: bddppq

fbshipit-source-id: 23a6c7eaad6129328c6b9dfcc55ac2d31a6d2dc0
2019-05-28 12:12:53 -07:00
082936f033 Clarify cycliclr param docs (#20880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20880

This clarifies how the momentum parameters should be used.
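For context, a minimal usage sketch (hyperparameter values are illustrative): `cycle_momentum` requires an optimizer with a momentum parameter, and momentum is cycled inversely to the learning rate between `base_momentum` and `max_momentum`:

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=0.001, max_lr=0.01,
    step_size_up=10, cycle_momentum=True,
    base_momentum=0.8, max_momentum=0.9)
opt.step()
sched.step()
m = opt.param_groups[0]['momentum']
print(0.8 <= m <= 0.9)  # True
```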

Reviewed By: soumith

Differential Revision: D15482450

fbshipit-source-id: e3649a38876c5912cb101d8e404abca7c3431766
2019-05-28 12:07:47 -07:00
68c3ef72b5 Change bound shape inference for LengthsRangeFill & GatherRanges, add more tests (#20610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20610

Change InferLengthsRangeFill
Add InferGatherRanges
add tests from ClipRangesGatherSigridHash all the way to SparseLengthsWeightedSum
add tests from SigridTransforms all the way to SparseLengthsWeightedSum

e2e test will be added in the following diff

Reviewed By: ipiszy

Differential Revision: D15382730

fbshipit-source-id: a611cd129007a273dfc43955cd99af1c4ed04efd
2019-05-28 11:33:51 -07:00
bbe3411846 Refactor schema_matching.cpp (#20549)
Summary:
It was kind of hard to read through this code, so this adds a bunch of comments; no behavior should change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20549

Pulled By: driazati

Differential Revision: D15499974

fbshipit-source-id: 95bf660c3b2bab1c90a2250696cece68bd1925cc
2019-05-28 10:55:09 -07:00
ff6cda0da6 Generate TH functions outside of Type (#20309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20309
ghimport-source-id: d0a0195be53f991f20eb0fbb03edf3814f18b831

Differential Revision: D15509848

Pulled By: li-roy

fbshipit-source-id: 35aafdcb9bb868a41f75cf422c48d357f8655d67
2019-05-28 02:55:51 -07:00
eacb311810 Move 1d tensor checks to TH (#20859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20859
ghimport-source-id: 675cea2c31b48bb5e3b4676640021ace783ea3a8

Differential Revision: D15509850

Pulled By: li-roy

fbshipit-source-id: 468b3b1249d58dd8104643d61d263d1f9b0308bf
2019-05-28 02:55:48 -07:00
d2f14db6cb Change view dispatch to abstract (#20308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20308
ghimport-source-id: cac8d130d45cc36e51d1661c15ad98c10353ea54

Differential Revision: D15509849

Pulled By: li-roy

fbshipit-source-id: 9576028b7075f58c431dc8c12a38c4c5a34c9340
2019-05-28 02:55:41 -07:00
580eab6562 Restore TBB module (#20454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20454
ghimport-source-id: 14aca1dedbe647d41e55e7538a6b7eeab0fc4384

Differential Revision: D15326062

Pulled By: ilia-cher

fbshipit-source-id: 02b005a679b10dc7a264978e87a8d2bb98ab972f
2019-05-28 02:49:36 -07:00
82aecfad6a Native ATen/Parallel backend (#20087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20087
ghimport-source-id: bcfc8a86abe0893e4a380fe6f6123e2082ba4317

Differential Revision: D15248663

Pulled By: ilia-cher

fbshipit-source-id: fdb7a8860c85d8202026b629cb7fa344782bd2c4
2019-05-28 01:40:54 -07:00
f4b434a6a5 Fix incorrect torch version in CMake (#21007)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20525
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21007

Differential Revision: D15515260

Pulled By: soumith

fbshipit-source-id: 149084cce276c5e76ca0c5c0872c5e990c47bdfd
2019-05-27 23:46:49 -07:00
0556141339 fix small typo muliprocessing -> multiprocessing (#20998)
Summary:
Minor typo fix in docstring.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20998

Differential Revision: D15514698

Pulled By: soumith

fbshipit-source-id: a9ceb557251ff5868e810331195243b6a8717851
2019-05-27 21:36:13 -07:00
5ddbfc97e9 Revert D15501945: [pytorch][PR] Fix advanced indexing on "huge" Tensors
Differential Revision:
D15501945

Original commit changeset: e876e678e866

fbshipit-source-id: 2833eb118a62e301571a983529f6e4fc91442581
2019-05-27 20:26:37 -07:00
3b0d431bf5 Check for incompatible versions between CUDA and MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20945

Differential Revision: D15514576

Pulled By: ezyang

fbshipit-source-id: 3c0b8b64edce236a84a7195605d437a00a67b7f4
2019-05-27 19:22:21 -07:00
0d35f14565 Update cuSPARSE namespace collision w/ CUDA 10.1 Update 1 (#20889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20889
ghimport-source-id: 8f5f500fa542d4992cd9213923e1af8de115ee58

Differential Revision: D15495545

Pulled By: ezyang

fbshipit-source-id: 60057cf13694158299a8124b1a787cb4e3c21d21
2019-05-27 18:43:32 -07:00
9d9751f634 Convert dequantize_linear to an internal function _dequantize_linear (#20938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20938

Dequantize_linear need not be exposed to the front end users.
It will only be used for the jit passes for q-dq insertion and op
substitution.

Differential Revision: D15446097

fbshipit-source-id: a5fbcf2bb72115122c9653e5089d014e2a2e891d
2019-05-27 15:40:21 -07:00
8e3311c5e2 Remove functionality unsupported by the JIT from multi_head_attention_forward. (#20653)
Summary:
Remove the internal functions in multi_head_attention_forward. Those internal functions cause a 10-15% performance regression, and there is possibly a JIT issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20653

Differential Revision: D15398888

Pulled By: cpuhrsch

fbshipit-source-id: 0a3f053a4ade5009e73d3974fa6733c2bff9d929
2019-05-27 15:12:58 -07:00
6e76813a39 fix SyncBatchNorm doc (#20991)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20991

Differential Revision: D15513518

Pulled By: soumith

fbshipit-source-id: 9618c0b2442e013e4d37793cdb04cb4f4b1b141c
2019-05-27 14:46:58 -07:00
ebc8d7170e fix the bug for mkldnn clone (#20943)
Summary:
This PR is to solve the bug for clone a MKLDNN tensor, please see the issue https://github.com/pytorch/pytorch/issues/20895.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20943

Differential Revision: D15511516

Pulled By: mrshenli

fbshipit-source-id: 05b41d6c7eaf8703521f4c768b8f26ec8501dc5e
2019-05-27 12:09:52 -07:00
6480d3f140 Revert D15511921: [pytorch][PR] BatchSampler now uses list.clear() instead of creating new objects
Differential Revision:
D15511921

Original commit changeset: e943d21e75e1

fbshipit-source-id: 933b7ef74c7a530f0a2cc087c8ee6f0455cf9239
2019-05-27 10:51:24 -07:00
482ae8e6b2 BatchSampler now uses list.clear() instead of creating new objects
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20976
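A standalone sketch of the two patterns (function names are illustrative, not the sampler's real code) showing why they are not interchangeable: `clear()` mutates the list object that was already yielded, so consumers that store batches without copying see them emptied. This commit was reverted shortly after (see above).

```python
def batches_fresh(indices, batch_size):
    batch = []
    for idx in indices:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []        # allocate a new list per batch

def batches_reuse(indices, batch_size):
    batch = []
    for idx in indices:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch.clear()     # reuse the same list object

print(list(batches_fresh(range(6), 2)))  # [[0, 1], [2, 3], [4, 5]]
# Every yielded value is the *same* list, later cleared:
print(list(batches_reuse(range(6), 2)))  # [[], [], []]
```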

Differential Revision: D15511921

Pulled By: soumith

fbshipit-source-id: e943d21e75e19f9154a0570f3188cc3ce174083e
2019-05-26 23:45:26 -07:00
ecf012213b Update submodule URL based on redirection. (#20973)
Summary:
Changes:
  - protobuf was moved to protocolbuffers/protobuf a while ago.
  - cpuinfo has been moved to pytorch/cpuinfo and updated in FBGEMM recently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20973

Differential Revision: D15511926

Pulled By: soumith

fbshipit-source-id: 2c50373c9b245524f839bd1059870dd2b84e3b81
2019-05-26 22:29:21 -07:00
bb89827e1d Update cuda pinned memory note to include tensor.to (#20977)
Summary:
separate bits of changes from #19228
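A guarded sketch of the pattern the note now covers, namely that `non_blocking=True` with a pinned source applies to `Tensor.to` just as it does to `Tensor.cuda` (tensor size is arbitrary; the CUDA branch is skipped on CPU-only machines):

```python
import torch

x = torch.randn(256, 256)
if torch.cuda.is_available():
    x = x.pin_memory()                 # page-locked host memory
    # With a pinned source, the host-to-device copy can overlap
    # with computation when non_blocking=True.
    y = x.to('cuda', non_blocking=True)
    print(y.device.type)               # cuda
```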
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20977

Differential Revision: D15511919

Pulled By: soumith

fbshipit-source-id: 5015a29cdac6d6e160388c493182c330f0da63ec
2019-05-26 22:22:06 -07:00
1e8f129a05 In setup.py, also check some submodules of submodules. (#20937)
Summary:
Sometimes users forget to use the "--recursive" option when updating submodules. This added check should help expose the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20937

Differential Revision: D15502846

Pulled By: mrshenli

fbshipit-source-id: 34c28a2c71ee6442d16b8b741ea44a18733b1536
2019-05-26 18:43:24 -07:00
8dbdd00f87 tweak tqdm to have download speed in kB/MB/etc (#20908)
Summary:
This changes the progress bars in `_download_url_to_file` from saying things like `49773343.40it/s` to `47.5MB/s`.
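The relevant tqdm knobs, sketched outside of `_download_url_to_file` (the byte counts are made up): `unit='B'` with `unit_scale`/`unit_divisor` formats the rate as kB/s, MB/s, ... with a 1024 divisor instead of a raw it/s count.

```python
from tqdm import tqdm

total = 8 * 1024 * 1024
with tqdm(total=total, unit='B', unit_scale=True,
          unit_divisor=1024) as bar:
    for _ in range(8):
        bar.update(1024 * 1024)   # pretend we received 1 MiB
print(bar.n == total)  # True
```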
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20908

Differential Revision: D15511223

Pulled By: soumith

fbshipit-source-id: 2422eb5fb486f9ef4bd69c556c4ed1775b8b2860
2019-05-26 15:34:47 -07:00
5ab6e07180 .view(...) now suggests .reshape(...) instead of .contiguous().view(...)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20968
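A runnable sketch of the situation the message addresses (a transpose makes the tensor non-contiguous, so `.view` fails while `.reshape` succeeds):

```python
import torch

x = torch.randn(3, 4).t()       # transpose makes x non-contiguous
raised = False
try:
    x.view(12)                  # fails: view needs compatible strides
except RuntimeError as e:
    raised = 'reshape' in str(e)   # the error now points at .reshape(...)
y = x.reshape(12)               # reshape copies only when it must
print(y.shape)                  # torch.Size([12])
```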

Differential Revision: D15511236

Pulled By: soumith

fbshipit-source-id: 673fc2982ad6ea287fdd0cff2684bdc2317a6709
2019-05-26 15:34:44 -07:00
c611630b9d Fix subscripts in RNN documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20949

Differential Revision: D15510760

Pulled By: soumith

fbshipit-source-id: 51e9dbea7d8c8194e46e12311e397deff32dbe2f
2019-05-26 14:57:40 -07:00
a3a458ed30 Fix align corner docs (#20961)
Summary:
I believe the `True` and `False` in the doc are reversed :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20961

Differential Revision: D15510806

Pulled By: soumith

fbshipit-source-id: 62566bb595e187506b23dedc24892e48f35b1147
2019-05-26 14:57:37 -07:00
5e69e76aba Remove padding_mode from torch.nn.functional.conv{1,2,3}d's docstr (#20891)
Summary:
Fixes #20694
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20891

Differential Revision: D15510790

Pulled By: soumith

fbshipit-source-id: aa3630693c7446bf18a390cb49c4df9bc9c59eea
2019-05-26 14:52:51 -07:00
4c5b1e3460 Update nccl submodule to v2.4.6 (#20882)
Summary:
Fixes #20630

Haven't tested it yet. Let's see if it passes all CI tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20882

Reviewed By: pietern

Differential Revision: D15483561

Pulled By: mrshenli

fbshipit-source-id: 5f0730a04d92906af077b2fe2170b674ca371e6c
2019-05-26 13:00:26 -07:00
9310e600f6 Use a simpler way to delete recursive function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20913

Differential Revision: D15508071

Pulled By: mrshenli

fbshipit-source-id: ad9a0ab4295bb0f1063d43682a10c124d8384635
2019-05-26 12:17:25 -07:00
66e6571eb8 fixed issue #20921 (#20922)
Summary:
For tensor creation ops like `torch.zeros` and `torch.ones`, the docs [0], [1] use `sizes` as the first argument to the function call, while the correct argument is `size`. This was tested with PyTorch 1.1 installed via pip on Ubuntu 19.04.

An example

```
>>> torch.zeros(2, 3)
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> torch.zeros(sizes = (2, 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: zeros() missing 1 required positional arguments: "size"
>>> torch.zeros(size = (2, 3))
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> torch.ones(sizes = (2, 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ones() missing 1 required positional arguments: "size"
>>> torch.ones(size = (2, 3))
tensor([[1., 1., 1.],
        [1., 1., 1.]])
```

[0]: https://pytorch.org/docs/master/torch.html#torch.zeros
[1]: https://pytorch.org/docs/master/torch.html#torch.ones
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20922

Differential Revision: D15498741

Pulled By: mrshenli

fbshipit-source-id: 963324ffa004d62ca77ce30ed6f0c3932b5b79b7
2019-05-25 22:22:18 -07:00
83fe92870d Update multiprocessing note now that shared CUDA tensors are refcounted (#19904)
Summary:
The mp notes are not updated after https://github.com/pytorch/pytorch/pull/16854. (The torch.multiprocessing page is.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19904

Differential Revision: D15509661

Pulled By: soumith

fbshipit-source-id: 7c11e14a6c804498dda3adbf19710e63e6a564a0
2019-05-25 17:40:42 -07:00
bdce5533fe Fix pytorch_macos_10_13_py3_test (#20944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20944
ghimport-source-id: da2dcbfaff4e0e75f6b4e836fc1af1d8aee11c56

Differential Revision: D15508912

Pulled By: ezyang

fbshipit-source-id: 6758a8a516d0a875a5f6bbbb12e43d899bcf2161
2019-05-25 08:17:34 -07:00
81e70ffa19 fix bug of not using get_score_cls_index in BoxWithNMSLimitOp (#20868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20868

When `input_boxes_include_bg_cls` is false (which means `input_scores_fg_cls_starting_id` is 0), it doesn't map the score's class index correctly when sorting and limiting the detections over all classes after nms.

Reviewed By: newstzpz

Differential Revision: D15472706

fbshipit-source-id: dc1e808b63ad09fb4bd95acf866771bb3fa92d69
2019-05-24 22:31:01 -07:00
2fb665a9df Add warning about memory overhead when using multiple tiny tensors (#20801)
Summary:
added note in docs regarding #19408
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20801

Differential Revision: D15503351

Pulled By: mrshenli

fbshipit-source-id: 7ab371a7992233fb867aadd4bb6b74fccd232c33
2019-05-24 21:45:51 -07:00
c7e0722814 allow pass ordered dict for nn sequential (#20796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20796
ghimport-source-id: 9f895a2a6ebc71984196b868dc3ea6a12286bc81
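For reference, a sketch of the OrderedDict construction pattern in question, which names each submodule (the layer names and sizes here are illustrative):

```python
import torch
from collections import OrderedDict

model = torch.nn.Sequential(OrderedDict([
    ('fc1', torch.nn.Linear(4, 8)),
    ('act', torch.nn.ReLU()),
    ('fc2', torch.nn.Linear(8, 2)),
]))
out = model(torch.randn(1, 4))
print(out.shape)        # torch.Size([1, 2])
print(model.fc1)        # submodules are addressable by name
```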

Differential Revision: D15505330

Pulled By: wanchaol

fbshipit-source-id: 2922c56606b477a34f4e6433fa790d5b2de9d77a
2019-05-24 20:31:05 -07:00
b93bdf6989 Fix advanced indexing on "huge" Tensors (#20919)
Summary:
This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTS in Loops.cuh,
OffsetCalculator, and IntDivider checking that sizes fit in a signed 32-bit
integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e

Fixes #20888
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20919

Differential Revision: D15501945

Pulled By: colesbury

fbshipit-source-id: e876e678e866d2efda8ee92c47a1d2d1310671f0
2019-05-24 16:25:04 -07:00
430d1a2761 Attempt to fix flaky test_structseq_repr (#20931)
Summary:
Previously, this used `crepr` after the decref of `repr`. This is not
allowed because `repr` owns the cached copy of `crepr`.

Let's see if this fixes the contbuild.

See #20926
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20931

Differential Revision: D15501929

Pulled By: colesbury

fbshipit-source-id: 24141ba62df8758d2a3998cf7c2054be09088b6a
2019-05-24 15:55:44 -07:00
b1df8bfe8a Reduce set of build/tests which run on PRs. (#20930)
Summary:
Resubmit of #20775

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20930

Differential Revision: D15503173

Pulled By: ezyang

fbshipit-source-id: a5de8eacf6b29ee26f07ac53c915fff3f4d32569
2019-05-24 15:25:37 -07:00
c46c6a4fe6 Zero slice bug (#20914)
Summary:
Bug reported internally at FB:

```python
>>> t=torch.from_numpy(np.empty((0,4)))
>>> t[:,1::2]*=1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Trying to resize storage that is not resizable at ../aten/src/TH/THStorageFunctions.cpp:76
```

This happens because the storage offset of `t[:, 1::2]` is 1, and it has 0 elements. We can fix this by avoiding resizing the storage for no-element arrays.

(We could *also* have avoided it by not modifying the storage index in this case, but I felt this way was more semantically correct -- in general, we should not be assuming it's okay to do anything to the storage when it has zero elements).
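With the fix in place, the reported repro runs cleanly (runnable restatement of the snippet above):

```python
import numpy as np
import torch

t = torch.from_numpy(np.empty((0, 4)))
# The slice t[:, 1::2] has storage offset 1 but zero elements; the fix
# skips the storage resize for zero-element outputs, so this no longer
# raises "Trying to resize storage that is not resizable".
t[:, 1::2] *= 1
print(t.shape)          # torch.Size([0, 4])
```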
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20914

Differential Revision: D15497860

Pulled By: umanwizard

fbshipit-source-id: 6af61d73a05edfc5c07ce8be9e530f15bf72e6a9
2019-05-24 15:10:59 -07:00
3858e1684b Don't print backtrace for interpreter errors (#20925)
Summary:
Eager Python errors don't include a backtrace so script shouldn't either

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20925

Pulled By: driazati

Differential Revision: D15499952

fbshipit-source-id: 1169f13ba5578cd52948725eda73de8229146bb1
2019-05-24 14:58:48 -07:00
371bd043d6 register ResizeNearestOp to C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20928

Reviewed By: smessmer

Differential Revision: D15499661

fbshipit-source-id: 5af24d5c9d7ff739b8355e19dfe66b496bc026a5
2019-05-24 14:39:11 -07:00
b5a5e296aa Support 3D mesh/point cloud (#20413)
Summary:
I started adding support for the new **[mesh/point cloud](https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/g3doc/tensorboard.md)** data type introduced to TensorBoard recently.

I created the functions to add the data, created the appropriate summaries.
This new data type however requires a **Merged** summary containing the data for the vertices, colors and faces.

I got stuck at this stage. Maybe someone can help. lanpa?

I converted the example code by Google to PyTorch:
```python
import numpy as np
import trimesh

import torch
from torch.utils.tensorboard import SummaryWriter

sample_mesh = 'https://storage.googleapis.com/tensorflow-graphics/tensorboard/test_data/ShortDance07_a175_00001.ply'
log_dir = 'runs/torch'
batch_size = 1

# Camera and scene configuration.
config_dict = {
    'camera': {'cls': 'PerspectiveCamera', 'fov': 75},
    'lights': [
        {
            'cls': 'AmbientLight',
            'color': '#ffffff',
            'intensity': 0.75,
        }, {
            'cls': 'DirectionalLight',
            'color': '#ffffff',
            'intensity': 0.75,
            'position': [0, -1, 2],
        }],
    'material': {
        'cls': 'MeshStandardMaterial',
        'roughness': 1,
        'metalness': 0
    }
}

# Read all sample PLY files.
mesh = trimesh.load_remote(sample_mesh)
vertices = np.array(mesh.vertices)
# Currently only supports RGB colors.
colors = np.array(mesh.visual.vertex_colors[:, :3])
faces = np.array(mesh.faces)

# Add batch dimension, so our data will be of shape BxNxC.
vertices = np.expand_dims(vertices, 0)
colors = np.expand_dims(colors, 0)
faces = np.expand_dims(faces, 0)

# Create data placeholders of the same shape as data itself.
vertices_tensor = torch.as_tensor(vertices)
faces_tensor = torch.as_tensor(faces)
colors_tensor = torch.as_tensor(colors)

writer = SummaryWriter(log_dir)

writer.add_mesh('mesh_color_tensor', vertices=vertices_tensor, faces=faces_tensor,
                colors=colors_tensor, config_dict=config_dict)

writer.close()
```

I tried adding only the vertex summary, hence the others are supposed to be optional.
I got the following error from TensorBoard and it also didn't display the points:
```
Traceback (most recent call last):
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 302, in run_wsgi
    execute(self.server.app)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/serving.py", line 290, in execute
    application_iter = app(environ, start_response)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/application.py", line 309, in __call__
    return self.data_applications[clean_path](environ, start_response)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/werkzeug/wrappers/base_request.py", line 235, in application
    resp = f(*args[:-2] + (request,))
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 252, in _serve_mesh_metadata
    tensor_events = self._collect_tensor_events(request)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 188, in _collect_tensor_events
    tensors = self._multiplexer.Tensors(run, instance_tag)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/event_processing/plugin_event_multiplexer.py", line 400, in Tensors
    return accumulator.Tensors(tag)
  File "/home/dawars/workspace/pytorch/venv/lib/python3.6/site-packages/tensorboard/backend/event_processing/plugin_event_accumulator.py", line 437, in Tensors
    return self.tensors_by_tag[tag].Items(_TENSOR_RESERVOIR_KEY)
KeyError: 'mesh_color_tensor_COLOR'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20413

Differential Revision: D15500737

Pulled By: orionr

fbshipit-source-id: 426e8b966037d08c065bce5198fd485fd80a2b67
2019-05-24 14:30:58 -07:00
6063ffd055 Specify dispatch key with kernel (#20821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20821

Change registration API. Instead of

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>()
        .dispatchKey(CPUTensorId()));

it is now

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(CPUTensorId()));

This binds kernel and dispatch key together, allowing them to be separate from other future configuration options like alias analysis or autograd wrappers.

The semantic problem behind this is that the dispatch key is a *kernel config parameter* and not an *operator config parameter* while things like autograd wrappers, alias info, and actually the kernel itself are *operator config parameters*. And while previously, the different kind of config parameters have been mixed, this diff now separates them.

Before this change, it wouldn't have been well defined if you specified a dispatchKey together with an autogradWrapper or aliasInfo for example.

    // what is this supposed to do?
    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .aliasInfo(DEFAULT)
        .dispatchKey(CPUTensorId()));

If we get more kernel config parameters in the future, we could introduce something like this

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(torch::RegisterOperators::kernelOptions()
            .dispatchKey(CPUTensorId())
            .otherConfig());

but that's overkill as long as dispatch keys are the only kernel config parameter, and we can introduce that later without breaking backwards compatibility.

A nice side effect of this is that people can register multiple kernels to the same operator in the same `.op()` call:

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel1>(CPUTensorId())
        .kernel<Kernel2>(CUDATensorId()));

Reviewed By: dzhulgakov

Differential Revision: D15455790

fbshipit-source-id: 1c46bfe676dcacf74cf36bd3f5df3d2c32b8fb11
2019-05-24 14:23:35 -07:00
a2328a27e9 Improve torch.cdist performance (#20605)
Summary:
Fix based on https://github.com/pytorch/pytorch/issues/15253
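For context, the call being sped up, sketched with arbitrary batched shapes:

```python
import torch

x1 = torch.randn(2, 5, 3)       # batches of 5 points in R^3
x2 = torch.randn(2, 4, 3)       # batches of 4 points in R^3
d = torch.cdist(x1, x2, p=2)    # pairwise Euclidean distances
print(d.shape)                  # torch.Size([2, 5, 4])
```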
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20605

Differential Revision: D15396123

Pulled By: ifedan

fbshipit-source-id: 3ed373e68339a35360f083d4aad1b655abcaf97e
2019-05-24 14:06:55 -07:00
4501dc305d Assert against using Operator methods not supported when exporting it to c10 (#17818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17818

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14392459

fbshipit-source-id: bf86e6cb0a7cfefd112a65dc85cc243e57a5ad52
2019-05-24 13:45:01 -07:00
c8f404a68e Revert D15499918: Reduce set of build/tests which run on PRs.
Differential Revision:
D15499918

Original commit changeset: 992e3f91f95d

fbshipit-source-id: 86fc43d3da7ea3e3a32e95fc4f4f3de6cbd5d49b
2019-05-24 12:55:04 -07:00
d03265b44f Reduce set of build/tests which run on PRs. (#20775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20775
ghimport-source-id: 8d05ed03b8a841233d578a38b7c84bd1152c08e5

Differential Revision: D15499918

Pulled By: ezyang

fbshipit-source-id: 992e3f91f95dd9c0564e5ed6793dd1b286ddba00
2019-05-24 12:29:52 -07:00
dee11a92c1 Use Device instead of Backend in TensorIterator (#20690)
Summary:
This PR also moves Device::validate into the header file, which makes
statements like `Device d = kCPU` effectively free.

Device includes the device's index, so TensorIterator::compute_types
now implicitly checks that all CUDA inputs are on the same GPU.
Previously, this was done ad-hoc in places like TensorIterator::binary_op.

Note that zero-dim Tensor (scalars) are NOT required to be on the
same device as other inputs because they behave almost like Python numbers.
TensorIterator handles copying zero-dim Tensors to the common device.

Prior to this PR, TensorIterator would copy zero-dim Tensors between CPU
and GPU, but not between different GPUs (because Backend didn't encode
the GPU index). This removes that restriction.
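The device rule described above can be sketched in plain Python (hypothetical names; the real TensorIterator is C++): non-scalar inputs must share one device, while zero-dim tensors are exempt and get copied to the common device.

```python
# Toy model of the implicit same-device check (assumed field names,
# not the TensorIterator source).
def compute_common_device(inputs):
    common = None
    for t in inputs:
        if t["ndim"] == 0:
            continue  # zero-dim "scalars" behave like Python numbers
        if common is None:
            common = t["device"]
        elif t["device"] != common:
            raise RuntimeError(f"expected device {common}, got {t['device']}")
    return common

a = {"ndim": 2, "device": "cuda:0"}
b = {"ndim": 2, "device": "cuda:1"}
s = {"ndim": 0, "device": "cpu"}

assert compute_common_device([a, s]) == "cuda:0"  # scalar is exempt
try:
    compute_common_device([a, b])  # different GPUs now rejected
except RuntimeError as e:
    print(e)
```

With Device carrying an index, the cross-GPU case is caught by the same check that previously only distinguished CPU from CUDA.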
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20690

Differential Revision: D15414826

Pulled By: colesbury

fbshipit-source-id: 1d0ad1f7d663252af36dd4590bcda418c2f7a09f
2019-05-24 12:14:08 -07:00
17941f9979 JIT: Eliminate SumToSize by using Optional Lists (#18697)
Summary:
This PR eliminates unneeded grad_sum_to_size calls and in particular speeds up the LSTM backward by allowing better fusion.

It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None.
- The specialization of Optional arguments (#18407) allows us to then eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.

Thus, in the LSTM case, no SumToSize ops remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward.

I'm testing that different broadcasting situations lead to different graphs.

I didn't move all symbolic_script _grad_sum_to_size to the new logic, but it might be better to do this incrementally, anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697

Differential Revision: D15482076

Pulled By: wanchaol

fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
2019-05-24 11:24:17 -07:00
47043220ee Update version strings to 1.2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20812

Differential Revision: D15451892

Pulled By: gchanan

fbshipit-source-id: 07355dbd446053a69b5cf4e3be1842aa1075c71f
2019-05-24 11:07:29 -07:00
a640c81536 Add llvm8 installation step. (#20879)
Summary:
Add ability to build docker container with llvm8.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20879

Differential Revision: D15497037

Pulled By: rdzhabarov

fbshipit-source-id: d673d1ddd4156c95516e61223b397c2f9bce1214
2019-05-24 10:51:53 -07:00
fa20327618 Update Refinement Docs (#20912)
Summary:
To say that we don't do refinement on module attributes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20912

Differential Revision: D15496453

Pulled By: eellison

fbshipit-source-id: a1ab9fb0157a30fa1bb71d0793fcc9b1670c4926
2019-05-24 10:17:55 -07:00
8c4b2a835b Remove extra workspace queries in matrix inverse computation (#20904)
Summary:
Earlier, the workspace size query and allocation was placed inside the loop.
However, since we have batches of matrices with the same number of rows and columns, the workspace size query and allocation for every matrix in the batch are redundant.

This PR moves the workspace size query and allocation outside the loop, effectively saving (batch_size - 1) queries and allocations (and consequently the deallocations).

There is a tremendous speedup in inverse computation as a result of this change.

Changelog:
- Move workspace query and allocation outside the batch loop
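The hoisting pattern in the changelog can be sketched as follows (hypothetical placeholder functions, not the MAGMA/LAPACK calls): since all matrices in the batch share a shape, one workspace serves the whole loop.

```python
def workspace_size(n):
    # stand-in for the real workspace-size query (hypothetical)
    return 2 * n

def invert(matrix, work):
    # placeholder for the actual inverse kernel using the shared workspace
    return matrix

def batched_inverse(batch, n):
    work = [0.0] * workspace_size(n)  # queried/allocated once, outside the loop
    return [invert(m, work) for m in batch]  # every matrix reuses `work`

batch = [[1.0], [2.0], [3.0]]
assert batched_inverse(batch, 1) == batch
```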
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20904

Differential Revision: D15495505

Pulled By: ezyang

fbshipit-source-id: 226729734465fcaf896f86e1b1a548a81440e082
2019-05-24 09:59:37 -07:00
4109ec1278 In Dockerfile, do not install unnecessary packages, use conda to install ninja (saving one layer), and use "." to refer to WORKDIR to reduce redundancy. (#20881)
Summary:
- Do not install unnecessary packages in the Docker image.
- In the Docker image, use conda to install ninja (saving one layer)
- When workdir is set, use "." to refer to it to reduce redundancy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20881

Differential Revision: D15495769

Pulled By: ezyang

fbshipit-source-id: dab7df71ac107c85fb1447697e25978daffc7e0b
2019-05-24 09:32:40 -07:00
6af2482612 Leave it as an option for whether to colorize output during build (#20771)
Summary:
Currently PyTorch forces color output due to #20662. But users should be left an option to turn it off, because output redirected to a file would be garbled if color output is forced.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20771

Differential Revision: D15495677

Pulled By: ezyang

fbshipit-source-id: 9d89bbed40d0b67368554305394763a54c5ff6f5
2019-05-24 09:22:52 -07:00
ec45baf4dd tensor_illustration with correct numbers and better fonts for README file (#20751)
Summary:
Fix of README tensor image for issue #20641
Numbers are fixed, symbols made more readable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20751

Differential Revision: D15495706

Pulled By: ezyang

fbshipit-source-id: b6013574d16253ec681fc57143efe3d53952fbe9
2019-05-24 09:18:18 -07:00
ef1fdc27a3 Raise TypeError when the argument to isinf and isfinite is not a tensor (#20817)
Summary:
Currently, when the argument to isinf and isfinite is not a tensor, a ValueError is raised. This, however, should be a TypeError, because the error is a type mismatch.

In the error message, "str(tensor)" is replaced by "repr(tensor)" because, when an error occurs, a printable representation of the object is likely more useful than the "informal" string version of the object.
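A minimal sketch of the check described above (plain Python with a stand-in class, not the actual torch source): raise TypeError for the type mismatch and use repr() in the message.

```python
class Tensor:  # stand-in for torch.Tensor
    pass

def check_is_tensor(obj):
    if not isinstance(obj, Tensor):
        # repr() gives an unambiguous, printable representation, more
        # useful in an error than the "informal" str() form
        raise TypeError(f"expected a Tensor, got {obj!r}")

check_is_tensor(Tensor())  # passes silently
try:
    check_is_tensor([1, 2])
except TypeError as e:
    print(e)  # expected a Tensor, got [1, 2]
```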
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20817

Differential Revision: D15495624

Pulled By: ezyang

fbshipit-source-id: 514198dcd723a7031818e50a87e187b22d51af73
2019-05-24 09:18:15 -07:00
87040af498 Fix documentation for attention mask shape (#20850)
Summary:
Attention mask should be of shape `(L, S)` since it is added to `attn_output_weights`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20850

Differential Revision: D15495587

Pulled By: ezyang

fbshipit-source-id: 61d6801da5291df960daab273e874df28aedbf6e
2019-05-24 09:10:11 -07:00
a5c90aaf47 Use "length of the RNN input" instead of "length of the RNN"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20873

Differential Revision: D15495570

Pulled By: ezyang

fbshipit-source-id: e3b4cd67ccf97d0053ac053c3bcb74415b928c0a
2019-05-24 09:03:50 -07:00
3e4f213e82 Instructions for how to update pytorch-ci-hud when updating binary builds (#20758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20758
ghimport-source-id: ffb4c97c42c6efbb16ea5d93ea8af1bdf71cb1e4

Differential Revision: D15435639

Pulled By: ezyang

fbshipit-source-id: a12bde8b0b11bbe0d0280b6b3994d9c65dc4f5cc
2019-05-24 07:20:06 -07:00
c3d05e86cc Resend "Split ATen/Parallel into interface and backend" (#20825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20825
ghimport-source-id: 0371fbd37cb37635647d473d5ac9f2859e787061

Differential Revision: D15458073

Pulled By: ilia-cher

fbshipit-source-id: cd27d0da1691f6be1183cd152348ac0d93a53996
2019-05-24 02:03:06 -07:00
6b74856747 Fix init_thread calls in thread pool initialization (#20848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20848
ghimport-source-id: e542858a198252838c1f3100dbfbe90fd3960f07

Differential Revision: D15466918

Pulled By: ilia-cher

fbshipit-source-id: e75d38f51edd5b508c4ca28a292e4141e90f209f
2019-05-24 01:14:31 -07:00
1bb728fe14 Change the quantizer to match the behavior of the FBGEMM implementation (#20892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20892

FBGEMM uses 64 bit values. Need to change our implementation to match

Reviewed By: jerryzh168

Differential Revision: D15487664

fbshipit-source-id: 29cba26093c6f9aeafce14982c1ae12149e63562
2019-05-24 00:46:08 -07:00
fc941d3bca Catchall kernels instead of fallback kernels (#20773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20773

This removes the feature to register fallback kernels that are called when no other kernel matches.
Instead, we introduce the concept of catchall kernels that are always called independent of inputs.
If you only have a fallback/catchall kernel and no kernels with concrete dispatch keys, then both concepts behave in the same way.
The difference is that we now disallow operators to have both a catchall kernel and kernels with concrete dispatch keys.
This was possible before when they have been fallback kernels.

The reason for this change is that we anticipate needing a method_missing feature in backends, i.e. a backend-wide fallback to call when the backend doesn't specify a kernel for an operator.
We are not clear on precedence between this backend-wide fallback and an operator-level fallback. Disallow fallbacks for now so we are free to choose later without breaking backwards compatibility.
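The invariant described above can be sketched as a toy Python dispatch table (hypothetical names; the real c10 dispatcher is C++): an operator may have per-dispatch-key kernels or a single catchall kernel, but never both.

```python
class Operator:
    """Toy model of one operator's kernel table."""
    def __init__(self):
        self.kernels = {}      # dispatch key -> kernel
        self.catchall = None   # called regardless of inputs, if set

    def register_kernel(self, key, fn):
        if self.catchall is not None:
            raise RuntimeError("cannot mix catchall and per-key kernels")
        self.kernels[key] = fn

    def register_catchall(self, fn):
        if self.kernels:
            raise RuntimeError("cannot mix catchall and per-key kernels")
        self.catchall = fn

    def call(self, key, *args):
        if self.catchall is not None:
            return self.catchall(*args)  # key is ignored entirely
        return self.kernels[key](*args)

op = Operator()
op.register_catchall(lambda x: x + 1)
print(op.call("CPUTensorId", 41))  # -> 42, independent of the key
```

Registering a concrete kernel on `op` after the catchall would raise, which is exactly the freedom-preserving restriction the diff introduces.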

Reviewed By: dzhulgakov

Differential Revision: D15438977

fbshipit-source-id: cb3aa764a1659d909ee21a7bd8ec3d32438aafaa
2019-05-23 23:47:51 -07:00
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point would be logged. This is significantly more lightweight than #18235 and thus we can afford to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
8cde4c4d22 Remove Variable::Impl and DifferentiableViewImpl (#17072)
Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove `Variable.data()` API
4. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.

After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as its `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.

**Note that this PR is BC-breaking in the following use cases:**

**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.

**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))

grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072

Differential Revision: D14075257

Pulled By: yf225

fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
2019-05-23 21:09:04 -07:00
f93e0619f3 Adding ShufflenetV2 to caffe2's benchmark suite. (#20180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20180

Adding ShufflenetV2 (by Ma et. al. 2018) to the caffe2's benchmark
suite.

To run, use: `buck run mode/opt caffe2/caffe2/python/examples:imagenet_trainer -- --train_data null --batch_size 128 --epoch_size 3200 --num_epochs 2 --num_gpus 2 --model shufflenet`

Reviewed By: bddppq, xw285cornell

Differential Revision: D15094282

fbshipit-source-id: 0e1ce9c5975868e917b0f179e2c5b15647a76b4e
2019-05-23 20:40:17 -07:00
3aa7ee6fe6 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 17161d7e1e742b402715f8ed006e5b3abfa78561
2019-05-23 20:40:14 -07:00
cfb6c4a8ee Updating submodules
Reviewed By: yns88

fbshipit-source-id: 58b230ad12620032f391733c7f9c1e44aeaa390b
2019-05-23 19:49:06 -07:00
62af37aa88 dropout symbolic_script should respect the training flag (#20760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20760
ghimport-source-id: eb667c3549a03a2fc01ffa0a2d3bc7e3a29b78e0

Reviewed By: jamesr66a

Differential Revision: D15486511

Pulled By: suo

fbshipit-source-id: 56ae930a01b0f6f4305a2a745135d4529b4a1ca0
2019-05-23 18:17:17 -07:00
bd53c8eb93 Move torchvision install out of onnx test script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20890

Differential Revision: D15486657

Pulled By: bddppq

fbshipit-source-id: 3acd7386d1f070cad9bd43d6e74244b706c0dc16
2019-05-23 18:02:48 -07:00
d5b7138a2c Dict is a reference type (#20669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20669

Before, Dict was a value type, i.e. copying it did a deep copy.
Unfortunately, this doesn't work well with storing and passing Dicts around in IValues because IValues are reference types.
This diff changes Dict to be a reference type.
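The value-vs-reference distinction in this diff can be illustrated in plain Python (a semantic sketch only, not the c10 Dict implementation): a value type deep-copies on assignment, a reference type shares the underlying storage, matching IValue semantics.

```python
import copy

original = {"a": 1}

value_copy = copy.deepcopy(original)  # old Dict behavior: deep copy
reference = original                  # new Dict behavior: shared storage

original["a"] = 2
assert value_copy["a"] == 1  # the deep copy does not see the mutation
assert reference["a"] == 2   # the reference does
```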

Reviewed By: dzhulgakov

Differential Revision: D15404911

fbshipit-source-id: dc990d3eb7cae044b74dd0253f8b704dde6a6c86
2019-05-23 15:24:31 -07:00
93d5503f34 bug fix 19374 - fix for upsample export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20116

Differential Revision: D15256899

Pulled By: houseroad

fbshipit-source-id: cf0dfd679d528fbb77f483e23071f4a96fb27091
2019-05-23 14:48:23 -07:00
48bf7b9be8 Fix oscillation in coalesceInsertedDataDependencies (#20833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20833

Att. The algorithm is still "horrendously inefficient". But since we are sunsetting Nomnigraph, I just did the minimal fix here.

Reviewed By: tracelogfb

Differential Revision: D15463880

fbshipit-source-id: 413a1280a92c1923ba49031177816a2d5f888575
2019-05-23 14:04:20 -07:00
2d96876d88 Use conda torchvision version (#20865)
Summary:
This tries to fix the following error on current master:
```
May 23 16:18:47 Traceback (most recent call last):
May 23 16:18:47   File "main.py", line 7, in <module>
May 23 16:18:47     from torchvision import datasets, transforms
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/__init__.py", line 1, in <module>
May 23 16:18:47     from torchvision import models
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/models/__init__.py", line 11, in <module>
May 23 16:18:47     from . import detection
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/__init__.py", line 1, in <module>
May 23 16:18:47     from .faster_rcnn import *
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
May 23 16:18:47     from torchvision.ops import misc as misc_nn_ops
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/ops/__init__.py", line 1, in <module>
May 23 16:18:47     from .boxes import nms, box_iou
May 23 16:18:47   File "/opt/conda/lib/python3.6/site-packages/torchvision/ops/boxes.py", line 2, in <module>
May 23 16:18:47     from torchvision import _C
May 23 16:18:47 ImportError: /opt/conda/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19NonVariableTypeMode10is_enabledEv
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20865

Differential Revision: D15481736

Pulled By: yf225

fbshipit-source-id: 67d4fd70652ccc709b44cb15392d6e44a8fe9235
2019-05-23 13:49:59 -07:00
b6d0f6c85a Move THCTensor_{random, clampedRandom, cappedRandom} to ATen (#20620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20620
ghimport-source-id: 7c09c2462021e3fa5adef61570a575964ff16125

Differential Revision: D15454050

Pulled By: ezyang

fbshipit-source-id: 5b0421c56445baf19dbdbdd9680af128a5cdf443
2019-05-23 13:44:16 -07:00
48424a6c94 Avoid dynamic dispatch inside the omp loop in AdaptiveAvgPool2d (#20366)
Summary:
This PR changes the CPU implementation of `AdaptiveAveragePool2D` by
- moving dispatch outside the OpenMP loop
- adding fp16 support
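The first change is a standard hoisting pattern, sketched here in plain Python with hypothetical names (the actual code is C++/OpenMP): resolve the per-dtype kernel once before the parallel loop instead of re-dispatching on every iteration.

```python
def kernel_for(dtype):
    # stand-in for the dtype dispatch (now supporting fp16 as well)
    return {"float32": lambda x: x * 0.5, "float16": lambda x: x * 0.5}[dtype]

def pool_slow(values, dtype):
    # dynamic dispatch inside the loop: one lookup per element
    return [kernel_for(dtype)(v) for v in values]

def pool_fast(values, dtype):
    kernel = kernel_for(dtype)  # hoisted: one lookup total
    return [kernel(v) for v in values]

assert pool_slow([2.0, 4.0], "float32") == pool_fast([2.0, 4.0], "float32")
```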
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20366

Differential Revision: D15456069

Pulled By: ezyang

fbshipit-source-id: 00fa2916f8b136af9f5c8b5db0eca4619f9f5bac
2019-05-23 13:29:29 -07:00
cf0268e51c Modify cos to cosh in Vec256 (#20797)
Summary:
Minor typo fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20797

Differential Revision: D15469308

Pulled By: ezyang

fbshipit-source-id: 3288ad69316e296e46d861737c5b09e0ea1e694b
2019-05-23 13:24:09 -07:00
70caa2efe2 Add mkldnn sigmoid operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20820

Reviewed By: dzhulgakov

Differential Revision: D15455866

fbshipit-source-id: 712b06dfbd441051dc284a1acdf94926df09bc1d
2019-05-23 12:51:57 -07:00
8dedb04c26 Enable torch.jit.trace for mkldnn modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20800

Differential Revision: D15447892

fbshipit-source-id: 78e76523c5412c020a2bc22d6998ff7b36356720
2019-05-23 12:51:54 -07:00
63585c3b81 Add support for save and load mkldnn modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20799

Reviewed By: wanchaol

Differential Revision: D15447891

fbshipit-source-id: e34de946c79282fb934a5c52ff1def41c7993c75
2019-05-23 12:51:50 -07:00
5f83c5d834 Fix build error with MSVC (#20853)
Summary:
Close #20642

Possibly broken by #19816
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20853

Differential Revision: D15474620

Pulled By: jerryzh168

fbshipit-source-id: 99b52d92a93bac7cab52537f1ebdbd286d4b2cfe
2019-05-23 12:11:29 -07:00
31e2d20c5e Dictionarize check_inputs coming from trace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20813

Differential Revision: D15466836

Pulled By: Krovatkin

fbshipit-source-id: ffdb418592b76dc67c65c59f4dc7303f08734f97
2019-05-23 11:17:55 -07:00
2c556a9489 fix the input/output type mismatch (#20829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20829

as title

Reviewed By: jamesr66a

Differential Revision: D15461937

fbshipit-source-id: 02c7150c0e8d020030ae8898008f718c74850dca
2019-05-23 11:08:21 -07:00
9c57d8df42 Make LayerNorm.normalized_shape a tuple
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20832

Pulled By: driazati

Differential Revision: D15464693

fbshipit-source-id: 244f24d6917b17dde5e33ff852c716fb053b7ca5
2019-05-23 10:51:08 -07:00
99b3f5cd70 Fixes error with custom scalars, fixes #20579 (#20580)
Summary:
When adding custom scalars like this
```python
from torch.utils.tensorboard import SummaryWriter

with SummaryWriter() as writer:
    writer.add_custom_scalars({'Stuff': {
        'Losses': ['MultiLine', ['loss/(one|two)']],
        'Metrics': ['MultiLine', ['metric/(three|four)']],
    }})
```
This error is raised:
```
TypeError: Parameter to MergeFrom() must be instance of same class: expected tensorboard.SummaryMetadata.PluginData got list.
```

Removing the square brackets around `SummaryMetadata.PluginData(plugin_name='custom_scalars')` should be enough to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20580

Differential Revision: D15469700

Pulled By: orionr

fbshipit-source-id: 7ce58034bc2a74ab149fee6419319db68d8abafe
2019-05-23 10:17:36 -07:00
a16708a1ae Workaround python2.7 find_module limitation / explicitly close file (#20782)
Summary:
fix #20781 #20757
Hmm, I don't know an easy way to add a test to make sure it runs against a package installed as an .egg, but I tested it locally with torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20782

Differential Revision: D15443600

Pulled By: ailzhang

fbshipit-source-id: 285eb0d9a44d6edb8e93618fa293f4feb431d2ae
2019-05-23 09:44:17 -07:00
ec57d1f18a Port dilated_max_pool2d() to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20691

Differential Revision: D15435960

Pulled By: ezyang

fbshipit-source-id: 548b7cc42e52ad2c641ec7d9cf78028d9411d02e
2019-05-23 09:04:04 -07:00
f039401bf2 Add back at::_copy_from for use by XLA (#20783)
Summary:
XLA needs a way to override CPUTensor.copy_(XLATensor), but we only
dispatch on the "self" argument. This inverts the dispatch order when
"src" is an unhandled type.

Note that things like XLATensor.copy_(CPUTensor) never enter this
implementation.

cc dlibenzi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20783

Differential Revision: D15443187

Pulled By: colesbury

fbshipit-source-id: 4ee93ba598ef0fed2a99c0683aae30cb50a1f99c
2019-05-23 08:47:20 -07:00
80aed36fb6 fix a couple of typos in README markdown (#20819)
Summary:
was reading the README on github and came across a couple of typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20819

Differential Revision: D15469603

Pulled By: nairbv

fbshipit-source-id: 0ed7868de2d4e6d82557a8c170783966f8a1afd7
2019-05-23 08:11:25 -07:00
8fc069fa17 add batch of string ops (#20826)
Summary:
First batch of https://github.com/pytorch/pytorch/issues/20769, handles `isupper`, `islower`, `isdigit`, `isspace`, `isalnum`, `isalpha`, `upper`, `lower`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20826

Differential Revision: D15466986

Pulled By: eellison

fbshipit-source-id: d1df65721da803dfa30e28fdd9b874405be6bc7d
2019-05-23 08:01:16 -07:00
90182a7332 Install torchvision from master
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20836

Differential Revision: D15464705

Pulled By: bddppq

fbshipit-source-id: abe2ac2de2bf4c8d07334e6b2565c738c40428ae
2019-05-23 02:16:57 -07:00
d35a587958 Remove cpu_half, cpu_bool, cuda_bool from native_functions.yaml (#20552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20552
ghimport-source-id: 0ef4e85b40f3b927564257f44f72f671251acaf1

Differential Revision: D15362154

Pulled By: li-roy

fbshipit-source-id: b2477582389099c6696dca33f1371e8e136e32b6
2019-05-22 22:58:40 -07:00
41100d4027 Add PerChannelAffineQuantizer (#20764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20764

att

Reviewed By: dskhudia

Differential Revision: D15367364

fbshipit-source-id: 1d3ebf356ceac73b0fa4493209839d1c66d4d5b3
2019-05-22 19:16:52 -07:00
a21cf76575 Revert D15459166: [pytorch][PR] add batch of string ops
Differential Revision:
D15459166

Original commit changeset: 0ed908022475

fbshipit-source-id: d0a04228605e3437a02961a525eed8f8b3b59c17
2019-05-22 19:07:50 -07:00
5952ca8d9f Remove duplicated _optimize_trace and use core (#20394)
Summary:
The duplicated code of `_optimize_trace` in _pytorch_graph.py is used to bypass some optimization step which causes missing scope.

It seems that most of the problematic steps have been fixed recently. Standard models implemented in torchvision are visually inspected before the commit. However, the `+=` in 50d54a82d1/torchvision/models/resnet.py (L63) will let f4d9bfaa4d/torch/onnx/utils.py (L159) produce a bad result. It can be fixed by replacing it with `out += identity`. This also implies that `+=` has non-intuitive behavior.

cc orionr ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20394

Reviewed By: NarineK

Differential Revision: D15452204

Pulled By: orionr

fbshipit-source-id: eaa4c13f16551c78dc6419f1e22eb2c560af4cc5
2019-05-22 18:34:20 -07:00
871c9dcb1d move batchnorm and layernorm fusion to decompose (#20337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20337
ghimport-source-id: 2196f84f2ef384c1f25587b2fb4bd9dd2f63c2b4

Differential Revision: D15448596

Pulled By: wanchaol

fbshipit-source-id: b66e608f1b72471fc0775aaa4e09f9fa1070fc3c
2019-05-22 18:01:27 -07:00
cde611a66c Quantized Conv2d operator (#20772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20772

Copy of D15178352

A conflicting commit landed at the same time as D15178352 that removed registering kernels using IntArrayRef; hence, D15178352 was reverted. Using std::vector instead.

Reviewed By: zafartahirov

Differential Revision: D15437237

fbshipit-source-id: cd2f1caebcc720352b48ce25d716cb1ca49a5197
2019-05-22 17:53:24 -07:00
aebcd80ae4 add batch of string ops (#20826)
Summary:
First batch of https://github.com/pytorch/pytorch/issues/20769, handles `isupper`, `islower`, `isdigit`, `isspace`, `isalnum`, `isalpha`, `upper`, `lower`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20826

Differential Revision: D15459166

Pulled By: eellison

fbshipit-source-id: 0ed908022475e27011803cc4af7cf393a4312783
2019-05-22 17:33:04 -07:00
7aa3887f43 make wildcards alias only each other (#20670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20670
ghimport-source-id: f5704c49fcb829e4668441f31fcf9305da22335c

Reviewed By: jamesr66a

Differential Revision: D15447567

Pulled By: suo

fbshipit-source-id: 391236806838de2524410e26946456441e562470
2019-05-22 16:50:09 -07:00
90910fc6cb Mark values entering containers as wildcards (#20556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20556
ghimport-source-id: d7c62e38a2f6928f6f8d988c26a38ea8f8cff8b6

Reviewed By: jamesr66a

Differential Revision: D15447568

Pulled By: suo

fbshipit-source-id: 77ebc11b571b8517d3bad3ee1b3ee5ac037542b2
2019-05-22 16:50:06 -07:00
28be521e39 Fix bug in exporting node with multiple outputs by scripting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20256

Differential Revision: D15422040

Pulled By: houseroad

fbshipit-source-id: 5de2a992d7d99a48905c39a1878eb0b3b68d6a3f
2019-05-22 16:29:36 -07:00
c2e3e79afc fix pow bug on overloads and clean up (#20824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20824
ghimport-source-id: ceb1b64e2866ec8577800a8c378d8222a62cf199

Reviewed By: cpuhrsch

Differential Revision: D15458009

Pulled By: wanchaol

fbshipit-source-id: 51546d142d2c84e961d8b12ae85a2988a342da3b
2019-05-22 16:21:18 -07:00
98928f4d79 Allow both Variables and Tensors in c10 kernel interface (#20816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20816

Previously, the c10 dispatcher expected ops to be called with Variables and unwrapped them to Tensors before calling into the kernel.
The kernel was expected to return Tensors that were re-wrapped into Variables before passing them on into the system.

However, that doesn't work with kernels that call other operators. One recent example was a kernel that returned the result of `torch::ones()` as output.
Now, with this diff, the c10 dispatcher still passes Tensors to the kernel and Variables back into the system, but it accepts ops being called with either Tensors or Variables
and kernels are also allowed to return either.

After https://github.com/pytorch/pytorch/pull/17072 , we should be able to get rid of the whole wrapping/unwrapping logic.
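The relaxed calling convention can be sketched as a toy Python model of the dispatcher boundary (hypothetical classes, not the c10 code): both directions normalize, so callers and kernels may use either type.

```python
class Tensor:
    def __init__(self, value):
        self.value = value

class Variable(Tensor):  # a Variable wraps the same data plus autograd info
    pass

def to_tensor(x):
    # kernels always see plain Tensors
    return Tensor(x.value) if isinstance(x, Variable) else x

def to_variable(x):
    # the rest of the system always sees Variables
    return x if isinstance(x, Variable) else Variable(x.value)

def call_op(kernel, arg):           # arg may be a Tensor or a Variable
    out = kernel(to_tensor(arg))    # kernel may return either, too
    return to_variable(out)

double = lambda t: Tensor(t.value * 2)
assert call_op(double, Tensor(3)).value == 6
assert call_op(double, Variable(3)).value == 6
assert isinstance(call_op(double, Tensor(3)), Variable)
```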

Reviewed By: hl475

Differential Revision: D15453963

fbshipit-source-id: 7602b7f2bc43e8ceb8a8c0e97aafcc53d4c47b6c
2019-05-22 16:03:12 -07:00
9ea009fe8b Add as_quantized_tensor (#20740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20740

Provide a way to assemble quantized Tensor from int8 Tensor, scale and zero point.

Differential Revision: D15232416

fbshipit-source-id: c3a3d9d7214b1dc569214c019440c2779fbd063b
2019-05-22 15:19:45 -07:00
12bc81ae2a Change comparison ops result dtype to bool [Part1] (#20767)
Summary:
This is the first part of the planned changes to change the comparison operations' result tensor dtype from Byte to Bool. You can see the whole list of changes (not cleaned up) [here](https://github.com/pytorch/pytorch/pull/19332). As the PR is too big for a single review, I'm breaking it into pieces.

**Changes in this PR:**
1. Enable these methods for bool tensors:
- maskedSelect
- maskedSelectBool
- bitand
- cbitand
- bitor
- cbitor
- bitxor
- cbitxor
- sign
- equal
- neg

2. Add bool clause for the TH version of sign method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20767

Differential Revision: D15436446

Pulled By: izdeby

fbshipit-source-id: 8d2494b5f4873cd79c7f1a40d2cb045cadfad51a
2019-05-22 15:12:46 -07:00
6ec3c12255 Update references to minimum CUDA and cuDNN version (#20718)
Summary:
I didn't update the Windows references because I wasn't sure if they apply to CUDA 9. peterjc123 what should the Windows section say?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20718

Differential Revision: D15459276

Pulled By: colesbury

fbshipit-source-id: 917e22f8ac75378d88c962c226b5a42b6799c79a
2019-05-22 14:54:54 -07:00
05543153dd CUDA implementation of fakequant (#20252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20252

Add CUDA implementation for fakequant op for quantization aware training.

Reviewed By: zafartahirov

Differential Revision: D15243386

fbshipit-source-id: 37610ab046786ffc69aaec5235e5df8304c353d6
2019-05-22 14:46:39 -07:00
fdb923996d Revert D15445092: Some minor fix to unblock the Bert model quantization
Differential Revision:
D15445092

Original commit changeset: 22da41a56ecb

fbshipit-source-id: eca9a85900bf48fe6a9da5cfff61606a10f0c3de
2019-05-22 14:25:14 -07:00
cfc98ae714 fix add_histogram_raw (#20688)
Summary:
This is a porting of the fix from:
https://github.com/lanpa/tensorboardX/issues/421

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20688

Reviewed By: NarineK

Differential Revision: D15415093

Pulled By: orionr

fbshipit-source-id: d32a6298218fbc6fe315aa0f18b57e0c8ef92627
2019-05-22 14:06:21 -07:00
fd2aa93b37 Exposing LengthsSum/Mean/Max in pytorch (#20802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20802

Need this for sequence model

Reviewed By: dzhulgakov

Differential Revision: D15448529

fbshipit-source-id: cd5abe3b689fc0e02feff10faf8cd61c99369f4f
2019-05-22 13:55:19 -07:00
8d7a025703 ONNX Export Scatter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18543

Differential Revision: D14658639

Pulled By: houseroad

fbshipit-source-id: 5d7821b54d2fc93f71120155adf328897d13aff6
2019-05-22 13:31:54 -07:00
fea4a56af3 Add ability to filter metric schema in LayerModelHelper (#20786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20786

Add a method to LayerModelHelper to filter metrics_schema. A general model builder may add metric schema that is not needed in some situations. This change adds the ability to skip those that are not needed.

Reviewed By: alex1o1o7cloud

Differential Revision: D15418140

fbshipit-source-id: 520f5dffd9938cf206cb1352e2953a4d4d2b6ab1
2019-05-22 12:26:20 -07:00
810816a1f9 Automatic update of fbcode/onnx to cc2333a3f929caca7223b98699237f19388dd585 (#20763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20763

Previous import was ead449a30d026a7a0a59e2ba0a42ca8e52ec2359

Included changes:
- **[cc2333a3](https://github.com/onnx/onnx/commit/cc2333a3)**: Version Conversion of Min, Max, Mean from opset 7 to 8 (#2032) <Ksenija Stanojevic>
- **[5d0975f4](https://github.com/onnx/onnx/commit/5d0975f4)**: Fix auto_pad shape inference bug (#2028) <stevenlix>
- **[819afd05](https://github.com/onnx/onnx/commit/819afd05)**: Version Conversion from opset 8 to 9 (#2007) <Ksenija Stanojevic>
- **[6c913669](https://github.com/onnx/onnx/commit/6c913669)**: fix macro ONNX_DISALLOW_COPY_AND_ASSIGN bug (#2017) <one-hello>

Reviewed By: BIT-silence

Differential Revision: D15425957

fbshipit-source-id: b799357930e8c9421e9bfcbfd97907e086862a6d
2019-05-22 11:37:01 -07:00
4e0d098ace Fix optimizer type hint (#20648)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20548
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20648

Differential Revision: D15453935

Pulled By: ezyang

fbshipit-source-id: 8778e819c58fdc2620f123ec5b5fd568e23b7705
2019-05-22 11:27:40 -07:00
795a1a6ffa When detecting numpy, assign relevant variables outside the try block (#20739)
Summary:
When detecting the presence of NumPy using import, move numpy-related variable assignments outside the try block (i.e., to an else block) to improve readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20739

Differential Revision: D15453916

Pulled By: ezyang

fbshipit-source-id: d3c37f2b290846be3c6a1462251cbb3e95d493be
2019-05-22 11:27:36 -07:00
fd95947e68 Revert D15248618: Split ATen/Parallel into interface and backend
Differential Revision:
D15248618

Original commit changeset: 060879266bc8

fbshipit-source-id: fc5cbb030b87613c9e15100118c3d4a064097c20
2019-05-22 09:55:51 -07:00
70ecddfd76 Some minor fix to unblock the Bert model quantization (#20787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20787

Set requires_grad=False for bias: this will block the jit tracing.
The as_type fix: The input tensor shape and output tensor shape will be different, which will trigger the assertion failure at https://fburl.com/0m8xy7tc.

Reviewed By: jamesr66a

Differential Revision: D15445092

fbshipit-source-id: 22da41a56ecb9ac092585d0cc1ff0658fb9d631b
2019-05-21 23:13:08 -07:00
a501e7d5be Add quant-dequant nodes for bias. (#20045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20045

This pass adds quant-dequant nodes for bias. It requires the
quant-dequant pass for activations and weights to have run first, since those
are needed to compute the qparams for bias.

Differential Revision: D15179141

fbshipit-source-id: 3aab9fceefcadc3fa42a4e802d9b1e18addad78a
2019-05-21 21:59:37 -07:00
c2d0e7316f Add DictType to Metadata (#20770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20770

Add dict type since it's part of the pytorch built-in system, and sparse features and text features will be converted to Dict

Reviewed By: pritamdamania87

Differential Revision: D15436255

fbshipit-source-id: 239adbd6a8f68be29020fe656d790f6872f1f0e9
2019-05-21 21:53:06 -07:00
70eb315da4 Use AT_INTERNAL_ASSERT in test_base (#20555)
Summary:
as title. We were using AT_ASSERT, which is newly deprecated. In this case, we do in fact want an internal assertion since this is used in testing code to describe expected behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20555

Differential Revision: D15362964

Pulled By: suo

fbshipit-source-id: 984bfe71a774571611f3bbd81767d3cdb878a6fd
2019-05-21 21:25:07 -07:00
4a85e7955c Rename FC to Linear in the test routine (#20716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20716

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15410823

fbshipit-source-id: e82fc241ee288b41304675cb087c0cdcd60d7148
2019-05-21 19:58:19 -07:00
77651615c8 fbgemm precision argument (#20790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20790

att

Reviewed By: jianyuh

Differential Revision: D15445903

fbshipit-source-id: fd338aea55e40eecc780be881e67417679e2ea35
2019-05-21 19:26:15 -07:00
c4a3b4d528 Split ATen/Parallel into interface and backend (#20057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20057
ghimport-source-id: c583f61bf661c994eb4d0625748a299e892a7246

Differential Revision: D15248618

Pulled By: ilia-cher

fbshipit-source-id: 060879266bc8616916fe220adef6ae6c0b076fbd
2019-05-21 19:15:47 -07:00
adbab82846 int_repr for different quantized types (#20656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20656

att

Reviewed By: zafartahirov

Differential Revision: D15398134

fbshipit-source-id: b02899d4ff33598416f65cf76b2ecc62adee243b
2019-05-21 17:57:42 -07:00
c1d6bcf301 Use SmallVector to allocate Compound operands inline. (#20762)
Summary:
Reduces load time for serialized ResNet-18 by 5.5%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20762

Differential Revision: D15437364

fbshipit-source-id: 2ba34dd229a1054553d0ee09f044ce1915377d78
2019-05-21 16:37:52 -07:00
c9da01194a Optimize pytorch layer_norm forward (#20345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20345

Separated from D15194600.
Optimize pytorch layer_norm op, part 1:
  optimize layer_norm_forward_cpu
  use Eigen Maps to speed up the reduction

Reviewed By: zheng-xq

Differential Revision: D15290608

fbshipit-source-id: cf2c208dfd6fbcbc4c69db3ed60278d9bee156b5
2019-05-21 15:59:49 -07:00
9cec8ae146 use tensoriterator instead of th for fill_ implementation. (#20719)
Summary:
Moves fill_ to aten as suggested in:
https://github.com/pytorch/pytorch/pull/20336#issuecomment-493260729

borrows from cpuhrsch's PR: https://github.com/pytorch/pytorch/pull/18876/files#diff-0d1178f1a4ce15aeb760d251974e6924

Co-authored-by: cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20719

Differential Revision: D15439420

Pulled By: nairbv

fbshipit-source-id: cbcc313cda61a528cecc4a28d601871565e6110c
2019-05-21 15:45:45 -07:00
7a0c6d528a Fix copy_transpose_valid check (#20759)
Summary:
Fixes #20755

(Was broken in #20685)

cc vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20759

Differential Revision: D15433712

Pulled By: colesbury

fbshipit-source-id: 29f612f7d4d7b73158d6f5dc1e46fd2f8fb09a2f
2019-05-21 15:37:37 -07:00
5acc664f9d make magic methods work with casts too (#20654)
Summary:
Previous implementation of magic methods extended from BuiltinOperators, but it should be able to work with other sugared values, such as casts.

I also considered making CastValues and BuiltinOperators extend from a MagicMethod superclass, and having them try the superclass's call before their own. However, not all builtin operators have corresponding magic methods, so I did it this way instead (although there are workarounds for that).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20654

Differential Revision: D15434469

Pulled By: eellison

fbshipit-source-id: 813fa00bf8b5b9ada46505075ebf984d8eee6aef
2019-05-21 14:23:06 -07:00
e6f22e1b89 Change Bias to QTensor with qint32(int32_t) (#20713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20713

As Title says.

Reviewed By: zafartahirov

Differential Revision: D15410734

fbshipit-source-id: c00f409278736cf9e3205f7d36dda1b96120f47d
2019-05-21 14:17:37 -07:00
b9a150ede0 Change Weight to QTensor with qint8(int8_t) (#20712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20712

As Title says.

Differential Revision: D15410696

fbshipit-source-id: 48147a79d8cc47a724eb473796a37a1c64f8e883
2019-05-21 14:17:34 -07:00
ac2314fdeb Fix a bug in quantize_linear (#20711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20711

For uint8_t, ```std::numeric_limits::digits``` returns 8;
For int8_t, ```std::numeric_limits::digits``` returns 7.

FBGEMM wants to get the ```qparams.precision``` to be always 8 for both int8_t and uint8_t.

Reviewed By: jerryzh168

Differential Revision: D15410695

fbshipit-source-id: 17dc3842d7c426947454c201bcb167b87b7301ce
2019-05-21 14:17:31 -07:00
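The `digits` mismatch above comes from the sign bit: `int8_t` has 7 value bits plus a sign bit, while `uint8_t` has all 8 as value bits. A quick sanity check of those widths (NumPy used here only for `iinfo`; this is an illustration, not the FBGEMM code):

```python
import numpy as np

# std::numeric_limits<int8_t>::digits == 7  -> max value 2**7 - 1
assert np.iinfo(np.int8).max == 2**7 - 1    # 127

# std::numeric_limits<uint8_t>::digits == 8 -> max value 2**8 - 1
assert np.iinfo(np.uint8).max == 2**8 - 1   # 255
```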
32803b52f6 Update Conda description in PyTorch README (#20726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20726

Edward says it doesn't actually provide compilers,
but it does provide dependencies, so let's mention that instead.

Reviewed By: ezyang

Differential Revision: D15423316

fbshipit-source-id: 9b384f88e5bf7a3d2c132508620c276b49e1569f
2019-05-21 14:12:30 -07:00
5d8879cf6d Auto-convert GPU arrays that support the __cuda_array_interface__ protocol (#20584)
Summary:
This PR implements auto-conversion of GPU arrays that support the `__cuda_array_interface__` protocol (fixes #15601).

If an object exposes the `__cuda_array_interface__` attribute, `torch.as_tensor()` and `torch.tensor()` will use the exposed device memory.

#### Zero-copy
When using `torch.as_tensor(..., device=D)` where `D` is the same device as the one used in `__cuda_array_interface__`.

#### Implicit copy
When using `torch.as_tensor(..., device=D)` where `D` is the CPU or another non-CUDA device.

#### Explicit copy
When using `torch.tensor()`.

#### Exception
When using `torch.as_tensor(..., device=D)` where `D` is a CUDA device not used in `__cuda_array_interface__`.

#### Lifetime
The `torch.as_tensor(obj)` tensor grabs a reference to `obj` so that the lifetime of `obj` exceeds that of the tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20584

Differential Revision: D15435610

Pulled By: ezyang

fbshipit-source-id: c423776ba2f2c073b902e0a0ce272d54e9005286
2019-05-21 14:06:46 -07:00
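The four cases listed in this commit message can be summarized as a small decision table. The helper below is purely illustrative (a hypothetical function, not PyTorch internals; device strings are simplified):

```python
def conversion_kind(api, interface_device, target_device):
    """Which copy behavior applies when ingesting an object exposing
    __cuda_array_interface__, per the cases in the commit message (sketch)."""
    if api == "torch.tensor":
        return "explicit copy"              # torch.tensor() always copies
    if target_device == interface_device:
        return "zero-copy"                  # same CUDA device: alias the memory
    if target_device.startswith("cuda"):
        # a different CUDA device than the one exposing the memory
        raise ValueError("cannot use memory from a different CUDA device")
    return "implicit copy"                  # CPU / non-CUDA target: copy over

print(conversion_kind("torch.as_tensor", "cuda:0", "cuda:0"))  # zero-copy
print(conversion_kind("torch.as_tensor", "cuda:0", "cpu"))     # implicit copy
```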
847d9c57d1 Improve the recommended citation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20768

Differential Revision: D15436734

Pulled By: ezyang

fbshipit-source-id: d073f3b76a60bd8edf1e7799a1bb153d04a09bb1
2019-05-21 13:51:56 -07:00
bb20956e3c Add support for CMake switches for VS 2019 (#20752)
Summary:
Appending `arch` to the generator name is not supported for VS starting from VS 2019.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20752

Differential Revision: D15436740

Pulled By: ezyang

fbshipit-source-id: 20057aae8f708d82619927bf2cb87dd1bc2df312
2019-05-21 13:46:39 -07:00
47dc65fe76 add str comparisons (#20761)
Summary:
add string comparisons
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20761

Differential Revision: D15434616

Pulled By: eellison

fbshipit-source-id: c00c7bac6308dbcc6a9e46b92421f49fb2d5a81c
2019-05-21 12:47:50 -07:00
cca923c481 Add dequantize_linear for JIT pass (#20107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20107

att

Reviewed By: nishantpdce

Differential Revision: D15202187

fbshipit-source-id: 7d6274a67fcca695c0425587f35046fecbc2ccdc
2019-05-21 12:26:48 -07:00
cc02a1af61 Throw error if multiple kernels registered (#20737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20737

If someone tries to register multiple kernels in the same .op() call, we're now throwing an error.

Differential Revision: D15425660

fbshipit-source-id: 6d2f1444da3e16a6a98863d847965c2aa211e046
2019-05-21 12:17:01 -07:00
f3d827f311 Hipify fb/quantize
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20725

Reviewed By: bddppq

Differential Revision: D15407710

fbshipit-source-id: e5fdeee7e2dffd43cfdd6fab6193eb8a80902c02
2019-05-21 10:51:36 -07:00
b5edeca39d Split cpu/gpu in caffe2/distributed + some clean up (#20674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20674

A few targets in caffe2/caffe2/distributed need to be split too, otherwise they won't compile. Also includes some cleanup and renames select_gpu_type to gpu_library_selector.

Differential Revision: D15406019

fbshipit-source-id: 6455ab885b248502b48d4c7565597e00fecfd547
2019-05-21 10:51:33 -07:00
d7cd2d7a8c compile with -fcolor-diagnostics (#20662)
Summary:
Let there be color!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20662

Differential Revision: D15434110

Pulled By: suo

fbshipit-source-id: a317ae72ad72e0b8249f55c9c8d31f420c78c040
2019-05-21 10:32:55 -07:00
c790f10e2d Fix missing cudatoolkit dependency in binary linux tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20732

Differential Revision: D15434025

Pulled By: pjh5

fbshipit-source-id: 74a5798d14b6e61cdcdc784c159294b87264d3de
2019-05-21 10:27:15 -07:00
e3970d66d4 Fixing upload_binary_htmls again
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20736

Differential Revision: D15433417

Pulled By: pjh5

fbshipit-source-id: 58964a341226b536be899855058422cb82aa054b
2019-05-21 10:16:08 -07:00
fac307a5cf Revert D15178352: [pt1][quant] Quantized Conv2d operator
Differential Revision:
D15178352

Original commit changeset: 2e5453283137

fbshipit-source-id: 73cf64c483eedbd41a047e7593c0c92bbd33008c
2019-05-21 09:59:57 -07:00
eca7fa35a4 Fix -Wattributes warning on older versions of gcc (#20587)
Summary:
building with cuda and gcc 4.8.5-28, we see many warnings like:

[893/1645] Building NVCC (Device) object caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THCUNN/caffe2_gpu_generated_ELU.cu.o
/home/bvaughan/repos/pytorch/c10/util/ArrayRef.h:277:48: warning: ‘deprecated’ attribute directive ignored [-Wattributes]
 using IntList C10_DEPRECATED_USING = ArrayRef<int64_t>;

This change prevents those warnings on the older compiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20587

Differential Revision: D15432749

Pulled By: nairbv

fbshipit-source-id: fd707afcbd6564f96617378d7cd6d62d941a052b
2019-05-21 09:47:40 -07:00
712c60f960 Fixing missing miniconda path in macos smokes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20727

Differential Revision: D15433407

Pulled By: pjh5

fbshipit-source-id: 2f5d4e1e49068e9597f7052deb70a287b91e482b
2019-05-21 09:47:37 -07:00
29b1b59449 Quantized Conv2d operator (#20064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20064

Initial implementation of quantized convolution operator using fbgemm.

Reviewed By: zafartahirov

Differential Revision: D15178352

fbshipit-source-id: 2e5453283137dc165e9a20164ffc138fa8caf88a
2019-05-21 09:13:42 -07:00
d73caca2a1 Add mandatory ScalarType nodes as input to the quant-dequant nodes. (#20468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20468

ScalarType node is mandatory for activations and parameters now.
This change inserts a ScalarType node for all the quant-dequant nodes. For activations, the current default value is at::ScalarType::Undefined; remove this and explicitly pass the at::ScalarType::QUint8 dtype.

Differential Revision: D15331600

fbshipit-source-id: 5b51e0b42e694bf409026af4783a12da6d7e234b
2019-05-20 20:01:17 -07:00
371cf109a3 Increase static tolerance for negative feature ids
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20671

Reviewed By: Wakeupbuddy

Differential Revision: D15401078

fbshipit-source-id: a946b1df6fae2851d60fadf32e57feb44ba95f38
2019-05-20 19:09:22 -07:00
0beecbdaad fix soft_nms_cpu call in BoxWithNMSLimit (#20738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20738

D15348610 introduces a bug of misaligned arguments.

Reviewed By: isameer

Differential Revision: D15425627

fbshipit-source-id: b6345863847426ae04eb31245d13f7fcb69d0355
2019-05-20 18:49:41 -07:00
fbdafdffa1 Move bucketize_op to open source
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19952

Reviewed By: houseroad

Differential Revision: D15145552

fbshipit-source-id: e0074c878a5c164324a9cc477783285dedffd188
2019-05-20 18:03:27 -07:00
320c38555e Refactor CUDA copy and general copy dispatch (#20685)
Summary:
Copy.cu goes from 308 to 190 lines of code. In general it uses the same
copy strategy: cudaMemcpyAsync, a pointwise kernel, or a copy
using temporary buffers. The pointwise kernel has slightly improved
performance when broadcasting due to faster index calculation.

This deletes "`s_copy_`", "`_s_copy_from`", and "`_copy_same_type_`". The only
entry-point now is "`copy_`".

A mini-benchmark is here:
https://gist.github.com/colesbury/706de1d4e8260afe046020988410b992

Before:
https://gist.github.com/colesbury/ab454b6fe3791bff420d7bcf8c041f18
After:
https://gist.github.com/colesbury/9024d242b56ab09a9ec985fa6d1620bc

Results were measured on 2.2 GHz Broadwell; no-turbo; one thread;
compiled with GCC 7.3.0. (Results are slower than typical usage due to
turbo being off.)

The only significant difference is in the CUDA [1024] -> [1024, 1024]
broadcasting copy which is ~25% faster. I don't expect a noticeable
difference in real programs.

CPU copy overhead is a tiny bit (~200 ns) faster, but I don't expect
anyone to notice that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20685

Differential Revision: D15414819

Pulled By: colesbury

fbshipit-source-id: d3c6e04a5020470e3bef15b1fc09503cae5df440
2019-05-20 17:09:44 -07:00
cf7ef5e631 Add onnxifi support for Int8FCDNNLowPPackedWeightBlob (#20564)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20564

Reviewed By: bddppq

Differential Revision: D15106712

fbshipit-source-id: 428db9c23cfd36ddedc8d79121fbbb3bb484c993
2019-05-20 16:57:11 -07:00
0bfc0eeef7 restore hidden visibility by default for Linux builds (#20461)
Summary:
Symbols are given hidden visibility by default on Linux to emulate the behavior on Windows.  This helps developers catch visibility issues in their streamlined Linux dev environment before being surprised, late in the process, by Windows errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20461

Reviewed By: kostmo

Differential Revision: D15410410

Pulled By: dzhulgakov

fbshipit-source-id: 1d684b5a9a80b692966a775c3f1c56b7c72ffc95
2019-05-20 16:49:37 -07:00
be1f83c350 Fix dll linkage for tensor type ids (#20547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20547

-

Differential Revision: D15359988

fbshipit-source-id: 680115a6b73f64c9b02f86eccb8feb799adc6c90
2019-05-20 16:25:09 -07:00
410c7210db Add save() to torch._C.Function (#20386)
Summary:
Fixes #20017

This wraps the `torch._C.Function` currently returned from `torch.jit.script` and `torch.jit.trace` in a `ScriptFunction` and `TracedFunction` respectively, both of which are just wrappers to hold the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20386

Pulled By: driazati

Differential Revision: D15403161

fbshipit-source-id: 94fb9f32929e62a00be6cf7512ea144ec9b91e0b
2019-05-20 16:19:51 -07:00
987f1ccf49 Add "ndim" property to tensor (#20565)
Summary:
For compatibility with numpy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20565

Differential Revision: D15374390

Pulled By: umanwizard

fbshipit-source-id: 4ab209a5fb27d8ba27ee7eb6b67b858ce2480594
2019-05-20 16:10:50 -07:00
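`ndim` carries the NumPy meaning — the number of dimensions — so with this change code written against NumPy arrays ports over to tensors. The NumPy semantics being matched, for reference:

```python
import numpy as np

x = np.zeros((2, 3, 4))
assert x.ndim == 3              # number of dimensions, same meaning as tensor.dim()
assert np.zeros(()).ndim == 0   # a 0-d (scalar) array has ndim == 0
```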
6ae99aa5bc onnx/caffe2 tests: Do not execute models with CPU-only operators on GPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20720

Reviewed By: bddppq

Differential Revision: D15422322

Pulled By: houseroad

fbshipit-source-id: c79795434157ff5f0a7b2774fd40edc71cf35ba7
2019-05-20 16:04:45 -07:00
be33434d85 Unify the addQuantDequantNode api for inputs and outputs from quant nodes (#20677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20677

With the new changes in IR, it is possible to insert nodes after param
nodes in the graph. Thus we do not need two separate methods for inserting q-dq
nodes at the inputs and outputs of quantizable nodes.

Differential Revision: D15406354

fbshipit-source-id: 1963762f434fd82877fa76a272e8520c342b6069
2019-05-20 15:27:45 -07:00
cf548ba683 De-deprecate old list and dict APIs (#20709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20709

- Remove ArrayRef based API. This is neither the old nor the planned new API.
- De-deprecate kernels based on std::vector and std::unordered_map. We don't have the Dict/List based API figured out entirely yet, so we shouldn't push people towards using them.
  std::vector and std::unordered_map will get deprecated again once we figured out List/Dict.

Reviewed By: dzhulgakov

Differential Revision: D15417025

fbshipit-source-id: bfbb33c762e43487bb499bc8cc36d515e678f8fc
2019-05-20 13:53:00 -07:00
c062175803 Remove unused var (ws_) and use vars in undefined case for compile (#20667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20667

Compilation errors:
```
xplat/caffe2/caffe2/utils/signal_handler.h:31:10: error: private field 'SIGINT_action_' is not used [-Werror,-Wunused-private-field]
  Action SIGINT_action_;
         ^
xplat/caffe2/caffe2/utils/signal_handler.h:32:10: error: private field 'SIGHUP_action_' is not used [-Werror,-Wunused-private-field]
  Action SIGHUP_action_;
         ^
xplat/caffe2/caffe2/utils/signal_handler.h:33:17: error: private field 'my_sigint_count_' is not used [-Werror,-Wunused-private-field]
  unsigned long my_sigint_count_;
                ^
xplat/caffe2/caffe2/utils/signal_handler.h:34:17: error: private field 'my_sighup_count_' is not used [-Werror,-Wunused-private-field]
  unsigned long my_sighup_count_;
                ^
4 errors generated.

xplat/caffe2/caffe2/share/fb/stylizer/median_blur_ops.cc:593:14: error: private field 'ws_' is not used [-Werror,-Wunused-private-field]
  Workspace* ws_;
             ^
1 error generated.
```

Reviewed By: bwasti

Differential Revision: D15402928

fbshipit-source-id: 5b98499850aa659fd37ab8e7f2e75166787b8129
2019-05-20 13:52:57 -07:00
af6eea9391 Add the support of feature store example in pytorch model in fblearner (#20040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20040

Add end-to-end support for the feature store example in the fblearner pytorch predictor.

Reviewed By: dzhulgakov

Differential Revision: D15177897

fbshipit-source-id: 0f6df8b064eb9844fc9ddae61e978d6574c22916
2019-05-20 12:58:27 -07:00
9fbce974c9 torch::jit::RegisterOperators forwards to c10::RegisterOperators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20383

Reviewed By: zdevito

Differential Revision: D15300937

fbshipit-source-id: 740fe323fc0945759651116ae61aff4d36319d73
2019-05-20 12:41:04 -07:00
74bdcd44c4 Remove tab. (#20715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20715
ghimport-source-id: 600e244581b37152d86614cca6c9fb5fee6cdcde

Differential Revision: D15417984

Pulled By: ezyang

fbshipit-source-id: 939425de1a95ecc3798384e121b12faaba3a27b8
2019-05-20 11:57:18 -07:00
10445c0404 Finish removal of AT_CHECK, officially deprecate the macro. (#20600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20600

All future uses of AT_CHECK will fail our CI.

Reviewed By: jerryzh168

Differential Revision: D15375397

fbshipit-source-id: 5582664d6c7c4f1a56ae45647eb1bca49fed2866
2019-05-20 11:57:15 -07:00
c1fa449763 Break reference cycle in load_state_dict (#20397)
Summary:
load_state_dict includes a recursive inner function `load` that captures
Tensors through the closed-over variable `state_dict`. Because it's
recursive, it also captures itself leading to a reference cycle.

This breaks the reference cycle so that any Tensors in state_dict can be
collected immediately instead of waiting until the next GC cycle.

Alternatively, we could have passed `state_dict` and `metadata` as
arguments to load to prevent capture of Tensors. (That would still
result in cyclic garbage, but not any cyclic garbage of Tensors).

See:
https://github.com/pytorch/pytorch/issues/20199#issuecomment-491089004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20397

Differential Revision: D15414834

Pulled By: colesbury

fbshipit-source-id: 4c2275a08b2d8043deb3779db28be03bda15872d
2019-05-20 11:46:00 -07:00
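The cycle described above is easy to reproduce in isolation: a recursive inner function's closure cell points back at the function itself, so everything it closes over survives until the next GC pass. A minimal sketch of the problem and the fix (plain CPython semantics, not the actual `load_state_dict` code):

```python
import gc
import weakref

class Payload:  # stand-in for a tensor held by state_dict
    pass

def with_cycle():
    payload = Payload()
    ref = weakref.ref(payload)
    def load():
        _ = payload
        return load          # self-reference -> cycle through the closure cell
    load()
    return ref               # payload stays alive until gc.collect() runs

def without_cycle():
    payload = Payload()
    ref = weakref.ref(payload)
    def load():
        _ = payload
        return load
    load()
    load = None              # the fix: break the cycle explicitly
    return ref               # payload is freed immediately by refcounting

gc.disable()
try:
    assert with_cycle()() is not None     # kept alive by the cycle
    assert without_cycle()() is None      # collected right away
finally:
    gc.enable()
    gc.collect()
```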
796e359601 Refactor builtin ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20666

Pulled By: driazati

Differential Revision: D15404419

fbshipit-source-id: c52bfa408c3caabb0a14779308686cf27fed349b
2019-05-20 11:21:18 -07:00
c0a2a3b22b Add a new method SummaryWriter.flush() (#20607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20607

Add a new method SummaryWriter.flush()  that iterates through all of the FileWriters and flushes them

Reviewed By: orionr

Differential Revision: D15380124

fbshipit-source-id: 1975f3f61c5ae3754552bfdb23f2cd78f687d19f
2019-05-20 11:05:12 -07:00
f215db9b92 InsertGuards pass
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20438

Differential Revision: D15342655

Pulled By: Krovatkin

fbshipit-source-id: a193e582d621b99f848573fb4478e7b62265dc9f
2019-05-20 10:49:19 -07:00
036a159fb9 Audit AT_ASSERT sites in TensorImpl.h; doc improvements (#20649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20649

I went through every occurrence of AT_ASSERT in this file and
thought about whether or not it should be TORCH_INTERNAL_ASSERT
or TORCH_CHECK.  I think I did a good job at it.  Some thoughts:

- In order to decide if a check is "internal" or not, we must
  think about where the separation between userspace and our internals
  are.  I think any code that utilizes the PyTorch or Caffe2 C++ frontends
  count as userspace. An important corollary is that the majority of operator
  code "counts" as userspace, even though it lives in our repository.  This
  is in line with TCB (trusted computing base) thinking: you want the TCB to
  be as small as possible, and because we have a *lot* of operator
  implementations, they should not count as TCB.

- The primary test I applied when considering an AT_ASSERT was whether or
  not I could trigger this error by just making method calls on caffe2::Tensor
  or at::Tensor.  If I could, that made it a TORCH_CHECK.  This covers most
  of the misapplications of TORCH_INTERNAL_ASSERT.  One place I didn't
  do this was the "is variable" checks; I think you have to work a bit
  harder to trigger this case, and userspace code is not mixing up
  Variables and Tensors.

- I updated the docs for device_opt_, explaining when it could be nullopt.
  (The nullopt checks here are TORCH_CHECK, because you can trigger them
  by taking an undefined tensor and poking the methods.)

Differential Revision: D15395576

fbshipit-source-id: 1c51b396012e7d949fbb4258092cf80e5e6f851b
2019-05-20 09:54:37 -07:00
71260b98e2 Fixed histc return type for CUDA (#20369)
Summary:
Fixing reported [issue](https://github.com/pytorch/pytorch/issues/20208).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20369

Reviewed By: zou3519

Differential Revision: D15300959

Pulled By: izdeby

fbshipit-source-id: 219692f99a66ea433112dfc226132eb6867122cf
2019-05-20 08:08:28 -07:00
d0c742134d #20028 (#20696)
Summary:
Hi, ezyang
Sorry to trouble you.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20696

Differential Revision: D15413694

Pulled By: ezyang

fbshipit-source-id: 1c19d18e00c3a66a52bb9230aa25d7530f6e659c
2019-05-20 07:51:55 -07:00
cda9e995e2 Benchmark repeat op. (#20016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20016

PT's repeat op benchmark

Reviewed By: zheng-xq

Differential Revision: D15166941

fbshipit-source-id: b1ed7af790460456210b60bfb4e44a08657e9612
2019-05-20 07:34:54 -07:00
8acaa286b7 Make CUDACachingAllocator::recordStream() a no-op on null ptrs (#20658)
Summary:
Fixes #20651

Communication collectives in `torch.distributed` call `CUDACachingAllocator::recordStream()` on input and output tensors to prevent their memory blocks from being freed too early. `CUDACachingAllocator` uses the tensor's data pointer to track memory blocks, and does not accept null pointers. However, an empty tensor's `storage().data()` might be null. In this case, as there is no associated memory block for the empty tensor, it should be fine to make `recordStream()` a no-op.

Tests only cover `broadcast` empty tensors for GLOO backend, because GLOO does not support empty inputs (facebookincubator/gloo/issues/179). It can be addressed in either `ProcessGroupGloo` or GLOO itself. Will add more tests when that gap is filled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20658

Differential Revision: D15399371

Pulled By: mrshenli

fbshipit-source-id: d29ebd1c72fddae49531f32695f81b89e42e5a4d
2019-05-20 07:13:51 -07:00
071971476d Fix Binomial overflow when logits are large (#20679)
Summary:
This PR fixes #17843. In addition (tested locally), it still maintains the continuity of log_prob, which was addressed in https://github.com/pytorch/pytorch/pull/15962

cc neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20679

Differential Revision: D15413311

Pulled By: ezyang

fbshipit-source-id: 4fc0ca755ae6a85aa7deb2206dab675f82f9aa25
2019-05-20 06:52:29 -07:00
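The overflow class being fixed here is the classic one in `log(1 + exp(x))`-style terms of the Binomial log-prob: `exp(x)` overflows for large `x` even though the result is finite. The standard stabilization looks like the following (a sketch of the general technique, not the exact PyTorch patch):

```python
import math

def log1p_exp(x):
    """Numerically stable log(1 + exp(x)) (a.k.a. softplus)."""
    if x > 0:
        # log(1 + exp(x)) = x + log(1 + exp(-x)); exp(-x) cannot overflow here
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

assert abs(log1p_exp(0.0) - math.log(2.0)) < 1e-12
assert log1p_exp(1000.0) == 1000.0   # the naive form would overflow to inf
```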
9b1dbffba5 Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
5835165ce3 Add get/set_num_interop_threads into torch.h include (#20659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20659
ghimport-source-id: 4858d03a9f89c613f64901c3430a7b212f76eb95

Reviewed By: dzhulgakov

Differential Revision: D15399780

Pulled By: ilia-cher

fbshipit-source-id: c1c3cb628c5ee664468f9d181bcd76a5105a89fd
2019-05-20 00:34:59 -07:00
4598729399 better handling of getenv 2019-05-19 23:19:52 -07:00
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
7b9ee598d6 separate option for FE_OVERFLOW (#20476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20476

There are overflow exceptions raised for legitimate computations, e.g.
for large negative x, sigmoid(x) = 1 / (1 + exp(-x)) = 1 / (1 + inf) = 0.
This diff separates out the option for FE_OVERFLOW to make the caffe2_operator_throw_if_fp_exceptions=1 option less noisy.

Reviewed By: hx89

Differential Revision: D15332947

fbshipit-source-id: 9148233f5b84551a0900f0557ba22f2b1508ae0c
2019-05-19 16:05:27 -07:00
dd050b7b91 Replace AT_ASSERT with TORCH_INTERNAL_ASSERT/TORCH_CHECK (#20668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20668

AT_ASSERT is deprecated. Replace it wil the new exception handling macro.

Differential Revision: D15404401

fbshipit-source-id: 7ae9f5471f1f8ad95e7bb835380e098c00313d9c
2019-05-18 01:48:11 -07:00
96a1f7695f Support plot norm of specific embeddings of a LUT in diagnose_options (#19809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19809

as title

Reviewed By: chocjy

Differential Revision: D15100505

fbshipit-source-id: cba290fd4317b260e2bf1689b9ca215d3d19a9e2
2019-05-18 01:08:45 -07:00
2308257483 delete broadcasting ops from shape analysis resize aliasing (#20661)
Summary:
According to https://pytorch.org/docs/stable/notes/broadcasting.html, in-place operations do not allow the in-place tensor to change shape as a result of the broadcast. Therefore our shape analysis can keep the shape information on the inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20661

Differential Revision: D15406477

Pulled By: wanchaol

fbshipit-source-id: 8ab60e783292f2fe26e5fdecfb64bec43bca6826
2019-05-18 00:07:32 -07:00
e74869473d De-deprecate parts of the legacy API (#20561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20561

We previously planned to deprecate the direct passing of a kernel function or lambda to the op() call, e.g.

    static auto registry = RegisterOperators().op("my::op", &func);

and push users towards the options based API:

    static auto registry = RegisterOperators().op("my::op", RegisterOperators::options().kernel<decltype(func), &func>());

because that has a slightly lower performance overhead when calling the kernel.

However, that overhead is negligible for all but exotic use cases, so there's no reason to push users towards a more verbose API.
This diff removes the deprecation warning from that API.

However, if you use the API together with deprecated types like std::unordered_map, you will now get a deprecation warning there.

Reviewed By: zdevito

Differential Revision: D15364271

fbshipit-source-id: 56dae0c5870bbab16ad19ba5178f4bea9eafed9f
2019-05-17 20:54:45 -07:00
cb6be42403 Options based registration API (#20514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20514

Change API from

    static auto registry = c10::RegisterOperators()
      .op("my::op",
        c10::kernel(...),
        c10::dispatchKey(...)
      );

to

    static auto registry = c10::RegisterOperators()
      .op("my::op", c10::RegisterOperators::options()
        .kernel(...)
        .dispatchKey(...)
      );

because this allows better discoverability. People looking for the available options will find them more easily, and IDE autocompletion will work better.

Reviewed By: zdevito

Differential Revision: D15346348

fbshipit-source-id: 4b74a33b75c2b9cda4a903639fb7abd2c7cff167
2019-05-17 20:54:42 -07:00
85fad0597c Add qint8 type (int8_t) (#19984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19984

Add qint8 for QTensor, with underlying type of int8_t

Reviewed By: jianyuh

Differential Revision: D15150715

fbshipit-source-id: 57580f599d46f9323af5ce462dbbc464b25e40d7
2019-05-17 20:35:05 -07:00
986c9eb537 Add a pybind for Module::get_functions. (#20594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20594
ghimport-source-id: 96f8046fd3aed49d8b29312ce2f8d7e0ea5e5787

Differential Revision: D15375132

Pulled By: ZolotukhinM

fbshipit-source-id: 7ad87e29d965e459c49be1be8a08f86ed2a0b4db
2019-05-17 20:00:28 -07:00
2af5911a95 Modify the Insert quant-dequant test cases to look for q-dq pattern (#20672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20672

The current test case looks for the q->int_repr->dq pattern and also for constant nodes.
The prim::Constant nodes are not guaranteed to be present at the same point in the graph,
so we modify the test case to look only for the q->int_repr->dq nodes.

Differential Revision: D15405606

fbshipit-source-id: 2086ffb5bbd328d2a9a55f4c2a2de342575194d3
2019-05-17 18:46:27 -07:00
e42665cf39 Some small performance fixes for c10 dispatcher (#20472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20472
ghimport-source-id: d118bf8d48eea3faf241a7288fcad1bb6a5f051f

Differential Revision: D15332284

Pulled By: li-roy

fbshipit-source-id: a8d9e50a440a7ad3ee730f70c0fcae06ae848cbd
2019-05-17 17:35:56 -07:00
d9dcfacd9e Improve CPUAllocator OOM message (#20618)
Summary:
Spotted while debugging some problem

Before
```
>>> torch.empty(10**15)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0
```

After
```
>>> torch.empty(10**15)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 4000000000000000 bytes. Error code 12 (Cannot allocate memory)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20618

Reviewed By: ezyang

Differential Revision: D15390400

Pulled By: dzhulgakov

fbshipit-source-id: 31f448303e4bd5f8c2bad8ca0f05bcece22a4b5e
2019-05-17 16:14:49 -07:00
e79610c0df Fix missing env for update_binary_size job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20645

Differential Revision: D15400130

Pulled By: pjh5

fbshipit-source-id: d2a07fc5608ab3a96026b60e16bd12add8e0c9d4
2019-05-17 15:46:50 -07:00
c267d0c869 Misc error message improvements (#19369)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19369 [jit] Misc error message improvements**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19369

Pulled By: driazati

Differential Revision: D14982975

fbshipit-source-id: 18dae3d56c3a01280f66223a69e2e52d15d3b651
2019-05-17 15:30:58 -07:00
3be2f7c8e6 SubgraphMatcher: add attributes support. (#20602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20602
ghimport-source-id: fa3225bb5d70729d6a1bcf88295d031707d986a1

Differential Revision: D15377635

Pulled By: ZolotukhinM

fbshipit-source-id: ebd385e7b9436429d0ad76ed3d932925a29f6456
2019-05-17 15:10:02 -07:00
0b9b929d14 Use python type string for user facing error msgs (#20657)
Summary:
Otherwise users see something like (Tensor, Tensor)? and don't know what the ? means.

First commit is formatting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20657

Differential Revision: D15400225

Pulled By: eellison

fbshipit-source-id: cf826790bf2ddafd34f6d5c144526cad9904770b
2019-05-17 15:04:53 -07:00
c819d76789 Add list(string) (#20617)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#20617 [jit] Add list(string)**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20617

Pulled By: driazati

Differential Revision: D15397739

fbshipit-source-id: 55a1e54f0ed2d9fd3ce44a8bb64bb4ba63181c9a
2019-05-17 14:57:20 -07:00
a543586bff Add _enable_recursive_script to try to script all Python functions (#19578)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19578 [jit] Try to script all Python functions**

This adds the `torch.jit._enable_recursive_script` context manager, which will try to compile any Python functions it sees. It's hidden behind an internal context manager for now since it's incomplete (doesn't work for script_methods/Python submodules). If it can't compile the Python function it outputs an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19578

Pulled By: driazati

Differential Revision: D15386727

fbshipit-source-id: 4e308f67677b8e9fccfc525a91bb2f4585062048
2019-05-17 14:50:45 -07:00
cd28ff5395 Add support for __getstate__/__setstate__ on module (#20242)
Summary:
Adds support for `__getstate__` and `__setstate__` on modules that are called as part of export (`torch.save()`) and import (`torch.jit.load`).
* `__getstate__` and `__setstate__` must be TorchScript functions with the signatures `() -> T` and `(T) -> None` respectively
* The results of `__getstate__` are stored using the pickler in `states.pkl`, with one for each module in definition order (`__getstate__` returns `None` by default if an implementation is not provided)
    * This prevents sharing between `__getstate__` and attributes, but this should be fine since their use is mostly unrelated (attributes are for storing values to be used in script methods, `__getstate__` for running arbitrary computations during import)

Follow up
* Somehow replacing `__getstate__`/`__setstate__` with a `ScriptMethodStub` makes `MyScriptModule().__getstate__()` call `ScriptModule.__getstate__()` when used in Python. This should be fixed so semantics in Python are preserved, but it doesn't affect the typical usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20242

Pulled By: driazati

Differential Revision: D15287161

fbshipit-source-id: b3f5f33ab74a21a89e6d15460af63aff75cab2d8
2019-05-17 14:43:14 -07:00
5f14ef8cc1 Split out gpu/cpu targets based on gpu_library_targets (#20633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20633

Merge the c2_gpu and is_amd_build logic in targets files.

Reviewed By: dzhulgakov

Differential Revision: D15176621

fbshipit-source-id: 9185b394ffcb305fd8d94dc7c7c92780bf10a511
2019-05-17 13:07:10 -07:00
79c5dc313c Remove unnecessary format literals from error message.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20646

Differential Revision: D15394795

fbshipit-source-id: 8033cf03341244b2b6a119e3c59f48ee6fe959cc
2019-05-17 10:45:40 -07:00
26dfeffacd Remove TH/THC link for single matrix inverse (#20534)
Summary:
- Earlier, we had to use the legacy implementation of `getri` for single matrix inverse from TH and THC
- Now, this has been moved to ATen

Changelog:
- Move single matrix inverse implementation to ATen
- Remove unused code in TH and THC resulting from the change
- Minor modifications made to single matrix CPU function implementations in ATen to avoid redundancy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20534

Differential Revision: D15393383

Pulled By: ezyang

fbshipit-source-id: 81972111cd9757d15f1d634f294c93fd0f35636c
2019-05-17 10:03:36 -07:00
839a69f587 Revert D15393514: [pytorch][PR] Refine CosineAnnealingWarmRestarts doc for issue #20028
Differential Revision:
D15393514

Original commit changeset: 03f270a577fc

fbshipit-source-id: 3633f4e9916bdadf018288a64df89078b14af563
2019-05-17 09:55:56 -07:00
8c9f4c560a Add matmul optimization for the case A.ndim <= 2 && B.ndim >= 3 (#20448)
Summary:
This addresses #18862.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20448

Differential Revision: D15393465

Pulled By: ezyang

fbshipit-source-id: 87e5b0ed8253ea00365f420d98ac96dd4e934028
2019-05-17 09:44:26 -07:00
a212a5b97a ir.cpp, module.cpp: clang-format. (#20592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20592
ghimport-source-id: 98dc62a9595c6b94706960274ce9beebacc9ca00

Differential Revision: D15375131

Pulled By: ZolotukhinM

fbshipit-source-id: 7edbb14a337d1646b48756eef4163846648cbd93
2019-05-17 09:21:32 -07:00
b90790ab1b Don't split 256-bit AVX2 load/store intrinsics (#20609)
Summary:
Recent versions of GCC split unaligned load and store intrinsics into
two 128-bit instructions. On old processors (Sandy Bridge) this was a
bit faster for unaligned data, but a bit slower for aligned data. On new
processors (Intel Haswell+, recent AMD) splitting loads is slower on
both aligned and unaligned data.

Clang, MSVC, and ICC do not split unaligned load and store intrinsics.

There's a good explanation here:
https://stackoverflow.com/questions/52626726/why-doesnt-gcc-resolve-mm256-loadu-pd-as-single-vmovupd#tab-top

Splitting load and store intrinsics makes no sense in our AVX2
configuration because the CPUs that support AVX2 instructions are the
same CPUs where splitting is disadvantageous for all data alignments.

Note that this doesn't change the AVX configuration (used by CPUs that
support AVX but not AVX2). It's possible this would be beneficial for
that configuration too (our data is usually 32-byte aligned), but I'd
prefer the conservative change for now.

torch.add generated assembly (hot loop) (GCC 7.3.0)
before:
https://gist.github.com/colesbury/066376537bccd514daf8fe4ab54d8295

after:
https://gist.github.com/colesbury/8b4b948145001d44b225c51d2428bb91

Timing of `torch.add(x, y, out=z)` for size 10240 (1 thread, Broadwell,
no turbo):
before: 7.35 us after: 6.39 us

(Take the torch.add timings with a grain of salt. The difference in timings
is much larger than I would expect.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20609

Differential Revision: D15385800

Pulled By: colesbury

fbshipit-source-id: 66415b148a3b19360b9de9881af594ab46547b6f
2019-05-17 09:16:17 -07:00
000d73ccde fix WAR race (#20182)
Summary:
This race was flagged by racecheck.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20182

Differential Revision: D15393536

Pulled By: ezyang

fbshipit-source-id: ad4849c9fb2c8feb966be1c4ca0dadd7360f58fe
2019-05-17 09:06:52 -07:00
3c69c9a7fe Refine CosineAnnealingWarmRestarts doc for issue #20028 (#20267)
Summary:
Fixes #20028
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20267

Differential Revision: D15393514

Pulled By: ezyang

fbshipit-source-id: 03f270a577fc3e0414d3f07d97512a409b08f7cd
2019-05-17 09:02:28 -07:00
cfb87c1022 Update documentation for CTCLoss (#20422)
Summary:
Change `Inputs` to `Shape` to unify the format of the CTCLoss class docs, and add the type of `Output` under `Shape`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20422

Differential Revision: D15393484

Pulled By: ezyang

fbshipit-source-id: 5b49647f9740de77db49a566fa2de74fcecd9110
2019-05-17 09:02:25 -07:00
35e0015c70 Export sign onnx operator (#20470)
Summary:
A trivial commit that adds support for exporting the sign operator
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20470

Differential Revision: D15393446

Pulled By: ezyang

fbshipit-source-id: 12fb1c147d016205abf814907d667f7d8b074ae1
2019-05-17 08:57:22 -07:00
4e551a7edb Make C10_NODISCARD macro more portable for nvcc+clang. (#20324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20324
ghimport-source-id: e51181c82f87c946b5ffcb87b0ad71a056cb4659

Differential Revision: D15359317

Pulled By: ezyang

fbshipit-source-id: d88798f13a61c74456641ddec8250c08ce8af240
2019-05-17 08:57:19 -07:00
690efa5220 Remove checks for CUDA 8 in LU-based tests (#20482)
Summary:
CUDA 8 is no longer supported and has been removed from CI, so these checks are irrelevant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20482

Differential Revision: D15393438

Pulled By: ezyang

fbshipit-source-id: ac0979bf660b3314eec502c745e34ce4940bda0e
2019-05-17 08:51:56 -07:00
110ed511a4 Make check-doxygen.sh output more interpretable. (#20362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20362
ghimport-source-id: ac791884dc6d3954f69d8fc997b2b561f435e0e7

Differential Revision: D15375139

Pulled By: ezyang

fbshipit-source-id: c8aa0f991430269090e068f828810bae7aa39a07
2019-05-17 08:47:11 -07:00
1136ad59f9 Enable simd and loop vectorizer with MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20530

Differential Revision: D15392676

Pulled By: ezyang

fbshipit-source-id: c8fda0c7835127f81adf55016223bb4dc14ff40a
2019-05-17 08:38:56 -07:00
fa4ca4e70e Emphasize all DDP forward() outputs must participate in computing loss (#20586)
Summary:
CC borguz chenyangyu1988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20586

Reviewed By: ezyang

Differential Revision: D15373674

Pulled By: mrshenli

fbshipit-source-id: b986918b3592616a9bcc88fba1b8fd53016f68d7
2019-05-17 07:35:49 -07:00
c941abbc0a Fix upsample kernel launch / reorder arguments (#20505)
Summary:
this is a follow up for https://github.com/pytorch/pytorch/pull/19630
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20505

Differential Revision: D15392706

Pulled By: ezyang

fbshipit-source-id: 5a8a7aacdbcf740508baf2b6e0c081c4e5a0390f
2019-05-17 07:30:50 -07:00
3bc0bd9534 Fix caffe2 build failure on Windows (#20574)
Summary:
Fixes #20568.
Looks like CMake is passing `/MD` when we call `add_library`. We need to fix these flags for C source files too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20574

Differential Revision: D15392682

Pulled By: ezyang

fbshipit-source-id: c92034d8725fcec48fd7db6cf5322868e956dc6b
2019-05-17 07:21:42 -07:00
4c806a9e8a Allow tuples for scale_factor argument in nn.Upsample (#20581)
Summary:
Fixes #20523 .

nn.Upsample was unable to accept tuple inputs for the scale_factor argument due to direct casting to float, which was done in #17732.
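A rough sketch of the intended behavior (a hypothetical helper, not the actual nn.Upsample code): with a tuple scale_factor, each spatial dimension is scaled by its own factor.

```python
import math

def upsample_output_size(spatial_size, scale_factor):
    # Hypothetical helper: scale_factor may be a single number or a
    # tuple with one entry per spatial dimension (the fixed behavior).
    if not isinstance(scale_factor, (tuple, list)):
        scale_factor = (scale_factor,) * len(spatial_size)
    assert len(scale_factor) == len(spatial_size)
    return tuple(int(math.floor(s * f))
                 for s, f in zip(spatial_size, scale_factor))

print(upsample_output_size((4, 6), (2, 3)))  # (8, 18)
```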
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20581

Differential Revision: D15392622

Pulled By: ezyang

fbshipit-source-id: b56ba8197a5bbf8891bc7e1bebf5cad63dcab04d
2019-05-17 07:14:18 -07:00
409200df59 Move inter-op settings into ATen/Parallel (#20050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20050
ghimport-source-id: cc102bab8abf3e56c099245976786317ed63ea14

Differential Revision: D15248576

Pulled By: ilia-cher

fbshipit-source-id: 55ddcb7af387ddfc68a42ac7167de07ea648e249
2019-05-17 03:12:02 -07:00
36d3398aa5 Clang-format ImageInputOp (#20441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20441

This op is fairly complex and the fact that it isn't formatted
correctly makes things that much harder to reason about. Clean it up.

Reviewed By: dreiss

Differential Revision: D15220006

fbshipit-source-id: 30632d8bdbf15f96e73d8b6c96c5f29c052e6e7c
2019-05-16 23:00:09 -07:00
ea9c6e7581 eliminate FE_INVALID in unit test (#20502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20502

Following D15307410, remove more floating-point exceptions in unit tests

Reviewed By: hx89

Differential Revision: D15340930

fbshipit-source-id: 269fc75e0800bc9d39126767a0f3ca15cd8b0cad
2019-05-16 21:55:28 -07:00
e4c7f59fbc Shallow-copy indices and values in sparse tensor ctor (#20614)
Summary:
(Reopens https://github.com/pytorch/pytorch/pull/20330 and fixes test error.)

After the Variable/Tensor merge, there is no guarantee that `indices` and `values` passed into the sparse tensor constructor don't contain AutogradMeta. However, we want to maintain the existing invariant that `indices_` and `values_` of a sparse tensor don't contain AutogradMeta, and to achieve this we need to do a shallow copy in the sparse tensor constructor.

Note that this is BC-breaking for code that changes the sizes / strides of the indices or values tensor after it's used to create a sparse tensor. In current master, such changes will be reflected in the sparse tensor and break sparse tensor invariants. After this PR, those changes will not be reflected in the sparse tensor, and thus the sparse tensor invariants are always preserved. Specifically, running in-place size/stride-changing ops such as `resize_` / `resize_as_` / `as_strided_` / `set_` / `transpose_` on the original values tensor will not update the sparse tensor's `values_`. For example:
```python
# Calling resize_ on non-requires-grad value tensor
i2 = torch.zeros([1, 1])
v2 = torch.ones([1, 2, 3])
t2 = torch.sparse_coo_tensor(i2, v2, torch.Size([2, 2, 3]))
v2.resize_(4, 5)
t2.coalesce().values().size()
# On current master, this throws "indices and values must have same nnz, but got nnz from indices: 1, nnz from values: 4", because resizing the original value tensor affects `values_` of the sparse tensor.
# After this PR, this prints "torch.Size([1, 2, 3])", which means resizing the original value tensor doesn't affect `values_` of the sparse tensor.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20614

Differential Revision: D15385811

Pulled By: yf225

fbshipit-source-id: e963fcf5e4097f8c881b56145f408565d97cf5c1
2019-05-16 18:35:05 -07:00
3c86d597c4 update legacy plus one for mpscnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20554

Reviewed By: jerryzh168

Differential Revision: D15362378

fbshipit-source-id: 070cd8314257386036dca89167c738c6602b3f33
2019-05-16 18:17:18 -07:00
8bdbd59d0c handle box plus one for gpu generate_proposals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20553

Reviewed By: newstzpz

Differential Revision: D15362108

fbshipit-source-id: 53b1ef132288855f8977748442bfe5e5806c6c6e
2019-05-16 18:17:15 -07:00
373e6a78bf make box plus one a legacy argument in detection ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20550

Reviewed By: newstzpz

Differential Revision: D15348610

fbshipit-source-id: 12b1e119e9bc9191ba9f2aa6d695ef215780c349
2019-05-16 18:17:12 -07:00
220e6894c5 Rename qint8 data type (#19932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19932

In preparation to add int8_t data type for QTensor

Reviewed By: zafartahirov

Differential Revision: D15137838

fbshipit-source-id: 59462c36d6fc5982986d4196bf3f32f49bb294d7
2019-05-16 18:09:28 -07:00
980982ac09 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: d0158f7e77a915ffbc28c10de8864d2ae9b24e6f
2019-05-16 16:06:55 -07:00
2ddf126b96 Revert D15373683: [pytorch][PR] [BC-breaking] Shallow-copy indices and values in sparse tensor ctor
Differential Revision:
D15373683

Original commit changeset: 32e7275d7121

fbshipit-source-id: ed1786ee9ffa11f7c14c9cd10be6db48285dc57a
2019-05-16 15:22:48 -07:00
4f02321a9a Shallow-copy indices and values in sparse tensor ctor (#20330)
Summary:
After the Variable/Tensor merge, there is no guarantee that `indices` and `values` passed into the sparse tensor constructor don't contain AutogradMeta. However, we want to maintain the existing invariant that `indices_` and `values_` of a sparse tensor don't contain AutogradMeta, and to achieve this we need to do a shallow copy in the sparse tensor constructor.

Note that this is BC-breaking for code that changes the sizes / strides of the indices or values tensor after it's used to create a sparse tensor. In current master, such changes will be reflected in the sparse tensor and break sparse tensor invariants. After this PR, those changes will not be reflected in the sparse tensor, and thus the sparse tensor invariants are always preserved. Specifically, running in-place size/stride-changing ops such as `resize_` / `resize_as_` / `as_strided_` / `set_` / `transpose_` on the original values tensor will not update the sparse tensor's `values_`. For example:
```python
# Calling resize_ on non-requires-grad value tensor
i2 = torch.zeros([1, 1])
v2 = torch.ones([1, 2, 3])
t2 = torch.sparse_coo_tensor(i2, v2, torch.Size([2, 2, 3]))
v2.resize_(4, 5)
t2.coalesce().values().size()
# On current master, this throws "indices and values must have same nnz, but got nnz from indices: 1, nnz from values: 4", because resizing the original value tensor affects `values_` of the sparse tensor.
# After this PR, this prints "torch.Size([1, 2, 3])", which means resizing the original value tensor doesn't affect `values_` of the sparse tensor.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20330

Differential Revision: D15373683

Pulled By: yf225

fbshipit-source-id: 32e7275d7121e17937c7cc258e8a60bb0848ff25
2019-05-16 15:04:23 -07:00
21ef4cc615 Improve bmm performance on CPU by applying TensorAccessor (#20266)
Summary:
Currently `bmm()` has very heavy performance overhead on CPU due to the construction/destruction of `TensorImpl`. Applying `TensorAccessor` when indexing tensor data greatly improves the performance.

I tested this on `fairseq` Transformer model. Results on Xeon 6148 (20*2 cores 2.5GHz) indicate this PR improves Transformer training performance by approximately **10%** (seconds per iteration reduced from **3.60** to **3.21**). Considering the fact that `bmm()` takes only **14%** of the total time, 10% overall improvement indicates `bmm()` itself improves by roughly **3x**.

Before:
```
| epoch 001:   0%| | 43/25337 [02:34<25:17:11,  3.60s/it, loss=16.179, nll_loss=16.137, ppl=72045.59, wps=1320, ups=0, wpb=4758.767, bsz=136.558, num_updates=43, lr=6.45e-06, gnorm=6.88
```

After:
```
| epoch 001:   0%| | 23/25337 [01:13<22:32:48,  3.21s/it, loss=17.072, nll_loss=17.068, ppl=137419.42, wps=1478, ups=0, wpb=4746.870, bsz=128.348, num_updates=23, lr=3.45e-06, gnorm=10.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20266

Differential Revision: D15262201

Pulled By: cpuhrsch

fbshipit-source-id: c2e4e406c06714b04cc7534f3da71e986eddca35
2019-05-16 14:01:48 -07:00
fa189641b5 Add export for __and__ & __or__ (#17894)
Summary:
In the ONNX spec, the only supported input/output type for `And` and `Or` is `Bool`.
Thus when exporting, casts to/from `Bool` are inserted for inputs and outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17894

Reviewed By: zrphercule

Differential Revision: D15103148

Pulled By: houseroad

fbshipit-source-id: 3e1068ea236c743260d42882fb11f0e3a21707e6
2019-05-16 13:52:06 -07:00
61012080c8 split and register CollectAndDistributeFpnRpnProposals with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20509

Reviewed By: newstzpz

Differential Revision: D15302181

fbshipit-source-id: 7d3b29b667cd900f2976101f35200e1ee20b0f64
2019-05-16 13:40:46 -07:00
d784636b39 Scope: Move implementations from .h to .cpp file. (#20593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20593
ghimport-source-id: e1e9a7f98158c23adf02d5ed2763ab1e33bb2997

Differential Revision: D15375134

Pulled By: ZolotukhinM

fbshipit-source-id: 8d8e0c1e0ef7697ded59a4b19e2d9de7c5230294
2019-05-16 13:18:04 -07:00
75d04900fe Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 2ee799db589f63e7b9336a02d047afcc768e8b58
2019-05-16 09:48:39 -07:00
5821a76b8e Forcing gcc ABI and safer bash scripts, v2 (#20540)
Summary:
First time this was merged it broke master and was reverted. This time I do not add ```set -u``` to the .circleci/scripts/setup* scripts. There's still a chance that ```set -u``` breaks the binary builds on master, but at least those can be fixed in parallel and don't completely eliminate signal from all merges.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20540

Differential Revision: D15373444

Pulled By: pjh5

fbshipit-source-id: 0203c20865827366ecd8fa07b2db74d255549ed1
2019-05-16 09:40:01 -07:00
66c6133264 fix empty dropout (#20541)
Summary:
Fix for #20499
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20541

Differential Revision: D15372461

Pulled By: ezyang

fbshipit-source-id: cdc237a98244515a573216a6dac4826261c973f9
2019-05-16 09:33:51 -07:00
a837c00acd Removing unnecessary comments (+fix flake8)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20589

Differential Revision: D15373655

Pulled By: VitalyFedyunin

fbshipit-source-id: 25277648d3e8f8a09cec7569ceda56e74c2ef0b1
2019-05-16 09:19:34 -07:00
5f8e849d84 eliminate FE_INVALID in optimizer related operators and tests (#20501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20501

Fix floating-point exceptions in optimizer-related operators and their unit tests

Reviewed By: hx89

Differential Revision: D15307410

fbshipit-source-id: e5400c26e08f26191ee542fe6b02e0a69bc4e1ae
2019-05-16 08:23:46 -07:00
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was separated by 2 PRs.

This one:

Introduce MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and to the `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions just operate on strides and don't store any tensor state.

(Original RFC #19092)

-----

Expands functionality of two tensor functions `.is_contiguous` and `.contiguous` (both python and c++ api).

Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.

1.  `.contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor which maintains the same semantic layout (NCHW) but has a different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d, or 5d, and fails otherwise.

2. `.is_contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in NHWC (or similar for 3d, 5d) format.

Note: By the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will calculate the state of the Tensor on every call. This functionality is going to be updated later.
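The stride patterns behind the two memory formats can be sketched in plain Python (illustrative only; the helper names are hypothetical):

```python
def contiguous_strides(shape):
    # Standard (torch.contiguous_format) strides: last dim varies fastest.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

def channels_last_strides(shape):
    # channels_last for a 4d NCHW tensor: C varies fastest, then W, H, N.
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)

shape = (2, 3, 4, 5)
print(contiguous_strides(shape))     # (60, 20, 5, 1)
print(channels_last_strides(shape))  # (60, 1, 15, 3)
```

Both layouts describe the same logical NCHW tensor; only the order in which elements are laid out in memory differs.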
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
09f22d10a6 Infer schema for experimental ops (#20513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20513

They've been using an old API; switch them to the new one instead.

Reviewed By: li-roy

Differential Revision: D15346349

fbshipit-source-id: 538eb460897ec6addebeebf88b316eb0d6b1dd6f
2019-05-16 01:29:35 -07:00
9bd3305592 Allow nested lists/dicts in legacy operator API (#20379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20379

The legacy custom op API allowed nesting of std::unordered_map and std::vector. While we haven't yet figured out how to do that with the new API,
we at least have to keep backwards compatibility. This diff adds the feature so we can switch to the new API without breaking third-party code.

Reviewed By: li-roy

Differential Revision: D15287693

fbshipit-source-id: bb5b8429fddf6298719cbf567b584ed371f8fc81
2019-05-16 01:29:32 -07:00
456b889353 Require passing version_counter and allow_tensor_metadata_change to shallow_copy_and_detach() (#20496)
Summary:
Previously, the caller of `shallow_copy_and_detach()` was responsible for deciding whether the shallow copy should share the source TensorImpl's version counter or have its own new version counter. However, since this decision is crucial for ensuring the correctness of the shallow copy's version counter, we want to require users of `shallow_copy_and_detach()` to pass a version counter to the function call, so that they make this decision at the time of API usage, not as an afterthought.

For similar reasons, we want to enforce users of `shallow_copy_and_detach()` to pass `allow_tensor_metadata_change` to the function call, so that they are required to decide "whether the TensorImpl shallow-copy should allow tensor metadata change" at the time of API usage, not as an afterthought.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20496

Differential Revision: D15363620

Pulled By: yf225

fbshipit-source-id: a65e74738b10452668d6dc644b43aad5b3d8c9e6
2019-05-15 21:02:48 -07:00
3caf4e6985 Remove weak_script in MultiheadAttention function. (#20563)
Summary:
Remove weak_script. After recently splitting the forward() function in MultiheadAttention module, we notice a memory leak on GPU. Fix the problem by removing those "weak_script" decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20563

Differential Revision: D15368262

Pulled By: zhangguanheng66

fbshipit-source-id: 475db93c9ee0dbaea8fb914c004e7d1e0d419bc2
2019-05-15 20:10:39 -07:00
7db1fb84fa Use slimmer exception raising code when on mobile. (#20543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20543

All of that code for concatenating strings together adds up. Just discard it all for mobile builds.

Reviewed By: ljk53

Differential Revision: D15353447

fbshipit-source-id: a82dd0b884335d662605aabf7dd3d09dfcc1478b
2019-05-15 19:45:18 -07:00
1891614aa5 Add GivenTensorInt16Fill (#20515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20515

Needed by the upcoming quantized version of GenerateProposals

Reviewed By: dzhulgakov

Differential Revision: D14430952

fbshipit-source-id: ea852f04cc4b070f8fbe7a1e6535bba4d5b230fd
2019-05-15 19:45:15 -07:00
5917ec2c52 Print registry warning only when DEBUG is set (#20398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20398

Reduce logging volume from the Registry

Reviewed By: nairbv

Differential Revision: D15312262

fbshipit-source-id: e3546c288d6e1a396b2a4b08204a418aca889437
2019-05-15 19:29:05 -07:00
c129ab06e9 Change onnxifi workflow to support multi-group quantized & Add multi quantization info to caffe2.proto (#20439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20439

This is the QTensorProto workflow for multi-group quantization on the C2 side.
Nothing DNNLOWP-tensor-related is included in this PR, so once the Glow side is finished, we should be able to test this PR using ResNet-50.

Reviewed By: yinghai

Differential Revision: D15096919

fbshipit-source-id: 741eecd59eb79d24d9fe2b035f6246d42422d25c
2019-05-15 19:24:08 -07:00
51e40ab832 Add scalar type info to tensor print (#20483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20483
ghimport-source-id: 31bfc51af1060e83492315b96884fc725a1eb84f

Differential Revision: D15334010

Pulled By: li-roy

fbshipit-source-id: 199b575855146a7336d57c165191a16e7e1b5785
2019-05-15 19:03:21 -07:00
abb3698976 Add QInt32 ScalarType and qint32 data type (#19816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19816

We need this for quantizing bias.
Adds a third argument of type ScalarType to `quantize_linear`.

Differential Revision: D15094174

fbshipit-source-id: f19ec8f4716cf5fe0aa21b38d45af6d27c9ab377
2019-05-15 18:50:18 -07:00
1a0f753e6e Fixing typos in schema description for BatchMatMul (#20512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20512

Fixing typos in the description of schema for one of the inputs for BatchMatMul operator.

Reviewed By: jianyuh, BIT-silence

Differential Revision: D15343879

fbshipit-source-id: 06354e8e6b0d79fea937ed2703bb457b2d04f859
2019-05-15 18:06:30 -07:00
b3e510518b Tensor codemod for instance_norm (#20517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20517

fixing a bug in instance_norm

Reviewed By: ezyang

Differential Revision: D15349006

fbshipit-source-id: 2496f7f372118d2713c12a6e9b3357bf6c640b71
2019-05-15 17:51:37 -07:00
ca24e18c7e Add an AssertError check back to MultiheadAttention module (#20492)
Summary:
Fix a typo in the doc.
Add an AssertError check back to MultiheadAttention module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20492

Differential Revision: D15349008

Pulled By: cpuhrsch

fbshipit-source-id: 2d898345f03787c713e537673613a748ad826b34
2019-05-15 17:28:25 -07:00
161566187c enable CopyVector for type of int on CUDA (#20520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20520

as title

Reviewed By: xianjiec

Differential Revision: D15351010

fbshipit-source-id: 99466de9da0abdffe26d6919768dcb4e52cb2ff1
2019-05-15 16:53:51 -07:00
4c23c34e79 Computing var/stddev and mean at the same time (#18731)
Summary:
The current variance kernels compute mean at the same time. Many times we want both statistics together, so it seems reasonable to have a kwarg/function that allows us to get both values without launching an extra kernel.
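The one-launch idea can be sketched in pure Python with Welford's single-pass algorithm (an illustrative sketch only; the real kernels are CUDA/C++, and the function name here is an assumption):

```python
def var_mean(xs, unbiased=True):
    """One-pass mean and variance (Welford's algorithm), mirroring the
    idea of producing both statistics from a single kernel launch."""
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n          # running mean
        m2 += delta * (x - mean)   # running sum of squared deviations
    denom = n - 1 if unbiased else n
    return m2 / denom, mean
```

Because the mean is maintained incrementally, no second pass over the data (and hence no extra kernel) is needed to compute the variance.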
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18731

Differential Revision: D14726082

Pulled By: ifedan

fbshipit-source-id: 473cba0227b69eb2240dca5e61a8f4366df0e029
2019-05-15 16:42:38 -07:00
08bdd694f9 Extract feature length information from SigridTransforms op (#20384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20171

Extract feature length information from SigridTransforms op

Reviewed By: ipiszy

Differential Revision: D15219408

fbshipit-source-id: 307d2b65b208d3af6977d90246d0372795c45815
2019-05-15 16:21:57 -07:00
428104c60a Automatic update of fbcode/onnx to ead449a30d026a7a0a59e2ba0a42ca8e52ec2359 (#20542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20542

Previous import was e08efaa35ed54362dfa283240506c003175889b7

Included changes:
- **[ead449a3](https://github.com/onnx/onnx/commit/ead449a3)**: fix array range bug (#2015) <one-hello>
- **[0442d426](https://github.com/onnx/onnx/commit/0442d426)**: Relax constraint on subgraph input/output type and shape (#2009) <Bowen Bao>

Reviewed By: zrphercule

Differential Revision: D15350320

fbshipit-source-id: 2cc5db926785cda0b79efb6747da3900361dba76
2019-05-15 15:12:59 -07:00
8226330af3 Extend testAvailableArgTypes (#20374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20374

This test case now also tests that the argument type works correctly in kernels that
- don't return outputs
- return multiple outputs

Reviewed By: li-roy

Differential Revision: D15298233

fbshipit-source-id: 82ab9d81b55b4f9fb34d66a155cc426af8592e25
2019-05-15 14:57:40 -07:00
f89ab7b623 Allow Dict type in c10 operators (#20373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20373

- Add support for Dict<Key, Value> arguments and returns to c10 operators
- Add support for std::unordered_map<Key, Value> to the legacy API (but not to c10 kernels)

Reviewed By: li-roy

Differential Revision: D15298235

fbshipit-source-id: 6d9793db1f12bea377f508a9b33a495ebe0bec18
2019-05-15 14:57:37 -07:00
a821e11127 Speed up RecordFunction with sampled callbacks (#20307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20307
ghimport-source-id: 94adc3c3102bb6b9cb60cf6c6112e350aa954aaf

Differential Revision: D15276308

Pulled By: ilia-cher

fbshipit-source-id: c536063669d8414b4ce0b09fd5dc0d76f1e94bb5
2019-05-15 14:48:49 -07:00
b55d2dcc84 Publish c10::RegisterOperators as torch::RegisterOperators (#20334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20334

-

Reviewed By: li-roy

Differential Revision: D15284557

fbshipit-source-id: fdd1d9f2910dbd05a869eef13ccdc68c80e6bd81
2019-05-15 13:45:07 -07:00
852f8526c5 Replace AT_CHECK with TORCH_CHECK [shard 5/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20431

Reviewed By: jerryzh168

Differential Revision: D15318266

fbshipit-source-id: 500719451202458fae312aa196c0c60098d6a541
2019-05-15 12:54:08 -07:00
5243fe0350 Allow static inheritence for ScriptModules (#20503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20503
ghimport-source-id: 35684175f0806485b074ea136548823ad1bc1c30

Differential Revision: D15341555

Pulled By: zdevito

fbshipit-source-id: ad19da3306914196dcbbcee829dcb0a9f22e3722
2019-05-15 12:41:55 -07:00
da3e74b21c define use_cuda in dropout backward to allow peephole optimization to… (#20289)
Summary:
… work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20289

Differential Revision: D15350262

Pulled By: wanchaol

fbshipit-source-id: b457304688524822c1e6f23049e05472130c1ff4
2019-05-15 11:36:06 -07:00
bd047d812e Recursively checkout submodules for Pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20537

Differential Revision: D15354830

Pulled By: pjh5

fbshipit-source-id: 40902c450756dc127d34c9ec64e78d33edb6c5c9
2019-05-15 10:48:27 -07:00
72bb84c518 Provide a few default args for numpy translation (#20451)
Summary:
Add automatic translations for a few argument names that commonly differ between PyTorch and NumPy.

For now, they are as follows:

* `keepdim` -> `keepdims`
* `dim` -> `axis`
* `input` -> (any of `a`, `x`, `x1`)
* `other` -> `x2`

Basic examples:
```python
>>> t=torch.randn(10,10)
>>> torch.sum(x=t, axis=1)
tensor([ 0.5199, -0.3768,  4.3619, -0.9105,  1.1804,  1.0837, -0.9036,  0.2365,
         1.1171, -0.0999])
```
```python
>>> torch.add(x1=5, x2=6)
tensor(11)
```

The additional overhead is zero when using traditional PyTorch argument names, and a few (usually 1) extra PyDict lookups when using NumPy argument names.
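The translation can be sketched at the Python level with a decorator (an illustrative assumption — the actual implementation performs the equivalent PyDict lookups in C, and the names below are stand-ins for the real machinery):

```python
# Hypothetical sketch of NumPy-name -> PyTorch-name kwarg translation.
NUMPY_TO_TORCH = {
    "keepdims": "keepdim",
    "axis": "dim",
    "a": "input", "x": "input", "x1": "input",
    "x2": "other",
}

def accept_numpy_names(fn):
    """Wrap fn so NumPy-style keyword names are remapped before the call."""
    def wrapper(*args, **kwargs):
        remapped = {NUMPY_TO_TORCH.get(name, name): value
                    for name, value in kwargs.items()}
        return fn(*args, **remapped)
    return wrapper

@accept_numpy_names
def my_sum(input, dim=None, keepdim=False):
    # stand-in for torch.sum, returning its resolved arguments
    return (input, dim, keepdim)
```

Calling `my_sum(x=[1, 2], axis=0, keepdims=True)` resolves to `input=[1, 2], dim=0, keepdim=True`; traditional names pass through the `dict.get` default unchanged.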
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20451

Differential Revision: D15337521

Pulled By: umanwizard

fbshipit-source-id: 7a7d389786f4ccf5c86a14ecb2002c61730c51b5
2019-05-15 10:13:17 -07:00
83649ef081 Replace AT_CHECK with TORCH_CHECK [shard 1/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20426

Reviewed By: jerryzh168

Differential Revision: D15318160

fbshipit-source-id: 4d1fb341ab47147d760d527b901de6ce54182753
2019-05-15 08:44:54 -07:00
2827f3ded6 Portable way of the warning clause
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20484

Differential Revision: D15353119

Pulled By: ezyang

fbshipit-source-id: a708548554728ec34c51a8032ceb2b12f16a8d5c
2019-05-15 08:14:01 -07:00
15c0091d8a Fix GetLastError in THAllocator for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20526

Differential Revision: D15352969

Pulled By: ezyang

fbshipit-source-id: 50a9ee10c1c80cfe737c96dd8af63a2b034686ae
2019-05-15 08:13:57 -07:00
73a97387c1 Replace AT_CHECK with TORCH_CHECK [shard 9/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20435

Reviewed By: jerryzh168

Differential Revision: D15318877

fbshipit-source-id: 4d83571187ea14a604fef83ac355d328b46d93e1
2019-05-15 08:05:59 -07:00
365fc26571 Replace AT_CHECK with TORCH_CHECK [shard 8/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20434

Reviewed By: jerryzh168

Differential Revision: D15318396

fbshipit-source-id: dcd0f51be2d64b9440bb95ce8f40acb12545c2f4
2019-05-15 08:05:56 -07:00
d1623f4cc9 Replace AT_CHECK with TORCH_CHECK [shard 3/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20428

Reviewed By: jerryzh168

Differential Revision: D15318209

fbshipit-source-id: e492aaa79146cfce9489bdb354cc539d7c4220a7
2019-05-15 07:40:50 -07:00
9d09f5df6c Replace AT_CHECK with TORCH_CHECK [shard 7/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20432

Reviewed By: jerryzh168

Differential Revision: D15318289

fbshipit-source-id: 6c443ac848fe28a1e3e8d7f33a12cd50f80b3e40
2019-05-15 07:40:47 -07:00
101067703e Fix strtod for MSVC (#20490)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20408. Tested locally by Jonas1312.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20490

Differential Revision: D15353137

Pulled By: ezyang

fbshipit-source-id: 0c0aefe54b11d50f703171700838af51f7666418
2019-05-15 07:40:44 -07:00
97e1f07ffc Replace AT_CHECK with TORCH_CHECK [shard 10/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20436

Reviewed By: jerryzh168

Differential Revision: D15318926

fbshipit-source-id: 71a43070cc50cc174f703ebc595f1d87c6fc1e91
2019-05-15 07:35:37 -07:00
8e26759f14 Back out "[pytorch][PR] Manually set _GLIBCXX_USE_CXX11_ABI in devtoolset7 binary builds"
Summary: Original commit changeset: 571bba8a93ea

Reviewed By: pjh5

Differential Revision: D15349783

fbshipit-source-id: 75c3e2b9b97e0ac0e8bcdef93e53b0d475c6fa38
2019-05-15 00:02:55 -07:00
fd18b89c98 shape inference for learning rate op (#20020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20020

Add shape inference for the LearningRate op. The output (lr) should have the same shape as the input (iteration), but not the same type (float vs. int).

Reviewed By: un-disclosed

Differential Revision: D15112300

fbshipit-source-id: 09969aefa15172a6f3c70cd9b2548e3020da5d7a
2019-05-14 23:34:32 -07:00
33f421027c Allow recency weight pooling for fp16 (#20506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20506

as titled

Reviewed By: alex1o1o7cloud

Differential Revision: D15342758

fbshipit-source-id: 89e7cb6d7b9511ef6c70611359736328571d7fc0
2019-05-14 20:13:38 -07:00
ea13b53856 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 63e9b4a8cf5b15a6ba20d1946aac36c1604d8079
2019-05-14 19:02:43 -07:00
254de9e8ec Removing cyclic dependency (#20511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20511

Removed cyclic dependency of caffe2/core/net.h and workspace.h

Differential Revision: D15303412

fbshipit-source-id: 6e772e372cd0cf2af05d7815f1df8ae20bc2a65e
2019-05-14 18:55:19 -07:00
ace506fb38 Dict (#20372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20372

Implement a Dict type that allows us to abstract away from the concrete implementation used.
The API is similar to std::unordered_map, but behind the scenes we can switch to any map implementation we like. ska::flat_hash_map, google dense map, or any future map implementation with better performance.
Switching such an implementation choice does not have to break backwards compatibility of kernel code using the Dict type.
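The design point can be sketched in Python (the real type is C++; this toy class is an assumption for illustration): callers code against a stable interface, so the backing container can be swapped without breaking them.

```python
class Dict:
    """Thin wrapper: callers see a fixed interface while the backing
    store (a plain dict here; could be any hash-map implementation)
    remains an internal, swappable detail."""

    def __init__(self, backend=dict):
        self._impl = backend()

    def insert(self, key, value):
        self._impl[key] = value

    def at(self, key):
        return self._impl[key]

    def __len__(self):
        return len(self._impl)
```

Kernel code written against `insert`/`at` keeps working if `backend` is later switched to a faster map type.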

Reviewed By: zdevito

Differential Revision: D15298234

fbshipit-source-id: b5ad368a9e9516030805cd8f5f1b02e3986933c0
2019-05-14 18:37:02 -07:00
56fb5e03b5 refactor registerStoragePyTypeObject (#20467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20467

for upcoming changes in Storage for QInt8

Reviewed By: ezyang

Differential Revision: D15330865

fbshipit-source-id: 2840e59c0bf088983f792fd724de41b3bb3dec55
2019-05-14 18:22:33 -07:00
ea38fbfc5c Manually set _GLIBCXX_USE_CXX11_ABI in devtoolset7 binary builds (#20243)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/17492
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20243

Differential Revision: D15348101

Pulled By: pjh5

fbshipit-source-id: 571bba8a93eaa9806db3f3d38697c26b5285da7a
2019-05-14 18:02:42 -07:00
358fb51e77 Replace AT_CHECK with TORCH_CHECK [shard 6/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20430

Reviewed By: jerryzh168

Differential Revision: D15318250

fbshipit-source-id: eaee93447d757124a0c9fb5dcde503ae6a065912
2019-05-14 16:00:59 -07:00
5b45355431 Replace AT_CHECK with TORCH_CHECK [shard 2/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20427

Reviewed By: jerryzh168

Differential Revision: D15318190

fbshipit-source-id: 15518a683d7b662ef00f255134aaf9dbd183f099
2019-05-14 16:00:56 -07:00
71af7c46bb Replace AT_CHECK with TORCH_CHECK [shard 4/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20429

Reviewed By: jerryzh168

Differential Revision: D15318222

fbshipit-source-id: daf693c34b4ee92e302eee679ed76a862715d1bb
2019-05-14 15:50:16 -07:00
9610f150d7 stop build spew on development (#20508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20508
ghimport-source-id: 26a16e2918fb93058c7740afb85070e0d29b4d1b

Differential Revision: D15343207

Pulled By: zdevito

fbshipit-source-id: b6d8858024cc440d59cf88d69e0fbc0e67dc85ce
2019-05-14 15:30:52 -07:00
24cd0e08cf identify important circleci builds (#20498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20498
ghimport-source-id: b62b5bcf73ce87b1054cad053fd1cc118a586cf6

Differential Revision: D15342506

Pulled By: suo

fbshipit-source-id: 9889103d23affe0d7eea0abfd801bae46d5238a2
2019-05-14 15:16:06 -07:00
9e7f22b223 Remove dependencies from Caffe2Go on PyTorch JIT (#20463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20463

Source file changes mostly involve ifdef'ing-out references to JIT code
from files that are part of Caffe2Go.  Update Internal build scripts to
remove those files from our globs.

After this, changes to most of the JIT files should not trigger mobile CI.

Reviewed By: dzhulgakov

Differential Revision: D15329407

fbshipit-source-id: 48f614c6b028eef0a03ce5161d083a3e078b0412
2019-05-14 14:36:08 -07:00
3479777519 UpSample GPU Porting (#19630)
Summary:
resolves #16158
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19630

Differential Revision: D15335765

Pulled By: ezyang

fbshipit-source-id: 03dd590c715a65c20ac99674a5d77179cd4a50fc
2019-05-14 11:58:21 -07:00
7ffc37e022 Add ShapeInference for AtomicIter Op (#20021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20021

Add shape inference for the AtomicIter operator. The operator takes two blobs, iteration and iter_mutex, as input and outputs iteration, which should have the same type and shape as the input.

Reviewed By: un-disclosed

Differential Revision: D15111643

fbshipit-source-id: 0d06413305cc4c6257c0cfabf62fb874970803bc
2019-05-14 11:43:21 -07:00
6e82b1c77d Split nn.MultiHeadAttention into Module + functional (#20415)
Summary:
Moving functions from torch/nn/modules/activation.py to torch/nn/functional.py. For functions not implemented (_get_input_buffer and _set_input_buffer), a TODO is added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20415

Differential Revision: D15318078

Pulled By: jamarshon

fbshipit-source-id: 5ca698e2913821442cf8609cc61ac8190496a3c6
2019-05-14 08:41:28 -07:00
b46a630836 Update Sleef to include fix for FMA4 detection (#20450)
Summary:
FMA4 support is in bit 16 of register ECX, not EDX of the "extended
processor info" (0x80000001).

Once we verify that this change fixes https://github.com/pytorch/pytorch/issues/12112, I'll make a PR for upstream Sleef.

The mapping of registers to reg is:

```
  reg[0] = eax
  reg[1] = ebx
  reg[2] = ecx <---
  reg[3] = edx
```

Bit 16 of EDX is PAT (Page Attribute Table) on AMD CPUs, which is widely supported. Intel CPUs do not set this bit. This causes "Illegal instruction" errors on AMD CPUs that do not support FMA4.

See https://github.com/pytorch/pytorch/issues/12112
See https://github.com/shibatch/sleef/issues/261

http://developer.amd.com/wordpress/media/2012/10/254811.pdf (Page 20)
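The fix comes down to testing bit 16 of the right register; a minimal sketch (register values here would come from executing CPUID with leaf 0x80000001):

```python
FMA4_BIT = 1 << 16

def fma4_supported(ecx):
    """FMA4 is advertised in bit 16 of ECX (reg[2] in the mapping above)
    from CPUID leaf 0x80000001 -- not bit 16 of EDX (reg[3]), which is
    PAT on AMD CPUs and unset on Intel."""
    return bool(ecx & FMA4_BIT)
```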
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20450

Differential Revision: D15324405

Pulled By: colesbury

fbshipit-source-id: 96fb344c646998ff5da19e4cdbf493f5a4e9892a
2019-05-14 08:33:18 -07:00
101176870e eliminate FE_INVALID exceptions related to fp16 conversion (#20390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20390

duc0 Ngo implemented observing floating-point exceptions, but there were a couple of places where we have "benign" floating-point exceptions leading to false positives. This diff eliminates one source of such false positives, namely using _mm256_cvtph_ps and _mm256_cvtps_ph on a partially uninitialized array in the remainder loop.

Reviewed By: hx89

Differential Revision: D15307358

fbshipit-source-id: 38f57dfdd90c70bc693292d2f9c33c7ba558e2c9
2019-05-13 23:42:01 -07:00
8e9692df27 codemode change missing [from D13586737]
Summary: as title

Reviewed By: jerryzh168

Differential Revision: D15327669

fbshipit-source-id: e262dacb097e91475b1925ec40b375ec6722ad5a
2019-05-13 20:44:04 -07:00
e8fb5f35f0 Bump torch proto version (#20444)
Summary:
Tagging along to changes in #20191 which added more support for types in the pickler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20444

Pulled By: driazati

Differential Revision: D15321463

fbshipit-source-id: 985061bf5070a7d7bad58ea8db11d531f3d13e74
2019-05-13 18:32:16 -07:00
a9aaf698a4 add c2 benchmark runs in cpp (#20108)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20108

Add cpp runs for c2, hooked up via pybinds. Print output to terminal. This is not hooked up with the pep output yet because I'd like to verify the numbers first.

Note that this isn't quite the same mechanism as the pytorch cpp hookup, which uses cpp_python_extensions. If I can use the same mechanism to pull all the inputs for c2 through cpp and do FeedBlobs in cpp, then I'll switch to that.

Reviewed By: zheng-xq

Differential Revision: D15155976

fbshipit-source-id: 708079dacd3e19aacfe43d70c5e5bc54da2cf9e3
2019-05-13 17:01:08 -07:00
d2da3ee601 temporarily disable layernorm AD (#20442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20442
ghimport-source-id: c246ade4ee9ee31b2e3413efff3ea6a246e1837e

Differential Revision: D15321524

Pulled By: wanchaol

fbshipit-source-id: 22c77d08c91af2d83dfd2c4a84cafc56e9240033
2019-05-13 16:35:50 -07:00
f0829f37c8 Rename AT_ASSERT to TORCH_INTERNAL_ASSERT; other macro updates (#20321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20321

First part of https://github.com/pytorch/pytorch/issues/20287

- Rename `AT_ASSERT` to `TORCH_INTERNAL_ASSERT`
- Make `TORCH_INTERNAL_ASSERT` work with variadic inputs
- Deprecated `AT_ASSERT` and `AT_ASSERTM`
- Rename `AT_CHECK` to `TORCH_CHECK`
- Make `TORCH_CHECK` give a better error message when no arguments are
  provided
- Deprecate `AT_ERROR` in favor of `TORCH_CHECK(false, ...)`
- Deprecate `AT_INDEX_ERROR` in favor of `TORCH_CHECK_INDEX(false, ...)`
- Rename `AT_WARN` to `TORCH_WARN`

No use sites are changed; I'll work on that in follow up patches
(or disable the deprecation, if necessary.)

Differential Revision: D15278439

fbshipit-source-id: 7e0ed489d4e89e5f56b8ad7eafa72cb9a06065ee
2019-05-13 16:16:42 -07:00
1364104054 Fix version counter sharing in set_data() (#20391)
Summary:
In https://github.com/pytorch/pytorch/pull/18223/files#diff-77a6f3462f2233b921d3042412fed6d3R178, we used `auto saved_version_ = data_.unsafeGetTensorImpl()->version_counter().current_version()` and then `new_data_impl_copy->set_version_counter(saved_version_)`, which actually doesn't preserve the original semantics that `var.set_data(tensor)` should keep `var`'s version counter object intact. This PR fixes the bug and adds test to make sure it doesn't happen again.
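The intended semantics can be illustrated with a toy model (hypothetical Python classes, not the real TensorImpl API): `set_data` must keep the variable's version-counter *object* intact, not copy its current value into a fresh counter.

```python
class VersionCounter:
    def __init__(self):
        self.version = 0

class Variable:
    """Toy stand-in for a Variable sharing a version counter with views."""

    def __init__(self, data):
        self.data = data
        self.version_counter = VersionCounter()

    def set_data(self, new_data):
        # Correct behavior: replace only the data; the same counter
        # object stays attached (the bug copied just its value into a
        # new counter, detaching existing views from future bumps).
        self.data = new_data

    def add_(self, x):
        self.data += x
        self.version_counter.version += 1  # in-place op bumps the counter
```

After `set_data`, any view holding a reference to the old counter still observes subsequent in-place bumps, which is what the fix restores.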
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20391

Differential Revision: D15323430

Pulled By: yf225

fbshipit-source-id: e3ba49b51ec8ccecd51c80cb182387f74cfd2b2b
2019-05-13 16:03:42 -07:00
3a0b27b73d Move at::NonVariableTypeMode to TensorImpl, and check it in is_variable() (#20392)
Summary:
As part of the Variable/Tensor merge, we allow passing Tensor with AutogradMeta into ATen ops, but we want to make sure they are not treated as Variables (i.e. their `is_variable()` is false). This PR makes the necessary change to make this work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20392

Differential Revision: D15321899

Pulled By: yf225

fbshipit-source-id: c2ab09db73c63bd71ba2d8391095f4d6b4240a9a
2019-05-13 15:49:23 -07:00
2dc9152dbe Automatic update of fbcode/onnx to e08efaa35ed54362dfa283240506c003175889b7 (#20443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20443

Previous import was 5bde6371620b76302864bce90f521d72eda95d0e

Included changes:
- **[e08efaa3](https://github.com/onnx/onnx/commit/e08efaa3)**: Fix shape inference logic for TopK operator (#2005) <Hariharan Seshadri>
- **[d80ea947](https://github.com/onnx/onnx/commit/d80ea947)**: Nullary variadic (#1889) <G. Ramalingam>
- **[50dc186b](https://github.com/onnx/onnx/commit/50dc186b)**: Removed setting MD/MDd flags manually through cmake. The MTd/MT part is still necessary. Looks like CI fails without it. (#1995) <Alexander Yermolovich>
- **[e7f81c5e](https://github.com/onnx/onnx/commit/e7f81c5e)**: Move NonMaxSupression to object_detection folder (#2001) <Hector Li>
- **[86ab4517](https://github.com/onnx/onnx/commit/86ab4517)**: Prevent using invalid iterator, fix arithmetics. (#2004) <Dmitri Smirnov>

Reviewed By: zrphercule

Differential Revision: D15302141

fbshipit-source-id: 146c346c188934e5125371b261ecfde93b4aa166
2019-05-13 14:47:11 -07:00
824d4f9957 Needed fixes for binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20385

Differential Revision: D15321396

Pulled By: pjh5

fbshipit-source-id: de7ca1ac928bdea3bcf6c78e84c7e9b786bcff52
2019-05-13 11:58:50 -07:00
6c3b8a24ff Make sure reducer=None is not used when fp16 embedding is enabled
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20349

Reviewed By: hyuen

Differential Revision: D15291545

fbshipit-source-id: fa5fd0b97aeca6e5f45866908f3f205b701c931b
2019-05-13 11:53:14 -07:00
63c05bffcb Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20440

Pulled By: driazati

Differential Revision: D15320614

fbshipit-source-id: dc650c478e39d0c3e6b660c2d9ef93b3479df1ac
2019-05-13 11:37:27 -07:00
7799ea5eb3 Port adaptive_avg_pool3d to ATen (#19898)
Summary:
Resolves #18065.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19898

Differential Revision: D15240607

Pulled By: ezyang

fbshipit-source-id: 00cf23ed20c1757d5eef71fd8c6a2f53d372e341
2019-05-13 11:29:22 -07:00
5268b7dfaf Remove support for CUDA 8 (#20298)
Summary:
1.1.0 stopped support for CUDA 8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20298

Differential Revision: D15294639

Pulled By: ezyang

fbshipit-source-id: b9411bfe456f93f1529b745dc83b7d6310df684d
2019-05-13 11:24:22 -07:00
62957ab0a1 Tiny spelling mistake fix. (#20425)
Summary:
"then the output would also has k tensors" -> "then the output would also have k tensors"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20425

Differential Revision: D15320152

Pulled By: zou3519

fbshipit-source-id: b04e2ccd29c6a3e33ad1040d0ea975a01a7bd9b5
2019-05-13 11:18:53 -07:00
67414714e5 Move THCTensor_(uniform) to ATen (#20292)
Summary:
As a first step for this plan: https://github.com/pytorch/pytorch/issues/19508#issuecomment-485178192, this PR moves `THCTensor_(uniform)` to ATen. Major changes are:
- `uniform_` cuda kernel now utilizes a philox generator.
- the kernel also utilizes TensorIterator
- the kernel uses a grid-stride loop to achieve peak effective bandwidth

- Since the engine has changed from `curandStateMTGP32` to `curandStatePhilox4_32_10`, the random numbers generated will now be different.
- Here is the diff showing codegen changes: https://gist.github.com/syed-ahmed/4af9ae0d42b6c7dbaa13b9dd0d1dd1e8 (BC breaking change if any)

- Philox4_32_10 is known to pass the standard TestU01 Big Crush test (https://www.thesalmons.org/john/random123/papers/random123sc11.pdf) and hence the quality of random numbers generated isn't an issue when compared to the previously used `curandStateMTGP32`.
- I have added a test case in `aten/src/ATen/test/cuda_distributions_test.cu` which verifies that philox offset is incremented properly

The benchmark was done on a DGX station with 4 V100s.
I modified the script from jcjohnson 's [multinomial benchmark](https://github.com/jcjohnson/pytorch-multinomial-benchmark) to produce this notebook which shows that there is a general speedup with this PR and a regression hasn't been introduced: https://gist.github.com/syed-ahmed/9d26d4e96308aed274d0f2c7be5218ef

To reproduce the notebook:
- Run https://gist.github.com/syed-ahmed/4208c22c541f1d30ad6a9b1efc1d728f in a container with the current pytorch top of tree with the command: `python uniform_benchmark.py --stats_json before.json`
- Apply this diff to the current pytorch top of tree and run the same script in a container with the command: `python uniform_benchmark.py --stats_json after.json`
- Run the notebook attached above with the `after.json` and `before.json` in the same directory

The effected bandwidth was calculated using the script (thanks to ngimel ): https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
Following are the numbers before and after.
```
uniform, size, elements 65536 forward 5.168914794921875e-06 bandwidth (GB/s) 50.71548098597786
uniform, size, elements 131072 forward 5.056858062744141e-06 bandwidth (GB/s) 103.67860705101367
uniform, size, elements 262144 forward 7.164478302001953e-06 bandwidth (GB/s) 146.357621001797
uniform, size, elements 524288 forward 1.1217594146728515e-05 bandwidth (GB/s) 186.9520302275877
uniform, size, elements 1048576 forward 1.923084259033203e-05 bandwidth (GB/s) 218.10297600317384
uniform, size, elements 2097152 forward 3.640890121459961e-05 bandwidth (GB/s) 230.39992200138826
uniform, size, elements 4194304 forward 6.778717041015625e-05 bandwidth (GB/s) 247.49839679819922
uniform, size, elements 8388608 forward 0.00012810707092285157 bandwidth (GB/s) 261.92490202361347
uniform, size, elements 16777216 forward 0.00025241613388061524 bandwidth (GB/s) 265.86598474620627
uniform, size, elements 33554432 forward 0.000497891902923584 bandwidth (GB/s) 269.5720239913193
```
```
uniform, size, elements 65536 forward 5.550384521484375e-06 bandwidth (GB/s) 47.22988091821306
uniform, size, elements 131072 forward 5.581378936767578e-06 bandwidth (GB/s) 93.93520954942333
uniform, size, elements 262144 forward 6.165504455566406e-06 bandwidth (GB/s) 170.071404141686
uniform, size, elements 524288 forward 6.3276290893554685e-06 bandwidth (GB/s) 331.4277702414469
uniform, size, elements 1048576 forward 8.509159088134765e-06 bandwidth (GB/s) 492.91639239047356
uniform, size, elements 2097152 forward 1.2989044189453124e-05 bandwidth (GB/s) 645.8218077979443
uniform, size, elements 4194304 forward 2.347707748413086e-05 bandwidth (GB/s) 714.6211452997259
uniform, size, elements 8388608 forward 4.4286251068115234e-05 bandwidth (GB/s) 757.6715389250498
uniform, size, elements 16777216 forward 8.672237396240235e-05 bandwidth (GB/s) 773.8356427961071
uniform, size, elements 33554432 forward 0.00016920566558837892 bandwidth (GB/s) 793.2224227438523
```
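The effective-bandwidth figures above follow from bytes moved divided by kernel time; for a write-only float32 fill that is 4·N bytes per launch (a minimal sketch of the arithmetic, not the linked measurement script):

```python
def effective_bandwidth_gbs(num_elements, seconds, bytes_per_element=4):
    """GB/s for a kernel that writes num_elements values exactly once."""
    return num_elements * bytes_per_element / seconds / 1e9

# First row of the "before" table: 65536 float32 elements in ~5.17 us
bw = effective_bandwidth_gbs(65536, 5.168914794921875e-06)  # ~50.7 GB/s
```

Comparing the computed GB/s against the device's peak memory bandwidth is what shows the larger sizes approaching saturation after this PR.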
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20292

Differential Revision: D15277761

Pulled By: ezyang

fbshipit-source-id: 8bfe31a01eeed77f0ed6e7ec4d2dda4c6472ecaa
2019-05-13 09:38:28 -07:00
5f7ef09f57 math module support: gcd, copysign, erf, erfc, expm1, fabs, gamma, lgamma (#19707)
Summary:
eellison driazati Refer to issue #19026
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19707

Differential Revision: D15302632

Pulled By: eellison

fbshipit-source-id: 68ff13b478b93cc33703ef3276b5fa727c8ff31a
2019-05-13 08:55:23 -07:00
41673d477c Disable incremental_state function in MultiheadAttention module. (#20177)
Summary:
To fully support the incremental_state function, several additional utils available in fairseq are required, and we lack a proper unit test for it. Therefore, the incremental_state function will be disabled for now. If it is needed in the future, a feature request can be created. Fixes #20132

Add some unit tests to cover the arguments of MultiheadAttention module, including bias, add_bias_kv, add_zero_attn, key_padding_mask, need_weights, attn_mask.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20177

Differential Revision: D15304575

Pulled By: cpuhrsch

fbshipit-source-id: ebd8cc0f11a4da0c0998bf0c7e4e341585e5685a
2019-05-13 08:21:15 -07:00
f8aa6a8f44 Make a deep copy of extra_compile_flag dictionary (#20221)
Summary:
See issue #20169
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20221

Differential Revision: D15317126

Pulled By: ezyang

fbshipit-source-id: 0a12932db4f6ba15ea1d558fa329ce23fe2baef6
2019-05-13 08:11:39 -07:00
30bdb8c0d7 Hotfix for caffe2 windows build (#20417)
Summary:
We don't need to overlay the VC env when not using Ninja; CMake will deal with it automatically. Overlaying is a no-op when the env matches the specified generator, but it produces the error "Cannot find CMAKE_CXX_COMPILER" when they differ.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20417

Differential Revision: D15317081

Pulled By: ezyang

fbshipit-source-id: 5d9100321ecd593e810c31158f22c67d3e34973b
2019-05-13 08:03:45 -07:00
f496ea36b2 DataLoader: add error detection for worker_init_fn (#20150)
Summary:
This is an attempt to isolate unrelated changes from #19228 for easier review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20150

Differential Revision: D15314891

Pulled By: ezyang

fbshipit-source-id: 8c429747ba83ad5aca4cdd8f8086bcf65a326921
2019-05-12 18:28:56 -07:00
163f0e182c Fix bug in non_blocking copy (#20305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20305
ghimport-source-id: eb3dacb10fd93bbb5a6bbe078ed1ec842163d0e6

Differential Revision: D15276094

Pulled By: li-roy

fbshipit-source-id: 4728f419aa050e6c94a4f62231fa1a86caa556a7
2019-05-11 15:20:19 -07:00
6a8f55796a Add quant-dequant nodes for weights
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20041

Differential Revision: D15178086

fbshipit-source-id: 8cb060d72b68e44bf042338924f203ae62d74f6a
2019-05-11 14:03:10 -07:00
9499c7b7ee Profiling GraphExecutor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19994

Differential Revision: D15307752

Pulled By: Krovatkin

fbshipit-source-id: 7b35191042199ef16823487e15fe639968cbdc89
2019-05-10 23:05:47 -07:00
f4d9bfaa4d Support Exports to Multiple ONNX Opset (#19294)
Summary:
Support exporting multiple ONNX opsets (more specifically opset 10 for now), following the proposal in https://gist.github.com/spandantiwari/99700e60919c43bd167838038d20f353.
And add support for custom ops (merge with https://github.com/pytorch/pytorch/pull/18297).

This PR will be followed by another PR containing the changes related to testing the ops for different opsets.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19294

Reviewed By: zrphercule

Differential Revision: D15043951

Pulled By: houseroad

fbshipit-source-id: d336fc35b8827145639137bc348ae07e3c14bb1c
2019-05-10 18:37:12 -07:00
1129b3344a move DistillBatchLRLoss Layer from open source to fb
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20291

Reviewed By: chocjy

Differential Revision: D15272181

fbshipit-source-id: 2e0964fa1b1031607134548bb87c4e103c5b1383
2019-05-10 17:46:04 -07:00
3f3ee5600a make trace's errors more helpful in terms of what it can and can't do when tracing module's methods
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20368

Differential Revision: D15305340

Pulled By: Krovatkin

fbshipit-source-id: bafb13002df5c9741160e96205e0846243dde3ec
2019-05-10 17:40:21 -07:00
7d3d5b73f4 Add multiline type annotation support for Python frontend (#14922)
Summary:
This allows multiline type comments in accordance with [PEP 484](https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code)

```python
@torch.jit.script
def foo(x,  # type: Tensor
        y,  # type: Tuple[Tensor, Tensor]
        ):
    # type: (...) -> Tuple[Tensor, Tensor]
    return x, x
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14922

Pulled By: driazati

Differential Revision: D15268432

fbshipit-source-id: e9add8d8025e42390a14a643835d15cc67a2f33e
2019-05-10 17:27:41 -07:00
3a39ce0f41 Fix reflection on weak modules, copy attributes (#20190)
Summary:
* Constructs a new type at runtime so that `isinstance` checks work for
weak modules assigned to `ScriptModule`s
* Fix some extraneous names in `__constants__`
* Add `in_features` and `out_features` to `nn.Linear` `__constants__`

Fixes #19363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20190

Pulled By: driazati

Differential Revision: D15302350

fbshipit-source-id: 1d4d21ed44ab9578a4bc2a72396a82e9bbcd387c
2019-05-10 17:14:49 -07:00
00d0ddb140 Add all list specializations to pickler (#20191)
Summary:
TensorList, DoubleList, and BoolList were missing from the pickler, so
this adds them.

As a follow up a lot of the code for these could be templated and cut
down

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20191

Pulled By: driazati

Differential Revision: D15299106

fbshipit-source-id: f10c0c9af9d60a6b7fb8d93cea9f550b1a7e2415
2019-05-10 17:14:42 -07:00
6197eed409 Eliminate a const_cast.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20371

Differential Revision: D15298327

fbshipit-source-id: 566ec842e971ba6f80e333bb874cc5fd2c36b02e
2019-05-10 15:43:12 -07:00
a4ae689636 quantize_rnn_modules in ensemble_export (#20365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20365

Enable quantization of `torch.nn.LSTM` module in the decoder of PyTorch natively exported beam search models.

Reviewed By: jmp84

Differential Revision: D15260631

fbshipit-source-id: bbdd3a30c2c2110986eb7aa7ff11ce1c9090ddf4
2019-05-10 15:33:41 -07:00
a0c2829194 Preserve log_dir arg and member for SummaryWriter (#20382)
Summary:
Given that tensorboardX and our PyTorch 1.1 release had `log_dir` as the argument for SummaryWriter initialization and as a member variable (which some users access), we need to preserve this name. However, we might deprecate it in the future, and I've added a `get_logdir` method that can be used going forward.

cc natalialunova, lanpa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20382

Reviewed By: NarineK

Differential Revision: D15300941

Pulled By: orionr

fbshipit-source-id: a29a70fcbc614a32ebfa6c655962fdff081af1af
2019-05-10 14:59:47 -07:00
3aa414c8f2 Add documentation to Dispatch.h (#20339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20339
ghimport-source-id: 99b6cf9f03be8d55674a43c8a7e19a42b75143f1

Differential Revision: D15295652

Pulled By: ezyang

fbshipit-source-id: f3b127ed3f53f13931596c06e181056940443290
2019-05-10 14:31:03 -07:00
10a9ef833c Avoid unnecessary refcount bump in unary operators. (#20331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20331
ghimport-source-id: 26588acca171555182714c6b6f89610564e228a1

Differential Revision: D15295193

Pulled By: ezyang

fbshipit-source-id: 44b7a8b4c9d41003a6559be2af0bd4f0ada54b31
2019-05-10 14:23:22 -07:00
75c6d37bac profile on uses
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20049

Differential Revision: D15242492

Pulled By: Krovatkin

fbshipit-source-id: f6eab3ec9d7a2f59f19905d757cf7c6f9ad2fdf6
2019-05-10 14:18:03 -07:00
c6255a57e4 Remove CPU_tensor_parallel_kernel_apply2 (#20207)
Summary:
This code is unused and has been superseded by TensorIterators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20207

Differential Revision: D15240832

Pulled By: cpuhrsch

fbshipit-source-id: 4f600bb8645f9b28a137e2cefb099978f5152d05
2019-05-10 14:04:32 -07:00
6dc70aa513 add test coverage for make_np (#20317)
Summary:
addresses https://github.com/pytorch/pytorch/pull/16196#discussion_r276381946

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20317

Differential Revision: D15289400

Pulled By: orionr

fbshipit-source-id: 914416a8c1369d95656f556c6e05348957789466
2019-05-10 13:59:48 -07:00
ce033485eb Convenience APIs for script objects (#20226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20226
ghimport-source-id: 22937d72e35ec4eba38019284a368453089fe3eb

Differential Revision: D15243625

Pulled By: suo

fbshipit-source-id: 5e9fb773da244f9ef201dba524155c3b19b2b4e0
2019-05-10 13:03:58 -07:00
50149fb66b Adds quantized addition and renames sum to add (#20233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20233

Adding a quantized addition (without relu)

Reviewed By: jianyuh

Differential Revision: D15245791

fbshipit-source-id: 34ede5d805d9ab0d31e8ae87cefb110504bd3c87
2019-05-10 12:53:33 -07:00
35fed93b1e Adding Poisson NLL loss to libtorch (#19316)
Summary:
This PR add Poisson NLL loss to aten and substitute the python implementation with a call to the c++.

Fixes #19186.
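For reference, a small sketch of the Python-facing function this now dispatches to; with the default `log_input=True` and `full=False`, the loss reduces to `exp(input) - target * input` before averaging:

```python
import torch
import torch.nn.functional as F

log_rate = torch.randn(4, 3)                  # log of predicted Poisson rates
target = torch.poisson(torch.rand(4, 3) * 3)  # non-negative counts

loss = F.poisson_nll_loss(log_rate, target, log_input=True)
# manual computation of the same quantity (full=False omits the
# Stirling approximation term)
manual = (log_rate.exp() - target * log_rate).mean()
```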
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19316

Differential Revision: D15012957

Pulled By: ezyang

fbshipit-source-id: 0a3f56e8307969c2f9cc321b5357a496c3d1784e
2019-05-10 11:57:49 -07:00
ed25b8a667 Add a FAQ entry to explain "Cannot insert a Tensor that requires grad as a constant"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20181

Differential Revision: D15296816

Pulled By: Krovatkin

fbshipit-source-id: 382e3a0aa982774771e98050cbc65d144cbd959e
2019-05-10 10:55:33 -07:00
ea5c9c9267 Update installing.rst (#20354)
Summary:
Delete useless `cd`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20354

Differential Revision: D15296154

Pulled By: soumith

fbshipit-source-id: 2042b56c91b33e302b0ed9c77f29b9b64079fa98
2019-05-10 10:04:06 -07:00
4ba28deb6e Unify libtorch and libcaffe2 (#17783)
Summary:
This PR is an intermediate step toward the ultimate goal of eliminating "caffe2" in favor of "torch".  This PR moves all of the files that had constituted "libtorch.so" into the "libcaffe2.so" library, and wraps "libcaffe2.so" with a shell library named "libtorch.so".  This means that, for now, `caffe2/CMakeLists.txt` becomes a lot bigger, and `torch/CMakeLists.txt` becomes smaller.

The torch Python bindings (`torch_python.so`) still remain in `torch/CMakeLists.txt`.

The follow-up to this PR will rename references to `caffe2` to `torch`, and flatten the shell into one library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17783

Differential Revision: D15284178

Pulled By: kostmo

fbshipit-source-id: a08387d735ae20652527ced4e69fd75b8ff88b05
2019-05-10 09:50:53 -07:00
872bab22c6 Some essential changes needed before updating the Windows AMI (#20353)
Summary:
1. Add cuda 10.1 build
2. Turn on openmp loop support for VS 2019
3. Remove legacy code about selective builds

Tested through CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20353

Differential Revision: D15294806

Pulled By: ezyang

fbshipit-source-id: 0acf5c3fbbc398fd9ebdf9f97653499d39638432
2019-05-10 09:08:51 -07:00
d68802ba47 Sparse half embeddings on cuda (#19695)
Summary:
```python
import torch
a = torch.nn.Embedding(3, 4, sparse=True).half().cuda()
a(torch.LongTensor([1, 0]).cuda()).sum().backward()
```
gave: `RuntimeError: torch.cuda.sparse.HalfTensor is not enabled`

This PR enables sparse.HalfTensor on cuda. Still won't work for CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19695

Differential Revision: D15281162

Pulled By: nairbv

fbshipit-source-id: 0d83d946a059393bd53d8b8102e2daa9b4c02588
2019-05-10 08:00:55 -07:00
148e90ba2a Give clear error message when attempting to merge struct which can't be merged.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19804

Differential Revision: D15098833

fbshipit-source-id: 2950e247c74e125e033cd9cfbf5631eee5298ea0
2019-05-10 07:01:01 -07:00
c2c0a32155 Remove setting logger level in caffe2.python.checkpoint (#19803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19803

There is no reason to set a specific logging level for this module. Removing it to just use the default logging level.

Differential Revision: D15098834

fbshipit-source-id: 1654c04500c19690ddde03343f2e84b04bb0f1ef
2019-05-10 07:00:58 -07:00
2a875fc126 Fix THD->c10 dependency to gflags.h (#20319)
Summary:
Fixed #20250

Not sure if there's any specific design reason to use `add_dependencies()` and manually add a few include dirs, instead of linking the target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20319

Differential Revision: D15294584

Pulled By: ezyang

fbshipit-source-id: 97f813a6b1829dad49958e0f880b33eb95747607
2019-05-10 06:50:58 -07:00
8726b27333 Fix overlay_vc_env when called by legacy python (#20304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20155.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20304

Differential Revision: D15292369

Pulled By: zdevito

fbshipit-source-id: 7da2e0cb85c98d0fcd4461d39e2a8c57391db60e
2019-05-10 06:44:58 -07:00
c397134d6b Revert D15156384: Dict
Differential Revision:
D15156384

Original commit changeset: b9313ec4dd9a

fbshipit-source-id: 3b44f49ec4eaba692cfb2cfe46e5f98102e337d9
2019-05-10 06:11:25 -07:00
85d56852d3 Revert D15227620: Allow Dict type in c10 operators
Differential Revision:
D15227620

Original commit changeset: c1ea6c12165e

fbshipit-source-id: b2cecfffdd38b7c97e20d0ee81915ea10daf8460
2019-05-10 06:11:22 -07:00
c744468e36 Revert D15227621: Extend testAvailableArgTypes
Differential Revision:
D15227621

Original commit changeset: 83db7536e906

fbshipit-source-id: 8ecc443bd18787de52572fde06df408e1c52c50d
2019-05-10 06:11:18 -07:00
99874f87cb Use registry for BoundShapeInferencer (#20341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20341

Att

Reviewed By: ipiszy

Differential Revision: D15288131

fbshipit-source-id: 956ced99cc5c5b8199f81f7baa844fe8a0505456
2019-05-10 01:16:02 -07:00
a0e5240afc Fix DistributedDataParallelTest.test_accumulate_gradients (#20351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20351

This was broken because of a merge race between #20282 and the stack in #20236.

Cleaned up the test and comments a bit as well.

Differential Revision: D15292786

fbshipit-source-id: a4379ea700cad959d3a6921fc5ddf9384fb8f228
2019-05-09 23:27:18 -07:00
02df1ccd9c Remove const_cast's from subgraph matcher. (#20303)
Summary:
The trick here is that creating a mapping from const values to
const values means that downstream clients that want to mutate
the output of the mapping are stuck.  However, a mapping from
const values to non-const values is just fine and doesn't put
constraints on downstream clients.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20303

Differential Revision: D15284076

fbshipit-source-id: 16206fd910dd5f83218525ca301b1889df0586cb
2019-05-09 18:07:14 -07:00
e47b210075 Adding setup job as prereq to html update jobs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20325

Differential Revision: D15287710

Pulled By: pjh5

fbshipit-source-id: 2bbed3a46c4affb5ae4e6dd4feb1dda59aeb5d04
2019-05-09 16:25:22 -07:00
3afd99680c Remove SourceLocation (respin) (#20333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20333
ghimport-source-id: e64075bb82067224463e9955d10bd13967d1975d

Differential Revision: D15284081

Pulled By: zdevito

fbshipit-source-id: ac26ae48392b9daff08f460529c06af8f4e4722a
2019-05-09 16:17:33 -07:00
558c6c4d8a Make DistributedDataParallel usable with CPU models (#20236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20236

Use the new version of broadcast_coalesced that deals with both CPU
and CUDA models. Add tests that evaluate correctness of
DistributedDataParallel for CPU models.

Closes #17757.

Reviewed By: mrshenli

Differential Revision: D15245428

fbshipit-source-id: d2fa09f68593b3cd1b72efeb13f5af23ebd5c80a
2019-05-09 14:11:17 -07:00
f32c9bd5e9 Refactor core DistributedDataParallel tests (#20235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20235

The tests expected to only run for CUDA models. In a future commit we
need to update this to work for CPU models as well. Therefore, we can
no longer rely on only integers being passed for device identifiers.
With this change we pass both the materialized list of devices to use
(as `torch.Device` objects), as well as an optional list of integers.
The latter is specified to exercise the code in the
DistributedDataParallel constructor that turns a list of integers into
CUDA devices, IFF it is used to wrap a single-device CUDA module.

This commit also groups together the 'str' and non-'str' tests. These
used to test passing the list of devices as integers or as
`torch.Device` instances. These are now executed from the same test.

Reviewed By: mrshenli

Differential Revision: D15245429

fbshipit-source-id: 5797ba9db33d2c26db8e7493c91bb52f694285ac
2019-05-09 14:11:14 -07:00
caa0d0c50a Add c10d::broadcast_coalesced and tests (#20234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20234

The differences with the existing function _dist_broadcast_coalesced
is that this one works for both CPU and CUDA tensors and that it has a
maximum number of in flight operations.

This should be the final change needed to have only a single version
of DistributedDataParallel that both supports CPU and CUDA models, or
even a mix of both.

See #17757 for more information.

Reviewed By: mrshenli

Differential Revision: D15228099

fbshipit-source-id: a2113ba6b09b68cb5328f49f4c1960031eb43c93
2019-05-09 14:11:08 -07:00
c31fccd678 Fix crash issue in conv+sum fusion for MKLDNN on caffe2 (#20139)
Summary:
isConvFusion(...) is only valid for Conv ops; calling it on a non-Conv op causes a crash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20139

Differential Revision: D15280604

Pulled By: yinghai

fbshipit-source-id: eb45be11990b3bf7c5b45f02ebb6018444ab5357
2019-05-09 13:53:46 -07:00
8c3a7bb57f Move librosa and psutil installation from CI script to docker images build script (#20299)
Summary:
`pip install librosa` randomly coredumps, causing CI flakiness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20299

Differential Revision: D15276270

Pulled By: bddppq

fbshipit-source-id: 9105106f41aaacf620751290b016359ef7d665b3
2019-05-09 13:48:29 -07:00
e870b11ae6 Revert D15275731: Remote SourceLocation
Differential Revision:
D15275731

Original commit changeset: f4da178c3137

fbshipit-source-id: 830b79735eb2dadc4795b5aae407826bf20ef121
2019-05-09 13:07:11 -07:00
eca91de5d2 Remote SourceLocation (#20300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20300
ghimport-source-id: 06f606c4db3b70b1d2ed9f6ed4542c3f703c4e17

Differential Revision: D15275731

Pulled By: zdevito

fbshipit-source-id: f4da178c31372c2264feb9f99476b9c9aa66c1f2
2019-05-09 11:48:29 -07:00
726661b152 profiler: improve repr for averaged events (#20281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20281

This is how it looks like now:

```
<FunctionEventAvg key=mm self_cpu_time=11.404s cpu_time=2.895ms
cuda_time=0.000us input_shapes=[[26, 4096], [4096, 1024]]>
```

I previously forgot to update the repr for these when updating it for non-averaged events.
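A minimal sketch of how the averaged events above are produced (the exact columns shown depend on the PyTorch version):

```python
import torch

# profile a single matrix multiply on CPU
with torch.autograd.profiler.profile() as prof:
    torch.mm(torch.randn(32, 64), torch.randn(64, 16))

# key_averages() returns FunctionEventAvg entries like the repr above;
# table() renders them sorted by the requested column
table = prof.key_averages().table(sort_by="cpu_time_total")
```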

Differential Revision: D15262862

fbshipit-source-id: a9e5b32c347b31118f98b4b5bf2bf46c1cc6d0d2
2019-05-09 11:34:52 -07:00
39af9563e2 Re-enable CUDA tests for C++ API (#20238)
Summary:
CUDA tests for C++ API were not running on the CI due to a missing character in https://github.com/pytorch/pytorch/pull/11554. This PR fixes the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20238

Differential Revision: D15279945

Pulled By: yf225

fbshipit-source-id: 1fa7439cd40b6f2fba6792eb790bacf0d67d0054
2019-05-09 11:06:51 -07:00
fe714862dd Extend testAvailableArgTypes (#20185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20185

This test case now also tests that the argument type works correctly in kernels that
 - don't return outputs
 - return multiple outputs

Reviewed By: li-roy

Differential Revision: D15227621

fbshipit-source-id: 83db7536e9065e0f8c5032d6b96e970bbaf718b3
2019-05-09 10:54:16 -07:00
e0166f4670 Allow Dict type in c10 operators (#20184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20184

- Add support for Dict<Key, Value> arguments and returns to c10 operators
- Add support for std::unordered_map<Key, Value> to the legacy API (but not to c10 kernels)

Reviewed By: li-roy

Differential Revision: D15227620

fbshipit-source-id: c1ea6c12165e07b74272cb48c6021bdb5c2d7922
2019-05-09 10:54:12 -07:00
c92129033a Dict (#19976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19976

Implement a Dict type that allows us to abstract away from the concrete implementation used.
The API is similar to std::unordered_map, but behind the scenes we can switch to any map implementation we like. ska::flat_hash_map, google dense map, or any future map implementation with better performance.
Switching such an implementation choice does not have to break backwards compatibility of kernel code using the Dict type.

Reviewed By: li-roy

Differential Revision: D15156384

fbshipit-source-id: b9313ec4dd9acb3b6a0035345b6ba4f2a437d1e5
2019-05-09 10:54:07 -07:00
2019f6cd51 Add unit test to ensure no gradients sync when calling ddp.module(input) (#20282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20282

Add unit test to ensure no gradients sync when calling ddp.module(input), e.g not invoking prepare_for_backward

PyText is depending on DDP for data parallel distributed training. To support accumulate gradients locally before gradients sync, we are calling orig_model.forward instead of ddp_model.forward. Add a unit test to avoid changes break the assumption.

Reviewed By: pietern, mrshenli

Differential Revision: D15263155

fbshipit-source-id: 7734e174f507690fb23ea6c52dffff4a93f9b151
2019-05-09 10:15:19 -07:00
899bddeeb6 fix typo in adaptive methods annotation (#20306)
Summary:
fixes #20215
The confusing behavior was caused by typos in the type annotations :(
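The corrected annotation allows `None` entries in the output size, which mean "keep the input size in that dimension"; a small sketch:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# None keeps the input size in that dimension, so only H is pooled
y = F.adaptive_avg_pool2d(x, (1, None))
# y has shape (1, 3, 1, 8)
```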
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20306

Differential Revision: D15276216

Pulled By: ailzhang

fbshipit-source-id: 1b0c9635a72a05c9b537f80d85b117b5077fbec7
2019-05-09 09:29:37 -07:00
83a80d2b31 Add test/test_namedtensor.py (#20168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20168
ghimport-source-id: 78bd3c4b6bc87c216ce33dba13b61feb87e5fe53

Reviewed By: gchanan

Differential Revision: D15278222

Pulled By: zou3519

fbshipit-source-id: 3bcdb1cb654400350d42464dd9e0d5e0a7116e1e
2019-05-09 09:09:22 -07:00
199fa12dee Add namedtensor build and test to the CI (#20163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20163
ghimport-source-id: 6a821a708a20dbfa84c31a8af067d451c59b3964

Reviewed By: gchanan

Differential Revision: D15278216

Pulled By: zou3519

fbshipit-source-id: 0bc28929f8b3ce317de06f1fc275e586c649785c
2019-05-09 09:09:19 -07:00
e01a5bf28b Add USE_NAMEDTENSOR compilation flag. (#20162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20162
ghimport-source-id: 0efcd67f04aa087e1dd5faeee550daa2f13ef1a5

Reviewed By: gchanan

Differential Revision: D15278211

Pulled By: zou3519

fbshipit-source-id: 6fee981915d83e820fe8b50a8f59da22a428a9bf
2019-05-09 09:09:16 -07:00
f23fb66e6e Fix in file position logic: file descriptor and Python-side handle (#20270)
Summary:
This addresses #18436

The logic replicates the essence of closing file descriptors in numpy:
bf20e30340/numpy/core/include/numpy/npy_3kcompat.h (L278)

This stores the position of the file descriptor before resetting it to the Python handle offset, then resets to the original position before exit. The Python-side handle is then updated to reflect the new position. Also added somewhat more demanding tests to cover this.
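A sketch of the behavior this enables: with the handle position tracked correctly, several checkpoints can be written to and read back from one file handle in sequence. This uses the legacy, non-zip serialization format so the records form a flat byte stream:

```python
import tempfile
import torch

path = tempfile.NamedTemporaryFile(suffix=".pt", delete=False).name
with open(path, "wb") as f:
    torch.save(torch.arange(3), f, _use_new_zipfile_serialization=False)
    torch.save(torch.arange(3, 6), f, _use_new_zipfile_serialization=False)

with open(path, "rb") as f:
    a = torch.load(f)  # leaves the handle right after the first record
    b = torch.load(f)  # so the second load starts where the first ended
```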
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20270

Differential Revision: D15275902

Pulled By: soumith

fbshipit-source-id: 5ca8a52b61c7718d2e69571f72f80b1350b0acdb
2019-05-09 08:20:01 -07:00
c406bf20a0 error instead of crashing on attempt to subclass typed tensors (#20283)
Summary:
https://github.com/pytorch/pytorch/issues/20052

Typed tensors (e.g. `torch.FloatTensor`) can't be subclassed; attempting to subclass them was causing crashes and other errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20283

Differential Revision: D15278138

Pulled By: nairbv

fbshipit-source-id: 8493eac4d34dfb76b054362bf0acec02146cd0e2
2019-05-09 07:10:38 -07:00
1e35ef07e9 Switch off USE_DISTRIBUTED on default for MSVC (#20302)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20250
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20302

Differential Revision: D15277733

Pulled By: ezyang

fbshipit-source-id: a8915939d033a04f962908d19bbad840b6234e27
2019-05-09 06:29:31 -07:00
2aea5b6335 Fixed Softmax doc to specify dimension to prevent warning in 1.1.0. (#20310)
Summary:
See Issue #20301

Specifying dim in docstring example to prevent UserWarning.
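The updated docstring example passes `dim` explicitly; a minimal sketch of why:

```python
import torch

m = torch.nn.Softmax(dim=1)  # explicit dim avoids the implicit-dim UserWarning
x = torch.randn(2, 3)
out = m(x)
# each row sums to 1 because the softmax is taken over dim=1
```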
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20310

Differential Revision: D15277734

Pulled By: ezyang

fbshipit-source-id: 2e8b748dbe743675a5a538ccbe97713aad02e8ac
2019-05-09 06:21:57 -07:00
0087069dce Use torch::get/set_num_threads without additional includes beyond torch/torch.h (#20176)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20130.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20176

Differential Revision: D15275036

Pulled By: yf225

fbshipit-source-id: 0f04e1fbfed18c07030b20e92e957ef5f2b5707d
2019-05-09 06:08:27 -07:00
916ef76817 Sort User Defined Classes (#19706)
Summary:
Schema matching for sort is a little tricky because you have to check whether the class defines the `__lt__` method correctly, and you have to propagate whatever effects happen in `__lt__` to the sorting op as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19706

Differential Revision: D15244366

Pulled By: eellison

fbshipit-source-id: 73b3e36462c6cc40f9d8cb235b44499a67d3149e
2019-05-09 00:07:51 -07:00
1102f4c56e Bump op_version_set (#19812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19812
ghimport-source-id: 7cdb24c6a7501c6ec5f0eae07325512746f5abb9

Differential Revision: D15102803

Pulled By: zdevito

fbshipit-source-id: bedf0bd6e1170fa294c65c87df75b82d8694f89c
2019-05-08 23:21:22 -07:00
3d4d7b9082 Refactor ChunkDataReader API + fix missing headers (#19485)
Summary:
This PR restricts the BatchType template argument of ChunkDataReader to STL
vectors only. Internally, ChunkDataReader was assuming BatchType was a
vector, but the user could pass any type as the template argument,
leading to compilation errors when building C++ extensions.

In addition to the proposed API change, this PR adds missing include
headers to chunk.h. The current implementation works, but if users try
to create C++ extensions that implement new ChunkDataReaders to be used
along with the existing ChunkDataset, the build will fail due to
the missing headers.

In terms of functionality, nothing has changed. This PR simply makes the
implementation slightly more robust for future extensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19485

Differential Revision: D15261725

Pulled By: soumith

fbshipit-source-id: 38c9465d665392ae6a2d12c5a520a4f501e1a6ca
2019-05-08 22:20:19 -07:00
bed1d7d3ff Eliminate some const_cast's.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20279

Differential Revision: D15261884

fbshipit-source-id: 8daf76a7031ca5a648995218fd14a9aa143612dc
2019-05-08 21:51:24 -07:00
d6815e1e27 Only record grad_fn in C++ Scatter and Gather when required so (#20286)
Summary:
C++ `Scatter` and `Gather` always set autograd history for input data tensors, regardless of whether they require grad. This hits an assertion failure in `set_history(Tensor, shared_ptr<Function> grad_fn)`,
where `grad_fn` cannot be nullptr. After this PR, C++ `Scatter` and `Gather` only record `grad_fn` when required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20286

Differential Revision: D15266610

Pulled By: mrshenli

fbshipit-source-id: 641df0ea36e7c922b5820c8dc3f83e2a050412b5
2019-05-08 21:04:21 -07:00
2179d5b32b Dynamic quantized full LSTM module (#20249)
Summary:
Previously we only had a Python wrapper for `torch.quantized_lstm_cell`. We had the op `torch.quantized_lstm`, but it didn't have a wrapper. This PR adds that wrapper.

Reviewed By: driazati

Differential Revision: D15250023

Pulled By: jamesr66a

fbshipit-source-id: f05ad784d903e0ef3a62633c8bf80bad79de48ae
2019-05-08 19:46:59 -07:00
7bc8562a9a Enable ONNX constant folding in test_pytorch_onnx_caffe2.py tests. (#20290)
Summary:
This is a step towards enabling the ONNX constant folding pass by default in the PT->ONNX export. In this change we have enabled test points in `test/onnx/test_pytorch_onnx_caffe2.py`  to run with constant folding pass enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20290

Reviewed By: zrphercule

Differential Revision: D15271674

Pulled By: houseroad

fbshipit-source-id: 9e59ab46ae74b4ad8dea1a2200ecc1f3eb8aad75
2019-05-08 18:47:03 -07:00
3ee97183b0 ScaleBlobs Operator (#19660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19660

Implementation of an aggregated Scale operator.
The operator takes a list of tensors as input and scales all of them by the float argument value.
The tensor sizes can differ, so bookkeeping of the sizes and pointers to the tensors is
necessary for the GPU version of the kernel.

Reviewed By: BIT-silence

Differential Revision: D14984233

fbshipit-source-id: 37cc97159a4f2c38cd6fff4f5710ab7d3a773611
2019-05-08 17:57:33 -07:00
4d676d53a6 split canonicalize_ops, make a decompose pass (#19988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19988
ghimport-source-id: 1dbf39e07099fa24ef9a6c0221312bf01a8011b7

Differential Revision: D15190355

Pulled By: wanchaol

fbshipit-source-id: 83f2b6557efd758810ccb4a4229d71fdebfd06e0
2019-05-08 17:21:59 -07:00
33b4afe3bb dont make alias for none value (#20112)
Summary:
Don't make an alias value for a value that is known to be None. This was preventing constant propagation from running the `out is None` check in nn.functional.normalize, and thus preventing the if statement from being inlined.
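For context, `nn.functional.normalize` takes an optional `out` argument whose `out is None` check constant propagation can now fold away; a small usage sketch:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)
y = F.normalize(x, p=2, dim=1)  # no out= passed, so out is None internally
# each row of y now has (approximately) unit L2 norm
```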
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20112

Differential Revision: D15267328

Pulled By: eellison

fbshipit-source-id: 5b878b0dc50944c2e7a2f583ea483dad9d6bbec3
2019-05-08 17:10:19 -07:00
8ebb86dd3a Support torch.save for saving values during execution (#18154)
Summary:
This PR makes `torch.save` call out to the pickler, which saves a tensor in the same format that `torch.save()` does. The file looks like `| pickle archive 1 (includes sizes, strides, requires_grad, etc...) | pickle archive 2 (list of tensor keys) | tensor binary data |` and can be read back in with `torch.load(my_file, pickle_module=torch.jit._pickle)`.

Fixes #18003

Unpickling in the JIT for things such as model parallelism will be a follow up PR
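For reference, the eager-mode round trip whose on-disk layout the JIT pickler now mirrors; this sketch only shows the user-facing `torch.save`/`torch.load` behavior, not the pickler internals:

```python
import io
import torch

buf = io.BytesIO()
torch.save({"w": torch.ones(2, 2), "step": 3}, buf)

buf.seek(0)
state = torch.load(buf)
# the dict comes back with tensors and metadata intact
```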

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18154

Pulled By: driazati

Differential Revision: D15015160

fbshipit-source-id: ef76a44b8c243f4794cd7e245ec8305e965bc59f
2019-05-08 16:52:53 -07:00
27c82c03c5 Unify code path for native mkldnn conv and reorder on-the-fly mkldnn conv (#19963)
Summary:
Also, the current MKL-DNN primitive code recreates the computation every time, which causes tiny convolutions to spend a significant portion of their time on the repeated codegen. ideep has implemented an LRU cache to save the computation, so this change will help improve performance for tiny convolutions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19963

Differential Revision: D15156527

Pulled By: bddppq

fbshipit-source-id: 6a8fbd10a213ec22cdeaff1a2bdb0d09905d1fcd
2019-05-08 15:30:05 -07:00
b3bce01e26 Have add_video use NamedTemporaryFile directly (#20223)
Summary:
address comment in #16196
https://github.com/pytorch/pytorch/pull/16196/files#r278676986

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20223

Reviewed By: natalialunova

Differential Revision: D15261528

Pulled By: orionr

fbshipit-source-id: 1aebcc6cb1c9313d890c5b506973855ebc63fb3b
2019-05-08 15:00:44 -07:00
35de90e324 Canonicalize order of If and Loop outputs (#20015)
Summary:
Canonicalize the ordering of outputs of if and loop nodes based on their first usage. Previously we were able to canonicalize output order by sorting on variable name, but this breaks down with outputs added in an early return pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20015

Differential Revision: D15266066

Pulled By: eellison

fbshipit-source-id: ba5340c068a68b1ffc73f056db194b92d3274dc4
2019-05-08 14:52:07 -07:00
1ab33fce9a Disable worker_kill & holder_iter_reference combination in test_proper_exit (#20172)
Summary:
cc nairbv
All failures I have seen are of this combination. So let's just disable it for all cases. After #20063 I find it failing for py3 once.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20172

Differential Revision: D15266527

Pulled By: nairbv

fbshipit-source-id: afb9389dfc54a0878d52975ffa37a0fd2aa3a735
2019-05-08 14:39:47 -07:00
7edf9a25e8 Clarify API and add examples for all methods (#20008)
Summary:
As part of supporting writing data in a TensorBoard-readable format, we show more examples of how to use each function in addition to the API docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20008

Reviewed By: natalialunova

Differential Revision: D15261502

Pulled By: orionr

fbshipit-source-id: 16611695a27e74bfcdf311e7cad40196e0947038
2019-05-08 14:06:10 -07:00
4a086f700f Quantized FCRelu operator (#19750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19750

Add the FCRelu operator.

Differential Revision: D15079097

fbshipit-source-id: b8e8a591158ad2355bf7acd14476993bbc9beae6
2019-05-08 14:00:40 -07:00
e8cdfb5d23 Use QTensor with quantized FC operator (#19541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19541

For the quantized FC operator, replace the tuple (Tensor, scale, zero_point) with QTensor.

Differential Revision: D14900407

fbshipit-source-id: 164df38f3564e0a68af21b9fedaba98a44ca1453
2019-05-08 14:00:37 -07:00
8defcbfcf4 Enable caffe2 softmax tests with ROCm 2.4 (#20280)
Summary:
cc xw285cornell petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20280

Reviewed By: xw285cornell

Differential Revision: D15262695

Pulled By: bddppq

fbshipit-source-id: d72490ff599cdab0331230bc9b12075085386319
2019-05-08 13:29:11 -07:00
1deb8dde58 Quantized FC operator (#19497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19497

Implements a basic quantized FC (uint8 * int8 -> uint8) with FBGEMM APIs.

Related document:
https://fb.quip.com/rP5tAx56ApMM
https://fb.quip.com/MU7aAbzGDesu

Work Item List:
1. [DONE] currently we use prepack routines inside Quantized FC operator. Will separate it as a standalone operator soon.
2. [DONE] rebase to D14817809 and D14994781 (cpp custom types).
3. [DONE] correctness unit test.

4. [To Do] rebase to QTensor. Similar to D14565413, this will be implemented in the next Diff.

Differential Revision: D14761865

fbshipit-source-id: 031a39915fecd947afb4dd2719112b4ddc1082d3
2019-05-08 13:17:51 -07:00
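The affine quantization arithmetic underlying a uint8 * int8 -> uint8 FC like the one above can be sketched in plain Python (scales, zero points, and function names here are illustrative, not the FBGEMM API):

```python
# Affine quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax),
# and dequantization is the inverse map (q - zero_point) * scale.
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

x = 1.5
q = quantize(x, scale=0.1, zero_point=10)
assert q == 25
assert abs(dequantize(q, 0.1, 10) - x) < 1e-9
```

In the real operator the matrix multiply is carried out on the quantized integers, with the output requantized back to uint8 using the output scale and zero point.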
7b733e4fc1 Rebase conflict fix for isFusableDevice (#20251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20251
ghimport-source-id: 0c8c1847a7979fcd77e4f6618730b170b6b8ce25

Differential Revision: D15262850

Pulled By: bwasti

fbshipit-source-id: 17ecc340a310ddbcce141cfa3ee0efa9660194d2
2019-05-08 12:14:12 -07:00
c931d7e9d2 SubgraphRewriter: Add a support for arbitrary replacement graphs in subgraph rewriter. (#20084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20084
ghimport-source-id: 91b3b0b66da00c6592a2d57c8f2a88a73c019d1a

Differential Revision: D15190191

Pulled By: ZolotukhinM

fbshipit-source-id: d57ba6b6790ea2fd277b2feb3f4a58895ed15486
2019-05-08 11:50:46 -07:00
b3324d0fe3 SubgraphRewriter: Expose runOnGraph and use it in tests. (#20083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20083
ghimport-source-id: e4d425775c2a2fb5ed334727e902a91f744b697c

Differential Revision: D15190192

Pulled By: ZolotukhinM

fbshipit-source-id: 5fbcd61fa631d8f22b5016754f8d1a46eefb19c5
2019-05-08 11:50:43 -07:00
8a6072c3bd SubgraphRewriter: Rename pattern fusion to subgraph rewrite. (#20082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20082
ghimport-source-id: f0594f4ad918288fb3158b4ecfa8010cf09dd0c2

Differential Revision: D15190193

Pulled By: ZolotukhinM

fbshipit-source-id: 81b026398c94f2fbf7487cafbb86b7364a78d827
2019-05-08 11:22:29 -07:00
1a85e57334 flake fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20278

Differential Revision: D15261153

Pulled By: eellison

fbshipit-source-id: 2b210d50bc49c7a91605f1bf957890445077d21f
2019-05-08 10:51:10 -07:00
2a104f7383 Port ATen/native to ATen/Parallel (#20043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20043
ghimport-source-id: 8003ef8ca335d4c4717a886a3a75fd78ec53ade5

Differential Revision: D15248505

Pulled By: ilia-cher

fbshipit-source-id: 7be500ed8bfb23cc36f1dd7108e344319e3e5332
2019-05-08 10:33:43 -07:00
a7db3a7591 Error out if git log fails in setup_ci_environment (#20231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20231
ghimport-source-id: f58c44a52c719bf3c5cdcaa1f1e9624466423ace

Reviewed By: ezyang

Differential Revision: D15244497

Pulled By: zou3519

fbshipit-source-id: 87ca362aaa6a9040d32080b208272dacf7e45f63
2019-05-08 10:05:42 -07:00
0626ea4300 Run 'checkout' before 'setup ci environment' on pytorch linux tests (#20213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20213
ghimport-source-id: ea395ee1151207f37e030ed0126564ff7a73bac8

Reviewed By: ezyang

Differential Revision: D15244428

Pulled By: zou3519

fbshipit-source-id: 45ba2a80a2a9d056f806b5960bfe20f3c60f1e33
2019-05-08 10:05:39 -07:00
cd72be20e0 Update ROCm 2.4 (#20253)
Summary:
xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20253

Reviewed By: ezyang

Differential Revision: D15256826

Pulled By: bddppq

fbshipit-source-id: 405c21fc727d8145c4d3ca4fe8d84804569ebe53
2019-05-08 09:35:40 -07:00
ede38bd743 Port THNN to ATen/Parallel (#20032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20032
ghimport-source-id: f8e1a4c7726585f2d1c894b64e3687ede6177282

Differential Revision: D15247960

Pulled By: ilia-cher

fbshipit-source-id: de784c0002cbe074c87f912a5824e8a75683dbf8
2019-05-08 09:31:39 -07:00
fa6a00f313 Fix memory leak in torch._dirichlet_grad() (#20244)
Summary:
Fixes https://github.com/pyro-ppl/pyro/issues/1853

This fixes a memory leak in `torch._dirichlet_grad()`. This function is used for reparametrized gradients for the `Dirichlet` and `Beta` distributions.

- [x] Could a reviewer please confirm that `freeCopyTo()` is being used correctly and doesn't need an additional `decref()`? The author is unfamiliar with PyTorch C++ memory utilities. Help appreciated.

- ran locally and confirmed leak is fixed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20244

Differential Revision: D15259008

Pulled By: ezyang

fbshipit-source-id: 222ec7d80ddd97bcdd7d54549f3e756575e8402e
2019-05-08 07:45:39 -07:00
10715ffc30 Mention issue number in the JIT workaround comments (#20222)
Summary:
This just updates the `JIT` comments with the issue number #20215.  Hopefully this will stop the proliferation of the workaround. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20222

Differential Revision: D15259007

Pulled By: ezyang

fbshipit-source-id: 5060a351aa618c6dae49d0b7a6ac9b0f57f2490a
2019-05-08 07:34:30 -07:00
a5ff09782e Fix missing files for upload jobs (#20265)
Summary:
The earlier fix that extracted scripts into files missed an attach_workspace step, which was used to make the built binaries available to the nightly build upload jobs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20265

Differential Revision: D15259080

Pulled By: soumith

fbshipit-source-id: bf835c2cd76976b4563798ee348f7db83c7a79c1
2019-05-08 07:02:41 -07:00
2db9066a41 Fix formatting for note in eig. (#19743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19743
ghimport-source-id: fcb5f1aa3ee3d71e06ac1b8fbe6d6859a3547d63

Reviewed By: zou3519

Differential Revision: D15258642

Pulled By: ezyang

fbshipit-source-id: 7091fc3e7c829542a65ae3a490912d8d13aadfb3
2019-05-08 06:37:21 -07:00
d6f62b70f3 Fix cuda and cudnn libraries search process on Windows (#20205)
Summary:
Fixes #20202
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20205

Differential Revision: D15258626

Pulled By: ezyang

fbshipit-source-id: 855ad457a8bb7a46accc7cf6ec5cb09e98f6e770
2019-05-08 06:08:47 -07:00
d24c0aa82f Remove explicit checks for parallelism from TH (#20002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20002
ghimport-source-id: b2037d3226d52ac672578700f77aca215eb309a0

Differential Revision: D15232028

Pulled By: ilia-cher

fbshipit-source-id: 2c818ba819c95e1d709e0cea8c4f7816fddbcfc1
2019-05-08 02:33:31 -07:00
0ebe252c9c Port TH library to ATen/Parallel (#19105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19105
ghimport-source-id: db3e26f89d098e86215c48e464ace615193f5772

Differential Revision: D14947557

Pulled By: ilia-cher

fbshipit-source-id: 7e987e74c034646ba818f02e7bd711aba2ee3364
2019-05-08 01:07:17 -07:00
26dd65eaf8 Namespace isolation for classes (#19903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19903
ghimport-source-id: deadf59f469ad620d0ee10b089dfc9bb92171710

Differential Revision: D15118978

Pulled By: suo

fbshipit-source-id: f2b487fd65520d1b7f45cb74145634d334ef1614
2019-05-07 22:48:31 -07:00
e41aa0ed2f fix parsing bugs (#20246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20246
ghimport-source-id: 611ea44dc6c1d31cee99fe3e89628be072a4a381

Differential Revision: D15247081

Pulled By: zdevito

fbshipit-source-id: 4b836ce6b665c26bb6eb5347206c55d12807fae6
2019-05-07 19:35:51 -07:00
21a3895c7d Extract repeated scripts into files (#19674)
Summary:
The current pytorch config.yml is causing some backend performance
problems on CircleCI, due to the size of the file when all of the YAML
anchors have been expanded. You can view the "processed" config as our
internal system deal with it by running `circleci config process`.

    circleci config process .circleci/config.yml | wc -c

    Before: 2833769 bytes
    After:   558252 bytes (~80% less)

Also, create a new job, `setup`, that has two functions:
- Assert that config.yml is up to date
- Put the .circleci/scripts directory into a workspace, so that
downstream jobs can easily access it.

The `setup` job becomes the parent of all jobs in the workflow. This
allows us to fail fast if config is invalid. It might be a good place to
add other, quick, lint checks to help fail the build faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19674

Differential Revision: D15252864

Pulled By: pjh5

fbshipit-source-id: 0778c7b8f95e7f3f33ac92fbb8862377fc9fb0ac
2019-05-07 19:04:00 -07:00
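The ~80% figure quoted above follows directly from the two byte counts in the commit message:

```python
# Size reduction of the processed CircleCI config, from the numbers above.
before, after = 2833769, 558252
reduction = 1 - after / before
assert 0.80 < reduction < 0.81
print(f"{reduction:.1%}")  # roughly 80.3%
```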
42e9a619b3 add decay parameter in ref_adagrad (#15329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15329

Add decay parameter to match with C++ Adagrad implementation.

Reviewed By: chocjy

Differential Revision: D13300991

fbshipit-source-id: db734df0202d8f5fd156f2742207d0b5a3aa7348
2019-05-07 18:58:58 -07:00
f0b5ad8919 add str builtin support (#20188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20188
ghimport-source-id: 390b889f67155eb6168a96d05307ec4e86f5c631

Differential Revision: D15228812

Pulled By: zdevito

fbshipit-source-id: 789efa2e0b2f9518ed97ac877b59637e5b0ebcf4
2019-05-07 18:51:14 -07:00
9fcf585475 Autograd profile recording in c10 custom ops (#20175)
Summary:
This ensures that custom operators registered through c10::RegisterOperators are recorded in autograd profile traces.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20175

Differential Revision: D15221311

Pulled By: jamesr66a

fbshipit-source-id: 9452b24272c2399c20a49af85b62d34cabe6e27a
2019-05-07 18:41:36 -07:00
fb9d9fbd4e smoke test for add_graph (#20007)
Summary:
Run tests with common models from torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20007

Differential Revision: D15251754

Pulled By: orionr

fbshipit-source-id: 9dc09bd407b3ccaaa310d2f4a8d53d5a7d12469d
2019-05-07 18:29:25 -07:00
3ac4d92824 tweak scripts/build_android.sh for ABI and header install (#20152)
Summary:
We now can build libtorch for Android.
This patch aims to provide two improvements to the build
- Make the architecture overridable by providing an environment variable `ANDROID_ABI`.
- Use `--target install` when calling cmake to actually get the header files nicely in one place.

I ran the script without options to see if the caffe2 builds are affected (in particular by the install), but they seem to run OK and probably only produce a few files in build_android/install.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20152

Differential Revision: D15249020

Pulled By: pjh5

fbshipit-source-id: bc89f1dcadce36f63dc93f9249cba90a7fc9e93d
2019-05-07 17:44:07 -07:00
faf2c3ac26 Standard gamma's export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20126

Reviewed By: zrphercule

Differential Revision: D15243663

Pulled By: houseroad

fbshipit-source-id: 7f5f63f37462a844b03b98783c10e6c21f608a52
2019-05-07 17:37:25 -07:00
48e649b803 Automatic update of fbcode/onnx to 5bde6371620b76302864bce90f521d72eda95d0e (#20232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20232

Previous import was 7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef

Included changes:
- **[5bde6371](https://github.com/onnx/onnx/commit/5bde6371)**: Add shape inference for legacy auto_pad modes (#1988) <stevenlix>
- **[6c9b3407](https://github.com/onnx/onnx/commit/6c9b3407)**: Move Quantization working group to completed state (#1980) <Prasanth Pulavarthi>
- **[8eba124e](https://github.com/onnx/onnx/commit/8eba124e)**: Define the IR acronym (#1985) <Senja Filipi>

Reviewed By: bddppq

Differential Revision: D15244357

fbshipit-source-id: e1a4eaee5de44f9eb944a290afac7a678b50b033
2019-05-07 17:32:48 -07:00
6606ac5d41 Support operator overloading for UDT (#20033)
Summary:
Support operator overloading for user-defined types. This includes desugaring `a + b` and Python builtin functions such as `len(x)` that call into a magic method when one is defined.

See https://rszalski.github.io/magicmethods/ for list of magic methods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20033

Reviewed By: driazati

Differential Revision: D15246573

Pulled By: eellison

fbshipit-source-id: 03d45dd524ea2a3b40db36843d6067bede27b30d
2019-05-07 17:28:06 -07:00
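The desugaring described above mirrors how plain Python already dispatches operators and builtins to magic methods; a minimal illustration (the class is made up for the example):

```python
# `a + b` desugars to a.__add__(b), and the builtin len(x) dispatches
# to x.__len__() — the same behavior the commit brings to TorchScript UDTs.
class Vec:
    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        return Vec([x + y for x, y in zip(self.data, other.data)])

    def __len__(self):
        return len(self.data)

v = Vec([1, 2]) + Vec([3, 4])
assert v.data == [4, 6]
assert len(v) == 2
```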
9c62280ea8 Remove brew libomp from binary mac machines (#20228)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/20030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20228

Differential Revision: D15246258

Pulled By: pjh5

fbshipit-source-id: f57033af74b678566be02cdf5700b5ae6e154d4a
2019-05-07 17:09:40 -07:00
eecf52b444 Fix in benchmark_test_generator (#20237)
Summary:
Add missing import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20237

Differential Revision: D15245957

Pulled By: ilia-cher

fbshipit-source-id: 0f71aa08eb9ecac32002a1644838d06ab9faa37c
2019-05-07 17:03:25 -07:00
8a375189ea Remove flake8 E303 (too many blank lines) (#20225)
Summary:
Similar to the too-few-blank-lines rule, I feel this is not important enough to warrant breaking the signal for all linters when it's violated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20225

Differential Revision: D15243480

Pulled By: suo

fbshipit-source-id: 37cdc18daf09e07081e42b69c72d331d81660217
2019-05-07 16:29:15 -07:00
ba3e6de4c2 Fix test_namedtuple_return (#20212)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/20198

A randomly created input matrix might be rank-deficient or not positive semidefinite.

Not tested yet, will look at CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20212

Differential Revision: D15241331

Pulled By: zou3519

fbshipit-source-id: b71cdc5d5b622b5423a43b86d75cfaf3d9484100
2019-05-07 15:58:25 -07:00
785583a435 Use ignore=dirty in submodules. (#20135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20135
ghimport-source-id: 73a07e07ed9485f80374262de2fb9b87e687a47a

Differential Revision: D15214187

Pulled By: zdevito

fbshipit-source-id: 2f2272f0ee7dad3935e6c31897a0b635b4e66133
2019-05-07 15:41:19 -07:00
864cfbc216 PyTorch Profiler Shape aggregation support (#20035)
Summary:
This is useful when you would like to understand the performance
bottlenecks of your model. One can use the shape analysis to fit the
model to a roofline model of their hardware.

Please note that this feature can potentially skew profiling results,
and timings for non-nested events will become wrong. One should only
use timings for the bottom-most events when shape analysis is used.
When shapes are not needed, profiling should not be affected, since in
that case we don't collect shapes; that is the default behavior and
this diff doesn't change it.

One of the next steps could be, for example, choosing the best
candidates for quantization. In the scope of this diff I am just adding
optional shape collection to the Event class, plus minor functionality
on the Python side for grouping by shapes.

In the output tables shapes are truncated, but for grouping the full
shape string is used as the key.

Here is an example output:

test_profiler_shapes (test_autograd.TestAutograd) ...
```
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
unsigned short      2.30%            305.031us        2.30%            305.031us        305.031us        NaN              0.000us          0.000us          1                [[30, 20]]
addmm               69.40%           9.199ms          69.40%           9.199ms          9.199ms          NaN              0.000us          0.000us          1                [[30], [128, 20], [20, 30], [], []]
unsigned short      0.98%            129.326us        0.98%            129.326us        129.326us        NaN              0.000us          0.000us          1                [[40, 30]]
addmm               27.32%           3.621ms          27.32%           3.621ms          3.621ms          NaN              0.000us          0.000us          1                [[40], [128, 30], [30, 40], [], []]
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 13.255ms
CUDA time total: 0.000us

------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
unsigned short      2.30%            305.031us        2.30%            305.031us        305.031us        NaN              0.000us          0.000us          1                [[30, 20]]
addmm               69.40%           9.199ms          69.40%           9.199ms          9.199ms          NaN              0.000us          0.000us          1                [[30], [128, 20], [20, 30], [], []]
unsigned short      0.98%            129.326us        0.98%            129.326us        129.326us        NaN              0.000us          0.000us          1                [[40, 30]]
addmm               27.32%           3.621ms          27.32%           3.621ms          3.621ms          NaN              0.000us          0.000us          1                [[40], [128, 30], [30, 40], [], []]
------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 13.255ms
CUDA time total: 0.000us
```

Also added this for older aggregation test:

```
test_profiler_aggregation_lstm (test_autograd.TestAutograd) ...
======================================================================================================================================================================================================
TEST
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                     Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
lstm                     0.69%            4.606ms          5.30%            35.507ms         35.507ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.67%            4.521ms          5.27%            35.340ms         35.340ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.66%            4.399ms          5.02%            33.638ms         33.638ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.65%            4.354ms          4.92%            32.958ms         32.958ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.65%            4.351ms          4.96%            33.241ms         33.241ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.65%            4.323ms          5.10%            34.163ms         34.163ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.64%            4.304ms          4.92%            32.938ms         32.938ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.64%            4.300ms          5.10%            34.172ms         34.172ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.64%            4.292ms          5.05%            33.828ms         33.828ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
lstm                     0.64%            4.263ms          4.98%            33.357ms         33.357ms         NaN              0.000us          0.000us          1                [[5, 3, 10]]
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 670.120ms
CUDA time total: 0.000us

-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                     Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls  Input Shapes
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
sigmoid                  15.32%           102.647ms        15.32%           102.647ms        171.078us        NaN              0.000us          0.000us          600              [[3, 20]]
mul                      15.20%           101.854ms        15.20%           101.854ms        169.757us        NaN              0.000us          0.000us          600              [[3, 20], [3, 20]]
lstm                     12.74%           85.355ms         100.00%          670.120ms        33.506ms         NaN              0.000us          0.000us          20               [[5, 3, 10]]
addmm                    11.16%           74.808ms         11.16%           74.808ms         249.361us        NaN              0.000us          0.000us          300              [[80], [3, 20], [20, 80], [], []]
tanh                     9.89%            66.247ms         9.89%            66.247ms         165.617us        NaN              0.000us          0.000us          400              [[3, 20]]
split                    6.42%            43.019ms         6.42%            43.019ms         215.095us        NaN              0.000us          0.000us          200              [[3, 80]]
add                      5.67%            38.020ms         5.67%            38.020ms         190.101us        NaN              0.000us          0.000us          200              [[3, 80], [3, 80], []]
add                      4.81%            32.225ms         4.81%            32.225ms         161.124us        NaN              0.000us          0.000us          200              [[3, 20], [3, 20], []]
addmm                    3.79%            25.380ms         3.79%            25.380ms         253.796us        NaN              0.000us          0.000us          100              [[80], [3, 10], [10, 80], [], []]
unsigned short           3.72%            24.925ms         3.72%            24.925ms         83.083us         NaN              0.000us          0.000us          300              [[80, 20]]
-----------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 670.120ms
CUDA time total: 0.000us

Total time based on python measurements:  691.366ms
CPU time measurement python side overhead: 3.17%
ok
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20035

Differential Revision: D15174987

Pulled By: salexspb

fbshipit-source-id: 9600c5d1d1a4c2cba08b320fed9da155d8284ab9
2019-05-07 14:47:01 -07:00
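The grouping step described in the commit message — aggregating events keyed by the full input-shape string — can be sketched in a few lines (the event tuples below are illustrative, loosely based on the table output above, not the profiler's internal representation):

```python
# Group profiler events by (name, full shape string) and sum self times,
# mirroring the "Input Shapes" aggregation shown in the output tables.
from collections import defaultdict

events = [
    ("addmm", "[[30], [128, 20], [20, 30]]", 9.199),
    ("addmm", "[[40], [128, 30], [30, 40]]", 3.621),
    ("addmm", "[[30], [128, 20], [20, 30]]", 9.0),
]

totals = defaultdict(float)
for name, shapes, self_time in events:
    totals[(name, shapes)] += self_time

assert abs(totals[("addmm", "[[30], [128, 20], [20, 30]]")] - 18.199) < 1e-9
```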
831bd1c27d support onnx export rnn with batch_first=True (#19766)
Summary:
Also fixed the test_pytorch_onnx_caffe2.py RNN tests, which were not really testing batch_first=True.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19766

Reviewed By: zrphercule

Differential Revision: D15220950

Pulled By: houseroad

fbshipit-source-id: 96833af7c569f1f939174ba672704f7af87d69f8
2019-05-07 14:19:25 -07:00
e58817fed9 Make graph->param_node()->next() the first node (#19788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19788
ghimport-source-id: fec4b7ea6c4cdb6bf3624262ea4e37f2641d4a6f

Differential Revision: D15094260

Pulled By: zdevito

fbshipit-source-id: b415f029afe4163e9d0bd97a4e0c56c9e625c765
2019-05-07 14:03:02 -07:00
bc5398451e Enable ROCm multi-gpu with Gloo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18640

Differential Revision: D15185822

Pulled By: bddppq

fbshipit-source-id: 1b49ab3fb0f251cfc7ef3ddd62033ae0065a4ec3
2019-05-07 09:55:47 -07:00
cf55670bdd Add proper __repr__ to LogSoftMax (#20018)
Summary:
Fixes #19961
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20018

Differential Revision: D15218171

Pulled By: ezyang

fbshipit-source-id: 36bdf44d3b7a6df6a6ec5275a74741d4b057d3b4
2019-05-07 08:38:09 -07:00
b8256280ce Working on component-wise transformations that mimic torch.cat and torch.stack (#11868)
Summary:
As discussed in #11755 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11868

Differential Revision: D10032248

Pulled By: ezyang

fbshipit-source-id: d3a81c19f65a3e716f7f1cfc0a42b86c32fc484c
2019-05-07 07:49:29 -07:00
f7a7868820 add process_group in convert_sync_batchnorm (#19240)
Summary:
At line 508, convert_sync_batchnorm is called recursively to convert BatchNorm layers to SyncBatchNorm, so the process_group should also be passed into the recursive call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19240

Differential Revision: D15240318

Pulled By: ezyang

fbshipit-source-id: 0fc9e856392824814991e5e9e8f9513d57f311af
2019-05-07 06:51:18 -07:00
2356fac9a5 Add DirichletFullyConnectedActor to Soft Actor-Critic
Summary: This can be used for problems where the action vector must sum to 1

Reviewed By: kittipatv

Differential Revision: D15206348

fbshipit-source-id: 665fbed893d8c52d451a12d3bb2e73b2638b7963
2019-05-06 23:52:35 -07:00
4ca325df87 Add Custom graph fusion (#18588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18588
ghimport-source-id: f40df177af8b87c73f04bf337f478a62133284cf

Differential Revision: D14901297

Pulled By: bwasti

fbshipit-source-id: 1b6371a5175b3d63dad542b7cc22cb82e8c6cfd0
2019-05-06 23:15:16 -07:00
19e6886576 Intra-op parallel microbenchmarks for PT (#19997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19997
ghimport-source-id: 420d4a68a1ef879beee2734adba8abb575e0b0ab

Differential Revision: D15231375

Pulled By: ilia-cher

fbshipit-source-id: ce7248ea2ebb54d25c9d831c6e3f23f3534557dd
2019-05-06 20:21:45 -07:00
481b6d0268 Allow a non-OpenMP based build (#19749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19749
ghimport-source-id: a6636c0acddbdc5fd5b0dcb20b9f80cbdb9159b9

Differential Revision: D15141993

Pulled By: ilia-cher

fbshipit-source-id: 96085608398b2a4c97c68b2948f5184d07f9ad3d
2019-05-06 19:34:48 -07:00
8c97f0b19e Initialize Caffe2 only when running Caffe2 benchmarks (#19980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19980
ghimport-source-id: ca31ca25b88a1c6219e4a32483f70738a8fdbf88

Differential Revision: D15229797

Pulled By: ilia-cher

fbshipit-source-id: 0b23dbdba0c0f60932a75d8b1900c54285f5a8e4
2019-05-06 19:17:23 -07:00
0c7e98b765 Support for non-contiguous tensors and arbitrary dtypes in PT benchmarks (#19993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19993
ghimport-source-id: 4cf51b61bb83b72883148ab0faa0c75c3cef7635

Differential Revision: D15230363

Pulled By: ilia-cher

fbshipit-source-id: a3ab591d6fd24e874958401e63eaec56bda19a5c
2019-05-06 19:12:09 -07:00
839343a482 Add USE_CUDA macro in THD DataChannelGloo for non-GPU use case (#20186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20186

We're missing two USE_CUDA macro for GPU-related code in THD's DataChannelGloo.
Also adding GlooCache back to compilation.

Differential Revision: D15227502

fbshipit-source-id: f260e1cb294d662ba0c170931913b64287d62344
2019-05-06 19:06:58 -07:00
8ca10d35e5 Add torch.nn.quantized.functional namespace (#20042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20042

Exposing torch.ops.quantized as torch.nn.quantized.functional

Differential Revision: D15178099

fbshipit-source-id: 8d65134bd727296f2750bbd2b54df0b99fc84b33
2019-05-06 18:49:58 -07:00
838ada3a62 Update logic for folding onnx::Constant nodes. (#20109)
Summary:
Currently, the constant folding pass during ONNX conversion removes all onnx::Constant nodes that are parents of folded nodes. In situations where a parent onnx::Constant node has other subscribers downstream, this could be a problem. This change updates the removal logic to remove only those onnx::Constant nodes that do not have other subscribers downstream.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20109

Reviewed By: zrphercule

Differential Revision: D15220392

Pulled By: houseroad

fbshipit-source-id: 150788654ea1c84262becaffd6de152114bf76c0
2019-05-06 17:51:58 -07:00
877b7c1b8d Fix NameError with PYTORCH_JIT=0 (#20120)
Summary:
Right now using `PYTORCH_JIT=0` gives this error:

`NameError: name '_CachedForward' is not defined`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20120

Pulled By: driazati

Differential Revision: D15210046

fbshipit-source-id: 493716d38e0078bfe96fab3dc624ec029988cf1c
2019-05-06 17:41:09 -07:00
1cfe15ef2a temporarily disable devtoolset7 nightlies (#20174)
Summary:
toward issue #20066
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20174

Differential Revision: D15229018

Pulled By: kostmo

fbshipit-source-id: 350c16a27c6530fe7d1d36a2dc11cb5008cf30e5
2019-05-06 17:15:10 -07:00
4211f674f0 Cleanup includes in c10/core/CPUAllocator.cpp. (#19885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19885
ghimport-source-id: 1f7d228ac8d1dd9aeb70446a6baf63f62f195663

Differential Revision: D15118516

Pulled By: ZolotukhinM

fbshipit-source-id: bdaf56d97db9e70cbd36ca03349f6eabfbac2668
2019-05-06 16:06:19 -07:00
8fbde94664 lower batchmm to non-diff optimization (#19987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19987
ghimport-source-id: ca4c38312bd56d8a71f1925297deee7f64f573d3

Differential Revision: D15190356

Pulled By: wanchaol

fbshipit-source-id: 761edb08c670fcbc24a06a5b11ceddf311f75884
2019-05-06 15:58:33 -07:00
0c5dc965a4 Add logging import and failing MLP (#20115)
Summary:
Add a logging import and a failing MLP model that confirms that `add_graph` does not fail when graph optimization fails.

This addresses part of https://github.com/pytorch/pytorch/issues/18903

cc lanpa ezyang natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20115

Reviewed By: natalialunova

Differential Revision: D15206765

Pulled By: orionr

fbshipit-source-id: c40b7e2671ef845a1529a2910ba030159f53f393
2019-05-06 15:44:59 -07:00
035966d538 Add options to Operator to enable registration of alias analysis passes (#19382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19382
ghimport-source-id: aeaad3b84ea20dd95b38635ca28c5ff657187909

Differential Revision: D14990873

Pulled By: bwasti

fbshipit-source-id: e1292ac8358ca8ff5bad8d8aeaddf06c23e66067
2019-05-06 15:40:13 -07:00
5c9ab6f411 Specialize Optional[T] to T (or subtype for Tensor) or None when executing graph (#18407)
Summary:
This patch specializes `Optional[Tensor]` graph inputs to either a `DimensionedTensorType` (if a Tensor is passed) or `NoneType`. Other `Optional[T]` are specialized to `T` or `None`.

- For unwrapping (checked and unchecked) we need to keep the output type, as IR code that follows unwrapping may not work with NoneType (just as it doesn't deal with Optional). While it would not be hit during execution, it will run against the (legitimate) assumptions of the analysis passes.
- Function lookup currently will not match NoneType when it expects Optional (I'm not entirely sure why this doesn't lead to unhappiness currently, but hey); I amend this at the level of the function matching code (`operator.cpp`), but see Adam's comments. We would run into trouble if we needed to select between functions whose signatures only differ in Optional types with different subtypes, but we would have the same problem when calling them directly, so I think this is OK.

- It would enable throwing away branches we can't hit. This also reduces the "blockiness" of the graph, so it may be easier to apply optimizations (e.g. fusing things inside `if t is None: ...` and outside the `if`).
- Arguments passed into `Optional[Tensor]` arguments will get shape information, which is very handy.
- It gets rid of the problem that tensors passed into Optional arguments erroneously get requires_grad set (#18270), though that also affects lists, which aren't fixed here.
- `Optional[List[int]]` is needed for #18697.

- We're changing typing in a more subtle way than the `TensorType`->`DimensionedTensorType`.
- In particular, specializing to NoneType loses the Type information captured in the `OptionalType` element type.
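The specialization rule described above can be sketched in plain Python (a hedged toy model — `specialize_optional` and its string-typed inputs are illustrative, not the actual JIT type system):

```python
def specialize_optional(declared: str, value) -> str:
    """Toy model: narrow a declared Optional[T] graph-input type to T when a
    value is actually passed, or to None when the input is absent."""
    if declared.startswith("Optional[") and declared.endswith("]"):
        inner = declared[len("Optional["):-1]
        return "None" if value is None else inner
    return declared  # non-Optional types pass through unchanged

assert specialize_optional("Optional[Tensor]", object()) == "Tensor"
assert specialize_optional("Optional[Tensor]", None) == "None"
assert specialize_optional("int", 3) == "int"
```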
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18407

Reviewed By: zdevito

Differential Revision: D15216808

Pulled By: eellison

fbshipit-source-id: 01f1a7643deaf4962c3f55eff2070d54b0e54b69
2019-05-06 15:35:03 -07:00
47f5be164a allow classes to be used in their own methods (#20106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20106
ghimport-source-id: e85bebfae59ad7720cfa64c5f88d5d5fa742c221

Reviewed By: shannonzhu

Differential Revision: D15202261

Pulled By: suo

fbshipit-source-id: ae01b868c0939cecf650bd2b5ad8bb94312eaef7
2019-05-06 15:25:52 -07:00
e37d9c8168 fix compilation order for class methods (#20094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20094
ghimport-source-id: cea2887831fcda83d50adfd2ee8414dedbebe897

Reviewed By: shannonzhu

Differential Revision: D15197079

Pulled By: suo

fbshipit-source-id: b36d5f8c666372e82f6e47a3be70f6dc835cc6ab
2019-05-06 15:25:48 -07:00
0aa7407dd0 Rearrange stopping condition in CompositeReader (#20062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20062

Previously, the batch counter was incremented even if none of the readers had data. In this diff,
1) The limiter is applied to the last reader so that the batch counter is not incremented unless the first N-1 readers have data
2) The stop blob of the last reader is used as the stop blob of the task, so that it is checked before the counter is incremented
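The stopping condition can be sketched as a hedged, pure-Python toy (`read_all` is illustrative, not the Caffe2 CompositeReader):

```python
def read_all(readers):
    """Count a batch only when every reader yields data; stop as soon as
    any reader is exhausted, without incrementing the counter."""
    batches = 0
    while True:
        rows = [next(r, None) for r in readers]
        if any(x is None for x in rows):
            break  # a reader ran out: do not count this batch
        batches += 1
    return batches

assert read_all([iter([1, 2]), iter([3])]) == 1  # second reader runs out first
assert read_all([iter([]), iter([1])]) == 0      # no complete batch is ever formed
```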

Reviewed By: xianjiec

Differential Revision: D15099761

fbshipit-source-id: 47ed6c728118fe453cf57ac3457085867939485b
2019-05-06 15:06:32 -07:00
b3c35e5202 Export randn_like in ONNX exporter (#20093)
Summary:
As a workaround for the dynamic shape case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20093

Reviewed By: zrphercule

Differential Revision: D15220661

Pulled By: houseroad

fbshipit-source-id: de271fce542be380bd49a3c74032c61f9aed3b67
2019-05-06 14:54:46 -07:00
26f5275644 Index into a tuple with non constant integer (#20081)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/16962

This needs fixing because we turn lists into tuples when constantifying a module, so indexing into a tuple of one type with a non-constant integer is quite common.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20081

Differential Revision: D15205893

Pulled By: eellison

fbshipit-source-id: 61d74ee071ad0aad98e46fe807d6f6cc5f6abd2f
2019-05-06 14:23:16 -07:00
722eb48ff2 Cleanup includes in torch/csrc/* (#19924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19924
ghimport-source-id: f7248b16c8e263a7d0ba7975b1fc0b00cb2cf2c0

Differential Revision: D15125018

Pulled By: ZolotukhinM

fbshipit-source-id: 322c7ca53e38ef8b43b5ac5bd747b28bc10379f1
2019-05-06 14:03:18 -07:00
6ca38d9840 Cleanup includes in torch/csrc/autograd/* (#19923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19923
ghimport-source-id: 54debdd21ca0f4230b1915905673de274807a2e5

Differential Revision: D15125016

Pulled By: ZolotukhinM

fbshipit-source-id: 8d54f436e4508067089a1d05ce192093220aa1bb
2019-05-06 13:48:42 -07:00
8b46938355 Cleanup includes in torch/csrc/jit/* (#19922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19922
ghimport-source-id: 0434c46bf75621ff79ea27a18a2475e7f13e2487

Differential Revision: D15125015

Pulled By: ZolotukhinM

fbshipit-source-id: 5685edfc94067f62e363a85e9badb7f757b1d321
2019-05-06 13:40:26 -07:00
edb376eceb Cleanup includes in torch/csrc/jit/script/* (#19921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19921
ghimport-source-id: 12a4553a4a081e8a41f4ed432b4ce3dc14e4699f

Differential Revision: D15125017

Pulled By: ZolotukhinM

fbshipit-source-id: f7285bd1e0745dadb9cd353a5fa8a09728012a59
2019-05-06 13:24:22 -07:00
17268a9225 Add print function for QTensor (#19513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19513

Add support for printing a QTensor in python frontend

Differential Revision: D15017168

fbshipit-source-id: 312d1f18e6ca3c9eb4a5b8bb1c64f7cc8bc1dcf5
2019-05-06 13:12:43 -07:00
0de4b9e97e Improve nn.ActivationCls repr of inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20127

Differential Revision: D15212928

Pulled By: soumith

fbshipit-source-id: f2e3ccb51315a11043d685bc6bf415ea039eeaa3
2019-05-06 13:04:23 -07:00
a8387b7779 Delete TensorImpl::GetDevice() (#20025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20025

Delete TensorImpl::GetDevice() and clean all its call sites.

Reviewed By: ezyang

Differential Revision: D15170917

fbshipit-source-id: b6862b74aa036198544f79d18a8c0f995cb0ca7b
2019-05-06 12:44:23 -07:00
9005a2c0fc disable flaky test_proper_exit again, still occasionally failing (#20063)
Summary:
The test was disabled for being flaky and re-enabled in https://github.com/pytorch/pytorch/pull/19421, but it is still occasionally failing:

https://circleci.com/gh/pytorch/pytorch/1520165?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

```
Apr 29 19:51:58 ======================================================================
Apr 29 19:51:58 FAIL: test_proper_exit (__main__.TestDataLoader)
Apr 29 19:51:58 There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore
Apr 29 19:51:58 ----------------------------------------------------------------------
Apr 29 19:51:58 Traceback (most recent call last):
Apr 29 19:51:58   File "/var/lib/jenkins/workspace/test/common_utils.py", line 129, in wrapper
Apr 29 19:51:58     fn(*args, **kwargs)
Apr 29 19:51:58   File "test_dataloader.py", line 847, in test_proper_exit
Apr 29 19:51:58     self.fail(fail_msg + ', and had exception {}'.format(loader_p.exception))
Apr 29 19:51:58 AssertionError: test_proper_exit with use_workers=True, pin_memory=False, hold_iter_reference=False, exit_method=worker_kill: loader process did not terminate, and had exception Traceback (most recent call last):
Apr 29 19:51:58   File "test_dataloader.py", line 227, in run
Apr 29 19:51:58     super(ErrorTrackingProcess, self).run()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run
Apr 29 19:51:58     self._target(*self._args, **self._kwargs)
Apr 29 19:51:58   File "test_dataloader.py", line 424, in _test_proper_exit
Apr 29 19:51:58     for i, _ in enumerate(it):
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 545, in __next__
Apr 29 19:51:58     idx, batch = self._get_batch()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 522, in _get_batch
Apr 29 19:51:58     success, data = self._try_get_batch()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 480, in _try_get_batch
Apr 29 19:51:58     data = self.data_queue.get(timeout=timeout)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Apr 29 19:51:58     res = self._recv()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Apr 29 19:51:58     return pickle.loads(buf)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Apr 29 19:51:58     return Unpickler(file).load()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Apr 29 19:51:58     dispatch[key](self)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Apr 29 19:51:58     value = func(*args)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Apr 29 19:51:58     fd = multiprocessing.reduction.rebuild_handle(df)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
Apr 29 19:51:58     conn = Client(address, authkey=current_process().authkey)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 169, in Client
Apr 29 19:51:58     c = SocketClient(address)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
Apr 29 19:51:58     s.connect(address)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/socket.py", line 224, in meth
Apr 29 19:51:58     return getattr(self._sock,name)(*args)
Apr 29 19:51:58 error: [Errno 111] Connection refused

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20063

Differential Revision: D15218223

Pulled By: nairbv

fbshipit-source-id: 32018c4220f7cb9372ef138631fc3a79759265e1
2019-05-06 08:34:27 -07:00
8818005a91 Fix warnings coming from bernoulli and dirichlet kernels (#19933)
Summary:
Fixes the following warnings in CI:
```
/tmp/pip-req-build-un327jey/aten/src/ATen/native/cuda/Distributions.cu(123): warning: pointless comparison of unsigned integer with zero
              detected during instantiation of "void <unnamed>::bernoulli_tensor_cuda_kernel<scalar_t,prob_t>(at::Tensor &, const at::Tensor &, std::pair<uint64_t, uint64_t>) [with scalar_t=uint8_t, prob_t=uint8_t]"
    (238): here
```
```
/tmp/pip-req-build-un327jey/aten/src/ATen/native/cuda/Distributions.cu:220:116: warning: 'c10::ScalarType detail::scalar_type(const at::Type&)' is deprecated [-Wdeprecated-declarations]
    /tmp/pip-req-build-un327jey/aten/src/ATen/Dispatch.h:46:1: note: declared here
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19933

Differential Revision: D15218177

Pulled By: ezyang

fbshipit-source-id: 3994f7130da1ab04d2e8c3c4fe97f722b7502b41
2019-05-06 08:29:23 -07:00
3a318074e5 Fix examples in jit#user-defined-types documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20003

Differential Revision: D15218172

Pulled By: ezyang

fbshipit-source-id: 07dfebafa337252c2733c018ecaabb0d9902e07d
2019-05-06 08:20:00 -07:00
f29858ff14 Resolve host_define.h warnings (#19917)
Summary:
Eigen was updated with the commit needed to get rid of this warning that plagued the CI. This PR bumps third_party/eigen to that commit head.
```
warning: #warning "host_defines.h is an internal header file and must not be used directly.  This file will be removed in a future CUDA release.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19917

Differential Revision: D15218183

Pulled By: ezyang

fbshipit-source-id: 653c7d61ea401a7d4469c2009612dc43cc70122d
2019-05-06 08:19:57 -07:00
cc06e2f947 fix build with python-2.7.5 (#20137)
Summary:
pytorch failed to build with the following error, complaining about the first regex match.
It may be caused by a bug in python 2.7.5.
The change proposed here is a workaround for building pytorch with python 2.7.5.
Since the '*' star notation is greedy in python regex, the new expression produces results identical to the old one.

```
Traceback (most recent call last):
  File "/data2/nihuini/pytorch/cmake/../aten/src/ATen/gen.py", line 14, in <module>
    import preprocess_declarations
  File "/data2/nihuini/pytorch/aten/src/ATen/preprocess_declarations.py", line 3, in <module>
    from function_wrapper import TYPE_FORMAL_GENERIC
  File "/data2/nihuini/pytorch/aten/src/ATen/function_wrapper.py", line 5, in <module>
    from code_template import CodeTemplate
  File "/data2/nihuini/pytorch/aten/src/ATen/code_template.py", line 13, in <module>
    class CodeTemplate(object):
  File "/data2/nihuini/pytorch/aten/src/ATen/code_template.py", line 23, in CodeTemplate
    subtitution = re.compile(substitution_str, re.MULTILINE)
  File "/usr/lib64/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib64/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat
--
CMake Error at cmake/Codegen.cmake:162 (message):
  Failed to get generated_cpp list
Call Stack (most recent call first):
  caffe2/CMakeLists.txt:2 (include)

```
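Since the workaround relies on '*' being greedy, the equivalence can be illustrated with a hedged toy example (these are not the actual patterns from `aten/src/ATen/code_template.py`):

```python
import re

# Illustrative only: a greedy-star expression and a more explicit rewrite
# can accept exactly the same language, so swapping between them is safe.
greedy = re.compile(r"ab*")          # 'a' followed by zero or more 'b's
explicit = re.compile(r"a(?:b+)?")   # 'a' optionally followed by one or more 'b's

for s in ["a", "ab", "abbb"]:
    assert greedy.fullmatch(s) is not None
    assert explicit.fullmatch(s) is not None
assert greedy.fullmatch("ac") is None and explicit.fullmatch("ac") is None
```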
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20137

Differential Revision: D15218122

Pulled By: ezyang

fbshipit-source-id: 10b618ff92a04e9074f5d83e31411fc2341e0cf8
2019-05-06 08:08:41 -07:00
61f1242b7f Formula typo fix (#20110)
Summary:
T_{cur + 1} -> T_{cur} + 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20110

Differential Revision: D15218135

Pulled By: ezyang

fbshipit-source-id: fb914d977cac447867921510bf57b59e62e4f68c
2019-05-06 08:08:37 -07:00
343c1c21f2 update nn.init.calculate_gain doc example
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20131

Differential Revision: D15218126

Pulled By: ezyang

fbshipit-source-id: 164c9d1573cd4d2c3689fb83b952e71862d4f1f2
2019-05-06 08:08:34 -07:00
1dfeffbff5 Expose test utils (#20114)
Summary:
Some functions were not decorated with `CAFFE2_API`, which makes them unusable when creating unit tests for custom ops outside the Caffe2 repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20114

Differential Revision: D15217490

Pulled By: ezyang

fbshipit-source-id: dda3910ad24e566567607deaac705a34ec8e7b8d
2019-05-06 07:06:04 -07:00
f2c715cbe1 Fix the spelling of "context"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20055

Differential Revision: D15217488

Pulled By: ezyang

fbshipit-source-id: bb2b57b5e749357b47a01c6c3e73addf3c5418c7
2019-05-06 06:54:30 -07:00
f6609daad7 Fix the warning if the wrong gcc is used with nvcc (#20158)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20158

Differential Revision: D15217665

Pulled By: ezyang

fbshipit-source-id: 951bb640cc8bea705cb2f39ca2024e3c5084cd3b
2019-05-06 06:38:38 -07:00
57948414ac Fix small typo T_mul->T_mult
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20148

Differential Revision: D15217485

Pulled By: ezyang

fbshipit-source-id: cb183cdc2eb3e42c685ef024742a18745923d283
2019-05-06 06:32:33 -07:00
71d23ebc13 #20143 TripletMarginLoss example isn't clear (#20145)
Summary:
Pull Request for TripletMarginLoss example isn't clear #20143
SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20145

Differential Revision: D15217491

Pulled By: ezyang

fbshipit-source-id: 46e30496e9f0dd830a2eec42c5775209a773cc48
2019-05-06 06:32:30 -07:00
23ba0561c3 Add Gate Policy GateLearningRateOp (#20044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20044

We do not have a gating functor; this diff adds one. I'm leveraging the existing learning rate op because there are other policies I'll need to use together as a union.
* Since there are other policies in LearningRateOp which will be used as a union, I chose to add it as a LearningRateOp.
   * constantwarmup cannot produce a step function that is nonzero first and zero later

* There are multiple uses for it,
    * e.g. as a gating blob generator that is useful for turning things off.
   * e.g. as a learning rate switcher at a certain iteration.
* For generalizability, no regulation or constraint is applied on the range of the values

* see figure below for illustration

{F157366621}
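The gating step function described above can be sketched in pure Python (a hedged illustration — the real implementation is a LearningRateOp policy, and `gate` is a hypothetical name):

```python
def gate(iter_num, gate_iter, before=1.0, after=0.0):
    """Step function: emit `before` until gate_iter, then `after`.
    No constraint is placed on the range of the two values."""
    return before if iter_num < gate_iter else after

# Turning a component off after iteration 5:
assert gate(4, 5) == 1.0
assert gate(5, 5) == 0.0
# Switching a learning rate at a certain iteration instead of gating to zero:
assert gate(10, 8, before=0.1, after=0.01) == 0.01
```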

Reviewed By: ccheng16

Differential Revision: D15178229

fbshipit-source-id: 1e66e9a4bc1bfb946a57f8aefc97d8170f6be731
2019-05-05 20:11:04 -07:00
863818e05a Allow overwriting kernels (#19777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19777

When used from iPython/Jupyter/Bento, the cell doing kernel registration might be executed multiple times.
Also, there might be a kernel library that wants to overwrite one of the existing kernels.
Let's allow this.
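A hedged, pure-Python sketch of the behavior (this is not the real c10 dispatcher API; `KernelRegistry` is illustrative only):

```python
class KernelRegistry:
    """Toy registry: re-registering a kernel overwrites the old one instead
    of raising, so a notebook cell doing registration can run twice."""
    def __init__(self):
        self._kernels = {}

    def register(self, op, fn):
        self._kernels[op] = fn  # silently overwrite any existing kernel

    def call(self, op, *args):
        return self._kernels[op](*args)

reg = KernelRegistry()
reg.register("my::add", lambda a, b: a + b)
reg.register("my::add", lambda a, b: a + b + 1)  # cell re-executed: allowed
assert reg.call("my::add", 1, 2) == 4
```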

Reviewed By: dzhulgakov

Differential Revision: D15090318

fbshipit-source-id: 09f842e8fd36646053c5c2f11325de4d31105b0c
2019-05-05 01:06:26 -07:00
470af2357d Refactorings to prepare for overwritable kernels (#19776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19776

The diff stacked on top of this enables overwriting of kernels, but before that, we need to do some refactorings.
This diff:
- hides deregistration logic behind a RAII class so we can keep more information about what exactly to deregister without the API user knowing about it.
- Split KernelRegistration from SchemaRegistration by taking Dispatcher::OperatorDef and moving it to a different file. This is more readable, especially since kernel registration will become more complex in the next diff.
- Move LeftRight synchronization out of DispatchTable to Operator because there will be a mutex added to Operator in the next diff and related synchronization primitives shouldn't live on different abstraction levels.

Reviewed By: dzhulgakov

Differential Revision: D15090322

fbshipit-source-id: 2e51a192075163f0d496956d9e54b9aaf26b2369
2019-05-05 01:06:23 -07:00
fc00bfd12e Update MultiheadAttention documentations (#20071)
Summary:
Add documentations to add_bias_kv, add_zero_attn, and attn_mask.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20071

Differential Revision: D15213034

Pulled By: zhangguanheng66

fbshipit-source-id: c3db4b9e8527863420ba3ce6abf6098d3b0fb7a7
2019-05-04 13:55:41 -07:00
ecdeef37df Fix math rendering of CTC loss docs and fix error of lint in test_torch.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19662

Differential Revision: D15063326

Pulled By: soumith

fbshipit-source-id: 71e79dee19c6259e27393d6b3d5ca3f8edfd6afa
2019-05-03 22:30:41 -07:00
7ddd5d06ed trace multiple methods (#19905)
Summary:
This PR adds a new trace API `trace_module` that will allow us to trace multiple methods as a part of a single `ScriptModule`

See the example below.

```python
        class Net(nn.Module):
            def __init__(self):
                super(Net, self).__init__()
                self.conv = nn.Conv2d(1, 1, 3)

            def forward(self, x):
                return self.conv(x)

            def weighted_kernel_sum(self, weight):
                return weight * self.conv.weight

        example_weight = torch.rand(1, 1, 3, 3)
        example_forward_input = torch.rand(1, 1, 3, 3)
        n = Net()
        inputs = {'forward' : example_forward_input, 'weighted_kernel_sum' : example_weight}
        module = torch.jit.trace_module(n, inputs)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19905

Differential Revision: D15200007

Pulled By: Krovatkin

fbshipit-source-id: 0354d973fe40cb6e58b395bd866df14e0fc29d5b
2019-05-03 16:21:58 -07:00
7ad04ad28d DOC: Update web documentation of geometric_ to be consistent with Tensor behaviour (#20091)
Summary:
Fix #19940 by updating web doc to reflect Tensor behaviour which will reflect [here](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.geometric_)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20091

Differential Revision: D15196734

Pulled By: soumith

fbshipit-source-id: a1b8aff9599f170e76a9cbca5112b5a9488bc36c
2019-05-03 15:39:10 -07:00
271f005eeb Add elementwise_affine for LayerNormGradientOp (#19982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19982

Add elementwise_affine for LayerNormGradientOp

Reviewed By: houseroad

Differential Revision: D15157493

fbshipit-source-id: 7465f2c1d4df4649b4903b93483c4861e9c7afa9
2019-05-03 15:33:46 -07:00
440aac082a The deprecated API allows std::vector arguments (#19784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19784

For backwards compatibility, we allow vector arguments if a kernel is registered with the deprecated API.

Reviewed By: dzhulgakov

Differential Revision: D15091972

fbshipit-source-id: 4db3e3a262e605504b05c42d40046011408501d2
2019-05-03 12:18:09 -07:00
e0bd7cc821 Change the export of _dim_arange in ONNX (#20078)
Summary:
Previously this used an ATen op; now it is fully switched to pure ONNX/Caffe2 ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20078

Reviewed By: zrphercule

Differential Revision: D15188774

Pulled By: houseroad

fbshipit-source-id: 8ae3094369497e2f3ebf478cda222b73de2a995e
2019-05-03 11:07:05 -07:00
840680bbf3 Reduce overhead of OnnxifiOp (#20085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20085

Reduce the overhead of OnnxifiOp from ~40us avg (~100us max) to ~25us avg (~40us max).

Reviewed By: bertmaher

Differential Revision: D15191071

fbshipit-source-id: 830d2bd08b19567a133d43c8a7b996032629d326
2019-05-03 10:57:19 -07:00
c7c02724cd CMakeLists changes to enable libtorch for Android (#19762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19762
ghimport-source-id: 287aa7fea4efd38994e14d794123eb2046b91fc0

Differential Revision: D15087653

Pulled By: ljk53

fbshipit-source-id: 4498ff9f7f7903c3e25541184302b811267958e9
2019-05-03 09:28:53 -07:00
0e77c0f5de Add ninja to PyTorch README.md file. (#20079)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17572
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20079

Differential Revision: D15195388

Pulled By: zhangguanheng66

fbshipit-source-id: b7448b482f07e96753f727664416a5d0e85602b4
2019-05-03 07:13:51 -07:00
767c82e151 Initialize last_epoch in _LRScheduler.__init__() (#20059)
Summary:
Class attributes should preferably be explicitly initialized within
the __init__() call. Otherwise, overriding step() is
prone to bugs.

This patch partially reverts #7889
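The pattern can be sketched in plain Python (a hedged toy, not the real `torch.optim.lr_scheduler._LRScheduler`; the class names are hypothetical):

```python
class SchedulerBase:
    """last_epoch is explicitly initialized in __init__ before step() is
    ever called, so subclasses that override step() can rely on it."""
    def __init__(self, last_epoch=-1):
        self.last_epoch = last_epoch  # set up-front, not lazily inside step()
        self.step()

    def step(self):
        self.last_epoch += 1

class WarmupScheduler(SchedulerBase):
    def step(self):
        # Overriding step() is safe: last_epoch already exists.
        super().step()

assert WarmupScheduler().last_epoch == 0
```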
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20059

Differential Revision: D15195747

Pulled By: soumith

fbshipit-source-id: 3d1a51d8c725d6f14e3e91ee94c7bc7a7d6c1713
2019-05-02 22:38:12 -07:00
af87cfd7f9 Remove in-memory scalars and add comments (#20038)
Summary:
This takes care of some outstanding review comments for https://github.com/pytorch/pytorch/pull/16196/

Specifically:
1. Add comment about kind
2. Add comment about GraphPy
3. Remove ONNX version comment
4. Remove scalar_dict from SummaryWriter and all history functions

cc lanpa ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20038

Reviewed By: natalialunova

Differential Revision: D15177257

Pulled By: orionr

fbshipit-source-id: 218aa799d8b7dbb58f422a331236bba4959347de
2019-05-02 22:26:28 -07:00
fb40e58f24 Remove deprecated tensor constructors in torch.distributions (#19979)
Summary:
This removes the deprecated `tensor.new_*` constructors (see #16770) from `torch.distributions` module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19979

Differential Revision: D15195618

Pulled By: soumith

fbshipit-source-id: 46b519bfd32017265e90bd5c53f12cfe4a138021
2019-05-02 20:45:02 -07:00
792bc56ec2 Update README.md (#20088)
Summary:
Sometimes people need to check out an older version and build PyTorch. In that case, they need to do `git submodule sync` and maybe `git submodule update --init` as mentioned [here](https://github.com/pytorch/pytorch/issues/20074).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20088

Differential Revision: D15195729

Pulled By: soumith

fbshipit-source-id: 73232b801e5524cdba462dd504fb973d95d0498c
2019-05-02 20:40:03 -07:00
fb8792e2b6 Remove torch/jit from xplat build (#19967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19967

-

Reviewed By: dreiss, dzhulgakov

Differential Revision: D15150843

fbshipit-source-id: af7d6902934883be9d8021b3601de2fe1f3bf806
2019-05-02 15:31:06 -07:00
1ecbf0d213 Split MethodValue and FunctionValue (#19985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19985
ghimport-source-id: 6a032fd47f2f2e5ccd59fbd70bd12d0b9a815685

Differential Revision: D15160042

Pulled By: zdevito

fbshipit-source-id: a4ed3750ceea44d7d4c7d970b440840eb0ea5936
2019-05-02 14:37:19 -07:00
2ec2287cce Fix smoke tests on binary builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20069

Differential Revision: D15188864

Pulled By: pjh5

fbshipit-source-id: 692c5ecf1a4f5a560cf8fbcfe3634b809f184d72
2019-05-02 14:24:30 -07:00
99c548e223 Make IValue("string") do the right thing (#20027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20027

Before this change, any pointer was converted to bool, i.e.

    IValue("string") == IValue(true)

After this change, it does the right thing and creates a string.

Reviewed By: dzhulgakov

Differential Revision: D15172409

fbshipit-source-id: 8167dd780005f9bceef4fe3c751f752e42ceeb20
2019-05-02 12:36:13 -07:00
38e630a03f Extract feature length information from ClipRangesGatherSigridHash (#19704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19704

Extract feature length information from ClipRangesGatherSigridHash

Reviewed By: ipiszy

Differential Revision: D15075047

fbshipit-source-id: f70f61b7910515df5c47a042ac7afa523dbdb02a
2019-05-02 12:31:01 -07:00
6f7a315a71 Allow onnx export for maxpool with dilations (#18721)
Summary:
The MaxPool operator now supports the 'dilations' attribute as of this commit:
b22041c3f1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18721

Reviewed By: zrphercule

Differential Revision: D15152400

Pulled By: houseroad

fbshipit-source-id: e8f5ab35c5c2c3a540a22f7cf7bb453d892d0400
2019-05-02 11:26:57 -07:00
75868683dc Removing CUDA 8.0 nightlies (#20068)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/20067
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20068

Differential Revision: D15184571

Pulled By: pjh5

fbshipit-source-id: a37846a23ac7b414f9a3741f37a2db5bb61c93a9
2019-05-02 10:53:56 -07:00
002009b5a9 Automatic update of fbcode/onnx to 7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef (#20012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20012

Previous import was f1311e74ec8a91cbf86094cd6f10157cbf00c536

Included changes:
- **[7d7bc83d](https://github.com/onnx/onnx/commit/7d7bc83d)**: fix shape inference (#1984) <Ashwini Khade>
- **[68630bbd](https://github.com/onnx/onnx/commit/68630bbd)**: fixing some of Mod test cases (#1962) <Jeff Saremi>

Reviewed By: zrphercule

Differential Revision: D15160934

fbshipit-source-id: c53aff401f56b2febeb6c4ee302670eb12b9b495
2019-05-01 22:15:30 -07:00
725ef26f34 Add support for tensor targets in for-in (#19380)
Summary:
Fixes #19314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19380

Differential Revision: D15167858

Pulled By: Krovatkin

fbshipit-source-id: e87261bbf3e6f8df0601df80280eb3dba42798cd
2019-05-01 17:57:56 -07:00
46589a8a32 fix labs warning in THTensorMoreMath.cpp (#19760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19760
ghimport-source-id: d0aabee8c7b881710292d468dca1537c5ea761b5

Differential Revision: D15087655

Pulled By: ljk53

fbshipit-source-id: ac133dfc2301c2a86c41b4b8f1483d7d23824e1e
2019-05-01 16:35:42 -07:00
ff70c364a4 fix labs warning in THVectorDefault.cpp (#19999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19999
ghimport-source-id: 81157c60f0a9e5adbf12af8fc3a6e4352826b859

Differential Revision: D15169025

Pulled By: ljk53

fbshipit-source-id: 8e6f8df6dec6d21d6c7e743e974f4fcfff7cdeb5
2019-05-01 16:35:39 -07:00
18cb098588 Remove warnings on new_* constructors (#20026)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#20026 Remove warnings on new_* constructors**

Revert of #16770, fixes #19995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20026

Pulled By: driazati

Differential Revision: D15171691

fbshipit-source-id: 057c3b4a9fd6086ca240007e5404a286080f04b6
2019-05-01 16:35:36 -07:00
8987ae314e Remove try/catch in constant propagation (#19686)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19686 [jit] Remove try/catch in constant propagation**

The try-catch here gets tripped pretty often when constant prop is run, which screws up `catch throw` in gdb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19686

Pulled By: driazati

Differential Revision: D15170134

fbshipit-source-id: 93688561126f3ab582c8358e8f2787f7fce9aa73
2019-05-01 16:27:08 -07:00
0da0c4be48 Rotate circleci keys
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19981

Differential Revision: D15174219

Pulled By: pjh5

fbshipit-source-id: 205952aa90ed93f193f40d4293f5a8d82fa33ed6
2019-05-01 16:04:41 -07:00
e846ccd7bc Fix bug in dumpNet (#20001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20001

att

Reviewed By: zrphercule

Differential Revision: D15164116

fbshipit-source-id: dab19fb84fa0ab648103317af5509703db918682
2019-05-01 15:27:49 -07:00
c80ae6dc8e Change the quant-dequant node pattern for jit op substitution pass (#19910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19910

This change modifies the quant-dequant node pattern from
qparam->q->dq to qparam->q->int_repr->qparam->dq. The motivation for
this change is to make the qparams required for op substitution available
one level up, at the dequant node, instead of multiple levels up.

Differential Revision: D15120146

fbshipit-source-id: 74b0fd5cb50a338f562740a9cc727a7791c718c3
2019-05-01 12:18:33 -07:00
3f7e3d5857 Add the ability to observe intermediate tensors in an onnxifi op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19966

Reviewed By: yinghai

Differential Revision: D15096086

fbshipit-source-id: 8e6a26c46898f99d411dd5841f086946884b2457
2019-05-01 11:51:02 -07:00
de582e2f89 Fix test_forked_cw (#19680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19680

This was broken for quite some time because of an operator schema
check that went into effect at some point in time.

Reviewed By: manojkris

Differential Revision: D15055082

fbshipit-source-id: 7f730f9b810bdaffd69bab7ac4d02c5b2e40645b
2019-05-01 11:11:45 -07:00
e04caa3f44 Pass Quantization parameters for quant nodes (#19402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19402

This pass propagate the qparams calculated after calibration to the
quant nodes which will be used later for quantization

Differential Revision: D14995230

fbshipit-source-id: 5709153ea1c039c4ab4470ddb689a303b0bcc6fd
2019-05-01 09:15:59 -07:00
f564226167 Avoid reusing TYPE parameter in AT_DISPATCH macros. (#19968)
Summary:
From the comment: "don't use TYPE again in case it is an expensive or side-effect op"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19968

Differential Revision: D15151567

Pulled By: gchanan

fbshipit-source-id: 4d42c081ac1472b71f1cea5172cb42a7c83a7043
2019-05-01 08:52:06 -07:00
1f9a0c5dd6 Add observer nodes for input data nodes (#19232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19232

Add observer nodes to collect stats for input data nodes excluding params
which are constant at inference and need not be observed. This information
is required to compute quantization params.

Differential Revision: D14885485

fbshipit-source-id: 8762cc2a4e510e1553b3dbd1d1aecd55b4bdb89f
2019-05-01 03:49:14 -07:00
8cd6d2f101 rename BUILD_ATEN_MOBILE to INTERN_BUILD_MOBILE and make it private (#19942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19942
ghimport-source-id: 6bacc8f5ad7911af8cf5fde9fcb604ade666b862

Reviewed By: dzhulgakov

Differential Revision: D15144325

Pulled By: ljk53

fbshipit-source-id: d63a70f007110d5d1055d6bec1ed09a1a6aafdae
2019-05-01 00:20:24 -07:00
5108e807e0 add new macro TORCH_MOBILE for libtorch mobile build (#19761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19761
ghimport-source-id: 359a44594d3d5afb8102435b4eac6ab920c24ef4

Differential Revision: D15087652

Pulled By: ljk53

fbshipit-source-id: f1a79c38c9415bb3786cc4d073370b1cb807e5ce
2019-04-30 21:25:15 -07:00
360640bc9c Extract Python-specific SugaredValues to a separate file from init.cpp. (#19986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19986
ghimport-source-id: 67f5fec4b5b2114f2922505a7743ed27e6d7e6cc

Differential Revision: D15160820

Pulled By: ZolotukhinM

fbshipit-source-id: e39238db8f30a8809891bff8a2fe39977124f6ca
2019-04-30 19:38:23 -07:00
ba84ad0d97 LeftRight works for classes without default constructors (#19775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19775

-

Reviewed By: dzhulgakov

Differential Revision: D15090319

fbshipit-source-id: e80865975970400c3db24bba4af4327105f3b9b2
2019-04-30 16:34:15 -07:00
e97da36cbb Explicitly disable copy&move on LeftRight (#19774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19774

see in-source comment

Reviewed By: dzhulgakov

Differential Revision: D15090320

fbshipit-source-id: ae9ba5b5df7115c2b1c275e384030063dbbf8f1a
2019-04-30 16:34:12 -07:00
a285cbcccf support different class modes for bbox in box_with_nms_limit_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19820

Reviewed By: newstzpz

Differential Revision: D15112955

fbshipit-source-id: a757622a32cff7159c39735607103138dbbafc24
2019-04-30 16:02:44 -07:00
74f527a8fa Adding job to upload binary sizes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19934

Differential Revision: D15155702

Pulled By: pjh5

fbshipit-source-id: cf841624145d14c7f40153c8fe7b442e633c0f92
2019-04-30 15:56:01 -07:00
48981a02e9 Fix typo in embedding_bag_cpu. (#19432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19432

Fixed typo.

Reviewed By: cpuhrsch

Differential Revision: D15003373

fbshipit-source-id: 8d0f9f70181f6e5041c1a09f5b8a7a5707a4ff2c
2019-04-30 15:51:22 -07:00
cb5442b31a Move IValues from stack into kernels (#19783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19783

Previously, the IValues were copied into the kernel arguments, which caused a refcount bump if Tensor was taken by value.
Now, a kernel can take Tensor by value without any refcount bump because it is moved in.

Reviewed By: dzhulgakov

Differential Revision: D15091973

fbshipit-source-id: 4c5ff2e3ee86f5934cc84191697f7dbc9c3ee345
2019-04-30 15:46:16 -07:00
cb8ff2a2b4 Add mkldnn support for adaptive_avg_pool2d (#19818)
Summary:
AdaptiveAvgPool2d is used in torchvision resnet models https://github.com/pytorch/vision/blob/9a481d0/torchvision/models/resnet.py#L145

Fixes #19797
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19818

Differential Revision: D15112777

Pulled By: bddppq

fbshipit-source-id: 6c9b29c805d28356cda49c10c2cd3ce9d7a8b3f5
2019-04-30 15:00:34 -07:00
f5b1a41c58 specify data type in the doc (#19959)
Summary:
addresses comments in #19915
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19959

Differential Revision: D15149993

Pulled By: orionr

fbshipit-source-id: 0e438cfa1a311e89d4bed7ae9d7710a9f1b19a78
2019-04-30 14:15:56 -07:00
947fd9c3f5 More doc edits (#19929)
Summary:
* document `torch.jit.Attribute`
* add JIT one-liner to `README.md`
* misc clarity edits
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19929

Pulled By: driazati

Differential Revision: D15152418

fbshipit-source-id: dfee03f0a17300aaf453fcf17f418463288f66c2
2019-04-30 13:52:07 -07:00
a9c189ca14 Macos upload fix (#19965)
Summary:
The second commit will be removed before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19965

Differential Revision: D15153243

Pulled By: pjh5

fbshipit-source-id: 70eae38d0cb07dc732c0cf044d36ec36d0a4472d
2019-04-30 13:46:24 -07:00
0ad1d5c317 fix THAllocator.cpp (#19759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19759
ghimport-source-id: 7ccd6e69dfd36ca9d59da59d5ec55f08e58b79f7

Reviewed By: soumith

Differential Revision: D15087651

Pulled By: ljk53

fbshipit-source-id: 536db274f7e725a0170b6f2c462146c08139f341
2019-04-30 13:29:47 -07:00
4e154c1585 remove C10_MOBILE from LegacyTypeDispatch.cpp (#19758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19758
ghimport-source-id: 59d0d337260e3baf57da87af3f97836b321a3db2

Reviewed By: pjh5

Differential Revision: D15087654

Pulled By: ljk53

fbshipit-source-id: 8f56808faba35bae68ee99c62c761f94807b6fd7
2019-04-30 13:29:43 -07:00
508ca44fcc add logging in sparse_to_dense_mask when skipping (#19945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19945

this logging was removed in D6888977, but I think it could be useful for debugging upstream data issues to check the violating index

Reviewed By: xianjiec

Differential Revision: D15127887

fbshipit-source-id: 4ad7eceefcd063bf45bc190a4c0d458a089c918a
2019-04-30 12:57:24 -07:00
56977db4a7 Provide option to save quantized data for DNNLOWP without layout optimization (#19681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19681

For accelerator, we need to lower just the quantized weights data without layout transformation. This diff attempts to provide this option.

Reviewed By: jerryzh168, zrphercule

Differential Revision: D15066568

fbshipit-source-id: 133d749e087c2ad4a899bee5e96f597f70b2443c
2019-04-30 12:32:42 -07:00
ca57dd9332 Fixed log_normal and geometric for CPU (#19938)
Summary:
log_normal_ and geometric_ were disabled for CPU by mistake in [this PR](bc53805f2e); this PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19938

Differential Revision: D15143404

Pulled By: izdeby

fbshipit-source-id: 41c7bd29f046b5a3ac6d601de8c64ab553771d19
2019-04-30 12:18:10 -07:00
c6cb32f588 Forbid kernels from returning Scalar[] (#19811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19811

This does not immediately take effect for the custom op API but will break backwards compatibility once we switch to the new operator registration.

Reviewed By: dzhulgakov

Differential Revision: D15101924

fbshipit-source-id: 8890a5a3e163d3263dc1837be0b4851984771917
2019-04-30 12:13:09 -07:00
9a81d1e692 Automatic generation of unittest for Glow integration
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19936

Reviewed By: ipiszy

Differential Revision: D15138090

fbshipit-source-id: 29a812548bb5da00176b00c1e9a26a7c31cea9c0
2019-04-30 12:13:06 -07:00
3a0727e58b Fix flake8. (#19832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19832
ghimport-source-id: 7360a52dbcf83458797c27002afc1fd53ee5907f

Differential Revision: D15115620

Pulled By: ZolotukhinM

fbshipit-source-id: aa62b04facc1e1824a8889a32dace5804daa21df
2019-04-30 12:09:10 -07:00
e710f3b1e1 Fix C10_MOBILE macro for ios (#19779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19779

This macro wasn't set correctly because the target macros weren't included from Apple's header.

Reviewed By: dzhulgakov

Differential Revision: D15090427

fbshipit-source-id: 43ca44f0f409e11718b7f60c3fdcd2aa02d7018e
2019-04-30 12:03:24 -07:00
965c3b2761 Automatic update of fbcode/onnx to f1311e74ec8a91cbf86094cd6f10157cbf00c536 (#19949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19949

Previous import was 22662bfd4dcc6baebf29e3b823a051676f991001

Included changes:
- **[f1311e74](https://github.com/onnx/onnx/commit/f1311e74)**: Lint the docs name (#1982) <Lu Fang>
- **[c1c04af4](https://github.com/onnx/onnx/commit/c1c04af4)**:  Fix a shapeinference bug in upsample v9/10 (#1969) <Raymond Yang>
- **[2f7dc10f](https://github.com/onnx/onnx/commit/2f7dc10f)**: Create managingexperimentalops (#1974) <Joseph Spisak>
- **[419582b7](https://github.com/onnx/onnx/commit/419582b7)**: Create archivefileformat doc based on the wiki equivalent (#1973) <Joseph Spisak>
- **[4fcf3842](https://github.com/onnx/onnx/commit/4fcf3842)**: Create NLPinONNXproposal (#1975) <Joseph Spisak>
- **[153c4f9a](https://github.com/onnx/onnx/commit/153c4f9a)**: Create ONNXIFIproposal (#1976) <Joseph Spisak>
- **[c695016a](https://github.com/onnx/onnx/commit/c695016a)**: Create onnxreleases (#1977) <Joseph Spisak>
- **[ccf4b383](https://github.com/onnx/onnx/commit/ccf4b383)**: Create functionsproposal (#1978) <Joseph Spisak>
- **[c4d32371](https://github.com/onnx/onnx/commit/c4d32371)**: Create typeannotations.md (#1979) <Joseph Spisak>

Reviewed By: zrphercule

Differential Revision: D15145652

fbshipit-source-id: eca661fe5a735dce5af992e16b6a4013e896b8b4
2019-04-30 11:57:47 -07:00
aa6403bae6 Added .bool() method
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19928

Differential Revision: D15131923

Pulled By: izdeby

fbshipit-source-id: 3909cf4623fe85e98ceaf57fbb57745919899445
2019-04-30 10:34:31 -07:00
d868c97580 Improve performance of Int8SpatialBN (needed for DF4 quantization) (#19702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19702

avx2 implementation of core compute for Int8SpatialBN

Reviewed By: jianyuh

Differential Revision: D15073973

fbshipit-source-id: c30b0c621348ba9331ba5e48b281c00cf6e479a1
2019-04-30 10:26:48 -07:00
95ce796663 enable CopyVector for type of int32_t (#19931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19931

as title

Reviewed By: Wakeupbuddy

Differential Revision: D15134369

fbshipit-source-id: a8afa166b5537bf815be875fa8afcb599897d5a7
2019-04-29 22:24:06 -07:00
4c678dbe87 Make moving FunctionSchema efficient (#19773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19773

const members can't be moved, so whenever somebody moved a function schema, it was copied instead.
This diff fixes this.

Reviewed By: dzhulgakov

Differential Revision: D15090323

fbshipit-source-id: 123a1d6b96ac46cb237966c0b072edebcdafe54c
2019-04-29 21:43:40 -07:00
8e77506799 Add onnx export for floor, ceil, log2 and prim::shape
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17895

Reviewed By: zrphercule

Differential Revision: D15103396

Pulled By: houseroad

fbshipit-source-id: 2ec80f11a19a8659aa496e68aed769a8dd1efb18
2019-04-29 21:23:14 -07:00
55c719b161 Remove operator.h's dependency on function_schema.h (#19817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19817

A lot of files were depending on the JIT's typesystem
because operator.h depends on function_schema.h. However,
this isn't fundamental to the design. This diff tries to
remove the direct dependency and include the c10 wrapper
helpers only in the files where they are required.

Reviewed By: smessmer

Differential Revision: D15112247

fbshipit-source-id: 2c53d83e542c32d9a398c8b60dbf40ab7a1cb0f6
2019-04-29 19:50:43 -07:00
2a95cf6345 Add a pattern-based fusion pass. (#19596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19596
ghimport-source-id: 1d7af5877dbeffa826201812649a9009c06c6305

Differential Revision: D15042033

Pulled By: ZolotukhinM

fbshipit-source-id: e3178d9aec2ac63fc3779ddedbd967aae0401c76
2019-04-29 19:17:31 -07:00
abbfb7dd23 Fix devtoolset7 binary builds (#19919)
Summary:
cc kostmo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19919

Differential Revision: D15141049

Pulled By: pjh5

fbshipit-source-id: 9080924c5304d7db001d6bd910a8c3051a34eeae
2019-04-29 17:21:03 -07:00
f5c2b5a259 ONNX Export Min and Max ops with dim
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19689

Reviewed By: zrphercule

Differential Revision: D15103119

Pulled By: houseroad

fbshipit-source-id: f44555657f6965c41737e69485da119f37cf9d7c
2019-04-29 16:50:18 -07:00
a5b8a7d44b Don't include TensorMethods.h - it's already included by Tensor.h anyway. (#19831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19831
ghimport-source-id: 03ccc99f348021bd8a91c5006b5e4ebd3565290e

Differential Revision: D15115630

Pulled By: ZolotukhinM

fbshipit-source-id: a322cdca9c87429f092509d855d16551873bf17e
2019-04-29 16:11:25 -07:00
62a1640666 Improve torch.utils.tensorboard docs (#19915)
Summary:
This adds method details and corrects the example on the page, which didn't run properly. I've now confirmed that it runs in colab with nightly.

For those with internal access the rendered result can be seen at https://home.fburl.com/~orionr/pytorch-docs/tensorboard.html

cc lanpa, soumith, ezyang, brianjo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19915

Differential Revision: D15137430

Pulled By: orionr

fbshipit-source-id: 833368fb90f9d75231b8243b43de594b475b2cb1
2019-04-29 16:06:27 -07:00
ff33c8c24a Avoid printing debug information in the test
Summary: As title says.

Reviewed By: zafartahirov

Differential Revision: D15097015

fbshipit-source-id: b506130d8c8993672815221ba2c861ee1a95c64a
2019-04-29 15:48:38 -07:00
114644449e Break circular dependency between Type.h, Tensor.h and TensorMethods.h (#19830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19830
ghimport-source-id: 9b057c549652a39d6706679ee86494124bb40ef0

Differential Revision: D15115631

Pulled By: ZolotukhinM

fbshipit-source-id: 821c0aedacd9692d6076349ff8318c8f6234891c
2019-04-29 14:46:50 -07:00
ccf35f4be0 remove scalar to float matching (#19918)
Summary:
Trying to get this in before 1.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19918

Reviewed By: driazati

Differential Revision: D15124430

Pulled By: eellison

fbshipit-source-id: 549cdcbaff91218657e94ce08c0f4e69b576d809
2019-04-29 14:41:56 -07:00
4294dba981 Misc pickler improvements (#19638)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19638 [jit] Serialize attribute module as torch.jit._pickle**

* use `torch.jit._pickle` as the module for globals in the pickle program. Pickle will try to resolve these to the actual functions in `torch.jit._pickle.py` automatically (I believe this can also be overridden to point to whatever functions you want). This means that `pickle.load("my_model/attributes.pkl")` will work instead of having to use a custom `pickle.Unpickler`
* use `REDUCE` opcodes instead of `BUILD` to make use of the last bullet
* use a union in the unpickler to support globals better (+ any future metadata we might need that can't be stored in an `IValue`), this makes some of the code around `IntList`s clearer and lets us get rid of any lookbehind for opcodes
* pickle things as a tuple instead of a list (an immutable result is more semantically correct)
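The GLOBAL/REDUCE mechanism these bullets lean on can be sketched with plain `pickle`; `build_intlist` below is a hypothetical stand-in for the `torch.jit._pickle` helpers, not the actual serialization code:

```python
import pickle

def build_intlist(*values):
    # A module-level constructor: pickle records a GLOBAL reference to it
    # plus a REDUCE opcode, so a plain pickle.load rebuilds the object
    # with no custom Unpickler needed.
    return list(values)

class IntListHolder:
    def __reduce__(self):
        # Tell pickle: "to rebuild me, call build_intlist(1, 2, 3)".
        return (build_intlist, (1, 2, 3))

data = pickle.dumps(IntListHolder())
assert pickle.loads(data) == [1, 2, 3]
```

Because the constructor is resolved by module path, any stock unpickler that can import the module reproduces the object.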
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19638

Pulled By: driazati

Differential Revision: D15111203

fbshipit-source-id: 526c6c2b63a48eb1cba1c658045a7809730070dd
2019-04-29 13:45:10 -07:00
8933ff651c Remove self-include. (#19833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19833
ghimport-source-id: 396eb04317ad67980a624e71587bd57c4ac353f5

Differential Revision: D15115726

Pulled By: ZolotukhinM

fbshipit-source-id: 20677510a640b32defb63ae8cf3c22ffb1d1fdc4
2019-04-29 12:44:43 -07:00
de19eeee99 Enabled masked for a bool tensor (#19140)
Summary:
Added deprecation warnings for the masked methods and enabled them for a bool tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19140

Differential Revision: D14888021

Pulled By: izdeby

fbshipit-source-id: 0e42daf8f3732ca29f36d10485402bfc502716ad
2019-04-29 10:40:12 -07:00
39b885cbbf Add magma for CUDA 10.1 to Windows docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19914

Differential Revision: D15123029

Pulled By: soumith

fbshipit-source-id: a3d4b498a763e1a9829896d44211be00400ec39d
2019-04-29 10:13:21 -07:00
9d0b5a1ce9 Build caffe2/fb/operators (#19688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19688

Minor changes to hipify script to take extra folders.

Reviewed By: bddppq

Differential Revision: D15068427

fbshipit-source-id: e2e792c8227cbd0e15fd2564f87d740a62c477da
2019-04-29 09:01:10 -07:00
841360029a Finer grained consistency check in reducer (#19901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19901

The existing code used `expect_autograd_hooks_` as a proxy for the
situation where finalization of the previous iteration is needed. This
is not correct, however, since you may decide to completely ignore the
output of a DDP wrapped module. If this is the case, and no gradients
have been passed to the reducer, it is fine to keep going. This commit
adds a new variable `require_finalize_` that tracks whether the
finalization is really needed.

Reviewed By: mrshenli

Differential Revision: D15118871

fbshipit-source-id: 25938eaf1fe13e2940feae1312892b9d3da8a67d
2019-04-28 23:12:19 -07:00
5525c419fc Only call into reducer if torch.is_grad_enabled() (#19897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19897

During validation, gradient reduction is not needed, and autograd is
never called. The model output will always be a detached tensor. After
the new reducer was merged, this meant that it would find all model
parameters unused, and kick off reduction for them. When combined with
#19799, it would also hit the case where no parameters are used and try
to kick off reduction of zeroed gradients. Test for `torch.is_grad_enabled()` and
`self.training` before calling into the reducer.
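The guard can be sketched as a toy model with a fake reducer (hypothetical names, not DDP's actual internals):

```python
class FakeReducer:
    def __init__(self):
        self.prepared = 0

    def prepare_for_backward(self, output):
        self.prepared += 1

def ddp_forward(reducer, output, grad_enabled, training):
    # Only involve the reducer when a backward pass can actually happen;
    # during eval/no-grad the output is detached and reduction is skipped.
    if grad_enabled and training:
        reducer.prepare_for_backward(output)
    return output

r = FakeReducer()
ddp_forward(r, output=None, grad_enabled=False, training=True)  # validation
assert r.prepared == 0
ddp_forward(r, output=None, grad_enabled=True, training=True)   # training
assert r.prepared == 1
```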

Reviewed By: mrshenli

Differential Revision: D15118726

fbshipit-source-id: b0208f632a61cbe8110fa626fa427937b7f05924
2019-04-28 23:12:16 -07:00
a92e1ccd6c Remove trailing whitespace flake8 lint (#19828)
Summary:
Not useful and noisy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19828

Differential Revision: D15118966

Pulled By: suo

fbshipit-source-id: e31c05f4eac206a2a2d9508fcbef8a658d2a9baf
2019-04-28 22:19:01 -07:00
b695e562e5 Make find_unused_parameters in DDP default to False (#19895)
Summary:
As DDP in previous releases does not support unused params, turning off `find_unused_parameters` by default to derisk the new reducer.

CC pietern soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19895

Reviewed By: pietern

Differential Revision: D15118563

Pulled By: mrshenli

fbshipit-source-id: 6215c486e1dae3387b36011d8e64a2721ac85f58
2019-04-28 21:22:26 -07:00
3f7a87788f Remove redundant include from jit/fuser/cpu/dynamic_library.h. (#19863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19863
ghimport-source-id: e13569d7d5abb243d4d5703a2a35f437f359efee

Differential Revision: D15118505

Pulled By: ZolotukhinM

fbshipit-source-id: 3dd1297fdabb102c5f0f68c9b1198195a851f8d0
2019-04-28 20:34:03 -07:00
3803d1c901 Fix conda build for Windows (#19824)
Summary:
Let's test it before merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19824

Differential Revision: D15116111

Pulled By: soumith

fbshipit-source-id: 0a73de3f045ee1349061674f5f8e2aaba382493c
2019-04-27 23:10:46 -07:00
9b69da2b55 Allow for iterations where no module parameter is used (#19821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19821

It is possible that not a single parameter is used during an
iteration. If this is the case, the `prepare_for_backward` function
marks all parameters as unused, kicks off reduction of all buckets,
*and* finalizes the reduction.

This is different from the prior implementation where we assumed that
autograd would produce a gradient for at least a single parameter.
We then used the autograd callback mechanism to queue a finalizer
callback. Now, this finalizer may be executed in line.
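A toy model of that control flow (hypothetical names; not the real C++ reducer):

```python
class Reducer:
    def __init__(self, params):
        self.params = set(params)
        self.pending = set()
        self.finalized = False

    def prepare_for_backward(self, used_params):
        self.finalized = False
        self.pending = set(self.params)
        # Unused parameters are marked ready up front; if *nothing* was
        # used, the last mark_ready call below finalizes inline instead
        # of waiting on an autograd callback that would never fire.
        for p in self.params - set(used_params):
            self.mark_ready(p)

    def mark_ready(self, param):
        self.pending.discard(param)
        if not self.pending:
            self.finalize()

    def finalize(self):
        self.finalized = True

r = Reducer({"weight", "bias"})
r.prepare_for_backward(used_params=())  # iteration used no parameters
assert r.finalized                      # finalizer ran inline
```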

Reviewed By: mrshenli

Differential Revision: D15113272

fbshipit-source-id: dc91458b569cd8c106ddaeea558464b515683550
2019-04-27 22:57:59 -07:00
f0a007a26c Use QualifiedName for classes (#19575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19575
ghimport-source-id: eaed1d9d66672aadfe893e62a3a811da8ac7d966

Differential Revision: D15035281

Reviewed By: shannonzhu

Pulled By: suo

fbshipit-source-id: 7bac5e5f9223af77268bc03e08b37450a6840dbe
2019-04-27 16:13:27 -07:00
1d6e868c2f make QualifiedName a value type (#19626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19626
ghimport-source-id: 20f03b245245a7fab1c175cebc41e477b1eadc68

Reviewed By: shannonzhu

Differential Revision: D15049582

Pulled By: suo

fbshipit-source-id: 9e73a1d8ee8aa1849880815fa6fddebca408b8b4
2019-04-27 16:13:24 -07:00
096dd8a4f2 separate QualifiedName into its own file (#19566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19566
ghimport-source-id: c237f2a25d1aa9fc41f19fefe7a08a53a54279db

Differential Revision: D15032205

Reviewed By: shannonzhu

Pulled By: suo

fbshipit-source-id: 7527d97565559ebfb2556600eea5d93c1e141ac8
2019-04-27 16:13:20 -07:00
472be69a73 Avoid Output Uninitialized Blobs in Load with load_all=1 (#19133)
Summary:
When output blob names are specified while load_all=1, the output blob names are ignored. However, this behavior is not documented. In this diff, we simply disallow users from providing blob names when load_all=1.

See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133

Reviewed By: dzhulgakov

Differential Revision: D14883698

Pulled By: chandlerzuo

fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
2019-04-27 10:45:44 -07:00
268859ce0d Fix CUDA stream syncing bug in allgather and reduce_scatter (#19631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19631
ghimport-source-id: edc47e77d6ef03e966944ff98eefc22f2574eeaa

Reviewed By: mrshenli

Differential Revision: D15110077

Pulled By: mxw

fbshipit-source-id: 27a68308ade5ea511e2ea568a071eedb5d21c1ba
2019-04-27 08:35:56 -07:00
a25b79531c use fully qualified name for ScriptClasses (#19239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19239
ghimport-source-id: 830aad6dc11d2a7247760a9c7c9fc8556f70a706

Differential Revision: D14928293

Reviewed By: eellison

Pulled By: suo

fbshipit-source-id: d2efa5d7f7397526083278d6650b9cee8d967b1a
2019-04-26 19:17:21 -07:00
2ce39de3fc Add elementwise_affine for layer_norm_op (#19713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713

Add elementwise_affine for layer_norm_op

Reviewed By: houseroad

Differential Revision: D15075454

fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
f9786ad351 Add support for LONG_BINGET pickler op (#19815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19815
ghimport-source-id: dd51c13892a8f0d91d726ae8ec65206d5e81f33e

Differential Revision: D15109969

Pulled By: driazati

fbshipit-source-id: da0bb5e30038173e74ca3e0e103dc11ba1638797
2019-04-26 17:13:48 -07:00
5a83a7424d fix optional type unification (#19813)
Summary:
Previously in type unification, when we encountered an Optional[T] and a None, we would unify it to Optional[Optional[T]]. If you think about Optionals as a union of [T, None], then a union of [Optional[T], None] -> [T, None]. We should just never create an Optional of an Optional.

The other fix is to change unify_types directly, but I think this is the more general fix, and would play more nicely with our optional type refinement, which also assumes we never encounter an Optional[Optional[T]].
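The rule reduces to an idempotent wrapper; a minimal sketch with tuple-encoded types (not the JIT's actual unification code):

```python
def wrap_optional(t):
    # Unifying None with T yields Optional[T]; unifying None with an
    # Optional must be a no-op, so Optional[Optional[T]] never appears.
    if isinstance(t, tuple) and t[0] == "Optional":
        return t
    return ("Optional", t)

assert wrap_optional("int") == ("Optional", "int")
assert wrap_optional(("Optional", "int")) == ("Optional", "int")
```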
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19813

Reviewed By: suo

Differential Revision: D15103083

Pulled By: eellison

fbshipit-source-id: db803db10d6934eaa5458e7c1746546b0d0c0a6c
2019-04-26 16:14:51 -07:00
698103cdd6 DataLoader docs update to describe how workers are managed, including Windows. (#18091)
Summary:
It's been hard to understand how workers are launched and what code runs in the worker vs. the main process, especially on Windows, which leads to many of our samples failing. This explains when workers run and how to make code work on Windows as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18091

Differential Revision: D15083766

Pulled By: soumith

fbshipit-source-id: 8a7e60defc8a72ec63874f657d7d5267d951dccf
2019-04-26 16:01:30 -07:00
4e6608e86d Revert D15103223: [pytorch][PR] [CUDA 10] Resolve host_define.h warnings
Differential Revision:
D15103223

Original commit changeset: 5b56c4dd9cc4

fbshipit-source-id: f9a8e5ff0ee54cf5bb588896ab26dd9f0fb9ba45
2019-04-26 16:01:27 -07:00
42fbeef5d7 update F.grid_sample doc for clarity (#19754)
Summary:
https://github.com/pytorch/pytorch/issues/19717
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19754

Differential Revision: D15085449

Pulled By: soumith

fbshipit-source-id: 0dda05bd395d58a496bf397ca7f1c50a239b0ed1
2019-04-26 16:01:24 -07:00
dc67d9f3b9 Cleanup documentation (#19584)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19584 [jit] Cleanup documentation**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/19584

Pulled By: driazati

Differential Revision: D15104801

fbshipit-source-id: 87391fd62ee92b615e680469f8bd9a1ac654be7e
2019-04-26 15:43:07 -07:00
75754beca3 Revert D14577575: [pytorch][PR] Fix lack of state init for adagrad and add share_memory flag
Differential Revision:
D14577575

Original commit changeset: 12440079ac96

fbshipit-source-id: 935106385e608471dc280fc61cfedf19d330812d
2019-04-26 15:43:04 -07:00
11297702b9 Fix the install of TensorBoard for doc generation (#19814)
Summary:
One more fix for https://github.com/pytorch/pytorch/pull/19810

We now know that we are running with python3, so no need to check python version. The quotes were probably causing problems here.

cc ezyang soumith zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19814

Differential Revision: D15106459

Pulled By: orionr

fbshipit-source-id: 0443b9b54d17fead9c8c2c9d8d2f373e1f95a28b
2019-04-26 14:56:04 -07:00
be20d65b70 Follow up to adaptive_max_pool3d() port (#19748)
Summary:
This is a follow up PR for #19547.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19748

Differential Revision: D15103230

Pulled By: ezyang

fbshipit-source-id: e7ce925faeadea502f77ed42d52e247c8c6571d8
2019-04-26 14:34:54 -07:00
cb4d41afcd Follow up to adaptive_max_pool2d() port (#19738)
Summary:
This is a follow up PR for #19409.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19738

Differential Revision: D15103231

Pulled By: ezyang

fbshipit-source-id: 11c9fec641b389906b8accd22504a683331fa6ec
2019-04-26 14:30:09 -07:00
2573e695b0 Resolve host_define.h warnings (#19789)
Summary:
Eigen was updated with the commit needed to get rid of this warning that plagued the CI. This PR bumps third_party/eigen to that commit head.
```
warning: #warning "host_defines.h is an internal header file and must not be used directly.  This file will be removed in a future CUDA release.  Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19789

Differential Revision: D15103223

Pulled By: ezyang

fbshipit-source-id: 5b56c4dd9cc41ff1794570ba2f6abfbe23f6ab68
2019-04-26 13:52:21 -07:00
c5845c4482 Add support for reduce-scatter in c10d (#18844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18844
ghimport-source-id: c6b2f0032c7c2212be2000a9c1f262f63d878a97

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18844 Add support for reduce-scatter in c10d**
* #18820 Refactor ProcessGroupNCCL collective primitives
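
For readers unfamiliar with the collective: a reduce-scatter over N ranks element-wise sums chunk r of every rank's input and leaves that sum on rank r only. A minimal pure-Python simulation of the semantics (a hypothetical helper, not the c10d API):

```python
def reduce_scatter(inputs_per_rank):
    # inputs_per_rank[src][r] is the chunk that rank `src` contributes toward rank `r`.
    # Entry r of the result is the elementwise sum of chunk r across all ranks.
    world = len(inputs_per_rank)
    return [
        [sum(inputs_per_rank[src][r][i] for src in range(world))
         for i in range(len(inputs_per_rank[0][r]))]
        for r in range(world)
    ]
```

With two ranks contributing `[[1, 2], [3, 4]]` and `[[10, 20], [30, 40]]`, rank 0 ends up with `[11, 22]` and rank 1 with `[33, 44]`.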

Reviewed By: mrshenli

Differential Revision: D14768369

fbshipit-source-id: a9def7a0da6e9cd995e982371cc1e22f3df1a156
2019-04-26 13:46:57 -07:00
c9f380df02 Add aten mkldnn linear operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19210

Reviewed By: dzhulgakov

Differential Revision: D14901641

fbshipit-source-id: 8fa68b9941fd93cea0f313a828cba34c5c81ae11
2019-04-26 13:41:57 -07:00
48b81da4cb Add aten mkldnn view operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19209

Reviewed By: dzhulgakov

Differential Revision: D14894545

fbshipit-source-id: 69455184811de1d1444b5d494e4a9d8c83301431
2019-04-26 13:41:54 -07:00
61d5a8dded Add aten mkldnn add operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19207

Reviewed By: dzhulgakov

Differential Revision: D14889477

fbshipit-source-id: 2c5e5ea5dfc26a9c9a172c5fa2c6d7584b167e16
2019-04-26 13:41:51 -07:00
fb53c189b3 Add aten mkldnn batch_norm operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19206

Reviewed By: dzhulgakov

Differential Revision: D14887205

fbshipit-source-id: ea00c9e3205c449d08ab29535309164f951aab95
2019-04-26 13:41:48 -07:00
4864000e55 Add aten mkldnn ops: relu, max_pool2d and avg_pool2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19205

Reviewed By: dzhulgakov

Differential Revision: D14850598

fbshipit-source-id: 5bbd5909c06df9c980de680ffb81bf772766c0ba
2019-04-26 13:41:44 -07:00
3445020ca3 Add aten mkldnn conv2d operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19204

Reviewed By: dzhulgakov

Differential Revision: D14857513

fbshipit-source-id: 1172c9785e5a17a7d7360474551bdc7a511b3f2f
2019-04-26 13:41:41 -07:00
8f1445c406 Add is_mkldnn to at::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19633

Reviewed By: dzhulgakov

Differential Revision: D15053320

fbshipit-source-id: 12b9f85a025a9e957e1b7b3014ba44ae71bfd7a5
2019-04-26 13:41:38 -07:00
236c2b2387 Let script module buffer attributes can also cast device/type (#19700)
Summary:
Tested locally; this fixes #19039. No test was added since there's no way to create a script module from the C++ world.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19700

Differential Revision: D15094195

Pulled By: wanchaol

fbshipit-source-id: fcc2c1e5efbc160d976ae485ba2457442f62f065
2019-04-26 13:06:52 -07:00
5099db08d4 Ignore nn::Functional submodules in nn::Module serialization (#19740)
Summary:
Currently, the Python API doesn't serialize layers that don't have weights (such as `nn.ReLU` and `nn.MaxPool2d`, e.g. in https://github.com/pytorch/vision/blob/master/torchvision/models/densenet.py#L80-L81). If one saves a model that contains weight-less layers in Python and tries to load it into C++, the C++ module loading code (`torch::load(...)`) throws an error complaining that the expected layers are not found in the serialized file (e.g. https://github.com/pytorch/vision/pull/728#issuecomment-480974175). This PR solves the problem by ignoring layers that are not serializable (currently only `nn::Functional`) in the C++ module serialization code (`torch::save(...)` and `torch::load(...)`); the user is expected to wrap weight-less layers in `nn::Functional` so that they are ignored when serializing / deserializing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19740

Differential Revision: D15100575

Pulled By: yf225

fbshipit-source-id: 956481a2355d1de45341585abedda05e35d2ee8b
2019-04-26 12:47:23 -07:00
61d48aa989 Refactor ProcessGroupNCCL collective primitives (#18820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18820
ghimport-source-id: 220b2a3dd9d4d6d2e557e1802851f082c2dc6452

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18820 Refactor ProcessGroupNCCL collective primitives**

Planning to add reduce-scatter, but no room in my stomach for more
copypasta.

Also rewrote the tensor list validation logic.  The existing validation
was ill-suited for all the cases it was being used for; it took a vector
of input tensors and a vector of output tensors, but only ever received
either two references to the same vector, or a bespoke singleton vector
and a vector of outputs (for which it would ignore all but the first
output).  In the first case, it performed unnecessary checks, and in the
second, it skipped necessary ones.

Reviewed By: mrshenli

Differential Revision: D14762369

fbshipit-source-id: dcf882ce1c5854333a9eb4424bfc18d9f4648ddf
2019-04-26 12:38:48 -07:00
e1ebf330d5 Install TensorBoard for doc generation (#19810)
Summary:
In order to have `torch.utils.tensorboard.SummaryWriter` rendered in the documentation at the bottom of https://pytorch.org/docs/master/tensorboard.html we need to have TensorBoard installed.

This change makes it so our pinned version of `tb-nightly` is used for doc generation same as it is used for running tests at https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L45-L52

Eventually we'll use a pinned version of `pip install tensorboard`, but it's not on the release channel yet.

cc kostmo soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19810

Differential Revision: D15101730

Pulled By: orionr

fbshipit-source-id: c41678c4f9ef3d56a168f2b96a1ab05f351bdc56
2019-04-26 12:06:18 -07:00
bacc8815c7 update Anaconda download link (#19794)
Summary:
`https://www.continuum.io/` now redirects to `https://www.anaconda.com`, and the old Anaconda download link `https://www.continuum.io/downloads` is dead. This PR updates it to `https://www.anaconda.com/distribution/#download-section`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19794

Differential Revision: D15099538

Pulled By: soumith

fbshipit-source-id: 967dcda34d9d446c0d26c0014f10cc710f69a0c5
2019-04-26 09:45:44 -07:00
dafee117e8 Removing unused arg f from _model_to_graph(). (#19647)
Summary:
Input argument `f` in `_model_to_graph()` method in `torch/onnx/utils.py` is unused. This PR removes it. If there's a reason to keep it around, please let me know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19647

Reviewed By: dzhulgakov

Differential Revision: D15071720

Pulled By: houseroad

fbshipit-source-id: 59e0dd7a4d5ebd64d0e30f274b3892a4d218c496
2019-04-26 09:40:52 -07:00
0d8a3610c5 Multiple module outputs and multiple calls to backward (#19799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19799

A module that returns multiple outputs, and where the caller may end up
doing multiple calls to torch.autograd.backward, did not work with
DistributedDataParallel. It expected the first call to
torch.autograd.backward to provide gradients for ALL parameters that
expect gradients and were used in computing the module output. If you
have outputs with disjoint autograd graphs, it is fine to call
torch.autograd.backward on both and fill in the module's parameter
gradients in separate chunks.

With this change we delay queuing the finalizer callback until we have
marked all buckets as ready, instead of queueing it the first time we
receive an autograd hook. This returns the current implementation to
be functionally equivalent to the DistributedDataParallel
implementation before #18953 was merged.
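
The readiness bookkeeping described above can be sketched in plain Python (a hypothetical `Reducer`, not the actual C++ reducer):

```python
class Reducer:
    """Run the finalizer only once every bucket is ready,
    not on the first autograd hook."""

    def __init__(self, num_buckets):
        self.ready = [False] * num_buckets
        self.finalized = False

    def mark_bucket_ready(self, i):
        # Called from an autograd hook; finalize only when all buckets are in.
        self.ready[i] = True
        if all(self.ready):
            self.finalized = True
```

A second backward pass over a disjoint autograd graph simply marks the remaining buckets; nothing finalizes prematurely.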

Reviewed By: mrshenli

Differential Revision: D15097045

fbshipit-source-id: 2df023319713bc31e29a8b45108c78e6593fccd4
2019-04-26 08:20:10 -07:00
dcfb5620df Allow passing lists as trace inputs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19580

Differential Revision: D15034978

fbshipit-source-id: d3bc32ccae1c12104f2bde43fd4700d220bb3ca9
2019-04-26 02:41:57 -07:00
8f0603b128 C++ changes toward libtorch and libcaffe2 unification (#19554)
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate
  caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554

Differential Revision: D15082727

Pulled By: kostmo

fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
2019-04-26 01:38:10 -07:00
9d180e602f More topi support (#19728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19728

Added `Tanh`, `Transpose` and `Mul` support.

Reviewed By: hlu1

Differential Revision: D15078878

fbshipit-source-id: 0a0df6b0d453bc38987b6d744774c127dd6875fe
2019-04-26 00:53:11 -07:00
c182824f69 Update foxi version (#19793)
Summary:
Update foxi to the latest version for group quantization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19793

Reviewed By: jackm321, houseroad

Differential Revision: D15095982

Pulled By: zrphercule

fbshipit-source-id: 0d1cb403cbda47a4fda9035e1712fced60ced283
2019-04-25 22:39:40 -07:00
20c22bcae4 Automatic update of fbcode/onnx to 22662bfd4dcc6baebf29e3b823a051676f991001 (#19790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19790

Previous import was 27d4b617e7097cda7d0d4c45ff2b09d248f33179

Included changes:
- **[22662bfd](https://github.com/onnx/onnx/commit/22662bfd)**: Bump up version number and update Versioning for 1.5.0 release (#1965) <Raymond Yang>
- **[b1a3a8c8](https://github.com/onnx/onnx/commit/b1a3a8c8)**: fix the ci (#1964) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D15095183

fbshipit-source-id: b69cb62685122b83a1493b2702aa6ec950ee15bf
2019-04-25 22:23:25 -07:00
f0d493d290 Add devtoolset 8 (gcc 8) + glibc 2.26 + centos 7.5 rocm docker image (#19767)
Summary:
xw285cornell

Will add py3.6-devtoolset8-glibc2.26-rocmrpm-centos7.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19767

Differential Revision: D15094446

Pulled By: bddppq

fbshipit-source-id: 01a932d893cf4559f98612888308b3ad6900a038
2019-04-25 22:13:20 -07:00
98e312cf96 TensorBoard support within PyTorch (#16196)
Summary:
This PR adds TensorBoard logging support natively within PyTorch. It is based on the tensorboardX code developed by lanpa and relies on changes inside the tensorflow/tensorboard repo landing at https://github.com/tensorflow/tensorboard/pull/2065.

With these changes users can simply `pip install tensorboard; pip install torch` and then log PyTorch data directly to the TensorBoard protobuf format using

```
import torch
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
s1 = torch.rand(1)
writer.add_scalar('data/scalar1', s1[0], 0)
writer.close()
```

Design:
- `EventFileWriter` and `RecordWriter` from tensorboardX now live in tensorflow/tensorboard
- `SummaryWriter` and PyTorch-specific conversion from tensors, nn modules, etc. now live in pytorch/pytorch. We also support Caffe2 blobs and nets.

Action items:
- [x] `from torch.utils.tensorboard import SummaryWriter`
- [x] rename functions
- [x] unittests
- [x] move actual writing function to tensorflow/tensorboard in https://github.com/tensorflow/tensorboard/pull/2065

Review:
- Please review for PyTorch standard formatting, code usage, etc.
- Please verify unittest usage is correct and executing in CI

Any significant changes made here will likely be synced back to github.com/lanpa/tensorboardX/ in the future.

cc orionr, ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16196

Differential Revision: D15062901

Pulled By: orionr

fbshipit-source-id: 3812eb6aa07a2811979c5c7b70810261f9ea169e
2019-04-25 21:30:23 -07:00
97e80ab6fc Always enable autodiff check (#19787)
Summary:
disable_autodiff_subgraph_inlining should always be on so we can check for AD regressions.
Thanks eellison for spotting the test regression!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19787

Differential Revision: D15093104

Pulled By: ailzhang

fbshipit-source-id: 82a75a7dd7097d5f93a2e4074023da2105341c1b
2019-04-25 21:22:30 -07:00
48d5ab54a8 Automatic update of fbcode/foxi to 8f74bc4df3a4cfc69b1a3eadf62aa29d9961c72d AND update Glow AND update C2 (#19792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19792

This diff also contains the contents of D15092641 and D15090411 so as to not let c2, foxi, and glow get out of sync

Previous import was 81e1683d6348eee4b5ed1145222dc2c41be4269c

Included changes:
- **[8f74bc4](https://github.com/houseroad/foxi/commit/8f74bc4)**: Small fixes (#12) <Jack Montgomery>
- **[72097e4](https://github.com/houseroad/foxi/commit/72097e4)**: Add multiple quantization params per tensor (#11) <Jack Montgomery>
- **[b681fe0](https://github.com/houseroad/foxi/commit/b681fe0)**: Merge pull request #10 from jackm321/add_autoinstrument_graph_prop <Jack Montgomery>
- **[a68d835](https://github.com/houseroad/foxi/commit/a68d835)**: Add ONNXIFI_GRAPH_PROPERTY_AUTO_INSTRUMENT_NODES <Jack Montgomery>

Reviewed By: rdzhabarov, zrphercule

Differential Revision: D15086794

fbshipit-source-id: 8df02c62303b580e16a218d6be7791747e3d7213
2019-04-25 21:03:32 -07:00
7a8bc85f47 Profiler: add Self CPU Time Total, CPU time total and other general improvements (#19378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19378

Function profile events are typically nested. In this diff I
add parent child relationship to the intervals. This way we can
attribute self time easily. As a result, user printing a table from a
profiler trace gets self cpu time.
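
Given the parent-child relationship, self time is an event's own span minus the spans of its direct children. A small illustrative sketch (hypothetical event tuples, not the profiler's real data structures):

```python
def self_times(events):
    # events: list of (name, start_us, end_us, parent_index_or_None)
    totals = [end - start for _, start, end, _ in events]
    selfs = totals[:]
    for i, (_, _, _, parent) in enumerate(events):
        if parent is not None:
            selfs[parent] -= totals[i]  # child time does not count as the parent's self time
    return selfs
```

An `outer` event spanning 0-10us with an `inner` child spanning 2-5us gets 7us of self CPU time.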

This diff doesn't try to address CUDA self time as CUDA kernels are
already getting special care in the profiler.

There are also some other minor improvements. Like reporting total CPU
time spent, reversed sorting, aggregated data after the table,
etc.

There is a new unit test added which tests more functionality than
previous profiler test

Reviewed By: zheng-xq

Differential Revision: D14988612

fbshipit-source-id: 2ee6f64f0a4d0b659c6b23c0510bf13aa46f07dc
2019-04-25 20:53:55 -07:00
6e06154c13 Quantized SumRelu (#19319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19319

Quantized SUM + ReLU (fused). The implementation is the same as the one in DNNLOWP.

Reviewed By: jianyuh

Differential Revision: D14866442

fbshipit-source-id: c8c737a37e35b6ce3c1c2077c07546aba16e0612
2019-04-25 18:01:21 -07:00
76307667ca Use the QTensor with QReLU (#19312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19312

Replaces the tuple hack with the QTensor. Please note this can be landed ONLY after #18960 (D14810261) is landed.

Reviewed By: raghuramank100

Differential Revision: D14819460

fbshipit-source-id: 75ca649304b1619cb3cfe845962c9f226b8f884a
2019-04-25 18:01:17 -07:00
db9008496e Changing the rounding in the QTensor (#19714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19714

The quantizer's rounding was `round(x/scale) + zp`. To make it consistent, this changes it to `round(x/scale + zp)`.
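
With an integer zp the two forms agree except at rounding ties: under round-half-to-even (Python's `round`; the quantizer's actual rounding mode may differ) the outcome of a tie depends on the parity of the value being rounded, so an odd zp can shift the result. Illustrative values only:

```python
x_over_scale, zp = 2.5, 3  # x/scale lands exactly on a tie; zp is odd

old = round(x_over_scale) + zp   # round(2.5) -> 2 (half to even), so 5
new = round(x_over_scale + zp)   # round(5.5) -> 6 (half to even)

print(old, new)  # 5 6
```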

Reviewed By: raghuramank100

Differential Revision: D15077095

fbshipit-source-id: 5d20a90391fe8c2e11b338c05631fcf7770320c3
2019-04-25 18:01:13 -07:00
e814c11045 Fix env vars needed for devtoolset7 binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19780

Differential Revision: D15091963

Pulled By: pjh5

fbshipit-source-id: 2594395b2313d5c8a37db28965d99b0541a227e3
2019-04-25 17:50:14 -07:00
c5cca65351 Fixing update_s3_htmls for binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19746

Differential Revision: D15091326

Pulled By: pjh5

fbshipit-source-id: ed172c678dd5659fa31d5d9b6ee1bf119ede2889
2019-04-25 17:24:02 -07:00
9ef8eb4cbc Fix case for activations attribute in nn.RNN ONNX export. (#19368)
Summary:
This PR addresses the https://github.com/pytorch/pytorch/issues/19366 issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19368

Reviewed By: zrphercule

Differential Revision: D15043949

Pulled By: houseroad

fbshipit-source-id: 9b90410307d31bc5f2fd14aa0cdd33b22572ed7c
2019-04-25 16:31:25 -07:00
a425e1cbf8 Remove duplicate inlineCallToCode (#19724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19724
ghimport-source-id: a68d28ac9bbe62dd61f03bfd9d57f4ef1d0ce9c9

Reviewed By: jamesr66a

Differential Revision: D15078532

Pulled By: zdevito

fbshipit-source-id: bebd34ff6105f538395260b027dc169448b5bc96
2019-04-25 15:53:10 -07:00
330990d878 Serialize first-class version of functions (#19723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19723
ghimport-source-id: 7f7ec6200c3b42d19046a3e228a3d82212697f14

Reviewed By: jamesr66a

Differential Revision: D15078533

Pulled By: zdevito

fbshipit-source-id: fe421afab9607ee942f6d200f04bb6335fc0aa97
2019-04-25 15:53:07 -07:00
6cb1b994d8 Trace directly into first-class module form. (#19722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19722
ghimport-source-id: b024666feccb324f5ba9aae4a6301723e04d9846

Reviewed By: jamesr66a

Differential Revision: D15078535

Pulled By: zdevito

fbshipit-source-id: b866b31c1864a090c545560cbecee81e34ad2d16
2019-04-25 15:53:03 -07:00
31524bda1f @torch.jit.script(fn) now is a torch.jit.Function (#19721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19721
ghimport-source-id: b4f5024adc845a82dc5197d19aab1496bf85089f

Reviewed By: jamesr66a

Differential Revision: D15078534

Pulled By: zdevito

fbshipit-source-id: 408d3a871302c5ac5d6426dc5de567f2188ebf4c
2019-04-25 15:53:00 -07:00
12f7c2dea3 pybind CompilationUnit and Function directly (#19720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19720
ghimport-source-id: c5829234dbbe8f7fe719ffce3fa92ce5198ffd21

Reviewed By: jamesr66a

Differential Revision: D15078536

Pulled By: zdevito

fbshipit-source-id: e617de31fc907a408fb50e18d9358dfd64de1f9e
2019-04-25 15:52:57 -07:00
bf5a5c2a31 caffe2 | Use _aligned_free in WorkerPool destruction (#19751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19751

This has probably never been tested on Windows, but destruction of WorkersPool crashes because it allocates with `_aligned_malloc` and deallocates with `free`, which is not symmetric. The fix is to use `_aligned_free` for deallocation.

Reviewed By: hlu1

Differential Revision: D15083472

fbshipit-source-id: 42243fce8f2dfea7554b52e6b289d9fea81d7681
2019-04-25 14:54:50 -07:00
65496e4e67 Bug fix in bound shape inferencer (#19729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19729

Accessing dims() without a boundary check is unsafe.

Reviewed By: zrphercule

Differential Revision: D15078912

fbshipit-source-id: 3746d0c18261abeec0c4880c30430125928c3309
2019-04-25 14:50:19 -07:00
556c8a300b Fall back to asking nvcc for detecting cuda version if no *cudaart* is found (#19741)
Summary:
This happens on Debian/Ubuntu with distribution-provided cuda repackaging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19741

Differential Revision: D15082550

Pulled By: soumith

fbshipit-source-id: 2ca39c6cdc9305896529b6fd537270116223cd6c
2019-04-25 10:54:20 -07:00
5025d1d5e4 Automatic update of fbcode/onnx to 27d4b617e7097cda7d0d4c45ff2b09d248f33179 (#19718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19718

Previous import was 0e8d2bc5e51455c70ef790b9f65aa632ed9bc8a7

Included changes:
- **[27d4b617](https://github.com/onnx/onnx/commit/27d4b617)**: Adding RoIAlign operator (#1869) <Sam Pepose>
- **[70c9026c](https://github.com/onnx/onnx/commit/70c9026c)**: add ReverseSequence op (#1927) <Guoliang Hua>
- **[ed2db02a](https://github.com/onnx/onnx/commit/ed2db02a)**: README.md: Update badge style for build status (#1942) <Yulong Wang>
- **[e36d3b54](https://github.com/onnx/onnx/commit/e36d3b54)**: Enable python 3.7 in CI for Windows (#1943) <Raymond Yang>

Differential Revision: D15077516

fbshipit-source-id: c8c6935381ff5a96ab9a4ee519685814f4ea6e59
2019-04-25 10:54:15 -07:00
bbedadddce Fix Circle CI for ONNX repo (#19725)
Summary:
The new pip package is more restrictive. We need to add an extra flag to make the installation work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19725

Differential Revision: D15078698

Pulled By: houseroad

fbshipit-source-id: bbd782a0c913b5a1db3e9333de1ca7d88dc312f1
2019-04-25 10:48:41 -07:00
0effe1d4a4 Make interpolate bicubic match opencv result (#19703)
Summary:
Fixes #19650
When driazati started the bicubic implementation, we used the TF result as ground truth. It turns out the OpenCV version of bicubic resize is more commonly used.
This PR does two things:
- Fix a bug where we didn't use area mode to compute the source index
- Follow the OpenCV logic to handle computed negative source indices (we used to clamp them at 0)
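
For reference, bicubic interpolation weighs the four nearest samples along each axis with a cubic convolution kernel; OpenCV uses the coefficient a = -0.75. A standalone sketch of that kernel (not the PR's actual code):

```python
def cubic_weight(x, a=-0.75):
    # Cubic convolution kernel; a = -0.75 matches OpenCV's bicubic resize.
    x = abs(x)
    if x <= 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0
```

The kernel is 1 at the sample itself, 0 at every other integer offset, and 0 beyond a distance of 2.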
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19703

Differential Revision: D15078159

Pulled By: ailzhang

fbshipit-source-id: 06a32baf2fbc93b90a156b863b4f9fab326d3242
2019-04-25 10:21:31 -07:00
29d8711ef0 Fix compilation on Windows 10 (CUDA 10.0, Visual Studio 2017) (#19615)
Summary:
I want to use libtorch in a C++/CUDA project but as soon as I include `<torch/torch.h>`, ".cu" files fail to compile:

`torch/csrc/jit/script/tree.h(64): error C3520: 'args': parameter pack must be expanded in this context`

This PR makes it build on my machine (don't know if it breaks anything though).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19615

Differential Revision: D15063712

Pulled By: ezyang

fbshipit-source-id: 7561e705f8f5b42b8e6a23430710b36508fee1ee
2019-04-25 09:37:17 -07:00
af06d6342c Add SGDR(Stochastic Gradient Descent with Warm Restarts) scheduler (#17226)
Summary:
Because of merge error with master in #15042, open a new PR for ezyang.
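
The schedule being added (SGDR) anneals the learning rate with a cosine within each restart cycle: η_t = η_min + ½(η_max − η_min)(1 + cos(π · T_cur / T_i)). A standalone sketch of one cycle (hypothetical helper name, not the scheduler's API):

```python
import math

def sgdr_lr(eta_min, eta_max, t_cur, t_i):
    # Learning rate t_cur steps into a restart cycle of length t_i (SGDR).
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

The rate starts each cycle at η_max, decays to η_min by the end of the cycle, then "warm restarts" back to η_max.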
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17226

Differential Revision: D14418145

Pulled By: mrshenli

fbshipit-source-id: 099ba225b28e6aba71760b81b2153ad1c40fbaae
2019-04-25 09:26:31 -07:00
465799fab3 Replace cpu_apply with TensorIterator inside of Copy function (#18618)
Summary:
Replace the cpu_apply functions with TensorIterator.
Vectorize the copy and clone functions.
Move big pieces of the code to the cpu kernels folder to be able to use AVX2.
Add a fast path for the copy_ function when the tensor types match.

A slowdown was observed on smaller tensors (up to 10%, or about 1us per op), which might be explained by the bigger CPU footprint of TensorIterator compared to the simpler cpu_apply. Conversely, on bigger tensors we see a 2x-3x performance improvement (single-threaded; multithreading gives an even bigger boost).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18618

Differential Revision: D14954118

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d9bdf3fd9d5e539a03071cced50d0a47bac1615
2019-04-25 08:09:14 -07:00
6c7135decb fix typo: pytoch -> pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19719

Differential Revision: D15080095

Pulled By: ezyang

fbshipit-source-id: b731a0fde87d25c63c1e3d4b9a9c2244e5ad84af
2019-04-25 06:40:40 -07:00
3875e1ba45 try to make at::cat in mm_tree_reduction operate on contig tensors (#18816)
Summary:
Sometimes at::cat gets transposed inputs and goes on a slow path. Also, make jit_premul lstm benchmark add bias to the whole input tensor to avoid separate reduction kernels in the backward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18816

Differential Revision: D15013576

Pulled By: wanchaol

fbshipit-source-id: bcfa1cf44180b11b05b0f55f034707012f66281a
2019-04-24 23:44:25 -07:00
c571969148 Fix the insert_guard for norm decomposition (#19646)
Summary:
Move the insert_guard all the way up to the beginning of the decomposition. This fixes the case where we lose the insert-point context after decomposeCommonNormalization but still need to modify the graph.

fixes #19502
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19646

Differential Revision: D15058040

Pulled By: wanchaol

fbshipit-source-id: ebdbf8623ebfe4556c461e1b650e94b905791adb
2019-04-24 23:12:37 -07:00
3c81eb3aa7 add max_pool2d to AD, add tests for both autodiff and inference mode
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19661

Differential Revision: D15074431

Pulled By: wanchaol

fbshipit-source-id: b31cf2126c2c5d6a12c2ef5dc67b57677652f1fc
2019-04-24 22:19:51 -07:00
cbd0a2d3c9 Fix the depthwise 3x3x3 fast path criteria for the stride (#19692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19692

Remove the requirement on stride for the optimized depthwise 3x3x3 kernels.

Reviewed By: jspark1105

Differential Revision: D15070214

fbshipit-source-id: 9fe2d8e96930166e4eb0e2dd2288f6a0c4831e0a
2019-04-24 21:35:27 -07:00
614871d948 use relative path to load libthnvrtc (#19690)
Summary:
We had a few hard-to-repro cases where, very occasionally, libthnvrtc failed to load due to what looked like a garbled dladdr return in the `info.dli_fname` field. We could not root-cause why this happens, but this workaround avoids the problem altogether. $ORIGIN is already added to RPATH as the first search location, so dlopen("libthnvrtc.so") will look for libthnvrtc in the caller's (`libtorch.so.1`) directory, which was the purpose of the previous code that obtained the `libtorch.so.1` directory using dladdr.
```
root@4ec0aab027a0:/opt/conda/lib/python3.6/site-packages/torch/lib# readelf -d ./libtorch.so.1 | grep RPATH
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN:/usr/local/cuda/lib64:/opt/conda/lib]
```
Hopefully, the same holds on Mac.
cc zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19690

Differential Revision: D15076990

Pulled By: soumith

fbshipit-source-id: a4d2992ccf26953f1fc73f17c4e752d69c58e2fc
2019-04-24 21:09:31 -07:00
72b8b6c374 Change some comments related to moving copy_ to native (#19618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19618
ghimport-source-id: 6bb9965f2f7b72f602f03e27b664d7d7696edd00

Differential Revision: D15048632

Pulled By: li-roy

fbshipit-source-id: a2707e3086f3a9993780a7f76104c5f00f2a9618
2019-04-24 19:23:06 -07:00
17e4cd0c0a Remove old complex Types (#19616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19616
ghimport-source-id: d8b3e15d84d3e6f810af3cb83d1413c5f048bcdc

Differential Revision: D15047741

Pulled By: li-roy

fbshipit-source-id: 572045f88f410d97f60c56298018bfee6268b375
2019-04-24 19:18:16 -07:00
6fead42eb8 Remove function variant of copy_ (#19622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19622
ghimport-source-id: 41eadd845e2cbd113735b8650dca39354e1c7f1d

Differential Revision: D15049274

Pulled By: li-roy

fbshipit-source-id: 4f7d7cb2c8339e5e7e35f95397fe6a3f4b7c74f3
2019-04-24 17:57:18 -07:00
a6811e17c0 Restore copy_ overload with async arg (#19641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19641
ghimport-source-id: 7099221334505bacdc209cff8bf29e3004c30379

Differential Revision: D15056755

Pulled By: li-roy

fbshipit-source-id: e9063b606e72a70fc1270fbcdcf1c0b23d876dd3
2019-04-24 17:51:50 -07:00
c08f3d06c3 Add some of nn.init to weak script (#19640)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19640 [jit] Add some of nn.init to weak script**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/19640

Pulled By: driazati

Differential Revision: D15065332

fbshipit-source-id: 30df9f02e527cd5e5ebe34b7e003444eae96c66d
2019-04-24 17:00:48 -07:00
9aa0e6078f Support serializing std::vector<torch::Tensor> (#19677)
Summary:
In the distributed training development work, we need to be able to serialize a `std::vector` of `torch::Tensor`s. This PR adds support for serializing `std::vector<torch::Tensor>`.

cc. mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19677

Differential Revision: D15069860

Pulled By: yf225

fbshipit-source-id: 505147e5f5fea78be1bf60fb8418bc187dbc2a98
2019-04-24 16:50:16 -07:00
32174bedb8 Fix fuser tests on sandcastle (#19684)
Summary:
suo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19684

Reviewed By: suo

Differential Revision: D15067942

Pulled By: jamesr66a

fbshipit-source-id: 697a836ea37dab78fffd092194cecd8294ca9907
2019-04-24 16:50:13 -07:00
3d6e956412 Add LONG_BINPUT to unpickler (#19696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19696
ghimport-source-id: 8d711cd3ed2b2810b5b3d765564429882f96d1f1

Differential Revision: D15072658

Pulled By: driazati

fbshipit-source-id: a28a90218874e07cfbed4f8df3d6d23ae5e70933
2019-04-24 16:39:23 -07:00
62447a5aa3 improve err msg (#19645)
Summary:
Print out the tensor value when throwing the "cannot insert tensor with grad" error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19645

Differential Revision: D15057809

Pulled By: eellison

fbshipit-source-id: 3f622ef1322a75c965e780275f1fb447e9acf38d
2019-04-24 16:22:07 -07:00
6ec55c13a9 Enable assignment for QTensor in pytorch frontend (#19676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19676

Make copy work with QTensor, enable assignment of QTensor in pytorch frontend.

Differential Revision: D15064710

fbshipit-source-id: 04f2dc02a825695d41fa1114bfca49e92108fef3
2019-04-24 16:05:34 -07:00
4a65ee95cc Make torch.equal work with boolean CPU tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19604

Differential Revision: D15056022

Pulled By: li-roy

fbshipit-source-id: 1309b107b2d4ee0a490bce1b43c3c175180a1580
2019-04-24 15:51:10 -07:00
d14abe3aff Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688)
Summary:
Porting `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688

Differential Revision: D15012644

Pulled By: VitalyFedyunin

fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd
2019-04-24 15:38:56 -07:00
d247912dbf Add no-gpu build mode for all of PyTorch and Caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19687

Differential Revision: D15023347

fbshipit-source-id: 5bed0d72e8ff337e066c142ca5c8e2c2bae93746
2019-04-24 13:27:59 -07:00
c855e04d5f Caffe2 shouldn't fail if CUDA peer access is already enabled
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19586

Differential Revision: D15061544

Pulled By: dzhulgakov

fbshipit-source-id: 6a5f9f4fe45259d689671f58ad5206cdaf15c5bd
2019-04-24 13:22:27 -07:00
960513006f Support exporting squeeze & unsqueeze with negative dim attribute
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19297

Reviewed By: zrphercule

Differential Revision: D14953525

Pulled By: houseroad

fbshipit-source-id: 8d7eecd2804b8e27d3ee4ad6e763352818d02d0c
2019-04-24 12:45:59 -07:00
b675f07bb6 Remove useless input shape checker in conv (#19608)
Summary:
The input shape checker in the conv/int8_conv operators aims to avoid an issue when running with MKL-DNN Winograd: the weights have to be reordered each time the input shape changes.
However, the checker results in a big performance regression due to frequent reorders.

Meanwhile, in mkldnn-bridge this case has already been fixed by correcting the prop_kind.
Therefore, we remove the now-useless checker to fix the performance regression.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19608

Differential Revision: D15061169

Pulled By: yinghai

fbshipit-source-id: 649a43ae6fce989e84939210f6dffb143ec3d350
2019-04-24 11:39:43 -07:00
87a6974193 Make it possible for self.forward to return a ScriptMethod (#19217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19217
ghimport-source-id: 6fdd7f5ac041dae950b47ca316f30682ede0b083

Reviewed By: suo

Differential Revision: D14922120

Pulled By: zdevito

fbshipit-source-id: 5e82e5d7ee72df6f401146d2519c80ea336ff40e
2019-04-24 11:14:34 -07:00
2f73b3d26e Add if ops support for onnxifi and ssa-rewrite (#19585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19585

Originally we unrolled every If op into many different subnets.
Now we no longer unroll it; instead, we add all external inputs of its subnets to the If op and SSA-rewrite all external inputs/outputs. That is sufficient.

Reviewed By: yinghai

Differential Revision: D15038139

fbshipit-source-id: 8532216d8749068acd5558ad0d8cb1d98463a063
2019-04-24 11:01:13 -07:00
41486306d9 GCC ABI variants for nightly builds (#18888)
Summary:
closes #17492
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18888

Differential Revision: D15065093

Pulled By: pjh5

fbshipit-source-id: 6abeabf68b91106fc8ae9df238f6a40613d40b57
2019-04-24 10:08:56 -07:00
5e62ee2b97 Fix no SIGCHLD checking in DataLoaderIter._shutdown_workers (#19421)
Summary:
Also

1. Bump multiprocessing test timeout following python core tests
2. Fix one type of flakiness in `test_proper_exit`.
3. Add trace reporting when loader process hangs in `test_proper_exit` using `faulthandler`.
4. Give `test_proper_exit` another try.

I'll heavily retest this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19421

Differential Revision: D15063728

Pulled By: ezyang

fbshipit-source-id: 4e0d992622e11053c44a9ec237b88b9a28a4472c
2019-04-24 08:06:58 -07:00
c42f3f9055 Revert D15008160: Enable assignment for QTensor in pytorch frontend
Differential Revision:
D15008160

Original commit changeset: 5f1166246d76

fbshipit-source-id: 24c7350431ae6a87199d6e3f7ffbbc8ec7d3c28b
2019-04-24 06:58:13 -07:00
84b275b70f fix rocm test (#19663)
Summary:
For some reason, `exec` in Python fails on the ROCm build alone.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19663

Differential Revision: D15061382

Pulled By: eellison

fbshipit-source-id: d6e1776e88c22de973796e5080147e6d31aba477
2019-04-24 00:48:27 -07:00
8273b9b3cb Enforce consistent dict iteration order for trace inputs. (#19528)
Summary:
Stack:
- **#19528 [pytorch] Enforce consistent dict iteration order for trace inputs.** [💛](https://our.intern.facebook.com/intern/diff/D15023656/)

Don't iterate down unordered_maps and expect ordering. Should fix test flakiness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19528

Differential Revision: D15023656

Pulled By: efaust

fbshipit-source-id: 91c9a31a8652fcf93ae0e942bea4cec67bb490c9
2019-04-23 23:36:48 -07:00
309c15e2df Enable assignment for QTensor in pytorch frontend (#19530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19530
Make copy work with QTensor, enable assignment of QTensor in pytorch frontend.

Differential Revision: D15008160

fbshipit-source-id: 5f1166246d768b23f009cde1fa03e8952368a332
2019-04-23 21:29:31 -07:00
d902774cad Dont introduce aliasing in CSE or Constant Pooling (#19576)
Summary:
We can't introduce aliasing to a graph output, since they may be mutated after.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19576

Differential Revision: D15057734

Pulled By: eellison

fbshipit-source-id: 33594c05d985a0c58edebd6252e1ee2c0efb6f0e
2019-04-23 20:39:09 -07:00
ba1cf38718 Remove QTensor alias (#19635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19635

att

Differential Revision: D15053349

fbshipit-source-id: 7cd0e6c9ff567d05b051527410f452b059458af2
2019-04-23 20:34:11 -07:00
8b798f43e3 Commit explicit libtorch_python sources (#19607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19607

Explicit is better than implicit: it's pretty hard to debug where a particular file is if it's not greppable.

As a follow-up step, we should look at whether we can just include build_variables.py in CMake directly to share the setup of the two build systems.

Reviewed By: ezyang

Differential Revision: D15023348

fbshipit-source-id: 600ef2d1871bc28530c6a02681b284f7499904df
2019-04-23 19:49:42 -07:00
5119cc7cdf builtin ivalues sort (#19572)
Summary:
Add sorting to all the lists which we specialize on (Tensor, int, float, bool).

First part of https://github.com/pytorch/pytorch/issues/19372
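A small sketch of what this enables in TorchScript: an in-place `sort` on a list with a specialized element type (the function name here is illustrative):

```python
from typing import List

import torch

@torch.jit.script
def ascending(xs: List[float]) -> List[float]:
    xs.sort()   # in-place sort on a specialized float list
    return xs

result = ascending([1.5, -2.0, 3.0])  # [-2.0, 1.5, 3.0]
```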
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19572

Differential Revision: D15052677

Pulled By: eellison

fbshipit-source-id: 301e8e0e3e29e04aca1311410db0a474fd833cff
2019-04-23 16:38:08 -07:00
80020b3d2d Guard {set,rebase}_history on grad_fn check (#19623)
Summary:
We would previously have statements like

```
set_history(flatten_tensor_args( result ), grad_fn);
```

Internally, {set,rebase}_history would check grad_fn and short circuit if it is nullptr. However, this means that we are executing the expression `flatten_tensor_args( result )` and immediately throwing away the results. This was causing unnecessary allocations + overhead.

My JIT overhead benchmark script (with custom benchmark method):

```
import torch, time

@torch.jit.script
def add(x, y):
    return x + y

a = torch.rand([])
b = torch.rand([])

niter = 1000000

with torch.no_grad():
    s = time.time()
    add.__getattr__('forward').benchmark(niter, a, b)
    e = time.time() - s
    print('overhead per call (us)', e / niter * 1e6)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19623

Differential Revision: D15053399

Pulled By: jamesr66a

fbshipit-source-id: 8777e1a2b5c5a5bbd3a035b7247c8154c5fc4aa6
2019-04-23 15:40:11 -07:00
fb9fc42a0c optimize BatchMatmulOp (#18612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18612

optimize BatchMatmulOp

Reviewed By: houseroad

Differential Revision: D14681665

fbshipit-source-id: cf5ea4909ace58fd44fe6fa634531102ac84e851
2019-04-23 15:34:59 -07:00
176bdc0722 fix lint (#19632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19632

at

Differential Revision: D15052952

fbshipit-source-id: 7c38fad99799e5ac914685c36eadf932afe52b74
2019-04-23 15:29:38 -07:00
9b272affde Add base support to torch.logspace, default base=10 (#19542)
Summary:
Add base support for torch.logspace; see #19220 for details.
SsnL, can you give feedback? Thanks a lot.
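A minimal sketch of the new argument, assuming the current signature where `base` defaults to 10:

```python
import torch

# logspace returns `steps` points spaced evenly on a log scale:
# base**start, ..., base**end
t = torch.logspace(start=0, end=3, steps=4, base=2)
# -> tensor([1., 2., 4., 8.])
```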
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19542

Differential Revision: D15028484

Pulled By: soumith

fbshipit-source-id: fe5a58a203b279103abbc192c754c25d5031498e
2019-04-23 15:06:34 -07:00
96b966297e disable flake8 E302 (two blank lines) (#19634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19634
ghimport-source-id: 68b11ac3c19daf8df3bbf11e6181e9450899e90a

Differential Revision: D15053466

Pulled By: suo

fbshipit-source-id: 09d7859aa2059fc9eb3b47fa62467537bab40e05
2019-04-23 15:06:31 -07:00
6b8771a7a6 fix nn.Sequential doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19597

Differential Revision: D15042383

Pulled By: soumith

fbshipit-source-id: f912ed2a726a17fcc25795ff66b73ae4caacd247
2019-04-23 14:58:16 -07:00
70b82d28b8 caffe2 | Windows compat fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19531

Reviewed By: hlu1

Differential Revision: D15024541

fbshipit-source-id: cd8249a6d529afb65fa8afd74a05dbfe73eb1fb0
2019-04-23 14:30:19 -07:00
2e048feb9e Remove fixed TODO (#19590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19590

-

Reviewed By: ezyang

Differential Revision: D15039561

fbshipit-source-id: 246cf4fa91a33cb4c96750b534b8c3d0c312f311
2019-04-23 13:50:22 -07:00
55e53d3d7e correct comments in group_norm_op (#19621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19621

Comments for group_norm_op is not accurate (i.e., the math part), this diff will fix it.

Reviewed By: BIT-silence

Differential Revision: D15048695

fbshipit-source-id: 27d41d3ae21054257967815254134849944d56ca
2019-04-23 13:31:15 -07:00
5f82d59c0a Simplify argument test cases (#19593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19593

Removes a lot of duplication

Reviewed By: dzhulgakov

Differential Revision: D15039887

fbshipit-source-id: e90fe024b84220dd337fdd314d8f7e3620baec28
2019-04-23 12:58:35 -07:00
fddd763ec1 Add test cases for optional of list (#19592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19592

This is already supported but wasn't tested yet

Reviewed By: ezyang

Differential Revision: D15039888

fbshipit-source-id: dc8ea724c76dd1719b1d4810a20c8f958e5beecc
2019-04-23 12:58:32 -07:00
fc8834df4b Port adaptive_max_pool3d() to ATen (#19547)
Summary:
This is the second part of #18064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19547

Differential Revision: D15046630

Pulled By: ezyang

fbshipit-source-id: 03f80602b94d47bca66bfd0dcab1b7bb99e5b7f1
2019-04-23 12:51:25 -07:00
0922a64d22 add torch.tensor requires grad (#19445)
Summary:
Add support for setting requires_grad = True on torch.tensor within TorchScript.

Within constant propagation, we can't insert any constants that require grad.

Also added shape analysis and requires_grad analysis to torch.tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19445

Differential Revision: D15046211

Pulled By: eellison

fbshipit-source-id: b4ef7a6b4b6b8dc03e1fa49f87dc415874cd1998
2019-04-23 12:27:52 -07:00
4e8cc8ee90 Surface the Glow traces to C2 (#19087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19087

att

Reviewed By: jackm321

Differential Revision: D14863112

fbshipit-source-id: 2680161b9f05391e73bb8dac4fbbeabb87a82c05
2019-04-23 12:27:49 -07:00
444f792fa6 Fix lack of state init for adagrad and add share_memory flag (#17679)
Summary:
The current code initializes the `state` in the `__init__` method, but that initialization is never invoked from `add_param_group`.

I followed the same approach as the other optimizers to initialize the `state`.

```python
import torch

emb = torch.nn.Embedding(10,10)
emb2 = torch.nn.Embedding(10,10)

optim = torch.optim.Adagrad(emb.parameters())
print(optim.state[emb.weight])  # already initialized

optim.add_param_group({'params': emb2.parameters()})
print(optim.state[emb2.weight])  # empty dict

loss = emb2.weight.sum() + emb.weight.sum()
loss.backward()
optim.step()  # raised KeyError
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17679

Differential Revision: D14577575

Pulled By: ezyang

fbshipit-source-id: 12440079ac964b9eedad48e393d47f558babe300
2019-04-23 12:22:19 -07:00
0d0acba3bd Allow extracting element-wise loss in softmax (#19579)
Summary:
Oftentimes we want to experiment with the loss per element (per image, etc.). This changeset allows getting the per-element loss as well. This output is optional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19579

Reviewed By: jerryzh168

Differential Revision: D15035797

Pulled By: prigoyal

fbshipit-source-id: 562dea514f49c1f2f1cbbc083a1938dc019a75c4
2019-04-23 11:49:49 -07:00
e9c8f372c4 dispatch max_pools with no indices, expose max_pools to torch namespace (#19449)
Summary:
In the functional interfaces we do boolean dispatch, but always to max_pool\*d_with_indices. This changes it to emit the max_pool\*d op instead when it is not necessary to expose the with_indices ops to different backends (for the JIT).

It also binds max_pool\*d to the torch namespace, which matches the behavior of avg_pool\*d.
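A hedged sketch of the resulting surface, assuming the current API:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
# torch-namespace op: values only, no indices computed
y = torch.max_pool2d(x, kernel_size=2)
# functional interface: indices only when explicitly requested
v, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)
# y and v hold the same pooled values
```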
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19449

Differential Revision: D15016839

Pulled By: wanchaol

fbshipit-source-id: f77cd5f0bcd6d8534c1296d89b061023a8288a2c
2019-04-23 11:20:05 -07:00
f3be2816ae Adds fakeQuantizePerTensorAffineOp to pytorch (#19387)
Summary:
Adding a fake-quantization op so that we can use it in PyTorch models; the exact implementation might change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/19387

Differential Revision: D13739657

fbshipit-source-id: d5cb084e843d236bb1da9827ac1ba3900ed99786
2019-04-23 11:12:53 -07:00
1b3967b491 -fno-math-errno -fno-trapping-math (#19552)
Summary:
As suggested in https://github.com/pytorch/pytorch/pull/19152#discussion_r275925767, this may give the compiler more opportunities for auto-vectorization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19552

Differential Revision: D15048358

Pulled By: jamesr66a

fbshipit-source-id: db2c2c515c3e9f7d22305c039ab0c8a867fc43a2
2019-04-23 11:06:49 -07:00
d8729efabe Only require python print on certain namespaces (#19383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19383
ghimport-source-id: b93c7849a52d11ecbf26b614704740d44a2447f9

Differential Revision: D15032727

Pulled By: bwasti

fbshipit-source-id: a19f72abb99e63d87eab13022538f325b2e20526
2019-04-23 10:52:48 -07:00
3cc60e54e3 Use fbgemm for quantize/dequantize ops (#19500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19500

Changes the `quantize_linear` and `dequantize` to `fbgemm`-based implementation.

Reviewed By: jianyuh, jerryzh168

Differential Revision: D15014561

fbshipit-source-id: b651e69d336b5b08b4a75a4a4eddf46c040a4934
2019-04-23 10:30:12 -07:00
714344a976 Specify to use Float16UniformFill if necessary in sparse lookup layer (#18499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18499

If the init op is not fp16 compatible, it should throw.
However, in the special case where the original init op is UniformFill,
we replace it with Float16UniformFill

Reviewed By: kennyhorror

Differential Revision: D14627209

fbshipit-source-id: eb427772874a732ca8b3a25d06670d119ce8ac14
2019-04-23 10:14:08 -07:00
e3f1504621 Fix the Division by Zero Bug of CosineAnnealingLR (#19180)
Summary:
Added the formula for the corner case. Updated unit tests.

Fixes #17913
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19180

Differential Revision: D14942023

Pulled By: ezyang

fbshipit-source-id: 167c109b97a7830d5b24541dc91e4788d531feec
2019-04-23 09:54:28 -07:00
7a4189696f Fix the documentation for BCEWithLogitsLoss (#17218, #16804) (#19212)
Summary:
I fixed a mistake in the explanation of the `pos_weight` argument in `BCEWithLogitsLoss` and added an example.
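For illustration, a small sketch in the spirit of the added example. A `pos_weight` greater than 1 increases the penalty on false negatives (trading precision for recall); the numbers here are illustrative:

```python
import torch

# 10 samples, 3 classes; weight each positive example as if it were 3 examples
target = torch.ones(10, 3)
output = torch.full((10, 3), 1.5)
pos_weight = torch.full((3,), 3.0)

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(output, target)
# with all targets = 1, this equals pos_weight * softplus(-logit)
```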
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19212

Differential Revision: D14923431

Pulled By: ezyang

fbshipit-source-id: 15696c67d56789102ac72afbe9bdd7b667eae5a0
2019-04-23 09:54:24 -07:00
bb05f70724 fix the docstring of RandomSampler (#19113)
Summary:
fix
- the order of `Arguments` in `RandomSampler` doc
- the meaningless check of `replacement`'s type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19113

Differential Revision: D15013081

Pulled By: ezyang

fbshipit-source-id: 39e367f42841de6814b1214eb9df7b75f14f747e
2019-04-23 09:54:20 -07:00
83cf9473dc Avoid (future) cusparse name collision (#19591)
Summary:
A future version of cusparse will define "cusparseGetErrorString." This PR simply updates PyTorch's name for this function to "getCusparseErrorString" to avoid the collision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19591

Differential Revision: D15046871

Pulled By: ezyang

fbshipit-source-id: 821304f75fe84c68a26680a93809a18cfdbd540b
2019-04-23 09:40:15 -07:00
f767c9ac76 Add docs and test guaranteeing indices from torch.nonzero ordered C-style (#19539)
Summary:
See #17556.
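The guaranteed ordering can be sketched as:

```python
import torch

x = torch.tensor([[0, 1],
                  [1, 0]])
idx = torch.nonzero(x)
# Indices come out sorted lexicographically, i.e. C-style / row-major order:
# tensor([[0, 1],
#         [1, 0]])
```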
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19539

Differential Revision: D15030151

Pulled By: ezyang

fbshipit-source-id: d46ee56a66d89b0113f86e3f8693dc1680d0adb9
2019-04-23 09:29:21 -07:00
3b4d4ef503 Remove unnecessary printing from tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19606

Differential Revision: D15046583

Pulled By: ezyang

fbshipit-source-id: ea9bb691d23855e7eddbabe68bf112a726641ba4
2019-04-23 09:24:08 -07:00
36084908e4 Fix lr_scheduler's last_epoch value at the time of initialization (BC BREAKING!) (#7889)
Summary:
Hello everyone :) !!

I've found that lr_scheduler was initialized with last_epoch as -1.
This means that even after the first step (not the one in init, but an explicit step of the scheduler),
the learning rate of the scheduler's optimizer remains at its previous value.
```python
>>> import torch
>>> cc = torch.nn.Conv2d(10,10,3)
>>> myinitial_lr = 0.1
>>> myoptimizer = torch.optim.Adam(cc.parameters(), lr=myinitial_lr)
>>> mylrdecay = 0.5
>>> myscheduler = torch.optim.lr_scheduler.ExponentialLR(myoptimizer,mylrdecay)

>>> myscheduler.get_lr()
[0.2]    # this is because of  get_lr calculates lr by 0.1 * 0.5^-1
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1    # this is not consistent with get_lr value
>>> myscheduler.last_epoch
-1

>>> myscheduler.step()
>>> myscheduler.get_lr()
[0.1]    # this should be the value right after the init, not after first step
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.1    # since this is after first step, it should have been decayed as 0.05
>>> myscheduler.last_epoch
0

>>> myscheduler.step()
>>> myscheduler.last_epoch
1
>>> myscheduler.get_lr()
[0.05]
>>> myscheduler.optimizer.param_groups[0]["lr"]
0.05
>>> myscheduler.last_epoch
1
```

The first problem is that even right after the init of lr_scheduler, you get inconsistent parameter values.

The second problem is that you are stuck with the same learning rate for the first 2 epochs if the step function of lr_scheduler is not called at the beginning of the epoch loop.
Of course, you can avoid this by calling lr_scheduler's step at the beginning,
but I don't think this is proper use since, in the case of the optimizer, step is called at the end of the iteration loop.

I've simply avoided all the above issues by setting last_epoch to 0 after the initialization.

This also makes sense when you init with some value of last_epoch which is not -1.
For example, if you want to init with last epoch 10,
the lr should not be set one decay step further, which is what the previous code did
(last_epoch got +1 before evaluating base_lr * self.gamma ** self.last_epoch).

Instead, it should be set to the exact value for step 10.

I hope this fix finds its way with all your help :)
I'm really looking forward & excited to become a contributor for pytorch!
Pytorch Rocks!!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7889

Differential Revision: D15012769

Pulled By: ezyang

fbshipit-source-id: 258fc3009ea7b7390a3cf2e8a3682eafb506b08b
2019-04-23 08:54:09 -07:00
f9c4ce781f Removes variable which is assigned but not used (#19194)
Summary:
`n` was set to `self.in_channels`, but was not used within the scope of the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19194

Differential Revision: D14937764

Pulled By: ezyang

fbshipit-source-id: 55cb599109309503fee897f77d798fd454fcc02d
2019-04-23 08:48:03 -07:00
dce3d74dfb add torch.cuda.synchronize(device=None) (#19573)
Summary:
fixes https://github.com/pytorch/pytorch/issues/19509
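A minimal sketch of the extended signature (guarded, since it actually waiting on kernels requires a CUDA device at runtime):

```python
import torch

# synchronize() now accepts an optional device; with no argument it waits
# for all kernels on the current device to finish.
if torch.cuda.is_available():
    torch.cuda.synchronize()                      # current device
    torch.cuda.synchronize(device=0)              # explicit device index
    torch.cuda.synchronize(torch.device("cuda:0"))  # or a torch.device
```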
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19573

Differential Revision: D15045730

Pulled By: ezyang

fbshipit-source-id: 732721b4b360fc4348ca7c87d4cd1386e7651bdd
2019-04-23 08:40:38 -07:00
75ce5173a9 Port adaptive_max_pool2d() to ATen (#19409)
Summary:
This is the first part of  #18064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19409

Differential Revision: D15037390

Pulled By: ezyang

fbshipit-source-id: 16a3feed2fd9cc66033696da224a7d5fb7208534
2019-04-23 07:37:25 -07:00
88f78c719a Fix math formatting of PairwiseDistance and CosineSimilarity docs and fix math formatting of CTC loss docs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19534

Differential Revision: D15034011

Pulled By: ezyang

fbshipit-source-id: 60b81c970c919508a57c86fb23edc9f64973117c
2019-04-23 07:24:07 -07:00
5bafb64e67 Revert D15039713: [pytorch][PR] add torch.tensor requires grad
Differential Revision:
D15039713

Original commit changeset: 47f1931b6fc4

fbshipit-source-id: fd91ce8ddd6d2f4e0016054dcdc2541dacc0e191
2019-04-22 23:15:49 -07:00
e7fc7c732c Bugfix for fusion device check (#19594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19594

I missed a callsite

Reviewed By: wanchaol

Differential Revision: D15041457

fbshipit-source-id: eef76ad51bee06a56d31b4ab64f19250fe2ad8f0
2019-04-22 20:55:17 -07:00
d2b03512da add torch.tensor requires grad (#19445)
Summary:
Add support for setting requires_grad = True on torch.tensor within TorchScript.

Within constant propagation, we can't insert any constants that require grad.

Also added shape analysis and requires_grad analysis to torch.tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19445

Differential Revision: D15039713

Pulled By: eellison

fbshipit-source-id: 47f1931b6fc4a1137c13d80110cc404465bfdf06
2019-04-22 18:02:41 -07:00
8be6d5ffd8 Add onnx support for _unique2 operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19582

Reviewed By: ezyang, jamesr66a

Differential Revision: D15037375

fbshipit-source-id: 6060476925bf02fa07f852054e06d2107f046e38
2019-04-22 17:52:47 -07:00
5a796d15be Automatic update of fbcode/onnx to 0e8d2bc5e51455c70ef790b9f65aa632ed9bc8a7 (#19568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19568

Previous import was 83dd62659fc07d5b7fa93b5d1c1879f93509c7db

Included changes:
- **[0e8d2bc5](https://github.com/onnx/onnx/commit/0e8d2bc5)**: [Minor need to be in 1.5]Fix an issue in NMS test data which introduce wrong shape. (#1953) <Hector Li>
- **[9346dd5d](https://github.com/onnx/onnx/commit/9346dd5d)**: adding modulus operator (#1874) <Jeff Saremi>
- **[414dbc73](https://github.com/onnx/onnx/commit/414dbc73)**: Fix shape inference for slice (#1950) <Hariharan Seshadri>
- **[6fb0775d](https://github.com/onnx/onnx/commit/6fb0775d)**: Fix shape inference for ConstantOfShape op (#1951) <Ashwini Khade>

Reviewed By: bddppq, zrphercule, benoitsteiner

Differential Revision: D15033070

fbshipit-source-id: f7eb90b142cbdc9bf1600cfd33e5a8df709045fb
2019-04-22 17:36:36 -07:00
5be4bee4ff Don't create FusionGroups for known-CPU producer values (#19342)
Summary:
I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even if we're on a CPU-only machine running CPU code. This is confusing. Instead, I've made it so that the decision to fuse or not depends on whether the producer Value is a known CPU tensor. If it is, we skip fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342

Differential Revision: D15038351

Pulled By: jamesr66a

fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db
2019-04-22 16:57:18 -07:00
969af4315a Explicitly define supported types (#19516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19516

Explicitly define types that are supported in kernel inputs and outputs.
Also, this allows us to show much nicer error messages if a user writes kernels with wrong argument types.

Reviewed By: ezyang

Differential Revision: D15020306

fbshipit-source-id: 55ebec81e075e874777acd59aa29a5578fc19ef7
2019-04-22 16:31:28 -07:00
8abab61d39 IRParser: optionally create name->value map of the parsed IR. (#19551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19551
ghimport-source-id: e666e3c00786a3b1c747f2dd6e85a48a63bdd69d

Differential Revision: D15028056

Pulled By: ZolotukhinM

fbshipit-source-id: 37e08d6df1d43513748ecfdd8549738eac7ec24e
2019-04-22 16:09:05 -07:00
43d0b78c31 Profiling : Adding Profile Op to provide storage for profiling lambdas
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19365

Differential Revision: D14998968

Pulled By: Krovatkin

fbshipit-source-id: a7f7d1529cbe4e8b30638c6eb8e2ff68f6e114c3
2019-04-22 15:09:30 -07:00
7f053b27bc Step 5: remove _unique_dim in favor of unique_dim (#18654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18654
ghimport-source-id: 63c84cedc3335719fca4a085fa19bdc57d2bc88a

Differential Revision: D15000635

Pulled By: VitalyFedyunin

fbshipit-source-id: 9e8594622a867a79d8e2b6be96579816aa22ae2d
2019-04-22 12:42:51 -07:00
767d184b77 Add back option to not adjust output batch size (#19442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19442

For cases like CV, some ops such as transpose and tile mangle the batch size, so we don't know how to adjust the output batch size. In this case, the current solution is to just fix the input batch size statically and not adjust the output batch size.

Reviewed By: zrphercule

Differential Revision: D15007237

fbshipit-source-id: a21b943a52ee5462d9d7804dfae44360f579f8cf
2019-04-22 12:29:24 -07:00
7655b857f7 Add debug logic to c2_ref_test and its helpers (#19359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19359

Even with file IO exception handling, some of the sandcastle c2_ref_tests are still failing in length-check assert, as can be seen here:
https://our.intern.facebook.com/intern/test/844424932589974?ref_report_id=0

This is an attempt to add printing logic to debug what's going on.

Reviewed By: dzhulgakov

Differential Revision: D14966274

fbshipit-source-id: adce6d4780d664c5ef59f9341b6133b0d09324cb
2019-04-22 12:08:55 -07:00
a09240b0a0 fix variable shadowing issus (#19567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19567

fix variable shadowing

Reviewed By: bddppq, wx1988

Differential Revision: D15032114

fbshipit-source-id: 895ea21f22b87db8c7c8684f54fa186d22f24d10
2019-04-22 11:55:30 -07:00
19f73180cf Add manual_seed in script (#19510)
Summary:
Add manual_seed to torch script.
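A small sketch of what this enables, assuming `torch.manual_seed` is callable inside a scripted function as this change intends (the function name is illustrative):

```python
import torch

@torch.jit.script
def seeded_rand():
    # reseed inside the scripted function so the output is deterministic
    torch.manual_seed(7)
    return torch.rand([3])

a = seeded_rand()
b = seeded_rand()
# a and b are identical because the seed is reset on every call
```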
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19510

Reviewed By: suo, driazati

Differential Revision: D15018823

Pulled By: eellison

fbshipit-source-id: d7734a8ad05ba254c0d88abf3fb58c4ce6a4e53b
2019-04-22 10:58:15 -07:00
e714429bf4 Automatic update of fbcode/onnx to 83dd62659fc07d5b7fa93b5d1c1879f93509c7db (#19454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19454

Previous import was ad7313470a9119d7e1afda7edf1d654497ee80ab

Included changes:
- **[83dd6265](https://github.com/onnx/onnx/commit/83dd6265)**: Add NonMaxSuppression operator (#1703) <Hector Li>
- **[31ca5d6f](https://github.com/onnx/onnx/commit/31ca5d6f)**: add node tests for quantized ops (#1944) <Ashwini Khade>
- **[e6076c1d](https://github.com/onnx/onnx/commit/e6076c1d)**: Fix test stat coverage script (#1948) <Raymond Yang>
- **[ad036405](https://github.com/onnx/onnx/commit/ad036405)**: Add IsInf to detect infinity values (#1884) <Wei-Sheng Chin>

Reviewed By: benoitsteiner

Differential Revision: D15010015

fbshipit-source-id: 4b29de21de60f8e6a2db75309809a4e619c92532
2019-04-22 10:46:08 -07:00
5d5d67fa3f Get rid of unnecessary matches_jit_signature: True specifications. (#19549)
Summary:
Unstacked version of https://github.com/pytorch/pytorch/pull/19431.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19549

Reviewed By: ezyang

Differential Revision: D15027965

Pulled By: gchanan

fbshipit-source-id: a4456326a999d77d6baeb0edbb1bb5db5208a8f8
2019-04-22 10:26:29 -07:00
c30224ad21 Rename potri to cholesky_inverse (#19498)
Summary:
Changelog:
- Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage
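A usage sketch of the renamed function; note the factorization call here uses the current `torch.linalg` API, which postdates this commit:

```python
import torch

torch.manual_seed(0)
a = torch.randn(3, 3)
a = a @ a.t() + 3 * torch.eye(3)   # make a symmetric positive-definite
l = torch.linalg.cholesky(a)       # lower-triangular Cholesky factor
a_inv = torch.cholesky_inverse(l)  # inverse of `a` computed from its factor
# a_inv matches torch.inverse(a) up to numerical error
```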
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498

Differential Revision: D15029901

Pulled By: ezyang

fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a
2019-04-22 08:18:39 -07:00
deadf3ba89 Add assertion to make sure init op is always fp16 compatible in fp16 training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18498

Reviewed By: kennyhorror

Differential Revision: D14626755

fbshipit-source-id: d8a0b3c02920ab3835911a21bf05e8956853fcd7
2019-04-21 23:43:13 -07:00
689dd800ed Generate only one Type class per backend (#19295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19295
ghimport-source-id: 9345110f91f044a449804ddd5116cc9179444a00

Differential Revision: D14948581

Pulled By: li-roy

fbshipit-source-id: a317b03d58d621e8df162918038f7543bfb13ba2
2019-04-21 21:16:14 -07:00
189f30603c Make complex its own backend (#19275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19275
ghimport-source-id: 73fd40b02152aed6f24225a88d7ffde7f700899e

Differential Revision: D14948582

Pulled By: li-roy

fbshipit-source-id: a1be6e57057defc74a007c5351c5edb2b9dcaf30
2019-04-21 21:16:10 -07:00
ab78449e8c Add ScalarType argument to Type::options() (#19270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19270
ghimport-source-id: a5ade6131f3260066c5750ea1fa9ed5c998bb791

Differential Revision: D14938707

Pulled By: li-roy

fbshipit-source-id: 018fb3f01706531a06515d6d861e5683a455a705
2019-04-21 21:16:07 -07:00
a044ba1af5 Generate cases for all ScalarTypes in Type functions that call to TH (#19230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19230
ghimport-source-id: 81f360f2ebd137b8e7d8e885b85246cc219761aa

Differential Revision: D14927991

Pulled By: li-roy

fbshipit-source-id: 1b6a57918ecdc9c87858d3e50578edef0b6e7ad5
2019-04-21 21:16:03 -07:00
868933a467 Fix clang-format. (#19550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19550
ghimport-source-id: 980d96762426d3e97c26839edbaf107a3fc18b2f

Differential Revision: D15028055

Pulled By: ZolotukhinM

fbshipit-source-id: a50a0aaa74d0f1b9249ad79ab80e4b7747c3bffc
2019-04-21 20:31:09 -07:00
1bd5f2c181 Fix some typos in jit README
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19548

Differential Revision: D15028275

Pulled By: mrshenli

fbshipit-source-id: 84ff635be3b4681962451b4c301271683174d7a8
2019-04-21 19:45:05 -07:00
5afc274708 Match JIT signature with triu_indices / tril_indices. (#19484)
Summary:
This just plugs into the existing mechanism to do a direct translation to TensorOptions in the backend, so no codegen changes.

After this lands, all native_functions will match the JIT signature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19484

Differential Revision: D15013051

Pulled By: gchanan

fbshipit-source-id: 6818f868d2f765ca3e56e7e6f75fe4f68492466c
2019-04-21 15:57:52 -07:00
9eb48e1b03 Make one_hot non-differentiable. (#19524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19524
ghimport-source-id: ceda3ad43471242ebbd272a21de11731c7d8bef6

Differential Revision: D15021417

Pulled By: gchanan

fbshipit-source-id: 65d1f17a32f81f47dba5e58e343d0b7b828e1d51
2019-04-21 14:14:37 -07:00
6733037416 Remove 'BoolTensor', 'IndexTensor' from frontend specifications. (#19523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19523
ghimport-source-id: 618a15c2d1d9af9f87b46e32f10ff77111c2e3b7

Differential Revision: D15021420

Pulled By: gchanan

fbshipit-source-id: 048af8da3128de10bdee5827b6fbc169c3ad25a8
2019-04-21 14:14:34 -07:00
3944601588 Have _embedding_bag_dense_backward match JIT signature. (#19522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19522
ghimport-source-id: ad645d87396de645a1aff5fd9d9939cb79cf6558

Differential Revision: D15021419

Pulled By: gchanan

fbshipit-source-id: bd7017edadb4ec9d43cefddf0aee8c52c5cca6a4
2019-04-21 14:14:30 -07:00
e3523979ae Have embedding_dense_backward match JIT signature. (#19521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19521
ghimport-source-id: 817d3defb5f4ee98bae1f0488f99cb0e9a5226a2

Differential Revision: D15021376

Pulled By: gchanan

fbshipit-source-id: 2e29f1d3913f94fab3347dc48676303510d7da46
2019-04-21 14:14:27 -07:00
638ffac359 Update mkldnn-bridge to fix crash issue in DNNLOWP dequantize op (#19159)
Summary:
Remove a useless format checker in mkldnn-bridge to fix the crash issue in the DNNLOWP dequantize op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19159

Differential Revision: D15027670

Pulled By: yinghai

fbshipit-source-id: ac97d6ff94de013105108b9596b1bd7621c5aa75
2019-04-21 14:05:13 -07:00
83373e7755 Hook up non_differentiability in derivatives.yaml when no autograd function is generated. (#19520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19520
ghimport-source-id: a1272aa0b23692fb189974c4daba7b2e4e0dad50

Differential Revision: D15021380

Pulled By: gchanan

fbshipit-source-id: ec83efd4bb6d17714c060f13a0527a33a10452db
2019-04-21 13:48:55 -07:00
8868a4f20b Move non_differentiable_arg_names from autograd functions to differentiability_info. (#19519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19519
ghimport-source-id: 74e603688b2e4ed33f6c46c7da9d009336140e74

Differential Revision: D15021378

Pulled By: gchanan

fbshipit-source-id: e366a914c67a90ba0552b67d0bf5b347edbaf189
2019-04-21 11:09:39 -07:00
6d307db5b4 Move cuFFT plan cache note outside Best Practices (#19538)
Summary:
I mistakenly put it there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19538

Differential Revision: D15026500

Pulled By: soumith

fbshipit-source-id: 0c13499571fdfd789c3bd1c4b58abd870725d422
2019-04-20 21:39:59 -07:00
26f1c6d4d4 Revert D14689639: [pytorch] Allow passing lists as trace inputs.
Differential Revision:
D14689639

Original commit changeset: 6dcec8a64319

fbshipit-source-id: 03a5e7c80e7f2420e33b056b5844a78d7fd41141
2019-04-20 08:50:47 -07:00
c96c91da22 Improve optimizations for DNNLOWP support on MKL-DNN (#18843)
Summary:
In this PR, the fusion algorithms are improved to support DNNLOWP.
1. Enabled conv fusions for DNNLOWP
2. Fused order switch op into the following quantize op
3. Improved conv+sum fusion to parse a larger scope/window
4. Re-organized fusion code to fix a random crash caused by changing the graph
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18843

Differential Revision: D15021030

Pulled By: yinghai

fbshipit-source-id: 88d2199d9fc69f392de9bfbe1f291e0ebf78ab08
2019-04-20 02:12:06 -07:00
fe87327c28 Make Observer class as template Quant class for QuantConfig (#19418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19418

This change makes the Observer class a template which always
takes an observer function as an argument. The second test case becomes
redundant, hence it is removed.

Reviewed By: jerryzh168

Differential Revision: D15000594

fbshipit-source-id: 9555fe98a5f2054b8fd01e64e9ac2db72c043bfa
2019-04-19 21:47:54 -07:00
9f4f7e1621 Support compilation on gcc-7.4.0 (#19470)
Summary:
There are two corrections in this pull request.
The first is specific to gcc-7.4.0.
Compiled with -std=c++14, gcc-7.4.0 has __cplusplus = 201402L.
This does not meet the check set in Deprecated.h, which asks for >201402L.
The compiler then falls through to the __GNUC__ check, which passes and sets C10_DEPRECATED_MESSAGE to a value that c++14 does not appear to support or even recognize, leading to a compile-time error.
My recommended solution, which worked for my case, was to change the > into a >=

The second correction comes in response to this error:
caffe2/operators/crash_op.cc: In member function ‘virtual bool caffe2::CrashOp::RunOnDevice()’:
caffe2/operators/crash_op.cc:14:11: error: ‘SIGABRT’ was not declared in this scope

I am merely committing to the repository the solution suggested here (which worked for me)
https://discuss.pytorch.org/t/building-pytorch-from-source-in-conda-fails-in-pytorch-caffe2-operators-crash-op-cc/42859
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19470

Differential Revision: D15019529

Pulled By: ailzhang

fbshipit-source-id: 9ce9d713c860ee5fd4266e5c2a7f336a97d7a90d
2019-04-19 21:41:36 -07:00
d17c22d024 Improve embedding_bag add kernel (#19329)
Summary:
This was actually getting pretty poor throughput with respect to memory bandwidth. I used this test to measure the memory bandwidth specifically for the AXPY call: https://gist.github.com/jamesr66a/b27ff9ecbe036eed5ec310c0a3cc53c5

And I got ~8 GB/s before this change, but ~14 GB/s after this change.

This seems to speed up the operator overall by around 1.3x (benchmark: https://gist.github.com/jamesr66a/c533817c334d0be432720ef5e54a4166):

== Before ==

time_per_iter 0.0001298875093460083
GB/s 3.082544287868467

== After ==

time_per_iter 0.00010104801654815674
GB/s 3.9623142905451076

The large difference between the local BW increase and the full-op BW increase likely indicates significant time is being spent elsewhere in the op, so I will investigate that.

EDIT: I updated this PR to include a call into caffe2/perfkernels. This is the progression:

before

time_per_iter 8.983819484710693e-05
GB/s 4.456723564864611

After no axpy
time_per_iter 7.19951868057251e-05
GB/s 5.56126065872172

After perfkernels
time_per_iter 5.6699180603027346e-05
GB/s 7.061548257694262

After perfkernels no grad
time_per_iter 4.388842582702637e-05
GB/s 9.122769670026413
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19329

Reviewed By: dzhulgakov

Differential Revision: D14969630

Pulled By: jamesr66a

fbshipit-source-id: 42d1015772c87bedd119e33c0aa2c8105160a738
2019-04-19 19:16:24 -07:00
6325b6e44e Make finding unused model parameters optional (#19515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19515

This is still done by default, but can now be disabled by specifying
`find_unused_parameters=False`. There are use cases where finding
unused parameters results in erroneous behavior, because a subset of
model parameters is used *outside* the `forward` function. One can
argue that doing this is not a good idea, but we should not break
existing use cases without an escape hatch. This configuration
parameter is that escape hatch.

Reviewed By: bddppq

Differential Revision: D15016381

fbshipit-source-id: f2f86b60771b3801ab52776e62b5fd6748ddeed0
2019-04-19 17:23:36 -07:00
63e2833ceb Disallow std::vector arguments (#19511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19511

In the c10 operator registration API, disallow std::vector arguments and show a nice error message
pointing users towards using ArrayRef instead.

Reviewed By: ezyang

Differential Revision: D15017423

fbshipit-source-id: 157ecc1298bbc598d2e310a16041edf195aaeff5
2019-04-19 17:06:31 -07:00
1ac14b03b5 Drop instead of pop (#19503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19503

After reading the arguments from the stack, the c10 kernel wrapper accidentally popped them again, causing a vector to be allocated.
Instead, it should just drop them because they have already been read.

Reviewed By: ezyang

Differential Revision: D15016023

fbshipit-source-id: b694a2929f97fa77cebe247ec2e49820a3c818d5
2019-04-19 17:06:28 -07:00
9818c7cb63 Add minimalistic implementation of subgraph matcher. (#19322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19322
ghimport-source-id: 93c713f829d1b2a9aa5d104cb1f30148dd37c967

Differential Revision: D14962182

Pulled By: ZolotukhinM

fbshipit-source-id: 3989fba06502011bed9c24f12648d0baa2a4480c
2019-04-19 16:35:16 -07:00
26f12af537 Fix op benchmarks error in OSS environment (#19518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19518

The previous design required running the op benchmarks from the PyTorch root directory, which could lead to a `module not found` error in the OSS environment. This diff fixes that issue by making the benchmark launch from the `benchmarks` folder.

Reviewed By: ilia-cher

Differential Revision: D15020787

fbshipit-source-id: eb09814a33432a66cc857702bc86538cd17bea3b
2019-04-19 16:25:16 -07:00
5da7b74d48 fix AI-PEP path error (#19514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19514

as title

Reviewed By: hl475

Differential Revision: D15018499

fbshipit-source-id: 9ce38e3a577432e0575a6743f5dcd2e907d3ab9d
2019-04-19 16:25:13 -07:00
a421f882dc First step at container aliasing (#18710)
Summary:
First step at allowing container types within alias analysis.

Since the current implementation hides the concept of Wildcards within alias analysis and does not expose it to the memory DAG, we cannot represent whether a container type holds a wildcard. As a result, we only handle TupleConstruct, where we can directly inspect whether any input values are wildcards, and we don't handle nested containers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18710

Differential Revision: D15017068

Pulled By: eellison

fbshipit-source-id: 3ee76a5482cef1cc4a10f034593ca21019161c18
2019-04-19 16:07:11 -07:00
f5fe7aa0b2 Fix relu bug for empty tensor (#19451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19451

Fix relu bug for empty tensor

Reviewed By: xianjiec

Differential Revision: D15009811

fbshipit-source-id: b75e567c3bec08d7d12b950d8f1380c50c138704
2019-04-19 15:21:07 -07:00
2d4875b8ed Allow passing lists as trace inputs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18636

Differential Revision: D14689639

fbshipit-source-id: 6dcec8a64319ae3c4da9a93f574a13ce8ec223a5
2019-04-19 13:37:50 -07:00
9245eaf3f0 Allow for segmented printing in PythonPrint (#19238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19238
ghimport-source-id: 469d33cd187fa68840b201d625800a0f4fead547

Differential Revision: D14928291

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: 257fce3dd1601ba192092d3fc318374e3752907e
2019-04-19 13:02:06 -07:00
73c166a5ed add resolveType to Resolver (#19237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19237
ghimport-source-id: 70777ec37155be37efef1b743d564752e4dff9de

Differential Revision: D14928289

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: 46827da9ace16730669fc654bf781d83172d18b1
2019-04-19 13:02:02 -07:00
1e94a3bc4d Turn resolver into a class (#19236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19236
ghimport-source-id: d36705ea5ecff085d0d84ea57bb96d18d7c260dd

Differential Revision: D14928292

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: cd038100ac423fa1c19d0547b9e5487a633a2258
2019-04-19 13:01:59 -07:00
405c7bcea0 Fix bad annotation in docs (#19501)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#19501 [jit] Fix bad annotation in docs**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/19501

Pulled By: driazati

Differential Revision: D15016062

fbshipit-source-id: 3dcd0481eb48b84e98ffe8c5df2cbc9c2abf99f9
2019-04-19 12:42:26 -07:00
b85edac16f Fix out-of-topological-order issue in Nomnigraph (#19458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19458

The algorithm in https://fburl.com/ggh9iyvc fails to really ensure topological ordering of nodes. The fix is ugly but effective. I think we need a real topological sort to fix this issue more nicely. Mikhail Zolotukhin, Bram Wasti.
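The "real topological sort" the message alludes to is usually Kahn's algorithm; a minimal, self-contained sketch (the graph representation below is an assumption for illustration, not Nomnigraph's actual API):

```python
from collections import deque

def topological_sort(nodes, edges):
    # Kahn's algorithm: repeatedly emit a node with no remaining
    # incoming edges. `edges` maps a node to the nodes it points to.
    indegree = {n: 0 for n in nodes}
    for src in edges:
        for dst in edges[src]:
            indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for dst in edges.get(n, ()):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order

print(topological_sort(["a", "b", "c", "d"],
                       {"a": ["b", "c"], "b": ["d"], "c": ["d"]}))
# -> ['a', 'b', 'c', 'd']
```

Unlike an ad-hoc ordering pass, this guarantees every node appears after all of its predecessors, and it detects cycles instead of silently producing a bad order.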

Differential Revision: D15011893

fbshipit-source-id: 130c3aa442f5d578adfb14fbe5f16aa722434942
2019-04-19 12:19:39 -07:00
53bb739b67 Remove uses of TypeID (#19452)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19452
ghimport-source-id: 816ae7fe1a18d76f064d5796dec44dca6a138a21

Differential Revision: D15009920

Pulled By: li-roy

fbshipit-source-id: 722f05a927528148555561da62839f84dba645c6
2019-04-19 12:07:35 -07:00
3762cf9cc6 Expose QScheme in frontend (#19381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19381

Expose QScheme enum in frontend so that people can use it in
quantization configs in modules.

Differential Revision: D14922992

fbshipit-source-id: ab07b8a7ec42c1c1f5fe84a4a0c805adbcad408d
2019-04-19 11:57:59 -07:00
1898e9368b Revert D15003385: Have embedding_dense_backward match JIT signature.
Differential Revision:
D15003385

Original commit changeset: 53cbe18aa454

fbshipit-source-id: be904ee2212aa9e402715c436a84d95f6cde326f
2019-04-19 11:27:16 -07:00
e3470ae4bd Revert D15003379: Have _embedding_bag_dense_backward match JIT signature.
Differential Revision:
D15003379

Original commit changeset: f8e82800171f

fbshipit-source-id: 55f83557998d166aeb41d00d7a590acdc76fcf22
2019-04-19 11:27:13 -07:00
79bfc3931a Revert D15003387: Remove 'BoolTensor', 'IndexTensor' from frontend specifications.
Differential Revision:
D15003387

Original commit changeset: e518e8ce3228

fbshipit-source-id: af5b107239446ea8d6f229a427d5b157fcafd224
2019-04-19 11:27:10 -07:00
013926cfcf Revert D15003382: Make one_hot non-differentiable.
Differential Revision:
D15003382

Original commit changeset: e9244c7a5f0a

fbshipit-source-id: 84789cf4c46c77cce655e70c2a8ff425f32f48bd
2019-04-19 11:27:08 -07:00
fc1aadec3b Make empty_affine_quantized private (#19446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19446

change empty_affine_quantized to _empty_affine_quantized

Reviewed By: dzhulgakov

Differential Revision: D15008757

fbshipit-source-id: c7699ac0c208a8f17d88e95193970c75ba7219d3
2019-04-19 11:21:44 -07:00
c3755eeeee Make one_hot non-differentiable. (#19430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19430
ghimport-source-id: 6787473873fdc21400138a4322e17fee8db62607

Differential Revision: D15003382

Pulled By: gchanan

fbshipit-source-id: e9244c7a5f0ad7cd2f79635944a8b37f910231c9
2019-04-19 11:03:14 -07:00
622cf1fec9 Remove 'BoolTensor', 'IndexTensor' from frontend specifications. (#19429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19429
ghimport-source-id: 6116682b84210a34babb8b87a92e7050433e5d59

Differential Revision: D15003387

Pulled By: gchanan

fbshipit-source-id: e518e8ce322810e06175bb4e6672d4ea1eb18efd
2019-04-19 11:03:12 -07:00
b0812d3d4c Have embedding_dense_backward match JIT signature. (#19427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19427
ghimport-source-id: 93438cd495129a1e41118c62e6339909783035fd

Differential Revision: D15003385

Pulled By: gchanan

fbshipit-source-id: 53cbe18aa4541a2501f496abfee526e40093c0ff
2019-04-19 11:03:09 -07:00
a6ab443e32 Have _embedding_bag_dense_backward match JIT signature. (#19428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19428
ghimport-source-id: 037efa3df95efc1fbff631826351d1698a3c49ec

Differential Revision: D15003379

Pulled By: gchanan

fbshipit-source-id: f8e82800171f632e28535e416283d858156068ec
2019-04-19 11:03:06 -07:00
30b2953b8b Stop generating autograd functions for derivatives.yaml entries that only specify output differentiability. (#19424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19424
ghimport-source-id: e9d1b86742607f5cbe39fb278fa7f378739cd6ef

Differential Revision: D15003380

Pulled By: gchanan

fbshipit-source-id: 8efb94fbc0b843863021bf25deab57c492086237
2019-04-19 10:56:20 -07:00
e7b9526dc6 Fix ord() when dealing with utf8 chars (#19423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19423
ghimport-source-id: e7449489fbc86ec1116f94027b3c1561942413ee

Reviewed By: eellison

Differential Revision: D15002847

Pulled By: driazati

fbshipit-source-id: 4560cebcfca695447423d48d65ed364e7dbdbedb
2019-04-19 10:27:04 -07:00
557b1b362f Fix copied optimizer (#19308)
Summary:
Add the defaults field to the copied object.
Prior to this patch, optimizer.__getattr__ excluded the defaults
attribute of the source optimizer object, which is required by some LR schedulers (e.g. CyclicLR with momentum).
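A torch-free sketch of the kind of bug described (the class and field names here are stand-ins, not PyTorch's actual implementation): if the state exposed for copying omits `defaults`, every copy silently loses it.

```python
import copy

class Optimizer:
    def __init__(self, params, defaults):
        self.defaults = defaults
        self.param_groups = [{"params": params, **defaults}]

    def __getstate__(self):
        # Before the fix this sketch would return only
        # {"param_groups": ...}, so copies (and pickles) lost
        # `defaults`. Including it is the analogue of the patch above.
        return {"defaults": self.defaults, "param_groups": self.param_groups}

opt = Optimizer([1, 2], {"lr": 0.1, "momentum": 0.9})
clone = copy.deepcopy(opt)
print(clone.defaults)  # -> {'lr': 0.1, 'momentum': 0.9}
```

With `defaults` omitted from the copied state, an LR scheduler constructed from the clone would fail with an AttributeError.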
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19308

Differential Revision: D15012801

Pulled By: soumith

fbshipit-source-id: 95801b269f6f9d78d531d4fed95c973b280cc96f
2019-04-19 10:27:01 -07:00
30292d994f Add an identity module (#19249)
Summary:
This is a simple yet useful addition to the torch.nn modules: an identity module. This is a first draft - please let me know what you think and I will edit my PR.

There is no identity module - nn.Sequential() can be used, however it is argument-sensitive, so it can't be used interchangeably with any other module. This adds nn.Identity(...), which can be swapped with any module because it accepts dummy arguments. It's also more understandable than seeing an empty Sequential inside a model.

See discussion on #9160. The current solution is to use nn.Sequential(). However this won't work as follows:

```python
batch_norm = nn.BatchNorm2d
if dont_use_batch_norm:
    batch_norm = Identity
```

Then in your network, you have:

```python
nn.Sequential(
    ...
    batch_norm(N, momentum=0.05),
    ...
)
```

If you try to simply set `Identity = nn.Sequential`, this will fail since `nn.Sequential` expects modules as arguments. Of course there are many ways to get around this, including:

- Conditionally adding modules to an existing Sequential module
- Not using Sequential but writing the usual `forward` function with an if statement
- ...

**However, I think that an identity module is the most pythonic strategy,** assuming you want to use nn.Sequential.

Using the very simple class (this isn't the same as the one in my commit):

```python
class Identity(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
    def forward(self, x):
        return x
```

we can get around using nn.Sequential, and `batch_norm(N, momentum=0.05)` will work. There are of course other situations this would be useful.
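The pattern is easy to demonstrate without torch at all; a plain-Python sketch with stand-in classes (not the actual nn.Module implementations):

```python
class Identity:
    # Swallows any constructor arguments, so it can stand in for a
    # module whose call site passes arguments, e.g. (N, momentum=0.05).
    def __init__(self, *args, **kwargs):
        pass

    def __call__(self, x):
        return x

class BatchNorm2d:
    # Stand-in for nn.BatchNorm2d; the real normalization is elided.
    def __init__(self, num_features, momentum=0.1):
        self.num_features = num_features
        self.momentum = momentum

    def __call__(self, x):
        return x

dont_use_batch_norm = True
batch_norm = Identity if dont_use_batch_norm else BatchNorm2d
layer = batch_norm(64, momentum=0.05)  # works for either choice
print(layer(3.0))  # -> 3.0
```

Because `Identity.__init__` accepts and ignores arbitrary arguments, the call site does not need to know which class was chosen.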

Thank you.
Best,
Miles
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19249

Differential Revision: D15012969

Pulled By: ezyang

fbshipit-source-id: 9f47e252137a1679e306fd4c169dca832eb82c0c
2019-04-19 10:12:18 -07:00
ef499cd567 Remove no-fork workaround for running tests with ROCm (#19436)
Summary:
This should have been fixed in newest ROCm version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19436

Reviewed By: ezyang

Differential Revision: D15004685

Pulled By: bddppq

fbshipit-source-id: 19fd4cca94c914dc54aabfbb4e62b328aa348a35
2019-04-19 09:51:03 -07:00
f3ef94a806 Delete defunct test/ffi directory. (#19168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19168
ghimport-source-id: 5190a8d00c529735e99e8745c5e7cf1901fdb800

Differential Revision: D14938318

Pulled By: ezyang

fbshipit-source-id: eaeb6814178c434f737b99ae1fce63fd9ecdb432
2019-04-19 08:16:49 -07:00
a97330b7c5 Fix missing doc out= for torch.cumprod (#19340)
Summary:
Fixes #19255 by adding the `out=None` argument for `torch.cumprod`, missing [here](https://pytorch.org/docs/master/torch.html#torch.cumprod), and also adds the docstring for `out` in `torch.cumsum`, which was missing [here](https://pytorch.org/docs/master/torch.html#torch.cumsum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19340

Differential Revision: D14973931

Pulled By: ezyang

fbshipit-source-id: 232f5c9a606b749d67d068afad151539866fedda
2019-04-19 07:59:57 -07:00
0676ba0c5c Mention packed accessors in tensor basics doc (#19464)
Summary:
This is a continuation of efforts into packed accessor awareness.
A very simple example is added, along with the mention that the template can hold more arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19464

Differential Revision: D15012564

Pulled By: soumith

fbshipit-source-id: a19ed536e016fae519b062d847cc58aef01b1b92
2019-04-19 07:20:16 -07:00
ea6c738c8a Rename 'not_differentiable' to 'non_differentiable'. (#19272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19272
ghimport-source-id: 755e91efa68c5a1c4377a6853f21b3eee3f8cab5

Differential Revision: D15003381

Pulled By: gchanan

fbshipit-source-id: 54db27c5c5e65acf65821543db3217de9dd9bdb5
2019-04-19 07:07:55 -07:00
aa50f1e365 Clean the onnx constant fold code a bit (#19398)
Summary:
This is a follow up PR of https://github.com/pytorch/pytorch/pull/18698 to lint the code using clang-format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19398

Differential Revision: D14994517

Pulled By: houseroad

fbshipit-source-id: 2ae9f93e66ce66892a1edc9543ea03932cd82bee
2019-04-18 23:59:26 -07:00
593bb145ce Allow passing dicts as trace inputs. (#18092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18092

Previously, tracing required all inputs to be either tensors
or tuples of tensors. Now, we allow users to pass dicts as well.

Differential Revision: D14491795

fbshipit-source-id: 7a2df218e5d00f898d01fa5b9669f9d674280be3
2019-04-18 23:52:00 -07:00
9034b66f14 skip test_trace_c10_ops if _caffe2 is not built (#19099)
Summary:
fix https://github.com/pytorch/pytorch/issues/18142
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19099

Differential Revision: D15010452

Pulled By: houseroad

fbshipit-source-id: 5bf158d7fce7bfde109d364a3a9c85b83761fffb
2019-04-18 23:40:15 -07:00
d9115b533a remove needless ## in REGISTER_ALLOCATOR definition. (#19261)
Summary:
remove needless ## in REGISTER_ALLOCATOR definition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19261

Differential Revision: D15002025

Pulled By: soumith

fbshipit-source-id: 40614b1d79d1fe05ccf43f0ae5aab950e4c875c2
2019-04-18 22:44:09 -07:00
9983c24cfc Strip doc_string from exported ONNX models (#18882)
Summary:
Strip the doc_string by default from the exported ONNX models (this string has the stack trace and information about the local repos and folders, which can be confidential).

The users can still generate the doc_string by specifying add_doc_string=True in torch.onnx.export().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18882

Differential Revision: D14889684

Pulled By: houseroad

fbshipit-source-id: 26d2c23c8dc3f484544aa854b507ada429adb9b8
2019-04-18 22:30:00 -07:00
f0d98199fb improve dim sort performance (#19379)
Summary:
We are already using custom comparators for sorting (for a good reason), but are still making two sorting passes - a global sort, then a stable sort to bring values into their slices. Using a custom comparator to sort within a slice allows us to avoid the second sorting pass and brings up to 50% perf improvement.
t-vi I know you are moving sort to ATen, and changing THC is discouraged, but #18350 seems dormant. I'm fine with #18350 landing first, and then I can put in these changes.
cc umanwizard for review.
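The single-pass idea is easy to illustrate outside the THC kernel: sort once with a key that orders first by slice index and then by value, instead of a global sort followed by a stable sort back into slices (a sketch under that assumption, not the CUDA implementation):

```python
def sort_within_slices(values, slice_size):
    # One sorting pass: the key (slice index, value) keeps every
    # element inside its own slice while ordering values within it.
    order = sorted(range(len(values)),
                   key=lambda i: (i // slice_size, values[i]))
    return [values[i] for i in order]

print(sort_within_slices([4, 1, 3, 9, 0, 5], slice_size=3))
# -> [1, 3, 4, 0, 5, 9]
```

Each contiguous run of `slice_size` elements is sorted independently, which is exactly what the global-then-stable two-pass scheme achieves with twice the work.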
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19379

Differential Revision: D15011019

Pulled By: soumith

fbshipit-source-id: 48e5f5aef51789b166bb72c75b393707a9aed57c
2019-04-18 22:25:08 -07:00
941ccd6b35 Fix missing import sys in pin_memory.py (#19419)
Summary:
kostmo pointed this out in #15331. Thanks :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19419

Differential Revision: D15002846

Pulled By: soumith

fbshipit-source-id: c600fab3f7a7a5147994b9363910af4565c7ee65
2019-04-18 22:19:26 -07:00
940caed0d4 update documentation of PairwiseDistance #19241 (#19412)
Summary:
Fix the documentation of PairwiseDistance [#19241](https://github.com/pytorch/pytorch/issues/19241)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19412

Differential Revision: D14998271

Pulled By: soumith

fbshipit-source-id: bcb2aa46d3b3102c4480f2d24072a5e14b049888
2019-04-18 22:13:52 -07:00
8638634a6e fixes link in TripletMarginLoss (#19417)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19245
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19417

Differential Revision: D15001610

Pulled By: soumith

fbshipit-source-id: 1b85ebe196eb5a3af5eb83d914dafa83b9b35b31
2019-04-18 22:13:48 -07:00
08f5c05d60 make separate operators as independent binaries (#19450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19450

We want to make each operator benchmark a separate binary. Previously, running the benchmark meant collecting all operators into a single binary, which is unnecessary when we want to filter for a specific operator. This diff resolves that issue.

Reviewed By: ilia-cher

Differential Revision: D14808159

fbshipit-source-id: 43cd25b219c6e358d0cd2a61463b34596bf3bfac
2019-04-18 20:00:47 -07:00
1e78252de7 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: a727513842c0a240b377bda4e313fbedbc54c2e8
2019-04-18 18:34:36 -07:00
e1750754c8 Step 4: add support for unique with dim=None (#18651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18651
ghimport-source-id: e11988130a3f9a73529de0b0d08b4ec25fbc639c

Differential Revision: D15000463

Pulled By: VitalyFedyunin

fbshipit-source-id: 9e258e473dea6a3fc2307da2119b887ba3f7934a
2019-04-18 18:28:07 -07:00
1b1d1c9837 allow bools to be used as attributes (#19440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19440
ghimport-source-id: 9c962054d760526bf7da324b114455fcb1038521

Differential Revision: D15005723

Pulled By: suo

fbshipit-source-id: 75fc87ae33894fc34d3b913881defb7e6b8d7af0
2019-04-18 18:13:21 -07:00
a0e09216f0 Fix test build (#19444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19444
ghimport-source-id: c85db00e8037e7f6f0424eb8bd17f957d20b7247

Reviewed By: eellison

Differential Revision: D15008679

Pulled By: driazati

fbshipit-source-id: 0987035116d9d0069794d96395c8ad458ba7c121
2019-04-18 18:05:04 -07:00
b9291f55bb pow scalar exponent / base autodiff, fusion (#19324)
Summary:
Fixes: #19253

Fixing pow(Tensor, float) is straightforward.
The breakage for pow(float, Tensor) is a bit more subtle to trigger, and the fix needs `torch.log` (`math.log` didn't work) from the newly merged #19115 (thanks ngimel for pointing out this has landed).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19324

Differential Revision: D15003531

Pulled By: ailzhang

fbshipit-source-id: 8b22138fa27a43806b82886fb3a7b557bbb5a865
2019-04-18 17:58:35 -07:00
b4fa979a37 Improve unique CPU performance for returning counts (#19352)
Summary:
Benchmark on a tensor of shape `torch.Size([15320, 2])`. Benchmark code:

```python
print(torch.__version__)
print()
a = tensor.flatten()
print('cpu, sorted=False:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=False)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=False, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=False, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=False, return_inverse=True, return_counts=True)
print()
print('cpu, sorted=True:')
%timeit torch._unique2_temporary_will_remove_soon(a)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True)
print()
```

Before
```
1.1.0a0+36854fe

cpu, sorted=False:
340 µs ± 4.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
724 µs ± 6.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
54.3 ms ± 469 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.6 ms ± 659 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted=True:
341 µs ± 7.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
727 µs ± 7.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
54.7 ms ± 795 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.3 ms ± 647 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

After
```
1.1.0a0+261d9e8

cpu, sorted=False:
350 µs ± 865 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
771 µs ± 598 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 6.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 4.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cpu, sorted=True:
324 µs ± 4.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
705 µs ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 5.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 5.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
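The shape of the optimization - computing counts alongside the inverse map instead of in a separate ~50 ms pass - can be sketched in plain Python (illustrative only; the actual kernel is C++):

```python
def unique_with_counts(values):
    # Single pass: build the value -> slot map, the inverse indices,
    # and the counts at the same time, analogous to
    # torch.unique(..., return_inverse=True, return_counts=True).
    slot = {}
    unique, inverse, counts = [], [], []
    for v in values:
        i = slot.get(v)
        if i is None:
            i = slot[v] = len(unique)
            unique.append(v)
            counts.append(0)
        counts[i] += 1
        inverse.append(i)
    return unique, inverse, counts

print(unique_with_counts([3, 1, 3, 2, 1, 3]))
# -> ([3, 1, 2], [0, 1, 0, 2, 1, 0], [3, 2, 1])
```

Each element is visited exactly once, so asking for counts costs no extra pass over the data.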
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19352

Differential Revision: D14984717

Pulled By: VitalyFedyunin

fbshipit-source-id: 3c56f85705ab13a92ec7406f4f30be77226a3210
2019-04-18 17:52:59 -07:00
563de88aa5 Revert D14909203: Remove usages of TypeID
Differential Revision:
D14909203

Original commit changeset: d716179c484a

fbshipit-source-id: 992ff1fcd6d35d3f2ae768c7e164b7a0ba871914
2019-04-18 17:47:39 -07:00
ce969c0bc4 Add tests for argument types (#19290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19290

Add test cases for the supported argument types
And TODOs for some unsupported ones that we might want to support.

Reviewed By: dzhulgakov

Differential Revision: D14931920

fbshipit-source-id: c47bbb295a54ac9dc62569bf5c273368c834392c
2019-04-18 17:20:13 -07:00
d9052b2176 Allow optionals arguments from C++ (#19311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19311
ghimport-source-id: 699f62eb2bbad53ff2045fb2e217eb1402f2cdc5

Reviewed By: eellison

Differential Revision: D14983059

Pulled By: driazati

fbshipit-source-id: 442f96d6bd2a8ce67807ccad2594b39aae489ca5
2019-04-18 17:15:05 -07:00
45d5b6be48 Enhance front-end to add op (#19433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19433

For the operator benchmark project, we need to cover a lot of operators, so the interface for adding operators needs to be very clean and simple. This diff implements a new interface for adding ops.

Here is the logic to add new operator to the benchmark:
```
long_config = {}
short_config = {}

map_func

add_test(
  [long_config, short_config],
  map_func,
  [caffe2 op]
  [pt op]
)
```

Reviewed By: zheng-xq

Differential Revision: D14791191

fbshipit-source-id: ac6738507cf1b9d6013dc8e546a2022a9b177f05
2019-04-18 17:07:02 -07:00
edf77fe64a Fix cpp_custom_type_hack variable handling (#19400)
Summary:
My bad - it might be called in variable and non-variable context. So it's better to just inherit variable-ness from the caller.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19400

Reviewed By: ezyang

Differential Revision: D14994781

Pulled By: dzhulgakov

fbshipit-source-id: cb9d055b44a2e1d7bbf2e937d558e6bc75037f5b
2019-04-18 16:44:25 -07:00
4c93be0fa0 fix hub doc formatting issues (#19434)
Summary:
minor fixes for doc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19434

Differential Revision: D15003903

Pulled By: ailzhang

fbshipit-source-id: 400768d9a5ee24f9183faeec9762b688c48c531b
2019-04-18 16:02:19 -07:00
a5c4348d54 Recursively find tensors in DDP module output (#19360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19360

We'll return the output object verbatim since it is a freeform object.
We need to find any tensors in this object, though, because we need to
figure out which parameters were used during this forward pass, to
ensure we short-circuit reduction for any unused parameters.

Before this commit only lists were handled and the functionality went
untested. This commit adds support for dicts and recursive structures,
and also adds a test case.

Closes #19354.
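The recursive traversal described above can be sketched in plain Python (the predicate stands in for isinstance(x, torch.Tensor); this is a sketch, not the DDP implementation itself):

```python
def find_tensors(obj, is_tensor):
    # Recursively collect matching leaves from a freeform output
    # object: bare values, lists/tuples, dicts, and any nesting of them.
    if is_tensor(obj):
        return [obj]
    if isinstance(obj, (list, tuple)):
        return [t for item in obj for t in find_tensors(item, is_tensor)]
    if isinstance(obj, dict):
        return [t for v in obj.values() for t in find_tensors(v, is_tensor)]
    return []

# Pretend floats are tensors for the illustration:
out = {"loss": 1.5, "extras": [2.5, {"aux": 3.5}], "name": "model"}
print(find_tensors(out, lambda x: isinstance(x, float)))
# -> [1.5, 2.5, 3.5]
```

Non-matching leaves (like the string above) are simply ignored, so the output object can remain completely freeform.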

Reviewed By: mrshenli

Differential Revision: D14978016

fbshipit-source-id: 4bb6999520871fb6a9e4561608afa64d55f4f3a8
2019-04-18 14:57:09 -07:00
17f05ad5e5 Moving at::Tensor into caffe2::Tensor without bumping refcount (#19388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19388

The old implementation forced a refcount bump when converting at::Tensor to caffe2::Tensor.
Now, it is possible to move it without a refcount bump.

Reviewed By: dzhulgakov

Differential Revision: D14986815

fbshipit-source-id: 92b4b0a6f323ed38376ffad75f960cad250ecd9b
2019-04-18 14:13:26 -07:00
88f70a1670 Fix pickling torch.float32 (#18045)
Summary:
Attempt fix for #14057 . This PR fixes the example script in the issue.
The old behavior is a bit confusing here. What happened during pickling is that Python 2 failed to recognize that `torch.float32` belongs to module `torch`, so it looked for `torch.float32` in module `__main__`. Python 3 is smart enough to handle it.
According to the doc [here](https://docs.python.org/2/library/pickle.html#object.__reduce__), it seems `__reduce__` should return `float32` instead of the old name `torch.float32`. This way Python 2 is able to find `float32` in the `torch` module.
> If a string is returned, it names a global variable whose contents are pickled as normal. The string returned by __reduce__() should be the object’s local name relative to its module
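That contract can be demonstrated with a toy class; the `demo_dtypes` module and `Dtype` class below are invented for illustration and are not the torch implementation:

```python
import pickle
import sys
import types

# Build a throwaway module so pickle's name lookup is deterministic.
mod = types.ModuleType("demo_dtypes")

class Dtype:
    def __init__(self, name):
        self.name = name
    def __reduce__(self):
        # Per the quoted docs: return the object's local name relative
        # to its module, e.g. "float32", not "torch.float32".
        return self.name

Dtype.__module__ = "demo_dtypes"
mod.Dtype = Dtype
mod.float32 = Dtype("float32")
sys.modules["demo_dtypes"] = mod

restored = pickle.loads(pickle.dumps(mod.float32))
assert restored is mod.float32  # resolved by module-level name lookup
```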
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18045

Differential Revision: D14990638

Pulled By: ailzhang

fbshipit-source-id: 816b97d63a934a5dda1a910312ad69f120b0b4de
2019-04-18 12:28:10 -07:00
f5435634b4 Respect order of Parameters in rnn.py (#18198)
Summary:
Previously to get a list of parameters this code was just putting them in the reverse order in which they were defined, which is not always right. This PR allows parameter lists to define the order themselves. To do this parameter lists need to have a corresponding function that provides the names of the parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18198

Differential Revision: D14966270

Pulled By: driazati

fbshipit-source-id: 59331aa59408660069785906304b2088c19534b2
2019-04-18 11:18:20 -07:00
2d0d153288 Refactor EmitLoopCommon to make it more amenable to future extensions (#19341)
Summary:
This PR paves the way for support more iterator types in for-in loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19341

Differential Revision: D14992749

Pulled By: Krovatkin

fbshipit-source-id: e2d4c9465c8ec3fc74fbf23006dcb6783d91795f
2019-04-18 09:59:21 -07:00
b7323a94ad Cleanup init_process_group (#19033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19033

torch.distributed.init_process_group() has had many parameters added, but the contract isn't clear. Adding documentation, asserts, and explicit args should make this clearer to callers and more strictly enforced.

Reviewed By: mrshenli

Differential Revision: D14813070

fbshipit-source-id: 80e4e7123087745bed436eb390887db9d1876042
2019-04-18 09:37:38 -07:00
20a5aa9670 Sync FindCUDA/select_compute_arch.cmake from upstream (#19392)
Summary:
1. Fixes auto detection for Turing cards.
2. Adds Turing Support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19392

Differential Revision: D14996142

Pulled By: soumith

fbshipit-source-id: 3cd45c58212cf3db96e5fa19b07d9f1b59a1666a
2019-04-18 07:03:19 -07:00
9e3bdb3231 Update module.py documentation. (#19347)
Summary:
Added the ">>>" Python interpreter sign (three greater-than symbols), so that the edited lines will appear as code, not comments/output, in the documentation. Normally, the interpreter would display "..." when expecting a block, but I'm not sure how this would work on the PyTorch docs website. It seems that other code examples use the ">>>" sign as well, therefore I used it too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19347

Differential Revision: D14986154

Pulled By: soumith

fbshipit-source-id: 8f4d07d71ff7777b46c459837f350eb0a1f17e84
2019-04-18 06:46:24 -07:00
973d51079b Add device-specific cuFFT plan caches (#19300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300

Differential Revision: D14986967

Pulled By: soumith

fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255
2019-04-18 06:39:35 -07:00
b8fb6eae88 Improve bmm() performance on CPU when input tensor is non-contiguous (#19338)
Summary:
This PR aims to improve Transformer performance on CPU, `bmm()` is one of the major bottlenecks now.

Current logic of `bmm()` on CPU only uses MKL batch gemm when the inputs `A` and `B` are contiguous or transposed. So when `A` or `B` is a slice of a larger tensor, it falls to a slower path.

`A` and `B` are both 3D tensors. MKL is able to handle the batch matrix multiplication on the condition that `A.stride(1) == 1 || A.stride(2) == 1` and `B.stride(1) == 1 || B.stride(2) == 1`.

From [fairseq](https://github.com/pytorch/fairseq) implementation of Transformer, multi-head attention has two places to call bmm(), [here](https://github.com/pytorch/fairseq/blob/master/fairseq/modules/multihead_attention.py#L167) and [here](https://github.com/pytorch/fairseq/blob/master/fairseq/modules/multihead_attention.py#L197), `q`, `k`, `v` are all slices from larger tensor. So the `bmm()` falls to slow path at the moment.

Results on Xeon 6148 (20*2 cores 2.5GHz) indicate this PR improves Transformer training performance by **48%** (seconds per iteration reduced from **5.48** to **3.70**), the inference performance should also be boosted.

Before:
```
| epoch 001:   0%| | 27/25337 [02:27<38:31:26,  5.48s/it, loss=16.871, nll_loss=16.862, ppl=119099.70, wps=865, ups=0, wpb=4715.778, bsz=129.481, num_updates=27, lr=4.05e-06, gnorm=9.133,
```
After:
```
| epoch 001:   0%| | 97/25337 [05:58<25:55:49,  3.70s/it, loss=14.736, nll_loss=14.571, ppl=24339.38, wps=1280, ups=0, wpb=4735.299, bsz=131.134, num_updates=97, lr=1.455e-05, gnorm=3.908,
```
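The stride condition above can be illustrated with pure-Python stride arithmetic; `contiguous_strides` and `mkl_batched_gemm_ok` are hypothetical helpers, not the actual ATen dispatch code:

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a shape."""
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

def mkl_batched_gemm_ok(strides):
    # One of the last two strides must be 1 (rows or columns contiguous).
    return strides[1] == 1 or strides[2] == 1

full = contiguous_strides((4, 8, 16))            # (128, 16, 1)
# Slicing t[:, :, 4:12] shrinks the last dim to 8 but keeps the parent's
# strides, so the slice is non-contiguous yet still hits the fast path.
slice_shape, slice_strides = (4, 8, 8), full
assert slice_strides != contiguous_strides(slice_shape)  # non-contiguous
assert mkl_batched_gemm_ok(slice_strides)                # fast path OK
```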
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19338

Differential Revision: D14986346

Pulled By: soumith

fbshipit-source-id: 827106245af908b8a4fda69ed0288d322b028f08
2019-04-18 06:34:17 -07:00
12d6f79ecd Optional inputs and outputs (#19289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19289

Allow optional inputs and outputs in native c10 operators

Reviewed By: dzhulgakov

Differential Revision: D14931927

fbshipit-source-id: 48f8bec009c6374345b34d933f148c08bb4f7118
2019-04-18 02:04:57 -07:00
fa96de2b3f Add some tests (#19288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19288

-

Reviewed By: dzhulgakov

Differential Revision: D14931924

fbshipit-source-id: 6c53b5d1679080939973d33868e58ca4ad70361d
2019-04-18 02:04:53 -07:00
601f36bacc Use string based schema for exposing caffe2 ops (#19287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19287

Since we now have a string-schema-based op registration API, we can also use it when exposing caffe2 operators.

Reviewed By: dzhulgakov

Differential Revision: D14931925

fbshipit-source-id: ec162469d2d94965e8c99d431c801ae7c43849c8
2019-04-18 02:04:50 -07:00
5ca22cce69 Allow registering ops without specifying the full schema (#19286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19286

The operator registration API now allows registering an operator by only giving the operator name and not the full operator schema,
as long as the operator schema can be inferred from the kernel function.
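The schema-inference idea can be illustrated in Python with `inspect`; this is an analogy only — the real c10 registration derives the schema from the C++ kernel's argument types, and `infer_schema` below is an invented name:

```python
import inspect

def infer_schema(name, kernel):
    """Derive a toy schema string from a kernel's parameter names,
    assuming every argument and the return value are Tensors."""
    sig = inspect.signature(kernel)
    args = ", ".join(f"Tensor {p}" for p in sig.parameters)
    return f"{name}({args}) -> Tensor"

def my_add(a, b):
    return a + b

assert infer_schema("my::add", my_add) == "my::add(Tensor a, Tensor b) -> Tensor"
```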

Reviewed By: dzhulgakov

Differential Revision: D14931921

fbshipit-source-id: 3776ce43d4ce67bb5a3ea3d07c37de96eebe08ba
2019-04-18 02:04:46 -07:00
a456e1e196 Add either type (#19285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19285

The either type is a tagged union with two members.
This is going to be used in a diff stacked on top to allow a function to return one of two types.

Also, generally, either<Error, Result> is a great pattern for returning value_or_error from a function without using exceptions and we could use this class for that later.
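A minimal Python analogue of the `either<Error, Result>` pattern described above, assuming an invented `Either` class (the real c10 type is a C++ template):

```python
class Either:
    """A tagged union with exactly two alternatives: left (error) or right (value)."""
    def __init__(self, value, is_left):
        self._value = value
        self.is_left = is_left
    @classmethod
    def left(cls, err):
        return cls(err, True)
    @classmethod
    def right(cls, val):
        return cls(val, False)
    def value_or_raise(self):
        if self.is_left:
            raise RuntimeError(self._value)
        return self._value

def safe_div(a, b):
    # Return value-or-error without raising at the call site.
    if b == 0:
        return Either.left("division by zero")
    return Either.right(a / b)

assert safe_div(6, 3).value_or_raise() == 2.0
assert safe_div(1, 0).is_left
```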

Reviewed By: dzhulgakov

Differential Revision: D14931923

fbshipit-source-id: 7d1dd77b3e5b655f331444394dcdeab24772ab3a
2019-04-18 02:04:43 -07:00
12dcc77bcb Allow ops without tensor args if only fallback kernel exists (#19284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19284

Instantiating a dispatch table previously only worked when the op had a tensor argument we could dispatch on.
However, the legacy API for custom operators didn't have dispatch and also worked for operators without tensor arguments, so we need to continue supporting that.
It probably generally makes sense to support this as long as there's only a fallback kernel and no dispatched kernel registered.
This diff adds that functionality.
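The rule can be sketched as a tiny dispatch table; `DispatchTable` below is an illustrative model, not the real c10 dispatcher:

```python
class DispatchTable:
    """Per-operator kernel table with an optional fallback kernel."""
    def __init__(self):
        self.kernels = {}    # dispatch key (e.g. backend) -> kernel
        self.fallback = None
    def call(self, key, *args):
        kernel = self.kernels.get(key, self.fallback)
        if kernel is None:
            raise RuntimeError("no kernel registered for this op")
        return kernel(*args)

table = DispatchTable()
# An op with no tensor arguments has no key to dispatch on; only a
# fallback kernel makes sense, and the table must still work.
table.fallback = lambda x: x + 1
assert table.call(None, 41) == 42
```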

Reviewed By: dzhulgakov

Differential Revision: D14931926

fbshipit-source-id: 38fadcba07e5577a7329466313c89842d50424f9
2019-04-18 02:04:40 -07:00
8036af39d2 String-based schemas in op registration API (#19283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19283

Now that the function schema parser is available in ATen/core, we can use it from the operator registration API to register ops based on string schemas.

This does not allow registering operators based on only the name yet - the full schema string needs to be defined.
A diff stacked on top will add name based registration.

Reviewed By: dzhulgakov

Differential Revision: D14931919

fbshipit-source-id: 71e490dc65be67d513adc63170dc3f1ce78396cc
2019-04-18 01:03:40 -07:00
41dc54e291 Move function schema parser to ATen/core build target (#19282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19282

This is largely a hack because we need to use the function schema parser from ATen/core
but aren't clear yet on how the final software architecture should look like.

- Add function schema parser files from jit to ATen/core build target.
- Also move ATen/core build target one directory up to allow this.

We only change the build targets and don't move the files yet because this is likely
not the final build set up and we want to avoid repeated interruptions
for other developers. cc zdevito

Reviewed By: dzhulgakov

Differential Revision: D14931922

fbshipit-source-id: 26462e2e7aec9e0964706138edd3d87a83b964e3
2019-04-18 01:03:37 -07:00
789c438d86 Automatic update of fbcode/onnx to ad7313470a9119d7e1afda7edf1d654497ee80ab (#19339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19339

Previous import was 971311db58f2fa8306d15e1458b5fd47dbc8d11c

Included changes:
- **[ad731347](https://github.com/onnx/onnx/commit/ad731347)**: Fix shape inference for matmul (#1941) <Bowen Bao>
- **[3717dc61](https://github.com/onnx/onnx/commit/3717dc61)**: Shape Inference Tests for QOps (#1929) <Ashwini Khade>
- **[a80c3371](https://github.com/onnx/onnx/commit/a80c3371)**: Prevent unused variables from generating warnings across all platforms.  (#1930) <Pranav Sharma>
- **[be9255c1](https://github.com/onnx/onnx/commit/be9255c1)**: add title (#1919) <Prasanth Pulavarthi>
- **[7a112a6f](https://github.com/onnx/onnx/commit/7a112a6f)**: add quantization ops in onnx (#1908) <Ashwini Khade>
- **[6de42d7d](https://github.com/onnx/onnx/commit/6de42d7d)**: Create working-groups.md (#1916) <Prasanth Pulavarthi>

Reviewed By: yinghai

Differential Revision: D14969962

fbshipit-source-id: 5ec64ef7aee5161666ed0c03e201be0ae20826f9
2019-04-18 00:45:20 -07:00
fbf505cba7 Remove copy and copy_ special case on Type (#18972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18972
ghimport-source-id: b5d3012b00530145fa24ab0cab693a7e80cb5989

Differential Revision: D14816530

Pulled By: li-roy

fbshipit-source-id: 9c7a166abb22d2cd1f81f352e44d9df1541b1774
2019-04-18 00:21:43 -07:00
a64cce326f Add constant folding to ONNX graph during export (Resubmission) (#18698)
Summary:
Rewritten version of https://github.com/pytorch/pytorch/pull/17771 using graph C++ APIs.

This PR adds the ability to do constant folding on ONNX graphs during PT->ONNX export. This is done mainly to optimize the graph and make it leaner. The two attached snapshots show a multiple-node LSTM model before and after constant folding.
A couple of notes:
1. Constant folding is by default turned off for now. The goal is to turn it on by default once we have validated it through all the tests.
2. Support for folding in nested blocks is not in place, but will be added in the future, if needed.
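A toy version of such a constant-folding pass, over an invented graph representation rather than the real ONNX IR:

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def fold_constants(nodes, consts):
    """nodes: list of (name, op, input_names); consts: name -> value.
    Nodes whose inputs are all constants are evaluated at export time
    and replaced by a constant; the rest stay in the graph."""
    remaining = []
    for name, op, ins in nodes:
        if all(i in consts for i in ins):
            consts[name] = OPS[op](*(consts[i] for i in ins))
        else:
            remaining.append((name, op, ins))
    return remaining, consts

nodes = [("a", "add", ["c1", "c2"]),   # 2 + 3 -> foldable
         ("b", "mul", ["a", "x"])]     # depends on runtime input x
remaining, consts = fold_constants(nodes, {"c1": 2, "c2": 3})
assert consts["a"] == 5
assert remaining == [("b", "mul", ["a", "x"])]
```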

**Original Model:**
![multiple_lstm_original](https://user-images.githubusercontent.com/23646532/53987630-6ac53980-40d6-11e9-9702-1ccfee124a83.JPG)
**Constant-folded model:**
![multiple_lstm_constant_folded](https://user-images.githubusercontent.com/23646532/53987632-6c8efd00-40d6-11e9-81c5-362c16f68861.JPG)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18698

Differential Revision: D14889768

Pulled By: houseroad

fbshipit-source-id: b6616b1011de9668f7c4317c880cb8ad4c7b631a
2019-04-18 00:10:04 -07:00
01d7d3de46 Remove usages of TypeID (#19183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19183
ghimport-source-id: 9af190b072523459fa61e5e79419b88ac8586a4d

Differential Revision: D14909203

Pulled By: li-roy

fbshipit-source-id: d716179c484aebfe3ec30087c5ecd4a11848ffc3
2019-04-17 23:55:47 -07:00
c7b1fdb767 Fixing function schema parser for Android (#19281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19281

String<->Number conversions aren't available in the STL used in our Android environment.
This diff adds workarounds for that so that the function schema parser can be compiled for android

Reviewed By: dzhulgakov

Differential Revision: D14931649

fbshipit-source-id: d5d386f2c474d3742ed89e52dff751513142efad
2019-04-17 23:50:17 -07:00
094678c04b Split function schema parser from operator (#19280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19280

We want to use the function schema parser from ATen/core, but with as little dependencies as possible.
This diff moves the function schema parser into its own file and removes some of its dependencies.

Reviewed By: dzhulgakov

Differential Revision: D14931651

fbshipit-source-id: c2d787202795ff034da8cba255b9f007e69b4aea
2019-04-17 23:50:15 -07:00
a1174dbc50 fix hub doc format
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19396

Differential Revision: D14993859

Pulled By: ailzhang

fbshipit-source-id: bdf94e54ec35477cfc34019752233452d84b6288
2019-04-17 23:43:56 -07:00
b31bab7860 Clang-format torch/csrc/jit/passes/quantization.cpp. (#19385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19385
ghimport-source-id: 67f808db7dcbcb6980eac79a58416697278999b0

Differential Revision: D14991917

Pulled By: ZolotukhinM

fbshipit-source-id: 6c2e57265cc9f0711752582a04d5a070482ed1e6
2019-04-17 22:08:41 -07:00
6732358bf9 Allow DDP to wrap multi-GPU modules (#19271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19271

allow DDP to take multi-gpu models

Reviewed By: pietern

Differential Revision: D14822375

fbshipit-source-id: 1eebfaa33371766d3129f0ac6f63a573332b2f1c
2019-04-17 21:21:54 -07:00
c48e1679f9 Add validator for optimizers when parameters are shared
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18497

Reviewed By: kennyhorror

Differential Revision: D14614738

fbshipit-source-id: beddd8349827dcc8ccae36f21e5d29627056afcd
2019-04-17 21:10:38 -07:00
2787f1d8ed hub minor fixes (#19247)
Summary:
A few improvements while doing bert model
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19247

Differential Revision: D14989345

Pulled By: ailzhang

fbshipit-source-id: f4846813f62b6d497fbe74e8552c9714bd8dc3c7
2019-04-17 21:04:33 -07:00
776fec0f9f fix wrong schema (#19370)
Summary:
Op was improperly schematized previously.  Evidently checkScript does not test if the outputs are the same type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19370

Differential Revision: D14985159

Pulled By: eellison

fbshipit-source-id: feb60552afa2a6956d71f64801f15e5fe19c3a91
2019-04-17 19:55:30 -07:00
bf5f30f39b Fix printing format in examples in jit/README.md. (#19323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19323
ghimport-source-id: 74a01917de70c9d59099cf601b24f3cb484ab7be

Differential Revision: D14990100

Pulled By: ZolotukhinM

fbshipit-source-id: 87ede08c8ca8f3027b03501fbce8598379e8b96c
2019-04-17 18:38:09 -07:00
48859e3ad3 Allow for single-line deletions in clang_tidy.py (#19082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19082

When you have just one line of deletions, just as with additions, there is no count printed.
Without this fix, we ignored all hunks with single-line deletions when selecting which lines were changed.
When all the changes in the file were single-line, this meant no line-filtering at all!
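The root cause is the unified-diff hunk-header format: when a hunk adds or deletes exactly one line, the `,count` part is omitted (`@@ -7 +7 @@` rather than `@@ -7,1 +7,1 @@`). A sketch of a parser that defaults the missing count to 1 (`changed_lines` is an illustrative helper, not the actual clang_tidy.py code):

```python
import re

HUNK = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def changed_lines(hunk_header):
    """Return the new-file line numbers covered by a hunk header."""
    m = HUNK.match(hunk_header)
    start = int(m.group(3))
    count = int(m.group(4) or 1)  # count omitted means exactly one line
    return list(range(start, start + count))

assert changed_lines("@@ -10,2 +10,3 @@") == [10, 11, 12]
assert changed_lines("@@ -7 +7 @@") == [7]  # single-line: count omitted
```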

Differential Revision: D14860426

fbshipit-source-id: c60e9d84f9520871fc0c08fa8c772c227d06fa27
2019-04-17 17:02:30 -07:00
242743eedb Revert D14901379: [jit] Add options to Operator to enable registration of alias analysis passes
Differential Revision:
D14901379

Original commit changeset: d92a497e280f

fbshipit-source-id: 51d31491ab90907a6c95af5d8a59dff5e5ed36a4
2019-04-17 16:56:14 -07:00
0414f23855 Revert D14901485: [jit] Only require python print on certain namespaces
Differential Revision:
D14901485

Original commit changeset: 4b02a66d325b

fbshipit-source-id: 93348056c00f43c403cbf0d34f8c565680ceda11
2019-04-17 16:56:11 -07:00
5fa1aad670 Remove unused template parameter in OnnxifiOp (#19362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19362

`float` type is never used in OnnxifiOp....

Reviewed By: bddppq

Differential Revision: D14977970

fbshipit-source-id: 8fee02659dbe408e5a3e0ff95d74c04836c5c281
2019-04-17 16:48:14 -07:00
ad8f34fcca Add empty_quantized (#18960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18960

empty_affine_quantized creates an empty affine quantized Tensor from scratch.
We might need this when we implement quantized operators.

Differential Revision: D14810261

fbshipit-source-id: f07d8bf89822d02a202ee81c78a17aa4b3e571cc
2019-04-17 16:17:40 -07:00
4371cb5e01 Cast not expressions to bool (#19361)
Summary:
As part of implicitly casting condition statements, we should be casting not expressions as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19361

Differential Revision: D14984275

Pulled By: eellison

fbshipit-source-id: f8dae64f74777154c25f7a6bcdac03cf44cbb60b
2019-04-17 16:06:48 -07:00
d6b91075dc Eliminate type dispatch from copy_kernel, and use memcpy directly rather than implementing our own copy. (#19198)
Summary:
It turns out that copying bytes is the same no matter what type
they're interpreted as, and memcpy is already vectorized on every
platform of note.  Paring this down to the simplest implementation
saves just over 4KB off libtorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19198

Differential Revision: D14922656

Pulled By: resistor

fbshipit-source-id: bb03899dd8f6b857847b822061e7aeb18c19e7b4
2019-04-17 15:39:13 -07:00
3e0b46b6d1 Only require python print on certain namespaces (#18846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18846
ghimport-source-id: b211e15d24c88fdc32d79222d9fce2fa9c291541

Differential Revision: D14901485

Pulled By: bwasti

fbshipit-source-id: 4b02a66d325ba5391d1f838055aea13b5e4f6485
2019-04-17 14:24:50 -07:00
3a031c414a Add options to Operator to enable registration of alias analysis passes (#18589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18589
ghimport-source-id: dab203f6be13bf41963848f5315235b6bbe45c08

Differential Revision: D14901379

Pulled By: bwasti

fbshipit-source-id: d92a497e280f1b0a63b11a9fd8ae9b48bf52e6bf
2019-04-17 13:14:55 -07:00
eaa14f5f59 Error out on in-place binops on tensors with internal overlap (#19317)
Summary:
This adds checks for `mul_`, `add_`, `sub_`, `div_`, the most common
binops. See #17935 for more details.
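The overlap condition behind these checks can be sketched as pure stride arithmetic; `has_internal_overlap` is a hypothetical stand-in for the ATen helper, which handles more cases than this:

```python
def has_internal_overlap(shape, strides):
    """A tensor writes the same memory location twice whenever some
    dimension has size > 1 but stride 0 (e.g. the result of expand())."""
    return any(size > 1 and stride == 0
               for size, stride in zip(shape, strides))

# torch.randn(1, 3).expand(4, 3) would have shape (4, 3), strides (0, 1):
assert has_internal_overlap((4, 3), (0, 1))      # in-place op must error
assert not has_internal_overlap((4, 3), (3, 1))  # contiguous: fine
```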
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19317

Differential Revision: D14972399

Pulled By: zou3519

fbshipit-source-id: b9de331dbdb2544ee859ded725a5b5659bfd11d2
2019-04-17 13:02:07 -07:00
ff4a4d6155 Update for #19326
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19367

Differential Revision: D14981835

Pulled By: VitalyFedyunin

fbshipit-source-id: e8a97986d9669ed7f465a7ba771801bdd043b606
2019-04-17 12:56:08 -07:00
aad6f97898 Decorator to make sure we can import core from caffe2 (#19273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19273

Some of the CIs are failing if protobuf is not installed. Protobuf is imported as part of `caffe2.python.core`, so this adds a skip decorator to avoid running tests that depend on `caffe2.python.core`.

Reviewed By: jianyuh

Differential Revision: D14936387

fbshipit-source-id: e508a1858727bbd52c951d3018e2328e14f126be
2019-04-17 11:22:49 -07:00
f1f31b634d Eliminate AdjustBatch ops (#19083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19083

As we have discussed, there are too many of AdjustBatch ops and they incur reallocation overhead and affects the performance. We will eliminate these ops by
- inling the input adjust batch op into Glow
- inling the output adjust batch op into OnnxifiOp and do that only conditionally.

This is the C2 part of the change and requires change from Glow side to work e2e.

Reviewed By: rdzhabarov

Differential Revision: D14860582

fbshipit-source-id: ac2588b894bac25735babb62b1924acc559face6
2019-04-17 10:00:25 -07:00
3fcee4875c Add rst entry for nn.MultiheadAttention (#19346)
Summary:
Fix #19259 by adding the missing `autoclass` entry for `nn.MultiheadAttention` from [here](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/activation.py#L676)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19346

Differential Revision: D14971426

Pulled By: soumith

fbshipit-source-id: ceaaa8ea4618c38fa2bff139e7fa0d6c9ea193ea
2019-04-17 04:40:28 -07:00
db611b7caf Delete C10Tensor (#19328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19328

Plans changed and we don't want this class anymore.

Reviewed By: dzhulgakov

Differential Revision: D14966746

fbshipit-source-id: 09ea4c95b352bc1a250834d32f35a94e401f2347
2019-04-17 00:02:27 -07:00
33443d083e Fix python lint (#19331)
Summary:
VitalyFedyunin jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19331

Differential Revision: D14969435

Pulled By: bddppq

fbshipit-source-id: c1555c52064758ecbe668f92b837f2d7524f6118
2019-04-16 21:47:30 -07:00
58d4414c33 Profiling pipeline part1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18772

Differential Revision: D14952781

Pulled By: Krovatkin

fbshipit-source-id: 1e99fc9053c377291167f0b04b0f0829b452dbc4
2019-04-16 21:21:08 -07:00
93201d0676 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19310

Differential Revision: D14952046

Pulled By: soumith

fbshipit-source-id: 1bbaaad6f932a832ea8e5e804d0d9cd9140a5071
2019-04-16 20:31:15 -07:00
06c28d8a12 Add slicing and int_repr() to QTensor (#19296)
Summary:
Stack:
    **#19296 [pt1][quant] Add slicing and int_repr() to QTensor** (D14756833)
    #18960 [pt1][quant] Add empty_quantized (D14810261)
    #19312 Use the QTensor with QReLU (D14819460)
    #19319 [RFC] Quantized SumRelu (D14866442)

Methods added to pytorch python frontend:
- int_repr() returns a CPUByte Tensor which copies the data of QTensor.
- Added as_strided for QTensorImpl which provides support for slicing a QTensor(see test_torch.py)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19296

Differential Revision: D14756833

Pulled By: jerryzh168

fbshipit-source-id: 6f4c92393330e725c4351d6ff5f5fe9ac7c768bf
2019-04-16 20:17:21 -07:00
33e7977154 move const defs of DeviceType to DeviceType.h (#19185)
Summary:
Stack:
    **#19185 [c10][core][ez] move const defs of DeviceType to DeviceType.h** (D14909415)

att
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19185

Differential Revision: D14909415

Pulled By: jerryzh168

fbshipit-source-id: 876cf999424d8394f5ff20e6750133a4e43466d4
2019-04-16 20:02:21 -07:00
5627940e9c Add a fast path for batch-norm CPU inference. (#19152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19152

Adding a fast path for batch-norm CPU inference when all tensors are contiguous.
* Leverage vectorization through simple loops.
* Folding linear terms before computation.
* For resnext-101, this version gets 18.95 times faster.
* Add a microbenchmark:
* (buck build mode/opt -c python.package_style=inplace --show-output //caffe2/benchmarks/operator_benchmark:batchnorm_benchmark) && \
(OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 buck-out/gen/caffe2/benchmarks/operator_benchmark/batchnorm_benchmark#binary.par)
* batch_norm: data shape: [1, 256, 3136], bandwidth: 22.26 GB/s
* batch_norm: data shape: [1, 65536, 1], bandwidth: 5.57 GB/s
* batch_norm: data shape: [128, 2048, 1], bandwidth: 18.21 GB/s
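"Folding linear terms" above means precomputing the affine form of inference batch-norm: `y = (x - mean) / sqrt(var + eps) * gamma + beta` collapses to `y = x * alpha + bias`, with `alpha` and `bias` computed once per channel. A scalar sketch with illustrative function names:

```python
import math

def folded_bn(x, mean, var, gamma, beta, eps=1e-5):
    # alpha and bias depend only on the (fixed) running stats and
    # affine parameters, so they can be precomputed per channel.
    alpha = gamma / math.sqrt(var + eps)
    bias = beta - mean * alpha
    return x * alpha + bias

def naive_bn(x, mean, var, gamma, beta, eps=1e-5):
    return (x - mean) / math.sqrt(var + eps) * gamma + beta

assert abs(folded_bn(2.0, 0.5, 4.0, 1.5, 0.1) -
           naive_bn(2.0, 0.5, 4.0, 1.5, 0.1)) < 1e-12
```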

Reviewed By: soumith, BIT-silence

Differential Revision: D14889728

fbshipit-source-id: 20c9e567e38ff7dbb9097873b85160eca2b0a795
2019-04-16 19:27:54 -07:00
ff0a7ae43f Testing for folded conv_bn_relu (#19298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19298

Proper testing for conv_bn_relu folding

Differential Revision: D13998891

fbshipit-source-id: ceb58ccec19885cbbf38964ee0d0db070e098b4a
2019-04-16 19:04:06 -07:00
9f35185b56 Initialize intra-op threads in JIT thread pool (#19058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19058
ghimport-source-id: 53e87df8d93459259854a17d4de3348e463622dc

Differential Revision: D14849624

Pulled By: ilia-cher

fbshipit-source-id: 5043a1d4330e38857c8e04c547526a3ba5b30fa9
2019-04-16 18:27:22 -07:00
5e7bc26f65 Fix ASSERT_ANY_THROW. (#19321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19321
ghimport-source-id: 9efffc36950152105bd0dc13f450161367101410

Differential Revision: D14962184

Pulled By: ZolotukhinM

fbshipit-source-id: 22d602f50eb5e17a3e3f59cc7feb59a8d88df00d
2019-04-16 15:44:53 -07:00
78f589e794 Add len() for strings (#19320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19320
ghimport-source-id: 62131cb24e9bf65f0ef3e60001cb36509a1f4163

Reviewed By: bethebunny

Differential Revision: D14961078

Pulled By: driazati

fbshipit-source-id: 08b9a4b10e4a47ea09ebf55a4743defa40c74698
2019-04-16 15:11:33 -07:00
df67969e6b Step 3: Add support for return_counts to torch.unique for dim not None (#18650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18650
ghimport-source-id: 75759c95e6c48e27c172b919097dbc40c6bfb5e6

Differential Revision: D14892319

Pulled By: VitalyFedyunin

fbshipit-source-id: ec5d1b80fc879d273ac5a534434fd648468dda1e
2019-04-16 14:06:45 -07:00
c8897d2263 invoke NN smoketests from a python loop instead of a batch file (#18756)
Summary:
I tried first to convert the `.bat` script to a Bash `.sh` script, but I got this error:
```
[...]/build/win_tmp/ci_scripts/test_python_nn.sh: line 3: fg: no job control
```
Line 3 was where `%TMP_DIR%/ci_scripts/setup_pytorch_env.bat` was invoked.

I found a potential workaround on Stack Overflow of adding the `monitor` (`-m`) flag to the script, but that didn't work either:

```
00:58:00 /bin/bash: cannot set terminal process group (3568): Inappropriate ioctl for device
00:58:00 /bin/bash: no job control in this shell
00:58:00 + %TMP_DIR%/ci_scripts/setup_pytorch_env.bat
00:58:00 /c/Jenkins/workspace/pytorch-builds/pytorch-win-ws2016-cuda9-cudnn7-py3-test1/build/win_tmp/ci_scripts/test_python_nn.sh: line 3: fg: no job control
```

So instead I decided to use Python to replace the `.bat` script.  I believe this is an improvement in that it's both "table-driven" now and cross-platform.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18756

Differential Revision: D14957570

Pulled By: kostmo

fbshipit-source-id: 87794e64b56ffacbde4fd44938045f9f68f7bc2a
2019-04-16 13:11:03 -07:00
1c5073fb4b Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors (#18952)
Summary:
Make it possible to construct a pinned memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.

Now compatible with both torch scripts:
 `  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)`
and
`  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))`

Same checked for all similar functions `rand_like`, `empty_like` and others

It is fixed version of #18455
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952

Differential Revision: D14801792

Pulled By: VitalyFedyunin

fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba
2019-04-16 11:06:15 -07:00
31686805f2 Enable unit tests for ROCm 2.3 (#19307)
Summary:
Unit tests that hang on clock64() calls are now fixed.

test_gamma_gpu_sample is now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19307

Differential Revision: D14953420

Pulled By: bddppq

fbshipit-source-id: efe807b54e047578415eb1b1e03f8ad44ea27c13
2019-04-16 10:58:27 -07:00
e1f38a847d Fix type conversion in dequant and add a test (#19226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19226

Type conversion was wrong previously. Thanks zafartahirov for finding it!

Differential Revision: D14926610

fbshipit-source-id: 6824f9813137a3d171694d743fbb437a663b1f88
2019-04-16 10:52:44 -07:00
da4ff17eee math module support (#19115)
Summary:
This PR refer to issue [#19026](https://github.com/pytorch/pytorch/issues/19026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19115

Differential Revision: D14936053

Pulled By: driazati

fbshipit-source-id: 68d5f33ced085fcb8c10ff953bc7e99df055eccc
2019-04-16 10:44:07 -07:00
344acaa0ca Revert replicate.py to disallow replicating multi-device modules (#19278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19278

Based on discussion in https://github.com/pytorch/pytorch/pull/19278 and https://github.com/pytorch/pytorch/pull/18687, changes to replicate.py will be reverted to disallow replicating multi-device modules.

Reviewed By: pietern

Differential Revision: D14940018

fbshipit-source-id: 7504c0f4325c2639264c52dcbb499e61c9ad2c26
2019-04-16 10:03:38 -07:00
b9c20d5224 graph_for based on last_optimized_executed_graph (#19142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19142
ghimport-source-id: 822013fb7e93032c74867fc77c6774c680aef6d1

Differential Revision: D14888703

Pulled By: zdevito

fbshipit-source-id: a2ad65a042d08b1adef965c2cceef37bb5d26ba9
2019-04-16 09:17:53 -07:00
3b29cbaf86 Enable half for CUDA dense EmbeddingBag backward. (#19293)
Summary:
I audited the relevant kernel and saw it accumulates a good deal into float
so it should be fine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19293

Differential Revision: D14942274

Pulled By: zou3519

fbshipit-source-id: 36996ba0fbb29fbfb12b27bfe9c0ad1eb012ba3c
2019-04-16 08:57:20 -07:00
3501576230 calculate execution time based on final iterations (#19299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19299

I saw larger than 5% performance variation with small operators; this diff aims to reduce the variation by avoiding Python overhead. Previously, the benchmark ran the main loop for 100 iterations and then looked at the time. If the result was not significant, it doubled the number of iterations, reran, and checked again, repeating until the result became significant. The time was computed as total_time / number of iterations, so the Python trigger overhead of every rerun was included.

Now the logic calculates execution time based on the last run instead of all runs: time_in_last_run / number of iterations.

Reviewed By: hl475

Differential Revision: D14925287

fbshipit-source-id: cb646298c08a651e27b99a5547350da367ffff47
2019-04-16 08:57:17 -07:00
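The doubling scheme described in that commit can be sketched in plain Python. This is a minimal stand-in, not the actual benchmark harness; the function name and the significance threshold are invented for illustration:

```python
import time

def benchmark(op, min_time=0.01, start_iters=100):
    """Double the iteration count until the final run is long enough,
    then report per-iteration time from that final run only, so the
    Python overhead of earlier reruns is not averaged in."""
    iters = start_iters
    while True:
        t0 = time.perf_counter()
        for _ in range(iters):
            op()
        elapsed = time.perf_counter() - t0
        if elapsed >= min_time:
            # time_in_last_run / number of iterations
            return elapsed / iters
        iters *= 2

per_iter = benchmark(lambda: sum(range(100)))
```

The key difference from the old scheme is that only `elapsed` from the last run enters the division, rather than the accumulated time of every rerun.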
646cb6157d Move OMP/MKL thread initialization into ATen/Parallel (#19011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19011
ghimport-source-id: 432e31eccfd0e59fa21a790f861e6b2ff4fdbac6

Differential Revision: D14846034

Pulled By: ilia-cher

fbshipit-source-id: d9d03c761d34bac80e09ce776e41c20fd3b04389
2019-04-16 00:16:32 -07:00
20fc7b6ec7 Avoid undefined symbol error when building AdIndexer LTO (#19009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19009

Move the definition of `MulFunctor<>::Backward()` into a header file.

Reviewed By: BIT-silence

Differential Revision: D14823230

fbshipit-source-id: 1efaec01863fcc02dcbe7e788d376e72f8564501
2019-04-15 23:43:13 -07:00
ada10ad416 Ellipsis in subscript
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17763

Differential Revision: D14893533

Pulled By: Krovatkin

fbshipit-source-id: c46b4e386d3aa30e6dc03e3052d2e5ff097fa74b
2019-04-15 22:10:44 -07:00
f1c8e01524 Add input information in RecordFunction calls (#18717)
Summary:
Add input information into generated RecordFunction calls in
VariableType wrappers, JIT operators and a few more locations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18717

Differential Revision: D14729156

Pulled By: ilia-cher

fbshipit-source-id: 811ac4cbfd85af5c389ef030a7e82ef454afadec
2019-04-15 20:28:08 -07:00
84b264b17d Add NHWC order support in the cost inference function of 3d conv (#19170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19170

As title
The quantized resnext3d model in production got the following failures without the fix:

```
 Caffe2 operator Int8ConvRelu logging error: [enforce fail at conv_pool_op_base.h:463] order == StorageOrder::NCHW. 1 vs 2. Conv3D only supports NCHW on the production quantized model
```

Reviewed By: jspark1105

Differential Revision: D14894276

fbshipit-source-id: ef97772277f322ed45215e382c3b4a3702e47e59
2019-04-15 16:47:22 -07:00
ffc9e29844 unit test with multiple op invocations (#19118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19118

A bug introduced by D14700576 reported by Yufei (fixed by D14778810 and D14785256) was not detected by our unit tests.
This diff improves the unit tests to catch such errors (with this diff and without D14778810, we can reproduce the bug Yufei reported).
The improvement also revealed a bug that affects accuracy when we pre-pack weight and bias together and the pre-packed weight/bias are used by multiple nets: we were modifying the pre-packed bias in place, even though it was supposed to be constant.

Reviewed By: csummersea

Differential Revision: D14806077

fbshipit-source-id: aa9049c74b6ea98d21fbd097de306447a662a46d
2019-04-15 14:41:28 -07:00
00148825fc Run shellcheck on Jenkins scripts (#18874)
Summary:
closes #18873

Doesn't fail the build on warnings yet.
Also fix most severe shellcheck warnings
Limited to `.jenkins/pytorch/` at this time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18874

Differential Revision: D14936165

Pulled By: kostmo

fbshipit-source-id: 1ee335695e54fe6c387ef0f6606ea7011dad0fd4
2019-04-15 12:48:52 -07:00
a0263ec047 Make DistributedDataParallel use new reducer (#18953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18953

This removes Python side bucketing code from DistributedDataParallel
and replaces it with calls to the new C++ based bucketing and reducing
code. To confirm this is working well, we ran a test with both the
previous implementation and the new implementation, and confirmed they
are numerically equivalent.

Performance is improved by a couple percent or more, including the
single machine multiple GPU runs.

Closes #13273.

Reviewed By: mrshenli

Differential Revision: D14580911

fbshipit-source-id: 44e76f8b0b7e58dd6c91644e3df4660ca2ee4ae2
2019-04-15 12:44:38 -07:00
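The C++ reducer this commit switches to groups gradients into fixed-size buckets so each all-reduce operates on one flattened buffer instead of a tensor at a time. A toy pure-Python version of that greedy packing (the byte cap and names here are invented, not the real reducer API):

```python
def bucket_params(param_sizes, bucket_cap=16):
    """Greedily pack parameter sizes (e.g. bytes) into buckets; each
    bucket becomes a single all-reduce over a flattened buffer."""
    buckets, current, current_size = [], [], 0
    for i, size in enumerate(param_sizes):
        if current and current_size + size > bucket_cap:
            buckets.append(current)        # cap reached, close the bucket
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        buckets.append(current)
    return buckets

buckets = bucket_params([4, 4, 10, 8, 2], bucket_cap=16)
```

Fewer, larger collectives amortize launch overhead, which is where the couple-percent improvement mentioned above comes from.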
6ed57e052d Fix the return value of ParseFromString (#19262)
Summary:
Fix the return value of ParseFromString.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19262

Differential Revision: D14937605

Pulled By: ezyang

fbshipit-source-id: 3f441086517186a075efb3d74f09160463b696b3
2019-04-15 12:39:29 -07:00
3403cb857b Modify Cholesky derivative (#19116)
Summary:
The derivative of the Cholesky decomposition was previously a triangular matrix.

Changelog:
- Modify the derivative of Cholesky from a triangular matrix to symmetric matrix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19116

Differential Revision: D14935470

Pulled By: ezyang

fbshipit-source-id: 1c1c76b478c6b99e4e16624682842cb632e8e8b9
2019-04-15 12:16:55 -07:00
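The change amounts to symmetrizing the raw triangular gradient: G_sym = (G + G^T) / 2. A tiny pure-Python sketch on nested lists, illustrative only:

```python
def symmetrize(g):
    """Turn a triangular gradient matrix (list of rows) into the
    symmetric matrix (G + G^T) / 2 returned by the new derivative."""
    n = len(g)
    return [[(g[i][j] + g[j][i]) / 2 for j in range(n)] for i in range(n)]

tri = [[2.0, 0.0], [3.0, 4.0]]   # lower-triangular raw gradient
sym = symmetrize(tri)
```

The symmetric form is the natural gradient for Cholesky because the input matrix is itself symmetric, so perturbations off the triangle are mirrored.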
991279dc7d produce diagram for caffe2 build matrix (#18517)
Summary:
This PR splits the configuration tree data from the logic used to construct the tree, for both `pytorch` and `caffe2` build configs.

Caffe2 configs are also now illustrated in a diagram.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18517

Differential Revision: D14936170

Pulled By: kostmo

fbshipit-source-id: 7b40a88512627377c5ea0f24765dabfef76ca279
2019-04-15 11:45:32 -07:00
7caad0ed33 Free all blocks with outstanding events on OOM-retry (#19222)
Summary:
The caching allocator tries to free all blocks on an out-of-memory
error. Previously, it did not free blocks that still had outstanding
stream uses. This change synchronizes on the outstanding events and
frees those blocks.

See #19219
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19222

Differential Revision: D14925071

Pulled By: colesbury

fbshipit-source-id: a2e9fe957ec11b00ea8e6c0468436c519667c558
2019-04-15 11:29:27 -07:00
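The retry logic described above can be modeled in a few lines of Python. The real allocator is C++ inside the CUDA caching allocator; everything here (class name, fields) is an invented simplification of the free-idle-first, then sync-and-free-event-blocks behavior:

```python
class CachingAllocator:
    """Toy model of OOM-retry: first return idle cached blocks to the
    device; if still OOM, synchronize on outstanding stream events and
    free those blocks too before giving up."""
    def __init__(self, free, idle_cached=0, event_cached=0):
        self.free = free                  # bytes available on the device
        self.idle_cached = idle_cached    # cached blocks, no pending events
        self.event_cached = event_cached  # cached blocks with outstanding events

    def alloc(self, size):
        if size <= self.free:
            self.free -= size
            return "ok"
        # old behavior: only idle cached blocks were freed on OOM
        self.free += self.idle_cached
        self.idle_cached = 0
        if size <= self.free:
            self.free -= size
            return "ok-after-free"
        # new behavior: wait on outstanding events, then free those blocks too
        self.free += self.event_cached
        self.event_cached = 0
        if size <= self.free:
            self.free -= size
            return "ok-after-event-sync"
        raise MemoryError("out of memory")

a = CachingAllocator(free=10, idle_cached=5, event_cached=20)
result = a.alloc(30)
```

Before this change, the 20 event-held bytes would never be reclaimed and the allocation would fail.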
86619b8ba6 Make sure that any of the future versions can load and execute older models. (#19174)
Summary:
Helps to test #18952
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19174

Differential Revision: D14899474

Pulled By: VitalyFedyunin

fbshipit-source-id: a4854ad44da28bd0f5115ca316e6078cbfe29d0d
2019-04-15 10:49:31 -07:00
68c4ebbeeb Sync fbcode/caffe2 and xplat/caffe2 (1) (#19218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19218

Sync some contents between fbcode/caffe2 and xplat/caffe2 to move closer towards a world where they are identical.

Reviewed By: dzhulgakov

Differential Revision: D14919916

fbshipit-source-id: 29c6b6d89ac556d58ae3cd02619aca88c79591c1
2019-04-13 21:45:52 -07:00
2060e44ec8 upgrade bazel version in CI [xla ci] (#19246)
Summary:
The latest TF requires upgrading bazel version.
This PR should fix xla tests in CI.
[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19246

Differential Revision: D14929533

Pulled By: ailzhang

fbshipit-source-id: f6deb31428ed39f267d96bb9814d06f76641e73b
2019-04-13 20:16:37 -07:00
1c099fd5c9 Update docker images to use ROCm 2.3 (#19231)
Summary:
xw285cornell petrex iotamudelta

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger-test/24676/
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger-test/17679/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger/24652/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger/9943/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19231

Differential Revision: D14928580

Pulled By: bddppq

fbshipit-source-id: 025b0affa6bcda6ee9f823dfc6c2cf8b92e71027
2019-04-13 13:11:26 -07:00
10bc789dff fix flake8 (#19243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19243
ghimport-source-id: ae80aed3a5742df21afb6e55979686220a27cce7

Differential Revision: D14928670

Pulled By: zdevito

fbshipit-source-id: 20ec0d5c8d6f1c515beb55e2e63eddf3b2fc12dd
2019-04-13 10:04:39 -07:00
e958ceb5d7 Remove GraphExecutor's python bindings (#19141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19141
ghimport-source-id: 796a41f5514d29959af052fcf5391a2834850a80

Reviewed By: jamesr66a

Differential Revision: D14888702

Pulled By: zdevito

fbshipit-source-id: c280145f08e7bc210434d1c99396a3257b626cf9
2019-04-13 08:42:24 -07:00
ddda563f22 Cleanup ScriptModule bindings (#19138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19138
ghimport-source-id: 10f810f5e7551c1cb65fc4799744083bd7ffd1ee

Reviewed By: jamesr66a

Differential Revision: D14886945

Pulled By: zdevito

fbshipit-source-id: a5e5bb08694d03166a7516ec038656c2a02e7896
2019-04-13 08:42:21 -07:00
dcb5fd3613 get propagate_shape logic out of module.h (#19137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19137
ghimport-source-id: 2394765f2d401e68ffdfa4c985bfab4cca2517f8

Reviewed By: jamesr66a

Differential Revision: D14885946

Pulled By: zdevito

fbshipit-source-id: daa2894ed9761107e9d273bb172840dc23ace072
2019-04-13 08:42:17 -07:00
1827ca4c35 Make debug subgraph inlining thread local (#19136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19136
ghimport-source-id: 3a24ab36aa753ce5cce7bba3467bdbe88e5c7f60

Reviewed By: jamesr66a

Differential Revision: D14885051

Pulled By: zdevito

fbshipit-source-id: b39c6ceef73ad9caefcbf8f40dd1b9132bba03c2
2019-04-13 08:42:14 -07:00
c38c7b0ec5 Support Kwargs in C++ Function/Method calls (#19086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19086
ghimport-source-id: 7790a5cc6e32f6f72e92add0b9f76dfa49ad9859

Reviewed By: jamesr66a

Differential Revision: D14875729

Pulled By: zdevito

fbshipit-source-id: ad1e4542381d9c33722155459e794f1ba4660dbb
2019-04-13 08:42:11 -07:00
d8669a2c7e Enable working ROCm tests (#19169)
Summary:
Enable multi-GPU tests that work with ROCm 2.2. Have been run three times on CI to ensure stability.

While there, remove skipIfRocm annotations for tests that depend on MAGMA. They still skip but now for the correct reason (no MAGMA) to improve our diagnostics.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19169

Differential Revision: D14924812

Pulled By: bddppq

fbshipit-source-id: 8b88f58bba58a08ddcd439e899a0abc6198fef64
2019-04-12 21:51:10 -07:00
ca02558d40 import warnings in torch.hub & fix master CI travis (#19181)
Summary:
fix missing import in #18758
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19181

Differential Revision: D14908198

Pulled By: ailzhang

fbshipit-source-id: 31e0dc4a27521103a1b93f72511ae1b64a36117f
2019-04-12 21:35:31 -07:00
bba96db2a5 fix lint errors in gen.py (#19221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19221

att

Reviewed By: colesbury

Differential Revision: D14923858

fbshipit-source-id: 4793d7794172d401455c5ce72dfc27dddad515d4
2019-04-12 18:26:38 -07:00
b1539412db Add pass registration mechanism (#18587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18587
ghimport-source-id: 80d753f7046a2a719e0c076684f44fa2059a0921

Differential Revision: D14901227

Pulled By: bwasti

fbshipit-source-id: 56511d0313419b63945a36b80e9ea51abdef2bd4
2019-04-12 15:32:00 -07:00
a3d3008e73 JIT Layernorm fusion (#18266)
Summary:
Partially fuse layer_norm by decomposing it into the batchnorm kernel that computes the stats, then fusing the affine operations that follow the reduction. This is similar to the batchnorm fusion that apaszke did, and it likewise only works in inference mode for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266

Differential Revision: D14879877

Pulled By: wanchaol

fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe
2019-04-12 14:38:31 -07:00
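The decomposition is: compute per-sample mean/variance (the "stats" a batchnorm-style reduction kernel can produce), normalize, then apply the affine weight/bias as pointwise ops the fuser can merge with neighbors. A pure-Python reference for a single row, just to show the two stages:

```python
def layer_norm(x, weight, bias, eps=1e-5):
    # stage 1: reduction — the stats the batchnorm kernel computes
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = (var + eps) ** -0.5
    # stage 2: pointwise affine — fusible with adjacent elementwise ops
    return [(v - mean) * inv_std * w + b
            for v, w, b in zip(x, weight, bias)]

out = layer_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

Only stage 2 is fused here; stage 1 stays a standalone reduction, which is why the commit calls the fusion partial.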
0e435afc3c Add more debugging helper to net transformer (#19176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19176

Add some amenities for debugging.

Reviewed By: llyfacebook

Differential Revision: D14901740

fbshipit-source-id: 2c4018fdbf7e3aba2a754b6b4103a72893c229c2
2019-04-12 14:28:37 -07:00
1c836e7bb9 Add Quantized Backend (#18546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18546

We'll expose all combinations of various ways of quantization in the top level dispatch key, that is we have AffineCPUTensor, PerChannelAffineCUDATensor, etc.

QTensor method added:
- is_quantized()
- item()

Differential Revision: D14637671

fbshipit-source-id: 346bc6ef404a570f0efd34e8793056ad3c7855f5
2019-04-12 12:55:49 -07:00
3f7ddd269c Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim (#18649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18649
ghimport-source-id: 3411d240a6af5fe299a889667964730184e30645

Differential Revision: D14888292

Pulled By: VitalyFedyunin

fbshipit-source-id: 80da83c264598f74ab8decb165da4a1ce2b352bb
2019-04-12 12:41:20 -07:00
bd55abb463 Fix onnx ints (#19102)
Summary:
If JIT constant propagation doesn't work, we have to handle the ListConstructor in symbolic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19102

Reviewed By: zrphercule

Differential Revision: D14875588

Pulled By: houseroad

fbshipit-source-id: d25c847d224d2d32db50aae1751100080e115022
2019-04-12 12:01:14 -07:00
c480798a1c use C10_REGISTER for GELU op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19090

Reviewed By: BIT-silence

Differential Revision: D14864737

fbshipit-source-id: 8debd53171f7068726f0ab777a13ca46becbfbdf
2019-04-12 11:41:04 -07:00
79db4e9c10 Fix tabs lint. (#19196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19196
ghimport-source-id: c10b1b19b087d7650e1614f008a9c2db21dfec2f

Differential Revision: D14913428

Pulled By: ezyang

fbshipit-source-id: 815b919d8e4516d0e5d89ebbdc4dff6d1d08da47
2019-04-12 11:22:05 -07:00
65ae897ae8 Pin nvidia-container-runtime version (#19195)
Summary:
This PR is to fix the CI error:
```
nvidia-docker2 : Depends: nvidia-container-runtime (= 2.0.0+docker18.09.4-1) but 2.0.0+docker18.09.5-1 is to be installed
E: Unable to correct problems, you have held broken packages.
Exited with code 100
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19195

Differential Revision: D14913104

Pulled By: yf225

fbshipit-source-id: d151205f5ffe9cac7320ded3c25baa7e051c3623
2019-04-12 10:00:40 -07:00
deda88e0aa One more fix for #18790
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19187

Differential Revision: D14913100

Pulled By: ezyang

fbshipit-source-id: bf147747f933a2c9a35f3ff00bf6b83a4f29286c
2019-04-12 09:29:15 -07:00
7e73783c6f Fix promoteTypes for QInt types (#19182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19182

This is a bug discovered by zafartahirov, right now if one of the tensor is QInt
type we'll return undefined, but actually we want to allow ops that accepts
Tensors of the same QInt type to work.

Reviewed By: zafartahirov

Differential Revision: D14909172

fbshipit-source-id: 492fd6403da8c56e180efe9d632a3b7fc879aecf
2019-04-11 19:43:18 -07:00
422b01e788 Replace more usages of Type with DeprecatedTypeProperties (#19093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19093
ghimport-source-id: a82e3dce912a173b42a6a7e35eb1302d9f334e03

Differential Revision: D14865520

Pulled By: li-roy

fbshipit-source-id: b1a8bf32f87920ce8d82f990d670477bc79d0ca7
2019-04-11 17:02:05 -07:00
fea0a0be53 Support attributes when copying modules (#19040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19040
ghimport-source-id: 37933efd717795751283cae8141e2e2caaae2e95

Reviewed By: eellison

Differential Revision: D14895573

Pulled By: driazati

fbshipit-source-id: bc2723212384ffa673d2a8df2bb57f38c62cc104
2019-04-11 15:38:06 -07:00
4ae59e4744 Move version_counter_ to TensorImpl (#18223)
Summary:
According to https://github.com/pytorch/pytorch/issues/13638#issuecomment-468055428, after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18223

Differential Revision: D14735123

Pulled By: yf225

fbshipit-source-id: 15f690311393ffd5a53522a226da82f5abb6c65b
2019-04-11 15:12:45 -07:00
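A version counter is shared between a tensor and its views and bumped on every in-place mutation, which is how autograd detects that a saved tensor went stale. A minimal pure-Python model of the idea (not the real TensorImpl API):

```python
class VersionCounter:
    def __init__(self):
        self.value = 0

class ToyTensorImpl:
    """Every impl owns a version counter (or shares one, for views),
    whether or not it carries autograd metadata."""
    def __init__(self, data, counter=None):
        self.data = data
        self._version = counter if counter is not None else VersionCounter()

    def view(self):
        return ToyTensorImpl(self.data, self._version)  # views share the counter

    def add_(self, x):
        self.data = [d + x for d in self.data]
        self._version.value += 1                        # in-place ops bump it
        return self

t = ToyTensorImpl([1, 2, 3])
v = t.view()
t.add_(1)
```

With the counter on TensorImpl, even a captured variable without autograd metadata gets this bookkeeping.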
507fe66bea Enable comp ops for bool tensor (#19109)
Summary:
Enabled comparison ops for bool tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19109

Differential Revision: D14871187

Pulled By: izdeby

fbshipit-source-id: cf9951847d69124a93e5e21dd0a39c9568b1037d
2019-04-11 14:37:10 -07:00
c7b5a8a876 Change is_variable() to check existence of AutogradMeta, and remove is_variable_ (#19139)
Summary:
Currently, a TensorImpl's `is_variable_` is true if and only if the TensorImpl has AutogradMeta. This PR unifies these two concepts by removing `is_variable_` and change `is_variable()` to check existence of AutogradMeta instead.

Removing `is_variable_` is part of the work in Variable/Tensor merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19139

Differential Revision: D14893339

Pulled By: yf225

fbshipit-source-id: ceb5e22c3c01f79b5d21d5bdbf4a7d1bc397796a
2019-04-11 14:03:33 -07:00
ef406ee925 First class modules in the compiler, round 2 (#19167)
Summary:
This PR propagates where we use first-class module objects into the compiler. This creates a transitional state where:

* compiler.cpp creates Graphs where `self` is a Module class and attributes/parameters/buffers/submodules are looked up with `prim::GetAttr`
* GraphExecutor still runs "lowered graphs" where the self object has been removed by a compiler pass `lower_first_class_method`.
* Tracing still creates "lowered graphs", and a pass "lift_lowered_method" creates a first-class method graph for things.

* This PR separates out Method and Function. A script::Function is a pure Graph with no `self` bound.  Similar to Python, a script::Method is just a bound `self` and its underlying `script::Function`.
* This PR also separates CompilationUnit from Module. A CompilationUnit is just a list of named script::Functions.  Class's have a CompilationUnit holding the class methods, and Modules also have a CompilationUnit holding their Methods. This avoids the weird circular case Module --has a-> Class -> has a -> Module ...

Details:
* In this transitional state, we maintain two copies of a Graph, first-class module and lowered. The first-class one has a self argument that is the module's class type. The lowered one is the lowered graph that uses the initial_ivalues inputs.
* When defining lowered methods using `_defined_lowered` we immediately create the first-class equivalent. The reverse is done lazily, creating lowered_methods on demand from the class.
* The two-way conversions will be deleted in a future PR when the executor itself runs first-class objects. However, this requires more changes to (1) the traces, (2) the python bindings, and (3) the onnx export pass, and would make this PR way too large.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19167

Differential Revision: D14891966

Pulled By: zdevito

fbshipit-source-id: 0b5f03118aa65448a15c7a7818e64089ec93d7ea
2019-04-11 13:55:48 -07:00
b6ee83a5b4 Materialize a non-default device for C2 legacy storage. (#18605)
Summary:
It's not intended that Storages have 'default' CUDA devices, but this is allowable via the Storage::create_legacy codepath.

This also messes with device caching, because the initial cache is obtained from the Storage, which may have a 'default' device.

Instead, we materialize a device by allocating 0 bytes via the allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18605

Differential Revision: D14680620

Pulled By: gchanan

fbshipit-source-id: 6d43383d836e90beaf12bfe37c3f0506843f5432
2019-04-11 13:50:41 -07:00
bbe648dffb Allow empty net type (#19154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19154

I recently saw a weird workflow error due to a net_type that was set but empty. Maybe we should just fall back to the simple net in this case.

Reviewed By: dzhulgakov

Differential Revision: D14890072

fbshipit-source-id: 4e9edf8232298000713bebb0bfdec61e9c5df17d
2019-04-11 12:43:07 -07:00
33a950924a Skip Slice if it's no op (#19155)
Summary:
If it's identity op, just skip the slice and return the input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19155

Reviewed By: zrphercule

Differential Revision: D14890238

Pulled By: houseroad

fbshipit-source-id: f87b93df2cca0cb0e8ae2a1d95ba148044eafd4a
2019-04-11 12:26:32 -07:00
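A slice is an identity op when it starts at 0, steps by 1, and its end covers the whole dimension; in that case the exporter can return the input unchanged. A sketch of the check (names are hypothetical, and the INT64_MAX end value is the convention exporters often emit for "to the end"):

```python
INT64_MAX = 9223372036854775807

def is_noop_slice(dim_size, start, end, step=1):
    """True when slicing [start:end:step] returns the input unchanged."""
    return step == 1 and start == 0 and end >= dim_size

def maybe_slice(xs, start, end, step=1):
    if is_noop_slice(len(xs), start, end, step):
        return xs                       # skip: identity, return the input as-is
    return xs[start:end:step]

data = [1, 2, 3]
same = maybe_slice(data, 0, INT64_MAX)
```

Returning the original object (not a copy) is what lets downstream passes drop the node entirely.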
feb5d26510 Rename ONNX util test names (#19153)
Summary:
Rename test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19153

Reviewed By: zrphercule

Differential Revision: D14890095

Pulled By: houseroad

fbshipit-source-id: 37a787398c88d9cc92b411c2355b43200cf1c4b0
2019-04-11 11:29:16 -07:00
c1b92f518d Remove ProcessGroup::getGroupRank (#19147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19147

After #14809 was merged there is no longer a need for getGroupRank.
Every ProcessGroup object has its own rank and size fields which are
accurate for the global group as well as subgroups.

Strictly speaking removing a function in a minor version bump is a big
no-no, but I highly doubt this was ever used outside of
`torch.distributed` itself. This will result in a compile error for
folks who have subclassed the ProcessGroup class though.

If this is a concern we can delay merging until a later point in time,
but eventually this will need to be cleaned up.

Differential Revision: D14889736

fbshipit-source-id: 3846fe118b3265b50a10ab8b1c75425dad06932d
2019-04-11 09:17:40 -07:00
c145c34a7b Basic implementation of QRelu in C10 (#19091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19091

Implements a basic quantized ReLU (uint8). This is a temporary solution before using the `QTensor` type instead of the tuple.

Reviewed By: dzhulgakov

Differential Revision: D14565413

fbshipit-source-id: 7d53cf5628cf9ec135603d6a1fb7c79cd9383019
2019-04-11 08:47:56 -07:00
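For an affine-quantized uint8 tensor, ReLU in the quantized domain is just a clamp at the zero point, since the real value 0 maps to `zero_point`. A hedged pure-Python sketch of that common formulation (not the C10 operator itself):

```python
def quantized_relu(q_values, zero_point):
    """ReLU on uint8 quantized values: anything below the zero point
    encodes a negative real number, so clamp it up to zero_point."""
    return [max(q, zero_point) for q in q_values]

out = quantized_relu([0, 100, 128, 200, 255], zero_point=128)
```

No dequantize/requantize round trip is needed, which is what makes quantized ReLU cheap.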
4b20fc826d Import MultiheadAttention to PyTorch (#18334)
Summary:
Import MultiheadAttention into the core pytorch framework.
Users now can import MultiheadAttention directly from torch.nn.
See "Attention Is All You Need" for more details related to MultiheadAttention function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18334

Differential Revision: D14577966

Pulled By: zhangguanheng66

fbshipit-source-id: 756c0deff623f3780651d9f9a70ce84516c806d3
2019-04-11 08:07:30 -07:00
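At its core, MultiheadAttention is scaled dot-product attention, softmax(QK^T / sqrt(d)) V, applied per head with learned projections. A single-query, single-head pure-Python sketch of just that core (not the `torch.nn` module's API):

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, keys, values):
    """One query vector against a list of key/value vectors."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k in keys]
    weights = softmax(scores)
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return out, weights

out, w = attention([1.0, 0.0],
                   [[1.0, 0.0], [0.0, 1.0]],
                   [[10.0, 0.0], [0.0, 10.0]])
```

The query aligned with the first key gets the larger weight, so the output leans toward the first value vector.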
b6f130aa70 try to enable uncertainty for lr loss (#17236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17236

Following the paper in https://papers.nips.cc/paper/7141-what-uncertainties-do-we-need-in-bayesian-deep-learning-for-computer-vision.pdf, approximate the classification case with the regression formulation. For the LRLoss, add penalty based on the variance and regularization on the variance with a tunable parameter lambda.

Reviewed By: chocjy

Differential Revision: D14077106

fbshipit-source-id: 4405d8995cebdc7275a0dd07857d32a8915d78ef
2019-04-11 07:35:19 -07:00
160d0776d5 Remove comment (#19148)
Summary:
Remove pointer to nonexistent Note.
It is already removed in "Remove support for CUDNN 6 (#15851)"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19148

Differential Revision: D14891514

Pulled By: soumith

fbshipit-source-id: dd33cfefa3a21e18afae5b3992dea085adaabda8
2019-04-11 07:04:45 -07:00
f5165ade5b Revert D14842057: Compiler uses first-class modules**
Differential Revision:
D14842057

Original commit changeset: ca6e7b5a4380

fbshipit-source-id: e8f1862a59bf20d5f78648b2fdc53a8b3750ead3
2019-04-11 06:17:01 -07:00
5e1f0b2a07 Compiler uses first-class modules** (#19043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19043
ghimport-source-id: 0c9e80d5f35654af6d472abd5643bff3e9eb9ddf

Differential Revision: D14842057

Pulled By: zdevito

fbshipit-source-id: ca6e7b5a43805240f40b84d30e54495061067dc0
2019-04-11 00:00:48 -07:00
fce0c5e17d Require matches_jit_signature within native_functions.yaml (#18956)
Summary:
"""
This will verify that the func syntax follows the JIT signature schema. If you are a developer outside the core team, set this to False first to help us track unification. After your tests pass try setting this to True once and leave it set to True if it doesn't trigger any asserts. This means that your signature happens to be compliant. In general, it serves as a means of tracking an ongoing schema unification with the goal of aligning func syntax with other components of PyTorch in order to reduce overall complexity and assert coverage of all functions by each component.
"""
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18956

Differential Revision: D14807952

Pulled By: cpuhrsch

fbshipit-source-id: 42dac49269fb3cd96dc62e0b10820d0c32c7fb0e
2019-04-10 23:35:28 -07:00
e54cb03a51 add/move a few apis in torch.hub (#18758)
Summary:
* `torch.hub.list('pytorch/vision')` - show all available hub models in `pytorch/vision`
* `torch.hub.show('pytorch/vision', 'resnet18')` - show docstring & example for `resnet18` in `pytorch/vision`
* Moved `torch.utils.model_zoo.load_url` to `torch.hub.load_state_dict_from_url` and deprecate `torch.utils.model_zoo`
* We have too many env variables controlling where the cache dir is; it's not really necessary. I actually want to unify `TORCH_HUB_DIR`, `TORCH_HOME` and `TORCH_MODEL_ZOO`, but haven't done it yet. (more suggestions are welcome!)
* Simplified the `pytorch/vision` example in the docs; it was used to show how a hub entrypoint can be written, so it had some confusing unnecessary args.

An example of hub usage is shown below
```

In [1]: import torch

In [2]: torch.hub.list('pytorch/vision', force_reload=True)
Downloading: "https://github.com/pytorch/vision/archive/master.zip" to /private/home/ailzhang/.torch/hub/master.zip
Out[2]: ['resnet18', 'resnet50']

In [3]: torch.hub.show('pytorch/vision', 'resnet18')
Using cache found in /private/home/ailzhang/.torch/hub/vision_master

    Resnet18 model
    pretrained (bool): a recommended kwargs for all entrypoints
    args & kwargs are arguments for the function

In [4]: model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
Using cache found in /private/home/ailzhang/.torch/hub/vision_master
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18758

Differential Revision: D14883651

Pulled By: ailzhang

fbshipit-source-id: 6db6ab708a74121782a9154c44b0e190b23e8309
2019-04-10 23:10:39 -07:00
5164622ba4 Revert D14878128: [jit] Support attributes when copying modules
Differential Revision:
D14878128

Original commit changeset: 7ef5f7b1b16b

fbshipit-source-id: 3818222a897f8c01bc67f550ed0fd3ddecf61015
2019-04-10 22:24:30 -07:00
ce166d949d ProcessGroupMPI exists only if it is valid (#14809)
Summary:
Previously, MPI process groups were created for all processes, even if
they were not part of the created group. Their MPI_Comm member field
would be MPI_COMM_NULL and they would ignore any calls. Their rank and
size were identical to that of the global process group and they had a
special groupRank and groupSize field to capture the _real_ rank.

This also meant asymmetry with other process group types, where creating
a new group would either return the process group OR
GroupMember.NON_GROUP_MEMBER. For the MPI process group, it would always
return a process group and an additional check was needed to verify
whether or not a process was indeed part of a process group or not.

This commit changes this such that every MPI process group is a valid
process group, and by extension that we no longer have to special case
MPI to determine whether or not a process is part of a group. Now, if
the value returned by `new_group` is GroupMember.NON_GROUP_MEMBER, the
process is not a member, otherwise it is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14809

Differential Revision: D14887937

Pulled By: pietern

fbshipit-source-id: c5bf86d3b33e524cc5004ee68e30103178fa491d
2019-04-10 21:36:35 -07:00
6b0ca8eae5 Fix flaky store timeout test (#19114)
Summary:
~Sometimes, `init_process_group()`, `store.get()`, and `destroy_process_group()` can take more than a few seconds. Hence, removing thread join timeout.~

The error was due to `Address already in use` when starting TPC backend. The solution is to catch the error and report it to the `retry_on_address_already_in_use_error` decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19114

Reviewed By: ezyang

Differential Revision: D14872680

Pulled By: mrshenli

fbshipit-source-id: fc504d02853ca73f76288c0ade564ab20bc01f7e
2019-04-10 20:35:36 -07:00
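The fix wraps the test in a decorator that retries when socket binding races on the rendezvous port. A simplified stand-in for that pattern (the real decorator lives in the distributed test utilities; the retry count here is arbitrary):

```python
import functools

def retry_on_address_already_in_use_error(func):
    """Re-run the wrapped test a few times if port binding races."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tries = 3
        for attempt in range(tries):
            try:
                return func(*args, **kwargs)
            except OSError as e:
                if "Address already in use" not in str(e) or attempt == tries - 1:
                    raise           # unrelated error, or out of retries
    return wrapper

calls = {"n": 0}

@retry_on_address_already_in_use_error
def flaky_init():
    calls["n"] += 1
    if calls["n"] < 2:
        raise OSError("Address already in use")
    return "initialized"

result = flaky_init()
```

Catching the error at this level, rather than swallowing it inside the backend, keeps genuine failures visible.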
821b5f138a Optimize SoftmaxOp on CPU (#18635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18635

Optimize SoftmaxOp on CPU

Reviewed By: houseroad

Differential Revision: D14689516

fbshipit-source-id: d2dcee2476d1a3a21f428e99bce9835f1d229d64
2019-04-10 18:52:15 -07:00
1abbee0f8e Allow Tensor lists to show up in symbolic differentiable graphs. (#16784)
Summary:
It is done by flattening all tensor lists that are inputs/outputs to the
graph into the inputs/outputs list in the autograd graph.

This is less desirable than simply allowing IValues to exist in the
inputs/outputs of autograd::Function but it is substantially less
intrusive.

CaptureList describes the variables captured for backward in a single class.
UnpackInstructs describes how the flattened inputs to backwards are re-packed into lists.
ailzhang

This PR is also part 2 of covering maskrcnn & bert AD formulas, following #16689.

Ops added in this PR:
```
cat
index
meshgrid
reshape
split
split_with_sizes
stack
unbind
```
I will also add a few perf numbers here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16784

Differential Revision: D14104063

Pulled By: ailzhang

fbshipit-source-id: 5ceadadfd67ccaac60c5fd6740786c5354e252b9
2019-04-10 18:16:20 -07:00
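The flattening described above pairs each graph input/output with a small unpack instruction recording how to regroup the flat sequence afterward. A toy round-trip in plain Python (loosely mirroring the CaptureList / UnpackInstructs idea, with invented names):

```python
def flatten(args):
    """Flatten list arguments into one flat sequence, recording
    (kind, length) per original argument."""
    flat, instrs = [], []
    for a in args:
        if isinstance(a, list):
            flat.extend(a)
            instrs.append(("list", len(a)))
        else:
            flat.append(a)
            instrs.append(("single", 1))
    return flat, instrs

def unflatten(flat, instrs):
    """Repack a flat sequence into the original structure."""
    out, i = [], 0
    for kind, n in instrs:
        out.append(flat[i:i + n] if kind == "list" else flat[i])
        i += n
    return out

flat, instrs = flatten([7, [1, 2, 3], [4, 5]])
restored = unflatten(flat, instrs)
```

Because autograd only ever sees the flat sequence, `autograd::Function` itself never has to understand list-typed IValues.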
612998f2ee Support attributes when copying modules (#19040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19040
ghimport-source-id: 37933efd717795751283cae8141e2e2caaae2e95

Differential Revision: D14878128

Pulled By: driazati

fbshipit-source-id: 7ef5f7b1b16b9bf9254e8503564fa3a750d841ab
2019-04-10 16:12:29 -07:00
226a358136 Move ConcatBatchMatMulBatchGatherOp to OSS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19059

Reviewed By: bwasti

Differential Revision: D14849735

fbshipit-source-id: fefd1887d38e51151c07a8b187e9c7c50ef02c6e
2019-04-10 15:29:03 -07:00
70313941b4 Print CuDNN version correctly. (#19110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19110
ghimport-source-id: efbaf9b23cb61e7ea65460684778c6eeb38ae28e

Differential Revision: D14874497

Pulled By: ezyang

fbshipit-source-id: ced03576f7598189dd8cce79b3303a5529551f46
2019-04-10 14:20:22 -07:00
8e8874ae54 Infer device from pointer in from_blob (#19094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19094
ghimport-source-id: 8207cf614ba36333af610309b24fdc13441b2837

Differential Revision: D14865925

Pulled By: li-roy

fbshipit-source-id: 16613801f7fe0e829ccab8af081517ea4257db06
2019-04-10 12:51:05 -07:00
575aebc182 implement operators for DNNLOWP (#18656)
Summary:
Implement operators for DNNLOWP, including int8_conv, int8_FC, int8_pooling, int8_relu, int8_sum, quantize/dequantize, and order_switch operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18656

Differential Revision: D14767092

Pulled By: yinghai

fbshipit-source-id: 1f3e24929a358a42214da333bd304c593ea4468f
2019-04-10 12:04:39 -07:00
d537e12310 Improve mismatched storage error message. (#19068)
Summary:
Previously the error message would look like:
```
Attempted to set the storage of a tensor on device cuda:0 to a storage on different device cuda. This is no longer allowed; the devices must match.
```

Now it looks like:
```
Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cuda". This is no longer allowed; the devices must match.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19068

Reviewed By: dzhulgakov

Differential Revision: D14854257

Pulled By: gchanan

fbshipit-source-id: deb1ef73c2fcbf9338e7d67f2856282db2befac8
2019-04-10 11:51:33 -07:00
86532c921d Refactor pickler (#19035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19035
ghimport-source-id: 553977b9963d4877e5066a61702f887e81706598

Differential Revision: D14839341

Pulled By: driazati

fbshipit-source-id: d6e4f21b2df28e2a0a21b26bf08d9905599119ad
2019-04-10 11:26:07 -07:00
1858773c0c Fixed bool Tensor value change bug (#19096)
Summary:
Fixes #19077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19096

Differential Revision: D14871044

Pulled By: izdeby

fbshipit-source-id: 61b12559c8c5b9613e00ba5933f478321ea80469
2019-04-10 11:09:07 -07:00
92f70bb639 Split python_ir.h in a more sensible way (#19081)
Summary:
Files included in libtorch do depend on torch/csrc/utils/object_ptr.h, e.g. ir.cpp: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/ir.h#L10 (including usage in std::vector that requires destructor for THPPointer)

However, object_ptr.h depends on python stub: https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/object_ptr.h#L3

Whereas object_ptr.cpp depends fully on Python: https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/object_ptr.cpp#L8

`torch/csrc/utils/object_ptr.cpp` is included only in Python extension target: https://github.com/pytorch/pytorch/blob/master/torch/CMakeLists.txt#L541

The only reason it was working on master is that the compiler was aggressive enough in pruning unused inline functions. With a slight change in flags, it started breaking (as in kostmo's PR).

This PR splits out python-dependent bits more explicitly by forward declaring THPPointer for real.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19081

Reviewed By: ezyang

Differential Revision: D14860091

Pulled By: dzhulgakov

fbshipit-source-id: 4e86cb8e2ac57aedb3cd00c15270d65bb376206c
2019-04-10 10:26:50 -07:00
b461689cfd Clear input/output shape cache for each inference (#19085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19085

This fixes a bug where input_shapes_ and output_shapes_ would grow indefinitely.
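A minimal sketch of the fix, with hypothetical names (the real members are C++ fields on the inference op): clear the per-inference shape caches at the start of each call so they cannot accumulate.

```python
class ShapeCachingRunner:
    """Sketch: shape caches must hold only the current inference's shapes."""
    def __init__(self):
        self.input_shapes = []
        self.output_shapes = []

    def run(self, inputs):
        # Clear the per-inference caches first; without this the lists
        # grow by one batch of entries per call and never shrink.
        self.input_shapes.clear()
        self.output_shapes.clear()
        for x in inputs:
            self.input_shapes.append(len(x))
        return inputs
```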

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D14861695

fbshipit-source-id: d59116f27c3b54f5cc5a33533de4b9222dbb7afc
2019-04-10 10:21:37 -07:00
ea2405c7dc Add torch.unique_consecutive (#19060)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/19045

Please review: VitalyFedyunin ngimel

This is independent of the #18649 series. It will cause merge conflicts there, but please merge this first, and I will resolve the conflicts in #18649.

The new feature is exposed in `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, but not yet in `torch.unique`. I will take care of the API after the #18649 series is merged completely.

Benchmark on a tensor of shape `torch.Size([15320, 2])`:

```python
print(torch.__version__)
print()
a = tensor.sort().values.to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+2addccc

cpu, sorted_input=False:
340 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
717 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
52.3 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
32.8 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
49.9 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
51.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
78 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cuda, sorted_input=False:
213 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
291 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
250 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
321 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
45.6 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
110 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
82 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
143 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

```python
print(torch.__version__)
print()
a1, a2 = tensor.unbind(1)
indices = (a1 * tensor.max() + a2).sort().indices
a = tensor.index_select(0, indices).to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
cpu, sorted_input=False:
55.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.8 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.1 ms ± 725 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
54.7 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.9 ms ± 577 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cuda, sorted_input=False:
171 µs ± 783 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
220 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
203 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
251 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
59.6 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
113 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
93.2 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
147 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
The CPU implementation of `unique_dim` is super slow, see https://github.com/pytorch/pytorch/issues/18987, but this PR will not worry about this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19060

Differential Revision: D14866909

Pulled By: ezyang

fbshipit-source-id: d20012cec68c37b05cf770a6f4d6524f910b950f
2019-04-10 07:36:08 -07:00
23b0908d38 Replace tabs with space (#19100)
Summary:
fix the linter
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19100

Differential Revision: D14869256

Pulled By: houseroad

fbshipit-source-id: 27ca93cd1dce01ac705b9c9ed93ca8eb6c36351c
2019-04-10 00:35:02 -07:00
a9a29dd63f Fixes error when too many parameters are passed to fused cuda kernel (#18063)
Summary:
Bug fix for https://github.com/pytorch/pytorch/issues/15043, where a large JIT fusion produces a kernel whose argument count exceeds the limit allowed by nvrtc on a CUDA device.
The fix is to check the number of arguments before a CUDA kernel is generated. If the number exceeds the limit, take the runFallBack() path.
A reduced test from the original issue is added to keep the test time low. The test would fail without this fix.
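The guard can be sketched as follows; the limit value and function names are assumptions for illustration, not the actual fuser code.

```python
MAX_KERNEL_ARGS = 128  # hypothetical nvrtc argument limit for this sketch

def run_fusion_group(args, compile_and_run_fused, run_fallback):
    # Check the argument count *before* generating the CUDA kernel;
    # otherwise nvrtc compilation would fail when the limit is exceeded.
    if len(args) > MAX_KERNEL_ARGS:
        return run_fallback(args)
    return compile_and_run_fused(args)
```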
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18063

Differential Revision: D14691401

Pulled By: soumith

fbshipit-source-id: b98829bc89ed7724e91eda82ae3a5a1151af721a
2019-04-09 22:37:09 -07:00
496b0b03d9 amend D14778810 (#18902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18902

The fix in D14778810 had an issue: when we fall back to acc32 because the outlier density is too high, W_quantized_ has already been modified. In this diff we first only count the number of outliers (without modifying W_quantized_), and only when the density is low enough that no fallback is needed do we modify W_quantized_ and construct the outlier matrix.
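The two-pass scheme can be sketched like this; plain Python lists stand in for W_quantized_, and the threshold/density parameters are illustrative.

```python
def split_outliers(w, threshold, max_outlier_density):
    # Pass 1: count outliers WITHOUT modifying w, so a fallback to acc32
    # still sees the original, untouched weights.
    n_outliers = sum(1 for v in w if abs(v) > threshold)
    if n_outliers / len(w) > max_outlier_density:
        return w, None  # fall back: w left untouched
    # Pass 2: density is low enough; now strip outliers from the dense
    # part and collect them in a sparse outlier matrix.
    outliers = [(i, v) for i, v in enumerate(w) if abs(v) > threshold]
    dense = [0 if abs(v) > threshold else v for v in w]
    return dense, outliers
```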

Reviewed By: jspark1105

Differential Revision: D14785256

fbshipit-source-id: 03933110a4ca7409686a06b18a9bb921f8657950
2019-04-09 22:08:54 -07:00
82b570528d Move abs, frac, reciprocal, and neg to TensorIterator (#19041)
Summary:
I've been messing around with vectorizing the fusion compiler in JIT, and noticed that these ops were pathologically slow. I moved them to use TensorIterator + Vec256<> and got some speed wins.

Benchmark script:

```
import torch, time

ops = ['abs', 'neg', 'reciprocal', 'frac']

x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')

for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')

```

Before this change (on my mac with a skylake):
```
op      time per iter (ms)      gops/s  GB/s
abs     0.9730974197387695      1.0775652866097343      8.620522292877874
neg     1.0723679780960083      0.9778136063534356      7.822508850827485
reciprocal      1.2610594034194946      0.8315040490215421      6.6520323921723366
frac    1.1681334018707275      0.8976509004200546      7.181207203360437
```

After this change:
```
op      time per iter (ms)      gops/s  GB/s
abs     0.5031076192855835      2.084198210889721       16.673585687117768
neg     0.4433974027633667      2.3648672578256087      18.91893806260487
reciprocal      0.47145988941192624     2.2241043693195985      17.79283495455679
frac    0.5036592721939087      2.0819154096627024      16.65532327730162
```

So, after this change it looks like we are hitting machine peak for bandwidth and are bandwidth bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19041

Differential Revision: D14862037

Pulled By: jamesr66a

fbshipit-source-id: e2032ac0ca962dbf4120bb36812277c260e22912
2019-04-09 21:55:00 -07:00
56b18eadab Fix aten op output assignment (#18581)
Summary:
Fixes the problem of #18391

The issue is that when we code-gen the ATenOp, we always generated a static number of outputs for each operator. E.g. an operator from an old model may only require two outputs, so its createOperator only allocates two output blobs, while the newer version of the operator (`unique` in this case) requires more output blobs to be allocated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18581

Differential Revision: D14865647

Pulled By: wanchaol

fbshipit-source-id: 85f63fe16d6fe408a09eca84798c7e8cab3070e9
2019-04-09 21:39:12 -07:00
447d74a074 EmbeddingBag w/ differentiable per_sample_weights (#18957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18957
ghimport-source-id: 7396ca08b137ea40f04285764a9d9a6d4f19227e

Reviewed By: cpuhrsch

Differential Revision: D14856526

Pulled By: zou3519

fbshipit-source-id: 949faea219c7c02ad981b1db610a477194d3f5c9
2019-04-09 18:13:06 -07:00
c889ff6cf8 EmbeddingBag w/ per_sample_weights CUDA fwd + bwd (#18800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18800
ghimport-source-id: 17f638dea0e1ac9a86ec06b223c60362ed78449c

Reviewed By: cpuhrsch

Differential Revision: D14851422

Pulled By: zou3519

fbshipit-source-id: 27b114e51e66112e4bc9cfc63d1d1ddfa650d347
2019-04-09 18:13:02 -07:00
0397d7c0c8 EmbeddingBag w/ per_sample_weights CPU backward (#18799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18799
ghimport-source-id: 58a6f629e890449013f24a9b6282664ca2a1e3ba

Reviewed By: cpuhrsch

Differential Revision: D14851417

Pulled By: zou3519

fbshipit-source-id: c36b9d469989354bf6cef1c2c3dc4f13e7cb1a25
2019-04-09 18:12:59 -07:00
2a2007e5ac EmbeddingBag CPU forward with per_sample_weights. (#18735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18735
ghimport-source-id: d81bef54dafd7167d2451250d7be478d3c013920

Reviewed By: cpuhrsch

Differential Revision: D14851415

Pulled By: zou3519

fbshipit-source-id: cea6039e760ad571b90f0a536e420498f34be325
2019-04-09 18:12:55 -07:00
c561ef5406 Refactor CPU embedding_bag implementation (#18734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18734
ghimport-source-id: e0e50d4b47f2fb8c86e464aacb950521d601f8d3

Reviewed By: cpuhrsch

Differential Revision: D14851413

Pulled By: zou3519

fbshipit-source-id: 8ac4e4de590a363e9807dc552fe4ca52b92652ed
2019-04-09 18:12:52 -07:00
0ca8f7a15f Make BlackBoxPredictor handle networks throwing exceptions (#19080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19080

OSS: add a tiny unit test utility function to create tensors given shape and data outside of any workspace. I use it in an internal test

Reviewed By: dzhulgakov

Differential Revision: D14814194

fbshipit-source-id: 6d53b235d99a97da812215f5c7f11fecad363c8c
2019-04-09 16:42:12 -07:00
168c0797c4 Remind users to set map_location properly when using DDP
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19084

Differential Revision: D14861702

Pulled By: mrshenli

fbshipit-source-id: 10ca4a9b41e707050a6bce228ccca4177c9fa4a6
2019-04-09 16:29:38 -07:00
487388d8ad Rename btrisolve to lu_solve (#18726)
Summary:
Changelog:
- Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to discourage its use
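The tentative-alias pattern from the changelog can be sketched in Python; the warning text is illustrative and the stub solver is a stand-in for the real one.

```python
import warnings

def lu_solve(b, lu_data, lu_pivots):
    # stand-in for the real solver
    return (b, lu_data, lu_pivots)

def btrisolve(*args, **kwargs):
    # Tentative backward-compatibility alias; warns so callers migrate
    # to the new name instead of continuing to use the old one.
    warnings.warn("btrisolve is deprecated in favour of lu_solve",
                  DeprecationWarning, stacklevel=2)
    return lu_solve(*args, **kwargs)
```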
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726

Differential Revision: D14726237

Pulled By: zou3519

fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b
2019-04-09 15:21:24 -07:00
5eb6a2be41 Avoid calling tensor.data.set_() in DDP
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18961

Differential Revision: D14811208

Pulled By: mrshenli

fbshipit-source-id: c1c46dfa13e0a6ec83aefd35696ee31a7ea3d810
2019-04-09 14:18:24 -07:00
1f0ee9d6e6 Reapply "Wrap workaround for cpp custom types a bit prettier and add an example" (#19062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19062

As a temporary demonstration on how to extend this hack further until custom C types are ready.

Reviewed By: ezyang

Differential Revision: D14817809

fbshipit-source-id: 6eaf731e9135313eb858e178abcd9f25380ab8fe
2019-04-09 12:36:32 -07:00
8f9b11cf33 Propagate ProcessGroup timeout to Store (#16571)
Summary:
closes #16520

Hi pietern, I am not sure if this is the expected way to pass the timeout to `Store`; could you please take a look? Thanks!

Questions:
1. How do I write tests for this? I wanted to do something like `test_barrier_timeout_global`, but it seems I need to set the pg's timeout larger than the `Store`'s default timeout (3 min) to see a difference, which is too long for a unit test. And I do not want to change the `Store`'s default timeout either. Any suggestion?
2. Should I also propagate timeout configuration down to `PrefixStore` in `_new_process_group_helper`?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16571

Differential Revision: D13954527

Pulled By: mrshenli

fbshipit-source-id: 77f2653903f24255207233eb298f7c0321119a87
2019-04-09 12:36:28 -07:00
aa017db59c make test_jit_fuser runnable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19036

Differential Revision: D14839800

Pulled By: wanchaol

fbshipit-source-id: b52c131b58e1b42a8c3da5d1117217c3dc2e5f5b
2019-04-09 12:36:25 -07:00
ad45d09202 Fix documentation for unfold(dimension=..., ...), fixes #18793 (#19020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19020
ghimport-source-id: 8f31e51b79daba11939aa7992450984054713b9c

Differential Revision: D14851890

Pulled By: ezyang

fbshipit-source-id: 8498e86a63633fdfd9ecae9b7f85b773b75fe27a
2019-04-09 11:54:25 -07:00
a3e177083b Debugging: Increase process reporting for apt/dpkg. (#18880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18880
ghimport-source-id: b43a33c12df379ec75c1fd4c713c1fc723a763e1

Differential Revision: D14856296

Pulled By: ezyang

fbshipit-source-id: 30691eb14dddfe998b2605b416aaa1b14d1b6ad5
2019-04-09 11:40:47 -07:00
29ea08616b Add torch.__config__.show(), reporting detailed version of all libraries. (#18579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18579
ghimport-source-id: 65124c95e49423de4ad1008c65e75057fea09b94

Differential Revision: D14778507

Pulled By: ezyang

fbshipit-source-id: 1e4bb79f4800a116ce8fb7af2fefbd34da8d102c
2019-04-09 11:13:24 -07:00
31ff0ecd2b Fix torch::nn::init::orthogonal_ with CNNs (#18915)
Summary:
Fixes #18518

I changed the C++ API torch::nn::init::orthogonal_ implementation to match the Python implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18915

Differential Revision: D14851833

Pulled By: ezyang

fbshipit-source-id: 45b5e9741582777c203e9ebed564ab3ac1f94baf
2019-04-09 10:39:15 -07:00
25bd28c3a0 move nightlies to 1.1.0xxx
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19069

Differential Revision: D14854600

Pulled By: soumith

fbshipit-source-id: 85c703bddbd47c1b3914d58ab9521ed22ddeb62a
2019-04-09 10:33:29 -07:00
ba77eadbca add a utility function to check whether we are in the middle of ONNX export or not
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19050

Reviewed By: yinghai

Differential Revision: D14849878

Pulled By: houseroad

fbshipit-source-id: a0a4a57f5f9f315ba1334edfccc9284a8099d17f
2019-04-09 10:07:08 -07:00
75d6d8833d remove interned_string.h dep (#19061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19061

remove the deps on interned_string.h

Reviewed By: BIT-silence

Differential Revision: D14850078

fbshipit-source-id: 07e6ad72a7de369049ea56f32b72276fb4c59b32
2019-04-09 09:59:15 -07:00
b1bea0b733 add logging to make the saving action visible (#19042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19042

show the model saving step in the log.

Reviewed By: kennyhorror

Differential Revision: D14809385

fbshipit-source-id: c7a1e50ff92bb45b16b1c501d9325b304b07fbd3
2019-04-09 09:35:43 -07:00
89145e602b Namedtuple return for gels, triangular_solve, and test refactor (#17195)
Summary:
Partial fix of: https://github.com/pytorch/pytorch/issues/394
- `gels` and `triangular_solve` now returns namedtuple
- refactor test for namedtuple API for better coverage and maintainability
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17195

Differential Revision: D14851875

Pulled By: ezyang

fbshipit-source-id: 9b2cba95564269d2c3a15324ba48751d68ed623c
2019-04-09 09:13:26 -07:00
48a35135fb Convert all tabs to spaces, add CI. (#18959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156

Differential Revision: D14831246

Pulled By: ezyang

fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0
2019-04-09 08:12:26 -07:00
544783fa1d Fix BN tests for >= 8 GPU test environments (#19049)
Summary:
DDP does not support replicating BN layers within a process. Existing BN tests fail if the test environment has more than 8 GPUs. This is fixed by explicitly setting each process to use a single replica.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19049

Differential Revision: D14845286

Pulled By: mrshenli

fbshipit-source-id: 937dda5081d415ece48b21f2781b6b4e008dd42f
2019-04-09 08:08:05 -07:00
17adce1b69 do not use constexpr with CUDA >= 9.2 compiler on Windows. (#18986)
Summary:
Redefine `AT_CPP14_CONSTEXPR` from `constexpr` to empty on Windows with CUDA >= 9.2 as a workaround.

Discussed in #18425.

When using CUDA 10.1 on Windows, I faced following errors:
~~~
D:/data/source/pytorch\c10/util/ArrayRef.h(144): error: variable in constexpr function does not have automatic storage duration
          detected during instantiation of "const T &c10::ArrayRef<T>::front() const [with T=at::Tensor]"
D:/data/source/pytorch/aten/src\ATen/DeviceGuard.h(30): here
~~~

According to the documentation of CUDA Toolkit v10.1.105, the compiler supports `constexpr` and the relaxed C++14 requirements, but compilation still failed.

I suspect this is a compiler bug that requires this workaround.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18986

Differential Revision: D14821836

Pulled By: ezyang

fbshipit-source-id: 9800da2fe7291e7c09e8e5e882adebab08d83ae3
2019-04-09 08:03:13 -07:00
5f24f9a29b Add torch/lib/protobuf to gitignore, fixes #18700 (#19019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19019
ghimport-source-id: 84d36f8d27912d1d094d5672154b82187dd88761

Differential Revision: D14846615

Pulled By: ezyang

fbshipit-source-id: e402557ec321c85be3b28c8602b680246c8eecfe
2019-04-09 07:34:37 -07:00
72e171dc52 Automatic update of fbcode/onnx to 971311db58f2fa8306d15e1458b5fd47dbc8d11c (#19046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19046

Previous import was 079c2639f9bb79b1774d1e3bfa05b0c093816ca7

Included changes:
- **[971311db](https://github.com/onnx/onnx/commit/971311db)**: use ONNX_NAMESPACE::to_string instead of std::to_string (#1915) <Lu Fang>
- **[65227446](https://github.com/onnx/onnx/commit/65227446)**: Remove all the experimental ops (#1909) <Lu Fang>
- **[bdb28f29](https://github.com/onnx/onnx/commit/bdb28f29)**: opset converter backward compatibility support for opset versions 9 and 8 (#1847) <Peyman Manikashani>
- **[47692338](https://github.com/onnx/onnx/commit/47692338)**: Create CODEOWNERS for automatic reviewer assignment for PRs (#1910) <Prasanth Pulavarthi>
- **[8121c731](https://github.com/onnx/onnx/commit/8121c731)**: Revert "quantization support in onnx (#1872)" (#1911) <Lu Fang>
- **[4cfa5426](https://github.com/onnx/onnx/commit/4cfa5426)**: quantization support in onnx (#1872) <Ke Zhang>
- **[030bbb80](https://github.com/onnx/onnx/commit/030bbb80)**: Update LICENSE formatting and clarify # of WG chairs (#1907) <Prasanth Pulavarthi>

Reviewed By: yinghai

Differential Revision: D14843284

fbshipit-source-id: 96c1c79abb62beff227a9fc8b2af9382c4673755
2019-04-08 23:20:02 -07:00
3bfdffe487 Fix default CXX for Windows in cpp_extensions.py (#19052)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19017.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19052

Differential Revision: D14846702

Pulled By: soumith

fbshipit-source-id: b0e4dadaa749da0fa2d0405a1a064820d094220a
2019-04-08 23:14:22 -07:00
7db4c8ed76 fix the onnx ci
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19048

Reviewed By: yinghai

Differential Revision: D14844917

Pulled By: houseroad

fbshipit-source-id: 30719e05a443981284dedf34a9e51213271aa934
2019-04-08 23:07:31 -07:00
fd40c0eba0 Add gelu op (#18992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18992

Add gelu op

Reviewed By: houseroad

Differential Revision: D14814811

fbshipit-source-id: 00f126b8b83763c57ebbf28fbd2de5a8fab6d491
2019-04-08 21:58:29 -07:00
3ad710b837 Add MKL-DNN Tensor (#17748)
Summary:
This is a minimalist PR to add MKL-DNN tensor per discussion from Github issue: https://github.com/pytorch/pytorch/issues/16038

Ops with MKL-DNN tensor will be supported in following-up PRs to speed up imperative path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17748

Reviewed By: dzhulgakov

Differential Revision: D14614640

Pulled By: bddppq

fbshipit-source-id: c58de98e244b0c63ae11e10d752a8e8ed920c533
2019-04-08 21:41:38 -07:00
e0c593eae7 detect C++ ABI flag for cpp extensions from available runtime information (#18994)
Summary:
Previously, when a user built PyTorch from source but manually set the version string to look like a binary build, cpp extensions would incorrectly default to CXX11_ABI=0.

We have this information available at runtime with `torch._C._GLIBCXX_USE_CXX11_ABI`, so this PR improves the situation by simply using that information.
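The runtime-detection idea can be sketched as follows; the helper name is an assumption, and `torch._C._GLIBCXX_USE_CXX11_ABI` (mentioned above) is the value that would be passed in.

```python
def glibcxx_abi_flag(runtime_abi):
    """Build the compiler define for cpp extensions from the ABI flag the
    running libtorch reports, instead of guessing from the version string."""
    return "-D_GLIBCXX_USE_CXX11_ABI=%d" % (1 if runtime_abi else 0)
```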
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18994

Differential Revision: D14839393

Pulled By: soumith

fbshipit-source-id: ca92e0810b29ffe688be82326e02a64a5649a3ad
2019-04-08 17:50:03 -07:00
df05c7fbac Fix momentum setting in BatchNorm forward pass. (#18764)
Summary:
This is a fix for issue https://github.com/pytorch/pytorch/issues/18525. The issue is not limited to ONNX export and can manifest in other scenarios as well.
An existing test point in test/onnx/test_operators.py has been updated to cover this scenario as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18764

Reviewed By: zrphercule

Differential Revision: D14735166

Pulled By: houseroad

fbshipit-source-id: 5a737c648f64355929ff31eb12bd4869e744768d
2019-04-08 16:30:00 -07:00
cfb6054ada add android build workflow to pytorch CI jobs (#18919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18919
ghimport-source-id: 3f0ce4334c899d262403d88bd8bd7513e99570f0

Reviewed By: kostmo

Differential Revision: D14800728

Pulled By: ljk53

fbshipit-source-id: fec2e34c192181b8fa31c9a30f60c9bf7388f083
2019-04-08 16:25:30 -07:00
443a58e03d Export C10 operator in PyTorch Model (#18210)
Summary:
Almost there, feel free to review.

These c10 operators are exported to the `_caffe2` domain.

TODO:

- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210

Reviewed By: zrphercule

Differential Revision: D14600916

Pulled By: houseroad

fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
2019-04-08 16:06:00 -07:00
09c19e1068 Fix interpolate tracing (#19034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19034
ghimport-source-id: 874e0b0a8685184416152a77fc1850d9a06516ae

Differential Revision: D14837282

Pulled By: zdevito

fbshipit-source-id: b0ed82b607c288a54eecec3d6ed62c4626e5a563
2019-04-08 14:59:26 -07:00
930fb2f319 Fix default dtype in shape analysis (#18968)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/18823

Previously we were setting the dtype to float, whereas in TorchScript the default is double. When the fix for https://github.com/pytorch/pytorch/issues/17662 lands, we will have to reevaluate (and this test will fail).

We should still be consistent in shape_analysis in the meantime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18968

Differential Revision: D14837939

Pulled By: eellison

fbshipit-source-id: 32383b55c14bdc7753e26dec33c39ab10124c255
2019-04-08 14:50:28 -07:00
a7095b355e Renamed bool tensors into byte tensors (#19021)
Summary:
Renamed bool tensors to byte tensors to represent the correct type in generated code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19021

Differential Revision: D14835188

Pulled By: izdeby

fbshipit-source-id: 0252d2c69dab35ac2f076cf9a87423463e902c76
2019-04-08 13:53:40 -07:00
026a9c6bf2 Handle None indexing in TorchScript (#18615)
Summary:
t[None], t[None, 1:, None] and friends for unsqueezing

Fixes: #12810
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18615

Differential Revision: D14837039

Pulled By: wanchaol

fbshipit-source-id: ab3862c41629f087b0a46b7c59c93dac4018e6fe
2019-04-08 13:44:49 -07:00
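The `None`-indexing behavior added here follows NumPy's unsqueezing semantics; a quick NumPy illustration of what `t[None]` and friends produce (shown for the semantics only, not the TorchScript implementation):

```python
import numpy as np

t = np.zeros((3, 4))

# None inserts a new size-1 dimension at that position
assert t[None].shape == (1, 3, 4)

# Mixed with slices: new axis, slice of dim 0, new axis, then dim 1
assert t[None, 1:, None].shape == (1, 2, 1, 4)
```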
239de1623d Turn on mkldnn in most builds except rocm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18965

Differential Revision: D14836931

Pulled By: bddppq

fbshipit-source-id: 463a9bc5043a1f3194158f7bbfae3b71c6cd4b20
2019-04-08 13:19:14 -07:00
9b76f69cd3 Remove dead code in module.cpp (#19022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19022
ghimport-source-id: cdf694c1b426eb9f82d4c148c9f2c2cfc180cedd

Reviewed By: eellison

Differential Revision: D14833409

Pulled By: driazati

fbshipit-source-id: 8914c7227add7f3e07f56b21a513ba7727fb6800
2019-04-08 13:04:04 -07:00
943f712d7a Convert test_recursive_cse to use Filecheck inline annotations. (#19032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19032
ghimport-source-id: 58a146542deb08dd3057d099167ba530a5e51400

Differential Revision: D14836689

Pulled By: ZolotukhinM

fbshipit-source-id: e65ca5f09193eb7c16c204aedd50c474ea31210c
2019-04-08 12:27:20 -07:00
062b1321fe Add a document 'How to Write Tests Using FileCheck' (#19005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19005
ghimport-source-id: f9c3eff54adc8eef3ead2c77be62c44d88d22a00

Differential Revision: D14826845

Pulled By: ZolotukhinM

fbshipit-source-id: 62cc3657ee89acc979403da15e39bd4cd09a866d
2019-04-08 12:12:30 -07:00
e7b2669151 caffe2 - Expose tensor filler util to Python (#18886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18886

Expose tensor filler util to Python and add a unit test (both C++/Python)

Reviewed By: salexspb

Differential Revision: D14784470

fbshipit-source-id: bb8e013d1755c27c166e87d5a8491a97c65d3d8d
2019-04-08 11:54:10 -07:00
66a3277dfa call build_android.sh from pytorch CI build script (#18918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18918
ghimport-source-id: 98c63da263adbbc6ac74a69ac117740c852833cd

Reviewed By: dreiss

Differential Revision: D14800727

Pulled By: ljk53

fbshipit-source-id: 4d06f845bb34bcdb74b0602404f2a0782f8c8783
2019-04-08 11:03:54 -07:00
0565141728 Type annotations for util.data. (#18963)
Summary:
I haven't had a chance to rigorously try these out yet so don't merge yet.
Closes #18725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18963

Differential Revision: D14832897

Pulled By: ezyang

fbshipit-source-id: 4780e7a34126bc66ddbfd9d808dfc9e0edd77e68
2019-04-08 09:52:53 -07:00
a2ac260524 ifdef guard some explicit pragma unrolls (#19018)
Summary:
the ROCm compiler cannot and will not satisfy them, causing compile-time warnings.

The reason is a runtime loop trip count.

Some warnings remain arising from other parts of the ROCm stack - tickets are filed and they will be resolved within these components.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19018

Differential Revision: D14832859

Pulled By: ezyang

fbshipit-source-id: 0d66e4aebe4e56af14dd5e2967d3c374a82be25c
2019-04-08 09:47:23 -07:00
02968398d5 Fix a dev mode bug in activation distribution observer (#19004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19004

Handle the exception case when the data has min 3.40282e+38 and max -3.40282e+38 (the FLT_MAX sentinels, i.e., no values were observed)

Reviewed By: jspark1105

Differential Revision: D14822193

fbshipit-source-id: b9771d1584fdf8317f5b8c7f5806be5d27314386
2019-04-08 09:36:50 -07:00
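The min 3.40282e+38 / max -3.40282e+38 pair is the sentinel state of a min/max tracker that never saw any data. A minimal sketch of that edge case and one possible fallback (the function name and the zero-range fallback are illustrative assumptions, not the observer's actual code):

```python
def observe_min_max(values):
    # Sentinels: min starts at +inf, max at -inf (FLT_MAX in the C++ code)
    lo, hi = float("inf"), float("-inf")
    for v in values:
        lo, hi = min(lo, v), max(hi, v)
    if lo > hi:
        # No data observed: the sentinels are still inverted, so fall back
        # to a degenerate range instead of using the sentinels directly.
        return 0.0, 0.0
    return lo, hi

assert observe_min_max([]) == (0.0, 0.0)
assert observe_min_max([1.0, -2.0, 3.0]) == (-2.0, 3.0)
```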
2addcccbf1 Clean up some sparse code. (#18962)
Summary:
1) `sparse_dispatches` in native_parse was no longer used; got rid of it.
2) Got rid of the overloaded `sizes_` in SparseTensorImpl, which just uses the base implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18962

Differential Revision: D14811545

Pulled By: gchanan

fbshipit-source-id: 2fa60ef50456b5f605caa63beae1d8d2542fd527
2019-04-08 08:15:42 -07:00
65b9196741 Remove tensorWithAllocator() from Type (#18780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18780
ghimport-source-id: 7d18a11ce87d988bd32f6ebb96acd878ab8d61be

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18780 Remove tensorWithAllocator() from Type**
* #18779 Remove tensorFromBlob() from Type

Differential Revision: D14739336

fbshipit-source-id: 429ab10bb9f6ac9f97b5a11c7a836b6b6336cb2d
2019-04-08 00:00:39 -07:00
5241e6ec5c Fix sparse mm for ROCm (#18985)
Summary:
* Annotate also two pass reduction with launch bounds
* ifdef some shortcomings of ROCm w.r.t. short-circuit returns - internal tickets filed
* while there, plug memory leak by destroying matrix descriptor after the sparse call (applicable to cuSPARSE)
* while there, fix types for cusparseXcoo2csr as per cuSPARSE documentation
* enable test_dsmm in test_sparse which now passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18985

Differential Revision: D14822009

Pulled By: bddppq

fbshipit-source-id: 757267a47a63ee56ef396c33059f7eca099f4833
2019-04-07 18:16:16 -07:00
6c91610f0c Check if profiler is disabled in push/pop event (#18908)
Summary:
Make sure to check if profiler is disabled in push/pop and mark event
functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18908

Differential Revision: D14791931

Pulled By: ilia-cher

fbshipit-source-id: e4f5149e69999ee2b9238c21cccad6d27c6a714a
2019-04-07 15:06:04 -07:00
08ee4e5607 Implement Observer pass on simple model and validate stats (#18848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18848

The Observer module is based on the eager-mode compute-qparam implementation.
The goal is to validate the QParam result for eager mode and script mode on a
simple model.

Observer stats are collected and qparams computed only for activations at this point.

Reviewed By: zafartahirov

Differential Revision: D14720805

fbshipit-source-id: cb2f321b4b9927b37905fdb8eb55c5610d41b351
2019-04-07 09:17:14 -07:00
67fdb4abf7 AVX2 with GCC9 fix. (#18991)
Summary:
Dear All,

The proposed patch fixes the test code snippets used in the CMake infrastructure, which silently fail to set the ```CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS``` flag properly. The resulting libcaffe2.so has unresolved (```UND```) AVX2-related references, rendering it unusable.

* Using GCC 9 test code from cmake build infra always fails:
```
$ gcc  -O2 -g -pipe -Wall -m64 -mtune=generic -fopenmp -DCXX_HAS_AVX_1 -fPIE -o test.o -c test.c -mavx2
test.c: In function ‘main’:
test.c:11:26: error: incompatible type for argument 1 of ‘_mm256_extract_epi64’
   11 |     _mm256_extract_epi64(x, 0); // we rely on this in our AVX2 code
      |                          ^
      |                          |
      |                          __m256 {aka __vector(8) float}
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/immintrin.h:51,
                 from test.c:4:
/usr/lib/gcc/x86_64-redhat-linux/9/include/avxintrin.h:550:31: note: expected ‘__m256i’ {aka ‘__vector(4) long long int’} but argument is of type ‘__m256’ {aka ‘__vector(8) float’}
  550 | _mm256_extract_epi64 (__m256i __X, const int __N)
      |

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 9.0.1 20190328 (Red Hat 9.0.1-0.12) (GCC)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18991

Differential Revision: D14821838

Pulled By: ezyang

fbshipit-source-id: 7eb3a854a1a831f6fda8ed7ad089746230b529d7
2019-04-07 08:27:00 -07:00
f6af76ead7 Remove tensorFromBlob() from Type (#18779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18779
ghimport-source-id: e7453b74fcce0e4f4a9cbce0324992a85272a426

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18780 Remove tensorWithAllocator() from Type
* **#18779 Remove tensorFromBlob() from Type**

Differential Revision: D14739335

fbshipit-source-id: 8a0619a5b412332efa3b2d60c1edebd53d089d50
2019-04-07 01:37:43 -07:00
9b69f21a95 Improve precision of emitted code for prim::Constant (#18817)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/18815 and https://github.com/pytorch/pytorch/pull/18811.

This makes it so that we emit a higher-precision literal for float values in the fusion kernel, as well as assign that to a `double` variable. This prevents us from losing precision for values such as `pi`, but with the previous fixes this will also get downcasted to `float` if downstream operations require it. Therefore, we should not lose performance because of implicit promotions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18817

Differential Revision: D14820842

Pulled By: jamesr66a

fbshipit-source-id: 519671c6ca5e7adac746a4c4c72760a6d91e332f
2019-04-07 00:18:24 -07:00
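The precision loss the commit avoids is easy to see in plain Python/NumPy: a value like pi emitted as a single-precision literal no longer equals the double value (illustration only, not the fuser's generated code):

```python
import math
import numpy as np

# Rounding pi to float32 changes the value, so a fused kernel that emits
# a low-precision literal (or assigns it to a float) loses precision.
assert np.float32(math.pi) != math.pi

# Keeping the literal in double preserves it exactly.
assert np.float64(math.pi) == math.pi
```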
79533ef097 convert_sync_batch_norm to SyncBatchNorm (#18787)
Summary:
Closes #18382

Please let me know if any changes are required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18787

Differential Revision: D14821147

Pulled By: soumith

fbshipit-source-id: edd98eab1b3f4151c4ae5148146435ddb2ae678d
2019-04-07 00:13:02 -07:00
907b4c5890 fix bug when falling back to acc32 when weight is prepacked (#18974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18974

When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.

Reviewed By: bddppq

Differential Revision: D14814067

fbshipit-source-id: aec917322de695e283f0aca1e930c5603d196404
2019-04-06 21:53:08 -07:00
dbd9971dd2 move 2ops back to autodiff (#18969)
Summary:
Move these 2 ops back to autodiff to unblock the XLA CI.
I will leave them for my next PR to clean up symbolic_variable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18969

Differential Revision: D14816811

Pulled By: ailzhang

fbshipit-source-id: dd8a7e133dcad29560d3d1d25691883960117299
2019-04-06 21:41:25 -07:00
1ffa358fca Preserve naming for inputs/outputs with observer insertion (#18713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18713

  - The quantizer observer node's output is hooked up to the following node,
which mutates the input/output naming. This is not desired or required
because the observer op can be a sync node

  - The quantizer is aimed at quantizing tensors, so we should insert the observer
op only for Values of tensor type

Reviewed By: zafartahirov

Differential Revision: D14715916

fbshipit-source-id: feca04c65a43103b46084d3548998498b19ee599
2019-04-06 21:01:21 -07:00
34382e428f Emit math functions specific to output type (#18815)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/18811

This makes it so that we only emit the *f variants of math functions if the output value's type is FloatTensor, otherwise we call the double variants to prevent loss of precision. This fixes more numerical issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18815

Differential Revision: D14816965

Pulled By: jamesr66a

fbshipit-source-id: 464be644168875ede987142281fb2168f4041e81
2019-04-06 17:56:05 -07:00
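The float-vs-double variant distinction (e.g. `sinf` vs `sin` in C) has an observable effect; a NumPy sketch of the precision gap between the two (illustrative, not the fuser's generated code):

```python
import numpy as np

x = 1.0

# Double-precision sin agrees with the default Python-float computation...
assert np.sin(np.float64(x)) == np.sin(x)

# ...while the single-precision variant (analogous to C's sinf) differs,
# which is the loss of precision this commit avoids for non-float outputs.
assert abs(float(np.sin(np.float32(x))) - float(np.sin(x))) > 0.0
```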
8961ad8c5b add instructions for NVIDIA Jetson platforms (#18990)
Summary:
Thanks to dusty-nv, we now have Stable and Weekly wheels provided for the NVIDIA Jetson platform. They require JetPack 4.2.

He's also maintaining source build instructions.

This PR adds links to the binaries and source build instructions to the README.

The links are dynamic, so when new stable / weekly wheels are available, Dustin will update the same URL to point to the new files
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18990

Differential Revision: D14820158

Pulled By: soumith

fbshipit-source-id: 761a56557decb72ad9c1b9f8a2745667f558eec3
2019-04-06 12:42:43 -07:00
bcd527190a Quantizer pass to insert quant-dequant nodes into IR (#18446)
Summary:
- Quantizer pass to mutate IR by inserting quant-dequant nodes
before and after nodes which support quantized ops. This information
will be used by jit compiler to substitute with quantized ops

- This currently covers simple model. It will be expanded later
for subgraph pattern matching to cover more complex patterns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18446

Differential Revision: D14592265

Pulled By: nishantpdce

fbshipit-source-id: c9ba6c12aa96cb9c117826e386721eec83a55ea6
2019-04-06 12:39:26 -07:00
7b5b1486c9 add SyncBatchNorm to docs (#18988)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18983
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18988

Differential Revision: D14820042

Pulled By: soumith

fbshipit-source-id: 356169f554a42303b266d700d3379a5288f9671d
2019-04-06 11:43:20 -07:00
d6d0fcc92b Add c10_cuda to libraries in CUDAExtension for Windows (#18982)
Summary:
This change was necessary for me to compile [apex](https://github.com/NVIDIA/apex) on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18982

Differential Revision: D14819818

Pulled By: soumith

fbshipit-source-id: 37ff9b93a72ab2b7c87f23a61e9f776c71c4c1a8
2019-04-06 10:30:51 -07:00
1497d45315 Remove Trainer from README.md (#18980)
Summary:
Trainer was removed a long time ago
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18980

Differential Revision: D14819855

Pulled By: ezyang

fbshipit-source-id: f62020e688ebf6663416aec7435bf1f531607941
2019-04-06 09:12:50 -07:00
13f03a42d2 Create Object that represents a Module (#18469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18469
ghimport-source-id: 73cb8b58f43f10b1dcfca805fd5b25c4fa977632

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18469 Create Object that represents a Module**
* #18468 slots with explicit value/setValue make more sense in future patches
* #18467 Make Object hold its ClassType
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

This changes the underlying storage for script::Module to hold
a ivalue::Object which has slots for all the parameters and attributes.

NamedIValue and Slot are now merged together into one class Slot that stores
the tuple (ivalue::Object, offset) and can be used to read the name, type,
or value of the slot and also to set the value. This cleans up a bunch
of client uses.

This PR does not actually use the module object in any generated code.
A future PR will switch how code is generated to treat modules as
first class.

Differential Revision: D14613508

fbshipit-source-id: d853a7559f58d244de2ef54a781427fcd1060ed0
2019-04-05 18:58:52 -07:00
8c9caf185b Add numpy like repeat as torch.repeat_interleave (#18395)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/14093
cc: SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18395

Differential Revision: D14599509

Pulled By: umanwizard

fbshipit-source-id: 2391a1cc135fe5bab38475f1c8ed87c4a96222f3
2019-04-05 18:16:25 -07:00
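`torch.repeat_interleave` mirrors `numpy.repeat`, which shows the intended semantics (NumPy used here purely for illustration):

```python
import numpy as np

x = np.array([1, 2, 3])

# Scalar repeat count: each element repeated in place
assert np.repeat(x, 2).tolist() == [1, 1, 2, 2, 3, 3]

# Per-element repeat counts
assert np.repeat(x, [1, 2, 3]).tolist() == [1, 2, 2, 3, 3, 3]
```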
e6bbbb017e Fix interpolate trace (#18875)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10654

The issue is that in tracing `.size` returns an int tensor, and when an int tensor is multiplied by a scalar the int type dominates and the scalar gets cast to 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18875

Differential Revision: D14814441

Pulled By: eellison

fbshipit-source-id: a4e96a2698f2fcbf3ec4b2bb4c43a30250f30ad9
2019-04-05 17:55:23 -07:00
6084908287 Code string API for fuser testing (#18884)
Summary:
This adds a C++ function `debugGetFusedKernelCode` as well as a Python binding `_jit_fuser_get_fused_kernel_code` that will, given a FusionGroup graph and a set of specified inputs, return the compiled kernel source code. We can then check the contents of this source code for verification of the fuser codegen backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18884

Differential Revision: D14795508

Pulled By: jamesr66a

fbshipit-source-id: 8f6e9dd13ebbb517737d893b0b5f5e9aa06af124
2019-04-05 17:13:17 -07:00
ce67775f08 remove unused func (#18712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18712
ghimport-source-id: e435150a501b20695a5276addee93d795e04b532

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18712 [jit][easy] remove unused func**
* #18711 [jit] fix side-effects and aliasing for custom ops

as title

Differential Revision: D14730979

fbshipit-source-id: 381d16ea2a45779bf6d5fc6d90a4f8585461e902
2019-04-05 15:19:28 -07:00
46fe266507 Revert D14778810: [caffe2/int8] fix bug when falling back to acc32 when weight is prepacked
Differential Revision:
D14778810

Original commit changeset: d49a8c4b7c81

fbshipit-source-id: 15568b084848de74437582548bec42aadc74080d
2019-04-05 14:01:33 -07:00
f6f34b3f4c slots with explicit value/setValue make more sense in future patches (#18468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18468
ghimport-source-id: d4b41c521f2269a695e03c8e7d05d5542731ee48

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18469 Create Object that represents a Module
* **#18468 slots with explicit value/setValue make more sense in future patches**
* #18467 Make Object hold its ClassType
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

Reviewed By: suo

Differential Revision: D14613509

fbshipit-source-id: 9f2208d0efd01465c78cebdc3e8365a9e0adf9ff
2019-04-05 13:41:02 -07:00
091acb0978 Make Object hold its ClassType (#18467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18467
ghimport-source-id: d51bdd64d2529d08c634c58df1a0870b54ad49fb

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18469 Create Object that represents a Module
* #18468 slots with explicit value/setValue make more sense in future patches
* **#18467 Make Object hold its ClassType**
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

Currently it holds a symbol whose unqualified name is the name of the
class. This will get confusing when there are multiple possible registries,
and it makes getting the class type from the object difficult.
The pointer to the class is only 4 more bytes so this patch just puts
it in the object.

Reviewed By: suo

Differential Revision: D14613510

fbshipit-source-id: b35175ba4be83d2522deaa6dad5070d6ec691fed
2019-04-05 13:40:59 -07:00
53458c97dd Enforce single parent for script submodules (#18379) (#18860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18860
ghimport-source-id: 96305349bf3db564f43df2263b1e5bddcc9e9dae

Reviewed By: suo

Differential Revision: D14780421

Pulled By: zdevito

fbshipit-source-id: 2bdd89b35866ba035ebea0adab037e441c1006e2
2019-04-05 13:40:56 -07:00
f9a56d4af2 CUDA_NVCC_EXECUTABLE is not needed, as nvcc is in PATH (#18958)
Summary:
As indicated by f0k: https://github.com/pytorch/pytorch/pull/18495#issuecomment-480178763
nvcc via ccache is already first in the PATH in the instructions I provided, so CUDA_NVCC_EXECUTABLE is not needed.

I re-built to test that it's so.

Thank you!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18958

Differential Revision: D14810732

Pulled By: ezyang

fbshipit-source-id: 3758ae2253c745c5d7cfccedd49fa00cc4629965
2019-04-05 13:07:05 -07:00
8e1e29124d Fix precision issue with expansion that prefers 'probs' over 'logits' (#18614)
Summary:
I have experienced that sometimes both were in `__dict__`, but it chose to copy `probs`, which loses precision compared to `logits`. This is especially important when training (Bayesian) neural networks or doing other types of optimization, since the loss is heavily affected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18614

Differential Revision: D14793486

Pulled By: ezyang

fbshipit-source-id: d4ff5e34fbb4021ea9de9f58af09a7de00d80a63
2019-04-05 13:07:01 -07:00
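Why copying `probs` instead of `logits` loses precision: for large logits the probability rounds to exactly 1.0 in float64, and the logit can no longer be recovered. A pure-Python sketch of the effect (not the distributions code itself):

```python
import math

logit = 40.0

# sigmoid(40) = 1 / (1 + e^-40); e^-40 ~ 4e-18 is below double epsilon,
# so the stored probability rounds to exactly 1.0.
prob = 1.0 / (1.0 + math.exp(-logit))
assert prob == 1.0

# Recovering the logit would need log(prob / (1 - prob)) -- a division
# by zero here. The information is gone once only 'probs' is kept,
# which is why expansion should prefer 'logits'.
assert 1.0 - prob == 0.0
```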
b90cbb841d Method is supposed to be in-place (#18684)
Summary:
Tracing models that attempt to return this in-place value doesn't turn out well.

I haven't run any tests to confirm the results to be honest, but regardless of the outcome, the operation happens in-place, so it should work as before.

Sample output from traced model attempting to set `max_norm` on `Embedding`:
```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame #40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame #41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame #42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame #43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame #44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame #45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame #46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame #47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame #48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self,
    input_1: Tensor) -> Tensor:
  _0 = torch.norm(self.item_embedding.weight, 2, 1, True)
  _1 = torch.div(self.item_embedding.weight, _0)
  m_weight = torch.t(_1)
  input_2 = torch.contiguous(input_1)
  weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
             ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  x = torch.embedding(weight_1, input_2, -1, False, False)
  input_3 = torch.div(x, torch.norm(x, 2, 2, True))
  max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
  hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
  _2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
  input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
  input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
  tastevec = torch.div(input, torch.norm(input, 2, 2, True))
  outputs = torch.matmul(tastevec, m_weight)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18684

Differential Revision: D14782041

Pulled By: ezyang

fbshipit-source-id: 7b2fc19b7d5b6600263644498bb728319a19f39d
2019-04-05 13:00:29 -07:00
28990f34d9 fix bug when falling back to acc32 when weight is prepacked (#18881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18881

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18878

When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.

TODO: add unit tests with better coverage

Reviewed By: feiyu1990

Differential Revision: D14778810

fbshipit-source-id: d49a8c4b7c815ab29b77feb53ee730ad63780488
2019-04-05 13:00:26 -07:00
c1790fa202 More numerically stable lerp (#18871)
Summary:
The C++ and CUDA implementations of the lerp are not numerically stable. This is discussed on Wikipedia [here](https://en.wikipedia.org/wiki/Linear_interpolation#Programming_language_support). I checked the GPU SASS output and there's no overhead from using the more precise implementation, from Kepler all the way to Turing. I haven't looked at CPU ASM though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18871

Differential Revision: D14793438

Pulled By: ezyang

fbshipit-source-id: 2ddc2e026c5285466cae7d1b4101174253100445
2019-04-05 12:51:20 -07:00
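The instability in question is the classic lerp pitfall described on the linked Wikipedia page: `a + (b - a) * t` can miss the endpoint at `t == 1` because `b - a` rounds, while `(1 - t) * a + t * b` is exact at both endpoints. A pure-Python sketch (the C++/CUDA kernels operate on tensors, but the scalar arithmetic is the same):

```python
def lerp_naive(a, b, t):
    # Cheaper form, but b - a can round, so the result at t == 1
    # may not equal b.
    return a + (b - a) * t

def lerp_stable(a, b, t):
    # Exact at t == 0 and t == 1.
    return (1.0 - t) * a + t * b

# Here b - a rounds to -1e16, so the naive form lands on 0.0, not 1.0.
assert lerp_naive(1e16, 1.0, 1.0) != 1.0
assert lerp_stable(1e16, 1.0, 1.0) == 1.0
```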
edc7b4726b Increase default c10d/ProcessGroupGloo test timeout (#18916)
Summary:
See #18659.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18916

Differential Revision: D14808749

Pulled By: pietern

fbshipit-source-id: 9a9c8beddb2dbbb1bf4c5e575743d9e1fa3f07fa
2019-04-05 12:16:30 -07:00
cb3a4a3d28 remove symbolic variable part 1 (#17986)
Summary:
As discussed with gchanan we should deduplicate symbolic_variable and symbolic_script to prepare for the future merge with derivatives.yaml.

This PR moves most easy formulas to symbolic_script.

TODO: run benchmarks to make sure no perf regression

cc: apaszke zdevito wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17986

Differential Revision: D14766412

Pulled By: ailzhang

fbshipit-source-id: d95a3f876e256c0f505779a71587c985571d3b8f
2019-04-05 12:06:47 -07:00
f3dbcfdfb5 Revert D14742020: Wrap workaround for cpp custom types a bit prettier and add an example
Differential Revision:
D14742020

Original commit changeset: 0f2fd83ae56a

fbshipit-source-id: 5640255aef0319b7d8996e07132e87213130d31c
2019-04-05 12:02:12 -07:00
c65eeeb075 Decompose more Windows scripts (#18917)
Summary:
This PR:

* pulls four distinct installation steps out of `build_pytorch.bat` and into their own scripts.
* eliminates the copy step for helper scripts called by `win-build.sh` and `win-test.sh`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18917

Differential Revision: D14807236

Pulled By: kostmo

fbshipit-source-id: 03e91a5834dfd6d68903ad9725eacc099bbf6d53
2019-04-05 11:31:52 -07:00
ef779b3397 Wrap workaround for cpp custom types a bit prettier and add an example (#18791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18791

As a temporary demonstration on how to extend this hack further until custom C types are ready.

Reviewed By: jamesr66a

Differential Revision: D14742020

fbshipit-source-id: 0f2fd83ae56ab2abe16977a1829ed421e6abe74b
2019-04-05 11:20:13 -07:00
c3a559deb7 Remove cuda::compat functions in aten (#18905)
Summary:
Looks like the issue with using `std::` functions is fixed in the new ROCm version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18905

Differential Revision: D14792943

Pulled By: bddppq

fbshipit-source-id: af11acbb85872943f23b6e55415db1f0699e7b8f
2019-04-05 11:15:16 -07:00
fefa6d305e fix side-effects and aliasing for custom ops (#18711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18711
ghimport-source-id: c9caedc0660b2b7ba3730cd0e1a2e0e9c3cf422b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18711 [jit] fix side-effects and aliasing for custom ops**

Previously we didn't track aliasing, mutation, or side effects for
custom ops. This PR adds in guards with the most conservative
assumptions possible: the op will
1) have side effects,
2) write to everything
3) produce a wildcard.

In order to tell whether a given operator is a custom op, this PR introduces
the concept of a "reserved" namespace (basically all our builtin namespaces).
Custom ops live in non-reserved namespaces, so a check on the namespace
is sufficient to tell whether a schema/node is "custom" or not.

This is just to get things correct for now. Follow-ups to this:
- Users should be able to specify aliasing/mutability without having to learn
the whole alias annotation schema.
- Relax assumptions a bit. In particular outputs can only alias input tensors,
they don't have to be wildcards.

Fixes #18490

Differential Revision: D14730978

fbshipit-source-id: 540b47a24ccf24145051609bdcc99c97e46e0fe0
2019-04-05 10:48:14 -07:00
abc758ed40 Expand the list of ops that mutate an inputs shape (#18812)
Summary:
Expand the list of ops that resize an input in-place to include broadcasting ops and other ops that affect shape. To the reviewer: could you please look through PyTorch's in-place ops and check whether I missed any.

Expanding the PR from: https://github.com/pytorch/pytorch/pull/17518

This is already being tested in test_resize_input_ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18812

Differential Revision: D14793410

Pulled By: eellison

fbshipit-source-id: 125f4f5375ac1036fb96fabc9da2aaccc9adc778
2019-04-05 10:43:34 -07:00
e45e3634d6 add launch bounds, enable more tests (#18909)
Summary:
Add launch-bounds annotations for ROCm arising from maxThreadsPerBlock and apply-threads use.

Enable tests that now work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909

Differential Revision: D14801490

Pulled By: ezyang

fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7
2019-04-05 10:17:15 -07:00
1d263ed92a Add backward pass to infer single missing input shape for Concat opportunitiscally (#18911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18911

Att.

Reviewed By: bddppq

Differential Revision: D14791295

fbshipit-source-id: 4b7a775924f0eadb0cb73aa6c434a6a5be8b92be
2019-04-05 10:11:58 -07:00
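The backward inference reduces to simple arithmetic along the concat axis: the one unknown input size is the output size minus the sum of the known ones. A hypothetical sketch (the function name and shapes are illustrative, not the Caffe2 pass):

```python
def infer_missing_concat_dim(output_dim, known_dims):
    """Given Concat's output size along the concat axis and the sizes of
    all inputs except one, infer the missing input's size."""
    missing = output_dim - sum(known_dims)
    if missing < 0:
        raise ValueError("known input sizes exceed the output size")
    return missing

# Output of size 10 along the axis, two known inputs of sizes 3 and 4:
assert infer_missing_concat_dim(10, [3, 4]) == 3
```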
0c5d444b28 change to use clang if NDK >= 18 (#18914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18914
ghimport-source-id: 4d9d9322ee5559d96e13533ec37ff3be86a0227c

Reviewed By: ezyang

Differential Revision: D14794162

Pulled By: ljk53

fbshipit-source-id: caac55e12b1e62bf6ebcc6e2062d5ed122ad4e64
2019-04-05 10:02:03 -07:00
5e8a9e8802 Revert D14673459: [pytorch][PR] [jit] Replace Slot on script::Method with NamedIValue
Differential Revision:
D14673459

Original commit changeset: 21200180c47f

fbshipit-source-id: 9c01de4cf5bb7c87ac0c55705b901db990cd917b
2019-04-05 09:57:13 -07:00
8793e8db42 Disable flaky test_proper_exit test. (#18950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18950
ghimport-source-id: 27bd575fd3c73a51ace1360aa020fa63a792a5d2

Differential Revision: D14802009

Pulled By: ezyang

fbshipit-source-id: 051e1d038892c2c6e8337357fa80771b8dc42680
2019-04-05 09:49:54 -07:00
865ed7682d Checkout pytorch_sphinx_theme with https. (#18859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18859
ghimport-source-id: fbbcb8a2dd9c9f0a317de489b6bbb83e9071a7d8

Differential Revision: D14801989

Pulled By: ezyang

fbshipit-source-id: a9bc02e1383adafcac01994e6346b28551d95c71
2019-04-05 09:35:49 -07:00
ce92cf9bd1 Add tests for reducer class (#18845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18845

This adds a few CPU only test cases for the reducer class.

Reviewed By: mrshenli

Differential Revision: D14768432

fbshipit-source-id: c008a52206826304e634a95bc14167ed94c97662
2019-04-05 09:07:29 -07:00
79ac2120ba Fix a few instances of notifying on a CV while holding the lock (#18857)
Summary:
Fix a few instances of notifying on a condition variable while holding the lock, so that the lock is released before notifying.  This avoids an extra thread suspension when the notified thread tries to grab the lock.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18857

Differential Revision: D14779132

Pulled By: resistor

fbshipit-source-id: b18a05c4c15be1426ebfdffac1c8f002b771cfd7
2019-04-05 08:41:53 -07:00
0829ef00dd Unify caffe2 and libtorch build scripts on Windows (#18683)
Summary:
`scripts/build_windows.bat` is the original way to build caffe2 on Windows, but since caffe2 is merged into libtorch, the build scripts should be unified; they do essentially the same thing apart from a few different flags.

The follow-up is to add the tests. Looks like the CI job for caffe2 windows is defined [here](https://github.com/pytorch/ossci-job-dsl/blob/master/src/jobs/caffe2.groovy#L906). Could we make them a separate file, just like what we've done in `.jenkins/pytorch/win-build.sh`? There's a bunch of things we can do there, like using ninja and sccache to accelerate build.

cc orionr yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18683

Differential Revision: D14730188

Pulled By: ezyang

fbshipit-source-id: ea287d7f213d66c49faac307250c31f9abeb0ebe
2019-04-05 07:47:32 -07:00
84068f43f2 Simplify storage wrapping in TH. (#18855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18855
ghimport-source-id: 01faa229fa4db901ab8539d3778b716d909ba4cf

Reviewed By: dzhulgakov

Differential Revision: D14790669

Pulled By: gchanan

fbshipit-source-id: 167b9bc9c9872743fa8f6040a26ddf7ff5789c27
2019-04-05 07:21:42 -07:00
043e363c6c Cache device on TensorImpl; clean up TensorImpl constructors. (#18833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18833
ghimport-source-id: 6f2be25fcc5e6be3ffe20582e604bd2c1fbab66b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.**
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.

1) We cache device on TensorImpl.  This means we can access the device without a virtual function and allows us to more easily extend TensorImpls (because they don't need to figure out how to store the Device for themselves).

2) Clean up TensorImpl APIs.  We had a constructor that took a TensorTypeId and an allocator and would allocate a Storage based on the recognized types of TensorTypeIds.  Instead, we just have two different constructors: one for types with a storage, one without.

Reviewed By: dzhulgakov

Differential Revision: D14766230

fbshipit-source-id: 745b8db84dcd6cb58f1a8675ad3ff8d033bc50df
2019-04-05 07:21:39 -07:00
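The device caching in (1) can be pictured with a minimal Python sketch (the class and names here are hypothetical illustrations, not the actual C++ API): the device is stored on the impl at construction time, so reading it is a plain field access rather than a virtual call into the storage.

```python
class TensorImplSketch:
    """Hypothetical sketch of caching the device on the impl itself."""

    def __init__(self, storage_device):
        # Cached once at construction; never recomputed via the storage.
        self._device = storage_device

    @property
    def device(self):
        return self._device  # non-virtual, constant-time access

impl = TensorImplSketch("cuda:0")
print(impl.device)  # cuda:0
```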
b7c830b916 Revert "Adding pin_memory kwarg to zeros, ones, empty,... (#18854)
Summary:
This reverts commit c484cf43a02863efd2f4a76aad43246fb0191ab5.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18854

Differential Revision: D14778393

Pulled By: VitalyFedyunin

fbshipit-source-id: 4b5a1f5b1c091bbc4a8e75614734cc011d26b452
2019-04-05 06:25:33 -07:00
ab4133397c Silence compiler warnings (#18912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18912

We intentionally test a deprecated API, no need to show the warnings here.

Reviewed By: dzhulgakov

Differential Revision: D14792617

fbshipit-source-id: 9ea2a4106d566064283726eed2c274b98f49a2e5
2019-04-05 01:52:00 -07:00
c34e5ff952 ScriptModuleOp in caffe2 (#18716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18716

Might be useful as an intermediate stage for some systems that currently use Caffe2 nets as an execution mechanism.

Not sure it's a good idea altogether, please comment.

Limitations:
- only Tensor types as inputs/outputs
- the entire module is serialized as a zip archive inside a proto in the Caffe2 db; it'd be subject to the 4 GB limit and is likely very slow. For small models it'd work, though.
- no autograd, though it can be attached in principle
- no way to retrieve parameters inside the script module from the C2 runtime's perspective (though they could potentially be alias-fetched and stored as individual blobs)
- after deserialization, the python wrappers returned don't have the correct type (as we don't do the module_lookup trick)

Build-wise, I had to add a dependency from pybind_state to libtorch.so. I don't think we build the Caffe2 python frontend independently anymore, so it should be fine.

Reviewed By: amirshim, houseroad

Differential Revision: D14339599

fbshipit-source-id: 88a37a8abd1f1c4703e5ef937031f222535d4080
2019-04-05 01:07:43 -07:00
8bdd0c3a85 flake8 fix on extracted python script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18931

Differential Revision: D14796114

Pulled By: kostmo

fbshipit-source-id: 25971be5a36fffc61e29db981af7298a0fe0ed8c
2019-04-05 00:54:23 -07:00
8f5e478aa2 Replace Slot on script::Method with NamedIValue (#18252)
Summary:
This refactor lets us track the types of initial values added onto a `Method`. The main motivation for this is the change in `module.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18252

Differential Revision: D14673459

Pulled By: driazati

fbshipit-source-id: 21200180c47f25bb70898771adfb569856e6c34a
2019-04-04 23:35:56 -07:00
90b8552c98 U/kostmo/windows offload scripts 3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18754

Differential Revision: D14794893

Pulled By: kostmo

fbshipit-source-id: 05187d9b53615ffbcc7253accdc692c4ecaf25d9
2019-04-04 21:08:05 -07:00
4f5e72600e fix lint in optim doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18883

Differential Revision: D14793365

Pulled By: ezyang

fbshipit-source-id: c1b46c98e3319badec3e0e772d0ddea24cbf9c89
2019-04-04 19:08:13 -07:00
8a466d147c Fixed the comment to reference gist example instead of private repo (#18852)
Summary:
Replace link to a file in a private repo with a gist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18852

Reviewed By: ezyang

Differential Revision: D14778481

Pulled By: izdeby

fbshipit-source-id: 8389aa4bf115ddcfd85079cc2c861404efa678e7
2019-04-04 18:26:24 -07:00
b11a8c6aef return missing keys from load_state_dict (#18668)
Summary:
Return missing_keys and unexpected_keys from load_state_dict so the user can handle them when strict mode is off; also removed an unused variable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18668

Differential Revision: D14782073

Pulled By: ezyang

fbshipit-source-id: ab3b855eb77bb7422594d971988067e86eef20f2
2019-04-04 18:11:56 -07:00
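The returned missing_keys/unexpected_keys amount to a set comparison between the model's state dict and the checkpoint's; a minimal pure-Python sketch of that contract (helper name hypothetical):

```python
def diff_state_dicts(model_state, loaded_state):
    """Return (missing_keys, unexpected_keys) as a non-strict load would.

    missing_keys: present in the model but absent from the checkpoint.
    unexpected_keys: present in the checkpoint but unknown to the model.
    """
    missing = [k for k in model_state if k not in loaded_state]
    unexpected = [k for k in loaded_state if k not in model_state]
    return missing, unexpected

model = {"conv.weight": 1, "conv.bias": 2, "fc.weight": 3}
ckpt = {"conv.weight": 10, "fc.weight": 30, "extra.bias": 99}
missing, unexpected = diff_state_dicts(model, ckpt)
print(missing)      # ['conv.bias']
print(unexpected)   # ['extra.bias']
```

With strict mode off, the caller can then decide per-key what to do with each list instead of getting a hard error.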
814c1df29a Fix caffe2 miopen conv transpose gradient op for case of no dX gradient
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18809

Reviewed By: ezyang

Differential Revision: D14759762

Pulled By: bddppq

fbshipit-source-id: ff795b7e58c82f67a1d7284b5ab06b0e0e5fd3ae
2019-04-04 17:29:30 -07:00
d35c39e73b don't attempt to multiply by a sparse matrix (#18737)
Summary:
Tested by running the script in #16562, and there was no error.

Then:
```
>>> print(mat.grad)
tensor([[1., 2., 3.],
        [1., 2., 3.],
        [1., 2., 3.]])
```

which is correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18737

Differential Revision: D14773078

Pulled By: umanwizard

fbshipit-source-id: 8aa36eb6f6aa104263a467d9ac91d61b3bfd05f5
2019-04-04 17:24:53 -07:00
07efee395c add Fast-RNN to AI-PEP
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18885

Reviewed By: hl475

Differential Revision: D14728854

fbshipit-source-id: 7e7a2946929551963f7c938e3d82a260a9efdfbd
2019-04-04 17:04:21 -07:00
7a19d3c9e1 Allow override of backend in dist.new_group() (#18595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18595

There is no need to force the backend to be the same as the global
process group, as long as the backend is "nccl" or "gloo".

Reviewed By: mrshenli

Differential Revision: D14657204

fbshipit-source-id: 868817b9f219e3be8db0761a487f0027ed46663b
2019-04-04 14:23:03 -07:00
1ec1db477d ONNX Export All Cases of Softmax
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18482

Reviewed By: zrphercule

Differential Revision: D14630697

Pulled By: houseroad

fbshipit-source-id: c06f1e3bead10a265c5f4ac3723d49f4caf46801
2019-04-04 13:24:04 -07:00
b4d2df1fee Added bool and half support for resize_as_ and view methods (#18821)
Summary:
Enabled **resize_as_** and **view** methods for bool and half tensors.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18821

Reviewed By: ezyang

Differential Revision: D14762852

Pulled By: izdeby

fbshipit-source-id: 4312079fb4e893fea6f71ff4f163094b2674f1e8
2019-04-04 13:09:10 -07:00
bb16e8dacb Automatic update of fbcode/onnx to 079c2639f9bb79b1774d1e3bfa05b0c093816ca7 (#18841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18841

Previous import was f0d7df2c643c4e37f1fd7735ef02c972c4d19fb5

Included changes:
- **[079c2639](https://github.com/onnx/onnx/commit/079c2639)**: update the squeeze and unsqueeze doc (#1905) <Lu Fang>
- **[a8b45d62](https://github.com/onnx/onnx/commit/a8b45d62)**: fix the ir_version onnx-operators.proto (#1903) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D14767158

fbshipit-source-id: 2d772fece45e25d72bf1d10fad156189397f3f86
2019-04-04 13:01:37 -07:00
33f4751fb8 Actually model scalar type promotion in shape analysis (#18811)
Summary:
This was causing some numerical issues in the fuser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18811

Differential Revision: D14767390

Pulled By: jamesr66a

fbshipit-source-id: f1123d1aab5501abad850d2edc996f8aa8dafe04
2019-04-04 12:56:40 -07:00
d108a1abb7 Add a .ctags.d/ toplevel directory (#18827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18827
ghimport-source-id: 38f857bc29b2c2c6a71069d00c4c69ed0bef1574

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18827 Add a .ctags.d/ toplevel directory**

Exclude build artifacts by default.

Reviewed By: ezyang

Differential Revision: D14765721

fbshipit-source-id: a785dbb2ef1df96af8e23cc65c8db2a6b67b4fce
2019-04-04 12:51:05 -07:00
8ca9ba17da Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18802

Differential Revision: D14781874

Pulled By: ezyang

fbshipit-source-id: 0f94c40bd84c84558ea3329117580f6c749c019f
2019-04-04 12:46:39 -07:00
b145dcca04 Add support for group ConvTranspose (#18794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18794

Add support for group ConvTranspose

Reviewed By: houseroad

Differential Revision: D14741327

fbshipit-source-id: 5d947ca044bf8495dd7f8f56122441ebbcc6c7e4
2019-04-04 11:52:06 -07:00
8732a1b42e Disallow changing the device of a tensor via set_. (#18832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18832
ghimport-source-id: fde4ad90541ba52dfa02bdd83466f17e6541e535

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* **#18832 [STACK] Disallow changing the device of a tensor via set_.**
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.

This is necessary to cache the device on a TensorImpl.

Differential Revision: D14766231

fbshipit-source-id: bba61634b2d6252ac0697b96033c9eea680956e8
2019-04-04 11:15:37 -07:00
15b318de84 U/kostmo/win test offload scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18694

Differential Revision: D14766339

Pulled By: kostmo

fbshipit-source-id: a2300e72129979f866430ca5c09dd7fff6df0a89
2019-04-04 10:42:11 -07:00
f97eb8d9e4 Revert D14603722: Enforce single parent for script submodules
Differential Revision:
D14603722

Original commit changeset: 63ab5d0cccf7

fbshipit-source-id: 2c4174def102eda4589e08c4dbd67ce8af975199
2019-04-04 10:32:36 -07:00
52a3a51490 Fix deviceCount on FakeGuardImpl. (#18745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18745
ghimport-source-id: 3ed111efe83b3061652869e33d9b5910b7daa732

Differential Revision: D14759198

Pulled By: ezyang

fbshipit-source-id: 70a8db767f310fe0e0079c7b0693e9330d7cd472
2019-04-04 09:23:36 -07:00
486fae563d Stop swapping in Storages of the wrong device for Tensors. (#18831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18831
ghimport-source-id: 2741e0d70ebe2c2217572c3af54ddd9d2047e342

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* **#18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.**

This is necessary to support device caching, see https://github.com/pytorch/pytorch/pull/18751 and https://github.com/pytorch/pytorch/pull/18578.

In library code, we potentially swap in Storages with the wrong device when device_guard is False.  This happens as follows with "view-like" operations.
1) We allocate a tensor on the 'wrong' device (because device_guard is false).
2) We swap out the 'wrong' storage with the 'right' storage using e.g. THCTensor_setStorage.

Instead, we can just construct the Tensor with the correct Storage from the beginning.  This is what we do with 'view'.

Note there are two other "view-like" cases where this happens:
1) unfold
2) set_()

Because these aren't performance critical, I just added the device_guard instead of applying the above correction.

For completeness, this also includes a test that all `device_guard: false` functions behave properly under these conditions.

Reviewed By: dzhulgakov

Differential Revision: D14766232

fbshipit-source-id: 0865c3ddae3f415df5da7a9869b1ea9f210e81bc
2019-04-04 06:25:33 -07:00
d70c6f23f4 Pass ScalarType separately from Type in python constructors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17786

Reviewed By: ezyang

Differential Revision: D14379075

fbshipit-source-id: 3abf066563b789a30cafe5b0c868a41326f5b833
2019-04-04 02:24:20 -07:00
f5741eb855 Store ScalarType and Backend instead of Type in TensorIterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17601

Reviewed By: ezyang

Differential Revision: D14274754

fbshipit-source-id: b08880ae586b6ae57d4c0bbeb203796d087926c4
2019-04-04 02:24:16 -07:00
c705d9eb1e Introduce DeprecatedTypeProperties class (#17991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17991

changes:
-Breaks bc: Tensor::type() now returns DeprecatedTypeProperties& rather than Type&.
-Added DeprecatedTypeProperties, it serves as a temporary replacement for Type as the return value of Tensor::type(). This contributes to making Type just for dispatch purposes so that we can make it dtype agnostic.
-Tensor::dispatch_type() now returns Type& like Tensor::type() used to do.
-Changed callsites of Tensor::type() appropriately.

Reviewed By: ezyang

Differential Revision: D14443117

fbshipit-source-id: 239ccb7a09626279a71d1a37f8f82e7f57bf7d9e
2019-04-04 02:24:13 -07:00
095f88e093 Fix to handle null strides in DLPack tensor (#18510)
Summary:
DLPack can have non-strided tensors, which is represented by a nullptr in the place of dl_tensor.strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18510

Differential Revision: D14647328

Pulled By: bwasti

fbshipit-source-id: 5364282810a5772cfc2319fc8133fe86fdd84dd1
2019-04-04 00:28:13 -07:00
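A null strides pointer means the tensor is compact and row-major, so the importer can synthesize the strides from the shape; a small sketch of that computation (function name hypothetical):

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

print(contiguous_strides([2, 3, 4]))  # [12, 4, 1]
```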
e5e2110a8e Add shape inference function for Split (#18838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18838

It turns out that we don't have a shape inference function for the `Split` op at all. This diff adds one.

Reviewed By: bertmaher

Differential Revision: D14766871

fbshipit-source-id: 535cb4f24bdada603c76579e00e7a39aee93e19f
2019-04-04 00:22:22 -07:00
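Conceptually, shape inference for `Split` copies the input shape once per output and replaces the split axis with each requested chunk size; a minimal sketch (names hypothetical):

```python
def infer_split_shapes(input_shape, axis, split):
    """Infer output shapes for a Split op.

    `split` lists the size of each output along `axis`; the sizes must
    sum to the input's extent on that axis.
    """
    assert sum(split) == input_shape[axis], "split sizes must cover the axis"
    shapes = []
    for s in split:
        shape = list(input_shape)
        shape[axis] = s
        shapes.append(shape)
    return shapes

print(infer_split_shapes([6, 4], axis=0, split=[2, 2, 2]))
# [[2, 4], [2, 4], [2, 4]]
```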
0c237f1383 Fix the duplication problem in _unique_state_dict (#18139)
Summary:
Since parameter.data creates a new torch.Tensor each time, we currently get duplicate tensors when calling _unique_state_dict. Deduplicate before creating the new tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18139

Reviewed By: dzhulgakov

Differential Revision: D14511262

Pulled By: houseroad

fbshipit-source-id: cb69795d0b6509721220650bbb19edeb3459a503
2019-04-03 23:16:44 -07:00
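The deduplication described keys entries by the identity of the underlying tensor rather than by the fresh wrapper `parameter.data` returns on each access; the same idea in plain Python, using `id()` as a stand-in for storage identity (names hypothetical):

```python
def unique_by_identity(named_values):
    """Keep the first name seen for each distinct underlying object."""
    seen = {}
    for name, value in named_values:
        key = id(value)  # identity of the underlying object, not a new wrapper
        if key not in seen:
            seen[key] = name
    return list(seen.values())

shared = [1.0, 2.0]  # one object referenced under two names
params = [("encoder.weight", shared), ("decoder.weight", shared),
          ("decoder.bias", [0.0])]
print(unique_by_identity(params))  # ['encoder.weight', 'decoder.bias']
```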
fa0ad057f8 fold col offset into bias; optimize A symmetric quant (#17026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17026

D14013931 was for FC. This diff is similar optimizations for Conv.
A subtle difference is that in FC, once we fold col_offset into the bias during the pre-processing step, we can treat everything as if A_zero_offset == 0 (symmetric quantization of A).
In Conv, we can't do this because padding still needs to use the original A_zero_offset.
From the requantization point of view, once col_offset is folded into the bias, we can treat it as if we're doing symmetric A quantization.
But for steps involving padding, like im2col, im2col fused with packing, and direct conv for depth-wise/group convolution, we still need to pass the original A_zero_offset.

Reviewed By: jianyuh

Differential Revision: D14020276

fbshipit-source-id: c29caefd1127bbc6aff0e9d535939bb0c1ecb66c
2019-04-03 22:52:54 -07:00
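The folding works because (A − a_zp·1)·Wᵀ = A·Wᵀ − a_zp·(per-output-channel weight sums), so those sums (the "col offsets"), scaled by A's zero point, can be absorbed into the bias once during pre-processing; a small integer sketch of the idea (names hypothetical, not the FBGEMM API):

```python
def fold_col_offset_into_bias(W, bias, a_zero_point):
    """W is out_ch x in_ch (integer weights); returns the adjusted bias.

    After folding, the matmul can proceed as if A were symmetrically
    quantized (zero point 0).
    """
    col_offsets = [sum(row) for row in W]  # per-output-channel sum over inputs
    return [b - a_zero_point * c for b, c in zip(bias, col_offsets)]

W = [[1, 2], [3, -1]]
print(fold_col_offset_into_bias(W, bias=[10, 20], a_zero_point=2))
# [10 - 2*3, 20 - 2*2] = [4, 16]
```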
72913a55a8 fix flake8 lint (#18835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18835
ghimport-source-id: 7b1f433ae51232822704d62699233688072cbc23

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18835 fix flake8 lint**
* #18826 [jit] run cpp tests for non-cuda builds in test_jit.py

...again

Reviewed By: ZolotukhinM

Differential Revision: D14766790

fbshipit-source-id: 29361a407589092831dfbc3c5d63d2834934cd02
2019-04-03 22:24:01 -07:00
0a4117a36e run cpp tests for non-cuda builds in test_jit.py (#18826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18826
ghimport-source-id: 7ffa3bc7ef7402a6d6eb6ba5849e197019d77bf8

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18826 [jit] run cpp tests for non-cuda builds in test_jit.py**

We did all the work of nicely separating our cpp tests that don't require
CUDA, but they aren't run from test_jit.py if CUDA is missing.

Reviewed By: ZolotukhinM

Differential Revision: D14766287

fbshipit-source-id: 9326b3a5c90f6c20fc8cfaf1a1885a363b91f30a
2019-04-03 22:23:58 -07:00
100f95a362 Fix the linter (#18842)
Summary:
Remove extra empty line
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18842

Differential Revision: D14767334

Pulled By: houseroad

fbshipit-source-id: 63224bc407949949e1eb5123d3f151e4ac8f6988
2019-04-03 21:37:01 -07:00
7e59c60454 Enforce single parent for script submodules (#18379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18379
ghimport-source-id: 9895ecc1ff7897e98853dc00675341f36726e7c7

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18379 Enforce single parent for script submodules**
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

The assumption that a ScriptModule has a single parent is present in
our serialization format, and likely a few other places. It is not
enforced on creation of script module hierarchies though, meaning that
problems (e.g. replicating a module twice in the output
format) will not be caught until much later in the development cycle.

This patch enforces the property when a submodule is registered.
It also removes NamedModule since it is no longer necessary in this regime.
This will also allow easy discovery of a module's fully-qualified name
without needing to traverse the Module hierarchy.

Differential Revision: D14603722

fbshipit-source-id: 63ab5d0cccf7d66c7833e0adf9023024ca9607cb
2019-04-03 20:26:58 -07:00
b80a4fa201 Allow ints, floats, and tensors in conditional (#18755)
Summary:
Per our offline discussion, allow Tensors, ints, and floats to be cast to bool when used in a conditional

Fix for https://github.com/pytorch/pytorch/issues/18381
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18755

Reviewed By: driazati

Differential Revision: D14752476

Pulled By: eellison

fbshipit-source-id: 149960c92afcf7e4cc4997bccc57f4e911118ff1
2019-04-03 17:12:17 -07:00
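The casting rule being matched is ordinary Python truthiness — a nonzero int or float converts to True and zero to False (with a one-element tensor following its value); a trivial sketch of that contract:

```python
def as_cond(x):
    # Mirrors the intended conversion: nonzero numbers are truthy.
    return bool(x)

print(as_cond(3), as_cond(0.5))   # True True
print(as_cond(0), as_cond(0.0))   # False False
```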
843e6234f5 Fix layernorm ad formula on weight and bias (#18233)
Summary:
Fix the layernorm formula when weight and bias passed in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18233

Differential Revision: D14760375

Pulled By: wanchaol

fbshipit-source-id: d6bd3b137bc04c391aa5c24d021d1f811ba2a877
2019-04-03 16:58:33 -07:00
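For reference, the forward computation whose backward formula is being fixed is layer normalization with affine weight and bias — roughly y = (x − mean) / sqrt(var + eps) · weight + bias, with the biased variance; a pure-Python sketch over a 1-D input (the names and eps default are illustrative assumptions):

```python
import math
import statistics

def layer_norm(x, weight, bias, eps=1e-5):
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)  # biased (population) variance
    return [(xi - mean) / math.sqrt(var + eps) * w + b
            for xi, w, b in zip(x, weight, bias)]

out = layer_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print([round(v, 4) for v in out])  # ≈ [-1.2247, 0.0, 1.2247]
```

Because weight and bias enter the formula per-element, their gradients are separate reductions over the normalized input, which is what the corrected AD formula has to account for.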
0512e4e323 Unify namespace of script::Module (#18378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18378
ghimport-source-id: 55c29bb436a2153d29ff2f4488d99d8863c187b1

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18379 Enforce single parent for script submodules
* **#18378 Unify namespace of script::Module**
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

This removes individual OrderedDicts in favor of a single unified
namespace for all things in a script::Module. This removes a whole
class of bugs where both a method and an parameter could get the
same name, for instance.

Since we no longer have to expose OrderedDict::Item objects, a lot of
downstream code can be simplified.

We no longer now double-store names (both in the key of the dictionary,
and in the object itself).

Differential Revision: D14603723

fbshipit-source-id: b5f7551b3074679623edd6ea70269830353b4d4c
2019-04-03 16:04:17 -07:00
773ce4fbd0 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance (#18648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18648
ghimport-source-id: 1cf4a8fe91492621e02217f38cae5d7e0699fb05

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18661 Step 7: remove _unique
* #18655 Step 6: Rename _unique2 to unique and add int? dim
* #18654 Step 5: remove _unque_dim in favor of unique_dim
* #18651 Step 4: add support for unique with dim=None
* #18650 Step 3: Add support for return_counts to torch.unique for dim not None
* #18649 Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim
* **#18648 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance**

`unique` is fragile; previously I tried to change it in #18391 and #17097. They both passed OSS tests but were eventually reverted due to internal failures. My previous work on refactoring unique, #18459, is based on #18391, and after #18391 got reverted, I could not work on #18459. To continue working on #18459, #18391, and #17097 without worrying about internal failures, I am suggesting the following steps for the improvements of `unique` and `unique_dim`. soumith Please take this and there is no need to put #18391 back.

The motivation is basically to move forward as much as possible without causing any internal failures. So I will try to divide the work into steps, sorted from low to high probability of internal failure. (I don't know what the internal failure is, so I have to guess.) Let's merge this PR stack one by one until we encounter an internal failure.

Step 1: Create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, and keep `_unique` and `_unique_dim` unchanged. The backends of these two functions and of `_unique` and `_unique_dim` are all the same; the only difference is that the temporary ones support `return_counts` while `_unique` and `_unique_dim` do not. Step one is mostly #18391 + #18459. The cuda8 errors have been fixed. At this point, there is no user-visible API change, so no docs are updated. `torch.unique` does not support `return_counts` yet, and `return_counts` is tested through the newly added temporary operators. This step just adds two new ATen operators, so there shouldn't be any internal failure.

Step 2: Rename `_unique_dim2_temporary_will_remove_soon` to `unique_dim`. This should cause no internal failure either, because there is no change to existing operators. The only thing to worry about is deleting `unique_dim` from the Python side because we don't want users to use it. At this point, C++ users now have `return_counts` support for `unique_dim`.

Step 3: Update the docs of `torch.unique` and use `unique_dim` inside `torch.unique` to support `return_counts`. In the docs, we should say that `torch.unique` with dim=None does not support `return_counts` yet. This might cause internal failure.

Step 4: Rename `_unique2_temporary_will_remove_soon` to `_unique2` and use `_unique2` inside `torch.unique` to support `return_counts`. Update the docs saying that `torch.unique` with None dim now support `return_counts`. This might cause internal failure.

Step 5: Remove `_unique_dim`. This might cause internal failure.

Step 6: Rename `_unique2` to `unique` and add an optional `dim` argument to make it look like the signature of Python's `torch.unique`. Inside `torch.unique`, use `unique` and get rid of `unique_dim`. Unbind `unique_dim` entirely from Python at codegen. This is likely to cause internal failure.

Step 7: Remove `_unique`. This is very likely to cause internal failure.

This PR
======

This PR is for step 1. This create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon` and implement `return_counts` inside them and do refactor for performance improvements.

Please review ngimel VitalyFedyunin. They are mostly copied from #18391 and #18459, so the review should be easy.

Below is a benchmark on a tensor of shape `torch.Size([15320, 2])`:

Before
---------

```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
```

```
1.0.1
192 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
548 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
```

```
1.0.1
226 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
302 µs ± 7.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After
-------

```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+83ab8ac
190 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
237 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
219 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
263 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+83ab8ac
232 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
301 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
264 µs ± 7.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
339 µs ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Differential Revision: D14730905

fbshipit-source-id: 10026b4b98628a8565cc28a13317d29adf1225cc
2019-04-03 15:29:55 -07:00
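Functionally, the `return_counts` output being threaded through here is just the multiplicity of each unique value — the same information `collections.Counter` computes; a minimal illustration of the contract (helper name hypothetical):

```python
from collections import Counter

def unique_with_counts(values):
    """Sorted unique values plus the count of each, mirroring return_counts."""
    counts = Counter(values)
    uniques = sorted(counts)
    return uniques, [counts[u] for u in uniques]

print(unique_with_counts([3, 1, 3, 2, 1, 3]))  # ([1, 2, 3], [2, 1, 3])
```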
7ae0263e1b Support replicating multi-GPU modules (#18687)
Summary:
If the input `network` resides on multiple GPUs, `devices` must be a 2D list with `devices[0]` matching `network`'s devices. See #18591.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18687

Differential Revision: D14706162

Pulled By: mrshenli

fbshipit-source-id: dca630d3308f2dbcf8b75629c452d7a64092ba42
2019-04-03 14:43:07 -07:00
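The documented constraint — a 2D `devices` list whose first row matches the network's own devices, with every row the same width — can be sketched as an up-front validation (function name hypothetical, not the actual PyTorch API):

```python
def check_replica_devices(network_devices, devices):
    """Validate a 2D device list for replicating a multi-GPU module.

    Each row is one replica's device set; row 0 must match the devices
    the network already lives on, and every row must be the same width.
    """
    if not devices or any(len(row) != len(network_devices) for row in devices):
        return False
    return list(devices[0]) == list(network_devices)

print(check_replica_devices([0, 1], [[0, 1], [2, 3]]))  # True
print(check_replica_devices([0, 1], [[2, 3], [0, 1]]))  # False
```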
eabd9eac2a flake8 fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18810

Differential Revision: D14758293

Pulled By: wanchaol

fbshipit-source-id: 975abe4fc5dc0dc4d43af61ec0f987e2c5670874
2019-04-03 14:14:18 -07:00
862aff641a Remove device_guard: False from native_functions that don't have a … (#18803)
Summary:
…tensor.

There's nothing to device_guard on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18803

Reviewed By: ezyang

Differential Revision: D14748091

Pulled By: gchanan

fbshipit-source-id: ed6f16d6f4d3f07b6d5ad9696f71a14333c228b8
2019-04-03 14:00:02 -07:00
cb959aa708 Switch our Linux machine AMI to a newer image. (#18433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18433
ghimport-source-id: 1c92f98b091232c0045a2e1db75d19c1f258ac1f

Differential Revision: D14748827

Pulled By: ezyang

fbshipit-source-id: a459451058cf5560811403bafb96c6ff083d7e3a
2019-04-03 13:50:37 -07:00
dfcd7b0185 QTensor (#18230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230

Implementing minimum qtensor API to unblock other workstreams in quantization

Changes:
- Added Quantizer which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added following user facing APIs
  - quantize_linear(scale, zero_point)
  - dequantize()
  - q_scale()
  - q_zero_point()

Reviewed By: dzhulgakov

Differential Revision: D14524641

fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c
2019-04-03 13:17:11 -07:00
3af2d6d904 Enforce import order to make protobuf cpp implementation in python work (#18560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18560

We have to import python protobuf here **before** we load cpp extension.
Otherwise it breaks under certain build conditions if cpp implementation of
protobuf is used. Presumably there's some registry in protobuf library and
python side has to initialize the dictionary first, before static
initialization in python extension does so. Otherwise, duplicated protobuf
descriptors will be created and it can lead to obscure errors like

  Parameter to MergeFrom() must be instance of same class: expected caffe2.NetDef got caffe2.NetDef.

I think it also fixes https://github.com/facebookarchive/caffe2/issues/1573
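The failure mode has a plain-Python analogy (a hedged illustration, not caffe2 code): two independently created classes with the same name are still distinct types, which is exactly how duplicated protobuf descriptors produce an error that names the same class twice:

```python
# Each call stands in for one static initialization of the NetDef descriptor.
def load_netdef_class():
    class NetDef:
        pass
    NetDef.__qualname__ = "caffe2.NetDef"
    return NetDef

First = load_netdef_class()   # e.g. registered by the python protobuf module
Second = load_netdef_class()  # e.g. re-registered by the cpp extension
duplicated = Second()
# Same name, but distinct types: isinstance checks across them fail,
# mirroring "expected caffe2.NetDef got caffe2.NetDef".
```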

Reviewed By: ezyang, iroot900

Differential Revision: D14622054

fbshipit-source-id: 2499eb88ecdee85ff8d845859048f7ae5da2a480
2019-04-03 13:17:08 -07:00
3b71f2e1f2 Pin onnx ir_version to 4 (#18768)
Summary:
To make test_operators.py more stable. In the future, we will bump this up manually, and I think that's acceptable, since ir_version shouldn't be bumped too often.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18768

Reviewed By: zrphercule

Differential Revision: D14741514

Pulled By: houseroad

fbshipit-source-id: 0369dbc55424e345a113e49fc104a441ea290d58
2019-04-03 13:16:59 -07:00
8711df89cc fix nccl compilation to make sure it compiles for architectures that pytorch compiles for (#18739)
Summary:
resubmit of https://github.com/pytorch/pytorch/pull/18704 with additional fixes

Fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18739

Differential Revision: D14737274

Pulled By: soumith

fbshipit-source-id: cfbbbf68b098594bd045861d1b2c085da693ea51
2019-04-03 12:52:50 -07:00
b5d8844bbe push magma init into lazyInitCUDA (#18527)
Summary:
Tries to fix C++ API's usage of MAGMA-based functions.

Attempts to Fix https://github.com/pytorch/pytorch/issues/18074
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18527

Differential Revision: D14691694

Pulled By: soumith

fbshipit-source-id: dd04e74418e486d73ea4a92193ddf79352ed71ba
2019-04-03 12:47:34 -07:00
ed9724f385 For some files that are touched by the QTensor diff (#18765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18765

att

Reviewed By: ZolotukhinM

Differential Revision: D14733442

fbshipit-source-id: 525002034e6dccc2045da645e1193671fd0474b3
2019-04-03 12:47:31 -07:00
a21e256e8d Fix contiguous AD and Autogradzero inconsistency (#18633)
Summary:
Fixes #17962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18633

Differential Revision: D14700449

Pulled By: wanchaol

fbshipit-source-id: 3d15d67c01b69b28394a0f2f001db90ed9fd31dc
2019-04-03 12:47:28 -07:00
5950c1e8c4 Added indexing for bool tensors and bool Indices (#18583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18583
ghimport-source-id: 2b1941449827f4ab632fa0f5c8cf0791a6be0845

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18583 Added indexing for bool tensors and bool Indices**
* #18505 Added numpy conversion
* #18166 Bool Tensor for CUDA

-----------
This PR enables bool tensor indexing and indexing with bool indices. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
    a) CPU [Done]
    b) CUDA [In review]
3. Tensor Conversions. [In review]
4. Tensor Indexing. [This PR]
5. Tensor Operations.
6. Back compatibility related changes.

TODO:
As a follow-up, we should move the nonzero method from TH to ATen to make the code cleaner.

Change:
```
v = torch.tensor([True, False, True], dtype=torch.bool)
boolIndices = torch.tensor([True, False, False], dtype=torch.bool)
v[boolIndices]
-> tensor([True], dtype=torch.bool)

v = torch.randn(5, 7, 3)
boolIndices = torch.tensor([True, False, True, True, False], dtype=torch.bool)
v[boolIndices]
->
tensor([[[ 0.5885, -0.3322,  0.7388],
         [ 1.1182,  0.7808, -1.1492],
         [-0.7952,  0.5255, -0.0251],
         [ 0.7128,  0.8099,  1.2689],
         [-0.7018, -1.4733, -0.3732],
         [ 0.4503,  0.4986, -1.1605],
         [ 0.3348, -1.3767, -0.2976]],

        [[-2.0303, -0.4720, -0.1448],
         [-0.1914, -0.6821,  2.0061],
         [-1.0420, -0.1872, -0.3438],
         [ 1.7587, -0.4183, -0.7577],
         [ 1.0094, -0.1950, -0.2430],
         [ 0.1174,  0.3308, -0.5700],
         [ 0.1110, -0.2714,  1.3006]],

        [[-0.1946, -1.4747, -0.4650],
         [-1.0567,  1.0110, -0.2809],
         [ 0.3729, -0.5699,  0.0815],
         [-0.7733, -0.8316,  0.1674],
         [ 1.2000, -0.3745, -1.1679],
         [ 1.7105,  0.9851, -0.1907],
         [-1.1077,  0.2086, -0.0548]]])
```

Differential Revision: D14673403

fbshipit-source-id: 2b88ec2c7eb26a4f5ef64f8707fb68068d476fc9
2019-04-03 12:47:26 -07:00
65dfe1203f add an assertion to check the param num (#18145)
Summary:
Introduce this check to see whether it will break any existing workflow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18145

Reviewed By: dzhulgakov

Differential Revision: D14511711

Pulled By: houseroad

fbshipit-source-id: a7bb6ac84c9133fe94d3fe2f1a8566faed14a136
2019-04-03 12:47:23 -07:00
4afc067fed add Android NDK param to CI docker build script (#18782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18782
ghimport-source-id: 6c4bde7dc835b59209c1d5f7b243f00c9fe99de2

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18782 [pytorch] add Android NDK param to CI docker build script**

Inspired by discussion: https://github.com/pytorch/pytorch/pull/16242

Reviewed By: dreiss

Differential Revision: D14739471

fbshipit-source-id: 0a081045186cbf359eb3cdadee722741cd8cd62f
2019-04-03 12:47:20 -07:00
a7b82a44c4 Upgrade mkldnn-bridge for dnnlowp support (#16308)
Summary:
The mkldnn-bridge is upgraded in this PR to support DNNLOWP operators.
Meanwhile, APIs have been updated in caffe2 to use latest version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16308

Differential Revision: D14697018

Pulled By: yinghai

fbshipit-source-id: ca952589098accb08295fd5aa92924c61e74d69c
2019-04-03 12:47:17 -07:00
46a68c1b37 add 'abs' builtin
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18502

Differential Revision: D14750173

Pulled By: eellison

fbshipit-source-id: 359cf08938ada442ca1a3b3ea14022ce10229499
2019-04-03 12:47:13 -07:00
0916b5419a Fix dense Embedding to work with double backward (#9078)
Summary:
Fixes : #6469

1. `ATen/native/native_functions.yaml` had [dispatch](03e7953a98/aten/src/ATen/native/native_functions.yaml (L451-L455)) variants for `embedding_dense_backward`, however `embedding_backward` explicitly made a [call](03e7953a98/aten/src/ATen/native/Embedding.cpp (L35-L45)) to it, thus leading to an error.

2. In the case of a CUDA tensor, the function used to crash on dereferencing the indices' data [pointer](03e7953a98/aten/src/ATen/native/Embedding.cpp (L93)).

Both have been fixed and verified (on CUDA and CPU):

1.  As mentioned in the issue
```
import torch

class Test(torch.nn.Module):

    def __init__(self):
        super(Test,self).__init__()
        self.embd = torch.nn.Embedding(1000, 100)
        self.dense = torch.nn.Linear(100, 1)

    def forward(self, inp):
        inp = self.embd(inp)
        return self.dense(inp)

test = Test()
inp = torch.tensor([0,1,2,1,1])
out = test(inp)
raw_loss = out.mean(dim=0)

loss_grad = torch.autograd.grad(outputs=raw_loss,
                         inputs=list(test.parameters()),
                         retain_graph=True, create_graph=True, only_inputs=True)
norm = sum([param.norm()**2 for param in loss_grad])
loss = raw_loss + norm

loss.backward(retain_graph=True)

print(test.embd.weight.grad)

```

2. Test Script
```
import torch
import time
start = time.time()
l = [1,1]*100
input = torch.tensor([[1,0],[1,0]],device='cpu')
embedding_matrix = torch.tensor([[1.0,3.0],[2.0,4]],requires_grad=True,device='cpu')

sq = embedding_matrix * embedding_matrix
emb = torch.nn.functional.embedding(input, sq,scale_grad_by_freq=False)

print('Embedding Matrix')
print(embedding_matrix)
print('-----------------')

sum_ = emb.sum()#prod.sum()

loss_grad, = torch.autograd.grad(outputs=sum_,inputs=embedding_matrix,create_graph=True)

print('Gradient')
print(loss_grad)
print('-----------------')

sum2_ = sum_ + loss_grad.sum()
print(sum2_)
sum2_.backward()

print(embedding_matrix.grad)
print(time.time() - start)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9078

Reviewed By: ezyang

Differential Revision: D14691901

Pulled By: soumith

fbshipit-source-id: 78e2612ba39080be564c876311671eb5a0119a0f
2019-04-03 09:50:34 -07:00
c0ad6747a9 Highlight NCCL all_reduce and all_gather requirements (#18741)
Summary:
See #18689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18741

Differential Revision: D14726874

Pulled By: mrshenli

fbshipit-source-id: a92404c653e3c62fc23fa3ccacfb3b2959b2e307
2019-04-03 09:50:29 -07:00
2658190def Updating submodules
Reviewed By: zpao

fbshipit-source-id: ea0b06ce68d3fd6092eaea7c835a8b51c1120ea0
2019-04-03 08:30:25 -07:00
5e33085f27 Make it possible for users for select /Zi or /ZI over /Z7 when using MSVC (#18790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18701.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18790

Differential Revision: D14748195

Pulled By: ezyang

fbshipit-source-id: e50df1b5ca199a88d7b5ea3ea45d25d23cd31a27
2019-04-03 08:24:52 -07:00
06b7fe59f2 use optimization in D14020675 (#16945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16945

As title

Reviewed By: jianyuh

Differential Revision: D14020769

fbshipit-source-id: fc0f05fcc57bfe9b4aa0c5750060d7b2ba57dd7a
2019-04-03 08:05:10 -07:00
2113ea6fbf Add device and dtype to storage. (#18749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18749
ghimport-source-id: 9026a037f5e11cdb9ccd386f4b6b5768b9c3259b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* #18750 Use non-legacy constructors for tensor deserialization.
* **#18749 Add device and dtype to storage.**

The goal here is to fix our serialization, which currently depends on the legacy constructors.  Having dtype and device on Storage allows us to use the non-legacy constructors.

This fits with our broader goal of removing Storage, by having Storage act like a Tensor.

Differential Revision: D14729516

fbshipit-source-id: bf4a3e8669ad4859931f4a3fa56df605cbc08dcb
2019-04-03 07:59:02 -07:00
a3da3653eb Use non-legacy constructors for tensor deserialization. (#18750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18750
ghimport-source-id: f1475cfb67841c41d9867d4429ba9125d5c7dd07

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* **#18750 Use non-legacy constructors for tensor deserialization.**
* #18749 Add device and dtype to storage.

Deserialization currently uses legacy constructors.  This is bad because we need to maintain them, but there is a more immediate problem:
1) We are trying to implement device caching on TensorImpl to get rid of a virtual dispatch
2) This doesn't work if one is able to change the device of a Tensor underlying a Variable.
3) Deserialization does 2)

So the plan is to change deserialization, then enforce that we don't change the device out from underneath a Variable.

Differential Revision: D14729513

fbshipit-source-id: 090d6cdb375b94dc1bf4f554b2df243952b8cdc6
2019-04-03 07:54:11 -07:00
48f70ea0a2 Added numpy conversion (#18505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18505
ghimport-source-id: f3c9b9251e5793f9e192f587194ddfebb45facc1

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18505 [WIP]Added numpy conversion**
* #18166 Bool Tensor for CUDA

Differential Revision: D14646403

fbshipit-source-id: 79d39d692c778ce1981c1d35b1c33e3d93111041
2019-04-03 07:28:24 -07:00
7349dbb7ce Remove THTensor_(newUnfold). (#18773)
Summary:
It's not used and unfold's use of `device_guard: False` is scary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18773

Differential Revision: D14736526

Pulled By: gchanan

fbshipit-source-id: 6281a284bee45fa5038783e4c1ed4d1ed7ca81ab
2019-04-03 07:08:28 -07:00
cb66759600 temp fix for flake8 error (#18788)
Summary:
Fix lint error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18788

Reviewed By: houseroad

Differential Revision: D14741840

Pulled By: mingzhe09088

fbshipit-source-id: 1fa630e3c6e606e3d78fe8293e5b0e7ea1b78da3
2019-04-02 22:52:52 -07:00
3079d95b6c Fix flake8 issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18762

Reviewed By: houseroad

Differential Revision: D14734152

Pulled By: ifedan

fbshipit-source-id: 5adf123f88273895ad34ee9041896358d686de08
2019-04-02 21:18:01 -07:00
40a54bf2f1 Change ReinitializeTensor to use C10_LOG_FIRST_N (#18531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531

Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services;
we would like to change it to C10_LOG_FIRST_N to prevent that.

Reviewed By: dzhulgakov

Differential Revision: D14647704

fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e
2019-04-02 21:03:37 -07:00
80404cb2f5 Add support for getting TensorProto argument (#18364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18364

att

Reviewed By: bddppq

Differential Revision: D14584784

fbshipit-source-id: 03f9207d5cf4f7f4b812428a931edbcdcb21ca8d
2019-04-02 20:58:28 -07:00
31849bc524 make test module hook use save/load (#18284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18284
ghimport-source-id: 5a92c03fda19072ffb6afd40e0f56806716c7be6

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18296 [jit] Add namespacing for ScriptClasses
* **#18284 [jit] make test module hook use save/load**
* #18211 [jit] Turn script_type_parser into a class
* #18148 [jit] python interop for script classes

Instead of python-printing and comparing strings (which does not capture
dependency information, etc.), use save/load on in-memory buffers and
compare the main module contents inside the buffer

Reviewed By: ailzhang

Differential Revision: D14581129

fbshipit-source-id: 52264ae9ce076775ab3fd1a0c32c8d6f6677a903
2019-04-02 18:09:52 -07:00
2d07993bcb Add ability to specialize class types to ArgumentSpec (#18314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18314
ghimport-source-id: 8cecb768d476ab19c9460f39c8f94a764e4cb052

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18314 Add ability to specialize class types to ArgumentSpec**
* #18226 Add Slot type to abstract the raw pointers being used for slots.

Differential Revision: D14574395

fbshipit-source-id: cc3af6e56e9ae52990f4a1ad56ecceaa2d493577
2019-04-02 17:35:57 -07:00
5f5a2aaab9 Operator-level performance microbenchmarks (#18740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740

Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure:

* benchmark_core.py : core utilities for running microbenchmark tests
* benchmark_caffe2.py : Caffe2-specific benchmark utilities
* benchmark_pytorch.py: PyTorch-specific benchmark utilities
* benchmark_runner.py : Main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to have this integrate with AI-PEP.

The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends.

Includes two operator microbenchmarks, supporting both Caffe2 and PyTorch:
* MatMul
* Add

Reference: PyTorch benchmarks: https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators, MatMul and Add, but eventually we should also cover unary operators as in the PyTorch benchmark repo.
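The core loop of such a runner can be sketched in a few lines of plain Python. This is a hedged stand-in for what benchmark_core.py does (the function names here are illustrative, not the actual utilities): warm up, then report the mean time per call.

```python
import timeit

def run_microbenchmark(fn, warmup=10, iters=100):
    # Warm-up iterations amortize one-time costs (allocation, caching).
    for _ in range(warmup):
        fn()
    # timeit disables GC during the measured runs; report mean seconds/call.
    return timeit.timeit(fn, number=iters) / iters

def add_op():
    a, b = [1.0] * 256, [2.0] * 256
    return [x + y for x, y in zip(a, b)]

mean_seconds = run_microbenchmark(add_op)
```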

Reviewed By: zheng-xq

Differential Revision: D13887111

fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce
2019-04-02 17:06:19 -07:00
b832b99afb Bool Tensor for CUDA (#18166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18166
ghimport-source-id: a8e2ba2d966e49747a55701c4f6863c5e24d6f14

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18166 Bool Tensor for CUDA**
* #18165 Resolved comments from Bool Tensor for CPU PR
------

This PR enables bool tensor creation and some basic operations for the CUDA backend. This is a part of the Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
a) CPU [Done]
b) CUDA [This PR]
3. Tensor Conversions.
4. Tensor Indexing.
5. Tensor Operations.
6. Back compatibility related changes.

Change:
Enable bool tensor in CUDA with the following operations:

    torch.zeros
    torch.tensor
    torch.ones
    torch.rand/rand_like/randint/randint_like
    torch.full
    torch.full_like
    torch.empty
    torch.empty_like

Tested via unit tests and local scripts.

Differential Revision: D14605104

fbshipit-source-id: b7d7340a7d70edd03a109222d271e68becba762c
2019-04-02 16:17:05 -07:00
b77e3c2ca1 Add helpful information to the gradient/inplace operation exception (#18523)
Summary:
To debug a `one of the variables needed for gradient computation has been modified by an inplace operation` error, I wanted to know *which* variable has been modified, so I extended the error message with what information is easily available at this point.

Before:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
```

After:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [80, 1]], which is output 0 of UnsqueezeBackward0, is at version 1, not expected version 0. Hint: enable anomaly detection to find the forward pass operation which modified it.
```

The hint to enable anomaly detection is only shown when it is not enabled. It's meant to save people some googling. I'd even go further and reference `torch.autograd.set_detect_anomaly(True)`, but maybe we're not running Python?

Disclaimer: I haven't looked at other parts of the code to check if using `std::stringstream` is acceptable practice, let me know if it isn't. Similarly, I haven't checked about indentation practices.
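The check behind this message is a version counter: autograd records a tensor's version when it is saved for backward and compares it when the saved value is used. A hedged pure-Python sketch of that bookkeeping (class names are illustrative, not the autograd internals):

```python
class VersionedTensor:
    def __init__(self, data):
        self.data = data
        self._version = 0

    def add_(self, x):  # in-place op: mutate the data and bump the version
        self.data = [d + x for d in self.data]
        self._version += 1
        return self

class SavedVariable:
    def __init__(self, tensor):
        self.tensor = tensor
        self.expected_version = tensor._version  # version at save time

    def unpack(self):  # called during backward
        if self.tensor._version != self.expected_version:
            raise RuntimeError(
                "one of the variables needed for gradient computation has "
                f"been modified by an inplace operation: is at version "
                f"{self.tensor._version}, not expected version "
                f"{self.expected_version}. Hint: enable anomaly detection ...")
        return self.tensor

t = VersionedTensor([1.0, 2.0])
saved = SavedVariable(t)
t.add_(1.0)  # mutating after save invalidates the saved variable
```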
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18523

Differential Revision: D14683249

Pulled By: soumith

fbshipit-source-id: f97a99d4aabea7461df766d66cd72300b48e2350
2019-04-02 15:23:04 -07:00
74d9146559 build_variables.py: turn on link_whole for _C_impl library. (#18763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18763

Without the `link_whole` flag, in opt-builds some of the files are not linked into the `_C_impl` library, which causes some of the static initializers not to run (namely, registering a customPythonOperation from python_interpreter.cpp). This diff fixes it.

Differential Revision: D14732471

fbshipit-source-id: 57cff6b4b6d479ad7ab7fd29f677746d91d6ff45
2019-04-02 15:17:13 -07:00
84a9694ed0 Fix windows msbuild bug (#18748)
Summary:
Fix the bug introduced by #18681 where an undefined variable was being used to limit max cpu count when building for Windows without Ninja.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18748

Differential Revision: D14733209

Pulled By: soumith

fbshipit-source-id: 52fc0dd4dde99da75a6956b63f02da2e647eed4f
2019-04-02 14:35:40 -07:00
2e97c82470 torch.cross' dim default changed to c10::optional instead of int=-1 (#17582)
Summary:
The argument dim=-1 doesn't work for torch.cross. The signature of torch.cross has been changed to take c10::optional<int64_t> dim instead of int64_t. Per the documentation, "If dim is not given, it defaults to the first dimension found with the size 3."; if dim is specified (even negative), the corresponding dim is used.

Fixes #17229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17582

Differential Revision: D14483063

Pulled By: ifedan

fbshipit-source-id: f9699093ec401cb185fd33ca4563c8a46cdcd746
2019-04-02 13:27:00 -07:00
3027e783b1 Fix multi-configuration on Windows CMake (CUDA) (#18548)
Summary:
Multiple configurations are the default (e.g. Release;Debug) on Windows, and this check always broke that configuration because CMAKE_BUILD_TYPE was not set. The workaround was to always set CMAKE_BUILD_TYPE to Debug or Release, which was very unfortunate.

The correct method is to use generator expressions that expand depending on the current CONFIG being processed.

Side note: anywhere else CMAKE_BUILD_TYPE is checked should probably be fixed too.
Note that the CMakeLists.txt forces it into Release mode. However, I came across this error when importing the prebuilt Config into another project, where CMAKE_BUILD_TYPE was not set.

> 3>CMake Error at pre_built/pytorch-1.0.1/share/cmake/Caffe2/public/cuda.cmake:380 (message):
> 3>  Unknown cmake build type:

Proper support for configurations would mean we can build debug and release at the same time and as you can see, it is less CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18548

Differential Revision: D14730790

Pulled By: ezyang

fbshipit-source-id: 70ae16832870d742c577c34a50ec7564c3da0afb
2019-04-02 13:19:07 -07:00
36237c4893 Fix flake8 issues in gragrad test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18727

Differential Revision: D14724887

Pulled By: ifedan

fbshipit-source-id: 8c1db6460303e746e4aea0142302b8d61277c067
2019-04-02 12:45:18 -07:00
f095c34b73 Register operators by passing arguments to RegisterOperators constructor (#18577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18577

This is also part of the legacy API and we need to support it if we want to replace it.

Reviewed By: dzhulgakov

Differential Revision: D14671432

fbshipit-source-id: 007abf4ab816647a509fc08e35d79b6c1aa55b03
2019-04-02 12:33:33 -07:00
58f5954252 Allow registering an operator schema without a kernel (#18551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18551

This is helpful for defining a set of operators as an interface but not adding concrete kernels just yet.
The registration logic will ensure that any other libraries that add kernels for these schemas exactly match the schema defined here.

Reviewed By: dzhulgakov

Differential Revision: D14660208

fbshipit-source-id: 7adb5a4876cff5a0ad21d92d8c450cb889f00cc3
2019-04-02 12:33:30 -07:00
7a37e066e6 Improve compiler error messages of the op registration API (#18550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18550

When the operator registration API is used wrongly, in most cases we should now get a nice compiler error
instead of weird template error messages.

This is done by making the enable_if conditions more broad so they also match error cases,
but then having static_asserts against these error cases inside the function.
Before that, since the function didn't match, the error message said something like "no function found to match your call";
now it will show the error message specified in the static_asserts.

Reviewed By: dzhulgakov

Differential Revision: D14659178

fbshipit-source-id: 7ca4fb72d9051eadf0a7e2717b962bf1213a52b2
2019-04-02 12:33:27 -07:00
ae1d13a06f Improve and test error messages for signature mismatches (#18547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18547

- Argument indices in the error messages are 1-indexed not 0-indexed.
- Add test cases that a mismatching signature actually shows the correct error messages

Reviewed By: dzhulgakov

Differential Revision: D14656695

fbshipit-source-id: 55e45634baa3117e18b8687ea6b2a2f83715bdf6
2019-04-02 12:33:24 -07:00
bb8a0d717c Enable gmock and fix system gtest issue (#18706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18706

- Enable gmock
- Fix issue where the gtest source files in third_party would include system gtest headers

Reviewed By: ezyang

Differential Revision: D14715302

fbshipit-source-id: 5335390913e651bda85c69d7ea9b5c1bce58f172
2019-04-02 12:33:22 -07:00
01c03caacc Emergency workaround for apt-get failure. (#18733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18733
ghimport-source-id: b56766fb4b1084d8a7947cf622275d44e325141b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18733 Emergency workaround for apt-get failure.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: dreiss

Differential Revision: D14725779

fbshipit-source-id: 6855347853a3f13461ca267ed563e2db5815166e
2019-04-02 10:49:21 -07:00
0b6ed83f33 Fix clang-tidy errors in torch/csrc/distributed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18709

Differential Revision: D14725936

Pulled By: pietern

fbshipit-source-id: 307bc446d53da5d0e04d730bb51b7fb29212ace3
2019-04-02 10:32:37 -07:00
385a755b68 Undefined behavior with memset of std::string to 0 (#18703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18703

`zeroPtr` is sometimes a `std::string` tensor, so `memset` to 0 is undefined behavior.

This might be accidentally safe with `std::string` implementations that use SSO (Small String Optimization), but will crash otherwise.

Reviewed By: zheng-xq

Differential Revision: D14714458

fbshipit-source-id: 012a18464e6514d38ff791509b88ddc3fc55b2b1
2019-04-02 10:10:11 -07:00
a799751e33 Revert D14717015: [pytorch][PR] fix nccl compilation to make sure it compiles for architectures that pytorch compiles for
Differential Revision:
D14717015

Original commit changeset: 4aac036f57e5

fbshipit-source-id: c820b8dfb27564271e6b80e133fe655658a7c25c
2019-04-02 09:39:03 -07:00
1f5a46ab05 Automatic update of fbcode/onnx to f0d7df2c643c4e37f1fd7735ef02c972c4d19fb5 (#18695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18695

Previous import was fb1a80692c1ab0bd27b1072f2e7bffacba336777

Included changes:
- **[f0d7df2c](https://github.com/onnx/onnx/commit/f0d7df2c)**: fix testcase names of maxpool_2d_ceil and averagepool_2d_ceil (#1896) <karljang>

Reviewed By: zrphercule

Differential Revision: D14709993

fbshipit-source-id: 7fe2145a481ea2c1b6d85ba1c85c662200a53241
2019-04-02 09:16:48 -07:00
c484cf43a0 Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors. (#18455)
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455

Reviewed By: ezyang

Differential Revision: D14672084

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
2019-04-02 08:48:19 -07:00
aed7c9bc96 Improve Backend comment. (#18567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18567
ghimport-source-id: 1e50e611a3afcfae86828b7afe06c3fdc6a7bef7

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18567 Improve Backend comment.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: dzhulgakov

Differential Revision: D14666189

fbshipit-source-id: 64a41c4a998b1a59ff780d1ae06fa16e5ef3c7c4
2019-04-02 08:06:48 -07:00
baac5489a8 Expose alias multinomial methods to ATen (#17904)
Summary:
This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods.

cc: neerajprad
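The alias method these bindings expose can be sketched in plain Python. This is a hedged sketch of the standard algorithm (Vose's variant), not the ATen implementation; the function names merely mirror the exposed methods. Setup builds the prob/alias tables in O(n); each draw is O(1):

```python
import random

def multinomial_alias_setup(probs):
    n = len(probs)
    prob, alias = [0.0] * n, [0] * n
    scaled = [p * n for p in probs]
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]          # donate mass to the small bucket
        (small if scaled[l] < 1.0 else large).append(l)
    for leftover in small + large:            # numerical leftovers keep full mass
        prob[leftover] = 1.0
    return prob, alias

def multinomial_alias_draw(prob, alias, rng=random):
    i = rng.randrange(len(prob))              # pick a bucket uniformly
    return i if rng.random() < prob[i] else alias[i]

prob, alias = multinomial_alias_setup([0.2, 0.3, 0.5])
sample = multinomial_alias_draw(prob, alias)
```

A useful invariant for checking the tables: each original probability equals its bucket's kept mass plus the mass donated to it by other buckets, all divided by n.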
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904

Differential Revision: D14700205

Pulled By: ezyang

fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3
2019-04-02 07:56:41 -07:00
5ade96fc84 Update cpp_extension.py (#18638)
Summary:
Hi. It seems that when building C++ extensions with CUDA for Windows, the `extra_cuda_cflags` options are not properly forwarded to `nvcc`.

Use of extra CUDA options is necessary to build, for instance, InplaceABN (https://github.com/mapillary/inplace_abn), which requires the `--expt-extended-lambda` option.

This PR adds one line that correctly appends `extra_cuda_cflags`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18638

Differential Revision: D14704270

Pulled By: ezyang

fbshipit-source-id: e1e330d193d9afd5707a5437a74c0499460d2b90
2019-04-02 07:56:38 -07:00
fba89b2ae1 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18653

Differential Revision: D14713920

Pulled By: ezyang

fbshipit-source-id: 170295a162dd23916c1dcc9330918d33277cc9ed
2019-04-02 07:51:30 -07:00
d5bf6ddc29 Kill LegacyBridge functions that don't do multiple dispatch. (#18696)
Summary:
At some point, we needed these functions to deal with autograd dispatching to the sparse or TH version of a backward. But we rewrote all backward definitions in terms of native functions, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18696

Differential Revision: D14710834

Pulled By: gchanan

fbshipit-source-id: b22568c58eefc79d672555bd8832398ccd965cb7
2019-04-02 07:34:55 -07:00
af84371ba8 Updating submodules
Reviewed By: zpao

fbshipit-source-id: da3cd711bb81b07c6c284426ffc5e10a969b0d2b
2019-04-02 06:50:53 -07:00
f084c129db add Int8FCRelu (#18673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18673

Add a fused FC + Relu

Reviewed By: csummersea

Differential Revision: D14667055

fbshipit-source-id: d88fefba008fc0ca450291532d2b320694c6b785
2019-04-01 23:50:30 -07:00
8e873ce273 Fix uninitialized value in pickler (#18678)
Summary:
Fixes #18671
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18678

Differential Revision: D14708969

Pulled By: driazati

fbshipit-source-id: d372c6e3a2a3d3fc48d8afc1fa6807f2ce0e5c6e
2019-04-01 17:34:36 -07:00
2e029db2f9 fixes multiprocessing serialization for integer nn.Parameter (#18639)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18639

Differential Revision: D14711565

Pulled By: soumith

fbshipit-source-id: 0063ed138a215b95d6571dcd68b18569714abe19
2019-04-01 17:15:42 -07:00
fc6296d777 fix nccl compilation to make sure it compiles for architectures that pytorch compiles for (#18704)
Summary:
cc: t-vi gchanan zou3519

This fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18704

Differential Revision: D14717015

Pulled By: soumith

fbshipit-source-id: 4aac036f57e564b05d759662e8ad7a80170901c0
2019-04-01 17:10:42 -07:00
1b25fdbcd0 More type stubs (#18511)
Summary:
Added stubs for:

* The `device` module
* The `cuda` module
* Parts of the `optim` module
* Began adding stubs for the `autograd` module. I'll annotate more later but `no_grad` and friends are probably the most used exports from it so it seemed like a good place to start.

This would close #16996, although comments on that issue reference other missing stubs so maybe it's worth keeping open as an umbrella issue.

The big remaining missing package is `nn`.

Also added a `py.typed` file so mypy will pick up on the type stubs. That closes #17639.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18511

Differential Revision: D14715053

Pulled By: ezyang

fbshipit-source-id: 9e4882ac997063650e6ce47604b3eaf1232c61c9
2019-04-01 16:03:58 -07:00
aa23b8c664 NCCL build fix WITH_DISTRIBUTED=1.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18691

Reviewed By: ezyang

Differential Revision: D14706205

Pulled By: gchanan

fbshipit-source-id: 802f19bfd7df3703c0dbce03036e2f2e32eb3efb
2019-04-01 15:58:54 -07:00
16f07d7dac caffe2 - set up correct inheritance structure for remaining operator test classes (#18622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18622

Set up correct inheritance structure for remaining operator test classes

Reviewed By: ezyang

Differential Revision: D14685941

fbshipit-source-id: a6b1b3be325935b7fec7515be13a4994b3016bf0
2019-04-01 15:53:22 -07:00
20b63aa977 Peephole Optimize Shape Ops (#18549)
Summary:
Peephole optimize ops that just require Dimensioned Tensor Type, which is what we specialize graphs on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18549

Differential Revision: D14690827

Pulled By: eellison

fbshipit-source-id: 9d7439eb584f0a5b877f5aa53cf80150f00e7e5f
2019-04-01 15:39:43 -07:00
4a0f842d42 Deprecated lambda based API (#18542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18542

This adds the deprecated API for defining kernels as lambdas. The new API for defining kernels as lambdas was introduced in D14653005.

Reviewed By: dzhulgakov

Differential Revision: D14653551

fbshipit-source-id: 99900f1436716c69e52c83b68333b642ec2c8558
2019-04-01 14:58:35 -07:00
723ce02a55 deprecated function based API (#18444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18444

This adds the deprecated function based API to c10::RegisterOperators().
This is the API currently exposed under jit::RegisterOperators() and we need to support it for backwards compatibility.

Reviewed By: dzhulgakov

Differential Revision: D14514218

fbshipit-source-id: c77676851cfd431d66f18fd8038cf153a3a7d7cc
2019-04-01 14:58:32 -07:00
246f5c412e Revert "Tensor construction codemod(raw_mutable_data) (#16373)" (#18680)
Summary:
This reverts commit d73c830e236f5b980e5c91914b818d150b60278c.

We have observed a significant perf drop when training ResNext101 with multiple AMD GPUs:

Before:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-bench/1636/console
2 GPUs ResNext training got 150~160 imgs/sec
4 GPUs ResNext training got 270~280 imgs/sec

After:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-bench/1637/console
Both 2 and 4 GPUs ResNext training drop to 110~120 imgs/sec

Similar perf drops are seen on ResNet50 training jobs as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18680

Differential Revision: D14702941

Pulled By: bddppq

fbshipit-source-id: 828141805afc23f25c08d4a2eb6d4b99f817c128
2019-04-01 14:39:13 -07:00
bdfdf6c2b9 C++ handler for gradient reduction (#18251)
Summary:
This commit adds the `c10d::Reducer` class that hooks into autograd
and performs gradient bucketing and reduction. These are the core
parts of `nn.parallel.DistributedDataParallel` that up to now were
only usable for CUDA models.

This should enable the following:

* Distributed data parallelism for models defined using the C++ frontend.
* Allow overlap of gradient computation and reduction for non-CUDA models.
* Enable distributed data parallelism for models with some unused parameters.

This does not include any logic for computing bucket assignment, which
can be done separately; either by observing autograd execution order
(this is what Apex does), or by assigning buckets based on some
maximum byte size, or both.

Also see #17757 and #13273.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18251

Reviewed By: mrshenli

Differential Revision: D14571899

Pulled By: pietern

fbshipit-source-id: 20f95eefd288dfe8cfffe0a28ca22fa7c9c3cd4c
2019-04-01 14:30:02 -07:00
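The bucket-assignment strategy mentioned above (grouping gradients up to a maximum byte size) can be sketched in plain Python. The `assign_buckets` helper and its size cap are hypothetical illustrations of the idea, not the actual `c10d::Reducer` API:

```python
def assign_buckets(param_sizes, max_bucket_bytes):
    """Greedily group parameter gradients into buckets capped at max_bucket_bytes.

    param_sizes: per-parameter gradient sizes in bytes, in reduction order.
    Returns a list of buckets, each a list of parameter indices.
    """
    buckets, current, current_bytes = [], [], 0
    for index, nbytes in enumerate(param_sizes):
        # Start a new bucket if adding this gradient would exceed the cap.
        if current and current_bytes + nbytes > max_bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets

# Four gradients of 4, 4, 8, and 2 bytes with an 8-byte cap.
print(assign_buckets([4, 4, 8, 2], 8))  # → [[0, 1], [2], [3]]
```

An oversized gradient still gets a bucket of its own, so every parameter is always assigned somewhere.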
a0285dd0f4 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 735fc388bff7066e8f46526266a73bf35e121442
2019-04-01 13:59:58 -07:00
89e9b1cf8e add ConvRelu schema (#18693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18693

As title

Reviewed By: protonu

Differential Revision: D14662880

fbshipit-source-id: 3664faa660a04e1f528a413d2a1700b872c3c684
2019-04-01 13:09:07 -07:00
90a5c56988 offload scripts from win-test.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18601

Differential Revision: D14711856

Pulled By: kostmo

fbshipit-source-id: 75fe620541fe2903f69a53dbd1b6d51a0d718113
2019-04-01 13:04:30 -07:00
929258a680 Some fixes for the build script on Windows (#18681)
Summary:
Fixes https://discuss.pytorch.org/t/pytorch-build-from-source-on-windows/40288/13?u=peterjc123.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18681

Differential Revision: D14711039

Pulled By: soumith

fbshipit-source-id: f7e1a94b163064c055670b2925cd4502e7773599
2019-04-01 12:42:51 -07:00
d6c269c33e Fix for double backwards tests (#18190)
Summary:
If none of the outputs require_grad, we don't actually check gradgrad; instead, we check that their numerical gradients are 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18190

Differential Revision: D14563388

Pulled By: ifedan

fbshipit-source-id: a4eb94c9eb60f14dbe6986cd8cef1fe78a7bc839
2019-04-01 12:33:30 -07:00
be01c90797 Add string index/slice operations (#18247)
Summary:
Adds support for string indexing (`"a"[0]`) and slicing (`"abc"[1:3]`)
to script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18247

Differential Revision: D14574486

Pulled By: driazati

fbshipit-source-id: 4b42aa0881e5398ea7f112be46c0335e6e19dced
2019-04-01 12:11:35 -07:00
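The semantics being added mirror ordinary Python string indexing and slicing. The commit message only names single-index and slice forms; the negative-index line below is plain-Python behavior, not a claim about what this particular PR supports in script:

```python
s = "abc"
assert s[0] == "a"     # single-character index, as in "a"[0]
assert s[1:3] == "bc"  # half-open slice, as in "abc"[1:3]
assert s[-1] == "c"    # negative indices resolve from the end in Python
print(s[0], s[1:3])  # → a bc
```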
af9335436d Re-land Parsing file check (#18570)
Summary:
The last time I tried to land it there was a merge race with the docs coverage test lol. Re-landing with the fix.

Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570

Reviewed By: driazati

Differential Revision: D14707285

Pulled By: eellison

fbshipit-source-id: 3a0265928aa8cad78961723d8bf0fbf871fdb71d
2019-04-01 11:56:32 -07:00
3749d65a7e Create Node2Vec ModuleKeeper
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18504

Reviewed By: sunnieshang

Differential Revision: D14632091

fbshipit-source-id: d4544866552dc6bcbc7515be9e88cb11e7622a44
2019-04-01 10:36:23 -07:00
822c8ee143 use acc16 only when n>128 and k>128 in Skylake (#18672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18672

In Skylake, when n < 128 or k < 128, acc16 is slower.

Reviewed By: jianyuh

Differential Revision: D14700576

fbshipit-source-id: 80ca9f1af4626637eed9c5ca49f95ae744811189
2019-04-01 08:52:28 -07:00
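The dispatch heuristic described (skip 16-bit accumulation on Skylake when n < 128 or k < 128) reduces to a simple predicate. `use_acc16` and the threshold parameter are illustrative only, not the FBGEMM API:

```python
def use_acc16(n, k, threshold=128):
    """Prefer 16-bit accumulation only when both GEMM dims exceed the threshold."""
    return n > threshold and k > threshold

print(use_acc16(256, 256))  # → True
print(use_acc16(64, 256))   # → False
```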
4c74cf7489 Move ideep singleton registration to ATen from C2. (#18335)
Summary:
Since we are going to add ideep to ATen, and ATen is always compiled, it makes sense to have the registration in ATen rather than C2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18335

Reviewed By: bddppq

Differential Revision: D14578652

Pulled By: gchanan

fbshipit-source-id: 4d77fcfc21a362b21d5291a127498aa722548873
2019-04-01 08:00:33 -07:00
ddbfdc911d Create torch/lib directory before copying _C.lib on Windows environment. (#18666)
Summary:
`python setup.py develop` fails with the following messages.
~~~
...
-- Building with NumPy bindings
-- Not using cuDNN
-- Not using MIOpen
-- Not using CUDA
-- Using MKLDNN
-- Not using NCCL
-- Building without distributed package

Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch\Lib\site-packages\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd to C:\data\source\pytorch\build\lib.win-amd64-3.7\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd
copying torch\Lib\site-packages\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd -> C:\data\source\pytorch\build\lib.win-amd64-3.7\caffe2\python
building 'torch._C' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\torch
creating build\temp.win-amd64-3.7\Release\torch\csrc
...
creating C:\data\source\pytorch\build\lib.win-amd64-3.7\torch
C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /nodefaultlib:libucrt.lib ucrt.lib /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\data\source\pytorch\torch\lib /LIBPATH:C:\data\dlenv\libs /LIBPATH:C:\data\dlenv\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" shm.lib torch_python.lib /EXPORT:PyInit__C build\temp.win-amd64-3.7\Release\torch/csrc/stub.obj /OUT:build\lib.win-amd64-3.7\torch\_C.cp37-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.lib /NODEFAULTLIB:LIBCMT.LIB
   Creating library build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.lib and object build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.exp
Generating code.
Code generation finished.
copying build\lib.win-amd64-3.7\torch\_C.cp37-win_amd64.pyd -> torch
copying build\lib.win-amd64-3.7\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd -> caffe2\python
copying build/temp.win-amd64-3.7/Release/torch/csrc/_C.cp37-win_amd64.lib -> build/lib.win-amd64-3.7/torch/lib/_C.lib
error: could not create 'build/lib.win-amd64-3.7/torch/lib/_C.lib': No such file or directory
~~~

When `python setup.py install` is executed, `torch/lib` has already been created by a previous step (which copies many files), so this copy succeeds. But in develop mode, that step is not executed and the copy fails.

This patch creates the `torch/lib` directory if it does not exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18666

Differential Revision: D14704269

Pulled By: ezyang

fbshipit-source-id: b2d7c698a906b945bf34bb78f17b91b4fdfd3294
2019-04-01 07:28:08 -07:00
8276d82f78 Move flags that do not work on MSVC (#18686)
Summary:
MSVC errors on these flags as they are not supported
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18686

Differential Revision: D14704254

Pulled By: ezyang

fbshipit-source-id: 936d33ed6b7474d7774a49505cdac50dbe8dd99a
2019-04-01 07:28:05 -07:00
44b5891121 Fix unused lambda capture warnings (#18662)
Summary:
```
aten/src/ATen/native/cpu/DistanceOpsKernel.cpp.DEFAULT.cpp:109:104: warning: lambda capture 'combs' is not used [-Wunused-lambda-capture]
    parallel_for(0, combs, internal::GRAIN_SIZE / (16 * m), [p, self_start, self_end, n, m, res_start, combs](int64_t k, int64_t end) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18662

Differential Revision: D14699379

Pulled By: bddppq

fbshipit-source-id: 5062d4327bb5f7b485c2ffa30c98e10576416f03
2019-03-31 22:35:58 -07:00
505d50ea90 handle a rare case of histogram min is inf/nan (#18239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18239

When min is inf or nan, we get UBSAN errors

Reviewed By: csummersea

Differential Revision: D14537668

fbshipit-source-id: e70ffb5ecd2b10793356070c69fdabf8f25b203e
2019-03-31 21:32:54 -07:00
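A defensive version of the check described above (ignore non-finite values before computing the histogram range) might look like this in Python; the helper name and return convention are hypothetical, not the caffe2 implementation:

```python
import math

def safe_histogram_range(values):
    """Return a (min, max) range, skipping inf/NaN values that would trip
    UBSAN-style checks in downstream bin computations."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return None  # nothing sensible to histogram
    return (min(finite), max(finite))

print(safe_histogram_range([1.0, float("inf"), 2.0]))  # → (1.0, 2.0)
print(safe_histogram_range([float("nan")]))            # → None
```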
6841537933 Delete duplicated technical content from contribution_guide.rst (#18628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18628
ghimport-source-id: d94b81a6f303883d97beaae25344fd591e13ce52

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18629 Provide flake8 install instructions.
* **#18628 Delete duplicated technical content from contribution_guide.rst**

There's useful guide in contributing_guide.rst, but the
technical bits were straight up copy-pasted from CONTRIBUTING.md,
and I don't think it makes sense to break the CONTRIBUTING.md
link.  Instead, I deleted the duplicate bits and added a cross
reference to the rst document.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14701003

fbshipit-source-id: 3bbb102fae225cbda27628a59138bba769bfa288
2019-03-31 19:13:22 -07:00
35bc83524d Provide flake8 install instructions. (#18629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18629
ghimport-source-id: 66a8871c56ffcfa7d4bfdf601e180fae99194e28

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18629 Provide flake8 install instructions.**
* #18628 Delete duplicated technical content from contribution_guide.rst

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14701004

fbshipit-source-id: b64292f0ef01b7894cf6b9ff8d5fd9e921c8d162
2019-03-31 18:59:18 -07:00
19fe2b9db4 Adding quantized tensor shape/type info support for caffe2=>glow in caffe2 side (#18621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18621

This diff added caffe2 support for onnxifi quantization.

Reviewed By: yinghai

Differential Revision: D14648767

fbshipit-source-id: 4ddb492cacbba6142305866e6dbb875880acaea3
2019-03-31 17:42:27 -07:00
3c70326cf4 Fix test on windows (#18667)
Summary:
Breakage in #18188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18667

Differential Revision: D14700133

Pulled By: driazati

fbshipit-source-id: 4cc26bd579fc1b074b3bef6046cc1030facee130
2019-03-31 16:24:21 -07:00
9c87543124 Enforce check ad in test_jit (#18509)
Summary:
If a test triggers autodiff, it must have a `DifferentiableGraph` in its differentiated forward graph, and this subgraph must have either the original aten node or the corresponding nodes used in the AD formula.

Typically a forward differentiable graph looks like this:
```
graph(%i0 : Float(),
      %i1 : Float()):
  %3 : Float() = prim::DifferentiableGraph_0(%i0, %i1)
  return (%3)
with prim::DifferentiableGraph_0 = graph(%0 : Float(),
      %1 : Float()):
  %2 : Float() = aten::max(%0, %1)
  return (%2)
```
which tells us `aten::max(Tensor self, Tensor other) -> Tensor` is symbolically differentiable.

Update: there are a lot of cases (fusions/ConstantChunk/python implementations) that break it, so I decided to make the check optionally take node names if different from the function name.
~~[OLD]Theoretically I could also check if `aten::max` is in the differentiable block or not to be more precise, but there're also cases like `chunk` where in a differentiable block it's replaced with a prim node (ConstantChunk) and we will have to special case them. Any suggestions here (to be more precise or no) is very welcome!~~

We used to have a list containing nn tests that should be run against AD; I moved it to a field when constructing our test (either torch or nn). I think it's cleaner this way, and it matches the fact that for the same op we support one schema of it but not all; this way we can just turn on the corresponding test which triggers that supported schema.

cc: apaszke zdevito wanchaol ngimel for a review

[Done] :
- Going through a manual second pass of all tests to check if they should enable AD test or not....
- Add a readme about how to add AD for an op and how to add/enable its test in test_jit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18509

Differential Revision: D14696811

Pulled By: ailzhang

fbshipit-source-id: c5e693277baac585cd3aed5ab2c0e7faa5e6f29f
2019-03-31 08:51:30 -07:00
828a6a3b39 Use proper isnan check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18663

Differential Revision: D14699385

Pulled By: bddppq

fbshipit-source-id: 596ad3371e7704802591e49f7e1c55dc6cd2896f
2019-03-31 02:08:11 -07:00
cb39bd9c2f pad_circular -> _pad_circular (#18608)
Summary:
pad_circular is really private, as circular padding is exposed via `F.pad`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18608

Differential Revision: D14691704

Pulled By: soumith

fbshipit-source-id: 8c2f90596feed670976115041efed3ca071e8306
2019-03-30 13:27:04 -07:00
a45b79d23f Fix wrap(at::Scalar) (#18632)
Summary:
Problem:
```cpp
// This function expects a `Variable` as input
inline PyObject* wrap(at::Tensor tensor) {
  return THPVariable_Wrap(Variable(std::move(tensor)));
}

inline PyObject* wrap(at::Scalar scalar) {
  // This function calls `wrap(at::Tensor tensor)` (the function above), but since
  // `scalar_to_tensor(...)` returns a `Tensor` and not a `Variable`, the call to
  // `wrap(at::Tensor tensor)` will fail with "Tensor that was converted to Variable
  // was not actually a Variable", which is not what we want.
  return wrap(scalar_to_tensor(scalar));
}
```

The right fix is to call `make_variable(...)` with the tensor returned from `scalar_to_tensor(scalar)`.

This unblocks https://github.com/pytorch/pytorch/pull/18230 as it is the only patch that hits this code path now. All other native functions that return Scalar (such as `item()` or `_local_scalar_dense()`) either has custom-defined implementation that doesn't go through this path, or is not exposed to Python at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18632

Differential Revision: D14689293

Pulled By: yf225

fbshipit-source-id: be7ba5d3de83a69533a2997de97ad92989ff78ee
2019-03-30 11:36:11 -07:00
0f6bf09db5 Deprecated type() -> scalar_type()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18642

Differential Revision: D14696848

Pulled By: ezyang

fbshipit-source-id: 43d1f86ecee5f6c6c5b70fd7d0e2063c3fc473ab
2019-03-30 10:55:46 -07:00
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
96456bfa4c Update documentation for CTCLoss (#18415)
Summary:
This is meant to resolve #18249, where I pointed out a few things that could improve the CTCLoss docs.

My main goal was to clarify:
- Target sequences are sequences of class indices, excluding the blank index
- Lengths of `target` and `input` are needed for masking unequal length sequences, and do not necessarily = S, which is the length of the longest sequence in the batch.

I thought about Thomas's suggestion to link the distill.pub article, but I'm not sure about it. I think that should be up to y'all to decide.

I have no experience with .rst, so it might not render as expected :)

t-vi ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18415

Differential Revision: D14691969

Pulled By: soumith

fbshipit-source-id: 381a2d52307174661c58053ae9dfae6e40cbfd46
2019-03-30 01:26:34 -07:00
2a58fd9844 Fallback kernels (#18443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18443

Allow registering a kernel without a dispatch key. In this case, the kernel becomes a fallback kernel that is called whenever no other kernel matches.
This is also useful for the legacy function based API (since that API doesn't know about dispatch keys) or any other custom ops that don't care about dispatch
and just want one kernel to be called no matter the dispatch key.

Reviewed By: dzhulgakov

Differential Revision: D14603258

fbshipit-source-id: 242dc8871dad2989ca25079854d0cc97429e7199
2019-03-30 00:07:34 -07:00
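The fallback behavior described, a kernel registered without a dispatch key that runs whenever no key-specific kernel matches, can be sketched with a toy dispatcher. The class and method names below are illustrative only, not the c10 registration API:

```python
class Dispatcher:
    def __init__(self):
        self.kernels = {}     # dispatch key -> kernel
        self.fallback = None  # kernel used when no key matches

    def register(self, kernel, dispatch_key=None):
        # Registering without a dispatch key installs a fallback kernel.
        if dispatch_key is None:
            self.fallback = kernel
        else:
            self.kernels[dispatch_key] = kernel

    def call(self, dispatch_key, *args):
        kernel = self.kernels.get(dispatch_key, self.fallback)
        if kernel is None:
            raise RuntimeError("no kernel and no fallback registered")
        return kernel(*args)

d = Dispatcher()
d.register(lambda x: x + 1, dispatch_key="CPU")
d.register(lambda x: x * 10)          # fallback: no dispatch key given
print(d.call("CPU", 1))   # → 2  (key-specific kernel wins)
print(d.call("CUDA", 1))  # → 10 (no CUDA kernel, fallback runs)
```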
f4e87e193a Introduce lambda-based kernel API (#18541)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18541

Allow registering lambdas as c10 kernels.

Reviewed By: dzhulgakov

Differential Revision: D14653005

fbshipit-source-id: f867cc776b1339e83b7a2e1935f5cf924cfba44a
2019-03-30 00:07:31 -07:00
24752eb7b8 Report better errors when kernel or dispatch key are missing (#18302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18302

These might be use cases we want to support in the future, but they don't work yet.
Let's at least report an error instead of doing segfaults or worse.

Reviewed By: dzhulgakov

Differential Revision: D14572346

fbshipit-source-id: 49262ce131493bc887defe2978d8b22f202cd8cc
2019-03-30 00:07:28 -07:00
48e7f98917 Move stuff to cpp files (#18301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18301

Move code out of headers and templates into source files and non-templates.

Reviewed By: dzhulgakov

Differential Revision: D14572347

fbshipit-source-id: 9fd5d62d54000a95e93076cd73f591ba2c5c2653
2019-03-30 00:07:25 -07:00
14c28fabd2 Check kernel against function schema in c10 op registration (#18256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18256

This diff infers the function schema from the kernel function/functor and checks that it matches the specified function schema.

This diff does not allow (yet) to omit specifying the function schema in the registration API. That will come in a future diff.

Reviewed By: dzhulgakov

Differential Revision: D14552738

fbshipit-source-id: 00202b489ede19f26ae686c97416b38c72c11532
2019-03-30 00:07:22 -07:00
c4bb09cc42 Add functor- and function-based kernel registration API (#18162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18162

- Adds the API to register a functor- and function-based kernel.
- Change the experimental c10 ops to use this new API instead of the old one
- Deletes the old APIs in KernelRegistration.h and OpSchemaRegistration.h

Reviewed By: dzhulgakov

Differential Revision: D14514239

fbshipit-source-id: 35b2f6e8f62964e54886450a6a5fac812ed20f26
2019-03-30 00:07:19 -07:00
9abc8a5b47 New operator registration MVP (#18161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18161

This introduces version 0 for the new operator registration.

For now, it only works with kernels that are defined as stack-based functions.
This is actually not the intended public API for defining kernels, but it's the basis which is going to be used to define the public APIs (see diffs on top for them),
and it's also the API used for exposing caffe2 operators.

This diff also switches the mechanism for exposing caffe2 operators to the new mechanism.

Reviewed By: dzhulgakov

Differential Revision: D14514231

fbshipit-source-id: 454ab7b5b46a10203aa27b175400d23f818dd1df
2019-03-30 00:07:16 -07:00
6095814229 Fix trt installation in CI (#18609)
Summary:
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build is failing
```
...
Mar 29 04:44:46 Need to get 174 MB of archives.
Mar 29 04:44:46 After this operation, 576 MB of additional disk space will be used.
Mar 29 04:44:46 Do you want to continue? [Y/n] Abort.
Exited with code 1
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18609

Differential Revision: D14694990

Pulled By: bddppq

fbshipit-source-id: 260446a8650f660a2baf123a3f17efdf0a8d6c64
2019-03-29 22:47:29 -07:00
24db1667da Attribute serialization improvements (#18188)
Summary:
* adds attributes to `ScriptModule.__getattr__` so they can be accessed in Python after re-importing
* full support for all the possible values for an `int64_t`
    * this necessitated a bunch more `pushWhatever` functions, so re-introduced a templated version to cut down on duplicate code
* tests to validate references / value sharing works
* adds `torch.jit.Unpickler` which people can use to de-serialize the pickle files into Python / have a quick reference on how to do this without PyTorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18188

Differential Revision: D14527490

Pulled By: driazati

fbshipit-source-id: efd15579cc04aa2e28c4b2c9490d82d849dee559
2019-03-29 19:10:12 -07:00
e13101e069 support pre-convert filter format for mkldnn training mode and change 'OptimizeForIdeep' to 'OptimizeForMkldnn' (#15171)
Summary:
For MKL-DNN, the filter data will be reordered to the primitive format, which takes a lot of time.
So this patch provides a method to convert the filter format before training.
And "OptimizeForIdeep" will be changed to "OptimizeForMkldnn" in this patch.
This patch depends on https://github.com/pytorch/pytorch/pull/12866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15171

Differential Revision: D14590741

Pulled By: yinghai

fbshipit-source-id: 07971c9977edac3c8eec08ca2c39cda639683492
2019-03-29 19:00:48 -07:00
d73c830e23 Tensor construction codemod(raw_mutable_data) (#16373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16373

motivation: https://github.com/pytorch/pytorch/pull/12407
This is a manual diff.
most of the fixes should be:

```
auto* Y = Output(0);
Y->Resize(dims);
Y->raw_mutable_data(dtype);
```
-->
```
auto* Y = Output(0, dims, at::dtype(dtype));
```
But there might be other cases.

Reviewed By: dzhulgakov

Differential Revision: D13725460

fbshipit-source-id: 649a4b0e42f62cda1a60171dd9fa3e440dc9dca1
2019-03-29 18:36:46 -07:00
7b0ef31780 Add hash() global (#18258)
Summary:
This adds `hash()` which supports `int`, `str`, and `float`. It relies on `std::hash` which is implementation defined, so the result of `hash()` in TorchScript is not the same as in Python, but should satisfy the same properties.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18258

Differential Revision: D14692317

Pulled By: driazati

fbshipit-source-id: 909df5d024bb3feea157d5a203b7de53c72261c9
2019-03-29 18:29:34 -07:00
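The guarantee described is the usual hash contract: equal values hash equally, even though the numeric result is implementation defined, so a `std::hash`-based `hash()` need not match CPython's. In plain Python the property looks like the following; note that CPython additionally guarantees `hash(1) == hash(1.0)` across int/float, and the commit does not say whether the script version preserves that cross-type property:

```python
# Equal values must produce equal hashes, whatever the concrete hash is.
for a, b in [(42, 42), ("abc", "ab" + "c"), (1.5, 3.0 / 2.0)]:
    assert a == b and hash(a) == hash(b)

# CPython-specific: numeric cross-type equality implies equal hashes too.
assert hash(1) == hash(1.0)
print("hash contract holds")  # → hash contract holds
```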
a5ddecd00c Move fuser to test_jit_fuser (#18590)
Summary:
Start of breaking up test_jit.py

New files will have the format test_jit_* so they are easily grepable but remain in the same directory so we don't have to go through multiple sources for imports.

I am adding a test that's expected to fail to be sure it's running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18590

Reviewed By: wanchaol

Differential Revision: D14677094

Pulled By: eellison

fbshipit-source-id: 9782c6aa9525bb6f332fc75cfff004c83a417522
2019-03-29 18:13:26 -07:00
85f36014e2 Experimental logging/counters API (#18235)
Summary:
This defines a generic counters API that users can utilize to provide monitoring functionality in e.g. a production service. We expose both counters for runtime internals as well as a TorchScript API to create user-defined counters. Synopsis of the API:

- `torch/csrc/jit/script/logging.h` specifies the externally-facing API in C++
- `torch/jit/_logging.py` specifies the Python API

We use an interface, `LoggerBase`, to define the interactions between users and a logging backend. Implementing a subclass of `LoggerBase` allows the user to handle these events in a custom way, such as logging into a DB or calling into an infra-specific counters API.

From the frontend perspective, we can create log events in two ways:
1. We provide an `add_stat_value(name, val)` function. This calls into the Logger backend with a key/value pair. For example, we might call `add_stat_value('foo', 1)` to bump an event counter.
2. We provide a `time_point()` function to record a timestamp in nanoseconds. This can be used in conjunction with `add_stat_value` to record runtime wall clock durations.

Examples of frontend usage can be found in `test_jit.py TestLogging`.

We provide a trivial `LockingLogger` implementation as an example and for testing purposes. It is likely not ready for production usage. It demonstrates that a backend implementing the API can do things like specify aggregation types and report these aggregate stats via the `get_counters()` API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18235

Differential Revision: D14545060

Pulled By: jamesr66a

fbshipit-source-id: 04099543a1898cfdd411511e46e03d5dce9b4881
2019-03-29 17:14:03 -07:00
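A minimal backend matching the API surface described (`add_stat_value`, `time_point`, `get_counters`) might look like this; the class below is a sketch of the described interface with sum aggregation only, not the actual `LockingLogger` or `torch/jit/_logging.py` implementation:

```python
import time

class LockingLoggerSketch:
    """Toy counters backend: sums values per key, as a trivial aggregation."""
    def __init__(self):
        self.counters = {}

    def add_stat_value(self, name, value):
        # Bump the named counter by value (e.g. add_stat_value('foo', 1)).
        self.counters[name] = self.counters.get(name, 0) + value

    def get_counters(self):
        return dict(self.counters)

def time_point():
    """Record a timestamp in nanoseconds, for wall-clock durations."""
    return time.monotonic_ns()

logger = LockingLoggerSketch()
start = time_point()
logger.add_stat_value("foo", 1)
logger.add_stat_value("foo", 2)
logger.add_stat_value("runtime_ns", time_point() - start)
print(logger.get_counters()["foo"])  # → 3
```

A production backend would add locking and could aggregate differently (e.g. averages) behind the same `get_counters()` call.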
e2fd1d966f Revert D14668859: [pytorch][PR] Re-land Parsing file check
Differential Revision:
D14668859

Original commit changeset: 3825a35ddc61

fbshipit-source-id: f3343ec6b63fe8f1f04959adfac4331865990047
2019-03-29 17:14:00 -07:00
47e2772320 Update argument names of torch::autograd::FunctionPostHook (#18140)
Summary:
They are called as (outputs, inputs) and were named (inputs, outputs).

A possible follow-up fix is to make the outputs argument an lvalue to allow calling multiple post hooks without ever copying the outputs vector. It looks like the copy is currently forced because the hook takes a const reference as input and returns a value. This would change the prototype of the function, so it needs further discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18140

Differential Revision: D14684498

Pulled By: pietern

fbshipit-source-id: 1bd3ddbdd1ff7fe0a18241de5a9ec745a4e7ef07
2019-03-29 16:30:23 -07:00
81b73951f1 note on updating existing source (#18409)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18388
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18409

Differential Revision: D14597666

Pulled By: soumith

fbshipit-source-id: 156104c0cd19da06f6f96a225228d1e8cf831af1
2019-03-29 16:14:47 -07:00
393731ab24 Re-land Parsing file check (#18570)
Summary:
The last time I tried to land it there was a merge race with the docs coverage test lol. Re-landing with the fix.

Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570

Differential Revision: D14668859

Pulled By: eellison

fbshipit-source-id: 3825a35ddc6179a0d433d70d22b5c1a96c20b21a
2019-03-29 15:46:59 -07:00
1240327c5c Refactoring serialization of ONNX initializers to be name-based (Resubmission) (#17830)
Summary:
houseroad - this is the resubmission of https://github.com/pytorch/pytorch/pull/17420, as suggested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17830

Reviewed By: zrphercule

Differential Revision: D14398714

Pulled By: houseroad

fbshipit-source-id: bda475f1ae8a5273ebdb0f6883fc66036c29d326
2019-03-29 15:23:29 -07:00
fca9d9a100 Initial implementation of InsertObserverNodes pass. (#18152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18152
ghimport-source-id: 1dd5e62c4d93394dcd8d8af2871554575c8d3d1a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18152 Initial implementation of InsertObserverNodes pass.**
* #18151 Add quant-passes stubs.

gh-metadata: pytorch pytorch 18150 gh/zolotukhinm@gmail.com/2/head

Differential Revision: D14584223

fbshipit-source-id: 30896acc1a8901d22c6a167eb87d2fbaafbbeb6f
2019-03-29 15:08:57 -07:00
84f020fe09 Fix bug in tensor feed which caused crash due to wrong tensor type (#18552)
Summary:
In the blob feeder for the ideep device, the wrong device option was given, which led to a crash.
This patch corrects the device option to fix the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18552

Differential Revision: D14679838

Pulled By: yinghai

fbshipit-source-id: bde11e6a6fe44822166881dcb7c9bd0b34b4ecf3
2019-03-29 14:12:36 -07:00
e3b1758f19 Upgrade mkldnn-bridge to revert tensor capacity patch and prepare for DNNLOWP support (#18471)
Summary:
1. Upgrade mkldnn-bridge to revert tensor capacity patch to avoid ASAN issue.
2. Prepare for DNNLOWP support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18471

Differential Revision: D14621569

Pulled By: yinghai

fbshipit-source-id: 9df300b77d0f2acd1a4f63c2925b7a7cab7a474e
2019-03-29 13:54:04 -07:00
f4e35d30ed register BoxWithNMSLimit with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17956

Reviewed By: houseroad

Differential Revision: D14417300

fbshipit-source-id: eb5e2ba84513b3b7bfa509dc442424b13fe9148f
2019-03-29 13:41:40 -07:00
d895d30876 Fix c10d build without nccl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18582

Differential Revision: D14672928

Pulled By: gchanan

fbshipit-source-id: 74e9805cbaf5ebe8e3f579fe08dad72eb410b80a
2019-03-29 13:34:38 -07:00
6ebfbdf4c6 Add named submodule support to nn::Sequential (#17552)
Summary:
Previously, we were not able to assign names to `nn::Sequential`'s submodules. This PR adds this feature to match the Python API. Example use:
```cpp
Sequential sequential(named_submodule({
      {"linear", Linear(10, 3)},
      {"conv2d", Conv2d(1, 2, 3)},
      {"dropout", Dropout(0.5)},
      {"batchnorm", BatchNorm(5)},
      {"embedding", Embedding(4, 10)},
      {"lstm", LSTM(4, 5)}
}));
```

It also enables loading parameters of Python `nn.Sequential` module with custom submodules names into C++ frontend, unblocking https://github.com/pytorch/vision/pull/728#issuecomment-466661344.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17552

Differential Revision: D14246834

Pulled By: yf225

fbshipit-source-id: 3030b5c5d68f6dd5d3e37ac4b4f98dc6d6d9ba72
2019-03-29 13:06:29 -07:00
e73be58ff7 Rename btriunpack to lu_unpack (#18529)
Summary:
Changelog:
- Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface.
- Rename all relevant tests, fix callsites
- Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529

Differential Revision: D14683161

Pulled By: soumith

fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1
2019-03-29 13:01:30 -07:00
be2ac6828c fix lint (#18623)
Summary:
Fix lint
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18623

Differential Revision: D14686265

Pulled By: eellison

fbshipit-source-id: 4bbe0f5bc58f508cbf4bc1baef2029ce1eaa42d8
2019-03-29 11:50:12 -07:00
62d8c8cf0a Manual hipify caffe2/distributed and rocm update (no hcc modules support) (#18088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18088

Manually hipify the distributed folder

Reviewed By: bddppq

Differential Revision: D14482702

fbshipit-source-id: cc0abdf525b423ab1f18db8010d21e27c6668d36
2019-03-29 11:07:32 -07:00
7c438c82eb Change dnnlowp log level from warning to v2 (#18576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18576

As in title

Reviewed By: feiyu1990

Differential Revision: D14670898

fbshipit-source-id: 1983099b2ba57daab393278553f10dcdb1812fdf
2019-03-29 09:29:25 -07:00
c0a2452ffe multiline KeyError msg python bug workaround (#18557)
Summary:
Make multiline KeyError messages readable by working around a Python bug: https://bugs.python.org/issue2651

discussion: https://github.com/pytorch/pytorch/issues/16647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18557

Differential Revision: D14681086

Pulled By: soumith

fbshipit-source-id: acbd13a823302c854c3d364028ed414fd8ce6bc8
2019-03-29 07:04:20 -07:00
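The Python quirk this commit works around is easy to reproduce: `KeyError` applies `repr()` to its single argument, so embedded newlines print as literal `\n` sequences. A minimal sketch of the problem and one possible workaround (a `KeyError` subclass overriding `__str__`; this is an illustration, not necessarily PyTorch's actual fix):

```python
# KeyError repr()-escapes its single argument, so multiline messages
# print on one line with literal "\n" sequences (https://bugs.python.org/issue2651).
msg = "key 'foo' not found.\nAvailable keys: bar, baz"

plain = KeyError(msg)
assert "\\n" in str(plain)     # the newline shows up escaped...
assert "\n" not in str(plain)  # ...instead of as a real line break

# One workaround: skip the repr() escaping by overriding __str__.
class MultilineKeyError(KeyError):
    def __str__(self):
        return self.args[0]

fixed = MultilineKeyError(msg)
assert "\n" in str(fixed)  # real newlines survive
```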
95d3825e48 ReduceLrOnPlateau: best=current -> best=copy(current) (#16364) (#16697)
Summary:
Fixes #16364
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16697

Differential Revision: D14680879

Pulled By: soumith

fbshipit-source-id: c50c22f3eacea4474fb3a04fe85fbf11d5a177c9
2019-03-29 06:56:51 -07:00
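The underlying issue is plain aliasing: if the scheduler stores a reference to a metric object that the training loop later mutates in place, `best` silently tracks `current`. A minimal pure-Python sketch of the bug and the fix, using a list as a stand-in for a mutable metric tensor:

```python
import copy

# Stand-in for a mutable metric object (e.g. a zero-dim tensor
# that the training loop overwrites in place each epoch).
current = [0.5]

best_buggy = current             # best = current: stores an alias
best_fixed = copy.copy(current)  # best = copy(current): stores a snapshot

current[0] = 0.9                 # training loop mutates the metric in place

assert best_buggy[0] == 0.9  # the alias followed the mutation -- the bug
assert best_fixed[0] == 0.5  # the copy kept the old best value -- the fix
```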
cf444f3544 make InstanceNorm1d raise an error if the input is 2D (#11992)
Summary:
Resolves #11991 .

Any comment is welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11992

Differential Revision: D14680974

Pulled By: soumith

fbshipit-source-id: 8e287a9c32bf43b35edc9d127f16ed6b72c61d91
2019-03-29 06:50:04 -07:00
c189eba3e1 Fixed torch.arange docs (#18604)
Summary:
Kindly let me know if it's okay and if there are any places where I need to make a fix. Closes #18534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18604

Differential Revision: D14680712

Pulled By: soumith

fbshipit-source-id: 030e4a3d8f7839cbe2b8a3ef386323f0d39eb81a
2019-03-29 06:42:28 -07:00
e22a2b9015 Minor fixes in fastrnns benchmarks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18613

Reviewed By: wanchaol

Differential Revision: D14681838

fbshipit-source-id: 60bd5c9b09398c74335f003cd21ea32dd1c45876
2019-03-29 01:22:28 -07:00
d859031ebf Rename btrifact* to lu (#18435)
Summary:
Changelog:

- Renames `btrifact` and `btrifact_with_info` to `lu` to remain consistent with other factorization methods (`qr` and `svd`).
- Now, we will only have one function and method named `lu`, which performs the LU decomposition. This function takes a `get_infos` kwarg which, when set to True, includes an infos tensor in the tuple.
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435

Differential Revision: D14680352

Pulled By: soumith

fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
2019-03-29 00:34:30 -07:00
c21e763cd6 Optimize relu op on GPU (#18506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18506

Optimize relu op on GPU

Reviewed By: houseroad

Differential Revision: D14633171

fbshipit-source-id: bd3afa9a0bae1325d32ad4153736a0c7ecb0ec64
2019-03-29 00:23:24 -07:00
a5a1c9a171 Automatic update of fbcode/onnx to fb1a80692c1ab0bd27b1072f2e7bffacba336777 (#18585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18585

Previous import was b29e78a4efb8e5d8995f576bbf19a846807829b6

Included changes:
- **[fb1a8069](https://github.com/onnx/onnx/commit/fb1a8069)**: Fix wrongly handled attribute in MVN and test generating scripts (#1877) <Raymond Yang>
- **[b22041c3](https://github.com/onnx/onnx/commit/b22041c3)**: Add dilation attribute to MaxPool (#1864) <karljang>

Reviewed By: zrphercule, benoitsteiner

Differential Revision: D14668623

fbshipit-source-id: fa7f44b1ecc949d8dd654939d20b1e93db98b1d2
2019-03-28 23:47:10 -07:00
0fd1ee3145 Automatic update of fbcode/foxi to 81e1683d6348eee4b5ed1145222dc2c41be4269c (#18596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18596

Previous import was 2bcc4064c90e87b9638615c733485f07c47b7558

Included changes:
- **[81e1683](https://github.com/houseroad/foxi/commit/81e1683)**: Merge pull request #9 from zrphercule/add_foxi_quantization <Rui Zhu>
- **[580559c](https://github.com/houseroad/foxi/commit/580559c)**: char=>uint8 <zrphercule>
- **[1a572f7](https://github.com/houseroad/foxi/commit/1a572f7)**: add quantization <zrphercule>

Reviewed By: zrphercule

Differential Revision: D14677404

fbshipit-source-id: 09429b3bf0e7783a25b8145020e505761bad887d
2019-03-28 23:24:30 -07:00
ff4b6d1a49 Delete batch tensor (#18575)
Summary:
Deleting batch tensor since we are no longer maintaining the project and keeping it functional is blocking other improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18575

Differential Revision: D14671126

Pulled By: eellison

fbshipit-source-id: b42d5b699c4d12171ed95e6d3a977532167f0d2c
2019-03-28 23:13:27 -07:00
b1a0233ee4 Update NNPACK to current master (#18580)
Summary:
This fixes builds on x86 (32 bits).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18580

Differential Revision: D14672462

Pulled By: soumith

fbshipit-source-id: 7629b001c2bfa3e5b6ade7f1b03a8280232a4c16
2019-03-28 22:23:08 -07:00
1c3428af31 Enhance build_ios.sh to be consistent with build_android.sh (#18564)
Summary:
1. Enhance build_ios.sh to be consistent with build_android.sh;
2. Add docs for build_ios.sh.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18564

Differential Revision: D14680752

Pulled By: soumith

fbshipit-source-id: 6d2667ed8a3c85a057a522838f5d0461dd4788cf
2019-03-28 21:37:55 -07:00
3752916132 Serialization supports pathlib.Path object for the input argument (#18562)
Summary:
This allows a pathlib.Path object to be passed to torch.load as the input argument.
Fixes #16607
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18562

Differential Revision: D14668255

Pulled By: soumith

fbshipit-source-id: 0ae4f7c210918582912f2d1ef2a98f1ab288c540
2019-03-28 21:01:15 -07:00
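The pattern behind this change can be sketched without PyTorch: accept either a file-like object or anything path-like (str or pathlib.Path) by normalizing through os.fspath before opening. This is an illustrative sketch of the general technique, not torch.load's actual implementation:

```python
import os
import pickle
import tempfile
from pathlib import Path

def load(f):
    """Load a pickled object from a file-like object, a str path, or a Path."""
    if hasattr(f, "read"):  # already an open file-like object
        return pickle.load(f)
    # str and Path both support os.fspath (the path-like protocol, PEP 519)
    with open(os.fspath(f), "rb") as fh:
        return pickle.load(fh)

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "obj.pkl"
    p.write_bytes(pickle.dumps({"answer": 42}))
    assert load(p) == {"answer": 42}       # Path object
    assert load(str(p)) == {"answer": 42}  # plain string path
```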
12abc8a99a Target and input sizes mismatch warning in L1 Loss / L1 Smooth Loss (#18565)
Summary:
Adding the same warning message already present in the mse_loss function to the L1 losses when input and target sizes are different.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18565

Differential Revision: D14671415

Pulled By: soumith

fbshipit-source-id: 01f5e1fb1ea119dbb2aecf1d94d0cb462f284982
2019-03-28 20:49:51 -07:00
1989716ae5 Resubmit PR-18512: Improved onnx export for 3 onnx ops (#18571)
Summary:
Fix ROCm CI failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18571

Differential Revision: D14669323

Pulled By: bddppq

fbshipit-source-id: 022afe5c20e680295c9cfdfe1ec14650305955a8
2019-03-28 18:12:49 -07:00
2f174e9453 in caching allocator, ignore and clear the error if not ready
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18584

Differential Revision: D14675041

Pulled By: bddppq

fbshipit-source-id: c1fab797e0d224e0a481a0395a3f9975c4265ff6
2019-03-28 17:53:30 -07:00
600eeecbf4 Add external callbacks into RecordFunction (#17844)
Summary:
Add a way to insert external callbacks into PT's RecordFunction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17844

Differential Revision: D14399664

Pulled By: ilia-cher

fbshipit-source-id: 76654799811fefd3ffed4abfb46ed95b492cebab
2019-03-28 17:48:45 -07:00
11ac0cf276 Implement rotated generate_proposals_op without opencv dependency (CPU version)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18533

Reviewed By: ezyang

Differential Revision: D14648083

fbshipit-source-id: e53e8f537100862f8015c4efa4efe4d387cef551
2019-03-28 17:02:50 -07:00
1ae2c1950c Use SetOutputTensor instead of copying outputs manually (#17770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17770

As title

Reviewed By: dzhulgakov

Differential Revision: D14370937

fbshipit-source-id: f415490c38556cf03bb13dce3643775331483448
2019-03-28 16:01:33 -07:00
aea8ee1f68 Fix NCCL/Gloo process groups and DDP stream sync bug (#18465)
Summary:
DDP with NCCL backend uses a [worker stream](d3eb941ed9/torch/csrc/distributed/c10d/ddp.cpp (L142)) to flatten grad batch
tensors, and passes the flattened tensor to [another stream](d3eb941ed9/torch/lib/c10d/ProcessGroupNCCL.cpp (L379)) to
conduct ncclAllReduce. The flattened tensor has to record the
ncclAllReduce stream, otherwise multiple streams might access the
same memory space.

cc ppwwyyxx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18465

Differential Revision: D14613449

Pulled By: mrshenli

fbshipit-source-id: b62773732552d12cc87b7adeb6897e9e11753ea9
2019-03-28 15:12:40 -07:00
9eb0f435d9 Inference LSTM integration test (#18559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18559

Adding integration test for inference LSTM

Reviewed By: houseroad

Differential Revision: D14656698

fbshipit-source-id: 80fb2a72be30fcb695f4471b72bf9d6e3965bf81
2019-03-28 11:31:06 -07:00
aa20591baa Add Slot type to abstract the raw pointers being used for slots. (#18226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18226
ghimport-source-id: b9ec8651212875b30971cc6859d2ddec6559ae3a

If modules become first-class IValues, then the slots will no longer be raw pointers but (IValue, index) pairs. This commit inserts the Slot abstraction so that this change can be made in later patches.

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18226 Add Slot type to abstract the raw pointers being used for slots.**

Differential Revision: D14542022

fbshipit-source-id: b81d7f4334c983d663e7551bda82df43680d7c5f
2019-03-28 10:35:36 -07:00
77280b11e3 Revert D14635130: Improved onnx export for 3 onnx ops.
Differential Revision:
D14635130

Original commit changeset: d54a2b6e2950

fbshipit-source-id: f624e2befdde245cb88435a95508b2a8e6b12e61
2019-03-28 10:26:34 -07:00
eee760dbd3 Improved onnx export for 3 onnx ops. (#18512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18512

Ceil and Floor have been supported since version 6 of ONNX: export them using the native onnx ops instead of an Aten op.
Similarly, support for the Where op has been added in version 9, so we don't need to wrap this op in an Aten op.

Reviewed By: houseroad

Differential Revision: D14635130

fbshipit-source-id: d54a2b6e295074a6214b5939b21051a6735c9958
2019-03-28 08:55:21 -07:00
ffc7158bf2 Revert D14652372: [pytorch][PR] Add parsing to file check
Differential Revision:
D14652372

Original commit changeset: 7430b9d1dc2b

fbshipit-source-id: fa3d0f68515fe53447746469844d2db20c1292e0
2019-03-28 00:12:47 -07:00
b0d9712938 C++17.h: forward -> c10::guts::forward (#18492)
Summary:
Use c10::guts::forward instead of forward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18492

Reviewed By: smessmer

Differential Revision: D14625513

Pulled By: ilia-cher

fbshipit-source-id: 8bc4e20f102fe2a107a22f3e172882d60b95ab0e
2019-03-27 21:14:07 -07:00
9696f06bcf Use __ldg for CUDA kernels in fuser (#18540)
Summary:
While benchmarking a kernel with broadcasted inputs, I noticed
that it was much slower than a hand-coded kernel for the same task.

The kernel in question computed a * b + c for a of shape
32 x 32 x 10240 and b and c of shape 1 x 32 x 1.

This patch accelerates said kernel from 450us to 250us on my GTX1080Ti.

I didn't change half because there doesn't seem to be __ldg for
half.

An alternative could be to sprinkle const and restrict.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18540

Differential Revision: D14657840

Pulled By: soumith

fbshipit-source-id: 408847346ec12d1d1d9b119ac50bbc70f0d9ed33
2019-03-27 20:22:17 -07:00
8635078d9e Adds Cyclical Learning Rate and Momentum (#18001)
Summary:
This implements a cyclical learning rate (CLR) schedule with an optional inverse cyclical momentum. More info about CLR: https://github.com/bckenstler/CLR

This is finishing what #2016 started. Resolves #1909.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18001

Differential Revision: D14451845

Pulled By: sampepose

fbshipit-source-id: 8f682e0c3dee3a73bd2b14cc93fcf5f0e836b8c9
2019-03-27 19:56:04 -07:00
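The triangular CLR policy described at the linked repo is a simple piecewise-linear function of the iteration counter. A pure-Python sketch of the base formula (not the PyTorch scheduler itself, which adds momentum cycling and other modes; the default hyperparameter values here are illustrative):

```python
import math

def triangular_clr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Triangular cyclical learning rate: ramp linearly from base_lr up to
    max_lr over step_size iterations, then back down, and repeat."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

assert triangular_clr(0) == 0.001                   # start of cycle: base_lr
assert abs(triangular_clr(2000) - 0.006) < 1e-12    # mid-cycle: max_lr
assert triangular_clr(4000) == 0.001                # end of cycle: base_lr again
```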
54abfda124 Completely synchronize behavior of Facebook flake8 and public flake8. (#18538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18538
ghimport-source-id: 665b09f158d1c5dd94686d4212792504b55b7f73

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18538 Completely synchronize behavior of Facebook flake8 and public flake8.**

Previously, developers at Facebook had the very funny experience
wherein /usr/local/bin/flake8 behaved differently than a freshly
installed flake8 from pip.  In this commit, I add enough ignores to
.flake8 and install enough plugins to make the Facebook flake8
and public flake8 line up exactly.  This means you don't have
to care which flake8 you use; they all will report accurate information
on your Python files.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14652336

fbshipit-source-id: ba7776eaa139cf2e3df2e65349da6fd7c99acca4
2019-03-27 19:51:21 -07:00
8faf0112f3 add slow tests annotation to some jit tests (#18545)
Summary:
Adds slow test annotation to the following very slow tests -

70.33s     test/test_jit.py::TestScript::test_script_module_script_resnet
32.33s     test/test_jit.py::TestBatched::test_beam_search
17.70s     test/test_jit.py::TestBatched::test_greedy_search
15.58s     test/test_jit.py::TestScript::test_script_module_trace_resnet18

The list of remaining slow tests is below. Let me know if you think any of the others should be added to slow tests as well. Slow tests will only run on master.

15.28s call     test/test_jit.py::TestJit::test_export_batchnorm
12.96s call     test/test_jit.py::TestEndToEndHybridFrontendModels::test_snli
11.65s call     test/test_jit.py::TestEndToEndHybridFrontendModels::test_neural_style
6.38s call     test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_1d
5.96s call     test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_2d_uneven_pad
5.91s call     test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_3d_custom_params
4.76s call     test/test_jit.py::TestJit::test_alexnet
3.82s call     test/test_jit.py::TestScript::test_number_math
3.81s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_no_bias
3.76s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_groups_thnn
3.65s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride_pad1circular
3.49s call     test/test_jit.py::TestBatched::test_lstm
3.33s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_pad2circular
3.19s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv1d_stride1_pad2circular
3.11s call     test/test_jit.py::TestEndToEndHybridFrontendModels::test_dcgan_models
3.11s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride_padding
3.11s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride
3.08s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_no_bias
3.08s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv1d_stride1_pad1circular
3.07s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_groups
3.05s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_dilated
3.05s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_depthwise_with_multiplier
3.04s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_groups
3.03s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_dilated
3.02s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_depthwise_dilated
3.02s call     test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_dilated_strided
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18545

Differential Revision: D14656064

Pulled By: eellison

fbshipit-source-id: d17ee23c3b3679276cee983555d43e83ce099356
2019-03-27 19:27:23 -07:00
0daafe0209 Add parsing to file check (#18304)
Summary:
This allows you to embed checks in IR, making the test more readable.

E.g.
```
graph_str = 'graph(%0 : Double(5, 5)):
          # CHECK: aten::relu
          %1 : Double(5, 5) = aten::relu(%0)
          return (%1)'
FileCheck().run(graph_str, parseIR(graph_str))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18304

Differential Revision: D14652372

Pulled By: eellison

fbshipit-source-id: 7430b9d1dc2b7584704375aac02d7392ecec76a0
2019-03-27 18:16:05 -07:00
ad1ebf7082 bug fix for node with writers in create autodiff subgraph (#18491)
Summary:
Previously we were moving nodes with writers into differentiable subgraphs, without necessarily preserving whether or not they were written to. This can lead to bugs with CSE, which needs that context.

I'm not completely sure if there's anything else we can do to be more aggressive here - inline these nodes and not run CSE and just run constant pooling, or possibly something else, but I think we should land this correctness condition first and then possibly think further.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18491

Differential Revision: D14648562

Pulled By: eellison

fbshipit-source-id: bc1e444774ccdb708e22f0e06a477a221a231f9e
2019-03-27 16:08:03 -07:00
d74b11ce0e add extra info for the auto gen sum ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17934

Reviewed By: iroot900

Differential Revision: D14418689

fbshipit-source-id: 9e11e461001467f0000ea7c355d5b0f0d738fa85
2019-03-27 14:56:32 -07:00
58f3712ceb Clarify error text of the pin_memory function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18530

Reviewed By: ezyang

Differential Revision: D14647578

Pulled By: VitalyFedyunin

fbshipit-source-id: ddd70240d52d2e9a96e26f5a0dfea8d76fe25078
2019-03-27 14:56:29 -07:00
6684ef3f23 Move fast rnn benchmark to pytorch/pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18369

Differential Revision: D14652039

Pulled By: wanchaol

fbshipit-source-id: 1177b1f60d96672c3e2c9d527b56ee06ca7c0af1
2019-03-27 14:46:09 -07:00
e4f1681c82 Rename isTensor api -> isCompleteTensor (#18437)
Summary:
`isTensor` has been brought up as misleading a couple of times; rename it to `isCompleteTensor` for clarity.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18437

Differential Revision: D14605223

Pulled By: eellison

fbshipit-source-id: 189f67f12cbecd76516a04e67d8145c260c79036
2019-03-27 14:46:06 -07:00
1eee2090d4 Const trace error v2 (#18535)
Summary:
Trying to reland https://github.com/pytorch/pytorch/pull/18298
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18535

Differential Revision: D14652391

Pulled By: eellison

fbshipit-source-id: 699e30045dd5f14f0a2b98378272045a292e1e2a
2019-03-27 14:40:56 -07:00
fdedc62c26 enable more unit tests (#18537)
Summary:
Enable unit tests that work with ROCm 2.3. In particular, these are unit tests that were previously skipped for double data types, plus some tests for multi-GPU setups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18537

Differential Revision: D14651822

Pulled By: ezyang

fbshipit-source-id: 7dd575504ebe235a91489866c91000e9754b1235
2019-03-27 14:27:23 -07:00
c3e3c5cc39 Skip tests if C2/ONNX models cannot be read (#18494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18494

Today we have some C2 end2end test runs that require reading model data from external filesystems (for example, Gluster and AWS). This can be a source of flaky tests when the external filesystems are not reachable during the tests.

In this diff, we add try/catch logic around where we download models and open model files from external systems. In case such attempts fail, we will catch the exception and let the unittest skip the current test instead of failing.

I also refactor the code a little bit by removing some duplicated logic for downloading and building the c2 model data. It has been duplicated in two classes and a few functions...

Reviewed By: yinghai

Differential Revision: D14442241

fbshipit-source-id: da8bf56c8d096efa34ca2070de5cd10a18aad70c
2019-03-27 11:21:44 -07:00
30da6c7d06 Add qtensors in caffe2 protobuf argument (#18486)
Summary:
We are about to merge onnxifi quantization support soon. Before that, I would like to merge this diff separately to make sure it doesn't break anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18486

Reviewed By: bddppq, houseroad

Differential Revision: D14626419

Pulled By: yinghai

fbshipit-source-id: 504c1eae60be1e629203267b59defb8b69d82c0a
2019-03-27 11:16:40 -07:00
defe67caf2 Generate sphinx docs with secure content. (#18508)
Summary:
There are a number of pages in the docs that serve insecure content. AFAICT this is the sole source of that.

I wasn't sure if docs get regenerated for old versions as part of the automation, or if those would need to be manually done.

cf. https://github.com/pytorch/pytorch.github.io/pull/177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18508

Differential Revision: D14645665

Pulled By: zpao

fbshipit-source-id: 003563b06048485d4f539feb1675fc80bab47c1b
2019-03-27 11:01:48 -07:00
8c3285bf11 Fix loss functions doc (#18420)
Summary:
Correct docstring display error on web page caused by my previous PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18420

Differential Revision: D14642467

Pulled By: soumith

fbshipit-source-id: 16fdd3301a4c5bad27fbcd8686f7fbfcc1e908ee
2019-03-27 10:23:24 -07:00
81e030d9a6 Upgrade flake8-bugbear to master, fix the new lints. (#18507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18507
ghimport-source-id: 1c3642befad2da78a7e5f39d6d58732b85c76267

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18507 Upgrade flake8-bugbear to master, fix the new lints.**

It turns out Facebook is internally using the unreleased master
flake8-bugbear, so upgrading it grabs a few more lints that Phabricator
was complaining about but we didn't get in open source.

A few of the getattr sites that I fixed look very suspicious (they're
written as if Python were a lazy language), but I didn't look more
closely into the matter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14633682

fbshipit-source-id: fc3f97c87dca40bbda943a1d1061953490dbacf8
2019-03-27 08:07:41 -07:00
85d78a0532 Add export annotations for functions in c10 (#18464)
Summary:
Fixes #18461.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18464

Differential Revision: D14620963

Pulled By: ezyang

fbshipit-source-id: c11f3967de2ac69c7140767c8fe73a85555e9f40
2019-03-27 07:58:58 -07:00
a3933b87c6 Back out "Revert D14613517: [pytorch][PR] Updating onnxtrt submodule to master branch" (#18514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18514

Original commit changeset: d6267ddfc339

Reviewed By: bddppq

Differential Revision: D14634476

fbshipit-source-id: 2633b0b4c512d71001e5c20cd79c0c0d7856f942
2019-03-26 23:44:33 -07:00
eae7ad4ca8 Automatic update of fbcode/onnx to b29e78a4efb8e5d8995f576bbf19a846807829b6 (#18503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18503

Previous import was c05f2ae412daf8fd64136ca354b97ccf73e0ea6c

Included changes:
- **[b29e78a4](https://github.com/onnx/onnx/commit/b29e78a4)**: update copyright for open governance (#1885) <Prasanth Pulavarthi>
- **[3b0ecd55](https://github.com/onnx/onnx/commit/3b0ecd55)**: open governance (#1881) <Prasanth Pulavarthi>
- **[bbe28349](https://github.com/onnx/onnx/commit/bbe28349)**: Revert "Adding Reverse op (#1804)" (#1882) <Lu Fang>
- **[5be3e223](https://github.com/onnx/onnx/commit/5be3e223)**: Adding Reverse op (#1804) <Peyman Manikashani>

Reviewed By: zrphercule

Differential Revision: D14632717

fbshipit-source-id: 2637a4090e7071a59caff3a910fa4f077906bf3c
2019-03-26 21:58:22 -07:00
f3ddc40ca4 Move weight offload inside backend construction functor (#18385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18385

By moving the weight offload into the backend initialization function, we can instantiate the backend once by creating the OnnxifiOp once, and then clean up the parameter workspace. We need to keep hold of that instantiated net (OnnxifiOp) without cleaning it up. Subsequent ctors of OnnxifiOp for the same model will hit the cached backend and will not attempt weight offloading, which is safe as the weights are already gone.

Reviewed By: ipiszy

Differential Revision: D14590379

fbshipit-source-id: f7f34016e09777ad3df0af487885cd14658e1044
2019-03-26 21:03:17 -07:00
60538c8366 fix #16448 (#18479)
Summary:
Fixes #16448

bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18479

Differential Revision: D14635360

Pulled By: ezyang

fbshipit-source-id: 4010319fbce050dd0bdf4da3cd1171b9737f3c4c
2019-03-26 20:58:25 -07:00
f447b63ed0 Add section about .code to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18493

Differential Revision: D14634677

Pulled By: jamesr66a

fbshipit-source-id: 9ee065f6ce4218f725b93deb4c64b4ef55926145
2019-03-26 20:52:31 -07:00
45ec4920e3 how to use the ccache package on Ubuntu (#18495)
Summary:
Added full instructions for how to use the `ccache` package. Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18495

Differential Revision: D14635351

Pulled By: ezyang

fbshipit-source-id: 158e1052bae580e95f73644252fdbddcc0213128
2019-03-26 20:08:09 -07:00
d5861aa55c Append c10 libs to TorchConfig.cmake (#18418)
Summary:
Fixes #18416.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18418

Differential Revision: D14635322

Pulled By: ezyang

fbshipit-source-id: 81cb658f73583e4cd0358173617f747ebf4f7f8a
2019-03-26 19:53:02 -07:00
2ba41c5550 Add some missing docs for tensor methods and attributes, new unittest to enforce tensors.rst no longer miss anything (#16057)
Summary:
This depends on https://github.com/pytorch/pytorch/pull/16039

This prevents people (reviewers, PR authors) from forgetting to add things to `tensors.rst`.

When something new is added to `_tensor_doc.py` or `tensor.py` but intentionally not in `tensors.rst`, people should manually whitelist it in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16057

Differential Revision: D14619550

Pulled By: ezyang

fbshipit-source-id: e1c6dd6761142e2e48ec499e118df399e3949fcc
2019-03-26 18:05:56 -07:00
66e8c74814 Revert D14613517: [pytorch][PR] Updating onnxtrt submodule to master branch
Differential Revision:
D14613517

Original commit changeset: dd20d718db55

fbshipit-source-id: d6267ddfc339d04f182e2de1750a601c8d6bf8c6
2019-03-26 17:37:55 -07:00
e912632b74 Fix direct comparison of OperatorDef proto structs (#18466)
Summary:
argument order is allowed to differ

ajyu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18466

Differential Revision: D14627258

Pulled By: bddppq

fbshipit-source-id: 430e1fb1bea2c5639a547ae7c1652368788c86b9
2019-03-26 17:25:09 -07:00
66628f78b7 Revert D14605905: [pytorch][PR] Add return_counts to torch.unique
Differential Revision:
D14605905

Original commit changeset: 555f5a12a8e2

fbshipit-source-id: c7874f5987893e956c022180a37763d88bba38db
2019-03-26 17:18:01 -07:00
bdd098c694 Fix typo in Github links in elementwise_ops_schema.cc (#18018)
Summary:
s/elementwise_op_schema.cc/elementwise_ops_schema.cc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18018

Differential Revision: D14612291

Pulled By: soumith

fbshipit-source-id: 09276283b9ff92c039ce530165c62cc8421fb443
2019-03-26 15:37:26 -07:00
5292685d2f Improve numerical precision of (s)logdet (#18449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449

Differential Revision: D14611638

Pulled By: soumith

fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf
2019-03-26 15:32:14 -07:00
436723122e fix arange shape issue inconsistency across cpu and cuda (#18462)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18462

Differential Revision: D14620263

Pulled By: soumith

fbshipit-source-id: 223524cdda2f5d55c2ca8d4cdcf6f7a05a6c15eb
2019-03-26 15:27:24 -07:00
bbe110f4e1 Updating onnxtrt submodule to master branch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18441

Differential Revision: D14613517

Pulled By: bddppq

fbshipit-source-id: dd20d718db55942df9cce7acd1151d6902bc57ff
2019-03-26 14:25:55 -07:00
654e59fcac Minor fix for onnx ConstantOfShape export (#18199)
Summary:
Set value as a tensor of 1 element instead of a scalar, according to the ONNX spec.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18199

Reviewed By: dzhulgakov

Differential Revision: D14542588

Pulled By: houseroad

fbshipit-source-id: 70dc978d870ebe6ef37c519ba4a20061c3f07372
2019-03-26 13:23:16 -07:00
5bff395a82 Namedtuple return for solve, slogdet, sort, topk (#17093)
Summary:
More ops for https://github.com/pytorch/pytorch/issues/394. ~~Also need to rebase after landing #16186, because we need to update the whitelist of the new unit test added in #16186.~~

cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17093

Differential Revision: D14620068

Pulled By: ezyang

fbshipit-source-id: deec5ffc9bf7624e0350c85392ee59789bad4237
2019-03-26 12:39:08 -07:00
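The ergonomics this change adds can be sketched in plain Python (the `SortResult` name and `sort_with_indices` helper here are hypothetical stand-ins; the real ops return named fields such as `values` and `indices`):

```python
from collections import namedtuple

# Hypothetical sketch of the namedtuple-return pattern: the result still
# unpacks like a plain tuple, but fields are also addressable by name.
SortResult = namedtuple("SortResult", ["values", "indices"])

def sort_with_indices(xs):
    """Return sorted values together with the original index of each value."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    return SortResult(values=[xs[i] for i in order], indices=order)

out = sort_with_indices([3.0, 1.0, 2.0])
print(out.values)       # [1.0, 2.0, 3.0]
print(out.indices)      # [1, 2, 0]
values, indices = out   # backwards compatible: unpacks like a tuple
```

The design keeps old tuple-unpacking call sites working while letting new code read `out.indices` instead of `out[1]`.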
c6bfcb854b Expose c10 operators to caffe2 by operator name (#18160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18160

When exposing a c10 operator to the caffe2 frontend, don't use the operator schema but use the operator name instead.
This allows us to get rid of the existing mechanism for operator schema registration in a diff stacked on top.

Reviewed By: dzhulgakov

Differential Revision: D14513420

fbshipit-source-id: 6b08a9c6d9497eaf18b62361dd44bc07c7b4b76b
2019-03-26 12:36:11 -07:00
3bbe204f32 Test running a CUDA build on CPU machine. (#18242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18242
ghimport-source-id: b949d312a48226a34f90304162e910acee7c95cd

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18242 Test running a CUDA build on CPU machine.**
* #18362 Add ability to query if built with CUDA and MKL-DNN.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584429

fbshipit-source-id: b54de5b33f0c795a7d9605d30576cdf9b74050fd
2019-03-26 12:31:11 -07:00
0aeaeffb6c Properly use cudaGetLastError return code. (#18485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18485

I don't know (1) how we landed the wrong version of the patch and (2) how
this passed the push-blocking test

Reviewed By: pjh5

Differential Revision: D14621961

fbshipit-source-id: 0a3953d7adcdc79727a61c2acff65f436dcafe55
2019-03-26 12:26:44 -07:00
265fa0ce4d Move math::Axpy function to elementwise lib (#18316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18316

Move math::Axpy function to elementwise lib

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14574697

fbshipit-source-id: 7cfbb2da295c8966c5328bd6b577cce2638eea62
2019-03-26 12:19:19 -07:00
6f3186a578 Upgrade mkldnn to version 0.18.1 (#18463)
Summary:
Upgrade mkldnn to version 0.18.1
Fixes the MKL-DNN build issue when linking with MKL 2019.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18463

Differential Revision: D14620228

Pulled By: ezyang

fbshipit-source-id: 136074ad0e4631e1dde4ca1b0af4ee6a41e50913
2019-03-26 11:00:25 -07:00
fa0bfa03ed Add Google tag (#17690)
Summary:
This PR adds a Global Site Tag to the site.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17690

Differential Revision: D14620816

Pulled By: zou3519

fbshipit-source-id: c02407881ce08340289123f5508f92381744e8e3
2019-03-26 10:35:24 -07:00
20159c3ffe remove redundant --install_dir parameter in GEN_COMMAND (#18473)
Summary:
Remove the redundant --install_dir parameter in GEN_COMMAND, since the --install_dir parameter is already contained in ${GEN_COMMAND}.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18473

Differential Revision: D14620193

Pulled By: ezyang

fbshipit-source-id: ee9953b5d055f4b8beb3557f95f6539051b0028a
2019-03-26 10:22:00 -07:00
1a742075ee Resolving comments from Bool Tensor for CPU PR (#18165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18165
ghimport-source-id: 55cb3fb63a25c2faab1725b4ec14c688bf45bd38

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18166 Bool Tensor for CUDA
* **#18165 Resolved comments from Bool Tensor for CPU PR**
-------
This is a follow-up PR that resolves some additional feedback on one of the previous Bool Tensor PRs.

gchanan, here is a list of almost all the comments from the original PR with respective fixes and replies:

**[utils/python_scalars.h]** why is this converting from uint8_t and not bool? (comment?)
When I was adding this, I tested by creating a tensor and then calling its .tolist(). It worked equally well for bool and uint8_t, so I left uint8_t, as I thought it made more sense since we are calling PyBool_FromLong. Changing it to bool.

**[ATen/Dispatch.h]** better name?
Fixed.

**[test/test_torch.py]** what about other factories, such as full? (and more).
There is a test that goes through the factory methods - test_tensor_factories_empty. I added some bool cases above it and added a comment that once CUDA is done, I will unite them, and it will iterate not just between CUDA and CPU but over all types. Adding all bool cases now. Will unite in the CUDA PR.

**[generic/THTensorMath.h]** any changes in this file actually needed?
Bad merge. Fixed.

**[TH/THTensor.h]** this generates code for random, clampedRandom, and cappedRandom -- do we have tests for all of these with bool?
Added

**[c10/core/ScalarType.h]** I'm not very confident about the lack of Bool here -- can you look at the call sites and see what makes sense to do here?
Added bool to the macro, and created a similar one without it for a single case that otherwise fails the build with errors:

./torch/csrc/jit/symbolic_variable.h:79:20: error: ambiguous overload for ‘operator*’ (operand types are ‘const torch::jit::SymbolicVariable’ and ‘torch::jit::Value*’)
  return (*this) * insertConstant(rhs);

Differential Revision: D14605105

fbshipit-source-id: abf82d50e8f8c50b386545ac068268651b28496d
2019-03-26 09:59:34 -07:00
515238e0a5 Unify cudaGetDeviceCount implementations. (#18445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18445
ghimport-source-id: 30d018737bf6989bc68b7e3676f44e0ca6141fde

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18445 Unify cudaGetDeviceCount implementations.**

I went about doing this by searching for calls to cudaGetDeviceCount,
and then methodically replacing them with references to c10::cuda::device_count()
or at::cuda::device_count().

There is a point to doing this: the various implementations wildly differed
in their handling of what to do when cudaGetDeviceCount returns an error.
The final standardized behavior is that **all errors are swallowed** and
we return device count of zero.  This indirectly fixes running CUDA builds
on CPU, which was broken in #17847.

I added 'noexcept' to the 'deviceCount' virtual method on DeviceGuardImpl.
This is a BC-breaking change for anyone inheriting from DeviceGuardImpl
but all you need to do is put 'noexcept' on your method and it is backwards
compatible with older libtorch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14612189

fbshipit-source-id: 3c8d186e3dd623c0e27625212c7ce30f75d943cb
2019-03-26 09:50:14 -07:00
cf094d4edc Use TensorIterator for unary operations
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18309

Differential Revision: D14591533

Pulled By: cpuhrsch

fbshipit-source-id: a3b0788a481bddf1803c9f2d3289263d7364f8d7
2019-03-26 09:22:52 -07:00
5e462a3ed6 Introduce SobolEngine (#10505)
Summary:
`SobolEngine` is a quasi-random sampler used to sample points evenly in [0, 1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.

Documentation has been added, and tests have been added based on Balandat's references. The implementation is an optimized / tensor-ized version of Balandat's Cython implementation as provided in #9332.

This closes #9332.

cc: soumith Balandat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10505

Reviewed By: zou3519

Differential Revision: D9330179

Pulled By: ezyang

fbshipit-source-id: 01d5588e765b33b06febe99348f14d1e7fe8e55d
2019-03-26 07:53:07 -07:00
9080942afb fix str of autogradzero
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18442

Differential Revision: D14602880

Pulled By: wanchaol

fbshipit-source-id: ebd00f9bb5f1f7e33964c10d8c9f165b7bb4985f
2019-03-25 23:49:27 -07:00
dc6b5b2a52 Optimize boolean expressions & unwraps (#18259)
Summary:
Simplify or eliminate boolean and/or expressions, optimize unwrapping a value that cannot be None, and optimize using `is` with a None and a non-None value

Since the peephole optimizer now introduces constants, I added another constant-propagation pass after running it.

Previously I had a PR that did this & optimized shape ops - I will add the shape optimizations in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18259

Differential Revision: D14602749

Pulled By: eellison

fbshipit-source-id: 1c3f5a67067d8dfdf55d7b78dcb616472ea8a267
2019-03-25 21:50:57 -07:00
a729630cbf Fix python resolution in caffe2 CI scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18417

Differential Revision: D14612704

Pulled By: bddppq

fbshipit-source-id: 0942048a9c3990afc50ce73c1fa1005c4d4097aa
2019-03-25 20:56:15 -07:00
bf2a30cb22 Support dim=None for argmax and argmin (#18264)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/18263
cc: houseroad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18264

Reviewed By: ezyang

Differential Revision: D14559234

Pulled By: houseroad

fbshipit-source-id: c5b8623752d6c6af41c6d715fd9585a65294868d
2019-03-25 20:43:34 -07:00
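The `dim=None` semantics described above can be sketched in plain Python (the `flat_argmax` and `argmax_dim0` helpers are hypothetical illustrations, not the real implementation): when no dimension is given, the input is treated as flattened and a single index into the flattened data is returned.

```python
def flat_argmax(matrix):
    """dim=None semantics: flatten row-major, return one flat index."""
    flat = [v for row in matrix for v in row]
    return max(range(len(flat)), key=flat.__getitem__)

def argmax_dim0(matrix):
    """dim=0 semantics, for contrast: one index per column."""
    cols = range(len(matrix[0]))
    return [max(range(len(matrix)), key=lambda r: matrix[r][c]) for c in cols]

m = [[1, 9, 3],
     [4, 5, 6]]
print(flat_argmax(m))   # 1  (the 9, at flat position 1)
print(argmax_dim0(m))   # [1, 0, 1]
```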
e2730ddb21 Add return_counts to torch.unique (#18391)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/12598

This PR was originally authored by ptrblck at https://github.com/pytorch/pytorch/pull/15495, but since there was no update for months after the requested changes, I cloned that branch and resolved the code reviews here. Hope everything is good now. Especially, the implementation of count has been changed from ptrblck's original algorithm to the one ngimel suggested, i.e. using `unique_by_key` and `adjacent_difference`.

The current implementation of `_unique_dim` is VERY slow for computing the inverse index and counts, see https://github.com/pytorch/pytorch/issues/18405. I will refactor `_unique_dim` in a later PR. For this PR, please allow me to keep the implementation as is.

cc: ptrblck ezyang ngimel colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18391

Reviewed By: soumith

Differential Revision: D14605905

Pulled By: VitalyFedyunin

fbshipit-source-id: 555f5a12a8e28c38b10dfccf1b6bb16c030bfdce
2019-03-25 20:38:17 -07:00
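The counting scheme mentioned above can be sketched in plain Python (the `unique_with_counts` helper is hypothetical): sort the input, then a new run starts wherever adjacent elements differ — the same idea as `unique_by_key` + `adjacent_difference` on the GPU.

```python
def unique_with_counts(xs):
    """Sorted unique values plus per-value counts, via adjacent comparisons."""
    s = sorted(xs)
    uniques, counts = [], []
    for v in s:
        if uniques and uniques[-1] == v:
            counts[-1] += 1        # still inside the current run
        else:
            uniques.append(v)      # adjacent difference: a new run starts
            counts.append(1)
    return uniques, counts

print(unique_with_counts([1, 3, 2, 3, 1, 3]))  # ([1, 2, 3], [2, 1, 3])
```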
ba4de667fa change dropout lowering in symbolic_script (#18375)
Summary:
Dropout is now eligible for fusion, and generated fused kernels are just as fast as dropout in ATen. Change its lowering in symbolic script so that it can actually be fused. Still special-cased for cuda, because without fusion this lowering is less efficient than current (bernoulli_ * input). Testing is covered by the test case that ailzhang added (test_dropout_cuda).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18375

Differential Revision: D14611938

Pulled By: soumith

fbshipit-source-id: 11b18f4784e6c9265e382a8f8deca7add8df3b37
2019-03-25 20:05:11 -07:00
a40e0a7f2d Add torch.version.git_version (#18299)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/18293
cc: colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18299

Differential Revision: D14611972

Pulled By: soumith

fbshipit-source-id: cdb48ef37c8869713a9a43ea0da08e1bed9279a2
2019-03-25 19:59:40 -07:00
674c274d92 Change deprecated IntList to IntArrayRef
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18262

Differential Revision: D14612244

Pulled By: ezyang

fbshipit-source-id: 5d21c7b94d64104fececcb15c6d38d9bd2a1fc70
2019-03-25 19:47:21 -07:00
d1e416ac73 Enable printing to stderr for test_proper_exit for better debugging (#18458)
Summary:
related to https://github.com/pytorch/pytorch/issues/16608
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18458

Differential Revision: D14611718

Pulled By: soumith

fbshipit-source-id: 6dc903ff2d32b9c3b76470869d1f4e9a67f706df
2019-03-25 19:20:21 -07:00
e1c272797b Don't require pygraphviz for regenerate.sh (#17485)
Summary:
closes #17336

Do not overwrite config.yml if script throws an error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17485

Differential Revision: D14604388

Pulled By: kostmo

fbshipit-source-id: 5024545e3a8711abdbc0800911c766929dbca196
2019-03-25 18:04:53 -07:00
13b95eac55 Add quant-passes stubs. (#18151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18151
ghimport-source-id: 7d12462971bdf3e5e26a3f150f1fcad05bba1a15

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18152 Initial implementation of InsertObserverNodes pass.
* **#18151 Add quant-passes stubs.**

gh-metadata: pytorch pytorch 18149 gh/zolotukhinm@gmail.com/1/head

Differential Revision: D14584224

fbshipit-source-id: b3d0b5ff797160d5ad23f91f732e627b0129086c
2019-03-25 17:48:54 -07:00
6a1a019c0a caffe2 - support flaky operator tests for caffe2 build (#18155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18155

- Make a Python decorator caffe2_flaky for Caffe2 operator unit tests.
- The environment variable CAFFE2_RUN_FLAKY_TESTS is now used to mark flaky-test mode

During a test run:
- If flaky-test mode is on, only flaky tests are run
- If flaky-test mode is off, only non-flaky tests are run

Mark ctc_beam_search_decoder_op_test as flaky

Reviewed By: ezyang, salexspb

Differential Revision: D14468816

fbshipit-source-id: dceb4a48daeb5437ad9cc714bef3343e9761f3a4
2019-03-25 16:58:34 -07:00
7a90bae416 Remove unused th_scalar_type (#18390)
Summary:
th_scalar_type seems to be unused anywhere, so it can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18390

Reviewed By: ezyang

Differential Revision: D14591374

Pulled By: izdeby

fbshipit-source-id: 2113aa81229cdfdfb8dc5c951ea6dea3725b8582
2019-03-25 15:55:10 -07:00
56c16fe26f Porting CPU UpSample functions to ATen (#18020)
Summary:
This PR resolves partially #10482
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18020

Differential Revision: D14598029

Pulled By: ezyang

fbshipit-source-id: 513e7c6438ab6d5dc3f43241e7cb724744e9a287
2019-03-25 14:39:13 -07:00
ed8c462dc7 Fix caffe2 build with BLAS=OpenBLAS (#18422)
Summary:
g++ complains about failing to find the declarations of the cblas_sscal and cblas_dscal BLAS functions.
Let's fix it :)

Fedora 29, gcc 8.3.1, OpenBLAS 0.3.5
Built with cmake -DBLAS=OpenBLAS ..
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18422

Differential Revision: D14598977

Pulled By: soumith

fbshipit-source-id: bde77bfb359d2ff38226401caeed78c114ef7468
2019-03-25 11:59:10 -07:00
6c9b312fd4 Add addcmul, lerp to fuser, enable scalar->float specialization in symbolic script (#18081)
Summary:
This PR did two things:

1. Enable scalar->float specialization in symbolic script, so AD formula that contains scalar in the schema, should write `float` instead.
2. add addcmul, lerp to AD and fuser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18081

Differential Revision: D14490493

Pulled By: wanchaol

fbshipit-source-id: b3b86d960d5f051b30733bc908b19786111cdaa4
2019-03-25 11:05:45 -07:00
50df3e5e2e Add ability to query if built with CUDA and MKL-DNN. (#18362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362
ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18362 Add ability to query if built with CUDA and MKL-DNN.**

Fixes #18108.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584430

fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473
2019-03-25 10:39:09 -07:00
17fcdfb925 Updating submodules
Reviewed By: yns88

fbshipit-source-id: b2c5eb7dfa9048e399461c00d1103e945a30a5bc
2019-03-25 10:32:26 -07:00
5653a914f7 Implement reference counting for shared IPC CUDA tensors (#16854)
Summary:
This is to fix #16141 and similar issues.

The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage.

ezyang Done with cleanup. Same (insignificantly better) performance as the file-per-share solution, but handles millions of shared tensors easily. Note: [ ] documentation in progress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854

Differential Revision: D13994490

Pulled By: VitalyFedyunin

fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
2019-03-25 10:24:38 -07:00
f5ea528687 Don't segfault on trying to get data_ptr of sparse tensor. (#18347)
Summary:
Also asserts in storage_initialized that there is a storage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18347

Differential Revision: D14582028

Pulled By: gchanan

fbshipit-source-id: df3f5d181188f39e361839169fd054539c3b2839
2019-03-25 08:59:53 -07:00
647154f82a Assert tensor isn't sparse in enforce_invariants. (#18338)
Summary:
There's no reason we can't check this, but I'm punting on implementing it for now. It currently segfaults, so this is an improvement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18338

Differential Revision: D14580308

Pulled By: gchanan

fbshipit-source-id: 44d4cafeab12e1beeb3453a2d4068d221c2e9c4f
2019-03-25 08:44:17 -07:00
a4f83fff2b Only look for Caffe2 package when shared (#18421)
Summary:
Previously it would look for the Config even if it was not written.

Fixed #18419
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18421

Differential Revision: D14597139

Pulled By: ezyang

fbshipit-source-id: c212cbf5dc91564c12d9d07e507c8285e11c6bdf
2019-03-25 07:27:24 -07:00
c297f26843 Add more options to the quantization model exporter (#18383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18383

Add command line options for different quantization schemes.

Reviewed By: amylittleyang

Differential Revision: D14476862

fbshipit-source-id: 37fbf5b4c1c550121eae313f5a71d703a0a87f0f
2019-03-25 04:23:17 -07:00
9e176fe5fe Revert "Specialize optional tensor inputs to graphs in the JIT (#18360)" (#18411)
Summary:
This reverts commit 7cc7ed1322405ba3c627b9c5661a330f92c4183d.

I think it's better to sort out the issues raised in #18407 first. I'm sorry for not stopping it earlier.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18411

Differential Revision: D14594937

Pulled By: soumith

fbshipit-source-id: 3c90b7fa7694e2f59e55607acecde4a47af801ea
2019-03-24 21:29:29 -07:00
5acac411e4 Fix deprecated: type() -> scalar_type() (#18406)
Summary:
Sorry for not sending these fixes in a single PR. I found this compiler warning while working on something else, and I just went to GitHub and modified the file directly for convenience...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18406

Differential Revision: D14594180

Pulled By: soumith

fbshipit-source-id: 92f48513bc62fbe2c67c759d68830a973296e43b
2019-03-24 19:46:03 -07:00
6c029c80f7 Fix deprecated: type() -> scalar_type()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18394

Differential Revision: D14593890

Pulled By: soumith

fbshipit-source-id: 92b9a8c22008341c0cc3b7a721bef1973c528daf
2019-03-24 19:27:23 -07:00
8bc5b86709 Added tensor size warning to F.mse_loss() (#18349)
Summary:
To address the issue of broadcasting giving the wrong result in `nn.MSELoss()` as mentioned here https://github.com/pytorch/pytorch/issues/16045 . In particular, the issue often arises when computing the loss between tensors with shapes (n, 1) and (n,)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18349

Differential Revision: D14594176

Pulled By: soumith

fbshipit-source-id: f23ae68a4bf42f3554ad7678a314ba2c7532a6db
2019-03-24 19:22:14 -07:00
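Why (n, 1) vs (n,) silently goes wrong can be shown with a plain-Python sketch of the broadcasting (the `mse_broadcast` helper is hypothetical): the two shapes broadcast to (n, n), so the mean runs over n² pairwise errors instead of n elementwise ones.

```python
def mse_broadcast(pred, target):
    """pred: n rows of 1 element (shape (n, 1)); target: flat list (shape (n,)).
    Broadcasting pairs every prediction with every target: n * n errors."""
    diffs = [(p[0] - t) ** 2 for p in pred for t in target]
    return sum(diffs) / len(diffs), len(diffs)

pred = [[1.0], [2.0], [3.0]]   # shape (3, 1)
target = [1.0, 2.0, 3.0]       # shape (3,)
loss, n_terms = mse_broadcast(pred, target)
print(n_terms)  # 9, not 3 - the mean is over all cross pairs
print(loss)     # nonzero, even though pred matches target elementwise
```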
ca962f0f95 Fix For Requires Grad Infinite Loop (#18361)
Summary:
Previously, we would keep re-running requires-grad analysis on a loop body when its outputs and inputs disagreed. This adds a check so that we stop once the results haven't changed since the last run.

Fix for https://github.com/pytorch/pytorch/issues/18320
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18361

Differential Revision: D14584332

Pulled By: eellison

fbshipit-source-id: 696b225f80a2036318540946428b525985a9e735
2019-03-24 14:34:50 -07:00
92c9fef860 update magma instructions (#18410)
Summary:
fixes https://github.com/pytorch/pytorch/issues/18389

cc: stas00
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18410

Differential Revision: D14594198

Pulled By: soumith

fbshipit-source-id: fb46ef77a36c90ad95e47f7066f5d32aa1f1370f
2019-03-24 13:15:11 -07:00
1323c193ed Removed some dead code (#18201)
Summary:
Removed some dead code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18201

Differential Revision: D14555251

Pulled By: izdeby

fbshipit-source-id: f49640133ef4ae1b0306f7cec6655f23869cc6e7
2019-03-24 08:24:03 -07:00
7cc7ed1322 Specialize optional tensor inputs to graphs in the JIT (#18360)
Summary:
This specializes optional tensor inputs to either a DimensionedTensorType or, when None is passed,
UndefinedTensor (aka AutogradZeroTensorType).
This works because we already have different specs and thus separate plans for the two cases.
It enhances the shape analysis - because now unwrapped optional tensors will have DimensionedTensorType with appropriate shape and required grad etc.
Also, when combined with "if-pruning" (which I understand #18259 works towards), we actually get much nicer concrete graphs, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18360

Differential Revision: D14590577

Pulled By: soumith

fbshipit-source-id: cac204a506d1d38b15703cbcc67a6b75fd4979f4
2019-03-23 23:00:37 -07:00
32d0e7e339 Move pyobj_ to TensorImpl (#18225)
Summary:
Currently, `THPVariable_Wrap(…)` and `THPVariable_NewWithVar(…)` depend on the existence of `pyobj_` in the autograd metadata of a Variable to convert the Variable to a Python tensor. However, after the Variable/Tensor merge, there will be Variables that don't contain autograd metadata, and to allow the conversion from non-autograd-meta Variable to a Python tensor we need to store the `pyobj_` outside of autograd metadata and in a place where it will always be available.

This PR makes it possible by moving `pyobj_` into TensorImpl, so that `THPVariable_Wrap(…)` and `THPVariable_NewWithVar(…)` can always access a Variable's `pyobj_` and convert the Variable to a Python tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18225

Differential Revision: D14562616

Pulled By: yf225

fbshipit-source-id: 18d4aaace70eee6120abaf9276036d1f8f51b18d
2019-03-23 12:50:38 -07:00
5860fa5dcf Fix deprecated scalar type in ATen/native/Distributions.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18265

Differential Revision: D14577543

Pulled By: ezyang

fbshipit-source-id: 36674530b32366c51835e4073d7ba23d455d2fda
2019-03-23 10:09:26 -07:00
d9960fbdb2 Revert D14446895: [C2] Implement rotated generate_proposals_op without opencv dependency (~2x faster)
Differential Revision:
D14446895

Original commit changeset: 847f2443e645

fbshipit-source-id: fc6ab5ee59e027f125f5ab0f7ee51ad7db37d4a4
2019-03-23 09:38:55 -07:00
d85451c07b Revert D14584266: [pytorch][PR] Better error message for tensor with grad as constant in tracing
Differential Revision:
D14584266

Original commit changeset: 4e7850dadc78

fbshipit-source-id: 3bb3b5006e469edff984c16e0ff8d5dac2862d88
2019-03-23 02:50:54 -07:00
7c2290e7ce Better error when module attr is used (#18164)
Summary:
Adds a suggestion to add to __constants__ when a torch.nn.Module attr is accessed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18164

Differential Revision: D14580060

Pulled By: eellison

fbshipit-source-id: 0c5adc21d7341a5691d4b45930947cb1ba84c8e8
2019-03-22 20:22:27 -07:00
7be05b822c Fix incorrect sparse add behavior when the sparse tensor has non-contiguous values (#18179)
Summary:
Currently, this code gives an incorrect result:
```python
import torch
indices=torch.tensor([[7, 1, 3]])
values=torch.tensor([[1., 1., 1.],
               [1., 1., 1.],
               [1., 1., 1.]])
x = torch.sparse_coo_tensor(indices, values, size=(10, 3))
values=torch.tensor(1.).expand(3, 3)
y = torch.sparse_coo_tensor(indices, values, size=(10, 3))
z = x + y

tensor(indices=tensor([[7, 1, 3]]),
       values=tensor([[2., 1., 1.],
                      [1., 1., 1.],
                      [1., 1., 1.]]),
       size=(10, 3), nnz=3, layout=torch.sparse_coo)
```

This PR fixes the bug by adding special handling for sparse tensors with non-contiguous values in the addition function (specifically, by cat'ing the indices and values together).

This PR closes https://github.com/pytorch/pytorch/issues/17950 and https://github.com/pytorch/pytorch/issues/17919.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18179

Reviewed By: ezyang

Differential Revision: D14569591

Pulled By: yf225

fbshipit-source-id: f5a14c4a31337fc95eab64596212066b4fb18b1a
2019-03-22 19:35:14 -07:00
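The cat-then-combine strategy of the fix can be sketched in plain Python over (index, value) pairs (the `sparse_add` helper is a hypothetical illustration; real COO tensors carry multi-dimensional index and value tensors):

```python
def sparse_add(a, b):
    """Add two COO-style tensors given as lists of (index, value) pairs:
    concatenate the entries, then coalesce duplicate indices by summing.
    This avoids any assumption about the layout of either operand's values."""
    coalesced = {}
    for idx, val in list(a) + list(b):   # the "cat" step
        coalesced[idx] = coalesced.get(idx, 0.0) + val
    return sorted(coalesced.items())     # the coalesce step

x = [(7, 1.0), (1, 1.0), (3, 1.0)]
y = [(7, 1.0), (1, 1.0), (3, 1.0)]
print(sparse_add(x, y))  # [(1, 2.0), (3, 2.0), (7, 2.0)]
```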
6052d04100 Implement rotated generate_proposals_op without opencv dependency (1.8x faster) (#18010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18010

[C2] Implement rotated generate_proposals_op without opencv dependency.

Reviewed By: newstzpz

Differential Revision: D14446895

fbshipit-source-id: 847f2443e645f8cae1327dfbaa111c48875ca9be
2019-03-22 18:15:27 -07:00
0d78126a6f Remove empty file (actual file_check.cpp resides in torch/csrc/jit/testing) (#18303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18303
ghimport-source-id: 66f4402075b123e36c6ffdf806b7c93187a1a58a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18307 Convert test_recursive_cse to use Filecheck inline annotations.
* #18306 [Filecheck] Add a feature to parse check annotations from string.
* #18305 Add python bindings for parseIR.
* **#18303 Remove empty file (actual file_check.cpp resides in torch/csrc/jit/testing)**

Differential Revision: D14586003

fbshipit-source-id: a13e57bd4302e4d3f06198068d525de25e2aa8b3
2019-03-22 17:03:25 -07:00
ff3ecfec89 Turn script_type_parser into a class (#18211)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18211
ghimport-source-id: 73b81e9ec631937b14db1da10991831788a6894b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18296 [jit] Add namespacing for ScriptClasses
* #18284 [jit] make test module hook use save/load
* **#18211 [jit] Turn script_type_parser into a class**
* #18148 [jit] python interop for script classes

If we are namespacing classes, the type parser will need to carry around
some state about which namespaces to look in. This PR just wraps it in a
class in preparation.

Also, subscriptToType can no longer be static, since parseTypeFromExpr
may give different results depending on the namespaces available, so
it's been made a regular function instead of a static map lookup.

Reviewed By: eellison

Differential Revision: D14581128

fbshipit-source-id: 711315472ccde1920abf9fdb5a871ac27fb86787
2019-03-22 16:30:05 -07:00
10751d5fb4 python interop for script classes (#18148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18148
ghimport-source-id: 40a9d745dc9aeba53d098743323fcbd50ca65137

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18148 py interop**

Support for converting classes across the Python–TorchScript boundary. Like other TorchScript values, ScriptClasses are native Python values when used in Python and IValues when used in TorchScript.

Notably, there is a copy across this boundary, which will be surprising to users who will expect standard Python reference semantics. I have some ideas for fixing that, but it's a more involved process.

Reviewed By: jamesr66a

Differential Revision: D14526259

fbshipit-source-id: 5916e3032488a42dc7da756c1826d7c040a21ebd
2019-03-22 16:30:04 -07:00
3badea6eb3 Better error message for tensor with grad as constant in tracing (#18298)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/17583

There's an unrelated issue right now causing a segfault when printing tensor so that might have to fixed first for this to land
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18298

Differential Revision: D14584266

Pulled By: eellison

fbshipit-source-id: 4e7850dadc78ef1e98ad40b9d8adc0fef42acf48
2019-03-22 15:29:30 -07:00
2ad2b2c7b1 Support for basic list comprehensions (#17267)
Summary:
Supports the following syntax:
```
        @torch.jit.script
        def comp(l):
            # type: (List[float]) -> List[float]

            n = [x * 3 for x in l]
            return n
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17267

Differential Revision: D14581119

Pulled By: Krovatkin

fbshipit-source-id: 6fd091a8a9ab607386ac58fda6ad88bf8aea380e
2019-03-22 15:25:13 -07:00
e20894fce5 Make it possible to trigger XLA/slow tests via commit message. (#18345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18345
ghimport-source-id: 9649d76bb194866859d62e6ba2a3a265c96ebba5

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18345 Make it possible to trigger XLA/slow tests via commit message.**

Four variants are supported: `[xla ci] [ci xla] [xla test] [test xla]`; substitute
xla with slow for slow tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584557

fbshipit-source-id: fcbfdfb28246823135bb3d3910baae073d16e81d
2019-03-22 15:06:40 -07:00
f68faa35c0 Avoid refcount when looking up dispatch key
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18294

Reviewed By: ezyang

Differential Revision: D14512979

fbshipit-source-id: 45e548974f06184c375c2bb8339e3049a4ebd880
2019-03-22 14:09:20 -07:00
e5eb871419 Fix DCHECK to handle dangling else (#18295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18295

Replace "if (false)" with "while (false)" which fixes potential dangling else issue as shown in added test case.

Reviewed By: ezyang

Differential Revision: D14569608

fbshipit-source-id: 407052db9182ce27b7a59841e90fa50d3eca262e
2019-03-22 14:04:29 -07:00
ed47b85d3b Allow fusion of float function arguments (#18087)
Summary:
so that functions like `def fn(x, p:float)` can be fused. Fixes #9940 and #11186. Fuses only float (not integer) arguments to simplify assembling arguments for fusion launch.
CPU fusion is disabled in CI and this won't be tested, but I tested it locally.
cc t-vi, apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18087

Differential Revision: D14581206

Pulled By: wanchaol

fbshipit-source-id: ccb0cf79b1751706f9b2cdf1715115eae5a39fb6
2019-03-22 13:52:33 -07:00
2aac18098d Fix error reporting in NVRTC use of the fuser (#18327)
Summary:
Two functions were not directed at NVRTC.
It's a bit hard to test this, as the fuser usually produces correct code - unless I try to hack on it. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18327

Differential Revision: D14579285

Pulled By: soumith

fbshipit-source-id: 1be7ba461cc473d514ba619507742a47d4d7c97e
2019-03-22 13:41:00 -07:00
fe5d23cf4a Using sqrt for better precision in cosine_similarity (#18250)
Summary:
address comment in #18168 .
Testing in CI...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18250

Differential Revision: D14568601

Pulled By: ailzhang

fbshipit-source-id: 39fbbdb08743b53fa665c7e88e4750cbe0976ec7
2019-03-22 13:33:30 -07:00
18a6781f57 Fix alignment issues for Fake BFP16 fp32 -> bfp16 rounding routines (#18321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18321

As title.

Reviewed By: jspark1105

Differential Revision: D14575512

fbshipit-source-id: 0e33cdab54b1aef8b67f0b4c366692c5dbdf631d
2019-03-22 12:41:58 -07:00
6e0cbc7f31 Untangle internal build python and cpp dependencies
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18326

Reviewed By: ezyang

Differential Revision: D14576294

fbshipit-source-id: 186ce1e3d026d962b7386f861eddf093f583a878
2019-03-22 12:18:03 -07:00
d4c52158c7 Caffe2: crash op (#18207)
Summary:
This is handy when testing various core-dump-related
things. If in the future we want to unit-test our gdb debugger
extensions, we can use this op to generate a core dump for us within a
unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18207

Differential Revision: D14482186

Pulled By: salexspb

fbshipit-source-id: 39a9fffbdd4bd083597f544d1c783a82cf023a89
2019-03-22 11:52:01 -07:00
172ec4ace5 caffe2 - Util to cleanup external inputs and outputs from a NetDef (#18194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18194

Add a util method to cleanup external inputs and outputs from a NetDef

The following conditions will be met after the modification
- No duplicate external inputs
- No duplicate external outputs
- Going through the list of ops in order, all op inputs must be outputs
from other ops, or registered as external inputs.
- All external outputs must be outputs of some operators.
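
The conditions above can be sketched on a toy net representation (an illustrative Python sketch, not the actual Caffe2 util or its API — `cleanup_external_io` and the `(inputs, outputs)` tuple encoding are hypothetical):

```python
def cleanup_external_io(ops, external_inputs, external_outputs):
    """ops: list of (inputs, outputs) name tuples, in execution order.
    Returns deduplicated external inputs/outputs satisfying the rules above."""
    seen_outputs = set()
    needed_inputs = []
    for ins, outs in ops:
        for name in ins:
            # an input not produced by an earlier op must be an external input
            if name not in seen_outputs and name not in needed_inputs:
                needed_inputs.append(name)
        seen_outputs.update(outs)
    # drop duplicates, keep declared order, add any missing required inputs
    ext_in = [n for n in dict.fromkeys(external_inputs) if n in needed_inputs]
    ext_in += [n for n in needed_inputs if n not in ext_in]
    # external outputs must actually be produced by some op
    ext_out = [n for n in dict.fromkeys(external_outputs) if n in seen_outputs]
    return ext_in, ext_out
```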

Reviewed By: ZolotukhinM

Differential Revision: D14528589

fbshipit-source-id: c8d82fda1946aa3696abcbec869a4a8bb22f09b6
2019-03-22 11:23:03 -07:00
7397eb7e8e End to end hack to call server side Caffe2 ops (#18267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18267

Motivation: we don't actually want to use it for real under any circumstances. This is an idea to unblock our internal progress and parallelize workstreams. We can easily define schemas for all ops in question and implement forwarding to C2 ops which is NOT going to be performant. Then several things can be happening in parallel:
* move code of ops outside of C2 ops that depend on protobuf into c10
* development of optimization/fusion passes
* building python-level wrappers with clean API
* improving perf

This demonstrates Relu, quant, and dequant. It seems to cover all the use cases necessary (maybe except weights prepacking). Ideally I'd demonstrate Conv, but will get to it later in a separate PR (contributions welcome).

Reviewed By: ezyang

Differential Revision: D14531232

fbshipit-source-id: 4cd4a71ae0cb373c6c0e81f965c442b82a1b4069
2019-03-22 11:17:45 -07:00
f6df6aed89 Optimize MomentumSGDUpdate maximum block size and make it templated
Summary: Removing the maximum number of blocks limit from the operator and making the nesterov parameter templated to remove branching.

Reviewed By: BIT-silence

Differential Revision: D14567003

fbshipit-source-id: 394c2039ee214adc6ccd2e562e4e9563d307131f
2019-03-22 09:54:25 -07:00
e3da16a99e Add test for #17271 (torch.exp incorrect for 2**31 size tensor) (#18292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18292
ghimport-source-id: a3e96584db0eef7b6202a1211808f9f6e59dd529

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)**
* #18291 Correctly call superclass setUp in TestCase subclasses.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14567642

fbshipit-source-id: c60ee7597a86f5d2c5c0b72cb106f17815950427
2019-03-22 07:50:38 -07:00
2934153f35 Correctly call superclass setUp in TestCase subclasses. (#18291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18291
ghimport-source-id: d6e95e899bd320407967df41435801e54864ba62

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)
* **#18291 Correctly call superclass setUp in TestCase subclasses.**

This makes PYTORCH_TEST_SKIP_FAST work correctly for more
tests, reducing the wasted testing effort on our slow_test job.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14567643

fbshipit-source-id: 40cf1d6556e0dd0a0550ff3d9ffed8b6000f8191
2019-03-22 07:46:44 -07:00
46990c20fa Verify def before infer fensor (#18129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18129

A lot of tensor inference functions assume the operator passes the schema,
so call Verify to make sure this is actually the case.

I created a diff before to add checking in Concat (https://github.com/pytorch/pytorch/pull/17110), but I encountered a lot more places where this is assumed (for example ElementwiseOpShapeInference).

Reviewed By: mdschatz

Differential Revision: D14503933

fbshipit-source-id: cf0097b8c3e4beb1cded6b61e092a6adee4b8fcb
2019-03-22 06:36:25 -07:00
77a7285764 add more Python interface functions to make quantization simpler (#18246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18246

Simplifies histogram collection and quantization process.

Histogram collection before this diff was something like this
```
from caffe2.quantization.server import dnnlowp_pybind11
...
dnnlowp_pybind11.ObserveHistogramOfOutput(hist_file)
for ...
   workspace.RunNet(predict_net)
dnnlowp_pybind11.ClearNetObservers()  # This is to trigger Stop function in the observer to dump out histogram file but this can have unintended consequence of also clearing all the other useful observers we attached
```

After this diff we can
```
workspace.CreateNet(predict_net)  # Note we need to create net to have a net to attach observer
histogram_observer = dnnlowp_pybind11.AddHistogramObserver(predic_net, hist_file)
for ...
   workspace.RunNet(predict_net)
predict_net.RemoveObserver(histogram_observer)
```

Choosing quantization parameters of weights before this diff was something like this
```
dnnlowp_pybind11.ObserveHistogramOfOutput(weight_hist_file)
workspace.RunNetOnce(init_net)
dnnlowp_pybind11.ClearNetObservers() # Has same issue as the histogram collection example above

dnnlowp_pybind11.RegisterQuantizationParamsWithHistogram(
    weight_hist_file, is_weight=True, qparams_output_file_name=qparams_file
)
workspace.CreateNet(init_net, overwrite=True)
dnnlowp_pybind11.ClearNetObservers()

logger.info("Loading quantization params from {}".format(qparams_file))
blobs_to_qparams = {}
with open(qparams_file) as f:
    lines = f.readlines()
for line in lines:
    op_id, op_type, output_id, tensor_name, mini, maxi, scale, zero_point, precision = (
        line.split()
    )
    op_id = int(op_id)
    output_id = int(output_id)
    op = net.Proto().op[op_id]
    if op_type != op.type or op.output[output_id] != tensor_name:
        print(
            "Corrupt qparams file {} {} {} {} {}".format(
                qparams_file, op_type, op.type, op.output[output_id], tensor_name
            )
        )
    blobs_to_qparams[tensor_name] = QuantizationParam(float(scale), int(zero_point))

```

After this diff this can be simplified to
```
blobs_to_qparams = {}
for op in init_net.Proto().op:
    for output in op.output:
        scale, zero_point = dnnlowp_pybind11.ChooseQuantizationParams(output)
        blobs_to_qparams[output] = QuantizationParam(scale, zero_point)
```

Reviewed By: dskhudia

Differential Revision: D14544694

fbshipit-source-id: 4fd06cd63256201e2e9d15c39f503138d1be53c2
2019-03-22 00:52:24 -07:00
f3cf6ed789 add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta (#18257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18257

Support adding ops in global_init_net, because pred_init_net is per thread and just doesn't cut it.

Reviewed By: jspark1105

Differential Revision: D14552695

fbshipit-source-id: 53dd44c84ad019019ab9f35fc04d076b7f941ddc
2019-03-22 00:19:59 -07:00
afc7574aed Automatic update of fbcode/onnx to c05f2ae412daf8fd64136ca354b97ccf73e0ea6c (#18285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18285

Previous import was 96c58ceeacf0f2b73d752e413e4fd78787a12da3

Included changes:
- **[c05f2ae4](https://github.com/onnx/onnx/commit/c05f2ae4)**: update both core and ml docs (#1879) <Lu Fang>
- **[f895279b](https://github.com/onnx/onnx/commit/f895279b)**: fix the problems introduced in previous PRs in operator registration (#1878) <Lu Fang>
- **[f6f80657](https://github.com/onnx/onnx/commit/f6f80657)**: Skip the schema check on ops in non-standard domain (#1876) <Lu Fang>
- **[8c8be722](https://github.com/onnx/onnx/commit/8c8be722)**: Introduce Function Body Helper  (#1868) <Sherlock>
- **[b605eafb](https://github.com/onnx/onnx/commit/b605eafb)**: Support down sampling for Upsample with scales < 1. (#1773) <Ke Zhang>
- **[47f7aa71](https://github.com/onnx/onnx/commit/47f7aa71)**: Remove scaledtanh (#1866) <Ashwini Khade>
- **[4dfc56de](https://github.com/onnx/onnx/commit/4dfc56de)**: Add Ceil support for Max and Average Pooling (#1860) <Lara Haidar>
- **[552a8efc](https://github.com/onnx/onnx/commit/552a8efc)**: Add testcase generator for functions (#1862) <Raymond Yang>
- **[fdb978a5](https://github.com/onnx/onnx/commit/fdb978a5)**: Promote Thresholded Relu Op (#1856) <Ashwini Khade>
- **[ce332628](https://github.com/onnx/onnx/commit/ce332628)**: Update Slice with dynamic input & optional input steps (#1836) <Bowen Bao>
- **[3a9a8787](https://github.com/onnx/onnx/commit/3a9a8787)**: Merge function into opschema (#1834) <Raymond Yang>
- **[3dbf8fe9](https://github.com/onnx/onnx/commit/3dbf8fe9)**: Handle string comparision represented as np.objects (#1851) <Dmitri Smirnov>
- **[3b0d3bb2](https://github.com/onnx/onnx/commit/3b0d3bb2)**: remove global variable in header file (#1850) <Lu Fang>
- **[1cca8733](https://github.com/onnx/onnx/commit/1cca8733)**: bump the version for drop out - fix the issue that the version was not bumped when changing its type constraint declaration. (#1848) <Ke Zhang>
- **[1ec81bc6](https://github.com/onnx/onnx/commit/1ec81bc6)**: Change TopK operator to allow dynamic 'k' (#1829) <Hariharan Seshadri>
- **[a89a4a16](https://github.com/onnx/onnx/commit/a89a4a16)**: Remove exp op: Affine, ImageScaler,ParametricSoftplus, Crop. (#1832) <Ke Zhang>

Reviewed By: yinghai

Differential Revision: D14566202

fbshipit-source-id: b1e5912ae6887e2865fc628363071e2b9938dfa4
2019-03-22 00:13:42 -07:00
f79eac2c7a Cleanup TorchScript rst docs (#18234)
Summary:
* Adds more headers for easier scanning
* Adds some line breaks so things are displayed correctly
* Minor copy/spelling stuff
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18234

Reviewed By: ezyang

Differential Revision: D14567737

Pulled By: driazati

fbshipit-source-id: 046d991f7aab8e00e9887edb745968cb79a29441
2019-03-21 20:19:17 -07:00
46439c78d0 Replace the remaining usages of IntList in caffe2 to IntArrayRef
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18282

Differential Revision: D14569269

Pulled By: bddppq

fbshipit-source-id: 5fc33701b83f9efdec4b456d2691764831d10e7f
2019-03-21 16:34:38 -07:00
979db03722 Blacklist certain op types when doing bound shape inference (#18290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18290

Some ops, such as `Tile`, will mess up our tracking of batch size, so for now it makes sense to stop shape inference at these ops so that we don't lower them and downstream ops without proper batch info.
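
A minimal sketch of the idea, on a toy op list and shape map (the op encoding, `BLACKLIST`, and the shape-preserving propagation rule are all illustrative, not the real bound shape inference code):

```python
BLACKLIST = {"Tile"}  # op types where batch-size tracking breaks down

def infer_shapes(ops, shapes):
    """ops: list of (op_type, input_name, output_name), in order.
    Propagates shapes until a blacklisted op cuts the chain."""
    for op_type, inp, out in ops:
        if op_type in BLACKLIST or inp not in shapes:
            continue  # leave `out` unknown; downstream ops stay uninferred
        shapes[out] = shapes[inp]  # toy rule: shape-preserving ops only
    return shapes
```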

Reviewed By: zrphercule

Differential Revision: D14463550

fbshipit-source-id: 2792481efa540f2a7dd310e677c213860c3053ca
2019-03-21 15:43:05 -07:00
104773c715 Fix use of c10::guts::apply (#18159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18159

In some instances, the call to forward could clash with std::forward. Fully qualify it to make sure it gets the right one

Reviewed By: ezyang

Differential Revision: D14512189

fbshipit-source-id: 6242607dbe54fcdb93229c1a4aaee8b84a88caa1
2019-03-21 14:57:33 -07:00
8b94de06af Allow using C10_DECLARE_TENSOR_TYPE and C10_DEFINE_TENSOR_TYPE from any namespace (#18158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18158

They didn't work when called from other namespaces before because they didn't fully specify the c10 namespace.

Reviewed By: ezyang

Differential Revision: D14512187

fbshipit-source-id: a496b89a1bbe2b56137cfae03ab94a60f38d7068
2019-03-21 14:57:32 -07:00
daa77c6e26 Move schema inference to c10 (#18090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18090

This schema inference is needed by the c10 operator registration mechanism. Move it to c10.
It is going to be used by diffs stacked on top.

Reviewed By: ezyang

Differential Revision: D14491454

fbshipit-source-id: 0f8ddcdbd91467c8347d315dd443a1ca8b216481
2019-03-21 14:57:30 -07:00
1877087df2 Allow registering same operator schema multiple times (#18038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18038

Now that we have named overloads, we can allow registering the same function schema multiple times and just check it's identical.

This is going to be used in custom op registration since they register the schema every time a kernel is registered.

Reviewed By: dzhulgakov

Differential Revision: D14467494

fbshipit-source-id: 2c26cf72a64b65f120afe05e989302ec42597515
2019-03-21 14:57:28 -07:00
291746f110 Rename trtrs to triangular_solve (#18213)
Summary:
Changelog:
- Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage.
- Move `isnan` to _torch_docs.py
- Remove unnecessary imports
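
The tentative-alias-plus-deprecation-warning pattern can be sketched as follows (a generic illustration only; the real alias lives in the torch namespace and forwards to the actual solver, whose signature differs):

```python
import warnings

def triangular_solve(b, A):
    """Stand-in for the renamed function (body is illustrative only)."""
    return "solved"

def trtrs(*args, **kwargs):
    """Deprecated alias: warn at the call site, then forward to the new name."""
    warnings.warn(
        "trtrs is deprecated in favour of triangular_solve",
        DeprecationWarning,
        stacklevel=2,
    )
    return triangular_solve(*args, **kwargs)
```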
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213

Differential Revision: D14566902

Pulled By: ezyang

fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f
2019-03-21 14:27:21 -07:00
1c671c56c1 Fix contribution_guide docs (#18237)
Summary:
Fixes Typo and a Link in the `docs/source/community/contribution_guide.rst`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18237

Differential Revision: D14566907

Pulled By: ezyang

fbshipit-source-id: 3a75797ab6b27d28dd5566d9b189d80395024eaf
2019-03-21 13:20:57 -07:00
cf19ad2152 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 80b00c33e6f6c7cfa08f645cd33419f6545f45d2
2019-03-21 13:15:54 -07:00
43a5c636e2 Optimize group_norm_op (#17945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17945

Optimize group_norm_op

Reviewed By: houseroad

Differential Revision: D14419908

fbshipit-source-id: 4024b5c5dbeff97f4f026d61fc44af1f0e98ed68
2019-03-21 13:05:01 -07:00
9214852da2 Enable running of slow tests in CI. (#18236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18236
ghimport-source-id: 2bb80d017c2ea833669a2d55b340a922b2d44685

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18236 Enable running of slow tests in CI.**
* #18231 Add a decorator for marking slow tests.

These tests only run on master, as they are slow.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14563115

fbshipit-source-id: f54ddef4abedc7e872e58657fc9ac537952773d0
2019-03-21 12:44:45 -07:00
a7d886b9a0 Run clang-format on torch/csrc/distributed/c10d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18255

Differential Revision: D14563072

Pulled By: pietern

fbshipit-source-id: bd83f90ae949b14bc95f4009ba12319c9b7936d0
2019-03-21 11:55:11 -07:00
99ddcb2c9f Shut up compiler about unused the_type. (#18278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18278
ghimport-source-id: 3c35f6e7229c3c2b3a27d96370d7c05fad58365e

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18278 Shut up compiler about unused this_type.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14563050

fbshipit-source-id: 4b516f6c9ef3784d1430f793f304066c351b1a93
2019-03-21 11:39:21 -07:00
549c4da917 Add a decorator for marking slow tests. (#18231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18231
ghimport-source-id: 78c230f60c41877fe91b89c8c979b160f36f856b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18231 Add a decorator for marking slow tests.**

The general strategy:
- It's a normal skip decorator, which triggers a skip if
  PYTORCH_TEST_WITH_SLOW is not set.
- It also annotates the method in question that says it's
  slow.  We use this to implement a catch-all skipper in
  setUp that skips all non-slow tests when
  PYTORCH_TEST_SKIP_FAST is set.

I added a little smoketest to test_torch and showed that I get:

```
Ran 432 tests in 0.017s
OK (skipped=431)
```

when running with PYTORCH_TEST_WITH_SLOW=1 and PYTORCH_TEST_SKIP_FAST=1

CI integration coming in later patch, as well as nontrivial uses of
this decorator.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14544441

fbshipit-source-id: 54435ce4ec827193e019887178c09ebeae3ae2c9
2019-03-21 11:17:34 -07:00
3eff333bff lint changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18276

Differential Revision: D14563385

Pulled By: ifedan

fbshipit-source-id: 12a51dbdb7b9e96be9fefa21fe298796b1ae6b58
2019-03-21 11:11:35 -07:00
8356ffa922 move median to ATen (#17637)
Summary:
This moves median to ATen.

- median with dimension reduces to kthvalue
- median without dimension (aka medianall) is implemented in parallel to kthvalue because we would not want to reshape (copying for non-contiguous) and then copy again in kthvalue. We can use the helper functions we moved from kthvalue.
- `median_cuda` was accidentally already put into ATen in #17544.
- The quickselect algorithm without indices for CPU in TH is now obsolete and has been removed.
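
The reduction of median-with-dimension to kthvalue is, conceptually (a pure-Python sketch using the lower-median convention, with `kthvalue`'s `k` 1-indexed as in torch; `sorted()` stands in for the real selection algorithm):

```python
def kthvalue(xs, k):
    """k-th smallest element (1-indexed)."""
    return sorted(xs)[k - 1]

def median_via_kthvalue(xs):
    """Lower median of xs as the k-th smallest, k = (len(xs) + 1) // 2."""
    return kthvalue(xs, (len(xs) + 1) // 2)
```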
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17637

Differential Revision: D14346510

Pulled By: ezyang

fbshipit-source-id: c07ad144efbd6b4194179bb1c02635862521d8cb
2019-03-21 10:02:04 -07:00
d1497debf2 Fix B903 lint: save memory for data classes with slots/namedtuple (#18184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18184
ghimport-source-id: 2ce860b07c58d06dc10cd7e5b97d4ef7c709a50d

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18184 Fix B903 lint: save memory for data classes with slots/namedtuple**
* #18181 Fix B902 lint error: invalid first argument.
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530872

fbshipit-source-id: e26cecab3a8545e7638454c28e654e7b82a3c08a
2019-03-21 09:10:30 -07:00
ba81074c40 Fix B902 lint error: invalid first argument. (#18181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181
ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* **#18181 Fix B902 lint error: invalid first argument.**
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

A variety of sins were committed:
- Some code was dead
- Some code was actually a staticmethod
- Some code just named it the wrong way
- Some code was purposely testing the omitted case

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530876

fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e
2019-03-21 09:10:28 -07:00
0654c7d4a7 Fix B006 lint errors: using mutable structure in default argument. (#18178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18178
ghimport-source-id: 667ee76b418f505fa64b863e52a603c508dcd1bf

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* #18181 Fix B902 lint error: invalid first argument.
* **#18178 Fix B006 lint errors: using mutable structure in default argument.**
* #18177 Fix lstrip bug revealed by B005 lint

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530874

fbshipit-source-id: 38f4456a085bfe55f2a96fff53028ebd0d621604
2019-03-21 09:10:25 -07:00
0122121176 Two amendments for the shape analysis (#18271)
Summary:
Two small refinements to the shape analysis:
- `detach` can set requires grad to false for dimensioned tensors (not sure if I would also need to deal with Complete?).
- add `batch_norm_stats`.

I noticed these while looking at what's going on when trying to code batch norm manually. (Hi wanchaol )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18271

Differential Revision: D14561303

Pulled By: ezyang

fbshipit-source-id: 64a6879392e77403c44f2ed82f84b6397754d0ea
2019-03-21 08:07:51 -07:00
9bc8badbcb Fix lstrip bug revealed by B005 lint (#18177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18177
ghimport-source-id: fbbf915b66762fc88bc5b541464e71ba27500958

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* #18181 Fix B902 lint error: invalid first argument.
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* **#18177 Fix lstrip bug revealed by B005 lint**

lstrip() doesn't strip a prefix; it strips all of the characters
in the passed-in string. B005 lint revealed this. Replaced with a
substring operation.
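
Concretely (a self-contained illustration of the bug and the fix; `strip_prefix` is a hypothetical name for the slicing-based replacement):

```python
def strip_prefix(s, prefix):
    """The fixed behaviour: remove a literal prefix via slicing."""
    return s[len(prefix):] if s.startswith(prefix) else s

# str.lstrip treats its argument as a *set* of characters, not a prefix,
# so it keeps stripping while the next character is in that set.
assert "banana".lstrip("ban") == ""           # every char is in {'b','a','n'}
assert strip_prefix("banana", "ban") == "ana" # only the literal prefix goes
```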

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530873

fbshipit-source-id: 13b3438fcc3cce13b5110730dc3d0b528a52930f
2019-03-21 07:56:24 -07:00
e5cdd94324 Backward function for torch.cdist
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17173

Differential Revision: D14111482

Pulled By: ifedan

fbshipit-source-id: d72cfd53c29d0f8cf5f8ad1148d14f3d5abd938e
2019-03-21 00:39:29 -07:00
2016daaf51 Fix ONNX symbolic for argmin and argmax (#18261)
Summary:
Fix the problem introduced in https://github.com/pytorch/pytorch/pull/17103
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18261

Reviewed By: bddppq

Differential Revision: D14558781

Pulled By: houseroad

fbshipit-source-id: 7bb50072e77d1d7b2a93f4011fa1362f26e9df1c
2019-03-20 22:51:13 -07:00
e04c9195b7 Update math::Transpose to support tensor with size > 2G (#17670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17670

Update math::Transpose to support tensor with size > 2G

i-am-not-moving-c2-to-c10

Differential Revision: D14313624

fbshipit-source-id: 0b4a85b913972e5a8981f0d40d0c539407b98f30
2019-03-20 18:22:21 -07:00
bbbabda4e8 handle dst_bin_width==0 case properly (#18240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18240

For rare cases when dst_bin_width == 0 we should just put all numbers into an arbitrary bin.
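
A guarded mapping along these lines (an illustrative sketch, not the actual dnnlowp code; the function name and clamping are assumptions):

```python
def dst_bin_index(value, dst_bin_begin, dst_bin_width, num_bins):
    """Map a value to a destination histogram bin, clamped to the valid
    range; when dst_bin_width == 0, every value lands in bin 0."""
    if dst_bin_width == 0:
        return 0
    idx = int((value - dst_bin_begin) / dst_bin_width)
    return max(0, min(num_bins - 1, idx))
```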

Reviewed By: csummersea

Differential Revision: D14544685

fbshipit-source-id: 02d04ff8bd1555d6cf7e7eeb1196a4ab3325a9e5
2019-03-20 17:11:25 -07:00
e12091d0a3 Revert D14114134: [asr] add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta
Differential Revision:
D14114134

Original commit changeset: 112bb2ceb9d3

fbshipit-source-id: 763262c1b78eed88a653caad5adc27d97feb43aa
2019-03-20 16:32:53 -07:00
7e6220393f Cleanup arg{min, max} (#17103)
Summary:
Why do we need this workaround? `PythonArgParser` handles these two cases well.

The discussion started at https://github.com/pytorch/pytorch/pull/6201#issuecomment-378724406. The conclusion at that time by goldsborough was:

> Because we wanted to allow `dim=None` in Python and route to a different function. Essentially the problem was wanting to wrap the C++ function in Python. AFAIK there is no way of translating `dim=None` behavior into C++? So Richard and I came up with this strategy

Maybe at that time `PythonArgParser` was not powerful enough to handle the routing of two functions with the same name but different C++ signatures.

Will keep an eye on the CI.
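
The `dim=None` routing being discussed amounts to the following dispatch, sketched in pure Python over nested lists (illustrative only — the real routing happens in `PythonArgParser` over C++ overloads):

```python
def argmax(values, dim=None):
    """dim=None: index of the max in the flattened 2-D input;
    otherwise reduce along the given dimension."""
    if dim is None:
        flat = [v for row in values for v in row]
        return flat.index(max(flat))
    if dim == 0:
        return [col.index(max(col)) for col in zip(*values)]
    return [row.index(max(row)) for row in values]
```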
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17103

Differential Revision: D14523503

Pulled By: VitalyFedyunin

fbshipit-source-id: cae3e2678062da2eccd93b51d4050578c7a9ab80
2019-03-20 16:28:27 -07:00
ebc9f75895 Added the exception of ignore_index (#18117)
Summary:
Fix #17801 to add an exception regarding `ignore_index` in the documentation for `torch.nn.CrossEntropyLoss` and `torch.nn.NLLLoss`

If any other files/functions are hit, I'd be glad to incorporate the changes there too! 😊
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18117

Differential Revision: D14542079

Pulled By: ezyang

fbshipit-source-id: 7b918ac61f441dde7d3d6782d080c500cf2097f1
2019-03-20 16:03:34 -07:00
fd35814348 Add .get() for dicts (#18238)
Summary:
Fixes #18232
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18238

Differential Revision: D14546689

Pulled By: driazati

fbshipit-source-id: ed021e6f54c891d6c734c8f2345f4e83a3c6c905
2019-03-20 14:57:13 -07:00
d5328a8a30 Update nccl submodule to 2.4.2 (#17883)
Summary:
Didn't test this. Let's see what happens.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17883

Differential Revision: D14547470

Pulled By: pietern

fbshipit-source-id: c35d232f6bcc5a2dce55da636a0acbea5c2725d8
2019-03-20 14:39:52 -07:00
83d84c22e4 Reinstate ncclCommDestroy (#17943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17943

Together with xw285cornell came up with a solution for static destruction
order fiasco that caused the NCCL context to be destroyed **after**
the CUDA context was already destroyed. In this commit we destroy all
cached NCCL contexts as soon as the last NCCL related Caffe2 operator
instance is destructed, thereby avoiding a dependency on static
variable destruction.

Reviewed By: xw285cornell

Differential Revision: D14429724

fbshipit-source-id: fe5ce4b02b1002af8d9f57f6fa089b7a80e316ce
2019-03-20 14:20:45 -07:00
272a48f6fe Enable autograd to recognize the XLA backend as one providing multiple devices (#17847)
Summary:
…e devices, while not being CUDA/HIP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17847

Differential Revision: D14545634

Pulled By: ezyang

fbshipit-source-id: 417181bf2ff4f8978544afe2fb6b042e787854ed
2019-03-20 13:58:36 -07:00
1b71f6d4eb add fbgemm fp16 (fbfcpacked) support, add global_init_net in predictor_export_meta (#17905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17905

Support adding ops in global_init_net, because pred_init_net is per thread and just doesn't cut it.

Reviewed By: jspark1105

Differential Revision: D14114134

fbshipit-source-id: 112bb2ceb9d3d5e663dd430585567f4eaa2db35f
2019-03-20 13:52:10 -07:00
1442808fcd fixed typo in shape_analysis.cpp (#18227)
Summary:
cc: VitalyFedyunin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18227

Differential Revision: D14541764

Pulled By: VitalyFedyunin

fbshipit-source-id: 9477deb1a99e6581f15a4de4d7631d747f56f3a6
2019-03-20 12:47:32 -07:00
18b31b73fb Retain the parameter names in ONNX exporter (#17551)
Summary:
So, we will keep the names of ONNX initializers the same as the names in the PyTorch state dict.

Later, we will make this as the default behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17551

Reviewed By: dzhulgakov

Differential Revision: D14491920

Pulled By: houseroad

fbshipit-source-id: f355c02e1b90d7ebbebf4be7c0fb6ae208ec795f
2019-03-20 12:11:23 -07:00
abc171bd53 Fix typo in docstring
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18216

Differential Revision: D14539824

Pulled By: ezyang

fbshipit-source-id: 490b72951a75f3f8b949a2d692d660a3693ee98a
2019-03-20 11:16:36 -07:00
a519217ee7 Add batched version of trtrs (#18025)
Summary:
- Remove single batch TH/THC implementations
- Remove `_batch_trtrs_lower` from `multivariate_normal`
- Add tests for batched behavior
- Modify trtrs_backward to accommodate for batched case
- Modify docs

In a future PR, this will be renamed to `triangular_solve`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18025

Differential Revision: D14523004

Pulled By: ifedan

fbshipit-source-id: 11c6a967d107f969b60e5a5c73ce6bb8099ebbe1
2019-03-20 11:11:32 -07:00
e312801453 Remove GLOO usage when USE_GLOO is OFF
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18203

Differential Revision: D14540520

Pulled By: soumith

fbshipit-source-id: f1c96cc563ed1e913040e3e16b109d3e3030128c
2019-03-20 09:31:53 -07:00
2a6cbfaccf Enable 32 bit CPU build on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18176

Differential Revision: D14539884

Pulled By: ezyang

fbshipit-source-id: 0e4bd9c1ef1830cd9bcc40df36b87534f61def08
2019-03-20 09:26:50 -07:00
19c13eee39 Correct cmake flags passing (#18217)
Summary:
Fixes #18214.

According to the CMake manual, we should pass the arguments first, and put the directory as the last element. Otherwise, these flags may not be passed correctly.

Reference:
1. https://cmake.org/cmake/help/latest/manual/cmake.1.html#synopsis
2. https://stackoverflow.com/a/27169347
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18217

Differential Revision: D14540588

Pulled By: ezyang

fbshipit-source-id: a027f585dde66c5da7bbbe584fa42c3e56027d59
2019-03-20 09:21:31 -07:00
bd1271338a Add python_variable._is_view for debugging. (#18197)
Summary:
I don't know if we actually want to expose this or not, but it's useful for debugging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18197

Reviewed By: ezyang

Differential Revision: D14530712

Pulled By: gchanan

fbshipit-source-id: 98fdba9cf113738f0db3a198c49365de536b9919
2019-03-20 08:43:02 -07:00
4741d613ee Do not apply these explicit unroll pragmas for ROCm. (#18204)
Summary:
Loop analysis indicates that there is a runtime trip count and hence
unrolling cannot take place.

This will silence compile-time warnings we have been observing with recent ROCm releases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18204

Differential Revision: D14539875

Pulled By: ezyang

fbshipit-source-id: a7ea7f2a95603754296b76a6b62a154f56f4ad4d
2019-03-20 08:06:07 -07:00
8f1db1c6c1 Copy-edit CONTRIBUTING and update. (#18131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18131
ghimport-source-id: 473dae70f6c236d317bec77d894310c0aa0376ec

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18131 Copy-edit CONTRIBUTING and update.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14505049

fbshipit-source-id: 02aeae33c0049889243c56dd0d761487dac2351e
2019-03-20 07:40:59 -07:00
8895bfba6a fix cosine_similarity (#18168)
Summary:
fixes #18057 according to colesbury's suggestion. Thanks!
cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18168

Differential Revision: D14520953

Pulled By: ailzhang

fbshipit-source-id: 970e6cfb482d857a81721ec1d0ee4a4df84a0450
2019-03-19 20:09:17 -07:00
3baf99bea7 Breakup test misc pt2 (#18191)
Summary:
Further breakup test_misc.h. The remaining tests don't directly map to a jit file so I left them in test_misc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18191

Differential Revision: D14533442

Pulled By: eellison

fbshipit-source-id: 7f538ce0aea208b6b55a4716dfcf039548305041
2019-03-19 19:41:22 -07:00
9d455ac2fe Add serialization docs to jit/README (#17951)
Summary:
Documents the serialization format for `torch.jit.save`. Some of the info is copied from houseroad's internal doc.

[Formatted Markdown](https://github.com/driazati/pytorch/blob/serial_docs/torch/csrc/jit/README.md)

Also refactors the readme to have a heading hierarchy + table of contents
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17951

Differential Revision: D14531644

Pulled By: driazati

fbshipit-source-id: cbcd9462054cc9f8a2f8cea2c98d8aba4e7d227c
2019-03-19 16:47:04 -07:00
08aa973fb8 Turn on Travis builds for ghstack PRs. (#18193)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18193
ghimport-source-id: 540859cf0b238a9832f45b3f4c2351e3343fc1a2

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18193 Turn on Travis builds for ghstack PRs.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14529945

fbshipit-source-id: 4476e996e311a04f2a997ca9b7c4cf2157dd6286
2019-03-19 14:51:07 -07:00
cd6a6c54c6 do not throw when unicode is seen in pull request info (#18195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18195
ghimport-source-id: 05102cb115c6bd6d141f51905e20155bcd79a908

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18195 [build] do not throw when unicode is seen in pull request info**

Differential Revision: D14529707

fbshipit-source-id: 2f6a31b01b3a9b044fd24be466cc5325b70929ad
2019-03-19 14:45:47 -07:00
6758f5587f Delete bugbear from Python 2 lint. (#18192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18192
ghimport-source-id: 9523a09d7ec202ef08cf0ecdf48c42739ea6b0ce

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18192 Delete bugbear from Python 2 lint.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14529240

fbshipit-source-id: 1a433b53dd38d1c455e8c0750d97c594ac51ef09
2019-03-19 14:24:03 -07:00
1bc4eb93c7 Support attributes when emitting function calls (#18156)
Summary:
The type of each `initial_ivalue` is completely known at some point, but that information is discarded by the time a call to it is emitted. This PR is kind of a hack; as a better (longer-term) solution, the method should know about the type of each initial value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18156

Differential Revision: D14525768

Pulled By: driazati

fbshipit-source-id: 52d53e9711a07a4551c988bd95fe997e654aa465
2019-03-19 13:56:40 -07:00
f212fd9fd6 Customized pin_memory for PackedSequence (#18079)
Summary:
fixes https://github.com/pytorch/pytorch/issues/18078
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18079

Reviewed By: ezyang

Differential Revision: D14521192

Pulled By: zou3519

fbshipit-source-id: cec773a3a6f2c405a0d9701e213b7caf81649181
2019-03-19 13:41:30 -07:00
916a670828 Enable flake8-bugbear line length checking. (#18138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18138
ghimport-source-id: be62a71ef98714e6f168a00f84120f612363528e

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18138 Enable flake8-bugbear line length checking.**

flake8-bugbear's line length checker (B950) which permits violations
of up to 10% but specifies the "true" limit when you go over.

I had to ignore a bunch of flake8-bugbear's other checks when I
turned this on.  They're good checks though (they're turned on
in fbcode) and we should fix them eventually.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: salexspb

Differential Revision: D14508678

fbshipit-source-id: 2610ecc0dd43cc0788d77f4d024ebd85b26b8d41
2019-03-19 13:31:04 -07:00
794c631e23 fix bug in alias analysis (#18146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18146
ghimport-source-id: 4b061c27c5c44ef0d06066490ed16cab3d0c7a64

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18146 [jit] fix bug in alias analysis**

We handled hasWriters() incorrectly in the case of wildcards. There's
even a comment describing the correct behavior. Sad!

Much thanks to t-vi for tracking this down and suggesting the fix!

Differential Revision: D14524208

fbshipit-source-id: 8010b54257241bd64013a0d0a8b6e7d22d8c70af
2019-03-19 11:35:28 -07:00
234bb8719a Add backend checks to solve methods (gesv, cholesky_solve) (#18116)
Summary:
Changelog:
- Incorporate a simple backend check in the linearSolveCheckInputs function in LinearAlgebraUtils.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18116

Differential Revision: D14504469

Pulled By: soumith

fbshipit-source-id: 7402b6dbaa8d73048946613b806d54f68bcbd8f4
2019-03-19 10:44:45 -07:00
7bb36ada1f fix -Wsign-compare warnings for some files inside c2 (#18123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18123

the motivation of this fix is to resolve patterns like:
`for (auto i = 0; i < N; i++)` where N is larger than an int32 can hold

These instances of comparison were found by enabling -Wsign-compare

There are way too many things to fix, so issuing this as a series of fixes

The plan is to fix all these issues and then enable this flag into Caffe2 to catch future instances

Reviewed By: ZolotukhinM

Differential Revision: D14497094

fbshipit-source-id: bca3927a2188bd33a508fa503ba221c220cdaefe
2019-03-19 10:39:20 -07:00
1c76746f61 SGD: remove unneeded multiply-add initialization operations (#18114)
Summary:
The momentum buffer is initialized to the value of
d_p, but the current code takes the long way to do this:
1. Create a buffer of zeros
2. Multiply the buffer by the momentum coefficient
3. Add d_p to the buffer

All of these can be collapsed into a single step:
1. Create a clone of d_p
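
The equivalence of the two paths can be checked with a small sketch (hypothetical variable names, assuming PyTorch is available):

```python
import torch

d_p = torch.randn(10)  # stand-in for a parameter gradient
momentum = 0.9

# Old path: allocate zeros, scale by the momentum coefficient, then add d_p.
buf_old = torch.zeros_like(d_p)
buf_old.mul_(momentum).add_(d_p)

# New path: a single clone of d_p.
buf_new = d_p.clone()

assert torch.equal(buf_old, buf_new)
```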
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18114

Differential Revision: D14509122

Pulled By: ezyang

fbshipit-source-id: 4a79b896201d5ff20770b7ae790c244ba744edb8
2019-03-19 10:34:17 -07:00
a50ba7e238 specialized CUDA impl for dropout in AD (#17756)
Summary:
In ATen we have a `_fused_dropout` implementation for the CUDA case. As ngimel suggested, discarding it in JIT AD hurts performance.

It doesn't seem ideal to include backend specific implementation in AD, but this is helpful to prevent performance regression atm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17756

Differential Revision: D14368999

Pulled By: ailzhang

fbshipit-source-id: 9a371c5020f630e8f6e496849ec9772b6f196169
2019-03-19 10:34:15 -07:00
9a153412fd Fix underflow issue with dirichlet sample (#17488)
Summary:
Addresses #15738, using fritzo's suggestion. This adds a `torch._sample_dirichlet` method in `Distributions.cpp` and `Distributions.cu`.
 - For CPU, this leads to no perf hit since all we do is to promote the `alpha` to double when getting the gamma samples (the gamma sampler anyways uses `accscalar_t`(double for CPU)) and cast it back to float32 on return.
 - I have added an analogous method for CUDA as well, but the default sampler for CUDA uses scalar_t for efficiency, so I have kept it as that. With this, I do not see the bias towards 1 as reported in #15738 with `float32`, but there is a spurious mode at 0.5, as would be expected. Users would need to explicitly use `float64` for GPU to not see the spurious mode at 0.5. (EDIT: see note below, it appears that the bias issue is still there for certain builds).

Added some tests and checked that there is no perf regression. My experience with C++ is very limited, so apologies in advance if I missed something basic. cc. ailzhang, fritzo, fmassa
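
The promote-then-cast workaround described above can be approximated from Python (a sketch using the public `Dirichlet` distribution, not the internal `_sample_dirichlet` path):

```python
import torch
from torch.distributions import Dirichlet

alpha = torch.full((8,), 0.05)  # small concentration values, prone to underflow

# Promote alpha to double for sampling, then cast the result back to float32.
sample = Dirichlet(alpha.double()).sample().float()

assert sample.dtype == torch.float32
assert abs(sample.sum().item() - 1.0) < 1e-4  # Dirichlet samples sum to 1
```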
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17488

Differential Revision: D14410301

Pulled By: ezyang

fbshipit-source-id: 62b2f694b4642685eab06db96d74ce28e05c3992
2019-03-19 10:34:13 -07:00
84fe20600d Kill Backend constructor of TensorOptions. (#18137)
Summary:
It's wrong and unused.  Use one of the many other constructors instead :).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18137

Differential Revision: D14508364

Pulled By: gchanan

fbshipit-source-id: 19c6ff78ad9d9221d0874425edd02b78627c4ca7
2019-03-19 08:00:21 -07:00
3a85f88efd Remove deviceTypeToBackend, which is underspecified. (#18135)
Summary:
There are multiple backends for a device type, so we just kill this function.
Also, kill an getNonVariableType instance which was also underspecified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18135

Differential Revision: D14507474

Pulled By: gchanan

fbshipit-source-id: fc791a76d4b851b23d09a070725f3838621eb13d
2019-03-19 07:53:28 -07:00
190c36bbc2 Stop generating unimplemented type methods. (#18144)
Summary:
This gets rid of 'aten_sparse' which was used at one time with legacy THS code, but is now only overloaded in native_parse.py.
The way that 'aten_sparse' worked was wonky -- it extended all backends (default [CPU, CUDA]) to include sparse.
But this is totally unnecessary; we already have the backends we need to generate for from type_method_definition_dispatch.

codegen changes: fc37c8e171/diff.txt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18144

Reviewed By: ezyang

Differential Revision: D14511324

Pulled By: gchanan

fbshipit-source-id: 8bb4ac4cf0985f8756790779a22bc229e18e8e7f
2019-03-19 07:42:06 -07:00
8ed2b88bf1 Corrected type of 'swap' in torch.nn.TripletMarginLoss (#18115)
Summary:
Fix #16428 by correcting type of 'swap' from `float` to `bool`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18115

Differential Revision: D14516615

Pulled By: ezyang

fbshipit-source-id: c61a45d533f3a443edf3c31c1ef3d9742bf46d2b
2019-03-19 07:09:15 -07:00
542c273e5b handle scenario when GPU support is not available and p2p_access_pattern is empty (#17974)
Summary:
Observed that when there is no GPU support available, `workspace` sets `GetGpuPeerAccessPattern` to `[]` in
https://github.com/pytorch/pytorch/blob/master/caffe2/python/workspace.py#L79
and this case is not handled in https://github.com/pytorch/pytorch/blob/master/caffe2/python/data_parallel_model.py#L1065.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17974

Differential Revision: D14517066

Pulled By: ezyang

fbshipit-source-id: 186911d95c07e9a55ab82a41d0c7c919e4281bb4
2019-03-18 23:11:54 -07:00
195cba500f Fix Caffe2 operator schemas (#15462) (#13229) (#18109)
Summary:
Maratyszcza harouwu yinghai

This is broken since #13065. `c_str()` returns a pointer that isn't permanent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18109

Differential Revision: D14516622

Pulled By: ezyang

fbshipit-source-id: 7113d92eac4f61479c4c7b323cf78cc8aa00b17e
2019-03-18 21:00:43 -07:00
afb2f2424a Increase line-width of Declarations.yaml (#18050)
Summary:
There are some line breaks in the schema_string entries of Declarations.yaml.
Is this valid YAML? Reading the YAML spec, it seems that the "|" indicator or single/double quotes are required to insert a line break:
https://yaml.org/spec/1.2/spec.html
![image](https://user-images.githubusercontent.com/2469618/54405834-1e53ac80-471b-11e9-9925-be13a109eb46.png)
Could you increase the line width of the YAML output to avoid these line breaks?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18050

Differential Revision: D14516694

Pulled By: ezyang

fbshipit-source-id: 1db9f3bf131b54a783d668de973915892603189e
2019-03-18 20:49:05 -07:00
86f1dd3fb0 Updating submodules
Reviewed By: yns88

fbshipit-source-id: eeeec4229e05916f2c17e525aee5ac4465ef52db
2019-03-18 20:40:35 -07:00
2737d2c7dc delete unnecessary file .gitkeep (#18136)
Summary:
delete unnecessary file .gitkeep in /pytorch/tree/master/torch/csrc/nn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18136

Differential Revision: D14516584

Pulled By: ezyang

fbshipit-source-id: a7555693cb3df1c5e37fcd3ca9bb379a2258f2d1
2019-03-18 20:31:25 -07:00
3d44305e9d Attribute serialization (#17423)
Summary:
Allows serialization/loading of attributes (`IValue`s of any type).
* metadata (attribute name, type) is stored in the `model.json`
* The binary format is a subset of the `pickle` module that supports the operations necessary for `IValue`s
    * Attributes are serialized in the order they are defined on a module to a list in a single `attributes` file, with submodule attributes coming first. This order directly matches the order attributes are listed in `model.json`
    * This can be inspected in Python with `pickle.load()` or with `pickletools` (PyTorch need not be installed for this to work)
        * A class is used to store a tensor's index into the tensor table of the model, so to unpickle the file you have to use a custom Unpickler:
        ```python
        import pickle

        class TensorID(object):
            def __setstate__(self, id):
                self.id = id

        class JitUnpickler(pickle.Unpickler):
            def find_class(self, module, name):
                if module == '__main__' and name == 'TensorID':
                    return TensorID

        JitUnpickler(open("my_model/attributes.pkl", "rb")).load()
        ```
    * pickle format: https://svn.python.org/projects/python/trunk/Lib/pickletools.py
* It currently does not support/guarantee that anything saved out with `pickle` (i.e. if you edit `attributes` with `pickle` directly) instead of our tools will be imported correctly

Also will fix #17683 and fix #16367

Followup Work:
* document format / choice of pickle: #17951
* create an example
* list specializations
* int size specializations, large binputs
* do a first pass over attributes to output only necessary `BINPUT` ops
* attribute reassignment (e.g `self.my_attribute = new_value`)
* `tensor.save("some_checkpoint.pkl")` support with tensors embedded in Pickle file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17423

Differential Revision: D14470965

Pulled By: driazati

fbshipit-source-id: 6a21a9939efdbe59b4bc57fd31d6d630bab5297e
2019-03-18 18:18:22 -07:00
87b6cbb6fd fix bug in pool_dnnlowp_op_avx2.cc (#18141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18141

VLEN should've been 32

Reviewed By: jianyuh

Differential Revision: D14510780

fbshipit-source-id: ddf12746e1c69677a268432432ddb088cc210084
2019-03-18 16:31:42 -07:00
0a8efce51e Updating submodules
Reviewed By: yns88

fbshipit-source-id: ed297c07c681f5f45d3f99edf48680015ca5b138
2019-03-18 16:21:23 -07:00
421b508d55 Rename gesv to solve (#18060)
Summary:
Changelog:

- Renames `gesv` to `solve` to remain consistent with `cholesky_solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `solve` under the name `gesv`, and add a deprecation warning to discourage its use.
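
The tentative-alias pattern can be sketched generically (hypothetical functions standing in for the real torch signatures):

```python
import warnings

def solve(a, b):
    # stand-in for the renamed implementation
    return "solution"

def gesv(a, b):
    warnings.warn("gesv is deprecated in favor of solve",
                  DeprecationWarning, stacklevel=2)
    return solve(a, b)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = gesv(None, None)

assert result == "solution"
assert issubclass(caught[0].category, DeprecationWarning)
```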
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18060

Differential Revision: D14503117

Pulled By: zou3519

fbshipit-source-id: 99c16d94e5970a19d7584b5915f051c030d49ff5
2019-03-18 16:04:24 -07:00
0eb4f7aa71 Modify BeamSearch to support CharSourceEncoder (#18107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18107

Pull Request resolved: https://github.com/pytorch/translate/pull/396

also:

1. fix issues with OptionalType not having a createWithContainedType (PyTorch diff)
2. Delete tests for ONNX full beam search export (nobody is using it and it just makes things harder. Currently ONNX doesn't support `_unwrap_optional`)

Reviewed By: jmp84

Differential Revision: D14483771

fbshipit-source-id: 0e37ef1cb5a16d03a535eef808b0488b98802128
2019-03-18 14:11:57 -07:00
670f509984 Circular Convolution Function via circular padding (#17240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17240

Added circular padding in addition to zero padding to Conv1D, Conv2D and Conv3D based on the solution suggested in: https://github.com/pytorch/pytorch/issues/3858
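
A minimal illustration of circular vs. zero padding via `torch.nn.functional.pad` (a sketch assuming a PyTorch build with circular padding support):

```python
import torch
import torch.nn.functional as F

x = torch.arange(5.0).view(1, 1, 5)        # [[[0, 1, 2, 3, 4]]]
zero = F.pad(x, (1, 1))                    # zeros inserted at the borders
circ = F.pad(x, (1, 1), mode='circular')   # values wrap around the boundary

assert zero.flatten().tolist() == [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 0.0]
assert circ.flatten().tolist() == [4.0, 0.0, 1.0, 2.0, 3.0, 4.0, 0.0]
```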

Reviewed By: ezyang

Differential Revision: D14126416

fbshipit-source-id: a2f1587503ee0cfff98d5cb0d5b0a600ef8aaeb4
2019-03-18 12:33:20 -07:00
2b7a5d1876 don't include /usr/include when nvcc is in /usr/bin (#18127)
Summary:
...because gcc will have failures with very strange error messages
if you do.

This affects people with Debian/Ubuntu-provided NVCC, the PR should
not change anything for anyone else.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18127

Differential Revision: D14504386

Pulled By: soumith

fbshipit-source-id: 1aea168723cdc71cdcfffb3193ee116108ae755e
2019-03-18 12:18:27 -07:00
ed36fd30c8 fix double free in test_jit (#18121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18121
ghimport-source-id: 70c273bfbcb68f7b25cf87f5614c662960864758

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18121 [jit] fix double free in test_jit**

These definitions used to be in anonymous namespace so they weren't exported from the translation unit. #18071 put those in a `test` namespace so I guess they were getting their destructors called twice on exit somehow. Making them static again fixes the problem.

Reviewed By: ezyang

Differential Revision: D14498349

fbshipit-source-id: f969781695dcbebdfcfce667fce5b986222a373e
2019-03-18 09:59:13 -07:00
754bf595ca Replace resize_dim with set_sizes_and_strides in THTensor_(squeeze) (#18059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18059

Replace resize_dim() with set_sizes_and_strides() in `THTensor_(squeeze)` in aten/src/TH/generic/THTensor.cpp and `THCTensor_(squeeze)` in aten/src/THC/generic/THCTensor.cpp

Reviewed By: ezyang

Differential Revision: D14471066

fbshipit-source-id: 1c8c412ff09246c4df6843736e3bf0279bfadea8
2019-03-18 08:52:58 -07:00
2e311d2003 update exp. family doc (#18118)
Summary:
Sphinx doesn't understand the hyphen: it does not merge the two hyphenated halves back together in the HTML output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18118

Differential Revision: D14498012

Pulled By: mrshenli

fbshipit-source-id: d6f4cfddc0a8e3a8f91578da43c26ca9c6fff3ce
2019-03-17 21:39:42 -07:00
fe22871b49 Change one_hot from IndexTensor to Tensor. (#18073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18073
ghimport-source-id: f4dadebafa0423c4c5a0e46c15b38129402d830a

Stack:
* #18072 properly device_guard IndexTensor and BoolTensor.
* **#18073 Change one_hot from IndexTensor to Tensor.**

There is no codegen change.

Reviewed By: ezyang

Differential Revision: D14485248

fbshipit-source-id: ee2ba8e5dcbbbaf0214a026c8e7ed4e6712becb0
2019-03-17 15:40:40 -07:00
3c2fccc1b4 properly device_guard IndexTensor and BoolTensor. (#18072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18072
ghimport-source-id: 9653731602c72f299e095dd50e3afe6bcc8b01d6

Stack:
* **#18072 properly device_guard IndexTensor and BoolTensor.**
* #18073 Change one_hot from IndexTensor to Tensor.

Currently IndexTensor and BoolTensors do not have device_guards applied to them.
This is bad in the case where the only tensor(s) are IndexTensors or BoolTensors, because no device guard is present.

The only case this currently happens is with one_hot which ends up not mattering because of the way the implementation is written.  But I wanted to make sure we are covered here.

Reviewed By: ezyang

Differential Revision: D14485249

fbshipit-source-id: e57b28086fa1ad2fdd248bb1220e8a2e42da03e1
2019-03-17 15:40:39 -07:00
f9ad125e39 fix corner case for optional aliasing (#18093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18093
ghimport-source-id: 021adc52aa7bfe5fff74531c76a8cd28cab30b2a

Stack:
* **#18093 [jit] fix corner case for optional aliasing**

Occasionally the compiler can insert constant Nones to make types line
up. In that case, don't try to make a pointer from the optional type to
None, since we know statically that None won't be mutated or whatever.

Reviewed By: shannonzhu

Differential Revision: D14493004

fbshipit-source-id: 6564065f39d99ee5af664f3a0fe235892973d9be
2019-03-17 14:56:40 -07:00
96fe2b4ecb Typo fix (#18089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18089

Typo fix for the fully connected layer documentation.

Reviewed By: jspark1105

Differential Revision: D14488632

fbshipit-source-id: ca0271ca0250c1d653ed7f250e8588f7b2ce1056
2019-03-16 15:07:01 -07:00
da3cc6e7ee Caffe2 - Add flag to fail if floating point exceptions are detected in operator runs (#18040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18040

Add a flag that fails the run if a floating point exception is detected during an operator's execution

Sample exception

Exception [enforce fail at operator.h:837] !std::fetestexcept(FE_DIVBYZERO). Division by zero floating point exception (FE_DIVBYZERO) reported.
Error from operator:
input: "1" input: "0" output: "out" name: "" type: "Div"

Reviewed By: jspark1105

Differential Revision: D14467731

fbshipit-source-id: fad030b1d619a5a661ff2114edb947e4562cecdd
2019-03-16 12:28:05 -07:00
0fe6e8c870 Remove ComputeLibrary submodule
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18052

Reviewed By: ezyang

Differential Revision: D14477355

fbshipit-source-id: c56b802f6d69701596c327cf9af6782f30e335fa
2019-03-16 09:06:42 -07:00
c7448aa13c remove unused parameters in optimizer tests (#18084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18084

data_strategy parameter was not used in some of unit tests for optimizers

Reviewed By: hyuen

Differential Revision: D14487830

fbshipit-source-id: d757cd06aa2965f4c0570a4a18ba090b98820ef4
2019-03-15 18:06:15 -07:00
be364ac8d7 Specify overload name in function schema (#18037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18037

The FunctionSchema can now store an overload name and the parser knows how to parse it. Specify like this:

    my_func.overload1(arg1: Tensor) -> Tensor
    my_func.overload2(arg1: Tensor, arg2: Tensor) -> Tensor
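
The name/overload split can be sketched in a few lines (a simplified parsing rule, not the real FunctionSchema parser):

```python
def split_overload(schema):
    """Split 'name.overload(args...)' into (name, overload_name)."""
    head = schema.split('(', 1)[0].strip()
    if '.' in head:
        name, overload = head.split('.', 1)
    else:
        name, overload = head, ''
    return name, overload

assert split_overload("my_func.overload1(arg1: Tensor) -> Tensor") == \
    ("my_func", "overload1")
assert split_overload("my_func(arg1: Tensor) -> Tensor") == ("my_func", "")
```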

Reviewed By: zdevito

Differential Revision: D14467497

fbshipit-source-id: 8832b32f07351bb61090357b17b77a6a2fed3650
2019-03-15 16:58:13 -07:00
7a3488e0fc Expose c10 cuda ops to caffe2 (#18036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18036

- Add macros to export c10 cuda operators to caffe2 frontend
- Instead of having a separate caffe2 registry for the c10 operator wrappers, use the existing caffe2 registries

Reviewed By: ezyang

Differential Revision: D14467495

fbshipit-source-id: 7715ed2e38d2bbe16f1446ae82c17193a3fabcb9
2019-03-15 16:58:12 -07:00
cb2ea17707 Automatic update of fbcode/foxi to 2bcc4064c90e87b9638615c733485f07c47b7558 (#18070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18070

Previous import was d1f45b1a2b1585d0e9bc65e15e463db344fc3ff6

Included changes:
- **[2bcc406](https://github.com/houseroad/foxi/commit/2bcc406)**: Merge pull request #7 from jackm321/tracing_fixes <Jack Montgomery>
- **[c39033c](https://github.com/houseroad/foxi/commit/c39033c)**: Fixes for tracing events <Jack Montgomery>
- **[50912cf](https://github.com/houseroad/foxi/commit/50912cf)**: Merge pull request #5 from jackm321/add_trace_events <Jack Montgomery>
- **[ba2fdcb](https://github.com/houseroad/foxi/commit/ba2fdcb)**: Merge pull request #5 from jackm321/add_trace_events <Jack Montgomery>
- **[7d42b12](https://github.com/houseroad/foxi/commit/7d42b12)**: address comments <Jack Montgomery>
- **[dcabd8d](https://github.com/houseroad/foxi/commit/dcabd8d)**: Add trace events interface <Jack Montgomery>

Reviewed By: houseroad

Differential Revision: D14483201

fbshipit-source-id: f51ed869c9a89521079df89903abc0ac0a45ac7b
2019-03-15 16:49:08 -07:00
d1843d4173 Add backwards compatibility and other fixes to Dispatch macros. (#17996)
Summary:
Changes:
1) https://github.com/pytorch/pytorch/pull/17527 changed dispatch macros to be ScalarType based instead of at::Type based.  This broke cpp extensions that relied on dispatch macros.  Since IMO these should be ScalarType based (and some extensions have already updated), we allow either at::Type or at::ScalarType to be passed, but passing at::Type will result in a deprecated warning.

2) Reintroduce macros that were deleted (AT_DISPATCH_ALL_TYPES_AND_HALF, AT_DISPATCH_COMPLEX_TYPES, AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX, AT_DISPATCH_ALL_TYPES_AND_COMPLEX); the AND_HALF ones now give a deprecated warning because there are more extensible macros that were introduced in their place.

3) Makes AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND into a ScalarType based macro (and updates usages).  This was the result of a logical merge conflicts.

4) Adds a new macro, C10_DEPRECATED_MESSAGE for passing a deprecated message to the compiler.  I didn't spend much time seeing if this can be enabled for versions before C++14.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17996

Reviewed By: ezyang

Differential Revision: D14446203

Pulled By: gchanan

fbshipit-source-id: 1da56e2e9c15aa8f913ebbf6bf1110c5b6dc375e
2019-03-15 14:21:46 -07:00
f3806094d5 Breakup Test Misc (batch 1/2) (#18071)
Summary:
Breakup test_misc so that a test for a file is in test_filename. I think we might want to wait on moving test files into the source directory, since that would involve moving some tests over to the C10 folder, and this goes 99% of the way for test discoverability IMO anyway.

I added a file test_utils for common functions invoked in the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18071

Differential Revision: D14485787

Pulled By: eellison

fbshipit-source-id: dcb20d1978d490999d435ea20c1d0503413a5c80
2019-03-15 13:56:19 -07:00
aafbefa4d6 Remove the identical if branch (#18019)
Summary:
The elif branch and the else branch have identical content, so one of them can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18019

Differential Revision: D14475107

Pulled By: ezyang

fbshipit-source-id: 5075cc938f57649af7537de1a7c9d76ea976cafc
2019-03-15 13:14:26 -07:00
80a7eac79e Remove Type::elementSizeInBytes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17785

Reviewed By: ezyang

Differential Revision: D14379074

fbshipit-source-id: 60727f187d61eb571b144bd6eed4dd4908da0b51
2019-03-15 12:56:02 -07:00
9a8a268672 add index and count to list (#17446)
Summary:
see https://github.com/pytorch/pytorch/issues/16662
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17446

Differential Revision: D14461293

Pulled By: Krovatkin

fbshipit-source-id: 03572467cdf85efc909c1864c0558a93085c8ff3
2019-03-15 12:45:17 -07:00
001cffed9d ONNX Export IsNan op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17698

Reviewed By: zrphercule

Differential Revision: D14470646

Pulled By: houseroad

fbshipit-source-id: d3e6adc83c4f9fa288c5fe0ae4c6af71fdd47905
2019-03-15 12:19:03 -07:00
18f721fb9a support serialization of classes (#17856)
Summary:
Stack:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **#17856 [jit] support serialization of classes**&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D14402599/)

Add support for saving/loading TorchScript modules that depend on user-defned classes.

We track class dependencies the same we track tensor constants, then write them
all out such that we can just compile them in order before compiling the module
hierarchy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17856

Reviewed By: shannonzhu

Differential Revision: D14461599

Pulled By: suo

fbshipit-source-id: 7115f87e069fd00dc8381d7de9997864fef7ea9f
2019-03-15 12:06:23 -07:00
cd26200d1b add reverse to list (#17001)
Summary:
Add reverse functionality to list. See https://github.com/pytorch/pytorch/issues/16662

```python
import torch

@torch.jit.script
def foo():
    a = [1, 2, 3, 4]
    a.reverse()
    return a
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17001

Reviewed By: eellison

Differential Revision: D14092019

Pulled By: driazati

fbshipit-source-id: b353c763677c22312b64dde0db268e2988610ba1
2019-03-15 11:53:37 -07:00
b420f8ff70 1/2 Add Tracing support for C2 Ops (#17899)
Summary:
The C10 ops are not registered as custom ops in PyTorch, so we have to add explicit support for them as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17899

Reviewed By: dzhulgakov

Differential Revision: D14436999

Pulled By: houseroad

fbshipit-source-id: a31fdf13a5c84f9b156a7288e0ffa57deb23b83f
2019-03-15 11:48:34 -07:00
3b5ddaf034 Delete dead code in THTensorMoreMath.cpp (#17993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17993
ghimport-source-id: 5427773f6306bdeddffd9a3ae032acc3f253f458

Stack:
* #17926 Implement at::has_internal_overlap helper function
* #17927 Error out on in-place (unary) ops on tensors that have internal overlap
* **#17993 [easy] Delete dead code in THTensorMoreMath.cpp**

We seem to have new implementations already for these in ATen.

Reviewed By: ezyang

Differential Revision: D14457838

fbshipit-source-id: 8481aad74b2127bd28c0f3e09740889fc0488a31
2019-03-15 07:50:20 -07:00
3c977fb7ce Error out on in-place (unary) ops on tensors that have internal overlap (#17927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17927
ghimport-source-id: 626d321e430b6b5c0ea3aa1eb9df8c1e2d058bf8

Stack:
* #17926 Implement at::has_internal_overlap helper function
* **#17927 Error out on in-place (unary) ops on tensors that have internal overlap**

On the way to #17935.

Works for CPU and CUDA on the following ops:
- abs_, acos_, asin_, atan_, ceil_, cos_, erf_, erfc_, exp_, expm1_
- floor_, log_, log10_, log1p_, log2_, round_, rsqrt_,
- sin_, sqrt_, tan_, tanh_, trunc_

This PR adds a check to see if the out/result tensor has internal
overlap. If it does, then we error out because the result **may** be
incorrect.

This is overly conservative; there are some cases where if the result is
the same as the input, the inplace operation is OK (such as floor_,
round_, and trunc_). However, the current code isn't organized in such a
way that this is easy to check, so enabling those will come in the future.

Reviewed By: ezyang

Differential Revision: D14438871

fbshipit-source-id: 15e12bf1fdb2ab7f74bb806e22bc74840bd6abd1
2019-03-15 07:50:19 -07:00
a4123decf7 Implement at::has_internal_overlap helper function (#17926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17926
ghimport-source-id: 9f7572b5d43e474492363fa17dcb86a6c27ca13c

Stack:
* **#17926 Implement at::has_internal_overlap helper function**
* #17927 Error out on in-place (unary) ops on tensors that have internal overlap

On the way to #17935.

Checks if a tensor's sizes/strides indicate that multiple elements share
the same memory location. This problem in general is hard so
at::has_internal_overlap implements two heuristics and avoids solving
the general problem:

if a tensor is contiguous, it cannot have internal overlap
if a tensor has any zero strides, it does have internal overlap
otherwise, return MemOverlap::kTooHard to indicate that there might be
overlap, but we don't know.
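
The two heuristics can be sketched in Python against real tensors (a sketch, not the ATen implementation; the zero-stride check here is refined to dimensions of size > 1):

```python
from enum import Enum
import torch

class MemOverlap(Enum):
    NO = 0
    YES = 1
    TOO_HARD = 2

def has_internal_overlap(t):
    # A contiguous tensor cannot have internal overlap.
    if t.is_contiguous():
        return MemOverlap.NO
    # A zero stride on a dimension of size > 1 means elements alias.
    if any(st == 0 and sz > 1 for st, sz in zip(t.stride(), t.shape)):
        return MemOverlap.YES
    return MemOverlap.TOO_HARD

assert has_internal_overlap(torch.randn(3, 3)) is MemOverlap.NO
expanded = torch.randn(3, 1).expand(3, 4)     # stride (1, 0): shared memory
assert has_internal_overlap(expanded) is MemOverlap.YES
transposed = torch.randn(4, 4).t()            # non-contiguous, no zero stride
assert has_internal_overlap(transposed) is MemOverlap.TOO_HARD
```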

Reviewed By: ezyang

Differential Revision: D14438858

fbshipit-source-id: 607ab31771315921ab6165b2a1f072ac3e75925a
2019-03-15 07:50:17 -07:00
ea652973f2 Fix truncation of default float values in JIT signatures. (#18044)
Summary:
In Python 2, float values get truncated.  We are storing default float values as (single-precision) floats (not 100% sure why?), which results in the defaults being truncated in the JIT and not matching the (specified) native function signatures.
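
The precision loss from round-tripping a double-precision default through a single-precision float can be shown with the stdlib `struct` module (an illustration of the failure mode, not the JIT code path):

```python
import struct

default = 1e-3  # a typical double-precision default value
# Pack as a 32-bit float and unpack again: the value changes slightly.
as_float32 = struct.unpack('f', struct.pack('f', default))[0]

assert as_float32 != default            # the float32 round-trip altered it
assert abs(as_float32 - default) < 1e-9  # but only in the low-order bits
```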
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18044

Reviewed By: ezyang

Differential Revision: D14469868

Pulled By: gchanan

fbshipit-source-id: a456de599e8dab106966bcac7a6033f02ce3cdd2
2019-03-15 07:43:15 -07:00
40074d647c Allow None for checkpoint (#17969)
Summary:
Currently, we cannot run a checkpointed function with None argument.

```python
out = torch.utils.checkpoint.checkpoint(run_fn, input_var, None)
```

```
  File "/home/tunz/anaconda3/envs/torchdev/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 14, in detach_variable
    x = inp.detach()
AttributeError: 'NoneType' object has no attribute 'detach'
```

This PR makes the checkpoint function safely handle None arguments.
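A minimal sketch of the fix's idea: pass `None` through instead of calling `.detach()` on it. The stub tensor below stands in for a real tensor; this is not the actual `torch.utils.checkpoint` code:

```python
def detach_variable(inputs):
    out = []
    for inp in inputs:
        if inp is None:
            out.append(None)  # skip .detach() for absent arguments
        else:
            out.append(inp.detach())
    return tuple(out)

class StubTensor:
    # stand-in for a tensor, just enough to exercise the sketch
    def detach(self):
        return self
```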
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17969

Differential Revision: D14475148

Pulled By: ezyang

fbshipit-source-id: 9afe9e9aac511a6df1e1620e9ac341536890d451
2019-03-15 07:38:41 -07:00
54ef852d7f Fix unclosed files in download.py, test_onnxifi.py, test_trt.py (#18017)
Summary:
According to https://docs.python.org/3/tutorial/inputoutput.html, it is good practice to use the "with" keyword when dealing with file objects. If not, you should call f.close() to close the file and immediately free up any system resources used by it. Thus, I adjusted the file-opening code to use "with open() as f".
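The pattern the PR switches to, sketched on a throwaway temp file (the helper name is made up for illustration):

```python
import os
import tempfile

def write_then_read(text):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # "with" closes each handle automatically, even if an exception is raised
    with open(path, "w") as f:
        f.write(text)
    with open(path) as f:
        data = f.read()
    os.remove(path)
    return data
```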
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18017

Differential Revision: D14475112

Pulled By: ezyang

fbshipit-source-id: d1c0821e39cb8a09f86d6d08b437b4a99746416c
2019-03-15 07:29:46 -07:00
785c76584c Run multi-gpu (single host) resnet50 and resnext101 training in bench (#18043)
Summary:
This now works in ROCm 2.2.

cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18043

Differential Revision: D14477493

Pulled By: bddppq

fbshipit-source-id: 4d2dab1d5dbdbd4d6189162c074b19c4e9882c7d
2019-03-15 02:51:54 -07:00
8f07a9da30 Update nonzero onnx export (#18047)
Summary:
The output format of NonZero in ONNX (matching numpy, https://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html) differs from that in PyTorch:
in ONNX it is `[rank_of_input, num_of_nonzeros]`, whereas in PyTorch it is `[num_of_nonzeros, rank_of_input]`.
To resolve the difference, the exporter adds a Transpose op after the NonZero output.
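The shape difference, sketched with plain lists: transposing the ONNX-style `[rank, num_nonzeros]` layout into PyTorch-style `[num_nonzeros, rank]` (a hypothetical helper, not the exporter code):

```python
def transpose2d(rows):
    # [rank_of_input, num_of_nonzeros] -> [num_of_nonzeros, rank_of_input]
    return [list(col) for col in zip(*rows)]
```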
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18047

Differential Revision: D14475081

Pulled By: ezyang

fbshipit-source-id: 7a3e4899f3419766b6145d3e9261e92859e81dc4
2019-03-14 22:19:20 -07:00
e21aa16931 more careful use of auto in sparse operations (#17958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17958

In some places we need 64-bit for corner cases, even though they are rare.
In other places, we were using 64-bit unnecessarily.

Reviewed By: hyuen

Differential Revision: D14435523

fbshipit-source-id: e01ab73029ff780133af7ff4bbbe2e17926ed5a2
2019-03-14 22:10:42 -07:00
30b80de876 Update caffe2 docker images tag to 253 (#18031)
Summary:
To use ROCm 2.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18031

Reviewed By: ezyang

Differential Revision: D14469242

Pulled By: bddppq

fbshipit-source-id: c969bcf95dabe067d7b1a2cf6e07209e11148ec1
2019-03-14 20:53:07 -07:00
8362177bcf Fix typo (#17949)
Summary:
Fix a very common typo in my name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17949

Differential Revision: D14475162

Pulled By: ezyang

fbshipit-source-id: 91c2c364c56ecbbda0bd530e806a821107881480
2019-03-14 20:11:33 -07:00
1ba1ca0acb Update to ROCm2.2 (#18007)
Summary:
ROCm 2.2 was released today; if we respin the CI docker images with the attached, PyTorch/Caffe2 will support ROCm 2.2.

Changes necessary:
* for the Ubuntu target, HIP PR 934 needs to be applied to fix the forceinline definition. ROCm 2.3 will contain this.
* two unit tests proved flaky on different platforms; disable them defensively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18007

Differential Revision: D14473903

Pulled By: bddppq

fbshipit-source-id: b1939f11d1c765a3bf71bb244b15f6ceb0e816d3
2019-03-14 18:47:22 -07:00
8b32933ea1 fix clang-tidy (#18030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18030
ghimport-source-id: d68781226eee923c90be862ef54693feef5f1c1a

Stack:
* **#18030 [jit] fix clang-tidy**

fix the following complaint
```
pytorch/torch/csrc/jit/ir.cpp:84:7: error: pass by value and use std::move [modernize-pass-by-value,-warnings-as-errors]
      const std::string& delim = ", ")
      ^~~~~~~~~~~~~~~~~~
      std::string
```

Reviewed By: shannonzhu

Differential Revision: D14466714

fbshipit-source-id: 195cba335ae656db28fc6230b9e56ad208c88c29
2019-03-14 17:31:08 -07:00
e782f200f7 Allow fewer arguments than the max in ArgumentSpec (#17826)
Summary:
Fixes #17558

The flattened tuple `Optional[Tuple[int, int]]` could either result in 1 (`None`) or 2 (`int` and `int`) values, so allow this case in `ArgumentSpec`
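The two legal flattenings, sketched as a hypothetical helper (not the real `ArgumentSpec` code):

```python
def flatten_optional_tuple(value):
    # None occupies a single flattened slot; a present tuple
    # contributes one slot per element.
    if value is None:
        return [None]
    return list(value)
```

An `Optional[Tuple[int, int]]` therefore flattens to either 1 or 2 values, and `ArgumentSpec` must accept both counts.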
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17826

Differential Revision: D14415290

Pulled By: driazati

fbshipit-source-id: 971bfa39502cfb8f08a991f16ffed6d138e48dc9
2019-03-14 16:54:44 -07:00
9de4350b77 Automatic update of fbcode/foxi to d1f45b1a2b1585d0e9bc65e15e463db344fc3ff6 (#18028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18028

Previous import was 520e8e135f1ad75959bf9b5bd15c361b8caeb8d6

Included changes:
- **[d1f45b1](https://github.com/houseroad/foxi/commit/d1f45b1)**: update the gitignore (#6) <Lu Fang>
- **[398135c](https://github.com/houseroad/foxi/commit/398135c)**: Remove static variable in header (#3) <Lu Fang>
- **[f817be1](https://github.com/houseroad/foxi/commit/f817be1)**: sync to ONNX cb544d07cc022e3fe83622fda9b2b1fa00b75b89 (#2) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D14464213

fbshipit-source-id: b5d166f05f7fd503dec11d676e219cc6c6a373f9
2019-03-14 15:47:36 -07:00
d3e3b246ea Use std::isnan instead of self-comparison. (#18021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18021
ghimport-source-id: 03423ba47ba5900c2b400c4457b148147ce8b35e

Stack:
* **#18021 Use std::isnan instead of self-comparison.**
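The self-comparison idiom being replaced relies on NaN being the only value unequal to itself; in Python the two spellings look like this:

```python
import math

def is_nan_self_compare(x):
    return x != x  # true only for NaN, but easy to misread

def is_nan_explicit(x):
    return math.isnan(x)  # states the intent directly
```

Both are correct for IEEE floats; the explicit form (like `std::isnan` in C++) is simply clearer.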

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: soumith

Differential Revision: D14460699

fbshipit-source-id: d8feb7f3f0e93996bd1b4f4aea163548b1d12437
2019-03-14 15:41:36 -07:00
b263a2d8a1 Unroll If ops when doing ONNXIFI transform (#18039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18039

We basically flatten the whole net in order to ease the ONNXIFI transform. An alternative is to ONNXIFI the internal net of the If op, which can be done by adding interfacing inputs/outputs so that what the internal then_net or else_net refers to is wired to the inputs/outputs of the If op. This is left as a TODO option.

Reviewed By: zrphercule

Differential Revision: D14452132

fbshipit-source-id: 00ad48d40da6fb8eabf9cca36701bcf61cbe4edc
2019-03-14 14:51:24 -07:00
77d6d9e1b8 Minor improvements to ONNXIFI transform (#17964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17964

1. Make the output of TensorFill outputs CONSTANT during shape inference
2. Add option to avoid adding BatchAdjust ops
3. Add option to avoid lowering subgraph that's smaller than a limit

Reviewed By: hyuen

Differential Revision: D14360903

fbshipit-source-id: b3c5966b44e7cd0d56428acd6cc97f529b36b171
2019-03-14 14:51:23 -07:00
3057580c89 Run fp16 resnext101 training in bench script (#17963)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17963

Differential Revision: D14453445

Pulled By: bddppq

fbshipit-source-id: 7ca0e0c33ae89d4d4cf6ddba321daf4d6b2d5ed6
2019-03-14 14:41:37 -07:00
Jie
6458a6f0fc Tensor Iterator loop unrolling (#17667)
Summary:
Modified the Tensor Iterator GPU reduction kernel.
Creating multiple accumulators during the thread reduce removes the data
dependency between unrolled loops and exposes instruction-level parallelism
that benefits latency-bound kernels (e.g. Welford, used by `torch.std`).

This approach increases register usage, so we need to tune the unrolling
factor to prevent register spilling.
The current implementation tunes the unrolling factor down to 2 for Welford
(a register-heavy kernel), while keeping it unchanged (4) for the rest of
the reduction kernels.
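The idea in miniature, as a pure-Python sketch of a sum reduction (the real kernel is CUDA; the names here are made up). Each accumulator carries an independent dependency chain, so the unrolled additions can execute in parallel:

```python
def sum_unrolled(xs, unroll=4):
    # Independent accumulators break the serial add-add-add chain.
    accs = [0.0] * unroll
    n = len(xs)
    i = 0
    while i + unroll <= n:
        for k in range(unroll):
            accs[k] += xs[i + k]
        i += unroll
    # Handle the remaining tail elements, then combine the accumulators.
    return sum(accs) + sum(xs[i:])
```

Lowering `unroll` trades instruction-level parallelism for fewer live registers, which is the tuning knob described above.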
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17667

Differential Revision: D14368325

Pulled By: umanwizard

fbshipit-source-id: 9d64c0dccabdb1b7c3922a6557224af704a1974e
2019-03-14 14:09:01 -07:00
9506779a73 Temp fix for TileOp backward compatibility (#18026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18026

Temp fix for TileOp backward compatibility

Reviewed By: kittipatv

Differential Revision: D14463672

fbshipit-source-id: 1f3ec550245cb63f1bc4f26196b9334cfe5d0705
2019-03-14 13:54:29 -07:00
e862243abe add a dump method to TreeViews (#17965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17965
ghimport-source-id: 0d3d6340141d8413ce524a8d8ed0d308854ee7ef

Stack:
* (to be filled)

Also added it to the python bindings. Not for any particular reason,
just because otherwise the function gets elided (even in debug mode!)
and thus can't be called from the debugger

Reviewed By: eellison

Differential Revision: D14442654

fbshipit-source-id: 2868bb32ccb80b04f9483883faa702f63a7948bf
2019-03-14 12:27:32 -07:00
5cbc1981f3 JIT IR - Make valueMapPtr optional in convertNetDefToIR (#17942)
Summary:
Make valueMapPtr optional in convertNetDefToIR, and add tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17942

Differential Revision: D14429687

Pulled By: duc0

fbshipit-source-id: 3a5a72bbb5acc1bfd7144a987688c599016fbf7a
2019-03-14 12:22:49 -07:00
53fb9a462a register RoIAlign with C10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17889

Reviewed By: smessmer

Differential Revision: D14411630

fbshipit-source-id: c3b7941d725ae2c78e8d79f52a7983db92b75807
2019-03-14 11:55:29 -07:00
10d64a1372 add tanh to AD and fix layernorm schema
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17816

Differential Revision: D14453048

Pulled By: wanchaol

fbshipit-source-id: 45815db964a4d9ee85d8933e335b47f215e3c467
2019-03-14 11:20:40 -07:00
9af6564060 Add magma debug version for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18008

Differential Revision: D14455117

Pulled By: soumith

fbshipit-source-id: 29d9a2e0b36d72bece0bb1870bbdc740c4d1f9d6
2019-03-14 10:15:57 -07:00
bba906c2cb Simplify env creation when running Windows tests (#17916)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/13465.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17916

Differential Revision: D14460589

Pulled By: soumith

fbshipit-source-id: e952d08648b833cfd4a8551355ecd68045fea25c
2019-03-14 10:10:31 -07:00
84c30398c7 Fix lint in test_multiprocessing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18016

Reviewed By: eellison

Differential Revision: D14458177

fbshipit-source-id: f17b3e06223ab399e9ce24be6988effe04dad636
2019-03-14 09:58:13 -07:00
73c5921134 Remove ArgcountSortPlugin, which doesn't seem to be used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17977

Reviewed By: ezyang

Differential Revision: D14438842

Pulled By: gchanan

fbshipit-source-id: 9b1746880fd7e3bd2b76a2559face34940ce7570
2019-03-14 09:30:38 -07:00
3fe7bdb2ff Fix lint in test_nn.py (#18006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18006
ghimport-source-id: e267ece1ac03e0d17e01dddf4a77f52421859435

Stack:
* **#18006 Fix lint in test_nn.py**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: eellison

Differential Revision: D14458108

fbshipit-source-id: 18ee6199447efed55a922cff5b3ad940a21c0536
2019-03-14 08:59:24 -07:00
a41b6d7d1f Simplify macros for exposing c10 ops to c2 (#17781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17781

The wrapper for calling a c10 operator from caffe2 is now based on a runtime FunctionSchema instead of compile time information. This way, it can be created for any c10 operator schema with just one invocation to a simple macro instead of having to define arguments and more as compile time structures.

Furthermore, previously, the wrapper assumed there's an argument present for preallocated outputs, but that was only true for caffe2 operators exported to c10. So the wrapper only worked correctly for calling caffe2->c10->caffe2. Now with the new implementation, it works for any c10 operator.

Also, binary size for this should be much smaller.

Reviewed By: ezyang

Differential Revision: D14375054

fbshipit-source-id: bac7ab8e63929e6e2a148eacac41ed092009aa86
2019-03-14 08:54:16 -07:00
25d06eef7b Improve caffe2 operator wrapping (#17743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17743

- caffe2::Operator::SetOutputTensor() can now be used in operators that are called from c10/PyTorch.
- If the operator uses SetOutputTensor() instead of XOutput(), the wrapper doesn't preallocate an empty tensor for the operator anymore. Only outputs accessed in XOutput() will get an output tensor preallocated.
- Remove the copying of the vector of output tensors into a vector of pointers to output tensors.
- Preallocated outputs are now passed in as one TensorList argument on the stack. This TensorList argument has a well-defined name so other wrappers (i.e. the wrapper calling from c2 into c10) can recognize and use it.
- Macros for exporting caffe2 operators to c10 are simplified. Instead of having `c10_op_handle_for_c2_op`, we now pass in the operator handle as a template argument.
- `SetOutputTensor` and `OutputTensorOrUndefined` now work with operators exported to c10

Reviewed By: ezyang

Differential Revision: D14362434

fbshipit-source-id: 44a5e717204f21ea8e9728437429d9b84906f9f5
2019-03-14 08:54:15 -07:00
6def5b69e3 Remove unused KwargsPlugin.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17980

Reviewed By: ezyang

Differential Revision: D14438877

Pulled By: gchanan

fbshipit-source-id: f93764b00999effb5c8f852f8eda3a6da32dc767
2019-03-14 08:03:55 -07:00
40a3e14ade Disable btri tests on Windows if MAGMA is not found (#17989)
Summary:
Fixes #17988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17989

Reviewed By: ezyang

Differential Revision: D14454571

Pulled By: soumith

fbshipit-source-id: fc39a807a597d3574f4ca4e22cea12194e4693c0
2019-03-14 07:22:55 -07:00
16e50c78e7 Report convolution size mismatch (#17436)
Summary:
1. Kernel size is larger than input
2. Expected output size to be less than zero

Test case added:
- invalid_conv1d
- Relevant test cases for conv2d and conv3d exists

Fixes #17247
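The two reported conditions fall out of the standard convolution output-length formula; a hedged sketch (the formula is the usual conv arithmetic, but the function name and error messages are made up, not PyTorch's):

```python
def conv1d_out_len(n, kernel, stride=1, padding=0, dilation=1):
    # Effective kernel extent once dilation is applied.
    eff_k = dilation * (kernel - 1) + 1
    # Condition 1: kernel larger than the (padded) input.
    if eff_k > n + 2 * padding:
        raise ValueError("kernel size can't be greater than actual input size")
    out = (n + 2 * padding - eff_k) // stride + 1
    # Condition 2: computed output size is not positive.
    if out <= 0:
        raise ValueError("calculated output size is non-positive")
    return out
```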
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17436

Reviewed By: mrshenli

Differential Revision: D14354272

Pulled By: fmassa

fbshipit-source-id: 94b98621aa03b1f60d151ef9399ed3da55d41b42
2019-03-14 06:35:29 -07:00
8bd9465b79 make momentum non negative in adagrad test (#18009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18009

momentum should be initialized with non-negative values

Reviewed By: hyuen

Differential Revision: D14450841

fbshipit-source-id: 5bbbd11645db9e6f2dc42b26a00ff3caf378c59f
2019-03-14 03:15:07 -07:00
f827f1052a Fix the CI
Summary: The CI run on https://github.com/pytorch/pytorch/pull/17995 has verified that this should fix the CI.

Reviewed By: bddppq

Differential Revision: D14447674

fbshipit-source-id: 50085db9ae7421b5be216ed0a2216234babfdf6c
2019-03-13 17:28:50 -07:00
6df7116273 Fix missing return in HIPStreamMasqueradingAsCUDA::operator<< (#17961)
Summary:
```
In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/hip/BatchLinearAlgebra.hip:3:
In file included from /var/lib/jenkins/workspace/aten/src/ATen/hip/HIPContext.h:5:
/var/lib/jenkins/workspace/aten/src/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h:107:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
1 warning generated.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17961

Reviewed By: houseroad

Differential Revision: D14436421

Pulled By: bddppq

fbshipit-source-id: 962665602178699d7c7b55f4ca7ff1eb72ee0349
2019-03-13 16:04:42 -07:00
c5b50a3440 Remove AssertNDim, which doesn't seem to be used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17978

Reviewed By: colesbury

Differential Revision: D14438845

Pulled By: gchanan

fbshipit-source-id: 106650c37fb1885201eaef27cb6d86b49ef27976
2019-03-13 15:10:55 -07:00
42acae5406 Remove unused BoolOption.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17979

Reviewed By: zou3519

Differential Revision: D14438876

Pulled By: gchanan

fbshipit-source-id: a6aeab0261ce6926ed82a81edee4564a8dd341ed
2019-03-13 13:38:19 -07:00
1e42720a77 Fix some typos in distributed.py.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17959

Differential Revision: D14437347

Pulled By: soumith

fbshipit-source-id: 4c33571f56e9da687666516a310f91924cddd4d9
2019-03-13 09:28:03 -07:00
1c3494daf0 Fix Windows test CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17954

Differential Revision: D14437473

Pulled By: soumith

fbshipit-source-id: f0d79ff0c5d735f822be3f42bbca91c1928dacaf
2019-03-13 09:22:46 -07:00
9089182ce4 Fix lint in test_utils.py (#17944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17944
ghimport-source-id: 5b45086428b5a36e737882c78f285141121fd1bc

Stack:
* **#17944 Fix lint in test_utils.py**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14430132

fbshipit-source-id: b00de7b4c685645ad5a4dc8c5fe6ce7e1893a3eb
2019-03-13 09:02:35 -07:00
26a4c2ada6 Speed up gemm by reordering the for loops (#17730)
Summary:
Optimize the order of the "for" loops.

Note: For the "transa = true" cases, the order of the "for" loops had already been optimized in the original code. Therefore, no significant improvement is observed in those cases (i.e. "transa && transb" and "transa && !transb").
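For the `!transa` cases, the speedup comes from making the innermost loop stride-1 over both B and C. A pure-Python sketch of the i-k-j ordering on row-major nested lists (illustrative only; the real kernel is the C gemm):

```python
def matmul_ikj(A, B):
    # i-k-j order: the inner j loop walks one row of B and one row of C
    # contiguously, instead of striding down a column as i-j-k would.
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            row_b = B[k]
            row_c = C[i]
            for j in range(p):
                row_c[j] += a * row_b[j]
    return C
```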

mode/opt (i.e. static libary)
//////////////////////////////////////////////////////////////////////////////
transa && transb
after:
loops:  2229     x:     128      y:     128      z:     128      time:  2243ns      =>  acceleration multiplier:  0.90
loops:  124      x:     128      y:     1024     z:     128      time:  40381ns      =>  acceleration multiplier:  0.97
loops:  121      x:     1024     y:     128      z:     128      time:  41651ns      =>  acceleration multiplier:  0.96
loops:  15       x:     1024     y:     1024     z:     128      time:  333771ns       =>  acceleration multiplier:  0.98
loops:  4610     x:     128      y:     128      z:     64       time:  1084ns       =>  acceleration multiplier:  0.95
loops:  252      x:     128      y:     1024     z:     64       time:  19860ns      =>  acceleration multiplier:  0.98
loops:  248      x:     1024     y:     128      z:     64       time:  20232ns      =>  acceleration multiplier:  0.98
loops:  30       x:     1024     y:     1024     z:     64       time:  167338ns      =>  acceleration multiplier:  0.99

before:
loops:  2468     x:     128      y:     128      z:     128      time:  2026ns
loops:  128      x:     128      y:     1024     z:     128      time:  39338ns
loops:  126      x:     1024     y:     128      z:     128      time:  39930ns
loops:  16       x:     1024     y:     1024     z:     128      time:  327549ns
loops:  4840     x:     128      y:     128      z:     64       time:  1033ns
loops:  258      x:     128      y:     1024     z:     64       time:  19441ns
loops:  252      x:     1024     y:     128      z:     64       time:  19854ns
loops:  31       x:     1024     y:     1024     z:     64       time:  166254ns

//////////////////////////////////////////////////////////////////////////////
transa && !transb
after:
loops:  4880     x:     128      y:     128      z:     128      time:  1024ns      =>  acceleration multiplier:  0.98
loops:  638      x:     128      y:     1024     z:     128      time:  7839ns      =>  acceleration multiplier:  1.04
loops:  605      x:     1024     y:     128      z:     128      time:  8276ns      =>  acceleration multiplier:  1.01
loops:  77       x:     1024     y:     1024     z:     128      time:  65713ns      =>  acceleration multiplier:  1.00
loops:  9935     x:     128      y:     128      z:     64       time:  503ns      =>  acceleration multiplier:  1.00
loops:  1252     x:     128      y:     1024     z:     64       time:  3994ns      =>  acceleration multiplier:  1.00
loops:  1183     x:     1024     y:     128      z:     64       time:  4226ns      =>  acceleration multiplier:  0.98
loops:  153      x:     1024     y:     1024     z:     64       time:  32766ns      =>  acceleration multiplier:  0.99

before:
loops:  4985     x:     128      y:     128      z:     128      time:  1003ns
loops:  615      x:     128      y:     1024     z:     128      time:  8140ns
loops:  599      x:     1024     y:     128      z:     128      time:  8357ns
loops:  76       x:     1024     y:     1024     z:     128      time:  65934ns
loops:  9897     x:     128      y:     128      z:     64       time:  505ns
loops:  1248     x:     128      y:     1024     z:     64       time:  4008ns
loops:  1203     x:     1024     y:     128      z:     64       time:  4159ns
loops:  154      x:     1024     y:     1024     z:     64       time:  32499ns

//////////////////////////////////////////////////////////////////////////////
!transa && transb
after:
loops:  3919     x:     128      y:     128      z:     128      time:  1276ns      =>  acceleration multiplier:  2.97
loops:  497      x:     128      y:     1024     z:     128      time:  10069ns      =>  acceleration multiplier:  7.85
loops:  449      x:     1024     y:     128      z:     128      time:  11145ns      =>  acceleration multiplier:  4.77
loops:  57       x:     1024     y:     1024     z:     128      time:  88595ns      =>  acceleration multiplier:  7.12
loops:  7575     x:     128      y:     128      z:     64       time:  660ns      =>  acceleration multiplier:  3.00
loops:  967      x:     128      y:     1024     z:     64       time:  5173ns      =>  acceleration multiplier:  7.66
loops:  877      x:     1024     y:     128      z:     64       time:  5702ns      =>  acceleration multiplier:  4.76
loops:  111      x:     1024     y:     1024     z:     64       time:  45232ns      =>  acceleration multiplier:  7.03

before:
loops:  1320     x:     128      y:     128      z:     128      time:  3789ns
loops:  64       x:     128      y:     1024     z:     128      time:  79061ns
loops:  95       x:     1024     y:     128      z:     128      time:  53107ns
loops:  8        x:     1024     y:     1024     z:     128      time:  631161ns
loops:  2521     x:     128      y:     128      z:     64       time:  1983ns
loops:  127      x:     128      y:     1024     z:     64       time:  39604ns
loops:  185      x:     1024     y:     128      z:     64       time:  27128ns
loops:  16       x:     1024     y:     1024     z:     64       time:  318155ns

//////////////////////////////////////////////////////////////////////////////
!transa && !transb
after:
loops:  3895     x:     128      y:     128      z:     128      time:  1283ns      =>  acceleration multiplier:  1.73
loops:  393      x:     128      y:     1024     z:     128      time:  12746ns      =>  acceleration multiplier:  3.36
loops:  411      x:     1024     y:     128      z:     128      time:  12170ns      =>  acceleration multiplier:  1.93
loops:  46       x:     1024     y:     1024     z:     128      time:  110116ns      =>  acceleration multiplier:  3.17
loops:  7404     x:     128      y:     128      z:     64       time:  675ns      =>  acceleration multiplier:  1.58
loops:  636      x:     128      y:     1024     z:     64       time:  7872ns      =>  acceleration multiplier:  2.70
loops:  724      x:     1024     y:     128      z:     64       time:  6911ns      =>  acceleration multiplier:  1.32
loops:  73       x:     1024     y:     1024     z:     64       time:  68502ns      =>  acceleration multiplier:  2.49

before:
loops:  2253     x:     128      y:     128      z:     128      time:  2219ns
loops:  117      x:     128      y:     1024     z:     128      time:  42788ns
loops:  214      x:     1024     y:     128      z:     128      time:  23465ns
loops:  15       x:     1024     y:     1024     z:     128      time:  349076ns
loops:  4694     x:     128      y:     128      z:     64       time:  1065ns
loops:  236      x:     128      y:     1024     z:     64       time:  21251ns
loops:  549      x:     1024     y:     128      z:     64       time:  9108ns
loops:  30       x:     1024     y:     1024     z:     64       time:  170799ns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17730

Differential Revision: D14325149

Pulled By: zhangguanheng66

fbshipit-source-id: a7a5a83890fdf99fee6eb87a3a5060b7b6bd862f
2019-03-13 08:57:26 -07:00
ecc5e623a2 fix punctuation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17973

Differential Revision: D14438725

Pulled By: zou3519

fbshipit-source-id: 30a5485b508b4ae028057e0b66a8abb2b163d66b
2019-03-13 08:14:30 -07:00
13bc002422 fixes for AVX detection (#17915)
Summary:
Our AVX2 routines use functions such as _mm256_extract_epi64
that do not exist on 32 bit systems even when they have AVX2.
This disables AVX2 when _mm256_extract_epi64 does not exist.

This fixes the "local" part of #17901 (except for disabling FBGEMM),
but Sleef still needs to be updated and NNPACK fixed;
see the bug report for further discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17915

Differential Revision: D14437338

Pulled By: soumith

fbshipit-source-id: d4ef7e0801b5d1222a855a38ec207dd88b4680da
2019-03-13 03:55:06 -07:00
7e34bd230b Disable FBGEMM when building under x86 32bit (#17922)
Summary:
FBGEMM doesn't work on x86 32-bit, and prior to this patch it would
generate x86_64 objects in a build that is supposed to be x86 32-bit.
FBGEMM actually relies on registers not available on x86_32, so
we disable it.

This takes care of one element of #17901. There are more dependencies,
and a separate PR (#17915) addresses AVX detection for the code in the
main repository.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17922

Differential Revision: D14437340

Pulled By: soumith

fbshipit-source-id: bd9fc98cf607d9b0bc28127fbbc8b04fa10eecbe
2019-03-13 03:46:50 -07:00
f6de833cac Update docs for mark_non_differentiable method (#17891)
Summary:
The current documentation doesn't reflect the real values of tensors during the backward pass.
This issue is mentioned in https://github.com/pytorch/pytorch/issues/12631
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17891

Differential Revision: D14419949

Pulled By: soumith

fbshipit-source-id: 8b495628c3f017bc880f8096682cd176a53974e5
2019-03-13 03:19:59 -07:00
1e7f027f5b Simplify OpKernel (#17925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17925

There's no need for OpKernel to keep the cache creator around if we initialize cache on construction.

This basically means, kernel caches are now constructed when the kernel is looked up from the dispatcher, and not delayed to the first call anymore.
This gives us the benefit of cheaper calling because now kernel calling doesn't have to check if the cache is already initialized.

Also, this improves thread-safety. Now, OpKernel is thread-safe if the kernel is thread-safe.
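The shape of the change, sketched in Python (the class and names are hypothetical stand-ins, not the actual c10 dispatcher code):

```python
class OpKernel:
    def __init__(self, kernel, cache_creator):
        self._kernel = kernel
        # Eager: build the cache at lookup/construction time, so __call__
        # never has to check "is the cache initialized yet?" on the hot path.
        self._cache = cache_creator()

    def __call__(self, *args):
        return self._kernel(self._cache, *args)
```

With lazy initialization, every call would pay a branch (and need synchronization for thread-safety); constructing the cache up front removes both.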

Reviewed By: ezyang

Differential Revision: D14424907

fbshipit-source-id: a0d09a3a560dfe78aab53d558c9ebb91b57722df
2019-03-13 01:40:10 -07:00
8714b8bb89 Mark DispatchTable move ctor and move assignment operator as deleted (#17948)
Summary:
```
21:39:50 /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/DispatchTable.h:125:3: warning: explicitly defaulted move constructor is implicitly deleted [-Wdefaulted-function-deleted]
21:39:50   DispatchTable(DispatchTable&&) = default;
21:39:50   ^
21:39:50 /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/DispatchTable.h:212:36: note: move constructor of 'DispatchTable' is implicitly deleted because field 'kernels_' has a deleted move constructor
21:39:50   detail::ThreadsafeOperatorTable_ kernels_;
21:39:50                                    ^
21:39:50 /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/DispatchTable.h:105:68: note: copy constructor of 'ThreadsafeOperatorTable_' is implicitly deleted because field 'map_' has a deleted copy constructor
21:39:50    LeftRight<ska::flat_hash_map<TensorTypeId, DispatchTableEntry>> map_;
21:39:50                                                                    ^
21:39:50 /var/lib/jenkins/workspace/c10/util/LeftRight.h:152:16: note: copy constructor of 'LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::DispatchTableEntry, std::hash<c10::TensorTypeId>, std::equal_to<c10::TensorTypeId>, std::allocator<std::pair<c10::TensorTypeId, c10::DispatchTableEntry> > > >' is implicitly deleted because field '_writeMutex' has a deleted copy constructor
21:39:50     std::mutex _writeMutex;
21:39:50                ^
21:39:50 /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/mutex:129:5: note: 'mutex' has been explicitly marked deleted here
21:39:50     mutex(const mutex&) = delete;
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17948

Reviewed By: ezyang

Differential Revision: D14431344

Pulled By: bddppq

fbshipit-source-id: b1c6593b73cb467a58b09a3470b8899b82564d5e
2019-03-13 01:29:50 -07:00
4dcb4b1601 Add more hint in the JIT tracer (#17957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17957

So the developer knows what action to take when the model contains a nondeterministic node

Reviewed By: dzhulgakov

Differential Revision: D14435923

fbshipit-source-id: 12d930185852f78c54efc8e90c51aa7c7c7faab5
2019-03-13 00:56:59 -07:00
c8f9072ab6 Fix half-float conversion ops to handle tensors larger than 2B of params (#17952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17952

As desc.

Reviewed By: hyuen

Differential Revision: D14435092

fbshipit-source-id: dc614ba16ad531101d04d01aec8f1fbd534ebec5
2019-03-12 23:03:22 -07:00
8bc3b66be9 Override the resolve_library_path in FBCode (#17497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17497

The following problems have been addressed: 1) import torch.ops correctly, 2) make realpath call optional

Reviewed By: dzhulgakov

Differential Revision: D14094358

fbshipit-source-id: 2f9a6fca656867287a7c82c465a4554384ff7323
2019-03-12 22:09:24 -07:00
b21e9e4dae update ccache guide (#17938)
Summary:
closes #17937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17938

Differential Revision: D14435791

Pulled By: kostmo

fbshipit-source-id: b1d0db8902f78bde51150606e2a67fb9ddfe7812
2019-03-12 21:48:17 -07:00
9a946c4072 unify cpp tests (#17947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17947

Instead of having a gtest and a no-gtest file that you have to remember to register tests in, add a single registration point and use some macro magic to make it work for both gtest and non-gtest builds

Reviewed By: eellison

Differential Revision: D14431302

fbshipit-source-id: e1abac135992577a943eaa7abcc81a6ed31fa6e5
2019-03-12 21:35:40 -07:00
4f939dded1 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 7d454d0f58898741f293b356dfc10d7fc31fd55c
2019-03-12 20:34:05 -07:00
66556f48e3 Remove sinkMaxPool transformation (#17694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17694

Remove sinkMaxPool transformation as it's unused

Differential Revision: D14328624

fbshipit-source-id: bd245403b756157120faa61a0f9253c15120e7a8
2019-03-12 20:10:46 -07:00
f7b70a69e5 Fix Windows build (#17917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17917

D14375995 introduced instantiation of the following templates with `bool` type (more specifically `To` is `int64_t`, `From` is `bool`):
```
template <typename To, typename From>
typename std::enable_if<std::is_integral<From>::value, bool>::type overflows(
    From f) {
  using limit = std::numeric_limits<typename scalar_value_type<To>::type>;
  if (!limit::is_signed && std::numeric_limits<From>::is_signed) {
    // allow for negative numbers to wrap using two's complement arithmetic.
    // For example, with uint8, this allows for `a - b` to be treated as
    // `a + 255 * b`.
    return f > limit::max() ||
        (f < 0 && -static_cast<uint64_t>(f) > limit::max());
  } else {
    return f < limit::lowest() || f > limit::max();
  }
}

template <typename To, typename From>
typename std::enable_if<std::is_floating_point<From>::value, bool>::type
overflows(From f) {
  using limit = std::numeric_limits<typename scalar_value_type<To>::type>;
  if (limit::has_infinity && std::isinf(static_cast<double>(f))) {
    return false;
  }
  if (!limit::has_quiet_NaN && (f != f)) {
    return true;
  }
  return f < limit::lowest() || f > limit::max();
}
```
MSVC gives a C4804 warning, and because "treat warnings as errors" is on, the build fails on Windows. Disabling this warning for those two templates.
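The integral branch of the `overflows` check can be sketched in plain Python (a hedged illustration with made-up names, not the PyTorch source):

```python
def overflows_integral(f, lo, hi, target_signed, source_signed):
    """Does integer value f overflow a target type with range [lo, hi]?

    Mirrors the C++ template above: when the target is unsigned and the
    source is signed, negative values are allowed to wrap via two's
    complement, so only |f| > hi counts as overflow.
    """
    if not target_signed and source_signed:
        return f > hi or (f < 0 and -f > hi)
    return f < lo or f > hi

# int64 -> uint8 examples
print(overflows_integral(256, 0, 255, False, True))   # True: past max
print(overflows_integral(-1, 0, 255, False, True))    # False: wraps to 255
print(overflows_integral(-300, 0, 255, False, True))  # True: |f| > 255
```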

Reviewed By: mingzhe09088

Differential Revision: D14421157

fbshipit-source-id: e72ba34406628c84da48518b32a46f851819bad1
2019-03-12 19:53:56 -07:00
92e35ac0a7 fix overly restrictive assertion (#17939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17939

Instead of just asserting min <= 0 and max >= 0, we adjust the histogram to include 0 in the range.
We need to include 0 in the range during norm error minimization to correctly represent our quantization method, which includes 0.
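The adjustment can be sketched as follows (illustrative Python with a hypothetical helper name, not the actual Caffe2 code):

```python
def adjust_hist_range(lo, hi):
    # Widen the observed [lo, hi] range so that 0 is always representable,
    # instead of asserting lo <= 0 <= hi.
    return min(lo, 0.0), max(hi, 0.0)

print(adjust_hist_range(0.5, 2.0))    # (0.0, 2.0)
print(adjust_hist_range(-3.0, -1.0))  # (-3.0, 0.0)
```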

Reviewed By: csummersea

Differential Revision: D14428732

fbshipit-source-id: 6669a9d2c7d409ec3b31aee0afe48071986b9b71
2019-03-12 18:18:49 -07:00
e34abe03a8 Enable threadpool threads to greedily acquire new tasks if available. (#17808)
Summary:
This improves locality and affinity by keeping work on the same
threads preferentially to starting work on new ones, and reduces
contention on the threadpool lock more generally.
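The idea can be sketched with a toy worker loop (a hedged Python analogy; the real implementation is the C++ threadpool):

```python
import queue
import threading

def worker(tasks, results):
    # After finishing a task, greedily poll for more work before blocking,
    # keeping work on the already-running thread.
    while True:
        try:
            fn = tasks.get_nowait()  # greedy: take ready work without blocking
        except queue.Empty:
            fn = tasks.get()         # block only when nothing is pending
        if fn is None:
            break
        results.append(fn())

tasks = queue.Queue()
results = []
for i in range(4):
    tasks.put(lambda i=i: i * i)
tasks.put(None)  # sentinel to stop the worker
t = threading.Thread(target=worker, args=(tasks, results))
t.start()
t.join()
print(sorted(results))  # [0, 1, 4, 9]
```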
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17808

Differential Revision: D14391282

Pulled By: resistor

fbshipit-source-id: 3aec81656a50460a725aa4187c61864295d4f46e
2019-03-12 18:05:55 -07:00
552f903c63 JIT IR - Add option to remove prefix string when converting from JIT IR to NetDef (#17931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17931

When converting from NetDef to IR and back, the prefix string should be removed so the operator types are preserved in caffe2.

Reviewed By: ZolotukhinM

Differential Revision: D14425954

fbshipit-source-id: 2807e7337b0f804f126970768b1250a4a8c5f35c
2019-03-12 17:02:26 -07:00
4ad17c9031 Misleading documentation for module._load_from_state_dict (#17618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17618

Based on the code, we only add keys to `missing_keys` and `unexpected_keys` if `$strict` is `True`. The documentation is confusing.

This diff also fixes one FLAKE8 warning.

Reviewed By: ailzhang

Differential Revision: D14280593

fbshipit-source-id: d368f5596bdf74ff62ee4d28d79120f5af91e0a3
2019-03-12 16:57:39 -07:00
6248266d91 Enable detectron on AMD GPU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17862

Differential Revision: D14429234

Pulled By: bddppq

fbshipit-source-id: 5cb8750bd9db0ff8a179977d2bfbb180265cce81
2019-03-12 16:29:42 -07:00
1cfb50334f Removed dead code from THTensorMath.h (#17873)
Summary:
This PR removes dead code from THTensorMath.h.
I found these unused methods while working on a PR where I plan to move fill and zero methods from TH/THC to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17873

Differential Revision: D14407013

Pulled By: izdeby

fbshipit-source-id: a3551c5d91e7b380931a8b3bd4b3ae972d16911d
2019-03-12 13:53:33 -07:00
6466ddbd86 Fix lint in test_torch.py (#17807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17807

Lint also detected a bug in test_linspace where we weren't
actually testing the CUDA case.

Differential Revision: D14388241

fbshipit-source-id: e219e46400f4952c6b384bca3baa0724ef94acde
2019-03-12 13:48:28 -07:00
40ecdc57ff Updating submodules
Reviewed By: zpao

fbshipit-source-id: 06c0f738c791cccf79025d15f1fc2076bf34fcd1
2019-03-12 13:29:46 -07:00
ee87254720 Eliminate the use of Type. (#17804)
Summary:
Stack:
**#17804 Eliminate the use of Type.** [💛](https://our.intern.facebook.com/intern/diff/D14382165/)

at::CPU produces a Type object which is then cast into TensorOptions; instead, TensorOptions is now used directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17804

Differential Revision: D14407851

Pulled By: ezyang

fbshipit-source-id: 6462d698305b7c24382c1bfd440d3227bd28d9e4
2019-03-12 12:54:24 -07:00
0f7e6f293b Make Variable::set_data non-const; cosmetic fixes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17761

Differential Revision: D14406603

Pulled By: ezyang

fbshipit-source-id: bc8bba73352eb4b3e21196b36522e9cec70f6676
2019-03-12 12:41:57 -07:00
3e00f79a1e remove warning for upsample code (#17921)
Summary:
IIRC we decided to remove this warning from the code in #11568. This was accidentally reverted in #14123.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17921

Differential Revision: D14422811

Pulled By: ailzhang

fbshipit-source-id: 7067264bd1d3e3b7861d29e18ade2969ed705ca1
2019-03-12 12:16:33 -07:00
f229521154 Optimize TileOp (#17290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17290

Optimize TileOp

Reviewed By: wesolwsk

Differential Revision: D14145844

fbshipit-source-id: 1571fa0512218dbc48080592ede4e23903be85dd
2019-03-12 12:16:30 -07:00
54b33503ec Optimize channel_stats_op (#16243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16243

Optimize channel_stats_op and add NHWC impl

Reviewed By: takatosp1

Differential Revision: D13775515

fbshipit-source-id: decb889e646f5316d4afefdf9f9b6bc6343613cd
2019-03-12 12:08:00 -07:00
99f1465c35 enable shape inference for elementwise operators (#17885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17885

enable shape inference for elementwise operators

Reviewed By: yinghai

Differential Revision: D14411014

fbshipit-source-id: b19bcaabb2bba26fb79745ec84af0e4e5ed18ff0
2019-03-12 12:02:24 -07:00
4d2f6f1bbe Remove remaining test jit expects redux (#17924)
Summary:
Trying to reland https://github.com/pytorch/pytorch/pull/17886 since it broke a build and I reverted it
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17924

Differential Revision: D14423842

Pulled By: eellison

fbshipit-source-id: f219e786bd07f7da3b7f9e866981199f5ccf6318
2019-03-12 11:33:34 -07:00
abab9c1d78 Handle Scalars Better (#17875)
Summary:
This PR allows Scalars to be castable with `int()` and `float()`, allows scalars to match with float arguments, and prints out a better error message if `x.item()` is used as an int.

Scalars are a very uncommon case, and I don't think we want to add the maintenance burden of building out op coverage for them. It's more maintainable to handle converting them to int/float well.

Fix https://github.com/pytorch/pytorch/issues/17652

Also note: https://github.com/pytorch/pytorch/issues/16849
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17875

Differential Revision: D14411138

Pulled By: eellison

fbshipit-source-id: a4e957cefb0ffd10ddb234d92f6d1558cfce8751
2019-03-12 10:52:26 -07:00
fd04073e61 Fixed a formatting issue in doc comments (#17505)
Summary:
for torch.distributed.broadcast_multigpu per issue #17243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17505

Reviewed By: janewangfb

Differential Revision: D14373865

Pulled By: pietern

fbshipit-source-id: 6d7e91a3da50a7c9ba417ad852f7746eb5200043
2019-03-12 09:55:29 -07:00
18949c8e00 Add nbytes, itemsize, element_size to at::Tensor. (#17810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17810

Partially addresses #12728. Also, switch the element_size bindings
to use the new function, rather than the method on Type.

We don't add Python bindings yet, as they need to be special
(they will be properties.)

Differential Revision: D14388790

fbshipit-source-id: 294183d0c8a59b0c13f2bf21d6f1cd557333e83b
2019-03-12 09:48:54 -07:00
dc4cbd9565 Fix lint in test_distributions.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17821

Differential Revision: D14392899

fbshipit-source-id: 99f75b1d3a71bde8050caef8df7e5b9ecfe0c755
2019-03-12 09:39:24 -07:00
030fec9703 Fix lint in test_jit.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17823

Differential Revision: D14392996

fbshipit-source-id: b9aa83898768c929e753c0f17bb09a54d724ae4d
2019-03-12 09:20:20 -07:00
4073e3c2f2 Fix lint errors in test_autograd
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17812

Reviewed By: eellison

Differential Revision: D14388897

fbshipit-source-id: 6e2671805dc8d57af68eb0a0cd6ccb24d9db45e2
2019-03-12 08:55:12 -07:00
f3a860ba07 Added a few extra python bindings to help with walking the IR graph from Python (#17822)
Summary:
These changes add the following new Python bindings:

- Values have a 'type' property now that allows getting to the 'type' object
- Blocks have now inputs and outputs as well as returnNode and paramNode properties
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17822

Differential Revision: D14410123

Pulled By: ezyang

fbshipit-source-id: 64ef79f85a7a43b83e4b127b1d39efcaa64b74dc
2019-03-12 08:55:10 -07:00
aba9051a65 kthvalue consistency with sort in the presence of NaN (#17824)
Summary:
This PR causes kthvalue to be consistent with sort
(i.e. treat NaN as larger than any number), so that
`a.kthvalue(n) == a.sort()[n - 1]`.

One drawback is that median with a NaN argument does not return NaN,
which is a deviation from NumPy.

Thank you, ngimel, for raising this.
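The resulting ordering can be illustrated in plain Python (hypothetical helpers for illustration, not the PyTorch kernel):

```python
import math

# Treat NaN as larger than any number, so kthvalue(k) agrees with sort()[k - 1].
def nan_last_key(x):
    return (math.isnan(x), x)

def kthvalue(xs, k):  # k is 1-indexed, like torch.kthvalue
    return sorted(xs, key=nan_last_key)[k - 1]

vals = [3.0, float("nan"), 1.0, 2.0]
print(kthvalue(vals, 2))              # 2.0
print(math.isnan(kthvalue(vals, 4)))  # True: NaN sorts last
```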
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17824

Differential Revision: D14410092

Pulled By: ezyang

fbshipit-source-id: bdec2d8272dc4c65bcf2f9b8995e237774c44c02
2019-03-12 08:49:19 -07:00
9ecee93a16 Fix minor grammatical mistakes in torch/nn/modules/loss.py (#17892)
Summary:
Fixes some minor grammatical mistakes in the doc of `loss.py`.

I think in the doc:
>  Note that for some losses, there multiple elements per sample.

the "are" is lost between "there" and "multiple".

This mistake takes place in all the descriptions of parameter `size_average` and there are 17 of them.
It's minor but perfects the doc I think. 😁
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17892

Differential Revision: D14418177

Pulled By: ezyang

fbshipit-source-id: 412759f2f9b215819463bf8452ab0e0513218cd6
2019-03-12 08:42:50 -07:00
02c48cced9 Remove (almost all) TensorOptions from native_functions.yaml (#17385)
Summary:
Stacked on top of https://github.com/pytorch/pytorch/pull/17386

Brings us to 1014/1106 of writing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17385

Differential Revision: D14248008

Pulled By: cpuhrsch

fbshipit-source-id: 033e00de91e3edf7ae01ca03ebe436c0446b3b5c
2019-03-12 08:00:00 -07:00
12d6725c15 Restore full Windows tests (#17102)
Summary:
closes #17101
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17102

Differential Revision: D14420716

Pulled By: ezyang

fbshipit-source-id: 0134736e2d919afa683afa84cb2140f659042643
2019-03-12 06:34:45 -07:00
525fef708d Prevent VS2017 from emitting ambiguous symbol errors (second time)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17863

Differential Revision: D14404818

Pulled By: soumith

fbshipit-source-id: 9dac6b926e270e2a29ec2e4dba2e93984da0e5f5
2019-03-12 01:56:58 -07:00
af2e347164 Fix windows test hang (#17778)
Summary:
This PR resolves two concurrent issues discovered when running the test in windows. Details about the windows test can be found here: https://github.com/pytorch/pytorch/issues/17609

The change covers two fixes:
1. Update running_preloaders_ upfront before creating the worker thread to prevent underflow.
2. Add a lock when updating stop_ to prevent a deadlock on the condition variable cv_write_.

The fix has been tested on both Windows and Linux. With --gtest_repeat=1000, the tests run smoothly without issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17778

Differential Revision: D14404910

Pulled By: soumith

fbshipit-source-id: 2fbb8007e4b0bce4613e9a9fd31b8aace1bbfa8d
2019-03-12 01:50:49 -07:00
f268370b42 torch.btrifact for tensors with greater than 3 dimensions (#14964)
Summary:
Motivation:
- Earlier, `torch.btrifact` could not handle tensors with greater than 3 dimensions. This is because of the check:
>   AT_CHECK(THTensor_(nDimension)(a) == 3, "expected 3D tensor, got size: ", a->sizes());

What is in this PR?:
- Move `btrifact` to ATen
- Remove relation to TH/THC.
- Handle tensors with more than three dimensions
- Tests
- Docs modifications: added a note about the non-pivoting variant.

[blocked due to old magma-cuda binaries]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14964

Differential Revision: D14405106

Pulled By: soumith

fbshipit-source-id: f051f5d6aaa45f85836a2867176c065733563184
2019-03-12 01:46:07 -07:00
b161ac9634 Small clean up of aten_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17530

Reviewed By: ezyang

Differential Revision: D14237931

fbshipit-source-id: fb73d63d89fab0622097a49be6ed0b75ddb02a7c
2019-03-11 21:04:16 -07:00
496a3339dc add support for parsing class defs to the string frontend (#17628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17628

This is not hooked up anywhere yet, just adding support.
This shares the same restrictions as the python frontend—namely, that the only exprs allowed right now are method defs.

Reviewed By: shannonzhu

Differential Revision: D14291654

fbshipit-source-id: 7798e5ff412a52ef8803c7bae8f439e50968a73a
2019-03-11 19:13:55 -07:00
64bb86d946 add test for out of order methods (#17624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17624

Just to make sure this path works

Reviewed By: shannonzhu

Differential Revision: D14288056

fbshipit-source-id: b719c0e90252b6821b1f9b22d3d98982985a6cb3
2019-03-11 19:13:54 -07:00
f9820e55af initializing class value (#17585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17585

Create a sugared value that represents a class during initialization. This is
so that assignments to attributes correctly define attributes in __init__ but
raise an error elsewhere.

Reviewed By: shannonzhu

Differential Revision: D14263403

fbshipit-source-id: 09b2feeb272302f00a79c2a0302fbdf5483aed6a
2019-03-11 19:13:52 -07:00
2e753fc753 Remove unused parameter in ProcessGroupGloo (#17718)
Summary:
This is not used anywhere and wasn't cleaned up prior to 1.0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17718

Reviewed By: janewangfb

Differential Revision: D14355154

Pulled By: pietern

fbshipit-source-id: f8ff3c8f50cd6365b369a5c5b85d72d8940df048
2019-03-11 18:01:20 -07:00
f540536dfd Revert D14414435: [pytorch][PR] Remove remaining IR Expect files
Differential Revision:
D14414435

Original commit changeset: 0bfd7ce66ac2

fbshipit-source-id: 02de1814f3c4e581d3798059cee752517b176ed9
2019-03-11 17:36:44 -07:00
fd67f6b463 Remove remaining IR Expect files (#17886)
Summary:
Last batch of IR expect files removed. Includes some removal of expect files that are no longer used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17886

Differential Revision: D14414435

Pulled By: eellison

fbshipit-source-id: 0bfd7ce66ac2f72a57f15f45ebd60b95e80b6c16
2019-03-11 17:32:19 -07:00
4aa22833cf Bool tensor creation (cpu) (#17376)
Summary:
This PR enables bool tensor creation and some basic operations for the CPU backend. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
    1. Storage Implementation [Done]
    2. Tensor Creation.
        a) CPU (this PR)
        b) CUDA
    3. Tensor Conversions.
    4. Tensor Indexing.
    5. Tensor Operations.
    6. Back compatibility related changes.

**Change**:
Enable CPU tensors and these operations:
- torch.zeros
- torch.tensor
- torch.ones
- torch.randint
- torch.full
- torch.full_like
- torch.empty
- torch.empty_like

**Tested via**:
1) unit tests

2)
torch.zeros(2,2, dtype=torch.bool)
torch.tensor([True, False], dtype=torch.bool)
torch.tensor([-1, -1.1, 0, 1, 1.1, 2], dtype=torch.bool)
torch.ones([1,2], dtype=torch.bool)
torch.randint(10, (2, 2), dtype=torch.bool)
torch.full((2, 3), True, dtype=torch.bool)
torch.empty(4, dtype=torch.bool)

a = torch.tensor([0,0,1])
b = torch.full_like(a, True)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17376

Reviewed By: ezyang

Differential Revision: D14375995

Pulled By: izdeby

fbshipit-source-id: a65490b5360ee0e6e3accc54ce7e32e49ad2d2a8
2019-03-11 17:03:40 -07:00
b5fa5a5603 Remove device guard from TypeDefault::copy()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17833

Reviewed By: ezyang

Differential Revision: D14400901

Pulled By: li-roy

fbshipit-source-id: ababc95dadfc94a996a80c5332f45f76a300963d
2019-03-11 15:53:41 -07:00
066d15840f re-enable torch.split tests (#17859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17859

This has been fixed due to improvements in shape analysis.

Reviewed By: driazati

Differential Revision: D14402781

fbshipit-source-id: 4ef2722ffedd9c8ac1eff55c244b421d7d3715ed
2019-03-11 15:22:55 -07:00
d391137acd Fix lint in test_dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17878

Reviewed By: eellison

Differential Revision: D14409933

fbshipit-source-id: 20ee8953a21e29b4557aff62b5e48dddd630eef6
2019-03-11 14:50:51 -07:00
fa29c179b7 Optimize fused_dropout_kernel launch bounds for AMD hardware
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17870

Differential Revision: D14409990

Pulled By: ezyang

fbshipit-source-id: 0452282f459770823641b2527f47b1186ab14666
2019-03-11 14:45:42 -07:00
3f1d0ee5d5 Deprecate torch.pstrf (#17866)
Summary:
Changelog:
- Add deprecation warning to torch.pstrf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17866

Differential Revision: D14405527

Pulled By: soumith

fbshipit-source-id: 73f3b7d61c60eb57e4bffd08112e552ae3e6dfdc
2019-03-11 12:27:52 -07:00
11c89dde55 Allow structseq to be input of operators where tuple is expected (#17208)
Summary:
Currently the following code gives an error on Python 2 because `ret` is a structseq, which is not a tuple:
```python
ret = a.max(dim=0)
ret1 = torch.max(a, dim=0, out=ret)
```

This PR modifies the tuple check in the Python arg parser to allow a structseq to be the input of operators where a tuple is expected, which makes the above code work.

Depend on: https://github.com/pytorch/pytorch/pull/17136
Partially fixes: https://github.com/pytorch/pytorch/issues/16813
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17208

Differential Revision: D14280198

Pulled By: VitalyFedyunin

fbshipit-source-id: beffebfd3951c4f5c7c8fe99a5847616a89491f3
2019-03-11 11:33:35 -07:00
b9e8f56daa Add PyTorch Governance, Contributor Guide, and List of Persons of Interest
Summary: Adding new documents to the PyTorch website to describe how PyTorch is governed, how to contribute to the project, and lists persons of interest.

Reviewed By: orionr

Differential Revision: D14394573

fbshipit-source-id: ad98b807850c51de0b741e3acbbc3c699e97b27f
2019-03-11 10:36:41 -07:00
abd39d5a88 Fix compilation error (#17860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17860

att

Reviewed By: bddppq

Differential Revision: D14402751

fbshipit-source-id: 2d53b230dfd775372addeab1d3eaf0b9552fae9f
2019-03-11 10:26:42 -07:00
b3c9090736 Revert D14392864: Fix lint in test_dataloader.py
Differential Revision:
D14392864

Original commit changeset: 12477b9cfe29

fbshipit-source-id: 1864a80d5cfaceeae55d0145340a578f978ab4a7
2019-03-11 10:19:41 -07:00
817fd9ebf1 Removed dead code from THTensorMath.h (#17769)
Summary:
This PR removes dead code from THTensorMath.h
I found these unused methods while working on a PR where I plan to move **fill** and **zero** methods from TH/THC to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17769

Differential Revision: D14372732

Pulled By: izdeby

fbshipit-source-id: 94fd3b52c691ebc89d2bdc8905452e7498038bf5
2019-03-11 10:14:44 -07:00
b57fe3cc66 Introducing array-like sequence methods __contains__ (#17733)
Summary:
for tensors

Fixes: #17000
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17733

Differential Revision: D14401952

Pulled By: soumith

fbshipit-source-id: c841b128c5a1fceda1094323ed4ef1d0cf494909
2019-03-11 09:00:16 -07:00
906f9efc57 Revert "Add check for x64 Python before setup (#17707)" (#17864)
Summary:
This reverts commit 08fb9021da32e73bd7dec73104eea6a76dd44439.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17864

Differential Revision: D14404920

Pulled By: soumith

fbshipit-source-id: d41fc06e249f3437d4f80d1d6a5fdbd44c90462b
2019-03-11 08:52:13 -07:00
8045b3eb14 Registering of kl-divergence for independent distribution (#17681)
Summary:
This addresses issue https://github.com/pytorch/pytorch/issues/13545 and implements the proposed fix together with a single test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17681

Differential Revision: D14360161

Pulled By: ezyang

fbshipit-source-id: 427afc88e9054b5b0dc39ebbab1087b990695ea5
2019-03-11 08:10:16 -07:00
c02369151d Fix lint in test_dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17820

Reviewed By: eellison

Differential Revision: D14392864

fbshipit-source-id: 12477b9cfe290428d51cc28e024c8cbe8bb7bf51
2019-03-11 08:01:33 -07:00
1d827b7271 Further improvements of nn.container docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17731

Differential Revision: D14401894

Pulled By: soumith

fbshipit-source-id: cebb25859f78589cc4f4f8afb1e84c97f82b6962
2019-03-10 18:30:39 -07:00
b6313d74e1 fix faq typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17851

Differential Revision: D14401791

Pulled By: soumith

fbshipit-source-id: ed6d64d6f5985e7ce76dca1e9e376782736b90f9
2019-03-10 15:33:52 -07:00
6bcff88d3e Fix log_softmax and softmax if any dimension is 0-d (#17651)
Summary:
- Test added
- test_dim_function_empty: softmax and log_softmax on last dimension

fixes: #17262
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17651

Differential Revision: D14349009

Pulled By: gchanan

fbshipit-source-id: b6f728f5c6be8ae7615749e3f0c201886632923e
2019-03-10 15:25:58 -07:00
75f88d4da6 Correct loss docstrings (#17300)
Summary:
In the loss doc description, replace the deprecated 'reduce' and 'size_average' parameters with the 'reduction' parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17300

Differential Revision: D14195789

Pulled By: soumith

fbshipit-source-id: 625e650ec20f13b2d22153a4a535656cf9c8f0eb
2019-03-10 11:56:41 -07:00
98c54e9fa6 When openblas exists, "OpenBLAS_FOUND" is defined, rather than "OPENBLAS_FOUND". (#17841)
Summary:
See https://github.com/pytorch/pytorch/blob/master/cmake/Modules/FindOpenBLAS.cmake#L36

This typo leads to CMake failing to detect OpenBLAS on Ubuntu.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17841

Differential Revision: D14400261

Pulled By: soumith

fbshipit-source-id: 287e019e122230cf6b70ab1ea94e5c514f429c88
2019-03-10 09:34:50 -07:00
a6c4ea66dd Passing indices as a list to Subset instead of Tensor (#17649)
Summary:
Indices in Subset were previously stored as tensors; they are now passed as a list in random_split to ensure integer indexing.

fixes: #17466
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649

Differential Revision: D14400250

Pulled By: soumith

fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
2019-03-10 09:23:53 -07:00
81e025d9ac Clarify JIT docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17846

Differential Revision: D14400363

Pulled By: jamesr66a

fbshipit-source-id: 862316b5fd95526b6edebeca19d2cc522779df11
2019-03-09 23:13:31 -08:00
24e7b824e0 Add metadata for torch jit TracedModules. (#17640)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17311

I've extended our model metadata framework in this diff to support
traced modules as well. Re-used a lot of components from the previous
implementation of ScriptModule metadata.

Tracing is a little different from Scripting since you can't just create a
subclass of TopLevelTraceModule (type returned by torch.jit.trace) and attach
metadata the way we did for ScriptModule. As a result, I've introduced a
separate API torch.fb.jit_trace which returns an instance of
TracedModuleWithMetadata which is a subclass of TopLevelTracedModule. As a
result, we can now attach metadata to this instance.

Reviewed By: dzhulgakov

Differential Revision: D14117966

fbshipit-source-id: 3eee5eef733cb8d6a219c02e2f41d08698eca326
2019-03-09 21:37:15 -08:00
320c6977c2 Fix PySlice_Unpack not available on PyPy 3.6 yet (#17836)
Summary:
This is one of the fixes needed to support compilation on PyPy 3.6, see https://github.com/pytorch/pytorch/issues/17835
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17836

Differential Revision: D14399404

Pulled By: soumith

fbshipit-source-id: ca650a6e2066aed86ddd3314a95d0cb3c515c633
2019-03-09 20:10:16 -08:00
742568e7eb PyPy compatibility: let unmodified slots be inherited in the standard way (#17837)
Summary:
This is needed to fix a segfault on PyPy 3.6, see https://bitbucket.org/pypy/pypy/issues/2968/segfault-calling-cpyext_tp_new_tuple and https://github.com/pytorch/pytorch/issues/17835
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17837

Differential Revision: D14399408

Pulled By: soumith

fbshipit-source-id: 75328a30018313d3223dd3e3eef9240a416c049b
2019-03-09 11:42:16 -08:00
17232fb842 Run fp16 resnet50 training in bench script (#17831)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17831

Differential Revision: D14398532

Pulled By: bddppq

fbshipit-source-id: 37c03cc2eebe3a6083e05631cb6ff03474e4a8a2
2019-03-08 21:53:12 -08:00
c10c73f047 Int8 FC performance debugging (#17700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17700

Add performance debugging utilities in DNNLOWP FC operator and the python script

Reviewed By: amylittleyang

Differential Revision: D14321299

fbshipit-source-id: 50dbd7b352a1da5d2ecb659d8003e71e70750063
2019-03-08 19:03:54 -08:00
0fd1dc45c0 Optimize LayerNormOp (#17604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17604

Optimize LayerNormOp

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14274175

fbshipit-source-id: a7aa263a1b0eb109682d2be99306e7b2cdcc0faf
2019-03-08 17:38:14 -08:00
65b00aa597 Remove some simple use cases of Type::ScalarType()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17529

Reviewed By: ezyang

Differential Revision: D14237932

fbshipit-source-id: be633a1fc19215d53cfe083fdd7196acf2b7dd2f
2019-03-08 16:42:05 -08:00
3aeb78079b Change Dispatch.h to use ScalarType over Type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17527

Reviewed By: zou3519

Differential Revision: D14235395

fbshipit-source-id: 3f53e33f6794f1f14c2edf79014b8ef8397822c5
2019-03-08 16:42:04 -08:00
cc07f968f8 Revert D14361993: [pytorch][PR] [Onnx] - refactoring serialization of ONNX initializers to be name-based
Differential Revision:
D14361993

Original commit changeset: da93e945d557

fbshipit-source-id: 15eea001fbcd059ac13903405aeb9ea182c6ee8b
2019-03-08 16:31:14 -08:00
1d26a3ae7e Open registration for c10 thread pool (#17788)
Summary:
1. Move ATen threadpool & open registration mechanism to C10
2. Move the `global_work_queue` to use this open registration mechanism, to allow users to substitute in their own
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17788

Reviewed By: zdevito

Differential Revision: D14379707

Pulled By: jamesr66a

fbshipit-source-id: 949662d0024875abf09907d97db927f160c54d45
2019-03-08 15:38:41 -08:00
0955592243 Cast nn.Upsample.scale_factor to a float (#17732)
Summary:
Fixes #17106
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17732

Differential Revision: D14388192

Pulled By: driazati

fbshipit-source-id: d9c9e87a7c6db63c1de3ddebbb8dcf619f0dc34d
2019-03-08 15:29:35 -08:00
4bea15f580 Fix lint in run_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17815

Reviewed By: eellison

Differential Revision: D14390308

fbshipit-source-id: 22efd62a1bbd1fc8155a942d7160d5b7d3158e6b
2019-03-08 14:41:36 -08:00
dbb5d02a45 Fix lint in test/common_utils.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17814

Reviewed By: eellison

Differential Revision: D14390194

fbshipit-source-id: b4b3bbe20a15d0b9ed127b255e01c0d6d0832c1b
2019-03-08 14:22:57 -08:00
7aae51cded Replace tensor.type().scalarType() calls with tensor.scalar_type()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17515

Reviewed By: ezyang

Differential Revision: D14233250

fbshipit-source-id: 6c7af8d2291c0c2b148001b30cf03834f34366c0
2019-03-08 14:08:18 -08:00
efed875b3f Catch exceptions in bound_shape_inference (#17775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17775

Handles use input shape hint properly.

Reviewed By: zrphercule

Differential Revision: D14368735

fbshipit-source-id: 504cd96589e47aa432617e56362aa6b01a25ba9b
2019-03-08 13:18:28 -08:00
4a7c549e8f refactor caffe2 operator constructors - 11/9 (#17722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17722

clangr codemod

Reviewed By: ezyang

Differential Revision: D14350584

fbshipit-source-id: adef54cedc9409b4fb365f6644e2621a9e47b2ff
2019-03-08 12:38:54 -08:00
55cf9c742a Suppress C408 lint (don't use dict constructor) (#17813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17813

We have a lot of manually written out dict() constructors,
and (1) I don't think use of curly brace syntax is much
of an improvement and (2) it seems like a waste of time to
fix them all.

Reviewed By: eellison

Differential Revision: D14390136

fbshipit-source-id: 6199bef4dea75b6079bcb9d9e8acf20a2e1a86e1
2019-03-08 12:19:17 -08:00
11f50e73e3 Add matches_jit_signature to recent native functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17805

Differential Revision: D14388004

Pulled By: cpuhrsch

fbshipit-source-id: c50580b6fe1e9cfefed91aaa526376325d9f9c0d
2019-03-08 11:42:25 -08:00
fe90ee9dc8 Add /MD to prevent linking errors on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17799

Differential Revision: D14385777

Pulled By: ezyang

fbshipit-source-id: 8c1d9f80c48399087f5fae4474690e6d80d740e6
2019-03-08 10:46:25 -08:00
a60fadfb71 Change message on unknown db type to be friendly (#17795)
Summary:
CreateDB actually returns nullptr when db type is unknown and throws when the file is missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17795

Reviewed By: ezyang

Differential Revision: D14383226

Pulled By: dzhulgakov

fbshipit-source-id: 1dcf75a6b4ba8b64a24d4e5daf02db3189d56b7b
2019-03-08 10:46:24 -08:00
667763a63a Trace rnn max_batch_size (#17727)
Summary:
This causes the tracer to record the select / cast to int operation instead of just an int constant

Fixes #15319 but relies on a fix for #17583 first
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17727

Differential Revision: D14377886

Pulled By: driazati

fbshipit-source-id: 59453def54ba72756303f723993844dbeb5d2f8b
2019-03-08 10:36:36 -08:00
7f7d12854d Remove legacy way of exposing caffe2 operators to PyTorch (#17742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17742

This path isn't used anymore, and is incompatible with the changes stacked on top of this diff.
Removing it.
cc bwasti to check and confirm these can really be deleted

Reviewed By: ezyang

Differential Revision: D14362426

fbshipit-source-id: 32cdc19f28c2a981ae1e204901420998367ee588
2019-03-08 10:22:41 -08:00
b132f0f1e7 Remove 'Tensor' key from ATen codegen. (#17782)
Summary:
We used to have different ATen Tensor types, but we don't anymore.  This was just being maintained by a codegen'ed comment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17782

Reviewed By: ezyang

Differential Revision: D14378004

Pulled By: gchanan

fbshipit-source-id: 1bbf276393a391252d372cc385230c784bd78588
2019-03-08 09:46:38 -08:00
5ffd7dbbb4 Remove ProcessorSpecificPlugin. (#17789)
Summary:
It doesn't seem to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17789

Reviewed By: ezyang

Differential Revision: D14382423

Pulled By: gchanan

fbshipit-source-id: 0ac3236c48979a1b2bcd615e307e55f10fd8eb77
2019-03-08 09:46:37 -08:00
33c83a3f35 Remove THPPlugin. (#17790)
Summary:
It doesn't seem to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17790

Reviewed By: ezyang

Differential Revision: D14380897

Pulled By: gchanan

fbshipit-source-id: 3c3884a08c3b6c1489347d439509b19e079c5861
2019-03-08 09:42:01 -08:00
424a03186a Replace tens with hundreds.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17752

Differential Revision: D14366743

fbshipit-source-id: 39f6ac08180d780866e284024918d9abd197d239
2019-03-08 07:33:41 -08:00
6aacc1b2dd Support failback for more operators in ideep (#17747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17747

RMACRegions, Normalize and RoIPooling

Reviewed By: dskhudia

Differential Revision: D14365096

fbshipit-source-id: dafcb7077515e03c2880832a442015b70fc7140d
2019-03-08 05:48:22 -08:00
256923523a Cleanup include files in jit/passes/common_subexpression_elimination.h.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17784

Differential Revision: D14381529

Pulled By: ZolotukhinM

fbshipit-source-id: e32e17ee644ef888a6d56a8ee3648e7ac21758bf
2019-03-08 01:11:20 -08:00
b290a16b2d Use return names in JIT operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17638

Differential Revision: D14295606

Pulled By: cpuhrsch

fbshipit-source-id: 62040ac65434411357808735f0fe6cd33cc1c30f
2019-03-07 23:34:42 -08:00
ac87488bd3 Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize (#17764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17764

Original commit changeset: f1923fdca4a1

Reverting the int8 ops fixes the original runtime regression.
We'll ignore the memory regression since it is flaky; see D14228484

Reviewed By: dzhulgakov

Differential Revision: D13885233

fbshipit-source-id: ccbe4b94acb44b7b4cb3ae4d73e3f6091e1e1195
2019-03-07 18:38:53 -08:00
cc7aec12fd Clean up some old ScalarType stuff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17755

Differential Revision: D14377135

Pulled By: li-roy

fbshipit-source-id: 35305760a1621340ba66c61a193ff61cfedfa7e8
2019-03-07 16:21:52 -08:00
549eb9e9bc add reference to flake8-mypy in contributing.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17759

Differential Revision: D14376813

Pulled By: eellison

fbshipit-source-id: cca1128e967ef7368633b94a3fa3c8e76a4a16f4
2019-03-07 15:28:59 -08:00
9d70e199f4 Move lerp to ATen, add functionality for tensor weights (#17348)
Summary:
Changelog:
- Remove TH/THC bindings
- Add tensor weights for `lerp`
- Modify derivatives appropriately
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17348

Differential Revision: D14355845

Pulled By: soumith

fbshipit-source-id: eaede4c09ee589d77ba6cf52583510ea8e3a2fcf
2019-03-07 14:04:58 -08:00
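The `lerp` semantics this commit extends to tensor weights can be sketched in plain Python (an illustrative helper, not the ATen code; the real op works elementwise on tensors):

```python
def lerp(start, end, weight):
    """Linear interpolation: start + weight * (end - start).

    `weight` may be a scalar or a per-element list -- the tensor-weight
    case this commit adds on top of the existing scalar-weight op.
    """
    if isinstance(weight, (int, float)):
        weight = [weight] * len(start)
    return [s + w * (e - s) for s, e, w in zip(start, end, weight)]
```

With a per-element weight, each output element is interpolated independently.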
6227afb305 Refactor dispatcher (#17753)
Summary:
This is a side PR for a bool tensor feature. The idea of this change came from a feedback received in this [PR](https://github.com/pytorch/pytorch/pull/17376).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17753

Differential Revision: D14367989

Pulled By: izdeby

fbshipit-source-id: 4fa380e56e20f18e480be68920170dbc3a4eb91c
2019-03-07 13:41:54 -08:00
aa57f17808 add layernorm to AD
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17702

Differential Revision: D14368472

Pulled By: wanchaol

fbshipit-source-id: 8db390e39444078258ad1d34ba74d6ddafa5d02b
2019-03-07 13:36:51 -08:00
5bf9e41938 move half<->float conversions to oss operators (#17548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17548

expose half float operators to OSS

common/math/Float16.h is the original implementation;
it is substituted by caffe2/c10/util/Half.h.

From the comments, it seems that neither implementation handles denormals.

Reviewed By: jspark1105

Differential Revision: D14244200

fbshipit-source-id: f90ba28c5bf6a2b451b429cc4925b8cc376ac651
2019-03-07 13:00:13 -08:00
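The precision trade-offs of half<->float conversion can be seen with Python's own IEEE 754 binary16 packing (stdlib only; a sketch, not the caffe2 implementation -- note how values below the half-precision subnormal range flush to zero under round-to-nearest):

```python
import struct

def to_fp16_and_back(x: float) -> float:
    """Round-trip a double through IEEE 754 binary16 (struct format 'e').

    Half precision keeps an 11-bit significand, so most decimals are
    inexact, and tiny values (below ~3e-8) round to zero.
    """
    return struct.unpack('e', struct.pack('e', x))[0]
```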
aa4c4c47fa Fix the update ONNX expect files (#17767)
Summary:
Fix the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17767

Reviewed By: zrphercule

Differential Revision: D14370483

Pulled By: houseroad

fbshipit-source-id: e7b0bbde0797c41f5a010fa206fab80fe2792eb7
2019-03-07 12:54:42 -08:00
7bcc2301ee Cleanup testFusion/testOne: there are unused arguments.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17737

Differential Revision: D14366584

Pulled By: ZolotukhinM

fbshipit-source-id: 3c2dd2aabfecca475909e4eec4a077d900795da9
2019-03-07 11:19:24 -08:00
4480aa31c2 Automatic update of fbcode/onnx to 96c58ceeacf0f2b73d752e413e4fd78787a12da3 (#17676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17676

Previous import was e18bb41d255a23daf368ffd62a2645db55db4c72

Included changes:
- **[96c58ce](https://github.com/onnx/onnx/commit/96c58ce)**: Fix shape inference when auto_pad is notset again (#1830) <Li-Wen Chang>
- **[873ddbb](https://github.com/onnx/onnx/commit/873ddbb)**: More extendable Runner (#1809) <Michał Karzyński>

Reviewed By: zrphercule

Differential Revision: D14321241

fbshipit-source-id: 12de9021afc61f5435f1b719cccf7b0f4ad73a84
2019-03-07 11:10:31 -08:00
1043ff6d68 Set the default ONNX opset to the latest stable opset (i.e., 9) (#17736)
Summary:
1) The changes in the new opset won't affect the internal pipeline.
2) The CI won't be affected by the ONNX changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17736

Reviewed By: zrphercule

Differential Revision: D14358710

Pulled By: houseroad

fbshipit-source-id: 4ef15d2246b50f6875ee215ce37ecf92d555ca6a
2019-03-07 10:56:06 -08:00
a2381fa346 Add module attributes (#17309)
Summary:
Similar to `nn.Parameter`s, this PR lets you store any `IValue` on a module as an attribute on a `ScriptModule` (only from the Python front-end currently). To mark something as an attribute, it should wrapped in `jit.Attribute(value, type)` (ex. `self.table = torch.jit.Attribute(table, Dict[str, torch.Tensor])`)

Followup Work:
* (de)serializing for use in C++
* change `self.training` to be a `bool` attribute instead of a buffer
* mutable attributes
* string frontend support
* documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17309

Differential Revision: D14354316

Pulled By: driazati

fbshipit-source-id: 67e08ab5229366b67fbc837e67b58831a4fb3318
2019-03-07 10:44:10 -08:00
e4c9d75008 - refactoring serialization of ONNX initializers to be name-based (#17420)
Summary:
Currently, serialization of model parameters in ONNX export depends on the order in which they are stored in a container (`list` on Python side and `std::vector` on C++ side). This has worked fine till now, but if we need to do any pass on that graph that mutates the parameter list, then strictly order-based serialization may not work.

This PR is the first in a set to bring in more passes (such as constant folding) related to ONNX export. This PR lays the groundwork by moving the serialization in ONNX export from order-based to name based approach, which is more amenable to some of the passes.

houseroad - As discussed this change uses a map for export, and removes the code from `export.cpp` that relies on the order to compute initializer names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17420

Differential Revision: D14361993

Pulled By: houseroad

fbshipit-source-id: da93e945d55755c126de06641f35df87d1648cc4
2019-03-07 10:25:00 -08:00
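The motivation for name-based serialization can be sketched with two hypothetical pairing helpers (illustrative only): once a pass mutates the parameter list, order-based pairing silently attaches the wrong value to a name, while name-based pairing survives.

```python
def pair_order_based(input_names, values):
    """Order-based: the i-th value is paired with the i-th input name."""
    return dict(zip(input_names, values))

def pair_name_based(named_values):
    """Name-based: each value carries its own name, so the mapping
    survives passes that reorder or drop parameters."""
    return dict(named_values)

# Suppose a pass (e.g. constant folding) removes the "weight" input but
# the exporter still serializes values in the old order:
values = [[1.0, 2.0], [0.5]]                  # old order: weight, bias
broken = pair_order_based(["bias"], values)   # "bias" silently gets weight's data
ok = pair_name_based([("bias", [0.5])])       # correct pairing
```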
3f94fc4862 ONNX Export for Max and Average Pooling in CEIL_MODE
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16769

Differential Revision: D14362175

Pulled By: houseroad

fbshipit-source-id: 65cfb1dfba6a43d39cc85374add368fe8e4e5645
2019-03-07 10:10:21 -08:00
561037aef8 use flake8-mypy (#17721)
Summary:
Use flake8 installed with mypy checks so that our linter matches fbcode. Mypy type errors also provide valuable signal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17721

Differential Revision: D14357778

Pulled By: eellison

fbshipit-source-id: d8c9ea3fe3b5f550c3b70fe259e0eabf95e4c92d
2019-03-07 09:15:54 -08:00
1d522598fb use fp16<->fp32 intrinsic (#17496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17496

As title.

Reviewed By: hyuen

Differential Revision: D14222907

fbshipit-source-id: d5d6c032e725ca8b52aca2be7401ec3c59f6a242
2019-03-07 02:23:07 -08:00
f8778aef78 Implement a Caffe2 standalone LSTM operator (#17726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17725

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461

Implementing a standalone LSTM operator in Caffe2, adopted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest part of this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different Tensor containers used in the code. Also, there was no easy way to use off-the-shelf C2 operators, so I had to copy some code that does basic matmul, cat, split, transpose, and linear as utility functions.

Two things missing:

- Profiling this implementation against the current ONNXified LSTM op
- Make this operator available to use in PyTorch

Reviewed By: dzhulgakov

Differential Revision: D14351575

fbshipit-source-id: 3b99b53212cf593c7a49e45580b5a07b90809e64
2019-03-07 01:08:49 -08:00
7d02a1fbc7 caffe2:libtorch_cuda depends on caffe2:caffe2_gpu (#17729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17729

When doing "import torch" in fbcode, previously the caffe2 cuda kernels weren't loaded because libcaffe2_gpu.so wasn't loaded.
Once you also did "from caffe2.python import workspace", then the cuda kernels were loaded because that triggered a runtime mechanism for loading libcaffe2_gpu.so.

We want the cuda kernels to always be available, so this diff adds a dependency from caffe2:libtorch_cuda to caffe2:caffe2_gpu.

Reviewed By: ezyang

Differential Revision: D14353498

fbshipit-source-id: 76a9fe69f231b308ab40eac393bb216c6fad3658
2019-03-06 23:53:16 -08:00
39423fbdd4 add tensor and cost inference functions (#17684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17684

Adding tensor and cost inference functions to more int8 operators.

Reviewed By: yinghai

Differential Revision: D14174746

fbshipit-source-id: dfad975fa75899565c8fb61f1b7747a9206ebd22
2019-03-06 23:34:02 -08:00
3dba1285ab ONNX Export Narrow op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17550

Differential Revision: D14350401

Pulled By: houseroad

fbshipit-source-id: 4d88079bb7a8bbd270b0272009826eb3b202cc33
2019-03-06 22:37:58 -08:00
3230404645 Keep the dim_type of hinted shape as BATCH if possible (#17734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17734

If an input is not BATCH, we skip adjusting its batch size during the onnxifi transformation. So when we take hints, we take them as CONSTANT but later need to change them back to BATCH if possible.

Reviewed By: jackm321

Differential Revision: D14355983

fbshipit-source-id: 63eb54a44afb1565c71486fdd73db07ca0ac4fd4
2019-03-06 19:58:35 -08:00
jwu
8ec7357312 fix different round behavior on CPU and GPU #16498 (#17443)
Summary:
xxtemp, colesbury, bhushan23, zou3519: converts GPU round behavior to half-to-even, consistent with the torch CPU version and numpy. Your feedback is welcome.
See #16498
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17443

Differential Revision: D14261786

Pulled By: VitalyFedyunin

fbshipit-source-id: 98156436b545d72769831a89e2775d43ad913ebc
2019-03-06 19:40:10 -08:00
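Half-to-even ("banker's") rounding, the behavior the GPU kernel now shares with the CPU version and numpy, can be sketched as follows (`round_half_to_even` is an illustrative helper, not the kernel code; exactness of the `.5` comparison assumes exactly-representable inputs):

```python
import math

def round_half_to_even(x: float) -> float:
    """Round to the nearest integer; ties go to the even neighbor."""
    floor = math.floor(x)
    diff = x - floor
    if diff < 0.5:
        return float(floor)
    if diff > 0.5:
        return float(floor + 1)
    # exactly halfway: pick the even neighbor
    return float(floor if floor % 2 == 0 else floor + 1)
```

Python's built-in `round()` behaves the same way on these ties.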
68c5c66800 Warn about memory overlaps on expanded tensors (#17576)
Summary:
Eventually we should remove these when we're certain that all our ops
handle memory overlaps correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17576

Differential Revision: D14349990

Pulled By: zou3519

fbshipit-source-id: c3a09f6113b9b1bf93e7f13c0b426c45b2cdf21f
2019-03-06 17:44:04 -08:00
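The overlap these warnings guard against comes from `expand` giving broadcast dimensions stride 0, so many logical indices alias one storage element. A stride-level sketch (`expand_strides` is a hypothetical helper, not ATen code):

```python
def expand_strides(shape, strides, new_shape):
    """Compute the strides of an expand()-style view: a size-1 dim that
    is broadcast to n > 1 gets stride 0, i.e. every index along that
    dim reads the same storage element -- which is why in-place writes
    into such views need a memory-overlap check."""
    out = []
    for size, stride, new_size in zip(shape, strides, new_shape):
        out.append(0 if size == 1 and new_size > 1 else stride)
    return out
```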
93768785ec fix exp fam. formula
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17719

Differential Revision: D14349029

Pulled By: soumith

fbshipit-source-id: cf016756a9319436f7379e8377f8bd1e1b672b40
2019-03-06 15:47:13 -08:00
f6fda4409b refactor caffe2 operator constructors - 10/9 (#17659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17659

clangr codemod

Reviewed By: ezyang

Differential Revision: D14304675

fbshipit-source-id: 45fbd84c50651a70ae29bf46df3322715e99d225
2019-03-06 15:11:47 -08:00
4db3f8f806 Improve ONNX symbolic for logsoftmax and softmax (#17672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17672

support dtype in the onnx symbolic

Reviewed By: zrphercule

Differential Revision: D14313987

fbshipit-source-id: e9364621b3f795191d880599711dfbcb220d0e31
2019-03-06 15:02:08 -08:00
c78da0c6ed Enable using CMD when building cpp extensions on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17706

Differential Revision: D14346482

Pulled By: ezyang

fbshipit-source-id: 7c85e51c701f6c0947ad324ef19fafda40ae1cb9
2019-03-06 14:45:31 -08:00
a87d475c2f Do not rename net boundary inputs/outputs during ssaRewrite. (#17545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17545

This diff avoids renaming boundary inputs of the net during the onnxifi transform.
It also removes adding mappings for the initializers during onnxifi op creation,
thus getting rid of the mapped workspace creation during onnxifi op creation.

Reviewed By: zrphercule

Differential Revision: D14243161

fbshipit-source-id: 6eafa920c45f6a6bfacbbb443e8e84cf9778644c
2019-03-06 14:26:58 -08:00
9024faaafe Reapply D14078519 (#17596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17596

Was reverted before, now fixed version.

Reviewed By: ezyang

Differential Revision: D14270288

fbshipit-source-id: c72490b5d02cc6098cb60145fa9a842b3c9a24c5
2019-03-06 13:51:00 -08:00
bd7fcced69 Batch of expect file removals (#17581)
Summary:
Another batch of removing expect files.

One note - I removed the Batched expect files without adding equivalent tests since they are already being tested in other ways, and we are no longer actively maintaining that project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17581

Differential Revision: D14343578

Pulled By: eellison

fbshipit-source-id: ce0b1fd2b5b4ec80ad9003bab1b58f41645d3da6
2019-03-06 13:44:26 -08:00
39669316a6 (#14267)
Summary:
- Summary:

Added synchronized batch normalization, which allows synchronization of stats across mini-batches between processes within a process group.
The current implementation uses a mixture of extended ATen native functions (a C++/CUDA extension) and torch.nn modules (the c10d Python API).

- User-facing api:

1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None)

2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, ***process_group=None***)

- supported use case:
DistributedDataParallel with ***single-gpu multi-process***

a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below:

  1. use layers directly:

     torch.nn.SyncBatchNorm(...)

     similar API as with torch.nn.BatchNormXd(...)
     with added argument `process_group` which is used to limit the scope of
     synchronization within each process group. Default value is None, which
     implies synchronization across all GPUs

  2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group)

     recursively convert all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm`
     preserving values of parameters/buffers.
     the utility function also allows user to specify process_group value to all
     converted layers.

b. user wraps their model with
   `torch.nn.parallel.DistributedDataParallel`; from this point, the user
   should follow the general guidelines of the DDP usage guide

- Error checking

For use cases not supported, we error out:

1. Application launched without ddp:
   > import torch
   > sbn = torch.nn.SyncBatchNorm(10).cuda()
   > inp = torch.randn(5, 10, 3, 3).cuda()
   > sbn(inp) --> Error!
   > AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel

2. Application launched using DDP with multi-GPU per-process:
   > ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank)
   > ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267

Differential Revision: D14270035

Pulled By: ezyang

fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d
2019-03-06 13:39:11 -08:00
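The recursive conversion that `convert_sync_batchnorm` performs can be sketched with toy stand-in classes (none of this is the real nn API; the actual utility also copies parameters and buffers, and the class names here are illustrative):

```python
class Module:
    """Minimal stand-in for nn.Module with named children."""
    def __init__(self, **children):
        self.children = children

class BatchNorm(Module):
    def __init__(self, num_features):
        super().__init__()
        self.num_features = num_features

class SyncBatchNorm(BatchNorm):
    def __init__(self, num_features, process_group=None):
        super().__init__(num_features)
        self.process_group = process_group

def convert_sync_batchnorm(module, process_group=None):
    """Recursively swap BatchNorm layers for SyncBatchNorm, keeping
    num_features and tagging each with the given process_group."""
    if isinstance(module, BatchNorm) and not isinstance(module, SyncBatchNorm):
        return SyncBatchNorm(module.num_features, process_group)
    for name, child in module.children.items():
        module.children[name] = convert_sync_batchnorm(child, process_group)
    return module
```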
0ed1b9fb98 Update ModuleDict doc about order
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17717

Differential Revision: D14346557

Pulled By: ezyang

fbshipit-source-id: 2484c7d8105f9aa8bce5567d1fa2d4f587cc9cc2
2019-03-06 13:09:46 -08:00
e2de88dc5a Update CODEOWNERS (#17720)
Summary:
teng-li is passing the baton to mrshenli. Thanks for all your work on distributed teng-li!! 🎉
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17720

Differential Revision: D14350120

Pulled By: pietern

fbshipit-source-id: edfe784520c54630203cc8fbb296455d3dbf341b
2019-03-06 12:33:48 -08:00
073634612f ONNX Export Argmin and Argmax ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17382

Differential Revision: D14338811

Pulled By: houseroad

fbshipit-source-id: be07548d8063d1aa94f1801c18137738365b85fb
2019-03-06 12:11:47 -08:00
97eb139a94 Turn atol to 1e-5 when comparing the end to end results (#17708)
Summary:
results smaller than 1e-5 don't make sense.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17708

Differential Revision: D14348893

Pulled By: houseroad

fbshipit-source-id: 5e07c38e5b58b27b61fae63bfc3c21e2fe5629fe
2019-03-06 12:06:45 -08:00
7fa996f8e2 remove loop expects (#17695)
Summary:
Replace loop unrolling expect files with assertions on the output IR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17695

Differential Revision: D14347105

Pulled By: eellison

fbshipit-source-id: 1703b4ca32bc1c67c01fc4330b0e6eb66feaa103
2019-03-06 11:48:46 -08:00
b87abdfc12 typo fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17653

Differential Revision: D14302003

Pulled By: ezyang

fbshipit-source-id: 8ad90985a392b07127c7e315d4e74ce77962b573
2019-03-06 11:36:44 -08:00
e3516d0a95 omit group conv NHWC test for GPU (#17715)
Summary:
Observed the test `TestGroupConvolution.test_group_convolution` to fail with the following error:

```
Falsifying example: test_group_convolution(self=<caffe2.python.operator_test.group_conv_test.TestGroupConvolution testMethod=test_group_convolution>, stride=3, pad=0, kernel=5, size=8, group=4, input_channels_per_group=7, output_channels_per_group=8, batch_size=2, order='NHWC', engine='', use_bias=False, gc=, dc=[, device_type: 1])

You can reproduce this example by temporarily adding reproduce_failure('3.59.1', b'AAAA') as a decorator on your test case
```
This example, generated by hypothesis, has `group=2, order='NHWC' and dc=[, device_type: 1])`.
I think this example should be skipped.

I have mimicked the change corresponding to [PR#13554](https://github.com/pytorch/pytorch/pull/13554) to skip this example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17715

Differential Revision: D14346642

Pulled By: ezyang

fbshipit-source-id: b1f1fef09f625fdb43d31c7213854e61a96381ba
2019-03-06 11:32:35 -08:00
10ea02facf fix tuple matching (#17687)
Summary:
Check for Tuple Matching in isSubvalueOf, since they may contain container types that need to be recursed within isSubvalueOf

Fix for https://github.com/pytorch/pytorch/issues/17650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17687

Differential Revision: D14324642

Pulled By: eellison

fbshipit-source-id: 7f1e019875286b2640a3b9c003d1635dda8cf543
2019-03-06 11:25:36 -08:00
c658d9b21b Temporarily disable Upsample operator tests in pytorch-onnx tests (#17696)
Summary:
In discussion with houseroad, because Upsample op is being updated in ONNX https://github.com/onnx/onnx/pull/1773 and these tests are blocking it. These tests will be updated once the ONNX PR goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17696

Differential Revision: D14338845

Pulled By: houseroad

fbshipit-source-id: cfaf8cf1ab578ae69dd3bf21b1c0681b572b9b6f
2019-03-06 11:25:34 -08:00
08fb9021da Add check for x64 Python before setup (#17707)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17657.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17707

Differential Revision: D14346705

Pulled By: ezyang

fbshipit-source-id: 5daafacdb99eb9a9c6517263d10f20c79f920d24
2019-03-06 10:48:16 -08:00
1e6acc676f Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623

Despite it's generic sounding name, caffe2::DeviceGuard actually
only worked on CUDA devices.  Rename it to something that more
clearly spells out its applicability.

I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more
in-line with how the Caffe2 codebase is currently written.  More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.

This is a respin of D13156470 (#14284)

Reviewed By: dzhulgakov

Differential Revision: D14285504

fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d
2019-03-06 10:48:15 -08:00
e9eb18a18c Remove nomscheduler (#17693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17693

Remove nomscheduler tool

Reviewed By: yinghai

Differential Revision: D14328168

fbshipit-source-id: 674d0e18596a4dc2bbb6b8d321f4066c4fc454ab
2019-03-06 10:48:13 -08:00
886e482776 index operation support for torch.HalfTensor (#17645)
Summary:
- Test cases added
1. indexing for half tensor
2. setting for half tensor

fixes #17161
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17645

Differential Revision: D14302069

Pulled By: ezyang

fbshipit-source-id: 100f141c07046f200c904e27c5882a9417bccda0
2019-03-06 10:32:35 -08:00
507c93bad2 Revert D14160172: Implement a Caffe2 standalone LSTM operator
Differential Revision:
D14160172

Original commit changeset: c33e3f9e8aea

fbshipit-source-id: cffe35d93f0ac75ca93aa98a3b82af3d372f2fc1
2019-03-06 08:44:25 -08:00
39f94619ec fix typo in hub doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17705

Differential Revision: D14338380

Pulled By: ailzhang

fbshipit-source-id: d53eece30bede88a642e718ee6f829ba29c7d1c4
2019-03-05 23:19:30 -08:00
fefaebabba fix dropout AD & rename range to rangelist (#17691)
Summary:
fixes #17669
Address apaszke 's comments in #17523
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17691

Differential Revision: D14328083

Pulled By: ailzhang

fbshipit-source-id: 9ec4a54f13bfd1aaf4b1821dd00c31793ac07a44
2019-03-05 20:50:10 -08:00
36e0d39f50 enable use of MIOpen for depthwise convolutions (#17685)
Summary:
* added miopen conv mode to be used for setConvDescriptor
* added miopen depthwise convolutions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17685

Differential Revision: D14327811

Pulled By: bddppq

fbshipit-source-id: d5bdc1abafd5f39694fadf3f9275b9d880c5b115
2019-03-05 18:44:14 -08:00
bfe7a58f69 Implement a Caffe2 standalone LSTM operator (#17461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461

Implementing a standalone LSTM operator in Caffe2, adopted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest part of this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different Tensor containers used in the code. Also, there was no easy way to use off-the-shelf C2 operators, so I had to copy some code that does basic matmul, cat, split, transpose, and linear as utility functions.

Two things missing:

- Profiling this implementation against the current ONNXified LSTM op
- Make this operator available to use in PyTorch

Reviewed By: dzhulgakov

Differential Revision: D14160172

fbshipit-source-id: c33e3f9e8aeae578b64d97593cb031a251216029
2019-03-05 17:34:44 -08:00
a478d41620 Fix nll_loss crash on cpu where ignore_index is out of bounds (#17328)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15508
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17328

Differential Revision: D14322629

Pulled By: soumith

fbshipit-source-id: 7d02f372be78794782c18affcfc109ce30b1e91c
2019-03-05 14:35:05 -08:00
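The fixed failure mode is an `ignore_index` that lies outside the class dimension: an ignored target must be skipped before it is ever used as an index. A scalar Python sketch of the semantics (illustrative, not the ATen kernel):

```python
def nll_loss(log_probs, targets, ignore_index=-100):
    """Mean negative log likelihood over the non-ignored targets.

    Ignored targets contribute nothing and must never be used to index
    the class dimension -- even when ignore_index is out of bounds.
    """
    total, n = 0.0, 0
    for lp, t in zip(log_probs, targets):
        if t == ignore_index:
            continue                  # skip before indexing lp[t]
        total += -lp[t]
        n += 1
    return total / n if n else 0.0
```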
288e1fbd18 Add '--hip-clang-launch' to favor <<<>>>-based launch. (#17686)
Summary:
hip-clang uses triple chevron kernel dispatch syntax. Add an option to the hipification script to skip translating triple chevron to hipLaunchKernelGGL.

Once we switch to hip-clang, this option will become the default and will subsequently be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17686

Differential Revision: D14327810

Pulled By: bddppq

fbshipit-source-id: 5e1512325077dd3ebb8fb9b5bf35fd1f8d9a4dc3
2019-03-05 12:52:22 -08:00
079093a662 Improve caching allocator for Pascal and newer GPUs. (#17120)
Summary:
```
NVIDIA changed the CUDA allocation behavior on Pascal GPUs. The
page size increased from 1MB to 2MB and allocations larger than 1MB
are now always page-aligned. Previously, allocations larger than 1MB
were aligned to 128KB boundaries.

This interacted poorly with the caching allocator. The remaining
memory in a page could only be filled by small cudaMalloc calls, but
the caching allocator never cudaMalloc's a chunk smaller than 1MB.
This behavior could also cause a large discrepancy between the memory
usage reported by nvidia-smi and the memory usage reported by
PyTorch, because nvidia-smi counts a partially used page as "full",
while PyTorch only counts the actual memory requested.

This PR makes a few changes to the caching allocator to better support
Pascal and Volta GPUs:

 - All cudaMalloc calls are now multiples of 2MB (the page size)
 - Requests between 1-10MB allocate (and split) a 20MB block to
   reduce wasted space due to rounding
 - Small requests are now packed into 2MB blocks (instead of 1MB)

This improves Mask R-CNN memory usage by 10-20% in internal tests on
Volta GPUs. Maxwell performance seems to be largely unchanged, but
it's possible that some use cases suffer slightly.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17120

Differential Revision: D14301536

Pulled By: colesbury

fbshipit-source-id: a8282315ea8f7b8ca149b5066fdeaecd0d404edf
2019-03-05 09:44:27 -08:00
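The three rounding rules in the summary can be condensed into one sizing function (a simplified sketch of the policy as described, not the allocator's actual code; the boundary handling at exactly 1MB/10MB is an assumption):

```python
MB = 1024 * 1024

def cuda_malloc_size(requested: int) -> int:
    """Size actually passed to cudaMalloc under the policy above:
      - small requests pack into a 2MB block (the page size),
      - 1-10MB requests allocate (and split) a 20MB block,
      - larger requests round up to a multiple of 2MB."""
    if requested <= 1 * MB:
        return 2 * MB
    if requested <= 10 * MB:
        return 20 * MB
    return -(-requested // (2 * MB)) * (2 * MB)   # ceil to 2MB pages
```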
8420a2025b Turn the Half::from_bits into a constexpr function to avoid unresolved symbol errors (#17661)
Summary:
Turn Half::from_bits into a constexpr function to avoid unresolved symbol errors when building in DEBUG mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17661

Differential Revision: D14319610

Pulled By: soumith

fbshipit-source-id: 6c508a37155e29260f403d7174f343aa1ff32385
2019-03-05 07:31:38 -08:00
7fc3aa8c49 Remove Expect Files from python / tracing / script interop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17622

Differential Revision: D14308307

Pulled By: eellison

fbshipit-source-id: bda249d38ac2570000a12b0ca328c26233ecefe8
2019-03-04 23:04:54 -08:00
b219882c0b Enable apex on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17675

Differential Revision: D14320473

Pulled By: soumith

fbshipit-source-id: cb696984f5196f9b8b50722b4fe927bb6407c322
2019-03-04 21:53:47 -08:00
f176450d60 bump docker build to upgrade magma to 2.5.0 (#17674)
Summary:
upgrades magma in docker build.

vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17674

Differential Revision: D14320187

Pulled By: soumith

fbshipit-source-id: 7887f65fb703b802fc6231408b55ad9c4039882b
2019-03-04 20:31:16 -08:00
54c4b5a4db refactor caffe2 operator constructors - 1/9 (#17082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17082

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078498

fbshipit-source-id: f7f65d6d81c7942293f53fdaa61f756d8b7360c1
2019-03-04 16:04:01 -08:00
910519e45b Expose cuda kernel for caffe2::GenerateProposals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17066

Reviewed By: ezyang, wat3rBro

Differential Revision: D14071130

fbshipit-source-id: 6fe26503f6069c36ec31d6c09b549b932d5db242
2019-03-04 14:59:08 -08:00
aea8dd8377 print warnings when DNNLOWP_16 or DNNLOWP_ROWWISE_16 engine is used (#17176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17176

As title

Reviewed By: csummersea

Differential Revision: D14111616

fbshipit-source-id: 1282cb2452c4ad385fd2dc6d3f8c19e9fec715ff
2019-03-04 14:28:42 -08:00
8569d9cbea Fix XOutput/XOutputTensor for ivalue based c2 operators (#17599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17599

XOutput/XOutputTensor was broken for ivalue based operators. This diff fixes that.

Reviewed By: ezyang

Differential Revision: D14274003

fbshipit-source-id: b99f020244c66c4e2551dbd32ae0f665cc91b338
2019-03-04 14:20:13 -08:00
c7db0b35d8 Fix InputSize/OutputSize for ivalue based operators (#17579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17579

These methods previously just returned 0 when it was not a legacy operator,
making it impossible to convert some operators.

Reviewed By: dzhulgakov

Differential Revision: D14253094

fbshipit-source-id: 72bfdcf6da291a4ab80d1e0ceb20984b86edc408
2019-03-04 14:20:12 -08:00
173561ff12 Fix clamp fusion on missing limits (#17533)
Summary:
Fixes #17449

Context: before #17186, we did not fuse `clamp` when `min`/`max` were missing inputs, because they were `prim::None` nodes. After #17186, None became a `prim::Constant` node, which enables fusion for `clamp`. But codegen.cpp did not handle the case where a `prim::Constant` is not a Double/Int/Bool. This PR makes missing inputs be handled correctly, in the following way:

1. emit nothing when you see `type? = prim::Constant()`
2. when emitRHS, do special casing for aten::clamp
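As an illustration only (a plain-Python sketch, not the fuser's actual generated code), the missing-bounds semantics the codegen must now reproduce look like this:

```python
# Sketch of clamp with optional bounds -- the case the fuser must handle
# when min/max arrive as missing (None) constants. Illustrative only.
def clamp(x, min=None, max=None):
    if min is not None and x < min:
        return min
    if max is not None and x > max:
        return max
    return x

clamp(5, max=3)    # clamps from above
clamp(-2, min=0)   # clamps from below
clamp(1)           # both bounds missing: identity
```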
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17533

Differential Revision: D14238450

Pulled By: wanchaol

fbshipit-source-id: 61a272154754b13e89021bb86002927f02cde19c
2019-03-04 13:18:10 -08:00
Jie
a87eeec9bf int32 indexing for Tensor Iterator Reduction (#17428)
Summary:
1. Enabling int32 indexing for cases where TI cannot accumulate in output due to
incompatible data types (e.g. Welford).
2. Updating Welford kernel to use int32 instead of int64 indexing on GPU.

This change improves performance for torch.var / torch.std

Implementation:
1. Allocated extra buffer to handle accumulation between sub Tensor Iterators.
2. Removed int64 indexing in gpu_reduce_kernel
3. WelfordOps now supports index type / combination type as a template parameter.
While GPU uses int32_t and float, CPU implementation uses int64_t and double.
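For reference, a plain-Python sketch of the Welford reduction whose GPU indexing is being narrowed (illustrative only; the real kernel operates on chunked device buffers):

```python
def welford(xs):
    """Welford's online algorithm: single-pass mean and variance."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return mean, m2 / n  # mean, population variance

welford([1.0, 2.0, 3.0, 4.0])  # -> (2.5, 1.25)
```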
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17428

Differential Revision: D14264608

Pulled By: umanwizard

fbshipit-source-id: 3eb54451de925b469dbc1127e5ea7443c4431036
2019-03-04 13:11:47 -08:00
3257608276 Removed all usages of TH_Index_Base (#17591)
Summary:
TH_Index_Base is hard coded to 0 and can be removed from the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17591

Differential Revision: D14269273

Pulled By: izdeby

fbshipit-source-id: d844e261f4af7297bad8a81e7d6dcf0a391b94e6
2019-03-04 12:51:42 -08:00
dec116e96f PyTorch/Caffe2 tensor interop in Python (#17190)
Summary:
Because of two separate python extensions with different pybind
instances I have to go through void* conversion. Since it's hidden from
user, it's fine.

New APIs added on C2 side:
- workspace.FetchTorch('blob')
- workspace.Workspace.current.blobs['blob'].to_torch()
- workspace.FeedBlob('blob', pytorch_tensor)

Works on CPU and GPU.

The only glitches are with resizing because of variable/tensor split.
But data sharing works properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17190

Reviewed By: ezyang

Differential Revision: D14163882

Pulled By: dzhulgakov

fbshipit-source-id: d18e5b8fcae026f393c842a1149e972515732de2
2019-03-04 11:34:01 -08:00
244d330980 Fixed typo in aten/src/ATen/native_parse.py (#17641)
Summary:
Hi, there.
There is a typo in aten/src/ATen/native_parse.py, and I fix it.
`std::aray` -> `std::array`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17641

Differential Revision: D14301981

Pulled By: ezyang

fbshipit-source-id: a37859cdedcbf6c29333b954486dfa086d6c2176
2019-03-04 10:10:52 -08:00
5b835682e3 Remove GPU dependency from ProfileObserver (#17592)
Summary:
Remove GPU dependency and register ProfileObserver.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17592

Reviewed By: ezyang

Differential Revision: D14265801

Pulled By: mdschatz

fbshipit-source-id: f98c0c32653c64a8b087c58ece4f864dfbe1d4b8
2019-03-04 10:00:46 -08:00
6a297b8675 Don't make factory methods create a tensor and then immediately copy it (#17565)
Summary:
Create a `make_variable` override that moves out of a tensor instead of going through `shallow_copy_and_detach`. Call this override from factory methods like `empty` that create a brand new tensor, do nothing with it, and then copy it into a variable.

Will update this with actual numbers, but it seems to get rid of around 20-40% of the overhead of calling `torch.empty(0)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17565

Differential Revision: D14266130

Pulled By: umanwizard

fbshipit-source-id: f57d5f2ca3f80ee8ee96d50f905e852fd10db941
2019-03-03 22:16:21 -08:00
7a51c03a30 Fixed typo in torch/functional.py w/r/t broadcast_tensors (#17642)
Summary:
In reference to #17574
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17642

Differential Revision: D14297177

Pulled By: ezyang

fbshipit-source-id: 968176ea3b46a0153da0fd9e6b40db314d29e51c
2019-03-03 10:08:41 -08:00
01977c0a89 Change fake tqdm constructor to match real tqdm (#17636)
Summary:
Currently, the fake tqdm implementation requires an input (whereas real tqdm does not).

This caused a problem in torchvision (https://github.com/pytorch/vision/pull/770), and seems likely to cause minor irritations elsewhere.
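A minimal sketch of what a fallback tqdm stand-in with a matching constructor could look like (hypothetical, not the actual shim in this commit; the real tqdm accepts an optional iterable plus keyword arguments):

```python
class tqdm:  # hypothetical no-op fallback, not the real tqdm library
    def __init__(self, iterable=None, **kwargs):
        self.iterable = iterable
    def __iter__(self):
        return iter(self.iterable if self.iterable is not None else [])
    def update(self, n=1):
        pass  # no progress display in the fallback
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

bar = tqdm(total=100)   # constructor works without an iterable, like real tqdm
list(tqdm(range(3)))    # still iterable when given one
```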
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17636

Differential Revision: D14296530

Pulled By: ezyang

fbshipit-source-id: bc077d898773c93dab34c985a7b30525a43e558a
2019-03-03 01:06:10 -08:00
ef7ddcd29e Mark native_functions as matched if uncaptured by JIT (#17631)
Summary:
Various functions aren't used by the JIT, so they're jit-compliant w.r.t. their schema by default.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17631

Differential Revision: D14295559

Pulled By: cpuhrsch

fbshipit-source-id: a2ecdcb5df47eb67c54ec642d88d42e985515142
2019-03-02 18:20:04 -08:00
80927fc068 Ban std::array from native_functions.yaml
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17629

Differential Revision: D14292941

Pulled By: cpuhrsch

fbshipit-source-id: 3c3eed57a5505a4e1da3aea682092677ab0e73e3
2019-03-01 19:21:49 -08:00
416474a720 Remove more usages of BoolTensor and IndexTensor from native_functions.yaml
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16468

Differential Revision: D14095405

Pulled By: cpuhrsch

fbshipit-source-id: ea4d6bb7a4e81c05fe9861190ddbf52201612bbf
2019-03-01 19:14:41 -08:00
c6715eda06 Implement kthvalue in ATen (#17544)
Summary:
The CPU version is based on the TH version.
The GPU version is based on #8406 by Pararth Shah (thank you).

CPU quickselect based on that in TH's THTensorMoreMath.cpp, but with C++ (quickselectnoindex will be achieved by a different swap)
CPU kthvalue is based on the THTensor function in the same file.
The dim_apply function is a C++ replacement for TH_TENSOR_DIM_APPLYx macros.
The CUDA kernel uses functions adapted from the THCTensorSortK implementation.
In particular radixSelect is from THCTensorTopK.cuh.
The CUDA launcher code replaces a bunch of macros with C++. It will be re-used in one of the following patches.

Plan for further PRs:
- This
- Sort
- TopK + Mode + Median in any order
- Rip out THC stuff.

There may be utility functions / structs in the SortingCommon.cuh that come into
relevance only with sort.
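The quickselect idea behind the CPU kthvalue can be sketched in plain Python (illustrative only; k is 1-indexed, matching torch.kthvalue semantics, and the real implementation partitions in place):

```python
def kthvalue(xs, k):
    """Return the k-th smallest element (1-indexed), quickselect-style."""
    pivot = xs[0]
    lows = [x for x in xs if x < pivot]
    pivots = [x for x in xs if x == pivot]
    highs = [x for x in xs if x > pivot]
    if k <= len(lows):
        return kthvalue(lows, k)
    if k <= len(lows) + len(pivots):
        return pivot
    return kthvalue(highs, k - len(lows) - len(pivots))

kthvalue([3, 1, 4, 1, 5], 2)  # -> 1
```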
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17544

Differential Revision: D14286934

Pulled By: ezyang

fbshipit-source-id: 35dbea050b097e88777ac5fa5c0f499d5e23c738
2019-03-01 19:00:10 -08:00
43f94077d8 Change vml.h to support sizes greater than 2**32 - 1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17280

Differential Revision: D14154997

Pulled By: cpuhrsch

fbshipit-source-id: c19b15d18da59c9ee87e82765d3244d2a4ef6729
2019-03-01 17:22:26 -08:00
2336f0ba06 msvc_fixes (#17201)
Summary:
Fixing MSVC errors

```
  D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(144): error C4002: too many actual paramet
ers for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxp
roj]
  D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(259): error C4002: too many actual paramet
ers for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxp
roj]
  D:/pytorch-scripts/caffe2_builders/v141/pytorch/aten/src/THCUNN/SpatialDilatedMaxPooling.cu(51): error C4002: too man
y actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2
\caffe2_gpu.vcxproj]
```

on variadic C10_LAUNCH_BOUNDS as well as Debug linking issues with at::Half in pool_op_cudnn.cc like this one

```
pool_op_cudnn.obj : error LNK2019: unresolved external symbol "public: bool __cdecl caffe2::MaxPoolFunctor<class caff
e2::CUDAContext>::GlobalPoolingBackward<struct c10::Half,2>(int,int,int,struct c10::Half const *,struct c10::Half const
 ,struct c10::Half const ,struct c10::Half ,class caffe2::CUDAContext )const " (??$GlobalPoolingBackward@UHalf@c10@
@$01@?$MaxPoolFunctor@VCUDAContext@caffe2@@caffe2@QEBA_NHHHPEBUHalf@c10@00PEAU23@PEAVCUDAContext@1@Z) referenced in
 function "public: bool __cdecl caffe2::`anonymous namespace'::CuDNNMaxPoolFunctor::GlobalPoolingBackward<struct c10::H
alf,2>(int,int,int,struct c10::Half const ,struct c10::Half const ,struct c10::Half const ,struct c10::Half ,class
caffe2::CUDAContext *)const " (??$GlobalPoolingBackward@UHalf@c10@@$01@CuDNNMaxPoolFunctor@?A0xb936404a@caffe2@QEBA_NH
HHPEBUHalf@c10@00PEAU34@PEAVCUDAContext@2@Z) [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caff
e2_gpu.vcxproj]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17201

Differential Revision: D14165732

Pulled By: ezyang

fbshipit-source-id: 875fd9a5b2db6f83fc483f6d750d2c011260eb8b
2019-03-01 15:17:41 -08:00
06c8aa7a3b Hipify fixes for Masquerade logic (#17598)
Summary:
ezyang Please review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17598

Differential Revision: D14287724

Pulled By: ezyang

fbshipit-source-id: 46e5083854a827370bb4c81b82e5a4ede511e473
2019-03-01 15:13:19 -08:00
ab95b5c6cc Rename prim::Undefined to prim::AutogradZero (#17611)
Summary:
supersedes #17245
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17611

Differential Revision: D14283581

Pulled By: wanchaol

fbshipit-source-id: 8022d02b8a021ea2fee9a18a2c8920eb123200c5
2019-03-01 15:13:18 -08:00
5b6703629c Add python test for extension backend tensor.device (#17602)
Summary:
Adding a test for #17361
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17602

Differential Revision: D14287373

Pulled By: li-roy

fbshipit-source-id: 544ecf17eb310aed22ba0ea5f86f46b8e3bb69b5
2019-03-01 14:22:49 -08:00
2ed99fee0d Revert D13935403: Call c10 cuda op from test_torch
Differential Revision:
D13935403

Original commit changeset: b2915ec8a366

fbshipit-source-id: 0f3409d5c102d719bc1f0483695aee93e7d613c9
2019-03-01 14:18:26 -08:00
c2c32340a4 add command line option to use hive filler; add README (#17619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17619

--filler hive --iter -1 will let the debugger exhaust all batches from a hive partition before exiting.
Add a README that summarizes command line options and usage.

Reviewed By: yinghai

Differential Revision: D14220166

fbshipit-source-id: daa23b7e8a9184481c6d7b67acf1599e5c99d74a
2019-03-01 13:56:15 -08:00
5360984fbd Remove TH(CU)NN Sparse Linear (#17610)
Summary:
Sparse Linear in TH(CU)NN implements sparse linear layers without
using sparse matrices.
It is currently not documented in PyTorch and there is no functional or
module interface. This means it is unused from a PyTorch point of view.

The reason for removing it is twofold:
- The module uses sort, which I would like to move to ATen.
- When we implement a SparseLinear layer, we would want to do it
  using sparse tensors, so it's not all that useful, anyway.

I checked this on slack with soumith, I hope the above is an accurate
representation. All bad ideas are my own.

This is part of the ongoing work to move
sort/topk/mode/median/kthvalue to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17610

Differential Revision: D14280663

Pulled By: gchanan

fbshipit-source-id: 289231d2c20626855ce2ceecd4f204b460c32378
2019-03-01 12:36:52 -08:00
19a6de328f Correct docstring of vision/init functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17351

Differential Revision: D14276355

Pulled By: soumith

fbshipit-source-id: 9b572b6a04eeb1e44cd93961edac76ed10f7b24e
2019-03-01 11:40:23 -08:00
0a7b2af13b Call c10 cuda op from test_torch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16692

Reviewed By: ezyang

Differential Revision: D13935403

fbshipit-source-id: b2915ec8a3664bb6e918ed357908cc33d8f9449a
2019-03-01 10:59:19 -08:00
698f947463 Revert #17191 and #17215 that no longer apply on Windows (#17567)
Summary:
They were previously merged to resolve #17051. However, since the issue was resolved upstream, and these changes were causing some issues like https://github.com/abjer/tsds/issues/8, I think it's time to revert them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17567

Differential Revision: D14265241

Pulled By: kostmo

fbshipit-source-id: 7fa2b7dd4ebc5148681acb439cf82d983898694e
2019-03-01 10:37:27 -08:00
e6a9062335 usertype -> class (#17528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17528

as title. register_prim_ops is messy because someone ruined clang-format, but I figured it's okay to include here since this is such a mechanical change

Reviewed By: driazati

Differential Revision: D14236943

fbshipit-source-id: c2b22845837b7f830015510e48ec2ee5202fa407
2019-03-01 10:08:23 -08:00
830ca665f5 alias analysis refactor take 2 (#17594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17594

The original version of this broke things because a concurrent change raced with it in CI.

Reviewed By: ezyang

Differential Revision: D14266663

fbshipit-source-id: e8ac5dfcb7349b4f2c425d9f0eabbfc964314063
2019-03-01 10:08:22 -08:00
7fddd01c51 Fix the missing Windows CPU job in the build status section (#17608)
Summary:
It would be better to split the CPU job on CI, but unfortunately we are out of Windows machines.
cc, davidbrownellWork yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17608

Differential Revision: D14281393

Pulled By: soumith

fbshipit-source-id: ae9a6140b7207ce56cfb2da3d812bc3fe060764a
2019-03-01 10:03:25 -08:00
81f2bdf9c2 Update magma to 2.5.0 for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17607

Differential Revision: D14281291

Pulled By: yf225

fbshipit-source-id: 51209c5540932871e45e54ba6d61b3b7d264aa8c
2019-03-01 09:53:56 -08:00
a6170573c8 Adding support for 0-d tensor for transpose (.t()) (#17535)
Summary:
- Test updates
1. test_torch: added 0-d test case and t_() test cases
2. test_jit  : updated error message for TestAsync.test_async_script_error

- Updating documentation for torch.t()
Adding information regarding new support of 0-D and 1-D tensors

Fixes #17520
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17535

Differential Revision: D14269984

Pulled By: gchanan

fbshipit-source-id: 38b723f31484be939261c88edb33575d242eca65
2019-03-01 08:45:01 -08:00
6899e901cc Updating submodules
Reviewed By: yns88

fbshipit-source-id: 05fafcfb34c76f425ac5c8ef24a5f920641c2cf7
2019-03-01 01:37:01 -08:00
212024282b Mark cudaGetLastError return value unused in C10_CUDA_CHECK
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17605

Reviewed By: xw285cornell

Differential Revision: D14277586

Pulled By: bddppq

fbshipit-source-id: 38879208f2ab83cf39d8a8a61b288cd09fcafd9a
2019-03-01 00:05:46 -08:00
d3fcd0d798 add dropout during eval (#17549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17549

Currently dropout is only enabled during training; this diff adds the option of also applying dropout during eval.

This follows [1]. This functionality would be used for uncertainty estimation in the exploration project.

[1] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a bayesian approximation: Representing model uncertainty in deep learning." international conference on machine learning. 2016.
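As a sketch of the behavior change (plain Python, with a hypothetical `apply_during_eval` flag name; the actual caffe2 option may be named differently):

```python
import random

def dropout(xs, p, training=True, apply_during_eval=False):
    """Inverted dropout; optionally kept active at eval time (MC dropout)."""
    if not training and not apply_during_eval:
        return list(xs)  # standard eval: identity
    scale = 1.0 / (1.0 - p)
    return [0.0 if random.random() < p else x * scale for x in xs]

dropout([1.0, 2.0], 0.5, training=False)  # unchanged at eval by default
```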

Reviewed By: Wakeupbuddy

Differential Revision: D14216216

fbshipit-source-id: 87c8c9cc522a82df467b685805f0775c86923d8b
2019-02-28 23:21:29 -08:00
3ed44b6714 Adjust launch_bounds annotation for AMD hardware. (#17555)
Summary:
The max pooling backwards kernel is currently annotated with launch bounds (256,8).

Adjust the number of waves to 4 (4 times 64 is 256) for ROCm. This improves training performance for torchvision models by up to 15% (AlexNet) on a gfx906 GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17555

Differential Revision: D14277744

Pulled By: bddppq

fbshipit-source-id: 2a62088f7b8a87d1e350c432bf655288967c7883
2019-02-28 22:59:11 -08:00
6b07612cef Fix verbose compiler warning in flat_hash_map (#17562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17562

fixes https://github.com/pytorch/pytorch/issues/17332

Reviewed By: ezyang

Differential Revision: D14254499

fbshipit-source-id: 9d5d7408c2ce510ac20cd438c6514dc2bbe3a854
2019-02-28 16:38:43 -08:00
35a52aa33f Fix diagnostic pragmas (#17561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17561

The push at the top of the file was missing a corresponding pop

Reviewed By: ezyang

Differential Revision: D14254500

fbshipit-source-id: ff20359b563d6d6dcc68273dc754ab31aa8fad12
2019-02-28 16:38:42 -08:00
2e94054e34 Allow dispatch based on tensor list args (#17522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17522

Dispatch is still based on the first tensor arg, but that first "tensor arg" is now allowed to be a tensor list.
That is, the first argument that is either Tensor or TensorList will be the deciding factor for dispatch.
If it is a TensorList, then that TensorList must not be empty or dispatch will fail.
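The rule can be sketched in plain Python (hypothetical stand-in types; the real dispatcher works on the C++ schema, not runtime isinstance checks):

```python
class Tensor:  # minimal stand-in for a dispatchable tensor
    def __init__(self, device):
        self.device = device

def dispatch_device(args):
    """First Tensor or non-empty Tensor list decides dispatch."""
    for a in args:
        if isinstance(a, Tensor):
            return a.device
        if isinstance(a, list):
            if not a:
                raise RuntimeError("cannot dispatch on an empty tensor list")
            return a[0].device
    raise RuntimeError("no tensor argument found")

dispatch_device([1, [Tensor("cuda")], Tensor("cpu")])  # -> 'cuda'
```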

Reviewed By: ezyang

Differential Revision: D14235840

fbshipit-source-id: 266c18912d56ce77aa84306c5605c4191f3d882b
2019-02-28 16:32:00 -08:00
b004b31d06 Allow exposing caffe2 operators with variable number of input tensors to c10 (#17491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17491

Before, there was no way to expose a caffe2 operator that had a variable number of inputs.
Now, this is allowed by giving the operator one tensor list input.
Note that the tensor list must be the first input, and that any other tensor inputs will be ignored and inaccessible in this case.

Reviewed By: ezyang

Differential Revision: D14220705

fbshipit-source-id: 7f921bfb581caf46b229888c409bbcc40f7dda80
2019-02-28 16:31:59 -08:00
1ccf74ae9d blacklist fft algorithms for strided dgrad (#17016)
Summary:
Applies https://github.com/pytorch/pytorch/pull/16626 from v1.0.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17016

Differential Revision: D14270100

Pulled By: ezyang

fbshipit-source-id: 1137899dd1551d33d16f39e8dde76cad8192af46
2019-02-28 16:28:07 -08:00
0f60283e84 Revert D14078519: [codemod][caffe2] [clangr] refactor caffe2 operator constructors - 5/9
Differential Revision:
D14078519

Original commit changeset: b0ca31a52e4a

fbshipit-source-id: 713ae108d3dd6f33abdbf98a5f213e57e2b64642
2019-02-28 15:09:28 -08:00
b36d9351b1 Add generic list/dict custom op bindings (#17587)
Summary:
Fixes #17017

Sandcastle refuses to land #17037, so trying fresh here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17587

Differential Revision: D14265402

Pulled By: driazati

fbshipit-source-id: b942721aa9360ac6b3862f552ac95529eb0cf52c
2019-02-28 15:00:26 -08:00
7413f0926a refactor caffe2 operator constructors - 8/9 (#17089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17089

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078539

fbshipit-source-id: 9ca196af4af7f26fc82e6cf82b35d478d0597752
2019-02-28 14:45:20 -08:00
28b5df1c8f refactor caffe2 operator constructors - 6/9 (#17087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17087

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078525

fbshipit-source-id: 7cc03b30b0d4eb99818e35406be4119b27bdb1bc
2019-02-28 14:23:57 -08:00
a4ed7126ca refactor caffe2 operator constructors - 2/9 (#17083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17083

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078504

fbshipit-source-id: 34dddb035eee2fca3150e47c57489614b91b6725
2019-02-28 14:23:55 -08:00
8db403b9dc refactor caffe2 operator constructors - 7/9 (#17088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17088

clangr codemod

also manually moved the constructor of a class from the .cpp file to the .h file.

Reviewed By: ezyang

Differential Revision: D14078531

fbshipit-source-id: 2adb4ac0ce523742da6cce3bc3b6c177b816c299
2019-02-28 14:23:53 -08:00
42512242cc refactor caffe2 operator constructors - 4/9 (#17085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17085

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078515

fbshipit-source-id: aaa48ae10892e3f47063f2133e026fea46f3240b
2019-02-28 14:23:52 -08:00
b0d3165cc8 refactor caffe2 operator constructors - 3/9 (#17084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17084

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078507

fbshipit-source-id: ed02d772890b30196302b6830f541f054b7e95c8
2019-02-28 14:13:17 -08:00
c9989dfe37 Make HIPStream also masquerade as CUDA. (#17469)
Summary:
HIPGuard interfaces that interacted with HIPStream were previously
totally busted (because the streams had the wrong device type).
This fixes it, following along the same lines as MasqueradingAsCUDA.

Along the way I beefed up the explanatory comment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc jithunnair-amd iotamudelta bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17469

Differential Revision: D14243396

Pulled By: ezyang

fbshipit-source-id: 972455753a62f8584ba9ab194f9c785db7bb9bde
2019-02-28 13:46:11 -08:00
e157a6432f Fix Python device type property for XLA and MSNPU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17361

Differential Revision: D14243546

Pulled By: soumith

fbshipit-source-id: b7498968f72e3d97de5bf6e5b44c5a59b6913acb
2019-02-28 13:36:19 -08:00
c596683309 Rely on numel() == 1 to check if distribution parameters are scalar. (#17503)
Summary:
As discussed in #16952, this PR aims at improving the `__repr__` for distributions when the provided parameters are `torch.Tensor`s with only one element.

Currently, `__repr__()` relies on `dim() == 0`, leading to the following behaviour:

```
>>> torch.distributions.Normal(torch.tensor([1.0]), torch.tensor([0.1]))
Normal(loc: torch.Size([1]), scale: torch.Size([1]))
```

With this PR, the output looks like the following:
```
>>> torch.distributions.Normal(torch.tensor([1.0]), torch.tensor([0.1]))
Normal(loc: 1.0, scale: 0.10000000149011612)
```
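The check can be sketched without torch (hypothetical stand-in class; the real code calls these methods on a `torch.Tensor`):

```python
class FakeParam:  # stand-in for a one-or-more-element tensor parameter
    def __init__(self, values):
        self.values = list(values)
    def numel(self):
        return len(self.values)
    def item(self):
        return self.values[0]

def fmt(p):
    # old: only dim() == 0 printed the value; new: numel() == 1 does too
    return str(p.item()) if p.numel() == 1 else f"Size([{p.numel()}])"

fmt(FakeParam([1.0]))       # '1.0' instead of 'Size([1])'
fmt(FakeParam([1.0, 2.0]))  # 'Size([2])'
```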
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17503

Differential Revision: D14245439

Pulled By: soumith

fbshipit-source-id: a440998905fd60cf2ac9a94f75706021dd9ce5bf
2019-02-28 13:36:17 -08:00
9cbd7a18f5 fix reordering of inlines (#17557)
Summary:
See comment inside of code. This fixes a bug where sometimes we would try to avoid printing long lines but would inadvertently reorder the expressions, which can change the semantics of the program
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17557

Differential Revision: D14250608

Pulled By: zdevito

fbshipit-source-id: d44996af4e90fe9ab9508d13cd04adbfc7bb5d1c
2019-02-28 13:12:15 -08:00
2e5a8cee82 Customize the printing of namedtuple return (#17136)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17112
```python
print("good", torch.randn(5,5,5).max(1))
print("terrible", torch.randn(5,5,10).max(1))
print("not as good", torch.randn(5,5,500).max(1))
print ("old behaviour = gold standard")
print(tuple(torch.randn(5,5,5).max(1)))
print(tuple(torch.randn(5,5,10).max(1)))
print(tuple(torch.randn(5,5,500).max(1)))
```
now gives
```
>>> import torch
>>> print("good", torch.randn(5,5,5).max(1))
good torch.return_types.max(
values=tensor([[ 1.2821,  1.8063,  1.8075,  1.3082, -0.1267],
        [ 0.3437,  0.7353,  1.2619,  0.7557,  1.6662],
        [ 0.8583,  1.8906,  1.0246,  1.7598,  1.1184],
        [ 1.7821,  0.0230,  0.9452,  1.0318,  1.0823],
        [ 0.4116, -0.0379, -0.1843,  1.4129,  1.8796]]),
indices=tensor([[4, 4, 3, 2, 1],
        [1, 2, 4, 1, 1],
        [2, 4, 0, 2, 1],
        [0, 2, 0, 3, 1],
        [0, 4, 4, 4, 4]]))
>>> print("terrible", torch.randn(5,5,10).max(1))
terrible torch.return_types.max(
values=tensor([[ 2.1272,  1.3664,  2.2067,  1.3974, -0.0883,  1.2505,  1.0074,  1.1217,
          0.3849,  0.6936],
        [ 0.6288, -0.4560,  1.2748,  1.5482,  1.2777,  1.6874,  0.7151,  0.6041,
          1.3572,  1.6232],
        [ 1.6703,  1.0075,  1.6480,  2.2839,  1.3390,  0.4938,  1.6449,  1.7628,
          0.8141,  2.5714],
        [ 0.7079,  1.8677,  3.2478,  1.5591,  2.4870,  0.8635, -0.1450,  1.6923,
          1.4924,  1.6298],
        [ 2.4056,  0.8002,  0.9317,  0.7455,  0.7866,  2.1191,  0.3492,  1.2095,
          1.8637,  1.7470]]),
indices=tensor([[1, 1, 0, 0, 0, 0, 3, 4, 4, 4],
        [4, 2, 2, 1, 2, 2, 3, 1, 1, 3],
        [0, 3, 3, 0, 2, 1, 4, 1, 0, 1],
        [4, 1, 3, 0, 3, 2, 0, 1, 4, 3],
        [1, 0, 3, 2, 1, 0, 0, 1, 0, 1]]))
>>> print("not as good", torch.randn(5,5,500).max(1))
not as good torch.return_types.max(
values=tensor([[ 0.3877,  0.7873,  1.8701,  ...,  0.5971,  1.6103, -0.3435],
        [ 1.1300,  2.2418,  1.4239,  ...,  1.3943,  0.3872,  1.6475],
        [ 2.0656,  1.3136,  0.9896,  ...,  2.3918,  0.8226,  1.0517],
        [ 1.1054,  0.9945,  1.0561,  ...,  2.1039,  1.1524,  3.0304],
        [ 1.5041,  2.2809,  1.0883,  ...,  0.8504,  2.4774,  1.1041]]),
indices=tensor([[4, 3, 1,  ..., 1, 4, 0],
        [4, 4, 4,  ..., 3, 0, 3],
        [3, 0, 1,  ..., 2, 2, 4],
        [0, 1, 1,  ..., 4, 2, 2],
        [1, 0, 4,  ..., 2, 0, 2]]))
>>> print ("old behaviour = gold standard")
old behaviour = gold standard
>>> print(tuple(torch.randn(5,5,5).max(1)))
(tensor([[ 1.1908,  1.1807,  1.3151,  1.7184,  0.3556],
        [ 0.3798,  0.9213,  0.3001,  1.3087,  2.2419],
        [ 1.4233,  1.4814,  1.9900,  1.7744,  1.3059],
        [ 1.0026, -0.0330,  1.3061,  1.8730,  2.0685],
        [ 1.3041,  1.6458,  1.3449,  1.8948,  3.6206]]), tensor([[0, 4, 3, 4, 0],
        [1, 1, 4, 0, 4],
        [4, 1, 0, 3, 3],
        [1, 2, 1, 4, 0],
        [3, 3, 0, 3, 3]]))
>>> print(tuple(torch.randn(5,5,10).max(1)))
(tensor([[-0.1232,  0.8275,  0.6732,  1.1223,  0.8247,  1.2851,  1.6009,  1.9979,
          1.9109,  0.7313],
        [ 0.2260,  0.5922,  1.6928,  0.6024,  2.1158,  3.0619,  0.5653,  0.7426,
          0.8316,  0.6346],
        [ 0.4319,  0.2231,  0.5255,  1.7620,  1.1657,  0.8875,  0.5782,  0.6506,
          0.5032,  1.7097],
        [ 0.4137,  1.7265,  1.4260,  2.0301,  1.2244,  0.7128,  2.6345,  0.7230,
          1.3553,  1.6508],
        [ 1.0684,  1.7195,  1.4068,  0.7076, -0.0242,  0.8474,  0.8754,  1.7108,
          0.2188,  1.1584]]), tensor([[0, 1, 3, 4, 2, 3, 4, 2, 1, 0],
        [1, 4, 0, 0, 3, 2, 0, 0, 3, 3],
        [2, 3, 1, 1, 4, 0, 1, 4, 4, 4],
        [0, 4, 1, 3, 2, 0, 2, 0, 3, 1],
        [1, 0, 0, 0, 0, 3, 3, 3, 2, 0]]))
>>> print(tuple(torch.randn(5,5,500).max(1)))
(tensor([[0.9395, 1.5572, 1.8797,  ..., 2.0494, 0.8202, 0.9623],
        [1.7937, 0.7225, 1.8836,  ..., 0.7927, 1.4976, 1.1813],
        [0.8558, 1.6943, 1.4192,  ..., 0.8327, 1.9661, 0.4197],
        [1.2993, 1.4995, 0.9357,  ..., 0.7810, 1.3030, 2.6216],
        [1.4206, 1.8315, 1.0338,  ..., 1.4312, 1.3198, 1.5233]]), tensor([[0, 4, 3,  ..., 3, 0, 2],
        [0, 1, 0,  ..., 0, 4, 3],
        [3, 4, 3,  ..., 3, 0, 0],
        [3, 2, 3,  ..., 1, 2, 1],
        [1, 2, 4,  ..., 3, 1, 3]]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17136

Differential Revision: D14250021

Pulled By: VitalyFedyunin

fbshipit-source-id: aae72f03b35980063b1ac1f07b8353eddb0c8b93
2019-02-28 13:07:26 -08:00
1046593509 Revert D14231251: [jit] alias_analysis refactor
Differential Revision:
D14231251

Original commit changeset: 6cd98ae6fced

fbshipit-source-id: 96189f47daf7cc4cf4ef5cd343022d56a2296b39
2019-02-28 12:56:17 -08:00
44fb22f9fe refactor caffe2 operator constructors - 5/9 (#17086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17086

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078519

fbshipit-source-id: b0ca31a52e4ab97b145a1490461d59f8fa93874a
2019-02-28 12:00:39 -08:00
54c5b10934 alias_analysis refactor (#17511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17511

AliasTracker was doing bookkeeping for three concepts: the points-to graph,
writes, and wildcards.

This PR makes AliasTracker's job clearer: it keeps track of the points-to
graph. Thus it has been renamed MemoryDAG. Write and wildcard information were
pulled back into AliasDb as part of this—I may decide to pull them into their
own little modules since I don't want the alias analysis stuff to get too
bloated.

This refactor is necessary because we want to start tracking information for
aliasing elements that _aren't_ first-class IR Values (e.g. the "stuff" inside
a list). So MemoryDAG can't know too much about Values
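A toy sketch of such a points-to graph (illustrative only; the real MemoryDAG is a C++ structure with very different internals): two values may alias when the memory locations reachable from them overlap.

```python
from collections import defaultdict

class MemoryDAG:
    """Toy points-to graph: nodes may point to other memory locations."""
    def __init__(self):
        self.points_to = defaultdict(set)

    def make_pointer_to(self, a, b):
        self.points_to[a].add(b)

    def reachable(self, a):
        seen, stack = set(), [a]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(self.points_to[n])
        return seen

    def may_alias(self, a, b):
        # conservative: alias if any reachable location is shared
        return bool(self.reachable(a) & self.reachable(b))
```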

Reviewed By: houseroad

Differential Revision: D14231251

fbshipit-source-id: 6cd98ae6fced8d6c1522c2454da77c3c1b2b0504
2019-02-28 12:00:36 -08:00
f9d3f1dca5 allow "before" and "after" alias annotations (#17480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17480

This was always part of our "spec" but not implemented

Reviewed By: houseroad

Differential Revision: D14214301

fbshipit-source-id: 118db320b43ec099dc3e730c67d39487474c23ea
2019-02-28 12:00:34 -08:00
e1df99295f ONNXIFI extension & e2e tests. (#17478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17478

Enable onnxifi_ext in glow and build an e2e test in caffe2.

Reviewed By: yinghai

Differential Revision: D14190136

fbshipit-source-id: 26245278b487b551623109b14432f675279b17b5
2019-02-28 11:54:55 -08:00
7fbee1f79e update slack invite instructions
Summary: update slack invite instructions

Reviewed By: pjh5

Differential Revision: D14255348

fbshipit-source-id: 564fed0d44a6a68f80d1894fed40c3ddb360aa52
2019-02-28 11:29:05 -08:00
456d3e5f56 Fix errors in the description for installation on Windows (#17475)
Summary:
+ All quotes for ENV VARS are erroneous;
+ Toolset hasn't been specified;
+ Provide paths for all 3 Visual Studio 2017 products: Community/Professional/Enterprise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17475

Differential Revision: D14262968

Pulled By: soumith

fbshipit-source-id: c0504e0a6be9c697ead83b06b0c5cf569b5c8625
2019-02-28 10:41:16 -08:00
a9395ce259 refactor caffe2 operator constructors - 9/9 (#17090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17090

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078550

fbshipit-source-id: 68e6de4298e55ce83039b7806c1a275c4d6593c8
2019-02-28 09:53:18 -08:00
9bcceb75b5 Fix the false generated_comment (#17563)
Summary:
The generated_comments are wrong to below generated files:
```bash
./torch/csrc/autograd/generated/VariableType_0.cpp:3:// generated from tools/autograd/templates/VariableType_0.cpp
./torch/csrc/autograd/generated/VariableType_1.cpp:3:// generated from tools/autograd/templates/VariableType_1.cpp
./torch/csrc/autograd/generated/VariableType_2.cpp:3:// generated from tools/autograd/templates/VariableType_2.cpp
./torch/csrc/autograd/generated/VariableType_3.cpp:3:// generated from tools/autograd/templates/VariableType_3.cpp
./torch/csrc/autograd/generated/VariableType_4.cpp:3:// generated from tools/autograd/templates/VariableType_4.cpp
./torch/csrc/autograd/generated/VariableTypeEverything.cpp:3:// generated from tools/autograd/templates/VariableTypeEverything.cpp

./torch/csrc/jit/generated/register_aten_ops_0.cpp:23:// generated from tools/autograd/templates/register_aten_ops_0.cpp
./torch/csrc/jit/generated/register_aten_ops_1.cpp:23:// generated from tools/autograd/templates/register_aten_ops_1.cpp
./torch/csrc/jit/generated/register_aten_ops_2.cpp:23:// generated from tools/autograd/templates/register_aten_ops_2.cpp
```

These generated files were split to speed up compilation; the template files, however, were not.
After this fix, the comments will look like the following:
```bash
./torch/csrc/autograd/generated/VariableType_0.cpp:3:// generated from tools/autograd/templates/VariableType.cpp
./torch/csrc/autograd/generated/VariableType_1.cpp:3:// generated from tools/autograd/templates/VariableType.cpp
......
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17563

Differential Revision: D14260992

Pulled By: soumith

fbshipit-source-id: 038181367fa43bee87837e4170704ddff7f4d6f2
2019-02-28 09:44:08 -08:00
13e6326c07 Remove useless OpenCV reference
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17564

Differential Revision: D14255542

Pulled By: dzhulgakov

fbshipit-source-id: c129f3751ae82deedd258ee16586552b77baaca6
2019-02-27 23:31:10 -08:00
03132c1f56 convolution/matmul/dropout (#17523)
Summary:
* Add AD formula for _convolution & matmul & dropout
* add prim::range, fixes #17483
Example:
```
dim = 3
x = range(dim)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17523

Differential Revision: D14254002

Pulled By: ailzhang

fbshipit-source-id: ba60d77b047db347929b72beca2623fb26aec957
2019-02-27 21:41:59 -08:00
221edddd18 disallow shape analysis with resize ops (#17518)
Summary:
resize_ and resize_as_ resize the input tensor. Because our shape analysis
is flow invariant, we don't do shape analysis on any op that relies on a Tensor that can alias a resized Tensor.

E.g., in the following graph, the `x` in `x += 10` may have been resized.
```
@torch.jit.script
def test(x, y):
    for i in range(10):
        x += 10
        x.resize_as_([1 for i in range(int(torch.rand()))])
    return x

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17518

Differential Revision: D14249835

Pulled By: eellison

fbshipit-source-id: f281b468ccb8c29eeb0f68ca5458cc7246a166d9
2019-02-27 19:02:09 -08:00
6706e9af19 Make C10_MOBILE consistent with how feature macros are usually used (#17481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17481

Usually, feature macros are either defined or undefined and checked accordingly.
C10_MOBILE was a weird special case that was always defined but either defined to 1 or to 0.

This caused a lot of confusion for me when trying to disable something from the mobile build, because it also disabled it
in the server build (since I was using ifdef). Also, I found a place in the existing code base that made
that wrong assumption and used the macro incorrectly, see https://fburl.com/y4icohts

Reviewed By: dzhulgakov

Differential Revision: D14214825

fbshipit-source-id: f3a155b6d43d334e8839e2b2e3c40ed2c773eab6
2019-02-27 17:57:51 -08:00
7c5ffc4120 Disable c10 dispatcher on mobile (#17078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17078

This prevents caffe2 operators from being exposed to c10 on mobile,
which in turn causes the whole c10 dispatcher to be stripped away
and saves binary size.

We probably want to re-enable the c10 dispatcher for mobile,
but for now this is ok.

Reviewed By: ezyang

Differential Revision: D14077972

fbshipit-source-id: e4dd3e3b60cdfbde91fe0d24102c1d9708d3e5c4
2019-02-27 17:57:50 -08:00
1154506533 Always synchronize src and dst streams when copying tensors (#16966)
Summary:
fixes #15568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16966

Differential Revision: D14213144

Pulled By: mrshenli

fbshipit-source-id: 2fcf5e07895fde80b4aee72e2736b0def876d21f
2019-02-27 14:57:56 -08:00
5f06dcc4d7 ONNX Export Adaptive Pooling
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17412

Differential Revision: D14247923

Pulled By: houseroad

fbshipit-source-id: 5530cea8f80da7368bff1e29cf89c45ad53accee
2019-02-27 14:57:54 -08:00
e47aeede32 Use name for output variables instead of out in JIT (#17386)
Summary:
This adds 88 matches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17386

Differential Revision: D14179139

Pulled By: cpuhrsch

fbshipit-source-id: 2c3263b8e4d084db84791e53290e8c8b1b7aecd5
2019-02-27 14:03:33 -08:00
1971c0528d Forcing UTC on Mac circleci jobs (#17516)
Summary:
And adding timestamps to linux build jobs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17516

Differential Revision: D14244533

Pulled By: pjh5

fbshipit-source-id: 26c38f59e0284c99f987d69ce6a2c2af9116c3c2
2019-02-27 13:22:06 -08:00
9709d5e787 Fix math::Set for large tensor (#17539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17539

Fix math::Set for large tensor

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov, houseroad

Differential Revision: D14240756

fbshipit-source-id: 0ade26790be41fb26d2cc193bfa3082c7bd4e69d
2019-02-27 12:34:58 -08:00
b4572668b4 Add sparse gradient option to gather operation (#17182)
Summary:
This PR allows `gather` to optionally return sparse gradients, as requested in #16329. It also allows the autograd engine to accumulate sparse gradients in place when it is safe to do so.
I've commented out the size.size() check in `SparseTensor.cpp` that also caused #17152; it does not seem to me that the check serves a useful purpose, but please correct me if I'm wrong and a better fix is required.
Motivating example:
For this commonly used label smoothing loss function
```
def label_smoothing_opt(x, target):
    padding_idx = 0
    smoothing = 0.1
    logprobs = torch.nn.functional.log_softmax(x, dim=-1, dtype=torch.float32)
    pad_mask = (target == padding_idx)
    ll_loss = logprobs.gather(dim=-1, index=target.unsqueeze(1), sparse = True).squeeze(1)
    smooth_loss = logprobs.mean(dim=-1)
    loss =  (smoothing - 1.0) * ll_loss - smoothing * smooth_loss
    loss.masked_fill_(pad_mask, 0)
    return loss.sum()
```
backward goes from 12.6 ms with dense gather gradients to 7.3 ms with sparse gradients, for 9K tokens x 30K vocab, which is a single-digit-percent end-to-end improvement, and also an improvement in the peak memory required.
Shout-out to core devs: adding python-exposed functions with keyword arguments through native_functions.yaml is very easy now!
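A minimal sketch of the option this PR adds; note the snippet above spells it `sparse = True`, while in released PyTorch the keyword landed as `sparse_grad`, which is what is assumed here:

```python
import torch

logprobs = torch.randn(4, 10, requires_grad=True)
target = torch.tensor([1, 3, 5, 7])

# gather with sparse gradients: backward touches only the gathered entries
ll = logprobs.gather(dim=-1, index=target.unsqueeze(1), sparse_grad=True).squeeze(1)
ll.sum().backward()

assert logprobs.grad.is_sparse  # the accumulated gradient is a sparse tensor
```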

cc gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17182

Differential Revision: D14158431

Pulled By: gchanan

fbshipit-source-id: c8b654611534198025daaf7a634482b3151fbade
2019-02-27 11:42:48 -08:00
a2b9f7f484 add elastic zeus handler (#16746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16746

As titled. We use a special URL scheme, elasticzeus, for elastic Zeus so that we don't need to change the public interface of init_process_group.

Reviewed By: aazzolini, soumith

Differential Revision: D13948151

fbshipit-source-id: 88939dcfa0ad93467dabedad6905ec32e6ec60e6
2019-02-27 11:29:59 -08:00
222a07863f optimize elementwise sum (#17456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17456

Using an instruction sequence similar to function in fbgemm/src/QuantUtilAvx2.cc
elementwise_sum_benchmark added

Reviewed By: protonu

Differential Revision: D14205695

fbshipit-source-id: 84939c9d3551f123deec3baf7086c8d31fbc873e
2019-02-27 10:12:41 -08:00
8c72217817 Enable boolean_mask, adadelta, adagrad fp16 on ROCm (#17235)
Summary:
-  Fix bugs, indentation for adadelta and adagrad tests to enable fp16
- Enable boolean_mask fp16  on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17235

Differential Revision: D14240828

Pulled By: bddppq

fbshipit-source-id: ab6e8f38aa7afb83b4b879f2f4cf2277c643198f
2019-02-27 10:07:36 -08:00
e0b44cac1f Enabled HALF for fill() and zero() methods. Moved them into THTensorFill (#17536)
Summary:
For some additional context on this change, please see this [PR](https://github.com/pytorch/pytorch/pull/17376).

As part of the work on Bool Tensor, we will need to add support for a bool type to the _fill() and _zero() methods that are currently located in THTensorMath. As we don't need anything else and those methods are not really math related, we are moving them into a separate THTensorFill for simplicity.

Changes:
- moved _fill() and _zero() from THTensorMath.h to THTensorFill
- enabled _fill() and _zero() for the HALF type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17536

Differential Revision: D14242130

Pulled By: izdeby

fbshipit-source-id: 1d8bd806f0f5510723b9299d360b70cc4ab96afb
2019-02-27 09:21:54 -08:00
44a607b90c Fix autograd with buffers requiring grad in DataParallel (#13352)
Summary:
Causing a problem with spectral norm, although SN won't use that anymore after #13350 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13352

Differential Revision: D14209562

Pulled By: ezyang

fbshipit-source-id: f5e3183e1e7050ac5a66d203de6f8cf56e775134
2019-02-26 20:53:19 -08:00
74098eadb0 enable assymetric dilations and stride for miopen conv (#17472)
Summary:
As of MIOpen 1.7.1 as shipped in ROCm 2.1 this works correctly and we can use MIOpen and do not need to fall back
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17472

Differential Revision: D14210323

Pulled By: ezyang

fbshipit-source-id: 4c08d0d4623e732eda304fe04cb722c835ec70e4
2019-02-26 20:45:35 -08:00
76828647c1 Enable tests working on ROCm 2.1 dual gfx906
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17473

Reviewed By: bddppq

Differential Revision: D14210243

Pulled By: ezyang

fbshipit-source-id: 519032a1e73c13ecb260ea93102dc8efb645e070
2019-02-26 20:41:16 -08:00
96faaa9d50 Fix linking errors when building dataloader test binaries on Windows (#17494)
Summary:
Fixes #17489.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17494

Differential Revision: D14226525

Pulled By: ezyang

fbshipit-source-id: 3dfef9bc6f443d647e9f05a54bc17c5717033723
2019-02-26 20:36:45 -08:00
cbefd0323b Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17521

Differential Revision: D14237482

Pulled By: soumith

fbshipit-source-id: 636e0fbe2c667d15fcb649136a65ae64937fa0cb
2019-02-26 20:23:34 -08:00
eff672ef06 Remove Bool/IndexTensor from schema for native functions with derivatives (#17193)
Summary:
This only deals with four functions, but is an important first step towards removing BoolTensor and IndexTensor entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17193

Differential Revision: D14157829

Pulled By: cpuhrsch

fbshipit-source-id: a36f16d1d88171036c44cc7de60ac9dfed9d14f2
2019-02-26 17:54:33 -08:00
348d1889ff Fix operator initialization order (#15445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15445

Initialize the task graph after operators (the task graph uses ops)

Reviewed By: yinghai

Differential Revision: D13530864

fbshipit-source-id: fdc91e9158c1b50fcc96fd1983fd000fdf20c7da
2019-02-26 15:41:16 -08:00
4ca1a54526 Make transpose consistent with numpy's behavior (#17462)
Summary:
PyTorch's tensor.t() is now equivalent to NumPy's ndarray.T for 1-D tensors,
i.e. tensor.t() == tensor
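Concretely, the new behavior can be checked with a short sketch:

```python
import torch

v = torch.arange(3.0)          # a 1-D tensor
# t() is now a no-op on 1-D tensors, matching numpy's ndarray.T
assert torch.equal(v.t(), v)
```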

Test case added:
- test_t

fixes #9687
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17462

Differential Revision: D14214838

Pulled By: soumith

fbshipit-source-id: c5df1ecc8837be22478e3a82ce4854ccabb35765
2019-02-26 14:23:19 -08:00
63519df07a Bump up the ONNX default opset version to 10 (#17419)
Summary:
Align with the master of ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17419

Reviewed By: zrphercule

Differential Revision: D14197985

Pulled By: houseroad

fbshipit-source-id: 13fc1f7786aadbbf5fe83bddf488fee3dedf58ce
2019-02-26 12:37:27 -08:00
72eb70c272 ' ' ==> ' ' (#17498)
Summary:
Fix formatting error for cpp code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17498

Reviewed By: zou3519

Differential Revision: D14224549

Pulled By: fmassa

fbshipit-source-id: f1721c4a75908ded759aea8c561f2e1d66859eec
2019-02-26 12:31:50 -08:00
1607bb322d Support all ROCm supported uarchs simultaneously: gfx803, gfx900, gfx906 (#17367)
Summary:
Correct misspelled flag.

Remove dependency on debug flag (HCC_AMDGPU_TARGET)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17367

Differential Revision: D14227334

Pulled By: bddppq

fbshipit-source-id: d838f219a9a1854330b0bc851c40dfbba77a32ef
2019-02-26 11:54:07 -08:00
5903522ad6 refactor: a bit intricate so I refactor it (#16995)
Summary:
This code is a bit intricate, so I refactored it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16995

Differential Revision: D14050667

Pulled By: ifedan

fbshipit-source-id: 55452339c6518166f3d4bc9898b1fe2f28601dc4
2019-02-26 10:21:22 -08:00
e5b4baab40 new batch of expect file removals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17486

Differential Revision: D14218963

Pulled By: eellison

fbshipit-source-id: dadc8bb71e756f47cdb04525d47f66c13ed56d16
2019-02-26 08:20:43 -08:00
2cdbb140e6 user defined types (#17314)
Summary:
First pass at user defined types. The following is contained in this PR:
- `UserType` type, which contains a reference to a module with all methods for the type, and a separate namespace for data attributes (map of name -> TypePtr).
- `UserTypeRegistry`, similar to the operator registry
- `UserObject` which is the runtime representation of the user type (just a map of names -> IValues)
- `UserTypeValue` SugaredValue, to manage getattr and setattr while generating IR, plus compiler.cpp changes to make that work.
- Frontend changes to get `torch.jit.script` to work as a class decorator
- `ClassDef` node in our AST.
- primitive ops for object creation, setattr, and getattr, plus alias analysis changes to make mutation safe.

Things that definitely need to get done:
- Import/export, python_print support
- String frontend doesn't understand class definitions yet
- Python interop (using a user-defined type outside TorchScript) is completely broken
- Static methods (without `self`) don't work

Things that are nice but not essential:
- Method definition shouldn't matter (right now you can only reference a method that's already been defined)
- Class definitions can only contain defs, no other expressions are supported.

Things I definitely won't do initially:
- Polymorphism/inheritance
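In released PyTorch this work landed as TorchScript classes; a minimal sketch of the eventual usage (the `Pair` class and `use_pair` function here are hypothetical illustrations, not from this PR):

```python
import torch

@torch.jit.script
class Pair(object):
    # a user-defined type: attributes are inferred from __init__
    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

    def total(self) -> int:
        return self.a + self.b

@torch.jit.script
def use_pair(x: int) -> int:
    p = Pair(x, 2)     # object creation inside TorchScript
    return p.total()   # method call via getattr

assert use_pair(3) == 5
```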
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17314

Differential Revision: D14194065

Pulled By: suo

fbshipit-source-id: c5434afdb9b39f84b7c85a9fdc2891f8250b5025
2019-02-26 01:34:07 -08:00
3ceeaae5e6 add mutability to docs (#17454)
Summary:
Not sure of the best way to integrate this… I wrote something that focuses on mutability "vertically" through the stack. Should I split it up and distribute it across the various sections, or keep it all together?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17454

Differential Revision: D14222883

Pulled By: suo

fbshipit-source-id: 3c83f6d53bba9186c32ee443aa9c32901a0951c0
2019-02-26 00:36:19 -08:00
5d77f4f0d5 Remove usages of int64_t from native_functions.yaml
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17387

Differential Revision: D14185458

Pulled By: cpuhrsch

fbshipit-source-id: 5c8b358d36b77b60c3226afcd3443c2b1727cbc2
2019-02-25 17:52:26 -08:00
2f840ba6d6 upload alias tracker graph for docs (#17476)
Summary:
as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17476

Differential Revision: D14218312

Pulled By: suo

fbshipit-source-id: 64df096a3431a6f25cd2373f0959d415591fed15
2019-02-25 16:58:43 -08:00
68e90a398e Temporarily disable select/topk/kthvalue AD (#17470)
Summary:
Temporarily disable them for perf consideration. Will figure out a way to do `torch.zeros(sizes, grad.options())` in torchscript before enabling these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17470

Differential Revision: D14210313

Pulled By: ailzhang

fbshipit-source-id: efaf44df1192ae42f4fe75998ff0073234bb4204
2019-02-25 16:29:11 -08:00
411cf434af Batch of expect file removals Remove dce expect files (#17471)
Summary:
Batch of removing expect test files
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17471

Differential Revision: D14217265

Pulled By: eellison

fbshipit-source-id: 425da022115b7e83aca86ef61d4d41fd046d439e
2019-02-25 16:15:19 -08:00
df0d4e6c7a Back out part of "Fix NERPredictor for zero initialization"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17482

Reviewed By: david-y-lam

Differential Revision: D14216135

fbshipit-source-id: 2ef4cb5dea74fc5c68e9b8cb43fcb180f219cb32
2019-02-25 16:03:45 -08:00
e4e9b738d3 Followup to #17049: change more instances of RuntimeError to IndexError
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17114

Differential Revision: D14150890

Pulled By: gchanan

fbshipit-source-id: 579ca71665166c6a904b894598a0b334f0d8acc7
2019-02-25 15:34:22 -08:00
59ece70201 Missing argument description (value) in scatter_ function documentation (#17467)
Summary: Update the docs to include the `value` parameter that was missing from the `scatter_` function documentation.
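A short sketch of the scalar `value` overload the docs now describe:

```python
import torch

x = torch.zeros(2, 4)
index = torch.tensor([[2], [3]])
# scatter_(dim, index, value): fills the indexed positions with the scalar value
x.scatter_(1, index, 2.0)

assert x[0, 2] == 2.0 and x[1, 3] == 2.0
assert x.sum() == 4.0  # only the two indexed slots were written
```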

Differential Revision: D14209225

Pulled By: soumith

fbshipit-source-id: 5c65e4d8fbd93fcd11a0a47605bce6d57570f248
2019-02-25 14:39:26 -08:00
9e08c998db Throw exception when foxi is not checked out (#17477)
Summary:
Add check and provide useful warning/error information to user if foxi is not checked out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17477

Reviewed By: zrphercule

Differential Revision: D14212896

Pulled By: houseroad

fbshipit-source-id: 557247d5d8fdc016b1c24c2a21503e59f874ad09
2019-02-25 14:39:24 -08:00
6f53c51a01 Updating submodules
Reviewed By: yns88

fbshipit-source-id: ae3e05c2ee3af5df171556698ff1469780d739d1
2019-02-25 13:50:20 -08:00
79a5a73a1e simplify aliasdb interface (#17453)
Summary:
Stack:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **#17453 [jit] simplify aliasdb interface**&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D14205209/)

The previous "getWrites" API relies on the user to do alias checking, which is confusing and inconsistent with the rest of the interface. So replace it with a higher-level call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17453

Differential Revision: D14209942

Pulled By: suo

fbshipit-source-id: d4aff2af6062ab8465ee006fc6dc603296bcb7ab
2019-02-25 13:34:51 -08:00
b0b7541ca4 fix list type unification (#17424)
Summary:
Previously we were unifying the types of lists across if-block outputs. This now fails with Optional subtyping because two types that can be unified may have different runtime representations.
```

            @torch.jit.script
            def list_optional_fails(x):
                # type: (bool) -> Optional[int]
                if x:
                    y = [1]
                else:
                    y = [None]
                return y[0]

```
the indexing op will expect y to be a generic list, but it will find an intlist.
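With an explicit annotation, both branches can be kept at the same list type; a sketch of the workaround, assuming `torch.jit.annotate` (the name this pattern uses in released PyTorch):

```python
import torch
from typing import List, Optional

@torch.jit.script
def list_optional_ok(x: bool) -> Optional[int]:
    # annotate both branches as List[Optional[int]] so no unification is needed
    if x:
        y = torch.jit.annotate(List[Optional[int]], [1])
    else:
        y = torch.jit.annotate(List[Optional[int]], [None])
    return y[0]

assert list_optional_ok(True) == 1
assert list_optional_ok(False) is None
```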
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17424

Differential Revision: D14210903

Pulled By: eellison

fbshipit-source-id: 4b8b26ba2e7e5bebf617e40316475f91e9109cc2
2019-02-25 13:34:50 -08:00
b527055fcf Restore current streams on dst device after switching streams (#17439)
Summary:
When switching back to `d0` from a stream on a different device `d1`, we need to restore the current streams on both `d0` and `d1`. The current implementation only does that for `d0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17439

Differential Revision: D14208919

Pulled By: mrshenli

fbshipit-source-id: 89f2565b9977206256efbec42adbd789329ccad8
2019-02-25 12:06:41 -08:00
29c27d7b99 Automatic update of fbcode/onnx to e18bb41d255a23daf368ffd62a2645db55db4c72 (#17460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17460

Previous import was 4c091e048ca42682d63ccd3c1811560bc12b732d

Included changes:
- **[e18bb41](https://github.com/onnx/onnx/commit/e18bb41)**: Infer shape of the second output of Dropout op (#1822) <Shinichiro Hamaji>
- **[cb544d0](https://github.com/onnx/onnx/commit/cb544d0)**: Clarify dtype of Dropout's mask output (#1826) <Shinichiro Hamaji>
- **[b60f693](https://github.com/onnx/onnx/commit/b60f693)**: Fix shape inference when auto_pad  is notset (#1824) <Li-Wen Chang>
- **[80346bd](https://github.com/onnx/onnx/commit/80346bd)**: update test datat (#1825) <Rui Zhu>
- **[b37fc6d](https://github.com/onnx/onnx/commit/b37fc6d)**: Add stringnormalizer operator to ONNX (#1745) <Dmitri Smirnov>

Reviewed By: zrphercule

Differential Revision: D14206264

fbshipit-source-id: 0575fa3374ff2b93b2ecee9989cfa4793c599117
2019-02-25 11:09:08 -08:00
393c97fda7 Fix variable checking in THCPModule_setRNGState (#17474)
Summary:
See https://github.com/pytorch/pytorch/pull/16325/files#r259576901
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17474

Differential Revision: D14209549

Pulled By: yf225

fbshipit-source-id: 2ae091955ae17f5d1540f7d465739c4809c327f8
2019-02-25 11:05:51 -08:00
724c7e76c6 Fix reduction='none' in poisson_nll_loss (#17358)
Summary:
Changelog:
- Modify `if` to `elif` in reduction mode comparison
- Add error checking for reduction mode
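A quick sketch of the fixed behavior: with `reduction='none'` the loss is returned per element rather than reduced to a scalar.

```python
import torch
import torch.nn.functional as F

inp = torch.rand(3)
target = torch.rand(3)
loss = F.poisson_nll_loss(inp, target, reduction='none')

assert loss.shape == inp.shape  # per-element losses, not a scalar
```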
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17358

Differential Revision: D14190523

Pulled By: zou3519

fbshipit-source-id: 2b734d284dc4c40679923606a1aa148e6a0abeb8
2019-02-25 10:35:33 -08:00
f9ba3831ef Apply modernize-use-override (4)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.
bypass-lint
drop-conflicts

Reviewed By: ezyang

Differential Revision: D14191981

fbshipit-source-id: 1f3421335241cbbc0cc763b8c1e85393ef2fdb33
2019-02-25 08:31:27 -08:00
15a55b86ed Fix nonzero for scalars on cuda, to_sparse for scalars on cpu/cuda. (#17406)
Summary:
I originally set out to fix to_sparse for scalars, which had some overly restrictive checking (sparse_dim > 0, which is impossible for a scalar).

This fix uncovered an issue with nonzero: it didn't properly return a size (z, 0) tensor for an input scalar, where z is the number of nonzero elements (i.e. 0 or 1).
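A sketch of the fixed scalar behavior, assuming current PyTorch semantics: `nonzero` returns a tensor of shape (z, n_dim), so for a 0-dim input n_dim is 0.

```python
import torch

# nonzero scalar: one "nonzero element", zero index dimensions
assert torch.nonzero(torch.tensor(5.0)).shape == (1, 0)
# zero scalar: no nonzero elements
assert torch.nonzero(torch.tensor(0.0)).shape == (0, 0)
```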
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17406

Differential Revision: D14185393

Pulled By: gchanan

fbshipit-source-id: f37a6e1e3773fd9cbf69eeca7fdebb3caa192a19
2019-02-25 08:23:40 -08:00
65ecef1509 Export ElementwiseLinear to ONNX (Mul + Add). (#17411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17411

Reshape-based approach to support dynamic shape.
The first Reshape flattens the inner dimensions and the second one recovers the actual shape.
No Shape/Reshape will be generated unless necessary.

![image](https://user-images.githubusercontent.com/5203025/52215001-114ace80-28ce-11e9-815f-28ad190d3189.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16716

Reviewed By: zrphercule

Differential Revision: D14094532

Pulled By: houseroad

fbshipit-source-id: bad6a1fbf5963ef3dd034ef4bf440f5a5d6980bc
2019-02-25 08:11:13 -08:00
3d68a2d6de Add foxi submodule (ONNXIFI facebook extension)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17178

Reviewed By: yinghai

Differential Revision: D14197987

Pulled By: houseroad

fbshipit-source-id: c21d7235e40c2ca4925a10c467c2b4da2f1024ad
2019-02-25 08:00:03 -08:00
3de67cd63d Fix remaining -Wreturn-std-move violations in fbcode (#17308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17308

In some cases there is still no RVO/NRVO and std::move is still needed. The latest
Clang gained a -Wreturn-std-move warning to detect cases like this (see
https://reviews.llvm.org/D43322).

Reviewed By: igorsugak

Differential Revision: D14150915

fbshipit-source-id: 0df158f0b2874f1e16f45ba9cf91c56e9cb25066
2019-02-25 07:29:16 -08:00
4ac91b2d64 add debug/release tip to cpp docs (#17452)
Summary:
as title. These were already added to the tutorials, but I didn't add them to the cpp docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17452

Differential Revision: D14206501

Pulled By: suo

fbshipit-source-id: 89b5c8aaac22d05381bc4a7ab60d0bb35e43f6f5
2019-02-24 23:08:15 -08:00
15840e30dc add pointer to windows FAQ in contributing.md (#17450)
Summary:
" ProTip! Great commit summaries contain fewer than 50 characters. Place extra information in the extended description."
lol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17450

Differential Revision: D14206500

Pulled By: suo

fbshipit-source-id: af7ffe299f8c8f04fa8e720847a1f6d576ebafc1
2019-02-24 23:03:00 -08:00
d76b9395a0 Remove ROIPooling (#17434)
Summary:
Fixes: #17399

It's undocumented, unused and, according to the issue, not actually working.

Differential Revision: D14200088

Pulled By: soumith

fbshipit-source-id: a81f0d0f5516faea2bd6aef5667b92c7dd012dbd
2019-02-23 21:00:10 -08:00
d80f0a1f3a Add example to WeightedRandomSampler doc string (#17432)
Summary: An example for the weighted random sampler is missing [here](https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler)
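A minimal sketch of the kind of example the docs were missing:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# weights need not sum to 1; index 1 is ~9x more likely than index 0
sampler = WeightedRandomSampler(weights=[0.1, 0.9], num_samples=5, replacement=True)
samples = list(sampler)

assert len(samples) == 5 and all(i in (0, 1) for i in samples)
```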

Differential Revision: D14198642

Pulled By: soumith

fbshipit-source-id: af6d8445d31304011002dd4308faaf40b0c1b609
2019-02-23 20:29:06 -08:00
96b765dcf6 Revert D14095703: [pytorch][PR] [jit] Add generic list/dict custom op bindings
Differential Revision:
D14095703

Original commit changeset: 2b5ae20d42ad

fbshipit-source-id: 85b23fe4ce0090922da953403c95691bf3e28710
2019-02-23 15:55:08 -08:00
a1ca908ac2 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 8fa0be05e7410a863febb98b18be55ab723a41db
2019-02-23 12:45:50 -08:00
bb3a2d99ac Jaliyae/chunk buffer fix (#17409)
Summary:
The chunk buffer could hang when no data is read and the buffer size is smaller than the chunk size. We detected this while running with a larger dataset, hence the fix. I added a test to mimic the situation and validated that the fix works. Thank you Xueyun for finding this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17409

Differential Revision: D14198546

Pulled By: soumith

fbshipit-source-id: b8ca43b0400deaae2ebb6601fdc65b47f32b0554
2019-02-23 08:48:53 -08:00
5ea6344c54 Skip test_event_handle_multi_gpu() on a single GPU system (#17402)
Summary:
This fixes the following test failure:

```
======================================================================
ERROR: test_event_handle_multi_gpu (__main__.TestMultiprocessing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_multiprocessing.py", line 445, in test_event_handle_multi_gpu
    with torch.cuda.device(d1):
  File "/home/stefan/rel/lib/python3.7/site-packages/torch/cuda/__init__.py", line 229, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /home/stefan/pytorch/torch/csrc/cuda/Module.cpp:33
```
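The guard pattern is the usual `unittest.skipIf` on the device count; a hedged sketch (the test body here is a placeholder, not the real test):

```python
import unittest
import torch

class TestMultiprocessing(unittest.TestCase):
    # skip rather than crash on machines with fewer than 2 GPUs
    @unittest.skipIf(torch.cuda.device_count() < 2, "requires at least 2 GPUs")
    def test_event_handle_multi_gpu(self):
        with torch.cuda.device(1):  # only entered when a second device exists
            pass
```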
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17402

Differential Revision: D14195190

Pulled By: soumith

fbshipit-source-id: e911f3782875856de3cfbbd770b6d0411d750279
2019-02-23 08:29:36 -08:00
be4ad3fe30 fix(typo): Change 'integeral' to 'integer'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17396

Differential Revision: D14195023

Pulled By: soumith

fbshipit-source-id: 300ab68c24bfbf10768fefac44fad64784463c8f
2019-02-23 08:22:01 -08:00
6a99f86429 Fix the ONNX expect file (#17430)
Summary:
The CI is broken now, this diff should fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17430

Differential Revision: D14198045

Pulled By: houseroad

fbshipit-source-id: a1c8cb5ccff66f32488702bf72997f634360eb5b
2019-02-23 00:02:02 -08:00
674e11ccde order caffe2 ubuntu configs contiguously (#17427)
Summary:
This involves another purely cosmetic (ordering) change to the `config.yml` to facilitate simpler logic.

Other changes:
* add some review feedback as comments
* exit with nonzero status on config.yml mismatch
* produce a diagram for pytorch builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17427

Differential Revision: D14197618

Pulled By: kostmo

fbshipit-source-id: 267439d3aa4c0a80801adcde2fa714268865900e
2019-02-22 20:18:29 -08:00
c10662962c remove redundant inference functions for FC (#17407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17407

As title says

Reviewed By: csummersea

Differential Revision: D14177921

fbshipit-source-id: e48e1086d37de2c290922d1f498e2d2dad49708a
2019-02-22 20:13:20 -08:00
08fed51926 optimize max pool 2d (#17418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17418

Retry of D14181620 this time with CMakeLists.txt changes

Reviewed By: jianyuh

Differential Revision: D14190538

fbshipit-source-id: c59b1bd474edf6376f4c2767a797b041a2ddf742
2019-02-22 19:43:57 -08:00
340cf2a2dd Generate derived extension backend Type classes for each scalar type (#17278)
Summary:
Previously we only generate one class for each extension backend. This caused issues with scalarType() calls and mapping from variable Types to non-variable types. With this change we generate one Type for each scalar type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17278

Reviewed By: ezyang

Differential Revision: D14161489

Pulled By: li-roy

fbshipit-source-id: 91e6a8f73d19a45946c43153ea1d7bc9d8fb2409
2019-02-22 18:38:33 -08:00
47263e48f4 Better handling of net errors in prof_dag counters (#17384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17384

Better handling of possible net run errors in prof_dag counters.

Reviewed By: yinghai

Differential Revision: D14177619

fbshipit-source-id: 51bc952c684c53136ce97e22281b1af5706f871e
2019-02-22 18:38:31 -08:00
d8d8371bd3 Batch of Expect Files removal (#17414)
Summary:
Batch of removing expect files, and some tests that no longer test anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17414

Differential Revision: D14196342

Pulled By: eellison

fbshipit-source-id: 75c45649d1dd1ce39958fb02f5b7a2622c1d1d01
2019-02-22 18:11:51 -08:00
c65b0cbe3d Fix target name.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17365

Differential Revision: D14195831

Pulled By: soumith

fbshipit-source-id: fdf03f086f650148c34f4c548c66ef1eee698f05
2019-02-22 17:27:16 -08:00
4491577fb5 jit technical docs - parts 1, 2, and most of 3 (#16887)
Summary:
This will evolve into complete technical docs for the jit. Posting what I have so far so people can start reading it and offering suggestions. Go to Files Changed and click 'View File' to see the markdown formatted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16887

Differential Revision: D14191219

Pulled By: zdevito

fbshipit-source-id: 071a0e7db05e4f2eb657fbb99bcd903e4f46d84a
2019-02-22 17:27:14 -08:00
9e69703dac USE_ --> BUILD_ for CAFFE2_OPS and TEST (#17390)
Differential Revision: D14195572

Pulled By: soumith

fbshipit-source-id: 28e4ff3fe03a151cd4ed014c64253389cb85de3e
2019-02-22 17:19:44 -08:00
3ab2080047 Fix install libcaffe2_protos.a issue mentioned in #14317 (#17393)
Summary:
Fix install libcaffe2_protos.a issue mentioned in #14317.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17393

Differential Revision: D14195359

Pulled By: soumith

fbshipit-source-id: ed4da594905d708d03fcd719dc50aec6811d5d3f
2019-02-22 17:05:48 -08:00
1d05d0d848 Improve onnxifi backend init time (#17375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17375

Previously we created the onnxGraph first and took it to the onnx manager for registration. That doesn't work well in practice. This diff takes a "bring your own constructor" approach to reduce the resources spent doing backend compilation.

Reviewed By: kimishpatel, rdzhabarov

Differential Revision: D14173793

fbshipit-source-id: cbc4fe99fc522f017466b2fce88ffc67ae6757cf
2019-02-22 16:58:30 -08:00
e984244828 fix code block typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17421

Differential Revision: D14194877

Pulled By: soumith

fbshipit-source-id: 6173835d833ce9e9c02ac7bd507cd424a20f2738
2019-02-22 16:34:12 -08:00
807632d402 Double resnet50 batch size in benchmark script (#17416)
Summary:
The benchmarks are now running on GPU cards with more memory
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17416

Differential Revision: D14190493

Pulled By: bddppq

fbshipit-source-id: 66db1ca1fa693d24c24b9bc0185a6dd8a3337103
2019-02-22 15:30:35 -08:00
6d744f8fbf Preserve names when converting to/from NetDef.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17378

Differential Revision: D14176515

Pulled By: ZolotukhinM

fbshipit-source-id: da9ea28310250ab3ca3a99cdc210fd8d1fbbc82b
2019-02-22 15:25:52 -08:00
dbd66c17bc Add generic list/dict custom op bindings (#17037)
Summary:
Fixes #17017
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17037

Differential Revision: D14095703

Pulled By: driazati

fbshipit-source-id: 2b5ae20d42ad21c98c86a8f1cd7f1de175510507
2019-02-22 14:49:43 -08:00
93e8b938ff fix test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17304

Differential Revision: D14151545

Pulled By: eellison

fbshipit-source-id: d85535b709c58e2630b505ba57e9823d5a59c1d5
2019-02-22 14:43:23 -08:00
9aae82bc2c Improvements for current AD (#17187)
Summary:
This PR removes a few sizes of `self` that were passed from the forward pass to the backward pass when `self` is already required in the backward pass. This could be the cause of the potential slowdown in #16689. I will attach a few perf numbers (still a bit volatile among runs, though) in a comment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17187

Differential Revision: D14179512

Pulled By: ailzhang

fbshipit-source-id: 5f3b1f6f26a3fef6dec15623b940380cc13656fa
2019-02-22 14:34:14 -08:00
e422b27f17 Bump up the producer version in ONNX exporter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17410

Reviewed By: zrphercule

Differential Revision: D14187821

Pulled By: houseroad

fbshipit-source-id: a8c1d2f7b6ef63e7e92cba638e90922ef98b8702
2019-02-22 14:28:59 -08:00
b18c60839d list add insert and remove (#17200)
Summary:
See https://github.com/pytorch/pytorch/issues/16662
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17200

Differential Revision: D14144020

Pulled By: driazati

fbshipit-source-id: c9a52954fd5f4fb70e3a0dc02d2768e0de237142
2019-02-22 14:12:56 -08:00
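The intended semantics of the new methods match Python's built-in list operations; a plain-Python illustration (this is not the TorchScript implementation itself):

```python
# Plain-Python illustration of the list methods this commit brings to
# TorchScript; the semantics are meant to mirror Python's built-ins.
xs = [1, 2, 4]

xs.insert(2, 3)   # insert the value 3 before index 2
print(xs)         # [1, 2, 3, 4]

xs.remove(2)      # remove the first occurrence of the value 2
print(xs)         # [1, 3, 4]
```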
9977f43d19 Pin nightly builds to last commit before 5am UTC (#17381)
Summary:
This fell through the cracks in the migration from pytorch/builder to CircleCI. It's technically still racy, but the race is much less likely now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17381

Differential Revision: D14190137

Pulled By: pjh5

fbshipit-source-id: 2d4cd04ee874cacce47d1d50b87a054b0503bb82
2019-02-22 14:08:06 -08:00
356a94b64e Lazily load libcuda libnvrtc from c++ (#17317)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16860
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17317

Differential Revision: D14157877

Pulled By: zdevito

fbshipit-source-id: c37aec2d77c2e637d4fc6ceffe2bd32901c70317
2019-02-22 13:51:45 -08:00
81b43202ae Refactor Type Parser b/w Schemas & IRParser into a type common parser (#17383)
Summary:
Creates a new shared type parser to be shared between the IR parser and the Schema Parser.

Also adds parsing of CompleteTensorType and DimensionedTensorType, and feature-gates that for the IRParser.

Renames the existing type_parser for python annotations, python_type_parser, and names the new one jit_type_parser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17383

Differential Revision: D14186438

Pulled By: eellison

fbshipit-source-id: bbd5e337917d8862c7c6fa0a0006efa101c76afe
2019-02-22 13:43:55 -08:00
b0c18570ca add the support for stable ONNX opsets in exporter (#16068)
Summary:
Still wip, need more tests and correct handling for opset 8 in symbolics.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16068

Reviewed By: zrphercule

Differential Revision: D14185855

Pulled By: houseroad

fbshipit-source-id: 55200be810c88317c6e80a46bdbeb22e0b6e5f9e
2019-02-22 12:05:17 -08:00
dd3acbc6d5 add readme and notice at the top of config.yml (#17323)
Summary:
reorder some envars for consistency

add readme and notice at the top of config.yml

generate more yaml from Python

closes #17322
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17323

Differential Revision: D14186734

Pulled By: kostmo

fbshipit-source-id: 23b2b2c1960df6f387f1730c8df1ec24a30433fd
2019-02-22 11:30:49 -08:00
0c24f3754b Revert D14181620: [caffe2/int8] optimize max pool 2d
Differential Revision:
D14181620

Original commit changeset: ffc6c4412bd1

fbshipit-source-id: 4391703164a672c9a8daecb24a46578765df67c6
2019-02-22 11:23:59 -08:00
60de0b885f fallback operators to CPU for onnx support (#15270)
Summary:
fallback operators to CPU for onnx support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15270

Differential Revision: D14099496

Pulled By: yinghai

fbshipit-source-id: 52b744aa5917700a802bdf19f7007cdcaa6e640a
2019-02-22 10:47:53 -08:00
4778a4089e optimize max pool 2d (#17391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17391

Optimize 2D max pool using AVX2 intrinsics.

Reviewed By: jianyuh

Differential Revision: D14181620

fbshipit-source-id: ffc6c4412bd1c1d7839fe06226921df40d9cab83
2019-02-22 10:36:19 -08:00
8a21c6a5ee Fixed the script for the THC generated files (#17370)
Summary:
As of right now, the script produces a new generated file that is inconsistent with the rest.

Test Result:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17370

Differential Revision: D14184943

Pulled By: izdeby

fbshipit-source-id: 5d3b956867bee661256cb4f38f086f33974a1c8b
2019-02-22 09:42:43 -08:00
2b86cc442c Fix coalesce, clone, to_dense for sparse scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17379

Differential Revision: D14183641

Pulled By: gchanan

fbshipit-source-id: dbd071b648695d51502ed34ab204a1aee7e6259b
2019-02-22 09:02:37 -08:00
3d5968d366 Fix DataParallel(cpu_m).cuda() not working by checking at forward (#17363)
Summary:
Fixes #17362
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17363

Differential Revision: D14175151

Pulled By: soumith

fbshipit-source-id: 7b7e2335d553ed2133287deeaca3f6b6254aea4a
2019-02-22 08:31:36 -08:00
be6ad7ddde Rename BatchNorm running_variance to running_var (#17371)
Summary:
Currently there is a mismatch in naming between Python BatchNorm `running_var` and C++ BatchNorm `running_variance`, which causes JIT model parameters loading to fail (https://github.com/pytorch/vision/pull/728#issuecomment-466067138):
```
terminate called after throwing an instance of 'c10::Error'
  what():  No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame #6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame #11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```
Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem.

This is a BC-breaking change, but it should be easy for end user to rename `running_variance` to `running_var` in their call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17371

Reviewed By: goldsborough

Differential Revision: D14172775

Pulled By: yf225

fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
2019-02-22 08:00:25 -08:00
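Until a rebuild with the rename is available, a hypothetical Python-side workaround is to rename the mismatched key before handing the parameters over; a sketch treating the serialized parameters as a plain dict (`state` and the example values are illustrative, and this is not the actual C++ fix):

```python
# Illustrative sketch: rename a mismatched parameter key so that a
# consumer expecting "running_var" can find it. "state" stands in for
# a serialized parameter dict.
def rename_key(state, old, new):
    if old in state:
        state[new] = state.pop(old)
    return state

state = {"weight": [1.0], "running_variance": [0.5]}
rename_key(state, "running_variance", "running_var")
print(sorted(state))  # ['running_var', 'weight']
```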
562fa55f3d Updating submodules
Reviewed By: zpao

fbshipit-source-id: ac16087a2b27b028d8e9def81369008c4723d70f
2019-02-21 22:52:41 -08:00
260f66c316 Fix concat dimension check bug (#17343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17343

See [post](https://fb.workplace.com/groups/1405155842844877/permalink/2630764056950710/)

Reviewed By: dzhulgakov

Differential Revision: D14163001

fbshipit-source-id: 038f15d6a58b3bc31910e7bfa47c335e25739f12
2019-02-21 19:34:30 -08:00
1fea60be25 Add dict to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16640

Differential Revision: D14178270

Pulled By: driazati

fbshipit-source-id: 581040abd0b7f8636c53fd97c7365df99a2446cf
2019-02-21 17:45:24 -08:00
2370c989d8 Add LSTM to standard library (#15744)
Summary:
**WIP**

Attempt 2 at #14831

This adds `nn.LSTM` to the jit standard library. Necessary changes to the module itself are detailed in comments. The main limitation is the lack of a true `PackedSequence`, instead this PR uses an ordinary `tuple` to stand in for `PackedSequence`.

Most of the new code in `rnn.py` is copied to `nn.LSTM` from `nn.RNNBase` to specialize it for LSTM since `hx` is a `Tuple[Tensor, Tensor]` (rather than just a `Tensor` as in the other RNN modules) for LSTM.

As a hack it adds an internal annotation `@_parameter_list` to mark that a function returns all the parameters of a module. The weights for `RNN` modules are passed to the corresponding op as a `List[Tensor]`. In Python this has to be gathered dynamically since Parameters could be moved from CPU to GPU or be deleted and replaced (i.e. if someone calls `weight_norm` on their module, #15766), but in the JIT parameter lists are immutable, hence a builtin to handle this differently in Python/JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15744

Differential Revision: D14173198

Pulled By: driazati

fbshipit-source-id: 4ee8113159b3a8f29a9f56fe661cfbb6b30dffcd
2019-02-21 16:24:19 -08:00
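The reason the parameter list has to be gathered dynamically in Python can be sketched without torch: if a module's parameters can be replaced after construction, a list captured once would go stale. The `Module` class below is a stand-in for illustration, not the real `nn.Module`:

```python
# Stand-in module: parameters live in a dict and may be replaced later
# (e.g. by something like weight_norm swapping a tensor out).
class Module:
    def __init__(self):
        self.params = {"w_ih": "old_w", "b_ih": "old_b"}

    def parameter_list(self):
        # Gathered fresh on every call, so replacements are observed.
        return list(self.params.values())

m = Module()
stale = m.parameter_list()      # snapshot taken once
m.params["w_ih"] = "new_w"      # parameter replaced afterwards
print("new_w" in stale)               # False: the snapshot went stale
print("new_w" in m.parameter_list())  # True: fresh gathering sees it
```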
ac00a0cd47 Dict mutability (#16884)
Summary:
Adds `aten::_set_item` for `dict[key]` calls
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16884

Differential Revision: D14000488

Pulled By: driazati

fbshipit-source-id: ea1b46e0a736d095053effb4bc52753f696617b2
2019-02-21 16:24:17 -08:00
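In Python, subscript assignment desugars to `__setitem__`, which is the behavior the new op mirrors for script dicts; a plain-Python sketch:

```python
# d[key] = value desugars to d.__setitem__(key, value); aten::_set_item
# gives script dicts the same mutating behavior.
d = {"a": 1}
d["b"] = 2                 # subscript assignment
d.__setitem__("a", 10)     # the equivalent explicit call
print(d)                   # {'a': 10, 'b': 2}
```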
3a47d56946 Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA (#16705) (#17337)
Summary:
Attempt #2 (attempt 1 is https://github.com/pytorch/pytorch/pull/16705 and got reverted because of CI failures)

Fixes https://github.com/pytorch/pytorch/issues/14805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17337

Differential Revision: D14175626

Pulled By: soumith

fbshipit-source-id: 66f2e10e219a1bf88ed342ec5c89da6f2994d8eb
2019-02-21 16:12:02 -08:00
290b2a1d9d Fix Insert Constant Lint Fail (#17316)
Summary:
The test I added was failing lint because a constant was being created that wasn't being destroyed.

It was being inserted into all_nodes and then failing the check
`      AT_ASSERT(std::includes(ALL_OF(sum_set), ALL_OF(all_nodes_set)));`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17316

Differential Revision: D14172548

Pulled By: eellison

fbshipit-source-id: 0922db21b7660e0c568c0811ebf09b22081991a4
2019-02-21 15:54:44 -08:00
4c6da649e5 Partial support for kwarg_only arguments in script (#17339)
Summary:
This provides the minimum necessary to allow derivative formulas for things that have a kwarg only specifier in their schema. Support for non-parser frontend default arguments for kwargs is not completed.
Fixes #16921
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17339

Differential Revision: D14160923

Pulled By: zdevito

fbshipit-source-id: 822e964c5a3fe2806509cf24d9f51c6dc01711c3
2019-02-21 15:27:06 -08:00
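Keyword-only arguments in a schema correspond to Python's bare `*` marker; a plain-Python example of the calling convention (the function is illustrative, not a real op):

```python
# An argument after the bare * can only be passed by keyword,
# mirroring kwarg_only arguments in an op schema.
def clamp_like(x, *, minimum=0):
    return max(x, minimum)

print(clamp_like(-3, minimum=1))  # 1
try:
    clamp_like(-3, 1)             # positional use is rejected
except TypeError:
    print("must be passed by keyword")
```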
5fa78303ed fix double backward for half softmax/logsoftmax (#17330)
Summary:
Fix for #17261, SsnL do you have tests for it in your other PR? If not, I'll add to this. Example from #17261 now does not error out (and same for log_softmax).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17330

Differential Revision: D14171529

Pulled By: soumith

fbshipit-source-id: ee925233feb1b44ef9f1d757db59ca3601aadef2
2019-02-21 14:58:45 -08:00
9101dfc57c Revisit some native functions to increase number of jit matches (#17340)
Summary:
Adds about 30 matches due to new functions / misuse of double.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17340

Differential Revision: D14161109

Pulled By: cpuhrsch

fbshipit-source-id: bb3333446b32551f7469206509b480db290f28ee
2019-02-21 14:41:06 -08:00
46f15b74b7 Add Value::isValidName method. (#17372)
Summary:
The method will be used in IRParser and in NetDef converter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17372

Differential Revision: D14172494

Pulled By: ZolotukhinM

fbshipit-source-id: 96cae8422bc73c3c2eb27524f44ec1ee8cae92f3
2019-02-21 14:34:17 -08:00
6feded880e Fix #17218 by updating documentation (#17258)
Summary:
Fix Issue #17218 by updating the corresponding documentation in [BCEWithLogitsLoss](https://pytorch.org/docs/stable/nn.html#torch.nn.BCEWithLogitsLoss)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17258

Differential Revision: D14157336

Pulled By: ezyang

fbshipit-source-id: fb474d866464faeaae560ab58214cccaa8630f08
2019-02-21 14:17:35 -08:00
45251fb52e fix lint (#17366)
Summary:
fix lint
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17366

Differential Revision: D14171702

Pulled By: soumith

fbshipit-source-id: 5d8ecfac442e93b11bf4095f9977fd3302d033eb
2019-02-21 13:39:53 -08:00
3145f46a22 switch to Operation in register_prim_ops.cpp (#17183)
Summary:
This PR switches from `OperationCreator` to `Operation` to simplify the logic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17183

Differential Revision: D14169829

Pulled By: Krovatkin

fbshipit-source-id: 27f40a30c92e29651cea23f08b5b1f13d7eced8c
2019-02-21 12:45:25 -08:00
b312e9de6a Use standard docker image for XLA build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17287

Differential Revision: D14169689

Pulled By: kostmo

fbshipit-source-id: 24e255be23936542093008ed51d2c061b2924993
2019-02-21 11:56:23 -08:00
25730f15bb Modernize test_sparse. (#17324)
Summary:
Our sparse tests still almost exclusively use legacy constructors.  This means you can't, for example, easily test scalars (because the legacy constructors don't allow them), and not surprisingly, many operations are broken with sparse scalars.

Note: this doesn't address the SparseTensor constructor itself, because there is a separate incompatibility there that I will address in a follow-on commit, namely, that torch.sparse.FloatTensor() is supported, but torch.sparse_coo_tensor() is not (because the size is ambiguous).

The follow-on PR will explicitly set the size for sparse tensor constructors and add a test for the legacy behavior, so we don't lose it.

Included in this PR are changes to the constituent sparse tensor pieces (indices, values):
1) IndexTensor becomes index_tensor
2) ValueTensor becomes value_tensor if it is a data-based construction, else value_empty.
3) Small changes around using the legacy tensor type directly, e.g. torch.FloatTensor.dtype exists, but torch.tensor isn't a type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17324

Differential Revision: D14159270

Pulled By: gchanan

fbshipit-source-id: 71ee63e1ea6a4bc98f50be41d138c9c72f5ca651
2019-02-21 11:40:43 -08:00
c63af8837d remove nn.Upsample deprecation warnings from tests (#17352)
Differential Revision: D14168481

Pulled By: soumith

fbshipit-source-id: 63c37c5f04d2529abd4f42558a3d5e81993eecec
2019-02-21 11:27:24 -08:00
3069c45069 upgrade documentation in setup.py to NO_ -> USE_ (#17333)
Summary:
fixes https://github.com/pytorch/pytorch/issues/17265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17333

Differential Revision: D14168483

Pulled By: soumith

fbshipit-source-id: a79f4f9d9e18cb64e2f56f777caa69ae92d2fa4b
2019-02-21 10:25:43 -08:00
5744d5213d Enforce non-negativity of tensor construction (#17077)
Summary:
Apparently, before this the only way we enforced it was size >= 0 in alloc_cpu. So empty((5, -5)) would fail but empty((-5, -5)) would hang :)

Please suggest a better place to enforce it, if any.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17077

Differential Revision: D14077930

Pulled By: dzhulgakov

fbshipit-source-id: 1120513300fd5448e06fa15c2d72f9b0ee5734e4
2019-02-21 09:28:53 -08:00
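The kind of check being added can be sketched as rejecting negative dimensions up front, instead of relying on the total allocation size; the function below is illustrative, not the actual implementation:

```python
def check_sizes(sizes):
    # Reject any negative dimension explicitly. Checking only the
    # product would miss cases like (-5, -5), whose product is
    # positive even though the shape is invalid.
    for d in sizes:
        if d < 0:
            raise ValueError("negative dimension %d in shape %r" % (d, sizes))
    n = 1
    for d in sizes:
        n *= d
    return n

print(check_sizes((5, 5)))   # 25
try:
    check_sizes((-5, -5))    # product is 25, but the shape is invalid
except ValueError as e:
    print(e)
```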
94a95a0c7f Fixing docstring in CTCLoss (#17307)
Summary:
The argument `zero_infinity` is in the wrong place! :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17307

Differential Revision: D14154850

Pulled By: ezyang

fbshipit-source-id: 7a9fe537483b23041f21ba1b80375b7f44265538
2019-02-21 08:13:28 -08:00
de81a2741f Fix the slowness of mvn's log_prob (#17294)
Summary:
This PR addresses the slowness of MVN's log_prob as reported in #17206.

t-vi I find it complicated to handle permutation dimensions if we squeeze singleton dimensions of bL, so I leave it as-is and keep the old approach. What do you think?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17294

Differential Revision: D14157292

Pulled By: ezyang

fbshipit-source-id: f32590b89bf18c9c99b39501dbee0eeb61e130d0
2019-02-21 08:04:33 -08:00
722cbe3064 Move argsort to C++
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17099

Differential Revision: D14165671

Pulled By: ezyang

fbshipit-source-id: 3871de6874fe09871ebd9b8943c13c9af325bf33
2019-02-21 07:59:27 -08:00
37890610b0 Include vec256 headers in setup.py (#17220)
Summary:
Fix #16650.

Headers such as `ATen/cpu/vml.h` contain `#include <ATen/cpu/vec256/vec256.h>`
for example, but these vec256 headers aren't included, due to commit e4c0bb1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17220

Differential Revision: D14165695

Pulled By: ezyang

fbshipit-source-id: 27b2aa2a734b3719ca4af0565f79623b64b2620f
2019-02-21 07:37:01 -08:00
5106918656 Enable MAX_JOBS for using Ninja on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17341

Differential Revision: D14164740

Pulled By: soumith

fbshipit-source-id: 7a1c3db0a7c590f72a777fcd32e1c740bb0c6257
2019-02-21 04:40:17 -08:00
29f4f8f048 Avoid unnecessary CPU-to-GPU copy of torch.load with CUDA (#17297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17297

When `torch.load` needs to load a tensor, no matter which device it will end up being loaded on, it first creates a CPU storage for it of the necessary size. This storage is allocated but not yet "set": no data is written to it, so it exists in the kernel's memory map but is not resident and doesn't take up physical pages. Then this storage is passed to the `map_location` function (if the parameter is a string, a device, or a map, PyTorch builds that function automatically). The default map for CUDA is effectively `lambda storage, _: storage.cuda()` (omitting the code needed to pick the correct device). This creates a GPU storage and copies over the data of the CPU storage. *This step is unnecessary, as we're copying uninitialized memory.* (Surprisingly enough, though, it appears the kernel is smart enough that reading from the unpaged CPU memory doesn't cause it to become paged.)

Once `map_location` returns a storage residing on the correct target device, `torch.load` resumes reading the file and copies the tensor's content into that storage. This overwrites the content that had previously been written to it, which confirms that the above copy was pointless.

A way to avoid this useless copy is to just create and return a new empty storage on the target GPU, instead of "transforming" the original one.

This does indeed increase the performance:
```
In [5]: torch.save(torch.rand(100, 100, 100), "/tmp/tensor")

In [6]: %timeit torch.load("/tmp/tensor", map_location="cuda")
1.55 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: %timeit torch.load("/tmp/tensor", map_location=lambda storage, _: torch.cuda.FloatStorage(storage.size()))
1.03 ms ± 44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Credit for this diff is shared with adamlerer and fmassa.

Differential Revision: D14147673

fbshipit-source-id: a58d4bc0d894ca03a008499334fc2cdd4cc91e9f
2019-02-21 01:32:19 -08:00
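The difference between the two `map_location` strategies can be modeled with plain buffers (`Storage` here is a toy stand-in, not the real class): copying the still-uninitialized source buffer does work that the subsequent read from the file overwrites anyway.

```python
# Toy model of the two map_location strategies. Storage stands in for
# a torch storage; "copies" counts wasted data movement.
class Storage:
    copies = 0

    def __init__(self, size):
        self.data = bytearray(size)   # allocated but uninitialized

    def clone_to_device(self):        # models storage.cuda()
        dst = Storage(len(self.data))
        dst.data[:] = self.data       # copies uninitialized bytes
        Storage.copies += 1
        return dst

def load(payload, map_location):
    cpu = Storage(len(payload))       # CPU storage, not yet filled
    target = map_location(cpu)
    target.data[:] = payload          # the real read from the file
    return target

payload = b"tensor-bytes"
old = load(payload, lambda s: s.clone_to_device())   # 1 wasted copy
new = load(payload, lambda s: Storage(len(s.data)))  # no copy at all
print(old.data == new.data == bytearray(payload))    # True
print(Storage.copies)                                # 1
```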
2c302b6ea6 allow lists to contain any tensor type (#17321)
Summary:
If something is a TensorList, it should be a list of `TensorType`, not a list of some specialized type.
Fixes #17140, #15642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17321

Differential Revision: D14158192

Pulled By: suo

fbshipit-source-id: ba8fe6ae8d618c73b23cd00cbcb3111c390c5514
2019-02-21 00:18:50 -08:00
d92ddcf7ca Skip convnets benchmark in rocm CI (#17331)
Summary:
random coredump
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17331

Differential Revision: D14162018

Pulled By: bddppq

fbshipit-source-id: 3ed15a79b7bca2498c50f6af80cbd6be7229dea8
2019-02-20 21:12:24 -08:00
b3b692a80a Don't have malloc-free pairs that cross DLL boundaries. (#17302)
Summary:
See https://blogs.msdn.microsoft.com/oldnewthing/20060915-04/?p=29723
for more background on this requirement on Windows.

Fixes #17239.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc xkszltl peterjc123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17302

Differential Revision: D14150067

Pulled By: ezyang

fbshipit-source-id: 9dc16ca781ff17515b8df1bb55492477e7843d4c
2019-02-20 20:31:41 -08:00
c063a33ef3 Add support to build for multiple amd gpu targets (#17329)
Summary:
iotamudelta petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17329

Differential Revision: D14161277

Pulled By: bddppq

fbshipit-source-id: f3eb9f52e96a8fcd779c57df0f8c9a2c54754e35
2019-02-20 18:45:24 -08:00
501d346da8 batched cleanups (#17288)
Summary:
Bunch of random stuff I came across while doing UDT stuff. Putting in a separate PR to avoid noise
- fix up the alias analysis list ops to include fork/wait
- improve dump() for aliasDb to print writes
- Move BuiltinFunction::call() to sugaredvalue with the rest of the methods
- formatting and includes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17288

Differential Revision: D14147105

Pulled By: suo

fbshipit-source-id: 62e2a922a1726b684347365dc42c72188f154e9c
2019-02-20 18:31:53 -08:00
cb3d3d1115 (Permanently) fix CI breakage due to new docker version. (#17338)
Summary:
Pull request resolved: https://github.com/pytorch/pytorch/pull/17338

See comment in config.yml for details.

build-break

Reviewed By: orionr

Differential Revision: D14160934

fbshipit-source-id: a91160ab15dd6c174a7d946a78a7d2d50ae0a011
2019-02-20 18:00:11 -08:00
376bb40379 Implementation convolutionTranspose operator for mkl-dnn (#12866)
Summary:
The speed-up of a single operation is up to 2-3X on BDW.
This PR depends on https://github.com/pytorch/pytorch/pull/14308
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12866

Differential Revision: D13936110

Pulled By: ezyang

fbshipit-source-id: 34e3c2ca982a41e8bf556e2aa0477c999fc939d3
2019-02-20 17:26:10 -08:00
c02e2ff0b0 Support multi-device configuration for MKL-DNN (#12856)
Summary:
MKL-DNN supports multi-node mode but not multi-device mode; this commit adds multi-device support for MKL-DNN. This commit depends on https://github.com/pytorch/pytorch/pull/11330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12856

Differential Revision: D13735075

Pulled By: ezyang

fbshipit-source-id: b63f92b7c792051f5cb22e3dda948013676e109b
2019-02-20 16:57:43 -08:00
90950f79c7 fix missing std (#17263)
Summary:
add missing std introduced by #16689 . Investigating why this wasn't caught in CI (nor my local dev environment).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17263

Reviewed By: ezyang

Differential Revision: D14134556

Pulled By: ailzhang

fbshipit-source-id: 6f0753fa858d3997e654924779646228d6d49838
2019-02-20 16:47:35 -08:00
0edc81136c Rethrow exceptions from RunAsync (#15034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15034

Rethrow exception happened during RunAsync, ensure that pending tasks
are not executed after marked as finished

Reviewed By: andrewwdye

Differential Revision: D13409649

fbshipit-source-id: 3fd12b3dcf32af4752f8b6e55eb7a92812a5c057
2019-02-20 16:32:24 -08:00
0337494c6a Reinforce scheduling invariants (#17132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17132

schedule() function is not supposed to throw exception and is supposed
to succeed in scheduling the full graph of tasks, potential errors (e.g. errors
from underlying thread pool, out of memory exceptions etc) are considered not
recoverable.
The invariant - the graph of tasks is either not executed or
executed in full before the call to finishRun()

Reviewed By: andrewwdye

Differential Revision: D14092457

fbshipit-source-id: a3e5d65dfee5ff5e5e71ec72bb9e576180019698
2019-02-20 16:32:23 -08:00
3e44880d4d Modify TileOp GPU implementation to expose more concurrency and better utilize GPU memory bandwidth (#17275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17275

Previous implementation used a memcpy inside the kernel. It is more efficient to reduce the data fetched per thread to a single word from memory. This exposes more concurrency and takes advantage of GPU memory coalescing support.

Reviewed By: takatosp1

Differential Revision: D14120147

fbshipit-source-id: c4734003d4342e55147c5b858f232a006af60b68
2019-02-20 16:02:14 -08:00
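The idea of fetching one word per thread instead of doing a memcpy per tile can be sketched as a per-output-element index mapping; the plain-Python function below stands in for the CUDA kernel and only illustrates the indexing, not the performance behavior:

```python
# Each output element is produced independently from one input word,
# out[i] = in[i % n] when tiling along a flat axis -- the per-thread
# view of the kernel, as opposed to a bulk memcpy per tile.
def tile(data, tiles):
    n = len(data)
    return [data[i % n] for i in range(n * tiles)]

print(tile([1, 2, 3], 3))  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```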
9e4a993878 Support str for native_functions.yaml schema
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17276

Differential Revision: D14154222

Pulled By: cpuhrsch

fbshipit-source-id: 411181da5399608c1d1f3218f8f570bb106c88ec
2019-02-20 15:47:06 -08:00
2e67b34ea7 Separate gpu reduce functions (#17146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17146

Separate gpu reduce functions

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14097564

fbshipit-source-id: a27de340997111a794b1d083c1673d4263afb9fb
2019-02-20 14:49:01 -08:00
474adf5458 Minor doc updates in c10/core/Allocator.h (#17164)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17164

Differential Revision: D14154393

Pulled By: ezyang

fbshipit-source-id: 59d8276d4bb4e7cadb4382769b75e5348ed388de
2019-02-20 14:36:15 -08:00
b2dde4386a Namedtuple return for symeig, eig, pstrf, qr, geqrf (#16950)
Summary: More ops for https://github.com/pytorch/pytorch/issues/394

Differential Revision: D14118645

Pulled By: ezyang

fbshipit-source-id: a98646c3ddcbe4e34452aa044951286dcf9df778
2019-02-20 14:01:19 -08:00
36ddad3bfe Allow PyTorch to be built without NCCL (#17295)
Summary:
With this patch you can use USE_DISTRIBUTED=OFF (possibly in combination with USE_NCCL=OFF (?))

The significance is partly because the NCCL doesn't build with CUDA 8.
This is written under the assumption that NCCL is required for distributed; if not, the USE_DISTRIBUTED check in nccl.py should be replaced by a check for the USE_NCCL environment variable.

Fixes: #17274
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17295

Differential Revision: D14155080

Pulled By: ezyang

fbshipit-source-id: 0d133f7c5b4d118849f041bd4d4cbbd7ffc3c7b4
2019-02-20 13:35:16 -08:00
e6cf3c886d add foxi submodule (#17184) 2019-02-20 16:25:05 -05:00
54e4c4d7de Removed obsolete argument correct_transform_coords in bbox_transform op. (#16723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16723

Removed obsolete argument correct_transform_coords in bbox_transform op.
* It was only for backward compatibility. We should not have models using it now.

Differential Revision: D13937430

fbshipit-source-id: 504bb066137ce408c12dc9dcc2e0a513bad9b7ee
2019-02-20 13:22:33 -08:00
075c7b1fef make the threshold for acurracy more precise (#17194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17194

we found that there is a per row absolute error due to int8 quant
and a relative error table-wide in case fp16 is used

Reviewed By: csummersea

Differential Revision: D14113353

fbshipit-source-id: c7065aa9d15c453c2e5609f421ad0155145af889
2019-02-20 13:14:11 -08:00
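The two error sources combine the same way as the usual closeness check, with an absolute term for the per-element quantization error and a relative term for the magnitude-proportional fp16 error; a sketch (the tolerance values are made up for illustration):

```python
# Accept a quantized result if it is within a per-element absolute
# tolerance (int8 quantization error) plus a relative tolerance
# (fp16 error proportional to magnitude). Values are illustrative.
def close_enough(approx, exact, atol=0.04, rtol=1e-3):
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(approx, exact))

print(close_enough([1.02, 99.95], [1.0, 100.0]))  # True
print(close_enough([1.5, 100.0], [1.0, 100.0]))   # False
```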
db1d61a5c3 Add rule based filtering for ONNXIFI transformation (#17198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17198

We come to the point that we need to apply some rules to bind certain ops together to avoid un-inferrable intermediate shapes. We either lower them together to backend or neither. This diff adds a pass for us to add rules like this. The first one is to bind `Gather` with `SparseLengthsWeighted*`.

Reviewed By: ipiszy

Differential Revision: D14118326

fbshipit-source-id: 14bc62e1feddae02a3dd8eae93b8f553d52ac951
2019-02-20 12:47:24 -08:00
63214b572b Updating submodules
Reviewed By: zpao

fbshipit-source-id: 4ee15707bcf8c23c2d7feb6987acecef4131d467
2019-02-20 09:38:12 -08:00
260facfdea caffe2 | added missing operator source file (#17272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17272

After Windows-specific fixes were applied, a new file was left out of CMakeLists.

Reviewed By: orionr

Differential Revision: D14140419

fbshipit-source-id: 6a6c652048ed196ec20241bc2a1d08cbe2a4e155
2019-02-20 09:28:29 -08:00
a91e056f2a add list methods: copy,extend (#17092)
Summary:
This PR adds the following methods to Python's list:

* copy
* extend

and tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17092

Differential Revision: D14141817

Pulled By: Krovatkin

fbshipit-source-id: c89207f0f25f3d1d4ad903ee634745615d61d576
2019-02-20 09:24:25 -08:00
79f898263b Improve error message w/ size inference on empty tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17255

Differential Revision: D14143094

Pulled By: soumith

fbshipit-source-id: f96fa7f8eb6eaac72887d3e837546cbfa505f101
2019-02-20 09:12:26 -08:00
c3a23379ea add install step and docs for Android build (#17298)
Summary:
This commit makes the following enhancements:
1. Add docs for build_android.sh;
2. Add an install step for build_android.sh, so the headers and libraries can be conveniently collected together for further use;
3. Change the default INSTALL_PREFIX from $PYTORCH_ROOT/install to $PYTORCH_ROOT/build_android/install to keep the project directory clean.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17298

Differential Revision: D14149709

Pulled By: soumith

fbshipit-source-id: a3a38cb41f26377e21aa89e49e57e8f21c9c1a39
2019-02-20 07:05:24 -08:00
1b3315ec17 improve libtorch install docs with GPU note (#17299)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15702
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17299

Differential Revision: D14149712

Pulled By: soumith

fbshipit-source-id: 5b83110bb00e4d4dad04c1f293c2b52e41711f11
2019-02-20 06:30:08 -08:00
237e5438f5 Add launch bounds for TopK kernel, be more conservative in sorting (#17296)
Summary:
The particular use case reported is Jetson TX2 and maskrcnn.

Fixes #17144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17296

Differential Revision: D14147886

Pulled By: soumith

fbshipit-source-id: 44d5a89aaeb4cc07d1b53dd90121013be93c419c
2019-02-20 03:10:46 -08:00
b8d1f4a423 ONNX Export Maxpool Indices
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16455

Differential Revision: D14140375

Pulled By: houseroad

fbshipit-source-id: 12d02c447e7fe0fae49969d1daf40a87660ed416
2019-02-19 21:10:14 -08:00
4f45bc73f7 Revert D14144264: [pytorch][PR] [jit] clean up print from test
Differential Revision:
D14144264

Original commit changeset: eec837d29c46

fbshipit-source-id: ad91cb1d047fd34967385b661a6757111f92026e
2019-02-19 18:56:23 -08:00
f8c9ec5e44 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 68a648b2136823994f02fa5b567a2656494f6dd3
2019-02-19 17:50:40 -08:00
5be3ffbde2 clean up print from test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17279

Differential Revision: D14144264

Pulled By: suo

fbshipit-source-id: eec837d29c46e96be37c54192a841046b486cb8b
2019-02-19 17:46:41 -08:00
428b666814 Fix dll loading process in newer Python on Windows (#17191)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17051.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17191

Differential Revision: D14138427

Pulled By: kostmo

fbshipit-source-id: 9f207105161ad0312eb09fd86072afd5f22de785
2019-02-19 17:16:41 -08:00
972fc5f191 Fix dll loading issue for Caffe2 and Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17215

Reviewed By: orionr

Differential Revision: D14138445

Pulled By: kostmo

fbshipit-source-id: 0bb4f2f1ed5bda7416ba7e4c6b0618414b328934
2019-02-19 17:04:06 -08:00
2b57bdb7ab Fix cuda softmax backward with empty input (#17259)
Summary:
Fixes #17256
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17259

Differential Revision: D14142196

Pulled By: soumith

fbshipit-source-id: 1f2dc202951b59b43da27684f9f924314bcd3040
2019-02-19 16:41:52 -08:00
Jie
594a4d7b55 at::native batch norm kernel launch config update (#17047)
Summary:
Limit the block dimension to avoid a configuration error on batch norm kernel launch.

This should resolve #16998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17047

Differential Revision: D14142132

Pulled By: soumith

fbshipit-source-id: 9c8c52dcd1d108cda1f65f5227e625b8fe6e12a0
2019-02-19 16:41:51 -08:00
6455d91e4d False alarm about leak in TestNN.test_variable_sequence_cuda (#17242)
Summary:
`TestNN.test_variable_sequence_cuda` sometimes breaks due to a reported CUDA leak.
The cause appears to be a too-small tolerance breaking the float16 sub-test of the test above.
When it breaks, it calls abort, disrupting correct teardown of the test
and raising a false alarm about the leak.

~~Also, removed annoying **Upsample** module warning.
IMHO this warning is wrong because the module **Upsample** is not deprecated. Seems like it's been mixed
with `nn.functional.upsample` function which is indeed deprecated in favor of `nn.functional.interpolate`, see `torch/nn/functional.py:2387` for details (this replacement is also performed in `test_nn.py`).~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17242

Differential Revision: D14141686

Pulled By: soumith

fbshipit-source-id: faa8f87440d94bdc6ab0ff00be6dad82353115c4
2019-02-19 15:59:30 -08:00
09c9af9451 U/kostmo/gen circle conf (#17189)
Summary:
Diagram preview:
![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/53040977-a0f88d00-3437-11e9-9190-796cc243e0f9.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17189

Differential Revision: D14141362

Pulled By: kostmo

fbshipit-source-id: 0625a1234d0307c6be79f17e756ddb1cc445b374
2019-02-19 15:37:09 -08:00
f827f9f77a update doc for multinomial (#17269)
Summary:
Update documentation to raise awareness of the fix in #12490. Thanks matteorr for pointing this out!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17269

Reviewed By: ezyang

Differential Revision: D14138421

Pulled By: ailzhang

fbshipit-source-id: 6433f9807a6ba1d871eba8e9d37aa6b78fa1e1fd
2019-02-19 15:30:52 -08:00
d73e6cb59d Automatic update of fbcode/onnx to 4c091e048ca42682d63ccd3c1811560bc12b732d (#17264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17264

Previous import was 822d8df0a2a32233c6022f50a158817a0f19bdc7

Included changes:
- **[4c091e0](https://github.com/onnx/onnx/commit/4c091e0)**: Support defined ONNX_ML in parent cmake files (#1821) <Lu Fang>
- **[57372f3](https://github.com/onnx/onnx/commit/57372f3)**: Delete OpsetVersionConverter.md which is a duplicate of VersionConverter.md (#1818) <Prasanth Pulavarthi>
- **[ab1c57e](https://github.com/onnx/onnx/commit/ab1c57e)**: [ONNXIFI]Add extension to be implementable (#1796) <Rui Zhu>
- **[b92eee8](https://github.com/onnx/onnx/commit/b92eee8)**: Revert "Implement Op Annotation's for ONNX (#1648)" (#1812) <Ke Zhang>
- **[61f1e9e](https://github.com/onnx/onnx/commit/61f1e9e)**: Enable ONNX_ML by default (#1810) <Shinichiro Hamaji>
- **[4f064a1](https://github.com/onnx/onnx/commit/4f064a1)**: fix Greater and Less doc (#1811) <Guoliang Hua>
- **[0628582](https://github.com/onnx/onnx/commit/0628582)**: Implement Op Annotation's for ONNX (#1648) <Armen>
- **[ad9d2f7](https://github.com/onnx/onnx/commit/ad9d2f7)**: Versioning doc update for Opset 9 (#1805) <Vinitra Swamy>
- **[e71e3be](https://github.com/onnx/onnx/commit/e71e3be)**: add dilation case for ConvTranspose op (#1797) <Randy>

Reviewed By: yinghai

Differential Revision: D14135024

fbshipit-source-id: 1e4f9dda89abf48994798d080dd5d58207a6e4b6
2019-02-19 14:54:34 -08:00
c88798dbc1 Make tril_ and triu_ actually in-place (#17031)
Summary:
Currently, when the input tensor `self` is not contiguous, `tril_` and `triu_` calls `self = self.contiguous()`, which allocates a new contiguous tensor and assign it to `self`. This effectively changes the input tensor `self`'s pointer and will break downstream code after Variable/Tensor merge.

This PR fixes it so that `tril_` and `triu_` always update the input tensor in-place and preserve the input tensor's TensorImpl.
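
The bug can be illustrated with a plain-Python analogy (not the actual ATen code): rebinding a local name to a new object does not update the caller's object, which is why `self = self.contiguous()` broke the in-place contract.

```python
def broken_inplace_zero(xs):
    # Rebinds the local name to a *new* list; the caller's list is untouched.
    xs = sorted(xs)
    xs[0] = 0

def fixed_inplace_zero(xs):
    # Mutates the caller's list in place, preserving its identity.
    xs.sort()
    xs[0] = 0

data = [3, 1, 2]
broken_inplace_zero(data)
unchanged = list(data)   # the "in-place" op had no visible effect
fixed_inplace_zero(data)
```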
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17031

Differential Revision: D14069592

Pulled By: yf225

fbshipit-source-id: d188218f426446a44ccc1d33fc28ac3f828c6a05
2019-02-19 14:47:17 -08:00
0fc03d155a Fix remaining -Wreturn-std-move violations in fbcode
Summary:
Some values are copied when they could have been moved.
Detected by the compiler flag -Wreturn-std-move.

Reviewed By: igorsugak

Differential Revision: D14134303

fbshipit-source-id: 8fc3bb2017108b3d65097cb8447e33f5b6c743b4
2019-02-19 12:41:27 -08:00
89df22e57b Lightweight String check Utility (#16858)
Summary:
A lightweight implementation of the LLVM FileCheck utility. It currently only handles string matching; regexes and saving a regex to a variable name can be added as needed.

The current intended usage is through the FileCheckBuilder Python handle, as shown in the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16858

Differential Revision: D14096244

Pulled By: eellison

fbshipit-source-id: c7c8d1457691c105e6ccbb3c1a378d96baac2569
2019-02-19 12:31:57 -08:00
82aa511146 move prim::None to prim::Constant (again) (#17186)
Summary:
Trying to land again, make prim::None into a case of prim::Constant. Reverted the previous landing because it broke an important onnx export test.

https://github.com/pytorch/pytorch/pull/16160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17186

Differential Revision: D14115304

Pulled By: eellison

fbshipit-source-id: 161435fc30460b4e116cdd62c7b2e5b94581dcb7
2019-02-19 11:45:50 -08:00
9ebc433bda Clarification of Lerp operation on tensors (#17253)
Summary: Clarified in the docs that the `tensor` argument is used as `end`.
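
As a reminder of the operation being documented, `lerp` linearly interpolates between `start` and `end` by `weight`; a scalar plain-Python sketch of the formula:

```python
def lerp(start, end, weight):
    # Linear interpolation: out = start + weight * (end - start).
    # The second (tensor) argument plays the role of `end`.
    return start + weight * (end - start)
```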

Differential Revision: D14132212

Pulled By: ezyang

fbshipit-source-id: e9bca14d5079e5f7adfc18afcb1eec832ef86e9e
2019-02-19 11:27:02 -08:00
19117f6a0a reenable rand_like fusion when there is no broadcast (#16087)
Summary:
Re-enables rand_like fusion if no tensor is broadcast in the fusion group. This is a sufficient but not necessary condition for fused rand_like to produce correct results, and it has an unpleasant side effect: the fuser falls back to the non-fused path if rand_like was optimistically included in the fusion group but there is a broadcast in the group not necessarily related to rand_like. E.g. before this PR, if the network had (biasAdd -> relu -> dropout), the fuser could fuse biasAdd and relu; now it will try fusing the whole thing (if dropout is expressed via rand_like) and fall back every time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16087

Differential Revision: D13720232

Pulled By: zou3519

fbshipit-source-id: 1e19203bec4a59257bfc7078b054a19f00fab4ad
2019-02-19 11:12:25 -08:00
43d5cd4d34 discrepancy in smoke_macos_libtorch_2.7_cpu job spec (#17224)
Summary:
closes #17223
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17224

Reviewed By: pjh5

Differential Revision: D14121612

Pulled By: kostmo

fbshipit-source-id: bfd5a392de5e614031389725535756d7fa7db784
2019-02-19 10:14:21 -08:00
444039c47b Bool tensor. Part 0: Boolean storage implementation (#16810)
Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:

0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.

This feature was requested by the community:
https://github.com/pytorch/pytorch/issues/4764
https://github.com/pytorch/pytorch/issues/4219
https://github.com/pytorch/pytorch/issues/4288

**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.

**Tested via**:
1. unit tests
2. running this:
```python
>>> import torch
>>> torch.BoolStorage
<class 'torch.BoolStorage'>
>>> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810

Reviewed By: gchanan

Differential Revision: D14087246

Pulled By: izdeby

fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
2019-02-19 08:22:13 -08:00
e81878e0a9 Correct padding and activations docstrings in nn module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17197

Differential Revision: D14131284

Pulled By: soumith

fbshipit-source-id: 6edd225b47b1dde81b5ad0a23c588c6621987a69
2019-02-19 08:16:52 -08:00
f2f4030294 Use move to avoid copying (#17188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17188

Using the flag "-Wreturn-std-move", the compiler can identify cases where a copy
operation is performed when a move operation would have been available. Wrapped the
return statements with std::move to fix this.

For some reason, these files were not automatically modified. With D14115372
we should be able to turn on the compile flag.

Reviewed By: soumith

Differential Revision: D14115786

fbshipit-source-id: e763b92eecbe4468027fc141d029618d1e9f280b
2019-02-19 07:14:27 -08:00
57617ee429 Replace resize_dim() with set_sizes_and_strides() (#17127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17127

Replace resize_dim() with set_sizes_and_strides() in   `THTensor_(squeeze1d) in aten/src/TH/generic/THTensor.cpp`

Reviewed By: ezyang

Differential Revision: D14088697

fbshipit-source-id: 518b72f7c0c4fbedf11a29a6ceb9fee8eefd9273
2019-02-19 07:04:17 -08:00
9477c143c6 C++ Frontend: adding two distributed samples (Random and Sequential) (#16910)
Summary:
Adding two distributed samplers, Random and Sequential, to the mix. Similar to its Python counterpart, DistributedSampler introduces a new method `set_epoch(size_t epoch)` which can be used to shuffle data deterministically between distributed processes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16910

Differential Revision: D14130980

Pulled By: soumith

fbshipit-source-id: ec08b7130c01e2fc6dc3693f7ac622a0a6d60f10
2019-02-19 05:40:37 -08:00
8852e21245 Correct recurrent/linear/dropout/sparse layers docstrings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17238

Differential Revision: D14130811

Pulled By: soumith

fbshipit-source-id: d3998ca7da46aec5a59220c6af489f71f3d60735
2019-02-19 05:23:04 -08:00
fad9eda7fb Optional arg fixes (#17222)
Summary:
fixes #17210.
cc : ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17222

Differential Revision: D14130833

Pulled By: soumith

fbshipit-source-id: 19ff6020c47208e3436ae28cd16110a0f435b25e
2019-02-19 04:39:18 -08:00
7e5442f900 Reset grad attribute when called using del (#16525)
Summary:
`del Tensor.grad` sets the PyObject to nullptr,
and `Tensor.grad = None` sets the PyObject to Py_None.
Both cases are now handled.
fixes #16471
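
The unified behavior can be sketched in plain Python with a property; a hypothetical stand-in class, not the actual C++ binding code:

```python
class TensorLike:
    """Hypothetical sketch: `del t.grad` and `t.grad = None` should agree."""
    def __init__(self):
        self._grad = None

    @property
    def grad(self):
        return self._grad

    @grad.setter
    def grad(self, value):
        # `t.grad = None` stores the None sentinel (Py_None in the real code).
        self._grad = value

    @grad.deleter
    def grad(self):
        # `del t.grad` now resets the attribute instead of leaving it unset
        # (nullptr in the real code).
        self._grad = None

t = TensorLike()
t.grad = "some_grad"
had_grad = t.grad
del t.grad
```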
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16525

Differential Revision: D14130800

Pulled By: soumith

fbshipit-source-id: ed85c38305bba94d5047311cb58e4e4cedd09832
2019-02-19 04:33:57 -08:00
9a7bcacc27 Logging stuffs (#17177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17177

Add more logging and flag.

Reviewed By: yinghai

Differential Revision: D14111643

fbshipit-source-id: 4b1c005faf41c21f59100bc401120c6970a24c42
2019-02-17 13:41:50 -08:00
3a01a45f06 Implement IRParser. (#16987)
Summary:
It might need some cleaning up and might be missing some features, but it should already work for most cases.

This PR is based on top of PR16986 (so please review only the last commit here).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16987

Differential Revision: D14074577

Pulled By: ZolotukhinM

fbshipit-source-id: 712b598f423265655f574bb9903e2066628eaad3
2019-02-16 20:23:50 -08:00
bf16a6bc3c Skip onnx logsoftmax tests in rocm (#17170)
Summary:
Similar to softmax, there are issues with randomly getting NaN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17170

Differential Revision: D14110515

Pulled By: bddppq

fbshipit-source-id: 5c97661184d45a02122fd69d35a839fdf4520c8c
2019-02-16 18:06:04 -08:00
b6b99fd7d3 Add namedtuple return for min, median, mode, kthvalue, add test for namedtuple return API (#16186)
Summary:
This partially fixes https://github.com/pytorch/pytorch/issues/394 and depend on https://github.com/pytorch/pytorch/pull/15429. I suggest to review this only after https://github.com/pytorch/pytorch/pull/15429 get landed, otherwise the diff might be large to review.

The test only allows explicitly whitelisted operators to have named return.

Differential Revision: D14070735

Pulled By: ezyang

fbshipit-source-id: ace2a672998b4e4a8094f52cbda5aa1cea6e3b42
2019-02-16 00:01:33 -08:00
b3d8c569d3 Remove templates for GenericDict
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17175

Differential Revision: D14113022

Pulled By: driazati

fbshipit-source-id: 5183e131cc8ccb58525875f76fa03133570a59ea
2019-02-15 21:35:19 -08:00
20fd6dca77 fix missing constant in adaptive_avg_pool2d AD (#17180)
Summary:
Thanks ngimel for pointing this out!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17180

Differential Revision: D14113001

Pulled By: ailzhang

fbshipit-source-id: 78e7d7f2cda3889138e2bf26a54980c2cc665882
2019-02-15 21:14:34 -08:00
6c06b32558 Implement NetDef <--> JIT IR converters. Try 2. (#17123)
Summary:
Currently the converters are very straightforward, i.e. there is no code for trying to
preserve semantics; we purely perform conversion from one format to another.

Two things that we might want to add/change:

1. Add semantic conversion as well (but probably it would be a good idea to keep
   it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
   uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17123

Differential Revision: D14090244

Pulled By: ZolotukhinM

fbshipit-source-id: 07175fa9235582e1d1da5f10a42a5c1280b1b394
2019-02-15 20:39:30 -08:00
cde7204636 change the epsilon for fp32/fp16 to uint8 to be the same (#17062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17062

From jiyan's training jobs, it seems we found a quantization bug:

fp32
fp32->rowwise int8 is fine
fp16 is fine
fp16->rowwise int8 is not fine

We are preconverting everything to fp32 and using the existing code, so there is no need to change the epsilon in the fp16 case, since at the time of converting everything is a float.

Reviewed By: jspark1105

Differential Revision: D14063271

fbshipit-source-id: 747297d64ed8c6fdf4be5bb10ac584e1d21a85e6
2019-02-15 18:33:37 -08:00
91c1d728ac Revert D14109636: [pytorch][PR] move prim::None to a case in prim::Constant
Differential Revision:
D14109636

Original commit changeset: d26fd3839761

fbshipit-source-id: c8c8113e2bff49ea93235732603e6ebc89356533
2019-02-15 16:38:12 -08:00
7caa21f5ca move prim::None to a case in prim::Constant (#16160)
Summary:
This change simplifies analysis done on constants since prim::None does not need to be handled separately now.  To check if a constant node is None, use node->isNone().

Next step will be to remove prim::Undefined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16160

Differential Revision: D14109636

Pulled By: eellison

fbshipit-source-id: d26fd383976163a2ddd4c24984bd672a541cc876
2019-02-15 16:27:57 -08:00
4fcab92d6c Move outplace ops to ATen (#16788)
Summary:
Based on https://github.com/pytorch/pytorch/pull/12413, with the following additional changes:

-  Inside `native_functions.yml` move those outplace operators right next to everyone's corresponding inplace operators for convenience of checking if they match when reviewing
- `matches_jit_signature: True` for them
- Add missing `scatter` with Scalar source
- Add missing `masked_fill` and `index_fill` with Tensor source.
- Add missing test for `scatter` with Scalar source
- Add missing test for `masked_fill` and `index_fill` with Tensor source by checking the gradient w.r.t source
- Add missing docs to `tensor.rst`

Differential Revision: D14069925

Pulled By: ezyang

fbshipit-source-id: bb3f0cb51cf6b756788dc4955667fead6e8796e5
2019-02-15 15:58:10 -08:00
5737c5259c Fix for 16939:multinomial performance regressed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17121

Differential Revision: D14088558

Pulled By: ifedan

fbshipit-source-id: e03583135f1e797fe1d8081ec5e9e6b63d4015c1
2019-02-15 15:44:41 -08:00
7157be8622 Add special ops for BatchNorm symbolic differentiation (#15403)
Summary:
The main problem there is with differentiating batch norm statically
is that we make a lot of complex run-time decisions about the backend
we choose. Then, the autograd derivatives are implemented for every
backend separately, which makes sense, because they might be saving
buffers containing different values. To resolve the issue, the forward
op returns an index of the chosen backend, and the backward function
takes it as an argument, such that it knows how to interpret the buffers.
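
The scheme can be sketched in Python; a hypothetical illustration of the control flow (backend names and the placeholder computation are invented, not the actual kernels):

```python
# Hypothetical sketch: forward picks a backend at run time and returns its
# index alongside the output; backward receives the index so it knows how to
# interpret the saved buffers.
BACKENDS = ["native", "cudnn"]

def bn_forward(x, use_cudnn):
    idx = 1 if use_cudnn else 0
    out = [v * 2 for v in x]                 # placeholder computation
    buffers = {"backend": BACKENDS[idx]}     # backend-specific saved state
    return out, buffers, idx

def bn_backward(grad, buffers, idx):
    # The index tells backward which backend's buffer layout to expect.
    assert buffers["backend"] == BACKENDS[idx]
    return [g * 2 for g in grad]

out, buffers, idx = bn_forward([1.0, 2.0], use_cudnn=True)
grads = bn_backward([1.0], buffers, idx)
```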
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15403

Differential Revision: D14098815

Pulled By: ailzhang

fbshipit-source-id: 7fcd3e6e0566433e81fe8286fb441c1ecaf198ad
2019-02-15 15:40:28 -08:00
21696502ff improve error msg when module list isn't added to __constants__ (#17167)
Summary:
Add a suggestion to add to __constants__ when a ModuleList or Sequential module is used as a tuple.

Addresses https://github.com/pytorch/pytorch/issues/13899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17167

Differential Revision: D14107688

Pulled By: eellison

fbshipit-source-id: 8c07d1f3e25a9c6bdcfd96dbf6b72c2130838278
2019-02-15 15:03:50 -08:00
1cdcdd78af Kaiming Initialization (#14718)
Summary:
/cc goldsborough

Working on #14582

The corresponding python implementations are at: [pytorch/torch/nn/init.py](6302e4001a/torch/nn/init.py (L261-L327))

Here is my initial implementation of Kaiming Initialization. I have not been able to figure out how to successfully run tests locally so I haven't added any yet.
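
For context, the arithmetic behind the initialization being ported: Kaiming (He) initialization draws weights with standard deviation `gain / sqrt(fan)`. A small sketch of that formula (function name here is illustrative, not the PR's API):

```python
import math

def kaiming_normal_std(fan, gain=math.sqrt(2.0)):
    # He et al.: std = gain / sqrt(fan); gain defaults to sqrt(2) for ReLU.
    return gain / math.sqrt(fan)
```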

A couple questions:
- Are the enums defined in the right place? I copied their names from Python, but do you prefer different naming conventions for C++?
- To run tests locally do I use `python setup.py test`? Can I run just a subset of the tests somehow?
- Should I add my tests at [test/cpp/api/misc.cpp](https://github.com/pytorch/pytorch/blob/master/test/cpp/api/misc.cpp#L47-L54)?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14718

Differential Revision: D14049159

Pulled By: goldsborough

fbshipit-source-id: 966ac5126875936e69b185b5041f16476ed4cf70
2019-02-15 14:58:22 -08:00
5eee0670ab Pass torch.distributed launch process local rank as environment variable instead of argument (#16360)
Summary:
In `torch.distributed.launch.py`, `local_rank` is passed as a command-line argument, requiring the user's program to parse it. However, it would be more flexible for users, and consistent with other variables such as `RANK`, `MASTER_PORT`, and `WORLD_SIZE`, to pass it through an environment variable.

265ed8ff45/torch/distributed/launch.py (L200-L212)
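
A minimal sketch of the environment-variable style on the user side, assuming the launcher exports a `LOCAL_RANK` variable (the variable name is this PR's proposal, not an established API at the time):

```python
import os

def get_local_rank(default=0):
    # Read the local rank from the environment, like RANK / WORLD_SIZE,
    # instead of parsing a --local_rank command-line argument.
    return int(os.environ.get("LOCAL_RANK", default))

# Simulate what the launcher would export for one worker process.
os.environ["LOCAL_RANK"] = "3"
rank = get_local_rank()
```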
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16360

Differential Revision: D14070372

Pulled By: ezyang

fbshipit-source-id: c3f6a8e55ab513918cad09d1326eccdedb4d98c9
2019-02-15 14:52:55 -08:00
ea405f8d01 Assert cases exist for unschematized ops in alias analysis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16334

Differential Revision: D13901238

Pulled By: driazati

fbshipit-source-id: be99f89e7dc6a299b770ea92e217932a5271027d
2019-02-15 14:27:26 -08:00
8d33eb450e Fix avg pool2d api (#17166)
Summary:
Fix xla breakage (partially).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17166

Differential Revision: D14106954

Pulled By: ailzhang

fbshipit-source-id: 35ae6713272d0517b66da2ee9209f49015492b89
2019-02-15 13:58:30 -08:00
eba1b23ddd Fix syntax error in set instantiation (#17174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17174

Use curly braces syntax to avoid Lint complaint

Reviewed By: yf225

Differential Revision: D14111368

fbshipit-source-id: 44aa21deb9feededb94f23d92262a4164fe0cc1c
2019-02-15 13:58:29 -08:00
6454e3262d Make getting the dtype of a tensor work for backend extensions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17131

Differential Revision: D14093163

Pulled By: gchanan

fbshipit-source-id: 06638706e26505e3c741b7ae290000ca258599db
2019-02-15 13:47:37 -08:00
9b5d3f6f5e Stop reassigning (output) reference arguments in BinaryOps. (#17059)
Summary:
The binary ops that are using TensorIterator do a trick in order to only write the code once for out and non-out variants:

1) Have the non-out variant call the out variant with an undefined tensor.
2) The out variant then reassigns the result tensor to the output of the TensorIterator; this is a no-op in the case where a valid tensor was passed, and it correctly propagates the result back to the non-out variant, which is legal because it is just reassigning an undefined tensor.
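
The two-step trick can be sketched in Python; a hypothetical analogy using `None` for an undefined tensor and lists for outputs, not the actual ATen dispatch:

```python
def add_out(a, b, result=None):
    # Step 2: when an "undefined" (None) output was passed, reassign `result`
    # to a fresh container; when a valid one was passed, fill it in place.
    if result is None:
        result = []
    result.clear()
    result.extend(x + y for x, y in zip(a, b))
    return result

def add(a, b):
    # Step 1: the non-out variant calls the out variant with an undefined output.
    return add_out(a, b, None)

out = []
add_out([1, 2], [3, 4], out)   # out variant: fills the caller's container
fresh = add([1, 2], [3, 4])    # non-out variant: gets a new container back
```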

I believe other solutions to this problem would require an unnecessary reference bump, e.g. defining another out variant that returns a Tensor rather than a reference.

Unfortunately, this doesn't work with const-references, which we want to move our output arguments to be (because const doesn't actually provide const correctness here, and writers mistakenly reassign the parameter in the case it isn't an out variant).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17059

Differential Revision: D14068402

Pulled By: gchanan

fbshipit-source-id: 89fef177a1e174dbe2858e2eae0f6d85460b07d1
2019-02-15 13:41:04 -08:00
70ee257ad4 Fix batch insert (#17158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17158

Because of the Reshape op, the batch size can change. This diff addresses the first-order issue raised by a multiple-batch-size system. We need to export a different real_batch_size for each max_batch_size input and attach it to the right output.

It also fixes a false exception.

Reviewed By: ipiszy

Differential Revision: D14099541

fbshipit-source-id: 0fa9e86826f417a11d2b5dd2ee60dff64a7ce8c4
2019-02-15 12:28:23 -08:00
01686db21b Generate CircleCI config.yml from a script (#17039)
Summary:
This initial PR splits the `.circleci/config.yml` file into several smaller files that are stitched verbatim back into the original.  A proof of concept of dynamically generating yaml for the job configuration list is also introduced.

Since the `config.yml` file must exist in the repo in its final form, there must exist a manual update and check-in step to regenerate `config.yml` from its constituent parts.
Consistency between the checked-in `config.yml` file and the authoritative source data is enforced at build time through TravisCI.

closes #17038
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17039

Reviewed By: yf225

Differential Revision: D14109059

Pulled By: kostmo

fbshipit-source-id: bc04a73145290358854f5a5e552a45e559118fc3
2019-02-15 12:21:25 -08:00
82b269060c Add support for simpler for-in-list + tests (#16726)
Summary:
This PR adds support for simpler for-in-list loops such as the example below:

```python
@torch.jit.script
def sum_list(a):
    # type: (List[int]) -> int
    sum = 0
    for i in a:
        sum += i

    return sum
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16726

Differential Revision: D14070007

Pulled By: ezyang

fbshipit-source-id: b4d971ee647729a6caa3099ceac34ec5c4f143de
2019-02-15 11:41:20 -08:00
326c891d32 Update pybind11 (#17143)
Summary:
Fixes #17130
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17143

Differential Revision: D14107386

Pulled By: zdevito

fbshipit-source-id: 1834d14bcdcad6857c199bf4fb8f67298394bbf3
2019-02-15 11:24:25 -08:00
472cfc0f2c Enforce module device at DataParallel construction time (#17129)
Summary:
closes #17065

CC douwekiela
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17129

Differential Revision: D14093353

Pulled By: mrshenli

fbshipit-source-id: 9a5a10f16e392337a7f7073223541cf69b402f82
2019-02-15 11:14:46 -08:00
b892f69440 one_hot docs missing (#17142)
Summary:
one_hot docs are missing [here](https://pytorch.org/docs/master/nn.html#one-hot).

I dug around and could not find a way to get this working properly.
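
For context, what the function being documented computes: each index becomes a row with a single 1 at that position. A pure-Python sketch of the semantics (the real function is `torch.nn.functional.one_hot` and returns a tensor):

```python
def one_hot(indices, num_classes):
    # Each index i maps to a row of length num_classes with a 1 at position i.
    return [[1 if j == i else 0 for j in range(num_classes)]
            for i in indices]
```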

Differential Revision: D14104414

Pulled By: zou3519

fbshipit-source-id: 3f45c8a0878409d218da167f13b253772f5cc963
2019-02-15 10:48:18 -08:00
38139bc356 add pop support to list (#17015)
Summary:
[WIP] add "pop" to list, see https://github.com/pytorch/pytorch/issues/16662
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17015

Differential Revision: D14071680

Pulled By: eellison

fbshipit-source-id: b49a318059c1cc131acda50713132e11b562568f
2019-02-15 10:48:17 -08:00
7481cc9d7c Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: bbfb709d8681da60ccc9f3bafc6c296c32fcf835
2019-02-15 10:42:23 -08:00
dad0dbd3b9 merge fully_connected_rowwise_dnnlowp_op into fully_connected_dnnlowp_op (#17105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17105

To make FC with rowwise quantization faster, reduce code duplication, and make code consistent with Convolution

Reviewed By: csummersea

Differential Revision: D14080461

fbshipit-source-id: 2b0e67b86e7e3029c90751a8824bf80ae1223680
2019-02-15 09:50:11 -08:00
90fc6133b2 bug fix when we prepack weight and bias together (#17145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17145

The prepacked weight contains both the weight and the bias, so the bias should be obtained from input index 1, not index 2.

Reviewed By: jianyuh

Differential Revision: D14097281

fbshipit-source-id: b8b836b85a7b240e2fd1734377c46d9bf2ce3390
2019-02-15 09:21:20 -08:00
fbd690c1fe caffe2: fix PinnedCPUAllocator cudaHostRegister() leak (#16340)
Summary:
In the NUMA case, PinnedCPUAllocator's allocate() would return a
DataPtr constructed by DefaultCPUAllocator, which would reference
the Default... Delete() rather than the Pinned... Delete(). That
meant Pinned... Delete() would never run, so cudaHostUnregister()
would never be called when regions were freed.

See: https://github.com/pytorch/pytorch/issues/16280

This change adds a 'naked_allocate()' method to the Default allocator
that just returns a pointer to the allocated memory rather than
wrapping it in a DataPtr. Pinned allocator uses that then constructs
a DataPtr with reference to its own Delete().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16340

Reviewed By: dzhulgakov

Differential Revision: D13843206

Pulled By: ezyang

fbshipit-source-id: 9efb572e5a01b49ef2a4aceeccc13cd0b1066528
2019-02-15 07:02:33 -08:00
07b5782ff7 Add some missing docs to torch.rst, new unittest to enforce torch.rst no longer miss anything (#16039)
Summary:
This prevents people (reviewers, PR authors) from forgetting to add things to `torch.rst`.

When something new is added to `_torch_doc.py` or `functional.py` but intentionally not in `torch.rst`, people should manually whitelist it in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16039

Differential Revision: D14070903

Pulled By: ezyang

fbshipit-source-id: 60f2a42eb5efe81be073ed64e54525d143eb643e
2019-02-15 07:02:31 -08:00
Jie
a771a6ba67 (#16825)
Summary:
setting the correct math type for cudnn rnn, which is enforced starting from cudnn 7.5+

1. Updating persistent rnn check with input data type instead of rnn math type;
2. Updating rnn type promotion to set correct math type for accumulation;
3. Replace datatype check for filter descriptor from rnn.datatype to input.datatype;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16825

Differential Revision: D14071190

Pulled By: ezyang

fbshipit-source-id: 1c9a1531ccf510cb0619e830be444c20c5e72f3f
2019-02-15 07:02:30 -08:00
acf5ec07af Correct conv and pooling docstrings in nn module (#17052)
Summary:
This PR fix conv and pooling docstrings in nn module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17052

Differential Revision: D14068566

Pulled By: ezyang

fbshipit-source-id: 3ec1de232ff6334b6a544dadefbb0ee6193d443a
2019-02-15 06:58:02 -08:00
6c67dcfb05 Fix AdaptiveLogSoftmaxWithLoss's constructor (#16694)
Summary:
t-ken1 and I are members of the same team.
I have added test code for pull request https://github.com/pytorch/pytorch/pull/16656.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16694

Differential Revision: D14070106

Pulled By: ezyang

fbshipit-source-id: ff784dbf45e96a6bcf9a4b5cb9544a661a8acad2
2019-02-15 06:58:00 -08:00
48943c3b7a Update Upsample docs to match nn.interpolate
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17134

Reviewed By: ezyang

Differential Revision: D14095694

Pulled By: driazati

fbshipit-source-id: 79afec9ddd50b3b8ce39acf98c2543cf1a3d1127
2019-02-15 06:38:41 -08:00
f84165d20d Remove static_cast insertion/kernel argument extration. (#17055)
Summary:
Now that the antistatic feature is part of the released ROCm 2.1, remove the
pyHIPIFY feature that extracts kernel arguments and inserts static_casts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17055

Differential Revision: D14068478

Pulled By: bddppq

fbshipit-source-id: 6895f490c78247a129aa18c520ff8d4d1a3d3642
2019-02-15 01:54:31 -08:00
b65c22c01a Upgrade mkl-dnn to v0.17.3 to fix core dump issue (#17107)
Summary:
Upgrade mkl-dnn to 0.17.3 to fix core dump issue in #16183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17107

Differential Revision: D14097600

Pulled By: yinghai

fbshipit-source-id: 2baa44e211ce37fbdf01585344c98745f5ba008c
2019-02-15 01:23:07 -08:00
056dd5b6de Updated bbox_transform and nms unit test for caffe2 ops. (#16722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16722

Updated bbox_transform and nms unit test for caffe2 ops.

Differential Revision: D13937416

fbshipit-source-id: 034743d29671c6e73d323a935e2d734ecc071bff
2019-02-15 00:21:55 -08:00
2634e306e4 Extend support for exporting reshape to onnx. (#16971)
Summary:
Resolve issue with reshape_as test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16971

Differential Revision: D14098871

Pulled By: houseroad

fbshipit-source-id: ed6b966821462d374313256abbbe27f96ce11b2c
2019-02-15 00:17:05 -08:00
f062f5fd4a add std to autodiff, and mean/var/std to operator set (#17137)
Summary:
supersedes #16684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17137

Differential Revision: D14096724

Pulled By: wanchaol

fbshipit-source-id: d801d70029a6a1f5851400ff4094c0299c102b2b
2019-02-14 23:18:53 -08:00
678a472ee5 Script module data parallel (#16891)
Summary:
support data parallel for ScriptModule.

See the unit tests for the testing done for this PR. I also tried a traced version of resnet18 from torchvision.

I have yet to try complete end-to-end data parallel training; that will be the next step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16891

Differential Revision: D14002222

Pulled By: gqchen

fbshipit-source-id: fce3598169113215599815c6978e66d3c3a8c282
2019-02-14 22:52:19 -08:00
0a975d333f add pre-packing operation in README.md (#17151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17151

As title

Reviewed By: jianyuh

Differential Revision: D14084272

fbshipit-source-id: e58c041e0374f6e82b337e5b6325ef06981ad8b4
2019-02-14 22:46:47 -08:00
a1f2ed008f Minor fix of the histogram observer in FBL eval flows (#17118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17118

Fix the bug in quantization eval workflow; Add mul_nets option in histogram observer pybind

Reviewed By: yinghai

Differential Revision: D14085321

fbshipit-source-id: 08e3153148522ebc9512a57144d9a8ad154bb6f8
2019-02-14 22:02:04 -08:00
5f6ecd14c4 more test coverage on emitIf none dispatch (#16794)
Summary:
Follow-up of #14533: add more test coverage for emitIf metaprogramming conditions. Also delete some unwrap-optional usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16794

Differential Revision: D14096868

Pulled By: wanchaol

fbshipit-source-id: ee1cec609c58d0dd65211249a90207be06649e71
2019-02-14 21:39:55 -08:00
91c50aeec6 Speed-up adaptive average pooling for the common case of size=1 output (#17011)
Summary:
When adaptive pooling has to produce a single pixel feature map, it is faster to do so by calling .mean(). Backward calls a pretty inefficient cuda kernel with atomics, which becomes ridiculously slow for halfs. For half this PR provides approx 30x speed-up for adaptive average pooling, which results in 30% end-to-end speed-up on senet. Improvements are smaller for float, but still significant (approx 5x).
Also this PR unifies handling of 3d (no batch dimension) and 4d tensors, using negative dimension indices.
cc ezyang for review.
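The equivalence the fast path relies on can be checked directly (a sketch; the actual dispatch to the mean happens inside the adaptive pooling implementation):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)
# When the target output is a single pixel, adaptive average pooling
# reduces to a plain mean over the spatial dimensions.
pooled = F.adaptive_avg_pool2d(x, 1)
via_mean = x.mean(dim=(-2, -1), keepdim=True)
print(torch.allclose(pooled, via_mean))  # True
```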
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17011

Reviewed By: ailzhang

Differential Revision: D14078747

Pulled By: soumith

fbshipit-source-id: 0eb9255da2351190a6bcaf68c30e2ae2402a2dd9
2019-02-14 21:15:16 -08:00
7cff803d0a Improve example for torch.mode (#17069)
Summary:
This updates the example for `torch.mode` to show a case where there is a mode.
Also add a bit of description to the explanation and be a bit more precise about "a" mode rather than "the" mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17069

Differential Revision: D14078722

Pulled By: soumith

fbshipit-source-id: 837a238d53a9b8e868511acbdc258633975bea48
2019-02-14 18:52:53 -08:00
58648a19df Create BackendTransformerBase to host common functions used for backend lowering (#17074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17074

There are some common functionalities in backend lowering. This diff creates a base class which hosts these common stuff.

Reviewed By: ipiszy

Differential Revision: D14073192

fbshipit-source-id: 9617603d0e73db6f7fcc5572756b9dbab506dae5
2019-02-14 17:57:03 -08:00
6a46738986 Fix android crash when model detects nothing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17119

Reviewed By: sf-wind

Differential Revision: D14087835

Pulled By: ZhizhenQin

fbshipit-source-id: 32e61d46679bae645fd0bbec724513cfa5c553ab
2019-02-14 17:29:14 -08:00
d61455cf40 Fix some documentation links in torch.tensor (#17109)
Summary:
Currently it's broken https://pytorch.org/docs/stable/tensors.html#torch.Tensor.norm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17109

Differential Revision: D14093567

Pulled By: ezyang

fbshipit-source-id: b167cde2150ee97ccf5689fcf50ff8157acfce10
2019-02-14 17:13:50 -08:00
5f866d0ea2 Apply modernize-use-override (2nd iteration)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14086124

fbshipit-source-id: 2005227d095d776ca3b4309a57f54e25782b9b58
2019-02-14 16:52:57 -08:00
f1da9892e9 Generalize catArray for contiguous inputs and dim != 0 (#17032)
Summary:
I noticed that we were sinking a lot of time into `cat` operations in machine translation on CPU, and drilled down to us doing the cat element-by-element, even though all the inputs were contiguous. The reason was that we were doing the cat along a dimension other than 0, which prevented us from using the fast `memcpy` branch. This PR generalizes that branch.

Quick benchmark script:
```
import torch, time

tensors = [torch.rand(6, 2, 1024) for i in range(5)]

NITER = 1000
s = time.time()
for i in range(NITER):
    torch.cat(tensors, dim=1)
print('time per iter ', (time.time() - s) / NITER)
```

Before:
```
time per iter  8.089399337768554e-05
```

After:
```
time per iter  2.183413505554199e-05
```
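The idea behind the generalized fast path can be sketched in plain Python over flat row-major buffers (`cat_flat` is a hypothetical helper, not the actual `catArray` code):

```python
def cat_flat(buffers, shape, dim):
    """Concatenate flat row-major buffers of identical `shape` along `dim`.

    With contiguous inputs, each of the prod(shape[:dim]) outer slices is
    a contiguous run of prod(shape[dim:]) elements, so the copy is a few
    memcpy-like block copies instead of an element-by-element gather.
    """
    outer = 1
    for s in shape[:dim]:
        outer *= s
    inner = 1
    for s in shape[dim:]:
        inner *= s
    out = []
    for o in range(outer):
        for buf in buffers:
            out.extend(buf[o * inner:(o + 1) * inner])  # one contiguous block copy
    return out

a = [1, 2, 3, 4]  # shape (2, 2): [[1, 2], [3, 4]]
b = [5, 6, 7, 8]  # shape (2, 2): [[5, 6], [7, 8]]
print(cat_flat([a, b], (2, 2), 1))  # [1, 2, 5, 6, 3, 4, 7, 8]
```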
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17032

Differential Revision: D14090038

Pulled By: jamesr66a

fbshipit-source-id: 2c733a84915896008ac95f2233f44894bd2573de
2019-02-14 16:33:23 -08:00
f3dd5563e4 fix test_jit canonicalize_tensor_iterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17104

Differential Revision: D14089928

Pulled By: wanchaol

fbshipit-source-id: 8b288514ab9ee8d24a11d39b75eef95783f28f20
2019-02-14 16:03:34 -08:00
65e06df24a Use new constructor in USE_SIMPLE_CTOR_DTOR (#17080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17080

This changes all operators using this macro to the new format

Reviewed By: dzhulgakov

Differential Revision: D14078628

fbshipit-source-id: 67048e485e326765fd49567cc008633d3d500d5c
2019-02-14 15:54:16 -08:00
6f2bcc9b4f Caffe2 TARGETS for HIP (#17076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17076

OSS: slightly change tools/amd_build/build_amd.py to add an output_directory for internal use. Also modify the renaming convention in the hipify script to reflect the updated rules.

Reviewed By: bddppq

Differential Revision: D13767218

fbshipit-source-id: cbcadc51daab42197d545f204840dcc18176bb3d
2019-02-14 15:45:21 -08:00
b0545aa85f maskrcnn & bert AD coverage part 1 (#16689)
Summary:
- Moved a few functions from the `autograd` namespace to the `aten` namespace to be visible from the JIT nativeResolver.
- Added a hack to look up keyword-only arguments. Proper support for keyword-only arguments will be added later.
- Simulate function overloads in aten using a `_<number>` suffix on the function name.
- Even when `forward` returns multiple outputs, as in `kthvalue`, we currently support at most one output that requires grad.
- Removed the `TensorList` related ops here since partial `TensorList` support is prone to bugs. Our symbolic diff for `cat` was never tested with autodiff, and it seems broken. We need to find a proper way to support these ops (either by properly supporting `TensorList` or something like `prim::ConstantChunk`) and leave them for the next PR.

Ops supported in this PR:
```
erf
expand_as
index
kthvalue
mean
permute
pow
rsub
select
sqrt
squeeze
t
to
topk
transpose
view
var
embedding
logsumexp
// grad is None
_dim_arange
contiguous
nonzero
ones_like
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16689

Differential Revision: D14020806

Pulled By: ailzhang

fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5
2019-02-14 15:36:39 -08:00
b5193b6a81 Second PR to restore reverted commit (#16224) (#17040)
Summary:
update:
  1. global_reduce now checks should_block_y_reduce first.
     This avoids enabling global_reduce without block_y_reduce, which led to
     accessing shared memory during global reduce without allocation.
  2. Update the block_y_reduce heuristics; improves perf on tiny tensors.
  3. Add a test case covering old cases where illegal memory access might occur.

  TensorIterator cuda launch configs update (#16224)
    Update launch configs for TensorIterator gpu_reduce_kernel. Enable flexible
    block dimension to improve efficiency for reduction cases with small fast
    dimension.

    Previously TensorIterator launches blocks with fixed 32x16 threads.
    For cases like:

      import torch
      torch.randn(2**20, 4, device='cuda').sum(0)

    The fixed launch config does not handle coalesced memory access efficiently.

    Updated launch configure enables flexible block dimension. Combining with
    improved reduction scheme (using flexible vertical / horizontal reduction
    instead of limited warp / block reduction in the old code), it ensures optimal
    memory access pattern even with reduction on dimension with small stride.

    Possible future improvements:
    1. Precise dynamic shared memory allocation.
    2. Using warp shuffle for vertical (block_y) reduction.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/16224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17040

Differential Revision: D14078295

Pulled By: umanwizard

fbshipit-source-id: ecc55054a5a4035e731f0196d633412225c3b06c
2019-02-14 15:23:01 -08:00
b515ebc6f1 Remove fake inference for shape info in ONNXIFI transform (#17046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17046

As we are moving to use bound shape inference, we can remove the awkward fake inference run path and make the code cleaner.

Reviewed By: ipiszy

Differential Revision: D14061501

fbshipit-source-id: b3ace98b3dabef3c3359086a0bb1410518cefa26
2019-02-14 15:12:20 -08:00
0a5de6e972 Update alexnet expect.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17122

Reviewed By: colesbury

Differential Revision: D14090209

Pulled By: gchanan

fbshipit-source-id: 78c5961dd7d752b237782b6ed90c376bbd6d3145
2019-02-14 14:45:02 -08:00
ff2053dfa1 add clear functionality to list (#17050)
Summary:
Add clear functionality to list. See #16662

```python
import torch

@torch.jit.script
def foo():
    a = [1, 2, 3, 4]
    a.clear()

    return a
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17050

Differential Revision: D14071799

Pulled By: driazati

fbshipit-source-id: 305551c16f7db127c43de0ad5885d9f10678e101
2019-02-14 14:38:11 -08:00
d52862ca81 Moderate the dim type after LengthsRangeFill (#17096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17096

LengthsRangeFill takes a batch-sized lengths input and expands it into a sequence. Later ops should follow this dim type until they hit another op that moderates the batch type, e.g. SparseLengthsSum.

Reviewed By: ipiszy

Differential Revision: D14079422

fbshipit-source-id: 1a26925d502c32875ea95c160268bf6a256cc955
2019-02-14 14:28:27 -08:00
016f212357 fix behavior of ConcatDataset w/ negative indices (#15756)
Summary:
Currently, when you pass a negative index to a `Dataset` created with `ConcatDataset`, it simply passes that index to the first dataset in the list. So if, for example, we took `concatenated_dataset[-1]`, this will give us the last entry of the *first* dataset, rather than the last entry of the *last* dataset, as we would expect.

This is a simple fix to support the expected behavior for negative indices.
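The fixed lookup can be sketched as follows (`concat_getitem` is a hypothetical stand-in for the index math in `ConcatDataset.__getitem__`):

```python
import bisect

def concat_getitem(cumulative_sizes, idx):
    # Normalize negative indices against the *total* length before
    # locating the constituent dataset, so -1 maps to the last entry
    # of the last dataset rather than of the first.
    total = cumulative_sizes[-1]
    if idx < 0:
        if -idx > total:
            raise IndexError("absolute value of index exceeds dataset length")
        idx = total + idx
    dataset_idx = bisect.bisect_right(cumulative_sizes, idx)
    sample_idx = idx if dataset_idx == 0 else idx - cumulative_sizes[dataset_idx - 1]
    return dataset_idx, sample_idx

# Two datasets of lengths 3 and 5 -> cumulative sizes [3, 8].
print(concat_getitem([3, 8], -1))  # (1, 4): last entry of the *last* dataset
```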
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15756

Reviewed By: ezyang

Differential Revision: D14081811

Pulled By: fmassa

fbshipit-source-id: a7783fd3fd9e1a8c00fd076c4978ca39ad5a8a2a
2019-02-14 13:02:54 -08:00
65d6f1014a Add support of count_include_pad and test end to end test for AveragePool (#17034)
Summary:
Add support for count_include_pad and an end-to-end test for AveragePool.

We can export AveragePool from PyTorch with the count_include_pad attribute; however, we don't directly support it in Caffe2's ONNX backend.
We also want to check that the end-to-end test for the average pool operator with the count_include_pad attribute passes (pytorch => onnx => caffe2).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17034

Reviewed By: houseroad

Differential Revision: D14060186

Pulled By: dwarakrajagopal

fbshipit-source-id: 10dae532611c71f8c8cfc3fa701cc7c1c1c02695
2019-02-14 11:48:42 -08:00
19addc7eb0 Support nonzero onnx export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17036

Differential Revision: D14079676

Pulled By: houseroad

fbshipit-source-id: 562b538dd9ab330c26f15fdb34c98dc7a23571a1
2019-02-13 23:52:42 -08:00
5a26579e27 Add more headers to setup.py to make pytorch/benchmark work (#16890)
Summary:
Since we don't do tmp_install any more it's better to include all necessary headers.

cc kostmo for better suggestions of how to list all headers here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16890

Differential Revision: D14079848

Pulled By: dzhulgakov

fbshipit-source-id: 4522c80d05e5d91f99f6700cde46cac559330d28
2019-02-13 23:14:36 -08:00
3408d9de20 Clean up Storage/StorageImpl constructors (#16948)
Summary:
Small cleanup while doing https://github.com/pytorch/pytorch/pull/16857:

- rename C2 constructors as create_legacy
- remove duplicated constructors
- make resizable flag non-default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16948

Differential Revision: D14062755

Pulled By: dzhulgakov

fbshipit-source-id: 3b7b4ec9cdf67d2628cccc001156e040006b673e
2019-02-13 22:58:32 -08:00
11816affab Safety check for negative alloc_cpu() attempt (#17071)
Summary:
Some legacy TH code was relying on alloc to throw when called with a negative number!!! E.g. `torch.linspace(0, 1, -1)`. And it breaks the ASAN build. I still believe alloc should receive size_t, but I added a safety enforce inside.

It should fix ASAN. I'll follow up separately with a proper fix for empty_cpu (which is probably the right place to do it).
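The safety enforce amounts to rejecting negative sizes up front instead of letting them wrap around to a huge unsigned value. A Python sketch of the check (the real `alloc_cpu` is C++; this only mirrors the guard):

```python
def alloc_cpu(nbytes):
    # Reject negative sizes explicitly rather than relying on the
    # allocator to fail after an unsigned wrap-around.
    if nbytes < 0:
        raise RuntimeError(
            "alloc_cpu() seems to have been called with negative number: %d" % nbytes)
    return bytearray(nbytes)

print(len(alloc_cpu(8)))  # 8
```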
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17071

Differential Revision: D14074157

Pulled By: dzhulgakov

fbshipit-source-id: 3ed3bdb873e446edecb558e1df491310fd7179e3
2019-02-13 22:41:13 -08:00
f0fed41ea2 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: b4e7a3850b01bbec56faa3eb0feb3bc6197c0393
2019-02-13 22:07:16 -08:00
92a516b9ff Apply modernize-use-override - 2/2
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14054721

fbshipit-source-id: 15d266fa1779b1e3ea6270f00841d7fb1e4d44ee
2019-02-13 21:01:28 -08:00
84bdf86034 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 5d9763a6f26ba53c6402b978004aaa7508f4e354
2019-02-13 21:01:27 -08:00
8abfd28f58 #16627 convert weights using torch.as_tensor to avoid warning (#17067)
Summary:
Minor change which fixes #16627
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17067

Differential Revision: D14078726

Pulled By: soumith

fbshipit-source-id: c04a5f1eff44e4a4b04b981f0ae8de6ff018515b
2019-02-13 20:54:29 -08:00
b33f4cff6b Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: e074a865b859fd72b34b012505dfbd3a27a0cc41
2019-02-13 20:54:27 -08:00
dae356df1f Revert D14062537: [pytorch][PR] Implement NetDef <--> JIT IR converters.
Differential Revision:
D14062537

Original commit changeset: 88b184ee7276

fbshipit-source-id: 01971bbe20daade40cc2cbf85fc08edb380b445c
2019-02-13 20:29:17 -08:00
c3f5ba9460 PyTorch model metadata. (#16275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16275

Adding a generic string `metadata` field as part of the model to capture additional metadata with the model.

Reviewed By: dzhulgakov

Differential Revision: D13579029

fbshipit-source-id: 7456ef2edbe73bb70bbb31889cecd94e0db329a2
2019-02-13 19:48:11 -08:00
46503a7ac0 Trim libshm deps, move tempfile.h to c10 (#17019)
Summary:
libshm_manager doesn't need to depend on all of libtorch. It only uses tiny tempfile.h which can be moved to c10. I could just duplicate the file too, but it's not worth it as c10 is small enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17019

Differential Revision: D14052688

Pulled By: dzhulgakov

fbshipit-source-id: 8797d15f8c7c49c49d40b7ab2f43aa3bf6becb0c
2019-02-13 19:38:35 -08:00
d25fee31fc Implement NetDef <--> JIT IR converters. (#16967)
Summary:
Currently the converters are very straightforward, i.e. there is no code that tries to
preserve semantics; we purely perform conversion from one format to another.
Two things that we might want to add/change:
1. Add semantic conversion as well (but probably it would be a good idea to keep
it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16967

Differential Revision: D14062537

Pulled By: ZolotukhinM

fbshipit-source-id: 88b184ee7276779e5e9152b149d69857515ad98a
2019-02-13 18:39:39 -08:00
decc0893f2 Remove IgnoredPythonOp sugared value
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17042

Differential Revision: D14072497

Pulled By: driazati

fbshipit-source-id: 68fe3fa89c22e60142d758c8cbe0e6e258e7d5c2
2019-02-13 17:59:56 -08:00
3a34f443c5 Separate reduce functions from math (#16929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929

Separate CPU reduce functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13999469

fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1
2019-02-13 17:50:47 -08:00
9b7f3da74b Skip test_cudnn_multiple_threads_same_device on ROCm (flaky) (#17061)
Summary:
cc iotamudelta
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10722//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10710//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10753//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-devtoolset7-rocmrpm-centos7.5-test/1756//console
```
19:07:18 ======================================================================
19:07:18 FAIL: test_cudnn_multiple_threads_same_device (test_nn.TestNN)
19:07:18 ----------------------------------------------------------------------
19:07:18 Traceback (most recent call last):
19:07:18   File "/var/lib/jenkins/workspace/test/test_nn.py", line 3905, in test_cudnn_multiple_threads_same_device
19:07:18     (2048 - test_iters) * (2048 - test_iters))
19:07:18   File "/var/lib/jenkins/workspace/test/common_utils.py", line 453, in assertEqual
19:07:18     super(TestCase, self).assertLessEqual(abs(x - y), prec, message)
19:07:18 AssertionError: 3794704.0 not less than or equal to 1e-05 :
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17061

Differential Revision: D14069324

Pulled By: bddppq

fbshipit-source-id: e33b09abca217a62a8b577f9c332ea22985ef4ff
2019-02-13 17:18:47 -08:00
a670824fee Support FC (Caffe2) -> Gemm (ONNX) with variable input shape. (#16184)
Summary:
For >2D input, the code previously used the static shape captured during tracing and reshaped before/after `Gemm`.
Now we add `-1` to the first `Reshape`, and use `Shape(X) => Slice(outer) => Concat(with -1 for inner) => Reshape` for the second.
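The shape targets of the two reshapes can be sketched as below (`fc_gemm_reshapes` is a hypothetical helper; in the exported graph the second target is assembled at runtime from `Shape(X)`, not from static sizes):

```python
def fc_gemm_reshapes(x_shape):
    # First Reshape: -1 absorbs all outer dims, only the inner (feature)
    # dim stays static, so the Gemm input is always 2D.
    flatten_to = [-1, x_shape[-1]]
    # Second Reshape: the outer dims come from Slice(Shape(X)) at runtime,
    # with -1 concatenated for the Gemm output width.
    restore_to = list(x_shape[:-1]) + [-1]
    return flatten_to, restore_to

print(fc_gemm_reshapes((5, 4, 3)))  # ([-1, 3], [5, 4, -1])
```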
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16184

Differential Revision: D14070754

Pulled By: ezyang

fbshipit-source-id: 86c69e9b254945b3406c07e122e57a00dfeba3df
2019-02-13 17:12:34 -08:00
2ad5dcbbe4 Make timeout in resnet50_trainer configurable (#17058)
Summary:
xw285cornell petrex dagamayank
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17058

Differential Revision: D14068458

Pulled By: bddppq

fbshipit-source-id: 15df4007859067a22df4c6c407df4121e19aaf97
2019-02-13 17:03:48 -08:00
41dddfd55f Make mkldnn Stream object thread_local and enable mkldnn thread-safe (#17022)
Summary:
This PR fixes following issue: https://github.com/pytorch/pytorch/issues/16828

It is a combination of two things:
1) MKLDNN streams are not thread-safe but are currently shared between different threads. This change makes them thread_local
2) By default MKLDNN primitives can share global memory and can't be invoked from multiple threads. This PR enables the MKLDNN_ENABLE_CONCURRENT_EXEC cmake configuration option that makes them thread-safe.
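The first fix's idea, expressed in Python terms with `threading.local` (the `Stream` class here is a hypothetical stand-in for the MKL-DNN stream, which is not thread-safe):

```python
import threading

class Stream:
    """Hypothetical stand-in for an MKL-DNN stream (not thread-safe)."""
    pass

_tls = threading.local()

def current_stream():
    # Each thread lazily creates its own stream, so a stream is never
    # shared between threads.
    if not hasattr(_tls, "stream"):
        _tls.stream = Stream()
    return _tls.stream

streams = []
def worker():
    streams.append(current_stream())

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(streams[0] is streams[1])  # False: each thread got its own stream
```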
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17022

Differential Revision: D14069052

Pulled By: ezyang

fbshipit-source-id: f8f7fcb86c40f5d751fb35dfccc2f802b6e137c6
2019-02-13 16:04:53 -08:00
491f2d4cb8 Support conversion from Caffe2 MergeDim to ONNX Reshape + Squeeze. (#16189)
Summary:
`MergeDim` can be done by `Reshape([1, -1, 0, 0, ...]) + Squeeze`.
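The mapping relies on Reshape's target-shape conventions: `0` means "copy the corresponding input dim" and `-1` absorbs the rest, after which Squeeze drops the leading `1`. A sketch of the resulting shapes (assuming MergeDim folds the leading two dims into one):

```python
def merge_dim_as_reshape(shape):
    # Reshape([1, -1, 0, 0, ...]): the trailing 0s copy input dims,
    # -1 absorbs the two leading dims behind a dummy leading 1,
    # which Squeeze(axis=0) then removes.
    reshape_arg = [1, -1] + [0] * (len(shape) - 2)
    merged = [shape[0] * shape[1]] + list(shape[2:])
    return reshape_arg, merged

print(merge_dim_as_reshape((2, 3, 4)))  # ([1, -1, 0], [6, 4])
```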
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16189

Differential Revision: D14070676

Pulled By: ezyang

fbshipit-source-id: 28d7e9b35cc2c1dcbd4afb3fbdf7383e219b1777
2019-02-13 15:53:38 -08:00
86594e63eb Fix mvlgamma doc (#17045)
Summary:
Changelog:
- Fix the constant in the docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17045

Differential Revision: D14068698

Pulled By: ezyang

fbshipit-source-id: af040b9a9badea213785f5bf3b6daf4d90050eb2
2019-02-13 15:24:44 -08:00
f79563a665 Change IR graph print format to make it look more pythonic (#16986)
Summary:
This removes curly braces from the outputs (we have indentation to indicate scopes), also adds ':' after graph and blocks declaration and removes ';' from the return line. ".expect" tests are updated to keep up with it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16986

Differential Revision: D14062540

Pulled By: ZolotukhinM

fbshipit-source-id: 7f8e2d11619152a21ef7f1f7f8579c49392c3eca
2019-02-13 12:37:24 -08:00
18b8572505 Turn off the ability for Declarations.cwrap entries to be methods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17053

Differential Revision: D14065887

Pulled By: gchanan

fbshipit-source-id: 5d06ac66d27d28d48c2aff2b0d911f34ea0cd6fd
2019-02-13 12:25:05 -08:00
bc39cf4d5e Remove chunk count check on the ChunkBuffer (#16868)
Summary:
Previously, the ChunkBuffer depended on the remaining chunk count to signal the end of data loading. This does not work with distributed samplers, where each sampler only loads a subset of chunks. This refactor removes the ChunkBuffer's dependency on the remaining chunk count.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16868

Differential Revision: D14066517

Pulled By: goldsborough

fbshipit-source-id: 293dfe282ceff326dff0876c2f75c2ee4f4463e2
2019-02-13 11:09:42 -08:00
a5e7b1d032 Use IndexError instead of RuntimeError in ATen CPU kernels
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17049

Reviewed By: ezyang

Differential Revision: D14064700

Pulled By: fmassa

fbshipit-source-id: 3575db103bba5a7d82f574cbb082beca419151ec
2019-02-13 10:19:28 -08:00
1fc05bd285 Mark IntList as deprecated; add C10_DEPRECATED_USING (#16824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16824

There was a big wooly yak getting the deprecated macros to work.
Gory details are in Deprecated.h

Reviewed By: smessmer

Differential Revision: D13978429

fbshipit-source-id: f148e5935ac36eacc481789d22c7a9443164fe95
2019-02-13 08:51:20 -08:00
db82fc7ca6 Add more debugging facilities to ONNXIFI transform (#17043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17043

Add more debugging facilities for the ONNXIFI transform.

Reviewed By: ipiszy

Differential Revision: D14019492

fbshipit-source-id: 8c258ccba2f8ce77db096031fc8a61e15bd8af93
2019-02-13 00:05:41 -08:00
26018d027a Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 399afdc341075c383227d0d410a30eeb6c1d3b08
2019-02-13 00:05:40 -08:00
2b5bef22b7 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: edb216d2eca7120d0f7729b2e4640096a0341154
2019-02-12 21:26:24 -08:00
51dd2000cd unify c2 and TH allocator (#16892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16892

Replaces https://github.com/pytorch/pytorch/pull/14517

Merged caffe2 and TH CPU Allocators. Mostly using the code from caffe2 allocators.
`memset` of caffe2 allocator is gone now. These two allocators should be almost the same.

Baseline:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                       148 ns        148 ns    4676594
BM_StorageImplCtor                        54 ns         54 ns   12957810
BM_MallocStorageImpl                      62 ns         62 ns   11254745
BM_TensorImplCtor                         22 ns         22 ns   31939472
BM_MallocTensorImpl                      105 ns        105 ns    6505661
BM_Malloc_1                               43 ns         43 ns   16464905
BM_MakeTensorFromStorage                 126 ns        126 ns    5586116
BM_MakeVariableFromTensor                236 ns        236 ns    2995528
BM_ATenCPUTensorAllocationSmall1         319 ns        319 ns    2268884
BM_ATenCPUTensorAllocationSmall2         318 ns        318 ns    2163332
BM_ATenCPUTensorAllocationMedium1        403 ns        403 ns    1663228
BM_ATenCPUTensorAllocationMedium2        448 ns        448 ns    1595004
BM_ATenCPUTensorAllocationBig1           532 ns        532 ns    1352634
BM_ATenCPUTensorAllocationBig2          4486 ns       4486 ns     160978
```
Changed:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                       141 ns        141 ns    4803576
BM_StorageImplCtor                        55 ns         55 ns   13129391
BM_MallocStorageImpl                      64 ns         64 ns   11088143
BM_TensorImplCtor                         23 ns         23 ns   31616273
BM_MallocTensorImpl                      101 ns        101 ns    7017585
BM_Malloc_1                               39 ns         39 ns   18523954
BM_MakeTensorFromStorage                 118 ns        118 ns    5877919
BM_MakeVariableFromTensor                452 ns        452 ns    1565722
BM_ATenCPUTensorAllocationSmall1         384 ns        384 ns    1819763
BM_ATenCPUTensorAllocationSmall2         389 ns        389 ns    1857483
BM_ATenCPUTensorAllocationMedium1        425 ns        425 ns    1646284
BM_ATenCPUTensorAllocationMedium2        430 ns        430 ns    1561319
BM_ATenCPUTensorAllocationBig1           508 ns        508 ns    1309969
BM_ATenCPUTensorAllocationBig2          3799 ns       3799 ns     173674
```

lstm benchmark:
Before:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```

After:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```

Reviewed By: ezyang

Differential Revision: D13202632

fbshipit-source-id: db6d2ec756ed15b0732b15396c82ad42302bb79d
2019-02-12 21:16:34 -08:00
f87022bf2f Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 7d730945dbdd7bb7d10192061229ee6e759a1a7f
2019-02-12 20:45:38 -08:00
fb5790ce94 Remove second output of Reshape during ONNXIFI transform (#17027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17027

Glow doesn't support the second output of Reshape right now, and it is unused anyway. For correctness, we do make sure that the second output of Reshape is of Constant type during bound shape inference.

Reviewed By: ipiszy

Differential Revision: D14056555

fbshipit-source-id: f39cca7ba941bf5a5cc3adc96e2b1f943cc0be93
2019-02-12 18:31:53 -08:00
9d01be1a5a enable more unit tests in test_nn (#16994)
Summary:
These tests work with ROCm 2.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16994

Differential Revision: D14059802

Pulled By: bddppq

fbshipit-source-id: 8e2cbb13196c2e0283d3e02b7f761374bc580751
2019-02-12 17:58:44 -08:00
02b838e065 fix bicubic upsampling and enable tests (#17020)
Summary:
Fix macro name in ifdef guard, enable upsampling tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17020

Differential Revision: D14059780

Pulled By: bddppq

fbshipit-source-id: 82c57d17d5bccdccb548c65d2b7a1ff8ab05af30
2019-02-12 17:33:08 -08:00
92221ad840 Fold col offsets into bias; optimize A symmetric quant (#16942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16942

We can fold col offsets into the bias if the zero point of the activation is constant.
fbgemm still needs to provide an option to pass col offsets in case the zero point of the activation keeps changing (e.g., dynamic quantization).
A trick to optimize the static quantization case is setting the A zero point to 0 after folding it into the bias.

This diff also optimizes the case when weights use symmetric quantization. When the B zero point is 0, we use PackAMatrix instead of PackAWithRowOffset.

TODO:
Ideally, PackAWithRowOffset should perform as fast as PackAMatrix when B_zero_point is 0, to make client code simpler.
The same applies to PackAWithIm2Col and depth-wise convolution (group convolution already does this).
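The folding identity can be sketched in plain Python (a toy sketch with hypothetical helper names, not the fbgemm API): the A-zero-point term `za * (K * zb - col_offsets[j])` is constant per output column, so it can be added to the bias once, after which A's zero point can be treated as 0.

```python
def quantized_matmul(A, B, za, zb):
    # Reference: subtract both zero points elementwise before accumulating.
    K, N = len(B), len(B[0])
    return [[sum((A[i][k] - za) * (B[k][j] - zb) for k in range(K))
             for j in range(N)] for i in range(len(A))]

def fold_col_offsets_into_bias(B, za, zb, bias):
    # col_offsets[j] = sum_k B[k][j]; za * (K*zb - col_offsets[j]) is the
    # whole contribution of A's zero point, constant per output column.
    K, N = len(B), len(B[0])
    col_offsets = [sum(B[k][j] for k in range(K)) for j in range(N)]
    return [bias[j] + za * (K * zb - col_offsets[j]) for j in range(N)]

def quantized_matmul_folded(A, B, zb, bias):
    # A's zero point is already folded into the bias, so treat it as 0 here.
    K, N = len(B), len(B[0])
    return [[bias[j] + sum(A[i][k] * (B[k][j] - zb) for k in range(K))
             for j in range(N)] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
za, zb = 1, 2
folded_bias = fold_col_offsets_into_bias(B, za, zb, [0, 0])
```

Both paths produce the same integer results, which is what makes the static-quantization trick safe.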

Reviewed By: csummersea

Differential Revision: D14013931

fbshipit-source-id: e4d313343e2a16a451eb910beed30e35de02a40c
2019-02-12 17:33:06 -08:00
3e1e5d5a8b enable unit tests in test_cuda that now pass with ROCm 2.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17012

Differential Revision: D14059761

Pulled By: bddppq

fbshipit-source-id: 8309c3ffe1efed42b5db69fdec26427413c3f224
2019-02-12 17:28:46 -08:00
9696fee635 Register CUDA kernels for caffe2 operators (#16691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691

Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This now also registers the CUDA kernels with it.

Reviewed By: bwasti

Differential Revision: D13901619

fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6
2019-02-12 17:24:01 -08:00
059c55f8cc Enable test_jit tests that work on ROCm 2.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17010

Differential Revision: D14059748

Pulled By: bddppq

fbshipit-source-id: 7a1f7eee4f818dba91e741437415370973e4d429
2019-02-12 17:18:44 -08:00
7c24de8d04 Extract ShapeInfo and some util functions into a separate file. (#17025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17025

Extract ShapeInfo and some util functions into a separate file.

Reviewed By: yinghai

Differential Revision: D14017432

fbshipit-source-id: 201db46bce6d52d9355a1a86925aa6206d0336bf
2019-02-12 17:06:29 -08:00
f435fb8290 Allow customization of blob node in net_drawer (#16915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16915

TSIA

Reviewed By: ipiszy

Differential Revision: D14018010

fbshipit-source-id: df5ccc06fa37f08e7a02a8acc466c4ad47afe04e
2019-02-12 15:02:50 -08:00
65b49b4696 Ignore unknown_shaped tensor in bound shape inference (#16916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16916

Two fixes for maximum effort bound shape inference
1. Ignore failed and unknown shape
2. Add specialization for `SparseLengthsWeightedSumFused8BitRowwise`.

Reviewed By: ipiszy

Differential Revision: D14017810

fbshipit-source-id: 25cd68d35aa20b9ed077bdb562eb7f9deff0ab96
2019-02-12 15:02:49 -08:00
7c1e4258a9 Workarounds to the lack of nvidia-smi and ldconfig programs in macosx (was PR 16968) (#16999)
Summary:
Fix issue #12174 for Mac OSX.

PS: This is a duplicate of PR #16968 that got messed up. Sorry for the confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16999

Differential Revision: D14050669

Pulled By: zou3519

fbshipit-source-id: a4594c03ae8e0ca91a4836408b6c588720162c9f
2019-02-12 14:39:28 -08:00
0d95028bee Dispatch the correct legacy function for geqrf_out and ormqr_out (#16964)
Summary:
This fixes the segfault.

Changelog:
- Modify the function calls in LegacyDefinitions for `geqrf_out` and `ormqr_out`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16964

Differential Revision: D14025985

Pulled By: gchanan

fbshipit-source-id: aa50e2c1694cbf3642273ee14b09ba12625c7d33
2019-02-12 13:48:51 -08:00
68c3b959de Register layout for XLA backend.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16946

Differential Revision: D14054716

Pulled By: gchanan

fbshipit-source-id: 063495b99b9f7d29ca3ad2020a6bc90d36ba0d7d
2019-02-12 13:44:07 -08:00
0eee56fff7 Export ReduceMean/ReduceFrontMean/ReduceBackMean (Caffe2) to ReduceMean (ONNX). (#16727)
Summary:
The second input (`lengths`) is not supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16727

Differential Revision: D14054105

Pulled By: houseroad

fbshipit-source-id: 36b8d00460f9623696439e1bd2a6bc60b7bb263c
2019-02-12 13:35:32 -08:00
b0d57aa7b1 Clean up allocations in FBGEMM linear (#16985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16985

These statements were causing some redundant allocations + copying, so I cleaned
them up

Reviewed By: zdevito, wanchaol

Differential Revision: D14031067

fbshipit-source-id: f760fb29a2561894d52a2663f557b3e9ab1653de
2019-02-12 13:02:21 -08:00
34e4bd3ec5 Properly dispatch s_copy__cpu.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16974

Differential Revision: D14030516

Pulled By: gchanan

fbshipit-source-id: ba4cde5ebf2898d207efbc9117c1f1d6ccae861b
2019-02-12 12:53:36 -08:00
2b1e2b6b53 Get rid of unused THPStorage defines related to accreal.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16973

Differential Revision: D14029538

Pulled By: gchanan

fbshipit-source-id: b51f203ccff97695bf228772bb13e3e6b9bb6d1a
2019-02-12 12:48:48 -08:00
f2e6a3f230 Fix AddAdjustBatchOp (#16997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16997

1. Don't create multiple AdjustBatch ops for the same input name. We create it once and hook input to abc_post_adjust_batch.

2. Dangling tensor. The problem behind such an error is still with AttachAdjustBatchOp. Consider a net such as
```
op {
  type : "Relu"
  input: "X"
  output: "Y"
}
op {
  type : "Relu"
  input: "Y"
  output: "Y2"
}
external_output: "Y"
external_output: "Y2"
```
In this net, the output of the first Relu is used both as an internal node and as an external output, so we cannot simply rename Y into Y_pre_batch_adjust. Basically, we need another pass to check all the inputs of the ops in the net and rename Y into Y_pre_batch_adjust.
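Such a renaming pass might look like this sketch (hypothetical helper names; a plain Copy op stands in for AdjustBatch as the op that regenerates the external output):

```python
def rewrite_dual_use_blob(ops, external_outputs, old, new):
    # Hypothetical extra pass: rewrite every op input/output that refers to
    # `old`, then append an op that regenerates `old` from `new` so the
    # external output is still produced.
    for op in ops:
        op["input"] = [new if b == old else b for b in op["input"]]
        op["output"] = [new if b == old else b for b in op["output"]]
    if old in external_outputs:
        ops.append({"type": "Copy", "input": [new], "output": [old]})
    return ops

net = [
    {"type": "Relu", "input": ["X"], "output": ["Y"]},
    {"type": "Relu", "input": ["Y"], "output": ["Y2"]},
]
rewrite_dual_use_blob(net, ["Y", "Y2"], "Y", "Y_pre_batch_adjust")
```

After the pass, the internal consumers read the renamed blob while Y is still produced for the external output.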

Reviewed By: bertmaher

Differential Revision: D14041446

fbshipit-source-id: f6553e287a8dfb14e4044cc20afaf3f290e5151b
2019-02-12 11:45:43 -08:00
21ce1da5e9 Roll back PyTorch DockerVersion to 282
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17013

Differential Revision: D14052415

Pulled By: yf225

fbshipit-source-id: df663fb46ee825174fe06b8d395979b3d4e84766
2019-02-12 11:39:15 -08:00
a2c322e735 fix silent failure on Windows builds (#16984)
Summary:
Closes #16983

Remove backticks that are being interpreted by the shell. Add -e option to bash script to avoid future such failures
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16984

Reviewed By: yf225

Differential Revision: D14039128

Pulled By: kostmo

fbshipit-source-id: c31a1895377ca86c1b59e79351843cc8c4fd7de3
2019-02-12 11:06:27 -08:00
3618b52c74 Add module and name to func created with _jit_internal.boolean_dispatch (#16922)
Summary:
The use case for making this PR is the following bug:
(with F = torch.nn.functional)
`F.max_pool2d.__module__` is `torch._jit_internal`
`F.max_pool2d.__name__` is `fn`

With this PR you get:
`F.max_pool2d.__module__` is `torch.nn.functional`
`F.max_pool2d.__name__` is `max_pool2d`
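A simplified sketch of the kind of fix involved (plain Python; the real `boolean_dispatch` lives in `torch._jit_internal` and dispatches on an argument value, not just a keyword). Copying metadata from the wrapped function, e.g. via `functools.wraps`, is what restores `__module__` and `__name__`:

```python
import functools

def boolean_dispatch(if_true, if_false, arg_name):
    # Hypothetical simplified dispatcher: picks an implementation based on a
    # boolean keyword argument, while preserving the wrapped function's
    # __module__ and __name__ so introspection reports the right names.
    def fn(*args, **kwargs):
        if kwargs.pop(arg_name, False):
            return if_true(*args, **kwargs)
        return if_false(*args, **kwargs)
    functools.wraps(if_false)(fn)
    return fn

def max_pool2d(x):
    # Stand-in for the real functional op.
    return max(x)

pooled = boolean_dispatch(max_pool2d, max_pool2d, "return_indices")
```

Without the `functools.wraps` call, `pooled.__name__` would be the generic `fn`.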
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16922

Differential Revision: D14020053

Pulled By: driazati

fbshipit-source-id: c109c1f04640f3b2b69bc4790b16fef7714025dd
2019-02-12 09:38:48 -08:00
40528efeac More docs for methods in operator.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16826

Reviewed By: izdeby

Differential Revision: D13979891

fbshipit-source-id: df8391ffaff0d44845057bb839f05aea6fc5712c
2019-02-12 08:19:38 -08:00
e5742494f6 Minor typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16980

Differential Revision: D14033686

Pulled By: gchanan

fbshipit-source-id: 9f7967defc6795640e14157d0b701b185061741f
2019-02-12 08:02:04 -08:00
f61f9e1757 Fix allow_inf in assertEqual (#16959)
Summary:
gchanan pointed out in https://github.com/pytorch/pytorch/pull/16389 that `allow_inf` is treating `-inf` and `inf` as equal. This fixes it.

Also fixing #16448 since the code is nearby and 2.1 has been released.
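The corrected comparison can be sketched as follows (a plain-Python sketch, assuming a small absolute tolerance; not the actual test-harness code):

```python
import math

def equal_allow_inf(a, b, tol=1e-5):
    # Infinities compare equal only when they have the same sign; the buggy
    # version treated any pair of infinities (inf vs -inf) as equal.
    if math.isinf(a) or math.isinf(b):
        return a == b
    return abs(a - b) <= tol
```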
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16959

Differential Revision: D14025297

Pulled By: gchanan

fbshipit-source-id: 95348309492e7ab65aa4d7aabb5a1800de66c5d6
2019-02-12 07:56:34 -08:00
ae1fc584ea Refine return type Stream to HIPStream in HIPStreamGuardMasqueradingAsCUDA (#16978)
Summary:
Previously, we used the templated class directly to provide
implementations. However, there is a subtle difference
between this and CUDAStreamGuard: CUDAStreamGuard has refined types
for the Streams it returns. This led to a compilation failure
of HIPified ddp.cpp. This commit lines them up more closely,
at the cost of copy-paste.

A possible alternate strategy would have been to extend the
InlineDeviceGuard templates to optionally accept refinements
for Stream.  I leave this for future work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16978

Differential Revision: D14045346

Pulled By: ezyang

fbshipit-source-id: 2b101606e62e4db588027c57902ea739a2119410
2019-02-12 07:36:25 -08:00
b7b245845a Revert D14030665: [pytorch][PR] [HOTFIX] Pin docker-ce version to the one expected by nvidia-docker2
Differential Revision:
D14030665

Original commit changeset: dece6a5aa4d1

fbshipit-source-id: 885a464ec3d1c23d4e07630fa3b67e69a3eab1b8
2019-02-12 07:04:57 -08:00
bad4442a7c Parse the command line and check the arguments before build_deps() (#16914)
Summary:
This is needed to check for wrong arguments or --help options
before `build_deps()` is executed. Otherwise command line arguments
are not parsed and checked until `setup()` is run.

Fixes: #16707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16914

Differential Revision: D14041236

Pulled By: soumith

fbshipit-source-id: 41f635772ccf47f05114775d5a19ae04c495ab3b
2019-02-12 00:15:42 -08:00
4d4c5273de Fix and add testing for nullptr allocator in c2->pt conversion (#16857)
Summary:
Fixes the bug where a tensor is created on the Caffe2 side, then passed to PyTorch and resized. Now we just initialize the allocator correctly.

Note that the code in raw_mutable_data() is still necessary because of non-resizable tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16857

Reviewed By: houseroad

Differential Revision: D14019469

Pulled By: dzhulgakov

fbshipit-source-id: 14d3a3b946d718bbab747ea376903646b885706a
2019-02-11 23:21:02 -08:00
aa626840af Fix NERPredictor for zero initialization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16931

Reviewed By: dragonxlwang

Differential Revision: D14016749

fbshipit-source-id: b5512c52cef77651bdba1e31f588ea649daacdd9
2019-02-11 23:12:26 -08:00
d266453541 Allow calling a Python function with a dict
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16989

Differential Revision: D14037896

Pulled By: driazati

fbshipit-source-id: 5f26d2d8fabf0f267909a3383f19d984645f94d0
2019-02-11 21:52:44 -08:00
4292d13240 Keep weights name unchanged during SsaRewrite (#16932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16932

During the ONNXIFI transformation the net SSA is rewritten. At the last step the weight
names are changed back to what they were before. This diff keeps the weight
names unchanged throughout the process.

Reviewed By: yinghai

Differential Revision: D13972597

fbshipit-source-id: 7c29857f788a674edf625c073b345f2b44267b33
2019-02-11 14:55:31 -08:00
917eac91f4 Pin docker-ce version to the one expected by nvidia-docker2 (#16976)
Summary:
Fix errors such as https://circleci.com/gh/pytorch/pytorch/760715.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16976

Differential Revision: D14030665

Pulled By: yf225

fbshipit-source-id: dece6a5aa4d13ff771c18b4ce02a0b9f9572a379
2019-02-11 14:21:20 -08:00
920c684367 Expose GenerateProposals to PyTorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16880

Reviewed By: bwasti

Differential Revision: D13998092

fbshipit-source-id: 23ab886ba137377312557fa718f262f4c8149cc7
2019-02-11 14:15:47 -08:00
0c02d317ea Expose BBoxTransform to pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16879

Reviewed By: bwasti

Differential Revision: D13998093

fbshipit-source-id: ddfe4bff83e9a1a4cedf1e520e6d2977b21cb3af
2019-02-11 14:15:45 -08:00
64aa769ef9 Minimize templated code in caffe2 operator wrapper (#16965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16965

Instead of having one large templated function to wrap the caffe2 op, minimize the amount of templated code.
Non-templated code can be reused between different operators and decreases binary size.

Reviewed By: orionr

Differential Revision: D14018806

fbshipit-source-id: bedd4152eec21dd8c5778446963826316d210543
2019-02-11 14:15:43 -08:00
7743ed8502 Don't keep unnecessary saved_inputs alive (#16583)
Summary:
Fixes #16577.

This greatly improves the memory efficiency of certain ops like Dropout2d. Previously, they were implemented as `input * mask` where mask never requires_grad, but we didn't use that knowledge in forward, and (in the case of an in-place dropout) kept input.clone() for the backward, where it would simply get ignored.

This patch tries to address this situation by emitting some guards for stores like this, but only if they are as simple, as checking if a single value requires_grad.

Interestingly, the same optimizations apply to methods like bmm, baddmm, etc., but _not to mm nor addmm_, because of how their derivatives are defined. Apparently they unnecessarily use `mat1` to compute the derivative of `mat1` just to improve the error message in case `mat1` was sparse. I'd like to apply this optimization to that case, but I don't want to lose the nicer error message, so if anyone has any ideas for solutions, please let me know...

Full list of operators affected by this patch:
* _nnpack_spatial_convolution
* addbmm
* addcdiv
* addcmul
* addmv
* addr
* baddbmm
* bmm
* cross
* div
* dot
* fmod
* ger
* index_add_
* mul
* mv
* scatter_add_
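The guard can be illustrated with a toy reverse-mode node in plain Python (a sketch of the idea, not the autograd codegen): each operand is saved only if the *other* operand requires grad, since `d(a*b)/da = b` and `d(a*b)/db = a`.

```python
class Mul:
    # Toy reverse-mode node for out = a * b.  Since d(out)/da = b and
    # d(out)/db = a, each operand only needs saving when the other operand
    # requires grad -- e.g. for dropout's `input * mask`, where the mask
    # never requires grad, the input need not be kept alive.
    def __init__(self, a, b, a_requires_grad, b_requires_grad):
        self.saved_a = a if b_requires_grad else None
        self.saved_b = b if a_requires_grad else None

    def backward(self, grad_out):
        grad_a = grad_out * self.saved_b if self.saved_b is not None else None
        grad_b = grad_out * self.saved_a if self.saved_a is not None else None
        return grad_a, grad_b

node = Mul(3.0, 0.5, True, False)  # b (the mask) does not require grad
```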
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16583

Differential Revision: D13900881

Pulled By: gchanan

fbshipit-source-id: dd0aeb2ab58c4b6aa95b37b46d3255b3e014291c
2019-02-11 13:42:09 -08:00
e2a5b203fc Enforce same input tensor storage in VariableType functions (#16305)
Summary:
In VariableType.cpp, when a function modifies its input tensors, it should only change the input tensors' storage data in-place, and should never change the input tensors' storage pointers. This PR adds checks for this, and also fixes functions that fail this test.

This is part of the Variable/Tensor merge work (https://github.com/pytorch/pytorch/issues/13638).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16305

Differential Revision: D13897855

Pulled By: yf225

fbshipit-source-id: 0c4fc7eb530d30db88037b1f0981f6f8454d3b79
2019-02-11 13:33:12 -08:00
4b454c3bdd Revert unneeded fixes in flat_hash_map (#16907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16907

The begin()/end() fix actually doesn't make sense, see my comment on https://github.com/skarupke/flat_hash_map/pull/8
This diff removes it.

Reviewed By: ezyang

Differential Revision: D13985779

fbshipit-source-id: f08b02c941069e2a4e728e02a19b65dc72f96b41
2019-02-11 13:28:25 -08:00
9521612bb7 Fix constexpr in KernelRegistrationBuilder (#16906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16906

In C++11, constexpr implies const, so these methods actually wouldn't be rvalue overloads as intended but const rvalue overloads.
Let's only apply the constexpr flag in C++14 to be safe.

Reviewed By: bddppq

Differential Revision: D13998486

fbshipit-source-id: a04d17ef0cc8f45e3d0a1ca9843d194f4f0f6f7f
2019-02-11 13:28:23 -08:00
af0c79eed4 Catch cudaError_t return val (nodiscard in rocm) (#16399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16399

Catching cudaError_t return values in a few places, because it's nodiscard in ROCm. Unless we add -Wno-unused-result, it'll end up with a compilation error.

Also in c10/cuda/test, check whether a host has a GPU or not. We were silently throwing out the error before (so not really testing the CUDA API).

Reviewed By: bddppq

Differential Revision: D13828281

fbshipit-source-id: 587d1cc31c20b836ce9594e3c18f067d322b2934
2019-02-11 13:18:36 -08:00
29f096cc70 optionally zero infinite losses in CTCLoss (#16199)
Summary:
Here is a stab at implementing an option to zero out infinite losses (and NaN gradients).
It might be nicer to move the zeroing to the respective kernels.
The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and Native anyways and the NaN gradients aren't terribly useful.

This topic seems to come up regularly, e.g. in  #14335
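A minimal sketch of the proposed option in plain Python (hypothetical helper name, operating on per-sample losses rather than inside the kernels):

```python
import math

def zero_out_infinite_losses(losses, zero_infinity=True):
    # Replace infinite per-sample losses (e.g. targets the inputs cannot
    # align to) with 0 so they contribute neither to the loss nor, via a
    # zeroed gradient, to the parameter update.
    if not zero_infinity:
        return losses
    return [0.0 if math.isinf(l) else l for l in losses]
```

With the flag off, the old behaviour (propagating inf losses and NaN gradients) is preserved.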
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16199

Differential Revision: D14020462

Pulled By: ezyang

fbshipit-source-id: 5ba8936c66ec6e61530aaf01175dc49f389ae428
2019-02-11 13:12:55 -08:00
632df48207 Merge binaries "convert_image_to_tensor" and "caffe2_benchmark" (#16875)
Summary:
Merge binaries "convert_image_to_tensor" and "caffe2_benchmark" to remove the overhead of writing to/reading from Tensor file.

*TODO next: TensorProtos is another overhead. No need for de-serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16875

Reviewed By: sf-wind

Differential Revision: D13997726

Pulled By: ZhizhenQin

fbshipit-source-id: 4dec17f0ebb59cf1438b9aba5421db2b41c47a9f
2019-02-11 13:07:26 -08:00
b4f1a871e8 Fix missing CircleCI GPG key (#16961)
Summary:
I'm seeing a bunch of apt gpg key errors on CI with the following message:
```
An error occurred during the signature verification. The repository is not
updated and the previous index files will be used. GPG error:
https://packagecloud.io trusty InRelease: The following signatures couldn't
be verified because the public key is not available:
NO_PUBKEY 4E6910DFCB68C9CD
```

Most of the times apt will reuse the old cached version, but sometimes this results in a build failure: https://circleci.com/gh/pytorch/pytorch/758366?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link.

This should hopefully fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16961

Differential Revision: D14028151

Pulled By: ezyang

fbshipit-source-id: 7648a0a58ece38d8d04916937a9fa17f34f8833e
2019-02-11 12:31:38 -08:00
c90a33b627 Disable binary_linux_conda_3.6_cu90_build on PRs. (#16958)
Summary:
Issue tracked at https://github.com/pytorch/pytorch/issues/16710

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16958

Differential Revision: D14028078

Pulled By: ezyang

fbshipit-source-id: 6c68f79775a156ef4a55ac450a5a0ecacc0e6af5
2019-02-11 11:53:49 -08:00
48c5d0ae8c Install Thrust package and stop patching (#16911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16911

I think the Thrust package has want we want for /opt/rocm/include/thrust. We probably can stop patching it now.

Reviewed By: bddppq

Differential Revision: D14015177

fbshipit-source-id: 8d9128783a790c39083a1b8b4771c2c18bd67d46
2019-02-11 09:47:39 -08:00
8042edcdb1 Make pin_memory and default_collate preserve namedtuples (#16440)
Summary:
Open issue: https://github.com/pytorch/pytorch/issues/3281
Corresponding PR (conflict): https://github.com/pytorch/pytorch/pull/4577

Another open issue: https://github.com/pytorch/pytorch/issues/14613
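The core of the fix can be sketched as follows (a simplified stand-in for `default_collate`, with a plain list in place of `torch.stack`): namedtuples are detected via their `_fields` attribute and rebuilt with their own type instead of degrading to plain tuples.

```python
from collections import namedtuple

def default_collate(batch):
    # Minimal sketch: collate a batch of samples, rebuilding namedtuples
    # with their own type instead of degrading them to plain tuples.
    elem = batch[0]
    if isinstance(elem, tuple) and hasattr(elem, "_fields"):  # namedtuple
        return type(elem)(*(default_collate(s) for s in zip(*batch)))
    if isinstance(elem, (list, tuple)):
        return [default_collate(s) for s in zip(*batch)]
    return list(batch)  # leaf: stand-in for torch.stack

Point = namedtuple("Point", ["x", "y"])
batch = default_collate([Point(1, 2), Point(3, 4)])
```

The collated result keeps the `Point` type, so field access like `batch.x` still works downstream.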
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16440

Differential Revision: D14020901

Pulled By: ezyang

fbshipit-source-id: 4abe817fc43c281a510715d311bad544511995d3
2019-02-11 08:47:33 -08:00
d7e6f9b5a7 Revert D14020906: [pytorch][PR] Extend support for exporting reshape to onnx.
Differential Revision:
D14020906

Original commit changeset: 168616873044

fbshipit-source-id: 2730bb6990d41f3a9cef6625ea919c219733433d
2019-02-11 06:08:55 -08:00
8b4dea3f56 Added scientific notation on set_printoptions (#16876)
Summary:
This PR fixes #15683
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16876

Differential Revision: D14021703

Pulled By: soumith

fbshipit-source-id: 1f603a7d24e331831d8d389f4a704c6a5b070b0c
2019-02-11 04:55:12 -08:00
4335aac6e6 Extend support for exporting reshape to onnx.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16632

Differential Revision: D14020906

Pulled By: ezyang

fbshipit-source-id: 168616873044b980145a3554dab942bdec19efb2
2019-02-10 20:19:35 -08:00
e661dc27ff Int8GivenTensorFill Operator Schema fix typo (#16204)
Summary:
Hi,
caffe2/operators/quantized/int8_given_tensor_fill_op.cc expects the value array to be named "values", but the operator schema describes "value" (no s). I guess it is a little typo, but it made me lose a bit of time before understanding why I had this error when passing "value" instead of "values":
```
[F int8_given_tensor_fill_op.h:95] Check failed: output->t.numel() == values_.numel() output size: 3 given size: 0
Aborted (core dumped)
```

Thanks,
Eyyüb Sari
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16204

Differential Revision: D14020476

Pulled By: ezyang

fbshipit-source-id: a8a46bfc44ec125e7925ce4b7c79fdf99c890a50
2019-02-10 20:08:45 -08:00
8f6ee88a1d Add support for fusion of half batch norm with float stats (#16735)
Summary:
Fixes #16642.

cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16735

Differential Revision: D14020310

Pulled By: ezyang

fbshipit-source-id: ac78726f471d16d188eb998354d52bc79fe2c282
2019-02-10 19:37:57 -08:00
c282afffa7 Improve the Sparse matrix multiplication computational speed #16187 (#16905)
Summary:
Instead of converting the sparse matrix from COO to CSR format as in the original implementation, my revision uses the COO format directly for sparse-dense matrix multiplication.
On my Linux machine it is 5 times faster than the original code:

```
(original code)
SIZE: 15000 DENSITY: 0.01 DEVICE: cpu
torch: 0.39403 seconds
np:    0.00496674 seconds
torch/np: 79.3338

----------------------------------------

(my update)
SIZE: 15000 DENSITY: 0.01 DEVICE: cpu
torch: 0.0812583 seconds
np:    0.00501871 seconds
torch/np: 16.1911

```

Further code feedback and running-time tests are highly welcome. I will keep revising my code if needed.
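The COO-direct approach can be sketched in plain Python (a hypothetical helper, not the actual C++ kernel): each stored (row, col, value) triple scatters `value * dense[col]` into the output row, with no CSR conversion step.

```python
def coo_spmm(indices, values, shape, dense):
    # Multiply a sparse COO matrix (row/col index pairs plus values) by a
    # dense matrix directly, without first converting to CSR.
    rows, cols = shape[0], len(dense[0])
    out = [[0.0] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        drow = dense[c]
        orow = out[r]
        for j in range(cols):
            orow[j] += v * drow[j]
    return out
```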
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16905

Differential Revision: D14020095

Pulled By: ezyang

fbshipit-source-id: 4ab94075344a55b375f22421e97a690e682baed5
2019-02-10 19:37:54 -08:00
0742874643 Allow dataloader to accept a custom memory pinning function (#16743)
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171

From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
>This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.

The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader.  slayton58 suggested a cleaner approach:  allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback.  I've updated the test and docstrings accordingly.

The old PR was merged but then reverted due to weird cuda OOM errors on windows that may or may not have been related.  I have no idea why my changes would cause such errors (then or now) but it's something to keep an eye out for.
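The duck-typed fallback can be sketched like this (simplified; the real `pin_memory_batch` also handles tensors, strings, and numpy arrays):

```python
def pin_memory_batch(batch):
    # Fallback sketch: if a custom batch type defines its own pin_memory()
    # method, call it; otherwise recurse into containers and leave unknown
    # leaves untouched.
    if hasattr(batch, "pin_memory"):
        return batch.pin_memory()
    if isinstance(batch, dict):
        return {k: pin_memory_batch(v) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(pin_memory_batch(b) for b in batch)
    return batch

class CustomBatch:
    # Hypothetical user-defined batch type, as produced by a custom collate_fn.
    def __init__(self):
        self.pinned = False
    def pin_memory(self):
        self.pinned = True
        return self

out = pin_memory_batch([CustomBatch(), 7])
```

Unrecognized leaves pass through unchanged, while custom types get a chance to pin themselves.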

fmassa and yf225 who were my POCs on the old PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743

Differential Revision: D13991745

Pulled By: ezyang

fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
2019-02-10 19:37:53 -08:00
73d7ecd183 Add abs for ByteTensor and CharTensor. (#16893)
Summary:
Fixes #15089
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16893

Differential Revision: D14020115

Pulled By: ezyang

fbshipit-source-id: 6f3be6ed28d2d37667159be45959d400bc473451
2019-02-10 19:31:57 -08:00
eae139e18f Support named tuple return from operators on JIT (#16253)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/16233

The following changes are made:
- Modify `TupleType` to store optional field names
- Modify schema matching to return fill in those field names when creating  `TupleType` as return type.
- Modify codegen of JIT to copy field names to schema string
- Modify `SchemaParser` to set field names of returned schema.
- Modify `SimpleValue::attr` to emit tuple indexing for named tuple.
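The effect on callers can be illustrated with a plain namedtuple (hypothetical op and field names, mirroring how ops such as max-along-a-dimension return values and indices): the result supports both positional indexing and field access.

```python
from collections import namedtuple

# Hypothetical stand-in for an op schema whose return type carries field names.
MaxResult = namedtuple("MaxResult", ["values", "indices"])

def max_with_indices(xs):
    # Toy op returning both the maximum and its position, as a named tuple.
    i = max(range(len(xs)), key=xs.__getitem__)
    return MaxResult(values=xs[i], indices=i)

r = max_with_indices([3, 9, 4])
```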
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16253

Reviewed By: ezyang

Differential Revision: D13954298

Pulled By: zdevito

fbshipit-source-id: 247d483d78a0c9c12d1ba36e1f1ec6c3f1a3007b
2019-02-10 18:15:56 -08:00
9cb41e5386 Enhance the documentation for torch.nn.DataParallel (#15993)
Summary:
I found a few sentences in the DataParallel docstring confusing, so I suggest this enhancement.

- Arbitrary arguments are allowed to be passed .... *INCLUDING* tensors (Not *EXCLUDING*)
- The original author said that "other types" are shallow-copied, but I think actually only some built-in types are (effectively) shallow-copied, and other objects are shared. Here is an example.

```python
import torch
from torch.nn import Module, DataParallel
from collections import deque

class MyModel(Module):
    def forward(self, x):
        x.append(None)

model = MyModel(); model.cuda()
model = DataParallel(model)

d = deque()
model.forward(d)
print(d)
```

This is a side note.

As far as I know, copying objects is not an especially frequent operation in Python, unlike in some other languages. Notably, no copying is involved in assignment or function parameter passing; they are only name bindings, and that is the whole point of the "everything is an object" Python philosophy, I guess. If one keeps this in mind, it may help when dealing with things like multithreading.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15993

Differential Revision: D14020404

Pulled By: ezyang

fbshipit-source-id: a38689c94d0b8f77be70447f34962d3a7cd25e2e
2019-02-10 15:55:31 -08:00
aae6b53c5b DOC: correct docstring for torch and torch.Tensor package (#16842)
Summary:
This PR is a simple fix for the mistake in the "tensor" and "torch.Tensor" docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16842

Differential Revision: D14020300

Pulled By: ezyang

fbshipit-source-id: 3ab04f1223d6e60f8da578d04d759e385d23acbb
2019-02-10 14:37:29 -08:00
6a528007a6 find libnvToolsExt instead of using only hardcoded path (#16714)
Summary:
This changes the libnvToolsExt dependency to go through CMake find_library.

I have a machine where cuda libs, and libnvToolsExt in particular, are in the "usual library locations". It would be neat if we could find libnvToolsExt and use the path currently hardcoded as default.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16714

Differential Revision: D14020315

Pulled By: ezyang

fbshipit-source-id: 00be27be10b1863ca92fd585f273d50bded850f8
2019-02-10 14:01:00 -08:00
8c9df48fd4 Clean up autograd method tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16790

Differential Revision: D14020305

Pulled By: ezyang

fbshipit-source-id: 3aa3362830cde35967a3895837a25b3cf3287569
2019-02-10 13:49:12 -08:00
a1a330bd6e fixed LogSigmoid math string that wasn't rendering in documentation (#16900)
Summary:
The documentation for LogSigmoid says:

> Applies the element-wise function:
> \<blank\>

Now the documentation properly displays the math string.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16900

Differential Revision: D14020097

Pulled By: ezyang

fbshipit-source-id: 41e229d0fcc6b9bb53367be548bf85286dc13546
2019-02-10 11:47:56 -08:00
e0323a6aea ctc_loss error message bug fix. (#16917)
Summary:
The CTCLoss argument error message is wrong.
Please fix this. (Sorry if I made some mistakes.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16917

Differential Revision: D14019983

Pulled By: ezyang

fbshipit-source-id: 3337a2e86da6f3f7594c73fddb73340494a19ce2
2019-02-10 10:49:29 -08:00
202eaa4ef4 Use non-Variable type for callsites that check type equality (#16325)
Summary:
When Variable and Tensor are merged, the dynamic type of the tensors passed to certain functions will become variables, and expecting `type()` on those variables to still return non-Variable types will cause type mismatch error.

One way to fix this problem is to use the thread-local guard `at::AutoNonVariableTypeMode` to force `type()` to return non-Variable type, but ideally we want to limit the use of `at::AutoNonVariableTypeMode` to be only in VariableType.cpp. Another way to fix the problem is to use `at::globalContext().getNonVariableType()` instead to get the non-Variable type of the tensor, which is what this PR is trying to achieve.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16325

Differential Revision: D14012022

Pulled By: yf225

fbshipit-source-id: 77ef1d2a02f78bff0063bdd72596e34046f1e00d
2019-02-10 09:47:50 -08:00
a9f1d2e371 Fix the error in the note about torch.device documentation. (#16839)
Summary:
This PR is a simple fix for the mistake in the first note for `torch.device` in the "tensor attributes" doc.
![image](https://user-images.githubusercontent.com/8536399/52399611-1becaa00-2b00-11e9-85bf-cac04b29842d.png)

```
>>> # You can substitute the torch.device with a string
>>> torch.randn((2,3), 'cuda:1')
```
Above code will cause error like below:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-abdfafb67ab1> in <module>()
----> 1 torch.randn((2,3), 'cuda:1')

TypeError: randn() received an invalid combination of arguments - got (tuple, str), but expected one of:
 * (tuple of ints size, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (tuple of ints size, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
```

Simply adding the argument name `device` solves the problem: `torch.randn((2,3), device='cuda:1')`.

However, another concern is that this note seems redundant as **there is already another note covering this usage**:
![image](https://user-images.githubusercontent.com/8536399/52399583-0ecfbb00-2b00-11e9-914f-e95da4edecd1.png)

So maybe it's better to just remove this note?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16839

Reviewed By: ezyang

Differential Revision: D13989209

Pulled By: gchanan

fbshipit-source-id: ac255d52528da053ebfed18125ee6b857865ccaf
2019-02-09 20:18:58 -08:00
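The corrected call pattern from the note above can be checked directly; a minimal sketch using the CPU device so it runs anywhere (the note's `cuda:1` works the same way on a multi-GPU machine):

```python
import torch

# The device string must be passed as a keyword argument (or wrapped in
# torch.device); a bare positional string triggers the TypeError shown above.
t = torch.randn((2, 3), device="cpu")
d = torch.device("cpu")
u = torch.randn((2, 3), device=d)
assert t.shape == u.shape == torch.Size([2, 3])
```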
19790b218f Register coalescer bug was fixed in ROCm 2.1 (#16923)
Summary:
Remove specialization/workaround for ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16923

Differential Revision: D14018521

Pulled By: bddppq

fbshipit-source-id: d88162740bca6dc8ad37397dfbf8c84408074a00
2019-02-09 11:27:50 -08:00
d72c5d5a49 Alignas is now correctly handled on ROCm (#16920)
Summary:
Post 2.1 release, packing is fixed and alignas works as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16920

Differential Revision: D14018539

Pulled By: bddppq

fbshipit-source-id: 0ed4d9e9f36afb9b970812c3870082fd7f905455
2019-02-09 11:27:48 -08:00
5089ee9677 Enable buildin bitonic sort (#16919)
Summary:
It now works post ROCm 2.1 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16919

Differential Revision: D14018538

Pulled By: bddppq

fbshipit-source-id: c4e1bafb53204a6d718b2d5054647d5715f23243
2019-02-09 11:27:47 -08:00
f169f398d0 Change the default image size from 227 to 224 in resnet50 trainer (#16924)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16924

Differential Revision: D14018509

Pulled By: bddppq

fbshipit-source-id: fdbc9e94816ce6e4b1ca6f7261007bda7b80e1e5
2019-02-09 11:18:58 -08:00
23e1c55cc0 enable unit tests working on ROCm 2.1 (#16871)
Summary:
This is the first round of enabling unit tests that work on ROCm 2.1 in my tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16871

Differential Revision: D13997662

Pulled By: bddppq

fbshipit-source-id: d909a3f7dd5fc8f85f126bf0613751c8e4ef949f
2019-02-09 00:30:50 -08:00
fc4f33b08f Add suggest add to __constants__ message on save fail
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16850

Differential Revision: D14014735

Pulled By: eellison

fbshipit-source-id: 7b6d5d5b64b9b107743cea1548cb4ee1b653977e
2019-02-08 19:40:11 -08:00
6737190b5c Make the exception raised from "numpy.dtype(numpy.void, (INT,))" less cryptic (#16809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16809

https://fb.facebook.com/groups/582508038765902/permalink/736710343345670/?comment_id=824042307945806&reply_comment_id=824318864584817

numpy.dtype(numpy.void, (<INT>, )) raises a cryptic message "invalid itemsize in generic type tuple" that is hard to debug.

This diff adds the message to ask the user to investigate the error causing blob.

Reviewed By: kennyhorror

Differential Revision: D13973359

fbshipit-source-id: 43a0c492ffafbabdfd7f7541c08a258e5ac0280f
2019-02-08 16:46:50 -08:00
12bace141b Revert D13970381: [caffe2] Add visibility to registry class to fix ubsan error
Differential Revision:
D13970381

Original commit changeset: 763db24b8a98

fbshipit-source-id: dda8672ed0bc6fecc4dde5ce73feb99e15205978
2019-02-08 16:21:10 -08:00
0799a81cb7 Extend Net.RunAllOnGPU() to support RecurrentNetwork op (#15713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15713

[caffe2] Extend Net.RunAllOnGPU() to support RecurrentNetwork op

Reviewed By: dzhulgakov

Differential Revision: D13576507

fbshipit-source-id: f517127492c9d516ece663d42fef84338c70344e
2019-02-08 15:48:42 -08:00
48fe839d56 delete critical section in TH*Tensor_addmm (#16889)
Summary:
This was serializing all calls to `addmm` (and any op that used it, in my case `bmm`) in the entire process, and led to downright atrocious performance in the TorchScript threaded runtime. Removing this gives a 2x throughput boost for high-load machine translation inference.

The original justification for this is dubious: there are other `gemm` callsites in the codebase that are not protected by critical sections, and in caffe2 land we never had any issues with non-reentrant BLAS libraries.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16889

Differential Revision: D14008928

Pulled By: jamesr66a

fbshipit-source-id: 498e2133bd6564dba539a2d9751f4e61afbce608
2019-02-08 14:49:01 -08:00
f83556bb7b Revert D13806753: [pytorch][PR] TensorIterator cuda launch configs update
Differential Revision:
D13806753

Original commit changeset: 37e45c7767b5

fbshipit-source-id: 74ac9f54f86853287b372ccf21fb37ed0e04a5d3
2019-02-08 12:44:42 -08:00
cd2dca3caf Allow sequential modules in module list (#16882)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/16845
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16882

Differential Revision: D14007746

Pulled By: eellison

fbshipit-source-id: d7918275cc1de6a67320619c3203463f66783343
2019-02-08 12:32:11 -08:00
5ada54e0bc Impl ExpandDims op and fallback to CPU if needed (#15264)
Summary:
Impl ExpandDims op and fallback to CPU if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15264

Differential Revision: D13808797

Pulled By: yinghai

fbshipit-source-id: 7795ec303a46e85f84e5490273db0ec76e8b9374
2019-02-08 12:04:53 -08:00
54c981d9a9 Add visibility to registry class to fix ubsan error (#16792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16792

fix

Reviewed By: ezyang

Differential Revision: D13970381

fbshipit-source-id: 763db24b8a98a2757a63b77c70c8c68ba47f31e6
2019-02-08 10:17:47 -08:00
b9b0be7af2 Remove Legacy entry point. (#16721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16721

The key line is that we have to set the stream to the default
stream before calling the allocator. This is very interesting:
it shouldn't be necessary, but it seemingly is!

Reviewed By: dzhulgakov

Differential Revision: D13943193

fbshipit-source-id: c21014917d9fe504fab0ad8abbc025787f559287
2019-02-08 09:33:58 -08:00
b3fbd3eebf Deduplicate instances caching allocator, so that we only have one instance. (#16720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16720

I'm taking the deduplication slowly because there is something here
that is causing problems, and I want to figure out what it is.

Reviewed By: dzhulgakov

Differential Revision: D13943194

fbshipit-source-id: cbc08fee5862fdcb393b9dd5b1d2ac7250f77c4b
2019-02-08 09:33:56 -08:00
5c982622b0 Delete duplicate copy of THCCachingAllocator (round two). (#16615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16615

This is another go at landing https://github.com/pytorch/pytorch/pull/16226
Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.

The difference between this and the previous PR is that this
version faithfully maintains the binding code; in particular,
we end up with a SECOND copy of the caching allocator in
this patch.  I verified that this code does NOT cause a crash
in the workflow we canaried last time.

In further diffs, I plan to eliminate the second copy, and then
adjust the binding code.

Reviewed By: dzhulgakov

Differential Revision: D13901067

fbshipit-source-id: 66331fd4eadffd0a5defb3cea532d5cd07287872
2019-02-08 09:33:55 -08:00
f03296299b Bump caffe2 docker images to 248 (#16863)
Summary:
Jenkins jobs update will be separate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16863

Differential Revision: D13994672

Pulled By: bddppq

fbshipit-source-id: 5b27879dc6ac11a42016fe7835e9124345005ebb
2019-02-08 00:40:04 -08:00
6ce147c021 Also register op schema when no kernels are registered
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16878

Reviewed By: bwasti

Differential Revision: D13997959

fbshipit-source-id: 7527a560b03f672f76e95d4f22ae28ce24698cc1
2019-02-07 20:53:21 -08:00
2c713032a1 Don't automatically handle context parameter (#16867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16867

Some caffe2 operators (for example, BBoxTransform) don't have just one template parameter (the context), but might have multiple template parameters.
Because of this, we can't handle the context parameter inside the macro.

Reviewed By: bwasti

Differential Revision: D13995696

fbshipit-source-id: f55c3be913c8b125445a8d486846fc2fab587a63
2019-02-07 20:53:17 -08:00
fe5989d466 Support onnxifi with partially shaped inferred net (#16877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16877

That's it.

Reviewed By: ipiszy

Differential Revision: D13997771

fbshipit-source-id: f512c7f30b4a4747aca335a0769712c2a2cc2206
2019-02-07 20:44:39 -08:00
7ce33c586d Robust determination of cudnn library and relevant conda packages. (#16859)
Summary:
This PR implements:
1. a fix to issue #12174 - determine the location of the cudnn library using `ldconfig`
2. a fix to determine the installed conda packages (in recent versions of conda, the command `conda` is a Bash function that cannot be called within a Python script, so we use the CONDA_EXE environment variable instead)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16859

Differential Revision: D14000399

Pulled By: soumith

fbshipit-source-id: 905658ecacb0ca0587a162fade436de9582d32ab
2019-02-07 20:34:46 -08:00
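The `ldconfig`-based detection in point 1 above can be sketched by parsing `ldconfig -p` output; here a hardcoded sample stands in for the real command, and the paths are illustrative, not taken from the actual fix:

```python
# Hedged sketch of the ldconfig-based cudnn lookup described above,
# run against a hardcoded sample of `ldconfig -p` output.
sample = (
    "\tlibcudnn.so.7 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn.so.7\n"
    "\tlibc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6\n"
)

def find_lib(output, name):
    # Return the resolved path of the first matching shared-library line.
    for line in output.splitlines():
        if name in line and "=>" in line:
            return line.split("=>")[-1].strip()
    return None

print(find_lib(sample, "libcudnn"))  # /usr/lib/x86_64-linux-gnu/libcudnn.so.7
```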
930ed00b33 Specialize LengthsRangeFill and SparseLengthsWeightedSum in bound shape inference (#16869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16869

TSIA.

Reviewed By: ipiszy, rdzhabarov

Differential Revision: D13994946

fbshipit-source-id: 7e507abc5a3c2834c92910e521387085c56e8b2e
2019-02-07 20:18:15 -08:00
b5111918cd Activation histogram net observer with multiple histogram files as output (#16855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16855

Save the histogram of each net to a separate file

Reviewed By: jspark1105

Differential Revision: D13991610

fbshipit-source-id: a5be4e37a5e63567dcd7fdf99f451ee31bb350a5
2019-02-07 19:51:30 -08:00
ee0e71bee7 Allow dicts in C++ frontend (#16846)
Summary:
Fixes #16856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16846

Differential Revision: D13991103

Pulled By: driazati

fbshipit-source-id: 4830dd6f707fa90429b5d3070eeda0bee53d2f2b
2019-02-07 18:44:49 -08:00
2db847b3a7 Separate elementwise level2 math functions (#16753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16753

Separate elementwise level2 math functions

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13954928

fbshipit-source-id: 1ca7a5d3da96e32510f502e5e4e79168854bee67
2019-02-07 18:38:26 -08:00
22477c6a7f Fix (#2) ppc64le build break on git status --porcelain check (#16852)
Summary:
Add test/.hypothesis/ to .gitignore to pass git status --porcelain check in CI build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16852

Differential Revision: D14000206

Pulled By: soumith

fbshipit-source-id: 5da99a4bb242c12aa35776f7254f6399a7fa6d8c
2019-02-07 18:29:37 -08:00
96369506c4 doc updates for TorchScript (#16866)
Summary:
Some batched updates:
1. bool is a type now
2. Early returns are allowed now
3. The beginning of an FAQ section with some guidance on the best way to do GPU training + CPU inference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16866

Differential Revision: D13996729

Pulled By: suo

fbshipit-source-id: 3b884fd3a4c9632c9697d8f1a5a0e768fc918916
2019-02-07 18:03:57 -08:00
67bb7b2931 Fix autodiff of nll_loss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16851

Differential Revision: D13995046

Pulled By: wanchaol

fbshipit-source-id: 557c99f1d1825fa9b6031dd9fa8ba9b54205e8c4
2019-02-07 17:42:01 -08:00
c35f3ae89f aten::_convolution now participates in shape analysis (#16837)
Summary:
During tracing, we record `aten::_convolution` rather than `aten::convolution`. The schema for the former was not present in the shape analysis pass, and resulted in some missing shape information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16837

Differential Revision: D13993831

Pulled By: jamesr66a

fbshipit-source-id: ebb63bf628d81613258caf773a3af5930303ce5a
2019-02-07 17:26:11 -08:00
c65b03b9f8 Enable arg_ops_test/unique_ops_test on AMD/rocm (#16853)
Summary:
Verified both tests are passing on rocm 2.1 env.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16853

Differential Revision: D13996279

Pulled By: bddppq

fbshipit-source-id: c0df610d7d9ca8d80ed2d1339cdadef59105a71c
2019-02-07 16:51:15 -08:00
bca358ad02 Update CI to recently released ROCm 2.1 release (#16808)
Summary:
* we do not need EAP packages any longer as the antistatic feature is now in the release
* consistently install the rccl package
* Skip one unit test that has regressed with 2.1
* Follow-up PRs will use 2.1 features once deployed on CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16808

Differential Revision: D13992645

Pulled By: bddppq

fbshipit-source-id: 37ca9a1f104bb140bd2b56d403e32f04c4fbf4f0
2019-02-07 15:12:18 -08:00
0f42a1ed29 Use bound shape inference in SparseNN tests (#16834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16834

Inserting AdjustBatch ops will possibly change the names of the input/output, so we need to create a mapping and use the renamed names for external_inputs/outputs and input_shape_info for the onnxifi_net.

Reviewed By: ipiszy

Differential Revision: D13982731

fbshipit-source-id: c18b8a03d01490162929b2ca30c182d166001626
2019-02-07 14:51:32 -08:00
66084c0bc9 Add recognition for XLA device types.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16844

Differential Revision: D13988805

Pulled By: gchanan

fbshipit-source-id: 4e89d6d2cde8bdac41739efa65cc91569a360953
2019-02-07 14:51:28 -08:00
64339dbd51 Fix and re-enable test case (#16643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16643

The test was disabled in D13908117 because it conflicted with another diff that was about to land.
Now that the merge conflict is fixed, we are re-landing it.

Reviewed By: ezyang

Differential Revision: D13911775

fbshipit-source-id: b790f1c3a3f207916eea41ac93bc104d011f629b
2019-02-07 13:58:16 -08:00
6750e1e3e9 C10_REGISTER_CAFFE2_OPERATOR: Macro for registering c2 kernels (#16548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16548

With this macro, a caffe2 operator can now directly be registered with c10.
No need to write custom wrapper kernels anymore.

Differential Revision: D13877076

fbshipit-source-id: e56846238c5bb4b1989b79855fd44d5ecf089c9c
2019-02-07 13:58:14 -08:00
ac4f66c9c3 Fix Anaconda logins on binary builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16848

Differential Revision: D13993614

Pulled By: pjh5

fbshipit-source-id: 16854b06d01460b78d9dbe7bd0341b7332984795
2019-02-07 13:44:52 -08:00
4193f7a106 new embedding label type in image input op (#16835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16835

We were using the label type `multi_label_dense` to denote both 1) a dense representation of integer labels and 2) embedding labels of floating-point type.

This causes some issues, as the two cases have different assumptions: for integer labels, we check whether the label value is in [0, number_class - 1], but such a check should be skipped for embedding labels.

Reviewed By: BIT-silence

Differential Revision: D13985048

fbshipit-source-id: 1202cdfeea806eb47647e3f4a1ed9c104f72ad2c
2019-02-07 13:35:59 -08:00
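The distinction the commit above describes can be sketched as a small validation helper (the function and label-type names are hypothetical, not the op's actual API): integer labels get a class-range check, embedding labels skip it.

```python
def check_labels(labels, num_classes, label_type):
    # Hypothetical sketch of the check described above: integer labels must
    # lie in [0, num_classes - 1]; embedding labels are floating-point
    # values, so the class-range check must be skipped for them.
    if label_type == "multi_label_dense":
        assert all(0 <= l < num_classes for l in labels), "label out of range"
    elif label_type == "embedding_label":
        pass  # float embedding values carry no class-range constraint
    return labels

check_labels([0, 3, 2], num_classes=4, label_type="multi_label_dense")
check_labels([0.1, -0.7], num_classes=4, label_type="embedding_label")
```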
b6648c1bbc Update ATen internals to use int64_t for dimension indexing (#16739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16739

Some ATen code locations seemed to use int, etc. incorrectly where either
int64_t or size_t was required. Update them to use int64_t for dimension indexing where necessary.

Reviewed By: ezyang

Differential Revision: D13950124

fbshipit-source-id: aaf1cef783bf3c657aa03490f2616c35c816679f
2019-02-07 13:15:42 -08:00
1aa90192ea Make JIT attributes t_ and ts_ store Variable instead of Tensor (#16596)
Summary:
Discussed with zdevito and we want to use Variable (with `set_requires_grad(false)`) instead of Tensor in all parts of JIT, to eliminate the distinction and the conceptual overhead when trying to figure out which one to use.

This also helps with the Variable/Tensor merge work tracked at https://github.com/pytorch/pytorch/issues/13638, which will make common functions (such as `numel()` / `sizes()` / `dim()`) on Variable much faster when finished.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16596

Differential Revision: D13979971

Pulled By: yf225

fbshipit-source-id: c69119deec5bce0c22809081115f1012fdbb7d5a
2019-02-07 12:34:00 -08:00
44d98c30a3 Better error when using a constant tensor (#16724)
Summary:
Fixes #16284
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16724

Differential Revision: D13990531

Pulled By: driazati

fbshipit-source-id: adbf47a07eddb3813fbe1322944abfe5fcff89fa
2019-02-07 12:28:28 -08:00
72f070a124 Backport the stable doc build on v1.0.1 to master (#16503)
Summary:
List of changes:
- Always push the final state of the doc build docker for debugging purposes.
- Adds code for the stable doc build. This code is never actually run on master, only the v1.0.1 branch. There is a big note for this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16503

Differential Revision: D13972469

Pulled By: zou3519

fbshipit-source-id: 68f459650ef0de200a34edd43fc1372143923972
2019-02-07 11:41:07 -08:00
ac00e85e36 Remove undefined tensor in jit script (#16379)
Summary:
This PR is a follow up of #15460, it did the following things:

* remove the undefined tensor semantic in jit script/tracing mode
* change ATen/JIT schema for at::index and other index related ops with `Tensor?[]` to align with what at::index is really doing and to adopt `optional[tensor]` in JIT
* change python_print to correctly print the exported script
* register both TensorList and ListOfOptionalTensor in JIT ATen ops to support both
* Backward compatibility for `torch.jit.annotate(Tensor, None)`

List of follow ups:

* remove the undefined tensor semantic in jit autograd, autodiff and grad_of
* remove prim::Undefined fully

For easy reviews, please turn on `hide white space changes` in diff settings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16379

Differential Revision: D13855677

Pulled By: wanchaol

fbshipit-source-id: 0e21c14d7de250c62731227c81bfbfb7b7da20ab
2019-02-07 11:02:14 -08:00
0d366e1bde Support multiple inheritance in torch.distributions (#16772)
Summary:
This adds calls to `super().__init__()` in three classes in torch.distributions.

This is needed when `Distribution` and `Transform` objects are used with multiple inheritance, as e.g. combined with `torch.nn.Module`s. For example
```py
class MyModule(torch.distributions.Transform, torch.nn.Module):
    ...
```
cc  martinjankowiak esling who have wanted to use this pattern, e.g. in #16756
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16772

Differential Revision: D13978633

Pulled By: soumith

fbshipit-source-id: 8bc6cca1747cd74d32135ee2fe588bba2ea796f1
2019-02-07 01:37:57 -08:00
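The reason the added `super().__init__()` calls matter is Python's cooperative multiple inheritance: without them, the MRO chain stops early and the second base's initializer never runs. A torch-free sketch of the pattern (class names are stand-ins for `Transform` and `nn.Module`):

```python
class Transform:
    def __init__(self):
        super().__init__()          # cooperative: continues down the MRO
        self.transform_ready = True

class Module:
    def __init__(self):
        super().__init__()
        self.module_ready = True

class MyModule(Transform, Module):  # mirrors the Transform + nn.Module mixin
    pass

m = MyModule()
# Because Transform.__init__ calls super().__init__(), Module.__init__ also ran.
assert m.transform_ready and m.module_ready
```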
2681af1c8a Remove redundant wrappers in torch.distributions (#16807)
Summary:
Changelog:
- Remove torch.distributions.multivariate_normal._batch_diag : same functionality is provided by torch.diagonal
- Remove torch.distributions.lowrank_multivariate_normal._batch_vector_diag : same functionality is provided by torch.diag_embed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16807

Differential Revision: D13985550

Pulled By: soumith

fbshipit-source-id: 25c7d00c52ff7f85e431134e9ce0d5dda453667b
2019-02-07 01:13:55 -08:00
511f6fc2d5 Insert AdjustBatchSizeOp into the predict_net. (#16811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16811

As the title. The AdjustBatch ops will be inserted before and after the Onnxifi op to:
1) adjust batch/seq sizes to the ideal batch/seq size before these tensors are processed by the Onnxifi op;
2) adjust batch size to the original batch size for batches generated by the Onnxifi op.

Reviewed By: yinghai

Differential Revision: D13967711

fbshipit-source-id: 471b25ae6a60bf5b7ebee1de6449e0389b6cafff
2019-02-07 00:40:11 -08:00
aa88c2c0b6 Unify gpu_support variable in python tests (#16748)
Summary:
Assign `has_gpu_support = has_cuda_support or has_hip_support` and make according changes in python tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16748

Differential Revision: D13983132

Pulled By: bddppq

fbshipit-source-id: ca496fd8c6ae3549b736bebd3ace7fa20a6dad7f
2019-02-07 00:29:51 -08:00
85ac272670 Update Docker file section in README.md (#16812)
Summary:
Emphasize on the fact that docker build should be triggered from pytorch repo directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16812

Differential Revision: D13985531

Pulled By: soumith

fbshipit-source-id: c6511d1e81476eb795b37fb0ad23e8951dbca617
2019-02-06 23:53:50 -08:00
Jie
49443d49fb TensorIterator cuda launch configs update (#16224)
Summary:
Update launch configs for TensorIterator gpu_reduce_kernel. Enable flexible
block dimension to improve efficiency for reduction cases with small fast
dimension.

Previously, TensorIterator launched blocks with fixed 32x16 threads.
For cases like:

  import torch
  torch.randn(2**20, 4, device='cuda').sum(0)

The fixed launch config does not handle coalesced memory access efficiently.

The updated launch config enables flexible block dimensions. Combined with
an improved reduction scheme (using flexible vertical/horizontal reduction
instead of the limited warp/block reduction in the old code), it ensures an
optimal memory access pattern even when reducing over a dimension with a small stride.

Possible future improvements:
1. Precise dynamic shared memory allocation.
2. Using warp shuffle for vertical (block_y) reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16224

Differential Revision: D13806753

Pulled By: soumith

fbshipit-source-id: 37e45c7767b5748cf9ecf894fad306e040e2f79f
2019-02-06 23:10:41 -08:00
b2135b2b72 Define layer_norm schema in caffe2 (#16535)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16535

There is now no need anymore to define the layer norm schema in a central location.
It can just be defined in caffe2 next to the kernel implementation.

Reviewed By: ezyang

Differential Revision: D13869503

fbshipit-source-id: c478153f8fd712ff6d507c794500286eb3583149
2019-02-06 21:21:34 -08:00
16468a9f45 Automatically register c10 ops with JIT (#16534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16534

All c10 ops from the c10 dispatcher are now automatically registered with JIT

Reviewed By: dzhulgakov

Differential Revision: D13869275

fbshipit-source-id: 5ab5dec5b983fe661f977f9d29d8036768cdcab6
2019-02-06 21:21:33 -08:00
e5e0bf4152 Add AdjustBatch Op (#16676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16676

This op is used for changing batch size (first dimension) of the tensor.

Reviewed By: bertmaher, ipiszy

Differential Revision: D13929200

fbshipit-source-id: 4f2c3faec072d468be8301bf00c80d33adb3b5b3
2019-02-06 19:15:41 -08:00
100aa0798e Bring back running pytorch tests in rocm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16829

Differential Revision: D13982323

Pulled By: bddppq

fbshipit-source-id: 6ffadb96b9e2ebd64a29e38674a51401dfb211db
2019-02-06 17:58:48 -08:00
f34192db0f Rename DynamicType -> TensorType (#16787)
Summary:
```
import json
from subprocess import check_call
from pprint import pprint
renames = {
    'c10::TensorType': 'DimentionedTensorType',
    'c10::DynamicType': 'TensorType',
    'c10::TensorTypePtr': 'DimentionedTensorTypePtr',
    'c10::DynamicTypePtr': 'TensorTypePtr',
    'c10::TypeKind::DynamicType': 'TensorType',
    'c10::TypeKind::TensorType': 'DimentionedTensorType',
}

entries = json.loads(open('compile_commands.json', 'r').read())

build = None
sources = []

for e in entries:
    name = e['file']
    if not ('jit' in name or 'ATen/core' in name):
        continue
    build = e['directory']
    sources.append(name)

args = ['clang-rename', '-i', '-force', '-pl']
for name in sorted(renames.keys()):
    args += ['-qualified-name={}'.format(name), '-new-name={}'.format(renames[name])]

for source in sources:
    cmd = args + [source]
    pprint(args)
    check_call(cmd, cwd=build)
    check_call(['git', 'stash', 'push', '-m', 'rename'])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16787

Differential Revision: D13974132

Pulled By: zdevito

fbshipit-source-id: 8368fd53e17cff83707bbe77f2d7aad74f8ce60e
2019-02-06 17:31:07 -08:00
1b919ca93e Use bound shape inference in onnxifi transform (#16598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16598

ATT.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D13893698

fbshipit-source-id: 8d2ad9814fe76924a46b450eb7ebd3601fbdbbc7
2019-02-06 16:34:37 -08:00
717ae09184 improve error message (#16719)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16712
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16719

Differential Revision: D13978688

Pulled By: ezyang

fbshipit-source-id: 61f8fa4c54c6969a58550e32e18be2eb9254ced7
2019-02-06 15:51:58 -08:00
8105aaca86 int8 SpatialBN (#16796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16796

SpatialBN int8 version

Reviewed By: dskhudia

Differential Revision: D13971224

fbshipit-source-id: e55fd608c161069daaa4e62c618bc14b01f32cb7
2019-02-06 15:32:01 -08:00
30ab1773f9 call istringstream clear after str (#16820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16820

Sometimes parsing histogram was not working correctly due to changes in D13633256
We need to call istringstream clear after str

Reviewed By: csummersea

Differential Revision: D13977509

fbshipit-source-id: ce3e8cb390641d8f0b5c9a7d6d6daadffeddbe11
2019-02-06 15:23:08 -08:00
ea35d8e40a Replace resize_dim() with set_sizes_and_strides() (#16732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16732

Use set_sizes_and_strides instead of resize_dim.

Reviewed By: ezyang

Differential Revision: D13947867

fbshipit-source-id: 067b096b1fde14b039690992a5fe3ace386b2789
2019-02-06 14:50:52 -08:00
929cd23da1 no EIGEN engine for DeformConv (#16785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16785

There's no EIGEN engine implemented for DeformConv but unit test was checking it.

Reviewed By: BIT-silence

Differential Revision: D13967306

fbshipit-source-id: e29c19f59f5700fc0501c59f45d60443b87ffedc
2019-02-06 11:59:31 -08:00
8d4b2db529 format deform_conv_test.py (#16786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16786

Format to prepare D13967306

Reviewed By: BIT-silence

Differential Revision: D13967317

fbshipit-source-id: 2de895f8474b04c55ba067fbf788c553dc010c60
2019-02-06 11:59:29 -08:00
1a13dedf98 Fix/Improve bound shape inference with real net tests (#16597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16597

This diff fixes some bugs in shape inference for `SparseLengthsSumFused8BitRowwise`. And added input shape inference for `Concat` when `add_axis=1`.

Reviewed By: bertmaher

Differential Revision: D13892452

fbshipit-source-id: 6cd95697a6fabe6d78a5ce3cb749a3a1e51c68e7
2019-02-06 10:41:07 -08:00
34cfbb0040 Typofix (#16800)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16800

Differential Revision: D13972592

Pulled By: ezyang

fbshipit-source-id: 45c352ac6090c8060bf75f44dec7205556986d88
2019-02-06 10:34:04 -08:00
30a6feda84 caffe2 | MSVS compatibility fixes (#16765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16765

Code changes required to build caffe2 for windows with toolchain used by FB.

Reviewed By: orionr

Differential Revision: D13953258

fbshipit-source-id: 651823ec9d81ac70e32d4cce5bc2472434104733
2019-02-06 09:47:01 -08:00
887080e92a Fallback sum/add to CPU if needed (#15267)
Summary:
Fallback sum/add to CPU if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15267

Differential Revision: D13935064

Pulled By: yinghai

fbshipit-source-id: eb228683d00a0462a1970f849d35365bc98340d6
2019-02-06 09:35:14 -08:00
39eab01b61 Automatic update of fbcode/onnx to 822d8df0a2a32233c6022f50a158817a0f19bdc7 (#16791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16791

Previous import was bfa8b335ab6d1ed7b688dc2ec96421a3fe9e644c

Included changes:
- **[822d8df](https://github.com/onnx/onnx/commit/822d8df)**: allow removed experimental ops in the checker for now (#1792) <Lu Fang>

Reviewed By: MisterTea

Differential Revision: D13970103

fbshipit-source-id: 5feaaa6c65ba10901eeba0b63724d7e451e9b642
2019-02-06 09:21:41 -08:00
f2e0d64775 Adding torch/lib64 in .gitignore for ppc64le CI build to pass (#16782)
Summary:
Adding torch/lib64 to .gitignore so that the git status --porcelain
check during CI build and test passes for ppc64le. During the build,
torch/lib64 is created and contains third-party libraries; this
should be ignored by the porcelain check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16782

Differential Revision: D13972794

Pulled By: ezyang

fbshipit-source-id: 5459c524eca42d396ac46e756a327980b4b1fa53
2019-02-06 09:05:49 -08:00
a3f600e394 Revert D13854304: [redo][c10] LayerNorm Registration Example
Differential Revision:
D13854304

Original commit changeset: ec463ce22721

fbshipit-source-id: 4262b9a2ef486e1c7c0283ea021331ac97cc5f56
2019-02-06 08:26:23 -08:00
fc0e88dd77 Revert D13855525: [c10] Expose RoIAlign to torch
Differential Revision:
D13855525

Original commit changeset: cfee7bb1544d

fbshipit-source-id: 0b4124b78c4082b52e592a1275069c879a9aed39
2019-02-06 08:26:22 -08:00
33a6a7a3ea Revert D13856086: [c10] Expose GenerateProposals to torch
Differential Revision:
D13856086

Original commit changeset: a4873646a71a

fbshipit-source-id: 79b634426404236ddbc407d3796a350ad3dae5ca
2019-02-06 08:26:20 -08:00
018485130f Revert D13864292: [c10] Expose BBoxTransform to pytorch
Differential Revision:
D13864292

Original commit changeset: 1f57664e7834

fbshipit-source-id: 37663b7e8213185ecaa5c219076fc7de64704549
2019-02-06 08:26:18 -08:00
c0a7bf94ed Revert D13865221: [c10] Expose BoxWithNMSLimit
Differential Revision:
D13865221

Original commit changeset: 8a3f1d420183

fbshipit-source-id: 0057be9619b660dcad8c01bae67b54400127577e
2019-02-06 08:26:17 -08:00
cda43336d4 Revert D13866214: [c10] Expose HeatmapMaxKeypoints to torch
Differential Revision:
D13866214

Original commit changeset: 2ca79037fc07

fbshipit-source-id: d2c653f4f32cf0ea76875888f3523c0dc7db9960
2019-02-06 08:26:16 -08:00
d327965dac Fix pip list format in collect_env (#16798)
Summary:
Since pip 18.0 (2018-07-22), `legacy` is no longer a valid choice for `pip list --format` as can be seen in the [Release Notes](https://pip.pypa.io/en/stable/news/#id62). Therefore, the options now are: `columns`, `freeze` and `json`. With `legacy`, this is how it looked like:

```
[...]
Versions of relevant libraries:
[pip3] numpy (1.16.1)
[pip3] torch (1.0.1)
[pip3] torchvision (0.2.1)
[...]
```

Changing to `freeze`, this is how it looks like:

```
[...]
Versions of relevant libraries:
[pip3] numpy==1.16.1
[pip3] torch==1.0.1
[pip3] torchvision==0.2.1
[...]
```

Currently, this is what happens:

```
[...]
Versions of relevant libraries:
[pip] Could not collect
[...]
```
The `freeze` option is also available in old pip, so this change is backwards compatible. Also, if we would like to keep the old style, which I think is not necessary, I could easily change that.

 ---

In case anyone wants to know how `columns` looks like (I prefer `freeze`):

```
[...]
Versions of relevant libraries:
[pip3] numpy               1.16.1
[pip3] torch               1.0.1
[pip3] torchvision         0.2.1
[...]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16798

Differential Revision: D13971793

Pulled By: soumith

fbshipit-source-id: 3721d9079a2afa245e1185f725598901185ea4cd
2019-02-06 07:48:08 -08:00
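One reason `freeze` is a good choice here is that its `name==version` lines are trivial to consume programmatically. A minimal sketch of such parsing (a hypothetical helper, not the actual `collect_env` code):

```python
def parse_freeze(output):
    """Parse `pip list --format=freeze` output into a {name: version} dict."""
    versions = {}
    for line in output.splitlines():
        if "==" in line:
            name, version = line.split("==", 1)
            versions[name] = version
    return versions

sample = "numpy==1.16.1\ntorch==1.0.1\ntorchvision==0.2.1"
print(parse_freeze(sample)["torch"])  # -> 1.0.1
```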
d1b2ab83fc disable default system-wide detection of gflags, glog, opencv, lmdb, leveldb (#16789)
Summary:
They can instead be enabled by the env flags USE_* (as always).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16789

Differential Revision: D13971789

Pulled By: soumith

fbshipit-source-id: d5eac9be677114be3fb15b43080faa0efdfff8ee
2019-02-06 05:13:47 -08:00
255136fc1d fix BUILD_CAFFE2_OPS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16783

Differential Revision: D13965061

Pulled By: zdevito

fbshipit-source-id: 6fe710ca51e2f338873b56f23256668ca3fe2032
2019-02-05 22:39:51 -08:00
ab035d01e3 Remove unnecessary typing import. (#16777)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16777

Differential Revision: D13969679

Pulled By: ezyang

fbshipit-source-id: d4728797a5927ae32628621c654eadb93c0e7682
2019-02-05 21:12:35 -08:00
43f4c86238 Fix alias analysis for fork/wait (#16671)
Summary:
(review top commit only).

As expected, fork/wait introduces some corner cases into the alias analysis. The comments inline should describe the changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16671

Differential Revision: D13963219

Pulled By: suo

fbshipit-source-id: 2bec6fc03a4989cf309fbb9473f3f2ffe2c31431
2019-02-05 20:43:30 -08:00
c1dff549da changes to apply xla patch (#16781)
Summary:
This PR will let the xla tests pass after https://github.com/pytorch/xla/pull/183 is in.

Will add back the branch filters once it's ready.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16781

Differential Revision: D13968976

Pulled By: ailzhang

fbshipit-source-id: df3b173336b3247aa56ef723569a1f68cdfa56e0
2019-02-05 19:03:05 -08:00
db5a3c274d Tensor construction codemod (#16568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16568

In caffe2/caffe2/operators and caffe2/caffe2/fb/operators
(Resize + mutable_data) and (ResizeLike + mutable_data)
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13863416

fbshipit-source-id: 90ad3971850b89bf4b2ac81e9fa59d3bc43dc1c9
2019-02-05 18:51:02 -08:00
18edd3ab08 Warn when tracing legacy constructors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16770

Differential Revision: D13963581

Pulled By: driazati

fbshipit-source-id: 8f8cdfc455ba65be370fd952fc5e5c233525d002
2019-02-05 18:32:59 -08:00
7bf7a4162d Use torch.zeros for nn.LSTM
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16779

Differential Revision: D13963577

Pulled By: driazati

fbshipit-source-id: dc9edc3d2096760737ecbe4b3dd441ed2d53f4ad
2019-02-05 17:57:51 -08:00
c5c831953b Set SCCACHE_IDLE_TIMEOUT=1200 (#16728)
Summary:
Doubling the sccache timeout from default of 600.

the asan build of #16645 will fail without this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16728

Differential Revision: D13963727

Pulled By: li-roy

fbshipit-source-id: 3614d75c1b46d663fa05b84f99d8a099283a8e64
2019-02-05 15:28:35 -08:00
448e0d78e9 Document hip-clang and its __HIP__ macro (#16771)
Summary:
In #16085 , we introduced initial hip-clang bring-up code. Document the use of the __HIP__ macro now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16771

Differential Revision: D13961538

Pulled By: ezyang

fbshipit-source-id: 67f6226abcbe62e2f4efc291c84652199c464ca6
2019-02-05 15:13:52 -08:00
4404762d7d Rename IntList to IntArrayRef. (#16751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751

This was made more complicated by the fact that ivalue::IntList
is a thing.  So I had to fix all of the sites where we were referring
to IValue post facto.

The following codemods were run, in this order:

```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```

Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752

Reviewed By: dzhulgakov

Differential Revision: D13954363

fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
2019-02-05 14:54:34 -08:00
e2d3a3fd6a dict values(), keys(), and len() (#16629)
Summary:
Adds some operations for dicts to match Python and tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16629

Differential Revision: D13961144

Pulled By: driazati

fbshipit-source-id: b31f27a4320ff62cd118b508fb0a13056535dc7c
2019-02-05 13:55:25 -08:00
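For reference, these are the plain-Python dict semantics the new TorchScript ops are matching (standard Python, not the JIT implementation):

```python
d = {"a": 1, "b": 2}

# keys(), values(), and len() behave as in ordinary Python code;
# sorted() is used here only to make the output order deterministic.
print(sorted(d.keys()))    # -> ['a', 'b']
print(sorted(d.values()))  # -> [1, 2]
print(len(d))              # -> 2
```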
0ceef3c9f6 Automatic update of fbcode/onnx to bfa8b335ab6d1ed7b688dc2ec96421a3fe9e644c (#16767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16767

Previous import was 875f7bbe537b9d6931d065977c192eaaf61e1179

Included changes:
- **[bfa8b33](https://github.com/onnx/onnx/commit/bfa8b33)**: [ONNXIFI]Add extension of onnxSetIOAndRunGraph (#1781) <Rui Zhu>

Reviewed By: zrphercule

Differential Revision: D13959349

fbshipit-source-id: 4876d00a3f7033cf9d89554f8b4789acd6881f72
2019-02-05 13:17:35 -08:00
0f7a0f8c83 Fix commit races on binary CI on master PR-merges (#16773)
Summary:
There is no way to test this until it is merged.

On master jobs that run after a PR is merged, there is no CIRCLE_PR_NUMBER so the binary builds clone pytorch/pytorch/master, which races.

Based off of https://circleci.com/docs/2.0/env-vars/ and the circleci checkout code
```
git config --global url."ssh://git@github.com".insteadOf "https://github.com" || true
git config --global gc.auto 0 || true

if [ -e /home/circleci/project/.git ]
then
  cd /home/circleci/project
  git remote set-url origin "$CIRCLE_REPOSITORY_URL" || true
else
  mkdir -p /home/circleci/project
  cd /home/circleci/project
  git clone "$CIRCLE_REPOSITORY_URL" .
fi

if [ -n "$CIRCLE_TAG" ]
then
  git fetch --force origin "refs/tags/${CIRCLE_TAG}"
else
  git fetch --force origin "master:remotes/origin/master"
fi

if [ -n "$CIRCLE_TAG" ]
then
  git reset --hard "$CIRCLE_SHA1"
  git checkout -q "$CIRCLE_TAG"
elif [ -n "$CIRCLE_BRANCH" ]
then
  git reset --hard "$CIRCLE_SHA1"
  git checkout -q -B "$CIRCLE_BRANCH"
fi

git reset --hard "$CIRCLE_SHA1"
```
I believe we do not use git tags.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16773

Differential Revision: D13962132

Pulled By: pjh5

fbshipit-source-id: c62d2139f38ff39ecda1509b0bcd8bd102828e40
2019-02-05 13:17:33 -08:00
a9713d07b0 Expose HeatmapMaxKeypoints to torch (#16528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16528

..

Reviewed By: smessmer

Differential Revision: D13866214

fbshipit-source-id: 2ca79037fc070bade5542345af5ce09f88beda44
2019-02-05 12:56:58 -08:00
3df7b321cc Expose BoxWithNMSLimit (#16529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16529

..

Reviewed By: smessmer

Differential Revision: D13865221

fbshipit-source-id: 8a3f1d420183ed5ae51b3c9e4eb6e033078c7ae4
2019-02-05 12:56:56 -08:00
add39b85cc Expose BBoxTransform to pytorch (#16530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16530

..

Reviewed By: smessmer

Differential Revision: D13864292

fbshipit-source-id: 1f57664e78347e72c0087aa3d825a6a9517c1945
2019-02-05 12:56:54 -08:00
f33a2b960e Expose GenerateProposals to torch (#16477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16477

expose generateproposals to torch

Reviewed By: smessmer

Differential Revision: D13856086

fbshipit-source-id: a4873646a71a6b6c01740d21729e827f4b36588f
2019-02-05 12:56:52 -08:00
f5d4636021 Expose RoIAlign to torch (#16476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16476

enable calling roialign (caffe2) from torch frontend

Reviewed By: smessmer

Differential Revision: D13855525

fbshipit-source-id: cfee7bb1544dc58df4231604ba01d61ca905ae3f
2019-02-05 12:56:50 -08:00
240240bb10 LayerNorm Registration Example (#16478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16478

This diff includes an example registration of a caffe2 op in torch.  A previous attempt ran into a static initialization order bug.

Reviewed By: smessmer

Differential Revision: D13854304

fbshipit-source-id: ec463ce2272126d08a5163d1599361ee5b718bbc
2019-02-05 12:56:48 -08:00
af4d2b889c Enable undefined at::Tensor to be passed as Output (#16730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16730

With Jerry's new updates, Tensor must be defined -- as a result I've needed to update the shim for caffe2 ops being used in PyTorch

Reviewed By: smessmer

Differential Revision: D13946950

fbshipit-source-id: 6f77877c61a743f82bdfc2ad04d6ab583000cc18
2019-02-05 12:56:46 -08:00
9811a4220d Add XLA / TPU device type, backend type and type id (#16763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16763

Replicate the easy bits in https://github.com/pytorch/pytorch/pull/15153 with TPU / XLA instead of MSNPU. Also don't initialize the storage for XLA tensors for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16585

Reviewed By: ezyang

Differential Revision: D13912118

Pulled By: gchanan

fbshipit-source-id: 4889177e2478768fb281ed075b71146d1d850bd9
2019-02-05 12:56:44 -08:00
6efa40e07b Preserve method parameter names (#16750)
Summary:
Fixes #16591

This uses uniqueBaseName so that parameters do not end up with suffixes. It changes next_id to be per-base-name rather than global to fix jittering issues when re-importing a re-numbered graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16750

Differential Revision: D13960282

Pulled By: zdevito

fbshipit-source-id: 2156f581d9b95d77bf1f1252074e800b19116555
2019-02-05 12:51:24 -08:00
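The per-base-name numbering can be sketched as follows (a simplified model for illustration, not the actual JIT naming code):

```python
from collections import defaultdict

class NameTable:
    """Assign unique names with a counter per base name rather than a
    global counter, so renumbering one name does not shift the suffixes
    of unrelated names (which caused jitter on re-import)."""
    def __init__(self):
        self.next_id = defaultdict(int)  # one counter per base name

    def unique(self, base):
        n = self.next_id[base]
        self.next_id[base] += 1
        # First use of a base name keeps it unsuffixed, so parameters
        # do not end up with suffixes.
        return base if n == 0 else f"{base}.{n}"

t = NameTable()
print(t.unique("weight"))  # -> weight
print(t.unique("weight"))  # -> weight.1
print(t.unique("bias"))    # -> bias
```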
f8d4a14f6d add xla tests to enabled-configs (#16761)
Summary:
This should enable the xla tests, thus letting the master xla tests pass.
As usual, I will add the branch filters back before landing.
Thanks ezyang !
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16761

Differential Revision: D13959746

Pulled By: ailzhang

fbshipit-source-id: 7384da281d093d16edccb4283c74e47ac659eeff
2019-02-05 12:25:45 -08:00
e30d33483b Fix logging top commit of pytorch + builder in binaries for long summaries (#16766)
Summary:
I'll test with this really long summary.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce risus sem, mattis vitae commodo vitae, mattis vel ex. Integer nec consectetur ligula, sit amet ultricies risus. Suspendisse potenti. Donec aliquet quam ante. Donec porttitor justo ligula, ut vestibulum erat facilisis a. Nullam eget lobortis nisi. Aenean quis sem id ante eleifend condimentum nec a lacus. Sed sed dolor augue. Proin feugiat, tellus in eleifend cursus, libero nulla lacinia erat, et efficitur dui odio ut ex. In et sem purus. Proin dictum scelerisque magna, nec feugiat dolor lobortis id. Proin ante urna, ultrices in semper et, pulvinar et dui. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Mauris ullamcorper neque a pharetra rhoncus.

Aliquam vel semper felis. Integer id massa erat. Morbi leo eros, varius sed viverra eu, dictum nec purus. Fusce vitae mollis sem, non fringilla nulla. Donec tincidunt luctus dolor. Morbi lobortis, magna quis viverra bibendum, lacus tortor pulvinar risus, eu porta tellus nulla vitae dolor. Sed tincidunt, turpis quis facilisis malesuada, nulla eros lobortis lorem, a fermentum mi nisl non quam. Pellentesque vehicula, nisl non eleifend viverra, tellus neque accumsan tellus, id ultricies lacus mi sed sapien. Proin rutrum ultrices quam sit amet euismod. Maecenas vel faucibus libero, nec efficitur mi. Proin felis augue, elementum eget vestibulum non, euismod sed urna. Curabitur purus nisi, interdum nec rutrum id, faucibus nec sapien. Integer consectetur interdum elit, volutpat vulputate velit. Integer et ultricies magna. Fusce blandit lorem urna, quis sodales sapien porttitor in. Nulla nec sodales sem.

Morbi consequat massa sit amet fringilla pretium. Nunc maximus vitae neque auctor pharetra. Morbi gravida feugiat urna, eu sagittis est pulvinar eget. Maecenas ut fermentum ante, eget malesuada neque. In ut maximus magna. Donec nec finibus sapien. Quisque viverra erat lobortis, rhoncus augue sed, hendrerit dui. Donec in feugiat augue, a ultrices justo. Pellentesque rutrum augue sed nulla auctor, a venenatis risus aliquam. Nullam ipsum justo, dictum sit amet elementum eu, eleifend a turpis. Proin ut tellus ut urna volutpat fermentum ac aliquam tellus.

Quisque ultricies est id eros dictum ultrices. Cras eu urna interdum, eleifend felis vitae, vulputate nulla. Cras tincidunt, mi sodales imperdiet tristique, diam odio convallis ligula, ac vulputate enim sapien eu tellus. Phasellus eleifend finibus sapien id ullamcorper. Donec aliquet eleifend consectetur. Proin in nulla venenatis, egestas neque quis, blandit sem. Suspendisse pellentesque arcu vel ligula fermentum maximus. Aliquam non ipsum ut ante pharetra finibus.

Nunc rhoncus purus sit amet risus congue venenatis. Integer id vestibulum neque, et fermentum elit. Nunc sit amet tortor quis mi aliquam vestibulum et in mauris. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Maecenas mollis hendrerit nulla, non tempus neque pharetra ac. Proin commodo bibendum velit, consectetur pretium metus sollicitudin eget. Aliquam malesuada semper tempor. Ut vel vulputate dolor, eu faucibus mauris. Nam commodo quis dolor sit amet eleifend. Phasellus eget massa odio. Donec tempor est at ante finibus lobortis. Suspendisse porttitor imperdiet ultrices. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.

Nullam id dignissim magna, non suscipit odio. Vestibulum vel maximus erat, suscipit ullamcorper tellus. Fusce egestas augue lorem, in ultricies est vehicula ac. Integer pretium, ex in elementum varius, nisi turpis posuere lectus, nec posuere ligula mi ac ligula. Donec vehicula dolor ut ex elementum, quis scelerisque tellus molestie. Mauris euismod magna ac ornare cursus. Vivamus dapibus quam nec tellus aliquam elementum.

Phasellus ultricies quis augue ut fringilla. Suspendisse eu molestie eros. Suspendisse potenti. Curabitur varius sodales maximus. Etiam nec rutrum est. Sed vulputate suscipit elit, eu condimentum mauris pretium eget. Curabitur convallis commodo dui. Aenean lectus orci, pretium non mi sit amet, commodo imperdiet dui. In hac habitasse platea dictumst. In et ex nisl. Duis justo tortor, finibus at augue vitae, fermentum hendrerit tellus. Donec malesuada justo a molestie posuere. Morbi nisl leo, feugiat ut faucibus ut, mattis id purus.

Vestibulum hendrerit lorem ligula, et ullamcorper nisl lacinia sed. Integer vitae lacinia nunc, sed interdum enim. Aliquam aliquet ipsum vitae eros ornare accumsan. Phasellus venenatis laoreet est, sed feugiat neque lobortis id. Proin pulvinar placerat leo lacinia vehicula. Duis accumsan semper lobortis. Donec elementum nunc non quam aliquam, rutrum fringilla justo interdum. Morbi pulvinar pellentesque massa vitae maximus. Cras condimentum aliquam massa, et pellentesque lorem dictum a. Vivamus at dignissim justo. Donec ligula dui, tempus vestibulum est vel, rutrum blandit arcu. Vivamus iaculis molestie neque in elementum. Sed convallis tempus quam non elementum. Nulla euismod lobortis ligula. Etiam ac mauris eget magna posuere ornare id vitae felis. Nunc efficitur lorem et euismod porttitor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16766

Differential Revision: D13959962

Pulled By: pjh5

fbshipit-source-id: 9b71bdf981d4fda9d8951e2d183db81f349b7f81
2019-02-05 11:30:53 -08:00
2a85d98745 Fix type-o in unsupported data type error message (#16537)
Summary:
In the case where an operator does not support a given data type, an
error message is emitted to alert the user; this message was
incorrectly structured. This commit adds to and rearranges the
error message to make it a little clearer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16537

Differential Revision: D13958859

Pulled By: zou3519

fbshipit-source-id: 935fc3adcef2f969042b1db902c9ec004488ea9c
2019-02-05 10:46:46 -08:00
963e410b57 Make tuple checks faster (#16657)
Summary:
As the comment indicates, the issue is only present in some versions of
Python 2, so we should be able to use the heavily optimized PyTuple_Check in
most cases, skipping allocation of the strings and unnecessary lookups
on the object's type.

cc ezyang zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16657

Differential Revision: D13957854

Pulled By: ezyang

fbshipit-source-id: be32eb473ad77a0805e8247d8d583d673d4bdf25
2019-02-05 09:35:37 -08:00
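The fast-path/slow-path shape of the C-level change can be illustrated in pure Python (a conceptual analogue, not the actual C code):

```python
def is_tuple(obj):
    # Fast path: exact type check, analogous to the cheap C-level
    # PyTuple_Check, covering the overwhelmingly common case.
    if type(obj) is tuple:
        return True
    # Slow path: tuple subclasses still count, without any string
    # allocation or extra attribute lookups on the common path above.
    return isinstance(obj, tuple)

print(is_tuple((1, 2)))  # -> True
print(is_tuple([1, 2]))  # -> False
```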
85ad011843 Fixes selection of cuDNN algorithm (#15881)
Summary:
This PR updates the logic for using cudnnGet* and cudnnFind*. The current version of cudnn find and get (v7) returns a pair of the best algorithm and the convDesc mathType. While we were using the returned algorithm, we didn't update the mathType. As a result, we ended up with a slow choice of algorithm and math type. Without this patch, we are seeing a 10x regression in group convolutions.

Changelist:
- Changed the template arguments to be `perf_t` instead of `algo_t` to unify cudnnFind and cudnnGet. Both cudnnFind and cudnnGet have the same purpose and hence, it made sense to unify them and get rid of `getAlgorithm`.
- Used cudnnGet*_v7 everywhere cudnnGet* was being used.
- Removed all cudnn6 paths (This PR depends on https://github.com/pytorch/pytorch/pull/15851)

Differential Revision: D13957944

Pulled By: ezyang

fbshipit-source-id: a88c39d80ae37f2d686665622302b62b50fab404
2019-02-05 09:30:00 -08:00
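The essence of the fix is honoring both fields of the best benchmarked entry, not just the algorithm. A toy sketch of that selection logic (field names here are hypothetical, loosely modeled on cudnn's perf results):

```python
# Each perf result carries both the algorithm and the math type it was
# benchmarked with; picking only `algo` and ignoring `math_type` is the
# kind of bug this commit addresses.
perf_results = [
    {"algo": "IMPLICIT_GEMM", "math_type": "TENSOR_OP_MATH", "time_ms": 0.8},
    {"algo": "WINOGRAD", "math_type": "DEFAULT_MATH", "time_ms": 1.5},
]
best = min(perf_results, key=lambda p: p["time_ms"])
algo, math_type = best["algo"], best["math_type"]  # keep the pair together
print(algo, math_type)  # -> IMPLICIT_GEMM TENSOR_OP_MATH
```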
c751cf8b36 Don't throw in operator== for TypeMeta and ScalarType (#16736)
Differential Revision: D13957847

Pulled By: ezyang

fbshipit-source-id: 3cc01538aab1bbb396c29ce61e0e95118f8d011f
2019-02-05 08:56:22 -08:00
1ce188c510 logsumexp for multiple dimensions (#16475)
Summary:
Move `logsumexp` and `max_values` to `TensorIterator` and use it to make `logsumexp` work for multiple dimensions.

Timings on a tensor of shape `(10,1000000,10)`, for each combination of (cpu, single-threaded cpu, gpu) and dimension:

**before**
208 ms ± 2.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
279 ms ± 5.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
199 ms ± 2.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.11 s ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.25 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.11 s ± 6.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 ms ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
132 ms ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

**after**
199 ms ± 8.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
307 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
207 ms ± 7.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.16 s ± 8.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.26 s ± 47.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.13 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 ms ± 868 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
132 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 21.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16475

Differential Revision: D13855746

Pulled By: umanwizard

fbshipit-source-id: aaacc0b967c3f89073487e1952ae6f76b7bd7ad3
2019-02-05 08:32:11 -08:00
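The underlying reduction is the usual max-shifted logsumexp, and reducing over multiple dimensions is equivalent to reducing over all of their elements at once. A minimal pure-Python sketch (not the `TensorIterator` implementation):

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over an iterable of floats."""
    xs = list(xs)
    m = max(xs)
    if math.isinf(m):  # all -inf: avoid nan from (-inf) - (-inf)
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Reducing a 2-D "tensor" over both dimensions == reducing the flat list.
rows = [[0.0, 0.0], [0.0, 0.0]]
flat = [x for row in rows for x in row]
print(logsumexp(flat))  # log(4) ≈ 1.3863
```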
4047c97266 Revert D13952085: [pytorch][PR] Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA
Differential Revision:
D13952085

Original commit changeset: 410c4e117a44

fbshipit-source-id: fca59c37e71f8e61ae52867d5401b28fbacefe5a
2019-02-05 07:42:59 -08:00
29827e1971 Integrate PyTorch quantization APIs into ensemble export modules (#309)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/309

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16481

This gives us a boolean flag `quantize` on the `BeamSearch` module that allows us to apply FBGEMM quantization to a pretrained PyTorch model and export this to PyTorch native runtime.

Reviewed By: jmp84

Differential Revision: D13514776

fbshipit-source-id: 3f7cbff0782aae54c9623ad1ea7e66d7f49e2b32
2019-02-05 01:55:11 -08:00
0cd918f4d3 Fork/join parallelism for ensemble export modules (#310)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/310

This adds fork/join parallelism to the EncoderEnsemble and DecoderBatchedStepEnsemble models. Note that when run in Python, these calls are no-ops, and similarly we remove these calls before exporting to ONNX. But when we run in the PyTorch native runtime, we will now have the opportunity to run these sections in parallel.

Benchmark validation is pending me slogging through FBLearner Flow issues, as usual

Reviewed By: jmp84

Differential Revision: D13827861

fbshipit-source-id: 0cb9df6e10c0ba64a6b81fa374e077bce90f1d5b
2019-02-05 01:55:09 -08:00
ce15ae8f23 Add an API to set the number of threads in C10 thread pool (#16669)
Summary:
Tested locally on machine translation service
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16669

Differential Revision: D13927858

Pulled By: jamesr66a

fbshipit-source-id: efcb8c21e0c2f76ac37967e6f52967da515595c3
2019-02-05 00:15:56 -08:00
3796cbaf7a Try to turn off zero-out of tensors fully
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16601

Reviewed By: ezyang

Differential Revision: D13893776

fbshipit-source-id: 3190258f2591540dc54ad8504ac6ded998bef384
2019-02-04 23:59:11 -08:00
ae5fd10b02 Tensor method rename size()->numel() - 2/3 (#16745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16745

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: dzhulgakov

Differential Revision: D13944353

fbshipit-source-id: 25c2ca22204706544ee67e59c663bf495f2b4f6b
2019-02-04 23:59:10 -08:00
3df91ceb5e Tensor method rename size()->numel() - 3/3 (#16747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16747

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: dzhulgakov

Differential Revision: D13944380

fbshipit-source-id: 2167e2092ab27d31a4d5ef6cfa4b65d192f597a8
2019-02-04 23:54:33 -08:00
cb9740a608 Tensor method rename size()->numel() - 1/3
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: dzhulgakov

Differential Revision: D13944296

fbshipit-source-id: 67e97c2cf45889d25f2cb3e2203cecba03c8a3aa
2019-02-04 23:33:17 -08:00
a7a2618d51 Bug fix in l2 quantization (#16749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16749

Use global quantization options in l2 quantization

Reviewed By: jspark1105

Differential Revision: D13951378

fbshipit-source-id: d4e356149587e5d2d09a6937c7fa1aa131957fd6
2019-02-04 22:31:38 -08:00
b1822966ee points-to graph simplification (#16605)
Summary:
This PR reworks the mutability API to be simpler (updates passes to use "mayAlias" calls) and improves the caching logic.

The difference is that we now directly express the idea of a "memory location." Leaves in the alias tracker's points-to graph are considered unique memory locations, and mayAlias questions boil down to whether two values share a leaf.

To speed up queries, some basic path compression has been added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16605

Differential Revision: D13952738

Pulled By: suo

fbshipit-source-id: cfc7fb2b23369f1dc425d1d8ca2c753c193d95dd
2019-02-04 22:04:25 -08:00
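The "shared leaf" idea can be sketched as a toy points-to graph (an illustrative model with hypothetical names, not the actual alias-analysis code, and without the path compression mentioned above):

```python
class PointsTo:
    """Toy points-to graph: leaves are unique memory locations, and two
    values may alias iff they share a leaf."""
    def __init__(self):
        self.edges = {}  # value -> set of values it may point to

    def add(self, value, points_to=()):
        self.edges.setdefault(value, set()).update(points_to)

    def leaves(self, value):
        ps = self.edges.get(value, set())
        if not ps:
            return {value}  # a leaf is its own memory location
        out = set()
        for p in ps:
            out |= self.leaves(p)
        return out

    def may_alias(self, a, b):
        return bool(self.leaves(a) & self.leaves(b))

g = PointsTo()
g.add("x")            # x is a fresh memory location
g.add("view", {"x"})  # view points into x
g.add("y")            # unrelated value
print(g.may_alias("view", "x"))  # -> True
print(g.may_alias("view", "y"))  # -> False
```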
6c04224cd8 Revert "Move outplace ops to ATen (#12413)" (#16731)
Summary:
This reverts commit f660d3ae19decc64390e894fbaf8de80d87585e0.

cc zasdfgbnm

Reasoning at https://github.com/pytorch/pytorch/pull/12413#issuecomment-460424129
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16731

Differential Revision: D13948022

Pulled By: ezyang

fbshipit-source-id: b10669cf03679e306850314b7b5b08bed0839e19
2019-02-04 19:30:04 -08:00
1409a2afc8 Automatic update of fbcode/onnx to 875f7bbe537b9d6931d065977c192eaaf61e1179 (#16734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16734

Previous import was 15c33c945851907411619f599900c3852108e7e3

Included changes:
- **[875f7bb](https://github.com/onnx/onnx/commit/875f7bb)**: Bump docker image version from 230 to 238 (#1786) <bddppq>
- **[f94e430](https://github.com/onnx/onnx/commit/f94e430)**: Fix: setup.py is using wrong cmake build type (#1784) <Changming Sun>
- **[2896c77](https://github.com/onnx/onnx/commit/2896c77)**: Fix Cast testcase data (#1776) <Raymond Yang>

Reviewed By: bddppq

Differential Revision: D13948288

fbshipit-source-id: 5f733005d4bf483d58b630d511cadb0fa4ac7910
2019-02-04 17:37:40 -08:00
3f570b5eea Fix static linkage cases and NO_DISTRIBUTED=1 + CUDA (#16705)
Differential Revision: D13952085

Pulled By: soumith

fbshipit-source-id: 410c4e117a44c08eadc6f3ded91fafc320a7c696
2019-02-04 16:51:12 -08:00
846a64e805 Tensor method rename ndim()->dim() - 1/3 (#16678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16678

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: houseroad

Differential Revision: D13929413

fbshipit-source-id: 677ce760bdbf9f5560630fdc40dd60af227fb696
2019-02-04 15:49:16 -08:00
9e31d6dbf1 Merge job-spec env variables of Pytorch/Caffe2 CI jobs (#16649)
Summary:
The idea is to unify the environment variables `JOB_BASE_NAME` and `BUILD_ENVIRONMENT` which controlled the Pytorch and Caffe2 jobs respectively. In this commit, we have converted all the `JOB_BASE_NAME` references in _.jenkins/pytorch/*_ files to `BUILD_ENVIRONMENT`. Then, did the same thing in ._circleci/config.yml_. One thing that we needed to be careful was when both `BUILD_ENVIRONMENT `and `JOB_BASE_NAME` were present under same declaration in _config.yml_ file (e.g., for "caffe2-" stuffs). To ensure that all "==" checks work as expected, we also had to add "*" in some if conditions in _.jenkins/caffe2/build.sh_ file. Finally, removed "-build", "-test", etc. suffixes from `COMPACT_JOB_NAME` variable assignment in the bash script files in _.jenkins/pytorch_ folder, e.g., modify `COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"` to `COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16649

Differential Revision: D13946392

Pulled By: mmh683

fbshipit-source-id: 790de6abf96de184758e395c9098a50998e05bc5
2019-02-04 15:37:44 -08:00
b250385811 Log top commit of pytorch + builder in binaries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16729

Differential Revision: D13947737

Pulled By: pjh5

fbshipit-source-id: 9ba8ea56baff7147f73458ab26d0553fff31a46f
2019-02-04 14:30:44 -08:00
c15ed3a2f2 Run resnext101 training in rocm benchmark (#16017)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16017

Differential Revision: D13946680

Pulled By: bddppq

fbshipit-source-id: ea125b0389188a59db3d537671a3214a557aecdb
2019-02-04 14:16:25 -08:00
6d407baedf Replace resize_dim() with set_sizes_and_strides() in THTensor_(unsqueeze1d) in aten/src/TH/generic/THTensor.cpp (#16673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16673

Replace resize_dim() with set_sizes_and_strides() in THTensor_(unsqueeze1d) in aten/src/TH/generic/THTensor.cpp, as described in T38058642.

Reviewed By: ezyang

Differential Revision: D13928879

fbshipit-source-id: d593cebcc82589cd362ac78884d4e367d0da0ce6
2019-02-04 12:32:14 -08:00
db4235f31d Tensor method rename ndim()->dim() - 2/3 (#16679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16679

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: houseroad

Differential Revision: D13929450

fbshipit-source-id: fcc222744c28b41f2cedffc0c2ef5d04aceaa5af
2019-02-04 11:12:57 -08:00
73db487a8e Update the cmake build configuration for AppleClang compiler (#15820)
Summary:
This PR tries to merge https://github.com/pytorch/pytorch/pull/11563 again and fixes the linking error in https://github.com/pytorch/pytorch/pull/14837.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15820

Differential Revision: D13942024

Pulled By: ezyang

fbshipit-source-id: dc6d1e9c4b0f177914f3745665244272a03ce33c
2019-02-04 08:53:47 -08:00
dc528fd734 Fix build with cuda but no cudnn in caffe2 (#16701)
Summary:
Just noticed while building on a machine without cudnn present - it was building but the runtime failed since some methods weren't bound
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16701

Differential Revision: D13937247

Pulled By: dzhulgakov

fbshipit-source-id: c81f05be7a9e64a1a8591036dcf8692c0ed4064e
2019-02-03 22:14:51 -08:00
da24749e8d Fix ReservoirSampling zero-initialization reliance (#16702)
Summary:
The op was implicitly relying on pos_to_output being zero-initialized after extending. We're removing this functionality from the allocator, hence the fix here. For some reason it wasn't spotted by junk-initialization but was reliably reproducible with standard malloc() if both junk_fill and zero_fill flags are turned off.

cc kittipatv jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16702

Reviewed By: kittipatv

Differential Revision: D13937257

Pulled By: dzhulgakov

fbshipit-source-id: 3ee520b05467108e6c3e64eb3e6c60589bdf3d87
2019-02-03 21:31:56 -08:00
6cb593b88c Remove --without-parallel (#16704)
Summary:
See homebrew/homebrew-core@60c72ba9 and homebrew/homebrew-core#31510.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16704

Differential Revision: D13938093

Pulled By: pietern

fbshipit-source-id: 8a70d462180257f96202a0373a86a273b524045c
2019-02-03 13:39:26 -08:00
a53d28dd87 Bump gloo (#16638)
Summary:
This bump includes:
* Memory leak fix where the Gloo transport would hold on to auxiliary
  structures for send/recv pairs after they finished.
* Fix write-after-free from Gloo thread during stack unwinding on error.
* Removal of the PATENTS file.

Fixes #16144.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16638

Differential Revision: D13937950

Pulled By: pietern

fbshipit-source-id: 3cfecaf13ee0f214c06681386557a4b1c3e1d6b9
2019-02-03 11:52:31 -08:00
6d86bc7c3f Fix issue with scalars and __rpow__ (#16687)
Summary:
Changelog:

- Modify __rpow__ function in tensor.py to adapt to scalars
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16687

Differential Revision: D13936720

Pulled By: soumith

fbshipit-source-id: b0c8727968b04efbc6e7461807c812d962f03370
2019-02-02 18:55:51 -08:00
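The fix above hinges on Python's reflected-power protocol: `2 ** tensor` only reaches `Tensor.__rpow__` after `int.__pow__` returns `NotImplemented`. A minimal, self-contained sketch of that dispatch (the `Scalar` class here is hypothetical, not PyTorch's implementation):

```python
class Scalar:
    """Hypothetical stand-in showing how Python dispatches `base ** obj`
    to obj.__rpow__ when the left operand's __pow__ gives NotImplemented."""
    def __init__(self, value):
        self.value = value

    def __pow__(self, other):
        # Handles Scalar(2) ** 3 and Scalar(2) ** Scalar(3).
        other = other.value if isinstance(other, Scalar) else other
        return Scalar(self.value ** other)

    def __rpow__(self, base):
        # Called for 2 ** Scalar(3): int.__pow__(2, Scalar(3)) returns
        # NotImplemented, so Python falls back to the right operand.
        return Scalar(base ** self.value)
```

The same fallback mechanism is what lets a plain Python scalar on the left-hand side of `**` reach the tensor's reflected method.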
4c16ea93d1 Improve LeftRight (#16524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16524

- Make it exception safe. When an exception happens during write, the old state is recovered.
- Use RAII instead of try/catch to increment counters in readers. This is more readable, and it also makes it work with reader closures that return void, which previously didn't work because the reader return value was stored on the stack.
- Assert there's no reads or writes happening when it's destructed to avoid destruction race conditions
- Explain the algorithm in detail in comments
- Add test cases

Reviewed By: ezyang

Differential Revision: D13866609

fbshipit-source-id: 01306a282a3f555569caa13d8041486f960d00e2
2019-02-02 16:33:27 -08:00
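The RAII point in the commit above — incrementing reader counters in a way that survives exceptions and void-returning readers — can be sketched outside C++ as well. A hedged Python analog (names are illustrative, not from the c10 implementation):

```python
import threading
from contextlib import contextmanager

class ReadCounter:
    """Python analog of the RAII reader-count pattern: the decrement runs
    in a `finally`, so it happens even if the reader body raises."""
    def __init__(self):
        self._lock = threading.Lock()
        self.active_readers = 0

    @contextmanager
    def reading(self):
        with self._lock:
            self.active_readers += 1
        try:
            yield
        finally:
            with self._lock:
                self.active_readers -= 1

counter = ReadCounter()
try:
    with counter.reading():
        raise ValueError("reader failed mid-read")
except ValueError:
    pass  # counter was still decremented on the way out
```

Because cleanup is tied to scope exit rather than an explicit try/catch around the return value, the reader closure's return type (including "void") no longer matters.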
39a55f4ea6 Updating submodules
Reviewed By: zpao

fbshipit-source-id: e66e01e164d1784740fcb8bebc4817d2a8cd7903
2019-02-02 16:33:25 -08:00
988647b21c Updating submodules
Reviewed By: zpao

fbshipit-source-id: 31a8d843ffba2d7405b4742ea553937a00dff216
2019-02-02 05:05:07 -08:00
44809fda84 fix conditional in mean workaround (#16686)
Summary:
When trying to get a test to pass, I was missing an exclamation mark. Instead, the conditional now uses a different function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16686

Differential Revision: D13935182

Pulled By: jamesr66a

fbshipit-source-id: 7525a1a829276641dbafe06734f03f6202df6b22
2019-02-02 00:55:58 -08:00
7d4a81cbb2 Use macro for reduce on 2d blocks (#16344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16344

Use macro for reduce on 2d blocks

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13808988

fbshipit-source-id: b68c0fb6079c1b6e203a072083aba7a95c202bc2
2019-02-01 23:49:07 -08:00
f36f3cce9a Simplify layer_norm_op_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16570

Reviewed By: ezyang

Differential Revision: D13883913

fbshipit-source-id: 7437d3cbc00c0de92bb01562c620cb658aa9f0d3
2019-02-01 21:34:18 -08:00
13db5dbb81 Make predictor base class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16541

Reviewed By: ajtulloch

Differential Revision: D13858261

fbshipit-source-id: acbfdbea59bd20ab1cc7956ee0d8856d6faa8361
2019-02-01 20:59:19 -08:00
98b333d810 Tag model_id and onnxifi index in OnnxifiOp (#16648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16648

We added onnxGraph sharing keyed on model id and net seq number, but we forgot to supply this info to Onnxifi. Therefore, we would only ever create ONE onnxGraph... This diff adds the necessary info to the OnnxifiOp to prevent this from happening.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D13912356

fbshipit-source-id: fe8982327287a35f32fe3b125d94b617d18c0ab5
2019-02-01 18:51:51 -08:00
a4ac3cbb2f Updating submodules
Reviewed By: zpao

fbshipit-source-id: ed389204bc423d2d5f7a36e2d61c0f55fe0522e1
2019-02-01 18:46:03 -08:00
c865d46736 Add @ignore annotation (#16055)
Summary:
Adds a decorator `torch.jit.ignore` for Python functions that tells the compiler to skip over these Python values, putting a `prim::Error` in their place which always throws an exception when run.

This lets you have Python-only code in your model in an explicit way, which is useful for debugging, and still be able to save/load the model.

Fixes #15815
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16055

Differential Revision: D13797286

Pulled By: driazati

fbshipit-source-id: 29d36776608ec101649a702952fc6ff3c27655b1
2019-02-01 16:46:12 -08:00
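As a rough illustration of the decorator's contract described above — the ignored function stays in the model but always raises when executed — here is a hypothetical pure-Python sketch (not the actual `torch.jit.ignore` implementation, which operates on the compiled graph via `prim::Error`):

```python
def ignore(fn):
    """Hypothetical sketch: replace a Python-only function with a stub
    that keeps the name but raises whenever it is actually called."""
    def stub(*args, **kwargs):
        raise RuntimeError(
            f"{fn.__name__} was marked @ignore and cannot be run")
    stub.__name__ = fn.__name__
    return stub

@ignore
def debug_hook(x):
    # Python-only debugging code that the compiler should skip over.
    import pdb; pdb.set_trace()
```

Saving/loading still works because the stub is a well-formed value; only executing it fails.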
31ab03e34f Add Winograd Conv method for CPU (#15196)
Summary:
Add Winograd conv method. Users can select direct conv or Winograd conv in the model file.
We closed the original PR https://github.com/pytorch/pytorch/pull/12154 and created this new one for better rebasing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15196

Differential Revision: D13463721

Pulled By: yinghai

fbshipit-source-id: c5cd5c8aa7622ae7e52aeabd3dbb8ffb99b9b4ee
2019-02-01 16:41:30 -08:00
69a816c060 Increase timeout on anaconda logins
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16682

Differential Revision: D13931438

Pulled By: pjh5

fbshipit-source-id: 9961e91a80d8c59ab6347e830b1da38533524dd2
2019-02-01 16:33:51 -08:00
dc64d95f3a Tensor method rename ndim()->dim() - 3/3 (#16680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16680

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: houseroad

Differential Revision: D13929471

fbshipit-source-id: b284ead11031f96fd8b6d96d2f29ffeb14207faa
2019-02-01 16:28:28 -08:00
9594d9bcfd fix the ONNX ci (#16674)
Summary:
~~Let's see whether this trigger and fix the problem~~

remove the expect files from test_verify
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16674

Reviewed By: zrphercule

Differential Revision: D13930668

Pulled By: houseroad

fbshipit-source-id: 092157af07f475cf3809c95a4fe586e050c53b7e
2019-02-01 15:58:39 -08:00
7139410b72 Allow USE_NINJA to be toggled by an env variable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16665

Differential Revision: D13930021

Pulled By: pjh5

fbshipit-source-id: 4b490f952a56e8561329ab8898be2bf779b46b9d
2019-02-01 15:33:06 -08:00
bd75fba4e8 fix tracing using a dictionary as input (#16616)
Summary:
Previously this would fail with the error message:
```
ValueError: Auto nesting doesn't know how to process an input object of type dict. Accepted types: Tensors, or lists/tuples of them
```
Turns out we're not using the line that causes this error (or a side effect of that line), so removing it fixes the issue. Also cleaned up some related dead code (cc apaszke to make sure the code isn't useful in some way)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16616

Differential Revision: D13908352

Pulled By: suo

fbshipit-source-id: 27094f1f4ea0af215b901f7ed3520e94fbc587b3
2019-02-01 14:44:56 -08:00
aaa8ace486 Implement new c10 dispatcher (#16625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16625

This is a squash of multiple PRs that refactored the old c10 dispatcher into a new one that follows the c10 dispatcher design doc.
It is now unboxed and follows the Stack semantics from JIT. It also uses the runtime JIT schema instead of its own compile time schema definitions.

Reviewed By: ezyang

Differential Revision: D13907069

fbshipit-source-id: edcc4806ccd21474fdfb5a98516219b1956db13d
2019-02-01 13:52:01 -08:00
a40e8ce7c5 Add train() / eval() / is_training() to C++ ScriptModule API (#16044)
Summary:
This PR aims to fix https://discuss.pytorch.org/t/how-to-change-a-loaded-model-to-evaluation-mode-in-c/32330, by adding `train()` / `eval()` / `is_training()` to C++ ScriptModule API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16044

Differential Revision: D13857724

Pulled By: yf225

fbshipit-source-id: 16d3969fb5840ff7e66c7f72e800e6c75db8d2ff
2019-02-01 13:07:38 -08:00
6d373c02ef Revert "Fixes selection of cuDNN algorithm (#15881)" (#16484)
Summary:
There is a regression in cudnnGet*_v7 that causes a slowdown in resnet50 training. I am opening a bug with the cuDNN team about this. This reverts commit 38374468832e307ca741901870914857a836dd5d.

ezyang 😿
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16484

Differential Revision: D13924755

Pulled By: soumith

fbshipit-source-id: 8c719345fc443f1289539bfae630eea9224ba4a5
2019-02-01 13:07:36 -08:00
638dbe4b46 Revert "Upgrade mkl-dnn to v0.17.3 to fix core dump issue (github#161… (#16660)
Summary:
…83) (#16653)"

This reverts commit 87ae1558a6c8c7c0693bfa995458d16239c484d7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16660

Differential Revision: D13924272

Pulled By: soumith

fbshipit-source-id: 79747d728adff1a9c32d8529846f0305052e57e8
2019-02-01 11:12:16 -08:00
4c803f4ebd Expose backend extensions to python
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16582

Reviewed By: gchanan

Differential Revision: D13887539

fbshipit-source-id: 8755babf2e3e849af974655f2f3a91740efe977e
2019-02-01 11:00:18 -08:00
7e642dfff3 Introduce backend extensions (overriding operators on custom backends)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15153

Reviewed By: gchanan

Differential Revision: D13445571

fbshipit-source-id: 62e2ebe0a6e81c4983b47cddb57ee5eb78e96708
2019-02-01 11:00:16 -08:00
64186e06ec Dispatch factory functions on Type (#15093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15093

Needed for backend extensions.

Reviewed By: ezyang

Differential Revision: D13427897

fbshipit-source-id: d0b34b0072e597ae599bd3bc25356831d7a18d6a
2019-02-01 11:00:15 -08:00
d29912f59e Only run Travis on master branch, not on export-DXXXXX branches. (#16628)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16628

Differential Revision: D13922097

Pulled By: ezyang

fbshipit-source-id: eb16d90cc61167af5edc0c4e361d7a807a3099e5
2019-02-01 09:31:46 -08:00
3672f1536e Ignore assert_git_not_dirty for xla tests (#16611)
Summary:
Testing, will restore the branch filter before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16611

Differential Revision: D13902234

Pulled By: ailzhang

fbshipit-source-id: 7fa4048b891645f5253c48b905fb9630e3079524
2019-02-01 08:56:44 -08:00
7078b2baf5 Better bounds checks in ctcloss (#16269)
Summary:
Adds better bounds checks for target lengths in CTC loss, checks for integral types for target and prediction lengths, and adds tests for each, according to #15946
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16269

Differential Revision: D13847567

Pulled By: ezyang

fbshipit-source-id: 5d7a975565e02baf78fe388813a1d1ef56dfb212
2019-02-01 08:02:54 -08:00
87ae1558a6 Upgrade mkl-dnn to v0.17.3 to fix core dump issue (github#16183) (#16653)
Summary:
Upgrade mkl-dnn to 0.17.3 to fix core dump issue in #16183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16653

Differential Revision: D13918278

Pulled By: soumith

fbshipit-source-id: b9c09c50ef188b4099966216e155c9f3f2542276
2019-02-01 07:16:46 -08:00
10cd9d5a03 Skip dag_net_forking test on Rocm (#16639)
Summary:
- Skip the test due to flaky behavior on AMD/ROCm
- The fix is expected in ROCm 2.2 (HSA runtime)
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16639

Differential Revision: D13915231

Pulled By: bddppq

fbshipit-source-id: 66e1d275836337170b15ceb9d60cfdd3242d4df8
2019-02-01 00:53:54 -08:00
b67b29b667 add SingleLoadedNetSupplier (#16620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16620

LogfiledbNetLoader loads all external input blobs into a workspace instance; we pack a shared pointer to this loaded workspace into the SingleLoadedNetSupplier.
SingleLoadedNetSupplier will pass this workspace to BlackBoxPredictor to be executed. (D13891759 is a WIP of how it all comes together)

Reviewed By: pjh5

Differential Revision: D13901467

fbshipit-source-id: 20589f898922f5f1aec50be131dad17a8c38e9b2
2019-01-31 23:51:59 -08:00
4ae9ab24b6 Update conv_base to support empty batch (#16603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16603

Update conv_base to support empty batch

Reviewed By: houseroad

Differential Revision: D13894111

fbshipit-source-id: fc4370ff16ba6046f374e77bd845d28e6af05ea3
2019-01-31 23:46:18 -08:00
b0e692c8a6 Improving docs for MultiLabelSoftMarginLoss (#16644)
Summary:
Resolves #15863

Changed the documentation for MultiLabelSoftMarginLoss and MultiLabelMarginLoss to be more explicit about the `target` format.

More than happy to change the messaging based on discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16644

Differential Revision: D13912395

Pulled By: soumith

fbshipit-source-id: 24a3c214c5f6f9d043e25b13ac758c1c1211b641
2019-01-31 22:07:14 -08:00
536f647bae respect MAX_JOBS (#16641)
Summary:
We inadvertently switched the OSX build over to ninja on CI. It then failed to respect MAX_JOBS and hit the same sccache deadlock bug; this makes the ninja build respect MAX_JOBS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16641

Differential Revision: D13910751

Pulled By: zdevito

fbshipit-source-id: 61bec500539519b019b74421a13cd87fc1d86090
2019-01-31 20:55:37 -08:00
6ba4ca8780 Workaround unvectorized mean implementation (#16618)
Summary:
Workaround for https://github.com/pytorch/pytorch/issues/16617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16618

Differential Revision: D13904276

Pulled By: jamesr66a

fbshipit-source-id: f8b5ea4c5f12dbc405123c9080c55b342c95bcd1
2019-01-31 20:50:53 -08:00
2486facc34 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 4d94eb18d4da58541a96c9f9c2ecc9746f779933
2019-01-31 19:37:24 -08:00
e48ffa84d8 Add compare_exchange_deleter to DataPtr/UniqueVoidPtr (#16513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16513

compare_exchange_deleter makes it easier to replace a
deleter on a DataPtr with a new one, without requiring
allocating another closure to hold the old deleter.
See comment for details.

This diff was originally landed as part of D13762540
(#16226) but we are reverting that diff D13863610 (#16510)

Reviewed By: smessmer

Differential Revision: D13864245

fbshipit-source-id: 56eda4748238dd3a5130ba6434fda463fe7c690e
2019-01-31 17:40:04 -08:00
e4c1b51d82 Shim caffe2 GetRepeatedArgument helper for use with Ivalue (#16519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16519

GetRepeatedArguments is needed for some ops

Reviewed By: dzhulgakov

Differential Revision: D13864293

fbshipit-source-id: a39255cd391c28acd75a6f0e81d558542417e032
2019-01-31 17:33:57 -08:00
13422fca32 Add torch.backends.openmp.is_available(); fix some cmake messages (#16425)
Summary:
1. add `torch.backends.openmp.is_available()`
2. Improve various `cmake` outputs
3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets
4. Fix `MKL` warning message, and QUIET flag.
5. Fix various typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425

Differential Revision: D13903395

Pulled By: soumith

fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d
2019-01-31 16:15:46 -08:00
f660d3ae19 Move outplace ops to ATen (#12413)
Summary:
So that things like below can be JITable, and available in C++ API:

```python
import torch

torch.jit.script
def f(x, y, z):
    x.index_add(0, y, z)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12413

Differential Revision: D13899948

Pulled By: suo

fbshipit-source-id: b0006b4bee2d1085c813733e1037e2dcde4ce626
2019-01-31 16:09:45 -08:00
6e17f4a126 Grant credentials to s3 html update job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16631

Differential Revision: D13908331

Pulled By: pjh5

fbshipit-source-id: 846a4f933d947f7217b856bd79ff85b7f97288a8
2019-01-31 15:59:31 -08:00
d5d7718770 fix scope related naming issue in build_quant_conv_bn_relu, and also format function signature
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14885

Reviewed By: harouwu

Differential Revision: D13374077

fbshipit-source-id: 5082c4ea0d2fdc197243b022b9b489f38b04c8e9
2019-01-31 15:53:27 -08:00
51752e09c6 Disable layernorm_c10 test for now (#16630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16630

two PRs landed concurrently - enforcing tensor constraints and refactoring c10. Since it's not prod code, disable the test and let Sebastian fix it properly.

Reviewed By: ezyang

Differential Revision: D13908117

fbshipit-source-id: 381c5626078b794afa1fc7a95cb1ea529650424c
2019-01-31 15:47:13 -08:00
a386c28fcd Remove constant propagation expect files (#16348)
Summary:
Remove constant prop expect files, and express graph conditions via python bindings.

First diff in larger effort to remove expect files
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16348

Differential Revision: D13906929

Pulled By: eellison

fbshipit-source-id: 7963caa3ccbc7bfc0006a160c952aa173d1ce633
2019-01-31 15:41:22 -08:00
dfb081a7e4 Fix a lot of C++ build warnings (#16411)
Summary:
I went through my build log and did what I thought were reasonable fixes to all the C++ compilation warnings that came up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16411

Differential Revision: D13901006

Pulled By: jamesr66a

fbshipit-source-id: 02df4e3e5a5c8dd9e69ac9f065cd3f2a80645033
2019-01-31 14:35:56 -08:00
3f8fd19a86 Add immutable dict support (#16208)
Summary:
This PR adds basic support (creation and indexing) for immutable dictionaries in Script. This includes Python/string frontend support and a `IValue::GenericDict` type backed by a `std::unordered_map`. Only `str`, `int`, and `float` are supported as keys, any type can be a value. Structure is pretty similar to list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16208

Differential Revision: D13881686

Pulled By: driazati

fbshipit-source-id: 29ce9835b953c3456f57bcc2bbdf7fe0cbf941c0
2019-01-31 14:29:23 -08:00
4bdf51cbd6 Make the miopen handle part of ConvolutionParams (#16613)
Summary:
so that it's included in the hashed key that decides whether to call Find or not. This is required to ensure that Find is run for all devices
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16613

Differential Revision: D13901769

Pulled By: bddppq

fbshipit-source-id: 7d29ea9e40231cd4eef80847afa1307efeb0945c
2019-01-31 14:09:04 -08:00
a061e3fd77 Back out "Revert D13596031: Improve c2-aten tensor interop and add proper testing" (#16514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16514

Original commit changeset: dc371697f14b
Relanding https://github.com/pytorch/pytorch/pull/15860 - the problem was that layer_norm was using at::empty which is not yet on mobile

Reviewed By: ezyang

Differential Revision: D13861480

fbshipit-source-id: e2116da32bc117175c96b9151b1beba9b31eff36
2019-01-31 13:38:55 -08:00
0b29bd82f6 use distutils to discover msvc compiler paths (#16540)
Summary:
This simplifies the process for building on windows, since users no longer have to find and run the vcvarsall.bat file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16540

Differential Revision: D13893596

Pulled By: zdevito

fbshipit-source-id: 79b7ad55c3251b3f573fd8464931138f8a52dd1d
2019-01-31 13:25:33 -08:00
1ff46f03ed Fix SIOF in torch using caffe2 registry (#16473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16473

This resolves the issues associated with caffe2 initialization (specifically the REGISTER_FUNCTION_SCHEMA_OPERATOR calls) being run after Torch's static op registration calls.

The fix employs a Meyers singleton wrapped by the constructor of a type. Everything is placed inside a macro to make it easier for users to use.

Reviewed By: smessmer

Differential Revision: D13854306

fbshipit-source-id: ecf60861f229532826fae254974e9af4389055df
2019-01-31 13:04:11 -08:00
1efad7f6be Swap Caffe2 operator constructor to pass arguments by value (#16576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16576

allows instantiation of operator with arguments passed by move rather than explicit copies

per Sebastian's suggestion

Reviewed By: smessmer

Differential Revision: D13882416

fbshipit-source-id: bc8d50e73f5a1ae87155b0cf96799b8573a7a8fa
2019-01-31 13:04:09 -08:00
26565046ac Allow ScriptModule(optimize=False) when jit disabled (#16297)
Summary:
Fixes #16285
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16297

Differential Revision: D13797276

Pulled By: driazati

fbshipit-source-id: 3a93500d4233cfbb8f5af7feba43f6ff4c3d22c7
2019-01-31 12:29:15 -08:00
20d45c43d7 Get more fusion after autodiff uses SumToSize (#14957)
Summary:
Here is a fresh attempt at getting some fusion back in autodiff-generated graphs in the presence of SumToSize.

- The sum to size operator is now  `aten::_grad_sum_to_size` to allow symbolic script differentiation (and that in turn would need to use this in place of sum_to_size to signal that it strictly operates on gradients). This is also used in the autodiff code, replacing `prim::SumToSize`.
- `_grad_sum_to_size` is now fusable, `cat`s - which are fused afterwards thanks to Adam's simplification of the code - are only fused if there is no `_grad_sum_to_size` in the fusion group.
- I push the `_grad_sum_to_size` out of the the fusion group when compiling and record the desired summations in the KernelSpec. The reasoning is the following:
  - As the autodiff is a repeated application of the chain rule, we always have the pattern `grad_in = mm(A, grad_out)`, with A often diagonal for cases interesting to the fuser, whence it is `grad_in = a * grad_out` (a pointwise multiplication). We know that only `grad_out` may have AutodiffGradSumToSize applied, so we can commute AutodiffGradSumToSize with the `mul` (and `div` and `neg` are of similar origin).
  - For `type_as` the gradient might be giving the type, so just skip SumToSize,
  - `add` (which was inserted as `prim::AutogradAdd`) adding gradients when the forward used the same value in several places. This is non-broadcasting, so we know that the two arguments would have the same sizes as inputs - which is good so we don't have to do bookkeeping of the two parts.

Details:
- During fusion, the Tensor arguments are always kept as the first parameters of the fusion group to accommodate indexing assumptions in the fuser.
- The rewriting of the fusion group to record the necessary output transformation and eliminate `_grad_sum_to_size` from the fusion group is now in the fuser compile step.
- In the execution step, the arguments are split into Tensor / Non-Tensor and the non-tensor args are mostly forgotten about except for doing `sum_to_size` at the end. This would want to be improved if/when we fuse nonconstant scalar arguments.
- In a number of places in the fuser, the non-Tensor arguments to the fusion group needed to be ignored.

Thank you, apaszke for the insightful discussion. All bad ideas and errors are my own.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14957

Differential Revision: D13888173

Pulled By: zou3519

fbshipit-source-id: 071992c876e8b845f2b3e6329ae03a835d39a0ea
2019-01-31 12:24:38 -08:00
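The bookkeeping described above ultimately reduces to deciding which gradient dimensions must be summed so a broadcast gradient shrinks back to the input's original shape. A shape-only sketch of that rule (the function name is illustrative, not PyTorch API):

```python
def sum_to_size_dims(grad_shape, target_shape):
    """Return the dims of `grad_shape` to sum so the result has
    `target_shape`: extra leading dims are summed away entirely, and
    dims the target kept at size 1 are summed with keepdim semantics."""
    lead = len(grad_shape) - len(target_shape)
    dims = list(range(lead))  # dims broadcasting added in front
    for i, (g, t) in enumerate(zip(grad_shape[lead:], target_shape)):
        if t == 1 and g != 1:
            dims.append(lead + i)  # dim was expanded from size 1
    return dims
```

For example, a `(4, 3, 5)` gradient flowing back to a `(3, 1)` input sums over dims 0 and 2; a gradient that already matches its input sums over nothing.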
4b7e70067c Enable USE_NINJA in build_pytorch_libs.py if it is in PATH (#16545)
Summary:
It is required to fix the nightly conda builds.
cc zdevito ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16545

Differential Revision: D13900610

Pulled By: soumith

fbshipit-source-id: 676f903a082f6f083e07245a1df38175bb82b2f7
2019-01-31 11:57:11 -08:00
b109549bf3 Replaced "from_numpy" with "as_tensor" in docs. (#16587)
Summary:
In the warning box on https://pytorch.org/docs/stable/tensors.html#torch.Tensor.new_tensor it says:

> new_tensor() always copies data. [...] If you have a numpy array and want to avoid a copy, use **torch.from_numpy()**.

But then further up the page we have another warning box with the message:

> torch.tensor() always copies data. [...] If you have a numpy array and want to avoid a copy, use **torch.as_tensor()**.

Now I believe this is just a small oversight, since from_numpy is to be deprecated in favour of as_tensor. See for example https://github.com/pytorch/pytorch/issues/6885 and https://github.com/pytorch/pytorch/issues/8611. I suggest just using **torch.as_tensor()** in both of the warning boxes.

cc gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16587

Differential Revision: D13897038

Pulled By: gchanan

fbshipit-source-id: 2eb3cd47d2c0b5bf4350f980de3be9fe59b4a846
2019-01-31 11:51:32 -08:00
482d3a3bf3 printing correct dimension while indexing (#16495)
Summary:
applySelect modifies the tensor and removes the topmost dimension, which makes it complicated to track using `dim` alone; another parameter, `real_dim`, is needed to signify the original dimension.
fixes #16192
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16495

Differential Revision: D13897182

Pulled By: gchanan

fbshipit-source-id: 105581dbbff6b431cc8e2539a07e0058161e53a1
2019-01-31 11:45:56 -08:00
32daa90fbd remove unused capture (#16526)
Summary:
We don't use this in the lambda body anymore. Remove it to fix a warning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16526

Differential Revision: D13867043

Pulled By: umanwizard

fbshipit-source-id: 4c9a9d194fdfcb63fde16823517d2c6c8e2ae93d
2019-01-31 11:11:55 -08:00
72a431edce split up AliasTracker into a separate file (#16588)
Summary:
This just moves thing around to make AliasTracker independently testable and keep things a little more separate. Follow-on PRs will change the interfaces of AliasDb and AliasTracker to be more clearly distinct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16588

Differential Revision: D13891894

Pulled By: suo

fbshipit-source-id: c5b590b5fdd462afefe743e499034068bf35784a
2019-01-31 10:53:53 -08:00
e7e3838f3b Access profiler from cpp (#16580)
Summary:
jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16580

Differential Revision: D13891299

Pulled By: zdevito

fbshipit-source-id: 83b335bf3231a9ab30e9318f2bce6d741ba5ffae
2019-01-31 10:37:47 -08:00
d2861230f3 Fix cuFFT plan cache size on CUDA 10 cannot be set to > 4096 (#16384)
Summary:
Doc doesn't need to be changed. Also clarifies two inaccurate comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16384

Differential Revision: D13886637

Pulled By: soumith

fbshipit-source-id: 227385008211a6f3ad9135c54fd2d3754cc9daaf
2019-01-31 06:56:39 -08:00
d47108add0 Clean up binary jobs in CircleCI (#16511)
Summary:
- Add libtorch upload jobs
- Unify checkout and env code for binary jobs (san binary test jobs)
- Compress variables passed into binary jobs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16511

Differential Revision: D13893714

Pulled By: pjh5

fbshipit-source-id: b8bd72e1397dec569a8ec3e859e319178c7c6f8b
2019-01-30 23:46:58 -08:00
8b053fccc7 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 36c332beab1aaccb281d5ee07952d399056b7f8c
2019-01-30 23:37:23 -08:00
db121375e7 more careful use of inline/template function in perfkernels (#15388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15388

This is another pass to make the perfkernels code safer from illegal instruction errors.
Removed the dependency on c10/util/Logging.h.
We err on the safe side at the expense of some verbosity.

Reviewed By: dskhudia

Differential Revision: D13502902

fbshipit-source-id: 4f833115df885c5b4f8c1ca83b9badea1553f944
2019-01-30 22:49:37 -08:00
26200ebf56 Updating submodules
Reviewed By: zpao

fbshipit-source-id: a0a2a635f86ef3bebfb4ca1a36f7ec9c2b09d7bb
2019-01-30 21:12:02 -08:00
d3742603cb DeviceScope support for CUDA and testing (#15357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15357

Supporting device option in FQ bn folding for ITER related ops

Reviewed By: wat3rBro

Differential Revision: D13370259

fbshipit-source-id: 4324c2716dfa69ddedc661ae2b1ad34c2f6fc4b6
2019-01-30 18:42:12 -08:00
a44826e659 Fix: avoid race condition on model zoo directory creation (#16578)
Summary:
The current implementation of the `torch.utils.model_zoo.load_url`
function is prone to a race condition when creating the directory in
which it saves the loaded models, since it checks whether the
directory exists and then creates it in two separate steps. The
directory can be created after the check was made but before we
attempt to create the directory, resulting in an unhandled exception.

Instead, try to create the directory directly, and do nothing if it
already exists.

Note: for Python versions ≥ 3.2, we could simply use the
`exist_ok=True` flag on `os.makedirs`, but this is unavailable in
Python 2.7.

Signed-off-by: Antoine Busque <antoine.busque@elementai.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16578

Differential Revision: D13886470

Pulled By: soumith

fbshipit-source-id: 88815c8a65eec96caea32d6e9a7f83802502fdb9
2019-01-30 18:35:45 -08:00
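The pattern in this fix — attempt the creation and tolerate EEXIST, instead of check-then-create — can be sketched directly (the function name is illustrative; on Python >= 3.2, `os.makedirs(path, exist_ok=True)` achieves the same thing):

```python
import errno
import os
import tempfile

def makedirs_exist_ok(path):
    """Create `path`, tolerating a concurrent creation by another process.
    Unlike `if not os.path.exists(path): os.makedirs(path)`, there is no
    window between the check and the creation for a race to slip into."""
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise  # a real error (permissions, bad path, ...)

model_dir = os.path.join(tempfile.mkdtemp(), "models")
makedirs_exist_ok(model_dir)
makedirs_exist_ok(model_dir)  # second call is a no-op instead of raising
```

Any genuine failure (e.g. permission denied) is still re-raised; only the "already exists" case is swallowed.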
bc53805f2e Remove redundant declarations (#16463)
Summary:
As there are no checks that all the functions are actually being used, we can end up with stale entries. This diff removes unused entries from Declarations.cwrap

Testing:
Successful build via "python setup.py develop"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16463

Differential Revision: D13885815

Pulled By: izdeby

fbshipit-source-id: 4e35c2ac9196167af74dff3d4f971210721285f8
2019-01-30 18:29:00 -08:00
3ba6f55ae3 begin splitting up cpp tests (#16536)
Summary:
Start splitting up these tests so we don't have a massive test file. Doesn't change how you run them, since `gtest.cpp` and `no-gtest.cpp` will still collect everything.

Renamed `tests.h` to `test_misc.h` to vaguely discourage people from adding yet more stuff to it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16536

Reviewed By: zdevito, eellison

Differential Revision: D13882215

Pulled By: suo

fbshipit-source-id: 61cf97f3c2c50703dcf6a3a34da01415ecb7e7d6
2019-01-30 17:58:54 -08:00
0ef9569841 Use dispatch tensor for device_guard instead of first Tensor argument
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16579

Differential Revision: D13886593

Pulled By: cpuhrsch

fbshipit-source-id: 0722ec61da13c2541f7de51bf5c1ecfb9a12fad2
2019-01-30 17:30:24 -08:00
fc2d8c6889 Eliminate PYCMD in favor of PYTHON_EXECUTABLE in CMake.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16522

Differential Revision: D13867376

Pulled By: resistor

fbshipit-source-id: 6bce68facea83c5161a31fcdfafe08827999eb2b
2019-01-30 17:13:43 -08:00
16e2e4f29f added example to clear ambiguity in torch.Tensor.view (#16563)
Summary:
Added example to the documentation of [torch.Tensor.view](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view) to avoid the misunderstanding referenced in issue [#16560](https://github.com/pytorch/pytorch/issues/16560)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16563

Differential Revision: D13885008

Pulled By: soumith

fbshipit-source-id: b7e7fbea1f16124bc4e679ae9c50ab619e1f043d
2019-01-30 16:53:31 -08:00
851437dd4b Fix uninitialized data and broken broadcasting with sparse.mm and spa… (#16572)
Summary:
…rse.addmm.

Fixes https://github.com/pytorch/pytorch/issues/16543.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16572

Differential Revision: D13884235

Pulled By: gchanan

fbshipit-source-id: 308916051364d72f72ec56f0495c6c7c09845131
2019-01-30 16:08:50 -08:00
33f2ab1fdb add new build files to gitignore; test that build does not leave git repo checkout dirty (#16565)
Summary:
These appear when I run
```
MACOSX_DEPLOYMENT_TARGET=10.13 CC=clang CXX=clang++ NO_CUDA=1 NO_DISTRIBUTED=1 BUILD_CAFFE2_OPS=0 DEBUG=1 python3 setup.py develop --cmake
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16565

Differential Revision: D13885790

Pulled By: ezyang

fbshipit-source-id: af0e028d7fa7832a945aaee4e241ceb5418f4ec8
2019-01-30 15:19:11 -08:00
c653fa2b00 Move Deprecated.h to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16504

Reviewed By: smessmer

Differential Revision: D13860570

fbshipit-source-id: 4742dc30c78d49b0f655b4e9536f51b36a595636
2019-01-30 14:26:37 -08:00
18659e1336 Allow generic containers as module inputs (#16482)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16326

Previously we didn't handle module inputs which included Generic Lists. When checking whether a generic list is a subvalue of the input arg type, I currently recurse on every element of the list. This shouldn't be too slow since the innermost list will be specialized and we won't have to check its elements.

E.g. `Tensor[][]` -> `GenericList[TensorList]`.

The error message could be improved, but extracting the complete type of nested lists would have to deal with unifying types across lists / empty lists & typevars so I'm going to save that for a follow up PR.
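The recursive per-element check described above can be sketched in Python (function name hypothetical; the real implementation is C++ inside the JIT):

```python
def is_subvalue_of(value, elem_check):
    """Recursively check that every element of a (possibly nested) list
    satisfies elem_check, mirroring the per-element subtyping check."""
    if isinstance(value, list):
        return all(is_subvalue_of(v, elem_check) for v in value)
    return elem_check(value)
```

As noted, only the outermost levels need traversal, since the innermost list is specialized.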
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16482

Differential Revision: D13882582

Pulled By: eellison

fbshipit-source-id: 3609bc572f0ee9ebf20a77ea5ebc8fa3b165e24b
2019-01-30 14:20:56 -08:00
22e9c3055a Explicit pdist captures (#16286)
Summary:
Per discussion with cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16286

Differential Revision: D13883001

Pulled By: erikbrinkman

fbshipit-source-id: 86f35d35fde5db67e3fbb09abc418da0116c9aac
2019-01-30 14:02:36 -08:00
1905bbb01d Include ATen/core/functional.h directly instead of torch/csrc/utils/functional.h. (#16377)
Summary:
One more shim removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16377

Differential Revision: D13821816

Pulled By: ZolotukhinM

fbshipit-source-id: 007f014d404de51841437db7eef28367a2f6e46b
2019-01-30 14:02:34 -08:00
b28f0f9d37 Remove --no-update-dependencies (#16575)
Summary:
Absolutely no idea why this is needed. This should be a valid argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16575

Differential Revision: D13884796

Pulled By: pjh5

fbshipit-source-id: 6011e721e2870499f6b5e627d5ad00ece08b530b
2019-01-30 13:53:51 -08:00
3ab736b774 Update PyTorch DockerVersion to 285. (#16507)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16507

Differential Revision: D13884588

Pulled By: ezyang

fbshipit-source-id: b7e22daa15874f9a226195d4749b4f9f827d7c1e
2019-01-30 13:29:25 -08:00
2ed5569bd6 Support fallback for more operators (#16566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16566

it's a follow-up to https://github.com/pytorch/pytorch/pull/16456

Reviewed By: yinghai

Differential Revision: D13881462

fbshipit-source-id: eff063580ac8f622477417ed4b25320299451811
2019-01-30 13:21:20 -08:00
307c83b5eb fix the linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16567

Differential Revision: D13882166

Pulled By: houseroad

fbshipit-source-id: daf760f51e4fce376ca09421900405970d00c4d2
2019-01-30 13:16:49 -08:00
c43917b0a3 Add a test case calling caffe2 layer_norm from caffe2 executor but through the c10 dispatcher
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16283

Reviewed By: ezyang

Differential Revision: D13792591

fbshipit-source-id: 9c190649e38e8706549102b2e136ceaf508eb37f
2019-01-30 13:16:47 -08:00
2af95d8e3e Back out "[pt1][tensor] Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize" (#16516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16516

Original commit changeset: 64abce3dbaed

Reviewed By: dzhulgakov

Differential Revision: D13863715

fbshipit-source-id: f1923fdca4a1a82768d9c280a8493ff15a7eb2ba
2019-01-30 12:50:38 -08:00
cdbd388206 Remove the debugging info of pytorch=>onnx coverage script (#16538)
Summary:
Remove the debug info.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16538

Reviewed By: houseroad

Differential Revision: D13872068

Pulled By: zrphercule

fbshipit-source-id: 7572668d0048c37f6b6029a48e5ae4b8b21823f7
2019-01-30 11:40:28 -08:00
a7796bc24d CUDA histogram implementation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15842

Reviewed By: zou3519

Differential Revision: D13868982

Pulled By: jaciefan

fbshipit-source-id: bce81dc121c4538d204047506f8f14d0b4d8f905
2019-01-30 11:36:20 -08:00
dc84ff1e5a Use a points-to graph for alias analysis (#16386)
Summary:
This PR changes the way we store aliasing information from a "set" approach to a "points-to" analysis. Set-based approaches lose information in ways that make it difficult to do "live" updates to the alias DB as one is mutating the graph.

The tradeoff is that simple queries get more expensive, since they require traversing the points-to graph to answer most questions. In practice, this is unlikely to be that costly since we don't have massive aliasing chains, but we could create an approximation/caching layer if this becomes a problem.

My rough plan is:
1. This PR, switching to a points-to graph
2. Make it "live": analyzing a node should record all the edges the node added, so that we can rollback when the node is destroyed.
3. Reduce wildcard scope: we can make the wildcard a special vertex that points to anything that we're not "sure" about; namely, things that have been put inside lists, or graph inputs.
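The tradeoff described above can be illustrated with a minimal Python model of a points-to graph (class and method names are hypothetical; the actual AliasDb is C++): a may-alias query traverses the graph rather than comparing precomputed sets.

```python
from collections import defaultdict

class PointsToGraph:
    """Toy points-to graph: values point to the values they may alias."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_points_to(self, a, b):
        self.edges[a].add(b)

    def _reachable(self, v):
        # Simple queries now require traversal, as the PR notes.
        seen, stack = set(), [v]
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(self.edges[x])
        return seen

    def may_alias(self, a, b):
        # Two values may alias if their points-to closures intersect.
        return bool(self._reachable(a) & self._reachable(b))
```

Because edges are explicit, rolling back a destroyed node (step 2 of the plan) amounts to removing the edges it added.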
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16386

Differential Revision: D13855117

Pulled By: suo

fbshipit-source-id: f009f58143173c275501624eb105d07ab60fe5e1
2019-01-30 11:28:03 -08:00
dff8165d04 ONNX Export Flatten operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16240

Reviewed By: bddppq

Differential Revision: D13800025

Pulled By: houseroad

fbshipit-source-id: ae4c5e42026477b28ffd44eda2438d93936ea510
2019-01-30 11:05:00 -08:00
68620cdcb5 Revert D13880053: [pytorch][PR] add new build files to gitignore; test that build doesn't leave repo dirty
Differential Revision:
D13880053

Original commit changeset: 0171f42438ef

fbshipit-source-id: a734f8704c1cbe16434c672289c505b19b2b490a
2019-01-30 11:04:58 -08:00
34b43baeec Allow list and tuples to be passed as output_size to max_unpool1d (#16489)
Summary:
Changelog:
- Modify concatenation of [1] to a tuple by using cases for list and non-list types.
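The case split can be sketched as follows (hypothetical helper name; the actual fix lives in the functional wrapper for `max_unpool1d`):

```python
def append_trailing_one(output_size):
    """Append a trailing 1 to output_size, accepting either a list or a
    tuple instead of assuming list concatenation works."""
    if isinstance(output_size, list):
        return output_size + [1]
    return tuple(output_size) + (1,)
```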
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16489

Differential Revision: D13875838

Pulled By: soumith

fbshipit-source-id: fade65cc47385986b773b9bde9b4601ab93fe1cf
2019-01-30 11:00:34 -08:00
b1b00f329e Fix the flake8 linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549

Reviewed By: bddppq

Differential Revision: D13877435

Pulled By: houseroad

fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540
2019-01-30 09:36:00 -08:00
3b91df3744 add example multiprocess code (#16345)
Summary: fixes #16141

Differential Revision: D13868539

Pulled By: ailzhang

fbshipit-source-id: 03e858d0aff7804c5e9e03a8666f42fd12836ef2
2019-01-30 09:35:58 -08:00
fa717cba63 Support int64_t shape data for ideep reshape op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16533

Reviewed By: jerryzh168

Differential Revision: D13867402

fbshipit-source-id: ff53a851f142ef915ad69da3868bb3aab4d48987
2019-01-30 09:00:09 -08:00
2d2eb7145a add new build files to gitignore; test that build doesn't leave repo dirty
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16441

Differential Revision: D13880053

Pulled By: ezyang

fbshipit-source-id: 0171f42438efdd651b6af22e521b80e85b12681c
2019-01-30 08:41:59 -08:00
7d7855ea31 Fallback support for more operators (#16456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16456

Adding fallbacks for more operators and fixing ifndef for expand_op.h

Reviewed By: yinghai

Differential Revision: D13845382

fbshipit-source-id: b7c5b7f7f176707b9ddffade139562a8085967ed
2019-01-30 03:54:11 -08:00
21907b6ba2 Fix the dropout onnx symbolic, and ensure all exported models in test_operators.py are eval mode (#16547)
Summary:
In eval mode, skip dropout operator in onnx exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16547

Reviewed By: houseroad

Differential Revision: D13877136

Pulled By: dzhulgakov

fbshipit-source-id: c366da156f83677bcf4989b79166aae5b6c36125
2019-01-30 01:16:21 -08:00
598b713660 Separate level1 elementwise functions from math (#16397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16397

Separate level1 elementwise functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13830626

fbshipit-source-id: e6e672647076dab8b3b24be181f580a1486250c9
2019-01-30 00:04:12 -08:00
ed4776820a Fix includes for ATen/core/stack.h (#16462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16462

This file was moved, now we change the includes to the new location and remove the proxy header.

Reviewed By: ezyang

Differential Revision: D13847279

fbshipit-source-id: 4617d52fdcfe785cb7b2154460a6686c437abd8b
2019-01-29 23:33:13 -08:00
7c66ad7455 Add test case for calling c10 ops from pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16062

Reviewed By: ezyang

Differential Revision: D13628955

fbshipit-source-id: f6ed3f07db2675bd9ae9251da990ca7b8c963717
2019-01-29 18:22:52 -08:00
12f92f453b Kernel gets Stack* instead of ArrayRef<IValue> (#16282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16282

This changes the core kernel abstraction to be a function taking a stack, popping its arguments from the stack and pushing results to the stack,
instead of getting arguments as ArrayRef<IValue> and returning an output IValue.

Caffe2 operators need to have a way to pass in preallocated output tensors.
The convention for them is to get all inputs *and* outputs on the stack and also return all of them, i.e. a caffe2 op will always have inputs == outputs.
This will probably change in later diffs towards making the outputs in-arguments optional in the JIT schema.
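The stack calling convention can be modeled in Python (a toy sketch; real kernels are C++ functions operating on a Stack of IValues):

```python
def add_kernel(stack):
    """Toy kernel in the stack convention: pop arguments from the stack,
    push results back onto it, instead of taking args and returning one."""
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)
```

Calling a kernel then amounts to pushing its arguments and reading results off the same stack, which is what lets caffe2 ops keep preallocated outputs on it.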

Reviewed By: ezyang

Differential Revision: D13792335

fbshipit-source-id: e9cc2b5e438cc4653e1f701633a154b92b604932
2019-01-29 18:22:51 -08:00
6249442e90 Chunk dataset implementation (#15932)
Summary:
This PR contains the implementation of chunk dataset, with the API proposed in PR https://github.com/pytorch/pytorch/pull/15562

A chunk dataset is derived from StatefulDataset. It utilizes worker threads to prefetch chunk data, split it into batches, and cache them into a queue. When get_batch is called from the dataloader, batch data is retrieved from the queue, and data from new chunks is pushed for the following batches.

Chunk dataset uses two samplers (chunk_sampler and example_sampler) to perform sampling. The chunk_sampler decides which chunk to load, and example_sampler shuffles the examples inside a specific chunk. More detail of this sampling approach can be found here: http://martin.zinkevich.org/publications/nips2010.pdf
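A stripped-down Python model of the prefetching described above (illustrative only; the real implementation is the C++ ChunkDataset with its two samplers):

```python
import queue
import threading

def prefetch_chunks(chunks, batch_size, num_workers=2):
    """Workers pull chunk ids, split each chunk into batches, and push the
    batches onto a shared queue, as the chunk dataset's prefetchers do."""
    batches = queue.Queue()
    chunk_ids = queue.Queue()
    for c in chunks:
        chunk_ids.put(c)

    def worker():
        while True:
            try:
                chunk = chunk_ids.get_nowait()
            except queue.Empty:
                return
            for i in range(0, len(chunk), batch_size):
                batches.put(chunk[i:i + batch_size])

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    out = []
    while not batches.empty():
        out.append(batches.get())
    return out
```

In the real dataset, get_batch consumes from the queue while workers keep refilling it, rather than draining it after a join.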
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15932

Differential Revision: D13868688

Pulled By: soumith

fbshipit-source-id: a43000c478ca2a3c64cc84b3626d6b8b1ad9a07e
2019-01-29 18:06:01 -08:00
21193bf123 try to get rid of tmp_install (#16414)
Summary:
Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/ include/ and lib/ alone), and then try to adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414

Differential Revision: D13863635

Pulled By: zdevito

fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
2019-01-29 17:29:40 -08:00
ffed8bff6a Fix torch.sparse.sum parsing of dim. (#16517)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16501.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16517

Differential Revision: D13865322

Pulled By: gchanan

fbshipit-source-id: fa0ac37a9e7b8f19a5bdf75e5771128e48c41612
2019-01-29 16:19:22 -08:00
f924fc6eb1 Make Store::setTimeout take milliseconds (#16278)
Summary:
This is particularly useful when using a c10d::Store from tests.

cc jgehring
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16278

Reviewed By: janewangfb

Differential Revision: D13866271

Pulled By: pietern

fbshipit-source-id: c8670b5f4ebd5cd009f2cabbe46cc17a9237d775
2019-01-29 16:15:25 -08:00
279238f0b8 Back out "Delete duplicate copy of THCCachingAllocator." (#16510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16510

This diff was supposed to be memory usage neutral, but based on
some internal flows involving cuDNN, it was not. Reverting pending
further investigation.

Original commit changeset: 03f1ebf7f11c

Reviewed By: xw285cornell

Differential Revision: D13863610

fbshipit-source-id: 15517e255fd6b0c064b65fb99f0ef19742236cfd
2019-01-29 15:44:19 -08:00
4f809397fd Fix compare_exchange_weak usage in weak_intrusive_ptr (#16302)
Summary:
In the case of spurious failure, refcount is not incremented -- which leads to underflow once all references are released.

This was discovered when exercising multiprocessing on ppc64le.
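The failure mode can be illustrated with a Python model of a weak compare-exchange retry loop (the real fix is in C++ atomics; names here are hypothetical):

```python
def incref_if_alive(slot, cas):
    """Retry loop for a weak CAS: a spurious failure must leave the
    refcount untouched and simply retry, never count as an increment."""
    while True:
        old = slot[0]
        if old == 0:
            return False  # object already released
        if cas(slot, old, old + 1):
            return True
        # Spurious (or real) failure: slot was not changed by us; retry.
```

Treating a spurious failure as success means a later decrement has no matching increment, which is exactly the underflow observed on ppc64le.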
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16302

Differential Revision: D13845435

Pulled By: ezyang

fbshipit-source-id: 8e264fff9dca8152cb12617e3216d5e48acd9557
2019-01-29 14:10:04 -08:00
719134f3c3 Automatic update of fbcode/onnx to 15c33c945851907411619f599900c3852108e7e3 (#16493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16493

Previous import was dc75285d4a1cff9618400164dfdb26c5a1bab70a

Included changes:
- **[15c33c9](https://github.com/onnx/onnx/commit/15c33c9)**: Add ppc64le build (#1768) <Chin Huang>
- **[198f840](https://github.com/onnx/onnx/commit/198f840)**: Update Broadcasting.md (#1769) <Verma-Rajat>
- **[60ac95f](https://github.com/onnx/onnx/commit/60ac95f)**: Merge back from release 1.4.1 (#1767) <Raymond Yang>
- **[a683372](https://github.com/onnx/onnx/commit/a683372)**: Bump up version number for v1.4.0 (#1761) (#1763) <Raymond Yang>
- **[dbf3581](https://github.com/onnx/onnx/commit/dbf3581)**: Add TfIdfVectorizer operator to ONNX (#1721) <Dmitri Smirnov>

Reviewed By: zrphercule

Differential Revision: D13858840

fbshipit-source-id: 1d00f63f265cc6deed965b92ed00c44f547ff03e
2019-01-29 13:48:49 -08:00
541ce96564 Make the pytorch's cmake minimum required version equal to caffe2's. (#16506)
Summary:
Stack:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **#16506 Make the pytorch's cmake minimum required version equal to caffe2's.** [💛](https://our.intern.facebook.com/intern/diff/D13861564/)

Originally authored by JerryShih <bignose1007@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16506

Differential Revision: D13863979

Pulled By: ezyang

fbshipit-source-id: 9275739a820ae03ec6eaa41959f9340c9bba8de3
2019-01-29 13:39:32 -08:00
3ab620926f More windows fixes towards the code refactor (#16451)
Summary:
Fixes #16446.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16451

Differential Revision: D13864388

Pulled By: soumith

fbshipit-source-id: 6cb173eafbd3da33c479c56c85aff75e8be4bf35
2019-01-29 13:15:36 -08:00
ded6fb0293 Add stack & cat support for CPU Half (#16389)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/6968

Needed for #14705
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16389

Differential Revision: D13861446

Pulled By: gchanan

fbshipit-source-id: 7b8700b95aaf252d9669693dbddccb2302e58409
2019-01-29 13:06:29 -08:00
d79e45bbba Add some smoke tests for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16496

Differential Revision: D13863489

Pulled By: soumith

fbshipit-source-id: 518003c27a6b788b5a78b58cdb8698f0bb6ce4d8
2019-01-29 12:54:39 -08:00
6a6983ed7f create type hint stub files for module torch (#12500)
Summary:
We have:

- This is an initial stab at creating a type stub `torch/__init__.pyi` .
- This is only tested on Python 3, since that's the only Python version mypy
  works on.
- So far, we only aim at doing this for torch functions and torch.Tensor.
- Quite a few methods and functions have to be typed manually. These are
  done in `torch/__init__.pyi.in`

For me, PyCharm (the non-paid one) didn't seem to indicate errors in the .pyi when opening and seemed to be able to get the type hint for the few functions I tried, but I don't use PyCharm for my usual PyTorch activities, so I didn't extensively try this out.

An example of a generated PYI is at [this gist](https://gist.github.com/ezyang/bf9b6a5fa8827c52152858169bcb61b1).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12500

Differential Revision: D13695553

Pulled By: ezyang

fbshipit-source-id: 4566c71913ede4e4c23ebc4a72c17151f94e8e21
2019-01-29 12:14:17 -08:00
3b337e7892 Revert D13596031: Improve c2-aten tensor interop and add proper testing
Differential Revision:
D13596031

Original commit changeset: d20b601e06ba

fbshipit-source-id: dc371697f14b3893a9164380a39e7a49d8d68ecf
2019-01-29 07:14:57 -08:00
bd19dd4b90 url download bugfix for URLs served without Content-Length header (#16153)
Summary:
Some HTTP servers don't return Content-Length; account for that.

Fixes: https://github.com/pytorch/pytorch/issues/16152
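A defensive read loop that works whether or not the header is present can be sketched as (a simplified model of the fix; the actual code lives in PyTorch's download helper):

```python
import io  # used by the usage example below

def read_body(resp, headers):
    """Read a response body to completion; Content-Length, when present,
    is only used to verify completeness, never assumed to exist."""
    length = headers.get("Content-Length")
    expected = int(length) if length is not None else None
    data = bytearray()
    while True:
        chunk = resp.read(8192)
        if not chunk:
            break
        data.extend(chunk)
    if expected is not None and len(data) != expected:
        raise IOError("incomplete download")
    return bytes(data)
```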

Differential Revision: D13858882

Pulled By: soumith

fbshipit-source-id: e4293e9368ed4c87548d22adec1ce0c25ea4bd8f
2019-01-29 01:28:47 -08:00
dbebb5322c Properly screen string literals when dumping JIT IR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16056

Differential Revision: D13719444

Pulled By: ZolotukhinM

fbshipit-source-id: 7113ee9328eff6263513476cdf9254a2e1116f4c
2019-01-29 00:26:37 -08:00
0e6123fb8a Remove dependency on ResourceGuard from IR.h. (#16351)
Summary:
It looks like `WithInsertionPoint` and `WithCurrentScope` can be easily implemented without
`ResourceGuard` - that helps readability and removes one more dependency. Is there anything I'm missing?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16351

Differential Revision: D13821826

Pulled By: ZolotukhinM

fbshipit-source-id: b203200b345fb5508a97dc8656e6f51cde4cc21f
2019-01-29 00:21:32 -08:00
862d466bef Remove redundant includes from scope.h and attributes.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16472

Differential Revision: D13852553

Pulled By: ZolotukhinM

fbshipit-source-id: d5634982c2c42e704d9902774a77660e05fd71eb
2019-01-28 23:47:15 -08:00
5e21e0fe75 Improve c2-aten tensor interop and add proper testing (#15860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15860

Few changes (which are harder to split in separate diffs, so together):
- make conversion explicit (as they can throw to avoid surprises)
- fix tensor legacy dispatch not initialized when tensor is created on C2 side
- add a bunch of invariants to enforce

Reviewed By: ezyang

Differential Revision: D13596031

fbshipit-source-id: d20b601e06ba47aeff2f6e8e15769840e2d46108
2019-01-28 23:41:50 -08:00
9d6be6ac09 Remove redundant "build" setup.py commond from onnx scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16487

Differential Revision: D13858628

Pulled By: bddppq

fbshipit-source-id: e1ff3fc5f9be5d3dbbf96ee73c3a8c901b440b82
2019-01-28 22:59:33 -08:00
7f552041ff Fix identifier shadowing in tracer (#16480)
Summary:
This was causing build failures under `-Werror` targets under optimized build modes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16480

Differential Revision: D13857621

Pulled By: jamesr66a

fbshipit-source-id: 2990b987dbca943298ad478c9ee2792236f5fa5b
2019-01-28 21:47:39 -08:00
f204e3e624 Pass WERROR to CMake as an explicit parameter rather than an env var.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16465

Differential Revision: D13853949

Pulled By: resistor

fbshipit-source-id: 71ccf90a2824ad21c9f26dd753b186f30435d82a
2019-01-28 20:57:18 -08:00
99fab45733 Remove redundant build from build develop instructions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16467

Differential Revision: D13849661

Pulled By: ezyang

fbshipit-source-id: d3d58bd31ac65ad9cbf0057b9a4c499c0f59d95a
2019-01-28 20:47:11 -08:00
52ca4b86db Change SetOutputSize in ConvTransposeUnpoolBaseOp (#16179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16179

to avoid passing a partially initialized Tensor around.

Reviewed By: ezyang

Differential Revision: D13744009

fbshipit-source-id: 4c545765e1cd164b3e87ce08ec4c1cb1e37e2b8f
2019-01-28 18:28:11 -08:00
5ebf4cd4e3 Move stack.h to ATen/core (#16247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16247

Stack is going to be used by the c10 dispatcher.
This just moves the file, also changing the namespace turned out to be more complicated than I thought, I'll leave the namespace for now.

Reviewed By: ezyang

Differential Revision: D13774189

fbshipit-source-id: 66aeee36425e0ea2b3a4f8159604f38572306d57
2019-01-28 17:46:10 -08:00
504bcb276c Remove state from schema and calling API (#16180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16180

Only the kernel knows about its state, the caller doesn't see it anymore.

Reviewed By: ezyang

Differential Revision: D13744071

fbshipit-source-id: cb00ff1a881508c1b36ac4123bee1f68ca02ca9c
2019-01-28 17:46:08 -08:00
cc2d49deb7 Remove generic_if.h. (#16354)
Summary:
The current uses of `IR_IF` are mostly trivial, so there is not much value in having special macros for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16354

Differential Revision: D13821823

Pulled By: ZolotukhinM

fbshipit-source-id: 1ca73111f5b4868fa38a1f29c9230540773e5de6
2019-01-28 17:02:23 -08:00
ed50bccb35 Remove CUDA_VERSION to flag and remove JOB_BASE_NAME from binary jobs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16470

Differential Revision: D13853387

Pulled By: pjh5

fbshipit-source-id: a2baccde65ab82b69380ee57b16e43cc80ed3e04
2019-01-28 16:52:11 -08:00
4eceb7a055 Fix cmake byte version issue in build_pytorch_libs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16457

Differential Revision: D13846408

Pulled By: gchanan

fbshipit-source-id: 26962bc12d7d9fdad71f9dd7526f6d32e6008295
2019-01-28 16:00:28 -08:00
ff963d4b9f Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize (#16273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16273

Previously we had SetOutputSize, which accepted a partially initialized output Tensor and set it to the correct size;
this diff changes it to GetOutputSize, which returns the correct size instead.
e.g.
```
auto* Y = Output(0);
ConvPoolOp<Context>::SetOutputSize(X, Y, channels);
...
Y->mutable_data<T>...
```
-->
```
auto sizes = ConvPoolOp<Context>::GetOutputSize(X, channels);
auto* Y = Output(0, sizes, at::dtype<T>());
```

Reviewed By: dzhulgakov

Differential Revision: D13736281

fbshipit-source-id: 64abce3dbaed0b375098463333dfd0ea5a3b1945
2019-01-28 15:56:34 -08:00
b076227b21 Move tracer impls into cpp file (#16410)
Summary:
Working on the tracer was really annoying because a lot of the implementations were in `tracer.h` and editing that file caused us to rebuild almost the whole world. So this moves all the implementations into tracer.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16410

Differential Revision: D13847776

Pulled By: jamesr66a

fbshipit-source-id: ec8500da32b2d4cd990f293a0a96101d3e82f158
2019-01-28 15:34:02 -08:00
1a77918955 fix alias annotations on to, cpu, cuda (#16460)
Summary:
Fix alias annotations for ops that may return a fresh tensor. The previous version was overly conservative.

Currently there is no actual behavior change in the alias analysis, but we may use the information in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16460

Differential Revision: D13849086

Pulled By: suo

fbshipit-source-id: cd23b314a800e5e077d866e74456d37a321439d5
2019-01-28 15:20:23 -08:00
e3c0926c44 Remove usage of deprecated "min_satisfying_examples" hypothesis setting (#16401)
Summary:
This setting has been deprecated in [hypothesis 3.56.0](d1b0df5b91/hypothesis-python/docs/changes.rst (3560---2018-04-17)) and recently has been removed in [hypothesis 4.x](d1b0df5b91/hypothesis-python/docs/changes.rst (400---2019-01-14)).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16401

Reviewed By: ezyang

Differential Revision: D13832528

Pulled By: bddppq

fbshipit-source-id: 04b9f1dfdf2dcfe0ef121dd02f7fbfdf6bf4aead
2019-01-28 14:17:10 -08:00
0ae45e30bc Support Tensor alias annotations for native_functions.yaml (#16239)
Summary:
Adds Tensor alias annotations.

This isn't a full implementation of alias annotations, but that isn't required to increase compliance with the JIT signature schema. There are some sanity checks within native_parse.py for their usage, which can also help overall correctness. Otherwise, this exists solely for further alignment between the JIT signature schema and the native_functions.yaml func schema.

This gets us to ~85% matches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16239

Differential Revision: D13804133

Pulled By: cpuhrsch

fbshipit-source-id: aa5750f2c7e0f08b8c35d6d8f38cb148e9629855
2019-01-28 13:57:25 -08:00
120c54743e Annotate the bicubic interpolation kernels (#16449)
Summary:
with the correct `__launch_bounds__` for ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16449

Differential Revision: D13844111

Pulled By: bddppq

fbshipit-source-id: 07ed8552a630f3a6426d9e5648c415f066991e3d
2019-01-28 13:47:28 -08:00
fb17be1368 Clear cmake cache when --cmake (#16426)
Summary:
Also, sometimes `CMakeCache.txt` exists even though cmake errored out, so I'm adding the existence of `build.ninja` as another criterion for rerunning cmake.
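The rerun decision reduces to a small predicate; a sketch with boolean inputs standing in for the flag and filesystem checks in the build script (names hypothetical):

```python
def should_rerun_cmake(cmake_requested, cache_exists, ninja_exists):
    """Rerun cmake when --cmake was passed, or when a previous run left a
    CMakeCache.txt behind without ever producing build.ninja."""
    return cmake_requested or not (cache_exists and ninja_exists)
```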
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16426

Differential Revision: D13843801

Pulled By: ezyang

fbshipit-source-id: ea1efb201062f23b7608f8d061997d8a8e293445
2019-01-28 13:43:17 -08:00
e866bc7c88 Remove dims() in caffe2::Tensor (#16356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16356

att

Reviewed By: dzhulgakov

Differential Revision: D13813197

fbshipit-source-id: 68c0fb43404536f622422c51949c819d8a037aa5
2019-01-28 12:42:42 -08:00
05678d0bfa Op-calling API can handle state (#16177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16177

Change the API for calling operators so that it can store state in an OpKernel object.
This diff doesn't store the state there yet, that comes in a follow up diff.

Reviewed By: ezyang

Differential Revision: D13742889

fbshipit-source-id: 20511a9a1b9f850074e50634d4b4acf87f8c6ecd
2019-01-28 11:46:05 -08:00
80f4374dde Handle stack correctly (#16246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16246

The op schema says it returns multiple values, so let's actually return multiple values instead of one tuple.
For some reason, this did work when called from python (probably some auto-unpacking),
but once called from JIT, it segfaulted. This diff fixes that.

Reviewed By: dzhulgakov

Differential Revision: D13780147

fbshipit-source-id: fe94f82f4c53b7454f77c4484fca4ac9dc444475
2019-01-28 11:46:03 -08:00
c7547dbd5e Fix compiler error in swapBytes64 for rare architectures (#16418)
Summary:
swapBytes64 used to use SwapByteOrder_32 and `value`, neither of which exists. This commit rewrites that part from scratch.
This happened in a debug build with the Microsoft compiler. For that case " && !defined(_DEBUG)" is also removed, because _byteswap_uint64 works fine in debug mode (if it is necessary, it should be commented why).
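For reference, the behavior the intrinsic provides (and that the rewritten fallback must match) can be expressed compactly in Python:

```python
def swap_bytes_64(value):
    """Byte-reverse a 64-bit unsigned integer, the operation provided by
    _byteswap_uint64 on MSVC or __builtin_bswap64 on GCC/Clang."""
    return int.from_bytes(value.to_bytes(8, "little"), "big")
```

Swapping twice is the identity, which is a cheap sanity check for any handwritten fallback.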
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16418

Differential Revision: D13843306

Pulled By: ezyang

fbshipit-source-id: dde1c7baeccec3aaa750d4b7200b3f4ccb4a00cb
2019-01-28 11:38:07 -08:00
17d7818578 Fix lint errors introduced in pytorch/pytorch@ceece5d (#16454)
Summary:
ifedan

```
./test/common_utils.py:748:1: E302 expected 2 blank lines, found 1
./test/test_torch.py:1235:5: E303 too many blank lines (2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16454

Differential Revision: D13844905

Pulled By: bddppq

fbshipit-source-id: 3dc7c740d86310a8efc9864d7c7798fda8257a21
2019-01-28 11:29:11 -08:00
17e3ab957a Report the slowest 10 tests when using pytest (#16423)
Summary:
This flag is useful in identifying if a test is taking way too long like the ones in the following snippet when running the test suite with pytest. 9757ad35b0/test/common_utils.py (L814-L835)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16423

Differential Revision: D13843507

Pulled By: ezyang

fbshipit-source-id: 643e1766a85905b3b112ea5ca562135a17896a72
2019-01-28 10:33:05 -08:00
0a2d14dd7c Optimize SpatialBNOp on GPU (#16395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16395

Optimize SpatialBNOp on GPU

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13829833

fbshipit-source-id: 04d2a63e8e9830c4c39a91cf87fcd7aa765dc55f
2019-01-28 09:36:45 -08:00
ceece5dd0f CPU implementation of torch.cdist (#16168)
Summary:
cdist is used for calculating distances between collections of observations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16168

Differential Revision: D13739147

Pulled By: ifedan

fbshipit-source-id: 9419c2c166891ac7db40672c72f17848f0b446f9
2019-01-28 09:16:32 -08:00
14138f4605 Don't initialize a new std::vector in a loop. (#15850)
Summary:
Before this diff, we execute `std::vector<optional<acc_t>> buffer((unsigned)max_threads, optional<acc_t> {});` in every iteration of `foreach_reduced_elt`. Change the code to only execute that line if we need it, i.e., when we are actually about to parallelize.

This overhead is quite significant when we are doing a lot of small reductions in single-threaded code.

```
x=torch.randn((1024,10,1024),dtype=torch.float64)
torch.set_num_threads(1)
%timeit x.std(1)
```

Before (with #15845 applied): 708.25 ms
After: 508 ms
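The general pattern of moving the allocation out of the hot loop can be sketched in Python (illustrative only; the actual change defers allocating the per-thread accumulator vector to the parallel path):

```python
def reduce_rows(rows, max_threads):
    """Allocate the per-thread buffer once, before the loop, rather than
    constructing a fresh one on every iteration."""
    buffer = [None] * max_threads  # hoisted: one allocation total
    results = []
    for row in rows:
        for i in range(max_threads):
            buffer[i] = None  # cheap reset instead of reallocation
        results.append(sum(row))
    return results
```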
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15850

Differential Revision: D13612960

Pulled By: umanwizard

fbshipit-source-id: f5e61abfe0027775c97ed81ac09c997fbee741df
2019-01-28 08:50:27 -08:00
d2cdffaf37 More documentation on caffe2::Operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16371

Reviewed By: dzhulgakov

Differential Revision: D13820472

fbshipit-source-id: efccea0e92c86d30ec2bdda50eb9aab8a3a1504d
2019-01-28 07:41:14 -08:00
fdaa77ae8b Better error message when creating a module instance in jit.script (#16416)
Summary:
Made the change requested in #15555

PR was failing build due to a time out error while getting packages using pip.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16416

Differential Revision: D13833873

Pulled By: soumith

fbshipit-source-id: e2200e9e8015558fcd359dfa3d025b25802d62b5
2019-01-27 16:29:46 -08:00
952a03ccea Fix issues on Windows brought by #16289 (#16412)
Summary:
This one needs to be merged ASAP because the CUDA build for Windows is skipped at this time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16412

Differential Revision: D13833889

Pulled By: soumith

fbshipit-source-id: 95a401a01fb0f9c1045df0bfd72d8206b8a6f3fd
2019-01-27 15:02:31 -08:00
2b6607065b Fix a typo in Parallel.h (#16419)
Summary:
Fix a typo in Parallel.h.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16419

Differential Revision: D13833705

Pulled By: soumith

fbshipit-source-id: 824ebe753e028fc8e2b5d7a51fdba98a365fd29a
2019-01-27 14:16:47 -08:00
ee18448138 Don't install PDB for Windows static build of caffe2_observers (#16420)
Summary:
Fixes #16292.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16420

Differential Revision: D13833704

Pulled By: soumith

fbshipit-source-id: 482ad6ce103bed7206e924e8c82454fbb1bfac42
2019-01-27 12:29:49 -08:00
c863a759a0 Fix slogdet sign requiring grad when input requires grad (#16337)
Summary:
The real fix for https://github.com/pytorch/pytorch/issues/15605.

This is sort of BC breaking because now
```py
In [1]: import torch

In [2]: a = torch.randn(3, 3, requires_grad=True)

In [3]: a.slogdet()
Out[3]: (tensor(1.), tensor(0.1356, grad_fn=<SlogdetBackward>))

In [4]: a.slogdet()[0].requires_grad
Out[4]: False
```
while before this patch `a.slogdet()[0]` requires grad with `grad_fn=<SlogdetBackward>`. But any attempt to backpropagate through this value would hit the error in #15605, so I don't think this is a problem.
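For intuition: the sign component is piecewise constant in the entries of the input, so its derivative is zero almost everywhere. A pure-Python sketch for a 2×2 matrix (hypothetical helper, not the ATen implementation):

```python
import math

def slogdet_2x2(a, b, c, d):
    # slogdet of [[a, b], [c, d]]: returns (sign(det), log|det|).
    # The sign is piecewise constant, so it carries no useful gradient --
    # which is why it need not require grad.
    det = a * d - b * c
    sign = (det > 0) - (det < 0)
    log_abs = math.log(abs(det)) if det != 0 else float("-inf")
    return sign, log_abs
```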
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16337

Differential Revision: D13832644

Pulled By: soumith

fbshipit-source-id: f96c477e99edcbdbd966888e5c5ea7fd058429a8
2019-01-27 12:11:14 -08:00
6944461a76 CI Fix: restore MAX_JOBS variable (#16415)
Summary:
Restores a CI workaround (https://github.com/pytorch/pytorch/pull/7361) that got dropped with build_pytorch_libs.sh.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16415

Differential Revision: D13833092

Pulled By: zdevito

fbshipit-source-id: f78b60cafd8da945790dba28de373b8faf46e9f5
2019-01-27 01:27:50 -08:00
3c30cf3237 Update einsum documentation. (#16323)
Summary:
The documentation stated that operands to einsum should be a list of Tensors, not individual arguments. The function, however, now accepts individual arguments for each Tensor operand *and* a single argument consisting of a list of Tensors. The documentation was updated to reflect this change.
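A minimal pure-Python sketch of the two call forms (the helper below is illustrative, not `torch.einsum` itself), specialized to the `'ij,jk->ik'` matrix-multiply case:

```python
def matmul_einsum(*operands):
    # "ij,jk->ik" only. Accepts either matmul_einsum(a, b) or
    # matmul_einsum([a, b]) -- the two calling conventions described above.
    if len(operands) == 1:
        operands = tuple(operands[0])
    a, b = operands
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][j] * b[j][k] for j in range(inner)) for k in range(cols)]
            for i in range(len(a))]
```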
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16323

Differential Revision: D13832647

Pulled By: soumith

fbshipit-source-id: c01c2b350f47674d3170337f493b0ee2ea381b3f
2019-01-26 18:00:57 -08:00
de6bb3f3e3 Fix flake8 warnings/errors in test_jit.py (#16409)
Summary:
These were really annoying to see in the phabricator UI when trying to land PRs that touched test_jit.py, so this fixes them.

One remaining item is the T484 error. Locally, flake8 still chokes on that line even though I put the noqa comment there (and tried varying whitespaces around it etc). Not sure why it still persists...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16409

Differential Revision: D13832658

Pulled By: jamesr66a

fbshipit-source-id: 46356ba6444ae5ee1a141c28489bdcc7c99e39c0
2019-01-26 17:42:08 -08:00
d1ed0176df Trace fork and join calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16232

Differential Revision: D13772974

Pulled By: jamesr66a

fbshipit-source-id: b2db370271809e26d3301f8cc98eec567db5e62b
2019-01-26 14:42:45 -08:00
8c81a72e87 Switch to CUDA implementation if batch size >= 65536 for affine_grid (#16403)
Summary:
Changelog:

- Append a condition that switches to the native CUDA implementation for affine_grid

Fixes #16365
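The dispatch condition, sketched (the 65536 threshold comes from the batch-size limit of the cuDNN path; the function and backend names here are illustrative, not the real ATen dispatch):

```python
CUDNN_MAX_BATCH = 65536  # batches at or above this go to the native kernel

def affine_grid_backend(batch_size, cudnn_enabled=True):
    # Mirror of the appended condition: fall back to the native CUDA
    # implementation once the batch size reaches the cuDNN limit.
    if cudnn_enabled and batch_size < CUDNN_MAX_BATCH:
        return "cudnn"
    return "native"
```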

Differential Revision: D13832192

Pulled By: soumith

fbshipit-source-id: 3f484e6673d71e3ba7627b170cb8f1611e12b9b2
2019-01-26 11:18:57 -08:00
f6e6b0fd33 gitignore gdb history
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16404

Differential Revision: D13832191

Pulled By: soumith

fbshipit-source-id: ab23d1ad72c041ec2d9616c273bbf399e0feb10d
2019-01-26 09:46:01 -08:00
41e9b092a9 Revert D13821061: [redo][c10] layernorm example
Differential Revision:
D13821061

Original commit changeset: 82f0dade0145

fbshipit-source-id: e5b0b1bab0c9e731ae04add35e9a6c91656dd178
2019-01-25 22:52:04 -08:00
f4e54fd659 trying to fix testX (#16370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16370

Passed locally, but testX seems to have some problem.

Reviewed By: ezyang

Differential Revision: D13820250

fbshipit-source-id: e4ad9d1ec99508867d4ead46753a7fb7019c50bd
2019-01-25 17:02:21 -08:00
27a1ba3ef2 layernorm example (#16374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16374

this fixes the original attempt in OSS (adds to CMake and python build files)

Reviewed By: smessmer

Differential Revision: D13821061

fbshipit-source-id: 82f0dade0145fd04bdf8e3cb3954b5790e918162
2019-01-25 16:52:33 -08:00
13fde345fb plug caffe2 into jit (#16388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16388

previous diff broke master -- this refactors out the custom_operator.cpp file into a separate header + cpp pair (caffe2_operator.{h,cpp})

Reviewed By: smessmer

Differential Revision: D13823550

fbshipit-source-id: 00e005e650336132d05aef97c1f0e5242ccad5ba
2019-01-25 16:52:32 -08:00
41acbb3b6b Enable centos pytorch rocm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14879

Differential Revision: D13821534

Pulled By: bddppq

fbshipit-source-id: 45151b880992f1efa83e29c4985a723374575506
2019-01-25 16:27:55 -08:00
9477a5d9c8 Remove bash from build (#16289)
Summary:
This commit removes the dependency on `build_pytorch_libs.sh` by moving the remaining functionality that is not expressible in cmake into python. Removing the indirection through bash also removes over 300 lines of environment munging code that is incredibly hard to understand because it passes a lot of secret parameters through `os.env`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16289

Reviewed By: ezyang

Differential Revision: D13821662

Pulled By: zdevito

fbshipit-source-id: d658d26925e3b1169ac1e3d44a159cf8a1f0d9b1
2019-01-25 16:03:53 -08:00
539894d70a Remove caffe2::ShareData (#16139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16139

Original commit changeset: 4b15a4c62995

Reviewed By: dzhulgakov

Differential Revision: D13677464

fbshipit-source-id: 1a644a88fac02b44feebac48ccc01bc72cc47edb
2019-01-25 15:39:11 -08:00
ca86d1f01d Trying a fix to anaconda logins on nightlies
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16387

Differential Revision: D13826227

Pulled By: pjh5

fbshipit-source-id: 769a53e40a4912879faf9716a80c0e0c86acdbf8
2019-01-25 15:19:29 -08:00
956cabd887 Update Documentation for Optionals (#16380)
Summary:
Now that https://github.com/pytorch/pytorch/pull/15587 has landed, updating docs.

Will close https://github.com/pytorch/pytorch/issues/15278
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16380

Differential Revision: D13825221

Pulled By: eellison

fbshipit-source-id: c5a7a7fbb40ba7be46a80760862468f2c9967169
2019-01-25 15:14:16 -08:00
c42431bd7a Revert D13740752: [c10] plug caffe2 into jit
Differential Revision:
D13740752

Original commit changeset: 2d9383574d42

fbshipit-source-id: e9ff217a438720423340a10af7fa263b33f2ae24
2019-01-25 12:29:19 -08:00
0e6791b275 Impl Shape op for mkldnn (#15266)
Summary:
Impl Shape op for mkldnn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15266

Differential Revision: D13804558

Pulled By: yinghai

fbshipit-source-id: 8a35f608c23973d7a15c3d645aee4059eb55f245
2019-01-25 11:04:57 -08:00
958f846fb3 Back out "[c10] layernorm example"
Summary: Original commit changeset: 87240ca7f48d

Reviewed By: bddppq

Differential Revision: D13816657

fbshipit-source-id: bafcf0779d811c7e4a134cfb323a89352fa8c180
2019-01-25 10:22:30 -08:00
f087c65a56 Add xla test in CI (#15978)
Summary:
Adding xla CPU tests in our CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15978

Differential Revision: D13816344

Pulled By: ailzhang

fbshipit-source-id: f74c52e846976ea4ac439313847908a0e99d05eb
2019-01-25 09:24:45 -08:00
45602ce9a2 Delete Tensor::swap(), replace with pointer swap (#12730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12730

i-am-not-moving-c2-to-c10

Reviewed By: smessmer

Differential Revision: D10415430

fbshipit-source-id: 8a2ce8611c5fa77bbbd73fb6788c1baa3b370f07
2019-01-25 08:25:07 -08:00
4aae89fa7b Make test_proper_exit more robust (#16249)
Summary:
1. Improve error message for better debugging info
2. Increase timeout
3. Also apply the Windows worker failure detection mechanism on non-Windows platforms, for better robustness

Attempt to fix #14501

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249

Differential Revision: D13784702

Pulled By: ezyang

fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f
2019-01-25 08:25:05 -08:00
ec2a7fa4d4 fix contbuild (#16362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16362

https://our.intern.facebook.com/intern/testinfra/diagnostics/281475065177800.844424930381786.1548397180/

Reviewed By: ezyang

Differential Revision: D13816639

fbshipit-source-id: 024117233f6d3bc6244013ca2ee1aea065560212
2019-01-25 08:25:04 -08:00
8683b75df6 Minor change of group_norm_gradient on GPU (#16307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16307

Minor change of group_norm_gradient on GPU

Reviewed By: houseroad

Differential Revision: D13800613

fbshipit-source-id: 9e55f93b1e322efe3fc2d684b9c47c3dbb7a0f48
2019-01-25 01:25:29 -08:00
52135e9b12 Revert D13551909: [fbcode] logdevice for generic feature type
Differential Revision:
D13551909

Original commit changeset: 807830c50bee

fbshipit-source-id: 48cacf4ec1765253a9be9d78f4b28cc48330be59
2019-01-25 00:33:06 -08:00
11a2b3799b logdevice for generic feature type (#16191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16191

logdevice related modifications for generic feature type

We directly convert the generic feature structures to JSON strings, which correspond to the column input in offline and dper.

Reviewed By: itomatik

Differential Revision: D13551909

fbshipit-source-id: 807830c50bee569de202530bc3700374757793a2
2019-01-24 23:33:19 -08:00
265ed8ff45 layernorm example (#16350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16350

Example usage of the new caffe2 integration

Reviewed By: smessmer

Differential Revision: D13408546

fbshipit-source-id: 87240ca7f48d653a70241d243aa0eb25efa67611
2019-01-24 22:28:22 -08:00
6d2aee4a9b plug caffe2 into jit (#16331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16331

Temporary measure to enable caffe2 ops in pytorch

Reviewed By: smessmer

Differential Revision: D13740752

fbshipit-source-id: 2d9383574d42ce84ee471aba32eeb4f5a0cc7a4c
2019-01-24 22:28:21 -08:00
d4b60f4014 Add RunOperator for using FunctionSchema registered ops easily in caffe2 (#16173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16173

Helper to make it easy to run ops in caffe2

Reviewed By: smessmer

Differential Revision: D13468240

fbshipit-source-id: 2276c7870af6dcdf829957f005fd16ac1ef319b5
2019-01-24 22:28:19 -08:00
3b6b777a11 Add correct Input() shim to caffe2 operator impl (#16048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16048

This enables full shimming of the operator (previously it was only
Output() shimmed).

Reviewed By: smessmer

Differential Revision: D13468241

fbshipit-source-id: c853b775ab5cdcd968f4a6cc4766e91c3c6b1c45
2019-01-24 22:28:18 -08:00
7ce634ebc2 Relax lower bound for nogil timing test to avoid false alarm (#16259)
Summary:
fixes #16250, #16271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16259

Differential Revision: D13784505

Pulled By: mrshenli

fbshipit-source-id: 0b7ad98cd3c018b9907d70158de3abc3c4cb57ef
2019-01-24 17:16:02 -08:00
c787de6284 Code-style fixes. (#16342)
Summary:
Some cleanups in ir.{h,cpp}. I plan to continue cleaning it up, so this is a first step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16342

Differential Revision: D13808897

Pulled By: ZolotukhinM

fbshipit-source-id: 2dedb414576c3efbf8e36434145d7f14a66b1ee7
2019-01-24 16:44:25 -08:00
6700eff03e disable testing group conv with EIGEN engine (#16335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16335

Group conv is not implemented with the EIGEN engine, so this diff disables the related tests.

Reviewed By: jamesr66a

Differential Revision: D13807204

fbshipit-source-id: 41f6de43da40882f57e64474520e185733caefb7
2019-01-24 16:39:20 -08:00
c2be9f1487 Remove unneeded manual unwrap optionals (#16245)
Summary:
Remove calls to torch.jit._unwrap_optional that are no longer needed.

The remaining instances would require control flow logic for exceptions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16245

Differential Revision: D13804292

Pulled By: eellison

fbshipit-source-id: 08c5cbe4b956519be2333de5cf4e202488aff626
2019-01-24 15:48:01 -08:00
f769cf999d fix buildindexop (#16341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16341

as in the title

Reviewed By: intermilan

Differential Revision: D13808679

fbshipit-source-id: 0d12d3253f380bec66bc9be899be565861b8163a
2019-01-24 15:31:54 -08:00
69b5ae4c54 Revert D13747581: Optimize SpatialBN on GPU
Differential Revision:
D13747581

Original commit changeset: 48a885a240ef

fbshipit-source-id: 58cec6023843d7459865eb80c9db8dac463cb96c
2019-01-24 15:26:37 -08:00
0b470d0a3b Add Test for ReinitializeTensor (#16338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16338

att

Reviewed By: ezyang

Differential Revision: D13806760

fbshipit-source-id: 322b9b7d314aeb0194f52b803ca35c0cb8efcdec
2019-01-24 15:05:21 -08:00
2a70f24cce Add thread-local guard: at::AutoNonVariableTypeMode (#15939)
Summary:
This PR adds thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return non-Variable type for a variable.

This is part of the VariableImpl/TensorImpl merge work: https://github.com/pytorch/pytorch/issues/13638.
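The shape of such a guard, sketched as a Python context manager (the real guard is a C++ RAII object; the names below mirror the PR but the implementation is illustrative):

```python
import threading

_tls = threading.local()

def non_variable_type_mode():
    # Query the thread-local flag that dispatch code would consult.
    return getattr(_tls, "enabled", False)

class AutoNonVariableTypeMode:
    # Thread-local RAII-style guard: while it is alive on this thread,
    # type lookup routes to the non-Variable type.
    def __enter__(self):
        self._saved = non_variable_type_mode()
        _tls.enabled = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore the previous state on scope exit, like a C++ destructor.
        _tls.enabled = self._saved
        return False
```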
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15939

Reviewed By: ezyang

Differential Revision: D13640980

Pulled By: yf225

fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f
2019-01-24 14:33:03 -08:00
f0dd85d141 reduce parameter space of test_1x1_conv to avoid timeout (#16223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16223

As title says

Reviewed By: jamesr66a

Differential Revision: D13758202

fbshipit-source-id: 3cdffb80a5dad53b29e65e8eb0ae128edba70dbb
2019-01-24 14:17:11 -08:00
fdda533eb1 Update docs to include variable annotation example (#16324)
Summary:
Relates to this issue https://github.com/pytorch/pytorch/issues/16288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16324

Reviewed By: ezyang

Differential Revision: D13805412

Pulled By: suo

fbshipit-source-id: 8b80f988262da2c717452a71142327bbc23d1b8f
2019-01-24 13:10:56 -08:00
792cb774f1 Delete duplicate copy of THCCachingAllocator. (#16226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16226

Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.

Reviewed By: dzhulgakov, smessmer

Differential Revision: D13762540

fbshipit-source-id: 03f1ebf7f11c68c19aa0d66110156fe228da6138
2019-01-24 12:06:57 -08:00
e936a69085 Move THCCachingAllocator to c10_cuda. (#16119)
Summary:
Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not.
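The word-boundary point can be checked directly with Python's `re` (`raw_alloc` is a hypothetical member name used for illustration):

```python
import re

# "::" is not made of word characters, so \b matches right after it;
# "_" is a word character, so no boundary exists inside an identifier.
assert re.search(r"\braw_alloc\b", "CUDACachingAllocator::raw_alloc")
assert not re.search(r"\braw_alloc\b", "THCCachingAllocator_raw_alloc")
```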

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119

Reviewed By: smessmer

Differential Revision: D13718768

fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656
2019-01-24 12:06:56 -08:00
24b50f1411 Remove unnecessary includes and headers from THCCachingAllocator, move to at::cuda:: namespace (#16117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16117

This means I can move it to c10_cuda with minimal fuss.

Reviewed By: smessmer

Differential Revision: D13717836

fbshipit-source-id: a94c7dc649af64542480fc1c226b289588886c00
2019-01-24 12:06:54 -08:00
47bf30661f Directly include headers from ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287

Differential Revision: D13792949

Pulled By: ZolotukhinM

fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3
2019-01-24 11:22:27 -08:00
af513cd433 Refactor the docs build workflow (#16265)
Summary:
In preparation for setting up a doc build job for stable docs, I wanted
to refactor the workflow so that future changes will be easier.

This PR the following changes:
- Refactor the doc push script into a reusable command
- Add command line options for the doc push script.
  These don't matter too much for now but will be useful
  for setting up future jobs for building different versions of the
  docs.
- Instead of checking out pytorch/pytorch:master, we re-use the pytorch
  installation inside the docker image.
- Change the sed in the script to a perl command. sed is annoyingly
  different across platforms; the perl command is more stable
- Run the script in dry run mode (without pushing the doc build)
  whenever a PR is opened. This lets us test changes to the doc build workflow.

Test Plan
- I tested the doc build script locally with my own credentials and it
  worked fine.
- Wait for the pytorch_doc_push CI.
- After merging this PR, keep an eye on the pytorch_doc_push CI status.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16265

Differential Revision: D13803511

Pulled By: zou3519

fbshipit-source-id: 4564bca3e74d490f89a1d1da9fb8b98eb44bdbb1
2019-01-24 11:18:57 -08:00
a4be15377f Save a little bit of work in constant pooling by not moving nodes that will get deleted.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16161

Differential Revision: D13791247

Pulled By: resistor

fbshipit-source-id: 2a5a4f98309509b4ba875373ee57e6f63c75a4fd
2019-01-24 10:59:57 -08:00
0cb24098c7 Handle non-contiguous inputs with mkldnn convolution. (#16300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16018.

Backwards appears to be fine because the derivative is written in terms of mkldnn_convolution itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16300

Differential Revision: D13797776

Pulled By: gchanan

fbshipit-source-id: 68a990b8a3c186412a99d176931314806c9ed7bf
2019-01-24 07:39:31 -08:00
45c3cc9174 Optimize SpatialBN on GPU (#16202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16202

Optimize SpatialBN on GPU

Reviewed By: houseroad

Differential Revision: D13747581

fbshipit-source-id: 48a885a240ef2a325235e8f89ebbe50e7c780c84
2019-01-24 02:55:10 -08:00
60241e94b3 optimize group_norm (#16216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16216

Optimize GroupNormOp

Reviewed By: houseroad

Differential Revision: D13754145

fbshipit-source-id: 650f64c81486c6c9d276f2e3325392d5838751ba
2019-01-23 23:57:45 -08:00
8ab4d348f4 Fix the tensor deserialization problem of jit script module on CUDA (#16279)
Summary:
Now we create a temporary tensor for the whole record.

Fix https://github.com/pytorch/pytorch/issues/15271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16279

Reviewed By: BIT-silence

Differential Revision: D13791442

Pulled By: houseroad

fbshipit-source-id: 6f52ca09627fb684f74121357cc42e4adadec36a
2019-01-23 21:35:35 -08:00
3cba115abb Small fixes for pdist (#16210)
Summary:
pdist was recently patched to remove buggy batch support and fix issues
with large tensors. That fix missed a few spots and didn't handle a few
recommendations, which this commit addresses.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16210

Differential Revision: D13791914

Pulled By: gchanan

fbshipit-source-id: 0595841be1b298f7268fd4c02a6628acfec918f2
2019-01-23 19:40:16 -08:00
0a3932acb2 Fix comparison in ReinitializeTensor (#16294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16294

In `ReinitializeTensor`, we compare `tensor->GetDevice()` with `options.device()`. At the call site, however, we only pass an option carrying a `device_type`, so the `device_id` of `options` is always the default (-1). The tensor was likewise constructed with a default `device_id`, but once its data is allocated, its `device` is the `device` of its `Storage`, i.e. of the underlying `DataPtr`, which matches the `device` of the operator's `Context` and therefore has a non-default `device_id`.

Consequently, every `ReinitializeTensor` call sees a device mismatch, and the mismatch persists after the call. That is why we allocate a new tensor every time, causing perf regressions for ops that use `ReinitializeTensor` on multiple GPUs.
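The mismatch, sketched on toy device objects (field names are illustrative, not the Caffe2 types): an unspecified `device_id` of -1 never compares equal to the concrete id the allocated storage reports, so an exact-equality check reallocates on every call.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    type: str
    index: int = -1  # -1 means "no particular device requested"

def needs_realloc_buggy(tensor_dev, requested):
    # Exact comparison: cuda(index=0) != cuda(index=-1), every single call.
    return tensor_dev != requested

def needs_realloc_fixed(tensor_dev, requested):
    # Only compare the index when the caller actually requested one.
    if tensor_dev.type != requested.type:
        return True
    return requested.index != -1 and tensor_dev.index != requested.index
```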

Reviewed By: BIT-silence

Differential Revision: D13795635

fbshipit-source-id: 24d6afa1a0196a32eb0134ee08b4280244cdb0c3
2019-01-23 19:29:29 -08:00
f25322fb97 Fix issues under caffe round 1
Summary: Some automation to fix uninitialized members for caffe2 code. Ran canary to make sure I don't have any regression in prod, but not sure how to test comprehensively for caffe2

Reviewed By: ezyang

Differential Revision: D13776185

fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
2019-01-23 19:04:59 -08:00
31de19f210 Add support for overloaded functions (#15556)
Summary:
This PR adds support for overloaded functions as a step toward adding rnn modules to the JIT standard library.

Possible overloads must be manually specified, and when resolving the overload it chooses by the first one that passes the schema matching logic. The structure is very similar to boolean dispatch in #14425. The overload will only work on weak modules.

In order to avoid supporting overloaded methods in Python to match the JIT execution, the current setup offloads that work to the user. In the test added in `test_jit.py`, two methods are used to overload the `forward` method. In order to call `forward` outside the JIT, a Python-only `forward` that does the right argument type switching must also be provided.
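First-match overload resolution, sketched in Python (the real matching runs against JIT schemas, not Python types; all names below are illustrative):

```python
def resolve_overload(overloads, args):
    # Try each declared overload in order and pick the first whose
    # signature the arguments satisfy, as described above.
    for fn, sig in overloads:
        if len(sig) == len(args) and all(isinstance(a, t) for a, t in zip(args, sig)):
            return fn
    raise TypeError("no overload matched")

forward_int = lambda x: x + 1          # hypothetical forward(int)
forward_str = lambda s: s.upper()      # hypothetical forward(str)
overloads = [(forward_int, (int,)), (forward_str, (str,))]
```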
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15556

Differential Revision: D13576348

Pulled By: driazati

fbshipit-source-id: 7d3bdd4ee5a6088cc20c92f26a696d1ee5b9204b
2019-01-23 18:16:01 -08:00
8710184eea Constant propagation changes (#16244)
Summary:
- remove loop node that is guaranteed not to execute
- remove extra loop outputs that are no longer needed

- if we are inlining an if node, only run constant propagation on the block that will execute

- remove the recurse argument since we only expose the Graph Constant Propagation and it's not used

This also includes  a few extra hooks to python_ir that I think make it a little be easier to test graph conditions from python.
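Two of those simplifications on a toy IR (a hypothetical representation, not the real JIT data structures):

```python
def simplify_loop(trip_count, body_nodes, loop_outputs, used_outputs):
    # (1) a loop with a constant trip count of 0 is guaranteed not to
    #     execute, so the whole node can be removed;
    # (2) loop outputs that nothing downstream uses are dropped.
    if trip_count == 0:
        return [], []
    kept = [o for o in loop_outputs if o in used_outputs]
    return body_nodes, kept
```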
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16244

Differential Revision: D13791635

Pulled By: eellison

fbshipit-source-id: d16351fffcfc8013b02015db200f8fde002e0577
2019-01-23 17:50:33 -08:00
4b06c063a5 raise exception if try jit.load non-existent file (#16270)
Summary:
addresses https://github.com/pytorch/pytorch/issues/16267
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16270

Differential Revision: D13791773

Pulled By: suo

fbshipit-source-id: 256304a02dbf724a7c0baade48c94b3ee77f53cf
2019-01-23 16:16:18 -08:00
80bd28bcb2 Fixing upload of nightly binaries and clean MacOS output (#16016)
Summary:
- Fix environment variable used to guard binary uploads
- Move common MacOS brew setup-code into a common function to decrease code duplication and also to move that noisy console output into its own CircleCI step
- Split Mac builds into separate build-test and upload jobs. Add one of these jobs to PR runs; add upload jobs to nightly binarybuilds workflow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16016

Differential Revision: D13791084

Pulled By: pjh5

fbshipit-source-id: 8eeb8e1963d46eab84f0f6dad9f0265163d5bf73
2019-01-23 15:38:04 -08:00
fc5b79cd1c CUDA event should only be recorded after NCCL group (#8219)
Summary:
Otherwise, it won't work if we sync on this event.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8219

Reviewed By: pietern

Differential Revision: D13788657

Pulled By: teng-li

fbshipit-source-id: 8c96e9691ed2441d7a685fb7ae8fece906f58daf
2019-01-23 14:18:26 -08:00
07a090247a Change data() accessor in Caffe2 to return non-const pointer. (#16176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16176

This makes PyTorch and Caffe2's data() method line up.
Historically, PyTorch made no distinction between tensors
with const or non-const data, and thus its data() member provided
a non-const pointer.  Changing the PyTorch API to
return a const pointer would break all mutable code, whereas
changing the Caffe2 API to return a non-const pointer doesn't break
any code, *except* code that required an exact match
on const-ness (e.g., in template arguments).  Since the latter
is less disruptive, we've opted for it here.

The few places downstream that broke due to this are fixed
in this patch.

Reviewed By: smessmer

Differential Revision: D13742916

fbshipit-source-id: baa4b4544cfdf7c1f369f4d69a1e0d5953c1bd99
2019-01-23 13:55:24 -08:00
dba4d37ac2 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 99d58034f9369846f8c82a5ea11c71e202e52a4e
2019-01-23 13:08:36 -08:00
f2b1842344 Align native_functions.yaml func schema more with JIT signature schema (#16111)
Summary:
This PR applies a few minor modifications leading to 100s of additional matches

Modifications to native_functions.yaml
1) double to float
2) int64_t to int
3) IntList[\d*] to int[\d*]
4) {} to []
5) Tensor? x=[] to Tensor? x=None
6) TensorList to Tensor[]
7) 1e-x to 1e-0x
8) Generator* x = nullptr to Generator? x = None
9) `{.*}` to `[.*]`

Overall this adds about 300 new matches and brings us to about 1/2 compliance of native_functions func with their JIT signature equivalent

While this is still a draft "tools/jit/gen_jit_dispatch.py" contains code to aid in finding close signatures
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16111

Reviewed By: ezyang

Differential Revision: D13738123

Pulled By: cpuhrsch

fbshipit-source-id: d1ec1e089bdb26ec155f6f31ccf768270acb76c7
2019-01-23 12:52:31 -08:00
3837446883 Fixes selection of cuDNN algorithm (#15881)
Summary:
This PR updates the logic for using cudnnGet* and cudnnFind*. The current version of cuDNN find and get (v7) returns a pair of the best algorithm and the convDesc mathType. While we were using the returned algorithm, we weren't updating the mathType. As a result, we ended up with a slow choice of both algorithm and math type. Without this patch, we are seeing a 10x regression in group convolutions.

Changelist:
- Changed the template arguments to be `perf_t` instead of `algo_t` to unify cudnnFind and cudnnGet. Both cudnnFind and cudnnGet have the same purpose and hence, it made sense to unify them and get rid of `getAlgorithm`.
- Used cudnnGet*_v7 everywhere cudnnGet* was being used.
- Removed all cudnn6 paths (This PR depends on https://github.com/pytorch/pytorch/pull/15851)
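The gist of the fix, sketched (field names loosely follow the cuDNN perf structs; values are made up): pick the fastest perf result and apply *both* its algorithm and its math type.

```python
from collections import namedtuple

Perf = namedtuple("Perf", ["algo", "math_type", "time_ms"])

def pick_algorithm(perf_results):
    # cudnnFind*/cudnnGet*_v7 return perf structs; using best.algo while
    # ignoring best.math_type was the bug -- return and apply both.
    best = min(perf_results, key=lambda p: p.time_ms)
    return best.algo, best.math_type
```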

Differential Revision: D13787601

Pulled By: ezyang

fbshipit-source-id: 81fe86727673d021306fe1c99c3e528b7c9ad17f
2019-01-23 12:47:15 -08:00
879bf65811 Disable flaky test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16274

Reviewed By: pietern

Differential Revision: D13788036

fbshipit-source-id: a9b7353fb0655908e6d47387cc77af33e9471aed
2019-01-23 11:57:44 -08:00
9310eb1fd0 Update third_party protobuf to v3.6.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16251

Reviewed By: ezyang

Differential Revision: D13781444

Pulled By: bddppq

fbshipit-source-id: b713a021033d214f30a49ee02b95edf8633bcc50
2019-01-23 09:34:53 -08:00
e669f72466 fix sigma in the middle of when word (#16227)
Summary:
There is a stray sigma in the word "when" on:
https://pytorch.org/cppdocs/contributing.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16227

Differential Revision: D13762753

Pulled By: goldsborough

fbshipit-source-id: 3d4bf4be859a3069402fe8c3fbc8ebee4f25cc5a
2019-01-23 08:35:32 -08:00
36e27aa092 Typos and broken RSTs fixed in torch.distribution (#16136)
Summary:
- probabilty -> probability
- make long lines break
- Add LogitRelaxedBernoulli in distribution.rst
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16136

Differential Revision: D13780406

Pulled By: soumith

fbshipit-source-id: 54beb975eb18c7d67779a9631dacf7d1461a6b32
2019-01-23 03:03:10 -08:00
8b49efe86a tune elementwise for AMD uarch (#16217)
Summary:
Tune elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression.

No functional/performance change for CUDA - just shifting numbers into constexpr.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217

Differential Revision: D13776684

Pulled By: bddppq

fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9
2019-01-22 18:23:51 -08:00
ddeaa541aa fix typo in resnet50_trainer.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16219

Differential Revision: D13776742

Pulled By: bddppq

fbshipit-source-id: 10a6ab4c58159b3f619b739074f773662722c1d9
2019-01-22 17:28:04 -08:00
e7a77ac3b0 Automatic update of fbcode/onnx to dc75285d4a1cff9618400164dfdb26c5a1bab70a
Summary:
Previous import was c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7

Included changes:
- **[dc75285](https://github.com/onnx/onnx/commit/dc75285)**: Relax constraint that the initializers must be a subset of graph inputs (#1718) <G. Ramalingam>
- **[985c8cd](https://github.com/onnx/onnx/commit/985c8cd)**: Fix typo in scan shape inferencing (#1753) <Scott McKay>
- **[ab52a5d](https://github.com/onnx/onnx/commit/ab52a5d)**: remove stale test cases <Lu Fang>
- **[56434bb](https://github.com/onnx/onnx/commit/56434bb)**: Removing experimental ConstantFill op. <Spandan Tiwari>
- **[881c63c](https://github.com/onnx/onnx/commit/881c63c)**: Show string names of data types instead of int IDs (#1749) <Shinichiro Hamaji>
- **[0a12fe4](https://github.com/onnx/onnx/commit/0a12fe4)**: Update ConstantOfShape op. (#1744) <Bowen Bao>
- **[ef028e5](https://github.com/onnx/onnx/commit/ef028e5)**: Update definition of Cast Op to support casting to/from string (#1704) <Raymond Yang>

Reviewed By: BIT-silence

Differential Revision: D13773962

fbshipit-source-id: b98079277994a699d4807210ba1d9c27f4672090
2019-01-22 15:01:29 -08:00
2235fb256e Add default_stream() and enhance current_stream() (#16200)
Summary:
Closes #16156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16200

Differential Revision: D13747455

Pulled By: mrshenli

fbshipit-source-id: 00c0d5f341c3ac7a757bdb4631a17e11fbc6d3ec
2019-01-22 14:35:19 -08:00
3e790f6ee8 complex_registration_extension.cpp includes to angled brackets
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16122

Reviewed By: smessmer

Differential Revision: D13717900

fbshipit-source-id: 8401f39d993482d3e08d2d79bc1841deafee2a5b
2019-01-22 14:22:38 -08:00
0f45e6dbdc Remove ATen/Allocator.h forwarding header.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16121

Reviewed By: smessmer

Differential Revision: D13717899

fbshipit-source-id: 83488f2aa801ca75059949ec85171ec03e64c4ff
2019-01-22 14:22:36 -08:00
77de69867a Remove dead curVal store.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16116

Reviewed By: smessmer

Differential Revision: D13717719

fbshipit-source-id: 2ecee3f08f64e64ec5ac3c92fb326bc3df37e40e
2019-01-22 13:38:36 -08:00
325df4ccfb Make kernel registration constexpr again (#16166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16166

Since we no longer use std::function, we can make kernel registration constexpr again.

Reviewed By: ezyang

Differential Revision: D13738630

fbshipit-source-id: 918fa3a3c8c6f0ddbd0f08b3b143cdf066265387
2019-01-22 13:29:13 -08:00
cd8f4154f4 Avoid closure around kernel (#16165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16165

Store kernels as direct function pointers instead of std::function.
Using direct function pointers avoids a performance risk std::function would introduce.

Reviewed By: ezyang

Differential Revision: D13738627

fbshipit-source-id: a348906c8a201436699681980a82ca95065a06a0
2019-01-22 13:29:11 -08:00
6192831b76 Pass IValues from JIT to c10 dispatcher (#16066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16066

Don't unwrap and re-wrap but directly pass through the IValues

Reviewed By: ezyang

Differential Revision: D13689037

fbshipit-source-id: 99b8155e640eb61a3c0597bf0f2b9c338712b45e
2019-01-22 13:29:09 -08:00
1c058de9ac Release GIL when synchronize or wait (#16182)
Summary:
Addresses the second future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16182

Differential Revision: D13744972

Pulled By: mrshenli

fbshipit-source-id: e9812e3fd4a5623e99b639d9f334bfc2d1827d92
2019-01-22 13:29:07 -08:00
c6503a4205 Revert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable
Differential Revision:
D13540278

Original commit changeset: 3768c76a90b0

fbshipit-source-id: 7a31c239f9dca6ff467344d99820095addcae9d7
2019-01-22 12:22:40 -08:00
c5e1b469be Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429)
Summary:
Partially fixes: https://github.com/pytorch/pytorch/issues/394

Implementation detail:

Codegen is modified to generate codes that looks like below:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.svd_out", nullptr,
    fields0, 3
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc1 = {
    "torch.return_types.svd", nullptr,
    fields1, 3
  };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
Types are defined as static members of the `THPVariable_${op_name}` functions, and initialized the first time the function is called.

When parsing function prototypes in `native_functions.yaml`, the parser will set the specified name as `field_name` when it sees things like `-> (Tensor t1, ...)`. These field names will become the field names of the namedtuple. The class of the namedtuples will be named `torch.return_types.${op_name}`.

In some Python 2 builds, `PyStructSequence` is not a subtype of tuple, so we have to create some functions that check whether an object is a tuple or a namedtuple, for compatibility.

Operators in `native_functions.yaml` are changed such that only `max` and `svd` are generated to return namedtuples. Tests are added for these two operators to check that the return value works as expected. Docs for these two ops are also updated to explicitly mention that the return value is a namedtuple. More ops will be added in later PRs.

There is an issue with the Windows build where the linker is unable to resolve `PyStructSequence_UnnamedField`, and a workaround is added to deal with this case.
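For illustration, a minimal sketch of how such a return type behaves from Python, using `collections.namedtuple` as a stand-in for the generated `torch.return_types.svd`:

```python
from collections import namedtuple

# Stand-in for torch.return_types.svd (illustrative only); the field names
# mirror the ones defined in the generated code above.
svd_result = namedtuple("svd", ["U", "S", "V"])

out = svd_result(U="u", S="s", V="v")
assert out.U == out[0]   # access by field name or by position
U, S, V = out            # unpacking still works like a plain tuple
assert (U, S, V) == ("u", "s", "v")
```

This is why the change is backward compatible: existing code that indexes or unpacks the returned tuple keeps working, while new code can use the named fields.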
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429

Differential Revision: D13709678

Pulled By: ezyang

fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf
2019-01-22 11:12:18 -08:00
1e19fd941f Fix formating in caffe2/quantization/server/README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14237

Reviewed By: dskhudia

Differential Revision: D13751791

Pulled By: jspark1105

fbshipit-source-id: 54f73d5134e596817802c66d43098d18458c2799
2019-01-22 10:15:37 -08:00
9521a15c88 hip-clang enablement (#16085)
Summary:
Initial enabling of the upcoming hip-clang compiler for the PyTorch source base.

Changes:
* update the Eigen submodule to a version including our upstreamed hip-clang enabling there
* modify a few ifdef guards with the `__HIP__` macro used by hip-clang
* use `__lane_id` instead of `hc::__lane_id`
* add Debug flags for ROCm to the cmake infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16085

Differential Revision: D13709459

Pulled By: ezyang

fbshipit-source-id: 1b7b33fe810a0434766180580d4443ea177eb7c7
2019-01-22 09:09:48 -08:00
4cf76574b9 Raise CalledProcessError when torch.distributed launch process not return 0 (#16069)
Summary:
`torch.distributed.launch.py` does not raise an error when a process started via `subprocess.Popen` returns a non-zero exit code.
For better debugging, it should always raise an error when launched processes behave unusually.
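A hedged sketch (not the actual launcher code) of the kind of check this adds around each worker process:

```python
import subprocess
import sys

def wait_and_check(cmd):
    # Wait on the worker and surface a non-zero exit code as an exception
    # instead of silently ignoring it.
    process = subprocess.Popen(cmd)
    process.wait()
    if process.returncode != 0:
        raise subprocess.CalledProcessError(process.returncode, cmd)

wait_and_check([sys.executable, "-c", "pass"])  # exit code 0: no error
try:
    wait_and_check([sys.executable, "-c", "raise SystemExit(3)"])
except subprocess.CalledProcessError as e:
    assert e.returncode == 3
```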
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16069

Differential Revision: D13709467

Pulled By: ezyang

fbshipit-source-id: 31d32a5ec8fed7bccd62d845bfba0e670ed3fe20
2019-01-22 08:50:47 -08:00
53ae8bc64d Reserve vectors that we know the size in advance for. (#16201)
Summary:
Save reallocation costs by reserving vectors according to how many elements we expect to put in them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16201

Differential Revision: D13762594

Pulled By: ezyang

fbshipit-source-id: 7e3bfe421489dde48a2ddb0920dd155f69baecc0
2019-01-22 08:02:40 -08:00
dfcafb1f71 cpp doc fix (#16221)
Summary:
Fixed a few C++ API callsites to work with v1.0.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16221

Differential Revision: D13759207

Pulled By: yf225

fbshipit-source-id: bd92c2b95a0c6ff3ba5d73cb249d0bc88cfdc340
2019-01-21 21:56:22 -08:00
addebf110f Move away from ConstantFill (#16214)
Summary:
Prerequisite of https://github.com/onnx/onnx/pull/1434
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16214

Reviewed By: BIT-silence

Differential Revision: D13755116

Pulled By: houseroad

fbshipit-source-id: a46be8d7df959b5ede93e1f9c911a9a9326e6879
2019-01-21 20:15:38 -08:00
9757ad35b0 ban conv_double_backward from sandcastle, it takes too long
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16220

Differential Revision: D13755108

Pulled By: zdevito

fbshipit-source-id: 46b1b128b155964c25249add0c84680491845e9b
2019-01-21 20:00:29 -08:00
0cd1ab82b0 Remove dead code from setup.py, remove need for build target. (#16162)
Summary:
Now it is only necessary to use 'develop' or 'install' to build. Incremental cmake is on by default. `develop --cmake` forces it to rerun.

The NinjaBuilder stuff is dead. It was used to make building _C.so
faster but now _C.so is just an empty stub file.

Removed a bunch of custom build commands from setup.py that are
no longer meaningful now that cmake handles most of the build.

Removed unused targets in build_pytorch_lib.sh/bat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16162

Differential Revision: D13744155

Pulled By: zdevito

fbshipit-source-id: d836484782c65b7f8e8c7a82620886f7a7777892
2019-01-21 17:27:56 -08:00
bed7db7772 Unhide unique from C++, make unique partially scriptable (#15256)
Summary:
This PR does three things:

~~Allow `int64_t?` in function schema,  which provide an elegant way of implementing null-able int arguments, as discussed in https://github.com/pytorch/pytorch/pull/15208#pullrequestreview-185230081~~

~~Originally implemented in https://github.com/pytorch/pytorch/pull/15235~~

~~Example:~~

```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
  variants: function
```

~~cc: zou3519~~

Edit: implemented in https://github.com/pytorch/pytorch/pull/15234

Previously tried in https://github.com/pytorch/pytorch/pull/12064. There was a problem that C++ does not have kwarg support, which makes it confusing to know whether `unique(t, 1)` actually means `unique(t, dim=1)` or `unique(t, sorted=1)`.

Now I think I have a better idea on how to implement this: there are two ATen operators: `unique` and `unique_dim`. `unique` has the same signature as in python, and exported to both python and C++. `unique_dim` has signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)`, and only exported to C++, which could be used more naturally for a C++ user.

Differential Revision: D13540278

Pulled By: wanchaol

fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd
2019-01-21 12:31:37 -08:00
c33512bdfc Automatic update of fbcode/onnx to c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7 (#16190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16190

Previous import was fd60104394fa353e1762f44ecad1b2166e33deef

Included changes:
- **[c553fb3](https://github.com/onnx/onnx/commit/c553fb3)**: Handle negative axis in scan shape inference (#1748) <G. Ramalingam>
- **[51b6ecc](https://github.com/onnx/onnx/commit/51b6ecc)**: external_data: Store large tensor values in separate files (#678) <Michał Karzyński>
- **[ba05f26](https://github.com/onnx/onnx/commit/ba05f26)**: Scan output axes (#1737) <G. Ramalingam>
- **[90920c0](https://github.com/onnx/onnx/commit/90920c0)**: Add NonZero op. (#1714) <Sergii Dymchenko>
- **[c4cf112](https://github.com/onnx/onnx/commit/c4cf112)**: fix the test cases for constantofshape (#1746) <Lu Fang>
- **[d902349](https://github.com/onnx/onnx/commit/d902349)**: Add sample implementation support (#1712) <Lu Fang>

Differential Revision: D13745693

fbshipit-source-id: 05e2cce9ae1dfa2865db83840df64673d55cea57
2019-01-21 09:46:29 -08:00
866c4e3467 Separate Moments from math and optimize it (#16175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175

Separate Moments from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13742472

fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
2019-01-20 08:53:25 -08:00
898329c3f9 Unify device() return type in Stream, Event, and Tensor (#16150)
Summary:
Addresses one future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16150

Differential Revision: D13732299

Pulled By: mrshenli

fbshipit-source-id: 4d0b35df573a3bf92dea6e2e7eb42fe8bac77b18
2019-01-19 23:01:31 -08:00
1fb6b431a3 Replace use of ConstantLike with with ConstantOfShape (#16095)
Summary:
Submitting this PR as an update to existing PR (https://github.com/pytorch/pytorch/pull/15938) on houseroad 's request.

This PR replaces the use of the ONNX op `ConstantLike` with `ConstantOfShape` in the ONNX exporter. In addition to removing the call sites in `symbolic.py`, it also replaces the call site in `peephole.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16095

Differential Revision: D13745723

Pulled By: houseroad

fbshipit-source-id: e2a5f534f01adf199df9e27544f7afcfa540e1f0
2019-01-19 19:52:54 -08:00
33d1ec396b Fix LBFGS issue (#16167)
Summary:
Resolves #15923 where LBFGS threw "Error: a leaf Variable that requires grad has been used in an in-place operation."
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16167

Differential Revision: D13745822

Pulled By: soumith

fbshipit-source-id: 7d1d0511d06838c0c6f4c8a6b53cf15193283059
2019-01-19 15:01:06 -08:00
a28c0ff7b8 Allow for concurrent quantization in FullyConnectedDNNLowPOp (#16174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16174

Our service creates a new caffe2 workspace for the same underlying network on multiple threads concurrently at service startup time (these workspaces are later reused for sequential requests), resulting in concurrent quantization via FullyConnectedDNNLowPOp calling GetOrCreateFbgemmPackBMatrix(). The lazily performed quantizations during the first inference in each workspace are all funnelled through GetOrCreateFbgemmPackBMatrix()'s cache_mutex, which means quantization is serialized, so at service startup time only a single CPU core is used for around a minute until the serial quantization is done.
A better solution would be to avoid quantizing the same weight matrix in the operator copies of different net copies to begin with, but this is the simpler solution for our current problem.

Reviewed By: jspark1105

Differential Revision: D13708785

fbshipit-source-id: 537519896b3b939c552d67f400bafc8a69ce11eb
2019-01-19 06:00:22 -08:00
daedec2350 Support ConstantOfShape in Caffe2 ONNX Backend (#16108)
Summary:
This PR is the prerequisite to land https://github.com/pytorch/pytorch/pull/16095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16108

Reviewed By: BIT-silence

Differential Revision: D13725722

Pulled By: houseroad

fbshipit-source-id: 28c0fb72f075cd04f9db44dfab0163844c20c620
2019-01-18 22:58:23 -08:00
b436f94b53 Separate affine_channel from math and optimize it (#16135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135

Separate affine_channel from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13727606

fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
2019-01-18 22:40:16 -08:00
e8b872abe2 Pass IValue from c10 dispatcher to caffe2 operator (#16065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16065

Before, we registered the caffe2 kernel with the c10 dispatcher using plain C types.
Now, we pass in IValues, which avoids the unwrapping in between.

Reviewed By: ezyang

Differential Revision: D13689036

fbshipit-source-id: b976a2c46a5a541f6a926b3df255e8a535e32420
2019-01-18 16:02:18 -08:00
c9044166a5 Make c10 dispatcher use boxed kernel function pointers (#16051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16051
This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*.

Note that KernelFunction is currently taking an `ArrayRef<IValue>` as arguments. A later diff will change that to taking a `Stack*`.

Reviewed By: ezyang

Differential Revision: D13684518

fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
2019-01-18 16:02:15 -08:00
b662a9b66a add back NNPACK in PyTorch (#15924)
Summary:
This tests the water for adding back NNPACK in PyTorch, it's a lot better than the fallback THNN versions.

In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.)

The only functional changes are using NNPACK more aggressively on mobile and adding a .contiguous() to match NNPACK's assumption (I stumbled over that while using NNPACK for style transfer.)
The CMake changes try to use the NNPack we already have in git.

In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementations are prohibitively expensive in terms of both CPU and memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924

Differential Revision: D13709576

Pulled By: ezyang

fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c
2019-01-18 15:34:35 -08:00
ed57425b0a improve performance of unique with inverse indices (#16145)
Summary:
Partial fix for #15804, only w/o dim.
For jcjohnson benchmarking script I'm getting the following results on V100:
Before:
```
Running with N = 10000, M = 10000
cuda (no inverse): 0.98 ms
cpu (no inverse): 0.96 ms
cuda (with inverse): 1.07 ms
cpu (with inverse): 1.76 ms

Running with N = 10000, M = 100000
cuda (no inverse): 0.76 ms
cpu (no inverse): 1.53 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 3.02 ms

Running with N = 100000, M = 100000
cuda (no inverse): 1.28 ms
cpu (no inverse): 11.22 ms
cuda (with inverse): 69.76 ms
cpu (with inverse): 20.28 ms

Running with N = 100000, M = 1000000
cuda (no inverse): 0.78 ms
cpu (no inverse): 18.78 ms
cuda (with inverse): 133.45 ms
cpu (with inverse): 34.09 ms

Running with N = 500000, M = 500000
cuda (no inverse): 1.43 ms
cpu (no inverse): 61.13 ms
cuda (with inverse): 3315.18 ms
cpu (with inverse): 104.57 ms

Running with N = 500000, M = 5000000
cuda (no inverse): 0.86 ms
cpu (no inverse): 96.44 ms
cuda (with inverse): 5209.93 ms
cpu (with inverse): 176.10 ms
```
After
```
Running with N = 10000, M = 10000
cuda (no inverse): 1.04 ms
cpu (no inverse): 0.94 ms
cuda (with inverse): 0.64 ms
cpu (with inverse): 1.76 ms

Running with N = 10000, M = 100000
cuda (no inverse): 0.77 ms
cpu (no inverse): 1.55 ms
cuda (with inverse): 0.58 ms
cpu (with inverse): 2.79 ms

Running with N = 100000, M = 100000
cuda (no inverse): 1.30 ms
cpu (no inverse): 14.15 ms
cuda (with inverse): 1.63 ms
cpu (with inverse): 20.90 ms

Running with N = 100000, M = 1000000
cuda (no inverse): 0.82 ms
cpu (no inverse): 18.63 ms
cuda (with inverse): 0.61 ms
cpu (with inverse): 33.52 ms

Running with N = 500000, M = 500000
cuda (no inverse): 1.51 ms
cpu (no inverse): 59.81 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 110.69 ms

Running with N = 500000, M = 5000000
cuda (no inverse): 0.92 ms
cpu (no inverse): 104.26 ms
cuda (with inverse): 0.84 ms
cpu (with inverse): 187.12 ms
```
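For readers unfamiliar with the inverse-indices output, a minimal pure-Python sketch of the semantics being optimized (ignoring the sorting that `torch.unique` performs by default):

```python
def unique_with_inverse(values):
    # Returns (unique_values, inverse) such that
    # [unique_values[i] for i in inverse] reconstructs the input.
    index = {}
    unique_values = []
    inverse = []
    for v in values:
        if v not in index:
            index[v] = len(unique_values)
            unique_values.append(v)
        inverse.append(index[v])
    return unique_values, inverse

uniq, inv = unique_with_inverse([3, 1, 3, 2, 1])
assert [uniq[i] for i in inv] == [3, 1, 3, 2, 1]
```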
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16145

Differential Revision: D13738821

Pulled By: soumith

fbshipit-source-id: 0811fb4ade47e3b466cebbc124e3f3333a986749
2019-01-18 14:56:39 -08:00
c6d9c51c7e fix for clang-tidy (#16164)
Summary:
It turns out that clang-tidy is bundled with Travis's standard Trusty distribution, so there is no need to install it manually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16164

Differential Revision: D13738986

Pulled By: suo

fbshipit-source-id: d0cd76c615625b2ed7f18951289412989f15849d
2019-01-18 14:04:26 -08:00
292edfb087 Change current device in stream context manager if necessary (#16128)
Summary:
Fixes #16019
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16128

Differential Revision: D13721850

Pulled By: mrshenli

fbshipit-source-id: 422c6c0b97c1cd46e127e265b532cb8c74a3aac5
2019-01-18 12:39:51 -08:00
eea50e91fa Fix SoftmaxOps (#16049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16049

We might see the pattern
```
if (scale_.numel() != N) {
   scale_->Resize(N);
   // set initial value for scale_
}

// In class:
Tensor scale_{CPU};
```
before in the code, where `scale_` is a member variable of type `caffe2::Tensor`.
This pattern actually serves two purposes: if `scale_` is partially initialized with a device type but not a size, this call will
initialize the Tensor with the correct size; if `scale_` is already initialized with a size, it will check whether the size
matches the runtime value `N` and, if not, Resize it. To rewrite this we'll do the following:
```
if (!scale_.defined() || scale_.numel() != N) {
  ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
  // set initial value for scale_
}

```
There are some variants: if `scale_` is resized to a constant size, we can call `ReinitializeTensor` instead
```
if (scale_.numel() != 1) {
  scale_->Resize(1);
}
```
-->
```
ReinitializeTensor(&scale_, {1}, at::dtype<float>().device(CPU));
```

Normal Resize will be refactored directly into ReinitializeTensor:
```
scale_->Resize(N);
```
-->
```
ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
```

Reviewed By: dzhulgakov

Differential Revision: D13667883

fbshipit-source-id: 2c7cb61544b72765b594011b99150eb5a1b50836
2019-01-18 12:30:59 -08:00
3f4bb3d493 rest of uses for deprecation of dims() in Tensor (#16118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16118

att

Differential Revision: D13697211

fbshipit-source-id: 12bf6edd1794240ac748cc1b8fecb0c1e8eb9112
2019-01-18 11:52:12 -08:00
b69c05dbd6 RNN operators should inherit step_net device_options (#16086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16086

[caffe2] RNN operators should inherit step_net device_options
According to the NetDef documentation, if a network has a specific device option, it applies to all network operators that do not explicitly specify one.
But this does not seem to be the case for RecurrentNetwork operators

Reviewed By: orionr

Differential Revision: D13699552

fbshipit-source-id: 14529bc9504e3b02f763e3c2429be21e46f82b68
2019-01-18 11:36:38 -08:00
d4f6befc93 Add implicit optional unwrapping (#15587)
Summary:
Add support for type inference for optional type refinement.

If a conditional is of the form "x is None" or "x is not None", or is a boolean expression containing multiple None checks, the proper type refinements are inserted in each branch.

For example:
if optional_tensor is not None and len(optional_tensor) < 2:
	# optional_tensor is a Tensor

if optional_tensor1 is not None and optional_tensor2 is not None:
	# both optional_tensor1 and optional_tensor2 are Tensors

TODO:

- not run an op for unchecked unwrap optional in the interpreter

- potentially refine types to prim::None (omitted for now to simplify things and because it's not an actual use case).
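For illustration, a plain-Python analogue of the refinement being inferred; inside the branch guarded by the `is not None` check, the value can be used as its unwrapped type:

```python
from typing import List, Optional

def first_two(optional_list: Optional[List[int]]) -> List[int]:
    if optional_list is not None and len(optional_list) >= 2:
        # In this branch, optional_list is refined from Optional[List[int]]
        # to List[int], so len() and slicing are safe without an explicit
        # unwrap.
        return optional_list[:2]
    return []

assert first_two([1, 2, 3]) == [1, 2]
assert first_two(None) == []
```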
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15587

Differential Revision: D13733810

Pulled By: eellison

fbshipit-source-id: 57c32be9f5a09ab5542ba0144a6059b96de23d7a
2019-01-18 11:25:01 -08:00
da578b7dcf Add defined() to caffe2::Tensor (#16125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16125

Add defined() method to check whether the Tensor is defined.

Reviewed By: ezyang

Differential Revision: D13719222

fbshipit-source-id: ff8efef2159ed1026bd16acaea40c768a1e20a47
2019-01-18 11:03:36 -08:00
b9b160d86f Remove ATen/Half.h and ATen/core/Half.h forwarding headers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16115

Reviewed By: bddppq

Differential Revision: D13717049

fbshipit-source-id: fb1d690183a932a1fa1a2d235f3219520f51620a
2019-01-18 10:55:21 -08:00
1ff864712b Port legacy any(*) to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15547

Differential Revision: D13549495

Pulled By: mrshenli

fbshipit-source-id: 09a065a8ffa7d73f409759b779c7314cc87f4853
2019-01-18 10:32:19 -08:00
ed0a761c82 Improve pack_sequence and pack_padded_sequence error message (#16084)
Summary:
Mention that if enforce_sorted=True, the user can set
enforce_sorted=False. This is a new flag that is probably hard to
discover unless one thoroughly reads the docs.

Fixes #15567
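For illustration, a small sketch of what `enforce_sorted=True` expects: sequences ordered by decreasing length. Sorting manually (and remembering the permutation) is the alternative to passing `enforce_sorted=False`:

```python
# Hypothetical input sequences of lengths 1, 3, and 2.
seqs = [[1], [2, 3, 4], [5, 6]]

# Sort indices by sequence length, longest first, keeping the permutation
# so the original order can be restored after packing.
order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
sorted_seqs = [seqs[i] for i in order]

assert [len(s) for s in sorted_seqs] == [3, 2, 1]
```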
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16084

Differential Revision: D13701118

Pulled By: zou3519

fbshipit-source-id: c9aeb47ae9769d28b0051bcedb8f2f51a5a5c260
2019-01-18 07:58:54 -08:00
b4bc55beef TCP init method race condition fix (#15684)
Summary:
This PR fixes a race condition for the TCP init method, where the master rank can exit earlier than the slave ranks and thus the TCP daemon thread gets shut down before other slaves are able to access it.

This lets every rank (process) write a special key to the store to mark that it has completed (and thus is about to exit). The master rank (who is the server) will always wait until all ranks complete before completing itself.

This should fix: https://github.com/pytorch/pytorch/issues/15638

Tested using the repro of https://github.com/pytorch/pytorch/issues/15638 and works fine. Also test_distributed and test_c10d should have already had this coverage.

I had to make the rendezvous test in c10d use a world size of 1, since it is single-process code.
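A simplified, hypothetical sketch of the scheme (a plain dict standing in for the TCP store): each rank writes a completion key, and the master checks that every rank has checked in before shutting the store down:

```python
def finalize_rank(store, rank, world_size, is_master):
    # Mark this rank as done; the real code writes to the shared TCP store.
    store["completed/%d" % rank] = True
    if is_master:
        # The real code blocks on the store until this list is empty; with a
        # plain dict we just report which ranks have not checked in yet.
        return [r for r in range(world_size)
                if "completed/%d" % r not in store]
    return []

store = {}
for r in range(1, 4):  # slave ranks finish first
    finalize_rank(store, r, 4, is_master=False)
assert finalize_rank(store, 0, 4, is_master=True) == []
```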
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15684

Differential Revision: D13570904

Pulled By: teng-li

fbshipit-source-id: 34f3bc471204bbd29320df359347ad5561c6b589
2019-01-18 02:29:38 -08:00
aaff2fecda Remove caffe2::Tensor copy constructor (#15416)
Summary:
Based on offline discussion it should be less surprising to the users of existing code. Thus caffe2::Tensor is now a move-only class (as it used to be), explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr behavior.

This change also identified a few places that misused the copy constructor - those are fixed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416

Reviewed By: Yangqing

Differential Revision: D13524598

fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7
2019-01-18 00:31:56 -08:00
b5c733324c Fix RERUN_CMAKE
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16132

Differential Revision: D13726816

Pulled By: zdevito

fbshipit-source-id: 26ad70651b0138642ad5240670f5c452018c13a2
2019-01-18 00:04:31 -08:00
cb2961f63c Cleanup includes in python_print.cpp.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16129

Differential Revision: D13724297

Pulled By: ZolotukhinM

fbshipit-source-id: 24e140bc052c85ef40b928eb84f463d341346a51
2019-01-17 18:13:17 -08:00
27674dc7c6 Refactor attributes.h (#16098)
Summary:
This PR inlines `Attributes` into `Node`. It helps to clean up the code a little, as everything is in one place (some of the cleanups are included in the PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16098

Differential Revision: D13717637

Pulled By: ZolotukhinM

fbshipit-source-id: c54ae65178a95a01354688921a9ccb1ca699f8eb
2019-01-17 17:39:58 -08:00
40b3e4907c Fix export macros (#15856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15856

They seem to be wrong.
cc zdevito to take a look but I think this is now more correct.

It's weird this didn't cause linker errors. Probably, this functionality isn't used across library boundaries yet.

Reviewed By: dzhulgakov

Differential Revision: D13605257

fbshipit-source-id: 7077ca9027c3ac79a4847ec15ead7ddb28696445
2019-01-17 16:04:01 -08:00
0ab8de3125 Remove some dependencies from ivalue.h to ATen (#15855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15855

This is preparation work for moving IValue to c10.

Reviewed By: ezyang

Differential Revision: D13605259

fbshipit-source-id: cc545f582ab8607bb02aaf71273cb2710200b295
2019-01-17 16:03:58 -08:00
68164c1c3e Code style cleanup (#15854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15854

- Remove unnecessary `inline` keyword
- Add a TODO stating the intention for Blob::ShareExternal()

Reviewed By: dzhulgakov

Differential Revision: D13605258

fbshipit-source-id: c0bc85c74c4ca4b3811d42ac7f866182e159d840
2019-01-17 16:03:56 -08:00
637b35b372 Use intrusive_ptr for Blob in IValue (#16052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16052

We need IValue to take/return Blob as an intrusive_ptr because we want to pass it around and Blob has disabled copying.
This is needed in a diff on top.

Reviewed By: ezyang

Differential Revision: D13684761

fbshipit-source-id: 7cb3d7e9fec39a2bc9f063d4d30404e6d7016eb2
2019-01-17 15:56:54 -08:00
3e85a2bcbf Move c10 dispatcher back to ATen/core (#16050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16050

The c10 dispatcher will (soon) depend on IValue and IValue can't be moved to c10 yet because it depends on at::Tensor, which depends on legacy Type dispatch and we don't want the legacy dispatch in c10.

So instead, we move the c10 dispatcher back to ATen/core until we can actually move at::Tensor to c10.

Reviewed By: ezyang

Differential Revision: D13684517

fbshipit-source-id: 1125f4254223907c52f96ff73034f6d4ae9fd0a7
2019-01-17 15:56:52 -08:00
a9438ba62f Moving cuda-convnet2 to the internal fb dir to satisfy internal dependencies. (#16104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16104

PyTorch PR 15784 removed cuda-convnet from the contrib directory. This broke
some internal-only fb dependencies. Moving this to the internal area.

Reviewed By: ezyang

Differential Revision: D13709112

fbshipit-source-id: 2d7811545da67489869b59c350a29817eff693cf
2019-01-17 15:11:20 -08:00
431a34f3ff further wildcard cleanups (#16041)
Summary:
Some cleanup to wildcard handling, including one bugfix: previously, we were not considering writes to the wildcard set as part of the potential write set for nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16041

Differential Revision: D13705738

Pulled By: suo

fbshipit-source-id: acb8ccbaa70fe47445577ddf24a69f84630de411
2019-01-17 14:54:34 -08:00
962f3f4864 Refactor _jit_internal (#16058)
Summary:
Use qualified names in `jit/__init__.py` to avoid polluting that namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16058

Differential Revision: D13718745

Pulled By: driazati

fbshipit-source-id: 19d150569c8374541250a961f24f70c3f523de03
2019-01-17 13:56:50 -08:00
99b029aca3 Include all Caffe2 headers in Python installations (#16124)
Summary:
Confirmed on a local run that all the additional headers are present. This shouldn't be caught in any existing tests though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16124

Differential Revision: D13720773

Pulled By: pjh5

fbshipit-source-id: 22a42639f5649cac555ecc5a8b6760a8cbfcf01f
2019-01-17 13:51:51 -08:00
0282318bea Add comment to explain rnn bias vectors (#15843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15843

RNN/LSTMs only need one bias vector, but our implementation uses two to be compatible with CuDNN. This diff adds a comment to explain this.

Reviewed By: ezyang

Differential Revision: D13602365

fbshipit-source-id: eef5bd9383d9f241dc0ef0472f753b4a44cc19b5
2019-01-17 13:32:42 -08:00
7e9e1c7a9f Add @yf225 to cpp codeowner
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16120

Differential Revision: D13718880

Pulled By: yf225

fbshipit-source-id: 1c0a41ffba71855a3ad88b8d263ba2bd5076351d
2019-01-17 13:03:48 -08:00
9aac5c7e85 Update FP16 to latest master (#14498)
Summary:
TSIA - fp16 cmake had a bug that is fixed in https://github.com/Maratyszcza/FP16/pull/9 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14498

Differential Revision: D13240829

Pulled By: Yangqing

fbshipit-source-id: 724745750efe4f1b49d29ee07380c36997579915
2019-01-17 13:00:28 -08:00
d6a8dd9538 Cleanup gumbel_softmax (#13339)
Summary:
Fixes #12643, amends to #3341.

- Allow multidimensional input ~~(but apply softmax over `dim=-1`)~~ with `dim` argument
- Cleaner: fewer lines of code
- Faster (1.32x speedup vs original, 2x speedup vs using `torch.Distributions`)
- Small fixes in docstring
- Remove some references in docstring. Was the linked (excellent) ipynb the first to do the straight-through trick? Instead, I propose changing to reference to the two papers most known for it.
- Add a DeprecationWarning for `eps`; it's no longer needed.
- Initial commit keeps some code alternatives commented to exploit CI

- As discussed when `gumbel_softmax` was added (#3341), it was merged into `torch.nn.functional` before all the work on `Distributions` and `Pyro`, and there will probably be multiple other best practices for this in the future.
I've tested building using the `Distributions` API, but it was too slow; see below.

I therefore propose not using `Distributions` to keep it fast and simple, but adding a comment in docstring that `gumbel_softmax` may be deprecated in the future.

```
dist = torch.distributions.RelaxedOneHotCategorical(temperature=tau, logits=logits, validate_args=False)
y_soft = dist.rsample()
```

Pros:
* Built using tricks like `logsumexp` etc
* Explicitly uses `torch.distributions.utils._finfo` to avoid overflow (old implementation had an `eps` flag)
* Maintained for this exact purpose.

Cons:
* Very slow. Construction of distribution adds overhead see timings below. May be solved in future with speedups of `TransformedDistribution` and `Distribution`.
* Assumes which `dim` to apply softmax over.

```
    y_soft = logits.new(logits.shape)
    y_soft = (logits - y_soft.exponential_().log()) / tau  # Gumbel noise
    y_soft = y_soft.softmax(dim)  # Gumbel softmax noise
```
Pros:
* Faster
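
For reference, the straight-through trick mentioned above (hard one-hot samples in the forward pass, soft gradients in the backward pass) can be sketched roughly like this; the helper name and exact noise parameterization are illustrative, not necessarily the PR's final implementation:

```python
import torch

def gumbel_softmax_sketch(logits, tau=1.0, hard=False, dim=-1):
    # Gumbel(0, 1) noise via -log(Exponential(1)); same trick as the snippet above
    gumbels = -torch.empty_like(logits).exponential_().log()
    y_soft = ((logits + gumbels) / tau).softmax(dim)
    if not hard:
        return y_soft
    # Straight-through: one-hot values in the forward pass,
    # softmax gradients in the backward pass.
    index = y_soft.argmax(dim, keepdim=True)
    y_hard = torch.zeros_like(logits).scatter_(dim, index, 1.0)
    return y_hard - y_soft.detach() + y_soft
```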

```
    import time
    start = time.time()
    num_draws = 1000000
    logits = torch.randn(1, 3)
    counts = torch.zeros(3)

    for draw in range(num_draws):
        y_draw = gumbel_softmax(logits, hard=True)
        counts = counts + y_draw
    end = time.time()
    print(end - start)

>> 12.995795965194702

>> 7.658372640609741

>> 20.3382670879364
```

Decide on which path to choose. I'll commit in changes to the unit tests in a while to show that it passes both old tests and new tests. I'll also remove the commented code about `RelaxedOneHotCategorical`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13339

Differential Revision: D13092434

Pulled By: ezyang

fbshipit-source-id: 4c21788df336f4e9c2ac289022e395b261227b4b
2019-01-17 12:56:35 -08:00
a667767220 Add matches_jit_signature attribute to native_functions.yaml (#16040)
Summary:
If "matches_jit_signature" is set to True for a particular function, we will assume that the func syntax follows the JIT signature syntax. This is a temporary attribute and doesn't need to be set by developers outside the core team. It serves as a means of tracking an ongoing schema unification with the goal of aligning func syntax with other components of PyTorch in order to reduce overall complexity and match coverage of different function descriptions.

Followup PRs might be about removing _out from native_functions.yaml and using Tensor annotations instead, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16040

Reviewed By: ezyang

Differential Revision: D13703176

Pulled By: cpuhrsch

fbshipit-source-id: ce248e1823a6f18efa95502f9f3eebf023b4a46c
2019-01-17 12:39:08 -08:00
fe4ae9dfe4 add if in register_buffer like register_parameters (#16110)
Summary:
Without this "if", the code below throws the error "'Linear' object has no attribute '_buffers'".
With the "if", the error becomes "cannot assign buffer before Module.\_\_init\_\_() call", which I think is more accurate, just like register_parameter.
```
import math
import torch
from torch.nn.parameter import Parameter
from torch.nn import functional as F
from torch.nn import Module
class Linear(Module):
    def __init__(self, in_features, out_features, bias=True):

        self.in_features = in_features
        self.out_features = out_features
        self.register_buffer('test', torch.Tensor(out_features, in_features))
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

        super(Linear, self).__init__()  # intentionally called after the registrations above, which triggers the error

        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )

linear = Linear(3,4)
```
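
For contrast, the intended usage, calling `Module.__init__` before any registration, works fine; a minimal sketch (the class name is illustrative):

```python
import torch
from torch.nn import Module

class WithBuffer(Module):
    def __init__(self, in_features, out_features):
        # Must run first: Module.__init__ creates _buffers and _parameters.
        super(WithBuffer, self).__init__()
        self.register_buffer('test', torch.zeros(out_features, in_features))

m = WithBuffer(3, 4)
assert 'test' in dict(m.named_buffers())
```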
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16110

Differential Revision: D13715839

Pulled By: soumith

fbshipit-source-id: c300eff0a8655aade448354cf489a592f7db722a
2019-01-17 11:50:12 -08:00
f1ad5e08c7 Revert D13709409: [pytorch][PR] Exclude pyi from flake8 checks.
Differential Revision:
D13709409

Original commit changeset: ec4a959e146f

fbshipit-source-id: feabed5719a0bfdfe7979074b7e1ba9756c4ba25
2019-01-17 11:28:56 -08:00
6641b09fac respect grad guard for torch.jit._fork and torch.jit._wait (#16101)
Summary:
respect grad guard for torch.jit._fork and torch.jit._wait.

Verified that the test failed without the fix, and pass with the fix.

Ideally I would like to enable and disable grad inside the forked function.
That doesn't seem to be supported at the moment. This code handles that
as well.
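A minimal sketch of the behavior being tested, using the internal `_fork`/`_wait` APIs and assuming the fix is in place:

```python
import torch

@torch.jit.script
def double(x):
    return x * 2

x = torch.ones(2, requires_grad=True)
with torch.no_grad():
    fut = torch.jit._fork(double, x)
    y = torch.jit._wait(fut)
# The forked computation respects the surrounding no_grad guard.
assert not y.requires_grad
```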
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16101

Differential Revision: D13708374

Pulled By: gqchen

fbshipit-source-id: 0533f080c4d0253fb4c61d2a0d3cc22de5721a09
2019-01-17 11:12:57 -08:00
595f767880 Revert batched pdist, improve existing kernel, add test (#15901)
Summary:
1) Reverts https://github.com/pytorch/pytorch/pull/12302 which added support for batched pdist. Except I kept the (non-batched) test improvements that came with that PR, because they are nice to have.  Motivation: https://github.com/pytorch/pytorch/issues/15511
2) For the non-batched pdist, improved the existing kernel by forcing fp64 math and properly checking cuda launch errors
3) Added a 'large tensor' test that at least on my machine, fails on the batch pdist implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15901

Reviewed By: ezyang

Differential Revision: D13616730

Pulled By: gchanan

fbshipit-source-id: 620d3f9b9acd492dc131bad9d2ff618d69fc2954
2019-01-17 10:44:43 -08:00
fbdafb006e Fix trivial typos in torch.cuda._utils (#16026)
Summary:
Trivial typo fixings.

Maybe the indefinite article "an" is needed before each "specified index" but I'm not perfectly sure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16026

Differential Revision: D13709499

Pulled By: ezyang

fbshipit-source-id: 698b000bb8aa063afd81db6e67046456a439b2ce
2019-01-17 10:40:43 -08:00
dbe6a7a9ff Unify the shape notation for all of the pytorch modules (#15741)
Summary:
PR to update the shape notation for all of the torch.nn modules to take a unified form. The goal is to make these definitions machine-readable and checkable by unifying the style across all of the different modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15741

Differential Revision: D13709601

Pulled By: ezyang

fbshipit-source-id: fb89a03903fdf0cd0dcf76f3e469b8582b2f3634
2019-01-17 10:32:14 -08:00
67f2039f4c Fix numerical stability in binomial.log_prob (#15962)
Summary:
This issue was discovered by fehiepsi in https://github.com/uber/pyro/issues/1706 with the `log_prob` computation for Binomial, ~and can be seen with `torch.float32` when we have a combination of low probability value and high `total_count` - a test is added to capture this (since scipy only uses float64, the comparison is done using relative tolerance).~

The problem is in the code that tries to pull out the minimum values amongst the logits (written by me earlier, presumably to avoid numerical instability issues), but it is not needed.

EDIT: After a few attempts, I have been unable to reliably show that the change is more numerically stable, and have removed my previous test which fails on linux. The reason is that the issue manifests itself when `total_count` is high and `probs` is very low. However, the precision of `lgamma` when `total_count` is high is bad enough to wash away any benefits. The justification for this still stands though - (a) simplifies code (removes the unnecessary bit), (b) is no worse than the previous implementation, (c) has better continuity behavior as observed by fehiepsi in the issue above.
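
For context, the (unbatched) quantity involved is the logit-parameterized binomial log-probability; a rough sketch, using `softplus` for the stable `log(1 + exp(logits))` term (an illustration, not the exact code path changed by this PR):

```python
import torch
import torch.nn.functional as F

def binomial_log_prob(total_count, logits, value):
    # log C(n, k) + k * logits - n * log(1 + exp(logits))
    n, k = total_count, value
    log_binom = (torch.lgamma(n + 1) - torch.lgamma(k + 1)
                 - torch.lgamma(n - k + 1))
    return log_binom + k * logits - n * F.softplus(logits)
```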

cc. fehiepsi, alicanb, fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15962

Differential Revision: D13709541

Pulled By: ezyang

fbshipit-source-id: 596c6853b6e4d5fba42336afa168a665ab6fbde2
2019-01-17 10:18:37 -08:00
c51cf09a4b Automatic update of fbcode/onnx to fd60104394fa353e1762f44ecad1b2166e33deef (#16094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16094

Previous import was 84a0441ae28795a928005863dc142bee81827566

Included changes:
- **[fd60104](https://github.com/onnx/onnx/commit/fd60104)**: deprecate no-spatial mode of BN (#1637) <liqunfu>

Reviewed By: BIT-silence

Differential Revision: D13705357

fbshipit-source-id: 44dbc8bf15fced6d50048b04c2882e38f75c0e34
2019-01-17 10:14:15 -08:00
f09003d95d A trivial typo fixed in onnx.verify.verify (#15871)
Summary:
A trivial typo fixing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15871

Differential Revision: D13709588

Pulled By: ezyang

fbshipit-source-id: 84460e53e30470bef72bc836c08fd149b4d725cf
2019-01-17 09:57:33 -08:00
0e80df515d Remove support for CUDNN 6 (#15851)
Summary: This PR aims to remove support for cuDNN 6.

Differential Revision: D13709595

Pulled By: ezyang

fbshipit-source-id: 853624db1cf66b0534d7028654c38c2806fb4107
2019-01-17 09:57:26 -08:00
1a5c5fe7c9 Exclude pyi from flake8 checks. (#16105)
Summary:
Idiomatic pyi files will fail with Python 2 flake8 even
though they would work with mypy.  This is because pyi
files generally use Python 3 only syntax.  No point
in linting them.

There are currently no pyi files checked in, this is purely
a prophylactic measure.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16105

Reviewed By: zou3519

Differential Revision: D13709409

Pulled By: ezyang

fbshipit-source-id: ec4a959e146f81ccb9533b04348be8dd78808421
2019-01-17 09:51:45 -08:00
76782cfc21 Update cpuinfo to avoid reporting error when sysfs is not accessible (#16107)
Summary:
On some cloud-based x86 systems /sys/ is not mounted.
cpuinfo has a work-around for these systems, but it reports an error if sysfs files fail to read, and this error was confusing to some users (e.g. pytorch/cpuinfo#20). This update downgrades the error to a warning, so it is not reported with default configuration options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16107

Differential Revision: D13715243

Pulled By: soumith

fbshipit-source-id: f5c4c86422343ca449487f0185f3a8865ccf3b9d
2019-01-17 09:26:49 -08:00
1a09a2a27f Export PyTorch erf to ONNX Erf and add Caffe2 Erf operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16106

Differential Revision: D13709490

Pulled By: bddppq

fbshipit-source-id: 1b5b32261f06543371f7bd7ac9b11957a5eb4ad0
2019-01-17 09:18:08 -08:00
334258e39e Potential fix for model inference crash on Win10 (#15919) (#16092)
Summary:
Please refer to issue #15919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16092

Differential Revision: D13712897

Pulled By: soumith

fbshipit-source-id: edcd1ed3504f1fa1af841a1757616382c745958f
2019-01-17 08:44:02 -08:00
24f4d3987e Move all Stream and Event Python implementation to C++ (#15937)
Summary:
1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation.
2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++
3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937

Differential Revision: D13649001

Pulled By: mrshenli

fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240
2019-01-17 07:29:22 -08:00
1e425d1a47 A trivial typo fix in caffe2.python (#15907)
Summary:
blobl -> globl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15907

Differential Revision: D13709586

Pulled By: ezyang

fbshipit-source-id: 9d3ad76b7fea76c7934407d3c164417b4157e234
2019-01-17 04:57:34 -08:00
7536887cb7 Add count_include_pad for avg_pool on CuDNN (#16100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16100

Add count_include_pad for avg_pool on CuDNN
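
As an illustration of what the flag means (shown on the CPU functional API, not the CuDNN path this diff touches):

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 2, 2)
# Padded zeros are counted in the divisor: each 2x2 window here
# covers 1 real element and 3 pad elements, so the average is 0.25.
y_incl = F.avg_pool2d(x, kernel_size=2, padding=1, count_include_pad=True)
# Only real elements are counted, so every output stays 1.0.
y_excl = F.avg_pool2d(x, kernel_size=2, padding=1, count_include_pad=False)
```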

Reviewed By: houseroad

Differential Revision: D13707959

fbshipit-source-id: 261f5d116066fef75cf9a5787dfbc5d12b5b9f9b
2019-01-17 02:10:12 -08:00
4171ef3728 Enhance the documentation for DistributedDataParallel from torch.nn.parallel.distributed (#16010)
Summary:
- a typo fixed
- made the docs consistent with #5108

And maybe one more change is needed. According to the current docs
> The batch size should be larger than the number of GPUs used **locally**.

But shouldn't the batch size be larger than the number of GPUs used **either locally or remotely**? Sadly, I couldn't experiment this with my single GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16010

Differential Revision: D13709516

Pulled By: ezyang

fbshipit-source-id: e44459a602a8a834fd365fe46e4063e9e045d5ce
2019-01-17 01:02:44 -08:00
ded4ff87af fix a little error in comments (#15922)
Summary:
There is a small error in the comment: given the dependency "A->B", task B must start after task A finishes, not after task B.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15922

Differential Revision: D13709579

Pulled By: ezyang

fbshipit-source-id: 735afe83f4532b7c7456da3e96209b3e07071f37
2019-01-17 00:25:23 -08:00
c7a48da493 Corresponding data type for BYTE (#15627)
Summary:
TensorProto.DataType in caffe2/proto/caffe2.proto has BYTE = 3 defined, while there is no corresponding TypeMeta defined in caffe2/core/types.cc: DataTypeToTypeMeta. This issue caused the C++ MNIST + LMDB tutorial to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15627

Differential Revision: D13709602

Pulled By: ezyang

fbshipit-source-id: d4826d0f9b3975e6a8478d4bad1abbbedcaea197
2019-01-17 00:17:56 -08:00
ec8b1c94a9 Fix possible importing errors in build_libtorch.py (#15471)
Summary:
1. I fixed the importing process, which had some problems
    -  **I think `setup_helpers` should not be imported as a top-level module. It can lead to many future errors. For example, what if `setup_helpers` imports another module from the upper level?** So we need to change it.
    - The code is not consistent with other modules in the `tools` package. For example, other
    modules in the package import `from tools.setuptools...`, not `from setuptools...`.
    - **It should be runnable with the `python -m tools.build_libtorch` command** because this module is part of the tools package. Currently, you cannot do that, and I think that's simply wrong.

~~2. I Added platform specific warning messages.
    - I constantly forgot that I needed to define some environment variables in advance specific to my platform to build libtorch, especially when I'm working at a non pytorch root directory. So I thought adding warnings for common options would be helpful .~~

~~3. Made the build output path configurable. And a few other changes.~~

orionr  ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15471

Differential Revision: D13709607

Pulled By: ezyang

fbshipit-source-id: 950d5727aa09f857d973538c50b1ab169d88da38
2019-01-16 23:55:57 -08:00
fcb4b4f002 Remove redundant includes from ir.{h,cpp}.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16080

Differential Revision: D13701796

Pulled By: ZolotukhinM

fbshipit-source-id: 7efae3a0fd969376e4b438a8d8fb96adb33dc55c
2019-01-16 23:43:45 -08:00
f7733526aa Generate PDB files for better debugging on Windows (#16008)
Summary:
1. Unify `build_pytorch_libs.bat`, `setup.py` and `torch/CMakeLists.txt` on the debugging flags with the `CMAKE_BUILD_TYPE` being `Debug`, `Release` and `RelWithDebInfo`.
2. Install PDBs through CMake if they are generated.

Reference:
1. CMake PDB install: https://gitlab.kitware.com/cmake/cmake/issues/18393#note_459199
2. About debugging flags https://stackoverflow.com/a/4662345
3. MSDN page about /DEBUG flag: https://docs.microsoft.com/en-us/cpp/build/reference/debug-generate-debug-info?view=vs-2017
4. MSDN page about /Z{i/I/7}: https://docs.microsoft.com/en-us/cpp/build/reference/z7-zi-zi-debug-information-format?view=vs-2017

Work to do:
- [x] Test the changes work in Release config through this PR
- [ ] <del> Test debug build through https://github.com/pytorch/pytorch/pull/16009 </del>
- [x] Test release build with debugging symbols through #16013

Difficulties:
- [x] Replace /Zi flags with /Z7 (which will be added if DEBUG or RelWithDebInfo is used), as it is not supported by sccache
- [x] Resolve `LINK : fatal error LNK1210: exceeded internal ILK size limit; link with /INCREMENTAL:NO` in the debug build
- [ ] DEBUG build blocked by a MSVC bug. In order to resolve it, we'll need to update the MSVC in CI: https://developercommunity.visualstudio.com/content/problem/225957/fatal-error-lnk1318-unexpected-pdb-error-ok-0.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16008

Differential Revision: D13709527

Pulled By: ezyang

fbshipit-source-id: e8365bc75d9ec64099093f7001f83d99a06b196b
2019-01-16 23:34:32 -08:00
0dfdc2cbdb Update int8_simd.h (#13859)
Summary:
If we use clang with sse4 support, we will have the function redefinition
error between [1] and [2]. This patch try to add some checkings to fix this
problem.

I just turn on USE_NATIVE_ARCH with clang, then I hit the redefinition error.

[1]
caffe2/operators/quantized/int8_simd.h
[2]
third_party/gemmlowp/gemmlowp/fixedpoint/fixedpoint_sse.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13859

Differential Revision: D13095694

Pulled By: ezyang

fbshipit-source-id: c65166e4d5a04bb54e2b82c52740af00116ccb0d
2019-01-16 23:19:46 -08:00
ffd613800f Add IS_PYTORCH_CI flag for testing (#16006)
Summary:
Use case:
Some data loader tests rely on `psutil` (a third party lib). So they are guarded by `skipIf`. But we want to always test them on CI envs. With `IS_PYTORCH_CI`, we can raise if `psutil` is not found.
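The pattern, generalized to any optional dependency (the helper name is made up for illustration):

```python
import importlib
import unittest

def import_or_skip(name, is_ci):
    """Import an optional test dependency; on CI, a missing dep is a hard failure."""
    try:
        return importlib.import_module(name)
    except ImportError:
        if is_ci:
            raise  # CI environments are expected to have the dependency installed
        raise unittest.SkipTest(name + ' not installed')
```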
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16006

Reviewed By: ezyang

Differential Revision: D13673957

Pulled By: yf225

fbshipit-source-id: c63a7138093f45333c0b371fed0bcc88b67f2a22
2019-01-16 23:07:38 -08:00
7c56db73d5 Moving torch.norm to ATen using TensorIterator (#15414)
Summary:
Adding support for torch.norm:
i. multiple dimensions for dim
ii. a dtype that specifies the math/output tensor type
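
A usage sketch of the two additions (assuming the post-PR signature):

```python
import torch

x = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
# (i) reduce over several dimensions at once
n = torch.norm(x, p=2, dim=(1, 2))
assert n.shape == (2,)
# (ii) dtype selects the math/output type independently of the input
n64 = torch.norm(x, p=2, dim=(1, 2), dtype=torch.float64)
assert n64.dtype == torch.float64
```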
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15414

Differential Revision: D13702022

Pulled By: ezyang

fbshipit-source-id: da2676f2b6aff988889b1539d0de8ecd4946823a
2019-01-16 22:15:25 -08:00
55511004d1 Resolve errors in perfkernel for Windows (#16031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16031

1. MSVC only has _mm_prefetch(const char*, int). Fixed in both python codegen and C++ files.
2. uint32_t in "cvtsh_ss_bugfix.h" requires "#include <cstdint>".
3. Some files use gflags headers. Add dependency via c10.
4. Isolate arch flags with interface library and private compile options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15753

Reviewed By: dskhudia

Differential Revision: D13636233

Pulled By: jspark1105

fbshipit-source-id: cdcbd4240e07b749554a2a5676c11af88f23c31d
2019-01-16 21:51:00 -08:00
aa6b0f50ad add a constexpr in c10::Half (#16091)
Summary:
Debug build generates references which are not resolved otherwise

as recognized by dlibenzi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16091

Differential Revision: D13703584

Pulled By: soumith

fbshipit-source-id: 6ac5666d2c6b1520e083f6eac9c535a1609d9c6b
2019-01-16 21:13:21 -08:00
d277f77da2 Tensor reinitialization codemod - 3/5 (#15912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15912

Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then
call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13586734

fbshipit-source-id: 8485d2c51225343961351c7a2e8f95055534f9a9
2019-01-16 19:49:01 -08:00
57d29ffa9c Bound shape inference for c2 (#16081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16081

A simple version of bound shape inference, conditioned on batch size. In addition to doing normal shape inference, it will change the batch size (1st dim of the shape) of the inputs as well as of batch-size-modulating ops such as `SparseLengthsSum`. Support for more ops, such as `SparseToDense`, is probably needed. We can build on this.

Reviewed By: jackm321, rdzhabarov

Differential Revision: D13661968

fbshipit-source-id: 6a724a647e109757c26e3e26e15a49725ecc75cc
2019-01-16 19:02:56 -08:00
7a5f782c2e Fix max_pool_grad test (#16088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16088

Fix max_pool_grad test

Reviewed By: houseroad

Differential Revision: D13700917

fbshipit-source-id: f4f942ee920bcd943c38a8f8a6aafd1d13c4515f
2019-01-16 15:32:27 -08:00
5b2d30ec85 Revert D12812029: [pt1][tensor] Remove deprecated caffe2::Tensor APIs
Differential Revision:
D12812029

Original commit changeset: ea0c3dd882be

fbshipit-source-id: d5bb4cbb1d7c9be08789599a7db0fb3313f3dbc4
2019-01-16 14:53:20 -08:00
237c0c3c7a Port the backend of FractionalMaxPool3d from TH to ATen (#15575)
Summary:
1. Port the FractionalMaxPool3d implementation from THNN/THCUNN to ATen.
2. Expose this function to Python module nn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15575

Differential Revision: D13612848

Pulled By: chandlerzuo

fbshipit-source-id: 5f474b39005efa7788e984e8a805456dcdc43f6c
2019-01-16 14:16:30 -08:00
aff0964ee7 update pytorch docker to cuda 10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16082

Differential Revision: D13699081

Pulled By: soumith

fbshipit-source-id: 86942e2c5595931384cf87dd1ef75936a4d74a57
2019-01-16 13:37:37 -08:00
d33e7d1236 multinomial: fix detection of zero probability (#16075)
Summary:
The cumsum over the probabilities is not guaranteed to be
monotonically non-decreasing, so it is hard to detect zero-probability
classes using just the cumsum.
This changes the binary search postprocessing to use the
(non-cumulated) distribution instead.
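
A rough sketch of the fixed approach; the function name and exact search details are illustrative, not the kernel's actual code:

```python
import torch

def multinomial_sketch(probs, n_samples):
    # Inverse-CDF sampling: binary search of uniforms against the cumsum.
    cdf = probs.cumsum(0)
    u = torch.rand(n_samples) * cdf[-1]
    idx = torch.searchsorted(cdf, u, right=True).clamp(max=probs.numel() - 1)
    # Postprocess with the *non-cumulated* distribution: a zero-probability
    # class can be hit where the cumsum is flat (or not quite monotonic), so
    # shift such draws to the nearest earlier class with nonzero mass.
    for _ in range(probs.numel()):
        bad = probs[idx] == 0
        if not bad.any():
            break
        idx = torch.where(bad, (idx - 1).clamp(min=0), idx)
    return idx

probs = torch.tensor([0.5, 0.0, 0.5])
samples = multinomial_sketch(probs, 1000)
# class 1 has zero probability and should never be drawn
assert (samples != 1).all()
```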

Thank you, jcjohnson, for the bug report with
reproducing case.

Fixes: #13867
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16075

Differential Revision: D13695565

Pulled By: soumith

fbshipit-source-id: 02c4d6f868f0050c1ae7d333f4317c5610e49cd9
2019-01-16 12:50:49 -08:00
e58cc6ab28 Enable single graph sharing between multiple threads for onnxifiop (#16047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16047

Implements a single thread-safe map, enabling sharing of the generated graph between
different ops.
Added model_id to every onnxified op to help create a unique id in the map.
Some formatting fix.

Reviewed By: yinghai

Differential Revision: D13663927

fbshipit-source-id: 27417e8fe752fdd48abb6a87966cd76d592e1206
2019-01-16 12:19:16 -08:00
503f412f79 Fix error message formatting in AT_CHECK/AT_ERROR (#16067)
Summary:
Changelog:

- Fix formatting for error messages in prelu, EmbeddingBag, RNN

Fixes https://github.com/pytorch/pytorch/issues/16043
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16067

Differential Revision: D13693286

Pulled By: soumith

fbshipit-source-id: b0760d13c9a45e82dababfc44dabe648e5345ca3
2019-01-16 11:34:13 -08:00
71b24127d2 Correct sphinx-note in symeig (wrong indentation)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16073

Differential Revision: D13692874

Pulled By: soumith

fbshipit-source-id: ea2a98e88679d382f9a2edab199e9ba7c8ce2213
2019-01-16 10:47:48 -08:00
3cf76e78bd Fix the caffe2_gpu linkage with torch on Windows (#16071)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15992.
Inspired by https://docs.microsoft.com/en-us/cpp/build/reference/optimization-best-practices?view=vs-2017. But this PR needs to be tested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16071

Differential Revision: D13693006

Pulled By: soumith

fbshipit-source-id: e83e9ae2591fa4da01d2b1b593558dba3bdc3cf7
2019-01-16 09:10:39 -08:00
a2af554e6f Port legacy all(*) to ATen (#15540)
Summary:
Questions:

1. ~This PR disables `common_dtype` computation [in `TensorIterator.cpp`](https://github.com/mrshenli/pytorch/blob/all/aten/src/ATen/native/TensorIterator.cpp#L489-L491) for `all*` operators. The reason is that, [this code](https://github.com/mrshenli/pytorch/blob/all/aten/src/ATen/native/TensorIterator.cpp#L120) otherwise complains type mismatch, where the `op.tensor` is `type Variable[CPUByteType]` while the `op` is `CPUByteType`. I am not sure if this is the right solution for this problem.~

2. Should I clean up all occurrences of `_th_all` and `_th_all_out` (and `logicalAnd`, `logicalAndAll`)?

3. Do I need to implement derivatives for `all`?

gchanan

Benchmark:

<img width="590" alt="screen shot 2018-12-26 at 3 24 31 pm" src="https://user-images.githubusercontent.com/16999635/50456505-e9596a00-0922-11e9-844e-00c4b4aad7ca.png">

<img width="587" alt="screen shot 2018-12-26 at 3 26 10 pm" src="https://user-images.githubusercontent.com/16999635/50456509-ef4f4b00-0922-11e9-96bf-0a30c8574fe7.png">

<img width="590" alt="screen shot 2018-12-26 at 3 26 54 pm" src="https://user-images.githubusercontent.com/16999635/50456510-ef4f4b00-0922-11e9-8a63-e47988843cc8.png">

<img width="589" alt="screen shot 2018-12-26 at 3 27 16 pm" src="https://user-images.githubusercontent.com/16999635/50456511-ef4f4b00-0922-11e9-9004-2518aebcdc6e.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15540

Differential Revision: D13548938

Pulled By: mrshenli

fbshipit-source-id: 5a2e5eef1047decb4c79906cb9f3332034908c9c
2019-01-16 09:06:26 -08:00
411173757e Rename away uses of THAllocator and THCDeviceAllocator (#16061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16061

I discovered I needed to delete these names in preparation for moving
THCCachingAllocator to c10_cuda; might as well fix all the other
sites too.

Reviewed By: dzhulgakov

Differential Revision: D13686869

fbshipit-source-id: e8cc55d39ac4bfd3e3a22c761f89a7a111ce5f5e
2019-01-16 05:36:47 -08:00
4d07951a54 Stop pretending that TH headers are both C++ and C compatible. (#16059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16059

Just deleted all __cplusplus ifdef guards; we only ever use
these headers in C++ contexts.

Reviewed By: dzhulgakov

Differential Revision: D13686580

fbshipit-source-id: ce28c4a32f3596bfb17aeeb34904a02899991453
2019-01-16 05:36:45 -08:00
fb68d813be Fix logic errors when accumulating reductions in output (CUDA) (#16023)
Summary:
The correct logic is as follows:

* If there is an earlier split, we need to combine with its result
* If there is *not* a later split, we need to project before saving into the output.

This should partially fix #15837. For example:
```
In [7]: a=torch.ones([1838860800], dtype=torch.float, device="cuda:1")

In [8]: a.mean()
Out[8]: tensor(1., device='cuda:1')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16023

Differential Revision: D13678449

Pulled By: umanwizard

fbshipit-source-id: ab5078484c88e96bb30121b5cf24a0e8b0a8c2f8
2019-01-15 19:57:57 -08:00
5353847b19 Remove deprecated caffe2::Tensor APIs (#15814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15814

Plan is to remove the APIs we want to deprecate one by one and make sure it still builds in sandcastle and ossci

Reviewed By: ezyang

Differential Revision: D12812029

fbshipit-source-id: ea0c3dd882bec95fcd4507160ebc61f598b6d040
2019-01-15 18:42:04 -08:00
5e72e99c86 Remaining Tensor API fixes - dims() -> sizes() (#15743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15743

Remaining fixes so that D12812029 will compile

Reviewed By: dzhulgakov

Differential Revision: D13535559

fbshipit-source-id: 2c8b3403570c8c35ac8efe2d827233abc0e6e0d1
2019-01-15 18:42:02 -08:00
8b5894491c Comment about CuDNNWrapper (#15496)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15496

Differential Revision: D13544130

Pulled By: ezyang

fbshipit-source-id: 51bdd8312b482925b30a478774cdfa629c57ee4e
2019-01-15 18:01:12 -08:00
ad39cbde59 Port FractionalMaxPool2d from TH to ATen (#15531)
Summary:
Tested:

pytest test/test_nn.py -k Fractional
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15531

Differential Revision: D13612833

Pulled By: chandlerzuo

fbshipit-source-id: b919d698d068b97ba7a4f8021367e7f6c8aae39c
2019-01-15 17:57:12 -08:00
dc4977ddf0 Support tracing GenericList (#15969)
Summary:
Treat GenericList similarly to tuples and TensorList: recursively unpack them and assignValueTrace accordingly. Also add interpreter support for ListUnpack on GenericList
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15969

Differential Revision: D13665139

Pulled By: jamesr66a

fbshipit-source-id: cd8cb3dd7475f424e48a69d217f2eac529df9f6a
2019-01-15 17:32:48 -08:00
b792bfec0e s/fwdproxy.any/fwdproxy/g in fbsource (#16024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16024

codemod with 'Yes to all': s/fwdproxy.any/fwdproxy/g in fbsource

Reviewed By: maxgeorg

Differential Revision: D13666336

fbshipit-source-id: a5a694d66efec5304a1c8c231d638441f88efe1d
2019-01-15 17:26:31 -08:00
8f11df3cb7 Automatic update of fbcode/onnx to 84a0441ae28795a928005863dc142bee81827566 (#16046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16046

Previous import was 7abd834091f1024c11749dcfd25126802db9fdd5

Included changes:
- **[84a0441](https://github.com/onnx/onnx/commit/84a0441)**: Clarify namescopes in the presence of nested subgraphs (#1665) <G. Ramalingam>
- **[118fec5](https://github.com/onnx/onnx/commit/118fec5)**: Add Where op. (#1569) <Sergii Dymchenko>
- **[beefa15](https://github.com/onnx/onnx/commit/beefa15)**: Use strings directly for casing as np.object w/o redundant StringHolder. (#1736) <Dmitri Smirnov>
- **[4023bae](https://github.com/onnx/onnx/commit/4023bae)**: Add a capability to input/output unicode strings (#1734) <Dmitri Smirnov>
- **[1a8a7fc](https://github.com/onnx/onnx/commit/1a8a7fc)**: typos fixed: iutput -> input (#1726) <Beomsoo Kim>
- **[0128478](https://github.com/onnx/onnx/commit/0128478)**: Scan test update (#1732) <G. Ramalingam>
- **[c6a24fd](https://github.com/onnx/onnx/commit/c6a24fd)**: turn rtol to 0.002 on densenet121, since AMD and Nvidia GPU's precion difference (#1733) <Lu Fang>
- **[5b7ac72](https://github.com/onnx/onnx/commit/5b7ac72)**: Add Shrink operator (#1622) <Rui Zhu>

Reviewed By: yinghai

Differential Revision: D13676711

fbshipit-source-id: 513cc137223469b47af48919432aaecf58006012
2019-01-15 17:17:31 -08:00
13f38ab79d Add count_include_pad to average_pool_gradient_op (#15997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15997

Add count_include_pad to average_pool_gradient_op

Reviewed By: houseroad

Differential Revision: D13648339

fbshipit-source-id: 205cb2acb32dc24a85256b628298b1a11f0ffa2c
2019-01-15 16:56:40 -08:00
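The `count_include_pad` flag decides whether padded zeros contribute to the divisor of each averaging window. A toy 1-D NumPy sketch of the forward semantics (not the Caffe2 operator or its gradient):

```python
import numpy as np

def avg_pool_1d(x, k, pad, count_include_pad):
    """Toy 1-D average pool (stride == kernel) illustrating what
    count_include_pad controls; a sketch, not the Caffe2 implementation."""
    xp = np.pad(x.astype(float), pad)
    out = []
    for i in range(0, len(xp) - k + 1, k):
        window = xp[i:i + k]
        if count_include_pad:
            out.append(window.sum() / k)  # padded zeros count in the divisor
        else:
            # divide only by the number of positions inside the real input
            valid = max(1, min(i + k, pad + len(x)) - max(i, pad))
            out.append(window.sum() / valid)
    return out

x = np.array([2.0, 4.0])
print(avg_pool_1d(x, k=2, pad=1, count_include_pad=True))   # [1.0, 2.0]
print(avg_pool_1d(x, k=2, pad=1, count_include_pad=False))  # [2.0, 4.0]
```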
b2eb98f6c3 Remove cuda from autograd profiler (#15898)
Summary:
This puts stubs in the autograd profiler for the use of cuda APIs allowing the cuda parts of libtorch to be linked separately from the CPU parts.

This also edits the buck build.

Previous:

For GPU builds:
_C -> csrc -> caffe2
For CPU builds:
_C -> csrc-cpu -> caffe2

Now:
GPU:
_C -> libtorch_cuda -> (libtorch -> caffe2, for CPU)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15898

Reviewed By: ailzhang

Differential Revision: D13617991

Pulled By: zdevito

fbshipit-source-id: 6d84a50bb356a54b4217f93219902755601b00e1
2019-01-15 16:43:11 -08:00
2824cb6e9c Fix namespace typo. (#16021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16021

Adds nom:: so that TRIVIAL_CONVERTER works more generally.

Reviewed By: janewangfb

Differential Revision: D13664748

fbshipit-source-id: 100f47a8326e41bd0ac2ae281669f5a0363fe060
2019-01-15 16:43:08 -08:00
c448f85e1f Fixing missing cpp tests for Caffe2 setup.py builds (#16037)
Summary:
These were broken (always skipped in setup.py builds) by https://github.com/pytorch/pytorch/pull/15917
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16037

Differential Revision: D13675549

Pulled By: pjh5

fbshipit-source-id: fed50855dd0b5d0c80fface3d8b2156f18aae4e7
2019-01-15 13:09:12 -08:00
57b5e7572b Test cases for calling caffe2 LayerNorm from PyTorch and JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15895

Reviewed By: dzhulgakov

Differential Revision: D13615336

fbshipit-source-id: de28fef8ce025d6d37a4c80c029ec97b7195cfd9
2019-01-15 12:03:57 -08:00
620ff25bdb Enhance cpu support on gloo based multi-nodes mode. (#11330)
Summary:
1. Add some Gloo communication operators to the related fallback list;
2. Work around compile errors when using fallback operators whose CPU operator inherits directly from 'OperatorBase', like PrefetchOperator;
3. Add new CPU context support for some Python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330

Reviewed By: yinghai

Differential Revision: D13624519

Pulled By: wesolwsk

fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f
2019-01-15 11:47:10 -08:00
7d601715e5 Constant prop prim::None (#15979)
Summary:
Previously we were only constant propping prim::Constants, but we should be constant propping prim::None as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15979

Differential Revision: D13664692

Pulled By: eellison

fbshipit-source-id: 01839403576c21fc030c427e49275b8e1210fa8f
2019-01-15 11:34:51 -08:00
9a6fe4feec Add a note about THNN height/width/etc argument reordering. (#15819)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15819

Differential Revision: D13665297

Pulled By: ezyang

fbshipit-source-id: 4570275bc9e65269788f836f2447d09474cefeff
2019-01-15 10:52:39 -08:00
406b9c49bd Fix Python path finding for benchmark tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16022

Differential Revision: D13673792

Pulled By: pjh5

fbshipit-source-id: 177a823ef343b7f60e26ad9ef51415332045438d
2019-01-15 10:48:40 -08:00
7f1397acef Quantized RNNCell modules (#15469)
Summary:
Similarly to https://github.com/pytorch/pytorch/pull/13777, we apply post-processing quantization to RNN cell modules (`RNNCell`, `LSTMCell`, and `GRUCell`).

A further follow-up PR will involve quantizing the full `RNN`, `GRU`, and `LSTM` modules. This depends on those modules being scriptable as part of the standard library scripting effort, though. Note that infrastructure in this pr such as `gather_quantized_params` is currently unused but should be used in the future when we can port over the full RNN modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15469

Differential Revision: D13545802

Pulled By: jamesr66a

fbshipit-source-id: ad3b694517842893ea619438e9f5e88fd7b96510
2019-01-15 10:40:51 -08:00
19717224c5 Miscellaneous broken RSTs fixed (#16033)
Summary:
https://pytorch.org/docs/master/tensors.html#torch.Tensor.bernoulli_
https://pytorch.org/docs/master/torch.html#torch.addmm
https://pytorch.org/docs/master/distributed_deprecated.html#torch.distributed.deprecated.reduce_multigpu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16033

Differential Revision: D13671202

Pulled By: soumith

fbshipit-source-id: 276e10e610affe205376573e7f0f9894695d218d
2019-01-15 09:50:12 -08:00
b329e03684 Add PyTorchPredictorContainer (#15899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15899

Add PyTorchPredictorContainer to support multiple jit script modules

Reviewed By: pritamdamania87

Differential Revision: D13596139

fbshipit-source-id: 3ce0bdf2f4dbba7aa1d20e824d03e5ac98f5d887
2019-01-15 09:18:18 -08:00
1065e7cd24 Add itertools.{prod, combinations, combinations_with_replacement} like op to pytorch (#9393)
Summary:
closes https://github.com/pytorch/pytorch/issues/7580
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9393

Differential Revision: D13659628

Pulled By: zou3519

fbshipit-source-id: 3a233befa785709395a793ba8833413be394a6fd
2019-01-15 08:31:22 -08:00
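The new tensor ops mirror the semantics of Python's `itertools`; the reference behavior is shown below (the torch-side names, e.g. `torch.combinations`, are an assumption here, taken from the linked issue):

```python
from itertools import product, combinations, combinations_with_replacement

# Reference semantics the new tensor ops mirror.
print(list(product([1, 2], [3, 4])))
# [(1, 3), (1, 4), (2, 3), (2, 4)]
print(list(combinations([1, 2, 3], 2)))
# [(1, 2), (1, 3), (2, 3)]
print(list(combinations_with_replacement([1, 2], 2)))
# [(1, 1), (1, 2), (2, 2)]
```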
964732fa8d use fbgemm gconv in dnnlowp (#16020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16020

Needs to go over more iterations. For conv, I think we need a high-level interface that abstracts out low-level details of which code path will be taken (acc16, outlier-aware, depth-wise, group conv, ...); otherwise the client code will be complex, as can be seen from the DNNLOWP Conv ops. This will also help us keep the interface more stable.

Reviewed By: dskhudia, jianyuh

Differential Revision: D13588996

fbshipit-source-id: 9afce9e441bcaf20437fcc2874fb9d4165a46bcb
2019-01-15 00:02:31 -08:00
bc233fe405 var for multiple dimensions (#15892)
Summary:
Timings are the same as for `std`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15892

Differential Revision: D13651173

Pulled By: umanwizard

fbshipit-source-id: a26bf1021dd972aa9e3e60fb901cd4983bfa190f
2019-01-14 20:17:42 -08:00
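Reducing variance over several dimensions at once matches NumPy's tuple-axis behavior; a small sketch of the expected semantics:

```python
import numpy as np

x = np.arange(24, dtype=float).reshape(2, 3, 4)
v = np.var(x, axis=(0, 1))  # reduce dims 0 and 1 together
# equivalent to flattening the reduced dims first, then reducing once
print(np.allclose(v, x.reshape(6, 4).var(axis=0)))  # True
```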
02b9f686a7 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 19841cff4a7fd69318d7828db75c16cd75757edd
2019-01-14 20:17:41 -08:00
5d527b9cc2 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 68b7c41366618ffd636c2b9c45c7ffbbcbc44f85
2019-01-14 18:43:27 -08:00
10b16953d1 nomnigraph - easy - use new test utils in converter_nomnigraph_test (#15751)
Summary:
Use new test utils in converter_nomnigraph_test , and add utils to set device option name, external inputs, outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15751

Differential Revision: D13586228

Pulled By: duc0

fbshipit-source-id: ff809dd7bf9f30641ce2a6fef7e2810f005521c2
2019-01-14 18:38:38 -08:00
4ed9de8680 Remove code duplication (#15880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15880

The layer_norm reference was implemented twice. Removing one of them.

Reviewed By: dzhulgakov

Differential Revision: D13611232

fbshipit-source-id: cee96c78d3255c3a4e34300693bf9260cf096615
2019-01-14 17:59:37 -08:00
ddece5a793 Fix ormqr docs, fixes #15565 (#15694)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc meganset
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15694

Differential Revision: D13573064

Pulled By: zou3519

fbshipit-source-id: 1d0b693d7c26db91826b81e6c98b45a69b5e9bc4
2019-01-14 17:08:18 -08:00
774705ba05 Fix c10d checking errno unconditionally (#15986)
Summary:
In #15964, I learned that `errno` is only meaningful if the function call fails. E.g., on some macOS systems, a successful `fork()` sets `errno` to `EINVAL` in the child process. This commit changes the `SYSCALL` macro so error checking is only done when an error happens. This means checking whether `rv == -1` for most calls, but checking `rv == nullptr` for `inet_ntop`.

Now `SYSCALL` accepts a second argument `success_cond`, which should be an expression returning whether the call succeeded. `SYSCHECK_ERR_RETURN_NEG1` is the shorthand for checking if rv is `-1`.

Any suggestions for better macro names are welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15986

Reviewed By: janewangfb

Differential Revision: D13661790

Pulled By: pietern

fbshipit-source-id: 9551b14b9f88805454a7bfb8e4d39e0f3aed8131
2019-01-14 16:02:05 -08:00
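The idea — consult the error code only after the success condition fails — can be sketched in Python (the names here are hypothetical, not the actual macro):

```python
import errno
import os

def syscheck(rv, succeeded, err):
    """Hypothetical sketch of the revised SYSCALL macro: the error code
    'err' is inspected only when the success condition is false, so a
    stale errno left behind by a successful call can no longer raise
    a spurious error."""
    if not succeeded(rv):
        raise OSError(err, os.strerror(err))
    return rv

# A successful call returns its value even if a stale error code lingers.
print(syscheck(5, lambda rv: rv != -1, errno.EINVAL))  # 5
```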
4fb3931896 add tensor.to to script (#15976)
Summary:
Previously it only worked with keyword arguments. Now it is fully compatible.

Fix for: https://github.com/pytorch/pytorch/issues/15478
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15976

Differential Revision: D13643979

Pulled By: eellison

fbshipit-source-id: 6a47bce7db362da80452adffebd2732f8e62a240
2019-01-14 15:49:31 -08:00
8964a2e6e6 Split Caffe2 CI into cmake-only and python builds (#15917)
Summary:
bypass-lint

- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917

Reviewed By: orionr

Differential Revision: D13637583

Pulled By: pjh5

fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153
2019-01-14 15:20:44 -08:00
4bdaca827c Make call operator on module holder call forward (#15831)
Summary:
In Python, you can use the call operator to invoke the `forward()` method of a module. In C++ this was currently not possible, because I couldn't figure out how to deduce the return type of a module's `forward()` method under the constraint that `forward()` may not exist at all (since the base module class in C++ does not mandate a `forward()` method). I now figured it out, so the call operator can be used.

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15831

Differential Revision: D13652676

Pulled By: goldsborough

fbshipit-source-id: ccab45a15215dda56460e560f0038781b539135f
2019-01-14 14:40:33 -08:00
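In Python this dispatch has always existed via `__call__`; a minimal sketch of the pattern the C++ module holder now mirrors:

```python
class Holder:
    """Minimal sketch: the call operator forwards to the wrapped module's
    forward(), mirroring what the C++ module holder now does."""
    def __init__(self, module):
        self.module = module

    def __call__(self, *args, **kwargs):
        return self.module.forward(*args, **kwargs)

class Doubler:
    def forward(self, x):
        return 2 * x

print(Holder(Doubler())(3))  # 6
```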
c620725177 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 0e31357e8a34614226e8948ae76d67e0786a9196
2019-01-14 12:46:24 -08:00
8c55e56c37 Fix broken rst of torch.nn.utils.spectral_norm and others (#15995)
Summary:
- Currently, the [rst](https://pytorch.org/docs/stable/nn.html#torch.nn.utils.spectral_norm) looks broken, at least in my browser. So I fixed it.
- I thought a subscript might be needed on the left-hand W in the definition.
- A few typos fixed.

crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15995

Differential Revision: D13649888

Pulled By: soumith

fbshipit-source-id: 00a2c3b043c7c8ebdd9fc2bf77ba27ae695fee3f
2019-01-14 07:35:36 -08:00
300dcc3b96 Add cuda.reset_max_memory_* (#15985)
Summary:
Addresses #15968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15985

Differential Revision: D13649916

Pulled By: soumith

fbshipit-source-id: a207aea5709a79dba7a6fc541d0a70103f49efff
2019-01-14 07:31:51 -08:00
7c08f1083e libshm retry on EINTR (#15964)
Summary:
fixes https://github.com/pytorch/pytorch/issues/14314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15964

Differential Revision: D13639034

Pulled By: soumith

fbshipit-source-id: 44592762aa46982e5d3616d55b5666a2c2ce9105
2019-01-14 04:30:40 -08:00
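The fix follows the standard retry-on-EINTR pattern; in Python (which since PEP 475 retries automatically in most stdlib calls), the pattern looks like this sketch (not the libshm C++ code):

```python
def retry_on_eintr(call, *args):
    """Retry a raw system call while it keeps failing with EINTR,
    e.g. because a signal interrupted it."""
    while True:
        try:
            return call(*args)
        except InterruptedError:  # OSError with errno == EINTR
            continue

# Demo: a call that is "interrupted" twice before succeeding.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise InterruptedError
    return 42

print(retry_on_eintr(flaky))  # 42
```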
abdaa477e5 Improved the documentation for torch.nn.functional.pad (#15984)
Summary:
- Fixed a few typos and grammar errors.
- Changed the sentences a bit.
- Changed the format of the tuples to be consistent with padding notations in the other places. For example, `ReflectionPad2d`'s docstring contains :math:`H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}`.

I also made sure that the generated html doesn't break.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15984

Differential Revision: D13649939

Pulled By: soumith

fbshipit-source-id: 0abfa22a7bf1cbc6546ac4859652ce8741d41232
2019-01-14 04:12:45 -08:00
6ec753f2f9 Improve the docstring of nn.random.fork_rng (#15960)
Summary:
Improved the docstring of nn.random.fork_rng
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15960

Differential Revision: D13649929

Pulled By: soumith

fbshipit-source-id: d3843179a2f1f838792c2f07f34deda2c06af56e
2019-01-14 02:41:18 -08:00
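The function's contract — run code that consumes randomness, then restore the RNG state — can be sketched with Python's `random` module (an analogy only; torch's `fork_rng` also handles per-device CUDA states):

```python
import random
from contextlib import contextmanager

@contextmanager
def fork_rng():
    """Save the RNG state, run the body, then restore the state so the
    body's draws do not perturb the caller's random stream (a sketch)."""
    state = random.getstate()
    try:
        yield
    finally:
        random.setstate(state)

random.seed(0)
expected = random.random()
random.seed(0)
with fork_rng():
    random.random()             # consumed inside the fork...
print(random.random() == expected)  # True: ...but invisible outside it
```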
492b7d410b doc fixes (#15990)
Summary: fixes  #15597 ,  #15283 and #10258

Differential Revision: D13649905

Pulled By: soumith

fbshipit-source-id: 753f46c2c96c61fba460019d9ed3e0d047d42ee7
2019-01-13 23:38:39 -08:00
ca18fb8567 simplify lambda function use in conv dnnlowp ops to fix #15911 (#15996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15996

As reported in issue #15911, gcc 4.9 was getting internal compiler error due to a complex use of lambda function in conv_dnnlowp_op.cc and conv_acc16_op.cc . This diff simplifies them.

Reviewed By: viswanathgs

Differential Revision: D13648264

fbshipit-source-id: 1551ae8a0a7653749185dca51ccceb2471b96b82
2019-01-13 23:32:48 -08:00
a7415787ac fix RandomSampler length (#15991)
Summary:
Hi!

This PR addresses #15537  issue.
Please review.

Thanks!

Differential Revision: D13649890

Pulled By: soumith

fbshipit-source-id: 166212ae383331345423236dfc4fa2ea907d265d
2019-01-13 23:09:51 -08:00
e18d6cd455 Fix static build on Windows (#15989)
Summary:
Tested locally. It can now be started by running `set EXTRA_CAFFE2_CMAKE_FLAGS= -DTORCH_STATIC=1` before the build. If we want to make sure it keeps working, then maybe we should add it to CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15989

Differential Revision: D13649935

Pulled By: soumith

fbshipit-source-id: 956945ed572819d8cf0bc9bd48df3ea9bc6f4a8a
2019-01-13 22:53:30 -08:00
a282378baf Caffe 2: Reshape Op upgrade (#15380)
Summary:
This is follow up on #13945 where we had to turn off some TRT tests because some ops were not ready to accept ONNX opset 9+ models. This PR fixes Reshape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15380

Differential Revision: D13649825

Pulled By: houseroad

fbshipit-source-id: b72e62803de5b63cc001c3fe4b3bf64dfa996e94
2019-01-13 22:49:40 -08:00
04b8a2f1ba fix compile error reported in issue #15911 (#15953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15953

Fix issue reported in https://github.com/pytorch/pytorch/issues/15911

Reviewed By: csummersea

Differential Revision: D13633256

fbshipit-source-id: 3808f100ff7dedfe5e20708e72e6081ff07eb32c
2019-01-12 21:03:12 -08:00
6371bc76a9 Back out "[pt1][tensor] Remove caffe2::ShareData" (#15983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15983

Original commit changeset: 6e4275d02f4c

Reviewed By: supertopher, Yangqing

Differential Revision: D13644123

fbshipit-source-id: 4b15a4c62995c0e68aad58465600409e302e6504
2019-01-12 07:07:22 -08:00
35480a7c44 Remove StopGradient op when it is inplace in inference (#12152)
Summary:
For inference, if the StopGradient op is in-place, we just remove it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12152

Differential Revision: D13633946

Pulled By: yinghai

fbshipit-source-id: 57762bcc37b38a1d39cb4af316ca50bfe961b105
2019-01-11 23:55:01 -08:00
586d030311 Add global pooling specialization and also update MaxPooling on GPU (#15824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15824

Add global pooling specialization and also update MaxPooling on GPU

Reviewed By: houseroad

Differential Revision: D13596340

fbshipit-source-id: c8a42aa69ee92c383c9f19d3ed57b77cb3e5bd28
2019-01-11 22:37:48 -08:00
83c054de48 AliasDB interface cleanup (#15656)
Summary:
This is the first of several PRs to simplify AliasDb usage.
- Hide the concept wildcards from users. They are too hard to think about and too easy to forget about.
- Start moving "mutability-safe" graph mutation methods into AliasDb (right now, the various methods that deal with topological move).

Eventually I want to create a "mutability-aware" handle to the graph. If you only use that handle to transform the graph, you can be sure that all transformations are safe with respect to mutability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15656

Differential Revision: D13615492

Pulled By: suo

fbshipit-source-id: 5c39a157b4ea76f1f976315d06a314a89cc4f22f
2019-01-11 20:06:53 -08:00
00b2dff6b6 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 2671ea6bb594280a9d3352fbfa3628f28c6847aa
2019-01-11 19:57:11 -08:00
a4c1aa4bc5 Add the normalize transform to the core library (#15891)
Summary:
Adds the `Normalize` transform to the core C++ frontend library.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15891

Differential Revision: D13642167

Pulled By: goldsborough

fbshipit-source-id: 573428e626d6106cf2aadf3dc2e2aecb9a85efc3
2019-01-11 19:50:18 -08:00
e5266b4ba6 3x3x3 depthwise convolution with per channel quantization (#15775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15775

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/55

fbgemm didn't have per-channel quantization for 3x3x3 depth-wise convolution

Reviewed By: jianyuh

Differential Revision: D13587438

fbshipit-source-id: 91c36fae7a0e8386e3bc49808e18918b01681dd1
2019-01-11 19:42:29 -08:00
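Per-channel quantization picks one scale per output channel instead of a single tensor-wide scale; a symmetric int8 NumPy sketch (not fbgemm's actual scheme):

```python
import numpy as np

w = np.array([[0.5, -1.0, 0.25],
              [2.0,  0.1, -0.4]])          # one row per output channel
scales = np.abs(w).max(axis=1) / 127.0     # per-channel symmetric scales
q = np.round(w / scales[:, None]).astype(np.int8)
deq = q * scales[:, None]
# rounding error is bounded by half a quantization step per channel
print(np.all(np.abs(deq - w) <= scales[:, None] / 2 + 1e-12))  # True
```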
264e16bffd Make it consistent for OperatorBase usage (#15908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15908

"OperatorBase::" is changed to "this->template ".

For example,

  // This no longer works
  OperatorBase::GetSingleArgument<>()
  // Should change to:
  this->template GetSingleArgument<>()

https://fb.workplace.com/groups/101100140348621/permalink/576804082778222/

Follow up of D13574832.

Sample Diff:
D9319742, D10045844.

Reviewed By: jspark1105

Differential Revision: D13613574

fbshipit-source-id: 2cb4094557b4af78d41e289816cad3e1194fb82c
2019-01-11 19:32:58 -08:00
55b0e2a1eb rocm build (#15981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15981

caffe2/operators/unique_ops.cu translated to caffe2/operators/hip/unique_ops.hip breaks rocm build

Reviewed By: BIT-silence

Differential Revision: D13646129

fbshipit-source-id: 900a14e14216686ec4560b30df2eabbd7ec2ff91
2019-01-11 18:39:52 -08:00
6f08e2a588 Updating submodules
Reviewed By: zpao

fbshipit-source-id: 3bbf550cb0bfe71c05b73b8bc4ce97285b50608b
2019-01-11 18:00:01 -08:00
bff0f88cc8 Tensor construction codemod(ResizeLike) - 2/3 (#15940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15940

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13629047

fbshipit-source-id: 5f0641a9aaab9045fa63c32c6a07a4cab3340cc3
2019-01-11 17:41:48 -08:00
162ad94590 Fixed typo in batchnorm docstrings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15975

Differential Revision: D13642271

Pulled By: soumith

fbshipit-source-id: 60ffa392bf1f916f2b93c943bb44a642a9815c42
2019-01-11 17:28:37 -08:00
fd0ed2e298 Tensor reinitialization codemod - 4/5 (#15967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15967

Codemod generated with clangr shard mode, 25 files per diff,
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then
call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13586735

fbshipit-source-id: eae2d79e1107a2e813ce3809e690af4706aaa9ca
2019-01-11 16:41:19 -08:00
94acddb57f Fix the lint (#15973)
Summary:
Fix the lint error introduced in https://github.com/pytorch/pytorch/pull/15965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15973

Differential Revision: D13640856

Pulled By: houseroad

fbshipit-source-id: 3f14d9898dcfb0fc469468f63fa1461c88b66b2e
2019-01-11 15:59:59 -08:00
eb15587c99 Tensor reinitialization codemod - 2/5 (#15947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15947

Codemod generated with clangr shard mode, 25 files per diff,
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then
call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13586732

fbshipit-source-id: 5295ab27ca0155f96a4fccf9c0ba8a609101ba24
2019-01-11 15:05:01 -08:00
1235aa4fca Expose dim() on type and use it in ONNX symbolics (#15933)
Summary:
While integrating fork/join into production translation, we found that trying to export `transpose()` where the input is of `TensorType` (rather than `CompleteTensorType`) failed. This is not ideal, since `TensorType` still contains the number of dimensions of the tensor, and that's all the `transpose` symbolic needs.

This PR introduces a pybind binding for `dim()` on `TensorType` (and `CompleteTensorType` by inheritance). We now use this in places where it logically makes sense in the symbolics: those symbolics which only require knowledge of the number of dimensions rather than concrete sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15933

Differential Revision: D13639657

Pulled By: jamesr66a

fbshipit-source-id: 6e50e407e93060085fd00a686a928764d0ec888d
2019-01-11 14:54:19 -08:00
253b680928 Tensor construction codemod(ResizeLike) - 3/3 (#15943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15943

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13629082

fbshipit-source-id: d3863615fd612f73bb73ac67159fd0f0d237fe5c
2019-01-11 14:34:31 -08:00
d042914221 FC shape inference should use int64_t (#15961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15961

as title

Reviewed By: yinghai

Differential Revision: D13634427

fbshipit-source-id: ec7d168b6272f0dac8a693401cfd0bea368f929a
2019-01-11 14:28:39 -08:00
d33159a426 Undo norm optimizations and add more documentation for parallel.h (#15885)
Summary:
See https://github.com/pytorch/pytorch/issues/15602
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15885

Differential Revision: D13614841

Pulled By: cpuhrsch

fbshipit-source-id: 5d3e45f499d36ac287dbbc2e45798aa51eb5bfdf
2019-01-11 13:32:35 -08:00
926e718d5f Add/fallback some operators for mkl-dnn (#11696)
Summary:
Implemented the LeakyRelu operator for MKL-DNN; the speed-up of a single operation is up to 10X on BDW.
Implemented the reshape operator for MKL-DNN; it resolves an occasional crash seen with the fallback reshape operator.
Implemented the CreateBlobsQueue and SafeEnqueueBlobs operators; this resolves a crash seen with the fallback operators.
Fell back to CPU for the CreateBlobsQueueDBOp, TensorProtosDBInput, and CloseBlobsQueue operators.
Implemented the adam operator for MKL-DNN; the speed-up of a single operator is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11696

Reviewed By: yinghai

Differential Revision: D10100438

Pulled By: wesolwsk

fbshipit-source-id: 0b6e06897cc11e0a8e349d80a870b1e72e47f10d
2019-01-11 12:53:06 -08:00
96ea2594d8 Don't call cudaStreamDestroy at destruction time (#15692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15692

It was leading to occasional crashes with dynamically linked CUDA because the runtime was already destroyed.

Also, unique_ptr<T[]> is more suitable than deque<T> for the purpose.

Reviewed By: Yangqing

Differential Revision: D13571988

fbshipit-source-id: 37eb26dfbe361c49160367b53f87bd037c6c0e46
2019-01-11 12:36:41 -08:00
726341fea7 Tensor construction codemod(ResizeLike) - 1/3 (#15944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15944

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13628999

fbshipit-source-id: e17c44cec6746674dfd5c2a89c28c4ac0a3da450
2019-01-11 12:28:12 -08:00
bcc88dfb4e Move nightly binary builds to 05:05 UTC (#15966)
Summary:
This corresponds to 00:05 EST
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15966

Differential Revision: D13639027

Pulled By: pjh5

fbshipit-source-id: 6685a7af74329b2730e519afd10e350ef2258f32
2019-01-11 11:46:21 -08:00
e07cca1312 Add backend checks for batch norm (#15955)
Summary:
Fixes #15826

Changelog:
- Add backend checks in `batch_norm_cpu` and `batch_norm_cuda`
- Modify check in `checkBackend` to pass on undefined tensors.

Differential Revision: D13636410

Pulled By: soumith

fbshipit-source-id: 3b1cfe5ca8b7c0346569077163503065e75c2659
2019-01-11 11:28:45 -08:00
c9d7ead0c4 Add scalar_type_to_pytorch_type dict in ONNX symbolic
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15965

Differential Revision: D13637521

Pulled By: zrphercule

fbshipit-source-id: 922cadc56f6380f67c14444cff4aa354a87150af
2019-01-11 10:55:43 -08:00
3f6b212e80 Register CPU/CUDA fuser dynamically (#15887)
Summary:
This avoids a bunch of conditional compilation logic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15887

Reviewed By: eellison

Differential Revision: D13613239

Pulled By: zdevito

fbshipit-source-id: a18fc69676b3ef19b4469ab58d8714d1f6efccbb
2019-01-11 10:50:35 -08:00
d580d3583b Simplify cat fusion (#15633)
Summary:
This makes the definition of a "fusable node" much simpler,
as we don't need to keep considering whether something has to be an
"exit node" at every step. The fuser now tries to maximize the
pointwise fusions first, and proceeds to prepending chunks and appending
concats only once a fix point is reached.

This patch makes the fuser much simpler to reason about, and significantly
easier to extend with features like SumToSize fusion, which improves the
performance of derivative graphs.

cc zou3519 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15633

Differential Revision: D13575306

Pulled By: zou3519

fbshipit-source-id: 0c55ea61d65d1f1ed3d75a8e1e83bc85a83f3aff
2019-01-11 10:33:42 -08:00
3d0d16d31c Add bindings for .cpu() & .cuda() to script (#15904)
Summary:
Adding bindings for .cpu() and .cuda() to script.

It's worth noting that if the device remains unchanged, then the returned tensor aliases the input, but if it does change then they do not alias each other.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15904

Differential Revision: D13632879

Pulled By: eellison

fbshipit-source-id: 024a04f267909674aa1e510562efd9cb081f407c
2019-01-11 10:04:08 -08:00
03a570cad9 comment out large test cases for tril(u)_indices (#15959)
Summary:
4GB is still too large and leads to CUDA OOM failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15959

Differential Revision: D13635146

Pulled By: mrshenli

fbshipit-source-id: 3dc34a03d6ed65c458839d8fa37cd05bf3bc8106
2019-01-11 09:25:03 -08:00
7841fe4f27 Automatic update of fbcode/onnx to 7abd834091f1024c11749dcfd25126802db9fdd5 (#15942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15942

Previous import was 8384c788939bc65463f9754b6a7a00b212b18ba1

Included changes:
- **[7abd834](https://github.com/onnx/onnx/commit/7abd834)**: Clarify some aspects of the Loop spec. (#1587) <Scott McKay>
- **[5a5b15f](https://github.com/onnx/onnx/commit/5a5b15f)**: Support rtol and atol at the model granularity (#1723) <Lu Fang>
- **[ba76e45](https://github.com/onnx/onnx/commit/ba76e45)**: print some information (#1724) <Lu Fang>
- **[797390d](https://github.com/onnx/onnx/commit/797390d)**: Update README.md (#1722) <Prasanth Pulavarthi>
- **[40cdb5f](https://github.com/onnx/onnx/commit/40cdb5f)**: repaire convtranspose shape inference (#1660) <peter yang>
- **[68fdb3f](https://github.com/onnx/onnx/commit/68fdb3f)**: [Minor] Fix Windows line ending in test coverage generating script (#1717) <Raymond Yang>
- **[00101bf](https://github.com/onnx/onnx/commit/00101bf)**: Remove ConstantLike op. Updates to ConstantOfShape op. (#1716) <Spandan Tiwari>
- **[c59e90a](https://github.com/onnx/onnx/commit/c59e90a)**: add a shape inference test for group conv (#1719) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D13629499

fbshipit-source-id: 4b3e4cb29bdb84c3777a8fb26263548efb20f317
2019-01-11 08:28:01 -08:00
70dd44f6a8 Match NumPy by considering NaNs to be larger than any number when sorting (#15886)
Summary:
Fixes #15764
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15886

Differential Revision: D13612971

Pulled By: umanwizard

fbshipit-source-id: 91f552a25d1fd108f2f0b10e09a0ce0364f8c21e
2019-01-11 08:14:11 -08:00
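After this change torch matches NumPy, where NaNs sort to the end, as if larger than every finite value:

```python
import numpy as np

a = np.array([3.0, float("nan"), 1.0, 2.0])
s = np.sort(a)  # NaN lands last, i.e. "larger" than any number
print(s)        # [ 1.  2.  3. nan]
```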
b7cdeb3fc3 Port empty_strided to ATen. (#15948)
Summary:
Turns out this has basically been implemented already in Resize.h / Resize.cuh.
Also added some testing, basically just to check that empty_strided behaves equivalently to as_strided.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15948

Differential Revision: D13631098

Pulled By: gchanan

fbshipit-source-id: eb0e04eead45e4cff393ebde340f9d265779e185
2019-01-11 07:58:05 -08:00
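The size/stride relationship that `empty_strided` exposes can be illustrated with NumPy's `as_strided`, which builds a view over an existing buffer from explicit byte strides:

```python
import numpy as np

a = np.arange(6)
# View a's buffer as 2x3: step 3 elements between rows, 1 between columns.
b = np.lib.stride_tricks.as_strided(
    a, shape=(2, 3), strides=(3 * a.itemsize, a.itemsize))
print(b)  # same layout as a.reshape(2, 3)
```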
14dcdc4c35 Move cudaDeviceProp to ATen (#14834)
Summary:
This PR moves `deviceProperties` from `THCState` struct to `CUDAContext` in ATen and hence, takes one more step towards removing `THCState`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14834

Differential Revision: D13633956

Pulled By: soumith

fbshipit-source-id: 51820ac224fc566f17aa92570fd378cff4248596
2019-01-11 07:09:32 -08:00
da753b7ccf Trivial typo fixings in nn.functional dropout* docstrings (#15951)
Summary:
Defualt -> Default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15951

Differential Revision: D13633875

Pulled By: soumith

fbshipit-source-id: 0da823ef235418396e9322089f6610b592e6990f
2019-01-10 22:42:52 -08:00
86af14b0c7 Resolves ptxas warnings when compiling for CUDA_ARCH 750 and a memoryType deprecation warning (#15461)
Summary:
When compiling for `TORCH_CUDA_ARCH_LIST=7.5` we were getting ptxas warnings (https://github.com/pytorch/pytorch/issues/14310). This was because we had some hardcoded values when using launch_bounds in kernels. The maximum number of threads per multiprocessor is 1024 for the Turing architecture (7.5) but 2048 for previous architectures. The hardcoded launch_bounds in the kernel were requesting 2048 threads when compiling for Turing and hence were generating the warning.

This PR adds a macro that checks the bounds on the supplied launch_bounds value. The max number of threads per block across all architectures is 1024. If a user supplies more than 1024, I just clamp it down to 512. Depending on this value, I set the minimum number of blocks per SM. This PR should resolve https://github.com/pytorch/pytorch/issues/14310. The gradient computation reported as wrong in that issue is probably due to a faulty card.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15461

Differential Revision: D13633952

Pulled By: soumith

fbshipit-source-id: 795aa151109f343ab5433bf3cb070cb6ec896fff
2019-01-10 21:44:39 -08:00
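The clamping rule described in the commit above can be restated as a small sketch (hypothetical Python pseudocode for the rule; the actual change is a C++ macro):

```python
MAX_THREADS_PER_BLOCK = 1024  # limit across all CUDA architectures

def clamp_launch_threads(requested):
    # Per the description: requests above the architecture-wide limit
    # of 1024 threads per block are clamped down to 512; valid requests
    # pass through unchanged.
    return 512 if requested > MAX_THREADS_PER_BLOCK else requested
```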
07ea3e035e Fix fallback issues to handle inplace case (#15726)
Summary:
Fix fallback issues to handle inplace case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15726

Differential Revision: D13591243

Pulled By: yinghai

fbshipit-source-id: 6897f1daacb36beabcdfc22c39242bbdfdd0e534
2019-01-10 19:47:09 -08:00
0934e8de58 Optimize CPU version performance of the nonzero function. (#15925)
Summary:
Same as #15190 but compatible with the MSVC compiler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15925

Differential Revision: D13623473

Pulled By: VitalyFedyunin

fbshipit-source-id: d0db9dbc1a0d8fc9bda08348cb1d3763ae9f8679
2019-01-10 17:50:38 -08:00
890568a018 Tensor reinitialization codemod - 5/5 (#15884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15884

Codemod generated with clangr shard mode, 25 files per diff,
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then
call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: hyuen

Differential Revision: D13586737

fbshipit-source-id: dc8e49e9f29505b8898bb19f84c1a983f2d811ab
2019-01-10 16:32:26 -08:00
e46e572b30 Add backward pass notes for eig() and symeig()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15929

Differential Revision: D13626158

Pulled By: soumith

fbshipit-source-id: ab869560926036053c39d20b217ccef8767e7d3f
2019-01-10 16:27:48 -08:00
da7468853a caffe2::Tensor::is_same() (#15407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15407

Don't ask the tensor for its intrusive pointer if we just want to check if two tensors are the same.
This mirrors ATen APIs.

Reviewed By: dzhulgakov

Differential Revision: D13520389

fbshipit-source-id: 681317f36f480ab60e532bb08a073f98f39770fd
2019-01-10 16:22:25 -08:00
b9e1028cff Clean up Half (#15317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15317

- Merge bitcasts.h and Half.h
- Remove 'static' keyword

Reviewed By: dzhulgakov

Differential Revision: D13498492

fbshipit-source-id: 46d47143e7d3a9d3f4aa7d92379dbba015c97435
2019-01-10 16:22:23 -08:00
d408324350 Move files to/from c10/core and c10/util (#15316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316

This starts cleaning up the files in c10 according to the module structure we decided on.

Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h

Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov

Differential Revision: D13498493

fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63
2019-01-10 16:22:22 -08:00
6b64052e20 Remove Context from c10 operator schemas (#15312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15312

Context will soon be entirely obsolete. Remove it from the operator schema interface.

Reviewed By: dzhulgakov

Differential Revision: D13495323

fbshipit-source-id: caa0f8f092cd6284e510c3e1e3374fe2f8338364
2019-01-10 16:22:20 -08:00
8136c39b5e Enable calling caffe2 LayerNorm from PyTorch and JIT (#15243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15243

Register it as a custom JIT op.

Reviewed By: dzhulgakov

Differential Revision: D13473791

fbshipit-source-id: 0f7e72e3efc85a75060a7597fadaf0a8bd289651
2019-01-10 16:22:18 -08:00
913785445e fix rocm build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15945

Differential Revision: D13630505

Pulled By: zdevito

fbshipit-source-id: a4d2ae1370ab475fc1711027c0c9d2a9192be195
2019-01-10 16:16:15 -08:00
27f6a29fd0 Remove USE_CUDA and USE_ROCM in engine.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15893

Differential Revision: D13627319

Pulled By: zdevito

fbshipit-source-id: 7c72c1c6cc242143fb66383423c668c9b9810884
2019-01-10 14:45:11 -08:00
c5012d8641 Extend note about contributing to the C++ frontend (#15902)
Summary:
soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15902

Differential Revision: D13628525

Pulled By: goldsborough

fbshipit-source-id: 70cf36d1bacd9d689d4fa4f2290886fd3765e89b
2019-01-10 14:22:00 -08:00
3ec3351306 Fix different env variables in schedules runs pt 2 (#15934)
Summary:
Unfortunately I do not know how to test this without merging it first
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15934

Reviewed By: orionr

Differential Revision: D13627472

Pulled By: pjh5

fbshipit-source-id: 35eced1483bbf3c0c3f6f62fb7bbbf2f200e50e6
2019-01-10 14:09:12 -08:00
4b427780aa Change PoolOp Functors design to support CuDNN CUDA fallback (#15903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15903

Change PoolOp Functors design to support CuDNN CUDA fallback

Reviewed By: houseroad

Differential Revision: D13617085

fbshipit-source-id: 8a539d77f35bc47afe5dc8e32aaad52e45cb691c
2019-01-10 14:00:22 -08:00
b1fa19961e Fix bug in torch::load and unpack torch::optim::detail namespace (#15926)
Summary:
`torch::load` wasn't clearing optimizer buffers before adding new entries during deserialization, so successive calls to `torch::load` with the same optimizer would just append to the buffer container. Also moved the `serialize()` function from `torch::optim::detail` into `torch::optim` so users can use it for custom optimizers.

Fixes #15792

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15926

Differential Revision: D13623615

Pulled By: goldsborough

fbshipit-source-id: e193091f25f56a95f2a9648af312cb7caa45f300
2019-01-10 13:55:50 -08:00
9173cd5a4d fix aliasing on unwrap optional (#15748)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/15604
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15748

Differential Revision: D13583632

Pulled By: eellison

fbshipit-source-id: 9655ee010494179e17e34f3047363477dad15fb1
2019-01-10 12:52:53 -08:00
d35295c603 JIT Batch Norm fusion (#15897)
Summary:
Resubmit of #15146, which has been accidentally reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15897

Differential Revision: D13616093

Pulled By: zou3519

fbshipit-source-id: 0c3a3bec8f9fed57274da9f6c7cf40cbc05cf91a
2019-01-10 12:38:47 -08:00
7f268c6262 Fix different env variables in schedules runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15927

Reviewed By: orionr

Differential Revision: D13624127

Pulled By: pjh5

fbshipit-source-id: e8b14f0401b0c278a5d17af6d7979800917e3ae6
2019-01-10 12:33:24 -08:00
4edc8273eb Allow for registration after GlobalInit (#15876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15876

Build changes made it so some .so libraries are now registered after GlobalInit is called. Although this shouldn't be common, it also shouldn't be explicitly excluded. These changes allow for late Caffe2 registration, but also warn in that case.

Reviewed By: kuttas

Differential Revision: D13608186

fbshipit-source-id: 0ca7bcd32516d374077db0c2548cf8c28ccdd5f6
2019-01-10 09:35:33 -08:00
9b5ec2a076 Fix TestDataLoader.test_proper_exit (#15665)
Summary:
Currently, in `test_proper_exit`,
1. we do not kill the correct input `pid` in the `kill_pid` function
fe15d6a2c2/test/test_dataloader.py (L325-L329)
2. the Windows command that detects process status doesn't actually work
fe15d6a2c2/test/test_dataloader.py (L641-L646)
3. `worker_error` and `worker_kill` cases (sometimes?) are not tested because the workers may exit naturally due to the pre-fetching mechanism and too small a `dataset size / batch size` ratio.

In this PR, I, in separate commits:
1. Install `psutil` (a python package specifically built for process monitoring) on some CI builds. (Linux builds installation are done in https://github.com/pietern/pytorch-dockerfiles/pull/29 https://github.com/pietern/pytorch-dockerfiles/pull/30  https://github.com/pytorch/ossci-job-dsl/pull/36 and https://github.com/pytorch/pytorch/pull/15795).
2. Rewrite `test_proper_exit` with `psutil` so we

    1. do not rely on the hacky `is_process_alive` fe15d6a2c2/test/test_dataloader.py (L640-L653)
    2. increase the number of tasks per worker so `worker_error` and `worker_kill` properly trigger
    3. test error message content to ensure that the loader exits with the correct message corresponding to each exiting scenario.

3. Fix Windows data loader not having any mechanism to detect worker failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15665

Differential Revision: D13615527

Pulled By: soumith

fbshipit-source-id: cfb2f67837d2d87928a53f00b4d20f09754b7949
2019-01-10 08:47:27 -08:00
0ed3f766e9 Unify flags and environmental variable when building LibTorch/PyTorch (#15868)
Summary:
Fixes #15858.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15868

Differential Revision: D13622354

Pulled By: soumith

fbshipit-source-id: bb8c49520ebf926c6194d42db75accba867018c7
2019-01-10 06:47:14 -08:00
3d68f35639 Adding binary builds to circleci
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15577

Reviewed By: orionr

Differential Revision: D13617359

Pulled By: pjh5

fbshipit-source-id: 2b2a1b8735f2af6973a2352bee78912794402ae1
2019-01-10 00:06:09 -08:00
2fa9264ba1 Fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15910

Differential Revision: D13620684

Pulled By: houseroad

fbshipit-source-id: af3b1e2fed55ecd3417f66e549fa921bf4fd758e
2019-01-09 23:20:32 -08:00
cdaeb0db54 Make SGD match python (#15840)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15530
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15840

Differential Revision: D13608503

Pulled By: goldsborough

fbshipit-source-id: aad17c110d64cbe2c126bccd36d228e4108ffa9a
2019-01-09 22:21:14 -08:00
628bf5e3c9 test_jit.py: Speedup EndToEnd tests by reducing workload size. (#15906)
Summary:
Currently these tests take most of the time in a test_jit.py run; with the
proposed changes the testing time is reduced by ~75%:

```
TestEndToEndHybridFrontendModels.test_neural_style: 203.360s -> 10.650s
TestEndToEndHybridFrontendModels.test_snli: 422.315s -> 9.152s
TestEndToEndHybridFrontendModels.test_super_resolution: 73.362s -> 19.185s

time python test/test_jit.py (real): 13m50.828s -> 3m11.768s
time python test/test_jit.py (user): 85m59.745s -> 13m18.135s
time python test/test_jit.py (sys): 144m9.028s -> 25m58.019s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15906

Differential Revision: D13619659

Pulled By: ZolotukhinM

fbshipit-source-id: 6c22d8740f8ddb865c3a0667af32653723383816
2019-01-09 21:14:35 -08:00
23e28efed4 Porting legacy reflection_pad2d to ATen
Summary:
Other changes:
1. Avoided using `THCDeviceTensor` by re-calculating the mapping from cuda (blockIdx, threadIdx) to input/output tensor index.
2. Changed CamelCase naming to underscore naming.

Differential Revision: D13546803

fbshipit-source-id: 1df54f13e64934da3d803d9b6586bd5208d42d6d
2019-01-09 20:55:27 -08:00
5f1dd9e743 Fix log_prob for Gumbel distribution (#15878)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15681

Changelog:
- Add hard-coded implementation of log_prob
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15878

Differential Revision: D13613716

Pulled By: soumith

fbshipit-source-id: 2ba74e52748b6213098b167940dcc068f0c056f4
2019-01-09 20:09:34 -08:00
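The hard-coded `log_prob` mentioned in the changelog above follows from the closed-form Gumbel (max) density. A minimal stdlib-only sketch of that formula (an illustration, not the actual `torch.distributions` code):

```python
import math

def gumbel_log_prob(x, loc=0.0, scale=1.0):
    # log p(x) = -(z + exp(-z)) - log(scale), with z = (x - loc) / scale,
    # i.e. the log of the Gumbel density (1/scale) * exp(-(z + exp(-z))).
    z = (x - loc) / scale
    return -(z + math.exp(-z)) - math.log(scale)
```

At `x = loc` with `scale = 1` this gives `-(0 + 1) - 0 = -1`, the log density at the mode of the standard Gumbel.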
4caca2f062 Tensor method rename sizes().size() -> dim()
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: smessmer

Differential Revision: D13568637

fbshipit-source-id: 4e1b6658355d4073097eb666ba73596e0261bef1
2019-01-09 19:53:56 -08:00
b4c3268b23 Batched upper triangular, lower triangular (#15257)
Summary:
Changelog:

- Implements `triu` and `tril` for batches of 2D tensors.
- Remove TH/THC binding for `tril`
- Fix CUDA implementation
- Update docstrings for tril and triu.
- Remove mask-based `triu` and `tril` in cholesky forward and backward.
- Remove batched tril in torch.distributions.utils
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15257

Differential Revision: D13613888

Pulled By: mrshenli

fbshipit-source-id: 0949a05b9b8e974c1acfaf02a6284848ec5cc1c4
2019-01-09 19:46:39 -08:00
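The batched semantics the commit above implements can be sketched in pure Python on nested lists (an illustration of the contract, not the ATen/CUDA implementation):

```python
def tril(matrix, diagonal=0):
    # Keep entries on or below the given diagonal (column - row <= diagonal),
    # zero out the rest -- the per-matrix contract of torch.tril.
    return [[v if j - i <= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

def batched_tril(batch, diagonal=0):
    # Batched variant: apply tril independently to each 2D matrix.
    return [tril(m, diagonal) for m in batch]
```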
5af9aaa5bb Minor bug fix in dnnlowp (#15841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15841

Fix the bugs in dnnlowp to support int8/int16 quantization for sparsenn.

Reviewed By: jspark1105

Differential Revision: D13600878

fbshipit-source-id: 27f06d7c54a663208320c8f211714220a9b49540
2019-01-09 17:18:30 -08:00
159c2f3918 test_jit.py: Replace direct exec invocation with a wrapper. (#15882)
Summary:
Python 2 doesn't allow invoking `exec` from a nested function:

  File "test/test_jit.py", line 4653
     exec(code, globals(), scope)
  SyntaxError: unqualified exec is not allowed in function 'test' it is a nested function

This patch wraps exec with a separate function, making it work for both python2
and python3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15882

Differential Revision: D13614235

Pulled By: ZolotukhinM

fbshipit-source-id: 9a074308c2379f089402e0bf5a996cc649d6dbca
2019-01-09 17:01:20 -08:00
b28738ccb5 Revert D13468570: [pytorch][PR] Optimize CPU version performance of the nonzero function.
Differential Revision:
D13468570

Original commit changeset: e55ce54d6062

fbshipit-source-id: 4c043564b0a69b5af11559e5dc94790e7064841f
2019-01-09 15:41:36 -08:00
71c6e24373 Fix several ResourceWarning: unclosed file (#15746)
Summary:
Hello,

This is a patch to fix `ResourceWarning: unclosed file`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15746

Differential Revision: D13587286

Pulled By: soumith

fbshipit-source-id: 08ac34c5b51d9334867f65a2927bff11511553f3
2019-01-09 15:36:53 -08:00
a1180d8e86 Fix BuildIndexOp (#15580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15580

adding the UNDEFINED datatype.

Reviewed By: itomatik

Differential Revision: D13556099

fbshipit-source-id: b730f7fca8faefb8a013c265296eee26bcedaff0
2019-01-09 15:12:50 -08:00
7b9f794580 Wrap C10 CUDAStream instead of cudaStream_t in THCPStream
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15833

Differential Revision: D13608337

Pulled By: mrshenli

fbshipit-source-id: 4c66ef89fad0dc14a11ddb69da92907797cd2828
2019-01-09 15:12:48 -08:00
0c32e1b43e use C10_MOBILE/ANDROID/IOS (#15363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15363

Didn't define C10_MOBILE in the numa file move diff: D13380559
move CAFFE2_MOBILE/ANDROID/IOS to c10

```
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_MOBILE" "C10_MOBILE"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_ANDROID" "C10_ANDROID"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_IOS" "C10_IOS"

```

i-am-not-moving-c2-to-c10

Reviewed By: marcinkwiatkowski

Differential Revision: D13490020

fbshipit-source-id: c4f01cacbefc0f16d5de94155c26c92fd5d780e4
2019-01-09 15:08:20 -08:00
5838b59c5d Optimize CPU version performance of the nonzero function. (#15190)
Summary:
Optimized CPU version of the nonzero function. Now 2x faster on average than numpy.

Can be further optimized for 1D tensors and boolean tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15190

Differential Revision: D13468570

Pulled By: VitalyFedyunin

fbshipit-source-id: e55ce54d60626a42d9a10a02e407856458b8055e
2019-01-09 13:37:38 -08:00
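For reference, the contract being optimized above can be sketched in pure Python for a 2D input (the commit changes the optimized C++ version; this is only an illustration of the semantics):

```python
def nonzero_2d(matrix):
    # Return the (row, col) index of every nonzero entry, in row-major
    # order -- the same contract as torch.nonzero on a 2D tensor.
    return [(i, j) for i, row in enumerate(matrix)
                   for j, v in enumerate(row) if v != 0]
```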
0571eaebab Remove TH binding of newWithStorage as it is not used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15838

Differential Revision: D13601517

Pulled By: gchanan

fbshipit-source-id: 71ec107de2c880e7e0fd2ad6b4ea3d112dbb9d86
2019-01-09 13:10:33 -08:00
692caa7211 Revert D13598894: [pytorch][PR] [Caffe2] [ROCm] Use correct workspace alloc call in MIOpen conv operator
Differential Revision:
D13598894

Original commit changeset: 44886161abdf

fbshipit-source-id: 6c6057136f1ea741fcd1734695356709aeb4bf12
2019-01-09 10:03:50 -08:00
14b40c0633 Revert D13548303: [pytorch][PR] Add support for batch_norm fusion to the JIT
Differential Revision:
D13548303

Original commit changeset: a2e2e5abc383

fbshipit-source-id: 5b70cdbcbd1cac06eeefb2a939773358c061183c
2019-01-09 08:53:57 -08:00
fe15d6a2c2 Fix macos build (#15873)
Summary:
macos builds are broken now with the following error:
```
/usr/local/Homebrew/Library/Homebrew/config.rb:39:in `initialize': no implicit conversion of nil into String (TypeError)
	from /usr/local/Homebrew/Library/Homebrew/config.rb:39:in `new'
	from /usr/local/Homebrew/Library/Homebrew/config.rb:39:in `<top (required)>'
	from /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.7/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
	from /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.7/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
	from /usr/local/Homebrew/Library/Homebrew/global.rb:25:in `<top (required)>'
	from /usr/local/Homebrew/Library/Homebrew/brew.rb:13:in `require_relative'
	from /usr/local/Homebrew/Library/Homebrew/brew.rb:13:in `<main>'
Exited with code 1
```

No recent commits look suspicious, and I can even reproduce locally on my macbook, so it might be related to some new `brew` updates. Empirically, calling `brew update` first seems to fix this.

Example error build: https://circleci.com/gh/pytorch/pytorch/534392?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15873

Differential Revision: D13608019

Pulled By: soumith

fbshipit-source-id: 1499cb5246929e275a11ca6fccef6ef32918e45e
2019-01-09 07:50:36 -08:00
f0c2a9a7b6 Add torch.bincount() test case on sliced tensor (#15835)
Summary:
This was causing a problem in #15735 but appears to have been fixed.
Adding this test to prevent regressions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15835

Differential Revision: D13600282

Pulled By: zou3519

fbshipit-source-id: d9939e74d372be71c50122a5f6a615fbd7fa4df6
2019-01-09 07:31:19 -08:00
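The semantics exercised by the test above can be sketched in pure Python (an illustration of the `bincount` contract, not the kernel under test). Note that applying it to a slice of the input is what the regression test covers:

```python
def bincount(values, minlength=0):
    # Count occurrences of each non-negative integer, like torch.bincount:
    # result[i] is the number of times i appears in `values`.
    size = max(max(values) + 1 if values else 0, minlength)
    counts = [0] * size
    for v in values:
        counts[v] += 1
    return counts

data = [0, 1, 1, 3]
full = bincount(data)        # counts over the whole input
sliced = bincount(data[1:])  # counts over a slice, as in the test case
```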
961f829067 deduplicated code in elementwise_op_broadcast_test.py (#15865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15865

factored out code used in the tests for the Add, Mul, and Sub operators
into two new methods: one to generate the test vectors, and one
to run the actual tests given a caffe2 and python operator.

Reviewed By: houseroad

Differential Revision: D13526955

fbshipit-source-id: 8970ba5a1305ca19a54a14b51816d4a19f19d678
2019-01-09 03:07:22 -08:00
c7ec7cdd46 Fixed syntax error in doctest (#15646)
Summary:
I fixed a very small extra parenthesis in a doctest.

I'm also going to use this issue as a place to propose the eventual inclusion of xdoctest (a pip installable library I wrote) in pytorch's test suite. I think there are a lot of problems with Python's built in doctest module, and I've built xdoctest to fix them. I would love for my project to get some exposure and its addition to PyTorch may benefit both projects. Please see the readme for more details on what xdoctest brings to the table over the builtin doctest module: https://github.com/Erotemic/xdoctest

I came across this small syntax error when working on ensuring xdoctest was compatible with pytorch. It isn't 100% there yet, but I'm working on it. My goal is to ensure that xdoctest is 100% compatible with all of torch's doctests out-of-the-box before writing up the PR. I'm also airing the idea out loud before I commit too much time into this (or get my hopes up), so I'm attaching this little blurb to a no-brainer-merge PR to (1) demonstrate a little bit of value (because xdoctest flagged this syntax error) and (2) see how it's received.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15646

Differential Revision: D13606111

Pulled By: soumith

fbshipit-source-id: d4492801a38ee0ae64ea0326a83239cee4d811a4
2019-01-09 01:29:11 -08:00
ac206a95f5 crelu mentioned (#15825)
Summary:
Mentioning crelu near relu in the docs.
Fixes #15730.
cc : ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15825

Differential Revision: D13605782

Pulled By: soumith

fbshipit-source-id: d34932cf82e5407c48548dbdfc1c61b596669a0b
2019-01-08 22:55:49 -08:00
5fe2697655 Initialize tensor with fp32 in Caffe2Backend.prepare() (#15832)
Summary:
Fix https://github.com/pytorch/pytorch/issues/14104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15832

Reviewed By: bddppq

Differential Revision: D13598332

Pulled By: yinghai

fbshipit-source-id: 3302ac47928974f49353c5da8af440e5c1716c22
2019-01-08 22:33:52 -08:00
c93cf89de2 Fix cuda native loss_ctc for varying input length (#15798)
Summary:
Thank you, freesouls, for the example reproducing the issue!

This is strictly fixing the bug in gradients for varying length inputs discussed in the middle-to-bottom of the bug report. I'll have a feature patch regarding inf losses -> NaN grads separately.

Fixes: #14401
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15798

Differential Revision: D13605739

Pulled By: soumith

fbshipit-source-id: 167ff42399c7e4cdfbd88d59bac5d25b57c0363f
2019-01-08 22:28:39 -08:00
cb32418669 Add element-wise multiplication in formulas (#15834)
Summary:
The absence of an explicit element-wise multiplication sign in the formulas can confuse some beginners
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15834

Differential Revision: D13603369

Pulled By: soumith

fbshipit-source-id: 1d5c17c57778ddbb4b201122d826d1d6437204d1
2019-01-08 21:17:25 -08:00
3f6e58b43b Typos fixed in CWrapPlugin.get_type_check (#15859)
Summary:
Typos fixed in CWrapPlugin.get_type_check
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15859

Differential Revision: D13605908

Pulled By: soumith

fbshipit-source-id: a8c970f0ac6d54dfd69b9775fc1a2b4f198b4ed6
2019-01-08 20:55:35 -08:00
1d6e818f2c Move LayerNorm op schema to c10 (#15199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15199

In order to call it from PyTorch, this op schema can't live in caffe2 but must be included from PyTorch.
Moving it to c10. This is not where it should be in the end (that's why there is a large TODO here),
but an intermediate hack to enable this use case and proof-of-concept.

Reviewed By: ezyang

Differential Revision: D13462124

fbshipit-source-id: 1e187b9def8ef049c91e6de947ea4a85758d711b
2019-01-08 20:31:48 -08:00
11708cbd7b Update flat_hash_map (#15367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15367

This updates flat_hash_map and fixes an issue with singletons across library boundaries
(see the PRs linked at the top of the file)

Reviewed By: ezyang

Differential Revision: D13510912

fbshipit-source-id: e90a297a7a2d69ae3fe48e4fcd8a44ad4b81292a
2019-01-08 20:31:46 -08:00
905df3943a Fix C10_API/C10_EXPORT for op schema registration (#15324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15324

This was missing but needs to be here, otherwise we can't register schemas without linker errors.

Reviewed By: ezyang

Differential Revision: D13500679

fbshipit-source-id: ba06351cb8ae09ec456cb93e527d388ace578fbb
2019-01-08 20:31:45 -08:00
d562840910 Use C10Tensor in the dispatcher (#15195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15195

This removes the use of caffe2::Tensor or at::Tensor in the c10 dispatcher and only uses C10::Tensor.
It also changes output tensors to be passed as `const Tensor&` instead of `Tensor*` because we otherwise can't forward them in operator_c10wrapper.h.

Reviewed By: ezyang

Differential Revision: D13461640

fbshipit-source-id: 7f79925a7d60f01660a24bbfda47391af0c70ed3
2019-01-08 20:31:43 -08:00
8ac55a6812 Convert caffe2/aten Tensors to/from c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14820

Reviewed By: dzhulgakov

Differential Revision: D13348044

fbshipit-source-id: 95008e6ead3cfc478696b1c203769241d4cf6ca8
2019-01-08 20:31:42 -08:00
31d7c933af Implement c10::Tensor (#14819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14819

This is a minimal wrapper for a c10::TensorImpl,
maybe destined for greatness later when we move caffe2::Tensor or at::Tensor into c10.

Reviewed By: dzhulgakov

Differential Revision: D13348039

fbshipit-source-id: 874f515358e94f35dc7a4c3e55b35fde59c51ff1
2019-01-08 20:31:40 -08:00
828cb18fa3 Allow ReadyQueue to handle empty tasks (#15791)
Summary:
Allow the comparison function used in ReadyQueue to handle the empty FunctionTasks created by the reentrant autograd.
Fix #11732
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15791

Differential Revision: D13598006

Pulled By: soumith

fbshipit-source-id: 0bfdf28a735fbfe44f0fdbaf8b74a6198e6a1984
2019-01-08 20:06:04 -08:00
8a07cbe5e1 In loop_wrapper, do not copy the passed-in functor (capture it by reference instead). (#15845)
Summary:
The overhead of the copy actually makes an appreciable difference when doing a lot of small reductions (i.e., when the reduced dimension is significantly smaller than the non-reduced dimensions).

```
x=torch.randn((1024,10,1024),dtype=torch.float64)
torch.set_num_threads(1)
%timeit x.std(1)
```

Before: 813.0 ms

After: 708.25 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15845

Differential Revision: D13603246

Pulled By: umanwizard

fbshipit-source-id: 020d224d76fcb8a0b55b75b0f2937e9508891beb
2019-01-08 19:59:39 -08:00
2b22612289 Add NHWC support to Resize Operator (#15553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15553

Add unit test and implementation of NHWC layout for Resize operator.

Also, add pragma parallel loop to old NCHWC layout.

Reviewed By: jspark1105

Differential Revision: D13540762

fbshipit-source-id: eebf252bf0d1efdff180a171d804181045f100a5
2019-01-08 16:44:17 -08:00
8a5ba577c1 Revert "remove use of tmp_install" (#15847)
Summary:
This reverts commit 04bf5285896e52ac118d2f9e9b7f582f695f13e2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15847

Differential Revision: D13603174

Pulled By: anderspapitto

fbshipit-source-id: ae321434d3345ad94fad67bf71fd027cddeb4588
2019-01-08 16:30:19 -08:00
4f51ca490e Correcting source pybind11 library to install into Python
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15836

Reviewed By: anderspapitto

Differential Revision: D13601331

Pulled By: pjh5

fbshipit-source-id: 36785c501774c01f47acb49cdac265b2c95a5040
2019-01-08 15:06:55 -08:00
acc83ad54e implement floordiv with correct integer and division by 0 semantics (#15813)
Summary:
fixes #15768
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15813

Differential Revision: D13594872

Pulled By: zdevito

fbshipit-source-id: c6c78c9e17fb16ec2bdc42402d203592cf35b7db
2019-01-08 13:44:18 -08:00
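The Python semantics the commit above aligns the JIT with: floor division rounds toward negative infinity (unlike C's truncation toward zero), the remainder takes the sign of the divisor, and division by zero raises. A small stdlib-only illustration:

```python
# Python's floor-division semantics:
assert 7 // 2 == 3
assert -7 // 2 == -4   # rounds toward negative infinity, not toward zero
assert -7 % 2 == 1     # remainder matches the divisor's sign

def safe_floordiv(a, b):
    # Division by zero raises rather than producing an arbitrary value.
    if b == 0:
        raise ZeroDivisionError("integer division or modulo by zero")
    return a // b
```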
92a2bfe52d A trivial error message updates on at::Tensor _convolution (#15830)
Summary:
I fixed a grammatical error in this function previously, but I also realized that its content was wrong: a weight tensor of a convolutional layer should be at least 3-dimensional, not 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15830

Differential Revision: D13597968

Pulled By: soumith

fbshipit-source-id: 72a75106e88945c68d6462828b149441cfb5acde
2019-01-08 13:20:00 -08:00
24314e9ceb Enable torch static build on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15769

Reviewed By: yf225, pjh5

Differential Revision: D13597845

Pulled By: orionr

fbshipit-source-id: 99640e22974990ae570a4795ce07274c4447cb01
2019-01-08 13:19:57 -08:00
196eee6ccd Fix sum_to behavior with zero dimensions (#15796)
Summary:
Fixes #15223.

This fixes an autograd bug where backprop either fails or produces
gradients of incorrect sizes when tensors with zero-sized dimensions are
involved.

Previously, we were reducing along dimensions that had size greater than 1
when summing to a size in autograd. This is incorrect because we should also reduce
along dimensions with size 0 to produce a tensor of size 1 in that
dimension that then gets viewed to the correct shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15796

Differential Revision: D13593199

Pulled By: zou3519

fbshipit-source-id: 2e2acac34943a9b7fabadc10c9efd4f66db298fd
2019-01-08 13:19:54 -08:00
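The fixed reduction-dimension selection described above can be sketched as follows (a hypothetical restatement of the logic in pure Python, not the actual autograd code):

```python
def dims_to_reduce(src_shape, target_shape):
    # Right-align the target shape against the source shape.
    pad = len(src_shape) - len(target_shape)
    padded_target = (1,) * pad + tuple(target_shape)
    reduce_dims = list(range(pad))  # extra leading dims are always summed away
    for i in range(pad, len(src_shape)):
        # Fixed behaviour: reduce whenever the target size is 1 and the
        # source size differs -- including source size 0 (the old code
        # only reduced dims with size greater than 1).
        if padded_target[i] == 1 and src_shape[i] != 1:
            reduce_dims.append(i)
    return reduce_dims
```

With the fix, a source dim of size 0 against a target of size 1 is reduced (producing a size-1 dim of the right shape) instead of being skipped.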
734eb31035 Cache workspace size in the BenchmarkCache. (#15742)
Summary:
Cache the workspace size information for MIOpen for a given configuration instead of querying it every time. This reduces overhead significantly, as querying the workspace size forces a full read of the MIOpen performance database, and this database has grown significantly in recent releases. This caching gets us back to ideal performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15742

Differential Revision: D13598932

Pulled By: bddppq

fbshipit-source-id: 4e65d247b71dec828293cf0562aac3fbd4fad83a
2019-01-08 13:10:15 -08:00
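The caching pattern described above, sketched generically in Python (the actual change is in the C++ MIOpen operator; the names here are illustrative):

```python
class WorkspaceSizeCache:
    """Memoize an expensive per-configuration size query."""

    def __init__(self, query_fn):
        self._query = query_fn
        self._sizes = {}

    def get(self, config):
        # Only hit the expensive query once per configuration.
        if config not in self._sizes:
            self._sizes[config] = self._query(config)
        return self._sizes[config]

calls = []
def query_workspace_size(config):
    calls.append(config)  # stands in for the costly database read
    return 4096

cache = WorkspaceSizeCache(query_workspace_size)
first = cache.get("conv3x3_nchw")
second = cache.get("conv3x3_nchw")  # served from the cache
```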
1bc47c0d86 Refactors shape logic out of code generation, fixes possible segfault (#15750)
Summary:
This PR:

- Removes shape logic from the code generator, which was previously relied on to return chunk and concat information
- Copies the logic to detect if a kernel has a rand_like node to the executor, making its pass independent of the code generator
- Fixes a possible segfault where references to a vector still being modified were relied upon

The actual shape logic is unchanged.

The possible segfault is in the handling of the former "flat_inputs" in codegen.cpp. This vector holds pairs, and the second element of these pairs is a reference. In some cases these would be references to items in the vector chunk_desc, which could be added to later, possibly invalidating any references to items in it. I hit a similar segfault in testing when naively making parallel code for "flat_outputs."

I'm submitting this small PR because it's separable, self-contained, has a fix, and I am trying to actively get away from large PRs to encourage more stability and incremental change in the fuser.

ngimel zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15750

Differential Revision: D13597451

Pulled By: zou3519

fbshipit-source-id: 0d48b365779b42849b044ba0286258aacc7b0332
2019-01-08 12:36:59 -08:00
c42def29c8 Use parallel thrust execution policy on ROCm (#15481)
Summary:
The Thrust shipped with ROCm is recent enough to support this API. Minimize divergence between CUDA/ROCm by changing the ifdef guards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15481

Differential Revision: D13598739

Pulled By: bddppq

fbshipit-source-id: 20d0a7e3887a4050eea65033161561af47411de1
2019-01-08 12:20:26 -08:00
cc402d8fa1 Use correct workspace alloc call in MIOpen conv operator (#15712)
Summary:
This PR contains changes for:
1. Using memory alloc from HIPContext while allocating workspace for MIOpen conv and transpose_conv operators rather than direct HIP mem alloc
2. Minor cleanup and removing an unnecessary sync call from MIOpen conv op

Differential Revision: D13598894

Pulled By: bddppq

fbshipit-source-id: 44886161abdf91cd29c7c93b3e23620e1b09c7c9
2019-01-08 11:38:45 -08:00
532a709771 Tensor method rename dims()->sizes() - 2/2
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: smessmer

Differential Revision: D13581787

fbshipit-source-id: b04c6aa87fea3a10b522a71fccc1fcfb76a2c212
2019-01-08 11:34:36 -08:00
ede1f4ad05 Remove caffe2::ShareData (#15418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15418

Previously we are using Resize + ShareData.
Instead, we'll create a function on Tensor that clones itself with same storage.

Suppose we want `t` to `ShareData` with `t0`. Previously:
```
Tensor t(dims, CPU);
t.Resize(t0.sizes());
t.ShareData(t0);
```
Now:
```
Tensor t = t0.Alias();
```

Reviewed By: dzhulgakov

Differential Revision: D13507609

fbshipit-source-id: 6e4275d02f4c3356cbce91127f1b01111dc86b9f
2019-01-08 11:01:56 -08:00
8232bd526f Move isnan to C++ (#15722)
Summary:
Wanted to use `Tensor.isnan` in C++, figured it'd be nice to have, so I made it into a tiny native function.

gchanan ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15722

Differential Revision: D13591315

Pulled By: goldsborough

fbshipit-source-id: a78bd22101fde87a0257f759b9bfcf3b4208f5fa
2019-01-08 10:42:33 -08:00
461dc9a28b use all_weights instead of _parameters in _flat_weights in rnn (#15766)
Summary:
Fixes #15749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15766

Differential Revision: D13592320

Pulled By: soumith

fbshipit-source-id: 6c3805f576c3df5a2da8bef1e4305eda379718df
2019-01-08 09:48:36 -08:00
8f11147d43 Use CUDAGuard when serializing CUDA Tensors (#15807)
Summary:
Fixes #15308. Before this change, `torch.save` and `torch.load` would
initialize the CUDA context on GPU 0 if it hadn't been initialized
already, even if the serialized tensors are only on GPU 1.

This PR fixes that bug by using CUDAGuard in the storage serialization
path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15807

Differential Revision: D13593201

Pulled By: zou3519

fbshipit-source-id: 4addc91ea5a5278d56a03f3d422577ee39e99897
2019-01-08 07:31:50 -08:00
29a9d6af45 Stop leaving garbage files after running test_jit.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15404

Differential Revision: D13548316

Pulled By: zou3519

fbshipit-source-id: fe8731d8add59777781d34d9c3f3314f11467b23
2019-01-08 07:22:55 -08:00
5e1b35bf28 Add support for batch_norm fusion to the JIT (#15146)
Summary:
We don't support reductions yet, but simply decomposing batch_norm
into a kernel that computes the stats, and then fusing everything else
with ReLU and the following pointwise ops, provides nice speedups.

Note that this is only limited to inference mode for now, because we
don't support convolutions and batch norm in AD, so the fuser isn't
applied to those parts.

This commit gives us a 7% end-to-end speedup for ResNet50 with batch size 32. Note that this only applies to inference mode at the moment due to lack of AD support for CNN operations (I'll be adding that soon), and not to the standard `torchvision` models, because they use in-place ops which aren't supported by the fuser (we need a way of proving that de-inplacing them is safe).
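
In the inference case described above, batch norm with fixed running stats is just a per-channel affine transform, which is why everything after the stats can be fused pointwise with ReLU. A minimal sketch (illustrative helper names, plain lists instead of tensors):

```python
import math

def bn_to_affine(gamma, beta, running_mean, running_var, eps=1e-5):
    """Rewrite inference-mode batch norm as y = scale*x + bias per channel,
    which is pointwise and therefore fusable with a following ReLU."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, running_var)]
    bias = [b - m * s for b, m, s in zip(beta, running_mean, scale)]
    return scale, bias

def bn_relu(x, gamma, beta, mean, var, eps=1e-5):
    """x is a list of per-channel value lists; one fused pointwise pass."""
    scale, bias = bn_to_affine(gamma, beta, mean, var, eps)
    return [[max(0.0, s * v + b) for v in channel]
            for channel, s, b in zip(x, scale, bias)]
```

For example, `bn_relu([[1.0, -1.0]], [2.0], [0.0], [0.0], [1.0], eps=0.0)` scales by 2 and clamps the negative entry to zero.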

cc zou3519 zdevito mruberry ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15146

Differential Revision: D13548303

Pulled By: zou3519

fbshipit-source-id: a2e2e5abc383f637fae19bd1b423f20c2cbc056a
2019-01-08 07:00:19 -08:00
c3a0000864 Support communicating with C2 protobuf in Onnxifi flow (#15472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15472

Create a path to pass serialized C2 protobuf instead of ONNX during ONNXIFI flow

Reviewed By: houseroad

Differential Revision: D13536603

fbshipit-source-id: 7d016474f4beedbda480ed2e2c0004af7868aafe
2019-01-07 22:12:29 -08:00
4650d70e93 Add count_include_pad arg for AveragePoolOp on GPU (#15787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15787

Add count_include_pad arg for AveragePoolOp on GPU
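
The semantics of the `count_include_pad` flag can be sketched in plain Python for the 1-D, stride-1 case (an illustration, not the Caffe2 kernel):

```python
def avg_pool1d(x, kernel, pad, count_include_pad):
    """1-D average pooling with zero padding, stride 1.
    Assumes pad < kernel so every window overlaps the input."""
    padded = [0.0] * pad + list(x) + [0.0] * pad
    out = []
    for i in range(len(padded) - kernel + 1):
        window = padded[i:i + kernel]
        # Count how many window elements fall inside the original input.
        valid = min(i + kernel, pad + len(x)) - max(i, pad)
        divisor = kernel if count_include_pad else valid
        out.append(sum(window) / divisor)
    return out
```

With `count_include_pad=True` the zeros in the padding dilute the border averages; with `False` only the valid elements are counted.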

Reviewed By: houseroad

Differential Revision: D13589185

fbshipit-source-id: 235a84cfcd2033ee796c13e338fc3d03e832b5b1
2019-01-07 21:36:26 -08:00
99d2743863 Move Stream.query() implementation down to C++ (#15737)
Summary:
See #15682

Pushing up this small PR to check if I am doing the right thing. If correct, more will follow for other Stream APIs. Questions will be added inline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15737

Differential Revision: D13581400

Pulled By: mrshenli

fbshipit-source-id: 24afed7847b89b62f0692c79a101ec7ff9d9ee4d
2019-01-07 20:58:07 -08:00
55baca57d2 A trivial error in the error message of at::Tensor _convolution fixed (#15772)
Summary:
A trivial grammatical error fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15772

Differential Revision: D13592279

Pulled By: zou3519

fbshipit-source-id: 14f60c61747a3893cd0e4c860f7b4c4c4ba28c28
2019-01-07 20:01:43 -08:00
770b5ac42b clean up D13579188 (#15759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15759

Some flags have too long names. And some other few minor clean ups.

Reviewed By: jianyuh

Differential Revision: D13587353

fbshipit-source-id: f8aee7f167505644f5d8f80fe2eed70201ef1e54
2019-01-07 18:48:25 -08:00
24867a58aa Add support for exporting onnx split (#15092)
Summary:
* With the update of split's output to a dynamic list, the export to ONNX breaks.
 The split IR now becomes two ops: 1. Dynamic[] <= Split(), and 2. out1, out2, out3
 <= Prim::ListUnpack. In this fix, these two consecutive ops get fused when being
 exported to ONNX.
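
The fusion described above can be sketched as a peephole pass over a toy IR of node dicts (illustrative structure, not the real JIT/ONNX IR):

```python
def fuse_split_list_unpack(nodes):
    """Fuse a Split that produces a dynamic list, immediately consumed by a
    ListUnpack, into a single multi-output Split."""
    out = []
    i = 0
    while i < len(nodes):
        cur = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (cur["op"] == "Split" and nxt is not None
                and nxt["op"] == "ListUnpack"
                and nxt["inputs"] == cur["outputs"]):
            # Replace the pair with one Split whose outputs are the unpacked ones.
            out.append({"op": "Split", "inputs": cur["inputs"],
                        "outputs": nxt["outputs"]})
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```

The pass only fires when the ListUnpack consumes exactly the list the Split produced, so unrelated ops pass through unchanged.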
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15092

Reviewed By: dzhulgakov

Differential Revision: D13583832

Pulled By: houseroad

fbshipit-source-id: 3eb18c871e750921ad6d5cc179254bee9bcf4c99
2019-01-07 16:09:24 -08:00
bc328d01e5 simplify conv dnnlowp ops by not allowing fp32 in/out (#15758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15758

DNNLOWP Conv operators became very complex due to many options. This diff simplifies them by not allowing fp32 in/out. This is OK for Conv operators because Conv operators are usually used in deep networks where quantizing and dequantizing using separate operators is not much overhead.

Reviewed By: csummersea

Differential Revision: D13587341

fbshipit-source-id: e88c919dae79d1c5b7d787ea539edf5bcb064afc
2019-01-07 15:14:59 -08:00
49ba2cb796 Enable conv+add fusion, same as conv+sum (#15268)
Summary:
Enable conv+add fusion, same as conv+sum

Caution: only element-wise add is supported on IDEEP without scalar
broadcast. Otherwise, the fusion is illegal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15268

Differential Revision: D13577375

Pulled By: yinghai

fbshipit-source-id: 92c9c4b667c5ca5f7a262a5bffaa8aa68eeff3bd
2019-01-07 14:42:45 -08:00
76feb8c40f Allow List arguments to Python Ops (#15721)
Summary:
Adds `List` to eval environment for type lines and allows `List` to be used on PythonOps (follows the same style as the `Tuple` code), fixes #15661
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15721

Differential Revision: D13578540

Pulled By: driazati

fbshipit-source-id: fce54dc3c0931d8b017b2e3483f0ac53826dda94
2019-01-07 13:51:53 -08:00
668678e753 Bump CircleCI docker version to 278 (#15795)
Summary:
Just changing the version number doesn't seem to work. I also needed to fix a macOS brew parallel conflict.

should this merge together with https://github.com/pytorch/ossci-job-dsl/pull/36 ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15795

Differential Revision: D13591839

Pulled By: yf225

fbshipit-source-id: 6b2a90943e63c8dcc4b6d9159eb54f1b5974c9ac
2019-01-07 12:32:33 -08:00
382807302c Fix C++ Frontend example in frontend.html (#15717)
Summary:
The small end-to-end example in https://pytorch.org/cppdocs/frontend.html is a little outdated and needs fixes.

ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15717

Differential Revision: D13591306

Pulled By: goldsborough

fbshipit-source-id: 3334d68c7f77cf094b66ec2b2f396c4c65bb0d72
2019-01-07 11:39:47 -08:00
321a559359 Fix restructured text issue in tensor_basics.rst (#15701)
Summary:
Fix submitted by huntzhan in https://github.com/pytorch/cppdocs/pull/4. The source is in this repo so the patch has to be applied here.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15701

Differential Revision: D13591302

Pulled By: goldsborough

fbshipit-source-id: 796957696fd560a9c5fb42265d7b2d018abaebe3
2019-01-07 11:35:19 -08:00
2ebeb33697 Fallback to CPU concat op to handle TensorCPU inputs (#15263)
Summary:
Fallback to CPU concat op to handle TensorCPU inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15263

Differential Revision: D13587030

Pulled By: yinghai

fbshipit-source-id: 010a8579d61c3beb8556eb92493a552b2ab0030c
2019-01-07 11:13:23 -08:00
c68eb5ec44 fix conv unit test for groupwise quantization and pre-packing (#15761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15761

As title says.

Reviewed By: csummersea

Differential Revision: D13587727

fbshipit-source-id: f0631b8cbb89d65a1d952bc25b463de23de93bec
2019-01-07 11:08:32 -08:00
95febdfacc Add is_floating_point to docs (#15704)
Summary:
Fixes #15700 .

Changelog:

- Expose torch.*.is_floating_point to docs

Differential Revision: D13580734

Pulled By: zou3519

fbshipit-source-id: 76edb4af666c08237091a2cebf53d9ba5e6c8909
2019-01-07 10:43:22 -08:00
2ff0e3b196 Pool prim::None nodes (#15745)
Summary:
Make the constant pooling pass pool prim::None nodes
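
Constant pooling of this kind can be sketched over a toy IR (dicts standing in for graph nodes; the real pass works on `prim::Constant`/`prim::None` JIT nodes):

```python
def pool_constants(nodes):
    """Merge duplicate constant nodes (including None) into one canonical
    node, rewriting all uses of the duplicates."""
    seen = {}    # constant value -> canonical output name
    remap = {}   # duplicate output name -> canonical output name
    pooled = []
    for node in nodes:
        node["inputs"] = [remap.get(i, i) for i in node["inputs"]]
        if node["op"] == "Constant":
            key = node["value"]
            if key in seen:
                remap[node["output"]] = seen[key]
                continue  # drop the duplicate node entirely
            seen[key] = node["output"]
        pooled.append(node)
    return pooled
```

After the pass, every use of a duplicated constant points at the first occurrence, and the duplicates are gone from the node list.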
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15745

Differential Revision: D13583518

Pulled By: eellison

fbshipit-source-id: 7f8aa70522515805ab0991c6db3d96b5a96cdede
2019-01-07 10:00:51 -08:00
3277723173 Replace some malloc+memset pairs with calloc.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15765

Differential Revision: D13588723

Pulled By: resistor

fbshipit-source-id: 47d35dc608847a5b173cfcf2aaa2a77359e56722
2019-01-06 18:57:17 -08:00
b6a8c45f57 Removes print statements from test_torch.py (#15747)
Summary:
These print statements do not affect the test, and tests (generally) shouldn't print.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15747

Differential Revision: D13587289

Pulled By: soumith

fbshipit-source-id: c758793c9e35faf02bacba6c7c6d072f7c40453f
2019-01-05 09:07:27 -08:00
04f5605ba1 Fix several DeprecationWarning: invalid escape sequence (#15733)
Summary:
Hello,

This is a little patch to fix `DeprecationWarning: invalid escape sequence`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15733

Differential Revision: D13587291

Pulled By: soumith

fbshipit-source-id: ce68db2de92ca7eaa42f78ca5ae6fbc1d4d90e05
2019-01-05 08:53:35 -08:00
2fb2d080d3 caffe2_benchmark msvc build fix (#15619)
Summary:
Fixing error in caffe2_benchmark binary

```
2018-12-29T14:09:59.7867995Z   d:\a\1\s\caffe2_builders\v141\pytorch\binaries\benchmark_helper.h(90): error C2678: binary '|=': no operator found which takes a left-hand operand of type 'std::_Iosb<int>::_Openmode' (or there is no acceptable conversion) (compiling source file D:\a\1\s\caffe2_builders\v141\pytorch\binaries\benchmark_helper.cc) [D:\a\1\s\caffe2_builders\v141\pytorch\build\Release\binaries\caffe2_benchmark.vcxproj]
2018-12-29T14:09:59.7868252Z   d:\a\1\s\caffe2_builders\v141\pytorch\binaries\benchmark_helper.h(92): error C2678: binary '|=': no operator found which takes a left-hand operand of type 'std::_Iosb<int>::_Openmode' (or there is no acceptable conversion) (compiling source file D:\a\1\s\caffe2_builders\v141\pytorch\binaries\benchmark_helper.cc) [D:\a\1\s\caffe2_builders\v141\pytorch\build\Release\binaries\caffe2_benchmark.vcxproj]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15619

Differential Revision: D13580195

Pulled By: soumith

fbshipit-source-id: b0a4479cd5f7555801b1977aeee96b6433293da7
2019-01-05 08:29:31 -08:00
a918f1d9af Adding a hook (wrapper) for non-std stream reader in PyTorchStreamReader (#15551)
Summary:
Implementing a stream is very annoying, since it is closely tied to the underlying storage stream buffer.

So in this PR we add ReadAdapterInterface, which PyTorchStreamReader will use. We implement IStreamAdapter as a wrapper of std::istream and keep the user interface unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15551

Reviewed By: zrphercule

Differential Revision: D13568907

Pulled By: houseroad

fbshipit-source-id: 93708cb801248a6c101f35cb14d1631029365c3c
2019-01-04 22:50:07 -08:00
1488c5dd03 support 0 size in any of the tensor dimensions in mkldnn (#15295)
Summary:
support 0 size in any of the tensor dimensions in mkldnn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15295

Differential Revision: D13573747

Pulled By: yinghai

fbshipit-source-id: 5bf7a0b9e2567e80f44981a7823be5407fc94e53
2019-01-04 22:33:18 -08:00
2d8b332262 Port replication_pad2d and replication_pad3d to ATen (#15538)
Summary:
Port the 2D and 3D replication padding from the legacy TH API implementation
to the ATen implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15538

Differential Revision: D13547567

Pulled By: lhuang04

fbshipit-source-id: decfe100d9edfdcfb62f39ee23f37b6cae0d461f
2019-01-04 17:08:14 -08:00
3d44eeec0a Fix different types in rsub caused bug (#15707)
Summary:
Before this PR, rsub did not convert its two operands to the same dtype, so "1 - x" could be exported to an ONNX model in which the two operands of rsub have different dtypes.
Adding this symbolic patch should fix the bug.
Related test cases are also created.
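
The underlying rule can be sketched with a toy promotion table (illustrative dtype ranking, not the exact torch promotion lattice):

```python
# Rank dtypes and promote both operands to the higher-ranked one, so that
# an exported `scalar - tensor` (rsub) has a single consistent dtype.
_RANK = {"int32": 0, "int64": 1, "float32": 2, "float64": 3}

def promote(a, b):
    return a if _RANK[a] >= _RANK[b] else b

def rsub_dtypes(scalar_dtype, tensor_dtype):
    """Both operands of `scalar - tensor` must share one dtype before export."""
    common = promote(scalar_dtype, tensor_dtype)
    return common, common
```

For `1 - x` with a float32 `x`, the integer scalar gets promoted so both sides export as float32.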
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15707

Differential Revision: D13583042

Pulled By: zrphercule

fbshipit-source-id: 3a2de47a1a8d1ded1a0adfb911adbe6ac729cdef
2019-01-04 16:14:13 -08:00
ae91156e5d Tensor method rename dims()->sizes() - 1/2
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: BIT-silence

Differential Revision: D13581782

fbshipit-source-id: b16b4198e100617769d84aa599bf141117cfbe5b
2019-01-04 16:02:22 -08:00
12e6c1ceeb Automatic update of fbcode/onnx to 8384c788939bc65463f9754b6a7a00b212b18ba1 (#15739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15739

Previous import was 765f5ee823a67a866f4bd28a9860e81f3c811ce8

Included changes:
- **[8384c78](https://github.com/onnx/onnx/commit/8384c78)**: add constantofshape (#1582) <Rui Zhu>
- **[9afc06c](https://github.com/onnx/onnx/commit/9afc06c)**: Set symbol visibility to hidden for non-Windows (#1707) <Paul Jesse Hellemn>
- **[6f8a9f0](https://github.com/onnx/onnx/commit/6f8a9f0)**: Revert "Add NonMaxSupression operator (#1695)" (#1702) <Lu Fang>
- **[8b89544](https://github.com/onnx/onnx/commit/8b89544)**: Add NonMaxSupression operator (#1695) <Hector Li>
- **[0a7cc48](https://github.com/onnx/onnx/commit/0a7cc48)**: Add bfloat16 support. (#1699) <Dmitri Smirnov>
- **[da7c50c](https://github.com/onnx/onnx/commit/da7c50c)**: ONNX does not maintain versions for experimental ops (#1696) <Ke Zhang>
- **[0c8d857](https://github.com/onnx/onnx/commit/0c8d857)**: Correct type of value_info in Graph (#1694) <Maik Riechert>
- **[f612532](https://github.com/onnx/onnx/commit/f612532)**: Fix typos (#1686) <Eundoo Song>

Reviewed By: zrphercule

Differential Revision: D13581674

fbshipit-source-id: 8f8ee86a05a86fe99bf94509148c559ea3df1464
2019-01-04 15:56:55 -08:00
04bf528589 remove use of tmp_install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14553

Differential Revision: D13583335

Pulled By: anderspapitto

fbshipit-source-id: 8711fead9eda877c1037a0bc59f91a3d2e01f3e0
2019-01-04 13:48:12 -08:00
6adbe12c74 Update CI credentials
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15736

Differential Revision: D13583174

Pulled By: yf225

fbshipit-source-id: 742470db10ef9df8f95e27626453b68ca90723e8
2019-01-04 13:36:10 -08:00
43761e01f5 Temporarily disable all XXXlike operator tests in pytorch-onnx test (#15740)
Summary:
We are going to have some breaking changes in ConstantLike and related operators in ONNX, so it is better to disable all tests related to these operators for now.
These operators are not currently supported by Caffe2 and are not included in our most recently released ONNX, so we do not need to worry about breaking internal/external production.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15740

Differential Revision: D13582528

Pulled By: zrphercule

fbshipit-source-id: 92a890c1dc2a833969af69edfea85331bb4d562f
2019-01-04 13:36:09 -08:00
07c4991622 Tensor construction codemod - 2/2 (#15600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15600

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13542455

fbshipit-source-id: 8a3b15b0a1f81565f34e309114e1c3e1f7f65a3c
2019-01-04 13:31:53 -08:00
b1529eeadb Print out operator suggestions for unknown builtin op (#15183)
Summary:
This improves the error message for "unknown builtin op" to suggest similarly named ops.

Currently it prints out all operators whose names are within two edits of the given name.

Related issue: https://github.com/pytorch/pytorch/issues/13409
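
A suggestion pass like this can be sketched with a standard Levenshtein distance (an illustration; the real implementation lives in the JIT's C++ error reporting):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def suggest_ops(name, known_ops, max_edits=2):
    """Suggest known operators within `max_edits` of the unknown name."""
    return sorted(op for op in known_ops if edit_distance(name, op) <= max_edits)
```

For a typo like `add_` against a registry containing `add` and `addmm`, both land within two edits, while unrelated names are filtered out.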
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15183

Differential Revision: D13578509

Pulled By: eellison

fbshipit-source-id: 5c73408eda1f7aa456f5bd28790c34df0c76aeca
2019-01-04 13:04:44 -08:00
fad8480146 Updating submodules
Reviewed By: yns88

fbshipit-source-id: b8be56b57d109dfef5980ea7255e2ab021da099e
2019-01-04 12:28:13 -08:00
9e88547d72 Tensor construction codemod - 1/2 (#15598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15598

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13542429

fbshipit-source-id: db1059c78e85724d9b4fdab70466cf329db68359
2019-01-04 11:53:36 -08:00
ad0ef7ae48 remove dependency to fp32 batch permutation op (#15723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15723

As title says.

Reviewed By: jianyuh

Differential Revision: D13578604

fbshipit-source-id: 0da0ac31ae83c1e0daa9077e878feb4deffed6a3
2019-01-04 07:56:05 -08:00
e313f1a7bf Cudnn Handle Pool 3: At Wit's End (#15668)
Summary:
ezyang Here's a freshly rebased version of https://github.com/pytorch/pytorch/pull/15080 with the if statement that relieved the hangs that occasionally, nondeterministically, occurred on cudnnCreate on a particular windows build ([example w/debug statements](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/19238/console))  in https://github.com/pytorch/pytorch/pull/15280.

I'd like to run the CI over this several times before it's considered mergeable.  Sometimes the windows hang doesn't manifest for 2 or 3 consecutive trials.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15668

Differential Revision: D13579291

Pulled By: soumith

fbshipit-source-id: 3972eb98bad6ece933ca5e67a10fc4bc2ed06068
2019-01-04 06:28:21 -08:00
e798a09f6d Remove TH/THC link for cholesky_solve (#15691)
Summary:
Changelog:
- Remove TH/THC binding
- Port single matrix case to ATen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15691

Differential Revision: D13579317

Pulled By: soumith

fbshipit-source-id: 63a55606c656396e777e8e6828acd2ef88ed1543
2019-01-04 06:24:17 -08:00
b740b92f36 Modify torch.gesv error message (#15654)
Summary:
[doc](https://pytorch.org/docs/stable/torch.html#torch.gesv) uses `B` uppercase so error message should follow to avoid confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15654

Differential Revision: D13571297

Pulled By: soumith

fbshipit-source-id: 0b4e7797eceff92618f808bbfa65d13c1dcc2da0
2019-01-03 21:46:02 -08:00
069d894145 make conv_depthwise_dnnlowp_op_test faster (#15725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15725

As title says.

Reviewed By: jianyuh

Differential Revision: D13579188

fbshipit-source-id: 382072c95929ccf9e189e2338e35b046c4a0650f
2019-01-03 21:46:00 -08:00
2d8f14cd12 clarified language of doc for torch.mul (#15664)
Summary:
see issue #15636

Please note: I built the documents, but the HTML is not updated with the edited content.
I also did not build the fork.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15664

Differential Revision: D13571310

Pulled By: soumith

fbshipit-source-id: d43be0f61705693d778cc12c13e86d6b06130ac7
2019-01-03 21:39:35 -08:00
a923ea7cf0 disallow nbits_in_non_outlier == 0 in acc16 conv; option to fallback to acc32 (#15708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15708

nbits_in_non_outlier == 0 doesn't make sense because it means everything is an outlier and we can just use 32-bit accumulation.
Depending on the architecture, the break-even point between acc16 and acc32 can differ. This adds thresholds for falling back to acc32.

Reviewed By: jianyuh

Differential Revision: D13574832

fbshipit-source-id: b7a37aacbfdc7867e31838dafcdd5f7c2ac282af
2019-01-03 20:31:33 -08:00
bebf1f7463 Torch tensor (#15224)
Summary:
Support torch.tensor in script. Already been accepted, trying to reland
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15224

Differential Revision: D13466616

Pulled By: eellison

fbshipit-source-id: f7850da07b0eb11af98f255fc15bd3cf861f2a40
2019-01-03 17:35:17 -08:00
1e9a6d7192 A quick fix for Stream operation errors on non-current device (#15689)
Summary:
see #15682

This is a quick fix that implements the simpler solution suggested by colesbury. As the benchmark results show, it slows down `Stream.query()` by ~20%. I would be happy to further pursue a more complex solution by implementing this in C++/ATen, but I would still vote to merge this quick fix first just to get rid of the bug sooner.

~Test TBA~ Added

FYI jeffreyksmithjr

Now:

```python
In [1]: def f():
   ...:     d0 = torch.device('cuda:0')
   ...:     d1 = torch.device('cuda:1')
   ...:     with torch.cuda.device(d0):
   ...:         s0 = torch.cuda.current_stream()
   ...:     with torch.cuda.device(d1):
   ...:         s1 = torch.cuda.current_stream()
   ...:     s0.query()
   ...:     s1.query()

In [4]: %timeit f()
38.1 µs ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit f()
37.6 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Before:

```python
In [4]: %timeit f()
28.5 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit f()
35.3 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15689

Differential Revision: D13571697

Pulled By: mrshenli

fbshipit-source-id: 4fe697f91248c6419136d37bb5b7147e612e2f4c
2019-01-03 15:14:58 -08:00
3270e4d4a5 Break up generated tests (#13992)
Summary:
This PR breaks up `TestJitGenerated` into 3 classes. This makes for
easier testing of specific groups (e.g. run all generated functional
tests without having to wait for the autograd tests)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13992

Differential Revision: D13076371

Pulled By: driazati

fbshipit-source-id: 1267af59be7d69feb690f5805fcd43fea58a7159
2019-01-03 14:34:13 -08:00
dcbc4f32db flake8 hook fix (#15693)
Summary:
This PR bypasses checking the user's configuration entirely and always use strict, since the CI considers it a hard failure if you can't pass flake8.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15693

Differential Revision: D13574889

Pulled By: suo

fbshipit-source-id: f5e1c5731cc49b6223b415317033c275bc7d4fec
2019-01-03 13:55:20 -08:00
2403135257 Prevent VS2017 from emitting ambiguous symbol errors (#15697)
Summary:
These `std::forward` calls cause VS2017 to emit:

    error C2872: 'std': ambiguous symbol

This fix prevents the ambiguity by specifying that `::std` is intended.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15697

Differential Revision: D13573483

Pulled By: goldsborough

fbshipit-source-id: 0439de3523a37a18df7af0cff4a1284a53833ddd
2019-01-03 13:45:35 -08:00
d42e90991b trace s_copy_ (#15690)
Summary:
s_copy_ was previously special-cased for out of place tracing.
This adds support for inplace tracing, which fixes tracing of
inception_v3

Fixes #15216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15690

Differential Revision: D13572011

Pulled By: zdevito

fbshipit-source-id: 1d565dec039a4b8c59179254285e61d2517ef9a9
2019-01-03 12:28:14 -08:00
78442f04fc Add mkldnn conv double backward (#15686)
Summary:
Fixes #15353 .

Like cudnn conv implementation, mkldnn also falls back to the default `_convolution_double_backward` as double backward.

This bug wasn't caught by CI before because mkldnn is only used when the input scalar type is float, but our tests all use double by default.

Adding a test for float inputs, but mkldnn seems to have imprecision issues similar to the cudnn implementation, so here I only check that the double backward exists instead of calling `gradgradcheck`. Please correct me if the precision should actually be checked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15686

Differential Revision: D13571682

Pulled By: ailzhang

fbshipit-source-id: f1762439762370f276cfd59e8b8b8a4dee960a4b
2019-01-03 10:50:00 -08:00
947229ebd7 Fix ONNX export of logical ops, including torch.ne, to have correct output datatype (#15677)
Summary:
This is the an updated version of the earlier PR https://github.com/pytorch/pytorch/pull/15185, since that one was closed.

Currently the PyTorch ONNX exporter exports the logical ops (lt, gt, le, ge, eq, ne) with the output type of the corresponding ONNX ops as tensor(uint8). But the ONNX spec allows only tensor(bool), which is why models that have these ops fail to load properly.

This issue is captured in #11339. Part of this issue, relating to the allowed input types, has been fixed in ONNX spec by houseroad. This PR fixes the other part pertaining to output type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15677

Reviewed By: dzhulgakov

Differential Revision: D13568450

Pulled By: houseroad

fbshipit-source-id: a6afbea1afdb4edad8f8b1bc492f50b14e5f2fce
2019-01-03 10:35:25 -08:00
279ca4acd2 Port legacy reflection_pad1d to ATen (#15480)
Summary:
1. Avoided using `THCDeviceTensor` by re-calculating the mapping from cuda (blockIdx, threadIdx) to input/output tensor index.
2. Changed Camelcase naming to underscore naming.

Profiling:

Legacy:

```bash
$py.test test/test_nn.py -k ReflectionPad1d -v -s
....
=========== 2 passed, 1258 deselected, 800 warnings in 4.35 seconds ============
```

Now:

```bash
$py.test test/test_nn.py -k ReflectionPad1d -v -s
...
=========== 2 passed, 1258 deselected, 800 warnings in 4.03 seconds ============
```

I have two questions about the code. Any insights are appreciated. gchanan zou3519

1. I can verify that [this magic](https://github.com/pytorch/pytorch/blob/master/aten/src/THCUNN/TemporalReflectionPadding.cu#L32-L36) correctly maps output indices to input indices in different cases. But I have no idea how you came up with this algorithm that merges three categories (in the left padding, in the original input, in the right padding) into a single statement.

2. Why do we need to [get contiguous](https://github.com/pytorch/pytorch/blob/master/aten/src/THNN/generic/TemporalReflectionPadding.c#L80) tensors when calculating forward and backward propagation?

Reflection_pad2d porting will come in the next PR.
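
For question 1, the piecewise mapping and the single-statement form can be compared directly in Python (this covers the `pad >= 0` case; names are illustrative):

```python
def reflect_index_piecewise(o, n, pad):
    """Map output index `o` to an input index for 1-D reflection padding of an
    input of length `n`, handled as three explicit cases."""
    j = o - pad
    if j < 0:
        return -j                  # reflected off the left edge
    if j <= n - 1:
        return j                   # inside the original input
    return 2 * (n - 1) - j         # reflected off the right edge

def reflect_index_branchless(o, n, pad):
    """Single-statement form in the spirit of the CUDA kernel: each |.| term
    flips sign exactly at one of the case boundaries above, so the three
    piecewise branches collapse into one expression."""
    return abs(o - pad) - abs(o - (n + pad - 1)) - o + pad + n - 1
```

Sweeping every output index shows the two forms agree, e.g. for `n=4, pad=2` the mapped indices are `[2, 1, 0, 1, 2, 3, 2, 1]`.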
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15480

Differential Revision: D13544924

Pulled By: mrshenli

fbshipit-source-id: 182045434f210032a82cab721a190da0cd781fbf
2019-01-03 10:30:37 -08:00
1159302ab1 bug fix in 3d group conv (#15625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15625

3D group conv (both NCHW and NHWC layout) was not correct.
Added group=2 in test_1d_convolution and test_3d_convolution in conv_test

Reviewed By: protonu

Differential Revision: D13562099

fbshipit-source-id: 586e8a7574a2764f2a3b559db6c2415b3ab90453
2019-01-03 09:46:49 -08:00
6103a04cff Port torch.arange to aten and parallelize on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15667

Differential Revision: D13566631

Pulled By: gchanan

fbshipit-source-id: e3243a4e81ecb58373681df8bf6a00428352fb14
2019-01-03 09:20:41 -08:00
10c10b0990 Ignore flake8 warning about whitespace before ':' (#15663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15663

Ignore sometimes incorrect flake8 warning about whitespace before ':'

See https://github.com/ambv/black/issues/315

Reviewed By: soumith

Differential Revision: D13565818

fbshipit-source-id: 9d5ec2335899527ee71f4b505c00865a354e3bf0
2019-01-03 05:02:10 -08:00
f53010370b Add count_include_pad arg for PoolOpGradient on CPU and fix ARM performance issue. (#15651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15651

Add count_include_pad arg for PoolOpGradient on CPU and fix ARM performance issue.

Reviewed By: houseroad

Differential Revision: D13564257

fbshipit-source-id: 3a143f1122bc507ccb7827e9b46908d5c7203735
2019-01-03 00:18:47 -08:00
3b5a940355 Unify the usage of Dequantize (#15685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15685

The declaration of "Dequantize" is in "fbsource/fbcode/deeplearning/fbgemm2/QuantUtils.h", so it requires the "namespace fbgemm".

<T> is actually optional, since the type can be deduced from the first argument.

In some places we have "Dequantize<T>(...)", while in other places we have "Dequantize(...)". We'd better unify them. As a reference, all occurrences of "Quantize" are using "fbgemm::Quantize<T>(...)".

Reviewed By: jspark1105

Differential Revision: D13570847

fbshipit-source-id: 7fca9f7f9e4e0d9e5eb27ac44b8707adc3c80717
2019-01-02 21:32:46 -08:00
efc3d6b65d Fix vec256 inversion (#15659)
Summary:
soumith zou3519

I was browsing the code, and think `vec256_int.h` might need a minor revision, but not 100% sure.

1. It currently inverts the result by `XOR` with 0. Should it `XOR` with 1 instead?
~2. AVX2 logical operations would set all bits in a byte/word/... to `1` if the condition holds. So functions, such as `_mm256_cmpeq_epi64 ` would return `0/-1` instead of `0/1`. Should it be masked with `1` to make sure it returns 0/1?~

~Would I be correct if I assume that the code revised below is not yet activated, but will be after we port legacy code to ATen?~
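
For question 1, the difference is easy to see on toy masks (plain Python ints standing in for vector lanes):

```python
def invert_01(mask):
    """Invert a 0/1 comparison result: XOR with 1 flips it,
    while XOR with 0 is the identity and inverts nothing."""
    return [m ^ 1 for m in mask]

def invert_allbits(mask):
    """Invert an AVX2-style all-bits mask (0 / -1) by XOR with all ones."""
    return [m ^ -1 for m in mask]
```

Since `x ^ 0 == x` for any lane value, XOR-ing with 0 leaves the comparison result unchanged, which is the suspected bug.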
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15659

Differential Revision: D13565929

Pulled By: mrshenli

fbshipit-source-id: 8ae3daf256c3d915dd855a2215c95275e899ea8c
2019-01-02 21:32:44 -08:00
b0cf780ecc Add min/max on numbers to JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15680

Differential Revision: D13568806

Pulled By: zdevito

fbshipit-source-id: ef0f33cc12a057184293bc31d28cc7b24f73eb94
2019-01-02 20:10:38 -08:00
e2549cbc01 initialize with ident value in global reduction (#15653)
Summary:
Fixes #15647. cc colesbury.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15653

Differential Revision: D13571132

Pulled By: soumith

fbshipit-source-id: 8f25943c974b3b931f4528e0e0a370bc095dab51
2019-01-02 19:52:57 -08:00
0b0553f92d Updating submodules
Reviewed By: yns88

fbshipit-source-id: f7b540159cf1fe72825d09d55d56117d14ff90eb
2019-01-02 19:00:02 -08:00
879bccb1af Support for Jetson Xavier (#15660)
Summary:
The requested changes support building PyTorch 1.0 on the Jetson Xavier with OpenBLAS. The Jetson Xavier with JetPack 3.3 has generic LAPACK installed. To pick up the CUDA-accelerated BLAS/LAPACK, I had to build OpenBLAS and build/link PyTorch from source. Otherwise, I got a runtime error indicating the LAPACK routines were not CUDA-enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15660

Differential Revision: D13571324

Pulled By: soumith

fbshipit-source-id: 9b148d081d6e7fa7e1824dfdd93283c67f69e683
2019-01-02 18:51:42 -08:00
62883a911c Fixing cuda100 smoke tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15673

Reviewed By: yf225

Differential Revision: D13568746

Pulled By: pjh5

fbshipit-source-id: e636de417d61b48074399da75bfb2576c9f62743
2019-01-02 17:13:16 -08:00
3ea5a9a66d Remove PythonOp non-CPU path and PytorchOp (#15417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15417

Right now the way we test whether a Blob contains a CPU tensor in `PythonOpBase` is broken, which means the non-CPU path might never be taken.
Searching through the codebase, the non-GPU path is used in PythonDLPack and in PytorchOp, which is itself unused. So we remove the non-GPU path in this diff.

Reviewed By: dzhulgakov

Differential Revision: D13495011

fbshipit-source-id: 9fe9537f05026d2a2cf7051efa81d184de722710
2019-01-02 16:36:37 -08:00
7857909158 Updating submodules
Reviewed By: yns88

fbshipit-source-id: bb142e8f91046cc2b7ea32dac46ec0753b4bc218
2019-01-02 14:58:48 -08:00
d86cc3e7de fix select after chunk op (#15672)
Summary:
Fixes #15669.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15672

Differential Revision: D13567274

Pulled By: suo

fbshipit-source-id: a63e6cfc9dacedd4cb99dc51eee452038418001e
2019-01-02 14:35:23 -08:00
bb3c3f516b make flake8 failure blocking (#15675)
Summary:
Right now it just prints any flake8 errors and moves forward with the commit. This is too easy to miss.

It should block the commit so that the user can fix the issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15675

Differential Revision: D13567821

Pulled By: suo

fbshipit-source-id: 5f0de40ddd771bad8d6848417408cffbceb03183
2019-01-02 12:52:59 -08:00
c5554856c9 redo sleef build fix (#15549)
Summary:
This was accidentally reverted by #14866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15549

Differential Revision: D13549674

Pulled By: zdevito

fbshipit-source-id: e209aac53dccb082b91cfa2d292310eabeb459e3
2019-01-02 12:48:25 -08:00
bee6c6761e format conv_test.py to prepare D13562099 (#15632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15632

Just formatting and a few lints.

Reviewed By: yinghai

Differential Revision: D13562403

fbshipit-source-id: c56f8ee61f68cdaccc0828a764ff729454f68259
2019-01-02 11:34:30 -08:00
eeb14675f1 Fix torch.gesv args in doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15649

Differential Revision: D13564312

Pulled By: soumith

fbshipit-source-id: b3bba2ece600880077eb09b092ce17e331995bd6
2019-01-02 00:20:22 -08:00
b52420742d clamp fixes (#15479)
Summary: fix to #15338 .

Differential Revision: D13564343

Pulled By: soumith

fbshipit-source-id: be64b572945533e10ae6f627d335b47f093720a3
2019-01-01 23:12:17 -08:00
8278a8b16f Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: acb68439e62ea270af22364183a6ecba883fab66
2019-01-01 23:12:16 -08:00
2398b607ec Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 5c5ad6a5cc9220ee1dd9565d64c7459f866ff74d
2019-01-01 17:23:01 -08:00
a0d22b6965 Fix typo in documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15628

Differential Revision: D13562685

Pulled By: soumith

fbshipit-source-id: 1621fcff465b029142313f717035e935e9159513
2018-12-30 18:07:57 -08:00
7bb41e3953 Make btriunpack work for high dimensional batches and faster than before (#15286)
Summary:
Changelog:
- Optimize btriunpack by using `torch.where` instead of indexing, in-place operations instead of out-of-place operations, and avoiding costly permutations by computing the final permutation over a list.
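As an illustration of the last point, LAPACK-style pivot indices can be folded into one final permutation using a plain list, instead of permuting the tensor once per pivot (a hedged sketch, not the actual btriunpack implementation):

```python
def pivots_to_permutation(pivots):
    # LAPACK-style pivots are 1-indexed: row i was swapped with row pivots[i]-1.
    # Replaying the swaps on a plain index list yields the final permutation.
    perm = list(range(len(pivots)))
    for i, p in enumerate(pivots):
        perm[i], perm[p - 1] = perm[p - 1], perm[i]
    return perm

# Row 0 swapped with row 1; the remaining pivots are no-ops:
assert pivots_to_permutation([2, 2, 3]) == [1, 0, 2]
```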
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15286

Differential Revision: D13562038

Pulled By: soumith

fbshipit-source-id: e2c94cfab5322bf1d24bf56d7b056619f553acc6
2018-12-30 12:42:07 -08:00
56d945a1ca Add count_include_pad arg for average_pool_op on CPU (#15593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15593

Add count_include_pad arg for average_pool_op on CPU
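The semantics of the flag can be sketched in pure Python for the 1-D case (a hypothetical helper for illustration, not the Caffe2 operator): with `count_include_pad=True` the divisor is always the kernel size; otherwise only the non-padding elements are counted.

```python
def avg_pool1d(x, kernel, pad, count_include_pad):
    """Toy 1-D average pooling over a zero-padded list (illustrative only)."""
    padded = [0.0] * pad + list(x) + [0.0] * pad
    out = []
    for start in range(0, len(padded) - kernel + 1, kernel):
        window = padded[start:start + kernel]
        # number of positions in this window that lie inside the real input
        real = sum(1 for j in range(start, start + kernel)
                   if pad <= j < pad + len(x))
        denom = kernel if count_include_pad else max(real, 1)
        out.append(sum(window) / denom)
    return out

assert avg_pool1d([2.0, 4.0], kernel=2, pad=1, count_include_pad=True) == [1.0, 2.0]
assert avg_pool1d([2.0, 4.0], kernel=2, pad=1, count_include_pad=False) == [2.0, 4.0]
```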

Reviewed By: houseroad

Differential Revision: D13558123

fbshipit-source-id: 188879ec3af313105ff66ac0b5a81ea44fca2855
2018-12-30 04:16:47 -08:00
ef487d4f1d Remove TH/THC link for cholesky (#15595)
Summary:
Changelog:
- Remove TH/THC binding
- Port single matrix case to ATen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15595

Differential Revision: D13561657

Pulled By: soumith

fbshipit-source-id: 65f8c4b455cf19a0c7b6aeac2e3b985c7a7208f8
2018-12-29 17:54:50 -08:00
2a45050fdc Concatenate directly into shared memory when constructing batches for numpy (#14534)
Summary:
Since #1323, tensors are shared via shared memory, but this feature was not active for numpy arrays.
This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14534

Differential Revision: D13561649

Pulled By: soumith

fbshipit-source-id: b6bc9e99fb91e8b675c2ef131fba9fa11c1647c0
2018-12-29 17:51:02 -08:00
4047cdc690 Add a patch for OSX with SDK<10.12 (#15615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15614

Build passing on SDK 10.9
https://dev.azure.com/ramonaoptics/feedstock-builds/_build/results?buildId=13
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15615

Differential Revision: D13561737

Pulled By: soumith

fbshipit-source-id: 2ab0f78338d4949fa3f2735915fd96dce4bcd621
2018-12-29 16:11:58 -08:00
d3e5540276 Fix typo: szie -> size
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15466

Differential Revision: D13536343

Pulled By: soumith

fbshipit-source-id: cb3df30bf346ef6bc0bc1b6430107b3e0e086f8d
2018-12-28 22:40:52 -08:00
119efd5266 Make the warning suppression safer (#15560)
Summary:
Address the problem introduced in https://github.com/pytorch/pytorch/pull/15499#issuecomment-450038494.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15560

Differential Revision: D13561346

Pulled By: soumith

fbshipit-source-id: 6abf622672bdcb77ae1a7188e8a3817fa97aecbc
2018-12-28 22:12:36 -08:00
d53012b4fe add NCHW2NHWC and NHWC2NCHW in utils.py (#15588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15588

Use the NHWC2NCHW or NCHW2NHWC functions, which are easier to understand than code using transpose and generalize to non-2D convolutions.
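The layout change is a fixed axis permutation, (N, C, H, W) → (N, H, W, C). A pure-Python sketch over nested lists (illustrative only; the real helpers operate on array objects):

```python
def nchw2nhwc(t):
    # t is a nested list of shape (N, C, H, W); returns shape (N, H, W, C).
    return [[[[t[n][c][h][w] for c in range(len(t[n]))]
              for w in range(len(t[n][0][h]))]
             for h in range(len(t[n][0]))]
            for n in range(len(t))]

x = [[[[1, 2]], [[3, 4]]]]                    # shape (1, 2, 1, 2)
assert nchw2nhwc(x) == [[[[1, 3], [2, 4]]]]   # shape (1, 1, 2, 2)
```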

Reviewed By: csummersea

Differential Revision: D13557674

fbshipit-source-id: c4fdb8850503ea58f6b17b188513ae2b29691ec0
2018-12-28 17:34:50 -08:00
9c8d8eab9d Remove TH/THC link for gesv (#15510)
Summary:
This PR removes the TH/THC binding for gesv.

Changelog:
- Remove TH/THC binding
- Port single matrix case to ATen
- Enable test_gesv for CUDA as well
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15510

Differential Revision: D13559990

Pulled By: soumith

fbshipit-source-id: 9da2825e94d3103627e719709e6b1f8b521a07fb
2018-12-28 16:54:27 -08:00
cd3c4a2f1c keep extra_info of each op in ProfDagStats (#15244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15244

This diff keeps track of the extra_info attached to each operator. When getPerOpStas() is called, it attaches the extra_info to the resulting ProfDagStats protobuf.

Facebook
The net transform attaches a global_op_id, defined as a tuple of (orig_net_name, original_op_index), to each operator.
The global_op_id is encoded as extra_info in each operator.

Reviewed By: aazzolini

Differential Revision: D13016289

fbshipit-source-id: 3e2719ec7ed0ebe47740b77581c565ff7e79b102
2018-12-28 15:03:23 -08:00
692898fe37 Error when torch.load-ing a JIT model (#15578)
Summary:
Throw a warning when calling `torch.load` on a zip file

Fixes #15570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15578

Differential Revision: D13555954

Pulled By: driazati

fbshipit-source-id: a37ecdb3dd0c23eff809f86e2f8b74cd48ff7277
2018-12-28 13:54:32 -08:00
fb22f76eb6 default_collate should collate bool list to byte tensors (#14669)
Summary:
Based on #15331. Review only the last commit.

Fixes https://github.com/pytorch/pytorch/issues/14507.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14669

Reviewed By: ezyang

Differential Revision: D13528725

Pulled By: soumith

fbshipit-source-id: f12f1ac1c4ff2a3ddd6877c0c096a5da3a1ffa3c
2018-12-28 12:26:46 -08:00
6a3e54eda9 append caffe2 prefix to dnnlowp cmd line options (#15582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15582

Following the convention of having a caffe2_ prefix for command-line options.

Reviewed By: viswanathgs

Differential Revision: D13252055

fbshipit-source-id: 142a6395b832f211f34d0a87ec2d62c1e5fcdc69
2018-12-28 11:51:59 -08:00
2c4c8784d2 adding nightly build smoke tests to circleci
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15441

Reviewed By: yf225

Differential Revision: D13552399

Pulled By: pjh5

fbshipit-source-id: 4a52ee2d08324b9ab6b8c266ad6a1cd3bdad1c71
2018-12-28 10:47:38 -08:00
c1643ec551 add the int support (#15581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15581

as title

Reviewed By: protonu

Differential Revision: D13556274

fbshipit-source-id: ba21f0970257d526e2fe7574eea4f89465b9c618
2018-12-27 17:16:32 -08:00
9bf7eb914d Move VariableImpl functions to AutogradMeta and Variable (#15487)
Summary:
In this PR, we are moving all functions away from `Variable::Impl`, in order to get rid of `Variable::Impl` (and the `data_` Tensor in it) in the next PR. Some of the functions (such as `set_requires_grad` / `requires_grad` / `grad`) will be living in `AutogradMeta` class, while others (such as `backward()` / `rebase_history()` / `grad_accumulator()` / `grad_fn()`) will be living in `Variable` class.

This is the 2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15487

Differential Revision: D13553173

Pulled By: yf225

fbshipit-source-id: 691f9432d0cd0640af380c757f3e3a2f64f8851c
2018-12-27 17:16:31 -08:00
50fbf79451 test basic tensor interop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12249

Differential Revision: D13469356

Pulled By: li-roy

fbshipit-source-id: b49748462aa44ac34b8ce79783f2c895a537a232
2018-12-27 17:04:00 -08:00
70f0c4745b Allow int/float cast to bool (#13391)
Summary:
This PR adds explicit `bool()` casts to match Python semantics

`bool(1) = True`
`bool(0) = False`
`bool(0.0) = False`
`bool(0.1) = True`
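These are ordinary Python truthiness semantics, which can be checked directly:

```python
# Python truthiness that the JIT cast mirrors: zero of any numeric
# type is False, every other value is True.
assert bool(1) is True
assert bool(0) is False
assert bool(0.0) is False
assert bool(0.1) is True
assert bool(-3) is True
```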
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13391

Differential Revision: D12871213

Pulled By: driazati

fbshipit-source-id: 773a48b2647973138efe854abe725d647f1d727d
2018-12-27 16:01:08 -08:00
0fff5b3612 remove print ops before exporting onnx graph (#15550)
Summary:
Remove print ops before exporting the ONNX graph; fixes https://github.com/pytorch/pytorch/issues/15505.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15550

Differential Revision: D13551195

Pulled By: eellison

fbshipit-source-id: 1ea1e34cb5b8433eacc2b86fb10b241198af96be
2018-12-27 15:46:05 -08:00
62151aa259 Added deviceCount() virtual method to DeviceGuardImplInterface (#15574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15574

Added a deviceCount() virtual method to DeviceGuardImplInterface, along with corresponding implementations for CPUGuardImpl, CUDAGuardImpl, FakeGuardImpl, VirtualGuardImpl, and HIPGuardImplMasqueradingAsCUDA.

Reviewed By: soumith

Differential Revision: D13554609

fbshipit-source-id: 913bf2aad44a0a356efe54505ee4abaf6c4622db
2018-12-27 15:36:32 -08:00
02a249ed92 Port torch.range to aten and parallelize on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15484

Differential Revision: D13538955

Pulled By: gchanan

fbshipit-source-id: ee3889ad116988d963e603621310b3bbdce0aec9
2018-12-27 15:25:57 -08:00
d63740bc3f Export group norm as ATen and add test (#15569)
Summary:
As a short-term solution, export group norm as an ATen op to unblock users.
Long term, we will add GroupNorm to ONNX.

Add an end to end test for this one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15569

Differential Revision: D13554293

Pulled By: houseroad

fbshipit-source-id: b4974c9ea2a1b81338ca1e5c6747efe2715d7932
2018-12-27 14:44:29 -08:00
e4477feb15 Update cuda.get/set_rng_state doc (#14324)
Summary:
Now that `cuda.get/set_rng_state` accept `device` objects, the default value should be a device object, and the docs should say so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14324

Reviewed By: ezyang

Differential Revision: D13528707

Pulled By: soumith

fbshipit-source-id: 32fdac467dfea6d5b96b7e2a42dc8cfd42ba11ee
2018-12-27 14:09:25 -08:00
9ad6ada9de Update QNNPACK (#15561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15561

- Update QNNPACK submodule to master (API-incompatible)
- Do matching changes in Caffe2 Int8 operators

Reviewed By: dreiss

Differential Revision: D13551322

fbshipit-source-id: 066f9087061167f7d7cfbc1c8f8628dfa93d056e
2018-12-27 11:59:54 -08:00
ed949e20cb Revert D13552080: [pytorch][PR] add clang-format check to CI
Differential Revision:
D13552080

Original commit changeset: 462a73894c16

fbshipit-source-id: ebfc5aa3343cebabbc24ff39e4e9841a372443e2
2018-12-27 10:56:52 -08:00
c86cd9e530 Fix wrong class name in jit _make_fail (#15559)
Summary:
It should be ScriptModule rather than TracedModule :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15559

Differential Revision: D13552058

Pulled By: soumith

fbshipit-source-id: 0aa17639c225818b00d59daec4bc2336f039f658
2018-12-27 02:02:33 -08:00
80cc280c68 add clang-format check to CI (#15543)
Summary:
Simple check that runs against your PR's changes and complains if running clang-format would have created a change. Does nothing when run against master, so it's "safe" to accept changes that fail this check and it won't break the build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15543

Reviewed By: soumith

Differential Revision: D13552080

Pulled By: suo

fbshipit-source-id: 462a73894c16e7108806af7fa88440c377d4d0d2
2018-12-26 22:20:32 -08:00
4d029bba7f Fix github branch prefix v (#15552)
Summary:
Fixes #15519 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15552

Differential Revision: D13550780

Pulled By: ailzhang

fbshipit-source-id: b117e5ced42de207b91045bffcee8907dd73201e
2018-12-26 19:48:47 -08:00
eeaf1b64cb Rotated boxes support for GPU GenerateProposals op (#15470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15470

On top of D13509114 and D13017791. Pretty straight-forward.

Reviewed By: newstzpz

Differential Revision: D13536671

fbshipit-source-id: ff65981b70c63773ccc9aef3ff28e3c9508f6716
2018-12-26 18:03:56 -08:00
e25702ac2b CUDA kernel for rotated NMS support, over 200x speedup than CPU (#15365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15365

On top of D13017791, adding rotated NMS support with the same kernel building
blocks. Results in 218x speedup on avg.

Reviewed By: SuperIRabbit

Differential Revision: D13509114

fbshipit-source-id: c1d33c8dc4bc50b5906b4f01bb0caf1115e2a357
2018-12-26 18:03:55 -08:00
7b87ecae37 Move autograd metadata from VariableImpl to TensorImpl (#13827)
Summary:
Changes originally in this PR:
1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct
2. Change Variable::Impl functions to use data members in `AutogradMeta` struct
3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl
4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()`

Changes moved from https://github.com/pytorch/pytorch/pull/13645:
1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python.
2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated.

This is the 1st+2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13827

Differential Revision: D13507173

Pulled By: yf225

fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a
2018-12-26 16:34:24 -08:00
4c5b1cc026 version bump to 1.1 (#15554)
Summary:
version bump to 1.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15554

Differential Revision: D13550818

Pulled By: soumith

fbshipit-source-id: 8a28582c98b42c081e103581551a01fd96c9f42d
2018-12-26 15:44:25 -08:00
8c6ff91d57 In README.md CMAKE_PREFIX_PATH should be CONDA_PREFIX when using an conda virtual environment (#15548)
Summary:
In the current README.md, `CMAKE_PREFIX_PATH` is set to the conda root even when you have activated a virtual environment. When a conda virtualenv is activated, packages are installed in `CONDA_PREFIX`, not the conda root, so `CMAKE_PREFIX_PATH` should also be set to `CONDA_PREFIX` in this case. I think some build issues can be solved with the new instruction, perhaps something like #14954.

soumith,
When I made PR #15335 I was confused and made a wrong point. I think this PR could be the real solution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15548

Differential Revision: D13549681

Pulled By: soumith

fbshipit-source-id: 42d855b6e49ee58d735d2f4715d3e5752a748693
2018-12-26 12:57:07 -08:00
cdb8edce75 add from_pretrained method to EmbeddingBag (#15273)
Summary:
The `EmbeddingBag` module does not include a `from_pretrained` method like the `Embedding` module.  I added it for consistency between the two modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15273

Differential Revision: D13547842

Pulled By: soumith

fbshipit-source-id: 8ffde51ff0c1e8fc8310263b6f375da88089ff7d
2018-12-26 08:35:39 -08:00
5ac95758e2 Make argument size checking consistent across CPU and CUDA for torch.gesv (#15430)
Summary:
There is an inconsistency in the argument size checks for gesv, which is fixed in this PR.

Changelog:
- Replicate check in CPU as done for CUDA
- Fix argument ordering (minor) in CUDA checking

Fixes #15328

Differential Revision: D13531167

Pulled By: soumith

fbshipit-source-id: c4b4e4fc12880208d08e88d1e47e730ac98c2ad3
2018-12-26 08:32:28 -08:00
f636dc9276 clang format world (#15524)
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.

Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524

Differential Revision: D13547989

Pulled By: suo

fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
2018-12-26 06:55:01 -08:00
d4712ee218 Added correct isinf handling for Integral tensors (#15489)
Summary:
Currently, torch.isinf on an integral tensor raises `RuntimeError: value cannot be converted to type int16_t without overflow: inf`.
This PR suppresses the error and returns false (0) for all integral tensors, making the behavior consistent with np.isinf.
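The key observation is that an integral dtype can never hold an infinity, so the result is all-False without ever converting `inf` to the integer type. A hypothetical pure-Python sketch of that dispatch (`isinf_like` is an illustrative name, not a real API):

```python
import math

def isinf_like(values, integral):
    # Integral dtypes can never represent inf, so short-circuit to
    # all-False instead of attempting the overflowing conversion.
    if integral:
        return [False] * len(values)
    return [math.isinf(v) for v in values]

assert isinf_like([1, 2, 3], integral=True) == [False, False, False]
assert isinf_like([1.0, math.inf], integral=False) == [False, True]
```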
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15489

Reviewed By: zou3519

Differential Revision: D13540786

Pulled By: flashhack

fbshipit-source-id: e730dea849da6a59f3752d347bcfbadfd12c6483
2018-12-26 06:36:09 -08:00
d602ddcda3 Trivial comment update in autograd/function.h (#15529)
Summary:
I removed the explanation of the `num_inputs` parameter. This parameter was removed in #8168.

colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15529

Differential Revision: D13547854

Pulled By: soumith

fbshipit-source-id: 8a9ac58f2c93a2533b82ec63089477166ed0bcb9
2018-12-26 02:25:54 -08:00
6e4be0af2e Fix failed type cast in Windows Debug Build (#15333)
Summary:
Fixes #15330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15333

Differential Revision: D13531317

Pulled By: soumith

fbshipit-source-id: b956f27bd7fa33cbdf405338fcbcbc7df2fd629f
2018-12-26 00:48:58 -08:00
12e0ed55b4 Upgrade MKL-DNN to version 0.17 and static build MKL-DNN (#15504)
Summary:
Upgrade MKL-DNN to 0.17 and statically build MKL-DNN to fix potential build errors due to an old MKL-DNN version on the host system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15504

Differential Revision: D13547885

Pulled By: soumith

fbshipit-source-id: 46f790a3d9289c1e153e51c62be17c5206ea8f9a
2018-12-25 22:56:51 -08:00
2fe5c29d81 remove legacy from docs (#15112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15112

Differential Revision: D13547845

Pulled By: soumith

fbshipit-source-id: 61e3e6c6b0f6b6b3d571bee02db2938ea9698c99
2018-12-25 21:57:54 -08:00
60b13d1f71 Use at::zeros instead of torch::zeros in non-differentiable example (#15527)
Summary:
There was a typo in C++ docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15527

Differential Revision: D13547858

Pulled By: soumith

fbshipit-source-id: 1f5250206ca6e13b1b1443869b1e1c837a756cb5
2018-12-25 21:50:17 -08:00
2ed95c5871 Fix the compare logic in function overflows for MSVC (#15499)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15497.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15499

Differential Revision: D13547835

Pulled By: soumith

fbshipit-source-id: a674da93bf905a0b81f0cc60449ccb97c2746926
2018-12-25 21:50:15 -08:00
521894c490 Allow converting char tensor to numpy; add [fi]info.min (#15046)
Summary:
https://github.com/pytorch/pytorch/pull/14710 with test fixed.

Also added `finfo.min` and `iinfo.min` to get castable tensors.

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15046

Reviewed By: soumith

Differential Revision: D13429388

Pulled By: SsnL

fbshipit-source-id: 9a08004419c83bc5ef51d03b6df3961a9f5dbf47
2018-12-24 09:11:24 -08:00
b7bc49ad70 Port replication_pad1d to ATen (#15507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15485

port replication_pad1d

Reviewed By: ezyang

Differential Revision: D13531920

fbshipit-source-id: dcd64ebd2c24b7431996231b8d5addfb600b1072
2018-12-24 06:34:02 -08:00
ad6799537e Support stateful dataset (#15096)
Summary:
Currently re-implements the dataloader for stateful datasets. Outstanding work:
- Refactor DataLoader and DataLoader2 to have common base classes and only differ in specific pieces of logic
- Figure out how to avoid duplicating the `MapDataset` logic for stateful vs. non-stateful datasets
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15096

Differential Revision: D13522043

Pulled By: goldsborough

fbshipit-source-id: 08e461ca51783047f11facc4d27dfa2e4f1e4c2a
2018-12-24 06:26:40 -08:00
8cd917812b put interactive prompt in bash (#15521)
Summary:
This makes compatibility with different versions of python a little bit simpler, and fixes a problem where stdin wasn't being read from the terminal properly in the prompt.

zdevito This should fix your EOF exception.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15521

Differential Revision: D13546358

Pulled By: suo

fbshipit-source-id: fb7551a86c888196831c046d9d9848e7ff05b925
2018-12-24 05:37:46 -08:00
f8a56bf476 Fix the iterator category for torch::data::Iterator (#15500)
Summary:
Try to fix https://github.com/pytorch/pytorch/issues/14410.
Additional info: from this [page](https://stackoverflow.com/questions/14062297/canonical-way-to-define-forward-output-iterator), changing it to `input_iterator_tag` does not mean the `output_iterator_tag` capabilities are lost.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15500

Differential Revision: D13545773

Pulled By: soumith

fbshipit-source-id: 327bfb7be83d53e42925e0e391b2a4277e3a1b36
2018-12-23 19:49:44 -08:00
c07647814b Precommit hook: just warn if no clang-tidy (#15514)
Summary:
The precommit hook shouldn't hard fail if there's no `clang-tidy`, just warn and omit the check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15514

Differential Revision: D13545776

Pulled By: suo

fbshipit-source-id: 9bf3f8ee18703c6d1a39eb7776092fb5e120d2a1
2018-12-23 14:38:13 -08:00
4a716250cc Add torch.rot90 to torch.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15512

Differential Revision: D13545775

Pulled By: soumith

fbshipit-source-id: 2a8896571745630cff4aaf3d5469ef646bdcddb4
2018-12-23 14:31:11 -08:00
51f1c4fea5 fix parallelization detection for CPU foreach_reduced_elt (#15483)
Summary:
This does two things:

(1): revert #15114 , which is incorrect and actually just completely disables parallelization in this function (because `at::get_num_threads` returns `-1` unless it has been set explicitly)

(2): Fix our (FB-internal) failing tests that #15114 was intended to fix, by still working correctly in a setup where `#ifdef _OPENMP` is set and `omp_get_max_threads() > 1` , but `#pragma omp parallel` only launches one thread. I believe such an unusual situation only exists in certain unit tests within FB infra but we still need it to work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15483

Differential Revision: D13538940

Pulled By: umanwizard

fbshipit-source-id: a3362c7ac7327ced350d127bb426f82c59e42732
2018-12-23 12:51:40 -08:00
4e4ef0cffb add rowwise adagrad lp test (#15082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15082

We didn't have unit test for low-precision rowwise adagrad

Reviewed By: chocjy

Differential Revision: D13300732

fbshipit-source-id: 46e7bdfc82c5a6855eeb6f653c0a96b0b3a20546
2018-12-22 10:25:39 -08:00
e012b183dd handle empty inputs to SparseLengthsMean correctly (#15389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15389

SparseLengthsMean was generating uninitialized data for empty inputs (lengths == 0). We should return zeros.
The unit tests were also not covering this special case, which is fixed by this diff.
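The intended semantics can be sketched in pure Python (a hypothetical analogue of the operator, for illustration only): each output is the mean of its segment, and an empty segment (length 0) yields zero rather than uninitialized data.

```python
def lengths_mean(values, lengths):
    # Segment means over `values`, segmented by `lengths`; zero-length
    # segments produce 0.0 instead of garbage.
    out, i = [], 0
    for n in lengths:
        seg = values[i:i + n]
        out.append(sum(seg) / n if n else 0.0)
        i += n
    return out

assert lengths_mean([1.0, 2.0, 3.0], [2, 0, 1]) == [1.5, 0.0, 3.0]
```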

Reviewed By: salexspb

Differential Revision: D13515970

fbshipit-source-id: 3c35265638f64f13f0262cee930c94f8628005da
2018-12-21 22:20:14 -08:00
58a7f2aed1 Add pthreadpool_create and pthreadpool_destroy (#15492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15492

Add pthreadpool_create and pthreadpool_destroy, which are used by NNPACK tests.

Reviewed By: Maratyszcza

Differential Revision: D13540997

fbshipit-source-id: 628c599df87b552ca1a3703854ec170243f04d2e
2018-12-21 20:28:18 -08:00
90aa21e795 Metadata for input/output formats in model file proto. (#15252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15252

We would like to extend the model file format to include strongly type, semantic information
about the model inputs and outputs.

The goal is for a user to be able to consider a model file like a function with
a well defined API describing what the inputs and outputs would be.

Reviewed By: dzhulgakov

Differential Revision: D13009915

fbshipit-source-id: 5df124a876ad03c05fbdaacae0eab659637734c1
2018-12-21 17:42:38 -08:00
f3a588fede add len to nativeResolver (#15488)
Summary:
(otherwise len is not resolvable using torch::jit::compile)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15488

Differential Revision: D13539991

Pulled By: zdevito

fbshipit-source-id: 3ba85fa7b1adb163f9229c568f7997d22321903d
2018-12-21 16:47:15 -08:00
934fc28656 Remove NoneGenerator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15335

Differential Revision: D13540357

Pulled By: driazati

fbshipit-source-id: a289e5944b65872103f68faac74e18f10e7c6fff
2018-12-21 16:33:37 -08:00
1dcf2ea096 Add self to Python printer reserved words (#15318)
Summary:
This adds `self` to the list of reserved words and also sorts the lines and prevents the tracer from naming values 'self' (which happens in torch/tensor.py)

Fixes #15240
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15318

Differential Revision: D13540192

Pulled By: driazati

fbshipit-source-id: 46ae02e51b1b31d5c62110fa83ba258ea6bada27
2018-12-21 16:02:07 -08:00
70aafad08a AD support for adaptive_avg_pool2d (#15459)
Summary:
This adds AD support for adaptive_avg_pool2d, which is necessary for resnet50 in pytorch/vision:master. cc: soumith asuhan dlibenzi

apaszke I saw that autodiff bug you fixed in #15403; it doesn't prevent this PR from passing, so I'll leave it for your PR to fix. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15459

Differential Revision: D13534732

Pulled By: ailzhang

fbshipit-source-id: 4e48b93e35d5ecfe7bd64b6a132a55b07843f206
2018-12-21 15:38:24 -08:00
01be9b7292 Handling nullptr case
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15467

Reviewed By: Maratyszcza

Differential Revision: D13536504

fbshipit-source-id: ab46ff6bb4b6ce881c3e29d7e6a095ea62289db4
2018-12-21 15:08:00 -08:00
235d47760b Relax check on outputs (#15458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15458

Many nets in the wild seem to have outputs that are never produced by the net.

Reviewed By: ZolotukhinM

Differential Revision: D13534185

fbshipit-source-id: 2b23b39c28404c53f68868f3bf6df53c5fea9eab
2018-12-21 14:19:37 -08:00
6bf05bfde6 allow non-final returns (#15463)
Summary:
This PR allows a subclass of programs that have return statements that are not final in the graph.

`final_returns.h` contains a comment describing how this is accomplished.
To minimize complexity in `compiler.cpp`, this pass is done as an AST-to-AST rewrite before the compiler runs.
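The shape of the rewrite can be illustrated in plain Python (a sketch of the idea, not the actual AST pass): code after a non-final return is pushed into the else branch, leaving a single, final exit point.

```python
# Before: a non-final return inside the if.
def before(c, a, b):
    if c:
        return a
    return b

# After (conceptually): the trailing code moves into the else branch,
# so the function has exactly one final return.
def after(c, a, b):
    if c:
        result = a
    else:
        result = b
    return result

for c in (True, False):
    assert before(c, 1, 2) == after(c, 1, 2)
```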
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15463

Differential Revision: D13538962

Pulled By: zdevito

fbshipit-source-id: 67105ca873351825b4a364092ab1873779f3e462
2018-12-21 14:01:33 -08:00
3da4a04733 Fixed trivial typos in Dropout2D and Dropout3D classes (#15200)
Summary:
Fixed trivial typos in Dropout2D and Dropout3D classes

weiyangfb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15200

Differential Revision: D13537888

Pulled By: ezyang

fbshipit-source-id: 8fb06027ca663a2e4bfa016af400698ae3c88ad1
2018-12-21 11:58:10 -08:00
ff8fbc4f23 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 59d7a5b82fb78bc2d2285d0896e35c262512ffb9
2018-12-21 11:47:05 -08:00
7e2ec24886 eq_fixes (#15475)
Summary:
fixes #15464 .
cc : ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15475

Differential Revision: D13537812

Pulled By: ezyang

fbshipit-source-id: 127adf612ac8b3d3a64baa3d12a53daba7d3e4b8
2018-12-21 11:43:06 -08:00
d9cad71b36 Enable running collect_env.py without building PyTorch (#15468)
Summary: Closes #15346

Differential Revision: D13537873

Pulled By: ezyang

fbshipit-source-id: 7765ce4108dae9479d8900c0815cc2f174596a83
2018-12-21 11:37:43 -08:00
ac506f5820 Back out "[nomnigraph][executor] computeChains with nomnigraph" (#15451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15451

Original commit changeset: ccd050bfead6

Reviewed By: ilia-cher

Differential Revision: D13533161

fbshipit-source-id: 1d0dcd54c2e3875aab015f3e996693e67a449b87
2018-12-21 11:09:27 -08:00
acbd9c49b0 Direct FBGEMM integraton into ATen (#13777)
Summary:
This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation:

1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel.
2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value.
3) Biases are unquantized
4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops *must* be run with FBGEMM

The API can be seen in the added test case. Highlights are:
1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above.
2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things.
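The `torch.jit.quantized` entry points referenced here have since been superseded; as a rough modern sketch of the same idea (int8-quantized, prepacked weights for `nn.Linear` with float inputs and outputs), today's dynamic quantization API looks like this — note this is a stand-in, not the API this PR added:

```python
import torch

# float model; only the nn.Linear weights get quantized to int8
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# inputs and outputs stay float; quantization/packing happens internally
out = qmodel(torch.randn(1, 4))
assert out.dtype == torch.float32 and out.shape == (1, 2)
```

As in this PR, the quantized GEMM backend (FBGEMM on x86) must be available at runtime.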
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777

Differential Revision: D13383276

Pulled By: jamesr66a

fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344
2018-12-21 10:35:51 -08:00
614121c1ef Replace getargspec with getfullargspec (#15396)
Summary:
Replace `getargspec` with `getfullargspec` to resolve test warnings. Fixes #15344.
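For reference, a short sketch of the difference between the two `inspect` functions:

```python
import inspect

def f(x, y=1, *, z=2):
    return x + y + z

# getargspec is deprecated and cannot represent keyword-only arguments;
# getfullargspec reports them explicitly.
spec = inspect.getfullargspec(f)
assert spec.args == ['x', 'y']
assert spec.kwonlyargs == ['z']
assert spec.kwonlydefaults == {'z': 2}
```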
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15396

Differential Revision: D13529548

Pulled By: zou3519

fbshipit-source-id: 50d3be92423a9ce89bc4895b67569663e1abbaa6
2018-12-21 09:40:33 -08:00
2b23ba8ef0 The benchmark binary support multiple batches in one run (#15443)
Summary:
It is sometimes beneficial to run multiple batches in one benchmark and check the aggregated results.

This PR enables this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15443

Reviewed By: llyfacebook

Differential Revision: D13531129

Pulled By: sf-wind

fbshipit-source-id: 553a762a5cbadf5a3d9fd6af767ae34899bc1aa2
2018-12-21 08:45:41 -08:00
433db13b48 Move torch.logspace to ATen and parallelize on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15438

Reviewed By: ezyang

Differential Revision: D13529626

Pulled By: gchanan

fbshipit-source-id: 896e8afee3d6b5a706c4f5815b91ba6bd8af6672
2018-12-21 08:24:33 -08:00
61cc701dd7 Fix cudnn dropout (#15473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15473

Revert accidental changes introduced in D13335176

IntList is a range, and copying it just copies pointers. Thus the pointers would point either at deallocated memory or at the same memory, causing the equality check to always pass.

Reviewed By: ezyang

Differential Revision: D13537131

fbshipit-source-id: c97b3533be689bb4cdadd9e612f1284ac50e4bda
2018-12-21 08:15:44 -08:00
f52f68bcf9 format specialized_segment_ops_test.py to prepare D13515970 (#15408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15408

Applied formatting to specialized_segment_ops_test.py to prepare D13515970

Reviewed By: salexspb

Differential Revision: D13520300

fbshipit-source-id: c3250b6abe8087c607f65ae60d1da61bd46c342b
2018-12-20 23:44:47 -08:00
cb79e1b3a5 Clean up onnxifi transformation code (#15453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15453

Just move things around to facilitate further development. No logic change.

Reviewed By: rdzhabarov

Differential Revision: D13533959

fbshipit-source-id: eebab1306939e802aacffb24a711d372fd67916c
2018-12-20 22:06:47 -08:00
26b04523b1 Record Caffe2's current stream ID in c10_cuda. (#15174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15174

Previously, Caffe2 maintained a separate per-thread per-device
current logical CUDA stream ID.  In this PR, we switch Caffe2 over
to using c10::Stream to manage the current stream, and also
manage the allocation of cudaStream_t objects.

This results in a slight behavior change: previously, Caffe2
would have been willing to allocate an arbitrary number of
CUDA streams, depending on how high the logical stream IDs
went.  The c10::Stream pool has a fixed number of streams, once
you exceed it, it wraps around.

Reviewed By: dzhulgakov

Differential Revision: D13451550

fbshipit-source-id: da6cf33ee026932a2d873835f6e090f7b8a7d8dc
2018-12-20 21:54:05 -08:00
3353064060 Add option to automatically handle unsorted variable-length sequences in RNNs (#15225)
Summary:
Fixes #3584.

Motivation: manually sorting sequences, packing them, and then unsorting them
is something a lot of users have complained about doing, especially when we can
offer library support for them.

Overview: we internally sort sequences before packing them and store a list of
`unsorted_indices` that represent how to unsort the sequences inside
PackedSequence. The packing helper functions return PackedSequence with the
`permutation` field and the unpacking helper functions use it to unsort.

To implement this, the following changes were made:
- PackedSequence now keeps `sorted_indices` and `unsorted_indices`.
  These two can be thought of as permutations and are inverses of each other.
  `sorted_indices` is how the sequences were sorted; `unsorted_indices` is how
  to unsort the sequences.
- Added an `enforce_sorted` argument to pack_sequence and pack_padded_sequence
  that maintains the legacy behavior of error-ing out on unsorted-sequences.
  When `enforce_sorted=True`, these functions maintain their ONNX exportability.
- pack_sequence(sequences, enforce_sorted) takes in unsorted sequences.
- pack_padded_sequence can take in a padded tensor that represents padded,
  unsorted sequences.
- pad_packed_sequence unsorts the PackedSequence such that it is still the
  inverse operation of packed_padded_sequence.
- RNNs apply `sort_indices` to their input hidden state and apply
  `unsort_indices` to their output hidden state. This is to ensure that the
  hidden state batches correspond to the user's ordering of input sequences.

NOT BC-Breaking
- The default for pack_sequence and pack_padded_sequence is
  `enforce_sorted=True` to avoid breaking ONNX export. To use the new
  functionality, pass in `enforce_sorted=False`
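A short sketch of the new behavior (assuming a PyTorch build that includes this change):

```python
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

# lengths 2, 5, 3 -- deliberately unsorted; this used to raise an error
seqs = [torch.ones(2), torch.ones(5), torch.ones(3)]
packed = pack_sequence(seqs, enforce_sorted=False)

# unpacking uses unsorted_indices to restore the caller's original ordering
padded, lengths = pad_packed_sequence(packed, batch_first=True)
assert lengths.tolist() == [2, 5, 3]
assert padded.shape == (3, 5)
```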

Testing Plan
- Modified TestNN.test_pack_sequence, TestNN.test_packed_padded_sequence,
  and TestNN.test_variable_sequence (RNN test) to check the behavior
  of unsorted sequences, sorted sequences, and sorted sequences with
  enforce_sorted=True
- test/test_jit.py has a test to see if RNNs are exportable with
  enforce_sorted=True

cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15225

Reviewed By: soumith

Differential Revision: D13507138

Pulled By: zou3519

fbshipit-source-id: b871dccd6abefffca81bc4e3efef1873faa242ef
2018-12-20 17:37:18 -08:00
52699f0754 Change default value of unique to 'sorted=True'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15379
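The effect of the new default, in short:

```python
import torch

t = torch.tensor([3, 1, 2, 1, 3])
# sorted=True is now the default, so the unique values come back ordered
assert torch.unique(t).tolist() == [1, 2, 3]
```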

Differential Revision: D13531287

Pulled By: ezyang

fbshipit-source-id: 1512da7d660dc413688d99264e6434897c3ac78c
2018-12-20 17:09:08 -08:00
4ee1c2c632 add denormal options (ftz and daz)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15423

Reviewed By: yinghai

Differential Revision: D13526340

fbshipit-source-id: de2ecc717b4f778f33a8bf940ed144dbb230c7a8
2018-12-20 17:04:39 -08:00
3a6d473b49 collect_env fix (#15447)
Summary:
fixes #15214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15447

Differential Revision: D13531523

Pulled By: ezyang

fbshipit-source-id: 8f24f5ae9f3e78f6c5c9ee702ba14faca7aa297a
2018-12-20 16:56:34 -08:00
a178f0a316 Remove unused field in jit script module deserializer (#15439)
Summary:
A little bit clean up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15439

Reviewed By: zrphercule

Differential Revision: D13532015

Pulled By: houseroad

fbshipit-source-id: 2fb1e01fc28549c7e78af6c65ee68339950bc7da
2018-12-20 16:18:40 -08:00
8883ac4b58 Revert D13494873: [pytorch][PR] Fixing ONNX export of logical ops to have correct output datatype
Differential Revision:
D13494873

Original commit changeset: 069d2f956a5a

fbshipit-source-id: 80ef10b2eb623a63da51dc2e4874f2ee446f426d
2018-12-20 15:56:31 -08:00
95a0e2c421 Fix ASAN div by zero error in rotated GenerateProposals op (#15415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15415

Was introduced in D13429770

Reviewed By: SuperIRabbit

Differential Revision: D13524114

fbshipit-source-id: a890eb3b97c24952c361155d1432a801499f4ddd
2018-12-20 15:44:15 -08:00
ed5b584f65 Tensor construction codemod(ResizeLike) - 7/7 (#15087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15087

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419765

fbshipit-source-id: 34d695309a66723281429610a12544598c507d74
2018-12-20 15:33:07 -08:00
d6cbcb43c5 allow numpy-like boolean-list indexing in pytorch (#14932)
Summary:
Suggested fix to issue #6773; the fix allows numpy-like boolean-list indexing in PyTorch.
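A minimal sketch of the indexing form this enables:

```python
import torch

t = torch.tensor([10, 20, 30, 40])
# a plain Python list of booleans acts as a mask, matching NumPy semantics
assert t[[True, False, True, False]].tolist() == [10, 30]
```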
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14932

Differential Revision: D13398795

Pulled By: ezyang

fbshipit-source-id: 67f8daf9829db2550ff76d2bde673be6dd2708cd
2018-12-20 15:33:06 -08:00
f56217af3b Doc improvement on DDP (#15440)
Summary:
I noticed that some users don't even know we have this support. Adding it to the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15440

Differential Revision: D13531045

Pulled By: teng-li

fbshipit-source-id: 9757c400c0010608758c754df04e603b36035a10
2018-12-20 14:51:57 -08:00
cde26c659e Fix type annotation error. (#15448)
Summary:
According to mypy, the trailing -> None is mandatory.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15448

Differential Revision: D13532179

Pulled By: ezyang

fbshipit-source-id: e8972f8c9ada4657c518cd7bcd46e489ab8ddf5f
2018-12-20 14:47:57 -08:00
c24a124fa0 Add launch bounds needed for ROCm 2.0 (#15400)
Summary:
ROCm 2.0's compiler requires launch_bounds annotations if flat work group sizes are larger than the default of 256.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15400

Differential Revision: D13531239

Pulled By: ezyang

fbshipit-source-id: c0b40600a8c332823da6c7113c644d8dba424a9c
2018-12-20 14:39:13 -08:00
1a2ec10bd4 Support enough of closures to write autograd functions (#15411)
Summary:
This PR adds enough of the infra for supporting closures (inner script functions) in order to allow us to express symbolic gradients using them. We do not actually ever run graphs that contain these closures. The symbolic_script infrastructure just extracts them out of the original forward graph and turns them into discrete forward/backward pairs. This cuts down on the type annotations necessary to write forward/backward pairs and aligns closely with the "differentiator" function approach to expressing reverse-mode AD.

Example:

This code:
```
import torch

r = torch.jit.CompilationUnit(
'''
def mul_forward(self, other):
    def backward(grad_output):
        grad_self = (grad_output * other).sum_to_size(self.size())
        grad_other = (grad_output * self).sum_to_size(other.size())
        return grad_self, grad_other
    return self * other, backward
''')

print(r.module.code)
```

Will produce this graph (pretty printed for clarity):

```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  backward = (self.__lambda, (other, self))
  return (torch.mul(self, other), backward)

def __lambda(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```

symbolic_script will then do some modifications to remove the unsupported prim::Function node, yielding:

```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  return (torch.mul(self, other), (other, self))

def backward(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15411

Differential Revision: D13523340

Pulled By: zdevito

fbshipit-source-id: 4d4a269460e595b16802c00ec55ae00e3e682d49
2018-12-20 14:39:11 -08:00
3fdf567752 Adding CUDA version for C2 operators generate proposals and nms (#13694)
Summary:
Related to issue #13684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13694

Reviewed By: wat3rBro

Differential Revision: D13017791

Pulled By: newstzpz

fbshipit-source-id: 4bdc58e474d8e1f6cd73a02bf51f91542a2b9d0b
2018-12-20 14:39:09 -08:00
a47749cb28 Add at::one_hot (#15208)
Summary: Closes: https://github.com/pytorch/pytorch/issues/15060
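`at::one_hot` is exposed in Python as `torch.nn.functional.one_hot`; a quick sketch:

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])
onehot = F.one_hot(labels, num_classes=3)
assert onehot.tolist() == [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```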

Differential Revision: D13528014

Pulled By: ezyang

fbshipit-source-id: 5a18689a4c5638d92f9390c91517f741e5396293
2018-12-20 14:24:58 -08:00
2a64a78e7b Extract arguments to its own file and pass arguments to ios apps (#15413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15413

In order to pass arguments to the iOS app, the arguments need to be extracted
into their own file. Also, the iOS app no longer uses benchmark.json, which
parses the arguments.

This is an incompatible change, so a hotfix needs to be added to the tests.

Reviewed By: llyfacebook

Differential Revision: D13523240

fbshipit-source-id: b559cc7f52d8f50ee206a7ff8d7b59292d855197
2018-12-20 13:31:48 -08:00
f0f9277c3c Fixing ONNX export of logical ops to have correct output datatype (#15185)
Summary:
Currently PyTorch ONNX exporter exports the logical ops (`lt`, `gt`, `le`, `ge`, `eq`) with output type in corresponding ONNX ops as type `tensor(uint8)`. But ONNX spec allows for only `tensor(bool)`, which is why models that have these ops fail to load properly.

This issue is captured in https://github.com/pytorch/pytorch/issues/11339. Part of this issue, relating to the allowed input types, has been fixed in ONNX spec by houseroad. This PR fixes the other part pertaining to output type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15185

Differential Revision: D13494873

Pulled By: houseroad

fbshipit-source-id: 069d2f956a5ae9bf0ac2540a32594a31b01adef8
2018-12-20 12:37:27 -08:00
cb0b096f2b Miscellaneous small doc fixes (#15373)
Summary:
This PR makes some small changes for better consistency in our README and
CONTRIBUTING docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15373

Differential Revision: D13512753

Pulled By: driazati

fbshipit-source-id: 44398ad1894eef521d5f5acb1d06acaad67728cf
2018-12-20 12:33:40 -08:00
cac02034f6 Extend README for ATen/native/cpu (#15437)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15437

Differential Revision: D13529436

Pulled By: ezyang

fbshipit-source-id: 2e2193d54ea7f7626fe7392e4d0c130c2f87a76f
2018-12-20 11:17:00 -08:00
06a7cb5901 Implementing cuda kernel for tril_indices and triu_indices (#15203)
Summary:
Followup PR of #14904, and the stretch goal of #12653.

Directly calculate coordinates in the original tensor using column index in the result tensor. Every GPU thread takes care of a column (two numbers) in the output tensor.

The implementation detects and handles precision loss during calculating the square root of a `int64_t` variable, and supports tensors with up to `row * column = 2 ^ 59` numbers.

Algorithm details are described in [comments of TensorFactories.cu](23ddb6f58a/aten/src/ATen/native/cuda/TensorFactories.cu (L109-L255)).
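From Python, these kernels back `torch.tril_indices`/`torch.triu_indices` (shown here on CPU; pass `device='cuda'` to exercise the new kernel):

```python
import torch

idx = torch.triu_indices(3, 3)       # 2 x N coordinate tensor
assert idx.shape == (2, 6)           # a 3x3 upper triangle has 6 entries
assert idx[:, 0].tolist() == [0, 0]  # first entry is element (0, 0)
```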

zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15203

Reviewed By: zou3519

Differential Revision: D13517695

Pulled By: mrshenli

fbshipit-source-id: 86b305d22cac08c8962a3b0cf8e9e620b7ec33ea
2018-12-20 10:23:38 -08:00
5c66662e58 Revert D13498974: [pytorch][PR] [jit] Add self to Python printer reserved words
Differential Revision:
D13498974

Original commit changeset: 488efb661476

fbshipit-source-id: 3b991bccf4cf2ffdafe70f145aff0ae2837e31f8
2018-12-20 10:02:37 -08:00
8db44eda01 Add support for batched pdist (#12302)
Summary:
This updates pdist to work for batched inputs, and updates the
documentation to reflect issues raised.

closes #9406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12302

Reviewed By: ezyang

Differential Revision: D13528485

Pulled By: erikbrinkman

fbshipit-source-id: 63d93a6e1cc95b483fb58e9ff021758b341cd4de
2018-12-20 09:41:08 -08:00
7a764fe270 multi-dim standard deviation for CUDA. (#14990)
Summary:
This is the CUDA version of #14535 .
It refactors Reduce.cuh to allow more general classes of reductions to be performed -- we no longer assume that the temporary data returned during reduction is just one scalar, and instead allow an arbitrary accumulate type.
We also allow 64-bit indexing when necessary, since in general we will no longer be able to accumulate directly in the output. (In the cases when we can, we continue to split the tensors until they can be addressed with 32-bits, as before).
As an initial use-case, we implement `std` in multiple dimensions.
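The multi-dimensional reduction looks like this from Python (a minimal sketch):

```python
import torch

x = torch.randn(2, 3, 4)
# reduce the standard deviation over dims 1 and 2 in a single call
s = x.std(dim=(1, 2))
assert s.shape == (2,)
# matches flattening those dims and reducing once
assert torch.allclose(s, x.reshape(2, -1).std(dim=1))
```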
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14990

Differential Revision: D13405097

Pulled By: umanwizard

fbshipit-source-id: a56c24dc2fd5326d417632089bd3f5c4f9f0d2cb
2018-12-20 08:56:32 -08:00
5e624948b6 Add self to Python printer reserved words (#15318)
Summary:
This adds `self` to the list of reserved words and also sorts the lines and prevents the tracer from naming values 'self' (which happens in torch/tensor.py)

Fixes #15240
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15318

Differential Revision: D13498974

Pulled By: driazati

fbshipit-source-id: 488efb661476cdcdb8ecb9cb48942f02e3c1e611
2018-12-20 02:29:09 -08:00
eb5d28ecef Pretty printing of C++ modules (#15326)
Summary:
A long outstanding nicety: pretty printing of C++ modules. E.g.
```
  Sequential sequential(
      Linear(10, 3),
      Conv2d(1, 2, 3),
      Dropout(0.5),
      BatchNorm(5),
      Embedding(4, 10),
      LSTM(4, 5));
std::cout << sequential;
```
prints
```
torch::nn::Sequential(
  (0): torch::nn::Linear(in=10, out=3, with_bias=true)
  (1): torch::nn::Conv2d(input_channels=1, output_channels=2, kernel_size=[3, 3], stride=[1, 1])
  (2): torch::nn::Dropout(rate=0.5)
  (3): torch::nn::BatchNorm(features=5, eps=1e-05, momentum=0.1, affine=true, stateful=true)
  (4): torch::nn::Embedding(count=4, dimension=10)
  (5): torch::nn::LSTM(input_size=4, hidden_size=5, layers=1, dropout=0)
)
```

apaszke ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15326

Differential Revision: D13518986

Pulled By: goldsborough

fbshipit-source-id: 63bf753672f0e348951de3645208f263581de5fb
2018-12-19 21:55:49 -08:00
2ef0f1222a Restructuring prof dag counters (#13321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13321

This diff simply refactors the `ProfDAGCounters` into two:
* `ProfDAGCounters` that gathers stats at runtime.
* `ProfDAGReport` which holds the report from the gathered stats once stats collection is done.

This refactoring allow us to implement `+=` for `ProfDAGReport`, which can be used for aggregating same-net reports on each host.

Reviewed By: donglimm

Differential Revision: D12837988

fbshipit-source-id: 0470c5fd6437f12711cab25a15a12965d79b2a91
2018-12-19 21:48:30 -08:00
b89b46abfb Remove python_default_init from ATen and use Optional (#15234)
Summary:
Optional clean-up. This PR removes python_default_init from the YAML files and the code-gen, and uses the optional type to do the work.

This also fixes the bug in #13149 to correctly adopt the as_strided backward.

Fixes #9941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15234

Differential Revision: D13502044

Pulled By: wanchaol

fbshipit-source-id: 774b61fc4414482cf11d56e22bd0275aefb352a4
2018-12-19 21:38:50 -08:00
3fc889e976 Tensor construction codemod(ResizeLike) - 1/7 (#15073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15073

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13419563

fbshipit-source-id: 8c284405fa3a867303216df876ee6b20d8a46551
2018-12-19 21:38:48 -08:00
2db742fc95 Do not use fork to invoke test scripts in pytorch rocm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14600

Differential Revision: D13523937

Pulled By: bddppq

fbshipit-source-id: 1493fdd051283650081d7944bb2bd7f0c4c44990
2018-12-19 21:35:16 -08:00
1071e92335 Replace Vec256<T>::size with constexpr method (#15406)
Summary:
Stack:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **#15406 Replace Vec256<T>::size with constexpr method**&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D13519902/)

See Note [constexpr static function to avoid odr-usage compiler bug]
for detailed justification.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15406

Differential Revision: D13523774

Pulled By: ezyang

fbshipit-source-id: c0ab44298bb2ef3d68a66d026fc6bc156a909a6b
2018-12-19 20:33:45 -08:00
9abd755a76 Make cpuinfo logging less verbose (#15405)
Summary:
Log only errors in cpuinfo.

Fix to #15401 and #15398
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15405

Differential Revision: D13526251

Pulled By: Maratyszcza

fbshipit-source-id: 4d9eba0912f7b45093bed2e343cd77a151ffa8c4
2018-12-19 20:23:36 -08:00
88bf683cbc Support error handling in forked threads (#14523)
Summary:
Save the error info in the future for the parent thread to pick up. Throw the error
when the thread is the root thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14523

Differential Revision: D13251756

Pulled By: highker

fbshipit-source-id: b40f9a45665e1a934743f131ec5e8bad5622ce67
2018-12-19 18:54:46 -08:00
5dd5ef3214 default options for OutputTensorCopyFrom (#15248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15248

OutputTensorCopyFrom takes four arguments: an index, a source Tensor, TensorOptions, and whether we want to perform an async call.
We want to provide default options for TensorOptions: (1) default the device to context_.device(); (2) default the dtype to input.dtype(). Users can also explicitly provide these options to override the default values.

A follow-up diff will change the order of the TensorOptions parameter so that users don't need to write down tensor options unless they want to override them.

Reviewed By: dzhulgakov

Differential Revision: D13453824

fbshipit-source-id: 87401f81c7c3f9fd3d8936c710e6c2e04a59b689
2018-12-19 18:14:47 -08:00
a00cfd1e9b Fix Module::copy_into
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15393

Differential Revision: D13519477

Pulled By: highker

fbshipit-source-id: d62928597ec0700b550e7cf481c8febae57b200d
2018-12-19 17:09:59 -08:00
0b219538cf add unpack_outputs to inlineCallTo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15382

Differential Revision: D13518844

Pulled By: zdevito

fbshipit-source-id: 981936988080af80629b70bf5f6dfa52ceb09c2f
2018-12-19 15:11:59 -08:00
07d20b1e7c Fix documentation (#15372)
Summary:
Current documentation example doesn't compile. This fixes the doc so the example works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15372

Differential Revision: D13522167

Pulled By: goldsborough

fbshipit-source-id: 5171a5f8e165eafabd9d1a28d23020bf2655f38b
2018-12-19 15:04:24 -08:00
055de167d5 computeChains with nomnigraph (#15366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15366

Swap the old implementation for one that is slightly easier to understand.

I ran the tests and compared the number of chains against the old algorithm. This one outperforms on every test, but we have yet to see whether that impacts performance at all.

old chain 34 nomnigraph chain 25
old chain 46 nomnigraph chain 34
old chain 228 nomnigraph chain 188
old chain 397 nomnigraph chain 338

Reviewed By: ilia-cher

Differential Revision: D13057451

fbshipit-source-id: ccd050bfead6eb94ab9c7b0a70b09a22c2b9e499
2018-12-19 15:04:23 -08:00
9217bde807 Refactor dataloader.py (#15331)
Summary:
Same as #14668, and was approved there.

ailzhang , please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!

Below is the original description at #14668:

As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at the top global level. Adding more functionalities to `dataloader.py` will only make things worse.

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.

No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331

Reviewed By: yf225

Differential Revision: D13503120

Pulled By: ailzhang

fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
2018-12-19 12:36:03 -08:00
41e7e1bc40 Rename potrs to cholesky_solve (#15334)
Summary:
Changelog:
- Renames `potrs` to `cholesky_solve` to remain consistent with TensorFlow and SciPy (not really, they call their function chol_solve)
- Default argument for upper in cholesky_solve is False. This will allow a seamless interface between `cholesky` and `cholesky_solve`, since the `upper` argument in both function are the same.
- Rename all tests
- Create a tentative alias for `cholesky_solve` under the name `potrs`, and add deprecated warning to not promote usage.
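A sketch of the combined interface described above (using `torch.linalg.cholesky`, the current factorization entry point, as an assumption):

```python
import torch

A = torch.tensor([[4.0, 2.0], [2.0, 3.0]])  # symmetric positive-definite
b = torch.tensor([[1.0], [2.0]])

L = torch.linalg.cholesky(A)   # lower-triangular by default
x = torch.cholesky_solve(b, L) # upper=False by default, so the two calls
                               # compose seamlessly
assert torch.allclose(A @ x, b, atol=1e-6)
```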
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15334

Differential Revision: D13507724

Pulled By: soumith

fbshipit-source-id: b826996541e49d2e2bcd061b72a38c39450c76d0
2018-12-19 12:31:24 -08:00
33018e4e09 centralize side effects ops as node method (#15188)
Summary:
A number of different passes rely on whether a node has side effects. This centralizes the list of side effectful ops in one place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15188

Differential Revision: D13508438

Pulled By: eellison

fbshipit-source-id: 2143e782b787731ce007b6dcd50cbde30e1b8dd0
2018-12-19 10:52:54 -08:00
560530aeec Optional ScalarType support for native functions & JIT (#15154)
Summary:
For #6593 and #9515

This completes the support for optional<ScalarType> in native, JIT and autograd.

Note: Mostly following the existing implementation for optional<Scalar> that was added in https://github.com/pytorch/pytorch/pull/12582.

This PR introduces a way to make functions accept an optional dtype and it will unblock #9515 by allowing the `dtype` param for type promotion interface:
```
func: name(inputs, *, ScalarType? dtype=None, Casting casting=same_kind)
```

An alternative approach could have been using `ScalarType::Undefined` for the same purpose but without optional, though it would have been a bit hacky.
```
func: name(inputs, *, ScalarType dtype=Undefined, Casting casting=same_kind)
```

Here's an example use of this in action: 971f69eac6

There are already a bunch of native functions that were getting optional `dtype` through function overloading. https://github.com/pytorch/pytorch/pull/15133 is the attempt to migrate all of those. I will send those changes separately after this since some functions (e.g. sum) need quite a bit of change in the codebase. See the commits over there.
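Once migrated, an optional `dtype` behaves like this (sketch using `torch.sum`, one of the functions mentioned above):

```python
import torch

x = torch.ones(3, dtype=torch.float32)
# dtype=None (the default) keeps the input type...
assert torch.sum(x).dtype == torch.float32
# ...while an explicit dtype requests a cast before the reduction
assert torch.sum(x, dtype=torch.float64).dtype == torch.float64
```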
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15154

Differential Revision: D13457760

Pulled By: tugrulates

fbshipit-source-id: 706134f0bd578683edd416b96329b49a1ba8ab48
2018-12-19 10:45:35 -08:00
54d4fe3f49 Implement 'to' on ScriptModules (#15340)
Summary:
Following #6008
Fixes "Implement 'to' on ScriptModules #7354"

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15340

Differential Revision: D13506646

Pulled By: zdevito

fbshipit-source-id: 318fea2e8e51a37ce9844efa4c8db67d45a66317
2018-12-19 10:41:23 -08:00
1d94a2bee3 Update cpuinfo submodule (#15385)
Summary:
Pull cpuinfo changes that should make it work on AWS Lambda servers (which don't have `/sys/devices/system/cpu/{possible,present}` files, and probably don't mount sysfs at all).

I'm not 100% sure it will fix the issue, but getting this update in would make it easier for users to test using a nightly build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15385

Reviewed By: soumith

Differential Revision: D13517467

Pulled By: Maratyszcza

fbshipit-source-id: e8e544cd1f9dad304172ebb7b6ba7a8ad7d34e66
2018-12-19 07:31:45 -08:00
cbde820bc3 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: dfbdae40e505c46cd64751c6ec107c84f9434131
2018-12-18 23:37:34 -08:00
cd8dd49fba race condition fix of using mutable_data inside OPENMP region for batched matmul (#15371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15371

Similar to D13387692:

Never call mutable_data from an OpenMP region!!!

Reviewed By: jspark1105

Differential Revision: D13511259

fbshipit-source-id: 100812d2a547c0a1d5018749d5fdc88162375673
2018-12-18 23:22:56 -08:00
6ca1d93473 add whitelisted clang-format checks (#15254)
Summary:
This PR adds clang-format automation:
- It only checks on whitelisted files, so we can enable incrementally without noise
- There is a pre-commit hook provided that will do the same check, plus prompt users to apply the clang-format changes (no change is made without the user agreeing).

My plan is to migrate over whole files at a time, clang-formatting them and then adding them to the whitelist. Doing it this way should avoid too many merge pains (the most you'll have to do is run clang-format on the affected file before rebasing).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15254

Differential Revision: D13515888

Pulled By: suo

fbshipit-source-id: d098eabcc97aa228c4dfce8fc096c3b5a45b591f
2018-12-18 22:34:20 -08:00
122b4ef41d build fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15384

Differential Revision: D13515708

Pulled By: zdevito

fbshipit-source-id: ea077cfec30edf41b85dc83c0a969d1146434145
2018-12-18 22:11:44 -08:00
0368054a6d Split up compiler.cpp (#15355)
Summary:
This separates the different parts of compiler.cpp to make their relationship more clear. In particular it adds:

* sugared_value.{h,cpp} - all the public SugaredValues that the compiler defines and a few that were inside compiler.cpp
* type_parser.{h, cpp} - Turns TreeRef's defining types into TypePtr
* schema_matching.{h, cpp} - infrastructure for matching arguments against overloaded schema and emitting builtin operators with a particular schema.
Retains:
* compiler.{h, cpp} - now responsible simply for the `defineMethodsInModule` infrastructure.

Some utility functions like inlineCallTo have moved to ir.h.

The only thing that is not a move is some changes in module.h/cpp that remove multiple returns from `Method::emit_call_to`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15355

Reviewed By: suo, wanchaol

Differential Revision: D13507524

Pulled By: zdevito

fbshipit-source-id: 69ec936a9ff1a383c12a883616346b219c72e393
2018-12-18 19:43:35 -08:00
6ab2e7442d Autograd using torchscript (#14604)
Summary:
This PR enables autodiff to use the forward/backward graph compiled from python code, instead of using symbolic gradients (modifying the original graph directly).

We put the map in a separate .h file for now to wait for the native_functions.yaml and derivatives.yaml merge. This should ideally go into native_functions.yaml eventually.

This PR should be enough to unblock us for now, we can start writing gradients for aten functions in python.

Differential Revision: D13494635

Pulled By: ailzhang

fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf
2018-12-18 19:10:57 -08:00
4928c76415 Minor clean up for test_jit (#15368)
Summary:
* remove None args in functional tests
* remove some expect files that are not necessary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15368

Differential Revision: D13512349

Pulled By: wanchaol

fbshipit-source-id: 304cffff966487d15c373057ae8ad114ef8aa7f9
2018-12-18 18:26:37 -08:00
f3bff2d500 Add RNNCell modules to Script standard library (#14695)
Summary:
Adds RNNCell modules to script standard lib

cc apaszke for argument_spec changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14695

Differential Revision: D13467680

Pulled By: driazati

fbshipit-source-id: 13a14da87714325cc4c3d49e5fde8a850d5d757b
2018-12-18 17:28:28 -08:00
f3cc9b2218 Remove fully qualified weak script names (#15364)
Summary:
Cleanup to make references to `weak_script` consistent across codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15364

Differential Revision: D13509676

Pulled By: driazati

fbshipit-source-id: 93dbbbe57e9b9b6587895f3cc6fac678babd21de
2018-12-18 16:48:52 -08:00
096ee8467c Redefine scheduler to set learning rate using recursive formula (#14010)
Summary:
Modified step_lr for StepLR, MultiStepLR, ExponentialLR and CosineAnnealingLR. In this way, multiple schedulers can be used simultaneously to modify the learning rates.

Related issue: https://github.com/pytorch/pytorch/issues/13022

Added unit tests combining multiple schedulers.
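The recursive formula can be sketched with toy stand-ins (the class names and `step(lr, epoch)` signature here are hypothetical, not the actual `torch.optim.lr_scheduler` API): because each scheduler derives the new rate from the *current* rate rather than the initial one, the effects of several schedulers compose instead of overwriting each other.

```python
class StepDecay:
    """Toy scheduler: multiply the current LR by gamma every step_size epochs."""
    def __init__(self, step_size, gamma):
        self.step_size, self.gamma = step_size, gamma

    def step(self, lr, epoch):
        # Recursive form: the new LR depends only on the current LR.
        if epoch > 0 and epoch % self.step_size == 0:
            return lr * self.gamma
        return lr

class ExpDecay:
    """Toy scheduler: multiply the current LR by gamma every epoch."""
    def __init__(self, gamma):
        self.gamma = gamma

    def step(self, lr, epoch):
        return lr * self.gamma

lr = 1.0
schedulers = [StepDecay(step_size=2, gamma=0.5), ExpDecay(gamma=0.9)]
history = []
for epoch in range(4):
    for s in schedulers:
        lr = s.step(lr, epoch)
    history.append(lr)
# history ≈ [0.9, 0.81, 0.3645, 0.32805]
```

Under a closed-form scheduler, each `step` would recompute the rate from the initial value and silently discard the other scheduler's adjustment.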
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14010

Reviewed By: ezyang

Differential Revision: D13494941

Pulled By: chandlerzuo

fbshipit-source-id: 7561270245639ba1f2c00748f8e4a5f7dec7160c
2018-12-18 16:44:31 -08:00
5e97720100 Replace resize_dim() with set_sizes_and_strides() in (#15348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15348

We have a function resize_dim() on TensorImpl in c10/core/TensorImpl.h which lets you change the dimensionality of a tensor, resizing both sizes and strides. Unfortunately, this API is fairly easy to misuse, because it fills in the new entries with garbage when you size it larger. We want to refactor the call sites to use set_sizes_and_strides() instead, so that there is never an intermediate tensor state where the sizes/strides don't make sense. In this diff, resize_dim() is
replaced with set_sizes_and_strides() in aten/src/TH/THTensor.hpp.

Reviewed By: ezyang

Differential Revision: D13505512

fbshipit-source-id: 193bab89f0018c13ca07488be336d8e967746b76
2018-12-18 16:38:36 -08:00
5667af3880 Minor cleanup for TestFuser tests (#15134)
Summary:
Changelog:
- change some expect tests that didn't have to be expect tests,
  instead use self.assertAllFused
- Some of the fuser tests weren't using self.assertAllFused.
- Minor test renames

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15134

Differential Revision: D13507481

Pulled By: zou3519

fbshipit-source-id: dd0788530a60bb5ed2f42b961fae3db2b4404b64
2018-12-18 16:33:59 -08:00
3681bf7cff add dense vector to id_list operator (#15090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15090

as title
step 2 of the linked task

Reviewed By: ellie-wen

Differential Revision: D13425977

fbshipit-source-id: f3538ed68f42470ba39c5b779af764d4a5591a9d
2018-12-18 16:27:38 -08:00
f5da198236 fix clang-tidy script for python 3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15360

Differential Revision: D13509668

Pulled By: suo

fbshipit-source-id: a3448a115eaac8dd4c3f179901a23bdbc5098408
2018-12-18 15:06:14 -08:00
2469f7e02e Port torch.linspace to ATen and parallelize it on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15320

Reviewed By: ezyang

Differential Revision: D13498995

Pulled By: gchanan

fbshipit-source-id: fba655d51d978fffaa53a5e4cae4a99ebfb0eddc
2018-12-18 15:01:49 -08:00
3118124cd6 Add (Un)Fold modules to standard library (#14759)
Summary:
Depends on #14597 for the corresponding aten ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14759

Differential Revision: D13325356

Pulled By: driazati

fbshipit-source-id: 99e39449c1ccfa293de05672c31a11e580bdd11f
2018-12-18 12:03:08 -08:00
f4c504593c Fix the (reduce)min and (reduce)max ONNX exporting (#15241)
Summary:
`max` and `reducemax` are smashed together, so we need to support the one-input case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15241

Reviewed By: yinghai

Differential Revision: D13473312

Pulled By: houseroad

fbshipit-source-id: 9b8c847286a2631b006ca900271bc0d26574101a
2018-12-18 11:48:06 -08:00
056cfaf3ff Method returns a single argument (#15289)
Summary:
This PR changes Method (just Method not all graphs) to always have a single
return argument.

This is part 1 in a set of changes that will enable us to have better handling of early return statements.
The simplification that this change provides greatly reduces the work for the next step.

This change makes it so that Method and Python handle multiple returns in the same way:
* 0 - None
* 1 - <single value>
* many - Tuple[...]

The result is that a lot of special-case handling in compiler.cpp and its
bindings can be removed. It also fixes several bugs in return handling,
including one where return values were not always checked against their
attributed values.

Notes:
* inferTypeFrom is renamed to be more accurate and discourage use.
* This has uncovered some bugs in other components, which are noted in
  the diff.
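The return-packing convention above can be sketched in plain Python (a hypothetical helper, purely illustrative of the 0/1/many rule):

```python
def pack_returns(values):
    """Pack a list of return values the way Method and Python now agree to:
    0 values -> None, 1 value -> the value itself, many -> a tuple."""
    if len(values) == 0:
        return None
    if len(values) == 1:
        return values[0]
    return tuple(values)
```

With this convention a graph always has exactly one return argument, which is what removes the special-case handling in compiler.cpp.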
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15289

Differential Revision: D13481649

Pulled By: zdevito

fbshipit-source-id: 0e2242a40bb28cca2d0e8be48bede96195e4858c
2018-12-18 10:44:09 -08:00
12cf5178aa caffe2 mobile opengl (#15322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15322

caffe2 mobile opengl code is not used, deleting it to reduce complications when we perform other changes

Reviewed By: Maratyszcza

Differential Revision: D13499943

fbshipit-source-id: 6479f6b9f50f08b5ae28f8f0bc4a1c4fc3f3c3c2
2018-12-18 08:20:52 -08:00
54d8ce94ee Revert D13383102: [pytorch][PR] Upgrade MKL-DNN to version 0.17
Differential Revision:
D13383102

Original commit changeset: c434f0e0ddff

fbshipit-source-id: 690f46ca0710954fa591a5ea77535e9759db4de5
2018-12-18 07:39:20 -08:00
bb9b7de831 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 4bf66581d07d839f459869bc9c6428011063cc5b
2018-12-17 21:25:36 -08:00
3a98462f2c improve script/no script save error (#15321)
Summary:
Improves the error message for #15116
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15321

Differential Revision: D13499379

Pulled By: zdevito

fbshipit-source-id: b8dc0a83efabff74199f4aab2ee98aa41c42608b
2018-12-17 21:13:58 -08:00
e37a22128e Allow tracing with fork/wait (#15184)
Summary:
There is still a limitation on this: if a script module is somewhere
in the trace, the inputs/outputs can only be tensors or tuples of
tensors.

resolves #15052
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15184

Differential Revision: D13457691

Pulled By: highker

fbshipit-source-id: 8fe46afc41357a0eb8eadd83f687b31d074deb0e
2018-12-17 20:34:26 -08:00
Jie
bd958cde68 [TensorIterator] fixing mean to output correct result for half precision (#12115) (#14878)
Summary:

mean is calculated in two step sum()/numel(). For half precision, data gets
casted back to half after sum().
We fused the division into the reduction kernel by adding pre_op/post_op.

This allows us to do torch.ones(65536).cuda().half().mean() to return correct
result.
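A rough plain-Python model of why fusing the division helps (this only models fp16 overflow saturating to infinity; real float16 also rounds intermediate sums, which this sketch ignores):

```python
FP16_MAX = 65504.0  # largest finite float16 value

def to_fp16_ish(x):
    """Crude stand-in for a float16 cast: values past the range overflow to inf."""
    return float("inf") if x > FP16_MAX else x

data = [1.0] * 65536

# Two-step mean: the sum overflows the fp16 range before the division happens.
naive = to_fp16_ish(sum(data)) / len(data)

# Fused mean: dividing inside the reduction keeps every partial value small.
fused = to_fp16_ish(sum(x / len(data) for x in data))
```

Here `naive` comes out as infinity while `fused` recovers the correct mean of 1.0, mirroring the `torch.ones(65536).cuda().half().mean()` case above.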
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14878

Differential Revision: D13491159

Pulled By: soumith

fbshipit-source-id: e83802e1628b6d2615c45e18d7acf991d143a09e
2018-12-17 20:13:30 -08:00
71ee882157 Reenable OpenMP by reverting the following two commits. (#15315)
Summary:
Revert "Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)"

This reverts commit a84e873bb156080ea76ab182171b1f3b4d5395f6.

Revert "Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)"

This reverts commit 8901935ad42fe9bf093d1106ea43606008a4024d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15315

Differential Revision: D13495852

Pulled By: ezyang

fbshipit-source-id: bcd3f60088b14831c53d3c171f10cd1ab6b35dee
2018-12-17 19:54:41 -08:00
aec9fdf0a4 Fix _apply in nn.Module (#15305)
Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.shared_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solves the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305

Differential Revision: D13493937

Pulled By: goldsborough

fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
2018-12-17 16:22:21 -08:00
2f38ffbcb3 Add a correctness check for C++ types to custom operators (#15247)
Summary:
The JIT uses `int64_t` for its integer type and `double` for its floating point type, but users quite often want to write `int` or `float` and that currently fails in not-so-nice ways for custom ops. This PR adds a simple `static_assert` to catch these common failure cases.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15247

Differential Revision: D13493941

Pulled By: goldsborough

fbshipit-source-id: c1cd0d10ab5838c75f167c0bdb57e45a0bc1344e
2018-12-17 16:17:27 -08:00
e650a84872 caffe2/python/task: added __repr__ methods to all task definitions (#15250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15250

This adds `__repr__` methods to all of the classes under task.py. This makes the objects much easier to interact with when using them in an interactive manner, such as in a Jupyter notebook.

The default `__repr__` method just returns the object ID which is very unhelpful.
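For illustration, a minimal sketch of the difference (the `Task` fields here are hypothetical, not caffe2's actual attributes):

```python
class Task:
    def __init__(self, name, num_instances=1):
        self.name = name
        self.num_instances = num_instances

    def __repr__(self):
        # Without this, repr(t) falls back to something like
        # "<__main__.Task object at 0x7f...>", which says nothing useful.
        return f"Task(name={self.name!r}, num_instances={self.num_instances})"

t = Task("reader", num_instances=4)
```

In a Jupyter notebook, simply evaluating `t` now shows its fields instead of an object address.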

Reviewed By: hanli0612

Differential Revision: D13475758

fbshipit-source-id: 6e1b166ec35163b9776c797b6a2e0d002560cd29
2018-12-17 16:02:16 -08:00
e0b261a35b Port nn fold and unfold to c++
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14597

Reviewed By: ezyang

Differential Revision: D13272227

fbshipit-source-id: 6eccab5ff5830a977398a96393b778095120edc6
2018-12-17 15:46:37 -08:00
c66adfc16b Allow future type parsing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14887

Differential Revision: D13490984

Pulled By: highker

fbshipit-source-id: 165fe995867be273793f983154aa6cbce13e4396
2018-12-17 15:39:52 -08:00
efb37e86eb Removing BUILD_C10_EXPERIMENTAL_OPS option and unglobbing experimental/c10d ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15064

Reviewed By: orionr

Differential Revision: D13474801

Pulled By: pjh5

fbshipit-source-id: 9d3664c3a3a1b6c2d9f083f8476fe3b037296b98
2018-12-17 15:35:41 -08:00
59d71b9664 Bicubic interpolation for nn.functional.interpolate (#9849)
Summary:
Addresses #918; interpolation results should be similar to TensorFlow's.

* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`

The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
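For reference, bicubic weights come from Keys' cubic convolution kernel; a minimal plain-Python sketch, assuming the common a = -0.75 variant (the exact constant used by the operator is not stated here):

```python
def cubic_kernel(x, a=-0.75):
    """Keys' cubic convolution kernel: each output sample is a weighted
    sum of the 4 nearest input samples using these weights."""
    x = abs(x)
    if x < 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

# For a sample point at fractional offset t from a grid point, the four
# weights cover offsets t+1, t, 1-t, 2-t and sum to 1 (partition of unity).
t = 0.3
weights = [cubic_kernel(t + 1), cubic_kernel(t),
           cubic_kernel(1 - t), cubic_kernel(2 - t)]
```

The kernel is 1 at offset 0 and 0 at all other integer offsets, which is what makes the result interpolating rather than merely smoothing.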
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849

Differential Revision: D9007525

Pulled By: driazati

fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
2018-12-17 15:31:48 -08:00
c5dd91c4ae add isinstance static type checking for jit (#15076)
Summary:
This PR adds `isinstance` to do static type checking in the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15076

Differential Revision: D13471067

Pulled By: wanchaol

fbshipit-source-id: d39b7ed5db9fcca4b503659d02cf7795950ea8ea
2018-12-17 15:21:49 -08:00
216ab259fb Fix the missing caffe2 proto files for Windows (#15157)
Summary:
Fixes #15156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15157

Differential Revision: D13490420

Pulled By: orionr

fbshipit-source-id: 4387d707f634a5975238af915b1befb2277f8ec7
2018-12-17 15:21:47 -08:00
f4c59c5fdf Replace SwitchToDevice(0) with SwitchToDevice() (#15126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15126

I want to make people stop manufacturing StreamId from thin air,
and a first step is to make people use the default stream.

Reviewed By: dzhulgakov

Differential Revision: D13432922

fbshipit-source-id: 9f0d8d70646c50d979bde5ba3c3addeebac48a3d
2018-12-17 15:15:00 -08:00
df4c9471ec Don't enforce docstrings on bool dispatch (#15306)
Summary:
Allows 2 functions that are boolean dispatched to have no docstrings (the only case that will fail now is if both functions have docstrings)

Fixes #15281
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15306

Differential Revision: D13494884

Pulled By: driazati

fbshipit-source-id: 65fec39ae03a7d6a68ad617c9b270faeb1617930
2018-12-17 14:41:05 -08:00
95d3fed68f Fix for issue 14829 (#14908)
Summary:
* Modify the testcase as outlined in the issue
   * Issue url: https://github.com/pytorch/pytorch/issues/14829
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14908

Differential Revision: D13490360

Pulled By: ezyang

fbshipit-source-id: ff11a72e19b49223652182e82c2b4e65fe444ca7
2018-12-17 14:28:50 -08:00
e07fc114a0 Minor fixes in .jenkins/caffe2/bench.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15304

Differential Revision: D13493876

Pulled By: bddppq

fbshipit-source-id: 7146eb2587e526af65b4b0290c25bd55653a3088
2018-12-17 13:53:55 -08:00
700271d0e9 Adding ONNX export for torch.expand and torch.ne (#15050)
Summary:
`torch.expand` and `torch.ne` are used often in models and this PR adds ONNX export support for them. ArmenAg has created issue https://github.com/pytorch/pytorch/issues/10882 for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15050

Differential Revision: D13453036

Pulled By: houseroad

fbshipit-source-id: 4724b4ffcebda6cd6b2acac51d6733cb27318daf
2018-12-17 13:48:14 -08:00
3df79f403e Tighten up invariants regarding StreamId. (#15125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15125

I realized that it is really bad juju if you fake a StreamId
out of thin air, because in general this isn't going to work.
So, make the constructor a lot scarier.

Most "faking StreamId out of thin air" happens because someone
just wants to put something on the default stream.

Reviewed By: dzhulgakov

Differential Revision: D13432800

fbshipit-source-id: a86991d6fc1d8aa4e54e8175e5f06f90856238e6
2018-12-17 13:30:54 -08:00
1dbc7cff3e Fix tensor printing bug in Python 2 (#12732)
Summary:
`rsplit` doesn't have kwargs in Python 2 so this line raises an error

Fixes #15135
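A quick illustration of the incompatibility (Python 3 shown; the keyword form is the one that raised `TypeError` under Python 2):

```python
s = "torch.nn.Linear"

# Portable spelling: pass maxsplit positionally (works on Python 2 and 3).
head, tail = s.rsplit(".", 1)

# Python-3-only spelling; under Python 2, str.rsplit rejected keyword arguments.
parts = s.rsplit(".", maxsplit=1)
```

The fix is simply to use the positional form in code that must run on both interpreters.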
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12732

Differential Revision: D10458630

Pulled By: driazati

fbshipit-source-id: a63e42fbc0e39e4291480775b516c98122ec05a1
2018-12-17 13:17:51 -08:00
d71fac20eb Refactor hotpatch_vars and apply it to libtorch (#14976)
Summary:
Fixes #14801.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14976

Differential Revision: D13485381

Pulled By: soumith

fbshipit-source-id: 0af3c2e1b90988d56f6f85632328d1e4b788ffd2
2018-12-16 21:53:31 -08:00
656b565a0f Trivial comment correction in dataloader (#15276)
Summary:
Trivial comment correction in dataloader
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15276

Differential Revision: D13477324

Pulled By: soumith

fbshipit-source-id: 2a74a014999655d129311d611f2a09411339cb13
2018-12-15 10:59:00 -08:00
c51c825efe Delete ffi documentation (#15220)
Summary: Deleting FFI documentation since it's deprecated.

Differential Revision: D13477329

Pulled By: soumith

fbshipit-source-id: 0b3d485eb7cef1f05b6b397dff50f21a49d6409e
2018-12-15 09:49:02 -08:00
60badccd10 Fix a typo in the assert
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15265

Reviewed By: llyfacebook

Differential Revision: D13477029

Pulled By: sf-wind

fbshipit-source-id: 9c5571a583c01f9701625541ebec0c836cb923f2
2018-12-15 09:09:09 -08:00
4bcb425490 fix cholesky call in potrs example (#15215)
Summary:
Cholesky by default returns the lower triangular matrix, see [docs](https://pytorch.org/docs/stable/torch.html#torch.cholesky).

However `torch.potrs` by default requires the upper triangular matrix. The naming of the variable `u` suggests that the example expects the upper to be returned, so I've added the flag to make that happen in the example.
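The lower/upper relationship is easy to check by hand for a 2×2 SPD matrix; a plain-Python sketch with no torch dependency:

```python
import math

# A = L L^T with L lower triangular (what cholesky returns by default).
A = [[4.0, 2.0],
     [2.0, 3.0]]
l11 = math.sqrt(A[0][0])              # 2.0
l21 = A[1][0] / l11                   # 1.0
l22 = math.sqrt(A[1][1] - l21 ** 2)   # sqrt(2)
L = [[l11, 0.0],
     [l21, l22]]

# The *upper* factor that potrs expects by default is just the transpose.
U = [[L[0][0], L[1][0]],
     [0.0,     L[1][1]]]
```

So the example either needs `torch.cholesky(a, upper=True)` or must tell `potrs` it was given the lower factor.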
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15215

Differential Revision: D13476468

Pulled By: soumith

fbshipit-source-id: 7b68035f435a2b1be4d363b3f63e407394af949d
2018-12-15 04:43:34 -08:00
2b57bd4107 value-based mark and sweep DCE (#14910)
Summary:
This makes DCE more granular by tracking live values/aliases through the graph (rather than just nodes). So we can be more aggressive in DCE around control flow blocks. For example, in:
```
%a0 = aten::foo()
%b = aten::foo()
%a2, %b2 = prim::If(%cond) {
  block0() {
    %a1 = aten::foo(%.0)
    %b1 = aten::foo(%b)
  } -> (%a1, %b1)
}
return (%a2)
```
we will now dce all the `%b` stuff.
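The mark-and-sweep idea can be sketched over a flat, side-effect-free instruction list (hypothetical `(dest, op, args)` tuples; the real pass also handles blocks and aliasing):

```python
def dce(instrs, live_outputs):
    """Value-based mark & sweep: keep only instructions whose results are
    (transitively) reachable from the live outputs."""
    producer = {dest: (dest, op, args) for dest, op, args in instrs}
    live, work = set(), list(live_outputs)
    while work:  # mark phase: walk backwards from the outputs
        v = work.pop()
        if v in live or v not in producer:
            continue
        live.add(v)
        work.extend(producer[v][2])
    # sweep phase: drop anything never marked live
    return [ins for ins in instrs if ins[0] in live]

graph = [
    ("a0", "foo", []),
    ("b",  "foo", []),
    ("a1", "foo", ["a0"]),
    ("b1", "foo", ["b"]),
]
kept = dce(graph, ["a1"])  # only the %a chain survives
```

Tracking values rather than nodes is what lets the `%b` chain die even though its defining nodes sit next to live ones.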
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14910

Differential Revision: D13476445

Pulled By: suo

fbshipit-source-id: 2bf5db19711c07dde946697a4f4b270bd8baf791
2018-12-15 01:16:44 -08:00
df614371c7 Mention Jacobian-vector product in the doc of torch.autograd (#15197)
Summary:
A friend of me is learning deep learning and pytorch, and he is confused by the following piece of code from the tutorial https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#gradients :

```python
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)
```

He doesn't know where the following line comes from:
```python
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
```

What are we computing? Why don't we compute "the gradient of `y` w.r.t `x`"?

In the tutorial, it only says
> You can do many crazy things with autograd!

Which does not explain anything. It seems to be hard for some beginners of deep learning to understand why do we ever do backwards with external gradient fed in and what is the meaning of doing so. So I modified the tutorial in https://github.com/pytorch/tutorials/pull/385
and the docstring correspondingly in this PR, explaining the Jacobian vector product. Please review this PR and https://github.com/pytorch/tutorials/pull/385 together.
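As a concrete plain-Python illustration of the vector-Jacobian product being computed here: for an elementwise map y = c·x the Jacobian is diag(c), so `y.backward(v)` yields x.grad = c·v (the constant 2**9 below stands in for the repeated doubling; it is illustrative, not what the random tutorial run produces):

```python
def vjp_scale(c, v):
    # For y_i = c * x_i, dy_i/dx_j is c when i == j and 0 otherwise,
    # so the vector-Jacobian product v^T J is just c * v elementwise.
    return [c * vi for vi in v]

c = 2 ** 9                     # stands in for doubling y until its norm is large
v = [0.1, 1.0, 0.0001]         # the "gradients" vector passed to backward()
x_grad = vjp_scale(c, v)
```

This is why `backward` wants an external vector: it computes v^T J, not the full Jacobian.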
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15197

Differential Revision: D13476513

Pulled By: soumith

fbshipit-source-id: bee62282e9ab72403247384e4063bcdf59d40c3c
2018-12-15 00:10:30 -08:00
5b542a755f Tensor method rename dims()->sizes() (#15246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15246

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: igorsugak

Differential Revision: D13470369

fbshipit-source-id: ce995beab7c64bebe8b234fb5e6d015940ec2952
2018-12-14 21:11:02 -08:00
f118568662 Create parser.cpp (#15238)
Summary:
Moves implementation into .cpp file. Parser was getting included in several compilation units.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15238

Differential Revision: D13474635

Pulled By: zdevito

fbshipit-source-id: 7dc824eea8f506d6c8ae1aa67aeec0c34d5285fc
2018-12-14 19:31:36 -08:00
e1808be37d Add several features to converting images to blobs (#15204)
Summary:
Several enhancements are implemented:

* Resize the images to be within a boundary between min-size and max-size (applied to height and width). It resizes the shorter side to match min-size while keeping the aspect ratio; however, if the longer side would then exceed max-size, it resizes the longer side to equal max-size instead (so the shorter side ends up below min-size). The min/max sizes are specified in the scale argument, in comma-separated form. If one of the sizes is -1, that size is not a restriction.

* Change the OpenCV resize function arguments from using cv::Size() to the x, y scale. Theoretically they should be the same. But in reality, the two ways of specifying them may result in different resized outputs.

* Once the image is read in, change the data to floats. That means, after resize and other preprocessing steps, the float values are preserved (not truncated to int).

* It is possible to convert data in text format to the blob format.
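The resize rule in the first bullet can be sketched as follows (hypothetical helper name and argument order; the real tool reads these bounds from the comma-separated scale argument):

```python
def bounded_resize_dims(h, w, min_size, max_size):
    """Scale so the shorter side matches min_size while keeping the aspect
    ratio; if the longer side would then exceed max_size, scale so the
    longer side equals max_size instead. -1 means "no restriction"."""
    short, long_ = min(h, w), max(h, w)
    scale = 1.0
    if min_size != -1:
        scale = min_size / short
    if max_size != -1 and long_ * scale > max_size:
        scale = max_size / long_
    return round(h * scale), round(w * scale)
```

For example, a 480×640 image with bounds 600/1000 scales up to 600×800, but with bounds 600/700 it is capped at 525×700.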
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15204

Reviewed By: llyfacebook

Differential Revision: D13467225

Pulled By: sf-wind

fbshipit-source-id: 7da34a72d43a9603cd7ab953f5821c1222d0178f
2018-12-14 17:37:21 -08:00
717496e6c1 Supply static shape info to Reshape when doing onnxGetCompatibility (#15242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15242

Newer version ONNX Reshape gets shape info from a tensor. Hence for static backend, we need to provide this info to it when doing `onnxGetCompatibility` too.

Reviewed By: jackm321

Differential Revision: D13471959

fbshipit-source-id: 8a58e28edd900b6ad54a1dbd63ff2579fbe0e820
2018-12-14 16:37:39 -08:00
763b9954f3 FP16MomentumSGDUpdate Op fix and enable for ROCm (#15150)
Summary:
1. Fix a bug in FP16MomentumSGDUpdate operator
2. Enable operator for ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15150

Differential Revision: D13473145

Pulled By: bddppq

fbshipit-source-id: 4c5c5f30cb9bba658e3639dbe193fa08a304d306
2018-12-14 16:33:45 -08:00
e596d23137 Start unittesting our main observer (#15191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15191

OSS:

Just splitting out basic flags from a unit test, so I can extend them in another test where I need to add additional flags.

Reviewed By: yinghai

Differential Revision: D13159184

fbshipit-source-id: 9823e792cf0ed8d0379235c44564862b7d784845
2018-12-14 16:24:38 -08:00
34f1f2208b Build c10 HIP test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15233

Reviewed By: ezyang

Differential Revision: D13471002

Pulled By: bddppq

fbshipit-source-id: b42c3bc2b9db672ce50a52eb700cc6ed13d3535f
2018-12-14 15:36:38 -08:00
5e09c7bc80 record unit time in torch.cuda.event (#15221)
Summary: Record unit of time for torch.cuda.Event's elapsed_time

Differential Revision: D13467646

Pulled By: zou3519

fbshipit-source-id: 4f1f4ef5fa4bc5a1b4775dfcec6ab155e5bf8d6e
2018-12-14 15:29:06 -08:00
054456eb93 Preserve module hierarchy on traced modules (#15101)
Summary:
We need this, for example, to properly call `_unpack` when we have a traced module in the hierarchy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15101

Differential Revision: D13468467

Pulled By: jamesr66a

fbshipit-source-id: c2b6740b12cde6e23395d12e42d4fc2c4c7ca3f2
2018-12-14 15:07:51 -08:00
60f02b87be fix an issue where two rules build the same .py files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15230

Differential Revision: D13471625

Pulled By: zdevito

fbshipit-source-id: a982413a308c7a9bb5b6a82fe96fd3de44f555aa
2018-12-14 14:52:52 -08:00
bd368b867d Do not ifdef __launch_bounds__ out for ROCm. (#15228)
Summary:
The compiler understands `__launch_bounds__` and profits from knowing it by not using too many VGPRs, as it otherwise assumes the default workgroup size of 256.

Fixes a problem in bringup of ROCm 2.0 on gfx906.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15228

Differential Revision: D13470950

Pulled By: bddppq

fbshipit-source-id: f9aa44c7c95299a099c0ea9317b9044cc056acc5
2018-12-14 14:47:32 -08:00
dcd1685282 Revert D13440858: [pytorch][PR] Use a pool of per-thread cudnn handles for each device, updated
Differential Revision:
D13440858

Original commit changeset: 1c6af5c53538

fbshipit-source-id: fda42ea75000d4a4e9c4a8eeaaa5518f7ad9c298
2018-12-14 14:35:01 -08:00
9f1d8f2eeb enabled tests in test_nn, test_cuda and test_sparse (#15232)
Summary:
tests work on ROCm 1.9.2 as present on CI (fp16 bringup, hipMemset and sparse improvements)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15232

Differential Revision: D13470991

Pulled By: bddppq

fbshipit-source-id: 45acc4f9ea5baaaf7672b86eb022948055779925
2018-12-14 14:27:57 -08:00
e9fb4d1f11 Fix jit doc codeblocks and tables (#15227)
Summary:
Some of the codeblocks were showing up as normal text and the "unsupported modules" table was formatted incorrectly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15227

Differential Revision: D13468847

Pulled By: driazati

fbshipit-source-id: eb7375710d4f6eca1d0f44dfc43c7c506300cb1e
2018-12-14 14:27:56 -08:00
b316e44a46 Remove __forceinline__ hipification step. (#15229)
Summary:
The HIP definition now correctly contains the inline attribute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15229

Differential Revision: D13470962

Pulled By: bddppq

fbshipit-source-id: 34f8361bda5f3dce20a2eeb530c3a25d1b1bdd06
2018-12-14 14:24:05 -08:00
7a61306031 Enable all clang-tidy performance checks (#15198)
Summary:
This PR adds the final set of clang-tidy checks we should add for our codebase: a last set of performance-related checks. Most fixes here are around changing `auto` to `const auto&` in a few places where unnecessary copies were made, and adding `reserve()` calls before loops doing repeated `push_back()`. Also a few cases of calling `std::string::find` with a single-character string literal instead of a single char, which uses a less efficient string search algorithm meant for searching larger substrings.

![image](https://user-images.githubusercontent.com/6429851/49978940-adc1a780-ff01-11e8-99da-a4e431361f07.png)

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15198

Differential Revision: D13468797

Pulled By: goldsborough

fbshipit-source-id: 2bed1ea1c7c162b7f3e0e1026f17125e88c4d5b2
2018-12-14 13:32:47 -08:00
fc2856e9aa Refactor caffe2 CI scripts and add benchmark scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14575

Differential Revision: D13468049

Pulled By: bddppq

fbshipit-source-id: e73bc8742c8a03f498816eee8a72b06a3e19fe48
2018-12-14 13:19:33 -08:00
4327a2d70a Better tests/support for Python/C++ inter-op (#15193)
Summary:
Methods like `module.named_modules()` return a container of `shared_ptr<nn::Module>`. Currently the `nn::Module` base class does not have Python bindings. This PR fixes this, and adds more unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15193

Differential Revision: D13458713

Pulled By: goldsborough

fbshipit-source-id: 4091fe1b96a1be8db14c6a4307fbacc2b41ff6fe
2018-12-14 08:42:10 -08:00
fb8487d708 Tensor construction codemod(ResizeLike) - 3/7 (#15122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15122

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13419643

fbshipit-source-id: 65b5a037b94d458b944d51f790ba2829db1fb530
2018-12-14 02:08:37 -08:00
78bf1a9065 Revert D13407930: [pytorch][PR] Support torch.tensor in script
Differential Revision:
D13407930

Original commit changeset: d17f1195a221

fbshipit-source-id: f4458872c48ec4a2c9983b21ed90bcdc0ae665b7
2018-12-13 22:13:07 -08:00
331c4b5b4d caffe2 - make DataRandomFiller usable in unit tests (#15027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15027

- Make DataRandomFiller able to accept input_dims and input_types for only non-intermediate inputs. Add a helper to fill inputs directly into a workspace

Reviewed By: highker

Differential Revision: D13408345

fbshipit-source-id: 5fc54d33da12e3f0a200e79380d4c695b0339b17
2018-12-13 20:45:52 -08:00
66b26806fc caffe2 - easy - utils to set argument of operator (#15022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15022

Add setArgument testing utils to make it easy to set argument for an operator

Reviewed By: yinghai

Differential Revision: D13405225

fbshipit-source-id: b5c1859c6819d53c1a44718e2868e3137067df36
2018-12-13 20:45:50 -08:00
9726651d1e caffe2 - easy - test utils for tensor assertion (#15020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15020

Add test utils for assertion of a tensor (sizes and values)

Reviewed By: salexspb

Differential Revision: D13401146

fbshipit-source-id: bc385df074043e03ea884940b5631b96de4a607e
2018-12-13 20:45:48 -08:00
d0b4ae835d caffe2 - easy - test utils to compare tensors in two workspaces (#15181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15181

Add test utils to compare tensors in two workspaces

Reviewed By: ZolotukhinM

Differential Revision: D13387212

fbshipit-source-id: e19d932a1ecc696bd0a08ea14d9a7485cce67bb2
2018-12-13 20:45:46 -08:00
a0f68646ac caffe2 - easy - test utils to fill tensors (#15019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15019

Put some utils to fill tensors to test_utils

Reviewed By: salexspb

Differential Revision: D13386691

fbshipit-source-id: 51d891aad1ca12dc5133c0352df65b8db4f96edb
2018-12-13 20:45:44 -08:00
8fedde5530 caffe2 - easy - test utils to create operator (#15180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15180

Test utils to create an operator

On top of D13370461

Reviewed By: ZolotukhinM

Differential Revision: D13382773

fbshipit-source-id: a88040ed5a60f31d3e73f1f958219cd7338dc52e
2018-12-13 20:45:42 -08:00
eb6fec3652 caffe2 - easy - Create test_util to make it easier to write C++ unit tests (#15014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15014

Currently, many simple operations such as comparing tensors, creating tensors, fetching tensors, and so on are too verbose and take effort to write correctly in unit tests.
Easy-to-use utilities are important for productivity when writing unit tests. While caffe2 Python unit tests are relatively easy to write at the moment, the C++ side is lacking.
In this change I create a test_util, starting with assertsTensorEquals, getTensor, and createTensor; we can keep adding easy-to-use utilities there.

Reviewed By: salexspb

Differential Revision: D13370461

fbshipit-source-id: bee467a127e1d032ef19482f98aa5c776cf508c0
2018-12-13 20:45:41 -08:00
81644ed9ab Fix derivative for mvlgamma (#15049)
Summary:
Fixes #15015.

Added tests to validate derivative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15049

Reviewed By: soumith

Differential Revision: D13434117

Pulled By: zou3519

fbshipit-source-id: 4a292600af9eb08b67c0f8b5482e9512aac95e72
2018-12-13 20:32:57 -08:00
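The derivative being fixed follows the identity d/dx log Γ_p(x) = Σ_{i=0}^{p-1} ψ(x - i/2). A self-contained numerical check of that identity, as a pure-Python sketch (`mvlgamma`, `digamma`, and `finite_diff` are illustrative helpers, not PyTorch's implementation):

```python
import math

def mvlgamma(x, p):
    # log of the multivariate gamma:
    # log G_p(x) = p(p-1)/4 * log(pi) + sum_{i=0}^{p-1} lgamma(x - i/2)
    return p * (p - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(x - i / 2.0) for i in range(p))

def digamma(x):
    # psi(x) via the recurrence psi(x) = psi(x + 1) - 1/x plus an
    # asymptotic expansion; accurate to roughly 1e-9, enough for this check
    r = 0.0
    while x < 10.0:
        r -= 1.0 / x
        x += 1.0
    return (r + math.log(x) - 1.0 / (2.0 * x)
            - 1.0 / (12.0 * x ** 2) + 1.0 / (120.0 * x ** 4))

def finite_diff(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# d/dx mvlgamma_p(x) should equal sum_i digamma(x - i/2)
x, p = 3.5, 3
numeric = finite_diff(lambda t: mvlgamma(t, p), x)
analytic = sum(digamma(x - i / 2.0) for i in range(p))
assert abs(numeric - analytic) < 1e-6
```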
0b9b965c1a Fix numpy conversion for int8 tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15194

Differential Revision: D13459270

Pulled By: li-roy

fbshipit-source-id: 605534add263860a3ad9a7fa70888301ee0bf8e4
2018-12-13 19:38:09 -08:00
fb140c7828 add erf and erfc to fuser/autodiff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15139

Differential Revision: D13455690

Pulled By: soumith

fbshipit-source-id: b06e5f5d362869c2e5fa11a52f9450d77c30d4cb
2018-12-13 19:17:40 -08:00
bb8ee2de0f Move TensorImpl::CopyFrom to caffe2::Tensor (2/2) (#14858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14858

This diff doesn't change logic but just takes the existing code and moves it to caffe2::Tensor

Reviewed By: ezyang

Differential Revision: D13365817

fbshipit-source-id: bc73b27a793602cb14200dcdf357aa63233da43c
2018-12-13 18:41:24 -08:00
070f33f154 Move TensorImpl::CopyFrom to caffe2::Tensor (1/2) (#14656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14656

This diff doesn't move it yet, but prepares it to be moved, i.e. removes all access to class internals.

dzhulgakov: Please comment on whether you think it still makes sense to land this, even though it's not blocking anymore since we're going to move at::CopyBytes anyhow.

ezyang: There's some changes in the implementation, especially handling undefined dest tensors. Please review carefully.

Reviewed By: ezyang

Differential Revision: D13287688

fbshipit-source-id: 17800ca8a79ab1633f23be58d96f99a160d8ed24
2018-12-13 18:41:23 -08:00
dc72a5e02c For rotated proposals, replace cv::rotatedRectangleIntersection with a correct version that doesn't have underflow problem (#15113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15113

cv::rotatedRectangleIntersection has a known float underflow bug that would cause failure in ```CV_Assert(intersection.size() <= 8)```

For rotated proposals, replace cv::rotatedRectangleIntersection with a correct version that doesn't have underflow problem.

Otherwise, when ```USE_CPP_GENERATE_PROPOSALS = true```, the training would fail.

Reviewed By: viswanathgs

Differential Revision: D13429770

fbshipit-source-id: 5e95d059f3c668f14059a0a83e8e53d8554cdb99
2018-12-13 18:13:46 -08:00
aecab53778 Support torch.tensor in script (#14913)
Summary:
Adding support for torch.tensor in script.

The input list is typed as t[], because it can be arbitrarily nested. I added a compile-time check that the inner type of the list is a bool, float, or int.

Also adds specialization for boolean lists, which already existed at the ivalue level but had not been added to the compiler yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14913

Differential Revision: D13407930

Pulled By: eellison

fbshipit-source-id: d17f1195a22149d5b0d08d76c89a7fab8444f7c5
2018-12-13 17:38:38 -08:00
bbbfda72a0 Remove TensorImpl -> Type dependency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15086

Reviewed By: dzhulgakov

Differential Revision: D13425628

fbshipit-source-id: 08a8a774d17b071367454e027012a02f96d177d4
2018-12-13 17:10:59 -08:00
1e9c384afb Enable performance-unnecessary-value-param in .clang-tidy (#15026)
Summary:
This PR fixes around 250 places in the codebase where we were making unnecessary copies of objects (some large, some small).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15026

Differential Revision: D13458784

Pulled By: goldsborough

fbshipit-source-id: be5148b2ce09493588d70952e6f6d6ff5ec5199b
2018-12-13 16:15:35 -08:00
bdfff2f8c2 Add missing caffe2_hip extension in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15189

Reviewed By: orionr

Differential Revision: D13457644

Pulled By: bddppq

fbshipit-source-id: c2363e9b8fd21709b62777e5b2199f01ec1c65f8
2018-12-13 15:59:51 -08:00
de0784510d Remove disabled_features in hipify
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15098

Reviewed By: ezyang

Differential Revision: D13453762

Pulled By: bddppq

fbshipit-source-id: e177042c78f5bf393163d660c25b80285353853d
2018-12-13 15:43:57 -08:00
855d9e1f19 Run ONNX cuda backend test cases via ROCm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15069

Differential Revision: D13427757

Pulled By: bddppq

fbshipit-source-id: ba0273d75986cd5b146f7041a83c63ddf9c6c0cf
2018-12-13 15:10:00 -08:00
6911ce19d7 Remove _finfo; replace _finfo usage with torch.finfo (#15165)
Summary:
This PR removes the usage of _finfo defined in torch.distributions.utils and changes the call sites
to use torch.finfo instead

Differential Revision: D13451936

Pulled By: soumith

fbshipit-source-id: 6dbda3a6179d9407bc3396bf1a2baf3e85bc4cf2
2018-12-13 14:30:27 -08:00
f1f7c16c90 Tensor construction codemod(ResizeLike) - 4/7 (#15088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15088

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419682

fbshipit-source-id: 3e59403bc1c0e71e5cb66df932ed0c6a0a72e643
2018-12-13 13:39:56 -08:00
cbd1c519c4 Replace non-printable-ascii characters in ProtoDebugString (#14918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14918

When ProtoBuf-Lite is in use, ProtoDebugString just calls SerializeAsString.
This produces binary output, which is not a very suitable "debug" string.
Specifically, we've observed it causing problems when calling code tries to
add the debug string to a Java exception message (which requires valid UTF-8).
Now, we replace all non-ASCII bytes with "?".

This is not a very fast implementation, but generating debug strings shouldn't
be a performance-sensitive operation in any application.

Reviewed By: dzhulgakov

Differential Revision: D13385540

fbshipit-source-id: 8868172baf20efaf53fecf7d666a6980f59b64f5
2018-12-13 13:16:24 -08:00
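The byte-replacement idea above can be sketched in a few lines of Python (`debug_safe` is an illustrative stand-in, not the C++ implementation):

```python
def debug_safe(data: bytes) -> str:
    # Replace every non-ASCII byte with '?' so the result is always
    # valid UTF-8 (e.g. safe to embed in a Java exception message).
    return bytes(b if b < 0x80 else ord("?") for b in data).decode("ascii")

assert debug_safe(b"tensor\xc3\xa9\xff") == "tensor???"
```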
994f72ee3e Tensor construction codemod(ResizeLike) - 6/7 (#15137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15137

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419736

fbshipit-source-id: f4ad7b9582c2f809258169b7fef9adbca7063d99
2018-12-13 12:47:33 -08:00
43c0b50c2e Tensor construction codemod(ResizeLike) - 5/7 (#15084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15084

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419711

fbshipit-source-id: dd2b740c3f13d8087085bafc5571aaf908d1af42
2018-12-13 12:42:52 -08:00
86fbf17ba6 Use std::vector instead of alloca to work around hcc crash
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15175

Differential Revision: D13453708

Pulled By: bddppq

fbshipit-source-id: f8c147ae9f679e395fee9d4c73ebcca052c9a752
2018-12-13 12:34:36 -08:00
f61612206c Fix old tensor OutputTensorCopyFrom usage in ImageInput operator (#15094)
Summary:
cc jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15094

Differential Revision: D13451898

Pulled By: bddppq

fbshipit-source-id: 27906be62fb88aaa13c257441a2e35a285b445ee
2018-12-13 11:48:19 -08:00
e5bd6fe86d Kill non-forward, non-backward functions generated from nn.yaml (#15127)
Summary:
Updating binding to legacy functions.
Remove unused declarations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15127

Differential Revision: D13433405

Pulled By: VitalyFedyunin

fbshipit-source-id: 58544d38affd20818742338c9eb789d9d14ccbaa
2018-12-13 11:34:50 -08:00
bc80deea1b Delete defunct USE_SIMPLE_BASE_CTOR_DTOR (#15144)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15144

Differential Revision: D13440872

Pulled By: ezyang

fbshipit-source-id: 2b1d73fac0c63729ba01d8f129642334ae9d9cf3
2018-12-13 11:20:37 -08:00
e51092a2b8 Fix typo (#15045)
Summary:
Simple typo fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15045

Reviewed By: dzhulgakov

Differential Revision: D13413509

Pulled By: houseroad

fbshipit-source-id: be66700c30d038368b1433232a4e3fd9299c83d6
2018-12-13 11:13:19 -08:00
ca4358c8f5 Use a pool of per-thread cudnn handles for each device, updated (#15080)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/14861, hopefully addressing ezyang's comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15080

Differential Revision: D13440858

Pulled By: ezyang

fbshipit-source-id: 1c6af5c53538b81c6b92cf1dda231ed333f28035
2018-12-13 10:24:06 -08:00
214f46faf5 Fix bincount for non-contiguous inputs on CPU (#15109)
Summary:
Fixes #15058.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15109

Differential Revision: D13447448

Pulled By: soumith

fbshipit-source-id: 56e8d42934538fb00465105a2c5ccfeb7c18a651
2018-12-13 09:44:20 -08:00
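The non-contiguous failure mode above is the classic one of walking a buffer as if its elements were adjacent in memory. A layout-agnostic reference, as a pure-Python sketch (`bincount_strided` is a hypothetical helper modeling a strided view, not the ATen kernel):

```python
def bincount_strided(buf, offset, stride, length, minlength=0):
    # Walk the view via its stride instead of assuming the elements
    # are contiguous in the underlying buffer.
    counts = [0] * minlength
    for i in range(length):
        v = buf[offset + i * stride]
        if v >= len(counts):
            counts.extend([0] * (v - len(counts) + 1))
        counts[v] += 1
    return counts

# A "column" of a row-major 3x4 buffer: elements at stride 4
buf = [0, 1, 2, 3,
       1, 1, 2, 0,
       2, 1, 0, 0]
assert bincount_strided(buf, offset=1, stride=4, length=3) == [0, 3]
```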
bf7a2b9125 Unify SparseTensorImpl::size_ and TensorImpl::sizes_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15130

Differential Revision: D13434981

Pulled By: VitalyFedyunin

fbshipit-source-id: 98bd4d66834a3c3d2ea577adb0c8413852da095d
2018-12-13 08:55:35 -08:00
0bf1383f0a Python <-> C++ Frontend inter-op (#13481)
Summary:
This PR enables C++ frontend modules to be bound into Python and added as submodules of Python modules. For this, I added lots of pybind11 bindings for the `torch::nn::Module` class, and modified the `torch.nn.Module` class in Python to have a new Metaclass that makes `isinstance(m, torch.nn.Module)` return true when `m` is a C++ frontend module. The methods and fields of C++ modules are bound in such a way that they work seamlessly as submodules of Python modules for most operations (one exception I know of: calling `.to()` ends up calling `.apply()` on each submodule with a Python lambda, which cannot be used in C++ -- this may require small changes on Python side).

I've added quite a bunch of tests to verify the bindings and equality with Python. I think I should also try out adding a C++ module as part of some large PyTorch module, like a WLM or something, and see if everything works smoothly.

The next step for inter-op across our system is ScriptModule <-> C++ Frontend Module inter-op. I think this will then also allow using C++ frontend modules from TorchScript.

apaszke zdevito

CC dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13481

Differential Revision: D12981996

Pulled By: goldsborough

fbshipit-source-id: 147370d3596ebb0e94c82cec92993a148fee50a7
2018-12-13 08:04:02 -08:00
b14d6d730a Reuse KernelSpec for FusionGroups with equivalent graphs (#14541)
Summary:
Before this PR, loop unrolling + the graph fuser was creating multiple
FusionGroups with the same bodies (with different variable names) for
JIT LSTMs. Each FusionGroup got registered to a separate fusion key;
each key resulted in a different compilation for the same
specializations.

This PR makes it so that when registering FusionGroups with the fusion
compiler, the compiler first checks the KernelSpec cache to see if the
FusionGroup's graph exists already. If it does, then return the
corresponding KernelSpec's key to share compiled kernels.

In addition, graphs in the KernelSpec cache are canonicalized before
being cached. I added a flag to the canonicalize pass to remove unique
names of values.

This shortens the compile time for a JIT LSTM (seq_len of 100, loop
unroll factor of 8) from 5.3s to 2.3s. Most of this compile time is
running the graph fuser and/or fusion compiler; while this PR
makes it so that there is only one unique kernel in the forward pass,
there are a lot of different kernels (6) in the backward pass
(after loop unrolling) that should be investigated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14541

Differential Revision: D13324487

Pulled By: zou3519

fbshipit-source-id: b841d82ed35a959b5cfc72db033bf5a7b42cc4fb
2018-12-13 07:54:35 -08:00
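The caching scheme above, canonicalize away value names and then key the spec cache on the canonical text, can be sketched as follows (`KernelCache` and the `%name` value syntax are simplifications for illustration, not the JIT's actual data structures):

```python
import re

class KernelCache:
    """Dedupe graphs that differ only in value names by canonicalizing
    the names to sequential ids before using the text as a cache key."""

    def __init__(self):
        self.specs = {}

    def canonicalize(self, graph_text):
        names = {}
        def rename(m):
            if m.group(0) not in names:
                names[m.group(0)] = "%v{}".format(len(names))
            return names[m.group(0)]
        return re.sub(r"%\w+", rename, graph_text)

    def register(self, graph_text):
        key = self.canonicalize(graph_text)
        if key not in self.specs:
            # "compilation" happens at most once per unique graph
            self.specs[key] = len(self.specs)
        return self.specs[key]

cache = KernelCache()
a = cache.register("%x = add(%a, %b)")
b = cache.register("%y = add(%p, %q)")  # same body, different names
assert a == b and len(cache.specs) == 1
```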
aa022313cb Removes THCNumerics usages in RNN.cu (#15085)
Summary:
We don't need THCNumerics here since at::Half can be implicitly converted to float and the cuda math dispatches are handled by `/usr/local/cuda/include/crt/math_functions.hpp` and `cmath`. ATen should be free of THCNumerics after this; when porting kernels from THC, one should not use THCNumerics.

Should close: https://github.com/pytorch/pytorch/issues/11878
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15085

Differential Revision: D13447558

Pulled By: soumith

fbshipit-source-id: 4ff5cbf838edcd01e2d1397e4d7f4f920e9e9fc3
2018-12-13 00:24:17 -08:00
1e0eab5df8 minimize header file includes from _avx2.cc (#14950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14950

Minimize the number of headers included from _avx2.cc files to avoid accidental compilation of functions defined in header files reused by other translation units, which can lead to illegal-instruction errors.

Reviewed By: dskhudia

Differential Revision: D13394483

fbshipit-source-id: 67149a6fb51f7f047e745bfe395cb6dd4ae7c1ae
2018-12-13 00:18:11 -08:00
4b97a46421 Disable strict-overflow flag to avoid compilation error (#14977)
Summary:
Disable strict-overflow flag to avoid compilation error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14977

Differential Revision: D13447577

Pulled By: soumith

fbshipit-source-id: 1957bd5aa3c7b79219da3dd53560464977c89526
2018-12-12 22:41:33 -08:00
1e93317b99 Remove "early-release beta" disclaimer from README (#15136)
Summary:
Now that PyTorch 1.0 is out, this should be updated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15136

Differential Revision: D13447377

Pulled By: soumith

fbshipit-source-id: bd4e662c53d0699f25d4d90c1b4c1e182b4427c2
2018-12-12 22:14:14 -08:00
fabd23cb2d support casting to string (#15110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15110

support casting to string on CPU

Reviewed By: intermilan

Differential Revision: D13429381

fbshipit-source-id: b737a1ba1237b10f692d5c42b42a544b94ba9fd1
2018-12-12 21:33:58 -08:00
1717ea1da0 Implementation of ChannelShuffle Op for MKLDNN (#15106)
Summary:
The speed-up of a single operation is up to 3X.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15106

Differential Revision: D13429596

Pulled By: bddppq

fbshipit-source-id: f8d987cafeac9bef9c3daf7e43ede8c6a4ee2ce5
2018-12-12 20:25:12 -08:00
895cb8fcea Fix resize for edge case tensors (#14874)
Summary:
Certain tensor shapes failed when being resized. This pull request addresses the bug found in #13404.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14874

Differential Revision: D13429788

Pulled By: soumith

fbshipit-source-id: 8aa6451dbadce46d6d1c47a01cb26e6559bcfc8c
2018-12-12 19:56:23 -08:00
78a77667dd Autoformat build_variables.py (#15152)
Summary:
autoformat `tools/build_variables.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15152

Differential Revision: D13445343

Pulled By: goldsborough

fbshipit-source-id: fd63588de114cb92deda03fa1a0b36f5f9082b2f
2018-12-12 19:30:17 -08:00
fab78827d6 don't compile dnnlowp.cc in avx2 option (#15147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15147

Forgot to take out dnnlowp.cc from avx2 list in a previous diff.

Reviewed By: dskhudia

Differential Revision: D13440686

fbshipit-source-id: 9ada98b6e885c7d5f22c91a735ff60304480b4cb
2018-12-12 18:57:09 -08:00
d8260239a0 docs: minor spelling tweaks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15148

Differential Revision: D13443708

Pulled By: suo

fbshipit-source-id: 5e3ec0afd3416ab8ce207f2d04105c49e1c04611
2018-12-12 18:17:14 -08:00
2211a283d2 Export defs.bzl to open source for pytorch (#15132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15132

Pull Request resolved: https://github.com/facebook/fbshipit/pull/64

Reviewed By: dzhulgakov

Differential Revision: D13424093

fbshipit-source-id: bbebef964b9f3aef8f59cd394eca068680c36b5a
2018-12-12 17:40:29 -08:00
107c9ef518 Add back c2 string_utils include header to benchmark_helper
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15143

Differential Revision: D13439694

fbshipit-source-id: 78698b66d52a0178118cbf3e79a7a5ad1763d47b
2018-12-12 16:38:00 -08:00
6610ace28b use ROCm 1.9.2 fp16 capabilities in rocBLAS and MIOpen interfaces (#14994)
Summary:
* relax MIOpen if statement to allow fp16/fp32 mixed precision training now supported by ROCm 1.9.2
* use gemm_ex API of rocBLAS in ROCm 1.9.2 instead of the previous hgemm API
* with this: enable all but one half test in test_nn

While there, also fix:
* a group convolution issue with MIOpen, detected while working on this, pertaining to properly initializing MIOpen on multi-GPU systems
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14994

Differential Revision: D13439869

Pulled By: bddppq

fbshipit-source-id: 75e4eb51a59488882e64b5eabdc30555b25be25e
2018-12-12 16:16:47 -08:00
f34d827007 Optimize CPU GenerateProposals op by lazily generating anchors (3-5x faster) (#15103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15103

There are two main optimizations in this diff:
1. Previously, we generated all anchors for every spatial grid location first, and then applied NMS to pick 2000 anchors according to RPN_PRE_NMS_TOP_N. First sorting the scores and picking the top 2000, and only then lazily generating the corresponding anchors, is much faster.
2. Transposing bbox_deltas from (num_anchors * 4, H, W) to
(H, W, num_anchors * 4) was also quite slow - taking about 20ms in the RRPN
case when there are lots of anchors, while it's negligible for the RPN case
(about 0.1 ms). Instead of transposing, performing all operations in the
(num_anchors, H, W) format speeds things up.

For regular RPN scenario, this gives 5x speedup from 5.84ms to 1.18ms a case
with 35 anchors over a 600x600 image.

For rotated boxes with 245 anchors, the runtime down from 80ms to 27ms per
iter.

Reviewed By: newstzpz

Differential Revision: D13428688

fbshipit-source-id: 6006b332925e01a7c9433ded2ff5dc9e6d96f7d3
2018-12-12 15:53:52 -08:00
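The first optimization above can be sketched in pure Python: rank the flat scores first, then materialize only the winning anchors (`top_proposals` and `make_anchor` are hypothetical stand-ins for the operator's internals):

```python
import heapq

def top_proposals(scores, k, make_anchor):
    # Rank the flat scores first, then materialize only the k winning
    # anchors instead of generating one per grid position up front.
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(scores[i], make_anchor(i)) for i in top]

scores = [0.1, 0.9, 0.4, 0.7]
made = []
def make_anchor(i):
    made.append(i)          # track how many anchors we actually build
    return ("anchor", i)

result = top_proposals(scores, 2, make_anchor)
assert [s for s, _ in result] == [0.9, 0.7]
assert sorted(made) == [1, 3]  # only 2 of the 4 anchors materialized
```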
90f9e8103c Implement torch.tril_indices and torch.triu_indices (#12653) (#14904)
Summary:
This is an optimized implementation that does the following:

1. created an empty Tensor of correct size.
2. fill the Tensor with correct values.

The following three designs to fill in the Tensor result in roughly the same performance. Hence, the 2nd option is taken for simpler code, and to return contiguous tensors.

1. Sequential: fill row coordinates first, then columns. This results in two for-loop and more arithmetic operations.
2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration.
3. Transpose: create a n X 2 Tensor, fill the Tensor sequentially, and then transpose it.

<img width="352" alt="screen shot 2018-12-10 at 3 54 39 pm" src="https://user-images.githubusercontent.com/16999635/49769172-07bd3580-fc94-11e8-8164-41839185e9f9.png">

NOTE:

This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following:

```python
x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)
x[i]  # need to first convert the 2D tensor into a tuple of two 1D tensors.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904

Reviewed By: zou3519

Differential Revision: D13433027

Pulled By: mrshenli

fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a
2018-12-12 15:40:14 -08:00
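The "interleaved" fill described above (option 2) can be sketched in pure Python; `tril_indices` here is an illustrative reference returning nested lists, not the ATen kernel:

```python
def tril_indices(rows, cols, offset=0):
    # Interleaved fill: emit each (row, col) coordinate pair in
    # sequence, producing the 2 x N result directly.
    out = [[], []]
    for r in range(rows):
        for c in range(min(cols, r + offset + 1)):
            out[0].append(r)
            out[1].append(c)
    return out

assert tril_indices(3, 3) == [[0, 1, 1, 2, 2, 2], [0, 0, 1, 0, 1, 2]]
```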
342e62f1e3 Minor documentation mistake (#15068)
Summary:
keepdim is an optional parameter for torch.max()
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15068

Differential Revision: D13437745

Pulled By: zou3519

fbshipit-source-id: b5198c7d4ae17758cd136f6e5aecc6cb5838f174
2018-12-12 15:24:26 -08:00
5837320b70 Add script standard library documentation + cleanup (#14912)
Summary:
Documents what is supported in the script standard library.

* Adds `my_script_module._get_method('forward').schema()` method to get function schema from a `ScriptModule`
* Removes `torch.nn.functional` from the list of builtins. The only functions not supported are `nn.functional.fold` and `nn.functional.unfold`, but those currently just dispatch to their corresponding aten ops, so from a user's perspective it looks like they work.
* Allow printing of `IValue::Device` by getting its string representation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14912

Differential Revision: D13385928

Pulled By: driazati

fbshipit-source-id: e391691b2f87dba6e13be05d4aa3ed2f004e31da
2018-12-12 12:30:13 -08:00
64b3364209 Move adaptive avg pooling 2d to ATen native (#14714)
Summary:
adaptive_avg_pool1d, adaptive_avg_pool2d, and adaptive_avgpool3d are neural network functions that are currently implemented in our legacy THNN (CPU) / THCUNN (CUDA) libraries.  It is generally better if these live in our new library ATen, since it is more feature complete and reduces cognitive overhead.

This change moves currently to adaptive_avg_pool1d and adaptive_avg_pool2d to ATen.

timed relevant cpu tests with this change:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.s.s.s.s.s.s.s...
----------------------------------------------------------------------
Ran 17 tests in 6.273s

OK (skipped=7)

real	0m7.164s
user	3m1.289s
sys	0m0.905s
```

compared to master:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.s.s.s.s.s.s.s...
----------------------------------------------------------------------
Ran 17 tests in 7.232s

OK (skipped=7)

real	0m8.065s
user	3m34.714s
sys	0m2.440s
```

also timed relevant cuda tests with this change:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.................
----------------------------------------------------------------------
Ran 17 tests in 21.049s

OK

real	0m24.106s
user	0m20.890s
sys	0m4.026s
```

compared to master
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.................
----------------------------------------------------------------------
Ran 17 tests in 23.021s

OK

real	0m27.095s
user	0m20.121s
sys	0m3.668s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14714

Differential Revision: D13384084

Pulled By: xnder

fbshipit-source-id: 344442103ccbbda72d3c010d2feea00e9985d226
2018-12-12 12:25:22 -08:00
63e77ab6c4 Move numa.{h, cc} to c10/util (#15024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15024

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14393

att

Reviewed By: dzhulgakov

Differential Revision: D13380559

fbshipit-source-id: abc3fc7321cf37323f756dfd614c7b41978734e4
2018-12-12 12:21:10 -08:00
b34ab435ef Stop erroneously running aten::warn (#15124)
Summary:
Fixes #15119. Before this PR, we were propagating constants through
aten::warn AND running it as a part of shape analysis.
This caused aten::warn to be run regardless of whether it was
supposed to be run dynamically. This PR adds an exclusion for aten::warn
in constant propagation and shape analysis, similar to that of prim::RaiseException.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15124

Differential Revision: D13432815

Pulled By: zou3519

fbshipit-source-id: 15ab533ce2accb2da3fd4e569070c7979ce61708
2018-12-12 11:35:23 -08:00
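The exclusion described above can be sketched as a toy constant-folding pass over (name, args) tuples; the op names and the `SIDE_EFFECTS` skip list are illustrative, mirroring the aten::warn / prim::RaiseException exclusions rather than the JIT's real representation:

```python
# Ops with side effects must never be evaluated at fold time, even if
# all of their arguments are constants.
SIDE_EFFECTS = {"warn", "raise_exception"}

def fold(ops, evaluate):
    folded = []
    for name, args in ops:
        if name not in SIDE_EFFECTS and all(isinstance(a, int) for a in args):
            folded.append(("const", (evaluate(name, args),)))
        else:
            folded.append((name, args))  # left for runtime
    return folded

evals = []
def evaluate(name, args):
    evals.append(name)
    return sum(args)

ops = [("add", (1, 2)), ("warn", ("msg",)), ("add", ("x", 1))]
out = fold(ops, evaluate)
assert out[0] == ("const", (3,))
assert out[1] == ("warn", ("msg",))  # never evaluated at fold time
assert evals == ["add"]
```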
2d485ffb17 Move CUDAGuard, CUDAStream and CUDAGuardImpl to c10/cuda (#14248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14248

This diff also introduces a horrifying hack to override CUDA's DeviceGuardImpl
with a HIPGuardImplMasqueradingAsCUDA, to accommodate PyTorch's current
behavior of pretending CUDA is HIP when you build with ROCm enabled.

Reviewed By: bddppq

Differential Revision: D13145293

fbshipit-source-id: ee0e207b6fd132f0d435512957424a002d588f02
2018-12-12 11:24:26 -08:00
9943cf2378 Kill Type.storage. (#15075)
Summary:
It's not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15075

Reviewed By: ezyang

Differential Revision: D13422487

Pulled By: gchanan

fbshipit-source-id: 272aa0a10e96f3ffb97d571490b517f972b9dcf7
2018-12-12 10:57:54 -08:00
9d2955c39c fix infinite loop when get_max_threads is nonzero but num_threads is 1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15114

Differential Revision: D13431891

Pulled By: umanwizard

fbshipit-source-id: f968b8e50cf776c346d4a28d72b12e7856c95839
2018-12-12 10:04:18 -08:00
68ad9ae5be Ensure there aren't variables in checked_tensor_unwrap, checked_tenso… (#15105)
Summary:
…r_list_unwrap.

These functions use unsafeGetTensorImpl(), which doesn't work with Variables (in a silent way that may blow up later).
So let's do early checking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15105

Reviewed By: ezyang

Differential Revision: D13429149

Pulled By: gchanan

fbshipit-source-id: b85f6f5b7cdb9a6dd0c40205b924c840a3920ba0
2018-12-12 09:58:03 -08:00
0ad39ec5c1 Add better support for bools in the graph fuser (#15057)
Summary:
Fixes #15038.

aten::_cast_Float(tensor, non_blocking) support was added in #14336.
Its second argument is a bool, but because we don't support generating values
of type bool in the fuser codegen, the codegen errored out.

aten::_cast_Float in the fuser never actually uses its non_blocking
argument, so another way to fix this would be to have a special op for a
fused cast but I thought that we might have fusible ops that do take
bool arguments in the future so this would be good to have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15057

Differential Revision: D13432091

Pulled By: zou3519

fbshipit-source-id: 455fe574f5f080aca9a112e346b841a2534a8dc3
2018-12-12 09:39:44 -08:00
f36a84b71b fix some tests that I accidentally disabled (#15077)
Summary:
While moving these scenarios into `_test_dim_ops` I accidentally left an empty loop in the actual tests, causing them to do nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15077

Differential Revision: D13428759

Pulled By: umanwizard

fbshipit-source-id: 08f53068981d9192c1408878b168e9053f4dc92e
2018-12-12 09:25:34 -08:00
3ae684266a Don't setup x86_64-linux-gnu-gcc as an sccache wrapper. (#15078)
Summary:
When I do this setup in a local Docker development environment,
I get the following error:

    x86_64-linux-gnu-gcc: error trying to exec 'cc1plus': execvp: No such file or directory

Somehow, gcc seems to get confused when it gets run from the wrong
directory.  Best not to do it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15078

Differential Revision: D13432143

Pulled By: ezyang

fbshipit-source-id: b18e15f493503a4c8205c85f92a214e49762a7bc
2018-12-12 08:01:03 -08:00
00a4c8d41c Use c10::to_string that works cross platform (#15117)
Summary:
Fix master breakage introduced in #15108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15117

Differential Revision: D13430568

Pulled By: bddppq

fbshipit-source-id: ce10bc552f085d1bf0afbc13119991bee014ac95
2018-12-12 02:58:49 -08:00
1423c0d9f1 Add EmptyNameScope to allow you jump out from current scope. (#14631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14631

adding a empty name scope to allow people jump out from current namescope.

This could be useful when you want to access blob from parent or sibling scope.

 Facebook:

e.g: we encoutered a potential usecase in D13124249 (it's a large diff, please search by EmptyNameScope in that diff), we need to access to a blob declared in root namescope from a device namescope (device namescope has been used by parallel_GPU API). `EmptyNameScope` can help us do that with ease.

I referenced to `EmptyDeviceScope` D6103412 while implementing this one.

Reviewed By: yinghai

Differential Revision: D13272240

fbshipit-source-id: d4cde5abcc2336e456b6c6ef086266ef94d86da8
2018-12-12 01:39:50 -08:00
479481b6cb Remove linker and dlopen flags that allowed undefined symbols in rocm build (#15091)
Summary:
Previously the undefined symbols were caused by disabled_modules in tools/amd_build/disabled_features.json (now it's cleared).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15091

Differential Revision: D13429595

Pulled By: bddppq

fbshipit-source-id: b341e83f9e5a8d16440a364e837b045a8a4fd6e1
2018-12-11 23:23:47 -08:00
0dade9862c Fix serialization (#15033)
Summary:
Fixes a bug where (de-)/serializing a hierarchy of submodules where one submodule doesn't have any parameters, but its submodules do, doesn't get properly loaded. This had to do with the fact that the old protobuf format couldn't store empty parameters.

Fixes https://github.com/pytorch/pytorch/issues/14891

soumith ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15033

Differential Revision: D13411322

Pulled By: goldsborough

fbshipit-source-id: 2ef73b2aa93fa9e46b1cbe1fd47d9f134d6016d5
2018-12-11 22:43:36 -08:00
e20f9bbead Update the output format for benchmark_helper. It outputs the dimensi… (#15108)
Summary:
…on first and all the values in the next line. This way, it can output arbitrary blob
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15108

Reviewed By: llyfacebook

Differential Revision: D13429346

Pulled By: sf-wind

fbshipit-source-id: 5e0bba2a46fbe8d997dfc3d55a698484552e3af8
2018-12-11 22:24:56 -08:00
b07ee44f40 Pre-commit flake8/clang-tidy (#15102)
Summary:
Provide a pre-commit hook that does flake8 and clang tidy checks. Enables the clang-tidy script to run in parallel to make it fast enough to be used in a pre-commit hook.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15102

Reviewed By: soumith

Differential Revision: D13429629

Pulled By: zdevito

fbshipit-source-id: bd52fe5652f29b033de8d9926d78350b2da4c2fc
2018-12-11 22:18:18 -08:00
f8455ed754 add gloo support for gather on GPU (#14916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14916

as titled

Reviewed By: pietern

Differential Revision: D13267832

fbshipit-source-id: 3b89d08af93f74941f17ff892c33fc2a4a023c19
2018-12-11 21:21:10 -08:00
3fa53da61a Fix include paths for UndefinedTensorImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14818

Reviewed By: ezyang

Differential Revision: D13348042

fbshipit-source-id: 11bdfc755767ce9d0a6fa95b2cf49d50adde8d60
2018-12-11 21:01:45 -08:00
63db95dd11 Move UndefinedTensorImpl to c10 (meh) (#14817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14817

unfortunately, we still need this.

Reviewed By: ezyang

Differential Revision: D13348041

fbshipit-source-id: e8dcc89f5c71bd1ea2c9813990dac6e58e63b1fd
2018-12-11 21:01:42 -08:00
2dfdbef91d Fix include paths for TensorImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14816

Reviewed By: ezyang

Differential Revision: D13348040

fbshipit-source-id: a7204d89c2dd277d13093b0ed862f40b53dee82f
2018-12-11 21:01:40 -08:00
9e9e87c19e Move TensorImpl to c10 (yay!)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14795

Reviewed By: ezyang

Differential Revision: D13336856

fbshipit-source-id: 5375d0e42312ff7564f4df06210a5e49542d59e3
2018-12-11 21:01:38 -08:00
bff6d42cef Add at::scalar_tensor factory function, use it instead of Type.scalar… (#15074)
Summary:
…_tensor.

This is part of a long series of paring down the Type interface.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15074

Differential Revision: D13421482

Pulled By: gchanan

fbshipit-source-id: 84010ee71fef2cb74d32d5de7858d8ed9f36b885
2018-12-11 20:37:41 -08:00
b710642969 Make ATen HIPify out-of-place, but still reuse CUDA names. (#14866)
Summary:
```
    This diff changes the HIPification of ATen to be out-of-place.
    We now have the following mappings:

    - ATen/cuda => ATen/hip
    - ATen/native/cuda => ATen/native/hip
    - ATen/native/sparse/cuda => ATen/native/sparse/hip
    - THC => THH
    - THCUNN => THHUNN

    The build system is adjusted to know about these new build paths,
    and HIPify is taught how to adjust include paths and
    THC_GENERIC_FILE appropriately.  ATen_hip is now built as
    the ATen_hip library, rather than reusing ATen_cuda.

    However, despite these new filepaths, none of the identifiers in ATen
    have actually changed.  So, e.g., THHGeneral.h still defines functions
    named THC_blahblah, and HIP still shows up as CUDA in PyTorch itself.
    We'll tackle this in a subsequent PR; this diff is just to get the files
    out-of-place.

    Minor extra improvements:

    - Don't edit tmp_install when hipifying
    - HIP no longer builds native_cudnn_cpp; it was unnecessary
    - Caffe2_HIP_INCLUDES is now Caffe2_HIP_INCLUDE, for consistency
      with all the other variables.
    - HIP build now properly respects ATEN_CUDA_FILES_GEN_LIB (it
      did not previously.)
    - You can now override file extension matching in pyHIPIFY
      by explicitly specifying its full name in the matching list.
      This is used so we can HIPify CMakeLists.txt in some situations.

    A little bit of string and ceiling wax:

    - gen.py grows a --rocm flag so that it knows to generate CUDA
      files which actually refer to the HIP headers (e.g., THH.h)
      We'll get rid of this eventually and generate real HIP files,
      but not for this PR.
    - Management of HIP dependencies is now completely deleted
      from the ATen CMakeLists.txt.  The old code was dead (because
      it was shoveled in ATen_CUDA_DEPENDENCY_LIBS and promptly
      ignored by the Caffe2 build system) and didn't actually work.
```

Stacked on https://github.com/pytorch/pytorch/pull/14849 review last commit only
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14866

Differential Revision: D13419475

Pulled By: ezyang

fbshipit-source-id: cb4c843df69a1d8369314c9fab1b7719520fa3db
2018-12-11 19:15:27 -08:00
5c2c40ad87 Add error type to raise statement
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15039

Differential Revision: D13419566

Pulled By: zou3519

fbshipit-source-id: f67a3aebce937e3e640e91e81eb3e184cfdf269c
2018-12-11 17:41:44 -08:00
73ee7fda4c Remove deprecated variable_tensor_functions (#15003)
Summary:
Removing the deprecated functions in `torch/csrc/variable_tensor_functions.h` (like `torch::CPU`) and corresponding implementations from `torch/csrc/torch.cpp` from master after the release.

ezyang gchanan soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15003

Differential Revision: D13418086

Pulled By: goldsborough

fbshipit-source-id: a0accdf6f7b0efa1ec07ac7b74b86ff2da37543f
2018-12-11 17:16:11 -08:00
0552326846 add gloo scatter support on GPU (#14917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14917

as titled

Reviewed By: pietern

Differential Revision: D13271560

fbshipit-source-id: 0187a3390f8ebd72a2c074e7a651432159d427c0
2018-12-11 17:11:13 -08:00
92314c83fa re-enable copy of python files, but be careful that the copy is only … (#14982)
Summary:
…done once

This allow no-op build to work correctly even when BUILD_CAFFE2_OPS is on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14982

Differential Revision: D13413960

Pulled By: zdevito

fbshipit-source-id: 6e5412a8c375af8a47c76f548cdd31cff15f3853
2018-12-11 16:54:08 -08:00
71e0cb505c Split off fuser tests in test_jit.py to their own test case (#15072)
Summary:
This PR creates TestFuser inside test_jit.py to be a home for graph fuser
specific tests.

This was a useful exercise because now that all the fuser tests are in
one place, I can spot redundant and bitrotting tests for cleanup in a
future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15072

Differential Revision: D13421458

Pulled By: zou3519

fbshipit-source-id: 80b1a7712feff75a0c186d1664601c4edbbca694
2018-12-11 14:55:06 -08:00
7408ce2f80 Supress warnings on generated tests
Summary: Removes all warnings spew for the TestJitGenerated tests

Differential Revision: D13420919

fbshipit-source-id: f251c12f923088ccc5daa2984c15003a67cbd1c1
2018-12-11 14:00:41 -08:00
04b65dfd1f Issue 14984: Remove divide by zero error in index_put_ (#14986)
Summary:
No check for zero index tensor was done in the accumulate=True (serial) case in the new TensorIterator code since https://github.com/pytorch/pytorch/pull/13420.

https://github.com/pytorch/pytorch/issues/14984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14986

Differential Revision: D13417861

Pulled By: colesbury

fbshipit-source-id: e6ed1af8f708b53a35803fc157ed1f043169ec89
2018-12-11 13:38:12 -08:00
109c8d22dc Update onnx coverage script for more accurate result (#15029)
Summary:
The coverage of scalar-input test cases were not accurate. This patch fixed that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15029

Differential Revision: D13419764

Pulled By: zrphercule

fbshipit-source-id: a14a5cbef432bea8c9126156f5deb1125e1aeb47
2018-12-11 13:14:35 -08:00
f2f47de5ad tox.ini -> .flake8 (#15065)
Summary:
We were only using this file to configure flake8, and fbcode linters do not recognize tox.ini which causes spurious linter warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15065

Differential Revision: D13420774

Pulled By: suo

fbshipit-source-id: e43a46befa36862c8b3c0a90074aec6a66531492
2018-12-11 13:14:34 -08:00
ca7f8fed60 silence unreachable code warnings (#15036)
Summary:
Stack:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **#15036 silence unreachable code warnings**&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D13411100/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15036

Differential Revision: D13414712

Pulled By: li-roy

fbshipit-source-id: d4aa84571fa94c66f3c5bfa9575a10c6ee398f9e
2018-12-11 13:09:04 -08:00
d825b39061 improve deep equality check in alias annotation test (#15031)
Summary:
Previously we were returning true if either IValue wasn't a tensor, which…is bad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15031

Differential Revision: D13409759

Pulled By: suo

fbshipit-source-id: f8bdcd05d334c1276ce46f55812065d358c1ff5d
2018-12-11 12:14:00 -08:00
02d149b767 Fix race condition in ThreadPool::workOnTasksUntilCompleted (#14833)
Summary:
Resolves #14704
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14833

Differential Revision: D13405211

Pulled By: highker

fbshipit-source-id: 8552d51eeb5d3af0ed66c461e5ddfeb9ae2926bd
2018-12-11 11:46:58 -08:00
c2a754c58b Fix CMakeLists.txt for Int8 python bindings (#15047)
Summary:
Currently in caffe2, one cannot properly fetch the content of Int8 blobs.

Upon digging the source code, it turns out that the relevant source code is not being compiled. Adding the source to CMakeLists.txt fixes this issue.

First time ever doing a pull request. Please let me know if there's any rule I should follow. Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15047

Differential Revision: D13417583

Pulled By: bddppq

fbshipit-source-id: dd39575971a3012635edbf97a045d80e4b62a8eb
2018-12-11 10:48:47 -08:00
687834dcb4 Install cpp tests when built (#15000)
Summary:
This is broken out of https://github.com/pytorch/pytorch/pull/13733/

We want to install cpp tests so they can ultimately be runnable from that location for Caffe2 tests run from PyTorch builds.

cc pjh5 yf225 anderspapitto
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15000

Reviewed By: pjh5

Differential Revision: D13416253

Pulled By: orionr

fbshipit-source-id: 51280be0a22557a742f90c9f303c58c35cbd4a38
2018-12-11 10:07:48 -08:00
5d3a347685 Stashing checkpointing RNG states based on devices of arg tensors (#14518)
Summary:
This PR intends to address apaszke's concerns in https://github.com/pytorch/pytorch/pull/14253#issuecomment-441740016.  Preserving the rng state is now controlled by a kwarg rather than a global state, hopefully in a python 2.7-compatible way.

Additionally, the checkpointing function stashes and restores the RNG states of
1. devices associated with all input tensor args to run_fn as well as
2. the current device.

I could easily change this to only save and restore the RNG states associated 1. alone.  This would simplify the logic to create a [deduplicated, ordered](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R37) list of devices considered active.

I'm wondering if the [get_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R32) and [set_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) functions are general enough to reside elsewhere (presumably torch/random.py).  I'm also wondering if the check on [torch.cuda._initialized](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) would be better placed within `get_device_states`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14518

Differential Revision: D13356210

Pulled By: ezyang

fbshipit-source-id: afa4cc21ce7862142d5cb1dec3750018df222039
2018-12-11 09:48:45 -08:00
25ddd659c9 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: d39b31f12ab2ab570548f3e8a65949332a64a0ff
2018-12-11 07:40:37 -08:00
bf1d411dbf Switch Int8Softmax, Int8Relu, and Int8LeakyRelu to QNNPACK (#14933)
Summary:
Int8Softmax: 4x-5x speedup compared to previous implementation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14933

Differential Revision: D13406820

Pulled By: Maratyszcza

fbshipit-source-id: ea8cbe1b861ddb7ff1b851d06d52c6fd6d04ed01
2018-12-11 00:49:06 -08:00
a1ea7dbe40 Adjust the API call to deserilize the tensorproto (#14132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14132

as title

Reviewed By: jerryzh168

Differential Revision: D13110697

fbshipit-source-id: 822c9079de11951f90aec3d26f0e4108847e7dac
2018-12-10 22:54:42 -08:00
27d5ae7afb use datatype dependent tolerance in data parallel tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14856

Differential Revision: D13413560

Pulled By: soumith

fbshipit-source-id: b3a0cfe93477ed332e6eaa2e39ef5f4cc8b36481
2018-12-10 22:50:27 -08:00
81dc78d871 Update pooling.py (#14998)
Summary:
Strange line in the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14998

Differential Revision: D13413235

Pulled By: soumith

fbshipit-source-id: 80d05ec1185719b785f0aac914bc2369c1174f2f
2018-12-10 22:36:20 -08:00
48a361cc62 Clean up casting ops (#14947)
Summary:
This removes FloatToInt style names replacing it with just the destination
name (e.g. FloatToInt -> Float). This makes it more consistent with the
syntax and makes it easier to add type conversions (just add a new
prim::Int op, for instance).

None of these ops get serialized so this should not effect loading of
old models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14947

Differential Revision: D13408409

Pulled By: zdevito

fbshipit-source-id: d773fe863f14d9de893f686832769f8cc8903a8e
2018-12-10 22:15:08 -08:00
cff509e2b1 share code between adagrad and rowwise adagrad tests (#14692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14692

Remove some code duplication

Reviewed By: chocjy

Differential Revision: D13296731

fbshipit-source-id: 5924e037ca64fc4b89234be922bc5ca47fb8bd32
2018-12-10 22:10:39 -08:00
c48b15e41a TBB task graph (#15041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15041

Adding an alternative implementation of a task graph based on TBB

Reviewed By: dmudiger

Differential Revision: D13412517

fbshipit-source-id: f5efedd680bbe0072bf38d504e5682ab51dd630f
2018-12-10 21:35:04 -08:00
45dfc6764e Enable more caffe2 fp16 rocm tests (#15040)
Summary:
cc rohithkrn petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15040

Reviewed By: houseroad

Differential Revision: D13413068

Pulled By: bddppq

fbshipit-source-id: b2967f16f8da0b9e80083138fb8632c14e9e9b63
2018-12-10 21:30:21 -08:00
5022f9d6ef Enable the build of tests in ATen/core (#15032)
Summary:
Otherwise they won't build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15032

Reviewed By: yinghai

Differential Revision: D13409801

Pulled By: houseroad

fbshipit-source-id: 95464aa8f3604835997ba1bb7f3c3e51485d1686
2018-12-10 21:24:54 -08:00
962b82dd81 More scaffolding for LegacyTHDispatch. (#14852)
Summary:
1) at::functions are now also exposed in the at::legacy::th namespace and we move relevant calls over to use them (to avoid merge conflicts)
2) LegacyTHDispatch now handles device-type initialization
3) We generate derived LegacyTHDispatchers, e.g. THLegacyCPULongDispatcher, although they are currently empty.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14852

Reviewed By: ezyang

Differential Revision: D13360852

Pulled By: gchanan

fbshipit-source-id: af6705aeba3593ea5dba9bfc62890e5257bc81f8
2018-12-10 19:57:01 -08:00
e9cd781681 Back out "Revert D13043261: [caffe2] Task graph and task future abstractions in executor"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15030

Reviewed By: bddppq

Differential Revision: D13408998

fbshipit-source-id: 9eb675e09fbc4829eab34df7aa660a0590816feb
2018-12-10 19:30:58 -08:00
83f32eebd9 Tensor construction codemod - 2/3 (#14836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14836

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335176

fbshipit-source-id: 8d89510670e2cf70559d2f75e68f7181feb0b6d9
2018-12-10 19:30:56 -08:00
5222a1b190 Fixing reading of FBGEMM from env variables
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15023

Reviewed By: orionr

Differential Revision: D13406778

Pulled By: pjh5

fbshipit-source-id: 2265f01170fb7969cbdf4e44ca6ef183f5d8017d
2018-12-10 18:18:38 -08:00
a97cf568a4 Alignas Array struct (#14920)
Summary:
This PR aligns the Array struct such that cuda vector performance improvements can be utilized.

I tested this by using it on our Philox header. Note how the vector store instruction gets used for cuda vector types and when using alignas on Array, vs when not using alignas on Array.

With cuda vector type (uint4, uint2, float4): https://godbolt.org/z/UaWOmR
With alignas: https://godbolt.org/z/Eeh0t5
Without alignas: https://godbolt.org/z/QT63gq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14920

Differential Revision: D13406751

Pulled By: soumith

fbshipit-source-id: 685b1010ef1f576dde30c278b1e9b642f87c843d
2018-12-10 17:58:03 -08:00
7e2b074219 Integrate rocBLAS fp16 api into Caffe2 (#14882)
Summary:
This PR integrates rocBLAS half and mixed precision APIs in to Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14882

Differential Revision: D13407840

Pulled By: bddppq

fbshipit-source-id: 75cb0d74da066776fa66575f1d255e879d36121e
2018-12-10 17:54:06 -08:00
92f3616f36 Fix old tensor CopyFrom usage in boolean mask operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15025

Differential Revision: D13407323

Pulled By: bddppq

fbshipit-source-id: 1bc1d28ad0c6c71d25d788549be18917e393ee50
2018-12-10 17:23:45 -08:00
4fcc2fffc3 unit test with multiple omp threads (#14958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14958

Test with multiple threads

Reviewed By: jianyuh

Differential Revision: D13394791

fbshipit-source-id: 931a6c3bda15ebc816807e537dd0841c383e7a6f
2018-12-10 17:23:44 -08:00
9b272c08cf Remove partially initialized Tensor in Deserialization (#14197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13642

Previously we pass in a patially initialized Tensor to Deserialize and it will fill
it with the result of deserialization of a tensor proto. Now we want it to return
a Tensor directly since it's just a shared pointer to TensorImpl.

Reviewed By: dzhulgakov

Differential Revision: D12874357

fbshipit-source-id: 12b80a763375da23cfa64a74d6bc186d8d03b94f
2018-12-10 17:17:29 -08:00
4a145cd95c Revert D13043261: [caffe2] Task graph and task future abstractions in executor
Differential Revision:
D13043261

Original commit changeset: d89424354aea

fbshipit-source-id: b307e3281c4d83b60ba2bfadcbcf69afb7a41412
2018-12-10 16:03:59 -08:00
0a36fe565d apply() for ScriptModules (#14655)
Summary:
This can be use to initialize state that is not necessarily eligible for serialization/is implementation-specific. Concretely, I'm going to use this to pack the weight matrices for quantized Linear modules according to the FBGEMM APIs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14655

Differential Revision: D13404438

Pulled By: jamesr66a

fbshipit-source-id: 2d327cef5520fdd716b5b1b29effd60a049e8a4a
2018-12-10 15:40:31 -08:00
9bbb3efe2f Simplify THPPointer implementation for Storage. (#14897)
Summary:
We've virtualized the destructor for storage, so we
no longer have to forward to a particular backend.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14897

Differential Revision: D13399216

Pulled By: ezyang

fbshipit-source-id: 531d29c3f278477cfa8759f30ab4f304d695b659
2018-12-10 15:18:49 -08:00
23cc3daabd Disable getNumGPUs rewrite (#14993)
Summary:
cc iotamudelta

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14993

Differential Revision: D13405804

Pulled By: ezyang

fbshipit-source-id: c4aa9ed29ee2a4f3abf76c1e0fa8babfd738db35
2018-12-10 15:13:55 -08:00
6ad9f7b798 Fix include path for WrapDimMinimal.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14794

Reviewed By: dzhulgakov

Differential Revision: D13336842

fbshipit-source-id: ca49a9fd1d409d8a75e43eeb9b9b02c305ebb79a
2018-12-10 15:10:03 -08:00
279ec9ef7a Move WrapDimMinimal to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14793

Reviewed By: ezyang

Differential Revision: D13336841

fbshipit-source-id: 4365a799e1856cc68dd94a273e97663fee5f51db
2018-12-10 15:10:01 -08:00
66315ab323 Stop disabling maybeOverlappingIndices (#14999)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14999

Differential Revision: D13405754

Pulled By: ezyang

fbshipit-source-id: 98459496494390ad1115b4f1f6738d53c14f0745
2018-12-10 15:02:08 -08:00
483ba553bd add gloo allgather support on GPU (#14576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14576

as titled

Reviewed By: pietern

Differential Revision: D13266063

fbshipit-source-id: e262f77d63724a7504a7112907bbfba49612fe75
2018-12-10 14:32:54 -08:00
029600813e Task graph and task future abstractions in executor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14116

Reviewed By: dmudiger

Differential Revision: D13043261

fbshipit-source-id: d89424354aea14d1d14eb8320fb3aa34908a4e81
2018-12-10 14:28:56 -08:00
a51fe386c8 caffe2/caffe2/contrib/script (#15007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14979

att

Reviewed By: dzhulgakov

Differential Revision: D13286191

fbshipit-source-id: b8a6bc7aea44487aea4dcf7f44c858fd30c6293c
2018-12-10 14:23:31 -08:00
25144c8a09 s/Torch Script/TorchScript/g (#15011)
Summary:
pls
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15011

Differential Revision: D13404158

Pulled By: suo

fbshipit-source-id: e906281463d65c86e4e9073eb0c0a26f4f29e307
2018-12-10 13:48:24 -08:00
110ccbb689 Improve the docs of interpolate(align_corners=) (#14806)
Summary:
ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14806

Reviewed By: ailzhang

Differential Revision: D13366332

Pulled By: ppwwyyxx

fbshipit-source-id: 08fcea95d5c86b11cdfe464fdd9daa50050871f1
2018-12-10 12:50:38 -08:00
e77de07448 Improve build time of register_symbols.cpp without compiler hacks (#14911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14911

In optimized modes the compiler tries to inline all the
`unordered_map::operator[]` calls, creating a massive amount of code
which takes several minutes to optimize. Instead, create a table of
PODs and populate the maps using a simple loop.

Reviewed By: soumith, luciang

Differential Revision: D13382948

fbshipit-source-id: b6752921e0f7213595d26b39e4397f6a3897960b
2018-12-10 11:57:11 -08:00
18c93b87c2 Delete defunct THP_API.h header. (#14899)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14899

Differential Revision: D13383687

Pulled By: ezyang

fbshipit-source-id: f2a08a769cc3775ba55f9c58d622a83df622d816
2018-12-10 10:47:24 -08:00
1989157eb6 Disable test_leaf_variable_sharing on ASAN runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15001

Reviewed By: orionr

Differential Revision: D13399119

fbshipit-source-id: 6b1d098e55a67b1f5bc6d08a8ee3c1be8234a654
2018-12-10 10:43:05 -08:00
d30b6bf3b6 Revert D13306052: [pytorch][PR] Allow converting CharTensor to np arrays
Differential Revision:
D13306052

Original commit changeset: 202d038f139c

fbshipit-source-id: 11f6bdd687f8ea5ce2e5f28f48d19449a5c403eb
2018-12-10 10:36:17 -08:00
dc1e6d0b98 Non-INTERFACE AT_LINK_STYLE is dead code (#14822)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14822

Differential Revision: D13355574

Pulled By: ezyang

fbshipit-source-id: a7173084f8735424619b2e393df2715a05918b44
2018-12-10 09:42:53 -08:00
54d5c53826 Support torch.load with encoding (#14743)
Summary:
Addresses a common compatibility issue when loading Py2 checkpoints in Py3 regarding to bytes.

E.g.,
[1] https://github.com/pytorch/pytorch/issues/5994,
[2] https://github.com/CSAILVision/places365/issues/25,
[3] https://discuss.pytorch.org/t/how-to-load-a-saved-model-trained-on-pytorch-0-3-1-python-2-7-on-pyorch-1-0-python-3-7/31212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14743

Reviewed By: weiyangfb

Differential Revision: D13350888

Pulled By: soumith

fbshipit-source-id: 2df4e828a8b70509118a355307ca3ebe51e108f6
2018-12-10 08:07:36 -08:00
9b2bd284b3 Convert int8 numpy array to CharTensor (#14700)
Summary:
When rewriting `default_collate`, I noticed that `from_numpy` and `as_tensor` and `tensor` all do not work on `np.int8` arrays.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14700

Reviewed By: weiyangfb

Differential Revision: D13305297

Pulled By: soumith

fbshipit-source-id: 2937110f65ed714ee830d50098db292238e9b2a9
2018-12-10 07:39:06 -08:00
e1b5dbf699 Allow converting CharTensor to np arrays (#14710)
Summary:
The other direction of #14700

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14710

Reviewed By: weiyangfb

Differential Revision: D13306052

Pulled By: soumith

fbshipit-source-id: 202d038f139cf05e01069ff8d05268c66354c983
2018-12-10 07:35:28 -08:00
b039a715ce pre-pack operation of dnnlowp conv with 16-bit accumulation (#14881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14881

This diff allows us to pre-quantize and pre-pack weight matrix used in DNNLOWP_ACC16 .
The intended use pattern is run Int8ConvPackWeight in init_net that generates a packed weight and Int8Conv with DNNLOWP_ACC16 engine uses the the packed weight.

Reviewed By: csummersea

Differential Revision: D13374662

fbshipit-source-id: dd02b9a4eb7af1fe208aa857fcd0b445e6e395af
2018-12-10 01:08:21 -08:00
e747acbebb Respect -q of setup.py (#14972)
Summary:
1. Changes the prints along the 'rebuild' pathway to respect the '-q' flag of setup.py
A clean rebuild now only prints:

    [zdevito@devgpu172.prn2 /data/users/zdevito/pytorch] python setup.py -q rebuild develop
    [0/1] Install the project...
    -- Install configuration: "RelWithDebInfo"
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.

2. Deletes apparently dead calls to `generate_code`. Now that CMake builds these files,
it appears that it is getting called twice and the second version is never used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14972

Reviewed By: soumith

Differential Revision: D13396330

Pulled By: zdevito

fbshipit-source-id: 83c45143bbc6a6d2c1cfee929291ec059f2b5dc3
2018-12-09 22:47:49 -08:00
fab8085111 _get_device_index supports parsing device strings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14929

Reviewed By: weiyangfb

Differential Revision: D13394498

Pulled By: soumith

fbshipit-source-id: 948c6118abdf6c1e1a8a17709333954cafb2345e
2018-12-09 21:12:46 -08:00
5fd69e7551 remove mingfeima mkldnn reference from README, as no longer necessary (#14975)
Summary: we now get mkldnn automatically from third_party/ideep

Differential Revision: D13396480

Pulled By: soumith

fbshipit-source-id: 20f819ba4b78cbe9c7d0baeab1c575669cbf6c20
2018-12-09 20:44:10 -08:00
aefc83f46d fixing some rebuild issues (#14969)
Summary:
This fixes rebuild issues with the ninja part of the build. With this patch all ninja files will now report `nothing to do` if nothing has changed assuming `BUILD_CAFFE2_OPS=0`.

1. This only does the python file processing for caffe2 when BUILD_CAFFE2_OPS=1, this part of the build file is written in such a way that it is always required to rerun and can take substantial time to move files around in the no-op build. In the future this part should be rewritten to use a faster method of copying the files or should treat copying the files as part of the build rules and only run when the files are out of date.

2. This points `sleef` to a patched version that fixes a dead build output that is causing everything to relink all the time. See https://github.com/shibatch/sleef/pull/231#partial-pull-merging for the upstream change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14969

Reviewed By: soumith

Differential Revision: D13395998

Pulled By: zdevito

fbshipit-source-id: ca85b7be9e99c5c578103c144ef0f2c3b927e724
2018-12-09 16:32:19 -08:00
fc30e2782c Remove deprecated info argument in btrifact (#14935)
Summary:
As specified in title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14935

Differential Revision: D13394449

Pulled By: soumith

fbshipit-source-id: 569d59414f3a1a43ea641bded4b5433eb53e3490
2018-12-09 15:59:30 -08:00
86e03b8a30 add fix for CUDA 10 (#14971)
Summary:
Linux binaries-only fix for CUDA10
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14971

Differential Revision: D13395932

Pulled By: soumith

fbshipit-source-id: a72d6ab6b98c6c936e6391d55d2e4e45b9f1e6dd
2018-12-09 15:54:27 -08:00
5f2736b84a Fix mismatched test_{full,ones,zeros}_like onnx expect files (#14956)
Summary:
master broken #14903
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14956

Differential Revision: D13395363

Pulled By: bddppq

fbshipit-source-id: 31f0913843292e557807fd5a976f8907fa6cae4b
2018-12-09 08:57:14 -08:00
a1494efdfa fix auto grad summing for IfOp where intermediate output needs renaming (#14772)
Summary:
Fix auto grad summing for IfOp where an intermediate output needs renaming.

Bug before this diff:
- we only rename the output of IfOp without changing the subnet ops' output
- this results in a "blob not found" error

The unit test provides an example; this diff fixes that for IfOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14772

Differential Revision: D13327090

Pulled By: harouwu

fbshipit-source-id: ec40ee88526ace3619c54551e223dd71158a02f8
2018-12-09 08:26:46 -08:00
fa12e1e4d4 Export ones_like, zeros_like and full_like using ONNX ConstantLike op. (#14903)
Summary:
This PR does the following:
1) Updates the ONNX export for the `torch.zeros_like` and `torch.full_like` ops to use the ONNX op `ConstantLike`. This reduces reliance on the experimental op `ConstantFill`, which may be removed in the future (see https://github.com/onnx/onnx/pull/1434).
2) It also adds export support for `torch.ones_like`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14903

Differential Revision: D13383700

Pulled By: houseroad

fbshipit-source-id: 566d00a943e9497172fcd5a034b638a650ab13a2
2018-12-08 22:49:02 -08:00
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
a7b3197b2d race condition fix of calling mutable_data inside a openmp region (#14921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14921

Fix race condition introduced in D13188595.
Let's remind ourselves: "never call mutable_data from an OpenMP region!!!"

Reviewed By: jianyuh

Differential Revision: D13387692

fbshipit-source-id: 6a3aeedeeda55a9ede660de8f1f44d4eee76ae2b
2018-12-08 18:17:20 -08:00
e9db9595d2 Add crop argument, can crop rec as well, first resize and then crop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14894

Reviewed By: llyfacebook

Differential Revision: D13377604

Pulled By: sf-wind

fbshipit-source-id: 333d0d864e6c2dc85f405baa25ed58029d62750f
2018-12-08 11:14:56 -08:00
b0909ea6a0 Switch Int8Sigmoid to QNNPACK (#14883)
Summary:
50x-100x speedup compared to current version.
Also fixes a bug in the current version when the batch size exceeds 1 (the current version processes only the first image in that case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14883

Differential Revision: D13390655

Pulled By: Maratyszcza

fbshipit-source-id: 1b33a97bf2d0866d38faa2b42e64fd2859017898
2018-12-08 02:47:29 -08:00
5e06fa0baf ONNX changes to use int32_t (instead of enum) to store data type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14926

Reviewed By: houseroad

Differential Revision: D13390642

Pulled By: bddppq

fbshipit-source-id: c2314b24d9384f188fda2b9a5cc16465ad39581e
2018-12-08 01:06:08 -08:00
c8a5ec14dd Remove at references from c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14432

Reviewed By: dzhulgakov

Differential Revision: D13223904

fbshipit-source-id: 43b06e33e088e7789ccea6d92267936fe30d8571
2018-12-08 00:28:35 -08:00
25110d61fb Implement std for multiple dimensions on CPU devices. (#14535)
Summary:
Tested on a tensor with 1 billion elements and 3 dimensions on a powerful, highly
multi-core Linux machine.

parallelized: All operations (e.g., `t.std(1)`) that could be done in the old code are now several times faster. All
new operations (e.g., `t.std((0,2))`) are significantly faster than the NumPy equivalents.
`t.std((0, 1, 2))`, a new operation, is logically equivalent to the
old `t.std()`, but faster.

serial: The above comment about old operations now being faster still
holds, but `t.std((t1, ..., tn))` is now a few
times slower than `t.std()`. If this turns out to be important, we can
special-case that to use the old algorithm.

The approach is to create a new method, `TensorIterator::foreach_reduced_elt`,
valid for `TensorIterator`s that represent a dimension reduction. This
method calls a supplied function for each element in the output,
supplying it with the input elements that correspond to that output.

Given that primitive, we can implement reductions like the following pseudocode:

If there is more than one output element:
```
PARALLEL FOR EACH element IN output:
    accumulator = identity
    SERIAL FOR EACH data_point IN element.corresponding_input:
        accumulator.update(data_point)
    element = accumulator.to_output()
```

If there is only one output element, we still want to parallelize, so we
do so along the *input* instead:

```
accumulators[n_threads]
PARALLEL FOR EACH input_chunk IN input.chunks():
    accumulators[thread_num()] = identity
    SERIAL FOR EACH data_point IN input_chunk:
        accumulators[thread_num()].update_with_data(data_point)
accumulator = identity
SERIAL FOR EACH acc in accumulators:
    accumulator.update_with_other_accumulator(acc)
output_element = accumulator.to_output()
```

Note that accumulators and data points do not have to be the same type
in general, since it might be necessary to track arbitrary amounts of
data at intermediate stages.

For example, for `std`, we use a parallel version of Welford's
algorithm, which requires us to track the mean, second moment, and number
of elements, so the accumulator type for `std` contains three pieces of
data.
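As a concrete illustration of the accumulator idea, here is a minimal Python sketch (not the actual C++ TensorIterator code) of Welford accumulation with a parallel combine step; `WelfordAcc` and its `combine` rule are illustrative names:

```python
class WelfordAcc:
    """Accumulator tracking count, mean, and second moment (M2)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        # Serial Welford update for one data point.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def combine(self, other):
        # Merge another accumulator (e.g. from a different thread's chunk),
        # using the pairwise-combination formula for mean and M2.
        if other.n == 0:
            return
        n = self.n + other.n
        delta = other.mean - self.mean
        self.mean += delta * other.n / n
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.n = n

    def std(self, unbiased=True):
        return (self.m2 / (self.n - 1 if unbiased else self.n)) ** 0.5
```

In the single-output case above, each thread would serially `update` over its chunk, and the per-thread accumulators would then be `combine`d into one.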
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14535

Differential Revision: D13283887

Pulled By: umanwizard

fbshipit-source-id: 8586b7bf00bf9f663c55d6f8323301e257f5ec3f
2018-12-07 20:16:04 -08:00
c2a75926ca Add CAFFE2_API to video processing functions (#14900)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/13733

Some tests were failing because these methods didn't have an export.

cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14900

Reviewed By: pjh5

Differential Revision: D13381130

Pulled By: orionr

fbshipit-source-id: 030536f8fb09765c09a7b0bd45400161053f2e18
2018-12-07 19:55:21 -08:00
52942e1f09 Enable unit tests known to work on ROCm (#14011)
Summary:
* Enable unit tests known to work on ROCm.
* Disable a few that are known to be flaky for the time being.
* Use std::abs for Half
* No more special casing for ROCm in TensorMathReduce
* Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce

ezyang bddppq for awareness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011

Differential Revision: D13387679

Pulled By: bddppq

fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71
2018-12-07 18:57:32 -08:00
5be28ade66 Automatic update of fbcode/onnx to aca8473a40cf43f01958c81b648efcee7f3a755a (#14865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14865

Previous import was 42804705bdbf179d1a98394008417e1392013547

Included changes:
- **[aca8473](https://github.com/onnx/onnx/commit/aca8473)**: Add Erf operator for computing error function (#1675) <bddppq>
- **[3fc82ca](https://github.com/onnx/onnx/commit/3fc82ca)**: Add IsNaN operator. (#1656) <Pranav Sharma>
- **[0685f01](https://github.com/onnx/onnx/commit/0685f01)**: Add Sign Op (#1658) <Rui Zhu>
- **[2a8fae8](https://github.com/onnx/onnx/commit/2a8fae8)**: Fix unused var warning (#1669) <Yinghai Lu>
- **[e212833](https://github.com/onnx/onnx/commit/e212833)**: Update scan (#1653) <G. Ramalingam>

Reviewed By: zrphercule

Differential Revision: D13370727

fbshipit-source-id: 13a93d5acc8d4758f682278ea162ec9124ced22d
2018-12-07 17:37:42 -08:00
11a9248d01 Enable fp16 for MIOPEN operators in Caffe2 (#14905)
Summary:
This PR enables fp16 MIOPEN operators in Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14905

Differential Revision: D13383439

Pulled By: bddppq

fbshipit-source-id: 840afa8d08bef2952ca0039dee2423f1542bb330
2018-12-07 17:26:44 -08:00
70598740ec Upgrade MKL-DNN to version 0.17 (#14308)
Summary:
Upgrade MKL-DNN to version 0.17 and update the mkldnn bridge to the latest version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14308

Differential Revision: D13383102

Pulled By: yinghai

fbshipit-source-id: c434f0e0ddff2ee2c86db2d6c44a37298fd005a3
2018-12-07 16:44:50 -08:00
478eb70c07 Fix build with OpenCV 4.0 (#14356)
Summary:
Fixes #14355
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14356

Differential Revision: D13356237

Pulled By: bddppq

fbshipit-source-id: 2bf6ee21995c2c7b617c4e78ea7341f975f1b937
2018-12-07 16:40:31 -08:00
4453a1ff88 Remove unused TensorImpl dependencies
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14792

Reviewed By: ezyang

Differential Revision: D13336843

fbshipit-source-id: 12f84799a70c2e90a8b934dd8dc031c09a6782f0
2018-12-07 16:23:48 -08:00
65aa11a876 Remove TensorImpl -> context_base dependency (#14658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14658

Remove this dependency by moving at::CopyBytes to c10.
The implementations for at::CopyBytes will have to live in aten/caffe2 for now because they're not unified for CUDA yet.
They'll be moved into c10/backend/xxx later.

Reviewed By: dzhulgakov

Differential Revision: D13288655

fbshipit-source-id: 1c92379345308b3cd39a402779d7b7999613fc0d
2018-12-07 16:23:46 -08:00
086a37876b Fix include paths for TensorOptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14747

Reviewed By: ezyang

Differential Revision: D13318645

fbshipit-source-id: f5ba77a93f6019fbf5faffb47a2837c95fad474d
2018-12-07 16:23:44 -08:00
459aac4f24 Update graph printouts in JIT docs (#14914)
Summary:
Tracing records variable names and we have new types and stuff in the IR, so this updates the graph printouts in the docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14914

Differential Revision: D13385101

Pulled By: jamesr66a

fbshipit-source-id: 6477e4861f1ac916329853763c83ea157be77f23
2018-12-07 15:08:53 -08:00
5734e96775 Improve hub documentation (#14862)
Summary:
Added a few examples and explains to how publish/load models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14862

Differential Revision: D13384790

Pulled By: ailzhang

fbshipit-source-id: 008166e84e59dcb62c0be38a87982579524fb20e
2018-12-07 14:59:01 -08:00
65da7ddad6 USE_FBGEMM=True by default
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14868

Differential Revision: D13383390

Pulled By: jamesr66a

fbshipit-source-id: 1880c07dfd239e19153bd4fde2ab2c8d0604f956
2018-12-07 14:22:55 -08:00
a0ee3a279c USE_TENSORRT support and TensorRT 5 compatibility
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13945

Differential Revision: D13317525

Pulled By: yinghai

fbshipit-source-id: 8630dfec1bbc5aac19539e344e7c38a7fd8b051d
2018-12-07 14:01:11 -08:00
febc7ff99f Add __init__.py so files get picked up on install (#14898)
Summary:
This will let us install tests and other Caffe2 python code as a part of running Caffe2 tests in PyTorch.

Broken out of https://github.com/pytorch/pytorch/pull/13733/

cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14898

Reviewed By: pjh5

Differential Revision: D13381123

Pulled By: orionr

fbshipit-source-id: 0ec96629b0570f6cc2abb1d1d6fce084e7464dbe
2018-12-07 13:40:23 -08:00
efc5e9f71a Replace calls of Type::_th_tensor. (#14877)
Summary:
_th_tensor is moving off Type, so these calls need to be replaced.

Unfortunately, replacing these with a full-fledged solution [e.g. from_storage(..., TensorOptions)] is a bit complicated because the storage itself fully defines the Type (modulo variable).  It's simpler to just wait for the Variable/Tensor merge rather than to solve this now, so instead I changed the call sites to: at::empty({0}, type.options()).set_(storage...).

This isn't great because we are also trying to get rid of Type::options, but this seems to be the lesser of two evils.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14877

Differential Revision: D13374310

Pulled By: gchanan

fbshipit-source-id: eb953ed041507e6190d6f32e383912e5a08311cd
2018-12-07 13:04:48 -08:00
d6c53328f9 Large scale fix of python-related files in torch/csrc/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14515

Differential Revision: D13247966

Pulled By: goldsborough

fbshipit-source-id: 7a127c508fc576a7a92626dd6b729f660162d628
2018-12-07 13:04:46 -08:00
939877bf4b Implementation of WeightedSum op for mkl-dnn and fix FC op output shape issue.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14407

Reviewed By: yinghai

Differential Revision: D13364364

Pulled By: wesolwsk

fbshipit-source-id: e69bcd1bc52e35b2f0e45e5dc40184f1bd66605d
2018-12-07 12:35:19 -08:00
265b55d028 Revert D13205604: Move numa.{h, cc} to c10/util
Differential Revision:
D13205604

Original commit changeset: 54166492d318

fbshipit-source-id: 89b6833518c0b554668c88ae38d97fbc47e2de17
2018-12-07 10:01:25 -08:00
1c9df7facf Expose torch.roll function and method (#14880)
Summary: Fixes #14859 .

Differential Revision: D13376915

Pulled By: zou3519

fbshipit-source-id: f1fc0e8492a159431a3fc0a19a41aa10429ecc80
2018-12-07 07:42:47 -08:00
6651fae827 Make autograd engine compatible with hip
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14873

Differential Revision: D13375053

Pulled By: bddppq

fbshipit-source-id: f3051640386667bbf0566856ed433eb83276c39e
2018-12-07 00:12:06 -08:00
6e453e56f9 Fixed ConvT docstring (#14876)
Summary:
Fixes #14099

I attempted to be as consistent as possible with the formatting, which is why my equation reads d*(k - 1) instead of (k - 1)*d.

Also there is an unused variable on line 46: `n = self.in_channels`. I could fix that here too if that's not too out of scope.
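For reference, the 1-D transposed-convolution output-size relation the docstring describes can be checked numerically with a small sketch; `conv_transpose_out_size` is a hypothetical helper written for illustration, not part of this patch:

```python
def conv_transpose_out_size(l_in, k, stride=1, padding=0, dilation=1,
                            output_padding=0):
    # Output length of a 1-D transposed convolution; the dilation term
    # is written d*(k - 1) to match the docstring's ordering.
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (k - 1) + output_padding + 1)
```

For example, an input of length 4 with kernel size 3 and stride 2 yields an output of length 9.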
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14876

Differential Revision: D13374317

Pulled By: soumith

fbshipit-source-id: a9f110acafa58cdb4206956dbe3ab4738d48292d
2018-12-06 23:57:30 -08:00
51d26e76f7 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 7da015701f18f8a0b5a8092aae02a42ede7bfd44
2018-12-06 22:52:22 -08:00
4655b7bc4b Remove weak module test expect files (#14871)
Summary:
This PR removes some expect files that aren't really testing anything
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14871

Differential Revision: D13373762

Pulled By: driazati

fbshipit-source-id: e3537ee83df23b3b3b854f9b1253fd0cc8e9dd33
2018-12-06 21:55:12 -08:00
1a247f872f gradcheck (#14596)
Summary:
- allow gradcheck to take sparse tensor as input
- sparse output is not allowed yet at gradcheck
- add backward for `to_dense()` to get around sparse output
- calling gradcheck at test_sparse, so that we can use `_gen_sparse()` and also easily cover coalesced / uncoalesced test cases
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14596

Differential Revision: D13271904

Pulled By: weiyangfb

fbshipit-source-id: 5317484104404fd38058884c86e987546011dd86
2018-12-06 18:03:38 -08:00
bfa666eb0d Skipping two c10d tests only if there are multi-GPUs (#14860)
Summary:
Otherwise, these tests will fail, even though they are never meant to run on single-GPU machines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14860

Differential Revision: D13369060

Pulled By: teng-li

fbshipit-source-id: 8a637a6d57335491ba8602cd09927700b2bbf8a0
2018-12-06 17:28:07 -08:00
ada8f828f9 Move TensorOptions, DefaultTensorOptions to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14746

Reviewed By: ezyang

Differential Revision: D13318644

fbshipit-source-id: b703d7dc67e75d9e9571c80d62a100c5fc4e84df
2018-12-06 15:59:04 -08:00
bd3eb87258 Switch Int8MaxPool operator to QNNPACK (#14832)
Summary:
1.6-2.4X speedup on ARM when compiled with gcc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14832

Differential Revision: D13358160

Pulled By: Maratyszcza

fbshipit-source-id: 39e9791886fac62650bb53a9df341889f0bb5d49
2018-12-06 15:14:28 -08:00
e6a420114f collect_env.py: get conda magma and mkl information (#14854)
Summary:
Fixes #12371
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14854

Differential Revision: D13363635

Pulled By: zou3519

fbshipit-source-id: f8b5d05038bf5ce451399dfeed558ae298178128
2018-12-06 14:58:14 -08:00
ddca0442b6 Add LogSigmoid support in ONNX symbolic (#14830)
Summary:
Add LogSigmoid:

torch.LogSigmoid(x) = onnx.Log(onnx.Sigmoid(x))
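A minimal numeric sketch of that composition in plain Python (not the ONNX graph itself):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_sigmoid(x):
    # LogSigmoid exported as the composition Log(Sigmoid(x)).
    return math.log(sigmoid(x))
```

Note that composing Log and Sigmoid this way can be less numerically stable for large negative inputs than a fused logsigmoid implementation.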
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14830

Differential Revision: D13353891

Pulled By: zrphercule

fbshipit-source-id: bf456170b9e6c4edad07b3333cd5797f8e0fa97f
2018-12-06 14:17:33 -08:00
5f0bff9639 Kill GPU memory logs in normal runs (#14838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14838

The GPU memory tracking logs are incredibly annoying and merely serve
to pollute output. I `VLOG(1)`ed them. Hopefully, this is non-controversial.

Reviewed By: kuttas

Differential Revision: D13343290

fbshipit-source-id: b3cae99346c97b66e97ea660061e15dc5c99b9fc
2018-12-06 13:51:14 -08:00
f82f4de229 Stop inserting static casts in Hipify (#14853)
Summary:
The latest hcc can now properly cast to the correct type internally, so there is no need to insert static_cast in the hipify scripts anymore.
However, the hcc included in the latest ROCm release (1.9.2) doesn't have this fix, so we leave a flag to continue doing static_cast for those using the official ROCm releases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14853

Differential Revision: D13363171

Pulled By: bddppq

fbshipit-source-id: a36476a8511222ff3c933d31788e8a0ffb04f5ca
2018-12-06 13:19:33 -08:00
b5db6ac9f1 Tensor construction codemod - 3/3 (#14835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14835

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335184

fbshipit-source-id: 26d8247e16b30bdff045530034af9b72c76d066f
2018-12-06 11:50:59 -08:00
20d1bff292 Tensor construction codemod - 1/3 (#14828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14828

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335160

fbshipit-source-id: a3ae4c5a86bfbdaf2d5aa14e0eef57255e829fd4
2018-12-06 11:47:32 -08:00
1d111853ae Move numa.{h, cc} to c10/util (#14393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14393

att

Reviewed By: ezyang

Differential Revision: D13205604

fbshipit-source-id: 54166492d31827b0343ed070cc36a825dd86e2ed
2018-12-06 11:30:13 -08:00
75a2d8e2de Upgrade CI to ROCm 1.9.2 (#14216)
Summary:
Drop the custom hcc/hip, as the 1.9.2 release should contain the relevant patches.

The most notable feature in 1.9.2 is mixed-precision support in rocBLAS and MIOpen. These features will be enabled by subsequent PRs.

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14216

Differential Revision: D13354294

Pulled By: bddppq

fbshipit-source-id: 2541d4a196af21c9432c1aff7f6e65b572628028
2018-12-06 10:13:39 -08:00
1c8d41a08d Allow linspace and logspace with steps=1 and start != end like numpy (#14748)
Summary:
`torch.linspace(0, 1, 1)` fails with `RuntimeError: invalid argument 3: invalid number of points at ../aten/src/TH/generic/THTensorMoreMath.cpp:2119`, while `np.linspace(0, 1, 1)` works fine.
Looking at the code, there is even a comment by gchanan asking: "NumPy allows you to pass different points even if n <= 1 -- should we?"
I would say "yes". Currently, I would need to handle the case of `steps == 1` or `steps == 0` separately, making sure to change the `end` when calling `torch.linspace`. This is impractical. If we support `start != end`, there are two possibilities for the result: Either we ensure the first value in the resulting sequence always equals `start`, or we ensure the last value in the resulting sequence always equals `end`. Numpy chose the former, which also allows it to support a boolean `endpoint` flag. I'd say we should follow numpy.

This PR adapts `linspace` and `logspace` to mimic the behavior of numpy, adapts the tests accordingly, and extends the docstrings to make clear what happens when passing `steps=1`.

If you decide against this PR, the error message should become explicit about what I did wrong, and the documentation should be extended to mention this restriction.
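The NumPy behavior being proposed can be sketched in plain Python (an illustrative model, not the actual implementation):

```python
def linspace(start, end, steps):
    # Mimics np.linspace semantics: steps == 1 yields [start]
    # regardless of end; steps == 0 yields an empty sequence.
    if steps <= 0:
        return []
    if steps == 1:
        return [float(start)]
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]
```

Under these semantics the first value always equals `start`, and for `steps > 1` the last value equals `end`.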
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14748

Differential Revision: D13356136

Pulled By: ezyang

fbshipit-source-id: db85b8f0a98a5e24b3acd766132ab71c91794a82
2018-12-06 09:30:55 -08:00
Jie
d2fdc33411 (#14580)
Summary:
Removes the cast of half to float in torch.sum for the case of a float16 input tensor and a float32 output tensor; instead, we cast the data when loading the input in the kernel.

This should save a kernel launch as well as a full global memory load of the promoted data type (float).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14580

Differential Revision: D13356203

Pulled By: ezyang

fbshipit-source-id: 85e91225b880a65fe3ceb493371b9b36407fdf48
2018-12-06 09:03:46 -08:00
eb3cabffd6 Consistent formatting in losses' docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14739

Differential Revision: D13356143

Pulled By: ezyang

fbshipit-source-id: 9ae8316dd8ba6e910247b64cec22db63df10e11c
2018-12-06 09:01:24 -08:00
2e7cc86a62 Add (partial) autodiff support for nll_loss (#14305)
Summary:
Not ready yet, need some comments / help with this. It's good enough for the immediate goals of https://github.com/pytorch/xla (forward + backward trace fusion), but there are at least two issues with it:
1. If we don't allow it, `test/test_jit.py` fails to cover the change.
2. If we allow the weight to be set, running `test/test_jit.py TestJitGenerated.test_nn_nll_loss` fails with:

```
======================================================================
ERROR: test_nn_nll_loss (__main__.TestJitGenerated)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_jit.py", line 10001, in do_test
    fn, f_args_variable, kwargs_variable, no_grad=no_grad)
  File "test/test_jit.py", line 9360, in check_against_reference
    outputs_test = self.runAndSaveRNG(func, recording_inputs, kwargs)
  File "test/test_jit.py", line 425, in runAndSaveRNG
    results = func(*inputs, **kwargs)
  File "test/test_jit.py", line 9298, in script_fn
    self.assertExportImport(CU.the_method.graph, tensors)
  File "test/test_jit.py", line 415, in assertExportImport
    self.assertExportImportModule(m, inputs)
  File "test/test_jit.py", line 419, in assertExportImportModule
    self.assertEqual(self.runAndSaveRNG(m.forward, inputs),
  File "test/test_jit.py", line 425, in runAndSaveRNG
    results = func(*inputs, **kwargs)
RuntimeError:
arguments for call are not valid:

  for operator aten::nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight, *, Tensor out) -> Tensor:
  expected a value of type Tensor for argument 'total_weight' but found bool
  <internally-created-node>
  ~ <--- HERE

  for operator aten::nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight) -> Tensor:
  expected a value of type Tensor for argument 'total_weight' but found bool
  <internally-created-node>
  ~ <--- HERE
for call at:
<internally-created-node>
~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14305

Differential Revision: D13356265

Pulled By: ezyang

fbshipit-source-id: 504d783b2d87f923e698a6a4efc0fd9935a94a41
2018-12-06 08:58:54 -08:00
e7bd8457a6 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 2adbb6f97d4b8f067a2538fec855063510b0ca3f
2018-12-06 08:58:53 -08:00
6039c7611f Updating submodules
Reviewed By: yns88

fbshipit-source-id: e0509413215f3b7578b825c52365fec4da625bd5
2018-12-06 02:55:47 -08:00
12addc64a6 Fixed MIOpen RNN Segfault issue and enabled RNN test (#14810)
Summary:
This pull request contains changes for:
1. Added MIOpen RNN API miopenGetRNNLayerBiasSize and miopenGetRNNLayerParamSize.
2. Fixed usage of API miopenGetRNNLayerParam.
3. Modifying the RNN test to run using MIOpen engine.

Differential Revision: D13355699

Pulled By: bddppq

fbshipit-source-id: 6f750657f8049c5446eca893880b397804120b69
2018-12-05 23:54:31 -08:00
39d50ef4f6 Export complete subgraph io info when calling onnxGetBackendCompatibility (#14827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14827

We need to send complete IO info when doing `onnxGetBackendCompatibility` to a backend like Glow. Previously we were missing some info because sometimes we generate more than one node from one C2 op. This fixes the issue.

Reviewed By: jackm321

Differential Revision: D13352049

fbshipit-source-id: 8d8ac70656a0ac42f3a0ccecad61456a4f3b2435
2018-12-05 23:52:06 -08:00
ba287eebca Fix clip gradient with empty input (#14709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14709

As titled

Reviewed By: Wakeupbuddy

Differential Revision: D13305554

fbshipit-source-id: 380062d4b0e4f9dc0207a27766cac7b8d05384d5
2018-12-05 22:53:25 -08:00
997df9a6ec Remove protobuf dependency in pytorch cmake file. (#14182)
Summary:
Currently, PyTorch doesn't depend on protobuf, so we don't need to include the protobuf dir in the PyTorch cmake file.
Also, if we build caffe2 without custom protobuf[1], we will hit a protobuf mismatch problem.

[1]
92dbd0219f/CMakeLists.txt (L65)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14182

Differential Revision: D13356273

Pulled By: ezyang

fbshipit-source-id: 8120c3452d158dc51d70156433d7b9076c6aed47
2018-12-05 22:49:50 -08:00
3799d32b7b Optimize images (#14084)
Summary:
This is a PR that [ImgBot](https://imgbot.net/) opened on my fork https://github.com/zasdfgbnm/pytorch/pull/1; I'm forwarding it here. ImgBot does lossless compression on images to reduce file size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14084

Differential Revision: D13356293

Pulled By: ezyang

fbshipit-source-id: 731236d95ad870db8ccb99b03ed306704365242c
2018-12-05 22:46:32 -08:00
e27d77815d Prevent profile_observer_test from being run by CPU test (#14168)
Summary:
Fix CMakeLists.txt so the CPU test won't run profile_observer_test.cc, as it currently only supports GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14168

Differential Revision: D13356274

Pulled By: ezyang

fbshipit-source-id: 7d105f2e18675e5fab129864958148b0f18d582c
2018-12-05 22:34:29 -08:00
14fb651b5f CAFFE2_INCLUDE_DIRS points to invalid path (#14306)
Summary:
I know that including CAFFE2_INCLUDE_DIRS in the include paths is not necessary for newer cmakes. But I had this in one of my old projects and **cmake gave me an error that "/usr/lib/include" is an invalid path**.

It seems like "${_INSTALL_PREFIX}/lib/include" should be changed to "${_INSTALL_PREFIX}/include", as all caffe2 headers are in /include rather than /lib/include/.

Please correct me if I am wrong?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14306

Differential Revision: D13356246

Pulled By: ezyang

fbshipit-source-id: e2d5d3c42352e59b245714ad90fd7a9ef48170d7
2018-12-05 22:32:04 -08:00
5e307bd1be use "Extension" instead of the unimported "setuptools.Extension" (#14475)
Summary:
use "Extension" instead of the unimported "setuptools.Extension"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14475

Differential Revision: D13356219

Pulled By: ezyang

fbshipit-source-id: 5a3e7eb73a32d6bf09676efd9eddded5586435cd
2018-12-05 22:18:47 -08:00
d393dd0744 generate ATen core files with LF. (#14667)
Summary:
On a Windows environment, some ATen core files (Type.h, Tensor.h, TensorMethods.h) are generated with CRLF newlines (maybe environment dependent).
Therefore, the file comparison in generate_outputs() fails and compilation stops.
This patch generates these files with LF forcibly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14667

Differential Revision: D13356170

Pulled By: ezyang

fbshipit-source-id: ef8cc3a6cc8bf3c45b78e9eb3df98cf47c0d33bb
2018-12-05 22:14:29 -08:00
2d60afbc90 Remove outdated css file and refs in cpp conf.py (#14779)
Summary:
pytorch_theme.css is no longer necessary for the cpp or html docs site build. The new theme styles are located at https://github.com/pytorch/pytorch_sphinx_theme. The Lato font is also no longer used in the new theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14779

Differential Revision: D13356125

Pulled By: ezyang

fbshipit-source-id: c7635eb7512c7dcaddb9cad596ab3dbc96480144
2018-12-05 21:55:45 -08:00
82903dda9b Fixes for some Windows compiler warnings (#14490)
Summary:
Implement some simple fixes to clean up windows build by fixing compiler warnings. Three main types of warnings were fixes:

1. GCC specific pragmas were changed to not be used on windows.
2. cmake flags that don't exist on windows were removed from windows build
3. Fix a macro that was defined multiple times on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14490

Differential Revision: D13241988

Pulled By: ezyang

fbshipit-source-id: 38da8354f0e3a3b9c97e33309cdda9fd23c08247
2018-12-05 21:27:07 -08:00
a6399121da Shut up "address will always evaluate to 'true'" warnings (#14774)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14774

Differential Revision: D13327969

Pulled By: ezyang

fbshipit-source-id: 43380c89eedaaa89467952401b8fd3f5a9ad754a
2018-12-05 21:18:31 -08:00
f9446e0c94 HIPify less files in PyTorch (#14804)
Summary:
Stacked on #14803
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14804

Differential Revision: D13347986

Pulled By: ezyang

fbshipit-source-id: c93177b4ad51855660d0de36d042bfc542bd4be0
2018-12-05 20:52:38 -08:00
ba0ebe33c1 Unify device argument parsing between torch and c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14786

Differential Revision: D13334501

Pulled By: bddppq

fbshipit-source-id: ae3536be1fe0dcd6a1552ec93629ecc9554c0d7c
2018-12-05 18:37:32 -08:00
252e9058d4 Improve assertion failure message (#14813)
Summary:
See #14554.

I can't figure out how the reported issue can happen. The best next
thing is have more information when this happens again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14813

Differential Revision: D13351908

Pulled By: pietern

fbshipit-source-id: 61b30fcae2e34da54329d0893ca4921b6ad60f0d
2018-12-05 17:20:25 -08:00
83ad52634a Add FunctionSchema based Operator Registry (#13789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13789

This enables creation of operators with FunctionSchema and IValue

Reviewed By: smessmer

Differential Revision: D13008791

fbshipit-source-id: 151efc88ac315f4a0ab0171a99774caaf767ef1e
2018-12-05 17:20:24 -08:00
67dcf10631 Increase test timeout (#14814)
Summary:
It is possible that some sort of contention causes process scheduling
delays which in turn cause the timeout to *not* be hit.

Increased sleep here will decrease the probability of this happening.

Fixes #14555.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14814

Differential Revision: D13351924

Pulled By: pietern

fbshipit-source-id: 1222cf0855408dfcb79f30f94694c790ee998cf9
2018-12-05 17:18:11 -08:00
c02b3e7cea Retry test on address already in use error (#14815)
Summary:
Thanks nairbv for the suggestion.

Also see #14589.

Fixes #14703.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14815

Differential Revision: D13351913

Pulled By: pietern

fbshipit-source-id: d11a4152505d0ce15592b13e417bb80551476a61
2018-12-05 17:09:46 -08:00
6fccca4278 improve ONNX tests on torch.Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14821

Reviewed By: zrphercule

Differential Revision: D13348773

Pulled By: houseroad

fbshipit-source-id: 611ca6e28f715e5518649c8c16f702ac3433308c
2018-12-05 17:07:10 -08:00
4357 changed files with 319194 additions and 220037 deletions

2
.circleci/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.svg
*.png

476
.circleci/README.md Normal file

@ -0,0 +1,476 @@
Structure of CI
===============
setup job:
1. Does a git checkout
2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why?
We don't always do a Git checkout on all subjobs, but we usually
still want to be able to call scripts one way or another in a subjob.
Persisting files this way lets us have access to them without doing a
checkout. This workspace is conventionally mounted on `~/workspace`
(this is distinguished from `~/project`, which is the conventional
working directory that CircleCI will default to starting your jobs
in.)
3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so
we can determine in subjobs if we should actually run the jobs or
not, even if there isn't a Git checkout.
CircleCI configuration generator
================================
One may no longer make changes to the `.circleci/config.yml` file directly.
Instead, one must edit these Python scripts or files in the `verbatim-sources/` directory.
Usage
----------
1. Make changes to these scripts.
2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`.
You'll see a build failure on TravisCI if the scripts don't agree with the checked-in version.
Motivation
----------
These scripts establish a single, authoritative source of documentation for the CircleCI configuration matrix.
The documentation, in the form of diagrams, is automatically generated and cannot drift out of sync with the YAML content.
Furthermore, consistency is enforced within the YAML config itself, by using a single source of data to generate
multiple parts of the file.
* Facilitates one-off culling/enabling of CI configs for testing PRs on special targets
Also see https://github.com/pytorch/pytorch/issues/17038
Future direction
----------------
### Declaring sparse config subsets
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
----------------
# How do the binaries / nightlies / releases work?
### What is a binary?
A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source.
A **binary configuration** is a collection of
* release or nightly
* releases are stable, nightlies are beta and built every night
* python version
* linux: 2.7m, 2.7mu, 3.5m, 3.6m, 3.7m (mu is wide unicode or something like that. It usually doesn't matter, but you should know that it exists)
* macos and windows: 2.7, 3.5, 3.6, 3.7
* cpu version
* cpu, cuda 9.0, cuda 10.0
* The supported cuda versions occasionally change
* operating system
* Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu
* MacOS
* Windows - these are built on Azure pipelines
* devtoolset version (gcc compiler version)
* This only matters on Linux because only Linux uses gcc. tl;dr is that gcc made a backwards-incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
### Where are the binaries?
The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months.
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f <a s3 url>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
* shared with dependencies
* static with dependencies
* shared without dependencies
* static without dependencies
All binaries are built in CircleCI workflows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release)
# CircleCI structure of the binaries
Some quick vocab:
* A **workflow** is a CircleCI concept; it is a DAG of '**jobs**'. Ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments; environment variables declared in one step DO NOT persist to following steps.*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
1. binarybuilds
1. every day midnight EST
2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. binary_linux_conda_3.7_cpu_build
1. Builds the package. On linux jobs this uses the 'docker executor'.
2. Persists the package to the workspace
2. binary_linux_conda_3.7_cpu_test
1. Loads the package to the workspace
2. Spins up a docker image (on Linux), mapping the package and code repos into the docker
3. Runs some smoke tests in the docker
4. (Actually, for macos this is a step rather than a separate job)
3. binary_linux_conda_3.7_cpu_upload
1. Logs in to aws/conda
2. Uploads the package
2. update_s3_htmls
1. every day 5am EST
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
3. See below for what these are for and why they're needed
4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
3. binarysmoketests
1. every day
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. smoke_linux_conda_3.7_cpu
1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
2. Runs the smoke tests
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* binary_linux_build.sh
* binary_linux_test.sh
* binary_linux_upload.sh
* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
* binary_macos_build.sh
* binary_macos_test.sh
* binary_macos_upload.sh
* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh
* https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh
* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/master/run_tests.sh
* https://github.com/pytorch/builder/blob/master/smoke_test.sh
* https://github.com/pytorch/builder/blob/master/check_binary.sh
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
* binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
* binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
* binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables
* binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image
### **Why do the steps all refer to scripts?**
CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems.
### **What is binary_run_in_docker for?**
So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
* linux test jobs use the machine executor and spin up their own docker. Why this nonsense? It's because we run nvidia-docker for our GPU tests; any code that calls into the CUDA runtime needs to be run on nvidia-docker. To run nvidia-docker you need to install some nvidia packages on the host machine and then call docker with the '--runtime=nvidia' argument. CircleCI doesn't support this, so we have to do it ourselves.
* This is not just a mere inconvenience. **This blocks all of our linux tests from using more than 2 cores.** There is nothing we can do about it but wait for a fix on CircleCI's side. Right now we only run some smoke tests (some simple imports) on the binaries, but this also affects non-binary test jobs.
* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs
### **Why does binary_checkout also checkout pytorch? Why shouldn't it?**
We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later circleci changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to take a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to checkout the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke from missing pytorch code no longer existing. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there's two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where.
# Code structure of the binaries (circleci agnostic)
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder) , which is a repo that defines how all the binaries are built. The relevant code is
```
# All code needed to set-up environments for build code to run in,
# but only code that is specific to the current CI system
pytorch/pytorch
- .circleci/ # Folder that holds all circleci related stuff
- config.yml # GENERATED file that actually controls all circleci behavior
- verbatim-sources # Used to generate job/workflow sections in ^
- scripts/ # Code needed to prepare circleci environments for binary build scripts
- setup.py # Builds pytorch. This is wrapped in pytorch/builder
- cmake files # used in normal building of pytorch
# All code needed to prepare a binary build, given an environment
# with all the right variables/packages/paths.
pytorch/builder
# Given an installed binary and a proper python env, runs some checks
# to make sure the binary was built the proper way. Checks things like
# the library dependencies, symbols present, etc.
- check_binary.sh
# Given an installed binary, runs python tests to make sure everything
# is in order. These should be de-duped. Right now they both run smoke
# tests, but are called from different places. Usually just call some
# import statements, but also has overlap with check_binary.sh above
- run_tests.sh
- smoke_test.sh
# Folders that govern how packages are built. See paragraphs below
- conda/
- build_pytorch.sh # Entrypoint. Delegates to proper conda build folder
- switch_cuda_version.sh # Switches the active CUDA installation in Docker
- pytorch-nightly/ # Build-folder
- manywheel/
- build_cpu.sh # Entrypoint for cpu builds
- build.sh # Entrypoint for CUDA builds
- build_common.sh # Actual build script that ^^ call into
- wheel/
- build_wheel.sh # Entrypoint for wheel builds
```
Every type of package has an entrypoint build script that handles all the important logic.
## Conda
Both Linux and MacOS use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies what python environment to build the package in and what dependencies the resulting package should have, and the build script gets called in that env to build the thing.
tldr; on conda-build is
1. Creates a brand new conda environment, based off of deps in the meta.yaml
1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is.
2. Calls build.sh in the environment
3. Copies the finished package to a new conda env, also specified by the meta.yaml
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
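A minimal sketch of the meta.yaml described above may make this concrete. The field names follow conda-build's meta.yaml schema, but the package name, version string, and env var shown are made up for illustration and are not the actual contents of pytorch-nightly/meta.yaml:

```yaml
package:
  name: pytorch-nightly
  version: "1.0.0.dev0"     # illustrative version string

requirements:
  build:                    # the fresh env that build.sh runs in (step 1)
    - python
    - setuptools
    - cmake
  run:                      # deps of the finished package (steps 3-5)
    - python
    - numpy

test:
  imports:                  # the "simple import tests" from step 4
    - torch

build:
  script_env:               # env vars explicitly forwarded into the build env;
    - TORCH_PACKAGE_NAME    # anything not listed here does NOT get passed in
```

Note how `script_env` corresponds to the caveat in step 1.1: environment variables do not reach the build env unless the meta.yaml names them.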
The build.sh we use is essentially a wrapper around ```python setup.py build``` , but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
The entrypoint file `builder/conda/build_conda.sh` is complicated because
* It works for both Linux and MacOS
* The mac builds used to create their own environments, since they all used to be on the same machine. There's now a lot of extra logic to handle conda envs. This extra machinery could be removed.
* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed.
## Manywheels (linux pip and libtorch packages)
Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
The entrypoint file `builder/manywheel/build_common.sh` is really, really complicated because
* This used to handle building for several different python versions at the same time. The loops have been removed, but there are still unnecessary folders and movements here and there.
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there's testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
## Wheels (MacOS pip and libtorch packages)
The entrypoint file `builder/wheel/build_wheel.sh` is complicated because
* The mac builds used to all run on one machine (we didn't have autoscaling mac machines until CircleCI). So this script handled siloing itself, by setting up and tearing down its own build env and siloing itself into its own build directory.
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* Ditto the comment above. This should definitely be separated out.
Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## General notes
### Note on run_tests.sh, smoke_test.sh, and check_binary.sh
* These should all be consolidated
* These must run on all OS types: MacOS, Linux, and Windows
* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn't mess anything up.
* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
### Note on libtorch
Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this
* It's confusing. Most of those scripts deal with python specifics.
* The extra conditionals everywhere severely complicate the wheel build scripts
* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script)
### Note on docker images / Dockerfiles
All linux builds occur in docker images. The docker images are
* soumith/conda-cuda
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* soumith/manylinux-cuda90
* soumith/manylinux-cuda92
* soumith/manylinux-cuda100
* Also used for cpu builds
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
### General Python
* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2
# How to manually rebuild the binaries
tldr; make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using ```.circleci/regenerate.sh``` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```
# Make your changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
# Regenerate the yaml, has to be in python 3.7
.circleci/regenerate.sh
# Make a commit
git add .circleci *
git commit -m "My real changes"
git push origin my_branch
# Now hardcode the jobs that you want in the .circleci/config.yml workflows section
# Also eliminate ensure-consistency and should_run_job checks
# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d
# Make a commit you won't keep
git add .circleci
git commit -m "[DO NOT LAND] testing binaries for above changes"
git push origin my_branch
# Now you need to make some changes to the first commit.
git rebase -i HEAD~2 # mark the first commit as 'edit'
# Make the changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
.circleci/regenerate.sh
# Amend the commit and continue the rebase
git add .circleci
git commit --amend
git rebase --continue
# Update the PR, need to force since the commits are different now
git push origin my_branch --force
```
The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci without having to re-write which binary jobs you want to test on. The downside is that all updates will be force pushes.
## How to build a binary locally
### Linux
You can build Linux binaries locally easily using docker.
```
# Run the docker
# Use the correct docker image, soumith/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
# container at path/to/bar. So if you then run `touch path/to/bar/baz`
# in the docker container then you will see path/to/foo/baz on your local
# machine. You could also clone the pytorch and builder repos in the docker.
#
# If you're building a CUDA binary then use `nvidia-docker run` instead, see below.
#
# If you know how, add ccache as a volume too and speed up everything
docker run \
-v your/pytorch/repo:/pytorch \
-v your/builder/repo:/builder \
-v where/you/want/packages/to/appear:/final_pkgs \
-it soumith/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint
# `|& tee foo.log` just copies all stdout and stderr output to foo.log
# The builds generate lots of output so you probably need this when
# building locally.
/builder/conda/build_pytorch.sh |& tee build_output.log
```
**Building CUDA binaries on docker**
To build a CUDA binary you need to use `nvidia-docker run` instead of just `docker run` (or you can manually pass `--runtime=nvidia`). This adds some needed libraries and things to build CUDA stuff.
You can build CUDA binaries on CPU-only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary in a docker on your laptop if you so choose (though it's going to take a long time).
For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
### MacOS
There's no easy way to generate reproducible, hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will probably interfere with the build. If you're trying to repro an error on a Mac build in .circleci and you can't seem to repro locally, then my best advice is actually to iterate on .circleci :/
But if you want to try, then I'd recommend
```
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do
# Install a new miniconda
# First remove any other python or conda installation from your PATH
# Always install miniconda 3, even if building for Python <3
new_conda="~/my_new_conda"
conda_sh="$new_conda/install_miniconda.sh"
curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"
rm -f "$conda_sh"
export PATH="~/my_new_conda/bin:$PATH"
# Create a clean python env
# All MacOS builds use conda to manage the python env and dependencies
# that are built with, even the pip packages
conda create -yn binary python=2.7
conda activate binary
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint you want
path/to/builder/wheel/build_wheel.sh
```
N.B. installing a brand new miniconda is important. This has to do with how conda installations work. See the “General Python” section above, but the tl;dr is that
1. You make the conda command accessible by prepending `path/to/conda_root/bin` to your PATH.
2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
3. Now say you (or some code that you ran) call python executable `foo`
1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
2. But if you forgot to install `foo` in `new_env` but happened to previously install it in your root conda env (called base), then unix/linux will still find `path/to/conda_root/bin/foo`. This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version!
Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe.
### Windows
Maybe @peterjc123 can fill this section in.


@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
This module models the tree of configuration variants
for "smoketest" builds.
Each subclass of ConfigNode represents a layer of the configuration hierarchy.
These tree nodes encapsulate the logic for whether a branch of the hierarchy
should be "pruned".
In addition to generating config.yml content, the tree is also traversed
to produce a visualization of config dimensions.
"""
from collections import OrderedDict
from cimodel.lib.conf_tree import ConfigNode
import cimodel.data.dimensions as dimensions
LINKING_DIMENSIONS = [
"shared",
"static",
]
DEPS_INCLUSION_DIMENSIONS = [
"with-deps",
"without-deps",
]
def get_processor_arch_name(cuda_version):
return "cpu" if not cuda_version else "cu" + cuda_version
LINUX_PACKAGE_VARIANTS = OrderedDict(
manywheel=[
"2.7m",
"2.7mu",
"3.5m",
"3.6m",
"3.7m",
],
conda=dimensions.CONDA_PYTHON_VERSIONS,
libtorch=[
"2.7m",
],
)
CONFIG_TREE_DATA = OrderedDict(
linux=(dimensions.CUDA_VERSIONS, LINUX_PACKAGE_VARIANTS),
macos=([None], OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.CONDA_PYTHON_VERSIONS,
libtorch=[
"2.7",
],
)),
)
# Why is this an option?
# All the nightlies used to be devtoolset3 and built with the old gcc ABI. We
# added a devtoolset7 option so that we could build nightlies with the new gcc
# ABI. That didn't work since devtoolset7 can't build with the new gcc ABI. But
# then we set devtoolset7 to be the default anyway, since devtoolset7
# understands avx512, which is needed for good fbgemm performance.
# This should be removed. The base dockers should just be upgraded to
# devtoolset7 so we don't have to reinstall this in every build job.
# The same machinery that this uses, though, should be retooled for a different
# compiler toolchain that can build with the new gcc ABI.
DEVTOOLSET_VERSIONS = [
7,
]
class TopLevelNode(ConfigNode):
def __init__(self, node_name, config_tree_data, smoke):
super(TopLevelNode, self).__init__(None, node_name)
self.config_tree_data = config_tree_data
self.props["smoke"] = smoke
def get_children(self):
return [OSConfigNode(self, x, c, p) for (x, (c, p)) in self.config_tree_data.items()]
class OSConfigNode(ConfigNode):
def __init__(self, parent, os_name, cuda_versions, py_tree):
super(OSConfigNode, self).__init__(parent, os_name)
self.py_tree = py_tree
self.props["os_name"] = os_name
self.props["cuda_versions"] = cuda_versions
def get_children(self):
packaging_variants = [PackageFormatConfigNode(self, k, v) for k, v in self.py_tree.items()]
if self.find_prop("smoke"):
filtered_packaging_variants = list(filter(lambda x: x.get_label() != "libtorch", packaging_variants))
return filtered_packaging_variants
else:
return packaging_variants
class PackageFormatConfigNode(ConfigNode):
def __init__(self, parent, package_format, python_versions):
super(PackageFormatConfigNode, self).__init__(parent, package_format)
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
def get_children(self):
if self.find_prop("os_name") == "linux":
return [LinuxGccConfigNode(self, v) for v in DEVTOOLSET_VERSIONS]
else:
return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
class LinuxGccConfigNode(ConfigNode):
def __init__(self, parent, devtoolset_version):
super(LinuxGccConfigNode, self).__init__(parent, "DEVTOOLSET=" + str(devtoolset_version))
self.props["devtoolset_version"] = devtoolset_version
def get_children(self):
cuda_versions = self.find_prop("cuda_versions")
# XXX devtoolset7 on CUDA 9.0 is temporarily disabled
# see https://github.com/pytorch/pytorch/issues/20066
if self.find_prop("devtoolset_version") == 7:
cuda_versions = filter(lambda x: x != "90", cuda_versions)
return [ArchConfigNode(self, v) for v in cuda_versions]
class ArchConfigNode(ConfigNode):
def __init__(self, parent, cu):
super(ArchConfigNode, self).__init__(parent, get_processor_arch_name(cu))
self.props["cu"] = cu
def get_children(self):
return [PyVersionConfigNode(self, v) for v in self.find_prop("python_versions")]
class PyVersionConfigNode(ConfigNode):
def __init__(self, parent, pyver):
super(PyVersionConfigNode, self).__init__(parent, pyver)
self.props["pyver"] = pyver
def get_children(self):
smoke = self.find_prop("smoke")
package_format = self.find_prop("package_format")
os_name = self.find_prop("os_name")
has_libtorch_variants = package_format == "libtorch" and os_name == "linux"
linking_variants = LINKING_DIMENSIONS if has_libtorch_variants else []
return [LinkingVariantConfigNode(self, v) for v in linking_variants]
class LinkingVariantConfigNode(ConfigNode):
def __init__(self, parent, linking_variant):
super(LinkingVariantConfigNode, self).__init__(parent, linking_variant)
def get_children(self):
return [DependencyInclusionConfigNode(self, v) for v in DEPS_INCLUSION_DIMENSIONS]
class DependencyInclusionConfigNode(ConfigNode):
def __init__(self, parent, deps_variant):
super(DependencyInclusionConfigNode, self).__init__(parent, deps_variant)
self.props["libtorch_variant"] = "-".join([self.parent.get_label(), self.get_label()])
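For reference, the build-name suffix logic in `get_processor_arch_name` from the file above behaves like this (the function is copied verbatim and is runnable standalone):

```python
# Copied from binary_build_data.py above: a missing CUDA version maps to
# "cpu"; otherwise the version string is prefixed with "cu".
def get_processor_arch_name(cuda_version):
    return "cpu" if not cuda_version else "cu" + cuda_version

print(get_processor_arch_name(None))   # cpu
print(get_processor_arch_name("100"))  # cu100
```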

@@ -0,0 +1,213 @@
#!/usr/bin/env python3
from collections import OrderedDict
import cimodel.data.binary_build_data as binary_build_data
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
class Conf(object):
def __init__(self, os, cuda_version, pydistro, parms, smoke, libtorch_variant, devtoolset_version):
self.os = os
self.cuda_version = cuda_version
self.pydistro = pydistro
self.parms = parms
self.smoke = smoke
self.libtorch_variant = libtorch_variant
self.devtoolset_version = devtoolset_version
def gen_build_env_parms(self):
elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.cuda_version)]
if self.devtoolset_version is not None:
elems.append("devtoolset" + str(self.devtoolset_version))
return elems
def gen_docker_image(self):
docker_word_substitution = {
"manywheel": "manylinux",
"libtorch": "manylinux",
}
docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
# The cpu nightlies are built on the soumith/manylinux-cuda100 docker image
alt_docker_suffix = self.cuda_version or "100"
docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
return miniutils.quote("soumith/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
def get_name_prefix(self):
return "smoke" if self.smoke else "binary"
def gen_build_name(self, build_or_test):
parts = [self.get_name_prefix(), self.os] + self.gen_build_env_parms()
if self.libtorch_variant:
parts.append(self.libtorch_variant)
if not self.smoke:
parts.append(build_or_test)
return "_".join(parts)
def gen_yaml_tree(self, build_or_test):
env_tuples = [("BUILD_ENVIRONMENT", miniutils.quote(" ".join(self.gen_build_env_parms())))]
if self.libtorch_variant:
env_tuples.append(("LIBTORCH_VARIANT", miniutils.quote(self.libtorch_variant)))
os_name = miniutils.override(self.os, {"macos": "mac"})
d = {"<<": "*" + "_".join([self.get_name_prefix(), os_name, build_or_test])}
if build_or_test == "test":
if not (self.smoke and self.os == "macos"):
env_tuples.append(("DOCKER_IMAGE", self.gen_docker_image()))
if self.cuda_version:
env_tuples.append(("USE_CUDA_DOCKER_RUNTIME", miniutils.quote("1")))
else:
if self.os == "linux" and build_or_test != "upload":
d["docker"] = [{"image": self.gen_docker_image()}]
d["environment"] = OrderedDict(env_tuples)
if build_or_test == "test":
if self.cuda_version:
d["resource_class"] = "gpu.medium"
return d
def get_root(smoke, name):
return binary_build_data.TopLevelNode(
name,
binary_build_data.CONFIG_TREE_DATA,
smoke,
)
def gen_build_env_list(smoke):
root = get_root(smoke, "N/A")
config_list = conf_tree.dfs(root)
newlist = []
for c in config_list:
conf = Conf(
c.find_prop("os_name"),
c.find_prop("cu"),
c.find_prop("package_format"),
[c.find_prop("pyver")],
c.find_prop("smoke"),
c.find_prop("libtorch_variant"),
c.find_prop("devtoolset_version"),
)
newlist.append(conf)
return newlist
def predicate_exclude_nonlinux_and_libtorch(config):
return config.os == "linux" and (config.smoke or config.pydistro != "libtorch")
def add_build_entries(jobs_dict, phase, smoke, filter_predicate=lambda x: True):
configs = gen_build_env_list(smoke)
for conf_options in filter(filter_predicate, configs):
jobs_dict[conf_options.gen_build_name(phase)] = conf_options.gen_yaml_tree(phase)
def add_binary_build_specs(jobs_dict):
add_build_entries(jobs_dict, "build", False)
def add_binary_build_tests(jobs_dict):
add_build_entries(jobs_dict, "test", False, predicate_exclude_nonlinux_and_libtorch)
def add_binary_build_uploads(jobs_dict):
add_build_entries(jobs_dict, "upload", False)
def add_smoke_test_specs(jobs_dict):
add_build_entries(jobs_dict, "test", True)
def get_nightly_tests():
configs = gen_build_env_list(False)
filtered_configs = filter(predicate_exclude_nonlinux_and_libtorch, configs)
tests = []
for conf_options in filtered_configs:
params = {"requires": ["setup", conf_options.gen_build_name("build")]}
tests.append({conf_options.gen_build_name("test"): params})
return tests
def get_nightly_uploads():
configs = gen_build_env_list(False)
def gen_config(conf, phase_dependency):
return {
conf.gen_build_name("upload"): OrderedDict([
("context", "org-member"),
("requires", ["setup", conf.gen_build_name(phase_dependency)]),
]),
}
mylist = []
for conf in configs:
phase_dependency = "test" if predicate_exclude_nonlinux_and_libtorch(conf) else "build"
mylist.append(gen_config(conf, phase_dependency))
return mylist
def gen_schedule_tree(cron_timing):
return [{
"schedule": {
"cron": miniutils.quote(cron_timing),
"filters": {
"branches": {
"only": ["master"],
},
},
},
}]
def add_jobs_and_render(jobs_dict, toplevel_key, smoke, cron_schedule):
jobs_list = ["setup"]
configs = gen_build_env_list(smoke)
for build_config in configs:
build_name = build_config.gen_build_name("build")
jobs_list.append({build_name: {"requires": ["setup"]}})
jobs_dict[toplevel_key] = OrderedDict(
triggers=gen_schedule_tree(cron_schedule),
jobs=jobs_list,
)
graph = visualization.generate_graph(get_root(smoke, toplevel_key))
graph.draw(toplevel_key + "-config-dimensions.png", prog="twopi")
def add_binary_build_jobs(jobs_dict):
add_jobs_and_render(jobs_dict, "binarybuilds", False, "5 5 * * *")
def add_binary_smoke_test_jobs(jobs_dict):
add_jobs_and_render(jobs_dict, "binarysmoketests", True, "15 16 * * *")
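The cron triggers added above come from `gen_schedule_tree`. This standalone sketch inlines `quote` from `cimodel.lib.miniutils` so the function runs on its own, and shows the master-only schedule fragment it produces:

```python
# quote() inlined from cimodel.lib.miniutils for a self-contained sketch.
def quote(s):
    return '"' + s + '"'

# Copied from binary_build_definitions.py above: wraps a cron spec in the
# CircleCI "schedule" trigger structure, restricted to the master branch.
def gen_schedule_tree(cron_timing):
    return [{
        "schedule": {
            "cron": quote(cron_timing),
            "filters": {
                "branches": {
                    "only": ["master"],
                },
            },
        },
    }]

print(gen_schedule_tree("5 5 * * *"))
```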

@@ -0,0 +1,109 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
from cimodel.lib.conf_tree import Ver
CONFIG_TREE_DATA = [
(Ver("ubuntu", "14.04"), [
(Ver("gcc", "4.8"), [X("py2")]),
(Ver("gcc", "4.9"), [X("py2")]),
]),
(Ver("ubuntu", "16.04"), [
(Ver("cuda", "9.0"), [
# TODO make explicit that this is a "secret TensorRT build"
# (see https://github.com/pytorch/pytorch/pull/17323#discussion_r259446749)
# TODO Uh oh, were we supposed to make this one important?!
X("py2"),
XImportant("cmake"),
]),
(Ver("cuda", "9.1"), [XImportant("py2")]),
(Ver("mkl"), [XImportant("py2")]),
(Ver("gcc", "5"), [XImportant("onnx_py2")]),
(Ver("clang", "3.8"), [X("py2")]),
(Ver("clang", "3.9"), [X("py2")]),
(Ver("clang", "7"), [XImportant("py2"), XImportant("onnx_py3.6")]),
(Ver("android"), [XImportant("py2")]),
]),
(Ver("centos", "7"), [
(Ver("cuda", "9.0"), [X("py2")]),
]),
(Ver("macos", "10.13"), [
# TODO ios and system aren't related. system qualifies where the python comes
# from (use the system python instead of homebrew or anaconda)
(Ver("ios"), [X("py2")]),
(Ver("system"), [XImportant("py2")]),
]),
]
class TreeConfigNode(ConfigNode):
def __init__(self, parent, node_name, subtree):
super(TreeConfigNode, self).__init__(parent, self.modify_label(node_name))
self.subtree = subtree
self.init2(node_name)
# noinspection PyMethodMayBeStatic
def modify_label(self, label):
return str(label)
def init2(self, node_name):
pass
def get_children(self):
return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]
def is_build_only(self):
if str(self.find_prop("language_version")) == "onnx_py3.6":
return False
return str(self.find_prop("compiler_version")) in [
"gcc4.9",
"clang3.8",
"clang3.9",
"clang7",
"android",
] or self.find_prop("distro_version").name == "macos"
class TopLevelNode(TreeConfigNode):
def __init__(self, node_name, subtree):
super(TopLevelNode, self).__init__(None, node_name, subtree)
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return DistroConfigNode
class DistroConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["distro_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return CompilerConfigNode
class CompilerConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["compiler_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return LanguageConfigNode
class LanguageConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["language_version"] = node_name
self.props["build_only"] = self.is_build_only()
def child_constructor(self):
return ImportantConfigNode
class ImportantConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["important"] = True
def get_children(self):
return []

@@ -0,0 +1,187 @@
#!/usr/bin/env python3
from collections import OrderedDict
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
from cimodel.lib.conf_tree import Ver
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
from cimodel.data.caffe2_build_data import CONFIG_TREE_DATA, TopLevelNode
from dataclasses import dataclass
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"
DOCKER_IMAGE_VERSION = 287
@dataclass
class Conf:
language: str
distro: Ver
compiler: Ver
build_only: bool
is_important: bool
# TODO: Eventually we can probably just remove the cudnn7 everywhere.
def get_cudnn_insertion(self):
omit = self.language == "onnx_py2" \
or self.language == "onnx_py3.6" \
or self.compiler.name in ["android", "mkl", "clang"] \
or str(self.distro) in ["ubuntu14.04", "macos10.13"]
return [] if omit else ["cudnn7"]
def get_build_name_root_parts(self):
return [
"caffe2",
self.language,
] + self.get_build_name_middle_parts()
def get_build_name_middle_parts(self):
return [str(self.compiler)] + self.get_cudnn_insertion() + [str(self.distro)]
def construct_phase_name(self, phase):
root_parts = self.get_build_name_root_parts()
return "_".join(root_parts + [phase]).replace(".", "_")
def get_platform(self):
platform = self.distro.name
if self.distro.name != "macos":
platform = "linux"
return platform
def gen_docker_image(self):
lang_substitutions = {
"onnx_py2": "py2",
"onnx_py3.6": "py3.6",
"cmake": "py2",
}
lang = miniutils.override(self.language, lang_substitutions)
parts = [lang] + self.get_build_name_middle_parts()
return miniutils.quote(DOCKER_IMAGE_PATH_BASE + "-".join(parts) + ":" + str(DOCKER_IMAGE_VERSION))
def gen_yaml_tree(self, phase):
tuples = []
lang_substitutions = {
"onnx_py2": "onnx-py2",
"onnx_py3.6": "onnx-py3.6",
}
lang = miniutils.override(self.language, lang_substitutions)
parts = [
"caffe2",
lang,
] + self.get_build_name_middle_parts() + [phase]
build_env = "-".join(parts)
if not self.distro.name == "macos":
build_env = miniutils.quote(build_env)
tuples.append(("BUILD_ENVIRONMENT", build_env))
if self.compiler.name == "ios":
tuples.append(("BUILD_IOS", miniutils.quote("1")))
if phase == "test":
# TODO cuda should not be considered a compiler
if self.compiler.name == "cuda":
tuples.append(("USE_CUDA_DOCKER_RUNTIME", miniutils.quote("1")))
if self.distro.name == "macos":
tuples.append(("PYTHON_VERSION", miniutils.quote("2")))
else:
tuples.append(("DOCKER_IMAGE", self.gen_docker_image()))
if self.build_only:
tuples.append(("BUILD_ONLY", miniutils.quote("1")))
d = OrderedDict({"environment": OrderedDict(tuples)})
if phase == "test":
resource_class = "large" if self.compiler.name != "cuda" else "gpu.medium"
d["resource_class"] = resource_class
d["<<"] = "*" + "_".join(["caffe2", self.get_platform(), phase, "defaults"])
return d
def get_root():
return TopLevelNode("Caffe2 Builds", CONFIG_TREE_DATA)
def instantiate_configs():
config_list = []
root = get_root()
found_configs = conf_tree.dfs(root)
for fc in found_configs:
c = Conf(
language=fc.find_prop("language_version"),
distro=fc.find_prop("distro_version"),
compiler=fc.find_prop("compiler_version"),
build_only=fc.find_prop("build_only"),
is_important=fc.find_prop("important"),
)
config_list.append(c)
return config_list
def add_caffe2_builds(jobs_dict):
configs = instantiate_configs()
for conf_options in configs:
phases = ["build"]
if not conf_options.build_only:
phases = dimensions.PHASES
for phase in phases:
jobs_dict[conf_options.construct_phase_name(phase)] = conf_options.gen_yaml_tree(phase)
graph = visualization.generate_graph(get_root())
graph.draw("caffe2-config-dimensions.png", prog="twopi")
def get_caffe2_workflows():
configs = instantiate_configs()
# TODO Why don't we build this config?
# See https://github.com/pytorch/pytorch/pull/17323#discussion_r259450540
filtered_configs = filter(lambda x: not (str(x.distro) == "ubuntu14.04" and str(x.compiler) == "gcc4.9"), configs)
x = []
for conf_options in filtered_configs:
phases = ["build"]
if not conf_options.build_only:
phases = dimensions.PHASES
for phase in phases:
requires = ["setup"]
sub_d = {"requires": requires}
if phase == "test":
requires.append(conf_options.construct_phase_name("build"))
if not conf_options.is_important:
# If you update this, update
# pytorch_build_definitions.py too
sub_d["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
x.append({conf_options.construct_phase_name(phase): sub_d})
return x

@@ -0,0 +1,23 @@
#!/usr/bin/env python3
PHASES = ["build", "test"]
CUDA_VERSIONS = [
None, # cpu build
"92",
"100",
]
STANDARD_PYTHON_VERSIONS = [
"2.7",
"3.5",
"3.6",
"3.7",
]
CONDA_PYTHON_VERSIONS = [
"2.7",
"3.6",
"3.7",
]

@@ -0,0 +1,200 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
CONFIG_TREE_DATA = [
("trusty", [
(None, [
XImportant("2.7.9"),
X("2.7"),
X("3.5"),
X("nightly"),
]),
("gcc", [
("4.8", [X("3.6")]),
("5.4", [
XImportant("3.6"),
("3.6", [
("xla", [XImportant(True)]),
("namedtensor", [XImportant(True)]),
]),
]),
("7", [X("3.6")]),
]),
]),
("xenial", [
("clang", [
("5", [
XImportant("3.6"), # This is actually the ASAN build
("3.6", [
("namedtensor", [XImportant(True)]), # ASAN
]),
]),
]),
("cuda", [
("9", [
# Note there are magic strings here
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L21
# and
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L143
# and
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L153
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453144)
X("2.7"),
XImportant("3.6"),
("2.7", [
("namedtensor", [XImportant(True)]),
]),
]),
("9.2", [X("3.6")]),
("10", [X("3.6")]),
]),
("android", [
("r19c", [XImportant("3.6")]),
]),
]),
]
def get_major_pyver(dotted_version):
parts = dotted_version.split(".")
return "py" + parts[0]
class TreeConfigNode(ConfigNode):
def __init__(self, parent, node_name, subtree):
super(TreeConfigNode, self).__init__(parent, self.modify_label(node_name))
self.subtree = subtree
self.init2(node_name)
def modify_label(self, label):
return label
def init2(self, node_name):
pass
def get_children(self):
return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]
class TopLevelNode(TreeConfigNode):
def __init__(self, node_name, subtree):
super(TopLevelNode, self).__init__(None, node_name, subtree)
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return DistroConfigNode
class DistroConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["distro_name"] = node_name
def child_constructor(self):
distro = self.find_prop("distro_name")
next_nodes = {
"trusty": TrustyCompilerConfigNode,
"xenial": XenialCompilerConfigNode,
}
return next_nodes[distro]
class TrustyCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"
def init2(self, node_name):
self.props["compiler_name"] = node_name
def child_constructor(self):
return TrustyCompilerVersionConfigNode if self.props["compiler_name"] else PyVerConfigNode
class TrustyCompilerVersionConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["compiler_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return PyVerConfigNode
class PyVerConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["pyver"] = node_name
self.props["abbreviated_pyver"] = get_major_pyver(node_name)
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return ExperimentalFeatureConfigNode
class ExperimentalFeatureConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["experimental_feature"] = node_name
def child_constructor(self):
experimental_feature = self.find_prop("experimental_feature")
next_nodes = {
"xla": XlaConfigNode,
"namedtensor": NamedTensorConfigNode,
"important": ImportantConfigNode,
}
return next_nodes[experimental_feature]
class XlaConfigNode(TreeConfigNode):
def modify_label(self, label):
return "XLA=" + str(label)
def init2(self, node_name):
self.props["is_xla"] = node_name
def child_constructor(self):
return ImportantConfigNode
class NamedTensorConfigNode(TreeConfigNode):
def modify_label(self, label):
return "NAMEDTENSOR=" + str(label)
def init2(self, node_name):
self.props["is_namedtensor"] = node_name
def child_constructor(self):
return ImportantConfigNode
class ImportantConfigNode(TreeConfigNode):
def modify_label(self, label):
return "IMPORTANT=" + str(label)
def init2(self, node_name):
self.props["is_important"] = node_name
def get_children(self):
return []
class XenialCompilerConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["compiler_name"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return XenialCompilerVersionConfigNode
class XenialCompilerVersionConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["compiler_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return PyVerConfigNode
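The abbreviated Python version used in build names comes from `get_major_pyver` in the file above (copied verbatim; runnable standalone):

```python
# Copied from pytorch_build_data.py above: keeps only the major version
# of a dotted version string, e.g. "2.7.9" -> "py2".
def get_major_pyver(dotted_version):
    parts = dotted_version.split(".")
    return "py" + parts[0]

print(get_major_pyver("2.7.9"))  # py2
print(get_major_pyver("3.6"))    # py3
```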

@@ -0,0 +1,310 @@
#!/usr/bin/env python3
from collections import OrderedDict
from cimodel.data.pytorch_build_data import TopLevelNode, CONFIG_TREE_DATA
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
from dataclasses import dataclass, field
from typing import List, Optional
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/"
DOCKER_IMAGE_VERSION = 323
@dataclass
class Conf:
distro: str
parms: List[str]
pyver: Optional[str] = None
cuda_version: Optional[str] = None
# TODO expand this to cover all the USE_* that we want to test for
# tensorrt, leveldb, lmdb, redis, opencv, mkldnn, ideep, etc.
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453608)
is_xla: bool = False
restrict_phases: Optional[List[str]] = None
gpu_resource: Optional[str] = None
dependent_tests: List = field(default_factory=list)
parent_build: Optional['Conf'] = None
is_namedtensor: bool = False
is_important: bool = False
# TODO: Eliminate the special casing for docker paths
# In the short term, we *will* need to support special casing as docker images are merged for caffe2 and pytorch
def get_parms(self, for_docker):
leading = []
# We just don't run non-important jobs on pull requests;
# previously we also named them in a way to make it obvious
# if self.is_important and not for_docker:
# leading.append("AAA")
leading.append("pytorch")
if self.is_xla and not for_docker:
leading.append("xla")
if self.is_namedtensor and not for_docker:
leading.append("namedtensor")
cuda_parms = []
if self.cuda_version:
cuda_parms.extend(["cuda" + self.cuda_version, "cudnn7"])
return leading + ["linux", self.distro] + cuda_parms + self.parms
def gen_docker_image_path(self):
parms_source = self.parent_build or self
base_build_env_name = "-".join(parms_source.get_parms(True))
return miniutils.quote(DOCKER_IMAGE_PATH_BASE + base_build_env_name + ":" + str(DOCKER_IMAGE_VERSION))
def get_build_job_name_pieces(self, build_or_test):
return self.get_parms(False) + [build_or_test]
def gen_build_name(self, build_or_test):
return ("_".join(map(str, self.get_build_job_name_pieces(build_or_test)))).replace(".", "_").replace("-", "_")
def get_dependents(self):
return self.dependent_tests or []
def gen_yaml_tree(self, build_or_test):
build_job_name_pieces = self.get_build_job_name_pieces(build_or_test)
build_env_name = "-".join(map(str, build_job_name_pieces))
env_dict = OrderedDict([
("BUILD_ENVIRONMENT", build_env_name),
("DOCKER_IMAGE", self.gen_docker_image_path()),
])
if self.pyver:
env_dict["PYTHON_VERSION"] = miniutils.quote(self.pyver)
if build_or_test == "test" and self.gpu_resource:
env_dict["USE_CUDA_DOCKER_RUNTIME"] = miniutils.quote("1")
d = {
"environment": env_dict,
"<<": "*" + "_".join(["pytorch", "linux", build_or_test, "defaults"]),
}
if build_or_test == "test":
resource_class = "large"
if self.gpu_resource:
resource_class = "gpu." + self.gpu_resource
if self.gpu_resource == "large":
env_dict["MULTI_GPU"] = miniutils.quote("1")
d["resource_class"] = resource_class
return d
def gen_workflow_yaml_item(self, phase):
# All jobs require the setup job
parameters = OrderedDict({"requires": ["setup"]})
if phase == "test":
# TODO When merging the caffe2 and pytorch jobs, it might be convenient for a while to make a
# caffe2 test job dependent on a pytorch build job. This way we could quickly dedup the repeated
# build of pytorch in the caffe2 build job, and just run the caffe2 tests off of a completed
# pytorch build job (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259452641)
dependency_build = self.parent_build or self
parameters["requires"].append(dependency_build.gen_build_name("build"))
if not self.is_important:
# If you update this, update
# caffe2_build_definitions.py too
parameters["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
return {self.gen_build_name(phase): parameters}
# TODO This is a hack to special case some configs just for the workflow list
class HiddenConf(object):
def __init__(self, name, parent_build=None):
self.name = name
self.parent_build = parent_build
def gen_workflow_yaml_item(self, phase):
return {self.gen_build_name(phase): {"requires": [self.parent_build.gen_build_name("build")]}}
def gen_build_name(self, _):
return self.name
# TODO Convert these to graph nodes
def gen_dependent_configs(xenial_parent_config):
extra_parms = [
(["multigpu"], "large"),
(["NO_AVX2"], "medium"),
(["NO_AVX", "NO_AVX2"], "medium"),
(["slow"], "medium"),
(["nogpu"], None),
]
configs = []
for parms, gpu in extra_parms:
c = Conf(
xenial_parent_config.distro,
["py3"] + parms,
pyver="3.6",
cuda_version=xenial_parent_config.cuda_version,
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=xenial_parent_config.is_important,
)
configs.append(c)
for x in ["pytorch_short_perf_test_gpu", "pytorch_python_doc_push", "pytorch_cpp_doc_push"]:
configs.append(HiddenConf(x, parent_build=xenial_parent_config))
return configs
def get_root():
return TopLevelNode("PyTorch Builds", CONFIG_TREE_DATA)
def gen_tree():
root = get_root()
configs_list = conf_tree.dfs(root)
return configs_list
def instantiate_configs():
config_list = []
root = get_root()
found_configs = conf_tree.dfs(root)
restrict_phases = None
for fc in found_configs:
distro_name = fc.find_prop("distro_name")
python_version = None
if distro_name == "xenial":
python_version = fc.find_prop("pyver")
parms_list = [fc.find_prop("abbreviated_pyver")]
else:
parms_list = ["py" + fc.find_prop("pyver")]
compiler_name = fc.find_prop("compiler_name")
cuda_version = None
if compiler_name == "cuda":
cuda_version = fc.find_prop("compiler_version")
elif compiler_name == "android":
android_ndk_version = fc.find_prop("compiler_version")
# TODO: do we need clang to compile host binaries like protoc?
parms_list.append("clang5")
parms_list.append("android-ndk-" + android_ndk_version)
restrict_phases = ["build"]
elif compiler_name:
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
parms_list.append(gcc_version)
# TODO: This is a nasty special case
if compiler_name == "clang":
parms_list.append("asan")
if cuda_version in ["9.2", "10"]:
# TODO The gcc version is orthogonal to CUDA version?
parms_list.append("gcc7")
is_xla = fc.find_prop("is_xla") or False
is_namedtensor = fc.find_prop("is_namedtensor") or False
is_important = fc.find_prop("is_important") or False
gpu_resource = None
if cuda_version and cuda_version != "10":
gpu_resource = "medium"
c = Conf(
distro_name,
parms_list,
python_version,
cuda_version,
is_xla,
restrict_phases,
gpu_resource,
is_namedtensor=is_namedtensor,
is_important=is_important,
)
if cuda_version == "9" and python_version == "3.6":
c.dependent_tests = gen_dependent_configs(c)
config_list.append(c)
return config_list
def add_build_env_defs(jobs_dict):
mydict = OrderedDict()
config_list = instantiate_configs()
for c in config_list:
phases = c.restrict_phases or dimensions.PHASES
for phase in phases:
# TODO why does this not have a test?
if phase == "test" and c.cuda_version == "10":
continue
d = c.gen_yaml_tree(phase)
mydict[c.gen_build_name(phase)] = d
if phase == "test":
for x in filter(lambda x: type(x) is not HiddenConf, c.get_dependents()):
d = x.gen_yaml_tree(phase)
mydict[x.gen_build_name(phase)] = d
# this is the circleci api version and probably never changes
jobs_dict["version"] = 2
jobs_dict["jobs"] = mydict
graph = visualization.generate_graph(get_root())
graph.draw("pytorch-config-dimensions.png", prog="twopi")
def get_workflow_list():
config_list = instantiate_configs()
x = ["setup"]
for conf_options in config_list:
phases = conf_options.restrict_phases or dimensions.PHASES
for phase in phases:
# TODO why does this not have a test?
if phase == "test" and conf_options.cuda_version == "10":
continue
x.append(conf_options.gen_workflow_yaml_item(phase))
# TODO convert to recursion
for conf in conf_options.get_dependents():
x.append(conf.gen_workflow_yaml_item("test"))
return x
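`Conf.gen_build_name` above mangles job names by joining the name pieces with underscores and flattening dots and dashes. A minimal sketch of that expression, using an illustrative piece list rather than a real config:

```python
# Sketch of the name mangling in Conf.gen_build_name above; the piece list
# here is made up for illustration, not taken from a real build config.
pieces = ["pytorch", "linux", "xenial", "cuda9", "cudnn7", "py3.6", "build"]
name = "_".join(map(str, pieces)).replace(".", "_").replace("-", "_")
print(name)  # pytorch_linux_xenial_cuda9_cudnn7_py3_6_build
```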

@@ -0,0 +1,110 @@
#!/usr/bin/env python3
from dataclasses import dataclass, field
from typing import Optional, Dict
def X(val):
"""
Compact way to write a leaf node
"""
return val, []
def XImportant(name):
"""Compact way to write an important (run on PRs) leaf node"""
return (name, [("important", [X(True)])])
@dataclass
class Ver:
"""
Represents a product with a version number
"""
name: str
version: str = ""
def __str__(self):
return self.name + self.version
@dataclass
class ConfigNode:
parent: Optional['ConfigNode']
node_name: str
props: Dict[str, str] = field(default_factory=dict)
def get_label(self):
return self.node_name
# noinspection PyMethodMayBeStatic
def get_children(self):
return []
def get_parents(self):
return (self.parent.get_parents() + [self.parent.get_label()]) if self.parent else []
def get_depth(self):
return len(self.get_parents())
def get_node_key(self):
return "%".join(self.get_parents() + [self.get_label()])
def find_prop(self, propname, searched=None):
"""
Checks if its own dictionary has
the property, otherwise asks parent node.
"""
if searched is None:
searched = []
searched.append(self.node_name)
if propname in self.props:
return self.props[propname]
elif self.parent:
return self.parent.find_prop(propname, searched)
else:
# raise Exception('Property "%s" does not exist anywhere in the tree! Searched: %s' % (propname, searched))
return None
def dfs_recurse(
node,
leaf_callback=lambda x: None,
discovery_callback=lambda x, y, z: None,
child_callback=lambda x, y: None,
sibling_index=0,
sibling_count=1):
discovery_callback(node, sibling_index, sibling_count)
node_children = node.get_children()
if node_children:
for i, child in enumerate(node_children):
child_callback(node, child)
dfs_recurse(
child,
leaf_callback,
discovery_callback,
child_callback,
i,
len(node_children),
)
else:
leaf_callback(node)
def dfs(toplevel_config_node):
config_list = []
def leaf_callback(node):
config_list.append(node)
dfs_recurse(toplevel_config_node, leaf_callback)
return config_list
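The `dfs` above collects the leaf nodes of a config tree in left-to-right, depth-first order. This self-contained sketch replaces `ConfigNode` with a stand-in `Node` class that keeps only the `get_children` method the traversal needs:

```python
# Stand-in for ConfigNode: only get_children() matters to the traversal.
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def get_children(self):
        return self.children


# Simplified dfs_recurse/dfs from conf_tree.py above: recurse into children
# when present, otherwise report the node as a leaf.
def dfs_recurse(node, leaf_callback):
    children = node.get_children()
    if children:
        for child in children:
            dfs_recurse(child, leaf_callback)
    else:
        leaf_callback(node)


def dfs(root):
    leaves = []
    dfs_recurse(root, leaves.append)
    return leaves


tree = Node("linux", [Node("cu100", [Node("3.6"), Node("3.7")]), Node("cpu")])
print([leaf.name for leaf in dfs(tree)])  # ['3.6', '3.7', 'cpu']
```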

@@ -0,0 +1,13 @@
#!/usr/bin/env python3
def quote(s):
return sandwich('"', s)
def sandwich(bread, jam):
return bread + jam + bread
def override(word, substitutions):
return substitutions.get(word, word)
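The three helpers compose simply; re-declaring them makes the expected behavior checkable in isolation:

```python
# Behavior of the miniutils helpers above, re-declared for a
# self-contained check.
def sandwich(bread, jam):
    return bread + jam + bread

def quote(s):
    return sandwich('"', s)

def override(word, substitutions):
    return substitutions.get(word, word)

assert quote("cu100") == '"cu100"'
assert sandwich("*", "bold") == "*bold*"
assert override("py2", {"py2": "python2.7"}) == "python2.7"
assert override("py3", {"py2": "python2.7"}) == "py3"  # unknown words pass through
```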

@@ -0,0 +1,64 @@
#!/usr/bin/env python3
from collections import OrderedDict
LIST_MARKER = "- "
INDENTATION_WIDTH = 2
def is_dict(data):
return type(data) is dict or type(data) is OrderedDict
def is_collection(data):
return is_dict(data) or type(data) is list
# TODO can eventually drop this custom sorting
def sortkey(x):
k = x[0]
return (
k == "<<",
k != "environment",
k,
)
def render(fh, data, depth, is_list_member=False):
"""
PyYAML does not allow precise control over the quoting
behavior, especially for merge references.
Therefore, we use this custom YAML renderer.
"""
indentation = " " * INDENTATION_WIDTH * depth
if is_dict(data):
tuples = list(data.items())
if type(data) is not OrderedDict:
tuples.sort(key=sortkey)
for i, (k, v) in enumerate(tuples):
# If this dict is itself a list member, the first key gets prefixed with a list marker
list_marker_prefix = LIST_MARKER if is_list_member and not i else ""
trailing_whitespace = "\n" if is_collection(v) else " "
fh.write(indentation + list_marker_prefix + k + ":" + trailing_whitespace)
render(fh, v, depth + 1 + int(is_list_member))
# TODO Could eventually drop this cosmetic convention
if depth == 2:
fh.write("\n")
elif type(data) is list:
for v in data:
render(fh, v, depth, True)
else:
list_member_prefix = indentation + LIST_MARKER if is_list_member else ""
fh.write(list_member_prefix + str(data) + "\n")
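The custom `sortkey` ordering deserves a quick illustration: `environment` keys sort first, YAML merge keys (`<<`) sort last, and everything else is alphabetical. A self-contained sketch (helper re-declared from above):

```python
# Key ordering used by render() for plain dicts: the tuple sorts
# False before True, so "environment" wins and "<<" loses.
def sortkey(x):
    k = x[0]
    return (
        k == "<<",
        k != "environment",
        k,
    )

items = [("steps", 1), ("<<", 2), ("environment", 3), ("docker", 4)]
ordered = [k for k, _ in sorted(items, key=sortkey)]
assert ordered == ["environment", "docker", "steps", "<<"]
```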

@@ -0,0 +1,86 @@
#!/usr/bin/env python3
"""
This module encapsulates dependencies on pygraphviz
"""
import colorsys
import cimodel.lib.conf_tree as conf_tree
def rgb2hex(rgb_tuple):
def to_hex(f):
return "%02x" % int(f * 255)
return "#" + "".join(map(to_hex, list(rgb_tuple)))
def handle_missing_graphviz(f):
"""
If the user has not installed pygraphviz, this causes
calls to the draw() method of the returned object to do nothing.
"""
try:
import pygraphviz # noqa: F401
return f
except ModuleNotFoundError:
class FakeGraph:
def draw(self, *args, **kwargs):
pass
return lambda _: FakeGraph()
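The graceful-degradation pattern above generalizes to any optional dependency: if the import fails, swap in a factory whose `draw()` is a no-op. A sketch under the assumption that `fake_module_xyz` does not exist (the name is deliberately fictitious):

```python
# Sketch of the optional-dependency decorator above. "fake_module_xyz"
# is a hypothetical, intentionally missing module.
def handle_missing(modname):
    def decorator(f):
        try:
            __import__(modname)
            return f
        except ModuleNotFoundError:
            class FakeGraph:
                def draw(self, *args, **kwargs):
                    pass  # silently do nothing
            return lambda *a, **kw: FakeGraph()
    return decorator

@handle_missing("fake_module_xyz")
def generate_graph(config):
    raise RuntimeError("never reached when the module is missing")

# With the module missing, callers still get an object whose draw()
# is callable, so downstream code needs no special-casing.
generate_graph(None).draw("out.png")
```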
@handle_missing_graphviz
def generate_graph(toplevel_config_node):
"""
Traverses the graph once first just to find the max depth
"""
config_list = conf_tree.dfs(toplevel_config_node)
max_depth = 0
for config in config_list:
max_depth = max(max_depth, config.get_depth())
# color the nodes using the max depth
from pygraphviz import AGraph
dot = AGraph()
def node_discovery_callback(node, sibling_index, sibling_count):
depth = node.get_depth()
sat_min, sat_max = 0.1, 0.6
sat_range = sat_max - sat_min
saturation_fraction = sibling_index / float(sibling_count - 1) if sibling_count > 1 else 1
saturation = sat_min + sat_range * saturation_fraction
# TODO Use a hash of the node label to determine the color
hue = depth / float(max_depth + 1)
rgb_tuple = colorsys.hsv_to_rgb(hue, saturation, 1)
this_node_key = node.get_node_key()
dot.add_node(
this_node_key,
label=node.get_label(),
style="filled",
# fillcolor=hex_color + ":orange",
fillcolor=rgb2hex(rgb_tuple),
penwidth=3,
color=rgb2hex(colorsys.hsv_to_rgb(hue, saturation, 0.9))
)
def child_callback(node, child):
this_node_key = node.get_node_key()
child_node_key = child.get_node_key()
dot.add_edge((this_node_key, child_node_key))
conf_tree.dfs_recurse(toplevel_config_node, lambda x: None, node_discovery_callback, child_callback)
return dot

File diff suppressed because it is too large.

.circleci/ensure-consistency.py Executable file

@@ -0,0 +1,39 @@
#!/usr/bin/env python3
import os
import subprocess
import sys
import tempfile
import generate_config_yml
CHECKED_IN_FILE = "config.yml"
REGENERATION_SCRIPT = "regenerate.sh"
PARENT_DIR = os.path.basename(os.path.dirname(os.path.abspath(__file__)))
README_PATH = os.path.join(PARENT_DIR, "README.md")
ERROR_MESSAGE_TEMPLATE = """
The checked-in CircleCI "%s" file does not match what was generated by the scripts.
Please re-run the "%s" script in the "%s" directory and commit the result. See "%s" for more information.
"""
def check_consistency():
_, temp_filename = tempfile.mkstemp("-generated-config.yml")
with open(temp_filename, "w") as fh:
generate_config_yml.stitch_sources(fh)
try:
subprocess.check_call(["cmp", temp_filename, CHECKED_IN_FILE])
except subprocess.CalledProcessError:
sys.exit(ERROR_MESSAGE_TEMPLATE % (CHECKED_IN_FILE, REGENERATION_SCRIPT, PARENT_DIR, README_PATH))
finally:
os.remove(temp_filename)
if __name__ == "__main__":
check_consistency()

.circleci/generate_config_yml.py Executable file

@@ -0,0 +1,127 @@
#!/usr/bin/env python3
"""
This script is the source of truth for config.yml.
Please see README.md in this directory for details.
"""
import os
import sys
import shutil
from collections import namedtuple, OrderedDict
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.caffe2_build_definitions as caffe2_build_definitions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
class File(object):
"""
Verbatim copy the contents of a file into config.yml
"""
def __init__(self, filename):
self.filename = filename
def write(self, output_filehandle):
with open(os.path.join("verbatim-sources", self.filename)) as fh:
shutil.copyfileobj(fh, output_filehandle)
class FunctionGen(namedtuple('FunctionGen', 'function depth')):
__slots__ = ()
class Treegen(FunctionGen):
"""
Insert the content of a YAML tree into config.yml
"""
def write(self, output_filehandle):
build_dict = OrderedDict()
self.function(build_dict)
miniyaml.render(output_filehandle, build_dict, self.depth)
class Listgen(FunctionGen):
"""
Insert the content of a YAML list into config.yml
"""
def write(self, output_filehandle):
miniyaml.render(output_filehandle, self.function(), self.depth)
def horizontal_rule():
return "".join("#" * 78)
class Header(object):
def __init__(self, title, summary=None):
self.title = title
self.summary_lines = summary or []
def write(self, output_filehandle):
text_lines = [self.title] + self.summary_lines
comment_lines = ["# " + x for x in text_lines]
lines = miniutils.sandwich([horizontal_rule()], comment_lines)
for line in filter(None, lines):
output_filehandle.write(line + "\n")
# Order of this list matters to the generated config.yml.
YAML_SOURCES = [
File("header-section.yml"),
File("linux-build-defaults.yml"),
File("macos-build-defaults.yml"),
File("nightly-binary-build-defaults.yml"),
File("linux-binary-build-defaults.yml"),
File("macos-binary-build-defaults.yml"),
File("nightly-build-smoke-tests-defaults.yml"),
Header("Job specifications job specs"),
Treegen(pytorch_build_definitions.add_build_env_defs, 0),
File("job-specs-setup.yml"),
File("job-specs-custom.yml"),
Treegen(caffe2_build_definitions.add_caffe2_builds, 1),
File("binary_update_htmls.yml"),
Header("Binary build specs individual job specifications"),
Treegen(binary_build_definitions.add_binary_build_specs, 1),
Header(
"Binary build tests", [
"These are the smoke tests run right after the build, before the upload.",
"If these fail, the upload doesn't happen."
]
),
Treegen(binary_build_definitions.add_binary_build_tests, 1),
File("binary-build-tests.yml"),
Header("Binary build uploads"),
Treegen(binary_build_definitions.add_binary_build_uploads, 1),
Header("Smoke test specs individual job specifications"),
Treegen(binary_build_definitions.add_smoke_test_specs, 1),
File("workflows.yml"),
Listgen(pytorch_build_definitions.get_workflow_list, 3),
File("workflows-pytorch-macos-builds.yml"),
Listgen(caffe2_build_definitions.get_caffe2_workflows, 3),
File("workflows-binary-builds-smoke-subset.yml"),
Header("Daily smoke test trigger"),
Treegen(binary_build_definitions.add_binary_smoke_test_jobs, 1),
Header("Daily binary build trigger"),
Treegen(binary_build_definitions.add_binary_build_jobs, 1),
Header("Nightly tests"),
Listgen(binary_build_definitions.get_nightly_tests, 3),
File("workflows-nightly-uploads-header.yml"),
Listgen(binary_build_definitions.get_nightly_uploads, 3),
File("workflows-s3-html.yml"),
]
def stitch_sources(output_filehandle):
for f in YAML_SOURCES:
f.write(output_filehandle)
if __name__ == "__main__":
stitch_sources(sys.stdout)
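The stitching design above is worth isolating: every source (`File`, `Treegen`, `Listgen`, `Header`) exposes the same `write(fh)` interface, and `config.yml` is simply their concatenation in list order. A minimal sketch with hypothetical literal sources:

```python
# Sketch of the stitch_sources pattern: duck-typed writers appended in
# order. "Literal" is a hypothetical stand-in for File/Treegen/etc.
import io

class Literal:
    def __init__(self, text):
        self.text = text
    def write(self, fh):
        fh.write(self.text)

SOURCES = [Literal("# header\n"), Literal("jobs:\n"), Literal("  build: {}\n")]

def stitch_sources(fh):
    for s in SOURCES:
        s.write(fh)

buf = io.StringIO()
stitch_sources(buf)
assert buf.getvalue() == "# header\njobs:\n  build: {}\n"
```

Because ordering alone determines the output, reordering `YAML_SOURCES` is the only way to reorder `config.yml`, which is why the list's order matters.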

.circleci/regenerate.sh Executable file

@@ -0,0 +1,8 @@
#!/bin/bash -xe
# Allows this script to be invoked from any directory:
cd "$(dirname "$0")"
NEW_FILE=$(mktemp)
./generate_config_yml.py > "$NEW_FILE"
cp "$NEW_FILE" config.yml

@@ -0,0 +1,4 @@
All the scripts in this directory are callable from `~/workspace/.circleci/scripts/foo.sh`.
Don't try to call them as `.circleci/scripts/foo.sh`; that won't
(necessarily) work. See Note [Workspace for CircleCI scripts] in
job-specs-setup.yml for more details.

@@ -0,0 +1,48 @@
#!/bin/bash
set -eux -o pipefail
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
# It is very important that this stays in sync with binary_populate_env.sh
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
# Clone the Pytorch branch
git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"
pushd "$PYTORCH_ROOT"
if [[ -n "${CIRCLE_PR_NUMBER:-}" ]]; then
# "smoke" binary build on PRs
git fetch --force origin "pull/${CIRCLE_PR_NUMBER}/head:remotes/origin/pull/${CIRCLE_PR_NUMBER}"
git reset --hard "$CIRCLE_SHA1"
git checkout -q -B "$CIRCLE_BRANCH"
git reset --hard "$CIRCLE_SHA1"
elif [[ -n "${CIRCLE_SHA1:-}" ]]; then
# Scheduled workflows & "smoke" binary build on master on PR merges
git reset --hard "$CIRCLE_SHA1"
git checkout -q -B master
else
echo "Can't tell what to checkout"
exit 1
fi
git submodule update --init --recursive --quiet
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder master repo
git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
git fetch origin
git reset origin/master --hard
echo "Using builder from "
git --no-pager log --max-count 1
popd

@@ -0,0 +1,44 @@
#!/bin/bash
set -eux -o pipefail
# This step runs on multiple executors with different envfile locations
if [[ "$(uname)" == Darwin ]]; then
envfile="/Users/distiller/project/env"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
envfile="/home/circleci/project/env"
else
# docker executor (binary builds)
envfile="/env"
fi
# TODO this is super hacky and ugly. Basically, the binary_update_html job does
# not have an env file, since it does not call binary_populate_env.sh, since it
# does not have a BUILD_ENVIRONMENT. So for this one case, which we detect by a
# lack of an env file, we manually export the environment variables that we
# need to install miniconda
if [[ ! -f "$envfile" ]]; then
MINICONDA_ROOT="/home/circleci/project/miniconda"
workdir="/home/circleci/project"
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
export -f retry
else
source "$envfile"
fi
conda_sh="$workdir/install_miniconda.sh"
if [[ "$(uname)" == Darwin ]]; then
retry curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
else
retry curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
fi
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"
rm -f "$conda_sh"
# We can't actually add miniconda to the PATH in the envfile, because that
# breaks 'unbuffer' in Mac jobs. This is probably because conda comes with
# a tclsh, which then gets inserted before the tclsh needed in /usr/bin

@@ -0,0 +1,30 @@
#!/bin/bash
echo "RUNNING ON $(uname -a) WITH $(nproc) CPUS AND $(free -m)"
set -eux -o pipefail
source /env
# Defaults here so they can be changed in one place
export MAX_JOBS=12
# Parse the parameters
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
build_script='conda/build_pytorch.sh'
elif [[ "$DESIRED_CUDA" == cpu ]]; then
build_script='manywheel/build_cpu.sh'
else
build_script='manywheel/build.sh'
fi
# We want to call unbuffer, which calls tclsh which finds the expect
# package. The expect was installed by yum into /usr/bin so we want to
# find /usr/bin/tclsh, but this is shadowed by /opt/conda/bin/tclsh in
# the conda docker images, so we prepend it to the path here.
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
mkdir /just_tclsh_bin
ln -s /usr/bin/tclsh /just_tclsh_bin/tclsh
export PATH=/just_tclsh_bin:$PATH
fi
# Build the package
SKIP_ALL_TESTS=1 unbuffer "/builder/$build_script" | ts

@@ -0,0 +1,50 @@
#!/bin/bash
source /home/circleci/project/env
cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -eux -o pipefail
# Set up Python
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda create -qyn testenv python="$DESIRED_PYTHON"
source activate testenv >/dev/null
elif [[ "$DESIRED_PYTHON" == 2.7mu ]]; then
export PATH="/opt/python/cp27-cp27mu/bin:\$PATH"
else
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
export PATH="/opt/python/cp\$python_nodot-cp\${python_nodot}m/bin:\$PATH"
fi
# Install the package
# These network calls should not have 'retry's because they are installing
# locally and aren't actually network calls
# TODO there is duplicated and inconsistent test-python-env setup across this
# file, builder/smoke_test.sh, and builder/run_tests.sh, and also in the
# conda build scripts themselves. These should really be consolidated
pkg="/final_pkgs/\$(ls /final_pkgs)"
if [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "\$pkg" --offline
retry conda install -yq future numpy protobuf six
if [[ "$DESIRED_CUDA" != 'cpu' ]]; then
# DESIRED_CUDA is in format cu90 or cu100
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
fi
retry conda install -yq -c pytorch "cudatoolkit=\${cu_ver}"
fi
else
pip install "\$pkg"
retry pip install -q future numpy protobuf six
fi
# Test the package
/builder/check_binary.sh
# =================== The above code will be executed inside Docker container ===================
EOL
echo
echo
echo "The script that will run in the next step is:"
cat /home/circleci/project/ci_test_script.sh
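The `DESIRED_CUDA` parsing inside the heredoc above (a 4-character string like `cu90` vs. a 5-character string like `cu100`) can be sketched in Python; `cuda_version` is a hypothetical helper name, not something from the repo:

```python
# Python sketch of the bash branch above:
#   "cu90"  -> "9.0"   (4 chars: one-digit major)
#   "cu100" -> "10.0"  (5 chars: two-digit major)
def cuda_version(desired_cuda):
    if len(desired_cuda) == 4:
        return desired_cuda[2] + "." + desired_cuda[3:]
    return desired_cuda[2:4] + "." + desired_cuda[4:]

assert cuda_version("cu90") == "9.0"
assert cuda_version("cu100") == "10.0"
```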

@@ -0,0 +1,40 @@
#!/bin/bash
# Do NOT set -x
source /home/circleci/project/env
set -eu -o pipefail
set +x
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
cat >/home/circleci/project/login_to_anaconda.sh <<EOL
set +x
echo "Trying to login to Anaconda"
yes | anaconda login \
--username "$PYTORCH_BINARY_PJH5_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_PJH5_CONDA_PASSWORD"
set -x
EOL
chmod +x /home/circleci/project/login_to_anaconda.sh
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
export PATH="$MINICONDA_ROOT/bin:$PATH"
# Upload the package to the final location
pushd /home/circleci/project/final_pkgs
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry timeout 30 /home/circleci/project/login_to_anaconda.sh
anaconda upload "$(ls)" -u pytorch --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
else
retry pip install -q awscli
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
fi

@@ -0,0 +1,24 @@
#!/bin/bash
set -eux -o pipefail
source "/Users/distiller/project/env"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
# For some reason `unbuffer` breaks if we change the PATH here, so we
# write a script with the PATH change in it and unbuffer the whole
# thing
build_script="$workdir/build_script.sh"
touch "$build_script"
chmod +x "$build_script"
# Build
cat >"$build_script" <<EOL
export PATH="$workdir/miniconda/bin:$PATH"
if [[ "$PACKAGE_TYPE" == conda ]]; then
"$workdir/builder/conda/build_pytorch.sh"
else
export TORCH_PACKAGE_NAME="$(echo $TORCH_PACKAGE_NAME | tr '-' '_')"
"$workdir/builder/wheel/build_wheel.sh"
fi
EOL
unbuffer "$build_script" | ts

@@ -0,0 +1,30 @@
#!/bin/bash
set -eux -o pipefail
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
pkg="$workdir/final_pkgs/$(ls $workdir/final_pkgs)"
# Don't test libtorch
# TODO we should test libtorch
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
exit 0
fi
# Create a new test env
# TODO cut all this out into a separate test job and have an entirely different
# miniconda
source deactivate || true
conda create -qyn test python="$DESIRED_PYTHON"
source activate test >/dev/null
# Install the package
if [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "$pkg" --offline
else
pip install "$pkg" --no-index --no-dependencies -v
fi
# Test
pushd "$workdir/pytorch"
$workdir/builder/run_tests.sh "$PACKAGE_TYPE" "$DESIRED_PYTHON" "$DESIRED_CUDA"

@@ -0,0 +1,40 @@
#!/bin/bash
# Do NOT set -x
set -eu -o pipefail
set +x
export AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
cat >/Users/distiller/project/login_to_anaconda.sh <<EOL
set +x
echo "Trying to login to Anaconda"
yes | anaconda login \
--username "$PYTORCH_BINARY_PJH5_CONDA_USERNAME" \
--password "$PYTORCH_BINARY_PJH5_CONDA_PASSWORD"
set -x
EOL
chmod +x /Users/distiller/project/login_to_anaconda.sh
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
pushd "$workdir/final_pkgs"
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry /Users/distiller/project/login_to_anaconda.sh
retry anaconda upload "$(ls)" -u pytorch-nightly --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry pip install -q awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
else
retry pip install -q awscli
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
fi

@@ -0,0 +1,106 @@
#!/bin/bash
set -eux -o pipefail
export TZ=UTC
# We need to write an envfile to persist these variables to following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
envfile="$workdir/env"
touch "$envfile"
chmod +x "$envfile"
# Parse the BUILD_ENVIRONMENT to package type, python, and cuda
configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
export DESIRED_DEVTOOLSET="${configs[3]:-}"
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
export BUILD_PYTHONLESS=1
fi
# Pick docker image
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="soumith/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="soumith/manylinux-cuda100"
else
export DOCKER_IMAGE="soumith/manylinux-cuda${DESIRED_CUDA:2}"
fi
# Upload to parallel folder for gcc abis
# All nightlies used to be devtoolset3, then devtoolset7 was added as a build
# option, so the upload was redirected to nightly/devtoolset7 to avoid
# conflicts with other binaries (there shouldn't be any conflicts). Now we are
# making devtoolset7 the default.
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' || "$(uname)" == 'Darwin' ]]; then
export PIP_UPLOAD_FOLDER='nightly/'
else
# On linux machines, this shouldn't actually be called anymore. This is just
# here for extra safety.
export PIP_UPLOAD_FOLDER='nightly/devtoolset3/'
fi
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
if [[ "$(uname)" == 'Darwin' ]]; then
export PYTORCH_BUILD_VERSION="1.2.0.dev$DATE"
else
export PYTORCH_BUILD_VERSION="1.2.0.dev$DATE+$DESIRED_CUDA"
fi
export PYTORCH_BUILD_NUMBER=1
cat >>"$envfile" <<EOL
# =================== The following code will be executed inside Docker container ===================
export TZ=UTC
echo "Running on $(uname -a) at $(date)"
export PACKAGE_TYPE="$PACKAGE_TYPE"
export DESIRED_PYTHON="$DESIRED_PYTHON"
export DESIRED_CUDA="$DESIRED_CUDA"
export LIBTORCH_VARIANT="${LIBTORCH_VARIANT:-}"
export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.2.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
export TORCH_PACKAGE_NAME='torch-nightly'
export TORCH_CONDA_BUILD_FOLDER='pytorch-nightly'
export USE_FBGEMM=1
export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
export DOCKER_IMAGE="$DOCKER_IMAGE"
export workdir="$workdir"
export MAC_PACKAGE_WORK_DIR="$workdir"
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
export MINICONDA_ROOT="$workdir/miniconda"
export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"
export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
# =================== The above code will be executed inside Docker container ===================
EOL
echo 'retry () {' >> "$envfile"
echo ' $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)' >> "$envfile"
echo '}' >> "$envfile"
echo 'export -f retry' >> "$envfile"
cat "$envfile"

@@ -0,0 +1,48 @@
#!/bin/bash
# This section is used in the binary_test and smoke_test jobs. It expects
# 'binary_populate_env' to have populated /home/circleci/project/env and it
# expects another section to populate /home/circleci/project/ci_test_script.sh
# with the code to run in the docker
# Expect all needed environment variables to be written to this file
source /home/circleci/project/env
echo "Running the following code in Docker"
cat /home/circleci/project/ci_test_script.sh
echo
echo
set -eux -o pipefail
# Expect actual code to be written to this file
chmod +x /home/circleci/project/ci_test_script.sh
# Run the docker
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
export id=$(docker run --runtime=nvidia -t -d "${DOCKER_IMAGE}")
else
export id=$(docker run -t -d "${DOCKER_IMAGE}")
fi
# Copy the envfile and script with all the code to run into the docker.
docker cp /home/circleci/project/. "$id:/circleci_stuff"
# Copy built packages into the docker to test. This should only exist on the
# binary test jobs. The package should've been created from a binary build job,
# which persisted the package to a CircleCI workspace, which this job then
# copies into a GPU enabled docker for testing
if [[ -d "/home/circleci/project/final_pkgs" ]]; then
docker cp /home/circleci/project/final_pkgs "$id:/final_pkgs"
fi
# Copy the needed repos into the docker. These do not exist in the smoke test
# jobs, since the smoke test jobs do not need the PyTorch source code.
if [[ -d "$PYTORCH_ROOT" ]]; then
docker cp "$PYTORCH_ROOT" "$id:/pytorch"
fi
if [[ -d "$BUILDER_ROOT" ]]; then
docker cp "$BUILDER_ROOT" "$id:/builder"
fi
# Execute the test script that was populated by an earlier section
export COMMAND='((echo "source /circleci_stuff/env && /circleci_stuff/ci_test_script.sh") | docker exec -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts

@@ -0,0 +1,127 @@
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "cpp_doc_push_script.sh: Invoked with $*"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the Python API docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$3" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
# ======================== Building PyTorch C++ API Docs ========================
echo "Building PyTorch C++ API docs..."
# Clone the cppdocs repo
rm -rf cppdocs
git clone https://github.com/pytorch/cppdocs
set -ex
sudo apt-get -y install doxygen
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
time GEN_TO_SOURCE=1 python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THNN/generic/THNN.h \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml
# Copy some required files
cp aten/src/ATen/common_with_cwrap.py tools/shared/cwrap_common.py
cp torch/_utils_internal.py tools/shared
# Generate PyTorch files
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--nn-path aten/src/
# Build the docs
pushd docs/cpp
pip install breathe==4.11.1 bs4 lxml six
pip install --no-cache-dir -e "git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme"
pip install "exhale>=0.2.1"
pip install sphinx==1.8.5
# Uncomment once it is fixed
# pip install -r requirements.txt
time make VERBOSE=1 html -j
popd
popd
pushd cppdocs
# Purge everything with some exceptions
mkdir /tmp/cppdocs-sync
mv _config.yml README.md /tmp/cppdocs-sync/
rm -rf *
# Copy over all the newly generated HTML
cp -r "${pt_checkout}"/docs/cpp/build/html/* .
# Copy back _config.yml
rm -rf _config.yml
mv /tmp/cppdocs-sync/* .
# Make a new commit
git add . || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Automatic sync on $(date)" || true
git status
if [ "$dry_run" = false ]; then
echo "Pushing to https://github.com/pytorch/cppdocs"
set +x
/usr/bin/expect <<DONE
spawn git push -u origin master
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================

@@ -0,0 +1,118 @@
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: The branch to push to. Usually is "site"
branch="$3"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1
fi
# Argument 4: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$4" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
git clone https://github.com/pytorch/pytorch.github.io -b $branch
pushd pytorch.github.io
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
rm -rf pytorch || true
# Install TensorBoard in python 3 so torch.utils.tensorboard classes render
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# Get all the documentation sources, put them in one place
pushd "$pt_checkout"
git clone https://github.com/pytorch/vision
pushd vision
conda install -q pillow
time python setup.py install
popd
pushd docs
rm -rf source/torchvision
cp -a ../vision/docs/source source/torchvision
# Build the docs
pip -q install -r requirements.txt || true
if [ "$is_master_doc" = true ]; then
make html
else
make html-stable
fi
# Move them into the docs repo
popd
popd
git rm -rf "$install_path" || true
mv "$pt_checkout/docs/build/html" "$install_path"
# Add the version handler by search and replace.
# XXX: Consider moving this to the docs Makefile or site build
if [ "$is_master_doc" = true ]; then
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
else
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>$version \&#x25BC</a>@g"
fi
git add "$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "auto-generating sphinx docs" || true
git status
if [ "$dry_run" = false ]; then
echo "Pushing to pytorch.github.io:$branch"
set +x
/usr/bin/expect <<DONE
spawn git push origin $branch
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================

@@ -0,0 +1,87 @@
#!/usr/bin/env bash
set -ex -o pipefail
# Set up NVIDIA docker repo
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
# Remove unnecessary sources
sudo rm -f /etc/apt/sources.list.d/google-chrome.list
sudo rm -f /etc/apt/heroku.list
sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
sudo apt-get -y update
sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic docker-ce
# WARNING: Docker version is hardcoded here; you must update the
# version number below for docker-ce and nvidia-docker2 to get newer
# versions of Docker. We hardcode these numbers because we kept
# getting broken CI when Docker would update their docker version,
# and nvidia-docker2 would be out of date for a day until they
# released a newer version of their package.
#
# How to figure out what the correct versions of these packages are?
# My preferred method is to start a Docker instance of the correct
# Ubuntu version (e.g., docker run -it ubuntu:16.04) and then ask
# apt which versions of the packages you need are available. Note
# that the CircleCI image comes with Docker.
sudo apt-get -y install \
linux-headers-$(uname -r) \
linux-image-generic \
moreutils \
docker-ce=5:18.09.4~3-0~ubuntu-xenial \
nvidia-container-runtime=2.0.0+docker18.09.4-1 \
nvidia-docker2=2.0.3+docker18.09.4-1 \
expect-dev
sudo pkill -SIGHUP dockerd
retry () {
$* || $* || $* || $* || $*
}
retry sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-410.104.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
fi
if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x COMMIT_SOURCE=${CIRCLE_BRANCH:-}" >> /home/circleci/project/env
echo "declare -x PYTHON_VERSION=${PYTHON_VERSION:-}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
echo "declare -x TORCH_CUDA_ARCH_LIST=5.2" >> /home/circleci/project/env
fi
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores; if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
echo "declare -x MAX_JOBS=${MAX_JOBS}" >> /home/circleci/project/env
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# This IAM user allows write access to S3 bucket for sccache & bazels3cache
set +x
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
set -x
else
# This IAM user allows write access to S3 bucket for sccache
set +x
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
set -x
fi
fi
# This IAM user only allows read-write access to ECR
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4:-}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
set -x
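
The MAX_JOBS computation above is just a clamped minimum; a minimal Python sketch of the same rule (the function name and sample core counts are illustrative, not part of the script):

```python
def compute_max_jobs(nproc, memory_limit_max_jobs=8):
    """Mirror of the shell arithmetic above: reserve one core for
    sccache, then clamp to the memory-based ceiling so a 32-core
    machine does not OOM."""
    sccache_max_jobs = nproc - 1
    return min(sccache_max_jobs, memory_limit_max_jobs)

print(compute_max_jobs(32))  # the "large" class: clamped to 8
print(compute_max_jobs(4))   # small machine: 3 (one core left free)
```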

View File

@@ -0,0 +1,50 @@
#!/usr/bin/env bash
set -eux -o pipefail
# Set up CircleCI GPG keys for apt, if needed
curl -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add -
# Stop background apt updates. Hypothetically, the kill should not
# be necessary, because stop is supposed to send a kill signal to
# the process, but we've added it for good luck. Also
# hypothetically, it's supposed to be unnecessary to wait for
# the process to exit. We also have those wait loops for good luck.
# If you like, try deleting them and seeing if it works.
sudo systemctl stop apt-daily.service || true
sudo systemctl kill --kill-who=all apt-daily.service || true
sudo systemctl stop unattended-upgrades.service || true
sudo systemctl kill --kill-who=all unattended-upgrades.service || true
# wait until `apt-get update` has been killed
while systemctl is-active --quiet apt-daily.service
do
sleep 1;
done
while systemctl is-active --quiet unattended-upgrades.service
do
sleep 1;
done
# See if we actually were successful
systemctl list-units --all | cat
# For good luck, try even harder to kill apt-get
sudo pkill apt-get || true
# For even better luck, purge unattended-upgrades
sudo apt-get purge -y unattended-upgrades
cat /etc/apt/sources.list
# For the bestest luck, kill again now
sudo pkill apt || true
sudo pkill dpkg || true
# Try to detect if apt/dpkg is stuck
if ps auxfww | grep '[a]pt'; then
echo "WARNING: There are leftover apt processes; subsequent apt update will likely fail"
fi
if ps auxfww | grep '[d]pkg'; then
echo "WARNING: There are leftover dpkg processes; subsequent apt update will likely fail"
fi
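
The `while systemctl is-active` loops above implement a simple poll-until-inactive wait. A hedged Python sketch of that pattern, with the `systemctl` probe stubbed out (via `is_active`) so the logic runs anywhere:

```python
import time

def wait_until_inactive(is_active, poll_seconds=1.0, sleep=time.sleep):
    """Keep sleeping while the service still reports active, mirroring
    the shell loop `while systemctl is-active --quiet <unit>; do sleep 1; done`.
    `is_active` stands in for the systemctl probe; returns the poll count."""
    polls = 0
    while is_active():
        sleep(poll_seconds)
        polls += 1
    return polls

# A fake service that reports active for three polls, then inactive.
states = iter([True, True, True, False])
print(wait_until_inactive(lambda: next(states), sleep=lambda s: None))  # 3
```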

View File

@@ -0,0 +1,96 @@
import argparse
import re
import sys
# Modify this variable if you want to change the set of default jobs
# which are run on all pull requests.
#
# WARNING: Actually, this is a lie; we're currently also controlling
# the set of jobs to run via the Workflows filters in CircleCI config.
default_set = [
# PyTorch CPU
# Selected oldest Python 2 version to ensure Python 2 coverage
'pytorch-linux-trusty-py2.7.9',
# PyTorch CUDA
'pytorch-linux-xenial-cuda9-cudnn7-py3',
# PyTorch ASAN
'pytorch-linux-xenial-py3-clang5-asan',
# PyTorch DEBUG
'pytorch-linux-trusty-py3.6-gcc5.4',
# Caffe2 CPU
'caffe2-py2-mkl-ubuntu16.04',
# Caffe2 CUDA
'caffe2-py2-cuda9.1-cudnn7-ubuntu16.04',
# Caffe2 ONNX
'caffe2-onnx-py2-gcc5-ubuntu16.04',
'caffe2-onnx-py3.6-clang7-ubuntu16.04',
# Caffe2 Clang
'caffe2-py2-clang7-ubuntu16.04',
# Caffe2 CMake
'caffe2-cmake-cuda9.0-cudnn7-ubuntu16.04',
# Binaries
'manywheel 2.7mu cpu devtoolset7',
'libtorch 2.7m cpu devtoolset7',
# Caffe2 Android
'caffe2-py2-android-ubuntu16.04',
# Caffe2 OSX
'caffe2-py2-system-macos10.13',
# PyTorch OSX
'pytorch-macos-10.13-cuda9.2-cudnn7-py3',
# PyTorch Android
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c',
# XLA
'pytorch-xla-linux-trusty-py3.6-gcc5.4',
# Other checks
'pytorch-short-perf-test-gpu',
'pytorch-python-doc-push',
'pytorch-cpp-doc-push',
]
# Takes in commit message to analyze via stdin
#
# This script reads the commit message and attempts to determine if we
# should run the current CI job in question
#
# NB: Try to avoid hard-coding names here, so there are fewer places to update when jobs
# are updated/renamed
#
# Semantics in the presence of multiple tags:
# - Let D be the set of default builds
# - Let S be the set of explicitly specified builds
# - Run S \/ D
parser = argparse.ArgumentParser()
parser.add_argument('build_environment')
args = parser.parse_args()
commit_msg = sys.stdin.read()
# Matches anything that looks like [foo ci] or [ci foo] or [foo test]
# or [test foo]
RE_MARKER = re.compile(r'\[(?:([^ \[\]]+) )?(?:ci|test)(?: ([^ \[\]]+))?\]')
markers = RE_MARKER.finditer(commit_msg)
for m in markers:
if m.group(1) and m.group(2):
print("Unrecognized marker: {}".format(m.group(0)))
continue
spec = m.group(1) or m.group(2)
if spec in args.build_environment or spec == 'all':
print("Accepting {} due to commit marker {}".format(args.build_environment, m.group(0)))
sys.exit(0)
for spec in default_set:
if spec in args.build_environment:
print("Accepting {} as part of default set".format(args.build_environment))
sys.exit(0)
print("Rejecting {}".format(args.build_environment))
sys.exit(1)
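
To make the marker semantics above concrete, here is a small, self-contained demo of how `RE_MARKER` extracts specs from a commit message (the regex is copied from the script; the sample messages and the `extract_specs` helper are invented for illustration):

```python
import re

RE_MARKER = re.compile(r'\[(?:([^ \[\]]+) )?(?:ci|test)(?: ([^ \[\]]+))?\]')

def extract_specs(commit_msg):
    """Return the build specs named by [foo ci] / [ci foo] style markers,
    skipping ambiguous markers like [foo ci bar]."""
    specs = []
    for m in RE_MARKER.finditer(commit_msg):
        if m.group(1) and m.group(2):
            continue  # both sides filled: unrecognized marker
        specs.append(m.group(1) or m.group(2))
    return specs

print(extract_specs("Fix a bug [xla ci]"))             # ['xla']
print(extract_specs("Retry [ci namedtensor] please"))  # ['namedtensor']
print(extract_specs("No markers here"))                # []
```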

View File

@@ -0,0 +1,29 @@
#!/usr/bin/env bash
set -exu -o pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# Check if we should actually run
echo "BUILD_ENVIRONMENT: ${BUILD_ENVIRONMENT:-}"
echo "CIRCLE_PULL_REQUEST: ${CIRCLE_PULL_REQUEST:-}"
if [ -z "${BUILD_ENVIRONMENT:-}" ]; then
echo "Cannot run should_run_job.sh if BUILD_ENVIRONMENT is not defined!"
echo "CircleCI scripts are probably misconfigured."
exit 1
fi
if ! [ -e "$SCRIPT_DIR/COMMIT_MSG" ]; then
echo "Cannot run should_run_job.sh if you don't have COMMIT_MSG"
echo "written out. Are you perhaps running the wrong copy of this script?"
echo "You should be running the copy in ~/workspace; SCRIPT_DIR=$SCRIPT_DIR"
exit 1
fi
if [ -n "${CIRCLE_PULL_REQUEST:-}" ]; then
if [[ $CIRCLE_BRANCH != "ci-all/"* ]]; then
# Don't swallow "script doesn't exist" errors
[ -e "$SCRIPT_DIR/should_run_job.py" ]
if ! python "$SCRIPT_DIR/should_run_job.py" "${BUILD_ENVIRONMENT:-}" < "$SCRIPT_DIR/COMMIT_MSG" ; then
circleci step halt
exit
fi
fi
fi
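
The branching above reduces to a small decision rule; a minimal Python sketch of it, where `decide` stands in for invoking should_run_job.py (function and argument names are ours, not the script's):

```python
def should_run(circle_pull_request, circle_branch, decide):
    """Mirror of the gating in should_run_job.sh: direct pushes and
    ci-all/* branches always run; pull requests defer to the
    should_run_job.py commit-marker check (`decide`)."""
    if not circle_pull_request:
        return True  # not a PR: always run
    if circle_branch.startswith("ci-all/"):
        return True  # ci-all branches run every job
    return decide()

print(should_run("", "master", lambda: False))                        # True
print(should_run("http://pr/1", "ci-all/everything", lambda: False))  # True
print(should_run("http://pr/1", "my-feature", lambda: False))         # False
```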

View File

@@ -0,0 +1,20 @@
# TODO: There is currently no testing for libtorch
# binary_linux_libtorch_2.7m_cpu_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cpu"
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_2.7m_cu90_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cu90"
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_2.7m_cu100_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 2.7m cu100"
# resource_class: gpu.medium
# <<: *binary_linux_test

View File

@@ -0,0 +1,98 @@
# update_s3_htmls job
# These jobs create html files for every cpu/cu## folder in s3. Each html
# file simply lists the names of the binary (.whl) files in that folder.
# This allows pip installs of the latest version in a folder without
# having to know the latest date: pip has a flag -f that accepts an html
# file listing a set of packages, and pip will then install the one with
# the most recent version.
update_s3_htmls: &update_s3_htmls
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *binary_checkout
# N.B. we do not run binary_populate_env. The only variable we need is
# PIP_UPLOAD_FOLDER (which is 'nightly/' for the nightlies and '' for
# releases, and sometimes other things for special cases). Instead we
# expect PIP_UPLOAD_FOLDER to be passed directly in the env. This is
# because, unlike all the other binary jobs, these jobs only get run once,
# in a separate workflow. They are not a step in other binary jobs like
# build, test, upload.
#
# You could attach this to every job, or include it in the upload step if
# you wanted. You would need to add binary_populate_env in this case to
# make sure it has the same upload folder as the job it's attached to. This
# function is idempotent, so it won't hurt anything; it's just a little
# unnecessary.
- run:
name: Update s3 htmls
no_output_timeout: "1h"
command: |
set +x
echo "declare -x \"AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}\"" >> /home/circleci/project/env
echo "declare -x \"AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}\"" >> /home/circleci/project/env
source /home/circleci/project/env
set -eux -o pipefail
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry pip install awscli==1.6
"/home/circleci/project/builder/cron/update_s3_htmls.sh"
# Update s3 htmls for the nightlies
update_s3_htmls_for_nightlies:
environment:
PIP_UPLOAD_FOLDER: "nightly/"
<<: *update_s3_htmls
# Update s3 htmls for the nightlies for devtoolset7
update_s3_htmls_for_nightlies_devtoolset7:
environment:
PIP_UPLOAD_FOLDER: "nightly/devtoolset7/"
<<: *update_s3_htmls
# upload_binary_sizes job
# The builder hud at pytorch.org/builder shows the sizes of all the binaries
# over time. It gets this info from html files stored in S3, which this job
# populates every day.
upload_binary_sizes: &upload_binary_sizes
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_install_miniconda
- run:
name: Upload binary sizes
no_output_timeout: "1h"
command: |
set +x
echo "declare -x \"AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}\"" > /home/circleci/project/env
echo "declare -x \"AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}\"" >> /home/circleci/project/env
export DATE="$(date -u +%Y_%m_%d)"
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
source /home/circleci/project/env
set -eux -o pipefail
# This is hardcoded to match binary_install_miniconda.sh
export PATH="/home/circleci/project/miniconda/bin:$PATH"
# Not any awscli will work. Most won't. This one will work
retry conda create -qyn aws36 python=3.6
source activate aws36
pip install awscli==1.16.46
"/home/circleci/project/builder/cron/upload_binary_sizes.sh"
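
A tiny sketch of the per-folder index files described in the comment at the top of this file: a bare html page listing wheel filenames, which `pip install -f <index>.html` can consume (the wheel names and the `render_index` helper are invented for illustration):

```python
wheels = [
    "torch-1.2.0.dev20190807-cp36-cp36m-linux_x86_64.whl",
    "torch-1.2.0.dev20190806-cp36-cp36m-linux_x86_64.whl",
]

def render_index(filenames):
    """Render the minimal one-<a>-per-file html page that pip's -f flag accepts."""
    links = "\n".join('<a href="{0}">{0}</a><br>'.format(f) for f in filenames)
    return "<html><body>\n{}\n</body></html>".format(links)

print(render_index(wheels))
```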

View File

@@ -0,0 +1,63 @@
# WARNING: DO NOT EDIT THIS FILE DIRECTLY!!!
# See the README.md in this directory.
# IMPORTANT: To update Docker image version, please first update
# https://github.com/pytorch/ossci-job-dsl/blob/master/src/main/groovy/ossci/pytorch/DockerVersion.groovy and
# https://github.com/pytorch/ossci-job-dsl/blob/master/src/main/groovy/ossci/caffe2/DockerVersion.groovy,
# and then update DOCKER_IMAGE_VERSION at the top of the following files:
# * cimodel/data/pytorch_build_definitions.py
# * cimodel/data/caffe2_build_definitions.py
# And the inline copies of the variable in
# * verbatim-sources/job-specs-custom.yml
# (grep for DOCKER_IMAGE)
docker_config_defaults: &docker_config_defaults
user: jenkins
aws_auth:
# This IAM user only allows read-write access to ECR
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4}
# This system setup script is meant to run before the CI-related scripts, e.g.,
# installing Git client, checking out code, setting up CI env, and
# building/testing.
setup_linux_system_environment: &setup_linux_system_environment
name: Set Up System Environment
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/setup_linux_system_environment.sh
# NB: This (and the command below) must be run after attaching
# ~/workspace. This is NOT the default working directory (that's
# ~/project); this workspace is generated by the setup job.
should_run_job: &should_run_job
name: Should Run Job After attach_workspace
no_output_timeout: "2m"
command: ~/workspace/.circleci/scripts/should_run_job.sh
setup_ci_environment: &setup_ci_environment
name: Set Up CI Environment After attach_workspace
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/setup_ci_environment.sh
# Installs expect and moreutils so that we can call `unbuffer` and `ts`.
# Also installs OpenMP
# !!!!NOTE!!!! this is copied into a binary_macos_brew_update job which is the
# same but does not install libomp. If you are changing this, consider if you
# need to change that step as well.
macos_brew_update: &macos_brew_update
name: Brew update and install moreutils, expect and libomp
no_output_timeout: "1h"
command: |
set -ex
# See https://discourse.brew.sh/t/fetching-homebrew-repos-is-slow/5374/3
brew untap caskroom/homebrew-cask
# moreutils installs a `parallel` executable by default, which conflicts
# with the executable from the GNU `parallel`, so we must unlink GNU
# `parallel` first, and relink it afterwards
brew update
brew unlink parallel
brew install moreutils
brew link parallel --overwrite
brew install expect
brew install libomp

View File

@@ -0,0 +1,267 @@
pytorch_short_perf_test_gpu:
environment:
BUILD_ENVIRONMENT: pytorch-short-perf-test-gpu
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:323"
PYTHON_VERSION: "3.6"
USE_CUDA_DOCKER_RUNTIME: "1"
resource_class: gpu.medium
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Perf Test
no_output_timeout: "1h"
command: |
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
docker cp $id:/var/lib/jenkins/workspace/env /home/circleci/project/env
# This IAM user allows write access to S3 bucket for perf test numbers
set +x
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_PERF_TEST_S3_BUCKET_V4}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_PERF_TEST_S3_BUCKET_V4}" >> /home/circleci/project/env
set -x
docker cp /home/circleci/project/env $id:/var/lib/jenkins/workspace/env
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/short-perf-test-gpu.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
pytorch_python_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-python-doc-push
# TODO: stop hardcoding this
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:323"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/master master site") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open for merging v1.2.0 -> master for this job.
# XXX: The following code is only run on the v1.2.0 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.2.0" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/stable 1.2.0 site-v1.2.0") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/master master site dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
pytorch_cpp_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:323"
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/master master") | docker exec -u jenkins -i "$id" bash) 2>&1'
# stable release docs push. Due to some circleci limitations, we keep
# an eternal PR open (#16502) for merging v1.0.1 -> master for this job.
# XXX: The following code is only run on the v1.0.1 branch, which might
# not be exactly the same as what you see here.
elif [[ "${CIRCLE_BRANCH}" == "v1.0.1" ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/stable 1.0.1") | docker exec -u jenkins -i "$id" bash) 2>&1'
# For open PRs: Do a dry_run of the docs build, don't push build
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GITHUB_PYTORCHBOT_TOKEN=${GITHUB_PYTORCHBOT_TOKEN}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/master master dry_run") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
pytorch_macos_10_13_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
mkdir -p /Users/distiller/pytorch-ci-env/workspace
# copy with -a to preserve relative structure (e.g., symlinks), and be recursive
cp -a /Users/distiller/project/. /Users/distiller/pytorch-ci-env/workspace
- persist_to_workspace:
root: /Users/distiller/pytorch-ci-env
paths:
- "*"
pytorch_macos_10_13_py3_test:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
# This workspace also carries binaries from the build job
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *macos_brew_update
- run:
name: Test
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# copy with -a to preserve relative structure (e.g., symlinks), and be recursive
# TODO: I'm not sure why we can't just run our job in
# ~/workspace and call it a day
# NB: Yes, you need workspace twice
cp -a ~/workspace/workspace/. /Users/distiller/project
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
# Install CUDA 9.2
sudo rm -rf ~/cuda_9.2.64_mac_installer.app || true
curl https://s3.amazonaws.com/ossci-macos/cuda_9.2.64_mac_installer.zip -o ~/cuda_9.2.64_mac_installer.zip
unzip ~/cuda_9.2.64_mac_installer.zip -d ~/
sudo ~/cuda_9.2.64_mac_installer.app/Contents/MacOS/CUDAMacOSXInstaller --accept-eula --no-window
sudo cp /usr/local/cuda/lib/libcuda.dylib /Developer/NVIDIA/CUDA-9.2/lib/libcuda.dylib
sudo rm -rf /usr/local/cuda || true
# Install cuDNN 7.1 for CUDA 9.2
curl https://s3.amazonaws.com/ossci-macos/cudnn-9.2-osx-x64-v7.1.tgz -o ~/cudnn-9.2-osx-x64-v7.1.tgz
rm -rf ~/cudnn-9.2-osx-x64-v7.1 && mkdir ~/cudnn-9.2-osx-x64-v7.1
tar -xzvf ~/cudnn-9.2-osx-x64-v7.1.tgz -C ~/cudnn-9.2-osx-x64-v7.1
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/include/
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/lib/libcudnn* /Developer/NVIDIA/CUDA-9.2/lib/
sudo chmod a+r /Developer/NVIDIA/CUDA-9.2/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/lib/libcudnn*
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
git submodule sync && git submodule update -q --init --recursive
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
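
The `export COMMAND=...` one-liners repeated throughout this file all follow the same shape; a hedged Python sketch of how such a command string is assembled (the helper name is ours; the container id placeholder `$id` is expanded by the outer shell, not by Python):

```python
def make_docker_command(build_env, script, container="$id"):
    """Assemble the pattern used above: echo a small setup preamble plus
    the job script into `docker exec`, folding stderr into stdout."""
    lines = [
        'echo "export BUILD_ENVIRONMENT={}"'.format(build_env),
        'echo "source ./workspace/env"',
        'echo "sudo chown -R jenkins workspace && cd workspace && {}"'.format(script),
    ]
    inner = " && ".join(lines)
    return '(({}) | docker exec -u jenkins -i "{}" bash) 2>&1'.format(inner, container)

cmd = make_docker_command("pytorch-short-perf-test-gpu",
                          ".jenkins/pytorch/short-perf-test-gpu.sh")
print(cmd)
```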

View File

@@ -0,0 +1,34 @@
setup:
docker:
- image: circleci/python:3.7.3
steps:
- checkout
- run:
name: Ensure config is up to date
command: ./ensure-consistency.py
working_directory: .circleci
- run:
name: Save commit message
command: git log --format='%B' -n 1 HEAD > .circleci/scripts/COMMIT_MSG
# Note [Workspace for CircleCI scripts]
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# In the beginning, you wrote your CI scripts in a
# .circleci/config.yml file, and life was good. Your CI
# configurations flourished and multiplied.
#
# Then one day, CircleCI cometh down high and say, "Your YAML file
# is too biggeth, it stresses our servers so." And thus they
# asketh us to smite the scripts in the yml file.
#
# But you can't just put the scripts in the .circleci folder,
# because in some jobs, you don't ever actually checkout the
# source repository. Where you gonna get the scripts from?
#
# Here's how you do it: you persist .circleci/scripts into a
# workspace, attach the workspace in your subjobs, and run all
# your scripts from there.
- persist_to_workspace:
root: .
paths: .circleci/scripts

View File

@@ -0,0 +1,101 @@
# binary linux build defaults
##############################################################################
binary_linux_build: &binary_linux_build
resource_class: 2xlarge+
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Install unbuffer and ts
command: |
set -eux -o pipefail
source /env
retry yum -q -y install epel-release
retry yum -q -y install expect moreutils
- run:
name: Update compiler to devtoolset7
command: |
set -eux -o pipefail
source /env
if [[ "$DESIRED_DEVTOOLSET" == 'devtoolset7' ]]; then
source "/builder/update_compiler.sh"
# Env variables are not persisted into the next step
echo "export PATH=$PATH" >> /env
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> /env
else
echo "Not updating compiler"
fi
- run:
name: Build
no_output_timeout: "1h"
command: |
source "/pytorch/.circleci/scripts/binary_linux_build.sh"
- persist_to_workspace:
root: /
paths: final_pkgs
# This should really just be another step of the binary_linux_build job above.
# This isn't possible right now b/c the build job uses the docker executor
# (otherwise they'd be really really slow) but this one uses the machine
# executor (b/c we have to run the docker with --runtime=nvidia and we can't do
# that on the docker executor)
binary_linux_test: &binary_linux_test
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
# TODO: We shouldn't attach the workspace multiple times
- attach_workspace:
at: /home/circleci/project
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Prepare test code
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/binary_linux_test.sh
- run:
<<: *binary_run_in_docker
binary_linux_upload: &binary_linux_upload
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- attach_workspace:
at: /home/circleci/project
- run:
<<: *binary_populate_env
- run:
<<: *binary_install_miniconda
- run:
name: Upload
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/binary_linux_upload.sh

View File

@@ -0,0 +1,234 @@
##############################################################################
# Linux build defaults
##############################################################################
pytorch_linux_build_defaults: &pytorch_linux_build_defaults
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- checkout
- run:
<<: *setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
git submodule sync && git submodule update -q --init --recursive
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
NAMED_FLAG="export BUILD_NAMEDTENSOR=1"
fi
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Note [Special build images]
# The namedtensor and xla builds use the same docker image as
# pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
docker push ${COMMIT_DOCKER_IMAGE}
fi
pytorch_linux_test_defaults: &pytorch_linux_test_defaults
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
name: Test
no_output_timeout: "90m"
command: |
set -e
# See Note [Special build images]
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
NAMED_FLAG="export BUILD_NAMEDTENSOR=1"
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
if [ -n "${MULTI_GPU}" ]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
caffe2_linux_build_defaults: &caffe2_linux_build_defaults
resource_class: large
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *setup_linux_system_environment
- checkout
- run:
<<: *setup_ci_environment
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
cat >/home/circleci/project/ci_build_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
sudo chown -R jenkins:jenkins '/opt/conda'
fi
# Build
./.jenkins/caffe2/build.sh
# Show sccache stats if it is running
if pgrep sccache > /dev/null; then
sccache --show-stats
fi
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_build_script.sh
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
docker push ${COMMIT_DOCKER_IMAGE}
fi
caffe2_linux_test_defaults: &caffe2_linux_test_defaults
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *setup_linux_system_environment
- run:
<<: *should_run_job
- run:
<<: *setup_ci_environment
- run:
name: Test
no_output_timeout: "1h"
command: |
set -e
# TODO: merge this into Caffe2 test.sh
cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
export BUILD_ENVIRONMENT="$BUILD_ENVIRONMENT"
# libdc1394 (dependency of OpenCV) expects /dev/raw1394 to exist...
sudo ln /dev/null /dev/raw1394
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
fi
# Upgrade SSL module to avoid old SSL warnings
pip -q install --user --upgrade pyOpenSSL ndg-httpsclient pyasn1
pip -q install --user -b /tmp/pip_install_onnx "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
# Run tests
./.jenkins/caffe2/test.sh
# Remove benign core dumps.
# These are tests for signal handling (including SIGABRT).
rm -f ./crash/core.fatal_signal_as.*
rm -f ./crash/core.logging_test.*
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_test_script.sh
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-cmake-${CIRCLE_SHA1}
else
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_test_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts


@ -0,0 +1,72 @@
##############################################################################
# Macos binary build defaults
# The root of everything is /Users/distiller/pytorch-ci-env/workspace
##############################################################################
binary_mac_build: &binary_mac_build
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- run:
name: Test
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
binary_mac_upload: &binary_mac_upload
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
name: Upload
no_output_timeout: "10m"
command: |
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_upload.sh"
cat "$script"
source "$script"


@ -0,0 +1,90 @@
##############################################################################
# Macos build defaults
##############################################################################
caffe2_macos_build_defaults: &caffe2_macos_build_defaults
macos:
xcode: "9.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_workspace:
at: ~/workspace
- run:
<<: *should_run_job
- checkout
- run:
<<: *macos_brew_update
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CIRCLECI=1
brew install cmake
# Reinitialize submodules
git submodule sync && git submodule update -q --init --recursive
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
# Use Homebrew Python if configured to do so
if [ "${PYTHON_INSTALLATION}" == "homebrew" ]; then
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
fi
pip -q install numpy
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl -o ${TMPDIR}/conda.sh https://repo.continuum.io/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
chmod +x ${TMPDIR}/conda.sh
/bin/bash ${TMPDIR}/conda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/conda.sh
export PATH="${TMPDIR}/anaconda/bin:${PATH}"
source ${TMPDIR}/anaconda/bin/activate
fi
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
export SCCACHE_BIN=${PWD}/sccache_bin
mkdir -p ${SCCACHE_BIN}
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${SCCACHE_BIN}/clang++"
chmod a+x "${SCCACHE_BIN}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${SCCACHE_BIN}/clang"
chmod a+x "${SCCACHE_BIN}/clang"
export PATH="${SCCACHE_BIN}:$PATH"
fi
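The `printf` trick above writes a two-line shim so every `clang`/`clang++` invocation on `PATH` goes through sccache first. A minimal reproduction of the shim mechanism, with `echo` standing in for sccache so it runs anywhere:

```shell
#!/bin/bash
# Reproduce the compiler-wrapper trick: a shim script that prefixes every
# compiler call with a caching launcher (stubbed out here with `echo`).
SHIM_DIR=$(mktemp -d)
printf '#!/bin/sh\nexec echo sccache-would-run clang++ "$@"' > "${SHIM_DIR}/clang++"
chmod a+x "${SHIM_DIR}/clang++"

# With SHIM_DIR prepended to PATH, `clang++ -c foo.cpp` resolves to the shim.
"${SHIM_DIR}/clang++" -c foo.cpp
```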
# Build
if [ "${BUILD_IOS:-0}" -eq 1 ]; then
unbuffer scripts/build_ios.sh 2>&1 | ts
elif [ -n "${CAFFE2_USE_ANACONDA}" ]; then
# All conda build logic should be in scripts/build_anaconda.sh
unbuffer scripts/build_anaconda.sh 2>&1 | ts
else
unbuffer scripts/build_local.sh 2>&1 | ts
fi
# Show sccache stats if it is running
if which sccache > /dev/null; then
sccache --show-stats
fi


@ -0,0 +1,70 @@
##############################################################################
# Binary build (nightly) defaults
# The binary builds use the docker executor b/c at time of writing the machine
# executor is limited to only two cores and is painfully slow (4.5+ hours per
# GPU build). But the docker executor cannot be run with --runtime=nvidia, and
# so the binary test/upload jobs must run on a machine executor. The package
# built in the build job is persisted to the workspace, which the test jobs
# expect. The test jobs just run a few quick smoke tests (very similar to the
# second-round-user-facing smoke tests above) and then upload the binaries to
# their final locations. The upload part requires credentials that should only
# be available to org-members.
#
# binary_checkout MUST be run before other commands here. This is because the
# other commands are written in .circleci/scripts/*.sh , so the pytorch source
# code must be downloaded on the machine before they can be run. We cannot
# inline all the code into this file, since that would cause the yaml size to
# explode past 4 MB (all the code in the command section is just copy-pasted to
# everywhere in the .circleci/config.yml file where it appears).
##############################################################################
# Checks out the Pytorch and Builder repos (always both of them), and places
# them in the right place depending on what executor we're running on. We curl
# our .sh file from the interweb to avoid yaml size bloat. Note that many jobs
# do not need both the pytorch and builder repos, so this is a little wasteful
# (smoke tests and upload jobs do not need the pytorch repo).
binary_checkout: &binary_checkout
name: Checkout pytorch/builder repo
command: ~/workspace/.circleci/scripts/binary_checkout.sh
# Parses circleci arguments in a consistent way, essentially routing to the
# correct pythonXgccXcudaXos build we want
binary_populate_env: &binary_populate_env
name: Set up binary env variables
command: ~/workspace/.circleci/scripts/binary_populate_env.sh
binary_install_miniconda: &binary_install_miniconda
name: Install miniconda
no_output_timeout: "1h"
command: ~/workspace/.circleci/scripts/binary_install_miniconda.sh
# This section is used in the binary_test and smoke_test jobs. It expects
# 'binary_populate_env' to have populated /home/circleci/project/env and it
# expects another section to populate /home/circleci/project/ci_test_script.sh
# with the code to run in the docker
binary_run_in_docker: &binary_run_in_docker
name: Run in docker
# This step only runs on circleci linux machine executors that themselves
# need to start docker images
command: ~/workspace/.circleci/scripts/binary_run_in_docker.sh
# This is copied almost verbatim from the macos_brew_update job
# In version 2.1 and above we could make this a command and pass a parameter to
# it, but in this version there is no way to pass a parameter to a step
binary_macos_brew_update: &binary_macos_brew_update
name: Brew update and install moreutils and expect
no_output_timeout: "1h"
command: |
set -eux -o pipefail
# See https://discourse.brew.sh/t/fetching-homebrew-repos-is-slow/5374/3
brew untap caskroom/homebrew-cask
# moreutils installs a `parallel` executable by default, which conflicts
# with the executable from the GNU `parallel`, so we must unlink GNU
# `parallel` first, and relink it afterwards
brew update
brew unlink parallel
brew install moreutils
brew link parallel --overwrite
brew install expect


@ -0,0 +1,64 @@
# Nightly build smoke tests defaults
# These are the second-round smoke tests. These make sure that the binaries are
# correct from a user perspective, testing that they exist in the cloud and
# are runnable. Note that the pytorch repo is never cloned into these jobs
##############################################################################
smoke_linux_test: &smoke_linux_test
machine:
image: ubuntu-1604:201903-01
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace:
at: /home/circleci/project
- run:
<<: *setup_linux_system_environment
- run:
<<: *setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -ex
cat >/home/circleci/project/ci_test_script.sh <<EOL
# The following code will be executed inside Docker container
set -eux -o pipefail
/builder/smoke_test.sh
# The above code will be executed inside Docker container
EOL
- run:
<<: *binary_run_in_docker
smoke_mac_test: &smoke_mac_test
macos:
xcode: "9.0"
steps:
- attach_workspace:
at: ~/workspace
- attach_workspace: # TODO - we can `cp` from ~/workspace
at: /Users/distiller/project
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
<<: *binary_macos_brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -ex
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
# TODO unbuffer and ts this, but it breaks cause miniconda overwrites
# tclsh. But unbuffer and ts aren't that important so they're just
# disabled for now
./builder/smoke_test.sh


@ -0,0 +1,4 @@
##############################################################################
# Daily binary build trigger
##############################################################################


@ -0,0 +1,51 @@
# Binary builds (subset, to smoke test that they'll work)
#
# NB: If you modify this file, you need to also modify
# the binary_and_smoke_tests_on_pr variable in
# pytorch-ci-hud to adjust the list of whitelisted builds
# at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js
- binary_linux_manywheel_2.7mu_cpu_devtoolset7_build:
requires:
- setup
- binary_linux_manywheel_3.7m_cu100_devtoolset7_build:
requires:
- setup
- binary_linux_conda_2.7_cpu_devtoolset7_build:
requires:
- setup
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_devtoolset7_build
- binary_linux_libtorch_2.7m_cpu_devtoolset7_static-without-deps_build:
requires:
- setup
# TODO we should test a libtorch cuda build, but they take too long
# - binary_linux_libtorch_2.7m_cu90_devtoolset7_static-without-deps_build
- binary_macos_wheel_3.6_cpu_build:
requires:
- setup
- binary_macos_conda_2.7_cpu_build:
requires:
- setup
- binary_macos_libtorch_2.7_cpu_build:
requires:
- setup
- binary_linux_manywheel_2.7mu_cpu_devtoolset7_test:
requires:
- setup
- binary_linux_manywheel_2.7mu_cpu_devtoolset7_build
- binary_linux_manywheel_3.7m_cu100_devtoolset7_test:
requires:
- setup
- binary_linux_manywheel_3.7m_cu100_devtoolset7_build
- binary_linux_conda_2.7_cpu_devtoolset7_test:
requires:
- setup
- binary_linux_conda_2.7_cpu_devtoolset7_build
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3.6_cu90_devtoolset7_test:
# requires:
# - setup
# - binary_linux_conda_3.6_cu90_devtoolset7_build


@ -0,0 +1,11 @@
#- binary_linux_libtorch_2.7m_cpu_test:
# requires:
# - binary_linux_libtorch_2.7m_cpu_build
#- binary_linux_libtorch_2.7m_cu90_test:
# requires:
# - binary_linux_libtorch_2.7m_cu90_build
#- binary_linux_libtorch_2.7m_cu100_test:
# requires:
# - binary_linux_libtorch_2.7m_cu100_build
# Nightly uploads


@ -0,0 +1,19 @@
# Warning: indentation here matters!
# Pytorch MacOS builds
- pytorch_macos_10_13_py3_build:
requires:
- setup
filters:
branches:
only: master
- pytorch_macos_10_13_py3_test:
requires:
- setup
- pytorch_macos_10_13_py3_build
filters:
branches:
only: master
- pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
requires:
- setup


@ -0,0 +1,30 @@
# Scheduled to run 4 hours after the binary jobs start
# These jobs need to run after all the binary jobs run, regardless of whether
# the jobs failed. There's no way to do this in CircleCI right now, so we
# just schedule this to run after all the binary jobs should've finished.
# These jobs are all idempotent and very lightweight; they just upload html
# files that track what binaries are available and what their sizes are.
update_s3_htmls:
triggers:
- schedule:
cron: "0 9 * * *"
filters:
branches:
only:
- master
jobs:
- setup
- update_s3_htmls_for_nightlies:
context: org-member
requires:
- setup
- update_s3_htmls_for_nightlies_devtoolset7:
context: org-member
requires:
- setup
- upload_binary_sizes:
context: org-member
requires:
- setup


@ -0,0 +1,12 @@
##############################################################################
##############################################################################
# Workflows
##############################################################################
##############################################################################
# PR jobs (PR builds)
workflows:
version: 2
build:
jobs:


@ -3,26 +3,29 @@
Checks: '
-*
,bugprone-*
,-bugprone-macro-parentheses
,-bugprone-forward-declaration-namespace
,-bugprone-macro-parentheses
,cppcoreguidelines-*
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,-cppcoreguidelines-interfaces-global-init
,-cppcoreguidelines-owning-memory
,hicpp-signed-bitwise
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-type-union-access
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,hicpp-exception-baseclass
,hicpp-avoid-goto
,modernize-*
,-modernize-use-default-member-init
,-modernize-return-braced-init-list
,-modernize-use-auto
,-modernize-use-default-member-init
,-modernize-use-using
,performance-*
,-performance-noexcept-move-constructor
'
WarningsAsErrors: '*'
HeaderFilterRegex: 'torch/csrc/.*'

2
.ctags.d/pytorch.ctags Normal file

@ -0,0 +1,2 @@
--exclude=build/*
--exclude=include/*

12
.flake8 Normal file

@ -0,0 +1,12 @@
[flake8]
select = B,C,E,F,P,T4,W,B9
max-line-length = 120
# C408 ignored because we like the dict keyword argument syntax
# E501 is not flexible enough, we're using B950 instead
ignore =
E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
# these ignores are from flake8-bugbear; please fix!
B007,B008,
# these ignores are from flake8-comprehensions; please fix!
C400,C401,C402,C403,C404,C405,C407,C411,
exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,tools/amd_build/pyHIPIFY,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi

29
.gitignore vendored

@ -22,23 +22,27 @@
aten/build/
aten/src/ATen/Config.h
aten/src/ATen/cuda/CUDAConfig.h
build/
caffe2/cpp_test/
dist/
docs/src/**/*
docs/cpp/build
docs/cpp/source/api
test/.coverage
test/.hypothesis/
test/cpp/api/mnist
test/custom_operator/model.pt
test/data/gpu_tensors.pt
test/data/legacy_modules.t7
test/data/legacy_serialized.pt
test/data/linear.pt
test/data/*.pt
dropout_model.pt
test/generated_type_hints_smoketest.py
test/htmlcov
test/cpp_extensions/install/
third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/
torch/__init__.pyi
torch/nn/functional.pyi
torch/nn/modules/*.pyi
torch/csrc/autograd/generated/*
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
@ -52,6 +56,8 @@ torch/csrc/nn/THNN_generic.cwrap
torch/csrc/nn/THNN_generic.h
torch/csrc/nn/THNN.cpp
torch/csrc/nn/THNN.cwrap
torch/bin/
torch/cmake/
torch/lib/*.a*
torch/lib/*.dll*
torch/lib/*.exe*
@ -59,16 +65,27 @@ torch/lib/*.dylib*
torch/lib/*.h
torch/lib/*.lib
torch/lib/*.so*
torch/lib/protobuf*.pc
torch/lib/build
torch/lib/caffe2/
torch/lib/cmake
torch/lib/include
torch/lib/pkgconfig
torch/lib/protoc
torch/lib/protobuf/
torch/lib/tmp_install
torch/lib/torch_shm_manager
torch/lib/site-packages/
torch/lib/python*
torch/lib64
torch/include/
torch/share/
torch/test/
torch/version.py
# Root level file used in CI to specify certain env configs.
# E.g., see .circleci/config.yaml
env
.circleci/scripts/COMMIT_MSG
# IPython notebook checkpoints
.ipynb_checkpoints
@ -151,6 +168,9 @@ docs/source/scripts/activation_images/
# OSX dir files
.DS_Store
# GDB history
.gdb_history
## Caffe2
# build, distribute, and bins (+ python proto bindings)
@ -185,7 +205,6 @@ docs/dev
*.sst
*.ldb
LOCK
LOG*
CURRENT
MANIFEST-*

146
.gitmodules vendored

@ -1,84 +1,116 @@
[submodule "third_party/pybind11"]
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
ignore = dirty
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
[submodule "third_party/cub"]
path = third_party/cub
url = https://github.com/NVlabs/cub.git
ignore = dirty
path = third_party/cub
url = https://github.com/NVlabs/cub.git
[submodule "third_party/eigen"]
path = third_party/eigen
url = https://github.com/eigenteam/eigen-git-mirror.git
ignore = dirty
path = third_party/eigen
url = https://github.com/eigenteam/eigen-git-mirror.git
[submodule "third_party/googletest"]
path = third_party/googletest
url = https://github.com/google/googletest.git
ignore = dirty
path = third_party/googletest
url = https://github.com/google/googletest.git
[submodule "third_party/benchmark"]
path = third_party/benchmark
url = https://github.com/google/benchmark.git
ignore = dirty
path = third_party/benchmark
url = https://github.com/google/benchmark.git
[submodule "third_party/protobuf"]
path = third_party/protobuf
url = https://github.com/google/protobuf.git
ignore = dirty
path = third_party/protobuf
url = https://github.com/protocolbuffers/protobuf.git
[submodule "third_party/ios-cmake"]
path = third_party/ios-cmake
url = https://github.com/Yangqing/ios-cmake.git
ignore = dirty
path = third_party/ios-cmake
url = https://github.com/Yangqing/ios-cmake.git
[submodule "third_party/NNPACK"]
path = third_party/NNPACK
url = https://github.com/Maratyszcza/NNPACK.git
ignore = dirty
path = third_party/NNPACK
url = https://github.com/Maratyszcza/NNPACK.git
[submodule "third_party/gloo"]
path = third_party/gloo
url = https://github.com/facebookincubator/gloo
ignore = dirty
path = third_party/gloo
url = https://github.com/facebookincubator/gloo
[submodule "third_party/NNPACK_deps/pthreadpool"]
path = third_party/pthreadpool
url = https://github.com/Maratyszcza/pthreadpool.git
ignore = dirty
path = third_party/pthreadpool
url = https://github.com/Maratyszcza/pthreadpool.git
[submodule "third_party/NNPACK_deps/FXdiv"]
path = third_party/FXdiv
url = https://github.com/Maratyszcza/FXdiv.git
ignore = dirty
path = third_party/FXdiv
url = https://github.com/Maratyszcza/FXdiv.git
[submodule "third_party/NNPACK_deps/FP16"]
path = third_party/FP16
url = https://github.com/Maratyszcza/FP16.git
ignore = dirty
path = third_party/FP16
url = https://github.com/Maratyszcza/FP16.git
[submodule "third_party/NNPACK_deps/psimd"]
path = third_party/psimd
url = https://github.com/Maratyszcza/psimd.git
ignore = dirty
path = third_party/psimd
url = https://github.com/Maratyszcza/psimd.git
[submodule "third_party/zstd"]
path = third_party/zstd
url = https://github.com/facebook/zstd.git
ignore = dirty
path = third_party/zstd
url = https://github.com/facebook/zstd.git
[submodule "third-party/cpuinfo"]
path = third_party/cpuinfo
url = https://github.com/Maratyszcza/cpuinfo.git
ignore = dirty
path = third_party/cpuinfo
url = https://github.com/pytorch/cpuinfo.git
[submodule "third_party/python-enum"]
path = third_party/python-enum
url = https://github.com/PeachPy/enum34.git
ignore = dirty
path = third_party/python-enum
url = https://github.com/PeachPy/enum34.git
[submodule "third_party/python-peachpy"]
path = third_party/python-peachpy
url = https://github.com/Maratyszcza/PeachPy.git
ignore = dirty
path = third_party/python-peachpy
url = https://github.com/Maratyszcza/PeachPy.git
[submodule "third_party/python-six"]
path = third_party/python-six
url = https://github.com/benjaminp/six.git
[submodule "third_party/ComputeLibrary"]
path = third_party/ComputeLibrary
url = https://github.com/ARM-software/ComputeLibrary.git
ignore = dirty
path = third_party/python-six
url = https://github.com/benjaminp/six.git
[submodule "third_party/onnx"]
path = third_party/onnx
url = https://github.com/onnx/onnx.git
ignore = dirty
path = third_party/onnx
url = https://github.com/onnx/onnx.git
[submodule "third_party/onnx-tensorrt"]
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
ignore = dirty
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
[submodule "third_party/sleef"]
path = third_party/sleef
url = https://github.com/shibatch/sleef
ignore = dirty
path = third_party/sleef
url = https://github.com/zdevito/sleef
[submodule "third_party/ideep"]
path = third_party/ideep
url = https://github.com/intel/ideep
ignore = dirty
path = third_party/ideep
url = https://github.com/intel/ideep
[submodule "third_party/nccl/nccl"]
path = third_party/nccl/nccl
url = https://github.com/NVIDIA/nccl
ignore = dirty
path = third_party/nccl/nccl
url = https://github.com/NVIDIA/nccl
[submodule "third_party/gemmlowp/gemmlowp"]
path = third_party/gemmlowp/gemmlowp
url = https://github.com/google/gemmlowp.git
ignore = dirty
path = third_party/gemmlowp/gemmlowp
url = https://github.com/google/gemmlowp.git
[submodule "third_party/QNNPACK"]
path = third_party/QNNPACK
url = https://github.com/pytorch/QNNPACK
ignore = dirty
path = third_party/QNNPACK
url = https://github.com/pytorch/QNNPACK
[submodule "third_party/neon2sse"]
path = third_party/neon2sse
url = https://github.com/intel/ARM_NEON_2_x86_SSE.git
ignore = dirty
path = third_party/neon2sse
url = https://github.com/intel/ARM_NEON_2_x86_SSE.git
[submodule "third_party/fbgemm"]
path = third_party/fbgemm
url = https://github.com/pytorch/fbgemm
ignore = dirty
path = third_party/fbgemm
url = https://github.com/pytorch/fbgemm
[submodule "third_party/foxi"]
ignore = dirty
path = third_party/foxi
url = https://github.com/houseroad/foxi.git
[submodule "third_party/tbb"]
path = third_party/tbb
url = https://github.com/01org/tbb
branch = tbb_2018

60
.jenkins/caffe2/bench.sh Executable file

@ -0,0 +1,60 @@
#!/bin/bash
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Anywhere except $ROOT_DIR should work. This is so the python import doesn't
# get confused by any 'caffe2' directory in cwd
cd "$INSTALL_PREFIX"
if [[ $BUILD_ENVIRONMENT == *-cuda* ]]; then
num_gpus=$(nvidia-smi -L | wc -l)
elif [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
num_gpus=$(rocminfo | grep 'Device Type.*GPU' | wc -l)
else
num_gpus=0
fi
caffe2_pypath="$(cd /usr && $PYTHON -c 'import os; import caffe2; print(os.path.dirname(os.path.realpath(caffe2.__file__)))')"
# Resnet50
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --use_cpu
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 2
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 512 --epoch_size 51200 --num_epochs 2 --num_gpus 4
fi
# ResNext
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4
fi
# Shufflenet
if (( $num_gpus == 0 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu --model shufflenet
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --model shufflenet
fi
if (( $num_gpus >= 2 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2 --num_gpus 2 --model shufflenet
fi
if (( $num_gpus >= 4 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4 --model shufflenet
fi


@ -2,15 +2,30 @@
set -ex
pip install --user --no-cache-dir hypothesis==3.59.0
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# TODO: Migrate all centos jobs to use proper devtoolset
if [[ "$BUILD_ENVIRONMENT" == *py2-cuda9.0-cudnn7-centos7* ]]; then
# There is a bug in the pango package on CentOS 7 that causes undefined
# symbols; upgrading glib2 to >=2.56.1 solves the issue. See
# https://bugs.centos.org/view.php?id=15495
sudo yum install -y -q glib2-2.56.1
fi
# CMAKE_ARGS are only passed to 'cmake' and the -Dfoo=bar does not work with
# setup.py, so we build a list of foo=bars and then either convert it to
# -Dfoo=bars or export them before running setup.py
build_args=()
build_to_cmake () {
cmake_args=()
for build_arg in $*; do
cmake_args+=("-D$build_arg")
done
echo ${cmake_args[@]}
}
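The `build_to_cmake` helper added in this hunk turns a list of `KEY=VALUE` pairs into the `-DKEY=VALUE` form cmake expects, so one list can drive both `setup.py` (via exported variables) and direct cmake invocations. A standalone sketch (using quoted `"$@"` rather than the script's unquoted `$*`, so values containing spaces survive):

```shell
#!/bin/bash
# Sketch of the build_to_cmake helper: prefix each KEY=VALUE pair with -D.
build_to_cmake () {
  local cmake_args=()
  for build_arg in "$@"; do
    cmake_args+=("-D$build_arg")
  done
  echo "${cmake_args[@]}"
}

build_to_cmake BUILD_BINARY=ON USE_ZSTD=ON   # -> -DBUILD_BINARY=ON -DUSE_ZSTD=ON
```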
# The INSTALL_PREFIX here must match up with test.sh
INSTALL_PREFIX="/usr/local/caffe2"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
CMAKE_ARGS=()
SCCACHE="$(which sccache)"
if [ "$(which gcc)" != "/root/sccache/gcc" ]; then
# Setup SCCACHE
###############################################################################
@@ -87,52 +102,61 @@ report_compile_cache_stats() {
fi
}
###############################################################################
# Explicitly set Python executable.
###############################################################################
# On Ubuntu 16.04 the default Python is still 2.7.
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
CMAKE_ARGS+=("-DPYTHON_EXECUTABLE=${PYTHON}")
fi
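The regex above pulls the interpreter version ("2", "3.5", "3.6", ...) out of the `BUILD_ENVIRONMENT` job name via `BASH_REMATCH`. A minimal sketch with an illustrative job name:

```shell
#!/bin/bash
# Sketch of the version-extraction regex: BASH_REMATCH[1] captures the
# version fragment after "py" so the matching interpreter can be selected.
BUILD_ENVIRONMENT="pytorch-linux-trusty-py3.6-gcc7"   # illustrative value
PYVER=""
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
  PYVER="python${BASH_REMATCH[1]}"
fi
echo "$PYVER"   # -> python3.6
```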
###############################################################################
# Use special scripts for Android and setup builds
###############################################################################
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
CMAKE_ARGS+=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DBUILD_TEST=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" ${CMAKE_ARGS[*]} "$@"
build_args+=("BUILD_BINARY=ON")
build_args+=("BUILD_TEST=ON")
build_args+=("USE_OBSERVERS=ON")
build_args+=("USE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" $(build_to_cmake ${build_args[@]}) "$@"
exit 0
fi
###############################################################################
# Set cmake args
# Set parameters
###############################################################################
CMAKE_ARGS+=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DBUILD_TEST=ON")
CMAKE_ARGS+=("-DINSTALL_TEST=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
CMAKE_ARGS+=("-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}")
if [[ $BUILD_ENVIRONMENT == *mkl* ]]; then
CMAKE_ARGS+=("-DBLAS=MKL")
CMAKE_ARGS+=("-DUSE_MKLDNN=ON")
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
build_args+=("BUILD_PYTHON=OFF")
else
build_args+=("BUILD_PYTHON=ON")
build_args+=("PYTHON_EXECUTABLE=${PYTHON}")
fi
if [[ $BUILD_ENVIRONMENT == *mkl* ]]; then
build_args+=("BLAS=MKL")
build_args+=("USE_MKLDNN=ON")
fi
build_args+=("BUILD_BINARY=ON")
build_args+=("BUILD_TEST=ON")
build_args+=("INSTALL_TEST=ON")
build_args+=("USE_ZSTD=ON")
if [[ $BUILD_ENVIRONMENT == *py2-cuda9.0-cudnn7-ubuntu16.04* ]]; then
# removing the http:// duplicate in favor of nvidia-ml.list,
# which is the https:// version of the same repo
sudo rm -f /etc/apt/sources.list.d/nvidia-machine-learning.list
curl -o ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
sudo dpkg -i ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
sudo apt-key add /var/nvinfer-runtime-trt-repo-5.0.2-ga-cuda9.0/7fa2af80.pub
sudo apt-get -qq update
sudo apt-get install -y --no-install-recommends libnvinfer5=5.0.2-1+cuda9.0 libnvinfer-dev=5.0.2-1+cuda9.0
rm ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
build_args+=("USE_TENSORRT=ON")
fi
if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
CMAKE_ARGS+=("-DUSE_CUDA=ON")
CMAKE_ARGS+=("-DCUDA_ARCH_NAME=Maxwell")
CMAKE_ARGS+=("-DUSE_NNPACK=OFF")
build_args+=("USE_CUDA=ON")
build_args+=("USE_NNPACK=OFF")
# Target only our CI GPU machine's CUDA arch to speed up the build
build_args+=("TORCH_CUDA_ARCH_LIST=Maxwell")
# Explicitly set path to NVCC such that the symlink to ccache or sccache is used
CMAKE_ARGS+=("-DCUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/nvcc")
build_args+=("CUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/nvcc")
# Ensure FindCUDA.cmake can infer the right path to the CUDA toolkit.
# Setting PATH to resolve to the right nvcc alone isn't enough.
@@ -143,10 +167,16 @@ if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
build_args+=("USE_ROCM=ON")
# This is needed to enable ImageInput operator in resnet50_trainer
CMAKE_ARGS+=("-USE_OPENCV=ON")
build_args+=("USE_OPENCV=ON")
# This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
CMAKE_ARGS+=("-USE_LMDB=ON")
build_args+=("USE_LMDB=ON")
# When hcc runs out of memory, it silently exits without stopping
# the build process, leaving undefined symbols in the shared lib
# which will cause undefined symbol errors when later running
# tests. Set MAX_JOBS to a smaller number to make CI less flaky.
export MAX_JOBS=4
########## HIPIFY Caffe2 operators
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_amd.py"
@@ -155,37 +185,25 @@ fi
# building bundled nccl in this config triggers a bug in nvlink. For
# more, see https://github.com/pytorch/pytorch/issues/14486
if [[ "${BUILD_ENVIRONMENT}" == *-cuda8*-cudnn7* ]]; then
CMAKE_ARGS+=("-DUSE_SYSTEM_NCCL=ON")
build_args+=("USE_SYSTEM_NCCL=ON")
fi
# Try to include Redis support for Linux builds
if [ "$(uname)" == "Linux" ]; then
CMAKE_ARGS+=("-DUSE_REDIS=ON")
fi
# Currently, on Jenkins mac os, we will use custom protobuf. Mac OS
# contbuild at the moment is minimal dependency - it doesn't use glog
# or gflags either.
if [ "$(uname)" == "Darwin" ]; then
CMAKE_ARGS+=("-DBUILD_CUSTOM_PROTOBUF=ON")
build_args+=("USE_REDIS=ON")
fi
# Use a specialized onnx namespace in CI to catch hardcoded onnx namespaces
CMAKE_ARGS+=("-DONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI")
# We test the presence of cmake3 (for platforms like Centos and Ubuntu 14.04)
# and use that if so.
if [[ -x "$(command -v cmake3)" ]]; then
CMAKE_BINARY=cmake3
else
CMAKE_BINARY=cmake
fi
build_args+=("ONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI")
###############################################################################
# Configure and make
###############################################################################
if [[ -z "$INTEGRATED" ]]; then
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
# cmake-only non-setup.py build, to test cpp only bits. This installs into
# /usr/local/caffe2 and installs no Python tests
build_args+=("CMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}")
# Run cmake from ./build_caffe2 directory so it doesn't conflict with
# standard PyTorch build directory. Eventually these won't need to
@@ -194,8 +212,16 @@ if [[ -z "$INTEGRATED" ]]; then
mkdir build_caffe2
cd ./build_caffe2
# We test the presence of cmake3 (for platforms like Centos and Ubuntu 14.04)
# and use that if so.
if [[ -x "$(command -v cmake3)" ]]; then
CMAKE_BINARY=cmake3
else
CMAKE_BINARY=cmake
fi
# Configure
${CMAKE_BINARY} "${ROOT_DIR}" ${CMAKE_ARGS[*]} "$@"
${CMAKE_BINARY} "${ROOT_DIR}" $(build_to_cmake ${build_args[@]}) "$@"
# Build
if [ "$(uname)" == "Linux" ]; then
@@ -205,20 +231,35 @@ if [[ -z "$INTEGRATED" ]]; then
exit 1
fi
# This is to save test binaries for testing
mv "$INSTALL_PREFIX/test/" "$INSTALL_PREFIX/cpp_test/"
ls -lah $INSTALL_PREFIX
else
# Python build. Uses setup.py to install into site-packages
build_args+=("USE_LEVELDB=ON")
build_args+=("USE_LMDB=ON")
build_args+=("USE_OPENCV=ON")
build_args+=("BUILD_TEST=ON")
# These flags preserve the flags that were used before this refactor (blame
# me)
build_args+=("USE_GLOG=ON")
build_args+=("USE_GFLAGS=ON")
build_args+=("USE_FBGEMM=OFF")
build_args+=("USE_MKLDNN=OFF")
build_args+=("USE_DISTRIBUTED=ON")
for build_arg in "${build_args[@]}"; do
export $build_arg
done
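The loop above exports each `KEY=VALUE` entry so `setup.py`'s build system picks it up from the environment. A minimal runnable sketch with two stand-in flags:

```shell
#!/bin/bash
# Sketch of the export loop: each KEY=VALUE entry in build_args becomes an
# environment variable visible to the subsequent setup.py invocation.
build_args=("USE_LEVELDB=ON" "USE_LMDB=ON")
for build_arg in "${build_args[@]}"; do
  export "$build_arg"
done
echo "$USE_LEVELDB $USE_LMDB"   # -> ON ON
```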
# sccache will get stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
if [[ -n "${SCCACHE}" && $BUILD_ENVIRONMENT != *rocm* ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
USE_LEVELDB=1 USE_LMDB=1 USE_OPENCV=1 BUILD_BINARY=1 python setup.py install --user
# This is to save test binaries for testing
cp -r torch/lib/tmp_install $INSTALL_PREFIX
ls $INSTALL_PREFIX
$PYTHON setup.py install --user
report_compile_cache_stats
fi
@@ -230,39 +271,14 @@ fi
# Install ONNX into a local directory
pip install --user -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
# Symlink the caffe2 base python path into the system python path,
# so that we can import caffe2 without having to change $PYTHONPATH.
# Run in a subshell to contain environment set by /etc/os-release.
#
# This is only done when running on Jenkins! We don't want to pollute
# the user environment with Python symlinks and ld.so.conf.d hacks.
#
if [[ -z "$INTEGRATED" ]]; then
if [ -n "${JENKINS_URL}" ]; then
(
source /etc/os-release
function python_version() {
"$PYTHON" -c 'import sys; print("python%d.%d" % sys.version_info[0:2])'
}
# Debian/Ubuntu
if [[ "$ID_LIKE" == *debian* ]]; then
python_path="/usr/local/lib/$(python_version)/dist-packages"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# RHEL/CentOS
if [[ "$ID_LIKE" == *rhel* ]]; then
python_path="/usr/lib64/$(python_version)/site-packages/"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# /etc/ld.so.conf.d is used on both Debian and RHEL
echo "${INSTALL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/caffe2.conf
sudo ldconfig
)
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
ORIG_COMP=/opt/rocm/hcc/bin/clang-*_original
if [ -e $ORIG_COMP ]; then
# runtime compilation of MIOpen kernels manages to crash sccache - hence undo the wrapping
# note that the wrapping always names the compiler "clang-7.0_original"
WRAPPED=/opt/rocm/hcc/bin/clang-[0-99]
sudo mv $ORIG_COMP $WRAPPED
fi
fi
report_compile_cache_stats

.jenkins/caffe2/common.sh Normal file

@@ -0,0 +1,22 @@
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR="$ROOT_DIR/caffe2_tests"
gtest_reports_dir="${TEST_DIR}/cpp"
pytest_reports_dir="${TEST_DIR}/python"
# Figure out which Python to use
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
fi
# /usr/local/caffe2 is where the cpp bits are installed to in cmake-only
# builds. In +python builds the cpp tests are copied to /usr/local/caffe2 so
# that the test code in .jenkins/test.sh is the same
INSTALL_PREFIX="/usr/local/caffe2"
mkdir -p "$gtest_reports_dir" || true
mkdir -p "$pytest_reports_dir" || true
mkdir -p "$INSTALL_PREFIX" || true


@@ -1,23 +1,6 @@
#!/bin/bash
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR=$ROOT_DIR/caffe2_tests
# Figure out which Python to use
PYTHON="python"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON="python${BASH_REMATCH[1]}"
fi
# The prefix must mirror the setting from build.sh
INSTALL_PREFIX="/usr/local/caffe2"
# Add the site-packages in the caffe2 install prefix to the PYTHONPATH
SITE_DIR=$($PYTHON -c "from distutils import sysconfig; print(sysconfig.get_python_lib(prefix=''))")
INSTALL_SITE_DIR="${INSTALL_PREFIX}/${SITE_DIR}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Skip tests in environments where they are not built/applicable
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
@@ -25,41 +8,39 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
exit 0
fi
# Set PYTHONPATH and LD_LIBRARY_PATH so that python can find the installed
# Caffe2.
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
cd "$ROOT_DIR"
if [ -d $TEST_DIR ]; then
echo "Directory $TEST_DIR already exists; please remove it..."
exit 1
# Find where cpp tests and Caffe2 itself are installed
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
# For cmake only build we install everything into /usr/local
cpp_test_dir="$INSTALL_PREFIX/cpp_test"
ld_library_path="$INSTALL_PREFIX/lib"
else
# For Python builds we install into Python's site-packages
# cd to /usr first so the python import doesn't get confused by any 'caffe2'
# directory in cwd
python_installation="$(dirname $(dirname $(cd /usr && $PYTHON -c 'import os; import caffe2; print(os.path.realpath(caffe2.__file__))')))"
caffe2_pypath="$python_installation/caffe2"
cpp_test_dir="$python_installation/torch/test"
ld_library_path="$python_installation/torch/lib"
fi
mkdir -p $TEST_DIR/{cpp,python}
cd "${WORKSPACE}"
# C++ tests
################################################################################
# C++ tests #
################################################################################
echo "Running C++ tests.."
gtest_reports_dir="${TEST_DIR}/cpp"
junit_reports_dir="${TEST_DIR}/junit_reports"
mkdir -p "$gtest_reports_dir" "$junit_reports_dir"
for test in $(find "${INSTALL_PREFIX}/test" -executable -type f); do
for test in $(find "$cpp_test_dir" -executable -type f); do
case "$test" in
# skip tests we know are hanging or bad
*/mkl_utils_test|*/aten/integer_divider_test)
continue
;;
*/scalar_tensor_test|*/basic|*/native_test)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
"$test"
fi
;;
*)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
LD_LIBRARY_PATH="$ld_library_path" "$test"
fi
;;
*)
# Currently, we use a mixture of gtest (caffe2) and Catch2 (ATen). While
# planning to migrate to gtest as the common PyTorch c++ test suite, we
# currently do NOT use the xml test reporter, because Catch doesn't
@@ -70,21 +51,39 @@ for test in $(find "${INSTALL_PREFIX}/test" -executable -type f); do
# output than it is to have XML output for Jenkins.
# Note: in the future, if we want to use xml test reporter once we switch
# to all gtest, one can simply do:
# "$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
"$test"
LD_LIBRARY_PATH="$ld_library_path" \
"$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
;;
esac
done
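The loop above dispatches each discovered test binary through a `case` statement, silently skipping a hard-coded list of known-hanging tests. A minimal sketch of that skip-list pattern, with illustrative paths:

```shell
#!/bin/bash
# Sketch of the skip-list dispatch: a case statement routes each test binary
# and returns failure for the known-bad ones so the caller can `continue`.
should_run () {
  case "$1" in
    */mkl_utils_test|*/aten/integer_divider_test) return 1 ;;  # known hanging/bad
    *) return 0 ;;
  esac
}

should_run "/usr/local/caffe2/cpp_test/mkl_utils_test" || echo "skip"
should_run "/usr/local/caffe2/cpp_test/blob_test" && echo "run"
```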
# Get the relative path to where the caffe2 python module was installed
CAFFE2_PYPATH="$INSTALL_SITE_DIR/caffe2"
################################################################################
# Python tests #
################################################################################
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
exit 0
fi
if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
# Hotfix, use hypothesis 3.44.6 on Ubuntu 14.04
# See comments on
# https://github.com/HypothesisWorks/hypothesis-python/commit/eadd62e467d6cee6216e71b391951ec25b4f5830
sudo pip -q uninstall -y hypothesis
# "pip install hypothesis==3.44.6" from official server is unreliable on
# CircleCI, so we host a copy on S3 instead
sudo pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
sudo pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
sudo pip -q install hypothesis==3.44.6 -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
else
pip install --user --no-cache-dir hypothesis==3.59.0
fi
# Collect additional tests to run (outside caffe2/python)
EXTRA_TESTS=()
# CUDA builds always include NCCL support
if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]]; then
EXTRA_TESTS+=("$CAFFE2_PYPATH/contrib/nccl")
EXTRA_TESTS+=("$caffe2_pypath/contrib/nccl")
fi
rocm_ignore_test=()
@@ -92,34 +91,46 @@ if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# Currently these tests are failing on ROCM platform:
# Unknown reasons, need to debug
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/arg_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/piecewise_linear_transform_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/softmax_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/unique_ops_test.py")
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/piecewise_linear_transform_test.py")
# On ROCm, RCCL (distributed) development isn't complete.
# https://github.com/ROCmSoftwarePlatform/rccl
rocm_ignore_test+=("--ignore $caffe2_pypath/python/data_parallel_model_test.py")
fi
# Python tests
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
echo "Running Python tests.."
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
# locale setting is required by click package with py3
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
fi
pip install --user pytest-sugar
"$PYTHON" \
-m pytest \
-x \
-v \
--disable-warnings \
--junit-xml="$TEST_DIR/python/result.xml" \
--ignore "$CAFFE2_PYPATH/python/test/executor_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/matmul_op_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/pack_ops_test.py" \
--ignore "$CAFFE2_PYPATH/python/mkl/mkl_sbn_speed_test.py" \
--junit-xml="$pytest_reports_dir/result.xml" \
--ignore "$caffe2_pypath/python/test/executor_test.py" \
--ignore "$caffe2_pypath/python/operator_test/matmul_op_test.py" \
--ignore "$caffe2_pypath/python/operator_test/pack_ops_test.py" \
--ignore "$caffe2_pypath/python/mkl/mkl_sbn_speed_test.py" \
${rocm_ignore_test[@]} \
"$CAFFE2_PYPATH/python" \
"$caffe2_pypath/python" \
"${EXTRA_TESTS[@]}"
cd ${INSTALL_PREFIX}
if [[ -n "$INTEGRATED" ]]; then
pip install --user torchvision
#####################
# torchvision tests #
#####################
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
pip install -q --user git+https://github.com/pytorch/vision.git
pip install -q --user ninja
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
pip install -q --user onnxruntime==0.4.0
fi
"$ROOT_DIR/scripts/onnx/test.sh"
fi


@@ -4,7 +4,9 @@
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Clang version:"
@@ -14,8 +16,24 @@ clang --version
# symbolize=1: Gives us much better errors when things go wrong
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
# TODO: Make the ASAN flags a more unified env var
# FIXME: Remove the hardcoded "-pthread" option.
# With asan build, the cmake thread CMAKE_HAVE_LIBC_CREATE[1] checking will
# succeed because "pthread_create" is in libasan.so. However, libasan doesn't
# have the full pthread implementation; other advanced pthread functions don't
# exist in libasan.so[2]. If we need any advanced pthread functions, we still
# need to link the pthread library.
# This issue is already fixed in cmake 3.13[3]. If we use the newer cmake, we
# could remove this hardcoded option.
#
# [1] https://github.com/Kitware/CMake/blob/8cabaaf054a16ea9c8332ce8e9291bd026b38c62/Modules/FindThreads.cmake#L135
# [2] https://wiki.gentoo.org/wiki/AddressSanitizer/Problems
# [3] https://github.com/Kitware/CMake/commit/e9a1ddc594de6e6251bf06d732775dae2cabe4c8
#
# TODO: Make the ASAN flags a centralized env var and unify with USE_ASAN option
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan" \
NO_CUDA=1 USE_MKLDNN=0 \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan -pthread" \
CXX_FLAGS="-pthread" \
USE_ASAN=1 USE_CUDA=0 USE_MKLDNN=0 \
python setup.py install
assert_git_not_dirty


@@ -4,7 +4,9 @@
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# For distributed, four environmental configs:
@@ -18,7 +20,7 @@ if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
if [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
@@ -30,8 +32,8 @@ if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT"
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == "pytorch-linux-xenial-py3-clang5-asan" ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" $*
if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
echo "Python version:"
@@ -44,7 +46,30 @@ echo "CMake version:"
cmake --version
# TODO: Don't run this...
pip install -q -r requirements.txt || true
pip_install -r requirements.txt || true
# TODO: Don't install this here
if ! which conda; then
# In ROCm CIs, we cross-compile on build machines with Intel CPUs
# and later run tests on machines with AMD CPUs.
# Also leave out two builds to make sure non-mkldnn builds still work.
if [[ "$BUILD_ENVIRONMENT" != *rocm* && "$BUILD_ENVIRONMENT" != *-trusty-py3.5-* && "$BUILD_ENVIRONMENT" != *-xenial-cuda9-cudnn7-py3-* ]]; then
pip_install mkl mkl-devel
export USE_MKLDNN=1
else
export USE_MKLDNN=0
fi
fi
# Use special scripts for Android builds
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
build_args=()
build_args+=("-DBUILD_CAFFE2_MOBILE=OFF")
build_args+=("-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')")
build_args+=("-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')")
exec ./scripts/build_android.sh "${build_args[@]}" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# When hcc runs out of memory, it silently exits without stopping
@@ -65,7 +90,7 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
for compiler in cc c++ gcc g++; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
@@ -83,24 +108,23 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# OPENCV is needed to enable the ImageInput operator in caffe2 resnet50_trainer
# LMDB is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 python setup.py install --user
exit 0
fi
# TODO: Don't install this here
if ! which conda; then
pip install -q mkl mkl-devel
if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc7.2* ]] || [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc4.8* ]]; then
export USE_MKLDNN=1
else
export USE_MKLDNN=0
ORIG_COMP=/opt/rocm/hcc/bin/clang-*_original
if [ -e $ORIG_COMP ]; then
# runtime compilation of MIOpen kernels manages to crash sccache - hence undo the wrapping
# note that the wrapping always names the compiler "clang-7.0_original"
WRAPPED=/opt/rocm/hcc/bin/clang-[0-99]
sudo mv $ORIG_COMP $WRAPPED
fi
exit 0
fi
# sccache will fail for CUDA builds if all cores are used for compiling
# gcc 7 with sccache seems to have intermittent OOM issue if all cores are used
if [ -z "$MAX_JOBS" ]; then
if ([[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]) && which sccache > /dev/null; then
export MAX_JOBS=`expr $(nproc) - 1`
export MAX_JOBS=$(($(nproc) - 1))
fi
fi
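The hunk above also modernizes the "leave one core free for sccache" computation, replacing the legacy `expr` call with bash arithmetic. A minimal comparison of the two forms, which always agree:

```shell
#!/bin/bash
# Both forms compute "all cores minus one" so sccache keeps a core to itself;
# $(( )) is the preferred modern spelling of the legacy `expr` call.
cores=$(nproc)
legacy=$(expr "$cores" - 1)
modern=$((cores - 1))
echo "$legacy $modern"
```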
@@ -115,35 +139,48 @@ if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc5.4* ]]; then
export DEBUG=1
fi
# Patch required to build xla
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
git clone --recursive https://github.com/pytorch/xla.git
./xla/scripts/apply_patches.sh
fi
# check that setup.py would fail with bad arguments
echo "The next three invocations are expected to fail with invalid command error messages."
( ! get_exit_code python setup.py bad_argument )
( ! get_exit_code python setup.py clean] )
( ! get_exit_code python setup.py clean bad_argument )
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
WERROR=1 python setup.py install
elif [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
else
python setup.py install
fi
# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn7-py3* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip install -q -r requirements.txt || true
pip_install -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn7-py3* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
make -j
popd
assert_git_not_dirty
fi
# Test no-Python build
@@ -166,4 +203,54 @@ if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake "$CUSTOM_OP_TEST"
make VERBOSE=1
popd
assert_git_not_dirty
fi
# Test XLA build
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# TODO: Move this to Dockerfile.
pip_install lark-parser
# Bazel doesn't work with sccache gcc. https://github.com/bazelbuild/bazel/issues/3642
sudo add-apt-repository "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-7 main"
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
sudo apt-get -qq update
# Install clang-7 clang++-7 for xla
sudo apt-get -qq install clang-7 clang++-7
# Bazel dependencies
sudo apt-get -qq install pkg-config zip zlib1g-dev unzip
# XLA build requires Bazel
wget https://github.com/bazelbuild/bazel/releases/download/0.24.1/bazel-0.24.1-installer-linux-x86_64.sh
chmod +x bazel-*.sh
sudo ./bazel-*.sh
BAZEL="$(which bazel)"
if [ -z "${BAZEL}" ]; then
echo "Unable to find bazel..."
exit 1
fi
# Install bazels3cache for cloud cache
sudo apt-get -qq install npm
npm config set strict-ssl false
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get install -qq nodejs
sudo npm install -g bazels3cache
BAZELS3CACHE="$(which bazels3cache)"
if [ -z "${BAZELS3CACHE}" ]; then
echo "Unable to find bazels3cache..."
exit 1
fi
bazels3cache --bucket=ossci-compiler-cache-circleci-xla --maxEntrySizeBytes=0
pushd xla
export CC=clang-7 CXX=clang++-7
# Use cloud cache to build when available.
sed -i '/bazel build/ a --remote_http_cache=http://localhost:7777 \\' build_torch_xla_libs.sh
python setup.py install
popd
assert_git_not_dirty
fi


@@ -17,14 +17,28 @@ function cleanup {
set -ex
# Save the SCRIPT_DIR absolute path in case later we chdir (as occurs in the gpu perf test)
SCRIPT_DIR="$( cd "$(dirname "${BASH_SOURCE[0]}")" ; pwd -P )"
# Required environment variables:
# $BUILD_ENVIRONMENT (should be set by your Docker image)
# Figure out which Python to use for ROCm
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]] && [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
# non-interactive bash shells do not expand aliases by default
shopt -s expand_aliases
export PYTORCH_TEST_WITH_ROCM=1
alias python="$PYTHON"
fi
# This token is used by a parser on Jenkins logs for determining
# if a failure is a legitimate problem, or a problem with the build
# system; to find out more, grep for this string in ossci-job-dsl.
echo "ENTERED_USER_LAND"
export IS_PYTORCH_CI=1
# compositional trap taken from https://stackoverflow.com/a/7287873/23845
# note: printf is used instead of echo to avoid backslash
@@ -61,17 +75,33 @@ declare -f -t trap_add
trap_add cleanup EXIT
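The `trap_add` body itself is elided by this hunk; the sketch below shows only the compositional idea from the linked Stack Overflow answer: append a new handler to whatever trap is already installed instead of overwriting it. This is a simplified hypothetical version, not the script's actual implementation.

```shell
#!/bin/bash
# Compositional trap sketch: read the existing handler via `trap -p`, then
# reinstall it with the new command appended.
trap_add () {
  local new_cmd=$1 sig=$2
  local existing
  existing=$(trap -p "$sig" | sed -n "s/^trap -- '\(.*\)' $sig\$/\1/p")
  trap "${existing:+$existing; }$new_cmd" "$sig"
}

trap_add "echo first" EXIT
trap_add "echo second" EXIT
trap -p EXIT   # -> trap -- 'echo first; echo second' EXIT
```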
function assert_git_not_dirty() {
# TODO: we should add an option to `build_amd.py` that reverts the repo to
# an unmodified state.
if ([[ "$BUILD_ENVIRONMENT" != *rocm* ]] && [[ "$BUILD_ENVIRONMENT" != *xla* ]]) ; then
git_status=$(git status --porcelain)
if [[ $git_status ]]; then
echo "Build left local git repository checkout dirty"
echo "git status --porcelain:"
echo "${git_status}"
exit 1
fi
fi
}
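The dirty-checkout check above relies on `git status --porcelain` printing one line per modified or untracked path, so any output at all means the tree is dirty. A self-contained sketch demonstrated against a throwaway repository rather than the CI checkout:

```shell
#!/bin/bash
set -e
# Sketch of assert_git_not_dirty's core test: non-empty porcelain output
# means the working tree was modified.
is_dirty () {
  [[ -n "$(git -C "$1" status --porcelain)" ]]
}

repo=$(mktemp -d)
git -C "$repo" init -q
is_dirty "$repo" && echo "dirty" || echo "clean"   # -> clean
touch "$repo/untracked"
is_dirty "$repo" && echo "dirty" || echo "clean"   # -> dirty
```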
if which sccache > /dev/null; then
# Save sccache logs to file
sccache --stop-server || true
rm ~/sccache_error.log || true
SCCACHE_ERROR_LOG=~/sccache_error.log RUST_LOG=sccache::server=error sccache --start-server
# increasing SCCACHE_IDLE_TIMEOUT so that extension_backend_test.cpp can build after this PR:
# https://github.com/pytorch/pytorch/pull/16645
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=1200 RUST_LOG=sccache::server=error sccache --start-server
# Report sccache stats for easier debugging
sccache --zero-stats
function sccache_epilogue() {
echo '=================== sccache compilation log ==================='
python $(dirname "${BASH_SOURCE[0]}")/print_sccache_log.py ~/sccache_error.log
python "$SCRIPT_DIR/print_sccache_log.py" ~/sccache_error.log 2>/dev/null
echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
sccache --show-stats
sccache --stop-server || true
@@ -98,22 +128,9 @@ if [ -z "$COMPACT_JOB_NAME" ]; then
exit 1
fi
if grep --line-regexp -q "$COMPACT_JOB_NAME" "$(dirname "${BASH_SOURCE[0]}")/disabled-configs.txt"; then
echo "Job is explicitly disabled, SKIPPING"
exit 0
else
echo "Job is not disabled, proceeding"
fi
if grep --line-regexp -q "$COMPACT_JOB_NAME" "$(dirname "${BASH_SOURCE[0]}")/enabled-configs.txt"; then
echo "Job is enabled, proceeding"
else
echo "Job is not enabled, FAILING now (revert changes to enabled-configs.txt to fix this)"
exit 1
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py3 ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py3* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch_macos* ]]; then
BUILD_TEST_LIBTORCH=1
else
BUILD_TEST_LIBTORCH=0
@@ -122,7 +139,7 @@ fi
# Use conda cmake in some CI build. Conda cmake will be newer than our supported
# min version 3.5, so we only do it in two builds that we know should use conda.
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *cuda8-cudnn7-py2* ]] || \
if [[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py2* ]] || \
[[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py3* ]]; then
if ! which conda; then
echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty"
@@ -138,3 +155,16 @@ if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
fi
fi
fi
function pip_install() {
# retry 3 times
pip install --progress-bar off "$@" || pip install --progress-bar off "$@" || pip install --progress-bar off "$@"
}
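The `pip_install` helper above retries a flaky `pip install` up to three times by chaining the same command with `||`. A generic sketch of that pattern (`retry3` and `flaky` are hypothetical names, not part of the CI scripts):

```shell
#!/bin/bash
# Generic sketch of the 3-attempt retry pattern: rerun on failure, succeed
# as soon as any attempt succeeds.
retry3 () {
  "$@" || "$@" || "$@"
}

# A command that fails twice, then succeeds: marker files record attempts.
marker=$(mktemp -u)
flaky () {
  if [ -f "$marker.2" ]; then return 0; fi
  if [ -f "$marker.1" ]; then touch "$marker.2"; return 1; fi
  touch "$marker.1"; return 1
}

retry3 flaky && echo "eventually succeeded"
```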
function get_exit_code() {
set +e
"$@"
retcode=$?
set -e
return $retcode
}
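`get_exit_code` above exists because the CI scripts run under `set -e`, where a failing command normally aborts the script; toggling `set +e` inside the wrapper lets callers branch on the exit status instead. A runnable sketch:

```shell
#!/bin/bash
set -e
# Sketch of get_exit_code: temporarily disable errexit, capture the status,
# re-enable it, and propagate the status as the function's return code.
get_exit_code () {
  set +e
  "$@"
  retcode=$?
  set -e
  return $retcode
}

if get_exit_code false; then
  echo "unexpected"
else
  echo "failed as expected"   # this branch is taken
fi
```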


@@ -1,5 +0,0 @@
# This file contains a list of disabled configurations. Disabled
# configurations are skipped and not considered a failure if they
# fail. You can use this to temporarily reserve a test name to
# turn on the CI side before the PyTorch repository supports it. This
# file has the same format as .jenkins/enabled-configs.txt


@@ -1,6 +1,8 @@
#!/bin/bash
# shellcheck disable=SC2034
COMPACT_JOB_NAME="docker-build-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
docker build -t pytorch .


@@ -1,51 +0,0 @@
# This file contains a list of enabled configurations
# to perform tests on. If you want to run tests on CI on
# a limited set of tests before enabling the full test suite,
# you can delete lines from this file. Any test that is not
# in this file will report a failure (so you don't forget to
# reenable the tests on merge ;)
pytorch-linux-xenial-cuda8-cudnn7-py3-build
pytorch-linux-xenial-cuda8-cudnn7-py3-test
pytorch-linux-xenial-cuda8-cudnn7-py3-multigpu-test
pytorch-linux-xenial-cuda9-cudnn7-py2-build
pytorch-linux-xenial-cuda9-cudnn7-py2-test
pytorch-linux-xenial-cuda9-cudnn7-py3-build
pytorch-linux-xenial-cuda9-cudnn7-py3-test
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-test
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-test
pytorch-linux-xenial-py3-clang5-asan-build
pytorch-linux-xenial-py3-clang5-asan-test
pytorch-linux-trusty-py2.7.9-build
pytorch-linux-trusty-py2.7.9-test
pytorch-linux-trusty-py2.7-build
pytorch-linux-trusty-py2.7-test
pytorch-linux-trusty-py3.5-build
pytorch-linux-trusty-py3.5-test
pytorch-linux-trusty-py3.6-gcc4.8-build
pytorch-linux-trusty-py3.6-gcc4.8-test
pytorch-linux-trusty-py3.6-gcc5.4-build
pytorch-linux-trusty-py3.6-gcc5.4-test
pytorch-linux-trusty-py3.6-gcc7.2-build
pytorch-linux-trusty-py3.6-gcc7.2-test
pytorch-linux-trusty-py3.6-gcc7-build
pytorch-linux-trusty-py3.6-gcc7-test
pytorch-linux-trusty-pynightly-build
pytorch-linux-trusty-pynightly-test
pytorch-win-ws2016-cuda9-cudnn7-py3-build
pytorch-win-ws2016-cuda9-cudnn7-py3-test
pytorch-macos-10.13-py3-build
pytorch-macos-10.13-py3-test
pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
pytorch-docker-build-test
short-perf-test-cpu
short-perf-test-gpu
py2-clang7-rocmdeb-ubuntu16.04-build
py2-clang7-rocmdeb-ubuntu16.04-test
py2-devtoolset7-rocmrpm-centos7.5-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-test
pytorch-ppc64le-cuda9.1-cudnn7-py3-build
pytorch-ppc64le-cuda9.1-cudnn7-py3-test


@@ -1,9 +1,9 @@
#!/bin/bash
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-build* ]]; then
if [ -z "${BUILD_ENVIRONMENT}" ] || [[ "${BUILD_ENVIRONMENT}" == *-build* ]]; then
source "$(dirname "${BASH_SOURCE[0]}")/macos-build.sh"
fi
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test* ]]; then
if [ -z "${BUILD_ENVIRONMENT}" ] || [[ "${BUILD_ENVIRONMENT}" == *-test* ]]; then
source "$(dirname "${BASH_SOURCE[0]}")/macos-test.sh"
fi


@@ -1,6 +1,8 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
export PATH="/usr/local/bin:$PATH"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
@@ -17,17 +19,18 @@ source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
git submodule sync --recursive
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Build PyTorch
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
export CUDA_VERSION=9.2
export TORCH_CUDA_ARCH_LIST=5.2
export PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
export CUDA_HOME=/Developer/NVIDIA/CUDA-${CUDA_VERSION}
export NO_CUDA=0
export USE_CUDA=1
if [ -z "${IN_CIRCLECI}" ]; then
# Eigen gives "explicit specialization of class must precede its first use" error
@@ -50,7 +53,7 @@ if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${PYTORCH_ENV_DIR}/clang"
chmod a+x "${PYTORCH_ENV_DIR}/clang"
if [[ "${JOB_BASE_NAME}" == *cuda* ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *cuda* ]]; then
printf "#!/bin/sh\nexec sccache $(which nvcc) \$*" > "${PYTORCH_ENV_DIR}/nvcc"
chmod a+x "${PYTORCH_ENV_DIR}/nvcc"
export CUDA_NVCC_EXECUTABLE="${PYTORCH_ENV_DIR}/nvcc"
@@ -65,6 +68,8 @@ export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
python setup.py install
assert_git_not_dirty
# Upload torch binaries when the build job is finished
if [ -z "${IN_CIRCLECI}" ]; then
7z a ${IMAGE_COMMIT_TAG}.7z ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*


@@ -1,12 +1,14 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export PATH="/usr/local/bin:$PATH"
# Set up conda environment
export PYTORCH_ENV_DIR="${HOME}/pytorch-ci-env"
export PYTORCH_ENV_DIR="${HOME}/workspace"
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
mkdir -p ${PYTORCH_ENV_DIR}
@@ -16,17 +18,24 @@ fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja six
pip install hypothesis
pip install -q hypothesis "librosa>=0.6.2" psutil
# faulthandler became built-in in Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip install -q faulthandler
fi
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
fi
git submodule sync --recursive
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Test PyTorch
if [ -z "${IN_CIRCLECI}" ]; then
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
@@ -49,34 +58,49 @@ if [ -z "${IN_CIRCLECI}" ]; then
7z x ${IMAGE_COMMIT_TAG}.7z -o"${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages"
fi
# Test that OpenMP is enabled
pushd test
if [[ ! $(python -c "import torch; print(int(torch.backends.openmp.is_available()))") == "1" ]]; then
echo "Build should have OpenMP enabled, but torch.backends.openmp.is_available() is False"
exit 1
fi
popd
test_python_all() {
echo "Ninja version: $(ninja --version)"
python test/run_test.py --verbose
assert_git_not_dirty
}
test_cpp_api() {
test_libtorch() {
# C++ API
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
# But still clean it before we perform our own build.
#
CPP_BUILD="$PWD/../cpp-build"
rm -rf $CPP_BUILD
mkdir -p $CPP_BUILD/caffe2
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
# But still clean it before we perform our own build.
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd $CPP_BUILD/caffe2
VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
echo "Testing libtorch"
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
CPP_BUILD="$PWD/../cpp-build"
rm -rf $CPP_BUILD
mkdir -p $CPP_BUILD/caffe2
# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
"$CPP_BUILD"/caffe2/bin/test_api
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd $CPP_BUILD/caffe2
VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/bin/test_api
assert_git_not_dirty
fi
}
test_custom_script_ops() {
@@ -96,18 +120,19 @@ test_custom_script_ops() {
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
assert_git_not_dirty
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
if [ -z "${BUILD_ENVIRONMENT}" ] || [[ "${BUILD_ENVIRONMENT}" == *-test ]]; then
test_python_all
test_cpp_api
test_libtorch
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *-test1 ]]; then
test_python_all
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_cpp_api
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 ]]; then
test_libtorch
test_custom_script_ops
fi
fi


@@ -4,7 +4,9 @@
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-multigpu-test"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch (distributed only)"
@@ -25,4 +27,9 @@ if [ -n "${IN_CIRCLECI}" ]; then
fi
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$PWD/../cpp-build"/caffe2/build/bin/test_api
time python test/run_test.py --verbose -i distributed
time python test/run_test.py --verbose -i c10d
time python test/run_test.py --verbose -i c10d_spawn
assert_git_not_dirty


@@ -10,7 +10,7 @@ get_runtime_of_command () {
TIMEFORMAT=%R
# runtime=$( { time ($@ &> /dev/null); } 2>&1 1>/dev/null)
runtime=$( { time $@; } 2>&1 1>/dev/null)
runtime=$( { time "$@"; } 2>&1 1>/dev/null)
if [[ $runtime == *"Error"* ]]; then
exit 1
fi
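The fix above quotes `"$@"` so multi-word commands survive word splitting. The timing idiom itself is worth spelling out; a hedged, self-contained sketch of the same technique (helper name mirrors the one in the diff):

```shell
#!/bin/bash
# With TIMEFORMAT=%R, bash's `time` keyword prints only the elapsed real
# time (in seconds) to stderr. Redirecting the group's stderr into the
# command substitution, while sending stdout to /dev/null, captures just
# that number.
get_runtime_of_command () {
  TIMEFORMAT=%R
  # 2>&1 first (stderr -> captured stdout), then 1>/dev/null (discard output)
  runtime=$( { time "$@"; } 2>&1 1>/dev/null )
  echo "$runtime"
}

get_runtime_of_command sleep 0.2   # prints the measured wall time, roughly 0.2
```

The redirection order matters: swapping the two would discard the timing output along with the command's stdout.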


@@ -1,7 +1,6 @@
import sys
import json
import math
import numpy
import argparse
parser = argparse.ArgumentParser()
@@ -62,8 +61,10 @@ print("z-value: ", z_value)
if z_value >= 3:
raise Exception('''\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run `cd .jenkins/pytorch/perf_test/ && bash ''' + test_name + '''.sh` on your local machine and compare the runtime before/after your code change.
''')
To reproduce this regression, run
`cd .jenkins/pytorch/perf_test/ && bash {}.sh` on your local machine
and compare the runtime before/after your code change.
'''.format(test_name))
else:
print("z-value < 3, no perf regression detected.")
if args.update:
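The threshold logic above is a 3-sigma check. The exact formula is not shown in this hunk, so the following is a hedged sketch of the assumed computation, with hypothetical baseline numbers:

```python
# Sketch of the perf-regression check: the new sample runtime is scored
# against a stored historical mean and standard deviation, and a z-value
# of 3 or more (3 sigma slower than baseline) raises the error above.
def compute_z_value(sample_runtime, mean, sigma):
    """How many standard deviations the sample sits above the mean."""
    return (sample_runtime - mean) / sigma

# Hypothetical baseline: mean 10.0 s, sigma 0.5 s.
z_value = compute_z_value(11.8, mean=10.0, sigma=0.5)
print("z-value: ", z_value)
assert z_value >= 3  # would be flagged as a regression
```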


@@ -19,14 +19,14 @@ test_cpu_speed_mini_sequence_labeler () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py)
SAMPLE_ARRAY+=(${runtime})
done
cd ../../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@@ -12,7 +12,7 @@ test_cpu_speed_mnist () {
cd examples/mnist
pip install -r requirements.txt
conda install -c pytorch torchvision-cpu
# Download data
python main.py --epochs 0
@@ -20,7 +20,7 @@ test_cpu_speed_mnist () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@@ -28,7 +28,7 @@ test_cpu_speed_mnist () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@@ -1,3 +1,5 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_torch () {
@@ -17,7 +19,7 @@ test_cpu_speed_torch () {
fi
if ! python perf-tests/modules/test_cpu_torch.py ${ARGS}; then
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash "${FUNCNAME[0]}".sh\` on your local machine and compare the runtime before/after your code change."
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash ${FUNCNAME[0]}.sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}


@@ -1,3 +1,5 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_torch_tensor () {
@@ -17,7 +19,7 @@ test_cpu_speed_torch_tensor () {
fi
if ! python perf-tests/modules/test_cpu_torch_tensor.py ${ARGS}; then
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash "${FUNCNAME[0]}".sh\` on your local machine and compare the runtime before/after your code change."
echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash ${FUNCNAME[0]}.sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}


@@ -19,7 +19,7 @@ test_gpu_speed_cudnn_lstm () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python cudnn_lstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@@ -27,7 +27,7 @@ test_gpu_speed_cudnn_lstm () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@@ -19,7 +19,7 @@ test_gpu_speed_lstm () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python lstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@@ -27,7 +27,7 @@ test_gpu_speed_lstm () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@@ -19,7 +19,7 @@ test_gpu_speed_mlstm () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python mlstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@@ -27,7 +27,7 @@ test_gpu_speed_mlstm () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@@ -12,7 +12,7 @@ test_gpu_speed_mnist () {
cd examples/mnist
pip install -r requirements.txt
conda install -c pytorch torchvision
# Download data
python main.py --epochs 0
@@ -23,7 +23,7 @@ test_gpu_speed_mnist () {
# Needs a warm-up run to get an accurate number
python main.py --epochs 1 --no-log
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@@ -31,7 +31,7 @@ test_gpu_speed_mnist () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@@ -28,7 +28,7 @@ test_gpu_speed_word_language_model () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=$NUM_RUNS; i++ )) do
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --cuda --epochs 1)
echo $runtime
SAMPLE_ARRAY+=(${runtime})
@@ -36,7 +36,7 @@ test_gpu_speed_word_language_model () {
cd ../..
stats=$(python ../get_stats.py ${SAMPLE_ARRAY[@]})
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo $stats


@@ -1,13 +1,18 @@
#!/bin/bash
# shellcheck disable=SC2034
COMPACT_JOB_NAME="short-perf-test-cpu"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
SCRIPT_PARENT_DIR=$(dirname "${BASH_SOURCE[0]}")
# shellcheck source=.jenkins/pytorch/common.sh
source "$SCRIPT_PARENT_DIR/common.sh"
cd .jenkins/pytorch/perf_test
echo "Running CPU perf test for PyTorch..."
pip install awscli
pip install -q awscli
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321
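One way to raise that threshold is via the AWS CLI's own configuration; a sketch (the 512MB value is illustrative, not taken from this diff):

```shell
# Raise the S3 multipart threshold so `aws s3 cp` performs a single-part
# transfer instead of a multipart read.
aws configure set default.s3.multipart_threshold 512MB
```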


@@ -1,13 +1,17 @@
#!/bin/bash
# shellcheck disable=SC2034
COMPACT_JOB_NAME="short-perf-test-gpu"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
pushd .jenkins/pytorch/perf_test
echo "Running GPU perf test for PyTorch..."
pip install awscli
# Trying to uninstall PyYAML can cause problems. Workaround according to:
# https://github.com/pypa/pip/issues/5247#issuecomment-415571153
pip install -q awscli --ignore-installed PyYAML
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321


@@ -4,7 +4,9 @@
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
@@ -23,17 +25,40 @@ if [ -n "${IN_CIRCLECI}" ]; then
sudo apt-get -qq install --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == *-slow-* ]]; then
export PYTORCH_TEST_WITH_SLOW=1
export PYTORCH_TEST_SKIP_FAST=1
fi
fi
# --user breaks ppc64le builds and these packages are already in ppc64le docker
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# JIT C++ extensions require ninja.
pip install -q ninja --user
pip_install --user ninja
# ninja is installed in /var/lib/jenkins/.local/bin
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
pip install -q hypothesis --user
pip_install --user hypothesis
# TODO: move this to Docker
PYTHON_VERSION=$(python -c 'import platform; print(platform.python_version())'|cut -c1)
echo $PYTHON_VERSION
# if [[ $PYTHON_VERSION == "2" ]]; then
# pip_install --user https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py2-none-any.whl
# else
# pip_install --user https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# fi
pip_install --user tb-nightly
# mypy will fail to install on Python <3.4. In that case,
# we just won't run these tests.
pip_install --user mypy || true
fi
# faulthandler became built-in in Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip_install --user faulthandler
fi
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
@@ -60,13 +85,6 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
# Increase stack size, because ASAN red zones use more stack
ulimit -s 81920
function get_exit_code() {
set +e
"$@"
retcode=$?
set -e
return $retcode
}
(cd test && python -c "import torch")
echo "The next three invocations are expected to crash; if they don't that means ASAN/UBSAN is misconfigured"
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_asan(3)")
@@ -74,24 +92,20 @@
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_aten_asan(3)")
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export PYTORCH_TEST_WITH_ROCM=1
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
fi
if [[ "${JOB_BASE_NAME}" == *-NO_AVX-* ]]; then
if [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX-* ]]; then
export ATEN_CPU_CAPABILITY=default
elif [[ "${JOB_BASE_NAME}" == *-NO_AVX2-* ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX2-* ]]; then
export ATEN_CPU_CAPABILITY=avx
fi
test_python_nn() {
time python test/run_test.py --include nn --verbose
assert_git_not_dirty
}
test_python_all_except_nn() {
time python test/run_test.py --exclude nn --verbose
assert_git_not_dirty
}
test_aten() {
@@ -115,38 +129,28 @@
ls build/bin
aten/tools/run_tests.sh build/bin
assert_git_not_dirty
fi
}
test_torchvision() {
rm -rf ninja
echo "Installing torchvision at branch master"
rm -rf vision
# TODO: This git clone is bad, it means pushes to torchvision can break
# PyTorch CI
git clone https://github.com/pytorch/vision --quiet
pushd vision
# python setup.py install with a tqdm dependency is broken in the
# Travis Python nightly (but not in latest Python nightlies, so
# this should be a transient requirement...)
# See https://github.com/pytorch/pytorch/issues/7525
#time python setup.py install
pip install -q --user .
popd
pip_install --user git+https://github.com/pytorch/vision.git@2b73a4846773a670632b29fb2fc2ac57df7bce5d
}
test_libtorch() {
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing libtorch"
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/caffe2/bin/test_jit
else
"$CPP_BUILD"/caffe2/bin/test_jit "[cpu]"
fi
python tools/download_mnist.py --quiet -d mnist
OMP_NUM_THREADS=2 "$CPP_BUILD"/caffe2/bin/test_api
echo "Testing libtorch"
python test/cpp/jit/tests_setup.py setup
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/caffe2/build/bin/test_jit
else
"$CPP_BUILD"/caffe2/build/bin/test_jit "[cpu]"
fi
python test/cpp/jit/tests_setup.py shutdown
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/build/bin/test_api
assert_git_not_dirty
fi
}
@@ -155,32 +159,46 @@ test_custom_script_ops() {
echo "Testing custom script operators"
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
pushd test/custom_operator
cp -r "$CUSTOM_OP_BUILD" build
cp -a "$CUSTOM_OP_BUILD" build
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
assert_git_not_dirty
fi
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_xla() {
export XLA_USE_XRT=1 XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0"
export XRT_WORKERS="localservice:0;grpc://localhost:40934"
pushd xla
python test/test_operations.py
python test/test_train_mnist.py --tidy
popd
assert_git_not_dirty
}
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
test_torchvision
test_xla
elif [[ "${BUILD_ENVIRONMENT}" == *-test1 ]]; then
test_torchvision
test_python_nn
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 ]]; then
test_python_all_except_nn
test_aten
test_libtorch
test_custom_script_ops
else
test_torchvision
test_python_nn
test_python_all_except_nn
test_aten
test_libtorch
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_torchvision
test_python_nn
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_torchvision
test_python_all_except_nn
test_aten
test_libtorch
test_custom_script_ops
fi
fi


@@ -9,164 +9,29 @@ if [ ! -f setup.py ]; then
exit 1
fi
# shellcheck disable=SC2034
COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-build
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"
export IMAGE_COMMIT_ID=`git rev-parse HEAD`
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
fi
mkdir -p ci_scripts/
export TMP_DIR="${PWD}/build/win_tmp"
export TMP_DIR_WIN=$(cygpath -w "${TMP_DIR}")
cat >ci_scripts/upload_image.py << EOL
export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers
import os
import sys
import boto3
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
$SCRIPT_HELPERS_DIR/build_pytorch.bat
session = boto3.session.Session()
s3 = session.resource('s3')
data = open(sys.argv[1], 'rb')
s3.Bucket('ossci-windows-build').put_object(Key='pytorch/'+IMAGE_COMMIT_TAG+'.7z', Body=data)
object_acl = s3.ObjectAcl('ossci-windows-build','pytorch/'+IMAGE_COMMIT_TAG+'.7z')
response = object_acl.put(ACL='public-read')
assert_git_not_dirty
EOL
cat >ci_scripts/build_pytorch.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\ProgramData\\chocolatey\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install MKL
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/mkl_2018.2.185.7z --output mkl.7z
) else (
aws s3 cp s3://ossci-windows/mkl_2018.2.185.7z mkl.7z --quiet
)
7z x -aoa mkl.7z -omkl
)
set CMAKE_INCLUDE_PATH=%cd%\\mkl\\include
set LIB=%cd%\\mkl\\lib;%LIB%
:: Install MAGMA
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_2.4.0_cuda90_release.7z --output magma_2.4.0_cuda90_release.7z
) else (
aws s3 cp s3://ossci-windows/magma_2.4.0_cuda90_release.7z magma_2.4.0_cuda90_release.7z --quiet
)
7z x -aoa magma_2.4.0_cuda90_release.7z -omagma
)
set MAGMA_HOME=%cd%\\magma
:: Install sccache
mkdir %CD%\\tmp_bin
if "%REBUILD%"=="" (
:check_sccache
%CD%\\tmp_bin\\sccache.exe --show-stats || (
taskkill /im sccache.exe /f /t || ver > nul
del %CD%\\tmp_bin\\sccache.exe
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %CD%\\tmp_bin\\sccache.exe
) else (
aws s3 cp s3://ossci-windows/sccache.exe %CD%\\tmp_bin\\sccache.exe
)
goto :check_sccache
)
)
:: Install Miniconda3
if "%BUILD_ENVIRONMENT%"=="" (
set CONDA_PARENT_DIR=%CD%
) else (
set CONDA_PARENT_DIR=C:\\Jenkins
)
if "%REBUILD%"=="" (
IF EXIST %CONDA_PARENT_DIR%\\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\\Miniconda3
)
call %CONDA_PARENT_DIR%\\Miniconda3\\Scripts\\activate.bat %CONDA_PARENT_DIR%\\Miniconda3
if "%REBUILD%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy cffi pyyaml boto3
)
:: Install ninja
if "%REBUILD%"=="" ( pip install ninja )
set WORKING_DIR=%CD%
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x64
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
cd %WORKING_DIR%
git submodule update --init --recursive
set PATH=%CD%\\tmp_bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set CUDA_PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDA_PATH_V9_0=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set NVTOOLSEXT_PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt
set CUDNN_LIB_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\lib\\x64
set CUDA_TOOLKIT_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDNN_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
:: Target only our CI GPU machine's CUDA arch to speed up the build
set TORCH_CUDA_ARCH_LIST=5.2
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CC=sccache cl
set CXX=sccache cl
set DISTUTILS_USE_SDK=1
set CMAKE_GENERATOR=Ninja
if not "%USE_CUDA%"=="1" (
if "%REBUILD%"=="" (
set NO_CUDA=1
python setup.py install
)
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
)
if not "%USE_CUDA%"=="0" (
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
rd /s /q %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch
for /f "delims=" %%i in ('where /R caffe2\proto *.py') do (
IF NOT "%%i" == "%CD%\caffe2\proto\__init__.py" (
del /S /Q %%i
)
)
copy %CD%\\tmp_bin\\sccache.exe tmp_bin\\nvcc.exe
)
set CUDA_NVCC_EXECUTABLE=%CD%\\tmp_bin\\nvcc
if "%REBUILD%"=="" set NO_CUDA=0
python setup.py install && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call %CONDA_PARENT_DIR%\\Miniconda3\\Scripts\\activate.bat %CONDA_PARENT_DIR%\\Miniconda3\` in Command Prompt before running Git Bash.
) else (
mv %CD%\\build\\bin\\test_api.exe %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch\\lib
7z a %IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
)
)
)
EOL
ci_scripts/build_pytorch.bat
if [ ! -f $IMAGE_COMMIT_TAG.7z ] && [ ! ${BUILD_ENVIRONMENT} == "" ]; then
if [ ! -f ${TMP_DIR}/${IMAGE_COMMIT_TAG}.7z ] && [ ! ${BUILD_ENVIRONMENT} == "" ]; then
exit 1
fi
echo "BUILD PASSED"


@@ -0,0 +1,116 @@
if "%DEBUG%" == "1" (
set BUILD_TYPE=debug
) ELSE (
set BUILD_TYPE=release
)
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;%PATH%
set INSTALLER_DIR=%SCRIPT_HELPERS_DIR%\installation-helpers
call %INSTALLER_DIR%\install_mkl.bat
call %INSTALLER_DIR%\install_magma.bat
call %INSTALLER_DIR%\install_sccache.bat
call %INSTALLER_DIR%\install_miniconda3.bat
:: Install ninja
if "%REBUILD%"=="" ( pip install -q ninja )
git submodule sync --recursive
git submodule update --init --recursive
if "%CUDA_VERSION%" == "9" goto cuda_build_9
if "%CUDA_VERSION%" == "10" goto cuda_build_10
goto cuda_build_end
:cuda_build_9
:: Override VS env here
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
@echo on
popd
set DISTUTILS_USE_SDK=1
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=%CUDA_PATH%
goto cuda_build_common
:cuda_build_10
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
set CUDA_PATH_V10_1=%CUDA_PATH%
goto cuda_build_common
:cuda_build_common
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
:cuda_build_end
set PATH=%TMP_DIR_WIN%\bin;%PATH%
:: Target only our CI GPU machine's CUDA arch to speed up the build
set TORCH_CUDA_ARCH_LIST=5.2
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CC=sccache cl
set CXX=sccache cl
set CMAKE_GENERATOR=Ninja
:: The following code will try to build PyTorch twice if USE_CUDA is neither 0
:: nor 1. It is intended so that both builds can be folded into 1 CI run.
if not "%USE_CUDA%"=="1" (
if "%REBUILD%"=="" (
:: Must save and restore the original value of USE_CUDA, otherwise the
:: `if not "%USE_CUDA%"=="0"` line can be messed up.
set OLD_USE_CUDA=%USE_CUDA%
set USE_CUDA=0
python setup.py install
set USE_CUDA=%OLD_USE_CUDA%
)
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
)
if not "%USE_CUDA%"=="0" (
:: sccache will fail for CUDA builds if all cores are used for compiling
if not defined MAX_JOBS set /A MAX_JOBS=%NUMBER_OF_PROCESSORS%-1
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
rd /s /q %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch
for /f "delims=" %%i in ('where /R caffe2\proto *.py') do (
IF NOT "%%i" == "%CD%\caffe2\proto\__init__.py" (
del /S /Q %%i
)
)
copy %TMP_DIR_WIN%\bin\sccache.exe %TMP_DIR_WIN%\bin\nvcc.exe
)
set CUDA_NVCC_EXECUTABLE=%TMP_DIR_WIN%\bin\nvcc
if "%REBUILD%"=="" set USE_CUDA=1
python setup.py install --cmake && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo NOTE: To run `import torch`, please make sure to activate the conda environment by running `call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3` in Command Prompt before running Git Bash.
) else (
7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\caffe2 && python %SCRIPT_HELPERS_DIR%\upload_image.py %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z
)
)
)


@@ -0,0 +1,21 @@
#!/usr/bin/env python
import os
import sys
import boto3
import botocore
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
session = boto3.session.Session()
s3 = session.resource('s3')
BUCKET_NAME = 'ossci-windows-build'
KEY = 'pytorch/' + IMAGE_COMMIT_TAG + '.7z'
LOCAL_FILE_PATH = sys.argv[1]
try:
s3.Bucket(BUCKET_NAME).download_file(KEY, LOCAL_FILE_PATH)
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print("The object does not exist.")
else:
raise


@@ -0,0 +1,17 @@
if "%CUDA_VERSION%" == "9" set CUDA_SUFFIX=cuda90
if "%CUDA_VERSION%" == "10" set CUDA_SUFFIX=cuda101

if "%CUDA_SUFFIX%" == "" (
    echo unknown CUDA version, please set `CUDA_VERSION` to 9 or 10.
    exit /b 1
)

if "%REBUILD%"=="" (
    if "%BUILD_ENVIRONMENT%"=="" (
        curl -k https://s3.amazonaws.com/ossci-windows/magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z
    ) else (
        aws s3 cp s3://ossci-windows/magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet
    )
    7z x -aoa %TMP_DIR_WIN%\magma_2.5.0_%CUDA_SUFFIX%_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
)
set MAGMA_HOME=%TMP_DIR_WIN%\magma
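The batch file above maps `CUDA_VERSION` to a `CUDA_SUFFIX` and then interpolates it into the magma archive name. A hedged Python sketch of the same mapping (hypothetical helper, not part of the CI scripts; the version/suffix pairs and archive format come straight from the script):

```python
# CUDA_VERSION -> CUDA_SUFFIX, as in the batch script above
CUDA_SUFFIXES = {'9': 'cuda90', '10': 'cuda101'}

def magma_archive_name(cuda_version, build_type):
    """Return the magma .7z archive name for a CUDA version/build type pair."""
    suffix = CUDA_SUFFIXES.get(cuda_version)
    if suffix is None:
        # mirrors the script's "unknown CUDA version" failure path
        raise ValueError('unknown CUDA version, please set CUDA_VERSION to 9 or 10')
    return 'magma_2.5.0_%s_%s.7z' % (suffix, build_type)
```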

@@ -0,0 +1,15 @@
if "%BUILD_ENVIRONMENT%"=="" (
    set CONDA_PARENT_DIR=%CD%
) else (
    set CONDA_PARENT_DIR=C:\Jenkins
)
if "%REBUILD%"=="" (
    IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
    curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
    %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3
if "%REBUILD%"=="" (
    :: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
    call conda install -y -q python=3.6.7 numpy cffi pyyaml boto3
)

@@ -0,0 +1,10 @@
if "%REBUILD%"=="" (
    if "%BUILD_ENVIRONMENT%"=="" (
        curl -k https://s3.amazonaws.com/ossci-windows/mkl_2018.2.185.7z --output %TMP_DIR_WIN%\mkl.7z
    ) else (
        aws s3 cp s3://ossci-windows/mkl_2018.2.185.7z %TMP_DIR_WIN%\mkl.7z --quiet
    )
    7z x -aoa %TMP_DIR_WIN%\mkl.7z -o%TMP_DIR_WIN%\mkl
)
set CMAKE_INCLUDE_PATH=%TMP_DIR_WIN%\mkl\include
set LIB=%TMP_DIR_WIN%\mkl\lib;%LIB%

@@ -0,0 +1,15 @@
mkdir %TMP_DIR_WIN%\bin
if "%REBUILD%"=="" (
    :check_sccache
    %TMP_DIR_WIN%\bin\sccache.exe --show-stats || (
        taskkill /im sccache.exe /f /t || ver > nul
        del %TMP_DIR_WIN%\bin\sccache.exe
        if "%BUILD_ENVIRONMENT%"=="" (
            curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %TMP_DIR_WIN%\bin\sccache.exe
        ) else (
            aws s3 cp s3://ossci-windows/sccache.exe %TMP_DIR_WIN%\bin\sccache.exe
        )
        goto :check_sccache
    )
)

@@ -0,0 +1,39 @@
#!/usr/bin/env python
from __future__ import print_function

import subprocess

TESTS = [
    (
        "Checking that caffe2.python is available",
        "from caffe2.python import core",
    ),
    (
        "Checking that MKL is available",
        "import torch; exit(0 if torch.backends.mkl.is_available() else 1)",
    ),
    (
        "Checking that CUDA archs are setup correctly",
        "import torch; torch.randn([3,5]).cuda()",
    ),
    (
        "Checking that magma is available",
        "import torch; torch.rand(1).cuda(); exit(0 if torch.cuda.has_magma else 1)",
    ),
    (
        "Checking that CuDNN is available",
        "import torch; exit(0 if torch.backends.cudnn.is_available() else 1)",
    ),
]

if __name__ == "__main__":
    for description, python_commands in TESTS:
        print(description)
        command_args = ["python", "-c", python_commands]
        command_string = " ".join(command_args)
        print("Command:", command_string)
        subprocess.check_call(command_args)
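Because the runner above uses `subprocess.check_call`, it aborts at the first failing check. A hedged sketch of a variant that collects every failure before reporting (`run_smoke_tests` is a hypothetical helper, not the CI runner; it reuses the same `(description, snippet)` shape as `TESTS`):

```python
import subprocess
import sys

def run_smoke_tests(tests, python=sys.executable):
    """Run each (description, snippet) pair; return the descriptions that failed.

    Unlike check_call, this keeps going after a failure so one broken
    backend check does not mask the others.
    """
    failures = []
    for description, snippet in tests:
        print(description)
        result = subprocess.run([python, '-c', snippet])
        if result.returncode != 0:
            failures.append(description)
    return failures
```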

@@ -0,0 +1,85 @@
if exist "%TMP_DIR%/ci_scripts/pytorch_env_restore.bat" (
    call %TMP_DIR%/ci_scripts/pytorch_env_restore.bat
    exit /b 0
)
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;%PATH%

:: Install Miniconda3
if "%BUILD_ENVIRONMENT%"=="" (
    set CONDA_PARENT_DIR=%CD%
) else (
    set CONDA_PARENT_DIR=C:\Jenkins
)
if NOT "%BUILD_ENVIRONMENT%"=="" (
    IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
    curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
    %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3
if NOT "%BUILD_ENVIRONMENT%"=="" (
    :: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
    :: Numba is pinned to 0.44.0 to avoid https://github.com/numba/numba/issues/4352
    call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba==0.44.0
)
pip install -q ninja future hypothesis "librosa>=0.6.2" psutil pillow
:: No need to install faulthandler since we only test Python >= 3.6 on Windows
:: faulthandler is builtin since Python 3.3

if "%CUDA_VERSION%" == "9" goto cuda_build_9
if "%CUDA_VERSION%" == "10" goto cuda_build_10
goto cuda_build_end

:cuda_build_9
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
@echo on
popd
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
set CUDA_PATH_V9_0=%CUDA_PATH%
goto cuda_build_common

:cuda_build_10
pushd .
call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
@echo on
popd
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
set CUDA_PATH_V10_1=%CUDA_PATH%
goto cuda_build_common

:cuda_build_common
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
set NUMBAPRO_CUDALIB=%CUDA_PATH%\bin
set NUMBAPRO_LIBDEVICE=%CUDA_PATH%\nvvm\libdevice
set NUMBAPRO_NVVM=%CUDA_PATH%\nvvm\bin\nvvm64_32_0.dll

:cuda_build_end
set PYTHONPATH=%TMP_DIR_WIN%\build;%PYTHONPATH%
if NOT "%BUILD_ENVIRONMENT%"=="" (
    pushd %TMP_DIR_WIN%\build
    python %SCRIPT_HELPERS_DIR%\download_image.py %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z
    :: 7z: -aos skips extraction if the file already exists, because this .bat can be called multiple times
    7z x %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z -aos
    popd
) else (
    xcopy /s %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %TMP_DIR_WIN%\build\torch\
)
@echo off
echo @echo off >> %TMP_DIR%/ci_scripts/pytorch_env_restore.bat
for /f "usebackq tokens=*" %%i in (`set`) do echo set "%%i" >> %TMP_DIR%/ci_scripts/pytorch_env_restore.bat
@echo on

@@ -0,0 +1,30 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test\custom_operator
:: Build the custom operator library.
mkdir build
pushd build
echo "Executing CMake for custom_operator test..."
:: Note: Caffe2 does not support MSVC + CUDA + Debug mode (has to be Release mode)
cmake -DCMAKE_PREFIX_PATH=%TMP_DIR_WIN%\build\torch -DCMAKE_BUILD_TYPE=Release -GNinja ..
if ERRORLEVEL 1 exit /b 1
echo "Executing Ninja for custom_operator test..."
ninja -v
if ERRORLEVEL 1 exit /b 1
echo "Ninja succeeded for custom_operator test."
popd
:: Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module="build/model.pt"
:: Run tests C++-side and load the exported script module.
cd build
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_custom_ops.exe model.pt

@@ -0,0 +1,7 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd %TMP_DIR_WIN%\build\torch\bin
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_api.exe --gtest_filter="-IntegrationTest.MNIST*"
if errorlevel 1 exit /b 1

@@ -0,0 +1,2 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test && python run_test.py --exclude nn --verbose && cd ..

@@ -0,0 +1,16 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
pushd test
echo Some smoke tests
python %SCRIPT_HELPERS_DIR%\run_python_nn_smoketests.py
if ERRORLEVEL 1 exit /b 1
echo Run nn tests
python run_test.py --include nn --verbose
if ERRORLEVEL 1 exit /b 1
popd

Some files were not shown because too many files have changed in this diff.